Sahara/Templates


Revision as of 09:54, 27 May 2013


Hadoop has a large number of parameters, and it is hard for end users to find a configuration that gives a cluster good performance. The template mechanism simplifies creating and configuring a Hadoop cluster: the end user only needs to specify a cluster template and the number of nodes for the new cluster, and the cluster configuration comes from the template. Templates are expected to be created by experienced Hadoop administrators. If a user needs to redefine a parameter, they can create a custom template or override the parameter during cluster creation.
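
A minimal Python sketch of how a template's configs and a user's overrides could be combined at cluster creation time. It assumes configs are nested dicts keyed first by target (such as "service:hdfs") and then by parameter name, as in the examples on this page; `merge_configs` is a hypothetical helper, not part of Savanna's API.

```python
from copy import deepcopy
from typing import Any, Dict

# Target (e.g. "service:hdfs") -> parameter name -> value.
Configs = Dict[str, Dict[str, Any]]


def merge_configs(template: Configs, overrides: Configs) -> Configs:
    """Return the template configs with user overrides applied on top.

    The template itself is left unchanged; a value in `overrides` replaces
    the template value for the same target and parameter name.
    """
    merged = deepcopy(template)
    for target, params in overrides.items():
        merged.setdefault(target, {}).update(deepcopy(params))
    return merged


template_configs = {
    "service:hdfs": {"hdfs_replication_factor": 3},
    "service:mapreduce": {"compression": "snappy"},
}
user_overrides = {"service:hdfs": {"hdfs_replication_factor": 2}}

cluster_configs = merge_configs(template_configs, user_overrides)
print(cluster_configs["service:hdfs"]["hdfs_replication_factor"])   # 2
print(template_configs["service:hdfs"]["hdfs_replication_factor"])  # 3
```

Copying the template keeps it reusable: one user's overrides never leak into the next cluster built from the same template.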

Template usage is limited by two things: the plugin and the Hadoop version. A template is always plugin-specific because it contains configurations specific to that plugin, so a template can only be used with the plugin it was created for. The same applies to the Hadoop version.
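
The restriction can be sketched as a simple check run before a template is applied. This is an illustration under the assumption that a template and a cluster each carry a plugin id and a Hadoop version string; `TemplateTarget` and `check_compatible` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateTarget:
    """The plugin and Hadoop version a template (or cluster) is bound to."""
    plugin: str
    hadoop_version: str


def check_compatible(template: TemplateTarget, cluster: TemplateTarget) -> None:
    """Raise ValueError unless the template matches the cluster's plugin and version."""
    if template.plugin != cluster.plugin:
        raise ValueError(
            f"template is for plugin {template.plugin!r}, "
            f"cluster uses {cluster.plugin!r}")
    if template.hadoop_version != cluster.hadoop_version:
        raise ValueError(
            f"template is for Hadoop {template.hadoop_version}, "
            f"cluster uses {cluster.hadoop_version}")


tmpl = TemplateTarget("apache-hadoop", "1.1.1")
check_compatible(tmpl, TemplateTarget("apache-hadoop", "1.1.1"))  # passes silently
```

Checking both fields separately lets the error message say which of the two restrictions was violated.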

Savanna has two types of templates: cluster templates and node group templates.

Node Group Templates

A node group template contains the configuration for a node in the cluster. It has "Node Group" in its name because a cluster consists of groups of nodes, each group sharing the same configuration. The template includes configuration for Hadoop processes and VM characteristics (e.g. the number of reduce slots for a task tracker, the number of CPUs and the amount of RAM). The VM characteristics are specified with an OpenStack flavor.

A node group template contains the following parameters:

Name              Type              Constraints                                   Comments
id                string            required, unique
flavor            string            required, should contain a valid flavor id
name              string            required, unique
description       string            optional
plugin            string            required, should contain a valid plugin id
hadoop_version    string            required
node_processes    list of strings   required
node_configs      dict of dicts     required                                      see example below for exact structure

Example:

   {
       "id": "aee4-strf-o14s-fd34",
       "flavor": "4",
       "image": "ah91-aij1-u78x-iunm",
       "name": "fat task tracker + data node",
       "description": "a template for big nodes ...",
       "plugin": "apache-hadoop",
       "hadoop_version": "1.1.1",
       "node_processes": ["task tracker", "data node"],
       "node_configs":
           {
               "service:mapreduce":
                   {
                       "mapred.tasktracker.map.tasks.maximum": 8,
                       "mapred.tasktracker.reduce.tasks.maximum": 3,
                       ...
                   },
               "service:hdfs":
                   {
                       ...
                   },
               "general":
                   {
                       ...
                   }
           }
   }
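
A minimal sketch of checking a node group template, parsed from JSON into a dict, against the constraints in the parameter table above. Uniqueness of `id` and `name` needs the full set of stored templates, so it is not checked here. The sets `known_flavors` and `known_plugins` stand in for lookups against OpenStack and the registered plugins; they and `validate_node_group_template` are assumptions for illustration.

```python
from typing import Any, Dict, List, Set

# Required keys, taken from the node group template parameter table.
REQUIRED_KEYS = ("id", "flavor", "name", "plugin", "hadoop_version",
                 "node_processes", "node_configs")


def validate_node_group_template(tmpl: Dict[str, Any],
                                 known_flavors: Set[str],
                                 known_plugins: Set[str]) -> List[str]:
    """Return a list of problems found in `tmpl` (empty if it looks valid)."""
    errors = [f"missing required field {k!r}" for k in REQUIRED_KEYS if k not in tmpl]
    if "flavor" in tmpl and tmpl["flavor"] not in known_flavors:
        errors.append(f"unknown flavor id {tmpl['flavor']!r}")
    if "plugin" in tmpl and tmpl["plugin"] not in known_plugins:
        errors.append(f"unknown plugin {tmpl['plugin']!r}")
    procs = tmpl.get("node_processes", [])
    if not isinstance(procs, list) or not all(isinstance(p, str) for p in procs):
        errors.append("node_processes must be a list of strings")
    configs = tmpl.get("node_configs", {})
    if not isinstance(configs, dict) or not all(isinstance(v, dict) for v in configs.values()):
        errors.append("node_configs must be a dict of dicts")
    return errors


template = {
    "id": "aee4-strf-o14s-fd34",
    "flavor": "4",
    "name": "fat task tracker + data node",
    "plugin": "apache-hadoop",
    "hadoop_version": "1.1.1",
    "node_processes": ["task tracker", "data node"],
    "node_configs": {"service:mapreduce": {"mapred.tasktracker.map.tasks.maximum": 8}},
}
print(validate_node_group_template(template, {"4"}, {"apache-hadoop"}))  # []
```

Collecting every problem instead of stopping at the first one lets an administrator fix a template in a single pass.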

Cluster template

A cluster template contains configuration that applies to the whole cluster, e.g. the HDFS replication factor or the HDFS block size. It also contains a list of node group templates. Ideally, this will let a user create a cluster in one click just by specifying the cluster template.

Name              Type              Constraints         Comments
id                string            required, unique
name              string            required, unique
description       string            optional
plugin            string            required
hadoop_version    string            required
configs           dict              required
node_groups       list of dicts     required            see example below for exact structure

Example:

 {
     "id": "asdf-wdvc-9as0-q23w",
     "name": "small cluster",
     "description": "a template for a small cluster",
     "plugin": "apache-hadoop",
     "hadoop_version": "1.1.1",
     "configs":
         {
             "service:mapreduce":
                 {
                     "compression": "snappy"
                 },
             "service:hdfs":
                 {
                     "hdfs_replication_factor": 3
                 },
             "general":
                 {
                     ...
                 }
         },
     "node_groups":
     [
         {
             "name": "master node",
             "node-group-template": "aee4-strf-o14s-fd34",
             "count": 1
         },
         {
             "name": "workers",
             "node-group-template": "fe1t-2t4f-1oa4-fdik",
             "count": 3
         }
     ]
 }
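
A sketch of how a cluster template could be expanded into a per-node plan. Each node group entry references a node group template by id, which supplies the flavor and processes, and its "count" says how many such nodes to start. It assumes the node group list is stored under the `node_groups` key from the parameter table. `expand_cluster_template` is hypothetical, and the contents of the "workers" template are invented here because this page does not show them.

```python
from typing import Any, Dict, List


def expand_cluster_template(cluster_tmpl: Dict[str, Any],
                            node_group_templates: Dict[str, Dict[str, Any]]
                            ) -> List[Dict[str, Any]]:
    """Return one entry per node to start, in node group order."""
    plan: List[Dict[str, Any]] = []
    for group in cluster_tmpl["node_groups"]:
        ngt = node_group_templates[group["node-group-template"]]
        # Templates are plugin- and version-specific, so mixing them is an error.
        if (ngt["plugin"], ngt["hadoop_version"]) != (
                cluster_tmpl["plugin"], cluster_tmpl["hadoop_version"]):
            raise ValueError(f"node group {group['name']!r} uses an incompatible template")
        for _ in range(group["count"]):
            plan.append({"node_group": group["name"],
                         "flavor": ngt["flavor"],
                         "node_processes": list(ngt["node_processes"])})
    return plan


node_group_templates = {
    "aee4-strf-o14s-fd34": {"plugin": "apache-hadoop", "hadoop_version": "1.1.1",
                            "flavor": "4", "node_processes": ["task tracker", "data node"]},
    # Hypothetical contents for the "workers" template.
    "fe1t-2t4f-1oa4-fdik": {"plugin": "apache-hadoop", "hadoop_version": "1.1.1",
                            "flavor": "2", "node_processes": ["task tracker", "data node"]},
}
cluster = {
    "plugin": "apache-hadoop",
    "hadoop_version": "1.1.1",
    "node_groups": [
        {"name": "master node", "node-group-template": "aee4-strf-o14s-fd34", "count": 1},
        {"name": "workers", "node-group-template": "fe1t-2t4f-1oa4-fdik", "count": 3},
    ],
}
plan = expand_cluster_template(cluster, node_group_templates)
print(len(plan))  # 4
```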

Plugin integration

A plugin should provide the following facilities to support templates:

  • Provide a list of configs by implementing the get_configs(...) method.
  • Verify the user-supplied config values in the validate_cluster(...) method.
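
The two hooks can be sketched as an abstract base class. The method names come from the list above, but their parameters are elided on this page, so the signatures below, along with `TemplatePlugin` and `ToyPlugin`, are assumptions for illustration and not Savanna's actual plugin interface.

```python
import abc
from typing import Any, Dict, List

ConfigSpec = Dict[str, Any]


class TemplatePlugin(abc.ABC):
    """Hypothetical base class for the template-related plugin hooks."""

    @abc.abstractmethod
    def get_configs(self, hadoop_version: str) -> List[ConfigSpec]:
        """Return the specifications of the configs users may edit."""

    @abc.abstractmethod
    def validate_cluster(self, hadoop_version: str,
                         user_configs: Dict[str, Dict[str, Any]]) -> None:
        """Raise ValueError if the user-supplied config values are invalid."""


class ToyPlugin(TemplatePlugin):
    """Illustrative plugin exposing a single MapReduce config."""

    def get_configs(self, hadoop_version: str) -> List[ConfigSpec]:
        return [{"name": "mapred.tasktracker.map.tasks.maximum",
                 "applicable_target": "service:mapreduce",
                 "scope": "node", "default": 2, "required": True,
                 "type": "int", "description": "number of map tasks per node"}]

    def validate_cluster(self, hadoop_version: str,
                         user_configs: Dict[str, Dict[str, Any]]) -> None:
        # Only configs the plugin itself advertised may be set by the user.
        known = {(c["applicable_target"], c["name"])
                 for c in self.get_configs(hadoop_version)}
        for target, params in user_configs.items():
            for name in params:
                if (target, name) not in known:
                    raise ValueError(f"unknown config {name!r} for target {target!r}")


plugin = ToyPlugin()
plugin.validate_cluster("1.1.1", {"service:mapreduce": {"mapred.tasktracker.map.tasks.maximum": 8}})
```

Deriving the validation rules from `get_configs` keeps the advertised configs and the accepted configs from drifting apart.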

[Figure: Savanna Templates]

Configs

A plugin should provide a list of configs that users can edit. Essentially, a "config" is the specification of a single parameter. The parameter may target the whole cluster, a single node or a specific service (mapreduce, hdfs). Since the plugin provides the list of configs, it must also be able to apply them to the cluster when the user supplies values.

Example config:

 {
     "name": "mapred.tasktracker.map.tasks.maximum",
     "applicable_target": "service:mapreduce",
     "scope": "node",
     "default": 2,
     "required": true,
     "type": "int",
     "description": "number of map tasks per node"
 }
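
A config specification like the one above could be modeled as a small dataclass that decides which value to apply: the user's value when given, otherwise the default. Only the "int" type appears on this page, so the "string" and "bool" entries in the type table are assumptions, as are the `Config` class and its `resolve` method.

```python
from dataclasses import dataclass
from typing import Any

# Maps a config's "type" string to a Python type ("string"/"bool" are assumed).
_TYPES = {"int": int, "string": str, "bool": bool}


@dataclass
class Config:
    name: str
    applicable_target: str  # e.g. "service:mapreduce" or "general"
    scope: str              # e.g. "node" or "cluster"
    default: Any
    required: bool
    type: str
    description: str = ""

    def resolve(self, user_value: Any = None) -> Any:
        """Return the user's value if given, else the default, after a type check."""
        value = self.default if user_value is None else user_value
        if value is None:
            if self.required:
                raise ValueError(f"{self.name} is required")
            return None
        expected = _TYPES[self.type]
        # bool is a subclass of int in Python, so reject it explicitly for "int".
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise TypeError(f"{self.name} must be of type {self.type}")
        return value


map_slots = Config(name="mapred.tasktracker.map.tasks.maximum",
                   applicable_target="service:mapreduce", scope="node",
                   default=2, required=True, type="int",
                   description="number of map tasks per node")
print(map_slots.resolve())   # 2 (default)
print(map_slots.resolve(8))  # 8 (user value)
```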