Revision as of 09:54, 27 May 2013

Hadoop has a large number of parameters, and it is hard for end users to find a configuration that gives a cluster good performance. The template mechanism simplifies creating and configuring a Hadoop cluster: the end user only needs to specify a cluster template and the number of nodes for the new cluster, and the cluster configuration comes from the template. Templates are assumed to be created by experienced Hadoop administrators. If a user needs to change a parameter, they can create a custom template or override the parameter during cluster creation.
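Overriding a parameter at cluster creation can be thought of as a two-level dictionary merge. The sketch below assumes configs are stored as target -> parameter dictionaries, as in the examples on this page; `merge_configs` is a hypothetical helper, not a Savanna function.

```python
from copy import deepcopy


def merge_configs(template_configs, overrides):
    """Hypothetical helper: return template configs with user overrides applied.

    Both arguments map a target (e.g. "service:hdfs", "general") to a dict
    of parameter name -> value. The template itself is left unmodified.
    """
    merged = deepcopy(template_configs)
    for target, params in overrides.items():
        # Create the target section if the template lacks it, then let
        # each user-supplied value replace the template's value.
        merged.setdefault(target, {}).update(params)
    return merged


template = {
    "service:hdfs": {"hdfs_replication_factor": 3},
    "service:mapreduce": {"compression": "snappy"},
}
user_overrides = {"service:hdfs": {"hdfs_replication_factor": 2}}

print(merge_configs(template, user_overrides))
```

Copying the template first keeps a shared template unchanged when many clusters are created from it.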

Template usage is limited by two things: the plugin and the Hadoop version. A template is always plugin-specific, because it contains configurations specific to that plugin, so a template can only be used with the plugin it was created for. The same applies to the Hadoop version.
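In practice this restriction reduces to comparing two fields. A minimal sketch, assuming templates are dictionaries carrying the `plugin` and `hadoop_version` keys listed in the tables below; `is_compatible` is a hypothetical name, and the version string "1.2.0" is an arbitrary illustration.

```python
def is_compatible(template: dict, plugin: str, hadoop_version: str) -> bool:
    """Hypothetical check: a template is usable only with the plugin and
    Hadoop version it was created for."""
    return (template["plugin"] == plugin
            and template["hadoop_version"] == hadoop_version)


template = {"plugin": "apache-hadoop", "hadoop_version": "1.1.1"}
print(is_compatible(template, "apache-hadoop", "1.1.1"))  # True
print(is_compatible(template, "apache-hadoop", "1.2.0"))  # False
```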

Savanna has two types of templates: cluster templates and node group templates.

Node Group Templates

A node group template contains the configuration for a node in the cluster. It has “Node Group” in its name because a cluster consists of groups of nodes, and all nodes in a group share the same configuration. The template includes configuration for Hadoop processes and VM characteristics (e.g. the number of reduce slots for a task tracker, the number of CPUs and the amount of RAM). The VM characteristics are specified with an OpenStack flavor.

A node group template contains the following parameters:

Name            Type             Constraints                                 Comments
id              string           required, unique
flavor          string           required, should contain a valid flavor id
name            string           required, unique
description     string           optional
plugin          string           required, should contain a valid plugin id
hadoop_version  string           required
node_processes  list of strings  required
node_configs    dict of dicts    required                                    see example below for exact structure
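The type and presence constraints in the table above can be checked mechanically. A minimal sketch, assuming templates arrive as plain dictionaries and that the caller supplies the sets of known flavor and plugin ids; `validate_node_group_template` is hypothetical, and the uniqueness checks on `id` and `name` are omitted because they need access to stored templates.

```python
# Hypothetical validator for the node group template fields in the table.
REQUIRED_FIELDS = {
    "id": str,
    "flavor": str,
    "name": str,
    "plugin": str,
    "hadoop_version": str,
    "node_processes": list,
    "node_configs": dict,
}


def validate_node_group_template(template, known_flavors, known_plugins):
    """Return a list of error messages; an empty list means no problems found."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in template:
            errors.append(f"missing required field: {field}")
        elif not isinstance(template[field], expected_type):
            errors.append(f"{field} must be of type {expected_type.__name__}")
    # "description" is optional but must be a string when present.
    if not isinstance(template.get("description", ""), str):
        errors.append("description must be of type str")
    # node_processes is a list of strings.
    processes = template.get("node_processes")
    if isinstance(processes, list) and not all(isinstance(p, str) for p in processes):
        errors.append("node_processes must contain only strings")
    # node_configs is a dict of dicts: target -> {parameter: value}.
    configs = template.get("node_configs")
    if isinstance(configs, dict):
        for target, params in configs.items():
            if not isinstance(params, dict):
                errors.append(f"node_configs[{target!r}] must be a dict")
    # Check id references only when the field is present and well typed.
    if isinstance(template.get("flavor"), str) and template["flavor"] not in known_flavors:
        errors.append("flavor does not reference a known flavor id")
    if isinstance(template.get("plugin"), str) and template["plugin"] not in known_plugins:
        errors.append("plugin does not reference a known plugin id")
    return errors


template = {
    "id": "aee4-strf-o14s-fd34",
    "flavor": "4",
    "name": "fat task tracker + data node",
    "plugin": "apache-hadoop",
    "hadoop_version": "1.1.1",
    "node_processes": ["task tracker", "data node"],
    "node_configs": {
        "service:mapreduce": {"mapred.tasktracker.map.tasks.maximum": 8},
    },
}
print(validate_node_group_template(template, {"4"}, {"apache-hadoop"}))  # []
```

Collecting every error instead of stopping at the first one lets a template author fix all problems in one pass.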

Example:

   {
       "id": "aee4-strf-o14s-fd34",
       "flavor": "4",
       "image": "ah91-aij1-u78x-iunm",
       "name": "fat task tracker + data node",
       "description": "a template for big nodes ...",
       "plugin": "apache-hadoop",
       "hadoop_version": "1.1.1",
       "node_processes": ["task tracker", "data node"],
       "node_configs":
           {
               "service:mapreduce":
                   {
                       "mapred.tasktracker.map.tasks.maximum": 8,
                       "mapred.tasktracker.reduce.tasks.maximum": 3,
                       ...
                   },
               "service:hdfs":
                   {
                         …
                   },
               "general":
                   {
                         …
                   }
           }
   }

Cluster template

A cluster template contains configuration that applies to the whole cluster, e.g. the HDFS replication factor or the HDFS block size. It also contains a list of node group templates. Ideally, this will let a user create a cluster in one click, just by specifying the cluster template.

Name            Type           Constraints       Comments
id              string         required, unique
name            string         required, unique
description     string         optional
plugin          string         required
hadoop_version  string         required
configs         dict           required
node_groups     list of dicts  required          see example below for exact structure

Example:

 {
     "id": "asdf-wdvc-9as0-q23w",
     "name": "small cluster",
     "description": "a template for a small cluster",
     "plugin": "apache-hadoop",
     "hadoop_version": "1.1.1",
     "configs":
         {
             "service:mapreduce":
                 {
                     "compression": "snappy"
                 },
             "service:hdfs":
                 {
                     "hdfs_replication_factor": 3
                 },
             "general":
                 {
                     ...
                 }
         },
     "node_groups":
     [
         {
             "name": "master node",
             "node-group-template": "aee4-strf-o14s-fd34",
             "count": 1
         },
         {
             "name": "workers",
             "node-group-template": "fe1t-2t4f-1oa4-fdik",
             "count": 3
         }
     ]
 }
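Turning a cluster template into concrete node groups means looking up each referenced node group template and carrying over its count. A minimal sketch, assuming templates are plain dictionaries shaped like the examples above, that the `node_groups` field from the table holds the group list, and that node group templates can be looked up by id. `resolve_node_groups` and `ResolvedGroup` are hypothetical, and the contents of the worker template "fe1t-2t4f-1oa4-fdik" are made up for illustration. The sketch also enforces the plugin and Hadoop version restriction described at the top of this page.

```python
from typing import Dict, List, NamedTuple


class ResolvedGroup(NamedTuple):
    name: str
    flavor: str
    node_processes: List[str]
    count: int


def resolve_node_groups(cluster_template: dict,
                        node_group_templates: Dict[str, dict]) -> List[ResolvedGroup]:
    """Look up each node group's template and check plugin/version agreement."""
    groups = []
    for entry in cluster_template["node_groups"]:
        ngt = node_group_templates[entry["node-group-template"]]
        # A template may only be used with the plugin and Hadoop version
        # it was created for, so both must match the cluster template.
        for key in ("plugin", "hadoop_version"):
            if ngt[key] != cluster_template[key]:
                raise ValueError(f"node group {entry['name']!r}: {key} mismatch")
        groups.append(ResolvedGroup(entry["name"], ngt["flavor"],
                                    ngt["node_processes"], entry["count"]))
    return groups


node_group_templates = {
    "aee4-strf-o14s-fd34": {"plugin": "apache-hadoop", "hadoop_version": "1.1.1",
                            "flavor": "4",
                            "node_processes": ["task tracker", "data node"]},
    # Hypothetical contents for the worker template.
    "fe1t-2t4f-1oa4-fdik": {"plugin": "apache-hadoop", "hadoop_version": "1.1.1",
                            "flavor": "2",
                            "node_processes": ["task tracker", "data node"]},
}
cluster_template = {
    "plugin": "apache-hadoop",
    "hadoop_version": "1.1.1",
    "node_groups": [
        {"name": "master node", "node-group-template": "aee4-strf-o14s-fd34", "count": 1},
        {"name": "workers", "node-group-template": "fe1t-2t4f-1oa4-fdik", "count": 3},
    ],
}

groups = resolve_node_groups(cluster_template, node_group_templates)
print(sum(g.count for g in groups))  # 4
```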

Plugin integration

A plugin should provide the following facilities to support templates:

Provide the list of configs by implementing the get_configs(...) method.
Verify the user-supplied config values in the validate_cluster(...) method.
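The two hooks can be outlined as an abstract base class. This is a hypothetical sketch, not Savanna's plugin interface: the method names follow this page, but the parameters (`hadoop_version`, `user_configs`) are assumptions because the page elides the real signatures, and `TemplateAwarePlugin` and `ToyPlugin` are illustrative names.

```python
import abc


class TemplateAwarePlugin(abc.ABC):
    """Hypothetical outline of the template hooks a plugin implements.

    Method names follow this page; the parameters are assumptions.
    """

    @abc.abstractmethod
    def get_configs(self, hadoop_version):
        """Return the specs of the configs users may set for this version."""

    @abc.abstractmethod
    def validate_cluster(self, hadoop_version, user_configs):
        """Raise ValueError if user_configs contains an invalid entry."""


class ToyPlugin(TemplateAwarePlugin):
    """Toy plugin exposing a single config spec."""

    def get_configs(self, hadoop_version):
        return [{
            "name": "mapred.tasktracker.map.tasks.maximum",
            "applicable_target": "service:mapreduce",
            "scope": "node",
            "default": 2,
            "required": True,
            "type": "int",
            "description": "number of map tasks per node",
        }]

    def validate_cluster(self, hadoop_version, user_configs):
        # Index the plugin's config specs by (target, name).
        specs = {(c["applicable_target"], c["name"]): c
                 for c in self.get_configs(hadoop_version)}
        for target, params in user_configs.items():
            for name, value in params.items():
                spec = specs.get((target, name))
                if spec is None:
                    raise ValueError(f"unknown config {name!r} for {target!r}")
                # bool is a subclass of int in Python, so reject it explicitly.
                if spec["type"] == "int" and (
                        isinstance(value, bool) or not isinstance(value, int)):
                    raise ValueError(f"{name} must be an int")


plugin = ToyPlugin()
plugin.validate_cluster(
    "1.1.1", {"service:mapreduce": {"mapred.tasktracker.map.tasks.maximum": 8}})
```

Deriving validation from the same list that get_configs returns keeps the editable configs and the accepted inputs from drifting apart.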

Configs

A plugin should provide the list of configs that users can edit. Essentially, a “config” is the specification of a single parameter. The parameter might target the whole cluster, a single node, or a specific service (mapreduce, hdfs). Since the plugin provides the list of configs, it must also be able to apply them to the cluster when the user provides values.

Example config:

 {
     "name": "mapred.tasktracker.map.tasks.maximum",
     "applicable_target": "service:mapreduce",
     "scope": "node",
     "default": 2,
     "required": true,
     "type": "int",
     "description": "number of map tasks per node"
 }
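A config spec like the one above carries enough information to compute the effective value of a parameter: use the user's value if one is given, otherwise fall back to the default, and reject values of the wrong type. A minimal sketch, assuming the field meanings suggested by the example; `Config` and `resolve` are hypothetical, only the "int" type name appears on this page (the others in `PY_TYPES` are guesses), and "cluster" as a scope value is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Assumption: only "int" appears on this page; "bool" and "string" are guesses.
PY_TYPES = {"int": int, "bool": bool, "string": str}


@dataclass
class Config:
    name: str
    applicable_target: str   # e.g. "service:mapreduce" or "general"
    scope: str               # e.g. "node"; "cluster" is an assumed second value
    type: str
    default: Any = None
    required: bool = False
    description: str = ""

    def resolve(self, user_value: Optional[Any] = None) -> Any:
        """Return the user's value if given, else the default, after type checks."""
        value = self.default if user_value is None else user_value
        if value is None:
            if self.required:
                raise ValueError(f"{self.name} is required")
            return None
        expected = PY_TYPES[self.type]
        # bool is a subclass of int in Python, so reject it explicitly for "int".
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"{self.name} must be of type {self.type}")
        return value


spec = Config(
    name="mapred.tasktracker.map.tasks.maximum",
    applicable_target="service:mapreduce",
    scope="node",
    type="int",
    default=2,
    required=True,
    description="number of map tasks per node",
)
print(spec.resolve())   # 2
print(spec.resolve(8))  # 8
```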
