Latest revision as of 23:51, 24 November 2014

Hadoop has a large number of parameters, and it is hard for end users to find a cluster configuration that achieves good performance. The template mechanism simplifies creating and configuring a Hadoop cluster. The end user only needs to specify the cluster template and provide the parameters that need to be changed; the rest of the cluster configuration comes from the template. Templates are assumed to be created by experienced Hadoop administrators. A user who needs to redefine a parameter can create a custom template or override the parameter during cluster creation.

Template usage is limited by two things: the plugin and the Hadoop version. A template is always plugin-specific, because it contains configuration specific to that plugin. This means a template can be used only with the plugin it was created for. The same applies to the Hadoop version.

Sahara has two types of templates: cluster templates and node group templates.

Node Group Templates

A node group template contains the configuration for a node in the cluster. It has “Node Group” in its name because a cluster consists of groups of nodes that share the same configuration. The template includes configuration for Hadoop processes and VM characteristics (e.g. the number of reduce slots for the task tracker, the number of CPUs and the amount of RAM). The VM characteristics are specified with an OpenStack flavor.

A node group template contains the following parameters:

Name	Type	Constraints	Comments
id	string	required, unique
flavor	string	required, should contain a valid flavor id
name	string	required, unique
description	string	optional
plugin	string	required, should contain a valid plugin id
hadoop_version	string	required
node_processes	list of strings	required
node_configs	dict of dicts	required	see example below for exact structure

Example:

   {
       "id": "aee4-strf-o14s-fd34",
       "flavor": "4",
       "image": "ah91-aij1-u78x-iunm",
       "name": "fat task tracker + data node",
       "description": "a template for big nodes ...",
       "plugin": "apache-hadoop",
       "hadoop_version": "1.1.1",
       "node_processes": ["task tracker", "data node"],
       "node_configs":
           {
               "service:mapreduce":
                   {
                       "mapred.tasktracker.map.tasks.maximum": 8,
                       "mapred.tasktracker.reduce.tasks.maximum": 3,
                       ...
                   },
               "service:hdfs":
                   {
                         …
                   },
               "general":
                   {
                         …
                   }
           }
   }
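The constraints in the parameter table can be checked mechanically. The following is a minimal sketch, assuming a template has already been parsed into a Python dict; the function name `check_node_group_template` and the message strings are hypothetical and not part of Sahara. It checks only presence and basic type. Uniqueness of `id` and `name`, and the validity of the flavor and plugin ids, would need access to the service's stored data and are not covered.

```python
# Required fields and their expected Python types, taken from the parameter
# table above. "description" is optional and checked separately.
REQUIRED_FIELDS = {
    "id": str,
    "flavor": str,
    "name": str,
    "plugin": str,
    "hadoop_version": str,
    "node_processes": list,
    "node_configs": dict,
}


def check_node_group_template(template):
    """Return a list of problems found in a node group template dict.

    Hypothetical helper for illustration; an empty list means no problems.
    """
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in template:
            problems.append("missing required field: " + field)
        elif not isinstance(template[field], expected_type):
            problems.append("wrong type for field: " + field)
    if "description" in template and not isinstance(template["description"], str):
        problems.append("wrong type for field: description")
    return problems


template = {
    "id": "aee4-strf-o14s-fd34",
    "flavor": "4",
    "name": "fat task tracker + data node",
    "plugin": "apache-hadoop",
    "hadoop_version": "1.1.1",
    "node_processes": ["task tracker", "data node"],
    "node_configs": {
        "service:mapreduce": {"mapred.tasktracker.map.tasks.maximum": 8},
    },
}
problems = check_node_group_template(template)
```

For the template shown, `problems` is an empty list; removing a required key such as `flavor` yields one "missing required field" entry.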

Cluster template

A cluster template contains configuration that applies to the whole cluster, e.g. the HDFS replication factor or the HDFS block size. It also contains a list of node group templates. Ideally, this allows a user to create a cluster in one click, just by specifying the cluster template.

Name	Type	Constraints	Comments
id	string	required, unique
name	string	required, unique
description	string	optional
plugin	string	required
hadoop_version	string	required
configs	dict	required
node_groups	list of dicts	required	see example below for exact structure

Example:

 {
     "id": "asdf-wdvc-9as0-q23w",
     "name": "small cluster",
     "description": "a template for a small cluster",
     "plugin": "apache hadoop",
     "hadoop_version": "1.1.1",
     "configs":
         {
             "service:mapreduce":
                 {
                     "compression": "snappy"
                 },
             "service:hdfs":
                 {
                     "hdfs_replication_factor": 3
                 },
             "general":
                 {
                     ...
                 }
         },
     "node_groups_templates":
     [
         {
             "name": "master node",
             "node-group-template": "aee4-strf-o14s-fd34",
             "count": 1
         },
         {
             "name": "workers",
             "node-group-template": "fe1t-2t4f-1oa4-fdik",
             "count": 3
         }
     ]
 }
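Because a template is tied to one plugin and one Hadoop version, every node group template referenced by a cluster template has to match the cluster template on both. The sketch below illustrates that check and the resulting cluster size. It assumes the key names used in the example (`node_groups_templates`, `node-group-template`, `count`) and that node group templates are available in a dict keyed by id; the helper names and the version `1.2.1` are made up for illustration.

```python
def incompatible_node_groups(cluster_template, node_group_templates):
    """Return ids of referenced node group templates that cannot be used.

    node_group_templates maps template id -> node group template dict.
    A reference is rejected if the template is unknown or if its plugin or
    Hadoop version differs from the cluster template's.
    """
    rejected = []
    for group in cluster_template["node_groups_templates"]:
        ref = group["node-group-template"]
        ngt = node_group_templates.get(ref)
        if (ngt is None
                or ngt["plugin"] != cluster_template["plugin"]
                or ngt["hadoop_version"] != cluster_template["hadoop_version"]):
            rejected.append(ref)
    return rejected


def total_nodes(cluster_template):
    """Sum the instance counts of all node groups in the cluster template."""
    return sum(g["count"] for g in cluster_template["node_groups_templates"])


cluster_template = {
    "plugin": "apache-hadoop",
    "hadoop_version": "1.1.1",
    "node_groups_templates": [
        {"name": "master node", "node-group-template": "aee4-strf-o14s-fd34", "count": 1},
        {"name": "workers", "node-group-template": "fe1t-2t4f-1oa4-fdik", "count": 3},
    ],
}
node_group_templates = {
    "aee4-strf-o14s-fd34": {"plugin": "apache-hadoop", "hadoop_version": "1.1.1"},
    # Deliberately mismatched version (illustrative value) to show a rejection.
    "fe1t-2t4f-1oa4-fdik": {"plugin": "apache-hadoop", "hadoop_version": "1.2.1"},
}

rejected = incompatible_node_groups(cluster_template, node_group_templates)
size = total_nodes(cluster_template)
```

Here `rejected` contains only the workers template id, because its Hadoop version differs, and `size` is 4.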

Plugin integration

A plugin should provide the following facilities to support templates:

Provide a list of configs by implementing the get_configs(...) method.
Verify the user input specified for configs in the validate_cluster(...) method.

Configs

A plugin should provide a list of configs that users can edit. Essentially, a “config” is the specification of a single parameter. The parameter can target either the general configuration or the configuration of a specific service (mapreduce, hdfs). It also has a scope: either the cluster or a specific node group. The scope determines in which type of template the config is presented. 'cluster' scoped configs are presented in the cluster template.

'node' scoped configs appear in both the cluster template and the node group template. When they are specified in a cluster template, they serve as new defaults for the node group templates used in that cluster.

Since the plugin provides the list of configs, it must also be able to apply them to the cluster when the user provides values.
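The layering of 'node' scoped values can be sketched as follows. This is one reading of the rule above, not Sahara's implementation: the plugin default applies first, a value in the cluster template replaces it, and a value in the node group template replaces both. The function name `effective_node_configs` is hypothetical, and the default for the reduce-slot config is invented for the example.

```python
def effective_node_configs(configs, cluster_values, node_group_values):
    """Resolve node-scoped config values for one node group.

    configs: list of config specifications provided by the plugin.
    cluster_values / node_group_values: dicts of target -> {name: value},
    as found in a cluster template and a node group template.
    """
    resolved = {}
    for config in configs:
        if config["scope"] != "node":
            continue  # cluster-scoped configs are not resolved per node group
        target = config["applicable_target"]
        name = config["name"]
        value = config["default"]
        # Later layers win: cluster template first, then node group template.
        for layer in (cluster_values, node_group_values):
            if name in layer.get(target, {}):
                value = layer[target][name]
        resolved.setdefault(target, {})[name] = value
    return resolved


configs = [
    {"name": "mapred.tasktracker.map.tasks.maximum",
     "applicable_target": "service:mapreduce", "scope": "node", "default": 2},
    # The default of 2 below is made up for this example.
    {"name": "mapred.tasktracker.reduce.tasks.maximum",
     "applicable_target": "service:mapreduce", "scope": "node", "default": 2},
]
cluster_values = {"service:mapreduce": {"mapred.tasktracker.map.tasks.maximum": 4}}
node_group_values = {"service:mapreduce": {"mapred.tasktracker.map.tasks.maximum": 8}}

with_override = effective_node_configs(configs, cluster_values, node_group_values)
without_override = effective_node_configs(configs, cluster_values, {})
```

With the node group override present, the map-slot value resolves to 8; without it, the cluster template's value of 4 is used; the reduce-slot value falls back to the plugin default in both cases.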

Example config:

 {
     "name": "mapred.tasktracker.map.tasks.maximum",
     "applicable_target": "service:mapreduce",
     "scope": "node",
     "default": 2,
     "required": true,
     "type": "int",
     "description": "amount of map tasks per node"
 }
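A config specification like this carries enough information to check a user-supplied value. The sketch below shows one way such a check might look; it is not the plugin API. The function name `validate_config_value` and the type names other than `"int"` are assumptions, since the source shows only `"int"`.

```python
# Mapping from config "type" names to Python types. Only "int" appears in
# the example above; "string" and "bool" are assumed for illustration.
TYPE_NAMES = {"int": int, "string": str, "bool": bool}


def validate_config_value(config, value):
    """Return an error message, or None if the value is acceptable.

    A value of None means the user did not supply one.
    """
    if value is None:
        if config["required"] and config.get("default") is None:
            return config["name"] + " is required"
        return None  # optional, or the default will be used
    expected = TYPE_NAMES.get(config["type"])
    if expected is None:
        return "unknown type: " + config["type"]
    # bool is a subclass of int in Python, so reject it explicitly for "int".
    if isinstance(value, bool) and expected is not bool:
        return config["name"] + " must be of type " + config["type"]
    if not isinstance(value, expected):
        return config["name"] + " must be of type " + config["type"]
    return None


config = {
    "name": "mapred.tasktracker.map.tasks.maximum",
    "applicable_target": "service:mapreduce",
    "scope": "node",
    "default": 2,
    "required": True,
    "type": "int",
    "description": "amount of map tasks per node",
}
ok = validate_config_value(config, 8)
bad = validate_config_value(config, "8")
```

In this example `ok` is `None`, while `bad` is an error message because a string was supplied for an `int` config.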


Difference between revisions of "Sahara/Templates"
