Sahara/Templates
Latest revision as of 23:51, 24 November 2014
Hadoop has a large number of parameters, and it is hard for end users to find a configuration that gives a cluster good performance. The template mechanism simplifies creating and configuring a Hadoop cluster. The end user only specifies a cluster template and provides the parameters that need to change; the rest of the cluster configuration comes from the template. Templates are assumed to be created by experienced Hadoop administrators. If users need to redefine a parameter, they can create a custom template or override the parameter during cluster creation.
Template usage is limited by two things: the plugin and the Hadoop version. A template is always plugin-specific because it contains configurations specific to that plugin, so a template can only be used with the plugin it was created for. The same applies to the Hadoop version.
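The restriction above amounts to an equality check on two fields. The following is a minimal sketch, not Sahara's code. `TemplateInfo` and `is_usable_with` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateInfo:
    """Hypothetical minimal view of a template: only the fields that restrict its use."""
    plugin: str
    hadoop_version: str


def is_usable_with(template: TemplateInfo, plugin: str, hadoop_version: str) -> bool:
    """A template is usable only with the plugin and Hadoop version it was created for."""
    return template.plugin == plugin and template.hadoop_version == hadoop_version


tmpl = TemplateInfo(plugin="apache-hadoop", hadoop_version="1.1.1")
print(is_usable_with(tmpl, "apache-hadoop", "1.1.1"))  # True
print(is_usable_with(tmpl, "apache-hadoop", "2.0.0"))  # False
```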
Sahara has two types of templates: cluster templates and node group templates.
Node Group Templates
A node group template contains the configuration for a node in the cluster. It is called a “node group” template because a cluster consists of groups of nodes that share the same configuration. The template includes configuration for Hadoop processes and VM characteristics (e.g. the number of reduce slots for a task tracker, the number of CPUs and the amount of RAM). The VM characteristics are specified with an OpenStack flavor.
A node group template contains the following parameters:
Name | Type | Constraints | Comments |
---|---|---|---|
id | string | required, unique | |
flavor | string | required, should contain a valid flavor id | |
name | string | required, unique | |
description | string | optional | |
plugin | string | required, should contain a valid plugin id | |
hadoop_version | string | required | |
node_processes | list of strings | required | |
node_configs | dict of dicts | required | see example below for exact structure |
Example:
    {
        "id": "aee4-strf-o14s-fd34",
        "flavor": "4",
        "image": "ah91-aij1-u78x-iunm",
        "name": "fat task tracker + data node",
        "description": "a template for big nodes ...",
        "plugin": "apache-hadoop",
        "hadoop_version": "1.1.1",
        "node_processes": ["task tracker", "data node"],
        "node_configs": {
            "service:mapreduce": {
                "mapred.tasktracker.map.tasks.maximum": 8,
                "mapred.tasktracker.reduce.tasks.maximum": 3,
                ...
            },
            "service:hdfs": {
                ...
            },
            "general": {
                ...
            }
        }
    }
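The constraints in the parameter table can be checked mechanically. Below is a minimal sketch that checks presence and types only. Uniqueness and "valid flavor/plugin id" need external state, so they are deliberately not checked. `validate_node_group_template` is a hypothetical helper, not part of Sahara.

```python
from typing import Any

# Required fields and their expected Python types, taken from the parameter table.
REQUIRED_FIELDS: dict[str, type] = {
    "id": str,
    "flavor": str,
    "name": str,
    "plugin": str,
    "hadoop_version": str,
    "node_processes": list,
    "node_configs": dict,
}


def validate_node_group_template(tmpl: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the template passed."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in tmpl:
            errors.append(f"missing required field: {field}")
        elif not isinstance(tmpl[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    # description is optional, but must be a string when present.
    if "description" in tmpl and not isinstance(tmpl["description"], str):
        errors.append("description should be str")
    # node_processes is a list of strings.
    if isinstance(tmpl.get("node_processes"), list):
        if not all(isinstance(p, str) for p in tmpl["node_processes"]):
            errors.append("node_processes should contain only strings")
    # node_configs is a dict of dicts, keyed by target ("general", "service:hdfs", ...).
    if isinstance(tmpl.get("node_configs"), dict):
        for target, values in tmpl["node_configs"].items():
            if not isinstance(values, dict):
                errors.append(f"node_configs[{target!r}] should be a dict")
    return errors


example = {
    "id": "aee4-strf-o14s-fd34",
    "flavor": "4",
    "name": "fat task tracker + data node",
    "plugin": "apache-hadoop",
    "hadoop_version": "1.1.1",
    "node_processes": ["task tracker", "data node"],
    "node_configs": {"service:mapreduce": {"mapred.tasktracker.map.tasks.maximum": 8}},
}
print(validate_node_group_template(example))  # []
```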
Cluster template
A cluster template contains configuration that applies to the whole cluster, e.g. the HDFS replication factor or the HDFS block size. It also contains a list of node group templates, each with a node count. Ideally, this lets a user create a cluster in one click by specifying only the cluster template.
Name | Type | Constraints | Comments |
---|---|---|---|
id | string | required, unique | |
name | string | required, unique | |
description | string | optional | |
plugin | string | required | |
hadoop_version | string | required | |
configs | dict | required | |
node_groups | list of dicts | required | see example below for exact structure |
Example:
    {
        "id": "asdf-wdvc-9as0-q23w",
        "name": "small cluster",
        "description": "a template for a small cluster",
        "plugin": "apache-hadoop",
        "hadoop_version": "1.1.1",
        "configs": {
            "service:mapreduce": {
                "compression": "snappy"
            },
            "service:hdfs": {
                "hdfs_replication_factor": 3
            },
            "general": {
                ...
            }
        },
        "node_groups": [
            {
                "name": "master node",
                "node-group-template": "aee4-strf-o14s-fd34",
                "count": 1
            },
            {
                "name": "workers",
                "node-group-template": "fe1t-2t4f-1oa4-fdik",
                "count": 3
            }
        ]
    }
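Each entry in the cluster template's node group list pairs a group name with a referenced node group template and a node count. The sketch below resolves those entries into per-group counts, rejecting references to unknown templates. `expand_node_groups` is hypothetical. The rule that group names are unique within a cluster is an assumption, not something this page states.

```python
from typing import Any


def expand_node_groups(
    node_groups: list[dict[str, Any]], known_templates: set[str]
) -> dict[str, int]:
    """Map each node group name to its node count, checking that every
    referenced node group template exists."""
    counts: dict[str, int] = {}
    for group in node_groups:
        tmpl_id = group["node-group-template"]
        if tmpl_id not in known_templates:
            raise ValueError(f"unknown node group template: {tmpl_id}")
        # Assumption: node group names must be unique within one cluster.
        if group["name"] in counts:
            raise ValueError(f"duplicate node group name: {group['name']!r}")
        counts[group["name"]] = group["count"]
    return counts


node_groups = [
    {"name": "master node", "node-group-template": "aee4-strf-o14s-fd34", "count": 1},
    {"name": "workers", "node-group-template": "fe1t-2t4f-1oa4-fdik", "count": 3},
]
known = {"aee4-strf-o14s-fd34", "fe1t-2t4f-1oa4-fdik"}
counts = expand_node_groups(node_groups, known)
print(counts)                # {'master node': 1, 'workers': 3}
print(sum(counts.values()))  # 4
```

The total node count (here 4) follows from the template alone. This is what makes "one click" cluster creation possible.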
Plugin integration
A plugin should provide the following facilities to support templates:
- Provide a list of configs by implementing the get_configs(...) method.
- Verify the user inputs specified for configs in the validate_cluster(...) method.
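The two responsibilities can be pictured as an abstract interface. This is a hypothetical sketch. The method names come from the list above, but their parameters, return types and the `ValueError` convention are assumptions, and Sahara's actual plugin SPI may differ. `TemplatePluginBase` and `ToyPlugin` are illustrative names.

```python
from abc import ABC, abstractmethod
from typing import Any


class TemplatePluginBase(ABC):
    """Hypothetical plugin interface; signatures are assumptions."""

    @abstractmethod
    def get_configs(self, hadoop_version: str) -> list[dict[str, Any]]:
        """Return the user-editable configs for this Hadoop version."""

    @abstractmethod
    def validate_cluster(self, cluster: dict[str, Any]) -> None:
        """Raise ValueError if any user-supplied config value is invalid."""


class ToyPlugin(TemplatePluginBase):
    def get_configs(self, hadoop_version):
        return [{
            "name": "mapred.tasktracker.map.tasks.maximum",
            "applicable_target": "service:mapreduce",
            "scope": "node",
            "default": 2,
            "required": True,
            "type": "int",
        }]

    def validate_cluster(self, cluster):
        specs = {c["name"]: c for c in self.get_configs(cluster["hadoop_version"])}
        for target, values in cluster.get("configs", {}).items():
            for name, value in values.items():
                spec = specs.get(name)
                if spec is None:
                    raise ValueError(f"unknown config: {name}")
                if spec["applicable_target"] != target:
                    raise ValueError(f"{name} does not belong to {target}")
                # bool is a subclass of int in Python, so reject it explicitly.
                if spec["type"] == "int" and (not isinstance(value, int) or isinstance(value, bool)):
                    raise ValueError(f"{name} must be an int")


plugin = ToyPlugin()
ok_cluster = {
    "hadoop_version": "1.1.1",
    "configs": {"service:mapreduce": {"mapred.tasktracker.map.tasks.maximum": 4}},
}
plugin.validate_cluster(ok_cluster)  # passes silently
```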
Configs
A plugin should provide a list of configs that users can edit. Essentially, a “config” is the specification of a single parameter. A parameter can target either the general configuration or the configuration of a specific service (mapreduce, hdfs). It also has a scope: either the whole cluster or a specific node group. The scope determines which type of template presents the config: 'cluster'-scoped configs are presented in cluster templates.
'node'-scoped configs appear in both cluster templates and node group templates. When they are specified in a cluster template, they serve as new defaults for the node group templates used in that cluster.
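The scope rules above decide which configs each template type shows. The config's target decides where its value sits in the nested configs dict. A minimal sketch of both follows. `VISIBLE_SCOPES` and `template_defaults` are hypothetical. Treating `hdfs_replication_factor` as cluster-scoped with a default of 3 is an assumption made for the example.

```python
from collections import defaultdict
from typing import Any

# Which config scopes each template type presents, per the scope rules.
VISIBLE_SCOPES = {
    "cluster": {"cluster", "node"},
    "node_group": {"node"},
}


def template_defaults(configs: list[dict[str, Any]], template_type: str) -> dict[str, dict[str, Any]]:
    """Group the defaults of the configs a template type presents by their
    applicable target ("general", "service:mapreduce", ...)."""
    grouped: dict[str, dict[str, Any]] = defaultdict(dict)
    for cfg in configs:
        if cfg["scope"] in VISIBLE_SCOPES[template_type]:
            grouped[cfg["applicable_target"]][cfg["name"]] = cfg["default"]
    return dict(grouped)


CONFIGS = [
    {"name": "mapred.tasktracker.map.tasks.maximum",
     "applicable_target": "service:mapreduce", "scope": "node", "default": 2},
    # Assumed scope and default, for illustration only.
    {"name": "hdfs_replication_factor",
     "applicable_target": "service:hdfs", "scope": "cluster", "default": 3},
]
print(template_defaults(CONFIGS, "node_group"))
print(template_defaults(CONFIGS, "cluster"))
```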
Since the plugin provides the list of configs, it must also be able to apply them to the cluster when the user provides values.
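Before applying values, a plugin has to work out the effective value of each node-scoped config for a node group. The sketch below merges layers from lowest to highest precedence: plugin defaults, then cluster template values (which act as new defaults), then node group template values. It also adds a final layer for overrides given at cluster creation. The relative order of that last layer is an assumption. `effective_node_configs` is a hypothetical helper.

```python
from typing import Any

Layer = dict[str, dict[str, Any]]  # target -> {config name -> value}


def effective_node_configs(*layers: Layer) -> Layer:
    """Merge config layers from lowest to highest precedence; a value in a
    later layer replaces the same (target, name) entry from an earlier one.
    The input layers are not modified."""
    merged: Layer = {}
    for layer in layers:
        for target, values in layer.items():
            merged.setdefault(target, {}).update(values)
    return merged


MAP_MAX = "mapred.tasktracker.map.tasks.maximum"
REDUCE_MAX = "mapred.tasktracker.reduce.tasks.maximum"

plugin_defaults = {"service:mapreduce": {MAP_MAX: 2, REDUCE_MAX: 2}}
cluster_tmpl = {"service:mapreduce": {REDUCE_MAX: 3}}   # new default for all groups
node_group_tmpl = {"service:mapreduce": {MAP_MAX: 8}}
user_overrides = {"service:mapreduce": {MAP_MAX: 6}}    # assumed to win over everything

result = effective_node_configs(plugin_defaults, cluster_tmpl, node_group_tmpl, user_overrides)
print(result)  # {'service:mapreduce': {'mapred.tasktracker.map.tasks.maximum': 6, 'mapred.tasktracker.reduce.tasks.maximum': 3}}
```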
Example config:
    {
        "name": "mapred.tasktracker.map.tasks.maximum",
        "applicable_target": "service:mapreduce",
        "scope": "node",
        "default": 2,
        "required": true,
        "type": "int",
        "description": "number of map tasks per node"
    }
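A config specification like the one above maps naturally onto a small typed record that can check a user-supplied value. This is a sketch under stated assumptions, not Sahara's code. `ConfigSpec` and `check` are hypothetical. The type names other than "int" are assumed. So is the reading of `required` as "a value must exist, and the default may supply it".

```python
from dataclasses import dataclass
from typing import Any

# Assumed mapping from "type" strings to Python types; only "int" appears on this page.
_TYPES = {"int": int, "string": str, "bool": bool}


@dataclass(frozen=True)
class ConfigSpec:
    name: str
    applicable_target: str
    scope: str  # "cluster" or "node"
    default: Any
    required: bool
    type: str
    description: str = ""

    def __post_init__(self):
        if self.scope not in ("cluster", "node"):
            raise ValueError(f"invalid scope: {self.scope}")

    def check(self, value: Any) -> Any:
        """Return the value to apply: the default when no value is given,
        otherwise the value itself if it has the declared type."""
        if value is None:
            if self.required and self.default is None:
                raise ValueError(f"{self.name} is required")
            return self.default
        expected = _TYPES[self.type]
        # bool is a subclass of int in Python, so reject it explicitly for "int".
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"{self.name} must be of type {self.type}")
        return value


spec = ConfigSpec(**{
    "name": "mapred.tasktracker.map.tasks.maximum",
    "applicable_target": "service:mapreduce",
    "scope": "node",
    "default": 2,
    "required": True,
    "type": "int",
    "description": "number of map tasks per node",
})
print(spec.check(None))  # 2
print(spec.check(4))     # 4
```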