

Revision as of 14:36, 26 April 2013 by Alexander Ignatov (talk | contribs) (Overview)


Savanna Pluggable Provisioning Mechanism aims to deploy Hadoop clusters and integrate them with 3rd party vendor management tools such as Cloudera Management Console, Hortonworks Ambari, and Intel Hadoop Distribution.

Additionally, we introduce two new objects: node processes and node types.

A Node Process is simply a process that can run on some node in the cluster. Here is a list of the supported node processes:

  1. management / mgmt
  2. jobtracker / jt
  3. namenode / nn
  4. tasktracker / tt
  5. datanode / dn

A Node Type is a description of which node processes (one or several) should run on a specific node of the cluster. Here is a list of some node types:

  1. mgmt
  2. jt+nn
  3. jt
  4. nn
  5. tt+dn
  6. tt
  7. dn
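
The relationship between node processes and node types can be sketched as follows; this is an illustrative model only, not Savanna's actual internal representation, and the helper name `node_type` is an assumption:

```python
# Supported node processes, long name -> short name (from the lists above).
NODE_PROCESSES = {"management": "mgmt", "jobtracker": "jt",
                  "namenode": "nn", "tasktracker": "tt", "datanode": "dn"}

def node_type(*processes):
    """Compose a node type name from one or several node processes."""
    unknown = [p for p in processes if p not in NODE_PROCESSES.values()]
    if unknown:
        raise ValueError("unknown node processes: %s" % unknown)
    return "+".join(processes)

# node_type("jt", "nn") -> "jt+nn"; node_type("tt", "dn") -> "tt+dn"
```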

User-Savanna-Plugin interoperability


Savanna Pluggable Mechanism consists of three components:

  1. Image Registry;
  2. VM Manager;
  3. Plugins.

Components responsibility:

  1. Image Registry:
    1. register image in Savanna;
    2. add/remove tags to/from images;
    3. get images by tags;
  2. VM Manager:
    1. launch/terminate vms;
    2. get vm status;
    3. ssh/scp/etc to vm;
  3. Plugins:
    1. get extra conf (specific for the concrete plugin);
    2. launch / terminate clusters;
    3. add / remove node;
    4. validation ops.
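
The component split above can be sketched as three interfaces; class and method names here are illustrative, mirroring the responsibility lists rather than Savanna's real API:

```python
class ImageRegistry:
    """Registers images in Savanna, manages tags, queries by tags."""
    def register_image(self, image_id):
        raise NotImplementedError

    def add_tag(self, image_id, tag):
        raise NotImplementedError

    def remove_tag(self, image_id, tag):
        raise NotImplementedError

    def get_images_by_tags(self, tags):
        raise NotImplementedError


class VMManager:
    """Launches/terminates vms, reports status, gives ssh/scp access."""
    def launch_vm(self, flavor, image):
        raise NotImplementedError

    def terminate_vm(self, vm_id):
        raise NotImplementedError

    def get_vm_status(self, vm_id):
        raise NotImplementedError

    def run_ssh(self, vm_id, command):
        raise NotImplementedError


class Plugin:
    """Plugin-specific config, cluster lifecycle, node ops, validation."""
    def get_extra_conf(self):
        raise NotImplementedError

    def launch_cluster(self, cluster):
        raise NotImplementedError

    def terminate_cluster(self, cluster):
        raise NotImplementedError

    def add_node(self, cluster, node):
        raise NotImplementedError

    def remove_node(self, cluster, node):
        raise NotImplementedError

    def validate(self, cluster):
        raise NotImplementedError
```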

Zones of responsibility

  1. Savanna:
    1. provides resources and infrastructure (pre-configured vms, dns, etc.);
    2. cluster topologies, nodes and storage placement;
    3. cluster/hadoop/tooling configurations and state storage;
  2. Plugins:
    1. cluster monitoring;
    2. additional tools installation and management (Pig, Hive, etc.);
    3. final cluster configuration and hadoop management;
    4. add/remove nodes to/from cluster (with prepared by Savanna resources).


Cluster creation workflow for User:

  1. get list of plugins;
  2. specify cluster name;
  3. choose plugin version and hadoop version (only minor variation);
  4. specify cluster configuration:
  5. choose a common cluster configuration if needed;
  6. specify flavors for job tracker and name node;
  7. [optional] choose flavor for the management node (if applicable);
  8. add worker nodes with a specific node type (data node, task tracker, or data node + task tracker) and flavor (each of them can be specified several times with different flavors or templates);
  9. [optional] fetch list of custom templates and override the cluster configuration, using these templates;
  10. [optional] override some cluster parameters;
  11. launch cluster;
  12. savanna performs basic validation and passes cluster configuration to the plugin;
  13. plugin validates the request; if it is valid, an infrastructure request is generated;
  14. infrastructure request will contain:
    1. list of tuples (flavor, image, number of instances);
    2. list of actions that are needed to be done after machine started e.g. password-less ssh, setup DNS;
  15. savanna creates and prepares infrastructure and passes description to plugin;
  16. plugin launches Hadoop cluster.
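
The infrastructure request of step 14 can be sketched as a small helper; all field names here (`worker_nodes`, `flavor`, `count`, the action names) are assumptions for illustration:

```python
def build_infra_request(cluster_request, image_id):
    """Turn worker node specs into a list of (flavor, image, count)
    tuples plus the post-vm-start actions mentioned in step 14."""
    instances = [(w["flavor"], image_id, w["count"])
                 for w in cluster_request["worker_nodes"]]
    # Actions needed after machines start, e.g. password-less ssh, DNS setup.
    actions = ["setup_passwordless_ssh", "setup_dns"]
    return {"instances": instances, "post_boot_actions": actions}
```

For example, a request with four `tt+dn` workers on flavor `m1.large` would yield one `("m1.large", image_id, 4)` tuple.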

Savanna - plugins interoperability workflow:

  1. User fetches extra cluster configs from Savanna API (Savanna delegates this call to the concrete provisioning plugin);
  2. User launches cluster (adds/removes nodes) using Savanna API;
  3. Savanna parses the request and runs common validations on it;
  4. Savanna determines which provisioning plugin should be used;
  5. Savanna runs plugin-specific validation for the current operation;
  6. Savanna creates (modifies) cluster object in DB, returns response to user and starts background job that will provision and launch cluster;
  7. User receives response with info about created (modified) cluster from Savanna API;
  8. Savanna calls in background the “launch cluster” (add/remove nodes) method of the provisioning plugin;
  9. Plugin receives cluster configuration and can start vms from tagged images optionally using VM Manager and Image Registry;
  10. VM Manager provides helpers for ssh/scp/etc to vms;
  11. Plugin should configure and start the 3rd party vendor management tool on the management vm; this tool will then control the Hadoop cluster;
  12. Plugin can update cluster status and info to expose information about it.
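
Steps 3 through 8 of this workflow can be sketched as a single dispatch function. In Savanna the launch runs as a background job; here it is called inline for brevity, and all names (`handle_launch`, the dict-backed `db`, the field names) are assumptions:

```python
def handle_launch(request, plugins, db):
    """Validate a launch request, persist the cluster, start it."""
    if not request.get("name"):                # step 3: common validation
        raise ValueError("cluster name is required")
    plugin = plugins[request["plugin"]]        # step 4: pick the plugin
    plugin.validate_cluster(request)           # step 5: plugin validation
    cluster = {"name": request["name"], "status": "Starting"}
    db[cluster["name"]] = cluster              # step 6: create cluster in DB
    plugin.start_cluster(cluster, vms=[])      # step 8: background job in Savanna
    return cluster                             # step 7: response to the user
```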

Python API level functions

Provisioning plugin functions:

  1. get_versions() - get all versions of hadoop that could be used with plugin
  2. get_configs() - list of all configs supported by plugin with descriptions, defaults and node process for which this config is applicable
  3. get_supported_types() - list of all supported NodeTypes, for example, jt+nn and tt+dn
  4. validate_cluster(cluster_description) - custom validation
  5. get_infra(cluster_description) - plugin should return a list of tuples (flavor, image, count, config=”reset_pswd, generate_keys, etc.”)
  6. configure_cluster(cluster_description, vms)
  7. start_cluster(cluster_description, vms)
  8. on_terminate_cluster(cluster_description)
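
The function list above can be sketched as an abstract base class; the method names follow the list, while the class name and docstrings are assumptions:

```python
import abc

class ProvisioningPluginBase(abc.ABC):
    """Sketch of the provisioning plugin API listed above."""

    @abc.abstractmethod
    def get_versions(self):
        """All versions of Hadoop that could be used with the plugin."""

    @abc.abstractmethod
    def get_configs(self):
        """Supported configs with descriptions, defaults and node processes."""

    @abc.abstractmethod
    def get_supported_types(self):
        """Supported NodeTypes, e.g. jt+nn and tt+dn."""

    @abc.abstractmethod
    def validate_cluster(self, cluster_description):
        """Custom validation."""

    @abc.abstractmethod
    def get_infra(self, cluster_description):
        """Return a list of (flavor, image, count, config) tuples."""

    @abc.abstractmethod
    def configure_cluster(self, cluster_description, vms):
        """Configure the cluster on the prepared vms."""

    @abc.abstractmethod
    def start_cluster(self, cluster_description, vms):
        """Start the configured cluster."""

    @abc.abstractmethod
    def on_terminate_cluster(self, cluster_description):
        """Cleanup hook called on cluster termination."""
```

A concrete plugin would subclass this and implement every method; instantiating the base class directly fails.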

The Image Registry will provide the ability to set Glance properties to store some info about an image, for example:

  1. _savanna_tag_<tag-name>: True
  2. _savanna_description: “short description”
  3. _savanna_os: “ubuntu-12.04-x86_64”
  4. _savanna_hadoop: “hadoop-1.1.1”
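
Building that `_savanna_*` property dict can be sketched as a small helper; the helper name is an assumption, but the key convention follows the examples above:

```python
def image_properties(tags, description, os, hadoop):
    """Build a Glance property dict in the _savanna_* convention."""
    props = {"_savanna_tag_%s" % tag: True for tag in tags}
    props["_savanna_description"] = description
    props["_savanna_os"] = os
    props["_savanna_hadoop"] = hadoop
    return props
```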

Image Registry functions:

  1. cluster image-related properties:
    1. base image info (applied to all nodes in cluster)
      1. base_image_tag
      2. base_image_id
    2. management image info (applied to management node only)
      1. management_image_tag
      2. management_image_id
  2. ability to register image with some tags and description
  3. ability to add/remove tag to/from image
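
The register and tag functions above can be sketched with an in-memory registry; a real registry would persist tags as Glance image properties, and all names here are illustrative:

```python
class InMemoryImageRegistry:
    """In-memory sketch of the Image Registry functions above."""

    def __init__(self):
        self._images = {}  # image_id -> {"tags": set, "description": str}

    def register(self, image_id, tags=(), description=""):
        """Register an image with some tags and a description."""
        self._images[image_id] = {"tags": set(tags),
                                  "description": description}

    def add_tag(self, image_id, tag):
        self._images[image_id]["tags"].add(tag)

    def remove_tag(self, image_id, tag):
        self._images[image_id]["tags"].discard(tag)

    def get_images_by_tags(self, tags):
        """Return ids of images carrying all of the given tags."""
        wanted = set(tags)
        return [i for i, meta in self._images.items()
                if wanted <= meta["tags"]]
```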