Revision as of 13:53, 26 April 2013 by Sergey Lukjanov (talk | contribs) (Overview)


The Savanna Pluggable Provisioning Mechanism aims to deploy Hadoop clusters and integrate them with third-party vendor management tools such as Cloudera Management Console, Hortonworks Ambari, and Intel Hadoop Distribution.

Additionally, Savanna introduces two new objects: node processes and node types.

A Node Process is simply a process that can be run on some node in the cluster. Here is a list of the supported node processes:

  1. management / mgmt
  2. jobtracker / jt
  3. namenode / nn
  4. tasktracker / tt
  5. datanode / dn

A Node Type is a description of which node processes (one or several) should be executed on a specific node of the cluster. Here is a list of some node types:

  1. mgmt
  2. jt+nn
  3. jt
  4. nn
  5. tt+dn
  6. tt
  7. dn
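The relationship between node processes and node types can be sketched as plain data. This is a hypothetical Python sketch based only on the lists above; Savanna's actual representation may differ.

```python
# Hypothetical sketch: node processes with their short aliases, and node
# types expressed as the sets of processes that run together on one node.
NODE_PROCESSES = {
    "management": "mgmt",
    "jobtracker": "jt",
    "namenode": "nn",
    "tasktracker": "tt",
    "datanode": "dn",
}

NODE_TYPES = {
    "mgmt": {"management"},
    "jt+nn": {"jobtracker", "namenode"},
    "jt": {"jobtracker"},
    "nn": {"namenode"},
    "tt+dn": {"tasktracker", "datanode"},
    "tt": {"tasktracker"},
    "dn": {"datanode"},
}

def processes_for(node_type):
    """Return the set of node processes a node of this type must run."""
    return NODE_TYPES[node_type]
```

For example, a "jt+nn" node runs both the job tracker and the name node, while a plain "dn" node runs only the data node process.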

The Savanna Pluggable Mechanism consists of three components:

  1. Image Registry;
  2. VM Manager;
  3. Plugins.

Component responsibilities:

  1. Image Registry:
    1. register image in Savanna;
    2. add/remove tags to/from images;
    3. get images by tags;
  2. VM Manager:
    1. launch/terminate vms;
    2. get vm status;
    3. ssh/scp/etc to vm;
  3. Plugins:
    1. get extra conf (specific for the concrete plugin);
    2. launch / terminate clusters;
    3. add / remove node;
    4. validation ops.
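The three components' interfaces could be sketched as follows. All class and method names here are illustrative assumptions derived from the responsibility list above, not Savanna's real API.

```python
# Hypothetical sketch of the three components; names are assumptions.
import abc

class ImageRegistry:
    """Registers images in Savanna and tracks their tags."""
    def __init__(self):
        self._tags = {}  # image_id -> set of tags

    def register(self, image_id, tags=()):
        self._tags[image_id] = set(tags)

    def add_tag(self, image_id, tag):
        self._tags[image_id].add(tag)

    def remove_tag(self, image_id, tag):
        self._tags[image_id].discard(tag)

    def images_by_tags(self, tags):
        """Return images carrying all of the given tags."""
        wanted = set(tags)
        return [i for i, t in self._tags.items() if wanted <= t]

class VMManager:
    """Launches/terminates VMs and provides ssh/scp helpers (stubs only)."""
    def launch(self, flavor, image_id): ...
    def terminate(self, vm_id): ...
    def status(self, vm_id): ...
    def ssh(self, vm_id, command): ...

class ProvisioningPlugin(abc.ABC):
    """Operations every provisioning plugin must implement."""
    @abc.abstractmethod
    def get_extra_configs(self): ...      # plugin-specific extra conf
    @abc.abstractmethod
    def launch_cluster(self, cluster): ...
    @abc.abstractmethod
    def terminate_cluster(self, cluster): ...
    @abc.abstractmethod
    def add_node(self, cluster, node): ...
    @abc.abstractmethod
    def remove_node(self, cluster, node): ...
    @abc.abstractmethod
    def validate(self, cluster): ...      # validation ops
```

Modeling the plugin as an abstract base class makes the contract explicit: a vendor integration cannot be instantiated until it implements every operation Savanna delegates to it.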

Zones of responsibility:

  1. Savanna:
    1. provides resources and infrastructure (pre-configured vms, dns, etc.);
    2. cluster topologies, nodes and storage placement;
    3. cluster/hadoop/tooling configurations and state storage;
  2. Plugins:
    1. cluster monitoring;
    2. additional tools installation and management (Pig, Hive, etc.);
    3. final cluster configuration and hadoop management;
    4. add/remove nodes to/from cluster (with resources prepared by Savanna).


Cluster creation workflow for User:

  1. get list of plugins;
  2. specify cluster name;
  3. choose plugin version and Hadoop version (only minor variation);
  4. specify cluster configuration:
    1. choose a common cluster configuration if needed;
    2. specify flavors for the job tracker and name node;
    3. [optional] choose a flavor for the management node (if applicable);
    4. add worker nodes with a specific node type (data node, task tracker, or data node + task tracker) and flavor (each of them could be specified several times with different flavors or templates);
    5. [optional] fetch the list of custom templates and override the cluster configuration using these templates;
    6. [optional] override some cluster parameters;
  5. launch cluster;
  6. Savanna performs basic validation and passes the cluster configuration to the plugin;
  7. plugin validates the request; if it is valid, an infrastructure request will be generated;
  8. the infrastructure request will contain:
    1. a list of tuples (flavor, image, number of instances);
    2. a list of actions that need to be done after a machine starts, e.g. password-less ssh, DNS setup;
  9. Savanna creates and prepares the infrastructure and passes its description to the plugin;
  10. plugin launches the Hadoop cluster.
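The infrastructure request described above could be modeled like this. The class and field names are assumptions for illustration; only the (flavor, image, number of instances) tuples and the post-boot actions come from the text.

```python
# Hypothetical sketch of the infrastructure request a plugin generates
# after validating the user's cluster configuration.
from dataclasses import dataclass, field

@dataclass
class InstanceGroup:
    flavor: str   # e.g. an OpenStack flavor name
    image: str    # a tagged image from the Image Registry
    count: int    # number of instances to launch

@dataclass
class InfraRequest:
    groups: list                                   # list of InstanceGroup
    post_boot_actions: list = field(default_factory=list)

# Example: one jt+nn master and four tt+dn workers (illustrative values).
request = InfraRequest(
    groups=[
        InstanceGroup("m1.large", "hadoop-image", 1),
        InstanceGroup("m1.medium", "hadoop-image", 4),
    ],
    post_boot_actions=["setup password-less ssh", "setup DNS"],
)
```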

Savanna - plugins interoperability workflow:

  1. User fetches extra cluster configs from Savanna API (Savanna delegates this call to the concrete provisioning plugin);
  2. User launches cluster (adds/removes nodes) using Savanna API;
  3. Savanna parses the request and runs common validations on it;
  4. Savanna determines which provisioning plugin should be used;
  5. Savanna runs plugin-specific validation for the current operation;
  6. Savanna creates (modifies) cluster object in DB, returns response to user and starts background job that will provision and launch cluster;
  7. User receives response with info about created (modified) cluster from Savanna API;
  8. Savanna calls the “launch cluster” (add/remove nodes) method of the provisioning plugin in the background;
  9. Plugin receives cluster configuration and can start vms from tagged images optionally using VM Manager and Image Registry;
  10. VM Manager provides helpers for ssh/scp/etc to vms;
  11. Plugin should configure and start the 3rd party vendor management tool on the management VM, and this tool will control the Hadoop cluster;
  12. Plugin can update the cluster's status and info to expose its current state.
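Steps 3–8 of this workflow can be sketched as a single dispatch function. Everything here (function names, the request shape, the fake DB) is an assumed illustration of the described flow, not Savanna's real code.

```python
# Hypothetical sketch: validate the request, pick the plugin, run its
# validation, persist the cluster, and provision in the background.
import threading

def common_validate(request):
    # assumed minimal common checks: required fields must be present
    for key in ("name", "plugin_name"):
        if key not in request:
            raise ValueError("missing required field: " + key)

def create_cluster(request, plugins, db):
    common_validate(request)                    # step 3: common validations
    plugin = plugins[request["plugin_name"]]    # step 4: pick the plugin
    plugin.validate(request)                    # step 5: plugin-specific checks
    cluster = db.save(request)                  # step 6: persist cluster object
    # step 8: "launch cluster" runs in the background, so the API response
    # (step 7) can be returned to the user immediately
    worker = threading.Thread(target=plugin.launch_cluster, args=(cluster,))
    worker.start()
    # the thread is returned only so a caller (or test) can join it
    return cluster, worker
```

The key design point is that the expensive provisioning work happens after the user already has a response describing the created (or modified) cluster.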