Sahara/PluggableProvisioning

Overview

The Savanna Pluggable Provisioning Mechanism aims to deploy Hadoop clusters and integrate them with third-party vendor management tools such as Cloudera Management Console, Hortonworks Ambari, and Intel Hadoop Distribution.

Additionally, the mechanism introduces or changes two kinds of objects: node processes and node types.

A Node Process is a process that can run on a node in the cluster. The supported node processes are:

  1. management / mgmt
  2. jobtracker / jt
  3. namenode / nn
  4. tasktracker / tt
  5. datanode / dn


A Node Type describes which node processes (one or several) should run on a specific node of the cluster. Some node types are:

  1. mgmt
  2. jt+nn
  3. jt
  4. nn
  5. tt+dn
  6. tt
  7. dn
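
The relationship between node types and node processes can be sketched in Python. This is only an illustration: the `NODE_PROCESSES` mapping and the `parse_node_type` helper are hypothetical names, not part of Savanna, and the sketch assumes a node type is always written as process aliases joined by `+`.

```python
# Node processes from the list above, keyed by their short alias.
NODE_PROCESSES = {
    "mgmt": "management",
    "jt": "jobtracker",
    "nn": "namenode",
    "tt": "tasktracker",
    "dn": "datanode",
}


def parse_node_type(node_type):
    """Split a node type such as 'jt+nn' into full node process names."""
    aliases = node_type.split("+")
    unknown = [alias for alias in aliases if alias not in NODE_PROCESSES]
    if unknown:
        raise ValueError("unknown node process alias(es): %s" % ", ".join(unknown))
    return [NODE_PROCESSES[alias] for alias in aliases]


print(parse_node_type("jt+nn"))  # ['jobtracker', 'namenode']
print(parse_node_type("tt+dn"))  # ['tasktracker', 'datanode']
```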

User-Savanna-Plugin interoperability

Savanna-plugin-interop.png


The Savanna Pluggable Mechanism consists of three components:

  1. Image Registry;
  2. VM Manager;
  3. Plugins.

Component responsibilities:

  1. Image Registry:
    1. register image in Savanna;
    2. add/remove tags to/from images;
    3. get images by tags;
  2. VM Manager:
    1. launch/terminate vms;
    2. get vm status;
    3. ssh/scp/etc to vm;
  3. Plugins:
    1. get extra conf (specific for the concrete plugin);
    2. launch / terminate clusters;
    3. add / remove node;
    4. validation ops.
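
To make the VM Manager responsibilities concrete, here is a minimal in-memory stand-in. It is an assumption-laden sketch: the class name `FakeVMManager`, its method names, and the status strings are all hypothetical. No real vms are started and no ssh connection is made; commands are only recorded.

```python
import itertools


class FakeVMManager:
    """In-memory stand-in for a VM Manager (hypothetical interface)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._vms = {}

    def launch(self, flavor, image):
        """Pretend to launch a vm and return its id."""
        vm_id = "vm-%d" % next(self._ids)
        self._vms[vm_id] = {
            "flavor": flavor,
            "image": image,
            "status": "ACTIVE",
            "commands": [],
        }
        return vm_id

    def terminate(self, vm_id):
        """Mark the vm as terminated."""
        self._vms[vm_id]["status"] = "TERMINATED"

    def status(self, vm_id):
        """Return the current status string of the vm."""
        return self._vms[vm_id]["status"]

    def run_command(self, vm_id, command):
        """Record a command instead of running it over ssh."""
        if self.status(vm_id) != "ACTIVE":
            raise RuntimeError("%s is not active" % vm_id)
        self._vms[vm_id]["commands"].append(command)


manager = FakeVMManager()
vm = manager.launch("m1.medium", "image-1")
manager.run_command(vm, "hostname")
print(vm, manager.status(vm))  # vm-1 ACTIVE
manager.terminate(vm)
print(manager.status(vm))  # TERMINATED
```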

Zones of responsibility

  1. Savanna:
    1. provides resources and infrastructure (pre-configured vms, dns, etc.);
    2. cluster topologies, nodes and storage placement;
    3. cluster/hadoop/tooling configurations and state storage;
  2. Plugins:
    1. cluster monitoring;
    2. additional tools installation and management (Pig, Hive, etc.);
    3. final cluster configuration and hadoop management;
    4. add/remove nodes to/from cluster (using resources prepared by Savanna).

Workflows

Cluster creation workflow for User:

  1. get list of plugins;
  2. specify cluster name;
  3. choose plugin version and hadoop version (only minor variation);
  4. specify cluster configuration:
  5. choose a common cluster configuration if needed;
  6. specify flavors for job tracker and name node;
  7. [optional] choose flavor for the management node (if applicable);
  8. add worker nodes with specific node type (data node, task tracker or data node + task tracker) and flavor (each of them can be specified several times with different flavors or templates);
  9. [optional] fetch list of custom templates and override the cluster configuration, using these templates;
  10. [optional] override some cluster parameters;
  11. launch cluster;
  12. Savanna performs basic validation and passes cluster configuration to the plugin;
  13. plugin validates the request; if it is valid, the plugin generates an infrastructure request;
  14. infrastructure request will contain:
    1. list of tuples (flavor, image, number of instances);
    2. list of actions to perform after the machines have started, e.g. set up password-less ssh and DNS;
  15. Savanna creates and prepares infrastructure and passes its description to the plugin;
  16. plugin launches Hadoop cluster.
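
The infrastructure request described in this workflow could look roughly like the following. This is a sketch under stated assumptions: the `InfraRequest` type, the `build_infra_request` helper, the node group dictionaries, and the action names are hypothetical, and a single image is assumed for all nodes.

```python
from collections import Counter, namedtuple

# Hypothetical container for the two parts of an infrastructure request.
InfraRequest = namedtuple("InfraRequest", ["instances", "post_boot_actions"])


def build_infra_request(node_groups, image_id):
    """Collapse node groups into (flavor, image, count) tuples."""
    counts = Counter()
    for group in node_groups:
        counts[(group["flavor"], image_id)] += group["count"]
    instances = [
        (flavor, image, count)
        for (flavor, image), count in sorted(counts.items())
    ]
    # Actions to perform once the machines have started.
    return InfraRequest(instances, ["passwordless_ssh", "setup_dns"])


node_groups = [
    {"node_type": "jt+nn", "flavor": "m1.large", "count": 1},
    {"node_type": "tt+dn", "flavor": "m1.medium", "count": 3},
    {"node_type": "dn", "flavor": "m1.medium", "count": 2},
]
request = build_infra_request(node_groups, "base-image-id")
print(request.instances)
# [('m1.large', 'base-image-id', 1), ('m1.medium', 'base-image-id', 5)]
```

Grouping by flavor keeps the request small: Savanna only needs to know how many machines of each kind to prepare, while the plugin decides later which node processes run where.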


Savanna - plugins interoperability workflow:

  1. User fetches extra cluster configs from Savanna API (Savanna delegates this call to the concrete provisioning plugin);
  2. User launches cluster (adds/removes nodes) using Savanna API;
  3. Savanna parses the request and runs common validations on it;
  4. Savanna determines which provisioning plugin should be used;
  5. Savanna runs plugin-specific validation for the current operation;
  6. Savanna creates (modifies) cluster object in DB, returns response to user and starts background job that will provision and launch cluster;
  7. User receives response with info about created (modified) cluster from Savanna API;
  8. Savanna calls the “launch cluster” (add/remove nodes) method of the provisioning plugin in the background;
  9. Plugin receives cluster configuration and can start vms from tagged images optionally using VM Manager and Image Registry;
  10. VM Manager provides helpers for ssh/scp/etc to vms;
  11. Plugin should configure and start the 3rd party vendor management tool on the management vm, and this tool will control the Hadoop cluster;
  12. Plugin can update cluster status and info to expose information about it.
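
The request handling part of this workflow (common validation, plugin selection, plugin validation, storing the cluster, and launching in the background) can be sketched as follows. Everything here is illustrative: `ToyPlugin`, `PLUGINS`, `DB`, `create_cluster`, the `launch_cluster` method name, and the status strings are assumptions, and a dictionary stands in for the database.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyPlugin:
    """Stand-in plugin; a real one would drive a vendor management tool."""

    def validate_cluster(self, cluster):
        if not cluster.get("node_groups"):
            raise ValueError("cluster needs at least one node group")

    def launch_cluster(self, cluster):
        # The plugin updates the cluster status when provisioning is done.
        cluster["status"] = "Active"


PLUGINS = {"toy": ToyPlugin()}
DB = {}  # cluster name -> cluster object
EXECUTOR = ThreadPoolExecutor(max_workers=1)


def create_cluster(request):
    # Common validation that does not depend on the plugin.
    for field in ("name", "plugin"):
        if field not in request:
            raise ValueError("missing field: %s" % field)
    # Determine which provisioning plugin should be used.
    plugin = PLUGINS.get(request["plugin"])
    if plugin is None:
        raise ValueError("unknown plugin: %s" % request["plugin"])
    # Plugin-specific validation.
    plugin.validate_cluster(request)
    # Store the cluster, snapshot the response, then start the background job.
    cluster = dict(request, status="Starting")
    DB[cluster["name"]] = cluster
    response = dict(cluster)
    job = EXECUTOR.submit(plugin.launch_cluster, cluster)
    return response, job


response, job = create_cluster(
    {"name": "demo", "plugin": "toy", "node_groups": [{"node_type": "tt+dn", "count": 2}]}
)
print(response["status"])  # Starting
job.result()  # wait for the background launch to finish
print(DB["demo"]["status"])  # Active
```

The response is copied before the background job is submitted, so the user always sees the state at creation time even if the plugin finishes quickly.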

Python API level functions

Provisioning plugin functions:

  1. get_versions() - get all versions of hadoop that could be used with plugin
  2. get_configs() - list of all configs supported by plugin with descriptions, defaults and node process for which this config is applicable
  3. get_supported_types() - list of all supported NodeTypes, for example, jt+nn and tt+dn
  4. validate_cluster(cluster_description) - custom validation
  5. get_infra(cluster_description) - plugin should return a list of tuples (flavor, image, count, config=“reset_pswd, generate_keys, etc.”)
  6. configure_cluster(cluster_description, vms)
  7. start_cluster(cluster_description, vms)
  8. on_terminate_cluster(cluster_description)
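
One way to express these functions is an abstract base class. The method names below follow the list above; the class names `ProvisioningPlugin` and `ToyPlugin`, the shape of `cluster_description`, the config entry, and the default validation logic are assumptions made for illustration only.

```python
import abc


class ProvisioningPlugin(abc.ABC):
    """Hypothetical base class mirroring the plugin functions listed above."""

    @abc.abstractmethod
    def get_versions(self):
        """Return the Hadoop versions usable with this plugin."""

    @abc.abstractmethod
    def get_configs(self):
        """Return supported configs with description, default and node process."""

    @abc.abstractmethod
    def get_supported_types(self):
        """Return the supported node types, e.g. 'jt+nn'."""

    @abc.abstractmethod
    def get_infra(self, cluster_description):
        """Return a list of (flavor, image, count, config) tuples."""

    def validate_cluster(self, cluster_description):
        """Custom validation; this default only checks node types."""
        supported = set(self.get_supported_types())
        for group in cluster_description["node_groups"]:
            if group["node_type"] not in supported:
                raise ValueError("unsupported node type: %s" % group["node_type"])

    def configure_cluster(self, cluster_description, vms):
        """Push configuration to the vms; no-op by default."""

    def start_cluster(self, cluster_description, vms):
        """Start Hadoop services on the vms; no-op by default."""

    def on_terminate_cluster(self, cluster_description):
        """Clean up plugin-side state; no-op by default."""


class ToyPlugin(ProvisioningPlugin):
    def get_versions(self):
        return ["1.1.1"]

    def get_configs(self):
        # "example.option" is a made-up config name used for illustration.
        return [{"name": "example.option", "description": "illustrative option",
                 "default": "1", "node_process": "namenode"}]

    def get_supported_types(self):
        return ["jt+nn", "tt+dn"]

    def get_infra(self, cluster_description):
        image = cluster_description["base_image_id"]
        return [(g["flavor"], image, g["count"], "generate_keys")
                for g in cluster_description["node_groups"]]


cluster = {
    "base_image_id": "image-1",
    "node_groups": [
        {"node_type": "jt+nn", "flavor": "m1.large", "count": 1},
        {"node_type": "tt+dn", "flavor": "m1.medium", "count": 3},
    ],
}
plugin = ToyPlugin()
plugin.validate_cluster(cluster)
print(plugin.get_infra(cluster))
# [('m1.large', 'image-1', 1, 'generate_keys'), ('m1.medium', 'image-1', 3, 'generate_keys')]
```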


The Image Registry will be able to set Glance properties that store information about an image, for example:

  1. _savanna_tag_<tag-name>: True
  2. _savanna_description: “short description”
  3. _savanna_os: “ubuntu-12.04-x86_64”
  4. _savanna_hadoop: “hadoop-1.1.1”
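
As an illustration, the properties above can be assembled and read back with two small helpers. The helper names `image_properties` and `tags_from_properties` are hypothetical, and the sketch only builds a plain dictionary; it makes no call to Glance.

```python
TAG_PREFIX = "_savanna_tag_"


def image_properties(tags, description, os_name, hadoop):
    """Build the property dictionary for an image with the given tags."""
    props = {TAG_PREFIX + tag: True for tag in tags}
    props["_savanna_description"] = description
    props["_savanna_os"] = os_name
    props["_savanna_hadoop"] = hadoop
    return props


def tags_from_properties(props):
    """Recover the sorted tag names from a property dictionary."""
    return sorted(
        key[len(TAG_PREFIX):]
        for key, value in props.items()
        if key.startswith(TAG_PREFIX) and value
    )


props = image_properties(
    ["base", "management"], "short description", "ubuntu-12.04-x86_64", "hadoop-1.1.1"
)
print(props["_savanna_tag_base"])  # True
print(tags_from_properties(props))  # ['base', 'management']
```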


Image Registry functions:

  1. cluster image-related properties:
    1. base image info (applied to all nodes in cluster)
      1. base_image_tag
      2. base_image_id
    2. management image info (applied to management node only)
      1. management_image_tag
      2. management_image_id
  2. ability to register image with some tags and description
  3. ability to add/remove tag to/from image
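
The registry operations and the cluster image properties can be sketched with an in-memory class. The names `InMemoryImageRegistry` and `resolve_base_image` are hypothetical, and the rule that an explicit `base_image_id` takes precedence over `base_image_tag` is an assumption, not something the design states.

```python
class InMemoryImageRegistry:
    """Hypothetical in-memory registry; a real one would store data in Glance."""

    def __init__(self):
        self._images = {}  # image_id -> {"description": str, "tags": set}

    def register_image(self, image_id, description, tags=()):
        self._images[image_id] = {"description": description, "tags": set(tags)}

    def add_tag(self, image_id, tag):
        self._images[image_id]["tags"].add(tag)

    def remove_tag(self, image_id, tag):
        self._images[image_id]["tags"].discard(tag)

    def get_images_by_tags(self, tags):
        """Return sorted ids of images that carry all of the given tags."""
        wanted = set(tags)
        return sorted(
            image_id
            for image_id, info in self._images.items()
            if wanted <= info["tags"]
        )


def resolve_base_image(registry, cluster):
    """Pick the base image: explicit id first, otherwise look up by tag."""
    if cluster.get("base_image_id"):
        return cluster["base_image_id"]
    matches = registry.get_images_by_tags([cluster["base_image_tag"]])
    if not matches:
        raise LookupError("no image tagged %r" % cluster["base_image_tag"])
    return matches[0]


registry = InMemoryImageRegistry()
registry.register_image("image-1", "Ubuntu with Hadoop", tags=["hadoop"])
registry.register_image("image-2", "Management image", tags=["hadoop", "mgmt"])
registry.remove_tag("image-2", "hadoop")
print(registry.get_images_by_tags(["hadoop"]))  # ['image-1']
print(resolve_base_image(registry, {"base_image_tag": "hadoop"}))  # image-1
```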