Sahara/PluggableProvisioning

Overview
The Savanna Pluggable Provisioning Mechanism aims to deploy Hadoop clusters and integrate them with third-party vendor management tools such as Apache Ambari and Cloudera Management Console. Additionally, all functionality related to provisioning and managing vanilla Hadoop will be extracted into a separate Vanilla Hadoop plugin. This page describes all components and their workflows. Please visit the following resources for more information:

Other resources
Plugin API and Object Model Savanna/PluggableProvisioning/PluginAPI

Cluster Object Lifecycle Savanna/PluggableProvisioning/ClusterLifecycle

Components and their description
Savanna Pluggable Mechanism consists of the following main components:
 * 1) Savanna (Controller);
 * 2) Plugins.

And there are two more components which are used by Controller and Plugins:
 * 1) Image Registry (IR);
 * 2) VM Manager.

Image Registry
Savanna in general, and each plugin in particular, needs specially prepared images in Glance.

E.g. if a user wants to deploy a cluster from scratch, then only one image is needed (Ubuntu, for example), so there should be a way to get that exact Ubuntu image from Glance. More generally, there should be a way to determine which images belong to a specific plugin. Besides this, some plugins operate with a special ‘management’ node type, which requires special management images.

IR provides a mechanism to resolve this issue: associate an image with one or several tags and set some additional info about the image, for example:
 * 1) _savanna_tag_: True
 * 2) _savanna_description: “short description”
 * 3) _savanna_os: “ubuntu-12.04-x86_64”
 * 4) _savanna_hadoop: “hadoop-1.1.1”

Information about images is persisted in PostgreSQL. The image has the same id in Glance and in IR.
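The tagging idea can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the `Image` record, the `find_images` helper and the tag names `vanilla`, `vendor` and `management` are hypothetical, and the sketch does not model how IR actually stores tags or talks to Glance.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Hypothetical image record; the id is the same in Glance and in IR."""
    id: str
    tags: set = field(default_factory=set)
    description: str = ""
    os: str = ""
    hadoop: str = ""


def find_images(images, required_tags):
    """Return the images that carry every tag in required_tags."""
    required = set(required_tags)
    return [img for img in images if required <= img.tags]


# Illustrative data only; tag names are made up for this example.
images = [
    Image("img-1", {"vanilla"}, "short description",
          "ubuntu-12.04-x86_64", "hadoop-1.1.1"),
    Image("img-2", {"vendor", "management"}, "management node image",
          "ubuntu-12.04-x86_64", "hadoop-1.1.1"),
]

management_images = find_images(images, ["vendor", "management"])
```

With a lookup like this, a plugin could ask for the images that belong to it, and separately for the subset suitable for its ‘management’ node type.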

VM Manager
This component is a set of low-level helpers that let a plugin interact with VMs.

API for IR and VM Manager
Please visit Savanna/PluggableProvisioning/IRAndVMManagerAPI

Savanna and Plugins
It is very important to determine zones of responsibility between Savanna and Plugins:
 * Savanna:
   * provides resources and infrastructure (pre-configured vms, dns, etc.);
   * cluster topologies, nodes and storage placement;
   * cluster/hadoop/tooling configurations and state storage;
 * Plugins:
   * cluster monitoring;
   * additional tools installation and management (Pig, Hive, etc.);
   * final cluster configuration and hadoop management;
   * add/remove nodes to/from cluster (It is opaque for Savanna).

Savanna - plugins interoperability

 * 1) User launches cluster (adds/removes nodes) using Savanna API;
 * 2) User determines which provisioning plugin should be used;
 * 3) If User wants to start a cluster from a plugin-specific Configuration File, then the plugin validates the file’s correctness and creates a Cluster object. Please see the corresponding workflow in the next section;
 * 4) Savanna parses the user’s changes and runs common validations on them. Besides this, Savanna runs plugin-specific validation;
 * 5) Savanna creates (modifies) cluster object in DB, returns response to user and starts background job that will provision and launch cluster;
 * 6) User receives response with info about created (modified) cluster from Savanna API;
 * 7) VM Manager provides helpers for ssh/scp/etc to vms;
 * 8) Plugin should configure and start the 3rd party vendor management tool on the management vm, and this tool will control the Hadoop cluster;
 * 9) Plugin can update cluster status and info to expose information about it.
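
The steps above can be sketched in code. This is only an illustration under stated assumptions: `ProvisioningPlugin`, `Controller`, `DemoPlugin`, the status strings and the `vm_helpers` dictionary are hypothetical names, not the real Plugin API (which is described on the Plugin API page), and provisioning is called synchronously here although step 5 describes a background job.

```python
from abc import ABC, abstractmethod


class ValidationError(Exception):
    """Raised when common or plugin-specific validation fails."""


class ProvisioningPlugin(ABC):
    """Hypothetical plugin interface, not the real Plugin API."""

    @abstractmethod
    def validate(self, cluster):
        """Plugin-specific validation (step 4)."""

    @abstractmethod
    def configure_and_start(self, cluster, vm_helpers):
        """Set up the vendor management tool and start Hadoop (step 8)."""


class Controller:
    """Stand-in for Savanna; a dict replaces the real database."""

    def __init__(self, plugins):
        self.plugins = plugins  # plugin name -> plugin instance
        self.db = {}

    def create_cluster(self, request):
        # Step 2: the user names the plugin to use.
        plugin = self.plugins.get(request.get("plugin"))
        if plugin is None:
            raise ValidationError("unknown plugin")
        # Step 4: common validation first, then plugin-specific validation.
        if not request.get("name"):
            raise ValidationError("cluster name is required")
        cluster = {
            "name": request["name"],
            "plugin": request["plugin"],
            "node_count": request.get("node_count", 0),
            "status": "Validated",  # hypothetical status name
            "info": {},
        }
        plugin.validate(cluster)
        # Step 5: store the cluster object; step 6: return it to the user.
        self.db[cluster["name"]] = cluster
        return cluster

    def provision(self, name, vm_helpers):
        # Steps 7-9; in the described design this runs as a background job.
        cluster = self.db[name]
        self.plugins[cluster["plugin"]].configure_and_start(cluster, vm_helpers)
        return cluster


class DemoPlugin(ProvisioningPlugin):
    """Toy plugin used only to exercise the flow."""

    def validate(self, cluster):
        if cluster["node_count"] < 1:
            raise ValidationError("at least one node is required")

    def configure_and_start(self, cluster, vm_helpers):
        # vm_helpers stands in for the VM Manager's ssh/scp helpers (step 7).
        vm_helpers["run"]("management-vm", "start-management-tool")
        # Step 9: the plugin updates status and info.
        cluster["status"] = "Active"
        cluster["info"]["management_tool"] = "started"


commands = []
controller = Controller({"demo": DemoPlugin()})
controller.create_cluster({"name": "c1", "plugin": "demo", "node_count": 3})
controller.provision("c1", {"run": lambda host, cmd: commands.append((host, cmd))})
```

The point of the split is that the controller never needs to know how the plugin configures Hadoop: it only validates, stores the cluster object and hands over the VM helpers.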

Cluster creation’s workflow from User’s perspective
The workflow is the following:


 * specify cluster name;
 * get list of plugins;
 * choose plugin version and hadoop version (only minor variation);
 * select cluster’s creation type. User has two variants of cluster creation:
   * a. From Cluster Template (see …)
   * b. From Configuration File (CF)
 * Cluster Template as well as CF describes Cluster’s schema. But CF’s processing and format are specific for each Plugin.


 * If the user chooses a., they can change, add or delete node groups. During these actions the user may vary the number of nodes in each group, change the Node Group Template or choose another name for the group. When all changes are made, the Cluster is ready to be validated.


 * In the second variant (b.) the flow is almost the same, but the User cannot add a new node group or remove an existing one while editing node groups. A Flavor is used here instead of a ‘Node Group Template’.


 * add or delete node groups if possible and specify flavors or node templates for each node group;
 * launch cluster;
 * [b-flow] plugin validates the configuration file;
 * savanna performs basic validation and passes cluster configuration to the plugin;
 * plugin validates the request; if it’s valid then the infrastructure request is generated;
 * infrastructure request will contain:
   * list of tuples (flavor, image, number of instances);
   * list of actions that need to be done after a machine has started, e.g. password-less ssh, DNS setup;
 * savanna creates and prepares infrastructure and passes its description to plugin;
 * plugin launches Hadoop cluster.
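
The infrastructure request described above could be modelled roughly as follows. This is a sketch under assumptions: the class names `InstanceSpec` and `InfrastructureRequest`, the `build_request` helper, the node-group dictionary layout and the action identifiers are all hypothetical; only the two parts of the request (the tuples and the post-start actions) come from the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class InstanceSpec:
    """One (flavor, image, number of instances) tuple."""
    flavor: str
    image: str
    count: int


@dataclass
class InfrastructureRequest:
    """Hypothetical shape of what a plugin hands back to Savanna."""
    instances: List[InstanceSpec] = field(default_factory=list)
    # Actions to run after the machines have started.
    post_start_actions: List[str] = field(default_factory=list)

    def total_instances(self):
        return sum(spec.count for spec in self.instances)


def build_request(node_groups, image_id):
    """Turn validated node groups into an infrastructure request."""
    specs = [
        InstanceSpec(group["flavor"], image_id, group["count"])
        for group in node_groups
    ]
    # Action names are illustrative placeholders.
    return InfrastructureRequest(specs, ["passwordless_ssh", "setup_dns"])


request = build_request(
    [{"flavor": "m1.small", "count": 1}, {"flavor": "m1.large", "count": 3}],
    image_id="img-1",
)
```

Savanna would then create the instances listed in `request.instances`, run the post-start actions, and pass a description of the resulting infrastructure back to the plugin.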