
Sahara/NextGenArchitecture


NextGen Savanna Architecture blueprint

Currently, Savanna is a single service that runs in one Python process: the REST API, DB interop, and provisioning mechanism all work together in it. This causes several problems: high I/O load during cluster creation, lack of HA, etc. Improving the Savanna architecture should happen in two phases.

== Phase 1: Split Savanna into separate services ==

Savanna should be split into several horizontally scalable services:
* savanna-api (s-api);
* savanna-conductor (s-conductor);
* savanna-engine (s-engine).

In this phase the following problems should be solved:
* launching large clusters:
** several engines need to be used to provision one Hadoop cluster;
** cluster creation should be split into many atomic tasks (see the sketch after this list);
** it is very important to support parallel ssh to many nodes from many engines;
* starting many clusters at the same time:
** more than one process needs to be utilized;
** more than one node needs to be utilized;
* long-running tasks:
** we should avoid waiting for a long time on one engine, to minimize the risk of interruption;
* new Provisioning Plugins SPI:
** it should make provisioning easier for plugins;
** it should generate tasks that can be executed simultaneously;
* EDP support:
** it will require long-running tasks;
** Savanna should support execution of several simultaneously running tasks (several jobs on one cluster);
* something else?
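To make the decomposition concrete, here is a minimal sketch (all names are hypothetical, not real Savanna code) of splitting one cluster-creation request into atomic per-node tasks; a thread pool stands in for several s-engine processes running tasks in parallel:

<pre>
# A sketch of decomposing cluster creation into atomic, parallelizable tasks.
from concurrent.futures import ThreadPoolExecutor


def make_cluster_tasks(cluster_id, node_ids):
    """Decompose cluster creation into small, independently runnable tasks."""
    tasks = []
    for node_id in node_ids:
        tasks.append(('boot_instance', cluster_id, node_id))
        tasks.append(('configure_node', cluster_id, node_id))
    # A final task that must wait for all per-node tasks to finish.
    tasks.append(('start_hadoop_services', cluster_id, None))
    return tasks


def run_task(task):
    """Stand-in for an engine executing one atomic task (e.g. over ssh)."""
    name, cluster_id, node_id = task
    print('running %s for cluster %s, node %s' % (name, cluster_id, node_id))


tasks = make_cluster_tasks('cluster-1', ['node-%d' % i for i in range(4)])
with ThreadPoolExecutor(max_workers=4) as pool:
    for t in tasks[:-1]:          # per-node tasks are independent,
        pool.submit(run_task, t)  # so they can run in parallel
run_task(tasks[-1])  # the dependent task runs after the pool drains
</pre>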

The next several sections explain the purpose and responsibilities of the proposed Savanna services.

== Service savanna-api (s-api) ==

It should be horizontally scalable and use RPC to run tasks. Responsibilities:
* REST API:
** http endpoint/method validation;
** object existence validation;
** jsonschema-based validation (see the sketch below);
** plugin/operation-specific validation:
*** Savanna-side validation (for example, deleting templates that are still used by other templates or clusters should be forbidden);
*** asking the plugin to validate the operation;
* creating the initial task for the operation.
This service will be the endpoint for Savanna and should be registered in Keystone; when several savanna-api instances are running, HAProxy should be used as the endpoint and balance the load between them.
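For example, the jsonschema-based validation step could look like the following sketch (the schema is illustrative, not the real Savanna API schema):

<pre>
# A sketch of jsonschema-based request validation in s-api.
import jsonschema

cluster_create_schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string', 'minLength': 1},
        'plugin_name': {'type': 'string'},
        'node_count': {'type': 'integer', 'minimum': 1},
    },
    'required': ['name', 'plugin_name', 'node_count'],
}


def validate_cluster_create(body):
    # Raises jsonschema.ValidationError on a malformed request body,
    # which s-api would translate into an HTTP 400 response.
    jsonschema.validate(body, cluster_create_schema)


validate_cluster_create(
    {'name': 'demo', 'plugin_name': 'vanilla', 'node_count': 3})
</pre>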

== Service savanna-conductor (s-conductor) ==

This service should work with the database and hide the db from all other services. It should also support a local mode in which all services connect to the db directly. Responsibilities:
* database interop.
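As an illustration of how s-conductor can hide the db while still supporting a local mode, here is a hypothetical sketch (not the actual Savanna interface):

<pre>
# Other services call one conductor interface; it is either a direct db
# call (local mode) or an RPC to s-conductor (distributed mode).
class LocalConductor(object):
    """Local mode: the calling service talks to the db itself."""

    def __init__(self, db_api):
        self._db = db_api

    def cluster_get(self, context, cluster_id):
        return self._db.cluster_get(context, cluster_id)


class RemoteConductor(object):
    """Distributed mode: the call is forwarded to s-conductor over RPC."""

    def __init__(self, rpc_client):
        self._rpc = rpc_client

    def cluster_get(self, context, cluster_id):
        return self._rpc.call(context, 'cluster_get', cluster_id=cluster_id)


def get_conductor_api(use_local, db_api=None, rpc_client=None):
    # Services pick the implementation once; the rest of the code does not
    # know whether the db is accessed directly or via RPC.
    return LocalConductor(db_api) if use_local else RemoteConductor(rpc_client)
</pre>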

== Service savanna-engine (s-engine) ==

The Savanna engine service should execute tasks and create new ones. It can call plugins to perform work. Responsibilities:
* running tasks, including long-running tasks.

== Task execution framework ==

It should be a very simple task framework that uses RPC to synchronize task execution between services. TBD.
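Since the framework itself is still TBD, the following is only a rough sketch of the intended shape: tasks are published to a queue (a stdlib Queue stands in for the real message queue here) and any number of engine workers consume and execute them:

<pre>
# A toy model of distributed task execution; all names are illustrative.
import queue
import threading

task_queue = queue.Queue()


def engine_worker(worker_id):
    """One s-engine worker: pull a task, execute it, repeat."""
    while True:
        task = task_queue.get()
        if task is None:  # shutdown sentinel
            break
        name, payload = task
        print('engine %d executing %s(%r)' % (worker_id, name, payload))
        task_queue.task_done()


workers = [threading.Thread(target=engine_worker, args=(i,)) for i in range(2)]
for w in workers:
    w.start()

# s-api (or another engine) publishes tasks; workers race to consume them.
for node in ('node-1', 'node-2', 'node-3'):
    task_queue.put(('configure_node', node))
task_queue.join()
for _ in workers:
    task_queue.put(None)
</pre>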

== Phase 2: Implement HA for provisioning tasks ==

All important operations in Savanna should be reliable, which means that Savanna should support replaying transactions and rolling back any task. There is a library for this in stackforge: https://wiki.openstack.org/wiki/TaskFlow. It looks like it could be used for Phase 2 of improving the Savanna architecture (a short sketch follows the list below). To solve the reliability and consistency problems we may need to add one more Savanna service for task/workflow coordination, savanna-taskmanager (s-taskmanager), which should be horizontally scalable too.

In this phase the following problems should be solved:
* support for workflows;
* reliability of task/workflow execution;
* reliability of long-running tasks.
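For illustration, here is a sketch of how TaskFlow expresses a small provisioning flow with automatic rollback; the task classes are hypothetical, not real Savanna tasks:

<pre>
# A TaskFlow sketch: if a later task fails, revert() of earlier tasks runs.
from taskflow import engines
from taskflow import task
from taskflow.patterns import linear_flow


class BootInstance(task.Task):
    def execute(self, node_id):
        print('booting %s' % node_id)

    def revert(self, node_id, **kwargs):
        # Called automatically if a later task in the flow fails.
        print('rolling back: deleting %s' % node_id)


class ConfigureNode(task.Task):
    def execute(self, node_id):
        print('configuring %s' % node_id)


flow = linear_flow.Flow('provision-node')
flow.add(BootInstance(), ConfigureNode())
engines.run(flow, store={'node_id': 'node-1'})
</pre>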

== Some notes and FAQ ==

Why not provision agents onto the Hadoop cluster's instances to handle all the other provisioning?

There are several arguments against such behaviour:
* there will be problems with scaling agents when launching large clusters;
* there could be problems with migrating agents while the cluster is being scaled;
* agents consume instance resources unpredictably (which could influence Hadoop configuration);
* the agents would need to communicate with all the other services (same message queue, same database, etc.), while users are able to log in to these instances, so it is a security vulnerability;
* various Linux distros with different python and library versions would have to be supported.

Why not implement HA now?

Our main focus for now is to implement EDP and improve Savanna performance by using several engines to provision one Hadoop cluster and by provisioning several Hadoop clusters simultaneously. So we want to split this hard work into several steps and move forward step by step.

Why do we need long-running operations?

There are several long-running operations that we need to execute; for example, decommissioning nodes when decreasing the cluster size.

How will the services interoperate with each other?

The Oslo RPC framework and a message queue (RabbitMQ, Qpid, etc.) will be used for interop between the services.
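As an illustration, here is a minimal sketch of dispatching a task over RPC using the current oslo.messaging library (the topic and method names are made up, and the transport URL pointing at RabbitMQ/Qpid is expected to come from configuration):

<pre>
# s-api casting a task to any available s-engine over the message queue.
from oslo_config import cfg
import oslo_messaging

transport = oslo_messaging.get_rpc_transport(cfg.CONF)
target = oslo_messaging.Target(topic='savanna-engine')
client = oslo_messaging.RPCClient(transport, target)

# cast() does not wait for a result, which fits fire-and-forget
# task dispatch; call() would be used when a reply is needed.
client.cast({}, 'run_task', task_name='configure_node', node_id='node-1')
</pre>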

== Potential metrics to determine architecture upgrade success ==

* s-api: number of handled requests per second (per request group);
* s-conductor: number of different db queries per second;
* s-engine: number of Hadoop clusters being launched simultaneously per Savanna controller;
* speed of creating one Hadoop cluster with a specific number of nodes on a specific number of controllers;
* cluster creation time for a specific number of nodes (excluding instance boot time).


== List of blueprints that will be created for Phase 1 ==

* split Savanna into several services:
** savanna-api;
** savanna-conductor;
** savanna-engine;
* implement a simple distributed task execution mechanism;
* implement long-running tasks support;
* scalable provisioning plugin SPI.

== TODO ==

* More details on the task framework.