Difference between revisions of "Mistral"

Revision as of 08:42, 22 October 2013

Mistral

Mistral is a standalone task service intended to be installed in an OpenStack cloud, where it provides scheduling and orchestration capabilities for generic computational tasks. Tasks can depend on other tasks, thereby forming a graph. The Mistral service provides a convenient API, based on a simple generic DSL, for executing arbitrary task flows.

Use cases

Distributed cron. A user wants to schedule the execution of a simple script on multiple VMs. The user creates a simple task definition with a schedule.
Scheduling. An administrator can schedule a business process to run at a given time (periodic events etc.) or on an external alarm (from Ceilometer etc.). This can be done even if the program author was not aware of Ceilometer. This is an example of inversion of control: a particular trigger application can be replaced without the end application even being aware of it.
Live migration. A user specifies tasks for VM live migration that are triggered by an event from Ceilometer (for example, CPU consumption reaching 100%).
Long-running business process. A user makes a request to run a complex multi-step business process and wants it to be fault-tolerant, so that if the execution crashes at some point on one node, another active node of the system can automatically take over and continue from the exact point where it stopped. In this use case the user splits the business process into a set of tasks and lets Mistral handle them: Mistral serves as a coordinator and decides which task should be started at what time. Mistral then calls back with "Execute action X, here is the data". If an application that executes action X dies, another instance takes over the responsibility for continuing the work.
Big data analysis & reporting. A data analyst can use Mistral as a tool for data crawling. For example, to prepare a financial report, the whole set of steps for gathering and processing the required report data can be represented as a graph of related Mistral tasks. As in the other cases, Mistral takes care of fault tolerance, high availability and scalability.

Rationale

The main ideas behind this service are the following:
Ability to upload custom task graph definitions.
Graph definitions should be agnostic of any details of specific domains (such as orchestration, deployment and so forth).
The actual task execution is not performed by the service itself. Instead, the service serves as a coordinator for other worker processes that do the actual work and report task execution results back. In other words, task execution should be asynchronous, which gives flexibility for plugging in domain-specific handling and makes it possible to make the service scalable and highly available.
The service must not contain a predefined set of actions that can be performed. All actions are specific to a particular task graph and are described along with the graph itself using a simple DSL. Basically, actions are generic instructions that the state machine can schedule to be executed on a worker. The worker itself knows how to interpret the task graph actions and do the specific work.
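The split between a coordinator that only decides what to run and workers that do the work can be sketched roughly as follows. This is only an illustration under stated assumptions: the service design is not final, and `Coordinator`, `run_worker`, the `run_script` action and the in-process polling loop are all hypothetical names and choices, not part of any Mistral API. A real deployment would most likely exchange these messages over a network transport rather than direct method calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Coordinator:
    """Hypothetical coordinator: decides what to run, never runs it itself."""

    # Task name -> (action name, action data); all names are illustrative.
    actions: Dict[str, Tuple[str, dict]]
    pending: List[str] = field(default_factory=list)
    results: Dict[str, str] = field(default_factory=dict)

    def next_request(self) -> Optional[Tuple[str, str, dict]]:
        """Return (task, action, data) for a worker, or None if idle."""
        if not self.pending:
            return None
        task = self.pending.pop(0)
        action, data = self.actions[task]
        return task, action, data

    def notify(self, task: str, outcome: str) -> None:
        """Called back by a worker once it has finished an action."""
        self.results[task] = outcome


def run_worker(coordinator: Coordinator,
               handlers: Dict[str, Callable[[dict], None]]) -> None:
    """A worker knows how to interpret actions; the coordinator does not."""
    while True:
        request = coordinator.next_request()
        if request is None:
            break
        task, action, data = request
        try:
            handlers[action](data)
            coordinator.notify(task, "SUCCESS")
        except Exception:
            coordinator.notify(task, "FAILURE")


# Usage: the coordinator says "execute action run_script, here is the data".
log = []
coordinator = Coordinator(
    actions={"backup": ("run_script", {"path": "/opt/backup.sh"})},
    pending=["backup"],
)
run_worker(coordinator, {"run_script": lambda data: log.append(data["path"])})
```

Because the coordinator holds no action implementations, a different worker with a different handler table could serve the same task graph without any change to the coordinator.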

Terminology

Task graph - The graph of all possible tasks and the valid transitions between them.
Flow - A route in a task graph that reflects one possible set of actions performed in a linear fashion. At the same time, the service can logically run individual flows independently, which leaves room for various implementation-level optimizations such as using multiple parallel worker threads.
Session - A particular execution. That is, for the given task graph definition and chosen task, the service should perform all required actions (subtasks) in order to complete this task. All transitions must comply with the transitions allowed in the task graph definition. Identified by session_id.
Task - Defines a flow execution step. Each task is defined together with the tasks it depends on, from which the flow execution can transition in order to reach that task. Identified by session_id + task_name.
Target task - The task that a client needs to execute at some point in time. Any task in the task graph definition can be chosen as the target task. Once this task has been processed successfully, the session is considered completed.
Action - A particular instruction associated with a task that needs to be performed once the task's dependencies are satisfied.
Task state - A task can be in one of several predefined states reflecting its current status:

INACTIVE - task dependencies are not satisfied.
PENDING - task dependencies are satisfied but task hasn’t started yet.
RUNNING - task is currently being executed.
SUCCESS - task has finished successfully.
FAILURE - task has finished with an error.

All actual task states belonging to the current session are persisted in the DB under the session_id key.
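The task states listed above can be modelled as a small state machine. The sketch below is a minimal illustration, not the service's implementation (none exists yet): the allowed transitions in `ALLOWED` are an assumption, since the text lists the states but not the permitted moves between them, and `SessionStore` is a hypothetical in-memory stand-in for the database keyed by session_id.

```python
from enum import Enum


class TaskState(Enum):
    INACTIVE = "INACTIVE"  # dependencies are not satisfied
    PENDING = "PENDING"    # dependencies satisfied, not started yet
    RUNNING = "RUNNING"    # currently being executed
    SUCCESS = "SUCCESS"    # finished successfully
    FAILURE = "FAILURE"    # finished with an error


# Assumed transitions: the source names the states, not the allowed moves.
ALLOWED = {
    TaskState.INACTIVE: {TaskState.PENDING},
    TaskState.PENDING: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.SUCCESS, TaskState.FAILURE},
    TaskState.SUCCESS: set(),
    TaskState.FAILURE: set(),
}


class SessionStore:
    """In-memory stand-in for the DB; task states are kept per session_id."""

    def __init__(self):
        self._states = {}

    def get(self, session_id, task_name):
        # A task that has never been touched is treated as INACTIVE.
        return self._states.get(session_id, {}).get(task_name, TaskState.INACTIVE)

    def transition(self, session_id, task_name, new_state):
        current = self.get(session_id, task_name)
        if new_state not in ALLOWED[current]:
            raise ValueError(f"{current.name} -> {new_state.name} is not allowed")
        self._states.setdefault(session_id, {})[task_name] = new_state


store = SessionStore()
store.transition("session-1", "create_vm", TaskState.PENDING)
store.transition("session-1", "create_vm", TaskState.RUNNING)
store.transition("session-1", "create_vm", TaskState.SUCCESS)
```

Keeping the transition table as data rather than code makes it easy to adjust once the actual design settles, for example to allow a retry from FAILURE back to PENDING.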

Trigger - A condition that causes a new session to be created when it is met. There are several types of such conditions. The condition can occur many times, and each time (subject to limitations specified in the condition itself) a new session is created.
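To make the task graph, task and target task definitions above more concrete, the following sketch resolves which tasks a session would have to run, and in what order, to reach a chosen target task. It assumes a hypothetical in-memory representation (a dict mapping each task name to the names of the tasks it depends on) and assumes that dependencies are acyclic; the actual DSL and its semantics are still under discussion, and the task names are invented for illustration.

```python
def resolve_flow(graph, target):
    """Return the tasks needed to reach `target`, dependencies first.

    `graph` maps each task name to the list of task names it depends on.
    """
    order = []
    done = set()
    in_progress = set()

    def visit(task):
        if task in done:
            return
        if task in in_progress:
            raise ValueError(f"dependency cycle at task {task!r}")
        in_progress.add(task)
        for dependency in graph[task]:
            visit(dependency)
        in_progress.discard(task)
        done.add(task)
        order.append(task)

    visit(target)
    return order


# Hypothetical task graph; only tasks required by the target are selected.
graph = {
    "create_vm": [],
    "attach_volume": ["create_vm"],
    "install_app": ["create_vm"],
    "run_report": ["attach_volume", "install_app"],
    "cleanup": ["run_report"],
}
flow = resolve_flow(graph, "run_report")
print(flow)  # ['create_vm', 'attach_volume', 'install_app', 'run_report']
```

Note that `cleanup` is not part of the result: it depends on the target but the target does not depend on it. In this linearised order `attach_volume` and `install_app` have no mutual dependency, so a real scheduler would be free to run them in parallel.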

Design

There is no final decision on the service design yet. It is being actively discussed on the mailing lists and in the #openstack-mistral IRC channel.

Implementation

There is no implementation yet.

Links & IRC

Project at Launchpad: http://launchpad.net/mistral
The weekly IRC meeting is held on Mondays at 16:00 UTC in #openstack-meeting on Freenode.
Weekly IRC meeting agenda: https://wiki.openstack.org/wiki/Meetings/MistralAgenda
