Difference between revisions of "Convection"

Latest revision as of 22:13, 3 December 2013

NOTICE: Similar project -> Mistral

Leveraging some of the ideas of the Convection proposal here, a project called Mistral was introduced at the Icehouse design summit in Hong Kong in the fall of 2013. Active, ongoing work by a few OpenStack contributors has begun on this project. The proposal here should remain as a set of ideas to reference. For new ideas, it may be beneficial to collaborate with project Mistral: https://wiki.openstack.org/wiki/Mistral


PROPOSAL ONLY: TaskSystem-as-a-Service (Convection)

Please note that this is a PROPOSAL ONLY. Please refer to the Mistral project (https://wiki.openstack.org/mistral), which started in October 2013 and aims to implement the ideas from this proposal and more.
Nova's Requirements Etherpad: https://etherpad.openstack.org/task-system

What is Convection

Convection is a proposal for a new open source TaskSystem-as-a-Service project for cloud workloads. (NOTE: Some may consider this a Workflow-as-a-Service system when compared to similar offerings from other cloud vendors. However, the term Task System reflects the intentions of this service more accurately than Workflow, which is often thought of in terms of Business Process Management and may include both automated and manual complex flows across multiple organizations and systems within a business.) Convection could be a public-facing API service that provides task and state management capabilities, enabling OpenStack API consumers to build complex multi-step applications running on an OpenStack cloud, whether public, private, or hybrid. Convection could also be a service that other OpenStack projects leverage to perform work. For example, one possible way for Heat to orchestrate standing up cloud stacks could be to leverage a Task Service for the steps of spinning up and connecting cloud resources. Conversely, customers wanting to run meta-task-flows could leverage Heat, where orchestration of a stack is a single task in the larger meta-task-flow.

Why the name Convection?

Convection was a name proposed by Tim Simpson (Trove developer). The idea is that (1) Convection "conveys," implying organization of order; (2) Convection is often thought of in the context of ovens, which produce heat, and the OpenStack project Heat could be one possible consumer of Task Flow, where task flows could be analogous to air flow in a convection oven.

What is a Task Flow (sometimes referred to as a Workflow)?

Definition: http://en.wikipedia.org/wiki/Workflow
Note: There are static workflows and dynamic workflows.

Isn't Workflow an overloaded term? YES! There are misconceptions about what the term Workflow actually means, and it is often used to mean things different from the definition above. This is one of the main reasons this service is now being referred to as a Task Flow Service, not a Workflow service. For Convection conversation purposes, let's define the following terminology:

Task Flow Terms

  1. Just-in-Sequence (Static) Task Flow: In an academic context, a workflow is sometimes described as a collection of ordered tasks that occur with a defined start, order, and end. Some tasks may be able to execute in parallel, but a pre-determined tree of workflow steps (and parallel branches) is known before runtime, and the flow of the tree is followed upon every execution of the workflow.
  2. Just-in-Time (Dynamic) Event Based Task Flow: A collection of tasks, some of which may or may not have a required order of execution, where task execution is coordinated through events: individual task start/stop/status notifications. In an event based flow system, there could be a central task execution coordinator that listens for task completion events and sends events for new tasks to start. Alternatively, the code that executes an individual task can encode its own logic to know when to execute, based on events sent directly from other tasks.
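
To make the distinction between the two terms concrete, here is a minimal Python sketch. It is an illustration only, not part of the proposal: the names run_static and EventBus, the task names, and the "<task>.done" event naming are all hypothetical. The static flow computes its entire order before anything runs (using the standard library's graphlib, Python 3.9+), while the dynamic flow starts tasks only in reaction to events.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def run_static(flow, actions):
    """Just-in-Sequence: the full order is known before any task runs.

    `flow` maps each task name to the list of tasks it depends on.
    """
    order = list(TopologicalSorter(flow).static_order())
    for name in order:
        actions[name]()
    return order


class EventBus:
    """Just-in-Time: tasks run in reaction to events from other tasks."""

    def __init__(self):
        self.listeners = defaultdict(list)
        self.log = []  # every event emitted, in order

    def on(self, event, handler):
        # Register a handler to be called whenever `event` is emitted.
        self.listeners[event].append(handler)

    def emit(self, event):
        # Record the event, then notify every registered handler.
        self.log.append(event)
        for handler in list(self.listeners[event]):
            handler()
```

In the static case the order comes from the dependency tree alone; in the dynamic case nothing is ordered in advance, and a handler registered with on() decides at runtime what to start next.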


I do not wish to specify the ideal implementation in this proposal. I simply want to document some Task Flow concepts and leverage the community for collaborative design of a useful Task Flow system for OpenStack based workloads.

TaskFlow-as-a-Service is not Orchestration

Orchestration (the purpose of project Heat) is not the same as Task Flow management. A project such as Heat could leverage a Task Flow service or code library. A Task Flow service could leverage Heat in that one task of a meta-task-flow could be to call Heat to spin up a stack. Task Flow is concerned with "task state management" and with storing the "rules and order" for task execution. The task system may or may not actually take responsibility for executing the tasks. Orchestration is concerned with intelligently creating, organizing, connecting, and coordinating cloud based resources, which may involve creating a task flow and/or executing tasks.

Use Cases for TaskFlow-as-a-Service

We see merit in a standalone Task Flow service that would allow for a variety of functionality to be carried out by other services (e.g. Heat could be one service to make use of Task flow). While the OpenStack project Heat focuses on orchestration of resources and resource connections, Task Flow could be responsible for:

  • A sequence of tasks that have a start and end
  • Batch processes (multiple sets of sequences of tasks with starts and ends)
  • A persistent job/process (for example an Auto-Scale policy) that remains running until manually terminated
  • A job to run for a specified duration (such as run this automated stress test for 2 days, then exit).


At a high level, one can consider Task Flows as being either "batch" (with a start and an end) or "long running" (executing for some duration or until some triggering event occurs).

Potential TaskSystem-as-a-Service Capabilities

The following is a list of proposed capabilities for Convection. These are not necessarily required for a minimum viable service and are just ideas of what a Task Flow service might entail:

Conceptual Components

  • Task Flow Engine: A task flow engine could provide generic task and state management capabilities. A task flow engine could act as a central state coordinator, enabling task flow client applications to be distributed across public cloud and on-premise deployments. Task Flow clients offload state management to the Task Flow service thereby allowing the Task Flow clients to be stateless, scalable, and tolerant of process and client failures. The Task Flow engine could support configurable constraints at both the flow and task level, e.g. timeouts, retry count, retry intervals, etc.
  • DSL to encapsulate task flow logic: A task flow system does not need to execute task flow logic, but it could as a value added enhancement. For example, in a simplistic implementation of a Task Flow service, the service itself could maintain task state and leave it up to the clients of Task Flow to implement the business logic of task flow execution. An enhanced version of a Task Flow service could allow a client to provide task flow business logic to the service in a declarative DSL, and the Task Flow engine could enforce that business logic (e.g. notifying tasks when to run, stop, restart, etc.).
  • Command Line Tool / Dashboard: Since OpenStack is a cloud operating system, operating system tools like top, showing a list of running jobs in the cloud, could be very useful. Tools could provide a drill down of existing task flows, currently running task flows, and task flows in various states of execution (running, completed, failed, ready-to-run), and provide the ability to resubmit/retry failed task flow jobs. Task Flow tools could also provide analytics: metrics which could help identify performance bottlenecks or common areas of failure in a task flow that is repeated over and over. Some possible metrics could be: average execution time for a task flow, average execution time for individual flow tasks, task/workflow failure rates, etc.
  • Task Flow Repository: A task flow repository could expose a set of pre-determined common task flows (e.g. spin up a server and add it to a load balancer). The Repository facilitates reuse and makes available a compelling set of pre-defined task flow sets.
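
As a rough illustration of the DSL and constraint ideas above, the following Python sketch represents a flow declaratively as plain data and lets an engine-side function decide which tasks to notify. Everything here is an assumption made for illustration: the FLOW structure, the field names (requires, retry_count, timeout_seconds), the state strings, and the helper functions are hypothetical and not a proposed Convection format.

```python
# Hypothetical declarative flow definition; all field names are illustrative.
FLOW = {
    "name": "web_stack",
    "timeout_seconds": 600,  # flow-level constraint
    "tasks": {
        "create_server": {"requires": [], "retry_count": 2},
        "add_to_lb": {"requires": ["create_server"], "retry_count": 0},
    },
}


def ready_tasks(flow, state):
    """Return pending tasks whose prerequisites have all completed.

    `state` maps task name to "pending", "completed", or "failed";
    tasks missing from `state` are treated as pending.
    """
    ready = []
    for name, spec in flow["tasks"].items():
        if state.get(name, "pending") != "pending":
            continue
        if all(state.get(dep) == "completed" for dep in spec["requires"]):
            ready.append(name)
    return ready


def on_failure(flow, state, attempts, name):
    """Apply the task-level retry_count constraint after a failed attempt."""
    attempts[name] = attempts.get(name, 0) + 1
    if attempts[name] <= flow["tasks"][name]["retry_count"]:
        state[name] = "pending"  # eligible to be scheduled again
    else:
        state[name] = "failed"   # retries exhausted
    return state[name]
```

In this sketch the client supplies only data, and the service decides when each task may run. With retry_count set to 2, a task is attempted at most three times before being marked failed.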


One proposal for a Task Flow service could be that it does not require clients to upload code to the Task Flow service. Clients would have full flexibility in the language, execution, and deployment of the Task Flow tasks. The only requirement is that the task workers are able to access the REST APIs exposed by the service and/or receive notifications from the Task Flow system (e.g. via webhooks or some other mechanism).

Task Flow Engine

Conceptually, a Task Flow consists of a set of tasks that need to execute in a certain order. The order in which the tasks execute could be pre-determined; the ordering could also be determined dynamically based on execution results of a previous task.

Capabilities

A Task Flow Engine could provide the following features:

  1. Register a Task Flow and the tasks associated with the task flow via REST API calls
  2. Ability to specify configurable constraints at the flow and the task level, e.g. timeouts, retry count, retry interval, etc.
  3. Invoke Task Flow instances
  4. Query the state of a Task Flow instance
  5. Query for a list of all the running Task Flow instances for a given Task Flow definition
  6. Support versioning of Task Flow definitions
  7. Cancel a Task Flow instance
  8. Support multiple, parallel invocations of Task Flows
  9. A Task Flow instance could invoke another task flow instance [Master-child task flows]
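
The capabilities listed above could map onto an engine interface roughly like the in-memory Python sketch below. This is only a sketch under stated assumptions: the class TaskFlowEngine, its method names, and the "cancelled" state are hypothetical, and in the proposal these operations would be exposed via REST API calls rather than direct method calls.

```python
import itertools


class TaskFlowEngine:
    """In-memory stand-in for the proposed engine (all names hypothetical)."""

    def __init__(self):
        self._definitions = {}  # (flow name, version) -> definition
        self._latest = {}       # flow name -> latest registered version
        self._instances = {}    # instance id -> instance record
        self._ids = itertools.count(1)

    def register(self, name, tasks, constraints=None):
        """Register a flow; re-registering a name creates a new version."""
        version = self._latest.get(name, 0) + 1
        self._definitions[(name, version)] = {
            "tasks": list(tasks),
            "constraints": dict(constraints or {}),
        }
        self._latest[name] = version
        return version

    def invoke(self, name, version=None, parent=None):
        """Start an instance; `parent` links a child flow to its master."""
        if version is None:
            version = self._latest[name]
        self._definitions[(name, version)]  # raises KeyError if unknown
        instance_id = next(self._ids)
        self._instances[instance_id] = {
            "flow": name, "version": version,
            "state": "running", "parent": parent,
        }
        return instance_id

    def state(self, instance_id):
        return self._instances[instance_id]["state"]

    def running(self, name):
        """List the ids of running instances of the given flow definition."""
        return [i for i, rec in self._instances.items()
                if rec["flow"] == name and rec["state"] == "running"]

    def cancel(self, instance_id):
        rec = self._instances[instance_id]
        if rec["state"] == "running":
            rec["state"] = "cancelled"  # assumed state name
        return rec["state"]
```

Each call to invoke creates an independent instance, so multiple parallel invocations of the same definition, including older versions, can coexist.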


Datastore

The following information could be stored in the Task Flow service datastore:

  1. List of registered flows, tasks, and the associated constraints like timeouts, retries
  2. Execution state for the Task Flow instances (completed, running, error, ready to run)
  3. Scheduled Task Queues. The Task Flow engine could maintain a task queue for each of the registered task types. The Task Flow engine could publish task items to the task queues when a task needs to be scheduled for execution
  4. Task Flow Process Context containing the runtime information associated with a given task flow instance i.e. the input data that came from the application that invoked the task flow, the output data generated by the Task Flow tasks, and any other data needed for administering the task flow instance, like the start time, running duration, etc.
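
One way to picture the four kinds of stored information is the small Python data model below. It is a sketch only: the class and field names (TaskDefinition, ProcessContext, Datastore, and so on) are hypothetical, and a real service would presumably use a persistent database rather than in-memory dictionaries.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    """Execution states named in the list above."""
    READY = "ready to run"
    RUNNING = "running"
    COMPLETED = "completed"
    ERROR = "error"


@dataclass
class TaskDefinition:
    task_type: str
    timeout_seconds: float = 60.0
    retry_count: int = 0


@dataclass
class ProcessContext:
    """Runtime information for one task flow instance."""
    input_data: dict
    output_data: dict = field(default_factory=dict)
    start_time: float = field(default_factory=time.time)

    def running_duration(self, now=None):
        return (time.time() if now is None else now) - self.start_time


@dataclass
class Datastore:
    definitions: dict = field(default_factory=dict)  # flow name -> tasks
    states: dict = field(default_factory=dict)       # instance id -> State
    contexts: dict = field(default_factory=dict)     # instance id -> context
    queues: dict = field(default_factory=dict)       # task type -> deque

    def register(self, flow_name, tasks):
        """Store a flow definition and create one queue per task type."""
        self.definitions[flow_name] = list(tasks)
        for task in tasks:
            self.queues.setdefault(task.task_type, deque())

    def schedule(self, instance_id, task_type):
        """Publish a work item to the queue for the given task type."""
        self.queues[task_type].append(instance_id)
```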


Conceptual Diagram

The diagram below depicts a possible interaction between the Task Flow engine and a Task Flow client making use of the service. The green boxes are implemented by the Task Flow client. Note that while the diagram below shows an interaction where it is expected that the client will poll the engine for state (i.e. there are no notifications being sent from the engine), one could envision a system where poll, push, or a combination of methods are used to "notify" about state changes.

Workflow.png
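
The polling interaction described for the diagram can be approximated by the following Python sketch, in which a client-side worker repeatedly asks the engine for work, runs the task, and reports the result. This is an assumption-laden illustration: PollingEngine, run_worker, and the handler names are hypothetical, the engine is in-memory rather than behind a REST API, and the flow is a simple static sequence.

```python
from collections import deque


class PollingEngine:
    """Engine side: tracks state and schedules the next task on completion."""

    def __init__(self, steps):
        self.steps = list(steps)             # static task order for this sketch
        self.queue = deque(self.steps[:1])   # first task is scheduled at start
        self.results = {}
        self.state = "running" if self.steps else "completed"

    def poll(self):
        """Return the next scheduled task name, or None if nothing is queued."""
        return self.queue.popleft() if self.queue else None

    def report(self, task, result):
        """Record a task result and schedule the following task, if any."""
        self.results[task] = result
        next_index = self.steps.index(task) + 1
        if next_index < len(self.steps):
            self.queue.append(self.steps[next_index])
        else:
            self.state = "completed"


def run_worker(engine, handlers, max_polls=100):
    """Client side: poll for tasks, execute them, and report the results."""
    for _ in range(max_polls):
        if engine.state != "running":
            break
        task = engine.poll()
        if task is None:
            continue  # a real worker would sleep between polls
        engine.report(task, handlers[task]())
    return engine.state
```

The worker holds no flow state of its own; it could crash and be replaced by another worker that resumes polling, which is the stateless-client property described earlier. A push-based variant would replace poll with a notification from the engine.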

Strategy for Implementation

April, 2013: At the Havana design summit, it was proposed (and generally agreed upon) that a Task Flow System is valuable to a number of projects and customer cloud use cases and should be developed. The following general approach is desired:

  1. Incubate a Task Flow/System library function within Heat
  2. Graduate and propose the library to Oslo upon ensured stability and reasonable maturity
  3. Develop a standalone TaskSystem service using the capabilities of the Task Flow library in Oslo


The following is the presentation that was given at the Havana summit that led to the outcomes noted above: File:Workflow-proposal-presentation.pdf

The etherpad notes collected during the un-conference presentation are as follows: https://etherpad.openstack.org/Convection