TaskFlow/Engines

'''Revised on:''' 3/15/2014 by Harlowja
  
== Engine ==
 
 
Engines are what ''really'' run your <code>tasks</code> and <code>flows</code>.
[[File:4StrokeEngine_Ortho_3D_Small_Mini.gif|frame]]
An engine takes a <code>flow</code> structure (described by patterns) and uses it to decide which <code>task</code> to run and when.
There may be different implementations of engines. Some may be easier to use and understand (i.e., require no additional infrastructure setup), while others might require a more complicated setup but provide better scalability. The idea and ''ideal'' is that deployers/developers of a service that uses taskflow can select an engine that suits their setup best without modifying the code of said service. This allows a deployer/developer to start off with a simpler implementation and scale out the service powered by taskflow as it grows. In concept, all engines should implement the same interface, to make it easy to replace one engine with another, and should provide the same guarantees on how patterns are interpreted -- for example, if an engine runs a linear flow, the tasks should be run one after another, in order, no matter what type of engine is actually running that linear flow.
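As a rough illustration of that common-interface idea, here is a minimal sketch in plain Python. This is ''not'' TaskFlow's actual API -- the class and function names below are hypothetical, and a "linear flow" is reduced to an ordered list of callables:

```python
from abc import ABC, abstractmethod


class Engine(ABC):
    """Hypothetical common engine interface: every engine runs a flow."""

    @abstractmethod
    def run(self, flow):
        """Run all tasks in the flow and return their results."""


class SerialEngine(Engine):
    """Simplest engine: runs a linear flow's tasks one after another."""

    def run(self, flow):
        return [task() for task in flow]


# A "linear flow" here is just an ordered list of callables.
flow = [lambda: "download", lambda: "unpack", lambda: "install"]

# Swapping in a different Engine subclass requires no change to the
# flow itself -- that is the engine-interchangeability idea.
engine = SerialEngine()
print(engine.run(flow))  # ['download', 'unpack', 'install']
```

A more scalable engine would subclass the same interface and run independent tasks concurrently, while still honoring the ordering a linear flow demands.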
'''Note:''' engines might have different capabilities and configuration, but overall the interface '''will''' remain the same and should be transparent to the developers and users of taskflow.
=== Supported Types ===
  
==== Distributed ====
  
When you want your application's <code>tasks</code> and <code>flows</code> to be performed in a system that is highly available and resilient to individual failures.
 
 
[[Distributed_Task_Management_With_RPC|Distributed via RPC]]
Supports the following:
* Remote workers that connect over [http://kombu.readthedocs.org/ kombu] supported transports.
* Combined with jobboards, provides a highly available engine ''orchestrator'' and worker combination.
* ''And more...''
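Conceptually, a distributed engine splits into an ''orchestrator'' that dispatches tasks and workers that execute them and report results back. The sketch below imitates this with an in-process queue and a thread standing in for a remote worker -- a real deployment would use a kombu transport (e.g. RabbitMQ) with workers on other machines, and every name here is hypothetical:

```python
import queue
import threading

# Conceptual sketch only: in-process queues stand in for a real
# message transport, and a thread stands in for a remote worker.
task_queue = queue.Queue()
result_queue = queue.Queue()


def worker():
    """A 'remote' worker: pulls task requests, pushes results back."""
    while True:
        name, args = task_queue.get()
        if name is None:  # shutdown sentinel
            break
        # Pretend summing the arguments is the task body.
        result_queue.put((name, sum(args)))


# The engine 'orchestrator' dispatches tasks and collects results.
w = threading.Thread(target=worker)
w.start()
task_queue.put(("add_small", (1, 2)))
task_queue.put(("add_big", (10, 20)))
results = dict(result_queue.get() for _ in range(2))
task_queue.put((None, None))
w.join()
print(results)  # {'add_small': 3, 'add_big': 30}
```

Because workers only share a message transport with the orchestrator, individual worker failures need not take down the whole system -- which is the availability property described above.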
  
==== Traditional ====
When you want your <code>tasks</code> and <code>flows</code> to just run inside your application's existing framework while still taking advantage of the functionality offered.
  
Supports the following:
* Threaded engine using a thread based [http://docs.python.org/dev/library/concurrent.futures.html#executor-objects executor].
* Threaded engine using a provided [http://eventlet.net/ eventlet] greenthread based [http://docs.python.org/dev/library/concurrent.futures.html#executor-objects executor].
* Single threaded engine using no threads.
* ''And more...''
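The difference between these variants is essentially which executor runs the tasks. Here is a conceptual sketch using the standard library's <code>concurrent.futures</code> (not TaskFlow's actual engine code -- TaskFlow wires executors into its engines rather than being called like this):

```python
from concurrent.futures import ThreadPoolExecutor


def task(n):
    """A stand-in task; real TaskFlow tasks subclass task.Task."""
    return n * n


inputs = [1, 2, 3, 4]

# Single-threaded ("no threads") execution: run everything inline.
serial_results = [task(n) for n in inputs]

# Thread-based executor: independent tasks may run in parallel.
with ThreadPoolExecutor(max_workers=4) as ex:
    threaded_results = list(ex.map(task, inputs))

# Both strategies produce the same results; only the execution
# mechanism differs -- which is exactly the engine-swapping idea.
print(serial_results == threaded_results)  # True
```

An eventlet-based variant would follow the same shape, swapping the thread pool for a greenthread-based executor.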

Latest revision as of 00:45, 15 March 2014
