 
=== Tips & Tricks ===

* Create '''reusable''' tasks that do '''one''' thing well in their <code>execute</code> method; if that task has side-effects, '''undo''' those side-effects in the <code>revert</code> method.
* Create '''idempotent''' tasks; consider what happens during failure and design your tasks with resumption in mind (this may mean splitting a large task into smaller pieces so that the execution engine can resume for you at a more granular unit).
* '''Clearly''' define what a task <code>provides</code> so that future (potentially not yet created) tasks can depend on those outputs.
* '''Clearly''' define what a task <code>requires</code> so that <code>engine/s</code> can deduce the correct order in which to run your tasks.
* If your tasks have requirements with the same name, but you want to accept a '''different''' input, use the '''remapping/rebinding''' feature.
* Using shared inputs by declaring what a task requires is '''desirable''', but be '''careful''' about '''sharing''' the output of a task (especially if the output may not be thread-safe) with more than one receiving task (prefer [http://en.wikipedia.org/wiki/Immutable_object immutability] if this is the case).
** Be careful, since it is '''very''' easy to switch an engine from <code>serial</code> to <code>parallel</code> (this is a feature, not a bug).
** If this is still a concern, use the [http://eventlet.net/ eventlet] based executor to avoid synchronization issues yet still run in a parallel manner (but not using threads).
* Link tasks that do '''one''' thing well together with the supplied patterns to create flows that do '''many''' things well together.
* For hierarchical workflows, prefer pattern '''composition''' (a flow with a subflow...) over excessive '''manual''' linking.
* Prefer '''automatic''' ordering deduction over '''manual''' ordering deduction, as the former is more resilient to future alterations while the latter is not.
* Be '''very''' careful about RPC boundaries and be '''meticulous''' about how these boundaries affect your flows, tasks and application.
* '''Always''' associate a (major, minor) version with your tasks so that on software upgrades the previously (potentially partially) completed tasks can be migrated & resumed/reverted.
* Return '''serializable''' objects from tasks, not '''resources''' (or other non-serializable objects); expect that all returned objects may be persisted indefinitely.
* '''Clean up after yourself'''; logbooks should eventually expire and the underlying data be deleted, so do this periodically.
** '''TODO:''' the following [http://blueprints.launchpad.net/taskflow/+spec/book-retention blueprint] should make this programmable and less manual.
* Clearly name your tasks with '''relevant''' names; names are how restarted flows are re-associated with their previously (potentially partially) completed tasks for resumption or reversion, so choose carefully.
* Raise '''meaningful''' exceptions to trigger task and flow reversion; the exception which triggered the flow reversion may be persistently stored (and can be referred to later), so make sure it is as useful and meaningful as possible.
* Be '''careful''' with conditions; currently all tasks in a flow will run (unconditionally), so understand and design '''ahead of time''' for this.
** '''TODO:''' there is ongoing [http://review.openstack.org/#/c/98946/ design work] to solve this (feedback welcome).
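The execute/revert pairing in the first few tips can be sketched in plain Python. This is an illustrative shape only; the class, its <code>storage</code> argument and the volume naming are hypothetical stand-ins, not the real taskflow <code>Task</code> API:

```python
class CreateVolume:
    """Illustrative task: does one thing (create), undoes it on revert."""

    # Hypothetical (major, minor) version, per the versioning tip above.
    version = (1, 0)

    def __init__(self, storage):
        self.storage = storage  # injected dependency (assumed dict-like)

    def execute(self, volume_name):
        # Idempotent: creating an already-created volume is a no-op.
        if volume_name not in self.storage:
            self.storage[volume_name] = {'state': 'created'}
        return volume_name  # serializable output (a string, not a resource)

    def revert(self, volume_name, **kwargs):
        # Undo the side-effect performed by execute().
        self.storage.pop(volume_name, None)


store = {}
task = CreateVolume(store)
assert task.execute('vol-1') == 'vol-1'
assert 'vol-1' in store
task.revert('vol-1')
assert 'vol-1' not in store
```

Note the design choices the tips ask for: the return value is a plain serializable string, a second <code>execute</code> call is harmless, and <code>revert</code> exactly undoes the one side-effect <code>execute</code> performed.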
 
 
=== Mind-Altering Substances ===

Using taskflow requires a slight shift in mind-set and changes a little of how your normal code workflow would run. The taskflow team has tried to keep the amount of mind-altering required to use taskflow to a minimum (since mind-altering means learning new concepts, or suppressing existing ones) to make it easy to adopt taskflow into your service/application/library.
 
 
 
==== Exceptions ====

Exceptions that occur in a task, and which are not caught by the internals of that task, will by default trigger reversion of the workflow that the task was in. If multiple tasks in a workflow raise exceptions (say they are executing at the same time via a parallel engine or a distributed engine) then the individual ''paths'' that lead to those tasks will be reverted (if an ancestor task is shared by multiple failing tasks, it will be reverted only once). In the future, [https://blueprints.launchpad.net/taskflow/+spec/reversion-strategies reversion strategies] should make this behavior customizable (allowing different ways to handle or alter the reversion process).
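The revert-on-failure behavior described above can be sketched with a toy runner. This is purely illustrative; the helper name and the task-triple shape are assumptions, and the real taskflow engine internals differ:

```python
def run_with_reversion(tasks, data):
    """Run (name, execute, revert) triples in order; on failure, revert
    the already-completed tasks in reverse order, each at most once."""
    completed = []
    try:
        for name, execute, revert in tasks:
            execute(data)
            completed.append((name, revert))
    except Exception:
        reverted = set()
        for name, revert in reversed(completed):
            if name not in reverted:  # a shared ancestor reverts only once
                revert(data)
                reverted.add(name)
        raise


data = {'log': []}
tasks = [
    ('a', lambda d: d['log'].append('+a'), lambda d: d['log'].append('-a')),
    ('b', lambda d: d['log'].append('+b'), lambda d: d['log'].append('-b')),
    ('boom', lambda d: 1 / 0, lambda d: None),  # raises ZeroDivisionError
]
try:
    run_with_reversion(tasks, data)
except ZeroDivisionError:
    pass
assert data['log'] == ['+a', '+b', '-b', '-a']
```

The trace shows the path that led to the failing task being unwound in reverse order, which is the shape of behavior the engine provides automatically.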
 
 
 
==== Execution flow ====

When a set of tasks and the associated structure that contains those tasks (aka the flows that create that structure) are given to an engine, along with an optional backend where the engine can store intermediate results (which is needed if the workflow should be able to resume on failure), the engine becomes the execution unit that is responsible for reliably executing the tasks contained in the flows you provide it. The engine will ensure the structure that is provided is retained when executing. For example, a linear ordering of tasks using a linear_flow structure will '''always''' be run in linear order, and a set of tasks that are structured in dependency ordering will '''always''' be run in that dependency order. The engine must adhere to these ''constraints''; note that other constraints may be enforced by the engine type that is being activated (i.e. a single-threaded engine will only run in a single thread, while a distributed or worker-based engine will run remotely). So when selecting an engine to use, make sure to carefully select the feature set that will work for your application.
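The dependency-ordering guarantee can be illustrated with a small topological-sort sketch over hypothetical requires/provides sets. The real taskflow compiler is more sophisticated; the function and all the task/symbol names here are made up for illustration:

```python
def deduce_order(tasks):
    """Order tasks so each runs after the tasks providing its requirements.

    tasks: mapping of task name -> (requires, provides), each a set of
    symbol names. A simple Kahn-style topological sort.
    """
    providers = {sym: name for name, (_, prov) in tasks.items() for sym in prov}
    order, done = [], set()
    pending = dict(tasks)
    while pending:
        ready = [n for n, (req, _) in pending.items()
                 if all(providers.get(s) in done or s not in providers
                        for s in req)]
        if not ready:
            raise ValueError('cyclic or unsatisfiable requirements')
        for n in sorted(ready):  # deterministic tie-break
            order.append(n)
            done.add(n)
            del pending[n]
    return order


tasks = {
    'attach': ({'volume_id'}, set()),   # requires what 'create' provides
    'create': (set(), {'volume_id'}),
}
assert deduce_order(tasks) == ['create', 'attach']
```

This is why clearly declaring <code>provides</code> and <code>requires</code> matters: the ordering falls out of the declarations, so adding a new task later does not require manually re-wiring the sequence.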
 
 
 
==== Control flow ====

This is a variation on how a programmer normally programs: in order to be able to track the execution of your workflow, the workflow must be split up into small pieces (in a way similar to functions) that taskflow can then run in a structure defined '''ahead of time''' (aka a flow). Taskflow engines can then use this information to run your structure in a well-defined and resumable manner. This does currently have a few side-effects, in that certain traditional operations (if-then-else, do-while, fork-join, map-reduce) become more complex, since those types of control flow do not easily map to a representation that can be easily resumed. To keep taskflow relatively minimal we are trying to use the minimal run-workflow-to-completion operation with customized reverting strategies to accomplish most of the use-cases in OpenStack. If these control flows become valuable then we will revisit if and how we should make them accessible to users of taskflow.

'''NOTE:''' Inside of a task, the execute() method of that task may use whichever existing control flow it desires (any supported by python), but outside of execute() the set of control flow operators is more minimal (due to the above reasoning).
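The NOTE above can be illustrated with a sketch: arbitrary Python control flow is fine inside an execute()-style function, while the outer structure is declared ahead of time. The function and pipeline below are hypothetical stand-ins, not taskflow code:

```python
def normalize_execute(values):
    """Inside execute(): any python control flow is fine."""
    results = []
    for item in values:              # loops allowed
        if isinstance(item, str):    # branching allowed
            results.append(item.strip().lower())
        else:
            results.append(str(item))
    return results


# Outside execute(): the structure is declared ahead of time as a fixed
# linear sequence (a stand-in for a linear_flow pattern); no runtime
# if/else decides which step comes next, which is what keeps the overall
# workflow trackable and resumable.
pipeline = [normalize_execute, sorted]

data = [' B ', 2, 'a']
for step in pipeline:
    data = step(data)
assert data == ['2', 'a', 'b']
```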
 
 
 
==== Chaining ====

==== Composition ====

==== Piece by piece ====
 

Latest revision as of 11:32, 19 January 2015

Revised on: 1/19/2015 by David K

=== Why ===

Since taskflow creates a path toward stable, resumable, and trackable workflows, it is helpful to have a small set of recommended best practices to make sure you, as a user of taskflow (the library), maximize the benefit you receive from using it. Certain common patterns and best practices are listed above to show the what, why and how, so that you can maximize your taskflow experience and minimize the pain associated with understanding some of the key taskflow primitives.

'''Note:''' this list will likely continue growing as new usages of taskflow emerge (and new features are introduced into taskflow), so please feel free to add any of your own.
