Jump to: navigation, search

TaskFlow/Best practices

Revised on: 1/19/2015 by David K


Since taskflow creates a path toward stable, resumable, and trackable workflows it is helpful to have a small set of recommended best practices to make sure you as a user of taskflow (the library) maximize the benefit you receive from using taskflow. Certain common patterns and best practices will be listed below to show the what, why and how so that you can maximize your taskflow experience and minimize the pain associated with understanding some of the taskflow key primitives.

Note: this list will likely continue growing as new usages of taskflow emerge (and new features are introduced into taskflow), please feel free to add any of your own below.

Tips & Tricks

  • Create reuseable tasks that do one thing well in their execute method; if that task has side-effects undo those side-effects in the revert method.
  • Create idempotent tasks, consider what happens during failure and design your tasks with resumption in mind (this may mean splitting up a large task into smaller pieces so that the execution engine can resume for you at a more granular unit).
  • Clearly define what a task provides so that future (potentially not yet created) tasks can depend on those outputs.
  • Clearly define what a task requires so that engine/s can deduce the correct order to run your tasks in.
  • If your tasks have requirements with the same name, but you want to accept a different input, use the remapping/rebinding feature.
  • Using shared inputs by declaring what a task requires is desirable, but be careful though about sharing the output of a task (especially if the output may not be thread-safe) with more than one receiving task (prefer immutability if this is the case).
    • Be careful since it is very easy to switch an engine from serial to parallel (this is a feature not a bug).
    • If this is a still concern use the eventlet based executor to avoid synchronization issues yet still run in a parallel manner (but not using threads).
  • Link tasks that do one thing well together with the supplied patterns to create flows that do many things well together.
  • For hierarchical workflows, prefer pattern composition (a flow with a subflow...) over excessive manual linking.
  • Prefer automatic ordering deduction over manual ordering deduction as the former is more resilient to future alternations while the latter is not.
  • Be very careful about RPC boundaries and be meticulous about how these boundaries affect your flows, tasks and application.
  • Always associate a (major, minor) version with your tasks so that on software upgrades the previously (potentially partially) completed tasks can be migrated & resumed/reverted.
  • Return serializable objects from tasks not resources (or other non-serializable objects); expect that all returned objects may be persisted indefinitely.
  • Clean up after yourself, logbooks should eventually expire and the underlying data be deleted; do this periodically.
    • TODO: the following blueprint should make this programmable and less manual.
  • Clearly name your tasks with relevant names; names are how restarted flows are re-associated with their previously (potentially partially) completed tasks for resumption or reversion so choose carefully.
  • Raise meaningful exceptions to trigger task and flow reversion; the exception which triggered the flow reversion may be persistently stored (and can be referred to later), make sure it is as useful and meaningful as possible.
  • Be careful with conditions, currently all tasks in a flow will run (unconditional); understand and design ahead of time for this.
    • TODO: there is ongoing design work to solve this (feedback welcome).