Difference between revisions of "TaskFlow/Persistence"

Revision as of 22:59, 15 September 2013

Revised on: 9/15/2013 by Harlowja

Overview

A persistence API and a set of base persistence types are provided with taskflow so that jobs, flows, and their associated tasks can be backed up in a database, in memory, or elsewhere. When configuring the persistence API, the user specifies which backend to use and can then store and retrieve the data associated with the jobs, flows, and tasks in use.

Why?

Allows for reconstruction and resumption of flows and their associated tasks.
Allows for redundant checks that expected data is provided.
Allows the user to view the history of jobs, flows, and their associated tasks.
Facilitates debugging of taskflow usage and integration (and runtime/post-runtime analysis).

Backends

Configuration

When configuring the backend, a stevedore driver (which uses Python entry points) can be specified to locate the backend that your application wants to use. This makes the set of backends easy to extend and does not limit the selection to those included by default.
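The entry-point lookup described above can be sketched with the standard library alone. This is a minimal illustration, not taskflow's actual loader: the namespace `example.taskflow.persistence` and the function `find_backend` are hypothetical names, and `importlib.metadata` is used here in place of stevedore. With stevedore, roughly the same lookup would go through `stevedore.driver.DriverManager(namespace=..., name=...)`.

```python
from importlib import metadata

# Hypothetical entry-point group; the real namespace used by taskflow is
# not stated in this document.
BACKEND_NAMESPACE = "example.taskflow.persistence"


def find_backend(name, namespace=BACKEND_NAMESPACE):
    """Load the object registered under ``name`` in ``namespace``.

    Raises LookupError when no installed package registers that name.
    """
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points can be filtered by group.
        candidates = eps.select(group=namespace)
    else:
        # Python 3.8/3.9: a mapping of group name to entry points.
        candidates = eps.get(namespace, [])
    for ep in candidates:
        if ep.name == name:
            return ep.load()
    raise LookupError(
        "no backend %r registered under %r" % (name, namespace))
```

Because the backend is resolved by name at runtime, a third-party package could add its own backend simply by registering another entry point in the same group, without any change to the calling application.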

Defaults

SQLAlchemy:
- Makes use of the SQLAlchemy library to store all data in a SQLite (or PostgreSQL or MySQL) database.
- Data will be persisted in the event of a system failure.
In-memory:
- Makes use of in-memory dictionaries to store data in a thread-safe manner.
- Data will NOT be persisted in the event of a system failure.
More to come...
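To make the in-memory option concrete, the following is a minimal sketch of a dictionary guarded by a lock. The class name `InMemoryStore` and its methods are hypothetical and are not taken from taskflow. Records are copied on the way in and out so that callers cannot mutate stored state without going through the lock.

```python
import copy
import threading


class InMemoryStore:
    """Hypothetical dict-backed store; contents are lost on process exit."""

    def __init__(self):
        self._lock = threading.RLock()
        self._records = {}

    def save(self, uuid, record):
        # Store a private copy so later changes by the caller are not seen.
        with self._lock:
            self._records[uuid] = copy.deepcopy(record)

    def get(self, uuid):
        # Return a copy so the caller cannot alter the stored record.
        with self._lock:
            return copy.deepcopy(self._records[uuid])

    def delete(self, uuid):
        with self._lock:
            del self._records[uuid]

    def __len__(self):
        with self._lock:
            return len(self._records)
```

Nothing here is written to disk, which is why this kind of backend cannot survive a system failure.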

Types

Regardless of the backend chosen to persist taskflow data, the generic API (taskflow.persistence.backends.api) must always return one of the following types.

Logbook

Stores a collection of flow details plus any metadata about the logbook (last_updated, deleted, name...).
Typically connected to a job, with which the logbook has a one-to-one relationship.
Provides all of the data necessary to automatically reconstruct a job object.

Field	Description
Name	Name of the logbook
UUID	Unique identifier for the logbook
Meta	JSON blob of non-indexable associated logbook information

Flow detail

Stores a collection of task details, metadata about the flow and potentially any task relationships.
Persistence representation of a specific run instance of a flow.
Provides all of the details necessary for automatic reconstruction of a flow object.

Field	Description
Name	Name of the flow
Type	Type of the flow (mod.cls format)
UUID	Unique identifier for the flow
State	State of the flow
Meta	JSON blob of non-indexable associated flow information
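The Type field is what allows a flow object to be located again when it is reconstructed. The sketch below assumes that "mod.cls format" means a dotted module path followed by the class name. The helpers `type_name` and `resolve_type` are hypothetical, and a standard-library class stands in for a flow class.

```python
import importlib
from collections import OrderedDict


def type_name(obj):
    """Return 'module.Class' for an instance or a class."""
    cls = obj if isinstance(obj, type) else type(obj)
    # Nested classes are not handled; only top-level classes round-trip.
    return "%s.%s" % (cls.__module__, cls.__name__)


def resolve_type(name):
    """Import the module part of 'module.Class' and return the class."""
    module_name, _, class_name = name.rpartition(".")
    if not module_name:
        raise ValueError("expected 'module.Class', got %r" % (name,))
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Round trip: the string would be stored, then resolved on resumption.
stored = type_name(OrderedDict())
restored = resolve_type(stored)
```

Storing the type as a string keeps the persisted record independent of any live Python object, at the cost of requiring the same module to be importable when the flow is resumed.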

Task detail

Stores all of the information associated with one specific run instance of a task.

Field	Description
Name	Name of the task
Type	Type of the task (mod.cls format)
UUID	Unique identifier for the task
State	State of the task
Results	Results that the task may have produced
Exception	Serialized exception that the task may have produced
Stack trace	Stack trace of the exception that the task may have produced
Version	Version of the task that was run
Meta	JSON blob of non-indexable associated task information
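The three types and their fields can be pictured as nested records. The following sketch uses Python dataclasses purely for illustration. The class layout, default values, state strings such as "SUCCESS", and the `to_json` helper are assumptions and do not reproduce taskflow's actual classes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional
from uuid import uuid4


def _new_uuid():
    return str(uuid4())


@dataclass
class TaskDetail:
    name: str
    type: str                          # 'module.Class' of the task
    state: Optional[str] = None
    results: Any = None
    exception: Optional[str] = None    # serialized exception, if any
    stack_trace: Optional[str] = None
    version: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)
    uuid: str = field(default_factory=_new_uuid)


@dataclass
class FlowDetail:
    name: str
    type: str                          # 'module.Class' of the flow
    state: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)
    uuid: str = field(default_factory=_new_uuid)
    task_details: List[TaskDetail] = field(default_factory=list)


@dataclass
class LogBook:
    name: str
    meta: Dict[str, Any] = field(default_factory=dict)
    uuid: str = field(default_factory=_new_uuid)
    flow_details: List[FlowDetail] = field(default_factory=list)


def to_json(book):
    """Serialize a logbook and everything nested inside it."""
    return json.dumps(asdict(book))


# Hypothetical names and states, for illustration only.
task = TaskDetail(name="fetch", type="example.tasks.Fetch",
                  state="SUCCESS", results={"rows": 3}, version="1.0")
flow = FlowDetail(name="import", type="example.flows.Import",
                  state="RUNNING", task_details=[task])
book = LogBook(name="nightly", flow_details=[flow])
payload = json.loads(to_json(book))
```

A backend would be free to store these records differently (for example as database rows), provided it returns equivalent objects through the generic API.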

Checkpointing

Checkpointing is a work-in-progress topic that is still under discussion.

See: Checkpointing

Contributors

Kevin Chen (Rackspace)
Joshua Harlow (Yahoo!)
Jessica Lucci (Rackspace)
