Difference between revisions of "TaskFlow/Persistence"

Revision as of 20:48, 16 August 2013

Revised on: 8/16/2013 by Harlowja

Overview

A persistence API and a set of root persistence types are provided with taskflow so that jobs, flows, and their associated tasks can be backed up in a database or in memory. When configuring the persistence API, the user specifies which backend to use and can then store and retrieve the data associated with the jobs, flows, and tasks in use. Storage and retrieval are performed through the root persistence types (i.e. LogBook, FlowDetails, and TaskDetails).

Why?

Allows for reconstruction and resumption of flows and their associated tasks.
Allows for redundant checks that expected data is provided.
Allows the user to view the history of jobs, flows, and their associated tasks.
Facilitates debugging of taskflow usage and integration (and runtime/post-runtime analysis).

Backends

Configuration

When configuring the backend to use, a string is provided to specify how the data is to be stored. The options currently available for use are detailed below.

Options

SQLAlchemy ('db_backend'):
- Makes use of the sqlalchemy library to store all data in a SQLite (or PostgreSQL or MySQL) database.
- Data will be persisted in the event of a system failure.
In-memory ('mem_backend'):
- Makes use of in-memory dictionaries to store data in a thread-safe manner.
- Data will NOT be persisted in the event of a system failure.
More to come...
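
As a minimal sketch of how a configuration string could select a backend, the example below uses a small registry keyed by the option names listed above. The names `MemoryBackend`, `BACKENDS`, and `get_backend` are hypothetical and are not taken from taskflow; only the 'mem_backend' string and the thread-safe in-memory dictionary idea come from this page.

```python
import threading


class MemoryBackend:
    """Hypothetical in-memory store: a plain dict guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def save(self, key, value):
        # The lock makes concurrent saves and loads safe across threads.
        with self._lock:
            self._data[key] = value

    def load(self, key):
        with self._lock:
            return self._data[key]


# Hypothetical registry keyed by the configuration strings named above.
# A 'db_backend' entry would map to a SQLAlchemy-based class; it is left
# out here so the sketch has no third-party dependency.
BACKENDS = {"mem_backend": MemoryBackend}


def get_backend(name):
    """Return a new backend instance for the given configuration string."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError("unknown backend: %r" % name)


backend = get_backend("mem_backend")
backend.save("logbook-1", {"name": "example"})
loaded = backend.load("logbook-1")
```

Because this backend lives only in process memory, everything saved to it is lost when the process exits, which matches the caveat given for the in-memory option.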

Generic Types

Regardless of the backend chosen to persist taskflow data, the generic API (taskflow.persistence.api) must always return one of the following types. These are the basic types through which the user interfaces with the backend. When requested from the backend, the returned generic types are a snapshot of the data stored in the backend. Changes made to the generic types may not be automatically propagated to the backend; they are written only when the user calls the save method of the changed object.
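
The snapshot behaviour described above can be illustrated with a small sketch. The `Store` and `Snapshot` classes and the state strings are hypothetical stand-ins, not the taskflow API; the only behaviour assumed from this page is that a returned object is a copy and that changes reach the backend only on save.

```python
import copy


class Store:
    """Hypothetical stand-in for a backend: maps names to stored dicts."""

    def __init__(self):
        self._rows = {}

    def put(self, name, data):
        # Store a deep copy so later edits by the caller do not leak in.
        self._rows[name] = copy.deepcopy(data)

    def get(self, name):
        # Hand out a deep copy so the caller receives a snapshot.
        return copy.deepcopy(self._rows[name])


class Snapshot:
    """Hypothetical generic type holding a copy of backend data."""

    def __init__(self, store, name):
        self._store = store
        self.name = name
        self.data = store.get(name)

    def save(self):
        # Only an explicit save writes local changes back to the store.
        self._store.put(self.name, self.data)


store = Store()
store.put("book", {"state": "PENDING"})

snap = Snapshot(store, "book")
snap.data["state"] = "RUNNING"             # local change only
before_save = store.get("book")["state"]   # backend still unchanged
snap.save()
after_save = store.get("book")["state"]    # backend now updated
```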

Logbook

Stores a collection of flow details and any metadata about the logbook (last_updated, deleted, name...).
Typically connected to a job, with which the logbook has a one-to-one relationship.
Provides all of the data necessary to automatically reconstruct a job object.

FlowDetail

Stores a collection of TaskDetails and TaskDetail relations.
Persistence representation of a specific run instance of a Flow.
Provides all of the TaskDetails and TaskDetail relations necessary for automatic reconstruction of a Flow object.

TaskDetail

Stores all of the information associated with one specific run instance of a Task:
- state
- results
- exception
- stacktrace
- metadata
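
To make the containment between the three types concrete, here is a rough sketch using plain dataclasses. The TaskDetail field names follow the list above; the `name` fields, the default values, the example state strings, and the `failed_tasks` helper are assumptions added for illustration and do not reflect the real taskflow classes. TaskDetail relations are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TaskDetail:
    """One run instance of a task (fields taken from the list above)."""
    name: str                      # assumed identifier, not listed above
    state: Optional[str] = None
    results: Any = None
    exception: Optional[str] = None
    stacktrace: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FlowDetail:
    """One run instance of a flow: a collection of task details."""
    name: str
    task_details: List[TaskDetail] = field(default_factory=list)


@dataclass
class LogBook:
    """A collection of flow details, typically tied to a single job."""
    name: str
    flow_details: List[FlowDetail] = field(default_factory=list)


def failed_tasks(book):
    """Return the names of all tasks in the logbook that recorded an exception."""
    return [
        td.name
        for fd in book.flow_details
        for td in fd.task_details
        if td.exception is not None
    ]


book = LogBook(name="job-1")
flow = FlowDetail(name="flow-1")
flow.task_details.append(TaskDetail(name="fetch", state="SUCCESS", results=42))
flow.task_details.append(
    TaskDetail(name="store", state="FAILURE", exception="IOError: disk full")
)
book.flow_details.append(flow)
failed = failed_tasks(book)
```

Walking the logbook in this way is the kind of history inspection and post-runtime analysis mentioned under "Why?".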

Contributors

Kevin Chen (Rackspace)
Joshua Harlow (Yahoo!)
Jessica Lucci (Rackspace)

@@ Line 1: / Line 1: @@
+'''Revised on:''' {{REVISIONMONTH1}}/{{REVISIONDAY}}/{{REVISIONYEAR}} by {{REVISIONUSER}}
 == Overview ==
-A persistence API as well as generic persistence types are provided with the TaskFlow project for the purpose of ensuring that Jobs, Flows, and Tasks can be backed up in a database or in memory. The user, when configuring the persistence API, has the option to specify which backend is desired and subsequently store and retrieve the data associated with the Jobs, Flows, and Tasks in use. Retrieval and storage, when desired, are performed by making use of generic persistence types i.e. LogBook, FlowDetails, and TaskDetails. Each of these generic types has the ability to save itself to the backend and are also returned by default by the persistence API's getter methods.
+A persistence API as well as root persistence types are provided with taskflow for the purpose of ensuring that jobs, flows, and there associated  tasks can be backed up in a database or in memory. The user, when configuring the persistence API, has the option to specify which backend is desired and subsequently store and retrieve the data associated with the jobs, flows, and tasks in use. Retrieval and storage, when desired, are performed by making use of the root persistence types (i.e. LogBook, FlowDetails, and TaskDetails).
 === Why? ===
-* allows for reconstruction and resumption of Flows
+* Allows for reconstruction and resumption of flows and there associated tasks.
-* allows for redundant checks that expected data is provided
+* Allows for redundant checks that expected data is provided.
-* allows for the user to view the history of a Jobs, Flows, and Tasks
+* Allows for the user to view the history of a jobs, flows and there associated tasks.
-* facilitates debugging of TaskFlow
+* Facilitates debugging of taskflow usage and integration (and runtime/post-runtime analysis).
 == Backends ==
 === Configuration ===
-When configuring the backend to use, a string is provided to specify how the data is to be stored. The options currently available for use are detailed below.
+When configuring the backend to use, a string is provided to specify how the data is to be stored.  The options currently available for use are detailed below.
 === Options ===
-* SQLite ('db_backend'):
+* [http://www.sqlalchemy.org/ SQLAlchemy]  ('db_backend'):
-** Makes use of the sqlalchemy library to store all data in a SQLite table
+** Makes use of the sqlalchemy library to store all data in a SQLite (or postgres or mysql) database.
-** Will be persisted in the event of a system failure
+** Will be persisted in the event of a system failure.
 * In-memory ('mem_backend'):
-** Makes use of a LockingDict class to store data in memory in thread-safe dicts
+** Makes use of a in-memory dictionaries to store data in memory in a thread-safe manner.
-** Will NOT be persisted in the event of a system failure
+** Will '''NOT''' be persisted in the event of a system failure.
 * More to come...
 == Generic Types ==
-Regardless of the backend chosen to persist TaskFlow data, the generic API (taskflow.backends.api) must always return one of the following types. These are the basic types with which the user will interface with the backend. When requested from the backend, the returned generic types are a snapshot of the data stored in the backend. Any changes made to the generic types will not be automatically updated in the backend, rather only when the user calls the save method of the changed object.
+Regardless of the backend chosen to persist taskflow data, the generic API (taskflow.persistence.api) must always return one of the following types. These are the basic types with which the user will interface with the backend. When requested from the backend, the returned generic types are a snapshot of the data stored in the backend. Any changes made to the generic types '''may'' not be automatically updated in the backend, rather only when the user calls the save method of the changed object.
-=== LogBook ===
-* Stores a collection of FlowDetails
+=== [https://en.wikipedia.org/wiki/Logbook Logbook] ===
-* Persistence representation of a Job with which the LogBook has a one-to-one relationship
+* Stores a collection of flow details + any metadata about the logbook (last_updated, deleted, name...).
-* Provides all of the data necessary to automatically reconstruct a Job object
+* Typically connected to [[StructuredWorkflowPrimitives|job]] with which the logbook has a one-to-one relationship.
+* Provides all of the data necessary to automatically reconstruct a job object.
 === FlowDetail ===
 * Stores a collection of TaskDetails and TaskDetail relations
 * Persistence representation of a specific run instance of a Flow