TaskFlow/Patterns and Engines/Persistence

Revision as of 05:33, 7 February 2014

Revised on: 2/7/2014 by Harlowja

Big Picture

How can we persist a flow so that it can be resumed, restarted, or rolled back after an engine failure?

Since a flow is a set of tasks and the relations between those tasks, we need a model (and corresponding information) that lets us persist just enough to preserve, resume, and roll back a flow on software or hardware failure. To start, when a flow is loaded into an engine there is a translation step that associates that flow with persistent data in a storage backend. In particular, for each flow there is a corresponding FlowDetail record, and for each task there is a corresponding TaskDetail record. These form the basic level of information about how a flow will be persisted (see below for their schemas).

To allow for resumption, taskflow must be able to re-create the flow and re-connect the links between tasks (and between tasks and task details, and so on) in order to revert or resume those tasks in the correct order. For that to happen, a factory function should typically be provided. The fully qualified name of the function must be saved in FlowDetail; then we'll be able to import and call this function again, e.g. after a service restart. This function will be responsible for recreating a new set of tasks and flows (which may be the same set of tasks, or an altered one; see below for cases).

Note: Putting flow creation in a separate function should not be too much of a burden -- the pattern is already in use; see, for example, [cinder code].
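The factory pattern described above can be sketched in plain Python. The function and task names below are made up for illustration and this is not real TaskFlow API; the point is only that the factory lives at module level, so its fully qualified name can be stored in FlowDetail and re-imported later:

```python
# Hypothetical sketch of a flow factory. A real factory would return a
# taskflow pattern populated with task objects; a plain list of task
# names stands in for that here.

def make_volume_flow():
    """Module-level factory: re-creates the same flow structure on demand."""
    return ["create_db_entry", "allocate_storage", "notify_user"]

# The string persisted alongside the flow's state ("module:function"):
factory_name = "%s:%s" % (make_volume_flow.__module__, make_volume_flow.__name__)
```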

Requirements

  • A task is associated with TaskDetail by name. This requires that task names are unique within a particular flow.

First-Time Loading

When a new flow is loaded into an engine there is no persisted data for it yet, so a corresponding FlowDetail object will be created, as well as a TaskDetail object for each task, and these will be immediately saved into the configured persistence backend. If no persistence backend is configured then, as expected, nothing will be saved and the tasks will be run in a non-persistent manner.
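A minimal sketch of this first-time translation step, using plain dicts shaped like the FlowDetail/TaskDetail field tables later in this document. The `backend.save()` call and default field values are assumptions for illustration, not TaskFlow's actual persistence API:

```python
import uuid

def first_time_load(flow_name, task_names, backend=None):
    # One FlowDetail record for the flow as a whole.
    flow_detail = {"name": flow_name, "uuid": str(uuid.uuid4()),
                   "state": "PENDING", "meta": {}, "tasks": {}}
    for name in task_names:
        # One TaskDetail record per task, created up front.
        flow_detail["tasks"][name] = {
            "name": name, "uuid": str(uuid.uuid4()), "state": "PENDING",
            "results": None, "failure": None, "version": 1, "meta": {},
        }
    if backend is not None:  # no backend configured => nothing is saved
        backend.save(flow_detail)
    return flow_detail
```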

Second-Time Loading

When we resume a flow from a persistent backend (for example, if the flow was interrupted and the engine destroyed to save resources, or if the service was restarted), we need to re-create the flow. For that, we call the function that was saved on first-time loading, which builds the flow for us (i.e., the flow factory function described above).

Then we load the flow into the engine. For each task there must already be a TaskDetail object; we re-associate tasks with TaskDetail objects (where a matching one exists) by task name, and then the engine resumes. In this scenario, task states and results should be the only information needed to resume the flow.

Note: It might be useful to allow a migration 'object' (or list of objects) to be provided during this re-association process, so that tasks can be automatically migrated from their older versions to their newer versions.
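The name-based re-association can be sketched as follows, again with plain-dict stand-ins for the persistence records (a real engine would also have to handle the migration hook mentioned in the note):

```python
import uuid

def reassociate(task_names, saved_flow_detail):
    """Match each task (by name) with its saved TaskDetail, if one exists."""
    pairs = {}
    for name in task_names:
        detail = saved_flow_detail["tasks"].get(name)
        if detail is None:
            # Task added since the last run: create a fresh record for it.
            detail = {"name": name, "uuid": str(uuid.uuid4()),
                      "state": "PENDING", "results": None,
                      "failure": None, "version": 1, "meta": {}}
        pairs[name] = detail
    return pairs
```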

Resumption Scenarios

The upgrade use-case is probably more interesting and challenging.

While there are several options, the best (and recommended) practice for upgrades is to load the new (potentially changed) flow(s) and migrate or merge the previous state of the older flows into them. This allows the new/updated flow to take advantage of pre-existing (and potentially partially completed) state, or to clean up that pre-existing state, using a well-behaved and well-defined process.

A couple of options are possible here:

- This can be done with the same process as loading an unchanged flow: tasks are associated with saved state by name when the flow is loaded into the engine.

Note: This likely will not work out-of-the-box for all use cases: sometimes data/task migrations (much like database migrations, but at a higher level) will be needed.

Let's consider several use-cases.

Task was added

This is the simplest use case.

As there is no state for the new task, a new TaskDetail record will be created for it automatically.

Task was removed

Nothing should be done -- the flow structure is reloaded from the factory function, and the removed task is not in it -- so the flow will be run as if the task never existed, and any results it returned (if it completed before) will be ignored.

Task code was changed

The task version will have been altered, and we will be able to detect this when the old task data is loaded. To handle this, it may be required that each task (or maybe each flow) provide an upgrade function that is automatically activated when a version mismatch occurs. These functions are run before any task code runs, allowing the existing persisted data to be migrated to the newer format before execution begins.
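One possible shape for such an upgrade hook, assuming a per-version chain of migration functions. The `upgrade_fns` mapping is hypothetical; the document only says such a mechanism may be required:

```python
def maybe_upgrade(task_detail, code_version, upgrade_fns):
    """Bring a stored TaskDetail up to the version of the current task code.

    upgrade_fns maps old_version -> function that migrates a detail dict
    from old_version to old_version + 1.
    """
    while task_detail["version"] < code_version:
        task_detail = upgrade_fns[task_detail["version"]](task_detail)
        task_detail["version"] += 1
    return task_detail
```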

Task was split into two tasks, or merged from two (or more) tasks into one

This case arises when, for example, a larger task is split into two tasks. For this to work correctly we might need to allow arbitrary 'migrations' to be run on the existing task details before they are re-associated with the new tasks/flow. This would allow that 'task data' migration code to translate the old task data into new task data using a set of functions. This is a generalization of the case where task code is changed, but allows more flexibility.

Note: We would likely also need a way to know which migrations have already been applied; typically this is done via a migration number (possibly per flow?).
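The migration-number idea might look like the following hypothetical sketch, where the example migration turns one saved task detail into two (the split case above):

```python
def run_migrations(flow_detail, migrations):
    """Apply not-yet-applied migrations in order, recording the number of
    the last migration applied in the flow's metadata."""
    done = flow_detail["meta"].get("migration", 0)
    for number, migrate in sorted(migrations.items()):
        if number > done:
            migrate(flow_detail)
            flow_detail["meta"]["migration"] = number
    return flow_detail

def split_copy_and_verify(flow_detail):
    """Example migration: one saved task detail becomes two."""
    old = flow_detail["tasks"].pop("copy_and_verify")
    flow_detail["tasks"]["copy"] = dict(old, name="copy")
    flow_detail["tasks"]["verify"] = dict(old, name="verify")
```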

Flow structure was changed

If manual links were added to or removed from the graph, or task requirements were changed, or the flow was refactored (tasks moved into or out of subflows, a linear flow replaced with a graph flow, tasks reordered in a linear flow, etc.), nothing in particular needs to be done. All task states are loaded as usual.

Design rationales

Flow Factory

How do we upgrade?

- We change the code that creates the flow, then we restart the service.

Then, when the flow is restored from storage, what we really want is to load the new flow, with its updated structure and updated tasks, but preserve task states and results. At least, that is the most common case.

So the code should be run to put the tasks into patterns and re-create the flow, but the logbook should be consulted to load the states and results of tasks. To make this as simple as possible, creation of the flow should be put into a separate function that can be located on service restart (a module-level function, or a staticmethod of a class).
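Locating the factory from its saved name could then be as simple as the sketch below, assuming the "module:function" string convention used in the earlier example; TaskFlow's actual serialization may differ:

```python
import importlib

def resolve_factory(qualified_name):
    """Re-import a module-level factory from its saved qualified name."""
    module_name, _, func_name = qualified_name.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)
```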

Using Names to Associate Tasks with State

The engine gets a flow and flow details, and should reconstruct its own internal state.

A task should be somehow matched with existing TaskDetail(s).

The match should:

  • be stable if tasks are added or removed;
  • not change when the service is restarted or upgraded;
  • be the same across all server instances in HA setups.


One option is that tasks should be matched with TaskDetail(s) by task name.

This has several implications:

  • the names of tasks must be unique within a flow;
  • it becomes hard to change the name of a task (its saved state is keyed by that name).

Types

Task detail

Stores all of the information associated with one specific run instance of a task.

Field     Description
Name      Name of the task
UUID      Unique identifier for the task
State     State of the task
Results   Results that the task may have produced
Failure   Serialized failure that the task may have produced (including stack trace, exception type, exception string)
Version   Version of the task that was run
Meta      JSON blob of non-indexable associated task information

Flow detail

  • Stores a collection of task details, metadata about the flow and potentially any task relationships.
  • Persistence representation of a specific run instance of a flow.
  • Provides all of the details necessary for automatic reconstruction of a flow object.
Field   Description
Name    Name of the flow
UUID    Unique identifier for the flow
State   State of the flow
Meta    JSON blob of non-indexable associated flow information

Logbook

  • Stores a collection of flow details, plus any metadata about the logbook (last_updated, deleted, name, ...).
  • Typically connected to a job, with which the logbook has a one-to-one relationship.
  • Provides all of the data necessary to automatically reconstruct a job object.
Field   Description
Name    Name of the logbook
UUID    Unique identifier for the logbook
Meta    JSON blob of non-indexable associated logbook information
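The three field tables above can be transcribed directly into dataclasses to make the record shapes concrete. These are sketches only, not TaskFlow's real persistence classes, and the default values (e.g. the "PENDING" state) are assumptions:

```python
import dataclasses
from typing import Any, Dict, List, Optional

@dataclasses.dataclass
class TaskDetail:
    name: str
    uuid: str
    state: str = "PENDING"
    results: Any = None
    failure: Optional[str] = None  # serialized: type, message, stack trace
    version: int = 1
    meta: Dict[str, Any] = dataclasses.field(default_factory=dict)

@dataclasses.dataclass
class FlowDetail:
    name: str
    uuid: str
    state: str = "PENDING"
    meta: Dict[str, Any] = dataclasses.field(default_factory=dict)
    tasks: List[TaskDetail] = dataclasses.field(default_factory=list)

@dataclasses.dataclass
class LogBook:
    name: str
    uuid: str
    meta: Dict[str, Any] = dataclasses.field(default_factory=dict)
    flows: List[FlowDetail] = dataclasses.field(default_factory=list)
```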