TaskFlow/Patterns and Engines/Persistence


Big Picture
How can we persist a flow so that it can be resumed, restarted, or rolled back after an engine failure?

Since a flow is a set of tasks and the relations between them, we need a model that persists just enough information to preserve, resume, and roll back a flow on software or hardware failure. When a flow is first loaded into an engine, a translation step associates that flow with persistent data in a storage backend. In particular, each flow gets a corresponding flow detail record, and each task a corresponding task detail record. These form the basic level of information about how a flow will be persisted (see below for their schema).

To allow for resumption, TaskFlow must be able to re-create the flow and re-connect the links between tasks (and between tasks and their task details, and so on) in order to revert or resume those tasks in the correct order. For that to happen, a factory function should typically be provided. The fully qualified name of that function is saved in the flow detail; this lets us import and call the function again, e.g. after a service restart. The function is responsible for recreating the set of tasks and flows (which may be the same set, or an altered one -- see below for the cases).
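To make this concrete, here is a minimal sketch of how a saved fully qualified name can be turned back into a callable factory; the `resolve_factory` helper and its use of a stdlib function as a stand-in are illustrative, not the actual TaskFlow API:

```python
import importlib


def resolve_factory(qualified_name):
    """Import and return a flow-factory function from its dotted path."""
    module_name, _, func_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


# Stand-in demonstration: resolve a stdlib function the same way a saved
# factory name would be resolved after a service restart.
factory = resolve_factory("json.dumps")
print(factory({"answer": 42}))
```

The same mechanism works for any module-level function, which is why the factory's fully qualified name is enough to rebuild the flow after a restart.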

Note: Putting flow creation in a separate function should not be too much of a burden -- the pattern is already in use; see for example [cinder code].

Requirements

 * A task is associated with its task detail by name. This requires that task names be unique within a particular flow.

First-Time Loading
When a new flow is loaded into an engine, there is no persisted data for it yet, so a corresponding flow detail object will be created, as well as a task detail object for each task, and these will be immediately saved into whichever persistence backend is configured. If no persistence backend is configured then, as expected, nothing will be saved and the tasks will be run in a non-persistent manner.
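The first-time loading step can be sketched as follows; the dict-based records and the `first_time_load` helper are simplified stand-ins for the real detail objects and backend API:

```python
def first_time_load(flow_name, task_names, backend=None):
    """Create fresh detail records for a newly loaded flow (sketch).

    backend: optional dict-like store; if None, nothing is persisted and
    the tasks would run in a non-persistent manner.
    """
    flow_detail = {"name": flow_name, "state": "PENDING"}
    task_details = {
        name: {"name": name, "state": "PENDING", "result": None}
        for name in task_names
    }
    if backend is not None:
        # Immediately save the freshly created records.
        backend[flow_name] = (flow_detail, task_details)
    return flow_detail, task_details
```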

Second-Time Loading
When we resume a flow from a persistent backend (for example, if the flow was interrupted and the engine destroyed to save resources, or if the service was restarted), we need to re-create the flow. For that, we call the function that was saved on first-time loading and that builds the flow for us (i.e., the flow factory function described above).

Then we load the flow into the engine. For each task there should already be a task detail object. We re-associate tasks with task detail objects (where a matching one exists) by task name, and then the engine resumes. In this scenario, task states and results should be the only information needed to resume the flow.
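The re-association step amounts to a name lookup; a minimal sketch (the function name and return shape are illustrative):

```python
def reassociate(task_names, saved_details):
    """Match tasks in the re-created flow to saved task details by name.

    task_names: names of the tasks produced by the flow factory
    saved_details: dict mapping task name -> persisted detail record
    Returns (resumed, fresh): names that have saved state to resume from,
    and names that need new detail records created.
    """
    resumed = [n for n in task_names if n in saved_details]
    fresh = [n for n in task_names if n not in saved_details]
    return resumed, fresh
```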

Note: It might be useful to also allow a migration 'object' (or list of objects) to be provided during this re-association process so that tasks can be automatically migrated from their older versions to their newer versions.

Resumption Scenarios
The upgrade use-case is probably more interesting and challenging.

While there are several options, the best (and recommended) practice for upgrades is to load the new (potentially changed) flow or flows and migrate or merge the previous state of the older flows into them. This allows the new/updated flow to take advantage of pre-existing (potentially partially) completed state, or to clean up that pre-existing state using a well-behaved and well-defined process.

A couple of options are possible here:

- This is done with the same process as loading an unchanged flow: tasks are associated with saved state by name when the flow is loaded into the engine.

Note: This likely will not work out of the box for all use cases: sometimes data/task migrations (much like database migrations, but at a higher level) will be needed.

Let's consider several use-cases.

Task was added
This is the simplest use case.

As there is no state for the new task, a new task detail record will be created for it automatically.

Task was removed
Nothing should be done -- the flow structure is reloaded from the factory function, and the removed task is not in it -- so the flow will be run as if the task were never there, and any results it returned (if it completed before) will be ignored.

Task code was changed
The task version will have been altered, and we will be able to identify this when the old task data is loaded. To handle this, each task (or maybe flow) may be required to provide an upgrade function that is automatically activated when a version mismatch occurs. These functions are run before any task code, allowing the existing persisted data to be migrated to the newer format before running begins.
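A possible shape for such an upgrade hook, sketched with dict-based records (the `maybe_upgrade` name, record keys, and hook signature are assumptions, not the actual API):

```python
def maybe_upgrade(detail, task_version, upgrade):
    """Run an upgrade hook when the persisted version lags the task's code.

    detail: persisted record with 'version' and 'data' keys (assumed shape)
    upgrade: callable(old_data, old_version) -> new_data, provided by the task
    """
    if detail["version"] != task_version:
        # Migrate the persisted data to the newer format before running.
        detail["data"] = upgrade(detail["data"], detail["version"])
        detail["version"] = task_version
    return detail
```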

Task was split in two tasks or merged from two (or more) to one task
This case is useful when a larger task was split into two tasks (or several smaller tasks were merged into one). For this to work correctly we might need to allow arbitrary 'migrations' to be run on the existing task details before they are re-associated with the new tasks/flow. This would let that 'task data' migration code translate the old task data into new task data using a set of functions. This is a generalization of the task-code-changed case, but with more flexibility.

Note: We would likely also need a way to record which migrations have been applied; typically this is done via a migration number (possibly per flow?).
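A sketch of such a migration runner, including an example migration that splits one saved record into two for a split task (all names and record shapes here are illustrative):

```python
def run_migrations(details, migrations, applied_up_to):
    """Apply pending task-data migrations in order, tracking a per-flow
    migration number so each migration runs exactly once."""
    for number, migrate in sorted(migrations):
        if number > applied_up_to:
            details = migrate(details)
            applied_up_to = number
    return details, applied_up_to


def split_copy(details):
    """Example migration: the old 'copy_all' task was split in two."""
    old = details.pop("copy_all")
    details["copy_a"] = {"state": old["state"]}
    details["copy_b"] = {"state": old["state"]}
    return details
```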

Flow structure was changed
If manual links were added to or removed from the graph, or task requirements were changed, or the flow was refactored (a task moved into or out of a subflow, a linear flow replaced with a graph flow, tasks reordered in a linear flow, etc.), nothing in particular needs to be done. All task states are loaded as usual.

Flow Factory
How do we upgrade?

- We change the code that creates the flow, then we restart the service.

Then, when the flow is restored from storage, what we really want is to load the new flow, with its updated structure and updated tasks, but preserve task states and results. At least, that is the most common case.

So the code should be run to put the tasks into patterns and re-create the flow, but the logbook should be consulted to load the states and results of tasks. To make this as simple as possible, creation of the flow should be put into a separate function that can be located again on service restart (a module-level function, or a staticmethod of a class).
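For example, a module-level factory and the dotted path that would be persisted for it might look like this (the factory body is a stand-in, not a real flow object):

```python
def make_flow():
    """Module-level flow factory (illustrative): importable by its fully
    qualified name after a service restart."""
    return ["fetch", "process", "store"]  # stand-in for a real flow


def qualified_name(func):
    """Dotted path to persist so the factory can be re-imported later."""
    return func.__module__ + "." + func.__qualname__
```

A staticmethod works equally well, since its `__qualname__` includes the owning class and so remains importable by the same dotted-path scheme.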

Using Names to Associate Tasks with State
The engine gets a flow and flow details, and should reconstruct its own internal state.

Each task should somehow be matched with its existing saved state.

The match should:
 * be stable if tasks are added or removed;
 * not change when the service is restarted or upgraded;
 * be the same across all server instances in HA setups.

One option is to match tasks with task details by task name.

This has several implications:
 * the names of tasks must be unique within a flow;
 * it becomes hard to change the name of a task (the old state would no longer match).