TaskFlow/Patterns and Engines/Persistence

Revised on: 4/26/2014 by Harlowja

Big Picture

How can we persist the flow so that it can be resumed, restarted, or rolled back on engine failure?

Since a flow is a set of tasks and relations between tasks, we need to create a model and corresponding information that allows us to persist the right amount of information to preserve, resume, and roll back a flow on software or hardware failure. To start, when a flow is loaded into an engine, there is a translation step that associates that flow with persistent data in a storage backend. In particular, for each flow there is a corresponding FlowDetail record, and for each task there is a corresponding TaskDetail record. These form the basic level of information about how a flow will be persisted (see below for their schema).
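
As an illustration, here is a minimal sketch of those records built by hand through taskflow's logbook classes (assuming the taskflow.persistence.logbook module from around this revision; names and fields may differ in other versions):

    import uuid

    from taskflow.persistence import logbook

    # A logbook groups flow details; each flow detail holds one task
    # detail per task, keyed by the task's (unique) name.
    book = logbook.LogBook("example-book")
    flow_detail = logbook.FlowDetail("example-flow", uuid.uuid4().hex)
    book.add(flow_detail)

    # State and results are filled in as the engine runs the task.
    task_detail = logbook.TaskDetail("make-volume", uuid.uuid4().hex)
    flow_detail.add(task_detail)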

To allow for resumption, taskflow must be able to re-create the flow and re-connect the links between tasks (and between tasks and task details, and so on) in order to revert them or resume those tasks in the correct order. For that to happen, a factory function should typically be provided. The fully qualified name of the function must be saved in FlowDetail; then we will be able to import and call this function again, e.g. after a service restart. This function will be responsible for recreating a new set of tasks and flows (which may be the same or an altered set of tasks; see below for cases).

Note: Putting flow creation in a separate function should not be too much of a burden -- the pattern is already used; look for example at the cinder code (http://github.com/openstack/cinder/blob/master/cinder/volume/flows/create_volume/__init__.py#L1685).
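
A hedged sketch of such a factory (the task and flow names here are invented for illustration):

    from taskflow import task
    from taskflow.patterns import linear_flow

    class MakeVolume(task.Task):
        def execute(self):
            pass  # the actual work would go here

    # A module-level factory: its fully qualified name (for example
    # "my_service.flows.make_flow") is what gets saved in FlowDetail so
    # the flow can be rebuilt after a restart.
    def make_flow():
        flow = linear_flow.Flow("volume-create")
        flow.add(MakeVolume("make-volume"))
        return flow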

Requirements

  • A task is associated with TaskDetail by name. This requires that task names are unique within a particular flow.

First-Time Loading

When a new flow is loaded into an engine, there is no persisted data for it yet, so a corresponding FlowDetail object will be created, as well as a TaskDetail object for each task; these will be immediately saved into the configured persistence backend. If no persistence backend is configured then, as expected, nothing will be saved and the tasks will be run in a non-persistent manner.
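
A minimal first-time-loading sketch, reusing the make_flow factory sketched above (assuming the in-memory backend; a real deployment would point the connection at a database):

    from taskflow import engines
    from taskflow.persistence import backends

    backend = backends.fetch({"connection": "memory://"})

    # load_from_factory records the factory's fully qualified name in the
    # newly created FlowDetail and creates a TaskDetail for each task.
    engine = engines.load_from_factory(make_flow, backend=backend)
    engine.run()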

Second-Time Loading

When we resume the flow from a persistence backend (for example, if the flow was interrupted and the engine destroyed to save resources, or if the service was restarted), we need to re-create the flow. For that, we will call the function that was saved on first-time loading that builds the flow for us (i.e. the flow factory function described above).

Then we load the flow into the engine. For each task, there must already be a TaskDetail object. We then re-associate tasks with TaskDetail objects (if a matching one exists) by using task names, and then the engine resumes. In this scenario, task states and results should be the only information needed to resume the flow.
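
A hedged resumption sketch, assuming the load_from_detail helper (which re-imports the factory saved in the FlowDetail metadata):

    from taskflow import engines

    # flow_detail is the FlowDetail fetched back from the persistence
    # backend (for example, by walking the logbooks saved in it).
    engine = engines.load_from_detail(flow_detail, backend=backend)
    engine.run()  # completed tasks keep their saved states and results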

Note: It might be useful to allow a migration 'object' (or list of objects) to be provided as well during this re-association process, so that tasks can be automatically migrated from their older versions to their newer versions.

Resumption Scenarios

The upgrade use case is probably more interesting and challenging.

While there are several options, the best (and recommended) practice for upgrades is to load a new (potentially changed) flow (or flows) and migrate or merge the previous state of the older flows into that new flow. This allows the new/updated flow to take advantage of pre-existing (potentially partially) completed state, or to clean up that pre-existing state, using a well-behaved and well-defined process.

A couple of options are possible here:

- This is done with the same process as loading an unchanged flow: tasks are associated with saved state by name when the flow is loaded into the engine.

Note: This will likely not work out of the box for all use cases; sometimes data/task migrations (much like database migrations, but at a higher level) will be needed.

Let's consider several use cases.

Task was added

This is the simplest use case.

As there is no state for the new task, a new TaskDetail record will be created for it automatically.

Task was removed

Nothing should be done -- the flow structure is reloaded from the factory function, and the removed task is not in it -- so the flow will be run as if the task was never there, and any results it returned (if it completed before) will be ignored.

Task code was changed

The task version will have been altered, and we will be able to identify this when the old task data is loaded. To handle this, it may be required that each task (or maybe each flow) provide an upgrade function that is automatically activated when a version mismatch occurs. These functions are run before any task code runs, allowing the existing persisted data to be migrated to the newer format before running begins.
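
No such hook exists yet; the following is a purely hypothetical sketch of what a version-triggered upgrade function could look like:

    # Hypothetical per-task migration, run before the engine starts
    # executing when a version mismatch is detected.
    def upgrade_task_detail(task_detail, old_version, new_version):
        if old_version == (1, 0) and new_version == (1, 1):
            results = task_detail.results or {}
            # e.g. a result field was renamed between task versions
            results["volume_ref"] = results.pop("volume", None)
            task_detail.results = results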

Task was split in two tasks or merged from two (or more) to one task

This case is useful when a larger task was split into two tasks. For this to work correctly, we might need to allow arbitrary 'migrations' to be run on the existing task details before they are re-associated with the new tasks/flow. This would allow that 'task data' migration code to translate the old task data into new task data using a set of functions. This is a generalization of the case where task code is changed, but involves more flexibility.

Note: We would likely need a way here to know which migration has been done; typically this is done via a migration number (possibly per flow?).

Flow structure was changed

If manual links were added to or removed from the graph, or task requirements were changed, or the flow was refactored (a task moved into or out of a subflow, a linear flow replaced with a graph flow, tasks reordered in a linear flow, etc.), nothing in particular should be done. All task states are loaded as usual.

Design rationales

Flow Factory

How do we upgrade?

- We change the code that creates the flow, then we restart the service.

Then, when the flow is restored from storage, what we really want is to load the new flow, with updated structure and updated tasks, but preserve task states and results. At least, that is the most common case.

So, the code should be run to put the tasks into patterns and re-create the flow, but the logbook should be consulted to load the states and results of tasks. To make this as simple as possible, creation of the flow should be put into a separate function that can be located on service restart (a module-level function, or a staticmethod of a class).
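
Locating that function again is plain Python importing; a minimal sketch (the qualified name below is hypothetical):

    import importlib

    def locate_flow_factory(qualified_name):
        module_name, _, func_name = qualified_name.rpartition(".")
        module = importlib.import_module(module_name)
        return getattr(module, func_name)

    # e.g. the name saved in FlowDetail on first-time loading
    factory = locate_flow_factory("my_service.flows.make_flow")
    flow = factory()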

Using Names to Associate Tasks with State

The engine gets a flow and flow details, and should reconstruct its own internal state.

A task should be somehow matched with existing TaskDetail(s).

The match should:

  • be stable if tasks are added or removed;
  • not change when the service is restarted or upgraded;
  • be the same across all server instances in HA setups.


One option is that tasks should be matched with TaskDetail(s) by task name.

This has several implications:

  • the names of tasks should be unique within a flow;
  • it becomes too hard to change the name of a task.
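
One mitigation is to always pass an explicit name instead of relying on the class-derived default, so a task class can later be renamed or moved without orphaning its persisted state; a sketch:

    from taskflow import task

    class CreateVolume(task.Task):
        def execute(self):
            pass

    # The explicit name, not the class name, is what TaskDetail
    # association uses, so refactoring the class later stays safe.
    create = CreateVolume(name="create-volume")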