TaskFlow/Patterns and Engines/Persistence

Revision as of 11:21, 25 September 2013

How can we persist the flow? Here is an informal description.

Big Picture

A flow is a set of tasks and the relations between them. When a flow is loaded into an engine, a translation step associates the flow with persistent data in storage: there is a FlowDetail record for each flow and a TaskDetail record for each task.
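As a rough illustration, the two kinds of records could be modelled as below. This is a minimal sketch assuming simplified, hypothetical `FlowDetail` and `TaskDetail` dataclasses, not TaskFlow's actual persistence classes; the flow name `create_volume` and the factory name `myservice.flows.build_flow` are made up. The task records are keyed by name because that is how tasks are associated with them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Hypothetical, simplified records; TaskFlow's real persistence classes differ.
@dataclass
class TaskDetail:
    name: str               # task name, used to find this record again
    state: str = "PENDING"  # last known state of the task
    result: Any = None      # task result, once the task has finished


@dataclass
class FlowDetail:
    name: str
    factory: Optional[str] = None  # fully qualified name of the flow factory
    tasks: Dict[str, TaskDetail] = field(default_factory=dict)  # keyed by task name

    def add_task(self, detail: TaskDetail) -> None:
        self.tasks[detail.name] = detail


# Usage: one FlowDetail for the flow, one TaskDetail per task.
flow_detail = FlowDetail("create_volume", factory="myservice.flows.build_flow")
flow_detail.add_task(TaskDetail("allocate"))
flow_detail.add_task(TaskDetail("attach"))
```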

To allow resumption later, TaskFlow must be able to re-create the flow. For that, a factory function should be provided. The fully qualified name of the function should be saved in the FlowDetail; then we can call this function again, e.g. after a service restart.
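Saving the factory by name and importing it again could look roughly like the sketch below, which uses only the standard library (`importlib`). The helper names `qualified_name` and `resolve` are hypothetical, and `json.dumps` merely stands in for a real flow factory so that the example runs anywhere.

```python
import importlib
import json


def qualified_name(func) -> str:
    """Return the fully qualified name under which func can be imported again."""
    qualname = func.__qualname__
    if "<locals>" in qualname:
        # Functions defined inside other functions cannot be re-imported later.
        raise ValueError(f"{qualname!r} is not importable; define it at module level")
    return f"{func.__module__}.{qualname}"


def resolve(name: str):
    """Import the object saved under a fully qualified name."""
    parts = name.split(".")
    # Find the longest prefix that is an importable module, then walk attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ModuleNotFoundError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot resolve {name!r}")


saved = qualified_name(json.dumps)  # this string would be stored in the FlowDetail
factory = resolve(saved)            # later, e.g. after a service restart
```

Nested functions are rejected because they cannot be imported again. Likewise, a factory defined in a script that is run directly would be recorded under `__main__`, which is another reason to keep factories in an importable module.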

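Moving flow creation into a separate function should not be an issue: the pattern is already used; see, for example, the cinder code (https://github.com/openstack/cinder/blob/master/cinder/volume/flows/create_volume/__init__.py#L1650).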

A task is associated with its TaskDetail by name. This requires task names to be unique within a particular flow.
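An engine could enforce this requirement when the flow is loaded, instead of letting two tasks silently share one TaskDetail. A minimal sketch, where `check_unique_names` is a hypothetical helper that receives the task names of one flow:

```python
from collections import Counter
from typing import Iterable


def check_unique_names(task_names: Iterable[str]) -> None:
    """Raise ValueError if a task name occurs more than once in a flow."""
    counts = Counter(task_names)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    if duplicates:
        raise ValueError("duplicate task names: " + ", ".join(duplicates))


check_unique_names(["allocate", "attach"])  # unique names: no error
try:
    check_unique_names(["allocate", "attach", "allocate"])
except ValueError as exc:
    print(exc)  # duplicate task names: allocate
```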

First-Time Loading

When a new flow is loaded into an engine, there is no persistent data for it yet, so a corresponding FlowDetail object should be created, as well as a TaskDetail object for each task.
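First-time loading might be sketched as follows. The dataclasses are hypothetical, simplified stand-ins for FlowDetail and TaskDetail (not TaskFlow's own classes), `load_new_flow` is an illustrative name, and tasks are represented only by their names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical, simplified records; not TaskFlow's real persistence classes.
@dataclass
class TaskDetail:
    name: str
    state: str = "PENDING"
    result: Any = None


@dataclass
class FlowDetail:
    name: str
    factory: str  # fully qualified name of the flow factory
    tasks: Dict[str, TaskDetail] = field(default_factory=dict)


def load_new_flow(flow_name: str, factory_name: str, task_names: List[str]) -> FlowDetail:
    """First-time loading: nothing is persisted yet, so create every record."""
    detail = FlowDetail(flow_name, factory_name)
    for name in task_names:
        if name in detail.tasks:
            raise ValueError(f"duplicate task name {name!r}")
        detail.tasks[name] = TaskDetail(name)  # a new task starts PENDING
    return detail


new_detail = load_new_flow("create_volume", "myservice.flows.build_flow",
                           ["allocate", "attach"])
```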

Resuming Same Flow

When we resume a flow from the database (for example, if the flow was interrupted and its engine destroyed to save resources, or if the service was restarted), we need to re-create the flow. For that, we call the function that was saved at first-time loading, and it builds the flow for us.

Then we load the flow into the engine. For each task, a TaskDetail object must already exist; we associate each task with its TaskDetail using the task name.
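A minimal sketch of this association step for an unchanged flow, assuming a hypothetical `TaskDetail` dataclass and a hypothetical `associate` helper. Since every task must already have a record, a missing one is treated as an error rather than silently created.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# Hypothetical, simplified record; not TaskFlow's real class.
@dataclass
class TaskDetail:
    name: str
    state: str = "PENDING"
    result: Any = None


def associate(task_names: List[str], stored: Dict[str, TaskDetail]) -> Dict[str, TaskDetail]:
    """Resuming the same flow: look up each task's existing TaskDetail by name."""
    missing = [name for name in task_names if name not in stored]
    if missing:
        raise LookupError(f"no TaskDetail for tasks: {missing}")
    return {name: stored[name] for name in task_names}


stored = {
    "allocate": TaskDetail("allocate", "SUCCESS", "vol-1"),
    "attach": TaskDetail("attach"),
}
# The factory rebuilt the same flow, so the same task names come back.
pairs = associate(["allocate", "attach"], stored)
```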

Task states and results should be the only information needed to resume the flow.
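To see why states and results can be enough, consider the sketch below. It assumes a purely linear flow and models each task as a hypothetical function that receives the results of earlier tasks; `resume` is an illustrative name, not a TaskFlow API. Tasks already in the SUCCESS state are skipped and their stored results reused, so finished work is not repeated.

```python
from typing import Any, Callable, Dict, List, Tuple


def resume(tasks: List[Tuple[str, Callable[[Dict[str, Any]], Any]]],
           states: Dict[str, str], results: Dict[str, Any]) -> Dict[str, Any]:
    """Run a linear flow, skipping tasks whose stored state is SUCCESS."""
    for name, func in tasks:
        if states.get(name) == "SUCCESS":
            continue                  # keep the stored result, do not re-run
        results[name] = func(results)  # the task can read earlier results
        states[name] = "SUCCESS"       # an engine would persist this update
    return results


calls = []


def allocate(results):
    calls.append("allocate")
    return "vol-2"


def attach(results):
    calls.append("attach")
    return f"attached {results['allocate']}"


# Stored before the interruption: "allocate" had already finished.
states = {"allocate": "SUCCESS"}
out = resume([("allocate", allocate), ("attach", attach)],
             states, results={"allocate": "vol-1"})
```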

Resuming Changed Flow

Design Rationales

Flow Factory

How do we upgrade? We change the code that creates the flow, then we restart the service.

Then, when the flow is restored from storage, what we really want is to load the new flow, with its updated structure and updated tasks, while preserving task states and results. At least, that is the most common case.

So the flow-building code should be run again to put the tasks into patterns and re-create the flow, while the logbook is consulted to load the states and results of the tasks.
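The upgrade scenario can be sketched as follows, under simplifying assumptions: a flow is a hypothetical list of `(name, function)` pairs, the logbook is a plain dict from task name to `(state, result)`, and `build_flow_v2` and `restore` are illustrative names. The upgraded factory supplies the structure and the task code; the logbook written by the old version supplies states and results.

```python
from typing import Any, Callable, Dict, List, Tuple

Task = Tuple[str, Callable[[], Any]]


def build_flow_v2() -> List[Task]:
    """Upgraded flow factory: "attach" has new code and "notify" is a new task."""
    return [
        ("allocate", lambda: "vol"),
        ("attach", lambda: "attached (v2)"),
        ("notify", lambda: "sent"),
    ]


def restore(factory: Callable[[], List[Task]],
            logbook: Dict[str, Tuple[str, Any]]) -> Dict[str, Tuple[Callable[[], Any], str, Any]]:
    """Re-create the flow with current code; load states and results from the logbook."""
    restored = {}
    for name, func in factory():
        # Tasks unknown to the logbook (new in this version) start PENDING.
        state, result = logbook.get(name, ("PENDING", None))
        restored[name] = (func, state, result)
    return restored


# Logbook written by the old version: "allocate" had finished before the restart.
logbook = {"allocate": ("SUCCESS", "vol"), "attach": ("PENDING", None)}
restored = restore(build_flow_v2, logbook)
```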

This is why creation of the flow should be put into a separate function.

Using Names To Associate Tasks

The engine receives a flow and its flow details, and should reconstruct its internal state from them.

Each task must somehow be matched with a TaskDetail. The match should:

  • be stable when tasks are added or removed;
  • not change when the service is restarted or upgraded;
  • be the same across all server instances in HA setups.

One option is to match tasks with TaskDetails by task name. This has several implications:

  • task names must be unique within a flow;
  • it becomes hard to rename a task, since the new name no longer matches the stored TaskDetail.
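The difference between matching by name and a hypothetical alternative, matching by position, shows up as soon as a task is added. The sketch below uses plain dicts and lists of states with made-up task names; neither function is a TaskFlow API. It also shows the cost of renaming: a renamed task no longer finds its old state.

```python
from typing import Dict, List


def match_by_position(task_names: List[str], old_states: List[str]) -> Dict[str, str]:
    """Hypothetical alternative: pair tasks with stored states by index."""
    return dict(zip(task_names, old_states))


def match_by_name(task_names: List[str], old_states: Dict[str, str]) -> Dict[str, str]:
    """Pair tasks with stored states by name; unknown names start PENDING."""
    return {name: old_states.get(name, "PENDING") for name in task_names}


# Before the upgrade the flow was [allocate, attach] and "allocate" had finished.
old_list = ["SUCCESS", "PENDING"]
old_by_name = {"allocate": "SUCCESS", "attach": "PENDING"}

# After the upgrade a new task "reserve" runs first.
new_flow = ["reserve", "allocate", "attach"]
by_position = match_by_position(new_flow, old_list)
# {"reserve": "SUCCESS", "allocate": "PENDING"}: wrong pairing, "attach" is lost
by_name = match_by_name(new_flow, old_by_name)
# {"reserve": "PENDING", "allocate": "SUCCESS", "attach": "PENDING"}

# Renaming "allocate" loses its stored state, so its finished work would be redone.
renamed = match_by_name(["reserve", "allocate_volume", "attach"], old_by_name)
```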

Open Questions

Are there any good alternatives to using task names?