Jump to: navigation, search

Heat/ConvergenceDesignNotesByMike

< Heat
Revision as of 21:46, 28 May 2014 by Mike Spreitzer (talk | contribs) (Workflow Engine or Not)

Convergence Design Notes by Mike

In the Juno design summit etherpad (https://etherpad.openstack.org/p/heat-workflow-vs-convergence) there are several design problems and solutions discussed, with final selection unclear. In this page I offer one opinion. The contentious issues include: to use a workflow engine or not, whether and (if so) how to chunk the work.

Workflow Engine or Not

The desires to support very large stacks (size of 1E6 resources was mentioned), react quickly to new stack operations, and efficiently support a hypothetical new incremental stack update operation appear to have side-tracked the idea of using a workflow engine. Here is my net of the issues. Suppose a workflow engine with a clean interrupt operation: it stops the launch of new actions for the interrupted workflow, and waits for completion of the actions currently in progress. Heat could use such a workflow engine. When some differences between target and observed state are detected, heat would compose a workflow to heal the differences and launch it. If later more differences are detected before that workflow completes, that workflow would be interrupted and then a new workflow composed and launched to handle all the current differences. This would probably require yet another copy of state: in addition to target and observed state there would be another kind of state that is observed state overwritten by the goals of the workflow (if any) currently in progress; let's call that "anticipated state". The kind of difference that causes a new/revised workflow is a difference between anticipated state and target state.

There is a desire to add an incremental stack update operation, which would not take a whole revised template+effective_environment+parameters but rather some description of an incremental change. There is a desire for an efficient implementation of this hypothetical operation, particularly in the case of a large stack and a small change. It might be possible for the implementation to interrupt a workflow in progress and incrementally compute the needed revised workflow, and possibly even --- as an optimization --- detect the special case where there is no intersection between the new delta and the workflow(s) currently in progress and in that case compose and launch an additional independent workflow. But this is going to be pretty complex logic.

The alternative is an approach that does not use a workflow engine. Rather, multiple heat engines conspire to do the actions themselves. This appears preferable to me, although I am not happy with the degree to which this involves duplicating functionality of a workflow system.

Where to Store Observed State

I saw mention of three approaches: (1) store it in the DB, (2) store it in memcached, and (3) read it whenever needed from the authoritative source. Approach (3) was roundly dismissed. Approach (2) raises questions about consistency. It is not clear to me that we have a problem with consistency, that depends on other parts of the design. For now let us assume approach (1), and later revisit this question.

To Chunk or Not

Some of the discussion was around the idea of breaking of stack operation into smallish batches of individual resource operations. There is even a reference to an academic paper on graph partitioning, which could be used to do that breaking into batches. Even with batching (chunking) there remains the problem of doing each operation only after its dependencies are satisfied. With multiple engines there is the problem of avoiding duplicated or inconsistent work (remember the desire for the ability to start working on a new stack operation before the old one is finished). Suppose one large stack operation arrives, is broken into chunks, and they are distributed and begun execution. While that execution is going on, a stack update arrives that adds and removes resources. The set of chunks is now necessarily different; how to coordinate with the chunks in progress?