
StructuredWorkflowPrimitives

Revision as of 22:26, 10 May 2013 by Harlowja (talk | contribs) (Structured Workflow Primitives)

Structured Workflow Primitives

Rationale

This wiki page documents some of the foundational pieces of a workflow primitive library.

Relevant Links

Reviews

Primitives

Reservations

A reservation would start out as a concept similar to what exists in nova, where a reservation is a claim on a given resource. Whether that resource is virtual or physical does not matter much, but such a mechanism is needed to track the claim and to know how to un-claim the resource (essential for things like rollback). Typically a reservation has an identifier which can be used to locate more details about the underlying reservation (and likely any sub-reservations).
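As a rough illustration of the idea, a reservation could be modelled as an identifier plus a way to un-claim the resource, with sub-reservations released first. This is only a sketch: the Reservation class, its field names and the release callback are hypothetical and are not taken from nova.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Reservation:
    """A claim on a resource, plus what is needed to undo that claim."""

    resource: str
    # Hypothetical callback that knows how to un-claim the resource.
    release: Callable[[], None]
    children: List["Reservation"] = field(default_factory=list)
    # Identifier used to look up details about this reservation later.
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))

    def unclaim(self) -> None:
        # Release sub-reservations first (newest first), then this claim.
        for child in reversed(self.children):
            child.unclaim()
        self.release()
```

Releasing children in reverse order mirrors how a rollback would typically unwind claims in the opposite order to which they were made.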

Jobs

A job is the initial (and any derivative) set of tasks & workflows required to fulfill a reservation (and any sub-reservations). It is how a reservation transitions from a virtual claim to a physical claim on a given set of resources.

Tasks

Tasks would form the underlying workflow component. A task could be a simple object, likely with apply() and revert() methods: apply() would perform some action, which revert() could attempt to undo (if applicable).
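A minimal sketch of such a task object might look as follows. Only the apply() and revert() method names come from the text above; the context argument, the Task base class and the AllocateIp example (including its address) are assumptions made for illustration.

```python
import abc


class Task(abc.ABC):
    """Smallest unit of work: do something, and optionally undo it."""

    @abc.abstractmethod
    def apply(self, context: dict) -> None:
        """Perform the action, recording any results in `context`."""

    def revert(self, context: dict) -> None:
        """Attempt to undo apply(); a no-op when undo is not applicable."""


class AllocateIp(Task):
    # Hypothetical task: the address is hard-coded purely for the example.
    def apply(self, context: dict) -> None:
        context["ip"] = "10.0.0.5"

    def revert(self, context: dict) -> None:
        context.pop("ip", None)
```

Making revert() a no-op by default reflects the "if applicable" caveat: not every action can be undone.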

Workflows

Workflows would be a set of common patterns that order tasks in various ways, kept separate from the tasks themselves. You could imagine a workflow such as [get up in the morning, take shower, go to work]. Each task there could be applied independently, but doing so in an arbitrary order would likely not give the correct sequence; a linear ordering is what is wanted. This is one example of a pattern that a set of tasks would go through (a linear workflow), and one that should not require duplicated code to accomplish. A set of such common patterns (where tasks are attached to a workflow pattern) would be very useful for avoiding arbitrary and ad-hoc workflows.
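The linear pattern could be sketched roughly like this: tasks are applied in order, and if one fails, the tasks that already completed are reverted in reverse order. The LinearWorkflow and Step names are hypothetical; the only assumption about tasks is that they expose apply() and revert().

```python
class LinearWorkflow:
    """Applies tasks strictly in order; on failure, reverts what was done."""

    def __init__(self, tasks):
        self.tasks = list(tasks)

    def run(self, context):
        done = []
        for task in self.tasks:
            try:
                task.apply(context)
            except Exception:
                # Undo completed tasks, most recent first, then re-raise.
                for finished in reversed(done):
                    finished.revert(context)
                raise
            done.append(task)


class Step:
    """Hypothetical task that records what happened in a list."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail

    def apply(self, context):
        if self.fail:
            raise RuntimeError("%s failed" % self.name)
        context.append(("apply", self.name))

    def revert(self, context):
        context.append(("revert", self.name))
```

Because the ordering logic lives in LinearWorkflow rather than in each Step, the same tasks could in principle be attached to a different pattern without being rewritten.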

Locks

There needs to be a concept of a lock on a given workflow (and later on resources) which can be used to guarantee that the workflow is only worked on by one entity at a time. There could be room for a ZooKeeper (ZK) based lock, a DB based lock, or a single machine lock (useful for dev/test), possibly using the existing nova/oslo file locking functionality.
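One way to picture the pluggable lock is a small interface with several backends behind it. The sketch below shows only a single-process backend built on threading.Lock, as a stand-in for the dev/test case; it is not the nova/oslo file lock, and the WorkflowLock and LocalLock names are assumptions. A ZK or DB backend would implement the same two methods.

```python
import abc
import threading


class WorkflowLock(abc.ABC):
    """Interface each lock backend (ZK, DB, local) would implement."""

    @abc.abstractmethod
    def acquire(self, blocking=True):
        """Return True if the lock was obtained."""

    @abc.abstractmethod
    def release(self):
        """Give the lock up so another entity can take it."""

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()


class LocalLock(WorkflowLock):
    """Single-process backend: one threading.Lock per workflow id."""

    _locks = {}
    _guard = threading.Lock()

    def __init__(self, workflow_id):
        # Every LocalLock created for the same id shares one underlying lock.
        with self._guard:
            self._lock = self._locks.setdefault(workflow_id, threading.Lock())

    def acquire(self, blocking=True):
        return self._lock.acquire(blocking)

    def release(self):
        self._lock.release()
```

Usage would be along the lines of `with LocalLock("wf-1"): ...`, which blocks until no other holder is working on that workflow.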

Job Board

There needs to be a concept of a single owner of a job, which can be used to identify the individual entity working on that job (and its subsequent workflows and tasks) at any given moment. One could imagine a job ownership 'service' (similar to the physical concept of a job board), which would be used to post an actionable job from something like the nova-api and have another entity atomically claim it. Currently the MQ is used to post and claim actionable pieces of work, but if the concept is generalized there could be an MQ backend (likely in connection with a DB), an in-memory backend, or a ZooKeeper backend. The other concept that needs to exist (and which each backend can provide) is job reclamation and/or reposting. This is needed to detect when the entity processing a job has failed and, by some mechanism (which may or may not involve reposting to the job board), to reclaim that job.

Extras: Other API extensions could be added to determine which entity is currently processing a job and what its status is. Depending on the backing ownership service, there could be further extensions that allow manual transfer of failed workflows to other 'entities' (the ZK implementation could likely do this automatically, so it may not need such extensions).
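To make the post/claim/reclaim cycle concrete, here is a rough sketch of an in-memory backend. Everything in it is an assumption made for illustration: the InMemoryJobBoard name, the method names, and in particular the use of a claim timeout (claim_ttl) as the failure detector, which is just one possible reclamation mechanism.

```python
import threading
import time


class InMemoryJobBoard:
    """Toy job board: post jobs, claim them atomically, repost stale claims."""

    def __init__(self, claim_ttl=30.0, clock=time.monotonic):
        self._lock = threading.Lock()
        self._jobs = {}  # job_id -> (owner, claimed_at); owner None = unclaimed
        self._ttl = claim_ttl
        self._clock = clock

    def post(self, job_id):
        with self._lock:
            self._jobs.setdefault(job_id, (None, None))

    def claim(self, owner):
        """Atomically claim one unowned job; return its id, or None."""
        with self._lock:
            for job_id, (current, _) in list(self._jobs.items()):
                if current is None:
                    self._jobs[job_id] = (owner, self._clock())
                    return job_id
            return None

    def owner_of(self, job_id):
        with self._lock:
            return self._jobs[job_id][0]

    def complete(self, job_id, owner):
        with self._lock:
            if self._jobs.get(job_id, (None, None))[0] != owner:
                raise ValueError("%s does not own %s" % (owner, job_id))
            del self._jobs[job_id]

    def repost_expired(self):
        """Make jobs claimable again when their claim is older than the TTL."""
        now = self._clock()
        reposted = []
        with self._lock:
            for job_id, (owner, claimed_at) in list(self._jobs.items()):
                if owner is not None and now - claimed_at >= self._ttl:
                    self._jobs[job_id] = (None, None)
                    reposted.append(job_id)
        return reposted
```

The owner_of() lookup corresponds to the "which entity is processing this job" extension; a ZooKeeper backend could plausibly replace the timeout with session-based detection of a dead owner.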

Task log with metadata

In order to resume tasks there needs to be enough associated history of which tasks/workflows have occurred for workflow ownership to be resumable. Likely there can be a database (or ZooKeeper) backed task log that records the tasks/workflows that have occurred, and for each task/workflow a reference to a description of what occurred (the metadata part). Both the log and the associated metadata would be needed in order to do correct rollback.
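A simplified, in-memory version of such a log could look like the sketch below. The TaskLog and LogEntry names, the "applied"/"reverted" states and the metadata keys are all assumptions; a real log would be persisted in a database or ZooKeeper rather than kept in a list.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class LogEntry:
    task_name: str
    state: str  # assumed states: "applied" or "reverted"
    metadata: Dict[str, Any] = field(default_factory=dict)


class TaskLog:
    """Append-only history of what happened, with per-task metadata."""

    def __init__(self):
        self._entries: List[LogEntry] = []

    def record(self, task_name, state, **metadata):
        self._entries.append(LogEntry(task_name, state, metadata))

    def applied(self):
        """Entries for tasks applied and not since reverted, oldest first."""
        live = []
        for entry in self._entries:
            if entry.state == "applied":
                live.append(entry)
            elif entry.state == "reverted":
                live = [e for e in live if e.task_name != entry.task_name]
        return live

    def remaining(self, planned):
        """Tasks from `planned` still to run, for resuming after a failure."""
        done = {entry.task_name for entry in self.applied()}
        return [name for name in planned if name not in done]

    def rollback_plan(self):
        """(task, metadata) pairs to undo, most recently applied first."""
        return [(e.task_name, e.metadata) for e in reversed(self.applied())]
```

The log answers "what has run", while the metadata answers "what exactly did it do", which is the part a revert needs in order to know what to undo.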

Needed semantics

Reserve/configure/acquire (or release): In order to correctly undo resource allocations, each api/library or system that is integrated with needs semantics to first reserve the resource (but not power it on), then to configure the resource, and finally to acquire it (a synonym for powering it on). If any of those 3 stages fails then there must be a way to destroy the resource (either via a simple destroy() functionality, or via a [poweroff, unconfigure, unreserve] functionality) so that the resource can be released.
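The three-stage semantic and its undo path could be sketched as follows, using the [poweroff, unconfigure, unreserve] variant. FakeDriver is a hypothetical stand-in for whatever api/library is being integrated with, and provision() is an illustrative helper, not an existing function.

```python
class FakeDriver:
    """Hypothetical driver that records calls and can fail on one stage."""

    def __init__(self, fail_on=None):
        self.fail_on = fail_on
        self.calls = []

    def _step(self, name):
        self.calls.append(name)
        if name == self.fail_on:
            raise RuntimeError("%s failed" % name)

    def reserve(self):
        self._step("reserve")

    def configure(self):
        self._step("configure")

    def acquire(self):
        self._step("acquire")

    def poweroff(self):
        self._step("poweroff")

    def unconfigure(self):
        self._step("unconfigure")

    def unreserve(self):
        self._step("unreserve")


def provision(driver):
    """Run reserve -> configure -> acquire, undoing completed stages on error."""
    stages = [
        (driver.reserve, driver.unreserve),
        (driver.configure, driver.unconfigure),
        (driver.acquire, driver.poweroff),
    ]
    undo = []
    for forward, backward in stages:
        try:
            forward()
        except Exception:
            # Only stages that fully completed are undone, newest first.
            for step in reversed(undo):
                step()
            raise
        undo.append(backward)
```

Note that this sketch does not clean up a stage that failed part-way through; a driver offering a single destroy() call would sidestep that by tearing the resource down regardless of how far it got.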