Difference between revisions of "StructuredStateManagementDetails"
(→Combining 'atomic tasks into a workflow) |
(→Design details) |
||
Line 3: | Line 3: | ||
In order to implement of this new orchestration layer the following key concepts must be built into the design from the start. | In order to implement of this new orchestration layer the following key concepts must be built into the design from the start. | ||
− | # | + | # '''Atomic''' task units. |
− | # Combining '''atomic''' | + | # Combining '''atomic''' task units into a workflow. |
− | |||
# Task resumption. | # Task resumption. | ||
# Task rollback. | # Task rollback. | ||
Line 14: | Line 13: | ||
# Tolerant to upgrades. | # Tolerant to upgrades. | ||
− | ==== Atomic task | + | ==== Atomic task units ==== |
===== Why it matters ===== | ===== Why it matters ===== | ||
− | Tasks that are created (either via code or other operation) must be atomic so that the task as a unit can be said to have completed or the task as a unit can be said to have failed. This allows for said task to be rolled back as a unit. It is also useful to be able to be able to accurately track exactly what tasks have been applied to a given workflow, which is inherently useful for correct status tracking (and is directly tied to how resumption is done). | + | Tasks that are created (either via code or other operation) must be atomic so that the task as a unit can be said to have completed or the task as a unit can be said to have failed. This allows for said task to be rolled back as a single unit. It is also useful to be able to be able to accurately track exactly what tasks have been applied to a given workflow, which is inherently useful for correct status tracking (and is directly tied to how resumption is done). |
===== How it will be addressed ===== | ===== How it will be addressed ===== | ||
Line 47: | Line 46: | ||
===== Why it matters ===== | ===== Why it matters ===== | ||
− | A workflow in nova can be typically organized into a series of smaller tasks/states which can be applied as a single unit (and rolled back as a single unit). Since we need to be able to run individual states in a trackable manner a concept of a chain of states (for the nova case) is needed to be able to run and rollback that set of tasks | + | A workflow in nova can be typically organized into a series of smaller tasks/states which can be applied as a single unit (and rolled back as a single unit). Since we need to be able to run individual states in a trackable manner (and we shouldn't duplicate this code in multiple places) a concept of a chain of states (for the nova case) is needed to be able to run and rollback that set of tasks as a group. |
===== How it will be addressed ===== | ===== How it will be addressed ===== | ||
The following was created to address the issue of running (and rolling back) multiple states as a single unit. | The following was created to address the issue of running (and rolling back) multiple states as a single unit. | ||
+ | |||
+ | '''Note''': to start a linear unit is fine (and thats what was created). | ||
<nowiki> | <nowiki> | ||
Line 62: | Line 63: | ||
# The order in which states will run is controlled by this - note its only linear. | # The order in which states will run is controlled by this - note its only linear. | ||
self.states = OrderedDict() | self.states = OrderedDict() | ||
− | self.results = OrderedDict() | + | self.results = OrderedDict() # Useful for accessing the result of a given state after running... |
# A parent set of states that will need to be rolled back if this one fails. | # A parent set of states that will need to be rolled back if this one fails. | ||
self.parents = parents | self.parents = parents |
Revision as of 01:22, 25 April 2013
Contents
Design details
In order to implement of this new orchestration layer the following key concepts must be built into the design from the start.
- Atomic task units.
- Combining atomic task units into a workflow.
- Task resumption.
- Task rollback.
- Task tracking.
- Resource locking.
- Workflow sharding/ownership.
- Simplicity (allowing for extension and verifiability).
- Tolerant to upgrades.
Atomic task units
Why it matters
Tasks that are created (either via code or other operation) must be atomic so that the task as a unit can be said to have completed or the task as a unit can be said to have failed. This allows for said task to be rolled back as a single unit. It is also useful to be able to be able to accurately track exactly what tasks have been applied to a given workflow, which is inherently useful for correct status tracking (and is directly tied to how resumption is done).
How it will be addressed
Tasks which previously were very unorganized in the run_instance path of nova will need to be refactored into clearly defined tasks (likely with an apply() method and a rollback() method). These tasks will be split up so that each task performs a clear single piece of work in an atomic manner (aka not one big task that does many different things) where possible. This will also help make testing of said task easier (since it will have a clear set of inputs and a clear set of expected outputs/side-effects, of which the rollback() method should undo).
For example this could be a task/state baseclass:
class State(base.Base): __metaclass__ = abc.ABCMeta def __init__(self): super(State, self).__init__() def __str__(self): return "State: %s" % (self.__class__.__name__) @abc.abstractmethod def apply(self, context, *args, **kwargs): raise NotImplementedError() def revert(self, context, result, chain, excp, cause): pass
Combining atomic tasks into a workflow
Why it matters
A workflow in nova can be typically organized into a series of smaller tasks/states which can be applied as a single unit (and rolled back as a single unit). Since we need to be able to run individual states in a trackable manner (and we shouldn't duplicate this code in multiple places) a concept of a chain of states (for the nova case) is needed to be able to run and rollback that set of tasks as a group.
How it will be addressed
The following was created to address the issue of running (and rolling back) multiple states as a single unit.
Note: to start a linear unit is fine (and thats what was created).
class StateChain(object): def __init__(self, name, tolerant=False, parents=None): # What states I have done that may need to be undone on failure. self.reversions = [] self.name = name self.tolerant = tolerant # The order in which states will run is controlled by this - note its only linear. self.states = OrderedDict() self.results = OrderedDict() # Useful for accessing the result of a given state after running... # A parent set of states that will need to be rolled back if this one fails. self.parents = parents # This functor is needed to be able to fetch results for states which have already occurred. # # It is required since we need to be able to rollback said state if it has already completed elsewhere # and one of the requirements we have put in place is that one rollback a state is given the result # it returned when its apply method was called. self.result_fetcher = None # This is a default tracker + other listeners which will always be notified when states start/complete/error. # # They are useful for altering some external system that should be notified when a state changes. self.change_tracker = None self.listeners = [] def __setitem__(self, name, performer): self.states[name] = performer def __getitem__(self, name): return self.results[name] def run(self, context, *args, **kwargs): for (name, performer) in self.states.items(): try: self._on_state_start(context, performer, name) # See if we have already ran this... (resumption!) result = None if self.result_fetcher: result = self.result_fetcher(context, name, self) if result is None: result = performer.apply(context, *args, **kwargs) # Keep a pristine copy of the result in the results table # so that if said result is altered by other further states # the one here will not be. # # Note: python is by reference objects, so someone else could screw with this, # which would be bad if we need to rollback and a result we created was modified by someone else... self.results[name] = copy.deepcopy(result) self._on_state_finish(context, performer, name, result) except Exception as ex: with excutils.save_and_reraise_exception(): try: self._on_state_error(context, name, ex) except: pass cause = (name, performer, (args, kwargs)) self.rollback(context, name, self, ex, cause) return self def _on_state_error(self, context, name, ex): if self.change_tracker: self.change_tracker(context, ERRORED, name, self) for i in self.listeners: i.notify(context, ERRORED, name, self, error=ex) def _on_state_start(self, context, performer, name): if self.change_tracker: self.change_tracker(context, STARTING, name, self) for i in self.listeners: i.notify(context, STARTING, name, self) def _on_state_finish(self, context, performer, name, result): # If a future state fails we need to ensure that we # revert the one we just finished. self.reversions.append((name, performer)) if self.change_tracker: self.change_tracker(context, COMPLETED, name, self, result=result.to_dict()) for i in self.listeners: i.notify(context, COMPLETED, name, self, result=result) def rollback(self, context, name, chain=None, ex=None, cause=None): if chain is None: chain = self for (i, (name, performer)) in enumerate(reversed(self.reversions)): try: performer.revert(context, self.results[name], chain, ex, cause) except excp.NovaException: # Ex: WARN: Failed rolling back stage 1 (validate_request) of # chain validation due to nova exception # WARN: Failed rolling back stage 2 (create_db_entry) of # chain init_db_entry due to nova exception msg = _("Failed rolling back stage %s (%s)" " of chain %s due to nova exception.") LOG.warn(msg, (i + 1), performer.name, self.name) if not self.tolerant: # This will log a msg AND re-raise the Nova exception if # the chain does not tolerate exceptions raise except Exception: # Ex: WARN: Failed rolling back stage 1 (validate_request) of # chain validation due to unknown exception # WARN: Failed rolling back stage 2 (create_db_entry) of # chain init_db_entry due to unknown exception msg = _("Failed rolling back stage %s (%s)" " of chain %s, due to unknown exception.") LOG.warn(msg, (i + 1), performer.name, self.name) if not self.tolerant: # Log a msg AND re-raise the generic Exception if the # Chain does not tolerate exceptions raise if self.parents: # Rollback any parents chains for p in self.parents: p.rollback(context, name, chain, ex, cause)