Difference between revisions of "StructuredStateManagement"

Revision as of 21:50, 22 April 2013

Summary

Move away from ad-hoc states and state transitions toward a more concrete, organized, and structured state management system in nova.

What problems does this solve

Increases the stability, extensibility, and reliability of nova.
Makes it easier to debug, test, understand, verify, and review nova code.
Replaces hard-to-discover state-transition dependencies and interactions with clearly defined ones.
Ensures state transitions are done reliably and correctly by isolating those transitions to a single place.
Removes the need for periodic tasks to clean up garbage left by nova's ad-hoc states.
Fixes a variety of problems that previously had piecemeal patches applied.
Eliminates the inherent fragility of an ad-hoc workflow.
Creates the path for smart resource scheduling.
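
As a rough illustration of isolating state transitions to a single place, the sketch below keeps every legal transition in one table and routes all changes through one function. The state names, the ALLOWED table, and the transition() helper are hypothetical and chosen only for this example; they are not taken from nova.

```python
from enum import Enum


class State(Enum):
    # Hypothetical states, for illustration only.
    BUILDING = "building"
    ACTIVE = "active"
    ERROR = "error"
    DELETED = "deleted"


# Every legal transition is declared here and nowhere else.
ALLOWED = {
    State.BUILDING: {State.ACTIVE, State.ERROR},
    State.ACTIVE: {State.ERROR, State.DELETED},
    State.ERROR: {State.DELETED},
    State.DELETED: set(),
}


class InvalidTransition(Exception):
    """Raised when a transition is not declared in ALLOWED."""


def transition(current, target):
    """Return the new state, or raise if the move is not declared."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(
            "%s -> %s is not allowed" % (current.name, target.name))
    return target
```

With a table like this, a reviewer can check the full set of transitions by reading one dictionary, and an undeclared transition fails loudly instead of leaving a resource in an unexpected state.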

Issues that would likely not have happened with a better state management system

Examples include:

Varying exceptions
MQ timeouts
DB timeouts
WS call timeouts

Bugs/blueprints that likely would not be needed:

Requirements

https://etherpad.openstack.org/task-system

Discussions

https://etherpad.openstack.org/the-future-of-orch

Plan of record

Create prototype.
Get feedback from summit session.
Get more feedback from the mailing list and the heat folks about a common library.
Adjust prototype as needed from feedback.
Split prototype into small chunks.
Adjust tests for each small chunk.
Start to submit prototype chunks into http://review.openstack.org (disabling the whole component, or pieces of it, until ready to turn on?).

Design

Design details

In order to implement this new orchestration layer, the following key concepts must be built into the design from the start.

A set of atomic tasks that can be organized into a workflow.
Task resumption.
Task rollback.
Task tracking.
Resource locking.
Workflow sharding/ownership.
Simplicity (allowing for extension and verifiability).
Tolerant to upgrades.
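
A minimal sketch of how the first few concepts in the list above (atomic tasks, rollback, and tracking) could fit together. Every name here (Task, Workflow, ReservePort, FailingBoot, the context dictionary) is hypothetical and exists only for illustration; this is not the prototype's actual design.

```python
class Task(object):
    """An atomic unit of work that knows how to undo itself."""

    name = "task"

    def apply(self, context):
        raise NotImplementedError

    def rollback(self, context):
        pass


class ReservePort(Task):
    name = "reserve_port"

    def apply(self, context):
        context["port"] = "port-1"

    def rollback(self, context):
        context.pop("port", None)


class FailingBoot(Task):
    name = "boot"

    def apply(self, context):
        raise RuntimeError("hypervisor unavailable")


class Workflow(object):
    """Runs tasks in order; on failure, rolls back completed tasks in reverse."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.log = []  # task tracking: (task name, status) pairs

    def run(self, context):
        done = []
        for task in self.tasks:
            try:
                task.apply(context)
            except Exception:
                self.log.append((task.name, "failed"))
                for finished in reversed(done):
                    finished.rollback(context)
                    self.log.append((finished.name, "rolled_back"))
                raise
            done.append(task)
            self.log.append((task.name, "applied"))
```

Running Workflow([ReservePort(), FailingBoot()]) against an empty context re-raises the boot error after the reserved port has been released, and the log records each step. Persisting that log instead of keeping it in memory would be one possible starting point for task resumption, though that is an assumption rather than something stated in this page.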

Atomic tasks

Why it matters

Contents

Summary

What problems does this solve

Issues that would likely not have happened with a better state management system

Requirements

Discussions

Plan of record

Design

Design details

Atomic tasks

Why it matters

How it will be addressed

Task resumption

Why it matters

How it will be addressed

Task rollback

Why it matters

How it will be addressed

Task tracking

Why it matters

How it will be addressed

Resource locking

Why it matters

How it will be addressed

Workflow sharding/ownership

Why it matters

How it will be addressed

Simplicity

Why it matters

How it will be addressed

Tolerant to upgrades

Why it matters

How it will be addressed