StructuredStateManagement
Revision as of 00:25, 25 April 2013
Summary
Move away from ad-hoc states and state transitions toward a more concrete, organized and structured state management system in the various openstack projects (the initial interest is in nova and heat).
What problems does this solve in general
- Increases the [stability, extensibility, reliability] of the various openstack projects.
- Makes it easier to [debug, test, understand, verify, review] the projects which have a workflow-like concept.
- Replaces hard-to-discover state-transition dependencies and interactions with clearly defined state-transition dependencies and interactions.
- Ensures state transitions are done reliably and correctly by isolating those transitions to a single place/entity.
- Fixes a variety of problems that previously had piecemeal patches applied in an attempt to solve them (which avoided fixing the larger problem).
- Eliminates the inherent fragility of the current ad-hoc workflows that exist in the openstack projects.
  - They are, by their ad-hoc nature, hard to debug, hard to verify, hard to adjust and hard to understand (just hard in general).
- Makes it possible to audit & track the state transitions performed on a given resource.
  - This kind of functionality has started to appear in nova, but the ad-hoc nature was preserved.
- Addresses the underlying key point of http://www.slideshare.net/harlowja/nova-states-summit/9, where states can now be fully recovered after cutting.
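As a rough illustration of isolating transitions to a single place and keeping an audit trail, the following minimal sketch keeps all allowed transitions in one table and records every change. The state names, the `TrackedResource` class and the `InvalidTransition` exception are hypothetical and are not taken from nova or from the prototype.

```python
from dataclasses import dataclass, field

# Hypothetical state names and transitions, chosen only for illustration.
ALLOWED_TRANSITIONS = {
    "RESERVED": {"CONFIGURED", "ERROR"},
    "CONFIGURED": {"ACTIVE", "ERROR"},
    "ACTIVE": {"DELETED", "ERROR"},
    "ERROR": {"DELETED"},
    "DELETED": set(),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the table."""


@dataclass
class TrackedResource:
    name: str
    state: str = "RESERVED"
    # Audit trail of (old_state, new_state) pairs, oldest first.
    history: list = field(default_factory=list)

    def transition(self, new_state):
        # The only place where the state attribute is ever changed.
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.state} -> {new_state}")
        self.history.append((self.state, new_state))
        self.state = new_state


resource = TrackedResource("instance-1")
resource.transition("CONFIGURED")
resource.transition("ACTIVE")
```

Because every change goes through `transition`, an undeclared jump such as `ACTIVE` to `RESERVED` fails loudly instead of silently leaving the resource in an unexpected state, and `history` can be inspected afterwards.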
What problems does this solve in nova (+ the general ones)
- Removes the need for periodic tasks to clean up garbage (orphaned instances, orphaned resources...) left by nova's ad-hoc states.
- Creates the path for smart resource scheduling.
- Makes it possible to do [resizing, live migration] in a more secure and manageable manner.
  - Discussions about how this can be done correctly indicate that an intermediary is required to orchestrate this ownership transfer.
- Makes it possible for nova to have multi-stage booting, where an instance and its dependent resources are first reserved, the resources configured, the instance configured, and then finally the instance is powered on (thus completing the instance provisioning process).
Issues that would likely not have happened with a better state management system
- https://blueprints.launchpad.net/nova/+spec/compute-instance-cleanup-service
- https://bugs.launchpad.net/nova/+bug/1050979
- https://bugs.launchpad.net/nova/+bug/1061024
- https://bugs.launchpad.net/nova/+bug/1082414
- ...
Connected blueprints
Connected wikis
https://wiki.openstack.org/wiki/Convection
Requirements
https://etherpad.openstack.org/task-system
Discussions
https://etherpad.openstack.org/the-future-of-orch
Plan of record
- Create prototype
  - Create a library and use it in nova for the run_instance API.
- Get feedback
  - Get feedback from the summit session on said discussion and prototype.
  - Get more feedback from the email list and the heat folks about a common library.
- Adjust nova prototype as needed from feedback.
- Split nova prototype into small chunks.
- Adjust tests for each small chunk (depending on what it changes).
- Start to submit chunks into http://review.openstack.org (disabling whole/pieces component until ready to turn on?).
Prototype
https://github.com/Yahoo/NovaOrc
Design
Design details
In order to implement this new orchestration layer, the following key concepts must be built into the design from the start.
- A set of atomic tasks that can be organized into a workflow.
- Task resumption.
- Task rollback.
- Task tracking.
- Resource locking.
- Workflow sharding/ownership.
- Simplicity (allowing for extension and verifiability).
- Tolerant to upgrades.
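To make the link between task tracking and task resumption in the list above concrete, here is a minimal sketch under stated assumptions. The helper `run_with_resume` and the step names are hypothetical; a real system would persist the list of completed steps durably rather than keep it in memory.

```python
def run_with_resume(steps, completed):
    """Run (name, func) steps in order, skipping names already in `completed`."""
    for name, func in steps:
        if name in completed:
            continue  # this step finished in an earlier run
        func()
        completed.append(name)  # assumption: stored durably in a real system


calls = []
steps = [
    ("reserve", lambda: calls.append("reserve")),
    ("configure", lambda: calls.append("configure")),
    ("power_on", lambda: calls.append("power_on")),
]

# Pretend an earlier run finished "reserve" and then crashed.
completed = ["reserve"]
run_with_resume(steps, completed)
```

Only the steps that were not yet recorded are executed on the second run, which is why accurate tracking is a precondition for safe resumption.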
Atomic tasks
Why it matters
Tasks that are created (either via code or some other operation) must be atomic, so that each task as a unit can be said to have either completed or failed. This allows a task to be rolled back as a unit. It is also useful to be able to accurately track exactly which tasks have been applied in a given workflow, which is needed for correct status tracking (and is directly tied to how resumption is done).
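The idea of tasks that complete or fail as a unit, and that can be rolled back as a unit, can be sketched as follows. This is an illustrative sketch only, not the API of the prototype: the `Task` base class, the two example tasks, the `run_workflow` helper and the address value are all hypothetical. It also assumes that a task which raises in `apply` has left nothing behind, so only previously applied tasks are reverted.

```python
class Task:
    """A unit of work that either fully applies or can be fully reverted."""

    name = "task"

    def apply(self, context):
        raise NotImplementedError

    def revert(self, context):
        raise NotImplementedError


class ReserveAddress(Task):
    name = "reserve_address"

    def apply(self, context):
        context["address"] = "10.0.0.5"  # made-up value for illustration

    def revert(self, context):
        context.pop("address", None)


class FailingBoot(Task):
    name = "boot"

    def apply(self, context):
        raise RuntimeError("boot failed")

    def revert(self, context):
        pass  # nothing to undo, apply never changes the context


def run_workflow(tasks, context, log):
    """Apply tasks in order; on failure, revert applied tasks in reverse."""
    applied = []
    for task in tasks:
        try:
            task.apply(context)
        except Exception:
            log.append((task.name, "FAILED"))
            for done in reversed(applied):
                done.revert(context)
                log.append((done.name, "REVERTED"))
            raise
        applied.append(task)
        log.append((task.name, "APPLIED"))


context, log = {}, []
try:
    run_workflow([ReserveAddress(), FailingBoot()], context, log)
except RuntimeError:
    pass  # the workflow failed, but its effects were rolled back
```

After the failed run, `context` is empty again and `log` records which tasks were applied, which one failed and which were reverted. That record is the kind of tracking information that status reporting and resumption would rely on.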
How it will be addressed
A general purpose library will need to have functionality