StructuredStateManagement
Summary
Move nova away from ad-hoc states and state transitions toward a more concrete, organized, structured approach to state management.
What problems does this solve
- Increases the stability, extensibility, and reliability of nova.
- Makes nova code easier to test, understand, verify, and review.
- Ensures state transitions are done reliably and correctly by isolating those transitions to a single place.
- Removes the need for periodic tasks to clean up garbage left behind by nova's ad-hoc states.
- Fixes a variety of problems that were previously addressed only with piecemeal patches.
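As a rough illustration of what "isolating those transitions to a single place" could look like, here is a minimal sketch of a transition table. The state names, the `ALLOWED` table, and the `transition` helper are all hypothetical and chosen for the example; they are not nova's actual states or API.

```python
from enum import Enum


class State(Enum):
    # Hypothetical states, for illustration only.
    BUILDING = "building"
    ACTIVE = "active"
    ERROR = "error"
    DELETED = "deleted"


# The single place where legal transitions are declared.
ALLOWED = {
    State.BUILDING: {State.ACTIVE, State.ERROR},
    State.ACTIVE: {State.ERROR, State.DELETED},
    State.ERROR: {State.DELETED},
    State.DELETED: set(),
}


class InvalidTransition(Exception):
    """Raised when a transition is not listed in ALLOWED."""


def transition(current, target):
    """Return target if current -> target is allowed, otherwise raise."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(
            f"{current.name} -> {target.name} is not allowed")
    return target
```

With a table like this, a reviewer can check every legal transition in one place, and code that attempts an unlisted transition fails immediately instead of leaving an inconsistent state behind.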
Issues that would likely not have happened with a better state management system
Examples include:
- Varying exceptions
- MQ timeouts
- DB timeouts
- WS call timeouts
Bugs/blueprints that likely would not be needed:
- https://blueprints.launchpad.net/nova/+spec/compute-instance-cleanup-service
- https://bugs.launchpad.net/nova/+bug/1050979
- https://bugs.launchpad.net/nova/+bug/1061024
- https://bugs.launchpad.net/nova/+bug/1082414
- ...
Requirements
https://etherpad.openstack.org/task-system
Discussions
https://etherpad.openstack.org/the-future-of-orch
Plan of record
- Create prototype.
- Get feedback from summit session.
- Get more feedback from the mailing list and the heat folks about a common library.
- Adjust prototype as needed from feedback.
- Split prototype into small chunks.
- Adjust tests for each small chunk.
- Start submitting prototype chunks to http://review.openstack.org (disabling the whole component, or pieces of it, until ready to turn on?).
Design
Design details
To implement this new orchestration layer, the following key concepts must be built into the design from the start.
- A set of atomic tasks that can be organized into a workflow.
- Task resumption.
- Task rollback.
- Task tracking.
- Resource locking.
- Workflow sharding/ownership.
- Simplicity (allowing for extension and verifiability).
- Tolerance of upgrades.
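The first few concepts in the list above (atomic tasks in a workflow, rollback, and tracking) can be sketched as follows. This is a minimal sketch under stated assumptions, not the prototype itself: the `Task` and `Workflow` classes, the `journal` list, and the task names are hypothetical. Resumption, resource locking, and sharding are not covered here.

```python
class Task:
    """One atomic unit of work that also knows how to undo itself."""

    def __init__(self, name, apply, revert):
        self.name = name
        self.apply = apply    # callable(context) that performs the work
        self.revert = revert  # callable(context) that undoes the work


class Workflow:
    """Runs tasks in order; on failure, reverts completed tasks in reverse."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.journal = []  # task tracking: (task name, status) in order

    def run(self, context):
        done = []
        for task in self.tasks:
            try:
                task.apply(context)
            except Exception:
                self.journal.append((task.name, "failed"))
                # Task rollback: undo what already succeeded, newest first.
                for prev in reversed(done):
                    prev.revert(context)
                    self.journal.append((prev.name, "reverted"))
                raise
            done.append(task)
            self.journal.append((task.name, "done"))
        return context


def fail(context):
    # Stand-in for a failure such as an MQ timeout (hypothetical).
    raise RuntimeError("simulated MQ timeout")


ctx = {}
flow = Workflow([
    Task("allocate_network",
         lambda c: c.update(network="net-1"),
         lambda c: c.pop("network", None)),
    Task("spawn_instance", fail, lambda c: None),
])
try:
    flow.run(ctx)
except RuntimeError:
    pass  # ctx is empty again; flow.journal records what happened
```

Because each task carries its own revert step, a failure part of the way through leaves no half-built resources behind, which is the kind of garbage the periodic cleanup tasks exist to collect today. Persisting the journal somewhere durable would be one possible basis for task resumption, though that is an assumption rather than something the design above specifies.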