StructuredStateManagement

Summary

Move away from ad-hoc states and state transitions toward concrete, organized, structured state management across the various OpenStack projects (initial interest is in nova and heat).

What problems does this solve in general

Increases the [stability, extensibility, reliability] of nova/heat.
Makes it easier to [debug, test, understand, verify, review] nova/heat code.
Replaces hard-to-discover state-transition dependencies and interactions with clearly defined ones.
Ensures state transitions are done reliably and correctly by isolating those transitions to a single place.
Fixes a variety of problems that were previously addressed only with piecemeal patches.
Eliminates the inherent fragility of an ad-hoc workflow.
- Ad-hoc workflows are hard to debug, hard to verify, hard to adjust, hard to understand (just hard in general)...
Makes it possible to audit & track the state transitions performed on a given resource.
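
As a minimal sketch of what "isolating transitions to a single place" and "auditing transitions" could look like, the Python below keeps every allowed transition in one table and records each change in a history list. The state names, the `ALLOWED` table, and the `Resource` class are hypothetical illustrations, not nova's or heat's actual states or code.

```python
class InvalidTransition(Exception):
    """Raised when a requested state change is not in the transition table."""


# Hypothetical transition table: the single place that defines which
# state changes are legal. These are NOT nova's real instance states.
ALLOWED = {
    "RESERVED": {"CONFIGURING", "ERROR"},
    "CONFIGURING": {"ACTIVE", "ERROR"},
    "ACTIVE": {"DELETED", "ERROR"},
    "ERROR": {"DELETED"},
    "DELETED": set(),
}


class Resource:
    def __init__(self, name, state="RESERVED"):
        self.name = name
        self.state = state
        self.history = []  # audit trail of (old_state, new_state) pairs

    def transition(self, new_state):
        """The only method that is allowed to change self.state."""
        if new_state not in ALLOWED.get(self.state, set()):
            raise InvalidTransition(
                "%s: %s -> %s is not allowed" % (self.name, self.state, new_state)
            )
        self.history.append((self.state, new_state))
        self.state = new_state
```

Because every change goes through `transition`, an illegal move (for example `ACTIVE` back to `RESERVED`) fails loudly instead of silently leaving the resource in an unexpected state, and `history` can be inspected afterwards.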

What problems does this solve in nova

Removes the need for periodic tasks to clean up garbage (orphaned instances, orphaned resources...) left by nova's ad-hoc states.
Creates the path for smart resource scheduling.
Makes it possible to do [resizing, live migration] in a more secure and manageable manner.
- Discussions about how this can be done correctly indicate that an intermediary is required to orchestrate the ownership transfer.
Makes it possible for nova to have multi-stage booting, where an instance and its dependent resources are first reserved, the resources are configured, the instance is configured, and finally the instance is powered on (thus completing the instance provisioning process).
Addresses the underlying key point of http://www.slideshare.net/harlowja/nova-states-summit/9 where states will now be fully recovered from on cutting.

Issues that would likely not have happened with a better state management system

In nova

Plan of record

Create prototype
1. Create a library and use it in nova for the run_instance API.
Get feedback from others
1. Get feedback from summit session on said discussion and prototype.
2. Get more feedback from email list & heat folks about common library.
Adjust prototype as needed from feedback.
Split prototype+library into small chunks.
Adjust tests for each small chunk.
Start to submit chunks to http://review.openstack.org (disabling the whole component, or pieces of it, until ready to turn on?).

Prototype

https://github.com/Yahoo/NovaOrc

Design

Design details

In order to implement this new orchestration layer, the following key concepts must be built into the design from the start.

A set of atomic tasks that can be organized into a workflow.
Task resumption.
Task rollback.
Task tracking.
Resource locking.
Workflow sharding/ownership.
Simplicity (allowing for extension and verifiability).
Tolerant to upgrades.
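
To make the first few concepts above more concrete, here is a rough Python sketch of atomic tasks arranged into a workflow, with tracking (a journal of completed task names), resumption (tasks already in the journal are skipped), and rollback (completed tasks are undone in reverse order on failure). The `Task`, `FuncTask`, and `Workflow` names are hypothetical and are not taken from the prototype; resource locking, sharding, and upgrade tolerance are deliberately left out, and the in-memory journal stands in for whatever persistent store a real design would need.

```python
class Task:
    """One atomic unit of work with an optional undo step."""

    name = "task"

    def apply(self, context):
        raise NotImplementedError

    def rollback(self, context):
        pass


class FuncTask(Task):
    """Convenience wrapper that builds a task from plain functions."""

    def __init__(self, name, apply_fn, rollback_fn=None):
        self.name = name
        self._apply_fn = apply_fn
        self._rollback_fn = rollback_fn

    def apply(self, context):
        self._apply_fn(context)

    def rollback(self, context):
        if self._rollback_fn is not None:
            self._rollback_fn(context)


class Workflow:
    def __init__(self, tasks, journal=None):
        self.tasks = list(tasks)
        # Tracking: names of tasks that have completed. Passing in an
        # existing journal lets an interrupted workflow be resumed.
        self.journal = journal if journal is not None else []

    def run(self, context):
        done = []
        for task in self.tasks:
            if task.name in self.journal:
                done.append(task)  # resumption: already completed, skip
                continue
            try:
                task.apply(context)
            except Exception:
                # Rollback: undo completed tasks in reverse order.
                for finished in reversed(done):
                    finished.rollback(context)
                    self.journal.remove(finished.name)
                raise
            self.journal.append(task.name)
            done.append(task)
```

For example, a workflow made of a "reserve" task followed by a "power on" task that raises would call the reserve task's rollback before re-raising, so no reservation is left orphaned; this is the kind of cleanup that would otherwise fall to periodic tasks.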
