Nova-vm-state-management
Latest revision as of 23:30, 17 February 2013
- Launchpad Entry: Improve VM State Management to constrain state transitions
- Created: 12 Oct 2011
- Contributors: Phil Day (HP Cloud Services)
Summary
This blueprint would constrain the valid state transitions to a limited subset, and ensure that the remaining transitions lead to consistent and deterministic behavior.
Specifically:
- Limit the valid operations in each state (for example, only a paused instance can be resumed)
- Make some minor changes to the state sequence to make the above robust
- Ensure that long-running operations check the current state rather than assuming it is unchanged
Rationale
Current checks on valid state transitions are limited to a few cases, leading to multiple opportunities for non-deterministic behavior. In addition, some long-running tasks can lead to odd behavior: for example, a VM in the building state can spend a long time downloading its image, be terminated in the meantime, and still go ahead and launch when the download completes.
Design
VM State is recorded in three instance attributes:
- "power_state": derived from the hypervisor
- "vm_state": changed by Nova code, generally at the start and end of main actions
- "task_state": changed by Nova code to reflect transient steps within an action
For example, the following shows how these state values are updated during a Create action:
| Node | power_state | vm_state |
| API | | Building |
| Scheduler | | Building |
| Compute | | Building |
| | | Building |
| | | Building |
| | Running | Active |
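As a minimal sketch of the Create sequence above, the three attributes can be modelled as a small record that each node updates in turn. The `InstanceState` class and `simulate_create` helper are hypothetical, and the intermediate task_state names (scheduling, networking, block_device_mapping, spawning) are assumptions loosely modelled on Nova's task states rather than values taken from the table.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass
class InstanceState:
    """Hypothetical stand-in for the three state attributes of an instance."""
    power_state: Optional[str] = None  # derived from the hypervisor
    vm_state: Optional[str] = None     # set at the start and end of main actions
    task_state: Optional[str] = None   # set for transient steps within an action


def simulate_create() -> List[Tuple[str, InstanceState]]:
    """Return a (node, snapshot) pair after each step of a Create action."""
    state = InstanceState()
    history: List[Tuple[str, InstanceState]] = []

    def step(node: str, **changes: Optional[str]) -> None:
        nonlocal state
        # replace() returns a new object, so earlier snapshots stay unchanged.
        state = replace(state, **changes)
        history.append((node, state))

    step("API", vm_state="building", task_state="scheduling")
    step("Scheduler")  # no state change in this sketch
    # Assumed intermediate task_state values while the compute node works.
    step("Compute", task_state="networking")
    step("Compute", task_state="block_device_mapping")
    step("Compute", task_state="spawning")
    # The hypervisor reports the VM as running and the action completes.
    step("Compute", power_state="running", vm_state="active", task_state=None)
    return history


for node, snapshot in simulate_create():
    print(f"{node:<10} {snapshot}")
```

Only the final step touches power_state, mirroring the table: vm_state stays at Building throughout, and task_state is cleared when the action finishes.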
The full set of state transitions will be mapped out and provided back to the documentation team. From those already mapped, we can make the following observations:
- Most actions set vm_state and task_state early (in compute/api.py), so an in-progress action can be detected by task_state != None
- Most actions clear task_state on completion, so many actions can be checked using a combination of vm_state and task_state == None
- At least one action (terminate) must always remain valid
- Long-running actions (such as image download) should periodically update task_state so users can tell that progress is being made
- Long-running actions should check for and honour state changes (specifically, termination)
- The reported state should be a combination of vm_state and task_state
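The observations above can be illustrated with a short sketch, assuming an instance record is a plain dict with `vm_state` and `task_state` keys. The helpers `is_in_progress`, `reported_state` and `download_image`, the `InstanceDeleted` exception and the "vm_state:task_state" display format are all hypothetical; the point is that the long-running step re-reads the state on every iteration and publishes its progress through task_state.

```python
from typing import Callable, Dict, Optional

Instance = Dict[str, Optional[str]]  # hypothetical record: vm_state, task_state


class InstanceDeleted(Exception):
    """Raised when a long-running action finds the instance was terminated."""


def is_in_progress(instance: Instance) -> bool:
    # Actions set task_state when they start and clear it when they finish.
    return instance["task_state"] is not None


def reported_state(instance: Instance) -> str:
    # Hypothetical display format combining vm_state and task_state.
    if instance["task_state"] is None:
        return str(instance["vm_state"])
    return f"{instance['vm_state']}:{instance['task_state']}"


def download_image(instance: Instance, total_chunks: int,
                   on_chunk: Optional[Callable[[int], None]] = None) -> int:
    """Simulated image download that reports progress and honours termination."""
    for chunk in range(1, total_chunks + 1):
        # Re-read the state each time instead of assuming it is unchanged.
        if instance["vm_state"] == "deleted":
            raise InstanceDeleted("instance terminated during image download")
        # (Fetching one chunk of the image would happen here.)
        # Publish progress through task_state so users can see work continuing.
        instance["task_state"] = f"downloading {chunk}/{total_chunks}"
        if on_chunk is not None:
            on_chunk(chunk)
    return total_chunks


instance: Instance = {"vm_state": "building", "task_state": "spawning"}


def terminate_after_two(chunk: int) -> None:
    if chunk == 2:
        instance["vm_state"] = "deleted"  # simulates a concurrent terminate


try:
    download_image(instance, total_chunks=5, on_chunk=terminate_after_two)
except InstanceDeleted as exc:
    print("aborted:", exc, "| reported state:", reported_state(instance))
```

Checking at the top of each iteration bounds how long a terminated instance can keep consuming work to a single chunk, which is the race described in the Rationale.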
The initial proposal for valid transitions is as follows:
| vm_state | task_state |
| | != None |
| Active | Resize_verify |
| Active | None |
| Building | |
| Rebuilding | |
| Paused | |
| Suspended | |
| Rescued | |
| Deleted | |
| Stopped | |
| Migrating | |
| Resizing | |
| Error | |
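One way to express such a table in code is a lookup keyed by (vm_state, task_state), with any other non-None task_state falling back to the "!= None" row. This is a sketch under stated assumptions: the `ALLOWED` map, the `allowed_actions` function and the action names in each set are illustrative and do not reproduce the proposal's per-action columns; only the rule that terminate (`delete`) always remains available comes from the observations above.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Illustrative action sets keyed by (vm_state, task_state); these are
# assumptions, not the proposal's actual per-action columns.
ALLOWED: Dict[Tuple[str, Optional[str]], FrozenSet[str]] = {
    ("active", None): frozenset({"reboot", "resize", "snapshot", "delete"}),
    ("active", "resize_verify"): frozenset({"confirm_resize", "revert_resize", "delete"}),
    ("rescued", None): frozenset({"unrescue", "delete"}),
    ("error", None): frozenset({"delete"}),
}

# Row "task_state != None": while any other action is in progress,
# only terminate is accepted.
IN_PROGRESS: FrozenSet[str] = frozenset({"delete"})


def allowed_actions(vm_state: str, task_state: Optional[str]) -> FrozenSet[str]:
    # 1. An exact row wins, e.g. ("active", "resize_verify").
    exact = ALLOWED.get((vm_state, task_state))
    if exact is not None:
        return exact
    # 2. Any other in-progress action falls back to the "!= None" row.
    if task_state is not None:
        return IN_PROGRESS
    # 3. Idle states missing from this sketch still permit terminate.
    return frozenset({"delete"})


print(sorted(allowed_actions("active", "resize_verify")))
print(sorted(allowed_actions("building", "spawning")))
```

Checking the exact pair first lets Active/Resize_verify be treated as a state in its own right even though its task_state is set.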
UI Changes
No changes are required to the UI.
Code Changes
The checks for valid actions will be implemented as a decorator, for example:
```python
# check_vm_state is applied outermost, so the state check runs before rerouting.
@check_vm_state("delete")
@scheduler_api.reroute_compute("delete")
def delete(self, context, instance_id):
    """Terminate an instance."""
```
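A minimal sketch of how `check_vm_state` might be implemented, assuming the decorated method receives `(self, context, instance_id)` and that the API object exposes a `get(context, instance_id)` lookup returning a dict. `InstanceInvalidState`, `_ALLOWED`, `_ALWAYS_ALLOWED` and `FakeComputeAPI` are hypothetical names, and `scheduler_api.reroute_compute` is omitted so the example runs on its own.

```python
import functools
from typing import Any, Callable, Dict, FrozenSet, Optional, Tuple


class InstanceInvalidState(Exception):
    """Raised when an action is not permitted in the instance's current state."""


# Hypothetical allowed actions keyed by (vm_state, task_state).
_ALLOWED: Dict[Tuple[str, Optional[str]], FrozenSet[str]] = {
    ("active", None): frozenset({"reboot", "delete"}),
    ("error", None): frozenset({"delete"}),
}
# Fallback for every other state, including any in-progress task_state:
# terminate must always remain possible.
_ALWAYS_ALLOWED: FrozenSet[str] = frozenset({"delete"})


def check_vm_state(action: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator factory rejecting `action` unless the current state allows it."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(self: Any, context: Any, instance_id: int,
                    *args: Any, **kwargs: Any) -> Any:
            # Read the state at call time rather than trusting a cached copy.
            instance = self.get(context, instance_id)
            key = (instance["vm_state"], instance["task_state"])
            if action not in _ALLOWED.get(key, _ALWAYS_ALLOWED):
                raise InstanceInvalidState(
                    f"cannot {action} instance {instance_id} in state {key}")
            return func(self, context, instance_id, *args, **kwargs)
        return wrapper
    return decorator


class FakeComputeAPI:
    """Hypothetical stand-in for the compute API, backed by a dict."""

    def __init__(self, instances: Dict[int, Dict[str, Optional[str]]]) -> None:
        self._instances = instances

    def get(self, context: Any, instance_id: int) -> Dict[str, Optional[str]]:
        return self._instances[instance_id]

    @check_vm_state("delete")
    def delete(self, context: Any, instance_id: int) -> None:
        """Terminate an instance."""
        self._instances[instance_id]["vm_state"] = "deleted"

    @check_vm_state("reboot")
    def reboot(self, context: Any, instance_id: int) -> None:
        """Start a reboot; task_state stays set until it completes."""
        self._instances[instance_id]["task_state"] = "rebooting"


api = FakeComputeAPI({1: {"vm_state": "active", "task_state": None}})
api.reboot(None, 1)       # allowed: the instance is active and idle
try:
    api.reboot(None, 1)   # rejected: a reboot is already in progress
except InstanceInvalidState as exc:
    print(exc)
api.delete(None, 1)       # terminate is still accepted
```

Because the check is the outermost decorator, an invalid request fails fast in the API layer, before any rerouting or compute work is started.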
Some other changes may be required to ensure that vm_state and task_state are set consistently (for example, task_state is currently set to None for a short period during Rebuild, and live_migration doesn't update state at all).
Migration
TBD
Test/Demo Plan
TBD
BoF agenda and discussion
Etherpad from Boston Design Summit