* Launchpad Entry: Improve VM State Management to constrain state transitions

Summary

This blueprint would constrain the valid state transitions to a limited subset, and ensure that the remaining transitions lead to consistent and deterministic behavior.

Specifically:

  1. Limit the valid operations in each state (for example, a paused instance can only be unpaused, rescued, or terminated)
  2. Make some minor changes to the state sequence to make the above robust
  3. Ensure that long-running operations check the current state rather than assuming it is unchanged

Rationale

Currently, checks on valid state transitions are limited to a few cases, which leaves multiple opportunities for non-deterministic behavior. In addition, some long-running tasks can lead to odd behavior: for example, a VM in the Building state can spend a long time downloading its image, be terminated in the meantime, and then be launched anyway once the download completes.
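The Building race described above can be narrowed by having the long-running step re-read the instance state before acting on its result. A minimal sketch, assuming a plain dict stands in for the instance table; `build_instance`, `InstanceNotBuilding`, and the callback parameters are hypothetical names, not Nova's API:

```python
class InstanceNotBuilding(Exception):
    """Hypothetical error: the instance left the Building state mid-operation."""


def build_instance(instances, instance_id, download_image, spawn):
    """Sketch of a long-running build that re-checks state before launching.

    `instances` is a plain dict standing in for the instance table (an
    assumption for illustration, not Nova's data model).
    """
    instances[instance_id]["task_state"] = "Spawning"
    image = download_image(instance_id)  # potentially slow step

    # Re-read the current state rather than assuming it is unchanged:
    # a Terminate may have arrived while the image was downloading.
    record = instances.get(instance_id)
    if record is None or record["vm_state"] != "Building":
        raise InstanceNotBuilding(instance_id)

    spawn(instance_id, image)
    record.update(power_state="Running", vm_state="Active", task_state=None)


# Usage: a Terminate lands while the image is still downloading.
instances = {"i-1": {"power_state": None, "vm_state": "Building",
                     "task_state": "Block_Device_Mapping"}}
launched = []


def slow_download(instance_id):
    instances[instance_id]["vm_state"] = "Deleted"  # simulated concurrent Terminate
    return "image-bytes"


try:
    build_instance(instances, "i-1", slow_download,
                   lambda iid, image: launched.append(iid))
except InstanceNotBuilding:
    print("build aborted: instance was terminated during the download")
```

The re-check narrows the window but does not close it on its own: a Terminate arriving between the check and `spawn` would still need to be serialized against the launch, for example by a lock or a conditional database update.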

Design

VM state is recorded in three instance attributes:

  * power_state: derived from the hypervisor
  * vm_state: changed by Nova code, generally at the start and end of main actions
  * task_state: changed by Nova code to reflect transient steps within an action
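The three attributes can be pictured as one small record per instance. A sketch only: `InstanceState` and `is_busy` are hypothetical names, and the state values are plain strings chosen for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceState:
    """The three per-instance state attributes (illustrative sketch)."""

    # Derived from the hypervisor's report of the VM.
    power_state: Optional[str] = None
    # Changed by Nova code, generally at the start and end of a main action.
    vm_state: str = "Building"
    # Changed by Nova code for transient steps within an action; None when idle.
    task_state: Optional[str] = None

    def is_busy(self) -> bool:
        """True while an action has a transient step in progress."""
        return self.task_state is not None
```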

For example, the following table shows how these state values are updated during a Create action:

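Node      | power_state | vm_state | task_state
----------+-------------+----------+---------------------
API       |             | Building | Scheduling
Scheduler |             | Building | Scheduling
Compute   |             | Building | Networking
          |             | Building | Block_Device_Mapping
          |             | Building | Spawning
          | Running     | Active   |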
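Encoding the Create table as data makes the division of labor easy to check: vm_state changes once, at the end of the action, while task_state steps through the transient phases. A sketch; `CREATE_SEQUENCE`, `vm_state_changes`, and `task_steps` are hypothetical names, and empty cells are represented as None:

```python
# (power_state, vm_state, task_state) for each row of the Create table;
# None stands for an empty cell.
CREATE_SEQUENCE = [
    (None, "Building", "Scheduling"),
    (None, "Building", "Scheduling"),
    (None, "Building", "Networking"),
    (None, "Building", "Block_Device_Mapping"),
    (None, "Building", "Spawning"),
    ("Running", "Active", None),
]


def vm_state_changes(sequence):
    """Return the row indices at which vm_state differs from the previous row."""
    return [i for i in range(1, len(sequence))
            if sequence[i][1] != sequence[i - 1][1]]


def task_steps(sequence):
    """Return the task_state values in order, collapsing repeats and skipping None."""
    steps = []
    for _, _, task in sequence:
        if task is not None and (not steps or steps[-1] != task):
            steps.append(task)
    return steps
```

Here `vm_state_changes(CREATE_SEQUENCE)` returns `[5]`: the only vm_state change is the final move from Building to Active.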

The full set of state transitions will be mapped out and provided to the documentation team. Based on the transitions mapped so far, the initial proposal for the valid actions in each state is as follows:

vm_state   | task_state    | Valid Actions
-----------+---------------+--------------------------------------------------------------
<Any>      | !=None        | Terminate
Active     | Resize_verify | Terminate, Reboot, Stop, Rebuild, Pause, Suspend, Rescue, Create_Snapshot, Resize, Confirm_Resize, Revert_Resize
Active     | None          | Terminate, Reboot, Stop, Rebuild, Pause, Suspend, Rescue, Create_Snapshot, Resize
Building   | <Any>         | Terminate
ReBuilding | <Any>         | Terminate
Paused     | <Any>         | Terminate, Unpause, Rescue
Suspended  | <Any>         | Terminate, Resume, Rescue
Rescued    | <Any>         | Terminate, Reboot, Stop, Rebuild, Pause, Suspend, UnRescue
Deleted    | <Any>         | Terminate
Stopped    | <Any>         | Terminate, Start
Migrating  | <Any>         | Terminate
Resizing   | <Any>         | Terminate
Error      | <Any>         | Terminate
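The table can be expressed as a lookup for the state checks to consult. Its rows overlap (Active/Resize_verify has a non-None task_state, and the <Any> rows also cover states with a task in progress), so this sketch assumes a precedence order: an exact (vm_state, task_state) row wins, then any in-progress task restricts the actions to Terminate, then the vm_state's <Any> row applies. That ordering, and the names `VALID_ACTIONS` and `allowed_actions`, are assumptions rather than part of the proposal:

```python
ANY = "<Any>"  # wildcard task_state, as in the table's <Any> cells
TERMINATE_ONLY = frozenset({"Terminate"})

_ACTIVE = frozenset({
    "Terminate", "Reboot", "Stop", "Rebuild", "Pause", "Suspend",
    "Rescue", "Create_Snapshot", "Resize",
})

# (vm_state, task_state) -> valid actions; None means no task in progress.
VALID_ACTIONS = {
    ("Active", "Resize_verify"): _ACTIVE | {"Confirm_Resize", "Revert_Resize"},
    ("Active", None): _ACTIVE,
    ("Building", ANY): TERMINATE_ONLY,
    ("ReBuilding", ANY): TERMINATE_ONLY,
    ("Paused", ANY): frozenset({"Terminate", "Unpause", "Rescue"}),
    ("Suspended", ANY): frozenset({"Terminate", "Resume", "Rescue"}),
    ("Rescued", ANY): frozenset({"Terminate", "Reboot", "Stop", "Rebuild",
                                 "Pause", "Suspend", "UnRescue"}),
    ("Deleted", ANY): TERMINATE_ONLY,
    ("Stopped", ANY): frozenset({"Terminate", "Start"}),
    ("Migrating", ANY): TERMINATE_ONLY,
    ("Resizing", ANY): TERMINATE_ONLY,
    ("Error", ANY): TERMINATE_ONLY,
}


def allowed_actions(vm_state, task_state):
    """Return the actions permitted for an instance in the given state."""
    # 1. An exact row wins, e.g. Active/Resize_verify.
    exact = VALID_ACTIONS.get((vm_state, task_state))
    if exact is not None:
        return exact
    # 2. Any other task in progress (task_state != None) allows only Terminate.
    if task_state is not None:
        return TERMINATE_ONLY
    # 3. Otherwise use the vm_state's <Any> row; unknown states allow nothing.
    return VALID_ACTIONS.get((vm_state, ANY), frozenset())
```

Under this reading, `allowed_actions("Paused", None)` yields Terminate, Unpause, and Rescue, while a paused instance with a task still in progress may only be terminated.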

UI Changes

No changes are required to the UI.

Code Changes

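The checks for valid actions will be implemented as a decorator, for example:

@check_vm_state("delete")
@scheduler_api.reroute_compute("delete")
def delete(self, context, instance_id):
    """Terminate an instance."""
    # check_vm_state rejects the call unless the instance's current
    # vm_state/task_state permits the "delete" (Terminate) action.
    ...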
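One possible shape for the `check_vm_state` decorator, assuming the compute API object exposes a hypothetical `get_state(context, instance_id)` that returns the current `(vm_state, task_state)` pair. The policy table here is deliberately abbreviated and uses the lowercase action names of the example above (`delete` for Terminate); `InvalidStateError`, `ALLOWED_WHEN_IDLE`, and `FakeComputeAPI` are illustrative names only:

```python
import functools


class InvalidStateError(Exception):
    """Hypothetical error: the action is not valid in the current state."""


# Abbreviated, hypothetical policy: vm_state -> actions allowed when no task
# is in progress. A real table would cover every row of the proposal.
ALLOWED_WHEN_IDLE = {
    "Active": {"delete", "reboot", "pause"},
    "Paused": {"delete", "unpause", "rescue"},
}


def check_vm_state(action):
    """Decorator factory: reject `action` unless the current state allows it."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, context, instance_id, *args, **kwargs):
            # Read the state at call time rather than trusting a cached copy.
            vm_state, task_state = self.get_state(context, instance_id)
            if task_state is not None:
                allowed = {"delete"}  # only Terminate while a task is running
            else:
                allowed = ALLOWED_WHEN_IDLE.get(vm_state, {"delete"})
            if action not in allowed:
                raise InvalidStateError(
                    f"cannot {action} instance {instance_id} "
                    f"in state {vm_state}/{task_state}")
            return func(self, context, instance_id, *args, **kwargs)
        return wrapper
    return decorator


class FakeComputeAPI:
    """Stand-in for the compute API; states map id -> (vm_state, task_state)."""

    def __init__(self, states):
        self.states = states

    def get_state(self, context, instance_id):
        return self.states[instance_id]

    @check_vm_state("unpause")
    def unpause(self, context, instance_id):
        """Unpause an instance."""
        return "unpaused"
```

With `FakeComputeAPI({"i-1": ("Paused", None)})`, calling `unpause(None, "i-1")` returns `"unpaused"`; the same call on an Active instance, or on a paused instance with a task in progress, raises `InvalidStateError`. Keeping the check in a decorator leaves each API method's body unchanged and puts the policy in one place.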

Some other changes may be required to ensure that vm_state and task_state are set consistently (for example, task_state is currently set to None for a short period during Rebuild, and live_migration doesn't update the state at all).
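The Rebuild gap can be avoided by writing vm_state and task_state together, so that no reader observes an intermediate combination such as ReBuilding with task_state None. A minimal sketch using a toy, lock-protected record; in a deployment backed by a database, the analogous step would be one update of both columns. `InstanceRecord`, `transition`, and the task_state value `rebuild_spawning` are hypothetical:

```python
import threading


class InstanceRecord:
    """Toy instance record (hypothetical) whose state fields change together."""

    def __init__(self, vm_state, task_state=None):
        self._lock = threading.Lock()
        self.vm_state = vm_state
        self.task_state = task_state
        # Every (vm_state, task_state) pair a reader could have observed.
        self.history = [(vm_state, task_state)]

    def transition(self, vm_state, task_state):
        # Set both fields in one locked step instead of two separate writes.
        with self._lock:
            self.vm_state = vm_state
            self.task_state = task_state
            self.history.append((vm_state, task_state))

    def snapshot(self):
        """Return a consistent (vm_state, task_state) pair."""
        with self._lock:
            return self.vm_state, self.task_state


# A Rebuild that moves directly between states, never passing through
# ("ReBuilding", None). "rebuild_spawning" is a hypothetical task_state.
record = InstanceRecord("Active")
record.transition("ReBuilding", "rebuild_spawning")
record.transition("Active", None)
```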

Migration

TBD

Test/Demo Plan

TBD

BoF agenda and discussion

Etherpad from Boston Design Summit



Wiki: nova-vm-state-management (last edited 2011-11-04 15:52:19 by markmc)