
Nova-vm-state-management

Revision as of 14:52, 12 October 2011 by Philday (talk)

Summary

This blueprint would constrain the valid state transitions to a limited subset, and ensure that the remaining transitions lead to consistent and deterministic behavior.

Specifically:

  1. Limit the valid operations in each state (for example, only a paused instance can be resumed)
  2. Make some minor changes to the state sequence to make the above robust
  3. Ensure that long-running operations check the current state rather than assuming it is unchanged

Rationale

Current checks on valid state transitions are limited to a few cases, which leaves multiple opportunities for non-deterministic behavior. In addition, some long-running tasks can lead to odd behavior. For example, a VM in the Building state can spend a long time in image download, be terminated in the meantime, and still be launched when the image download completes.
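The race described above, and the re-check that avoids it, can be sketched roughly as follows. This is an illustration only, not Nova code: the `Instance` class, the `build` function, the `launch` callback, and the lowercase state strings are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for a Nova instance record."""
    vm_state: str = "building"
    task_state: Optional[str] = None


class InstanceTerminated(Exception):
    """Raised when the instance was deleted while a long task was running."""


def build(instance: Instance,
          chunks: Iterable[bytes],
          launch: Callable[[Instance], None]) -> int:
    """Download an image chunk by chunk, re-checking state as it goes."""
    downloaded = 0
    for chunk in chunks:
        # Re-read the current state instead of assuming it is unchanged.
        if instance.vm_state == "deleted":
            raise InstanceTerminated("terminated during image download")
        downloaded += len(chunk)
    # Check once more right before launching: the instance may have been
    # terminated after the last chunk arrived.
    if instance.vm_state == "deleted":
        raise InstanceTerminated("terminated before launch")
    launch(instance)
    instance.vm_state = "active"
    return downloaded
```

Without the two checks, a terminate request issued mid-download would be silently overridden when `launch` runs.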

Design

VM state is recorded in three instance attributes:

  • "power_state": derived from the hypervisor
  • "vm_state": changed by Nova code, generally at the start and end of main actions
  • "task_state": changed by Nova code to reflect transient steps within an action

For example, the following shows how these state values are updated during a Create action:

Node        power_state   vm_state
API                       Building
Scheduler                 Building
Compute                   Building
                          Building
                          Building
            Running       Active

The full set of state transitions will be mapped out and provided back to the documentation team. From those already mapped we can make the following observations:

  • Most actions set vm_state and task_state early (in compute/api.py), so in-progress tasks can be determined by task_state != None
  • Most actions clear task_state on completion, so many actions can be checked by a combination of vm_state and task_state == None
  • Every state must leave at least one valid action (terminate)
  • Long running actions (such as image download) should periodically update task_state so users can tell that progress is being made
  • Long running actions should check for and honour state changes (specifically terminated)
  • The reported state should be a combination of vm_state and task_state
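As a rough sketch of the first and last observations, the two helpers below test for an in-progress task and combine the two attributes into one reported value. The function names and the "vm_state(task_state)" output format are assumptions made for illustration; the blueprint does not specify either.

```python
from typing import Optional


def is_busy(task_state: Optional[str]) -> bool:
    """An action is in progress whenever task_state is set."""
    return task_state is not None


def reported_state(vm_state: str, task_state: Optional[str]) -> str:
    """Combine vm_state and task_state into one user-facing string.

    The "vm_state(task_state)" format is an assumption for this sketch.
    """
    if task_state is None:
        return vm_state
    return f"{vm_state}({task_state})"
```

For instance, an instance with vm_state "Active" and task_state "Resize_verify" would be reported here as "Active(Resize_verify)", while an idle one is reported simply as "Active".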

The initial proposal for valid transitions is as follows:

vm_state     task_state
             != None
Active       Resize_verify
Active       None
Building
ReBuilding
Paused
Suspended
Rescued
Deleted
Stopped
Migrating
Resizing
Error
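One way such a table could be consulted in code is a lookup keyed on vm_state and task_state, falling back to terminate when nothing else applies. The action sets below are hypothetical placeholders, not the proposal's actual entries; only "terminate is always valid" and "resume applies to a paused instance" come from this page.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Terminate must remain available in every state.
ALWAYS_ALLOWED: FrozenSet[str] = frozenset({"terminate"})

# Rows that name a specific task_state (hypothetical action sets).
BUSY_ROWS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("Active", "Resize_verify"): frozenset(
        {"confirm_resize", "revert_resize", "terminate"}),
}

# Rows for task_state == None (hypothetical action sets).
IDLE_ROWS: Dict[str, FrozenSet[str]] = {
    "Active": frozenset({"pause", "suspend", "reboot", "terminate"}),
    "Paused": frozenset({"resume", "terminate"}),
}


def allowed_actions(vm_state: str, task_state: Optional[str]) -> FrozenSet[str]:
    """Return the actions permitted for a (vm_state, task_state) pair."""
    if task_state is not None:
        # Generic "!= None" row: a task is in progress, so only terminate
        # is allowed unless a more specific row exists.
        return BUSY_ROWS.get((vm_state, task_state), ALWAYS_ALLOWED)
    return IDLE_ROWS.get(vm_state, ALWAYS_ALLOWED)
```

Keeping the fallback as a non-empty set means a missing row can never strand an instance with no valid action.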

UI Changes

No changes are required to the UI.

Code Changes

The checks for valid actions will be implemented as a decorator, for example:

    @check_vm_state("delete")
    @scheduler_api.reroute_compute("delete")
    def delete(self, context, instance_id):
        """Terminate an instance."""

Some other changes may be required to ensure that vm_state and task_state are set consistently (for example, task_state is currently set to None for a short period during Rebuild, and live_migration doesn't update state at all).
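A minimal sketch of how a `check_vm_state` decorator might work is shown below. It is not the actual implementation: the `ALLOWED_STATES` mapping, the `InvalidStateForAction` exception, the `FakeAPI` class, and the assumption that the API object exposes `get(context, instance_id)` returning a dict are all hypothetical.

```python
import functools


class InvalidStateForAction(Exception):
    """Raised when an action is not valid in the instance's current state."""


ANY_STATE = None  # marker: the action is valid in every state

# Hypothetical mapping of action -> allowed (vm_state, task_state) pairs.
ALLOWED_STATES = {
    "delete": ANY_STATE,           # terminate must always be possible
    "resume": {("Paused", None)},  # only a paused instance can be resumed
}


def check_vm_state(action):
    """Reject the decorated API call unless the instance state permits it."""
    allowed = ALLOWED_STATES[action]  # fail fast on unknown action names

    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, context, instance_id, *args, **kwargs):
            # Assumption: the API object can fetch the instance as a dict.
            instance = self.get(context, instance_id)
            state = (instance["vm_state"], instance["task_state"])
            if allowed is not ANY_STATE and state not in allowed:
                raise InvalidStateForAction(
                    f"cannot {action} instance {instance_id} in state {state}")
            return func(self, context, instance_id, *args, **kwargs)
        return wrapper
    return decorator


class FakeAPI:
    """Tiny in-memory stand-in for the compute API, for demonstration."""

    def __init__(self, instances):
        self.instances = instances

    def get(self, context, instance_id):
        return self.instances[instance_id]

    @check_vm_state("delete")
    def delete(self, context, instance_id):
        """Terminate an instance."""
        return f"deleted {instance_id}"

    @check_vm_state("resume")
    def resume(self, context, instance_id):
        """Resume a paused instance."""
        return f"resumed {instance_id}"
```

Looking up the action when the decorator is applied, rather than on each call, surfaces a misspelled action name at import time instead of at request time.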

Migration

TBD

Test/Demo Plan

TBD

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.