Nova-vm-state-management

Summary

This blueprint would constrain the valid state transitions to a limited subset, and ensure that the remaining transitions lead to consistent and deterministic behavior.

Specifically:

  1. Limit the valid operations in each state (for example, only a paused instance can be resumed)
  2. Make some minor changes to the state sequence to make the above robust
  3. Ensure that long running operations check current state rather than assuming it is unchanged

Rationale

Current checks on valid state transitions are limited to a few cases, leaving multiple opportunities for non-deterministic behavior. In addition, some long-running tasks can lead to odd behavior. For example, a VM in the building state can spend a long time in image download, be terminated in the meantime, and still be launched when the image download completes.
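
The race described above can be illustrated with a small sketch. This is not Nova code: the Instance class, download_image and build names are hypothetical stand-ins, and the state strings are illustrative only. The point is that the build step re-reads the state after the long download rather than assuming it is unchanged.

```python
class Instance:
    """Hypothetical in-memory stand-in for an instance record."""

    def __init__(self):
        self.vm_state = "building"
        self.launched = False


def download_image(instance, on_chunk=None, chunks=3):
    """Simulate a slow image download made of several chunks."""
    for index in range(chunks):
        if on_chunk is not None:
            # Gives a caller the chance to change state mid-download,
            # for example to simulate a terminate request.
            on_chunk(index)


def build(instance, on_chunk=None):
    """Download the image, then launch only if still in 'building'."""
    download_image(instance, on_chunk)
    # Re-check the current state instead of assuming it is unchanged.
    if instance.vm_state != "building":
        return False  # terminated (or otherwise changed): do not launch
    instance.vm_state = "active"
    instance.launched = True
    return True
```

Without the check after download_image, a terminate that arrives during the download would be silently overridden by the launch.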

Design

VM State is recorded in three instance attributes:

  • "power_state": derived from the hypervisor
  • "vm_state": changed by Nova code, generally at the start and end of main actions
  • "task_state": changed by Nova code to reflect transient steps within an action

For example, the following shows how these state values are updated during a Create action:

Node power_state vm_state
API Building
Scheduler Building
Compute Building
Building
Building
Running Active

The full set of state transitions will be mapped out and provided back to the documentation team. From those already mapped we can make the following observations:

  • Most actions set vm_state and task_state early (in compute/api.py), so in-progress tasks can be determined by task_state != None
  • Most actions clear task_state on completion, so many actions can be checked by a combination of vm_state and task_state = None
  • Always need to leave at least one valid action (terminate)
  • Long running actions (such as image download) should periodically update task_state so users can tell that progress is being made
  • Long running actions should check for and honour state changes (specifically terminated)
  • The reported state should be a combination of vm_state and task_state
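
As a rough illustration of the first and last observations, the two helpers below show one way an in-progress check and a combined reported state could look. The function names and the "vm_state (task_state)" output format are assumptions made for this sketch, not something the blueprint specifies.

```python
def is_in_progress(task_state):
    """An action is in progress whenever task_state is set."""
    return task_state is not None


def reported_state(vm_state, task_state):
    """Combine vm_state and task_state into one user-visible string.

    The output format is an assumption for illustration only.
    """
    if task_state is None:
        return vm_state
    return "%s (%s)" % (vm_state, task_state)
```

With this shape, an idle instance reports only its vm_state, while an instance in the middle of an action also shows the transient step.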

The initial proposal for valid transitions is as follows:

vm_state task_state
 !=None
Active Resize_verify
Active None
Building
ReBuilding
Paused
Suspended
Rescued
Deleted
Stopped
Migrating
Resizing
Error

UI Changes

No changes are required to the UI.

Code Changes

The checks for valid actions will be implemented as a decorator, for example


@check_vm_state("delete")
@scheduler_api.reroute_compute("delete")
def delete(self, context, instance_id):
  """Terminate an instance."""


Some other changes may be required to ensure that vm_state and task_state are set consistently (for example, task_state is currently set to None for a short period during Rebuild, and live_migration doesn't update state at all).
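
A minimal sketch of how a check_vm_state decorator might work is given below. Everything apart from the decorator name and the delete signature is an assumption: the ALLOWED table holds placeholder entries rather than the mapped transitions, InvalidStateError and FakeComputeAPI are hypothetical, and the instance lookup through self.get is a stand-in for however the real API loads an instance.

```python
import functools


class InvalidStateError(Exception):
    """Hypothetical error raised when an action is not valid right now."""


# Placeholder table: action -> vm_states in which it is allowed while idle.
# The real entries would come from the mapped-out transitions.
ALLOWED = {
    "resume": {"paused"},
    "pause": {"active"},
}


def is_allowed(action, vm_state, task_state):
    """Return True if the action is valid for the given state pair."""
    if action == "delete":
        return True  # always leave at least one valid action (terminate)
    # Other actions require an idle instance in a permitted vm_state.
    return task_state is None and vm_state in ALLOWED.get(action, set())


def check_vm_state(action):
    """Reject the wrapped call unless the instance state permits it."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, context, instance_id, *args, **kwargs):
            instance = self.get(context, instance_id)  # assumed lookup
            if not is_allowed(action, instance["vm_state"],
                              instance["task_state"]):
                raise InvalidStateError(
                    "%s not allowed in vm_state=%s task_state=%s"
                    % (action, instance["vm_state"], instance["task_state"]))
            return func(self, context, instance_id, *args, **kwargs)
        return wrapper
    return decorator


class FakeComputeAPI:
    """Hypothetical API object backed by a dict of instance records."""

    def __init__(self, instances):
        self.instances = instances

    def get(self, context, instance_id):
        return self.instances[instance_id]

    @check_vm_state("resume")
    def resume(self, context, instance_id):
        self.instances[instance_id]["vm_state"] = "active"
        return self.instances[instance_id]["vm_state"]

    @check_vm_state("delete")
    def delete(self, context, instance_id):
        """Terminate an instance."""
        self.instances[instance_id]["vm_state"] = "deleted"
        return self.instances[instance_id]["vm_state"]
```

Keeping the permitted states in a single table means the decorator stays generic and the valid transitions can be reviewed in one place.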

Migration

TBD

Test/Demo Plan

TBD

BoF agenda and discussion

Etherpad from Boston Design Summit