- Launchpad Entry: Improve VM State Management to constrain state transitions
- Created: 12 Oct 2011
- Contributors: Phil Day (HP Cloud Services)
This blueprint would constrain the valid state transitions to a limited subset, and ensure that the remaining transitions lead to consistent and deterministic behavior.
- Limit the valid operations in each state (for example, only a paused instance can be resumed)
- Make some minor changes to the state sequence to make the above robust
- Ensure that long running operations check current state rather than assuming it is unchanged
Current checks on valid state transitions are limited to a few cases, leading to multiple opportunities for non-deterministic behavior. In addition some long running tasks can lead to odd behavior – for example a VM in the building state can spend a long time in image download, be terminated, and when the image download completes go ahead and launch the VM.
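The image download race described above could be avoided along the following lines. This is only a minimal sketch under stated assumptions: `build_instance`, `InstanceTerminated`, the dictionary used as a stand-in for the instance table, and the state strings are all hypothetical names chosen for illustration, not existing Nova code.

```python
# Hypothetical state names, used only for this illustration.
BUILDING = "building"
ACTIVE = "active"
DELETED = "deleted"


class InstanceTerminated(Exception):
    """Raised when the instance left the building state during a long step."""


def build_instance(db, instance_id, download_image, spawn):
    """Download the image, then spawn only if the instance is still building.

    `db` is a dict standing in for the instance table (an assumption);
    `download_image` and `spawn` are callables supplied by the caller.
    """
    download_image(instance_id)  # long-running step

    # Re-read the record instead of trusting the state seen before the download.
    current = db[instance_id]
    if current["vm_state"] != BUILDING:
        raise InstanceTerminated(instance_id)

    spawn(instance_id)
    current["vm_state"] = ACTIVE
    current["task_state"] = None
```

The point of the sketch is that the state is read again after the long step, so a terminate issued during the download prevents the launch.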
VM State is recorded in three instance attributes:
- "power_state": derived from the hypervisor
- "vm_state": changed by Nova code, generally at the start and end of main actions
- "task_state": changed by Nova code to reflect transient steps within an action
For example the following shows how these state values are updated during a Create action
The full set of state transitions will be mapped out and provided back to the documentation team. From those already mapped we can make the following observations:
- Most actions set vm_state and task_state early (in compute/api.py), so in-progress tasks can be determined by task_state != None
- Most actions clear task_state on completion, so many actions can be checked by a combination of vm_state and task_state == None
- At least one valid action (terminate) must always remain available
- Long running actions (such as image download) should periodically update task_state so users can tell that progress is being made
- Long running actions should check for and honour state changes (specifically terminated)
- The reported state should be a combination of vm_state and task_state
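Two of the observations above, detecting in-progress work from task_state and reporting a combined state, might look roughly like this. The helper names `in_progress` and `reported_state` and the output format are assumptions made for illustration; the blueprint does not specify how the combined state is rendered.

```python
def in_progress(task_state):
    """An action is in progress whenever task_state is set."""
    return task_state is not None


def reported_state(vm_state, task_state):
    """Combine vm_state and task_state into one user-visible string.

    The "vm_state (task_state)" format is a hypothetical choice.
    """
    if not in_progress(task_state):
        return vm_state
    return "%s (%s)" % (vm_state, task_state)
```

With this shape, an instance that is still downloading its image would report both the main state and the transient step, so users can tell that progress is being made.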
The initial proposal for valid transitions is as follows:
No changes are required to the UI.
The checks for valid actions will be implemented as a decorator, for example
    @check_vm_state("delete")
    @scheduler_api.reroute_compute("delete")
    def delete(self, context, instance_id):
        """Terminate an instance."""
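One way such a decorator could be written is sketched below. It is not the proposed implementation: the `VALID_STATES` table, the `InvalidStateForAction` exception, and the assumption that the API object exposes `get(context, instance_id)` returning a mapping with `vm_state` and `task_state` are all hypothetical. The rules shown only mirror the examples in the text (resume requires a paused instance, terminate is always allowed).

```python
import functools

ANY = object()  # wildcard: the action is valid in every vm_state

# Hypothetical rule table; the real set of transitions is still to be mapped.
VALID_STATES = {
    "resume": {"vm_states": {"paused"}, "require_idle": True},
    "delete": {"vm_states": ANY, "require_idle": False},
}


class InvalidStateForAction(Exception):
    """Raised when an action is not valid in the instance's current state."""


def check_vm_state(action):
    """Reject the call unless the instance state permits `action`."""
    rule = VALID_STATES[action]

    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, context, instance_id, *args, **kwargs):
            # Assumes self.get() returns the current instance record.
            instance = self.get(context, instance_id)
            vm_ok = (rule["vm_states"] is ANY
                     or instance["vm_state"] in rule["vm_states"])
            idle_ok = (not rule["require_idle"]
                       or instance["task_state"] is None)
            if not (vm_ok and idle_ok):
                raise InvalidStateForAction(
                    "%s not allowed in state %s/%s"
                    % (action, instance["vm_state"], instance["task_state"]))
            return func(self, context, instance_id, *args, **kwargs)
        return wrapper
    return decorator
```

Keeping the rules in a single table rather than scattering checks through each method would also make it easier to hand the full transition map to the documentation team.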
Some other changes may be required to ensure that vm_state and task_state are set consistently (for example, task_state is currently set to None for a short period during Rebuild, and live_migration doesn't update state at all).