Nova/Live Migration

Live Migration for Host Maintenance
For cloud administrators, there are frequently times when a host must be brought down for maintenance. When this happens, there needs to be a simple way to live migrate running instances from the target host to other hosts in the cloud. That way could be a command line interface (CLI), a graphical user interface (GUI), or a programmatic application programming interface (API). At a minimum, this capability must be available to a human operator. Future work can address doing this automatically based on policy decisions, but that is outside the scope of this document. This capability requires two changes:
 * Operational mode of a host. A host is either operational (fully functional, able to run and start instances) or in maintenance (not able to run existing instances or start new ones).
 * Automatic migration. When a host goes into maintenance mode, all of its instances are migrated off that host.
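The two changes can be pictured as a small state model. The sketch below is purely illustrative: `HostMode`, `Host`, `accepts_new_instances` and `enter_maintenance` are hypothetical names, not existing Nova classes. Entering maintenance flips the mode and hands back the instances that must now be migrated away.

```python
from dataclasses import dataclass, field
from enum import Enum


class HostMode(Enum):
    """Hypothetical operational modes for a compute host."""
    OPERATIONAL = "operational"  # may run existing and start new instances
    MAINTENANCE = "maintenance"  # may neither run nor start instances


@dataclass
class Host:
    name: str
    mode: HostMode = HostMode.OPERATIONAL
    instances: list = field(default_factory=list)

    def accepts_new_instances(self) -> bool:
        # Only operational hosts are candidates for new placements.
        return self.mode is HostMode.OPERATIONAL

    def enter_maintenance(self) -> list:
        """Switch to maintenance mode and return a copy of the instances
        that must now be migrated off this host."""
        self.mode = HostMode.MAINTENANCE
        return list(self.instances)


host = Host("compute-1", instances=["vm-a", "vm-b"])
print(host.enter_maintenance())   # instances still to be moved
print(host.accepts_new_instances())
```

The instance list is copied rather than cleared, because an instance only leaves the host once its migration has actually completed.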

Details
The basic idea is that an operator selects a host and marks it as being in maintenance mode. The Nova scheduler then records the host as being in maintenance mode, so that no new instances are started on it. This maintenance mode remains in effect until the host is explicitly put back in service. In particular, new instances will not be created on this host even after it is rebooted.
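Because the mode must survive a reboot, it has to be stored outside the host's own memory. The following is a minimal sketch under that assumption. The JSON file, `MaintenanceStore` and `filter_hosts` are all hypothetical stand-ins for whatever persistent store and scheduler filter Nova would actually use. A freshly constructed store plays the role of a restarted scheduler or rebooted host.

```python
import json
import os
import tempfile


class MaintenanceStore:
    """Hypothetical persistent record of which hosts are in maintenance."""

    def __init__(self, path: str):
        self.path = path

    def _load(self) -> set:
        if not os.path.exists(self.path):
            return set()
        with open(self.path) as f:
            return set(json.load(f))

    def _save(self, hosts: set) -> None:
        with open(self.path, "w") as f:
            json.dump(sorted(hosts), f)

    def set_maintenance(self, host: str, enabled: bool) -> None:
        hosts = self._load()
        if enabled:
            hosts.add(host)
        else:
            hosts.discard(host)  # only an explicit call returns a host to service
        self._save(hosts)

    def in_maintenance(self, host: str) -> bool:
        return host in self._load()


def filter_hosts(candidates, store):
    """Scheduler-side filter: drop hosts that are in maintenance mode."""
    return [h for h in candidates if not store.in_maintenance(h)]


path = os.path.join(tempfile.mkdtemp(), "maintenance.json")
MaintenanceStore(path).set_maintenance("compute-2", True)

# New object, same file: the flag is still set after a "reboot".
store_after_reboot = MaintenanceStore(path)
print(filter_hosts(["compute-1", "compute-2", "compute-3"], store_after_reboot))
```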

Next, all instances are live migrated off the host onto other hosts in the cloud. The Nova scheduler is called to select a new host for each instance, just as if the instance were being started from scratch. The goal is to use the normal scheduler metrics to spread the instances throughout the cloud, rather than rescheduling all of them onto a single host and overloading it. To minimize service interruptions, the IP addresses associated with all instances must remain the same after the live migrations. The live migrations must also be done securely, at a minimum over encrypted connections such as SSL/TLS.
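The per-instance rescheduling can be sketched as "filter, then weigh" in a loop. This is a simplified model, not Nova's scheduler. Free RAM stands in for the real weighers, and `ComputeHost`, `Instance`, `pick_host` and `evacuate` are hypothetical names. The model covers placement only; the live migration itself is not modelled. The key design point is that each placement updates host usage before the next instance is scheduled, so later decisions see the load added by earlier ones and the instances spread out instead of piling onto one host.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    ram_mb: int
    ip: str  # must be unchanged after migration


@dataclass
class ComputeHost:
    name: str
    total_ram_mb: int
    used_ram_mb: int = 0
    maintenance: bool = False

    @property
    def free_ram_mb(self) -> int:
        return self.total_ram_mb - self.used_ram_mb


def pick_host(instance, hosts, exclude):
    """Stand-in scheduler: filter out unusable hosts, prefer the most free RAM."""
    fits = [h for h in hosts
            if h.name != exclude and not h.maintenance
            and h.free_ram_mb >= instance.ram_mb]
    if not fits:
        raise RuntimeError(f"no valid host for {instance.name}")
    return max(fits, key=lambda h: h.free_ram_mb)


def evacuate(source, instances, hosts):
    """Schedule each instance individually, updating usage as we go."""
    placement = {}
    for inst in instances:
        target = pick_host(inst, hosts, exclude=source.name)
        target.used_ram_mb += inst.ram_mb
        source.used_ram_mb -= inst.ram_mb
        placement[inst.name] = (target.name, inst.ip)  # IP carried over as-is
    return placement


source = ComputeHost("compute-1", 32768, used_ram_mb=12288, maintenance=True)
hosts = [source,
         ComputeHost("compute-2", 16384),
         ComputeHost("compute-3", 16384, used_ram_mb=2048)]
vms = [Instance("vm-a", 4096, "10.0.0.11"),
       Instance("vm-b", 4096, "10.0.0.12"),
       Instance("vm-c", 4096, "10.0.0.13")]
placement = evacuate(source, vms, hosts)
print(placement)
```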

OpenStack changes
This capability will require changes to the following OpenStack components:
 * Nova scheduler - updated to track and change the maintenance state of individual hosts
 * Nova API - new APIs will be needed to:
   * Return the maintenance state of a host
   * Put a host into maintenance mode
   * Put a host back into operational mode
 * CLI - new options added to the nova command to:
   * Display the operational status of a host
   * Put a host into maintenance mode
   * Put a host back into operational mode
 * Horizon - new options to:
   * Display the operational status of a host
   * Put a host into maintenance mode
   * Put a host back into operational mode
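The three operations repeat across the API, CLI and Horizon, so a client only needs a thin layer over them. The sketch below uses `argparse` to show how such CLI options might map onto the API calls. The subcommand names (`host-mode-show`, `host-maintenance-enter`, `host-maintenance-exit`) and the `api_*` functions are hypothetical and are not existing nova commands. An in-memory dict stands in for the proposed Nova API, which a real client would reach over authenticated HTTP.

```python
import argparse

# Hypothetical in-memory stand-in for the proposed Nova API.
_HOST_MODES = {"compute-1": "operational", "compute-2": "operational"}


def api_get_mode(host):
    return _HOST_MODES[host]


def api_set_mode(host, mode):
    _HOST_MODES[host] = mode
    return mode


def build_parser():
    parser = argparse.ArgumentParser(prog="nova-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, help_text in [
        ("host-mode-show", "display the operational status of a host"),
        ("host-maintenance-enter", "put a host into maintenance mode"),
        ("host-maintenance-exit", "put a host back into operational mode"),
    ]:
        p = sub.add_parser(name, help=help_text)
        p.add_argument("host")
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    if args.command == "host-mode-show":
        return api_get_mode(args.host)
    if args.command == "host-maintenance-enter":
        return api_set_mode(args.host, "maintenance")
    return api_set_mode(args.host, "operational")


print(main(["host-maintenance-enter", "compute-1"]))
print(main(["host-mode-show", "compute-1"]))
```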

Open Questions

 * 1) Configuration Management Database (CMDB) integration. Most cloud deployments will use some form of third-party CMDB. What kind of integration with such a CMDB is desired, and how should disparate CMDB systems be integrated?


 * 1) -- Lifeless -- The scheduler doesn't need to know about this. The host can simply stop advertising spare resources, or stop advertising itself to the scheduler at all. There is already a nova command to entirely quiesce a host: 'nova host-servers-migrate'. All that's needed is to tell nova-compute to stop advertising available resources on the hypervisor, and to do that before running nova host-servers-migrate.
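The point of this reply is that an ordinary capacity filter already excludes a host that reports no spare resources, so no separate maintenance flag is required. The following is a minimal sketch of that idea. `ResourceReport`, `advertise` and `candidates` are hypothetical names, and the report shape is assumed rather than taken from nova-compute. One caveat of this simplified model is that a request asking for zero resources would still match a draining host.

```python
from dataclasses import dataclass


@dataclass
class ResourceReport:
    """What a compute node advertises to the scheduler (hypothetical shape)."""
    host: str
    free_vcpus: int
    free_ram_mb: int


def advertise(host, free_vcpus, free_ram_mb, draining):
    # A draining host reports no spare capacity instead of setting a flag.
    if draining:
        return ResourceReport(host, 0, 0)
    return ResourceReport(host, free_vcpus, free_ram_mb)


def candidates(reports, vcpus, ram_mb):
    """Plain capacity filter: it has no notion of maintenance at all."""
    return [r.host for r in reports
            if r.free_vcpus >= vcpus and r.free_ram_mb >= ram_mb]


reports = [
    advertise("compute-1", 16, 32768, draining=True),
    advertise("compute-2", 8, 16384, draining=False),
]
print(candidates(reports, vcpus=2, ram_mb=4096))
```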


 * 1) - PhilDay -- The ability to stop the scheduler from using a host (what is described here as maintenance mode) already exists. You just disable the nova-compute service on that host, and you can give a short text description of why it has been disabled:

nova service-disable --reason <reason> <host> nova-compute

The command Rob described, nova host-servers-migrate, is implemented entirely in the Python nova client (python-novaclient). It uses the Nova API to get a list of the servers on the host and then calls migrate on them one at a time.
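Combining the two replies gives a client-side drain procedure. First disable nova-compute so that nothing new lands on the host, then list its servers and request a migration for each one in turn. The sketch below models that loop only. `FakeComputeAPI`, `disable_service`, `list_servers`, `migrate` and `drain_host` are hypothetical stand-ins and do not reproduce python-novaclient's actual code. Failures are collected rather than aborting the loop, so an operator can see which servers still need attention.

```python
class FakeComputeAPI:
    """Hypothetical in-memory stand-in for the Nova API calls a client makes."""

    def __init__(self, servers_by_host):
        self.servers_by_host = servers_by_host
        self.disabled = {}
        self.migrated = []

    def disable_service(self, host, binary, reason):
        self.disabled[(host, binary)] = reason

    def list_servers(self, host):
        return list(self.servers_by_host.get(host, []))

    def migrate(self, server_id):
        if server_id.startswith("bad"):
            raise RuntimeError(f"migration of {server_id} refused")
        self.migrated.append(server_id)


def drain_host(api, host, reason):
    """Disable nova-compute on the host, then migrate its servers one by one."""
    api.disable_service(host, "nova-compute", reason)
    failures = {}
    for server_id in api.list_servers(host):
        try:
            api.migrate(server_id)
        except RuntimeError as exc:
            # Keep going; report failures so an operator can retry them.
            failures[server_id] = str(exc)
    return failures


api = FakeComputeAPI({"compute-1": ["vm-a", "bad-vm", "vm-b"]})
print(drain_host(api, "compute-1", "disk replacement"))
print(api.migrated)
```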