Upgrade-with-minimal-downtime


 * Launchpad Entry: NovaSpec:upgrade-with-minimal-downtime
 * Created:
 * Contributors:

Summary
Clouds are expected to be always available and involve large numbers of servers. Here we consider how to perform upgrades with minimal disruption.

Goals for the upgrade are:
 * where possible, transparent to the cloud users
 * minimal instance downtime or instance connectivity loss
 * ability to rollback to a pre-upgrade state if things fail
 * ability to upgrade from v2 to v4 without having to upgrade to v3 first
 * * we need to decide what we mean by a version
 * * right now it would be 2012.1 -> 2012.2
 * * what about sprint releases or bug fixes to major versions (http://summit.openstack.org/sessions/view/106)

Release Note
TODO

User stories
Consider the possible different ways to perform the upgrade...

Big bang
To perform the upgrade you could try this approach:
 * Build an upgraded cloud alongside your existing cloud
 * Get it configured
 * Make your old cloud read-only
 * Copy the state into your new cloud
 * Move to using your new cloud

This approach leads to too much downtime.

Rolling upgrade
This approach involves upgrading each component of the system, piece by piece, eventually giving you a cloud running on the new version.

While this is more complex, we should be able to keep the downtime of each component minimal, and using the resilience built into OpenStack we should be able to achieve zero downtime, though some actions may take slightly longer than usual.

There are two key ways to perform this kind of upgrade:
 * in-place upgrade of each component
 * replacement/side-by-side upgrade of each component

In-place upgrades
In-place upgrades require each service to be down for the duration of the upgrade. It is likely to make rollback harder.

So for the moment I will ignore this approach, except in the case of upgrading the hypervisor. When doing a hypervisor upgrade, you can remove the node from the cloud, after live-migrating its instances, without affecting the overall availability of the cloud. You would only have a very slightly reduced capacity during the time the node was unavailable.

Side by side upgrades
Side-by-side upgrades involve this procedure for each service (sketched in code after the list):
 * Configure the new worker
 * Turn off the old worker
 * * Allow the message queue or a load balancer to hide this downtime
 * Snapshot/backup the old worker for rollback
 * Copy/move any state to the new worker
 * Start up the new worker
 * Repeat for all other workers, in an appropriate order
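As a concrete illustration, the swap for a single worker might be orchestrated as below. This is a minimal sketch: every helper (provision_worker, drain, snapshot_worker, and so on) is a hypothetical placeholder for your own deployment tooling, and only the ordering is the point.

```python
# Hedged sketch of the side-by-side swap for one worker. All helpers are
# hypothetical placeholders for deployment tooling; only the ordering matters.

def side_by_side_upgrade(old_worker, new_version):
    new_worker = provision_worker(new_version)          # configure new worker
    configure_as_replacement(new_worker, old_worker)    # same identity/GUID

    drain(old_worker)               # stop fetching from the message queue;
                                    # the queue/load balancer hides the gap
    snapshot = snapshot_worker(old_worker)              # kept for rollback
    copy_state(old_worker, new_worker)

    start(new_worker)
    if not healthy(new_worker):                         # rollback path
        stop(new_worker)
        restore(old_worker, snapshot)
        start(old_worker)
```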

The advantages of this approach appear to be:
 * potentially easier rollback
 * potentially less downtime of a component
 * works well when deploying nova in a VM
 * easier to test as system is in a known state (or VM image)

Assumptions
If we take the side-by-side rolling upgrade, here are the things we need to assume about OpenStack.

Backwards Compatible Schemas
To enable a rolling upgrade of nova components we need to ensure the communication between all components of OpenStack works across different versions (as a minimum, two versions back).

Things to consider are:
 * database schema
 * message queue messages
 * notification messages
 * OpenStack API compatibility (for when different zones are at different versions)

For example, to avoid the need for new versions of the code to work with old database versions, we should assume the database will be upgraded first. However, we could upgrade the database last (in a non-backwards compatible way), if all new versions can work with the old database.
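For instance, an additive change is backwards compatible while a rename is not. A minimal sketch in the sqlalchemy-migrate style nova uses (the table and column names are made up, and it relies on sqlalchemy-migrate's changeset extensions being loaded):

```python
# Sketch of a backwards-compatible migration: adding a nullable column,
# which old code simply ignores. Renaming or dropping a column in the same
# release would break workers still running the old version.
from sqlalchemy import Column, MetaData, String, Table


def upgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    instances = Table('instances', meta, autoload=True)
    # Hypothetical new field; nullable so old code can still insert rows.
    instances.create_column(Column('maintenance_note', String(255),
                                   nullable=True))


def downgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    instances = Table('instances', meta, autoload=True)
    instances.drop_column('maintenance_note')
```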

Migrate services between hosts using a GUID in the host flag and an IP alias
Consider using GUIDs rather than hostnames to identify services in the database and in the message queue names. This may work in the current system by specifying a GUID in the host flag.

Using a GUID, with an associated IP alias, should allow Compute (and similar) workers to be smoothly migrated between two different hosts (and in particular during a side by side upgrade).

Because an old host can be turned off and the new host can be started up with the same identity as the old host, this can minimise the downtime (no need to wait for rpm upgrades to complete). It also enables you to more easily scale your cloud in and out on demand, because you can more easily migrate workers to different hosts as required.
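A minimal sketch of how a worker could obtain a stable GUID identity follows; the file path and config handling are illustrative, not an existing nova mechanism.

```python
# Sketch: persist a GUID once and reuse it as the value of the "host" flag,
# so a replacement machine can assume the same identity. The path
# /etc/nova/worker_guid is an arbitrary choice for illustration.
import os
import uuid

GUID_FILE = '/etc/nova/worker_guid'


def get_worker_guid():
    if os.path.exists(GUID_FILE):
        with open(GUID_FILE) as f:
            return f.read().strip()
    guid = str(uuid.uuid4())
    with open(GUID_FILE, 'w') as f:
        f.write(guid)
    return guid

# The returned value would then be set as the host flag in nova.conf,
# alongside an IP alias that moves with the identity.
```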

Live migration
When upgrading a hypervisor, ideally instances should be live-migrated to another host, so the host can be upgraded with zero downtime for the instances that were running on it.

Without the live migration support, the instances will either be lost (terminated), or be suspended during the hypervisor upgrade.
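As an illustration, draining a host with python-novaclient might look like the sketch below; the credentials, host names and exact client signature are assumptions and may differ between novaclient releases.

```python
# Sketch: live-migrate every instance off a compute host before upgrading
# it. Names and credentials are placeholders.
from novaclient.v1_1 import client

nova = client.Client('admin', 'secret', 'admin', 'http://keystone:5000/v2.0')

for server in nova.servers.list(search_opts={'host': 'compute-01'}):
    # block_migration=False assumes shared storage between the two hosts
    server.live_migrate(host='compute-02', block_migration=False,
                        disk_over_commit=False)
```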

Design
To ensure a smooth upgrade we need to be able to support graceful shutdown of services:

Graceful shutdown of services
We need to ensure that when we stop a service, we can let it stop fetching new messages from the message queue and complete any requests that are still in flight.

This will help when switching off an old service before performing an upgrade, or when rebooting a host for maintenance. The message queue should ensure the system doesn't lose any requests during the short downtime.
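The pattern is sketched below with plain stdlib threading; nova's own service and consumer plumbing differs, so treat this purely as an illustration of the ordering.

```python
# Sketch of graceful shutdown: on SIGTERM stop fetching new messages but
# finish the request already being processed. Messages still on the queue
# remain there for the replacement worker.
import queue
import signal
import threading

work_queue = queue.Queue()      # stands in for the AMQP message queue
stop = threading.Event()


def worker_loop():
    while not stop.is_set():
        try:
            task = work_queue.get(timeout=1)   # stop fetching once signalled
        except queue.Empty:
            continue
        task()                  # in-flight work always runs to completion


worker = threading.Thread(target=worker_loop)
worker.start()

signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())

worker.join()   # main thread waits until the worker has drained and exited
```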

Possible Upgrade Procedures
Here we concentrate on the side by side rolling upgrade of nova components.

Assuming the database is always backwards compatible we should probably upgrade the components in the following order:
 * Update to the latest database schema
 * Upgrade the message queue or database servers if required
 * Upgrade: Scheduler, Glance, Keystone
 * Upgrade: Volume, Network
 * Upgrade: Compute
 * Upgrade: Nova API

We can now look at how to minimise the downtime of each nova component during the upgrade. Please note this has not yet been tested.

nova-compute

 * Create new compute worker
 * configure new compute worker as replacement for old worker
 * * this is done by using the same GUID in the host flag as the old worker
 * Gracefully stop old compute service
 * * stop fetching from queue
 * * allow all operations to complete
 * Start new compute service
 * Shut down the old machine now that the new service is running

nova-scheduler

 * Create new scheduler worker host
 * Gracefully stop old service
 * * stop fetching from queue
 * * allow all operations to complete
 * Start new scheduler service
 * Tidy up old queues in the message queue (see the sketch after this list)
 * * We could use the GUID host flag to avoid this
 * * But it looks like the old queues are not used anyway?
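If the stale queues do need removing, they can be deleted over AMQP; a sketch with pika follows, where the queue name follows nova's topic.host convention and both names are placeholders.

```python
# Sketch: delete the per-host queue left behind by the old scheduler.
# Broker and queue names are placeholders; check what the broker actually
# shows before deleting anything.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('rabbit-host'))
channel = connection.channel()
channel.queue_delete(queue='scheduler.old-host-name')
connection.close()
```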

nova-api
Similar approach to the scheduler:
 * Start new api host
 * Configure new api service, and start new api service
 * * Note: we assume no changes are required to Keystone (it only needs the external endpoint listed?)
 * Reconfigure the load balancer, or move the IP alias to the new machine (see the sketch after this list)
 * Wait for requests to start being served by the new API, and for the above change to be properly applied
 * Gracefully stop old service (complete all current request)
 * * stop fetching from queue
 * * allow all operations to complete
 * Shutdown old host
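If an IP alias is used rather than a load balancer, moving it is a pair of iproute2 calls; the sketch below uses placeholder addresses and device names, with the delete run on the old host and the add on the new one.

```python
# Sketch: move the service IP alias between API hosts with iproute2.
# Addresses and device names are placeholders.
import subprocess

ALIAS = '192.0.2.10/24'

# On the old host, once it has drained:
subprocess.check_call(['ip', 'addr', 'del', ALIAS, 'dev', 'eth0'])

# On the new host:
subprocess.check_call(['ip', 'addr', 'add', ALIAS, 'dev', 'eth0'])
# Gratuitous ARP helps neighbours notice the move quickly (iputils arping;
# flags vary between arping implementations).
subprocess.check_call(['arping', '-U', '-c', '3', '-I', 'eth0', '192.0.2.10'])
```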

dashboard
Assumptions:
 * The database is not important (it just stores sessions?); it can be recreated on the new host
 * Sessions will be lost during the switch-over, unless mitigated by a load balancer (using Citrix NetScaler's graceful shut-down, or similar)

Method:
 * Follow the same steps as nova-api

nova-volume
This depends on what storage type you use.

TODO - not yet completed.

iSCSI
There are a few approaches:
 * terminate the instances that are using iSCSI volumes
 * * just remap the iSCSI target to the new host
 * * volumes are marked as detached, and available again for new instances
 * keep iSCSI target static
 * * ensure it stays the same, using an IP alias and a GUID-style hostname
 * * After graceful shutdown of the old service, copy across the volume disk from the old VM to the new VM
 * * Question: do hypervisors cope when iSCSI is unavailable for a short time, and can we recover easily? (see the sketch after this list)
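For the re-attach step, logging the new host into the target with open-iscsi might look like this sketch; the IQN and portal are placeholders, and error handling plus the detach on the old host are omitted.

```python
# Sketch: attach an iSCSI volume from the replacement host using open-iscsi.
# Target IQN and portal address are placeholders.
import subprocess

target = 'iqn.2010-10.org.openstack:volume-00000001'
portal = '192.0.2.20:3260'

subprocess.check_call(['iscsiadm', '-m', 'node', '-T', target,
                       '-p', portal, '--login'])
```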

XenServer Storage Manager
Issues:
 * all state is external, so no disks need moving between the old and new hosts
 * use a GUID hostname so the old and new host have the same identity; there is a small downtime while the new instance is brought up
 * or remap entries in the database, in a messy way

Overall approach should be similar to compute.

nova-network
This depends on the network model used.

Flat-model
Method:
 * Use the same method as nova-scheduler
 * Use a GUID on the host flag

VLAN-model
Ideas:
 * Where possible each compute host should have its own network, to limit the scope of any downtime (new-style network HA)
 * Look at using Network HA active passive pair to limit the connectivity downtime (using virtual ip for the gateway, so instances don't notice)

glance-api
Method:
 * as nova-api

glance-registry
Assumptions:
 * deploy one on every glance-api node?
 * usual db problems

Method:
 * new api nodes talk to the new registry
 * old api nodes talk to the old registry
 * probably each api node has its own registry, talking to a central DB

Implementation
TODO

Test/Demo Plan
We need a continuous integration system to check that trunk can upgrade from the previous released version.

Unresolved issues
Some things we need to resolve:
 * Getting decisions on what backwards compatibility will be ensured between new and old database schemas and message queue messages.
 * Deciding on what versions will be upgradable (only minor release, any milestone release, any revision, bug fixes to major versions)