Evacuate

  • Launchpad Entry: NovaSpec:rebuild-for-ha
  • Created: 1 Aug 2012
  • Contributors: Alex Glikson, Oshrit Feder, Pavel Kravchenco

Summary

High availability for VMs minimizes the effect of a nova-compute node failure. Upon failure detection, VMs whose storage is accessible from other nodes (e.g. shared storage) can be rebuilt and restarted on a target node.

Release Note

Administrators who detect a compute node failure can evacuate that node's VMs to target nodes.

Rationale

On commodity hardware, failures are common and must be taken into account in order to provide a high service level. With VM HA support, administrators can evacuate VMs from a failed node while keeping VM characteristics such as identity, volumes, networks and state, ensuring VM availability over time.

User stories

  • An administrator wants to evacuate and rebuild VMs from failed nodes

Assumptions

  • The VM to evacuate is down due to a node failure, and was in the started or powered-off state
  • The VM's storage is accessible from other nodes (e.g. shared storage); if it is not, a rebuild is performed (the disk is re-created from the image)
  • The administrator has selected a valid target node to rebuild the VM on
  • After evacuation and rebuild on the target node, the administrator is responsible for any VM inconsistency that might have occurred during the sudden node failure (e.g. partial disk writes)

Recovery from compute node failure

With several changes, the existing rebuild instance functionality can be extended to support the HA scenario.

When the administrator detects a compute node failure, all the VMs that were running on that node are down. The newly introduced evacuate REST API can then be invoked to evacuate a selected VM to a specified running target compute node.

Evacuate is distinguished from rebuild by a dedicated admin API, since rebuild flushes the VM's disk and is intended for cases where it is desired to restart with a reset disk while keeping the same identity.

The exact semantics of this operation vary depending on the configuration of the instance and the underlying storage topology. For a regular 'ephemeral' instance, invoking evacuate respawns the VM from the same image on another node while retaining the same identity and configuration (e.g. the same ID, flavor, IP, attached volumes, etc.). For instances running off shared storage (i.e. the same instance files are accessible on the target host), the VM is re-created and points to the same instance files while retaining its identity and configuration. For instances booted from a volume, the VM is re-created and booted from the same volume.
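
These cases can be summarized in a short, purely illustrative Python sketch; the instance is modelled as a plain dict and none of the names below are actual Nova functions:

<pre><nowiki>
# Illustrative sketch only (not Nova code): how the three evacuation cases
# described above could be distinguished. Identity and configuration (ID,
# flavor, IP, attached volumes) are kept in every case.

def plan_evacuation(instance, on_shared_storage):
    """Return a description of how the VM would be re-created on the target."""
    if instance.get('root_volume'):
        # Boot-from-volume: re-create the VM and boot it from the same volume.
        return 'recreate and boot from volume %s' % instance['root_volume']
    if on_shared_storage:
        # Shared storage: the same instance files are visible on the target,
        # so the re-created VM simply points at the existing disk files.
        return 'recreate reusing the existing instance files'
    # No shared storage: fall back to a plain rebuild, i.e. re-create the
    # root disk from the original image (local ephemeral data is lost).
    return 'recreate from image %s' % instance['image_ref']


if __name__ == '__main__':
    vm = {'id': 'vm-1', 'image_ref': 'cirros-0.3', 'root_volume': None}
    print(plan_evacuation(vm, on_shared_storage=True))
</nowiki></pre>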

nova.compute.api will expose an evacuate method. The target compute node receives the evacuation request and invokes the modified rebuild method (nova.compute.rpcapi) with no image_ref and with the recreate flag set. Rebuild performs several additional checks when the recreate flag is set: 1. it makes sure that a VM with the same name does not already exist on the target host; 2. it validates shared storage (if the instance's disk is not on shared storage, image_ref is updated with the image and the process continues as a pure rebuild). In addition, with the flag set, the VM's record is updated with the new host, and since the VM is new on that host, only a create is needed, in contrast to the destroy and re-create performed in the rebuild-from-image scenario. Next, the rebuild flow re-connects volumes and networks, ensures the power state, and spawns the instance.
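
A rough sketch of this call chain and of the recreate-specific checks is shown below; the class and function names are simplified stand-ins for nova.compute.api, nova.compute.rpcapi and the compute manager, not the actual implementation:

<pre><nowiki>
# Simplified sketch of the proposed flow; names and signatures are illustrative.

class ComputeAPI(object):
    """Stand-in for the evacuate entry point in nova.compute.api."""

    def __init__(self, rpc_rebuild):
        self.rpc_rebuild = rpc_rebuild   # stand-in for nova.compute.rpcapi

    def evacuate(self, instance, host):
        # Reuse the rebuild RPC: no image_ref, recreate flag set, cast to the
        # administrator-selected target host.
        self.rpc_rebuild(instance, image_ref=None, recreate=True, host=host)


def rebuild_on_target(instance, image_ref, recreate, on_shared_storage,
                      exists_on_target):
    """Extra checks the rebuild flow performs when recreate=True."""
    if recreate:
        # 1. A VM with the same name must not already exist on the target.
        if exists_on_target:
            raise RuntimeError('instance already present on the target host')
        # 2. Without shared storage, continue as a pure rebuild from image.
        if not on_shared_storage:
            image_ref = instance['image_ref']
        # Re-point the instance record at this (target) host; the VM is new
        # here, so only a create is needed -- no destroy of a local copy as
        # in the rebuild-from-image scenario.
        instance['host'] = 'target-node'   # i.e. the local hostname
    # Common tail of the rebuild flow (not shown): re-connect volumes and
    # networks, ensure the power state, spawn the instance.
    return image_ref


if __name__ == '__main__':
    api = ComputeAPI(lambda inst, **kw: print('rebuild cast:', inst['uuid'], kw))
    api.evacuate({'uuid': 'vm-1', 'image_ref': 'cirros-0.3'}, host='compute-02')
</nowiki></pre>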

When/if the failed node comes back online, further self-cleanup is performed in compute.init_host (stale instances and their networks are cleaned up in the hypervisor) to ensure that the recovered node is aware of the evacuated VMs and does not re-launch them. Evacuated VMs are locked while being evacuated to ensure a single handler, since the failed node might recover while the evacuation is still in progress (e.g. before the VM has been rebuilt on the target node).
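
The recovery-time cleanup idea can be sketched as follows; the helper name and its arguments are invented for illustration, whereas in Nova the corresponding logic would live in compute.init_host:

<pre><nowiki>
# Illustrative sketch (not Nova code): on a recovering compute node, any
# instance whose database record now points at another host was evacuated
# while this node was down, so the stale local copy must not be re-launched.

def cleanup_evacuated_instances(local_hostname, db_instances, driver_instances):
    """db_instances: {uuid: host recorded in the DB};
    driver_instances: uuids still present in the local hypervisor."""
    for uuid in driver_instances:
        owner = db_instances.get(uuid)
        if owner is not None and owner != local_hostname:
            # Stale local copy of an evacuated VM: it should be destroyed,
            # together with its local network plumbing, rather than restarted
            # (this sketch only reports it).
            print('stale instance %s now owned by %s' % (uuid, owner))


if __name__ == '__main__':
    cleanup_evacuated_instances(
        'failed-node',
        db_instances={'vm-1': 'target-node', 'vm-2': 'failed-node'},
        driver_instances=['vm-1', 'vm-2'])
</nowiki></pre>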

REST API

Admin API: POST v2/{tenant_id}/servers/{server_id}/action

where {server_id} is the server to evacuate; parameters: action=evacuate, host=the target compute node to rebuild the server on.
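
For illustration, such a request could be issued as in the sketch below; the JSON body shape follows the common server-action convention and is an assumption here, since the spec only fixes the action name and the host parameter (endpoint and token are placeholders):

<pre><nowiki>
# Illustrative client-side request; the payload shape is assumed, not
# specified above -- only action=evacuate and the target host are given.
import json
import requests

NOVA_ENDPOINT = 'http://nova-api:8774/v2/%(tenant_id)s'   # placeholder endpoint
TOKEN = 'ADMIN_TOKEN'                                      # placeholder admin token


def evacuate(tenant_id, server_id, target_host):
    url = '%s/servers/%s/action' % (NOVA_ENDPOINT % {'tenant_id': tenant_id},
                                    server_id)
    body = {'evacuate': {'host': target_host}}
    resp = requests.post(url, data=json.dumps(body),
                         headers={'X-Auth-Token': TOKEN,
                                  'Content-Type': 'application/json'})
    resp.raise_for_status()
    return resp


# evacuate('my-tenant', 'srv-1234', 'compute-02')
</nowiki></pre>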

Code Changes

Code changes should include an overview of what needs to change, and in some cases even the specific details.

Related entries

http://wiki.openstack.org/Rebuildforvms

Test/Demo Plan

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.

  • Ability to also evacuate instances which are paused, shut down, suspended, etc.
  • Automatically detecting a capable host to evacuate to: if a host is not specified, automatically select the best host using the scheduler; otherwise use the specified host. Requiring the administrator to choose the target host is not good for the user experience; choosing a host should be possible, but not required.

BoF agenda and discussion