
Difference between revisions of "Nova/InstanceLevelSnapshots"

(Libvirt Support)
  
== Instance-Level Snapshots ==
  
OpenStack should be able to take a snapshot of a running instance that includes all of its attached volumes, that is, a coordinated snapshot of multiple volumes for backup purposes. The snapshot operation should occur while the instance is paused and quiesced so that each snapshot is consistent both within itself and with respect to its sibling snapshots.
  
=== Existing Behaviour ===
  
Two snapshot operations currently exist in OpenStack.
  
# A user can create an image of a running instance using Nova's <tt>createImage</tt> API call. This creates a "template" from which new instances can be spun up; the image is referenced by Glance regardless of where it is stored. This operation captures only the contents of the root volume.
# A user can create a snapshot of an attached volume through Cinder. This snapshot is a new volume that can be managed just as any other Cinder volume.
 
  
  
The user must choose one of the two single-volume calls. I therefore propose adding functionality that allows a user to create a snapshot that includes both the root volume and all attached Cinder volumes in a single pseudo-atomic operation.
  
=== Timeline ===
  
Possibly Icehouse-3, depending on the required libvirt changes. Libvirt may need to expose additional operations for this feature to be properly supported. See below for further details.
  
=== API Changes ===
  
There are two paths that can be taken and I'm hoping to get consensus on the better path.
  
# Nova already has a command, <tt>createImage</tt>, for creating an image of an existing instance. This command could be extended to take an additional parameter, <tt>all-volumes</tt>, that signals the underlying code to capture all attached volumes in addition to the root volume. The semantics here are important: <tt>createImage</tt> is used to create a template image stored in Glance for later reuse. If the primary intent of this new feature is backup only, then it may not be wise to overlap the two operations in this way. On the other hand, this approach would introduce the least change to the existing API, requiring only the modification of an existing command instead of the addition of an entirely new one.
# If the feature's primary use is backup, then a new API call that leaves <tt>createImage</tt> untouched may be a better approach. This new call could be named <tt>createBackup</tt> and take the name of the instance as a parameter. Although it introduces a new member to the API reference, it would allow this feature to evolve without introducing regressions in any existing calls. These two calls could share code at some point in the future.
 
  
  
=== Snapshot Consistency ===
 
  
We should be able to support crash-consistent snapshots without placing many requirements on libvirt by simply pausing the VM, requesting snapshots, and resuming. A stretch goal would be to quiesce the instance to provide filesystem-consistent snapshots. The end goal is to use guest-assisted snapshots for application-consistent snapshots across all volumes.
  
=== Volume Naming Scheme ===
  
We will need a way for the user to tell which Cinder volumes are members of a particular snapshot. I propose storing the Glance snapshot UUID in the snapshot metadata of each Cinder volume.
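
A minimal sketch of this tagging idea, using plain dictionaries in place of real Cinder snapshot records. The metadata key <tt>instance_snapshot_id</tt> and both helper functions are hypothetical names chosen for illustration; nothing here calls the actual Cinder or Glance APIs.

```python
import uuid

# Hypothetical metadata key; the proposal does not name one.
GROUP_KEY = "instance_snapshot_id"


def tag_volume_snapshots(volume_snapshots, glance_snapshot_id):
    """Return copies of the records with the Glance snapshot UUID in their metadata."""
    tagged = []
    for snap in volume_snapshots:
        metadata = dict(snap.get("metadata", {}))
        metadata[GROUP_KEY] = glance_snapshot_id
        tagged.append({**snap, "metadata": metadata})
    return tagged


def members_of(volume_snapshots, glance_snapshot_id):
    """List the ids of the volume snapshots that belong to one instance-level snapshot."""
    return [
        snap["id"]
        for snap in volume_snapshots
        if snap.get("metadata", {}).get(GROUP_KEY) == glance_snapshot_id
    ]


# Stand-in for the UUID that Glance would assign to the root volume snapshot.
root_snapshot_id = str(uuid.uuid4())
snaps = tag_volume_snapshots([{"id": "snap-vdb"}, {"id": "snap-vdc"}], root_snapshot_id)
unrelated = [{"id": "snap-other", "metadata": {}}]
print(members_of(snaps + unrelated, root_snapshot_id))
```

With the UUID stored on every member, a user could list the Cinder snapshots belonging to one instance-level snapshot by filtering on a single metadata value.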
 
 
=== New Snapshot Logic ===
 
 
 
The basic pseudocode might look like:
 
 
 
<pre>volumes = get_all_attached_volumes()

PAUSE(quiesce=True)

UUID = snapshot_root_volume()

for volume in volumes:
    snapshot_volume(volume, UUID)            [volume_api.create_snapshot() via cinder]

wait for snapshots to complete

RESUME()

return response</pre>
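
The pseudocode above can be sketched in runnable form as follows. This is only an illustration under stated assumptions: <tt>FakeDriver</tt>, <tt>paused</tt> and <tt>snapshot_instance</tt> are hypothetical names, the snapshot callables are assumed to be synchronous (so there is no separate wait step), and the one deliberate addition is that the instance is resumed even if a snapshot fails.

```python
from contextlib import contextmanager


class FakeDriver:
    """Stand-in for a virt driver; records calls instead of touching a hypervisor."""

    def __init__(self):
        self.calls = []

    def pause(self, quiesce=False):
        self.calls.append("pause(quiesce=%s)" % quiesce)

    def resume(self):
        self.calls.append("resume")


@contextmanager
def paused(driver, quiesce=True):
    """Pause (and optionally quiesce) the instance, resuming even if a snapshot fails."""
    driver.pause(quiesce=quiesce)
    try:
        yield
    finally:
        driver.resume()


def snapshot_instance(driver, volumes, snapshot_root, snapshot_volume):
    """Snapshot the root volume, then each attached volume, while the guest is paused."""
    with paused(driver):
        root_id = snapshot_root()
        volume_ids = [snapshot_volume(volume, root_id) for volume in volumes]
    return {"root": root_id, "volumes": volume_ids}


driver = FakeDriver()
result = snapshot_instance(
    driver,
    volumes=["vol-1", "vol-2"],
    snapshot_root=lambda: "root-snap",
    snapshot_volume=lambda volume, root_id: "%s@%s" % (volume, root_id),
)
print(result)
print(driver.calls)
```

Wrapping the pause in a context manager keeps the resume in one place, so a failed volume snapshot cannot leave the guest paused.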
 
=== Libvirt Support ===
 
 
 
In the <tt>createImage</tt> path, Nova issues a call to libvirt's <tt>snapshot</tt> routine. This routine performs a pause, quiesce, snapshot, and resume in one operation as viewed from Nova. To implement multi-volume snapshots, each of these sub-actions must be exposed by libvirt so that Nova can control the snapshot flow. In other words, Nova must be able to issue separate <tt>pause</tt>, <tt>quiesce</tt>, <tt>snapshot</tt>, and <tt>resume</tt> commands so that the correct logic can be implemented from the Nova layer.
 
 
 
I still have some pending questions about what is possible in the current version of libvirt.
 
 
 
# Can I request a snapshot for a VM that's already in a <tt>paused</tt> state? I think the answer is "not yet" based on what I've seen so far.
# Can I quiesce a VM that's in a paused state without requesting a snapshot? I think the answer is "not yet" based on what I've seen so far.
 
 
 
 
 
Depending on the answers here, libvirt may require modifications before multi-volume snapshots can be implemented in Nova.
 
 
 
=== Snapshot Storage ===
 
 
 
Initially I assumed that the snapshots would be stored next to their parents, whether that be Glance or Cinder. But it might be nice to combine all of the snapshot images into a single OVF file that contains all volumes attached to the instance at the time of snapshot. Additional metadata could be included such as RAM and CPU architecture. The OVF file could then be uploaded to Glance so new instances could be created from it.
 

Revision as of 21:32, 24 February 2014

There are currently three ways in Nova to create a snapshot of one, some, or all of the volumes attached to a particular instance. None of these options, however, allows a user to quiesce I/O and then snapshot the volumes as a single transaction. This proposal explains how this missing feature could be implemented, which would substantially improve the snapshotting capabilities of Nova.

Existing Behaviour

These are the three ways of creating a snapshot-like object in Nova:

  1. create_image - takes a snapshot of the root volume and may take snapshots of the attached volumes depending on the volume type of the root volume. I/O is not quiesced.
  2. create_backup - takes a snapshot of the root volume with options to specify how often to repeat and how many previous snapshots to keep around. I/O is not quiesced.
  3. os-assisted-snapshot - takes a snapshot of a single Cinder volume. The volume is quiesced before the snapshot is initiated.

Proposed Changes

My general thesis is that I/O should be quiesced in all cases if the underlying driver supports it. Libvirt supports this feature and I would like to extend the existing functionality to take advantage of it.

It's not reasonable to change the names or behaviour of the existing public API calls. Instead I would like to create a new snapshot() call in the v3 API.

We only need a quiesce() call added to the driver; the rest of the implementation will live in the API layer, although it was suggested that the conductor would be the correct place for this code. Once implemented, the existing snapshot calls (image, backup, os-assisted) could use the underlying snapshot routines to achieve their expected results, leaving us with only one set of snapshot-related functions to maintain.

The new snapshot call would take at least one option, the drives that should be snapshotted:

snapshot(devices=['vda', 'vdb'])

A value of None implies all volumes.

This allows the user to snapshot only the root volume if a small bootable image is desired.
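
A minimal sketch of how the devices option might be resolved, assuming the attached devices are known as a list of names. The helper name select_devices and the decision to reject devices that are not attached are assumptions made for illustration; the proposal does not specify either.

```python
def select_devices(attached, devices=None):
    """Resolve the `devices` argument against the devices attached to an instance.

    None selects every attached device; otherwise each requested name must be attached.
    """
    if devices is None:
        return list(attached)
    missing = [name for name in devices if name not in attached]
    if missing:
        # Assumed behaviour: fail loudly rather than silently skip unknown devices.
        raise ValueError("devices not attached: %s" % ", ".join(missing))
    return list(devices)


attached = ["vda", "vdb", "vdc"]
print(select_devices(attached))           # None: all volumes
print(select_devices(attached, ["vda"]))  # root volume only
```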

There will be no exclusion based on volume type: both Glance and Cinder volumes will be snapshotted, each by its own mechanism. Otherwise we introduce unexpected behaviour that would be confusing to the user and difficult to explain.

The flow will look like:

  • call the compute node to quiesce
  • call the compute node to snapshot each individual Glance drive
  • call the volume driver to snapshot each Cinder volume
  • package the resulting snapshots together
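
The four steps above can be sketched as follows. Everything here is hypothetical: Recorder stands in for the compute node and volume driver, snapshot_flow is an illustrative name, and the "package" is only a dictionary that references each snapshot taken.

```python
class Recorder:
    """Fake compute and volume services that log each call in order."""

    def __init__(self):
        self.log = []

    def quiesce(self):
        self.log.append("quiesce")

    def snapshot_glance(self, device):
        self.log.append("glance:" + device)
        return "image-" + device

    def snapshot_cinder(self, device):
        self.log.append("cinder:" + device)
        return "snapshot-" + device


def snapshot_flow(devices, services):
    """Run the four steps in order; `devices` maps a device name to "glance" or "cinder"."""
    services.quiesce()
    images = {dev: services.snapshot_glance(dev)
              for dev, backend in devices.items() if backend == "glance"}
    volume_snapshots = {dev: services.snapshot_cinder(dev)
                        for dev, backend in devices.items() if backend == "cinder"}
    # Packaging is reduced to a dict that references every snapshot taken.
    return {"images": images, "volume_snapshots": volume_snapshots}


services = Recorder()
package = snapshot_flow({"vda": "glance", "vdb": "cinder", "vdc": "cinder"}, services)
print(services.log)
print(package)
```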


The packaged snapshot could be stored in either Swift or Glance. I think Glance is more appropriate, since the resulting collection of snapshots could be used to spin up a new instance. There is a pending proposal for "artifacts" in Glance that would suit this well: an instance template artifact would contain metadata that references the volume snapshots for an instance. These references would point to either images stored within Glance or snapshots contained in Cinder. Taken together, this data represents a point-in-time snapshot of an entire virtual machine.

If create_image and create_backup are updated to use this implementation, their behaviour will appear unchanged to the user, except that I/O is quiesced during the snapshot(s) and the result is therefore more reliable and useful.

Given this, I think it makes more sense to keep the implementation within the API layer of Nova, rather than moving it into the client, so that existing functions can share it.