Cinder/GuestAssistedSnapshotting

= QEMU guest-assisted snapshotting =

Related blueprints

 * https://blueprints.launchpad.net/nova/+spec/qemu-assisted-snapshots
 * https://blueprints.launchpad.net/cinder/+spec/qemu-assisted-snapshots
 * Gerrit topic: https://review.openstack.org/#/q/status:open+branch:master+topic:bp/qemu-assisted-snapshots,n,z

Goals

 * 1)  Add snapshot support for Cinder backing stores which lack internal snapshots (NFS, Gluster, etc.)
 * 2)  Create snapshots with quiesced I/O
 * 3)  (Phase 2) Enable snapshotting of all volumes on a VM

Prerequisites

 * QEMU guest agent installed in the guest (used for quiescing); snapshots still work without it, but I/O is not quiesced

Overview
Currently, GlusterFS + Cinder does not support snapshots. Snapshot support can be enabled by storing Cinder volume data as qcow2 files rather than as flat raw files (as is done today), and leveraging qcow2's snapshot functionality.
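As a minimal sketch of the qcow2 mechanism this relies on, the helper below builds a `qemu-img create` command line for a new, empty qcow2 file whose backing file is an existing image. The function name and the file names `base.img` and `overlay.qcow2` are hypothetical and chosen only for illustration; this is not the driver's actual code.

```python
def overlay_create_command(new_image, backing_image, backing_format="qcow2"):
    """Build the qemu-img argv that creates an empty qcow2 overlay.

    Hypothetical helper: new writes would land in new_image, while
    backing_image keeps the data as it was at snapshot time.
    """
    return [
        "qemu-img", "create",
        "-f", "qcow2",         # format of the new overlay image
        "-b", backing_image,   # image the overlay points to as its backing file
        "-F", backing_format,  # format of the backing image (e.g. raw or qcow2)
        new_image,
    ]


if __name__ == "__main__":
    print(" ".join(overlay_create_command("overlay.qcow2", "base.img", "raw")))
```

The command is returned as a list rather than executed so that a caller can log it or pass it to whatever process-execution helper it already uses.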

Creation of Snapshot

 * 1) User calls Cinder's snapshot-create API
 * 2) If the volume is detached, Cinder's GlusterFS driver manipulates the qcow2 files with qemu-img to create the snapshot; the attached-volume steps that follow are skipped.
 * 3) If the volume is attached, Cinder will:
 * 4)   Create a new snapshot with status 'creating'
 * 5)   Create a new (empty) qcow2 file on the GlusterFS share which references the current image as its backing file
 * 6)   Call Nova's create_volume_snapshot via novaclient and give it this filename and type 'qcow2'
 * 7)   Nova (compute/libvirt) scans the VM for the disk matching the supplied volume_id (disk serial)
 * 8)   Nova (compute/libvirt) calls libvirt's snapshotCreateXML operation with the REUSE_EXT flag to create the snapshot
 * 9)     First with the QUIESCE flag, then again without it if that fails
 * 10)    The new qcow2 file (created by Cinder, populated by libvirt) becomes the active image for the VM
 * 11)   Record information about the current active qcow2 file in the volume's snapshot info store (a file stored alongside the volume- files, like volume- .info)
 * 12)   Update the snapshot's status to 'available' or 'error' based on the above
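The quiesce-then-fallback behaviour on the Nova side could look roughly like the sketch below. It assumes a libvirt-python domain object, whose `snapshotCreateXML(xml, flags)` method is the Python binding for this operation. The helper names, the `error_type` parameter, and the choice to describe only a single disk are assumptions made for illustration. With libvirt-python installed, `base_flags` would typically be `libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY | libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT` and `quiesce_flag` would be `libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE`.

```python
import xml.etree.ElementTree as ET


def build_snapshot_xml(disk_target, new_file):
    """Describe an external, file-backed snapshot for one disk.

    Sketch only: other disks attached to the domain are not handled here.
    """
    root = ET.Element("domainsnapshot")
    disks = ET.SubElement(root, "disks")
    disk = ET.SubElement(disks, "disk", name=disk_target,
                         snapshot="external", type="file")
    # new_file is the qcow2 file Cinder created in advance (hence REUSE_EXT).
    ET.SubElement(disk, "source", file=new_file)
    return ET.tostring(root, encoding="unicode")


def snapshot_with_quiesce_fallback(domain, snapshot_xml, base_flags,
                                   quiesce_flag, error_type=Exception):
    """Try a quiesced snapshot first, then retry without quiescing.

    Returns True if the quiesced attempt succeeded, False if only the
    unquiesced retry succeeded. With libvirt-python, error_type would
    normally be libvirt.libvirtError.
    """
    try:
        domain.snapshotCreateXML(snapshot_xml, base_flags | quiesce_flag)
        return True
    except error_type:
        # For example, the guest agent is not installed or not responding.
        domain.snapshotCreateXML(snapshot_xml, base_flags)
        return False
```

Passing the flags in as parameters keeps the sketch importable without libvirt; the return value lets the caller record whether the resulting snapshot was quiesced.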

Deletion of Snapshot

 * 1) User calls Cinder's snapshot-delete API
 * 2) If the volume is detached, Cinder's GlusterFS driver manipulates the qcow2 files with qemu-img to merge the snapshot; the attached-volume steps that follow are skipped.
 * 3) If the volume is attached, Cinder will:
 * 4)   Set snapshot status to 'deleting'
 * 5)   Call Nova's delete_volume_snapshot operation via novaclient with the identifier(*) of the snapshot being merged
 * 6)   Nova will call libvirt's blockPull/blockCommit operations as appropriate to merge the snapshot data
 * 7)   Nova/libvirt will delete the qcow2 file that is no longer needed
 * 8)   Update volume's snapshot info store as needed (if this snapshot was the active one, change active image file)
 * 9)   Update the snapshot's status to 'deleted' or 'error_deleting'
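For the detached case, one way the qemu-img merge could be expressed is with `qemu-img commit`, which writes the changes recorded in a qcow2 file into its backing file. The sketch below is an illustration under that assumption, not the driver's actual logic; the helper names and the `overlay.qcow2` file name are hypothetical, and a full implementation would also have to delete the merged file and fix up any image that used it as a backing file.

```python
import subprocess


def commit_command(overlay_image):
    """Build the qemu-img argv that merges an overlay into its backing file."""
    return ["qemu-img", "commit", "-f", "qcow2", overlay_image]


def merge_overlay(overlay_image, runner=subprocess.check_call):
    """Run the merge; runner is injectable so the call can be logged or faked."""
    runner(commit_command(overlay_image))


if __name__ == "__main__":
    # Print the command instead of executing it.
    merge_overlay("overlay.qcow2", runner=lambda argv: print(" ".join(argv)))
```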

Volume Attach

 * 1) GlusterFS driver's initialize_connection must read the volume's snapshot info store to determine the appropriate filename for the active image
 * 2)   This filename is passed to Nova (as it is today)
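The lookup that initialize_connection needs could be sketched as follows. The on-disk format of the snapshot info store is not specified here, so this example assumes a JSON file with an `active` key holding the active image's filename; that layout, the function name, and the fallback to the initial filename when no info file exists are all assumptions for illustration.

```python
import json
import os


def active_image_path(info_path, initial_name):
    """Return the path of the image a VM should attach for this volume.

    info_path:    path of the (assumed JSON) snapshot info store
    initial_name: filename of the volume before any snapshot was taken
    """
    share_dir = os.path.dirname(info_path)
    if not os.path.exists(info_path):
        # No snapshot has been taken yet, so the original file is still active.
        return os.path.join(share_dir, initial_name)
    with open(info_path) as f:
        info = json.load(f)
    return os.path.join(share_dir, info.get("active", initial_name))
```

Keeping the lookup in one small function means the same code path could serve attach as well as other operations that need the head of the qcow2 chain.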

Snapshot data format

 * Initial volume filename is still volume-
 * When a snapshot is created, the filename will be volume- .
 * This qcow2 image has a backing file pointer to another volume- [.] file.

Changes required for Cinder

 * Cinder code to create qcow2 snapshots (per-driver code & options)
 * Cinder code to translate/process qcow2 snapshots for operations like upload-to-image, backup_create, clone
 * Cinder code to track active image's filename (qcow2 chain information) and use this to determine filename for initialize_connection
 * Cinder needs to embed novaclient

Changes required for Nova

 * libvirt driver implementation of volume_snapshot
 * compute API support for volume_snapshot   (nova/compute/api.py, rpcapi.py)
 * Nova API changes for volume_snapshot


 * libvirt driver implementation of volume_snapshot_delete
 * compute API support for volume_snapshot_delete   (nova/compute/api.py, rpcapi.py)
 * Nova API changes for volume_snapshot_delete

New create_volume_snapshot call

 * Can create a single snapshot
 * Each volume passed in will be provided with the information { volume_uuid, type : 'qcow2' or 'cinder', path: '/path/to/new/qcow2.img' }
 * type 'qcow2' = handle via libvirt in Nova
 * type 'cinder' = call cinderclient to snapshot a non-qcow2 volume (used if snapshotting multiple volumes at once)
 * Intended to be called by Cinder
 * Needs to be added to novaclient so that Cinder can call it
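The per-volume information and the dispatch on `type` might be handled along these lines. The field names come from the list above; the function name, the handler labels, and the rule that all three fields are required for every type are assumptions made for this sketch.

```python
# Which component handles each snapshot type (labels are illustrative only).
HANDLERS = {
    "qcow2": "libvirt",        # Nova snapshots the disk through libvirt
    "cinder": "cinderclient",  # Nova asks Cinder to snapshot the volume
}

REQUIRED_FIELDS = {"volume_uuid", "type", "path"}


def route_snapshot_request(request):
    """Validate one per-volume request dict and return its handler label."""
    missing = REQUIRED_FIELDS - set(request)
    if missing:
        raise ValueError("missing fields: %s" % ", ".join(sorted(missing)))
    try:
        return HANDLERS[request["type"]]
    except KeyError:
        raise ValueError("unsupported snapshot type: %r" % request["type"])


if __name__ == "__main__":
    example = {
        "volume_uuid": "00000000-0000-0000-0000-000000000000",  # placeholder
        "type": "qcow2",
        "path": "/path/to/new/qcow2.img",
    }
    print(route_snapshot_request(example))
```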

New delete_volume_snapshot call

 * Deletes a single volume snapshot
 * Takes parameters { volume_uuid, path }
 * Interacts with libvirt, using blockpull or blockcommit operations to merge the snapshot into the qcow2 chain

Cinder
Note: the Cinder APIs below are not required at this time and will likely be dropped.

new volume_actions API "create-snapshot-metadata"

 * Allows a snapshot record to be created from supplied metadata rather than by Cinder creating the snapshot itself (i.e. the snapshot was created by Nova). Cinder driver snapshot code is not called.
 * Metadata:
 * volume_id
 * Leave snapshot in "creating" status
 * This will be implemented as a volume_action

new snapshot_actions API "finalize-snapshot-metadata"

 * Finalize snapshot creation process, set status to available or failed
 * This will be implemented as a snapshot_action

new snapshot_actions API "snapshot-delete-metadata"

 * Deletes a snapshot without performing any real storage operation

Next Phases
Move as much of this Cinder code as possible out of the GlusterFS driver so that other drivers, such as NFS, can use the same scheme.

Related Notes

 * Some examples on libvirt's blockcommit and blockpull -- http://kashyapc.fedorapeople.org/virt/lc-2012/snapshots-handout.html
 * More notes on libvirt qcow2 based block operations -- http://kashyapc.fedorapeople.org/virt/lc-2012