Trove/volume-data-snapshot-design

Description

 * This feature is proposed as an additional backup/restore strategy.

slicknik (talk) Can you please clarify this? You mention that it will be an "addition", but nowhere do you mention how a user will be able to specify through the API how to actually take a database backup through a volume snapshot. Or is it meant as an "alternative" rather than an "addition"?

Volumes Snapshots

 * This introduction provides a high-level overview of the two basic resources offered by the OpenStack Block Storage service. The first is Volumes and the second is Snapshots, which are derived from Volumes.

slicknik (talk) The BP / spec is supposed to contain relevant information about requirements for the spec, and what changes are proposed to meet the requirements. If you feel that the intended audience might need more background about Volumes or Snapshots, you can link to the Cinder wiki / docs (e.g. https://wiki.openstack.org/wiki/Cinder), rather than try and cover all background details in the spec, since it distracts from the purpose of the spec.

Volumes

 * Volumes are allocated block storage resources that can be attached to instances as secondary storage or they can be used as the root store to boot instances. Volumes are persistent R/W Block Storage devices most commonly attached to the Compute node via iSCSI.

slicknik (talk) See note above

Snapshots

 * A Snapshot in OpenStack Block Storage is a read-only, point-in-time copy of a Volume. The Snapshot can be created from a Volume that is currently in use (via the use of '--force True') or in an available state. The Snapshot can then be used to create a new Volume via create from snapshot.

slicknik (talk) See note above

-

Backup workflow

 * The actual flow will be:

 * 1) ask the instance whether it has a volume

slicknik (talk) Who is asking whom, and based on what? Can we get some details here?

 * 2) prepare the database for the storage snapshot

slicknik (talk) Once again which components are "preparing" the database? What needs to be done to "prepare" the database? Does the FS need to be quiesced? What API calls need to be made?

 * 3) take the snapshot

slicknik (talk) Can a consistent snapshot be guaranteed based on the prepare steps above? Does it depend on the API calls made previously? Do all cinder drivers support an API call to quiesce the FS?

 * 4) return the database to its normal state

slicknik (talk) What does this entail? Is this even a task or a no-op?
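
The four backup steps can be sketched roughly as below. This is only an illustration under stated assumptions, not the proposed implementation: FakeDatabase, FakeBlockStorage, quiesced and backup_volume are hypothetical names, the in-memory fakes stand in for the guest datastore and for Cinder, and the "COMPLETED" / "FAILED" strings are illustrative state labels. The one point it tries to show is that the database is returned to its normal state even when the snapshot fails.

```python
from contextlib import contextmanager


class FakeDatabase:
    """Hypothetical stand-in for the datastore running on the guest."""

    def __init__(self):
        self.events = []

    def prepare_for_snapshot(self):
        # The real preparation (flush, lock, quiesce) is datastore-specific.
        self.events.append("prepare")

    def resume(self):
        self.events.append("resume")


class FakeBlockStorage:
    """Hypothetical stand-in for a Block Storage (Cinder) client."""

    def __init__(self, events, fail=False):
        self.events = events
        self.fail = fail

    def create_snapshot(self, volume_id):
        if self.fail:
            raise RuntimeError("snapshot failed")
        self.events.append("snapshot")
        return "snap-of-" + volume_id


@contextmanager
def quiesced(db):
    """Prepare the database and always return it to its normal state."""
    db.prepare_for_snapshot()
    try:
        yield
    finally:
        db.resume()


def backup_volume(volume_id, db, storage):
    """Run the backup steps and return (state, snapshot_id)."""
    # Step 1: the instance must have a volume to snapshot.
    if volume_id is None:
        return "FAILED", None
    try:
        # Steps 2-4: prepare, snapshot, then resume via the context manager.
        with quiesced(db):
            snapshot_id = storage.create_snapshot(volume_id)
    except RuntimeError:
        return "FAILED", None
    return "COMPLETED", snapshot_id
```

Which component would call each step, and over which API, is left open here, as it is in the workflow above.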

Restore workflow
 * 1) create a new volume from the given snapshot

slicknik (talk) Again, what API calls are involved here?

 * 2) swap the volume

slicknik (talk) Swap with what? How does the swap occur? Does the instance come up fully (i.e. prepare call is finished), and then the swap occurs? Or does it happen as part of prepare? If it's the former, what if the user writes data before the swap occurs? If it's the latter, how do you handle failures in prepare?

 * 3) update the backend record

slicknik (talk) Are we proposing to extend the backend record? What new updates need to be made, and to what fields? How do we deal with incremental backups?

 * 4) delete the initial volume

slicknik (talk) Seems extra work to create it just to delete it. Why can't we just boot an instance and attach the restored volume instead?
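
As a rough sketch of the ordering of the restore steps, assuming in-memory stand-ins rather than real Cinder or Trove calls: FakeBlockStorage, FakeInstance and restore_from_snapshot are hypothetical names, and the "backend record" is reduced to a single attribute purely for illustration.

```python
class FakeBlockStorage:
    """Hypothetical in-memory stand-in for the Block Storage service."""

    def __init__(self):
        self.volumes = {"vol-initial"}
        self._counter = 0

    def create_volume_from_snapshot(self, snapshot_id):
        self._counter += 1
        volume_id = "vol-restored-%d" % self._counter
        self.volumes.add(volume_id)
        return volume_id

    def delete_volume(self, volume_id):
        self.volumes.remove(volume_id)


class FakeInstance:
    """Hypothetical instance with one attached volume and a backend record."""

    def __init__(self, volume_id):
        self.volume_id = volume_id          # volume currently attached
        self.record_volume_id = volume_id   # volume id stored in the record

    def swap_volume(self, new_volume_id):
        old_volume_id = self.volume_id
        self.volume_id = new_volume_id
        return old_volume_id


def restore_from_snapshot(instance, storage, snapshot_id):
    """Run the four restore steps in order and return the new volume id."""
    # 1) create a new volume from the given snapshot
    new_volume_id = storage.create_volume_from_snapshot(snapshot_id)
    # 2) swap the volume
    old_volume_id = instance.swap_volume(new_volume_id)
    # 3) update the backend record
    instance.record_volume_id = new_volume_id
    # 4) delete the initial volume
    storage.delete_volume(old_volume_id)
    return new_volume_id
```

The sketch deletes the initial volume last, so that a failure in any earlier step leaves the original data in place; the sketch does not attempt to handle such failures.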

Recovery process

 * For example, if Cinder fails to create the snapshot, Trove has nothing to recover: it simply marks the backup as FAILED, and that is all.

slicknik (talk) I'm unsure what this section is addressing. Are you trying to detail out areas where failures may occur which we have to handle? If so, I definitely see at least a few cases: This is just a quick list and there are probably other error cases that I'm missing - so this needs to be given some more thought.
 * Unable to connect to cinder - (this probably already exists and can be reused)
 * Able to connect to cinder, but unable to snapshot an instance
 * Unable to quiesce FS so consistent snapshots are not possible
 * Able to snapshot, but restore volume from snapshot fails
 * Restore succeeds, but we're unable to "swap-out" volumes.
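
One possible way to model the failure cases listed above is one exception class per case, with a common handler that marks the record as FAILED. This is a minimal sketch under assumptions: every class name, the run_step helper and the dictionary-based record are hypothetical and do not correspond to existing Trove code.

```python
class SnapshotBackupError(Exception):
    """Base class for the hypothetical failure cases."""


class CinderUnavailable(SnapshotBackupError):
    """Unable to connect to cinder."""


class SnapshotFailed(SnapshotBackupError):
    """Connected to cinder, but the snapshot could not be taken."""


class QuiesceFailed(SnapshotBackupError):
    """The FS could not be quiesced, so the snapshot may be inconsistent."""


class RestoreFailed(SnapshotBackupError):
    """Creating a volume from the snapshot failed."""


class VolumeSwapFailed(SnapshotBackupError):
    """The restored volume could not be swapped in."""


def run_step(record, step):
    """Run one workflow step; on a known failure mark the record FAILED."""
    try:
        step()
    except SnapshotBackupError as exc:
        record["state"] = "FAILED"
        record["reason"] = type(exc).__name__
        return False
    return True
```

Recording the failing case alongside the FAILED state would let an operator tell a cinder outage apart from a failed volume swap; whether that is needed is a design decision for this spec.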

Justification

 * Data can be backed up in two ways:

 * 1) Standard backup strategies (innobackupex, nodetool snapshot) + Swift container (already implemented).
 * 2) Snapshot of the attached block storage (not implemented).
 * Basically, it is another way of backing up data through standard OpenStack capabilities.

slicknik (talk) Well if it's _just_ another way of doing it, why do it at all? We already have a perfectly good way of doing it today, so what's the benefit in adding something else like this if it is just more code to write / maintain? -

Benefits

 * A generic way to back up data. This feature is not specific to a datastore type or version. It makes Swift storage optional.

slicknik (talk) These are all good points for the justification. You should expand on these.

Impacts
This changes the behavior of backups made by Trove: it affects the already implemented backup process, which uses native database tools (mysqldump, nodetool, etc.) with Swift as the storage container service. The changes are backward compatible.

slicknik (talk) How are changes backwards compatible? What if I was using innobackupex for backups so far, and now I want to switch to using Volumes? What happens to my existing backups? How can I restore from these? You haven't touched upon any of these scenarios.

-

Configuration

 * Configuration parameters are guest specific.

slicknik (talk) What do you mean by "guest specific"? Are they meant to be different on different guest agents? Please clarify

-

slicknik (talk) What if I have existing backups using Swift Storage Strategy, and then you switch to Cinder? How do we handle that?

Database
No changes

slicknik (talk) How will we be able to distinguish a swift backup from a cinder backup if there are no DB changes to the backup? -

Public API
No changes

slicknik (talk) How does one specify which backup to take if there are no API changes?

Internal API
-

From trove-api to trove-taskmanager
No changes

-

From trove-taskmanager to trove-guestagent
No changes

slicknik (talk) How does the trove guest know which backup / storage strategy to pick / use if there are no internal API changes?

Guest Agent
- Changes are backward compatible and will be available for all datastores. This backup method is generic across all datastore types and versions.

slicknik (talk) How are changes backwards compatible? What if I was using innobackupex for backups so far, and now I want to switch to using Volumes? What happens to my existing backups? How can I restore from these? You haven't touched upon any of these scenarios. -