Trove/volume-data-snapshot-design

Data volume snapshot

Description

This feature is proposed as an addition to the existing backup/restore strategies.

slicknik (talk) Can you please clarify this? You mention that it will be an "addition", but nowhere do you mention how a user will be able to specify through the API how to actually take a database backup through a volume snapshot. Or is it meant as an "alternative" rather than an "addition"?

Volumes and Snapshots

This introduction provides a high-level overview of the two basic resources offered by the OpenStack Block Storage service. The first is Volumes and the second is Snapshots, which are derived from Volumes.

slicknik (talk) The BP / spec is supposed to contain relevant information about requirements for the spec, and what changes are proposed to meet the requirements. If you feel that the intended audience might need more background about Volumes, or Snapshots you can link to the Cinder wiki / docs (eg. https://wiki.openstack.org/wiki/Cinder), rather than try and cover all background details in the spec, since it distracts from the purpose of the spec.

Volumes

Volumes are allocated block storage resources that can be attached to instances as secondary storage or they can be used as the root store to boot instances. Volumes are persistent R/W Block Storage devices most commonly attached to the Compute node via iSCSI.

slicknik (talk) See note above

Snapshots

A Snapshot in OpenStack Block Storage is a read-only, point-in-time copy of a Volume. A Snapshot can be created from a Volume that is in an available state, or from one that is currently in use (by passing '--force True'). The Snapshot can then be used to create a new Volume via create from snapshot.

slicknik (talk) See note above

Backup workflow

The actual flow will be:

ask the instance whether it has a volume

slicknik (talk) Who is asking whom, and based on what? Can we get some details here?

prepare the database for the storage snapshot

slicknik (talk) Once again which components are "preparing" the database? What needs to be done to "prepare" the database? Does the FS need to be quiesced? What API calls need to be made?

snapshot

slicknik (talk) Can a consistent snapshot be guaranteed based on the prepare steps above? Does it depend on the API calls made previously? Do all cinder drivers support an API call to quiesce the FS?

return the database to its normal state

slicknik (talk) What does this entail? Is this even a task or a no-op?
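
One possible reading of the backup flow above, as a minimal Python sketch. FakeDatabase, FakeSnapshotService, prepared_for_snapshot and backup_volume are hypothetical names used only for illustration; they are not existing Trove or Cinder APIs. In particular, block_writes stands in for whatever datastore-specific preparation step is eventually chosen, and the sketch assumes the snapshot is forced because the volume is attached.

```python
from contextlib import contextmanager


class FakeDatabase:
    """Hypothetical stand-in for the datastore running on the guest."""

    def __init__(self):
        self.writes_blocked = False

    def block_writes(self):
        # Placeholder for the real, datastore-specific preparation step.
        self.writes_blocked = True

    def unblock_writes(self):
        self.writes_blocked = False


class FakeSnapshotService:
    """Hypothetical stand-in for the Block Storage snapshot API."""

    def __init__(self):
        self.snapshots = []

    def create_snapshot(self, volume_id, force=False):
        snapshot_id = "snap-%d" % (len(self.snapshots) + 1)
        self.snapshots.append((snapshot_id, volume_id, force))
        return snapshot_id


@contextmanager
def prepared_for_snapshot(database):
    """Prepare the database, and always return it to its normal state."""
    database.block_writes()
    try:
        yield database
    finally:
        database.unblock_writes()


def backup_volume(instance, database, storage):
    # Step 1: check that the instance has a volume at all.
    volume_id = instance.get("volume_id")
    if volume_id is None:
        raise ValueError("instance has no data volume to snapshot")
    # Steps 2-4: prepare, snapshot, return to the normal state.
    with prepared_for_snapshot(database):
        # The volume is attached (in use), so the snapshot is forced.
        return storage.create_snapshot(volume_id, force=True)
```

Wrapping the snapshot call in a context manager means the "return to normal state" step runs even when the snapshot call raises, which is one way to avoid leaving the database blocked after a failed backup.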

Restore workflow

create a new volume from the given snapshot

slicknik (talk) Again, what API calls are involved here?

swap the volume

slicknik (talk) Swap with what? How does the swap occur? Does the instance come up fully (i.e. prepare call is finished), and then the swap occurs? Or does it happen as part of prepare? If it's the former, what if the user writes data before the swap occurs? If it's the latter, how do you handle failures in prepare?

update backend record

slicknik (talk) Are we proposing to extend the backend record? What new updates need to be made, and to what fields? How do we deal with incremental backups?

delete initial volume

slicknik (talk) Seems extra work to create it just to delete it. Why can't we just boot an instance and attach the restored volume instead?
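
The four restore steps above could be sketched as follows. This is only an illustration under stated assumptions: FakeVolumeService, restore_from_snapshot, and the instance and records dictionaries are hypothetical and do not correspond to existing Trove or Cinder interfaces. Failure handling between steps is deliberately left out.

```python
import itertools


class FakeVolumeService:
    """Hypothetical stand-in for the Block Storage volume API."""

    def __init__(self, snapshots, volumes=None):
        self.snapshots = dict(snapshots)    # snapshot id -> size in GB
        self.volumes = dict(volumes or {})  # volume id -> size in GB
        self._ids = itertools.count(1)

    def create_volume_from_snapshot(self, snapshot_id):
        volume_id = "restored-%d" % next(self._ids)
        self.volumes[volume_id] = self.snapshots[snapshot_id]
        return volume_id

    def delete_volume(self, volume_id):
        del self.volumes[volume_id]


def restore_from_snapshot(instance, records, snapshot_id, storage):
    # Step 1: create a new volume from the given snapshot.
    new_volume_id = storage.create_volume_from_snapshot(snapshot_id)
    # Step 2: swap the volume attached to the instance.
    old_volume_id = instance["volume_id"]
    instance["volume_id"] = new_volume_id
    # Step 3: update the backend record.
    records[instance["id"]]["volume_id"] = new_volume_id
    # Step 4: delete the initial volume.
    storage.delete_volume(old_volume_id)
    return new_volume_id
```

Deleting the initial volume last keeps the old data available until the swap and the record update have both gone through.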

Recovery process

Suppose Cinder fails to create the snapshot. From Trove's point of view this is not a problem: the backup is marked as FAILED and no further action is taken.

slicknik (talk) I'm unsure what this section is addressing. Are you trying to detail out areas where failures may occur which we have to handle? If so, I definitely see at least a few cases:

Unable to connect to cinder - (this probably already exists and can be reused)
Able to connect to cinder, but unable to snapshot an instance
Unable to quiesce FS so consistent snapshots are not possible
Able to snapshot, but restore volume from snapshot fails
Restore succeeds, but we're unable to "swap-out" volumes.

This is just a quick list and there are probably other error cases that I'm missing - so this needs to be given some more thought.
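
As a rough illustration of the "mark it as FAILED" behavior, the sketch below runs named steps in order and records which one failed. The run_backup function, the backup dictionary, and the BUILDING and COMPLETED state names are assumptions made for this example; only the FAILED state comes from the text above.

```python
def run_backup(backup, steps):
    """Run (name, callable) steps in order; stop at the first failure."""
    backup["state"] = "BUILDING"
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Record which step failed so the error cases can be told apart.
            backup["state"] = "FAILED"
            backup["failed_step"] = name
            backup["error"] = str(exc)
            return backup
    backup["state"] = "COMPLETED"
    return backup


def failing_snapshot():
    # Simulates Cinder being reachable but unable to create the snapshot.
    raise RuntimeError("snapshot could not be created")


result = run_backup(
    {"id": "backup-1"},
    [("connect", lambda: None), ("snapshot", failing_snapshot)],
)
```

Recording the name of the failed step is one way to distinguish the cases listed above (connection failure, snapshot failure, restore failure) without introducing a separate state for each.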

Justification/Benefits

Justification

Data can be backed up in two ways:

Standard backup strategies (innobackupex, nodetool snapshot) with a Swift container (already implemented).
Snapshot of the attached block storage (not implemented).

Basically, it's just another way of backing up data through standard OpenStack capabilities.

slicknik (talk) Well if it's _just_ another way of doing it, why do it at all? We already have a perfectly good way of doing it today, so what's the benefit in adding something else like this if it is just more code to write / maintain?

Benefits

A generic way to back up the data. This feature is not specific to any datastore type or version. It also makes Swift storage optional.

slicknik (talk) These are all good points for the justification. You should expand on these.

Impacts

This changes the behavior of backups made by Trove: it affects the already implemented backup process that uses native database tools (mysqldump, nodetool, etc.) with Swift as the storage container service. The changes are backward compatible.

slicknik (talk) How are changes backwards compatible? What if I was using innobackupex for backups so far, and now I want to switch to using Volumes? What happens to my existing backups? How can I restore from these? You haven't touched upon any of these scenarios.

Configuration

Configuration parameters are guest specific.

slicknik (talk) What do you mean by "guest specific"? Are they meant to be different on different guest agents? Please clarify

Name	Type	Default	Available variants
backup_agent	String	trove.guestagent.backup.backupagent.SwiftAgent	trove.guestagent.backup.backupagent.CinderAgent
storage_strategy	String	Swift	Cinder

slicknik (talk) What if I have existing backups using Swift Storage Strategy, and then you switch to Cinder? How do we handle that?
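
To make the two parameters concrete, here is a small sketch of how a guest could resolve them. The option names, defaults and variants are taken from the table above; the resolve_backup_config helper and the rule that backup_agent must match storage_strategy are assumptions made for illustration and are not stated anywhere in this spec.

```python
# Values copied from the configuration table above.
AGENTS = {
    "Swift": "trove.guestagent.backup.backupagent.SwiftAgent",
    "Cinder": "trove.guestagent.backup.backupagent.CinderAgent",
}

DEFAULTS = {
    "backup_agent": AGENTS["Swift"],
    "storage_strategy": "Swift",
}


def resolve_backup_config(overrides=None):
    """Merge overrides into the defaults and check that they agree."""
    config = dict(DEFAULTS)
    config.update(overrides or {})
    strategy = config["storage_strategy"]
    if strategy not in AGENTS:
        raise ValueError("unknown storage_strategy: %s" % strategy)
    # Assumption: the agent and the storage strategy must be switched together.
    if config["backup_agent"] != AGENTS[strategy]:
        raise ValueError("backup_agent does not match storage_strategy")
    return config


cinder_config = resolve_backup_config(
    {"backup_agent": AGENTS["Cinder"], "storage_strategy": "Cinder"}
)
```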

Database

No changes

slicknik (talk) How will we be able to distinguish a swift backup from a cinder backup if there are no DB changes to the backup?

Public API

No changes

slicknik (talk) How does one specify which backup to take if there are no API changes?

Internal API

From trove-api to trove-taskmanager

No changes

From trove-taskmanager to trove-guestagent

No changes

slicknik (talk) How does the trove guest know which backup / storage strategy to pick / use if there are no internal API changes?

Guest Agent

Changes are backward compatible. Changes will be available for all datastores. This backup method is generic across all datastore types and versions.

slicknik (talk) How are changes backwards compatible? What if I was using innobackupex for backups so far, and now I want to switch to using Volumes? What happens to my existing backups? How can I restore from these? You haven't touched upon any of these scenarios.
