Difference between revisions of "Trove/incremental-backups"

Revision as of 15:52, 20 November 2013

Incremental Backups

Adding support for incremental backups and restores. The main difference between this and the current manual backups will be that these will be deleted when the instance is deleted. This is intended to be run as a backend scheduled task.

https://wiki.openstack.org/wiki/Trove/scheduled-tasks

Storing Metadata

To create incremental backups we need to store a little extra information with both full and incremental backups. There are two options for storing this metadata.

1. Add meta field to the backups table in trove:

ALTER TABLE `backups`
    ADD COLUMN `meta` varchar(1024) DEFAULT NULL,
    ADD COLUMN `parent_id` varchar(36) DEFAULT NULL,
    ADD COLUMN `automated` tinyint(1) DEFAULT 0;

PRO: More secure as the data will be stored only in trove database.
CON: The meta data will need to be passed to and from the guest, making the logic more complex.

2. Use swift metadata http://docs.openstack.org/api/openstack-object-storage/1.0/content/object-metadata.html to store any extra metadata for the backup.

curl -X POST -i \
    -H "X-Auth-Token: my-token" \
    -H "X-Object-Meta-Parent: https://storage.com/v1/XXX/path_to_parent_backup" \
    -H "X-Object-Meta-LSN: 123456789" \
    https://storage.com/v1/XXX/path_to_incremental_backup

PRO: No database change. No need to pass extra info to the guest as a simple HEAD call on the swift file will get the meta info. The guest can easily update the metadata on the swift objects it writes.
CON: Less secure, as users can change these fields (possibly these fields could be encrypted with the same key as the backups).
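
As a rough illustration of option 2, the sketch below builds the Swift metadata headers shown in the curl example and reads them back from a set of response headers, as a HEAD call would return them. The function names and the returned key names ("parent", "lsn") are assumptions made for this example, not existing Trove code.

```python
META_PREFIX = "x-object-meta-"


def build_meta_headers(token, parent_url, lsn):
    """Return the headers for a POST that sets backup metadata on a Swift object."""
    return {
        "X-Auth-Token": token,
        "X-Object-Meta-Parent": parent_url,
        "X-Object-Meta-LSN": str(lsn),
    }


def parse_meta_headers(headers):
    """Collect the X-Object-Meta-* values from response headers.

    Header names are matched case-insensitively and returned lower-cased
    with the prefix removed.
    """
    meta = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered.startswith(META_PREFIX):
            meta[lowered[len(META_PREFIX):]] = value
    return meta
```

The guest would send the first set of headers after writing the backup object and use the second function on the parent object's headers before starting the next incremental backup.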

Incremental Workflow

API:

1. A full backup must be made by the user or an automated system. (This backup ID is used as the parent ID.)
2. An API call is made to run an incremental backup (two new optional fields):

POST /backups
{'backup_type': 'incremental',
 'instance_id': 'instance_id',
 'parent_id': '<parent_id>'}

3. An entry in the backup table is created.
4. The backup handler looks up the information on the parent backup (error if it doesn't exist).
5. A message is sent to the guest to run the backup:

backup_info = {
       'backup_type': 'incremental',
       'meta': <meta_info_from_parent>,
       'parent_location': <parent_location>,
       'parent_id': <parent_id>}
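
Steps 4 and 5 could be sketched roughly as follows, assuming the metadata lives in the trove database (option 1). The `backups` dict stands in for the backups table, and `BackupNotFound`, `build_backup_info` and the record keys `meta` and `location` are hypothetical names chosen for this example.

```python
class BackupNotFound(Exception):
    """Raised when the requested parent backup does not exist."""


def build_backup_info(backups, parent_id):
    """Build the message sent to the guest for an incremental backup.

    `backups` is a hypothetical mapping of backup ID to a record with
    'meta' and 'location' keys, standing in for the backups table.
    """
    parent = backups.get(parent_id)
    if parent is None:
        # Step 4: error out if the parent backup cannot be found.
        raise BackupNotFound(parent_id)
    # Step 5: payload in the same shape as backup_info above.
    return {
        "backup_type": "incremental",
        "meta": parent["meta"],
        "parent_location": parent["location"],
        "parent_id": parent_id,
    }
```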

GUEST:

1. The backup agent will be changed to accept a type passed in by the API.
2. A new backup type of incremental uses the metadata to perform the backup. For xtrabackup this requires passing an 'lsn' number to the backup command:

/usr/bin/innobackupex  --stream --ibbackup=xtrabackup \
   --incremental --incremental-lsn=%(lsn)s

3. After a successful backup, record the metadata along with the other backup info (location, checksum, etc.). For xtrabackup that will include the last LSN number of the backup.
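
A minimal sketch of how the guest might fill in the command template from the parent's metadata. The `lsn` key inside `meta` and the function name are assumptions; the command string is the one given above, joined onto one line.

```python
import re

# Same command as above, with the LSN left as a named placeholder.
INCREMENTAL_CMD = (
    "/usr/bin/innobackupex --stream --ibbackup=xtrabackup "
    "--incremental --incremental-lsn=%(lsn)s"
)


def incremental_command(meta):
    """Return the incremental backup command for the given parent metadata."""
    lsn = str(meta["lsn"])
    # Refuse anything that is not a plain number before it reaches a shell.
    if re.fullmatch(r"[0-9]+", lsn) is None:
        raise ValueError("invalid LSN: %r" % lsn)
    return INCREMENTAL_CMD % {"lsn": lsn}
```

Validating the LSN matters most if the metadata is stored in Swift, where users can edit it.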

Full backup changes (innobackupex)

Change the current full backup process so that it records the metadata produced by the backup run.

def check_process():
    # Check the backup output for 'completed OK!'.
    # Parse out the LSN number.
    # Update the metadata property on the backup.
    return True
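
One way the parsing step could look is sketched below. It assumes the innobackupex log contains a line of the form "The latest check point (for incremental): '<number>'"; that format, the function name and the returned `lsn` key are assumptions to verify against the xtrabackup version in use.

```python
import re

# Assumed format of the checkpoint line in the innobackupex log.
LSN_PATTERN = re.compile(r"The latest check point \(for incremental\): '(\d+)'")


def parse_backup_log(output):
    """Return the backup metadata parsed from the log, or None on failure."""
    # A successful run ends with 'completed OK!'.
    if "completed OK!" not in output:
        return None
    match = LSN_PATTERN.search(output)
    if match is None:
        return None
    return {"lsn": match.group(1)}
```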

Restore Workflow (metadata in db)

During the create instance call, if a backup ID is passed in restorePoint, the API will need to change slightly:

API:

1. Check that the backup file exists and the checksum matches.
2. Check if the backup has a parent:
   a. Check that the parent exists and its checksum matches, and add it to the backup_info object.
   b. Check if the parent backup has a parent and repeat.
3. Send the backup_info to the guest to do the restore.

GUEST:

1. Read the backup_info and select the correct restore logic by backup_type.
2. Perform the restore on all parents if they exist.
3. Perform the restore on the backup.
4. Finish the guest install.
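
The parent lookup on the API side and the restore order on the guest can be sketched as a walk up the `parent_id` chain. The `backups` dict again stands in for the backups table, and `restore_order` is a hypothetical helper; existence and checksum checks against storage are left out.

```python
def restore_order(backups, backup_id):
    """Return backup IDs ordered from the full backup to the requested one.

    `backups` is a hypothetical mapping of backup ID to a record whose
    'parent_id' is None for a full backup.
    """
    chain = []
    seen = set()
    current = backup_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in backup chain at %s" % current)
        if current not in backups:
            raise KeyError("backup %s does not exist" % current)
        seen.add(current)
        chain.append(current)
        current = backups[current].get("parent_id")
    # The walk goes child to parent; restores run parent first.
    chain.reverse()
    return chain
```

The guest would then apply the full backup first and each incremental after it, in the returned order.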
