Difference between revisions of "Trove/PointInTimeRecovery"

Revision as of 18:18, 8 April 2014

Trove. Point-in-Time recovery.

Introduction

Every once in a while, an event corrupts a database. Most of us have made a stupid mistake at least once that trashed a database. What do you do when this happens? If you do not have a database backup, you had better own up to the problem you caused and tell your boss that you screwed up. If you have at least a complete database backup, you will most likely be able to recover the corrupted database up to the point at which the data was corrupted. This article discusses how to use a point-in-time restore to recover your databases.
If you search for "point in time recovery" you will also find "point in time restore", so we need to decide what to call it. Historically, databases have called this feature point-in-time recovery.

What is a point-in-time recovery?

A point-in-time recovery restores a database to a specified date and time. When the recovery completes, your database is in the state it was in at the date and time you specified. It is a method of recovering your database to any point in time since the last database backup.

What does it take to do a point-in-time recovery?

To perform a point-in-time recovery you need an entire series of backups (complete, differential, and transaction log backups) up to or beyond the point in time to which you want to recover. If any backup is missing, or if the transaction log was truncated without first taking a transaction log backup, you will not be able to perform a point-in-time recovery. At a minimum, you need a complete backup and all the transaction log backups taken after it. If you also take differential backups, you need the complete backup, the last differential backup taken before the corruption, and all the transaction log backups taken after that differential backup.
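The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Backup` type, its `kind` values and the `restore_chain` function are hypothetical names invented for this example and are not part of Trove or of any datastore. It also assumes that a log backup taken at time T covers all transactions up to T.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    # Hypothetical record type; kind is "full", "differential" or "log".
    kind: str
    taken_at: datetime


def restore_chain(backups: List[Backup], target: datetime) -> List[Backup]:
    """Return the backups to apply, in order, to reach `target`."""
    ordered = sorted(backups, key=lambda b: b.taken_at)

    # Start from the most recent complete backup at or before the target.
    fulls = [b for b in ordered if b.kind == "full" and b.taken_at <= target]
    if not fulls:
        raise ValueError("no complete backup at or before the target time")
    base = fulls[-1]
    chain = [base]
    start = base.taken_at

    # Optionally shorten the chain with the last differential before the target.
    diffs = [b for b in ordered
             if b.kind == "differential" and base.taken_at < b.taken_at <= target]
    if diffs:
        chain.append(diffs[-1])
        start = diffs[-1].taken_at

    if target == start:
        return chain

    # Apply every log backup after the starting point, up to and including
    # the first one that reaches the target time.
    for b in ordered:
        if b.kind == "log" and b.taken_at > start:
            chain.append(b)
            if b.taken_at >= target:
                return chain

    raise ValueError("transaction log backups do not cover the target time")
```

If no log backup reaches the requested time, the function raises instead of returning a partial chain, which mirrors the statement that a missing backup makes point-in-time recovery impossible.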

Description

OpenStack DBaaS Trove can restore an instance (a whole new instance, from scratch) from a backup previously stored in remote storage (OpenStack Swift, Amazon AWS S3, etc.). From the perspective of both administrators and regular users, Trove should also be able to perform point-in-time recovery.

Justification/Benefits

Justification

From the user's perspective, I want to be able to restore my data at any time, but today users can do so only at provisioning (restoring at provisioning covers only half of the use case). The actual difference between restore (in Trove terms) and recovery is when the user can perform the operation: restore happens at provisioning, recovery whenever the user needs it.

Benefits

Restore makes it possible to spin up a new instance from a backup (as mentioned earlier), whereas recovery makes it possible to restore an already running instance from a backup. Initially, Trove would be able to recover a running instance from a full backup.

Impacts

All proposed changes are backward compatible. The feature improves how backups are used and extends the restore API.

Database

There are no expected changes to the database

Configuration

There are no expected changes to the configuration

Public API

New routes will be added. The public recovery API is described below.

ReST routes

HTTP method Routes
POST {tenant_id}/instances/{instance_id}/recover or {tenant_id}/instances/{instance_id}/restore
Question. Which route is more appropriate?

Request body

{
   "recovery": {
       "instance": "UUID",
       "backup": "UUID"
   }
}
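As a rough sketch of how a client might assemble this call, the following Python function builds the method, path and JSON body from the route and request body shown above. The function name `build_recovery_request` and the `action` parameter are assumptions made for this example only; the blueprint leaves open whether the final route segment is `recover` or `restore`, so both are accepted here. No HTTP request is sent.

```python
import json
import uuid


def build_recovery_request(tenant_id, instance_id, backup_id, action="recover"):
    """Return (method, path, body) for the proposed recovery call.

    Hypothetical helper: it only assembles the pieces described in the
    blueprint and does not talk to a Trove endpoint.
    """
    if action not in ("recover", "restore"):
        raise ValueError("action must be 'recover' or 'restore'")

    # uuid.UUID raises ValueError when the string is not a valid UUID.
    for value in (instance_id, backup_id):
        uuid.UUID(value)

    # Path relative to the API root, following the route in the blueprint.
    path = "{0}/instances/{1}/{2}".format(tenant_id, instance_id, action)
    body = {"recovery": {"instance": instance_id, "backup": backup_id}}
    return "POST", path, json.dumps(body)
```

Validating both identifiers before building the body keeps malformed input from reaching the API at all.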

Response object

{
   "recovery": {
       "id": "UUID",
       "name": "instance",
       "status": "BUILDING",
       "datastore": "mysql",
       "recovered_from_backup": "backup_id",
       "point_in_time": "2011-01-22T13:25:27-06:00"
   }
}

Internal API. Trove taskmanager RPC API

RPC message

RPC method Method parameters
do_instance_recovery instance_id, backup_id

RPC message type

CAST

Internal API. Trove guestagent RPC API

RPC message

RPC method Method parameters
do_recovery
 {
     "backup_info": {
         "id": "backup_id",
         "location": "location",
         "type": "backup_type",
         "checksum": "checksum"
     }
 }

RPC message type

CAST
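The two casts described above can be pictured as a chain of fire-and-forget messages: the API casts `do_instance_recovery` to the taskmanager, which in turn casts `do_recovery` with the backup details to the guest agent. The sketch below models that flow with an in-memory queue. `FakeBus`, `lookup_backup`, `drain`, the topic names and the placeholder backup values are all assumptions made for illustration; this is not Trove's RPC layer.

```python
from collections import deque


class FakeBus:
    """In-memory stand-in for a message bus; cast() enqueues and returns nothing."""

    def __init__(self):
        self.queue = deque()

    def cast(self, topic, method, **kwargs):
        self.queue.append((topic, method, kwargs))


def lookup_backup(backup_id):
    # Hypothetical lookup; the values are the placeholders from the blueprint.
    return {
        "id": backup_id,
        "location": "location",
        "type": "backup_type",
        "checksum": "checksum",
    }


def do_instance_recovery(bus, instance_id, backup_id):
    """Taskmanager side: resolve the backup, then cast to the guest agent."""
    bus.cast("guestagent." + instance_id, "do_recovery",
             backup_info=lookup_backup(backup_id))


def do_recovery(bus, backup_info):
    """Guest side: a real agent would reuse its restore code here."""
    return "restoring from " + backup_info["location"]


HANDLERS = {
    "do_instance_recovery": do_instance_recovery,
    "do_recovery": do_recovery,
}


def drain(bus):
    """Deliver queued messages in order and return a log of (method, result)."""
    log = []
    while bus.queue:
        _topic, method, kwargs = bus.queue.popleft()
        log.append((method, HANDLERS[method](bus, **kwargs)))
    return log
```

Because both messages are casts, the caller never waits for a result, so the outcome of a recovery would have to be observed through the instance status rather than through a return value.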

Guest Agent

All changes made to the agent are backward compatible. The existing restore functionality (restore from a full backup) is reused.

Points to consider

Renaming. Possible names: Recover from the backup. Instance data recovery from the backup.
<SlickNik> - For something to be "point in time" recovery, you should be able to recover to any point in time (with reasonable limits on granularity). This doesn't enable that.
The answer - from this point of view, the available points in time are defined by the existing backups: if the user has a set of backups, their create_at times are the only points in time from which the user is able to recover.

<SlickNik> Then this is not "Point in Time" recovery. "Point in time" is an industry standard term, and refers to the ability for the user to restore from _any_ point in time (not from explicit snapshots). For a more detailed explanation about what point in time recovery is, please see the following:


<Denis M.> SlickNik, please suggest an appropriate name for this feature. I'm open to discussion.

<SlickNik> - This currently doesn't enable anything new over restoring to a new instance (and additionally has the issue that we may overwrite valid data by mistake, making it dangerous).
The answer - yes, this feature reuses the restore functionality, but, as said in previous topics, it avoids quota usage, and applying a backup to an ACTIVE instance is faster than provisioning a new one with pre-defined data. It is also not dangerous from the developer's perspective: the API is a service offered to the end user, so I want to say that it is up to the user what he wants to do with his data.

Long term goals

This feature will be very useful once replication arrives. A simple use case (from section A) is the join operation: the user has two standalone servers and wants to use instance A as the master node and instance B as the slave node. The most reliable way is to create a backup of the master node, apply it to the slave node, and then perform the join (which is datastore-specific).