Difference between revisions of "Trove/Specs/Trove-v1-MySQL-Replication"

Revision as of 14:16, 22 April 2014

Description

Support for the various replication use cases is critical for using Trove in production. The first-phase implementation of replication in Trove will provide the functionality laid out in the Trove V1 Replication Blueprint.

Use Case Summary

The following use cases will be addressed by this V1 implementation:

A. Read Replicas (Slaves)

The master can exist before the slave, so the master may already contain data
N slaves for one master
Slaves can be marked read-only (read-only will be the default)
A slave can be detached from its "replication set" to act as an independent site
A pre-existing non-replication site can become the master of a new "replication set"
The health of a slave will be monitorable

Design

Trove API

Create Slaves

POST /instances/{id}/replicate

{
    "count": 2,
    "read_only": true
}
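
As an illustration only, the request above could be assembled as follows. The helper name build_replicate_request is hypothetical, and the sketch assumes read_only is sent as a JSON boolean and that count must be at least 1; neither is confirmed by this spec.

```python
import json


def build_replicate_request(instance_id, count, read_only=True):
    """Return (path, body) for the 'Create Slaves' call (hypothetical helper)."""
    if count < 1:
        # Assumption: a request for zero slaves is rejected client-side.
        raise ValueError("count must be at least 1")
    path = "/instances/{0}/replicate".format(instance_id)
    body = json.dumps({"count": count, "read_only": read_only}, sort_keys=True)
    return path, body


path, body = build_replicate_request("abc123", 2)
print("POST", path)
print(body)
```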

Stop Replication

POST /instances/{id}/detach

{
    "empty body?"
}

Python-Troveclient

trove replicate <master instance> <slave count> --read-only=<boolean>

trove detach_replication <slave instance>
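
The two commands could be parsed roughly as below. This is a standalone argparse sketch, not the actual python-troveclient command registration; the accepted boolean spellings and the read-only default of true (taken from the use case summary) are assumptions.

```python
import argparse


def parse_bool(text):
    """Accept a few common spellings of true/false (assumed set)."""
    value = text.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % text)


def build_parser():
    parser = argparse.ArgumentParser(prog="trove")
    commands = parser.add_subparsers(dest="command")

    replicate = commands.add_parser("replicate")
    replicate.add_argument("master")
    replicate.add_argument("count", type=int)
    replicate.add_argument("--read-only", dest="read_only",
                           type=parse_bool, default=True)

    detach = commands.add_parser("detach_replication")
    detach.add_argument("slave")
    return parser


args = build_parser().parse_args(
    ["replicate", "master-1", "2", "--read-only=false"])
print(args.command, args.master, args.count, args.read_only)
```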

Taskmanager

The taskmanager will implement 2 API calls:

create_replication(master=<instance>, slave_flavor=<flavor id>, slave_count=<N>, read_only=true)
detach_replication(slave=<instance>)

taskmanager.create_replication

The Create Replication task will be performed with the following steps:

1. Execute guestagent.get_replication_snapshot() on the master site, receiving the "master snapshot results metadata".
2. N times:
   a. Create a Trove instance of the given flavor.
   b. Execute guestagent.attach_replication_slave() on the new instance.
   c. Update the instance metadata to add a "topology" section.
3. Delete the replication snapshot from Swift.
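
The steps above can be sketched as a small orchestration function. This is a minimal sketch, not the taskmanager implementation: FakeGuest, the create_instance and delete_snapshot callables, the placeholder snapshot URL, and the layout of the "topology" section are all hypothetical, and the guest calls use the method names listed in the Trove GuestAgent section.

```python
class FakeGuest(object):
    """Stand-in for a guestagent client; records the calls it receives."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.metadata = {}

    def get_replication_snapshot(self):
        self.log.append((self.name, "get_replication_snapshot"))
        # Placeholder metadata; the real content is datastore-specific.
        return {"dataset": {"snapshot_href": "http://swift.example/snap"}}

    def attach_replication_slave(self, snapshot):
        self.log.append((self.name, "attach_replication_slave"))


def create_replication(master, slave_flavor, slave_count,
                       create_instance, delete_snapshot, read_only=True):
    # Step 1: snapshot the master once; every slave reuses it.
    snapshot = master.get_replication_snapshot()
    slaves = []
    # Step 2: create, attach and annotate each slave.
    for _ in range(slave_count):
        slave = create_instance(slave_flavor)
        slave.attach_replication_slave(snapshot)
        # Hypothetical "topology" layout; the spec leaves it undefined.
        slave.metadata["topology"] = {"master": master.name,
                                      "read_only": read_only}
        slaves.append(slave)
    # Step 3: the snapshot is no longer needed once all slaves exist.
    delete_snapshot(snapshot)
    return slaves


log = []
deleted = []
master = FakeGuest("master", log)
names = iter(["slave-1", "slave-2"])
slaves = create_replication(
    master, slave_flavor="7", slave_count=2,
    create_instance=lambda flavor: FakeGuest(next(names), log),
    delete_snapshot=deleted.append,
)
print([s.name for s in slaves])
```

Taking the snapshot once and sharing it across all N slaves keeps the load on the master independent of the slave count, which is why the delete happens only after the loop.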

After the Create Replication task has completed, the instance object for the master instance will look like:

    Insert master object structure

taskmanager.detach_replication

Executes guestagent.detach_replication_slave() for the selected instance.

Trove GuestAgent

There will be 3 new methods added to the guestagent API:

get_replication_snapshot()
attach_replication_slave()
detach_replication_slave()

Replication will be built around a replication snapshot. This snapshot will contain the data necessary to set up a slave to replicate from the site which created the snapshot, typically a URI to the user's data set stored in Swift plus the metadata required to coordinate replication.

Each datastore implementation will need to implement these methods. The content of the image uploaded to Swift is opaque to the taskmanager and higher components, so the guest agent is free to store whatever data it chooses, in whichever format is most appropriate. The content of the metadata is specific to the datastore, but will be represented as a JSON object.
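
One way to picture this contract is an abstract base class that each datastore subclasses. The class names ReplicationGuestAgent and InMemoryAgent and the method arguments are assumptions made for illustration; only the three method names come from this spec. The round trip through json shows the metadata staying opaque to everything between the two guests.

```python
import abc
import json


class ReplicationGuestAgent(abc.ABC):
    """The three guestagent methods each datastore would implement."""

    @abc.abstractmethod
    def get_replication_snapshot(self):
        """Return JSON-serializable snapshot metadata."""

    @abc.abstractmethod
    def attach_replication_slave(self, snapshot):
        """Load the snapshot and start replicating from its source."""

    @abc.abstractmethod
    def detach_replication_slave(self):
        """Stop replicating and become an independent site."""


class InMemoryAgent(ReplicationGuestAgent):
    """Toy datastore used only to exercise the interface."""

    def __init__(self):
        self.master = None

    def get_replication_snapshot(self):
        return {"dataset": {"datastore": "example"}, "position": 0}

    def attach_replication_slave(self, snapshot):
        self.master = snapshot

    def detach_replication_slave(self):
        self.master = None


source, replica = InMemoryAgent(), InMemoryAgent()
# The taskmanager only passes the JSON along; it never inspects it.
wire = json.dumps(source.get_replication_snapshot())
replica.attach_replication_slave(json.loads(wire))
print(replica.master)
```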

Trove Guestagent - MySQL Datastore Implementation

get_replication_snapshot()

The MySQL guestagent will use xtrabackup to create a backup of the user's data and upload it to Swift. The metadata will include a URI of the uploaded backup data, along with the site's binlog position and network information required to set up replication.

{
    "master": {
        "host": "192.168.0.1",
        "port": 3306
    },
    "dataset": {
        "datastore": "mysql",
        "datastore_version": "mysql-5.5",
        "dataset_size": 2,
        "snapshot_href": "http://..."
    },
    "binlog_position": <binlog position>
}

attach_replication_slave()

Injects the copy of the master's data into the selected site, then configures the site to receive replicated updates from the master site.

After being attached, the instance object will look like:

	insert slave instance object
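
For MySQL, the "configures the site to receive replicated updates" step would typically end in a CHANGE MASTER TO statement built from the snapshot metadata. The sketch below assumes that binlog_position is an object with log_file and position keys, and that replication credentials are supplied separately; the spec defines neither. Values are interpolated without escaping for brevity, so a real implementation would need to quote them safely.

```python
def change_master_statement(snapshot, repl_user, repl_password):
    """Build a CHANGE MASTER TO statement from snapshot metadata (sketch)."""
    master = snapshot["master"]
    binlog = snapshot["binlog_position"]  # assumed shape, see lead-in
    return (
        "CHANGE MASTER TO "
        "MASTER_HOST='{host}', MASTER_PORT={port}, "
        "MASTER_USER='{user}', MASTER_PASSWORD='{password}', "
        "MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos}"
    ).format(
        host=master["host"],
        port=int(master["port"]),
        user=repl_user,
        password=repl_password,
        log_file=binlog["log_file"],
        log_pos=int(binlog["position"]),
    )


snapshot = {
    "master": {"host": "192.168.0.1", "port": 3306},
    # Hypothetical values; the spec leaves <binlog position> unspecified.
    "binlog_position": {"log_file": "mysql-bin.000001", "position": 107},
}
statement = change_master_statement(snapshot, "repl", "secret")
print(statement)
```

After this statement succeeds, START SLAVE would begin applying the master's updates from the recorded position.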

detach_replication_slave()

Stops the slave from replicating from the master. After the instance has been detached from the master, it is an independent copy of the master's data and a fully functional site on its own.

After being detached, the instance object will look like:

	insert ex-slave instance object
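
On the MySQL side, detaching might come down to a short sequence of statements. This is a guess at one possible sequence, not something the spec prescribes: the helper name is hypothetical, RESET SLAVE ALL requires MySQL 5.5.16 or later, and clearing read_only on detach is an assumption.

```python
def detach_statements(make_writable=True):
    """Return the SQL a guest might run to detach a slave (sketch)."""
    # Stop the replication threads, then discard the master connection info.
    statements = ["STOP SLAVE", "RESET SLAVE ALL"]
    if make_writable:
        # Assumption: a detached site should accept writes again.
        statements.append("SET GLOBAL read_only = OFF")
    return statements


for sql in detach_statements():
    print(sql)
```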
