Difference between revisions of "Trove/Specs/Trove-v1-MySQL-Replication"

Revision as of 19:03, 30 April 2014

Description

Support for the various replication use cases is critical for production use of Trove. The first-phase implementation of replication in Trove will provide the functionality laid out in the Trove V1 Replication Blueprint.

Use Case Summary

The following use cases will be addressed by this V1 implementation:

A. Read Replicas (Slaves)

The master can exist before the slave such that the master already contains data
N slaves can be created for one master
Slaves can be marked read-only (read-only will be default)
A slave can be detached from its "replication set" to act as an independent site
A pre-existing non-replication site can become the master of a new "replication set"
The health of a slave will be monitor-able

Design

Trove API

Create Slaves

POST /instances/{id}/action

{
  "replicate": {
    "count": 2,
    "instance": {
      "availability_zone": "{zone}",
      "flavorRef": "{flavor}",
      "configuration": "{ref}",
      "volume": { "size": 1 }
    },
    "topology": {
      "slave_of": [{"id": "{id}"}],
      "read_only": true
    }
  }
}

Notes:

id in the resource URI is the id of the master instance to be replicated
count allows multiple slaves to be created from a single snapshot of the master
instance defines the template used for each slave. This object will support a subset of the properties supported for a 'trove create' operation.
- instance properties for the slave (flavor, volume size, availability zone, configuration group, ...) will be derived from the master unless they are explicitly included in the replicate call
- datastore properties (datastore_version, databases, users) are not supported for a replicate call and will always be derived from the master
- if a configuration group is specified (either on the master or passed in the replicate call for the slaves), the server_id property if present will be ignored. New server ids will be assigned by the task manager.
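
The derivation rules in the notes above can be sketched in Python as follows. This is a minimal sketch and not the Trove implementation: the function names build_replicate_request and resolve_slave_properties, the OVERRIDABLE and MASTER_ONLY tuples, and the flat dict used for the master record are all assumptions made for illustration.

```python
import copy

# Instance properties a caller may override in the replicate call (assumed set).
OVERRIDABLE = ("flavorRef", "volume", "availability_zone", "configuration")
# Datastore properties that are always derived from the master.
MASTER_ONLY = ("datastore_version", "databases", "users")


def build_replicate_request(master_id, count, read_only=True, **instance):
    """Build the body for POST /instances/{master_id}/action."""
    unsupported = set(instance) - set(OVERRIDABLE)
    if unsupported:
        raise ValueError("unsupported properties: %s" % sorted(unsupported))
    return {
        "replicate": {
            "count": count,
            "instance": dict(instance),
            "topology": {
                "slave_of": [{"id": master_id}],
                "read_only": read_only,
            },
        }
    }


def resolve_slave_properties(master, request):
    """Merge the master's properties with explicit overrides from the request."""
    overrides = request["replicate"]["instance"]
    resolved = {}
    for key in OVERRIDABLE:
        # An explicit value wins; otherwise fall back to the master's value.
        source = overrides if key in overrides else master
        if key in source:
            resolved[key] = copy.deepcopy(source[key])
    for key in MASTER_ONLY:
        if key in master:
            resolved[key] = copy.deepcopy(master[key])
    return resolved
```

Rejecting datastore properties at request-build time is one possible reading of "not supported"; silently ignoring them would be another.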

Stop Replication

POST /instances/{id}/topology/action

{
    "detach": {}
}

Notes:

id in the resource URI is the id of the slave to be detached from its current topology.

Python-Troveclient

New Commands

The following new commands will be added to create and detach replication slaves.

Create N read-only slaves from an existing site:

trove replicate <master instance> <slave count> --read-only=<boolean>

A subset of the 'trove create' arguments will be supported to optionally define properties of the slave(s). See prior section for details.

Detach one slave from its master:

trove detach_replication <slave instance>

No additional arguments are required for a 'detach' operation.

Updated Commands

The trove show command will be updated to provide information about the replication topology of a given instance.

trove show <master instance>

+-------------------+-----------------------------------------
|      Property     |              Value                      
+-------------------+-----------------------------------------
| created           | 2014-04-30T18:00:04
| datastore         | mysql              
| datastore_version | mysql-5.5          
| flavor            | {....} 
| id                | fc318e00-3a6f-4f93-af99-146b44912188 
| name              | master                           
| status            | ACTIVE                              
| topology          | slaves: {u'id': u'2b832ab9-af64-4bdf-9ac7-137a010c489c'}
| updated           | 2014-04-30T18:00:21                 
| volume            | {u'used': 0.24, u'size': 4}               
|                   | 
+-------------------+-----------------------------------------

trove show <slave instance>

+-------------------+-----------------------------------------
|      Property     |              Value                      
+-------------------+-----------------------------------------
| created           | 2014-04-30T18:00:04
| datastore         | mysql              
| datastore_version | mysql-5.5          
| flavor            | {....} 
| id                | 2b832ab9-af64-4bdf-9ac7-137a010c489c
| name              | slave1                           
| status            | ACTIVE                              
| topology          | {u'slave_of': u'fc318e00-3a6f-4f93-af99-146b44912188', u'read_only': True}
| updated           | 2014-04-30T18:00:21                 
| volume            | {u'used': 0.24, u'size': 4}               
|                   | 
+-------------------+-----------------------------------------

Notes:

Only immediate links will be included in the 'show' output. (In future iterations it may be necessary to add new commands to view more complex topologies.)
The exact rendering of the 'show' output is subject to change; the content above is intended to be representative.

Taskmanager

The taskmanager will implement 2 API calls:

create_replicated_instances(master_id, slave_count, flavor, topology, volume_size, availability_zone, overrides, nics)
detach_replication(slave_instance)

taskmanager.create_replication

The Create Replication task will be performed with the following steps:

Execute get_replication_snapshot() on the master site, receiving "master snapshot results metadata"
N times:
1. Create a Trove instance of the given flavor, volume size, and any optional instance parameters
2. Generate a unique server_id for the slave
3. Execute guestagent.attach_replication_slave() on the new instance
4. Update the instance metadata to add a "topology" section for the slave
Update the topology of the master to list the slave ids
Delete the replication snapshot from Swift
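
The ordering of the steps above can be sketched as follows. This is a hedged illustration only: FakeGuest, the create_replication signature, the topology dict keyed by instance id, and the create_instance / delete_snapshot callables are hypothetical stand-ins, and the guest method names follow the GuestAgent section of this spec. Failure handling is not specified by the spec and is left out.

```python
class FakeGuest:
    """Stand-in for a guest agent client; it only records the calls it receives."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.calls = []

    def get_replication_snapshot(self):
        self.calls.append("get_replication_snapshot")
        return {"master": {"host": "192.168.0.1", "port": 3306}}

    def attach_replication_slave(self, snapshot, server_id):
        self.calls.append(("attach_replication_slave", server_id))


def create_replication(master_id, slave_count, guests, topology,
                       create_instance, delete_snapshot, server_ids):
    """Run the Create Replication steps in order (hypothetical signature).

    guests maps instance id -> guest client, topology maps instance id ->
    topology record, server_ids is an iterator of unused MySQL server ids.
    """
    # One snapshot of the master is shared by all N slaves.
    snapshot = guests[master_id].get_replication_snapshot()
    slave_ids = []
    for _ in range(slave_count):
        slave_id = create_instance()      # 1. create the Trove instance
        server_id = next(server_ids)      # 2. unique server_id for the slave
        guests[slave_id].attach_replication_slave(snapshot, server_id)  # 3.
        topology[slave_id] = {            # 4. topology section for the slave
            "slave_of": [{"id": master_id}],
            "read_only": True,
        }
        slave_ids.append(slave_id)
    # Record the new slaves on the master, keeping any existing ones.
    master_record = topology.setdefault(master_id, {})
    master_record.setdefault("slaves", []).extend({"id": s} for s in slave_ids)
    # The snapshot is no longer needed once every slave is attached.
    delete_snapshot(snapshot)
    return slave_ids
```

Taking the snapshot once, outside the loop, mirrors the note that count allows multiple slaves to be created from a single snapshot of the master.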

After the Create Replication task has completed, the topology of the master will list the slaves:

{
  "topology": {
    "members": [
      {
        "id": "{master-id}",
        "name": "master"
      },
      {
        "id": "{slave1-id}",
        "name": "slave1",
        "mysql": {
          "slave_of": [{"id": "{master-id}"}],
          "read_only": true
        }
      },
      {
        "id": "{slave2-id}",
        "name": "slave2",
        "mysql": {
          "slave_of": [{"id": "{master-id}"}],
          "read_only": true
        }
      }
    ]
  }
}

The topology for a slave will indicate its relationship to the master:

{
  "topology": {
    "members": [
      {
        "id": "{slave1-id}",
        "name": "slave1",
        "mysql": {
          "slave_of": [{"id": "{master-id}"}],
          "read_only": true
        }
      }
    ]
  }
}

taskmanager.detach_replication

Executes guestagent.detach_replication_slave() for the selected instance; removes the topology record for the detached slave; and updates the topology record for the master to remove the now-detached slave from the replicas list.
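
The three effects described above (guest call, removal of the slave's record, update of the master's record) can be sketched as below. This is a sketch under stated assumptions, not the Trove code: StubGuest, the detach_replication signature, the topology dict keyed by instance id, and the "slaves" key on the master record are hypothetical. Dropping the master's record when its last slave is detached is also an assumption; the spec does not say what happens in that case.

```python
class StubGuest:
    """Stand-in for the guest agent client of the slave being detached."""

    def __init__(self):
        self.detached = False

    def detach_replication_slave(self):
        self.detached = True


def detach_replication(topology, slave_id, guest):
    """Detach slave_id and update the topology records in place."""
    record = topology.get(slave_id)
    if not record or "slave_of" not in record:
        raise ValueError("%s is not a replication slave" % slave_id)
    # Stop replication on the guest before touching any records.
    guest.detach_replication_slave()
    # The detached instance is now stand-alone and has no topology.
    del topology[slave_id]
    for master in record["slave_of"]:
        master_record = topology.get(master["id"], {})
        remaining = [s for s in master_record.get("slaves", [])
                     if s["id"] != slave_id]
        if remaining:
            master_record["slaves"] = remaining
        else:
            # Assumption: a master with no slaves left has no topology either.
            topology.pop(master["id"], None)
```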

Trove GuestAgent

There will be 3 new methods added to the guestagent API:

get_replication_snapshot()
attach_replication_slave()
detach_replication_slave()

Replication will be focused around a replication snapshot. This snapshot will contain the data necessary to set up a slave to replicate from the site which created the snapshot, typically a URI to the user's data set stored in Swift plus the metadata required to coordinate replication.

Each datastore implementation will need to implement these methods. The content of the snapshot uploaded to Swift is opaque to the taskmanager and higher components, so the guest agent is free to store whatever data it chooses, in whichever format is most appropriate. The content of the metadata is specific to the datastore, but will be represented as a JSON object.
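
One way to express this per-datastore contract is an abstract base class, sketched below. The class names, the method arguments, and the toy in-memory datastore are assumptions for illustration; in particular the toy embeds its data directly in the metadata instead of uploading to Swift, which a real guest agent would not do.

```python
import abc
import json


class ReplicationGuestAgent(abc.ABC):
    """Hypothetical interface each datastore implementation would provide."""

    @abc.abstractmethod
    def get_replication_snapshot(self):
        """Return datastore-specific metadata as a JSON-representable dict."""

    @abc.abstractmethod
    def attach_replication_slave(self, snapshot):
        """Load the snapshot and start replicating from its source."""

    @abc.abstractmethod
    def detach_replication_slave(self):
        """Stop replicating; the instance keeps its copy of the data."""


class InMemoryGuestAgent(ReplicationGuestAgent):
    """Toy datastore used only to exercise the interface."""

    def __init__(self, data=None):
        self.data = dict(data or {})
        self.replicating = False

    def get_replication_snapshot(self):
        metadata = {"dataset": {"datastore": "in-memory",
                                "snapshot": dict(self.data)}}
        # Round-trip through JSON: the taskmanager only ever sees a JSON object.
        return json.loads(json.dumps(metadata))

    def attach_replication_slave(self, snapshot):
        self.data = dict(snapshot["dataset"]["snapshot"])
        self.replicating = True

    def detach_replication_slave(self):
        self.replicating = False
```

Because the taskmanager treats the metadata as opaque, only the guest agent that produced a snapshot needs to understand its keys.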

Notes:

In future iterations, trove capabilities may be used to indicate whether a particular data store supports the replicate / detach actions.

Trove Guestagent - MySQL Datastore Implementation

get_replication_snapshot()

The MySQL guestagent will use xtrabackup to create a backup of the user's data and upload it to Swift. The metadata will include a URI of the uploaded backup data, along with the site's binlog position and network information required to set up replication.

{
    "master": {
        "host": "192.168.0.1",
        "port": 3306
    },
    "dataset": {
        "datastore": "mysql",
        "datastore_version": "mysql-5.5",
        "dataset_size": 2,
        "snapshot_href": "http://..."
    },
    "binlog_position": <binlog position>
}

attach_replication_slave()

Injects the copy of the master's data into the selected site, then configures the site to receive replicated updates from the master site.
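
As a rough illustration of the "configures the site" half of this method, the sketch below turns snapshot metadata into the MySQL statements a slave would run. The function name build_attach_statements is hypothetical, and the shape of binlog_position (a dict with "log_file" and "position") is an assumption, since the spec leaves that value as a placeholder. Replication credentials (MASTER_USER, MASTER_PASSWORD) and the xtrabackup restore are omitted because the spec does not define them.

```python
import re

# Conservative whitelist so values can be embedded in quoted SQL literals.
_SAFE = re.compile(r"[A-Za-z0-9_.\-]+")


def build_attach_statements(snapshot, read_only=True):
    """Return the SQL statements that point a slave at the snapshot's master."""
    master = snapshot["master"]
    # Assumed layout: {"log_file": "mysql-bin.000003", "position": 107}
    binlog = snapshot["binlog_position"]
    for value in (master["host"], binlog["log_file"]):
        if not _SAFE.fullmatch(value):
            raise ValueError("unsafe value: %r" % value)
    statements = [
        "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d, "
        "MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d" % (
            master["host"], int(master["port"]),
            binlog["log_file"], int(binlog["position"])),
    ]
    if read_only:
        # Read-only is the default for slaves per the use case summary.
        statements.append("SET GLOBAL read_only = ON")
    statements.append("START SLAVE")
    return statements
```

Returning the statements as a list, instead of executing them, keeps the sketch independent of any particular MySQL client library.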

detach_replication_slave()

Stops the slave from replicating from the master. After the instance has been detached from the master, it is an independent copy of the master's data and is a fully functional site on its own.

After a slave is detached the topology for the master will no longer contain the detached slave:

{
  "topology": {
    "members": [
      {
        "id": "{master-id}",
        "name": "master"
      },
      {
        "id": "{slave2-id}",
        "name": "slave2",
        "mysql": {
          "slave_of": [{"id": "{master-id}"}],
          "read_only": true
        }
      }
    ]
  }
}

The detached slave (slave1 in this example) will have no topology, as it is now a stand-alone instance.

Feedback

Use Case Summary

1. The master can exist before the slave such that the master already contains data

esp: Once an instance becomes a master can it be downgraded in the same way that a slave can be detached?

3. Slaves can be marked read-only (read-only will be default)

esp: If a read-only slave is detached is there an option to make it read_write?

6. The health of a slave will be monitor-able

esp: We'll probably want to monitor the health of the master too.
esp: Will the mechanism of monitoring be anything more than the heart beat message sent by the agent?

