
Difference between revisions of "Trove/Specs/Trove-v1-MySQL-Replication"

Revision as of 15:23, 30 April 2014

Description

Providing support for the various replication use cases is critical for use of Trove in production. For the first-phase implementation of replication in Trove, we will implement the functionality laid out in the Trove V1 Replication Blueprint.

Use Case Summary

The following use cases will be addressed by this V1 implementation:

A. Read Replicas (Slaves)

  1. The master can exist before the slave such that the master already contains data
  2. N slaves can be created for one master
  3. Slaves can be marked read-only (read-only will be default)
  4. A slave can be detached from its "replication set" to act as an independent site
  5. A pre-existing non-replication site can become the master of a new "replication set"
  6. The health of a slave will be monitor-able

Design

Trove API

Create Slaves

POST /instances/{id}/action

{
  "replicate": {
    "count": 2,
    "instance": {
      "availability_zone": "{zone}",
      "flavorRef": "{flavor}",
      "configuration": "{ref}",
      "volume": { "size": 1 }
    },
    "topology": {
      "slave_of": [{"id": "{id}"}],
      "read_only": true
    }
  }
}

Notes:

  • id in the resource URI is the id of the master instance to be replicated
  • count allows multiple slaves to be created from a single snapshot of the master
  • instance defines the template used for each slave. This object will support a subset of the properties supported for a 'trove create' operation.
    • instance properties for the slave (flavor, volume size, availability zone, configuration group, ...) will be derived from the master unless they are explicitly included in the replicate call
    • datastore properties (datastore_version, databases, users) are not supported for a replicate call and will always be derived from the master
    • if a configuration group is specified (either on the master or passed in the replicate call for the slaves), its server_id property, if present, will be ignored. New server ids will be assigned by the taskmanager.
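
The derivation rules in these notes can be sketched as follows. This is a minimal illustration, not Trove code: the helper `build_slave_template`, the dict layout used for the master, and the `UNSUPPORTED` and `DERIVED` constants are hypothetical names chosen for this example.

```python
# Hypothetical helper: resolves the effective instance template for a slave.
UNSUPPORTED = {"datastore_version", "databases", "users"}
DERIVED = ("flavorRef", "volume", "availability_zone", "configuration")


def build_slave_template(master, overrides=None):
    """Return the instance properties to use for each new slave.

    `master` and `overrides` are plain dicts whose keys mirror the request
    body shown above; the function itself is an assumption for illustration.
    """
    overrides = dict(overrides or {})
    rejected = UNSUPPORTED.intersection(overrides)
    if rejected:
        # Datastore properties always come from the master.
        raise ValueError("not supported for replicate: %s" % sorted(rejected))
    # Start from the master's values, then apply the explicit overrides.
    template = {key: master[key] for key in DERIVED if key in master}
    template.update(overrides)
    return template


master = {
    "flavorRef": "7",
    "volume": {"size": 2},
    "availability_zone": "az1",
    "configuration": "cfg-master",
    "datastore_version": "mysql-5.5",
}
template = build_slave_template(master, {"volume": {"size": 4}})
print(template)
```

Here only the volume size is overridden; flavor, availability zone and configuration group fall back to the master's values.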

Stop Replication

POST /instances/{id}/topology/action

{
    "detach": {}
}

Notes:

  • id in the resource URI is the id of the slave to be detached from its current topology.

Python-Troveclient

New Commands

The following new commands will be added to create and detach replication slaves.

Create n slaves (read-only by default) from an existing site:

trove replicate <master instance> <slave count> --read-only=<boolean>

A subset of the 'trove create' arguments will be supported to optionally define properties of the slave(s). See prior section for details.

Detach one slave from its master:

trove detach_replication <slave instance>

No additional arguments are required for a 'detach' operation.

Updated Commands

The trove show command will be updated to provide information about the replication topology of a given instance.

trove show <master instance>

TODO: Add output

trove show <slave instance>

TODO: Add output

Notes:

  • Only immediate links will be included in the 'show' output. (In future iterations it may be necessary to add new commands to view more complex topologies.)
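
One way to read "immediate links" is sketched below, using the topology record format described in the Taskmanager section. The function `immediate_links` and the shape of its return value are assumptions for illustration; the actual 'show' output is still to be defined.

```python
def immediate_links(topology, instance_id):
    """Return the ids directly linked to `instance_id` in a topology record."""
    masters, slaves = [], []
    for member in topology["topology"]["members"]:
        slave_of = [m["id"] for m in member.get("mysql", {}).get("slave_of", [])]
        if member["id"] == instance_id:
            masters.extend(slave_of)       # instances this one replicates from
        elif instance_id in slave_of:
            slaves.append(member["id"])    # instances replicating from this one
    return {"masters": masters, "slaves": slaves}


topology = {"topology": {"members": [
    {"id": "master-id", "name": "master"},
    {"id": "slave1-id", "name": "slave1",
     "mysql": {"slave_of": [{"id": "master-id"}], "read_only": True}},
    {"id": "slave2-id", "name": "slave2",
     "mysql": {"slave_of": [{"id": "master-id"}], "read_only": True}},
]}}
master_links = immediate_links(topology, "master-id")
slave_links = immediate_links(topology, "slave1-id")
```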

Taskmanager

The taskmanager will implement 2 API calls:

  • create_replicated_instances(master_id, slave_count, flavor, topology, volume_size, availability_zone, overrides, nics)
  • detach_replication(slave_instance)

taskmanager.create_replication

The Create Replication task will be performed with the following steps:

  1. Execute guestagent.get_replication_snapshot() on the master site, receiving the "master snapshot results metadata"
  2. N times:
    1. Create a Trove instance of the given flavor, volume size, and any optional instance parameters
    2. Generate a unique server_id for the slave
    3. Execute guestagent.attach_replication_slave() on the new instance
    4. Update the instance metadata to add a "topology" section for the slave
  3. Update the topology of the master to list the slave ids
  4. Delete the replication snapshot from Swift
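
The steps above can be sketched as follows. Everything here is a stand-in: `StubGuest`, `create_replication`, the instance ids, the starting server_id, and the `delete_snapshot` callback are hypothetical and only show the order of operations, not the real taskmanager code.

```python
import itertools

# Assumption: the master keeps server_id 1, so slaves are numbered from 2.
_server_ids = itertools.count(2)


class StubGuest:
    """Stand-in for a guestagent client; records the calls it receives."""

    def __init__(self):
        self.calls = []

    def get_replication_snapshot(self):
        self.calls.append("get_replication_snapshot")
        return {"dataset": {"snapshot_href": "swift://placeholder"}}

    def attach_replication_slave(self, snapshot, server_id):
        # The guestagent section of this spec lists this method as
        # attach_replication_slave().
        self.calls.append(("attach_replication_slave", server_id))


def create_replication(master_id, slave_count, master_guest,
                       make_slave_guest, delete_snapshot):
    snapshot = master_guest.get_replication_snapshot()            # step 1
    members = [{"id": master_id, "name": "master"}]
    slave_records = {}
    for n in range(1, slave_count + 1):                           # step 2
        slave_id = "slave%d-id" % n       # 2.1: stands in for instance creation
        server_id = next(_server_ids)     # 2.2: unique server_id
        guest = make_slave_guest(slave_id)
        guest.attach_replication_slave(snapshot, server_id)       # 2.3
        slave_records[slave_id] = {                               # 2.4
            "id": slave_id,
            "name": "slave%d" % n,
            "mysql": {"slave_of": [{"id": master_id}], "read_only": True},
        }
    members.extend(slave_records.values())                        # step 3
    delete_snapshot(snapshot)                                     # step 4
    return {"topology": {"members": members}}, slave_records


guests = {}
deleted = []


def make_slave_guest(instance_id):
    guests[instance_id] = StubGuest()
    return guests[instance_id]


master_guest = StubGuest()
master_topology, slave_records = create_replication(
    "master-id", 2, master_guest, make_slave_guest, deleted.append)
```

Note that the snapshot is taken once and reused for every slave, and is deleted only after all slaves have been attached.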


After the Create Replication task has completed, the topology of the master will list the slaves:

{
  "topology": {
    "members": [
      {
        "id": "{master-id}",
        "name": "master"
      },
      {
        "id": "{slave1-id}",
        "name": "slave1",
        "mysql": {
          "slave_of": [{"id": "{master-id}"}],
          "read_only": true
        }
      },
      {
        "id": "{slave2-id}",
        "name": "slave2",
        "mysql": {
          "slave_of": [{"id": "{master-id}"}],
          "read_only": true
        }
      }
    ]
  }
}

The topology for a slave will indicate its relationship to the master:

{
  "topology": {
    "id": "{slave1-id}",
    "name": "slave1",
    "mysql": {
      "slave_of": [{"id": "{master-id}"}],
      "read_only": true
    }
  }
}

taskmanager.detach_replication

Executes guestagent.detach_replication_slave() for the selected instance, removes the topology record for the detached slave, and updates the master's topology record to remove the now-detached slave from its members list.
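
A minimal sketch of this bookkeeping is shown below. The function signature, the `StubGuest` class, and the in-memory dicts standing in for stored topology records are assumptions for illustration.

```python
class StubGuest:
    """Stand-in for the guestagent client of the slave being detached."""

    def __init__(self):
        self.detached = False

    def detach_replication_slave(self):
        self.detached = True


def detach_replication(slave_id, guest, master_topology, slave_records):
    # 1. Ask the guest to stop replicating from the master.
    guest.detach_replication_slave()
    # 2. Remove the topology record for the detached slave.
    slave_records.pop(slave_id, None)
    # 3. Drop the slave from the master's members list (edited in place).
    members = master_topology["topology"]["members"]
    members[:] = [m for m in members if m["id"] != slave_id]


master_topology = {"topology": {"members": [
    {"id": "master-id", "name": "master"},
    {"id": "slave1-id", "name": "slave1"},
    {"id": "slave2-id", "name": "slave2"},
]}}
slave_records = {"slave1-id": {"id": "slave1-id"},
                 "slave2-id": {"id": "slave2-id"}}
guest = StubGuest()
detach_replication("slave1-id", guest, master_topology, slave_records)
remaining = [m["id"] for m in master_topology["topology"]["members"]]
```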

Trove GuestAgent

There will be 3 new methods added to the guestagent API:

  • get_replication_snapshot()
  • attach_replication_slave()
  • detach_replication_slave()

Replication will be built around a replication snapshot. This snapshot will contain the data necessary to set up a slave to replicate from the site that created the snapshot: typically a URI to the user's data set stored in Swift, plus the metadata required to coordinate replication.

Each datastore implementation will need to implement these methods. The content of the image uploaded to Swift is opaque to the taskmanager and higher components, so the guest agent is free to store whatever data it chooses, in whichever format is most appropriate. The content of the metadata is specific to the datastore, but will be represented as a JSON object.

Notes:

  • In future iterations, Trove capabilities may be used to indicate whether a particular datastore supports the replicate / detach actions.

Trove Guestagent - MySQL Datastore Implementation

get_replication_snapshot()

The MySQL guestagent will use xtrabackup to create a backup of the user's data and upload it to Swift. The metadata will include a URI of the uploaded backup data, along with the site's binlog position and network information required to set up replication.

{
    "master": {
        "host": "192.168.0.1",
        "port": 3306
    },
    "dataset": {
        "datastore": "mysql",
        "datastore_version": "mysql-5.5",
        "dataset_size": 2,
        "snapshot_href": "http://..."
    },
    "binlog_position": <binlog position>
}

attach_replication_slave()

Injects the copy of the master's data into the selected site, then configures the site to receive replicated updates from the master site.
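
As an illustration of the second half of this step, the sketch below turns the snapshot metadata into a MySQL CHANGE MASTER TO statement. The layout of `binlog_position` (a `log_file` and a `position`), the replication user and password, and the function name are assumptions; the spec leaves the binlog position format open. Real code should pass values through the database driver's escaping rather than string formatting.

```python
def change_master_statement(snapshot, user, password):
    """Build the statement that points a slave at the master."""
    master = snapshot["master"]
    binlog = snapshot["binlog_position"]  # assumed layout, see lead-in
    return (
        "CHANGE MASTER TO "
        "MASTER_HOST='%s', MASTER_PORT=%d, "
        "MASTER_USER='%s', MASTER_PASSWORD='%s', "
        "MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d"
        % (master["host"], master["port"], user, password,
           binlog["log_file"], binlog["position"])
    )


snapshot = {
    "master": {"host": "192.168.0.1", "port": 3306},
    "binlog_position": {"log_file": "mysql-bin.000001", "position": 107},
}
statement = change_master_statement(snapshot, "repl_user", "repl_password")
print(statement)
# After the restored data is in place, the slave would run this statement
# followed by START SLAVE.
```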

detach_replication_slave()

Stops the slave from replicating from the master. After the instance has been detached from the master, it is an independent copy of the master's data, and is a fully functional site on its own.
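
On the MySQL side, detaching could come down to a short sequence of statements, sketched below. The function name and the `make_writable` flag are hypothetical, and whether a detached slave should automatically become writable is an assumption not settled by this spec.

```python
def detach_statements(make_writable=True):
    """Return the SQL statements a guest might run to detach a slave."""
    statements = [
        "STOP SLAVE",        # stop the replication threads
        "RESET SLAVE ALL",   # forget the master connection settings
    ]
    if make_writable:
        # Assumption: a stand-alone instance should accept writes.
        statements.append("SET GLOBAL read_only = OFF")
    return statements


statements = detach_statements()
```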

After a slave is detached, the topology for the master will no longer contain the detached slave:

{
  "topology": {
    "members": [
      {
        "id": "{master-id}",
        "name": "master"
      },
      {
        "id": "{slave2-id}",
        "name": "slave2",
        "mysql": {
          "slave_of": [{"id": "{master-id}"}],
          "read_only": true
        }
      }
    ]
  }
}

The detached slave (slave1 in this example) will have no topology, as it is now a stand-alone instance.