Trove/Specs/Trove-v1-MySQL-Replication

Latest revision as of 18:01, 25 August 2014

Description

Support for the various replication use cases is critical to running Trove in production. The first-phase implementation of replication in Trove will provide the functionality laid out in the Trove V1 Replication Blueprint.

Use Case Summary

The following use cases will be addressed by this V1 implementation:

A. Read Replicas (Slaves)

  1. The master can exist before the slave such that the master already contains data
  2. N slaves can be created for one master. (slicknik: To clarify, the v1 implementation will allow for this but will require N separate create calls. We may optimize this in a later implementation.)
  3. Slaves can be marked read-only (read-only will be default)
  4. A slave can be detached from its master to act as an independent site
  5. A pre-existing non-replication site can become the master of a new slave
  6. The health of a slave will be monitor-able by third party apps.

Design

Trove API

The REST API will be extended to support:

  • creating a new instance as a replication slave of an existing instance
  • detaching a slave from its master such that it becomes a stand-alone instance

Create Instance (Master)

There is no explicit action to create a master: any existing instance can be used as the replication source when creating a new slave.

For reference, here is a sample call to create a MySQL instance.

Request:

POST /instances
{
  "instance": {
    "name": "products",
    "datastore": {
      "type": "mysql",
      "version": "5.5"
    },
    "configuration": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6",
    "flavorRef": "7",
    "volume": {
      "size": 1
    }
  }
}

Response:

{
  "instance": {
    "status": "BUILD",
    "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998",
    "name": "products",
    "created": "...",
    "updated": "...",
    "links": [{...}],
    "datastore": {
      "type": "mysql",
      "version": "5.5"
    },
    "configuration": {
      "id": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6",
      "links": [{...}]
    },
    "flavor": {
      "id": "7",
      "links": [{...}]
    },
    "volume": {
      "size": 1
    }
  }
}

Create Slave

A replication slave is created as a new instance with a 'slaveOf' reference to an existing instance (which will become the master).

Request:

POST /instances
{
  "instance": {
    "name": "products-s1",
    "datastore": {
      "type": "mysql",
      "version": "5.5"
    },
    "slaveOf": "dfbbd9ca-b5e1-4028-adb7-f78643e17998",
    "configuration": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6",
    "flavorRef": "7",
    "volume": {
      "size": 1
    }
  }
}

Response:

{
  "instance": {
    "status": "BUILD",
    "id": "061aaf4c-3a57-411e-9df9-2d0f813db859",
    "name": "products-s1",
    "created": "...",
    "updated": "...",
    "links": [{...}],
    "datastore": {
      "type": "mysql",
      "version": "5.5"
    },
    "slaveOf": {
      "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998",
      "links": [{...}]
    },
    "configuration": {
      "id": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6",
      "links": [{...}]
    },
    "flavor": {
      "id": "7",
      "links": [{...}]
    },
    "volume": {
      "size": 1
    }
  }
}
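
As an illustration of the request above, the following minimal Python sketch assembles the body for a create-slave call. The helper name build_create_slave_request and its parameter names are hypothetical and not part of Trove; only the JSON field names are taken from the sample request.

```python
import json


def build_create_slave_request(name, master_id, flavor_ref, volume_size,
                               datastore_type="mysql", datastore_version="5.5",
                               configuration_id=None):
    """Return a JSON-serializable body for POST /instances that creates a slave.

    Hypothetical helper: only the field names come from the sample request.
    """
    instance = {
        "name": name,
        "datastore": {"type": datastore_type, "version": datastore_version},
        # The only difference from a plain create call: reference the master.
        "slaveOf": master_id,
        "flavorRef": flavor_ref,
        "volume": {"size": volume_size},
    }
    if configuration_id is not None:
        instance["configuration"] = configuration_id
    return {"instance": instance}


body = build_create_slave_request(
    name="products-s1",
    master_id="dfbbd9ca-b5e1-4028-adb7-f78643e17998",
    flavor_ref="7",
    volume_size=1,
    configuration_id="b9c8a3f8-7ace-4aea-9908-7b555586d7b6",
)
print(json.dumps(body, indent=2))
```

Because v1 requires one create call per slave, creating N slaves would mean calling such a helper N times with different names.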

Stop Replication

POST /instances/{id}/action

{
    "detach_replication": {}
}

Notes:

  • id in the resource URI is the id of a replication slave instance
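
A minimal sketch of how a client might assemble this action request. The function build_detach_request is a hypothetical helper; the URI pattern and the body are the ones shown above.

```python
def build_detach_request(slave_id):
    """Return (method, path, body) for detaching a slave from its master.

    Hypothetical helper; slave_id must be the id of a replication slave.
    """
    if not slave_id:
        raise ValueError("slave_id is required")
    path = "/instances/{0}/action".format(slave_id)
    return "POST", path, {"detach_replication": {}}


method, path, body = build_detach_request("061aaf4c-3a57-411e-9df9-2d0f813db859")
print(method, path, body)
```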

Python-Troveclient

New Commands

  • Detach

trove detach_replication <slave instance>

No additional arguments are required for a 'detach' operation.

Updated Commands

  • Create

trove create <name> <flavor> --size <volume size> ... --slave_of <masterId>

The optional --slave_of argument is used to indicate that the new instance should be configured as a slave of the specified master instance.

  • Show

The trove show command will be updated to indicate whether the specified instance is a replication master or slave.

trove show <master>

+-------------------+---------------------------------------------+
|      Property     |         Value                               |
+-------------------+---------------------------------------------+
|      created      | 2014-05-27T18:21:57                         |
|     datastore     | mysql                                       |
| datastore_version | mysql-5.5                                   |
|       flavor      | 100                                         |
|         id        | 93832783-0993-48e0-a0ab-7b996818b7cc        |
|        name       | test1                                       |
|       slaves      | 061aaf4c-3a57-411e-9df9-2d0f813db859        |
|       status      | ACTIVE                                      |
|      updated      | 2014-05-27T18:48:05                         |
|       volume      | {u'used': 0.11, u'size': 3}                 |
+-------------------+---------------------------------------------+

trove show <slave>

+-------------------+---------------------------------------------+
|      Property     |         Value                               |
+-------------------+---------------------------------------------+
|      created      | 2014-05-27T18:21:57                         |
|     datastore     | mysql                                       |
| datastore_version | mysql-5.5                                   |
|       flavor      | 100                                         |
|         id        | 93832783-0993-48e0-a0ab-7b996818b7cc        |
|        name       | test1                                       |
|       slaveOf     | dfbbd9ca-b5e1-4028-adb7-f78643e17998        |
|       status      | ACTIVE                                      |
|      updated      | 2014-05-27T18:48:05                         |
|       volume      | {u'used': 0.11, u'size': 3}                 |
+-------------------+---------------------------------------------+

Notes:

  • Only immediate links will be included in the 'show' output. (In future iterations it may be necessary to add new commands to view more complex topologies.)
  • Exact rendering of show output is subject to change; content is intended to be representative.

Taskmanager

The taskmanager will implement 2 API calls:

  • create_instance will be updated to support the additional 'slaveOf' argument
  • detach_replication(slave_instance)

taskmanager.create_instance

The create instance task will be updated to handle creating a slave. When a master instance is specified (via the slave_of parameter):

  1. execute guestagent.get_replication_snapshot() on the master site, receiving "master snapshot results metadata"
  2. use the master snapshot to create a new instance with a copy of the master's data (via restore functionality)
  3. execute guestagent.attach_replication_slave() on the new instance
  4. delete the replication snapshot from Swift
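
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the taskmanager code: FakeGuest stands in for a guestagent client, and restore_instance and delete_snapshot are hypothetical callables for the restore and Swift cleanup steps. The guestagent method names follow the list in the Trove GuestAgent section.

```python
class FakeGuest(object):
    """Stand-in for a guestagent client; records the calls made on it."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def get_replication_snapshot(self):
        self.calls.append("get_replication_snapshot")
        # Shape loosely follows the sample metadata in this spec.
        return {"master": {"host": "192.168.0.1", "port": 3306},
                "dataset": {"snapshot_href": "http://..."}}

    def attach_replication_slave(self, snapshot):
        self.calls.append("attach_replication_slave")


def create_slave(master_guest, restore_instance, delete_snapshot):
    """Run the four create-slave steps in order and return the slave guest."""
    # 1. Ask the master for a replication snapshot (data plus metadata).
    snapshot = master_guest.get_replication_snapshot()
    try:
        # 2. Create the new instance from the snapshot (restore functionality).
        slave_guest = restore_instance(snapshot)
        # 3. Point the new instance at the master.
        slave_guest.attach_replication_slave(snapshot)
    finally:
        # 4. Clean up the snapshot. Doing this even on failure is an
        #    assumption of this sketch, not something the spec states.
        delete_snapshot(snapshot)
    return slave_guest


master = FakeGuest("master")
deleted = []
slave = create_slave(master, lambda snap: FakeGuest("slave"), deleted.append)
print(master.calls, slave.calls, len(deleted))
```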


taskmanager.detach_replication

Executes guestagent.detach_replication_slave() for the selected instance and removes the slaveOf reference from the instance record.
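
A minimal sketch of the detach task, assuming instance records are plain dictionaries and guest clients are looked up by instance id (both are simplifications for illustration). The demotion of the master once its last slave is detached follows the updated design noted in the Feedback section; where that call is issued is an assumption here.

```python
class RecordingGuest(object):
    """Stand-in for a guestagent client; records the calls made on it."""

    def __init__(self):
        self.calls = []

    def detach_replication_slave(self):
        self.calls.append("detach_replication_slave")

    def demote_replication_master(self):
        self.calls.append("demote_replication_master")


def detach_replication(slave_id, instances, guests):
    """Detach one slave; demote the master if no slaves remain.

    instances: dict of instance id -> record (may contain 'slaveOf').
    guests: dict of instance id -> guest client.
    Returns the ids of the slaves still attached to the master.
    """
    record = instances[slave_id]
    master_id = record.get("slaveOf")
    if master_id is None:
        raise ValueError("instance %s is not a replication slave" % slave_id)
    guests[slave_id].detach_replication_slave()
    del record["slaveOf"]  # remove the reference from the instance record
    remaining = [i for i, r in instances.items()
                 if r.get("slaveOf") == master_id]
    if not remaining:
        guests[master_id].demote_replication_master()
    return remaining


instances = {"m": {}, "s1": {"slaveOf": "m"}, "s2": {"slaveOf": "m"}}
guests = {key: RecordingGuest() for key in instances}
first = detach_replication("s1", instances, guests)
second = detach_replication("s2", instances, guests)
print(first, second, guests["m"].calls)
```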

Trove GuestAgent

There will be 4 new methods added to the guestagent API:

  • get_replication_snapshot()
  • attach_replication_slave()
  • detach_replication_slave()
  • demote_replication_master()

Replication will be focused around a replication snapshot. This snapshot will contain the data necessary to set up a slave to replicate from the site which created the snapshot, typically a URI to the user's data set stored in Swift plus the metadata required to coordinate replication.

Each datastore implementation will need to implement these methods. The content of the image uploaded to Swift is opaque to the taskmanager and higher components, so the guest agent is free to store whatever data it chooses, in whichever format is most appropriate. The content of the metadata is specific to the datastore, but will be represented as a JSON object.

Notes:

  • In future iterations, trove capabilities may be used to indicate whether a particular data store supports the replicate / detach actions.

Trove Guestagent - MySQL Datastore Implementation

get_replication_snapshot()

The MySQL guestagent will use xtrabackup to create a backup of the user's data and upload it to Swift. The metadata will include a URI of the uploaded backup data, along with the site's binlog position and network information required to set up replication.

{
    "master": {
        "host": "192.168.0.1",
        "port": 3306
    },
    "dataset": {
        "datastore": "mysql",
        "datastore_version": "mysql-5.5",
        "dataset_size": 2,
        "snapshot_href": "http://..."
    },
    "binlog_position": <binlog position>
}

attach_replication_slave()

Configures the site to receive replicated updates from the master site.
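
For MySQL, this step would typically come down to issuing a CHANGE MASTER TO statement built from the snapshot metadata, followed by START SLAVE. The sketch below shows one way to build that statement. It assumes that binlog_position is an object with log_file and position keys and that a replication user already exists; the spec defines neither, so both are hypothetical.

```python
def build_change_master_sql(snapshot, repl_user, repl_password):
    """Build a CHANGE MASTER TO statement from snapshot metadata.

    Assumes snapshot["binlog_position"] looks like
    {"log_file": "...", "position": 123} (hypothetical layout).
    Values are interpolated without escaping, so this is illustrative only.
    """
    master = snapshot["master"]
    binlog = snapshot["binlog_position"]
    return (
        "CHANGE MASTER TO "
        "MASTER_HOST='{host}', MASTER_PORT={port}, "
        "MASTER_USER='{user}', MASTER_PASSWORD='{password}', "
        "MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={position}"
    ).format(
        host=master["host"],
        port=int(master["port"]),
        user=repl_user,
        password=repl_password,
        log_file=binlog["log_file"],
        position=int(binlog["position"]),
    )


snapshot = {
    "master": {"host": "192.168.0.1", "port": 3306},
    "binlog_position": {"log_file": "mysql-bin.000001", "position": 107},
}
sql = build_change_master_sql(snapshot, "repl", "secret")
print(sql)
```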

detach_replication_slave()

Stops the slave from replicating from the master. After the instance has been detached from the master, it is an independent copy of the master's data, and is a fully functional site on its own.

After a slave is detached the topology for the master will no longer contain the detached slave:

{
  "topology": {
    "members": [
      {
        "id": "{master-id}",
        "name": "master"
      },
      {
        "id": "{slave2-id}",
        "name": "slave2",
        "mysql": {
          "slave_of": [{"id": "{master-id}"}],
          "read_only": true
        }
      }
    ]
  }
}

The detached slave (slave1 in this example) will have no topology, as it is now a stand-alone instance.
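
The effect on the master's topology can be sketched as a simple filter over the members list. The function remove_member is a hypothetical helper, and the starting topology with slave1 and slave2 is assumed from the example above.

```python
def remove_member(topology_doc, member_id):
    """Return a new topology document without the given member."""
    members = topology_doc["topology"]["members"]
    kept = [m for m in members if m["id"] != member_id]
    return {"topology": {"members": kept}}


def slave_entry(slave_id, name):
    """Build a slave member in the shape used by the example above."""
    return {"id": slave_id, "name": name,
            "mysql": {"slave_of": [{"id": "master-id"}], "read_only": True}}


before = {"topology": {"members": [
    {"id": "master-id", "name": "master"},
    slave_entry("slave1-id", "slave1"),
    slave_entry("slave2-id", "slave2"),
]}}
after = remove_member(before, "slave1-id")
print([m["name"] for m in after["topology"]["members"]])
```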

demote_replication_master()

Returns the site to its pre-replication state. For MySQL, this will involve turning off binary logging and removing the associated logs.

Trove Guestagent - Replication Status

The trove guest-agent will reflect the state of replication via the guest heartbeat. In the event that replication is not functional at a site, that site's heartbeat status will be ERROR and the database service will be disabled.

Master Instance - for MySQL, master state is indeterminate
Slave Instance - in the event of a replication-related issue which prevents replication from continuing, the guest status will be updated to ERROR, which will be reflected in the guest heartbeat. For MySQL, an ERROR state will be flagged when the IO and SQL slave threads are not running.
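
A minimal sketch of the slave-side check, assuming the guest agent reads one row of SHOW SLAVE STATUS into a dictionary. Slave_IO_Running and Slave_SQL_Running are the MySQL column names; the function name and the choice to flag ERROR when either thread is not running are assumptions, since the sentence above could also be read as requiring both threads to be stopped.

```python
def replication_status(slave_status_row):
    """Map a SHOW SLAVE STATUS row (as a dict) to a guest status string."""
    io_ok = slave_status_row.get("Slave_IO_Running") == "Yes"
    sql_ok = slave_status_row.get("Slave_SQL_Running") == "Yes"
    # Any value other than "Yes" (for example "No" or "Connecting")
    # is treated as not running in this sketch.
    return "ACTIVE" if (io_ok and sql_ok) else "ERROR"


healthy = replication_status({"Slave_IO_Running": "Yes",
                              "Slave_SQL_Running": "Yes"})
broken = replication_status({"Slave_IO_Running": "No",
                             "Slave_SQL_Running": "Yes"})
print(healthy, broken)
```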

Configuration Groups

Server ID

MySQL replication requires a unique server id for each slave of a given master. By default Trove already generates a unique id during instance creation, so this requirement is satisfied. It is currently possible to override the generated server id using a configuration group. This could cause issues with replication, so server id will be removed as a 'settable' field in MySQL configuration groups.

Read-Only

By default new slave instances will be created as read-only. This option will be added as a supported field for MySQL configuration groups so that it is possible to override the default and create a read-write slave.

In a future iteration we will consider adding read-only as a field on the instance itself rather than exposing this via configuration groups.


Feedback

Use Case Summary

1. The master can exist before the slave such that the master already contains data

  • esp: Once an instance becomes a master can it be downgraded in the same way that a slave can be detached?
  • mwj: Updated design - when last slave is detached, master site will be "demoted".

3. Slaves can be marked read-only (read-only will be default)

  • esp: If a read-only slave is detached is there an option to make it read_write?
  • mwj: I don't think this is necessary for V1.

6. The health of a slave will be monitor-able

  • esp: We'll probably want to monitor the health of the master too.
  • esp: Will the mechanism of monitoring be anything more than the heart beat message sent by the agent?

Design

Trove API

Create Slaves

  • esp: If a user chooses to create slave(s) with smaller flavor(s) and volume we should allow it as long as it fits. This is similar to how backup/restore currently works. It would be good to provide sufficient logging and return an error response for when the data doesn't fit though.

Stop Replication

  • esp: I think maybe this 'POST /instances/{id}/topology/action' could be PUT but I don't care that much :)
  • esp: suggested HTTP method

PUT /instances/{id}/topology/action
{
    "instance": {
        "detach": {},
        "read_only": false
    }
}

Python-Troveclient

Updated Commands

  • esp: I think only showing the direct association between nodes is a good way to go. Trying to show more than that will get messy quick.
  • esp: It wouldn't hurt to add these calls above in the Trove API section but not critical.

Taskmanager

taskmanager.create_replication

4. delete replication snapshot from Swift

  • esp: I'm guessing the snapshot will only be created 1x for a set of given replicas and deleted when the last slave is created.
  • mwj: Yes, that's why we changed the proposed API to have a slave count.
  • esp: One day creating replicas could be done in parallel but I'm probably dreaming :)
  • mwj: Yes, we thought of that, but decided not to do so for V1.


TBD: Handling security groups

  • glucas: When security group support is enabled, each instance created via a 'trove create' call gets a new security group. What should we do with slaves?

Proposal: Slaves should be added to the security group of the master rather than getting their own group each. (This may not be addressed in v1.)

Trove Guestagent - MySQL Datastore Implementation

detach_replication_slave()

  • esp: After a slave is detached can it be re-attached? Or do we only allow attaching slaves that do not contain data?
  • mwj: For this version, the guestagent will assume an empty db. In this version, there will be no API call to re-attach a slave, or to attach any pre-existing site as a slave. The only operation taskmanager will know is creating a new set of slaves from a specified master.