Trove/Blueprints/Trove-v1-MySQL-Replication

Revision as of 00:16, 5 March 2014

Description

Supporting the various replication use cases is critical for using Trove in production. This blueprint describes those use cases and their requirements, then proposes a scope for an initial V1 implementation for MySQL.

Justification/Benefits

Most of the datastores currently supported by Trove have replication capabilities to fulfill various use cases such as:

  • scale out via read replicas
  • operational recovery (aka failover)
  • offline backup

To be production ready, Trove needs to make these use cases easy to configure and manage. Today Amazon RDS fulfills the first use case and part of the second for MySQL.

All of these requirements should be evaluated over time; this blueprint focuses on read replicas for scale out and targets the MySQL datastore. The same scope is expected to be implemented for other datastores, after which further work can be scoped to meet the remaining requirements.

High Level Requirements

While the specific details of each datastore need to be investigated against this list, these are the use cases that motivate the replication feature:

A. Read Replicas (Slaves)

  1. The master can exist before the slaves, such that it already contains data
  2. N Slaves for one master
  3. Slaves can be marked read-only (probably by default)
  4. When master fails, a slave can be chosen to be promoted to new master, with other slaves switched to follow new master
  5. A slave can be detached from "replication set" to act as independent site
  6. A pre-existing non-replication site can become the master of a new "replication set"
  7. All slaves should be in the same zone (open question: is this necessary or desired?)
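The read-replica requirements above can be pictured as a small in-memory model of a "replication set". This is a minimal sketch for illustration only, not Trove's design: the `ReplicationSet` class, its methods, and the instance IDs are hypothetical names, and no MySQL or Trove calls are made. It only tracks membership and the read-only flag for requirements A.1 to A.6.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationSet:
    """Hypothetical model: one master plus N slaves (requirements A.1 to A.6)."""

    master: str
    slaves: list = field(default_factory=list)
    read_only: dict = field(default_factory=dict)  # instance id -> read-only?

    @classmethod
    def from_existing(cls, instance_id):
        # A.1 / A.6: a pre-existing instance, possibly holding data,
        # becomes the writable master of a new replication set.
        return cls(master=instance_id, read_only={instance_id: False})

    def add_slave(self, instance_id, read_only=True):
        # A.2 / A.3: any number of slaves, read-only by default.
        if instance_id == self.master or instance_id in self.slaves:
            raise ValueError(f"{instance_id} is already in the set")
        self.slaves.append(instance_id)
        self.read_only[instance_id] = read_only

    def promote(self, instance_id):
        # A.4: promote a slave; the remaining slaves now follow the new master.
        if instance_id not in self.slaves:
            raise ValueError(f"{instance_id} is not a slave in this set")
        self.slaves.remove(instance_id)
        old_master = self.master
        self.master = instance_id
        self.read_only[instance_id] = False
        # The failed master leaves the set; it is not re-added as a slave.
        self.read_only.pop(old_master, None)
        return old_master

    def detach(self, instance_id):
        # A.5: detach a slave so it can act as an independent, writable site.
        # A real implementation would also stop replication on the guest
        # (for example with STOP SLAVE) and clear read_only there.
        if instance_id not in self.slaves:
            raise ValueError(f"{instance_id} is not a slave in this set")
        self.slaves.remove(instance_id)
        self.read_only.pop(instance_id)
        return ReplicationSet.from_existing(instance_id)
```

For example, starting from an existing `db-1` with slaves `db-2` and `db-3`, calling `promote("db-2")` returns `"db-1"` and leaves `db-3` as the only slave, now following `db-2`.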


B. MultiZone Disaster Recovery

  1. A master in one zone is mirrored by a slave in a different zone
  2. Some mechanism should exist where cloud admin can set up "zone configuration" so that the user can simply select "MultiZone DR" and Trove will know where to put both the master and the slave
  3. Should be able to restore the master from the slave, either directly or by making a backup stored in Glance
  4. Should be able to "flip the switch" on an already running MySQL instance
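Requirement B.2 implies an admin-managed "zone configuration" that pairs each primary zone with a DR zone, so a user only has to ask for "MultiZone DR". A minimal sketch, assuming a hypothetical `ZONE_PAIRS` mapping and `place_multizone_dr()` helper (neither exists in Trove); the zone names are placeholders.

```python
# Hypothetical admin-defined zone configuration: each primary zone names
# the zone that should host its DR slave.
ZONE_PAIRS = {
    "zone-a": "zone-b",
    "zone-b": "zone-c",
}


def place_multizone_dr(primary_zone, zone_pairs=ZONE_PAIRS):
    """Return (master_zone, slave_zone) for a "MultiZone DR" request."""
    try:
        dr_zone = zone_pairs[primary_zone]
    except KeyError:
        raise ValueError(f"no DR zone configured for {primary_zone!r}") from None
    if dr_zone == primary_zone:
        # B.1 requires the slave to live in a different zone than the master.
        raise ValueError("DR zone must differ from the primary zone")
    return primary_zone, dr_zone
```

Keeping the pairing in admin-owned configuration, rather than in the user request, matches the intent of B.2: the user selects the feature and Trove decides placement.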


C. Single Zone Failover

  1. Implement master-master replication between two instances in the same zone
  2. Can be set up on pre-existing instance
  3. Should be able to switch "active master", i.e., the site to which data is being written (other site could be marked read-only)
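Requirement C.3, switching the "active master", can be expressed as an ordered plan of MySQL statements on the two sites. A minimal sketch, assuming a hypothetical `switch_active_master()` helper and placeholder site names; it only builds the plan and does not connect to MySQL. `SET GLOBAL read_only` is a real MySQL statement, but note that `read_only` does not block users with the SUPER privilege.

```python
def switch_active_master(sites, new_active):
    """Plan a switch of the writable site in a master-master pair (C.3).

    `sites` maps site name -> True if that site is currently read-only.
    Returns the ordered (site, SQL) steps and the resulting read-only map.
    """
    if len(sites) != 2 or new_active not in sites:
        raise ValueError("expected a two-site pair containing new_active")
    old_active = next(site for site in sites if site != new_active)
    steps = [
        # Stop writes on the current active site first, so the two sites
        # never accept writes at the same time during the switch.
        (old_active, "SET GLOBAL read_only = ON"),
        # A real switch would wait here until new_active has applied all of
        # old_active's replicated changes before opening it for writes.
        (new_active, "SET GLOBAL read_only = OFF"),
    ]
    return steps, {old_active: True, new_active: False}
```

Ordering the steps this way trades a brief window with no writable site for the guarantee that the pair never has two writable masters at once.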

Impacts

Configuration

  • Does this impact any configuration files? If so, which ones?

Database

  • Does this impact any existing tables? If so, which ones?
  • Are the changes forward and backward compatible?
  • Be sure to include the expected migration process

Public API

  • Does this change any API that an end-user has access to?
  • Are there any exceptions in terms of consistency with other APIs?

Internal API

  • Does this change any internal messages between the API and Task Manager, or between the Task Manager and Guest Agent?

Guest Agent

  • Does this change behavior on the Guest Agent? If so, is it backward compatible with the API and Task Manager?