Trove/Replication-And-Clustering-With-Nodes-3

=This Wiki Is Outdated=

11/2014: This proposal has been superseded and is kept for historical purposes. For the current Clustering specification, see https://blueprints.launchpad.net/trove/+spec/clustering. For the current Replication specification, see https://blueprints.launchpad.net/trove/+spec/replication-v1

Example: Cassandra
To illustrate the approach, Cassandra is used in the examples below. Datastore-specific differences are covered in their own sections.

Create Cluster
Request:

POST /instances

{
  "instance": {
    "name": "products",
    "datastore": {"type": "cassandra", "version": "2.0.6"},
    "configuration": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6",
    "flavorRef": "7",
    "volume": {"size": 1},
    "cluster": {
      "size": 3,
      "nodes": [
        {"region": "west"},
        {"region": "east"},
        {"region": "eu"}
      ]
    }
  }
}

Response:

{
  "instance": {
    "status": "BUILD",
    "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998",
    "name": "products",
    "created": "2014-04-25T20:19:23",
    "updated": "2014-04-25T20:19:23",
    "links": [{...}],
    "datastore": {"type": "cassandra", "version": "2.0.6"},
    "cluster": {
      "size": 3,
      "nodes": [
        {"id": "416b0b16-ba55-4302-bbd3-ff566032e1c1", "region": "west"},
        {"id": "7f52e4f9-3fa6-4238-ac08-1ce15197329a", "region": "east"},
        {"id": "ff9d680c-fde3-49c6-a84e-76173b6df39d", "region": "eu"}
      ]
    }
  }
}

Notes:
 * For Phase One:
 ** cluster.nodes[] will not be supported.
 ** cluster.nodes[].region will not be returned.
 * For Phase Two:
 ** if cluster.nodes[].region is not provided, the current region is assumed.
 ** cluster.nodes[].region will always be returned.
 * Cassandra-specific fields required to construct the initial cluster (num_tokens, endpoint_snitch, the seed ip list, etc.) are to be determined/calculated from configuration file values and sensible defaults.
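The note above can be sketched in code. This is a minimal illustration, not spec: the helper name, the seed-selection heuristic (at most one seed per region first, up to three total), and the num_tokens/snitch defaults are all assumptions.

```python
def cassandra_bootstrap_settings(cluster_name, nodes, max_seeds=3):
    """nodes: list of dicts like {"region": "west", "ip": "10.0.0.1"}."""
    # Prefer one seed per region, then top up to max_seeds.
    first_per_region = {}
    for n in nodes:
        first_per_region.setdefault(n["region"], n["ip"])
    seeds = list(first_per_region.values())[:max_seeds]
    for n in nodes:
        if len(seeds) >= max_seeds:
            break
        if n["ip"] not in seeds:
            seeds.append(n["ip"])
    return {
        "cluster_name": cluster_name,
        "num_tokens": 256,  # common vnodes default; an assumption here
        "endpoint_snitch": ("GossipingPropertyFileSnitch"
                            if len(first_per_region) > 1 else "SimpleSnitch"),
        "seeds": ",".join(seeds),
    }
```

For the three-region example above, this yields one seed per region and a multi-region snitch.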

Show Cluster
Request:

GET /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998

Response:

{
  "instance": {
    "status": "ACTIVE",
    "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998",
    "name": "products",
    "created": "2014-04-25T20:19:23",
    "updated": "2014-04-25T20:19:23",
    "links": [{...}],
    "datastore": {"type": "cassandra", "version": "2.0.6"},
    "cluster": {
      "size": 3,
      "nodes": [
        {"id": "416b0b16-ba55-4302-bbd3-ff566032e1c1", "region": "west"},
        {"id": "7f52e4f9-3fa6-4238-ac08-1ce15197329a", "region": "east"},
        {"id": "ff9d680c-fde3-49c6-a84e-76173b6df39d", "region": "eu"}
      ]
    }
  }
}

Notes:
 * For Phase One:
 ** cluster.nodes[].region will not be returned.
 * Change: instance.volume.used, instance.ip[], and instance.hostname will never be returned.
 ** instance.ip[] could possibly remain if it returns only the seed ips.
 ** instance.hostname could possibly remain if it is converted to an array containing only the seed hostnames.

Show Node
Request:

GET /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998/nodes/416b0b16-ba55-4302-bbd3-ff566032e1c1

Response:

{
  "node": {
    "status": "ACTIVE",
    "id": "416b0b16-ba55-4302-bbd3-ff566032e1c1",
    "name": "products-1",
    "created": "2014-04-25T20:19:23",
    "updated": "2014-04-25T20:19:23",
    "links": [{...}],
    "ip": ["10.0.0.1"],
    "configuration": {"id": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6", "links": [{...}]},
    "flavor": {"id": "7", "links": [{...}]},
    "volume": {"size": 2, "used": 0.17}
  }
}

Add Node(s)
Request:

POST /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998/nodes

{
  "nodes": {
    "num": 2,
    "allocations": [
      {"region": "west"},
      {"region": "west"}
    ]
  }
}

Response:

HTTP 202 (Empty Body)

Notes:
 * For Phase One:
 ** nodes.num will be the only supported field (nodes.allocations will not be supported).
 * For Phase Two:
 ** if nodes.allocations[] is not provided, the region of every existing node must match; otherwise the request fails.
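The Phase Two rule above can be sketched as a small validation helper. This is illustrative only: the function name and request-body shape are taken from the examples, while the num-vs-allocations consistency check is an added assumption.

```python
def plan_add_nodes(existing_regions, body):
    """body: the Add Node(s) request body. Returns a region per new node.

    Without explicit allocations, every existing node must already be in
    the same region, which the new nodes then inherit.
    """
    nodes = body["nodes"]
    allocations = nodes.get("allocations")
    if allocations is not None:
        if len(allocations) != nodes["num"]:
            raise ValueError("nodes.num must match len(nodes.allocations)")
        return [a["region"] for a in allocations]
    if len(set(existing_regions)) != 1:
        raise ValueError("nodes.allocations required: cluster spans regions")
    return [existing_regions[0]] * nodes["num"]
```

A request against a multi-region cluster without allocations would thus be rejected, matching the note above.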

Replace Node
Request:

POST /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998/action

{
  "replace_node": {"id": "7f52e4f9-3fa6-4238-ac08-1ce15197329a"}
}

Response:

HTTP 202 (Empty Body)

Notes:
 * http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_live_node.html
 * http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_seed_node.html
 * http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html

Remove Node
Request:

DELETE /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998/nodes/7f52e4f9-3fa6-4238-ac08-1ce15197329a

Response:

HTTP 202 (Empty Body)

Notes:
 * http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_remove_node_t.html

Example: MongoDB
The same cluster API applies to MongoDB; arbiter-specific behavior is shown below.

Create Cluster
Request:

POST /instances

{
  "instance": {
    "name": "products",
    "datastore": {"type": "mongodb", "version": "2.4.10"},
    "configuration": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6",
    "flavorRef": "7",
    "volume": {"size": 1},
    "cluster": {
      "size": 5,
      "nodes": [
        {"region": "west"},
        {"region": "west"},
        {"region": "west"},
        {"region": "east"},
        {"region": "east"}
      ]
    }
  }
}

Response:

{
  "instance": {
    "status": "BUILD",
    "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998",
    "name": "products",
    "created": "2014-04-25T20:19:23",
    "updated": "2014-04-25T20:19:23",
    "links": [{...}],
    "datastore": {"type": "mongodb", "version": "2.4.10"},
    "cluster": {
      "size": 5,
      "nodes": [
        {"id": "416b0b16-ba55-4302-bbd3-ff566032e1c1", "region": "west"},
        {"id": "965ef811-7c1d-47fc-89f2-a89dfdd23ef2", "region": "west"},
        {"id": "3642f41c-e8ad-4164-a089-3891bf7f2d2b", "region": "west"},
        {"id": "7f52e4f9-3fa6-4238-ac08-1ce15197329a", "region": "east"},
        {"id": "ff9d680c-fde3-49c6-a84e-76173b6df39d", "region": "east"}
      ]
    }
  }
}

Show Cluster
Request:

GET /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998

Response:

{
  "instance": {
    "status": "ACTIVE",
    "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998",
    "name": "products",
    "created": "2014-04-25T20:19:23",
    "updated": "2014-04-25T20:19:23",
    "links": [{...}],
    "datastore": {"type": "mongodb", "version": "2.4.10"},
    "cluster": {
      "size": 5,
      "nodes": [
        {"id": "416b0b16-ba55-4302-bbd3-ff566032e1c1", "region": "west"},
        {"id": "965ef811-7c1d-47fc-89f2-a89dfdd23ef2", "region": "west"},
        {"id": "3642f41c-e8ad-4164-a089-3891bf7f2d2b", "region": "west"},
        {"id": "7f52e4f9-3fa6-4238-ac08-1ce15197329a", "region": "east"},
        {"id": "ff9d680c-fde3-49c6-a84e-76173b6df39d", "region": "east"}
      ]
    }
  }
}

Show Node
Request:

GET /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998/nodes/416b0b16-ba55-4302-bbd3-ff566032e1c1

Response:

{
  "node": {
    "status": "ACTIVE",
    "id": "416b0b16-ba55-4302-bbd3-ff566032e1c1",
    "name": "products-1",
    "created": "2014-04-25T20:19:23",
    "updated": "2014-04-25T20:19:23",
    "links": [{...}],
    "ip": ["10.0.0.1"],
    "configuration": {"id": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6", "links": [{...}]},
    "flavor": {"id": "7", "links": [{...}]},
    "volume": {"size": 2, "used": 0.17}
  }
}

Create Arbiter(s)
Request:

POST /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998/nodes

{
  "nodes": {
    "num": 2,
    "allocations": [
      {"region": "eu", "type": "arbiter"},
      {"region": "eu", "type": "arbiter"}
    ]
  }
}

Response:

HTTP 202 (Empty Body)

Show Cluster (After Arbiters)
Request:

GET /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998

Response:

{
  "instance": {
    "status": "ACTIVE",
    "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998",
    "name": "products",
    "created": "2014-04-25T20:19:23",
    "updated": "2014-04-25T20:19:23",
    "links": [{...}],
    "datastore": {"type": "mongodb", "version": "2.4.10"},
    "cluster": {
      "size": 7,
      "nodes": [
        {"id": "416b0b16-ba55-4302-bbd3-ff566032e1c1", "region": "west"},
        {"id": "965ef811-7c1d-47fc-89f2-a89dfdd23ef2", "region": "west"},
        {"id": "3642f41c-e8ad-4164-a089-3891bf7f2d2b", "region": "west"},
        {"id": "7f52e4f9-3fa6-4238-ac08-1ce15197329a", "region": "east"},
        {"id": "ff9d680c-fde3-49c6-a84e-76173b6df39d", "region": "east"},
        {"id": "77032c55-4496-4e35-8c0d-6cd1c18e1a9c", "region": "eu", "type": "arbiter"},
        {"id": "1fd054ed-221f-4c99-8d17-570bcff4c1d2", "region": "eu", "type": "arbiter"}
      ]
    }
  }
}

Example: MySQL (Replication)
MySQL uses master/slave replication rather than clustering; the flows below use standalone instances plus a slave relationship.

Create Master
Request:

POST /instances

{
  "instance": {
    "name": "products",
    "datastore": {"type": "mysql", "version": "5.5"},
    "configuration": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6",
    "flavorRef": "7",
    "volume": {"size": 1}
  }
}

Response:

{
  "instance": {
    "status": "BUILD",
    "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998",
    "name": "products",
    "created": "2014-04-25T20:19:23",
    "updated": "2014-04-25T20:19:23",
    "links": [{...}],
    "datastore": {"type": "mysql", "version": "5.5"},
    "configuration": {"id": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6", "links": [{...}]},
    "flavor": {"id": "7", "links": [{...}]},
    "volume": {"size": 1}
  }
}

Create Slave
Request:

POST /instances

{
  "instance": {
    "name": "products-slave",
    "datastore": {"type": "mysql", "version": "5.5"},
    "configuration": "fc318e00-3a6f-4f93-af99-146b44912188",
    "flavorRef": "7",
    "volume": {"size": 1},
    "slave": {"of": "dfbbd9ca-b5e1-4028-adb7-f78643e17998", "read_only": true}
  }
}

Response:

{
  "instance": {
    "status": "BUILD",
    "id": "061aaf4c-3a57-411e-9df9-2d0f813db859",
    "name": "products-slave",
    "created": "2014-04-25T20:19:23",
    "updated": "2014-04-25T20:19:23",
    "links": [{...}],
    "datastore": {"type": "mysql", "version": "5.5"},
    "configuration": {"id": "fc318e00-3a6f-4f93-af99-146b44912188", "links": [{...}]},
    "flavor": {"id": "7", "links": [{...}]},
    "volume": {"size": 1},
    "slave": {"of": "dfbbd9ca-b5e1-4028-adb7-f78643e17998", "read_only": true}
  }
}

Show Master
Request:

GET /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998

Response:

{
  "instance": {
    "status": "ACTIVE",
    "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998",
    "name": "products",
    "created": "2014-04-25T20:19:23",
    "updated": "2014-04-25T20:19:23",
    "links": [{...}],
    "datastore": {"type": "mysql", "version": "5.5"},
    "configuration": {"id": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6", "links": [{...}]},
    "flavor": {"id": "7", "links": [{...}]},
    "volume": {"size": 1},
    "slave": {
      "list": [
        {"id": "061aaf4c-3a57-411e-9df9-2d0f813db859"}
      ]
    }
  }
}

Show Slave
Request:

GET /instances/061aaf4c-3a57-411e-9df9-2d0f813db859

Response:

{
  "instance": {
    "status": "ACTIVE",
    "id": "061aaf4c-3a57-411e-9df9-2d0f813db859",
    "name": "products-slave",
    "created": "2014-04-25T20:19:23",
    "updated": "2014-04-25T20:19:23",
    "links": [{...}],
    "datastore": {"type": "mysql", "version": "5.5"},
    "configuration": {"id": "fc318e00-3a6f-4f93-af99-146b44912188", "links": [{...}]},
    "flavor": {"id": "7", "links": [{...}]},
    "volume": {"size": 1},
    "slave": {"of": "dfbbd9ca-b5e1-4028-adb7-f78643e17998", "read_only": true}
  }
}

Detach Slave
Request:

POST /instances/061aaf4c-3a57-411e-9df9-2d0f813db859/action

{ "detach": {} }

Response:

HTTP 202 (Empty Body)

Delete Master
Request:

DELETE /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998

Response:

HTTP 202 (Empty Body)

Notes:
 * Open question: how should the case be handled where a slave is attached to a master and the user attempts to delete the master?

Delete Slave
Request:

DELETE /instances/061aaf4c-3a57-411e-9df9-2d0f813db859

Response:

HTTP 202 (Empty Body)

Nodes Table
Create a new 'nodes' table:

CREATE TABLE "nodes" (
  "id" varchar(36) NOT NULL,
  "instance_id" varchar(36) NOT NULL,
  "created" datetime DEFAULT NULL,
  "updated" datetime DEFAULT NULL,
  "name" varchar(255) DEFAULT NULL,
  "hostname" varchar(255) DEFAULT NULL,
  "compute_instance_id" varchar(36) DEFAULT NULL,
  "task_id" int(11) DEFAULT NULL,
  "task_description" varchar(32) DEFAULT NULL,
  "task_start_time" datetime DEFAULT NULL,
  "volume_id" varchar(36) DEFAULT NULL,
  "flavor_id" int(11) DEFAULT NULL,
  "volume_size" int(11) DEFAULT NULL,
  "tenant_id" varchar(36) DEFAULT NULL,
  "server_status" varchar(64) DEFAULT NULL,
  "deleted" tinyint(1) DEFAULT NULL,
  "deleted_at" datetime DEFAULT NULL,
  "datastore_version_id" varchar(36) NOT NULL,
  "configuration_id" varchar(36) DEFAULT NULL,
  PRIMARY KEY ("id"),
  KEY "instance_id" ("instance_id"),
  KEY "datastore_version_id" ("datastore_version_id"),
  KEY "configuration_id" ("configuration_id"),
  KEY "instances_tenant_id" ("tenant_id"),
  KEY "instances_deleted" ("deleted"),
  CONSTRAINT "nodes_ibfk_3" FOREIGN KEY ("instance_id") REFERENCES "instances" ("id"),
  CONSTRAINT "nodes_ibfk_2" FOREIGN KEY ("configuration_id") REFERENCES "configurations" ("id"),
  CONSTRAINT "nodes_ibfk_1" FOREIGN KEY ("datastore_version_id") REFERENCES "datastore_versions" ("id")
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

This is the same schema as the instances table, except for:
 * the addition of: "instance_id" varchar(36) NOT NULL
 * the addition of: KEY "instance_id" ("instance_id")
 * the addition of: CONSTRAINT "nodes_ibfk_3" FOREIGN KEY ("instance_id") REFERENCES "instances" ("id")
 * TODO: change 'DEFAULT NULL' to 'NOT NULL' wherever possible (e.g. columns like "created" should never be NULL)
 * TODO: add or remove indexes as deemed necessary

Alter Instances Table
Add a slave_of column to the instances table (plus index and constraint):

ALTER TABLE instances
  ADD COLUMN slave_of VARCHAR(36) DEFAULT NULL,
  ADD KEY "slave_of" ("slave_of"),
  ADD CONSTRAINT "instances_ibfk_3" FOREIGN KEY ("slave_of") REFERENCES "instances" ("id");

Alter Other Tables
Add a node_id column (plus index and constraint) to each of the following tables, substituting the table name and constraint number per table:

ALTER TABLE &lt;table&gt;
  ADD COLUMN node_id VARCHAR(36) DEFAULT NULL,
  ADD KEY "node_id" ("node_id"),
  ADD CONSTRAINT "&lt;table&gt;_ibfk_&lt;n&gt;" FOREIGN KEY ("node_id") REFERENCES "nodes" ("id");
 * agent_heartbeats
 * backups
 * conductor_lastseen
 * root_enabled_history
 * security_group_instance_associations
 * service_statuses
 * usage_events
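Since the same statement applies to every table in the list, it can be generated mechanically. A small sketch; the "&lt;table&gt;_ibfk_&lt;n&gt;" constraint-name convention follows the nodes DDL above, and the default constraint number is a placeholder assumption:

```python
# Tables from the list above that gain a node_id foreign key.
AFFECTED_TABLES = [
    "agent_heartbeats", "backups", "conductor_lastseen",
    "root_enabled_history", "security_group_instance_associations",
    "service_statuses", "usage_events",
]

def node_id_ddl(table, ibfk_num=1):
    """Emit the ALTER statement adding node_id to one table."""
    return (
        'ALTER TABLE {t}\n'
        '  ADD COLUMN node_id VARCHAR(36) DEFAULT NULL,\n'
        '  ADD KEY "node_id" ("node_id"),\n'
        '  ADD CONSTRAINT "{t}_ibfk_{n}" FOREIGN KEY ("node_id")'
        ' REFERENCES "nodes" ("id");'
    ).format(t=table, n=ibfk_num)

statements = [node_id_ddl(t) for t in AFFECTED_TABLES]
```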

TaskManager

 * add node_id to /etc/guest_info (if the instance is a node in a cluster); guest_id remains as-is.
 * create each node in a loop.
 * poll until all nodes are ACTIVE.
 * for each node: use trove/nova to get its ip/hostname.
 * for couchbase:
 ** send the ip/hostname list via rpc cast to each guest.
 * for cassandra:
 ** send the seed ip list via rpc cast to the seed-node guests one by one (polling on REBOOT => ACTIVE), then to the rest of the nodes.
 * for mongodb:
 ** send the ip/hostname list via rpc cast to the guest that is the db.isMaster.
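The steps above can be sketched as a single driver loop. This is illustrative only: create_node, is_active, and send_topology stand in for Trove/Nova internals and the per-datastore rpc cast; they are hypothetical callables, not real Trove APIs.

```python
import time

def provision_cluster(node_specs, create_node, is_active, send_topology,
                      poll_interval=1.0, timeout=300.0):
    # for-loop create each node
    node_ids = [create_node(spec) for spec in node_specs]
    # poll until all nodes are active
    deadline = time.time() + timeout
    while not all(is_active(node_id) for node_id in node_ids):
        if time.time() > deadline:
            raise RuntimeError("cluster build timed out")
        time.sleep(poll_interval)
    # datastore-specific step: cast the ip/hostname or seed list to guests
    send_topology(node_ids)
    return node_ids
```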

Guest

 * update the heartbeat payload ( heartbeat(guest_id, payload, sent) ) from {"service_status": "..."} to {"service_status": "...", "node_id": "..."}
 * add a method to each datastore guest manager for handling the ip/hostname list
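The amended payload might be built as follows. Whether node_id is omitted entirely or sent as None for standalone instances is an assumption here, not spec:

```python
def heartbeat_payload(service_status, node_id=None):
    """Build the guest heartbeat payload; node_id only for cluster nodes."""
    payload = {"service_status": service_status}
    if node_id is not None:
        payload["node_id"] = node_id
    return payload
```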

Conductor

 * extend the heartbeat handling to also update node status in the nodes table

Capabilities
A capability might be supported for a datastore-version for standalone instances, but not for clusters. Therefore, the capability tables must be amended to include a cluster-enabled flag.

ALTER TABLE capabilities ADD COLUMN enabled_cluster TINYINT(1) DEFAULT NULL;
ALTER TABLE capability_overrides ADD COLUMN enabled_cluster TINYINT(1) DEFAULT NULL;

The following capabilities should have enabled_cluster set to false for the first iteration of clusters:
 * backup-create + backup-list-instance
 * configuration-attach + configuration-detach + configuration-instances
 * resize-*
 * database-*
 * root-*
 * secgroup-*
 * user-*
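The intended lookup could be sketched as follows. The dict shapes mirror the two tables; the rule that a NULL enabled_cluster falls back to the standalone flag, and that an override row wins over the base row, are assumptions here:

```python
def capability_enabled(capability, override=None, for_cluster=False):
    """capability/override: rows with 'enabled' and 'enabled_cluster'."""
    row = override if override is not None else capability
    if for_cluster:
        flag = row.get("enabled_cluster")
        if flag is not None:          # explicit cluster flag wins
            return bool(flag)
    return bool(row.get("enabled"))   # fall back to the standalone flag
```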

Introduce read_only and hidden Parameters
Two additional attributes need to be introduced for configuration group parameters: read_only and hidden.
 * read_only fields include cluster_name, num_tokens, seed_provider, seeds, and endpoint_snitch (cassandra); replSet (mongodb); and server_id and log_bin (mysql).
 * depending on the provider, some of the read_only fields should also be hidden from the user on a configuration-show.
 * once read_only + hidden are available, a parallel effort should move configuration-default into configuration-show when a configuration-group is attached.
 * amcrn (talk) 20:43, 8 May 2014 (UTC): update: mysql master/slave will not be required to do this because the overrides.cnf functionality handles this nicely. to be determined as to how clustering will handle this. it could be this, or it could be a copy of the original conf, or a mixture thereof.
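How the two attributes could affect configuration-show and updates is sketched below. The parameter-dict shape and function names are illustrative assumptions, not the Trove API:

```python
def configuration_show(parameters):
    """parameters: {name: {"value": v, "read_only": bool, "hidden": bool}}."""
    shown = {}
    for name, p in parameters.items():
        if p.get("hidden"):
            continue  # hidden params never appear in configuration-show
        shown[name] = {"value": p["value"],
                       "read_only": bool(p.get("read_only"))}
    return shown

def validate_update(parameters, updates):
    """Reject any attempt to change a read_only parameter."""
    bad = sorted(k for k in updates if parameters.get(k, {}).get("read_only"))
    if bad:
        raise ValueError("read-only parameters: " + ", ".join(bad))
```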

Auto-Create and Attach

 * cassandra &amp; mongodb need to have configuration-groups automatically created and attached to each node (for cluster_name, replSet, etc.) during provisioning.
 * one unique configuration-group per node.
 * auto-created and attached configuration-groups must not be detachable from the instance.
 * dependency: configuration-group support for mongodb + cassandra.
 * amcrn (talk) 20:43, 8 May 2014 (UTC): update: mysql master/slave will not be required to do this because the overrides.cnf functionality handles this nicely. to be determined as to how clustering will handle this. it could be this, or it could be a copy of the original conf, or a mixture thereof.
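A sketch of the auto-create-and-attach step. The group shape and the rule that managed settings are merged into a per-node copy of any user-supplied group are one possible interpretation, flagged as assumptions:

```python
import uuid

def auto_attach_groups(nodes, managed_settings, user_settings=None):
    """nodes: [{"id": ...}]; managed_settings: callable(node) -> dict."""
    groups = {}
    for node in nodes:
        settings = dict(user_settings or {})
        settings.update(managed_settings(node))  # managed values win
        groups[node["id"]] = {
            "id": str(uuid.uuid4()),  # unique configuration-group per node
            "detachable": False,      # cannot be detached by the user
            "settings": settings,
        }
    return groups
```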

Create Slave

 * glucas: Replication will require capturing a snapshot of the master's state and passing that to the slave. It would be preferable to create multiple slaves from a single snapshot and then clean up, rather than repeating the snapshot process multiple times. For that reason we propose a 'replicate' action that can create N slaves from an existing master in one call. I believe this approach works with the schema changes proposed here, i.e. adding the slave_of reference to the instance table.
 * amcrn (talk) 17:49, 8 May 2014 (UTC): this is not usually cost-effective, depending on the provider. ex: if you have at least one slave in two or more regions, it makes sense to seed additional slaves with a backup from their home region. Here's a question: the backup of the master, will it stay put in Swift even after the replication has been setup? Re-worded, does the user have to delete the backup that's an artifact of setting up replication, or will Trove take care of it?
 * mwj: I thought the point of the topology design was to not introduce properties to the instance table that would not be relevant to all (i.e., non-replication) instances. I take it this is no longer an issue?
 * amcrn (talk) 17:49, 8 May 2014 (UTC): don't understand the question because the instances table has not been modified at all, except the addition of the slave_of column, and no clustered nodes will be inserted in the instances table.

Capabilities

 * glucas: We should use capabilities to indicate whether a datastore supports replication (and potentially read-only vs. read-write replication). In the first iteration, replication will not be supported for clusters. mwj: Is it really necessary to have a capability for replication?  We are designing replication to use backup/restore functionality, so the backup/restore capability should be enough, no?
 * amcrn (talk) 17:36, 8 May 2014 (UTC): there definitely needs to be a capability flag for clusters when it comes to features, as to whether master/slave needs it, that's an open question. does the logic for creating a backup wildly change when targeted against a slave vs. a standalone, and if so, are you guaranteeing you'll support this in your first iteration? you'll need to answer this for configuration-groups, users, databases, security-groups, root-enable/history, etc.

Configuration Groups

 * dougshelley66: What if the end user wants to specify some configuration parameters for nodes or replicas? It appears (from the description above) that each node will have an auto generated group attached. Given an instance can only have one config group, this would allow the user to specify one?
 * amcrn (talk) 17:33, 8 May 2014 (UTC): here's how it'd work: you want a master/slave setup, with 2 slaves. all three nodes would get their own configuration-group automatically created and attached (3 unique ids). if the user wishes to configure some parameters, they can do so. if the user provides a configuration-id on provisioning, then we'd add the read_only/hidden parameters to their configuration-group automatically.

Promote Slave

 * mwj: I would rather call this "detach slave" as I expect we may want "promote slave" to be used in future fail-over designs.
 * amcrn (talk) 17:30, 8 May 2014 (UTC): agreed, just went with the industry standard as a default. changed it to "detach" because i agree with you.

Delete Slave

 * mwj: Shouldn't "delete instance" just work, even for slaves?
 * amcrn (talk) 17:29, 8 May 2014 (UTC): didn't have the example above, but yes it should. added an example.

Clusters

 * drewford: Should there be a "List Clusters" call?
 * drewford: "size" and "num" attributes seem like workarounds when dealing with nodes and allocations. Using "num"/"size" to tell the API how many objects you are adding in an array adds the dependency that the "num" value must match the length of the array.  What if they are not the same?  Unless there is a long-term use case for "size" and "num", they seem like they will become artifacts.
 * drewford: "allocations" adds one more level to the create object that seems like it doesn't need to be there.  Why not get straight to the point when adding nodes, like:

"nodes": [ {"region":"west"}, {"region":"west"} ]


 * drewford: When dealing with lists of child objects, preventing individual "detail" calls for each child is a good thing. For example, when getting details on a cluster in a UI, you would most likely also want to see a list of the nodes in the cluster. It would be good to keep implementors from having to call the API for each individual node just to show some important details in a node list. Here are a couple of options:
 ** 1) Use the complete node detail object in the nodes array on the "Show Cluster" call.
 ** 2) Add a "List Cluster Nodes" call that returns a list of detailed node objects.
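As an illustration of the second option, a hypothetical "List Cluster Nodes" response (the endpoint and field selection here are assumptions, borrowed from the Show Node example) might look like:

```json
GET /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998/nodes

{
  "nodes": [
    {
      "status": "ACTIVE",
      "id": "416b0b16-ba55-4302-bbd3-ff566032e1c1",
      "name": "products-1",
      "ip": ["10.0.0.1"],
      "flavor": {"id": "7", "links": [{...}]},
      "volume": {"size": 2, "used": 0.17}
    },
    {...}
  ]
}
```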