Trove/Replication-And-Clustering-With-Nodes
Summary
Summary of New Fields for Instance-Create
Field | Type | Description | Applicable Datastore(s) |
---|---|---|---|
cluster.size | int | size of the new cluster | mongdb, cassandra, couchbase, redis-cluster |
cluster.slave | complex-field | see below for sub-fields | mysql, redis, redis-cluster |
-- cluster.slave.of | string (uuid) | instance-id of the master of which to be a slave | mysql, redis, redis-cluster |
-- cluster.slave.read_only | boolean | whether this slave should be read-only | mysql, redis, redis-cluster |
Summary of New Fields for Instance-Show
Field | Type | Description | Applicable Datastore(s) |
---|---|---|---|
cluster.size | int | size of the cluster (excluding slaves) | mongdb, cassandra, couchbase, redis-cluster |
cluster.nodes | array | instance-ids of nodes | mongdb, cassandra, couchbase, redis-cluster |
cluster.slaves | array | instance-ids of slaves | mysql, redis, redis-cluster |
cluster.slave | complex-field | see below for sub-fields | mysql, redis, redis-cluster |
-- cluster.slave.of | string (uuid) | instance-id of master (if a slave) | mysql, redis, redis-cluster |
-- cluster.slave.read_only | boolean | if a slave, whether it's read-only | mysql, redis, redis-cluster |
Summary of Route Changes
New Route:
GET /instances/<id>/node/<node-id>
{ "node": { "status": "<status>", "id": "<node-id>", "name": "<name>", "created": "<timestamp>", "updated": "<timestamp>", "links": [{...}], "ip": ["<ip>"], //or hostname, depending on conf "volume": { "size": <size>, "used": <used> } } }
New Route:
POST /instances/<id>/node/<node-id>/action
{ "<action>": { <action-specific fields> } }
Summary
- the term "instance" now represents either a single standalone instance or a cluster of nodes.
- each node in a cluster must initially be homogenous in respect to flavor, volume-size, configuration-group, datastore/datastore-version, security-groups, etc.
- in this approach, a slave is not considered a node, and therefore the two terms and concepts are exclusive.
- the instance.name now acts as the cluster name, both in trove as well as the datastore (e.g. cluster_name/replSet)
- node.name is the instance.name with "-<num>" as a suffix (counter for the cluster, starting at 1)
- the semantic meaning of instance.status, and the possible states, will change
- the node.status will likely mirror the state values and transitions seen in instance.status prior to the clustering implementation.
- the cluster.nodes[].ids are generated uuids
Features for First Iteration
Supported:
- Create Multi-Node Cluster
- Restart Multi-Node Cluster
- Terminate Multi-Node Cluster
- Increase Cluster Size
Not Supported:
- No 'POST /instances/<id>/node/<node-id>/action' implementations will land in the first iteration. This means no restart-node, delete-node, reboot-node, promote-node, repair-node, etc.
- Decreasing Cluster Size
- Resizing Flavor/Volume
- Backups
- Altering/Adding/Dropping Security Groups
Example: Cassandra
To illustrate the approach, Cassandra is used in the examples below. The eccentricities of each Datastore will be explained in their own sections.
Create Instance
Request:
POST /instances { "instance": { "name": "products", "datastore": { "type": "cassandra", "version": "2.0.6" }, "configuration": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6", "flavorRef": "7", "volume": { "size": 1 }, "cluster": { "size": 3 } } }
Response:
{ "instance": { "status": "BUILD", "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998", "name": "products", "created": "2014-04-25T20:19:23", "updated": "2014-04-25T20:19:23", "links": [{...}], "datastore": { "type": "cassandra", "version": "2.0.6" }, "configuration": { "id": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6", "links": [{...}], }, "flavor": { "id": "7", "links": [{...}], }, "volume": { "size": 1 }, "cluster": { "size": 3, "nodes": [ {"id": "416b0b16-ba55-4302-bbd3-ff566032e1c1"}, {"id": "7f52e4f9-3fa6-4238-ac08-1ce15197329a"}, {"id": "ff9d680c-fde3-49c6-a84e-76173b6df39d"} ] } } }
Notes:
- Cassandra-specific fields that are required to construct the initial cluster (num_tokens, endpoint_snitch, seed ip-list, etc.) are to be determined/calculated based on configuration file values and common-sense.
Show Instance
Request:
GET /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998
Response:
{ "instance": { "status": "ACTIVE", "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998", "name": "products", "created": "2014-04-25T20:19:23", "updated": "2014-04-25T20:19:23", "links": [{...}], "datastore": { "type": "cassandra", "version": "2.0.6" }, "configuration": { "id": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6", "links": [{...}], }, "flavor": { "id": "7", "links": [{...}], }, "volume": { "size": 1 }, "cluster": { "size": 3, "nodes": [ {"id": "416b0b16-ba55-4302-bbd3-ff566032e1c1"}, {"id": "7f52e4f9-3fa6-4238-ac08-1ce15197329a"}, {"id": "ff9d680c-fde3-49c6-a84e-76173b6df39d"} ] } } }
Notes:
- Change: instance.volume.used, instance.ip[], and instance.hostname will never be returned
- It's possible that instance.ip[] can remain if it only returns the seed ips.
- It's possible that instance.hostname can remain if it's converted to an array and only contains the seed hostnames.
Show Node
Request:
GET /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998/node/416b0b16-ba55-4302-bbd3-ff566032e1c1
Response:
{ "node": { "status": "ACTIVE", "id": "416b0b16-ba55-4302-bbd3-ff566032e1c1", "name": "products-1", "created": "2014-04-25T20:19:23", "updated": "2014-04-25T20:19:23", "links": [{...}], "ip": ["10.0.0.1"], "volume": { "size": 2, "used": 0.17 } } }
Resize Cluster
Request:
POST /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998/action { "resize_cluster": { "size": 6 } }
Response:
HTTP 202 (Empty Body)
All Other Operations
Notes:
- As mentioned in the Summary, other operations like: removing a node, replacing a running node, replacing a dead node, repairing a node, etc. are not supported in the first iteration.
Example: MySQL
Create Instance
Request:
POST /instances { "instance": { "name": "products", "datastore": { "type": "mysql", "version": "5.5" }, "configuration": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6", "flavorRef": "7", "volume": { "size": 1 } } }
Response:
{ "instance": { "status": "BUILD", "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998", "name": "products", "created": "2014-04-25T20:19:23", "updated": "2014-04-25T20:19:23", "links": [{...}], "datastore": { "type": "mysql", "version": "5.5" }, "configuration": { "id": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6", "links": [{...}], }, "flavor": { "id": "7", "links": [{...}], }, "volume": { "size": 1 } } }
Create Slave
Request:
POST /instances { "instance": { "name": "products-slave", "datastore": { "type": "mysql", "version": "5.5" }, "configuration": "fc318e00-3a6f-4f93-af99-146b44912188", "flavorRef": "7", "volume": { "size": 1 }, "cluster": { "slave": { "of": "dfbbd9ca-b5e1-4028-adb7-f78643e17998", "read_only": true } } } }
Response:
{ "instance": { "status": "BUILD", "id": "061aaf4c-3a57-411e-9df9-2d0f813db859", "name": "products", "created": "2014-04-25T20:19:23", "updated": "2014-04-25T20:19:23", "links": [{...}], "datastore": { "type": "mysql", "version": "5.5" }, "configuration": { "id": "fc318e00-3a6f-4f93-af99-146b44912188", "links": [{...}], }, "flavor": { "id": "7", "links": [{...}], }, "volume": { "size": 1 }, "cluster": { "slave": { "of": "dfbbd9ca-b5e1-4028-adb7-f78643e17998", "read_only": true } } } }
Show Master
Request:
GET /instances/dfbbd9ca-b5e1-4028-adb7-f78643e17998
Response:
{ "instance": { "status": "ACTIVE", "id": "dfbbd9ca-b5e1-4028-adb7-f78643e17998", "name": "products", "created": "2014-04-25T20:19:23", "updated": "2014-04-25T20:19:23", "links": [{...}], "datastore": { "type": "mysql", "version": "5.5" }, "configuration": { "id": "b9c8a3f8-7ace-4aea-9908-7b555586d7b6", "links": [{...}], }, "flavor": { "id": "7", "links": [{...}], }, "volume": { "size": 1 }, "cluster": { "slaves": [ {"id": "061aaf4c-3a57-411e-9df9-2d0f813db859"} ] } } }
Show Slave
Request:
GET /instances/061aaf4c-3a57-411e-9df9-2d0f813db859
Response:
{ "instance": { "status": "ACTIVE", "id": "061aaf4c-3a57-411e-9df9-2d0f813db859", "name": "products", "created": "2014-04-25T20:19:23", "updated": "2014-04-25T20:19:23", "links": [{...}], "datastore": { "type": "mysql", "version": "5.5" }, "configuration": { "id": "fc318e00-3a6f-4f93-af99-146b44912188", "links": [{...}], }, "flavor": { "id": "7", "links": [{...}], }, "volume": { "size": 1 }, "cluster": { "slave": { "of": "dfbbd9ca-b5e1-4028-adb7-f78643e17998", "read_only": true } } } }
MongoDB
Notes:
- instance.name will function as the replica-set name
- The smallest viable deployment pattern for a replica-set is 3 nodes. A three member replica-set can either be (a) a primary and two secondaries or (b) a primary, a secondary, and an arbiter. option (b) provides less redundancy and fault tolerance, so it will not be supported in this first iteration. Therefore, size=3 by default creates a three member replica-set with one primary and two secondaries.
- instance.cluster.size will only support '3', '5', and '7' as values for MongoDB in this first iteration. Why? (a) an odd number of members ensures the replica-set is always able to elect a primary (b) only 7 members can vote at a time.
- Scaling the Nodes from 3 to 5, or 5 to 7 will be supported.
- A replica-set can have up to 12 members, but only 7 voting members. To avoid having to implement and handle 'priority'/'hidden' in the first iteration, the maximum number of nodes is capped at 7.
- Changing the priority, hidden, slaveDelay, or adding an Arbiter will not be supported in this first iteration.
Couchbase
Notes:
- instance.cluster.size can be any positive number
- Resizing the Cluster by increasing the number will be supported.
- Removing a Node will not be supported in the first iteration.
Data Model Changes
Option #1: Instances Table
Name | Data Type | Length | Nullable | Description |
---|---|---|---|---|
type | varchar | 10 | false | either 'standalone', 'cluster', 'node', 'master', or 'slave' |
parent_id | varchar | 36 | true | instance.id (Foreign Key) |
Notes:
- on db_sync backfill existing rows with instance.type as 'standalone'
- type could be an int instead of a varchar, backed by a class enum, but that doesn't seem to be the "OpenStack Way".
- once a slave is created, the master's type is changed from 'standalone' to 'master'
- for clusters, an additional row is inserted (the "cluster row"), with type=cluster.
- parent_id is either the master's uuid (if the row is type=slave), or the cluster's shell row uuid (if the row is type=node)
- node rows will not infer data (like flavor_id) from the cluster row; instead it will be "redundantly" included.
Example Database Rows (Represented in JSON)
Standalone Before:
{ "id": "035dc57c-659c-4cab-83e8-29ec99437655", "created": "2014-04-20 19:38:59", "updated": "2014-04-20 19:39:13", "name": "products", "hostname": null, "compute_instance_id": "1bfa857e-0119-49f5-9f41-d4b5308b69d8", "task_id": 1, "task_description": "No tasks for the instance.", "task_start_time": null, "volume_id": "cdfabe94-6952-4279-b208-9917ac7d30ed", "flavor_id": 10, "volume_size": 200, "tenant_id": "19fb5380ec8444358ad3a9d26ced6b8f", "server_status": null, "deleted": 0, "deleted_at": null, "datastore_version_id": "4141cc0b-6300-4706-a9f2-3ce757fec859", "configuration_id": null }
Standalone After:
{ "id": "035dc57c-659c-4cab-83e8-29ec99437655", "created": "2014-04-20 19:38:59", "updated": "2014-04-20 19:39:13", "name": "products", "hostname": null, "compute_instance_id": "1bfa857e-0119-49f5-9f41-d4b5308b69d8", "task_id": 1, "task_description": "No tasks for the instance.", "task_start_time": null, "volume_id": "cdfabe94-6952-4279-b208-9917ac7d30ed", "flavor_id": 10, "volume_size": 200, "tenant_id": "19fb5380ec8444358ad3a9d26ced6b8f", "server_status": null, "deleted": 0, "deleted_at": null, "datastore_version_id": "4141cc0b-6300-4706-a9f2-3ce757fec859", "configuration_id": null, "type": "standalone", "parent_id": null }
Cluster of 3 Nodes:
{ "rows": [ { "id": "035dc57c-659c-4cab-83e8-29ec99437655", "created": "2014-04-20 19:38:59", "updated": "2014-04-20 19:44:00", "name": "products", "hostname": null, "compute_instance_id": null, "task_id": 1, "task_description": "No tasks for the cluster.", "task_start_time": null, "volume_id": null, "flavor_id": 10, "volume_size": 200, "tenant_id": "19fb5380ec8444358ad3a9d26ced6b8f", "server_status": null, "deleted": 0, "deleted_at": null, "datastore_version_id": "4141cc0b-6300-4706-a9f2-3ce757fec859", "configuration_id": null, "type": "cluster", "parent_id": null }, { "id": "ad16a607-bed4-49c7-957e-7d3f869e364f", "created": "2014-04-20 19:38:59", "updated": "2014-04-20 19:39:13", "name": "products-1", "hostname": null, "compute_instance_id": "7b146333-97b0-48cf-b731-a7e4ad61d58f", "task_id": 1, "task_description": "No tasks for the instance.", "task_start_time": null, "volume_id": "801c31b4-d2cd-413c-be7a-1ebdf89f53bb", "flavor_id": 10, "volume_size": 200, "tenant_id": "19fb5380ec8444358ad3a9d26ced6b8f", "server_status": null, "deleted": 0, "deleted_at": null, "datastore_version_id": "4141cc0b-6300-4706-a9f2-3ce757fec859", "configuration_id": null, "type": "node", "parent_id": "035dc57c-659c-4cab-83e8-29ec99437655" }, { "id": "1f0c10fb-9bb6-45b5-b27d-ff0c32eb3ab3", "created": "2014-04-20 19:39:15", "updated": "2014-04-20 19:40:22", "name": "products-2", "hostname": null, "compute_instance_id": "fa6a6d74-f1bd-4ee8-bd7a-d54cf526bf77", "task_id": 1, "task_description": "No tasks for the instance.", "task_start_time": null, "volume_id": "932225fd-b613-4c62-b3e1-243aae10cc45", "flavor_id": 10, "volume_size": 200, "tenant_id": "19fb5380ec8444358ad3a9d26ced6b8f", "server_status": null, "deleted": 0, "deleted_at": null, "datastore_version_id": "4141cc0b-6300-4706-a9f2-3ce757fec859", "configuration_id": null, "type": "node", "parent_id": "035dc57c-659c-4cab-83e8-29ec99437655" }, { "id": "b38f0356-6796-434c-acc6-953c081476d0", "created": "2014-04-20 19:41:11", "updated": "2014-04-20 19:42:54", "name": "products-3", "hostname": null, "compute_instance_id": "d120ed57-3f43-4200-b222-802f292f9d56", "task_id": 1, "task_description": "No tasks for the instance.", "task_start_time": null, "volume_id": "5d02d7a1-b7e8-410a-a1bd-eb0e460668fa", "flavor_id": 10, "volume_size": 200, "tenant_id": "19fb5380ec8444358ad3a9d26ced6b8f", "server_status": null, "deleted": 0, "deleted_at": null, "datastore_version_id": "4141cc0b-6300-4706-a9f2-3ce757fec859", "configuration_id": null, "type": "node", "parent_id": "035dc57c-659c-4cab-83e8-29ec99437655" } ] }
Permutations of Approach:
- Don't deep copy data into node rows
- Don't include 'standalone' and 'master' as types, because it's inferable (if row's type=null, and there's another row with parent_id our row, we're a master not a standalone).
Master/Slave:
{ "rows": [ { "id": "1f0c10fb-9bb6-45b5-b27d-ff0c32eb3ab3", "created": "2014-04-20 19:39:15", "updated": "2014-04-20 19:40:22", "name": "products", "hostname": null, "compute_instance_id": "fa6a6d74-f1bd-4ee8-bd7a-d54cf526bf77", "task_id": 1, "task_description": "No tasks for the instance.", "task_start_time": null, "volume_id": "932225fd-b613-4c62-b3e1-243aae10cc45", "flavor_id": 10, "volume_size": 200, "tenant_id": "19fb5380ec8444358ad3a9d26ced6b8f", "server_status": null, "deleted": 0, "deleted_at": null, "datastore_version_id": "4141cc0b-6300-4706-a9f2-3ce757fec859", "configuration_id": null, "type": "master", "parent_id": null }, { "id": "b38f0356-6796-434c-acc6-953c081476d0", "created": "2014-04-20 19:41:11", "updated": "2014-04-20 19:42:54", "name": "products-slave", "hostname": null, "compute_instance_id": "d120ed57-3f43-4200-b222-802f292f9d56", "task_id": 1, "task_description": "No tasks for the instance.", "task_start_time": null, "volume_id": "5d02d7a1-b7e8-410a-a1bd-eb0e460668fa", "flavor_id": 12, "volume_size": 500, "tenant_id": "19fb5380ec8444358ad3a9d26ced6b8f", "server_status": null, "deleted": 0, "deleted_at": null, "datastore_version_id": "4141cc0b-6300-4706-a9f2-3ce757fec859", "configuration_id": "9cc334ee-dca6-4002-a13e-b6a26fd4e47f", "type": "slave", "parent_id": "1f0c10fb-9bb6-45b5-b27d-ff0c32eb3ab3" } ] }
Notes:
- GET /instances/id will need to be amended to: where id=<id> and type!=node
- GET /instances will need to be amended to understand the type column.
- GET /instances/id/node/node-id will simply be: where id=<id> and type==node (instances table)
- Pros
- Single query lookups for instance and node
- No additional tables
- Deep copy of data from cluster row to all node rows:
- Makes business intelligence and analytics easy
- Makes it easier to support heterogenous clusters (if it becomes a requirement)
- Cons
- If a column needs to be added to instances that is nonsensical from a cluster-level, but sensical on a node-level, then that column cannot be NOT NULL (due to the cluster row). Re-worded, data integrity becomes more difficult if the field applicability diverges.
Option #2: Nodes/Cluster Relationship Table
Name | Data Type | Length | Nullable | Description |
---|---|---|---|---|
id | varchar | 36 | false | primary key (uuid) |
instance_id | varchar | 36 | false | foreign key to instance.id |
type | varchar | 10 | false | either 'node' or 'slave' |
updated | datetime | x | false | last updated |
Notes:
- Approach: Same as Option #1, except type/parent_id are moved to a separate table
- would remove the need for the cluster row in Option #1, but node rows would still exist in instances table
- would remove the need for any added columns to instances table.
- Cons
- GET /instances would need to start with relationship table (rows where type!=node), then select data from instances table.
- GET /instances/id would start with relationship table (is type!=node), then select data from instances table.
- No single query lookups for instance and node
Option #3: Nodes Table
CREATE TABLE "nodes" ( "id" varchar(36) NOT NULL, "instance_id" varchar(36) NOT NULL, "created" datetime DEFAULT NULL, "updated" datetime DEFAULT NULL, "name" varchar(260) DEFAULT NULL, "hostname" varchar(255) DEFAULT NULL, "compute_instance_id" varchar(36) DEFAULT NULL, "task_id" int(11) DEFAULT NULL, "task_description" varchar(32) DEFAULT NULL, "task_start_time" datetime DEFAULT NULL, "volume_id" varchar(36) DEFAULT NULL, "server_status" varchar(64) DEFAULT NULL, "deleted" tinyint(1) DEFAULT NULL, "deleted_at" datetime DEFAULT NULL, PRIMARY KEY ("id"), CONSTRAINT "nodes_ibfk_1" FOREIGN KEY ("instance_id") REFERENCES "instances" ("id"), ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
aka the same table as instances, except the addition of the instance_id foreign-key and the removal of the following (because they're "inherited"):
"flavor_id" int(11) DEFAULT NULL, "volume_size" int(11) DEFAULT NULL, "tenant_id" varchar(36) DEFAULT NULL, "datastore_version_id" varchar(36) NOT NULL, "configuration_id" varchar(36) DEFAULT NULL,
Notes:
- Pros:
- Can add a column to the nodes table without fear of it might impact instances (aka slave or cluster rows)
- The RESTful resource has a true backing resource/model.
- Cons:
- Adding a column to the instances table will likely mean it's also needed on the nodes table (duplication)
- Requires a lot more coding and refactoring than Option #1
- No single query lookups for instance and node
- Permutations of the Approach (or slight additions/modifications):
- Don't inherit flavor_id, volume_size, etc; instead deep copy the data into the nodes table.
- add a is_cluster column to instances table to avoid silly lookups against the nodes table for GET /instances/id