Difference between revisions of "Keystone-schema-in-cassandra"

Revision as of 09:40, 14 April 2015

Some basics of schema design in Cassandra

Cassandra is very similar to relational databases when it comes to tables and columns. First a table has to be created along with the columns and types before data can be inserted into it. Cassandra has a query language called CQL(Cassandra query language) similar to SQL. There is a notion of partition key in Cassandra which is similar to primary key in relational database. The table is partitioned across nodes based on the hash value of the partition key in Cassandra. The primary difference between Cassandra and relational db comes in the queries that can be answered. In relational database the WHERE clause can have arbitrary column whereas in Cassandra it must have the partition key. The query could have other conditions along with the partition key in Cassandra. This is because Cassandra needs to know the partition in which the row or data resides before it can answer queries on it.

There is a major difference between relational db and Cassandra when it comes to index. In relational db, index can be created on arbitrary columns which is a B-tree internally in the database and then arbitrary queries(sort, =, <, >) on that column can be answered. But in Cassandra index can not be created on any column. There is an implicit index on the partition key and queries can only be answered on that index which have a equality(=) condition on the partition key. If index on other column is required then a separate table has to be built manually and maintained in the application. That mean when inserting a row, this row should be inserted in two tables. Cassandra would support index out of the box from 3.0.0 release according to the people working on it. Then there would be no need of maintaining a separate table for index.

The Cassandra tables for each of the backend in Keystone is described below.

Identity

The identity backend of Keystone holds data for users, groups and user-group membership. There are three tables in relational db for holding this data.

user
group
user_group_membership

user

The user table looks something like this in MySql.

 id varchar(64) NOT NULL,
 name varchar(255) NOT NULL,
 extra text,
 password varchar(128) DEFAULT NULL,
 enabled tinyint(1) DEFAULT NULL,
 domain_id varchar(64) NOT NULL,
 default_project_id varchar(64) DEFAULT NULL,
 PRIMARY KEY (id),
 UNIQUE KEY `ixu_user_name_domain_id` (`domain_id`,`name`),
 CONSTRAINT `fk_user_domain_id` FOREIGN KEY (`domain_id`) REFERENCES `domain` (`id`)

Operations

create_user(user_id, user)

delete_user(user_id)

update_user(domain_id, user_id, user)

get_user(user_id)

get_user_by_name(domain_id, name)

list_users(domain_id)

From the operations it is evident that the data for user is queried on three columns. i.e. (domain_id, name) and (user_id). Based on this information, the equivalent of this table in Cassandra would consist of two tables. The first one would have (domain_id, name) as primary key and domain_id would be the partition key in this table. The second table would have (user_id) as partition key. All the insert, update and delete for user table goes to these two table.

group

The group table is similar to user except few fields such as password and default_project_id. This table can be designed the same way as user.

user_group_membership

The user_group_membership table looks something like this in MySql.

 user_id varchar(64) NOT NULL,
 group_id varchar(64) NOT NULL,
 PRIMARY KEY (user_id,group_id),
 KEY group_id (group_id),
 CONSTRAINT fk_user_group_membership_user_id FOREIGN KEY (user_id) REFERENCES user (id),
 CONSTRAINT fk_user_group_membership_group_id FOREIGN KEY (group_id) REFERENCES group (id)

Operationss

add_user_to_group(user_id, group_id)

check_user_in_group(user_id, group_id)

remove_user_from_group(user_id, group_id)

list_users_in_group(group_id)

list_groups_for_user(group_id)

There are two types of operations here. One is based on user_id and another is based on group_id. So there will be two tables in Cassandra to store user_group_membership. In the first table user_id would be partition key whereas in the second group_id would be the partition key. All the insert, update and delete go to both the tables in Cassandra.

@@ Line 42: / Line 42: @@
 * list_users(domain_id)
 From the operations it is evident that the data for user is queried on three columns. i.e. (domain_id, name) and (user_id). Based on this information, the equivalent of this table in Cassandra would consist of two tables. The first one would have (domain_id, name) as primary key and domain_id would be the partition key in this table. The second table would have (user_id) as partition key. All the insert, update and delete for user table goes to these two table.
@@ Line 68: / Line 69: @@
 * list_groups_for_user(group_id)
 There are two types of operations here. One is based on user_id and another is based on group_id. So there will be two tables in Cassandra to store user_group_membership. In the first table user_id would be partition key whereas in the second group_id would be the partition key. All the insert, update and delete go to both the tables in Cassandra.