Jump to: navigation, search

Keystone-schema-in-cassandra

Revision as of 09:51, 14 April 2015 by Ajaya Agrawal (talk | contribs) (Some basics of schema design in Cassandra)

Some basics of schema design in Cassandra

Cassandra is very similar to relational databases when it comes to tables and columns. First a table has to be created along with the columns and types before data can be inserted into it. Cassandra has a query language called CQL(Cassandra query language) similar to SQL. There is a notion of partition key in Cassandra which is similar to primary key in relational database. The table is partitioned across nodes based on the hash value of the partition key in Cassandra. The primary difference between Cassandra and relational db comes in the queries that can be answered. In relational database the WHERE clause can have arbitrary column whereas in Cassandra it must have the partition key. The query could have other conditions along with the partition key in Cassandra. This is because Cassandra needs to know the partition in which the row or data resides before it can answer queries on it.

There is a major difference between relational db and Cassandra when it comes to index. In relational db, index can be created on arbitrary columns which is a B-tree internally in the database and then arbitrary queries(sort, =, <, >) on that column can be answered. But in Cassandra index can not be created on any column. There is an implicit index on the partition key and queries can only be answered on that index which have a equality(=) condition on the partition key. If index on other column is required then a separate table has to be built manually and maintained in the application. That mean when inserting a row, this row should be inserted in two tables. Cassandra would support index out of the box from 3.0.0 release according to the people working on it. Then there would be no need of maintaining a separate table for index.

The Cassandra tables for each of the backend in Keystone is described below.

Identity

The identity backend of Keystone holds data for users, groups and user-group membership. There are three tables in relational db for holding this data.

  • user
  • group
  • user_group_membership

user

The user table looks something like this in MySql.

 id varchar(64) NOT NULL,
 name varchar(255) NOT NULL,
 extra text,
 password varchar(128) DEFAULT NULL,
 enabled tinyint(1) DEFAULT NULL,
 domain_id varchar(64) NOT NULL,
 default_project_id varchar(64) DEFAULT NULL,
 PRIMARY KEY (id),
 UNIQUE KEY `ixu_user_name_domain_id` (`domain_id`,`name`),
 CONSTRAINT `fk_user_domain_id` FOREIGN KEY (`domain_id`) REFERENCES `domain` (`id`)
Operations
  • create_user(user_id, user)
  • delete_user(user_id)
  • update_user(domain_id, user_id, user)
  • get_user(user_id)
  • get_user_by_name(domain_id, name)
  • list_users(domain_id)


From the operations it is evident that the data for user is queried on three columns. i.e. (domain_id, name) and (user_id). Based on this information, the equivalent of this table in Cassandra would consist of two tables. The first one would have (domain_id, name) as primary key and domain_id would be the partition key in this table. The second table would have (user_id) as partition key. All the insert, update and delete for user table goes to these two table.

group

The group table is similar to user except few fields such as password and default_project_id. This table can be designed the same way as user.

user_group_membership

The user_group_membership table looks something like this in MySql.

 user_id varchar(64) NOT NULL,
 group_id varchar(64) NOT NULL,
 PRIMARY KEY (user_id,group_id),
 KEY group_id (group_id),
 CONSTRAINT fk_user_group_membership_user_id FOREIGN KEY (user_id) REFERENCES user (id),
 CONSTRAINT fk_user_group_membership_group_id FOREIGN KEY (group_id) REFERENCES group (id)
Operations
  • add_user_to_group(user_id, group_id)
  • check_user_in_group(user_id, group_id)
  • remove_user_from_group(user_id, group_id)
  • list_users_in_group(group_id)
  • list_groups_for_user(group_id)


There are two types of operations here. One is based on user_id and another is based on group_id. So there will be two tables in Cassandra to store user_group_membership. In the first table user_id would be partition key whereas in the second group_id would be the partition key. All the insert, update and delete go to both the tables in Cassandra.

Assignment

The assignment backend holds about the role assignments. It additionally has a role_backend which stores data for all the roles. There are two tables in relational db for this backend.

  • assignment
  • role

assignment

The table in MySQL

 type enum('UserProject','GroupProject','UserDomain','GroupDomain') NOT NULL,
 actor_id varchar(64) NOT NULL,
 target_id varchar(64) NOT NULL,
 role_id varchar(64) NOT NULL,
 inherited tinyint(1) NOT NULL,
 PRIMARY KEY (type,actor_id,target_id,role_id),
 KEY role_id (role_id),
 KEY ix_actor_id (actor_id)
Operations

There are a lot of operations in this backend. All the operations are not written here for brevity. This table would be modelled with three tables in Cassandra. The first table looks similar to the MySQL table and the partition key would be (type, actor_id). There would be two additional table wherein the partition key would be target_id and role_id. These two tables are needed because when a project or role is deleted, all its role assignments should also be deleted. So index/search by role_id and project_id is needed.