Keystone-schema-in-cassandra

Some basics of schema design in Cassandra

Cassandra resembles relational databases when it comes to tables and columns: a table, along with its columns and their types, must be created before data can be inserted into it. Cassandra has a query language, CQL (Cassandra Query Language), which is similar to SQL. Cassandra also has the notion of a partition key, which is loosely similar to the primary key in a relational database (in Cassandra, the partition key is the first part of a table's primary key). The rows of a table are distributed across nodes based on the hash value of the partition key. The main difference between Cassandra and a relational database lies in which queries can be answered. In a relational database, the WHERE clause can filter on arbitrary columns, whereas in Cassandra it must restrict the partition key (unless a secondary index or ALLOW FILTERING is used). The query may include other conditions in addition to the partition key. This is because Cassandra needs to know which partition, and therefore which node, holds the data before it can answer a query on it.
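To make the partition-key restriction concrete, here is a toy model in Python. It is not Cassandra's actual placement algorithm (Cassandra uses the Murmur3 partitioner and a token ring); it simply hashes the partition key with MD5 modulo the node count. The class `ToyCluster` and its methods are hypothetical. The point is that a lookup by partition key touches exactly one node, while a lookup on any other column has to visit every node.

```python
import hashlib


class ToyCluster:
    """Hypothetical in-memory cluster that places rows by hashing the partition key.

    This is NOT how Cassandra assigns data (it uses Murmur3 tokens on a ring);
    it only illustrates why a query needs the partition key.
    """

    def __init__(self, num_nodes):
        # Each node maps partition_key -> list of rows in that partition.
        self.nodes = [{} for _ in range(num_nodes)]

    def node_for(self, partition_key):
        # Deterministic hash of the key, mapped onto one of the nodes.
        digest = hashlib.md5(str(partition_key).encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def insert(self, partition_key, row):
        node = self.nodes[self.node_for(partition_key)]
        node.setdefault(partition_key, []).append(row)

    def query_by_partition_key(self, partition_key):
        # The key tells us exactly which node to ask, so one node is touched.
        node = self.nodes[self.node_for(partition_key)]
        return list(node.get(partition_key, [])), 1

    def query_by_column(self, column, value):
        # Without the partition key, every node must be scanned.
        matches = []
        for node in self.nodes:
            for rows in node.values():
                matches.extend(row for row in rows if row.get(column) == value)
        return matches, len(self.nodes)


# Usage: user rows partitioned by id.
cluster = ToyCluster(num_nodes=4)
cluster.insert("u1", {"id": "u1", "name": "alice"})
cluster.insert("u2", {"id": "u2", "name": "bob"})
rows, nodes_touched = cluster.query_by_partition_key("u1")   # nodes_touched == 1
rows, nodes_touched = cluster.query_by_column("name", "bob")  # nodes_touched == 4
```

In a real cluster the full scan is what ALLOW FILTERING permits, which is why schemas are designed so that every expected query can supply the partition key.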

The Cassandra tables for each Keystone backend are described below.

Identity

The identity backend of Keystone holds data for users, groups and user-group memberships. In a relational database, this data is held in three tables:

user
group
user_group_membership

user

In MySQL, the user table looks something like this:

CREATE TABLE `user` (
  `id` varchar(64) NOT NULL,
  `name` varchar(255) NOT NULL,
  `extra` text,
  `password` varchar(128) DEFAULT NULL,
  `enabled` tinyint(1) DEFAULT NULL,
  `domain_id` varchar(64) NOT NULL,
  `default_project_id` varchar(64) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ixu_user_name_domain_id` (`domain_id`,`name`),
  CONSTRAINT `fk_user_domain_id` FOREIGN KEY (`domain_id`) REFERENCES `domain` (`id`)
);

Operations

create_user(user_id, user)
delete_user(user_id)
update_user(domain_id, user_id, user)
get_user(user_id)
get_user_by_name(domain_id, name)
list_users(domain_id)

From the operations, it is evident that user data is queried on three columns: by (domain_id, name) and by user_id alone. Based on this, the Cassandra equivalent of this table would consist of two tables. The first would have (domain_id, name) as its primary key, with domain_id as the partition key and name as a clustering column; it serves get_user_by_name and list_users. The second would have user_id as its partition key; it serves get_user. Every insert, update and delete on users must be applied to both tables.
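The two-table design can be sketched in Python with plain dictionaries standing in for the Cassandra tables. This is a minimal illustration under stated assumptions, not Keystone code: the class `UserTables` and the table names `user_by_domain` and `user_by_id` are hypothetical, a user is assumed to be a dict with at least `name` and `domain_id`, both tables are assumed to store the full user row, and the `domain_id` argument of `update_user` is assumed to name the user's current domain. It shows which table each operation reads and how every write touches both.

```python
class UserTables:
    """Hypothetical in-memory stand-in for the two Cassandra user tables.

    user_by_domain: partition key domain_id, clustering column name
                    (answers get_user_by_name and list_users)
    user_by_id:     partition key user_id
                    (answers get_user and locates rows for update/delete)
    Every write is applied to both tables so they stay in sync.
    """

    def __init__(self):
        self.user_by_domain = {}  # domain_id -> {name: user}
        self.user_by_id = {}      # user_id -> user

    def create_user(self, user_id, user):
        user = {**user, "id": user_id}
        partition = self.user_by_domain.get(user["domain_id"], {})
        if user_id in self.user_by_id or user["name"] in partition:
            raise ValueError("user id or (domain_id, name) already exists")
        # Write the same row into both tables.
        self.user_by_domain.setdefault(user["domain_id"], {})[user["name"]] = user
        self.user_by_id[user_id] = user
        return user

    def get_user(self, user_id):
        return self.user_by_id[user_id]

    def get_user_by_name(self, domain_id, name):
        return self.user_by_domain[domain_id][name]

    def list_users(self, domain_id):
        # A single-partition read; sorting by name mimics clustering order.
        partition = self.user_by_domain.get(domain_id, {})
        return [partition[name] for name in sorted(partition)]

    def update_user(self, domain_id, user_id, user):
        old = self.user_by_id[user_id]
        if old["domain_id"] != domain_id:
            raise KeyError(user_id)  # assumption: domain_id is the user's current domain
        new = {**old, **user, "id": user_id}
        old_key = (old["domain_id"], old["name"])
        new_key = (new["domain_id"], new["name"])
        if new_key != old_key and new["name"] in self.user_by_domain.get(new["domain_id"], {}):
            raise ValueError("(domain_id, name) already exists")
        # (domain_id, name) is the first table's primary key, so a rename is
        # a delete of the old row plus an insert of the new one.
        del self.user_by_domain[old["domain_id"]][old["name"]]
        self.user_by_domain.setdefault(new["domain_id"], {})[new["name"]] = new
        self.user_by_id[user_id] = new
        return new

    def delete_user(self, user_id):
        # Only user_id is known, so read user_by_id first to find the
        # (domain_id, name) key needed to delete from user_by_domain.
        user = self.user_by_id.pop(user_id)
        del self.user_by_domain[user["domain_id"]][user["name"]]
```

Storing the full row in both tables trades extra storage for single-partition reads. In a real deployment the paired writes would be issued as CQL statements, commonly grouped in a logged BATCH so that the two tables do not drift apart if a write fails partway.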