Jump to: navigation, search


Revision as of 09:12, 20 April 2015 by Ajaya Agrawal (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Revised on: 4/20/2015 by Ajaya Agrawal


Presently, majority of OpenStack relies on relational databases for storing state. There are some limitations of such relational databases which might affect the overall scalability and maintainability (operationally) of a deployed cloud. This wiki tries to list down all the properties which NoSQL databases will bring to OpenStack world, along with some of their trade-offs. This wiki will take Cassandra as the reference NoSQL database, but the concepts should apply to most of the key-value store NoSQL databases.


Relational databases are very good at efficiently storing and retrieving relational data. And understandably so, since they are around since a few decades. However, there are some aspects where relational databases fall short:

Horizontal scalability/centralized architecture

One can only scale a relational database vertically, i.e., by buying a bigger server. This work upto a point, but beyond a point, it becomes infeasible. Even with existing database clustering solutions such as Galera, there is no support for partitioning or sharding data such that the total data is distributed on different servers, allowing them to scale.

NoSQL databases are designed with horizontal scalability as one of the primary goals. These databases distribute data of a table to multiple database servers, based on a hashing or sharding scheme. Moreover, they also come with tooling to add or delete nodes at a later point of time, and distribute the load evenly accordingly.


Since for RDBMS (Relational databases) all of the data resides on one node, availability gets affected in case of node failure. Mitigation of this issue is generally done by setting up a 'cluster'. A cluster can be active-active (meaning data is synchronously replicated) or an active-passive (master-slave, where data from master is asynchronously replicated to the slaves.

A synchronously replicated, active-active clustering solution like Galera has it's own problems. Synchronous replication means data has to be committed on all of the nodes in the cluster before the request is returned to the caller. This causes deadlock issues (which are more accurately race conditions). Also, such kind of replication is not viable if nodes are spread geographically, due to the synchronous nature. With master slave configuration, as the slave is generally behind, be it by a fraction of seconds, data-loss is a possibility when master goes down and a slave is promoted to master.

In NoSQL databases like Cassandra, one can specify replication count per table, so that data is available all the time. One can specify consistency level while writing/reading such that even if some nodes are down of such a NoSQL cluster, all the operations with the database will succeed.

Data center awareness

Cassandra has DC awareness (TODO: elaborate)

Problem with OpenStack API

Openstack is backed by relational database from the very beginning. This has led to APIs which assume a relational database in the backend. A very good example of this is "sorting by arbitrary columns in listing resources". Concretely, for 'list all instances', 'list all images', and 'list all volumes' APIs, one can sort the results based on ANY attribute of an instance, image or volume (e.g. creation time, size, flavor). Due to the very nature of distributed databases, sorting based on an arbitrary key is very expensive, and is not directly supported. This feature is expensive in relational databases too, but since all data reside on one node, the database APIs still support it despite it involving a full-table scan.

The logical plausible solution would be, to make sorting based on any arbitrary keys an 'extension' of OpenStack projects, thereby making it an optional feature. This way, people using NoSQL will not have this extension, and as a result, feature available for their customers, while RDBMS-backed deployments can have.

Some basics of schema design in Cassandra

Cassandra is very similar to relational databases when it comes to tables and columns. First a table has to be created along with the columns and types before data can be inserted into it. Cassandra has a query language called CQL(Cassandra query language) similar to SQL. There is a notion of partition key in Cassandra which is similar to primary key in relational database. The table is partitioned across nodes based on the hash value of the partition key in Cassandra. The primary difference between Cassandra and relational db comes in the queries that can be answered. In relational database the WHERE clause can have arbitrary column whereas in Cassandra it must have the partition key. The query could have other conditions along with the partition key in Cassandra. This is because Cassandra needs to know the partition in which the row or data resides before it can answer queries on it. Rows in a single node which have same partition key can be sorted based on something called clustering key. Partition key and clustering key uniquely identify a row in Cassandra and these two together form primary key of a table in Cassandra.

There is a major difference between relational db and Cassandra when it comes to index. In relational db, index can be created on arbitrary columns which is a B-tree internally in the database and then arbitrary queries(sort, =, <, >) on that column can be answered. But in Cassandra index can not be created on any column. There is an implicit index on the partition key and queries can only be answered on that index which have a equality(=) condition on the partition key. If index on other column is required then a separate table has to be built manually and maintained in the application. That mean when inserting a row, this row should be inserted in two tables. Cassandra would support index out of the box from 3.0.0 release according to the people working on it. Then there would be no need of maintaining a separate table for index.