
OpenStack and SQLAlchemy

This page describes the ongoing efforts of Mike Bayer, creator/maintainer of SQLAlchemy, Alembic Migrations, and Dogpile caching, to spearhead what is essentially an "OpenStack SQLAlchemy 2.0" effort - as I am now full-time employed by an OpenStack vendor (Red Hat), I've been hired to produce a strong and sustained pass through OpenStack's relational database integration, improving areas that have been problematic and establishing a very strong set of practices that are specific to OpenStack's needs.

Within this effort, I'd like to stress that from my POV, everything is on the table, including new improvements / fixes / features to SQLAlchemy and Alembic directly, migration of projects for which relational databases aren't suitable over to non-relational systems, improvement of relational schemas on the OpenStack side, and of course improvement of existing relational patterns within OpenStack applications, for which oslo.db will serve as the foundation.

As of June 30, 2014, I've been on the job for about a month, so below is an outline of most of what I've come across and my thoughts going forward.

My goal here is to share with the community what I've seen and done so far, so that interested parties can chime in with further advice and information, and most importantly to help steer me towards the initial blueprints that I should be starting on, that is, which features are most likely to be accepted by the community as well as which are the top priority. Not to mention, if you're looking for something to do, you can help me implement some of these things!


OpenStack presence within SQLAlchemy's CI environment (a.k.a. "The Version that shall not be Named")

On Monday, June 23, I got up in the AM with two goals: get a release of SQLAlchemy out the door, as it had been several months since the last point release, and then do some work learning how Nova's unit tests run and function.

Release 0.9.5 of SQLAlchemy was not particularly different from any of the other hundred or so point releases I've put out over nine years; it was a little more overdue than some, and out it went. Later that day, as I was learning to run Nova's test suite, I was hitting some failures within the FixedIp/FloatingIp tests for which I couldn't find the cause, assuming that I just wasn't setting up my environment correctly, as Nova's tests pass on all the Jenkins servers the vast majority of the time.

As we learned within about 30 minutes of my hitting these problems, it turns out 0.9.5 had a bad fix in it; one that didn't fail any of SQLAlchemy's 5000 tests but hit a use case within Nova that was not very unusual; not the kind of use I usually do myself but nonetheless something that was pretty common. IRC began jumping with reports that builds were failing everywhere, and it was due to the 0.9.5 release! The bizarre coincidence of this was that none of the thirty-odd point releases of SQLAlchemy put out during OpenStack's lifetime had ever really had this effect; only the major jumps saw any issues.

Luckily, as I had just learned how to run Nova's tests 30 minutes earlier, I was able to drill down to exactly what was going on, and SQLAlchemy 0.9.6 was out within an hour; but we shall now forever see, in requirements.txt, the Version that shall Not Be Named (except in requirements.txt):

SQLAlchemy>=0.7.8,!=0.9.5,<=0.9.99

In response to this I have now added OpenStack builds to our Jenkins CI system, so far running tests for oslo.db and the SQLAlchemy tests within Nova; I spent a couple of days figuring out how to get everything to build and run against specific versions of SQLAlchemy and Alembic, even versions above the 0.9.99 cap in all the requirements files, as we are set for 1.0 hopefully within the next year. I will be adding in as many other OpenStack projects as I can here so that we will no longer have a SQLAlchemy release that hasn't been run against a battery of basic OpenStack tests.

ORM Woes

After chatting with a decent amount of folks for a month, I can break up the take on OpenStack / SQLAlchemy ORM into three camps:

1. We love SQLAlchemy ORM, we know that we can improve how we use it and we're looking forward to getting there (oslo.db, Keystone)

2. Our app uses a relational database but the SQLAlchemy ORM slows us down; we already have our own "orm-like" system (e.g. APIs that shuttle rows into "values" dictionaries, nova.objects, etc.), so we want to ultimately get rid of the ORM part, and probably continue using SQLAlchemy Core to some degree (some Nova developers I've spoken to).

3. Our application has a SQLAlchemy backend that almost nobody uses in production; most users use MongoDB, and the SQLAlchemy backend is only there because MongoDB is AGPL (Ceilometer, Marconi). We'd like to get a better NoSQL backend without Mongo's licensing issue and do away with relational storage entirely (Marconi).

Here's my observations / thoughts about these three camps:

1. Great! Read on, I have some things we can start with.

2. I agree that a lot of the patterns that I've seen in Nova in particular are not well-suited to the approach used by SQLAlchemy's ORM; however, I have observed that the applications still rely a lot on very sophisticated and mature features provided by SQLAlchemy (namely relationship() and its integration with eager loading and the unit of work features), and throwing it away entirely will incur a lot of reinvention, not just of the features that were lost but also of a totally new crop of stability and performance issues to solve.

To that degree, I'd like to propose a twist to the "we want to write our own ORM" idea (edit: OK, nova.objects is really an ORM-like thing on top of their API on top of a relational DB, but that's an issue for another day) and say, "let's write our own ORM that's based on the SQLAlchemy ORM primitives", e.g. a hybrid approach where existing ORM logic that is fine can stay as it is, while ORM logic that has performance or behavioral issues can be reorganized using systems - some of them new, some of them just underutilized - that bypass unneeded SQLAlchemy features on a case-by-case basis.

As a secondary issue, some devs I've spoken to have referred to the nature of OpenStack API-based applications as being part of the problem. This refers to the fact that OpenStack apps like Nova and Neutron expose fine-grained APIs which end up breaking composed use cases out into dozens or hundreds of small API operations, each of which runs within its own database transaction and ORM session state. This makes it impossible, without introducing some elaborate offline state management system, for a larger operation to be performed so that larger groups of objects can be loaded and manipulated at once, greatly decreasing the number of database round trips present. No matter what ORM you use or don't use, the pattern of many small round trips is difficult to scale on relational databases.

There are projects under way which aim to apply detailed profiling to OpenStack applications so that a breakdown of performance issues due to API use and/or database access might be plainly reasoned about; see Rally and OS Profiler for examples.

On my end, I have no intention of focusing on this second issue for the time being; I'm going to stick with leaving OpenStack applications as much as they are right now, applying iterative changes to database access code which I hope can optimize things to a significant degree within the current usage patterns.

3. Projects with alternate MongoDB / MySQL backends (since after all, MySQL is the database here being rejected; SQLAlchemy is just a Python API) should probably do what they're doing and go to an all key/value model. It is not at all fun to maintain the same datamodel and API against both a KV store as well as relational tables. I had a detailed look at Ceilometer with its team members early on, and while I'm pretty confident that if it were desirable, I could get its relational backend to compete with the Mongo backend performance-wise, there's no reason to get into this if the Mongo licensing issue can be solved alone.

ORM Quick Wins Proof of Concept

I've built up a proof of concept at nova_poc which seeks to demonstrate the range of database and ORM performance that can be achieved within one particular method of the Nova API. Naturally, I picked an example that is very juicy in this regard; it has a very easily fixable issue with the kind of query it emits that has an enormous 10x impact on its performance (which I feel is worth illustrating just for how dramatic it is), and it then illustrates some new SQLAlchemy extensions that may well end up in SQLAlchemy directly, which allow the remaining operation to complete in less than half the time. These extensions can be ported to oslo.db to work against all supported SQLAlchemy versions prior to the official release of these features.

To summarize, we're looking at the API method floating_ip_update(), and I run a series of profiles against a database table of 10K rows (e.g. FloatingIp objects), where each FloatingIp refers to a single FixedIp that then refers to an Instance. The profiling focuses on the nature of this method as an API feature all its own, and simulates the case where the API method is called thousands of times, not unlike other API methods that seek to add some small amount of data each time they are called.

Eager load and Column load tuning

Turning off an eager load in the _floating_ip_get_by_address() function is the most dramatic win; while removing this eager load seems like it should only shave off some modest unneeded overhead, profiling shows the impact to be enormous:

Running scenario default
Scenario default, total calls for 1000 operations: 24543047
Scenario default, total time for 10000 operations: 222

Running scenario default_optimized
Scenario default_optimized, total calls for 1000 operations: 1881626
Scenario default_optimized, total time for 10000 operations: 25
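
For illustration, the difference is essentially that of loading FloatingIp rows alone versus joining out to FixedIp (and from there the Instance) on every call; a rough sketch only, where the relationship name 'fixed_ip' is an approximation of Nova's models rather than code from the POC:

from sqlalchemy.orm import joinedload, noload

# With the eager load in place, every lookup of a FloatingIp also JOINs
# out to its FixedIp, constructing those related objects even though the
# caller never touches them.
floating_ip = session.query(models.FloatingIp).\
    options(joinedload('fixed_ip')).\
    filter_by(address=address).\
    first()

# Dropping (or explicitly disabling) the eager load returns just the
# FloatingIp row, which is all this particular API method needs.
floating_ip = session.query(models.FloatingIp).\
    options(noload('fixed_ip')).\
    filter_by(address=address).\
    first()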

While not illustrated in the POC code itself, talk with various OpenStack devs has revealed that we could probably do a lot more individual column fetching as well, that is instead of:

session.query(FixedIp).all()

We say:

session.query(FixedIp.address, FixedIp.host, FixedIp.reserved).all()

By loading individual columns, we get read-only objects that are not identity managed, bypassing a huge amount of work that is done on instances in instantiating them, populating relationships, and doing bookkeeping. Any API feature that only returns a simple read-only object or a "values" dictionary that does not rely upon relationship loading should be using this pattern, or similar, by using a Column Bundle. I will be exploring adding the following feature to oslo.db:

session.query(FixedIp.bundle).all()

The above will return a fast-loading, read-only Bundle object that otherwise expresses the full interface of FixedIp. Relationships might be tricky here but I can at least get them on board as regular lazy loads (just like any other ORM would do anyway).
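
For reference, the Column Bundle that already exists in SQLAlchemy 0.9 looks roughly like this; the oslo.db feature above would essentially pre-build such a bundle per model (a sketch only, with assumed attribute names):

from sqlalchemy.orm import Bundle

# Group a set of columns into a single lightweight, read-only construct;
# each result row exposes them as a named tuple rather than a full
# identity-managed FixedIp instance.
fixed_ip_bundle = Bundle(
    'fixed_ip', FixedIp.address, FixedIp.host, FixedIp.reserved)

for row in session.query(fixed_ip_bundle).all():
    print(row.fixed_ip.address, row.fixed_ip.host, row.fixed_ip.reserved)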

Fast Object Save

We explore replacing the unit of work flush() call used by object.save() within oslo.db-style applications with a more direct "flush single object" call; this grants us only a 12% savings so far, though it is very simple to use:

Running scenario default_optimized
Scenario default_optimized, total calls for 1000 operations: 1881626
Scenario default_optimized, total time for 10000 operations: 25

Running scenario fast_save
Scenario fast_save, total calls for 1000 operations: 1685221
Scenario fast_save, total time for 10000 operations: 22
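
To make "flush single object" concrete, here's a rough illustration of the idea - emitting one Core INSERT for a single mapped object instead of running the full unit of work - with the caveat that this is only a sketch of the technique, not the actual API being proposed:

from sqlalchemy import inspect

def fast_insert(session, obj):
    # Hypothetical sketch: build a single INSERT from a mapped object's
    # column attributes, skipping the unit-of-work bookkeeping entirely.
    mapper = inspect(obj).mapper
    values = {}
    for prop in mapper.column_attrs:
        value = getattr(obj, prop.key)
        if value is not None:
            values[prop.key] = value
    session.execute(mapper.local_table.insert().values(**values))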


Update: see https://bitbucket.org/zzzeek/sqlalchemy/issue/3100/sessionflush_object for an updated version of this feature that includes a "bulk save" operation. Inserting 10000 records runs in half the time/function calls with this simple proof of concept.

Baked Queries

Something that has been in the works for a long time and has seen lots of work in recent months is the "baked query" feature; this pattern is ideal for OpenStack's "many short queries" style of access, and allows the generation of SQL to be cached. Recent versions of this pattern have gotten very slick, and can cache virtually everything that happens Python-wise: the construction of the Query object, the calling of all the methods on the query, the query object's construction of a Core SQL statement, and the compilation of that statement as a string - all of these steps are removed from the call graph after the first such call. In SQLAlchemy 1.0 I've also thrown in the construction of column metadata from the result set. The pattern involves a bit more verbosity than constructing a query normally; here I've built off of some of the ideas of the Pony ORM to use Python function information as the source of a cache key. A query such as:

result = model_query(
            context, models.FloatingIp, session=session).\
            filter_by(address=address)

would be expressed in "baked" form as:

# note model_query is using the "baked" process internally as well
result = model_query(context, models.FloatingIp, session=session)

result.bake(lambda query:
    query.filter_by(
        address=bindparam('address'))).params(address=address)

In the above form, everything within each lambda is invoked only once, the result of which becomes part of a cached value.

For this slight increase in verbosity, we get an improvement like this:

Running scenario default_optimized
Scenario default_optimized, total calls for 1000 operations: 1881626
Scenario default_optimized, total time for 10000 operations: 25

Running scenario baked
Scenario baked, total calls for 1000 operations: 1052935
Scenario baked, total time for 10000 operations: 16

That is, around a 40% improvement.

Putting together both "fast save" plus "baked" we get down to a full 50% improvement vs. the plain optimized version:

Running scenario fast_save_plus_baked
Scenario fast_save_plus_baked, total calls for 1000 operations: 856035
Scenario fast_save_plus_baked, total time for 10000 operations: 13

13 seconds for 10K operations vs. 222 seconds in the current code - I hope I've established that there is hope!


Migrations

Migrations are a huge deal. Here's where it looks like this is going.

Alembic - "Alembic is the de facto standard going forward" - ultimately we want to get everything that is currently on SQLAlchemy-Migrate onto Alembic. And in talking with Nova devs, they really like that they can test their migrations against SQLite. So from my POV, SQLite migrations on Alembic: we're doing that! Unless folks think otherwise.

SQLite Support

SQLite has extremely limited support for ALTER TABLE, essentially only being able to add new columns, and that's about it. In order to provide the full suite of ALTER operations that all other databases provide, tools such as SQLAlchemy-Migrate will create a copy of the target table with the changes applied, copy the data from the old table to the new one, then drop the old table and rename the new one. I've always been peeved at SQLite's insistence on never providing a better built-in pathway to ALTER. But adding SQLite migrations in a style similar to that of Migrate, emulating this same approach in some way, has been on the Alembic roadmap for a long time. Alembic will have this!

However, in talking with some folks, it appears that some people might actually like this "make a new table and switch it" approach for other databases too, as a way to work around locking of tables. I don't know that I agree with this long term; the "ALTER locks the database!" issue in my experience has mostly been a MySQL problem, though it can be present for other backends as well.

So the API I'm looking towards here is going to try to solve a few problems: one is that copying the whole table to a new one for *every single column change* is crazy; another is that the technique we use on SQLite might be nice for other databases too. So here's the API:

with op.batch_alter("mytable", recreate=True) as batch_op:
    batch_op.drop_column("some_column")
    batch_op.alter_column("q", type=Integer)
    batch_op.add_column(Column('y', String(50), nullable=False))

That is, we group a whole series of column operations under the umbrella of one table, and they are applied at once.

The batch_alter() call can even be conditional, dropping into traditional migration behavior for all backends except SQLite and whatever else:

with op.batch_alter("mytable", recreate_on=('sqlite',)) as batch_op:
    batch_op.drop_column("some_column")
    batch_op.alter_column("q", type=Integer)
    batch_op.add_column(Column('y', String(50), nullable=False))


SQLite-compatible alterations are tracked as issue #21.

Finishing up Autogenerate

I started writing Alembic long before SQLAlchemy-Migrate had autogenerate. But by the time I got Alembic released and later autogenerate working, Migrate already had it. Here's what we need to finish up in autogenerate:

1. ForeignKeyConstraint change detection - this is an entirely straightforward feature add and has long been on the todo list.

2. MySQL implicit indexes - this is actually done; recent versions of Alembic can navigate around MySQL's goofy habit of automatically producing indexes on foreign key columns, and not accidentally spit them out in autogenerate runs. It was seriously tough to get index autogeneration mostly working on all backends, so new issues will continue to be fixed as they are reported.

3. Type comparison - Alembic skips type comparison by default, because ultimately SQLAlchemy should add comparison features to its type objects natively. However, for Alembic backends like that of MySQL, we can add explicit rules for the very common and annoying cases, namely the detection of MySQL's TINYINT as a SQLAlchemy Boolean. Alembic already allows user-specified rules in this regard (a sketch of such a rule appears after this list), so they can be part of oslo.db in any case; however, Alembic can also add this rule and others directly.

4. Table order - Alembic's autogenerate should be spitting out table creates and drops in the order of foreign key dependency, though we don't have good test coverage for this yet and it might not be working. We'll fix that, no biggie! Alternatively, Alembic could include a mode of operation that includes all the ForeignKeyConstraint objects after all the tables. Input on how we'd like to see this work would be welcome.
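
As an example of the user-specified rules mentioned in point 3 above, here's a sketch of a compare_type callable that could be wired into env.py's context.configure() call, assuming the callable form of this hook is available in the Alembic release in use; the TINYINT/Boolean rule is illustrative:

import sqlalchemy as sa
from sqlalchemy.dialects import mysql


def compare_type(context, inspected_column, metadata_column,
                 inspected_type, metadata_type):
    # MySQL reflects Boolean columns as TINYINT; report them as "not
    # different" so autogenerate doesn't emit a spurious ALTER COLUMN.
    if (isinstance(inspected_type, mysql.TINYINT)
            and isinstance(metadata_type, sa.Boolean)):
        return False
    # None means "fall back to Alembic's default comparison"
    return None

# in env.py:
#   context.configure(connection=connection,
#                     target_metadata=target_metadata,
#                     compare_type=compare_type)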

New Migration Features

There's a lot of features I'd like to add to Alembic, and if OpenStack has a need for them, that would justify the effort:

1. Full branch support - issue 167, "multiple heads / branch resolution support", has a detailed architecture laid out for this feature. Lots of people are looking for it. This would turn Alembic's current "linked list" version model into a full directed acyclic graph (DAG). Any particular migration can be dependent on any other group of migrations, or none at all; individual branches can be maintained and upgraded along their own paths, or automatically merged.

2. Multiple version directories - This would allow migration files to be present in more than one place for a single migrations environment. Currently, you can get this approach by using multiple base directories, but that requires separate env.py files for each. With this issue, cross-dependent migration files can live in multiple places, working nicely with the multiple heads support of #167.

Testing

I spent quite a bit of time looking into testing. The main thing we can see in testing is that there may be a desire for "transactional testing": run tests within a fixed schema; that is, the schema is created once, many tests are run with each one rolling back its work, and the schema is only dropped once all tests for that schema are complete.

Transactional testing is described in the SQLAlchemy docs and in some of my talks; for OpenStack, we need the version that supports graceful handling of tests that themselves perform a rollback. This is doable, and the recipe for it is currently a GitHub gist.
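
The core of that recipe looks roughly like the following - a condensed sketch of the "external transaction plus SAVEPOINT" pattern from the SQLAlchemy docs; an actual oslo.db test base class would differ in its fixture details:

from sqlalchemy import create_engine, event
from sqlalchemy.orm import sessionmaker

engine = create_engine("mysql://user:pass@localhost/test")  # any backend


class TransactionalTestCase(object):
    def setUp(self):
        # run the whole test inside one external transaction
        self.connection = engine.connect()
        self.transaction = self.connection.begin()
        self.session = sessionmaker()(bind=self.connection)

        # begin a SAVEPOINT, and re-open it whenever the test itself calls
        # rollback(), so the enclosing transaction survives
        self.session.begin_nested()

        @event.listens_for(self.session, "after_transaction_end")
        def restart_savepoint(session, transaction):
            if transaction.nested and not transaction._parent.nested:
                session.begin_nested()

    def tearDown(self):
        self.session.close()
        # roll back everything the test did; the schema itself stays
        self.transaction.rollback()
        self.connection.close()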

In order to accommodate parallelized runs, the system will create N temporary schemas for N concurrent runs.

I really want to get into this feature, but as I talk to people I'm getting mixed signals on how important it is. So far I've only worked with unit tests, which seem to run entirely against SQLite in-memory databases, and I haven't gotten my head around to what extent databases other than SQLite might be desirable in unit tests, nor have I fully understood how integration tests run, so input is welcome here.

Improvement of ORM Patterns

Looking around oslo.db and other apps, I want to improve upon some kinds of verbosity I see, which makes the code more repetitive and inconsistent, welds together application-level and business-level concepts (e.g. transaction control and exception reporting melded with data manipulation), and makes it harder to reason about how the application interacts with sessions, as there are a variety of ad-hoc patterns present.

Connectivity Unification

If there's one thing that bugs me, it's when a large system has five different ways of connecting to a database. oslo.db has a single point at which create_engine() is called, and that's great, but it isn't used everywhere. In particular, we need to get projects that use Alembic, like Neutron, to use a consistent connectivity approach in their env.py files, and additionally test suites need to be able to run against any database or databases you want to point them at. There's a lot of hardcoding and multiple code paths going on, and I want to get in there and propose a system that does the whole thing just one way, in a very flexible and environmentally-open way.

Transaction Demarcation

For the oft-repeated "get_session(); with session.begin(subtransactions=True)" phrase and other variants, I'd like to move that into something more abstracted:

def my_api_thing(x):
    with transactional() as session:
        # do stuff with the session
        ...

I want context managers and/or decorators that are oslo.db specific to replace any kind of containing code that deals with transactions or sessions. That way the containment logic can be maintained in one place and configured separately from the business logic contained within. Projects like Keystone are already using a form more like this; it would be nice to nail down a consistent pattern and get it into oslo.db for everyone to use.
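
A minimal sketch of what such a helper could look like; the names here (transactional(), the module-level sessionmaker plumbing) are placeholders, not the eventual oslo.db API:

import contextlib

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# placeholder plumbing; in oslo.db this would come from the existing
# engine/session facade rather than a module-level sessionmaker
_engine = create_engine("sqlite://")
_maker = sessionmaker(bind=_engine)


@contextlib.contextmanager
def transactional():
    """Hand out a session whose transaction is committed on success and
    rolled back on error, keeping demarcation out of the business logic."""
    session = _maker()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()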

MySQLdb + eventlet = sad

This is the basic thing that happens with these two actors. We're in an eventlet "thread". The code uses MySQLdb to do something ordinary that locks a row in a table - this can happen with a SELECT FOR UPDATE, but it really can happen on just about any old UPDATE too, depending on isolation level (see https://review.openstack.org/#/c/104111/). If the eventlet "thread" then does *anything* that causes it to defer out, another eventlet thread might run, and if that thread tries to work with MySQLdb and lock the same row, you're done, it's a deadlock. The simple reason is that MySQLdb doesn't integrate with eventlet, so on a simple thing like blocking on a transactional lock, it doesn't defer. MySQL times out on these, so we're getting exceptions instead of permanent freezes, but IMHO this is a really silly problem.

Eventlet and MySQLdb do not work together; we're done. MySQLdb 1000% either has to go, or be fixed, even if that means forking it. A good blog post that illustrates both scenarios is here - in this post, the author has used "greenify" to patch MySQLdb's blocking, but without this change (which I'm told isn't compatible with eventlet?), MySQLdb is a nonstarter. Because it is in pure C, MySQLdb is faster than MySQL-connector when it's been patched, but if it hasn't, it's dog slow. MySQL-connector IMHO is probably the driver we should get behind, because it's the official MySQL driver.

The various workarounds and hacks I've seen proposed (which I guess is why this is under "ORM patterns") are all technically viable but IMHO totally unnecessary. OpenStack should be serious about this; it's a totally solvable problem, let's just solve it.

Exception Rewriting

I notice that the oslo.db pattern includes re-throwing DBAPI-level exceptions as OpenStack-specific ones; I'd like to move this handling into SQLAlchemy events. The dbapi_error event goes all the way back to 0.7, and ticket 3076 will allow it to be used to re-raise with a new exception. For older SQLAlchemy versions, I can create a wrapper that will allow the event to be used in this way safely, despite the warnings stated in that documentation.
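
A rough sketch of the direction, assuming the re-raise-capable hook from ticket 3076 is available (shown here under the handle_error name; DBError is a stand-in for whatever oslo.db exception applies):

from sqlalchemy import create_engine, event


class DBError(Exception):
    """Stand-in for an oslo.db / OpenStack-specific exception."""


engine = create_engine("sqlite://")


@event.listens_for(engine, "handle_error")
def _translate_exception(context):
    # context.original_exception is the DBAPI-level error; raising here
    # replaces the exception that SQLAlchemy would otherwise propagate.
    raise DBError(str(context.original_exception))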

Deleted Object Selecting

There's a need to be able to run queries that automatically filter out certain rows, particularly rows that have a "deleted" flag set, or that otherwise match various identifiers given for a context. Right now oslo.db has a function `model_query()` for this purpose. I'd like to streamline `model_query()` and allow for a more flexible pattern, where instead of plainly adding WHERE clauses to a SELECT against a primary table, we can use rules to produce a SQL transformation that applies the desired criteria to all "FROM" clauses at the point at which they are named, even within a more complex, composed query. This will allow `model_query()` to be more flexible and ultimately allow its behavior to be more transparently integrated into the app.
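
For context, a stripped-down sketch of the existing pattern - plain WHERE criteria applied against the primary table; the real `model_query()` in oslo.db/Nova carries more options, such as read_deleted modes and project scoping:

def model_query(context, model, session, read_deleted="no"):
    # simplified: soft-delete and context filters are bolted onto a query
    # against a single primary table
    query = session.query(model)
    if read_deleted == "no":
        query = query.filter_by(deleted=0)
    elif read_deleted == "only":
        query = query.filter(model.deleted != 0)
    if not context.is_admin:
        query = query.filter_by(project_id=context.project_id)
    return query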

Pessimistic Locking - SELECT FOR UPDATE

This issue has been coming up for some projects that make heavy use of SQLAlchemy's with_lockmode() method, which ultimately emits SELECT .. FOR UPDATE so that selected rows are aggressively locked within a transaction. The issue is that backends like Galera don't honor FOR UPDATE locks across the cluster, meaning the locking that is desired is not actually provided.

While I'm not sure that there's a generalized solution to this, in some cases I've been helping developers move to more of an "optimistic" approach, where a particular UPDATE statement targets some kind of timestamp or value counter in the row, then counts the number of rows matched. If the number of rows matched is not what's expected, it is assumed that the UPDATE failed to match the row, and the operation can either try again or can bail out. This approach might not be suitable in all cases. In cases where a large stream of data is affecting a relatively small set of rows, it can perhaps be feasible to serialize those updates by sending them all to a queue, so that they are invoked in isolation without the need for up-front locking. For very high throughput situations, queues can be constructed to use multiple, concurrent UPDATE streams separated via hashing, or can batch together updates for particular rows over a period of time and emit fewer UPDATE statements than if each source of data did it individually.
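
A minimal sketch of that optimistic pattern, using a hypothetical update_count column as the compare-and-swap guard (the model and exception names here are placeholders):

def optimistic_update(session, instance_uuid, expected_count, values):
    # UPDATE ... WHERE uuid = :uuid AND update_count = :expected_count;
    # zero rows matched means somebody else won the race
    values = dict(values, update_count=expected_count + 1)
    rows_matched = session.query(models.Instance).\
        filter_by(uuid=instance_uuid, update_count=expected_count).\
        update(values, synchronize_session=False)
    if rows_matched != 1:
        raise ConcurrentUpdateError(
            "instance %s was modified concurrently" % instance_uuid)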

Encouraging Bundle / Baked Query Use

Encouraging Bundle use - as stated earlier, I want to look into bringing the very efficient Column Bundle pattern into the fold for mainstream use, not to mention the baked query pattern illustrated previously.

Transparent Retries of Transactions

I have an extension for transaction replaying ready to go and already in production, developed for a client with the agreement that the extension is open source. It installs as an event handler using a new series of hooks that were added in 0.9, or alternatively using a series of compatibility hooks that were backported to 0.7 and 0.8. The extension logs SQL statements as they proceed within a transaction, and allows them to be transparently replayed when a database disconnect error occurs. This means that SQLAlchemy's usual behavior of raising an exception and forcing you to retry explicitly can be hidden so that the retry occurs implicitly, without any interruption or special error handling required in application code. The extension includes a lot of configuration options, such as which statements should be replayed and which shouldn't, as well as which exceptions should be handled this way.

Transaction replaying is an elaborate feature which might introduce lots of new edge cases - namely that the state of the database may have changed since the original transaction failed - so to that extent I'm a little wary of this feature myself. However, I have seen lots of messages alluding to this kind of thing. The mechanics to pull this into SQLAlchemy as a transparent feature are ready to go, if this is a feature people are looking for (I'm not sure if it is).

Contributors

I would be remiss if I didn't mention that SQLAlchemy, Alembic, Dogpile and everything else I do are collaborative projects, and while SQLAlchemy in particular might present a daunting codebase, we would love to get new collaborators on board! Any of the features above are fair game, and I can walk someone through the steps to be taken; additionally, everything we add features for always needs lots of tests and review. If you're looking for a way to get into the project I can bring you right in - there's lots of (often very tedious and boring) work to be done. I'm on all the IRC channels now as zzzeek, so ping away.

Coffee and Cake

So! I hope you've enjoyed reading this document, and whether you're a fan or a hater, please know that I'm on board now and I'm here to help, with whatever database kind of thing you want to do - it's all part of my job now. If we wanted to use/build a whole new ORM, I'd be doing that too, right alongside. OpenStack's lessons in database integration are now my lessons too. So let me know where I should dig in first! Have a safe drive home.