Database migrations have historically been a source of considerable downtime when deploying new versions of Openstack services. DB migrations in the past have taken as long as an hour or more on some operators databases.
DB migrations are necessary, so the focus is on how to optimize the process to reduce downtime. The best way to reduce downtime is to try to make as much of the DB migration run "online" as possible. This means while the service is running (usually old code, but sometimes new code).
However, some DB migrations do things that are particularly difficult to execute online leading to downtime.
DB migrations typically fall into two categories: schema migrations and data migrations and the strategy to dealing with them often differs.
Schema migrations are any schema changes that create, drop or rename tables, columns, indexes and foreign keys.
MySQL Version Considerations
Even if services are tolerant of schema changes, the database engine needs to ensure it doesn't block reads/writes. Blocking reads/writes would cause downtime as well.
Not all versions of MySQL are made alike. Older versions (such as 5.1) will block reads/writes for pretty much any schema changes. Newer versions (5.5 and 5.6) are much better and in many cases much faster and/or online (without blocking reads/writes).
It's important that the operator is running a version of MySQL that allows schema changes to be performed online without blocking running services.
FIXME: Fill in matrix of versions and schema changes that block
Data changes are any migrations that change data already in tables. For example, encrypting data in a column or moving data from one table to a newly created table.
As a general rule, strive to make DB migrations run as much as possible online. This means that the DB migrations can run while old (or sometimes new) code is already running. This minimizes the amount of time services are down. Less downtime makes operators and users happy.
Maintain Backwards Compatible Schemas If Possible
Running schema migrations online is the best case, but it can only be done if the new schema is compatible with the running code. That means we should strive to keep schemas backwards compatible.
Things like adding new columns or creating new tables is backwards compatible (mostly, be careful of 'SELECT *' or 'INSERT INTO foo VALUES ()') because old running code simply don't use the new columns or tables.
Avoid Renaming Columns
It's extremely difficult to rename columns safely with services running. Old code simply won't know the new column name and new code must go extremely far to transparently handle using either the old or new names. Renaming columns is usually not necessary for technical reasons, so avoiding it is recommended.
Avoid Changing the Type of Columns
For many of the same reasons avoiding renaming columns, we should avoid changing the type of columns. Creating a new column is recommended.
Try to Incrementally Migrate Data
The volume of data being changed or moved during data migrations can be a significant source of downtime. If possible, try to incrementally migrate data. This would require code to handle both the old and new locations of data and potentially migrate the data on a row by row basis. However, this should be significantly easier than handling renaming or changing the type of columns.