Jump to: navigation, search

Difference between revisions of "DBMigrationBestPractices"

Line 18: Line 18:
  
 
== Best Practices ==
 
== Best Practices ==
 +
 +
As a general rule, strive to make DB migrations run as much as possible "online". This means that the DB migrations can run while old (or sometimes new) code is already running. This minimizes the amount of time services are down.
  
 
=== Maintain Backwards Compatible Schemas If Possible ===
 
=== Maintain Backwards Compatible Schemas If Possible ===
  
We want to do as much as possible while code is still running to minimize downtime during deploys. This means doing as much of the DB migrations as possible "online".
+
If possible, maintaining backwards compatible schemas means migrations can occur online. Things like adding new columns or creating new tables is backwards compatible (mostly, be careful of 'SELECT *' or 'INSERT INTO foo VALUES ()') because old running code simply don't use the new columns or tables.
 
 
One way to do this is to maintain backwards compatible schemas. Things like adding new columns or creating new tables is backwards compatible (mostly, be wary of SELECT * use) because old running code simply don't use the new columns or tables.
 
  
 
=== Avoid Renaming Columns ===
 
=== Avoid Renaming Columns ===
  
This cannot be done online. Old code will expect the old column name and new code will expect the new column name. As a result, code cannot be running if a column is renamed.
+
It's extremely difficult to rename columns safely with services running. Old code simply won't know the new column name and new code must go extremely far to transparently handle using either the old or new names. Renaming columns is usually not necessary for technical reasons, so avoiding it is recommended.
 
 
It's theoretically possible to make code tolerant of both the old and new names, but this is extremely complicated because of the need to retry queries. Renaming columns is usually not necessary for technical reasons, so avoiding it is recommended.
 
  
 
=== Avoid Changing the Type of Columns ===
 
=== Avoid Changing the Type of Columns ===
  
For many of the same reasons avoiding renaming columns, we should avoid changing the type of columns.
+
For many of the same reasons avoiding renaming columns, we should avoid changing the type of columns. Creating a new column is recommended.

Revision as of 16:00, 6 February 2014

Overview

Database migrations have historically been a source of considerable downtime when deploying new versions of Openstack services.

Schema Migrations

Schema migrations are any schema changes that create, drop or rename tables, columns, indexes and foreign keys. In some cases, making a schema change will cause the database engine to block reads and writes to the tables affected. Depending on the size of the table and the operation being performed, this can be up to an hour or more.

MySQL Considerations

Not all versions of MySQL are made alike. Older versions (such as 5.1) will block reads/writes for pretty much any schema changes. Newer versions (5.5 and 5.6) are much better and in many cases much faster and/or online (without blocking reads/writes).

FIXME: Fill in matrix of versions and schema changes that block

Data Migrations

Data changes are any migrations that alter data already in tables. For instance, moving data from one table to a newly created table would be a data migration.

Best Practices

As a general rule, strive to make DB migrations run as much as possible "online". This means that the DB migrations can run while old (or sometimes new) code is already running. This minimizes the amount of time services are down.

Maintain Backwards Compatible Schemas If Possible

If possible, maintaining backwards compatible schemas means migrations can occur online. Things like adding new columns or creating new tables is backwards compatible (mostly, be careful of 'SELECT *' or 'INSERT INTO foo VALUES ()') because old running code simply don't use the new columns or tables.

Avoid Renaming Columns

It's extremely difficult to rename columns safely with services running. Old code simply won't know the new column name and new code must go extremely far to transparently handle using either the old or new names. Renaming columns is usually not necessary for technical reasons, so avoiding it is recommended.

Avoid Changing the Type of Columns

For many of the same reasons avoiding renaming columns, we should avoid changing the type of columns. Creating a new column is recommended.