NoDowntimeDBMigrations

Problem Description

Database migrations in Openstack currently require that services be stopped before the migrations are run, because the code assumes a fixed schema.

In the past, some migrations have been shown to take a significant amount of time to run on large installations, causing unacceptable amounts of downtime during code upgrades.

As Openstack installations continue getting larger, these long downtimes will only get worse. If an Openstack provider guarantees 99.99% uptime, that leaves only about 4 minutes total per month for all downtime (planned or unplanned), yet some database migrations have taken hours to run on large installations.

There are generally two causes of downtime during migrations:

  1. Database engine limitations during schema changes
  2. Data that needs to be migrated from one format to another

Schema Changes

MySQL will often hold a table lock for the entire time it takes to execute a schema change. This blocks all reads and writes while the lock is held, and the length of time the lock is held often depends on the number of rows in the table.

Whether and when MySQL takes a table lock depends on the version of the daemon and the underlying storage engine.

PostgreSQL behaves much better in this respect, but it will also grab a table lock in some cases.
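
For illustration only, the sketch below shows the kind of statement that triggers this behaviour when issued through SQLAlchemy; the connection URL, table, and column names are invented, and whether this particular ALTER actually locks depends on the MySQL version and storage engine, as noted above.

  # Hypothetical example: a simple additive schema change run with SQLAlchemy.
  # On MySQL versions/storage engines without online DDL, an ALTER like this
  # rebuilds the table while holding a lock, blocking reads and writes for a
  # time roughly proportional to the number of existing rows.
  from sqlalchemy import create_engine, text

  engine = create_engine("mysql+pymysql://nova:secret@dbhost/nova")  # placeholder URL

  with engine.begin() as conn:
      conn.execute(text("ALTER TABLE instances ADD COLUMN uuid_new VARCHAR(36)"))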

Schema and Data Format Expectations

Code in Openstack currently expects a fixed schema and data format, so all schema and data format changes must be completed after Openstack has been stopped and before the new version starts running.

Proposed Solution

Schema Changes

Database engines will often obtain table locks during schema changes, causing all reads and writes to block for however long the schema change takes. The blocking time is often related to the total number of rows in the table.

When MySQL/MariaDB will grab a table lock varies depending on the version of the software and the underlying storage engine.

Handling Multiple Schemas

Embrace the Expand/Contract pattern for database migrations. This pattern splits a database migration into three parts (a sketch follows the list):

  1. Expand schema (adding new columns/tables/indexes)
  2. Migrate data
  3. Contract schema (removing unused columns/tables/indexes)
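
The sketch below shows, under stated assumptions, what the expand and contract phases might look like. It uses Alembic-style operations purely as one possible notation, and the table and column names (instances, flavor_id_new, flavor_id_old) are invented for illustration rather than taken from an actual Openstack migration.

  # Sketch only: the expand and contract phases written as Alembic-style operations.
  import sqlalchemy as sa
  from alembic import op


  def expand():
      """Phase 1: purely additive, safe to apply while the old code is still running."""
      # The new column is nullable, so existing rows and old writers are unaffected.
      op.add_column('instances',
                    sa.Column('flavor_id_new', sa.String(36), nullable=True))


  def contract():
      """Phase 3: run only after every row has been migrated to the new column."""
      op.drop_column('instances', 'flavor_id_old')

The key property is that each phase is independently safe: the expansion never breaks code written against the old schema, and the contraction only removes things that no running code references any more.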


Code would migrate data on load while the service is running. Optionally, a background task can migrate all data at whatever rate is specified. When all data that needs to be migrated has been migrated, the contraction can be run, which removes the now-unused columns.
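
The following is a rough sketch, not an actual implementation, of how on-load migration and a throttled background sweep could fit together. The column names, the convert() helper, the connection URL, and the use of raw SQL through SQLAlchemy are all assumptions made for illustration.

  # Illustrative sketch only: rows are upgraded to the new format the first
  # time they are loaded, and a background task sweeps the remaining rows in
  # small batches so the contract phase can eventually be run.
  import time

  from sqlalchemy import create_engine, text

  engine = create_engine("mysql+pymysql://nova:secret@dbhost/nova")  # placeholder URL


  def convert(old_value):
      # Placeholder for the real old-format -> new-format conversion.
      return str(old_value)


  def load_instance(conn, instance_id):
      """Read one row, migrating it to the new column on first access."""
      row = conn.execute(
          text("SELECT id, flavor_id_old, flavor_id_new FROM instances WHERE id = :id"),
          {"id": instance_id},
      ).mappings().one()
      if row["flavor_id_new"] is None and row["flavor_id_old"] is not None:
          conn.execute(
              text("UPDATE instances SET flavor_id_new = :v WHERE id = :id"),
              {"v": convert(row["flavor_id_old"]), "id": instance_id},
          )
      return row


  def background_migrate(batch_size=100, pause=1.0):
      """Sweep unmigrated rows in batches, throttled so live traffic is not starved."""
      while True:
          with engine.begin() as conn:
              rows = conn.execute(
                  text("SELECT id, flavor_id_old FROM instances "
                       "WHERE flavor_id_new IS NULL LIMIT :n"),
                  {"n": batch_size},
              ).mappings().all()
              if not rows:
                  return  # nothing left to migrate; the contract phase is now safe
              for row in rows:
                  conn.execute(
                      text("UPDATE instances SET flavor_id_new = :v WHERE id = :id"),
                      {"v": convert(row["flavor_id_old"]), "id": row["id"]},
                  )
          time.sleep(pause)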

This decouples the database schema from the code, allowing the two to be updated independently and allowing the service to continue running transparently while the data is migrated.