Heat/Blueprints/RollingUpdates

Revision as of 18:56, 8 February 2013

  • Launchpad Entry: HeatSpec:rolling-updates
  • Created: 07 Feb 2013
  • Contributors: Clint Byrum

Summary

While managing a large group of instances, I may want to roll out changes in topology and/or configuration to a limited percentage of these instances, then wait to see whether those initial rollouts succeed or fail before deploying to a larger percentage. This is known as a "canary" deployment strategy, after the old mining practice of carrying a caged canary into a mine as an early warning of toxic gas.

Release Note

Multi-instance resources may now specify a property that causes them to apply updates using a rolling or canary strategy.

Rationale

In large-scale deployments, updating the configuration of all machines at once, without first testing it on a subset, may cause downtime. Controlling the pace of a deployment improves reliability for users who adopt it.

User stories

As an operations engineer I want to roll out a change to topology or configuration on a very large resource without the risk of significant downtime or error rates.

Assumptions

Design

A single new property will be introduced to OS::Heat::InstanceGroup:

MetadataUpdatePattern:

The property takes a string value: one of "rolling", "canary", or "immediate". If the property is not specified, "immediate" is assumed.
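As an illustration only, the defaulting and validation rules above could be expressed as follows. This is a minimal sketch that assumes the group's properties arrive as a plain dictionary; the function and constant names are hypothetical and not part of Heat's API.

```python
from typing import Mapping, Optional

# Hypothetical names: Heat's real property-schema machinery is not shown here.
VALID_PATTERNS = ("rolling", "canary", "immediate")
DEFAULT_PATTERN = "immediate"


def resolve_update_pattern(properties: Mapping[str, Optional[str]]) -> str:
    """Return the MetadataUpdatePattern to use, defaulting to "immediate"."""
    pattern = properties.get("MetadataUpdatePattern")
    if pattern is None:
        # Property not specified: fall back to the documented default.
        return DEFAULT_PATTERN
    if pattern not in VALID_PATTERNS:
        raise ValueError(
            "MetadataUpdatePattern must be one of %s, got %r"
            % (", ".join(VALID_PATTERNS), pattern)
        )
    return pattern
```

For example, a group with no MetadataUpdatePattern set resolves to "immediate", while an unrecognised value is rejected rather than silently treated as the default.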

rolling

Metadata updates will be performed on one resource at a time. Before continuing to the next resource, the update waits on any WaitCondition that depends on the InstanceGroup. If that WaitCondition fails, the update is rolled back to the previous Metadata.
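A minimal sketch of the rolling pattern, assuming instances are plain dictionaries with a "metadata" key and the WaitCondition is modelled as a callable that returns True on success. Both are hypothetical stand-ins for Heat's resource and WaitCondition classes. The spec does not say whether rollback covers only the failed instance or every instance updated so far; this sketch assumes the latter.

```python
from typing import Callable, List


def rolling_update(
    instances: List[dict],
    new_metadata: dict,
    wait_condition: Callable[[dict], bool],
) -> bool:
    """Apply new_metadata one instance at a time (the "rolling" pattern).

    Returns True if every instance was updated. If the wait condition fails,
    every instance touched so far is restored to its previous metadata and
    False is returned. (Rolling back all touched instances is an assumption.)
    """
    previous: List[dict] = []  # old metadata, in the order instances were updated
    for instance in instances:
        previous.append(instance["metadata"])
        instance["metadata"] = dict(new_metadata)  # copy so instances don't share state
        # Block on the dependent WaitCondition before moving to the next instance.
        if not wait_condition(instance):
            for touched, old in zip(instances, previous):
                touched["metadata"] = old
            return False
    return True
```

Keeping the saved metadata in update order means the rollback only touches instances that actually received the new Metadata; instances after the failure point are never modified.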

canary

Identical to rolling, except that the WaitCondition is waited on after each group of instances rather than after every single instance. The progression is 1 instance, then 1%, 5%, and 20% of the group, then the remainder.
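The batch sizes for the canary progression could be computed as below. The spec leaves open whether the percentages are per-batch sizes or cumulative shares of the group; this hypothetical sketch assumes cumulative shares, rounded up, and skips any stage that would add no new instances. The WaitCondition would then be awaited after each batch, as in the rolling pattern.

```python
from typing import List

# Cumulative targets after the first single-instance canary, in whole percent.
# ASSUMPTION: percentages are shares of the whole group, not per-batch sizes.
CANARY_PERCENTS = (1, 5, 20)


def canary_batches(size: int) -> List[int]:
    """Return how many instances to update in each successive canary batch."""
    if size <= 0:
        return []
    batches = [1]  # the first canary is always a single instance
    done = 1
    for pct in CANARY_PERCENTS:
        # Integer ceiling of size * pct / 100, avoiding float rounding.
        target = min(size, (size * pct + 99) // 100)
        if target > done:  # skip stages that would add no new instances
            batches.append(target - done)
            done = target
    if done < size:
        batches.append(size - done)  # the remainder
    return batches
```

For a group of 1000 instances, `canary_batches(1000)` returns `[1, 9, 40, 150, 800]`: one canary, then enough instances to reach 1%, 5%, and 20% of the group, then the remaining 800. For small groups several stages collapse; a group of 10 yields `[1, 1, 8]`.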

immediate

The new Metadata is copied to all instances immediately without waiting.

Implementation

Currently, a stack update simply updates the metadata of each resource. To support rollback in rolling and canary updates, a new column, previous_rsrc_metadata, will be needed in the resource table. As machines are updated, their previous metadata must be stored there so that they can be rolled back. Stack updates are already protected by code that prevents another update from running in parallel, so there is no need to join against a table of versioned metadata. Once an InstanceGroup has been fully updated, previous_rsrc_metadata should be set to NULL for all of its resources.
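The bookkeeping described above might look roughly like this. It is a sketch against an in-memory SQLite table, assuming a simplified, hypothetical resource schema (an id, a group_name column, and metadata stored as JSON text); Heat's real schema and database layer differ. Because SQL evaluates every SET expression against the row's pre-update values, a single UPDATE can both save the old metadata and write the new one.

```python
import json
import sqlite3

# ASSUMPTION: a simplified, hypothetical "resource" table; not Heat's schema.
SCHEMA = """
CREATE TABLE resource (
    id INTEGER PRIMARY KEY,
    group_name TEXT NOT NULL,
    rsrc_metadata TEXT,
    previous_rsrc_metadata TEXT
)
"""


def update_metadata(conn: sqlite3.Connection, resource_id: int, new_metadata: dict) -> None:
    """Save the current metadata in previous_rsrc_metadata, then write the new one."""
    # SET expressions see the row's pre-update values, so this swap is safe.
    conn.execute(
        "UPDATE resource SET previous_rsrc_metadata = rsrc_metadata, "
        "rsrc_metadata = ? WHERE id = ?",
        (json.dumps(new_metadata), resource_id),
    )


def rollback_group(conn: sqlite3.Connection, group_name: str) -> None:
    """Restore the saved metadata on every resource in the group that has one."""
    conn.execute(
        "UPDATE resource SET rsrc_metadata = previous_rsrc_metadata, "
        "previous_rsrc_metadata = NULL "
        "WHERE group_name = ? AND previous_rsrc_metadata IS NOT NULL",
        (group_name,),
    )


def finish_group(conn: sqlite3.Connection, group_name: str) -> None:
    """After the whole group is updated, discard the saved metadata."""
    conn.execute(
        "UPDATE resource SET previous_rsrc_metadata = NULL WHERE group_name = ?",
        (group_name,),
    )
```

One limitation of this simple scheme: a resource whose metadata was originally NULL looks the same as one that was never updated, so a real implementation would need to handle that case separately.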

UI Changes

N/A

Code Changes

TBD

Migration

  • Adding a column is a schema change, so it must be handled by a database migration.
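A minimal sketch of that schema change, written as a plain SQLite migration purely for illustration; Heat manages schema changes through its own migration scripts, which this does not reproduce. The function names are hypothetical. The upgrade is idempotent, and existing rows receive NULL in the new column, which matches the "no saved metadata" state described under Implementation.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info yields one row per column; field 1 is the column name.
    # The table name is interpolated directly, so it must be a trusted constant.
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def upgrade(conn: sqlite3.Connection) -> None:
    """Add the nullable previous_rsrc_metadata column if it is not there yet."""
    if not column_exists(conn, "resource", "previous_rsrc_metadata"):
        # Existing rows get NULL, i.e. "no saved metadata".
        conn.execute("ALTER TABLE resource ADD COLUMN previous_rsrc_metadata TEXT")
```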

Test/Demo Plan

TBD

Unresolved issues

BoF agenda and discussion