StackRetryOption

API to continue stack operations after failure

Heat needs a convenient mechanism to issue a retry request for provisioning a stack.

Problem description

Stack provisioning can fail due to internal or external circumstances. Internal circumstances include the Heat engine going down while provisioning a stack, or other error conditions inside the Heat engine that abort the provisioning. External circumstances include nova services being unreachable or unavailable, the user's quota being exceeded, and other infrastructure-related issues.

When Heat encounters such issues, some of which could be rectified manually, it aborts the stack operation but provides no way to retry the same operation without issuing another update to the stack.

Currently, a user has to either issue another update request or use the abandon/adopt APIs. Abandon and adopt are cumbersome, time-consuming and not stable in Heat. Updating the current stack requires the user to find the template and the parameters/environment used in the failed request, and it can lead to accidental changes to the stack if the wrong template or parameters/environment are inadvertently picked. It is far more convenient to ask Heat to continue the stack provisioning/deletion from where it left off, given that Heat already has the template and the parameters and/or environment.

Proposed change

There should be a mechanism to retry/continue a stack operation when it has failed. If the user can rectify the problem, manually or automatically, they should be able to issue the retry/continue operation. Since the template and environment are already available in the Heat database, there is no chance of an accidental update to the stack, and the provisioning can restart from where it failed.

The CLI would look like:

   $ heat stack-create --retry my-stack
   $ heat stack-update --retry my-other-stack

Implementation

There will be new options in the CLIs and new parameters in the create, delete, update, suspend, resume and snapshot APIs to continue the stack operation from where it failed.
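As a rough illustration of how such a parameter might be interpreted, the sketch below models a request body carrying a hypothetical "retry" key. The StoredStack class, the build_operation function, the simplified status strings and the validation rules (retry only on a failed stack, only for the action that failed, and without new inputs) are all assumptions made for this example; they are not part of Heat's existing API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredStack:
    """Toy stand-in for a stack record persisted by Heat (hypothetical)."""
    name: str
    action: str   # last action attempted, e.g. "CREATE" or "UPDATE"
    status: str   # simplified status, e.g. "FAILED" or "COMPLETE"
    template: dict = field(default_factory=dict)
    environment: dict = field(default_factory=dict)


def build_operation(stack, action, body):
    """Return the (template, environment) pair to use for `action`.

    `body` mimics an API request body; the "retry" key is an assumed
    name for the new parameter described in this spec.
    """
    if body.get("retry"):
        if stack.status != "FAILED":
            raise ValueError("retry is only valid for a failed stack")
        if stack.action != action:
            raise ValueError("retry must target the action that failed")
        if "template" in body or "environment" in body:
            raise ValueError("retry reuses stored inputs; pass none")
        # Reuse exactly what Heat already stored for the failed request.
        return stack.template, stack.environment
    # Normal request: the caller supplies the inputs.
    return body["template"], body.get("environment", {})
```

Rejecting a template or environment alongside the retry flag is one way to preserve the guarantee that a retry cannot accidentally change the stack definition.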

Assignee(s)

Primary assignee:

   anant.patil@hp.com

Milestones

Target Milestone for completion: Kilo-3

Work Items

1. Change the APIs to have a new parameter.

2. Change the CLIs to have a new option.

3. Change the Heat engine to retry the request, either by internally issuing an update to the stack or by looking in the DB for failed resources and restarting the operation from there.
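The second approach in work item 3, restarting from the failed resources, could be sketched roughly as follows. This is a toy model and not Heat engine code: the Resource class, the simplified status strings ("COMPLETE", "FAILED", "INIT") and the provision callback are assumptions for illustration, and the resources are assumed to be listed already in dependency order.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Toy resource record; the status values are simplified assumptions."""
    name: str
    status: str  # "COMPLETE", "FAILED" or "INIT"


def resume(resources, provision):
    """Re-run `provision` for each resource that is not COMPLETE.

    Resources are visited in the given (dependency) order. Returns the
    names of the resources attempted in this pass.
    """
    attempted = []
    for res in resources:
        if res.status == "COMPLETE":
            continue  # finished in the earlier attempt; leave it untouched
        attempted.append(res.name)
        try:
            provision(res)
            res.status = "COMPLETE"
        except Exception:
            res.status = "FAILED"
            break  # stop here so a later retry resumes from this resource
    return attempted
```

For a stack whose "net" resource completed, whose "server" resource failed and whose "volume" resource was never started, a call to resume would attempt only "server" and "volume", leaving "net" as it is.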

Dependencies

None.