Difference between revisions of "Support-retry-with-idempotency"

Revision as of 10:30, 14 November 2013

Background

Currently, Heat does not retry API calls when creating, updating, or deleting a stack. If an API request fails, Heat changes the stack status to "XXX_FAILED" (or starts the rollback process if the rollback flag is set).

However, I think there are scenarios where an API retry is appropriate (e.g. a 503 response, or a timeout caused by server failover).

I believe an API retry function can improve the reliability of Heat's task processing. Note: Providing a retry capability for the Heat API itself is out of scope for this proposal.

Our definition of API retry

Our definition of API retry is “retry of a single API request”. Like this:

A “retry that spans multiple API requests” is outside the scope of this discussion. Like this:

The necessity of API retry

We think the impact on end users should be minimized. When an API failure is caused by a temporary system problem, an API retry can keep that failure from reaching the end user.

Stack creation fails because of a temporary system problem (e.g. failover, temporary overload).
Heat retries a few times (max_attempts can be defined in the config).
If the retry limit is reached, Heat changes the stack status to CREATE_FAILED, which is the same state transition as today.

Heat can avoid sending an ERROR response to the end user if the temporary system problem is resolved before the retry limit is reached.
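
The flow above can be sketched as a small retry loop. This is only an illustration under our own assumptions, not Heat code: `TransientError`, `call_with_retry`, and the injected `sleep` argument are hypothetical names, while `max_attempts` and `retry_interval` are the parameters proposed later on this page.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. 503, timeout)."""


def call_with_retry(request, max_attempts=3, retry_interval=1.0, sleep=time.sleep):
    """Call request() up to max_attempts times, pausing between attempts.

    Returns the first successful result. If every attempt fails, the last
    TransientError propagates, which is where the caller would set the
    stack status to CREATE_FAILED (or start a rollback).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts:
                raise  # retry limit reached
            sleep(retry_interval)
```

If the temporary problem clears before the last attempt, the caller only sees the successful result and the end user never sees an error.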

The necessity of idempotency

API retry needs idempotency. Retrying an API call without idempotency may create duplicate resources, and there is currently no way to cope with this situation.

Retrying with idempotency solves the problem. If Heat adds a ClientToken (IdempotencyToken) to the request header, Nova would not create a duplicate instance.

API retry combined with idempotency is therefore the appropriate approach to retry processing.
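
To make the idea concrete, here is a toy, in-memory sketch of how a receiving service could honour such a token. It is not Nova's API: `FakeComputeService`, `create_instance`, and the `client_token` argument are hypothetical names chosen for illustration.

```python
import uuid


class FakeComputeService:
    """Toy stand-in for a service that honours an idempotency token."""

    def __init__(self):
        self.instances = {}   # instance id -> name
        self._by_token = {}   # client token -> instance id

    def create_instance(self, name, client_token=None):
        # A repeated token returns the instance created by the first request.
        if client_token is not None and client_token in self._by_token:
            return self._by_token[client_token]
        instance_id = str(uuid.uuid4())
        self.instances[instance_id] = name
        if client_token is not None:
            self._by_token[client_token] = instance_id
        return instance_id
```

Calling `create_instance` twice with the same token returns the same instance ID and leaves one instance in place, whereas two calls without a token create two instances.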

Retry Policy

HTTP methods

A retry policy should be defined for each method.

POST methods

We propose that Heat support ClientToken when retrying a POST method.

1. Heat sends a POST request to Nova (or another service) but does not get a response for some reason.
2. Nova (or the other service) has in fact received the request and created a resource.
3. Heat does not know the resource ID, so it cannot check the status.
4. Heat retries the POST request with the ClientToken until it either receives a response or reaches the retry limit. If it reaches the retry limit, the stack status is changed to CREATE_FAILED or a rollback process is started.
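
The POST scenario above could look roughly like this on the sending side. It is a sketch under stated assumptions: the `X-Client-Token` header name, the `send(body, headers)` transport callable, and the `LossyServer` test double are all hypothetical, since the token format is not yet defined.

```python
import uuid


def post_with_client_token(send, body, max_attempts=3):
    """Retry a POST, sending the same token on every attempt.

    send(body, headers) is a hypothetical transport callable that returns
    the response body, or raises TimeoutError when no response arrives.
    """
    # Generated once per logical request, not once per attempt.
    headers = {"X-Client-Token": str(uuid.uuid4())}  # header name is an assumption
    for attempt in range(1, max_attempts + 1):
        try:
            return send(body, headers)
        except TimeoutError:
            if attempt == max_attempts:
                raise  # caller sets CREATE_FAILED or starts a rollback


class LossyServer:
    """Toy server: creates the resource, but the first response is lost."""

    def __init__(self):
        self.resources = {}  # token -> resource id
        self.requests = 0

    def send(self, body, headers):
        self.requests += 1
        token = headers["X-Client-Token"]
        # setdefault keeps the resource created by the first request.
        new_id = "res-%d" % (len(self.resources) + 1)
        resource_id = self.resources.setdefault(token, new_id)
        if self.requests == 1:
            raise TimeoutError("response lost")
        return {"id": resource_id}
```

The important design point is that the token is created outside the loop. A fresh token per attempt would make every retry look like a new request and bring back the duplicate-resource problem.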

PUT methods

We believe PUT methods are naturally idempotent and thus safe to retry.

1. Heat sends a PUT request to Nova (or another service) but does not get a response for some reason.
2. Nova (or the other service) has in fact received the request and updated the resource.
3. Heat does not know the result, but it already knows the resource ID (from when it created the resource).
4. Retrying the PUT request results in the same state as in step 2. Heat therefore retries the request until it either receives a response or reaches the retry limit. If it reaches the retry limit, the stack status is changed to CREATE_FAILED or a rollback process is started.

ClientToken does not need to be used for PUT retries.

DELETE methods

DELETE methods are not idempotent in terms of the response: a repeated DELETE may return 404 instead of 2xx. However, the resulting state is the same, so we can retry the DELETE method anyway and use the response to learn what happened to the previous request.

1. Heat sends a DELETE request to Nova (or another service) but does not get a response for some reason.
2. Nova (or the other service) has in fact received the request and deleted the resource.
3. Heat does not know the result, but it already knows the resource ID (from when it created the resource).
4. Retrying the DELETE request returns one of the following responses, both of which mean the same state (deleted).

response 20x (usually 204) -> the delete action succeeded (deleted)
response 404 -> the delete action failed because the resource was already deleted (deleted)
Heat can retry DELETE requests until it gets a 2xx or 404 response or reaches the retry limit. If it reaches the retry limit, the stack status is changed to DELETE_FAILED.

ClientToken does not need to be used for DELETE retries.
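
A minimal sketch of this DELETE handling follows. `delete_with_retry` and the `send_delete` callable are hypothetical names. Treating other 4xx responses as final follows the HTTP response policy on this page, not any existing Heat behaviour.

```python
def delete_with_retry(send_delete, max_attempts=3):
    """Return True once the resource is known to be deleted.

    send_delete() is a hypothetical callable that returns an HTTP status
    code, or raises TimeoutError when no response arrives.
    """
    for _ in range(max_attempts):
        try:
            status = send_delete()
        except TimeoutError:
            continue  # outcome unknown: ask again
        if 200 <= status < 300 or status == 404:
            return True   # deleted now, or already deleted by an earlier attempt
        if 400 <= status < 500:
            return False  # other client errors are not transient
        # 5xx: fall through and retry
    return False  # retry limit reached: caller sets DELETE_FAILED
```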

GET methods

Same as PUT method.

HTTP responses

A retry policy should also be defined for each class of HTTP response.

HTTP response 2xx

No Problem. API retry is not necessary.

HTTP response 4xx (ClientError)

Heat knows that the resource was not created.
The error is not transient.
The request will never succeed in this case.

API retry is not appropriate in this case.

HTTP response 5xx (ServerError)

Heat knows that the resource was not created.
The error may be transient in this case.

API retry may solve the problem.

Couldn't get HTTP response

Two different circumstances exist in this case.

HTTP request was lost
- The resource was not created.
HTTP request accepted but HTTP response was lost
- The resource may or may not exist.
- The error may be transient in this case. This situation may be caused by a network switch or server failover, or by temporary overload.
- API retry may work in these situations.

Heat doesn't know whether the resource exists or not. Therefore, idempotency on the API receiver's side (e.g. Nova or other services) is necessary for safe API retries in this case.
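
The per-response policy in this section can be summarized as a small decision function. This is only a sketch: `should_retry` is a hypothetical name, and `None` is our own convention for "no HTTP response was received".

```python
def should_retry(status):
    """Decide whether to retry, given a status code or None (no response)."""
    if status is None:
        return True   # request or response lost: safe only with idempotency
    if 200 <= status < 300:
        return False  # success, nothing to retry
    if 400 <= status < 500:
        return False  # client error: not transient
    if 500 <= status < 600:
        return True   # server error: may be transient
    return False      # other classes (1xx, 3xx) are not covered by this proposal
```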

Retry Parameters

Heat already has a "Timeout" parameter, so we do not need to add a new parameter for that. We want to add the following parameters:

max_attempts (times)
retry_interval (seconds)

[Note] The above might be wrong. The "Timeout" parameter in Heat is a timeout for stack creation, not a timeout for an individual API call. We need a timeout for waiting on an API response, which does not exist now. Heat needs it in order to handle retries.

Retry Parameters Configuration

"max_attempts" and "retry_interval" should be system-wide parameters. Their values can be defined based on the system architecture and environment (e.g. the estimated duration of server failovers). On the other hand, the time required to create a resource varies by its type and size, so "max_attempts" and "retry_interval" should also be configurable per resource. We propose that the parameters be configurable as follows:

Global max_attempts and retry_interval parameters are set in heat.conf (mandatory).
max_attempts and retry_interval can also be set per resource in heat.conf (optional).
If the optional per-resource parameters are defined, Heat uses them instead of the global values.
max_attempts and retry_interval cannot be specified in templates or API request parameters.
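
The lookup order above (per-resource value first, global value otherwise) might be sketched as follows. The section names `[retry]` and `[retry:<resource type>]` are purely hypothetical, since the heat.conf layout for these options has not been decided, and the sketch uses Python's standard `configparser` only for illustration.

```python
import configparser

# Hypothetical heat.conf fragment: section names are assumptions.
SAMPLE = """
[retry]
max_attempts = 3
retry_interval = 5

[retry:OS::Nova::Server]
max_attempts = 10
"""


def retry_settings(config, resource_type):
    """Return (max_attempts, retry_interval) for a resource type."""
    section = "retry:" + resource_type

    def lookup(option):
        # The per-resource value wins; otherwise use the mandatory global one.
        if config.has_option(section, option):
            return config.get(section, option)
        return config.get("retry", option)

    return int(lookup("max_attempts")), float(lookup("retry_interval"))


config = configparser.ConfigParser()
config.read_string(SAMPLE)
```

With this sample, a server resource gets its own `max_attempts` but still inherits the global `retry_interval`, and any resource type without a section falls back to both global values.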

Implementation plan

We are going to start implementation after idempotency has been implemented. The necessity of idempotency is under discussion in the Nova project.
