Difference between revisions of "NovaInstanceActions"
Revision as of 23:31, 17 February 2013
- Launchpad Entry: NovaSpec:instance-actions
- Created: 5 Nov 2012
- Contributors: Andrew Laski, Johannes Erdfelt
Summary
Create a new instance_actions table and an API extension to access it. This would provide a mechanism for better error reporting and give users insight into what has been done with their instance.
Rationale
Currently the API only reports asynchronous errors if the instance is in an ERROR state. However, not all problems that occur are fatal or unrecoverable. This leaves a choice between setting the instance to ERROR, which allows the error to be communicated to the user, and continuing or recovering, which hides the error information. Some examples:
- A changePassword failure sets an instance to ERROR even though the failure is not fatal to the running of the instance. Changing that behavior would hide the error information.
- In some cases the xenapi driver logs agent failures and continues, which can leave users wondering why their instance isn't configured properly, with no way for them to retrieve that information.
- Resize failures could possibly be made to revert automatically to the original instance, but this would leave users with no indication of why that happened.
The API extension would provide a mechanism for retrieving all actions taken on an instance, and would include error information no matter what VM state is set. This would improve visibility into what has happened to an instance, both normal operations and errors that have occurred. This also opens up future work to only set a VM ERROR state if there's a VM error, not a task error.
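As a rough illustration of recording an action's outcome independently of the VM state, the sketch below wraps an operation and logs success or failure without touching any instance state. It is not Nova code: `track_action`, `ACTION_LOG`, and the choice to swallow the exception are all assumptions made for the example, and the in-memory list merely stands in for the proposed table.

```python
import traceback
from contextlib import contextmanager
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for the proposed instance_actions table.
ACTION_LOG = []


@contextmanager
def track_action(instance_uuid, action, request_id, user_id):
    """Record the start, finish and outcome of one action on an instance."""
    entry = {
        "instance_uuid": instance_uuid,
        "action": action,
        "request_id": request_id,
        "user_id": user_id,
        "started_at": datetime.now(timezone.utc).isoformat(),
        "finished_at": None,
        "result": None,
        "message": None,
        "traceback": None,
    }
    ACTION_LOG.append(entry)
    try:
        yield entry
        entry["result"] = "success"
    except Exception as exc:
        # Assumption: the failure is non-fatal, so it is recorded and
        # suppressed instead of forcing the instance into ERROR.
        entry["result"] = "failure"
        entry["message"] = str(exc)
        entry["traceback"] = traceback.format_exc()
    finally:
        entry["finished_at"] = datetime.now(timezone.utc).isoformat()
```

A caller would wrap the operation in `with track_action(...)`; a failure then shows up in the log entry's `result`, `message`, and `traceback` fields while the instance keeps its current state.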
Design
New table
A new table, `instance_actions`, would contain the following:
- start time of action
- finished time of action
- action
- instance_uuid
- request_id
- user_id
- service_id
- result, success or failure
- short message of error, if applicable
- traceback of error, if applicable
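The field list above could map onto a table roughly as follows. This is a minimal sketch using Python's standard `sqlite3` module rather than Nova's actual database layer; the column names, types, and nullability are assumptions, since the spec only names the fields. `create_db` and `record_action` are hypothetical helpers.

```python
import sqlite3

# Assumed column names and types; the spec lists the fields but not a schema.
SCHEMA = """
CREATE TABLE instance_actions (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at    TEXT NOT NULL,
    finished_at   TEXT,
    action        TEXT NOT NULL,
    instance_uuid TEXT NOT NULL,
    request_id    TEXT NOT NULL,
    user_id       TEXT,
    service_id    INTEGER,
    result        TEXT,
    message       TEXT,
    traceback     TEXT
)
"""

COLUMNS = {
    "started_at", "finished_at", "action", "instance_uuid", "request_id",
    "user_id", "service_id", "result", "message", "traceback",
}


def create_db():
    """Create an in-memory database holding the sketched table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn


def record_action(conn, **fields):
    """Insert one action row and return its id."""
    unknown = set(fields) - COLUMNS
    if unknown:
        raise ValueError("unknown columns: %s" % sorted(unknown))
    cols = ", ".join(fields)
    marks = ", ".join("?" for _ in fields)
    cur = conn.execute(
        f"INSERT INTO instance_actions ({cols}) VALUES ({marks})",
        tuple(fields.values()),
    )
    conn.commit()
    return cur.lastrowid
```

Leaving `finished_at`, `result`, `message`, and `traceback` nullable lets a row be written when the action starts and completed later.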
API extension
Add a new resource GET /servers/<server_id>/actions
    {
        "actions": [
            {
                "action": "",
                "started_at": "",
                "finished_at": "",
                "request_id": "",
                "user_id": "",
                "service": "",
                "result": "",
                "message": ""
            },
            {...}
        ]
    }
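To make the response shape concrete, the sketch below builds a body of this form from a list of stored records. `format_actions` and `RESPONSE_FIELDS` are hypothetical names; omitting the traceback from the response follows the sample above, and how the stored `service_id` would map to the `service` key is left as an assumption.

```python
import json

# Keys taken from the sample response above; traceback is not among them.
RESPONSE_FIELDS = (
    "action", "started_at", "finished_at", "request_id",
    "user_id", "service", "result", "message",
)


def format_actions(records):
    """Build the body for GET /servers/<server_id>/actions from record dicts."""
    return {
        "actions": [
            {field: record.get(field) for field in RESPONSE_FIELDS}
            for record in records
        ]
    }


if __name__ == "__main__":
    sample = [{
        "action": "resize",
        "started_at": "2013-02-17T23:31:00Z",
        "finished_at": "2013-02-17T23:32:10Z",
        "request_id": "req-1",
        "user_id": "user-1",
        "service": "compute",
        "result": "failure",
        "message": "no valid host",
        "traceback": "internal detail, not exposed",
    }]
    print(json.dumps(format_actions(sample), indent=4))
```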
Unresolved Issues
The initial thinking is that only user-initiated actions will be recorded, but this leaves an edge case: automatic resize confirmations. Should these be considered user-initiated actions for the purposes of this recording?
References
- https://etherpad.openstack.org/grizzly-error-handling-recovery
- https://bugs.launchpad.net/nova/+bug/1061062
- https://bugs.launchpad.net/nova/+bug/1061024