Difference between revisions of "Monasca/Incident Manager"

== Use Cases ==
 
# Create a new incident
 
# Display all incidents in Ops Console
 
# Display all open, acknowledged or resolved incidents in Ops Console
 
# Display all open, acknowledged or resolved incidents assigned to a user in Ops Console
 
# Acknowledge an incident in Ops Console
 
# Resolve an incident in Ops Console
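The display use cases above amount to filtering incidents by status and by assigned user. A minimal sketch in Python, assuming a hypothetical `Incident` record and `filter_incidents` helper (neither name comes from Monasca or the Ops Console):

```python
from typing import Iterable, List, NamedTuple, Optional


class Incident(NamedTuple):
    """Hypothetical incident record used only for this illustration."""
    incident_id: str
    status: str            # "OPEN", "ACKNOWLEDGED" or "RESOLVED"
    owner: Optional[str]   # None while the incident is unassigned


def filter_incidents(
    incidents: Iterable[Incident],
    status: Optional[str] = None,
    owner: Optional[str] = None,
) -> List[Incident]:
    """Return incidents matching the given status and/or owner.

    A filter left as None is not applied, so calling this with no
    filters returns every incident.
    """
    return [
        i for i in incidents
        if (status is None or i.status == status)
        and (owner is None or i.owner == owner)
    ]


sample = [
    Incident("inc-1", "OPEN", None),
    Incident("inc-2", "ACKNOWLEDGED", "alice"),
    Incident("inc-3", "RESOLVED", "alice"),
    Incident("inc-4", "OPEN", "bob"),
]

all_incidents = filter_incidents(sample)
open_incidents = filter_incidents(sample, status="OPEN")
alice_resolved = filter_incidents(sample, status="RESOLVED", owner="alice")
```

Because `None` means "no filter" here, this sketch cannot select only unassigned incidents; a real query interface would need an explicit way to express that.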
 
  
== Concepts ==
 
* Incidents
 
** An incident is created when an alarm transitions to the ALARM or UNDETERMINED state, and it is associated with that alarm.
 
** Incidents enable alarms to:
 
*** Have their status tracked
 
*** Be assigned to users
 
*** Be commented on by users
 
** An incident has one of three statuses:
 
*** OPEN: When an incident is created, it is OPEN.
 
*** ACKNOWLEDGED: When an incident is being worked on it is ACKNOWLEDGED.
 
*** RESOLVED: When an incident is closed, it is RESOLVED.
 
** Some of the concepts around incidents are "borrowed" from PagerDuty. See https://developer.pagerduty.com/documentation/rest/incidents.
 
* Alarm
 
** An alarm has one of three states:
 
*** OK
 
*** ALARM
 
*** UNDETERMINED
 
* Alarm state transition event
 
** An event that is created by the Threshold Engine when the alarm transitions state.
 
* Assignment/Owner
 
** The user that the incident is assigned to.
 
* Comment
 
** A comment on an incident.
 
* Actions
 
** Similar to alarm definition actions in Monasca, incidents can also have actions which occur when an incident is modified.
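The concepts above can be summarized as a small data model. The following Python sketch is illustrative only: the class and field names (`Incident`, `owner`, `comments`, `transitions`) are assumptions made for this example and are not taken from a Monasca schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class AlarmState(Enum):
    OK = "OK"
    ALARM = "ALARM"
    UNDETERMINED = "UNDETERMINED"


class IncidentStatus(Enum):
    OPEN = "OPEN"
    ACKNOWLEDGED = "ACKNOWLEDGED"
    RESOLVED = "RESOLVED"


@dataclass
class Incident:
    """Hypothetical incident record tied to a single alarm."""
    alarm_id: str
    status: IncidentStatus = IncidentStatus.OPEN   # new incidents start OPEN
    owner: Optional[str] = None                    # unassigned until set
    comments: List[str] = field(default_factory=list)
    # Alarm state transition events recorded against this incident.
    transitions: List[AlarmState] = field(default_factory=list)


incident = Incident(alarm_id="alarm-1")
incident.transitions.append(AlarmState.ALARM)
incident.owner = "alice"
incident.comments.append("Investigating high CPU.")
```

Actions are left out of the sketch because the text does not yet say what an action contains.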
 
 
== Incident Lifecycle ==
 
This section describes the lifecycle of an incident.
 
 
Alarm state transition events are processed as follows:
 
# To ALARM
 
## Opens a new incident for the supplied alarm, or adds an alarm state transition event to an existing incident.
 
### If no incident exists for the alarm, or the existing incident is RESOLVED, a new incident is created with a status of OPEN.
 
### If there exists an incident with a status of OPEN or ACKNOWLEDGED for the alarm, the alarm state transition event is added to the existing incident, and the status is not modified.
 
# To OK
 
## Adds an alarm state transition event to an existing incident.
 
### If no incident exists for the alarm, or the existing incident is RESOLVED, nothing is done.
 
### If there exists an incident with a status of OPEN or ACKNOWLEDGED for the alarm, the alarm state transition event is added to the existing incident, and the status is not modified.
 
# To UNDETERMINED
 
## Opens a new incident for the supplied alarm, or adds an alarm state transition event to an existing incident.
 
### If no incident exists for the alarm, or the existing incident is RESOLVED, a new incident is created with a status of OPEN.
 
### If there exists an incident with a status of OPEN or ACKNOWLEDGED for the alarm, the alarm state transition event is added to the existing incident, and the status is not modified.
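The rules above for handling alarm state transition events can be sketched as a single function. This is a minimal in-memory illustration, not the Incident Manager implementation: `IncidentStore` and `handle_transition` are hypothetical names, and recording the triggering event on a newly created incident is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

OPENING_STATES = {"ALARM", "UNDETERMINED"}   # states that may open an incident
ACTIVE_STATUSES = {"OPEN", "ACKNOWLEDGED"}   # incidents that still accept events


@dataclass
class Incident:
    alarm_id: str
    status: str = "OPEN"
    events: List[str] = field(default_factory=list)


class IncidentStore:
    """Hypothetical store that keeps the latest incident per alarm."""

    def __init__(self) -> None:
        self._latest: Dict[str, Incident] = {}

    def handle_transition(self, alarm_id: str, new_state: str) -> Optional[Incident]:
        incident = self._latest.get(alarm_id)
        if incident is not None and incident.status in ACTIVE_STATUSES:
            # Existing OPEN or ACKNOWLEDGED incident: add the event and
            # leave the status untouched, whatever the new alarm state is.
            incident.events.append(new_state)
            return incident
        if new_state in OPENING_STATES:
            # No incident, or the last one is RESOLVED: open a new one.
            # Assumption: the triggering event is stored on the new incident.
            incident = Incident(alarm_id, events=[new_state])
            self._latest[alarm_id] = incident
            return incident
        # Transition to OK with no active incident: nothing is done.
        return None


store = IncidentStore()
first = store.handle_transition("alarm-1", "ALARM")
same = store.handle_transition("alarm-1", "OK")
ignored = store.handle_transition("alarm-2", "OK")
```

Note that a transition to OK never changes the incident status in this sketch, which matches the rules above: an incident becomes RESOLVED only when someone resolves it.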
 
 
Acknowledging an incident is processed as follows:
 
# Set the incident status to ACKNOWLEDGED.
 
# If an incident is acknowledged, it won't generate any additional notifications, even if it receives new alarm state transition events.
 
 
Resolving an incident is processed as follows:
 
# Set the incident status to RESOLVED.
 
# If an incident is resolved, it won't generate any additional notifications.
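The effect of acknowledging and resolving on notifications can be sketched as follows. The method names (`acknowledge`, `resolve`, `record_event`) are hypothetical, and the sketch assumes that an OPEN incident sends one notification per alarm state transition event, which the text implies but does not state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    alarm_id: str
    status: str = "OPEN"
    notifications: List[str] = field(default_factory=list)

    def acknowledge(self) -> None:
        self.status = "ACKNOWLEDGED"

    def resolve(self) -> None:
        self.status = "RESOLVED"

    def record_event(self, alarm_state: str) -> bool:
        """Return True if a notification was sent for this event."""
        if self.status != "OPEN":
            # ACKNOWLEDGED and RESOLVED incidents stay silent.
            return False
        # Assumption: OPEN incidents notify on every event.
        self.notifications.append(f"{self.alarm_id} -> {alarm_state}")
        return True


incident = Incident("alarm-1")
sent_while_open = incident.record_event("ALARM")
incident.acknowledge()
sent_after_ack = incident.record_event("UNDETERMINED")
incident.resolve()
sent_after_resolve = incident.record_event("ALARM")
```

The sketch does not restrict which status changes are allowed (for example, acknowledging an incident that is already RESOLVED), because the text does not define that.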
 
 
Assigning or reassigning an incident is processed as follows:
 
# When an incident is created, it is unassigned. It can be assigned or reassigned later.
 

Latest revision as of 15:14, 24 April 2015