Monasca/Incident Manager

Use Cases

  1. Create a new incident
  2. Display all incidents in Ops Console
  3. Display all open, acknowledged or resolved incidents in Ops Console
  4. Display all open, acknowledged or resolved incidents assigned to a user in Ops Console
  5. Acknowledge an incident in Ops Console
  6. Resolve an incident in Ops Console

Concepts

  • Incidents
    • An incident is created when an alarm transitions to the ALARM or UNDETERMINED state, and it is associated with that alarm.
    • Incidents enable alarms to
      • Have their status tracked
      • Be assigned to users
      • Be commented on by users
    • An incident has one of three statuses
      • OPEN: When an incident is created, it is OPEN.
      • ACKNOWLEDGED: When an incident is being worked on, it is ACKNOWLEDGED.
      • RESOLVED: When an incident is closed, it is RESOLVED.
    • Some of the concepts around incidents are "borrowed" from PagerDuty. See https://developer.pagerduty.com/documentation/rest/incidents.
  • Alarm
    • An alarm has one of three states
      • OK
      • ALARM
      • UNDETERMINED
  • Alarm state transition event
    • An event created by the Threshold Engine when an alarm transitions from one state to another.
  • Assignment/Owner
    • The user that the incident is assigned to.
  • Comment
    • A comment on an incident.
  • Actions
    • As with alarm definition actions in Monasca, incidents can have actions that run when an incident is modified.
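
The concepts above can be tied together in a minimal in-memory sketch. It is illustrative only and not the Monasca implementation: the names IncidentManager, on_alarm_transition, list_incidents and the other methods are hypothetical, as is the choice to represent actions as plain Python callables. The page does not say whether status changes must follow a fixed order, or whether a second transition on the same alarm reuses an existing incident, so the sketch enforces neither.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class AlarmState(Enum):
    OK = "OK"
    ALARM = "ALARM"
    UNDETERMINED = "UNDETERMINED"


class IncidentStatus(Enum):
    OPEN = "OPEN"
    ACKNOWLEDGED = "ACKNOWLEDGED"
    RESOLVED = "RESOLVED"


@dataclass
class Incident:
    incident_id: int
    alarm_id: str  # the alarm this incident is associated with
    status: IncidentStatus = IncidentStatus.OPEN  # new incidents start OPEN
    assignee: Optional[str] = None
    comments: List[str] = field(default_factory=list)


class IncidentManager:
    """Hypothetical in-memory manager used only to illustrate the concepts."""

    def __init__(self, actions: Optional[List[Callable[[Incident], None]]] = None):
        self.incidents: Dict[int, Incident] = {}
        # Assumption: an action is any callable invoked with the modified incident.
        self.actions = actions or []
        self._next_id = 1

    def _modified(self, incident: Incident) -> Incident:
        # Run every registered action after an incident has been modified.
        for action in self.actions:
            action(incident)
        return incident

    def on_alarm_transition(self, alarm_id: str, new_state: AlarmState) -> Optional[Incident]:
        """Handle an alarm state transition event; create an incident if needed."""
        if new_state not in (AlarmState.ALARM, AlarmState.UNDETERMINED):
            return None  # a transition to OK does not create an incident
        incident = Incident(self._next_id, alarm_id)
        self.incidents[incident.incident_id] = incident
        self._next_id += 1
        return incident

    def assign(self, incident_id: int, user: str) -> Incident:
        incident = self.incidents[incident_id]
        incident.assignee = user
        return self._modified(incident)

    def comment(self, incident_id: int, text: str) -> Incident:
        incident = self.incidents[incident_id]
        incident.comments.append(text)
        return self._modified(incident)

    def acknowledge(self, incident_id: int) -> Incident:
        incident = self.incidents[incident_id]
        incident.status = IncidentStatus.ACKNOWLEDGED
        return self._modified(incident)

    def resolve(self, incident_id: int) -> Incident:
        incident = self.incidents[incident_id]
        incident.status = IncidentStatus.RESOLVED
        return self._modified(incident)

    def list_incidents(self, status: Optional[IncidentStatus] = None,
                       assignee: Optional[str] = None) -> List[Incident]:
        """Return incidents, optionally filtered by status and/or assignee."""
        return [
            inc for inc in self.incidents.values()
            if (status is None or inc.status is status)
            and (assignee is None or inc.assignee == assignee)
        ]
```

Calling list_incidents() with no arguments corresponds to use case 2, while passing a status, an assignee, or both corresponds to use cases 3 and 4. In this sketch, actions run only on modification and not on creation, which is one reading of the Actions concept above rather than documented behaviour.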