
Difference between revisions of "Monasca/Message Schema"

(Created page with "{| class="wikitable" |- ! Message !! Produced By !! Consumed By !! Kafka Topic !! Description |- | Metric || API, Transform and Aggregation Engine || Persister, Threshold Engi...")
 
m (Roland Hochmuth moved page Monasca/Monasca Message Schema to Monasca/Message Schema)
 
(59 intermediate revisions by 3 users not shown)

Latest revision as of 10:10, 3 February 2015

Overview

Monasca supports two primary classifications of messages that are published and consumed via the MessageQ:

  1. Value objects, such as metrics and events, that are sent to the API from some external data source, such as the Monasca Agent.
  2. Domain events that represent a change to some entity in the system as follows:
    1. An alarm definition being created, updated or deleted by the API.
    2. An alarm being updated or deleted by the API.
    3. An alarm transitioning in state by the Threshold Engine.

Currently, the message formats that are published and consumed by Monasca have been developed for internal use only. The messages are in a JSON format that was developed with little consideration for supporting external or third-party components, applications and services. While the current message format is adequate for internal use and easy for external applications to consume, those applications take on a risk in doing so, because the format may change.

Consequently, we are reviewing the messages used by Monasca and creating a formal message schema specification, so that third-party components, applications and services outside of the Monasca system can easily publish to and consume from the MessageQ without being concerned about future changes. We plan to define the message schema and treat it as an external specification. One use case for doing this involves enabling third-party data analytics applications.

The current messages that are published and consumed by various components in the Monasca system via the MessageQ are as follows:

{| class="wikitable"
|-
! Message !! Produced By !! Consumed By !! Kafka Topic !! Description
|-
| AlarmDefinitionCreatedEvent || API || Threshold Engine || events || When an alarm definition is created by the API, an AlarmDefinitionCreatedEvent is published to the MessageQ.
|-
| AlarmDefinitionDeletedEvent || API || Threshold Engine || events || When an alarm definition is deleted by the API, an AlarmDefinitionDeletedEvent is published to the MessageQ.
|-
| AlarmDefinitionUpdatedEvent || API || Threshold Engine || events || When an alarm definition is updated by the API, an AlarmDefinitionUpdatedEvent is published to the MessageQ.
|-
| AlarmDeletedEvent || API || Threshold Engine || events || When an alarm is deleted by the API, an AlarmDeletedEvent is published to the MessageQ.
|-
| AlarmUpdatedEvent || API || Threshold Engine || events || When an alarm is updated by the API, an AlarmUpdatedEvent is published to the MessageQ.
|-
| AlarmStateTransitionedEvent || Threshold Engine || Notification Engine, Persister || alarm-state-transitions || When an alarm transitions state, such as from OK to ALARM or ALARM to OK, an AlarmStateTransitionedEvent is published to the MessageQ, persisted by the Persister and processed by the Notification Engine. The API can query the history of AlarmStateTransitionedEvents.
|-
| AlarmNotification || Notification Engine || Persister || alarm-notifications || Currently unsupported. This event is published to the MessageQ when the Notification Engine processes an alarm and sends a notification. The alarm notification is persisted by the Persister and can be queried by the API. The database maintains a history of all events.
|-
| Metric || API, Transform and Aggregation Engine || Persister, Threshold Engine || metrics || A metric sent to the API or created by the Transform Engine, Event Engine or Anomaly Engine is published to the MessageQ.
|-
| Event || API, Transform Engine, Event Engine || Persister, Event Engine || raw-events, transformed-events || An event sent to the API or created by the Transform Engine or Event Engine is published to the MessageQ.
|}
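
As a rough illustration of the table above, the routing of message types to Kafka topics could be captured in a simple lookup. This is only a sketch: the TOPICS mapping and the topics_for helper are hypothetical names that are not part of Monasca, and the topic names are copied from the table.

```python
# Hypothetical lookup of message type -> Kafka topic(s), copied from the table.
TOPICS = {
    "AlarmDefinitionCreatedEvent": ("events",),
    "AlarmDefinitionDeletedEvent": ("events",),
    "AlarmDefinitionUpdatedEvent": ("events",),
    "AlarmDeletedEvent": ("events",),
    "AlarmUpdatedEvent": ("events",),
    "AlarmStateTransitionedEvent": ("alarm-state-transitions",),
    "AlarmNotification": ("alarm-notifications",),  # currently unsupported
    "Metric": ("metrics",),
    "Event": ("raw-events", "transformed-events"),
}


def topics_for(message_type):
    """Return the tuple of topics that a message type is published to."""
    try:
        return TOPICS[message_type]
    except KeyError:
        raise ValueError("unknown message type: %s" % message_type)
```

A third-party consumer interested only in alarm state changes would, under this sketch, subscribe to the topics returned by topics_for("AlarmStateTransitionedEvent").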

Alarm Definition Events

Alarm Definition Events are created when an alarm definition is created, deleted or updated by the API. These events are primarily meant to be consumed by the Threshold Engine so that it can maintain a synchronized in-memory model of the alarm definitions in the system without having to query or poll the Config Database. When the Threshold Engine starts up, it queries the Config Database for all the alarm definitions to initialize its internal model of alarm definitions. Subsequently, the Threshold Engine consumes AlarmDefinitionEvents to update its internal model of the alarm definitions.

We should consider renaming the members. For example, AlarmDefinitionCreatedEvent has member alarmName.

All of the AlarmDefinition*Event messages currently hold only the data needed by the Threshold Engine. They should be updated to carry all the data about what changed, for example the actions.
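
The synchronization pattern described in this section can be sketched as follows. This is not the Threshold Engine's actual code: the AlarmDefinitionModel class is a hypothetical name, the initial list stands in for the result of the Config Database query, and the only thing taken from this page is the single-key JSON envelope ("alarm-definition-created", "alarm-definition-updated", "alarm-definition-deleted") used in the examples.

```python
import json


class AlarmDefinitionModel:
    """Hypothetical in-memory model of alarm definitions, keyed by id."""

    def __init__(self, initial_definitions):
        # initial_definitions stands in for the start-up query against the
        # Config Database; each item is a dict with an "alarmDefinitionId".
        self.definitions = {
            d["alarmDefinitionId"]: dict(d) for d in initial_definitions
        }

    def apply(self, raw_message):
        """Apply one alarm definition event (a JSON string) to the model."""
        message = json.loads(raw_message)
        # Each message is an envelope with exactly one key naming its type.
        (kind, body), = message.items()
        definition_id = body["alarmDefinitionId"]
        if kind in ("alarm-definition-created", "alarm-definition-updated"):
            self.definitions[definition_id] = body
        elif kind == "alarm-definition-deleted":
            self.definitions.pop(definition_id, None)
        else:
            raise ValueError("unexpected message type: %s" % kind)
```

Storing the whole event body on create and update is a simplification made for the sketch; a real consumer would likely keep only the fields it needs.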

AlarmDefinitionCreatedEvent

  • String tenantId
  • String alarmDefinitionId
  • String alarmName
  • String alarmDescription
  • String alarmExpression
  • Map<String, AlarmSubExpression> alarmSubExpressions
  • List<String> matchBy

Example

{
     "alarm-definition-created": {
         "tenantId": "69a6aeb64a5f4704b88dcf1985d43184",
         "alarmDefinitionId": "0fe0b88f-8b06-459a-8bed-50d114c4f07b",
         "alarmName": "example-alarm-definition",
         "alarmDescription": "example-alarm-definition-description",
         "alarmExpression": "max(cpu.user_perc) > 100",
         "alarmSubExpressions": {
             "7b790964-67de-4f70-b625-cca5da0119d3": {
                 "function": "MAX",
                 "metricDefinition": {
                     "name": "cpu.user_perc",
                     "dimensions": {
                         "component": "monasca-agent",
                         "service": "monitoring",
                         "hostname": "devstack"
                     }
                 },
                 "operator": "GT",
                 "threshold": 100,
                 "period": 60,
                 "periods": 1,
                 "expression": "max(cpu.user_perc) > 100.0"
             }
         },
         "matchBy": [
             "hostname"
         ]
     }
 }
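
For consumers that prefer typed access to the fields listed above, a minimal parsing sketch might look like the following. The dataclass and the parse_alarm_definition_created function are hypothetical and only mirror the member list of AlarmDefinitionCreatedEvent; sub-expressions are left as plain dictionaries.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class AlarmDefinitionCreatedEvent:
    """Mirror of the members listed for AlarmDefinitionCreatedEvent."""
    tenantId: str
    alarmDefinitionId: str
    alarmName: str
    alarmDescription: str
    alarmExpression: str
    alarmSubExpressions: Dict[str, Dict[str, Any]]
    matchBy: List[str]


def parse_alarm_definition_created(raw):
    """Parse a JSON string wrapped in the "alarm-definition-created" envelope.

    Raises KeyError if the envelope key is absent and TypeError if the body
    has missing or unexpected members.
    """
    envelope = json.loads(raw)
    body = envelope["alarm-definition-created"]
    return AlarmDefinitionCreatedEvent(**body)
```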

AlarmDefinitionDeletedEvent

  • String alarmDefinitionId
  • Map<String, MetricDefinition> subAlarmMetricDefinitions

Example

  {
       "alarm-definition-deleted": {
           "alarmDefinitionId": "19afaacf-35ad-44c0-a02f-96a0a20d6e6d",
           "subAlarmMetricDefinitions": {
               "c5fa8686-597b-420b-97ed-648637c1f702": {
                   "name": "cpu.user_perc",
                   "dimensions": {
                       "component": "monasca-agent",
                       "service": "monitoring",
                       "hostname": "devstack"
                   }
               }
           }
       }
   }
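
A consumer that keeps per-metric state may want to know which metric definitions a deleted alarm definition referred to. The sketch below assumes that use case, which this page does not describe; metric_keys is a hypothetical helper that turns each entry of subAlarmMetricDefinitions into a hashable (name, dimensions) key.

```python
def metric_keys(deleted_body):
    """Map each sub-alarm id to a hashable (name, sorted dimensions) key.

    deleted_body is the dict found under "alarm-definition-deleted".
    """
    keys = {}
    for sub_alarm_id, metric in deleted_body["subAlarmMetricDefinitions"].items():
        # Sort the dimensions so that equal metric definitions compare equal.
        dimensions = tuple(sorted(metric["dimensions"].items()))
        keys[sub_alarm_id] = (metric["name"], dimensions)
    return keys
```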

AlarmDefinitionUpdatedEvent

  • String tenantId
  • String alarmDefinitionId
  • String alarmName
  • String alarmDescription
  • String alarmExpression
  • String severity
  • List<String> matchBy
  • boolean alarmActionsEnabled
  • Map<String, AlarmSubExpression> oldAlarmSubExpressions
  • Map<String, AlarmSubExpression> changedSubExpressions
  • Map<String, AlarmSubExpression> unchangedSubExpressions
  • Map<String, AlarmSubExpression> newAlarmSubExpressions

Example

 {
     "alarm-definition-updated": {
         "tenantId": "69a6aeb64a5f4704b88dcf1985d43184",
         "alarmDefinitionId": "0fe0b88f-8b06-459a-8bed-50d114c4f07b",
         "alarmName": "example-alarm-definition",
         "alarmDescription": "example-alarm-definition-description",
         "alarmExpression": "max(cpu.user_perc) > 100",
         "severity": "HIGH",
         "matchBy": [
             "hostname"
         ],
         "alarmActionsEnabled": true,
         "oldAlarmSubExpressions": {},
         "changedSubExpressions": {},
         "unchangedSubExpressions": {
             "7b790964-67de-4f70-b625-cca5da0119d3": {
                 "function": "MAX",
                 "metricDefinition": {
                     "name": "cpu.user_perc",
                     "dimensions": {
                         "component": "monasca-agent",
                         "service": "monitoring",
                         "hostname": "devstack"
                     }
                 },
                 "operator": "GT",
                 "threshold": 100,
                 "period": 60,
                 "periods": 1,
                 "expression": "max(cpu.user_perc) > 100.0"
             }
         },
         "newAlarmSubExpressions": {}
     }
 }
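
The four sub-expression maps suggest how a consumer could rebuild the full set of sub-expressions after an update. The sketch below rests on an assumption that this page does not state explicitly: oldAlarmSubExpressions holds entries that were removed, while the changed, unchanged and new maps together make up the current set. The function name is hypothetical.

```python
def sub_expressions_after_update(updated_body):
    """Return (current, removed_ids) for an "alarm-definition-updated" body.

    Assumption: the current set is the union of the unchanged, changed and
    new maps, and the ids in oldAlarmSubExpressions are no longer in use.
    """
    current = {}
    for member in ("unchangedSubExpressions",
                   "changedSubExpressions",
                   "newAlarmSubExpressions"):
        current.update(updated_body.get(member, {}))
    removed_ids = set(updated_body.get("oldAlarmSubExpressions", {}))
    return current, removed_ids
```

Applied to the example above, this would yield a single current sub-expression (the unchanged one) and no removed ids.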

Alarm Events

Alarms are created by the Threshold Engine when new metrics sent to the system match an alarm definition. Currently, the Threshold Engine does not publish an AlarmCreatedEvent when this occurs. Note that alarms are not created by the API.

When an alarm is deleted or updated by the API, an AlarmDeletedEvent or AlarmUpdatedEvent is created. AlarmDeletedEvents and AlarmUpdatedEvents are primarily meant to be consumed by the Threshold Engine so that it can delete or update its internal in-memory model of alarms.
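
A consumer of these two events could keep its own view of alarms in step along the following lines. This is a sketch only: apply_alarm_event is a hypothetical function, the alarms dictionary is an assumed data structure, and the member names (alarmId, alarmDefinitionId, alarmState) follow the AlarmDeletedEvent and AlarmUpdatedEvent descriptions on this page.

```python
def apply_alarm_event(alarms, message):
    """Update the alarms dict in place from one decoded alarm event.

    alarms maps alarmId -> {"alarmDefinitionId": ..., "state": ...}.
    message is the decoded JSON envelope with a single type key.
    """
    (kind, body), = message.items()
    alarm_id = body["alarmId"]
    if kind == "alarm-deleted":
        # Deleting an alarm that is not tracked is treated as a no-op.
        alarms.pop(alarm_id, None)
    elif kind == "alarm-updated":
        alarms[alarm_id] = {
            "alarmDefinitionId": body["alarmDefinitionId"],
            "state": body["alarmState"],
        }
    else:
        raise ValueError("unexpected message type: %s" % kind)
    return alarms
```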

AlarmCreatedEvent

The AlarmCreatedEvent used to be published by the API when a new alarm was created. After we added the alarm definition feature, alarms were no longer created by the API, but by the Threshold Engine instead. However, the Threshold Engine does not publish the AlarmCreatedEvent, as it was the only consumer of it. We should probably consider publishing an AlarmCreatedEvent from the Threshold Engine in the future.

  • String tenantId
  • String alarmId
  • String alarmName
  • String alarmExpression
  • Map<String, AlarmSubExpression> alarmSubExpressions

Example

 {
     "metric": {
         "name": "cpu.user_perc",
         "timestamp": 1418844000,
         "value": 100
     },
     "meta": {
         "tenantId": "69a6aeb64a5f4704b88dcf1985d43184",
         "region": "useast"
     },
     "creation_time": 1418844001
 }

AlarmDeletedEvent

  • String tenantId
  • String alarmId
  • List<MetricDefinition> alarmMetrics
  • String alarmDefinitionId
  • Map<String, AlarmSubExpression> subAlarms

Example

   {
       "alarm-deleted": {
           "tenantId": "69a6aeb64a5f4704b88dcf1985d43184",
           "alarmId": "04def1c4-b6ee-4d05-8da8-39559a8e2a9e",
           "alarmMetrics": [
               {
                   "name": "net.in_errors",
                   "dimensions": {
                       "component": "monasca-agent",
                       "service": "monitoring",
                       "device": "eth0",
                       "hostname": "mini-mon"
                   }
               },
               {
                   "name": "net.out_errors",
                   "dimensions": {
                       "component": "monasca-agent",
                       "service": "monitoring",
                       "device": "eth0",
                       "hostname": "mini-mon"
                   }
               },
               {
                   "name": "net.in_errors",
                   "dimensions": {
                       "component": "monasca-agent",
                       "service": "monitoring",
                       "device": "eth1",
                       "hostname": "mini-mon"
                   }
               },
               {
                   "name": "net.out_errors",
                   "dimensions": {
                       "component": "monasca-agent",
                       "service": "monitoring",
                       "device": "eth1",
                       "hostname": "mini-mon"
                   }
               }
           ],
           "alarmDefinitionId": "64938145-91de-470f-85b5-2f03730c8560",
           "subAlarms": {
               "df171878-3bcf-4d02-9cbc-d7a7fe2f9965": {
                   "function": "MAX",
                   "metricDefinition": {
                       "name": "net.in_errors",
                       "dimensions": {}
                   },
                   "operator": "GT",
                   "threshold": 5,
                   "period": 60,
                   "periods": 1,
                   "expression": "max(net.in_errors) > 5.0"
               },
               "c5f550a6-0382-4a74-969b-4c92b73e0446": {
                   "function": "MAX",
                   "metricDefinition": {
                       "name": "net.out_errors",
                       "dimensions": {}
                   },
                   "operator": "GT",
                   "threshold": 5,
                   "period": 60,
                   "periods": 1,
                   "expression": "max(net.out_errors) > 5.0"
               }
           }
       }
   }

AlarmUpdatedEvent

  • String alarmId
  • String tenantId
  • String alarmDefinitionId
  • List<MetricDefinition> alarmMetrics
  • Map<String, AlarmSubExpression> subAlarms
  • AlarmState alarmState
  • AlarmState oldAlarmState

Example

 {
     "alarm-updated": {
         "alarmId": "2a913e31-36a0-4ef7-88c0-4aaa5392273b",
         "tenantId": "69a6aeb64a5f4704b88dcf1985d43184",
         "alarmDefinitionId": "fef92344-7c20-4c83-ae47-e613224b3503",
         "alarmMetrics": [
             {
                 "name": "monasca.collection_time_sec",
                 "dimensions": {
                     "component": "monasca-agent",
                     "service": "monitoring",
                     "hostname": "mini-mon"
                 }
             }
         ],
         "subAlarms": {
             "82c23ead-352b-42a9-aa7f-dff3bf5c30f8": {
                 "function": "AVG",
                 "metricDefinition": {
                     "name": "monasca.collection_time_sec",
                     "dimensions": {
                         "component": "monasca-agent",
                         "service": "monitoring",
                         "hostname": "mini-mon"
                     }
                 },
                 "operator": "GT",
                 "threshold": 5,
                 "period": 60,
                 "periods": 3,
                 "expression": "avg(monasca.collection_time_sec) > 5.0 times 3"
             }
         },
         "alarmState": "UNDETERMINED",
         "oldAlarmState": "OK"
     }
 }

AlarmStateTransitionedEvent

An AlarmStateTransitionedEvent is created when an alarm changes state. These events are primarily published by the Threshold Engine and consumed by the Persister and Notification Engine. The AlarmStateTransitionedEvent consists of the following:

  • String tenantId
  • String alarmId
  • String alarmDefinitionId
  • List<MetricDefinition> metrics
  • String alarmName
  • String alarmDescription
  • AlarmState oldState
  • AlarmState newState
  • boolean actionsEnabled
  • String stateChangeReason
  • String severity
  • long timestamp

Example

   {
       "alarm-transitioned": {
           "tenantId": "8ada618268ec43709a2ab8eb8ea7996c",
           "alarmId": "80e0426e-3a32-4166-ad2a-d28e4a7bc34b",
           "alarmDefinitionId": "609c2c1a-2e1d-4b0b-9b3a-21ea7f649e1b",
           "metrics": [
               {
                   "name": "cpu.system_perc",
                   "dimensions": {
                       "component": "monasca-agent",
                       "service": "monitoring",
                       "hostname": "devstack"
                   }
               },
               {
                   "name": "load.avg_1_min",
                   "dimensions": {
                       "component": "monasca-agent",
                       "service": "monitoring",
                       "hostname": "mini-mon"
                   }
               },
               {
                   "name": "cpu.system_perc",
                   "dimensions": {
                       "component": "monasca-agent",
                       "service": "monitoring",
                       "hostname": "mini-mon"
                   }
               }
           ],
           "alarmName": "high cpu and load",
           "alarmDescription": "System CPU Utilization exceeds 1% and Load exceeds 3 per measurement period",
           "oldState": "UNDETERMINED",
           "newState": "ALARM",
           "actionsEnabled": true,
           "stateChangeReason": "Thresholds were exceeded for the sub-alarms: [max(cpu.system_perc) > 0.0, max(load.avg_1_min{hostname=mini-mon}) > 0.0]",
           "severity": "LOW",
           "timestamp": 1421258195
       }
   }
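As an illustration of consuming this message, the Python sketch below checks that an alarm-transitioned message carries the fields listed above and then applies a simple notification rule. The names REQUIRED_FIELDS, parse_alarm_transitioned and should_notify are hypothetical, and the rule itself is an assumption rather than the Notification Engine's actual logic.

```python
import json

# Field names taken from the AlarmStateTransitionedEvent list above.
REQUIRED_FIELDS = (
    "tenantId", "alarmId", "alarmDefinitionId", "metrics", "alarmName",
    "alarmDescription", "oldState", "newState", "actionsEnabled",
    "stateChangeReason", "severity", "timestamp",
)


def parse_alarm_transitioned(raw_message):
    """Return the body of an alarm-transitioned message after checking fields."""
    body = json.loads(raw_message)["alarm-transitioned"]
    missing = [name for name in REQUIRED_FIELDS if name not in body]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return body


def should_notify(body):
    """Assumed rule: notify only when actions are enabled and the state changed."""
    return bool(body["actionsEnabled"]) and body["oldState"] != body["newState"]
```

Under this assumed rule, the example message above (UNDETERMINED to ALARM with actionsEnabled set to true) would qualify for a notification.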

AlarmNotification

Currently unsupported. This is a placeholder for future development when we add support for storing the notifications that have been sent.

Metrics Message

A metric that is sent to the API or created by the Transform Engine, Event Engine or Anomaly Engine is published to the MessageQ as a MetricsMessage. A MetricsMessage has the following fields:

  • MetricDefinition metric
  • meta:
    • String tenantId
    • String region: Should remove "region" as this isn't being used.
  • creation_time:

Example

   {
       "metric": {
           "name": "monasca.collection_time_sec",
           "dimensions": {
               "component": "monasca-agent",
               "service": "monitoring",
               "hostname": "devstack"
           },
           "timestamp": 1421259363,
           "value": 8.01378607749939
       },
       "meta": {
           "tenantId": "8ada618268ec43709a2ab8eb8ea7996c",
           "region": "useast"
       },
       "creation_time": 1421259371
   }

Note, the Metrics Message is inconsistent with the other messages in that it doesn't carry the message type. This is OK today because metrics are published on a separate topic and there are so many of them, but this should probably be resolved.
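To make the inconsistency concrete, here is a hedged Python sketch of how a consumer reading several kinds of messages might have to tell them apart. The function name message_type and the label "metric" returned for a Metrics Message are hypothetical; the top-level keys are taken from the examples on this page.

```python
import json


def message_type(raw_message):
    """Infer the type of a message from its top-level shape."""
    message = json.loads(raw_message)
    # A Metrics Message has no type envelope, so it is recognized by its keys.
    if message.keys() >= {"metric", "meta", "creation_time"}:
        return "metric"
    # Other messages wrap their body in a single key that names the type,
    # for example "alarm-updated" or "alarm-transitioned".
    if len(message) == 1:
        return next(iter(message))
    raise ValueError("unrecognized message shape")
```

If the Metrics Message carried its type like the other messages, the first branch would not be needed.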

Event Message

TBD. Currently, there is a proof-of-concept implementation of events in Monasca, but the details haven't been formalized.

Types

This section describes the specific types used in messages.

AlarmSubExpression

  • AggregateFunction function
  • MetricDefinition metricDefinition
  • AlarmOperator operator
  • double threshold
  • int period
  • int periods

MetricDefinition

  • String name
  • Map<String, String> dimensions

Enums

AlarmState

  • UNDETERMINED
  • OK
  • ALARM

AggregateFunction

  • MIN
  • MAX
  • SUM
  • COUNT
  • AVG

AlarmOperator

  • LT("<")
  • LTE("<=")
  • GT(">")
  • GTE(">=")

AlarmSeverity

  • LOW
  • MEDIUM
  • HIGH
  • CRITICAL
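The sketch below shows one way the AggregateFunction and AlarmOperator values could be interpreted when checking an AlarmSubExpression against a set of measurements. It is an illustration only: the function name exceeds_threshold is hypothetical, the mapping of enum names to Python functions is an assumption, and the period and periods fields are ignored, so this is not a description of how the Threshold Engine evaluates alarms.

```python
import operator

# Assumed mapping from AggregateFunction names to Python callables.
AGGREGATE_FUNCTIONS = {
    "MIN": min,
    "MAX": max,
    "SUM": sum,
    "COUNT": len,
    "AVG": lambda values: sum(values) / len(values),
}

# Assumed mapping from AlarmOperator names to comparison functions.
ALARM_OPERATORS = {
    "LT": operator.lt,
    "LTE": operator.le,
    "GT": operator.gt,
    "GTE": operator.ge,
}


def exceeds_threshold(sub_expression, values):
    """Aggregate one non-empty list of values and compare it to the threshold."""
    aggregate = AGGREGATE_FUNCTIONS[sub_expression["function"]](values)
    compare = ALARM_OPERATORS[sub_expression["operator"]]
    return compare(aggregate, sub_expression["threshold"])
```

For example, a sub-expression with function MAX, operator GT and threshold 5 evaluates to true for the values 1, 7 and 3, because the maximum, 7, is greater than 5.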

Avro Schema

A prototype schema implementation using Avro is available at https://github.com/roland-hochmuth/monasca-schema