Monasca/Message Schema

Overview
Monasca supports two primary classifications of messages that are published and consumed via the MessageQ:


 * 1) Value objects, such as metrics and events, that are sent to the API from some external data source, such as the Monasca Agent.
 * 2) Domain events that represent a change to some entity in the system, as follows:
   * An alarm definition being created, updated or deleted by the API.
   * An alarm being updated or deleted by the API.
   * An alarm transitioning in state by the Threshold Engine.

Currently, the message formats that are published and consumed by Monasca have been developed for internal use only. The messages are in a JSON format that was developed largely without consideration for supporting external or third-party components, applications and services. While the current message format is adequate for internal use and easy for external applications to consume, there is a risk to them in doing so, because the format may change in the future.

Consequently, we are in the process of reviewing the messages used by Monasca and creating a formal message schema specification. We plan to treat the message schema as an external specification, such that third-party components, applications and services outside of the Monasca system can publish to and consume from the MessageQ without being concerned about future changes. One use case for doing this involves enabling third-party data analytics applications.

The current messages that are published and consumed by various components in the Monasca system via the MessageQ are as follows:

Alarm Definition Events
Alarm Definition Events are created when an alarm definition is created, deleted or updated by the API. These events are meant to be primarily consumed by the Threshold Engine to maintain a synchronized in-memory model of the alarm definitions in the system, without having to query or poll the Config Database. When the Threshold Engine starts up, it queries the Config Database for all the alarm definitions to initialize its internal model of alarm definitions. Subsequently, the Threshold Engine consumes AlarmDefinitionEvents to update its internal model of the alarm definitions.

We should consider renaming the members. For example, AlarmDefinitionCreatedEvent has member alarmName.

All of the AlarmDefinition*Event messages currently hold only the data needed by the Threshold Engine. They should be updated to have all the data about what changed, for example the actions.

AlarmDefinitionCreatedEvent

 * String tenantId
 * String alarmDefinitionId
 * String alarmName
 * String alarmDescription
 * String alarmExpression
 * Map alarmSubExpressions
 * List matchBy

Example
{
    "alarm-definition-created": {
        "tenantId": "69a6aeb64a5f4704b88dcf1985d43184",
        "alarmDefinitionId": "0fe0b88f-8b06-459a-8bed-50d114c4f07b",
        "alarmName": "example-alarm-definition",
        "alarmDescription": "example-alarm-definition-description",
        "alarmExpression": "max(cpu.user_perc) > 100",
        "alarmSubExpressions": {
            "7b790964-67de-4f70-b625-cca5da0119d3": {
                "function": "MAX",
                "metricDefinition": {
                    "name": "cpu.user_perc",
                    "dimensions": { "component": "monasca-agent", "service": "monitoring", "hostname": "devstack" }
                },
                "operator": "GT",
                "threshold": 100,
                "period": 60,
                "periods": 1,
                "expression": "max(cpu.user_perc) > 100.0"
            }
        },
        "matchBy": [ "hostname" ]
    }
}

AlarmDefinitionDeletedEvent

 * String alarmDefinitionId
 * Map subAlarmMetricDefinitions

Example
{
    "alarm-definition-deleted": {
        "alarmDefinitionId": "19afaacf-35ad-44c0-a02f-96a0a20d6e6d",
        "subAlarmMetricDefinitions": {
            "c5fa8686-597b-420b-97ed-648637c1f702": {
                "name": "cpu.user_perc",
                "dimensions": { "component": "monasca-agent", "service": "monitoring", "hostname": "devstack" }
            }
        }
    }
}

AlarmDefinitionUpdatedEvent

 * String tenantId
 * String alarmDefinitionId
 * String alarmName
 * String alarmDescription
 * String alarmExpression
 * String severity
 * List matchBy
 * boolean alarmActionsEnabled
 * Map oldAlarmSubExpressions
 * Map changedSubExpressions
 * Map unchangedSubExpressions
 * Map newAlarmSubExpressions

Example
{
    "alarm-definition-updated": {
        "tenantId": "69a6aeb64a5f4704b88dcf1985d43184",
        "alarmDefinitionId": "0fe0b88f-8b06-459a-8bed-50d114c4f07b",
        "alarmName": "example-alarm-definition",
        "alarmDescription": "example-alarm-definition-description",
        "alarmExpression": "max(cpu.user_perc) > 100",
        "severity": "HIGH",
        "matchBy": [ "hostname" ],
        "alarmActionsEnabled": true,
        "oldAlarmSubExpressions": {},
        "changedSubExpressions": {},
        "unchangedSubExpressions": {
            "7b790964-67de-4f70-b625-cca5da0119d3": {
                "function": "MAX",
                "metricDefinition": {
                    "name": "cpu.user_perc",
                    "dimensions": { "component": "monasca-agent", "service": "monitoring", "hostname": "devstack" }
                },
                "operator": "GT",
                "threshold": 100,
                "period": 60,
                "periods": 1,
                "expression": "max(cpu.user_perc) > 100.0"
            }
        },
        "newAlarmSubExpressions": {}
    }
}
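
The way a consumer such as the Threshold Engine could keep an in-memory model synchronized from these three events can be sketched as follows. This is a minimal illustration, not the Threshold Engine's actual code: the function name apply_alarm_definition_event and the layout of the model dict are hypothetical, and the rule that the surviving sub-expressions after an update are the unchanged, changed and new ones is an assumption based on the field names.

```python
import json


def apply_alarm_definition_event(model, raw_message):
    """Apply one alarm definition event to `model`, a dict keyed by alarmDefinitionId."""
    message = json.loads(raw_message)
    # Each event is wrapped in a single top-level key that names its type.
    ((event_type, body),) = message.items()
    definition_id = body["alarmDefinitionId"]

    if event_type == "alarm-definition-created":
        model[definition_id] = {
            "alarmName": body["alarmName"],
            "alarmExpression": body["alarmExpression"],
            "subExpressions": dict(body["alarmSubExpressions"]),
            "matchBy": list(body.get("matchBy", [])),
        }
    elif event_type == "alarm-definition-updated":
        # Assumption: the surviving sub-expressions are the unchanged, changed
        # and new ones; entries in oldAlarmSubExpressions are dropped.
        sub_expressions = {}
        for key in ("unchangedSubExpressions", "changedSubExpressions",
                    "newAlarmSubExpressions"):
            sub_expressions.update(body.get(key, {}))
        model[definition_id] = {
            "alarmName": body["alarmName"],
            "alarmExpression": body["alarmExpression"],
            "subExpressions": sub_expressions,
            "matchBy": list(body.get("matchBy", [])),
        }
    elif event_type == "alarm-definition-deleted":
        # Deleting an unknown definition is treated as a no-op.
        model.pop(definition_id, None)
    else:
        raise ValueError("unexpected event type: " + event_type)
    return model
```

Starting from a model loaded at startup, each consumed message would be passed through this function in order, so the model never has to be rebuilt by polling.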

Alarm Events
Alarms are created by the Threshold Engine when new metrics sent to the system match an alarm definition. Currently, AlarmCreatedEvents are not created by the Threshold Engine when this occurs. Note, alarms are not created by the API.

When alarms are deleted or updated by the API, an AlarmDeletedEvent or AlarmUpdatedEvent is created, respectively. AlarmDeletedEvents and AlarmUpdatedEvents are primarily meant to be consumed by the Threshold Engine to delete or update its internal in-memory model of alarms.

AlarmCreatedEvent
The AlarmCreatedEvent used to be published by the API when a new alarm was created. After we added the alarm definition feature, alarms were no longer created by the API, but were created by the Threshold Engine instead. However, the Threshold Engine does not publish the AlarmCreatedEvent, as it was the only consumer of it. We should probably consider publishing an AlarmCreatedEvent in the Threshold Engine in the future.


 * String tenantId
 * String alarmId
 * String alarmName
 * String alarmExpression
 * Map alarmSubExpressions

Example (note: this payload has the shape of a Metrics Message rather than the fields listed above)
{
    "metric": {
        "name": "cpu.user_perc",
        "timestamp": 1418844000,
        "value": 100
    },
    "meta": {
        "tenantId": "69a6aeb64a5f4704b88dcf1985d43184",
        "region": "useast"
    },
    "creation_time": 1418844001
}

AlarmDeletedEvent

 * String tenantId
 * String alarmId
 * List alarmMetrics
 * String alarmDefinitionId
 * Map subAlarms

Example
{
    "alarm-deleted": {
        "tenantId": "69a6aeb64a5f4704b88dcf1985d43184",
        "alarmId": "04def1c4-b6ee-4d05-8da8-39559a8e2a9e",
        "alarmMetrics": [
            {
                "name": "net.in_errors",
                "dimensions": { "component": "monasca-agent", "service": "monitoring", "device": "eth0", "hostname": "mini-mon" }
            },
            {
                "name": "net.out_errors",
                "dimensions": { "component": "monasca-agent", "service": "monitoring", "device": "eth0", "hostname": "mini-mon" }
            },
            {
                "name": "net.in_errors",
                "dimensions": { "component": "monasca-agent", "service": "monitoring", "device": "eth1", "hostname": "mini-mon" }
            },
            {
                "name": "net.out_errors",
                "dimensions": { "component": "monasca-agent", "service": "monitoring", "device": "eth1", "hostname": "mini-mon" }
            }
        ],
        "alarmDefinitionId": "64938145-91de-470f-85b5-2f03730c8560",
        "subAlarms": {
            "df171878-3bcf-4d02-9cbc-d7a7fe2f9965": {
                "function": "MAX",
                "metricDefinition": { "name": "net.in_errors", "dimensions": {} },
                "operator": "GT",
                "threshold": 5,
                "period": 60,
                "periods": 1,
                "expression": "max(net.in_errors) > 5.0"
            },
            "c5f550a6-0382-4a74-969b-4c92b73e0446": {
                "function": "MAX",
                "metricDefinition": { "name": "net.out_errors", "dimensions": {} },
                "operator": "GT",
                "threshold": 5,
                "period": 60,
                "periods": 1,
                "expression": "max(net.out_errors) > 5.0"
            }
        }
    }
}

AlarmUpdatedEvent

 * String alarmId
 * String tenantId
 * String alarmDefinitionId
 * List alarmMetrics
 * Map subAlarms
 * AlarmState alarmState
 * AlarmState oldAlarmState

Example
{
    "alarm-updated": {
        "alarmId": "2a913e31-36a0-4ef7-88c0-4aaa5392273b",
        "tenantId": "69a6aeb64a5f4704b88dcf1985d43184",
        "alarmDefinitionId": "fef92344-7c20-4c83-ae47-e613224b3503",
        "alarmMetrics": [
            {
                "name": "monasca.collection_time_sec",
                "dimensions": { "component": "monasca-agent", "service": "monitoring", "hostname": "mini-mon" }
            }
        ],
        "subAlarms": {
            "82c23ead-352b-42a9-aa7f-dff3bf5c30f8": {
                "function": "AVG",
                "metricDefinition": {
                    "name": "monasca.collection_time_sec",
                    "dimensions": { "component": "monasca-agent", "service": "monitoring", "hostname": "mini-mon" }
                },
                "operator": "GT",
                "threshold": 5,
                "period": 60,
                "periods": 3,
                "expression": "avg(monasca.collection_time_sec) > 5.0 times 3"
            }
        },
        "alarmState": "UNDETERMINED",
        "oldAlarmState": "OK"
    }
}

AlarmStateTransitionedEvent
An AlarmStateTransitionedEvent is created when an alarm state changes. These events are primarily published by the Threshold Engine and consumed by the Persister and Notification Engine. The AlarmStateTransitionedEvent consists of the following:


 * String tenantId
 * String alarmId
 * String alarmDefinitionId
 * List metrics
 * String alarmName
 * String alarmDescription
 * AlarmState oldState
 * AlarmState newState
 * boolean actionsEnabled
 * String stateChangeReason
 * String severity
 * long timestamp

Example
{
    "alarm-transitioned": {
        "tenantId": "8ada618268ec43709a2ab8eb8ea7996c",
        "alarmId": "80e0426e-3a32-4166-ad2a-d28e4a7bc34b",
        "alarmDefinitionId": "609c2c1a-2e1d-4b0b-9b3a-21ea7f649e1b",
        "metrics": [
            {
                "name": "cpu.system_perc",
                "dimensions": { "component": "monasca-agent", "service": "monitoring", "hostname": "devstack" }
            },
            {
                "name": "load.avg_1_min",
                "dimensions": { "component": "monasca-agent", "service": "monitoring", "hostname": "mini-mon" }
            },
            {
                "name": "cpu.system_perc",
                "dimensions": { "component": "monasca-agent", "service": "monitoring", "hostname": "mini-mon" }
            }
        ],
        "alarmName": "high cpu and load",
        "alarmDescription": "System CPU Utilization exceeds 1% and Load exceeds 3 per measurement period",
        "oldState": "UNDETERMINED",
        "newState": "ALARM",
        "actionsEnabled": true,
        "stateChangeReason": "Thresholds were exceeded for the sub-alarms: [max(cpu.system_perc) > 0.0, max(load.avg_1_min{hostname=mini-mon}) > 0.0]",
        "severity": "LOW",
        "timestamp": 1421258195
    }
}
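
As an illustration of how a consumer might read this event, the sketch below unwraps the "alarm-transitioned" envelope and builds a short summary. The function name summarize_transition and the shouldNotify rule (actions enabled and the state actually changed) are hypothetical and are not taken from the Notification Engine. The sketch also assumes that timestamp is in seconds since the Unix epoch, which the example value suggests but the field list does not state.

```python
import json
from datetime import datetime, timezone


def summarize_transition(raw_message):
    """Return a small summary dict for one AlarmStateTransitionedEvent."""
    event = json.loads(raw_message)["alarm-transitioned"]
    # Assumption: "timestamp" is seconds since the Unix epoch (UTC).
    when = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    changed = event["oldState"] != event["newState"]
    return {
        "alarmId": event["alarmId"],
        # Hypothetical rule: only act when actions are enabled and the state changed.
        "shouldNotify": bool(event["actionsEnabled"]) and changed,
        "summary": "[{}] {}: {} -> {} at {}".format(
            event["severity"],
            event["alarmName"],
            event["oldState"],
            event["newState"],
            when.isoformat(),
        ),
    }
```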

AlarmNotification
Currently unsupported. This is a placeholder for future development when we add support for storing the notifications that have been sent.

Metrics Message
A metric that is sent to the API or created by the Transform Engine, Event Engine or Anomaly Engine is published to the MessageQ as a MetricsMessage. A MetricsMessage has the following fields:
 * MetricDefinition metric
 * meta:
   * String tenantId
   * String region: Should remove "region" as this isn't being used.
 * creation_time

Example
{
    "metric": {
        "name": "monasca.collection_time_sec",
        "dimensions": { "component": "monasca-agent", "service": "monitoring", "hostname": "devstack" },
        "timestamp": 1421259363,
        "value": 8.01378607749939
    },
    "meta": {
        "tenantId": "8ada618268ec43709a2ab8eb8ea7996c",
        "region": "useast"
    },
    "creation_time": 1421259371
}

Note, the Metrics Message is inconsistent with the other messages in that it doesn’t have the type of message in it. This is OK today because it is on a different topic and there are so many of them, but this should probably be resolved.
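
The inconsistency noted above can be made concrete with a small sketch. The helper names build_metrics_message and message_type are hypothetical. The sketch assumes that every event message has exactly one top-level key naming its type, as in the examples on this page, and that a metrics message is recognized by its three top-level keys instead.

```python
import time


def build_metrics_message(name, dimensions, value, timestamp,
                          tenant_id, region, creation_time=None):
    """Build a dict with the same layout as the Metrics Message example."""
    return {
        "metric": {
            "name": name,
            "dimensions": dict(dimensions),
            "timestamp": timestamp,
            "value": value,
        },
        "meta": {"tenantId": tenant_id, "region": region},
        # Default to the current time in whole seconds when none is given.
        "creation_time": int(time.time()) if creation_time is None else creation_time,
    }


def message_type(message):
    """Return the type of a decoded message.

    Event messages carry their type as the single top-level key. A metrics
    message has no such key, so it is detected by shape.
    """
    if len(message) == 1:
        (event_type,) = message
        return event_type
    if {"metric", "meta", "creation_time"} <= set(message):
        return "metric"
    raise ValueError("unrecognized message shape")
```

A consumer reading from a single topic would need a shape check like the one in message_type, which is one reason to consider adding an explicit type to the Metrics Message.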

Event Message
TBD. Currently, there is a proof-of-concept implementation of events in Monasca, but the details haven't been formalized.

Types
This section describes the specific types used in messages.

AlarmSubExpression

 * AggregateFunction function
 * MetricDefinition metricDefinition
 * AlarmOperator operator
 * double threshold
 * int period
 * int periods

MetricDefinition

 * String name
 * Map dimensions

AlarmState

 * UNDETERMINED
 * OK
 * ALARM

AggregateFunction

 * MIN
 * MAX
 * SUM
 * COUNT
 * AVG

AlarmOperator

 * LT("<")
 * LTE("<=")
 * GT(">")
 * GTE(">=")

AlarmSeverity

 * LOW
 * MEDIUM
 * HIGH
 * CRITICAL
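
The types above map naturally onto enums and small records. The following is a minimal sketch, not the classes used by Monasca itself: the Python names, the from_json helper and the exceeded_by method are hypothetical. It assumes, as the examples on this page show, that function and operator are serialized by enum name (for example "MAX" and "GT"). The evaluation applies the aggregate function to a single list of values and ignores period and periods, so it is far simpler than what the Threshold Engine does.

```python
from dataclasses import dataclass
from enum import Enum
from operator import ge, gt, le, lt
from statistics import mean


class AggregateFunction(Enum):
    MIN = "MIN"
    MAX = "MAX"
    SUM = "SUM"
    COUNT = "COUNT"
    AVG = "AVG"


class AlarmOperator(Enum):
    LT = "<"
    LTE = "<="
    GT = ">"
    GTE = ">="


# Lookup tables from enum members to the Python callables that implement them.
_AGGREGATES = {
    AggregateFunction.MIN: min,
    AggregateFunction.MAX: max,
    AggregateFunction.SUM: sum,
    AggregateFunction.COUNT: len,
    AggregateFunction.AVG: mean,
}
_COMPARATORS = {
    AlarmOperator.LT: lt,
    AlarmOperator.LTE: le,
    AlarmOperator.GT: gt,
    AlarmOperator.GTE: ge,
}


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    dimensions: dict


@dataclass(frozen=True)
class AlarmSubExpression:
    function: AggregateFunction
    metric_definition: MetricDefinition
    operator: AlarmOperator
    threshold: float
    period: int
    periods: int

    @classmethod
    def from_json(cls, data):
        """Build a sub-expression from a decoded JSON object."""
        metric = data["metricDefinition"]
        return cls(
            function=AggregateFunction[data["function"]],
            metric_definition=MetricDefinition(metric["name"], dict(metric["dimensions"])),
            operator=AlarmOperator[data["operator"]],
            threshold=float(data["threshold"]),
            period=int(data["period"]),
            periods=int(data["periods"]),
        )

    def exceeded_by(self, values):
        """Aggregate a non-empty list of values and compare it to the threshold."""
        aggregate = _AGGREGATES[self.function](values)
        return _COMPARATORS[self.operator](aggregate, self.threshold)
```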

Avro Schema
A prototype schema implementation using Avro is available at https://github.com/roland-hochmuth/monasca-schema