Monasca/UI UX Support

Introduction

The following additions and modifications are a work in progress to support UI/UX-related queries, including alarm counts, advanced filtering, and sorted results.

Alarm Count Resource

This is the proposed specification for a resource that provides a way to query how many alarms match the specified criteria. There are two important things to note about the resource. First, the group_by field is limited to 'alarm_definition_id', 'name', 'state', 'severity', 'link', 'lifecycle_state', 'metric_name', 'dimension_name', 'dimension_value'. Second, if metric_name, dimension_name, or dimension_value is specified in group_by, the resulting counts are not guaranteed to add up to the total number of alarms in the system. An alarm may contain several different metrics (depending on its alarm definition) and can therefore be included in more than one count when results are grouped by any of these three fields.
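The overlap between groups can be illustrated with a small in-memory sketch. The alarm records and the count_by_metric_name helper are hypothetical, made up for illustration only, and are not part of the Monasca API. The point is simply that an alarm with two metrics contributes to two groups, so the group counts sum to more than the number of alarms.

```python
from collections import Counter

# Hypothetical alarms: each has an id and the names of the metrics it covers.
alarms = [
    {"id": "a1", "metric_names": ["cpu.system_perc"]},
    {"id": "a2", "metric_names": ["cpu.system_perc", "cpu.idle_perc"]},
    {"id": "a3", "metric_names": ["cpu.idle_perc"]},
]


def count_by_metric_name(alarms):
    """Count alarms per metric name; a multi-metric alarm lands in several groups."""
    counts = Counter()
    for alarm in alarms:
        # set() so that one alarm is counted at most once per metric name
        for name in set(alarm["metric_names"]):
            counts[name] += 1
    return counts


counts = count_by_metric_name(alarms)
total_of_groups = sum(counts.values())

# The group counts add up to more than the number of alarms,
# because alarm "a2" appears in both groups.
print(dict(counts), total_of_groups, len(alarms))
```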

Alarm Count

GET /v2.0/alarms/count
- alarm_definition_id (string, optional) - Alarm definition ID to filter by.
- metric_name (string(255), optional) - Name of metric to filter by.
- metric_dimensions ({string(255): string(255)}, optional) - Dimensions of metrics to filter by, specified as a comma-separated list of key:value pairs, as key1:value1,key2:value2, ...
- state (string, optional) - State of alarm to filter by, either OK, ALARM or UNDETERMINED.
- lifecycle_state (string(50), optional) - Lifecycle state to filter by.
- link (string(512), optional) - Link to filter by.
- state_updated_start_time (string, optional) - The start time in ISO 8601 combined date and time format in UTC.
- offset (string, optional)
- limit (integer, optional)
- group_by (string, optional) - A comma-separated list of fields to group the results by, as field1,field2,… See above for the permitted values.

Request Example

GET /v2.0/alarms/count?metric_name=cpu.system_perc&metric_dimensions=hostname:devstack&group_by=state,lifecycle_state
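As a rough illustration, the request above could be assembled as follows. build_alarm_count_url is a hypothetical helper, not part of any Monasca client library, and the host is the one that appears in the response example in this document. The sketch assumes that ':' and ',' may be left unescaped in the query string, as they are in the request example.

```python
from urllib.parse import urlencode


def build_alarm_count_url(base_url, metric_name=None, metric_dimensions=None, group_by=None):
    """Build an alarm count URL (hypothetical helper, for illustration only)."""
    params = {}
    if metric_name:
        params["metric_name"] = metric_name
    if metric_dimensions:
        # Dimensions are sent as key1:value1,key2:value2,...
        params["metric_dimensions"] = ",".join(
            f"{key}:{value}" for key, value in metric_dimensions.items()
        )
    if group_by:
        # Group-by fields are sent as field1,field2,...
        params["group_by"] = ",".join(group_by)
    # Assumption: ':' and ',' stay unescaped, matching the request example.
    query = urlencode(params, safe=":,")
    return f"{base_url}/v2.0/alarms/count" + (f"?{query}" if query else "")


url = build_alarm_count_url(
    "http://192.168.10.4:8080",
    metric_name="cpu.system_perc",
    metric_dimensions={"hostname": "devstack"},
    group_by=["state", "lifecycle_state"],
)
print(url)
```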

Response Body

Returns a JSON object containing the following fields:

links ([link]) - Links to alarms count resource
columns ([string]) - List of the column names, in the order they were returned
counts ([array[]]) - A two dimensional array of the counts returned

Response Example

Note that the example below contains no row for the category ('OK', 'ACKNOWLEDGED'). A category is omitted from the response when no alarms in the system match it.

   {
       "links": [
           {
               "rel": "self",
               "href": "http://192.168.10.4:8080/v2.0/alarms/count?metric_name=cpu.system_perc&metric_dimensions=hostname%3Adevstack&group_by=state,lifecycle_state"
           }
       ],
       "columns": ["count", "state", "lifecycle_state"],
       "counts": [
           [124, "ALARM", "ACKNOWLEDGED"],
           [12, "ALARM", "RESOLVED"],
           [235, "OK", "OPEN"],
           [61, "OK", "RESOLVED"],
           [13, "UNDETERMINED", "ACKNOWLEDGED"],
           [1, "UNDETERMINED", "OPEN"],
           [2, "UNDETERMINED", "RESOLVED"]
       ]
   }
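A client consuming a response of this shape might turn the rows into a lookup keyed by the group-by values. The sketch below is one possible approach, not a prescribed one. The counts_by_group helper and the shortened sample body are illustrative assumptions; treating an absent category as a count of zero follows from the note above that categories with no matching alarms are omitted.

```python
import json

# Shortened sample body shaped like the response example (values are illustrative).
body = """
{
    "columns": ["count", "state", "lifecycle_state"],
    "counts": [
        [124, "ALARM", "ACKNOWLEDGED"],
        [235, "OK", "OPEN"]
    ]
}
"""


def counts_by_group(response):
    """Map each group (tuple of the non-count column values) to its count."""
    columns = response["columns"]
    count_idx = columns.index("count")
    result = {}
    for row in response["counts"]:
        # Everything except the count column identifies the group.
        key = tuple(value for i, value in enumerate(row) if i != count_idx)
        result[key] = row[count_idx]
    return result


lookup = counts_by_group(json.loads(body))

# A category missing from the response means no alarms matched it.
ok_acknowledged = lookup.get(("OK", "ACKNOWLEDGED"), 0)
print(lookup, ok_acknowledged)
```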

Alarm Sorting

This is the proposed specification for sorting alarms. If no sort fields are specified, the default of alarm_id is used. Alarms can be sorted by one or more of the following fields: 'alarm_id', 'alarm_definition_id', 'state', 'severity', 'lifecycle_state', 'link', 'state_updated_timestamp', 'updated_timestamp', 'created_timestamp'.

New Request Query Parameters

sort_by (string) - A comma separated list of fields to sort by, defaults to 'alarm_id'
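A caller could check the requested fields against the permitted list before sending the request. The following is a minimal sketch under that assumption. build_sort_by and SORTABLE_FIELDS are hypothetical names introduced here, and rejecting unknown fields on the client side is an illustrative choice rather than behaviour defined by this specification.

```python
# Fields listed in this specification as valid sort keys.
SORTABLE_FIELDS = {
    "alarm_id", "alarm_definition_id", "state", "severity", "lifecycle_state",
    "link", "state_updated_timestamp", "updated_timestamp", "created_timestamp",
}


def build_sort_by(fields=None):
    """Return the sort_by value for a list of fields (hypothetical helper)."""
    if not fields:
        # The specification's default sort key.
        return "alarm_id"
    unknown = [field for field in fields if field not in SORTABLE_FIELDS]
    if unknown:
        raise ValueError(f"unsupported sort field(s): {', '.join(unknown)}")
    return ",".join(fields)


print(build_sort_by(["state", "severity"]))
```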

Monasca Query Language

Proposed specification for a Monasca Query Language (MQL). The language will expand the capabilities when searching for metrics and creating alarm definitions. The latest Monasca Query Language code can be found at https://github.com/hpcloud-mon/monasca_query_language

Functions

Initially the language will support the existing statistics functions: max, min, avg, sum, count. The language will be extensible to new functions and will support some nesting of functions. Additional functions may be added to support searching metric definitions.

Operators

The language will allow basic math on vectors/ranges (addition, subtraction, multiplication, division).

Syntax

The query language will include the following primitives for use with functions and operations:

scalar - single integer/float value
vector - set of scalar values across multiple time series
range - set of scalar values in a single time series across a span of time
string

A metric selector will define a specific metric to return, using the dimension comparison operators (=, !=, =~, !~):

<metric_name>{<dimension_key><op><dimension_value>,…}
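The way a selector might be applied to a single metric can be sketched as below. This is an illustration under stated assumptions, not the MQL implementation: the specification does not define the operator semantics, so the sketch assumes '=' and '!=' are exact string comparisons and '=~' and '!~' are full regular-expression matches using Python's re module. The selector_matches and dimension_matches helpers, the metric dictionary layout, and the treatment of a missing dimension as an empty string are all hypothetical.

```python
import re


def dimension_matches(actual, op, expected):
    """Apply one dimension comparison (assumed semantics, see lead-in)."""
    if op == "=":
        return actual == expected
    if op == "!=":
        return actual != expected
    if op == "=~":
        return re.fullmatch(expected, actual) is not None
    if op == "!~":
        return re.fullmatch(expected, actual) is None
    raise ValueError(f"unknown operator: {op}")


def selector_matches(metric, name, matchers):
    """Check a metric against a name and a list of (key, op, value) matchers."""
    if metric["name"] != name:
        return False
    dimensions = metric["dimensions"]
    # Assumption: a dimension the metric does not have compares as "".
    return all(
        dimension_matches(dimensions.get(key, ""), op, value)
        for key, op, value in matchers
    )


metric = {
    "name": "net.in_bytes_sec",
    "dimensions": {"hostname": "testhost_01", "device": "eth0"},
}
matchers = [("hostname", "=", "testhost_01"), ("device", "=~", "eth0|eth1")]
print(selector_matches(metric, "net.in_bytes_sec", matchers))
```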

Functions will contain any arithmetic expression that evaluates to a primitive type the function understands.

<function>(<expression>)

Arithmetic expressions will link statements (functions, metric selectors, scalars, constants, etc.) with simple arithmetic operations (+, -, *, /) and will support standard order of operations and nesting:

<statement> <op> <statement> …

Boolean expressions evaluate a comparison between two arithmetic expressions, where the valid comparisons are >, >=, <, and <=.

Logical expressions will link Boolean expressions together with logical operators (and, or) and will support standard order of operations and nesting:

<boolean expression> <op> <boolean expression> …

Some basic query examples:

avg(net.in_bytes_sec{hostname=testhost_01,device="eth0|eth1"}[25m])
- corresponds to a vector of values representing the average number of bytes per second over the last 25 minutes for each eth0 and eth1 time series

avg(net.in_bytes_sec{hostname=testhost_01,device="eth0|eth1"}[5m over 25m])
- is the same query as above but one average per 5 minutes

avg(net.out_bytes_sec{hostname=testhost_01,device="eth0|eth1"}[5m over 25m]) offset 1w
- is the same form of query, here on net.out_bytes_sec, using data from 1 week ago

net.in_bytes_sec + net.out_bytes_sec
- takes the last point of each series and adds them together

avg(net.in_bytes_sec [5m]) + avg(net.out_bytes_sec [5m])
- is the same as above, but averages each for the past 5 minutes first

avg(net.in_bytes_sec [5m] + net.out_bytes_sec [5m])
- this attempts to align the points and add them together before taking the avg

net.in_bytes_sec > net.in_bytes_sec offset 1d
- this will return true if the current value is above the value from 1 day ago

net.in_bytes_sec > avg(net.in_bytes_sec [5m] offset 1d)
- same as above, but compares to a 5 minute average from 1 day ago

net.in_bytes_sec > net.out_bytes_sec or net.in_bytes_sec < net.out_bytes_sec
- this will return true if the last values for each series were different
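The difference between avg(net.in_bytes_sec [5m]) + avg(net.out_bytes_sec [5m]) and avg(net.in_bytes_sec [5m] + net.out_bytes_sec [5m]) only shows up when the two series do not have the same set of points. The sketch below illustrates this with made-up samples. How MQL aligns points is not specified here, so the sketch assumes alignment by exact timestamp with unmatched points dropped; the function names and sample data are hypothetical.

```python
from statistics import mean

# Hypothetical samples as {timestamp_seconds: value}.
in_bytes = {0: 10.0, 60: 20.0, 120: 30.0}
out_bytes = {0: 1.0, 60: 2.0}  # the point at t=120 is missing


def avg_then_add(a, b):
    """Average each series on its own, then add the two averages."""
    return mean(a.values()) + mean(b.values())


def add_then_avg(a, b):
    """Add points that share a timestamp, then average the sums.

    Assumption: points without a partner in the other series are dropped.
    """
    shared = sorted(a.keys() & b.keys())
    return mean(a[t] + b[t] for t in shared)


print(avg_then_add(in_bytes, out_bytes))
print(add_then_avg(in_bytes, out_bytes))
```

With fully aligned series of equal length the two forms give the same result, so the choice matters mainly when points are missing or sampled at different times.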

Alarming

Alarming would include the same syntax and capabilities as the metrics queries, with the caveat that each expression must evaluate to a Boolean value. A few examples would be:

cpu.idle_perc{} > 90
max(apache.latency{method="POST"}) > max(apache.latency{method="UPDATE|PATCH"})
avg(messages_sec{component=monasca-api}) / avg(messages_sec{component=monasca-persister}) > 1.0
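To make the last example concrete, the sketch below evaluates the same ratio comparison over two lists of sample values. It is only an illustration of how such an expression reduces to a Boolean. The ratio_alarm function and the sample rates are hypothetical, and returning None when the denominator average is zero is an assumption, since this specification does not say how division by zero is handled.

```python
from statistics import mean


def ratio_alarm(api_rates, persister_rates, threshold=1.0):
    """Evaluate avg(api) / avg(persister) > threshold for two sample lists."""
    persister_avg = mean(persister_rates)
    if persister_avg == 0:
        # Assumption: the result is treated as undetermined rather than an error.
        return None
    return mean(api_rates) / persister_avg > threshold


# The API handles more messages per second than the persister: alarm fires.
print(ratio_alarm([120.0, 130.0], [100.0, 100.0]))
# The persister keeps up: alarm does not fire.
print(ratio_alarm([50.0], [100.0]))
```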