Monasca/UI UX Support

Introduction
The following additions and modifications are a work in progress to support UI/UX-related queries, including alarm counts, advanced filtering, and sorted results.

Alarm Count Resource
This is the proposed specification for a resource that reports how many alarms match the specified criteria. Two points are worth noting. First, the group_by field is limited to 'alarm_definition_id', 'name', 'state', 'severity', 'link', 'lifecycle_state', 'metric_name', 'dimension_name', and 'dimension_value'. Second, if group_by includes metric_name, dimension_name, or dimension_value, the resulting counts are not guaranteed to add up to the total number of alarms in the system. An alarm may contain multiple metrics (depending on its alarm definition) and can therefore be included in more than one count when grouped by any of these three fields.
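The overlap described above can be illustrated with a minimal Python sketch. The alarm records and the `count_by_metric_name` helper are hypothetical and exist only to show the counting rule; they are not part of the Monasca API.

```python
from collections import Counter

# Hypothetical alarms; each may reference several metrics via its definition.
alarms = [
    {"id": "a1", "metrics": ["cpu.system_perc", "cpu.idle_perc"]},
    {"id": "a2", "metrics": ["cpu.system_perc"]},
    {"id": "a3", "metrics": ["net.in_bytes_sec"]},
]


def count_by_metric_name(alarms):
    """Count alarms per metric name; an alarm counts once per distinct metric."""
    counts = Counter()
    for alarm in alarms:
        for name in set(alarm["metrics"]):
            counts[name] += 1
    return counts


counts = count_by_metric_name(alarms)
# Alarm a1 appears under two metric names, so the grouped counts sum to more
# than the number of alarms.
total_grouped = sum(counts.values())
```

Here three alarms produce grouped counts that sum to four, because alarm a1 is counted under both of its metrics.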

Alarm Count

 * GET /v2.0/alarms/count
 * alarm_definition_id (string, optional) - Alarm definition ID to filter by.
 * metric_name (string(255), optional) - Name of metric to filter by.
 * metric_dimensions ({string(255): string(255)}, optional) - Dimensions of metrics to filter by, specified as a comma-separated list of key:value pairs as key1:value1,key2:value2, ...
 * state (string, optional) - State of alarm to filter by, either OK, ALARM or UNDETERMINED.
 * lifecycle_state (string(50), optional) - Lifecycle state to filter by.
 * link (string(512), optional) - Link to filter by.
 * state_updated_start_time (string, optional) - The start time in ISO 8601 combined date and time format in UTC.
 * offset (string, optional)
 * limit (integer, optional)
 * group_by (string, optional) - A comma-separated list of fields to group the results by, as field1,field2, ... See above for the permitted values.

Request Example
GET /v2.0/alarms/count?metric_name=cpu.system_perc&metric_dimensions=hostname:devstack&group_by=state,lifecycle_state
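As a rough sketch, a client could assemble such a request URL as follows. The `build_alarm_count_url` helper and its argument names are hypothetical; only the path and the query parameter names come from the specification above, and the base URL is taken from the response example on this page.

```python
from urllib.parse import urlencode


def build_alarm_count_url(base_url, metric_name=None, metric_dimensions=None, group_by=None):
    """Build a GET URL for the proposed /v2.0/alarms/count resource (hypothetical helper)."""
    params = {}
    if metric_name:
        params["metric_name"] = metric_name
    if metric_dimensions:
        # Dimensions are serialized as key1:value1,key2:value2
        params["metric_dimensions"] = ",".join(
            f"{key}:{value}" for key, value in metric_dimensions.items()
        )
    if group_by:
        params["group_by"] = ",".join(group_by)
    # Keep ':' and ',' readable instead of percent-encoding them.
    query = urlencode(params, safe=":,")
    return f"{base_url}/v2.0/alarms/count" + (f"?{query}" if query else "")


url = build_alarm_count_url(
    "http://192.168.10.4:8080",
    metric_name="cpu.system_perc",
    metric_dimensions={"hostname": "devstack"},
    group_by=["state", "lifecycle_state"],
)
```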

Response Body
Returns a JSON object containing the following fields:
 * links ([link]) - Links to alarms count resource
 * columns ([string]) - List of the column names, in the order they were returned
 * counts ([array[]]) - A two dimensional array of the counts returned

Response Example
Note that the example below has no entry for the 'OK', 'ACKNOWLEDGED' category. A category is omitted from the response when no alarms in the system match it.

 {
     "links": [
         {
             "rel": "self",
             "href": "http://192.168.10.4:8080/v2.0/alarms/count?metric_name=cpu.system_perc&metric_dimensions=hostname%3Adevstack&group_by=state,lifecycle_state"
         }
     ],
     "columns": ["count", "state", "lifecycle_state"],
     "counts": [
         [124, "ALARM", "ACKNOWLEDGED"],
         [12, "ALARM", "RESOLVED"],
         [235, "OK", "OPEN"],
         [61, "OK", "RESOLVED"],
         [13, "UNDETERMINED", "ACKNOWLEDGED"],
         [1, "UNDETERMINED", "OPEN"],
         [2, "UNDETERMINED", "RESOLVED"]
     ]
 }
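A client consuming this response needs to pair each row in counts with the names in columns and to treat a missing category as zero. The following Python sketch shows one way to do that; the `rows_as_dicts` and `count_for` helpers are hypothetical, and the sample body is a trimmed version of the response shape described above.

```python
import json

# Trimmed sample in the shape of the proposed response body.
body = json.loads("""
{
  "columns": ["count", "state", "lifecycle_state"],
  "counts": [
    [124, "ALARM", "ACKNOWLEDGED"],
    [235, "OK", "OPEN"],
    [1, "UNDETERMINED", "OPEN"]
  ]
}
""")


def rows_as_dicts(body):
    """Pair every row of counts with the column names."""
    return [dict(zip(body["columns"], row)) for row in body["counts"]]


def count_for(body, **group):
    """Return the count of the first row matching the given group values, or 0 if none does."""
    for row in rows_as_dicts(body):
        if all(row.get(key) == value for key, value in group.items()):
            return row["count"]
    return 0


ok_open = count_for(body, state="OK", lifecycle_state="OPEN")
ok_acknowledged = count_for(body, state="OK", lifecycle_state="ACKNOWLEDGED")
```

Because absent categories are simply not returned, `ok_acknowledged` falls back to 0 rather than raising an error.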

Alarm Sorting
This is the proposed specification for sorting alarms. If no sort fields are given, results are sorted by alarm_id. Alarms can be sorted by one or more of the following fields: 'alarm_id', 'alarm_definition_id', 'state', 'severity', 'lifecycle_state', 'link', 'state_updated_timestamp', 'updated_timestamp', and 'created_timestamp'.

New Request Query Parameters

 * sort_by (string) - A comma-separated list of fields to sort by. Defaults to 'alarm_id'.
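A client might validate the requested fields before sending them. The sketch below is one possible approach; `build_sort_by` is a hypothetical helper, and raising an error on unsupported fields is an assumption, since the specification does not say how invalid fields are handled.

```python
# Sortable fields as listed in the proposed specification.
SORTABLE_FIELDS = {
    "alarm_id", "alarm_definition_id", "state", "severity", "lifecycle_state",
    "link", "state_updated_timestamp", "updated_timestamp", "created_timestamp",
}


def build_sort_by(fields=None):
    """Return the sort_by query value, falling back to the documented default."""
    if not fields:
        return "alarm_id"
    unknown = [field for field in fields if field not in SORTABLE_FIELDS]
    if unknown:
        # Assumption: reject unsupported fields on the client side.
        raise ValueError(f"unsupported sort fields: {', '.join(unknown)}")
    return ",".join(fields)


default_sort = build_sort_by()
custom_sort = build_sort_by(["state", "severity"])
```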

Monasca Query Language
This is the proposed specification for a Monasca Query Language (MQL). The language will expand the capabilities available when searching for metrics and creating alarm definitions. The latest Monasca Query Language code can be found at https://github.com/hpcloud-mon/monasca_query_language

Functions
Initially, the language will support the existing statistics functions: max, min, avg, sum, and count. The language will be extensible with new functions and will support some nesting of functions. Additional functions may be added to support searching metric definitions.

Operators
The language will allow basic math on vectors/ranges (addition, subtraction, multiplication, division).

Syntax
The query language will include the following primitives for use with functions and operations:
 * scalar - single integer/float value
 * vector - set of scalar values across multiple time series
 * range - set of scalar values in a single time series across a span of time
 * string

A metric selector will define a specific metric to return, using the dimension comparison operators (=, !=, =~, !~)

{ …
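The four dimension comparison operators can be sketched in Python as follows. The semantics here are assumptions, not part of the proposal: = and != compare values exactly, =~ and !~ match a regular expression against the whole dimension value, and a selector on a dimension the metric does not have never matches. The `matches` helper and the tuple representation of selectors are hypothetical.

```python
import re

# Assumed semantics: exact comparison for = and !=, full regex match for =~ and !~.
OPERATORS = {
    "=": lambda actual, expected: actual == expected,
    "!=": lambda actual, expected: actual != expected,
    "=~": lambda actual, pattern: re.fullmatch(pattern, actual) is not None,
    "!~": lambda actual, pattern: re.fullmatch(pattern, actual) is None,
}


def matches(dimensions, selectors):
    """Return True if every (key, operator, value) selector holds for the metric's dimensions."""
    return all(
        key in dimensions and OPERATORS[op](dimensions[key], value)
        for key, op, value in selectors
    )


eth0 = {"hostname": "testhost_01", "device": "eth0"}
eth2 = {"hostname": "testhost_01", "device": "eth2"}
selector = [("hostname", "=", "testhost_01"), ("device", "=~", "eth0|eth1")]

eth0_selected = matches(eth0, selector)
eth2_selected = matches(eth2, selector)
```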

Boolean expressions evaluate a comparison between two arithmetic expressions, where the valid comparisons are (>, >=, <, <=).

Logical expressions link Boolean expressions together with logical operators (and, or) and support standard order of operations and nesting.

 …

Some basic query examples:
 * avg(net.in_bytes_sec{hostname=testhost_01,device="eth0|eth1"}[25m])
 * corresponds to a vector of values representing the average number of bytes per second over the last 25 minutes for each of the eth0 and eth1 time series


 * avg(net.in_bytes_sec{hostname=testhost_01,device="eth0|eth1"}[5m over 25m])
 * is the same query as above but one average per 5 minutes


 * avg(net.in_bytes_sec{hostname=testhost_01,device="eth0|eth1"}[5m over 25m]) offset 1w
 * is the same query using data from 1 week ago


 * net.in_bytes_sec + net.out_bytes_sec
 * takes the last point of each series and adds them together


 * avg(net.in_bytes_sec [5m]) + avg(net.out_bytes_sec [5m])
 * is the same as above, but averages each for the past 5 minutes first


 * avg(net.in_bytes_sec [5m] + net.out_bytes_sec [5m])
 * this attempts to align the points and add them together before taking the avg


 * net.in_bytes_sec > net.in_bytes_sec offset 1d
 * this will return true if the current value is above the value from 1 day ago


 * net.in_bytes_sec > avg(net.in_bytes_sec [5m] offset 1d)
 * same as above, but compares to a 5 minute average from 1 day ago


 * net.in_bytes_sec > net.out_bytes_sec or net.in_bytes_sec < net.out_bytes_sec
 * this will return true if the last values for each series were different
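The difference between the arithmetic examples above is easier to see with numbers. The following Python sketch uses hypothetical samples keyed by minute offset. How MQL aligns points is not defined in this proposal, so dropping points that have no partner at the same timestamp is an assumption made only for this illustration.

```python
from statistics import mean

# Hypothetical samples keyed by minute offset; net_out has no point at minute 3.
net_in = {0: 10.0, 1: 20.0, 2: 30.0, 3: 40.0, 4: 50.0}
net_out = {0: 5.0, 1: 5.0, 2: 5.0, 4: 25.0}


def last(series):
    """Most recent point of a series."""
    return series[max(series)]


# net.in_bytes_sec + net.out_bytes_sec
last_sum = last(net_in) + last(net_out)

# avg(net.in_bytes_sec [5m]) + avg(net.out_bytes_sec [5m])
sum_of_avgs = mean(list(net_in.values())) + mean(list(net_out.values()))

# avg(net.in_bytes_sec [5m] + net.out_bytes_sec [5m])
# Assumption: points without a partner at the same timestamp are dropped.
shared = sorted(net_in.keys() & net_out.keys())
avg_of_sum = mean([net_in[t] + net_out[t] for t in shared])

# net.in_bytes_sec > net.out_bytes_sec or net.in_bytes_sec < net.out_bytes_sec
last_values_differ = last(net_in) > last(net_out) or last(net_in) < last(net_out)
```

With these samples the sum of the averages is 40.0 while the average of the aligned sums is 37.5, which shows that the two forms can differ whenever the series do not line up point for point.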

Alarming
Alarming would include the same syntax and capabilities as the metrics queries, with the caveat that each expression evaluates to a Boolean value. A few examples:
 * cpu.idle_perc{} > 90
 * max(apache.latency{method="POST"}) > max(apache.latency{method="UPDATE|PATCH"})
 * avg(messages_sec{component=monasca-api}) / avg(messages_sec{component=monasca-persister}) > 1.0
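As an illustration of how the last expression reduces to a Boolean, here is a minimal Python sketch over hypothetical samples. The `ratio_alarm` helper is not part of the proposal, and treating a zero denominator as "not alarming" is an assumption; the specification does not say how that case is handled.

```python
from statistics import mean

# Hypothetical messages_sec samples for the two components.
api_rate = [120.0, 130.0, 110.0]
persister_rate = [100.0, 100.0, 100.0]


def ratio_alarm(numerator, denominator, threshold=1.0):
    """True when avg(numerator) / avg(denominator) exceeds the threshold."""
    denominator_avg = mean(denominator)
    if denominator_avg == 0:
        # Assumption: an undefined ratio does not trigger the alarm.
        return False
    return mean(numerator) / denominator_avg > threshold


# avg(messages_sec{component=monasca-api}) / avg(messages_sec{component=monasca-persister}) > 1.0
api_outpaces_persister = ratio_alarm(api_rate, persister_rate)
```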