Monasca/Value Metadata

Add the ability to optionally attach name/value pairs to a measurement when it is created in Monasca. The name/value pairs can be read back from the API when the measurement is read. The value of the measurement will still be required so that statistics such as Average, Max, etc. can be computed for the metric. The name/value pairs will be ignored when statistics are requested. At least at first, alarms cannot be generated on the name/value pairs, as that would require a change to the Alarm Definition syntax.

Example: http_status{url=http://localhost:8080/healthcheck, hostname=devstack, service=object-storage}

Today this returns a single value of either 1 or 0, depending on whether the status check succeeded. If it fails, it would be helpful to have the actual HTTP status code and, if possible, the error message. So instead of just a value, the measurement would be something like: {Timestamp=now(), value=0, value_meta{http_rc=500, error_msg="Error accessing MySQL"}}
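
As a rough illustration of the idea, the sketch below builds a measurement with and without the extra name/value pairs and computes an average that ignores them. The dictionary layout and the helper names make_measurement and average are assumptions made for this example only; they are not the Monasca API or its wire format.

```python
import json
import time


def make_measurement(value, value_meta=None, timestamp=None):
    """Build one measurement as a plain dict (hypothetical layout).

    value is always required; value_meta is left out entirely
    when no name/value pairs are supplied.
    """
    measurement = {
        "timestamp": timestamp if timestamp is not None else time.time(),
        "value": value,
    }
    if value_meta:
        measurement["value_meta"] = dict(value_meta)
    return measurement


def average(measurements):
    """Statistics use only the numeric value; value_meta is ignored."""
    return sum(m["value"] for m in measurements) / len(measurements)


ok = make_measurement(1, timestamp=1000)
failed = make_measurement(
    0,
    {"http_rc": "500", "error_msg": "Error accessing MySQL"},
    timestamp=1060,
)

print(json.dumps(failed, sort_keys=True))
print(average([ok, failed]))
```

The successful check carries no value_meta at all, while the failed check carries the status code and error message alongside the value of 0.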

When to use them:

  1. Add extra information to a measurement


When not to use them:

  1. When the extra information is something that should be alarmed on or graphed. For example, we could add response time to http_status, but that would be better as its own metric so it can be graphed and alarmed on.


In the future, it may be possible to alarm on the value of one of the name/value pairs.

The goals for the first sprint:

  1. Allow the API to receive and output the extra name/value pairs
  2. Store the name/value pairs in the database
  3. Be able to generate the extra name/value pairs using the CLI
  4. Validate the API and overall design to ensure the name/value pairs are useful


Non goals for the first sprint:

  1. Have the agent send the extra name/value pairs. This will happen in a later sprint.
  2. Have the UI display the extra name/value pairs
  3. Have the Threshold Engine use the extra name/value pairs

Requirements

  1. value_meta is optional
  2. The maximum number of value_meta fields in a measurement is 16
  3. The value_meta name must not be empty
  4. The value_meta name will have leading and trailing white space trimmed
  5. The maximum size of the value_meta name is 255 characters
  6. Leading and trailing whitespace will be preserved for the value_meta value
  7. The maximum size of the value_meta value is 2048 characters
  8. A measurement returned from the API will only have value_meta if value_meta was set for that measurement when it was created
  9. value_meta cannot be added after a measurement has been created
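
The requirements above could be checked with something like the following sketch. The function name normalize_value_meta and the constants are hypothetical. Several details are assumptions the requirements do not settle: that values are strings, that a name consisting only of whitespace counts as empty, that the name length is measured after trimming, and that two names that collide after trimming are rejected.

```python
MAX_FIELDS = 16          # requirement 2
MAX_NAME_LENGTH = 255    # requirement 5
MAX_VALUE_LENGTH = 2048  # requirement 7


def normalize_value_meta(value_meta):
    """Validate value_meta and return a copy with trimmed names.

    Returns None when value_meta is None, since value_meta is optional.
    Raises ValueError when a requirement is violated.
    """
    if value_meta is None:
        return None
    if len(value_meta) > MAX_FIELDS:
        raise ValueError("at most %d value_meta fields are allowed" % MAX_FIELDS)

    normalized = {}
    for name, value in value_meta.items():
        trimmed = name.strip()  # names are trimmed (requirement 4)
        if not trimmed:
            # Assumption: a whitespace-only name counts as empty.
            raise ValueError("value_meta name must not be empty")
        if len(trimmed) > MAX_NAME_LENGTH:
            # Assumption: the limit applies to the trimmed name.
            raise ValueError("value_meta name is too long")
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError("value_meta value is too long")
        if trimmed in normalized:
            # Assumption: names that collide after trimming are an error.
            raise ValueError("duplicate value_meta name: %r" % trimmed)
        normalized[trimmed] = value  # value whitespace is preserved (requirement 6)
    return normalized


print(normalize_value_meta({" http_rc ": " 500 "}))
```

Note that only the name is trimmed; the value keeps its leading and trailing whitespace.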

Notes:

  1. The UI uses statistics, so it doesn’t need any changes for Grafana graphing. When you select "no statistics", the extra data may cause a problem for Grafana. We will have to try it when we get to that point.


The Threshold Engine could change to use the extra information, but that might be tricky because it doesn’t always know which measurement tripped the alarm. For example, with purely binary data (0 or 1, OK or FAIL), it is clear that the message should come from the failed measurement, which is the last measurement. In the case of avg, it isn’t clear; it is probably all the measurements involved in the alarm, but that could be difficult to represent. Maybe it could be used only for some of the functions in an Alarm Definition.

The first priority is to get a working system so we can confirm it behaves the way we want, so the plan is to first implement this in just one of the Persisters and one of the APIs. Once we’ve proven that it works as needed, we will bring the other Persister and API in sync.

Implementation Tasks:

  1. Create design for writing to InfluxDB. Make decision about 0.8.x vs 0.9.0
  2. Create API definition for writing metadata
  3. Create API definition for returning metadata
  4. Change Measurement class and JSON representation to add name/value pairs
  5. Change Java API to accept the metadata and put it out on Kafka
  6. Change Persister to accept new name/value pairs and write them out to InfluxDB
  7. ………. More to come after design is approved


Questions:

  1. Should metadata fields be returned for a measurement only if there are actually values or should empty values be returned?
  2. Should the API be able to search for specific values in the name/value pairs when returning measurements? For example, find all measurements in a given time period with http_rc=500. We can always let the client do the filtering.
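
For question 2, letting the client do the filtering could look roughly like the sketch below. It assumes measurements have already been fetched and are plain dicts with an optional value_meta entry. That layout and the helper name filter_by_value_meta are illustrative assumptions, not part of the Monasca API.

```python
def filter_by_value_meta(measurements, name, value):
    """Keep only measurements whose value_meta has name set to value.

    Measurements without value_meta are skipped rather than treated
    as errors, since value_meta is optional.
    """
    return [
        m for m in measurements
        if m.get("value_meta", {}).get(name) == value
    ]


# Hypothetical measurements returned for some time period.
measurements = [
    {"timestamp": 1000, "value": 1},
    {"timestamp": 1060, "value": 0,
     "value_meta": {"http_rc": "500", "error_msg": "Error accessing MySQL"}},
    {"timestamp": 1120, "value": 0,
     "value_meta": {"http_rc": "404"}},
]

failures = filter_by_value_meta(measurements, "http_rc", "500")
print([m["timestamp"] for m in failures])  # [1060]
```

The trade-off is that the client has to fetch every measurement in the time period, including those it will discard.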


Current thoughts on InfluxDB design:

  1. Add the names as column names; the values go into the columns
  2. Do not store as tags. This keeps extra series from being created. If the name/value pairs were saved as tags, searching would be faster, but separate series would be created. When the measurements are queried, they would come back in order within each series and then have to be sorted into time order. That would break pagination, since we don’t know how many measurements each series would have
  3. The current release candidate of InfluxDB 0.9.0 doesn’t support strings in the database, but we don’t really want to target 0.8.x since that would be wasted work. Hopefully they will get that fixed soon.
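
To make the concern in item 2 concrete, the sketch below merges two per-series result lists into time order and takes one page from the merged stream. It does not use InfluxDB at all: the series contents, the dict layout and the helper name merge_series are made up for illustration.

```python
import heapq
from itertools import islice


def merge_series(series_results):
    """Merge per-series lists, each already time-ordered, into one
    time-ordered stream."""
    return heapq.merge(*series_results, key=lambda m: m["timestamp"])


# Hypothetical results if http_rc were stored as a tag: one series per tag value.
series_without_tag = [
    {"timestamp": 1, "value": 1},
    {"timestamp": 3, "value": 1},
    {"timestamp": 4, "value": 1},
]
series_http_rc_500 = [
    {"timestamp": 2, "value": 0},
    {"timestamp": 5, "value": 0},
]

# One page of three measurements from the merged stream.
page = list(islice(merge_series([series_without_tag, series_http_rc_500]), 3))
print([m["timestamp"] for m in page])  # [1, 2, 3]
```

The page takes two measurements from one series and one from the other, and that split is only known after merging. A per-series limit or offset therefore cannot be chosen in advance, which is the pagination problem described above.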