Ceilometer/blueprints/monitoring
Latest revision as of 23:29, 17 February 2013
- Launchpad Entry: CeilometerSpec:monitoring
- Created: 28 Nov 2012
- Contributors: Angus Salkeld
Summary
Note this is a big spec and where possible it is broken down into sub-specs to make it easier to share work.
Release Note
Rationale
User stories
The purpose of Alarms is to notify a user when a meter matches certain criteria.
Some examples
- "Tell me when the maximum disk utilization exceeds 90%"
- "Tell me when the average CPU utilization exceeds 80% over 120 seconds"
- "Tell me when my web app is becoming unresponsive" (loadbalancer latency meter)
- "Tell me when my httpd daemon dies" (custom user script that checks daemon health)
How can you use Alarms
Create an alarm
    {
        'period': '300',
        'eval_periods': '2',
        'meter': 'CPUUtilization',
        'function': 'average',
        'operator': 'gt',
        'threshold': '50',
        'resource_id': 'inst-002',
        'source': 'OS/compute',
        'alarm_actions': ['rpc/my_notify_topic', 'http://bla.com/bla'],
        'ok_actions': ['rpc/my_notify_topic']
    }
This alarm evaluates the "CPUUtilization" meter for inst-002 every 300 seconds.
If the average CPUUtilization was above 50% in each of the last two 300-second periods,
it sends an rpc notification on the "my_notify_topic" topic
and posts the alarm details to http://bla.com/bla.
When the value later drops back below the threshold, the "ok_actions" are run.
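The evaluation rule described above can be sketched as follows. This is only an illustration under stated assumptions, not ceilometer code: the function name evaluate_alarm, the state names and every operator other than 'gt' are hypothetical, and the per-period averages are assumed to be computed elsewhere.

```python
import operator

# Comparison names as used in the alarm definition; only 'gt' appears in
# the spec, the other entries are assumptions made for illustration.
OPERATORS = {
    'gt': operator.gt,
    'ge': operator.ge,
    'lt': operator.lt,
    'le': operator.le,
}


def evaluate_alarm(alarm, period_values):
    """Return 'alarm', 'ok' or 'insufficient data' (hypothetical states).

    period_values holds one aggregated value (for example the average)
    per period, oldest first.
    """
    needed = int(alarm['eval_periods'])
    if len(period_values) < needed:
        return 'insufficient data'
    compare = OPERATORS[alarm['operator']]
    threshold = float(alarm['threshold'])
    # The alarm fires only if every one of the most recent periods breaches.
    recent = period_values[-needed:]
    if all(compare(value, threshold) for value in recent):
        return 'alarm'
    return 'ok'


alarm = {'eval_periods': '2', 'operator': 'gt', 'threshold': '50'}
print(evaluate_alarm(alarm, [40.0, 62.5, 71.0]))  # last two periods breach
print(evaluate_alarm(alarm, [62.5, 45.0]))        # latest period is below
```

A transition from 'alarm' back to 'ok' is the point at which the "ok_actions" would be triggered.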
Assumptions
- We are trying to deliver CloudWatch-like functionality, but in an "OpenStack way" that can be extended.
- Kinds of metrics to monitor: http://docs.amazonwebservices.com/AmazonCloudWatch/latest/DeveloperGuide/CW_Support_For_AWS.html
  (these are really the same kinds of meters that ceilometer currently samples)
- Sample every 10s to 60s, and transmit every 1min to 5min.
- Try to reuse as much of the current ceilometer code as possible, so that the features we add can also be used by ceilometer for metering.
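The split between a short sampling interval and a longer transmit interval could look roughly like this. It is a minimal sketch with a hypothetical SampleBuffer class, not part of ceilometer: samples are buffered locally and handed over as one batch once the transmit interval has elapsed.

```python
class SampleBuffer:
    """Collect samples locally and release them in batches."""

    def __init__(self, transmit_interval):
        self.transmit_interval = transmit_interval  # seconds
        self._pending = []
        self._last_flush = None

    def add(self, timestamp, value):
        """Record one sample; return a batch when it is time to transmit."""
        if self._last_flush is None:
            self._last_flush = timestamp
        self._pending.append((timestamp, value))
        if timestamp - self._last_flush >= self.transmit_interval:
            batch, self._pending = self._pending, []
            self._last_flush = timestamp
            return batch
        return None


# Sample every 10 seconds, transmit every 60 seconds.
buf = SampleBuffer(transmit_interval=60)
batches = []
for t in range(0, 130, 10):
    batch = buf.add(t, 1.0)
    if batch is not None:
        batches.append(batch)
print([len(b) for b in batches])
```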
Design
The idea is to use most of ceilometer as-is, so the program flow is:
Data Insertion
The publisher has an option to emit samples at a faster rate (say every 60 seconds). It does so through a different transport that is more efficient than rpc and doesn't interfere with metering.
We could run a different collector (or the same one, to be decided) that inserts the samples into the db as is done now; only the transport differs.
"Blueprint: https://blueprints.launchpad.net/ceilometer/+spec/multi-publisher"
API Auth
The API needs to be accessible to non-admin users, so that they can get their own data and control their own alarms. This should not be a problem, and work is planned for this.
"Blueprint: https://blueprints.launchpad.net/ceilometer/+spec/user-api"
Data Query
To handle aggregate queries (for autoscaling groups) we need to extend the query mechanism to be able to get statistics over a defined set of resources (usually the info is in the metadata).
"Blueprint: https://blueprints.launchpad.net/ceilometer/+spec/multi-dimensions"
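As an illustration of such an aggregate query, the sketch below selects the samples of one group by a metadata key and averages them. The sample layout (resource_id, metadata, volume), the 'group' metadata key and the helper name are assumptions made for the example, not the ceilometer storage model or API.

```python
def samples_for_group(samples, key, value):
    """Return the samples whose metadata has key set to value."""
    return [s for s in samples if s['metadata'].get(key) == value]


# Hypothetical CPU utilization samples from three instances.
samples = [
    {'resource_id': 'inst-001', 'metadata': {'group': 'web'}, 'volume': 40.0},
    {'resource_id': 'inst-002', 'metadata': {'group': 'web'}, 'volume': 60.0},
    {'resource_id': 'inst-003', 'metadata': {'group': 'db'}, 'volume': 90.0},
]

web = samples_for_group(samples, 'group', 'web')
average = sum(s['volume'] for s in web) / len(web)
print(average)
```

In a real deployment the filtering would be pushed down to the storage backend rather than done in Python over all samples.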
We need to extend the API to be able to list the meter types across the resources in a tenant.
We need to support more statistics functions: max, min, average, count within a defined period.
"Blueprint: https://blueprints.launchpad.net/ceilometer/+spec/api-aggregate-average"
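The statistics functions listed above, computed within a defined period, can be sketched like this. The function name period_statistics and the (timestamp, value) input format are hypothetical; the sketch only shows the bucketing logic.

```python
from collections import defaultdict


def period_statistics(samples, period):
    """Group (timestamp_seconds, value) pairs into fixed-length periods.

    Returns a dict keyed by period start time, each entry holding the
    min, max, avg and count of the values that fall into that period.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        start = int(timestamp // period) * period
        buckets[start].append(value)
    return {
        start: {
            'min': min(values),
            'max': max(values),
            'avg': sum(values) / len(values),
            'count': len(values),
        }
        for start, values in sorted(buckets.items())
    }


stats = period_statistics(
    [(0, 10.0), (100, 30.0), (299, 20.0), (300, 50.0)], period=300)
for start, values in stats.items():
    print(start, values)
```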
Support Posting new sample data
"Blueprint: https://blueprints.launchpad.net/ceilometer/+spec/meter-post-api"
Alarm Detection
TODO
Alarm Notification
TODO
Implementation
This section should describe a plan of action (the "how") to implement the changes discussed. Could include subsections like:
API Changes
- new alarm REST resource
- new alarm history REST resource
- changes to make statistics aggregation more flexible
- new API to post meter data
- new API to list meters
Code Changes
Code changes should include an overview of what needs to change, and in some cases even the specific details.
Migration
Include:
- data migration, if any
- redirects from old URLs to new ones, if any
- how users will be pointed to the new way of doing things, if necessary.
Test/Demo Plan
This need not be added or completed until the specification is nearing beta.
Unresolved issues
This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.
BoF agenda and discussion
Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.