Difference between revisions of "NewCeilometerAgent"
Revision as of 13:33, 29 January 2013
New Ceilometer Agent
I'd like to propose a new, simpler agent for use with Ceilometer, one based on the StackTach model.
Problems with the existing agent framework:
- Requires a custom notification driver to be deployed on the compute nodes.
- Uses a roundabout scheme of calling through the API layer (requiring an API extension) to get hypervisor data.
- Requires explicit deployment on every compute node.
The new model would fix all of these problems:
- The worker can be deployed anywhere on the OpenStack network.
- One worker deployment can support multiple cells/deployments.
- No API extensions are required.
- No compute node deployments are required.
This model is already deployed and working within Rackspace.
The replacement strategy would consist of the following:
- Support KVM under the same monitoring mechanism as Xen (already supported in Nova at the Virt layer)
- Ensure the existing Usage mechanism works with KVM.
- Develop the new worker in parallel with the existing CM Agent. Nothing about the existing strategy would change.
- Once we have 100% feature coverage of the existing agent we can talk about dropping the old one.
- Initial deployment would assume:
  - the existing StackTach logging and configuration mechanism
  - RabbitMQ support only
- Subsequent deployments would:
  - Replace the logging/config information with Oslo
  - Make the AMQP mechanism driver-based and/or update Oslo to support notification-style events
A walkthrough of how the existing StackTach worker is built can be seen here: http://www.youtube.com/watch?v=thaZcHuJXhM
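To make the worker model above a little more concrete, here is a minimal sketch of a single worker that dispatches notification events from several deployments by event type. It is not StackTach or Ceilometer code: the `Worker` class, `record_usage` handler, and deployment names are hypothetical, and the message shape (`event_type`, `timestamp`, `payload`) is assumed to follow the usual Nova notification envelope. The AMQP consumer itself is left out; `process` would be called from whatever callback receives the message body.

```python
import json
from collections import defaultdict


class Worker(object):
    """Hypothetical worker: one instance serves many deployments."""

    def __init__(self):
        self.handlers = defaultdict(list)  # event_type -> list of callables
        self.samples = []                  # collected metering samples

    def on(self, event_type, handler):
        """Register a handler for one notification event type."""
        self.handlers[event_type].append(handler)

    def process(self, deployment, raw_body):
        """Decode one message body and dispatch it by event_type."""
        event = json.loads(raw_body)
        event_type = event.get("event_type", "")
        for handler in self.handlers.get(event_type, []):
            self.samples.append(handler(deployment, event))


def record_usage(deployment, event):
    """Hypothetical handler: turn a usage event into a flat sample."""
    payload = event.get("payload", {})
    return {
        "deployment": deployment,
        "event_type": event["event_type"],
        "instance_id": payload.get("instance_id"),
        "timestamp": event.get("timestamp"),
    }


worker = Worker()
worker.on("compute.instance.exists", record_usage)

# The same worker handles messages from two (made-up) deployments.
body = json.dumps({
    "event_type": "compute.instance.exists",
    "timestamp": "2013-01-29 13:33:00",
    "payload": {"instance_id": "abc-123"},
})
worker.process("cell-east", body)
worker.process("cell-west", body)
```

Because the worker only needs the message body and a label for where it came from, nothing here has to run on a compute node.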
Push-back
Some of the arguments we've heard against going with a fully notification-based mechanism are:
- The periodic_task mechanism in the services is unreliable.
  - Proposal: find and fix these delays.
- Not all services support notifications (e.g. Swift).
  - Proposal: work with those OpenStack teams to get proper notification support.
- Notifications are not suitable for high-speed monitoring.
  - Proposal: Agreed. Create a new UDP-based notification driver for these events, paired with a highly efficient aggregator like statsd.
These are all relatively trivial modifications to make.
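As a rough idea of what the UDP-based driver proposed above might look like, the sketch below sends fire-and-forget datagrams in the statsd line format (`name:value|type`). The `UdpNotifier` class and the metric name are hypothetical and not part of Nova, Oslo, or Ceilometer; port 8125 is assumed because it is the statsd default.

```python
import socket


def format_metric(name, value, metric_type="g"):
    """Build a statsd line: 'g' gauge, 'c' counter, 'ms' timer."""
    return "{0}:{1}|{2}".format(name, value, metric_type).encode("ascii")


class UdpNotifier(object):
    """Hypothetical driver: emits metrics over UDP without waiting."""

    def __init__(self, host="127.0.0.1", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def notify(self, name, value, metric_type="g"):
        """Send one datagram; errors are dropped so callers never block."""
        try:
            self.sock.sendto(format_metric(name, value, metric_type), self.addr)
        except OSError:
            pass  # monitoring must not take the service down

    def close(self):
        self.sock.close()
```

A caller would do something like `UdpNotifier().notify("instance.cpu_util", 42)`. The trade-off is deliberate: UDP gives no delivery guarantee, which is acceptable for high-rate monitoring samples but not for billing-grade usage events, which would stay on the AMQP notification path.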