Revision as of 13:33, 29 January 2013

New Ceilometer Agent

I'd like to propose a new, simpler agent for use with Ceilometer, one based on the StackTach model.

Problems with the existing agent framework:

Requires a custom notification driver to be deployed on the compute nodes.
Uses a roundabout scheme of calling through the API layer (requiring an API extension) to get hypervisor data.
Requires explicit deployment on every compute node.

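This new model would fix all of this:

The worker can be deployed anywhere on the OpenStack network.
One worker deployment can support multiple cells/deployments.
No API extensions required.
No compute node deployments required.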

This model is already deployed and working within Rackspace.

The replacement strategy would consist of the following:

Support KVM under the same monitoring mechanism as Xen (already supported in Nova at the Virt layer)
Ensure the existing Usage mechanism works with KVM.
Develop the new worker in parallel with the existing CM Agent. Nothing about the existing strategy would change.
Once we have 100% feature coverage of the existing agent, we can talk about dropping the old one.
Initial deployment would assume:
- the existing StackTach logging and configuration mechanism
- RabbitMQ support only
Subsequent deployments would:
- Replace the logging/config information with Oslo
- Make the AMQP mechanism driver-based and/or update Oslo to support notification-style events
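As a rough illustration of what "driver-based" could mean for the worker, the sketch below separates the transport from the event handling. Everything named here (NotificationDriver, InMemoryDriver, Worker, and the sample messages) is hypothetical and not taken from StackTach or Ceilometer; the in-memory driver only stands in for a RabbitMQ consumer so the example runs without a broker. The publisher_id and event_type keys are assumed to be present in each notification body.

```python
import json
from abc import ABC, abstractmethod
from collections import Counter


class NotificationDriver(ABC):
    """Hypothetical transport interface; a RabbitMQ consumer would be one implementation."""

    @abstractmethod
    def messages(self):
        """Yield raw JSON notification bodies, one string per notification."""


class InMemoryDriver(NotificationDriver):
    """Stand-in transport that replays a fixed list of message bodies."""

    def __init__(self, bodies):
        self._bodies = list(bodies)

    def messages(self):
        yield from self._bodies


class Worker:
    """Consumes notifications from any driver and tallies them by source and type."""

    def __init__(self, driver):
        self.driver = driver
        self.counts = Counter()

    def run(self):
        for body in self.driver.messages():
            event = json.loads(body)
            # Assumption: each notification carries publisher_id and event_type.
            key = (event.get("publisher_id"), event.get("event_type"))
            self.counts[key] += 1
        return self.counts


# Illustrative messages only; real payloads carry far more detail.
sample = [
    json.dumps({"publisher_id": "compute.host1",
                "event_type": "compute.instance.create.end",
                "payload": {}}),
    json.dumps({"publisher_id": "compute.host1",
                "event_type": "compute.instance.exists",
                "payload": {}}),
    json.dumps({"publisher_id": "compute.host2",
                "event_type": "compute.instance.exists",
                "payload": {}}),
]

counts = Worker(InMemoryDriver(sample)).run()
for (publisher, event_type), n in sorted(counts.items()):
    print(publisher, event_type, n)
```

With this split, supporting another message bus would mean adding one more driver class, while the worker logic stays untouched.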

A walkthrough of how the existing StackTach worker is built can be seen here: http://www.youtube.com/watch?v=thaZcHuJXhM

Some of the arguments we've heard against going with a fully notification-based mechanism are:

Unreliability of the periodic_task mechanism in the services.
- Proposal: find and fix these delays
Not all services support notifications (e.g. Swift).
- Proposal: work with those OpenStack teams to get proper notification support.
Notifications are not suitable for high-speed monitoring.
- Proposal: Agreed. Create a new UDP-based notification driver for these events with a highly efficient aggregator like statsd.
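To make the UDP idea concrete, here is a minimal sketch of a fire-and-forget emitter that uses the statsd line format (name:value|type). It is not an existing Nova or Ceilometer driver; the function names and the metric name nova.instance.cpu are made up for illustration, and a loopback socket stands in for the statsd daemon so the example is self-contained.

```python
import socket


def format_metric(name, value, metric_type="c"):
    """Render one metric in the statsd line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(sock, address, name, value, metric_type="c"):
    """Send a single datagram; UDP gives no delivery guarantee, which keeps the sender cheap."""
    sock.sendto(format_metric(name, value, metric_type), address)


def demo():
    """Send one gauge to a loopback receiver and return the bytes it received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Hypothetical metric name; "g" marks a gauge in the statsd format.
        send_metric(sender, receiver.getsockname(), "nova.instance.cpu", 42, "g")
        data, _ = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(demo())
```

The sender never waits for an acknowledgement, so a slow or missing aggregator cannot stall the service that emits the metric. The cost is that individual samples may be dropped.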

These are all relatively trivial modifications to make.

@@ Line 4: / Line 4: @@
 I'd like to propose a new simpler agent for use with Ceilometer. One based on the [[StackTach]] model.
-The existing agent uses a hodge-podge of custom notification drivers, custom hypervidoe
+Problems with the existing agent framework:
+# Requires a custom notification driver to be deployed on the compute nodes
+# Uses a round-about scheme of calling through the api-layer (requiring an api extension) to get hypervisor data
+# Requires explicit deployment on every compute node.
+This new model would fix all of this:
+# The worker can be deployed anywhere on the openstack network.
+# One worker deploy can support multiple cells/deployments.
+# No api extensions required
+# No Compute node deployments required.
+This model is already deployed and working within Rackspace.
 The replacement strategy would consist of the following: