New Ceilometer Agent

I'd like to propose a new, simpler agent for use with Ceilometer, one based on the StackTach model.

Problems with the existing agent framework:

Requires a custom notification driver to be deployed on the compute nodes

    I think the compute agent currently only checks the instance information and has no notification mechanism. --yjiang5

Uses a roundabout scheme of calling through the API layer (requiring an API extension) to get hypervisor data
Requires explicit deployment on every compute node.

    I'm not sure we can avoid compute-node deployment. For example, we may want to collect host utilization information or even thermal information. -- yjiang5

Mixes polling (periodic_task) and notification tracking to get data

    As I recall, notification and polling are handled by different agents now. -- yjiang5

Does not track all notifications, only select ones and only on the .info queue (.error is ignored)

This new model would fix all of this:

The worker can be deployed anywhere on the OpenStack network.
One worker deployment can support multiple cells/deployments.
No API extensions required.
No Compute node deployments required.

     --As stated above, not sure if we can avoid this. -- yjiang5

This model is already deployed and working within Rackspace.
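
As a rough illustration of the notification-only model described above, here is a minimal sketch of the per-message handling such a worker might do. It is not the StackTach implementation. The routing-key names, the notification field names and the `handle_notification` helper are all assumptions made for the example, and the AMQP consumer that would call the handler is left out so the snippet runs without a broker.

```python
import json

# Routing keys the worker would subscribe to. Listening on both means
# error notifications are tracked too. These names are assumptions.
ROUTING_KEYS = ("notifications.info", "notifications.error")


def handle_notification(deployment, routing_key, raw_body, store):
    """Record one notification without filtering on event type.

    `deployment` is a label for the cell/deployment the message came
    from, so a single worker can serve several of them.
    """
    body = json.loads(raw_body)
    record = {
        "deployment": deployment,
        "routing_key": routing_key,
        # Field names below are assumed; missing fields become None.
        "event_type": body.get("event_type"),
        "publisher_id": body.get("publisher_id"),
        "timestamp": body.get("timestamp"),
        "payload": body.get("payload", {}),
    }
    store.append(record)
    return record


if __name__ == "__main__":
    events = []  # stand-in for a database
    sample = json.dumps({
        "event_type": "compute.instance.create.error",
        "publisher_id": "compute.host1",
        "payload": {"instance_id": "abc"},
    })
    handle_notification("cell-a", "notifications.error", sample, events)
    print(events[0]["event_type"])
```

Because the handler only needs the message body and routing key, nothing has to run on the compute node itself; the worker just needs to reach the message broker.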

The replacement strategy would consist of the following:

Support KVM under the same monitoring mechanism as Xen (already supported in Nova at the Virt layer)
Ensure the existing Usage mechanism works with KVM.
Develop the new worker in parallel with the existing CM Agent. Nothing in the existing strategy would change.
Once we have 100% feature coverage of the existing agent we can talk about dropping the old one.
Initial deployment would assume:
- the existing StackTach logging and configuration mechanism
- RabbitMQ support only
Subsequent deployments would:
- Replace the logging/config information with Oslo
- Make the AMQP mechanism driver-based and/or update Oslo to support notification-style events
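
To make the "one worker, several deployments, RabbitMQ only" assumption concrete, here is a sketch of how a worker might load its per-deployment broker settings. The JSON layout, the key names and the `load_deployments` helper are hypothetical and chosen only for illustration; they are not taken from the StackTach configuration format.

```python
import json
from collections import namedtuple

# One entry per cell/deployment the worker should listen to.
Deployment = namedtuple(
    "Deployment",
    ["name", "rabbit_host", "rabbit_port", "rabbit_virtual_host"],
)

# Hypothetical config: two deployments served by a single worker.
EXAMPLE_CONFIG = """
{
  "deployments": [
    {"name": "cell-a", "rabbit_host": "10.0.0.1"},
    {"name": "cell-b", "rabbit_host": "10.0.1.1", "rabbit_port": 5673}
  ]
}
"""


def load_deployments(text):
    """Parse the JSON config into a list of Deployment tuples."""
    config = json.loads(text)
    return [
        Deployment(
            name=d["name"],
            rabbit_host=d["rabbit_host"],
            # 5672 is the standard AMQP port; "/" the default vhost.
            rabbit_port=d.get("rabbit_port", 5672),
            rabbit_virtual_host=d.get("rabbit_virtual_host", "/"),
        )
        for d in config["deployments"]
    ]


if __name__ == "__main__":
    for dep in load_deployments(EXAMPLE_CONFIG):
        print(dep.name, dep.rabbit_host, dep.rabbit_port)
```

Moving to Oslo later would mostly mean replacing this loader; the rest of the worker would only see the resulting list of deployments.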

A walkthrough of how the existing StackTach worker is built can be seen here: http://www.youtube.com/watch?v=thaZcHuJXhM

Push-back

Some of the arguments we've heard against going with a fully notification-based mechanism are:

Unreliability of the periodic_task mechanism in the services.
- Proposal: find and fix these delays

    -- I think this was discussed several times on the mailing list, and there is already some effort under way (http://lists.openstack.org/pipermail/openstack-dev/2013-January/004491.html), but I'm not sure it is easy to fix. I agree with Nick and Hellman that we need something working. In the long term, I don't object to it. -- yjiang5

Not all services support notifications (e.g. Swift)
- Proposal: work with those OpenStack teams to get proper notification support
Notifications are not suitable for high-speed monitoring.
- Proposal: Agreed. Create a new UDP-based notification driver for these events with a highly efficient aggregator like statsd.

   -- agree with this in the long term. -- yjiang5
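
The UDP-based driver proposed above could look roughly like the following sketch, which formats a gauge in the plain-text statsd line format (`name:value|g`) and sends it as a single fire-and-forget datagram. The metric name, the helper names and the target host are assumptions for the example; 8125 is the port statsd listens on by default.

```python
import socket


def format_gauge(name, value):
    """Build a statsd gauge packet, e.g. b"cpu_util:42|g"."""
    return "{0}:{1}|g".format(name, value).encode("ascii")


def send_metric(packet, host="127.0.0.1", port=8125):
    """Send one datagram; UDP gives no delivery guarantee or reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet, (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical metric name for a compute host.
    send_metric(format_gauge("compute.host1.cpu_util", 42))
```

Since the sender never waits for an acknowledgement, a slow or missing aggregator cannot stall the service emitting the metric, which is the property that makes this approach attractive for high-rate data.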

These are all relatively trivial modifications to make.
