
NewCeilometerAgent

Warning: Old Design Page

This page was used to help design a feature for a previous release of OpenStack. It may or may not have been implemented. As a result, this page is unlikely to be updated and could contain outdated information. It was last updated on 2014-05-15

New Ceilometer Agent

I'd like to propose a new, simpler agent for use with Ceilometer, based on the StackTach model.

Problems with the existing agent framework:

  1. Requires a custom notification driver to be deployed on the compute nodes
  2. Uses a roundabout scheme of calling through the API layer (requiring an API extension) to get hypervisor data
  3. Requires explicit deployment on every compute node.
    • I'm not sure we can avoid compute-node deployment; for example, we may want to collect host utilization information or even thermal information. -- yjiang5
      • For the 90% case, we can do it without an agent (including Host Utilization). If the thermal information comes from some other service that runs on the hypervisor (and doesn't have an RPC API), then yes, we would need a compute-side agent for that. Are you talking about thermal information coming from the libvirt API? If so, anything we can get from the standard virt-layer interfaces we can expose in the notification. --S
  4. Mixes polling (periodic_task) and notification tracking to get data (see the sketch after this list)
    • I recall that notification handling and polling are in different agents now. -- yjiang5
      • Looking at the above file, it seems to share AgentManager. Unless I missed something? --S
        • They share a base class, but that's it. -- dhellmann
        • CollectorService defines periodic_tasks as a no-op ("pass"), so there is no periodic work in the notification path; it only appears there because of inheriting from the PeriodicService class. --ronghui
  5. Does not track all notifications, only select ones and only on the .info queue (.error is ignored)
    • That's an easy enough fix to the existing code. -- dhellmann
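
To make problem 4 concrete, here is a hypothetical, much-simplified sketch of the pattern being described: a polling loop and notification handling coupled through one manager class. The class and method names are illustrative only; this is not the actual Ceilometer code.

    import threading
    import time


    class PeriodicService(object):
        """Base service that keeps invoking periodic_tasks() on an interval."""

        interval = 60  # seconds between polling cycles

        def start(self):
            def _loop():
                while True:
                    self.periodic_tasks()
                    time.sleep(self.interval)
            poller = threading.Thread(target=_loop)
            poller.daemon = True
            poller.start()

        def periodic_tasks(self):
            raise NotImplementedError


    class AgentManager(PeriodicService):
        """Inherits the polling loop *and* also consumes notifications."""

        def periodic_tasks(self):
            # Polling path: query hypervisor data through the API layer.
            print("polling hypervisor stats through the API layer ...")

        def handle_notification(self, body):
            # Notification path: react to events arriving on the .info queue.
            print("got notification:", body.get("event_type"))

Both concerns end up in one deployable unit on the compute node, which is what the model below tries to untangle.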

This new model would fix all of this:

  1. The worker can be deployed anywhere on the OpenStack network.
  2. One worker deployment can support multiple cells/deployments.
  3. No API extensions required.
  4. No compute-node deployments required.
    • As stated above, I'm not sure we can avoid this. -- yjiang5
      • Perhaps in very limited situations, but I'd love to know the exact details of those use cases. --S

This model is already deployed and working within Rackspace.
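
For reference, here is a minimal sketch (using kombu) of the kind of StackTach-style worker being proposed: a single process, deployable anywhere that can reach the deployment's RabbitMQ, which binds to the notification exchange and records every event on both the .info and .error queues. The exchange, queue, and routing-key names below are assumptions that depend on deployment configuration, and this is not the actual StackTach source.

    from kombu import Connection, Exchange, Queue
    from kombu.mixins import ConsumerMixin

    # Bind to both .info and .error so no notifications are dropped (problem 5 above).
    NOVA_EXCHANGE = Exchange("nova", type="topic", durable=False)
    QUEUES = [
        Queue("monitor.info", NOVA_EXCHANGE, routing_key="notifications.info"),
        Queue("monitor.error", NOVA_EXCHANGE, routing_key="notifications.error"),
    ]


    class NotificationWorker(ConsumerMixin):
        """Consumes every notification and hands it to a simple sink."""

        def __init__(self, connection):
            self.connection = connection

        def get_consumers(self, consumer_cls, channel):
            return [consumer_cls(queues=QUEUES, accept=["json"],
                                 callbacks=[self.on_notification])]

        def on_notification(self, body, message):
            # body is the standard notification envelope (event_type, payload, ...).
            print(body.get("event_type"), body.get("timestamp"))
            message.ack()


    if __name__ == "__main__":
        # The worker runs anywhere on the network; nothing is deployed on compute nodes.
        with Connection("amqp://guest:guest@rabbit-host:5672//") as conn:
            NotificationWorker(conn).run()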

The replacement strategy would consist of the following:

  1. Support KVM under the same monitoring mechanism as Xen (already supported in Nova at the Virt layer)
  2. Ensure the existing Usage mechanism works with KVM.
  3. Develop the new worker in parallel with the existing CM agent. Nothing about the existing strategy would change.
  4. Once we have 100% feature coverage of the existing agent we can talk about dropping the old one.
  5. Initial deployment would assume:
    • the existing StackTach logging and configuration mechanism
    • RabbitMQ support only
  6. Subsequent deployments would:
    • Replace the logging/config information with Oslo
    • Make the AMQP mechanism driver-based and/or update Oslo to support notification-style events (a sketch of the driver approach appears below)

A walkthrough of how the existing StackTach worker is built can be seen here: http://www.youtube.com/watch?v=thaZcHuJXhM
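
Step 6 above mentions making the AMQP mechanism driver-based. A hypothetical sketch of what that could look like follows; the interface, class names, and constructor arguments are illustrative only and do not correspond to an existing Oslo or Ceilometer API. The worker would depend only on the small interface, so RabbitMQ could later be swapped for another transport or an Oslo-based notification listener.

    class NotificationDriver(object):
        """Minimal transport interface the worker would depend on."""

        def consume(self, callback):
            """Block, invoking callback(event_dict) for each notification."""
            raise NotImplementedError


    class RabbitDriver(NotificationDriver):
        """RabbitMQ implementation using kombu (the only transport initially)."""

        def __init__(self, url, queue_name, exchange, routing_key):
            self.url = url
            self.queue_name = queue_name
            self.exchange = exchange
            self.routing_key = routing_key

        def consume(self, callback):
            from kombu import Connection, Consumer, Exchange, Queue

            exchange = Exchange(self.exchange, type="topic", durable=False)
            queue = Queue(self.queue_name, exchange, routing_key=self.routing_key)

            def _on_message(body, message):
                callback(body)
                message.ack()

            with Connection(self.url) as conn:
                with Consumer(conn, queues=[queue], accept=["json"],
                              callbacks=[_on_message]):
                    while True:
                        conn.drain_events()


    # Example wiring (hypothetical names): the worker only sees the interface,
    # so switching transports becomes a configuration change.
    driver = RabbitDriver("amqp://guest:guest@rabbit-host:5672//",
                          "monitor.info", "nova", "notifications.info")
    # driver.consume(lambda event: print(event.get("event_type")))  # blocks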

Push-back

Some of the arguments we've heard about going with a fully notification-based mechanism are:

  • Unreliability of the periodic_task mechanism in the services.
    • Proposal: find and fix these delays
      • I think this was discussed several times on the mailing list, and we can already see some effort working on it (http://lists.openstack.org/pipermail/openstack-dev/2013-January/004491.html), but I'm not sure it is easy to fix. I agree with Nick and Hellmann that we need something working. In the long term, I don't object to it. -- yjiang5
        • Agreed. We don't want to replace everything *right now* and cause problems with the next release. But also, I'm pushing for not growing the current mechanism too much and instead thinking about putting our energy into an easier framework. --S
  • Not all services support notifications (e.g. Swift)
    • Proposal: work with those openstack teams to get proper notification support
  • Notifications are not suitable for high-speed monitoring.
    • Proposal: Agreed. Create a new UDP-based notification driver for these events, paired with a highly efficient aggregator like statsd (see the sketch at the end of this page).
      • I agree with this in the long term. -- yjiang5

These are all relatively trivial modifications to make.
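
To illustrate the UDP-based idea from the last proposal above, here is a minimal sketch of a fire-and-forget UDP notifier and a statsd-style aggregator that simply counts events per type. The port, payload format, and function names are assumptions for illustration; a real driver would plug into the existing notification API, and a real aggregator would do statsd-style flushing.

    import json
    import socket
    from collections import Counter


    def emit_udp_notification(event_type, payload, host="127.0.0.1", port=4952):
        """Fire-and-forget datagram; occasional loss is acceptable for metering."""
        datagram = json.dumps({"event_type": event_type, "payload": payload}).encode()
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(datagram, (host, port))
        sock.close()


    def run_aggregator(host="0.0.0.0", port=4952):
        """Tiny statsd-style aggregator: count events per event_type."""
        counts = Counter()
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind((host, port))
        while True:
            data, _addr = sock.recvfrom(65535)
            event = json.loads(data.decode())
            counts[event["event_type"]] += 1
            print(dict(counts))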