New Ceilometer Agent

I'd like to propose a new, simpler agent for use with Ceilometer, one based on the StackTach model.

Problems with the existing agent framework:

  1. Requires a custom notification driver to be deployed on the compute nodes.
     • I think currently the compute agent only checks the instance information and has no notification mechanism. -- yjiang5
     • I'm not sure we can avoid compute-node deployment; for example, we may want to collect host utilization information or even thermal information. -- yjiang5
       • For the 90% case, we can do it without an agent (including host utilization). If the thermal information comes from some other service that runs on the hypervisor (and doesn't have an RPC API), then yes, we would need a compute-side agent for that. Are you talking about thermal information coming from the libvirt API? If so, anything we can get from the standard virt-layer interfaces we can expose in the notification. --S
  2. Mixes polling (periodic_task) and notification tracking to get data.
     • I remember notification/polling is in different agents now. -- yjiang5
       • Looking at the above file, it seems to share AgentManager. Unless I missed something? --S
         • They share a base class, but that's it. -- dhellmann
  3. Does not track all notifications, only select ones, and only on the .info queue (.error is ignored).

This new model would fix all of this:

  1. The worker can be deployed anywhere on the OpenStack network (see the sketch after this list).
  2. One worker deployment can support multiple cells/deployments.
  3. No API extensions required.
  4. No compute-node deployments required.
     • As stated above, not sure if we can avoid this. -- yjiang5
       • Perhaps in very limited situations, but I'd love to know the exact details of those use cases. --S
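
To make the standalone-worker idea concrete, here is a minimal sketch of a notification-only consumer built on kombu. It runs anywhere that can reach the message bus, needs nothing installed on the compute nodes, and binds to both the .info and .error priorities (addressing problem 3 above). The exchange, queue, and routing-key names follow the usual Nova rabbit-notifier defaults but should be treated as assumptions; the real StackTach worker is configured differently.

    # Minimal sketch of a notification-only worker. The "nova" topic exchange
    # and the notifications.info / notifications.error routing keys are the
    # usual Nova rabbit-notifier defaults, assumed here for illustration.
    import json

    from kombu import Connection, Exchange, Queue
    from kombu.mixins import ConsumerMixin

    nova_exchange = Exchange('nova', type='topic', durable=False)
    queues = [
        Queue('monitor.info', nova_exchange, routing_key='notifications.info'),
        Queue('monitor.error', nova_exchange, routing_key='notifications.error'),
    ]

    class NotificationWorker(ConsumerMixin):
        def __init__(self, connection):
            self.connection = connection

        def get_consumers(self, Consumer, channel):
            # One consumer bound to both priorities, so .error is not dropped.
            return [Consumer(queues=queues, callbacks=[self.on_notification])]

        def on_notification(self, body, message):
            # The body is the JSON notification payload published by the
            # service; kombu may already have deserialized it for us.
            event = body if isinstance(body, dict) else json.loads(body)
            print(event.get('event_type'), event.get('publisher_id'))
            message.ack()

    if __name__ == '__main__':
        # The worker only needs network access to the message bus.
        with Connection('amqp://guest:guest@localhost:5672//') as conn:
            NotificationWorker(conn).run()

Because the worker is just another AMQP consumer, running several of them against different brokers is one way a single deployment could cover multiple cells.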

This model is already deployed and working within Rackspace.

The replacement strategy would consist of the following:

  1. Support KVM under the same monitoring mechanism as Xen (already supported in Nova at the Virt layer)
  2. Ensure the existing Usage mechanism works with KVM.
  3. Develop the new worker in parallel with the existing CM agent; nothing in the existing strategy would change.
  4. Once we have 100% feature coverage of the existing agent we can talk about dropping the old one.
  5. Initial deployment would assume:
    • the existing StackTach logging and configuration mechanism
    • RabbitMQ support only
  6. Subsequent deployments would:
    • Replace the logging/config information with Oslo
    • Make the AMQP mechanism driver-based and/or update Oslo to support notification-style events (see the sketch below)
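
To illustrate what "driver-based" could mean here, the sketch below shows one possible shape: a tiny publisher interface with a RabbitMQ implementation behind it, so another transport could be swapped in later. All of the names (NotificationPublisher, RabbitPublisher, the exchange and routing key) are hypothetical, not an existing Ceilometer or Oslo API.

    # Hypothetical sketch of a driver-based notification transport; the class
    # names and defaults are illustrative only.
    import abc

    from kombu import Connection, Exchange

    class NotificationPublisher(abc.ABC):
        """Interface a transport driver would implement."""

        @abc.abstractmethod
        def publish(self, event):
            """Deliver one notification payload (a dict)."""

    class RabbitPublisher(NotificationPublisher):
        """Default driver: publish to a topic exchange over AMQP."""

        def __init__(self, url='amqp://guest:guest@localhost:5672//'):
            self.url = url
            self.exchange = Exchange('nova', type='topic', durable=False)

        def publish(self, event):
            with Connection(self.url) as conn:
                producer = conn.Producer(serializer='json')
                producer.publish(event,
                                 exchange=self.exchange,
                                 routing_key='notifications.info',
                                 declare=[self.exchange])

A UDP driver (see the Push-back section) would implement the same publish() method, which is what would keep the worker code transport-agnostic.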

A walkthrough of how the existing StackTach worker is built can be seen here: http://www.youtube.com/watch?v=thaZcHuJXhM

Push-back

Some of the arguments we've heard against going with a fully notification-based mechanism are:

  • Unreliability of the periodic_task mechanism in the services.
    • Proposal: find and fix these delays.
      -- I think this was discussed several times on the mailing list, and we can already see some effort going into it (http://lists.openstack.org/pipermail/openstack-dev/2013-January/004491.html), but I'm not sure it is easy to fix. I agree with Nick and Hellman that we need something working. In the long term, I don't object to it. -- yjiang5
      -- Agreed. We don't want to replace everything *right now* and cause problems with the next release. But also, I'm pushing for not growing the current mechanism too much and instead putting our energy into an easier framework. --S
  • Not all services support notifications (e.g. Swift).
    • Proposal: work with those OpenStack teams to get proper notification support.
  • Notifications are not suitable for high-speed monitoring.
    • Proposal: Agreed. Create a new UDP-based notification driver for these events, feeding a highly efficient aggregator like statsd (see the sketch below).
      -- Agree with this in the long term. -- yjiang5

These are all relatively trivial modifications to make.
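
As a rough illustration of that UDP proposal, the sketch below sends each sample as a single JSON datagram to a statsd-like aggregator. The address, port, and payload layout are assumptions made for this example, not the driver that was eventually specified.

    # Fire-and-forget UDP emitter for high-frequency samples (assumed port
    # and payload layout; the receiver would be a statsd-like aggregator).
    import json
    import socket
    import time

    class UDPEmitter(object):
        def __init__(self, host='127.0.0.1', port=4952):
            self.addr = (host, port)
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

        def emit(self, event_type, payload):
            msg = json.dumps({
                'event_type': event_type,
                'timestamp': time.time(),
                'payload': payload,
            })
            # Losing an occasional datagram is the accepted trade-off for speed.
            self.sock.sendto(msg.encode('utf-8'), self.addr)

    # Example: a frequent CPU-utilization sample for one instance.
    UDPEmitter().emit('compute.instance.cpu_util',
                      {'instance_id': 'example-instance', 'cpu_util': 42.0})

Because delivery is unacknowledged, this path suits counters and gauges where the next sample supersedes a lost one, while the AMQP path stays in place for events that must not be dropped.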