New Ceilometer Agent
I'd like to propose a new, simpler agent for Ceilometer, based on the StackTach model.
Problems with the existing agent framework:
- Requires a custom notification driver to be deployed on the compute nodes
I think the compute agent currently only checks instance information and has no notification mechanism. -- yjiang5
- Uses a roundabout scheme of calling through the API layer (which requires an API extension) to get hypervisor data
- Requires explicit deployment on every compute node.
I'm not sure we can avoid compute-node deployment; for example, we may want to collect host utilization or even thermal information. -- yjiang5
- Mixes polling (periodic_task) and notification tracking to get data
As I recall, notification handling and polling are now in different agents. -- yjiang5
- Does not track all notifications: only selected ones, and only on the .info queue (.error is ignored)
The new model would address all of these problems:
- The worker can be deployed anywhere on the OpenStack network.
- A single worker deployment can support multiple cells/deployments.
- No API extensions required.
- No compute-node deployments required.
As stated above, I'm not sure we can avoid this. -- yjiang5
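To make the model concrete, here is a minimal sketch of one worker process that serves several deployments and consumes both the .info and .error notification queues. It is not StackTach's actual code: `Deployment`, `NotificationWorker` and `poll_once` are hypothetical names, and `queue.Queue` objects stand in for each deployment's RabbitMQ queues. The only Nova detail assumed is that each notification is a JSON envelope with an `event_type` key.

```python
import json
import queue
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Consume both priorities; the existing agent ignores the .error queue.
PRIORITIES = ("info", "error")


@dataclass
class Deployment:
    """One cell/deployment the worker listens to (hypothetical structure)."""
    name: str
    # queue.Queue objects stand in for this deployment's RabbitMQ
    # notifications.info / notifications.error queues.
    queues: Dict[str, queue.Queue] = field(
        default_factory=lambda: {p: queue.Queue() for p in PRIORITIES})


class NotificationWorker:
    """Hypothetical single worker serving many deployments; nothing runs on compute nodes."""

    def __init__(self, deployments: List[Deployment]) -> None:
        self.deployments = deployments
        self.recorded: List[Tuple[str, str, str]] = []

    def poll_once(self) -> int:
        """Drain every queue of every deployment once; return the number handled."""
        handled = 0
        for dep in self.deployments:
            for priority, q in dep.queues.items():
                while True:
                    try:
                        body = q.get_nowait()
                    except queue.Empty:
                        break
                    msg = json.loads(body)
                    # Keep (deployment, priority, event_type); a real worker would
                    # store the full message for later processing.
                    self.recorded.append(
                        (dep.name, priority, msg.get("event_type", "unknown")))
                    handled += 1
        return handled


if __name__ == "__main__":
    east, west = Deployment("cell-east"), Deployment("cell-west")
    east.queues["info"].put(json.dumps({"event_type": "compute.instance.exists"}))
    west.queues["error"].put(json.dumps({"event_type": "compute.instance.create.error"}))
    worker = NotificationWorker([east, west])
    print(worker.poll_once(), worker.recorded)
```

The point of the sketch is the shape: adding a cell means adding a `Deployment` to one worker's list, not installing anything on that cell's compute nodes.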
This model is already deployed and working within Rackspace.
The replacement strategy would consist of the following:
- Support KVM under the same monitoring mechanism as Xen (already supported in Nova at the Virt layer)
- Ensure the existing Usage mechanism works with KVM.
- Develop the new worker in parallel with the existing CM agent. Nothing in the existing strategy would change.
- Once we have 100% feature coverage of the existing agent we can talk about dropping the old one.
- Initial deployment would assume:
  - the existing StackTach logging and configuration mechanism
  - RabbitMQ support only
- Subsequent deployments would:
  - Replace the logging/config mechanism with Oslo
  - Make the AMQP mechanism driver-based and/or update Oslo to support notification-style events
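"Driver-based" could mean something like the following sketch, in which the worker codes against an abstract interface and the transport is chosen by a config string. All names here (`NotificationDriver`, `InMemoryDriver`, `DRIVERS`, `load_driver`) are hypothetical; only an in-memory driver is shown, and a RabbitMQ driver would implement the same `consume` method over AMQP.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Callable, Deque, Dict, Tuple, Type

# Hypothetical callback signature: (priority, message) -> None
Callback = Callable[[str, dict], None]


class NotificationDriver(ABC):
    """Hypothetical transport interface the worker would depend on."""

    @abstractmethod
    def consume(self, callback: Callback) -> int:
        """Deliver all pending messages to callback; return how many were delivered."""


class InMemoryDriver(NotificationDriver):
    """Test driver; a RabbitMQ driver would implement consume() over AMQP."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[str, dict]] = deque()

    def publish(self, priority: str, message: dict) -> None:
        self.pending.append((priority, message))

    def consume(self, callback: Callback) -> int:
        count = 0
        while self.pending:
            priority, message = self.pending.popleft()
            callback(priority, message)
            count += 1
        return count


# Registry mapping a config value to a driver class (illustrative names only).
DRIVERS: Dict[str, Type[NotificationDriver]] = {"memory": InMemoryDriver}


def load_driver(name: str) -> NotificationDriver:
    """Instantiate the driver named in config, or fail with a clear error."""
    try:
        return DRIVERS[name]()
    except KeyError:
        raise ValueError(f"unknown notification driver: {name!r}") from None
```

Keeping the registry as the only place that knows concrete classes is what would let "RabbitMQ support only" grow into other transports without touching the worker.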
A walkthrough of how the existing StackTach worker is built can be seen here: http://www.youtube.com/watch?v=thaZcHuJXhM
Push-back
Some of the arguments we've heard against going with a fully notification-based mechanism are:
- Unreliability of the periodic_task mechanism in the services.
- Proposal: find and fix these delays
I think this has been discussed several times on the mailing list, and some work on it is already under way (http://lists.openstack.org/pipermail/openstack-dev/2013-January/004491.html), but I'm not sure it is easy to fix. I agree with Nick and Hellman that we need something that works now. In the long term, I don't object to it. -- yjiang5
- Not all services support notifications (e.g. Swift)
- Proposal: work with those OpenStack teams to get proper notification support
- Notifications are not suitable for high-speed monitoring.
- Proposal: Agreed. Create a new UDP-based notification driver for these events with a highly efficient aggregator like statsd.
Agreed, in the long term. -- yjiang5
These are all relatively trivial modifications to make.
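As an illustration of the UDP/statsd proposal, here is a minimal sketch of a fire-and-forget emitter using the plain statsd line format (`<name>:<value>|<type>`, with `c` for counters, `g` for gauges, `ms` for timers) sent to statsd's default UDP port 8125. `StatsdNotifier` and the metric names are hypothetical; this is not an existing Nova or Oslo driver.

```python
import socket


def format_statsd(name: str, value: float, metric_type: str = "c") -> bytes:
    """Encode one metric in the statsd line format: <name>:<value>|<type>."""
    if metric_type not in ("c", "g", "ms"):
        raise ValueError(f"unsupported statsd type: {metric_type!r}")
    if ":" in name or "|" in name:
        raise ValueError("metric names must not contain ':' or '|'")
    return f"{name}:{value}|{metric_type}".encode("ascii")


class StatsdNotifier:
    """Hypothetical UDP notification driver for high-rate events."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125) -> None:
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def notify(self, name: str, value: float, metric_type: str = "c") -> None:
        # UDP: no ack and no retry, so a lost datagram drops one sample.
        # That is acceptable for high-speed monitoring, not for billing data.
        self.sock.sendto(format_statsd(name, value, metric_type), self.addr)

    def close(self) -> None:
        self.sock.close()
```

The design trade-off is deliberate: billing-grade events stay on the reliable AMQP notification path, while high-frequency samples go over lossy UDP to an aggregator that summarizes them before anything is stored.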