Ceilometer/HypervisorDataAcquisition

= Hypervisor Measurement in Ceilometer =

This page is intended to capture our analysis of how to move forward with hypervisor data acquisition in Ceilometer. A number of non-mutually-exclusive options are presented.

== Current Scenario ==
As of the Grizzly release, Ceilometer uses a simple virt inspector abstraction, driven on a periodic polling cycle from a compute agent deployed to every nova-compute node.
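The inspector pattern described above can be sketched as follows. This is a hypothetical illustration, not Ceilometer's actual API: the class names, the `inspect_cpus` method, and the `poll_all` helper are all assumptions made for the example.

```python
# Illustrative sketch of a virt inspector abstraction polled by a compute
# agent; names and shapes are assumptions, not Ceilometer's real interface.
import abc
import collections

CPUStats = collections.namedtuple('CPUStats', ['number', 'time'])


class Inspector(abc.ABC):
    """Hypervisor-agnostic interface the compute agent polls against."""

    @abc.abstractmethod
    def inspect_cpus(self, instance_name):
        """Return CPUStats for one instance."""


class FakeLibvirtInspector(Inspector):
    """Stand-in for a libvirt-backed implementation."""

    def inspect_cpus(self, instance_name):
        # A real driver would call into libvirt here.
        return CPUStats(number=2, time=10_000_000)


def poll_all(inspector, instance_names):
    # One call into the hypervisor per instance per polling cycle --
    # the call proliferation the drawbacks list below refers to.
    return {name: inspector.inspect_cpus(name) for name in instance_names}


stats = poll_all(FakeLibvirtInspector(), ['vm-1', 'vm-2'])
print(stats['vm-1'].number)  # 2
```

Each polling cycle issues one hypervisor call per instance per meter type, and each resulting sample becomes its own AMQP message, which is the proliferation concern noted below.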

This approach suffers from the following drawbacks:

 * libvirt only, no support as yet for xenapi, ESX, or Hyper-V
 * requires yet another agent to be widely deployed
 * the current implementation is sub-optimal in terms of the proliferation of calls into the hypervisor per polling cycle and the proliferation of AMQP messages (one per meter)

There are at least two competing ideas on how these drawbacks could be addressed: integration with the Healthnmon virt drivers, or native nova emission of notifications containing these data.

== Healthnmon integration ==
One obvious value Healthnmon can bring to the table is the breadth and reach of its hypervisor support. It is capable of acquiring data from xenapi, ESX, Hyper-V, etc., and of doing so remotely, so that an agent-per-compute-node deployment is not required. At the Havana summit we discussed taking this functionality into Ceilometer by wrapping the Healthnmon virt drivers in the Ceilometer inspector abstraction.

This would get us out of the libvirt-only rut, and also mitigate the deployment concerns somewhat.
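The wrapping idea can be sketched as a simple adapter. Both the Healthnmon-style driver interface (`collect_vm_utilization`) and the inspector method shown here are hypothetical names chosen for illustration:

```python
# Hypothetical adapter placing a Healthnmon-style remote driver behind a
# Ceilometer-style inspector interface; all names are assumptions.
class HealthnmonDriverStub:
    """Stand-in for a Healthnmon virt driver that collects remotely."""

    def collect_vm_utilization(self, host, vm_id):
        # A real driver would connect remotely to xenapi/ESX/Hyper-V,
        # so no agent needs to run on the compute node itself.
        return {'vcpus': 1, 'cpu_time': 123}


class HealthnmonInspector:
    """Adapts the driver's dict results to the inspector's return shape."""

    def __init__(self, driver, host):
        self.driver = driver
        self.host = host

    def inspect_cpus(self, instance_name):
        data = self.driver.collect_vm_utilization(self.host, instance_name)
        return (data['vcpus'], data['cpu_time'])


inspector = HealthnmonInspector(HealthnmonDriverStub(), 'hv-host-1')
print(inspector.inspect_cpus('vm-1'))  # (1, 123)
```

Because the driver connects to the hypervisor remotely, a single such inspector could cover many compute nodes, which is what mitigates the deployment concern.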

== Native nova emission ==
Here the idea is to avoid polling agents entirely, and instead extend nova to natively emit the required data via periodic notifications (in the style of the currently at-most-hourly compute.instance.exists notifications). This would require the following changes to nova:

 * add a new notification, or allow the existing compute.instance.exists notification to be emitted more frequently than hourly
 * extend nova.virt.ComputeDriver.get_all_bw_counters support to libvirt initially for parity, and ultimately to all hypervisor drivers
 * ensure the cadence of the periodic task driving the notification emission is regular and predictable
 * complete jd__'s prototype notifier driver, so as to be capable of emitting ceilometer counters directly from nova via the oslo notification subsystem (so that multi-publish could be used to route over UDP instead of AMQP, for example)

A concern about this approach is whether it would lead to an unreasonable load on the AMQP layer. This could potentially be mitigated by more aggressive batching of messages (if the sheer number of messages, as opposed to individual message size, proves to be the scaling bottleneck in practice).
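The batching mitigation mentioned above can be sketched as follows. The event name, payload layout, and `notify` callable are illustrative assumptions, not nova's actual notification API:

```python
# Hypothetical sketch of batched sample emission: many meters packed into
# one notification rather than one AMQP message per meter.
import time


def build_samples(instance_id, counters):
    """Turn a dict of meter-name -> volume into a list of sample dicts."""
    now = time.time()
    return [{'instance_id': instance_id, 'name': name,
             'volume': volume, 'timestamp': now}
            for name, volume in counters.items()]


def emit_batched(notify, samples):
    # A single message carrying many meters, instead of one per meter --
    # this trades message count against individual message size.
    notify('compute.instance.metrics', {'samples': samples})


sent = []
emit_batched(lambda event, payload: sent.append((event, payload)),
             build_samples('vm-1', {'cpu': 60.0, 'memory': 512}))
print(len(sent))                    # 1
print(len(sent[0][1]['samples']))   # 2
```

Whether batching like this actually helps depends on whether message count, rather than message size, is the AMQP bottleneck in practice, as the paragraph below notes.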