EfficientMetering/FutureNovaInteractionModel

= Future ceilometer/nova interaction model =

A discussion is ongoing on the openstack-dev mailing list, with a view to reaching a consensus with the nova domain experts on the best (most stable/supportable) approach for ceilometer to interact with nova going forward.

 1. Extend the existing os-server-diagnostics API extension to expose any additional stats that ceilo needs.

||+||would allow the ceilo compute agent to be scaled independently of the nova-compute node (i.e. no need for a 1:1 correspondence)||
||-||the diagnostics returned are currently hypervisor-specific||
||-||the additional nova-api-->nova-compute RPC call would add lag and impact timeliness for metrics gathering||
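The hypervisor-specific payload is the main drawback here: each virt driver returns its own key names, so any consumer of os-server-diagnostics would have to carry a normalization table per driver. A minimal sketch of that burden; the key names below are illustrative only, not the exact payloads nova returns:

```python
# Illustrative only: diagnostics key names differ per hypervisor driver,
# so a consumer of os-server-diagnostics needs a mapping per driver.
LIBVIRT_MAP = {
    "cpu0_time": "cpu.time",
    "memory": "memory.usage",
    "vda_read": "disk.read.bytes",
}
XENAPI_MAP = {
    "cpu0": "cpu.time",
    "memory_internal_free": "memory.usage",
}

def normalize(raw_diagnostics, key_map):
    """Translate hypervisor-specific keys into generic metric names."""
    return {generic: raw_diagnostics[specific]
            for specific, generic in key_map.items()
            if specific in raw_diagnostics}
```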

 2. Call the nova get_diagnostics RPC directly (as per the experimental patch proposed by Yunhong Jiang), or use a new RPC message specifically designed for this purpose.

||+/-||as for #1, but also removes the lag involved in an additional hop between nova services||
||-||calling RPC directly would expose ceilo to a much less stable (i.e. rapidly rev'd) API than would be the case for #1||

 3. Have nova itself emit metering messages directly onto the ceilo message bus, encompassing both lifecycle events and usage stats, to be picked up and persisted by the ceilo collector or other agent.

||-||leaks ceilo concerns into nova||
||-||requires message bus usage, probably inappropriate for time-sensitive measurements feeding into near-realtime metrics||
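The flow in this option can be sketched with an in-memory queue standing in for the AMQP metering bus; the function names are hypothetical:

```python
import queue

# A plain queue stands in for the ceilo message bus (AMQP in practice).
bus = queue.Queue()

def nova_emit(event_type, payload):
    """Sketch of the nova side: publish a metering message onto the bus."""
    bus.put({"event_type": event_type, "payload": payload})

def collector_drain():
    """Sketch of the ceilo collector: pick up and persist pending messages."""
    persisted = []
    while not bus.empty():
        persisted.append(bus.get())
    return persisted
```

Both lifecycle events and usage stats would travel the same path, which is why bus latency matters for the near-realtime case.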

<span id="option_4"> 4. Invert control and have the nova compute service itself call into a ceilo-provided API that abstracts the conduit used for publication (could be via the message bus, or UDP, or a direct call to a CW API).

||-||a loaded nova compute service may fall behind in this periodic task, especially if a high reporting cadence is configured||
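The inversion of control could look like the following sketch, where nova-compute is handed a publisher object and never needs to know which conduit sits behind it. All class and method names here are hypothetical:

```python
import abc
import json
import socket

class MeterPublisher(abc.ABC):
    """Conduit-agnostic API that nova-compute would call into."""

    @abc.abstractmethod
    def publish(self, counter):
        """Send one measurement; the conduit is an implementation detail."""

class UDPPublisher(MeterPublisher):
    """One possible conduit: fire-and-forget UDP datagrams."""

    def __init__(self, host, port):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def publish(self, counter):
        self._sock.sendto(json.dumps(counter).encode("utf-8"), self._addr)

class InMemoryPublisher(MeterPublisher):
    """Test double; the message bus or a direct CW call would slot in here."""

    def __init__(self):
        self.sent = []

    def publish(self, counter):
        self.sent.append(counter)
```

The point of the abstraction is that swapping conduits is a configuration choice, not a nova code change.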

<span id="option_4a"> 4a. Rename ceilometer-compute-agent to nova-compute-metering and move it into nova along with its pollster. Make it use the multi-publisher code from Ceilometer so that it can publish to a variety of destinations (ceilometer-collector, CW, …) according to configuration, polling on an interval that is configured via the publisher (as already discussed in the multi-publisher blueprints).

||+||no request/reply round-trip (unlike options #1 and #2)||
||+||maintained by nova, so internal changes don't break it||
||+||no need for hypervisor-specific code; the diagnostics can be abstracted||
||+||no lag||

<span id="option_5"> 5. nova packages a consumable library layered over the hypervisor driver, that just exposes the diagnostics available from libvirt et al. The ceilo compute agent continues to exist under the ceilo umbrella, but talks to the hypervisor directly via this stable, versioned nova library.

||+||no remote calls required from ceilo-->nova-{api|compute}||
||-||needs an independent versioning scheme||
||-||still stuck in the "implicit trust" model?||
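What such a consumable library might look like, with explicit versioning so that the ceilo agent can pin a compatible major revision. All names here are hypothetical, not an actual nova interface:

```python
# Hypothetical shape of the nova-shipped diagnostics library of option 5.
# The ceilo compute agent would import this instead of calling nova remotely.

API_VERSION = (1, 0)  # (major, minor): majors may break callers, minors may not

class DiagnosticsAPI:
    """Thin, stable facade layered over the hypervisor driver."""

    def __init__(self, driver):
        self._driver = driver  # e.g. the libvirt driver on this compute node

    def get_diagnostics(self, instance_id):
        return self._driver.get_diagnostics(instance_id)

def compatible(required_major):
    """Callers check the major version they were written against."""
    return API_VERSION[0] == required_major
```

This is where the "independent versioning scheme" cost shows up: the tuple above would have to be maintained and honoured separately from nova's own release cycle.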

The discussion has not yet reached a definitive conclusion, but there was definite push-back from the nova domain experts on direct use of nova RPC by the ceilo agent (as this is considered an internal API). We await further feedback from the nova team on their attitude to accepting the ceilometer compute agent into nova as a separate daemon to run on nova compute nodes.