Ceilometer/CeilometerAndHealthnmon

Ceilometer and Healthnmon

= Introduction =

We'd like to discuss the duplication, as well as the differences, between Ceilometer and Healthnmon here and see if there's a way to unify them to avoid duplicated effort.
 * Edited: Divakar Padiyar Nandavar

Ceilometer
Ceilometer aims to become the infrastructure to collect measurements within OpenStack so that no two agents would need to be written to collect the same data. Its primary targets are monitoring and metering, but the framework should be easily expandable to collect for other needs. To that effect, Ceilometer should be able to share collected data with a variety of consumers.

See http://wiki.openstack.org/Ceilometer for detailed information about Ceilometer.

Healthnmon
Healthnmon aims to deliver a "Cloud Resource Monitor", an extensible service for the OpenStack cloud operating system that provides monitoring of cloud resources and infrastructure through a pluggable framework for "Inventory Management", "Alerts and notifications" and "Utilization Data".

See the following for the detailed design of Healthnmon:

http://wiki.openstack.org/CloudInventoryManager

http://wiki.openstack.org/ResourceMonitorAlertsandNotifications

http://wiki.openstack.org/utilizationdata

= Data Model =

There is duplication in the data collected by Ceilometer and Healthnmon, e.g. VM instances, VM instance CPU/disk/network utilization data, network subnets/ports, floating IPs and storage volumes. However, the data models behind them are quite different.

Ceilometer
The data model in Ceilometer is based on the metering message, which contains the following user-visible attributes. See here for the full list of all the available meters which Ceilometer currently provides.

Healthnmon
The data model in Healthnmon is based on the resources it wants to monitor. Here is the UML diagram copied from its wiki page.



Comparison and ways to Unify
Ceilometer's data model is more generic and simpler than the one used in Healthnmon. Adding a new measurement is easy: publishing it into the data storage is enough, with no need to modify the DB schema, the DB API or the user APIs. In contrast, adding a new measurement to Healthnmon's data model requires much more effort, including changing the resource model and DB schema and modifying both the DB APIs and the user APIs.

In another usage scenario, however, users may want the relationships between different measurements, e.g. all the available measurements related to a specific VM instance, such as CPU usage, disk IO usage, network information and storage volumes. In Ceilometer this would require extra post-processing of the 'resource_metadata' to find all the measurements, whereas in Healthnmon the relationship is already in the data model, so it is much easier.
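To make the trade-off concrete, here is a minimal sketch of the post-processing a consumer might need on the Ceilometer side. The sample fields (`resource_id`, `counter_name`, `counter_volume`), the `instance_id` metadata key and the meter names are illustrative assumptions, not the exact Ceilometer message schema. The point is only that related measurements have to be found by inspecting each sample, whereas a resource-centric model stores the relationship directly.

```python
# Hypothetical flat samples, one per measurement (all field names are assumptions).
samples = [
    {"resource_id": "vm-1", "counter_name": "cpu_util",
     "counter_volume": 12.5, "resource_metadata": {}},
    {"resource_id": "vol-7", "counter_name": "volume.size",
     "counter_volume": 20, "resource_metadata": {"instance_id": "vm-1"}},
    {"resource_id": "vm-2", "counter_name": "cpu_util",
     "counter_volume": 80.0, "resource_metadata": {}},
]


def measurements_for_instance(flat_samples, instance_id):
    """Collect samples about the instance itself or about resources whose
    metadata points back at it (the extra post-processing step)."""
    related = []
    for sample in flat_samples:
        owner = sample["resource_metadata"].get("instance_id")
        if sample["resource_id"] == instance_id or owner == instance_id:
            related.append((sample["resource_id"],
                            sample["counter_name"],
                            sample["counter_volume"]))
    return related


vm1 = measurements_for_instance(samples, "vm-1")
print(vm1)
```

In a resource-centric model the equivalent lookup would be a plain attribute access on the VM object, at the cost of a schema change whenever a new kind of measurement is added.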

The difference between the two data models may come from the fact that Ceilometer was originally designed for metering, while Healthnmon was designed for monitoring.

It is not yet clear whether we can have a data model that is both simple to extend and makes it easy to get the relationships between different measurements.

''The ceilometer API reports the names of all of the meters being collected for a resource as part of the resource return value. - dhellmann''

= Data collection/storage mechanism =

Ceilometer
Ceilometer collects measurement data in two ways:
 * agents periodically poll the data
 * OpenStack RabbitMQ notification listeners consume the notification events from other OpenStack services (e.g. nova, glance, quantum, cinder).

After the data is collected, it is sent from the agents or notification listeners to the Ceilometer collector by RPC calls, and the collector writes it into the storage DB. Currently, Ceilometer supports both MongoDB and SQLAlchemy as its DB backend. (Besides being sent to the Ceilometer collector, the raw data can also, depending on configuration, be transformed into a new data format and published to other data sinks, e.g. CloudWatch.)

The Ceilometer agents don't access the DB directly; instead they talk to other OpenStack services (nova, glance, quantum, cinder, etc.) through their RESTful APIs.
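The flow described above can be sketched in a few lines. This is only an in-process illustration: a `queue.Queue` stands in for the RPC transport, a list stands in for the storage DB, and every function name and measurement field is hypothetical rather than taken from the Ceilometer code base.

```python
import queue

rpc_bus = queue.Queue()  # stand-in for the RPC transport to the collector
storage = []             # stand-in for the storage DB backend


def polling_agent(poll_fn):
    """Polling path: run a pollster once and forward each measurement."""
    for measurement in poll_fn():
        rpc_bus.put(measurement)


def notification_listener(event):
    """Notification path: turn one service notification into a measurement."""
    rpc_bus.put({"name": event["event_type"],
                 "resource_id": event["payload"]["id"],
                 "volume": 1})


def collector_drain():
    """Collector: write everything received so far into storage."""
    while not rpc_bus.empty():
        storage.append(rpc_bus.get())


def fake_cpu_pollster():
    # Hypothetical pollster; a real one would query a service's REST API.
    return [{"name": "cpu_util", "resource_id": "vm-1", "volume": 12.5}]


polling_agent(fake_cpu_pollster)
notification_listener({"event_type": "compute.instance.create.end",
                       "payload": {"id": "vm-2"}})
collector_drain()
print(len(storage))
```

Both paths converge on the same collector, which is why neither the agents nor the listeners need direct access to the DB.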

Healthnmon
There is no separation between agents, collector and notification listeners in Healthnmon. Instead, a central collector polls every running nova compute node (using libvirt and ssh) to collect data, and saves the data to its SQLAlchemy DB backend. This central collector needs direct access to the nova DB to get the status of the nova compute nodes.

Comparison and ways to Unify
The current Ceilometer agents can be extended to collect the polling data for Healthnmon and can publish it to Healthnmon, as long as we have a Healthnmon data sink.

''There is some ongoing work on providing different data sinks for ceilometer, so this shouldn't be a roadblock at all. See https://blueprints.launchpad.net/ceilometer/+spec/multi-publisher for details. - dhellmann''
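As a rough illustration of what publishing to more than one data sink could look like, the sketch below routes each sample through a transformer to a sink. The `ListSink` class, the `to_healthnmon_format` transformer and the output field names are all hypothetical; they do not reflect the actual multi-publisher blueprint or any Healthnmon interface.

```python
class ListSink:
    """Hypothetical data sink that simply records what it receives."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def publish(self, sample):
        self.received.append(sample)


def identity(sample):
    """Pass the sample through unchanged (e.g. for the Ceilometer collector)."""
    return sample


def to_healthnmon_format(sample):
    """Hypothetical transformer: reshape a sample for a resource-centric consumer."""
    return {"resource": sample["resource_id"],
            "metric": sample["name"],
            "value": sample["volume"]}


def publish_all(sample, routes):
    """Send one sample to every (transformer, sink) pair."""
    for transform, sink in routes:
        sink.publish(transform(sample))


collector_sink = ListSink("collector")
healthnmon_sink = ListSink("healthnmon")
routes = [(identity, collector_sink), (to_healthnmon_format, healthnmon_sink)]

publish_all({"name": "cpu_util", "resource_id": "vm-1", "volume": 12.5}, routes)
print(healthnmon_sink.received)
```

Keeping the transformer separate from the sink means the agents themselves would not need to know anything about the Healthnmon data model.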

= Physical Device Monitoring =

See http://wiki.openstack.org/Ceilometer/MonitoringPhysicalDevices for the requirements for physical device monitoring.

Currently, Healthnmon can only monitor the physical servers on which nova compute is running. These are referred to as "VmHosts" in Healthnmon. The data is polled from the target server using libvirt + ssh.

We think it would be better to have a unified way of monitoring physical servers within OpenStack, with an agent running on each physical server that periodically polls the data and sends it back to the collectors to be stored.

= Notifications =

Ceilometer consumes the notification events from other OpenStack services (e.g. nova, glance, quantum) to generate measurement data.

In contrast, Healthnmon generates its own notification events and sends them to the OpenStack RabbitMQ. These notification events are mainly life cycle events for cloud resources, e.g. Vm.Created, Vm.StateChanged, StorageVolume.Added, Network.Enabled, etc.
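One plausible way to derive such life cycle events, shown here purely as a sketch, is to compare two successive inventory snapshots. The source does not say that Healthnmon works this way. The snapshot shape and payload fields are assumptions, and only the event type names `Vm.Created` and `Vm.StateChanged` come from the list above.

```python
def lifecycle_events(previous, current):
    """Compare two {vm_id: state} snapshots and return life cycle events."""
    events = []
    for vm_id, state in sorted(current.items()):
        if vm_id not in previous:
            events.append({"event_type": "Vm.Created",
                           "vm_id": vm_id, "state": state})
        elif previous[vm_id] != state:
            events.append({"event_type": "Vm.StateChanged",
                           "vm_id": vm_id, "state": state})
    return events


events = lifecycle_events({"vm-1": "ACTIVE"},
                          {"vm-1": "PAUSED", "vm-2": "ACTIVE"})
for event in events:
    print(event["event_type"], event["vm_id"])
```

A consumer such as a Ceilometer notification listener could then subscribe to these event types in the same way it does for events from other services.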

= Integration plan =

The following integration plan was agreed at the Ceilometer meeting of Dec 27th, 2012:

[nijaba] takes the action item to get in touch with the Healthnmon team to see what their reaction is to our plan for integration

(a) Implement the missing meters in Ceilometer

 * 1) List the detailed information items that Ceilometer should get from libvirt for Healthnmon.
 * 2) Decide which meters are missing in Ceilometer in order to group the information items from step 1.
 * 3) Implement the missing meters.

(b) Integrate Healthnmon through multi-publisher

 * 1) Implement the publisher/transformers to publish the meters to Healthnmon.
 * 2) Healthnmon may need to define the communication interface (or API) to allow Ceilometer to publish its data?

'''The following integration plan is proposed by the Healthnmon team'''

Healthnmon as the source of metering data (for compute to start with)

 * Healthnmon has currently implemented drivers for KVM that remotely collect the required meter data for both the compute nodes and the instances running on them. As an initial step, the same data can be leveraged by Ceilometer through the Healthnmon APIs
 * Ceilometer Centralized Agent mechanism can be leveraged to pull the required metering data from Healthnmon
 * Healthnmon to implement a configurable consumption based model that can push data per consumer requirements (instead of a pull from consumer)
 * Ceilometer Collector mechanism can be leveraged to consume the required metering data from Healthnmon
 * Healthnmon to provide the drivers for KVM, vCenter ESX and Hyper-V in the Grizzly timeline
 * For metering, minor modifications in healthnmon and ceilometer might be required for initial integration
 * Implement the missing meters required by Ceilometer in Healthnmon
   * 1) List the detailed information items that Healthnmon should get from libvirt for Ceilometer.
   * 2) Decide which meters are missing in Healthnmon in order to group the information items from step 1.
   * 3) Implement the missing meters.
 * Ceilometer to consume the metering data through Healthnmon APIs
 * Ceilometer APIs to access the metering data using Healthnmon APIs
 * Healthnmon to extend the data model to accommodate additional metering attributes from Ceilometer for handling compute, swift, glance, quantum as a next step