Ceilometer and Healthnmon


Introduction

This page discusses the overlap and the differences between Ceilometer and Healthnmon, and looks at whether the two can be unified to avoid duplicated effort.

Ceilometer

Ceilometer aims to become the infrastructure to collect measurements within OpenStack so that no two agents would need to be written to collect the same data. Its primary targets are monitoring and metering, but the framework should be easily expandable to collect for other needs. To that effect, Ceilometer should be able to share collected data with a variety of consumers.

See http://wiki.openstack.org/Ceilometer for detailed information about Ceilometer.

Healthnmon

Healthnmon aims to deliver a "Cloud Resource Monitor": an extensible service for the OpenStack cloud operating system that monitors cloud resources and infrastructure through a pluggable framework for "Inventory Management", "Alerts and notifications" and "Utilization Data".

See the following pages for the detailed design of Healthnmon:

http://wiki.openstack.org/CloudInventoryManager

http://wiki.openstack.org/ResourceMonitorAlertsandNotifications

http://wiki.openstack.org/utilizationdata

Data Model

Ceilometer and Healthnmon collect overlapping data, e.g. VM instances, VM instance CPU/disk/network utilization, network subnets/ports, floating IPs and storage volumes. The data models behind them, however, are quite different.

Ceilometer

The data model in Ceilometer is based on the metering message, which contains the following user-visible attributes. See here for the full list of meters that Ceilometer currently provides.

>counter_name

>counter_type

>counter_volume

>user_id

>project_id

>resource_id

>resource_metadata

>timestamp
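As a rough illustration, the attributes listed above can be pictured as a single flat record. The class name, the field types and the example values below are assumptions made for this sketch; they are not taken from the Ceilometer code base.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class MeteringMessage:
    """Illustrative container for the user-visible attributes (hypothetical class)."""
    counter_name: str
    counter_type: str
    counter_volume: float
    user_id: str
    project_id: str
    resource_id: str
    timestamp: datetime
    # Free-form details about the resource; the keys are not fixed by the model.
    resource_metadata: Dict[str, Any] = field(default_factory=dict)


# Example values are made up for illustration only.
sample = MeteringMessage(
    counter_name="cpu_util",
    counter_type="gauge",
    counter_volume=12.5,
    user_id="user-1",
    project_id="project-1",
    resource_id="instance-0001",
    timestamp=datetime(2013, 1, 1, tzinfo=timezone.utc),
    resource_metadata={"host": "compute-1"},
)
```

Because every measurement has the same flat shape, a new meter only needs a new counter_name; nothing in the record structure has to change.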

Healthnmon

The data model in healthnmon is based on the resources it wants to monitor. Here is the UML diagram copied from its wiki page.

[Figure: Healthnmon resource model, UML diagram (HnmResourceModel.gif)]

Comparison and ways to Unify

Ceilometer's data model is simpler and more generic than Healthnmon's. Adding a new measurement is easy: publishing it to the data storage is enough, with no need to modify the DB schema, the DB API or the user API. Adding a new measurement to Healthnmon's data model takes much more effort: the resource model and DB schema must change, and both the DB APIs and the user APIs must be modified.

On the other hand, Ceilometer does not record the relationship between different measurements. To get all the measurements related to a specific VM instance (CPU usage, disk I/O usage, network information, storage volumes, …), users need extra post-processing of the resource_metadata. In Healthnmon the relationship is already part of the data model, so this is much easier.
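The kind of post-processing involved might look like the following minimal sketch. It assumes that a measurement taken on a related resource (for example a network port) names its VM under an "instance_id" key in resource_metadata; that key, the sample values and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical samples; the "instance_id" metadata key is an assumption.
samples = [
    {"counter_name": "cpu_util", "resource_id": "vm-1", "resource_metadata": {}},
    {"counter_name": "disk.read.bytes", "resource_id": "vm-1", "resource_metadata": {}},
    {"counter_name": "network.incoming.bytes", "resource_id": "tap-42",
     "resource_metadata": {"instance_id": "vm-1"}},
    {"counter_name": "cpu_util", "resource_id": "vm-2", "resource_metadata": {}},
]


def meters_by_instance(samples):
    """Group meter names by the VM instance they relate to."""
    grouped = defaultdict(set)
    for s in samples:
        # Prefer an instance reference in the metadata, else use resource_id.
        instance = s["resource_metadata"].get("instance_id", s["resource_id"])
        grouped[instance].add(s["counter_name"])
    return {instance: sorted(names) for instance, names in grouped.items()}


result = meters_by_instance(samples)
```

In a resource-centric model such as Healthnmon's, this grouping step is unnecessary because each measurement already hangs off the resource it belongs to.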

The difference between the two data models may stem from their origins: Ceilometer was originally designed for metering, whereas Healthnmon was designed for monitoring.

It is not yet clear whether a single data model can be both simple to extend and convenient for querying the relationship between different measurements.

Data collection/storage mechanism

Ceilometer

Ceilometer collects measurement data in two different ways:

agents periodically poll the data
OpenStack RabbitMQ notification listeners consume the notification events from other OpenStack services (e.g. nova, glance, quantum, cinder).

After the data is collected, it is sent from the agents or notification listeners to the Ceilometer collector by RPC calls, and the collector writes it into the storage DB. Currently, Ceilometer supports either MongoDB or SQLAlchemy as its DB backend. (Besides being sent to the Ceilometer collector, the raw data can also be transformed into new data types and published to other data sinks, based on configuration.)

The Ceilometer agents don't access the DB directly; instead they talk to other OpenStack services (nova, glance, quantum, cinder, etc.) through RESTful APIs.
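The flow described above can be sketched in-process as follows. This is only an illustration under simplifying assumptions: plain function calls stand in for the RPC transport, a Python list stands in for the storage DB, and the names Collector, Pipeline and to_percent are invented for the example.

```python
class Collector:
    """Stands in for the collector; keeps samples in a list instead of a DB."""

    def __init__(self):
        self.storage = []

    def record(self, sample):
        self.storage.append(sample)


class Pipeline:
    """Applies an optional transformer, then publishes to every configured sink."""

    def __init__(self, sinks, transformer=None):
        self.sinks = sinks
        self.transformer = transformer

    def publish(self, sample):
        if self.transformer is not None:
            sample = self.transformer(sample)
        for sink in self.sinks:
            sink(sample)


def to_percent(sample):
    """Hypothetical transformer: rescale a 0..1 fraction into a percentage."""
    return {**sample, "counter_volume": sample["counter_volume"] * 100}


collector = Collector()
other_sink = []  # any additional data sink selected by configuration

raw_pipeline = Pipeline(sinks=[collector.record])
derived_pipeline = Pipeline(sinks=[other_sink.append], transformer=to_percent)

# A polling agent and a notification listener feed the same pipelines.
incoming = [
    {"counter_name": "cpu_util", "counter_volume": 0.25, "source": "poll"},
    {"counter_name": "cpu_util", "counter_volume": 0.5, "source": "notification"},
]
for sample in incoming:
    for pipeline in (raw_pipeline, derived_pipeline):
        pipeline.publish(sample)
```

The raw samples reach the collector unchanged, while the transformed copies go to the extra sink, so a new consumer can be added without touching the agents.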

Healthnmon

There is no agent/collector/notification listener separation in Healthnmon. A central collector polls every running nova compute node (using libvirt and ssh) to collect data, and saves the data to its SQLAlchemy DB backend. This central collector also needs access to the nova DB to get the status of the nova compute nodes.

Comparison and ways to Unify

The current Ceilometer agents could be extended to collect the data that Healthnmon polls for, and could publish it to Healthnmon, provided a Healthnmon data sink is available.

Physical Device Monitoring

See http://wiki.openstack.org/Ceilometer/MonitoringPhysicalDevices for the requirements for physical device monitoring.

Currently, Healthnmon can only monitor the physical servers on which nova compute is running. These are referred to as "VmHosts" in Healthnmon. The data is polled from the target server using libvirt + ssh.

We think it would be better to have a unified way of monitoring physical servers within OpenStack: an agent running on each physical server periodically polls the data and sends it back to the collectors to be stored.
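A minimal sketch of such an agent loop is given here. The function names, the metric names and the fixed readings are placeholders; a real agent would query the host and send each reading to a collector over the message bus rather than append it to a list.

```python
import time


def run_agent(read_metrics, send, interval_s, iterations):
    """Poll read_metrics every interval_s seconds and forward each reading."""
    for _ in range(iterations):
        reading = dict(read_metrics())
        reading["timestamp"] = time.time()
        send(reading)
        time.sleep(interval_s)


def read_host_metrics():
    """Placeholder reader; a real agent would query the physical server here."""
    return {"host": "server-1", "cpu_util": 10.0}


received = []  # stands in for the collector side
run_agent(read_host_metrics, received.append, interval_s=0.0, iterations=3)
```

Keeping the reader and the sender as parameters means the same loop could serve different kinds of physical devices, with only the reader changing.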

Notifications

Ceilometer consumes the notification events from other OpenStack services (e.g. nova, glance, quantum) to generate measurement data.

In contrast, Healthnmon publishes its own notification events to the OpenStack RabbitMQ. These notification events are mainly life cycle events for cloud resources, e.g. Vm.Created, Vm.StateChanged, StorageVolume.Added, Network.Enabled.
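A consumer of such life cycle events could dispatch on the event type, as in the sketch below. It assumes each notification is a dictionary with "event_type" and "payload" entries; the payload field "vm_id" and the helper names are hypothetical, and no message queue is involved in the example.

```python
# Registry mapping an event type to the handlers interested in it.
_handlers = {}


def on(event_type):
    """Decorator that registers a handler for one event type."""
    def register(func):
        _handlers.setdefault(event_type, []).append(func)
        return func
    return register


def dispatch(notification):
    """Call every handler registered for the notification's event type."""
    handlers = _handlers.get(notification["event_type"], [])
    for handler in handlers:
        handler(notification["payload"])
    return len(handlers)


created_vms = []


@on("Vm.Created")
def remember_vm(payload):
    created_vms.append(payload["vm_id"])  # "vm_id" is a hypothetical field


handled = dispatch({"event_type": "Vm.Created", "payload": {"vm_id": "vm-1"}})
ignored = dispatch({"event_type": "Network.Enabled", "payload": {}})
```

Event types without a registered handler are simply skipped, which is how a listener can subscribe to a shared notification stream while acting on only a few event types.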
