
= Metering Architecture Proposal 1 =

This document is based on EfficientMetering#Architecture.

Goals

 * Provide efficient collection of metering data, in terms of CPU and network costs.
 * Allow deployers to integrate with the metering system directly or by replacing components.
 * Data may be collected by monitoring notifications sent from existing services or by polling the infrastructure.
 * Allow deployers to configure the type of data collected to meet their operating requirements.
 * The data collected by the metering system is made visible to some users through a REST API.
 * The metering messages are signed and non-repudiable (see http://en.wikipedia.org/wiki/Non-repudiation).

High Level Description
There are four basic components to the system:

 * 1) An agent runs on each compute node and polls for resource utilization statistics. There may be other types of agents in the future, but for now we will focus on creating the compute agent.
 * 2) The collector runs on one or more central management servers to monitor the message queues (for notifications and for metering data coming from the agent). Notification messages are processed and turned into metering messages and sent back out onto the message bus using the appropriate topic. Metering messages are written to the data store without modification.
 * 3) The data store is a database capable of handling concurrent writes (from one or more collector instances) and reads (from the API server).
 * 4) The API server runs on one or more central management servers to provide access to the data from the data store. See EfficientMetering#API for details.

These services communicate using the standard OpenStack messaging bus. Only the collector and API server have access to the data store.

Detailed Description
These details cover only the compute agent and collector, as well as their communication via the messaging bus. More work is needed before the data store and API server designs can be documented.

Plugins
Although we have described a list of the metrics ceilometer should collect, we cannot predict all of the ways deployers will want to measure the resources their customers use. This means that ceilometer needs to be easy to extend and configure so it can be tuned for each installation. A plugin system based on setuptools entry points will make it easy to add new monitors in the collector or subagents for polling.

Each daemon provides basic essential services in a framework to be shared by the plugins, and the plugins do the specialized work. As a general rule, the plugins should be asked to do as little work as possible. This will make them more efficient as greenlets, maximize code reuse, and make them simpler to implement.

Installing a plugin automatically activates it the next time the ceilometer daemon starts. A global configuration option can be used to disable installed plugins (for example, one or more of the "default" set of plugins provided as part of the ceilometer package).
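A minimal sketch of that loading step follows. It is an illustration, not ceilometer's code: the `load_plugins` helper and the `disabled` argument are hypothetical names, and the entry points are built by hand so the example runs without installing anything. It uses the standard library's `importlib.metadata`, which reads the same entry point metadata that setuptools writes.

```python
from importlib.metadata import EntryPoint


def load_plugins(entry_points, disabled=()):
    """Load every entry point whose registered name is not disabled.

    `disabled` stands in for the global configuration option that
    turns off installed plugins (hypothetical name).
    """
    plugins = {}
    for ep in entry_points:
        if ep.name in disabled:
            continue
        # load() imports the module and returns the named object.
        plugins[ep.name] = ep.load()
    return plugins


# Hand-built stand-ins that point at stdlib classes. In an installed
# system the list would come from the package metadata, for example
# importlib.metadata.entry_points(group="ceilometer.poll.compute")
# on Python 3.10 or later.
fake_entry_points = [
    EntryPoint("cpu", "collections:OrderedDict", "ceilometer.poll.compute"),
    EntryPoint("disk", "collections:Counter", "ceilometer.poll.compute"),
]

active = load_plugins(fake_entry_points, disabled={"disk"})
```

Here `active` contains only the `cpu` plugin, because `disk` was named in the disabled set.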

Plugins may require configuration options, so when the plugin is loaded it is asked to add options to the global flags object, and the results are made available to the plugin before it is asked to do any work.

Rather than running and reporting errors or simply consuming cycles for no-ops, plugins may disable themselves at runtime based on configuration settings defined by other components (for example, the plugin for polling libvirt would not run if it sees that the system is configured using some other virtualization tool). The plugin will be asked once at startup, after it has been loaded and given the configuration settings, if it should be enabled. Plugins should not define their own flags for enabling or disabling themselves.

Each plugin API is defined by the namespace and an abstract base class for the plugin instances. Plugins are not required to subclass from the API definition class, but it is encouraged as a way to discover API changes.
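The plugin lifecycle described above (register options, get asked once whether to run, then do the work) could look roughly like the sketch below. Every name here is hypothetical: `PollsterBase`, `register_options`, `is_enabled`, `get_counters`, and the `hypervisor` and `libvirt_uri` settings are invented for illustration, and a plain dict stands in for the global flags object.

```python
import abc


class PollsterBase(abc.ABC):
    """Hypothetical API definition class for one plugin namespace."""

    def register_options(self, conf):
        """Add plugin-specific options to the shared configuration."""

    def is_enabled(self, conf):
        """Asked once at startup, after configuration is available."""
        return True

    @abc.abstractmethod
    def get_counters(self, conf):
        """Return an iterable of counters."""


class LibvirtPollster(PollsterBase):
    def register_options(self, conf):
        conf.setdefault("libvirt_uri", "qemu:///system")  # hypothetical option

    def is_enabled(self, conf):
        # Decided from a setting owned by another component, not from
        # an enable/disable flag defined by the plugin itself.
        return conf.get("hypervisor") == "libvirt"  # hypothetical setting

    def get_counters(self, conf):
        return []


def activate(plugins, conf):
    """Let each plugin register options, then keep the enabled ones."""
    for plugin in plugins:
        plugin.register_options(conf)
    return [plugin for plugin in plugins if plugin.is_enabled(conf)]
```

Subclassing the abstract base class means a plugin that misses a newly required method fails when it is instantiated, which is one way the base class helps plugin authors discover API changes.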

Note: There is ongoing work to add a generic plugin system to Nova. If that is implemented as part of the common library, we should try to use it (or adapt it as necessary for our use). If it remains part of Nova for Folsom, we should probably not depend on it, because loading plugins is trivial with setuptools.

Polling
Metering data comes from two sources: through notifications built into the existing OpenStack components and by polling the infrastructure (such as via libvirt). Polling is handled by an agent running on the compute node (where communication with the hypervisor is more efficient).

The agent daemon is configured to run one or more pollster plugins using the `ceilometer.poll.compute` namespace. The agent periodically asks each pollster for `Counter` instances. The agent framework converts each `Counter` to a metering message, which it then signs and transmits on the metering message bus.
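One polling cycle might be sketched as follows. The `Counter` fields, the `FakeCpuPollster` class, and the `publish` callable are assumptions made for the example; signing is left out here, and a list stands in for the message bus.

```python
from collections import namedtuple

# Hypothetical field set; the proposal does not define Counter's fields.
Counter = namedtuple("Counter", "name resource_id volume")


class FakeCpuPollster:
    """Stand-in pollster that reports one fixed sample."""

    def get_counters(self):
        yield Counter("cpu", "instance-0001", 1250)


def poll_once(pollsters, publish):
    """Ask every pollster for Counters and publish each as a message.

    The pollsters never touch the bus; only the framework publishes.
    """
    sent = 0
    for pollster in pollsters:
        for counter in pollster.get_counters():
            publish(dict(counter._asdict()))
            sent += 1
    return sent


outbox = []  # stand-in for the metering message bus
poll_once([FakeCpuPollster()], outbox.append)
```

The agent would call `poll_once` from its periodic task, at the single global frequency.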

The pollster plugins should not communicate with the message bus directly, unless it is necessary to do so in order to collect the information for which they are polling.

All polling happens with the same frequency, controlled by a global setting for the agent. If we need to support polling different meters at different rates, we can investigate that in a future release.

Handling Notifications
The heart of the system is the collector, which monitors the message bus for data being provided by the pollsters via the agent as well as notification messages from other OpenStack components such as nova, glance, quantum, and swift.

The collector loads one or more listener plugins, using a namespace under `ceilometer.collector`. The namespace controls the exchange and topic where the listener is subscribed. For example, `ceilometer.collector.compute` listens on the `nova` exchange to the `notifications.info` topic while `ceilometer.collector.image` listens on the `glance` exchange for `notifications.info`.
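The relationship between namespace and subscription can be pictured as a lookup table. The two entries below come from the examples in the text; the `SUBSCRIPTIONS` table and the `subscription_for` helper are hypothetical names used only for this sketch.

```python
# Namespace -> (exchange, topic), from the two examples in the text.
SUBSCRIPTIONS = {
    "ceilometer.collector.compute": ("nova", "notifications.info"),
    "ceilometer.collector.image": ("glance", "notifications.info"),
}


def subscription_for(namespace):
    """Return the (exchange, topic) pair a listener namespace uses."""
    try:
        return SUBSCRIPTIONS[namespace]
    except KeyError:
        raise ValueError("no subscription defined for %r" % namespace) from None
```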

The plugin provides a method to list the event types it wants and a callback for processing incoming messages. The registered name of the callback is used to enable or disable it using the global configuration option of the collector daemon. The incoming messages are filtered based on their event type value before being passed to the callback so the plugin only receives events it has expressed an interest in seeing. For example, a callback asking for `compute.instance.create.end` events under `ceilometer.collector.compute` would be invoked for those notification events on the `nova` exchange using the `notifications.info` topic.

The callback should return an iterable with zero or more `Counter` instances based on the data in the incoming message. The collector framework code converts the `Counter` instances to metering messages and publishes them on the metering message bus. Although we will provide a default storage solution to work with the API service, by republishing on the metering message bus we can support installations that want to handle their own data storage.
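The filtering and callback flow could be sketched like this. The method names `get_event_types` and `process_notification`, the `Counter` fields, and the shape of the notification dict (`event_type` and `payload` keys) are assumptions for illustration, and a list stands in for the metering message bus.

```python
from collections import namedtuple

# Hypothetical field set; the proposal does not define Counter's fields.
Counter = namedtuple("Counter", "name resource_id volume")


class InstanceCreateListener:
    """Stand-in listener plugin for one compute event type."""

    def get_event_types(self):
        return ["compute.instance.create.end"]

    def process_notification(self, message):
        # Zero or more Counters may be returned for each message.
        payload = message["payload"]
        return [Counter("instance", payload["instance_id"], 1)]


def dispatch(listeners, message, publish):
    """Pass a notification only to listeners that asked for its type."""
    event_type = message.get("event_type")
    for listener in listeners:
        if event_type not in listener.get_event_types():
            continue
        for counter in listener.process_notification(message):
            publish(dict(counter._asdict()))


published = []  # stand-in for the metering message bus
dispatch(
    [InstanceCreateListener()],
    {"event_type": "compute.instance.create.end",
     "payload": {"instance_id": "instance-0001"}},
    published.append,
)
```

Because the framework filters on event type before calling the plugin, the callback itself needs no type checks.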

Handling Metering Messages
The listener for metering messages also runs in the collector. It validates the incoming data and, if the signature is valid, writes the messages to the data store. (Note: because this listener is different, it may be implemented directly in the collector code instead of as a plugin. In fact, we might decide to put this in its own daemon entirely, but for now it seems OK to keep it in the collector process.)

Metering messages are signed using the hmac module in Python's standard library. A shared secret value can be provided in the ceilometer configuration settings. The messages are signed by feeding the message key names and values into the signature generator in sorted order. Non-string values are converted to unicode and then encoded as UTF-8. The message signature is included in the message for verification by the collector, and stored in the database for future verification by consumers who access the data via the API.
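A minimal sketch of that signing scheme, written for Python 3 where `str` is already Unicode. The hash algorithm (SHA-256) and the `message_signature` field name are assumptions, since the text specifies neither, and the helper names are hypothetical.

```python
import hashlib
import hmac

SIGNATURE_FIELD = "message_signature"  # hypothetical field name


def compute_signature(message, secret):
    """Feed key names and values into the HMAC in sorted key order."""
    # SHA-256 is an assumption; the proposal does not name a hash.
    digest = hmac.new(secret.encode("utf-8"), digestmod=hashlib.sha256)
    for name, value in sorted(message.items()):
        if name == SIGNATURE_FIELD:
            continue  # the signature never signs itself
        if not isinstance(value, str):
            value = str(value)  # non-string values become text first
        digest.update(name.encode("utf-8"))
        digest.update(value.encode("utf-8"))
    return digest.hexdigest()


def sign(message, secret):
    """Return a copy of the message with its signature attached."""
    signed = dict(message)
    signed[SIGNATURE_FIELD] = compute_signature(message, secret)
    return signed


def verify(message, secret):
    """Recompute the signature and compare in constant time."""
    expected = compute_signature(message, secret)
    received = str(message.get(SIGNATURE_FIELD, ""))
    return hmac.compare_digest(expected.encode("utf-8"),
                               received.encode("utf-8"))
```

For example, `verify(sign({"name": "cpu", "volume": 1250}, "s3cret"), "s3cret")` returns `True`, while changing any value or using a different secret makes it return `False`. Sorting the keys makes the signature independent of the order in which the message was built.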

RPC
Until RPC services are moved into the openstack-common library we will use the version in nova.

Implementation Outline
These should eventually be moved into tickets or a blueprint.


 * 1) Implement a Counter class
   * use a namedtuple?
 * 2) Library code to convert Counter instances to meter messages
 * 3) Library code to sign meter messages
 * 4) Library code to validate signature of metering messages
 * 5) Library code to publish meter messages
 * 6) Create the framework for the agent daemon to run on the compute node.
   * Start the service
   * Load the plugins and determine which are active
   * Schedule the periodic task to poll the plugins
   * Publish meter messages for the data returned by the polling plugins
 * 7) Create the framework for the collector daemon to run on the management node.
   * Start the service
   * Load the plugins and determine which are active (different rules from the polling plugins?)
   * Establish a callback for each listener and event type
   * Publish meter messages for the data returned by the listener plugins
 * 8) Update collector to handle metering messages
   * Listen for metering messages (are these just "cast" calls via RPC?)
   * On receipt of message, check the signature and "store" it (for now, just write it to the log)