This page describes the Watcher components and how they interoperate. It is currently a work in progress.

Components

This page lists the different technical components of the Watcher module. The components are sorted in alphabetical order.

CEP: Complex Event Processing

CEP solutions have been developed to address the requirements of applications that analyze and react to events in real time.

A CEP engine can analyze, aggregate, correlate, and filter events, detect short-lived opportunities or optimizations, and respond to them as quickly as possible by triggering actions.

For performance reasons, several instances of the CEP engine may run simultaneously, each instance processing a certain type of metric or event. A classical stateless load-balancing mechanism (with every CEP instance hosting the same rules) cannot be used, because some metrics/events are linked in time and the CEP must be able to build coherent streams. Therefore, a given type of event/metric must always be sent to the same CEP instance. This is why the Watcher module will provide several metrics collector endpoints.
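As a minimal sketch of this "same type, same instance" routing, assuming a static table from metric type to collector endpoint (the endpoint URLs and metric names below are hypothetical, not part of Watcher):

```python
# Hypothetical mapping: each metric/event type is pinned to one CEP endpoint,
# so that time-linked samples of the same type always reach the same instance.
ENDPOINT_BY_METRIC_TYPE = {
    "cpu_util": "tcp://cep-0.example:5555",
    "memory.usage": "tcp://cep-0.example:5555",
    "network.incoming.bytes": "tcp://cep-1.example:5555",
}


def endpoint_for(metric_type, table=ENDPOINT_BY_METRIC_TYPE):
    """Return the single endpoint that handles this metric/event type."""
    try:
        return table[metric_type]
    except KeyError:
        # Fail loudly rather than fall back to round-robin, which would
        # split a stream across instances.
        raise LookupError("no CEP endpoint configured for %r" % metric_type)
```

An explicit table is used here rather than a load balancer because the routing decision must depend only on the metric type, never on the current load.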

In the Watcher system, the CEP will trigger two types of actions:

write relevant events/metrics into the Time Series database. Only the metrics relevant to Watcher must be stored in the time series database. The CEP can also consolidate/aggregate metrics/events and generate the higher-level metrics/events needed by the Watcher Decision Engine.
send relevant events to the Watcher Decision Engine whenever such an event may influence the result of the current optimization strategy, since an OpenStack cluster is not a static system. For instance, it may be important to notify the Decision Engine that a new compute node has been added to the cluster or that a new instance has been requested by a customer project.
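The two kinds of action can be illustrated with a small in-memory sketch. The class name, the event names, and the fixed-size averaging window are all assumptions made for illustration; a real CEP engine would use its own rule language.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical event names that the Decision Engine cares about.
DECISION_ENGINE_EVENTS = {"compute_node.added", "instance.requested"}


class CepSketch:
    def __init__(self, window_size=3):
        self.window_size = window_size
        self._windows = defaultdict(list)
        self.time_series = []            # stand-in for the Time Series database
        self.decision_engine_inbox = []  # stand-in for the Decision Engine

    def on_metric(self, resource, name, value):
        """Aggregate raw samples and store only the higher-level metric."""
        window = self._windows[(resource, name)]
        window.append(value)
        if len(window) == self.window_size:
            self.time_series.append((resource, name + ".avg", mean(window)))
            window.clear()

    def on_event(self, event_type, payload):
        """Forward only the events that may affect the current optimization."""
        if event_type in DECISION_ENGINE_EVENTS:
            self.decision_engine_inbox.append((event_type, payload))
```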

Time Series Database

This database stores all the timestamped information about the history of the cluster and its resources (state, metrics, events, ...). The Decision Engine uses this information to determine, at any time, the current cluster state and its evolution over a given time frame.
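The two queries mentioned here (current state, and evolution over a time frame) might look like the following in-memory sketch. The class and method names are hypothetical, and samples are assumed to arrive in non-decreasing timestamp order.

```python
from bisect import bisect_left, bisect_right


class SeriesSketch:
    """One timestamped series, kept sorted by timestamp."""

    def __init__(self):
        self._times = []
        self._values = []

    def append(self, timestamp, value):
        # Assumption: callers append in non-decreasing timestamp order.
        if self._times and timestamp < self._times[-1]:
            raise ValueError("out-of-order sample")
        self._times.append(timestamp)
        self._values.append(value)

    def latest(self):
        """Most recent (timestamp, value), or None if the series is empty."""
        if not self._times:
            return None
        return (self._times[-1], self._values[-1])

    def between(self, start, end):
        """All samples with start <= timestamp <= end."""
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))
```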

Watcher Actions Applier

This component is in charge of executing the plan of actions built by the Decision Engine.

If the action plan is described as a BPMN 2.0 workflow, this component could be a BPMN workflow engine.

For each action of the workflow, this component may either call the component responsible for that kind of action directly (for example, the Nova API for an instance migration) or reach it indirectly through a publish/subscribe pattern on the message bus (asynchronous workers).

It continuously reports the progress of each ongoing Action Plan (and of its atomic Actions) by sending status messages on the bus. The CEP may use these events to trigger new actions (or even a new audit if automatic mode is enabled).

This component is also connected to the Watcher Database in order to:

get the description of the action plan to execute
persist its current state so that, if it is restarted, it can restore the context of each Action Plan and resume from the last known safe point of each ongoing workflow.
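The restart behaviour described above can be sketched as a checkpoint written after every completed action. This is only an illustration: `InMemoryStore` stands in for the Watcher Database, and the state layout (`next_index`) is an assumption.

```python
import json


class InMemoryStore:
    """Stand-in for the Watcher Database (hypothetical interface)."""

    def __init__(self):
        self._rows = {}

    def save(self, plan_id, state):
        self._rows[plan_id] = json.dumps(state)

    def load(self, plan_id):
        raw = self._rows.get(plan_id)
        return json.loads(raw) if raw is not None else None


def run_plan(plan_id, actions, execute, store):
    """Execute actions in order, resuming after the last persisted one."""
    state = store.load(plan_id) or {"next_index": 0}
    for index in range(state["next_index"], len(actions)):
        execute(actions[index])
        # Safe point: the action finished, so record where to resume.
        state["next_index"] = index + 1
        store.save(plan_id, state)
    return state
```

Note that an action interrupted midway is executed again on restart, so in this sketch actions are assumed to be safe to retry.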

Watcher API

This component implements the REST API that the Watcher module exposes to the external world. It enables a cluster administrator to control and monitor the Watcher system through any client connected to this API:

CLI
Horizon plugin
Python SDK

Watcher Conductor

This component handles reads and writes of the Watcher business objects in the Watcher database on behalf of the various Watcher components (Optimizer, Planner, Applier).

All these CRUD operations are done via RPC calls on the message bus. Therefore, this module subscribes to one or several Oslo topics on which the other Watcher components can publish their RPC calls.
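The pattern can be illustrated without any messaging library. In the sketch below, `BusSketch` is a synchronous in-process stand-in for the message bus, and the topic name and request format are assumptions; it does not use or imitate the oslo.messaging API.

```python
class ConductorSketch:
    """Owns the storage; other components never touch it directly."""

    def __init__(self):
        self._objects = {}  # stand-in for the Watcher database

    def handle(self, request):
        method, key = request["method"], request["key"]
        if method in ("create", "update"):
            self._objects[key] = request["value"]
            return request["value"]
        if method == "read":
            return self._objects.get(key)
        if method == "delete":
            return self._objects.pop(key, None)
        raise ValueError("unknown method %r" % method)


class BusSketch:
    """Topic-based request/reply, reduced to a dictionary of handlers."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, topic, handler):
        self._handlers[topic] = handler

    def call(self, topic, request):
        return self._handlers[topic](request)
```

A caller would then issue, for example, `bus.call("watcher.conductor", {"method": "read", "key": "audit-1"})`, where the topic name is hypothetical.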

It can also generate events for the CEP in order to provide a loop-back mechanism to the Decision Engine.

Watcher Database

This database stores all the Watcher business objects that can be requested through the Watcher API:

Audits description :
- list of auditing goal(s)
- scope of the audit, i.e. the list of compute nodes to which the audit must be applied
- aggressivity level
Action plans
Actions history
Watcher settings :
- metrics/events collector endpoints for each type of metric
- manual/automatic mode

It may be any relational database or a key-value database.
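As a rough illustration of two of the objects listed above, here is a sketch using plain dataclasses. All field names are hypothetical, and the integer encoding of the aggressivity level is an assumption, since this page does not define one.

```python
from dataclasses import dataclass, field


@dataclass
class AuditDescription:
    goals: list        # auditing goal(s), e.g. ["energy_consumption"]
    scope: list        # compute nodes to which the audit must be applied
    aggressivity: int  # assumed integer scale; not specified by this page


@dataclass
class WatcherSettings:
    # metric/event type -> collector endpoint
    collector_endpoints: dict = field(default_factory=dict)
    # False means manual mode, True means automatic mode
    automatic_mode: bool = False
```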

Watcher Decision Engine

This component is responsible for computing a list of potential optimization actions in order to fulfill the goals of an audit.

It uses the following input data :

current, previous and predicted state of the cluster (hosts, instances, network, ...)
evolution of metrics within a time frame

It first selects the most appropriate optimization strategy depending on several factors:

the optimization goals that must be fulfilled (energy consumption, bin-packing, ...)
the deadline that was provided by the OpenStack cluster admin
the "aggressivity" level regarding potential optimization actions:
- is it allowed to do a lot of instance migrations?
- is it allowed to consume a lot of bandwidth on the admin network?
- is it allowed to violate initial placement constraints such as affinity/anti-affinity, region, ...?
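One possible way to combine these factors is to filter the candidate strategies and then pick one of the survivors. The sketch below is purely illustrative: the `Strategy` fields, the integer aggressivity scale, and the tie-breaking rule are assumptions, not the Decision Engine's actual selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    goal: str               # e.g. "energy_consumption" or "bin_packing"
    estimated_seconds: int  # assumed cost estimate for running the strategy
    min_aggressivity: int   # lowest aggressivity level at which it may run


def select_strategy(strategies, goal, deadline_seconds, aggressivity):
    """Return a strategy matching the goal, deadline, and aggressivity."""
    candidates = [
        s for s in strategies
        if s.goal == goal
        and s.estimated_seconds <= deadline_seconds
        and s.min_aggressivity <= aggressivity
    ]
    if not candidates:
        return None
    # Arbitrary tie-break for this sketch: use as much of the time budget
    # as allowed, on the assumption that longer runs search more thoroughly.
    return max(candidates, key=lambda s: s.estimated_seconds)
```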

The strategy is then executed and generates a list of Actions in order to fulfill the goals of the Audit. Those actions are not necessarily ordered in time (it depends on the selected Strategy). Therefore, this component reorganizes the list of actions into an ordered sequence of actions (migrations, ...) such that all security, dependency, and performance requirements are met. An ordered sequence of actions is called an "Action Plan".

It builds an appropriate workflow (in any suitable scheduling format, such as BPMN 2.0) which defines how the different actions are scheduled in time and what the prerequisite conditions are for each action.
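When prerequisites are expressed as "action B may only start after action A", turning an unordered list of actions into an ordered sequence amounts to a topological sort. Here is a minimal sketch using the standard library (`graphlib`, Python 3.9 or later); the action labels are hypothetical, and a real workflow format such as BPMN 2.0 carries far more than ordering.

```python
from graphlib import TopologicalSorter


def build_action_plan(prerequisites):
    """Order actions so that every action comes after its prerequisites.

    `prerequisites` maps each action to the set of actions that must
    complete before it. Raises graphlib.CycleError on circular dependencies.
    """
    return list(TopologicalSorter(prerequisites).static_order())


# Hypothetical example: empty a node before changing its power state.
example = {
    "migrate vm-a": set(),
    "migrate vm-b": set(),
    "set_power_state node-1": {"migrate vm-a", "migrate vm-b"},
}
plan = build_action_plan(example)
```

In `plan`, both migrations appear before the power-state change; their relative order is not constrained by the prerequisites.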

For example, a very simple Action Plan would allow only one instance migration at a time, to make sure that it does not consume too much bandwidth on the admin network.
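Such a limit can be sketched with a semaphore that caps the number of migrations in flight. The `migrate` callable is a placeholder supplied by the caller, and the function and parameter names are assumptions for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_migrations(instances, migrate, max_parallel=1, workers=4):
    """Run `migrate(instance)` for each instance, at most
    `max_parallel` at a time, and return the results in input order."""
    slots = threading.Semaphore(max_parallel)

    def guarded(instance):
        # Worker threads may be idle-waiting here, but only
        # `max_parallel` of them hold a slot and actually migrate.
        with slots:
            return migrate(instance)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(guarded, instances))
```

With `max_parallel=1` the migrations are fully serialized; raising the value trades admin-network bandwidth for a shorter overall plan duration.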

Another Action Plan would migrate all instances off a compute node before changing its ACPI power configuration.

This component saves the generated Action Plan(s) in the Watcher Database via an RPC call to the Watcher Conductor.

It reports its own current status (learning phase, ...) and the current status of each Audit on the message bus via an RPC call to the Watcher Conductor.

Watcher Message Bus

The message bus handles asynchronous communications between the different Watcher modules as well as RPC calls for read/write operations in the Watcher database.

Watcher Metrics Publisher

The metrics publisher is a component that collects or computes metrics and events and publishes them to an endpoint of the CEP.

The metrics publisher can be a Ceilometer publisher, since Ceilometer already provides many metrics related to compute nodes, instances, storage, and network, as listed here: http://docs.openstack.org/admin-guide-cloud/content/section_telemetry-measurements.html
