
Difference between revisions of "Monasca"

(151 intermediate revisions by 14 users not shown)
== Overview ==

Monasca is an open-source, multi-tenant, highly scalable, performant, fault-tolerant monitoring-as-a-service solution that integrates with OpenStack. It uses a REST API for high-speed metrics processing and querying, and has a streaming alarm engine and notification engine.

[[File:OpenStack_Project_Monasca_vertical.png|thumb|right]]

==== Contribution ====
* [https://storyboard.openstack.org/#!/project_group/59 Project Group on StoryBoard]
* [https://storyboard.openstack.org/#!/board/111 Important activities in Stein release]
* [https://storyboard.openstack.org/#!/board/141 Important activities in Train release]
* [https://docs.openstack.org/monasca-api/latest/contributor/index.html Contribution Guidelines]

==== Communication and Meetings ====
* IRC: #openstack-monasca on OFTC
* Weekly Meetings:
** http://eavesdrop.openstack.org/#Monasca_Team_Meeting
** [[Meetings/Monasca]]
* Use the [Monasca] tag on the [http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss openstack-discuss mailing list].

===== PTG meetings =====
* [[MonascaSteinPTG|Stein PTG Summary (Denver)]]
* [[MonascaTrainPTG|Train PTG Summary (Denver)]]
* [https://etherpad.opendev.org/p/monasca-planning-ussuri Ussuri planning meeting]
* [https://etherpad.opendev.org/p/monasca-ptg-victoria Victoria PTG meeting]

=== Documentation ===
: Monasca API (and links to other documents): https://docs.openstack.org/monasca-api/latest/
: Monasca command line interface: https://docs.openstack.org/python-monascaclient/latest/
: Monasca API Specification: https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md
: Agent Documentation: https://github.com/openstack/monasca-agent

==== Presentations ====
There have been many interesting and excellent presentations on Monasca, many given at the OpenStack or OpenInfra Summits.
: [[Monasca/Presentations]] (updated with Denver 2019 links)

==== Repositories ====
: '''Code'''
:: Core: https://git.openstack.org/cgit/?q=monasca (old git link)
:: API: https://opendev.org/openstack/monasca-api
: '''Deployment'''
: The following repositories are available for deploying Monasca:
:: Docker: https://github.com/monasca/monasca-docker
:: Kubernetes: https://github.com/monasca/monasca-helm
:: Ansible: https://github.com/search?utf8=%E2%9C%93&q=ansible-monasca
:: Puppet: https://git.openstack.org/openstack/puppet-monasca

==== Bugs ====
===== All open bugs =====
: https://storyboard.openstack.org/#!/worklist/213

===== In Stein triaged bugs =====
: https://storyboard.openstack.org/#!/worklist/467

===== Bug-fixing progress =====
: https://storyboard.openstack.org/#!/board/114

==== Requirements ====
[[Monasca/Requirements]] from design in 2015.

* Monitors OpenStack infrastructure and resources
** monitors both OpenStack infrastructure and its resources (e.g., VMs) so that another monitoring tool may not be needed
** monitors OpenStack CRUD operations using StackTach
** (notion of host, applications, grouping)
* Can scale
** to hundreds of thousands of servers, metrics, and alarms
* Allows user-defined metric and alarm creation
** where a user is a system administrator of the cloud or a cloud consumer
** allows metrics and alarms to be defined on a group of resources (e.g., host aggregates, availability zones)
* Multi-tenant, multi-cloud, multiple deployment models
** can process metrics from multiple clouds, tenants, and users
** works in multiple deployment models
*** single private cloud (e.g., 10 nodes)
*** multiple private clouds
*** private and public cloud (hybrid)
* Has an HTTP-based extensible API
** for metrics, alarm creation, and metric collection
** API (event definition) compatibility with Zabbix?
* Can monitor OpenStack projects like Sahara, Marconi, Trove, Ironic
* Authentication
** Authenticates with Keystone. May also authenticate with other backends (e.g., CloudFoundry, OpenLDAP, AWS)
* Can collect configuration data
** that is, OpenStack configuration files
* Can collect and index log data
** e.g., collected logs are indexed into Elasticsearch/Kibana, etc.
* Is compatible with alarms and checks of Nagios and Zabbix
* Provides a pluggable messaging system
** e.g., Oslo messaging, Kafka, RabbitMQ
* Provides a pluggable backend store
** e.g., Vertica, InfluxDB, Cassandra, MySQL
* Compatible with Ceilometer for metering
** through a Ceilometer plugin
* A monitoring dashboard
** using Kibana, Django, etc.
* Built using Python and open-source tools

See also https://opendev.org/openstack/monasca-specs/
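The "pluggable backend store" requirement can be pictured as a small storage interface. A component such as the Persister would write through that interface without knowing which database sits behind it. The sketch below is a hypothetical illustration, not code from monasca-persister. `MetricsStore`, `InMemoryStore`, `BACKENDS` and `load_store` are invented names, and the in-memory class only stands in for a real backend such as InfluxDB or Cassandra.

```python
from typing import Callable, Dict, List, Protocol, Tuple

# (name, dimensions, timestamp in ms, value): a hypothetical row shape for this sketch.
Measurement = Tuple[str, Dict[str, str], int, float]


class MetricsStore(Protocol):
    """Interface that every backend (InfluxDB, Cassandra, Vertica, ...) would implement."""

    def write(self, batch: List[Measurement]) -> None: ...

    def query(self, name: str) -> List[Measurement]: ...


class InMemoryStore:
    """Toy backend that keeps measurements in a Python list."""

    def __init__(self) -> None:
        self._rows: List[Measurement] = []

    def write(self, batch: List[Measurement]) -> None:
        self._rows.extend(batch)

    def query(self, name: str) -> List[Measurement]:
        return [row for row in self._rows if row[0] == name]


# Registry mapping a configuration value to a backend factory (hypothetical).
BACKENDS: Dict[str, Callable[[], MetricsStore]] = {"memory": InMemoryStore}


def load_store(backend: str) -> MetricsStore:
    """Instantiate the backend named in configuration, failing loudly if unknown."""
    try:
        factory = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown metrics backend: {backend!r}") from None
    return factory()
```

To add a backend, you register another factory under a new configuration key. Callers keep using `write` and `query` unchanged.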
  
 
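== Features ==

This section describes the overall features.

* A highly performant, scalable, reliable, and fault-tolerant Monitoring as a Service (MONaaS) solution that scales to service-provider levels of metrics throughput. Performance, scalability, and high availability have been designed in from the start. It can process hundreds of thousands of metrics/sec and offer data retention periods of greater than a year with no data loss while still processing interactive queries.
* REST API for storing and querying metrics and historical information. Most monitoring solutions use special transports and protocols, such as collectd or NSCA (Nagios). In our solution, HTTP is the only protocol used. This simplifies the overall design and also allows for a much richer way of describing the data via dimensions.
* Multi-tenant and authenticated. Metrics are submitted and authenticated using Keystone and stored with an associated tenant ID.
* Metrics defined using a set of (key, value) pairs called dimensions.
* Real-time thresholding and alarming on metrics.
* Compound alarms described using a simple expressive grammar composed of alarm sub-expressions and logical operators.
* Monitoring agent that supports a number of built-in system and service checks, as well as Nagios checks, statsd, and scraping of endpoints as Prometheus does.
* Open-source monitoring solution built on open-source technologies.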
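The REST API and dimension features can be illustrated with a short Python sketch. It builds a metric (a name, a dictionary of dimensions, a timestamp in milliseconds, and a value) and prepares an authenticated HTTP POST. The `/v2.0/metrics` path and the `X-Auth-Token` header follow the Monasca API Specification linked under Documentation. The endpoint host, the port, the token string and the 255-character `MAX_LEN` check are assumptions for illustration. The request is built but never sent.

```python
import json
import time
import urllib.request
from typing import Dict, List, Optional

MAX_LEN = 255  # assumed maximum length for metric names and dimension keys/values


def make_metric(name: str, value: float, dimensions: Dict[str, str],
                timestamp_ms: Optional[int] = None) -> dict:
    """Build one metric: a name, (key, value) dimensions, a timestamp, and a value."""
    if not name or len(name) > MAX_LEN:
        raise ValueError(f"invalid metric name: {name!r}")
    for key, val in dimensions.items():
        if not key or not val or len(key) > MAX_LEN or len(val) > MAX_LEN:
            raise ValueError(f"invalid dimension: {key!r}={val!r}")
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # milliseconds since the Unix epoch
    return {"name": name, "dimensions": dict(dimensions),
            "timestamp": timestamp_ms, "value": value}


def build_post_request(endpoint: str, token: str,
                       metrics: List[dict]) -> urllib.request.Request:
    """Prepare, without sending, a POST of a batch of metrics to the Monitoring API."""
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/v2.0/metrics",
        data=json.dumps(metrics).encode("utf-8"),
        method="POST",
        # The Keystone token authenticates the request and identifies the tenant.
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
    )
```

For example, `build_post_request("http://monasca.example.com:8070", token, [make_metric("cpu.idle_perc", 97.5, {"hostname": "compute-1"})])` returns a request that `urllib.request.urlopen` could send to a running API. The host and port in that call are placeholders.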

The Monasca API can be integrated with other reporting, visualization, or billing systems, such as [https://wiki.openstack.org/wiki/CloudKitty CloudKitty] or Grafana.

The Monasca community is also involved with the [https://wiki.openstack.org/wiki/Self-healing_SIG Self Healing] and [https://wiki.openstack.org/wiki/Auto-scaling_SIG Auto-Scaling] OpenStack SIGs.

=== Comparisons to alternatives ===
[[Monasca/Comparison]] (to be written)

 
== Architecture ==

* Monitoring Agent (monasca-agent): A modern Python-based monitoring agent that consists of several sub-components and supports system metrics, such as CPU utilization and available memory, Nagios plugins, statsd, and many built-in checks for services such as MySQL, RabbitMQ, and many others.
* Monitoring API (monasca-api): A well-defined and documented RESTful API for monitoring that is primarily focused on the following concepts and areas:
** Metrics: Store and query massive amounts of metrics in real time.
** Statistics: Query statistics for metrics.
** Alarm Definitions: Create, update, query, and delete alarm definitions.
** Alarms: Query and delete the alarm history.
*** Simple expressive grammar for creating compound alarms composed of alarm sub-expressions and logical operators.
*** Alarm severities can be associated with alarms.
*** The complete alarm state transition history is stored and queryable, which allows for subsequent root cause analysis (RCA) or advanced analytics.
** Notification Methods: Create and delete notification methods, such as email, and associate them with alarms. Supports the ability to notify users directly via email when alarm state transitions occur.
** The Monasca API has both Java and Python implementations available.
* Persister (monasca-persister): Consumes metrics and alarm state transitions from the MessageQ and stores them in the Metrics and Alarms Database.
** The Persister has both Java and Python implementations.
* Transform and Aggregation Engine (monasca-transform): Transforms metric names and values, such as delta or time-based derivative calculations, and creates new metrics that are published to the Message Queue. The Transform Engine is not available yet.
* Anomaly and Prediction Engine: Evaluates predictions and anomalies and generates predicted metrics as well as anomaly likelihood and anomaly scores. The Anomaly and Prediction Engine is currently in prototype status.
* Threshold Engine (monasca-thresh): Computes thresholds on metrics and publishes alarms to the MessageQ when exceeded. Based on Apache Storm, a free and open-source distributed real-time computation system.
* Notification Engine (monasca-notification): Consumes alarm state transition messages from the MessageQ and sends notifications, such as emails for alarms. The Notification Engine is Python-based.
* Analytics Engine (monasca-analytics): Consumes alarm state transitions and metrics from the MessageQ and performs anomaly detection and alarm clustering/correlation.
* Message Queue: A third-party component that primarily receives published metrics from the Monitoring API and alarm state transition messages from the Threshold Engine, which are consumed by other components, such as the Persister and Notification Engine. The Message Queue is also used to publish and consume other events in the system. Currently, a Kafka-based MessageQ is supported. Kafka is a high-performance, distributed, fault-tolerant, and scalable message queue with built-in durability. We will look at other alternatives, such as RabbitMQ. In fact, our previous implementation supported RabbitMQ, but we moved to Kafka because of RabbitMQ's performance, scale, durability, and high-availability limitations.
* Metrics and Alarms Database: A third-party component that primarily stores metrics and the alarm state history. Currently, Vertica, InfluxDB, and Cassandra are supported.
* Config Database: A third-party component that stores much of the configuration and other information in the system. Currently, MySQL is supported. Support for PostgreSQL is in progress.
* Monitoring Client (python-monascaclient): A Python command line client and library that communicates with and controls the Monitoring API. The Monitoring Client was written using the OpenStack Heat Python client as a framework. The Monitoring Client also has a Python library, "monascaclient", similar to the other OpenStack clients, that can be used to quickly build additional capabilities. The Monitoring Client library is used by the Monitoring UI, Ceilometer publisher, and other components.
* Alarm Configuration Manager: A Python daemon that runs on a configurable interval, detects new metrics that need to be alarmed, and creates alarms for them based on the configuration. It uses the monitoring client library to communicate with the Monitoring API.
* Monitoring UI: A Horizon dashboard for visualizing the overall health and status of an OpenStack cloud.
* Ceilometer publisher: A multi-publisher plugin for Ceilometer (not shown in the diagram) that converts and publishes samples to the Monitoring API.
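The compound alarm grammar and the Threshold Engine's job can be approximated in a few lines of Python. This sketch handles only a simplified subset of the grammar: sub-expressions of the form `func(metric{key=value,...}) op threshold`, joined by `and`/`or`. In this subset `and` binds tighter than `or`, and there are no parentheses, period arguments or `times` clauses. The function names are hypothetical. Two rules are assumptions of this sketch, not a description of monasca-thresh (which runs on Apache Storm): a sub-expression with no data is UNDETERMINED, and states are combined with three-valued and/or.

```python
import operator
import re
from statistics import mean
from typing import Dict, List, Tuple

# (metric name, dimensions, value) samples that fall inside the evaluation window.
Sample = Tuple[str, Dict[str, str], float]

FUNCTIONS = {"avg": mean, "min": min, "max": max, "sum": sum, "count": len}
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

SUB_EXPRESSION = re.compile(
    r"^\s*(?P<func>avg|min|max|sum|count)\("
    r"(?P<metric>[\w.]+)(?:\{(?P<dims>[^}]*)\})?\)\s*"
    r"(?P<op>>=|<=|>|<)\s*(?P<threshold>-?\d+(?:\.\d+)?)\s*$"
)


def evaluate_sub_expression(text: str, window: List[Sample]) -> str:
    """Return ALARM, OK, or UNDETERMINED for one sub-expression."""
    match = SUB_EXPRESSION.match(text)
    if match is None:
        raise ValueError(f"unsupported sub-expression: {text!r}")
    wanted: Dict[str, str] = {}
    if match.group("dims"):
        for pair in match.group("dims").split(","):
            key, _, val = pair.partition("=")
            wanted[key.strip()] = val.strip()
    # Keep samples of this metric whose dimensions include every requested pair.
    values = [value for name, dims, value in window
              if name == match.group("metric")
              and all(dims.get(k) == v for k, v in wanted.items())]
    if not values:
        return "UNDETERMINED"  # no data for this sub-expression in the window
    statistic = FUNCTIONS[match.group("func")](values)
    compare = OPERATORS[match.group("op")]
    return "ALARM" if compare(statistic, float(match.group("threshold"))) else "OK"


def combine_and(states: List[str]) -> str:
    if "OK" in states:
        return "OK"
    return "UNDETERMINED" if "UNDETERMINED" in states else "ALARM"


def combine_or(states: List[str]) -> str:
    if "ALARM" in states:
        return "ALARM"
    return "UNDETERMINED" if "UNDETERMINED" in states else "OK"


def evaluate_expression(expression: str, window: List[Sample]) -> str:
    """'and' binds tighter than 'or'; parentheses are not supported in this sketch."""
    disjuncts = re.split(r"\s+or\s+", expression, flags=re.IGNORECASE)
    return combine_or([
        combine_and([evaluate_sub_expression(part, window)
                     for part in re.split(r"\s+and\s+", disjunct, flags=re.IGNORECASE)])
        for disjunct in disjuncts
    ])
```

For example, take samples of 92.0 and 95.0 for `cpu.user_perc` on `compute-1`. The sub-expression `avg(cpu.user_perc{hostname=compute-1}) > 90` evaluates to ALARM because the average is 93.5.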
 
Most of the components are described in their respective repositories. However, there aren't any repositories for the third-party components used, so we describe some of the relevant details here.
 
  
Further historical context for the architecture can be found at [[Monasca/Architecture Details]].

=== Message Queue ===
A distributed, performant, scalable, HA message queue for distributing metrics, alarms, and events in the monitoring system. Currently based on Kafka.

==== Messages ====
There are several messages that are published and consumed by various components in the monitoring system via the MessageQ.

{| class="wikitable"
|-
! Message !! Produced By !! Consumed By !! Kafka Topic !! Description
|-
| Metric || API, Transform and Aggregation Engine || Persister, Threshold Engine || metrics || A metric sent to the Monitoring API or created by the Transform and Aggregation Engine is published to the MessageQ.
|-
| Alarm Definition Event || API || Threshold Engine || events || When an alarm is created, updated, or deleted by the Monitoring API, an Alarm Definition Event is published to the MessageQ.
|-
| Alarm State Transitioned || API || Threshold Engine || events || When an alarm is created, updated, or deleted by the Monitoring API, an Alarm Definition Event is published to the MessageQ.
|-
| Alarm State Transitioned || Threshold Engine || Notification Engine, Persister || alarm-state-transitions || When an alarm transitions from OK to Alarmed, Alarmed to OK, and so on, this event is published to the MessageQ, persisted by the Persister, and processed by the Notification Engine. The Monitoring API can query the history of alarm state transition events.
|-
| Alarm Notification || Notification Engine || Persister || alarm-notifications || This event is published to the MessageQ when the Notification Engine processes an alarm and sends a notification. The alarm notification is persisted by the Persister and can be queried by the Monitoring API. The database maintains a history of all events.
|}

=== Metrics and Alarms Database ===
A high-performance analytics database that can store massive amounts of metrics and alarms in real time and also support interactive queries. Currently, Vertica and InfluxDB are supported.

The SQL schema that is used by Vertica is as follows:
* MonMetrics.Measurements: Stores the actual measurements that are sent.
** id: An integer ID for the measurement.
** definition_dimensions_id: A reference to DefinitionDimensions.
** time_stamp
** value
* MonMetrics.DefinitionDimensions
** id: A sha1 hash of (definition_id, dimension_set_id)
** definition_id: A reference to Definitions.id
** dimension_set_id: A reference to Dimensions.dimension_set_id
* MonMetrics.Definitions
** id: A sha1 hash of (name, tenant_id, region)
** name: Name of the metric.
** tenant_id: The tenant_id that submitted the metric.
** region: The region the metric was submitted under.
* MonMetrics.Dimensions
** dimension_set_id: A sha1 hash of the set of dimensions for a metric.
** name: Name of the dimension.
** value: Value of the dimension.

=== Config Database ===
The config database stores all the configuration information. Currently based on MySQL.

The SQL schema is as follows:
* alarm
** id
** tenant_id
** name
** description
** expression
** state
** actions_enabled
** created_at
** updated_at
** deleted_at
* alarm_action
** alarm_id
** alarm_state
** action_id
* notification_method
** id
** tenant_id
** name
** type
** address
** created_at
** updated_at
* sub_alarm
** id
** alarm_id
** function
** metric_name
** operator
** threshold
** period
** periods
** state
** created_at
** updated_at
* sub_alarm_dimension
** sub_alarm_id
** dimension_name
** value

=== Further Reading ===
* [[Monasca/Message Schema]]
* Message Queue: A distributed, performant, scalable, HA message queue for distributing metrics, alarms, and events in the monitoring system. Currently based on Kafka.
* Messages: There are several messages that are published and consumed by various components in Monasca via the MessageQ. See [[Monasca/Message Schema|Message Schema]].
* Metrics and Alarms Database: see [[Monasca/Architecture Details]]
* Config Database: see [[Monasca/Architecture Details]]
* Events: Support for real-time event stream processing in Monasca is in progress. For more details, see [[Monasca/Events]].
* Logging: Support for logging in Monasca is under discussion. For more details, see [[Monasca/Logging]].
* Transform and Aggregation Engine: For more details, see [[Monasca/Transform]].
* Analytics: Support for anomaly detection and alarm clustering/correlation is in progress. For more details, see [[Monasca/Analytics]].
* Monitoring: Enablement and usage for monitoring the status of Monasca is under discussion. For more details, see [[Monasca/Monitoring_Of_Monasca]].
* UI/UX Support: Adding more support for common UI/UX queries is under discussion. For more details, see [[Monasca/UI_UX_Support]].
* Post Metric Sequence: see [[Monasca/Architecture Details]]
* Alarm Managers: see [[Monasca/Architecture Details]] and the official documentation at docs.openstack.org

== Keystone Requirements ==
Monasca relies on Keystone, and the following Keystone configuration must exist:
* The endpoint for the API must be registered in Keystone as the 'monasca' service.
* The API must have an admin token to use in verifying the Keystone tokens it receives.
* For each project that uses Monasca, two users must exist. One is in the 'monasca-agent' role and is used by the monasca-agent instances running on machines. The other should not be in that role and can be used for logging into the UI, using the CLI, or making direct queries against the API.

== Post Metric Sequence ==
This section describes the sequence of operations involved in posting a metric to the Monasca API.

[[File:monasca-arch-post-metric-diagram.png|Monasca Architecture Post Metric Diagram]]

# A metric is posted to the Monasca API.
# The Monasca API authenticates and validates the request and publishes the metric to the Message Queue.
# The Persister consumes the metric from the Message Queue and stores it in the Metrics Store.
# The Transform Engine consumes the metrics from the Message Queue, performs transform and aggregation operations on metrics, and publishes metrics that it creates back to the Message Queue.
# The Threshold Engine consumes metrics from the Message Queue and evaluates alarms. If a state change occurs in an alarm, an "alarm-state-transitioned-event" is published to the Message Queue.
# The Notification Engine consumes "alarm-state-transitioned-events" from the Message Queue, evaluates whether they have a Notification Method associated with them, and sends the appropriate notification, such as email.
# The Persister consumes the "alarm-state-transitioned-event" from the Message Queue and stores it in the Alarm State History Store.
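The post-metric steps can be simulated in memory to show that the components cooperate only through topics. This sketch makes several simplifications:

* A dictionary of per-consumer-group queues stands in for Kafka, using the `metrics` and `alarm-state-transitions` topic names from the Messages table.
* A fixed token set replaces Keystone validation.
* The alarm is a single fixed threshold.
* The transform step is omitted, because the Transform Engine is described as not yet available.

All class and function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set

VALID_TOKENS = {"example-token"}  # hypothetical stand-in for Keystone token validation


class MessageQueue:
    """Stand-in for Kafka: every consumer group gets its own copy of each message."""

    def __init__(self) -> None:
        self._topics: Dict[str, Dict[str, Deque[dict]]] = defaultdict(dict)

    def subscribe(self, topic: str, group: str) -> None:
        self._topics[topic].setdefault(group, deque())

    def publish(self, topic: str, message: dict) -> None:
        for queue in self._topics[topic].values():
            queue.append(message)

    def consume(self, topic: str, group: str) -> List[dict]:
        queue = self._topics[topic][group]
        messages = list(queue)
        queue.clear()
        return messages


def build_queue() -> MessageQueue:
    mq = MessageQueue()
    for topic, group in [("metrics", "persister"), ("metrics", "thresh"),
                         ("alarm-state-transitions", "notification"),
                         ("alarm-state-transitions", "persister")]:
        mq.subscribe(topic, group)
    return mq


def api_post_metric(mq: MessageQueue, token: str, metric: dict) -> None:
    # Steps 1-2: authenticate, validate, and publish to the "metrics" topic.
    if token not in VALID_TOKENS:
        raise PermissionError("invalid token")
    if "name" not in metric or "value" not in metric:
        raise ValueError("a metric needs a name and a value")
    mq.publish("metrics", metric)


def threshold_engine(mq: MessageQueue, states: Dict[str, str],
                     metric_name: str, threshold: float) -> None:
    # Threshold step: evaluate a single-metric alarm and publish only state changes.
    for metric in mq.consume("metrics", "thresh"):
        if metric["name"] != metric_name:
            continue
        new_state = "ALARM" if metric["value"] > threshold else "OK"
        old_state = states.get(metric_name, "UNDETERMINED")
        if new_state != old_state:
            states[metric_name] = new_state
            mq.publish("alarm-state-transitions",
                       {"metric": metric_name, "old": old_state, "new": new_state})


def notification_engine(mq: MessageQueue, notify_states: Set[str], outbox: List[str]) -> None:
    # Notification step: notify when the new state has an associated method.
    for event in mq.consume("alarm-state-transitions", "notification"):
        if event["new"] in notify_states:
            outbox.append(f"email: {event['metric']} {event['old']} -> {event['new']}")


def persister(mq: MessageQueue, metrics_store: List[dict], history: List[dict]) -> None:
    # Persister steps: store metrics and alarm state transitions.
    metrics_store.extend(mq.consume("metrics", "persister"))
    history.extend(mq.consume("alarm-state-transitions", "persister"))
```

Run `api_post_metric`, `threshold_engine`, `notification_engine` and `persister` in that order on a queue from `build_queue()`, posting a value of 97.0 against a threshold of 90.0. The run stores the metric, records one UNDETERMINED to ALARM transition, and queues one email. Posting a second high value produces no new transition, because only state changes are published.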
 
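The Vertica schema in the Metrics and Alarms Database section identifies definitions, dimension sets, and their pairings by sha1 hashes. The document does not specify how the fields are serialized before hashing, so this sketch assumes a hypothetical encoding:

* fields are joined by a NUL separator;
* dimensions are sorted by name, so the same set always produces the same `dimension_set_id`.

The ids it computes will therefore not match those of a real Monasca installation.

```python
import hashlib
from typing import Dict

SEP = "\x00"  # hypothetical field separator; not necessarily what Monasca uses


def sha1_hex(*parts: str) -> str:
    return hashlib.sha1(SEP.join(parts).encode("utf-8")).hexdigest()


def definition_id(name: str, tenant_id: str, region: str) -> str:
    # Definitions.id: sha1 of (name, tenant_id, region)
    return sha1_hex(name, tenant_id, region)


def dimension_set_id(dimensions: Dict[str, str]) -> str:
    # Dimensions.dimension_set_id: sha1 of the metric's set of dimensions.
    # Sorting makes the id independent of dictionary insertion order.
    return sha1_hex(*(f"{key}={value}" for key, value in sorted(dimensions.items())))


def definition_dimensions_id(def_id: str, dim_set_id: str) -> str:
    # DefinitionDimensions.id: sha1 of (definition_id, dimension_set_id)
    return sha1_hex(def_id, dim_set_id)
```

Sorting is the important design choice here. Dimension dictionaries carry no inherent order, and without a canonical order the same metric could be stored under several dimension sets.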
  
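The Config Database schema links an alarm to its notification methods through `alarm_action`, keyed by alarm and target state. The query below sketches the lookup the Notification Engine needs when an alarm transitions. It uses Python's built-in `sqlite3` as a stand-in for MySQL and only a subset of the listed columns. The column types, sample rows and helper names are assumptions.

```python
import sqlite3
from typing import List, Tuple

# Subset of the notification_method and alarm_action tables; column types are assumed.
SCHEMA = """
CREATE TABLE notification_method (
    id TEXT PRIMARY KEY, tenant_id TEXT, name TEXT, type TEXT, address TEXT);
CREATE TABLE alarm_action (
    alarm_id TEXT, alarm_state TEXT, action_id TEXT REFERENCES notification_method(id));
"""


def setup() -> sqlite3.Connection:
    """Create an in-memory database with one email method bound to ALARM and OK."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO notification_method VALUES (?, ?, ?, ?, ?)",
                 ("nm-1", "tenant-a", "ops email", "EMAIL", "ops@example.com"))
    conn.executemany("INSERT INTO alarm_action VALUES (?, ?, ?)",
                     [("alarm-1", "ALARM", "nm-1"), ("alarm-1", "OK", "nm-1")])
    return conn


def methods_for_transition(conn: sqlite3.Connection, alarm_id: str,
                           new_state: str) -> List[Tuple[str, str]]:
    """Return (type, address) of every notification method for this transition."""
    rows = conn.execute(
        """SELECT nm.type, nm.address
           FROM alarm_action AS aa
           JOIN notification_method AS nm ON nm.id = aa.action_id
           WHERE aa.alarm_id = ? AND aa.alarm_state = ?""",
        (alarm_id, new_state))
    return rows.fetchall()
```

A transition to a state with no matching `alarm_action` row returns an empty list, which means no notification is sent.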
 
= Development Environment =

* Monasca DevStack Plugin:
** DevStack is the primary development environment for OpenStack.
*** See http://docs.openstack.org/developer/devstack/
** The Monasca DevStack plugin installs the Monasca Service, Agent, Horizon Monitoring Panel, and Grafana.
** README: https://github.com/openstack/monasca-api/tree/master/devstack
** The best way to get started is to use Vagrant with the Vagrantfile at https://github.com/openstack/monasca-api/blob/master/devstack/Vagrantfile.
* Comes with a turn-key development environment based on Vagrant that can be used for quickly deploying on a client system, such as a Mac OS X based system. See https://github.com/stackforge/monasca-vagrant.
* Also comes with a number of Chef cookbooks. See https://github.com/stackforge?query=cookbook-monasca.
* Project and Bug tracking
** Monasca on StoryBoard: https://storyboard.openstack.org/#!/project/list?q=monasca
** Monasca on Launchpad: https://launchpad.net/monasca (history only; StoryBoard has been the official tracker since 2017)
* Monasca projects source code: https://git.openstack.org/cgit/?q=monasca

 
= Coding Standards =

* Python: All Python code conforms to the OpenStack standards at http://docs.openstack.org/developer/hacking/.
** Note: all components in Monasca, except for the Threshold Engine, have been ported to Python.
* Java: Several of the Monasca components are available in Java. OpenStack does not have any Java coding standards, so we've adopted the Google Java Style at https://google.github.io/styleguide/javaguide.html.
** The standard allows a line length of either 80 or 100 characters. We've adopted 100.

 
= Technologies =

Monasca uses a number of third-party technologies:

* Internal Processing and Middleware
** Apache Kafka (http://kafka.apache.org): Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. Kafka is a highly performant, distributed, fault-tolerant, and scalable message queue with built-in durability.
** Apache Storm (http://storm.incubator.apache.org/): Apache Storm is a free and open-source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing.
** ZooKeeper (http://zookeeper.apache.org/): Used by Kafka and Storm.
** Apache Spark: Used by Monasca Transform as an aggregation engine.
* Configuration database:
** MySQL: MySQL is supported as a Config Database.
** PostgreSQL: Support for PostgreSQL as the Config Database, via Hibernate and SQLAlchemy.
* Vagrant (http://www.vagrantup.com/): Vagrant provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team.
* Dropwizard (https://dropwizard.github.io/dropwizard/): Dropwizard pulls together stable, mature libraries from the Java ecosystem into a simple, light-weight package that lets you focus on getting things done. Dropwizard has out-of-the-box support for sophisticated configuration, application metrics, logging, operational tools, and much more, allowing you and your team to ship a production-quality web service in the shortest time possible.
* Time Series Database:
** InfluxDB (http://influxdb.com/): An open-source distributed time series database with no external dependencies. InfluxDB is supported for the Metrics Database.
** Vertica (http://www.vertica.com): A commercial enterprise-class SQL analytics database that is highly scalable. It offers built-in automatic high availability and excels at in-database analytics and at compressing and storing massive amounts of data. A free community version of Vertica that can store up to 1 TB of data with no time limit is available at https://my.vertica.com/community/. Vertica is supported for the Metrics Database, though it is no longer commonly used.
** Cassandra (https://cassandra.apache.org): Cassandra is supported for the Metrics Database.

= License =

Copyright (c) 2014, 2015 Hewlett-Packard Development Company, L.P.

(C) Copyright 2019 SUSE LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

[[File:yklogo.png|200px|thumb|left|Monasca uses YourKit Profiler for Java development]]

<br />
<br />
<br />
<br />

[https://www.yourkit.com/ Visit YourKit website for more information]

Latest revision as of 13:15, 1 June 2021

Overview

Monasca is an open-source, multi-tenant, highly scalable, performant, fault-tolerant monitoring-as-a-service solution that integrates with OpenStack. It uses a REST API for high-speed metrics processing and querying, and has a streaming alarm engine and notification engine.



Contribution

Communication and Meetings

PTG meetings

Documentation

Monasca API (and links to other documents): https://docs.openstack.org/monasca-api/latest/
Monasca command line interface: https://docs.openstack.org/python-monascaclient/latest/
Monasca API Specification: https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md
Agent Documentation: https://github.com/openstack/monasca-agent

Presentations

There have been many interesting and excellent presentations on Monasca, many given at the OpenStack or OpenInfra Summits.

Monasca/Presentations (updated with Denver 2019 links)


Repositories

Code
Core: https://git.openstack.org/cgit/?q=monasca (old git link)
API: https://opendev.org/openstack/monasca-api
Deployment
The following repositories are available for deploying Monasca:
Docker: https://github.com/monasca/monasca-docker
Kubernetes: https://github.com/monasca/monasca-helm
Ansible: https://github.com/search?utf8=%E2%9C%93&q=ansible-monasca
Puppet: https://git.openstack.org/openstack/puppet-monasca


Bugs

All open bugs
https://storyboard.openstack.org/#!/worklist/213
In Stein triaged bugs
https://storyboard.openstack.org/#!/worklist/467
Bug-fixing progress
https://storyboard.openstack.org/#!/board/114


Requirements

Monasca/Requirements from design in 2015.

See also https://opendev.org/openstack/monasca-specs/

Features

This section describes the overall features.

  • A highly performant, scalable, reliable and fault-tolerant Monitoring as a Service (MONaaS) solution that scales to service provider metrics levels of metrics throughput. Performance, scalability and high-availability have been designed in from the start. Can process 100s of thousands of metrics/sec as well as offer data retention periods of greater than a year with no data loss while still processing interactive queries.
  • Rest API for storing and querying metrics and historical information. Most monitoring solution use special transports and protocols, such as CollectD or NSCA (Nagios). In our solution, http is the only protocol used. This simplifies the overall design and also allows for a much richer way of describing the data via dimensions.
  • Multi-tenant and authenticated. Metrics are submitted and authenticated using Keystone and stored associated with a tenant ID.
  • Metrics defined using a set of (key, value) pairs called dimensions.
  • Real-time thresholding and alarming on metrics.
  • Compound alarms described using a simple expressive grammar composed of alarm sub-expressions and logical operators.
  • A monitoring agent that supports a number of built-in system and service checks, as well as Nagios checks, statsd, and Prometheus-style scraping of metrics endpoints.
  • Open-source monitoring solution built on open-source technologies.
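As a minimal sketch of the HTTP-only, dimension-based submission described above, the following Python builds a metric body (a name, a value, a millisecond timestamp and its dimensions) and prepares an authenticated POST using only the standard library. The endpoint URL, port, token and metric names are placeholders assumed for illustration, and the request is constructed but not sent.

```python
import json
import time
import urllib.request

# Placeholder endpoint (assumption): replace with the URL registered in Keystone.
MONASCA_URL = "http://monasca.example.com:8070/v2.0/metrics"


def build_metric(name, value, dimensions, timestamp_ms=None):
    """Return a metric body: name, value, millisecond timestamp and dimensions."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {
        "name": name,
        "dimensions": dict(dimensions),  # (key, value) pairs describing the metric
        "timestamp": timestamp_ms,
        "value": float(value),
    }


def build_request(metrics, token, url=MONASCA_URL):
    """Prepare (but do not send) an authenticated HTTP POST of a list of metrics."""
    body = json.dumps(metrics).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Keystone token; the API stores the metrics under the token's tenant.
            "X-Auth-Token": token,
        },
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it; the token must be a valid Keystone token for the project that should own the metrics.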


The Monasca API can be integrated with other reporting, visualization, or billing systems, such as CloudKitty or Grafana.

The Monasca community is also involved with the Self Healing and Auto-Scaling OpenStack SIGs.

Comparisons to alternatives

Monasca/Comparison (to be written)

Architecture

Monasca Architecture Component Diagram

  • Monitoring Agent (monasca-agent): A modern Python-based monitoring agent that consists of several sub-components and supports system metrics, such as CPU utilization and available memory, Nagios plugins, statsd, and built-in checks for many services, such as MySQL and RabbitMQ.
  • Monitoring API (monasca-api): A well-defined and documented RESTful API for monitoring that is primarily focused on the following concepts and areas:
    • Metrics: Store and query massive amounts of metrics in real-time.
    • Statistics: Query statistics for metrics.
    • Alarm Definitions: Create, update, query and delete alarm definitions.
    • Alarms: Query and delete the alarm history.
      • Simple expressive grammar for creating compound alarms composed of alarm subexpressions and logical operators.
      • Alarm severities can be associated with alarms.
      • The complete alarm state transition history is stored and queryable which allows for subsequent root cause analysis (RCA) or advanced analytics.
    • Notification Methods: Create and delete notification methods, such as email, and associate them with alarms. Supports notifying users directly via email when alarm state transitions occur.
    • The Monasca API has both Java and Python implementations available.
  • Persister (monasca-persister): Consumes metrics and alarm state transitions from the MessageQ and stores them in the Metrics and Alarms database.
    • The Persister has both Java and Python implementations.
  • Transform and Aggregation Engine (monasca-transform): Transforms metric names and values, for example with delta or time-based derivative calculations, and creates new metrics that are published to the Message Queue. The Transform Engine is not available yet.
  • Anomaly and Prediction Engine: Evaluates predictions and anomalies and generates predicted metrics as well as anomaly likelihoods and anomaly scores. The Anomaly and Prediction Engine is currently a prototype.
  • Threshold Engine (monasca-thresh): Computes thresholds on metrics and publishes alarms to the MessageQ when they are exceeded. It is based on Apache Storm, a free and open-source distributed real-time computation system.
  • Notification Engine (monasca-notification): Consumes alarm state transition messages from the MessageQ and sends notifications, such as emails, for alarms. The Notification Engine is Python-based.
  • Analytics Engine (monasca-analytics): Consumes alarm state transitions and metrics from the MessageQ and performs anomaly detection and alarm clustering/correlation.
  • Message Queue: A third-party component that primarily receives published metrics from the Monitoring API and alarm state transition messages from the Threshold Engine, which are then consumed by other components, such as the Persister and Notification Engine. The Message Queue is also used to publish and consume other events in the system. Currently, a Kafka-based MessageQ is supported. Kafka is a high-performance, distributed, fault-tolerant, and scalable message queue with durability built in. We will look at other alternatives, such as RabbitMQ. In fact, our previous implementation supported RabbitMQ, but we moved to Kafka because of RabbitMQ's limitations in performance, scale, durability and high availability.
  • Metrics and Alarms Database: A third-party component that primarily stores metrics and the alarm state history. Currently, Vertica, InfluxDB, and Cassandra are supported.
  • Config Database: A third-party component that stores much of the configuration and other information in the system. Currently, MySQL is supported. Support for PostgreSQL is in progress.
  • Monitoring Client (python-monascaclient): A Python command line client and library that communicates with and controls the Monitoring API. The Monitoring Client was written using the OpenStack Heat Python client as a framework. The Monitoring Client also has a Python library, "monascaclient", similar to the other OpenStack clients, that can be used to quickly build additional capabilities. The Monitoring Client library is used by the Monitoring UI, Ceilometer publisher, and other components.
  • Monitoring UI: A Horizon dashboard for visualizing the overall health and status of an OpenStack cloud.
  • Ceilometer publisher: A multi-publisher plugin for Ceilometer, not shown, that converts and publishes samples to the Monitoring API.
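The Threshold Engine evaluates alarm definitions whose expressions combine sub-expressions with logical operators, for example `avg(cpu.user_perc{hostname=node-1}) > 80 or max(disk.space_used_perc) >= 95`. The Python below is a deliberately simplified model of that evaluation, not monasca-thresh code: each sub-expression aggregates the samples seen for one metric in a period and compares the result with a threshold, and the resulting OK, ALARM or UNDETERMINED states are combined with `and` or `or`. The subset of functions and operators, the tuple representation, and treating missing data as UNDETERMINED are assumptions made for illustration.

```python
import operator
from statistics import mean

# Hypothetical subset of aggregation functions and comparison operators.
FUNCTIONS = {"avg": mean, "min": min, "max": max, "sum": sum, "count": len}
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def eval_subexpression(func, metric, op, threshold, samples):
    """Return 'ALARM', 'OK' or 'UNDETERMINED' for one sub-expression.

    samples maps metric names to the values seen in the current period.
    """
    values = samples.get(metric, [])
    if not values:
        return "UNDETERMINED"  # no data in the evaluation period
    aggregated = FUNCTIONS[func](values)
    return "ALARM" if OPERATORS[op](aggregated, threshold) else "OK"


def eval_compound(logic, subexpressions, samples):
    """Combine sub-expression states with one logical operator ('and' or 'or')."""
    states = [eval_subexpression(*sub, samples) for sub in subexpressions]
    if logic == "and":
        if all(s == "ALARM" for s in states):
            return "ALARM"
        if "OK" in states:
            return "OK"  # one false operand makes the conjunction false
        return "UNDETERMINED"
    if logic == "or":
        if "ALARM" in states:
            return "ALARM"  # one true operand makes the disjunction true
        if all(s == "OK" for s in states):
            return "OK"
        return "UNDETERMINED"
    raise ValueError(f"unsupported logical operator: {logic}")
```

With `subs = [("avg", "cpu.user_perc", ">", 80), ("max", "disk.space_used_perc", ">=", 95)]`, samples of `[70, 90, 95]` for CPU (average 85) and `[50]` for disk put the `or` expression in ALARM and the `and` expression in OK.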

Most of the components are described in their respective repositories. However, there aren't any repositories for the third-party components used, so we describe some of the relevant details here.

Further historical context for the architecture can be found at Monasca/Architecture Details

Further Reading

  • Events: Support for real-time event stream processing in Monasca is in progress. For more details, see Monasca/Events.
  • Logging: Support for logging in Monasca is under discussion. For more details, see Monasca/Logging.
  • Transform and Aggregation Engine: For more details, see Monasca/Transform.
  • Analytics: Support for anomaly detection and alarm clustering/correlation is in progress. For more details, see Monasca/Analytics.
  • Monitoring: Enabling and using monitoring of Monasca's own status is under discussion. For more details, see Monasca/Monitoring_Of_Monasca.
  • UI/UX Support: Adding more support for common UI/UX queries is under discussion. For more details, see Monasca/UI_UX_Support.

Keystone Requirements

Monasca relies on Keystone, and the following Keystone configuration must exist:

  • The endpoint for the API must be registered in Keystone as the 'monasca' service.
  • The API must have an admin token to use in verifying the Keystone tokens it receives.
  • For each project that uses Monasca, two users must exist. One has the 'monasca-agent' role and is used by the monasca-agent instances running on machines. The other should not have that role and can be used for logging into the UI, using the CLI, or making direct queries against the API.
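A client needs a project-scoped Keystone token before it can call the API. The sketch below, assuming the Keystone v3 password method, builds the body for `POST /v3/auth/tokens` and then reads a token response to find the endpoint registered under the 'monasca' service name and to check whether the user holds the 'monasca-agent' role. The user, project and domain names are placeholders; Keystone returns the token itself in the `X-Subject-Token` response header, which this sketch does not handle.

```python
def keystone_auth_body(username, password, project, domain_id="default"):
    """Body for POST /v3/auth/tokens requesting a project-scoped token."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"id": domain_id},
                        "password": password,
                    }
                },
            },
            "scope": {"project": {"name": project, "domain": {"id": domain_id}}},
        }
    }


def find_monasca_endpoint(token_body, interface="public"):
    """Return the URL of the 'monasca' service from a token's service catalog."""
    for service in token_body["token"].get("catalog", []):
        if service.get("name") == "monasca":
            for endpoint in service.get("endpoints", []):
                if endpoint.get("interface") == interface:
                    return endpoint["url"]
    return None  # service or interface not registered


def is_agent_user(token_body):
    """True if the token carries the 'monasca-agent' role (the agent's user)."""
    roles = token_body["token"].get("roles", [])
    return any(role.get("name") == "monasca-agent" for role in roles)
```

Keeping the agent user and the interactive user separate, as required above, means `is_agent_user` is true only for the credentials deployed with monasca-agent.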


Development Environment


Coding Standards

  • Java: Several of the Monasca components are available in Java. OpenStack does not have any Java coding standards, so we've adopted the Google Java Style Guide (https://google.github.io/styleguide/javaguide.html).
    • The standard allows either 80- or 100-character lines. We've adopted 100.
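A minimal sketch of checking the adopted 100-column limit, assuming plain character counts per line. It ignores the style guide's exceptions, such as package and import statements, and a Checkstyle configuration based on the Google checks would be the more complete tool; this helper only shows the rule itself.

```python
MAX_LINE_LENGTH = 100  # column limit adopted for Monasca Java code


def long_lines(lines, limit=MAX_LINE_LENGTH):
    """Yield (line_number, length) for each line longer than limit characters."""
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))  # do not count the line terminator
        if length > limit:
            yield number, length


def check_file(path, limit=MAX_LINE_LENGTH):
    """Return human-readable warnings for one source file."""
    with open(path, encoding="utf-8") as source:
        return [
            f"{path}:{number}: {length} characters (limit {limit})"
            for number, length in long_lines(source, limit)
        ]
```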


Technologies

Monasca uses a number of third-party technologies:

  • Internal Processing and Middleware
    • Apache Kafka (http://kafka.apache.org): Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. Kafka is a highly performant, distributed, fault-tolerant, and scalable message queue with durability built-in.
    • Apache Storm (http://storm.incubator.apache.org/): Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
    • ZooKeeper (http://zookeeper.apache.org/): Used by Kafka and Storm.
    • Apache Spark: Used by Monasca Transform as an aggregation engine.
  • Configuration database:
    • MySQL: MySQL is supported as a Config Database.
    • PostgreSQL: Support for PostgreSQL as a Config Database, via Hibernate and SQLAlchemy.
  • Vagrant (http://www.vagrantup.com/): Vagrant provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team.
  • Dropwizard (https://dropwizard.github.io/dropwizard/): Dropwizard pulls together stable, mature libraries from the Java ecosystem into a simple, light-weight package that lets you focus on getting things done. Dropwizard has out-of-the-box support for sophisticated configuration, application metrics, logging, operational tools, and much more, allowing you and your team to ship a production-quality web service in the shortest time possible.
  • Time Series Database:
    • InfluxDB (http://influxdb.com/): An open-source distributed time series database with no external dependencies. InfluxDB is supported for the Metrics Database.
    • Vertica (http://www.vertica.com): A commercial, enterprise-class SQL analytics database that is highly scalable. It offers built-in automatic high availability and excels at in-database analytics and at compressing and storing massive amounts of data. A free community version of Vertica, which can store up to 1 TB of data with no time limit, is available at https://my.vertica.com/community/. Vertica is supported for the Metrics Database, though it is no longer commonly used.
    • Cassandra (https://cassandra.apache.org): Cassandra is supported for the Metrics Database.
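To illustrate how a metric ends up in a time series database, the sketch below converts a Monasca-style metric into an InfluxDB line-protocol record, with the dimensions as tags and the value as a field. It is a simplified illustration, not the monasca-persister schema: the `_tenant_id` tag name is an assumption, only commas, equals signs and spaces are escaped, and millisecond timestamps are scaled to the nanosecond precision InfluxDB uses by default.

```python
def _escape(text, specials):
    """Backslash-escape each special character, as line protocol requires."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(metric, tenant_id):
    """Convert a metric dict into one InfluxDB line-protocol record."""
    tags = dict(metric.get("dimensions", {}))
    tags["_tenant_id"] = tenant_id  # hypothetical tag carrying the owning tenant
    tag_str = ",".join(
        f"{_escape(str(k), ',= ')}={_escape(str(v), ',= ')}"
        for k, v in sorted(tags.items())
    )
    measurement = _escape(metric["name"], ", ")
    # Monasca timestamps are milliseconds; InfluxDB defaults to nanoseconds.
    timestamp_ns = int(metric["timestamp"]) * 1_000_000
    return f"{measurement},{tag_str} value={float(metric['value'])} {timestamp_ns}"
```

Sorting the tags keeps the output deterministic; InfluxDB itself also performs best when tags are written in sorted key order.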


License

Copyright (c) 2014, 2015 Hewlett-Packard Development Company, L.P.

(C) Copyright 2019 SUSE LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Monasca uses YourKit Profiler for Java development





Visit the YourKit website for more information.