Difference between revisions of "Monasca/Operations"

Latest revision as of 01:04, 15 December 2018

Information on this page is old and no longer kept up to date; see http://monasca.io/ for current information.

Monasca Overview

Monasca is a monitoring system made up of many parts that scale horizontally to serve large cloud deployments. The Monasca components fall roughly into two categories: those that are part of the server cluster and those that interact only with the Monasca API. In the standard flow of data, Monasca agents send measurements (https://github.com/stackforge/monasca-agent/blob/master/docs/MonascaMetrics.md) into the system, where they are processed by the threshold engine and stored for later retrieval and graphing.

The Monasca threshold engine evaluates metrics according to alarm definitions. As measurements come into the system, alarms are created for the metrics that match an alarm definition. Each alarm definition can be associated with a notification method, which is triggered when an alarm changes state.
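The relationship between measurements, alarm definitions, alarms and notifications can be illustrated with a minimal sketch. This is not the Monasca threshold engine; the class names, the single "average above threshold" rule and the callback signature are all hypothetical and only mirror the behaviour described above: an alarm is created when a metric first matches a definition, and the notification fires only when the alarm changes state.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class AlarmDefinition:
    """Hypothetical, simplified stand-in for an alarm definition."""
    name: str
    metric: str        # metric name the definition matches
    threshold: float   # alarm when the average value exceeds this
    notify: Callable[[Tuple, str, str], None]  # called on state change


@dataclass
class ThresholdEngine:
    """Toy evaluator: one alarm per (metric, dimensions) that matches."""
    definition: AlarmDefinition
    states: Dict[Tuple, str] = field(default_factory=dict)

    def process(self, metric: str, dimensions: Dict[str, str],
                values: List[float]) -> Optional[str]:
        if metric != self.definition.metric:
            return None  # measurement does not match the definition
        key = (metric, tuple(sorted(dimensions.items())))
        # A newly created alarm starts out UNDETERMINED.
        old_state = self.states.get(key, "UNDETERMINED")
        new_state = "ALARM" if mean(values) > self.definition.threshold else "OK"
        self.states[key] = new_state
        if new_state != old_state:
            self.definition.notify(key, old_state, new_state)
        return new_state


# Example: record state transitions instead of sending a real notification.
transitions = []
engine = ThresholdEngine(AlarmDefinition(
    name="high cpu", metric="cpu.user_perc", threshold=90.0,
    notify=lambda key, old, new: transitions.append((old, new))))
engine.process("cpu.user_perc", {"hostname": "node1"}, [95.0, 97.0])
engine.process("cpu.user_perc", {"hostname": "node1"}, [96.0])
engine.process("cpu.user_perc", {"hostname": "node1"}, [10.0])
```

In this sketch the second measurement leaves the alarm in the same state, so no notification is recorded for it.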

Client Configuration

Monasca is fully multi-tenant, so each project must configure the various agents and alarm definitions to drive the system.

Alarm Definition Configuration

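Alarm definitions can be created directly through the API (https://github.com/stackforge/monasca-api/blob/master/docs/monasca-api-spec.md#create-alarm-definition) or via the python-monascaclient CLI (https://docs.openstack.org/python-monascaclient/latest/).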

Additionally, there is an Ansible module to assist in alarm definition creation and a role with many default alarms already defined, both found at https://github.com/hpcloud-mon/ansible-monasca-default-alarms.

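A newer example can be found in ArdanaCLM: see https://github.com/ArdanaCLM/monasca-ansible/ and https://github.com/ArdanaCLM/monasca-ansible/blob/master/roles/monasca-agent/library/monasca_agent_plugin.py.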
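As a rough illustration of creating an alarm definition directly through the API, the sketch below builds (but does not send) an HTTP request. It assumes the alarm-definitions resource lives at /v2.0/alarm-definitions and accepts a JSON body with name, expression and severity fields, authenticated with a Keystone token in the X-Auth-Token header; check the API specification for the authoritative field list. The host name, port, token and helper function name are hypothetical.

```python
import json
import urllib.request


def build_alarm_definition_request(api_url, token, name, expression,
                                   severity="LOW"):
    """Build a POST request that would create an alarm definition.

    api_url and token are placeholders supplied by the caller; the path
    and body fields are assumptions based on the Monasca API spec.
    """
    body = {"name": name, "expression": expression, "severity": severity}
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/v2.0/alarm-definitions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


# Hypothetical endpoint and token, for illustration only.
request = build_alarm_definition_request(
    "http://monasca.example:8070", "my-keystone-token",
    name="High CPU usage", expression="avg(cpu.user_perc) > 90",
    severity="HIGH")
# urllib.request.urlopen(request) would send it to a real API endpoint.
```

Building the request separately from sending it keeps the sketch testable without a running Monasca installation.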

Agent Configuration

The Monasca agent is highly configurable: it can collect measurements from many sources and can also be extended. For information on direct configuration, refer to the agent documentation.

An Ansible role for installing and configuring the agent is available at https://github.com/hpcloud-mon/ansible-monasca-agent. It includes an Ansible module (monasca_agent_plugin) for running specific monasca-setup detection plugins, as well as examples of how to add custom plugins with Ansible.

Server Installation and Configuration

The entire server stack with all of its components can be built and configured using Ansible. The team development environment does this on a small scale. The various roles have also been used in fully clustered deployments of Monasca.

Additionally some teams have configured Monasca via Puppet.

MoM - Monitoring of Monasca

Monasca itself needs to be monitored and is fully capable of monitoring itself. In non-production installations this can be done by having the agent running on the Monasca boxes report back to the Monasca API. For production installations I recommend that the agent running on the Monasca nodes report to another installation of Monasca, possibly a single VM 'mini-mon', which is itself monitored by the primary installation. This avoids dependency loops.

As components of Monasca are developed, metrics for monitoring each component need to be added, along with alarm definitions and, finally, default graphs for viewing the metrics. The most basic metrics are used in simple up/down alarms; more advanced metrics are used for thresholds and for graphs that aid in failure prediction and capacity planning.

Here are the MoM alarms broken down by component. The exact alarms can be found at https://github.com/hpcloud-mon/ansible-monasca-default-alarms/blob/master/tasks/monasca.yml

{|
! Component !! Alarm
|-
| zookeeper || pid check, average latency, zookeeper connections_count
|-
| kafka || pid check, consumer lag
|-
| mysql || pid check, slow queries
|-
| notification || pid check, config db time, email time
|-
| thresh/storm || pid check of nimbus, supervisor and workers
|-
| Agent || emit time, collection time
|-
| Persister || For the Java version a healthcheck on the admin url
|-
| API || For the Java version a healthcheck on the admin url
|}
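For the healthcheck-based alarms on the Java Persister and API, a check might interpret the admin URL's response along the following lines. This sketch assumes the admin healthcheck returns a Dropwizard-style JSON object that maps each check name to an object with a boolean "healthy" field; that format, the function name and the sample check names are assumptions and are not taken from this page.

```python
import json


def is_healthy(healthcheck_body: str) -> bool:
    """Return True only if every reported check is healthy.

    Assumes a JSON object of the form
    {"check-name": {"healthy": true, ...}, ...}.
    An empty response is treated as unhealthy.
    """
    checks = json.loads(healthcheck_body)
    return bool(checks) and all(
        check.get("healthy") is True for check in checks.values())


# Hypothetical response bodies, for illustration only.
ok_body = '{"deadlocks": {"healthy": true}, "database": {"healthy": true}}'
bad_body = ('{"deadlocks": {"healthy": true}, '
            '"database": {"healthy": false, "message": "timeout"}}')
status_ok = is_healthy(ok_body)
status_bad = is_healthy(bad_body)
```

The result could then be reported as a simple up/down metric, in the same way as the pid checks for the other components.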