Revision as of 23:32, 27 September 2017
Distributed Monitoring
Overview
Monitoring and its applications are becoming a key factor in service lifecycle management of systems such as NFV and cloud-native platforms. The distributed monitoring approach is a framework that enables flexible and scalable monitoring and can work with the current OpenStack telemetry and monitoring framework. In this documentation, you can find the architecture and how to implement this framework in your environment.
Who will be interested?
- Infrastructure operators who want to collect detailed data at short intervals on their compute nodes.
- NFV operators who want to detect abnormal behavior of Virtual Network Functions.
Architecture
This architecture includes several functions for monitoring: collector, in-memory database, analytics engine and notification. The picture below shows the architecture of distributed monitoring. Each compute node has its own monitoring function following this architecture.
Function
- Poller/Notification processes collect data from guest OSes and the host OS using the SNMP protocol, the libvirt API, OpenStack APIs, etc.
- Collector formats the data for the in-memory database and inserts it into the database.
- Analytics Engine analyzes the data in the in-memory database. You can use various analytics engines such as machine learning libraries, or use an evaluator that performs threshold monitoring directly.
- Transmitter sends analytics results, and alarms raised by threshold monitoring, to the Operations Support System, OpenStack APIs and the Orchestrator.
Feature
- Short interval
- You can collect data at short intervals such as 0.1 sec, because in this architecture the collect agent doesn't have to gather data from a huge number of compute nodes and VMs. The load on the agent is therefore lower than in a centralized monitoring architecture.
- Scalable
- This architecture is highly scalable: each compute node has its own monitoring function, so you don't have to size dedicated nodes for monitoring.
- Fast detection
- In some cases a message queue (MQ) is a performance bottleneck. No MQ is used in this architecture, because the monitoring process is confined to the compute node.
Use Cases
- Memory Leak
- The first use case is memory leak detection, not only for the compute node but also for virtual machines running on it. In case of out-of-memory (OOM), the corresponding node can suddenly become uncontrollable, so the cloud administrator needs to identify such a condition before it reaches that state. Distributed monitoring can retrieve memory usage at short intervals and identify a memory leak using machine learning with scikit-learn.
- Micro Burst Traffic
- Virtual machines' network statistics are very important for network operation, especially in Network Function Virtualization (NFV) use cases. Operators constantly watch virtual network functions (VNFs) to keep the network healthy. A network can unexpectedly run into trouble due to a 'micro burst', i.e. massive traffic in a very short duration, which is hard to identify because the duration is much shorter than the monitoring interval. Distributed monitoring makes it possible to monitor network stats at very short intervals (e.g. 0.1 sec) and identify the affected node.
- Abnormal behaviour of software/hardware
- Distributed monitoring makes it possible to monitor various parameters with low latency, without communicating with the controller node, so it can be used to identify abnormal states of virtual machines and hypervisors.
- ...
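The micro-burst case above can be sketched in a few lines. This is a hypothetical illustration, not part of any tool: the function name, sample values and threshold are made up. With 0.1 sec samples of received bytes per interval, a single burst stands out even though it would vanish in a coarse-grained average.

```python
# Hypothetical sketch: with short-interval (0.1 sec) samples of received
# bytes per interval, a micro burst shows up as a single outlier sample.
# The function name, sample values and threshold are illustrative only.
def find_bursts(samples, threshold):
    """Return the indices of samples that exceed the threshold."""
    return [i for i, v in enumerate(samples) if v > threshold]

# Six 0.1 sec samples; the burst at index 3 would disappear in a
# one-minute average but is obvious at this resolution.
samples = [1200, 1100, 1300, 950000, 1250, 1150]
bursts = find_bursts(samples, threshold=500000)  # -> [3]
```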
Implementation
Some external tools are useful for distributed monitoring. The following shows an example of how to implement distributed monitoring with open source software.
Tools
collectd
collectd is a daemon which collects system and application performance metrics periodically and provides mechanisms to store the values in a variety of ways. You can use collectd as the Poller/Notification to gather metrics and as the Collector to store them in Redis. You can implement the Analytics Engine and Transmitter as collectd plugins.
Redis
Redis is an in-memory data structure store, used as a database, cache and message broker.
scikit-learn
scikit-learn provides simple and efficient tools for data mining and data analysis. You can use scikit-learn as a lightweight machine learning library for the Analytics Engine.
Setup
OpenStack and Linux packages
- Set up an OpenStack environment using DevStack or manual installation. See the OpenStack Installation Guide.
- Install collectd, Redis and scikit-learn. See their official pages.
collectd
You can implement distributed monitoring by configuring collectd and writing collectd plugins. See the manpage of collectd.conf and the collectd plugin architecture.
For an example implementation of distributed monitoring, see the distributed monitoring GitHub repository.
- Poller
collectd has plugins to collect standard metrics such as CPU/memory/disk/network utilization. See the read-type plugins in collectd's table of plugins.
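For example, the standard read plugins can be enabled, together with a short polling interval, in collectd's config file. This is a sketch, not taken from the repository above; the 0.1 sec interval and the plugin selection are illustrative, and sub-second intervals require a collectd version that supports high-resolution intervals:

```text
Interval 0.1

LoadPlugin cpu
LoadPlugin memory
LoadPlugin interface
LoadPlugin disk
```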
- Collector
You can store the metrics in Redis with the collectd Write Redis plugin. Put the following configuration into collectd's config file:
LoadPlugin write_redis
<Plugin "write_redis">
  <Node "dma">
    Host "localhost"
    Port "6379"
    Timeout 1000
  </Node>
</Plugin>
- Analytics Engine
Make a collectd plugin for analysis. The plugin is a Python script that reads data from Redis, analyzes it with scikit-learn and sends the analysis result to the Collector. A rough sketch of the script is as follows:
def read():
    conn = redis.StrictRedis(host='localhost', port=6379)
    rawlist = conn.zrange('collectd/localhost/memory/memory-used',
                          -2, -1)
    ...
    # (analysis with scikit-learn)
    ...
    vl = collectd.Values(host='localhost', plugin='dma', type='gauge')
    vl.dispatch(values=[result])

collectd.register_read(read, 1)
To load the plugin, you need to put configuration into collectd's config file as follows:
LoadPlugin python
<Plugin python>
  ModulePath "/opt/dma/lib"
  LogTraces true
  Interactive false
  Import "analysis"

  <Module "analysis">
  </Module>
</Plugin>
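The elided analysis step in the read() sketch above could, for instance, fit a linear regression to recent memory-usage samples and flag a steady upward trend as a possible leak. The following is a standalone, hypothetical sketch of just that step: the function name, slope threshold and sample data are made up, and in the real plugin the values would come from Redis as shown above.

```python
# Hypothetical sketch of the elided analysis step: fit a linear regression
# to recent memory-usage samples and flag a steady upward trend as a
# possible leak. Threshold and sample data are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression

def leak_suspected(values, min_slope=1024.0):
    """Return True if usage grows by more than min_slope bytes per sample."""
    x = np.arange(len(values)).reshape(-1, 1)
    model = LinearRegression().fit(x, np.asarray(values, dtype=float))
    return bool(model.coef_[0] > min_slope)

steady = [1e6 + (i % 3) * 100 for i in range(60)]   # flat with jitter
leaking = [1e6 + i * 4096 for i in range(60)]       # grows ~4 KiB per sample
```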
- Transmitter
Make a collectd plugin in Python to transmit the analysis result. A rough sketch of the script is as follows:
def notify(vl, data=None):
    # vl is the metrics data, including severity, time and value.
    ...
    # (transmit the metrics data, e.g. execute an openstack command
    #  with python-openstackclient)
    ...

collectd.register_notification(notify)
To load the plugin and set the transmission policy, put the following configuration into collectd's config file:
LoadPlugin python
<Plugin python>
  ModulePath "/opt/dma/lib"
  LogTraces true
  Interactive false
  Import "write_openstack"

  <Module "write_openstack">
  </Module>
</Plugin>

LoadPlugin "threshold"
<Plugin "threshold">
  <Host "localhost">
    <Plugin "dma">
      <Type "gauge">
        WarningMax 0
        Hits 3
      </Type>
    </Plugin>
  </Host>
</Plugin>
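To make the notify() sketch above concrete, the transmit step could map a FAILURE notification to an openstack CLI invocation, for example disabling the affected compute service so the scheduler avoids the node. This is a hypothetical sketch: the function name and the chosen reaction are illustrative, and the command line is only built, not executed, since executing it needs a real cloud.

```python
# Hypothetical sketch: map a collectd notification severity to an
# `openstack` CLI invocation. Only builds the argument list; executing it
# (e.g. via subprocess) would need real OpenStack credentials.
def build_alarm_command(host, severity, message):
    if severity == 'FAILURE':
        # Disable the compute service so the scheduler avoids this node.
        return ['openstack', 'compute', 'service', 'set',
                '--disable', '--disable-reason', message,
                host, 'nova-compute']
    return None  # e.g. ignore OKAY/WARNING notifications here

cmd = build_alarm_command('node1', 'FAILURE', 'memory leak suspected')
```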
Redis
We recommend disabling Redis's saving of data to disk. Edit redis.conf as follows:
#save 900 1
#save 300 10
#save 60 10000
save ""
References
- OpenStack Summit Barcelona
- OpenStack Summit Boston
- OpenStack Summit Sydney