
Difference between revisions of "Distributed Monitoring"

(Implementation)
Line 26: Line 26:
 
* ...  
 
* ...  
 
== Implementation ==
 
== Implementation ==
(For distributed monitoring, some open source software are useful. ...)
+
For distributed monitoring, some open source software are useful. ...
 
You can learn an example of how to implement distributed monitoring with open source software.
 
You can learn an example of how to implement distributed monitoring with open source software.
=== Software ===
+
=== OSS ===
 
==== collectd ====
 
==== collectd ====
 
collectd is a daemon which collects system and application performance metrics periodically and provides mechanisms to store the values in a variety of ways.
 
collectd is a daemon which collects system and application performance metrics periodically and provides mechanisms to store the values in a variety of ways.
 
You can use collectd as Poller/Notification to gather metrics and Collector to store the metrics in Redis.
 
You can use collectd as Poller/Notification to gather metrics and Collector to store the metrics in Redis.
 +
You can implement Analytics Engine and Transmitter as collectd's plugins.
  
 
[https://collectd.org/ collectd's official page]
 
[https://collectd.org/ collectd's official page]
Line 47: Line 48:
  
 
=== Setup ===
 
=== Setup ===
In computing nodes, controller nods and every nodes that you want to monitor, you can setup following example. In this example, ubuntu16.04 is selected as each node's OS.
+
(to be edited) In computing nodes, controller nods and every nodes that you want to monitor, you can setup following example. In this example, ubuntu16.04 is selected as each node's OS.
 
# set up OpenStack environment using DevStack or manual installation
 
# set up OpenStack environment using DevStack or manual installation
 
# install collectd, redis and some other python library
 
# install collectd, redis and some other python library
 
# get demo code of DMA
 
# get demo code of DMA
  
<pre class="sourceCode console"># apt install collectd redis python-pip python-dev </pre>
+
==== collectd ====
 +
You can implement distributed monitoring by setting collectd and making collectd's plugins.
 +
See [https://github.com/distributed-monitoring/ distributed monitoring github repository] if you want to see an example .
 +
 
 +
===== Poller =====
 +
collectd has plugins to collect standard metrics such as cpu/memory/disk/network utiliziation.
 +
See [https://collectd.org/ collectd's official page].
 +
 
 +
===== Collector =====
 +
(to be edited)
 +
 
 +
<pre class="sourceCode none">
 +
<Plugin "write_redis">
 +
  <Node "dma">
 +
    Host "localhost"
 +
    Port "6379"
 +
    Timeout 1000
 +
  </Node>
 +
</Plugin>
 +
</pre>
 +
 
 +
===== Analytics Engine =====
 +
Make analytics plugin with python and scikit learn.
 +
(to be edited)
 +
 
 +
<pre class="sourceCode none">
 +
<Plugin python>
 +
  ModulePath "/opt/dma/lib"
 +
  LogTraces true
 +
  Interactive false
 +
  Import "analysis"
 +
 
 +
  <Module "analysis">
 +
  </Module>
 +
</Plugin>
 +
</pre>
 +
 
  
=== collectd ===
+
===== Transmitter =====
 +
Make transimit plugin with python.
 +
(to be edited)
  
==== collectd.conf ====
 
 
<pre class="sourceCode none">
 
<pre class="sourceCode none">
<Plugin "write_redis">
+
<Plugin "threshold">
  <Node "dma">
+
  <Host "localhost">
      Host "localhost"
+
    <Plugin "dma">
      Port "6379"
+
      <Type "gauge">
      Timeout 1000
+
        WarningMax 0
  </Node>
+
        Hits 3
</Plugin>
+
      </Type>
}</pre>
+
    </Plugin>
 +
  </Host>
 +
</Plugin>
 +
 
 +
<Plugin python>
 +
  ModulePath "/opt/dma/lib"
 +
    LogTraces true
 +
    Interactive false
 +
    Import "write_openstack"
 +
 
 +
    <Module "write_openstack">
 +
    </Module>
 +
</Plugin>
 +
</pre>
 +
 
 +
==== Redis ====
  
=== redis ===
+
We recommend that you disable saving data of Redis DB on disk.
  
==== redis.conf ====
+
===== redis.conf =====
 
<pre class="sourceCode none">
 
<pre class="sourceCode none">
 +
#save 900 1
 
#save 300 10
 
#save 300 10
 
#save 60 10000
 
#save 60 10000
 +
save ""
 
</pre>
 
</pre>
  

Revision as of 11:58, 27 September 2017

Distributed Monitoring

Overview

Monitoring and its applications are becoming a key factor in service lifecycle management for various systems such as NFV and cloud native platforms. The distributed monitoring approach is a framework that enables flexible and scalable monitoring and can work with the current OpenStack telemetry and monitoring framework. This documentation describes the architecture and how to implement the framework in your environment.

Who will be interested?

  • Infrastructure operators who want to collect detailed data at short intervals on their compute nodes.
  • NFV operators who want to detect abnormal behaviour of Virtual Network Functions.

Architecture

This architecture includes several functions: monitoring, a collector, an in-memory database, an analysis engine and notification. The picture below shows the architecture of distributed monitoring. Each compute node has its own monitoring function that follows this architecture.

Function

  • The Poller/Notification process collects data from guest OSs and the host OS using SNMP, the libvirt API, the OpenStack API, etc.
  • The Collector formats data for the in-memory database and inserts it into the database.
  • The Analytics Engine analyzes data in the in-memory database. You can use several analytics engines, such as machine learning libraries. You can also use an evaluator that performs threshold monitoring directly.
  • The Transmitter sends analytics results and alarms raised by threshold checks to the Operation Support System, the OpenStack API and the Orchestrator.

Feature

  • Short interval
    • You can collect data at short intervals such as 0.1 sec, because the collection agent doesn't have to collect data from a huge number of computing nodes and VMs in this architecture. This means the load on the agent is lower than in a centralized monitoring architecture.
  • Scalable
    • This architecture is highly scalable. Because each compute node has its own monitoring function, you don't have to calculate the specs of nodes for monitoring.
  • Fast detection
    • In some cases, the MQ is a performance bottleneck. No MQ is used in this architecture, because the monitoring process is closed within the compute node.

Use Cases

  • Micro Burst Traffic
  • Memory Leak
  • Abnormal behaviour of software/hardware
  • ...

Implementation

For distributed monitoring, some open source software is useful. ... This section shows an example of how to implement distributed monitoring with open source software.

OSS

collectd

collectd is a daemon which collects system and application performance metrics periodically and provides mechanisms to store the values in a variety of ways. You can use collectd as Poller/Notification to gather metrics and Collector to store the metrics in Redis. You can implement Analytics Engine and Transmitter as collectd's plugins.

collectd's official page

Redis

Redis is an in-memory data structure store, used as a database, cache and message broker.

Redis's official page

scikit-learn

scikit-learn provides simple and efficient tools for data mining and data analysis. You can use scikit-learn as a lightweight machine learning library for the Analytics Engine.

scikit-learn's official page

Setup

(to be edited) On compute nodes, controller nodes and every node that you want to monitor, you can set up the following example. In this example, Ubuntu 16.04 is used as each node's OS.

  1. Set up an OpenStack environment using DevStack or manual installation.
  2. Install collectd, Redis and some other Python libraries.
  3. Get the demo code of DMA.

collectd

You can implement distributed monitoring by configuring collectd and writing collectd plugins. See the distributed monitoring GitHub repository if you want to see an example.

Poller

collectd has plugins to collect standard metrics such as CPU, memory, disk and network utilization. See collectd's official page.
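
For metrics that need a shorter interval than the global one, a custom read plugin is one option. The following is a minimal sketch, not the DMA implementation: the module name `dma_poller`, the choice of /proc/net/dev as the data source and the 0.1 sec interval are assumptions made for illustration. The `collectd` module is only importable inside the collectd daemon, so the parsing function is kept separate and can be tried on its own.

```python
"""Sketch of a collectd Python read plugin (hypothetical module name: dma_poller)."""
try:
    import collectd  # provided by collectd's python plugin at runtime only
except ImportError:
    collectd = None


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 9:
            continue  # skip lines that do not look like an interface row
        # field 0 is received bytes, field 8 is transmitted bytes
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def read_callback():
    """Called by collectd at every interval; dispatches one value list per interface."""
    with open("/proc/net/dev") as f:
        counters = parse_net_dev(f.read())
    for iface, (rx, tx) in counters.items():
        values = collectd.Values(plugin="dma_poller",
                                 plugin_instance=iface,
                                 type="if_octets")
        values.dispatch(values=[rx, tx])


if collectd is not None:
    # 0.1 sec is the short interval mentioned in this guide; adjust as needed
    collectd.register_read(read_callback, interval=0.1)
```

The module would be loaded through a `<Plugin python>` block with `Import "dma_poller"`, in the same way as the plugins shown later in this guide.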

Collector

(to be edited)

<Plugin "write_redis">
  <Node "dma">
    Host "localhost"
    Port "6379"
    Timeout 1000
  </Node>
</Plugin>
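
To check what the Collector stored, you can read the values back from Redis. The sketch below assumes the storage layout described in the write_redis documentation: one sorted set per metric under the key `collectd/<identifier>`, with members of the form `<time>:<value>` scored by timestamp, and the set `collectd/values` listing all identifiers. The helper names are hypothetical, and the third-party `redis` package is assumed for the client.

```python
def parse_member(member):
    """Split a sorted-set member '<time>:<value>[:<value>...]' into (time, [values])."""
    if isinstance(member, bytes):
        member = member.decode()
    timestamp, *values = member.split(":")
    return float(timestamp), [float(v) for v in values]


def recent_values(client, identifier, seconds, now):
    """Return samples of one metric written during the last `seconds` seconds."""
    key = "collectd/" + identifier
    members = client.zrangebyscore(key, now - seconds, now)
    return [parse_member(m) for m in members]


def dump_all(host="localhost", port=6379, seconds=10):
    """Print recent samples for every identifier known to write_redis."""
    import time
    import redis  # third-party package, imported lazily (pip install redis)

    client = redis.Redis(host=host, port=port)
    now = time.time()
    for ident in client.smembers("collectd/values"):
        name = ident.decode()
        print(name, recent_values(client, name, seconds, now))
```

Calling `dump_all()` on a monitored node should print the most recent samples per metric, provided collectd and Redis are running with the configuration above.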

Analytics Engine

Make an analytics plugin with Python and scikit-learn. (to be edited)

<Plugin python>
  ModulePath "/opt/dma/lib"
  LogTraces true
  Interactive false
  Import "analysis"

  <Module "analysis">
  </Module>
</Plugin>
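
The content of the `analysis` module is not shown in this guide. The following is only a sketch of what such a module could look like, under several assumptions: samples are read back from Redis in the `<time>:<value>` layout of write_redis, the key `collectd/localhost/cpu-0/cpu-user` is a hypothetical example, and a plain z-score check stands in for a scikit-learn estimator to keep the example dependency-free. The result is dispatched as a `gauge` under the plugin name `dma`, so that a threshold on that value can raise a notification.

```python
"""Sketch of /opt/dma/lib/analysis.py (illustrative, not the DMA demo code)."""
import statistics

try:
    import collectd  # available only inside the collectd daemon
except ImportError:
    collectd = None

KEY = "collectd/localhost/cpu-0/cpu-user"  # hypothetical metric identifier
Z_LIMIT = 3.0                              # assumed deviation limit


def anomaly_score(history, value, z_limit=Z_LIMIT):
    """Return 1.0 if value is more than z_limit standard deviations from history."""
    if len(history) < 2:
        return 0.0  # not enough data to estimate a deviation
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return 0.0 if value == mean else 1.0
    return 1.0 if abs(value - mean) / stdev > z_limit else 0.0


def fetch_history(seconds=10.0):
    """Read recent samples of KEY from the local Redis instance."""
    import time
    import redis  # third-party package, imported lazily

    now = time.time()
    client = redis.Redis(host="localhost", port=6379)
    members = client.zrangebyscore(KEY, now - seconds, now)
    return [float(m.decode().split(":")[1]) for m in members]


def read_callback():
    """Score the newest sample against the earlier ones and publish the result."""
    samples = fetch_history()
    if not samples:
        return
    score = anomaly_score(samples[:-1], samples[-1])
    collectd.Values(plugin="dma", type="gauge",
                    type_instance="anomaly").dispatch(values=[score])


if collectd is not None:
    collectd.register_read(read_callback)
```

A scikit-learn model could replace `anomaly_score` as long as it still returns 0 for normal samples and a positive value for abnormal ones.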


Transmitter

Make a transmit plugin with Python. (to be edited)

<Plugin "threshold">
  <Host "localhost">
    <Plugin "dma">
      <Type "gauge">
        WarningMax 0
        Hits 3
      </Type>
    </Plugin>
  </Host>
</Plugin>

<Plugin python>
  ModulePath "/opt/dma/lib"
  LogTraces true
  Interactive false
  Import "write_openstack"

  <Module "write_openstack">
  </Module>
</Plugin>
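
The `write_openstack` module itself is not listed here. As a rough sketch only, such a module could register a notification callback and turn each notification raised by the threshold plugin into a payload for the receiving system. The payload fields and the helper name `build_payload` are assumptions, and the actual delivery to an OpenStack API, OSS or orchestrator is deployment-specific, so the sketch only logs the payload.

```python
"""Sketch of /opt/dma/lib/write_openstack.py (illustrative, not the DMA demo code)."""
import json

try:
    import collectd  # available only inside the collectd daemon
except ImportError:
    collectd = None

# Numeric severities used by collectd: NOTIF_FAILURE, NOTIF_WARNING, NOTIF_OKAY
SEVERITY_NAMES = {1: "FAILURE", 2: "WARNING", 4: "OKAY"}


def build_payload(notification):
    """Convert a collectd notification object into a JSON-serializable dict."""
    return {
        "host": notification.host,
        "plugin": notification.plugin,
        "type": notification.type,
        "type_instance": notification.type_instance,
        "severity": SEVERITY_NAMES.get(notification.severity, "UNKNOWN"),
        "time": notification.time,
        "message": notification.message,
    }


def notify_callback(notification):
    """Called by collectd for every notification, e.g. from the threshold plugin."""
    payload = build_payload(notification)
    # Sending the payload to an external API is left out of this sketch.
    collectd.info("dma transmitter: " + json.dumps(payload))


if collectd is not None:
    collectd.register_notification(notify_callback)
```

With the threshold configuration above, a `gauge` value of the `dma` plugin that stays above 0 for three readings would be expected to reach `notify_callback` as a warning.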

Redis

We recommend that you disable saving the Redis DB to disk.

redis.conf
#save 900 1
#save 300 10
#save 60 10000
save ""
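
To confirm that an edited redis.conf really leaves no snapshot rule active, a small check like the following can help. It is a sketch with a hypothetical helper name, and it assumes the classic one-pair-per-line `save <seconds> <changes>` form used above, where `save ""` clears the rules defined before it.

```python
def active_save_rules(conf_text):
    """Return the (seconds, changes) snapshot rules left active by redis.conf text."""
    rules = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out directive
        parts = line.split()
        if parts[0].lower() != "save":
            continue  # unrelated directive
        args = parts[1:]
        if args == ['""']:
            rules = []  # an empty string disables the rules seen so far
        elif len(args) == 2:
            rules.append((int(args[0]), int(args[1])))
    return rules


def check_file(path="/etc/redis/redis.conf"):
    """Read a redis.conf (the default path is an assumption) and report its rules."""
    with open(path) as f:
        return active_save_rules(f.read())
```

An empty list from `check_file()` means no snapshot rule is active in that file.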

References

Who is contributing to this guide?