Zabbix-agent-adoption

Latest revision as of 18:09, 15 May 2014

Old Design Page

This page was used to help design a feature for a previous release of OpenStack. It may or may not have been implemented. As a result, this page is unlikely to be updated and could contain outdated information. It was last updated on 2014-05-15

  • Launchpad Entry: CeilometerSpec:Zabbix-agent-adoption
  • Created: Oct. 25, 2013
  • Contributors: Yu Zhang

Blueprint link: https://blueprints.launchpad.net/ceilometer/+spec/zabbix-agent-adoption
Note: The blueprint has been renamed as "3rd-party monitoring agent
adoption mechanism" to be aligned with its current contents.

Introduction

Currently, Ceilometer collects instance data via compute agents installed on every
OpenStack compute node. PollingTasks in a compute agent invoke multiple pollsters,
which then call hypervisor-dependent inspectors to meter various metrics. As
an example, the CPUPollster calls the inspect_cpus() method of a hypervisor-dependent
inspector object to get VCPU data. If the hypervisor is KVM, inspect_cpus() calls
the info() method of the virDomain class of libvirt, which returns a list of 5 data
elements, including the two that CPUPollster cares about: VCPU number and running time.
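
The call chain above can be illustrated with a minimal sketch. It is not Ceilometer's
actual inspector code: FakeDomain and CPUStats are hypothetical stand-ins so that the
example runs without libvirt, and inspect_cpus() is simplified to take the domain
object directly. The only assumption carried over from libvirt is the layout of the
five-element list returned by virDomain.info().

```python
from collections import namedtuple

# Hypothetical container for the two values the pollster keeps.
CPUStats = namedtuple("CPUStats", ["number", "time"])


class FakeDomain:
    """Stand-in for a libvirt virDomain, so the sketch runs without libvirt."""

    def info(self):
        # Same layout as virDomain.info():
        # [state, max memory (KiB), used memory (KiB), vCPU count, CPU time (ns)]
        return [1, 2097152, 2097152, 2, 15000000000]


def inspect_cpus(domain):
    """Keep only the vCPU count and the cumulative CPU time (simplified)."""
    _state, _max_mem, _mem, num_cpu, cpu_time = domain.info()
    return CPUStats(number=num_cpu, time=cpu_time)


stats = inspect_cpus(FakeDomain())
print(stats.number, stats.time)
```

Everything else the guest operating system knows about its CPUs (load averages,
per-state utilization and so on) never appears in this list, which is the gap
discussed next.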

Such pollsters work well for data that is easily available to a hypervisor
(http://www.mirantis.com/blog/openstack-metering-using-ceilometer/),
but they ignore detailed and precise guest system metrics which are not provided
by a hypervisor. As a simple case study, we can compare what CPUPollster
provides with the CPU monitoring items supported by Zabbix, one of the most
popular system monitoring tools. A snapshot of the Zabbix web console is shown in the
following figure.

CPU monitoring items in Zabbix

In practice, the guest system metrics provided by Zabbix are highly valuable for
both OpenStack admins and tenants, as confirmed by our own experience and by
feedback from other companies using OpenStack. Therefore, Zabbix has been
deployed in many production-oriented OpenStack clouds to achieve detailed and precise
monitoring. Other popular 3rd-party monitoring tools include Nagios and Ganglia.

This work aims to make the best use of the existing monitoring assets and expertise
of system administration teams, instead of spending effort on removing or replacing
them. An adoption mechanism between 3rd-party monitoring agents in instances
and Ceilometer compute agents on compute nodes is added, so that Ceilometer
can poll data from those agents directly to enhance its capability of monitoring
instances.

Feasibility analysis

Most 3rd-party monitoring tools are essentially client-server systems. On
each monitored system, an agent (e.g. Zabbix agent, Nagios NRPE, Ganglia
gmond) is installed. Some monitoring tools can leverage SNMP. In such
cases, we can consider the SNMP daemon in a monitored system as an agent.

To achieve cluster-wide monitoring, data storage and a user interface, each tool
also has a server which, directly or via some low-level utilities (e.g. Nagios
check_nrpe), periodically queries the agents in monitored systems and polls
data back. For Zabbix, Nagios and Ganglia alike, such querying and polling
are usually conducted via TCP connections between agents and the server.

Therefore, it is reasonable for us to consider all VM instances on an OpenStack
compute node as a monitored cluster. A 3rd-party monitoring agent in each
instance listens on a specified port and, when a query is received, collects the
required data and sends it back. The only difference is that the queries
may come not from the 3rd-party tool's monitoring server, but from a local proxy
which is a plugin of the Ceilometer compute agent running on this compute node.
Each time the proxy is invoked, its working process can be summarized in the
following steps:

  • Receiving queries from Ceilometer,
  • Translating queries into the commands/queries meaningful to those
    3rd-party agents,
  • Sending commands/queries (via TCP connections) to an agent in
    an instance on the compute node where this proxy is located,
  • Getting data back from the agent,
  • Transforming data into samples whose formats are valid for Ceilometer, and
  • Returning those samples to Ceilometer.
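
The steps above can be sketched roughly as follows. This is only an illustration
under stated assumptions: the one-line plain-text request format, the
METER_TO_AGENT_KEY table, the meter name "guest.cpu.load" and the dictionary used
as a sample are all hypothetical. A real proxy would speak the agent's actual
protocol and build Ceilometer's own sample objects.

```python
import socket

# Hypothetical mapping from Ceilometer-side meter names to agent-side item keys.
METER_TO_AGENT_KEY = {"guest.cpu.load": "system.cpu.load"}


def query_agent(host, port, request, timeout=3.0):
    """Send one request over TCP and read until the agent closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def poll(meter, instance_id, host, port, send=query_agent):
    """Run the proxy steps for one meter of one instance."""
    key = METER_TO_AGENT_KEY[meter]                # translate the query
    raw = send(host, port, key.encode() + b"\n")   # send it and get data back
    value = float(raw.decode().strip())            # transform the reply
    # Return a sample-like dict (a stand-in for a Ceilometer sample).
    return {"name": meter, "resource_id": instance_id, "volume": value}
```

The send parameter exists only so that the transport can be swapped out, for
example with a stub that returns canned bytes when no agent is reachable.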


In such a case, a question is whether or not we still need the 3rd-party monitoring
server to be deployed. The answer depends on both the design of the 3rd-party tool
and the extra development effort we are willing to afford. Take Zabbix as an example.
All Zabbix agents should first be configured and initialized by the Zabbix server;
only then are they aware of what types of metrics they should collect, how long the
metering intervals should be, and so on. In that case, a deployed Zabbix server
can simply help us manage all Zabbix agents in instances during the initial stages.
After all agents are set up, we can just use the proxy in Ceilometer to collect data,
and the server might not be used very often. Of course we can develop agent
management functions in Ceilometer (if the protocol is open) to replace the monitoring
server entirely, but the extra development effort might not be negligible.

If the 3rd-party agents are only loosely coupled with the server, and can be controlled
by simple protocols, then the server will not be necessary at all.

The following figure outlines the logical structure of an OpenStack compute node
that hosts both instances with 3rd-party agents inside and a Ceilometer compute agent
with a proxy.

Logical structure.jpeg

Design and implementation

The internal mechanism of the Ceilometer compute agent is outlined in the following figure.

Compute agent.jpeg

As shown in the figure, a list of PollingTasks, each of which has its own
execution interval, is invoked periodically inside the compute agent.
When invoked, a PollingTask triggers each of its pollsters for each
instance on this compute node to poll data. Thanks to the highly
flexible structure of Ceilometer, all pollsters are in fact plugins, which
can easily be added to this framework. Therefore, to implement our proxy
for 3rd-party monitoring agents, we can just add a ProxyPollster plugin to
the pollster list. The ProxyPollster will then be invoked periodically and
collect data from each 3rd-party agent in each instance.
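
As a rough sketch of this loop (not Ceilometer's real classes or method
signatures), a task can simply iterate over its pollsters and over the instances
on the node. The class bodies, the fetch callable and the meter name below are
hypothetical and only show where a ProxyPollster would plug in.

```python
class ProxyPollster:
    """Hypothetical pollster; fetch stands in for the real agent query."""

    def __init__(self, fetch):
        self._fetch = fetch

    def get_samples(self, instance):
        # One sample-like dict per instance; a real plugin would emit
        # Ceilometer samples and probably several meters.
        yield {"name": "guest.cpu.load",
               "resource_id": instance,
               "volume": self._fetch(instance)}


class PollingTask:
    """Simplified task: one interval, a list of pollster plugins."""

    def __init__(self, interval, pollsters):
        self.interval = interval      # seconds between two invocations
        self.pollsters = pollsters

    def poll(self, instances):
        samples = []
        for pollster in self.pollsters:
            for instance in instances:
                samples.extend(pollster.get_samples(instance))
        return samples


task = PollingTask(interval=60, pollsters=[ProxyPollster(lambda inst: 0.5)])
print(task.poll(["vm-a", "vm-b"]))
```

Adding support for another agent type would then only mean appending another
pollster object to the list, without touching the task itself.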

For the detailed implementation of the ProxyPollster, two methods can be used:

  • The first is simply an in-pollster client of the 3rd-party agent communication
    protocol. This works for cases in which the protocol is quite simple and
    easy to implement. As an example, for Zabbix, such a client in Python is
    already available. A Nagios NRPE client in Perl has also been published.

  • The second is calling a command-line utility provided by the 3rd-party
    monitoring tool itself. The ProxyPollster only calls this utility and waits for
    the returned data. This method removes most of the re-development effort, but
    introduces the requirement of installing the command-line utility on each
    OpenStack compute node. Zabbix, Nagios and Ganglia all provide such
    a utility.
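
Both methods can be illustrated for Zabbix with a short sketch. It rests on
assumptions that should be checked against the Zabbix documentation for the
deployed version: that an agent reply is framed as the bytes "ZBXD", a 0x01 flag
and an 8-byte little-endian length followed by the payload, and that the agent
listens on port 10050. The function names are hypothetical; zabbix_get and its
-s, -p and -k options come from the utility referenced at the end of this page.

```python
import struct
import subprocess

ZBX_HEADER = b"ZBXD\x01"  # assumed magic bytes plus protocol flag


def decode_zabbix_reply(packet):
    """Method 1: strip the assumed header and return the payload as text."""
    if len(packet) < 13 or not packet.startswith(ZBX_HEADER):
        raise ValueError("not a framed Zabbix reply")
    (length,) = struct.unpack("<Q", packet[5:13])  # 8-byte little-endian length
    payload = packet[13:13 + length]
    if len(payload) != length:
        raise ValueError("truncated reply")
    return payload.decode("utf-8")


def zabbix_get_command(host, key, port=10050):
    """Method 2: build the argument list for the zabbix_get utility."""
    return ["zabbix_get", "-s", host, "-p", str(port), "-k", key]


def run_zabbix_get(host, key, port=10050, timeout=5):
    """Call zabbix_get (must be installed on the compute node) and return stdout."""
    result = subprocess.run(zabbix_get_command(host, key, port),
                            capture_output=True, text=True,
                            timeout=timeout, check=True)
    return result.stdout.strip()
```

The first function needs nothing beyond the Python standard library but ties the
pollster to one wire format; the second delegates the protocol to the tool itself
at the price of one external process per query.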

As mentioned, if we do not want to involve the 3rd-party monitoring server in
configuring/initializing monitoring agents, we need to rely on a local config file
that describes at least what types of metrics should be collected.
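
Such a file could look like the hypothetical example below. The section name, the
option names and the comma-separated list of metric keys are assumptions made for
illustration only, not an agreed format.

```python
import configparser

# Hypothetical config file contents; none of these option names are defined
# by Ceilometer or by any monitoring tool.
EXAMPLE_CONFIG = """
[proxy_pollster]
agent_type = zabbix
agent_port = 10050
metrics = system.cpu.load, vm.memory.size
"""


def load_metrics(text):
    """Return (agent type, agent port, list of metric keys) from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["proxy_pollster"]
    # Keys whose parameters contain commas would need another delimiter.
    metrics = [m.strip() for m in section["metrics"].split(",") if m.strip()]
    return section["agent_type"], section.getint("agent_port"), metrics


print(load_metrics(EXAMPLE_CONFIG))
```

The ProxyPollster would read this file once at start-up and then query every
listed key on each polling cycle.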

A more detailed internal structure design of the ProxyPollster is to be added here.

References

Ceilometer plugin development:
http://docs.openstack.org/developer/ceilometer/contributing/plugins.html

Zabbix official website:
https://www.zabbix.com
https://www.zabbix.org

Zabbix Python client for agents:
http://www.zabbix.com/img/zabconf2012/presentations/Zabbix_Conference_2012_Miracle__takanori_suzuki__FINAL_.pdf

Zabbix-get utility:
https://www.zabbix.com/documentation/2.0/manual/concepts/get

Nagios official website:
http://www.nagios.com/

Nagios command-line utility:
http://assets.nagios.com/downloads/nagiosxi/docs/Monitoring_Hosts_Using_NRPE.pdf

Nagios Perl client for agents:
http://andreasmarschke.wordpress.com/2013/09/24/the-nrpe-protocol-explained/

Ganglia official website:
http://ganglia.info/

Ganglia documents:
http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_documents