HypervisorMonitoringPlugin

Related to [[1]]

The compute.manager has a large number of periodic tasks that collect data about the running host. In large deployments it would be unwise to emit this data for every instance on the host, as it would quickly saturate the system. Instead, we propose a means by which an in-service plugin can be called to process the collected data locally (and in memory) and report only on exceptional cases.

The typical use case for this would be QoS and alarming. If a customer with a 5Gb pipe has been running at 20Gb, we'd like to catch that early. Likewise, if a customer is running at 100% CPU on a 4-core image, that should be reported. Alternatively, the plugin could be as simple as taking the collected data and emitting it to a reporting tool like statsd/graphite via UDP.

The biggest challenge here is finding a way to update the plugin's configuration after the compute node has started. We don't want to restart the compute node every time a high-watermark is moved.

Periodic Tasks in compute.manager

_cleanup_running_deleted_instances - Should emit notification
_run_image_cache_manager_pass - Should emit notification
_run_pending_deletes - Should emit notification
_check_instance_build_time - Should emit notification
_heal_instance_info_cache - maybe?
_poll_rebooting_instances - Might emit notification on anomalies
_poll_rescued_instances - Might emit notification on anomalies
_poll_unconfirmed_resizes - Might emit notification on anomalies
_poll_shelved_instances - meh
_instance_usage_audit - maybe, but unlikely
_poll_bandwidth_usage - yes
_poll_volume_usage - yes
_sync_power_states - calls virt.get_info() which has mem/max_mem info ... dunno?
_reclaim_queued_deletes - meh

Plugin Interface

You don't need to override all of these methods. They'll only get called if they exist in the plugin.

class MetricPlugin(object):
    """Abstract base class for Metric Plugins.

    See bp: host-metric-hook for more information.
    """

    def on_cpu(self, instance, cpu_dict):
        """Called after hypervisor cpu stat polls.
        """
        pass

    def on_volume(self, instance, volume_dict):
        """Called after volume stat polls.
        """
        pass

    def on_bandwidth(self, instance, bandwidth_dict):
        """Called after network bandwidth stat polls.
        """
        pass

    def on_ram(self, instance, ram_info):
        """Called after hypervisor RAM polls.
        """
        pass
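
The "only called if they exist" rule could be implemented with a simple attribute lookup on the caller's side. The following is a minimal sketch, not the actual compute.manager code: the `dispatch` helper and `CpuOnlyPlugin` are hypothetical names, and it assumes plugins are plain objects that define only the hooks they care about (a plugin that subclasses MetricPlugin would inherit all four no-op methods and so would always be called).

```python
class CpuOnlyPlugin(object):
    """Hypothetical plugin that implements only the on_cpu hook."""

    def __init__(self):
        self.seen = []

    def on_cpu(self, instance, cpu_dict):
        # Keep the sample in memory; a real plugin would inspect it here.
        self.seen.append((instance, cpu_dict))


def dispatch(plugins, hook_name, instance, data):
    """Call hook_name on every plugin that defines it.

    Returns the number of plugins that were actually called.
    """
    called = 0
    for plugin in plugins:
        hook = getattr(plugin, hook_name, None)
        if callable(hook):
            hook(instance, data)
            called += 1
    return called


plugin = CpuOnlyPlugin()
# The dict keys below are illustrative; the real stat dicts may differ.
cpu_calls = dispatch([plugin], "on_cpu", "instance-1", {"cpu_percent": 97.0})
ram_calls = dispatch([plugin], "on_ram", "instance-1", {"mem": 512})
```

Here the RAM poll is silently skipped for this plugin because it has no on_ram method.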

What to do with measurements?

Send samples to Ceilometer via UDP
Emit over/under alerts via notifications.
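
Both options can live in one plugin. The sketch below is illustrative only: `BandwidthPlugin`, the `gbps` key, the metric name, and the `notify` callback are all assumptions, and it sends a statsd-style gauge line (`name:value|g`) rather than Ceilometer's own UDP format, which is not described here.

```python
import socket


class BandwidthPlugin(object):
    """Hypothetical plugin: forwards every sample over UDP and raises an
    alert only when usage exceeds the configured limit."""

    def __init__(self, limit_gbps, host="127.0.0.1", port=8125, notify=None):
        self.limit_gbps = limit_gbps
        self.addr = (host, port)
        # notify stands in for whatever notification mechanism is used.
        self.notify = notify or (lambda event, payload: None)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def on_bandwidth(self, instance, bandwidth_dict):
        rate = bandwidth_dict["gbps"]  # assumed key, for illustration
        # statsd gauge line; UDP is fire-and-forget, so a missing
        # collector does not block or fail the periodic task.
        line = "compute.%s.bandwidth_gbps:%s|g" % (instance, rate)
        self.sock.sendto(line.encode("utf-8"), self.addr)
        if rate > self.limit_gbps:
            self.notify("bandwidth.over_limit",
                        {"instance": instance,
                         "gbps": rate,
                         "limit_gbps": self.limit_gbps})


alerts = []
plugin = BandwidthPlugin(limit_gbps=5,
                         notify=lambda event, payload: alerts.append((event, payload)))
plugin.on_bandwidth("instance-1", {"gbps": 20})  # over the 5Gb limit
plugin.on_bandwidth("instance-1", {"gbps": 3})   # within the limit
```

Only the first call produces an alert; both calls send a sample.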

Configuration / Restarts

How to configure plugins?
How to update configurations without restarting compute node?
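
One possible answer, offered only as a sketch: the plugin could keep its thresholds in a small file and re-read it whenever the file changes. `ReloadingConfig`, the JSON format, and the `cpu_high_watermark` key are all hypothetical; nothing here reflects an agreed design.

```python
import json
import os
import tempfile


class ReloadingConfig(object):
    """Hypothetical config holder that re-reads its file when it changes."""

    def __init__(self, path):
        self.path = path
        self._signature = None
        self._values = {}

    def get(self, key, default=None):
        st = os.stat(self.path)
        # Size is included because mtime resolution can be coarse on
        # some filesystems.
        signature = (st.st_mtime_ns, st.st_size)
        if signature != self._signature:
            with open(self.path) as f:
                self._values = json.load(f)
            self._signature = signature
        return self._values.get(key, default)


path = os.path.join(tempfile.mkdtemp(), "plugin.json")
with open(path, "w") as f:
    json.dump({"cpu_high_watermark": 90}, f)

config = ReloadingConfig(path)
before = config.get("cpu_high_watermark")

# An operator moves the watermark; the compute node keeps running.
with open(path, "w") as f:
    json.dump({"cpu_high_watermark": 100}, f)
after = config.get("cpu_high_watermark")
```

The cost is one stat call per lookup, which should be small next to the polls themselves, but a plugin could also check only once per periodic-task run.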
