HypervisorMonitoringPlugin

The compute.manager has a large number of periodic tasks that collect data about the running host. In large deployments it would be unwise to emit this data for every instance on the host, as it would quickly saturate the system. Instead, we propose a means by which an in-service plugin can be called to process the collected data locally (and in memory) and report only on exceptional cases.

The typical use case for this would be QoS and alarming. If a customer with a 5Gb pipe has been running at 20Gb, we'd like to catch that early. Likewise, if a customer is running at 100% CPU on a 4-core image, that should be reported. Alternatively, the plugin could be as simple as taking the collected data and emitting it to a reporting tool like statsd/graphite via UDP.
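The simple reporting case could look roughly like the sketch below. It is an illustration only, not a proposed implementation: the function names, the metric name, and the default host and port (8125 is the conventional statsd port) are assumptions. It formats one gauge in the plain-text statsd line format and sends it as a single UDP datagram.

```python
import socket


def format_gauge(name, value):
    """Return one statsd gauge line, e.g. 'compute.cpu:87|g'."""
    return "%s:%s|g" % (name, value)


def send_gauge(name, value, host="127.0.0.1", port=8125):
    """Fire-and-forget a gauge over UDP; returns the number of bytes sent."""
    payload = format_gauge(name, value).encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        return sock.sendto(payload, (host, port))
    finally:
        sock.close()
```

Because UDP is connectionless, a slow or absent collector does not block the periodic task; the cost is that dropped samples go unnoticed.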

The biggest challenge here is finding a way to update the plugin's configuration after the compute node has started. We don't want to restart the compute node every time a high-watermark is moved.

Periodic Tasks in compute.manager

_cleanup_running_deleted_instances - Should emit notification
_run_image_cache_manager_pass - Should emit notification
_run_pending_deletes - Should emit notification
_check_instance_build_time - Should emit notification
_heal_instance_info_cache - maybe?
_poll_rebooting_instances - Might emit notification on anomalies
_poll_rescued_instances - Might emit notification on anomalies
_poll_unconfirmed_resizes - Might emit notification on anomalies
_poll_shelved_instances - probably not
_instance_usage_audit - maybe, but unlikely
_poll_bandwidth_usage - yes
_poll_volume_usage - yes
_sync_power_states - calls virt.get_info(), which has mem/max_mem info; unclear whether this should be reported
_reclaim_queued_deletes - probably not

Possible API

class MonitoringPlugin(object):
    """Base class for in-service monitoring plugins; every hook is a no-op."""

    def on_cpu(self, cpu_dict):
        pass

    def on_volume(self, volume_dict):
        pass

    def on_bandwidth(self, bandwidth_dict):
        pass

    def on_ram(self, ram_info):
        # Could potentially get called from two places; need to ensure
        # units and source are the same.
        pass
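One way a periodic task might hand its data to the loaded plugins is sketched below. Everything beyond the four hook names is an assumption made for illustration: the dispatch helper, the BandwidthRecorder plugin, and the policy of logging and skipping a failing plugin are not part of the proposal.

```python
import logging

LOG = logging.getLogger(__name__)


class MonitoringPlugin(object):
    """No-op base class mirroring the hooks proposed above."""

    def on_cpu(self, cpu_dict):
        pass

    def on_volume(self, volume_dict):
        pass

    def on_bandwidth(self, bandwidth_dict):
        pass

    def on_ram(self, ram_info):
        pass


class BandwidthRecorder(MonitoringPlugin):
    """Hypothetical plugin: keeps the last bandwidth sample in memory."""

    def __init__(self):
        self.last = None

    def on_bandwidth(self, bandwidth_dict):
        self.last = bandwidth_dict


def dispatch(plugins, hook, data):
    """Call `hook` on every plugin; return how many calls succeeded.

    A failing plugin is logged and skipped so it cannot break the
    periodic task that collected the data.
    """
    delivered = 0
    for plugin in plugins:
        try:
            getattr(plugin, hook)(data)
            delivered += 1
        except Exception:
            LOG.exception("plugin %r failed in %s", plugin, hook)
    return delivered
```

With this shape, a task such as _poll_bandwidth_usage would only need one extra call, for example dispatch(plugins, "on_bandwidth", sample), after it has gathered its data.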

What to do with measurements?

Send samples to Ceilometer via UDP
Emit over/under alerts via notifications.
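The over/under alerting option could be sketched as a simple watermark check. This is a minimal illustration under stated assumptions: the function names, the event_type naming scheme, and the payload keys are hypothetical, and actually sending the payload through the notification system is left out.

```python
def classify(value, low, high):
    """Return 'over', 'under', or None when value is within [low, high]."""
    if value > high:
        return "over"
    if value < low:
        return "under"
    return None


def build_alert(metric, value, low, high):
    """Return a notification payload dict, or None if no alert is needed."""
    state = classify(value, low, high)
    if state is None:
        return None
    return {
        # Hypothetical event name; not an existing notification type.
        "event_type": "monitor.%s.%s" % (metric, state),
        "payload": {"metric": metric, "value": value,
                    "low": low, "high": high},
    }
```

For the bandwidth case described earlier, build_alert("bandwidth", 20, 0, 5) would produce an "over" alert, while in-range samples produce nothing and so generate no traffic.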

Configuration / Restarts

How to configure plugins?
How to update configurations without restarting compute node?
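One possible answer to the restart question is for the plugin to re-read its settings whenever the configuration file changes on disk. The sketch below assumes a JSON file and a hypothetical ReloadingConfig class; neither is an existing nova mechanism, and other approaches (a signal handler, an RPC call) would work as well.

```python
import json
import os


class ReloadingConfig(object):
    """Re-reads a JSON file whenever its modification time changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self.values = {}

    def refresh(self):
        """Reload if the file changed; return True when values were reloaded."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._mtime:
            return False
        with open(self.path) as f:
            self.values = json.load(f)
        self._mtime = mtime
        return True
```

Calling refresh() at the start of each periodic task run costs one stat() call and picks up a moved high-watermark on the next pass, without touching the compute service itself.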
