UtilizationAwareScheduling

UTILIZATION AWARE SCHEDULING

Overview

There are situations where it is desirable to schedule VMs based on transient resource usage, beyond the current reliance on specific metrics such as memory usage and CPU utilization. Advanced scheduling decisions can be made based on enhanced usage statistics encompassing things like memory cache utilization, memory bandwidth utilization, network bandwidth utilization, or other, currently undefined metrics that might be available in future platforms. This blueprint provides an extensible framework that can be used to take advantage of current and future platform utilization metrics.

Architectural Introduction of the Current Implementation

We made the following changes to implement the framework for utilization aware scheduling:

DB

Added a new TEXT column named 'metrics' to the compute_nodes table. This 'metrics' column contains the JSON-encoded list of all the metrics collected by the nova compute node, for example:

      metrics: [ 
                     {"name": "metric1", "value": 1, "source": "source1(who/where the metric is collected)", "timestamp": "2013-07-30T10:22:31.638634"},
                     {"name": "metric2", "value": 2, "source": "source2", "timestamp": "2013-07-30T10:22:32.432343"}
               ]

Nova Compute node

Added a pluggable data collector framework to report the metrics. Each data collector plugin should be implemented as a subclass of nova.compute.plugins.ResourceMonitorPluginBase and implement the abstract method of the base class. Each plugin returns the metrics data it collected as a list of dictionaries in the following format:

   [    
        {'name': metric name,
         'value': metric value,
         'timestamp': the time when the value is retrieved,
         'source': where the value was obtained
         },
         ......
   ]

The nova compute node will call the data collectors specified in the configuration file and save the returned list of metrics data to the DB.
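The collector flow described above can be sketched as follows. This is only an illustration under stated assumptions: the base class below is a local stand-in for nova.compute.plugins.ResourceMonitorPluginBase, and the method name `get_metrics`, the `CallableMonitor` class and the `collect_metrics` helper are hypothetical names chosen for the example, not taken from nova.

```python
import abc
import datetime


class ResourceMonitorPluginBase(abc.ABC):
    """Local stand-in for the real base class (assumption, not nova code)."""

    @abc.abstractmethod
    def get_metrics(self):
        """Return a list of metric dictionaries (method name is assumed)."""


class CallableMonitor(ResourceMonitorPluginBase):
    """Hypothetical plugin that reports one metric read from a callable."""

    def __init__(self, name, read_value, source):
        self.name = name
        self.read_value = read_value
        self.source = source

    def get_metrics(self):
        # Naive UTC timestamp in ISO 8601 form, like the examples in the text.
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
        return [{
            'name': self.name,
            'value': self.read_value(),
            'timestamp': now.isoformat(),
            'source': self.source,
        }]


def collect_metrics(plugins):
    """Call every configured plugin and concatenate the returned lists."""
    metrics = []
    for plugin in plugins:
        metrics.extend(plugin.get_metrics())
    return metrics
```

A compute node could then call something like `collect_metrics([CallableMonitor('cpu.percent', lambda: 12.5, 'example')])` and JSON-encode the result before saving it.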

Nova Scheduler

The nova scheduler will get the metrics data from the DB and populate it into the HostState object for each compute node.
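As a rough sketch of this step, the JSON text from the 'metrics' column can be decoded into a mapping keyed by metric name. The `HostState` class below is a simplified, hypothetical stand-in for illustration only, and `metrics_from_column` and `update_from_compute_node` are assumed names, not nova's actual code.

```python
import json


def metrics_from_column(metrics_text):
    """Decode the JSON 'metrics' column into a {name: metric dict} mapping."""
    if not metrics_text:
        # An empty or missing column yields no metrics.
        return {}
    return {metric['name']: metric for metric in json.loads(metrics_text)}


class HostState:
    """Simplified stand-in for the scheduler's per-node state (assumption)."""

    def __init__(self, host):
        self.host = host
        self.metrics = {}

    def update_from_compute_node(self, compute_node):
        # compute_node is assumed to be a dict-like row of compute_nodes.
        self.metrics = metrics_from_column(compute_node.get('metrics'))
```

Keying the metrics by name makes later lookups by a weigher a plain dictionary access.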

A new scheduler weight plugin, MetricsWeigher, can be used to weigh hosts on the metrics data for scheduling purposes. The administrator configures how the metrics are weighted in the configuration file, in the form "<name1>=<ratio1>, <name2>=<ratio2>, ...", where <nameX> is the name of the metric to be weighed and <ratioX> is the corresponding ratio. The final weight value for the configuration "name1=1.0,name2=-1.0" would therefore be "<metric name1>.value * 1.0 + <metric name2>.value * (-1.0)".
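The weight computation can be illustrated with a minimal sketch. The function names `parse_weight_setting` and `weigh_host` are hypothetical, and the metrics are assumed to be available as a mapping from metric name to a dictionary with a 'value' key. How a missing metric should be handled is not specified here, so this sketch simply lets the lookup raise KeyError.

```python
def parse_weight_setting(setting):
    """Parse "<name1>=<ratio1>, <name2>=<ratio2>, ..." into {name: ratio}."""
    ratios = {}
    for item in setting.split(','):
        item = item.strip()
        if not item:
            continue
        name, _, ratio = item.partition('=')
        ratios[name.strip()] = float(ratio)
    return ratios


def weigh_host(metrics, ratios):
    """Return the sum of metric value * ratio over the configured metrics."""
    return sum(metrics[name]['value'] * ratio
               for name, ratio in ratios.items())
```

With the configuration "name1=1.0,name2=-1.0", a host whose name1 value is 5 and name2 value is 2 gets a weight of 5 * 1.0 + 2 * (-1.0) = 3.0.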

Original Design

The basic design is to provide a pluggable data collector that is part of the compute node. This collector will create a new dictionary, called `resources', that will contain key/value pairs representing resources used by the platform. This `resources' list will be added to the dictionary that is sent periodically from the compute node to the scheduler. For example, all nodes currently report a single key/value pair for `memory_free' showing the amount of free memory on the node, e.g. { "memory_free" : "192" }. This metric will now become one of the elements of the `resources' list, e.g.:

	"resources" : [ "memory_free" : "192", ...]

This will make it simple to extend the `resources' list to hold other metrics that are available either now, like network bandwidth utilization, or in the future, like memory cache utilization. For compatibility purposes, current metrics like `memory_free' will be left in their current position in the dictionary, but they will be duplicated along with all new metrics in the `resources' list. Further, to support rolling upgrades of compute nodes in a large cloud deployment, default values (for example, NONE) will be assumed for resource items that do not exist.

By putting all of these metrics into a single `resources' list, it becomes easy to export this information to the scheduler in one well-defined place. The `resources' list will be part of the dictionary that is periodically sent from the compute node to the scheduler. The scheduler keeps this dictionary in its memory, available to all filter and weight plugins.

To utilize this data, a new weight plugin will be created that references key/value metrics from the `resources' list in the scheduler's dictionary for each node. The new weighting function will be parameterized by flavors with extra specs key/value pairs using a `resources' scope. The `key' is the metric to weigh on, while the `value' can be `max' to select the node that has the maximum value for that metric, `min' to select the node with the minimum value for that metric, or a comparison (<, > or =) against some specified value. For example, given the `resources' list:

	"resources" : [
		"memory_free" : 192,
		"network_BW" : "85M"
	]

an extra spec of "resources:memory_free=max" would select the compute node with the most free memory, while an extra spec of "resources:network_BW=min" would select the compute node with the lowest network utilization.
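The `max' and `min' selection can be sketched as below. This is an illustration only: `select_host` is a hypothetical helper, the per-node `resources' data is assumed to be a mapping of node name to a dictionary of already numeric values, and the comparison forms (<, > or =) are left out.

```python
def select_host(hosts, extra_spec):
    """Pick a node name for an extra spec such as "resources:memory_free=max".

    hosts maps node name -> {metric name: numeric value} (assumed layout).
    """
    key, _, mode = extra_spec.partition('=')
    scope, _, metric = key.partition(':')
    if scope != 'resources':
        raise ValueError('unsupported scope: %r' % scope)
    if mode not in ('max', 'min'):
        raise ValueError('unsupported value: %r' % mode)
    chooser = max if mode == 'max' else min
    # Compare the nodes by the requested metric and return the winner's name.
    return chooser(hosts, key=lambda name: hosts[name][metric])
```

For two nodes reporting 192 and 512 for memory_free, "resources:memory_free=max" picks the node reporting 512.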

Note that values from the key/value pairs can be specified with suffixes to indicate multipliers. A suffix of `K' or `k' multiplies the value by 1,000, a suffix of `M' or `m' multiplies the value by 1,000,000, a suffix of `G' or `g' multiplies the value by 1,000,000,000 and a suffix of `T' or `t' multiplies the value by 1,000,000,000,000.
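Suffix handling of this kind could look like the following sketch. It assumes the conventional decimal meanings of the suffixes (K = 10^3, M = 10^6, G = 10^9, T = 10^12), which also covers the "85M" value used in the example `resources' list. The name `parse_value` and the `MULTIPLIERS` table are hypothetical.

```python
# Assumed decimal multipliers, matched case-insensitively.
MULTIPLIERS = {'k': 10**3, 'm': 10**6, 'g': 10**9, 't': 10**12}


def parse_value(text):
    """Convert a value such as "85M" or 192 into a plain float."""
    text = str(text).strip()
    if text and text[-1].lower() in MULTIPLIERS:
        # Strip the suffix and scale the numeric part.
        return float(text[:-1]) * MULTIPLIERS[text[-1].lower()]
    return float(text)
```

Normalizing values this way lets a weigher compare a metric reported as "85M" with one reported as a bare number.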
