UTILIZATION AWARE SCHEDULING

Overview

There are situations where it is desirable to schedule VMs based on transient resource usage, beyond the current reliance on specific metrics such as memory usage and CPU utilization. More advanced scheduling decisions can be made using enhanced usage statistics encompassing memory cache utilization, memory bandwidth utilization, network bandwidth utilization, or other metrics, currently undefined, that may become available on future platforms. This blueprint will provide an extensible framework that can take advantage of current and future platform utilization metrics.

Design

The basic design is to provide a pluggable data collector that is part of the compute node. This collector will create a new dictionary, called `resources', containing key/value pairs that represent resources used by the platform. This `resources' dictionary will be added to the dictionary that is sent periodically from the compute node to the scheduler. For example, currently all nodes report a single key/value pair for `memory_free' showing the amount of free memory on the node, e.g. { "memory_free" : "192" }. This metric will now become one of the elements of the `resources' dictionary, e.g.:

	"resources" : [ "memory_free" : "192", ...]

This will make it simple to extend the `resources' dictionary to hold other metrics that are available either now, such as network bandwidth utilization, or in the future, such as memory cache utilization. For compatibility, current metrics like `memory_free' will be left in their current position in the top-level dictionary, but they will also be duplicated, along with all new metrics, in the `resources' dictionary. Further, to support rolling upgrades of compute nodes in a large cloud deployment, a default value (for example, None) will be assumed for any resource item that is not present.
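
As a concrete illustration of this compatibility behaviour, the periodic report from a compute node might look like the following sketch. The layout is an assumption for illustration only; apart from `memory_free' and `network_BW' the field names are hypothetical:

	# Hypothetical periodic report from a compute node. The legacy
	# 'memory_free' key stays at the top level for older schedulers and
	# is duplicated inside the new 'resources' dictionary.
	report = {
	    "memory_free": 192,
	    "resources": {
	        "memory_free": 192,      # duplicated legacy metric
	        "network_BW": "85M",     # newly collected metric
	    },
	}
	
	# During a rolling upgrade some nodes may not report a metric yet, so
	# consumers fall back to a default value (None) instead of failing.
	cache_util = report.get("resources", {}).get("cache_utilization", None)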

By putting all of these metrics into a single `resources' dictionary it becomes easy to export this information to the scheduler in one well-defined place. The `resources' dictionary will be part of the dictionary that is periodically sent from the compute node to the scheduler. The scheduler keeps this dictionary in memory, available to all filter and weight plugins.
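
A rough sketch of what the pluggable collector interface on the compute node might look like is given below. The class and function names are assumptions made for illustration and do not describe a defined API:

	class ResourceMetricCollector(object):
	    """Hypothetical base class for pluggable metric collectors."""
	
	    def collect(self):
	        """Return a dict of metric name -> value for this platform."""
	        raise NotImplementedError()
	
	
	class NetworkBandwidthCollector(ResourceMetricCollector):
	    def collect(self):
	        # A real plugin would measure the platform here.
	        return {"network_BW": "85M"}
	
	
	def build_resources(collectors):
	    """Merge the output of every registered collector into one dict."""
	    resources = {}
	    for collector in collectors:
	        resources.update(collector.collect())
	    return resources
	
	# e.g. resources = build_resources([NetworkBandwidthCollector()])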

To utilize this data, a new weight plugin will be created that references key/value metrics from the `resources' dictionary that the scheduler holds for each node. The new weighting function will be parameterized through flavor extra specs key/value pairs using a `resources' scope. The `key' is the metric to weigh on; the `value' can be `max' to select the node with the maximum value for that metric, `min' to select the node with the minimum value for that metric, or a comparison (`<', `>' or `=') against a specified value. For example, given the `resources' dictionary:

	"resources" : [
		"memory_free" : 192,
		"network_BW" : "85M"
	]

an extra spec of "resources:memory_free=max" would select the compute node with the most free memory, while an extra spec of "resources:network_BW=min" would select the compute node with the lowest network bandwidth utilization.
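
The weighting logic itself could look roughly like the sketch below. This is not the actual scheduler weigher interface; it assumes the metric values have already been converted to plain numbers (suffix handling is covered next) and that the flavor's extra specs are available as a simple dict:

	def weigh_host(host_resources, extra_specs):
	    """Score one host against the 'resources:' scoped extra specs.
	
	    host_resources: the host's 'resources' dict held by the scheduler,
	                    with metric values already converted to numbers.
	    extra_specs:    the flavor extra specs, e.g.
	                    {"resources:memory_free": "max"}.
	    """
	    weight = 0.0
	    for spec, wanted in extra_specs.items():
	        if not spec.startswith("resources:"):
	            continue                       # not in the 'resources' scope
	        value = host_resources.get(spec.split(":", 1)[1])
	        if value is None:
	            continue                       # metric not reported by this node
	        if wanted == "max":
	            weight += value                # favour the largest value
	        elif wanted == "min":
	            weight -= value                # favour the smallest value
	        elif wanted[0] in "<>=":
	            # e.g. "<100"; a suffixed threshold such as "<40M" would
	            # first be normalized to a plain number.
	            threshold = float(wanted[1:])
	            matched = ((wanted[0] == "<" and value < threshold) or
	                       (wanted[0] == ">" and value > threshold) or
	                       (wanted[0] == "=" and value == threshold))
	            if matched:
	                weight += 1.0              # reward hosts meeting the condition
	    return weight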

Note that values in the key/value pairs can be specified with suffixes to indicate multipliers. A suffix of `K' or `k' multiplies the value by 1,000, a suffix of `M' or `m' multiplies the value by 1,000,000, a suffix of `G' or `g' multiplies the value by 1,000,000,000 and a suffix of `T' or `t' multiplies the value by 1,000,000,000,000.
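
A small helper along the lines of the sketch below (illustrative only) could normalize such suffixed values before they are compared or weighed:

	def parse_value(raw):
	    """Convert a metric value such as '85M' or '192' to a plain number."""
	    multipliers = {"k": 1e3, "m": 1e6, "g": 1e9, "t": 1e12}
	    if isinstance(raw, str) and raw and raw[-1].lower() in multipliers:
	        return float(raw[:-1]) * multipliers[raw[-1].lower()]
	    return float(raw)
	
	# parse_value("85M") -> 85000000.0; parse_value("192") -> 192.0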