UtilizationBasedSchedulingSpec

Launchpad Entry: NovaSpec:utilization-based-scheduling
Created: 10 Dec 2012
Contributors: Hans Lindgren, Fetahi Wuhib, Rerngvit Yanggratoke, Rolf Stadler

Summary

An IaaS provider defines strategies according to which resources of the cloud infrastructure are allocated to the customers’ instances. Such strategies are expressed as “management objectives” [1]. The specific management objective employed by a cloud service provider depends on a number of factors including the type of customers that are served, the kind of applications that are run, the characteristics of the underlying physical infrastructure, and the business strategy the provider pursues. As an open cloud management platform, OpenStack should support a wide range of such objectives.

In OpenStack, the Scheduler component makes the most important allocation decisions. Therefore, implementing a specific management objective requires setting the corresponding policy for this component. For the Scheduler to make good allocation decisions, it often needs up-to-date resource utilization information (see User stories).

With this blueprint we propose the following enhancements to the OpenStack Scheduler:

We extend the state reporting such that up-to-date resource utilization of hosts is made available to the scheduler in addition to the static allocation information that is currently available. This enables the scheduler to achieve management objectives that depend on host resource utilization.
We add filters and cost functions for the FilterScheduler that exploit this utilization information.

Release Note

Up-to-date resource utilization information enables cloud providers to specify advanced management objectives that make more efficient use of resources.

Rationale

OpenStack currently ships with the FilterScheduler and a default compute_fill_first cost function, which calculates host costs based on unallocated host memory. This is reasonable for a default policy. However, it cannot support management objectives that (1) consider CPU resources or (2) are based on actual resource utilization.

User stories

A cloud provider wants to ensure that CPU allocation of its hosts is balanced across the cloud with the goal of alleviating the effects of unforeseen spikes in the resource demand of running applications.
A cloud provider wants to reduce operational costs by consolidating load onto a minimal number of servers, putting unused servers into standby mode and thereby minimizing total energy consumption. The consolidation should be based on actual resource utilization in order to maximize energy savings.
A cloud provider wants to support two classes of service for its users whereby VMs in the first service class are guaranteed to get all the resources they require while VMs in the second class share the remaining resources in a fair manner.

Assumptions

Design

The key requirement of our design is that up-to-date resource utilization of hosts be made available to the scheduler. To this end, we extend the current periodic polling of the compute driver to include resource utilization stats for CPU and memory. (An alternative design could use an external collector to feed the scheduler with such information.)
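As an illustration of what a polling step might compute, the following minimal sketch derives CPU utilization from two successive samples of cumulative CPU time, and memory utilization from used and total memory. The CpuSample type, its field names, and both function names are assumptions made for this example; they are not taken from the compute driver.

```python
from collections import namedtuple

# Hypothetical sample type: cumulative CPU time counters at one polling instant.
# busy_ns:  time spent doing work, summed over all cores
# total_ns: busy plus idle time, summed over all cores
CpuSample = namedtuple('CpuSample', ['busy_ns', 'total_ns'])


def cpu_utilization(prev, curr):
    """Return the fraction of CPU time spent busy between two samples."""
    total_delta = curr.total_ns - prev.total_ns
    if total_delta <= 0:
        # No time elapsed or counters were reset: avoid dividing by zero.
        return 0.0
    busy_delta = curr.busy_ns - prev.busy_ns
    # Clamp to [0.0, 1.0] in case the counters are slightly inconsistent.
    return min(max(busy_delta / float(total_delta), 0.0), 1.0)


def memory_utilization(used_mb, total_mb):
    """Return the fraction of host memory in use."""
    if total_mb <= 0:
        return 0.0
    return used_mb / float(total_mb)
```

Working with deltas of cumulative counters means that each report reflects only the most recent polling interval, which is what keeps the information up to date.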

Resource tracker monitoring and reporting is extended to support the added stats. In addition, we extend its role to cover profiling of the stats, including forecasting based on incoming instance requests.
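One way such profiling could work is sketched below: measured utilization is smoothed with an exponentially weighted moving average, and the forecast adds the expected load of instances that were claimed after the last measurement. This is a sketch under stated assumptions, not the blueprint's implementation; the class name, the smoothing factor, and the per-vCPU load estimate are all hypothetical.

```python
class UtilizationProfile(object):
    """Hypothetical per-host profile: smoothed measurements plus pending claims."""

    def __init__(self, total_vcpus, alpha=0.5):
        self.total_vcpus = total_vcpus
        self.alpha = alpha        # smoothing factor, 0 < alpha <= 1
        self.smoothed = None      # smoothed CPU utilization (fraction of host)
        self.pending_vcpus = 0    # vCPUs claimed since the last measurement

    def record_measurement(self, utilization):
        """Fold a newly measured utilization (0.0 to 1.0) into the average."""
        if self.smoothed is None:
            self.smoothed = utilization
        else:
            self.smoothed = (self.alpha * utilization
                             + (1 - self.alpha) * self.smoothed)
        # Simplifying assumption: earlier claims are now visible in the
        # measurement, so they no longer need to be forecast separately.
        self.pending_vcpus = 0

    def claim(self, vcpus):
        """Register an incoming instance request that was placed on this host."""
        self.pending_vcpus += vcpus

    def forecast(self, per_vcpu_load=1.0):
        """Estimate utilization including instances not yet measured."""
        base = self.smoothed if self.smoothed is not None else 0.0
        extra = per_vcpu_load * self.pending_vcpus / float(self.total_vcpus)
        return min(base + extra, 1.0)
```

Forecasting from claims matters because several requests can arrive between two polling intervals; without it, the scheduler would keep seeing the same stale utilization and could overload a single host.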

The reported stats are stored in the database using the ComputeNodeStats table, eliminating the need for database schema changes. The stats are made available to filtering and weighing functions of the scheduler through modifications to the HostState class in host_manager.py.

Implementation

Code Changes

The code changes listed below implement the following:

Driver reporting of actual resource utilization.
ResourceTracker support for added stats.
FilterScheduler support for using the stats in filters and cost functions.

/nova/virt/libvirt/driver.py

New methods implement collection of actual resource utilization statistics. get_available_resource is modified to include those in the results.

/nova/compute/resource_tracker.py

Periodic reporting and claims handling are extended to include and make use of the added stats.

/nova/compute/stats.py

Updated to support the added stats.

/nova/scheduler/filters/cpu_usage_filter.py

This new filter module implements filtering based on actual CPU utilization of hosts.

/nova/scheduler/filters/ram_usage_filter.py

This new filter module implements filtering based on actual memory utilization of hosts.
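The two filter modules above could follow a pattern like the one sketched here: each filter rejects hosts whose measured utilization exceeds a configured maximum. The HostState stand-in, the attribute names cpu_utilization and ram_utilization, and the max_utilization parameter are assumptions made for this example; only the host_passes(host_state, filter_properties) method shape mirrors the existing scheduler filters.

```python
class HostState(object):
    """Stand-in for the scheduler's HostState with hypothetical fields."""

    def __init__(self, host, cpu_utilization, ram_utilization):
        self.host = host
        self.cpu_utilization = cpu_utilization  # fraction, 0.0 to 1.0
        self.ram_utilization = ram_utilization  # fraction, 0.0 to 1.0


class UsageFilter(object):
    """Pass hosts whose measured utilization is at or below a maximum."""

    attribute = None  # name of the HostState field to check

    def __init__(self, max_utilization):
        self.max_utilization = max_utilization

    def host_passes(self, host_state, filter_properties):
        usage = getattr(host_state, self.attribute, None)
        if usage is None:
            # No measurement reported yet: do not exclude on missing data.
            return True
        return usage <= self.max_utilization


class CpuUsageFilter(UsageFilter):
    attribute = 'cpu_utilization'


class RamUsageFilter(UsageFilter):
    attribute = 'ram_utilization'


def filter_hosts(hosts, filters, filter_properties=None):
    """Return the hosts accepted by every filter."""
    props = filter_properties or {}
    return [h for h in hosts
            if all(f.host_passes(h, props) for f in filters)]
```

For example, with a CPU maximum of 0.8 and a memory maximum of 0.9, a host at 90% CPU or 95% memory would be dropped from the candidate list before any cost function runs.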

/nova/scheduler/host_manager.py

Changes are made to HostState in order to expose new stats for use by other parts of the scheduler, such as filters and cost functions.
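A minimal sketch of how key/value stats might be turned into attributes that filters and cost functions can read is shown below. It assumes that stats arrive as rows with 'key' and 'value' fields holding strings; the stat keys cpu_utilization and ram_utilization and the update_from_stats method are hypothetical names chosen for this example.

```python
def parse_stats(stat_rows):
    """Convert key/value rows into a dict of floats, skipping bad values."""
    stats = {}
    for row in stat_rows:
        try:
            stats[row['key']] = float(row['value'])
        except (KeyError, TypeError, ValueError):
            # Ignore rows that are malformed or not numeric.
            continue
    return stats


class HostState(object):
    """Stand-in for the scheduler's HostState (hypothetical fields)."""

    def __init__(self, host):
        self.host = host
        self.cpu_utilization = None  # None means "not reported yet"
        self.ram_utilization = None

    def update_from_stats(self, stat_rows):
        """Expose the reported utilization stats as attributes."""
        stats = parse_stats(stat_rows)
        self.cpu_utilization = stats.get('cpu_utilization')
        self.ram_utilization = stats.get('ram_utilization')
```

Keeping missing values as None, rather than defaulting to zero, lets downstream code distinguish an idle host from one that has not reported yet.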

/nova/scheduler/least_cost.py

New cost functions are included that support scheduling based on actual resource utilization of hosts.
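The sketch below illustrates how utilization-based cost functions could be combined as a weighted sum, with the lowest-cost host selected. Hosts are represented as plain dictionaries for brevity, and all function names, keys, and weights are assumptions made for this example rather than the contents of least_cost.py.

```python
def cpu_utilization_cost_fn(host_state, weighing_properties):
    """Cost grows with measured CPU utilization (hypothetical field)."""
    return host_state['cpu_utilization']


def ram_utilization_cost_fn(host_state, weighing_properties):
    """Cost grows with measured memory utilization (hypothetical field)."""
    return host_state['ram_utilization']


def weighted_cost(host_state, weighted_fns, weighing_properties=None):
    """Sum weight * cost over all (weight, function) pairs."""
    props = weighing_properties or {}
    return sum(weight * fn(host_state, props) for weight, fn in weighted_fns)


def pick_host(hosts, weighted_fns):
    """Return the host with the lowest weighted cost."""
    return min(hosts, key=lambda h: weighted_cost(h, weighted_fns))


hosts = [
    {'host': 'a', 'cpu_utilization': 0.7, 'ram_utilization': 0.5},
    {'host': 'b', 'cpu_utilization': 0.2, 'ram_utilization': 0.6},
    {'host': 'c', 'cpu_utilization': 0.4, 'ram_utilization': 0.1},
]

# Positive weights favor the least utilized host (load balancing).
balance = [(1.0, cpu_utilization_cost_fn), (1.0, ram_utilization_cost_fn)]
# Negative weights favor the most utilized host (consolidation).
consolidate = [(-1.0, cpu_utilization_cost_fn), (-1.0, ram_utilization_cost_fn)]
```

With these sample hosts, the balancing weights select host c and the consolidating weights select host a. In practice a consolidation objective would be paired with a maximum-utilization filter so that the busiest host is chosen only while it still has headroom.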

Migration

The new filters and cost functions complement the existing ones and their use is configured with existing flags in nova.conf. Additionally, some new flags are added to set individual weights of new cost functions and to set the maximum allowed utilization of host resources.

Test/Demo Plan


Unresolved issues


BoF agenda and discussion


References

[1] Wuhib, F., Stadler, R. & Lindgren, H. (2012). Dynamic Resource Allocation with Management Objectives: Implementation for an OpenStack Cloud. International Conference on Network and Service Management 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-93680