Nova/ResourcePartitioningSpec

Launchpad Entry: NovaSpec:resource-partitioning
Created:
Contributors:

Summary

Partition system resources such as network bandwidth, CPU power, and possibly even storage I/O bandwidth so that each user gets a reserved set of resources and all users can use their full reservations at the same time without affecting one another.

- Limit the resources used by virtual guests.
- Make sure that resources promised to users are always available to them.
- When this works well, benchmarks run at any time on instances with the same resource limits should give similar results.
- Use of this feature is optional.

Release Note

Not only is it possible to limit the resources used by virtual guests, but we can also (optionally) make sure that resources promised to users are always available to them. Public cloud users can now know in advance whether a virtual guest will be powerful enough for their needs, and exactly what they are paying for.

Definition of Terms

Resource: a limited resource that virtual guests in the cluster (and cloud) need, for example:
- RAM,
- disk space,
- disk bandwidth for reading/writing,
- Internet bandwidth (in/out),
- intranet bandwidth (in/out).
Reserved resources: resources that are always available to the instance on the hardware node, in the amounts specified in the SLA.
Shared resources: resources that any user can use as much as they want.
Limits: the maximum allowed usage of a resource. A user cannot use more of that resource than the limit.
Resource partitioning: users reserve a set of resources on a system, and these reserved resources, regardless of current usage, cannot be used by others.
Strict resource partitioning: users are restricted to their reserved set of resources.
Loose resource partitioning: users are not restricted to their reserved set of resources and may also use free (unreserved) resources on the system.
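
The difference between strict and loose partitioning can be illustrated with a small sketch. This is not part of Nova; the function name `usage_cap` and all of its parameters are hypothetical and only restate the definitions above in code. It assumes a single resource measured in arbitrary units.

```python
def usage_cap(reserved: float, capacity: float, total_reserved: float,
              unreserved_in_use: float, strict: bool) -> float:
    """Return the most one instance may use of a resource right now.

    reserved          -- amount reserved for this instance (hypothetical units)
    capacity          -- total amount of the resource on the host
    total_reserved    -- sum of all reservations on the host, including this one
    unreserved_in_use -- unreserved amount currently consumed by any instance
    strict            -- True for strict partitioning, False for loose
    """
    if strict:
        # Strict: the instance is confined to its own reservation.
        return reserved
    # Loose: the reservation plus whatever unreserved capacity is still free.
    # Other users' reservations are never counted as free, even when idle.
    free_unreserved = max(0.0, capacity - total_reserved - unreserved_in_use)
    return reserved + free_unreserved
```

For example, on a host with 8 units of which 6 are reserved and 1 unreserved unit is already in use, an instance with a reservation of 2 may use 2 units under strict partitioning and 3 under loose partitioning.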

Rationale

Users of a public cloud want to be able to use what they paid for, not less. Some of them do not want to share critical resources with other users; they want their contracted resources to be available without waiting. Cloud providers want to give users what they paid for, not more.

User stories

Alice has a virtual server. Its Internet connection is shared with 100 other users, and she is not happy with that. She enters into a new SLA that reserves 50 Mbps for upload and download to and from the Internet. Internet traffic from other virtual servers will no longer slow down her connection.

M. Hatter is using virtual servers for mathematical simulations. The host his virtual machine is sitting on has 100 users; therefore, he has quite limited access to CPU. He is willing to pay more to have some CPU power reserved only for him.

Assumptions

Live migration needs to be working in order to redistribute load across hosts.
We need to be able to limit the usage of cluster resources. How this is done differs for every node operating system and even for every hypervisor.

Implementation

Having live migration would be very nice, but partitioning can also be done without it and still makes sense.
We will need to update (or subclass) the scheduler to take into account not only the actual load but also the reserved amounts of resources. This should be relatively easy to do.
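
A minimal sketch of such a scheduling decision is shown here. It does not use Nova's actual scheduler classes; `HostState`, `fits`, and `pick_host` are hypothetical names, and the resource keys are made up for illustration. The idea is to first drop hosts whose unreserved capacity cannot hold the new reservation, and only then compare actual load.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HostState:
    """Hypothetical view of a host as the scheduler might see it."""
    name: str
    capacity: Dict[str, float]   # total per resource, e.g. {"cpu": 8.0}
    reserved: Dict[str, float]   # sum of reservations already placed on the host
    load: float                  # current load, 0.0 (idle) to 1.0 (saturated)


def fits(host: HostState, request: Dict[str, float]) -> bool:
    """True if every requested reservation fits in the unreserved capacity."""
    for resource, amount in request.items():
        free = host.capacity.get(resource, 0.0) - host.reserved.get(resource, 0.0)
        if amount > free:
            return False
    return True


def pick_host(hosts: List[HostState],
              request: Dict[str, float]) -> Optional[HostState]:
    """Among hosts that can honor the reservation, pick the least loaded one."""
    candidates = [h for h in hosts if fits(h, request)]
    if not candidates:
        return None  # no host can guarantee the requested reservation
    return min(candidates, key=lambda h: h.load)
```

Note that a lightly loaded host can still be rejected: if most of its capacity is already reserved, the idle capacity belongs to other users and cannot be promised again.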
On Linux nodes with KVM/QEMU/UML, the most natural tool for enforcing limits is cgroups.
Plan:
- First we need to implement CPU partitioning, which should not be that hard, at least for KVM/QEMU/UML. Implementing this alone might be enough for Bexar.
- Once that works, we might try partitioning bandwidth (network, disk, maybe even memory bandwidth) for each instance.
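
For the CPU step, the following sketch shows how a reservation might be translated into settings for the cgroup (v1) CPU controller. It is an assumption-laden illustration, not a proposed implementation: the function name, the `SHARES_SCALE` constant, and the floor values are hypothetical choices. It only computes the values and does not write to the cgroup filesystem. `cpu.shares` is a relative weight that matters only under contention, so it fits loose partitioning; the `cpu.cfs_quota_us` and `cpu.cfs_period_us` files impose a hard cap and so fit strict partitioning, but they are available only on kernels that provide CFS bandwidth control.

```python
CFS_PERIOD_US = 100_000   # assumed accounting period of 100 ms
SHARES_SCALE = 10_000     # assumed scale: the whole host maps to 10000 shares


def cpu_cgroup_settings(fraction: float, host_cpus: int, strict: bool) -> dict:
    """Map a reserved fraction of the host's CPU to cgroup v1 cpu settings.

    fraction  -- reserved part of the whole host, in the range (0, 1]
    host_cpus -- number of CPUs on the host
    strict    -- also emit a hard cap (strict partitioning) when True
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if host_cpus < 1:
        raise ValueError("host_cpus must be at least 1")

    # Relative weight: under contention a group receives CPU time in
    # proportion to its shares. A floor of 2 is assumed for tiny fractions.
    settings = {"cpu.shares": max(2, round(fraction * SHARES_SCALE))}

    if strict:
        # Hard cap: at most quota microseconds of CPU time per period,
        # summed over all CPUs. A floor of 1000 us is assumed here.
        quota = round(fraction * host_cpus * CFS_PERIOD_US)
        settings["cpu.cfs_period_us"] = CFS_PERIOD_US
        settings["cpu.cfs_quota_us"] = max(1000, quota)
    return settings
```

For the weights to act as guarantees, the shares of all reserved groups plus one group holding the non-partitioned instances would have to add up to no more than `SHARES_SCALE`.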
Hosting partitioned and normal instances in one system is easy, provided that the scheduler knows which nodes (hosts) use partitioning for their instances.

Test/Demo Plan

When this works well, running benchmarks on the same class of virtual guests should give similar results at any time. Other virtual guests should not be able to change the availability of a virtual guest's contracted resources.
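
One way to make "similar results" checkable is to bound the relative spread of repeated benchmark runs. The sketch below is only an illustration: the function name `is_consistent` and the 5% default tolerance are assumptions, since this spec does not define an acceptance threshold.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_consistent(scores: Sequence[float], tolerance: float = 0.05) -> bool:
    """True if the relative spread (stdev / mean) is within the tolerance.

    scores    -- results of the same benchmark run at different times
    tolerance -- assumed acceptable spread, 0.05 meaning 5% of the mean
    """
    if len(scores) < 2:
        raise ValueError("need at least two benchmark runs")
    avg = mean(scores)
    if avg <= 0:
        raise ValueError("benchmark scores must be positive")
    return pstdev(scores) / avg <= tolerance
```

A demo could run the same benchmark on a partitioned guest while neighbouring guests are idle and again while they are busy, then pass all scores to this check.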

Unresolved issues

How to limit disk I/O in a standard way?

Disk I/O throttling is now coming into the Linux kernel, so it should soon be configurable using cgroups. This might be easier to solve later, after the Bexar release.

How to measure the 'whole resource' we are about to partition among users?

We need to agree on the benchmarking rules and units to be used. It will probably be best to choose the simplest, highly focused benchmarks, which will be the easiest to implement.

How to measure part of the resource for an instance?

Using a percentage of the whole host would be easiest, but that percentage would change when the machine migrates. So we need a way to calculate the percentage of the resource from some expected benchmark result. This also means that we need to benchmark the servers somehow, so that the scheduler knows them well and an instance can deliver truly predictable results.
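
As a sketch of that calculation, suppose a reservation is expressed in benchmark units and every host has a measured score in the same units. The function name `host_fraction` and the unit itself are hypothetical; which benchmark to use is still an open question.

```python
def host_fraction(reserved_units: float, host_benchmark_units: float) -> float:
    """Convert a reservation in benchmark units to a fraction of one host.

    reserved_units       -- amount promised to the instance (hypothetical units)
    host_benchmark_units -- score of the whole host on the same benchmark
    """
    if reserved_units < 0:
        raise ValueError("reservation must not be negative")
    if host_benchmark_units <= 0:
        raise ValueError("host benchmark score must be positive")
    fraction = reserved_units / host_benchmark_units
    if fraction > 1.0:
        raise ValueError("host is too small for this reservation")
    return fraction
```

With a reservation of 500 units, the instance needs 12.5% of a host that scores 4000 but 25% of a host that scores 2000, so the percentage is recomputed on migration while the promised amount stays the same.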

BoF agenda and discussion

We are using an Etherpad page for discussion.

Please get in touch with us on IRC (alekibango, tr3buchet) to talk about the implementation, or if you are willing to help in some way.