

Revision as of 17:15, 23 April 2014 by Daniel Berrange

Virtualization Driver Guest CPU Topology

NB: The information on this page is being superseded by a new page that covers vCPU topology, NUMA, large pages and overcommit/dedicated resources as a single concept.

Background information

Each virtualization driver in OpenStack has its own approach to defining the CPU topology seen by guest virtual machines. The libvirt driver will expose all vCPUs as individual sockets, with 1 core and no hyper-threads.

UNIX operating systems will happily use any CPU topology that is exposed to them, within an upper bound on the total number of logical CPUs. That said, the choice of topology can affect performance. For example, 2 hyper-threads are usually not equivalent in performance to 2 cores or 2 sockets, so operating system schedulers have special logic for task placement: if a host has a CPU with 2 cores of 2 threads each and two tasks to run, the scheduler will try to place them on different cores rather than on two threads within the same core. It follows that if a guest is shown 4 sockets when its vCPUs are really backed by 2 cores of 2 threads each, its operating system cannot make optimal placement decisions to avoid tasks competing for shared thread resources.

Windows operating systems, meanwhile, are more restrictive in the CPU topologies they are willing to use. In particular, some versions restrict the number of sockets they will use. So if an OS is limited to 4 sockets and an 8-vCPU guest is desired, the hypervisor must expose a topology with at least 2 cores per socket. Otherwise, the guest will refuse to use some of the vCPUs assigned to it.
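The arithmetic behind the socket-limited case can be sketched as a search for the smallest cores-per-socket value that multiplies out to exactly the requested vCPU count without exceeding the socket limit. This is a minimal illustration; the function name is hypothetical and it assumes one thread per core.

```python
def min_cores_per_socket(vcpus, max_sockets):
    """Smallest cores-per-socket value giving exactly `vcpus` within `max_sockets`.

    Hypothetical helper; assumes one thread per core, so
    sockets * cores must equal vcpus exactly.
    """
    if vcpus < 1 or max_sockets < 1:
        raise ValueError("vcpus and max_sockets must be positive")
    # cores == vcpus (a single socket) always passes the check,
    # so the loop is guaranteed to return.
    for cores in range(1, vcpus + 1):
        # cores must divide vcpus evenly, and the resulting socket
        # count must stay within the guest OS limit.
        if vcpus % cores == 0 and vcpus // cores <= max_sockets:
            return cores


print(min_cores_per_socket(8, 4))  # 8 vCPUs, 4-socket limit -> 2
```

Note that a simple ceiling division is only a lower bound: with 7 vCPUs and a 4-socket limit the only exact layout is 1 socket with 7 cores.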

High level requirements / design

Since the restrictions on CPU topology vary according to the guest operating system being run, placing constraints at either the host level or the flavour level is not flexible enough. Instead, the restrictions need to be held either in a lookaside database, such as the one provided by the libosinfo project, or, more immediately, expressed as metadata attributes on images. Flavours record the total number of vCPUs to expose to the guest machine, and it is desirable that a single image be capable of booting on arbitrary flavours. It follows that images should not express a literal number of sockets/cores/threads, but rather the upper limits for each of these.

Thus a Microsoft Windows Server 2008 R2 Standard Edition image may be tagged with max_sockets=4. If booted with a flavour saying vcpus=4, any of the following topologies would be valid:

  • sockets=4, cores=1, threads=1
  • sockets=2, cores=2, threads=1
  • sockets=2, cores=1, threads=2
  • sockets=1, cores=2, threads=2
  • sockets=1, cores=4, threads=1
  • sockets=1, cores=1, threads=4

If, however, the flavour said vcpus=8, the valid topologies would be:

  • sockets=4, cores=2, threads=1
  • sockets=4, cores=1, threads=2
  • sockets=2, cores=4, threads=1
  • sockets=2, cores=2, threads=2
  • sockets=2, cores=1, threads=4
  • sockets=1, cores=4, threads=2
  • sockets=1, cores=2, threads=4
  • sockets=1, cores=8, threads=1
  • sockets=1, cores=1, threads=8
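Lists like the two above can be generated mechanically by enumerating every (sockets, cores, threads) triple whose product is exactly the flavour's vCPU count and which respects any per-level maximum. The following is a minimal Python sketch using a divisor search; the function name and the convention that `None` means "no limit" are assumptions for illustration.

```python
def valid_topologies(vcpus, max_sockets=None, max_cores=None, max_threads=None):
    """List every (sockets, cores, threads) triple whose product is exactly vcpus.

    Hypothetical helper: a limit of None means "unconstrained", i.e. bounded
    only by the vCPU count itself.
    """
    def bound(limit):
        return vcpus if limit is None else min(limit, vcpus)

    result = []
    for sockets in range(1, bound(max_sockets) + 1):
        if vcpus % sockets:
            continue  # sockets must divide the vCPU count evenly
        per_socket = vcpus // sockets
        for cores in range(1, bound(max_cores) + 1):
            if per_socket % cores:
                continue  # cores must divide the per-socket count evenly
            threads = per_socket // cores
            if threads <= bound(max_threads):
                result.append((sockets, cores, threads))
    return result


four = valid_topologies(4, max_sockets=4)
eight = valid_topologies(8, max_sockets=4)
print(len(four), len(eight))
```

With max_sockets=4 this yields the 6 topologies for vcpus=4 and the 9 topologies for vcpus=8.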

A compute node may choose between cores and threads based on the topology of the host pCPUs. For example, if the host has 2 cores with 2 threads each, it would likely want to use only guest configurations with threads==2, and then strictly pin host CPUs to guest CPUs so that the topologies match.

Conversely, the image owner may not want the guest to have any threads at all, in which case they might tag the image with max_sockets=4,max_threads=1; the compute node would then only have control over the sockets and cores.
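The interplay between the host's preference and the image owner's limit can be sketched as two filters over candidate topologies. Both function names are hypothetical, the fallback when no candidate matches the host's threads per core is an assumption not stated on this page, and the subsequent strict pinning of host CPUs to guest CPUs is not shown.

```python
# Valid (sockets, cores, threads) topologies for vcpus=8 under max_sockets=4.
CANDIDATES = [(4, 2, 1), (4, 1, 2), (2, 4, 1), (2, 2, 2), (2, 1, 4),
              (1, 8, 1), (1, 4, 2), (1, 2, 4), (1, 1, 8)]


def apply_image_max_threads(candidates, max_threads):
    """Drop topologies the image forbids (hypothetical helper)."""
    return [top for top in candidates if top[2] <= max_threads]


def prefer_host_threads(candidates, host_threads_per_core):
    """Keep topologies whose threads per core match the host (hypothetical).

    Assumed fallback: if nothing matches, e.g. because the image forbids
    threads, return the candidates unchanged, leaving the compute node
    only sockets and cores to choose between.
    """
    matching = [top for top in candidates if top[2] == host_threads_per_core]
    return matching or candidates


# Host with 2 threads per core, no image limit on threads.
print(prefer_host_threads(CANDIDATES, 2))
# Same host, image tagged max_threads=1.
print(prefer_host_threads(apply_image_max_threads(CANDIDATES, 1), 2))
```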

If the guest OS had an upper bound on all three levels, the image could fully specify max_sockets=4,max_cores=12,max_threads=1. In such a case, the maximum number of vCPUs the image could support is 48 (4x12x1). If an attempt were made to boot the image using a flavour with vcpus=64, it should fail to boot, since the topology constraints could not be satisfied while still providing the requisite number of vCPUs.
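The capacity check can be sketched as follows: the product of the three maxima bounds the vCPU count, and an exact-factorization search decides whether a given flavour can actually be satisfied. This is a minimal illustration; the function names and the dict layout are assumptions.

```python
def max_supported_vcpus(limits):
    """Upper bound on vCPUs: the product of the three per-level maxima."""
    return limits["max_sockets"] * limits["max_cores"] * limits["max_threads"]


def topology_exists(vcpus, limits):
    """True if some sockets * cores * threads == vcpus fits every limit (hypothetical)."""
    for sockets in range(1, limits["max_sockets"] + 1):
        if vcpus % sockets:
            continue
        per_socket = vcpus // sockets
        for cores in range(1, limits["max_cores"] + 1):
            # The remaining factor becomes threads per core.
            if per_socket % cores == 0 and per_socket // cores <= limits["max_threads"]:
                return True
    return False


image_limits = {"max_sockets": 4, "max_cores": 12, "max_threads": 1}
print(max_supported_vcpus(image_limits))
print(topology_exists(64, image_limits))
```

Staying within the product is necessary but not sufficient: with these limits a flavour with vcpus=47 would also fail, since 47 is prime and exceeds both max_sockets and max_cores.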

Technical design / implementation notes

The task of specifying guest CPU topology applies to all virtualization drivers in Nova that use machine-based virtualization. It is irrelevant to container-based virtualization, where there is no concept of virtual CPUs. The same image property syntax for specifying topology should be used by all virt drivers in Nova, so it is suggested that the shared virt driver module provide helper methods for calculating the CPU topology required to boot a given image. Each virt driver would then simply invoke the helper when configuring the guest to determine its topology.

A helper along the following lines could be suitable:

 def get_guest_cpu_topology(self, inst_type, image_meta, preferred_topology, mandatory_topology):
     """Calculate all valid guest CPU topologies within the given constraints.

     :param inst_type: object returned from a self.virtapi.instance_type_get() call;
         used to determine the vCPU count
     :param image_meta: the metadata dict for the root disk image
     :param preferred_topology: dict with three keys: max_sockets, max_cores, max_threads
     :param mandatory_topology: dict with three keys: max_sockets, max_cores, max_threads
     :returns: list of dicts, each with three keys: sockets, cores, threads.
         The caller should choose one element to use as the guest topology.
         The list is ordered to prefer sockets over cores over threads.
     """

Initially, image_meta would be queried to see whether a 'hw_cpu_topology' metadata attribute has been set. Its value should be a comma-separated list of constraints, whose keys are 'max_sockets', 'max_cores' and 'max_threads'. For example:

  • "max_sockets=1"
  • "max_cores=4,max_threads=2"

Any or all of the keys may be omitted.
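Parsing the property value into a constraint dict can be sketched as below. This is a minimal sketch; the function name is hypothetical, and rejecting unknown keys, duplicate keys and non-positive values is an assumed validation policy rather than something this page specifies.

```python
ALLOWED_KEYS = ("max_sockets", "max_cores", "max_threads")


def parse_hw_cpu_topology(value):
    """Parse an 'hw_cpu_topology' value such as "max_cores=4,max_threads=2".

    Hypothetical helper. Returns a dict containing only the keys present;
    omitted keys are simply absent, so an empty or missing value gives {}.
    """
    constraints = {}
    if not value or not value.strip():
        return constraints
    for item in value.split(","):
        key, sep, raw = item.partition("=")
        key = key.strip()
        # Assumed policy: unknown or malformed entries are errors.
        if not sep or key not in ALLOWED_KEYS:
            raise ValueError(f"invalid hw_cpu_topology entry: {item!r}")
        if key in constraints:
            raise ValueError(f"duplicate hw_cpu_topology key: {key}")
        number = int(raw)  # raises ValueError for non-numeric values
        if number < 1:
            raise ValueError(f"{key} must be a positive integer")
        constraints[key] = number
    return constraints


print(parse_hw_cpu_topology("max_cores=4,max_threads=2"))
```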

If a given key is omitted, the corresponding key from the 'preferred_topology' parameter is used initially. The preferred topology is treated as a soft constraint, since its values can be raised if required to satisfy the flavour vCPU count.

No value, whether from the image metadata or automatically calculated, may exceed the hard constraints specified by the 'mandatory_topology' parameter.

Example application of rules

Consider the following example constraints

  • image hw_cpu_topology="max_sockets=4"
  • preferred_topology={max_sockets=64,max_cores=4,max_threads=1}
  • mandatory_topology={max_sockets=64,max_cores=8,max_threads=2}

The mandatory topology parameter says that the hypervisor supports an absolute maximum of 64*8*2 == 1024 vCPUs.

The preferred topology parameter says that, by default, the hypervisor would like guests to use at most 4 cores per socket and 1 thread per core (i.e. no hyper-threading).

The image metadata says that the guest OS can cope with at most 4 sockets in total. It expresses no constraints on cores or threads.

In this example, the initial constraints used would be

  • max_sockets==4 (hard constraint)
  • max_cores==4 (soft constraint)
  • max_threads==1 (soft constraint)
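Deriving these initial limits, together with the hard limits they may later be raised to, from the rules above can be sketched as follows. The function name and the (initial, hard) return shape are hypothetical; image values are treated as hard limits, omitted keys start from the preferred value, and everything is capped by the mandatory topology.

```python
KEYS = ("max_sockets", "max_cores", "max_threads")


def initial_and_hard_limits(image, preferred, mandatory):
    """Return (initial, hard) limit dicts (hypothetical helper).

    image: parsed hw_cpu_topology constraints, possibly missing keys
    preferred: soft defaults; mandatory: absolute caps
    """
    initial, hard = {}, {}
    for key in KEYS:
        if key in image:
            # The guest OS cannot cope beyond the image value, so it is
            # hard, but it may still not exceed the mandatory cap.
            hard[key] = min(image[key], mandatory[key])
            initial[key] = hard[key]
        else:
            # Omitted key: start from the soft preferred value, which may
            # later be raised as far as the mandatory cap.
            hard[key] = mandatory[key]
            initial[key] = min(preferred[key], mandatory[key])
    return initial, hard


initial, hard = initial_and_hard_limits(
    {"max_sockets": 4},
    {"max_sockets": 64, "max_cores": 4, "max_threads": 1},
    {"max_sockets": 64, "max_cores": 8, "max_threads": 2},
)
print(initial)
print(hard)
```

For the example constraints, this gives initial limits of 4 sockets, 4 cores and 1 thread, with hard limits of 4 sockets, 8 cores and 2 threads.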

If the flavour said vcpus==16, then this is satisfied by

  • sockets==4, cores==4, threads==1

Note that we don't return 'sockets=2,cores=8', since that would unnecessarily exceed the soft constraint max_cores=4, which we want to avoid by default.

If the flavour said vcpus=32, however, the initial constraints (which allow at most 4x4x1 = 16 vCPUs) cannot be satisfied. Two of these constraints are soft, so we can raise them up to the hard constraints, and thus return a list of two possible topologies:

  • sockets==4, cores==8, threads==1
  • sockets==4, cores==4, threads==2

Note again that we don't return "sockets=4,cores=1,threads=8" (which would in any case exceed the hard limit max_threads=2), since we always want to make full use of the permitted soft constraints before going beyond any of them.

If the flavour said vcpus=2048, we would exceed even the hard limits, which allow at most 4x8x2 = 64 vCPUs for this image (and 1024 for the hypervisor). In this case we'd return an empty list of topologies, and the virt driver would refuse to launch the image with that flavour.
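The three worked cases can be reproduced by the following minimal sketch of one possible selection strategy. The function names are hypothetical, and the relaxation order (initial limits first, then raising one soft limit at a time, then raising all of them) is an assumption inferred from the examples rather than an algorithm this page specifies. The initial and hard limits from the example are passed in directly.

```python
def exact_topologies(vcpus, limits):
    """All (sockets, cores, threads) with product == vcpus within limits."""
    found = []
    for s in range(1, limits["max_sockets"] + 1):
        if vcpus % s:
            continue
        per_socket = vcpus // s
        for c in range(1, limits["max_cores"] + 1):
            if per_socket % c == 0 and per_socket // c <= limits["max_threads"]:
                found.append((s, c, per_socket // c))
    return found


def select_topologies(vcpus, initial, hard):
    """Hypothetical selection strategy matching the worked examples."""
    # Stage 1: initial limits (image values plus soft preferred values).
    found = exact_topologies(vcpus, initial)
    if not found:
        # Stage 2: raise one soft limit at a time up to its hard limit,
        # keeping every other limit at its initial value.
        for key in initial:
            if initial[key] < hard[key]:
                found += exact_topologies(vcpus, {**initial, key: hard[key]})
    if not found:
        # Stage 3: raise every soft limit to its hard limit.
        found = exact_topologies(vcpus, hard)
    # An empty list means the flavour cannot be satisfied at all.
    # Order to prefer sockets over cores over threads.
    return sorted(set(found), key=lambda top: (-top[0], -top[1], -top[2]))


initial = {"max_sockets": 4, "max_cores": 4, "max_threads": 1}
hard = {"max_sockets": 4, "max_cores": 8, "max_threads": 2}
for vcpus in (16, 32, 2048):
    print(vcpus, select_topologies(vcpus, initial, hard))
```

Raising one soft limit at a time is what excludes a layout such as sockets=2,cores=8,threads=2 for vcpus=32: it fits the hard limits, but only by going beyond both soft constraints when going beyond one is enough.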