
= Heterogeneous instance types =

Note: This has been superseded by ScheduleHeterogeneousInstances


 * Launchpad Entry: NovaSpec:heterogeneous-instance-types
 * Creator: Brian Schott
 * Current maintainer: Lorin Hochstein
 * Contributors: USC Information Sciences Institute

Summary
Nova should support CPU architectures, accelerator architectures, and network interfaces as part of the definition of an instance type (or flavor, in Rackspace API parlance). The target release for this is Diablo; however, the USC-ISI team intends to have a stable test branch and deployment at the Cactus release.

The USC-ISI team has a functional prototype here:
 * https://code.launchpad.net/~usc-isi/nova/hpc-trunk (usually in sync with nova/trunk)
 * https://code.launchpad.net/~usc-isi/nova/hpc-testing (a little older, but more stable)

The architecture-aware scheduler is blueprinted here:
 * HeterogeneousArchitectureScheduler

We are also drafting blueprints for three machine types:
 * HeterogeneousGpuAcceleratorSupport
 * HeterogeneousSgiUltraVioletSupport
 * HeterogeneousTileraSupport

An etherpad for discussion of this blueprint is available at http://etherpad.openstack.org/heterogeneousinstancetypes

Release Note
Nova has been extended to allow deployments to advertise, and users to request, specific processor, accelerator, and network interface options via instance_types (or flavors).

The nova-manage instance_types command supports additional fields:


 * cpu_arch - processor architecture. Ex: "x86_64", "i386", "P7" (default "x86_64")
 * cpu_info - JSON-formatted extended processor information
 * xpu_arch - accelerator architecture. Ex: "fermi" (default "")
 * xpu_info - JSON-formatted extended accelerator information
 * xpus - number of accelerators or accelerator processors
 * net_arch - primary network interface. Ex: "ethernet", "infiniband", "myrinet"
 * net_info - JSON-formatted extended network information
 * net_mbps - allocated network bandwidth (megabits per second)

Amazon GPU Node Example:

 * 22 GB of memory
 * 33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
 * 2 x NVIDIA Tesla “Fermi” M2050 GPUs
 * 1690 GB of instance storage
 * 64-bit platform
 * I/O Performance: Very High (10 Gigabit Ethernet)
 * API name: cg1.4xlarge

cg1.4xlarge:

 * memory_mb = 22000
 * vcpus = 8
 * local_gb = 1690
 * cpu_arch = "x86_64"
 * cpu_info = '{"model":"Nehalem", "features":["rdtscp", "xtpr"]}'
 * xpu_arch = "fermi"
 * xpus = 2
 * xpu_info = '{"model":"Tesla 2050", "gcores":"448"}'
 * net_arch = "ethernet"
 * net_info = '{"encap":"Ethernet", "MTU":"8000"}'
 * net_mbps = 10000
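To make the shape of these records concrete, here is the cg1.4xlarge flavor above expressed as a Python dict, roughly the set of values an operator's tooling would feed into the instance_types table. This is an illustrative sketch, not code from the prototype branch; only the field names come from this blueprint.

```python
import json

# Hypothetical sketch: the cg1.4xlarge flavor as a dict of instance_types
# values. Field names follow this blueprint; everything else is illustrative.
cg1_4xlarge = {
    "name": "cg1.4xlarge",
    "memory_mb": 22000,
    "vcpus": 8,
    "local_gb": 1690,
    "cpu_arch": "x86_64",
    "cpu_info": '{"model": "Nehalem", "features": ["rdtscp", "xtpr"]}',
    "xpu_arch": "fermi",
    "xpus": 2,
    "xpu_info": '{"model": "Tesla 2050", "gcores": "448"}',
    "net_arch": "ethernet",
    "net_info": '{"encap": "Ethernet", "MTU": "8000"}',
    "net_mbps": 10000,
}

# The *_info fields are JSON strings, so schedulers can parse them:
cpu_info = json.loads(cg1_4xlarge["cpu_info"])
print(cpu_info["model"])                  # Nehalem
print("rdtscp" in cpu_info["features"])   # True
```

Storing the extended information as JSON strings keeps the schema narrow while still letting a scheduler recover structured detail when it needs it.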

Rationale
Currently AWS supports two CPU architecture types, "i386" and "x86_64". In addition, AWS describes many other instance type attributes by reference, such as I/O performance (Moderate/High/Very High 10 Gigabit Ethernet), extended CPU information (Intel Xeon X5570, quad-core “Nehalem” architecture), and now GPU accelerators (2 x NVIDIA Tesla “Fermi” M2050 GPUs). To implement similar functionality in nova, we need to capture this information in a way that is accessible to advanced schedulers.

There are several related blueprints:
 * https://blueprints.launchpad.net/nova/+spec/cactus-migration-live
 * https://blueprints.launchpad.net/nova/+spec/compute-host-system-architecture-awareness
 * https://blueprints.launchpad.net/nova/+spec/instance-migration
 * https://blueprints.launchpad.net/nova/+spec/extra-data

User stories
Mary manages a cloud datacenter. In addition to her x86 blades, she wants to advertise her power7 high performance computing cloud with 40Gbit QDR Infiniband support to customers. Mary uses nova-manage instance_types create to define "p7.sippycup", "p7.tall", "p7.grande", and "p7.venti" with cpu_arch="power7" and an increasing number of default memory, storage, cores, and reserved bandwidth. Mary also has a small number of GPU-accelerated systems, so she defines "p7f.grande" and "p7f.venti" options with xpu_arch="fermi", and xpus = 1 for grande and xpus = 2 for venti.

Fred wants to run an 8-core machine with one Fermi-based GPU accelerator. He reads the text descriptions on Mary's web site and decides he wants the p7f.grande virtual machine. He runs:

euca-run-instances -t p7f.grande -k fred-keypair emi-12345678

Assumptions
This assumes that someone has ported OpenStack to different processor architecture systems and that accelerators such as GPUs can be passed through to the virtual instance. The USC-ISI team is working on this. We have linked in related blueprints, but the goal is that this top-level cpu architecture awareness stands alone.

Design
We propose to add cpu_arch, cpu_info, xpu_arch, xpu_info, xpus, net_arch, net_info, and net_mbps as attributes of the instance_types, instances, and compute_nodes tables. Conceptually, this information is treated the same way as the existing memory_mb, local_gb, and vcpus fields: the values exist in the instance_types table and are copied as columns into the instances table as instances are created.

The architecture-aware scheduler will compare these additional fields when selecting target compute_nodes (nova-compute services).


 * cpu_arch, xpu_arch, and net_arch are intended to be high-level label switches for fast row filtering (like "i386", "fermi", or "infiniband").
 * xpus and net_mbps are treated as quantity fields, exactly as vcpus is used by schedulers.
 * cpu_info, xpu_info, and net_info follow the instance_migration branch example, using a JSON-formatted string to capture arbitrary configurations.
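One plausible comparison rule for the JSON *_info fields is a subset match: every key/value pair the instance requests must appear in the host's blob, and an empty request matches anything. The function below is a minimal sketch under that assumption; the name and the matching rule are ours, not settled design.

```python
import json

def info_matches(requested, available):
    """Return True if every key/value pair in the instance's JSON *_info
    string also appears in the host's *_info string. An empty or missing
    request matches any host."""
    if not requested:
        return True
    want = json.loads(requested)
    have = json.loads(available) if available else {}
    return all(have.get(k) == v for k, v in want.items())

# A request for a Nehalem-class CPU matches a host reporting extra detail:
print(info_matches('{"model": "Nehalem"}',
                   '{"model": "Nehalem", "features": ["rdtscp"]}'))  # True
print(info_matches('{"model": "Nehalem"}', '{"model": "Opteron"}'))  # False
```

A subset match keeps homogeneous deployments working unchanged: instance types that never set the *_info fields match every host.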

The context for these new fields changes slightly according to what table they are in.


 * In instance_types table, they represent advertised capabilities for the machine type, such as "this instance type provides 100 megabit bandwidth" or "this instance type supports Cortex-A9 processors".
 * In the instances table, they represent requested capabilities, such as "give me an instance with xpu_arch=fermi and xpus=2".
 * In the compute_nodes table, the fields represent the available resources of the host associated with the compute_nodes.

The processor architecture field cpu_arch is a no-brainer; lots of deployments will want this today. Adding cpu_info is for many-core processors such as the Tilera systems in our project: we need to specify things like cpu_info = '{"geometry":"4x4"}' in an instance type to be able to spatially tile multiple virtual machines on the 8x8 tilemp processor. It is easiest to define what "tile.small", "tile.medium", and "tile.large" mean within instance types.

Accelerators are also important, but instead of having dedicated GPU-related fields, the design tries to support future accelerators as well: FPGAs, optical processors, or whatever dedicated hardware resource can be passed through to the virtual machine. The xpus quantity field is crude and cannot easily handle a box with two different kinds of accelerators, but this could later be broken out into a separate one-to-many relational table. We are trying to minimize the schema changes.

The networking fields attempt to promote network connectivity to equal standing with cores, memory, and disk when selecting the host on which instances are deployed. Enforcing bandwidth at the VM would be nice, but even a "divide network bandwidth by number of instances" metric at the scheduler would be better than nothing. The networking service will add another layer of complexity, but with this blueprint it will at least know how much bandwidth an instance is requesting or has been allocated on the host.
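The fallback metric mentioned above can be sketched in a few lines. This is an illustration of the idea only; the function name, arguments, and the choice between the two formulas are our assumptions, not part of the prototype.

```python
def effective_mbps(host_net_mbps, allocated_mbps, num_instances):
    """Crude scheduler metric for remaining network capacity on a host.

    If instances carry explicit net_mbps reservations, subtract them;
    otherwise fall back to splitting the NIC evenly across instances.
    """
    if allocated_mbps:
        return host_net_mbps - sum(allocated_mbps)   # remaining reserved capacity
    return host_net_mbps / max(num_instances, 1)     # even-split fallback

# A 10 Gbit host with two 4000 Mbps reservations has 2000 Mbps left:
print(effective_mbps(10000, [4000, 4000], 2))  # 2000
# With no reservations, four instances share the NIC evenly:
print(effective_mbps(10000, [], 4))  # 2500.0
```

Either number can be compared against a new instance's net_mbps request when ranking candidate hosts.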

We may want to consider additional top-level column fields in these tables for scheduler performance purposes, like cpu_model and xpu_model, but these are enhancements.

Supporting multiple accelerators
The proposed approach supports only one type of accelerator per machine. For example, you could have GPUs in the machine, or FPGAs, but not both. To support multiple accelerators, we would need either:


 * a separate table that contains accelerator information, or
 * to leverage the extra-data approach.

The separate table approach would make the implementation in the code simpler.
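If the separate-table route were taken, each compute node would own zero or more accelerator rows, and the scheduler would join against them. The sketch below shows that lookup with plain Python structures rather than a real schema; the row layout and function name are illustrative only, not proposed design.

```python
# One-to-many sketch: accelerator rows keyed by compute node, so one box
# can report both GPUs and FPGAs. Field names are illustrative only.
accelerators = [
    {"compute_node_id": 1, "xpu_arch": "fermi", "xpus": 2},
    {"compute_node_id": 1, "xpu_arch": "fpga",  "xpus": 1},
    {"compute_node_id": 2, "xpu_arch": "fermi", "xpus": 4},
]

def nodes_with(xpu_arch, xpus, rows):
    """Compute node ids that can satisfy an accelerator request."""
    return sorted({r["compute_node_id"] for r in rows
                   if r["xpu_arch"] == xpu_arch and r["xpus"] >= xpus})

print(nodes_with("fermi", 2, accelerators))  # [1, 2]
print(nodes_with("fpga", 1, accelerators))   # [1]
```

Node 1 appears in both results, which is exactly the mixed-accelerator case the single xpu_arch column cannot express.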

Information Flow
The basic information flow through nova is as follows:


 * 1) nova-compute starts on a host and registers architecture, accelerator, and networking capabilities in the ComputeNode table. This functionality is provided by the instance migration blueprint and is already merged. We need to add our new fields and populate them in the compute_nodes table using flags and/or extracted /proc information.
 * 2) nova-api receives a run-instances request with an instance_type string such as "m1.small" or "p7g.grande". No change here.
 * 3) nova-api passes instance_type to compute/api.py create from api/ec2/cloud.py run_instances or api/openstack/servers.py create. No change here.
 * 4) nova-api compute/api.py create reads from the instance_types table and adds rows to the instances table. We need to insert our new fields into the base_options arg that gets passed to instances.db.create. This might also be a good place to insert a sanity check that the image supports the requested cpu_arch.
 * 5) nova-api does an rpc.cast to the scheduler num_instances times, passing instance_id. No change here.
 * 6) nova-scheduler selects a compute service host that matches the options specified in the instance table fields. The simple scheduler will just work correctly and ignore these fields on a homogeneous deployment. We need to add an arch scheduler that filters available compute_nodes by cpu_arch, cpu_info, xpu_arch, xpu_info, xpus, net_arch, net_info, and net_mbps.
 * 7) nova-scheduler does an rpc.cast to each selected compute service. No change here.
 * 8) nova-compute receives the rpc.cast with instance_id, launches the virtual machine, etc. At this point, nova-compute has the cpu_arch, cpu_info, xpu_arch, xpu_info, xpus, net_arch, net_info, and net_mbps fields in the instance object and can configure libvirt as needed. No change required for the existing compute service manager. The USC-ISI team is adding GPU and other non-x86 architecture support (need to add blueprint references).
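Step 6 amounts to filtering compute_nodes rows by the instance's fields. A minimal sketch of that filter, with host records as plain dicts standing in for database rows (real code would query the DB and also consult the JSON *_info fields):

```python
def filter_hosts(instance, hosts):
    """Keep hosts whose architecture labels match the request and whose
    quantity fields (xpus, net_mbps) can satisfy it. Hosts with no
    accelerator request to match are not filtered on xpu fields."""
    def ok(host):
        return (host["cpu_arch"] == instance["cpu_arch"]
                and (not instance["xpu_arch"]
                     or (host["xpu_arch"] == instance["xpu_arch"]
                         and host["xpus"] >= instance["xpus"]))
                and host["net_mbps"] >= instance["net_mbps"])
    return [h for h in hosts if ok(h)]

hosts = [
    {"name": "blade1", "cpu_arch": "x86_64", "xpu_arch": "",
     "xpus": 0, "net_mbps": 1000},
    {"name": "gpu1", "cpu_arch": "x86_64", "xpu_arch": "fermi",
     "xpus": 2, "net_mbps": 10000},
]
request = {"cpu_arch": "x86_64", "xpu_arch": "fermi",
           "xpus": 2, "net_mbps": 10000}
print([h["name"] for h in filter_hosts(request, hosts)])  # ['gpu1']
```

On a homogeneous deployment every host passes the filter, which matches the claim above that the simple scheduler keeps working unchanged.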

Schema Changes
Three tables are extended:

InstanceTypes
The instance_types are now stored in their own table in nova trunk: ConfigureInstanceTypesDynamically

class InstanceTypes(BASE, NovaBase):
    """Represents possible instance_types, or flavors, of VM offered"""
    __tablename__ = "instance_types"
    id = Column(Integer, primary_key=True)
    name = Column(String(255), unique=True)
    memory_mb = Column(Integer)
    vcpus = Column(Integer)
    local_gb = Column(Integer)
    flavorid = Column(Integer, unique=True)
    swap = Column(Integer, nullable=False, default=0)
    rxtx_quota = Column(Integer, nullable=False, default=0)
    rxtx_cap = Column(Integer, nullable=False, default=0)
+   cpu_arch = Column(String(255), default='x86_64')
+   cpu_info = Column(String(255), default='')
+   xpu_arch = Column(String(255), default='')
+   xpu_info = Column(String(255), default='')
+   xpus = Column(Integer, nullable=False, default=0)
+   net_arch = Column(String(255), default='')
+   net_info = Column(String(255), default='')
+   net_mbps = Column(Integer, nullable=False, default=0)

Compute Nodes
The compute nodes table is being included by: https://code.launchpad.net/~nttdata/nova/live-migration

class ComputeNode(BASE, NovaBase):
    """Represents a running compute service on a host."""
    ...
    hypervisor_type = Column(Text, nullable=True)
    hypervisor_version = Column(Integer, nullable=True)
    cpu_info = Column(Text, nullable=True)
+   cpu_arch = Column(String(255), default='x86_64')
+   xpu_arch = Column(String(255), default='')
+   xpu_info = Column(String(255), default='')
+   xpus = Column(Integer, default=0)
+   net_arch = Column(String(255), default='')
+   net_info = Column(String(255), default='')
+   net_mbps = Column(Integer, default=0)

Instance
The instances table just carries the additional fields so that libvirt_conn can pick them up. They are also used by the scheduler, like vcpus.

class Instance(BASE, NovaBase):
    """Represents a guest vm."""
    ...
    instance_type = Column(String(255))
+   cpu_arch = Column(String(255), default='x86_64')
+   cpu_info = Column(String(255), default='')
+   xpu_arch = Column(String(255), default='')
+   xpu_info = Column(String(255), default='')
+   xpus = Column(Integer, default=0)
+   net_arch = Column(String(255), default='')
+   net_info = Column(String(255), default='')
+   net_mbps = Column(Integer, default=0)

Implementation
The USC-ISI team has a functional prototype: https://code.launchpad.net/~usc-isi/nova/hpc-trunk

UI Changes
There are no UI changes exposed to cloud users. They access the functionality through instance_types/flavors.

For administrators, we should add the fields to the nova-manage instance_types create/list commands. One open question is how to handle the JSON text fields for user entry, but straight text isn't too bad. We also need to decide whether other nova-manage describe resources should show all of this to end users or bury it behind an advanced/verbose argument.

Additional flags are also available in nova.conf for specifying cpu_arch, xpu_arch, and net_arch when a compute service is launched.
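For a GPU compute host, the nova.conf entries might look like the fragment below. The flag names simply mirror the field names in this blueprint; their exact spelling is an assumption until the branch merges.

```
# Hypothetical nova.conf entries for a Fermi GPU compute host.
# Flag names mirror the blueprint fields and are not final.
--cpu_arch=x86_64
--xpu_arch=fermi
--net_arch=ethernet
```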

Code Changes
Summary of changes:

 * Schema changes for ComputeNode, Instance, and InstanceType
 * Migration code for each of the three tables (migration code is such fun)
 * Flags for default values inserted in ComputeNode
 * Periodic updates to ComputeNode
 * Added fields to base_options copied into the instances table
 * nova/db/sqlalchemy/models.py
 * nova/db/sqlalchemy/migrate_repo/versions/013_add_architecture_to_instance_types.py
 * nova/db/sqlalchemy/migrate_repo/versions/014_add_architecture_to_instances.py
 * nova/db/sqlalchemy/migrate_repo/versions/015_add_architecture_to_compute_node.py
 * nova/compute/manager.py
 * nova/compute/api.py

Migration
Very little needs to change in how deployments use nova if we set sane defaults, such as the "x86_64" that is assumed today.

Test/Demo Plan

Unresolved issues

BoF agenda and discussion