HeterogeneousInstanceTypes

Revision as of 11:31, 4 March 2011

Summary

Nova should have support for cpu architectures, accelerator architectures, and network interfaces as part of the definition of an instance type (or flavor, in RackSpace API parlance). The target release for this is Diablo; however, the USC-ISI team intends to have a stable test branch at the Cactus release.

An etherpad for discussion of this blueprint is available at http://etherpad.openstack.org/heterogeneousinstancetypes

Release Note

Nova has been extended to allow deployments to advertise and users to request specific processor, accelerator, and network interface options using instance_types (or flavors).

The nova-manage instance_types command supports additional fields:

  • cpu_arch - processor architecture. Ex: "x86_64", "i386", "P7", etc. (default "x86_64")
  • cpu_info - JSON-formatted extended processor information
  • xpu_arch - accelerator architecture. Ex: "gpu" (default "")
  • xpu_info - JSON-formatted extended accelerator information
  • xpus - number of accelerators or accelerator processors
  • net_arch - primary network interface. Ex: "ethernet", "infiniband", "myrinet"
  • net_info - JSON-formatted extended network information
  • net_mbps - allocated network bandwidth (megabits per second)

Amazon GPU Node Example:

  • 22 GB of memory
  • 33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core "Nehalem" architecture)
  • 2 x NVIDIA Tesla "Fermi" M2050 GPUs
  • 1690 GB of instance storage
  • 64-bit platform
  • I/O Performance: Very High (10 Gigabit Ethernet)
  • API name: cg1.4xlarge


cg1.4xlarge:
 * memory_mb = 22000
 * vcpus = 8
 * local_gb = 1690
 * cpu_arch = "x86_64"
 * cpu_info = '{"model":"Nehalem", "features":["rdtscp", "xtpr"]}'
 * xpu_arch = "gpu"
 * xpus = 2
 * xpu_info = '{"gpu_arch":"fermi", "model":"Tesla 2050", "gcores":"448"}'
 * net_arch = "ethernet"
 * net_info = '{"encap":"Ethernet", "MTU":"8000"}'
 * net_mbps = 10000


Rationale

Currently AWS supports two different CPU architecture types, "i386" and "x86_64". In addition, AWS describes many other instance type attributes by reference, such as: I/O Performance (Moderate/High/Very High 10 Gigabit Ethernet), extended CPU information (Intel Xeon X5570, quad-core "Nehalem" architecture), and now GPU accelerators (2 x NVIDIA Tesla "Fermi" M2050 GPUs). In order to implement similar functionality in nova, we need to capture this information in a way that is accessible to advanced schedulers.

There are several related blueprints:

User stories

Mary manages a cloud datacenter. In addition to her x86 blades, she wants to advertise her power7 high performance computing cloud with 40Gbit QDR Infiniband support to customers. Mary uses nova-manage instance_types create to define "p7.sippycup", "p7.tall", "p7.grande", and "p7.venti" with cpu_arch="power7" and increasing amounts of default memory, storage, cores, and reserved bandwidth. Mary also has a small number of GPU-accelerated systems, so she defines "p7f.grande" and "p7f.venti" options with xpu_arch="gpu", xpu_info = '{"gpu_arch":"fermi"}', and xpus = 1 for grande and xpus = 2 for venti.
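
For example, Mary's p7f.grande definition might look like the following (the sizes are illustrative only; the architecture fields come from the story above):


p7f.grande:
 * memory_mb = 16384
 * vcpus = 8
 * local_gb = 512
 * cpu_arch = "power7"
 * xpu_arch = "gpu"
 * xpu_info = '{"gpu_arch":"fermi"}'
 * xpus = 1
 * net_arch = "infiniband"
 * net_mbps = 40000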

Fred wants to run an 8 core machine with 1 fermi-based GPU accelerator. He looks at the text descriptions on Mary's web site and picks the p7f.grande virtual machine. He runs:


euca-run-instances -t p7f.grande -k fred-keypair emi-12345678


Assumptions

This assumes that someone has ported OpenStack to different processor architecture systems and that accelerators such as GPUs can be passed through to the virtual instance. The USC-ISI team is working on this. We will link in related blueprints, but the goal is that this top-level cpu architecture awareness stands alone.

Design

We propose to add cpu_arch, cpu_info, xpu_arch, xpu_info, xpus, net_arch, net_info, and net_mbps as attributes to the instance_types, instances, and compute_services tables. Conceptually, this information is treated the same way the existing memory_mb, local_gb, and vcpus fields are handled. The fields exist in instance_types and get copied as columns into the instances table as instances are created.

  • cpu_arch, xpu_arch, and net_arch are intended to be high-level labels for fast row filtering (like "i386", "gpu", or "infiniband").
  • xpus and net_mbps are treated as quantity fields, exactly like vcpus is used by schedulers.
  • cpu_info, xpu_info, and net_info follow the instance_migration branch example, using a JSON-formatted string to capture arbitrary configurations.

The context for these new fields changes slightly according to what table they are in.

  • In the instance_types table, they represent advertised capabilities for the machine type, such as "this instance type provides 100 megabit bandwidth" or "this instance type supports Cortex-A9 processors".
  • In the instances table, they represent requested capabilities, such as "give me an instance with xpu_arch=gpu, xpu_info(gpu_arch=fermi), xpus=2".
  • In the compute_services table, the fields represent the available capabilities of the host associated with the compute service.

The processor architecture functionality, cpu_arch, is a no-brainer; lots of deployments will want this today. Adding cpu_info is for many-core processors such as the Tilera systems on our project. We need to specify things like instance_type.cpu_info("geometry":"4x4") to be able to spatially tile multiple virtual machines on the 8x8 tilemp processor. It's easiest to define what "tile.small", "tile.medium", and "tile.large" mean within instance types.
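
For illustration, those tiled flavors could be defined along these lines (hypothetical values, including the "tilera" label; only the geometry idea comes from the discussion above):


tile.small:
 * vcpus = 4
 * cpu_arch = "tilera"
 * cpu_info = '{"geometry":"2x2"}'

tile.medium:
 * vcpus = 16
 * cpu_arch = "tilera"
 * cpu_info = '{"geometry":"4x4"}'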

Accelerators are also important, but instead of having dedicated GPU-related fields, the design tries to support future accelerators like FPGAs, optical processors, or whatever dedicated hardware resource can be passed through to the virtual machine. The xpus quantity field is pretty crude and can't easily handle a box with two different kinds of accelerators, but this could be broken out later as a separate one-to-many relational table. We are trying to minimize the schema changes.

The networking fields attempt to promote network connectivity to be a first-class resource, alongside cores, memory, and disk, when selecting the host on which instances get deployed. Enforcing bandwidth at the VM level would be nice, but even a "divide network bandwidth by number of instances" metric would be better than nothing. Also, the networking service will add another layer of complexity, but at least with this blueprint the networking service will know how much bandwidth an instance is requesting or has been allocated on the host.

We may want to consider additional top-level column fields in these tables for scheduler performance purposes, like cpu_model and xpu_model.

Scheduler Flow

The basic compute scheduler flow is as follows:

  1. nova-compute starts on a host and registers architecture, accelerator, and networking capabilities in the ComputeService table. This functionality is provided by the instance-migration blueprint (https://blueprints.launchpad.net/nova/+spec/instance-migration) and is already implemented. We need to add our new fields and populate them in the compute_services table using flags and/or extracted /proc information.
  2. nova-api receives a run-instances request with an instance_type string such as "m1.small" or "p7g.grande". No change here.
  3. nova-api passes instance_type to compute/api.py create() from api/ec2/cloud.py run_instances() or api/openstack/servers.py create(). No change here.
  4. nova-api compute/api.py create() reads from the instance_types table and adds rows to the instances table. We need to insert our new fields into the base_options arg that gets passed to instances.db.create(). This might also be a good place to add a sanity check that the image's cpu architecture supports cpu_arch.
  5. nova-api does an rpc.cast() to the scheduler num_instances times, passing instance_id. No change here.
  6. nova-scheduler selects a compute_service host that matches the options specified in the instance table fields. The simple scheduler will just work correctly and ignore these fields on a homogeneous deployment. We need to add an arch scheduler that filters available compute_services by matching cpu_arch, cpu_info, xpu_arch, xpu_info, xpus, net_arch, net_info, and net_mbps against the same fields (see the sketch after this list).
  7. nova-scheduler does an rpc.cast() to each selected compute service. No change here.
  8. nova-compute receives the rpc.cast() with instance_id, launches the virtual machine, etc. At this point, nova-compute has the cpu_arch, cpu_info, xpu_arch, xpu_info, xpus, net_arch, net_info, and net_mbps fields in the instance object and can configure libvirt as needed. No change required for the existing compute service manager. The USC-ISI team is adding GPU and other non-x86 architecture support (blueprint references to be added).
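
As a rough sketch of the arch scheduler in step 6 (a minimal illustration, not committed code: the helper names and the "requested json must be a subset of advertised json" matching rule are our assumptions):


import json

# Hypothetical filter: keep only compute services that satisfy the
# architecture, accelerator, and network fields requested by an instance.

ARCH_FIELDS = ['cpu_arch', 'xpu_arch', 'net_arch']   # exact-match labels
QUANTITY_FIELDS = ['xpus', 'net_mbps']               # host must offer >= requested
INFO_FIELDS = ['cpu_info', 'xpu_info', 'net_info']   # json key/value subset match

def info_satisfies(requested, advertised):
    """Every requested key/value pair must appear in the advertised json."""
    want = json.loads(requested) if requested else {}
    have = json.loads(advertised) if advertised else {}
    return all(have.get(k) == v for k, v in want.items())

def filter_hosts(instance, compute_services):
    """Return the compute_services rows that can run the instance."""
    matches = []
    for service in compute_services:
        # Skip hosts whose label fields conflict with a requested label.
        if any(instance[f] and instance[f] != service[f] for f in ARCH_FIELDS):
            continue
        # Skip hosts that cannot supply the requested quantities.
        if any(instance[f] > service[f] for f in QUANTITY_FIELDS):
            continue
        # Skip hosts whose advertised info does not cover the requested info.
        if not all(info_satisfies(instance[f], service[f]) for f in INFO_FIELDS):
            continue
        matches.append(service)
    return matches

Given dict-shaped rows, filter_hosts(instance, compute_services) returns the candidate hosts; a real scheduler would query the database API instead and then apply its usual selection among the matches.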

Schema Changes

Three tables are extended.

InstanceTypes

The instance_types are now stored in their own table in nova trunk: http://wiki.openstack.org/ConfigureInstanceTypesDynamically


class InstanceTypes(BASE, NovaBase):
    """Represent possible instance_types or flavor of VM offered"""
    __tablename__ = "instance_types"
    id = Column(Integer, primary_key=True)
    name = Column(String(255), unique=True)
    memory_mb = Column(Integer)
    vcpus = Column(Integer)
    local_gb = Column(Integer)
    flavorid = Column(Integer, unique=True)
    swap = Column(Integer, nullable=False, default=0)
    rxtx_quota = Column(Integer, nullable=False, default=0)
    rxtx_cap = Column(Integer, nullable=False, default=0)
    cpu_arch = Column(String(255), default='x86_64')
    cpu_info = Column(String(255), default='')
    xpu_arch = Column(String(255), default='')
    xpu_info = Column(String(255), default='')
    xpus = Column(Integer, nullable=False, default=0)
    net_arch = Column(String(255), default='')
    net_info = Column(String(255), default='')
    net_mbps = Column(Integer, nullable=False, default=0)


Compute Service

The compute service table is being added by the instance-migration blueprint:

https://blueprints.launchpad.net/nova/+spec/instance-migration

It will either make it into Cactus, or we will


class ComputeService(BASE, NovaBase):
    """Represents a running compute service on a host."""

    __tablename__ = 'compute_services'
    id = Column(Integer, primary_key=True)
    service_id = Column(Integer, ForeignKey('services.id'), nullable=True)
    service = relationship(Service,
                           backref=backref('compute_service'),
                           foreign_keys=service_id,
                           primaryjoin='and_('
                                'ComputeService.service_id == Service.id,'
                                'ComputeService.deleted == False)')

    vcpus = Column(Integer, nullable=True)
    memory_mb = Column(Integer, nullable=True)
    local_gb = Column(Integer, nullable=True)
    vcpus_used = Column(Integer, nullable=True)
    memory_mb_used = Column(Integer, nullable=True)
    local_gb_used = Column(Integer, nullable=True)
    hypervisor_type = Column(Text, nullable=True)
    hypervisor_version = Column(Integer, nullable=True)
    cpu_arch = Column(String(255), default='x86_64')
    cpu_info = Column(String(255), default='')
    xpu_arch = Column(String(255), default='')
    xpu_info = Column(String(255), default='')
    xpus = Column(Integer, default=0)
    net_arch = Column(String(255), default='')
    net_info = Column(String(255), default='')
    net_mbps = Column(Integer, default=0)


Instance

The instances table just carries the additional fields so that libvirt_conn can pick them up. They are also used by the scheduler, like vcpus.


class Instance(BASE, NovaBase):
    """Represents a guest vm."""
    # ...
    instance_type = Column(String(255))
    cpu_arch = Column(String(255), default='x86_64')
    cpu_info = Column(String(255), default='')
    xpu_arch = Column(String(255), default='')
    xpu_info = Column(String(255), default='')
    xpus = Column(Integer, default=0)
    net_arch = Column(String(255), default='')
    net_info = Column(String(255), default='')
    net_mbps = Column(Integer, default=0)


Implementation

The following subsections describe the planned changes.

UI Changes

We should add the fields to nova-manage instance_types create. One open question is how to handle the JSON text fields for user entry, but straight text isn't too bad. We also need to decide whether the other nova-manage describe resources should show all of these fields.
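
As a sketch of what the extended command could look like (the long-option names here are hypothetical; today nova-manage instance_types create takes positional arguments for name, memory, vcpus, local_gb, and flavorid):


nova-manage instance_types create p7f.grande 16384 8 512 200 \
    --cpu_arch="power7" \
    --xpu_arch="gpu" --xpu_info='{"gpu_arch":"fermi"}' --xpus=1 \
    --net_arch="infiniband" --net_mbps=40000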

Code Changes

Code changes should include an overview of what needs to change, and in some cases even the specific details.

Migration

Very little needs to change in how deployments use nova if we set sane defaults, such as cpu_arch = "x86_64", matching what is implicitly assumed today.
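
A minimal sketch of the corresponding database migration, assuming the sqlalchemy-migrate conventions nova uses (the column set mirrors the schema above; the exact migration number and file layout are left open):


from sqlalchemy import Column, Integer, MetaData, String, Table

meta = MetaData()

def upgrade(migrate_engine):
    # Add the new heterogeneous fields with sane defaults so existing
    # x86_64 deployments keep working unchanged.
    meta.bind = migrate_engine
    instance_types = Table('instance_types', meta, autoload=True)
    instance_types.create_column(Column('cpu_arch', String(255), default='x86_64'))
    instance_types.create_column(Column('cpu_info', String(255), default=''))
    instance_types.create_column(Column('xpu_arch', String(255), default=''))
    instance_types.create_column(Column('xpu_info', String(255), default=''))
    instance_types.create_column(Column('xpus', Integer, default=0))
    instance_types.create_column(Column('net_arch', String(255), default=''))
    instance_types.create_column(Column('net_info', String(255), default=''))
    instance_types.create_column(Column('net_mbps', Integer, default=0))
    # The instances and compute_services tables get the same columns.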

Test/Demo Plan

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.