HeterogeneousGpuAcceleratorSupport
__NOTOC__
 
* '''Launchpad Entry''': NovaSpec:heterogeneous-gpu-accelerator-support
 
* '''Created''': [https://launchpad.net/~bfschott Brian Schott]
 
* '''Current maintainer''': [https://launchpad.net/~jwalters-isi John Paul Walters]
 
== Summary ==
 
  
This blueprint proposes to add support for GPU-accelerated machines as an alternative machine type in [[OpenStack]].  
  
The target release for this is Grizzly.  We plan to have a stable branch at https://code.launchpad.net/~usc-isi/nova/hpc-testing.
  
 
The USC-ISI team has a functional prototype here:
 
* https://code.launchpad.net/~usc-isi/nova/hpc-trunk
 
* https://code.launchpad.net/~usc-isi/nova/hpc-testing (most stable)
 
  
 
This blueprint is related to the [[HeterogeneousInstanceTypes]] blueprint here:
 
 
* http://wiki.openstack.org/HeterogeneousInstanceTypes
 
 
We are also drafting blueprints for other machine types:
 
* http://wiki.openstack.org/HeterogeneousSgiUltraVioletSupport
 
* http://wiki.openstack.org/HeterogeneousTileraSupport
 
  
 
An etherpad for discussion of this blueprint is available at http://etherpad.openstack.org/heterogeneousultravioletsupport
 
== Release Note ==

Nova has been extended to make NVIDIA GPUs available to provisioned instances for CUDA programming.

== Rationale ==

See [[HeterogeneousInstanceTypes]].

The goal of this blueprint is to allow GPU-accelerated computing in OpenStack.
 
== User stories ==
 
  
Jackie has a CUDA-accelerated application and wants to run it on an instance that has access to GPU hardware. She chooses a cg1.xlarge instance type, which provides access to two NVIDIA Fermi GPUs:
  
 
<pre><nowiki>
$ nova flavor-list
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+----------------------------------------------+
| ID | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public | extra_specs                                  |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+----------------------------------------------+
| 9  | cg1.xlarge | 16384     | 160  | 0         |      | 8     | 1.0         | True      | {u'hypervisor': u's== LXC', u'gpus': u'= 2', u'gpu_arch':u's== fermi'} |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+----------------------------------------------+

$ nova boot --flavor 9 --key-name mykey --image 2b1509fe-b573-488a-be4d-d61d25c7ab4f  gpu_test
 
</nowiki></pre>
 
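For deployments that script their configuration, the same flavor could also be created programmatically. The following is a minimal sketch using python-novaclient; the credentials, Keystone URL, and the exact extra_specs keys ('hypervisor', 'gpus', 'gpu_arch') are assumptions taken from the example output above, not a fixed interface.

<pre><nowiki>
# Minimal sketch (not part of the blueprint branch): create the cg1.xlarge
# flavor shown above and attach its GPU extra_specs via python-novaclient.
# The credentials and Keystone URL are placeholders.
from novaclient.v1_1 import client

nova = client.Client("admin", "secretpassword", "admin",
                     "http://keystone.example.com:5000/v2.0/")

# 16 GB RAM, 8 VCPUs, 160 GB disk, flavorid 9, as in the flavor-list output.
flavor = nova.flavors.create(name="cg1.xlarge", ram=16384, vcpus=8,
                             disk=160, flavorid=9)

# extra_specs the scheduler and GPU driver can match on.
flavor.set_keys({"hypervisor": "s== LXC",
                 "gpus": "= 2",
                 "gpu_arch": "s== fermi"})
</nowiki></pre>

The 's==' and '=' prefixes are comparison operators understood by Nova's extra_specs matching rather than literal parts of the values.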
  
 
== Assumptions ==
 
  
The only approach that has been successful for CUDA access from a KVM virtual machine that we know of is gVirtuS [http://osl.uniparthenope.it/projects/gvirtus/].

Here we propose direct access to GPUs from LXC instances.

We assume that the host system's kernel supports 'lxc-attach' and that the 'lxc-attach' utilities are installed.
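A compute host could sanity-check part of this assumption before advertising GPU support. The snippet below is only an illustrative sketch (the blueprint does not prescribe such a check), and it only verifies that the utility is installed, not that the running kernel supports it.

<pre><nowiki>
# Illustrative sketch only: check that the lxc-attach utility is present on
# the compute host before enabling GPU-backed LXC instances.
from distutils.spawn import find_executable

def lxc_attach_installed():
    # Returns the full path to lxc-attach if it is on $PATH, else None.
    return find_executable("lxc-attach") is not None

if __name__ == "__main__":
    if lxc_attach_installed():
        print("lxc-attach found; GPU passthrough into LXC containers is possible")
    else:
        print("lxc-attach not found; install the lxc utilities first")
</nowiki></pre>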
  
 
== Design ==
 
  
We have a new nova.virt.GPULibvirt driver, an extension of nova.virt.libvirt, that instantiates a GPU-enabled virtual machine when requested:

* When an instance is spawned (or rebooted), nova starts an LXC VM.
* The requested GPU(s) are marked as allocated and their device nodes are created inside the LXC container using 'lxc-attach' (a rough sketch follows this list).
* Access permission to the GPU(s) is added to /cgroup.
* Boot finalizes.
* When an instance is terminated (destroyed), the GPU(s) are deallocated.
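The allocation step could look roughly like the sketch below. It is an illustration of the list above rather than the code in the branch: the NVIDIA device major number, the /dev/nvidiactl minor number, the container name, and the /cgroup layout are assumptions about a typical host, and real code would also record the allocation in the database.

<pre><nowiki>
# Rough sketch of the per-instance GPU provisioning step described above.
# Assumes NVIDIA character devices /dev/nvidia<N> (major 195) plus
# /dev/nvidiactl (minor 255), an LXC container named after the instance,
# and the container's device cgroup living under /cgroup/lxc/<name>.
import subprocess

NVIDIA_MAJOR = 195

def run(cmd):
    # Helper: run a command and raise if it fails.
    subprocess.check_call(cmd)

def provision_gpu(container, gpu_index):
    # 1. Allow the container's device cgroup to access the GPU character
    #    devices (read/write/mknod).
    devices_allow = "/cgroup/lxc/%s/devices.allow" % container
    for minor in (gpu_index, 255):          # /dev/nvidiaN and /dev/nvidiactl
        with open(devices_allow, "a") as f:
            f.write("c %d:%d rwm\n" % (NVIDIA_MAJOR, minor))

    # 2. Create the device nodes inside the running container with lxc-attach.
    run(["lxc-attach", "-n", container, "--",
         "mknod", "-m", "666", "/dev/nvidia%d" % gpu_index,
         "c", str(NVIDIA_MAJOR), str(gpu_index)])
    run(["lxc-attach", "-n", container, "--",
         "mknod", "-m", "666", "/dev/nvidiactl",
         "c", str(NVIDIA_MAJOR), "255"])
</nowiki></pre>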
  
 
=== Schema Changes ===
 
 
See [[HeterogeneousInstanceTypes]].
 
 
We propose adding the following default values to the instance_types table:
 
 
 
<pre><nowiki>
    # x86+GPU
    # TODO: we need to identify a machine-readable string for xpu arch
    'cg1.small': dict(memory_mb=2048, vcpus=1, local_gb=20,
                      flavorid=100,
                      cpu_arch="x86_64", xpu_arch="fermi", xpus=1),
    'cg1.medium': dict(memory_mb=4096, vcpus=2, local_gb=40,
                       flavorid=101,
                       cpu_arch="x86_64", xpu_arch="fermi", xpus=1),
    'cg1.large': dict(memory_mb=8192, vcpus=4, local_gb=80,
                      flavorid=102,
                      cpu_arch="x86_64", xpu_arch="fermi", xpus=1,
                      net_mbps=1000),
    'cg1.xlarge': dict(memory_mb=16384, vcpus=8, local_gb=160,
                       flavorid=103,
                       cpu_arch="x86_64", xpu_arch="fermi", xpus=1,
                       net_mbps=1000),
    'cg1.2xlarge': dict(memory_mb=16384, vcpus=8, local_gb=320,
                        flavorid=104,
                        cpu_arch="x86_64", xpu_arch="fermi", xpus=2,
                        net_mbps=1000),
    'cg1.4xlarge': dict(memory_mb=22000, vcpus=8, local_gb=1690,
                        flavorid=105,
                        cpu_arch="x86_64", cpu_info='{"model":"Nehalem"}',
                        xpu_arch="fermi", xpus=2,
                        xpu_info='{"model":"Tesla 2050", "gcores":"448"}',
                        net_arch="ethernet", net_mbps=10000),
    'cg1.8xlarge': dict(memory_mb=22000, vcpus=8, local_gb=1690,
                        flavorid=106,
                        cpu_arch="x86_64", cpu_info='{"model":"Nehalem"}',
                        xpu_arch="fermi", xpus=4,
                        xpu_info='{"model":"Tesla 2050", "gcores":"448"}',
                        net_arch="ethernet", net_mbps=10000),
</nowiki></pre>
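If these attributes are added as columns of instance_types (see [[HeterogeneousInstanceTypes]]), the corresponding sqlalchemy-migrate script could look roughly like the sketch below. The column list mirrors the dictionary above; the types, sizes, and defaults are placeholders rather than the final migration.

<pre><nowiki>
# Rough sketch of a sqlalchemy-migrate script adding the proposed attributes
# as columns of instance_types.  Types and sizes are placeholders.
import migrate.changeset  # noqa: enables Table.create_column/drop_column
from sqlalchemy import Column, Integer, MetaData, String, Table

STRING_COLUMNS = ('cpu_arch', 'cpu_info', 'xpu_arch', 'xpu_info', 'net_arch')
INTEGER_COLUMNS = ('xpus', 'net_mbps')


def upgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    instance_types = Table('instance_types', meta, autoload=True)
    for name in STRING_COLUMNS:
        instance_types.create_column(Column(name, String(255)))
    for name in INTEGER_COLUMNS:
        instance_types.create_column(Column(name, Integer()))


def downgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    instance_types = Table('instance_types', meta, autoload=True)
    for name in STRING_COLUMNS + INTEGER_COLUMNS:
        instance_types.drop_column(name)
</nowiki></pre>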
 
 
  
 
== Implementation ==
 
 
The USC-ISI team has a functional prototype:
 
 
https://code.launchpad.net/~usc-isi/nova/hpc-trunk
 
 
 
 
=== UI Changes ===
 
 
The following will be available as new default instance types.
 
  
 
== GPUs (NVIDIA Teslas) ==
 
 
Available resources per physical node: 8 cores, 20 GB RAM (24 GB - 4 GB), and 900 GB of disk (1000 GB - 100 GB). These match the non-GPU small, medium, large, xlarge, 2xlarge, and 4xlarge definitions. In addition, cg1.2xlarge matches the Amazon GPU node definition. The cpu_arch is "x86_64" and the xpu_arch is "fermi".
 
 
=== GPU small ===
 
 
* API name: '''cg1.small'''
 
* 1 Fermi GPU
 
* 2 GB RAM (2048 MB)
 
* 1 virtual core
 
* 20 GB of instance storage
 
 
=== GPU medium ===
 
 
* API name: '''cg1.medium'''
 
* 1 Fermi GPU
 
* 4 GB RAM (4096 MB)
 
* 2 virtual cores
 
* 40 GB of instance storage
 
 
=== GPU large ===
 
 
* API name: '''cg1.large'''
 
* 1 Fermi GPU
 
* 8 GB RAM (8192 MB)
 
* 4 virtual cores
 
* 80 GB of instance storage
 
 
=== GPU xlarge ===
 
 
* API name: '''cg1.xlarge'''
 
* 1 Fermi GPU
 
* 16 GB RAM (16384 MB)
 
* 8 virtual cores
 
* 160 GB of instance storage
 
 
=== GPU 2xlarge ===
 
 
* API name: '''cg1.2xlarge'''
 
* 2 Fermi GPUs
 
* 16 GB RAM (16384 MB)
 
* 8 virtual cores
 
* 320 GB of instance storage
 
 
=== GPU 4xlarge ===
 
 
* API name: '''cg1.4xlarge'''
 
* 2 Fermi GPUs
 
* 22 GB RAM (22000 MB)
 
* 8 virtual cores
 
* 1.6 TB (1690 GB) of instance storage
 
 
=== GPU 8xlarge ===
 
 
* API name: '''cg1.8xlarge'''
 
* 4 Fermi GPUs
 
* 22 GB RAM (22000 MB)
 
* 8 virtual cores
 
* 1.6 TB (1690 GB) of instance storage
 
  
 
==== Code Changes ====
 
  
* Added nova/virt/gpu/driver.py
** Inherits [[LibvirtDriver]] and extends a few methods to provision GPUs
** Adds a few flags to describe the GPU architecture, number of GPUs, device IDs, etc.
* Added nova/virt/gpu/utils.py
** GPU provisioning routines
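A skeleton of how such a driver might be structured is sketched below. It is illustrative only: the option names, the GPULibvirtDriver class name, and the hypothetical gpu_utils helper mentioned in the comments are assumptions, not the actual contents of nova/virt/gpu/driver.py.

<pre><nowiki>
# Illustrative skeleton only; option names and method signatures are
# assumptions based on the Grizzly-era libvirt driver.
from oslo.config import cfg

from nova.virt.libvirt import driver as libvirt_driver

gpu_opts = [
    cfg.StrOpt('gpu_arch', default='fermi',
               help='GPU architecture exposed by this compute node'),
    cfg.IntOpt('gpus', default=2,
               help='Number of GPUs available on this compute node'),
    cfg.ListOpt('gpu_dev_ids', default=['0', '1'],
                help='Device indices of the GPUs handed to instances'),
]

CONF = cfg.CONF
CONF.register_opts(gpu_opts)


class GPULibvirtDriver(libvirt_driver.LibvirtDriver):
    """libvirt/LXC driver that also provisions GPUs into instances."""

    def spawn(self, context, instance, image_meta, injected_files,
              admin_password, network_info=None, block_device_info=None):
        # Let the normal libvirt/LXC spawn path run first ...
        super(GPULibvirtDriver, self).spawn(
            context, instance, image_meta, injected_files, admin_password,
            network_info=network_info, block_device_info=block_device_info)
        # ... then mark the requested GPUs as allocated and create their
        # device nodes inside the container (see nova/virt/gpu/utils.py,
        # e.g. a hypothetical gpu_utils.assign_gpus(instance) helper).

    def destroy(self, instance, network_info, block_device_info=None,
                destroy_disks=True):
        # Return this instance's GPUs to the free pool, then tear down.
        super(GPULibvirtDriver, self).destroy(
            instance, network_info, block_device_info, destroy_disks)
</nowiki></pre>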
  
 
=== Migration ===
 
  
Migration may not work unless the target host has GPUs available at the same device indices.
  
 
== Test/Demo Plan ==
 
This need not be added or completed until the specification is nearing beta.
  
 
== Unresolved issues ==
 
 
One challenge is that the flavorid field in the instance_types table is not auto-incremented. We have selected high numbers to avoid collisions, but the community should discuss how flavorid behaves and the best approach for adding new instance types in the future.
 
 
A second issue is that gVirtuS (used for the KVM-based approach) currently requires a virtual serial port for VM<->host initialization. This forces us to use the serial port that is otherwise used by the Ajax term, so VMs using the GPUs currently cannot start an Ajax console.
 
  
 
== BoF agenda and discussion ==
 
Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.
