HeterogeneousGpuAcceleratorSupport
__NOTOC__
 
* '''Launchpad Entry''': NovaSpec:heterogeneous-gpu-accelerator-support
 
* '''Created''': [https://launchpad.net/~bfschott Brian Schott]
 
* '''Current maintainer''': [https://launchpad.net/~jwalters-isi John Paul Walters]
 
== Summary ==
 
  
This blueprint proposes to add support for GPU-accelerated machines as an alternative machine type in [[OpenStack]].  
  
The target release for this is Grizzly.  We plan to have a stable branch at https://code.launchpad.net/~usc-isi/nova/hpc-testing.
  
 
The USC-ISI team has a functional prototype here:
 
* https://code.launchpad.net/~usc-isi/nova/hpc-trunk
 
* https://code.launchpad.net/~usc-isi/nova/hpc-testing (most stable)
 
  
 
This blueprint is related to the [[HeterogeneousInstanceTypes]] blueprint here:
 
 
* http://wiki.openstack.org/HeterogeneousInstanceTypes
 
 
We are also drafting blueprints for other machine types:
 
* http://wiki.openstack.org/HeterogeneousSgiUltraVioletSupport
 
* http://wiki.openstack.org/HeterogeneousTileraSupport
 
  
 
An etherpad for discussion of this blueprint is available at http://etherpad.openstack.org/heterogeneousultravioletsupport
 
== Release Note ==

Nova has been extended to make NVIDIA GPUs available to provisioned instances for CUDA programming.

== Rationale ==

See [[HeterogeneousInstanceTypes]].

The goal of this blueprint is to allow GPU-accelerated computing in OpenStack.
 
== User stories ==
 
  
Jackie has a CUDA-accelerated application and wants to run it on an instance that has access to GPU hardware. She chooses a cg1.xlarge instance type, which provides access to two NVIDIA Fermi GPUs:
  
 
<pre><nowiki>
$ nova flavor-list
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+----------------------------------------------+
| ID | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public | extra_specs                                  |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+----------------------------------------------+
| 9  | cg1.xlarge | 16384     | 160  | 0         |      | 8     | 1.0         | True      | {u'hypervisor': u's== LXC', u'gpus': u'= 2', u'gpu_arch':u's== fermi'} |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+----------------------------------------------+

$ nova boot --flavor 9 --key-name mykey --image 2b1509fe-b573-488a-be4d-d61d25c7ab4f  gpu_test
 
</nowiki></pre>
 
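For deployments that script their configuration, the same flavor could also be created programmatically. The following is a minimal sketch using python-novaclient; the credentials, Keystone URL, and the exact extra_specs keys ('hypervisor', 'gpus', 'gpu_arch') are assumptions taken from the example output above, not a fixed interface.

<pre><nowiki>
# Minimal sketch (not part of the blueprint branch): create the cg1.xlarge
# flavor shown above and attach its GPU extra_specs via python-novaclient.
# The credentials and Keystone URL are placeholders.
from novaclient.v1_1 import client

nova = client.Client("admin", "secretpassword", "admin",
                     "http://keystone.example.com:5000/v2.0/")

# 16 GB RAM, 8 VCPUs, 160 GB disk, flavorid 9, as in the flavor-list output.
flavor = nova.flavors.create(name="cg1.xlarge", ram=16384, vcpus=8,
                             disk=160, flavorid=9)

# extra_specs the scheduler and GPU driver can match on.
flavor.set_keys({"hypervisor": "s== LXC",
                 "gpus": "= 2",
                 "gpu_arch": "s== fermi"})
</nowiki></pre>

The 's==' and '=' prefixes are comparison operators understood by Nova's extra_specs matching rather than literal parts of the values.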
  
 
== Assumptions ==
 
  
The only approach that has been successful for CUDA access from a KVM virtual machine that we know of is gVirtuS [http://osl.uniparthenope.it/projects/gvirtus/].

Here we propose direct access to GPUs from LXC instances.

We assume that the host system's kernel supports 'lxc-attach' and that the 'lxc-attach' utilities are installed.
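A compute host could sanity-check part of this assumption before advertising GPU support. The snippet below is only an illustrative sketch (the blueprint does not prescribe such a check), and it only verifies that the utility is installed, not that the running kernel supports it.

<pre><nowiki>
# Illustrative sketch only: check that the lxc-attach utility is present on
# the compute host before enabling GPU-backed LXC instances.
from distutils.spawn import find_executable

def lxc_attach_installed():
    # Returns the full path to lxc-attach if it is on $PATH, else None.
    return find_executable("lxc-attach") is not None

if __name__ == "__main__":
    if lxc_attach_installed():
        print("lxc-attach found; GPU passthrough into LXC containers is possible")
    else:
        print("lxc-attach not found; install the lxc utilities first")
</nowiki></pre>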
  
 
== Design ==
 
  
We have a new nova.virt.GPULibvirt driver, an extension of nova.virt.libvirt, that instantiates a GPU-enabled virtual machine when requested:

* When an instance is spawned (or rebooted), nova starts an LXC VM.
* The requested GPU(s) are marked as allocated and their device nodes are created inside the LXC container using 'lxc-attach' (a rough sketch follows this list).
* Access permission to the GPU(s) is added to /cgroup.
* Boot finalizes.
* When an instance is terminated (destroyed), the GPU(s) are deallocated.
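The allocation step could look roughly like the sketch below. It is an illustration of the list above rather than the code in the branch: the NVIDIA device major number, the /dev/nvidiactl minor number, the container name, and the /cgroup layout are assumptions about a typical host, and real code would also record the allocation in the database.

<pre><nowiki>
# Rough sketch of the per-instance GPU provisioning step described above.
# Assumes NVIDIA character devices /dev/nvidia<N> (major 195) plus
# /dev/nvidiactl (minor 255), an LXC container named after the instance,
# and the container's device cgroup living under /cgroup/lxc/<name>.
import subprocess

NVIDIA_MAJOR = 195

def run(cmd):
    # Helper: run a command and raise if it fails.
    subprocess.check_call(cmd)

def provision_gpu(container, gpu_index):
    # 1. Allow the container's device cgroup to access the GPU character
    #    devices (read/write/mknod).
    devices_allow = "/cgroup/lxc/%s/devices.allow" % container
    for minor in (gpu_index, 255):          # /dev/nvidiaN and /dev/nvidiactl
        with open(devices_allow, "a") as f:
            f.write("c %d:%d rwm\n" % (NVIDIA_MAJOR, minor))

    # 2. Create the device nodes inside the running container with lxc-attach.
    run(["lxc-attach", "-n", container, "--",
         "mknod", "-m", "666", "/dev/nvidia%d" % gpu_index,
         "c", str(NVIDIA_MAJOR), str(gpu_index)])
    run(["lxc-attach", "-n", container, "--",
         "mknod", "-m", "666", "/dev/nvidiactl",
         "c", str(NVIDIA_MAJOR), "255"])
</nowiki></pre>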
  
 
=== Schema Changes ===
 
 
See [[HeterogeneousInstanceTypes]].
 
 
We propose adding the following default values to the instance_types table:
 
 
 
<pre><nowiki>
    # x86+GPU
    # TODO: we need to identify a machine-readable string for xpu arch
    'cg1.small': dict(memory_mb=2048, vcpus=1, local_gb=20,
                      flavorid=100,
                      cpu_arch="x86_64", xpu_arch="fermi", xpus=1),
    'cg1.medium': dict(memory_mb=4096, vcpus=2, local_gb=40,
                       flavorid=101,
                       cpu_arch="x86_64", xpu_arch="fermi", xpus=1),
    'cg1.large': dict(memory_mb=8192, vcpus=4, local_gb=80,
                      flavorid=102,
                      cpu_arch="x86_64", xpu_arch="fermi", xpus=1,
                      net_mbps=1000),
    'cg1.xlarge': dict(memory_mb=16384, vcpus=8, local_gb=160,
                       flavorid=103,
                       cpu_arch="x86_64", xpu_arch="fermi", xpus=1,
                       net_mbps=1000),
    'cg1.2xlarge': dict(memory_mb=16384, vcpus=8, local_gb=320,
                        flavorid=104,
                        cpu_arch="x86_64", xpu_arch="fermi", xpus=2,
                        net_mbps=1000),
    'cg1.4xlarge': dict(memory_mb=22000, vcpus=8, local_gb=1690,
                        flavorid=105,
                        cpu_arch="x86_64", cpu_info='{"model":"Nehalem"}',
                        xpu_arch="fermi", xpus=2,
                        xpu_info='{"model":"Tesla 2050", "gcores":"448"}',
                        net_arch="ethernet", net_mbps=10000),
    'cg1.8xlarge': dict(memory_mb=22000, vcpus=8, local_gb=1690,
                        flavorid=106,
                        cpu_arch="x86_64", cpu_info='{"model":"Nehalem"}',
                        xpu_arch="fermi", xpus=4,
                        xpu_info='{"model":"Tesla 2050", "gcores":"448"}',
                        net_arch="ethernet", net_mbps=10000),
</nowiki></pre>
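If these attributes are added as columns of instance_types (see [[HeterogeneousInstanceTypes]]), the corresponding sqlalchemy-migrate script could look roughly like the sketch below. The column list mirrors the dictionary above; the types, sizes, and defaults are placeholders rather than the final migration.

<pre><nowiki>
# Rough sketch of a sqlalchemy-migrate script adding the proposed attributes
# as columns of instance_types.  Types and sizes are placeholders.
import migrate.changeset  # noqa: enables Table.create_column/drop_column
from sqlalchemy import Column, Integer, MetaData, String, Table

STRING_COLUMNS = ('cpu_arch', 'cpu_info', 'xpu_arch', 'xpu_info', 'net_arch')
INTEGER_COLUMNS = ('xpus', 'net_mbps')


def upgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    instance_types = Table('instance_types', meta, autoload=True)
    for name in STRING_COLUMNS:
        instance_types.create_column(Column(name, String(255)))
    for name in INTEGER_COLUMNS:
        instance_types.create_column(Column(name, Integer()))


def downgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    instance_types = Table('instance_types', meta, autoload=True)
    for name in STRING_COLUMNS + INTEGER_COLUMNS:
        instance_types.drop_column(name)
</nowiki></pre>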
 
 
  
 
== Implementation ==
 
 
The USC-ISI team has a functional prototype:
 
 
https://code.launchpad.net/~usc-isi/nova/hpc-trunk
 
 
 
 
=== UI Changes ===
 
 
The following will be available as new default instance types.
 
  
 
== GPUs (NVIDIA Teslas) ==
 
 
Available resources per physical node: 8 cores, 20 GB RAM (24 GB - 4 GB), and 900 GB of disk (1000 GB - 100 GB). These match the non-GPU small, medium, large, xlarge, 2xlarge, and 4xlarge definitions. In addition, cg1.2xlarge matches the Amazon GPU node definition. The cpu_arch is "x86_64" and the xpu_arch is "fermi".
 
 
=== GPU small ===
 
 
* API name: '''cg1.small'''
 
* 1 Fermi GPU
 
* 2 GB RAM (2048 MB)
 
* 1 virtual core
 
* 20 GB of instance storage
 
 
=== GPU medium ===
 
 
* API name: '''cg1.medium'''
 
* 1 Fermi GPU
 
* 4 GB RAM (4096 MB)
 
* 2 virtual cores
 
* 40 GB of instance storage
 
 
=== GPU large ===
 
 
* API name: '''cg1.large'''
 
* 1 Fermi GPU
 
* 8 GB RAM (8192 MB)
 
* 4 virtual cores
 
* 80 GB of instance storage
 
 
=== GPU xlarge ===
 
 
* API name: '''cg1.xlarge'''
 
* 1 Fermi GPU
 
* 16 GB RAM (16384 MB)
 
* 8 virtual cores
 
* 160 GB of instance storage
 
 
=== GPU 2xlarge ===
 
 
* API name: '''cg1.2xlarge'''
 
* 2 Fermi GPUs
 
* 16 GB RAM (16384 MB)
 
* 8 virtual cores
 
* 320 GB of instance storage
 
 
=== GPU 4xlarge ===
 
 
* API name: '''cg1.4xlarge'''
 
* 2 Fermi GPUs
 
* 22 GB RAM (22000 MB)
 
* 8 virtual cores
 
* 1.6 TB (1690 GB) of instance storage
 
 
=== GPU 8xlarge ===
 
 
* API name: '''cg1.8xlarge'''
 
* 4 Fermi GPUs
 
* 22 GB RAM (22000 MB)
 
* 8 virtual cores
 
* 1.6 TB (1690 GB) of instance storage
 
  
 
==== Code Changes ====
 
  
* Added nova/virt/gpu/driver.py
** Inherits [[LibvirtDriver]] and extends a few methods to provision GPUs
** Adds a few flags to describe the GPU architecture, number of GPUs, device IDs, etc.
* Added nova/virt/gpu/utils.py
** GPU provisioning routines
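A skeleton of how such a driver might be structured is sketched below. It is illustrative only: the option names, the GPULibvirtDriver class name, and the hypothetical gpu_utils helper mentioned in the comments are assumptions, not the actual contents of nova/virt/gpu/driver.py.

<pre><nowiki>
# Illustrative skeleton only; option names and method signatures are
# assumptions based on the Grizzly-era libvirt driver.
from oslo.config import cfg

from nova.virt.libvirt import driver as libvirt_driver

gpu_opts = [
    cfg.StrOpt('gpu_arch', default='fermi',
               help='GPU architecture exposed by this compute node'),
    cfg.IntOpt('gpus', default=2,
               help='Number of GPUs available on this compute node'),
    cfg.ListOpt('gpu_dev_ids', default=['0', '1'],
                help='Device indices of the GPUs handed to instances'),
]

CONF = cfg.CONF
CONF.register_opts(gpu_opts)


class GPULibvirtDriver(libvirt_driver.LibvirtDriver):
    """libvirt/LXC driver that also provisions GPUs into instances."""

    def spawn(self, context, instance, image_meta, injected_files,
              admin_password, network_info=None, block_device_info=None):
        # Let the normal libvirt/LXC spawn path run first ...
        super(GPULibvirtDriver, self).spawn(
            context, instance, image_meta, injected_files, admin_password,
            network_info=network_info, block_device_info=block_device_info)
        # ... then mark the requested GPUs as allocated and create their
        # device nodes inside the container (see nova/virt/gpu/utils.py,
        # e.g. a hypothetical gpu_utils.assign_gpus(instance) helper).

    def destroy(self, instance, network_info, block_device_info=None,
                destroy_disks=True):
        # Return this instance's GPUs to the free pool, then tear down.
        super(GPULibvirtDriver, self).destroy(
            instance, network_info, block_device_info, destroy_disks)
</nowiki></pre>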
  
 
=== Migration ===
 
  
Migration may not work unless the target host has GPUs available at the same device indices.
  
 
== Test/Demo Plan ==
 
This need not be added or completed until the specification is nearing beta.
  
 
== Unresolved issues ==
 
 
One challenge is that the flavorid field in the instance_types table is not auto-incremented. We have selected high numbers to avoid collisions, but the community should discuss how flavorid behaves and the best approach for adding new instance types in the future.
 
 
A second issue is that gVirtuS (used for the KVM-based approach) currently requires a virtual serial port for VM<->host initialization. This forces us to use the serial port that is otherwise used by the Ajax term, so VMs using the GPUs currently cannot start an Ajax console.
 
  
 
== BoF agenda and discussion ==
 
Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.
