Meetings/Passthrough

Agenda on Jan 15th, 2014

  • Ian's proposal for flavor and backend tagging: https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit#
  • Review this proposal against known use cases
  • Document any use cases not covered in that document

Agenda on Jan. 14th, 2014

  • PCI group versus PCI flavor: let's sort out what exactly they are, APIs around them, and pros and cons of each.
  • Please check [1]
  • Division of work

POC Implementation

See POC implementation

Definitions

A specific PCI attachment (could be a virtual function) is described by:

  • vendor_id
  • product_id
  • address

There is a whitelist (at the moment):

  • which devices on a specific hypervisor host can be exposed

There is an alias (at the moment):

  • groupings of PCI devices
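
For reference, the whitelist and alias as they exist today are nova.conf options on the compute node. The snippet below is illustrative only (vendor and product IDs are placeholders), roughly along the lines of the current options:
    # whitelist: expose matching devices on this compute node
    pci_passthrough_whitelist = [{"vendor_id": "8086", "product_id": "1520"}]
    # alias: give a grouping of devices a name that can be referenced elsewhere
    pci_alias = {"name": "large_GPU", "vendor_id": "10de", "product_id": "0001"}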


The user view of the system

For GPU passthrough, we need things like:

  • user requests a "large" GPU
  • could be from various vendors or product versions

For network, we need things like:

  • user requests a nic for a specific neutron network
  • they want to say whether it's virtual (the default type) or passthrough (super fast, slow, etc.)
  • this includes grouping by address (virtual function, etc.), so a device is tied to a particular _group_ of neutron networks, each with its own configuration (e.g. a VLAN id, or a NIC attached to a specific provider network)
  • or it involves a NIC that can be programmatically made to attach to a specific neutron network

The user view of requesting things

For GPU passthrough:

  • user requests a flavor; the flavor's extra specs *imply* which possible PCI devices can be connected
  • nova boot --image some_image --flavor flavor_that_has_big_GPU_attached some_name

The admin would expose a flavor that gives you, for example, one large GPU and one small GPU:

  • nova flavor-key m1.large set "pci_passthrough:alias"="large_GPU:1,small_GPU:1"
  • TODO - this may change in the future
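
As a hedged illustration of how the names above might be backed on the compute nodes, the large_GPU and small_GPU aliases could be defined with the existing pci_alias option (device IDs here are invented):
    # hypothetical alias definitions backing the flavor extra spec above
    pci_alias = {"name": "large_GPU", "vendor_id": "10de", "product_id": "0001"}
    pci_alias = {"name": "small_GPU", "vendor_id": "10de", "product_id": "0002"}
A boot against that flavor then asks the scheduler for one device matching each alias.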


For SRIOV:

  • in the most basic case, the user may be given direct access to a network card, just like we do with GPU, but this is less useful than...
  • user requests neutron nics, on specific neutron networks, but connected in a specific way (i.e. high speed SRIOV vs virtual)
  • note that some of the nics may be virtual, some may be passthrough, and some might be a different type of passthrough
  • nova boot --flavor m1.large --image <image_id> --nic net-id=<net-id>,nic-type=<slow | fast | foobar> <vm-name>
  • (where slow is a virtual connection, fast is a PCI passthrough, and foobar is some other type of PCI passthrough)
  • consider several nics, of different types: nova boot --flavor m1.large --image <image_id> --nic net-id=<net-id-1> --nic net-id=<net-id-2>,nic-type=fast --nic net-id=<net-id-3>,nic-type=faster <vm-name>
  • when hot-plugging/hot-unplugging, we also need to specify vnic-type in a similar way
  • also, this should work: nova boot --flavor m1.large --image <image_id> --nic port-id=<port-id>, given a port created with:
  • neutron port-create --fixed-ip subnet_id=<subnet-id>,ip_address=192.168.57.101 <net-id> --nic-type=<slow | fast | foobar>

TODO: this needs agreement, but one idea for the admin configuration...

  • pci_alias_1='{"name":"Cisco.VIC", "devices":[{"vendor_id":"1137", "product_id":"0071", "address":"*", "attach-type":"macvtap"}], "nic-type":"fast"}'
  • pci_alias_2='{"name":"Fast", "devices":[{"vendor_id":"1137", "product_id":"0071", "address":"*", "attach-type":"direct"}, {"vendor_id":"123", "product_id":"0081", "address":"*", "attach-type":"macvtap"}], "nic-type":"faster"}'

New Proposal for admin view

Whitelist:

  • only certain devices exposed to Nova
  • just a list of addresses that are allowed (including wildcards)
  • by default, nothing is allowed
  • this is assumed to be (mostly) static for the lifetime of the machine
  • contained in nova.conf
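
The exact option name and syntax are still open, but an address-based whitelist entry in nova.conf might look something like this (purely illustrative):
    # allow one specific virtual function, plus every function on one slot (wildcard)
    pci_passthrough_whitelist = [{"address": "0000:0a:00.1"}, {"address": "0000:08:00.*"}]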

PCI flavors:

  • specify groups of PCI devices, to be used in Neutron port descriptions or Server flavor extra specs
  • configured using host aggregates API:
    • a combination of whitelist, alias and group
    • raw device passthrough (grouped by device_id and product_id)
    • network device passthrough (grouped by device address also)
    • note: there might be several options for each (GPU v3 and GPU v4 in a single flavor)
  • only hosts in the aggregate will be considered by the scheduler for each PCI flavor
  • these are shared across the whole child cell (or, if no cells are used, the whole nova deployment)
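
Since the proposal drives PCI flavors through the host aggregates API, the admin workflow might look roughly like the following sketch; the metadata key and value format are invented here for illustration:
    # create an aggregate for the hosts that carry the fast NICs and add a host to it
    nova aggregate-create fast-nic-hosts
    nova aggregate-add-host fast-nic-hosts compute-01
    # attach the PCI flavor definition as aggregate metadata (hypothetical key/value)
    nova aggregate-set-metadata fast-nic-hosts pci_flavor:Fast='{"vendor_id":"1137","product_id":"0071"}'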

Scheduler updates:

  • on periodic update, report current status of devices
  • if any devices are in the whitelist, look up host aggregates to check what device types to report
  • report the number of free devices per PCI flavor
  • device usage tracked by resource manager as normal, looking at all devices in whitelist
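
For illustration only, the per-host stats sent to the scheduler could reduce to a count of free devices per PCI flavor, e.g. something shaped like:
    {"large_GPU": 1, "small_GPU": 2, "Fast": 8}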

On attach of PCI device:

  • scheduler picks a host it thinks has a free device
  • check with the resource manager in the usual way
  • assign device to VM
  • ignoring migration for now

On attach of VIF device (through boot or otherwise):

  • TBD... very sketchy plan...
  • ideally the neutron port contains the associated PCI flavor/alias, or it's assumed to be a virtual port
  • neutron supplies the usual information, VLAN-id, etc
  • neutron and nova negotiate which VIF driver to use, in the usual way, given extra info about nic-type from the PCI alias settings, etc.
  • the VIF driver is given a hypervisor-agnostic library to attach the PCI device, extracted from Nova's existing PCI-device attach code
  • the VIF driver is free to configure the specific PCI device before attaching it, using the callback into the Nova driver (or by modifying Nova code to extend the create API)

Agenda on Jan. 8th, 2014

Let's go over the key concepts and use cases. In the use cases, neutron or neutron plugin specific configurations are not mentioned.

Key Concepts

  • PCI Groups
  1. A PCI group is a collection of PCI devices that share the same functions or belong to the same subsystem in a cloud.

In fact, two proposals exist for PCI group definition - via API, with the implication that they're stored centrally in the database, and via config, specifically a (match expression -> PCI group name) in the compute node configuration. A competing proposal is PCI aliases, which work on the current assumption that all PCI device data is returned to the database and PCI devices can be selected by doing matching at schedule time and thus a name -> match expression mapping is all that need be saved. Thus the internal question of "should all device information be returned to the controller" drives some of the design options.

  1. it's worth mentioning that using an API to define PCI groups makes them owned by the tenant who creates them.
  • Pre-defined PCI Groups
For each PCI device class that OpenStack supports, a PCI group is defined and associated with the PCI devices belonging to that device class. For example, for the PCI device class net, there is a predefined PCI group named net.
  • User-defined PCI Groups
Users can define PCI groups using a Nova API.
  • PCI Passthrough List (whitelist)
  1. Specified on a compute node to define all the PCI passthrough devices and their associated PCI groups that are available on the node.
  2. a blacklist (exclude list) may be added later if deemed necessary.
  • vnic_type:
  1. virtio: a virtual port that is attached to a virtual switch
  2. direct: SRIOV without macvtap
  3. macvtap: SRIOV with macvtap

This configuration item is not essential to PCI passthrough. It's also a Neutron configuration item.

  • nova boot: new parameters in --nic option
  1. vnic-type="vnic" | "direct" | "macvtap"
  2. pci-group=pci-group-name
  3. port-profile=port-profile-name (this property is not directly related to the use of PCI passthrough for networks; it is a requirement of 802.1BR-based systems)
  • neutron port-create: new arguments
  1. --vnic-type "vnic" | "direct" | "macvtap"
  2. --pci-group pci-group-name
  3. --port-profile port-profile-name
  • Nova SRIOV Configuration
  1. vnic_type = <vnic-type>: specified on the controller node to indicate the default vnic-type that VMs will be booted with. Default value is "vnic".
  2. sriov_auto_all = <on | off>: specified on compute nodes to indicate that all SRIOV-capable ports are added into the 'net' PCI group.
  3. sriov_only = <on | off>: specified on compute nodes to indicate that nova can only place VMs with SRIOV vnics onto these nodes. Default value is on for nodes with SRIOV ports.
  4. sriov_pci_group = <pci-group-name>: specified on compute nodes whose SRIOV ports all belong to a single PCI group.

The SRIOV configuration items are enhancements to the base proposal that make it much easier to configure compute hosts where it is known that all VFs will be available to cloud users.
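
To make the proposed items concrete, compute-node configuration under this proposal might look roughly like the sketches below; the option names follow the proposal above and the exact syntax is illustrative only:
    # all-SRIOV node: every SRIOV VF goes into the predefined 'net' PCI group
    sriov_auto_all = on
    sriov_only = on

    # node whose SRIOV ports all belong to one named group
    sriov_auto_all = on
    sriov_pci_group = phynet1_group

    # controller: default vnic type for VMs that do not request one explicitly
    vnic_type = vnic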

Use Cases

These use cases do not include non-network passthrough cases.

  • SRIOV based cloud
  1. All the compute nodes are identical and all the NICs are SRIOV based
  2. All the NICs are connected to the same physical network
In this cloud, the admin only needs to specify vnic_type=direct on the controller and sriov_auto_all=on on the compute nodes in the nova configuration file. In addition, the new arguments introduced in the nova boot command are not required.
  • A cloud with mixed Vnics
  1. On compute nodes with sriov ports only, set sriov_auto_all = on
  2. On compute nodes without sriov ports, no change is required
In such a cloud, when booting a VM with sriov vnic, the nova boot command would look like:
   nova boot --flavor m1.large --image <image_id>
                          --nic net-id=<net-id>,vnic-type=direct <vm-name>
This will require only minimal changes to existing applications.
  • A Cloud that requires multiple SRIOV PCI groups
  1. create all the pci-groups in the cloud by invoking a Nova API
  2. on compute nodes that support a single pci group and in which all of the SRIOV ports belong to this group, set sriov_auto_all=on, sriov_pci_group=<group_name>
  3. on compute nodes that support multiple pci groups, define the pci-passthrough-list
In such a cloud, when booting a VM with sriov macvtap, the nova boot command would look like:
    nova boot --flavor m1.large --image <image_id> 
                   --nic net-id=<net-id>,vnic-type=macvtap,pci-group=<group-name> <vm-name>
  • Introducing new compute nodes with SRIOV into an existing cloud
Depending on the cloud and the compute node being introduced:
  1. it could be as simple as adding sriov_auto_all=on into the nova config file
  2. it could be setting sriov_auto_all=on and pci_group=<group_name>
  3. it could be defining the pci-passthrough-list.
  • NIC hot plug

Evolving Design Doc

https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs/edit?usp=sharing

Ian typed up a complete proposal in two sections in that document, which is pulled out here: https://docs.google.com/document/d/1svN89UXKbFoka0EF6MFUP6OwdNvhY4OkdjEZN-rD-0Q/edit# - this proposal takes the 'PCI groups via compute node config' approach and makes no attempt at proposing APIs.

Previous Meetings

http://eavesdrop.openstack.org/meetings/pci_passthrough_meeting/2013/pci_passthrough_meeting.2013-12-24-14.02.log.html
http://eavesdrop.openstack.org/meetings/pci_passthrough/

Meeting log on Dec. 17th, 2013

Meetings/Passthrough/dec-17th-2013.log