
PCI passthrough SRIOV support

Revision as of 15:35, 11 December 2013 by Yongli.he@intel.com (admin check available pci flavor (white-list))

background

PCI devices have not only standard PCI properties such as BDF (bus:device.function address) and vendor_id, but also extra information that may be application specific, for example the network switch a NIC is attached to, or the resolution of a GPU. This information cannot be obtained through the hypervisor, and may be provided externally, for example through a configuration file.

Currently, Nova's PCI support has basic support for such extra information in the database and object layers. More work is needed, including reading this information from the configuration file and grouping devices that share the same extra-information values.

This design is based on the parts of the following discussion document on which agreement was reached: https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs/edit?pli=1#heading=h.30de7p6sgoxp

Link back to the blueprint: https://blueprints.launchpad.net/nova/+spec/pci-extra-info


PCI configuration API use cases

To get a better separation between user and admin, and to remove redundant code from the PCI module, the alias will fade out and the white-list will be used to map devices to a pci-flavor: the group used for scheduling and configuration. This approach keeps the possibility of taking advantage of host aggregates.


User will see flavors like:

  • flavor that gives you a cheap GPU
  • flavor that gives you a big GPU
  • flavor that gives you two SSD disks (of varying types depending on where it lands) and a big GPU
  • flavor that gives you your public network via SRIOV (which could be one of several makes of network card, depending on the host picked)

And for the admin...

Admin sees:

  • per host devices (adding things to os-host/<host>/pci-device)
    • lists pci devices present
    • lists pci devices that are exposed to users, and which are in use or free
  • per pci-flavor
    • creates a pci-device description
    • specifies vendor-id, address, uuid, name, etc
    • this used to be a combination of alias and whitelist
    • descriptions may overlap
  • flavor extra specs
    • this has entries that describe:
      • list of possible pci-flavors that could be picked
      • use the key pci-passthrough:<label_just_to_make_unique> with the value <pci-flavor-uuid-1>,<pci-flavor-uuid-2>
    • for multiple devices, you just add multiple entries
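As a rough illustration of the extra-spec key format above, the following Python sketch turns `pci-passthrough:<label>` entries into a list of requests, one per key. It assumes extra specs arrive as a plain dict of strings and that each value is a comma-separated list of pci-flavor UUIDs; the function name `parse_pci_extra_specs` is hypothetical and not part of Nova.

```python
# Hypothetical helper; not part of Nova. Assumes extra specs are a dict of
# strings and that values are comma-separated pci-flavor UUIDs.
PREFIX = "pci-passthrough:"


def parse_pci_extra_specs(extra_specs):
    """Return one request per 'pci-passthrough:<label>' key.

    Each request is the list of candidate pci-flavor UUIDs; any one of them
    may satisfy it. The label only keeps keys unique, so requests are
    returned in key order to make the output stable.
    """
    requests = []
    for key in sorted(extra_specs):
        if not key.startswith(PREFIX):
            continue  # unrelated extra spec
        if not key[len(PREFIX):]:
            raise ValueError("missing label in key %r" % key)
        candidates = [u.strip() for u in extra_specs[key].split(",") if u.strip()]
        if not candidates:
            raise ValueError("no pci-flavor listed for key %r" % key)
        requests.append(candidates)
    return requests


if __name__ == "__main__":
    specs = {
        "pci-passthrough:gpu": "uuid-big-gpu",
        "pci-passthrough:nic": "uuid-intel-nic,uuid-mellanox-nic",
        "some:other_key": "ignored",
    }
    print(parse_pci_extra_specs(specs))
    # prints [['uuid-big-gpu'], ['uuid-intel-nic', 'uuid-mellanox-nic']]
```

Multiple devices are requested simply by adding more labelled keys, which matches the "add multiple entries" rule.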

Take advantage of host aggregate:

  • host aggregates used to map hosts to pci-flavor
    • use host aggregates to expose specific pci-flavors as available on a particular host

Use cases

admin check PCI devices present per host

The admin might want to know whether there are PCI devices available to use. Even though the admin should already know this, an inspection command is convenient. There are two possible ways to implement the inspection.

   nova host-list pci-device
   GET  v2/{tenant_id}/os-hosts/{host_name}/os-pci-devices

Returns summary information about the PCI devices on this host:

       {
           "os-pci-devices": [
               {
                   "vendor_id": "8086",
                   "product_id": "xxx",
                   "address": "0000:01:00.7",
                   "pci-type": "VF",
                   "status": "available"
               }
           ]
       }

To find which pci-flavor a device belongs to, and its status, we would have to query the database.

Currently, only the PCI devices that pass the whitelist filter are stored in the database. To inspect every device on a node from the database, all devices would have to be stored there; the database would then grow too large, eventually slowing down queries and becoming a scale problem. So we use an RPC call instead: fetch the result from the node via RPC and show it to the admin.

Review comments: (addressed, by yongli)

  • please remove pci-flavor here, it doesn't quite make sense here I feel, and even if it was here, it should have been a list.
admin check available pci flavor (white-list)
  • list all available pci flavors (whitelist)
   nova pci-flavor-list

   GET v2/{tenant_id}/os-pci-flavors
   data:
    {
        "os-pci-flavors": [
            {
                "UUID": "xxxx-xx-xx",
                "description": "xxxx",
                "vendor_id": "8086",
                "product_id": "xxx",
                "address": "0000:01:*.7",
                "pci-flavor": "xxx"
            }
        ]
    }

Review comments:

  • this should be called pci-flavor-list
  • I don't understand why this has all the details, is that not what the show command is for?
  • why is the host present here? even if it is here, surely it should be a list of hosts? (agreed: all flavors are global, so I removed it)
  • please look at other nova api calls, you are not following their style (see https://wiki.openstack.org/wiki/Nova/APIStyleGuide)

I hope that helps.

  • list available pci flavors on host (white list)
   nova host-list pci-flavor
   GET  v2/{tenant_id}/os-hosts/{host_name}/os-pci-flavors
   data:
    {
        "os-pci-flavors": [
            {
                "pci_flavor_uuid": <uuid>,
                "total": 10,
                "available": 6,
                "in_use": 2
            }
        ]
    }

Review comments: I would have expected more details about availability and fewer about the flavor; should it not be more like:

   [ { 'pci_flavor_uuid': <uuid>, total: 10, available: 6, in_use: 2}, {...}, ...]
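To make the reviewer's proposed per-host summary concrete, here is a small Python sketch that derives it from a list of device records. It assumes each device is a dict with a `pci_flavor_uuid` and a `status` of `available` or `allocated`; these field names and the function name `summarize_by_pci_flavor` are hypothetical. Devices in any other status only count toward `total`, which is one way the example's total can exceed available plus in_use.

```python
from collections import Counter


def summarize_by_pci_flavor(devices):
    """Build [{'pci_flavor_uuid', 'total', 'available', 'in_use'}, ...].

    Hypothetical input: dicts with 'pci_flavor_uuid' and 'status' keys.
    """
    totals, available, in_use = Counter(), Counter(), Counter()
    for dev in devices:
        uuid = dev["pci_flavor_uuid"]
        totals[uuid] += 1
        if dev["status"] == "available":
            available[uuid] += 1
        elif dev["status"] == "allocated":
            in_use[uuid] += 1
        # Any other status (e.g. claimed but not yet attached) only
        # counts toward the total.
    return [
        {"pci_flavor_uuid": uuid, "total": totals[uuid],
         "available": available[uuid], "in_use": in_use[uuid]}
        for uuid in sorted(totals)
    ]
```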
  • get detailed information about one pci-flavor:
    nova pci-flavor-show <white-list UUID>

   GET v2/{tenant_id}/os-pci-flavor/<UUID>
   data:
        {
            "os-pci-flavor": {
                "UUID": "xxxx-xx-xx",
                "description": "xxxx",
                "vendor_id": "8086",
                "product_id": "xxx",
                "address": "0000:01:*.7",
                "pci-flavor": "xxx"
            }
        }

Review comments: why is the host here? it shouldn't be here. Even if it should be, it would be a list.

admin create a pci flavor (white-list)
  1. create flavor
  nova pci-flavor-create name 'GetMePowerfulldevice' description "xxxxx"
  API:
  POST  v2/{tenant_id}/os-pci-flavors

  data:
      {
          "pci-flavor": {
              "name": "GetMePowerfulldevice",
              "description": "xxxxx"
          }
      }
  action: create a database entry for this flavor.

Reviewer comments: data payload should be more like other nova calls: {"pci-flavor": {'name':<name>, ... other data goes here...}}

  2. update flavor definition
   nova pci-flavor-update UUID set 'description'='xxxx' 'address'='0000:01:*.7' 'host'='compute-id'

    PUT v2/{tenant_id}/os-pci-flavors/<UUID>
    with data:
         {
             "action": "update",
             "pci-flavor": {
                 "description": "xxxx",
                 "address": "0000:01:*.7"
             }
         }
    action: set this as the new definition of the pci flavor.

Reviewer comments: again, payload should be more like { 'action': "update", 'pci-flavor': {...}}

Take advantage of host aggregate

Host aggregates can be used to enhance the PCI scheduler.

  • create aggregate
   nova aggregate-create pci-aware-group
   nova aggregate-add-host pci-aware-group host1
   nova aggregate-add-host pci-aware-group host2
  • map pci-flavors to aggregate
   nova aggregate-set-metadata pci-aware-group 'pci-flavor:a=1:intelNICpublic'
   nova aggregate-set-metadata pci-aware-group 'pci-flavor:b=1:intelNICprivate'
   nova aggregate-set-metadata pci-aware-group 'pci-flavor:c=1:nvidiaGPUnew,nvidiaGPUolder'
  • set flavor extra specs for the PCI-aware scheduler
   nova flavor-create --is-public true m1.iwantPCI 100 2048 20 2
   nova flavor-key 100 set 'pci-flavor=1:intelNICprivate, 1:intelNICprivate'

reviewer comments: I think I would prefer considering a flavor with one public nic, one private nic, and one GPU from a range of flavors:

   nova aggregate-set-metadata pci-aware-group 'pci-flavor:a=1:intelNICpublic'
   nova aggregate-set-metadata pci-aware-group 'pci-flavor:b=1:intelNICprivate'
   nova aggregate-set-metadata pci-aware-group 'pci-flavor:c=1:nvidiaGPUnew,nvidiaGPUolder'

Where a, b, c are arbitrary labels that make the keys unique. They are split into separate keys so that the values don't get too big.
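A minimal Python sketch of how a scheduler filter might read these labelled entries, assuming each value has the form `<count>:<name>[,<name>...]` where the names are alternative pci-flavors. The function names, the `free` map of free devices per pci-flavor on a host, and the greedy matching strategy are all illustrative assumptions, not the design's actual filter; the `vif:` form used in the SRIOV example is not handled.

```python
PREFIX = "pci-flavor:"


def parse_pci_flavor_metadata(metadata):
    """Parse {'pci-flavor:<label>': '<count>:<name>[,<name>...]'} entries.

    Returns {label: (count, [alternative pci-flavor names])}; keys without
    the prefix are ignored.
    """
    requests = {}
    for key, value in metadata.items():
        if not key.startswith(PREFIX):
            continue
        count_str, sep, names = value.partition(":")
        if not sep:
            raise ValueError("expected '<count>:<names>' in %r" % value)
        count = int(count_str)  # raises ValueError for e.g. 'vif'
        if count < 1:
            raise ValueError("count must be positive in %r" % value)
        alternatives = [n.strip() for n in names.split(",") if n.strip()]
        if not alternatives:
            raise ValueError("no pci-flavor named in %r" % value)
        requests[key[len(PREFIX):]] = (count, alternatives)
    return requests


def host_can_satisfy(requests, free):
    """Greedy check that a host's free devices per pci-flavor cover all requests.

    For each request (in label order) the first alternative with enough
    free devices is used. A greedy choice can reject a host that a smarter
    assignment would accept; this only illustrates the idea.
    """
    remaining = dict(free)
    for label in sorted(requests):
        count, alternatives = requests[label]
        for name in alternatives:
            if remaining.get(name, 0) >= count:
                remaining[name] -= count
                break
        else:
            return False
    return True
```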

admin delete a pci flavor (white-list)
   nova pci-flavor-delete <UUID>
   API will be:
   DELETE v2/{tenant_id}/os-pci-flavor/<UUID>
   flow: delete it from the database
  
admin configures extra spec in flavor to request pci devices

To allocate a device from a pci flavor, just put the pci flavor into the flavor's extra spec:

    nova flavor-key 100 set 'pci-flavor=1:intelNICprivate, 1:intelNICprivate'
    

Review comments:

  • why do we need this if we already set the flavor on the extraspecs?
admin boots a VM with this flavor
   nova boot mytest --flavor m1.iwantPCI --image=cirros-0.3.1-x86_64-uec

Review:

  • perfect, this is what I hoped for
admin configures SRIOV flavor
  • create a pci flavor for the SRIOV
  nova pci-flavor-create  name 'vlan-SRIOV'  description "xxxxx"
  nova pci-flavor-update UUID  set    'description'='xxxx'   'address'= '0000:01:*.7'


Admin configures SRIOV
  • create pci-flavors:
   {"name": "privateNIC", "neutron-network-uuid": "uuid-1", ...}
   {"name": "publicNIC", "neutron-network-uuid": "uuid-2", ...}
   {"name": "smallGPU", "neutron-network-uuid": "", ...}
  • set aggregate metadata for the network

Flavor extra specs for a VM that gets two small GPUs and VIFs attached from the above SRIOV NICs:

   nova aggregate-set-metadata pci-aware-group 'pci-flavor:a=2:smallGPU'
   nova aggregate-set-metadata pci-aware-group 'pci-flavor:b=vif:privateNIC,privateNIC'
  • create instance flavor for SRIOV
    nova flavor-key 100 set 'pci-flavor=1:privateNIC, 2:smallGPU'
   
  • User just specifies a quantum port as normal:
   nova boot --flavor "sriov-plus-two-gpu" --image img --nic net-id=uuid-2 --nic net-id=uuid-1 vm-name

And the uuid-1 and uuid-2 map to a "provider" network (with VLAN config, etc) that gets implemented using the privateNIC and publicNIC flavors?

Reviewer comment: the flavor creation is an admin operation, not a user operation. Reviewer comments:

  • I still think this is wrong...
  • can't a pci-flavour just take a quantum network uuid? when network uuid is specified, the device is only ever attached by the VIF driver
  • then the user requests a flavor where there are some SRIOV options that the VIF attach, and the VIF driver does what it can.

here is the example...

transition from config file to API

  1. The config file options for alias and whitelist definitions are going to be deprecated.
  2. If the database is not empty, the configuration file is ignored and a deprecation warning is given.
  3. If the database is empty, the configuration is read from the file:
    * the white list/alias schema still works
    * a deprecation notice is also given: alias will fade out and will be removed starting from the next release.
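The three transition rules can be sketched in Python as a single decision function. This is a sketch under assumptions: `choose_pci_flavor_source` is a hypothetical name, its inputs are already-parsed lists, and the warnings go through the standard `logging` module rather than any specific Nova helper.

```python
import logging

LOG = logging.getLogger(__name__)


def choose_pci_flavor_source(db_flavors, config_entries):
    """Return (source, entries) following the transition rules.

    db_flavors: pci-flavors already stored through the API (may be empty).
    config_entries: whitelist/alias entries parsed from the config file.
    """
    if db_flavors:
        # Rule 2: the database wins; config entries are ignored.
        if config_entries:
            LOG.warning("PCI whitelist/alias config options are deprecated "
                        "and ignored because pci-flavors exist in the database")
        return "database", list(db_flavors)
    if config_entries:
        # Rule 3: fall back to the config file, with a deprecation notice.
        LOG.warning("PCI whitelist/alias config options are deprecated; "
                    "define pci-flavors through the API instead")
        return "config", list(config_entries)
    return "none", []
```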

With this solution, we move PCI configuration from the file to the API.

Reviewer comments: This sounds good, but I am stopping the review here.

DB for pci configuration

Each pci flavor is a set of (k, v) pairs, and different pci flavors don't need to contain the same keys. Another problem this definition tries to solve: SRIOV also wants feature autodiscovery (under discussion), in which case a 'feature' key would have to be added to the flavor if flavors were not stored as (k, v) pairs. The (k, v) pair definition also lets more extra information be stored in the pci device.

  table: pci_flavor {
               id:  database id of this k,v pair
               UUID:  which pci-flavor the k,v pair belongs to
               compute_node_id:  makes per-host configuration easy to query; NULL for a global flavor definition (if aggregates are available, this fades out)
               key
               value
               value_type:  (the value might be a simple value or a regular expression; not decided yet)
           }
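To show how a flavor is rebuilt from its (k, v) rows, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The column types, the `'plain'` default for `value_type`, and the helper names `connect`, `add_pair` and `load_flavors` are assumptions for illustration; it also assumes host-specific rows should override global rows with the same key.

```python
import sqlite3
from collections import defaultdict

SCHEMA = """
CREATE TABLE pci_flavor (
    id INTEGER PRIMARY KEY AUTOINCREMENT,   -- id of this (k, v) pair
    uuid TEXT NOT NULL,                     -- pci-flavor the pair belongs to
    compute_node_id INTEGER,                -- NULL for a global definition
    "key" TEXT NOT NULL,
    "value" TEXT NOT NULL,
    value_type TEXT NOT NULL DEFAULT 'plain'  -- assumed: 'plain' or 'regex'
)
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_pair(conn, uuid, key, value, compute_node_id=None, value_type="plain"):
    conn.execute(
        'INSERT INTO pci_flavor (uuid, compute_node_id, "key", "value", value_type) '
        "VALUES (?, ?, ?, ?, ?)",
        (uuid, compute_node_id, key, value, value_type))


def load_flavors(conn, compute_node_id=None):
    """Rebuild {uuid: {key: value}}; host rows override global ones."""
    rows = conn.execute(
        'SELECT uuid, "key", "value" FROM pci_flavor '
        "WHERE compute_node_id IS NULL OR compute_node_id = ? "
        "ORDER BY compute_node_id IS NOT NULL, id",  # global rows first
        (compute_node_id,))
    flavors = defaultdict(dict)
    for uuid, key, value in rows:
        flavors[uuid][key] = value
    return dict(flavors)
```

Because different flavors may carry different keys, a new key such as 'feature' needs no schema change, only new rows.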

PCI related objects

  Objects: white list / alias / group keys (on-demand grouping)
  object properties: sets of k,v
   

API interface

  • get pci device information on host
  nova host-list pci-device
  GET  v2/{tenant_id}/os-hosts/{host_name}/os-pci-devices
  return summary information about the pci devices on this host:
   [ {"vendor_id": "8086", "product_id": "xxx", "address": "0000:01:00.7", "pci-type": "VF", "status": "available", "pci-flavor": "xxx"} ]
  • list available pci flavors on host (white list) [might not be needed]
  nova host-list pci-flavor
  GET  v2/{tenant_id}/os-hosts/{host_name}/os-pci-flavors
  data:
   [ {"description": "xxxx", "vendor_id": "8086", "product_id": "xxx", "address": "0000:01:*.7", "pci-flavor": "xxx", "host": "compute-id", "UUID": "xxxx-xx-xx"} ]
  • list all available pci flavors (whitelist)
  nova pci-flavor-list
  GET v2/{tenant_id}/os-pci-flavors
   data:
   [ {"description": "xxxx", "vendor_id": "8086", "product_id": "xxx", "address": "0000:01:*.7", "pci-flavor": "xxx", "host": "compute-id", "UUID": "xxxx-xx-xx"} ]


  • get detailed information about one pci-flavor:
   nova pci-flavor-show <white-list UUID>
  GET v2/{tenant_id}/os-pci-flavor/<UUID>
  data:
  {"description": "xxxx", "vendor_id": "8086", "product_id": "xxx", "address": "0000:01:*.7", "pci-flavor": "xxx", "UUID": "xxxx-xx-xx", "host": "compute-id"}
  • create pci flavor
  nova pci-flavor-create name 'GetMePowerfulldevice' description "xxxxx"
  POST  v2/{tenant_id}/os-pci-flavors
  data:
      {"pci-flavor": "GetMePowerfulldevice", "description": "xxxxx"}
  • update the pci flavor
   nova pci-flavor-update UUID set 'description'='xxxx' 'address'='0000:01:*.7' 'host'='compute-id'
   PUT v2/{tenant_id}/os-pci-flavors/<UUID>
   with data:
        {"description": "xxxx", "address": "0000:01:*.7"}
  • delete a pci flavor
  nova pci-flavor-delete <UUID>
  DELETE v2/{tenant_id}/os-pci-flavor/<UUID>

Requirements from SRIOV

  • group devices
  For SRIOV, all VFs belonging to the same PF share the same physical network reachability. So if you want to, say, deploy a VLAN network, you need to choose a VF from the right PF; otherwise the network does not work for you.
  • tracking devices allocated to the NIC
  Networking and other special devices are not as simple as passing a device through to the VM; they need more configuration. To achieve this, SRIOV must know which device was allocated for a specific NIC.
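The grouping requirement can be illustrated with a short Python sketch that groups VFs by their parent PF and picks a free VF whose PF reaches a given physical network. The device fields (`pci-type`, `phys_function`, `status`), the `pf_physnet` mapping from PF address to physical network, and the function names are all hypothetical; how that mapping is provided is exactly what the extra-information work above is about.

```python
from collections import defaultdict


def group_vfs_by_pf(devices):
    """Map each PF address to the addresses of its VFs.

    Hypothetical fields: 'pci-type' ('PF' or 'VF'), 'address', and for
    VFs 'phys_function' holding the parent PF address.
    """
    groups = defaultdict(list)
    for dev in devices:
        if dev.get("pci-type") == "VF":
            groups[dev["phys_function"]].append(dev["address"])
    return dict(groups)


def pick_vf(devices, pf_physnet, physnet):
    """Return the address of a free VF whose PF reaches physnet, else None."""
    for dev in devices:
        if dev.get("pci-type") != "VF" or dev.get("status") != "available":
            continue
        if pf_physnet.get(dev["phys_function"]) == physnet:
            return dev["address"]
    return None
```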

Implement the grouping

  spec: a filter defined by (k, v) pairs
  extra_spec: the part of the filter whose keys k are not pci object fields.
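A minimal Python sketch of the spec/extra_spec split described above. The set `PCI_OBJECT_FIELDS` is an assumption for illustration only; the real field list would come from Nova's PCI device object and may differ. `split_spec` is a hypothetical name.

```python
# Assumed subset of PCI object fields, for illustration only.
PCI_OBJECT_FIELDS = frozenset(
    ["vendor_id", "product_id", "address", "dev_type", "status"])


def split_spec(spec):
    """Split a (k, v) filter into (base spec, extra spec).

    Keys that are PCI object fields go to the base spec; every other key
    (e.g. a physical network name) goes to the extra spec.
    """
    base = {k: v for k, v in spec.items() if k in PCI_OBJECT_FIELDS}
    extra = {k: v for k, v in spec.items() if k not in PCI_OBJECT_FIELDS}
    return base, extra
```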

pci utils/objects support grouping

      * pci utils k,v matching supports list values
      * objects provide a class-level extract interface to extract the base spec and the extra spec
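A sketch of (k, v) matching that accepts either a single value or a list of acceptable values, checking the device's own fields first and then its extra_info. Since the design leaves the value type open (simple value or regular expression), this sketch assumes shell-style wildcards via Python's standard `fnmatch.fnmatchcase`, so a pattern like `0000:01:*.7` matches `0000:01:00.7`; the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def value_matches(pattern, actual):
    """Pattern may be one value or a list; each is a shell-style wildcard."""
    candidates = pattern if isinstance(pattern, (list, tuple)) else [pattern]
    return any(fnmatchcase(str(actual), str(c)) for c in candidates)


def device_matches(spec, device, extra_info=None):
    """True if every (k, v) in spec matches the device or its extra_info."""
    extra_info = extra_info or {}
    for key, pattern in spec.items():
        if key in device:
            actual = device[key]
        elif key in extra_info:
            actual = extra_info[key]
        else:
            return False  # the device has no value for this key
        if not value_matches(pattern, actual):
            return False
    return True
```

With list support, a whitelist entry such as 'address': [bdf1, ...] is just another list-valued pattern.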

pci-flavor(white list) support address set

      * white list supports 'address': [bdf1, ....]
      * white list supports any other (k, v) pair, to group devices or store special information
      * the object extracts specs and extra_info: specs are used as the whitelist spec, and extra_info is written to the device's extra_info field

enable flavor support for pci-flavor

      * the pci-flavor's name is set in the extra spec of the instance type
      * the pci manager uses the extract method to extract the specs and extra_specs, and matches them against the pci object and object.extra_info.

pci stats grouping device on demand

        * the pci_grouping_key configuration option defines a set of key names used to group devices into stats pools
        * the default value is [vendor_id, product_id], matching the current implementation
        * grouping is limited to at most 3 keys, for algorithm simplicity.
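A Python sketch of on-demand grouping into stats pools, assuming `grouping_keys` plays the role of the pci_grouping_key option, that a key is looked up on the device first and then in its `extra_info`, and that only devices in `available` status are counted. The function name `build_pools` and the device dict layout are hypothetical.

```python
from collections import Counter

DEFAULT_GROUPING_KEYS = ("vendor_id", "product_id")
MAX_GROUPING_KEYS = 3


def build_pools(devices, grouping_keys=DEFAULT_GROUPING_KEYS):
    """Count available devices per combination of grouping-key values.

    Pools are returned in first-seen order as dicts of the grouping
    keys plus a 'count'.
    """
    if not 1 <= len(grouping_keys) <= MAX_GROUPING_KEYS:
        raise ValueError("between 1 and %d grouping keys are supported"
                         % MAX_GROUPING_KEYS)
    counts = Counter()
    for dev in devices:
        if dev.get("status", "available") != "available":
            continue
        extra = dev.get("extra_info", {})
        pool_key = tuple(dev.get(k, extra.get(k)) for k in grouping_keys)
        counts[pool_key] += 1
    return [dict(zip(grouping_keys, key), count=n) for key, n in counts.items()]
```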

Implement tracking device allocated for the pci-flavor

Here is the idea of how a user can identify which device was allocated for the pci-flavor:

    * while allocating a device, the user puts a marker into the device (into the pci device's extra_info field)
    * after allocation finishes, the user can search an instance's pci devices to find the specific marker
   The marker data is transferred from the user to the device via the pci_request, which is converted from the pci-flavor.
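The marker idea can be sketched in Python as two steps: stamp the marker into each allocated device's extra_info, then search an instance's devices for it. Here the marker is just a string assumed to have been copied from the pci_request; `allocate_with_marker`, `find_by_marker`, and the `instance_uuid`/`extra_info['marker']` fields are hypothetical names for illustration.

```python
def allocate_with_marker(free_devices, count, marker, instance_uuid):
    """Take `count` devices from free_devices and stamp them with marker.

    The allocated devices are removed from free_devices and returned.
    """
    if len(free_devices) < count:
        raise ValueError("not enough free devices")
    allocated = free_devices[:count]
    for dev in allocated:
        dev["instance_uuid"] = instance_uuid
        dev.setdefault("extra_info", {})["marker"] = marker
    del free_devices[:count]
    return allocated


def find_by_marker(devices, instance_uuid, marker):
    """Addresses of an instance's devices that carry the given marker."""
    return [d["address"] for d in devices
            if d.get("instance_uuid") == instance_uuid
            and d.get("extra_info", {}).get("marker") == marker]
```

For example, a VIF driver could look up the device stamped with its NIC's marker to know which VF to configure.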

Reviewer: please see the extensible resource manager blueprint