PCI passthrough SRIOV support Icehouse

Latest revision as of 02:52, 4 February 2014

Overall Idea

The cloud admin provides extra attributes for assignable PCI devices, users (including Neutron and normal cloud users) can request PCI devices with specified extra attributes, and the Nova scheduler can make decisions based on these extra PCI device attributes.

This is an important requirement for SR-IOV NIC support, and it will be helpful in other use cases as well.

Changes Plan

PCI Information Config Item

  • Description
A new configuration item is added to extend the current PCI whitelist configuration item to:
Select assignable PCI devices based on a reduced regular expression
Configure additional arbitrary information for these devices as (k, v) pairs, where k and v are strings.
  • Backward compatibility
For compute nodes without the PCI information configuration item, the PCI whitelist will be used, with no additional info provided.
For compute nodes with both the PCI information and PCI whitelist configuration items, the PCI information will override the PCI whitelist.
  • Long Term Change
None. This item will be the same in the long-term design.
  • Example
pci_information = { { 'vendor_id': "8086", 'device_id': "000[1-2]" }, { 'e.physical_network': 'X' } }
This configuration specifies devices with vendor_id 0x8086 and device_id 0x0001 or 0x0002 as assignable devices. These devices carry the additional information 'e.physical_network' = 'X', meaning that the physical network connected to these devices is 'X'.
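As an illustration only, here is a minimal Python sketch of how a compute node might apply such a pci_information entry. The helper names (is_assignable, tag_assignable_devices) and the device dictionaries are hypothetical, and the reduced regular expressions are treated here as ordinary Python regular expressions:

import re

# Hypothetical pci_information entry: a match spec (values are regular
# expressions) plus extra (k, v) string attributes to attach to matched devices.
PCI_INFORMATION = (
    {'vendor_id': '8086', 'device_id': '000[1-2]'},   # match spec
    {'e.physical_network': 'X'},                      # extra info
)

def is_assignable(device, match_spec):
    """Return True if every field in the match spec fully matches the device."""
    return all(re.fullmatch(pattern, device.get(field, ''))
               for field, pattern in match_spec.items())

def tag_assignable_devices(devices, pci_information):
    """Select assignable devices and attach the configured extra info."""
    match_spec, extra_info = pci_information
    return [{**dev, **extra_info} for dev in devices if is_assignable(dev, match_spec)]

# Two of these three devices match vendor_id '8086' and device_id '000[1-2]'.
devices = [
    {'vendor_id': '8086', 'device_id': '0001', 'address': '0000:06:00.1'},
    {'vendor_id': '8086', 'device_id': '0002', 'address': '0000:06:00.2'},
    {'vendor_id': '8086', 'device_id': '0010', 'address': '0000:07:00.1'},
]
print(tag_assignable_devices(devices, PCI_INFORMATION))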

Pci_flavor_attrs Config Item

  • Description
Specifies the PCI information and extra information that can be used to express PCI device requirements and that can be used by the PCI scheduler to make decisions.
  • Backward Compatibility
If not specified, it defaults to vendor_id/device_id/extra_info, as currently implemented in pci/pci_stats.py.
  • Long Term Change
In the I release, pci_flavor_attrs is defined on both compute nodes and controller nodes. After the I release, it will be defined on the controller nodes (the scheduler node) only, and compute nodes will get this information from the controller nodes.
As the controller nodes are always updated before the compute nodes, there will be no update issue.
  • Example
pci_flavor_attrs=[product_id, vendor_id, e.physical_network]

Extending PCI Stats

  • Description
Currently, PCI stats group devices only by [vendor_id, product_id, extra_info]. This will be extended to group by the keys specified in pci_flavor_attrs (a sketch of this grouping follows this list).
  • Backward compatibility
The PCI stats are populated into the DB by the compute nodes and then used by the Nova scheduler. During a live upgrade, some compute nodes will still populate stats based on the old PCI stats configuration.
But this should be harmless since:
a) If a compute node provides more information than pci_flavor_attrs requires, the scheduler will simply not use the extra PCI information, and the scheduling decision is still correct.
b) If a compute node provides less information than pci_flavor_attrs requires, the scheduler will treat the missing values as None; as a result, some PCI requests may fail even though there are hosts that can meet the requirement. This is transient and causes no correctness issue.
  • Long Term Change
None. This is the same as the long-term change.
  • Example
N/A
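Although no example is given above, the grouping can be sketched in a few lines of Python. The build_pci_stats helper and the plain-dict device representation are hypothetical, assumed only for this illustration:

from collections import Counter

# Keys to group on; in this sketch they come straight from pci_flavor_attrs.
pci_flavor_attrs = ['product_id', 'vendor_id', 'e.physical_network']

def build_pci_stats(devices, attrs):
    """Group assignable devices into pools keyed by the configured attributes.

    A device missing an attribute is grouped under None for that key, matching
    the 'treated as None' behaviour described under backward compatibility.
    """
    pools = Counter(tuple(dev.get(a) for a in attrs) for dev in devices)
    return [dict(zip(attrs, key), count=count) for key, count in pools.items()]

devices = [
    {'vendor_id': '8086', 'product_id': '0001', 'e.physical_network': 'X'},
    {'vendor_id': '8086', 'product_id': '0001', 'e.physical_network': 'X'},
    {'vendor_id': '8086', 'product_id': '0002'},   # no extra info -> None
]
for pool in build_pci_stats(devices, pci_flavor_attrs):
    print(pool)
# {'product_id': '0001', 'vendor_id': '8086', 'e.physical_network': 'X', 'count': 2}
# {'product_id': '0002', 'vendor_id': '8086', 'e.physical_network': None, 'count': 1}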

Extending PCI Alias

  • Description
Currently, the PCI alias supports PCI requirements with only vendor_id/device_id as keys. Now that the cloud admin can specify additional device information, and users can request PCI devices with that additional information, the PCI alias should be extended to support the keys defined in pci_flavor_attrs.
  • Backward Compatibility
None.
The PCI alias is translated into a PCI request when the VM launches and is not referenced afterwards. When a new alias with keys from pci_flavor_attrs is defined on the controller node at upgrade time, new instances will use the new alias. Old instances are safe because the PCI alias is no longer referenced.
  • Long Term Change
In the long term, the PCI alias will be replaced by the PCI flavor, which will be created via an API.
I think there will be an upgrade issue in that situation.
  • Example:
pci_alias = [{'e.physical_network':'X', 'vendor_id': '8086', 'name':'intel_nic_x_net', description: 'Intel NIC connected to physical network X'}]
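As a hedged illustration of the alias-to-request translation described above (not Nova's actual code path; the helper names are hypothetical), an extended alias can be turned into a request spec at boot time and matched against the stats pools:

# Hypothetical alias using an extended key from pci_flavor_attrs.
pci_alias = {'e.physical_network': 'X', 'vendor_id': '8086',
             'name': 'intel_nic_x_net',
             'description': 'Intel NIC connected to physical network X'}

def alias_to_request(alias, count):
    """Translate an alias into a PCI request: the matching keys plus a device count."""
    spec = {k: v for k, v in alias.items() if k not in ('name', 'description')}
    return {'spec': spec, 'count': count}

def pool_satisfies(pool, request):
    """A pool satisfies a request if every spec key matches and enough devices remain."""
    return (all(pool.get(k) == v for k, v in request['spec'].items())
            and pool.get('count', 0) >= request['count'])

pools = [
    {'vendor_id': '8086', 'product_id': '0022', 'e.physical_network': 'X', 'count': 5},
    {'vendor_id': '8086', 'product_id': '0001', 'e.physical_network': 'Y', 'count': 2},
]
# e.g. a flavor with pci_passthrough:pci_alias=1:intel_nic_x_net
request = alias_to_request(pci_alias, count=1)
print([pool for pool in pools if pool_satisfies(pool, request)])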


Use Cases

Graphics Assignment

  • Description
Consider different GPUs in a cloud system: some support DirectX 11 and some support only DirectX 10. The cloud vendor will charge different prices for different GPU capabilities.
A cloud user wants to create two VMs that each use a GPU card. The application in one VM needs only DirectX 10, while the application in the other needs DirectX 11. This requirement can't be met with the H release implementation, but it can be met through this enhancement.
  • Steps
The cloud admin defines extra information for the GPU cards, e.g. e.highest_directx_version = '10' or e.highest_directx_version = '11'.
The cloud admin then puts 'e.highest_directx_version' in pci_flavor_attrs.
The cloud admin defines pci_alias as:
pci_alias = [{'e.highest_directx_version':'10', 'vendor_id': '8086', 'name':'intel_gfx_dirx_10', description: 'Intel graphics card supporting DirectX 10 at most'}]
pci_alias = [{'e.highest_directx_version':'11', 'vendor_id': '8086', 'name':'intel_gfx_dirx_11', description: 'Intel graphics card supporting DirectX 11 at most'}]
The cloud admin defines instance flavors as:
nova flavor-key m1.small set pci_passthrough:pci_alias=1:intel_gfx_dirx_10
nova flavor-key m1.big set pci_passthrough:pci_alias=1:intel_gfx_dirx_11
The cloud user creates instances as:
nova boot direct10_app --flavor m1.small --image=cirros-0.3.1-x86_64-uec
nova boot direct11_app --flavor m1.big --image=cirros-0.3.1-x86_64-uec


PCI SR-IOV NIC

See https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov for detailed usage.