Mellanox-Neutron-Train-InfiniBand
Latest revision as of 18:14, 10 December 2019
Contents
- 1 Overview
- 2 InfiniBand Network
- 3 Deployment
Overview
This section describes the configuration and deployment requirements for VM-to-VM connectivity over an InfiniBand fabric in an OpenStack cloud. Overview of Mellanox ML2 Mechanism Drivers can be found here.
The supported network type is VLAN. Segmentation is achieved by configuring a Partition Key (PKEY) per network, a concept similar to a VLAN.
Prerequisites
- CentOS 7.6 / Ubuntu 18.04 or later
- Mellanox ConnectX® Family device:
  - ConnectX®-3/ConnectX®-3 PRO
  - ConnectX®-4/ConnectX®-4Lx
  - ConnectX®-5
  - ConnectX®-6
- Driver: Mellanox OFED 4.6-1.0.1.1 or greater
- A running OpenStack environment installed (RDO Manager or Packstack).
- SR-IOV enabled on all compute nodes.
- The software package iproute2 installed on all Compute nodes
- Mellanox UFM greater than 5.9.5 (if mlnx_sdn_assist is used for PKEY configurations)
InfiniBand Network
An InfiniBand network relies on a software entity to manage the network. This entity is referred to as a Subnet Manager (SM). The subnet manager can run on a dedicated node or as part of the controller node; this document assumes the former. As mentioned earlier, segmentation is achieved through PKEY configuration done by the SM.
SM Node
OpenSM is a common implementation for an Infiniband SM.
OpenSM can be configured in two ways:
OpenSM Provisioning with mlnx_sdn_assist Mechanism Driver
The SDN Mechanism Driver allows OpenSM to dynamically assign PKEYs in the IB network. More details about applying the SDN Mechanism Driver with NEO can be found here.
Manual OpenSM Configuration
For development and feature evaluation, it is possible to disable the mlnx_sdn_assist sync with a network management endpoint and perform the configuration out of band.
Install the OpenSM that comes bundled with Mellanox OFED 4.6-1.0.1.1 or greater.
Note: To create the OpenSM configuration file, run:
# opensm --create-config /etc/opensm/opensm.conf
To disable the mlnx_sdn_assist sync with the SDN, set the following configuration option in /etc/neutron/plugins/ml2/ml2_conf.ini:
[sdn]
enable_sync=false
For ConnectX®-3/ConnectX®-3Pro use the following configuration
Change the following in /etc/opensm/opensm.conf:
allow_both_pkeys TRUE
Partition membership configuration
In an Infiniband network, network segmentation is achieved through partitions (roughly equivalent to ETH VLAN). Each partition has its own key (15bit partition key). The partitions.conf file contains configuration of partition membership for all network endpoints in the system.
Each PKEY is required to have all GUIDs as members.
There is a 1:1 mapping between the segmentation ID assigned to a network in OpenStack and its PKEY (e.g. segmentation ID 10 translates to PKEY 10).
Example1:
The following configuration supports three networks on VLANs 3, 4, and 5:
management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;
vlan3=0x3, ipoib, defmember=full : ALL;
vlan4=0x4, ipoib, defmember=full : ALL;
vlan5=0x5, ipoib, defmember=full : ALL;
untagged=0xfff, ipoib, defmember=full : ALL; # Required for port cleanup
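A partitions.conf of this shape can be generated from a list of segmentation IDs. The following is a minimal Python sketch, not part of the Mellanox tooling: the function names are hypothetical, and the accepted ID range (1 to 0x7ffe) is an assumption derived from the 15-bit key size, with 0x7fff kept for the management partition.

```python
def pkey_for_segmentation_id(segmentation_id):
    """Return the PKEY as a hex string, using the 1:1 mapping described above."""
    # Assumption: 0x7fff is reserved for the management partition.
    if not 1 <= segmentation_id <= 0x7FFE:
        raise ValueError("segmentation ID out of PKEY range: %r" % segmentation_id)
    return hex(segmentation_id)


def all_members_partitions(segmentation_ids):
    """Build partitions.conf text where every PKEY has ALL GUIDs as members."""
    lines = [
        "management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;"
    ]
    for seg_id in segmentation_ids:
        pkey = pkey_for_segmentation_id(seg_id)
        lines.append("vlan%d=%s, ipoib, defmember=full : ALL;" % (seg_id, pkey))
    # The untagged partition is copied from the example above.
    lines.append(
        "untagged=0xfff, ipoib, defmember=full : ALL; # Required for port cleanup"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(all_members_partitions([3, 4, 5]))
```

Writing the result to the partitions file and restarting OpenSM remain manual steps.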
For ConnectX®-4 or newer use the following configuration
Change the following in /etc/opensm/opensm.conf:
virt_enabled 2
Partition membership configuration
In an Infiniband network, network segmentation is achieved through partitions (roughly equivalent to ETH VLAN). Each partition has its own key (15bit partition key). The partitions.conf file contains configuration of partition membership for all network endpoints in the system.
The following guidelines must be followed to allow connectivity.
For each PKEY, it is required to:
- Have the OpenSM PF (physical function IB device) GUID be a member of the partition.
- Have the L3 and DHCP PF GUID be a member of the partition.
- Have the VF GUIDs of the relevant VMs be members of the partition.
There is a 1:1 mapping between the segmentation ID assigned to a network in OpenStack and its PKEY (e.g. segmentation ID 10 translates to PKEY 10).
Example1:
The following configuration corresponds to a single VLAN network (VLAN 3) with two VMs and a DHCP agent:
management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;
vlan3_vm1=0x3, indx0, ipoib, defmember=full : 0xfa163e0000c0851b;
vlan3_vm2=0x3, indx0, ipoib, defmember=full : 0xfa163e0000df0519;
vlan3_l3_dhcp_services=0x3, ipoib, defmember=full : 0xe41d2d030061f5fa;
vlan3_sm=0x3, ipoib, defmember=full : SELF;
Going over it line by line:
- Make all ports members of the default (management) PKEY
- VM1 port GUID (VF) is a member of the network with PKEY 3
- VM2 port GUID (VF) is a member of the network with PKEY 3
- L3/DHCP agent PF GUID is a member of the network with PKEY 3
- Subnet manager is a member of the network with PKEY 3
Example 2:
The following configuration corresponds to two VLAN networks (VLAN 3 and VLAN 5) with one VM on each network and a DHCP and L3 agent:
management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;
vlan3_vm1=0x3, indx0, ipoib, defmember=full : 0xfa163e0000c0851b;
vlan5_vm2=0x5, indx0, ipoib, defmember=full : 0xfa163e0000df0519;
vlan3_l3_dhcp_services=0x3, ipoib, defmember=full : 0xe41d2d030061f5fa;
vlan5_l3_dhcp_services=0x5, ipoib, defmember=full : 0xe41d2d030061f5fa;
vlan3_sm=0x3, ipoib, defmember=full : SELF;
vlan5_sm=0x5, ipoib, defmember=full : SELF;
Going over it line by line:
- Make all ports members of the default (management) PKEY
- VM1 port GUID (VF) is a member of the network with PKEY 3
- VM2 port GUID (VF) is a member of the network with PKEY 5
- L3/DHCP agent PF GUID is a member of the network with PKEY 3
- L3/DHCP agent PF GUID is a member of the network with PKEY 5
- Subnet manager is a member of the network with PKEY 3
- Subnet manager is a member of the network with PKEY 5
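The per-GUID entries in the examples above follow a regular pattern, so they can be produced programmatically. The sketch below is illustrative only: `partition_entry` and `network_partitions` are hypothetical helper names, and the entry naming scheme (`vlan<N>_vm<i>`) simply mirrors Example 1.

```python
def partition_entry(name, pkey, member, indx0=False):
    """Format one partitions.conf entry in the style used in the examples."""
    flags = "indx0, ipoib" if indx0 else "ipoib"
    return "%s=0x%x, %s, defmember=full : %s;" % (name, pkey, flags, member)


def network_partitions(seg_id, vf_guids, service_pf_guid):
    """Entries for one network: VM VFs, the L3/DHCP PF, and the SM itself."""
    entries = []
    for i, guid in enumerate(vf_guids, start=1):
        # VF entries carry the indx0 flag, as in the examples.
        entries.append(
            partition_entry("vlan%d_vm%d" % (seg_id, i), seg_id, guid, indx0=True)
        )
    entries.append(
        partition_entry("vlan%d_l3_dhcp_services" % seg_id, seg_id, service_pf_guid)
    )
    entries.append(partition_entry("vlan%d_sm" % seg_id, seg_id, "SELF"))
    return entries


if __name__ == "__main__":
    for line in network_partitions(
        3, ["0xfa163e0000c0851b", "0xfa163e0000df0519"], "0xe41d2d030061f5fa"
    ):
        print(line)
```

The management partition line is not generated here and still has to be placed first in the file.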
Restart OpenSM:
After opensm.conf and partitions.conf have been updated, restart the opensm service to load the new configuration:
# systemctl restart opensmd.service
Deployment
The deployment assumes the presence of two physical networks:
- default - Ethernet network
- ibphysnet - Infiniband network
Controller Node
To configure the Controller node:
1. Install python-networking-mlnx package
Neutron Server
1. Make sure ML2 is the current Neutron plugin by checking the core_plugin parameter in /etc/neutron/neutron.conf:
core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
2. Modify /etc/neutron/plugins/ml2/ml2_conf.ini by adding the following:
[ml2]
type_drivers = vlan,flat
tenant_network_types = vlan
mechanism_drivers = mlnx_sdn_assist,mlnx_infiniband

[ml2_type_vlan]
network_vlan_ranges = default:1:10

[sdn]
bind_normal_ports = true
bind_normal_ports_physnets = ibphysnet
Note: If the deployment also includes an Ethernet fabric, add the relevant mechanism driver, e.g.:
mechanism_drivers = mlnx_sdn_assist,mlnx_infiniband,openvswitch
3. Start (or restart) the Neutron server:
# systemctl restart neutron-server.service
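Before restarting the server, the ML2 settings from step 2 can be sanity-checked with a short script. This is a minimal sketch using Python's standard configparser module. The `check_ml2_conf` function and the specific checks it runs are assumptions made for illustration, and it reads an inline copy of the settings rather than the real file.

```python
import configparser

# Inline copy of the settings from step 2; in practice the text would be
# read from /etc/neutron/plugins/ml2/ml2_conf.ini.
ML2_CONF = """
[ml2]
type_drivers = vlan,flat
tenant_network_types = vlan
mechanism_drivers = mlnx_sdn_assist,mlnx_infiniband

[ml2_type_vlan]
network_vlan_ranges = default:1:10

[sdn]
bind_normal_ports = true
bind_normal_ports_physnets = ibphysnet
"""


def check_ml2_conf(text):
    """Return a list of human-readable problems found in the ML2 settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []

    raw_drivers = parser.get("ml2", "mechanism_drivers", fallback="")
    drivers = [d.strip() for d in raw_drivers.split(",") if d.strip()]
    for required in ("mlnx_sdn_assist", "mlnx_infiniband"):
        if required not in drivers:
            problems.append("missing mechanism driver: " + required)

    if parser.get("ml2", "tenant_network_types", fallback="") != "vlan":
        problems.append("tenant_network_types should be vlan")

    if not parser.getboolean("sdn", "bind_normal_ports", fallback=False):
        problems.append("[sdn] bind_normal_ports should be true")

    return problems


if __name__ == "__main__":
    print(check_ml2_conf(ML2_CONF) or "no problems found")
```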
Nova Scheduler
To enable PciPassthroughFilter, modify /etc/nova/nova.conf:
scheduler_available_filters = nova.scheduler.filters.all_filters
scheduler_default_filters = RetryFilter, AvailabilityZoneFilter, RamFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, PciPassthroughFilter, NUMATopologyFilter
Network Node
- Install python-networking-mlnx package and configure DHCP and L3 agents
DHCP Agent
1. Modify /etc/neutron/dhcp_agent.ini as follows:
dhcp_broadcast_reply = True
interface_driver = multi
multi_interface_driver_mappings = default:openvswitch,ibphysnet:ipoib
ipoib_physical_interface = ib2
Note: multi_interface_driver_mappings contains the mapping between each physnet and the interface driver to be used for that physnet. In the example above, openvswitch is used for the default physnet and ipoib is used for ibphysnet.
2. Restart DHCP agent:
# systemctl restart neutron-dhcp-agent.service
L3 Agent
1. Modify /etc/neutron/l3_agent.ini as follows:
interface_driver = multi
multi_interface_driver_mappings = default:openvswitch,ibphysnet:ipoib
ipoib_physical_interface = ib2
Note: multi_interface_driver_mappings contains the mapping between each physnet and the interface driver to be used for that physnet. In the example above, openvswitch is used for the default physnet and ipoib is used for ibphysnet.
2. Restart L3 agent:
# systemctl restart neutron-l3-agent.service
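The multi_interface_driver_mappings value used by both agents is a comma-separated list of physnet:driver pairs. The sketch below shows one way to read it into a dictionary. It illustrates the format only and is not the parser used by networking-mlnx; `parse_driver_mappings` is a hypothetical name.

```python
def parse_driver_mappings(value):
    """Parse 'physnet:driver,physnet:driver' into a {physnet: driver} dict."""
    mappings = {}
    for item in value.split(","):
        physnet, sep, driver = item.strip().partition(":")
        if not sep or not physnet or not driver:
            raise ValueError("malformed mapping entry: %r" % item)
        if physnet in mappings:
            raise ValueError("physnet listed twice: %r" % physnet)
        mappings[physnet] = driver
    return mappings


if __name__ == "__main__":
    print(parse_driver_mappings("default:openvswitch,ibphysnet:ipoib"))
```

With the value from the examples above, the default physnet resolves to openvswitch and ibphysnet resolves to ipoib.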
Compute Nodes
To configure the Compute Node:
- Install python-networking-mlnx package
For ConnectX®-3/ConnectX®-3Pro perform the following steps:
1. Create the file /etc/modprobe.d/mlx4_ib.conf and add the following:
options mlx4_ib sm_guid_assign=0
2. Restart the driver:
# /etc/init.d/openibd restart
Nova Compute
Nova-compute needs to know which PCI devices are allowed to be passed through to the VMs. For SR-IOV PCI devices, it also needs to know which physical network each VF belongs to. This is done through the passthrough_whitelist parameter under the [pci] section in /etc/nova/nova.conf. For example, to whitelist and tag the VFs by their PCI address, use the following setting:
[pci]
passthrough_whitelist = {"address":"*:0a:00.*","physical_network":"default"}
This associates any VF whose PCI address includes ':0a:00.' with the physical network default.
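To see which PCI addresses a whitelist entry like the one above would cover, the wildcard pattern can be tried against sample addresses. The sketch below uses Python's fnmatch module purely to illustrate the glob-style matching. It is not Nova's own matching code, and the sample addresses and the `physnet_for` helper are made up for the example.

```python
import json
from fnmatch import fnmatchcase

# The whitelist entry from the example above, parsed as JSON.
ENTRY = json.loads('{"address":"*:0a:00.*","physical_network":"default"}')


def physnet_for(pci_address, entry):
    """Return the entry's physical network if the address matches, else None."""
    if fnmatchcase(pci_address, entry["address"]):
        return entry["physical_network"]
    return None


if __name__ == "__main__":
    # Hypothetical PCI addresses (domain:bus:slot.function).
    for address in ("0000:0a:00.1", "0000:0a:00.7", "0000:0b:00.1"):
        print(address, "->", physnet_for(address, ENTRY))
```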
1. Add the [pci] passthrough_whitelist setting to /etc/nova/nova.conf.
2. Restart Nova:
# systemctl restart openstack-nova-compute
Neutron MLNX Agent
1. Run:
# systemctl enable neutron-mlnx-agent.service
2. Run:
# systemctl daemon-reload
3. In the file /etc/neutron/plugins/ml2/ml2_conf.ini, the parameters tenant_network_types and network_vlan_ranges should be configured as in the controller node. In addition, the following agent-specific configuration needs to be applied:
[eswitch]
physical_interface_mappings = ibphysnet:<ib_interface>
For example: physical_interface_mappings = ibphysnet:ib0
4. Modify the file /etc/neutron/plugins/ml2/eswitchd.conf as follows:
fabrics = ibphysnet:<ib_interface>
For example: fabrics = ibphysnet:ib0
5. Start eswitchd:
# systemctl enable eswitchd.service
# systemctl start eswitchd.service
6. Start the Neutron agent:
# systemctl restart neutron-mlnx-agent
Limitations
Currently, a deployment is limited to 127 segmented networks. Additional networks created will not be able to get DHCP and Routing services. This is due to a kernel limitation where a PF net device cannot be a member of more than 128 PKEYs. It shall be dealt with in the future.
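Given this limit, it can be worth checking how many segmentation IDs a network_vlan_ranges entry allows. The following is a small sketch under the assumption that each VLAN ID in the range may become one segmented network. The constant and function names are hypothetical.

```python
MAX_SEGMENTED_NETWORKS = 127  # limit stated above


def vlan_range_size(entry):
    """Number of VLAN IDs in a '<physnet>:<min>:<max>' range entry."""
    physnet, vlan_min, vlan_max = entry.split(":")
    low, high = int(vlan_min), int(vlan_max)
    if low > high:
        raise ValueError("empty VLAN range in %r" % entry)
    return high - low + 1


def within_limit(entry):
    """True if every VLAN in the range could be used without exceeding the limit."""
    return vlan_range_size(entry) <= MAX_SEGMENTED_NETWORKS


if __name__ == "__main__":
    for entry in ("default:1:10", "default:1:200"):
        print(entry, vlan_range_size(entry), within_limit(entry))
```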
Known issues and Troubleshooting
For known issues and troubleshooting options refer to Mellanox OpenStack Troubleshooting
Issue: Missing zmq package on all nodes (Controller/Compute)
Solution:
# wget https://bootstrap.pypa.io/get-pip.py
# sudo python get-pip.py
# sudo pip install pyzmq