Mellanox-Neutron-Train-InfiniBand

=Overview=
This section covers the configuration and deployment requirements for VM-to-VM connectivity over an InfiniBand fabric in an OpenStack cloud. An overview of the Mellanox ML2 Mechanism Drivers can be found here.

The supported network type is VLAN. Segmentation is achieved by configuring a Partition Key (PKEY) per network; the concept is similar to a VLAN.

Prerequisites
 * CentOS 7.6 / Ubuntu 18.04 or later
 * Mellanox ConnectX® Family device: ConnectX®-3/ConnectX®-3 Pro, ConnectX®-4/ConnectX®-4 Lx, ConnectX®-5, ConnectX®-6
 * Driver: Mellanox OFED 4.6-1.0.1.1 or greater
 * A running OpenStack environment (installed with RDO Manager or Packstack).
 * SR-IOV enabled on all compute nodes.
 * The software package iproute2 installed on all compute nodes.
 * Mellanox UFM version greater than 5.9.5 (if mlnx_sdn_assist is used for PKEY configuration)

=InfiniBand Network=
An InfiniBand network relies on a software entity, referred to as a Subnet Manager (SM), to manage the network. The subnet manager can run on a dedicated node or as part of the controller node; this document assumes the former. As mentioned earlier, segmentation is achieved through PKEY configuration done by the SM.

SM Node
OpenSM is a common implementation of an InfiniBand SM.

OpenSM can be configured in two ways:

OpenSM Provisioning with mlnx_sdn_assist Mechanism Driver
The SDN Mechanism Driver allows OpenSM to dynamically assign PKEYs in the IB network. More details about applying the SDN Mechanism Driver with NEO can be found here.

Manual OpenSM Configuration
For development and feature evaluation, it is possible to disable the mlnx_sdn_assist sync with a network management endpoint and perform the configuration out of band.

Install the OpenSM that comes bundled with Mellanox OFED 4.6-1.0.1.1 or greater.

Note: To create the OpenSM configuration file, run:
    opensm --create-config /etc/opensm/opensm.conf

To disable the mlnx_sdn_assist sync with the SDN, set the following configuration option in /etc/neutron/plugins/ml2/ml2_conf.ini:
    [sdn]
    enable_sync=false

For ConnectX®-3/ConnectX®-3Pro use the following configuration
Change the following in /etc/opensm/opensm.conf:
    allow_both_pkeys TRUE

Partition membership configuration
In an InfiniBand network, network segmentation is achieved through partitions (roughly equivalent to Ethernet VLANs). Each partition has its own 15-bit partition key (PKEY). The partitions.conf file contains the partition membership configuration for all network endpoints in the system.

Each PKEY must have all GUIDs as members.

There is a 1:1 mapping between the segmentation ID assigned to a network in OpenStack and its PKEY (e.g., segmentation ID 10 translates to PKEY 10).
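
The mapping can be sketched in Python as follows. The helper name segmentation_id_to_pkey is hypothetical and is not part of networking-mlnx. The range check assumes that 0x7fff stays reserved for the management partition, as in the examples in this document. Note that partitions.conf expresses the PKEY in hexadecimal, so segmentation ID 10 is written as 0xa.

```python
def segmentation_id_to_pkey(segmentation_id: int) -> str:
    """Return the PKEY, as a hex string, for a Neutron segmentation ID."""
    # A partition key is 15 bits wide. 0x7fff is used by the management
    # partition in this document, so it is excluded here (assumption).
    if not 1 <= segmentation_id < 0x7FFF:
        raise ValueError(f"segmentation ID out of range: {segmentation_id}")
    # The mapping is 1:1; only the notation changes (decimal to hex).
    return hex(segmentation_id)


print(segmentation_id_to_pkey(10))
```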

Example 1:
The following configuration supports three networks on VLANs 3, 4, and 5.

    management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;
    vlan3=0x3, ipoib, defmember=full : ALL;
    vlan4=0x4, ipoib, defmember=full : ALL;
    vlan5=0x5, ipoib, defmember=full : ALL;
    untagged=0xfff, ipoib, defmember=full : ALL; # Required for port cleanup
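
For more than a handful of networks, a file of this shape can be generated rather than typed by hand. The following is a minimal sketch; the function name all_member_partitions is hypothetical, and the line layout simply copies the example above.

```python
def all_member_partitions(vlan_ids):
    """Build partitions.conf text in which every port is a full member."""
    lines = [
        "management=0x7fff,ipoib, defmember=full : ALL, "
        "ALL_SWITCHES=full,SELF=full;"
    ]
    for vlan_id in vlan_ids:
        # Segmentation ID N maps 1:1 to PKEY N, written in hex.
        lines.append(
            f"vlan{vlan_id}={hex(vlan_id)}, ipoib, defmember=full : ALL;"
        )
    # Partition required for port cleanup, as in the example above.
    lines.append("untagged=0xfff, ipoib, defmember=full : ALL;")
    return "\n".join(lines)


print(all_member_partitions([3, 4, 5]))
```

The output would still need to be written to the partitions file that OpenSM reads, and OpenSM restarted, before it takes effect.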

For ConnectX®-4 or newer use the following configuration
Change the following in /etc/opensm/opensm.conf:
    virt_enabled 2

Partition membership configuration
In an InfiniBand network, network segmentation is achieved through partitions (roughly equivalent to Ethernet VLANs). Each partition has its own 15-bit partition key (PKEY). The partitions.conf file contains the partition membership configuration for all network endpoints in the system.

The following guidelines must be followed to allow connectivity.

For each PKEY, it is required to:
1. Have the OpenSM PF (physical function IB device) GUID be a member of the partition.
2. Have the L3 and DHCP PF GUID be a member of the partition.
3. Have the VF GUIDs of the relevant VMs be members of the partition.

There is a 1:1 mapping between the segmentation ID assigned to a network in OpenStack and its PKEY (e.g., segmentation ID 10 translates to PKEY 10).

Example 1:
The following configuration corresponds to a single VLAN network (VLAN 3) with two VMs and a DHCP agent.

    management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;
    vlan3_vm1=0x3, indx0, ipoib, defmember=full : 0xfa163e0000c0851b;
    vlan3_vm2=0x3, indx0, ipoib, defmember=full : 0xfa163e0000df0519;
    vlan3_l3_dhcp_services=0x3, ipoib, defmember=full : 0xe41d2d030061f5fa;
    vlan3_sm=0x3, ipoib, defmember=full : SELF;

Going over it line by line:

1. Makes all ports members of the default (management) PKEY.
2. Makes the VM1 port GUID (VF) a member of the network with PKEY 3.
3. Makes the VM2 port GUID (VF) a member of the network with PKEY 3.
4. Makes the L3/DHCP agent PF GUID a member of the network with PKEY 3.
5. Makes the subnet manager a member of the network with PKEY 3.

Example 2:
The following configuration corresponds to two VLAN networks (VLAN 3 and VLAN 5) with one VM on each network and a DHCP and L3 agent.

    management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;
    vlan3_vm1=0x3, indx0, ipoib, defmember=full : 0xfa163e0000c0851b;
    vlan5_vm2=0x5, indx0, ipoib, defmember=full : 0xfa163e0000df0519;
    vlan3_l3_dhcp_services=0x3, ipoib, defmember=full : 0xe41d2d030061f5fa;
    vlan5_l3_dhcp_services=0x5, ipoib, defmember=full : 0xe41d2d030061f5fa;
    vlan3_sm=0x3, ipoib, defmember=full : SELF;
    vlan5_sm=0x5, ipoib, defmember=full : SELF;

Going over it line by line:

1. Makes all ports members of the default (management) PKEY.
2. Makes the VM1 port GUID (VF) a member of the network with PKEY 3.
3. Makes the VM2 port GUID (VF) a member of the network with PKEY 5.
4. Makes the L3/DHCP agent PF GUID a member of the network with PKEY 3.
5. Makes the L3/DHCP agent PF GUID a member of the network with PKEY 5.
6. Makes the subnet manager a member of the network with PKEY 3.
7. Makes the subnet manager a member of the network with PKEY 5.
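
The per-GUID membership rules used in the two examples above can be sketched as a small generator. This is an illustration only: the function name guid_partitions and its arguments are hypothetical, the partition names follow the naming seen in the examples, and the entries are grouped per network rather than per role, so the line order differs from Example 2.

```python
MANAGEMENT = (
    "management=0x7fff,ipoib, defmember=full : ALL, "
    "ALL_SWITCHES=full,SELF=full;"
)


def guid_partitions(vf_guids_by_segment, l3_dhcp_pf_guid):
    """Build partitions.conf lines with explicit per-GUID membership.

    vf_guids_by_segment maps a segmentation ID to the VF GUIDs of the
    VMs on that network; l3_dhcp_pf_guid is the PF GUID of the node
    running the L3 and DHCP agents.
    """
    lines = [MANAGEMENT]
    suffix = "ipoib, defmember=full :"
    for seg_id in sorted(vf_guids_by_segment):
        pkey = hex(seg_id)  # segmentation ID maps 1:1 to the PKEY
        prefix = f"vlan{seg_id}"
        # One entry per VM port (VF GUID).
        for n, guid in enumerate(vf_guids_by_segment[seg_id], start=1):
            lines.append(f"{prefix}_vm{n}={pkey}, indx0, {suffix} {guid};")
        # The L3/DHCP agent PF must be a member of every partition.
        lines.append(
            f"{prefix}_l3_dhcp_services={pkey}, {suffix} {l3_dhcp_pf_guid};"
        )
        # So must the subnet manager itself.
        lines.append(f"{prefix}_sm={pkey}, {suffix} SELF;")
    return lines


conf = guid_partitions(
    {3: ["0xfa163e0000c0851b"], 5: ["0xfa163e0000df0519"]},
    l3_dhcp_pf_guid="0xe41d2d030061f5fa",
)
print("\n".join(conf))
```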

Restart OpenSM:
After opensm.conf and partitions.conf have been updated, restart the opensm service to load the new configuration:

    systemctl restart opensmd.service

=Deployment=

The deployment assumes the presence of two physical networks:
 * default - Ethernet network
 * ibphysnet - InfiniBand network

Controller Node
To configure the Controller node:

1. Install python-networking-mlnx package

Neutron Server
1. Make sure ML2 is the current Neutron plugin by checking the core_plugin parameter in /etc/neutron/neutron.conf:
    core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin

2. Modify /etc/neutron/plugins/ml2/ml2_conf.ini by adding the following:
    [ml2]
    type_drivers = vlan,flat
    tenant_network_types = vlan
    mechanism_drivers = mlnx_sdn_assist,mlnx_infiniband
    [ml2_type_vlan]
    network_vlan_ranges = default:1:10
    [sdn]
    bind_normal_ports = true
    bind_normal_ports_physnets = ibphysnet

Note: If the deployment also includes an Ethernet fabric, add the relevant mechanism driver, e.g.:
    mechanism_drivers = mlnx_sdn_assist,mlnx_infiniband,openvswitch

3. Start (or restart) the Neutron server:
    systemctl restart neutron-server.service
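
A quick way to sanity-check the ml2_conf.ini settings from step 2 is to parse the file and confirm that the required mechanism drivers are listed. The sketch below uses Python's standard configparser module on an inline copy of the settings. Neutron itself reads its configuration through oslo.config, so this is only an approximation for simple files, and the function name missing_mechanism_drivers is hypothetical.

```python
import configparser

# Inline copy of the settings from step 2, used here instead of the
# real file so that the sketch runs anywhere.
ML2_CONF = """
[ml2]
type_drivers = vlan,flat
tenant_network_types = vlan
mechanism_drivers = mlnx_sdn_assist,mlnx_infiniband

[ml2_type_vlan]
network_vlan_ranges = default:1:10

[sdn]
bind_normal_ports = true
bind_normal_ports_physnets = ibphysnet
"""


def missing_mechanism_drivers(
    conf_text, required=("mlnx_sdn_assist", "mlnx_infiniband")
):
    """Return the required ML2 mechanism drivers absent from conf_text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    value = parser.get("ml2", "mechanism_drivers")
    configured = [name.strip() for name in value.split(",")]
    return [name for name in required if name not in configured]


print(missing_mechanism_drivers(ML2_CONF))
```

An empty list means both drivers are configured. To check a deployed node, the contents of /etc/neutron/plugins/ml2/ml2_conf.ini could be read and passed in place of ML2_CONF.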

Nova Scheduler
To enable PciPassthroughFilter, modify /etc/nova/nova.conf as follows:
    scheduler_available_filters = nova.scheduler.filters.all_filters
    scheduler_default_filters = RetryFilter, AvailabilityZoneFilter, RamFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, PciPassthroughFilter, NUMATopologyFilter

Network Node

1. Install the python-networking-mlnx package and configure the DHCP and L3 agents.

DHCP Agent
1. Modify /etc/neutron/dhcp_agent.ini as follows:
    dhcp_broadcast_reply = True
    interface_driver = multi
    multi_interface_driver_mappings = default:openvswitch,ibphysnet:ipoib
    ipoib_physical_interface = ib2

Note: multi_interface_driver_mappings maps each physnet to the interface driver to be used for that physnet. In the example above, openvswitch is used for the default physnet and ipoib is used for ibphysnet.

2. Restart the DHCP agent:
    systemctl restart neutron-dhcp-agent.service
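
The format of multi_interface_driver_mappings is a comma-separated list of physnet:driver pairs. The following sketch shows how such a value breaks down; parse_driver_mappings is a hypothetical helper written for illustration and is not the parser that networking-mlnx uses.

```python
def parse_driver_mappings(value):
    """Parse 'physnet:driver,physnet:driver' into a dict."""
    mappings = {}
    for entry in value.split(","):
        # Split each pair on the first colon only.
        physnet, _, driver = entry.strip().partition(":")
        if not physnet or not driver:
            raise ValueError(f"malformed mapping entry: {entry!r}")
        mappings[physnet] = driver
    return mappings


mappings = parse_driver_mappings("default:openvswitch,ibphysnet:ipoib")
print(mappings["ibphysnet"])
```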

L3 Agent
1. Modify /etc/neutron/l3_agent.ini as follows:
    interface_driver = multi
    multi_interface_driver_mappings = default:openvswitch,ibphysnet:ipoib
    ipoib_physical_interface = ib2

Note: multi_interface_driver_mappings maps each physnet to the interface driver to be used for that physnet. In the example above, openvswitch is used for the default physnet and ipoib is used for ibphysnet.

2. Restart the L3 agent:
    systemctl restart neutron-l3-agent.service

Compute Nodes
To configure the Compute Node:

1. Install the python-networking-mlnx package.

For ConnectX®-3/ConnectX®-3Pro perform the following steps:

1. Create the file /etc/modprobe.d/mlx4_ib.conf and add the following:
    options mlx4_ib sm_guid_assign=0

2. Restart the driver:
    /etc/init.d/openibd restart

Nova Compute
Nova-compute needs to know which PCI devices are allowed to be passed through to the VMs. For SR-IOV PCI devices, it also needs to know which physical network each VF belongs to. This is done through the passthrough_whitelist parameter in the [pci] section of /etc/nova/nova.conf. For example, to whitelist and tag the VFs by their PCI address, use the following setting:
    [pci]
    passthrough_whitelist = {"address":"*:0a:00.*","physical_network":"default"}
This associates any VF whose PCI address contains ':0a:00.' with the physical network default.
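
Because the whitelist value is a JSON object, quoting mistakes are easy to make when editing it by hand. One option is to generate the line, as in this minimal sketch; whitelist_entry is a hypothetical helper, and the address and physical network are the ones from the example above.

```python
import json


def whitelist_entry(address, physical_network):
    """Return a passthrough_whitelist line for the [pci] section."""
    # json.dumps guarantees valid JSON quoting for the value.
    value = json.dumps(
        {"address": address, "physical_network": physical_network}
    )
    return "passthrough_whitelist = " + value


line = whitelist_entry("*:0a:00.*", "default")
print(line)
```

The generated value differs from the hand-written example only in whitespace after the colons and commas, which does not change the JSON content.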

1. Add the [pci] passthrough_whitelist setting to /etc/nova/nova.conf.

2. Restart Nova:
    systemctl restart openstack-nova-compute

Neutron MLNX Agent
1. Run:
    systemctl enable neutron-mlnx-agent.service

2. Run:
    systemctl daemon-reload

3. In the file /etc/neutron/plugins/ml2/ml2_conf.ini, the parameters tenant_network_types and network_vlan_ranges should be configured as in the controller node. In addition, the following agent-specific configuration needs to be applied:
    [eswitch]
    physical_interface_mappings = ibphysnet:(for example default:ib0)

4. Modify the file /etc/neutron/plugins/ml2/eswitchd.conf as follows:
    fabrics = ibphysnet: (for example default:ib0)

5. Enable and start eswitchd:
    systemctl enable eswitchd.service
    systemctl start eswitchd.service

6. Start the Neutron agent:
    systemctl restart neutron-mlnx-agent

Limitations
Currently, a deployment is limited to 127 segmented networks. Networks created beyond this limit will not receive DHCP and routing services. This is due to a kernel limitation: a PF net device cannot be a member of more than 128 PKEYs. This limitation is expected to be addressed in the future.

Known issues and Troubleshooting
For known issues and troubleshooting options, refer to Mellanox OpenStack Troubleshooting.

Issue: Missing zmq package on all nodes (Controller/Compute)
Solution:
    wget https://bootstrap.pypa.io/get-pip.py
    sudo python get-pip.py
    sudo pip install pyzmq