Mellanox-Neutron-Train-InfiniBand

Overview

This section discusses the configuration and deployment requirements for VM-to-VM connectivity over an InfiniBand fabric in an OpenStack cloud. An overview of the Mellanox ML2 mechanism drivers can be found here.

The supported network type is VLAN. Segmentation is achieved by configuring a Partition Key (PKEY) per network, a concept similar to an Ethernet VLAN.

Prerequisites

  • CentOS 7.6 / Ubuntu 18.04 or later
  • Mellanox ConnectX® Family device:
   ConnectX®-3/ConnectX®-3 Pro
   ConnectX®-4/ConnectX®-4 Lx
   ConnectX®-5
   ConnectX®-6
  • Driver: Mellanox OFED 4.6-1.0.1.1 or greater
  • A running OpenStack environment (installed with RDO Manager or Packstack)
  • SR-IOV enabled on all compute nodes
  • The iproute2 software package installed on all compute nodes
  • Mellanox UFM greater than 5.9.5 (if mlnx_sdn_assist is used for PKEY configuration)

InfiniBand Network

An InfiniBand network relies on a software entity, referred to as a Subnet Manager (SM), to manage the network. The SM can run on a dedicated node or as part of the controller node; this document assumes the former. As mentioned earlier, segmentation is achieved through PKEY configuration done by the SM.

SM Node

OpenSM is a common implementation of an InfiniBand SM.

OpenSM can be configured in two ways:

OpenSM Provisioning with mlnx_sdn_assist Mechanism Driver

The SDN mechanism driver allows OpenSM to dynamically assign PKEYs in the InfiniBand network. More details about applying the SDN mechanism driver with NEO can be found here.

Manual OpenSM Configuration

For development and feature evaluation, it is possible to disable the mlnx_sdn_assist sync with a network management endpoint and perform the configuration out of band.

Install the OpenSM that comes bundled with Mellanox OFED 4.6-1.0.1.1 or greater.

Note: to create the OpenSM configuration file, run:

# opensm --create-config /etc/opensm/opensm.conf

To disable the mlnx_sdn_assist sync with the SDN controller, set the following configuration option in /etc/neutron/plugins/ml2/ml2_conf.ini:

[sdn]
enable_sync=false

For ConnectX®-3/ConnectX®-3 Pro use the following configuration

Change the following in /etc/opensm/opensm.conf:

  allow_both_pkeys TRUE
Partition membership configuration

In an InfiniBand network, segmentation is achieved through partitions (roughly equivalent to Ethernet VLANs). Each partition has its own key (a 15-bit partition key). The partitions.conf file contains the partition membership configuration for all network endpoints in the system.

Each PKEY is required to have all GUIDs as members.

There is a 1:1 mapping between the segmentation ID assigned to a network in OpenStack and the PKEY (e.g. segmentation ID 10 translates to PKEY 10).

Example 1:

The following configuration supports three networks on VLANs 3, 4 and 5.

management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;
vlan3=0x3, ipoib, defmember=full : ALL;
vlan4=0x4, ipoib, defmember=full : ALL;
vlan5=0x5, ipoib, defmember=full : ALL;
untagged=0xfff, ipoib, defmember=full : ALL; # Required for port cleanup
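
To verify that the partitions were applied after OpenSM reloads its configuration, the PKEY table of a host port can be inspected through sysfs. A minimal check, assuming the device is exposed as mlx5_0 (use mlx4_0 for ConnectX®-3; the device and port number vary per host):

 # print every entry of the PKEY table of port 1
 # cat /sys/class/infiniband/mlx5_0/ports/1/pkeys/*

Full-membership keys are reported with the high bit set, so PKEY 0x3 appears as 0x8003 and the default partition as 0xffff.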

For ConnectX®-4 or newer use the following configuration

Change the following in /etc/opensm/opensm.conf:

virt_enabled 2
Partition membership configuration

In an InfiniBand network, segmentation is achieved through partitions (roughly equivalent to Ethernet VLANs). Each partition has its own key (a 15-bit partition key). The partitions.conf file contains the partition membership configuration for all network endpoints in the system.

The following guidelines need to be followed to allow connectivity.

For each PKEY it is required to:

  1. Have the OpenSM PF (physical function IB device) GUID be a member of the partition.
  2. Have the L3 and DHCP PF GUIDs be members of the partition.
  3. Have the VF GUIDs of the relevant VMs be members of the partition.

There is a 1:1 mapping between the segmentation ID assigned to a network in OpenStack and the PKEY (e.g. segmentation ID 10 translates to PKEY 10).
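
For illustration, a hedged sketch of creating a network whose segmentation ID 3 will map to PKEY 0x3, using the standard OpenStack CLI (the network name ib_net is a placeholder; ibphysnet matches the deployment below):

 # openstack network create --provider-network-type vlan \
     --provider-physical-network ibphysnet --provider-segment 3 ib_net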

Example 1:

The following configuration corresponds to a single VLAN network (VLAN 3) with two VMs and a DHCP agent:

management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;
vlan3_vm1=0x3, indx0, ipoib, defmember=full : 0xfa163e0000c0851b;
vlan3_vm2=0x3, indx0, ipoib, defmember=full : 0xfa163e0000df0519;
vlan3_l3_dhcp_services=0x3, ipoib, defmember=full : 0xe41d2d030061f5fa;
vlan3_sm=0x3, ipoib, defmember=full : SELF;

Going over it line by line:

  1. Make all endpoints members of the default (management) PKEY.
  2. VM1's port GUID (VF) is a member of the network with PKEY 3.
  3. VM2's port GUID (VF) is a member of the network with PKEY 3.
  4. The L3/DHCP agent PF GUID is a member of the network with PKEY 3.
  5. The Subnet Manager is a member of the network with PKEY 3.
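
The GUIDs used in such files can be collected as follows: the PF port GUID of a node is reported by ibstat (part of infiniband-diags), and the VF GUIDs in this example appear to embed the Neutron port MAC address (0xfa163e0000c0851b for MAC fa:16:3e:c0:85:1b) — treat that pattern as an observation from the example, not a guarantee. The output below is illustrative:

 # ibstat | grep "Port GUID"
 Port GUID: 0xe41d2d030061f5fa
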
Example 2:

The following configuration corresponds to two VLAN networks (VLAN 3 and VLAN 5) with one VM on each network and a DHCP and L3 agent:

management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;
vlan3_vm1=0x3, indx0, ipoib, defmember=full : 0xfa163e0000c0851b;
vlan5_vm2=0x5, indx0, ipoib, defmember=full : 0xfa163e0000df0519;
vlan3_l3_dhcp_services=0x3, ipoib, defmember=full : 0xe41d2d030061f5fa;
vlan5_l3_dhcp_services=0x5, ipoib, defmember=full : 0xe41d2d030061f5fa;
vlan3_sm=0x3, ipoib, defmember=full : SELF;
vlan5_sm=0x5, ipoib, defmember=full : SELF;

Going over it line by line:

  1. Make all endpoints members of the default (management) PKEY.
  2. VM1's port GUID (VF) is a member of the network with PKEY 3.
  3. VM2's port GUID (VF) is a member of the network with PKEY 5.
  4. The L3/DHCP agent PF GUID is a member of the network with PKEY 3.
  5. The L3/DHCP agent PF GUID is a member of the network with PKEY 5.
  6. The Subnet Manager is a member of the network with PKEY 3.
  7. The Subnet Manager is a member of the network with PKEY 5.

Restart the OpenSM:

After opensm.conf and partitions.conf have been updated, restart the opensm service to load the new configuration:

# systemctl restart opensmd.service
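
To confirm the subnet manager is back up, sminfo (also from infiniband-diags) can be run from any node on the fabric; the output below is illustrative:

 # sminfo
 sminfo: sm lid 1 sm guid 0xe41d2d030061f5fa, activity count 12345 priority 15 state 3 SMINFO_MASTER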

Deployment

The deployment assumes the presence of two physical networks:

  1. default - Ethernet network
  2. ibphysnet - InfiniBand network

Controller Node

To configure the Controller node:

1. Install python-networking-mlnx package

Neutron Server

1. Make sure ML2 is the current Neutron plugin by checking the core_plugin parameter in /etc/neutron/neutron.conf:

core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin

2. Modify /etc/neutron/plugins/ml2/ml2_conf.ini by adding the following:

[ml2]
type_drivers = vlan,flat
tenant_network_types = vlan
mechanism_drivers = mlnx_sdn_assist,mlnx_infiniband
[ml2_type_vlan]
network_vlan_ranges = default:1:10
[sdn]
bind_normal_ports = true
bind_normal_ports_physnets = ibphysnet

Note: if the deployment includes an Ethernet fabric as well, add the relevant mechanism driver, e.g.:

 mechanism_drivers = mlnx_sdn_assist,mlnx_infiniband,openvswitch

3. Start (or restart) the Neutron server:

# systemctl restart neutron-server.service
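
With the server back up, a typical workflow for attaching a VM to the InfiniBand physnet looks roughly like the following sketch (the network/subnet/port/server names, the CIDR, and the flavor/image are placeholders; --vnic-type direct requests an SR-IOV VF):

 # openstack network create --provider-network-type vlan \
     --provider-physical-network ibphysnet --provider-segment 3 ib_net
 # openstack subnet create --network ib_net --subnet-range 192.0.2.0/24 ib_subnet
 # openstack port create --network ib_net --vnic-type direct ib_port
 # openstack server create --flavor m1.small --image centos7 --nic port-id=ib_port vm1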

Nova Scheduler

To enable the PciPassthroughFilter, modify /etc/nova/nova.conf:

 scheduler_available_filters = nova.scheduler.filters.all_filters
 scheduler_default_filters = RetryFilter, AvailabilityZoneFilter, RamFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, PciPassthroughFilter, NUMATopologyFilter
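
Note that these legacy option names are deprecated in recent Nova releases (including Train); if they are not honored, the likely equivalent lives under the [filter_scheduler] section:

 [filter_scheduler]
 available_filters = nova.scheduler.filters.all_filters
 enabled_filters = RetryFilter, AvailabilityZoneFilter, RamFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, PciPassthroughFilter, NUMATopologyFilter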

Network Node

  1. Install python-networking-mlnx package and configure DHCP and L3 agents

DHCP Agent

1. Modify /etc/neutron/dhcp_agent.ini as follows:

dhcp_broadcast_reply = True
interface_driver = multi
multi_interface_driver_mappings = default:openvswitch,ibphysnet:ipoib
ipoib_physical_interface = ib2

Note: multi_interface_driver_mappings contains the mapping between physnet and the desired interface driver to be used for that physnet. In the example above, openvswitch is used for the default physnet and ipoib is used for ibphysnet.

2. Restart DHCP agent:

# systemctl restart neutron-dhcp-agent.service
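
Once a network and subnet exist on ibphysnet, a quick sanity check is that the DHCP port shows up as an IPoIB interface inside the network's qdhcp namespace (the network UUID below is a placeholder):

 # ip netns list | grep qdhcp
 # ip netns exec qdhcp-<network-uuid> ip addr show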

L3 Agent

1. Modify /etc/neutron/l3_agent.ini as follows:

interface_driver = multi
multi_interface_driver_mappings = default:openvswitch,ibphysnet:ipoib
ipoib_physical_interface = ib2

Note: multi_interface_driver_mappings contains the mapping between physnet and the desired interface driver to be used for that physnet. In the example above, openvswitch is used for the default physnet and ipoib is used for ibphysnet.

2. Restart L3 agent:

# systemctl restart neutron-l3-agent.service

Compute Nodes

To configure the Compute Node:

  1. Install python-networking-mlnx package


For ConnectX®-3/ConnectX®-3 Pro, perform the following steps:

1. Create the file /etc/modprobe.d/mlx4_ib.conf and add the following:

options mlx4_ib sm_guid_assign=0

2. Restart the driver:

# /etc/init.d/openibd restart
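
To confirm the option took effect once mlx4_ib is reloaded, the module parameter can be read back from sysfs; the expected value is 0:

 # cat /sys/module/mlx4_ib/parameters/sm_guid_assign
 0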

Nova Compute

Nova-compute needs to know which PCI devices are allowed to be passed through to the VMs. For SR-IOV PCI devices it also needs to know to which physical network the VFs belong. This is done through the passthrough_whitelist parameter under the [pci] section in /etc/nova/nova.conf. For example, to whitelist and tag the VFs by their PCI address, use the following setting:

 [pci]
 passthrough_whitelist = {"address":"*:0a:00.*","physical_network":"default"}

This associates any VF whose PCI address matches ':0a:00.' with the physical network default.
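
Devices can also be whitelisted by netdev name instead of PCI address; a sketch, assuming the IB PF is ib0 and should be tagged with the ibphysnet physnet (check your Nova release's documentation for how devname matching treats the VFs of a PF):

 [pci]
 passthrough_whitelist = {"devname": "ib0", "physical_network": "ibphysnet"}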

1. Add the passthrough_whitelist setting to /etc/nova/nova.conf

2. Restart Nova:

# systemctl restart openstack-nova-compute

Neutron MLNX Agent

1. Run:

# systemctl enable neutron-mlnx-agent.service

2. Run:

# systemctl daemon-reload

3. In the file /etc/neutron/plugins/ml2/ml2_conf.ini, the parameters tenant_network_types and network_vlan_ranges should be configured as in the controller node. In addition, the following agent-specific configuration needs to be applied:

[eswitch]
physical_interface_mappings = ibphysnet:<ib_interface> (for example ibphysnet:ib0)

4. Modify the file /etc/neutron/plugins/ml2/eswitchd.conf as follows:

fabrics = ibphysnet:<ib_interface> (for example ibphysnet:ib0)

5. Start eswitchd:

# systemctl enable eswitchd.service
# systemctl start eswitchd.service

6. Start the Neutron agent:

# systemctl restart neutron-mlnx-agent
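
After the restart, the agent should be reported as alive by the Neutron server; a quick check from any node with the OpenStack client:

 # openstack network agent list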

Limitations

Currently, a deployment is limited to 127 segmented networks; additional networks will not be able to get DHCP and routing services. This is due to a kernel limitation where a PF net device cannot be a member of more than 128 PKEYs. This limitation may be addressed in the future.

Known issues and Troubleshooting

For known issues and troubleshooting options, refer to Mellanox OpenStack Troubleshooting.

Issue: Missing zmq package on all nodes (Controller/Compute).

Solution:

 # wget https://bootstrap.pypa.io/get-pip.py
 # sudo python get-pip.py
 # sudo pip install pyzmq