Mellanox-Neutron-Train-InfiniBand
Overview
This section discusses the configuration and deployment requirements for VM-to-VM connectivity in an OpenStack cloud over an InfiniBand fabric. An overview of the Mellanox ML2 mechanism drivers can be found at https://wiki.openstack.org/w/index.php?title=Mellanox-Neutron-ML2-Train.
Prerequisites
- CentOS 7.6 / Ubuntu 18.04 or later
- Mellanox ConnectX® family device:
  - ConnectX®-3 / ConnectX®-3 Pro
  - ConnectX®-4 / ConnectX®-4 Lx
  - ConnectX®-5
  - ConnectX®-6
- Driver: Mellanox OFED 4.6-1.0.1.1 or greater
- A running OpenStack environment, installed with RDO Manager or Packstack
- SR-IOV enabled on all compute nodes
- The iproute2 software package installed on all compute nodes
- Mellanox UFM version greater than 5.9.5
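A quick way to verify several of these prerequisites on a node (an illustrative check; ofed_info ships with Mellanox OFED):
# ofed_info -s
# lspci | grep -i mellanox
# ip link show
The first command prints the installed OFED version, the second lists Mellanox PCI devices (VFs appear here once SR-IOV is enabled), and the third confirms iproute2 is present.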
InfiniBand Network
An InfiniBand network relies on a software entity to manage the network. This entity is referred to as a Subnet Manager (SM). The Mellanox Neutron plugin uses InfiniBand partitions (PKeys) to separate networks.
SM Node
OpenSM Provisioning with SDN Mechanism Driver
The SDN mechanism driver allows OpenSM to dynamically assign PKeys in the InfiniBand network.
More details about applying the SDN mechanism driver with Mellanox NEO can be found here.
Manual OpenSM Configuration
For ConnectX®-3/ConnectX®-3 Pro, use the following configuration.
Change the following in /etc/opensm/opensm.conf:
allow_both_pkeys TRUE
For ConnectX®-4 or newer, use the following configuration.
Change the following in /etc/opensm/opensm.conf:
virt_enabled 2
Partition membership configuration
In an InfiniBand network, network segmentation is achieved through partitions (roughly equivalent to Ethernet VLANs). Each partition is identified by its own 15-bit partition key (PKey). The partitions configuration file (/etc/opensm/partitions.conf) contains the partition membership configuration for all network endpoints in the system.
To allow connectivity, the following guidelines must be followed. For each PKey, it is required to:
- Have the OpenSM PF (physical function IB device) GUID be a member of the partition.
- Have the L3 and DHCP PF GUIDs be members of the partition.
- Have the VF GUIDs of the relevant VMs be members of the partition.
There is a 1:1 mapping between the VLAN ID assigned to a network in OpenStack and the PKey it is translated to (e.g., VLAN 10 translates to PKey 10).
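For instance, creating a provider network with segmentation ID 3 (as used in Example 1 below) results in PKey 3 on the fabric. The network name and the default physical network in this sketch are illustrative:
# openstack network create --provider-network-type vlan --provider-physical-network default --provider-segment 3 ib_net_vlan3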
Example 1:
The following configuration corresponds to a single VLAN network (VLAN 3) with two VMs and a DHCP agent:
management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;
vlan3_vm1=0x3, indx0, ipoib, defmember=full : 0xfa163e0000c0851b;
vlan3_vm2=0x3, indx0, ipoib, defmember=full : 0xfa163e0000df0519;
vlan3_l3_dhcp_services=0x3, ipoib, defmember=full : 0xe41d2d030061f5fa;
vlan3_sm=0x3, ipoib, defmember=full : SELF;
Going over it line by line:
- All endpoints are members of the default (management) PKey.
- The VM1 port GUID (VF) is a member of the network with PKey 3.
- The VM2 port GUID (VF) is a member of the network with PKey 3.
- The L3/DHCP agent PF GUID is a member of the network with PKey 3.
- The Subnet Manager is a member of the network with PKey 3.
Example 2:
The following configuration corresponds to two VLAN networks (VLAN 3 and VLAN 5) with one VM on each network and a DHCP and L3 agent:
management=0x7fff,ipoib, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;
vlan3_vm1=0x3, indx0, ipoib, defmember=full : 0xfa163e0000c0851b;
vlan5_vm2=0x5, indx0, ipoib, defmember=full : 0xfa163e0000df0519;
vlan3_l3_dhcp_services=0x3, ipoib, defmember=full : 0xe41d2d030061f5fa;
vlan5_l3_dhcp_services=0x5, ipoib, defmember=full : 0xe41d2d030061f5fa;
vlan3_sm=0x3, ipoib, defmember=full : SELF;
vlan5_sm=0x5, ipoib, defmember=full : SELF;
Going over it line by line:
- All endpoints are members of the default (management) PKey.
- The VM1 port GUID (VF) is a member of the network with PKey 3.
- The VM2 port GUID (VF) is a member of the network with PKey 5.
- The L3/DHCP agent PF GUID is a member of the network with PKey 3.
- The L3/DHCP agent PF GUID is a member of the network with PKey 5.
- The Subnet Manager is a member of the network with PKey 3.
- The Subnet Manager is a member of the network with PKey 5.
Restart OpenSM:
After opensm.conf and partitions.conf have been updated, restart the opensmd service to load the new configuration:
# systemctl restart opensmd.service
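Optionally, confirm that the service restarted cleanly (the exact log output varies by OpenSM version):
# systemctl status opensmd.service
# journalctl -u opensmd.service -n 20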
Controller Node
To configure the Controller node:
1. Install the python-networking-mlnx package.
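On CentOS, for example (the same package is installed on the compute nodes later in this guide; on Ubuntu, install the distribution's equivalent package):
# yum install --nogpgcheck -y python-networking-mlnx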
Neutron Server
1. Make sure ML2 is the current Neutron plugin by checking the core_plugin parameter in /etc/neutron/neutron.conf:
core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
2. Make sure /etc/neutron/plugin.ini is pointing to /etc/neutron/plugins/ml2/ml2_conf.ini (symbolic link)
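If the link does not exist, it can be created as follows:
# ln -s /etc/neutron/plugins/ml2/ml2_conf.ini /etc/neutron/plugin.ini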
3. Modify /etc/neutron/plugins/ml2/ml2_conf.ini by adding the following:
[ml2]
type_drivers = vlan,flat
tenant_network_types = vlan
mechanism_drivers = mlnx_sdn_assist,mlnx_infiniband,openvswitch
# or: mechanism_drivers = mlnx_sdn_assist,mlnx_infiniband,linuxbridge

[ml2_type_vlan]
network_vlan_ranges = default:1:10
[sdn]
bind_normal_ports = true
bind_normal_ports_physnets = ibphysnet
4. Start (or restart) the Neutron server:
# systemctl restart neutron-server.service
Nova Scheduler
To enable the PciPassthroughFilter, modify /etc/nova/nova.conf as follows:
scheduler_available_filters = nova.scheduler.filters.all_filters
scheduler_default_filters = RetryFilter, AvailabilityZoneFilter, RamFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, PciPassthroughFilter
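After editing, restart the scheduler so the filter changes take effect (the service name below is the RDO/CentOS one and may differ on other distributions):
# systemctl restart openstack-nova-scheduler.service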
Network Node
- Install the python-networking-mlnx package and configure the DHCP and L3 agents.
- Install an up-to-date pyroute2 package from the master branch (release TBD); see the sketch below.
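One way to install pyroute2 from master, assuming the upstream GitHub repository (an illustrative sketch, not an official package):
# pip install git+https://github.com/svinota/pyroute2.git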
DHCP Agent (usually part of the Network node)
1. Modify /etc/neutron/dhcp_agent.ini as follows, setting the interface driver according to whether OVS or Linux Bridge is used:
dhcp_driver = networking_mlnx.dhcp.mlnx_dhcp.MlnxDnsmasq
dhcp_broadcast_reply = True
1.1 For OVS:
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
1.2 For Linux Bridge:
interface_driver = neutron.agent.linux.interface.BridgeInterfaceDriver
2. Restart the DHCP agent:
# systemctl restart neutron-dhcp-agent.service
L3 Agent (usually part of the Network node)
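A minimal sketch for configuring the L3 agent, assuming the interface driver mirrors the OVS or Linux Bridge choice made for the DHCP agent above:
1. Modify /etc/neutron/l3_agent.ini:
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
2. Restart the L3 agent:
# systemctl restart neutron-l3-agent.service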
Compute Nodes
To configure the Compute Node:
1. Install Mellanox RPMs:
# yum install --nogpgcheck -y python-networking-mlnx
2. Create the file /etc/modprobe.d/mlx4_ib.conf and add the following:
options mlx4_ib sm_guid_assign=0
3. Restart the driver:
# /etc/init.d/openibd restart
Nova Compute
Nova compute needs to know which PCI devices are allowed to be passed through to the VMs. For SR-IOV PCI devices, it also needs to know to which physical network each VF belongs. This is configured with the passthrough_whitelist parameter in the [pci] section of /etc/nova/nova.conf. For example, to whitelist and tag VFs by their PCI address:

[pci]
passthrough_whitelist = {"address":"*:0a:00.*","physical_network":"default"}

This associates any VF whose PCI address matches '*:0a:00.*' with the physical network named default.
1. Add the passthrough_whitelist setting to /etc/nova/nova.conf as shown above.
2. Restart Nova:
# systemctl restart openstack-nova-compute
Neutron MLNX Agent
1. Run:
# systemctl enable neutron-mlnx-agent.service
# systemctl start neutron-mlnx-agent.service
2. Run:
# systemctl daemon-reload
3. In the file /etc/neutron/plugins/mlnx/mlnx_conf.ini, the parameters tenant_network_type and network_vlan_ranges should be configured with the same values as on the controller. In addition, set the physical interface mapping:
physical_interface_mappings = default:<ib_interface> (for example, default:ib0)
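Putting those parameters together, a minimal mlnx_conf.ini consistent with the controller settings shown earlier might look like this (the values below are illustrative):

tenant_network_type = vlan
network_vlan_ranges = default:1:10
physical_interface_mappings = default:ib0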
4. Modify the file /etc/neutron/plugins/ml2/eswitchd.conf as follows:
fabrics = default:<ib_interface> (for example default:ib0)
5. Start eSwitchd:
# systemctl enable eswitchd.service
# systemctl start eswitchd.service
6. Start the Neutron agent:
# systemctl restart neutron-mlnx-agent
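To verify that the eSwitch daemon and the agent are running and that the agent registered with Neutron (run with admin credentials loaded):
# systemctl status eswitchd.service neutron-mlnx-agent.service
# openstack network agent list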
Known issues and Troubleshooting
For known issues and troubleshooting options, refer to Mellanox OpenStack Troubleshooting.
Issue: Missing zmq package on all nodes (Controller/Compute).
Solution:
# wget https://bootstrap.pypa.io/get-pip.py
# sudo python get-pip.py
# sudo pip install pyzmq