Mellanox Technologies
=====================
===============================================================================
Ethernet over IB (EoIB) for Linux README
January 2013
Document No. 3289
===============================================================================
Contents:
=========
1. Overview
1.1 General
1.2 EoIB Topology
1.2.1 External Ports (eports) and GW
1.2.2 Virtual Hubs (vHubs)
1.2.3 Virtual NIC (vNic)
2. EoIB vNic Configuration
2.1 EoIB Host Administered vNic
2.1.1 Central Configuration File - mlx4_vnic.conf
2.1.2 vNic Specific Configuration Files - ifcfg-eth<x>
2.1.3 mlx4_vnic_confd
2.2 EoIB Network Administered vNic
2.3 VLAN Configuration
2.4 EoIB Multicast Configuration
2.5 EoIB and QoS
2.6 IP Configuration Based on DHCP
2.6.1 DHCP Server
2.7 Static EoIB Configuration
2.8 Sub Interfaces (VLAN)
3. EoIB Usage and Configuration
3.1 mlx4_vnic_info
3.2 ethtool
3.3 Link State
3.4 Offloads and Feature
3.5 Jumbo Frames
3.6 Discovery Partitions Configuration
3.7 ALL VLAN
4. Advanced EoIB settings:
4.1 Module Parameters
4.2 Bonding
4.3 vNic Interface Naming
4.4 Para-Virtualized vNic
4.5 EoIB Subnet Agent Query
1 Overview
==========
1.1 General
-----------
The Ethernet over IB (EoIB) mlx4_vnic module is a network interface
implementation over InfiniBand. EoIB encapsulates Layer 2 datagrams over an
InfiniBand Unreliable Datagram (UD) transport service. The InfiniBand UD
datagrams encapsulate the entire Ethernet L2 datagram and its payload.
To perform this operation, the module translates Ethernet layer 2 MAC
addresses (48 bits long) to InfiniBand layer 2 addresses made of LID/GID and
QPN. This translation is completely invisible to the OS and the user. This
distinguishes EoIB from IPoIB, which exposes a 20-byte HW address to the OS.
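The translation described above can be pictured as a lookup table keyed by
MAC address. The following Python sketch is conceptual only and does not
reflect the driver's actual data structures; the names IbAddress,
TranslationTable, learn and lookup, and the sample LID and QPN values, are
hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IbAddress:
    """InfiniBand layer 2 address of a peer (hypothetical type)."""
    lid: int                   # local identifier
    qpn: int                   # queue pair number
    gid: Optional[str] = None  # global identifier, if one is used


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6-byte (48-bit) form."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return raw


class TranslationTable:
    """Maps Ethernet MAC addresses to InfiniBand addresses."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, IbAddress] = {}

    def learn(self, mac: str, addr: IbAddress) -> None:
        # Record (or overwrite) the InfiniBand address for this MAC.
        self._entries[mac_to_bytes(mac)] = addr

    def lookup(self, mac: str) -> Optional[IbAddress]:
        # Return None when the MAC address is not known.
        return self._entries.get(mac_to_bytes(mac))


table = TranslationTable()
table.learn("00:25:8B:27:14:78", IbAddress(lid=0x12, qpn=0x4A))
print(table.lookup("00:25:8b:27:14:78"))
```

The OS only ever sees the 48-bit MAC address used as the key; the LID/GID
and QPN values stay inside the table.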
The mlx4_vnic module is designed for Mellanox's ConnectX family of HCAs and
intended to be used with Mellanox's BridgeX gateway family. Having a BridgeX
gateway is a requirement for using EoIB. The gateway performs the following operations:
* Enables the layer 2 address translation required by the mlx4_vnic module.
* Enables routing of packets from the InfiniBand fabric to a 1 or 10 GigE
Ethernet subnet.
1.2 EoIB Topology
-----------------
EoIB is designed to work over an InfiniBand fabric and requires the presence
of two entities:
* Subnet Manager (SM)
* BridgeX gateway
The required subnet manager configuration is similar to that of other
InfiniBand applications and ULPs and is not unique to EoIB.
The BridgeX gateway is at the heart of EoIB. On one side, usually referred to
as the "internal" side, it is connected to the InfiniBand fabric by one or more
links. On the other side, usually referred to as the "external" side, it is
connected to the Ethernet subnet by one or more ports. The Ethernet connections
on the BridgeX's external side are called external ports or eports. Every
BridgeX that is in use with EoIB needs to have one or more eports connected.
1.2.1 External Ports (eports) and GW
---------------------------------------
The combination of a specific BridgeX box and a specific eport is referred to
as a gateway (GW). The GW is an entity that is visible to the EoIB host driver
and is used in the configuration of the network interfaces on the host side.
For example, in host administered vNics the user will request to open an
interface on a specific GW identifying it by the BridgeX box and eport name.
Distinguishing between GWs is important because they determine the network
topology and affect the path that a packet traverses between hosts. A packet
that is sent from the host on a specific EoIB interface will be routed to the
Ethernet subnet through a specific external port connection on the BridgeX box.
1.2.2 Virtual Hubs (vHubs)
---------------------------------------
A virtual hub (vHub) connects zero or more EoIB interfaces (on internal hosts)
and an eport. Each vHub on a GW, other than the default vHub, has a unique
virtual LAN (VLAN) ID.
Virtual hub participants can send packets to one another directly without the
assistance of the Ethernet subnet (external side) routing. This means that two
EoIB interfaces on the same vHub will communicate solely using the InfiniBand
fabric. EoIB interfaces residing on two different vHubs (whether on the same
GW or not) cannot communicate directly.
There are two types of vHubs:
- a default vHub (one per GW) without a VLAN ID
- vHubs with unique VLAN IDs
Each vHub belongs to a specific GW (BridgeX + eport). Each GW has one default
vHub and zero or more VLAN-associated vHubs, which are distinguished by their
unique VLAN IDs. Traffic coming from the Ethernet side on a specific eport is
routed to the relevant vHub based on its VLAN tag (or to the default vHub for
that GW if no VLAN tag is present).
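The vHub selection described above can be sketched as a lookup keyed by GW
and VLAN ID. This Python fragment is a conceptual illustration, not the
BridgeX or driver implementation; the function name select_vhub and the
sample GW and vHub values are hypothetical.

```python
from typing import Dict, Optional, Tuple

# A GW is a (BridgeX box, eport) pair; the values below are illustrative.
Gateway = Tuple[str, str]

# vHubs keyed by (GW, VLAN ID). A VLAN ID of None marks the default vHub.
vhubs: Dict[Tuple[Gateway, Optional[int]], str] = {
    (("BX001", "A10"), None): "default vHub on BX001/A10",
    (("BX001", "A10"), 3): "VLAN 3 vHub on BX001/A10",
    (("BX001", "A11"), 3): "VLAN 3 vHub on BX001/A11",
}


def select_vhub(gw: Gateway, vlan_id: Optional[int]) -> Optional[str]:
    """Pick the vHub for traffic arriving on a GW's eport.

    Untagged traffic (vlan_id is None) maps to the GW's default vHub.
    Returns None when no matching vHub exists; how such traffic is
    handled is not specified here.
    """
    return vhubs.get((gw, vlan_id))


print(select_vhub(("BX001", "A10"), 3))
print(select_vhub(("BX001", "A10"), None))
```

Because the key includes the GW, the same VLAN ID on two different GWs
selects two different vHubs.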
1.2.3 Virtual NIC (vNic)
--------------------------------------
A virtual NIC is a network interface instance on the host side which belongs
to a single vHub on a specific GW. The vNic behaves like
any regular hardware network interface.
The host can have multiple interfaces that belong to the same vHub.
2 EoIB vNic Configuration
=========================
The mlx4_vnic module supports two different modes of configuration:
- host administration, where the vNic is configured on the host side
- network administration, where the configuration is done on the BridgeX
and passed to the host mlx4_vnic driver using the EoIB protocol
Both modes of operation require the presence of a BridgeX gateway
in order to work properly. The EoIB driver supports a mixture
of host and network administered vNics.
2.1 EoIB Host Administered vNic
-------------------------------
In the host administered mode, vNics are configured using static configuration
files located on the host side. These configuration files define the number of
vNics, and the vHub that each host administered vNic will belong to (i.e., the
vNic's BridgeX box, eport and VLAN id properties). The mlx4_vnic_confd service
is used to read these configuration files and pass the relevant data to the
mlx4_vnic module.
EoIB Host Administered vNic supports two forms of configuration files:
- A central configuration file (mlx4_vnic.conf)
- vNic-specific configuration files (ifcfg-eth<x>)
Both forms of configuration supply the same functionality. If both forms of
configuration files exist, the central configuration file has precedence and
only this file will be used.
2.1.1 Central Configuration File - /etc/infiniband/mlx4_vnic.conf
----------------------------------------------------------------------
The mlx4_vnic.conf file consists of lines, each describing one vNic. The
following file format is used:
name=eth44 mac=00:25:8B:27:14:78 ib_port=mlx4_0:1 vid=3 vnic_id=5 bx=00:00:00:00:00:00:04:B2 eport=A10
name=eth45 mac=00:25:8B:27:15:78 ib_port=mlx4_0:1 vnic_id=6 bx=00:00:00:00:00:00:05:B2 eport=A10
name=eth47 mac=00:25:8B:27:16:84 ib_port=mlx4_0:1 vid=2 vnic_id=7 bx=BX001 eport=A11
name=eth40 mac=02:AA:8B:27:17:93 ib_port=mlx4_0:2 vnic_id=8 bx=BX001 eport=A12
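Each line in the format above is a sequence of whitespace-separated
key=value pairs, so it can be read with a few lines of code. The following
Python sketch is illustrative only and is not part of the mlx4_vnic package.
The function names parse_vnic_line and parse_vnic_conf are hypothetical, and
the sketch assumes that blank lines may be skipped, which this README does
not state.

```python
from typing import Dict, List

# Two of the sample lines shown above.
SAMPLE = """\
name=eth44 mac=00:25:8B:27:14:78 ib_port=mlx4_0:1 vid=3 vnic_id=5 bx=00:00:00:00:00:00:04:B2 eport=A10
name=eth40 mac=02:AA:8B:27:17:93 ib_port=mlx4_0:2 vnic_id=8 bx=BX001 eport=A12
"""


def parse_vnic_line(line: str) -> Dict[str, str]:
    """Split one vNic line into a dict of its key=value fields."""
    fields: Dict[str, str] = {}
    for token in line.split():
        # Split on the first '=' only; values such as MAC addresses
        # contain ':' but no '='.
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        fields[key] = value
    return fields


def parse_vnic_conf(text: str) -> List[Dict[str, str]]:
    """Parse the whole file text, one vNic per non-blank line."""
    vnics = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        vnics.append(parse_vnic_line(line))
    return vnics


vnics = parse_vnic_conf(SAMPLE)
for vnic in vnics:
    # ib_port has the form [device name]:[port number].
    device, _, port = vnic["ib_port"].partition(":")
    print(vnic["name"], device, port, vnic.get("vid", "no VLAN"))
```

The vid field is read with a default because it does not appear on every
line of the example.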
The fields used in the file have the following meaning:
name - The name of the interface that is displayed when running ifconfig.
mac - The MAC address to assign to the vNic.
ib_port - The device name and port number in the form
[device name]:[port number]. The device name can be retrieved by
running ibv_devinfo and using the output of hca_id field. The port
number can have a value of 1 or 2.
vid - [Optional] If a VLAN ID is specified, the vNic will be assigned
that VLAN ID. This value must be between 0 and 4095.
- If the vid is set to 'all', the ALL-VLAN mode will be enabled and