iRDMA Flow Control Summary
1.0 Introduction
This document introduces Ethernet flow control on Intel® Ethernet 800 Series Network Adapters with the RDMA driver, iRDMA, with a focus on best practices for Linux RDMA traffic.
It includes:
• Background on Ethernet flow control (FC) and Data Center Bridging (DCB).
• Differences between Link-level Flow Control (LFC) and Priority Flow Control (PFC).
• Configuration steps for each type on 800 Series Linux hosts.
• Verification tips.
1.1 QoS/Flow Control Limitations on the 800 Series
• Although the 800 Series hardware supports eight Traffic Classes (TCs), the maximum supported configuration is four TCs per port. Only one TC per port can have Priority Flow Control enabled.
| Number of Adapter Ports | Traffic Class Recommendation | RDMA |
|---|---|---|
| 1, 2, or 4 | Up to four TCs, with one of them enabled with PFC. | Supported |
| More than 4 | No DCB Support | Not Supported |
• In RoCEv2 mode, if no flow control is detected (either LFC or PFC), the driver automatically de-tunes. This is an intentional design that allows RoCEv2 to operate without flow control, but with lower performance.
• When the 800 Series is in firmware Link Layer Discovery Protocol (LLDP) mode, only three application priorities are supported. Software LLDP supports 32. This refers to the LLDP APP TLV; see man lldptool-app for more information.
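The limits above can be checked before a DCB layout is applied. The following is a minimal sketch, not a tool shipped with the driver: the `PortPlan` class and `check_plan` function are hypothetical names introduced for illustration, and the numeric limits are taken directly from the bullets and table above.

```python
from dataclasses import dataclass
from typing import List

# Limits as stated in section 1.1 of this guide.
MAX_TCS_PER_PORT = 4        # maximum supported TCs per port
MAX_PFC_TCS_PER_PORT = 1    # only one TC per port may have PFC enabled
MAX_PORTS_WITH_DCB = 4      # adapters with more than 4 ports: no DCB support


@dataclass
class PortPlan:
    """Hypothetical description of a planned per-port DCB layout."""
    adapter_ports: int      # number of ports on the adapter
    traffic_classes: int    # TCs planned on this port
    pfc_enabled_tcs: int    # how many of those TCs have PFC enabled


def check_plan(plan: PortPlan) -> List[str]:
    """Return a list of violated limits (an empty list means the plan fits)."""
    problems = []
    if plan.adapter_ports > MAX_PORTS_WITH_DCB:
        problems.append("adapters with more than 4 ports have no DCB support")
    if plan.traffic_classes > MAX_TCS_PER_PORT:
        problems.append("more than 4 traffic classes on one port")
    if plan.pfc_enabled_tcs > MAX_PFC_TCS_PER_PORT:
        problems.append("more than one PFC-enabled traffic class on one port")
    return problems


if __name__ == "__main__":
    print(check_plan(PortPlan(adapter_ports=2, traffic_classes=4, pfc_enabled_tcs=1)))
    print(check_plan(PortPlan(adapter_ports=8, traffic_classes=5, pfc_enabled_tcs=2)))
```

An empty result only means the plan respects the limits listed here; it says nothing about whether the switch side is configured to match.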
2.0 Background
2.1 Ethernet Flow Control
By design, Ethernet is an unreliable protocol with no guarantee that packets arrive at their destination correctly and in order. Instead, Ethernet relies on upper-layer protocols (such as TCP) or applications to provide reliable service and error correction.
The 802.3x standard introduced flow control to the Ethernet protocol, defining a mechanism for throttling the flow of data between two directly connected full-duplex network devices. If the sender transmits data faster than the receiver can accept it, the overwhelmed receiver can send a pause signal (Xoff, or transmit off) to the sender, requesting that the sender stop transmitting data for a specified period of time. The sender resumes transmission either after the timeout period expires or when the receiver indicates that it is ready to accept more data by sending an Xon (transmit on) signal.
Without flow control, data might be lost or need to be re-transmitted by an upper-layer protocol (ULP) or application, which can significantly affect performance.
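To make the Xoff/Xon mechanism concrete, the sketch below builds an 802.3x PAUSE frame and converts its pause time into seconds. The frame constants (destination address 01-80-C2-00-00-01, EtherType 0x8808, opcode 0x0001, pause time counted in quanta of 512 bit times) come from the IEEE MAC Control definition rather than from this guide, and the function names are chosen here for illustration only. A non-zero pause time acts as Xoff; a pause time of zero acts as Xon.

```python
import struct

PAUSE_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
QUANTUM_BITS = 512          # one pause quantum = 512 bit times
MIN_FRAME_NO_FCS = 60       # minimum Ethernet frame size without the 4-byte FCS


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Build a link-level PAUSE frame (without FCS). quanta=0 means Xon."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time is a 16-bit value")
    header = PAUSE_DST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    # Pad with zeros up to the minimum frame size.
    return (header + payload).ljust(MIN_FRAME_NO_FCS, b"\x00")


def pause_seconds(quanta: int, link_bps: float) -> float:
    """Time the sender is asked to stay silent, for a given link speed."""
    return quanta * QUANTUM_BITS / link_bps


if __name__ == "__main__":
    xoff = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
    print(xoff.hex())
    # Longest single pause request on a 100 Gb/s link, in microseconds.
    print(pause_seconds(0xFFFF, 100e9) * 1e6)
```

Because the maximum pause time shrinks as link speed grows, a congested receiver on a fast link has to keep refreshing the pause for as long as it cannot accept data.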
2.2 Flow Control in RDMA Networks
The 800 Series supports both iWARP and RoCEv2 RDMA transports. Flow control is strongly recommended for RoCEv2, but iWARP also benefits.
Flow control requirements by RDMA transport:
iWARP (base transport: TCP)
• iWARP runs over TCP, a reliable protocol that implements its own flow control.
• TCP's flow control might be relatively slow to respond in a high-performance, low-latency RDMA environment, especially under bursty traffic patterns.
• Ethernet flow control is optional, but can be beneficial for iWARP.
• iWARP mode requires VLAN to be configured fully to enable PFC.
RoCEv2 (base transport: UDP)
• RoCEv2 runs over UDP, an unreliable protocol with no built-in flow control.
• RoCEv2 therefore requires a lossless Ethernet network to ensure packet delivery.
• If the irdma driver is in RoCEv2 mode and detects no flow control, it automatically de-tunes, causing lower performance.
• Flow control is always recommended for RoCEv2.
2.3 Types of Flow Control: LFC vs. PFC
Ethernet standards define two types of flow control:
• Link-level Flow Control (LFC)
• Priority Flow Control (PFC)
Both types use Xon/Xoff pause frames to control data transmission. The primary difference is that LFC pauses all traffic on a link, whereas PFC supports Quality-of-Service (QoS) by defining different traffic priorities that can be individually paused. PFC therefore offers greater flexibility when running multiple traffic streams.
NOTE
Despite LFC being called link-level flow control, both LFC and PFC operate at the data link level (OSI Layer 2).
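The per-priority pausing that distinguishes PFC from LFC is visible in the frame layout. The sketch below builds a PFC frame under the layout defined for IEEE 802.1Qbb (opcode 0x0101, a class-enable vector, and eight 16-bit timers, one per priority); these constants come from the standard rather than from this guide, and `build_pfc_frame` is a name introduced here for illustration.

```python
import struct
from typing import Dict

MAC_CONTROL_DST = bytes.fromhex("0180c2000001")  # same address as link-level PAUSE
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_NO_FCS = 60


def build_pfc_frame(src_mac: bytes, quanta_by_priority: Dict[int, int]) -> bytes:
    """Build a PFC frame (without FCS) that pauses only the listed priorities.

    quanta_by_priority maps a priority (0-7) to its pause time in quanta.
    A listed priority with a pause time of 0 is resumed (Xon).
    """
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in quanta_by_priority.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be in 0..7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("pause time is a 16-bit value")
        enable_vector |= 1 << priority   # mark this priority's timer as valid
        timers[priority] = quanta
    header = MAC_CONTROL_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)
    return (header + payload).ljust(MIN_FRAME_NO_FCS, b"\x00")


if __name__ == "__main__":
    # Pause only priority 3; all other priorities keep flowing.
    print(build_pfc_frame(bytes.fromhex("020000000001"), {3: 0xFFFF}).hex())
```

An LFC frame carries a single timer that applies to the whole link, which is why one busy stream can stall every other stream sharing it.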
Table 1. LFC vs. PFC Comparison
| | LFC | PFC |
|---|---|---|
| Standard | IEEE 802.3x (1997) | IEEE 802.1Qbb (2011) |
| Pause Type | Global pause: pauses the entire link, affecting all traffic on that link. If a link carries multiple traffic streams, a high-flow stream can cause the link to pause, thereby blocking ALL streams. | Priority pause: defines eight priorities that can be individually paused. High-bandwidth applications can be paused while allowing low-bandwidth applications to continue running. |
| Traffic Shaping | None. | Supports traffic classes, priorities, bandwidth allocation, and other QoS features. |
| Ease of Setup | Straightforward. Turn on Tx/Rx flow control on both the adapter and switch. | More complicated. Priorities, traffic classes, bandwidth allocations, and willing/non-willing mode must be configured on the adapter, switch, or both. |
PFC and LFC are mutually exclusive. Only one type at a time can be enabled on a device.
• PFC is generally recommended. It has greater flexibility to handle multiple traffic streams and enhanced QoS capabilities.
• LFC can be used in situations where there are no differentiated classes of traffic. For RDMA, it is usually used for testing purposes.
3.0 Link-Level Flow Control
3.1 LFC Setup Instructions
Configuring LFC on an 800 Series network is relatively straightforward: enable flow control in both directions (Tx and Rx) on both sides of the link.
• If your hosts are connected through a switch, you must also enable flow control on the switch ports.
• If your hosts are connected back-to-back, enable LFC on both adapters.
The examples below use eth0 as the 800 Series net device name (use ip a to find the device name on your system).
Switch settings vary by manufacturer. In your switch manual, look for syntax containing words like: flowcontrol, flow-control, tx-pause, or rx-pause.
To enable LFC on your network:
1. Disable firmware-based DCB on the adapter.
   # ethtool --set-priv-flags eth0 fw-lldp-agent off
2. Verify that firmware DCB is disabled.
   # ethtool --show-priv-flags eth0 | grep fw-lldp-agent
   fw-lldp-agent : off
3. Ensure that lldpad is not running. The output should list no lldpad process other than the grep command itself.
   # ps -ef | grep lldpad
4. Disable PFC on your switch, if applicable. First, show the PFC status per port:
   switch>show priority-flow-control
   For example, to disable PFC on a given port (such as port 31/1) on an Arista 7060CX:
   switch>enable
   switch#configure
   switch(config)#interface Ethernet 31/1
   switch(config-if-Et31/1)#no priority-flow-control
   NOTE
   Some switches allow for a range of ports to be specified, for example 1/1-32/1.
5. Enable link-level flow control on the adapter.
   # ethtool -A eth0 rx on tx on
6. Check LFC settings on the adapter.
   # ethtool -a eth0
   Pause parameters for eth0:
   Autonegotiate: on
   RX: on
   TX: on
   RX negotiated: on
   TX negotiated: on
7. Enable flow control on the switch ports.
   For example, to enable Rx and Tx flow control on switch port 21 on an Arista 7060CX:
   switch>enable
   switch#configure
   switch(config)#interface Ethernet 21/1
   switch(config-if-Et21/1)#flowcontrol receive on
   switch(config-if-Et21/1)#flowcontrol send on
8. Check LFC settings on the switch.
   For example, to show flow control settings on ports 21-22 on an Arista 7060CX:
   switch(config-if-Et21/1)#show interfaces ethernet 21/1-22/1 flowcontrol
   Port       Send FlowControl  Receive FlowControl  RxPause       TxPause
              admin    oper     admin    oper
   ---------- -------- -------- -------- -------- ------------- -------------
   Et21/1     on       on       off      off      170373384     0
   Et22/1     on       on       off      off      289143370     0
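The adapter-side checks in steps 2 and 6 can be scripted. The sketch below parses text in the "name: value" shape shown in those steps; the helper names are hypothetical, and the sample strings simply repeat the output printed above. It assumes the ethtool output on your system keeps this shape, which may vary between ethtool versions.

```python
from typing import Dict


def parse_key_values(text: str) -> Dict[str, str]:
    """Parse 'name: value' lines, skipping headings that have no value."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            result[key.strip()] = value.strip()
    return result


def fw_lldp_agent_disabled(priv_flags_output: str) -> bool:
    """True if the fw-lldp-agent private flag is reported as off (step 2)."""
    return parse_key_values(priv_flags_output).get("fw-lldp-agent") == "off"


def lfc_enabled_both_ways(pause_output: str) -> bool:
    """True if both RX and TX pause are reported as on (step 6)."""
    params = parse_key_values(pause_output)
    return params.get("RX") == "on" and params.get("TX") == "on"


# Sample text copied from the steps above. On a live host the text could be
# captured with, for example:
#   subprocess.run(["ethtool", "-a", "eth0"], capture_output=True, text=True).stdout
PRIV_FLAGS_SAMPLE = "fw-lldp-agent : off\n"
PAUSE_SAMPLE = """Pause parameters for eth0:
Autonegotiate: on
RX: on
TX: on
RX negotiated: on
TX negotiated: on
"""

if __name__ == "__main__":
    print("firmware LLDP agent off:", fw_lldp_agent_disabled(PRIV_FLAGS_SAMPLE))
    print("LFC on in both directions:", lfc_enabled_both_ways(PAUSE_SAMPLE))
```

These checks cover only the adapter; the switch port settings from step 8 still need to be inspected on the switch itself.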
3.2 Symmetric vs. Asymmetric LFC
LFC operates in both the Tx (send) and Rx (receive) directions.
• Tx flow control means that the port generates and sends Ethernet pause frames as needed.
• Rx flow control means that the port accepts and responds to Ethernet pause frames received from the connected peer.
When using LFC on the 800 Series, Intel recommends enabling both Tx and Rx flow control on both sides of the link. Configuring asymmetric settings (different Tx or Rx settings on each side) might have non-intuitive results.
For expected behavior, see the pause resolution table of IEEE 802.3bz shown in Figure 1.
From the IEEE Standard:
“The PAUSE bit indicates that the device is capable of providing the symmetric PAUSE functions as defined in Annex 31B. The ASM_DIR bit indicates that asymmetric PAUSE operation is supported. The value of the PAUSE bit when the ASM_DIR bit is set indicates the direction PAUSE frames are supported for flow across the link.”
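The interaction of the PAUSE and ASM_DIR bits can be sketched as a small function. This is an illustration of the pause resolution rules as they are commonly implemented (the logic mirrors the resolution helper found in the Linux kernel), not a reproduction of the table in the standard, so it should be checked against the pause resolution table referenced above. The function name and the tuple it returns are choices made here.

```python
from typing import Tuple


def resolve_pause(local_pause: bool, local_asm_dir: bool,
                  peer_pause: bool, peer_asm_dir: bool) -> Tuple[bool, bool]:
    """Return (tx_enabled, rx_enabled) for the local port.

    tx_enabled: the local port may send pause frames.
    rx_enabled: the local port honors pause frames received from the peer.
    """
    if local_pause and peer_pause:
        # Both sides advertise symmetric PAUSE: pause works in both directions.
        return (True, True)
    if local_asm_dir and peer_asm_dir:
        # Both sides support asymmetric operation; the side that also set
        # PAUSE is the one that receives (honors) pause frames.
        if local_pause:
            return (False, True)
        if peer_pause:
            return (True, False)
    # Every other combination resolves to no flow control.
    return (False, False)


if __name__ == "__main__":
    for bits in range(16):
        lp, la, pp, pa = (bool(bits & 8), bool(bits & 4), bool(bits & 2), bool(bits & 1))
        tx, rx = resolve_pause(lp, la, pp, pa)
        print(f"local PAUSE={lp:d} ASM_DIR={la:d}  peer PAUSE={pp:d} ASM_DIR={pa:d}"
              f"  ->  tx={tx} rx={rx}")
```

Most of the sixteen combinations resolve to no pause at all, which is one reason mismatched settings on the two ends of a link can silently leave flow control disabled.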