Supplement to InfiniBand™ Architecture Specification
Volume 1 Release 1.2.1
Copyright © 2010 by InfiniBand™ Trade Association.
All rights reserved.
All trademarks and brands are the property of their respective owners.
This document contains information proprietary to the InfiniBand™ Trade Association. Use or disclosure without written permission by an officer of the InfiniBand™ Trade Association is prohibited.
September 2, 2014
Annex A17: RoCEv2
Table 0 Revision History
Revision   Date            Description
1.0        Sept. 2, 2014   General Release
InfiniBand™ Architecture Release 1.2.1 RoCEv2 (IP Routable RoCE) September 2, 2014
Volume 1 - General Specifications
InfiniBand℠ Trade Association
LEGAL DISCLAIMER This specification is provided “AS IS” and without any warranty of any kind, including, without limitation, any express or implied warranty of non-infringement, merchantability or fitness for a particular purpose. In no event shall IBTA or any member of IBTA be liable for any direct, indirect, special, exemplary, punitive, or consequential damages, including, without limitation, lost profits, even if advised of the possibility of such damages.
ANNEX A17: ROCEV2 (IP ROUTABLE ROCE)
A17.1 INTRODUCTION
This document is an annex to Volume 1 Release 1.2.1 of the InfiniBand Architecture, herein referred to as the base specification. This annex is Optional Normative, meaning that implementation of the feature it describes is Optional, but if present, the implementation must comply with the compliance statements contained within this annex. This specification follows the spirit of the RoCE Annex (Annex A16 to the base specification) in defining a new InfiniBand protocol variant that uses an IP network layer (with an IP header instead of InfiniBand’s GRH), thus allowing IP routing of its packets.
A17.2 OVERVIEW
A17.2.1 THE INFINIBAND ARCHITECTURE
The InfiniBand Architecture offers a rich set of I/O services based on an RDMA access method and message-passing semantics. These include a variety of transport services (reliable and unreliable, connected and unconnected), support for atomic operations, multicast, and others.
InfiniBand defines a layered architecture that specifies the first four layers of the OSI reference model, namely the physical, link, network, and transport layers, as well as an accompanying management framework. In addition, the IB specification defines a software interface and accompanying verbs, which are designed to allow smooth access to the services provided by the InfiniBand Architecture.
A17.2.2 RDMA OVER CONVERGED ETHERNET (ROCE)
RDMA over Converged Ethernet (RoCE) is an InfiniBand Trade Association Standard designed to provide InfiniBand Transport Services on Ethernet Networks⁴. RoCE preserves the InfiniBand Verbs Semantics together with its Transport and Network Protocols, and replaces the InfiniBand Link and Physical Layers with those of Ethernet. The network management infrastructure for RoCE is also that of Ethernet.
4. http://www.infinibandta.org/content/pages.php?pg=about_us_RoCE
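The layer substitution described above can be sketched as a simple mapping from layer to protocol. This is a minimal illustration only: the layer keys and protocol labels are informal strings chosen for this example, not identifiers defined by the specification.

```python
# Informal labels for the four IB layers, top to bottom (illustrative only).
LAYERS = ("transport", "network", "link", "physical")

IB_STACK = {
    "transport": "IB Transport Protocol",
    "network": "IB Network Layer (GRH)",
    "link": "IB Link Layer",
    "physical": "IB Physical Layer",
}


def derive_roce_stack(ib_stack: dict) -> dict:
    """Keep the IB transport and network layers; swap in Ethernet below them."""
    roce = dict(ib_stack)  # copy so the IB stack itself is left unchanged
    roce["link"] = "Ethernet Link Layer"
    roce["physical"] = "Ethernet Physical Layer"
    return roce


ROCE_STACK = derive_roce_stack(IB_STACK)

if __name__ == "__main__":
    for layer in LAYERS:
        print(f"{layer:<10} {IB_STACK[layer]:<26} {ROCE_STACK[layer]}")
```

Everything above the link layer, including the GRH-based network layer, is carried over unchanged; only the two bottom entries differ.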
Figure 1 InfiniBand and RoCE Protocol Stacks
A17.2.3 THE NEED FOR (IP) ROUTABLE RDMA
RoCE packets are regular Ethernet frames⁵ that carry an Ethertype value⁶ allocated by IEEE, which indicates that the next header is a RoCE GRH.
Figure 2 RoCE Packet Format
Because RoCE traffic does not carry an IP header, it cannot be routed across the boundaries of Ethernet L2 subnets by regular IP routers. Under this scheme, RoCE provides RDMA services only for communication within a single Ethernet L2 domain.
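A receiver or switch identifies a RoCE frame purely from its Ethertype (0x8915, footnote 6), after skipping any VLAN tags (footnote 5). The sketch below is a minimal, assumption-laden illustration: it handles only IEEE 802.1Q (TPID 0x8100) and 802.1ad (TPID 0x88A8) tags, and the function names `inner_ethertype` and `classify` are hypothetical. It shows why an IP router has nothing to work with: the header after the Ethertype is a GRH, not an IPv4 or IPv6 header.

```python
import struct

ETHERTYPE_ROCE = 0x8915        # IEEE-allocated RoCE Ethertype (footnote 6)
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
VLAN_TPIDS = {0x8100, 0x88A8}  # IEEE 802.1Q C-tag and 802.1ad S-tag
GRH_LEN = 40                   # the InfiniBand GRH is 40 bytes


def inner_ethertype(frame: bytes) -> tuple:
    """Return (ethertype, offset of the next header), skipping VLAN tags."""
    offset = 12  # skip the 6-byte destination and 6-byte source MACs
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    while ethertype in VLAN_TPIDS:
        # A tag is the 2-byte TPID (already read) plus a 2-byte TCI,
        # followed by the next 2-byte Ethertype.
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    return ethertype, offset


def classify(frame: bytes) -> str:
    ethertype, _ = inner_ethertype(frame)
    if ethertype == ETHERTYPE_ROCE:
        return "roce"   # next header is a GRH: no IP header for a router to use
    if ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
        return "ip"     # an IP router could forward this frame
    return "other"


if __name__ == "__main__":
    macs = bytes(12)
    untagged = macs + struct.pack("!H", ETHERTYPE_ROCE) + bytes(GRH_LEN)
    tagged = macs + struct.pack("!HHH", 0x8100, 0x0005, ETHERTYPE_ROCE) + bytes(GRH_LEN)
    print(classify(untagged), inner_ethertype(tagged))  # roce (35093, 18)
```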
5. Including VLANs and all other Ethernet header variations as defined by IEEE 802.
6. 0x8915
A17.2.4 ROCEV2 (IP ROUTABLE ROCE)
RoCEv2 is a straightforward extension of the RoCE protocol that involves a simple modification of the RoCE packet format. Instead of the GRH, RoCEv2 packets carry an IP header, which allows them to traverse IP L3 routers, and a UDP header, which serves as a stateless encapsulation layer for the RDMA Transport Protocol packets over IP.
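The header substitution can be illustrated by assembling a RoCEv2 frame byte by byte. This is a minimal sketch under several assumptions not stated in this section: it uses IPv4 only (an IPv6 variant would use Ethertype 0x86DD and a 40-byte header), it uses the IANA-assigned RoCEv2 UDP destination port 4791, it leaves the UDP checksum at 0, TTL and flags are arbitrary choices, and the IB transport packet is a 12-byte zero placeholder standing in for the BTH. The function name `rocev2_frame` is hypothetical.

```python
import struct

ROCEV2_UDP_DPORT = 4791   # IANA-assigned RoCEv2 port (assumption: not stated in this section)
IPPROTO_UDP = 17
ETHERTYPE_IPV4 = 0x0800   # RoCEv2 frames carry an IP Ethertype, not 0x8915


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header checksum."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rocev2_frame(dst_mac: bytes, src_mac: bytes, src_ip: bytes, dst_ip: bytes,
                 sport: int, ib_transport_packet: bytes) -> bytes:
    """Ethernet / IPv4 / UDP / IB transport packet (BTH onward)."""
    udp_len = 8 + len(ib_transport_packet)
    # UDP checksum is left as 0 in this sketch.
    udp = struct.pack("!HHHH", sport, ROCEV2_UDP_DPORT, udp_len, 0)
    ip = struct.pack("!BBHHHBBH4s4s",
                     0x45,            # version 4, IHL 5 (20-byte header)
                     0,               # DSCP/ECN
                     20 + udp_len,    # total length
                     0,               # identification
                     0x4000,          # flags: Don't Fragment
                     64,              # TTL
                     IPPROTO_UDP,     # next protocol: UDP
                     0,               # checksum placeholder
                     src_ip, dst_ip)
    # Fill in the header checksum (bytes 10-11) computed over the header.
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]
    eth = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_IPV4)
    return eth + ip + udp + ib_transport_packet


if __name__ == "__main__":
    frame = rocev2_frame(bytes(6), bytes(6), bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),
                         sport=49152, ib_transport_packet=bytes(12))
    print(len(frame))  # 14 + 20 + 8 + 12 = 54
```

Compared with a RoCE frame, the transport packet is untouched; only the 40-byte GRH is replaced by a 20-byte IPv4 header plus an 8-byte UDP header, and the Ethertype changes accordingly.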
Figure 3 RoCEv2 and RoCE Frame Format Differences
RoCEv2 packets use a well-known UDP Destination Port (dport) value that unambiguously distinguishes them from other UDP traffic in a stateless manner.
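Stateless identification means each packet can be classified from its own headers, with no per-flow table. A minimal sketch, assuming the IANA-assigned port 4791 (not stated in this section) and, for IPv6, a packet with no extension headers; the function name `is_rocev2` is hypothetical.

```python
import struct

ROCEV2_UDP_DPORT = 4791  # IANA-assigned RoCEv2 port (assumption: not stated in this section)
IPPROTO_UDP = 17


def is_rocev2(ip_packet: bytes) -> bool:
    """Classify one packet from its own headers; no per-flow state is kept."""
    if not ip_packet:
        return False
    version = ip_packet[0] >> 4
    if version == 4:
        if len(ip_packet) < 20:
            return False
        udp_offset = (ip_packet[0] & 0x0F) * 4   # IHL counts 32-bit words
        next_proto = ip_packet[9]                # IPv4 protocol field
    elif version == 6:
        if len(ip_packet) < 40:
            return False
        udp_offset = 40                          # sketch ignores extension headers
        next_proto = ip_packet[6]                # IPv6 next-header field
    else:
        return False
    if next_proto != IPPROTO_UDP or len(ip_packet) < udp_offset + 8:
        return False
    (dport,) = struct.unpack_from("!H", ip_packet, udp_offset + 2)
    return dport == ROCEV2_UDP_DPORT


if __name__ == "__main__":
    ipv4 = bytes([0x45, 0, 0, 28, 0, 0, 0x40, 0, 64, IPPROTO_UDP, 0, 0]) + bytes(8)
    print(is_rocev2(ipv4 + struct.pack("!HHHH", 50000, 4791, 8, 0)))  # True
    print(is_rocev2(ipv4 + struct.pack("!HHHH", 50000, 53, 8, 0)))    # False
```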
As an additional benefit, following common practice in UDP-encapsulated protocols, the UDP Source Port (sport) field of RoCEv2 packets serves as an opaque flow identifier that the networking infrastructure can use for packet-forwarding optimizations (see Section 17.9.4, “ECMP for RoCEv2,” on page 21).
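Switches that balance traffic across equal-cost paths commonly hash the IP/UDP 5-tuple. Between the same two hosts, RoCEv2 flows share source IP, destination IP, protocol and dport, so the sport is the only field left to separate them. The sketch below is illustrative only: `zlib.crc32` stands in for whatever hash a real switch uses, and `hypothetical_sport` is a made-up sender policy that hashes a per-flow value (such as a QP number) into the dynamic port range 49152-65535; neither is defined by this annex. The UDP port 4791 is the IANA assignment, assumed here.

```python
import ipaddress
import zlib

ROCEV2_UDP_DPORT = 4791  # IANA-assigned RoCEv2 port (assumption: not stated in this section)


def ecmp_path(src_ip: str, dst_ip: str, sport: int, dport: int,
              n_paths: int, proto: int = 17) -> int:
    """Choose one of n_paths equal-cost next hops from a 5-tuple hash.

    zlib.crc32 is a stand-in; real switches use their own hash functions."""
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed
           + bytes([proto])
           + sport.to_bytes(2, "big")
           + dport.to_bytes(2, "big"))
    return zlib.crc32(key) % n_paths


def hypothetical_sport(flow_id: int) -> int:
    """Hypothetical sender policy: hash a per-flow value (e.g. a QP number)
    into the dynamic port range 0xC000-0xFFFF."""
    return 0xC000 | (zlib.crc32(flow_id.to_bytes(4, "big")) & 0x3FFF)


if __name__ == "__main__":
    # Same pair of hosts and same dport: only the sport varies between flows.
    for qpn in range(8):
        sport = hypothetical_sport(qpn)
        path = ecmp_path("10.0.0.1", "10.0.1.1", sport, ROCEV2_UDP_DPORT, n_paths=4)
        print(f"flow {qpn}: sport={sport} -> path {path}")
```

Because the sport is fixed for a given flow, all packets of that flow hash to the same path and stay in order, while distinct flows can spread across paths.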
Because this approach affects only the packet format on the wire, and because with RDMA semantics packets are generated and consumed below the API, applications can operate over any form of RDMA service (including RoCEv2) in a completely transparent way⁷ (see Figure 4).
7. Widespread RDMA APIs are IP based for all existing RDMA technologies.
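The transparency argument can be pictured as an interface boundary: the application calls the same operation, the shared IB transport layer builds the same packet, and only the encapsulation beneath the API differs. This is a hypothetical sketch, not a real verbs API; the class and function names are invented for illustration, and the bracketed byte strings are symbolic labels standing in for real headers.

```python
from abc import ABC, abstractmethod


class WireFormat(ABC):
    """Hypothetical per-technology encapsulation hidden below the RDMA API."""

    @abstractmethod
    def encapsulate(self, transport_packet: bytes) -> bytes:
        ...


class RoCEFormat(WireFormat):
    def encapsulate(self, transport_packet: bytes) -> bytes:
        # Symbolic header labels stand in for real header bytes.
        return b"[ETH 0x8915][GRH]" + transport_packet


class RoCEv2Format(WireFormat):
    def encapsulate(self, transport_packet: bytes) -> bytes:
        return b"[ETH IP][IP][UDP dport=4791]" + transport_packet


def send_message(wire: WireFormat, message: bytes) -> bytes:
    """What the application calls; identical for every wire format."""
    transport_packet = b"[BTH]" + message   # the IB transport layer is shared
    return wire.encapsulate(transport_packet)


if __name__ == "__main__":
    for wire in (RoCEFormat(), RoCEv2Format()):
        print(send_message(wire, b"hello"))
```

Swapping `RoCEFormat` for `RoCEv2Format` changes the bytes on the wire but not a single line of the calling code.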