Understanding RoCE: RoCEv1, RoCEv2, and Soft‑RoCE in Data Center Networks

RoCE (RDMA over Converged Ethernet) carries the InfiniBand RDMA transport over Ethernet to deliver low‑latency, high‑throughput data transfer in data‑center networks. RoCEv1 operates at Layer 2, RoCEv2 encapsulates the transport in UDP over IPv4 or IPv6 so that it can be routed, and Soft‑RoCE provides a software‑only implementation for environments that lack RDMA‑capable hardware.

Architects' Tech Alliance

Ethernet remains dominant across the global Internet, but its limitations in high‑bandwidth, low‑latency private networks led to the development of Data Center Bridging (DCB) and of lossless links modeled on RDMA/InfiniBand. That work culminated in the RoCE (RDMA over Converged Ethernet) standard.

RoCEv1

The IBTA released RoCEv1 (also called IBoE) in April 2010 as an addendum to the InfiniBand Architecture Specification. RoCEv1 uses the InfiniBand network layer in place of TCP/IP, runs directly on the Ethernet link layer (Ethertype 0x8915), and does not support IP routing. The InfiniBand link‑layer header is removed and the GUID is mapped to a MAC address. RoCEv1 relies on lossless Ethernet, which requires L2 QoS mechanisms such as Priority Flow Control (PFC); every endpoint and every switching device on the path must support PFC for the link to function correctly.

RoCEv1 frame structure diagram

For the full protocol specification, see InfiniBand™ Architecture Specification Release 1.2.1 Annex A16: RoCE.

RoCEv2

Because RoCEv1 frames lack an IP header and can only travel within a single L2 subnet, the IBTA introduced RoCEv2 in 2014. RoCEv2 replaces the InfiniBand Global Routing Header (GRH) with an IP header plus a UDP header, which allows the traffic to be routed across L3 networks. The frame structure is shown below.

RoCEv2 frame structure

RoCEv2 packet example
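The difference between the two encapsulations can be made concrete by adding up the header sizes. The sketch below is illustrative only: it assumes the base case of an untagged Ethernet frame, an IPv4 header without options, an IPv6 header without extension headers, and a packet that carries only the 12‑byte Base Transport Header (BTH) with no extended transport headers. The names STACKS and overhead are hypothetical helpers, not part of any RDMA library.

```python
# Header sizes in bytes. These are the fixed base sizes; optional fields
# (VLAN tags, IP options, extended transport headers) are deliberately ignored.
ETH_HEADER, ETH_FCS = 14, 4
GRH, BTH, ICRC = 40, 12, 4
IPV4, IPV6, UDP = 20, 40, 8

# Header stack in front of the RDMA payload for each variant (assumed base case).
STACKS = {
    "RoCEv1": [("Ethernet", ETH_HEADER), ("GRH", GRH), ("BTH", BTH)],
    "RoCEv2/IPv4": [("Ethernet", ETH_HEADER), ("IPv4", IPV4), ("UDP", UDP), ("BTH", BTH)],
    "RoCEv2/IPv6": [("Ethernet", ETH_HEADER), ("IPv6", IPV6), ("UDP", UDP), ("BTH", BTH)],
}

# Both versions end with the invariant CRC followed by the Ethernet FCS.
TRAILERS = ICRC + ETH_FCS


def overhead(variant: str) -> int:
    """Return the per-packet overhead in bytes for the given variant."""
    return sum(size for _, size in STACKS[variant]) + TRAILERS


if __name__ == "__main__":
    for name, stack in STACKS.items():
        layers = " | ".join(layer for layer, _ in stack)
        print(f"{name:12s} {layers} | payload | ICRC | FCS  ({overhead(name)} bytes)")
```

Under these assumptions RoCEv2 over IPv4 is slightly smaller per packet than RoCEv1, because a 20‑byte IPv4 header plus an 8‑byte UDP header takes less room than the 40‑byte GRH.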

RoCEv1 operates at Layer 2 with Ethertype 0x8915; the standard Ethernet MTU is 1500 bytes, and jumbo frames raise it to as much as 9000 bytes.

RoCEv2 operates at Layer 3 over UDP/IPv4 or UDP/IPv6, using UDP destination port 4791; because it is routable, it is sometimes called “Routable RoCE” or “RRoCE”.
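These two identifiers (Ethertype 0x8915 and UDP port 4791) are enough to tell the versions apart in a packet capture. The following is a minimal sketch of such a classifier, assuming raw Ethernet frames without the FCS, at most one 802.1Q VLAN tag, and no IPv6 extension headers. The function name classify_frame is a hypothetical helper introduced for illustration.

```python
import struct

ROCEV1_ETHERTYPE = 0x8915  # RoCEv1: InfiniBand payload directly over Ethernet
IPV4_ETHERTYPE = 0x0800
IPV6_ETHERTYPE = 0x86DD
VLAN_ETHERTYPE = 0x8100    # 802.1Q tag
UDP_PROTO = 17
ROCEV2_UDP_PORT = 4791     # RoCEv2 destination UDP port


def classify_frame(frame: bytes) -> str:
    """Return 'RoCEv1', 'RoCEv2', or 'other' for a raw Ethernet frame."""
    if len(frame) < 14:
        return "other"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = 14
    # Skip a single 802.1Q tag if present (assumption: no stacked tags).
    if ethertype == VLAN_ETHERTYPE:
        if len(frame) < offset + 4:
            return "other"
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype == ROCEV1_ETHERTYPE:
        return "RoCEv1"
    if ethertype == IPV4_ETHERTYPE:
        if len(frame) < offset + 20:
            return "other"
        header_len = (frame[offset] & 0x0F) * 4  # IHL is in 32-bit words
        proto = frame[offset + 9]
        l4 = offset + header_len
    elif ethertype == IPV6_ETHERTYPE:
        if len(frame) < offset + 40:
            return "other"
        proto = frame[offset + 6]  # Next Header; extension headers not handled
        l4 = offset + 40
    else:
        return "other"
    if proto != UDP_PROTO or len(frame) < l4 + 4:
        return "other"
    (dst_port,) = struct.unpack_from("!H", frame, l4 + 2)
    return "RoCEv2" if dst_port == ROCEV2_UDP_PORT else "other"
```

A production parser would also need to handle stacked VLAN tags and IPv6 extension headers; this sketch treats such frames as "other".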

Soft‑RoCE

Since Linux kernel 4.9, a software implementation of RoCEv2 called Soft‑RoCE is available. Unlike hardware RoCE, Soft‑RoCE works in any Ethernet environment and does not require RDMA‑capable NICs, switches, or L2 QoS support. It consists of a user‑space library, librxe, that plugs into the RDMA stack (libibverbs), and a kernel module, rxe.ko, that connects to the Linux network stack. The module layers a virtual RDMA device on top of a regular Ethernet NIC and carries the RoCE traffic in UDP packets.
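As an illustration of how such a virtual device is typically created, the sketch below assembles the commands involved. It rests on assumptions that are not stated above: that the system ships the iproute2 rdma tool, that the kernel module is loadable under the upstream name rdma_rxe, and that eth0 and rxe0 are placeholder interface and device names. The helper soft_roce_commands is hypothetical; it only builds the argument lists and executes nothing.

```python
import shlex
from typing import List


def soft_roce_commands(netdev: str, rxe_name: str = "rxe0") -> List[List[str]]:
    """Build the commands that attach a Soft-RoCE device to an Ethernet NIC.

    Assumes the iproute2 `rdma` tool and the `rdma_rxe` module name;
    older setups may use different tooling.
    """
    return [
        ["modprobe", "rdma_rxe"],                                           # load the kernel module
        ["rdma", "link", "add", rxe_name, "type", "rxe", "netdev", netdev],  # create the virtual RDMA device
        ["rdma", "link", "show"],                                           # list RDMA links to verify
    ]


if __name__ == "__main__":
    # Print the commands rather than running them; they require root privileges.
    for argv in soft_roce_commands("eth0"):
        print(shlex.join(argv))
```

Running the printed commands as root should leave a device named rxe0 that RDMA applications built on libibverbs can open like a hardware RNIC.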

Soft‑RoCE communication diagram

In performance‑sensitive virtualized scenarios, Soft‑RoCE enables VMs to access RDMA functionality without exposing physical NICs, offering a low‑cost way to build efficient RDMA networks in data centers that lack specialized hardware.

Network Requirements

RoCE can operate in both lossless and lossy network environments. In a lossy environment it is referred to as Resilient RoCE; in a lossless environment it is called Lossless RoCE.

Resilient RoCE – operates over lossy networks without requiring PFC, relying instead on ECN‑based congestion control.

Lossless RoCE – requires PFC flow control to guarantee a lossless fabric.
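To show what PFC flow control looks like on the wire, the sketch below builds a PFC pause frame as defined by IEEE 802.1Qbb: a MAC Control frame (Ethertype 0x8808, opcode 0x0101) that carries a per‑priority enable vector and eight pause timers. This is an illustration only, assuming a frame without the FCS; the function name build_pfc_frame and the choice of priority 3 in the usage note are hypothetical.

```python
import struct
from typing import Dict

PFC_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_LEN = 60  # minimum Ethernet frame size, excluding the 4-byte FCS


def build_pfc_frame(src_mac: bytes, pause_quanta: Dict[int, int]) -> bytes:
    """Build a PFC frame pausing the given priorities.

    pause_quanta maps priority (0-7) to a pause time in quanta,
    where one quantum is 512 bit times; 0 resumes the priority.
    """
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in pause_quanta.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be in 0..7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("quanta must fit in 16 bits")
        enable_vector |= 1 << priority  # mark this priority's timer as valid
        timers[priority] = quanta
    body = struct.pack("!HHH8H", MAC_CONTROL_ETHERTYPE, PFC_OPCODE, enable_vector, *timers)
    # Pad with zeros up to the minimum frame length.
    return (PFC_DST_MAC + src_mac + body).ljust(MIN_FRAME_LEN, b"\x00")
```

For example, build_pfc_frame(bytes.fromhex("020000000001"), {3: 0xFFFF}) asks the link partner to pause priority 3 for the maximum time while leaving the other seven priorities untouched, which is how a lossless RoCE traffic class can be protected without stalling the rest of the link.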

Summary: Although RoCE imposes special dependencies on the link and physical layers, modern switches, NICs, and SoCs typically integrate DCB and RDMA support, which makes RoCE a strong choice for new data‑center or SAN deployments. For expansions of legacy networks or cost‑sensitive optimizations, iWARP on RNICs or the hardware‑independent Soft‑RoCE is more appropriate.

References

https://www.cnblogs.com/echo1937/p/7018266.html

http://hustcat.github.io/roce-protocol/

RoCE: An Ethernet‑InfiniBand Love Story

InfiniBand™ Architecture Specification Release 1.2.1 Annex A16: RoCE

InfiniBand™ Architecture Specification Release 1.2.1 Annex A17: RoCEv2

RoCEv2 CNP Packet Format Example
