
Understanding RDMA: Principles, Advantages, and Implementation Details

This article explains how RDMA (Remote Direct Memory Access) technology, originating from InfiniBand and extended to Ethernet (RoCE) and TCP/IP (iWARP), provides ultra‑low latency, high throughput, and minimal CPU usage for high‑performance computing and big‑data applications by bypassing traditional OS and protocol stack processing.

Architects' Tech Alliance

High‑performance computing, big‑data analytics and bursty I/O applications demand lower latency and CPU usage than traditional TCP/IP stacks can provide.

RDMA (Remote Direct Memory Access) enables direct memory transfers between endpoints over the network, bypassing the OS and protocol stack, thus achieving microsecond‑level latency, high throughput, and minimal CPU overhead.

Originally part of InfiniBand, RDMA has been extended to Ethernet via RoCE (RDMA over Converged Ethernet) and to TCP/IP via iWARP, with standards defined by the RDMA Consortium (RDMAC), the InfiniBand Trade Association (IBTA), and the OpenFabrics Alliance (OFA).

InfiniBand achieves low latency through cut‑through switching, credit‑based flow control, hardware offload, and small buffers.
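Credit‑based flow control can be illustrated with a toy model. The sketch below is not InfiniBand's link‑level protocol; `CreditLink`, `try_send`, and `receiver_drain` are hypothetical names chosen for illustration. It only shows the core idea: the sender transmits while it holds credits, each credit stands for a free receiver buffer slot, and a refused send is back‑pressure rather than packet loss.

```python
from collections import deque


class CreditLink:
    """Toy link: the sender may transmit only while it holds credits."""

    def __init__(self, receiver_buffer_slots: int) -> None:
        # Each credit stands for one free buffer slot at the receiver.
        self.credits = receiver_buffer_slots
        self.receiver_buffer: deque = deque()

    def try_send(self, packet: str) -> bool:
        """Transmit if a credit is available; otherwise hold the packet back."""
        if self.credits == 0:
            return False  # back-pressure: the packet waits, it is not dropped
        self.credits -= 1
        self.receiver_buffer.append(packet)
        return True

    def receiver_drain(self) -> str:
        """Receiver consumes one packet and hands a credit back to the sender."""
        packet = self.receiver_buffer.popleft()
        self.credits += 1
        return packet


link = CreditLink(receiver_buffer_slots=2)
results = [link.try_send(f"pkt{i}") for i in range(3)]  # third send is refused
drained = link.receiver_drain()                         # frees one slot
retry = link.try_send("pkt2")                           # now accepted
```

Because the sender never transmits without a credit, the receiver buffer cannot overflow, which is why small buffers are sufficient in this model.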

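RoCE provides InfiniBand‑like performance on Ethernet but requires Data Center Bridging (DCB) support in the network to stay lossless, while iWARP runs RDMA over TCP/IP at higher hardware cost, because the NIC must offload TCP processing.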

The RDMA software stack (e.g., OFED) offers the Verbs API for applications written directly against RDMA, plus Upper Layer Protocol (ULP) layers that allow existing applications to use RDMA without code changes.

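RDMA communication uses Queue Pairs (QPs), each composed of a Send Queue and a Receive Queue, together with Completion Queues (CQs). The application posts Work Requests (WRs), which are transformed into Work Queue Elements (WQEs) that the NIC processes asynchronously; the NIC reports finished work through the Completion Queue.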
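The queue model can be sketched as a small simulation. This is a toy under stated assumptions, not the Verbs API: `WorkRequest`, `QueuePair`, `ToyNic`, and their methods are hypothetical names, and `process` stands in for hardware that would normally run on its own. It shows the asynchronous pattern only: posting returns immediately, and the application learns about completion by polling the Completion Queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkRequest:
    """What the application hands to the (toy) verbs layer."""
    wr_id: int
    opcode: str
    payload: bytes = b""


@dataclass
class QueuePair:
    """A QP is a Send Queue plus a Receive Queue; the CQ is a separate object."""
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)


class ToyNic:
    def __init__(self) -> None:
        self.completion_queue: deque = deque()

    def post_send(self, qp: QueuePair, wr: WorkRequest) -> None:
        # The WR becomes a WQE on the Send Queue; the call returns at once.
        qp.send_queue.append(
            {"wr_id": wr.wr_id, "opcode": wr.opcode, "payload": wr.payload}
        )

    def process(self, qp: QueuePair) -> None:
        # Stand-in for the hardware draining WQEs in order.
        while qp.send_queue:
            wqe = qp.send_queue.popleft()
            self.completion_queue.append((wqe["wr_id"], "SUCCESS"))

    def poll_cq(self) -> list:
        # Return and remove every completion currently in the CQ.
        done = list(self.completion_queue)
        self.completion_queue.clear()
        return done


nic = ToyNic()
qp = QueuePair()
nic.post_send(qp, WorkRequest(wr_id=1, opcode="SEND", payload=b"hello"))
nic.post_send(qp, WorkRequest(wr_id=2, opcode="SEND", payload=b"world"))
before = nic.poll_cq()  # nothing has completed yet
nic.process(qp)         # the "hardware" works through the Send Queue
after = nic.poll_cq()   # one completion per posted request, in order
```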

Two operation modes exist: two‑sided SEND/RECEIVE requiring remote participation, and one‑sided READ/WRITE allowing direct remote memory access without remote software involvement.
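The difference between the two modes can be shown with another toy model. All names here (`ToyHost`, `rdma_send`, `rdma_write`, `rdma_read`, `software_events`) are hypothetical, and the sketch leaves out memory registration and access keys, which a real one‑sided operation requires the remote side to set up beforehand. The counter tracks how often remote software has to act: a SEND only succeeds after the remote posts a receive buffer, while WRITE and READ touch remote memory without any remote call.

```python
from collections import deque


class ToyHost:
    def __init__(self, memory_size: int) -> None:
        self.memory = bytearray(memory_size)  # stands in for a registered region
        self.posted_receives: deque = deque() # (offset, length) receive buffers
        self.software_events = 0              # times host software had to act

    def post_receive(self, offset: int, length: int) -> None:
        self.software_events += 1
        self.posted_receives.append((offset, length))


def rdma_send(remote: ToyHost, data: bytes) -> bool:
    """Two-sided: succeeds only if the remote software posted a large enough buffer."""
    if not remote.posted_receives:
        return False  # receiver not ready
    offset, length = remote.posted_receives[0]
    if len(data) > length:
        return False
    remote.posted_receives.popleft()
    remote.memory[offset:offset + len(data)] = data
    return True


def rdma_write(remote: ToyHost, offset: int, data: bytes) -> None:
    """One-sided: places data in remote memory with no remote software call."""
    if offset < 0 or offset + len(data) > len(remote.memory):
        raise ValueError("write outside the remote memory region")
    remote.memory[offset:offset + len(data)] = data


def rdma_read(remote: ToyHost, offset: int, length: int) -> bytes:
    """One-sided: fetches remote memory with no remote software call."""
    if offset < 0 or offset + length > len(remote.memory):
        raise ValueError("read outside the remote memory region")
    return bytes(remote.memory[offset:offset + length])


server = ToyHost(memory_size=16)
send_without_recv = rdma_send(server, b"ping")  # no receive posted yet
server.post_receive(offset=0, length=8)         # remote software participates
send_with_recv = rdma_send(server, b"ping")
rdma_write(server, offset=8, data=b"data")      # no remote participation
read_back = rdma_read(server, offset=8, length=4)
```

In this run the server's software acts exactly once, for the posted receive; the WRITE and READ leave `software_events` unchanged.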

In both two‑sided and one‑sided transfers, the NIC moves data directly between application buffers (zero‑copy) and the data path does not pass through the kernel (kernel bypass).

In summary, RDMA reduces latency from tens of microseconds to a few microseconds and consumes little CPU. Combined with high‑bandwidth, lossless networks (InfiniBand or modern Ethernet), these properties drive the adoption of InfiniBand, RoCE, and iWARP in high‑performance systems.

Tags: low latency, RDMA, high-performance networking, InfiniBand, RoCE, iWARP