Overview of InfiniBand Technology and Its Protocol Stack
This article provides a comprehensive overview of InfiniBand technology, covering its open‑standard architecture, history, OFED software stack, protocol layers, performance advantages over traditional storage networks, and its primary use cases in high‑performance computing and data‑center environments.
InfiniBand is an open‑standard technology that simplifies and accelerates connections between servers while also supporting connections to remote storage and network devices. The OpenFabrics Enterprise Distribution (OFED) is an open‑source software stack that includes drivers, kernel code, middleware, and user‑level interfaces for InfiniBand fabrics.
The first OFED version was released in 2005 by the OpenFabrics Alliance (OFA). Mellanox OFED provides drivers and tools for Linux and Windows (WinOF), offering diagnostics and performance monitoring for bandwidth and congestion within InfiniBand networks.
OFA is a community‑driven organization that develops, tests, and supports the OpenFabrics Enterprise Distribution, aiming to deliver high‑efficiency messaging, low latency, and maximum bandwidth with minimal CPU overhead.
Founded in June 2004 as the OpenIB Alliance, the group initially focused on a vendor‑independent, Linux‑based InfiniBand software stack. In 2005 it added Windows support, making the stack truly cross‑platform.
In 2006 the alliance expanded its charter to include iWARP support; in 2010 it added RoCE (RDMA over Converged Ethernet) for high‑performance RDMA over Ethernet; and in 2014, with the creation of the OpenFabrics Interfaces working group, it broadened support to other high‑performance networks.
InfiniBand specifications were drafted starting in 1999 and officially published in 2000. After 2005, InfiniBand Architecture (IBA) became widely adopted in cluster supercomputers, with many TOP500 systems using IBA.
Key industry members such as Cray, Emulex, HP, IBM, Intel, Mellanox, Microsoft, Oracle, and QLogic promote InfiniBand, and other vendors such as Cisco, Sun, NEC, and LSI have joined or rejoined the ecosystem. Newer generations such as 56 Gbps FDR and 100 Gbps EDR are now common, meeting the I/O demands of HPC, enterprise data centers, and cloud environments.
Compared with Fibre Channel (FC), InfiniBand delivers roughly 3.5× higher performance, with switch latency about one‑tenth of FC, and supports both SAN and NAS. Storage systems such as EMC, IBM FlashSystem, IBM XIV Gen3, and DDN SFA use InfiniBand networking.
InfiniBand uses point‑to‑point serial high‑speed links (SDR, DDR, QDR, FDR, EDR) that achieve sub‑microsecond latency, and its credit‑based link‑layer flow control provides lossless transmission and advanced congestion management; the sketch below works out the usable bandwidth of each link generation.
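To make the generation names concrete, the short C sketch below (illustrative only; the per‑lane rates and line encodings follow the IBTA specification, 8b/10b for SDR/DDR/QDR and 64b/66b for FDR/EDR) computes the raw and usable data rate of a common 4x port for each signalling generation.

```c
/* Illustrative sketch: raw vs. usable data rate of a 4x InfiniBand link
 * for each signalling generation, using IBTA per-lane rates and encodings. */
#include <stdio.h>

struct ib_speed {
    const char *name;
    double lane_gbps;   /* per-lane signalling rate in Gbit/s */
    double encoding;    /* payload fraction: 8b/10b = 0.8, 64b/66b ~ 0.97 */
};

int main(void) {
    const struct ib_speed speeds[] = {
        { "SDR", 2.5,      8.0 / 10.0  },
        { "DDR", 5.0,      8.0 / 10.0  },
        { "QDR", 10.0,     8.0 / 10.0  },
        { "FDR", 14.0625,  64.0 / 66.0 },
        { "EDR", 25.78125, 64.0 / 66.0 },
    };
    const int lanes = 4;   /* the common 4x port width */

    for (size_t i = 0; i < sizeof(speeds) / sizeof(speeds[0]); i++) {
        double raw  = speeds[i].lane_gbps * lanes;
        double data = raw * speeds[i].encoding;
        printf("%s 4x: %7.3f Gbit/s raw, %7.3f Gbit/s data\n",
               speeds[i].name, raw, data);
    }
    return 0;
}
```

Running it shows why QDR is quoted as 40 Gbps and FDR as 56 Gbps (the raw signalling rate of a 4x port), and why EDR delivers roughly 100 Gbps of usable bandwidth once 64b/66b encoding is accounted for.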
Virtual Lanes (VL) enable QoS by allowing up to 15 independent logical channels plus a management channel on a single physical link.
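As a rough illustration of how an application can see this capability, the libibverbs sketch below (device and port selection simplified, error handling abbreviated) queries port 1 of the first HCA and decodes the max_vl_num field; the decoding table assumes this field mirrors the PortInfo VLCap encoding from the IB specification.

```c
/* Minimal sketch: query the first HCA port and report its Virtual Lane
 * capability. Assumes ibv_port_attr.max_vl_num carries a VLCap-style
 * encoded value (1->1, 2->2, 3->4, 4->8, 5->15 data VLs). */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs || num_devices == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_port_attr port;
    if (!ctx || ibv_query_port(ctx, 1, &port)) {   /* port numbers start at 1 */
        fprintf(stderr, "failed to query port 1\n");
        return 1;
    }

    static const int vl_decode[] = { 0, 1, 2, 4, 8, 15 };
    printf("port 1: state=%d, LID=%u, data VLs supported=%d\n",
           port.state, port.lid,
           port.max_vl_num < 6 ? vl_decode[port.max_vl_num] : -1);

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

Link with -libverbs. Whatever data‑VL count the port reports, VL15 remains reserved for subnet‑management traffic.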
InfiniBand’s architecture follows a layered model similar to TCP/IP. The physical layer defines electrical and mechanical characteristics of copper and fiber media. The link layer specifies packet formats, flow control, and routing within a subnet. The network layer handles inter‑subnet routing using global routing headers (GRH) with IPv6‑style addressing. The transport layer manages packet segmentation, reassembly, and queue pair (QP) operations. Upper‑layer protocols include SDP, SRP, iSER, RDS, IPoIB, and uDAPL, providing storage, messaging, and IP‑over‑IB capabilities.
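The queue‑pair abstraction at the transport layer is easiest to see through the verbs API that OFED exposes. The C sketch below is a minimal, illustrative sequence only (resource sizes are arbitrary, and no connection setup or data transfer is performed): it allocates a protection domain, registers a buffer for DMA, creates a completion queue, and builds a reliable‑connected QP. A real application would still exchange QP numbers and LIDs out of band and drive the QP through the INIT/RTR/RTS states before posting work requests.

```c
/* Illustrative sketch of the verbs objects behind the transport layer:
 * protection domain, memory region, completion queue, and an RC queue pair. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void) {
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) { fprintf(stderr, "no RDMA devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);               /* protection domain */

    char *buf = calloc(1, 4096);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,        /* region the HCA may DMA */
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);

    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);

    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .qp_type = IBV_QPT_RC,                           /* reliable connected */
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &attr);
    if (!qp) { fprintf(stderr, "ibv_create_qp failed\n"); return 1; }

    printf("created RC QP 0x%x with lkey 0x%x\n", qp->qp_num, mr->lkey);

    ibv_destroy_qp(qp);
    ibv_destroy_cq(cq);
    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    free(buf);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

The upper‑layer protocols listed above build on the same constructs: IPoIB, SRP, and iSER all ultimately map their traffic onto queue pairs and completion queues of this kind.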
Typical application scenarios for InfiniBand are high‑performance computing (HPC) and large‑scale data‑center storage, where requirements include latency below 10 µs, CPU utilization under 10 %, and bandwidth of 56 Gbps or 100 Gbps.
In summary, InfiniBand’s RDMA offload reduces CPU load and lowers data‑processing latency from tens of microseconds to about one microsecond, while its high bandwidth (40‑100 Gbps), ultra‑low per‑hop switch latency (hundreds of nanoseconds), and lossless transmission combine the reliability of FC with the flexibility of Ethernet.