
Understanding Mellanox InfiniBand Technology and Its Role in High‑Performance Computing

The article explains Nvidia's $6.9 billion acquisition of Mellanox, outlines Mellanox's history and product portfolio, and provides a detailed overview of InfiniBand architecture, network topologies, protocols, and related software stacks such as OFED, highlighting their importance for data‑center, HPC, and cloud environments.

Architects' Tech Alliance

On March 11, 2019, Nvidia announced a $6.9 billion all-cash deal to acquire Israeli chip maker Mellanox Technologies, the largest acquisition in Nvidia's history.

Previously, Microsoft, Xilinx and Intel had also shown interest in acquiring Mellanox; Bloomberg reported that Nvidia would pay $125 per share, a roughly 14% premium over the prior closing price.

Nvidia expects the acquisition to be immediately accretive to earnings and free cash flow.

Mellanox, founded in 1999 with headquarters in California and Israel, is a leading supplier of end‑to‑end InfiniBand solutions for servers and storage. In 2010 it acquired Voltaire, expanding its capabilities in HPC, cloud, data‑center and enterprise markets.

IB Network and Topology

InfiniBand replaces shared-bus architectures with channel-based, point-to-point serial links, decoupling the I/O subsystem from CPU and memory. End nodes attach through channel adapters (Host Channel Adapters, HCAs, in servers; Target Channel Adapters, TCAs, in I/O devices) and are interconnected by switches and routers to form scalable topologies.

InfiniBand defines a layered protocol stack (physical, link, network, transport and upper layers) analogous to TCP/IP, with support for multicast, partitions, IP compatibility, and link-level flow and rate control.

Routing algorithms include Min-Hop (shortest path), Up/Down (UPDN), and Fat-Tree based engines. Fat-tree topologies are favored in HPC and large clusters for their non-blocking characteristics.

To avoid congestion in three-tier designs, fat-tree architectures provision sufficient ports and bandwidth at the aggregation and core layers.

A fat-tree consists of leaf and spine switches: leaf switches connect to servers or storage adapters, while spine switches interconnect the leaves, ensuring equal bandwidth distribution across the fabric.

Ports on the same switch form a port group, switches at the same rank share identical up- and down-port groups, and the HCAs of all end nodes reside at the same rank level; the sizing sketch below shows what these rules imply for fabric capacity.
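As a rough illustration, the short C sketch below computes the host capacity of an idealized two-tier non-blocking fat-tree from the switch radix. The radix value is only an example; with 36-port switches (a common Mellanox building block) the result is the familiar 648-node two-tier fabric.

```c
#include <stdio.h>

/* Idealized two-tier non-blocking fat-tree sizing:
 * each radix-k leaf uses k/2 ports down (hosts) and k/2 ports up (spines),
 * so k leaves and k/2 spines support k * k/2 hosts at full bisection. */
int main(void) {
    int radix  = 36;                /* example: a 36-port leaf/spine switch */
    int leaves = radix;             /* each spine has one port per leaf */
    int spines = radix / 2;         /* enough spines to absorb all uplinks */
    int hosts  = leaves * (radix / 2);

    printf("radix %d: %d leaves, %d spines, %d hosts (non-blocking)\n",
           radix, leaves, spines, hosts);
    return 0;
}
```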

Software Stack – OFED

Mellanox OFED (OpenFabrics Enterprise Distribution) bundles drivers, middleware, user-space libraries and upper-layer protocols such as IPoIB, SDP, SRP, iSER, RDS and DAPL, and supports MPI as well as Lustre and NFS over RDMA through the Verbs API. It is maintained by the OpenFabrics community and distributed as ISO images containing source and binary packages.
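To make the Verbs API concrete, here is a minimal sketch that enumerates the RDMA devices libibverbs exposes. It assumes an OFED (or rdma-core) installation and compiles with -libverbs; the GUID is printed as the raw network-byte-order value.

```c
#include <stdio.h>
#include <infiniband/verbs.h>

/* List the InfiniBand devices visible through the Verbs layer. */
int main(void) {
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }
    for (int i = 0; i < num; i++)
        printf("device %d: %s (GUID 0x%016llx, network byte order)\n", i,
               ibv_get_device_name(devs[i]),
               (unsigned long long)ibv_get_device_guid(devs[i]));
    ibv_free_device_list(devs);
    return 0;
}
```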

InfiniBand Network Management

OpenSM is the subnet manager for InfiniBand, shipped with Mellanox OFED. It discovers devices, assigns local identifiers (LIDs) and computes routing tables, while companion management tools add fabric visualization, health monitoring and performance management.

Parallel Computing

MPI (Message Passing Interface) implementations optimized for InfiniBand, such as Open MPI and OSU's MVAPICH, enable high-performance parallel workloads.
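A minimal MPI example in C, runnable under either Open MPI or MVAPICH; note that the InfiniBand transport is selected by the MPI runtime, not by the application code:

```c
#include <stdio.h>
#include <mpi.h>

/* Each rank passes its ID one step around a ring. */
int main(int argc, char **argv) {
    int rank, size, token;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;

    /* Combined send/receive avoids deadlock in the ring exchange. */
    MPI_Sendrecv(&rank, 1, MPI_INT, next, 0,
                 &token, 1, MPI_INT, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank %d of %d received token %d\n", rank, size, token);

    MPI_Finalize();
    return 0;
}
```

Launched with, for example, `mpirun -np 4 ./ring`, the same binary typically uses the Verbs/RDMA transport automatically when run on an InfiniBand cluster.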

RDS (Reliable Datagram Sockets) offers ordered, reliable datagram delivery over InfiniBand reliable-connected (RC) transports or TCP/IP, and is notably used by Oracle RAC 11g.
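Because RDS keeps datagram semantics while adding reliability, application code looks much like ordinary UDP-style sockets. A minimal hedged sketch, assuming the Linux rds kernel module is loaded and RDS is configured over the fabric; addresses, the port, and the AF_RDS fallback value are placeholders or assumptions:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#ifndef AF_RDS
#define AF_RDS 21                 /* assumption: Linux address-family value */
#endif

int main(void) {
    /* RDS: datagram-style API, but delivery is reliable and ordered. */
    int s = socket(AF_RDS, SOCK_SEQPACKET, 0);
    if (s < 0) { perror("socket(AF_RDS)"); return 1; }

    struct sockaddr_in local = {0};
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = inet_addr("10.0.0.1");   /* placeholder local IP */
    local.sin_port = htons(18634);                   /* placeholder port */
    if (bind(s, (struct sockaddr *)&local, sizeof(local)) < 0) {
        perror("bind");                              /* RDS requires a bind */
        return 1;
    }

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_addr.s_addr = inet_addr("10.0.0.2");    /* placeholder peer IP */
    peer.sin_port = htons(18634);

    const char msg[] = "reliable datagram";
    sendto(s, msg, sizeof(msg) - 1, 0,
           (struct sockaddr *)&peer, sizeof(peer));
    return 0;
}
```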

Socket‑Based Network Capabilities

IPoIB and EoIB encapsulate IP and Ethernet traffic over InfiniBand. SDP provides TCP-like stream semantics with lower latency and higher bandwidth than IPoIB.
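Because IPoIB presents the fabric as a normal IP interface (typically ib0), unmodified socket code runs over InfiniBand. The sketch below is an ordinary TCP client; if the peer address lives on the IPoIB subnet, the kernel simply routes it over ib0 (address and port are placeholders):

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void) {
    /* A plain TCP socket: with IPoIB the application is unchanged. */
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(5001);                  /* placeholder port */
    peer.sin_addr.s_addr = inet_addr("10.0.0.2"); /* placeholder IPoIB peer */

    if (connect(s, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect"); close(s); return 1;
    }
    const char msg[] = "hello over IPoIB\n";
    send(s, msg, sizeof(msg) - 1, 0);
    close(s);
    return 0;
}
```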

Storage Support

InfiniBand supports iSER, NFS over RDMA, SRP and other storage protocols, using remote direct memory access (RDMA) to move data directly between application buffers and the network without involving the remote CPU.
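The key enabler for this is memory registration: pinning a buffer and obtaining keys that let the HCA access it directly. A minimal sketch using libibverbs, assuming at least one RDMA device is present (error handling trimmed for brevity):

```c
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void) {
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Register a buffer so the HCA can DMA into/out of it directly. */
    size_t len = 4096;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    /* The rkey is what a remote peer presents to RDMA-read/write this buffer. */
    printf("registered %zu bytes: lkey=0x%x rkey=0x%x\n", len, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```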

Mellanox Product Overview

Mellanox offers both InfiniBand and Ethernet interconnects, including VPI-enabled adapters, switches, cables and transceivers. Products span from edge (leaf/spine) switches to director-class core systems with 40-100 Gb/s ports, scaling to fabrics of thousands of nodes.

InfiniBand switches (SwitchX, SwitchIB) provide non-blocking fabrics supporting SDR (10 Gb/s), DDR (20 Gb/s), QDR (40 Gb/s), FDR (56 Gb/s) and EDR (100 Gb/s) speeds over 4x links. Fat-tree topologies are realized using leaf and spine modules.

ConnectX adapters (HCAs) and TCA cards provide PCIe-based InfiniBand connectivity, while Mellanox router and gateway systems (e.g., SB7780, SX6036G) handle inter-subnet traffic and VPI bridging between InfiniBand and Ethernet.

LinkX cables and transceivers support 10-100 Gb/s copper and optical links, with newer generations enabling end-to-end 200-400 Gb/s data-center fabrics.


Tags: High-Performance Computing, Data Center, InfiniBand, Mellanox, Network Topology, OFED