
Deep Dive into NVMe over Fabrics: Design Principles, Data Transfer Process, and Protocol Extensions

This article thoroughly explains the NVMe over Fabrics architecture, covering its encapsulation format, IO transmission steps, connection management, command extensions, discovery services, and the Linux reference implementation for RDMA and Fibre Channel transports.

Architects' Tech Alliance

This article provides an in‑depth exploration of the internal design of NVMe over Fabrics (NVMe-oF) and the IO transmission process defined by its specification.

NVMe-oF addresses three main challenges: (1) providing a transparent encapsulation format for messages and data across different interconnects; (2) mapping NVMe interface operations to network transports; and (3) handling node discovery, multipathing, and other issues introduced by the network.

The protocol defines a complete encapsulation scheme that differs from traditional NVMe. In PCIe NVMe, command and completion entries carry only descriptors; the actual data, and the SGL lists describing it, reside in host memory and are fetched by the controller hardware via DMA.

Because inter‑node latency in a fabric is much larger than the sub‑microsecond PCIe latency, NVMe-oF allows request capsules to carry optional data or SGL descriptors, and completion capsules to carry return data, thereby eliminating unnecessary round‑trips.

To avoid flow‑control overhead, the completion queue in NVMe-oF does not implement flow control; the receiver must instead allocate enough space to hold all outstanding completions.

A single IO transmission proceeds as follows:

1. The Initiator driver packages the request into a capsule and hands it to the hardware.

2. The Initiator hardware posts the capsule to the Target's submission queue.

3. The Target controller processes the IO and prepares a completion for its hardware.

4. The Target hardware posts the completion to the Initiator's receive queue.

Even when a request does not carry in-capsule data, the Target can retrieve the necessary data directly from the Initiator's memory, for example via an RDMA read driven by the SGL descriptors in the command.

NVMe-oF extends the standard NVMe command set with five fabric‑specific commands: Connect, Property Get, Property Set, and Authentication Send/Receive. The authentication commands carry SPC‑4 security protocol exchanges between Initiator and Target.

The Connect command creates a paired submission/completion queue, carrying the Host NQN, NVM Subsystem NQN, and Host Identifier, and can target either a static or a dynamic controller. A host may establish multiple connections to the same subsystem using different NQNs or fabric ports, which adds considerable deployment flexibility.

In classic NVMe, a controller is tied to a specific PCIe port. In NVMe-oF, a fabric port can host multiple controllers, either static controllers (which may differ in capabilities) or dynamic controllers (a simpler model in which every host is served an identical controller).

Because fabrics lack a BAR‑style register space, NVMe-oF defines Property Get/Set commands to read and write controller registers remotely.

To support discovery, NVMe-oF specifies a discovery service that lets an Initiator query available NVM Subsystems, namespaces, and multipathing information via the Discovery Log Page command.

A reference implementation for Linux, covering both RDMA and Fibre Channel transports, provides Initiator and Target drivers, CLI tools, and OS integration, offering a solid starting point for the ecosystem.

Author: Lu Xiangfeng, CTO of Memblaze, originally published on the "Crystalwit" public account.

Disclaimer: The article is reproduced with permission; for copyright issues, please contact the publisher.

Tags: Linux, networking, RDMA, data transfer, NVMe over Fabrics, storage protocol