Understanding NVMe/TCP and Its Role in Modern Data Center Storage
The article explains the evolution of NVMe‑oF, compares RDMA, FC and TCP transports, highlights the advantages and challenges of NVMe/TCP in modern data‑center and cloud storage, and discusses Lightbits' LightOS and accelerator card as a cost‑effective solution for high‑performance distributed storage.
In the networking world, Ethernet and IP have long dreamed of a unified data‑center network, but their progress in storage networking has been uneven. FCoE once looked promising but was defeated by Fibre Channel's resilience and is now largely abandoned. With the rise of NVMe‑oF, IP found a new opportunity: at the end of 2018, NVMe/TCP was ratified as a standard, driven in part by Lightbits Labs, a startup that recently secured a $50 million investment backed by Dell.
Recently, many colleagues have asked for my view on Lightbits and the future of NVMe/TCP, so I will discuss this topic.
NVMe‑oF has been a standard for several years. Initially it only supported RDMA, later added Fibre Channel support, and now includes TCP support.
NVMe uses a PCIe, memory‑mapped protocol, whereas NVMe‑oF is message‑based but can also support shared memory. RDMA (including IB, RoCE, iWARP) supports both message and memory semantics, while FC and TCP only support messages.
Because RDMA supports both semantics, commands and responses are encapsulated as messages, while data transfers use memory semantics (remote reads and writes of registered buffers).
FC and TCP only support messages, so data is also transferred as messages.
TCP may seem less powerful than RDMA, but its ubiquity, familiarity, and ability to support long‑distance, large‑scale deployments make it attractive for cloud environments. Although TCP introduces latency and jitter challenges, they are not insurmountable.
NVMe/TCP simply encapsulates NVMe‑oF messages inside TCP/IP packets for transport.
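To make the encapsulation concrete, here is a small Python sketch of the 8‑byte common header that begins every NVMe/TCP PDU. The field layout follows my reading of the transport binding and is illustrative only, not a conformant implementation.

```python
import struct

# Every NVMe/TCP PDU starts with an 8-byte common header:
# PDU-Type (1 B), FLAGS (1 B), HLEN (1 B), PDO (1 B), PLEN (4 B, little-endian).
CAPSULE_CMD = 0x04  # command-capsule PDU type

def pack_common_header(pdu_type, flags, hlen, pdo, plen):
    """Serialize the 8-byte common header that precedes every PDU."""
    return struct.pack("<BBBBI", pdu_type, flags, hlen, pdo, plen)

def unpack_common_header(data):
    """Parse the common header from the first 8 bytes of a PDU."""
    pdu_type, flags, hlen, pdo, plen = struct.unpack("<BBBBI", data[:8])
    return {"type": pdu_type, "flags": flags, "hlen": hlen,
            "pdo": pdo, "plen": plen}

# A command capsule with no in-capsule data: common header (8 B) followed by
# a 64-byte NVMe submission-queue entry, so HLEN = PLEN = 72.
sqe = bytes(64)  # placeholder SQE; a real one carries opcode, NSID, SGLs, etc.
pdu = pack_common_header(CAPSULE_CMD, 0, 72, 0, 72) + sqe
```

The point is simply that NVMe commands ride inside ordinary TCP byte streams, framed by this header, so any standard NIC and switch can carry them.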
The diagram shows the three NVMe‑oF transports; of the three, NVMe/TCP and NVMe/FC transmit data in the most similar way.
In terms of latency, TCP is slower than RDMA and is vulnerable to the incast problem: many synchronized senders overflow a shared switch buffer, packets are dropped, and all senders back off and then ramp up again in lockstep.
The resulting saw‑tooth in throughput means NVMe/TCP requires very strict congestion control.
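The saw‑tooth comes from TCP's additive‑increase/multiplicative‑decrease (AIMD) behavior. The toy model below (numbers are illustrative, not measured) shows the pattern that incast synchronizes across many senders at once.

```python
def aimd_cwnd(rounds, loss_every=10, cwnd=1.0):
    """Toy AIMD model: grow the congestion window by one segment per RTT,
    halve it on a loss event. Produces the classic saw-tooth."""
    trace = []
    for r in range(1, rounds + 1):
        if r % loss_every == 0:
            cwnd = max(cwnd / 2, 1.0)  # multiplicative decrease on loss
        else:
            cwnd += 1.0                # additive increase per RTT
        trace.append(cwnd)
    return trace

trace = aimd_cwnd(30)  # window climbs for 9 rounds, then halves, repeatedly
```

When incast drops packets for every sender at once, all of these saw‑teeth line up, and aggregate throughput collapses and recovers in unison.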
Fortunately, modern data‑center switches support ECN marking and priority flow control, which, combined with host‑side congestion‑control algorithms such as DCTCP, can mitigate incast.
If you plan to use NVMe/TCP, ensure your switches support these features and are properly configured.
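DCTCP is a good example of how ECN helps: instead of halving the window on any loss, the sender tracks the fraction of ECN‑marked packets and scales back proportionally. A sketch of the estimator, using the commonly cited gain g = 1/16 (illustrative, not a kernel implementation):

```python
def dctcp_update(alpha, marked, total, g=1 / 16):
    """One RTT of DCTCP's estimator: alpha is an EWMA of the fraction of
    ECN-marked packets observed in the last window."""
    f = marked / total if total else 0.0
    return (1 - g) * alpha + g * f

def dctcp_cwnd(cwnd, alpha):
    """DCTCP shrinks the window by alpha/2 rather than halving it,
    keeping switch queues short without the full TCP saw-tooth."""
    return cwnd * (1 - alpha / 2)

alpha = 0.0
for _ in range(100):  # sustained 25% marking drives alpha toward 0.25
    alpha = dctcp_update(alpha, marked=25, total=100)
cwnd = dctcp_cwnd(40.0, alpha)  # a mild reduction instead of a 50% cut
```

Under light congestion, alpha stays small and the sender barely slows down, which is exactly the smooth behavior NVMe/TCP needs.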
Although NVMe/TCP calls for modern switches, the host side does not need an RDMA‑capable adapter; a standard NIC suffices, making it cheaper than RDMA, especially across large host fleets. Consequently, Lightbits, together with Facebook, Dell EMC, and Intel, drove NVMe/TCP to ratification as an official NVMe‑oF transport binding on 15 Nov 2018.
Latency tests show NVMe/TCP latency is about twice that of RDMA, but still in the microsecond range, sufficient for most scenarios and far superior to iSCSI. Cloud providers such as Facebook and Google are enthusiastic about this technology.
Lightbits seized the opportunity and built its own NVMe/TCP storage operating system, LightOS.
LightOS offers enterprise‑grade features such as erasure coding, data reduction, and QoS.
To accelerate performance, Lightbits also developed an accelerator card that provides EC and compression acceleration as well as TCP offload. The card does not contain a network port, so it must be used together with a regular NIC.
Combined with LightOS, this accelerator card forms a high‑performance distributed NVMe/TCP storage solution.
Lightbits recently secured a $50 million investment, with backing from Dell, Cisco, Micron, and other major IT vendors.
The three founders include a CTO with a semiconductor background.
In my opinion, although NVMe/TCP inherits some of TCP's inherent issues, cloud providers have no practical alternative, so it will not share FCoE's fate.
Pure Storage’s NVMe‑oF roadmap this year includes RoCE v2, plans FC support by year‑end, and NVMe/TCP support next year, which I consider a reasonable timeline.
Nevertheless, I believe NVMe/TCP will not dominate the data‑center network in the short term because traditional data‑center operators are accustomed to Fibre Channel, making NVMe/FC the preferred evolution path for many enterprises. Consequently, all three NVMe‑oF protocols are likely to coexist:
NVMe/FC – high‑end storage and critical business workloads.
NVMe/RDMA – HPC and latency‑sensitive scenarios.
NVMe/TCP – cloud and distributed storage environments.
In public clouds, NVMe/TCP is the only viable option. However, public‑cloud providers typically do not purchase full storage systems, only certain components, so Lightbits can only sell its accelerator cards in that market, limiting revenue. Real success will require adoption in enterprise data centers.
Architects' Tech Alliance