Low‑Latency Network Architecture for High‑Frequency Trading
This article explains how high‑frequency trading firms achieve ultra‑low network latency by combining proximity deployment, dedicated links, microwave transmission, InfiniBand, low‑latency switches, kernel bypass, RDMA, TCP offload engines and FPGA acceleration, and summarizes the impact of each technique on overall request latency.
Hello everyone, I'm Fei! Recently I did a simple study of low‑latency network architecture and would like to share my findings.
When people think of excellent low‑latency network architectures, they often imagine the big internet companies like Tencent, Alibaba, or ByteDance, assuming they have the best solutions. But in most internet applications, users barely notice a delay of one or even two to three seconds, so optimization is never pushed to the extreme.
However, there is a niche domain that is extremely sensitive to latency: high‑frequency quantitative trading. In this business, the first quote wins.
If a quant firm is just 1 ms slower than a competitor, it can lose a lot of money. Industry estimates suggest that a 5 ms delay can cost about 1 % of profit, and a 10 ms delay can double that loss.
Therefore, the network architecture used in high‑frequency trading is among the world’s most advanced low‑latency designs and is worth studying. I consulted several quant professionals and reviewed technical material, and I’m sharing a preliminary summary.
First, to set the scale: published data from one exchange shows that trading latency had already been optimized down to 278 µs back in 2016.
A network request from the user to final processing can be divided into two parts: network forwarding latency and system processing latency.
We will examine the "black‑tech" used in each of these two parts for high‑frequency trading.
1. Network Forwarding Latency
In high‑frequency trading, network forwarding is optimized to the extreme.
1.1 Proximity Deployment & Dedicated Lines
Placing servers close to the exchange (co‑location) shortens the physical path and reduces transmission time. For cross‑region connections, dedicated physical lines provide the shortest route, avoid public‑internet interference, and improve stability and security.
1.2 Microwave Transmission
Light travels at about 3×10⁸ m/s in vacuum, but more slowly in denser media. In optical fiber it travels at roughly two‑thirds of its vacuum speed (the refractive index of glass is about 1.5), and fiber routes are rarely straight, adding extra distance.
Quant trading networks therefore use microwave links, whose signals travel through the air in a nearly straight line at close to vacuum light speed. For example, a microwave link between the Chicago CME and New Jersey NASDAQ data centers cut round‑trip time from 8.7 ms (2012) to 7.9 ms (2016).
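A back‑of‑the‑envelope calculation shows why microwave wins. The distances and the fiber speed fraction below are rough illustrative assumptions, not actual route data:

```python
# Propagation-delay sketch: fiber vs. microwave on a Chicago-to-New-Jersey
# route. Distances and speed fractions are rough assumptions for illustration.

C_VACUUM_KM_PER_MS = 299_792.458 / 1000  # speed of light: ~299.79 km per ms

def one_way_delay_ms(distance_km: float, speed_fraction: float) -> float:
    """Propagation delay over distance_km at speed_fraction * c."""
    return distance_km / (C_VACUUM_KM_PER_MS * speed_fraction)

GREAT_CIRCLE_KM = 1_140   # assumed straight-line Chicago-NJ distance
FIBER_ROUTE_KM = 1_500    # assumed longer, non-straight fiber path

# Microwave: near line-of-sight through air, at ~100% of c.
microwave_rtt = 2 * one_way_delay_ms(GREAT_CIRCLE_KM, 1.00)
# Fiber: ~66% of c in glass, over a longer physical route.
fiber_rtt = 2 * one_way_delay_ms(FIBER_ROUTE_KM, 0.66)

print(f"microwave RTT ~ {microwave_rtt:.1f} ms")
print(f"fiber RTT     ~ {fiber_rtt:.1f} ms")
```

Under these assumptions the microwave RTT lands near the theoretical floor of about 7.6 ms for that distance, consistent with the 7.9 ms figure above, while the fiber path is roughly twice as slow.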
For very long distances, low‑Earth‑orbit satellite constellations such as SpaceX Starlink may also offer lower latency than terrestrial fiber, since their links run through air and vacuum rather than glass.
1.3 InfiniBand Replacing Ethernet
Some exchanges replace Ethernet with InfiniBand, which combines a physical link protocol with Remote Direct Memory Access (RDMA).
In Ethernet, each frame is received, its headers parsed, and then forwarded, which introduces processing overhead. An InfiniBand switch instead forwards on a short local identifier, more like setting a railway switch: data is steered straight through without full packet inspection.
Latency measurements show Ethernet TCP/UDP processing around 10 µs, while InfiniBand NICs achieve about 600 ns for send/write operations.
InfiniBand also avoids the jitter problems common in Ethernet.
However, InfiniBand requires a centralized subnet manager, costs more, and has a limited address space, so only a few exchanges use it.
1.4 Higher Bandwidth
Higher bandwidth reduces serialization delay: the time to put a frame on the wire is its size divided by the link rate, so a faster link clocks out the same packet sooner.
Trading venues therefore use 10 Gbps or even 25 Gbps networks.
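The effect is simple arithmetic. The figures below are computed from frame size and line rate, not measurements:

```python
# Serialization delay: the time to clock one frame onto the wire.

def serialization_delay_us(frame_bytes: int, link_gbps: float) -> float:
    """Microseconds to transmit frame_bytes at link_gbps."""
    bits = frame_bytes * 8
    return bits / (link_gbps * 1e9) * 1e6  # seconds -> microseconds

FRAME = 1500  # a full-size Ethernet payload, in bytes

for gbps in (1, 10, 25):
    print(f"{FRAME}-byte frame at {gbps:>2} Gbps: "
          f"{serialization_delay_us(FRAME, gbps):.2f} us")
```

A full 1500‑byte frame takes 12 µs to serialize at 1 Gbps but only 1.2 µs at 10 Gbps and under half a microsecond at 25 Gbps, which is why the link speed matters even when the pipe is never saturated.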
1.5 Low‑Latency Switches
Traditional switches operate at Layer 2: they parse each frame's Ethernet header, look up the destination MAC address, and typically buffer the whole frame (store‑and‑forward) before sending it on. On a dedicated point‑to‑point link, much of that work is unnecessary.
Arista builds switches that forward at Layer 1, essentially acting like a run of straight fiber: a signal entering one port is sent out another almost immediately, without frame parsing or contention.
Reference: Arista 7130 Series
2. System Processing Latency
After a request reaches the server, the system begins processing it, and high‑frequency trading applies many optimizations here as well.
2.1 Kernel Bypass
Traditional kernel stacks introduce significant overhead: packets traverse multiple protocol layers, are placed in receive queues, and the kernel must wake the user process, which may wait in the CPU run queue.
Kernel bypass techniques avoid the kernel stack by polling in user space and processing packets directly, eliminating kernel‑user context switches.
Examples include Solarflare Onload and Exablaze ExaSOCK, widely used in quant trading.
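Onload and ExaSOCK are proprietary, but the polling model they rely on can be illustrated with an ordinary non‑blocking socket. This is only a sketch of the idea: a real kernel‑bypass stack polls the NIC's ring buffer directly from user space, with no system calls at all.

```python
import socket

def busy_poll_recv(sock: socket.socket, max_spins: int = 1_000_000) -> bytes:
    """Spin on a non-blocking socket instead of sleeping in the kernel.

    The thread never blocks, so there is no wakeup or scheduling latency
    on the receive path -- the price is one fully busy CPU core.
    """
    sock.setblocking(False)
    for _ in range(max_spins):
        try:
            data, _addr = sock.recvfrom(2048)
            return data
        except BlockingIOError:
            continue  # nothing arrived yet; spin again immediately
    raise TimeoutError("no packet arrived within the spin budget")

# Demo over loopback UDP.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"quote", rx.getsockname())
print(busy_poll_recv(rx))
rx.close(); tx.close()
```

The contrast with a blocking `recvfrom` is the point: in the blocking case the kernel parks the process and must later wake it and wait for the scheduler; in the polling case the data is picked up on the very next loop iteration.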
2.2 RDMA
Remote Direct Memory Access (RDMA) further reduces latency by allowing direct memory operations on the remote host. InfiniBand natively supports RDMA; other solutions include RoCE and iWARP, which work over Ethernet but require specialized NICs.
RoCE v1 operates on Ethernet L2 with PFC flow control, while RoCE v2 encapsulates RDMA over UDP/IP, enabling L3 routing.
2.3 TOE Hardware Offload
Traditional processing is CPU‑bound. To further accelerate, TCP Offload Engine (TOE) moves TCP/IP processing to FPGA or dedicated DPU hardware.
A trading FPGA sits on the NIC itself, so unlike a GPU the packet data never crosses the PCIe bus and avoids multiple memory copies; vendors report 10‑100× CPU performance for this workload.
Some firms implement the entire protocol stack, and even the business logic, on the FPGA, reaching processing times measured in nanoseconds.
2.4 Other Optimizations
Additional measures include using massive memory (≥1 TB) to avoid paging, deploying very high‑frequency CPUs (5 GHz+), binding threads to dedicated cores, aligning cache lines, and minimizing system calls.
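Core pinning, at least, is easy to demonstrate from user space. The sketch below uses Linux‑only calls (`os.sched_setaffinity` is not available on macOS or Windows), and pins to whichever core happens to be lowest‑numbered in the current affinity mask:

```python
import os

def pin_to_core(cpu: int) -> None:
    """Restrict the calling process to a single CPU core (Linux only).

    Pinning keeps the hot thread's working set in one core's L1/L2 cache
    and prevents the scheduler from migrating it between cores.
    """
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process

before = os.sched_getaffinity(0)    # remember the original mask
core = min(before)                  # pick any core we are allowed to use
pin_to_core(core)
print("pinned to:", os.sched_getaffinity(0))
os.sched_setaffinity(0, before)     # restore the original mask
```

In production a trading thread would stay pinned for its lifetime, usually to a core that has also been isolated from the kernel scheduler (e.g. via the `isolcpus` boot parameter), rather than restoring the mask as the demo does.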
3. Summary
A network request’s total latency consists of network forwarding latency and system processing latency.
In the network forwarding path, quant firms employ microwave links, InfiniBand, low‑latency switches, proximity deployment, and dedicated lines.
On the processing side, they use kernel bypass, RDMA, and offload intensive workloads to FPGA, while also applying traditional tricks such as large memory, high‑frequency CPUs, core pinning, and cache‑line alignment.
Overall, quant trading companies adopt a suite of cutting‑edge technologies to push processing latency to the absolute minimum.
Acknowledgements
I initially knew only that quant trading networks were low‑latency, but not why or how. Thanks to friends (list of names) and the authors of several articles for their valuable information.
Technical sharing helps everyone grow faster, which is why I continue to write despite a busy schedule.
Today is Programmer's Day (Oct 24). My publisher is offering a 50 % discount on my book "Deep Understanding of Linux Networking" and Dong’s "Algorithm Cheat Sheet" for interview preparation.
Entry link:
Refining Core Development Skills
Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.