Capturing 10 Million Packets per Second on Linux without Specialized Libraries
This article explains how to achieve multi‑million‑packet‑per‑second network capture on Linux 3.16 using standard C/C++ code, by distributing interrupts across cores, employing AF_PACKET with FANOUT and RX_RING, and optimizing memory handling to eliminate costly kernel locks and copies.
In this article I describe how to capture up to 10 million packets per second on a Linux 3.16 system without relying on specialized libraries such as Netmap, PF_RING, or DPDK, using only standard C/C++ code.
Understanding the Limitations of pcap
Traditional packet capture tools based on pcap (e.g., iftop, tcpdump, arpwatch) suffer from high CPU load because each packet is copied from kernel space to user space via a recv system call, and the call is made for every packet. On modern 10 GE NICs this can mean more than 14 million system calls per second, which quickly becomes a bottleneck.
Additionally, most capture applications run on a single logical core, while the NIC may distribute incoming packets across many cores. Kernel locks are taken when multiple cores contend for the same resources, and these locks can consume up to 90 % of CPU time.
Distributing Interrupts to All Cores
Enabling promiscuous mode on the NIC:
ifconfig eth6 promisc

and then assigning each NIC queue to a different logical core reduces the load on any single core. The following script distributes the interrupts from all eight queues of an ixgbe NIC across the eight available logical CPUs:
#!/bin/bash
ncpus=$(grep -ciw ^processor /proc/cpuinfo)
[ "$ncpus" -gt 1 ] || exit 1
n=0
for irq in $(grep eth /proc/interrupts | awk '{print $1}' | sed 's/://g')
do
    f="/proc/irq/$irq/smp_affinity"
    [ -r "$f" ] || continue
    cpu=$(( ncpus - (n % ncpus) - 1 ))
    if [ $cpu -ge 0 ]; then
        mask=$(printf %x $((2 ** cpu)))
        echo "Assign SMP affinity: eth queue $n, irq $irq, cpu $cpu, mask 0x$mask"
        echo "$mask" > "$f"
        n=$((n+1))
    fi
done

After applying this script the observed packet-per-second rate rose to about 12 MPPS, and CPU usage became evenly distributed across cores.
AF_PACKET Capture without Optimizations
Running a basic AF_PACKET capture shows the problem: the single‑core application quickly saturates all CPUs, and profiling reveals that most time is spent in kernel spin locks and packet processing functions.
We process: 222048 pps
We process: 186315 pps

Using FANOUT to Parallelise Capture
By enabling PACKET_FANOUT_CPU and spawning one capture process per logical core, the load is spread and lock contention disappears. In the example code, the flag bool use_multiple_fanout_processes = true activates this mode.
We process: 2250709 pps
We process: 2234301 pps
We process: 2266138 pps

CPU utilization remains high on each core, but profiling shows that the previous lock-related hotspots have vanished.
RX_RING Circular Buffer Optimization
Further speed gains are achieved by using an RX_RING buffer, which eliminates the second memory copy from kernel to user space. With this technique the capture reaches roughly 4 MPPS.
We process: 3582498 pps
We process: 3757254 pps
We process: 3815506 pps

The approach also switches from per-packet recv calls to a poll-based model that wakes the application only when a whole block of packets is ready.
Combining RX_RING with FANOUT
Applying FANOUT to the RX_RING setup further reduces lock contention and pushes the throughput to around 9 MPPS.
We process: 9611580 pps
We process: 8912556 pps
We process: 8941682 pps

Profiling now shows that the dominant kernel functions are the NIC driver's interrupt handler and packet-reception routines, while lock-related overhead is minimal.
Conclusion
Linux provides a powerful platform for ultra‑high‑speed packet capture without the need for custom kernel modules. By distributing interrupts, using AF_PACKET with FANOUT, and employing RX_RING circular buffers, it is possible to approach the wire speed of 10 GE NICs on modest hardware.
Recommended reading:
packet_mmap documentation
packet(7) man page