
Comprehensive Guide to Diagnosing and Resolving Linux Network Packet Loss

This article explains common Linux network packet-loss scenarios: it details the kernel’s packet receive and transmit paths, examines NIC hardware, ARP, conntrack, and UDP buffer problems, and provides practical troubleshooting tools and commands to accurately detect and fix packet drops.


In modern computing, network reliability is essential, yet Linux users often encounter mysterious packet loss that degrades performance and can cause economic loss for enterprise applications. Understanding the kernel’s packet processing pipeline is the first step to diagnosing these issues.

1. Linux Network Packet Loss Overview

Packet loss increases latency, causes jitter in real‑time communication, and can corrupt data transfers. For critical services such as online transactions, video conferencing, or cloud workloads, loss can translate directly into revenue loss.

2. Kernel Packet Receive and Transmit Flow

2.1 Receive Path

When a NIC receives a frame, it copies the packet into a ring buffer via DMA and raises a hardware interrupt. The interrupt handler adds the device to the CPU’s poll_list and schedules a soft interrupt, which the ksoftirqd thread services by running net_rx_action(). The packet is then pulled from the ring buffer, validated at the link layer, processed through the IP and TCP/UDP layers, and finally queued in the socket’s receive buffer for the application.

2.2 Transmit Path

Applications invoke a socket API (e.g., sendmsg), which copies data from user space into a kernel sk_buff (skb). The skb traverses the protocol stack, receiving TCP/UDP headers, routing decisions, and netfilter checks before being placed in the NIC’s transmit ring buffer. After the NIC sends the frame, a hardware interrupt and a subsequent soft interrupt reclaim the ring buffer slots.

3. NIC‑Related Packet Loss Scenarios

3.1 Ring Buffer Saturation

If the NIC receives traffic faster than the CPU can process, the receive ring buffer fills and packets are dropped. Monitoring rx_no_buffer_count via ethtool -S helps identify this condition.
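As a quick check, the relevant counters can be filtered out of the ethtool -S output. A minimal sketch, assuming an interface named eth0 and the common drop/err/no_buffer counter naming (exact counter names vary by driver):

```shell
# Print only the non-zero drop/error counters from `ethtool -S` style input.
# Counter names differ between drivers; the pattern below is a heuristic.
drop_counters() {
    awk -F':' '/drop|err|no_buffer/ {
        gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2)
        if ($2 + 0 > 0) print $1": "$2
    }'
}

# On a live system (eth0 is a placeholder interface name):
# ethtool -S eth0 | drop_counters
```

Running this periodically and comparing samples shows whether the drop counters are still rising, which matters more than their absolute values.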

Uneven distribution of hardware interrupts can exacerbate the problem. Use cat /proc/interrupts to view how the NIC’s interrupts are spread across CPUs, disable irqbalance, and manually bind interrupts to multiple CPUs with a script such as set_irq_affinity.sh.
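Binding an IRQ means writing a hex CPU bitmask to /proc/irq/&lt;N&gt;/smp_affinity. A minimal helper for computing that mask; the IRQ number 44 below is a hypothetical example:

```shell
# Build the smp_affinity hex bitmask from a list of CPU indices.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))   # set the bit for this CPU
    done
    printf '%x\n' "$mask"
}

# Example (requires root and a real IRQ number; 44 is hypothetical):
# echo "$(cpu_mask 2 3)" > /proc/irq/44/smp_affinity
```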

For NICs supporting multiple receive queues, enable RSS (Receive Side Scaling) and inspect or tune the indirection table and hash key with ethtool -x and ethtool -X. If the NIC lacks enough queues, the kernel’s RPS (Receive Packet Steering) can distribute soft-interrupt processing across CPUs, e.g., echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus, though software steering generally performs worse than hardware balancing.
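The rps_cpus value is likewise a hex bitmask of CPUs. A small sketch that derives the mask covering the first N CPUs, assuming interface eth0 and queue rx-0:

```shell
# Hex bitmask with the low N bits set, i.e. "use the first N CPUs for RPS".
rps_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# rps_mask 8 yields "ff"; applying it requires root:
# echo "$(rps_mask 8)" > /sys/class/net/eth0/queues/rx-0/rps_cpus
```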

Sudden traffic spikes can also overflow the ring buffer; increase its size with ethtool -G eth0 rx N after checking the current configuration via ethtool -g.
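The grow-to-hardware-limit step can be scripted rather than read by eye. A sketch that parses the “Pre-set maximums” section of ethtool -g output (format as printed by recent ethtool versions; eth0 is a placeholder):

```shell
# Extract the RX value from the "Pre-set maximums" section of `ethtool -g`.
rx_max() {
    awk '/^Pre-set maximums:/        { preset = 1; next }
         /^Current hardware settings:/ { preset = 0 }
         preset && /^RX:/            { print $2; exit }'
}

# Usage on a live system (requires root):
# ethtool -G eth0 rx "$(ethtool -g eth0 | rx_max)"
```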

3.2 Using ntuple for Priority Traffic

Enabling ntuple allows directing specific flows (e.g., TCP port 23) to a dedicated queue:

ethtool -K eth0 ntuple on
ethtool -U eth0 flow-type tcp4 dst-port 23 action 9
ethtool -X eth0 equal 8

This creates a “VIP lane” for critical traffic, provided the hardware supports it.

4. ARP Table Overflows

4.1 Neighbor Table Overflow

When the neighbor (ARP) table fills, the kernel logs “neighbour: arp_cache: neighbor table overflow!”. Check /proc/net/stat/arp_cache for the table_fulls counter. The overflow is usually caused by a rapid influx of new IP‑MAC mappings without timely garbage collection.
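Because /proc/net/stat/arp_cache prints a header line followed by one hex row per CPU, the table_fulls total is best summed by locating the column by name (column order can differ across kernel versions, hence the header lookup):

```shell
# Sum the table_fulls counter across CPUs from an arp_cache stat file.
# $1: path to the file. Data rows hold hex values, one row per CPU.
table_fulls() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "table_fulls") col = i; next }
         col     { print $col }' "$1" |
    while read -r hex; do echo $((0x$hex)); done |   # hex -> decimal
    awk '{ s += $1 } END { print s + 0 }'
}

# table_fulls /proc/net/stat/arp_cache
```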

Adjust the thresholds in /proc/sys/net/ipv4/neigh/default/ (gc_thresh1, gc_thresh2, gc_thresh3) to enlarge the table, keeping gc_thresh1 < gc_thresh2 < gc_thresh3, e.g., echo 1024 > /proc/sys/net/ipv4/neigh/default/gc_thresh1, and make the changes permanent via /etc/sysctl.conf.
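Since the kernel expects the three thresholds to stay ordered, a quick sanity check before applying new values avoids a misconfigured table. The values below are illustrative, not recommendations:

```shell
# Verify gc_thresh1 < gc_thresh2 < gc_thresh3 before writing the sysctls.
thresh_ok() {
    [ "$1" -lt "$2" ] && [ "$2" -lt "$3" ]
}

t1=1024; t2=2048; t3=4096   # example values only
if thresh_ok "$t1" "$t2" "$t3"; then
    :   # sysctl -w net.ipv4.neigh.default.gc_thresh1=$t1  (and thresh2/thresh3)
fi
```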

4.2 Unresolved ARP Drops

If a packet is sent before ARP resolution completes, it is queued until the ARP reply arrives. The queue size is limited per neighbour by unres_qlen_bytes; when the queue overflows, packets are dropped and the unresolved_discards counter increases. Increase /proc/sys/net/ipv4/neigh/eth0/unres_qlen_bytes to mitigate.

5. Conntrack and UDP Buffer Issues

5.1 nf_conntrack Table Full

The connection‑tracking subsystem drops packets when its table is full, emitting “nf_conntrack: table full, dropping packet”. Compare net.netfilter.nf_conntrack_count with net.netfilter.nf_conntrack_max. Raise the maximum with sysctl -w net.netfilter.nf_conntrack_max=1048576 (adjust based on RAM).
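Rather than waiting for the “table full” message, utilization can be computed from those two sysctls and alerted on early. The 80% threshold below is an arbitrary example:

```shell
# Integer percentage of conntrack table utilization.
# $1: current count, $2: configured maximum.
ct_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# On a live system:
# count=$(sysctl -n net.netfilter.nf_conntrack_count)
# max=$(sysctl -n net.netfilter.nf_conntrack_max)
# [ "$(ct_usage_pct "$count" "$max")" -ge 80 ] && echo "conntrack nearly full"
```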

5.2 UDP Receive Buffer Saturation

When the UDP receive buffer is exhausted, incoming packets are discarded. Use netstat -su to view “receive buffer errors”. Increase the buffer limits with sysctl -w net.core.rmem_max=8388608 and net.core.rmem_default=8388608.
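To watch the error counter grow, the relevant line can be extracted from netstat -su output (format as printed by net-tools; take two samples some seconds apart and compare):

```shell
# Pull the UDP "receive buffer errors" count out of `netstat -su` output.
udp_rcvbuf_errors() {
    awk '/receive buffer errors/ { print $1; exit }'
}

# On a live system:
# netstat -su | udp_rcvbuf_errors
```

A rising value between samples, rather than a non-zero value alone, indicates the buffer is currently overflowing.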

6. Practical Packet‑Loss Debugging

6.1 dropwatch

Install the dependencies, clone the dropwatch repository, compile, and run sudo ./dropwatch -l kas. Type start at the prompt to see real‑time drop locations resolved to kernel symbols (e.g., icmp_rcv+0x11c).

6.2 iptables LOG

Add a LOG rule to trace packet flow: iptables -A INPUT -j LOG --log-prefix "iptables-". Examine syslog entries for source/destination IPs, ports, and protocols to identify where packets are being dropped.

6.3 Custom iptables Drop Rules for Testing

Simulate drops for specific traffic, e.g., iptables -A INPUT -s 192.168.1.100 -p udp -j DROP or iptables -A INPUT -i eth0 -p tcp -j DROP, then observe application behavior to pinpoint problematic paths.

6.4 Additional Tools

Beyond ping, tools like traceroute and nslookup help narrow down path and DNS issues, and mtr (My Traceroute) combines ping and traceroute functionality for more detailed network diagnostics.

Tags: operations, kernel, network, Linux, troubleshooting, packet loss
Written by Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
