Mastering Network Packet Loss: Diagnosis and Solutions for Linux Servers
This guide explains the fundamentals of network packet loss, illustrates how packets are sent and received, and provides step‑by‑step troubleshooting methods for hardware NIC, driver, kernel stack, TCP/UDP, and application‑level issues on Linux systems, complete with command examples and visual diagrams.
Introduction
This article shares a common network issue—packet loss. When a ping to a server succeeds and returns full data, the communication is healthy; if the ping fails or the response is incomplete, packets are likely being dropped. The following sections present typical packet‑loss diagnosis methods.
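A quick way to quantify loss from the client side is ping's own summary line; for example (the target host below is a placeholder):

```shell
# Send 100 probes at 0.2 s intervals; the summary reports the loss percentage,
# e.g. "100 packets transmitted, 97 received, 3% packet loss".
$ ping -c 100 -i 0.2 <target-host>
```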
What Is Packet Loss?
Data on the Internet is transmitted in discrete packets. Network devices, link quality, and other factors can cause some of those packets to be lost in transit, so the receiver gets less data than was sent; this is packet loss.
Packet Transmission Principles
Sending Packets:
Application data is encapsulated with a TCP header at the TCP layer, forming a transmittable packet.
An IP header is added at the IP layer, forming an IP packet.
The NIC driver adds a 14‑byte MAC header, creating a frame that contains source and destination MAC addresses.
The driver copies the frame into the NIC buffer for the NIC to process.
The NIC prepends the preamble (synchronization bits) and appends a CRC checksum, then transmits the frame on the physical medium.
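As a quick sanity check on these header sizes: with the standard 1500‑byte Ethernet MTU, the TCP payload per packet (the MSS) is the MTU minus the 20‑byte IP header and the 20‑byte TCP header, assuming no options. In shell arithmetic:

```shell
# MSS = MTU - IPv4 header (20 bytes) - TCP header (20 bytes, no options)
mtu=1500
mss=$((mtu - 20 - 20))
echo "$mss"   # prints 1460
```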
Receiving Packets:
The NIC receives the packet, checks CRC, and discards frames with invalid CRC. It also verifies that the destination MAC matches the NIC’s MAC; otherwise the frame is dropped.
The NIC copies the frame into its ring buffer.
The NIC driver notifies the kernel, which processes the packet through the TCP/IP stack.
The application reads the data from the socket buffer.
Core Idea
Understanding the send/receive process reveals that packet loss mainly originates from three layers: NIC hardware, NIC driver, and the kernel protocol stack. The troubleshooting approach follows a bottom‑up layer analysis, checking key information at each layer to pinpoint the cause.
Packet‑Loss Scenarios Overview
Hardware NIC loss
NIC driver loss
Ethernet link‑layer loss
Network IP‑layer loss
Transport‑layer UDP/TCP loss
Application‑layer socket loss
Hardware NIC Loss
Ring Buffer Overflow
When incoming packets arrive faster than the kernel can consume them, the NIC’s ring buffer fills and new packets are dropped.
Check:
<code>$ ethtool -S eth0 | grep rx_fifo</code>
<code>$ cat /proc/net/dev</code>
To view the NIC’s ring buffer size:
<code>$ ethtool -g eth0</code>
Solution: Increase the NIC’s receive and transmit hardware buffers.
<code>$ ethtool -G eth0 rx 4096 tx 4096</code>
Port Negotiation Loss
Check NIC statistics and configuration:
<code>$ ethtool -S eth1</code>
<code>$ ethtool eth1</code>
Solution: Re‑negotiate the link or force a specific speed.
<code>$ ethtool -r eth1</code>
<code>$ ethtool -s eth1 speed 1000 duplex full autoneg off</code>
Flow‑Control Loss
Check flow‑control counters:
<code>$ ethtool -S eth1 | grep control</code>
Solution: Disable NIC flow control.
<code>$ ethtool -A ethx autoneg off</code>
<code>$ ethtool -A ethx tx off</code>
<code>$ ethtool -A ethx rx off</code>
MAC‑Address Mismatch Loss
If the NIC operates in non‑promiscuous mode, it only accepts frames addressed to its own MAC. A stale ARP entry or changed NIC can cause drops.
Check with tcpdump in promiscuous mode or examine ARP tables on both ends.
Solution: Refresh the ARP table or set a correct static ARP entry.
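For example, the following commands (the interface name is illustrative) show the MAC addresses on incoming frames and let you inspect or flush the neighbour table:

```shell
# Show link-layer headers of incoming frames; -p keeps the NIC out of
# promiscuous mode, so only frames it would normally accept are shown.
$ tcpdump -e -nn -p -i eth0 -c 10

# Inspect the ARP/neighbour table, and flush stale entries if needed.
$ ip neigh show
$ ip neigh flush dev eth0
```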
Other NIC Anomalies
Check NIC firmware version for bugs:
<code>$ ethtool -i eth1</code>
Check cable integrity if CRC errors increase:
<code>$ ethtool -S eth0</code>
Solution: Reseat or replace the cable.
Packet Length Loss
Ethernet frames must be 64‑1518 bytes. Oversized or undersized frames may be dropped.
<code>$ ethtool -S eth1 | grep length_errors</code>
Solution: Adjust the MTU or enable jumbo frames, and ensure proper segmentation on the sender.
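A minimal sketch of checking and raising the MTU with iproute2 (the interface name and the 9000‑byte jumbo value are examples; every device on the path must support the larger size):

```shell
# Show the current MTU, then enable jumbo frames on this interface.
$ ip link show eth1 | grep -o 'mtu [0-9]*'
$ ip link set dev eth1 mtu 9000
```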
NIC Driver Loss
Check driver statistics with <code>ifconfig eth1</code> or <code>ethtool -S eth1</code>.
RX Errors : Total receive errors, including FIFO overruns and CRC failures.
RX Dropped : Packets entered the ring buffer but were dropped due to insufficient memory.
RX Overruns : Kernel couldn’t keep up with NIC interrupts, causing packet loss.
RX Frame Errors : Misaligned frames or other hardware issues.
Driver Queue Overflow
Linux uses <code>netdev_max_backlog</code> to cap the length of each CPU’s input backlog queue; when the queue exceeds this limit, new packets are dropped.
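Each row of /proc/net/softnet_stat is one CPU, and the second hexadecimal column is that CPU's drop count. A minimal sketch of decoding a sample snapshot (the values here are illustrative):

```shell
# Decode the per-CPU drop column (2nd field, hex) from a softnet_stat snapshot.
# On a live system, replace the here-string with: cat /proc/net/softnet_stat
snapshot='0003a1f2 00000007 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000b55c9 00000000 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000'
cpu=0
while read -r processed dropped _rest; do
  echo "CPU$cpu dropped=$((16#$dropped))"   # 16#... converts hex to decimal
  cpu=$((cpu + 1))
done <<< "$snapshot"
```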
<code>$ cat /proc/net/softnet_stat</code>
Solution: Increase <code>net.core.netdev_max_backlog</code>.
<code>$ sysctl -w net.core.netdev_max_backlog=2000</code>
Single‑CPU High Load
High soft‑interrupt usage on one CPU can starve packet processing.
<code>$ mpstat -P ALL 1</code>
Solution: Balance IRQs across CPUs, adjust RSS queues, or enable IRQ affinity.
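Pinning a queue's interrupt to a specific CPU looks like this (the IRQ number 24 and the CPU mask are placeholders; read the real numbers from /proc/interrupts):

```shell
# Find the IRQ numbers assigned to the NIC's queues.
$ grep eth0 /proc/interrupts

# Pin IRQ 24 to CPU1 (bitmask 0x2 = binary 10).
$ echo 2 > /proc/irq/24/smp_affinity
```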
<code>$ ethtool -x ethx</code>
<code>$ ethtool -X ethx equal 8</code>
Also consider enabling adaptive interrupt coalescing so the interrupt rate adapts to the load:
<code>$ ethtool -C ethx adaptive-rx on</code>
Kernel Protocol‑Stack Loss
Ethernet Link‑Layer Loss
ARP Ignoring
Configure <code>arp_ignore</code> to control which ARP requests are answered.
<code>$ sysctl -a | grep arp_ignore</code>
Solution: Set the appropriate value for the environment.
ARP Filter
In multi‑NIC setups, <code>arp_filter</code> prevents the wrong NIC from answering ARP requests.
<code>$ sysctl -a | grep arp_filter</code>
Solution: Enable or disable based on topology.
ARP Table Overflow
<code>$ cat /proc/net/stat/arp_cache</code>
<code>$ dmesg | grep neighbour</code>
Solution: Increase ARP cache limits.
<code>$ sysctl -w net.ipv4.neigh.default.gc_thresh1=1024</code>
<code>$ sysctl -w net.ipv4.neigh.default.gc_thresh2=2048</code>
<code>$ sysctl -w net.ipv4.neigh.default.gc_thresh3=4096</code>
Network IP‑Layer Loss
Interface IP Misconfiguration
Verify that the loopback and other interfaces have correct IPs:
<code>$ ip addr show</code>
Solution: Add any missing address, for example:
<code>$ ip a add 1.1.1.1 dev eth0</code>
Routing Loss
Check routing tables and policy routing.
<code>$ ip r get 8.8.8.8</code>
<code>$ netstat -s | grep "dropped because of missing route"</code>
Solution: Correct the routing entries.
Reverse‑Path Filtering
<code>$ cat /proc/sys/net/ipv4/conf/eth0/rp_filter</code>
Solution: Set it to 0 (off) or 2 (loose mode) depending on the environment.
<code>$ sysctl -w net.ipv4.conf.all.rp_filter=2</code>
Firewall Drop
<code>$ iptables -nvL | grep DROP</code>
Solution: Adjust firewall rules.
Connection‑Tracking Table Overflow
<code>$ cat /proc/sys/net/netfilter/nf_conntrack_max</code>
Solution: Increase the limit or reduce timeout values.
<code>$ sysctl -w net.netfilter.nf_conntrack_max=3276800</code>
<code>$ sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=1200</code>
Transport‑Layer UDP/TCP Loss
TCP Connection‑Tracking Security Check
<code>$ sysctl -a | grep nf_conntrack_tcp_be_liberal</code>
Solution: Set <code>nf_conntrack_tcp_be_liberal=1</code> if strict TCP window tracking is dropping legitimate out‑of‑window packets.
Fragment Reassembly Loss
<code>$ netstat -s | grep "fragments dropped after timeout"</code>
Solution: Adjust <code>net.ipv4.ipfrag_time</code> and the related reassembly thresholds.
<code>$ sysctl -w net.ipv4.ipfrag_time=60</code>
TCP Congestion Control
BBR may cause latency spikes in deep‑queue scenarios; consider switching to Cubic or tuning BBR parameters.
<code>$ sysctl -w net.ipv4.tcp_congestion_control=cubic</code>
UDP‑Layer Loss
Check UDP buffer statistics and increase <code>net.ipv4.udp_mem</code>, <code>udp_rmem_min</code>, and <code>udp_wmem_min</code> as needed.
<code>$ sysctl -w net.ipv4.udp_mem="65536 131072 262144"</code>
Remember that UDP is unreliable; design applications accordingly or add reliability at the application layer.
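Per‑socket receive drops can also be read from the kernel directly: the last column (drops) of each row in /proc/net/udp counts packets discarded because that socket's receive buffer was full, and ss shows the queue depths:

```shell
# The last column of each row is the per-socket drop counter.
$ cat /proc/net/udp

# A persistently large Recv-Q here usually precedes those drops.
$ ss -u -a -m
```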
Application‑Layer Socket Loss
Inspect socket receive errors:
<code>$ netstat -s | grep "packet receive errors"</code>
Solution: Adjust socket buffer sizes:
<code>$ sysctl -w net.core.rmem_default=31457280</code>
<code>$ sysctl -w net.core.rmem_max=67108864</code>
For send‑buffer errors, increase <code>net.core.wmem_default</code> and <code>net.core.wmem_max</code>.
<code>$ sysctl -w net.core.wmem_default=31457280</code>
<code>$ sysctl -w net.core.wmem_max=33554432</code>
Calculate appropriate buffer sizes using the bandwidth‑delay product (BDP = bandwidth × RTT).
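For example, a 1 Gbit/s link with a 20 ms RTT needs roughly the following number of bytes in flight to keep the pipe full (illustrative figures, integer shell arithmetic):

```shell
# BDP in bytes = (bandwidth in bit/s / 8) * (RTT in seconds)
bw_bits=1000000000   # 1 Gbit/s
rtt_ms=20            # 20 ms round-trip time
bdp_bytes=$((bw_bits / 8 * rtt_ms / 1000))
echo "$bdp_bytes"    # prints 2500000, i.e. ~2.5 MB
```

On such a link, rmem_max should therefore be at least a few megabytes for a single TCP stream to saturate the path.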
Related Tools
dropwatch : Monitors kernel drop events and prints the call stack where packets are discarded.
tcpdump : Captures network traffic for detailed analysis.
Use <code>wireshark</code> or <code>tshark</code> for GUI or command‑line packet inspection.
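When the counters alone don't reveal where packets die, the kernel's skb:kfree_skb tracepoint fires whenever a packet is freed as a drop; a sketch of capturing those call stacks with perf (requires root):

```shell
# Record drop events system-wide with call graphs for 10 seconds, then inspect.
$ perf record -g -a -e skb:kfree_skb sleep 10
$ perf report
```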
Conclusion
This article covers most common packet‑loss points and provides specific diagnosis steps and solutions for each layer. While modern cloud networks involve complex underlay and overlay topologies, mastering these fundamentals enables systematic, layer‑by‑layer troubleshooting to quickly locate and resolve packet‑loss incidents.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.