
Mastering Network Packet Loss: Diagnosis and Solutions for Linux Servers

This guide explains the fundamentals of network packet loss, illustrates how packets are sent and received, and provides step‑by‑step troubleshooting methods for hardware NIC, driver, kernel stack, TCP/UDP, and application‑level issues on Linux systems, complete with command examples.


Introduction

This article addresses a common network problem: packet loss. When a ping to a server succeeds and every reply comes back intact, communication is healthy; if pings time out or replies are incomplete, packets are likely being dropped somewhere along the path. The following sections present typical packet‑loss diagnosis methods.

What Is Packet Loss?

Data on the Internet is transmitted in discrete packets. Faulty network devices, poor link quality, congestion, and other factors can cause packets to be lost in transit, so the receiver gets less data than was sent: this is packet loss.

Packet Transmission Principles

Sending Packets:

Application data is encapsulated with a TCP header at the TCP layer, forming a transmittable packet.

An IP header is added at the IP layer, forming an IP packet.

The NIC driver adds a 14‑byte MAC header, creating a frame that contains source and destination MAC addresses.

The driver copies the frame into the NIC buffer for the NIC to process.

The NIC prepends synchronization information (the preamble) and appends a CRC checksum, then transmits the frame on the wire.

Receiving Packets:

The NIC receives the packet, checks CRC, and discards frames with invalid CRC. It also verifies that the destination MAC matches the NIC’s MAC; otherwise the frame is dropped.

The NIC copies the frame into its ring buffer.

The NIC driver notifies the kernel, which processes the packet through the TCP/IP stack.

The application reads the data from the socket buffer.
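The per‑layer overhead described above can be tallied for a given payload. A minimal sketch, assuming option‑free TCP and IPv4 headers (20 bytes each, their minimum sizes):

```shell
# Tally the per-layer overhead for a given payload size.
# Header sizes assume no TCP/IP options; the FCS is the 4-byte CRC trailer.
payload=1000
tcp_hdr=20   # minimum TCP header
ip_hdr=20    # minimum IPv4 header
mac_hdr=14   # Ethernet header (dst MAC + src MAC + EtherType)
fcs=4        # CRC appended by the NIC
frame=$((payload + tcp_hdr + ip_hdr + mac_hdr + fcs))
echo "on-wire frame size: ${frame} bytes"   # prints 1058
```

TCP or IP options, VLAN tags, and tunnel encapsulations all add further bytes on top of this baseline.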

Core Idea

Understanding the send/receive process reveals that packet loss mainly originates from three layers: NIC hardware, NIC driver, and the kernel protocol stack. The troubleshooting approach follows a bottom‑up layer analysis, checking key information at each layer to pinpoint the cause.

Packet‑Loss Scenarios Overview

Hardware NIC loss

NIC driver loss

Ethernet link‑layer loss

Network IP‑layer loss

Transport‑layer UDP/TCP loss

Application‑layer socket loss

Hardware NIC Loss

Ring Buffer Overflow

When incoming packets arrive faster than the kernel can consume them, the NIC’s ring buffer fills and new packets are dropped.

Check:

<code>$ ethtool -S eth0 | grep rx_fifo</code>
<code>$ cat /proc/net/dev</code>

To view the NIC’s ring buffer size:

<code>$ ethtool -g eth0</code>

Solution: Increase the NIC’s receive and transmit hardware buffers.

<code>$ ethtool -G eth0 rx 4096 tx 4096</code>
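Before raising the rings, it helps to compare the current setting with the hardware maximum. A small sketch that parses `ethtool -g` output; the sample output is inlined here so it runs anywhere, but in practice you would pipe in the real command:

```shell
# Extract "Pre-set maximum" and "Current" RX ring sizes from `ethtool -g`
# output. Replace the here-doc with:  ethtool -g eth0
ring_info=$(cat <<'EOF'
Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             512
EOF
)
# First "RX:" line is the maximum, second is the current setting.
rx_max=$(echo "$ring_info" | awk '/^RX:/ {print $2; exit}')
rx_cur=$(echo "$ring_info" | awk '/^RX:/ {n++; if (n == 2) {print $2; exit}}')
echo "RX ring: ${rx_cur}/${rx_max}"   # prints RX ring: 512/4096
```

If the current value is already at the pre‑set maximum, the only remedies left are faster draining (IRQ tuning, more RSS queues) or a NIC with deeper buffers.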

Port Negotiation Loss

Check NIC statistics and configuration:

<code>$ ethtool -S eth1</code>
<code>$ ethtool eth1</code>

Solution: Re‑negotiate the link or force a specific speed.

<code>$ ethtool -r eth1</code>
<code>$ ethtool -s eth1 speed 1000 duplex full autoneg off</code>

Flow‑Control Loss

Check flow‑control counters:

<code>$ ethtool -S eth1 | grep control</code>

Solution: Disable NIC flow control.

<code>$ ethtool -A ethx autoneg off</code>
<code>$ ethtool -A ethx tx off</code>
<code>$ ethtool -A ethx rx off</code>

MAC‑Address Mismatch Loss

If the NIC operates in non‑promiscuous mode, it only accepts frames addressed to its own MAC. A stale ARP entry or changed NIC can cause drops.

Check with tcpdump in promiscuous mode or examine ARP tables on both ends.

Solution: Refresh the ARP table or set a correct static ARP entry.

Other NIC Anomalies

Check NIC firmware version for bugs:

<code>$ ethtool -i eth1</code>

Check cable integrity if CRC errors increase:

<code>$ ethtool -S eth0</code>

Solution: Reseat or replace the cable.

Packet Length Loss

Ethernet frames must be 64‑1518 bytes. Oversized or undersized frames may be dropped.

<code>$ ethtool -S eth1 | grep length_errors</code>

Solution: Adjust MTU or enable jumbo frames, and ensure proper segmentation on the sender.
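The 64–1518 byte rule above can be expressed as a quick classifier, useful when eyeballing capture statistics (the limits assume untagged, non‑jumbo Ethernet):

```shell
# Classify an Ethernet frame length against the standard 64-1518 byte
# bounds (no VLAN tag, no jumbo frames).
check_frame_len() {
  len=$1
  if [ "$len" -lt 64 ]; then
    echo "runt"
  elif [ "$len" -gt 1518 ]; then
    echo "oversized"
  else
    echo "ok"
  fi
}
check_frame_len 60     # prints runt
check_frame_len 1058   # prints ok
check_frame_len 9000   # prints oversized
```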

NIC Driver Loss

Check driver statistics with <code>ifconfig eth1</code> or <code>ethtool -S eth1</code>.

RX Errors : Total receive errors, including FIFO overruns and CRC failures.

RX Dropped : Packets entered the ring buffer but were dropped due to insufficient memory.

RX Overruns : FIFO overruns; the ring buffer was not drained fast enough (for example, the CPU could not keep up with interrupts), causing packet loss.

RX Frame Errors : Misaligned frames or other hardware issues.

Driver Queue Overflow

Linux uses <code>netdev_max_backlog</code> as the per‑CPU backlog queue limit for packets awaiting protocol‑stack processing. When the queue exceeds this limit, packets are dropped.

<code>$ cat /proc/net/softnet_stat</code>

Solution: Increase <code>net.core.netdev_max_backlog</code>.

<code>$ sysctl -w net.core.netdev_max_backlog=2000</code>
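The second hexadecimal column of each <code>/proc/net/softnet_stat</code> row is that CPU's backlog‑drop counter. A runnable sketch that sums it, with two sample rows inlined (in practice, feed it the real file):

```shell
# Sum the per-CPU "dropped" counters (second hex column) of
# /proc/net/softnet_stat. Replace the here-doc with the real file
# in production:  ... done < /proc/net/softnet_stat
dropped=0
while read -r _ d _; do
  dropped=$((dropped + 16#$d))   # bash base-16 conversion
done <<'EOF'
0000272d 00000003 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000034d9 00000000 00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
EOF
echo "total dropped: ${dropped}"   # prints total dropped: 3
```

A non‑zero, growing total is the signal that <code>netdev_max_backlog</code> is being exceeded.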

Single‑CPU High Load

High soft‑interrupt usage on one CPU can starve packet processing.

<code>$ mpstat -P ALL 1</code>

Solution: Balance IRQs across CPUs, adjust RSS queues, or enable IRQ affinity.

<code>$ ethtool -x ethx</code>
<code>$ ethtool -X ethx equal 8</code>
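IRQ affinity is set by writing a CPU bitmask to <code>/proc/irq/&lt;N&gt;/smp_affinity</code>. A small helper (illustrative, not part of any standard tool) that builds the mask from a CPU list:

```shell
# Build the hex CPU bitmask expected by /proc/irq/<N>/smp_affinity.
# Each CPU number sets one bit: cpu_mask 0 2 3 -> bits 0,2,3 -> 0xd
cpu_mask() {
  mask=0
  for cpu in "$@"; do
    mask=$((mask | (1 << cpu)))
  done
  printf '%x\n' "$mask"
}
cpu_mask 0 2 3   # prints d
```

The result would then be written to the IRQ of the relevant NIC queue, e.g. <code>echo d &gt; /proc/irq/&lt;N&gt;/smp_affinity</code> (the IRQ number here is a placeholder; find the real one in <code>/proc/interrupts</code>).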

Also consider tuning interrupt coalescing; adaptive coalescing lets the driver trade latency against CPU load automatically:

<code>$ ethtool -C ethx adaptive-rx on</code>

Kernel Protocol‑Stack Loss

Ethernet Link‑Layer Loss

ARP Ignoring

Configure <code>arp_ignore</code> to control which ARP requests are answered.

<code>$ sysctl -a | grep arp_ignore</code>

Solution: Set the appropriate value for the environment.

ARP Filter

In multi‑NIC setups, <code>arp_filter</code> prevents the wrong NIC from answering ARP requests.

<code>$ sysctl -a | grep arp_filter</code>

Solution: Enable or disable based on topology.

ARP Table Overflow

<code>$ cat /proc/net/stat/arp_cache</code>
<code>$ dmesg | grep neighbour</code>

Solution: Increase ARP cache limits.

<code>$ sysctl -w net.ipv4.neigh.default.gc_thresh1=1024</code>
<code>$ sysctl -w net.ipv4.neigh.default.gc_thresh2=2048</code>
<code>$ sysctl -w net.ipv4.neigh.default.gc_thresh3=4096</code>

Network IP‑Layer Loss

Interface IP Misconfiguration

Verify that the loopback and other interfaces carry the expected IP addresses with <code>ip addr show</code>, and assign a missing address if needed:

<code>$ ip a add 1.1.1.1 dev eth0</code>

Routing Loss

Check routing tables and policy routing.

<code>$ ip r get 8.8.8.8</code>
<code>$ netstat -s | grep "dropped because of missing route"</code>

Solution: Correct routing entries.

Reverse‑Path Filtering

<code>$ cat /proc/sys/net/ipv4/conf/eth0/rp_filter</code>

Set to 0 or 2 depending on the environment.

<code>$ sysctl -w net.ipv4.conf.all.rp_filter=2</code>

Firewall Drop

<code>$ iptables -nvL | grep DROP</code>

Solution: Adjust firewall rules.

Connection‑Tracking Table Overflow

<code>$ cat /proc/sys/net/netfilter/nf_conntrack_max</code>

Increase limits or reduce timeout values.

<code>$ sysctl -w net.netfilter.nf_conntrack_max=3276800</code>
<code>$ sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=1200</code>
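A useful early warning is the table's fill percentage. Sample values are hard‑coded below so the sketch runs anywhere; in practice read them from <code>/proc/sys/net/netfilter/nf_conntrack_count</code> and <code>nf_conntrack_max</code>:

```shell
# Percentage of the conntrack table in use. In production:
#   count=$(cat /proc/sys/net/netfilter/nf_conntrack_count)
#   max=$(cat /proc/sys/net/netfilter/nf_conntrack_max)
count=241500
max=262144
pct=$((count * 100 / max))
echo "conntrack usage: ${pct}%"   # prints conntrack usage: 92%
```

Alerting somewhere around 80–90% leaves time to raise <code>nf_conntrack_max</code> before new connections start being dropped.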

Transport‑Layer UDP/TCP Loss

TCP Connection‑Tracking Security Check

<code>$ sysctl -a | grep nf_conntrack_tcp_be_liberal</code>

Toggle the setting if it causes drops.

Fragment Reassembly Loss

<code>$ netstat -s | grep "fragments dropped after timeout"</code>

Adjust <code>net.ipv4.ipfrag_time</code> and related thresholds.

<code>$ sysctl -w net.ipv4.ipfrag_time=60</code>

TCP Congestion Control

BBR may cause latency spikes in deep‑queue scenarios; consider switching to Cubic or tuning BBR parameters.

<code>$ sysctl -w net.ipv4.tcp_congestion_control=cubic</code>

UDP‑Layer Loss

Check UDP buffer statistics with <code>netstat -su</code> and increase <code>net.ipv4.udp_mem</code>, <code>net.ipv4.udp_rmem_min</code>, and <code>net.ipv4.udp_wmem_min</code> as needed.

<code>$ sysctl -w net.ipv4.udp_mem="65536 131072 262144"</code>
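Note that <code>net.ipv4.udp_mem</code> is expressed in pages, not bytes. Converting the "max" threshold above (262144 pages) to bytes, assuming the common 4 KiB page size:

```shell
# udp_mem thresholds are counted in pages; convert to bytes.
page_size=4096    # verify on the host with: getconf PAGESIZE
max_pages=262144
max_bytes=$((max_pages * page_size))
echo "udp_mem max: ${max_bytes} bytes"   # prints 1073741824 (1 GiB)
```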

Remember that UDP is unreliable; design applications accordingly or add reliability at the application layer.

Application‑Layer Socket Loss

Inspect socket receive errors:

<code>$ netstat -s | grep "packet receive errors"</code>

Adjust socket buffer sizes:

<code>$ sysctl -w net.core.rmem_default=31457280</code>
<code>$ sysctl -w net.core.rmem_max=67108864</code>

For send‑buffer errors, increase <code>net.core.wmem_default</code> and <code>net.core.wmem_max</code>.

<code>$ sysctl -w net.core.wmem_default=31457280</code>
<code>$ sysctl -w net.core.wmem_max=33554432</code>

Calculate appropriate buffer sizes using the bandwidth‑delay product (BDP = bandwidth × RTT).
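The BDP calculation is straightforward integer arithmetic. For example, a 1 Gbit/s link with a 50 ms round‑trip time:

```shell
# Bandwidth-delay product: the buffer size needed to keep a link full.
# BDP = bandwidth (bytes/s) * RTT (s)
bandwidth_bps=1000000000   # 1 Gbit/s, in bits per second
rtt_ms=50
bdp_bytes=$((bandwidth_bps / 8 * rtt_ms / 1000))
echo "BDP: ${bdp_bytes} bytes"   # prints BDP: 6250000 bytes (~6 MiB)
```

A socket buffer limit (e.g. <code>net.core.rmem_max</code>) at or above the BDP lets a single TCP flow saturate the link.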

Related Tools

dropwatch : Monitors kernel drop events and prints the call stack where packets are discarded.

tcpdump : Captures network traffic for detailed analysis.

Use <code>wireshark</code> or <code>tshark</code> for GUI or command‑line packet inspection.

Conclusion

This article covers most common packet‑loss points and provides specific diagnosis steps and solutions for each layer. While modern cloud networks involve complex underlay and overlay topologies, mastering these fundamentals enables systematic, layer‑by‑layer troubleshooting to quickly locate and resolve packet‑loss incidents.

Tags: Network, TCP, Linux, troubleshooting, UDP, ethtool, packet loss
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
