
Root Cause Analysis and Optimization of Network Packet Loss in High‑Traffic Redis Services

The article investigates why massive Redis deployments experience network packet loss despite using 10 Gbps NICs, explains how Linux kernel counters such as net.if.in.dropped are derived from /proc/net/dev, walks through the driver‑to‑kernel processing path, and proposes CPU‑affinity, interrupt‑affinity and NUMA‑aware tuning to eliminate the drops.

Qunar Tech Salon

Background: As Meituan‑Dianping’s Redis traffic grew from hundreds of billions to trillions of requests per day, the team observed increasing packet loss, even after upgrading from 1 Gbps to 10 Gbps NICs.

Locating the loss: Monitoring the net.if.in.dropped metric revealed that the drops were reported by the kernel, not the hardware. The metric is populated from /proc/net/dev, whose fields are printed by the kernel functions dev_seq_show() and dev_seq_printf_stats() in net/core/net-procfs.c.

static int dev_seq_show(struct seq_file *seq, void *v) {
    if (v == SEQ_START_TOKEN)
        seq_puts(seq, "Inter-|   Receive ... |  Transmit\n");
    else
        dev_seq_printf_stats(seq, v);
    return 0;
}

static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev) {
    struct rtnl_link_stats64 temp;
    const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);

    seq_printf(seq, "%6s: %7llu %7llu ... %9llu\n",
               dev->name, stats->rx_bytes, stats->rx_packets,
               stats->rx_errors, stats->rx_dropped + stats->rx_missed_errors,
               ...);
}

The rx_dropped field counts packets dropped because the kernel’s receive buffers ran out of space, while rx_missed_errors indicates FIFO overrun in the NIC driver. Different NICs expose these counters differently (e.g., Intel ixgbe vs. Broadcom bnx2).

Packet‑receive flow: The NIC writes incoming packets into DMA‑mapped sk_buff buffers via the Rx ring. When the driver cannot replenish ring buffers fast enough, the NIC’s internal FIFO overflows and rx_fifo_errors is incremented. After DMA, the NIC raises a hardware interrupt, the driver’s ISR schedules a NAPI poll, and the packet is processed in soft‑irq context.

Key kernel functions:

static int enqueue_to_backlog(struct sk_buff *skb, int cpu, unsigned int *qtail) {
    struct softnet_data *sd = &per_cpu(softnet_data, cpu);
    unsigned long flags;

    local_irq_save(flags);
    rps_lock(sd);
    if (skb_queue_len(&sd->input_pkt_queue) <= netdev_max_backlog) {
        if (skb_queue_len(&sd->input_pkt_queue)) {
enqueue:
            __skb_queue_tail(&sd->input_pkt_queue, skb);
            input_queue_tail_incr_save(sd, qtail);
            rps_unlock(sd);
            local_irq_restore(flags);
            return NET_RX_SUCCESS;
        }

        /* Queue was empty: schedule the backlog NAPI, then enqueue. */
        if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
            ____napi_schedule(sd, &sd->backlog);
        goto enqueue;
    }

    sd->dropped++;
    rps_unlock(sd);
    local_irq_restore(flags);

    atomic_long_inc(&skb->dev->rx_dropped);
    kfree_skb(skb);
    return NET_RX_DROP;
}

If the per‑CPU backlog exceeds net.core.netdev_max_backlog (default 1000), packets are dropped, which is what the team observed on their Docker virtual NICs.

Interrupt handling: The driver registers MSI‑X vectors (up to 64 for the X540). Each vector’s ISR either calls ixgbe_msix_clean_rings(), which schedules NAPI, or falls back to the legacy ixgbe_intr(). NAPI then invokes ixgbe_poll(), which processes at most its NAPI weight (typically 64 packets) per call; the soft‑irq loop as a whole is bounded by net.core.netdev_budget, so no single soft‑irq run monopolizes the CPU.

Optimization strategies:

Increase net.core.netdev_max_backlog – helps only for minor spikes.

Bind NIC interrupt vectors to a set of CPU cores via /proc/irq/<n>/smp_affinity so that multiple cores share the load.

Assign Redis processes to the remaining cores with taskset to avoid contention between interrupt handling and Redis request processing.

Prefer keeping all interrupt handling within a single NUMA node; this reduces cross‑node memory accesses for both the kernel’s softnet structures and Redis’s socket buffers.

After moving the first eight interrupt vectors to the first NUMA node and pinning Redis to the second node, packet loss disappeared and Redis slow‑query counts dropped dramatically, confirming the effectiveness of CPU‑affinity and NUMA‑aware tuning.

Tags: performance, network, Linux kernel, NUMA, CPU affinity, packet loss
Written by Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
