Optimizing Network Latency in TencentOS: Kernel SoftIRQ and ksoftirqd Tuning
This article details a systematic, multi‑stage investigation of network latency in the TencentOS kernel. It uncovers soft‑interrupt bottlenecks in the receive path, describes patches to net_rx_action(), RPS handling, and ksoftirqd scheduling that were submitted upstream, and reports a reduction in handshake failures of over 80%, along with practical insights for Linux kernel developers.
When Tencent's core services migrated to the cloud, the TencentOS kernel experienced a sharp rise in network connection failures, with peak failure counts reaching 59 per monitoring interval. The business required sub‑50 ms handshake latency, but many connections exceeded this threshold, prompting a deep dive into the kernel networking stack.
The investigation identified several potential delay sources: delayed ACK (≈40 ms), RTO retransmission (≈200 ms), and SYN retransmission back‑off (up to several seconds). The team focused on the packet reception path from the host machine to the guest VM, examining queueing, interrupt handling, and soft‑IRQ processing.
Initial root‑cause analysis revealed that the soft‑IRQ entry point net_rx_action() was exiting early when its packet budget or time limit was exhausted, causing the remaining packets to be deferred to the next soft‑IRQ cycle. The relevant code (simplified):
net_rx_action()
{
    // First limiting factor: time
    time_limit = jiffies + usecs_to_jiffies(netdev_budget_usecs);
    // Second limiting factor: packet budget
    budget = netdev_budget;
    budget -= napi_poll(n, &repoll);
    …
    if (budget <= 0 || time_after_eq(jiffies, time_limit)) {
        break;
    }
    …
}

Further profiling showed that the guest VM's network driver often failed to drain packets, leaving the host unable to re‑enable interrupts. Consequently, packets waited for the kernel thread ksoftirqd to process them, adding latency.
Upper‑layer mitigation involved unbinding the business processes from specific CPUs, which immediately reduced handshake failure rates to below five per interval.
Subsequent kernel‑level attempts included:
First kernel attempt: adjusting /proc/sys/net/core/netdev_budget and netdev_budget_usecs, and tuning virtio_net.napi_weight. These changes yielded modest improvements but did not match the effect of unbinding.
Second attempt (RPS optimization): modifying the RPS scheduling code to avoid unnecessary soft‑IRQ wake‑ups. Relevant snippet:
static int napi_schedule_rps(struct softnet_data *sd)
{
    struct softnet_data *mysd = this_cpu_ptr(&softnet_data);

#ifdef CONFIG_RPS
    if (sd != mysd) {
        sd->rps_ipi_next = mysd->rps_ipi_list;
        mysd->rps_ipi_list = sd;
        __raise_softirq_irqoff(NET_RX_SOFTIRQ);
        return 1;
    }
#endif
    __napi_schedule_irqoff(&mysd->backlog);
    return 0;
}

This patch reduced NET_RX_SOFTIRQ invocations by ~10% but still left latency issues under load.
Third attempt (eliminating ksoftirqd delegation): introducing a patch that prevents the kernel from fully delegating network soft‑IRQ handling to ksoftirqd when the local CPU is not overloaded. Key code:
static inline void invoke_softirq(void)
{
    if (ksoftirqd_running(local_softirq_pending()))
        return;

    if (!force_irqthreads() || !__this_cpu_read(ksoftirqd)) {
        __do_softirq();
    }
}

After this change was upstreamed, the handshake failure count dropped by about 87.5%, as shown by before/after graphs.
All three patches were submitted to the Linux kernel community, with two being accepted upstream. The author also shared observation tools, links to mailing list discussions, and detailed performance graphs.
In conclusion, the article documents a comprehensive, multi‑stage optimization process—from business‑level CPU binding adjustments to kernel‑level soft‑IRQ redesign—demonstrating how deep kernel knowledge and upstream collaboration can dramatically improve cloud network latency.
At the end of the article, the TencentOS team posted a recruitment notice for senior Linux kernel engineers, inviting interested candidates to contact them via email.
Tencent Architect
We share insights on storage, computing, networking and explore leading industry technologies together.