Performance Degradation After Containerization: Analysis and Optimization Strategies
This article investigates why applications experience higher latency and lower QPS after moving from virtual machines (VMs) to Kubernetes containers, traces the regression to soft‑interrupt (softirq) overhead introduced by Calico's IPIP overlay network, and proposes underlay networking (ipvlan) and the Cilium CNI as optimizations.
1. Background: As more companies adopt cloud‑native microservice architectures and move workloads from VMs to containers orchestrated by Kubernetes, a consistent performance regression was observed after containerization.
2. Benchmark Results: Using the wrk tool, the VM deployment achieved an average latency of 1.68 ms at 716 QPS, while the containerized deployment showed 2.11 ms latency and 554 QPS — roughly 26 % higher latency and a 23 % drop in QPS.
3. Root‑Cause Analysis: The degradation is mainly caused by extra soft‑interrupt (softirq) processing introduced by Calico's IPIP overlay network and by the veth‑pair hop between the pod and host network namespaces. perf profiling showed increased softirq CPU usage, and the kernel code path (veth_xmit → veth_forward_skb → netif_rx → ____napi_schedule → __raise_softirq_irqoff) confirms the extra deferred‑interrupt work.
static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) {
	...
	/* hand the skb to the peer device of the veth pair */
	if (likely(veth_forward_skb(rcv, skb, rq, rcv_xdp) == NET_RX_SUCCESS)) {
	...
}
static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb,
			    struct veth_rq *rq, bool xdp) {
	return __dev_forward_skb(dev, skb) ?: xdp ?
		veth_xdp_rx(rq, skb) :
		netif_rx(skb); // queues the skb and raises NET_RX_SOFTIRQ
}
/* Called with irq disabled */
static inline void ____napi_schedule(struct softnet_data *sd,
struct napi_struct *napi) {
list_add_tail(&napi->poll_list, &sd->poll_list);
__raise_softirq_irqoff(NET_RX_SOFTIRQ); // trigger softirq
}
4. Optimization Strategies: Switching from the Calico IPIP overlay to an underlay solution such as ipvlan (L2 or L3 mode) eliminates the veth‑pair overhead. Additionally, using Cilium, a high‑performance eBPF‑based CNI, reduces iptables processing and further improves QPS and CPU utilization.
5. Conclusion: Containerization brings many benefits but also adds network complexity that can hurt performance. Selecting appropriate CNI plugins (ipvlan, Cilium) and understanding kernel networking paths are essential for performance‑critical workloads in cloud‑native environments.