
Troubleshooting Common Kubernetes Networking Issues: Cross-VPC NodePort Timeouts, LB Pressure Test CPS Low, DNS Delays, and More

This guide walks through eight frequent Kubernetes networking problems on Tencent Kubernetes Engine (TKE), including cross-VPC NodePort timeouts, low load-balancer CPS, DNS resolution delays, apiserver access lag, a misconfigured resolv.conf, liveness-probe failures, and timeouts with externalTrafficPolicy=Local. For each it explains the root cause and gives concrete kernel, iptables, DNS, or configuration fixes.

Tencent Cloud Developer

Each issue below was encountered in TKE (Tencent Kubernetes Engine) environments and is summarized with its root cause and fix.

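Issue 01 – Cross-VPC NodePort timeout: ip-masq-agent SNATs traffic from many Pods to a single node IP, and net.ipv4.tcp_tw_recycle=1 on the receiving side makes the kernel reject SYNs whose TCP timestamps appear to go backwards for that source IP, which is what happens behind NAT; solution: disable tcp_tw_recycle (set it to 0).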
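
A quick way to confirm whether a node is affected is to read the sysctl from procfs. The following is a minimal sketch, not taken from the article: the helper name tw_recycle_enabled is hypothetical, and it assumes the standard procfs path. Linux 4.12 removed tcp_tw_recycle entirely, so a missing file is reported as None rather than treated as an error.

```python
from pathlib import Path
from typing import Optional

# Standard procfs location of net.ipv4.tcp_tw_recycle on kernels that have it.
TW_RECYCLE = Path("/proc/sys/net/ipv4/tcp_tw_recycle")


def tw_recycle_enabled(path: Path = TW_RECYCLE) -> Optional[bool]:
    """Return True/False for the sysctl, or None if the kernel does not expose it."""
    try:
        return path.read_text().strip() != "0"
    except FileNotFoundError:
        # Kernels >= 4.12 dropped the option, so there is nothing to disable.
        return None


state = tw_recycle_enabled()
if state is None:
    print("tcp_tw_recycle not present on this kernel")
else:
    print("tcp_tw_recycle enabled:", state)
```

If the result is True on a node that receives NATed traffic, setting the sysctl to 0 is the fix described above.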

Issue 02 – Low CPS (connections per second) in LB pressure tests on bare-metal TKE: conntrack insert failures caused by source-port collisions when traffic is SNATed twice (at the LB and at the node); solution: use iptables MASQUERADE --random-fully, or bind the LB directly to Pod IPs to avoid SNAT.
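
Insert failures of this kind show up in the per-CPU counters printed by `conntrack -S`. As a rough sketch (the function name and the sample text are illustrative, not from the article), the counter can be summed across CPUs like this:

```python
import re


def insert_failed_total(conntrack_stats: str) -> int:
    """Sum the insert_failed counter over all per-CPU lines of `conntrack -S` output."""
    return sum(int(n) for n in re.findall(r"\binsert_failed=(\d+)", conntrack_stats))


# Illustrative sample in the key=value layout that `conntrack -S` prints per CPU.
sample = (
    "cpu=0 found=0 invalid=3 insert=0 insert_failed=12 drop=12 early_drop=0 error=0\n"
    "cpu=1 found=0 invalid=1 insert=0 insert_failed=30 drop=30 early_drop=0 error=0\n"
)
print(insert_failed_total(sample))
```

Taking the total before and after a pressure test and comparing the two values indicates whether source-port collisions are a likely cause of the low CPS.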

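Issue 03 – Occasional 5s DNS resolution delay: the resolver sends the A and AAAA queries concurrently from the same socket, and the two packets race in conntrack during DNAT, so one is dropped and only retried after the resolver's 5s timeout; solutions: force DNS over TCP with the resolv.conf option use-vc, avoid the concurrent queries with single-request-reopen or single-request, or use a local DNS cache.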
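
The resolv.conf options above all belong on an `options` line. A small sketch of merging one into existing resolv.conf text follows; the helper name with_resolver_options and the sample content are assumptions made for illustration, and how the resulting file reaches the Pod (image build, entrypoint script, Pod DNS config) is left open.

```python
def with_resolver_options(resolv_conf: str, *options: str) -> str:
    """Return resolv.conf text with `options` merged into a single options line."""
    kept, current = [], []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if fields and fields[0] == "options":
            current.extend(fields[1:])  # collect options that are already set
        else:
            kept.append(line)
    # De-duplicate while preserving order, so the call is idempotent.
    merged = list(dict.fromkeys(current + list(options)))
    if merged:
        kept.append("options " + " ".join(merged))
    return "\n".join(kept) + "\n"


# Illustrative input resembling a Pod's resolv.conf.
before = (
    "nameserver 10.0.0.10\n"
    "search default.svc.cluster.local svc.cluster.local\n"
    "options ndots:5\n"
)
print(with_resolver_options(before, "single-request-reopen"), end="")
```

These options are read by the glibc resolver, so they only help in images whose libc honours them.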

Issue 04 – Delay when a Pod accesses another cluster's apiserver: the Alpine-based (musl libc) image has no /etc/nsswitch.conf, so Go's resolver does not get a hosts-before-DNS lookup order and queries DNS ahead of /etc/hosts, and those queries are then exposed to conntrack packet loss; fix: change the base image or mount an nsswitch.conf.
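
Whether a given nsswitch.conf puts /etc/hosts ahead of DNS can be checked mechanically. The sketch below is illustrative only (both function names are hypothetical); it parses the `hosts:` line and ignores bracketed action items such as `[NOTFOUND=return]`.

```python
import re
from typing import List, Optional


def hosts_sources(nsswitch_conf: str) -> Optional[List[str]]:
    """Return the lookup sources on the hosts: line, or None if there is no such line."""
    for raw in nsswitch_conf.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            rest = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])  # drop [action] items
            return rest.split()
    return None


def files_before_dns(nsswitch_conf: str) -> bool:
    """True when 'files' (/etc/hosts) is consulted before 'dns'."""
    src = hosts_sources(nsswitch_conf) or []
    if "files" not in src:
        return False
    return "dns" not in src or src.index("files") < src.index("dns")


print(files_before_dns("hosts: files dns\n"))
```

A file containing the single line `hosts: files dns` is the kind of nsswitch.conf the fix refers to mounting.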

Issue 05 – Abnormal DNS resolution: the TKE Ubuntu image adds stale entries to resolv.conf, so the upstream DNS configuration is wrong; fix: clean /etc/resolvconf/resolv.conf.d/base or rebuild the image.
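
Stale entries are easier to spot when the nameserver lines are compared against the servers that are expected for the environment. This is a hedged sketch: the function name, the sample text, and the addresses (taken from documentation ranges) are all made up for illustration.

```python
from typing import List, Set


def unexpected_nameservers(resolv_text: str, expected: Set[str]) -> List[str]:
    """List nameserver addresses in resolv.conf-style text that are not in `expected`."""
    found = []
    for line in resolv_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver" and fields[1] not in expected:
            found.append(fields[1])
    return found


# Illustrative content such as might be left in /etc/resolvconf/resolv.conf.d/base.
base = "nameserver 192.0.2.53\nnameserver 198.51.100.7\n"
print(unexpected_nameservers(base, expected={"192.0.2.53"}))
```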

Issue 06 – Intermittent liveness probe failures: net.ipv4.tcp_max_syn_backlog is too small, so the SYN queue overflows under high concurrency and connection attempts are dropped; solution: increase the backlog (e.g., to 8096).
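
Dropped connection attempts on listening sockets are counted in the TcpExt section of /proc/net/netstat, which stores a header line of counter names followed by a line of values. The parser below is a minimal sketch with a hypothetical function name and made-up sample numbers; ListenDrops and ListenOverflows are the counters that `netstat -s` reports as dropped SYNs to LISTEN sockets and listen queue overflows.

```python
from typing import Dict


def tcpext_counters(netstat_text: str) -> Dict[str, int]:
    """Parse the TcpExt name/value line pair from /proc/net/netstat content."""
    rows = [line.split() for line in netstat_text.splitlines() if line.startswith("TcpExt:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, map(int, values)))


# Illustrative excerpt; the real file has many more columns.
sample_netstat = (
    "TcpExt: SyncookiesSent ListenOverflows ListenDrops\n"
    "TcpExt: 0 7 19\n"
    "IpExt: InNoRoutes\n"
    "IpExt: 0\n"
)
counters = tcpext_counters(sample_netstat)
print(counters.get("ListenOverflows"), counters.get("ListenDrops"))
```

Counters that keep growing while probes fail are consistent with the queue overflow described above; on a real node the input would be the contents of /proc/net/netstat.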

Issue 07 – LB timeout for Services with externalTrafficPolicy=Local: kube-proxy binds the LB's EXTERNAL-IP to the dummy kube-ipvs0 interface, which places the IP in the local routing table and makes the kernel drop health-check packets; temporary fix: revert the offending PR or use the --exclude-external-ip flag; long-term: wait for the community fix.
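
To confirm that an EXTERNAL-IP has been bound to the dummy interface, the output of `ip -o -4 addr show dev kube-ipvs0` can be searched for it. The helpers and sample output below are illustrative assumptions (the addresses come from documentation ranges), not part of the article.

```python
import re
from typing import Set


def ipv4_addrs(ip_addr_output: str) -> Set[str]:
    """Extract IPv4 addresses from `ip -o -4 addr show` output."""
    return set(re.findall(r"\binet\s+(\d+\.\d+\.\d+\.\d+)/\d+", ip_addr_output))


def bound_to_dummy(external_ip: str, ip_addr_output: str) -> bool:
    """True when external_ip appears among the addresses on the interface."""
    return external_ip in ipv4_addrs(ip_addr_output)


# Illustrative one-line-per-address output for the kube-ipvs0 interface.
sample_ip_output = (
    "5: kube-ipvs0    inet 10.96.0.1/32 scope global kube-ipvs0\\       valid_lft forever preferred_lft forever\n"
    "5: kube-ipvs0    inet 203.0.113.10/32 scope global kube-ipvs0\\       valid_lft forever preferred_lft forever\n"
)
print(bound_to_dummy("203.0.113.10", sample_ip_output))
```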

Conclusion: summarizes the debugging journey and encourages readers to share experiences.
