How We Scaled 10,000+ K8s CronJobs with Serverless and Solved Node Instability
This article describes the challenges encountered when migrating tens of thousands of Kubernetes cronjobs from VMs to a cluster—node instability, low resource utilization, and scheduling delays—and explains how introducing a serverless architecture with virtual nodes, a custom job scheduler, unified logging and monitoring, and sandbox reuse restored stability, improved performance, and reduced resource costs by about 70%.
Background Introduction
During the cloud‑native container migration at Zuoyebang, scheduled tasks originally running on virtual machines were moved to Kubernetes CronJobs. The system performed well with fewer than 1,000 CronJobs, but problems emerged as the scale grew to tens of thousands.
Problem Discovery
Two main issues were identified: (1) node stability within the cluster, and (2) low resource utilization.
Issue 1: Node Stability
Frequent minute‑level tasks caused rapid pod creation and destruction, resulting in hundreds of containers being created and removed per minute on a single node. This led to excessive cgroup entries, especially memory cgroups that were not reclaimed promptly. The kubelet's periodic reads of /sys/fs/cgroup/memory/memory.stat slowed down, increasing CPU time spent in kernel mode and causing noticeable network latency.
Performance profiling (perf record cat /sys/fs/cgroup/memory/memory.stat followed by perf report) showed that most CPU time was spent in memcg_stat_show. In cgroup v1, memcg_stat_show traverses the memory cgroup tree many times per CPU core; with millions of memory cgroup entries, this became disastrous.
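The scaling problem can be made concrete with a back-of-envelope calculation. The counts below are illustrative assumptions, not measured values from the cluster:

```python
# Rough cost model for one memory.stat read under cgroup v1:
# memcg_stat_show walks the memory cgroup tree and folds in per-CPU
# counters, so the work grows with (cgroup entries x CPU cores).

def stat_read_cost(cgroup_entries: int, cpu_cores: int) -> int:
    """Relative units of work for a single memory.stat read."""
    return cgroup_entries * cpu_cores

# A healthy node vs. one accumulating zombie memory cgroups from churn.
healthy = stat_read_cost(cgroup_entries=2_000, cpu_cores=32)
churned = stat_read_cost(cgroup_entries=1_000_000, cpu_cores=32)
print(f"slowdown factor: {churned // healthy}x")  # slowdown factor: 500x
```

The point is not the exact numbers but the shape: every extra zombie cgroup makes every subsequent stat read more expensive, which is why a millisecond-level read degrades to seconds.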
Memory cgroups are not released immediately after container termination because the kernel must walk all cached pages, which can be slow. This delayed reclamation works for typical workloads but fails when a single machine creates and destroys hundreds of containers per minute, leading to tens of thousands of memory cgroup entries and seconds‑long reads of memory.stat. This in turn drove up dockerd load, slowed the kubelet's PLEG (pod lifecycle event generator), and caused nodes to flap to NotReady.
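One quick way to spot this buildup is the num_cgroups column of /proc/cgroups, which counts live (including not-yet-reclaimed) cgroups per controller. A minimal parser, fed an illustrative sample rather than output from a real node:

```python
# Parse /proc/cgroups to gauge memory-cgroup buildup on a node.
# The SAMPLE text below is an invented example of the file's format.

SAMPLE = """\
#subsys_name\thierarchy\tnum_cgroups\tenabled
cpuset\t2\t120\t1
memory\t5\t48213\t1
pids\t9\t130\t1
"""

def num_cgroups(proc_cgroups_text: str, controller: str) -> int:
    """Return the num_cgroups count for the given controller."""
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#"):
            continue
        name, _hierarchy, count, _enabled = line.split("\t")
        if name == controller:
            return int(count)
    raise KeyError(controller)

# On a live node, pass open("/proc/cgroups").read() instead of SAMPLE.
print(num_cgroups(SAMPLE, "memory"))  # 48213
```

A memory count in the tens of thousands on a node that only runs a few hundred pods is the signature of the churn problem described above.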
Issue 2: Resource Utilization
Under the cluster's CNI network mode, a large share of pod capacity must be reserved for cronjob pods, many of which run for only a few seconds and consume minimal resources, leaving substantial capacity idle.
Other Issues: Scheduling Speed and Service Isolation
At peak times (e.g., midnight), thousands of jobs need to start simultaneously. The default Kubernetes scheduler processes pod placement serially, taking minutes to schedule all jobs, which is unacceptable for workloads requiring sub‑second precision. Additionally, CPU‑ or I/O‑intensive pods can interfere with normal services due to incomplete cgroup isolation.
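The serial-scheduling bottleneck is easy to quantify. A sketch with illustrative, assumed numbers (per-decision latency and worker count are not figures from the article):

```python
import math

# If each scheduling decision costs ~20 ms, placing thousands of pods
# one at a time takes minutes, while dispatching with W parallel workers
# divides the wall time by roughly W.

def serial_wall_time_s(pods: int, per_pod_ms: float) -> float:
    return pods * per_pod_ms / 1000

def parallel_wall_time_s(pods: int, per_pod_ms: float, workers: int) -> float:
    return math.ceil(pods / workers) * per_pod_ms / 1000

print(serial_wall_time_s(5000, 20))         # 100.0 (seconds)
print(parallel_wall_time_s(5000, 20, 200))  # 0.5 (seconds)
```

For a midnight burst of thousands of jobs, that difference is what separates minute-level drift from the required timing precision.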
Using Serverless in the K8s Cluster
To achieve stronger isolation, finer‑grained node control, and faster scheduling for cronjob workloads, a serverless solution was adopted. Virtual nodes were introduced, allowing pods to run on serverless nodes with the same security isolation and network connectivity as regular nodes, but without reserved resources and with pay‑per‑use billing.
Job Scheduler
All cronjob workloads now use a custom job scheduler that dispatches pods to serverless nodes in parallel, achieving millisecond‑level scheduling and falling back to regular nodes if serverless resources are insufficient.
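The dispatch-with-fallback idea can be sketched as follows. The NodePool model and place_pod helper are hypothetical stand-ins for illustration, not the real scheduler's API:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class NodePool:
    """A capacity-limited pool of nodes (serverless or regular)."""
    def __init__(self, name: str, capacity: int):
        self.name, self.capacity = name, capacity
        self._lock = threading.Lock()

    def try_reserve(self) -> bool:
        with self._lock:
            if self.capacity > 0:
                self.capacity -= 1
                return True
            return False

def place_pod(pod: str, serverless: NodePool, regular: NodePool) -> str:
    """Prefer a serverless node; fall back to a regular node if full."""
    pool = serverless if serverless.try_reserve() else regular
    if pool is regular and not regular.try_reserve():
        raise RuntimeError(f"no capacity for {pod}")
    return f"{pod} -> {pool.name}"

serverless, regular = NodePool("virtual-node", 3), NodePool("regular-node", 10)
pods = [f"cronjob-{i}" for i in range(5)]
# Dispatch all pods in parallel rather than one at a time.
with ThreadPoolExecutor(max_workers=5) as ex:
    placements = list(ex.map(lambda p: place_pod(p, serverless, regular), pods))
print(placements)
```

With serverless capacity for three pods, three land on the virtual node and the remaining two fall back to regular nodes, while all five placements proceed concurrently.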
Bridging Differences Between Serverless and Regular Pods
Key integration points include:
Unified Log Collection: Since virtual nodes cannot run DaemonSets, a custom log consumer aggregates logs from various cloud‑provider log services, normalizes them, and forwards them to a shared Kafka cluster.
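The normalization step of such a log consumer might look like the sketch below. The provider names and field layouts are invented examples; real cloud log services use different schemas:

```python
import json

def normalize(record: dict, provider: str) -> dict:
    """Map a provider-specific log record onto one unified envelope
    before it is serialized and forwarded to the shared Kafka cluster."""
    if provider == "provider_a":
        return {"pod": record["podName"], "ts": record["timestamp"],
                "line": record["log"]}
    if provider == "provider_b":
        return {"pod": record["k8s_pod"], "ts": record["time"],
                "line": record["message"]}
    raise ValueError(f"unknown provider: {provider}")

raw = {"podName": "cronjob-42", "timestamp": 1700000000, "log": "done"}
msg = json.dumps(normalize(raw, "provider_a"))
print(msg)
```

Downstream consumers then see one schema regardless of whether a pod ran on a regular node (DaemonSet collection) or a virtual node (cloud log service).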
Unified Monitoring and Alerting: Serverless pods expose the same Prometheus metrics (CPU, memory, disk, network) as regular pods, ensuring consistent observability.
Improving Startup Performance
Serverless jobs require second‑level startup to meet strict timing constraints. The main latency sources are sandbox creation/initialization and pulling the business image. Reusing sandboxes across identical workloads means the first launch still pays the full cost, but subsequent launches start almost instantly.
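The reuse mechanism is essentially a warm pool keyed by the workload's identity. A minimal sketch, assuming a hypothetical keying scheme of image plus resource spec:

```python
import hashlib

class SandboxPool:
    """Warm sandboxes keyed by a hash of the workload spec, so identical
    cronjobs skip sandbox creation and image pull after the first run."""
    def __init__(self):
        self._warm: dict[str, str] = {}
        self.cold_starts = 0

    def _key(self, image: str, cpu: str, mem: str) -> str:
        return hashlib.sha256(f"{image}|{cpu}|{mem}".encode()).hexdigest()

    def acquire(self, image: str, cpu: str, mem: str) -> str:
        key = self._key(image, cpu, mem)
        if key not in self._warm:
            self.cold_starts += 1  # slow path: create sandbox, pull image
            self._warm[key] = f"sandbox-{key[:8]}"
        return self._warm[key]     # fast path: hand back the warm sandbox

pool = SandboxPool()
first = pool.acquire("registry/app:v1", "500m", "256Mi")
second = pool.acquire("registry/app:v1", "500m", "256Mi")
print(first == second, pool.cold_starts)  # True 1
```

Only the first acquire pays the cold-start cost; every identical job afterwards reuses the same sandbox, which is what brings startup down to near-instant.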
Conclusion
Through a custom job scheduler, isolation of serverless pods from regular pods, and performance optimizations for serverless pod startup, the migration to serverless was transparent to developers. The approach eliminated the need to reserve resources for cronjobs, freeing roughly 10% of cluster capacity (tens of thousands of pods) and cutting cronjob resource costs by about 70%, while also resolving node instability caused by excessive pod churn.
Zuoyebang Tech Team
Sharing technical practices from Zuoyebang