How We Scaled 10,000+ K8s CronJobs with Serverless and Solved Node Instability
This article describes the challenges encountered when migrating tens of thousands of Kubernetes cronjobs from VMs to a cluster—node instability, low resource utilization, and scheduling delays—and explains how introducing a serverless architecture with virtual nodes, a custom job scheduler, unified logging and monitoring, and sandbox reuse restored stability, improved performance, and reduced resource costs by about 70%.
Background Introduction
During the cloud‑native container migration at Zuoyebang, scheduled tasks originally running on virtual machines were moved to Kubernetes CronJobs. The system performed well with fewer than 1,000 CronJobs, but problems emerged as the scale grew to tens of thousands.
Problem Discovery
Two main issues were identified: (1) node stability within the cluster, and (2) low resource utilization.
Issue 1: Node Stability
Frequent minute‑level tasks caused rapid pod creation and destruction, resulting in hundreds of containers being created and removed per minute on a single node. This led to excessive cgroup entries, especially memory cgroups that were not reclaimed promptly. The kubelet's periodic reads of /sys/fs/cgroup/memory/memory.stat slowed down, increasing CPU time spent in kernel mode and causing noticeable network latency.
Performance profiling (perf record cat /sys/fs/cgroup/memory/memory.stat followed by perf report) showed that most CPU time was spent in memcg_stat_show. In cgroup v1, memcg_stat_show traverses the memory cgroup tree many times per CPU core; with millions of memory cgroup entries, this became disastrous.
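The scaling problem can be made concrete with a back-of-envelope calculation. The counts below are illustrative assumptions, not measured values from the cluster:

```python
# Rough cost model for one memory.stat read under cgroup v1:
# memcg_stat_show walks the memory cgroup tree and folds in per-CPU
# counters, so the work grows with (cgroup entries x CPU cores).

def stat_read_cost(cgroup_entries: int, cpu_cores: int) -> int:
    """Relative units of work for a single memory.stat read."""
    return cgroup_entries * cpu_cores

# A healthy node vs. one accumulating zombie memory cgroups from churn.
healthy = stat_read_cost(cgroup_entries=2_000, cpu_cores=32)
churned = stat_read_cost(cgroup_entries=1_000_000, cpu_cores=32)
print(f"slowdown factor: {churned // healthy}x")  # slowdown factor: 500x
```

The point is not the exact numbers but the shape: every extra zombie cgroup makes every subsequent stat read more expensive, which is why a millisecond-level read degrades to seconds.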
Memory cgroups are not released immediately after container termination because the kernel must walk all cached pages, which can be slow. This delayed reclamation works for typical workloads but fails when a single machine creates and destroys hundreds of containers per minute, leading to tens of thousands of memory cgroup entries and seconds‑long reads of memory.stat. This in turn drove up dockerd load, slowed the kubelet's PLEG (pod lifecycle event generator), and caused nodes to flap to NotReady.
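One quick way to spot this buildup is the num_cgroups column of /proc/cgroups, which counts live (including not-yet-reclaimed) cgroups per controller. A minimal parser, fed an illustrative sample rather than output from a real node:

```python
# Parse /proc/cgroups to gauge memory-cgroup buildup on a node.
# The SAMPLE text below is an invented example of the file's format.

SAMPLE = """\
#subsys_name\thierarchy\tnum_cgroups\tenabled
cpuset\t2\t120\t1
memory\t5\t48213\t1
pids\t9\t130\t1
"""

def num_cgroups(proc_cgroups_text: str, controller: str) -> int:
    """Return the num_cgroups count for the given controller."""
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#"):
            continue
        name, _hierarchy, count, _enabled = line.split("\t")
        if name == controller:
            return int(count)
    raise KeyError(controller)

# On a live node, pass open("/proc/cgroups").read() instead of SAMPLE.
print(num_cgroups(SAMPLE, "memory"))  # 48213
```

A memory count in the tens of thousands on a node that only runs a few hundred pods is the signature of the churn problem described above.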
Issue 2: Resource Utilization
Under the cluster's CNI network mode, a large share of pod capacity must be reserved for cronjob pods, many of which run for only a few seconds and consume minimal resources, leaving substantial capacity idle.
Other Issues: Scheduling Speed and Service Isolation
At peak times (e.g., midnight), thousands of jobs need to start simultaneously. The default Kubernetes scheduler processes pod placement serially, taking minutes to schedule all jobs, which is unacceptable for workloads requiring sub‑second precision. Additionally, CPU‑ or I/O‑intensive pods can interfere with normal services due to incomplete cgroup isolation.
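The serial-scheduling bottleneck is easy to quantify. A sketch with illustrative, assumed numbers (per-decision latency and worker count are not figures from the article):

```python
import math

# If each scheduling decision costs ~20 ms, placing thousands of pods
# one at a time takes minutes, while dispatching with W parallel workers
# divides the wall time by roughly W.

def serial_wall_time_s(pods: int, per_pod_ms: float) -> float:
    return pods * per_pod_ms / 1000

def parallel_wall_time_s(pods: int, per_pod_ms: float, workers: int) -> float:
    return math.ceil(pods / workers) * per_pod_ms / 1000

print(serial_wall_time_s(5000, 20))         # 100.0 (seconds)
print(parallel_wall_time_s(5000, 20, 200))  # 0.5 (seconds)
```

For a midnight burst of thousands of jobs, that difference is what separates minute-level drift from the required timing precision.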
Using Serverless in the K8s Cluster
To achieve stronger isolation, finer‑grained node control, and faster scheduling for cronjob workloads, a serverless solution was adopted. Virtual nodes were introduced, allowing pods to run on serverless nodes with the same security isolation and network connectivity as regular nodes, but without reserved resources and with pay‑per‑use billing.
Job Scheduler
All cronjob workloads now use a custom job scheduler that dispatches pods to serverless nodes in parallel, achieving millisecond‑level scheduling and falling back to regular nodes if serverless resources are insufficient.
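The dispatch-with-fallback idea can be sketched as follows. The NodePool model and place_pod helper are hypothetical stand-ins for illustration, not the real scheduler's API:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class NodePool:
    """A capacity-limited pool of nodes (serverless or regular)."""
    def __init__(self, name: str, capacity: int):
        self.name, self.capacity = name, capacity
        self._lock = threading.Lock()

    def try_reserve(self) -> bool:
        with self._lock:
            if self.capacity > 0:
                self.capacity -= 1
                return True
            return False

def place_pod(pod: str, serverless: NodePool, regular: NodePool) -> str:
    """Prefer a serverless node; fall back to a regular node if full."""
    pool = serverless if serverless.try_reserve() else regular
    if pool is regular and not regular.try_reserve():
        raise RuntimeError(f"no capacity for {pod}")
    return f"{pod} -> {pool.name}"

serverless, regular = NodePool("virtual-node", 3), NodePool("regular-node", 10)
pods = [f"cronjob-{i}" for i in range(5)]
# Dispatch all pods in parallel rather than one at a time.
with ThreadPoolExecutor(max_workers=5) as ex:
    placements = list(ex.map(lambda p: place_pod(p, serverless, regular), pods))
print(placements)
```

With serverless capacity for three pods, three land on the virtual node and the remaining two fall back to regular nodes, while all five placements proceed concurrently.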
Bridging Differences Between Serverless and Regular Pods
Key integration points include:
Unified Log Collection: Since virtual nodes cannot run DaemonSets, a custom log consumer aggregates logs from various cloud‑provider log services, normalizes them, and forwards them to a shared Kafka cluster.
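The normalization step of such a log consumer might look like the sketch below. The provider names and field layouts are invented examples; real cloud log services use different schemas:

```python
import json

def normalize(record: dict, provider: str) -> dict:
    """Map a provider-specific log record onto one unified envelope
    before it is serialized and forwarded to the shared Kafka cluster."""
    if provider == "provider_a":
        return {"pod": record["podName"], "ts": record["timestamp"],
                "line": record["log"]}
    if provider == "provider_b":
        return {"pod": record["k8s_pod"], "ts": record["time"],
                "line": record["message"]}
    raise ValueError(f"unknown provider: {provider}")

raw = {"podName": "cronjob-42", "timestamp": 1700000000, "log": "done"}
msg = json.dumps(normalize(raw, "provider_a"))
print(msg)
```

Downstream consumers then see one schema regardless of whether a pod ran on a regular node (DaemonSet collection) or a virtual node (cloud log service).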
Unified Monitoring and Alerting: Serverless pods expose the same Prometheus metrics (CPU, memory, disk, network) as regular pods, ensuring consistent observability.
Improving Startup Performance
Serverless jobs require second‑level startup to meet strict timing constraints. The main latency sources are sandbox creation/initialization and pulling the business image. Reusing sandboxes across identical workloads means the first launch still pays the full cost, but subsequent launches start almost instantly.
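The reuse mechanism is essentially a warm pool keyed by the workload's identity. A minimal sketch, assuming a hypothetical keying scheme of image plus resource spec:

```python
import hashlib

class SandboxPool:
    """Warm sandboxes keyed by a hash of the workload spec, so identical
    cronjobs skip sandbox creation and image pull after the first run."""
    def __init__(self):
        self._warm: dict[str, str] = {}
        self.cold_starts = 0

    def _key(self, image: str, cpu: str, mem: str) -> str:
        return hashlib.sha256(f"{image}|{cpu}|{mem}".encode()).hexdigest()

    def acquire(self, image: str, cpu: str, mem: str) -> str:
        key = self._key(image, cpu, mem)
        if key not in self._warm:
            self.cold_starts += 1  # slow path: create sandbox, pull image
            self._warm[key] = f"sandbox-{key[:8]}"
        return self._warm[key]     # fast path: hand back the warm sandbox

pool = SandboxPool()
first = pool.acquire("registry/app:v1", "500m", "256Mi")
second = pool.acquire("registry/app:v1", "500m", "256Mi")
print(first == second, pool.cold_starts)  # True 1
```

Only the first acquire pays the cold-start cost; every identical job afterwards reuses the same sandbox, which is what brings startup down to near-instant.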
Conclusion
Through a custom job scheduler, isolation of serverless pods from regular pods, and performance optimizations for serverless pod startup, the migration to serverless was transparent to developers. The approach eliminated the need to reserve resources for cronjobs, freeing roughly 10% of cluster capacity (tens of thousands of pods) and cutting cronjob resource costs by about 70%, while also resolving node instability caused by excessive pod churn.
Zuoyebang Tech Team
Sharing technical practices from Zuoyebang