
From Double Eleven to Cloud‑Native Capacity: Zheng Yangfei’s Journey and Ant Group’s Autoscaling Innovation

The article chronicles Zheng Yangfei’s rise from Double Eleven intern to leader of Ant Group’s cloud‑native capacity team. It covers the evolution of large‑scale load testing, the challenges of autoscaling in financial‑grade systems, and the team’s shift toward platform‑driven, risk‑aware engineering.

AntTech

Born in 1992, Zheng Yangfei quickly became a key figure at Ant Group, moving from a server‑scaling intern in 2013 to the leader of the Cloud‑Native Capacity team, and serving as the overall stability head for the company’s flagship Double Eleven shopping festivals.

His early exposure to the massive traffic spikes of Double Eleven pushed him to design and run end‑to‑end load‑testing platforms. Over time, capacity work moved from labor‑intensive manual scaling to a fully automated, platform‑based approach that reduced manual effort and improved reliability.

Recognizing the limits of traditional scaling methods, Zheng’s team set out to build a cloud‑native capacity solution. It combines historical trend analysis, real‑time prediction, and multi‑layer autoscaling, both horizontal (HPA) and vertical (VPA), to allocate resources efficiently while meeting the stringent stability requirements of financial‑grade services.
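The article does not publish Ant Group’s algorithms. The following is therefore only a minimal Python sketch, under stated assumptions, of one way a historical‑trend signal and a real‑time signal can be combined into a single replica count. The reactive part follows the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × currentMetric / targetMetric), skipped while the ratio stays within a 10% tolerance. The predictive part is a hypothetical stand‑in for "historical trend analysis": it averages the load seen in the same time slot on previous days and multiplies it by an assumed headroom factor. All function names and parameters are illustrative and do not describe the team’s actual design.

```python
import math
from statistics import mean


def reactive_replicas(current_replicas, current_util, target_util, tolerance=0.1):
    # Proportional rule used by the Kubernetes HPA: scale by the ratio of the
    # observed metric to its target, but do nothing inside the tolerance band.
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def predictive_replicas(history_qps, qps_per_replica, headroom=1.2):
    # Hypothetical "historical trend" signal: mean load observed in the same
    # time slot on previous days, padded with an assumed safety headroom.
    forecast_qps = mean(history_qps) * headroom
    return math.ceil(forecast_qps / qps_per_replica)


def plan_replicas(current_replicas, current_util, target_util,
                  history_qps, qps_per_replica,
                  min_replicas=2, max_replicas=100):
    # Take the larger of the two signals: the forecast can pre-scale before a
    # known spike, while the reactive rule still covers unexpected load.
    desired = max(
        reactive_replicas(current_replicas, current_util, target_util),
        predictive_replicas(history_qps, qps_per_replica),
    )
    # Clamp to the configured bounds, as an HPA does with min/max replicas.
    return max(min_replicas, min(desired, max_replicas))
```

For example, `plan_replicas(10, 90, 60, [4000, 4200, 3800], 250)` returns 20. The reactive rule alone asks for 15 replicas (90% utilization against a 60% target), but the forecast of roughly 4,800 QPS at 250 QPS per replica asks for 20, so the larger value wins. Taking the maximum is a deliberately conservative choice that favors stability over cost, which matches the financial‑grade priorities described above.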

The architecture has two parts. The first is a profiling system that uses big‑data pipelines and machine‑learning models to build a profile (“portrait”) of each application. The second is an AutoScaler that applies multi‑stage horizontal and vertical scaling, using Service Mesh to shorten start‑up times and reduce scaling risk.
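To make "multi‑stage horizontal and vertical scaling" more concrete, here is a hypothetical Python sketch. It is not Ant Group’s AutoScaler. The vertical step sizes a CPU request to the 95th percentile of observed usage and limits how far a single step may move it. This is loosely in the spirit of the Kubernetes Vertical Pod Autoscaler, whose recommender uses its own histogram‑based algorithm. The horizontal step adds replicas in batches and halts at the last size that passed a caller‑supplied health check, which is one simple way to contain scaling risk. The function names, the percentile, the step limit, and the health‑check callback are all assumptions made for illustration.

```python
import math
from typing import Callable, List


def vertical_recommendation(cpu_samples_millicores: List[int],
                            current_request: int,
                            max_step_ratio: float = 0.5) -> int:
    # Simplified VPA-style idea (assumption): size the CPU request to the
    # nearest-rank 95th percentile of observed usage.
    if not cpu_samples_millicores:
        return current_request  # no data: keep the current request
    ordered = sorted(cpu_samples_millicores)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    p95 = ordered[rank - 1]
    # Limit how far one step may move the request, to contain risk.
    lower = int(current_request * (1 - max_step_ratio))
    upper = int(current_request * (1 + max_step_ratio))
    return max(lower, min(p95, upper))


def staged_scale_out(current: int, target: int, stages: int,
                     healthy: Callable[[int], bool]) -> int:
    # Hypothetical multi-stage horizontal scale-out. This sketch only stages
    # growth; a scale-in target is returned unchanged.
    if target <= current:
        return target
    step = math.ceil((target - current) / max(1, stages))
    size = current
    while size < target:
        candidate = min(size + step, target)
        if not healthy(candidate):
            return size  # stop at the last size that passed the health gate
        size = candidate
    return size
```

Calling `staged_scale_out(10, 20, 4, lambda n: n <= 16)` grows the service to 13, then 16. It stops there because the check for 19 replicas fails, so it returns 16 instead of pushing on to the full target.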

Beyond the technology, the team redefined the role of the technical risk department, promoting a capability‑based, platform‑centric model where SRE (re‑branded as Site Risk Engineer) is viewed as a set of skills rather than a static position, enabling continuous risk mitigation and higher automation.

To further strengthen the initiative, Ant Group is recruiting engineers with strong foundations in algorithms, distributed systems, and cloud‑native development, offering opportunities to work on large‑scale monitoring, capacity, and risk platforms that support critical events like Double Eleven.

Tags: cloud native, big data, machine learning, autoscaling, SRE, capacity management