Enterprise Kubernetes Migration Practice: Baidu Aifanfan's Journey to Cloud-Native Architecture
Baidu's Aifanfan product migrated its entire suite to Kubernetes through a two-phase, 11-step process that standardized CI/CD, containerization, and traffic routing. The result: 200+ modules deployable in under an hour, 99.99% stability, cost-effective operations, and groundwork for multi-cluster and service-mesh expansion.
This article presents a case study of Baidu's Aifanfan product migrating to Kubernetes and a cloud-native architecture. The migration addressed three business challenges: rapid delivery (deploying 200+ modules within 1 hour and sustaining two-week iteration cycles with twice-weekly releases), stability and cost reduction (maintaining 99.99% stability, reducing lost page views (pvlost) during releases, and achieving observability for 20,000+ service instances), and ready-to-use capabilities (supporting diverse product deployment scenarios with scalable, reproducible operations).
The collaboration between Aifanfan and Baidu's Infrastructure Department followed a two-phase approach: first, adapting Kubernetes to the internal network environment and deploying the clusters; second, deploying 100+ modules to Kubernetes and gradually migrating 100% of production traffic, split by customer.
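Migrating traffic by customer can be pictured as giving each customer a stable bucket and routing a customer to Kubernetes once the rollout percentage passes that bucket. The following is a minimal sketch under that assumption, not the Aifanfan implementation; `customer_bucket`, `routes_to_kubernetes`, and the ID format are hypothetical names chosen for illustration.

```python
import hashlib


def customer_bucket(customer_id: str, buckets: int = 100) -> int:
    """Map a customer ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def routes_to_kubernetes(customer_id: str, rollout_percent: int) -> bool:
    """Return True if this customer's traffic should go to the new cluster.

    Assumption: rollout_percent is an integer from 0 to 100.
    """
    return customer_bucket(customer_id) < rollout_percent
```

Because the bucket depends only on the customer ID, a customer who has been moved stays moved as the percentage grows, which keeps each customer's experience consistent during the migration.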
The migration solution had to deploy multiple service types efficiently, which it did through standardized, automated CI/CD processes. The team implemented unified packaging using the archer3 deployment protocol, unified containerization with decoupled Dockerfile templates, and unified application packages using Helm charts. This achieved containerization with no changes to business code and an average build-and-deploy time of 10.6 minutes.
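One way to read "decoupled Dockerfile templates" is that the Dockerfile lives in a shared template per service type and each module supplies only parameters. The sketch below illustrates that idea with Python's standard `string.Template`; the template contents, the `java` key, and all parameter names are assumptions for illustration and do not come from the archer3 protocol or the team's actual templates.

```python
from string import Template

# Hypothetical shared templates, one per service type.
# Image names, paths, and the start command are placeholders.
DOCKERFILE_TEMPLATES = {
    "java": Template(
        "FROM $base_image\n"
        "COPY $package_path /app/\n"
        "WORKDIR /app\n"
        'CMD ["sh", "$start_script"]\n'
    ),
}


def render_dockerfile(service_type: str, **params: str) -> str:
    """Render a Dockerfile for one module from the shared template.

    Raises KeyError if the service type is unknown or a parameter is missing,
    so an incomplete module definition fails before any image is built.
    """
    template = DOCKERFILE_TEMPLATES[service_type]
    return template.substitute(**params)
```

A module would then call, for example, `render_dockerfile("java", base_image="example/jdk:8", package_path="output/", start_script="bin/start.sh")`, keeping the Dockerfile itself out of the business repository.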
For smooth traffic migration, the team used a unified traffic entry (Access Gateway), grayscale (canary) traffic diversion, progressive full migration, and monitoring with automatic rollback. The 11-step migration process included deploying to staging, verifying with shadow traffic, configuring the BFE gateway, and gradually expanding the traffic percentage.
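The combination of progressive expansion and automatic rollback can be sketched as a loop that raises the traffic percentage stage by stage and returns everything to the old environment if a health check fails. This is an illustrative sketch only: the stage values, the error-rate threshold, and the two callbacks are assumptions, and the real process would act through the gateway configuration rather than through Python callables.

```python
from typing import Callable, Iterable


def progressive_migration(
    stages: Iterable[int],
    set_traffic_percent: Callable[[int], None],
    error_rate: Callable[[], float],
    max_error_rate: float = 0.0001,
) -> bool:
    """Shift traffic through the given stages, rolling back on failure.

    set_traffic_percent: hypothetical hook that applies a percentage
        to the new cluster (for example by updating gateway rules).
    error_rate: hypothetical hook that reads the current error rate
        from monitoring after a stage has been applied.
    Returns True if every stage passed, False if a rollback happened.
    """
    for percent in stages:
        set_traffic_percent(percent)
        if error_rate() > max_error_rate:
            set_traffic_percent(0)  # automatic rollback: send all traffic back
            return False
    return True
```

Passing the two hooks in as parameters keeps the ramp logic testable without a live gateway or monitoring system.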
Key benefits achieved: faster iteration, rapid construction of a monitoring system using Prometheus + Grafana, and low-cost infrastructure upgrades including ConfigMap-based configuration management, EFK and SkyWalking for logging and tracing, and a Kubernetes-based unified traffic entry. The team completed 150+ application migrations with 2,000+ service interface modifications within four months while maintaining 99.99% weekly stability.
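To give a sense of scale for the 99.99% weekly figure, the arithmetic below converts it into a downtime budget. It assumes that "stability" is measured as time-based availability over a seven-day week, which the article does not state; if the metric is request-based, the same ratio applies to requests rather than seconds.

```python
def allowed_downtime_seconds(availability: float, period_seconds: int) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - availability) * period_seconds


WEEK_SECONDS = 7 * 24 * 60 * 60  # 604,800 seconds in a week

budget = allowed_downtime_seconds(0.9999, WEEK_SECONDS)
print(f"{budget:.1f} seconds per week")  # prints "60.5 seconds per week"
```

Under that reading, the whole migration had roughly one minute of unavailability to spend per week.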
Future plans include expanding the supported service types to cover Golang-based online communication and script-based scheduled tasks; using Kubernetes-native capabilities for service registration, discovery, and auto-scaling; deploying across multiple regions and clusters for a better customer experience; adopting a Service Mesh for comprehensive observability and advanced traffic control; and building a Kubernetes-based CI/CD toolchain platform.
Baidu Geek Talk