
How Baidu Feed Achieved Serverless Scaling with Multi‑Dimensional Service Profiles

This article explains how Baidu's Feed recommendation backend adopted a serverless approach, building elastic, traffic, and capacity profiles for each micro-service to enable predictive, load-feedback, and timed scaling, thereby reducing resource waste and operational costs in a cloud-native environment.


Background

In Baidu's cloud-native environment, the Feed recommendation service consists of many compute-heavy micro-services that run 24/7 with fixed capacity, leading to resource waste when traffic fluctuates.

Goal

Build multi-dimensional, personalized service profiles (elastic, traffic, capacity) and use them to drive automatic elastic scaling and reduce cost.

Elastic Profile Construction

Services are classified into high, medium, and low elasticity tiers based on instance deployment time, resource quota, statefulness, and external dependencies.

High elasticity: stateless services that scale quickly.

Medium elasticity: services with some state and moderate scaling cost.

Low elasticity: stateful services that are costly to scale.

Improvements include migrating to standard containers and separating storage from compute.
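The tiering above can be sketched in Python. The thresholds, field names, and decision rules below are illustrative assumptions; the article does not publish Baidu's exact criteria:

```python
from dataclasses import dataclass

# Hypothetical thresholds -- chosen for illustration, not from the article.
FAST_DEPLOY_SECONDS = 120
SMALL_QUOTA_CORES = 8

@dataclass
class Service:
    name: str
    deploy_seconds: int        # time to bring up one instance
    quota_cores: int           # per-instance resource quota
    stateful: bool
    external_dependencies: int

def elasticity_tier(svc: Service) -> str:
    """Classify a micro-service into a high/medium/low elasticity tier."""
    if (not svc.stateful and svc.deploy_seconds <= FAST_DEPLOY_SECONDS
            and svc.quota_cores <= SMALL_QUOTA_CORES):
        return "high"          # stateless and fast to scale
    if svc.stateful or svc.external_dependencies > 2:
        return "low"           # stateful or heavily coupled: costly to scale
    return "medium"

print(elasticity_tier(Service("ranker", 60, 4, False, 0)))   # high
print(elasticity_tier(Service("index", 600, 32, True, 3)))   # low
```

A real classifier would weigh more signals (deployment pipeline shape, warm-up time, quorum membership), but the tier label is what the scaling strategies consume downstream.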

Traffic Profile

Traffic is modeled using CPU usage as a proxy for QPS and divided into configurable time-slices (e.g., hourly). Historical CPU data are smoothed, and the top-K windows in each slice are used for prediction.
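A minimal sketch of this slice-and-top-K approach. The sample format, moving-average smoothing, and the choice to average the top-K values are all assumptions, since the article does not specify them:

```python
from collections import defaultdict
from statistics import mean

def smooth(samples, window=3):
    """Simple moving average to damp noise in the raw CPU series."""
    return [mean(samples[max(0, i - window + 1): i + 1])
            for i in range(len(samples))]

def traffic_profile(samples, slice_seconds=3600, k=3):
    """Predict per-slice demand as the mean of the top-K smoothed CPU readings.

    `samples` is a list of (timestamp_seconds, cpu_cores) tuples; the slice
    length and K are configurable, mirroring the configurable time-slices
    in the article.
    """
    slices = defaultdict(list)
    for ts, cpu in samples:
        slices[ts // slice_seconds].append(cpu)
    profile = {}
    for slot, values in slices.items():
        top_k = sorted(smooth(values), reverse=True)[:k]
        profile[slot] = mean(top_k)   # predicted peak demand for this slice
    return profile
```

Using the top-K windows rather than the single maximum makes the prediction robust to one-off spikes while still tracking sustained peaks.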

Capacity Profile

Peak CPU utilization defines the required CPU buffer; machine‑learning models map QPS and resource usage to latency to determine safe capacity limits.
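As an illustration only, a plain least-squares fit can stand in for the machine-learning model: fit historical QPS against observed latency, then invert the fit to find the largest QPS that stays under a latency SLO. The linear form and all names here are assumptions, not Baidu's actual model:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y ≈ a*x + b (stand-in for the ML model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def safe_qps_limit(qps_history, latency_history, latency_slo_ms):
    """Invert the fitted QPS→latency model to find the max QPS under the SLO."""
    a, b = fit_linear(qps_history, latency_history)
    return (latency_slo_ms - b) / a

# If latency grows ~0.1 ms per QPS, a 50 ms SLO allows roughly 500 QPS.
print(safe_qps_limit([100, 200, 300, 400], [10, 20, 30, 40], 50))
```

In practice latency is nonlinear near saturation, which is why a richer learned model is needed; the inversion step (capacity = QPS at the SLO boundary) stays the same.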

Elastic Strategies

Three strategies are applied:

Predictive elasticity: forecast traffic for the next time-slice and pre-scale.

Load-feedback elasticity: adjust instance counts in near-real-time based on current load.

Timed elasticity: expand before known peak periods and shrink afterwards.

Priorities: timed > predictive > load-feedback. Load-feedback only expands capacity; shrinking is handled by the timed and predictive strategies.
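The priority rules above can be expressed as a small arbitration function. The signature and the "None means no opinion" convention are illustrative assumptions:

```python
def target_instances(current, timed=None, predicted=None, load_feedback=None):
    """Arbitrate the three strategies: timed > predictive > load-feedback.

    Each argument is a desired instance count, or None if that strategy has
    no opinion right now. Load-feedback may only expand, so its suggestion
    is ignored unless it exceeds the current count.
    """
    if timed is not None:
        return timed                      # timed windows win outright
    if predicted is not None:
        return predicted                  # then the traffic forecast
    if load_feedback is not None and load_feedback > current:
        return load_feedback              # load-feedback: scale out only
    return current                        # shrinking is left to timed/predictive

print(target_instances(10, timed=20, predicted=5, load_feedback=3))   # 20
print(target_instances(10, load_feedback=6))                          # 10
```

Restricting load-feedback to expansion keeps the fast-reacting path from flapping instances downward during brief lulls; only the slower, profile-driven strategies are trusted to shrink.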

Stability Assurance

Periodic inspections (elastic, capacity, status) and one‑click interventions ensure service reliability during rapid scaling.

Serverless has been deployed across Baidu Feed, covering more than 100,000 service instances and significantly lowering operating costs. Future work will focus on hotspot capacity guarantees and ML-enhanced traffic prediction.

Tags: Cloud Native · Serverless · Elastic Scaling · Backend Services · Service Profiling
Written by Architecture & Thinking

🍭 Frontline tech director and chief architect at top-tier companies 🥝 Years of deep experience in the internet, e-commerce, social, and finance sectors 🌾 Committed to publishing high-quality articles covering the core technologies of leading internet firms, application architecture, and AI breakthroughs.
