
Serverless Transformation of Baidu Search Middle Platform: Architecture, Challenges, and Benefits

This article details how Baidu's search middle platform migrated from script‑based processing to a serverless business‑framework architecture, outlining the technical challenges, design of data ingestion, processing, scheduling, and control layers, and summarizing the efficiency, cost, and performance gains achieved.

DataFunSummit

Baidu's search middle platform handles billions of daily queries, providing personalized content cards such as weather forecasts. Historically, content processing relied on a large number of ad‑hoc scripts that were difficult to maintain and scale.

In 2020, the team adopted a serverless philosophy and built the vs-lambda framework, letting engineers focus on writing business functions rather than operating infrastructure. Development cycles fell from weeks to hours, maintenance costs dropped, and typical scenarios saw cost savings of over 90%.
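To make the "focus on function development" idea concrete, here is a minimal sketch of a function-style programming model. vs-lambda is a Baidu-internal framework whose API is not public, so the `register` decorator, the `Event` schema, and the handler signature below are hypothetical illustrations, not the actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """A content record flowing through the pipeline (hypothetical schema)."""
    doc_id: str
    payload: dict
    tags: list = field(default_factory=list)

# Registry of user functions; the platform would own this, not the developer.
REGISTRY = {}

def register(name):
    """Register a business function so the platform can invoke it on demand."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@register("weather_card")
def build_weather_card(event: Event) -> Event:
    """Business logic only: enrich a record into a weather content card."""
    city = event.payload.get("city", "unknown")
    event.payload["card"] = f"weather:{city}"
    event.tags.append("weather")
    return event

def invoke(name: str, event: Event) -> Event:
    """The platform handles dispatch, scaling, and retries; the user does not."""
    return REGISTRY[name](event)
```

The point of the model is the division of labor: the developer writes only `build_weather_card`, while deployment, scaling, and failure handling belong to the platform.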

The platform's evolution falls into three stages: the "script Warring States" era, the business-framework era, and the serverless era. The script era suffered from limited customization and low throughput. The business-framework era introduced a unified framework in which business code is submitted to a task platform and processed through a data gateway, improving isolation and reusability and breaking up the previously monolithic processing module.

Serverless introduced new technical challenges, including low-cost user onboarding, stability under both normal and abnormal conditions, dynamic resource scheduling, and efficient troubleshooting despite short-lived instances and transient logs. To address these, the system was refactored into four layers (data ingestion, data audit, data processing, and data indexing) connected by Kafka, supporting petabyte-scale data volumes and tens of thousands of queries per second.
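A conceptual sketch of the four-layer flow follows. In the real system the layers are decoupled by Kafka topics; plain in-process queues stand in here, and the per-layer logic (a validity check at audit, an uppercase transform at processing) is invented purely for illustration.

```python
from queue import Queue

def ingest(raw_records, out_q: Queue):
    """Data ingestion: accept raw records into the pipeline."""
    for rec in raw_records:
        out_q.put({"raw": rec})

def audit(in_q: Queue, out_q: Queue):
    """Data audit: reject invalid records before they reach processing."""
    while not in_q.empty():
        rec = in_q.get()
        if rec["raw"]:  # stand-in validity check: drop empty records
            rec["audited"] = True
            out_q.put(rec)

def process(in_q: Queue, out_q: Queue):
    """Data processing: where user-defined functions would run."""
    while not in_q.empty():
        rec = in_q.get()
        rec["card"] = rec["raw"].upper()  # stand-in for business logic
        out_q.put(rec)

def index(in_q: Queue, store: dict):
    """Data indexing: persist processed results for serving."""
    while not in_q.empty():
        rec = in_q.get()
        store[rec["raw"]] = rec["card"]

store, q1, q2, q3 = {}, Queue(), Queue(), Queue()
ingest(["sunny", "", "rainy"], q1)
audit(q1, q2)
process(q2, q3)
index(q3, store)
# The empty record is rejected at the audit layer; the rest flow through.
```

Because each layer reads from and writes to a queue rather than calling its neighbor directly, layers can be scaled, restarted, or replayed independently, which is the property the Kafka-based design buys.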

The final architecture consists of three layers: an application layer (self‑service data ingestion, online debugging, cloud monitoring, and rollback capabilities), a service layer (FaaS for custom functions, BaaS for internal services, and complex function orchestration), and a control layer (intelligent scheduling for auto‑scaling, cold‑start handling, hotspot migration, and resource adaptation).
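As a toy illustration of the kind of decision the control layer's intelligent scheduler makes, the sketch below sizes an instance pool from observed QPS and per-instance capacity, within fixed bounds. The function name, parameters, and thresholds are hypothetical; the real scheduler also handles cold starts and hotspot migration, which are omitted here.

```python
import math

def target_instances(observed_qps: float,
                     per_instance_qps: float,
                     min_instances: int = 1,
                     max_instances: int = 100) -> int:
    """Pick an instance count that covers observed load, within bounds.

    Keeps at least min_instances warm (mitigating cold starts) and caps
    the pool at max_instances to bound resource spend.
    """
    want = math.ceil(observed_qps / per_instance_qps)
    return max(min_instances, min(max_instances, want))
```

For example, at 950 QPS with instances rated for 100 QPS each, the scheduler would target 10 instances; at zero load it would still keep one warm.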

Additional auxiliary systems for tracing, reporting, and monitoring provide observability and simplify troubleshooting, while automated diff testing ensures safe deployments.
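The idea behind automated diff testing can be sketched in a few lines: replay the same sample inputs through the old and new versions of a function and flag any divergence before the new version ships. The helper and the example functions below are placeholders, not the platform's actual tooling.

```python
def diff_test(old_fn, new_fn, samples):
    """Return the inputs on which the two versions disagree."""
    diffs = []
    for s in samples:
        old_out, new_out = old_fn(s), new_fn(s)
        if old_out != new_out:
            diffs.append((s, old_out, new_out))
    return diffs

# A safe refactor produces no diffs; a behavior change is caught.
old = lambda x: x.strip().lower()
new = lambda x: x.strip().lower()
```

An empty diff list is the green light for deployment; any entry pinpoints the exact input and the pair of disagreeing outputs, which makes the failure reproducible.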

Benefits of the serverless transformation include a 90% reduction in learning, usage, and maintenance costs, an 80% decrease in resource expenses through on-demand scaling, and a 3-5× improvement in operator performance. The methodology emphasizes reliability on the data plane, throughput optimization on the compute plane, and maintainability on the control plane, always driven by business requirements.

Tags: serverless, Architecture, Cloud Computing, Data Processing, scalability, search platform
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
