
Serverless Transformation of Baidu Search Middle Platform: Architecture, Challenges, and Benefits

This article details how Baidu's search middle platform migrated from script‑based processing to a serverless business‑framework architecture, outlining the technical challenges, design of data ingestion, processing, scheduling, and control layers, and summarizing the efficiency, cost, and performance gains achieved.

DataFunSummit

Baidu's search middle platform handles billions of daily queries, providing personalized content cards such as weather forecasts. Historically, content processing relied on a large number of ad‑hoc scripts that were difficult to maintain and scale.

In 2020, the team adopted a serverless philosophy and built the vs-lambda framework, letting engineers focus on writing business functions rather than operating infrastructure. Development cycles fell from weeks to hours, maintenance costs dropped, and typical scenarios saw cost savings of over 90%.
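To make the "focus on function development" idea concrete, here is a minimal sketch of a function-style programming model. vs-lambda is a Baidu-internal framework whose API is not public, so the `register` decorator, the `Event` schema, and the handler signature below are hypothetical illustrations, not the actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """A content record flowing through the pipeline (hypothetical schema)."""
    doc_id: str
    payload: dict
    tags: list = field(default_factory=list)

# Registry of user functions; the platform would own this, not the developer.
REGISTRY = {}

def register(name):
    """Register a business function so the platform can invoke it on demand."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@register("weather_card")
def build_weather_card(event: Event) -> Event:
    """Business logic only: enrich a record into a weather content card."""
    city = event.payload.get("city", "unknown")
    event.payload["card"] = f"weather:{city}"
    event.tags.append("weather")
    return event

def invoke(name: str, event: Event) -> Event:
    """The platform handles dispatch, scaling, and retries; the user does not."""
    return REGISTRY[name](event)
```

The point of the model is the division of labor: the developer writes only `build_weather_card`, while deployment, scaling, and failure handling belong to the platform.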

The platform's evolution falls into three stages: the "script Warring States" era, the business-framework era, and the serverless era. The script era suffered from limited customization and low throughput. The business-framework era introduced a unified framework in which business code is submitted to a task platform and processed through a data gateway, improving isolation and reusability and breaking up the previously monolithic processing module.

Serverless introduced new technical challenges, including low-cost user onboarding, stability under both normal and abnormal conditions, dynamic resource scheduling, and efficient troubleshooting despite short-lived instances and transient logs. To address these, the system was refactored into four layers (data ingestion, data audit, data processing, and data indexing) connected by Kafka, supporting petabyte-scale data volumes and tens of thousands of queries per second.
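A conceptual sketch of the four-layer flow follows. In the real system the layers are decoupled by Kafka topics; plain in-process queues stand in here, and the per-layer logic (a validity check at audit, an uppercase transform at processing) is invented purely for illustration.

```python
from queue import Queue

def ingest(raw_records, out_q: Queue):
    """Data ingestion: accept raw records into the pipeline."""
    for rec in raw_records:
        out_q.put({"raw": rec})

def audit(in_q: Queue, out_q: Queue):
    """Data audit: reject invalid records before they reach processing."""
    while not in_q.empty():
        rec = in_q.get()
        if rec["raw"]:  # stand-in validity check: drop empty records
            rec["audited"] = True
            out_q.put(rec)

def process(in_q: Queue, out_q: Queue):
    """Data processing: where user-defined functions would run."""
    while not in_q.empty():
        rec = in_q.get()
        rec["card"] = rec["raw"].upper()  # stand-in for business logic
        out_q.put(rec)

def index(in_q: Queue, store: dict):
    """Data indexing: persist processed results for serving."""
    while not in_q.empty():
        rec = in_q.get()
        store[rec["raw"]] = rec["card"]

store, q1, q2, q3 = {}, Queue(), Queue(), Queue()
ingest(["sunny", "", "rainy"], q1)
audit(q1, q2)
process(q2, q3)
index(q3, store)
# The empty record is rejected at the audit layer; the rest flow through.
```

Because each layer reads from and writes to a queue rather than calling its neighbor directly, layers can be scaled, restarted, or replayed independently, which is the property the Kafka-based design buys.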

The final architecture consists of three layers: an application layer (self‑service data ingestion, online debugging, cloud monitoring, and rollback capabilities), a service layer (FaaS for custom functions, BaaS for internal services, and complex function orchestration), and a control layer (intelligent scheduling for auto‑scaling, cold‑start handling, hotspot migration, and resource adaptation).
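As a toy illustration of the kind of decision the control layer's intelligent scheduler makes, the sketch below sizes an instance pool from observed QPS and per-instance capacity, within fixed bounds. The function name, parameters, and thresholds are hypothetical; the real scheduler also handles cold starts and hotspot migration, which are omitted here.

```python
import math

def target_instances(observed_qps: float,
                     per_instance_qps: float,
                     min_instances: int = 1,
                     max_instances: int = 100) -> int:
    """Pick an instance count that covers observed load, within bounds.

    Keeps at least min_instances warm (mitigating cold starts) and caps
    the pool at max_instances to bound resource spend.
    """
    want = math.ceil(observed_qps / per_instance_qps)
    return max(min_instances, min(max_instances, want))
```

For example, at 950 QPS with instances rated for 100 QPS each, the scheduler would target 10 instances; at zero load it would still keep one warm.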

Additional auxiliary systems for tracing, reporting, and monitoring provide observability and simplify troubleshooting, while automated diff testing ensures safe deployments.
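The idea behind automated diff testing can be sketched in a few lines: replay the same sample inputs through the old and new versions of a function and flag any divergence before the new version ships. The helper and the example functions below are placeholders, not the platform's actual tooling.

```python
def diff_test(old_fn, new_fn, samples):
    """Return the inputs on which the two versions disagree."""
    diffs = []
    for s in samples:
        old_out, new_out = old_fn(s), new_fn(s)
        if old_out != new_out:
            diffs.append((s, old_out, new_out))
    return diffs

# A safe refactor produces no diffs; a behavior change is caught.
old = lambda x: x.strip().lower()
new = lambda x: x.strip().lower()
```

An empty diff list is the green light for deployment; any entry pinpoints the exact input and the pair of disagreeing outputs, which makes the failure reproducible.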

Benefits of the serverless transformation include a 90% reduction in learning, usage, and maintenance costs, an 80% decrease in resource expenses through on-demand scaling, and a 3-5× improvement in operator performance. The methodology emphasizes reliability on the data plane, throughput optimization on the compute plane, and maintainability on the control plane, always driven by business requirements.

Tags: serverless, Architecture, Cloud Computing, Data Processing, scalability, search platform
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
