Baidu Search Vertical Offline Computing System Architecture Evolution
Baidu's search vertical offline computing system evolved through four stages: a fragmented pre-2018 processing setup, a unified business processing framework, a serverless function model, and finally a data-intelligent architecture with multi-layer abstraction, a graph computation engine, and a multi-language operator engine. The final stage delivered 5-10x efficiency gains for migrated legacy businesses and cut core failures by 60%.
This article traces the evolution of Baidu's search vertical offline computing system, detailing the challenges encountered and corresponding solutions throughout its development. The architecture evolution follows the principle of "there is no best architecture, only the most suitable architecture," providing appropriate solutions for different stages of development.
Background: In the past, a "Baidu yixia" search returned "natural results," meaning pages fetched from the open internet. As information online grew richer, natural results alone could no longer meet user needs. Vertical search solutions were developed to address this by providing higher-quality content and a better user experience.
Four Evolution Stages:
a. Original Offline Processing System (Pre-2018): The focus was on building a business processing entry point from scratch. There was no complete framework, and development costs were high. All business logic was mixed into one common service, and configuration determined which strategy logic ran for each kind of data.
b. Business Processing Architecture: A complete business service processing framework took shape, with a unified service framework and unified development stages, built on cloud-native deployment and service isolation. This solved the efficiency problem (businesses conflicting with each other during development and deployment) and the stability problem (poor isolation allowing one failure to cascade into others).
c. Serverless Architecture: Business access efficiency was further improved. Businesses shifted from managing services to managing processing functions. Businesses could quickly test and launch by simply registering functions, with automatic scaling of container instances, greatly improving efficiency while reducing resource costs.
d. Data Intelligent Architecture: Data management was added on top of the existing service deployment, and function management was upgraded to requirement management. Together with multi-language service framework support, this further reduced costs and improved efficiency. A four-layer architecture (application, logic, service, control) was established, completing the shift from imperative to declarative computing.
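The shift in stage c from managing services to managing processing functions can be illustrated with a minimal sketch. Everything named here (`register`, `dispatch`, the `video_normalize` function) is hypothetical and is not Baidu's actual API; the sketch only shows the general idea that a business registers a function and the platform owns lookup and execution.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a business name to its processing function.
_REGISTRY: Dict[str, Callable[[dict], dict]] = {}


def register(name: str):
    """Decorator that registers a processing function under a business name."""
    def wrap(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        if name in _REGISTRY:
            raise ValueError(f"function already registered: {name}")
        _REGISTRY[name] = func
        return func
    return wrap


def dispatch(name: str, record: dict) -> dict:
    """Platform side: look up the registered function and run it on one record."""
    try:
        func = _REGISTRY[name]
    except KeyError:
        raise KeyError(f"no function registered for {name!r}") from None
    return func(record)


# Business side: the only thing a team writes is the function itself.
@register("video_normalize")
def normalize_video(record: dict) -> dict:
    # Illustrative business logic: trim the title and tag the vertical.
    return {**record, "title": record["title"].strip(), "vertical": "video"}
```

Calling `dispatch("video_normalize", {"title": "  demo  "})` runs the registered function on one record. In a real serverless platform, scaling and container placement would sit behind `dispatch`; this sketch leaves them out.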
Core Design Highlights:
The Data Intelligent Architecture implements multi-layer abstraction and layered reuse. The computation engine consists of a DAG-based graph computation engine and a multi-language operator execution engine that supports Python, GoLang, and C++. The intelligent control system uses a two-stage rule engine that automatically analyzes and recovers from 95% of anomalies, reducing core failures by 60%.
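The article does not describe how the two-stage rule engine works internally. One plausible reading, sketched below purely as an assumption, is that a first stage of rules analyzes an anomaly into a diagnosis and a second stage maps that diagnosis to a recovery action. The metric names, thresholds, and action names are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Anomaly:
    service: str
    metric: str
    value: float


# Stage 1 (analysis): ordered (predicate, diagnosis) pairs. All hypothetical.
ANALYSIS_RULES: List[Tuple[Callable[[Anomaly], bool], str]] = [
    (lambda a: a.metric == "queue_lag_s" and a.value > 300, "backlog"),
    (lambda a: a.metric == "error_rate" and a.value > 0.05, "bad_instance"),
]

# Stage 2 (recovery): diagnosis -> recovery action name. All hypothetical.
RECOVERY_RULES: Dict[str, str] = {
    "backlog": "scale_out",
    "bad_instance": "restart_instance",
}


def analyze(anomaly: Anomaly) -> Optional[str]:
    """Return the first matching diagnosis, or None if no rule matches."""
    for predicate, diagnosis in ANALYSIS_RULES:
        if predicate(anomaly):
            return diagnosis
    return None


def recover(anomaly: Anomaly) -> str:
    """Run both stages; anomalies that match no rule are escalated to a human."""
    diagnosis = analyze(anomaly)
    return RECOVERY_RULES.get(diagnosis, "escalate_to_oncall")
```

Keeping the two stages separate means a new symptom can reuse an existing recovery action, and a recovery policy can change without touching the detection rules.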
The new computing engine achieves 5-10x efficiency improvement for legacy business migrations while further improving business development efficiency through specialized business framework construction.
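To make the DAG-based graph computation engine described above more concrete, here is a minimal sketch of declarative graph execution. It assumes a graph is declared as a mapping from node name to an operator and its upstream nodes, and it uses Python's standard `graphlib.TopologicalSorter` to derive the execution order. The `run_graph` helper and the sample pipeline are hypothetical and do not reflect Baidu's engine.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Each node declares its operator and the names of its upstream nodes.
Graph = Dict[str, Tuple[Callable[..., object], List[str]]]


def run_graph(graph: Graph, source: object) -> Dict[str, object]:
    """Run operators in dependency order; nodes with no upstream receive `source`."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in graph.items()})
    results: Dict[str, object] = {}
    for name in sorter.static_order():  # raises CycleError if the graph has a cycle
        op, deps = graph[name]
        args = [results[d] for d in deps] if deps else [source]
        results[name] = op(*args)
    return results


# The business declares WHAT depends on what; the engine decides the order.
pipeline: Graph = {
    "parse": (lambda raw: raw.split(","), []),
    "dedup": (lambda items: sorted(set(items)), ["parse"]),
    "count": (lambda items: len(items), ["parse"]),
    "report": (lambda uniq, n: {"unique": uniq, "total": n}, ["dedup", "count"]),
}
```

Running `run_graph(pipeline, "a,b,a")` evaluates `parse` first, then `dedup` and `count`, then `report`. This is the sense in which a declarative graph differs from imperative code: the pipeline definition contains no ordering logic of its own.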
Baidu Geek Talk