Deep Learning Model Architecture Evolution in Baidu Search
This article chronicles how Baidu Search's Model Architecture Group has evolved deep-learning-driven search: the shift from inverted-index retrieval to semantic vector indexing, transformer-based models for text and image queries, large-scale offline/online pipelines, and extensive GPU-centric optimizations such as pruning, quantization, and distillation, all aimed at delivering precise, cost-effective results to hundreds of millions of users.
This article provides an in-depth look at the deep learning model architecture evolution in Baidu Search, focusing on the work of the Model Architecture Group within Baidu's Search Architecture Department. The group is dedicated to bringing the latest artificial intelligence technologies to hundreds of millions of Baidu users at lower costs.
The article begins by explaining how deep learning underpins core search functionality, enabling precise answers to queries such as the length of a river rather than just a list of webpage links. It also covers image-based queries, where users ask about the content of an image.
The evolution of search architecture is discussed, highlighting the transition from traditional inverted-index retrieval to semantic indexing. Semantic indexing maps queries to embedding vectors (typically 128 or 256 dimensions) in a space where semantically similar content lies closer together, yielding more relevant results for users.
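The idea of "closer together" in embedding space is usually measured with cosine similarity. A minimal sketch, using random NumPy vectors as stand-ins for embeddings that a real system would produce with a trained encoder (the dimension of 128 follows the article; everything else here is illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 128-dimensional embeddings. A small perturbation of a
# vector stands in for a semantically similar query; an independent
# random vector stands in for an unrelated one.
rng = np.random.default_rng(0)
base = rng.standard_normal(128)
similar = base + 0.1 * rng.standard_normal(128)
unrelated = rng.standard_normal(128)

assert cosine_similarity(base, similar) > cosine_similarity(base, unrelated)
```

In high-dimensional spaces, two unrelated random vectors are nearly orthogonal (similarity near 0), which is why cosine similarity separates related from unrelated content so cleanly.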
The article contrasts search models with recommendation models. Search models typically use transformer structures over text features, with vocabularies under 200,000 entries; they are deep and computation-intensive. Recommendation models ingest large user and item features, with embedding tables reaching terabyte scale, and are characterized by wide but shallow structures.
Key characteristics of search models are outlined: converting text and images to embeddings; processing query-url, title, and content pairs; deep model structures; offline pre-training combined with online multi-stage prediction; and computation-intensive operations that call for heterogeneous hardware.
The article details the application of deep learning models in semantic retrieval paths, covering both offline and online components. Offline processing handles large-scale storage of text embeddings, while online processing uses models like ERNIE to convert user queries into vectors that are compared against the stored embeddings.
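The offline/online split can be sketched as two steps: an offline pass that normalizes and stores document embeddings, and an online pass that scores a query vector against them and returns the top-k matches. This is a brute-force toy on random data (a production system like Baidu's would use an approximate nearest-neighbor index and an encoder such as ERNIE to produce the vectors):

```python
import numpy as np

def build_index(doc_embeddings: np.ndarray) -> np.ndarray:
    """Offline step: L2-normalize document embeddings so that a dot
    product at query time equals cosine similarity."""
    norms = np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    return doc_embeddings / norms

def retrieve(index: np.ndarray, query_vec: np.ndarray, k: int = 2) -> np.ndarray:
    """Online step: score every stored vector against the query and
    return the indices of the top-k most similar documents."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(1)
docs = rng.standard_normal((1000, 128))   # stand-in for stored embeddings
index = build_index(docs)

# Simulate a query whose embedding is close to document 42.
query = docs[42] + 0.05 * rng.standard_normal(128)
top = retrieve(index, query, k=3)
assert top[0] == 42
```

Normalizing offline is a common design choice: it moves work out of the latency-critical online path, leaving only a matrix-vector product and a sort at query time.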
The search online inference system is described, covering three main model categories: intent analysis and query rewriting, relevance and ranking, and classification. The system handles real-time query processing through a multi-stage pipeline that includes caching, dynamic batching, user-defined preprocessing, and prediction queues.
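Dynamic batching is the step in that pipeline that trades a small amount of latency for GPU throughput: requests are held briefly and flushed to the model as one batch once a size or time limit is hit. A minimal single-threaded sketch, with hypothetical limits (the article does not give Baidu's actual parameters):

```python
import time
from collections import deque

class DynamicBatcher:
    """Collect incoming requests until either max_batch_size is reached
    or max_wait_s has elapsed since the first queued request, then flush
    the whole group to the model as a single batch."""

    def __init__(self, max_batch_size: int = 8, max_wait_s: float = 0.005):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.pending = deque()
        self.first_arrival = None

    def submit(self, request):
        """Queue a request; return a full batch if one is ready, else None."""
        now = time.monotonic()
        if not self.pending:
            self.first_arrival = now
        self.pending.append(request)
        if (len(self.pending) >= self.max_batch_size
                or now - self.first_arrival >= self.max_wait_s):
            return self.flush()
        return None

    def flush(self):
        batch = list(self.pending)
        self.pending.clear()
        self.first_arrival = None
        return batch

batcher = DynamicBatcher(max_batch_size=3)
assert batcher.submit("q1") is None
assert batcher.submit("q2") is None
assert batcher.submit("q3") == ["q1", "q2", "q3"]
```

A production batcher runs this logic concurrently with a timer thread so a lone request is not stranded waiting for companions; the size-or-timeout trigger is the essential idea.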
Model optimization practices are discussed, addressing the bottlenecks of GPU models: I/O, CPU, and GPU limits. Strategies cover training optimization (data loading, framework scheduling, kernel fusion and custom kernel development, equivalence-preserving model rewrites), inference optimization (GPU/CPU load balancing, model structure pruning), and model miniaturization (distillation, quantization, pruning).
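Of the miniaturization techniques listed, quantization is the simplest to illustrate. The sketch below shows symmetric per-tensor post-training int8 quantization on random weights; it is a generic illustration of the technique, not Baidu's implementation:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric post-training quantization: map float32 weights to int8
    using a single per-tensor scale factor."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 tensor."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(2)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is 4x smaller than float32, and the per-element
# reconstruction error is bounded by half the quantization step.
err = float(np.abs(dequantize(q, scale) - w).max())
assert q.dtype == np.int8
assert err <= scale / 2 + 1e-6
```

The 4x memory saving also shrinks memory bandwidth per inference, which is often the real bottleneck on GPUs; distillation and pruning attack the same cost from the model-structure side.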
The article concludes by emphasizing the challenging and meaningful work of the Model Architecture Group in bringing AI technologies to users efficiently, and invites interested candidates to join the team.
Baidu Geek Talk