Deep Learning Model Architecture Evolution in Baidu Search
This article chronicles how Baidu Search's Model Architecture Group has evolved deep-learning-driven search: the shift from inverted-index retrieval to semantic vector indexing, transformer-based models for text and image queries, large-scale offline/online pipelines, and extensive GPU-centric optimizations such as pruning, quantization, and distillation, all aimed at delivering precise, cost-effective results to hundreds of millions of users.
This article provides an in-depth look at the deep learning model architecture evolution in Baidu Search, focusing on the work of the Model Architecture Group within Baidu's Search Architecture Department. The group is dedicated to bringing the latest artificial intelligence technologies to hundreds of millions of Baidu users at lower costs.
The article begins by explaining how deep learning underpins core search functionality, enabling precise answers to queries such as the length of a river rather than just a list of webpage links. It also covers image-based queries, where users ask about the content of an image.
The evolution of search architecture is discussed, highlighting the transition from traditional inverted-index retrieval to semantic indexing. Semantic indexing maps queries to embedding vectors (typically 128 or 256 dimensions) in a space where semantically similar content lies closer together, yielding more relevant results for users.
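The idea of "closer together" in embedding space is usually measured with cosine similarity. A minimal sketch, using random NumPy vectors as stand-ins for embeddings that a real system would produce with a trained encoder (the dimension of 128 follows the article; everything else here is illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 128-dimensional embeddings. A small perturbation of a
# vector stands in for a semantically similar query; an independent
# random vector stands in for an unrelated one.
rng = np.random.default_rng(0)
base = rng.standard_normal(128)
similar = base + 0.1 * rng.standard_normal(128)
unrelated = rng.standard_normal(128)

assert cosine_similarity(base, similar) > cosine_similarity(base, unrelated)
```

In high-dimensional spaces, two unrelated random vectors are nearly orthogonal (similarity near 0), which is why cosine similarity separates related from unrelated content so cleanly.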
The article contrasts search models with recommendation models. Search models typically use transformer structures over text features, with vocabularies under 200,000 entries; they are deep and computation-intensive. Recommendation models ingest large user and item features, with embedding tables reaching terabyte scale, and are characterized by wide but shallow structures.
Key characteristics of search models are outlined: converting text and images to embeddings; processing query-url, title, and content pairs; deep model structures; offline pre-training combined with online multi-stage prediction; and computation-intensive operations that call for heterogeneous hardware.
The article details the application of deep learning models in semantic retrieval paths, covering both offline and online components. Offline processing handles large-scale storage of text embeddings, while online processing uses models like ERNIE to convert user queries into vectors that are compared against the stored embeddings.
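The offline/online split can be sketched as two steps: an offline pass that normalizes and stores document embeddings, and an online pass that scores a query vector against them and returns the top-k matches. This is a brute-force toy on random data (a production system like Baidu's would use an approximate nearest-neighbor index and an encoder such as ERNIE to produce the vectors):

```python
import numpy as np

def build_index(doc_embeddings: np.ndarray) -> np.ndarray:
    """Offline step: L2-normalize document embeddings so that a dot
    product at query time equals cosine similarity."""
    norms = np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    return doc_embeddings / norms

def retrieve(index: np.ndarray, query_vec: np.ndarray, k: int = 2) -> np.ndarray:
    """Online step: score every stored vector against the query and
    return the indices of the top-k most similar documents."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(1)
docs = rng.standard_normal((1000, 128))   # stand-in for stored embeddings
index = build_index(docs)

# Simulate a query whose embedding is close to document 42.
query = docs[42] + 0.05 * rng.standard_normal(128)
top = retrieve(index, query, k=3)
assert top[0] == 42
```

Normalizing offline is a common design choice: it moves work out of the latency-critical online path, leaving only a matrix-vector product and a sort at query time.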
The search online inference system is described, covering three main model categories: intent analysis and query rewriting, relevance and ranking, and classification. The system handles real-time query processing through a multi-stage pipeline that includes caching, dynamic batching, user-defined preprocessing, and prediction queues.
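Dynamic batching is the step in that pipeline that trades a small amount of latency for GPU throughput: requests are held briefly and flushed to the model as one batch once a size or time limit is hit. A minimal single-threaded sketch, with hypothetical limits (the article does not give Baidu's actual parameters):

```python
import time
from collections import deque

class DynamicBatcher:
    """Collect incoming requests until either max_batch_size is reached
    or max_wait_s has elapsed since the first queued request, then flush
    the whole group to the model as a single batch."""

    def __init__(self, max_batch_size: int = 8, max_wait_s: float = 0.005):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.pending = deque()
        self.first_arrival = None

    def submit(self, request):
        """Queue a request; return a full batch if one is ready, else None."""
        now = time.monotonic()
        if not self.pending:
            self.first_arrival = now
        self.pending.append(request)
        if (len(self.pending) >= self.max_batch_size
                or now - self.first_arrival >= self.max_wait_s):
            return self.flush()
        return None

    def flush(self):
        batch = list(self.pending)
        self.pending.clear()
        self.first_arrival = None
        return batch

batcher = DynamicBatcher(max_batch_size=3)
assert batcher.submit("q1") is None
assert batcher.submit("q2") is None
assert batcher.submit("q3") == ["q1", "q2", "q3"]
```

A production batcher runs this logic concurrently with a timer thread so a lone request is not stranded waiting for companions; the size-or-timeout trigger is the essential idea.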
Model optimization practices are discussed, addressing the bottlenecks of GPU models: I/O, CPU, and GPU limits. Strategies cover training optimization (data loading, framework scheduling, kernel fusion and custom kernel development, equivalence-preserving model rewrites), inference optimization (GPU/CPU load balancing, model structure pruning), and model miniaturization (distillation, quantization, pruning).
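Of the miniaturization techniques listed, quantization is the simplest to illustrate. The sketch below shows symmetric per-tensor post-training int8 quantization on random weights; it is a generic illustration of the technique, not Baidu's implementation:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric post-training quantization: map float32 weights to int8
    using a single per-tensor scale factor."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 tensor."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(2)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is 4x smaller than float32, and the per-element
# reconstruction error is bounded by half the quantization step.
err = float(np.abs(dequantize(q, scale) - w).max())
assert q.dtype == np.int8
assert err <= scale / 2 + 1e-6
```

The 4x memory saving also shrinks memory bandwidth per inference, which is often the real bottleneck on GPUs; distillation and pruning attack the same cost from the model-structure side.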
The article concludes by emphasizing the challenging and meaningful work of the Model Architecture Group in bringing AI technologies to users efficiently, and invites interested candidates to join the team.
Baidu Geek Talk