
AIAPI: Baidu's AI-Native Retrieval System for Large Language Model Applications

AIAPI, Baidu's AI-native retrieval platform for large language models, tackles hallucination, slow knowledge updates in specialized domains, and opaque model outputs by delivering authoritative, timely, full-content data. Its dual-channel architecture combines traditional search with RAG, layering reusable ranking, graph-enhanced data processing, dynamic caching that cuts storage costs by roughly 70%, and QueryPlan-based QoS control; evaluated with Wenxin 4.0, it achieves markedly higher retrieval quality and a 34% speed gain over standard product search interfaces.


This paper introduces AIAPI, a retrieval system designed to provide AI-native capabilities for large language models (LLMs). The system addresses challenges in retrieval-augmented generation (RAG) scenarios, such as hallucination, slow knowledge updates in specialized domains, and lack of transparency in model outputs.

The authors identify key requirements for AIAPI: high-quality, authoritative, and timely data; complete content retrieval rather than fragmented snippets; structured and interpretable interfaces for better model understanding; low latency to minimize first-token delays; and cost control for multi-round generation processes.

The system architecture employs a dual-channel approach that simultaneously supports traditional search (P requests) and RAG-enhanced retrieval (R requests). This design allows for query plan optimization, resource sharing, and cost reduction while maintaining high retrieval quality. The architecture includes three main layers: a reusable recall and ranking layer, an expanded data layer with graph engine capabilities for traffic isolation and customized processing, and a split presentation layer for different traffic types.
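The paper does not include code, but the dual-channel idea can be illustrated with a minimal sketch: both traffic types share one recall-and-ranking layer, and a split presentation layer then emits snippets for traditional search (P) traffic and full content for RAG (R) traffic. All names here (`Channel`, `Request`, `dispatch`, the toy corpus) are hypothetical, not from AIAPI itself.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    P = "product_search"   # traditional search requests
    R = "rag_retrieval"    # RAG-enhanced retrieval requests


@dataclass
class Request:
    query: str
    channel: Channel


def recall_and_rank(query: str) -> list[str]:
    # Shared recall/ranking layer reused by both channels.
    # A three-document toy corpus stands in for the real index.
    corpus = ["doc about llm retrieval", "doc about caching", "doc about qos"]
    hits = [d for d in corpus if any(w in d for w in query.lower().split())]
    return sorted(hits)


def present(request: Request, hits: list[str]) -> dict:
    # Split presentation layer: truncated snippets for P traffic,
    # complete documents for R traffic.
    if request.channel is Channel.P:
        return {"channel": "P", "snippets": [h[:12] for h in hits]}
    return {"channel": "R", "documents": hits}


def dispatch(request: Request) -> dict:
    # One entry point serves both channels, so recall/ranking
    # resources are shared rather than duplicated.
    hits = recall_and_rank(request.query)
    return present(request, hits)
```

The point of the shared `recall_and_rank` step is the cost argument from the paper: the expensive index work runs once, and only the cheap presentation step diverges per channel.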

Key innovations include complete content acquisition strategies that decouple from presentation logic, dynamic concurrent processing for content customization, and optimized caching schemes that reduce storage costs by approximately 70%. The system provides multi-level QoS control through QueryPlan-based API capabilities, allowing users to select appropriate service levels based on their needs.
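The QueryPlan-based QoS mechanism can be sketched as a mapping from a user-selected service level to concrete retrieval limits. The tier names and parameter values below are invented for illustration; the paper only states that users select service levels through QueryPlan-based API capabilities.

```python
from dataclasses import dataclass

# Hypothetical QoS tiers: each trades latency budget against retrieval depth.
QOS_TIERS = {
    "fast":     {"timeout_ms": 100, "max_docs": 3},
    "balanced": {"timeout_ms": 300, "max_docs": 10},
    "thorough": {"timeout_ms": 800, "max_docs": 30},
}


@dataclass
class QueryPlan:
    query: str
    timeout_ms: int
    max_docs: int


def build_query_plan(query: str, qos: str = "balanced") -> QueryPlan:
    """Translate a user-selected service level into concrete retrieval limits."""
    if qos not in QOS_TIERS:
        raise ValueError(f"unknown QoS tier: {qos}")
    tier = QOS_TIERS[qos]
    return QueryPlan(query=query,
                     timeout_ms=tier["timeout_ms"],
                     max_docs=tier["max_docs"])
```

A caller optimizing for first-token latency would pick the "fast" tier, while a multi-round generation pipeline that can amortize retrieval cost might choose "thorough".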

Evaluation using Baidu's Wenxin 4.0 model demonstrates significant improvements in retrieval quality and a 34% speed improvement compared to standard product search interfaces. The paper concludes by discussing ongoing challenges in RAG systems and future optimization directions as LLMs continue to evolve.

Large Language Models · RAG · Cost Optimization · Retrieval-Augmented Generation · Query Planning · Search Architecture · AI-Native Systems · AIAPI
Written by Baidu Geek Talk