
AIAPI: Baidu's AI-Native Retrieval System for Large Language Model Applications

AIAPI, Baidu's AI-native retrieval platform for large language models, tackles hallucination, slow knowledge updates in specialized domains, and opaque model outputs by delivering authoritative, timely, full-content data. Its dual-channel architecture combines traditional search with RAG, layering reusable ranking, graph-enhanced data processing, dynamic caching that cuts storage costs by roughly 70%, and QueryPlan-based QoS control; evaluated with Wenxin 4.0, it achieves markedly higher retrieval quality and a 34% speed gain over standard product search interfaces.


This paper introduces AIAPI, a retrieval system designed to provide AI-native capabilities for large language models (LLMs). The system addresses challenges in retrieval-augmented generation (RAG) scenarios, such as hallucination, slow knowledge updates in specialized domains, and lack of transparency in model outputs.

The authors identify key requirements for AIAPI: high-quality, authoritative, and timely data; complete content retrieval rather than fragmented snippets; structured and interpretable interfaces for better model understanding; low latency to minimize first-token delays; and cost control for multi-round generation processes.

The system architecture employs a dual-channel approach that simultaneously supports traditional search (P requests) and RAG-enhanced retrieval (R requests). This design allows for query plan optimization, resource sharing, and cost reduction while maintaining high retrieval quality. The architecture includes three main layers: a reusable recall and ranking layer, an expanded data layer with graph engine capabilities for traffic isolation and customized processing, and a split presentation layer for different traffic types.
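The paper does not include code, but the dual-channel idea can be illustrated with a minimal sketch: both traffic types share one recall-and-ranking layer, and a split presentation layer then emits snippets for traditional search (P) traffic and full content for RAG (R) traffic. All names here (`Channel`, `Request`, `dispatch`, the toy corpus) are hypothetical, not from AIAPI itself.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    P = "product_search"   # traditional search requests
    R = "rag_retrieval"    # RAG-enhanced retrieval requests


@dataclass
class Request:
    query: str
    channel: Channel


def recall_and_rank(query: str) -> list[str]:
    # Shared recall/ranking layer reused by both channels.
    # A three-document toy corpus stands in for the real index.
    corpus = ["doc about llm retrieval", "doc about caching", "doc about qos"]
    hits = [d for d in corpus if any(w in d for w in query.lower().split())]
    return sorted(hits)


def present(request: Request, hits: list[str]) -> dict:
    # Split presentation layer: truncated snippets for P traffic,
    # complete documents for R traffic.
    if request.channel is Channel.P:
        return {"channel": "P", "snippets": [h[:12] for h in hits]}
    return {"channel": "R", "documents": hits}


def dispatch(request: Request) -> dict:
    # One entry point serves both channels, so recall/ranking
    # resources are shared rather than duplicated.
    hits = recall_and_rank(request.query)
    return present(request, hits)
```

The point of the shared `recall_and_rank` step is the cost argument from the paper: the expensive index work runs once, and only the cheap presentation step diverges per channel.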

Key innovations include complete content acquisition strategies that decouple from presentation logic, dynamic concurrent processing for content customization, and optimized caching schemes that reduce storage costs by approximately 70%. The system provides multi-level QoS control through QueryPlan-based API capabilities, allowing users to select appropriate service levels based on their needs.
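The QueryPlan-based QoS mechanism can be sketched as a mapping from a user-selected service level to concrete retrieval limits. The tier names and parameter values below are invented for illustration; the paper only states that users select service levels through QueryPlan-based API capabilities.

```python
from dataclasses import dataclass

# Hypothetical QoS tiers: each trades latency budget against retrieval depth.
QOS_TIERS = {
    "fast":     {"timeout_ms": 100, "max_docs": 3},
    "balanced": {"timeout_ms": 300, "max_docs": 10},
    "thorough": {"timeout_ms": 800, "max_docs": 30},
}


@dataclass
class QueryPlan:
    query: str
    timeout_ms: int
    max_docs: int


def build_query_plan(query: str, qos: str = "balanced") -> QueryPlan:
    """Translate a user-selected service level into concrete retrieval limits."""
    if qos not in QOS_TIERS:
        raise ValueError(f"unknown QoS tier: {qos}")
    tier = QOS_TIERS[qos]
    return QueryPlan(query=query,
                     timeout_ms=tier["timeout_ms"],
                     max_docs=tier["max_docs"])
```

A caller optimizing for first-token latency would pick the "fast" tier, while a multi-round generation pipeline that can amortize retrieval cost might choose "thorough".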

Evaluation using Baidu's Wenxin 4.0 model demonstrates significant improvements in retrieval quality and a 34% speed improvement compared to standard product search interfaces. The paper concludes by discussing ongoing challenges in RAG systems and future optimization directions as LLMs continue to evolve.

Large Language Models · RAG · Cost Optimization · Retrieval-Augmented Generation · Query Planning · Search Architecture · AI-Native Systems · AIAPI
Written by Baidu Geek Talk