
Building Efficient RAG Applications with a Small Team: Insights from PingCAP AI Lab

This article details how PingCAP's three‑person AI Lab leveraged Retrieval‑Augmented Generation (RAG) techniques—including basic RAG, fine‑tuned embeddings, re‑ranking, graph RAG, and agent‑based RAG—to create scalable, multilingual document question‑answering services while addressing large‑scale documentation challenges, model limitations, and user feedback loops.


PingCAP’s AI Lab, staffed by a team of no more than three engineers, shares its experience building Retrieval‑Augmented Generation (RAG) applications for the extensive TiDB documentation corpus, which exceeds 15,000 documents across multiple languages.

Business challenges: Users cannot realistically read all of the documentation, leading to incomplete knowledge and long support response times, especially for the growing overseas community.

Basic RAG: Uses large language model (LLM) multi‑turn dialogue to answer queries, but the initial OpenAI embeddings lacked multilingual support and produced off‑topic results, prompting the need for model adjustments.
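The basic RAG loop described above can be sketched without any particular framework: embed the query, retrieve the nearest chunks, and assemble a prompt that carries the retrieved context plus prior dialogue turns. This is a minimal, library‑free sketch; the embedding and LLM calls are placeholders, not PingCAP's actual stack.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=3):
    """index: list of (chunk_text, embedding) pairs; return the k best chunks."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, chunks, history=()):
    """Assemble a multi-turn RAG prompt: retrieved context + prior dialogue."""
    context = "\n---\n".join(chunks)
    turns = "\n".join(f"{role}: {msg}" for role, msg in history)
    return f"Context:\n{context}\n\n{turns}\nUser: {question}\nAssistant:"
```

In production the `index` would live in a vector database and `retrieve` would be a similarity query against it; the structure of the loop is the same.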

Only‑Answer‑TiDB: Implements a toxicity‑style relevance check to filter out non‑TiDB content, ensuring the LLM responds only to relevant database questions.
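One common way to implement such a gate is to run a cheap classification prompt before the main pipeline and refuse off‑topic questions. The sketch below is hypothetical (the prompt wording, `llm` callable, and refusal text are illustrative, not PingCAP's implementation):

```python
GATE_PROMPT = (
    "You are a classifier. Answer exactly 'yes' if the question is about the "
    "TiDB database (SQL, deployment, TiKV, PD, performance, ...), else 'no'.\n"
    "Question: {question}\nAnswer:"
)

REFUSAL = "I can only answer questions about TiDB. Please ask a TiDB-related question."


def answer_with_gate(question, llm):
    """llm: any callable mapping a prompt string to a completion string."""
    verdict = llm(GATE_PROMPT.format(question=question)).strip().lower()
    if not verdict.startswith("yes"):
        return REFUSAL
    # Relevant question: hand off to the normal RAG pipeline.
    return llm(question)
```

Because the gate is just another LLM call, it can be swapped for a small dedicated classifier if latency or cost matters.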

Embedding Model Fine‑tuning: Because OpenAI embeddings are English‑only, the team fine‑tuned a multilingual embedding model using the GenQ method, generating chunk‑question pairs for training with MultipleNegativesRankingLoss and augmenting negative samples automatically.
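The GenQ recipe has two parts: synthesize a question for each documentation chunk (typically with an LLM), then train on the resulting (question, chunk) pairs so that every other chunk in a batch acts as an automatic negative. A sketch using the `sentence-transformers` library follows; the base model name is an assumption, and `gen_question` is a placeholder for the LLM question generator:

```python
def build_genq_pairs(chunks, gen_question):
    """GenQ: synthesize one question per chunk -> list of (question, chunk) pairs."""
    return [(gen_question(chunk), chunk) for chunk in chunks]


def finetune(pairs, base_model="paraphrase-multilingual-MiniLM-L12-v2"):
    """Fine-tune a multilingual embedding model on (question, chunk) pairs.

    Imports are deferred so pair-building above stays dependency-free.
    """
    from sentence_transformers import SentenceTransformer, InputExample, losses
    from torch.utils.data import DataLoader

    model = SentenceTransformer(base_model)
    examples = [InputExample(texts=[q, chunk]) for q, chunk in pairs]
    loader = DataLoader(examples, shuffle=True, batch_size=16)
    # In-batch negatives: every other chunk in the batch is a negative sample,
    # which is the "automatic negative augmentation" mentioned above.
    loss = losses.MultipleNegativesRankingLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
    return model
```

MultipleNegativesRankingLoss is a good fit here precisely because GenQ only produces positive pairs: no manual negative mining is needed.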

ReRank: Addresses poor similarity ranking by storing frequently asked QA pairs in a vector database and performing dual‑space retrieval (the QA space plus the document‑chunk space), selecting the top‑10 results for further LLM processing.
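Dual‑space retrieval can be sketched as scoring candidates from both indexes in one pass, deduplicating, and keeping the top‑10 for the LLM. The function names and merge strategy below are illustrative assumptions, not the talk's exact implementation:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0


def dual_space_retrieve(query, qa_index, doc_index, embed, k=10):
    """Retrieve from both the cached-QA space and the doc-chunk space.

    qa_index / doc_index: lists of (text, embedding) pairs.
    embed: callable mapping the query string to a vector.
    Results from both spaces are merged by score, deduplicated, and cut to k.
    """
    qv = embed(query)
    candidates = [(cosine(qv, vec), text) for text, vec in qa_index + doc_index]
    candidates.sort(reverse=True)
    seen, top = set(), []
    for _score, text in candidates:
        if text not in seen:
            seen.add(text)
            top.append(text)
        if len(top) == k:
            break
    return top
```

A hit in the QA space usually dominates the ranking, so answers to frequently asked questions come back verbatim instead of being re-derived from raw chunks.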

Graph RAG: Constructs a knowledge graph from both entities and chunk summaries, enabling richer context retrieval and visualization of document relationships.
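At its core a graph‑RAG index is a triple store whose nodes are entities or chunk‑summary nodes, queried by expanding the neighborhood around entities mentioned in the question. A minimal sketch (the class and method names are hypothetical):

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny triple store: nodes are entities or chunk-summary nodes."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, subj, rel, obj):
        # Store both directions so neighborhood expansion works either way.
        self.edges[subj].add((rel, obj))
        self.edges[obj].add((f"inverse:{rel}", subj))

    def neighborhood(self, entity, hops=1):
        """Collect facts within `hops` of an entity, as extra LLM context."""
        frontier, facts, seen = {entity}, [], set()
        for _ in range(hops):
            nxt = set()
            for node in frontier:
                for rel, other in self.edges[node]:
                    if (node, rel, other) not in seen:
                        seen.add((node, rel, other))
                        facts.append((node, rel, other))
                        nxt.add(other)
            frontier = nxt
        return facts
```

Widening `hops` trades precision for recall: one hop pulls directly related facts, two hops starts stitching together chunks that never co-occur in any single document.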

Agent RAG: Introduces a multi‑agent pipeline (Planner, Engineer & Executor, Critic) built on Microsoft AutoGen with FSM support, allowing complex diagnostic workflows and API‑driven reasoning.
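The FSM support mentioned here means the group chat is constrained by an allowed‑speaker‑transition table rather than free‑for‑all turn taking. A library‑free sketch of that constraint, using the role names from the talk (the handoff logic and "DONE" sentinel are illustrative assumptions, not AutoGen's API):

```python
# Allowed speaker transitions for the Planner -> Engineer -> Executor -> Critic loop.
TRANSITIONS = {
    "Planner": {"Engineer"},
    "Engineer": {"Executor"},
    "Executor": {"Critic"},
    "Critic": {"Planner", "DONE"},  # Critic either loops back or ends the run
}


def run_pipeline(task, agents, max_rounds=12):
    """agents: dict name -> callable(last_output) -> (output, requested_next).

    Each agent proposes the next speaker; the FSM table rejects illegal handoffs,
    which keeps a diagnostic workflow from wandering off its designed path.
    """
    speaker, output = "Planner", task
    for _ in range(max_rounds):
        output, requested = agents[speaker](output)
        if requested not in TRANSITIONS[speaker]:
            raise ValueError(f"{speaker} may not hand off to {requested}")
        if requested == "DONE":
            break
        speaker = requested
    return output
```

In AutoGen itself the same effect is achieved by passing a speaker-transition graph to the group chat, with an LLM choosing among the *allowed* next speakers.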

Application diagram: Shows the evolution from basic RAG to optimized pipelines, incorporating TiDB relevance checks, fine‑tuned embeddings, re‑ranking, knowledge graphs, and agents to improve answer accuracy and reduce user dislike rates from 34% to under 3%.

Q&A: Discusses handling of image data, title‑based chunking, and the importance of using open‑source tools like LlamaIndex for robust document processing.
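Title‑based chunking splits a document along its headings so each chunk stays on one topic, and prepends the heading path so the chunk is self‑describing when retrieved in isolation. A simplified sketch for markdown sources (LlamaIndex ships more robust node parsers; this is only to show the idea):

```python
import re


def chunk_by_titles(markdown_text):
    """Split markdown on headings; each chunk carries its heading path."""
    chunks, path, buf = [], [], []

    def flush():
        if buf:
            title = " > ".join(t for _, t in path) or "(untitled)"
            chunks.append((title, "\n".join(buf).strip()))
            buf.clear()

    for line in markdown_text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            flush()
            level = len(m.group(1))
            # Pop headings at the same or deeper level before descending.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, m.group(2).strip()))
        else:
            buf.append(line)
    flush()
    return [(title, body) for title, body in chunks if body]
```

Embedding the "H1 > H2" path together with the body is what makes a chunk like "steps" retrievable for a query about deployment even though the body never names the topic.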

Tags: LLM · RAG · fine-tuning · agent · embedding · knowledge graph
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
