
Building Efficient RAG Applications with a Small Team: Insights from PingCAP AI Lab

This article details how PingCAP's three‑person AI Lab leveraged Retrieval‑Augmented Generation (RAG) techniques—including basic RAG, fine‑tuned embeddings, re‑ranking, graph RAG, and agent‑based RAG—to create scalable, multilingual document question‑answering services while addressing large‑scale documentation challenges, model limitations, and user feedback loops.


PingCAP’s AI Lab, staffed by a team of no more than three engineers, shares its experience building Retrieval‑Augmented Generation (RAG) applications for the extensive TiDB documentation corpus, which exceeds 15,000 documents across multiple languages.

Business challenges: Users cannot realistically read all of the documentation, leading to incomplete knowledge and long support response times, especially for the growing overseas community.

Basic RAG: Uses large language model (LLM) multi‑turn dialogue to answer queries, but the initial OpenAI embeddings lacked multilingual support and produced off‑topic results, prompting the need for model adjustments.
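The basic RAG loop described above can be sketched without any particular framework: embed the query, retrieve the nearest chunks, and assemble a prompt that carries the retrieved context plus prior dialogue turns. This is a minimal, library‑free sketch; the embedding and LLM calls are placeholders, not PingCAP's actual stack.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=3):
    """index: list of (chunk_text, embedding) pairs; return the k best chunks."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, chunks, history=()):
    """Assemble a multi-turn RAG prompt: retrieved context + prior dialogue."""
    context = "\n---\n".join(chunks)
    turns = "\n".join(f"{role}: {msg}" for role, msg in history)
    return f"Context:\n{context}\n\n{turns}\nUser: {question}\nAssistant:"
```

In production the `index` would live in a vector database and `retrieve` would be a similarity query against it; the structure of the loop is the same.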

Only‑Answer‑TiDB: Implements a toxicity‑style relevance check to filter out non‑TiDB content, ensuring the LLM responds only to relevant database questions.
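One common way to implement such a gate is to run a cheap classification prompt before the main pipeline and refuse off‑topic questions. The sketch below is hypothetical (the prompt wording, `llm` callable, and refusal text are illustrative, not PingCAP's implementation):

```python
GATE_PROMPT = (
    "You are a classifier. Answer exactly 'yes' if the question is about the "
    "TiDB database (SQL, deployment, TiKV, PD, performance, ...), else 'no'.\n"
    "Question: {question}\nAnswer:"
)

REFUSAL = "I can only answer questions about TiDB. Please ask a TiDB-related question."


def answer_with_gate(question, llm):
    """llm: any callable mapping a prompt string to a completion string."""
    verdict = llm(GATE_PROMPT.format(question=question)).strip().lower()
    if not verdict.startswith("yes"):
        return REFUSAL
    # Relevant question: hand off to the normal RAG pipeline.
    return llm(question)
```

Because the gate is just another LLM call, it can be swapped for a small dedicated classifier if latency or cost matters.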

Embedding Model Fine‑tuning: Because OpenAI embeddings are English‑only, the team fine‑tuned a multilingual embedding model using the GenQ method, generating chunk‑question pairs for training with MultipleNegativesRankingLoss and augmenting negative samples automatically.
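The GenQ recipe has two parts: synthesize a question for each documentation chunk (typically with an LLM), then train on the resulting (question, chunk) pairs so that every other chunk in a batch acts as an automatic negative. A sketch using the `sentence-transformers` library follows; the base model name is an assumption, and `gen_question` is a placeholder for the LLM question generator:

```python
def build_genq_pairs(chunks, gen_question):
    """GenQ: synthesize one question per chunk -> list of (question, chunk) pairs."""
    return [(gen_question(chunk), chunk) for chunk in chunks]


def finetune(pairs, base_model="paraphrase-multilingual-MiniLM-L12-v2"):
    """Fine-tune a multilingual embedding model on (question, chunk) pairs.

    Imports are deferred so pair-building above stays dependency-free.
    """
    from sentence_transformers import SentenceTransformer, InputExample, losses
    from torch.utils.data import DataLoader

    model = SentenceTransformer(base_model)
    examples = [InputExample(texts=[q, chunk]) for q, chunk in pairs]
    loader = DataLoader(examples, shuffle=True, batch_size=16)
    # In-batch negatives: every other chunk in the batch is a negative sample,
    # which is the "automatic negative augmentation" mentioned above.
    loss = losses.MultipleNegativesRankingLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
    return model
```

MultipleNegativesRankingLoss is a good fit here precisely because GenQ only produces positive pairs: no manual negative mining is needed.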

ReRank: Addresses poor similarity ranking by storing frequently asked QA pairs in a vector database and performing dual‑space retrieval (the QA space plus the document‑chunk space), selecting the top‑10 results for further LLM processing.
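Dual‑space retrieval can be sketched as scoring candidates from both indexes in one pass, deduplicating, and keeping the top‑10 for the LLM. The function names and merge strategy below are illustrative assumptions, not the talk's exact implementation:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0


def dual_space_retrieve(query, qa_index, doc_index, embed, k=10):
    """Retrieve from both the cached-QA space and the doc-chunk space.

    qa_index / doc_index: lists of (text, embedding) pairs.
    embed: callable mapping the query string to a vector.
    Results from both spaces are merged by score, deduplicated, and cut to k.
    """
    qv = embed(query)
    candidates = [(cosine(qv, vec), text) for text, vec in qa_index + doc_index]
    candidates.sort(reverse=True)
    seen, top = set(), []
    for _score, text in candidates:
        if text not in seen:
            seen.add(text)
            top.append(text)
        if len(top) == k:
            break
    return top
```

A hit in the QA space usually dominates the ranking, so answers to frequently asked questions come back verbatim instead of being re-derived from raw chunks.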

Graph RAG: Constructs a knowledge graph from both entities and chunk summaries, enabling richer context retrieval and visualization of document relationships.
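At its core a graph‑RAG index is a triple store whose nodes are entities or chunk‑summary nodes, queried by expanding the neighborhood around entities mentioned in the question. A minimal sketch (the class and method names are hypothetical):

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny triple store: nodes are entities or chunk-summary nodes."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, subj, rel, obj):
        # Store both directions so neighborhood expansion works either way.
        self.edges[subj].add((rel, obj))
        self.edges[obj].add((f"inverse:{rel}", subj))

    def neighborhood(self, entity, hops=1):
        """Collect facts within `hops` of an entity, as extra LLM context."""
        frontier, facts, seen = {entity}, [], set()
        for _ in range(hops):
            nxt = set()
            for node in frontier:
                for rel, other in self.edges[node]:
                    if (node, rel, other) not in seen:
                        seen.add((node, rel, other))
                        facts.append((node, rel, other))
                        nxt.add(other)
            frontier = nxt
        return facts
```

Widening `hops` trades precision for recall: one hop pulls directly related facts, two hops starts stitching together chunks that never co-occur in any single document.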

Agent RAG: Introduces a multi‑agent pipeline (Planner, Engineer & Executor, Critic) built on Microsoft AutoGen with FSM support, allowing complex diagnostic workflows and API‑driven reasoning.
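The FSM support mentioned here means the group chat is constrained by an allowed‑speaker‑transition table rather than free‑for‑all turn taking. A library‑free sketch of that constraint, using the role names from the talk (the handoff logic and "DONE" sentinel are illustrative assumptions, not AutoGen's API):

```python
# Allowed speaker transitions for the Planner -> Engineer -> Executor -> Critic loop.
TRANSITIONS = {
    "Planner": {"Engineer"},
    "Engineer": {"Executor"},
    "Executor": {"Critic"},
    "Critic": {"Planner", "DONE"},  # Critic either loops back or ends the run
}


def run_pipeline(task, agents, max_rounds=12):
    """agents: dict name -> callable(last_output) -> (output, requested_next).

    Each agent proposes the next speaker; the FSM table rejects illegal handoffs,
    which keeps a diagnostic workflow from wandering off its designed path.
    """
    speaker, output = "Planner", task
    for _ in range(max_rounds):
        output, requested = agents[speaker](output)
        if requested not in TRANSITIONS[speaker]:
            raise ValueError(f"{speaker} may not hand off to {requested}")
        if requested == "DONE":
            break
        speaker = requested
    return output
```

In AutoGen itself the same effect is achieved by passing a speaker-transition graph to the group chat, with an LLM choosing among the *allowed* next speakers.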

Application diagram: Shows the evolution from basic RAG to optimized pipelines, incorporating TiDB relevance checks, fine‑tuned embeddings, re‑ranking, knowledge graphs, and agents to improve answer accuracy and reduce user dislike rates from 34% to under 3%.

Q&A: Discusses handling of image data, title‑based chunking, and the importance of using open‑source tools like LlamaIndex for robust document processing.
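Title‑based chunking splits a document along its headings so each chunk stays on one topic, and prepends the heading path so the chunk is self‑describing when retrieved in isolation. A simplified sketch for markdown sources (LlamaIndex ships more robust node parsers; this is only to show the idea):

```python
import re


def chunk_by_titles(markdown_text):
    """Split markdown on headings; each chunk carries its heading path."""
    chunks, path, buf = [], [], []

    def flush():
        if buf:
            title = " > ".join(t for _, t in path) or "(untitled)"
            chunks.append((title, "\n".join(buf).strip()))
            buf.clear()

    for line in markdown_text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            flush()
            level = len(m.group(1))
            # Pop headings at the same or deeper level before descending.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, m.group(2).strip()))
        else:
            buf.append(line)
    flush()
    return [(title, body) for title, body in chunks if body]
```

Embedding the "H1 > H2" path together with the body is what makes a chunk like "steps" retrievable for a query about deployment even though the body never names the topic.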

Tags: LLM · RAG · fine-tuning · agent · embedding · knowledge graph
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
