
Mastering RAG and AI Agents: Practical Tips, Code Samples, and Evaluation Strategies

This comprehensive guide walks you through the fundamentals of Retrieval‑Augmented Generation (RAG) and AI agents, explains their inner workings, shares optimization tricks, provides ready‑to‑run code snippets, and demonstrates how to evaluate performance with metrics such as recall, faithfulness, and answer relevance.

Tencent Technical Engineering

1 AI Memory Keeper and Knowledge Guide

1.1 Hi, I’m RAG! I refuse to let AI "hallucinate"

When AI needs reliable knowledge instead of fabricating answers, that’s when I shine. I am Retrieval‑Augmented Generation (RAG), a framework that first queries a knowledge base for relevant documents and then feeds those pieces to a large language model (LLM) so the final answer is grounded in facts.

1.2 My Core Skill: Powerful "Dictionary" Ability

I combine a retrieval system with a generative AI model. When a user asks a question, I turn it into a query vector, search the knowledge base, retrieve the most relevant snippets, and give them to the LLM as context: essentially, I provide the model with a private library and an efficient librarian.
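The retrieve-then-generate loop can be sketched end to end. This is a toy illustration: the bag-of-words "embedding" and cosine ranking below stand in for a real neural encoder and vector index, and the assembled prompt would then be sent to an LLM.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system uses a neural encoder
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query and keep the top k
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    # Ground the LLM in retrieved snippets instead of letting it guess
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "RAG grounds LLM answers in documents retrieved from a knowledge base.",
    "FAISS is a library for efficient vector similarity search.",
    "The Eiffel Tower is in Paris.",
]
print(build_prompt("What does RAG do?", docs))
```

Swapping `embed` for a sentence-transformer and `retrieve` for a FAISS or Chroma lookup turns this skeleton into a production pipeline without changing its shape.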

1.3 My Growth Path: From Rookie to Expert

Initially my recall was only 25.4% and faithfulness 27.2%. I improved by:

Knowledge structuring: converting unstructured text into clean, structured data.

Knowledge update mechanisms: periodic refreshes and automated quality checks.

Redundancy elimination: removing duplicate or outdated entries.
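Of the three improvements, redundancy elimination is the easiest to sketch. The Jaccard-similarity check below is my simplification; production systems typically use embedding similarity or MinHash, but the dedup logic is the same.

```python
def jaccard(a, b):
    # Word-overlap similarity between two text entries, in [0, 1]
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def deduplicate(entries, threshold=0.8):
    kept = []
    for entry in entries:
        # Keep an entry only if it is not too similar to anything already kept
        if all(jaccard(entry, k) < threshold for k in kept):
            kept.append(entry)
    return kept

entries = [
    "RAG grounds LLM answers in retrieved documents.",
    "RAG grounds LLM answers in retrieved documents.",  # exact duplicate
    "Knowledge bases need periodic refreshes.",
]
print(deduplicate(entries))  # the duplicate is dropped
```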

1.4 Document Parsing (PDF)

Good parsing is crucial; otherwise the retrieved content will be noisy. Open‑source tools such as MinerU, Marker, MarkItDown, and Docling are compared, with MinerU highlighted for its Chinese‑language support and complete documentation.

2 Definition and History of Agents

2.1 OpenAI’s AGI Five‑Level Classification of Agents

Level 1: Conversational AI – can chat but not act.

Level 2: Reasoners – can solve academic problems.

Level 3: Agents – can plan and execute tasks autonomously.

Level 4: Innovators – can generate novel ideas.

Level 5: Organizations – can do the work of an entire organization.

2.2 Core Definition of an Agent

An agent perceives (senses), decides (reasoning), acts (execution), and learns (memory). It can be thought of as a digital all‑rounder that fetches information, makes decisions, and performs actions.
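The perceive–decide–act–learn cycle can be made concrete with a deliberately tiny example. The thermostat agent below is a hypothetical illustration (not from the original text): it senses a temperature, reasons about it, acts, and records its actions as a simple memory.

```python
class ThermostatAgent:
    """Toy agent showing the perceive -> decide -> act -> learn cycle."""

    def __init__(self, target=21.0):
        self.target = target
        self.memory = []  # "learn": keep a trace of past actions

    def perceive(self, reading):
        # Sense the environment
        return {"temp": reading}

    def decide(self, state):
        # Reason about what to do
        if state["temp"] < self.target - 1:
            return "heat"
        if state["temp"] > self.target + 1:
            return "cool"
        return "idle"

    def act(self, action):
        # Execute the chosen action and remember it
        self.memory.append(action)
        return action

agent = ThermostatAgent()
for reading in [18.0, 21.2, 24.5]:
    agent.act(agent.decide(agent.perceive(reading)))
print(agent.memory)  # -> ['heat', 'idle', 'cool']
```

An LLM-based agent replaces the hand-written `decide` with a model call and the list-based `memory` with a context window plus an external store, but the loop is identical.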

3 Principles of Large‑Model‑Based Agents

3.1 Planning

Agents break complex goals into sub‑goals. Common techniques include prompting the LLM with "Steps for X" or using explicit planning languages such as PDDL (Planning Domain Definition Language). The workflow typically involves translating the user query to PDDL, calling a classical planner, and translating the plan back to natural language.
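The translate → plan → translate-back workflow might be stubbed as follows. Both `query_to_pddl` and `classical_planner` are placeholders of my own (an LLM prompt and an external planner such as Fast Downward would fill these roles), not real integrations.

```python
def query_to_pddl(query):
    # Stub: in the real workflow, an LLM prompt performs this translation
    return "(:goal (delivered package_1 office))"

def classical_planner(problem_pddl):
    # Stub standing in for a classical planner (e.g. Fast Downward)
    return ["(pickup package_1)", "(move depot office)", "(drop package_1)"]

def plan_to_natural_language(plan):
    # Translate the symbolic plan back into readable steps
    return " -> ".join(step.strip("()") for step in plan)

plan = classical_planner(query_to_pddl("Deliver package 1 to the office"))
print(plan_to_natural_language(plan))
# -> pickup package_1 -> move depot office -> drop package_1
```

The appeal of this split is that the planner guarantees a valid plan, while the LLM only handles the two translation steps it is good at.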

3.2 Memory

Short‑term memory is the context window of the LLM; long‑term memory is realized via external vector stores (e.g., Chroma, FAISS, HNSW) that can be queried later. Human‑like memory types (sensory, short‑term, long‑term, explicit vs. implicit) are discussed.
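A minimal sketch of the two memory tiers: a bounded deque stands in for the context window, and a keyword scan stands in for vector-similarity search over a store like Chroma or FAISS.

```python
from collections import deque

class AgentMemory:
    def __init__(self, window=2):
        self.short_term = deque(maxlen=window)  # bounded, like a context window
        self.long_term = []                     # external store, unbounded

    def remember(self, message):
        self.short_term.append(message)  # oldest entries fall out automatically
        self.long_term.append(message)

    def recall(self, keyword):
        # Keyword scan standing in for vector search over Chroma/FAISS
        return [m for m in self.long_term if keyword.lower() in m.lower()]

mem = AgentMemory(window=2)
for msg in ["user prefers Python", "meeting at 3pm", "deploy on Friday"]:
    mem.remember(msg)
print(list(mem.short_term))  # only the 2 most recent messages survive
print(mem.recall("python"))  # older facts remain retrievable
```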

3.3 Tool Use

Agents can call external tools (APIs, code interpreters, search engines). Frameworks such as ReAct combine reasoning and action, while Reflexion adds self‑reflection and dynamic memory. Tool‑use agents include MRKL systems, CRITIC, code‑generation agents, and observation‑based agents.

3.4 Autonomous vs. Workflow Systems

Workflow systems follow a predefined sequence of LLM‑tool calls, suitable for well‑defined tasks. Autonomous agents let the LLM decide dynamically which tool to use and when, enabling flexible, long‑horizon planning.
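The contrast can be shown with two toy drivers over the same tool set. The keyword routing in `autonomous_agent` is a stand-in for letting an LLM choose the tool at run time; the tool names are illustrative only.

```python
TOOLS = {
    "search": lambda q: f"search results for '{q}'",
    "add": lambda q: str(sum(int(n) for n in q.split("+"))),  # toy calculator
}

def workflow_system(query):
    # Workflow: a fixed, predefined sequence (always search, then summarize)
    return f"summary of {TOOLS['search'](query)}"

def autonomous_agent(query):
    # Autonomous: the agent picks a tool at run time;
    # this keyword check stands in for an LLM's decision
    tool = "add" if "+" in query else "search"
    return TOOLS[tool](query)

print(workflow_system("RAG evaluation"))  # always runs the same pipeline
print(autonomous_agent("2+3"))            # -> 5
```

The workflow version is predictable and cheap to debug; the autonomous version trades that predictability for flexibility on open-ended, long-horizon tasks.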

4 Agent Classifications

4.1 By Quantity

Single‑Agent: one agent handles the whole task. Multi‑Agent: multiple specialized agents collaborate, exchange information, and make collective decisions.

4.2 By Behavior Pattern

Tool‑Use Agents (e.g., MRKL, CRITIC), Code‑Generation Agents, Observation‑Based Agents, RAG Agents, etc.

5 Agent Evaluation Frameworks

AgentBench evaluates agents across eight real‑world scenarios grouped into three categories: Coding (code generation, database queries), Gaming (card games, puzzles), and Web (shopping, browsing). Metrics such as success rate are reported.

6 Practical Implementations

(1) Simple OpenAI‑based Agent

```python
from openai import OpenAI
import json

client = OpenAI(api_key="YOUR_API_KEY")

# Define a tool the model may call to fetch a stock price
tools = [{"type": "function", "function": {
    "name": "get_stock_price",
    "description": "Get the current price of a given stock.",
    "parameters": {"type": "object",
                   "properties": {"symbol": {"type": "string", "description": "Stock symbol, e.g. AAPL, GOOGL"}},
                   "required": ["symbol"],
                   "additionalProperties": False},  # Python False, not JSON false
    "strict": True}}]

messages = [{"role": "user", "content": "What's the current price of Apple stock?"}]
completion = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)

message = completion.choices[0].message
if message.tool_calls:  # the model asked to call the tool instead of answering
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```

(2) LangChain ReAct Agent

```python
from langchain.agents import Tool, initialize_agent, AgentType
from langchain.llms import OpenAI
import os

os.environ["OPENAI_API_KEY"] = "your_api_key"
llm = OpenAI(temperature=0)

# Translation tool (stubbed); ReAct tools receive a single string input
def translate_text(text):
    return f"Translated text: {text}"

translate_tool = Tool(name="Translator", func=translate_text,
                      description="Useful for translating text. Input: the text and the target language.")

# Weather tool (stubbed)
def get_weather(location):
    return f"Weather in {location}: Sunny, high 75°F, low 60°F."

weather_tool = Tool(name="WeatherChecker", func=get_weather,
                    description="Useful for getting the current weather. Input: a location name.")

tools = [translate_tool, weather_tool]
react_agent = initialize_agent(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, tools=tools, llm=llm, verbose=True)
print(react_agent.run("What's the weather like in New York?"))
print(react_agent.run("Translate 'Hello' to Spanish."))
```

(3) Manus & OpenManus

Manus is a multi‑agent system that first uses a PlanningTool to split a complex task into linear sub‑tasks, then assigns each sub‑task to a specialized agent that follows a ReAct loop (Reason → Act). Progress is recorded in a TODO.md file. OpenManus builds on Manus with a minimal plug‑in architecture, emphasizing prompt‑driven reasoning and tool‑driven actions.

