AI Agent Deep Dive: Understanding Planning, Memory, Tools, and Action

This article revisits the AI agent architecture and analyzes its four core components, Planning, Memory, Tools, and Action. It covers mainstream planning strategies, memory types, tool specifications, and execution loops, and uses concrete LangChain code examples to build up to a fully integrated multi‑component agent.

Coder Trainee

Agent Core Architecture Review

The agent consists of four essential components: Planning, Memory, Tools, and Action. Planning decides what to do, Memory stores context, Tools provide external capabilities, and Action executes the plan.

┌─────────────────────────────────────────────────────────────┐
│                 AI Agent Core Architecture                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │                        Agent                        │   │
│   │   ┌──────────┐     ┌──────────┐     ┌──────────┐    │   │
│   │   │ Planning │     │  Memory  │     │  Tools   │    │   │
│   │   └────┬─────┘     └────┬─────┘     └────┬─────┘    │   │
│   │        │                │                │          │   │
│   │        └────────────────┼────────────────┘          │   │
│   │                         ▼                           │   │
│   │                   ┌────────────┐                    │   │
│   │                   │   Action   │                    │   │
│   │                   └────────────┘                    │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
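To make the division of labor concrete, here is a minimal plain-Python sketch of how the four components could be wired together. It is an illustration under simplifying assumptions, not LangChain code: `MiniAgent` and its fields are hypothetical names, the planner is a hard-coded function standing in for an LLM, and memory is just a list of past results.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MiniAgent:
    """Toy wiring of Planning, Memory, Tools, and Action (all names are illustrative)."""
    planner: Callable[[str, List[str]], Tuple[str, str]]  # Planning: (task, memory) -> (tool, argument)
    tools: Dict[str, Callable[[str], str]]                 # Tools: tool name -> function
    memory: List[str] = field(default_factory=list)        # Memory: records of past steps

    def act(self, task: str) -> str:
        # Planning chooses a tool and an argument from the task plus remembered context
        tool_name, arg = self.planner(task, self.memory)
        # Action executes the chosen tool
        result = self.tools[tool_name](arg)
        # Memory records the outcome so later turns can use it
        self.memory.append(f"{tool_name}({arg}) -> {result}")
        return result


# Usage with a hard-coded planner standing in for an LLM
agent = MiniAgent(
    planner=lambda task, memory: ("get_weather", "Beijing"),
    tools={"get_weather": lambda city: f"{city}: sunny, 25°C"},
)
print(agent.act("What's the weather in Beijing?"))  # Beijing: sunny, 25°C
```

In a real agent the planner would be an LLM call that reads the task and the memory, and the single `act` step would run inside a loop such as the ReAct cycle.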

Planning (Agent's Brain)

Definition

Planning is the decision‑making core responsible for understanding user intent, decomposing complex tasks, deciding execution steps, and selecting appropriate tools.

Common Planning Modes

ReAct (Reason + Act): a Thought → Action → Observation loop; suited to general tool-using tasks.

CoT (Chain of Thought): step‑by‑step reasoning; suited to complex reasoning problems.

ToT (Tree of Thoughts): explores multiple reasoning paths and selects the best one; suited to tasks that benefit from search, such as creative or puzzle-like problems.

Plan‑and‑Solve: plan all steps first, then execute them; suited to multi‑step tasks.
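The main difference between ReAct and Plan-and-Solve is when planning happens: ReAct decides one step at a time after each observation, while Plan-and-Solve produces the whole step list up front. The sketch below shows only the Plan-and-Solve control flow with plain Python callables; `plan_and_solve`, `toy_plan`, and `toy_solve` are hypothetical names, and the two toy functions stand in for LLM calls.

```python
from typing import Callable, List


def plan_and_solve(task: str,
                   plan: Callable[[str], List[str]],
                   solve_step: Callable[[str, List[str]], str]) -> List[str]:
    """Plan-and-Solve: build the full step list first, then execute the steps in order."""
    steps = plan(task)                  # 1. Plan: decompose the task up front
    results: List[str] = []
    for step in steps:                  # 2. Solve: run each step, passing earlier results along
        results.append(solve_step(step, results))
    return results


# Hypothetical stand-ins for LLM calls
def toy_plan(task: str) -> List[str]:
    return [part.strip() for part in task.split(" and ")]


def toy_solve(step: str, previous: List[str]) -> str:
    return f"step {len(previous) + 1}: {step}"


print(plan_and_solve("check the weather and compute 25+36", toy_plan, toy_solve))
# ['step 1: check the weather', 'step 2: compute 25+36']
```

Because the plan is fixed before execution, this pattern is cheaper in LLM calls but cannot react to a surprising observation unless a re-planning step is added.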

ReAct Mode Detailed Flow

ReAct Execution Flow
─────────────────────────────────────────────────────────────────
User: "Check the weather in Beijing and tell me whether I need an umbrella."

┌─────────────────────────────────────────────────────────────────
│ Thought: The user wants Beijing's weather and umbrella advice.
│ Action: get_weather
│ Action Input: {"city": "Beijing"}
│ Observation: Beijing: sunny today, 25°C, humidity 45%
│ Thought: It's sunny, so no umbrella is needed, but sun protection is worth suggesting.
│ Final Answer: Beijing is sunny today at 25°C. You don't need an umbrella, but sun protection is recommended.
└─────────────────────────────────────────────────────────────────
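Each ReAct turn is plain text, so the agent runtime has to parse the model's output to decide whether to call a tool or stop. The sketch below is a simplified, hypothetical parser (`parse_react_step` is not LangChain's own output parser): it treats a `Final Answer:` line as the end of the loop, otherwise extracts the `Action` and `Action Input` lines and decodes the input as JSON when possible.

```python
import json
import re
from typing import Any, Tuple


def parse_react_step(text: str) -> Tuple[str, Any]:
    """Return ("final", answer) or (tool_name, tool_input) for one ReAct turn."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        # This sketch simply prefers a final answer if one is present
        return "final", final.group(1).strip()
    action = re.search(r"Action:\s*(.*)", text)
    action_input = re.search(r"Action Input:\s*(.*)", text)
    if not action or not action_input:
        raise ValueError("Could not parse a ReAct step from the model output")
    raw = action_input.group(1).strip()
    try:
        args = json.loads(raw)          # JSON inputs such as {"city": "Beijing"}
    except json.JSONDecodeError:
        args = raw                      # plain-string inputs such as 25+36
    return action.group(1).strip(), args


step = 'Thought: The user wants the weather\nAction: get_weather\nAction Input: {"city": "Beijing"}'
print(parse_react_step(step))  # ('get_weather', {'city': 'Beijing'})
```

Production parsers typically handle more edge cases, for example outputs that contain both an action and a final answer, or malformed tool inputs.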

ReAct Agent Implementation (LangChain)

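# react_agent.py
from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools import tool
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"{city}: sunny, 25°C"  # stub data for the demo

@tool
def get_uv_index(city: str) -> str:
    """Get the UV index for a city."""
    return f"{city}: UV index 7 (high)"  # stub data for the demo

tools = [get_weather, get_uv_index]
llm = ChatOpenAI(model="gpt-4", temperature=0)

# create_react_agent expects a prompt template (not a plain string) that
# contains {tools}, {tool_names}, and {agent_scratchpad}
REACT_PROMPT = PromptTemplate.from_template("""
You are a helpful assistant that works in ReAct mode.

Available tools: {tools}
Tool names: {tool_names}

Format:
Thought: your reasoning
Action: the tool to use, one of [{tool_names}]
Action Input: the input to the tool
Observation: the tool result
(repeat Thought/Action/Action Input/Observation as needed)
Final Answer: the final answer

Question: {input}
{agent_scratchpad}
""")

agent = create_react_agent(llm, tools, REACT_PROMPT)
# handle_parsing_errors lets the agent recover when the LLM breaks the format
executor = AgentExecutor(agent=agent, tools=tools, handle_parsing_errors=True)
result = executor.invoke({"input": "What's the weather in Beijing? Do I need sun protection?"})
print(result["output"])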

Memory (Agent's Context and Knowledge)

Memory Types

Short‑Term Memory: the current conversation context, stored in a buffer.

Long‑Term Memory: cross‑session knowledge, stored in a vector database and retrieved via Retrieval‑Augmented Generation (RAG).

Working Memory: variables or a state machine that track the state of the current task.

Semantic Memory: common‑sense and world knowledge encoded in the model's parameters.
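Short-term memory grows with every turn, so it is often capped. A minimal sketch, assuming a fixed window of the last k exchanges (the same idea as LangChain's `ConversationBufferWindowMemory`, though `WindowMemory` here is a hypothetical stand-alone class): a `deque` with `maxlen` drops the oldest turn automatically, and the remaining turns are rendered as text for the next prompt.

```python
from collections import deque
from typing import Deque, List, Tuple


class WindowMemory:
    """Short-term memory that keeps only the last k (user, assistant) exchanges."""

    def __init__(self, k: int = 3):
        # deque(maxlen=k) discards the oldest exchange once k are stored
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def as_prompt(self) -> str:
        """Render the buffered exchanges as plain text to prepend to the next prompt."""
        lines: List[str] = []
        for user, assistant in self.turns:
            lines.append(f"Human: {user}")
            lines.append(f"AI: {assistant}")
        return "\n".join(lines)


memory = WindowMemory(k=2)
memory.save("My name is Zhang San", "Nice to meet you, Zhang San.")
memory.save("What's the weather in Beijing?", "Sunny, 25°C.")
memory.save("Do I need an umbrella?", "No.")
print(memory.as_prompt())  # only the last two exchanges survive
```

The trade-off is that anything older than the window, such as the user's name here, is forgotten; facts that must persist across many turns belong in long-term memory instead.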

Short‑Term Memory Example

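# memory_agent.py
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0)

# The buffer stores every turn and injects it into the chain's {history} prompt variable.
# Note: recent LangChain releases deprecate ConversationChain in favor of RunnableWithMessageHistory.
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory, verbose=True)

print(conversation.predict(input="My name is Zhang San"))
print(conversation.predict(input="What's my name?"))  # answered from the buffered history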

Long‑Term Memory with RAG

RAG Workflow
──────────────────────────────────────────────────────────────────────────────────
User query → Vectorize → Retrieve similar docs → Augment prompt → Generate answer
                                   ▲
                                   │ similarity search
                            ┌─────────────┐          ┌────────────────┐
                            │  Vector DB  │ ◄─index─ │ Knowledge base │
                            │  (Chroma)   │          │   (PDF / MD)   │
                            └─────────────┘          └────────────────┘

# rag_agent.py
from langchain.chains import RetrievalQA
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-4", temperature=0)

# 1. Load documents
with open("knowledge.txt", encoding="utf-8") as f:
    documents = f.read()

# 2. Split into overlapping chunks (sizes are measured in characters)
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_text(documents)

# 3. Embed the chunks and store them in Chroma
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_texts(texts, embeddings)

# 4. Retrieve the 3 most similar chunks and generate an answer from them
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
result = qa_chain.invoke({"query": "How is the company's annual leave calculated?"})
print(result["result"])
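The LangChain pipeline above hides each RAG stage behind library calls. To show the same vectorize → retrieve → augment steps without any external service, here is a toy sketch that substitutes a bag-of-words count vector for a real embedding model and a Python list for the vector database. The function names and the three knowledge snippets are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (a real system uses an embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Augment the question with the top-k retrieved chunks."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


knowledge = [  # made-up example chunks
    "Annual leave is calculated from years of service.",
    "The office cafeteria opens at 8 am.",
    "Expense reports are filed through the finance portal.",
]
print(build_prompt("How is annual leave calculated?", knowledge, k=1))
```

Swapping `embed` for a real embedding model and the list for a vector store such as Chroma gives the structure of the LangChain example; the augmented prompt would then be sent to the LLM to generate the answer.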

Tools (Agent's Hands)

Definition

Tools are interfaces that let an agent interact with the external world, such as calling APIs, executing code, operating files, or invoking other services.

Tool Definition Specification (LangChain)

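from langchain.tools import tool

@tool
def send_email(recipient: str, subject: str, content: str) -> str:
    """Send an email to the specified address."""
    # The docstring becomes the tool description the LLM sees, and the
    # type-annotated arguments become the tool's input schema.
    # A real implementation would call an email API here.
    return f"Email sent to {recipient}"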
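The `@tool` decorator relies on the function's name, docstring, and type-annotated arguments to tell the model what the tool does and which inputs it takes. The following plain-Python sketch is not LangChain's implementation: `function_schema` is a hypothetical helper that handles only simple scalar types, building a JSON-Schema-style description of `send_email` in the general shape used by OpenAI-style function calling.

```python
import inspect
from typing import Any, Callable, Dict, List, get_type_hints

# Map simple Python annotations to JSON Schema types; anything else falls back to "string"
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_schema(fn: Callable) -> Dict[str, Any]:
    """Build a function-calling style schema from a function's name, docstring, and type hints."""
    hints = get_type_hints(fn)
    props: Dict[str, Any] = {}
    required: List[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        props[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)   # parameters without defaults are required
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": props, "required": required},
    }


def send_email(recipient: str, subject: str, content: str) -> str:
    """Send an email to the specified address."""
    return f"Email sent to {recipient}"


print(function_schema(send_email))
```

This is why a clear docstring and precise type hints matter: they are essentially all the model knows about the tool when deciding whether and how to call it.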

Common Tool Types

Search: Google Search, Bing – real‑time information retrieval.

Compute: Calculator, Code Interpreter – numeric calculation or script execution.

Network: HTTP Request, API Call – external service invocation.

File: Read/Write/Delete – local file operations.

Database: SQL Query – data query operations.
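Compute tools deserve extra care: passing model-generated text to Python's `eval()` can execute arbitrary code. A minimal sketch of a safer calculator, assuming only basic arithmetic is needed: parse the expression with the standard `ast` module and evaluate only whitelisted number and operator nodes. `safe_calculate` is a hypothetical name; exponentiation is deliberately left out because expressions like `9**9**9` can tie up the process.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}


def safe_calculate(expr: str) -> float:
    """Evaluate a pure arithmetic expression without calling eval()."""
    def _eval(node: ast.AST):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here
        raise ValueError(f"Unsupported expression: {expr!r}")
    return _eval(ast.parse(expr, mode="eval"))


print(safe_calculate("100+200"))      # 300
print(safe_calculate("25 + 36 * 2"))  # 97
```

Wrapping this function with `@tool` (returning `str(...)` of the result) would give an agent a calculator that rejects anything other than numbers, `+ - * /`, and unary signs.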

Action (Agent's Execution Power)

Action Types

Tool Call: execute a specific operation (e.g., send an email, query a database).

Information Output: return the final answer to the user.

Self‑Reflection: evaluate the result and decide whether to continue.

Task Decomposition: break a task into sub‑tasks.
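One way to read these four action types is as branches of a dispatcher that the agent loop calls after each planning step. The sketch below is hypothetical (`AgentAction`, `Dispatcher`, and the `kind` strings are illustrative names), and its reflection branch is a placeholder rule rather than an LLM critique.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentAction:
    kind: str          # "tool_call" | "final_answer" | "reflect" | "decompose"
    payload: str
    tool: str = ""


@dataclass
class Dispatcher:
    tools: Dict[str, Callable[[str], str]]
    queue: List[str] = field(default_factory=list)  # pending sub-tasks
    log: List[str] = field(default_factory=list)    # results of handled actions

    def handle(self, action: AgentAction) -> str:
        if action.kind == "tool_call":           # Tool Call
            result = self.tools[action.tool](action.payload)
        elif action.kind == "final_answer":      # Information Output
            result = action.payload
        elif action.kind == "reflect":           # Self-Reflection (placeholder rule)
            result = "retry" if not self.log or "error" in self.log[-1] else "continue"
        elif action.kind == "decompose":         # Task Decomposition
            self.queue.extend(s.strip() for s in action.payload.split(";"))
            result = f"{len(self.queue)} sub-tasks queued"
        else:
            raise ValueError(f"Unknown action kind: {action.kind}")
        self.log.append(result)
        return result


d = Dispatcher(tools={"get_weather": lambda city: f"{city}: sunny, 25°C"})
print(d.handle(AgentAction("decompose", "check weather; compute 25+36")))  # 2 sub-tasks queued
print(d.handle(AgentAction("tool_call", "Beijing", tool="get_weather")))   # Beijing: sunny, 25°C
print(d.handle(AgentAction("reflect", "")))                                # continue
```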

Action Loop Implementation

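# action_loop.py
import re

class SimpleAgent:
    """A minimal think -> act -> observe loop over LangChain-style tools."""

    def __init__(self, llm, tools, max_iterations=5):
        self.llm = llm                                   # a chat model with .invoke(), e.g. ChatOpenAI
        self.tools = {tool.name: tool for tool in tools}
        self.max_iterations = max_iterations

    def think(self, scratchpad: str):
        """Ask the LLM for the next step and parse its Action / Action Input lines."""
        prompt = (
            f"Available tools: {', '.join(self.tools)}\n"
            "Reply with exactly two lines:\n"
            "Action: <a tool name, or Final Answer>\n"
            "Action Input: <the tool input, or the final answer>\n\n"
            f"{scratchpad}"
        )
        text = self.llm.invoke(prompt).content
        action = re.search(r"Action:\s*(.+)", text)
        action_input = re.search(r"Action Input:\s*(.+)", text, re.DOTALL)
        if not action or not action_input:
            return "Final Answer", text.strip()          # unparseable reply: treat it as the answer
        return action.group(1).strip(), action_input.group(1).strip()

    def run(self, task: str):
        scratchpad = f"Task: {task}"
        for _ in range(self.max_iterations):
            # 1. Think about the next step
            action, action_input = self.think(scratchpad)
            # 2. Execute the action
            if action == "Final Answer":
                return action_input
            if action in self.tools:
                observation = self.tools[action].run(action_input)
            else:
                observation = f"Unknown tool: {action}"
            # 3. Append the step and its result so the next think() call can see them
            scratchpad = (f"{scratchpad}\nAction: {action}\n"
                          f"Action Input: {action_input}\nObservation: {observation}")
        return "Reached the maximum number of iterations without completing the task"

# Usage example: llm is a chat model (e.g. ChatOpenAI), and weather_tool /
# calculator_tool are tools defined with @tool as in the earlier sections
agent = SimpleAgent(llm, [weather_tool, calculator_tool])
result = agent.run("Check Beijing's weather and calculate 25+36")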

Full Practical Multi‑Component Agent

# full_agent.py
from langchain.agents import AgentExecutor, create_react_agent
from langchain.memory import ConversationBufferMemory
from langchain.tools import tool
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# ============ 1. Define tools ============
@tool
def get_weather(city: str) -> str:
    """Get the weather for a city."""
    weathers = {"Beijing": "sunny, 25°C", "Shanghai": "cloudy, 22°C"}  # stub data
    return weathers.get(city, "unknown city")

@tool
def calculate(expr: str) -> str:
    """Evaluate a math expression such as 100+200."""
    # Demo only: eval() can run arbitrary code even with builtins removed;
    # use a restricted arithmetic parser in production.
    return str(eval(expr, {"__builtins__": {}}, {}))

@tool
def search_knowledge(query: str) -> str:
    """Search the knowledge base (RAG)."""
    # Simulated vector retrieval
    return f"Knowledge about '{query}': this is an important point..."

tools = [get_weather, calculate, search_knowledge]

# ============ 2. Configure memory ============
# With return_messages left at its default (False), history is rendered as
# plain text, which fits the string prompt below
memory = ConversationBufferMemory(memory_key="chat_history")

# ============ 3. Configure LLM ============
llm = ChatOpenAI(model="gpt-4", temperature=0)

# ============ 4. ReAct prompt ============
PROMPT = PromptTemplate.from_template("""
You are a helpful assistant with memory who can use tools.

Conversation history: {chat_history}

Available tools: {tools}
Tool names: {tool_names}

Format:
Thought: your reasoning
Action: tool name
Action Input: tool input
Observation: tool result
(repeat as needed)
Final Answer: the final answer

Question: {input}
{agent_scratchpad}
""")

# ============ 5. Create agent ============
agent = create_react_agent(llm=llm, tools=tools, prompt=PROMPT)
executor = AgentExecutor(agent=agent, tools=tools, memory=memory, verbose=True,
                         max_iterations=5, handle_parsing_errors=True)

# ============ 6. Execute ============
if __name__ == "__main__":
    # Round 1: introduce a name
    print(executor.invoke({"input": "My name is Zhang San"}))
    # Round 2: recall the name from memory
    print(executor.invoke({"input": "What's my name?"}))
    # Round 3: use tools
    print(executor.invoke({"input": "What's the weather in Beijing? Also calculate 100+200 for me"}))

Core Component Comparison Summary

Planning: decides what to do; input = user intent + state; output = action plan; key techniques = ReAct, CoT, ToT.

Memory: stores information; input = history + current context; output = enriched context; key techniques = buffer memory, vector store (RAG).

Tools: execute operations; input = parameters; output = result; key techniques = API calls, function calling.

Action: carries out execution; input = plan; output = effect; key techniques = code execution, API invocation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
