Survey of Popular AI Agent Frameworks and Their Architectures
This article surveys modern open‑source AI agent frameworks. It defines agents as autonomous perception‑planning‑action systems, outlines their core modules (inference, memory, tools, action), compares single‑agent designs such as BabyAGI and AutoGPT with multi‑agent platforms such as MetaGPT and AutoGen, and discusses their benefits, trade‑offs, and future research directions.
Agent Definition : An agent is an autonomous entity that perceives its environment, makes decisions, and takes actions to achieve a goal, thereby reducing human workload and communication costs.
Background : The authors surveyed mainstream open‑source agent frameworks, selecting 19 representative agents that cover a wide range of designs.
Agent Basics : The decision loop follows a perception‑planning‑action cycle (P→P→A). Perception gathers information, planning decides the next step, and action executes it. The policy module determines actions, while observations feed back into perception, forming a closed learning loop.
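The perception‑planning‑action cycle described above can be sketched in a few lines. This is a minimal illustration, not code from any surveyed framework: CounterEnvironment, perceive, plan, and run_agent are hypothetical names, and the "policy" is a hard-coded rule standing in for an LLM.

```python
from dataclasses import dataclass


@dataclass
class CounterEnvironment:
    """Hypothetical toy environment: the goal is to raise `value` to `target`."""
    value: int = 0
    target: int = 3

    def observe(self) -> dict:
        return {"value": self.value, "target": self.target}

    def apply(self, action: str) -> None:
        if action == "increment":
            self.value += 1


def perceive(env: CounterEnvironment) -> dict:
    # Perception: gather the current state of the environment.
    return env.observe()


def plan(observation: dict) -> str:
    # Planning (the policy): decide the next step from the observation.
    return "increment" if observation["value"] < observation["target"] else "stop"


def run_agent(env: CounterEnvironment, max_steps: int = 10) -> list:
    trace = []
    for _ in range(max_steps):
        observation = perceive(env)   # perception
        action = plan(observation)    # planning
        trace.append(action)
        if action == "stop":
            break
        env.apply(action)             # action; its effect is observed on the next pass
    return trace
```

Calling run_agent(CounterEnvironment()) walks the loop until the goal is reached; each action changes what the next perception step sees, which is the feedback path the paragraph describes.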
Core Modules : In practice, agents can be decomposed into four modules: inference, memory, tools, and action.
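One way to picture the four‑module decomposition is as a small class that wires the modules together. The sketch below is an assumption about how such modules might be separated, not the layout of any particular framework; Memory, Agent, stub_infer, and the two toy tools are all hypothetical, and the inference step is a fixed rule rather than a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Memory:
    """Memory module: an append-only log with a short recall window."""
    records: List[str] = field(default_factory=list)

    def remember(self, item: str) -> None:
        self.records.append(item)

    def recall(self, limit: int = 3) -> List[str]:
        return self.records[-limit:]


# Tools module: plain callables keyed by name (toy examples).
TOOLS: Dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "length": lambda text: str(len(text)),
}


def stub_infer(request: str, context: List[str]) -> Tuple[str, str]:
    """Inference module: a rule-based stand-in for an LLM choosing a tool."""
    if request.startswith("shout "):
        return "upper", request[len("shout "):]
    return "length", request


class Agent:
    def __init__(self, infer, tools: Dict[str, Callable[[str], str]]):
        self.infer = infer
        self.tools = tools
        self.memory = Memory()

    def act(self, tool_name: str, argument: str) -> str:
        """Action module: run the chosen tool and record the outcome."""
        result = self.tools[tool_name](argument)
        self.memory.remember(f"{tool_name}({argument!r}) -> {result!r}")
        return result

    def step(self, request: str) -> str:
        tool_name, argument = self.infer(request, self.memory.recall())
        return self.act(tool_name, argument)
```

Keeping the modules behind narrow interfaces like these means the stubbed inference function could later be swapped for a real model client without touching memory, tools, or action.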
Decision Models :
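Traditional ReAct (Reasoning and Acting): interleaves thought, action, and observation steps, guided by few‑shot prompts.
Plan‑and‑Execute ReAct: plans the subtasks of a complex task up front and then executes them, which can reduce the number of LLM calls compared with re‑prompting at every step.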
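The difference between the two decision models can be made concrete by counting model calls on a toy task. The sketch below is illustrative only: CountingLLM is a hypothetical stand‑in that replays scripted replies, the reply formats ("Action: tool arg", "Final: answer", "$prev") are invented for this example, and the plan‑and‑execute executor runs tools directly, whereas a real executor may still call a model for some steps.

```python
class CountingLLM:
    """Hypothetical model client: replays scripted replies and counts calls."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return self.replies.pop(0)


# Toy tools that take and return strings.
TOOLS = {
    "double": lambda x: str(int(x) * 2),
    "increment": lambda x: str(int(x) + 1),
}


def react(llm, question, max_steps=5):
    """ReAct-style loop: one model call per thought/action, then an observation."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm.complete(transcript)
        transcript += reply + "\n"
        if reply.startswith("Final:"):
            return reply[len("Final:"):].strip()
        tool, arg = reply.split("Action:")[1].split()
        transcript += f"Observation: {TOOLS[tool](arg)}\n"
    return None


def plan_and_execute(llm, question):
    """Plan once, then run each planned step without further model calls."""
    plan = llm.complete(f"Plan the steps for: {question}")
    value = None
    for step in plan.splitlines():
        tool, arg = step.split()
        value = TOOLS[tool](value if arg == "$prev" else arg)
    return value


react_llm = CountingLLM([
    "Thought: double first. Action: double 4",
    "Thought: now add one. Action: increment 8",
    "Final: 9",
])
plan_llm = CountingLLM(["double 4\nincrement $prev"])

react_answer = react(react_llm, "What is 4 doubled, plus one?")
plan_answer = plan_and_execute(plan_llm, "What is 4 doubled, plus one?")
```

In this toy run both approaches reach the same answer, but the ReAct loop uses three model calls and the planned version uses one. The trade‑off is that a fixed plan cannot react to a surprising observation midway through.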
Example prompt snippet (task_creation_agent, the task‑creation prompt from BabyAGI):
task_creation_agent
You are a task‑creation AI, using the results of execution agents to create new tasks. ...
Single‑Agent Frameworks :
BabyAGI: task decomposition, prioritization, and execution.
AutoGPT: personal assistant that integrates external tools (search, browsing, file operations).
HuggingGPT: selects and orchestrates multiple HuggingFace models for multi‑modal tasks.
GPT‑Engineer: code generation and project scaffolding.
Samantha, AppAgent, OS‑Copilot, etc., each focusing on multimodal perception or OS‑level automation.
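The BabyAGI pattern of task decomposition, prioritization, and execution amounts to a loop over a task queue. The following is a rough sketch of that pattern, not BabyAGI's actual code: execute, create_tasks, and prioritize are hypothetical stubs for what would be LLM‑backed agents, and the task names are made up.

```python
from collections import deque


def execute(task: str) -> str:
    # Stub for the execution agent.
    return f"result of {task}"


def create_tasks(task: str, result: str, objective: str) -> list:
    # Stub for the task-creation agent: only the seed task spawns follow-ups.
    if task == "draft outline":
        return ["write section B", "write section A"]
    return []


def prioritize(tasks) -> deque:
    # Stub for the prioritization agent: alphabetical order.
    return deque(sorted(tasks))


def run(objective: str, first_task: str, max_iterations: int = 10) -> list:
    queue = deque([first_task])
    completed = []
    for _ in range(max_iterations):
        if not queue:
            break
        task = queue.popleft()                      # take the top-priority task
        result = execute(task)                      # execute it
        completed.append((task, result))
        queue.extend(create_tasks(task, result, objective))  # derive new tasks
        queue = prioritize(queue)                   # reorder the queue
    return completed
```

The max_iterations cap is a deliberate choice in this sketch: because each result can spawn more tasks, a loop of this shape needs some budget to guarantee it stops.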
Multi‑Agent Frameworks :
MetaGPT: simulates a software company with roles such as product manager, architect, and engineer.
AutoGen: flexible orchestration of multiple agents (LLM, human, tool) for complex workflows.
CrewAI, AgentScope, TaskWeaver, etc., each offering different collaboration patterns, SOP definitions, and monitoring capabilities.
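A role‑based collaboration of the kind MetaGPT simulates can be approximated as a fixed hand‑off sequence, where each role consumes the previous role's output. This sketch assumes a strictly linear SOP and does not use the API of MetaGPT, AutoGen, or any other framework; Message, Role, and run_sop are hypothetical, and each role's respond function is a string template in place of a model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    sender: str
    content: str


@dataclass
class Role:
    name: str
    respond: Callable[[str], str]  # stand-in for an LLM-backed role


def run_sop(roles: List[Role], requirement: str) -> List[Message]:
    """Pass the work product down a fixed sequence of roles."""
    log = [Message("user", requirement)]
    for role in roles:
        output = role.respond(log[-1].content)
        log.append(Message(role.name, output))
    return log


ROLES = [
    Role("product_manager", lambda text: f"PRD for: {text}"),
    Role("architect", lambda text: f"Design based on [{text}]"),
    Role("engineer", lambda text: f"Code implementing [{text}]"),
]
```

Running run_sop(ROLES, "todo app") yields a message log with one entry per role. Keeping the full log, rather than only the final output, is what makes the monitoring that several of these frameworks advertise possible.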
Advantages of Multi‑Agent Systems :
Multiple perspectives for problem analysis.
Decomposition of complex tasks into specialized sub‑agents.
Higher controllability and extensibility (open‑closed principle).
Potentially faster problem solving through parallelism.
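The parallelism point can be illustrated with Python's asyncio: independent sub‑agents that mostly wait on I/O (model or tool calls) can run concurrently. In this sketch the sub‑agents are hypothetical and asyncio.sleep stands in for the wait on a real call, so the timing only shows the shape of the benefit.

```python
import asyncio
import time


async def sub_agent(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for waiting on an LLM or tool call
    return f"{name}: done"


async def run_sub_agents(names, delay: float):
    # gather runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(sub_agent(name, delay) for name in names))


def timed_run(names, delay: float = 0.2):
    start = time.perf_counter()
    results = asyncio.run(run_sub_agents(names, delay))
    return results, time.perf_counter() - start
```

With three sub‑agents that each wait 0.2 seconds, timed_run finishes in roughly the time of one wait rather than three. The gain applies only when the subtasks are independent; a chain of hand‑offs still runs sequentially.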
Challenges :
Increased cost and latency.
Complex communication and higher development effort.
For simple problems, a single agent may suffice.
Future Directions : Improving single‑agent architectures (CoT → XoT, long‑term memory), adding multimodal perception and self‑reflection, and scaling to distributed deployments with monitoring, RAG integration, and benchmark suites.
References list key papers and repositories such as ReAct, Plan‑and‑Execute, LLMCompiler, and various framework GitHub links.
DaTaobao Tech
Official account of DaTaobao Technology