Artificial Intelligence · 23 min read

How Multi‑Agent ReAct Architecture Boosts E‑Commerce AI Assistants

This article explains the evolution of multi‑agent systems for e‑commerce assistants, detailing the ReAct‑based planning framework, hierarchical master‑sub agent collaboration, evaluation methods, and sample‑generation techniques that together improve accuracy, efficiency, and scalability of AI‑driven merchant services.

JD Cloud Developers

Introduction

The multi‑agent architecture for merchant assistants has evolved through three stages: (1) B‑mall automatic ticket replies using LLM + RAG, without tool invocation; (2) JD招商站 (JD's merchant‑onboarding site), where a single agent handled both knowledge‑base QA and tool calls but suffered from low accuracy and hallucinations; (3) the JD‑Mai intelligent assistant, which introduces a master‑sub‑agent collaborative model and significantly improves accuracy.

The assistant’s algorithmic foundation is a Large Language Model‑based multi‑agent system that mimics real‑world merchant team collaboration, allowing merchants to interact via natural language to obtain 24/7 operational support.

1. Mapping Real‑World Merchant Operations to Multi‑Agent Algorithm Space

The design motivation is to simulate human problem‑solving with agents: first, real‑world merchant and team operations are described; then each human role is mapped to a counterpart in the multi‑agent space.

2. Key Technologies of Multi‑Agent Planning

2.1 Agent Construction: ReAct Paradigm Multi‑Model Integration

LLM: interprets the problem, extracts the ultimate goal, guides reverse planning, and validates the tool‑call chain.

Embedding: quickly matches the goal node to tools, avoiding lengthy prompts and hallucinated tool selection.

Tools DAG: performs multi‑path reverse reasoning, extracting parameters for precise scheduling.

Operations Optimization: theoretically accelerates solving and improves reverse‑planning efficiency (pending empirical validation).
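The embedding‑based tool matching described above can be sketched as a nearest‑neighbor lookup over pre‑computed tool‑description embeddings. This is a minimal illustration, not the production system: the tool names, the tiny 3‑dimensional vectors, and the `match_tool` helper are all hypothetical stand‑ins for a real embedding model and tool registry.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical tool registry: tool name -> pre-computed description embedding.
# In production these vectors would come from a real embedding model.
TOOL_EMBEDDINGS = {
    "query_sales": [0.9, 0.1, 0.0],
    "update_price": [0.1, 0.9, 0.1],
    "check_inventory": [0.0, 0.2, 0.9],
}

def match_tool(goal_embedding, top_k=1):
    """Rank tools by cosine similarity to the goal-node embedding."""
    ranked = sorted(
        TOOL_EMBEDDINGS.items(),
        key=lambda kv: cosine(goal_embedding, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]
```

Because the goal node is matched against embeddings rather than enumerated in the prompt, the planner avoids both lengthy prompts and hallucinated tool names.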

ReAct enables dynamic updates: each step of forward execution triggers a planning update based on the observed result.
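The dynamic‑update behavior can be sketched as a thought → action → observation loop in which the full history, including every prior observation, is fed back into the planner on each step. The `ReActAgent` class, the `fake` planner interface, and the tool signatures below are illustrative assumptions, not the article's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ReActAgent:
    """Minimal ReAct loop: each observation feeds back into the next plan.

    `llm` and `tools` are stand-ins for a real model and tool registry:
    `llm(history)` returns (thought, action, args) or None when the goal
    is judged complete; `tools` maps action names to callables.
    """
    llm: callable
    tools: dict
    history: list = field(default_factory=list)

    def run(self, question, max_steps=5):
        self.history.append(("question", question))
        for _ in range(max_steps):
            step = self.llm(self.history)   # re-plan from all prior observations
            if step is None:                # planner decides the goal is reached
                break
            thought, action, args = step
            observation = self.tools[action](*args)
            self.history.extend([("thought", thought),
                                 ("action", action),
                                 ("observation", observation)])
        return self.history
```

Each forward step appends its observation to `history` before the planner is called again, which is exactly the per‑step planning update the paradigm requires.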

2.2 Multi‑Agent Online Inference

2.2.1 Technical Features

Hierarchical dynamic planning and distributed collaboration based on the ReAct paradigm, with a Master Agent coordinating sub‑agents.

Master Agent decomposes complex scenarios into independent sub‑tasks and dispatches them to Sub Agents.

Sub Agents execute assigned tasks, supporting distributed scheduling and cooperation.

Standard communication protocol ensures efficient multi‑agent coordination, multi‑step linking, and global chain‑of‑thought planning.
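The decompose‑and‑dispatch pattern above can be sketched as follows. The domain names, the fixed `decompose` mapping, and the lambda sub‑agents are hypothetical placeholders; in the real system, decomposition is an LLM call and each sub‑agent is itself a ReAct agent.

```python
# Hypothetical sub-agent registry: domain name -> callable that handles a task.
SUB_AGENTS = {
    "marketing": lambda task: f"marketing done: {task}",
    "inventory": lambda task: f"inventory done: {task}",
}

def decompose(request):
    """Stand-in for the Master Agent's LLM decomposition step:
    split a complex request into (domain, sub-task) pairs."""
    if "promotion" in request:
        return [("marketing", "draft promotion plan"),
                ("inventory", "check stock for promoted SKUs")]
    return [("marketing", request)]

def master_agent(request):
    """Dispatch each sub-task to the sub-agent owning its domain
    and collect the results for the final response."""
    results = {}
    for domain, task in decompose(request):
        results[domain] = SUB_AGENTS[domain](task)
    return results
```

In production the dispatch loop would be asynchronous and fault‑tolerant; the sequential loop here only shows the control flow.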

2.2.2 Demonstration

A video demonstrates the online collaborative inference process, showing how the front‑end assistant UI maps to the back‑end multi‑agent inference service.

2.2.3 Architectural Summary

Low inference difficulty: transforms large‑model multi‑step planning into next‑task prediction.

Low cost: multiple small models cooperate, reducing training and deployment expenses.

Fast iteration: rapid problem localization enables quick model updates.

Open challenges include long response time for complex queries, error accumulation in chained reasoning, and the need for multi‑agent joint learning to mitigate risks. Compared with single‑agent or LLM‑MoE architectures, the multi‑agent design offers higher stability for complex business scenarios at the cost of increased engineering effort.

2.3 Agent Full‑Link ReAct Evaluation

Global evaluation: decomposes tasks and schedules them, assigning weighted scores to each agent to compute overall system efficiency.

Local evaluation: uses a Reward Model to assess thought/action/observation cycles, identifying bottlenecks and suggesting optimizations.

Diverse Reward Models: business‑customizable rules, existing high‑level LLMs for general evaluation, and trained reward models for task‑specific assessment.
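The two evaluation layers can be sketched numerically: a weighted sum over per‑agent scores for the global view, and a rule‑based reward for individual ReAct cycles. The weights, score scales, and the specific penalty rules in `rule_reward` are illustrative assumptions, not the article's actual metrics.

```python
def global_score(agent_scores, weights):
    """Global evaluation: weighted sum of per-agent scores.

    agent_scores / weights are dicts keyed by agent name;
    weights are assumed to sum to 1.
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(agent_scores[a] * weights[a] for a in agent_scores)

def rule_reward(cycle):
    """Rule-based local reward model for one thought/action/observation
    cycle: penalize empty observations and failed actions (hypothetical
    rules standing in for business-customizable ones)."""
    score = 1.0
    if not cycle.get("observation"):
        score -= 0.5
    if cycle.get("action_error"):
        score -= 0.5
    return max(score, 0.0)
```

An LLM‑based or trained reward model would replace `rule_reward` with a model call while keeping the same per‑cycle interface.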

2.4 LLM Offline/Online Sample Enhancement

Automated offline sample generation by standardizing business data, enabling rapid creation of high‑quality training data for various scenarios.

Automated online inference labeling and sample accumulation using multiple Reward Model strategies, continuously expanding and refining the sample library to improve online inference capability.
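The offline pipeline above can be sketched as two steps: standardize raw business records into a fixed sample schema, then filter by a reward‑model score before the samples enter the library. The record fields, the sample schema, and the `build_dataset` threshold are all hypothetical.

```python
def make_sample(record):
    """Standardize one raw business-log record into a training sample
    (schema is a hypothetical prompt/tools/response triple)."""
    return {
        "prompt": f"Merchant asks: {record['question']}",
        "tools_called": record.get("tools", []),
        "response": record["answer"],
    }

def build_dataset(records, reward_fn, threshold=0.8):
    """Keep only samples whose reward-model score clears the threshold,
    so low-quality records never enter the sample library."""
    samples = []
    for rec in records:
        sample = make_sample(rec)
        if reward_fn(sample) >= threshold:
            samples.append(sample)
    return samples
```

The same `reward_fn` interface serves online labeling: newly logged inference traces are scored and, when they pass, appended to the library.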

Appendix: Agent Execution Flow

Step 1: The caller initiates a request. Only the user can call the Master Agent; domain agents can be called by the Master Agent or by other domain agents.

Step 2: The agent performs planning/reasoning, retrieving conversation history from Memory.

Step 3: Reasoning generates a `thought` (a natural‑language description of the goal) and an `action_code` (a structured list of tasks).

Step 4: The agent executes the tool calls defined in `action_code`.

Step 5: The called tools return results.

Step 6: The agent writes the ReAct information to Memory and logs.

Step 7: The agent responds based on the trust mode (direct response or further ReAct cycles).
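The `thought`/`action_code` pair produced in the flow above might look like the sketch below. The field names, the example tool `query_sales`, and its parameters are hypothetical; only the thought + structured‑task‑list shape comes from the flow itself.

```python
# Hypothetical shape of one reasoning step: reasoning emits a natural-language
# `thought` plus a structured `action_code` task list (Step 3).
step = {
    "thought": "Need last week's sales before suggesting a price change.",
    "action_code": [
        {"tool": "query_sales", "params": {"sku": "sku-1", "window": "7d"}},
    ],
}

def execute(step, tools):
    """Steps 4-5: run each task in action_code and collect observations."""
    observations = []
    for task in step["action_code"]:
        result = tools[task["tool"]](**task["params"])
        observations.append({"tool": task["tool"], "result": result})
    return observations
```

The observations would then be written to Memory (Step 6) and either answered directly or fed into another ReAct cycle (Step 7), depending on the trust mode.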

Tags: e-commerce, LLM, ReAct, Multi-Agent, Agent Architecture, AI planning
Written by

JD Cloud Developers

JD Cloud Developers (the developer community of JD Technology) is a JD Technology Group platform for technical sharing and exchange among AI, cloud‑computing, IoT, and related developers. It publishes JD product technical information, industry content, and tech‑event news. Embrace technology and partner with developers to envision the future.
