JD Merchant Intelligent Assistant: Multi‑Agent Architecture and Technical Exploration
The JD Merchant Intelligent Assistant employs a large‑language‑model‑driven multi‑agent architecture with dynamic ReAct planning, enabling merchants to query and execute store operations in under a second with over 90 % decision accuracy, while reducing inference cost, hallucinations, and engineering effort across diverse e‑commerce tasks.
The JD Merchant Intelligent Assistant is designed to address a wide range of challenges faced by e‑commerce merchants, offering 24/7 operational support that can respond within one second. Merchants interact with the assistant using natural language to obtain information about store performance, query business rules, or execute quick functions.
The assistant’s algorithmic foundation is a large language model (LLM) based Multi‑Agent system that simulates the collaborative workflow of a real‑world merchant team. Through dynamic planning and coordination among multiple agents, the system covers the entire merchant workflow, from product publishing and order management to customer service and data analysis, providing capabilities such as sales forecasting, marketing placement, pricing, and keyword recommendation.
The current multi‑agent collaboration technology achieves decision‑making accuracy above 90 % with second‑level response times, delivering a faster, higher‑quality, and more cost‑effective merchant experience.
Architecture Evolution
Stage 1: B‑mall ticket auto‑reply – LLM combined with RAG for knowledge‑base answering, without tool invocation.
Stage 2: JD招商站 (JD’s merchant‑onboarding portal) – a single agent handles both knowledge‑base Q&A and tool calls, but suffers from low accuracy, hallucinations, and poor differentiation between scenarios.
Stage 3: JD Mai Intelligent Assistant – introduces a master‑plus‑sub‑agents multi‑agent architecture, partitioning problems to significantly improve accuracy.
Multi‑Agent System Design
The system serves as a generic, open host for various merchant services (e.g., sales prediction, marketing, pricing). It can integrate external tools, agents, or APIs at different development stages.
2.1 Agent Construction – ReAct Paradigm Integration
Four model types are integrated to empower the agent’s reverse‑planning capability:
LLM: interprets the problem and defines the ultimate goal.
Embedding: quickly matches the goal to the appropriate tool, reducing LLM prompt length and hallucinations.
Tools DAG: performs multi‑path reverse reasoning and extracts parameters for precise scheduling.
Operations Optimization: theoretically accelerates solving, pending empirical validation.
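To make the embedding step concrete: goal‑to‑tool matching can be done with a nearest‑neighbor lookup over tool embeddings, so the LLM prompt only needs to carry the matched candidate instead of the full tool catalog. The sketch below is illustrative only; the tool names and toy vectors are assumptions, and a real system would embed tool descriptions with an embedding model:

```python
import math

# Hypothetical tool registry: tool name -> embedding vector (toy hand-made
# vectors here; in production these come from an embedding model).
TOOL_EMBEDDINGS = {
    "sales_forecast": [0.9, 0.1, 0.0],
    "keyword_recommend": [0.1, 0.9, 0.1],
    "price_advisor": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_tool(goal_embedding):
    """Return the tool whose embedding is closest to the goal embedding,
    keeping the LLM prompt short and reducing hallucinated tool names."""
    return max(TOOL_EMBEDDINGS,
               key=lambda name: cosine(goal_embedding, TOOL_EMBEDDINGS[name]))

# A goal vector close to the "sales_forecast" tool
print(match_tool([0.8, 0.2, 0.1]))  # → sales_forecast
```

Because the match is a vector lookup rather than a generation step, it cannot invent a tool that does not exist, which is one way this layer curbs hallucination.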
2.1.2 ReAct Dynamic Planning Updates
During forward execution, the ReAct paradigm updates the plan dynamically at each step based on execution results.
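The step‑by‑step replanning can be sketched as a loop that acts, observes, and lets the planner revise the remaining plan; this is a minimal illustration, not the production implementation, and the toy `replan` policy (retry on error) is an assumption:

```python
def react_loop(plan, execute, replan, max_steps=10):
    """Minimal ReAct-style forward execution: run the next planned step,
    observe the result, then let the planner revise the remaining plan."""
    trace = []
    steps = 0
    while plan and steps < max_steps:
        action = plan.pop(0)              # Act: take the next planned step
        observation = execute(action)     # Observe: run the tool / sub-task
        trace.append((action, observation))
        plan = replan(plan, observation)  # Think: adjust the remaining plan
        steps += 1
    return trace

# Toy planner policy: if a step fails, prepend a retry before the rest.
def replan(remaining, observation):
    if observation == "error":
        return ["retry"] + remaining
    return remaining

results = {"fetch_sales": "error", "retry": "ok", "summarize": "done"}
trace = react_loop(["fetch_sales", "summarize"], lambda a: results[a], replan)
```

After the failed first step, the loop inserts a retry and still finishes the original plan, which is the dynamic‑update behavior described above.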
Technical Benefits
Improved planning efficiency and reduced inference cost by orchestrating multiple smaller models instead of a single massive model.
Enhanced architectural stability and controllable risk through task decomposition.
Mitigated LLM hallucinations via embeddings and Tools DAG, boosting planning quality (≈10 % higher tool‑call accuracy).
Reduced sample engineering effort for LLMs, increasing system extensibility and maintenance efficiency by over 60 %.
Real‑time adjustment and optimization of execution chains.
2.2 Multi‑Agent Online Inference
2.2.1 Technical Features
Task‑layered dynamic planning and distributed collaboration: Master Agent performs high‑level task decomposition, delegating sub‑tasks to Sub‑Agents.
Standard communication protocol ensures efficient coordination, multi‑step linking, and global chain‑of‑thought planning.
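A standard communication protocol usually amounts to a fixed message envelope every agent understands. The sketch below assumes a JSON envelope with hypothetical field names (`task_id`, `parent_task`, etc.); the article does not specify the actual schema:

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Optional

@dataclass
class AgentMessage:
    """Hypothetical standard envelope exchanged between Master and Sub-Agents."""
    task_id: str
    sender: str
    receiver: str
    intent: str                        # decomposed sub-task description
    payload: dict = field(default_factory=dict)
    parent_task: Optional[str] = None  # links a sub-task back to the master plan

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)

msg = AgentMessage(
    task_id="t-001.1",
    sender="master",
    receiver="sales_forecast_agent",
    intent="forecast next-week sales for SKU 123",
    parent_task="t-001",
)
```

The `parent_task` link is what lets the Master Agent reassemble sub‑task results into a global chain of thought.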
2.2.2 Demonstration
A video showcases the online collaborative inference process, linking the front‑end assistant UI with the back‑end multi‑agent inference service.
2.2.3 Architecture Summary
Key points: lower inference difficulty by converting full‑chain multi‑step planning into next‑task prediction; reduced cost via cooperative small models; faster iteration and issue localization.
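The "next‑task prediction" idea can be illustrated with a toy predictor: rather than emitting the whole chain up front, the master inspects what has completed and predicts only the next sub‑task. The task names and rule logic below are placeholders; in production this prediction would be an LLM call conditioned on execution state:

```python
def next_task(state):
    """Toy next-task predictor: given what has completed so far,
    return only the next sub-task, or None when the chain is done."""
    done = set(state["completed"])
    for task in ("fetch_data", "analyze", "report"):
        if task not in done:
            return task
    return None

state = {"completed": []}
chain = []
while (task := next_task(state)) is not None:
    chain.append(task)
    state["completed"].append(task)
```

Each prediction is a much smaller problem than full‑chain planning, which is why this framing lowers inference difficulty and lets execution results steer the chain mid‑flight.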
Remaining challenges include longer response times for complex queries and error accumulation across chained agents, prompting research into multi‑agent joint learning.
2.3 Agent Full‑Chain ReAct Evaluation
Two evaluation dimensions are defined:
Global evaluation: weighted scoring of each agent after task decomposition and scheduling to compute overall system performance.
Local evaluation: Reward Model assesses thought/action/observation cycles of each agent to identify bottlenecks and suggest optimizations.
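The global dimension reduces to a weighted combination of per‑agent scores. A minimal sketch, with agent names and weights invented for illustration:

```python
def global_score(agent_scores, weights):
    """Weighted global evaluation: combine each agent's local score into one
    system-level metric; weights reflect each agent's share of the task."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(agent_scores[name] * w for name, w in weights.items())

scores = {"master": 0.95, "forecast_agent": 0.88, "pricing_agent": 0.90}
weights = {"master": 0.5, "forecast_agent": 0.3, "pricing_agent": 0.2}
print(round(global_score(scores, weights), 3))  # → 0.919
```

Comparing the global score against individual local scores is what surfaces which agent is the bottleneck.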
2.3.2 Diverse Reward Models
Business‑customizable rule functions/reward models for flexible evaluation.
Utilization of state‑of‑the‑art large models for generic assessment.
Training dedicated reward models to improve task‑specific evaluation.
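One way to realize business‑customizable rule rewards is a pluggable registry that scenario owners register scoring functions into. This is a sketch of the pattern only; the registry, rule name, and tool list are assumptions, not the production design:

```python
# Hypothetical pluggable reward registry: business teams register rule
# functions (or model-backed scorers) under a name, and the evaluator
# picks the one configured for the scenario.
REWARD_FNS = {}

def reward(name):
    def register(fn):
        REWARD_FNS[name] = fn
        return fn
    return register

@reward("tool_call_format")
def tool_call_format(step):
    """Rule reward: 1.0 if the action names a known tool and carries args."""
    known_tools = {"sales_forecast", "keyword_recommend"}
    return 1.0 if step.get("tool") in known_tools and "args" in step else 0.0

def evaluate(step, rule="tool_call_format"):
    return REWARD_FNS[rule](step)

print(evaluate({"tool": "sales_forecast", "args": {"sku": 123}}))  # → 1.0
```

The same registry slot could hold an LLM‑judge scorer or a trained reward model, which is what makes the evaluation layer flexible across the three options above.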
The following code snippets illustrate automated evaluation prompts and data formats used in the assessment pipeline:
The goal of the input‑summarization model is to analyze the user’s specific intent from the historical conversation and the current question. As the core step of the Master Agent’s reasoning, the quality of its intent summarization must be evaluated.
Automated evaluation scheme:
Evaluation method: a high‑tier model (e.g., GPT‑4o) acts as the judge, combining the user’s current question with the conversation history to score the accuracy of online inference.
Automated scoring instruction (simplified): “You are an expert in understanding question intent. Evaluate the quality with which an e‑commerce platform AI assistant understands a merchant’s question, scoring each of the following dimensions from 0 to 10 (integers only): 1. Correctness: does the intent correctly express the user’s current question? 2. Relevance: the current question’s intent may or may not be strongly related to the conversation history; judge whether the assistant’s understood intent links to that history correctly.” …(remaining code blocks omitted)
2.4 LLM Offline‑Online Sample Augmentation
2.4.1 Automated Offline Sample Generation and Expansion – Standardized business data is ingested to automatically generate and scale high‑quality training samples for LLMs.
2.4.2 Automated Online Inference Annotation and Sample Accumulation – Reward Model strategies continuously label and accumulate samples generated during online inference, enhancing the model’s capabilities.
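Reward‑guided sample accumulation can be as simple as filtering inference traces by reward score and converting the survivors into training pairs. A minimal sketch, where the trace fields and the 0.8 threshold are assumptions for illustration:

```python
def accumulate_samples(traces, min_reward=0.8):
    """Keep only inference traces the reward model scored highly, and turn
    them into (instruction, response) samples for the next training round."""
    samples = []
    for t in traces:
        if t["reward"] >= min_reward:
            samples.append({"instruction": t["query"], "response": t["answer"]})
    return samples

traces = [
    {"query": "yesterday's sales?", "answer": "1,204 units", "reward": 0.92},
    {"query": "how to join the promo?", "answer": "(hallucinated)", "reward": 0.31},
]
print(len(accumulate_samples(traces)))  # → 1
```

This closes the loop described above: online inference produces traces, reward models label them, and the filtered samples feed the offline pipeline.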
Outlook
The multi‑agent collaborative approach demonstrated by the JD Merchant Intelligent Assistant provides a reproducible pattern for tackling complex distributed interaction systems in industry, offering:
Generality and replicability across sectors.
Industry‑level benchmark status, recognized by InfoQ’s 2024 AI Agent Application Research Report, Top 100 Global Software Cases, and China Tech Industry Best Practice awards.
The JD Retail Data Algorithm team, responsible for building large‑model agents (SFT, RLHF, RAG, KAG, Multi‑Agent, Self‑Reflection, Distillation), invites passionate innovators to join. Resume submission email: [email protected].
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.