Scaling to Ten‑Thousand QPS: Lessons from Building a Real‑Time Product‑Domain Agent
The article details how the product team addressed the challenges of the AI era by designing a two‑layer, event‑driven, Function‑Centric Agent architecture that unifies workflow orchestration and capability supply. The design enables real‑time inference across billions of items, cuts development to about one person‑week per feature, and improves search conversion rates.
With the rise of large language models, the product domain needed a sustainable AI foundation that goes beyond offline batch pipelines. The team posed four core questions: what long‑term technical infrastructure AI requires, how to break AI adoption down into an evolvable roadmap, whether real‑time inference at SKU granularity is feasible, and how agents can serve the long‑standing SKU‑ization trend.
Why an Agent?
Existing paradigms (Prompt Engineering, RAG, fine‑tuning, MaaS, and raw Function Calling) each solve specific problems, but in a high‑complexity, standardized domain they suffer from capability fragmentation and poor engineering sustainability. An Agent, built on top of Function Calling, naturally integrates these capabilities while adding state management, goal decomposition, and long‑term memory.
Embedded Function Calling enables safe integration with existing systems (e.g., product center IC, category data packs, offline ODPS assets).
RAG is fused via vector memory or knowledge‑base retrieval for context‑aware facts.
Fine‑tuned models can be used as sub‑capabilities without tight coupling.
The Agent surpasses static Prompt Engineering by supporting goal‑driven dynamic reasoning and backtracking.
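To make the difference from raw function calling concrete, here is a minimal sketch of what the agent layer adds: a goal is decomposed into steps, each step invokes a registered function, intermediate results are kept in shared state so later steps can build on earlier ones, and completed steps are appended to a memory log. The planner is a hard-coded stub standing in for an LLM, the memory is a simple in-process list standing in for long-term memory, and the tool names (`lookupCategory`, `extractAttributes`) are hypothetical.

```java
import java.util.*;
import java.util.function.Function;

public class MiniAgent {
    // A registered tool: reads the shared state and returns one output value.
    interface Tool extends Function<Map<String, String>, String> {}

    private final Map<String, Tool> tools = new LinkedHashMap<>();
    private final Map<String, String> state = new HashMap<>();  // working state for the current goal
    private final List<String> memory = new ArrayList<>();      // log of completed steps

    void register(String name, Tool tool) { tools.put(name, tool); }

    // Stub planner: a real agent would ask an LLM to decompose the goal; here the plan is fixed.
    List<String> plan(String goal) {
        return Arrays.asList("lookupCategory", "extractAttributes");
    }

    Map<String, String> run(String goal, String itemTitle) {
        state.put("title", itemTitle);
        for (String step : plan(goal)) {
            Tool tool = tools.get(step);
            if (tool == null) throw new IllegalStateException("No tool registered for " + step);
            String output = tool.apply(state);
            state.put(step, output);            // later steps can read earlier outputs
            memory.add(step + " -> " + output);
        }
        return state;
    }

    List<String> memory() { return memory; }

    public static void main(String[] args) {
        MiniAgent agent = new MiniAgent();
        // Hypothetical tools; real ones would wrap product-center or ODPS-backed services.
        agent.register("lookupCategory",
                s -> s.get("title").contains("RTX") ? "graphics-card" : "unknown");
        agent.register("extractAttributes",
                s -> "graphics-card".equals(s.get("lookupCategory")) && s.get("title").contains("RTX")
                        ? "gpuBrand=NVIDIA" : "gpuBrand=unknown");
        Map<String, String> result = agent.run("enrich item", "RTX 4070 Super 12GB");
        System.out.println(result.get("extractAttributes"));
        System.out.println(agent.memory());
    }
}
```

The point of the sketch is the state map: the second step's decision depends on the first step's output, which a single stateless function call cannot express.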
Technical Landscape (Early 2025)
Reasoning patterns such as ReAct, Plan‑and‑Execute, and Self‑Reflection are now natively supported by mainstream LLMs. Open‑source ecosystems (LangChain, LlamaIndex, AutoGen, CrewAI, OpenDevin) provide mature orchestration, memory, tool registration, and multi‑agent collaboration. Observability tools can now trace Agent behavior, making debugging and governance feasible.
Framework Selection
After extensive evaluation, the team chose a lightly coupled spring‑ai‑alibaba stack to avoid the complexity and maintenance burden of heavier Agent frameworks. Future plans include integrating community components like deepeval for evaluation and deepresearch for factual verification.
Architecture Design
The system adopts a two‑layer structure:
Workflow orchestration layer – business‑scenario specific pipelines.
Unified capability layer – standardized AIFunction interfaces for tools and domain knowledge.
All functions are annotated with @AiFunction, @AiParameter, @AiResult, and related annotations, from which SDK‑level wrappers are generated. These wrappers are invoked through a chainable syntax such as registry.item().query().invoke(params), which provides compile‑time safety and a unified view of all function calls.
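The benefit of generated wrappers is that a call like `registry.item().query().invoke(params)` is checked by the compiler: a misspelled domain, an unknown function, or a wrong parameter type fails the build instead of failing at runtime. The sketch below hand-writes what such generated code might look like. All class names (`FunctionRegistry`, `ItemFunctions`, `AiCallable`, `ItemQueryParams`, `ItemInfo`) are hypothetical and are not the team's actual SDK; the backing map stands in for a real item service.

```java
import java.util.HashMap;
import java.util.Map;

public class RegistryDemo {
    // Typed parameter object; in the described system it would be derived from @AiParameter fields.
    static final class ItemQueryParams {
        final long itemId;
        ItemQueryParams(long itemId) { this.itemId = itemId; }
    }

    // Typed result object; would correspond to the @AiResult description.
    static final class ItemInfo {
        final long itemId;
        final String title;
        ItemInfo(long itemId, String title) { this.itemId = itemId; this.title = title; }
    }

    // One callable function with fixed input and output types.
    interface AiCallable<P, R> { R invoke(P params); }

    // Domain-level grouping: every function of the "item" domain hangs off this object.
    static final class ItemFunctions {
        private final Map<Long, String> titles;
        ItemFunctions(Map<Long, String> titles) { this.titles = titles; }

        AiCallable<ItemQueryParams, ItemInfo> query() {
            return p -> new ItemInfo(p.itemId, titles.getOrDefault(p.itemId, "<missing>"));
        }
    }

    // Entry point: one accessor per domain, so domains and functions are discoverable by IDE completion.
    static final class FunctionRegistry {
        private final ItemFunctions item;
        FunctionRegistry(ItemFunctions item) { this.item = item; }
        ItemFunctions item() { return item; }
    }

    public static void main(String[] args) {
        Map<Long, String> backing = new HashMap<>();
        backing.put(42L, "RTX 4070 Super 12GB");
        FunctionRegistry registry = new FunctionRegistry(new ItemFunctions(backing));

        // Compile-time checked chain: passing the wrong parameter type here would not compile.
        ItemInfo info = registry.item().query().invoke(new ItemQueryParams(42L));
        System.out.println(info.title);
    }
}
```

Compared with a string-keyed lookup such as `invoke("item.query", map)`, this shape trades some code generation for type checking and discoverability.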
AIFunction Specification
name (optional): unique identifier.
description (required): one‑sentence description of the capability.
parameters: derived automatically from @AiParameter.
returns: optional description of the return type.
expose: flag controlling external exposure.
Extended fields: tags, author, sideEffect, timeoutMs, deprecated.
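One way the specification above could be expressed in Java is sketched below: annotation types carrying the listed fields, plus a reflective scan that turns an annotated method into a descriptor an LLM tool registry could consume. The annotation names match those in the article, but their exact attributes, defaults (including the 3000 ms timeout), and retention policy here are assumptions; `@AiResult` is omitted for brevity, and `ItemTools.extractGpuBrand` is a hypothetical function. Making `description` an element without a default means the compiler rejects any `@AiFunction` that omits it, which enforces "required" for free.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.*;

public class AiFunctionSpecDemo {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface AiFunction {
        String name() default "";         // optional: falls back to the method name
        String description();             // required: no default, so omitting it is a compile error
        String returns() default "";      // optional description of the return value
        boolean expose() default false;   // whether external callers may see the function
        String[] tags() default {};
        String author() default "";
        boolean sideEffect() default false;
        long timeoutMs() default 3000;    // hypothetical default
        boolean deprecated() default false;
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface AiParameter {
        String name();
        String description();
    }

    static class ItemTools {
        @AiFunction(description = "Extract the GPU brand from a graphics-card title.",
                    returns = "GPU brand, or empty if unknown",
                    expose = true, tags = {"attribute"}, timeoutMs = 500)
        public String extractGpuBrand(
                @AiParameter(name = "title", description = "Raw item title") String title) {
            return title.contains("RTX") ? "NVIDIA" : "";
        }
    }

    // Builds a flat descriptor (name, description, parameters, flags) from the annotations.
    static Map<String, Object> describe(Method m) {
        AiFunction fn = m.getAnnotation(AiFunction.class);
        if (fn == null) throw new IllegalArgumentException(m + " is not an @AiFunction");
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("name", fn.name().isEmpty() ? m.getName() : fn.name());
        d.put("description", fn.description());
        Map<String, String> params = new LinkedHashMap<>();
        for (Parameter p : m.getParameters()) {
            AiParameter ap = p.getAnnotation(AiParameter.class);
            if (ap != null) params.put(ap.name(), ap.description());
        }
        d.put("parameters", params);
        d.put("returns", fn.returns());
        d.put("expose", fn.expose());
        d.put("tags", Arrays.asList(fn.tags()));
        d.put("sideEffect", fn.sideEffect());
        d.put("timeoutMs", fn.timeoutMs());
        d.put("deprecated", fn.deprecated());
        return d;
    }

    public static void main(String[] args) throws Exception {
        Method m = ItemTools.class.getMethod("extractGpuBrand", String.class);
        System.out.println(describe(m));
    }
}
```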
Knowledge Bases
Three knowledge layers support the Agent:
Explicit factual knowledge – objective statements (e.g., "GPU brand of a graphics card"). Used for operational decisions, prompt augmentation, and data cleaning.
Contextual scenario knowledge – relationships between items and contexts (e.g., item‑item, item‑scenario). Supports main‑accessory inference.
Implicit experiential knowledge – user experience, expert reviews, brand culture. Powers selling points and parameter explanations.
Explicit knowledge is generated by aggregating SKU master data, images, external notes, raw attributes, and manual configurations, then processed by statistical analysis and LLM understanding modules. Results are stored in a vector database for fast semantic retrieval.
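Semantic retrieval over the stored knowledge reduces to nearest-neighbour search over embeddings. The in-memory sketch below ranks stored facts by cosine similarity to a query vector. It is a stand-in for a real vector database, which would use an approximate index rather than a full sort; the facts and three-dimensional vectors are toy values, not the output of any actual embedding model.

```java
import java.util.ArrayList;
import java.util.List;

public class KnowledgeIndex {
    static final class Fact {
        final String text;
        final double[] embedding;
        Fact(String text, double[] embedding) { this.text = text; this.embedding = embedding; }
    }

    private final List<Fact> facts = new ArrayList<>();

    void add(String text, double[] embedding) { facts.add(new Fact(text, embedding)); }

    // Cosine similarity; returns 0 when either vector has zero length.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Brute-force top-k by descending similarity: acceptable for a sketch, not for billions of items.
    List<String> topK(double[] query, int k) {
        List<Fact> sorted = new ArrayList<>(facts);
        sorted.sort((x, y) -> Double.compare(cosine(query, y.embedding), cosine(query, x.embedding)));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, sorted.size()); i++) out.add(sorted.get(i).text);
        return out;
    }

    public static void main(String[] args) {
        KnowledgeIndex index = new KnowledgeIndex();
        index.add("item 1001: GPU brand = NVIDIA", new double[]{0.9, 0.1, 0.0});
        index.add("item 2002: case fits phone model P1", new double[]{0.0, 0.2, 0.9});
        index.add("item 1001: video memory = 12GB", new double[]{0.7, 0.4, 0.1});
        System.out.println(index.topK(new double[]{1.0, 0.0, 0.0}, 2));
    }
}
```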
Old Architecture Limitations
Complex data pipelines with high maintenance cost (SQL, UDF, scattered offline nodes).
Poor extensibility for multi‑round or multi‑model reasoning.
Unstable inference scheduling due to shared GPU resources.
Separate online and offline stacks causing duplicated code and inconsistent logic.
New Unified Agent Architecture
The 2025 redesign merges online and offline inference into a single workflow powered by Spring AI Agent. Core components:
item‑agent‑client: lightweight JDK8 SDK exposing standardized capabilities.
agent‑server: domain logic, repository/DAO separation, unified service APIs.
item‑agent‑instances: per‑scenario Agent modules (data enrichment, Q&A, SKU engine).
item‑agent‑evaluation‑client: A/B testing and metric collection.
item‑agent‑functions: registered atomic functions (vector write, text parsing, attribute extraction).
item‑agent‑sdk: unified call contract for AI and non‑AI Agents.
Shared item‑agent‑event‑engine for asynchronous messaging and eventual consistency.
Admin layer for ops, scheduling, and manual interventions.
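Eventual consistency over an asynchronous event engine usually rests on two properties: events may be delivered more than once, and consumers are idempotent so that redelivery is harmless. The sketch below shows a consumer that deduplicates by event ID and retries a failing handler a bounded number of times. The `ItemEvent` type, the retry count, and the in-memory dedup set are illustrative assumptions, not details of item‑agent‑event‑engine; a production consumer would keep processed IDs in a durable store and would need synchronization if called from several threads.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class IdempotentConsumer {
    static final class ItemEvent {
        final String eventId;
        final long itemId;
        ItemEvent(String eventId, long itemId) { this.eventId = eventId; this.itemId = itemId; }
    }

    private final Set<String> processed = new HashSet<>(); // in-memory stand-in for a durable store
    private final Consumer<ItemEvent> handler;
    private final int maxAttempts;

    IdempotentConsumer(Consumer<ItemEvent> handler, int maxAttempts) {
        this.handler = handler;
        this.maxAttempts = maxAttempts;
    }

    /** Returns true if the event was handled now, false if it was a duplicate delivery. */
    boolean onEvent(ItemEvent e) {
        if (processed.contains(e.eventId)) return false;  // redelivery: skip
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(e);
                processed.add(e.eventId);                  // mark only after success
                return true;
            } catch (RuntimeException ex) {
                last = ex;                                 // retry immediately; a real engine would back off
            }
        }
        throw new IllegalStateException("Giving up on " + e.eventId, last);
    }

    public static void main(String[] args) {
        List<Long> enriched = new ArrayList<>();
        IdempotentConsumer consumer = new IdempotentConsumer(ev -> enriched.add(ev.itemId), 3);
        ItemEvent e = new ItemEvent("evt-1", 42L);
        consumer.onEvent(e);
        consumer.onEvent(e);            // duplicate delivery is ignored
        System.out.println(enriched);   // [42]
    }
}
```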
Both offline batch and real‑time event‑driven triggers share the same workflow logic; the only difference is the entry point (scheduled job vs. transaction event). Distributed processing, dynamic concurrency control, and dual‑write to MySQL (low‑latency queries) and TisPlus (vector storage) ensure high throughput and consistency.
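The central design point is that the batch and real-time paths differ only in how they call into the workflow. A minimal sketch, assuming a hypothetical `UnifiedWorkflow` class and a `Sink` interface standing in for the MySQL and TisPlus writers: the scheduled batch entry fans items out over a thread pool, capped by a semaphore whose limit can be changed at runtime, while the event entry calls the same `process` method for a single item, and every result is written to both sinks. The "inference" is a placeholder string, and the dual write here has no failure handling; a real system would need retries or reconciliation when one of the two writes fails.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;

public class UnifiedWorkflow {
    // Stand-in for a storage writer (e.g. a MySQL DAO or a vector-store client).
    interface Sink { void write(long itemId, String result); }

    // Semaphore whose permit count can be changed at runtime (dynamic concurrency control).
    static final class AdjustableLimiter extends Semaphore {
        private int limit;
        AdjustableLimiter(int limit) { super(limit); this.limit = limit; }

        synchronized void setLimit(int newLimit) {
            int delta = newLimit - limit;
            if (delta > 0) release(delta);              // grant extra permits immediately
            else if (delta < 0) reducePermits(-delta);  // in-flight tasks finish; new ones wait longer
            limit = newLimit;
        }
    }

    private final List<Sink> sinks;
    private final AdjustableLimiter limiter;

    UnifiedWorkflow(List<Sink> sinks, int concurrency) {
        this.sinks = sinks;
        this.limiter = new AdjustableLimiter(concurrency);
    }

    // The single piece of business logic shared by both entry points.
    String process(long itemId) {
        String result = "enriched:" + itemId;            // placeholder for model inference
        for (Sink s : sinks) s.write(itemId, result);    // dual write to every sink
        return result;
    }

    // Entry point 1: real-time transaction event, one item at a time.
    String onItemEvent(long itemId) { return process(itemId); }

    // Entry point 2: scheduled batch job; at most `limit` items are in flight at once.
    void runBatch(List<Long> itemIds, ExecutorService pool) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(itemIds.size());
        for (long id : itemIds) {
            limiter.acquire();
            pool.execute(() -> {
                try { process(id); } finally { limiter.release(); done.countDown(); }
            });
        }
        done.await();
    }

    void setConcurrency(int n) { limiter.setLimit(n); }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, String> mysql = new ConcurrentHashMap<>();
        Map<Long, String> vectorStore = new ConcurrentHashMap<>();
        Sink mysqlSink = (itemId, r) -> mysql.put(itemId, r);
        Sink vectorSink = (itemId, r) -> vectorStore.put(itemId, r);
        UnifiedWorkflow wf = new UnifiedWorkflow(Arrays.asList(mysqlSink, vectorSink), 4);

        ExecutorService pool = Executors.newFixedThreadPool(8);
        wf.runBatch(Arrays.asList(1L, 2L, 3L), pool);  // offline path
        wf.setConcurrency(2);                          // throttle, e.g. when GPUs are contended
        wf.onItemEvent(4L);                            // real-time path, same logic
        pool.shutdown();
        System.out.println(mysql.size() + " " + vectorStore.size());
    }
}
```

Keeping `process` as the only place with business logic is what removes the duplicated online/offline code the old architecture suffered from.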
Evaluation & Results
An automated evaluation pipeline measures new vs. old attribute recall, ranking, and conversion impact. Metrics such as V‑value accuracy, completeness, readability, and consistency are scored using LLM‑based semantic assessment. The system achieved:
Coverage of billions of items with significantly higher information completeness.
Positive lift in search and detail page conversion rates.
Development cycle reduced to ~1 person‑week per new feature.
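As an illustration of the comparison step, the sketch below computes attribute recall for an old and a new pipeline against a manually labeled sample and averages per-dimension quality scores. The scores are assumed to come from an LLM judge and are passed in as plain numbers; the dimension names follow the article, but the 0 to 1 scale, the equal weighting, and all sample data are assumptions for illustration only, not the team's reported figures.

```java
import java.util.*;

public class AttributeEval {
    // Recall = matched gold attributes / total gold attributes, pooled over all items in the sample.
    static double recall(Map<Long, Set<String>> gold, Map<Long, Set<String>> predicted) {
        int hit = 0, total = 0;
        for (Map.Entry<Long, Set<String>> e : gold.entrySet()) {
            Set<String> pred = predicted.getOrDefault(e.getKey(), Collections.<String>emptySet());
            for (String attr : e.getValue()) {
                total++;
                if (pred.contains(attr)) hit++;
            }
        }
        return total == 0 ? 0.0 : (double) hit / total;
    }

    // Unweighted mean of per-dimension scores (e.g. accuracy, completeness, readability, consistency).
    static double meanScore(Map<String, Double> scores) {
        double sum = 0;
        for (double s : scores.values()) sum += s;
        return scores.isEmpty() ? 0.0 : sum / scores.size();
    }

    public static void main(String[] args) {
        Map<Long, Set<String>> gold = new HashMap<>();
        gold.put(1L, new HashSet<>(Arrays.asList("gpuBrand=NVIDIA", "memory=12GB")));
        gold.put(2L, new HashSet<>(Arrays.asList("color=black")));

        Map<Long, Set<String>> oldPipeline = new HashMap<>();
        oldPipeline.put(1L, new HashSet<>(Arrays.asList("memory=12GB")));

        Map<Long, Set<String>> newPipeline = new HashMap<>();
        newPipeline.put(1L, new HashSet<>(Arrays.asList("gpuBrand=NVIDIA", "memory=12GB")));
        newPipeline.put(2L, new HashSet<>(Arrays.asList("color=black")));

        Map<String, Double> judge = new LinkedHashMap<>();   // hypothetical LLM-judge scores
        judge.put("accuracy", 0.9);
        judge.put("completeness", 0.8);
        judge.put("readability", 0.85);
        judge.put("consistency", 0.95);

        System.out.printf("old recall %.2f, new recall %.2f, judge mean %.3f%n",
                recall(gold, oldPipeline), recall(gold, newPipeline), meanScore(judge));
    }
}
```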
Future Outlook
Emerging Agent frameworks (Harness, Skill, etc.) promise richer planning, memory, and tool orchestration. The team plans to integrate these to deepen semantic understanding, modularize domain expertise, and build self‑adapting decision mechanisms, moving the product domain from "single‑point efficiency" to "systemic autonomy".
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
