
Applying Large Models to Xiao AI Assistant: Intent Routing, Understanding, and Response Generation

This article presents a comprehensive technical overview of how large language models are integrated into Xiaomi's Xiao AI assistant, detailing the architecture for intent routing, domain‑specific intent understanding, function‑calling mechanisms, fine‑tuning strategies, performance gains, and future research directions.

DataFunSummit

The presentation introduces Xiao AI, an omnipresent AI assistant covering voice, visual, translation, and call functions across devices such as phones, speakers, TVs, and Xiaomi cars.

It explains the motivation behind adopting large models after the ChatGPT wave, highlighting a 10% increase in next‑day user retention and an 8% improvement in query satisfaction.

Large‑Model Intent Routing: A dedicated model classifies incoming queries and routes them to vertical agents. The two main challenges are knowledge requirements (e.g., distinguishing system settings from device commands) and latency constraints (under 200 ms). Prompt engineering with few‑shot examples was used initially, but token limits led to a shift toward model fine‑tuning.
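The few‑shot prompting stage can be sketched as follows. This is an illustrative outline only: the domain names, example queries, and the `call_llm` callable are hypothetical stand‑ins, not Xiao AI's actual routing taxonomy or model interface.

```python
# Hypothetical domain labels and few-shot examples (not Xiao AI's real taxonomy).
DOMAINS = ["system_settings", "device_control", "music", "chitchat"]

FEW_SHOT = [
    ("Turn up the screen brightness", "system_settings"),
    ("Turn on the living room light", "device_control"),
    ("Play some jazz", "music"),
]

def build_routing_prompt(query: str) -> str:
    """Assemble a few-shot classification prompt for the routing model."""
    lines = ["Classify the user query into one of: " + ", ".join(DOMAINS), ""]
    for q, d in FEW_SHOT:
        lines.append(f"Query: {q}\nDomain: {d}\n")
    lines.append(f"Query: {query}\nDomain:")
    return "\n".join(lines)

def route(query: str, call_llm) -> str:
    """Route a query; call_llm is any text-completion callable.

    Unknown labels fall back to chitchat rather than failing."""
    label = call_llm(build_routing_prompt(query)).strip()
    return label if label in DOMAINS else "chitchat"
```

Every few‑shot example added to `FEW_SHOT` lengthens the prompt on every request, which is exactly the token‑limit and latency pressure that motivated moving this knowledge into the model weights via fine‑tuning.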

Fine‑Tuning Process: The workflow consists of two steps: continued pre‑training on Xiao AI dialogue data mixed with generic NLP data (at a ratio of 10:1 to 15:1), followed by instruction fine‑tuning. Experiments show a 2% accuracy gain on the evaluation set, with further improvements when few‑shot examples are added.
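The data‑mixing step might look like the sketch below, assuming the stated 10:1 domain‑to‑generic ratio. The function name and sampling scheme are illustrative, not Xiaomi's actual pipeline.

```python
import random

def mix_datasets(domain_data, generic_data, ratio=10, seed=0):
    """Mix domain dialogue samples with generic NLP samples at ratio:1
    (e.g. 10 domain samples per 1 generic sample), then shuffle.

    Illustrative sketch of the continued pre-training data mix."""
    n_generic = max(1, len(domain_data) // ratio)
    rng = random.Random(seed)
    mixed = list(domain_data) + rng.sample(
        generic_data, min(n_generic, len(generic_data)))
    rng.shuffle(mixed)
    return mixed
```

Keeping a slice of generic data in the mix is a common guard against catastrophic forgetting of general language ability while the model absorbs assistant‑specific dialogue patterns.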

Domain‑Specific Intent Understanding: Traditional Intent+Slot pipelines are replaced by a function‑calling approach. All APIs are abstracted as functions with defined parameters; the model decides whether to invoke a function and supplies the required arguments, enabling multi‑turn interactions and reducing training data requirements by up to 95%.
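A minimal sketch of this abstraction is shown below. The function names, parameter schemas, and JSON call format are assumptions for illustration; the article does not specify Xiao AI's actual API surface or wire format.

```python
import json

# Hypothetical function registry; names and parameters are illustrative.
FUNCTIONS = {
    "set_alarm": {
        "description": "Set an alarm at a given time",
        "parameters": {"time": "HH:MM, 24-hour"},
        "impl": lambda time: f"Alarm set for {time}",
    },
    "play_music": {
        "description": "Play music by artist or genre",
        "parameters": {"query": "artist, song, or genre"},
        "impl": lambda query: f"Playing {query}",
    },
}

def dispatch(model_output: str) -> str:
    """Parse the model's JSON function call and invoke the matching API.

    If the model chose not to call a function, its text is returned as-is."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain text reply, no function call
    if not isinstance(call, dict):
        return model_output
    fn = FUNCTIONS.get(call.get("name"))
    if fn is None:
        return "Sorry, I can't do that yet."  # safe fallback
    return fn["impl"](**call.get("arguments", {}))
```

The training‑data saving follows from the shared format: instead of labeling intent/slot corpora per domain, each new capability only needs a function definition plus a handful of call examples.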

Function‑Calling Challenges: The main difficulties are ensuring 100% instruction compliance, handling function dependency chains, and reducing inference latency. Solutions include a lightweight LLMPlanner‑TaskFetching‑Executor framework (similar to LLMCompiler) and token‑reduction techniques such as high‑compression base models, vocabulary expansion, and token replacement.
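The planner/fetcher/executor idea can be sketched as a small dependency‑graph runner: tasks whose dependencies are satisfied are fetched and executed in parallel. Here the task graph is hand‑written; in the real framework a planner LLM would emit it. All names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

# Hand-written task graph standing in for a planner LLM's output.
tasks = {
    "get_city":    {"deps": [],           "fn": lambda: "Beijing"},
    "get_weather": {"deps": ["get_city"], "fn": lambda city: f"Sunny in {city}"},
    "get_traffic": {"deps": ["get_city"], "fn": lambda city: f"Light traffic in {city}"},
}

def run_plan(tasks):
    """Fetch tasks whose dependencies are resolved and execute them,
    running independent tasks concurrently (LLMCompiler-style)."""
    results, pending = {}, dict(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [n for n, t in pending.items()
                     if all(d in results for d in t["deps"])]
            futures = {n: pool.submit(pending[n]["fn"],
                                      *[results[d] for d in pending[n]["deps"]])
                       for n in ready}
            for n, f in futures.items():
                results[n] = f.result()
                del pending[n]
    return results
```

Because `get_weather` and `get_traffic` depend only on `get_city`, they run in the same round rather than serially, which is where the latency savings come from.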

Response Generation: Issues of timeliness, long‑context handling, and strict instruction adherence are addressed with Retrieval‑Augmented Generation (RAG) and fine‑tuned models. Desired capabilities include knowledge summarization, information extraction, complex reasoning, multi‑turn consistency, and safe fallback responses.
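A toy sketch of the RAG step follows: retrieve supporting documents, then build a grounded prompt that instructs the model to stay within the retrieved knowledge. The corpus and the keyword‑overlap scoring are illustrative; production systems would use vector search over a fresh index.

```python
# Tiny in-memory corpus standing in for a real, regularly refreshed index.
CORPUS = [
    "Xiaomi SU7 is an electric car released in 2024.",
    "Xiao AI runs on phones, speakers, TVs, and cars.",
]

def retrieve(query: str, k: int = 1) -> list:
    """Rank documents by naive keyword overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(CORPUS, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_rag_prompt(query: str) -> str:
    """Assemble a grounded prompt with an explicit safe-fallback instruction."""
    docs = "\n".join(retrieve(query))
    return ("Answer using only the knowledge below; "
            "if it is insufficient, say you don't know.\n"
            f"Knowledge:\n{docs}\n"
            f"Question: {query}\nAnswer:")
```

The explicit "say you don't know" instruction corresponds to the safe‑fallback capability listed above; retrieval addresses timeliness by letting the index, not the model weights, carry recent facts.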

Fine‑Tuning Details: The steps involve optimizing single‑skill training data (evaluated automatically by GPT‑4‑like models), mixing data ratios for instruction fine‑tuning, and constructing preference data for DPO training in a four‑tuple format: <Query, Knowledge, Win_response, Lose_response>. Instruction fine‑tuning plus DPO yields a 2% increase in response satisfaction and a 10% improvement over pure prompt engineering.
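Assembling one preference record in that four‑tuple format might look like the sketch below. The field names and the `judge` callable (standing in for an automatic GPT‑4‑style evaluator) are hypothetical.

```python
def make_preference_record(query, knowledge, resp_a, resp_b, judge):
    """Build one DPO record <Query, Knowledge, Win_response, Lose_response>.

    judge(query, knowledge, response) -> score; higher is better.
    In practice the judge would be an automatic LLM evaluator."""
    win, lose = ((resp_a, resp_b)
                 if judge(query, knowledge, resp_a) >= judge(query, knowledge, resp_b)
                 else (resp_b, resp_a))
    return {"query": query, "knowledge": knowledge,
            "win_response": win, "lose_response": lose}
```

Carrying the retrieved `knowledge` alongside each pair lets the DPO objective reward responses that stay grounded in the supplied context rather than merely fluent ones.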

Future Outlook: Exploration of end‑to‑end multimodal models (e.g., GPT‑4o, Gemini) and on‑device large models for privacy, while acknowledging that the current divide‑and‑conquer architecture remains the most effective solution for now.

The article concludes with acknowledgments of the speakers and references to related technical talks.

Tags: Large Language Models, Fine-tuning, Function Calling, NLP, AI Assistant, Intent Routing, Xiao AI
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
