
Exploring and Practicing Generative Chat in OPPO's XiaoBu Assistant

This article presents a comprehensive overview of OPPO's XiaoBu Assistant, detailing its research background, chat skill architecture, evolution from retrieval and rule‑based methods to generative models, industry model comparisons, decoding and ranking strategies, safety mechanisms, performance optimizations, and evaluation results.

DataFunTalk

The talk introduces the research background of XiaoBu Assistant, an AI assistant embedded in OPPO smartphones and IoT devices, highlighting its five core functions and large user base.

It describes XiaoBu's chat capabilities, including witty single‑turn responses, skill‑guided interactions, multi‑turn topic handling, and emotion perception.

The development timeline moves from retrieval‑based chat, to rule‑based chat for query patterns that a fixed corpus cannot exhaustively cover, and finally to generative chat for long‑tail queries, noting the controllability and safety challenges that generation introduces.

Industry generative solutions are surveyed, covering RNN‑based seq2seq, tensor2tensor (self‑attention), GPT series, and UniLM models, with a comparative diagram of their characteristics.

Decoding strategies such as greedy search, beam search, top‑k sampling, and top‑p sampling are explained, followed by answer ranking methods like RCE rank and MMI rank to filter low‑quality responses.
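Of the decoding strategies listed, top‑p (nucleus) sampling is the one the deployed system reportedly uses. A minimal sketch of the idea, using NumPy and illustrative names (not OPPO's actual implementation):

```python
import numpy as np

def top_p_sample(logits, p=0.9, rng=None):
    """Nucleus (top-p) sampling: keep the smallest prefix of tokens,
    sorted by probability, whose cumulative mass reaches p, then
    sample from that renormalized nucleus."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]            # tokens by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # nucleus size
    nucleus = order[:cutoff]
    weights = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=weights))
```

Unlike greedy or beam search, which tend to produce safe but repetitive replies, sampling inside a probability nucleus trades a little determinism for noticeably more varied responses; candidates can then be re‑scored by a ranker such as MMI.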

Evaluation approaches include manual human assessment (safety, relevance, richness, fluency) and automated metrics, acknowledging gaps between automatic scores and real user experience.
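One common automatic metric behind "diversity scores" in this setting is distinct‑n, the ratio of unique n‑grams to total n‑grams across generated replies. A small sketch (the talk does not specify its exact metric implementation, so this is illustrative):

```python
def distinct_n(responses, n=2):
    """Distinct-n: unique n-grams divided by total n-grams over a set
    of generated responses; higher means more varied output."""
    total, unique = 0, set()
    for resp in responses:
        tokens = resp.split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0
```

Metrics like this are cheap to compute but, as the talk acknowledges, correlate only loosely with how satisfying users find the conversation.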

The business practice section outlines the overall response flow: initial retrieval, fallback to rule‑based, then generative; context‑aware handling for multi‑turn queries; safety checks on both queries and answers; and a two‑stage model training pipeline inspired by Baidu's PLATO, with a one‑to‑one (1‑V‑1) phase followed by a one‑to‑many (1‑V‑N) phase.
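The tiered flow described above can be sketched as a simple chain of handlers; all function names here are illustrative, not OPPO's actual API:

```python
def respond(query, context, retrieval, rules, generator):
    """Tiered response flow: try retrieval first, then rule-based
    matching, and only fall back to the generative model for
    long-tail queries neither earlier stage can answer."""
    answer = retrieval(query, context)
    if answer is not None:
        return answer
    answer = rules(query, context)
    if answer is not None:
        return answer
    return generator(query, context)   # generative fallback
```

Ordering the stages this way keeps the cheap, highly controllable handlers in front, so the generative model only bears the traffic the curated systems cannot cover.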

Model design details cover architecture selection (UniLM with BERT‑base initialization), input embeddings (token, context mask, role embedding), training configurations, and decoding using sampling‑rank with top‑p.
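The distinctive pieces of a UniLM‑style setup are the summed input embeddings and the seq‑to‑seq attention mask: context tokens attend bidirectionally among themselves, while response tokens attend to the context and only to earlier response tokens. A toy NumPy sketch under assumed (not reported) dimensions:

```python
import numpy as np

VOCAB, DIM, MAX_LEN, ROLES = 1000, 8, 32, 2   # toy sizes, not the real config
rng = np.random.default_rng(0)
tok_emb  = rng.normal(size=(VOCAB, DIM))
pos_emb  = rng.normal(size=(MAX_LEN, DIM))
role_emb = rng.normal(size=(ROLES, DIM))      # e.g. 0 = user turn, 1 = bot turn

def embed(token_ids, role_ids):
    """Sum token, position, and role embeddings per input position."""
    positions = np.arange(len(token_ids))
    return tok_emb[token_ids] + pos_emb[positions] + role_emb[role_ids]

def unilm_mask(src_len, tgt_len):
    """UniLM seq-to-seq mask: source positions see the whole source
    bidirectionally; target positions see the source plus earlier
    target positions (causal)."""
    n = src_len + tgt_len
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :src_len] = True                  # everyone attends to the source
    for i in range(src_len, n):
        mask[i, src_len:i + 1] = True         # causal within the target
    return mask
```

This mask is what lets a single BERT‑initialized encoder act as both a bidirectional context reader and an autoregressive decoder.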

Safety mechanisms include multi‑stage query filtering, QA pair safety modeling, and data sanitization to remove or replace unsafe training examples.

Performance optimizations involve dynamic batching, ONNX runtime acceleration, and caching of intermediate states, achieving latency reductions on T4 GPUs.
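"Caching of intermediate states" in autoregressive decoding typically means storing each layer's keys and values so only the newest token is projected per step. A single‑head toy sketch of that idea (illustrative, not the production code):

```python
import numpy as np

def decode_step(x, W_q, W_k, W_v, cache):
    """One incremental decoding step with a key/value cache: project
    only the new token and append, instead of recomputing the whole
    prefix's keys and values every step."""
    cache["k"].append(x @ W_k)
    cache["v"].append(x @ W_v)
    K = np.stack(cache["k"])                   # (seq_len, dim), reused across steps
    V = np.stack(cache["v"])
    q = x @ W_q
    scores = q @ K.T
    weights = np.exp(scores - scores.max())    # stable softmax over the prefix
    weights /= weights.sum()
    return weights @ V                         # attention output for the new token
```

Combined with dynamic batching and an ONNX Runtime backend, this kind of caching is what turns per‑step cost from quadratic in sequence length into roughly linear.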

Effect analysis shows strong automated diversity scores and high manual satisfaction (≈85%), with examples of multi‑turn conversational quality.

The conclusion reflects on the feasibility of end‑to‑end generative chat, remaining challenges in latency, persona alignment, safety, consistency, and plans to explore multimodal generative QA.

Tags: model optimization, NLP, chatbot, generative AI, dialogue systems, OPPO
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
