How Large Models Are Revolutionizing Douyin’s User Experience – Expert Insights
In a detailed interview, ByteDance AI specialist Cai Conghuai explains how large‑model techniques such as SFT, DPO, and RAG address Douyin's multimodal user‑experience challenges, improve experience‑signal detection and root‑cause analysis, and where AI agents may deliver the next breakthroughs for content platforms.
Douyin, with hundreds of millions of daily active users, faces massive user‑experience challenges. At the DA Digital Intelligence Conference in Shenzhen (July 25‑26), ByteDance algorithm expert Cai Conghuai shared how large models empower intelligent user‑experience solutions, covering experience‑signal recognition, content understanding, and root‑cause diagnosis using SFT, DPO, and RAG technologies.
DataFun: What limits do traditional algorithms have for video recommendation and comment interaction, and how do large models offer new solutions?
Cai: Traditional methods require massive data to learn deep semantic features across multimodal signals (titles, video frames, user profiles, comments), resulting in low ROI. Large models provide strong semantic and multimodal understanding, achieving good performance with zero‑ or few‑shot samples, thus expanding signal channels cost‑effectively.
DataFun: Which core methodologies from your AI work at Tencent transferred to “large‑model‑driven experience intelligence,” and what mindsets or technical habits needed breaking?
Cai: Problem definition, data analysis, model selection, training, evaluation, and iteration are universal. The key shift is moving from purely discriminative formulations to generative tasks that large models can handle.
DataFun: How does the large model enable earlier detection of experience problems in the “experience‑signal recognition” stage, especially for multimodal video and comment data?
Cai: Offline feedback, online customer‑service contacts, and user reports are lagging signals. By analyzing user‑generated video comments semantically, we can spot unreasonable pinned words before users file complaints, using a hierarchical pipeline of traditional and large models enhanced with RAG to boost accuracy at billion‑scale volumes.
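The hierarchical pipeline described above can be sketched as a simple cascade: a cheap first‑stage model screens the full comment stream, and only suspicious items escalate to the expensive large‑model stage, whose input is augmented with retrieved knowledge. This is a minimal illustration with invented function names and a heuristic stand‑in for the first‑stage model, not Douyin's actual implementation.

```python
def cheap_filter(comment: str) -> float:
    """Stage 1: a lightweight heuristic standing in for a small, fast
    traditional classifier that screens billions of comments."""
    negative_terms = {"misleading", "broken", "complaint", "wrong"}
    hits = sum(term in comment.lower() for term in negative_terms)
    return min(1.0, hits / 2)

def llm_stage(comment: str, knowledge: list[str]) -> str:
    """Stage 2 placeholder: in a real system this would call a large model
    whose prompt is augmented with RAG-retrieved knowledge snippets."""
    context = "; ".join(
        k for k in knowledge
        if any(w in comment.lower() for w in k.lower().split())
    )
    return f"flagged (context: {context or 'none'})"

def cascade(comments: list[str], knowledge: list[str]) -> list[tuple[str, str]]:
    """Route each comment: cheap filter first, large model only on escalation."""
    results = []
    for c in comments:
        if cheap_filter(c) >= 0.5:  # only suspicious items reach the LLM
            results.append((c, llm_stage(c, knowledge)))
        else:
            results.append((c, "pass"))
    return results
```

The design point is economic: the large model's cost is paid only on the small residual the cheap stage cannot clear, which is what makes billion‑scale volumes tractable.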
DataFun: How are “quality scores” and “semantic viewpoints” quantified, and do large models replace or complement traditional scoring models?
Cai: We define core business metrics (semantic‑viewpoint accuracy, duplication, missing rates) and conduct manual sampling. Large models can replace traditional models in many scenarios, but for massive scale we adopt a hybrid layered approach, employing full‑parameter fine‑tuning and preference fine‑tuning (DPO) with category and preference annotations.
DataFun: In root‑cause analysis, how do you balance diagnostic accuracy with interpretability? Any use of knowledge graphs or causal reasoning?
Cai: We combine user profiles, behavior logs, A/B experiment data, and release information. Most anomalies stem from experiments or releases, so we model the problem as a matching task. For example, in early 2025 we identified that a "Douyin spark tag" experiment had caused a surge in user complaints; the large model pinpointed it quickly.
DataFun: How do SFT, DPO, and RAG adapt to Douyin’s constraints such as model size, real‑time latency, and data security?
Cai: We fine‑tune a 7B base model, then apply distillation and quantization to reduce resource consumption. DPO aligns the model with business preferences for summarization and semantic‑viewpoint tasks. RAG retrieves relevant knowledge from internal knowledge bases, limiting reliance on massive parametric knowledge and enhancing security.
DataFun: How do you design a scientific evaluation system for the subjective outputs of large models?
Cai: Currently we rely on expert judgment, breaking subjective issues into multiple dimensions. Future work will incorporate user surveys, A/B testing, and annotated knowledge bases for a more rigorous evaluation framework.
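Breaking a subjective judgment into dimensions, as described above, typically means scoring each dimension separately and aggregating. A minimal weighted‑rubric sketch (the dimensions and weights here are invented for illustration; the interview does not specify them):

```python
# Hypothetical rubric: each expert scores three dimensions on a 1-5 scale;
# the weights below are illustrative, not from the interview.
DIMENSIONS = {"accuracy": 0.4, "completeness": 0.3, "readability": 0.3}

def rubric_score(scores: dict[str, float]) -> float:
    """Aggregate per-dimension expert scores into one weighted average."""
    return sum(DIMENSIONS[d] * scores[d] for d in DIMENSIONS)
```

Decomposing the judgment this way also makes disagreement between experts diagnosable: reviewers can see which dimension drove a low aggregate score.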
DataFun: How do you address hallucinations and long‑tail coverage challenges?
Cai: We filter feedback with a quality‑score model and enforce human checks. During inference, RAG pulls high‑quality knowledge from a curated repository to constrain generation. For long‑tail issues, we build dedicated retrieval and recognition pipelines for each high‑risk category.
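Dedicated per‑category pipelines for long‑tail issues, as described above, amount to routing each item to a specialized handler with a safe fallback. A small registry sketch (categories and handler behavior are hypothetical):

```python
# Hypothetical registry mapping high-risk categories to dedicated pipelines.
PIPELINES = {}

def register(category: str):
    """Decorator that registers a handler for one long-tail category."""
    def deco(fn):
        PIPELINES[category] = fn
        return fn
    return deco

@register("payment")
def payment_pipeline(item: str) -> str:
    return f"payment-check:{item}"

@register("minor_safety")
def minor_safety_pipeline(item: str) -> str:
    return f"safety-review:{item}"

def route(category: str, item: str) -> str:
    """Send the item to its dedicated pipeline, or a default queue."""
    handler = PIPELINES.get(category)
    return handler(item) if handler else "default-queue"
```

The fallback queue matters for the long tail itself: items from categories without a dedicated pipeline yet still get handled rather than dropped, and recurring fallback traffic signals where the next pipeline should be built.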
DataFun: What measurable impact has the solution had on Douyin’s metrics, and is it applicable to other content platforms?
Cai: Core metrics improved: reduced negative feedback, increased user satisfaction, higher problem‑resolution rates, and better service efficiency. These challenges are common across content platforms, so the technical approach offers broad reference value.
DataFun: What are the core breakthroughs you foresee for large models in user experience over the next three years?
Cai: The evolution of AI agents—from simple chat to intelligent diagnosis, root‑cause analysis, and multi‑dimensional collaboration—will drive smoother interactions. Technically, we need advances in agent architecture, data collection, fine‑tuning, and model compression. Business‑wise, robust privacy management, standardized knowledge bases, and cross‑team coordination are essential.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.