
Applying Large Language Models to NPC Role‑Playing and Game Localization at Tencent

This article details Tencent's practical exploration of large language model deployment in overseas game scenarios, covering the design of customized NPC role‑playing models, multilingual localization pipelines, data construction, training, evaluation frameworks, multi‑agent improvement loops, and insights from a comprehensive Q&A session.


Tencent shares its experience deploying large language models (LLMs) in overseas game contexts, focusing on two main scenarios: NPC role‑playing and game localization translation.

For NPC role‑playing, generic LLMs often sound overly formal and lack personality, so Tencent builds a specialized model on a million‑scale dataset sourced from novels, scripts, and games, using a "5+3" schema (name, gender, age, personality, and background, plus actions, dialogue style, and knowledge). Training combines targeted fine‑tuning, safety‑question datasets, and direct preference optimization (DPO) to enforce knowledge boundaries.
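To make the "5+3" schema concrete, here is a minimal sketch of a character card rendered into a role-play system prompt. The field names, the `CharacterCard` class, and the prompt wording are illustrative assumptions, not Tencent's actual format.

```python
from dataclasses import dataclass, field

@dataclass
class CharacterCard:
    # five core identity fields ("5")
    name: str
    gender: str
    age: str
    personality: str
    background: str
    # three behavioral fields ("+3")
    actions: list = field(default_factory=list)
    dialogue_style: str = ""
    knowledge: list = field(default_factory=list)

    def to_system_prompt(self) -> str:
        """Render the card as a role-play system prompt with a knowledge boundary."""
        return (
            f"You are {self.name}, a {self.age} {self.gender}. "
            f"Personality: {self.personality}. Background: {self.background}. "
            f"Typical actions: {'; '.join(self.actions)}. "
            f"Speak in a {self.dialogue_style} style. "
            f"You only know about: {', '.join(self.knowledge)}. "
            f"Refuse questions outside that knowledge."
        )

socrates = CharacterCard(
    name="Socrates",
    gender="man",
    age="70-year-old",
    personality="ironic, probing, humble",
    background="Athenian philosopher of the 5th century BC",
    actions=["strokes beard", "paces the agora"],
    dialogue_style="questioning, Socratic",
    knowledge=["ancient Greek philosophy", "Athenian daily life"],
)
prompt = socrates.to_system_prompt()
```

The explicit "refuse questions outside that knowledge" clause mirrors the knowledge-boundary goal that the DPO stage reinforces during training.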

The evaluation framework consists of three tiers: basic language ability, identity‑consistent style and skills, and advanced subjective traits. Benchmark results show that even strong models like GPT‑4o can sound stiff in character, as illustrated by a Socrates dialogue case.
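The three tiers can be combined into a single benchmark score; a minimal sketch follows. The tier weights and the 0–10 scale are assumptions for illustration, not the published framework.

```python
# Assumed weights for the three evaluation tiers (illustrative only).
TIERS = {
    "basic_language": 0.3,     # fluency, coherence
    "identity_style": 0.4,     # staying in character, role skills
    "subjective_traits": 0.3,  # advanced traits judged by humans or an LLM
}

def aggregate(scores: dict) -> float:
    """Weighted average of per-tier scores, each on a 0-10 scale."""
    return sum(TIERS[tier] * scores[tier] for tier in TIERS)

# A model strong on language but stiff in character scores poorly overall.
overall = aggregate({
    "basic_language": 9.0,
    "identity_style": 6.0,
    "subjective_traits": 5.0,
})
```

Weighting identity consistency highest reflects the article's point that raw language ability is not the bottleneck for role-play quality.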

In the localization translation track, three translation categories are identified: in‑game UI, storyline, and user‑generated content plus operational events. Challenges include missing game‑specific terminology, evolving slang, and contextual nuances. Tencent enhances LLMs with retrieval‑augmented generation (RAG), term embeddings, and negative‑sample training that teaches the model to filter noisy retrieval results.
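A minimal sketch of glossary-augmented translation prompting illustrates the RAG idea: retrieve matching terminology and inject it into the prompt. The toy substring match and glossary entries below are assumptions; a real system would use embedding search plus the trained filter to drop noisy hits.

```python
# Toy game-terminology glossary (English -> Chinese); entries are illustrative.
GLOSSARY = {
    "HP": "生命值",
    "buff": "增益",
    "raid": "团队副本",
}

def retrieve_terms(source: str) -> dict:
    """Naive retrieval: keep glossary entries whose term appears in the text."""
    return {s: t for s, t in GLOSSARY.items() if s.lower() in source.lower()}

def build_prompt(source: str, target_lang: str = "Chinese") -> str:
    """Inject retrieved terminology into the translation prompt."""
    terms = retrieve_terms(source)
    term_block = "\n".join(f"{s} -> {t}" for s, t in terms.items())
    return (
        f"Translate the following game text into {target_lang}.\n"
        f"Use these glossary terms exactly:\n{term_block}\n\n"
        f"Text: {source}"
    )

translation_prompt = build_prompt("The buff doubles your HP during the raid.")
```

Negative-sample training then teaches the model to ignore injected terms that the retriever matched spuriously, rather than forcing them into the output.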

A continuous improvement loop (translation → evaluation → correction) is implemented as a multi‑agent chain with MQM scoring, complemented by offline model iteration and online A/B testing that monitors user feedback, usage frequency, and quality metrics, especially for low‑resource languages.
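The translate → evaluate → correct loop can be sketched as three cooperating agents iterating until an MQM-style score clears a threshold. The callables, threshold, and toy scoring below are stand-in assumptions; in practice each agent would be an LLM call and the evaluator would emit real MQM error annotations.

```python
def improve(source, translate, evaluate, correct, threshold=95.0, max_rounds=3):
    """Iterate translate -> evaluate -> correct until the score clears the threshold."""
    draft = translate(source)
    for _ in range(max_rounds):
        score, errors = evaluate(source, draft)
        if score >= threshold:
            break
        draft = correct(source, draft, errors)
    return draft

# Toy agents for demonstration only: the evaluator flags a terminology
# error until the preferred term "hit points" appears in the draft.
translate = lambda s: s.replace("HP", "health")
evaluate = lambda s, d: (100.0, []) if "hit points" in d else (80.0, ["terminology: HP"])
correct = lambda s, d, errs: d.replace("health", "hit points")

result = improve("Restore 50 HP.", translate, evaluate, correct)
```

Capping the rounds keeps online latency bounded; drafts that never clear the threshold can be routed to the offline iteration track instead.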

The Q&A section addresses online vs. offline correction, expert evaluation dimensions, the benefits of multi‑agent designs, NPC memory mechanisms, strategies for mitigating the "curse of multilinguality," and practical data collection methods for low‑resource languages.

Large Language Models · AI evaluation · Tencent · game localization · multilingual translation · NPC AI
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing news and speaker talks from big data and AI industry summits, with regularly released downloadable resource packs.
