
Applying Large Language Models to NPC Role‑Playing and Game Localization at Tencent

This article details Tencent's practical exploration of large language model deployment in overseas game scenarios, covering the design of customized NPC role‑playing models, multilingual localization pipelines, data construction, training, evaluation frameworks, multi‑agent improvement loops, and insights from a comprehensive Q&A session.


Tencent shares its experience deploying large language models (LLMs) in overseas game contexts, focusing on two main scenarios: NPC role‑playing and game localization translation.

For NPC role‑playing, generic LLMs often sound overly formal and lack personality, so Tencent builds a specialized model on a million‑scale dataset sourced from novels, scripts, and games, using a "5+3" schema (name, gender, age, personality, and background, plus actions, dialogue style, and knowledge). Training combines targeted fine‑tuning, safety‑question datasets, and direct preference optimization (DPO) to enforce knowledge boundaries.
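To make the "5+3" schema concrete, here is a minimal sketch of a character card rendered into a role-play system prompt. The field names, the `CharacterCard` class, and the prompt wording are illustrative assumptions, not Tencent's actual format.

```python
from dataclasses import dataclass, field

@dataclass
class CharacterCard:
    # five core identity fields ("5")
    name: str
    gender: str
    age: str
    personality: str
    background: str
    # three behavioral fields ("+3")
    actions: list = field(default_factory=list)
    dialogue_style: str = ""
    knowledge: list = field(default_factory=list)

    def to_system_prompt(self) -> str:
        """Render the card as a role-play system prompt with a knowledge boundary."""
        return (
            f"You are {self.name}, a {self.age} {self.gender}. "
            f"Personality: {self.personality}. Background: {self.background}. "
            f"Typical actions: {'; '.join(self.actions)}. "
            f"Speak in a {self.dialogue_style} style. "
            f"You only know about: {', '.join(self.knowledge)}. "
            f"Refuse questions outside that knowledge."
        )

socrates = CharacterCard(
    name="Socrates",
    gender="man",
    age="70-year-old",
    personality="ironic, probing, humble",
    background="Athenian philosopher of the 5th century BC",
    actions=["strokes beard", "paces the agora"],
    dialogue_style="questioning, Socratic",
    knowledge=["ancient Greek philosophy", "Athenian daily life"],
)
prompt = socrates.to_system_prompt()
```

The explicit "refuse questions outside that knowledge" clause mirrors the knowledge-boundary goal that the DPO stage reinforces during training.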

The evaluation framework consists of three tiers: basic language ability, identity‑consistent style and skills, and advanced subjective traits. Benchmark results show that even strong models like GPT‑4o can sound stiff in character, as illustrated by a Socrates dialogue case.
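The three tiers can be combined into a single benchmark score; a minimal sketch follows. The tier weights and the 0–10 scale are assumptions for illustration, not the published framework.

```python
# Assumed weights for the three evaluation tiers (illustrative only).
TIERS = {
    "basic_language": 0.3,     # fluency, coherence
    "identity_style": 0.4,     # staying in character, role skills
    "subjective_traits": 0.3,  # advanced traits judged by humans or an LLM
}

def aggregate(scores: dict) -> float:
    """Weighted average of per-tier scores, each on a 0-10 scale."""
    return sum(TIERS[tier] * scores[tier] for tier in TIERS)

# A model strong on language but stiff in character scores poorly overall.
overall = aggregate({
    "basic_language": 9.0,
    "identity_style": 6.0,
    "subjective_traits": 5.0,
})
```

Weighting identity consistency highest reflects the article's point that raw language ability is not the bottleneck for role-play quality.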

In the localization translation track, three translation categories are identified: in‑game UI, storyline, and user‑generated content plus operational events. Challenges include missing game‑specific terminology, evolving slang, and contextual nuances. Tencent enhances LLMs with retrieval‑augmented generation (RAG), term embeddings, and negative‑sample training that teaches the model to filter noisy retrieval results.
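A minimal sketch of glossary-augmented translation prompting illustrates the RAG idea: retrieve matching terminology and inject it into the prompt. The toy substring match and glossary entries below are assumptions; a real system would use embedding search plus the trained filter to drop noisy hits.

```python
# Toy game-terminology glossary (English -> Chinese); entries are illustrative.
GLOSSARY = {
    "HP": "生命值",
    "buff": "增益",
    "raid": "团队副本",
}

def retrieve_terms(source: str) -> dict:
    """Naive retrieval: keep glossary entries whose term appears in the text."""
    return {s: t for s, t in GLOSSARY.items() if s.lower() in source.lower()}

def build_prompt(source: str, target_lang: str = "Chinese") -> str:
    """Inject retrieved terminology into the translation prompt."""
    terms = retrieve_terms(source)
    term_block = "\n".join(f"{s} -> {t}" for s, t in terms.items())
    return (
        f"Translate the following game text into {target_lang}.\n"
        f"Use these glossary terms exactly:\n{term_block}\n\n"
        f"Text: {source}"
    )

translation_prompt = build_prompt("The buff doubles your HP during the raid.")
```

Negative-sample training then teaches the model to ignore injected terms that the retriever matched spuriously, rather than forcing them into the output.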

A continuous improvement loop (translation → evaluation → correction) is implemented as a multi‑agent chain with MQM scoring, complemented by offline model iteration and online A/B testing that monitors user feedback, usage frequency, and quality metrics, especially for low‑resource languages.
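The translate → evaluate → correct loop can be sketched as three cooperating agents iterating until an MQM-style score clears a threshold. The callables, threshold, and toy scoring below are stand-in assumptions; in practice each agent would be an LLM call and the evaluator would emit real MQM error annotations.

```python
def improve(source, translate, evaluate, correct, threshold=95.0, max_rounds=3):
    """Iterate translate -> evaluate -> correct until the score clears the threshold."""
    draft = translate(source)
    for _ in range(max_rounds):
        score, errors = evaluate(source, draft)
        if score >= threshold:
            break
        draft = correct(source, draft, errors)
    return draft

# Toy agents for demonstration only: the evaluator flags a terminology
# error until the preferred term "hit points" appears in the draft.
translate = lambda s: s.replace("HP", "health")
evaluate = lambda s, d: (100.0, []) if "hit points" in d else (80.0, ["terminology: HP"])
correct = lambda s, d, errs: d.replace("health", "hit points")

result = improve("Restore 50 HP.", translate, evaluate, correct)
```

Capping the rounds keeps online latency bounded; drafts that never clear the threshold can be routed to the offline iteration track instead.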

The Q&A section addresses online vs. offline correction, expert evaluation dimensions, the benefits of multi‑agent designs, NPC memory mechanisms, strategies for mitigating the "curse of multilinguality," and practical data collection methods for low‑resource languages.

Large Language Models · AI evaluation · Tencent · game localization · multilingual translation · NPC AI
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing news and speaker talks from big data and AI industry summits, with regularly released downloadable resource packs.
