
Practical AI‑Powered Voice Recognition for Game Dialogue Testing: A Step‑by‑Step Case Study

This article presents a detailed case study of using AI speech‑recognition techniques—including acoustic modeling with VGG, pypinyin conversion, feature extraction, and CTC decoding—to automatically verify game dialogue audio against script text, outlining the workflow, challenges, implementation details, and experimental results.

NetEase LeiHuo Testing Center

The article introduces a real‑world AI project aimed at improving the efficiency of testing voice‑over dialogue in the mobile game "Qian Nu" by automatically matching audio files with their corresponding script subtitles.

Product background: Testers previously had to listen to each voice clip in full and manually compare it with the planned subtitles, a time‑consuming process.

Requirement: Use AI to recognize speech and determine whether the spoken content aligns with the textual description.

Solution overview: The script text is converted to tonal pinyin and each unique pinyin token is assigned a numeric ID; a CNN‑based VGG acoustic model with CTC decoding then maps audio to the most likely pinyin sequence. Because the task is verification rather than open transcription, the language‑model stage can be omitted: the acoustic‑model output is compared directly with the expected pinyin.

Implementation steps:

Extract the list of voice files and their expected text from the design documents.

Convert the expected Chinese text to tonal pinyin (e.g., "ni3hao3") using the pypinyin library.

Map each unique pinyin token to a fixed numeric ID.

Transform each audio file into a spectrogram and extract features (mp3 → wav conversion, framing, windowing, spectrogram generation).
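A hedged sketch of this step, assuming the mp3 files have already been converted to 16 kHz mono wav samples (e.g. with ffmpeg) and using common ASR framing defaults (25 ms window, 10 ms hop) rather than the article's exact parameters:

```python
import numpy as np

def spectrogram(samples, frame_len=400, hop=160):
    """Frame the waveform, apply a Hamming window, and take the
    magnitude FFT of each frame. At 16 kHz, frame_len=400 is a 25 ms
    window and hop=160 a 10 ms hop (assumed defaults, not the
    article's)."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([samples[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Magnitude spectrum per frame; shape (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames * window, axis=1))

x = np.random.randn(16000)        # one second of stand-in 16 kHz audio
print(spectrogram(x).shape)       # (98, 201)
```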

Train an AI model (VGG‑based CNN) to map spectrogram features to the numeric pinyin IDs.

Run the trained model on the original voice files to obtain predicted pinyin sequences.

Compare the predicted sequences with the expected ones, allowing for a predefined error set to handle minor oral‑written variations.
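The token‑to‑ID mapping and the final comparison can be sketched as below; the tolerated pairs shown are illustrative placeholders, not the team's actual error set:

```python
def build_vocab(token_seqs):
    """Assign a fixed numeric ID to every unique pinyin token."""
    vocab = {}
    for seq in token_seqs:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

# Pairs of pinyin tokens treated as interchangeable (oral vs. written
# variants). These entries are hypothetical examples only.
TOLERATED = {("a1", "ya1"), ("bo2", "bai2")}

def sequences_match(predicted, expected):
    """Compare predicted and expected pinyin, allowing tolerated swaps."""
    if len(predicted) != len(expected):
        return False
    return all(p == e or (p, e) in TOLERATED or (e, p) in TOLERATED
               for p, e in zip(predicted, expected))

vocab = build_vocab([["ni3", "hao3"], ["shi4", "jie4"]])
print(vocab)  # {'ni3': 0, 'hao3': 1, 'shi4': 2, 'jie4': 3}
print(sequences_match(["ni3", "hao3"], ["ni3", "hao3"]))  # True
```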

Technical challenges addressed:

Linking Unity/FMOD event identifiers to the raw audio files required custom scripting.

Inconsistent references to audio events across multiple design sheets demanded a flexible extraction pipeline.

Oral‑written discrepancies were handled by defining an error‑tolerance dictionary.

Key technologies used:

Acoustic modeling with VGG (a classic CNN) and CTC decoding to collapse repeated symbols and remove blank tokens.

Feature extraction via spectrograms; attempted SpecAugment but ultimately adopted the ASRT feature‑extraction method.

Training data from public Chinese speech corpora (THCHS30 and ST‑CMDS).

Training parameters: batch size 8 (later 16 on external GPU resources), over 250,000 batches, loss convergence observed but accuracy still below target.
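The CTC decoding rule named above (collapse repeated symbols, then drop blank tokens) can be sketched in a few lines; the greedy variant over frame-wise argmax IDs is shown here as an illustration:

```python
def ctc_greedy_collapse(ids, blank=0):
    """Apply the CTC collapse rule to a frame-wise ID sequence:
    merge consecutive repeats, then remove blank tokens."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# Example: blank=0; a blank between repeats keeps a genuine doubled token.
print(ctc_greedy_collapse([1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```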

Results and future work: The prototype achieved modest recognition accuracy; further improvements are needed before production deployment. The team plans to refine the model, experiment with larger batch sizes, and explore additional AI use cases in product testing.

Overall, the case study demonstrates how AI techniques can be integrated into game development pipelines to automate quality‑assurance tasks, reducing manual effort and paving the way for broader AI adoption in testing workflows.

Tags: Python, AI, Speech Recognition, Game Testing, CTC Decoding, pypinyin, VGG
Written by

NetEase LeiHuo Testing Center

LeiHuo Testing Center provides high-quality, efficient QA services, striving to become a leading testing team in China.
