
How Tencent’s Zixiao AI Chip Supercharges Real‑Time Meeting Subtitles

Tencent’s home‑grown Zixiao AI inference chip, combined with the LightRuntime engine, dramatically reduces latency and cost for real‑time subtitles in Tencent Meeting. Through hardware‑software co‑optimization and mixed‑precision model tuning, it handles tens of thousands of concurrent audio streams while meeting sub‑second delay requirements.

Tencent Tech

Accelerating Tencent’s Self‑Developed Chip Portfolio

Tencent is rapidly advancing its custom silicon, including the video codec chip “Canghai” (mass‑produced for cloud gaming), the high‑performance network chip “Xuanling” (delivering zero CPU usage and 4× performance), and the AI inference chip “Zixiao”.

Zixiao in Real‑Time Subtitles for Tencent Meeting

Zixiao has been mass‑produced and deployed across Tencent’s flagship services. In Tencent Meeting, it powers real‑time personalized subtitles: a single Zixiao card delivers performance equivalent to four NVIDIA T4 GPUs, and the timeout rate dropped from 0.005% to zero.

Technical Challenges of Real‑Time Subtitles

During peak periods, the subtitle service must handle over 100,000 concurrent streams with end‑to‑end latency under 1 second. Processing for each utterance must finish within 2 seconds; otherwise, the segment is dropped. Concurrency at this level stresses CPU, GPU, and network resources.

Optimization Strategies on Zixiao

Instant (Transient) Module Acceleration: Previously run on CPU, the instant models were fine‑tuned to remove dynamic components, reducing memory usage and moving inference to Zixiao without accuracy loss.

Steady (Stable) Module Acceleration: Acoustic and rescoring models were ported to Zixiao and scheduled via the custom LightRuntime runtime, maximizing chip utilization.

Thread Framework Optimization: Batch processing threads were replaced with LightRuntime’s group‑batch and scheduling capabilities, eliminating redundant batch threads.

Model Micro‑Tuning to Eliminate Padding Effects

Dynamic input shapes caused padding‑induced errors in position embeddings and attention. The solution introduced a real_length input and a binary Mask to isolate valid frames, allowing static‑shape inference and preserving accuracy.

Key steps:

Query padded subsample cache rows separately.

Query acoustic feature rows using real_length as the start index.

Concatenate the two results as conformer block input.

Apply a mask after softmax to zero out padded positions.
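The last step can be illustrated with a minimal NumPy sketch. It is not the production model code: the function name, the (T, T) score layout, and the assumption that valid frames occupy the first real_length positions are all hypothetical. The source only mentions a mask applied after softmax; it does not say whether scores are also masked beforehand or whether the weights are renormalized, so this sketch does neither.

```python
import numpy as np

def masked_attention_weights(scores, real_length):
    """Softmax over a static padded length T, then zero the padded key positions.

    scores: (T, T) attention logits for a fixed, padded sequence length T.
    real_length: number of valid frames (assumed to come first in the sequence).
    """
    T = scores.shape[-1]
    # Binary mask: 1.0 for valid key frames, 0.0 for padding.
    mask = (np.arange(T) < real_length).astype(scores.dtype)

    # Numerically stable softmax over the full static shape.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=-1, keepdims=True)

    # Apply the mask after softmax so padded positions contribute nothing.
    return weights * mask

# Static shape T=4, only the first 2 frames are real.
example_weights = masked_attention_weights(np.zeros((4, 4), dtype=np.float32), real_length=2)
```

Because the tensor shapes never depend on real_length, the same compiled graph can serve every utterance length up to T, which is what makes static‑shape inference possible.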

Performance Gains

Moving the instant middle‑frame model to Zixiao reduced 128‑stream latency from ~1200 ms to 10 ms and cut CPU usage dramatically. The first‑frame module’s session pool lowered 400‑stream latency from 249 ms to 30 ms.
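The article does not describe how the session pool is built. One common reading is a fixed set of pre‑created inference sessions that streams borrow and return, so no stream pays session start‑up cost on its first frame. The sketch below shows that pattern with the Python standard library only; SessionPool, make_session, and the stand‑in "session" are hypothetical names, not part of Zixiao or LightRuntime.

```python
import queue
from contextlib import contextmanager

class SessionPool:
    """Fixed-size pool of pre-created sessions (hypothetical illustration)."""

    def __init__(self, make_session, size):
        self._idle = queue.Queue(maxsize=size)
        # Pay the creation cost once, up front, instead of per stream.
        for _ in range(size):
            self._idle.put(make_session())

    @contextmanager
    def acquire(self, timeout=None):
        # Blocks until a session is free; raises queue.Empty on timeout.
        session = self._idle.get(timeout=timeout)
        try:
            yield session
        finally:
            # Always return the session, even if inference raised.
            self._idle.put(session)

created = []

def make_session():
    # Stand-in for loading a model; a real session would wrap the device runtime.
    created.append(object())
    return lambda frame: frame * 2

pool = SessionPool(make_session, size=2)
results = []
for frame in (1, 2, 3):
    with pool.acquire() as run:
        results.append(run(frame))
```

Three frames are served by only two sessions here; the pool size bounds device memory while letting concurrent streams reuse warm sessions.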

For mixed‑precision inference, overflow‑prone layers (e.g., MatMul, Mul) were identified and kept in FP32 while the remaining layers ran at lower precision, achieving a good trade‑off between speed and accuracy.
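To see why some layers have to stay in FP32, consider a minimal NumPy sketch. It assumes the lower precision is FP16, which the article does not state, and the helper names and layer table are hypothetical. FP16 cannot represent values above 65504, so a MatMul that accumulates many moderately sized products can overflow to infinity even when every input is small.

```python
import numpy as np

def overflows_in_fp16(op, *inputs):
    """Run op on FP16 copies of the inputs and report any non-finite output."""
    with np.errstate(over="ignore", invalid="ignore"):
        out = op(*(x.astype(np.float16) for x in inputs))
    return not np.all(np.isfinite(out))

def choose_precision(layers):
    """Assign fp32 to layers that overflow on calibration inputs, fp16 otherwise."""
    return {
        name: "fp32" if overflows_in_fp16(op, *inputs) else "fp16"
        for name, (op, inputs) in layers.items()
    }

# Calibration inputs (made up for illustration).
x = np.full((1, 256), 32.0, dtype=np.float32)
w = np.full((256, 1), 32.0, dtype=np.float32)
small = np.full((1, 256), 0.5, dtype=np.float32)

layers = {
    # 256 products of 32 * 32 sum to 262144, above the FP16 maximum of 65504.
    "matmul_big": (np.matmul, (x, w)),
    # 0.5 * 0.5 stays well inside the FP16 range.
    "mul_small": (np.multiply, (small, small)),
}
plan = choose_precision(layers)
```

A real calibration pass would use representative audio features rather than constant tensors, and would also compare accuracy, not just check for infinities.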

LightRuntime Engine Features

LightRuntime provides:

AutoBatch: Dynamically aggregates single‑request batches to improve throughput (≈20% gain for acoustic models).

AutoPadding: Automatic bucketing and padding to optimal tensor shapes, reducing DTU cost.

Multi‑Model Scheduling: Multiple sessions handle different ONNX models, with priority scheduling for high‑importance models.
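LightRuntime’s API is not shown in the article, so the following is only a generic NumPy sketch of the two ideas behind AutoPadding and AutoBatch: pad each request up to the nearest bucket length, then stack requests that share a bucket into one batch. The bucket sizes, function names, and max_batch value are assumptions for illustration.

```python
import numpy as np
from collections import defaultdict

BUCKETS = (16, 32, 64)  # hypothetical static sequence lengths

def pick_bucket(length, buckets=BUCKETS):
    """Return the smallest bucket that fits the given number of frames."""
    for bucket in buckets:
        if length <= bucket:
            return bucket
    raise ValueError(f"length {length} exceeds largest bucket {buckets[-1]}")

def pad_to_bucket(frames, buckets=BUCKETS):
    """Zero-pad (length, feat) frames to the bucket size; also return the real length."""
    length = frames.shape[0]
    padded = np.zeros((pick_bucket(length, buckets),) + frames.shape[1:], dtype=frames.dtype)
    padded[:length] = frames
    return padded, length

def group_batch(requests, max_batch=4):
    """Group padded requests by bucket and stack each group into batches."""
    by_bucket = defaultdict(list)
    for frames in requests:
        padded, real_length = pad_to_bucket(frames)
        by_bucket[padded.shape[0]].append((padded, real_length))

    batches = []
    for bucket in sorted(by_bucket):
        items = by_bucket[bucket]
        for i in range(0, len(items), max_batch):
            chunk = items[i:i + max_batch]
            batch = np.stack([p for p, _ in chunk])           # (batch, bucket, feat)
            lengths = np.array([n for _, n in chunk])         # real lengths for masking
            batches.append((batch, lengths))
    return batches

# Five single requests with different frame counts and 8 features each.
requests = [np.ones((n, 8), dtype=np.float32) for n in (5, 20, 12, 30, 16)]
batches = group_batch(requests)
```

Here five single requests collapse into two device calls with only two distinct tensor shapes. The real lengths travel with each batch so that padded frames can be masked out downstream.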

These capabilities enable Zixiao to handle the full subtitle traffic, covering over 95% of meeting subtitle volume with zero timeout and up to 75% cost savings in extreme scenarios.

Overall Solution

Zixiao accelerates both the transient and steady decoding pipelines. LightRuntime was easy to integrate and required minimal code changes, and the combined system delivers sub‑second latency, high concurrency, and significant cost reductions.
