Architectural Evolution of Bilibili Live Interaction Center
To solve duplicated functionality, legacy code, and scalability limits in Bilibili's live-streaming interaction services, the team created a unified Interaction Center. It abstracts RTC away from business logic; consolidates session, link, UI, scoring, and role management; introduces a shared state machine and tracing; and evolves through a phased, extensible architecture for higher performance and maintainability.
In Bilibili's live streaming business, real‑time audio‑video interaction has become the main communication mode between anchors and between anchors and viewers. To meet the growing demand for interaction, Bilibili offers various real‑time interaction products such as voice chat rooms, link‑mic, and video PK.
Because of rapid business and technical growth, frequent team reorganizations, and accumulating technical debt, the interaction services suffered from duplicated functionality, data silos, and an outdated architecture that no longer met performance and scalability requirements.
The team therefore decided to platform‑ify the interaction center, consolidating all interaction services into a unified platform that reduces system complexity while improving performance and delivery efficiency.
Typical Interaction Business
The main interaction scenarios include voice chat rooms, voice link‑mic, multi‑person link‑mic, and one‑to‑one link‑mic. The most representative example is the video link‑mic feature, which enables anchors to invite viewers or guests for real‑time video calls, displaying the guest video in picture‑in‑picture mode.
Typical Architecture
Two anchors use RTC for video link‑mic, then each anchor’s streaming software merges the streams and pushes them to a video‑cloud CDN for distribution to their respective live rooms. The diagram shows two parallel planes: the live plane and the interaction plane, with RTC and push‑pull streaming as the core communication technologies.
Challenges and Problems
Repeated construction: similar capabilities are built separately for each interaction product, leading to data islands and duplicated development.
Outdated technology: legacy components (PHP, early Go services) and multiple RTC SDK migrations cause complexity.
Lack of documentation and knowledge transfer, making the system hard to understand.
Inefficient development processes lacking automation, compounded by high-QPS demands and scalability bottlenecks.
Technical debt and low code quality affecting maintainability.
To address these, the team abstracted common capabilities into an "Interaction Center" that sits between the live plane and the interaction plane, providing unified session management, link management, and UI rendering.
Platform Evolution
The evolution is divided into two phases:
Phase 1: Provide generic session and link capabilities, abstracting RTC away from business logic.
Phase 2: Consolidate UI rendering and stream management, decoupling client implementations from specific interaction services.
Key Common Features (1.0)
Mic Management: Assign fixed positions for users in voice chat rooms and multi-person video link-mic, ensuring consistent layout and supporting scoring rules based on mic position.
Scoring System: Real-time scoring based on gifts, follows, and stage-specific rules; redesigned to use both synchronous and asynchronous pipelines with caching for low latency.
Role Management: RBAC-based permission model for anchors, participants, administrators, and viewers, with role data stored both in real-time session data and persistent storage.
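To make the mic-management idea concrete, here is a minimal sketch of fixed-position seat assignment. The article does not describe the actual API; the `MicBoard` type and its methods are illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// MicBoard tracks which user occupies each fixed mic position in a room.
// Type and method names are hypothetical, not Bilibili's real interface.
type MicBoard struct {
	seats map[int]int64 // position -> user ID
	size  int           // number of fixed positions in this room
}

func NewMicBoard(size int) *MicBoard {
	return &MicBoard{seats: make(map[int]int64), size: size}
}

// TakeSeat assigns a user to a fixed position, rejecting invalid or
// already-occupied seats so the layout stays consistent across clients.
func (b *MicBoard) TakeSeat(pos int, uid int64) error {
	if pos < 1 || pos > b.size {
		return errors.New("invalid mic position")
	}
	if _, occupied := b.seats[pos]; occupied {
		return errors.New("mic position already taken")
	}
	b.seats[pos] = uid
	return nil
}

// LeaveSeat frees a position when a user drops off the mic.
func (b *MicBoard) LeaveSeat(pos int) {
	delete(b.seats, pos)
}

func main() {
	board := NewMicBoard(8)
	fmt.Println(board.TakeSeat(1, 1001)) // <nil>
	fmt.Println(board.TakeSeat(1, 1002)) // mic position already taken
}
```

Because scoring rules can depend on mic position, keeping the position-to-user mapping authoritative on the server side avoids clients disagreeing about who sits where.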
Limitations of 1.0
The existing implementation cannot automatically switch a two‑person link‑mic to a multi‑person link‑mic, and each interaction mode requires separate development, limiting extensibility.
Phase 2 Enhancements
Unified connection management with a universal multi‑person link capability.
Componentized UI layout system driven by JSON configuration, allowing dynamic adaptation to different interaction modes across iOS, Android, Web, and PC.
Dynamic stream control to support mixed push‑pull scenarios (e.g., some users only pull, others only push audio).
Unified Data and State Machine
Data from multiple services (interact, biz, av) are consolidated to ensure consistency of session and link records. A shared state machine abstracts common lifecycle states (e.g., Initiated, Connecting, Connected, Cancelled, Timeout, Ended) to avoid duplicated state logic across products.
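The shared state machine can be sketched as an explicit transition table over the lifecycle states the article names. Which edges are legal is an assumption here; only the state names come from the source.

```go
package main

import (
	"errors"
	"fmt"
)

// LinkState enumerates the shared lifecycle states listed in the article.
type LinkState int

const (
	Initiated LinkState = iota
	Connecting
	Connected
	Cancelled
	Timeout
	Ended
)

// transitions encodes which state changes are legal. The exact edges are
// an assumption; the point is that every product shares one table instead
// of re-implementing its own lifecycle checks.
var transitions = map[LinkState][]LinkState{
	Initiated:  {Connecting, Cancelled, Timeout},
	Connecting: {Connected, Cancelled, Timeout},
	Connected:  {Ended},
}

type Session struct{ state LinkState }

// To advances the session only along a legal edge, rejecting anything else.
func (s *Session) To(next LinkState) error {
	for _, allowed := range transitions[s.state] {
		if allowed == next {
			s.state = next
			return nil
		}
	}
	return errors.New("illegal state transition")
}

func main() {
	s := &Session{state: Initiated}
	fmt.Println(s.To(Connecting)) // <nil>
	fmt.Println(s.To(Connected))  // <nil>
	fmt.Println(s.To(Connecting)) // illegal state transition
}
```

Centralizing the table means a new interaction product inherits correct lifecycle handling for free, which is the duplication the article says the shared machine eliminates.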
Performance, Monitoring, and Capacity
Key quality‑of‑service metrics are collected, including RTC join success rate, latency, push‑stream success, invitation/application reach rates, QPS of interaction operations, and real‑time user counts. Capacity planning targets a three‑fold increase in concurrent interaction sessions, participants, and peak QPS.
Stress‑test interfaces (FastCreate, Invitation, Apply, Handle, RtcEvent, UniversalInfo) and jobs (CheckChannelStatusTask, CheckLinkTimeoutTask, CheckUserRecordTimeoutTask) are defined to validate scalability.
Tracing and Diagnosis System
To simplify root‑cause analysis across multiple business modules (PK scoring, gifting, etc.), a tracing framework links key events across services. Data points are injected throughout the system, forming a hierarchy of scene → module → key node, each with success/failure status.
A “diagnosis console” aggregates server, client, and RTC events, providing full‑stack tracing, monitoring, and offline analysis for rapid issue resolution.
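The scene → module → key node hierarchy can be sketched as a flat event record that a diagnosis console aggregates. The structure below is illustrative; the article does not specify the actual trace schema.

```go
package main

import "fmt"

// TraceEvent records one key node in the scene -> module -> node hierarchy
// described above, with a success/failure status. Field names are assumed.
type TraceEvent struct {
	TraceID string // links events from server, client, and RTC for one session
	Scene   string // e.g. "video_pk"
	Module  string // e.g. "scoring", "gifting"
	Node    string // e.g. "settle_round"
	Success bool
}

// FailuresByModule groups failed events by scene/module, the kind of rollup
// a diagnosis console could use to surface which module broke first.
func FailuresByModule(events []TraceEvent) map[string]int {
	out := make(map[string]int)
	for _, e := range events {
		if !e.Success {
			out[e.Scene+"/"+e.Module]++
		}
	}
	return out
}

func main() {
	events := []TraceEvent{
		{TraceID: "t1", Scene: "video_pk", Module: "scoring", Node: "settle_round", Success: false},
		{TraceID: "t1", Scene: "video_pk", Module: "gifting", Node: "send_gift", Success: true},
	}
	fmt.Println(FailuresByModule(events)) // map[video_pk/scoring:1]
}
```

Keying everything on a shared `TraceID` is what allows server, client, and RTC events from one session to be stitched into a single full-stack trace.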
Conclusion
The platform‑centric approach balances the need for extensibility with operational stability. While upfront design can limit future adaptability, an evolutionary architecture with well‑defined extension points and low coupling helps accommodate growing user volume and feature demands.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.