Real-Time Audio/Video System Architecture and Key Technologies Based on WebRTC
The article surveys the evolution of live streaming toward low‑latency, interactive scenarios and details the design of a WebRTC‑based real‑time audio/video system, covering RTP/UDP transport, FEC and ARQ loss recovery, congestion control, jitter buffering, echo cancellation, edge‑node path optimization, and a multi‑layer architecture with signaling, routing, and mixing services for scalable, high‑availability PK deployments.
The article introduces the background of live streaming, describing how the industry has evolved from simple entertainment streams to education, e‑commerce, and interactive multi‑host (PK) scenarios. It explains the need for low‑latency, real‑time audio/video (AV) communication, where hosts see each other's streams with sub‑second delay while viewers receive near‑real‑time streams.
2. Real‑time Audio/Video Key Technologies
2.1 Transport Protocol – RTP (Real‑time Transport Protocol) is the standard for AV transport. UDP is preferred for low latency because it tolerates loss and reordering, while TCP incurs retransmission delays. A fallback to TCP is used when UDP is blocked.
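To make the RTP framing concrete, here is a minimal sketch of parsing the fixed 12‑byte RTP header defined in RFC 3550. The function name is my own; the field layout follows the spec.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed 12-byte RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet too short for RTP")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,              # always 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),       # e.g. last packet of a video frame
        "payload_type": b1 & 0x7F,       # maps to a negotiated codec
        "sequence_number": seq,          # used for reordering / loss detection
        "timestamp": ts,                 # media clock, used for jitter/sync
        "ssrc": ssrc,                    # identifies the stream source
    }
```

The sequence number and timestamp are exactly the fields the later sections (loss recovery, jitter buffering) operate on.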
2.2 Packet‑Loss Compensation – Forward Error Correction (FEC) adds redundant packets so that lost data can be reconstructed; WebRTC implements UlpFEC and FlexFEC. Backward error correction includes ARQ (NACK‑based retransmission) and PLC (Packet Loss Concealment) for audio.
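The core idea behind UlpFEC‑style protection can be illustrated with single‑parity XOR recovery: one redundant packet per group lets the receiver rebuild any one lost packet. This is a simplified sketch (real FEC handles variable lengths and multiple losses), with hypothetical function names.

```python
def xor_parity(packets: list[bytes]) -> bytes:
    """Compute one XOR parity packet over a group of equal-length media packets."""
    size = max(len(p) for p in packets)
    parity = bytearray(size)
    for p in packets:
        for i, byte in enumerate(p):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(survivors: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing packet: XOR of survivors and parity.

    Works because a ^ b ^ c ^ (a ^ b ^ c) = 0, so XOR-ing everything
    except the lost packet leaves exactly the lost packet.
    """
    return xor_parity(list(survivors) + [parity])
```

ARQ takes the opposite trade: no redundancy up front, but a NACK round trip before the lost packet can be replayed, which is why WebRTC uses both.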
2.3 Congestion Control – Bandwidth estimation adjusts the video bitrate to network conditions. WebRTC uses Google Congestion Control (GCC), which combines a delay‑based estimator with a loss‑based estimator to compute a target bitrate; receiver feedback reaches the sender via Transport‑CC or REMB RTCP messages.
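The loss‑based half of GCC can be sketched compactly: increase the rate about 5% when loss is under 2%, hold it steady between 2% and 10%, and back off multiplicatively above 10%. These thresholds follow the published GCC description; the function itself is illustrative.

```python
def loss_based_update(rate_bps: float, loss_fraction: float) -> float:
    """One step of a GCC-style loss-based rate controller.

    loss_fraction is the fraction of packets reported lost (0.0 - 1.0).
    """
    if loss_fraction < 0.02:
        return rate_bps * 1.05                    # probe upward gently
    if loss_fraction > 0.10:
        return rate_bps * (1 - 0.5 * loss_fraction)  # back off with loss
    return rate_bps                               # 2-10%: hold steady
```

The delay‑based estimator runs alongside this and the sender takes the minimum of the two estimates.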
2.4 Buffering – Jitter buffers (video) and NetEQ (audio) smooth out network jitter, reordering, and packet loss while maintaining synchronization.
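A jitter buffer's reordering role can be shown with a minimal sketch: packets arrive out of order, are held keyed by sequence number, and are released only once contiguous. Real implementations (and NetEQ especially) also adapt the buffer depth and time‑stretch audio, which this toy class omits.

```python
import heapq

class JitterBuffer:
    """Minimal reordering jitter buffer: release packets in
    sequence-number order once they are contiguous."""

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq
        self.heap: list[tuple[int, bytes]] = []

    def push(self, seq: int, payload: bytes) -> None:
        heapq.heappush(self.heap, (seq, payload))

    def pop_ready(self) -> list[bytes]:
        """Drain every packet whose sequence number is next in line."""
        out = []
        while self.heap and self.heap[0][0] == self.next_seq:
            _, payload = heapq.heappop(self.heap)
            out.append(payload)
            self.next_seq += 1
        return out
```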
2.5 Echo Cancellation – WebRTC’s AEC module removes echo caused by speaker‑mic feedback using delay estimation, linear adaptive filtering, and nonlinear processing.
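The linear adaptive‑filtering stage can be sketched with an NLMS filter: it learns the echo path from the far‑end (loudspeaker) signal and subtracts the estimated echo from the microphone signal. This is a textbook NLMS sketch, not WebRTC's actual AEC, which adds delay estimation and nonlinear suppression around this core.

```python
def nlms_cancel(far: list[float], mic: list[float],
                taps: int = 4, mu: float = 0.5, eps: float = 1e-8) -> list[float]:
    """NLMS adaptive echo canceller (linear stage only).

    far: loudspeaker samples; mic: microphone samples containing echo.
    Returns the error signal, i.e. the echo-cancelled output.
    """
    w = [0.0] * taps               # estimated echo-path filter
    buf = [0.0] * taps             # most-recent-first delay line of far-end
    out = []
    for x, d in zip(far, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))    # predicted echo
        e = d - y                                     # residual after cancel
        norm = sum(xi * xi for xi in buf) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        out.append(e)
    return out
```

With a stationary echo path the residual shrinks toward zero as the filter converges.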
2.6 Optimal Path Selection – For geographically distant hosts, routing through intermediate acceleration nodes reduces latency; global deployment of edge nodes enables path optimization.
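Choosing a relay chain over measured inter‑node latencies is a shortest‑path problem; a Dijkstra sketch makes the idea concrete. The node names and latency figures are invented for illustration.

```python
import heapq

def best_path(latency: dict, src: str, dst: str) -> tuple[list[str], float]:
    """Dijkstra over measured inter-node latencies (ms) to pick the
    lowest-latency relay chain between two edge nodes."""
    dist = {src: 0.0}
    prev: dict[str, str] = {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                     # stale queue entry
        for v, w in latency.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    path, node = [], dst                 # walk back from dst to src
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return list(reversed(path)), dist[dst]
```

This captures why a relay through an intermediate region can beat the direct route when the direct link is congested or geographically long.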
3. WebRTC Analysis – WebRTC’s architecture consists of an interface layer (Web API, C++ API), session layer (signalling, media negotiation), engine layer (audio/video codecs, SRTP, congestion control, echo cancellation, etc.), and device I/O layer (capture/rendering, network I/O). Most key technologies are already provided by WebRTC, allowing teams to focus on integration and customization.
4. System Architecture for Bilibili (B‑Station) PK – The design combines a signaling service (rtc‑service), a health‑check job (rtc‑job), a routing service (rtc‑router), and a mixing service (rtc‑mixer). Hosts connect through a layer‑4 (transport‑level) acceleration network to the nearest edge node; media streams are either P2P (with STUN/TURN) or server‑relayed. Mixing is performed on the client or the server depending on device load and bandwidth. High availability is ensured through multi‑region deployment, health checks, and failover mechanisms.
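The failover behavior described above can be sketched as a node‑selection step: pick the lowest‑latency edge node that currently passes health checks, falling back to other regions when one is down. This is an illustrative sketch only; the function, node records, and node IDs are assumptions, not the article's actual rtc‑job/rtc‑router interfaces.

```python
def pick_node(nodes: list[dict], healthy) -> dict:
    """Pick the lowest-RTT edge node that passes health checks.

    nodes:   [{"id": ..., "rtt_ms": ...}, ...] (hypothetical schema)
    healthy: callable(node_id) -> bool, fed by the health-check job
    """
    candidates = [n for n in nodes if healthy(n["id"])]
    if not candidates:
        raise RuntimeError("no healthy edge nodes available")
    return min(candidates, key=lambda n: n["rtt_ms"])
```

When the preferred node fails its health check, the next selection simply lands on the best surviving node, which is the essence of the failover path.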
5. Conclusion – The article provides a comprehensive guide to building a low‑latency real‑time AV system, covering protocols, error correction, congestion control, buffering, echo cancellation, path optimization, and a practical deployment architecture based on WebRTC.
Bilibili Tech