Design and Optimization of a High‑Performance Live‑Streaming Danmaku System
This article details the design and optimization of a high‑throughput live‑streaming danmaku system, covering background requirements, bandwidth challenges, short‑polling versus WebSocket delivery, compression and frequency controls, service splitting with caching and lock‑free ring buffers, and reports successful handling of 700 k concurrent users during a major event.
To better support Southeast Asian live‑streaming services, a danmaku feature was added. The first phase used Tencent Cloud but suffered from stutter and insufficient bullet comments, prompting the development of a custom danmaku system capable of supporting up to one million concurrent users per room.
Problem Analysis
Bandwidth pressure: delivering at least 15 danmaku every 3 seconds means over 3 KB per response per user, roughly 8 kbps per user; at the one‑million‑user target this adds up to an estimated 8 Gbps, against only 10 Gbps of available bandwidth.
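That estimate can be sanity‑checked with a quick back‑of‑the‑envelope calculation (assuming the one‑million‑viewer target from the introduction; figures are illustrative):

```python
# Rough bandwidth estimate for polling delivery of danmaku.
# Assumptions (from the article): ~3 KB of payload per poll,
# one poll every 3 seconds, up to 1,000,000 concurrent viewers.
users = 1_000_000
payload_bytes = 3 * 1024        # just over 3 KB per response
poll_interval_s = 3

bytes_per_user_per_s = payload_bytes / poll_interval_s   # 1 KiB/s per user
bits_per_user_per_s = bytes_per_user_per_s * 8           # ~8 kbit/s per user
total_gbps = bits_per_user_per_s * users / 1e9

print(f"{total_gbps:.1f} Gbps")   # roughly 8 Gbps against a 10 Gbps pipe
```

With barely 2 Gbps of headroom, compression and request throttling are not optional extras but hard requirements.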
Weak networks: unstable connections cause danmaku stutter and loss.
Performance and reliability: projected QPS exceeds 300 k, requiring robust handling during peak events such as Double‑Eleven.
Bandwidth Optimization
Enable HTTP compression; gzip can achieve over 40 % reduction.
Simplify response structure.
Content ordering optimization: group string fields together and numeric fields together so that similar byte patterns sit adjacent, improving the compression ratio.
Frequency control.
Bandwidth control: add a request‑interval parameter so the server can throttle client request frequency during traffic spikes.
Sparse control: during periods with few or no danmaku, adjust the next request time to avoid unnecessary client requests.
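A small sketch of the compression step: gzipping a repetitive JSON batch of danmaku easily clears the 40 % mark cited above (the field names and payload here are made up for illustration, not the production schema):

```python
import gzip
import json

# A batch of 15 danmaku as the server might return them in one poll
# (field names are illustrative, not the production schema).
batch = [{"uid": 10000 + i, "nick": f"user{i}", "text": "gooooal!" * 3,
          "ts": 1670000000 + i} for i in range(15)]

raw = json.dumps(batch).encode()
packed = gzip.compress(raw)

saving = 1 - len(packed) / len(raw)
print(f"{len(raw)} B -> {len(packed)} B ({saving:.0%} saved)")
```

Because the field names and value shapes repeat across entries, gzip's dictionary matching does very well on this kind of payload, which is also why grouping similar fields together helps.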
Danmaku Stutter and Loss Analysis
The most common dilemma when building a danmaku system is choosing between push and pull delivery.
Long Polling via AJAX
The client opens an AJAX request that remains pending until the server has an event to return. Enabling HTTP Keep‑Alive can further reduce handshake latency.
Advantages: reduces polling frequency, low latency, good browser compatibility. Disadvantages: the server must maintain many concurrent connections.
WebSockets
WebSocket provides true bidirectional communication with minimal header overhead (2‑10 bytes for server‑to‑client frames, plus 4 bytes mask for client‑to‑server). It offers stronger real‑time capabilities and keeps the connection alive.
Advantages: low control overhead, full‑duplex communication, better real‑time performance. Disadvantages: still requires a persistent connection and may not survive weak networks.
Long Polling vs WebSockets
Both rely on long‑lived TCP connections. TCP keep‑alive probes detect broken connections based on the keepalive_time, keepalive_intvl, and keepalive_probes parameters. In the weak networks common in Southeast Asia, connections drop frequently, making both approaches unsuitable.
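For reference, those keep‑alive thresholds can also be tuned per socket rather than system‑wide; a minimal sketch (the TCP_KEEP* options are Linux‑specific, and the values here are illustrative, not the production settings):

```python
import socket

# Per-socket TCP keep-alive tuning. SO_KEEPALIVE is portable;
# the fine-grained thresholds below are Linux-specific.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
if hasattr(socket, "TCP_KEEPIDLE"):  # Linux only
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # keepalive_time
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # keepalive_intvl
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # keepalive_probes
keepalive_on = bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
s.close()
print(keepalive_on)
```

Even with aggressive tuning, probes only detect breakage after the fact; they do not make a flaky last‑mile link reliable, which is what pushed the team away from persistent connections.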
Consequently, the team adopted a short‑polling strategy to deliver danmaku.
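A short‑polling client can fold in the request‑interval control described earlier: the server returns the next poll interval along with each batch, so it can throttle clients during spikes and stretch the interval when a room is quiet. A minimal sketch, with fetch, render, and stop as hypothetical callables:

```python
import time

def poll_loop(fetch, render, stop):
    """Short-polling client loop.

    fetch(since)  -> (batch, next_interval_s, new_since); stands in for
                     the HTTP poll (hypothetical interface).
    render(batch) -> displays the danmaku batch.
    stop()        -> True when the client should exit the loop.
    """
    since = 0
    while not stop():
        # The server controls pacing: it returns how long to wait
        # before the next request.
        batch, next_interval_s, since = fetch(since)
        render(batch)
        time.sleep(next_interval_s)
```

The key design point is that pacing lives on the server side; clients never hard‑code a poll rate, so back‑pressure during a spike is a one‑line server change.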
Reliability and Performance
To ensure stability, the service was split: a complex sending side and a high‑frequency pulling side. This prevents the pull service from overwhelming the send service and vice versa, facilitating independent scaling and clearer service boundaries.
On the pull side, a local cache is introduced. The service periodically RPCs the danmaku store, caches results in memory, and serves subsequent requests directly from the cache, dramatically reducing latency and isolating failures.
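The pull‑side cache might look like the following sketch, where fetch stands in for the periodic RPC to the danmaku store (the interface names are assumptions, not the production API):

```python
import threading

class DanmakuCache:
    """Pull-side cache: refresh from the store on a timer, serve every
    client read from memory. One RPC per refresh interval replaces one
    RPC per client request."""

    def __init__(self, fetch):
        self._fetch = fetch        # hypothetical RPC to the danmaku store
        self._snapshot = []
        self._lock = threading.Lock()

    def refresh(self):
        # Called periodically by a background timer, not per request.
        data = self._fetch()
        with self._lock:
            self._snapshot = data

    def get(self):
        # Client-facing read path: memory only, no downstream call.
        with self._lock:
            return list(self._snapshot)
```

Besides cutting latency, this isolates failures: if the store or an RPC hiccups, clients keep receiving the last good snapshot instead of errors.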
Data is sharded by time using a ring buffer that retains only the latest 60 seconds. The tail pointer advances each second, storing timestamps and associated danmaku lists. Reads traverse the buffer backward from the tail, ensuring ordered data with high read efficiency.
Writes come from a single thread, so there is no write contention. Reads need no locks either: clients fetch at most the last 30 seconds of data, which keeps readers well clear of the slot the write pointer is advancing into.
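The time‑sharded ring described above can be sketched as follows (a simplified illustration; slot count and method names are assumptions, not the production code):

```python
SLOTS = 60  # one slot per second, latest 60 s retained

class DanmakuRing:
    """Time-sharded ring buffer. A single writer owns the tail and
    advances it once per second; readers walk backward from the tail
    and skip the slot currently being written, so no locks are needed."""

    def __init__(self):
        self.slots = [(0, []) for _ in range(SLOTS)]  # (timestamp, danmaku)
        self.tail = 0

    def write(self, ts, danmaku_list):
        # Called once per second by the single writer thread.
        self.tail = (self.tail + 1) % SLOTS
        self.slots[self.tail] = (ts, danmaku_list)

    def read_since(self, since_ts, max_slots=30):
        # Start one slot behind the tail, never touching the slot the
        # writer may be filling; cap the walk at 30 s of history so
        # readers cannot wrap around into the write pointer.
        out = []
        idx = (self.tail - 1) % SLOTS
        for _ in range(max_slots):
            ts, items = self.slots[idx]
            if ts <= since_ts:
                break
            out.append((ts, items))
            idx = (idx - 1) % SLOTS
        return list(reversed(out))
```

The 30‑slot read cap against a 60‑slot buffer is what makes the lock‑free claim hold: the writer and any reader are always separated by at least 30 seconds of slots.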
On the sending side, rate limiting discards excess danmaku, and graceful degradation (e.g., fallback avatar fetching or profanity filtering) ensures core functionality remains unaffected.
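The send‑side limiter can be as simple as a fixed‑window counter that drops overflow rather than queueing it (limit and window size below are illustrative):

```python
import time

class SendLimiter:
    """Fixed-window limiter: beyond `limit` danmaku per window, extras
    are silently dropped rather than queued. Shedding load this way
    keeps the send path responsive under a flood."""

    def __init__(self, limit, window_s=1.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.window_s:
            # New window: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False   # danmaku discarded, not queued
```

Dropping rather than queueing is deliberate: in a danmaku flood no viewer can read every message anyway, so losing the overflow is cheaper than letting a backlog build.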
Conclusion
During the Double‑12 promotion the system supported 700 k concurrent users and met its performance targets, even through a brief Redis outage.
Readers are invited to discuss the design, ask questions, and join the author’s ChatGPT community for further AI‑related resources and side‑project opportunities.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.