Design and Optimization of a High‑Performance Bullet Chat System for Southeast Asian Live Streaming
This article details the design, bandwidth optimization, and reliability strategies of a custom bullet‑chat system for Southeast Asian live streaming, covering background challenges, problem analysis, compression, request throttling, long‑polling versus WebSocket trade‑offs, and a short‑polling solution that successfully supported 700 k concurrent users.
Background
To better support Southeast Asian live streaming, a bullet‑chat feature was added. The first version, powered by Tencent Cloud, suffered from frequent stutters and sparse message delivery, prompting the development of an in‑house system capable of handling up to one million concurrent users per room.
Problem Analysis
The system faces three main issues:
Bandwidth pressure – with up to one million users each pulling 15 messages every 3 seconds, and each response (payload plus HTTP headers) exceeding 3 KB, the estimated data rate is about 8 Gbps, while only 10 Gbps of bandwidth is available.
Weak networks causing stutter and loss – already observed in production.
Performance and reliability – projected QPS can surpass 300 k, demanding robust handling during peak events.
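The 8 Gbps estimate follows directly from the figures above; a quick back‑of‑the‑envelope check:

```python
# Sanity check of the bandwidth estimate, using the article's own figures.
users = 1_000_000      # peak concurrent viewers per room
interval_s = 3         # each client polls once every 3 seconds
packet_bytes = 3_000   # ~3 KB per response, HTTP headers included

requests_per_s = users / interval_s          # ≈ 333,333 requests/s
gbps = requests_per_s * packet_bytes * 8 / 1e9
print(f"{gbps:.1f} Gbps")                    # ~8 Gbps against 10 Gbps available
```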
Bandwidth Optimization
Four measures were taken:
Enable HTTP compression (gzip can reduce size by over 40%).
Simplify response structures.
Reorder content to increase redundancy, improving compression ratios.
Frequency control, in two forms:
Bandwidth control: the server returns a request‑interval parameter with each response, so it can throttle how often clients poll.
Sparse control: during low‑traffic periods, the server delays the client's next request to avoid unnecessary calls.
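The compression and redundancy measures can be illustrated with the standard library's gzip on a mock polling response. The field names below are illustrative, not the system's actual schema, but the repetitive structure is what makes bullet‑chat payloads compress well:

```python
import gzip
import json

# 15 bullet messages with repetitive structure, similar to one polling response.
messages = [
    {"user": f"user{i % 50}", "text": "Nice stream!", "badge": "fan"}
    for i in range(15)
]
raw = json.dumps(messages).encode("utf-8")
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)  # well under 0.6 on redundant JSON
```

Grouping similar fields together (the "reorder content" measure) increases local redundancy, which is exactly what DEFLATE's back‑references exploit.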
Bullet‑Chat Stutter and Loss Analysis
Choosing a delivery mechanism (push vs pull) is critical.
Long Polling via AJAX
The client opens an AJAX request that the server holds until an event occurs. Enabling HTTP Keep‑Alive reduces handshake overhead. Advantages: lower latency, good browser compatibility. Disadvantages: the server must maintain many connections.
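The hold‑until‑event behavior can be sketched without any network, using a queue to stand in for the event source (names are illustrative):

```python
import queue
import threading

def long_poll(events: queue.Queue, hold_seconds: float) -> list:
    # Server side: block until a bullet message arrives or the hold expires;
    # on timeout the client simply re-issues the request.
    try:
        return [events.get(timeout=hold_seconds)]
    except queue.Empty:
        return []

events = queue.Queue()
# Simulate a bullet message arriving 100 ms into the held request.
threading.Timer(0.1, events.put, args=("new bullet",)).start()
result = long_poll(events, hold_seconds=5.0)
```

The server answers as soon as the event fires rather than after a fixed interval, which is where long polling's latency advantage comes from.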
WebSockets
WebSocket provides true bidirectional communication with minimal header overhead (2‑10 bytes for server‑to‑client frames, plus 4 bytes mask for client‑to‑server). It reduces per‑request overhead compared to HTTP and offers stronger real‑time capabilities.
Long Polling vs WebSockets
Both rely on TCP long connections. TCP keep‑alive probes detect disconnections based on three parameters:
tcp_keepalive_probes : number of unanswered probes before the connection is declared dead (default 9 on Linux)
tcp_keepalive_time : idle time before the first probe is sent (default 7,200 s, i.e. 2 hours)
tcp_keepalive_intvl : interval between successive probes (default 75 s)
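These knobs can also be tightened per socket rather than system‑wide. The values below are illustrative, chosen to detect a dead peer in tens of seconds instead of the default two hours; the `TCP_KEEP*` options are Linux‑specific, hence the guard:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Linux-only per-socket overrides of the sysctl defaults.
if hasattr(socket, "TCP_KEEPIDLE"):
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)   # idle seconds before first probe
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # unanswered probes before reset
```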
In weak Southeast Asian networks, TCP connections often drop, making detection intervals critical. For Long Polling the shortest detection interval is min(keepalive_intvl, polling_interval) , while for WebSockets it is min(keepalive_intvl, client_sending_interval) . Because connections may already be broken when the next packet is sent, TCP keep‑alive offers limited benefit, and WebSockets also struggle under poor network conditions.
Given these constraints, the team adopted a “short‑polling” approach for bullet‑chat delivery.
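A short‑polling client that honors the server‑controlled interval from the bandwidth section might look like the sketch below. The response fields (`messages`, `next_interval_ms`) are assumed names, not the system's actual protocol:

```python
import time

def poll_loop(fetch, rounds: int = 3) -> list:
    # Client side: each response carries a server-chosen next interval,
    # which lets the backend throttle polling frequency under load.
    seen = []
    for _ in range(rounds):
        resp = fetch()
        seen.extend(resp["messages"])
        time.sleep(resp["next_interval_ms"] / 1000)
    return seen

# Stub server that stretches the interval as load grows.
state = {"calls": 0}
def fetch():
    state["calls"] += 1
    return {"messages": [f"msg{state['calls']}"],
            "next_interval_ms": 10 * state["calls"]}

collected = poll_loop(fetch)
```

Because each request is a fresh, short‑lived exchange, a dropped connection costs at most one polling interval, which is precisely why short polling tolerates weak networks better than a held connection.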
Reliability and Performance
The service was split into two parts: a sending side handling complex logic and a pulling side serving high‑frequency read requests. This separation prevents one side from overwhelming the other and simplifies scaling.
On the pulling side, a local cache stores recent bullet messages fetched via RPC. Data is sharded by time into a ring buffer that retains only the latest 60 seconds, enabling lock‑free reads and writes and high throughput.
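The time‑sharded ring buffer can be sketched as follows. This is a single‑writer illustration of the idea, assuming one slot per second and wholesale slot recycling; the production system's internals may differ:

```python
import time

class TimeShardedBuffer:
    # Ring buffer sharded by second: each second maps to a fixed slot, and a
    # slot is recycled wholesale when its second falls out of the window,
    # so readers never need a lock to skip stale data.
    def __init__(self, window_seconds: int = 60):
        self.window = window_seconds
        self.slots = [(None, []) for _ in range(window_seconds)]  # (second, messages)

    def append(self, msg, now=None):
        sec = int(now if now is not None else time.time())
        idx = sec % self.window
        slot_sec, msgs = self.slots[idx]
        if slot_sec != sec:
            self.slots[idx] = (sec, [msg])  # stale slot: overwrite in one step
        else:
            msgs.append(msg)

    def read_since(self, since_sec: int, now=None) -> list:
        sec = int(now if now is not None else time.time())
        out = []
        for s in range(max(since_sec, sec - self.window + 1), sec + 1):
            slot_sec, msgs = self.slots[s % self.window]
            if slot_sec == s:  # skip slots recycled by a newer second
                out.extend(msgs)
        return out

buf = TimeShardedBuffer()
buf.append("hello", now=100)
buf.append("world", now=100)
buf.append("later", now=102)
recent = buf.read_since(100, now=102)
```

Anything older than the 60‑second window is silently overwritten, which matches bullet chat's semantics: stale messages have no value to a live viewer.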
On the sending side, rate limiting discards excess messages, and auxiliary features (avatar fetching, profanity filtering) are designed to fail gracefully, ensuring core message delivery continues.
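The discard‑on‑overflow policy can be shown with a minimal per‑second counter; the class and parameter names are illustrative, not the system's actual API:

```python
import time

class DiscardingLimiter:
    # Sending-side limiter: when a room exceeds its per-second budget,
    # excess bullet messages are dropped rather than queued, so a burst
    # can never build up unbounded backlog.
    def __init__(self, per_second: int):
        self.per_second = per_second
        self.window = None
        self.count = 0

    def allow(self, now=None) -> bool:
        sec = int(now if now is not None else time.time())
        if sec != self.window:
            self.window, self.count = sec, 0  # new second: reset the budget
        if self.count < self.per_second:
            self.count += 1
            return True
        return False  # over budget: discard this message

limiter = DiscardingLimiter(per_second=2)
decisions = [limiter.allow(now=1) for _ in range(3)]  # third call is rejected
```

Dropping rather than queueing is the right trade‑off here for the same reason as the 60‑second buffer: a bullet message delivered late is worth nothing, and viewers cannot read thousands of messages per second anyway.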
Summary
During the Double‑12 event, despite a brief Redis outage, the system reliably supported 700 k concurrent users, meeting the performance goals.