
Design and Optimization of a High‑Throughput Long‑Connection Service for Live Streaming

The article details a Golang‑based high‑throughput long‑connection service for live‑streaming, describing its five‑layer architecture, multi‑protocol support, load‑balancing, message‑queue decoupling, aggregation with brotli compression, multi‑region deployment, priority channels, and future enhancements for observability and intelligent endpoint selection.

Bilibili Tech

In the digital entertainment era, bullet-screen comments (danmu) have become an essential interactive element on live-streaming platforms. Real-time interaction such as sending danmu or gifts requires a persistent network channel, i.e., a long connection, to push information to the client instantly.

A long‑connection is a network data channel that stays alive for the whole application lifecycle, supporting full‑duplex data transfer. Unlike short‑connection request/response models, it enables the server to push data to users proactively.

This article introduces a Golang‑based long‑connection service, covering its framework design and the optimizations made for stability and high throughput.

Framework Design

The service is shared by multiple business lines, so the design must accommodate diverse requirements while keeping the service boundaries clear to avoid coupling with business logic.

The long‑connection service consists of three main aspects:

Connection establishment, maintenance, and management.

Downstream data push.

Upstream data forwarding (currently heartbeats only).

Overall Architecture

The architecture is divided into five layers:

Control Layer: Pre-connection checks, authentication, token generation, and routing control.

Access Layer: Core long-connection handling, including TLS certificate offloading, protocol adaptation, connection-ID/room-ID mapping, and upstream/downstream message processing.

Logic Layer: Business-level functions such as online-user reporting and connection-attribute recording.

Message Distribution Layer: Message packaging, compression, aggregation, and dispatch to edge nodes.

Service Layer: Business service entry point for downstream push, permission control, message validation, and rate limiting.

Core Processes

The long‑connection follows three core processes:

Establishing the connection: The client obtains a valid token and access point configuration via the control layer.

Maintaining the connection: The client sends periodic heartbeats to keep the connection alive.

Downstream push: Business servers trigger a push, the service layer determines the target connection, the distribution layer forwards the message to the appropriate access node, which finally delivers it to the client.

Feature List

Based on Bilibili’s live‑streaming scenarios, the service provides the following generic push capabilities:

User‑level messages (e.g., invitation for PK).

Device‑level messages (e.g., log‑upload commands for unauthenticated devices).

Room‑level messages (e.g., danmu broadcast to all users in a room).

Region‑level messages (e.g., promotional activity to all rooms in a specific region).

Global messages (e.g., platform‑wide notifications).
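The five push scopes above can be modeled as a message envelope that the distribution layer routes on. The type and field names below are illustrative, not Bilibili's actual schema.

```go
package main

// Scope mirrors the five push levels: user, device, room, region, global.
type Scope int

const (
	ScopeUser Scope = iota
	ScopeDevice
	ScopeRoom
	ScopeRegion
	ScopeGlobal
)

// Message is a hypothetical envelope for a downstream push.
type Message struct {
	Scope  Scope
	Target string // user ID, device ID, room ID, or region name; empty for global
	Body   []byte
}

// TargetKey returns the fan-out key the distribution layer would route on.
func (m Message) TargetKey() string {
	switch m.Scope {
	case ScopeUser:
		return "user:" + m.Target
	case ScopeDevice:
		return "device:" + m.Target
	case ScopeRoom:
		return "room:" + m.Target
	case ScopeRegion:
		return "region:" + m.Target
	default:
		return "global"
	}
}
```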

High‑Throughput Optimizations

With millions of concurrent users during peak events (e.g., the S‑Series finals), the system faces message rates of over 100 million per second. The following measures were taken to sustain performance.

1. Network Protocol

Three protocols are supported:

TCP – reliable, suitable for high‑reliability scenarios.

UDP – unreliable but low‑latency, used where occasional loss is acceptable.

WebSocket – bidirectional communication for web clients with moderate overhead.

The access layer separates protocol handling from connection management, allowing new protocols to be added without affecting core business logic.

2. Load Balancing

The control layer exposes HTTP short-connection endpoints that dynamically select the optimal access node based on client location and edge-node health. Horizontal scaling of the access layer and dynamic node addition/removal keep CPU and memory usage stable even as online users approach ten million.
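The selection logic can be sketched as a scoring function over candidate nodes. The fields and weights below are assumptions; the article only says selection considers client location and node health.

```go
package main

// Node describes an edge access node as the control layer might see it.
type Node struct {
	Addr    string
	Region  string
	Healthy bool
	Load    float64 // 0.0 (idle) .. 1.0 (saturated)
}

// pickNode returns the healthy, least-loaded node, strongly preferring the
// client's own region; it returns "" when no node is available.
func pickNode(nodes []Node, clientRegion string) string {
	best, bestScore := "", -1.0
	for _, n := range nodes {
		if !n.Healthy || n.Load >= 1.0 {
			continue // skip unhealthy or saturated nodes
		}
		score := 1.0 - n.Load
		if n.Region == clientRegion {
			score += 1.0 // illustrative same-region bonus
		}
		if score > bestScore {
			best, bestScore = n.Addr, score
		}
	}
	return best
}
```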

3. Message Queue

Introducing a message queue and a dedicated distribution layer decouples business push from edge‑node delivery, improving concurrency and preventing bottlenecks in the service layer.

4. Message Aggregation

During hot events, a single room may generate millions of identical messages. By aggregating messages per room and sending them in batches, the QPS of the distribution layer to the access layer drops by about 60 %.

5. Compression Algorithms

After aggregation, message payloads become larger, so compression is applied. Two widely used algorithms were evaluated: zlib and brotli.

Test results (average compressed size; ratio = compressed size ÷ original size):

| Scenario | Original Size | zlib Size | zlib Ratio | brotli Size | brotli Ratio | brotli vs zlib Savings |
| --- | --- | --- | --- | --- | --- | --- |
| 2 messages | 1126 | 468 | 42% | 390 | 35% | 17% |
| 10 messages | 4706 | 1728 | 37% | 1423 | 30% | 18% |
| 20 messages | 9505 | 2674 | 28% | 2172 | 23% | 19% |
| 40 messages | 19387 | 3161 | 16% | 2488 | 13% | 20% |

Brotli consistently outperformed zlib, so it was adopted. Compression is performed at the distribution layer to avoid repeated work on edge nodes, improving throughput and reducing bandwidth costs.

Service Guarantees

Because many business flows depend on reliable push, the following safeguards were implemented.

1. Multi‑Active Deployment

Identical service instances are deployed across East, South, and North China, as well as Singapore for overseas users. Automatic failover ensures continuity when a region experiences a fault.

2. High/Low Priority Channels

Messages are classified by importance. Critical messages (e.g., PK invitations) use a high‑priority channel, while less critical ones (e.g., regular danmu) use a low‑priority channel, providing physical isolation and preferential delivery.
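In Go, preferential delivery between two channels is commonly expressed with a nested select: only when no high-priority message is waiting does the receiver fall through to the low-priority channel. This is a generic pattern sketch, not Bilibili's implementation, which the article says uses physically isolated channels.

```go
package main

// next returns the next message to deliver, always draining the
// high-priority channel before considering the low-priority one.
func next(high, low <-chan []byte) ([]byte, bool) {
	select {
	case m, ok := <-high:
		return m, ok
	default:
		// no high-priority message waiting; fall through
	}
	select {
	case m, ok := <-high:
		return m, ok
	case m, ok := <-low:
		return m, ok
	}
}
```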

3. "High‑Reach" (高达) Function

To guarantee end-to-end delivery, each message carries a unique msgID. The client deduplicates idempotently and ACKs receipt, and the server retries undelivered messages within a configurable window. With per-attempt delivery rate r and n retries, the final delivery rate is 1 - (1 - r)^(n+1). For example, with r = 97% and n = 2, the delivery rate reaches 99.9973%.

Other Optimizations

Enter/Exit Room Messages : To avoid loss of room‑join/leave notifications, a state‑machine driven by heartbeats is used, providing idempotent handling and compensation mechanisms.

Future Plans

The service is stable after several iterations, and future work will focus on:

Data‑driven observability: full‑link network quality metrics and high‑value message tracing.

Intelligence: automatic endpoint selection and connection establishment based on environment.

Performance: sharing goroutines in the access layer’s connection module to reduce goroutine count and increase per‑node capacity.

Feature expansion: adding offline message support and other capabilities.

Written by Bilibili Tech, which provides introductions and tutorials on Bilibili-related technologies.