How We Scaled a Live‑Stream Danmu System from PHP to Go for 50K+ Concurrent Users
Facing massive memory usage and latency in a PHP‑based live‑stream bullet‑chat (danmu) system, we iteratively re‑engineered it by splitting Redis, rate‑limiting broadcasts, and sharding rooms, then rebuilt it in Go with distributed room management, concurrent broadcasting, and extensive testing. The result runs stably with tens of thousands of concurrent connections.
Early Danmu System
In 2016, when live streaming surged, our company began optimizing the bullet‑chat (danmu) system. The initial version was built with PHP and a Gateway framework, stored all client IDs in Redis, and ran on three machines behind an LVS load balancer using multiple worker processes.
Basic Situation
Implemented in PHP + Gateway.
Client IDs stored in Redis.
Three machines behind LVS provided the service.
Multi‑process workers handled message delivery.
Problems
Huge memory consumption; a 4‑core, 8 GB machine hit its limit at roughly 500 clients.
Each message required fetching all client IDs for the room from Redis, making Redis and internal bandwidth bottlenecks under high concurrency.
Worker‑process count limited per‑machine concurrency, and excess workers wasted resources.
When a room exceeded 2,000 users, delivery latency could reach about one minute.
Temporary Fixes
Split Redis into a dual‑node, four‑instance setup to disperse load.
Limited the number of broadcast messages per time unit; excess messages were dropped.
For rooms that switched from live to on‑demand, a separate danmu system was used for load shedding.
Sharded a single room into multiple sub‑rooms for message processing.
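The last of these fixes, splitting a hot room into sub-rooms, amounts to deterministic sharding of client IDs. A minimal sketch in Go (the language the system was later rebuilt in); `shardFor` and the FNV hash are illustrative choices, not the original PHP implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor deterministically maps a client ID to one of n sub-rooms,
// so a hot room's members can be split across smaller broadcast groups.
func shardFor(clientID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(clientID))
	return h.Sum32() % n
}

func main() {
	// Split a 2,000-user room into 4 sub-rooms of roughly 500 users each.
	counts := make([]int, 4)
	for i := 0; i < 2000; i++ {
		counts[shardFor(fmt.Sprintf("client-%d", i), 4)]++
	}
	fmt.Println(counts)
}
```

Because the mapping is a pure function of the client ID, every server agrees on which sub-room a client belongs to without coordination.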
Effect After Temporary Fixes
Redis pressure reduced dramatically.
Single‑machine I/O pressure lowered.
Same hardware could support more live rooms.
However, the fundamental issues remained, prompting a full redesign.
New Danmu System
Challenges
A single room may host 50K–100K concurrent users.
Sudden traffic spikes during popular streams.
Strict real‑time delivery requirements; high latency degrades interaction.
Each message must be delivered over many long‑lived connections.
Efficient management of massive numbers of long‑lived connections.
Support for user/IP blacklists and sensitive‑word filtering.
Requirements
Choose a language with better memory handling for long‑running high‑concurrency services (Go).
Distributed architecture to horizontally scale to tens of thousands of users per room.
Easy integration of third‑party messages (gifts, system notices).
Prefer in‑memory management for client connections, minimizing database interactions.
Concurrent broadcast support to improve efficiency.
Refactor Approach
Adopt Go as the development language for its strong concurrency support.
Each server manages only the connections it receives.
Implement concurrent room‑wide broadcasting.
Room Management (Code)
```go
type RoomInfo struct {
	RoomID         string
	Lock           *sync.Mutex // room operation lock
	Rows           []*RowList  // slice of rows in the room
	Length         uint64      // total node count in the room
	LastChangeTime time.Time   // last update time
}

type RowList struct {
	Nodes []*Node // list of nodes
}
```

Each client connection is wrapped into a `Node` and placed into a `RowList` belonging to its room.
```go
type Node struct {
	RoomID       string
	ClientID     int64
	Conn         *websocket.Conn
	UpdateTime   time.Time
	LastSendTime time.Time // last message send time
	IsAlive      bool      // connection health flag
	DisabledRead bool      // whether speaking permission is disabled
}
```

Nodes are grouped into slices; each slice is processed by its own goroutine, which sends to its nodes sequentially, while the room lock protects the structure during concurrent operations.
Message Management (Code)
```go
var (
	messageChannel map[string]chan nodeMessage
	channelLock    sync.Mutex // guards messageChannel against concurrent access
)

func init() {
	messageChannel = make(map[string]chan nodeMessage)
}

func sendMessageToChannel(roomId string, nm nodeMessage) error {
	channelLock.Lock()
	c, ok := messageChannel[roomId]
	if !ok {
		// First message for this room: create its channel and room object,
		// then start the per-room daemon goroutines.
		c = make(chan nodeMessage, 1024)
		messageChannel[roomId] = c

		roomObj := &RoomInfo{
			RoomID: roomId,
			Rows:   make([]*RowList, 0, 4),
			Lock:   &sync.Mutex{},
		}
		go daemonReciver(c, roomObj) // broadcast goroutine
		go timerForClean(c)          // cleanup goroutine
		if roomId == "" {            // empty ID denotes the top-level hall
			go CleanHall(roomObj)
		}
	}
	channelLock.Unlock()
	c <- nm // send outside the lock so a full channel cannot stall other rooms
	return nil
}
```

Each room has its own message channel, stored in `messageChannel`. Incoming messages are pushed onto the channel, and dedicated goroutines handle broadcasting and cleanup.
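A minimal sketch of such a receiver goroutine, assuming a trimmed-down `nodeMessage` with just a payload (the real struct in roomManager carries more fields) and an injected `deliver` callback in place of the room broadcast:

```go
package main

import "fmt"

type nodeMessage struct{ Body string }

// daemonReceiver drains a room's channel; in the real system each
// message would be filtered and then broadcast to the room's RowLists.
func daemonReceiver(ch <-chan nodeMessage, deliver func(nodeMessage)) {
	for nm := range ch { // exits when the room's channel is closed
		deliver(nm)
	}
}

func main() {
	ch := make(chan nodeMessage, 1024)
	done := make(chan struct{})
	var got []string
	go func() {
		daemonReceiver(ch, func(nm nodeMessage) { got = append(got, nm.Body) })
		close(done)
	}()
	ch <- nodeMessage{Body: "hi"}
	ch <- nodeMessage{Body: "bye"}
	close(ch)
	<-done
	fmt.Println(len(got)) // 2
}
```

The buffered channel absorbs traffic spikes while the single receiver keeps per-room message ordering.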
Server Management
A top‑level chatroom is created; all servers connect to it, and messages received by any server are broadcast to the others through this shared room.
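One way to picture this hall-based relay, with servers simulated as in-process channels (`hall`, `relayMsg`, and `publish` are illustrative names, not the repository's API):

```go
package main

import "fmt"

// relayMsg is what a server forwards through the hall: the target
// room plus the payload to re-broadcast locally.
type relayMsg struct {
	RoomID string
	Body   string
}

// hall stands in for the top-level chatroom every server connects to;
// here each server's connection is modeled as a buffered channel.
type hall struct{ servers []chan relayMsg }

func (h *hall) publish(m relayMsg) {
	for _, s := range h.servers {
		s <- m // each server then broadcasts to its local room nodes
	}
}

func main() {
	h := &hall{}
	inboxA := make(chan relayMsg, 1)
	inboxB := make(chan relayMsg, 1)
	h.servers = []chan relayMsg{inboxA, inboxB}

	// Server A receives a danmu locally and publishes it to the hall;
	// both servers (including A) see it and deliver to room 42.
	h.publish(relayMsg{RoomID: "42", Body: "gg"})
	fmt.Println((<-inboxA).Body, (<-inboxB).Body) // gg gg
}
```

Because each server only manages its own connections, the hall is the single point where cross-server fan-out happens.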
Daemon Goroutine Management
Message‑sending goroutine: pulls messages from the channel and concurrently sends them to all `RowList` instances in the room.
Room‑cleanup goroutine: periodically removes dead nodes and reorganizes room structures to improve efficiency.
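The cleanup pass can be sketched as a filter over each row's nodes, reusing the `IsAlive` flag from the `Node` struct above; `cleanRows` is an illustrative name, and the ticker that drives it periodically is omitted:

```go
package main

import "fmt"

type Node struct{ IsAlive bool }

type RowList struct{ Nodes []*Node }

// cleanRows drops dead nodes and compacts each row in place, so the
// broadcast goroutines stop wasting time on closed connections.
func cleanRows(rows []*RowList) (kept int) {
	for _, r := range rows {
		alive := r.Nodes[:0] // filter in place, reusing the backing array
		for _, n := range r.Nodes {
			if n.IsAlive {
				alive = append(alive, n)
			}
		}
		r.Nodes = alive
		kept += len(alive)
	}
	return kept
}

func main() {
	row := &RowList{Nodes: []*Node{{IsAlive: true}, {IsAlive: false}, {IsAlive: true}}}
	fmt.Println(cleanRows([]*RowList{row})) // 2
}
```

In the real system this runs under the room lock, since the broadcast goroutines iterate the same rows concurrently.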
Testing
Environment: Cloud VM, 8 CPU / 16 GB.
OS: CentOS 7 (no special tuning).
Test: 15 000 WebSocket connections in a single room, each sending a message that passes blacklist and sensitive‑word filters, then broadcast.
CPU usage: < 5%.
Memory usage: 2 GB (including OS).
Network: peak ~10 Mb/s.
Broadcast latency for 15 000 nodes: 100‑110 ms.
Result: an 8‑core / 16 GB machine can comfortably handle 50K concurrent connections, with peak capacity near 60K–70K.
More Sharing
The core implementation has been open‑sourced at https://github.com/logan-go/roomManager for interested readers.
UCloud Tech
UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.