Evolution and Design of Bilibili Customer Service Seat Scheduling System
The article traces Bilibili’s customer‑service seat scheduling system from its initial balanced‑distribution algorithm and Redis‑based priority queues for live chat and ticket handling, through fairness‑focused saturation limits and virtual‑queue mechanisms, to planned dynamic tuning and expertise‑aware routing for future scalability.
This article provides a comprehensive technical analysis of the evolution of Bilibili's customer service seat scheduling system, covering both online chat and ticket‑based support. It explains why sophisticated scheduling is required, especially during large‑scale events that generate sudden spikes in traffic.
Background and Challenges
The core goal is to achieve better user experience with fewer service agents while avoiding waste of human resources. Challenges include heterogeneous agent skills, varying service durations, unpredictable traffic bursts, and the need to respect agents' work/rest cycles.
Online Seat Scheduling – Phase 1
Four primary allocation strategies were evaluated:
Balanced distribution – evenly spreads incoming requests among agents.
Familiar‑customer priority – routes a user to an agent who previously handled them.
Last‑service priority – assigns the request to the agent who served the user most recently.
Designated allocation – assigns based on custom business rules.
The balanced distribution was chosen as the default, implemented with a Strategy Pattern to allow easy switching to other policies.
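The policy switch described above can be sketched as follows. This is a minimal illustration, not Bilibili's implementation: the names AllocationStrategy, BalancedDistribution, LastServicePriority, and Scheduler are hypothetical, and agent load is simplified to a dictionary of active session counts.

```python
from typing import Dict, Optional, Protocol


class AllocationStrategy(Protocol):
    """Hypothetical interface shared by every allocation policy."""

    def pick(self, user_id: str, loads: Dict[str, int]) -> Optional[str]:
        ...


class BalancedDistribution:
    """Pick the agent with the fewest active sessions."""

    def pick(self, user_id: str, loads: Dict[str, int]) -> Optional[str]:
        return min(loads, key=loads.get) if loads else None


class LastServicePriority:
    """Prefer the agent who served this user most recently, if available."""

    def __init__(self, last_agent: Dict[str, str], fallback: AllocationStrategy):
        self.last_agent = last_agent  # assumed lookup: user_id -> agent name
        self.fallback = fallback

    def pick(self, user_id: str, loads: Dict[str, int]) -> Optional[str]:
        agent = self.last_agent.get(user_id)
        if agent in loads:
            return agent
        return self.fallback.pick(user_id, loads)


class Scheduler:
    """Delegates to whichever strategy it was configured with."""

    def __init__(self, strategy: AllocationStrategy):
        self.strategy = strategy

    def assign(self, user_id: str, loads: Dict[str, int]) -> Optional[str]:
        return self.strategy.pick(user_id, loads)
```

Switching policy then means constructing the Scheduler with a different strategy object; the calling code stays unchanged.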
Balanced Distribution Logic
The system compares the current load of two agents (A and B) within the same skill group. Allocation rules are:
If A’s active sessions are fewer than B’s and both are below their saturation limits, new requests go to A.
If the loads are equal and both are under the limits, the request is assigned randomly.
If one agent has reached its saturation limit, the request goes to an agent that is still below its limit.
If all agents are saturated, the request enters a queue.
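The rules above can be sketched as a single selection function. This is an illustrative generalization from two agents to any number of agents in a skill group; the Agent class, its field names, and pick_agent are assumptions, not names from the original system.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    name: str
    active: int  # current active sessions
    limit: int   # saturation limit (maximum concurrent sessions)


def pick_agent(agents: List[Agent],
               rng: Optional[random.Random] = None) -> Optional[Agent]:
    """Return the agent that should take the next request, or None to enqueue."""
    # Saturated agents are never candidates.
    available = [a for a in agents if a.active < a.limit]
    if not available:
        return None  # every agent is saturated: the caller enqueues the request
    # Among the remaining agents, the lightest load wins.
    lowest = min(a.active for a in available)
    candidates = [a for a in available if a.active == lowest]
    # Equal loads are broken randomly.
    return (rng or random).choice(candidates)
```

A None result is the signal to place the request in the waiting queue.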
Queue Management with Redis Zset
The queue is implemented using Redis sorted sets, enabling ordering by enqueue timestamp and fast rank queries. Typical commands are:
ZADD tobeallocated_ticket_list_{group_id} {weight} {ticket_id}
ZRANGE tobeallocated_ticket_list_{group_id} 0 N
ZREM tobeallocated_ticket_list_{group_id} {ticket_id}
ZSCORE tobeallocated_ticket_list_{group_id} {ticket_id}
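The weight combines two components: a high‑order type component (release = 1, transfer = 2, explicit = 3, auto = 4) multiplied by 10¹⁰, plus a low‑order component equal to the seconds elapsed since a fixed start time. Because sorted sets order members by ascending score, a lower type value is served first, and entries of the same type are served in FIFO order (as long as the elapsed seconds stay below 10¹⁰).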
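A small sketch of the weight arithmetic may make the ordering concrete. The start time EPOCH_START and the function name queue_weight are hypothetical; only the type values and the 10¹⁰ factor come from the text.

```python
TYPE_RANK = {"release": 1, "transfer": 2, "explicit": 3, "auto": 4}
TYPE_FACTOR = 10**10
EPOCH_START = 1_600_000_000  # hypothetical fixed start time, in Unix seconds


def queue_weight(kind: str, enqueued_at: int) -> int:
    """Score for ZADD: type in the high-order digits, elapsed seconds below."""
    elapsed = enqueued_at - EPOCH_START
    # The low-order part must not spill into the type digits.
    if not 0 <= elapsed < TYPE_FACTOR:
        raise ValueError("enqueue time is outside the supported range")
    return TYPE_RANK[kind] * TYPE_FACTOR + elapsed


# A release enqueued later still sorts ahead of an auto entry enqueued earlier.
later_release = queue_weight("release", EPOCH_START + 500)
earlier_auto = queue_weight("auto", EPOCH_START + 100)
assert later_release < earlier_auto
```

The largest possible weight is below 5 × 10¹⁰, far under 2⁵³, so it is represented exactly by the double‑precision scores that Redis sorted sets use.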
Virtual Queue (Virtual Waiting Area)
To avoid wasting agent capacity during massive spikes, users beyond a configurable rank are moved to a virtual waiting area. The system probes these users: those who respond within a timeout are promoted back to the normal queue, and the rest are dropped.
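The probe-and-promote flow can be sketched in memory as follows. This is an assumption-laden illustration: the WaitingRoom class, its method names, and the choice to treat the probe as sent at the moment a user enters the virtual area are all hypothetical, and a production version would keep this state in Redis rather than in Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WaitingRoom:
    max_rank: int         # users ranked beyond this go to the virtual area
    probe_timeout: float  # seconds a probed user has to respond
    queue: List[str] = field(default_factory=list)           # normal queue, FIFO
    virtual: Dict[str, float] = field(default_factory=dict)  # user -> probe time

    def enqueue(self, user: str, now: float) -> str:
        if len(self.queue) < self.max_rank:
            self.queue.append(user)
            return "queued"
        self.virtual[user] = now  # assumption: the probe is sent immediately
        return "virtual"

    def on_response(self, user: str, now: float) -> bool:
        """Promote a user who answered the probe in time."""
        sent = self.virtual.pop(user, None)
        if sent is None or now - sent > self.probe_timeout:
            return False  # unknown user or late answer: not promoted
        self.queue.append(user)
        return True

    def expire(self, now: float) -> List[str]:
        """Drop users whose probe timed out and return their ids."""
        dropped = [u for u, sent in self.virtual.items()
                   if now - sent > self.probe_timeout]
        for u in dropped:
            del self.virtual[u]
        return dropped
```

Calling expire periodically removes users who never answered, so agents are not assigned to sessions nobody is waiting on.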
Ticket Seat Scheduling – Phase 1
Ticket handling differs from live chat: a ticket is created first, then assigned to an agent. Allocation must satisfy real‑time, accuracy, fairness, and priority requirements. The same Redis Zset structure is used, with weight‑based ordering to prioritize releases over automatic assignments.
Phase 2 Improvements
Fairness issues were observed when agents with earlier login times received a disproportionate share of tickets. The solution introduced “instant saturation” limits (maximum concurrent tickets) alongside daily caps. Allocation now checks both limits before assigning a ticket, smoothing load across the day and preventing early‑login agents from monopolizing work.
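The dual check can be sketched as below. The TicketAgent fields and the assign_ticket function are hypothetical names, and choosing the eligible agent with the fewest open tickets is an assumption made for illustration; the text only states that both limits are checked before assignment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TicketAgent:
    name: str
    open_tickets: int   # tickets currently being handled
    handled_today: int  # tickets assigned so far today
    instant_limit: int  # "instant saturation": maximum concurrent tickets
    daily_cap: int      # maximum tickets per day

    def can_take(self) -> bool:
        # Both limits must leave room before a ticket is assigned.
        return (self.open_tickets < self.instant_limit
                and self.handled_today < self.daily_cap)


def assign_ticket(agents: List[TicketAgent]) -> Optional[TicketAgent]:
    """Assign one ticket, or return None so the ticket stays queued."""
    eligible = [a for a in agents if a.can_take()]
    if not eligible:
        return None
    chosen = min(eligible, key=lambda a: a.open_tickets)
    chosen.open_tickets += 1
    chosen.handled_today += 1
    return chosen
```

With only a daily cap, an agent who logs in early could absorb tickets until the cap is reached; the concurrent limit forces the next ticket to a less loaded agent once the early agent is busy.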
Phase 3 Outlook
Future enhancements include:
Adjusting virtual‑queue entry criteria based on actual waiting time rather than static rank.
Dynamic tuning of virtual‑queue ratios per skill group using recent session metrics.
Incorporating soft factors such as agent expertise, historical performance, and user feedback to further optimize ticket routing.
Conclusion
The described scheduling system demonstrates how a combination of balanced algorithms, Redis‑based priority queues, and adaptive virtual waiting mechanisms can handle high‑volume, bursty traffic while maintaining service quality and operational efficiency.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.