Redis Cluster Slot Migration Optimization Based on Master-Slave Replication Mechanism

Vivo's storage team redesigned Redis Cluster slot migration: the target node is disguised as a slave of the source node and receives data through master-slave replication with progressive loading. This eliminates ASK/MOVED redirection round trips, reduces latency spikes, consolidates topology updates, and keeps access stable and highly available during horizontal scaling.

vivo Internet Technology

This article describes how vivo's storage team optimized horizontal scaling in Redis Cluster. Native slot migration causes significant latency spikes while data is being migrated, which affects business traffic and can even cause cluster nodes to fail.

Problem Analysis:

Native Redis migration (the MIGRATE command) works at key granularity: for each key, the source node serializes the key-value pair, sends it over the network to the target, waits for an acknowledgment, and then deletes its local copy. These synchronous, blocking steps occupy the worker thread for long periods, especially for large keys, and show up as service latency spikes. In addition, the ASK/MOVED redirection mechanism doubles the network round trips for requests that hit keys already moved to the target, and the frequent topology changes during migration trigger bursts of topology refresh requests from clients, which compete for node resources.
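The extra round trip caused by ASK redirection can be illustrated with a toy model. This is a minimal sketch, not Redis code: the `Node` class, its `handle` method, and `client_get` are hypothetical, the model covers a single slot, and it omits details such as the target replying MOVED when a request arrives without ASKING. It only shows why, midway through a key-by-key migration, requests for keys that have already moved cost two round trips instead of one.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)
    migrating_to: Optional["Node"] = None  # set while this node's slot is MIGRATING

    def handle(self, key):
        """Return ("OK", value) or ("ASK", target) for a single-slot toy node."""
        if key in self.data:
            return ("OK", self.data[key])
        if self.migrating_to is not None:
            # Key is gone from the source: redirect this one request.
            return ("ASK", self.migrating_to)
        return ("OK", None)  # key does not exist

def client_get(node, key):
    """GET that follows at most one ASK redirect; returns (value, round_trips)."""
    round_trips = 1
    status, payload = node.handle(key)
    if status == "ASK":
        # A real client sends ASKING + GET to the target: one more round trip.
        round_trips += 1
        status, payload = payload.handle(key)
    return payload, round_trips

source = Node("source", {"k1": "v1", "k2": "v2"})
target = Node("target")
source.migrating_to = target
# Key-granular migration, halfway through the slot: only k1 has moved so far.
target.data["k1"] = source.data.pop("k1")

print(client_get(source, "k2"))  # ('v2', 1): still served by the source
print(client_get(source, "k1"))  # ('v1', 2): ASK redirect doubles round trips
```

Because the redirect is per request and per key, every access to an already-moved key pays the extra hop until the whole slot is handed over.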

Solution:

Vivo developed a custom slot migration mechanism based on master-slave replication. The target node is disguised as a slave of the source node, and data is transferred over the master-slave replication protocol. The key modifications are: adding an exchange of slot information between the migrating nodes, reorganizing the RDB file so that data is stored in slot order, and recording each slot's data offset as metadata at the end of the RDB file. The target node performs progressive loading: it loads data incrementally as each network packet arrives instead of blocking until all data has been received.
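The slot-ordered layout and progressive loading can be sketched in a few lines. This is a minimal sketch under stated assumptions: the byte layout (per record a 1-byte opcode, a 2-byte slot number, and 4-byte key and value lengths; then an EOF opcode; then a slot-to-offset index and a 4-byte entry count at the very end) and the names `write_snapshot`, `read_index`, and `ProgressiveLoader` are hypothetical. They do not reproduce the real RDB format or vivo's implementation. The sketch shows records grouped by slot with offset metadata appended, and a loader that applies every complete record as soon as a chunk arrives while buffering any partial record until the next chunk.

```python
import struct

REC, EOF = 0x01, 0xFF               # hypothetical opcodes, loosely RDB-style
REC_HDR = struct.Struct(">BHII")    # opcode, slot, key length, value length
IDX_ENT = struct.Struct(">HQI")     # slot, offset of first record, record count

def write_snapshot(slots):
    """Serialize {slot: {key: value}} in ascending slot order, then an EOF
    opcode, then the per-slot offset index and its entry count."""
    out, index = bytearray(), []
    for slot in sorted(slots):
        index.append((slot, len(out), len(slots[slot])))
        for key, val in slots[slot].items():
            out += REC_HDR.pack(REC, slot, len(key), len(val)) + key + val
    out.append(EOF)
    for entry in index:
        out += IDX_ENT.pack(*entry)
    out += struct.pack(">I", len(index))  # trailer: number of index entries
    return bytes(out)

def read_index(blob):
    """Read the slot offset metadata from the end of a complete snapshot."""
    (n,) = struct.unpack_from(">I", blob, len(blob) - 4)
    start = len(blob) - 4 - n * IDX_ENT.size
    return [IDX_ENT.unpack_from(blob, start + i * IDX_ENT.size) for i in range(n)]

class ProgressiveLoader:
    """Apply records as each network chunk arrives instead of waiting
    for the whole snapshot."""
    def __init__(self):
        self.buf = bytearray()
        self.db = {}        # slot -> {key: value}
        self.done = False   # True once the EOF opcode has been consumed

    def feed(self, chunk):
        self.buf += chunk
        pos = 0
        while not self.done and pos < len(self.buf):
            if self.buf[pos] == EOF:
                self.done = True     # the remaining bytes are the index trailer
                pos += 1
                break
            if len(self.buf) - pos < REC_HDR.size:
                break                # header incomplete: wait for more data
            _, slot, klen, vlen = REC_HDR.unpack_from(self.buf, pos)
            end = pos + REC_HDR.size + klen + vlen
            if end > len(self.buf):
                break                # record body incomplete: wait for more data
            kstart = pos + REC_HDR.size
            key = bytes(self.buf[kstart:kstart + klen])
            self.db.setdefault(slot, {})[key] = bytes(self.buf[end - vlen:end])
            pos = end
        del self.buf[:pos]           # keep only the unconsumed bytes

snap = write_snapshot({7: {b"a": b"1"}, 3: {b"b": b"22", b"c": b"333"}})
loader = ProgressiveLoader()
for i in range(0, len(snap), 5):     # deliver the snapshot in 5-byte "packets"
    loader.feed(snap[i:i + 5])

print(loader.db)         # {3: {b'b': b'22', b'c': b'333'}, 7: {b'a': b'1'}}
print(read_index(snap))  # [(3, 0, 2), (7, 29, 1)]
```

Grouping records by slot is what makes the trailing index meaningful: each slot's data is one contiguous byte range, so an offset and a count are enough to locate it.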

Optimization Results:

1) Minimal latency impact: progressive loading reduces the impact of migrating large keys.
2) Stable data access: no ASK/MOVED redirection is needed during migration, which keeps reads and writes consistent.
3) Fewer topology changes: multiple slots can be migrated together and published with a single topology update.
4) High availability: migration state is synced to slave nodes, so after an automatic failover the migration can continue.
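Points 3 and 4 can be illustrated together with a toy model. This is a minimal sketch under stated assumptions: `MigrationState`, `Topology`, `start_migration`, and `finish_migration` are hypothetical names, the phase strings are invented, and the failover's own topology change is not modeled. It contrasts publishing one topology update per slot with publishing one update for a whole batch of slots, and shows a slave holding a replicated copy of the migration record so that it can finish the migration after its master fails.

```python
import copy
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class MigrationState:
    slots: frozenset
    target: str
    phase: str = "TRANSFERRING"   # hypothetical phases: TRANSFERRING -> COMMITTED

@dataclass
class Node:
    name: str
    migration: Optional[MigrationState] = None

class Topology:
    def __init__(self, owner: Dict[int, str]):
        self.owner = dict(owner)  # slot -> owning node name
        self.version = 0          # bumped on every published topology change

    def set_owner(self, slots, node_name):
        """Publish one topology change covering all the given slots."""
        for slot in slots:
            self.owner[slot] = node_name
        self.version += 1

def start_migration(master: Node, slave: Node, slots, target: str):
    master.migration = MigrationState(frozenset(slots), target)
    # Replicate the migration record to the slave so it survives a failover.
    slave.migration = copy.deepcopy(master.migration)

def finish_migration(node: Node, topo: Topology):
    """Whichever node holds the state (master or promoted slave) hands
    every slot in the batch to the target with a single topology update."""
    state = node.migration
    topo.set_owner(state.slots, state.target)
    state.phase = "COMMITTED"
    node.migration = None

slots = range(100, 104)
per_slot = Topology({s: "A" for s in slots})
for s in slots:                    # one topology update per migrated slot
    per_slot.set_owner([s], "B")

batched = Topology({s: "A" for s in slots})
master, slave = Node("A"), Node("A-slave")
start_migration(master, slave, slots, "B")
# Suppose A fails mid-transfer: its promoted slave finishes the migration.
finish_migration(slave, batched)

print(per_slot.version, batched.version)  # 4 1
print(set(batched.owner.values()))        # {'B'}
```

Each topology version bump is a point where clients may refresh their slot maps, so fewer bumps for the same set of slots means fewer bursts of refresh requests.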

Test results show that the custom migration maintains stable latency on both source and target nodes during data transfer, significantly outperforming native migration.
