Scaling PB‑Level Data: Mastering Redis, Codis, and MySQL Sharding
In this technical share, the operations director explains how his team tackled PB‑scale data challenges: scaling Redis with Codis, implementing multi‑dimensional MySQL sharding through vertical and horizontal partitioning, and optimizing storage with TokuDB, offering practical insights for high‑throughput system design.
1. Redis Scaling
We faced a Redis scaling bottleneck: a single Redis process cannot use multiple CPU cores, so CPU utilization hit 95‑100% at 15,000 concurrent connections, delaying task scheduling.
Option 1: Change the program logic to shard across multiple Redis instances with consistent hashing – abandoned due to the development workload.
Option 2: Redis Cluster – rejected because our write‑heavy workload and its limited production adoption at the time made it unsuitable.
Option 3: Codis or Twemproxy – chosen solution.
We finally adopted Codis. Thanks to help from Liu Qi, Codis has run stably for over a year with no incidents and achieved 150,000 QPS.
Architecture Overview
The architecture diagram (below) shows a multi‑layer design with no single points of failure.
Key advantages of Codis:
1. Smooth migration – Codis provides migration tools; moving from Redis to Codis caused zero business impact, though some Redis commands are unsupported.
2. Easy scaling – Data is split into 1024 slots, so the cluster can grow to as many as 1024 Redis instances by adding instances and migrating slots to them.
3. Friendly management UI – Operations such as migration can be performed directly through the web console.
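Codis is protocol‑compatible with Redis, so clients keep talking to codis‑proxy as if it were a single Redis server. Under the hood, each key is hashed into one of the 1024 slots mentioned above (Codis uses CRC32 modulo 1024); a minimal sketch of that mapping:

```python
import zlib

NUM_SLOTS = 1024  # Codis splits the keyspace into 1024 fixed slots


def codis_slot(key: bytes) -> int:
    """Map a key to its Codis slot the way codis-proxy does: CRC32 mod 1024."""
    return zlib.crc32(key) % NUM_SLOTS


# The mapping is deterministic, so a slot (and all keys in it) can be
# migrated to a newly added Redis instance without touching client code.
slot = codis_slot(b"task:12345")
```

Because routing is slot‑based rather than instance‑based, adding capacity only means reassigning slots through the console, which is what makes the migration tooling in point 1 possible.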
2. Using MySQL for PB‑Level Data Storage
Our second challenge was storing petabyte‑scale data. The monitoring service collects data from over 200 global points, with daily growth exceeding 1 TB.
Real scenario: 200+ monitoring points generate logs that are stored centrally; the service serves over 300,000 users.
Storing such log‑like data directly in MySQL is not ideal; we also rely on Elasticsearch, Hadoop, Spark, Suro, Kafka, Druid, and other big‑data frameworks, or simple file storage.
Database Sharding Challenges
We explored both vertical and horizontal sharding.
Vertical Sharding
Initially, multiple databases resided in a single MySQL instance; the high‑I/O database A was moved to its own server, with Mycat providing transparent routing so no code changes were needed.
Horizontal Sharding
Tables are split across multiple shards based on a field using various rules:
Enumeration – e.g., split by province.
Modulo – numeric fields modulo 10, 100, etc.; also string hash modulo.
Range – e.g., ID 100001‑200000.
Date – daily, monthly, hourly partitions.
Consistent hash – handles capacity growth in distributed environments, since adding a node only remaps a small fraction of keys.
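The rules above can be sketched as routing functions. Everything here (the province map, shard counts, node names) is illustrative, not our production configuration:

```python
import bisect
import hashlib
from datetime import date

# Enumeration: a fixed mapping, e.g. by province.
PROVINCE_SHARD = {"beijing": 0, "shanghai": 1, "guangdong": 2}


def shard_by_enum(province: str) -> int:
    return PROVINCE_SHARD[province]


# Modulo: numeric ID (or a string's hash) modulo the shard count.
def shard_by_mod(record_id: int, shards: int = 10) -> int:
    return record_id % shards


# Range: e.g. IDs 1-100000 on shard 0, 100001-200000 on shard 1, ...
def shard_by_range(record_id: int, width: int = 100_000) -> int:
    return (record_id - 1) // width


# Date: one partition per day (monthly/hourly work the same way).
def partition_by_day(d: date) -> str:
    return d.strftime("%Y%m%d")


# Consistent hash: virtual nodes on a ring; adding a node only remaps
# the keys between it and its predecessor, not the whole keyspace.
class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def locate(self, key: str) -> str:
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The trade‑off is routing transparency versus rebalancing cost: enumeration and range are easy to reason about but can develop hot spots, modulo spreads load evenly but forces mass migration when the shard count changes, and consistent hashing limits migration at the price of less predictable placement.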
Example: API Monitoring Data
We use Cobar for two‑level sharding: first by monitoring‑point ID (enumeration), then by day (table partition).
Each monitoring point’s data is stored together, making it easy to understand business‑wise.
First level: shard by monitoring‑point ID using enumeration, so each point's data lands on a fixed shard.
Second level: partition by day; a daily script pre‑creates the tables for the upcoming month.
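A minimal sketch of this two‑level routing. The database names, the point‑to‑database mapping, and the `route` helper are hypothetical illustrations, not Cobar's API:

```python
from datetime import date, timedelta

# Hypothetical enumeration: monitoring-point ID -> database shard.
POINT_TO_DB = {101: "monitor_db_01", 102: "monitor_db_01", 201: "monitor_db_02"}


def route(point_id: int, d: date):
    """First level: enumeration on point ID; second level: one table per day."""
    db = POINT_TO_DB[point_id]
    table = f"api_log_{point_id}_{d:%Y%m%d}"
    return db, table


def tables_for_next_days(point_id: int, start: date, days: int = 31):
    """What the daily pre-creation script would generate ahead of time."""
    return [route(point_id, start + timedelta(n))[1] for n in range(days)]
```

Pre‑creating tables matters because a missing daily table would fail inserts at midnight; generating a month ahead gives the operations team slack if the script ever breaks.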
Derived Scaling Issues
Initially 10 machines stored data for 200+ points. As data grew, we added machines, eventually needing over 200 machines. Some points generated so much data that a single disk could not hold it.
We compressed historical data with TokuDB, shrinking 1 TB of raw data to roughly 100 GB; at that ratio, a 2 TB cloud disk can hold roughly 20 TB of raw data.
Solution: Four‑Dimensional Sharding
We shard data on four dimensions:
Date – first level.
Monitoring‑point ID – second level.
Dynamic modulo of task ID based on current shard count – third level.
Task ID % 100 – fourth level.
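The four levels above can be expressed as a single routing function. The per‑time‑slice shard count is the key trick: it is recorded in configuration when each slice begins, so raising it later affects only new slices and no existing data moves. The names and the config mapping below are illustrative, not our production schema:

```python
from datetime import date

# Hypothetical per-time-slice config: the shard count in effect for
# each monthly slice. Old slices keep their original count forever,
# which is why no data migration is ever required.
SHARD_COUNT_BY_MONTH = {"201601": 10, "201606": 20}


def route_4d(d: date, point_id: int, task_id: int):
    """Date -> monitoring point -> dynamic modulo -> fixed modulo (table)."""
    slice_key = f"{d:%Y%m}"                        # first level: date slice
    shard_count = SHARD_COUNT_BY_MONTH[slice_key]
    db_index = task_id % shard_count               # third level: dynamic modulo
    table_suffix = task_id % 100                   # fourth level: fixed modulo
    db = f"monitor_{slice_key}_p{point_id}_s{db_index}"  # second level: point ID
    return db, f"task_{table_suffix:02d}"
```

Each database shard carries 100 candidate tables, but a given task ID only ever lands in one of them per shard; the two modulo levels together are what leave some tables empty, as noted below.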
Benefits:
Hot and cold data naturally separated.
Unlimited date‑based sharding.
Shard count per time slice is configurable.
No data migration required.
Note: In theory sharding can be unlimited; server count per shard is controllable, making cost savings predictable.
Some tables may be empty because the two modulo levels interact, but this does not affect the application.
Interactive Q&A
Q1: How many records after all these sharding steps?
We do not track a total record count; the current write rate is 20,000 QPS.
Q2: Impact of rehash in consistent hash?
We do not use consistent hash in production; migration volume was deemed too large for our scenario.
Q3: Disk configuration for each DB to improve I/O?
We use UCLOUD cloud disks backed by RAID‑10, supporting up to 1,600 random write IOPS.
Q4: Role of master‑slave in Codis group?
Provides high availability; manual failover is triggered via script.
Q5: Are all databases in a single data center?
We leverage multi‑region cloud deployments; cross‑region latency is under 10 ms, and we have a multi‑cloud strategy for disaster recovery.
Q6: Will Codis automatically remove a failed instance?
Current version requires manual intervention; future versions may add auto‑removal.
Q7: Is master‑slave in Codis just a state, not real Redis replication?
Codis relies on Redis’s native master‑slave replication.
Q8: Details of master‑slave switch and high availability?
Manual switch via UI or script; see diagram below.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles, focusing on operations transformation and accompanying readers throughout their operations careers.