Baidu's Optimization of MooseFS and Redis: Architecture Improvements and Performance Enhancement
At Baidu’s 49th Technical Salon, Cheng Yishi explained how the company revamped its MooseFS and Redis systems by adding a Shadow Master to split reads from writes, introducing Slave nodes for failover, and deploying a Redis proxy middleware, thereby improving performance, scalability, and high availability for critical services.
At the 49th Baidu Technical Salon, Cheng Yishi, technical leader of Baidu's MFS team responsible for MooseFS (distributed file system) and BDRP (distributed KV system) development, shared how Baidu improved MooseFS and Redis architecture to enhance performance and scalability.
Architecture Problems Identified: The main issues with MooseFS and Redis are performance bottlenecks caused by a strongly centralized design, plus weaknesses in error handling and scaling. Both systems share a similar architecture with the Google File System (GFS).
First Improvement - Shadow Master: Baidu developed a Shadow Master architecture in which the master handles all write requests while the shadow master processes most read requests. Client-side routing initially directs all traffic to the master. In theory, the master can handle 100,000 requests per second while the shadow master handles 20,000. In practice, when the master processes 100,000 requests, the shadow master can handle 70,000-80,000, or roughly 40% of total requests.
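The read/write split can be sketched as follows. This is a minimal illustration, not Baidu's implementation: the operation names, the `Node` and `Router` classes, and the `attach_shadow` step are all hypothetical, and the talk summary does not say how a client learns about the shadow master. The `shadow_share` helper only checks the arithmetic behind the "roughly 40%" figure.

```python
from dataclasses import dataclass, field

# Hypothetical set of metadata operations that mutate state.
WRITE_OPS = {"create", "write", "delete", "rename"}


@dataclass
class Node:
    """Stand-in for a metadata server; it just records what it was asked."""
    name: str
    handled: list = field(default_factory=list)

    def handle(self, op: str, path: str) -> str:
        self.handled.append((op, path))
        return f"{self.name}:{op}:{path}"


class Router:
    """Client-side router: all traffic goes to the master until a shadow is known."""

    def __init__(self, master: Node):
        self.master = master
        self.shadow = None

    def attach_shadow(self, shadow: Node) -> None:
        # Assumption: the client is told about the shadow master at some point.
        self.shadow = shadow

    def route(self, op: str) -> Node:
        # Writes always go to the master; reads go to the shadow when available.
        if op in WRITE_OPS or self.shadow is None:
            return self.master
        return self.shadow

    def send(self, op: str, path: str) -> str:
        return self.route(op).handle(op, path)


def shadow_share(master_rps: int, shadow_rps: int) -> float:
    """Fraction of total requests served by the shadow master."""
    return shadow_rps / (master_rps + shadow_rps)
```

With the figures quoted above, `shadow_share(100_000, 70_000)` is about 0.41 and `shadow_share(100_000, 80_000)` is about 0.44, which matches the "roughly 40%" claim.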
Second Improvement - Slave for Failover: To address the error-handling issues, Baidu added Slave nodes that can take over the entire cluster when the master fails, serving as a strongly consistent replacement for it.
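One way to picture a strongly consistent standby is a slave that receives every write before the write is considered complete, so that it can be promoted without losing data. The sketch below assumes exactly that; the summary does not describe Baidu's actual replication or failure-detection mechanism, and `MetadataNode`, `Cluster`, and the `alive` flag are hypothetical names used only for illustration.

```python
class MetadataNode:
    """Stand-in for a master or slave holding an ordered change log."""

    def __init__(self, name: str):
        self.name = name
        self.log = []
        self.alive = True  # simulated health status

    def append(self, entry: str) -> None:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.log.append(entry)


class Cluster:
    """Writes go to the standby and the active node; the standby is promoted on failure."""

    def __init__(self, master: MetadataNode, slave: MetadataNode):
        self.active = master
        self.standby = slave

    def failover(self) -> None:
        if self.standby is None or not self.standby.alive:
            raise RuntimeError("no healthy slave to promote")
        # The slave already holds every acknowledged write, so it can take over.
        self.active, self.standby = self.standby, None

    def write(self, entry: str) -> None:
        if not self.active.alive:
            self.failover()
        # Replicate first so the standby never lags behind the active node.
        if self.standby is not None and self.standby.alive:
            self.standby.append(entry)
        self.active.append(entry)
```

In this sketch a write issued after the master goes down triggers promotion, and the promoted slave's log contains every earlier write followed by the new one.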
Redis High Availability: Baidu uses Redis proxy middleware to build highly available distributed Redis clusters, meeting low-latency and large-data business requirements. These systems are widely used in Baidu's commercial products, LBS products, database file hot backup, and support many critical services.
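The proxy idea can be illustrated with a small sketch in which a middleware layer hides sharding and replica fallback from the client. Everything here is an assumption made for illustration: the summary does not say how Baidu's proxy shards keys or replicates data, `FakeRedis` is an in-memory stand-in rather than a real Redis client, and CRC32 modulo the shard count is just one simple placement scheme.

```python
import zlib


class FakeRedis:
    """In-memory stand-in for a single Redis instance (no network involved)."""

    def __init__(self):
        self.data = {}
        self.up = True  # simulated availability

    def set(self, key: str, value: str) -> None:
        if not self.up:
            raise ConnectionError("instance unavailable")
        self.data[key] = value

    def get(self, key: str):
        if not self.up:
            raise ConnectionError("instance unavailable")
        return self.data.get(key)


class Proxy:
    """Routes each key to one shard and falls back to the replica on read failure."""

    def __init__(self, shards):
        # shards: list of (primary, replica) pairs
        self.shards = shards

    def _shard(self, key: str):
        # Hypothetical placement rule: CRC32 of the key modulo the shard count.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def set(self, key: str, value: str) -> None:
        primary, replica = self._shard(key)
        primary.set(key, value)
        replica.set(key, value)  # synchronous copy, for simplicity

    def get(self, key: str):
        primary, replica = self._shard(key)
        try:
            return primary.get(key)
        except ConnectionError:
            return replica.get(key)
```

The client only ever talks to `Proxy`, so adding shards or losing a primary does not change client code, which is the main appeal of placing middleware in front of the Redis instances.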
Key Insights: Cheng emphasized that distributed storage architecture has not changed in any major way; what matters most is the details, which are numerous and complex, and handling them poorly can bring down an entire cluster. He encouraged developers not to fear "pitfalls" in open-source software but to understand and fix them by studying the code.
Baidu Tech Salon
Baidu Tech Salon, organized by Baidu's Technology Management Department, is a monthly offline event that shares cutting‑edge tech trends from Baidu and the industry, providing a free platform for mid‑to‑senior engineers to exchange ideas.