High‑Throughput Cloud File Storage (CFS) Design for Kuaishou's Spring Festival Red‑Envelope Campaign
During Kuaishou's 2020 Spring Festival red‑envelope campaign, Tencent Cloud's Cloud File Storage (CFS) served a read‑heavy NFS workload against a 5 GB/s throughput requirement, peaking at 6.61 GB/s across 1 700 concurrent clients with 100% availability and low latency.
During the 2020 Spring Festival, Tencent Cloud's Cloud File Storage (CFS) was selected after rigorous stress testing to serve as the storage backbone for Kuaishou's advertising recommendation system, ensuring 100% availability for the massive red‑envelope activity.
Background: The Spring Festival gala attracts over 1 billion viewers, making it a prime traffic source for internet platforms. Kuaishou won the exclusive partnership for the gala and launched a "6 billion cash" red‑envelope campaign, resulting in unprecedented interaction volumes (639 billion clicks, 5.9 billion shares, 7.8 billion viewers).
Business Scenario & Requirements:
Throughput demand: 5 GB/s (≈40 Gbps)
Single‑file size: from KB to tens of GB
Concurrent clients: more than 8 000 Pods on roughly 4 000 nodes
Metadata operations: occasional ls
Compute node spec: 84 CPU cores and 320 GB RAM per TKE node
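The throughput figure above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-envelope illustration: it assumes decimal units (1 GB = 1000 MB), and the even per-Pod split is a simplification, since not every Pod necessarily reads at the same moment. The function names are made up for this example.

```python
# Back-of-envelope sizing from the stated requirements.
# Assumptions: decimal units, and an even split across Pods (illustrative only).

TARGET_GB_PER_S = 5.0   # required aggregate throughput, from the requirements list
PODS = 8_000            # concurrent Pods, from the requirements list


def gigabytes_to_gigabits(gb_per_s: float) -> float:
    """Convert a byte rate in GB/s to a bit rate in Gbps."""
    return gb_per_s * 8


def fair_share_mb_per_s(aggregate_gb_per_s: float, clients: int) -> float:
    """Average rate per client in MB/s if the aggregate were split evenly."""
    return aggregate_gb_per_s * 1000 / clients


link_gbps = gigabytes_to_gigabits(TARGET_GB_PER_S)
per_pod_mb_s = fair_share_mb_per_s(TARGET_GB_PER_S, PODS)

print(f"aggregate: {link_gbps} Gbps")
print(f"even share per Pod: {per_pod_mb_s} MB/s")
```

The conversion reproduces the 40 Gbps figure quoted with the throughput demand.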
The workload is read‑heavy: large model files (10 GB and up) are copied from the IDC to CFS and then read by thousands of Pods during the campaign.
Challenges: Sustaining high throughput on a single file under thousands of concurrent connections, and serving massive read traffic without degrading latency.
NFS Overview: NFS provides a network‑transparent file system interface, supporting POSIX semantics, and has evolved through versions v2, v3, v4, and the parallel extension pNFS (v4.1). It operates via RPC/XDR layers and integrates with the Linux VFS.
CFS Features: CFS offers scalable shared file storage with standard NFS/CIFS protocols, automatic capacity expansion, high reliability (three‑copy replication, 99.9999999% durability), and fine‑grained access control via VPC and security groups.
Solution Architecture: By analyzing the read‑dominant, large‑file characteristics, the team optimized CFS pre‑fetch strategies and introduced a multi‑level cache:
OS page cache on storage servers
Parallel data pull to access servers, which cache files in memory
Pods are scheduled to different access servers, distributing read load
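The three cache levels above can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not CFS's actual implementation: the `StorageServer` and `AccessServer` classes, the round-robin placement in `pick_access_server`, and the file name `model.bin` are all hypothetical stand-ins chosen for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StorageServer:
    """Backing store; `page_cache` stands in for the OS page cache."""

    def __init__(self, files):
        self._disk = dict(files)
        self.page_cache = {}
        self.disk_reads = 0
        self._lock = threading.Lock()

    def read(self, name):
        with self._lock:
            if name not in self.page_cache:
                self.disk_reads += 1  # only the first reader touches "disk"
                self.page_cache[name] = self._disk[name]
            return self.page_cache[name]


class AccessServer:
    """Front-end server that keeps whole files in memory (hypothetical model)."""

    def __init__(self, storage):
        self.storage = storage
        self.memory_cache = {}
        self.served = 0

    def prefetch(self, name):
        self.memory_cache[name] = self.storage.read(name)

    def read(self, name):
        if name not in self.memory_cache:
            self.prefetch(name)
        self.served += 1
        return self.memory_cache[name]


def pick_access_server(pod_index, servers):
    """Round-robin placement: an assumed stand-in for the real scheduling."""
    return servers[pod_index % len(servers)]


storage = StorageServer({"model.bin": b"weights"})
access_servers = [AccessServer(storage) for _ in range(4)]

# Access servers pull the file from storage in parallel and cache it in memory.
with ThreadPoolExecutor(max_workers=len(access_servers)) as pool:
    list(pool.map(lambda s: s.prefetch("model.bin"), access_servers))

# 1 700 Pods read the same file, spread across the access servers.
for pod_index in range(1700):
    data = pick_access_server(pod_index, access_servers).read("model.bin")

print("disk reads:", storage.disk_reads)
print("reads served per access server:", [s.served for s in access_servers])
```

In this toy model the file is read from disk once, and every Pod read is then answered from an access server's memory, which is the effect the multi-level cache is meant to achieve for a single hot file.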
This design broke the single‑file throughput bottleneck, achieving a peak aggregate throughput of 6.61 GB/s across 1 700 concurrent clients while maintaining 100% availability.
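To put the peak figure in context, the short calculation below derives the average rate per client and the margin over the 5 GB/s requirement. It assumes decimal units (1 GB = 1000 MB) and an even split across clients, which real traffic would not follow exactly.

```python
# Rough context for the reported peak (decimal units, even split assumed).
PEAK_GB_PER_S = 6.61     # reported peak aggregate throughput
CLIENTS = 1_700          # reported concurrent clients
TARGET_GB_PER_S = 5.0    # stated throughput requirement

avg_per_client_mb_s = PEAK_GB_PER_S * 1000 / CLIENTS
headroom_pct = (PEAK_GB_PER_S / TARGET_GB_PER_S - 1) * 100

print(f"average per client: {avg_per_client_mb_s:.2f} MB/s")
print(f"margin over requirement: {headroom_pct:.1f}%")
```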
Results: The campaign recorded 639 billion total interactions, 7.8 billion live viewers, and a 40% increase in daily active users. CFS successfully delivered the advertising models to all Pods with low latency and high reliability.
Additional Insights: The presentation also covered the modular decomposition of distributed file systems (metadata, data, interface, and control modules) and answered audience questions about mounting, open‑source plans, access control, caching mechanisms, and comparisons with HDFS.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.