
Kuaishou's Practices for Large‑Scale Model Data Processing, Real‑Time Feature Handling, and Storage

This article presents Kuaishou's end‑to‑end engineering solutions for handling massive, real‑time recommendation model data, covering scenario description, complex business pipelines, trillion‑parameter model storage, high‑throughput processing with Flink and NVM, and future directions for cloud‑native scalability.

DataFunTalk

Kuaishou operates a real‑time recommendation system that must ingest billions of daily video uploads and live‑stream interactions, demanding both massive throughput and sub‑second latency across feature collection, model training, and online inference.

The recommendation pipeline consists of recall, coarse ranking, fine ranking, re‑ranking, and final result selection. Business scenarios are split into massive workloads (trillions of daily samples) and medium‑scale ones (hundreds of billions), each served by either streaming or batch iteration strategies.
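The staged funnel described above can be sketched as a chain of progressively more expensive, progressively more selective filters. This is a minimal illustration only: the stage names come from the article, but every scoring function below is an invented placeholder, not Kuaishou's actual logic.

```python
# Hypothetical sketch of a multi-stage recommendation funnel:
# recall -> coarse ranking -> fine ranking -> re-ranking.
# All scoring functions are placeholders for illustration.

def recall(user_id, corpus, k=1000):
    """Cheaply fetch a large candidate pool (placeholder: first k items)."""
    return corpus[:k]

def coarse_rank(candidates, k=100):
    """A lightweight model narrows the pool (placeholder: sort ascending)."""
    return sorted(candidates)[:k]

def fine_rank(candidates, k=10):
    """A heavy model scores the survivors (placeholder: sort descending)."""
    return sorted(candidates, reverse=True)[:k]

def re_rank(candidates):
    """Apply diversity/business rules (placeholder: identity)."""
    return candidates

def recommend(user_id, corpus):
    # Each stage shrinks the candidate set before the next, costlier stage.
    return re_rank(fine_rank(coarse_rank(recall(user_id, corpus))))

corpus = list(range(5000))
result = recommend(user_id=42, corpus=corpus)
print(len(result))  # -> 10
```

The key property the funnel buys is that the expensive fine-ranking model only ever sees a few hundred candidates, not the full corpus.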

Kuaishou's recommendation models have grown to the trillion‑parameter scale (≈1.9 trillion parameters) due to the use of SIM long‑sequence models, demanding petabyte‑level storage and high‑throughput access.

The evolution of language models—from RNNs to Transformers and encoder‑decoder architectures—directly influences recommendation model design, with newer models offering parallelism and higher computational efficiency.

For real‑time data processing, Kuaishou combines Flink streaming with a stateless hash‑join design: join state is offloaded to high‑performance external storage, keeping end‑to‑end latency sub‑second even while moving up to 30 TB of state per second at peak traffic.
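The idea behind a "stateless" streaming join can be sketched as follows: rather than holding join state inside the stream processor, each event reads and writes its counterpart in an external key‑value store. The sketch below models that store as a plain dict; the class and event names are invented for illustration and do not reflect Kuaishou's actual API.

```python
# Minimal sketch of a stateless streaming join: join state lives in an
# external store, not in the stream processor. The store is modeled as
# a dict; all names here are illustrative assumptions.

class ExternalStateStore:
    """Stand-in for the remote high-performance storage layer."""
    def __init__(self):
        self._kv = {}

    def put(self, key, value):
        self._kv[key] = value

    def pop(self, key):
        return self._kv.pop(key, None)

def join_impressions_and_clicks(events, store):
    """Join 'impression' and 'click' events on item_id with no local state."""
    joined = []
    for kind, item_id, payload in events:
        if kind == "impression":
            store.put(item_id, payload)    # offload state to storage
        elif kind == "click":
            imp = store.pop(item_id)       # the match arrives later
            if imp is not None:
                joined.append((item_id, imp, payload))
    return joined

store = ExternalStateStore()
stream = [
    ("impression", 1, "video_a"),
    ("impression", 2, "video_b"),
    ("click", 1, "user_x"),
]
result = join_impressions_and_clicks(stream, store)
print(result)  # -> [(1, 'video_a', 'user_x')]
```

Because the processor keeps no state of its own, workers can be restarted or rescaled freely; durability and capacity become the storage layer's problem, which is exactly where the NVM work described later comes in.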

Feature computation leverages both scalar CPU operations and vectorized GPU kernels, orchestrated through a Python‑based DSL that calls high‑performance C++ operators, enabling flexible and efficient processing of massive feature sets.
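The orchestration pattern described above, a Python front end driving registered high‑performance operators, can be sketched with a tiny operator registry. This is a loose analogy only: the registry, operator names, and pipeline spec below are invented, and the real system dispatches to C++/GPU kernels rather than Python functions.

```python
# Hedged sketch of a Python feature DSL dispatching to named backend
# operators (stand-ins for C++ kernels). All names are invented.
import math

OPERATORS = {}

def register_op(name):
    """Register a named feature operator in the global registry."""
    def wrap(fn):
        OPERATORS[name] = fn
        return fn
    return wrap

@register_op("clip")
def clip(values, lo=0, hi=100):
    """Clamp each value into [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]

@register_op("log_bucket")
def log_bucket(values):
    """Bucketize counts by order of magnitude."""
    return [int(math.log10(v + 1)) for v in values]

def run_pipeline(values, spec):
    """Apply a list of (op_name, kwargs) steps -- a tiny DSL 'program'."""
    for name, kwargs in spec:
        values = OPERATORS[name](values, **kwargs)
    return values

spec = [("clip", {"hi": 1000}), ("log_bucket", {})]
features = run_pipeline([3, 99, 12345], spec)
print(features)  # -> [0, 2, 3]
```

The appeal of this split is that feature engineers compose pipelines declaratively in Python while the per‑element work runs in whatever backend (scalar CPU or vectorized GPU) each operator binds to.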

Storage requirements are met with a three‑layer NVM‑Table architecture (NVM, memory pool, and unified API), employing LRU/LFU eviction, feature‑score gating, and zero‑copy techniques to deliver ultra‑low latency, high throughput, and strong consistency across thousands of nodes.
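Two of the policies named above, LRU eviction and feature‑score admission gating, can be illustrated together in a small cache model. Everything below (class name, threshold, capacity) is an invented simplification of the caching policy only; it does not model NVM hardware, LFU, zero‑copy, or consistency.

```python
# Illustrative sketch of LRU eviction combined with feature-score
# admission gating, as attributed to NVM-Table. Names and thresholds
# are assumptions for illustration.
from collections import OrderedDict

class GatedLRUCache:
    def __init__(self, capacity, min_score):
        self.capacity = capacity
        self.min_score = min_score   # admission gate on feature score
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value, score):
        if score < self.min_score:
            return False             # gated out: never enters the cache
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
        return True

cache = GatedLRUCache(capacity=2, min_score=0.5)
cache.put("a", 1, score=0.9)
cache.put("b", 2, score=0.1)   # rejected by the score gate
cache.put("c", 3, score=0.8)
cache.put("d", 4, score=0.7)   # capacity exceeded: evicts "a"
print(cache.get("a"), cache.get("c"), cache.get("d"))  # -> None 3 4
```

Score gating matters at this scale because admitting every cold, low‑value feature would churn the cache and waste scarce NVM bandwidth on entries that are never read again.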

Looking forward, Kuaishou plans to adopt cloud‑native designs, CXL/NVM hardware, and generative token‑based recommendation models, anticipating a tenfold increase in state storage demands and emphasizing hardware‑software co‑optimization to sustain future growth.

Tags: big data, recommendation systems, large-scale models, real-time data processing, Kuaishou, NVM storage
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
