
Implementing Compute-Storage Separation for Large-Scale Retrieval Systems Using Fluid

This article describes the challenges of operating massive, TB‑scale retrieval clusters at Zuoyebang, and presents a Fluid‑based compute‑storage separation architecture that improves data distribution, update efficiency, scalability, and stability, enabling containerized search services to be managed like regular stateless workloads.

Architecture Digest

Large‑scale retrieval systems are the foundational layer for many platform services, often running on thousands of bare‑metal servers with petabyte‑level data, and demanding extreme performance, high throughput, and stability, with very little tolerance for faults.

Beyond the operational layer, massive clusters and massive data bring huge challenges in data iteration and service governance, such as efficient incremental and full data distribution and hotspot tracking.

This article introduces Zuoyebang’s internal design of a Fluid‑based compute‑storage separation architecture that significantly reduces the complexity of large‑scale retrieval services, allowing them to be managed like ordinary online services.

1. Problems Faced by Large‑Scale Retrieval Systems

Zuoyebang’s intelligent analysis and search functions rely on a retrieval system with over a thousand nodes and hundreds of terabytes of data. Each shard replicates the full dataset across many servers, targeting P99 latency of 1.x ms, peak throughput of hundreds of GB/s, and 99.999% availability.

Traditional approaches focus on data locality, but daily TB‑scale index updates require offline indexing services to push updates to each shard, leading to difficulties in data synchronization, weak elasticity, and limited shard scalability.

Key issues include:

Data set dispersion: synchronizing full shard data to every node requires multi‑level distribution, causing long cycles and extensive validation.

Weak elastic scaling of business resources: tightly coupled compute‑storage architecture hampers rapid scaling for traffic spikes.

Insufficient per‑shard data scalability: storage limits force data splitting unrelated to business needs.

These challenges increase cost and weaken automation.

The root cause is the tight coupling of compute and storage; decoupling them via a compute‑storage separation architecture is essential.

2. How Compute‑Storage Separation Solves Complexity

The new architecture must ensure stable reads, high update speed for thousands of nodes, POSIX‑compatible access, controllable data iteration (treating data updates as CD pipelines), and scalable data sets.

Fluid, an open‑source Kubernetes‑native distributed data orchestration and acceleration engine, was chosen as the core component.

3. Component Introduction

Fluid provides a Kubernetes‑native abstraction that lets data flow like fluid between storage backends (HDFS, OSS, Ceph) and compute workloads, handling caching, replication, eviction, transformation, and management transparently.

It focuses on dataset orchestration (caching datasets on specific nodes) and application orchestration (scheduling apps onto nodes that already hold required data), enabling collaborative scheduling.
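As an illustration of the dataset abstraction, a Fluid `Dataset` declares where the underlying data lives and lets Fluid orchestrate its caching. The resource names, bucket, and paths below are hypothetical; the general shape follows Fluid's public `data.fluid.io/v1alpha1` API:

```yaml
# Hypothetical Fluid Dataset pointing at an object-storage bucket
# that holds the offline-built index files.
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: search-index            # Fluid also creates a PVC with this name
spec:
  mounts:
    - name: index
      mountPoint: oss://example-bucket/search/index/   # illustrative path
```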

4. Reasons for Choosing Fluid

The search service is already containerized, making it a natural fit for Fluid.

Fluid’s data‑aware scheduling allows near‑data processing, improving access performance.

Fluid implements a PVC interface, letting pods mount data without awareness, providing metadata, distributed caching, and efficient file retrieval.
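Concretely, Fluid exposes each Dataset as an ordinary PersistentVolumeClaim of the same name, so a search pod mounts it like any other volume; the pod and image names below are illustrative:

```yaml
# Hypothetical search pod mounting the Fluid-managed dataset via PVC.
apiVersion: v1
kind: Pod
metadata:
  name: searcher
spec:
  containers:
    - name: searcher
      image: example/searcher:latest     # illustrative image
      volumeMounts:
        - name: index
          mountPath: /data/index         # POSIX view of the dataset
  volumes:
    - name: index
      persistentVolumeClaim:
        claimName: search-index          # PVC auto-created by Fluid
```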

Fluid combined with JindoRuntime offers multiple cache modes (origin‑pull, full cache) and storage media (disk, memory) that can be adapted to various scenarios.
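The cache medium is configured on the runtime side. A sketch of a `JindoRuntime` with a memory tier spilling to local SSD follows; the replica count, paths, and quotas are made up for illustration:

```yaml
# Hypothetical JindoRuntime with tiered cache storage.
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: search-index            # must match the Dataset name
spec:
  replicas: 6                   # number of cache workers (illustrative)
  tieredstore:
    levels:
      - mediumtype: MEM         # hot data in memory
        path: /dev/shm
        quota: 40Gi
        high: "0.95"
        low: "0.7"
      - mediumtype: SSD         # overflow to local SSD
        path: /mnt/disk1
        quota: 500Gi
        high: "0.95"
        low: "0.7"
```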

5. Practical Implementation

Separate cache nodes from compute nodes to improve elasticity and isolate stability concerns.

Use Fluid’s dataset nodeAffinity to schedule cache nodes efficiently.
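Pinning the cache workers to a dedicated node pool can be expressed through the Dataset's `nodeAffinity`; the node label key and value here are hypothetical:

```yaml
# Hypothetical Dataset restricted to a dedicated cache node pool.
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: search-index
spec:
  mounts:
    - name: index
      mountPoint: oss://example-bucket/search/index/   # illustrative path
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-pool           # hypothetical node label
              operator: In
              values:
                - search-cache
```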

Adopt a full‑cache mode to avoid unexpected origin pulls, ensuring atomic data loads where new versions become visible only after complete loading.
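A full preload of a new index version can be driven by Fluid's `DataLoad` resource, with traffic switched to the new version only after the load completes; the version directory below is illustrative:

```yaml
# Hypothetical DataLoad warming a new index version into the cache.
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: load-index-v2
spec:
  dataset:
    name: search-index
    namespace: default
  loadMetadata: true        # refresh metadata before warming the cache
  target:
    - path: /index/v2       # illustrative version directory
      replicas: 1
```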

Combined with automated indexing and version management, this approach greatly enhances system safety, stability, and automation.

6. Results

Minute‑level distribution of hundreds of terabytes of data.

Atomic data version management and update processes, turning data distribution into a controllable, intelligent CI/CD pipeline.

Search services behave like stateless services, easily scaling horizontally via Kubernetes HPA, improving stability and availability.
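With index data served from the remote cache rather than local disks, the searcher Deployment can be scaled with a standard HorizontalPodAutoscaler; the Deployment name and thresholds below are illustrative:

```yaml
# Hypothetical HPA scaling the now-stateless search Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: searcher
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: searcher          # illustrative Deployment name
  minReplicas: 10
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```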

7. Outlook

Compute‑storage separation enables previously specialized services to become stateless and fit into DevOps workflows. Beyond retrieval, Fluid is being explored for OCR model training and distribution, with future work focusing on optimizing scheduling, expanding model training pipelines, and contributing to Fluid’s observability and high‑availability features.

Kubernetes · Large-scale Retrieval · Compute-Storage Separation · Data Orchestration · Fluid
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
