StarRocks 3.0 Storage‑Compute Separation Architecture: Design, Implementation, and Evaluation
This article explains the storage‑compute separation architecture introduced in StarRocks 3.0, presents industry case studies, details the design of StarOS and compute nodes, discusses technical challenges and key techniques, and evaluates cost, reliability, elasticity, and performance through benchmarks and user feedback.
It begins with three industry examples (AWS S3‑based shared storage, Snowflake’s SaaS architecture, and AWS Redshift) that illustrate different approaches to separating storage and compute, along with the benefits and limitations of each.
It then describes StarRocks’ own design, including the cloud‑native StarOS platform, the transformation of the traditional BE node into a stateless Compute Node (CN), and the role of StarManager in managing tablets, shards, services, files, and logs.
Key implementation details are covered: caching remote data locally to mitigate high I/O latency, keeping data files immutable to simplify version control, making writes idempotent so that node failures cannot corrupt data, and using multi‑version management to ensure consistency.
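The interplay of the last three points can be sketched in a few lines. The sketch below is illustrative only (the class and file-naming scheme are hypothetical, not StarRocks internals): each version of a tablet's data lands in its own file that is never modified after creation, and an atomic create-if-absent makes retried writes harmless.

```python
import os
import tempfile

class VersionedTabletStore:
    """Toy sketch of immutable, versioned data files with idempotent writes.

    Each write of a tablet version creates a file named
    ``{tablet}_{version}.dat`` that is never modified afterward. Retrying
    the same (tablet, version) write -- e.g. after a compute-node failure --
    is a no-op, so a crashed-and-retried load cannot corrupt committed data.
    All names here are hypothetical, not StarRocks internals.
    """

    def __init__(self, root: str):
        self.root = root

    def _path(self, tablet: int, version: int) -> str:
        return os.path.join(self.root, f"{tablet}_{version}.dat")

    def write(self, tablet: int, version: int, data: bytes) -> bool:
        """Idempotent write: returns True only if this call created the file."""
        try:
            # O_EXCL makes creation atomic: exactly one writer wins, and a
            # retry of an already-committed version is silently skipped.
            fd = os.open(self._path(tablet, version),
                         os.O_WRONLY | os.O_CREAT | os.O_EXCL)
        except FileExistsError:
            return False
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return True

    def read(self, tablet: int, version: int) -> bytes:
        with open(self._path(tablet, version), "rb") as f:
            return f.read()

store = VersionedTabletStore(tempfile.mkdtemp())
assert store.write(1001, 2, b"rowset-v2")      # first attempt succeeds
assert not store.write(1001, 2, b"garbage")    # retry is a harmless no-op
assert store.read(1001, 2) == b"rowset-v2"     # committed version intact
```

Because old versions are never rewritten in place, readers can pin a version for the duration of a query while new versions are published alongside it, which is the essence of multi‑version consistency on shared storage.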
The article analyzes challenges such as heterogeneous storage support, low‑latency I/O over distributed storage, metadata consistency, cache and compute scheduling, and compaction/GC coordination in a stateless environment.
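One common way to frame the cache-and-compute scheduling challenge is cache affinity: if shard-to-node assignment is stable under scaling, most nodes keep their warm local caches. The article does not describe StarRocks' actual scheduler; the snippet below is only a minimal consistent-hashing sketch of the general technique.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit hash so assignments are deterministic across runs.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Minimal consistent-hash ring mapping shards to compute nodes.

    When a node joins or leaves, only the shards adjacent to it on the
    ring are remapped, so the remaining stateless nodes keep serving
    shards whose data is already in their local caches.
    """

    def __init__(self, nodes, vnodes: int = 64):
        # Virtual nodes smooth out the load across physical nodes.
        self._ring = sorted(
            (_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, shard_id: int) -> str:
        idx = bisect.bisect(self._keys, _hash(str(shard_id))) % len(self._ring)
        return self._ring[idx][1]

ring3 = ConsistentHashRing(["cn1", "cn2", "cn3"])
ring4 = ConsistentHashRing(["cn1", "cn2", "cn3", "cn4"])
moved = sum(ring3.node_for(s) != ring4.node_for(s) for s in range(10_000))
# Adding one node remaps only a fraction of shards (roughly 1/4 here);
# the rest keep hitting caches that are already warm.
print(f"{moved} of 10000 shards moved")
```

This stability is what makes elastic scale-out of stateless compute nodes cheap: scaling from three to four nodes invalidates only the caches of the shards that actually move.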
Benefits of the separation are quantified, highlighting cost reduction (fewer replicas and cheaper storage media), improved reliability and availability (leveraging cloud storage SLAs), and enhanced elasticity through resource isolation similar to Snowflake’s warehouse model.
Performance evaluations include StreamLoad ingestion throughput, TPC‑DS query benchmarks on a 1 TB dataset, and real‑world user case studies that demonstrate comparable latency to integrated architectures while achieving lower cost and better scalability.
The article concludes that StarRocks’ storage‑compute separation addresses many of the limitations observed in existing solutions, offering a flexible, cost‑effective, and high‑performance platform for modern analytical workloads.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.