
StarRocks Compute‑Storage Separation Cuts Costs 40% and Boosts Efficiency 20% at DMALL

DMALL upgraded its big‑data platform by adopting StarRocks 3.x with compute‑storage separation, lakehouse external tables, and Kubernetes deployment, achieving 20% higher compute utilization, 40% lower storage cost, faster cluster provisioning, and notable improvements in development and operations efficiency.

StarRocks

Jan 2, 2025

Background

DMALL, founded in 2015, provides DMALL OS, an omnichannel digital retail solution. Its big‑data platform, the technical foundation of DMALL OS, has supported the company's rapid B2B growth. As the industry evolved and cloud‑native technologies matured, the platform moved from an integrated compute‑storage architecture to a separated one.

Architecture 1.0 and Pain Points

In 2021 DMALL adopted StarRocks as a unified OLAP engine (see the earlier article for details). The “data lake + data warehouse” layered architecture met business needs but suffered from:

Compute resource waste: resources provisioned for peak load sat idle much of the time, because usage followed a tidal pattern.

Storage redundancy: data required extra synchronization jobs, which left multiple copies of the same data.

Reliability challenges: complex data synchronization and slow scaling degraded the user experience.

Architecture 2.0 Upgrade

In 2023, DMALL rebuilt the platform around a lakehouse core, adopting cloud‑native deployment, compute‑storage separation, and StarRocks on Kubernetes.

[Figure: Architecture 2.0 overview]

After the separation, the deployment and data flow changed as follows:

[Figure: deployment and data flow after compute‑storage separation]

StarRocks 3.x provides the core features this design needs: compute‑storage separation, External Catalog, and Kubernetes deployment. After half a year of testing, the final architecture delivered:

Reduced compute cost: StarRocks peaks during the day and Spark at night, so the two workloads share the same resource pool, raising overall utilization and lowering cost.

Reduced storage cost: StarRocks can store internal tables on S3‑compatible object storage and query Hive or Iceberg external tables directly, eliminating redundant copies.

Maintained query performance: a local SSD cache and Kubernetes HPA keep query latency comparable to the integrated compute‑storage model.
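Why sharing one pool cuts compute cost can be seen with simple arithmetic. The sketch below uses invented hourly core demand (hypothetical numbers, not DMALL's measurements) for a StarRocks workload that peaks by day and a Spark workload that peaks at night, then compares two dedicated pools, each sized for its own peak, with one shared pool sized for the peak of the combined demand.

```python
# Hypothetical hourly CPU-core demand over one day (index = hour 0..23).
# StarRocks serves interactive queries by day; Spark runs batch jobs at night.
starrocks = [10] * 8 + [80] * 12 + [10] * 4   # busy 08:00-19:59
spark = [90] * 8 + [10] * 12 + [90] * 4       # busy overnight

def utilization(demand, capacity):
    """Fraction of provisioned core-hours that were actually used."""
    return sum(demand) / (capacity * len(demand))

# Option A: two dedicated pools, each sized for its own peak.
dedicated_capacity = max(starrocks) + max(spark)
dedicated_util = (sum(starrocks) + sum(spark)) / (dedicated_capacity * 24)

# Option B: one shared pool sized for the peak of the combined demand.
combined = [a + b for a, b in zip(starrocks, spark)]
shared_capacity = max(combined)
shared_util = utilization(combined, shared_capacity)

print(f"dedicated: {dedicated_capacity} cores, {dedicated_util:.0%} utilized")
print(f"shared:    {shared_capacity} cores, {shared_util:.0%} utilized")
```

With these made‑up numbers the shared pool needs 100 cores instead of 170 and runs at about 95% utilization instead of about 56%; the real gain depends on how cleanly the two peaks interleave.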

Application Scenarios

T+1 Update Analysis

For offline reports such as “high‑dimensional business reports”, the team used StarRocks External Catalog to query Iceberg tables directly, without building internal tables, and query speed remained comparable to the integrated compute‑storage mode.

Real‑Time Update Analysis

For real‑time monitoring, the team used CDC to write data into internal tables in the StarRocks default catalog, with an average data latency of about 10 seconds.
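CDC can land in internal tables because a StarRocks primary‑key table treats each incoming row as an upsert or a delete keyed on the primary key; when loading, this is signalled by the `__op` column (0 for upsert, 1 for delete). The toy model below replays a hypothetical change stream against an in‑memory dict to show the resulting table state. It models the semantics only, not how StarRocks stores or applies the changes, and the class name is made up for illustration.

```python
from dataclasses import dataclass, field

UPSERT, DELETE = 0, 1  # mirrors the __op values used for primary-key loads

@dataclass
class PrimaryKeyTable:
    """In-memory stand-in for a primary-key table (semantics only)."""
    rows: dict = field(default_factory=dict)

    def apply(self, event):
        key, op = event["id"], event["__op"]
        if op == UPSERT:
            self.rows[key] = event["values"]  # insert, or overwrite the whole row
        elif op == DELETE:
            self.rows.pop(key, None)          # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown __op {op!r}")

# Hypothetical change stream captured by CDC from an orders table.
events = [
    {"id": 1, "__op": UPSERT, "values": {"status": "created", "amount": 30}},
    {"id": 2, "__op": UPSERT, "values": {"status": "created", "amount": 12}},
    {"id": 1, "__op": UPSERT, "values": {"status": "paid", "amount": 30}},
    {"id": 2, "__op": DELETE},
]

table = PrimaryKeyTable()
for e in events:
    table.apply(e)
print(table.rows)  # {1: {'status': 'paid', 'amount': 30}}
```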

Practical Experience

Lakehouse (External Table) Integration

Since version 2.3, StarRocks has supported catalogs, so one cluster can manage internal and external data side by side. DMALL customized this by:

Adding a custom login interface to integrate with the internal UniDATA authentication system.

Consolidating management permissions (catalog creation, BE node deletion, FE config changes) into a unified role.

Linking table permissions to Apache Ranger for both internal and external tables, decoupling data permissions from the storage engine.

Unified Query Entry

Users can switch catalogs with SET CATALOG <catalog_name>, or query a table directly by its fully qualified name:

SELECT * FROM <catalog_name>.<db_name>.<table_name>

To simplify client usage, DMALL built a MixDB Proxy that maps logical databases to physical storage and handles authentication, rate limiting, fallback, resource statistics, and query auditing.
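The MixDB Proxy is internal to DMALL and its code is not described. The sketch below illustrates only two of the listed duties, under assumed names: resolving a logical database to a physical `catalog.db` through a hypothetical `ROUTES` table, so clients can keep using short names, and a token‑bucket rate limiter (one instance per user). Authentication, fallback, statistics, and auditing are left out.

```python
import time

# Hypothetical routes: logical database -> (catalog, physical database).
ROUTES = {
    "sales_report": ("iceberg_catalog", "dw_sales"),  # lake tables via External Catalog
    "realtime_ops": ("default_catalog", "ods_rt"),    # internal tables fed by CDC
}

def qualify(logical_db: str, table: str) -> str:
    """Resolve a logical name to <catalog>.<db>.<table> for StarRocks."""
    if logical_db not in ROUTES:
        raise KeyError(f"no route for logical database {logical_db!r}")
    catalog, db = ROUTES[logical_db]
    return f"{catalog}.{db}.{table}"

class TokenBucket:
    """Allow `rate` queries per second, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

print(qualify("sales_report", "daily_gmv"))  # iceberg_catalog.dw_sales.daily_gmv

fake_now = [0.0]                             # controllable clock for the demo
bucket = TokenBucket(rate=5, capacity=2, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(3)])    # [True, True, False]
fake_now[0] = 0.2                            # 0.2 s later: one token refilled
print(bucket.allow())                        # True
```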

Local Cache Optimization

StarRocks 3.x introduced local SSD cache for external Iceberg tables, narrowing the performance gap with internal tables.

Properties
# Enable the local data cache
datacache_enable = true
# Iceberg metadata refresh interval in ms (default 600000 = 10 min)
background_refresh_metadata_interval_millis = 600000

During testing, the team observed that first‑run queries were slower because the cache had not been warmed. They built a cron‑based cache warm‑up tool that pulls recent SQL from the AuditLoader table, identifies the tables those queries accessed, predicts the next day's hot tables, and queries those tables in advance to populate the cache.
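The article describes the warm‑up tool's steps but not its code. A minimal sketch of those steps, assuming the recent statements have already been read from the AuditLoader table into a list of strings: extract table names with a deliberately simple regex (a production tool would use a SQL parser), treat the most‑queried recent tables as tomorrow's hot tables (a naive predictor standing in for whatever DMALL actually used), and emit one warm‑up statement per table. All function names are hypothetical. The default template uses `CACHE SELECT`, which recent StarRocks releases document for cache warm‑up; on versions without it, an ordinary scan query over the hot columns would do the same job.

```python
import re
from collections import Counter

# Catches "FROM a.b.c" / "JOIN b.c" style references only.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)

def tables_in(sql: str) -> set[str]:
    """Table names referenced by one statement, lower-cased."""
    return {name.lower() for name in TABLE_REF.findall(sql)}

def predict_hot_tables(audit_sqls, top_n=3):
    """Rank tables by how many recent statements touched them."""
    counts = Counter()
    for sql in audit_sqls:
        counts.update(tables_in(sql))
    return [table for table, _ in counts.most_common(top_n)]

def warmup_statements(tables, template="CACHE SELECT * FROM {table}"):
    """One warm-up statement per hot table; the template is an assumption."""
    return [template.format(table=t) for t in tables]

# Hypothetical statements pulled from the audit log for the last day.
recent = [
    "SELECT sum(gmv) FROM iceberg_catalog.dw.sales_daily WHERE dt = '2024-12-01'",
    "select * from iceberg_catalog.dw.sales_daily s join iceberg_catalog.dw.store d on s.sid = d.id",
    "SELECT count(*) FROM iceberg_catalog.dw.store",
    "SELECT * FROM iceberg_catalog.dw.sku",
]
for stmt in warmup_statements(predict_hot_tables(recent, top_n=2)):
    print(stmt)
```

The cron job would then send each printed statement to StarRocks shortly before business hours.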

Index and Compaction Tuning

StarRocks 3.3.2 can persist primary‑key indexes to object storage, reducing local disk pressure. After the upgrade, each compute node's disk was reduced from 60 GB to 20 GB, a saving of about two‑thirds.

Properties
PROPERTIES (
    -- Table properties: persist the primary-key index to object storage
    "enable_persistent_index" = "true",
    "persistent_index_type" = "CLOUD_NATIVE"
)

Compaction temporary files also caused disk pressure. Setting enable_light_pk_compaction_publish = false prevented large temporary files from being written to /cn/storage/tmp, crucial for small‑disk environments.

Properties
# Disable light PK compaction publish so large temporary files are not written to /cn/storage/tmp
enable_light_pk_compaction_publish = false
# Let vertical compaction cache its data on local disk
lake_enable_vertical_compaction_fill_data_cache = true
# Raise the maximum proportion of data merged in one primary-key compaction
update_compaction_ratio_threshold = 0.8
# Enlarge the remote-read buffer used during compaction to 5 MiB
lake_compaction_stream_buffer_size_bytes = 5242880

StarRocks on Kubernetes Enhancements

Scenario‑based small clusters: dedicated StarRocks clusters are deployed per department or use case in separate Kubernetes namespaces, isolating resources and reducing interference.

Shared lake tables: all clusters read the same Iceberg lake tables, so query results stay consistent across clusters.

Self‑service ops with centralized control: each cluster has a “cluster admin” role for day‑to‑day operations, while a central team holds super‑admin privileges for resource and permission management.
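The split between per‑cluster admins and a central super admin comes down to a scope check. A minimal sketch with hypothetical role names, action names, and namespaces (the article does not describe DMALL's implementation): a cluster admin may perform routine operations only on the cluster in its own namespace, while the super admin may act on any cluster and is the only role allowed to change quotas or grant permissions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action names: routine operations vs. central governance.
DAILY_OPS = {"restart_cn", "view_queries", "kill_query"}
GOVERNANCE = {"change_quota", "grant_permission"}

@dataclass(frozen=True)
class User:
    name: str
    role: str                        # "cluster_admin" or "super_admin"
    namespace: Optional[str] = None  # namespace of the cluster a cluster admin owns

def is_allowed(user: User, action: str, target_namespace: str) -> bool:
    """Cluster admins stay inside their namespace; super admins do not."""
    if user.role == "super_admin":
        return action in DAILY_OPS | GOVERNANCE
    if user.role == "cluster_admin":
        return action in DAILY_OPS and user.namespace == target_namespace
    return False

alice = User("alice", "cluster_admin", namespace="sr-marketing")
root = User("root", "super_admin")
print(is_allowed(alice, "kill_query", "sr-marketing"))    # True
print(is_allowed(alice, "kill_query", "sr-finance"))      # False: another team's cluster
print(is_allowed(alice, "change_quota", "sr-marketing"))  # False: reserved for super admin
print(is_allowed(root, "change_quota", "sr-finance"))     # True
```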

Metadata Backup & Recovery

To protect FE metadata from loss, DMALL built a sidecar container that periodically syncs FE metadata to an S3 bucket using aws s3 sync. In case of node failure, an init container restores the metadata before FE starts.
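A minimal sketch of the sidecar's backup loop and the init container's restore step. Only the use of `aws s3 sync` comes from the article; the metadata path `/opt/starrocks/fe/meta` (assumed FE `meta_dir` inside the container), the bucket `s3://sr-meta-backup`, the 300‑second interval, and all function names are assumptions. Two deliberate choices: the backup omits `--delete`, so a freshly wiped pod can never erase the backup, and the restore runs only when the local metadata directory is missing or empty, so it never overwrites newer local files.

```python
import os
import subprocess
import time

META_DIR = "/opt/starrocks/fe/meta"      # assumed FE meta_dir in the container
BUCKET_PREFIX = "s3://sr-meta-backup"     # hypothetical bucket

def remote_path(namespace: str, pod: str) -> str:
    return f"{BUCKET_PREFIX}/{namespace}/{pod}"

def backup_cmd(namespace: str, pod: str) -> list[str]:
    # Local -> S3. No --delete, so an empty local dir cannot wipe the backup.
    return ["aws", "s3", "sync", META_DIR, remote_path(namespace, pod)]

def restore_cmd(namespace: str, pod: str) -> list[str]:
    # S3 -> local; run by the init container before the FE process starts.
    return ["aws", "s3", "sync", remote_path(namespace, pod), META_DIR]

def needs_restore(meta_dir: str = META_DIR) -> bool:
    """Restore only when local metadata is missing or empty."""
    return not os.path.isdir(meta_dir) or not os.listdir(meta_dir)

def sidecar_loop(namespace: str, pod: str, interval_s: int = 300) -> None:
    """Sidecar entry point: push FE metadata to object storage forever."""
    while True:
        result = subprocess.run(backup_cmd(namespace, pod))
        if result.returncode != 0:
            print(f"metadata backup failed, exit code {result.returncode}", flush=True)
        time.sleep(interval_s)
```

In the sidecar, `sidecar_loop(os.environ["POD_NAMESPACE"], os.environ["HOSTNAME"])` would be the entry point; the init container would run `subprocess.run(restore_cmd(...), check=True)` only when `needs_restore()` returns true. Copying files that are still being written can produce an inconsistent snapshot; the article does not say how DMALL handled this.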

Custom Elastic Scaling

A new CN pod takes about 10 s to start, so when HPA scaled out only after load arrived, bursts of concurrent queries hit the cluster before the extra capacity was ready. DMALL added a CronJob that raises the minimum CN replica count ahead of peak hours and lowers it at night, keeping resources ready for BI queries during the day while freeing capacity for offline jobs at night.
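The CronJob's task is a time‑of‑day decision about the autoscaler's floor. A minimal sketch with a hypothetical schedule (the article gives no hours or replica counts): the returned value would be applied as the HPA's `minReplicas`, or the equivalent field in the operator's CN autoscaling settings, and HPA still scales above that floor on demand. The day window opens before the expected BI peak so that new pods have time to start.

```python
from datetime import time

# Hypothetical schedule, evaluated in the cluster's local time zone.
PEAK_WINDOWS = [
    (time(7, 30), time(20, 0), 12),   # (start, end, minimum CN replicas)
]
OFF_PEAK_FLOOR = 2                     # at night, leave room for offline jobs

def min_replicas_at(now: time) -> int:
    """Minimum CN replica count the CronJob should apply at `now`."""
    for start, end, floor in PEAK_WINDOWS:
        if start <= now < end:
            return floor
    return OFF_PEAK_FLOOR

for t in (time(7, 0), time(9, 0), time(21, 0)):
    print(t, min_replicas_at(t))  # 07:00:00 2 / 09:00:00 12 / 21:00:00 2
```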

Handling HPA‑Induced Write Failures

When HPA scaled in a CN pod during an active write, the write task failed, because FE was notified only after the CN pod had already entered the Terminating state. DMALL added a preStop hook that deregisters the node from FE before the pod terminates.

YAML
lifecycle:
  preStop:
    exec:
      command:
        - sh
        - '-c'
        # Deregister this CN from FE first, then run the existing pre-stop script.
        # "pwd" is a placeholder for the FE root password.
        - >-
          mysql -h $FE_SERVICE_NAME.$POD_NAMESPACE.svc.cluster.local -P9030 -uroot -ppwd -e "ALTER SYSTEM DROP COMPUTE NODE '$HOSTNAME.starrocks-cn-search.$POD_NAMESPACE.svc.cluster.local:9050'";
          sh /opt/starrocks/cn_prestop.sh
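The hook only helps if the address passed to `ALTER SYSTEM DROP COMPUTE NODE` matches the address the CN registered with: the pod's DNS name under the headless service plus port 9050 (the heartbeat port), exactly as in the hook. A small, hypothetical helper, `drop_cn_statement`, assembles the same string from the same environment variables, which makes it easy to check the address before baking it into the manifest.

```python
import os

def drop_cn_statement(env=os.environ, headless_service="starrocks-cn-search",
                      heartbeat_port=9050) -> str:
    """Rebuild the SQL the preStop hook sends to FE for the current CN pod."""
    host = (f"{env['HOSTNAME']}.{headless_service}."
            f"{env['POD_NAMESPACE']}.svc.cluster.local")
    return f"ALTER SYSTEM DROP COMPUTE NODE '{host}:{heartbeat_port}'"

print(drop_cn_statement({"HOSTNAME": "starrocks-cn-0", "POD_NAMESPACE": "sr-bi"}))
# ALTER SYSTEM DROP COMPUTE NODE 'starrocks-cn-0.starrocks-cn-search.sr-bi.svc.cluster.local:9050'
```

Kubernetes counts the preStop hook against the pod's `terminationGracePeriodSeconds`, so the grace period needs to cover both the FE call and the pre‑stop script.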

Summary

The migration to a compute‑storage separated, lakehouse‑centric architecture with StarRocks on Kubernetes delivered a 20% increase in compute utilization, over 40% storage cost reduction, and significant gains in development and operations efficiency. The experience highlighted the importance of proper cache warming, index persistence, compaction tuning, and Kubernetes‑native automation.

Future Plans

Adopt serverless containers (e.g., Volcano Engine VCI, Huawei Cloud CCI, Alibaba Cloud ECI) for elastic CN scaling.

Enhance StarRocks operational features such as health diagnostics, SQL tuning, and cost reporting.

Improve automatic materialized view (AutoMV) optimization for data‑warehouse workloads.

Explore new lakehouse formats like Paimon for real‑time and cold‑storage scenarios.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
