
Build a Decoupled Storage‑Compute Data Platform with StarRocks and MinIO

This step‑by‑step tutorial shows how to deploy StarRocks and MinIO in a decoupled storage‑compute architecture using Docker Compose, create a storage volume, load public datasets, and run SQL queries to explore the combined data.


Overview

A decoupled storage‑compute architecture runs compute and storage as separate layers, allowing independent scaling, lower costs, and better resource utilization. StarRocks 3.0+ supports this model, and MinIO provides an open‑source, S3‑compatible object store that can be used for both local testing and private deployments.

Advantages of Decoupled Architecture

Cost control: Scale compute and storage independently.

Flexible deployment: Mix and match compute and storage systems.

Elastic scaling: Add or remove nodes as workload changes.

Query performance: Local cache reduces remote storage latency.

Maintainability: Workloads can be moved between resource pools.

Clear resource isolation: Supports multi‑tenant scenarios.

Prerequisites

curl: Download YAML and data files.

Docker Compose: Install via Docker Desktop (includes Docker Engine and Compose). Verify installation:

docker compose version

SQL client: DBeaver or MySQL CLI.

The StarRocks instance in this tutorial can be accessed with the MySQL CLI, so a GUI client is optional.
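The prerequisites above can be checked in one pass before starting. The following is a minimal sketch, not part of the original tutorial; the function names require_cmd and check_prereqs are hypothetical helpers introduced here for illustration.

```shell
# Return 0 if the named command is on PATH; otherwise print a hint and return 1.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing prerequisite: $1" >&2
  return 1
}

# Check every tool passed as an argument.
# The exit status is the number of tools that were not found.
check_prereqs() {
  missing=0
  for tool in "$@"; do
    require_cmd "$tool" || missing=$((missing + 1))
  done
  return "$missing"
}

# Usage for this tutorial (hypothetical invocation):
#   check_prereqs curl docker && docker compose version
```

If check_prereqs reports nothing and docker compose version prints a version string, the tools this tutorial relies on are in place.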

Quick Start Steps

Create work directory and download Docker Compose file

mkdir sr-quickstart
cd sr-quickstart
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/docker-compose.yml

Start containers in background

docker compose up -d

Configure MinIO

Access the MinIO console at http://localhost:9001/access-keys (default credentials minioadmin:minioadmin) and create an access key.
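After docker compose up -d returns, the services may still be initializing, so the console or the SQL port might not respond right away. A small polling helper can cover that gap. This is a sketch under stated assumptions: wait_for is a hypothetical helper, and the MinIO liveness URL in the usage comment assumes the Compose file publishes MinIO's API port 9000 on localhost.

```shell
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or ATTEMPTS tries have been made.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (assumed endpoints; adjust to your Compose file):
#   wait_for 30 2 curl -fsS http://localhost:9000/minio/health/live
#   wait_for 30 2 docker compose exec -T starrocks-fe \
#       mysql -P9030 -h127.0.0.1 -uroot -e "SELECT 1"
```

The second usage line reuses the MySQL CLI command from this tutorial, with -T added so that docker compose exec does not allocate a terminal when run from a script.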

Connect SQL client (DBeaver example)

Host: localhost, Port: 9030, User: root.

Or use MySQL CLI inside the starrocks-fe container

docker compose exec starrocks-fe mysql -P9030 -h127.0.0.1 -uroot --prompt="StarRocks > "

Create storage volume in StarRocks

CREATE STORAGE VOLUME shared
TYPE = S3
LOCATIONS = ("s3://starrocks/shared/")
PROPERTIES (
  "enabled" = "true",
  "aws.s3.endpoint" = "http://minio:9000",
  "aws.s3.use_aws_sdk_default_behavior" = "false",
  "aws.s3.enable_ssl" = "false",
  "aws.s3.use_instance_profile" = "false",
  "aws.s3.access_key" = "{your Access Key}",
  "aws.s3.secret_key" = "{your Secret Key}"
);
SET shared AS DEFAULT STORAGE VOLUME;
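The placeholders {your Access Key} and {your Secret Key} have to be replaced with the key pair created in the MinIO console. One way to avoid editing the statement by hand is to render it from shell variables and pipe it to the MySQL CLI. The sketch below is illustrative only: render_storage_volume_sql, MINIO_ACCESS_KEY, and MINIO_SECRET_KEY are hypothetical names, and the statement body is copied from the step above.

```shell
# render_storage_volume_sql ACCESS_KEY SECRET_KEY
# Print the CREATE STORAGE VOLUME statement from this tutorial with the
# two key placeholders filled in from the arguments.
render_storage_volume_sql() {
  access_key=$1
  secret_key=$2
  cat <<EOF
CREATE STORAGE VOLUME shared
TYPE = S3
LOCATIONS = ("s3://starrocks/shared/")
PROPERTIES (
  "enabled" = "true",
  "aws.s3.endpoint" = "http://minio:9000",
  "aws.s3.use_aws_sdk_default_behavior" = "false",
  "aws.s3.enable_ssl" = "false",
  "aws.s3.use_instance_profile" = "false",
  "aws.s3.access_key" = "${access_key}",
  "aws.s3.secret_key" = "${secret_key}"
);
SET shared AS DEFAULT STORAGE VOLUME;
EOF
}

# Usage (hypothetical environment variables holding the MinIO key pair):
#   render_storage_volume_sql "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY" |
#       docker compose exec -T starrocks-fe mysql -P9030 -h127.0.0.1 -uroot
```

Keeping the keys in environment variables rather than in a saved SQL file is a design choice for this sketch; any approach that substitutes the two values works equally well.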

Download sample datasets

curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/NYPD_Crash_Data.csv
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/72505394728.csv
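Because curl -O is used without --fail, an HTTP error such as a 404 would still be written to the output file. A quick sanity check before loading can catch that. The following is a rough sketch; looks_like_csv is a hypothetical helper, and "header row contains a comma" is only a heuristic, not a full CSV validation.

```shell
# looks_like_csv FILE
# Return 0 if FILE exists, is non-empty, and its first line contains a comma.
looks_like_csv() {
  file=$1
  # The file must exist and have a size greater than zero.
  [ -s "$file" ] || return 1
  # A comma-separated header row should contain at least one comma.
  head -n 1 "$file" | grep -q ','
}

# Usage with the two datasets from this tutorial:
#   for f in NYPD_Crash_Data.csv 72505394728.csv; do
#     looks_like_csv "$f" || echo "unexpected content in $f" >&2
#   done
```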

Create database and tables

CREATE DATABASE IF NOT EXISTS quickstart;
USE quickstart;
CREATE TABLE IF NOT EXISTS crashdata (
  CRASH_DATE DATETIME,
  BOROUGH STRING,
  ZIP_CODE STRING,
  LATITUDE INT,
  LONGITUDE INT,
  LOCATION STRING,
  ON_STREET_NAME STRING,
  CROSS_STREET_NAME STRING,
  OFF_STREET_NAME STRING,
  CONTRIBUTING_FACTOR_VEHICLE_1 STRING,
  CONTRIBUTING_FACTOR_VEHICLE_2 STRING,
  COLLISION_ID INT,
  VEHICLE_TYPE_CODE_1 STRING,
  VEHICLE_TYPE_CODE_2 STRING
);
CREATE TABLE IF NOT EXISTS weatherdata (
  DATE DATETIME,
  NAME STRING,
  HourlyDewPointTemperature STRING,
  HourlyDryBulbTemperature STRING,
  HourlyPrecipitation STRING,
  HourlyPresentWeatherType STRING,
  HourlyPressureChange STRING,
  HourlyPressureTendency STRING,
  HourlyRelativeHumidity STRING,
  HourlySkyConditions STRING,
  HourlyVisibility STRING,
  HourlyWetBulbTemperature STRING,
  HourlyWindDirection STRING,
  HourlyWindGustSpeed STRING,
  HourlyWindSpeed STRING
);

Load data into tables via StarRocks stream load

curl --location-trusted -u root \
    -T ./NYPD_Crash_Data.csv \
    -H "label:crashdata-0" \
    -H "column_separator:," \
    -H "skip_header:1" \
    -H "enclose:\"" \
    -H "max_filter_ratio:1" \
    -H "columns:tmp_CRASH_DATE, tmp_CRASH_TIME, CRASH_DATE=str_to_date(concat_ws(' ', tmp_CRASH_DATE, tmp_CRASH_TIME), '%m/%d/%Y %H:%i'),BOROUGH,ZIP_CODE,LATITUDE,LONGITUDE,LOCATION,ON_STREET_NAME,CROSS_STREET_NAME,OFF_STREET_NAME,NUMBER_OF_PERSONS_INJURED,NUMBER_OF_PERSONS_KILLED,NUMBER_OF_PEDESTRIANS_INJURED,NUMBER_OF_PEDESTRIANS_KILLED,NUMBER_OF_CYCLIST_INJURED,NUMBER_OF_CYCLIST_KILLED,NUMBER_OF_MOTORIST_INJURED,NUMBER_OF_MOTORIST_KILLED,CONTRIBUTING_FACTOR_VEHICLE_1,CONTRIBUTING_FACTOR_VEHICLE_2,CONTRIBUTING_FACTOR_VEHICLE_3,CONTRIBUTING_FACTOR_VEHICLE_4,CONTRIBUTING_FACTOR_VEHICLE_5,COLLISION_ID,VEHICLE_TYPE_CODE_1,VEHICLE_TYPE_CODE_2,VEHICLE_TYPE_CODE_3,VEHICLE_TYPE_CODE_4,VEHICLE_TYPE_CODE_5" \
    -XPUT http://localhost:8030/api/quickstart/crashdata/_stream_load
curl --location-trusted -u root \
    -T ./72505394728.csv \
    -H "label:weather-0" \
    -H "column_separator:," \
    -H "skip_header:1" \
    -H "enclose:\"" \
    -H "max_filter_ratio:1" \
    -H "columns: STATION, DATE, LATITUDE, LONGITUDE, ELEVATION, NAME, REPORT_TYPE, SOURCE, HourlyAltimeterSetting, HourlyDewPointTemperature, HourlyDryBulbTemperature, HourlyPrecipitation, HourlyPresentWeatherType, HourlyPressureChange, HourlyPressureTendency, HourlyRelativeHumidity, HourlySkyConditions, HourlySeaLevelPressure, HourlyStationPressure, HourlyVisibility, HourlyWetBulbTemperature, HourlyWindDirection, HourlyWindGustSpeed, HourlyWindSpeed, Sunrise, Sunset, DailyAverageDewPointTemperature, DailyAverageDryBulbTemperature, DailyAverageRelativeHumidity, DailyAverageSeaLevelPressure, DailyAverageStationPressure, DailyAverageWetBulbTemperature, DailyAverageWindSpeed, DailyCoolingDegreeDays, DailyDepartureFromNormalAverageTemperature, DailyHeatingDegreeDays, DailyMaximumDryBulbTemperature, DailyMinimumDryBulbTemperature, DailyPeakWindDirection, DailyPeakWindSpeed, DailyPrecipitation, DailySnowDepth, DailySnowfall, DailySustainedWindDirection, DailySustainedWindSpeed, DailyWeather, MonthlyAverageRH, MonthlyDaysWithGT001Precip, MonthlyDaysWithGT010Precip, MonthlyDaysWithGT32Temp, MonthlyDaysWithGT90Temp, MonthlyDaysWithLT0Temp, MonthlyDaysWithLT32Temp, MonthlyDepartureFromNormalAverageTemperature, MonthlyDepartureFromNormalCoolingDegreeDays, MonthlyDepartureFromNormalHeatingDegreeDays, MonthlyDepartureFromNormalMaximumTemperature, MonthlyDepartureFromNormalMinimumTemperature, MonthlyDepartureFromNormalPrecipitation, MonthlyDewpointTemperature, MonthlyGreatestPrecip, MonthlyGreatestPrecipDate, MonthlyGreatestSnowDepth, MonthlyGreatestSnowDepthDate, MonthlyGreatestSnowfall, MonthlyGreatestSnowfallDate, MonthlyMaxSeaLevelPressureValue, MonthlyMaxSeaLevelPressureValueDate, MonthlyMaxSeaLevelPressureValueTime, MonthlyMaximumTemperature, MonthlyMeanTemperature, MonthlyMinSeaLevelPressureValue, MonthlyMinSeaLevelPressureValueDate, MonthlyMinSeaLevelPressureValueTime, MonthlyMinimumTemperature, MonthlySeaLevelPressure, MonthlyStationPressure, MonthlyTotalLiquidPrecipitation, MonthlyTotalSnowfall, MonthlyWetBulb, AWND, CDSD, CLDD, DSNW, HDSD, HTDD, NormalsCoolingDegreeDay, NormalsHeatingDegreeDay, ShortDurationEndDate005, ShortDurationEndDate010, ShortDurationEndDate015, ShortDurationEndDate020, ShortDurationEndDate030, ShortDurationEndDate045, ShortDurationEndDate060, ShortDurationEndDate080, ShortDurationEndDate100, ShortDurationEndDate120, ShortDurationEndDate150, ShortDurationEndDate180, ShortDurationPrecipitationValue005, ShortDurationPrecipitationValue010, ShortDurationPrecipitationValue015, ShortDurationPrecipitationValue020, ShortDurationPrecipitationValue030, ShortDurationPrecipitationValue045, ShortDurationPrecipitationValue060, ShortDurationPrecipitationValue080, ShortDurationPrecipitationValue100, ShortDurationPrecipitationValue120, ShortDurationPrecipitationValue150, ShortDurationPrecipitationValue180, REM, BackupDirection, BackupDistance, BackupDistanceUnit, BackupElements, BackupElevation, BackupEquipment, BackupLatitude, BackupLongitude, BackupName, WindEquipmentChangeDate" \
    -XPUT http://localhost:8030/api/quickstart/weatherdata/_stream_load
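Each Stream Load request returns a JSON document whose "Status" field reports the outcome, and with max_filter_ratio set to 1 a load can succeed even when rows are filtered out, so it is worth reading the response rather than assuming success. The sketch below checks only the "Status" field; stream_load_ok is a hypothetical helper, and the pattern match is a simplification that stands in for a real JSON parser.

```shell
# stream_load_ok
# Read a Stream Load JSON response on stdin and return 0 only if it
# contains "Status": "Success".
stream_load_ok() {
  grep -Eq '"Status"[[:space:]]*:[[:space:]]*"Success"'
}

# Usage (append to either curl command above; -s hides the progress meter):
#   curl -s --location-trusted -u root ... \
#       -XPUT http://localhost:8030/api/quickstart/crashdata/_stream_load |
#       stream_load_ok || echo "stream load did not report Success" >&2
```

Note that re-running a load with a label that was already used, such as crashdata-0, is expected to be rejected, so a failed check after a retry may simply mean the label needs to change.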

Query the loaded data

SELECT COUNT(DISTINCT c.COLLISION_ID) AS Crashes,
       TRUNCATE(AVG(w.HourlyDryBulbTemperature), 1) AS Temp_F,
       MAX(w.HourlyPrecipitation) AS Precipitation,
       DATE_FORMAT(c.CRASH_DATE, '%d %b %Y %H:00') AS Hour
FROM crashdata c
LEFT JOIN weatherdata w
  ON DATE_FORMAT(c.CRASH_DATE, '%Y-%m-%d %H:00:00') = DATE_FORMAT(w.DATE, '%Y-%m-%d %H:00:00')
WHERE DAYOFWEEK(c.CRASH_DATE) BETWEEN 2 AND 6
GROUP BY Hour
ORDER BY Crashes DESC
LIMIT 200;

Conclusion

Integrating StarRocks with MinIO provides a flexible, scalable, and cost‑effective data platform. Decoupling compute from storage enables independent scaling, improves performance, simplifies operations, and offers clear resource isolation for multi‑tenant workloads, making it a solid foundation for modern cloud‑native analytics.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
