GreptimeDB Distributed Architecture, Transparent Caching, and Flow‑Based Real‑Time Analytics
GreptimeDB solves front‑end observability challenges with a distributed architecture (frontend, datanode, flownode, metasrv), transparent two‑level caching, elastic scaling, and an SQL‑based flow engine for real‑time multi‑granularity aggregation and approximate counting, delivering millisecond query latency and cost‑effective storage.
This article addresses the pain points of front‑end observability—high‑frequency data ingestion, multi‑granularity aggregation, and resource inefficiency—and proposes GreptimeDB as an end‑to‑end solution.
Deployment architecture: GreptimeDB consists of four node types: Frontend (stateless request handling), Datanode (data storage and query), Flownode (stream processing), and Metasrv (metadata service). Regions (table shards) can be migrated between datanodes using the migrate_region function, enabling elastic scaling.
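As a sketch of what a region move can look like (the region and peer IDs below are made-up placeholders; the argument order follows GreptimeDB's admin-function documentation):

```sql
-- Inspect current region placement; information_schema exposes
-- region-to-datanode mapping in GreptimeDB.
SELECT region_id, peer_id, is_leader
FROM information_schema.region_peers;

-- Move region 4398046511104 from datanode 1 to datanode 2,
-- allowing up to 60 seconds for replay on the target node.
ADMIN migrate_region(4398046511104, 1, 2, 60);
```

Because the Frontend is stateless, clients are unaffected while the region is replayed on its new datanode.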
Transparent data caching: GreptimeDB abstracts the storage layer and provides a two-level cache. The disk cache acts as a page cache for object-storage files, while the memory cache stores both raw file bytes and deserialized structures (e.g., min/max indexes, Bloom filters). This reduces latency for hot objects.
Fearless scaling: The system is cloud-native; Metasrv collects per-node load, and the scheduler (e.g., Kubernetes) can adjust replica counts for read-write separation. Region migration and load-aware traffic routing ensure high availability.
GreptimeDB Flow in practice: Flow is a lightweight SQL-based stream engine for time series. It supports continuous multi-granularity aggregation:
10‑second hot table:
CREATE FLOW rpc_cost_10s
SINK TO rpc_cost_10s_agg
EXPIRE AFTER '12hours'::INTERVAL
AS SELECT app_name, url,
date_bin('10s'::INTERVAL, timestamp) AS time_window,
uddsketch(cost_time_ms, 0.01, 0.001) AS cost_sketch
FROM rpc_cost_time
GROUP BY app_name, url, date_bin('10s'::INTERVAL, timestamp);
1-minute roll-up:
CREATE FLOW rpc_cost_1m
SINK TO rpc_cost_1m_agg
EXPIRE AFTER '30days'::INTERVAL
AS SELECT app_name, url,
date_bin('1m'::INTERVAL, time_window) AS time_window_1m,
uddsketch_merge(cost_sketch) AS cost_sketch_1m
FROM rpc_cost_10s_agg
GROUP BY app_name, url, date_bin('1m'::INTERVAL, time_window);
10-minute cold table:
CREATE FLOW rpc_cost_10m
SINK TO rpc_cost_10m_agg
EXPIRE AFTER '180days'::INTERVAL
AS SELECT app_name, url,
date_bin('10m'::INTERVAL, time_window_1m) AS time_window_10m,
uddsketch_merge(cost_sketch_1m) AS cost_sketch_10m
FROM rpc_cost_1m_agg
GROUP BY app_name, url, date_bin('10m'::INTERVAL, time_window_1m);
UV approximate counting with HyperLogLog: Flow can compute per-window unique-visitor estimates using hll and query them with hll_count:
CREATE FLOW uv_hll_10s
SINK TO uv_state_10s
EXPIRE AFTER '12hours'::INTERVAL
AS SELECT app_name, url,
date_bin('10s'::INTERVAL, ts) AS time_window,
hll(user_id) AS uv_state
FROM access_log
GROUP BY app_name, url, date_bin('10s'::INTERVAL, ts);
SELECT app_name, url, hll_count(uv_state) AS uv_count
FROM uv_state_10s
WHERE time_window = 1743479260;
Benefits: Pre-aggregation and multi-level roll-up cut query latency from seconds to milliseconds; tiered table retention (10 s data kept 1 day, 1 m roll-ups 7 days, 10 m roll-ups 180 days) controls storage cost; independent scaling of Frontend, Flownode, and Datanode decouples resources; and using standard SQL lowers the learning curve.
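To show how the pre-aggregated sketch columns are consumed, here is a hedged query sketch. It assumes a uddsketch_calc(percentile, state)-style extraction function as described in GreptimeDB's function reference; the table and column names follow the flows above, and the one-hour window is an arbitrary example:

```sql
-- p95 RPC latency per endpoint over the last hour, read from the
-- 1-minute roll-up table. uddsketch_merge re-combines the per-window
-- sketch states before the percentile is extracted (function names
-- are assumptions based on GreptimeDB's UDDSketch docs).
SELECT app_name, url,
       uddsketch_calc(0.95, uddsketch_merge(cost_sketch_1m)) AS p95_ms
FROM rpc_cost_1m_agg
WHERE time_window_1m >= now() - '1hour'::INTERVAL
GROUP BY app_name, url;
```

Because sketches are mergeable, the same pattern works at any granularity: query the 10 s table for fresh data and the 10 m table for long-range trends.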
DeWu Technology