Using ClickHouse for Time‑Series Data Management and Analysis in JD.com JUST Platform

This article explains how JD.com’s JUST platform leverages the open‑source columnar database ClickHouse to store, query and analyze massive time‑series data, covering data modeling, lifecycle management, system goals, technology selection, cluster architecture, deployment, scaling and future enhancements.

JD Tech Talk

Oct 20, 2020

ClickHouse is an open‑source column‑oriented analytical database developed by Yandex, and JD.com’s JUST platform uses it to store and analyze massive time‑series data.

Time‑series data consists of metrics, timestamps, tags and values; typical use cases include monitoring device status, environmental measurements, financial tick data, and more.
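The four parts of a time-series record can be sketched as a small Python data structure. This is only an illustration of the model described above, not JUST's actual schema: the class name, field names and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict


@dataclass(frozen=True)
class TimeSeriesPoint:
    """One observation: a metric name, a timestamp, tags and measured values."""
    metric: str
    time: datetime
    tags: Dict[str, str] = field(default_factory=dict)      # descriptive dimensions
    values: Dict[str, float] = field(default_factory=dict)  # numeric measurements

    def series_key(self) -> str:
        """Identify the series a point belongs to: metric plus sorted tags."""
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{self.metric}{{{tag_part}}}"


# Hypothetical device-monitoring sample.
point = TimeSeriesPoint(
    metric="device_status",
    time=datetime(2020, 10, 20, 8, 30, tzinfo=timezone.utc),
    tags={"device_id": "sensor-17", "region": "north"},
    values={"temperature": 21.5, "humidity": 0.43},
)
print(point.series_key())
```

Points that share a metric and tag set belong to the same series and differ only in timestamp and values, which is why tags tend to become filter columns and values become aggregation columns.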

The article outlines the full lifecycle of time‑series data—collection, storage, query, analysis and deletion—highlighting the need for high‑throughput writes, immutable records, petabyte‑scale storage, fast real‑time queries, high availability, scalability and ease of use.

After comparing several open‑source time‑series databases (OpenTSDB, InfluxDB, TDengine), the authors select ClickHouse for its superior compression, columnar storage and parallel query engine. The reported performance figures are write speeds of up to 200 MB/s and aggregation over millions of rows per second.

The article then describes ClickHouse’s architecture: column files, the ReplicatedMergeTree engine, distributed tables, sharding, replicas, multi‑master mode and the role of ZooKeeper in metadata synchronization.
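To make the sharding idea concrete, here is a minimal Python sketch of weight-based shard selection in the style of a ClickHouse distributed table: the sharding key value is taken modulo the total shard weight, and the remainder falls into one shard's range. The function names are hypothetical, and `zlib.crc32` merely stands in for whatever hash expression a real sharding key would use.

```python
import zlib
from typing import List


def pick_shard(sharding_value: int, weights: List[int]) -> int:
    """Return the index of the shard whose weight range contains the remainder.

    Assumes `weights` is non-empty and every weight is a positive integer.
    """
    remainder = sharding_value % sum(weights)
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return index
    raise AssertionError("unreachable: remainder is always below the total weight")


def shard_for_tag(tag_value: str, weights: List[int]) -> int:
    """Route a row by hashing one tag value (crc32 is an illustrative stand-in)."""
    return pick_shard(zlib.crc32(tag_value.encode("utf-8")), weights)


# With weights 9 and 10, remainders 0..8 go to shard 0 and 9..18 to shard 1.
print(pick_shard(8, [9, 10]), pick_shard(9, [9, 10]))
```

Hashing a tag such as a device identifier keeps all rows of one series on the same shard, while each shard's replicas hold identical copies of that shard's data.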

Typical DDL for a time‑series table in JUST is shown:

create table my_ts_table as ts (
    tag1 string,
    tag2 string
) (
    value1 double,
    value2 double
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/...','{replica}')
PARTITION BY toYYYYMM(time)
ORDER BY (time);
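The `PARTITION BY toYYYYMM(time)` clause places rows into one partition per calendar month; ClickHouse's `toYYYYMM` yields the number year * 100 + month. The Python sketch below mirrors that mapping to show which rows end up together. The helper names and sample timestamps are illustrative only.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List


def to_yyyymm(ts: datetime) -> int:
    """Mirror toYYYYMM: 2020-10-20 becomes 202010."""
    return ts.year * 100 + ts.month


def group_by_partition(timestamps: Iterable[datetime]) -> Dict[int, List[datetime]]:
    """Group timestamps by the monthly partition they would fall into."""
    partitions: Dict[int, List[datetime]] = defaultdict(list)
    for ts in timestamps:
        partitions[to_yyyymm(ts)].append(ts)
    return dict(partitions)


rows = [
    datetime(2020, 9, 30, 23, 59),
    datetime(2020, 10, 1, 0, 0),
    datetime(2020, 10, 20, 8, 30),
]
for partition_id, members in sorted(group_by_partition(rows).items()):
    print(partition_id, len(members))
```

Monthly partitions suit the deletion stage of the lifecycle: expired data can be dropped a whole partition at a time instead of row by row.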

Cluster deployment strategies (2‑node, cross‑replica, primary‑secondary) and scaling formulas are discussed, as well as dynamic expansion of shards and replicas without downtime.

The system currently provides basic time‑range queries, tag filtering, down‑sampling and simple analytics, and future work includes real‑time ingestion, advanced aggregation panels, richer analysis functions and full SQL compatibility.
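Down-sampling, one of the query features listed above, can be illustrated with a short Python sketch that averages points into fixed-width time buckets. It shows the general technique only, not JUST's implementation; the function name, the use of epoch seconds and the choice of mean as the aggregate are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def downsample_mean(
    points: Iterable[Tuple[int, float]], bucket_seconds: int
) -> List[Tuple[int, float]]:
    """Average (epoch_seconds, value) points into buckets of `bucket_seconds`.

    Each output tuple is (bucket_start, mean_value), sorted by bucket start.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in points:
        bucket = ts - ts % bucket_seconds  # align to the start of the bucket
        sums[bucket] += value
        counts[bucket] += 1
    return [(bucket, sums[bucket] / counts[bucket]) for bucket in sorted(sums)]


# Three raw points reduced to one value per 60-second bucket.
raw = [(0, 1.0), (30, 3.0), (60, 5.0)]
print(downsample_mean(raw, 60))
```

In a database the same reduction is normally expressed as a GROUP BY over a rounded timestamp, so that only the aggregated rows leave the storage nodes.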
