JD Retail Big Data OLAP Application and Practice

This talk presents JD Retail's big-data OLAP solution. It covers the challenges posed by massive, highly variable, and complex traffic data; the custom data-ingestion and versioned-update tools; ClickHouse query-architecture upgrades; optimization techniques; and future plans for multi-cluster querying and pre-computation.

DataFunSummit

The presentation introduces JD Retail's data background, focusing on the massive traffic data generated by the "Golden Eye" business scenario, which reaches trillions of records daily and requires real‑time, multi‑dimensional analysis.

The key challenges are massive volume, high variability, complex hierarchical dimensions, and strict timeliness (SLA) requirements, especially during peak events such as Double 11.

To address these, three solution directions are proposed: improving timeliness through versioned data updates and pre‑calculated offline batches; ensuring precision via metadata‑driven schema evolution and materialized views; and supporting high concurrency by scaling ClickHouse clusters, employing multi‑active clusters, and monitoring resource usage.
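
One way to picture versioned data updates is a registry that stages each batch under a new version number, makes it visible only after validation, and can roll back to the previously published version. The Python below is a minimal in-memory sketch under that assumption; the class `VersionedTable`, its methods, and the row-count check are hypothetical illustrations, not JD's actual tool.

```python
from typing import Dict, List, Optional


class VersionedTable:
    """Hypothetical in-memory model of versioned batch publication."""

    def __init__(self) -> None:
        self._versions: Dict[int, List[dict]] = {}  # version -> rows
        self._history: List[int] = []  # published versions, oldest first

    def stage(self, version: int, rows: List[dict]) -> None:
        # New data lands under its own version and is not yet visible to readers.
        if version in self._versions:
            raise ValueError(f"version {version} already staged")
        self._versions[version] = list(rows)

    def publish(self, version: int, expected_rows: int) -> None:
        # Make the version visible only if its row count matches the source.
        rows = self._versions[version]
        if len(rows) != expected_rows:
            raise ValueError(
                f"version {version}: {len(rows)} rows, expected {expected_rows}"
            )
        self._history.append(version)

    def rollback(self) -> Optional[int]:
        # Drop the latest published version and fall back to the previous one.
        if self._history:
            self._history.pop()
        return self.current_version()

    def current_version(self) -> Optional[int]:
        return self._history[-1] if self._history else None

    def read(self) -> List[dict]:
        # Readers always see exactly one complete published version.
        version = self.current_version()
        return [] if version is None else list(self._versions[version])
```

Because readers only ever see a fully validated version, a bad batch can be withdrawn with `rollback()` instead of being patched row by row.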

The data‑ingestion toolchain, built on Spark, handles cluster health checks, metadata completion, load balancing, data partitioning, and anomaly handling. It supports both row‑wise and column‑wise push, leverages ReplicatedReplacingMergeTree for deduplication, and validates data integrity before committing.
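
The load-balancing, deduplication, and validation steps can be sketched in plain Python. The sketch assumes rows are routed to shards by a stable hash of the deduplication key and mimics what ReplacingMergeTree does during merges when a version column is declared: among rows sharing the same sorting key, only the row with the highest version survives (ties go to the most recently inserted row). The function names and the `sku_id`/`ver` fields are hypothetical; the real tool runs on Spark and writes to ClickHouse.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def shard_for(key: str, num_shards: int) -> int:
    # Stable hash (CRC32) so the same key always lands on the same shard;
    # Python's built-in hash() is salted per process and would not be stable.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def route(rows: List[dict], num_shards: int) -> Dict[int, List[dict]]:
    # Duplicates of a key must share a shard, because ReplacingMergeTree
    # only deduplicates within one shard (and one partition).
    shards: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        shards[shard_for(row["sku_id"], num_shards)].append(row)
    return dict(shards)


def replacing_merge(rows: List[dict]) -> List[dict]:
    # Emulates ReplacingMergeTree(ver): keep the highest-version row per key;
    # ">=" keeps the later-inserted row when versions are equal.
    latest: Dict[str, dict] = {}
    for row in rows:
        current = latest.get(row["sku_id"])
        if current is None or row["ver"] >= current["ver"]:
            latest[row["sku_id"]] = row
    return sorted(latest.values(), key=lambda r: r["sku_id"])


def validate(source_count: int, shards: Dict[int, List[dict]]) -> bool:
    # Integrity check before commit: no rows lost or duplicated by routing.
    return source_count == sum(len(part) for part in shards.values())
```

In ClickHouse itself this deduplication happens during background merges, so it is eventual; queries that need fully deduplicated results at read time can use the `FINAL` modifier at extra cost.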

Query architecture upgrades include a high‑availability layer that routes queries to appropriate clusters based on resource type (CPU, IO, memory), utilizes Redis and Elasticsearch for caching hot data, and performs pre‑computations during off‑peak hours to keep ClickHouse continuously ready for user queries.
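
A simplified model of such a routing layer: answer hot queries from a cache first, and otherwise send the query to the least-loaded cluster in the pool sized for its dominant resource. Everything here is a hypothetical sketch: the cluster names, the load figures, and the plain dict standing in for Redis or Elasticsearch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Cluster:
    name: str
    resource: str  # "cpu", "io" or "memory": the workload this cluster is sized for
    load: float = 0.0  # hypothetical utilisation between 0.0 and 1.0


@dataclass
class Router:
    clusters: List[Cluster]
    cache: Dict[str, str] = field(default_factory=dict)  # stand-in for Redis/ES

    def pick(self, resource: str) -> Cluster:
        # Least-loaded cluster among those matching the query's resource profile.
        pool = [c for c in self.clusters if c.resource == resource]
        if not pool:
            raise LookupError(f"no cluster for resource type {resource!r}")
        return min(pool, key=lambda c: c.load)

    def query(
        self, sql: str, resource: str, execute: Callable[[Cluster, str], str]
    ) -> str:
        # Hot results are served from the cache without touching ClickHouse.
        if sql in self.cache:
            return self.cache[sql]
        result = execute(self.pick(resource), sql)
        self.cache[sql] = result
        return result
```

Keeping the cache check ahead of cluster selection is what lets repeated dashboard queries bypass ClickHouse entirely, leaving cluster capacity for uncached queries.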

Optimization techniques cover distinct counting and deduplication (using uniqCombined64, or COUNT over a GROUP BY), data pruning, join strategies (local IN and JOIN operations, and GLOBAL JOIN for small dimension tables), and careful alignment of GROUP BY and ORDER BY keys in materialized views.
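
Two of these points can be illustrated without ClickHouse. First, distinct counts from different shards cannot simply be added unless the data is sharded on the counted key, because the same user may appear on several shards; the per-shard states must be merged before counting. Second, a small dimension table can be shipped to every shard (the idea behind GLOBAL JOIN) so each shard joins its fact rows locally. This sketch uses exact Python sets where ClickHouse would use the approximate, mergeable state of uniqCombined64, and the data layout is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Shard = List[Tuple[str, str]]  # (user_id, sku_id) pairs stored on one shard


def naive_distinct(shards: List[Shard]) -> int:
    # Wrong in general: summing per-shard distinct counts double counts
    # users who appear on more than one shard.
    return sum(len({user for user, _ in shard}) for shard in shards)


def merged_distinct(shards: List[Shard]) -> int:
    # Correct: merge per-shard states (here exact sets) first, then count.
    users: Set[str] = set()
    for shard in shards:
        users.update(user for user, _ in shard)
    return len(users)


def broadcast_join(
    shards: List[Shard], dim: Dict[str, str]
) -> List[List[Tuple[str, str, str]]]:
    # The small dimension table (sku_id -> category) is copied to every shard,
    # so each shard performs an inner join locally without moving fact rows.
    return [
        [(user, sku, dim[sku]) for user, sku in shard if sku in dim]
        for shard in shards
    ]
```

With two shards holding users {u1, u2} and {u1, u3}, the naive sum gives 4 while the merged count gives 3, which is why distributed distinct counting relies on mergeable aggregate states rather than adding per-shard results.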

Common operational issues such as node failures, ZooKeeper inconsistencies, and versioned rollbacks are discussed, along with mitigation strategies such as multi-replica writes and partition-level recovery.
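
Multi-replica writes can be reasoned about as a quorum: a batch counts as committed only if at least `quorum` replicas acknowledge it, so losing a single node does not lose data. ClickHouse exposes a related `insert_quorum` setting, but the Python below is only a generic model of the idea with hypothetical replica names and a caller-supplied `send` function, not how the talk's tool or ClickHouse implements it.

```python
from typing import Callable, List


class QuorumWriteError(RuntimeError):
    pass


def quorum_write(
    replicas: List[str],
    batch: List[dict],
    send: Callable[[str, List[dict]], bool],
    quorum: int,
) -> List[str]:
    """Write `batch` to every replica; succeed only if >= `quorum` acknowledge."""
    if not 1 <= quorum <= len(replicas):
        raise ValueError("quorum must be between 1 and the number of replicas")
    acked: List[str] = []
    for replica in replicas:
        try:
            if send(replica, batch):
                acked.append(replica)
        except ConnectionError:
            # A failed node is skipped; it can be repaired later from a healthy replica.
            continue
    if len(acked) < quorum:
        raise QuorumWriteError(f"only {len(acked)}/{quorum} replicas acknowledged")
    return acked
```

Returning the list of acknowledging replicas tells the caller which copies are known to be good, which is the information a later partition-level repair would need.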

Future plans involve unified multi‑cluster query federation, broader use of ClickHouse pre‑computation, synchronization of results to Elasticsearch and Redis, and containerizing query resources to dynamically allocate workloads based on query characteristics.
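
Multi-cluster query federation can be sketched as scatter-gather: send the same aggregate query to every cluster holding part of the data, then merge the partial results. The sketch assumes each cluster returns partial aggregates of the form dimension value -> (sum, count), so that averages can be recombined exactly; the cluster names, the result shape, and `run_partial` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Partial = Dict[str, Tuple[float, int]]  # dimension value -> (sum, row count)


def federated_avg(
    clusters: List[str],
    run_partial: Callable[[str], Partial],
) -> Dict[str, float]:
    # Scatter: query every cluster concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(clusters))) as pool:
        partials = list(pool.map(run_partial, clusters))
    # Gather: add sums and counts per key, then divide once at the end.
    # Averaging per-cluster averages would weight clusters incorrectly.
    totals: Dict[str, Tuple[float, int]] = {}
    for partial in partials:
        for key, (s, n) in partial.items():
            prev_s, prev_n = totals.get(key, (0.0, 0))
            totals[key] = (prev_s + s, prev_n + n)
    return {key: s / n for key, (s, n) in totals.items() if n > 0}
```

Shipping (sum, count) pairs instead of finished averages is the same principle as merging aggregate states across shards: only mergeable partial results can be combined correctly across clusters.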

The session concludes with a Q&A in which the speaker, Chen Hongjian, a big data architect at JD, discusses the practical limits of local tables and how in-memory caching affects concurrency.

Written by DataFunSummit, the official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
