
Optimizing JD Retail Data Architecture: From Lambda to Real‑time Unified Processing with Flink, Hudi, and StarRocks

This article details JD Retail's transition from a complex Lambda architecture to a unified real‑time data pipeline using Flink, Hudi, and StarRocks, addressing data completeness versus latency, reducing maintenance costs, improving storage efficiency, and delivering faster, more consistent analytics for business users.

DataFunSummit

Jul 1, 2024


JD Retail historically used a Lambda architecture that guaranteed data completeness but suffered from high system complexity, duplicated pipelines, and latency caused by merging batch and real‑time layers.

Background and Pain Points: The dual‑system design created a conflict between data completeness and real‑time requirements, required separate APIs for batch and streaming data, and incurred heavy resource consumption in both offline ETL (Plumber) and online Kafka+Flink processing.

Architecture Maintenance Cost: Offline processing involved multiple layers (BDM, FDM, GDM) and heavy ETL jobs, while real‑time processing relied on Kafka topics and Flink. The result was high CPU/memory usage, long batch windows (T+1, meaning data only became available the following day), and frequent task failures during data spikes.

State Data Update and Storage Issues: Full‑partition rewrites for state updates wasted compute resources, and storing full snapshots for every hour/day consumed petabytes of storage.
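To make the snapshot cost concrete, the following is a minimal sketch that compares rows stored under daily full snapshots with rows stored under a base copy plus daily changes. The function names and all numbers are hypothetical toy values chosen for illustration; they are not JD Retail figures.

```python
def rows_stored_full_snapshots(total_rows: int, days: int) -> int:
    """Every day stores a complete copy of the table."""
    return total_rows * days


def rows_stored_incremental(total_rows: int, days: int, daily_change_rate: float) -> int:
    """Day 1 stores the full table; each later day stores only the changed rows."""
    changed_per_day = round(total_rows * daily_change_rate)
    return total_rows + changed_per_day * (days - 1)


if __name__ == "__main__":
    # Toy inputs (assumptions): 1,000 rows, 30 days, 1% of rows change per day.
    full = rows_stored_full_snapshots(1_000, 30)
    incremental = rows_stored_incremental(1_000, 30, 0.01)
    print(f"full snapshots: {full} rows, incremental: {incremental} rows")
```

The gap widens as the table grows or the change rate shrinks, which is why slowly changing state data is a poor fit for full snapshots.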

Iteration and Optimization:

Replaced offline MapReduce jobs with Flink streaming jobs that write directly to Hudi tables, enabling incremental processing and reducing latency.

Implemented multi‑stream merging: real‑time binlog streams from various business domains are ingested by Flink, transformed into Hudi BDM tables, and further processed to generate GDM/RDDM models.

Adopted a new storage model (partitioned tables, merge‑on‑read (MOR) tables, and bucketing) to limit small files, cap the number of retained versions, and enable asynchronous compaction.

Reduced costs by reusing tables across business lines, automating DMS table creation, and visualizing task status for faster issue resolution.

Ensured data consistency with hash‑based ordering, Hudi heartbeat mechanisms, and pre‑combine logic for incremental updates.

Improved sustainability through monitoring of data backlog, task anomalies, and checkpoint failures, as well as metadata updates for schema changes.
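The bucketing and pre‑combine ideas in the list above can be sketched in a few lines. This is an in‑memory illustration only, not Hudi's implementation: the `Change` record, the `ts` ordering field, the CRC32 hash, and the dict‑based buckets are all assumptions made for the example.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Change:
    key: str       # record key, e.g. a SKU id (hypothetical)
    ts: int        # ordering field compared during pre-combine (hypothetical)
    payload: dict  # the row content carried by this change


def bucket_of(key: str, num_buckets: int) -> int:
    """Stable hash, so every change for a key lands in the same bucket."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def precombine(batch):
    """Collapse a batch to one change per key: the one with the largest ts."""
    latest = {}
    for change in batch:
        current = latest.get(change.key)
        if current is None or change.ts >= current.ts:
            latest[change.key] = change
    return list(latest.values())


def upsert(buckets, batch):
    """Apply a batch; an older change never overwrites a newer stored row."""
    for change in precombine(batch):
        bucket = buckets[bucket_of(change.key, len(buckets))]
        stored = bucket.get(change.key)
        if stored is None or change.ts >= stored.ts:
            bucket[change.key] = change


if __name__ == "__main__":
    buckets = [dict() for _ in range(4)]
    # Out-of-order arrival: the ts=2 change is seen before the ts=1 change.
    upsert(buckets, [Change("sku-1", 2, {"price": 12}), Change("sku-1", 1, {"price": 10})])
    print(buckets[bucket_of("sku-1", 4)]["sku-1"].payload)
```

Because a key always hashes to the same bucket, the bucket count bounds the number of file groups per partition, and comparing on the ordering field keeps late‑arriving changes from clobbering newer data.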

Foreign‑Key Association: Integrated Flink and Spark to handle foreign‑key joins on large tables, processing incremental SKU data against full SPU tables and vice‑versa in 10‑minute batches.
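One way to picture the two‑directional join is the sketch below, which rebuilds wide rows only for SKUs touched in a batch: SKUs that changed themselves, plus SKUs whose parent SPU changed. It uses plain dicts rather than Flink or Spark, and the table layout and field names (`spu_id`, `price`, `brand`) are hypothetical.

```python
def affected_sku_ids(changed_skus, changed_spus, sku_table):
    """SKUs to rebuild: those that changed, plus those whose SPU changed."""
    ids = set(changed_skus)
    changed_spu_ids = set(changed_spus)
    # A full scan keeps the sketch short; a real job would use an index
    # or a join keyed on spu_id instead.
    ids.update(
        sku_id for sku_id, sku in sku_table.items()
        if sku["spu_id"] in changed_spu_ids
    )
    return ids


def run_batch(sku_table, spu_table, changed_skus, changed_spus):
    """Apply one batch of increments and return the rebuilt wide rows."""
    sku_table.update(changed_skus)  # incremental SKU rows into the full SKU table
    spu_table.update(changed_spus)  # incremental SPU rows into the full SPU table
    wide = {}
    for sku_id in affected_sku_ids(changed_skus, changed_spus, sku_table):
        sku = sku_table[sku_id]
        spu = spu_table.get(sku["spu_id"], {})
        wide[sku_id] = {**spu, **sku, "sku_id": sku_id}
    return wide


if __name__ == "__main__":
    skus = {
        "s1": {"spu_id": "p1", "price": 10},
        "s2": {"spu_id": "p1", "price": 11},
        "s3": {"spu_id": "p2", "price": 5},
    }
    spus = {"p1": {"brand": "A"}, "p2": {"brand": "B"}}
    out = run_batch(skus, spus, {"s3": {"spu_id": "p2", "price": 6}}, {"p1": {"brand": "A2"}})
    for sku_id in sorted(out):
        print(sku_id, out[sku_id])
```

SKUs whose own row and parent SPU are both unchanged are never touched, which is the property that keeps each batch proportional to the change volume rather than the table size.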

Query Optimization:

Enabled Hudi metadata caching and block‑level file caching on StarRocks BE nodes to avoid repeated remote reads.

Implemented asynchronous materialized views in StarRocks to accelerate complex queries without manual refresh tasks.
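The benefit of block‑level caching can be illustrated with a small LRU cache placed in front of a remote reader. This is a generic sketch, not StarRocks code: the `BlockCache` class, the `fetch_remote` callback, and the block size are all hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """LRU cache of fixed-size file blocks in front of a remote reader."""

    def __init__(self, fetch_remote, block_size, capacity_blocks):
        self._fetch_remote = fetch_remote  # callable(path, offset, length) -> bytes
        self._block_size = block_size
        self._capacity = capacity_blocks
        self._blocks = OrderedDict()       # (path, block_index) -> bytes
        self.hits = 0
        self.misses = 0

    def _block(self, path, index):
        key = (path, index)
        if key in self._blocks:
            self._blocks.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._blocks[key]
        self.misses += 1
        block = self._fetch_remote(path, index * self._block_size, self._block_size)
        self._blocks[key] = block
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return block

    def read(self, path, offset, length):
        """Serve a byte range, going remote only for blocks not yet cached."""
        if length <= 0:
            return b""
        first = offset // self._block_size
        last = (offset + length - 1) // self._block_size
        data = b"".join(self._block(path, i) for i in range(first, last + 1))
        start = offset - first * self._block_size
        return data[start:start + length]


if __name__ == "__main__":
    remote = bytes(range(100))
    cache = BlockCache(lambda path, off, n: remote[off:off + n], 16, 4)
    cache.read("part-0.parquet", 10, 10)  # touches blocks 0 and 1: two remote reads
    cache.read("part-0.parquet", 12, 4)   # block 0 is already cached: no remote read
    print(cache.hits, cache.misses)
```

Caching at block granularity means a query that touches a few column ranges of a large file pays the remote‑read cost only for those ranges, and later queries over the same ranges are served locally.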

Effects and Benefits:

Timeliness: Near‑real‑time processing reduced batch windows from 3–4 hours to under 20 minutes.

Job efficiency: Resource consumption for wide‑table construction dropped dramatically, with only changed data being rewritten.

Storage savings: Incremental storage cut annual storage costs by ~90% for billions of product records.

Unified API: A single query API eliminated data‑source inconsistencies and cut integration effort by 50%.

Query layering: Indexing and partitioning enabled seamless integration with Trino, ClickHouse, and StarRocks.

Future Outlook and Planning:

Implement disaster‑recovery measures for data center outages and task restarts.

Isolate batch resources to achieve elastic scaling.

Address Hudi small‑file issues with scheduled compaction, bucketing, and partitioning plugins.

Build a data‑immunity system and enhance Hudi self‑management to lower maintenance overhead.
