Real‑time Data Warehouse Practices at 58 Tongcheng Bao: From Spark Streaming 1.0 to Flink‑based 2.0
This article details the evolution of 58 Tongcheng Bao's real‑time data warehouse, describing the initial Spark‑Streaming architecture, its limitations, and the redesign using Flink with a layered ODS‑DWD‑DWS‑APP model, data‑quality monitoring, join techniques, and the resulting improvements in latency and accuracy.
58 Tongcheng Bao, a leading lifestyle service platform, processes massive advertising and user data and needed a real‑time data warehouse to support fast decision‑making.
The original offline warehouse relied on batch ETL. As demand for up-to-the-minute insight grew, the team built a real-time warehouse. Version 1.0 ran on Spark Streaming, consuming Kafka streams and writing results to Druid.
Version 1.0 had four main problems: micro-batch processing added latency, processing-time joins lost data, the streaming tasks were costly to maintain, and the data-layer structure was inflexible.
To overcome these issues, version 2.0 replaced Spark Streaming with Flink and introduced a four‑layer architecture: ODS (raw), DWD (detail), DWS (summary), and APP (service), each organized by business domain.
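To make the layering concrete, here is a minimal plain-Python sketch of how one record might move through the four layers. It does not reproduce the team's Flink jobs; the field names (ad_id, cost), the 60-second window, and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical raw click events as they might land in ODS (every field is a string).
ODS_EVENTS = [
    {"ad_id": "a1", "user_id": "u1", "ts": "1700000000", "cost": "0.50"},
    {"ad_id": "a1", "user_id": "u2", "ts": "1700000030", "cost": "0.70"},
    {"ad_id": "a2", "user_id": "u1", "ts": "1700000065", "cost": "1.20"},
    {"ad_id": "", "user_id": "u3", "ts": "1700000070", "cost": "0.10"},  # malformed
]


def to_dwd(raw):
    """DWD: clean and type one raw record; return None if it is invalid."""
    if not raw["ad_id"]:
        return None
    return {
        "ad_id": raw["ad_id"],
        "user_id": raw["user_id"],
        "ts": int(raw["ts"]),
        "cost_cents": round(float(raw["cost"]) * 100),
    }


def to_dws(details, window_s=60):
    """DWS: aggregate clicks and cost per ad and per tumbling window."""
    summary = defaultdict(lambda: {"clicks": 0, "cost_cents": 0})
    for d in details:
        window_start = d["ts"] - d["ts"] % window_s
        bucket = summary[(d["ad_id"], window_start)]
        bucket["clicks"] += 1
        bucket["cost_cents"] += d["cost_cents"]
    return dict(summary)


def to_app(summary):
    """APP: roll windows up into the per-ad totals a dashboard might serve."""
    totals = defaultdict(lambda: {"clicks": 0, "cost_cents": 0})
    for (ad_id, _window_start), m in summary.items():
        totals[ad_id]["clicks"] += m["clicks"]
        totals[ad_id]["cost_cents"] += m["cost_cents"]
    return dict(totals)


dwd = [d for d in map(to_dwd, ODS_EVENTS) if d is not None]
dws = to_dws(dwd)
app = to_app(dws)
print(app)
```

Each layer reads only from the one beneath it, so a new summary table or service view can be added by reusing DWD output rather than starting again from the raw stream.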
Data‑quality management was added, covering completeness, consistency, timeliness, and accuracy, with full‑life‑cycle monitoring and alerting for each processing stage.
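As a rough illustration of per-stage monitoring, the sketch below checks one stage against the four dimensions. The StageStats fields, the thresholds, and the mapping of each dimension to a specific check are all assumptions; the article does not describe the team's actual rules.

```python
import math
from dataclasses import dataclass


@dataclass
class StageStats:
    """Hypothetical metrics collected for one processing stage."""
    stage: str
    records_in: int
    records_out: int
    lag_s: float            # current time minus newest event time processed
    layer_total: float      # a metric summed at this layer
    upstream_total: float   # the same metric summed at the layer above
    offline_total: float    # the same metric from an offline reference job


def check_stage(s, min_ratio=0.99, max_lag_s=60.0, max_rel_diff=0.01):
    """Return the names of the quality dimensions that fail for this stage."""
    failed = []
    # Completeness: the stage should not silently drop records.
    if s.records_in and s.records_out / s.records_in < min_ratio:
        failed.append("completeness")
    # Consistency: adjacent layers should agree on the same metric.
    if not math.isclose(s.layer_total, s.upstream_total, rel_tol=1e-9):
        failed.append("consistency")
    # Timeliness: processing should keep up with event time.
    if s.lag_s > max_lag_s:
        failed.append("timeliness")
    # Accuracy: real-time output should stay close to the offline reference.
    if s.offline_total:
        rel_diff = abs(s.layer_total - s.offline_total) / s.offline_total
        if rel_diff > max_rel_diff:
            failed.append("accuracy")
    return failed


healthy = StageStats("DWD", 1000, 998, 3.0, 5000.0, 5000.0, 5010.0)
degraded = StageStats("DWS", 1000, 900, 120.0, 4500.0, 5000.0, 5010.0)
print(check_stage(healthy))
print(check_stage(degraded))
```

In a real deployment the returned list would feed an alerting channel; here it is simply printed.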
Version 2.0 uses Flink's dual-stream joins and interval joins. This sharply reduced the number of real-time tasks, eliminated state-ordering problems, and brought accuracy above 99% for key metrics such as click-through and cash flow.
Benchmark results show that the new warehouse delivers latency on the order of seconds and high data accuracy, while the architecture supports scalable, maintainable development.
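The following plain-Python sketch shows why an event-time interval join can keep matches that a processing-time join loses. It is not Flink code. The interval rule mirrors the semantics of Flink's interval join (a right-side event matches when left.ts + lower <= right.ts <= left.ts + upper), while the event fields, the 10-second batch, and the 60-second bound are assumptions for illustration.

```python
# Hypothetical impression and click events: ts is event time, arrival is
# the time the record reached the processing job (both in seconds).
impressions = [
    {"id": "i1", "key": "a1", "ts": 100, "arrival": 101},
    {"id": "i2", "key": "a2", "ts": 108, "arrival": 109},
]
clicks = [
    {"id": "c1", "key": "a1", "ts": 103, "arrival": 104},
    {"id": "c2", "key": "a2", "ts": 112, "arrival": 113},
    {"id": "c3", "key": "a1", "ts": 500, "arrival": 501},
]


def batch_join(left, right, batch_s):
    """Processing-time style: join only records that arrive in the same batch."""
    out = []
    for l in left:
        for r in right:
            same_batch = l["arrival"] // batch_s == r["arrival"] // batch_s
            if l["key"] == r["key"] and same_batch:
                out.append((l["id"], r["id"]))
    return out


def interval_join(left, right, lower_s, upper_s):
    """Event-time style: join when left.ts + lower <= right.ts <= left.ts + upper."""
    by_key = {}
    for r in right:
        by_key.setdefault(r["key"], []).append(r)
    out = []
    for l in left:
        for r in by_key.get(l["key"], []):
            if l["ts"] + lower_s <= r["ts"] <= l["ts"] + upper_s:
                out.append((l["id"], r["id"]))
    return out


batch_result = batch_join(impressions, clicks, batch_s=10)
interval_result = interval_join(impressions, clicks, lower_s=0, upper_s=60)
print(batch_result)     # the i2/c2 pair straddles a batch boundary and is lost
print(interval_result)  # both genuine pairs match; c3 falls outside the interval
```

In Flink's DataStream API the corresponding construct is intervalJoin on two keyed streams with between(lower, upper). Flink keeps the buffered events in managed state and clears them as watermarks advance, which this sketch does not model.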
The team plans to continue expanding domain coverage, refining Flink usage, and advancing data‑intelligence capabilities.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.