
Building a General Real-Time Data Warehouse: Methods and Practices at Meituan Waimai

This article introduces a universal method for building a real-time data warehouse at Meituan Waimai, covering streaming technologies, architecture choices such as Lambda and Kappa, component design, feature production, SLA management, and practical OLAP solutions using Flink, Storm, and Doris.


The article presents a comprehensive overview of constructing a general real-time data warehouse at Meituan Waimai, emphasizing low‑latency end‑to‑end processing, SQL standardization, and rapid response to business changes.

It first outlines typical real-time scenarios in food delivery, including operational monitoring, production reliability, user-facing recommendations, and risk control, highlighting the need for both low-latency and near-real-time analytics.

For streaming technology selection, open‑source options such as Storm, Spark Streaming, and Flink are discussed; Meituan Waimai migrated from Storm to Flink as the latter matured, citing performance and architectural advantages.

The article compares Lambda and Kappa architectures, describing Lambda’s dual‑path (batch + stream) approach and its resource duplication issues, and Kappa’s unified stream processing with limited real‑world adoption.

It then details the challenges of real‑time business data, such as multi‑state order flows, complex joins across many tables, and the need for batch‑oriented analysis on streaming inputs.
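For a multi-state order flow, a common pattern is to deduplicate the changelog so downstream metrics count each order only once, at its latest state. The sketch below uses Flink SQL's deduplication idiom; the table and column names (`ods_order`, `order_id`, `event_time`) are illustrative, not from the talk.

```sql
-- Keep only the latest state per order_id from a changelog-style stream
-- in which each order emits a new row every time its status changes.
SELECT order_id, order_status, amount, event_time
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY order_id
           ORDER BY event_time DESC    -- newest state ranks first
         ) AS rn
  FROM ods_order
) t
WHERE rn = 1;
```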

To address these challenges, a layered real‑time data‑warehouse design is proposed: a unified data source layer (log and business streams), a real‑time detail layer built with standardized cleaning, filtering, and enrichment, and a summary layer that aggregates metrics via Flink/Storm or Doris.
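A detail-layer job in this design combines cleaning, filtering, and dimension enrichment in one standardized step. The following Flink SQL sketch illustrates that shape; all table and column names are hypothetical.

```sql
-- Real-time detail layer (DWD): normalize types, drop dirty records,
-- and enrich facts with dimension attributes via a temporal lookup join.
INSERT INTO dwd_order_detail
SELECT o.order_id,
       o.user_id,
       CAST(o.amount AS DECIMAL(10, 2)) AS amount,    -- type normalization
       s.city_name,                                   -- enrichment from a dim table
       o.event_time
FROM ods_order AS o
JOIN dim_shop FOR SYSTEM_TIME AS OF o.proc_time AS s  -- temporal lookup join
  ON o.shop_id = s.shop_id
WHERE o.order_status IS NOT NULL;                     -- filter dirty records
```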

The platform‑level construction abstracts functions into reusable components (cleaning, filtering, enrichment, custom Java/Python scripts) and supports dynamic feature generation through SQL‑based expressions, reducing resource waste by sharing streams across multiple metrics.
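Sharing one detail stream across metrics means several SQL-expressed features can be computed in a single aggregation job rather than one stream per metric. A minimal sketch, with invented table and metric names:

```sql
-- Two features derived from one shared detail stream: total orders and
-- high-value orders per shop, aggregated over 1-minute tumbling windows.
INSERT INTO dws_shop_metrics
SELECT shop_id,
       TUMBLE_START(event_time, INTERVAL '1' MINUTE)   AS window_start,
       COUNT(*)                                        AS order_cnt,
       SUM(CASE WHEN amount > 100 THEN 1 ELSE 0 END)   AS big_order_cnt
FROM dwd_order_detail
GROUP BY shop_id, TUMBLE(event_time, INTERVAL '1' MINUTE);
```

Adding another metric becomes adding another expression to the `SELECT` list, so the shared stream is read once regardless of how many features it feeds.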

SLA mechanisms are introduced, covering end‑to‑end latency tracking and job‑level efficiency monitoring via lightweight instrumentation and centralized reporting.
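One lightweight way to instrument end-to-end latency is to compare each record's event time against the wall clock at the final stage. The sketch below is an assumed approach in Flink SQL, not the talk's exact implementation; `dwd_order_detail` and `job_name` are illustrative.

```sql
-- Per-minute latency report at the sink: event time vs. processing time.
SELECT job_name,
       AVG(TIMESTAMPDIFF(SECOND, event_time, PROCTIME())) AS avg_latency_s,
       MAX(TIMESTAMPDIFF(SECOND, event_time, PROCTIME())) AS max_latency_s
FROM dwd_order_detail
GROUP BY job_name, TUMBLE(PROCTIME(), INTERVAL '1' MINUTE);
```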

For real-time OLAP, the solution leverages Doris as a high-performance engine that supports both Unique and Aggregate data models, enabling efficient roll-up calculations and seamless integration of historical and current partitions.
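The two Doris models serve different needs: the Unique model replaces earlier rows with the same key (matching latest-order-state semantics), while the Aggregate model pre-aggregates rows on write to speed up roll-up queries. A DDL sketch with illustrative table names:

```sql
-- Unique model: a later row with the same order_id replaces the earlier one.
CREATE TABLE order_uniq (
    order_id     BIGINT,
    order_status INT,
    amount       DECIMAL(10, 2)
) UNIQUE KEY (order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 10;

-- Aggregate model: rows sharing (shop_id, dt) are summed on ingestion.
CREATE TABLE shop_metrics_agg (
    shop_id   BIGINT,
    dt        DATE,
    order_cnt BIGINT SUM
) AGGREGATE KEY (shop_id, dt)
DISTRIBUTED BY HASH(shop_id) BUCKETS 10;
```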

Finally, a practical use case demonstrates how a merchant can combine historical order counts (offline) with today’s real‑time orders (online) using a Lambda pattern in Doris, illustrating the overall architecture’s ability to support diverse, high‑throughput business requirements.
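The Lambda pattern described above can be expressed as a single query that unions batch-loaded historical partitions with today's real-time partition. A minimal sketch, assuming hypothetical table names for the offline and real-time sides:

```sql
-- Merchant's total order count: offline history plus today's real-time data.
SELECT shop_id, SUM(order_cnt) AS total_orders
FROM (
    SELECT shop_id, order_cnt
    FROM dws_shop_orders          -- offline partitions, batch-loaded nightly
    WHERE dt < CURRENT_DATE()
    UNION ALL
    SELECT shop_id, order_cnt
    FROM dws_shop_orders_rt       -- today's partition, written by the stream job
    WHERE dt = CURRENT_DATE()
) t
GROUP BY shop_id;
```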

Tags: Flink, stream processing, real-time data warehouse, Lambda architecture, Storm, Kappa architecture, Doris
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
