
Flink-Driven Incremental Data Warehouse Production at Meituan: Architecture, Streaming Integration, and Future Plans

This article presents Meituan's use of Flink to enable incremental data warehouse production, covering the warehouse architecture, streaming data integration evolution, real-time OLAP applications, platform design, and future directions for unified stream‑batch processing.

DataFunTalk

The presentation, delivered by Meituan's real‑time computing lead, outlines how Flink supports incremental production in Meituan's data warehouse. It covers six topics: incremental warehouse production, streaming data integration, streaming data processing, the real‑time warehouse model and platform, streaming OLAP applications, and future plans.

1. Data Warehouse Incremental Production – Describes Meituan's "three horizontal, four vertical" warehouse architecture, in which metadata, lineage, and data security are emphasized across the integration, processing, consumption, and application stages.

2. Streaming Data Integration – Traces three generations of integration: V1.0 batch load, V2.0 real‑time binlog capture via Kafka, and V3.0 HIDI architecture that adds upsert/delete support, small‑file compaction, and schema management.
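To make the upsert/delete support mentioned for V3.0 concrete, here is a minimal sketch of how binlog-style change events could be merged into a keyed snapshot. It is an illustration only, not Meituan's implementation: the `ChangeEvent` shape, the `apply_changes` function, and the sample rows are all hypothetical, and a plain dictionary stands in for the warehouse table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One binlog-style change record (hypothetical shape, not a real schema)."""
    op: str                               # "upsert" or "delete"
    key: Any                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # full row image, required for upserts


def apply_changes(snapshot: Dict[Any, Dict[str, Any]],
                  events: Iterable[ChangeEvent]) -> Dict[Any, Dict[str, Any]]:
    """Return a new snapshot with the events applied in arrival order."""
    result = dict(snapshot)  # shallow copy so the input snapshot is not mutated
    for event in events:
        if event.op == "upsert":
            if event.row is None:
                raise ValueError("upsert event requires a row image")
            result[event.key] = event.row   # insert a new key or overwrite an old one
        elif event.op == "delete":
            result.pop(event.key, None)     # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown op: {event.op!r}")
    return result


# Example data (made up for illustration).
base = {1: {"id": 1, "city": "Beijing"}, 2: {"id": 2, "city": "Shanghai"}}
changes = [
    ChangeEvent("upsert", 2, {"id": 2, "city": "Shenzhen"}),
    ChangeEvent("delete", 1),
    ChangeEvent("upsert", 3, {"id": 3, "city": "Chengdu"}),
]
merged = apply_changes(base, changes)
```

Applying events in order by primary key is what lets a batch-loaded table absorb real-time changes without a full reload; a production system would also have to handle ordering across partitions and schema changes, which this sketch ignores.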

3. Streaming Data Processing – Explains incremental ETL production using Flink SQL, the need for Flink's SQL capabilities to match Spark's, and the design of a table format that supports upsert/delete and both batch and incremental reads.
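The idea of one table format serving both batch and incremental reads can be sketched with an append-only commit log. This is a toy model under stated assumptions, not the format described in the talk: the `VersionedTable` class, its method names, and the `(op, key, row)` change layout are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# A change is (op, key, row); op is "upsert" or "delete". Hypothetical layout.
Change = Tuple[str, Any, Optional[Dict[str, Any]]]


class VersionedTable:
    """Toy table: an append-only list of commits, each a batch of changes."""

    def __init__(self) -> None:
        self._commits: List[List[Change]] = []

    def commit(self, changes: List[Change]) -> int:
        """Append one commit and return its version number (starting at 1)."""
        self._commits.append(list(changes))
        return len(self._commits)

    def read_batch(self, version: Optional[int] = None) -> Dict[Any, Any]:
        """Batch read: replay commits up to `version` into a keyed snapshot."""
        upto = len(self._commits) if version is None else version
        state: Dict[Any, Any] = {}
        for commit in self._commits[:upto]:
            for op, key, row in commit:
                if op == "upsert":
                    state[key] = row
                elif op == "delete":
                    state.pop(key, None)
        return state

    def read_incremental(self, after_version: int) -> List[Change]:
        """Incremental read: only the changes committed after `after_version`."""
        return [c for commit in self._commits[after_version:] for c in commit]


# Example data (made up for illustration).
table = VersionedTable()
v1 = table.commit([("upsert", "o1", {"amount": 10}),
                   ("upsert", "o2", {"amount": 20})])
v2 = table.commit([("upsert", "o1", {"amount": 15}),
                   ("delete", "o2", None)])

full_view = table.read_batch()          # state as of the latest commit
delta = table.read_incremental(v1)      # changes since version v1
```

A downstream incremental job would remember the last version it consumed and call `read_incremental` with it, while a batch job calls `read_batch`; both see the same underlying commits, which is the property the table format is meant to provide.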

4. Real‑Time Data Warehouse Model and Platform – Details the layered platform (resource, storage, engine, SQL, platform, application), highlights UDF support and exclusive use of Flink streaming, and introduces a Web IDE for SQL modeling and ETL development.

5. Streaming OLAP Applications – Covers heterogeneous source synchronization (DataX and Flink‑based architectures), the advantages of Flink for scalability and unified source/sink handling, and the design of Flink‑based OLAP production platforms with resource, model, task, and permission management.
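The "unified source/sink handling" advantage can be illustrated with a small abstraction in which every system is wrapped behind the same two interfaces, so one sync routine serves any pair of systems. This is a sketch of the general pattern, not Flink's connector API or Meituan's platform: `Source`, `Sink`, `ListSource`, `MemorySink`, and `sync` are hypothetical names, and in-memory lists stand in for real databases.

```python
from typing import Any, Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, Any]


class Source(Protocol):
    def read(self) -> Iterator[Record]: ...


class Sink(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...


class ListSource:
    """Stand-in for any upstream system; here just an in-memory list."""

    def __init__(self, rows: List[Record]) -> None:
        self._rows = rows

    def read(self) -> Iterator[Record]:
        return iter(self._rows)


class MemorySink:
    """Stand-in for any downstream store; collects rows in a list."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before  # number of rows written by this call


def sync(source: Source, sink: Sink, batch_size: int = 2) -> int:
    """Copy records from source to sink in batches; return the total written."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total = 0
    batch: List[Record] = []
    for record in source.read():
        batch.append(record)
        if len(batch) >= batch_size:
            total += sink.write(batch)
            batch = []
    if batch:  # flush the final partial batch
        total += sink.write(batch)
    return total


# Example data (made up for illustration).
src = ListSource([{"id": i} for i in range(5)])
dst = MemorySink()
written = sync(src, dst, batch_size=2)
```

With N sources and M sinks, this pattern needs N + M adapters rather than N × M point-to-point jobs, since the sync routine itself never changes.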

6. Future Planning – Aims for a unified stream‑batch processing paradigm where both real‑time and incremental pipelines are handled by Flink, achieving true stream‑batch integration.

Tags: big data, Flink, real-time analytics, streaming, data warehouse, incremental processing