Flink-Driven Incremental Data Warehouse Production at Meituan: Architecture, Streaming Integration, and Future Plans

This article presents how Meituan uses Flink for incremental data warehouse production, covering the warehouse architecture, the evolution of streaming data integration, real‑time OLAP applications, platform design, and future directions for unified stream‑batch processing.

DataFunTalk

Feb 15, 2021

The presentation, delivered by Meituan's real‑time computing lead, outlines how Flink supports incremental production in Meituan's data warehouse. It covers four topics: the incremental production model, streaming data integration, streaming data processing, and streaming OLAP applications.

1. Data Warehouse Incremental Production – Describes Meituan's "three horizontal, four vertical" warehouse architecture, in which three cross‑cutting concerns (metadata, lineage, and data security) span four stages: integration, processing, consumption, and application.

2. Streaming Data Integration – Traces three generations of integration: V1.0, batch loading; V2.0, real‑time binlog capture through Kafka; and V3.0, the HIDI architecture, which adds upsert/delete support, small‑file compaction, and schema management.

3. Streaming Data Processing – Explains incremental ETL production with Flink SQL, why Flink's SQL capabilities need to match Spark's, and the design of a table format that supports upsert/delete and serves both batch and incremental reads.

4. Real‑Time Data Warehouse Model and Platform – Details the layered platform (resource, storage, engine, SQL, platform, and application layers), highlights UDF support and the exclusive use of Flink for stream processing, and introduces a Web IDE for SQL modeling and ETL development.

5. Streaming OLAP Applications – Covers synchronization from heterogeneous sources (DataX‑based and Flink‑based architectures), Flink's advantages in scalability and unified source/sink handling, and the design of Flink‑based OLAP production platforms with resource, model, task, and permission management.

6. Future Planning – Aims for true stream‑batch integration, with Flink handling both real‑time and incremental pipelines under a single processing paradigm.
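
Items 2 and 3 both rely on a table that accepts upserts and deletes keyed by primary key, tolerates many small commits, and serves both batch and incremental reads. The Python sketch below illustrates that idea as a toy merge‑on‑read table. It is not HIDI or Meituan's actual table format, and it does not use Flink. Every name in it (`ChangeEvent`, `from_binlog`, `MergeOnReadTable`, `commit`, `snapshot_read`, `incremental_read`, `compact`) is hypothetical, and the binlog record types INSERT/UPDATE/DELETE are assumed for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical change record: op is "upsert" or "delete", key is the primary
# key, row holds the new column values (None for deletes).
@dataclass
class ChangeEvent:
    op: str
    key: str
    row: Optional[dict] = None


def from_binlog(op_type: str, key: str, after: Optional[dict]) -> ChangeEvent:
    """Map an assumed binlog record type to a change event.
    INSERT and UPDATE both become upserts keyed by primary key."""
    if op_type in ("INSERT", "UPDATE"):
        return ChangeEvent("upsert", key, after)
    if op_type == "DELETE":
        return ChangeEvent("delete", key)
    raise ValueError(f"unknown binlog op: {op_type}")


class MergeOnReadTable:
    """Toy merge-on-read table: every commit appends one small delta "file";
    reads replay the files in commit order, so the last write per key wins."""

    def __init__(self) -> None:
        self.files: List[Tuple[int, List[ChangeEvent]]] = []  # (version, events)
        self.version = 0

    def commit(self, events: List[ChangeEvent]) -> int:
        # Validate the whole batch first so a bad commit changes nothing.
        for e in events:
            if e.op not in ("upsert", "delete"):
                raise ValueError(f"unsupported op: {e.op}")
        self.version += 1
        self.files.append((self.version, list(events)))
        return self.version

    def snapshot_read(self) -> Dict[str, dict]:
        """Batch read: the latest row for every live key."""
        state: Dict[str, dict] = {}
        for _, events in self.files:
            for e in events:
                if e.op == "upsert":
                    state[e.key] = e.row
                else:
                    state.pop(e.key, None)
        return state

    def incremental_read(self, since_version: int) -> List[ChangeEvent]:
        """Incremental read: only the changes committed after since_version."""
        return [e for v, events in self.files if v > since_version for e in events]

    def compact(self) -> None:
        """Rewrite all delta files as one file holding the current state.
        Simplification: change history is discarded, so a reader that has not
        yet consumed the latest version can no longer see the deletes."""
        merged = [ChangeEvent("upsert", k, row) for k, row in self.snapshot_read().items()]
        self.files = [(self.version, merged)] if merged else []


if __name__ == "__main__":
    table = MergeOnReadTable()
    v1 = table.commit([from_binlog("INSERT", "o1", {"amount": 10}),
                       from_binlog("INSERT", "o2", {"amount": 20})])
    table.commit([from_binlog("UPDATE", "o1", {"amount": 15}),
                  from_binlog("DELETE", "o2", None)])
    print(table.snapshot_read())            # {'o1': {'amount': 15}}
    print(len(table.incremental_read(v1)))  # 2
    table.compact()
    print(len(table.files))                 # 1
```

A merge‑on‑read layout keeps writes cheap, since each commit only appends a delta, and moves the merge cost to reads. That is why periodic small‑file compaction matters: it bounds how many files a batch reader must replay. The history loss in `compact` is a deliberate simplification; production table formats typically retain older versions for a period so lagging incremental readers can catch up.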

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
