
Near‑Real‑Time Data Lake Practices in TikTok E‑commerce Data Warehouse

The presentation by TikTok e‑commerce data‑warehouse engineer Ma Wenyuan explains data‑lake characteristics, near‑real‑time architecture, and practical e‑commerce use cases, highlighting Apache Hudi features, hybrid batch‑stream processing, and future challenges for scaling and integration.

DataFunTalk

Speaker: Ma Wenyuan, TikTok E‑commerce Real‑Time Data Warehouse team.

Overview: The talk introduces data‑lake technology characteristics, near‑real‑time architecture, and practical applications in e‑commerce data warehousing.

Data Lake Features: A data lake stores massive volumes of raw data at low cost and applies schema‑on‑read. Apache Hudi adds streaming primitives, upserts, and indexing on top, and offers both read‑optimized and real‑time query modes.
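The difference between the read‑optimized and real‑time query modes can be illustrated with a toy model. The sketch below is not the Hudi API: the class `ToyMergeOnReadTable` and its methods are hypothetical names, and it assumes a merge‑on‑read style layout in which upserts are appended to a log and folded into a compacted base later.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMergeOnReadTable:
    """Toy model of a merge-on-read table keyed by record key (not the Hudi API)."""

    base: dict = field(default_factory=dict)  # compacted snapshot
    log: list = field(default_factory=list)   # pending upserts as (key, row) pairs

    def upsert(self, key, row):
        # Writes are appended to the log; the base is untouched until compaction.
        self.log.append((key, row))

    def read_optimized(self):
        # Reads only compacted data: cheap, but may lag behind recent writes.
        return dict(self.base)

    def real_time(self):
        # Merges the base with pending log entries; later entries win.
        merged = dict(self.base)
        for key, row in self.log:
            merged[key] = row
        return merged

    def compact(self):
        # Folds the log into the base so both views agree again.
        self.base = self.real_time()
        self.log.clear()


# Hypothetical order records, used only for illustration.
table = ToyMergeOnReadTable(base={"order-1": {"status": "created"}})
table.upsert("order-1", {"status": "paid"})
table.upsert("order-2", {"status": "created"})
```

In this model, `table.read_optimized()` still returns only the original `order-1` row, while `table.real_time()` reflects both upserts. After `table.compact()` the two views agree, which is the trade‑off between query cost and freshness that the two modes expose.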

Near‑Real‑Time Architecture: Describes analysis‑oriented and operation‑oriented scenarios, their need for high efficiency and low storage cost, and how a data lake enables reuse of batch results, unified storage, and simplified computation pipelines.

Implementation: Presents a hybrid architecture that combines batch and streaming, leveraging Flink, Spark, Presto, and Hudi to achieve hourly incremental updates while preserving the richness of batch‑computed data.
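One way to picture the hourly incremental update is as a pull of only the commits newer than a checkpoint, folded into an existing batch result. The sketch below is a simplified illustration in plain Python, not the pipeline from the talk: the function names, the per‑shop totals, and the commit‑time strings are all hypothetical, and it assumes commit times are fixed‑width timestamp strings that sort lexicographically.

```python
def incremental_pull(commits, since):
    """Return rows from commits strictly newer than the `since` commit time."""
    return [
        row
        for commit_time, rows in commits
        if commit_time > since
        for row in rows
    ]


def apply_increment(batch_totals, changed_rows):
    """Fold changed rows into a copy of the batch totals (per-shop amounts here)."""
    totals = dict(batch_totals)
    for row in changed_rows:
        totals[row["shop"]] = totals.get(row["shop"], 0) + row["amount"]
    return totals


# Hypothetical data: a batch result plus two hourly commits.
batch_totals = {"shop-a": 100, "shop-b": 40}
commits = [
    ("2024010100", [{"shop": "shop-a", "amount": 5}]),  # already in the batch
    ("2024010101", [{"shop": "shop-a", "amount": 10},
                    {"shop": "shop-c", "amount": 7}]),
]
checkpoint = "2024010100"  # last commit the batch result already covers

fresh = apply_increment(batch_totals, incremental_pull(commits, checkpoint))
```

Only the second commit is read, so the work per hour scales with the size of the change rather than the size of the table, while the batch result is reused as the starting point.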

E‑commerce Use Cases: Includes marketing promotion analytics, traffic diagnosis, logistics monitoring, risk governance, and operational monitoring such as data product anomaly detection and real‑time message landing.

Future Challenges: Calls for higher performance, deeper integration with Flink/Spark, and a shift from analytical to product‑level near‑real‑time applications.

Conclusion: Data lake technology proves feasible for near‑real‑time scenarios in TikTok e‑commerce, with ongoing plans to address scalability and reliability.

Tags: Big Data, streaming, data warehouse, data lake, Hudi, TikTok, near real-time
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
