Apache Hudi Asia Technical Salon Highlights: Practices and Innovations from Kuaishou, Meituan, Douyin, Huawei, and JD
The Apache Hudi Asia technical salon held in Beijing on March 29 gathered over 230 on‑site participants and 16,000 online viewers, featuring expert talks from leading Chinese tech companies that showcased real‑world Hudi implementations, performance optimizations, and the future roadmap for data‑lake technologies.
On March 29, Kuaishou hosted the first Apache Hudi Asia technical salon in Beijing, attracting over 230 on‑site participants and more than 16,000 online viewers. Speakers from Kuaishou, Meituan, Douyin, Huawei, JD and others shared best practices and case studies of Hudi in data‑lake scenarios.
Vinoth Chandar, CEO of Onehouse and Hudi PMC Chair, praised Chinese developers for contributing more than half of the project's code and core features. He announced the 1.0 release, which brings storage‑format optimizations, a redesigned index, enhanced concurrency control, deep Flink integration, and incremental processing, and previewed features planned for 1.1 and 1.2.
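Incremental processing, one of the headline 1.0 capabilities, lets downstream jobs consume only the records committed after a given instant rather than rescanning the whole table. A minimal sketch of how such a read is configured through Hudi's documented Spark datasource options (the table path and commit timestamp below are illustrative, not from the talk):

```python
# Sketch: configuring a Hudi incremental read for Spark.
# Option keys follow Hudi's documented Spark datasource API;
# the table path and commit time in the usage note are illustrative only.

def incremental_read_options(begin_instant: str) -> dict:
    """Options restricting a Hudi read to commits after begin_instant."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }

# Usage (requires a Spark session with the Hudi bundle on the classpath):
# df = (spark.read.format("hudi")
#       .options(**incremental_read_options("20250329000000"))
#       .load("/data/lake/orders"))

if __name__ == "__main__":
    opts = incremental_read_options("20250329000000")
    print(opts["hoodie.datasource.query.type"])  # incremental
```

Jobs chained this way form the incremental pipelines several of the talks below build on: each stage reads only the delta since its last processed instant instead of the full snapshot.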
Wang Jing, head of Kuaishou’s data platform, described the company’s AI+Data strategy, a 10 EB storage foundation and the use of Spark, Flink, ClickHouse, Doris and Hudi, emphasizing three value pillars: cost‑effective stream‑batch storage, real‑time lake ingestion, and wide‑table model simplification.
The salon featured five technical talks:
01 – Apache Hudi in Kuaishou's AI and BI: vectorized pipelines, real‑time subscription, logical wide tables, EB‑scale storage, and sub‑30 s latency, supporting recommendation, advertising, and search.
02 – Meituan's incremental lake Beluga: a "one table, three modes" design, row‑based HFile writes, column‑based Parquet reads, meta‑server management, Flink/Spark/Presto integration, minute‑level freshness, and lower cost.
03 – Douyin's Sample Center: overcoming Kafka‑based storage limits with Hudi, a unified lake architecture, Flink‑driven real‑time sample‑label stitching, PB‑scale daily processing, vectorized engines, and a low‑cost data lake for recommendation.
04 – Huawei Cloud's large‑scale Hudi practice: performance optimizations for ODS writes, delta‑log reading, and CDC ingestion; column‑family sparse wide tables; and the LDMS management service, which automates lifecycle and metadata handling.
05 – JD's data‑and‑compute paradigm: a multi‑model lake‑table architecture, buffered‑plus‑persistent storage, binary‑stream copy to cut serialization overhead, and the DataBus/EasyStudio toolchain, achieving 2× write speed, a 70% CP reduction, and sub‑5‑minute data freshness.
The closing remarks highlighted Hudi's rapid growth as a core technology for data‑driven business and the challenges ahead in 2025.
DataFunSummit
The official account of the DataFun community, dedicated to sharing news and speaker talks from big‑data and AI industry summits, with regularly released downloadable resource packs.