Tag

Bucket Index

DataFunTalk
Nov 5, 2022 · Big Data

Evolution of ByteDance Data Lake Indexing: Hudi Index Enhancements and Future Directions

This article traces how ByteDance's data lake indexing has evolved on Apache Hudi, covering the challenges of traditional updates, Hudi's index mechanisms, the introduction of bucket and extensible hash indexes, query optimizations, and planned multi‑modal and range indexes.

Bucket Index · Extensible Hash · Hudi
12 min read
ByteDance Data Platform
Feb 28, 2022 · Big Data

How Hudi’s New Bucket Index Boosts Upsert Performance in Massive Data Lakes

This article explains the background, design, and practical benefits of Hudi's Bucket Index—a hash‑based indexing mechanism that reduces unnecessary file reads and writes, improves upsert speed on terabyte‑scale datasets, and enables query optimizations such as bucket pruning and bucket join.

Bucket Index · Hash Index · Hudi
16 min read
Big Data Technology Architecture
Nov 2, 2021 · Big Data

ByteLake: ByteDance’s Real‑Time Data Lake Platform Built on Apache Hudi

This article introduces ByteLake, ByteDance’s real‑time data lake platform built on Apache Hudi. It covers Hudi fundamentals, ByteLake’s use cases, the platform’s architectural optimizations, new features such as a commit‑based metastore and bucket indexing, and the future roadmap.

Apache Hudi · Bucket Index · ByteLake
10 min read