Apache Hudi Concurrency Control: Overview, MVCC, and OCC
This article provides a comprehensive overview of concurrency control in Apache Hudi, explaining ACID properties, the role of MVCC and OCC, and how Hudi coordinates multiple writers and table services to achieve serializable scheduling while maintaining high performance.
Overview – Every commit to a Hudi table forms a transaction, whether it adds new data or runs a table service job. Concurrency control coordinates simultaneous transactions to ensure correctness, consistency, and high performance.
ACID recap – Atomicity, Consistency, Isolation, and Durability are briefly described to illustrate why isolation is essential for preventing dirty reads, lost updates, and other anomalies.
MVCC in Hudi – Hudi uses Multi‑Version Concurrency Control (MVCC) by maintaining a timeline of monotonically increasing commit timestamps and file slices that represent record versions. The TableFileSystemView API provides the latest view of the table, enabling writers and readers to operate without locking and ensuring read‑write isolation.
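The versioning idea above can be sketched with a toy model. This is not Hudi's API: `ToyTable`, `commit`, and `snapshot` are hypothetical names, commit instants are plain integers, and each file slice is reduced to a single payload string. It only illustrates how a reader pinned to an older instant keeps a stable view while a writer adds a newer file slice.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table with a commit timeline."""

    # Completed commit instants in increasing order (the "timeline").
    timeline: list = field(default_factory=list)
    # file group id -> list of (commit instant, payload), one entry per file slice.
    file_groups: dict = field(default_factory=dict)

    def commit(self, instant, writes):
        """Add a new file slice for every file group in `writes`."""
        if self.timeline and instant <= self.timeline[-1]:
            raise ValueError("commit instants must increase monotonically")
        for group_id, payload in writes.items():
            self.file_groups.setdefault(group_id, []).append((instant, payload))
        # The commit becomes visible only once it is on the timeline.
        self.timeline.append(instant)

    def snapshot(self, as_of=None):
        """Return the latest file slice per file group at or before `as_of`."""
        if not self.timeline:
            return {}
        if as_of is None:
            as_of = self.timeline[-1]
        view = {}
        for group_id, slices in self.file_groups.items():
            visible = [payload for instant, payload in slices if instant <= as_of]
            if visible:
                view[group_id] = visible[-1]
        return view


table = ToyTable()
table.commit(1, {"fg-1": "a1", "fg-2": "b1"})
reader_instant = table.timeline[-1]   # a reader pins its view at instant 1
table.commit(2, {"fg-1": "a2"})       # a writer adds a newer slice for fg-1

old_view = table.snapshot(as_of=reader_instant)
new_view = table.snapshot()
```

The pinned reader still sees `a1` for `fg-1`, while a fresh snapshot sees `a2`; neither side takes a lock, because older file slices are never modified in place.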
OCC in Hudi – Optimistic Concurrency Control (OCC) follows a three‑phase protocol: read, validate, and write. Hudi implements OCC at the file level to support multiple writers, so two concurrent writers conflict only when they modify the same files. Users enable it by setting hoodie.write.concurrency.mode=OPTIMISTIC_CONCURRENCY_CONTROL and configuring a lock provider (e.g., ZooKeeper, Hive Metastore, DynamoDB).
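As a rough illustration, the settings above might be collected into a write-options dictionary for a Spark job. The option keys and the ZooKeeper lock provider class follow the names used in the Hudi 0.x documentation and should be checked against the release in use (the failed-writes cleaning key in particular has been renamed in newer releases). The host, port, lock key, and base path are placeholders, not values from the article.

```python
# Sketch of multi-writer options; key names assumed from the Hudi 0.x docs.
occ_options = {
    # Switch from the default single-writer mode to OCC.
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    # Failed writes are cleaned up lazily so one writer does not roll back
    # another writer's in-flight commit.
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    # External lock provider; ZooKeeper is one of several choices.
    "hoodie.write.lock.provider":
        "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
    # Placeholder connection details, replace with real values.
    "hoodie.write.lock.zookeeper.url": "zk-host.example.internal",
    "hoodie.write.lock.zookeeper.port": "2181",
    "hoodie.write.lock.zookeeper.lock_key": "example_table",
    "hoodie.write.lock.zookeeper.base_path": "/hudi/locks",
}
```

Every writer to the same table would pass the same lock settings, for example via `df.write.format("hudi").options(**occ_options)` alongside its usual table options.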
The article illustrates OCC with a detailed example of two write clients acquiring locks, checking the timeline, and handling conflicts, showing how conflicting writes are aborted and rolled back to preserve atomicity.
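The two-writer scenario can be approximated with a small simulation. It is a sketch under simplifying assumptions, not Hudi's implementation: `ToyOccTable`, `begin`, and `try_commit` are hypothetical names, a `threading.Lock` stands in for the external lock provider, and "rollback" is modeled by simply never recording the aborted writer's changes.

```python
import threading


class WriteConflict(Exception):
    """Raised when a writer's files overlap with a commit it did not see."""


class ToyOccTable:
    def __init__(self):
        self._lock = threading.Lock()   # stand-in for an external lock provider
        self.timeline = []              # list of (instant, touched file groups)
        self._next_instant = 1

    def begin(self):
        """Read phase: remember the latest completed instant."""
        return self.timeline[-1][0] if self.timeline else 0

    def try_commit(self, writer, start_instant, touched):
        """Validate and write phases, both performed under the lock."""
        touched = frozenset(touched)
        with self._lock:
            # Validate: look at commits that completed after this writer began.
            for instant, groups in self.timeline:
                overlap = groups & touched
                if instant > start_instant and overlap:
                    raise WriteConflict(
                        f"{writer} conflicts with commit {instant} on {sorted(overlap)}"
                    )
            # Write: publish the commit on the timeline.
            instant = self._next_instant
            self._next_instant += 1
            self.timeline.append((instant, touched))
            return instant


table = ToyOccTable()
start_1 = table.begin()   # both writers start from the same timeline state
start_2 = table.begin()

commit_1 = table.try_commit("writer-1", start_1, {"fg-1", "fg-2"})

try:
    table.try_commit("writer-2", start_2, {"fg-2", "fg-3"})
    writer_2_aborted = False
except WriteConflict:
    writer_2_aborted = True   # writer-2's changes are discarded as a whole

# After re-reading the timeline, the retry validates cleanly.
retry_commit = table.try_commit("writer-2", table.begin(), {"fg-2", "fg-3"})
```

Writer 1 commits first, so writer 2 fails validation because both touched `fg-2`. Had writer 2 touched only `fg-3`, validation would have passed, which is the benefit of detecting conflicts at file granularity rather than table granularity.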
Summary – The article recaps the importance of MVCC and OCC for handling single‑writer/table‑service and multi‑writer scenarios in Hudi, and invites readers to join the Apache Hudi community for further discussion.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.