
Apache Hudi Concurrency Control: Overview, MVCC, and OCC

This article gives an overview of concurrency control in Apache Hudi. It reviews the ACID properties, explains the roles of MVCC and OCC, and describes how Hudi coordinates multiple writers and table services so that their operations remain serializable while performance stays high.

DataFunSummit

Overview – Every commit to a Hudi table is a transaction, whether it writes new data or runs a table service job such as compaction, clustering, or cleaning. Concurrency control coordinates transactions that run at the same time so that the table stays correct and consistent without sacrificing performance.

ACID recap – The article briefly reviews Atomicity, Consistency, Isolation, and Durability to show why isolation is essential: without it, concurrent transactions can produce dirty reads, lost updates, and other anomalies.

MVCC in Hudi – Hudi uses Multi‑Version Concurrency Control (MVCC). It maintains a timeline of commits with monotonically increasing timestamps and stores data as file slices, each of which represents one version of the records in a file group. The TableFileSystemView API exposes the latest view of the table, so writers and readers can work without locking each other, which gives read‑write isolation.
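
To make the timeline and file-slice idea concrete, here is a toy model in Python. It is a hypothetical sketch, not Hudi's API. Timeline, FileGroup, and latest_slice are invented names, and FileSlice borrows Hudi's term only loosely. Instant times are fixed-width strings, so comparing them as text preserves time order. A reader pins a snapshot to a completed instant and never sees the slice of an in-flight commit, so it needs no lock to stay isolated from the writer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Timeline:
    # Hypothetical toy timeline: instant time -> state ("INFLIGHT" or "COMPLETED").
    instants: Dict[str, str] = field(default_factory=dict)

    def start(self, instant: str) -> None:
        # Instant times must increase monotonically.
        assert all(instant > t for t in self.instants), "instant times must increase"
        self.instants[instant] = "INFLIGHT"

    def complete(self, instant: str) -> None:
        self.instants[instant] = "COMPLETED"

    def completed(self) -> List[str]:
        return sorted(t for t, s in self.instants.items() if s == "COMPLETED")


@dataclass
class FileSlice:
    # One version of a file group, produced by the commit at `base_instant`.
    base_instant: str
    records: Dict[str, str]


@dataclass
class FileGroup:
    file_id: str
    slices: List[FileSlice] = field(default_factory=list)


def latest_slice(group: FileGroup, timeline: Timeline, as_of: str) -> Optional[FileSlice]:
    # Only slices from completed instants at or before the snapshot are visible,
    # so in-flight writes stay hidden from the reader without any locking.
    visible = {t for t in timeline.completed() if t <= as_of}
    candidates = [s for s in group.slices if s.base_instant in visible]
    return max(candidates, key=lambda s: s.base_instant, default=None)


timeline = Timeline()
fg = FileGroup("fg-1")

timeline.start("20240101000000000")
fg.slices.append(FileSlice("20240101000000000", {"k1": "v1"}))
timeline.complete("20240101000000000")

# A writer produces a new version of the file group but has not committed yet.
timeline.start("20240102000000000")
fg.slices.append(FileSlice("20240102000000000", {"k1": "v2"}))

snapshot = timeline.completed()[-1]  # reader pins the latest completed instant
print(latest_slice(fg, timeline, snapshot).records)  # {'k1': 'v1'}

timeline.complete("20240102000000000")
print(latest_slice(fg, timeline, timeline.completed()[-1]).records)  # {'k1': 'v2'}
print(latest_slice(fg, timeline, snapshot).records)  # {'k1': 'v1'}
```

The last line shows that a reader who pinned the earlier instant keeps seeing v1 even after the second commit completes. That stable view is the snapshot isolation MVCC is meant to provide.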

OCC in Hudi – Optimistic Concurrency Control (OCC) follows a three‑phase protocol: read, validate, and write. Hudi implements OCC with file‑level conflict detection to support multiple writers. To enable it, set hoodie.write.concurrency.mode=OPTIMISTIC_CONCURRENCY_CONTROL and configure a lock provider (e.g., ZooKeeper, Hive Metastore, or DynamoDB).
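
A minimal sketch of the writer options follows. It assumes a PySpark job and the ZooKeeper-based lock provider. The ZooKeeper host, port, lock key, and base path are placeholders. Besides the concurrency mode and the lock provider, Hudi's multi-writer documentation also asks for hoodie.cleaner.policy.failed.writes=LAZY. Other lock providers use their own class names and properties, so check the names against the configuration reference for your Hudi version.

```python
# Writer options for multi-writer OCC with a ZooKeeper lock provider.
# The ZooKeeper URL, port, lock key, and base path are placeholder values.
hudi_occ_options = {
    "hoodie.write.concurrency.mode": "OPTIMISTIC_CONCURRENCY_CONTROL",
    # Clean up failed writes lazily so one writer does not roll back another's in-flight commit.
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    "hoodie.write.lock.provider": "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
    "hoodie.write.lock.zookeeper.url": "zk1.example.internal",
    "hoodie.write.lock.zookeeper.port": "2181",
    "hoodie.write.lock.zookeeper.lock_key": "orders",
    "hoodie.write.lock.zookeeper.base_path": "/hudi/locks",
}

for key, value in sorted(hudi_occ_options.items()):
    print(f"{key}={value}")
```

In a PySpark job these options would typically be merged with the table's usual settings (table name, record key, and so on). They would then be passed as df.write.format("hudi").options(**hudi_occ_options).mode("append").save(base_path), where df and base_path are your own DataFrame and table path.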

The article illustrates OCC with a detailed example in which two write clients acquire the lock, check the timeline, and handle conflicts. It shows how a conflicting write is aborted and rolled back to preserve atomicity.
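
The flow in that example can be approximated with a small, self-contained simulation. This is a hypothetical sketch, not Hudi's write client. A threading.Lock stands in for the external lock provider, and the list of completed commits stands in for the timeline. A conflict is declared when a commit that completed after a writer began touched any of the same file groups. The names Table, Writer, Commit, and ConflictError are invented for illustration, and Hudi's real conflict-resolution logic differs in detail.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Set


class ConflictError(Exception):
    """Raised when validation finds an overlapping concurrent commit."""


@dataclass
class Commit:
    writer: str
    file_ids: Set[str]


@dataclass
class Table:
    # Completed commits in commit order: a stand-in for the timeline.
    commits: List[Commit] = field(default_factory=list)
    # Stand-in for an external lock provider such as ZooKeeper.
    lock: threading.Lock = field(default_factory=threading.Lock)


@dataclass
class Writer:
    name: str
    file_ids: Set[str]
    seen: int = 0  # number of completed commits visible when this write began

    def begin(self, table: Table) -> None:
        # Read phase: note the timeline position; files are written without the lock.
        self.seen = len(table.commits)

    def commit(self, table: Table) -> None:
        # Validate and write phases are serialized by the lock.
        with table.lock:
            for other in table.commits[self.seen:]:
                overlap = self.file_ids & other.file_ids
                if overlap:
                    # Abort; a real writer would also roll back the files it produced.
                    raise ConflictError(
                        f"{self.name} conflicts with {other.writer} on {sorted(overlap)}"
                    )
            table.commits.append(Commit(self.name, set(self.file_ids)))


table = Table()
a = Writer("writer-A", {"fg-1", "fg-2"})
b = Writer("writer-B", {"fg-2", "fg-3"})
c = Writer("writer-C", {"fg-4"})
for w in (a, b, c):
    w.begin(table)  # all three writers start from the same snapshot

a.commit(table)
try:
    b.commit(table)
except ConflictError as err:
    print("aborted:", err)  # aborted: writer-B conflicts with writer-A on ['fg-2']
c.commit(table)
print([cm.writer for cm in table.commits])  # ['writer-A', 'writer-C']
```

Writers A and C succeed because they touch disjoint file groups. B is aborted because A committed a change to fg-2 after B had started. The lock is held only for the short validate-and-commit step rather than for the whole write, and that is what makes the scheme optimistic.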

Summary – The article recaps why MVCC and OCC matter. MVCC handles a single writer running alongside table services, and OCC handles multiple concurrent writers. The article closes by inviting readers to join the Apache Hudi community for further discussion.

Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
