Delta Lake 3.1: New Features, Metadata Optimization, and Universal Format Overview
This article introduces Delta Lake 3.1, detailing its release background, the addition of Deletion Vector to Update and Merge commands, metadata‑driven count/min/max optimizations, the Universal Format for cross‑engine compatibility, and a comparative evaluation with Iceberg and Hudi.
Delta Lake has evolved from the 2.x series to the 3.x series: the major 3.0 release was followed by 3.1, the newest version, which brings several performance and compatibility enhancements. This article walks through those 3.1 features.
The content is organized into three main parts: (1) an introduction to Delta Lake release versions, (2) the new features of Delta Lake 3.1, and (3) an evaluation and comparison with other lake formats.
Delta Lake 3.1 New Features
Deletion Vector support now covers the DELETE, UPDATE and MERGE commands, accelerating these operations.
Count/Min/Max queries are optimized through metadata look‑ups, avoiding full data scans.
Universal Format (UniForm) support allows Delta writes to simultaneously generate Iceberg and Hudi metadata, enabling cross‑engine data access.
1. Deletion Vector
Before Deletion Vector, updating a single row required rewriting the entire data file that contained it. With Deletion Vector, only the changed rows are written to a new data file, and a deletion vector marks the old rows in the existing file as removed. This greatly reduces the amount of data written while keeping read performance stable.
Delta Lake records the deletion vector in the table metadata, so readers can filter out deleted rows without extra data reads. This is a key reason why Delta Lake's read performance is not significantly impacted.
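The difference in write volume can be illustrated with a small, self-contained Python sketch. This is a toy in-memory model, not Delta Lake's implementation: the `Table` and `DataFile` classes, the two update functions, and the `rows_written` counter are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """A toy immutable data file: a name plus a list of (id, value) rows."""
    name: str
    rows: list


@dataclass
class Table:
    """A toy table: data files plus one deletion vector per file."""
    files: list = field(default_factory=list)
    deletion_vectors: dict = field(default_factory=dict)  # file name -> set of row positions
    rows_written: int = 0  # rows physically written, used to compare strategies

    def add_file(self, name, rows):
        self.files.append(DataFile(name, list(rows)))
        self.rows_written += len(rows)

    def scan(self):
        """Yield all live rows, skipping positions marked as deleted."""
        for f in self.files:
            dead = self.deletion_vectors.get(f.name, set())
            for pos, row in enumerate(f.rows):
                if pos not in dead:
                    yield row


def update_copy_on_write(table, row_id, new_value):
    """Without deletion vectors: rewrite every file containing a matching row."""
    for i, f in enumerate(table.files):
        if any(rid == row_id for rid, _ in f.rows):
            rewritten = [(rid, new_value if rid == row_id else v) for rid, v in f.rows]
            table.files[i] = DataFile(f.name + "-rewritten", rewritten)
            table.rows_written += len(rewritten)


def update_with_deletion_vector(table, row_id, new_value):
    """With deletion vectors: mark old rows deleted, write only the changed rows."""
    changed = []
    for f in list(table.files):
        dead = table.deletion_vectors.setdefault(f.name, set())
        for pos, (rid, _) in enumerate(f.rows):
            if rid == row_id and pos not in dead:
                dead.add(pos)
                changed.append((rid, new_value))
    if changed:
        table.add_file(f"update-{len(table.files)}", changed)


def make_table():
    t = Table()
    t.add_file("part-0", [(i, "old") for i in range(1000)])
    t.rows_written = 0  # count only rows written by the update itself
    return t


cow, dv = make_table(), make_table()
update_copy_on_write(cow, 42, "new")
update_with_deletion_vector(dv, 42, "new")
print("copy-on-write rows written:", cow.rows_written)
print("deletion-vector rows written:", dv.rows_written)
```

Both tables return the same rows on a scan, but the copy-on-write path rewrites the whole 1000-row file while the deletion-vector path writes a single row and one bit of bookkeeping. In Delta Lake itself the feature is controlled per table through the `delta.enableDeletionVectors` table property.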
2. Metadata Query Optimization
Traditionally, an aggregate query first lists all data files from the table metadata and then reads each file to compute the result, which is slow for large tables. Delta Lake 3.1 instead relies on the per-file statistics (e.g., row counts, min/max values) kept in the metadata, so count/min/max queries can be answered from metadata alone without reading any data files.
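The idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the engine's actual code: `FileStats` and `aggregate_from_metadata` are hypothetical names, and the model tracks statistics for a single integer column only. It also shows the fallback case: if any non-empty file lacks statistics, the metadata alone cannot give a correct answer and a scan is still needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStats:
    """Per-file statistics for one column, as a log entry might carry them."""
    path: str
    num_records: int
    min_value: Optional[int]
    max_value: Optional[int]


def aggregate_from_metadata(files):
    """Answer COUNT(*), MIN(col), MAX(col) from file stats only.

    Returns None when some non-empty file has no min/max stats,
    signalling that the caller must fall back to scanning data.
    """
    non_empty = [f for f in files if f.num_records > 0]
    if any(f.min_value is None or f.max_value is None for f in non_empty):
        return None
    return {
        "count": sum(f.num_records for f in files),
        "min": min((f.min_value for f in non_empty), default=None),
        "max": max((f.max_value for f in non_empty), default=None),
    }


stats = [
    FileStats("part-0", 100, 0, 99),
    FileStats("part-1", 200, -5, 450),
    FileStats("part-2", 300, 400, 990),
]
result = aggregate_from_metadata(stats)
print(result)

# A file without statistics forces a fallback to a real scan.
incomplete = stats + [FileStats("part-3", 50, None, None)]
print(aggregate_from_metadata(incomplete))
```

The cost of the metadata path grows with the number of files, not the number of rows, which is where the speedup on large tables comes from.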
3. Universal Format
The Universal Format concept unifies data access across different engines by generating Iceberg and Hudi compatible metadata when writing to Delta Lake, enabling the same table to be queried by engines that support any of these formats.
On each write, Delta first commits to its own transaction log, then converts that metadata into Iceberg or Hudi metadata and registers the result with the appropriate catalog (e.g., Hive Metastore). The data files themselves are shared, so different engines can read the same data without additional transformations.
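The commit-then-convert flow can be sketched as a toy Python model. Everything here is an illustrative assumption: the action layout is a simplified stand-in for a transaction log, the "Iceberg-like" manifest is not the real Iceberg metadata format, and the `catalog` dictionary stands in for a metastore. The point is only that the second format's metadata is derived from the first and refers to the same data files.

```python
def live_files_from_log(delta_log):
    """Replay add/remove actions in order to find the current data files."""
    live = {}
    for action in delta_log:
        if "add" in action:
            live[action["add"]["path"]] = action["add"]
        elif "remove" in action:
            live.pop(action["remove"]["path"], None)
    return live


def to_iceberg_like_manifest(live_files, snapshot_id):
    """Build a toy manifest listing the same files (no data is copied)."""
    return {
        "snapshot-id": snapshot_id,
        "data-files": [
            {"file_path": path, "record_count": entry["numRecords"]}
            for path, entry in sorted(live_files.items())
        ],
    }


# Toy transaction log: two inserts, then one file replaced by another.
delta_log = [
    {"add": {"path": "part-a.parquet", "numRecords": 10}},
    {"add": {"path": "part-b.parquet", "numRecords": 20}},
    {"remove": {"path": "part-a.parquet"}},
    {"add": {"path": "part-c.parquet", "numRecords": 5}},
]

# Step 1: the Delta commit is the source of truth.
live = live_files_from_log(delta_log)

# Step 2: derive the second format's metadata and register it in a catalog.
manifest = to_iceberg_like_manifest(live, snapshot_id=len(delta_log))
catalog = {"db.events": {"delta_version": len(delta_log) - 1, "iceberg": manifest}}

print([f["file_path"] for f in manifest["data-files"]])
```

In Delta Lake itself, the formats to generate are chosen per table through the `delta.universalFormat.enabledFormats` table property rather than by calling a conversion step by hand.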
Delta Lake Evaluation and Comparison
In North America, the three dominant lake formats are Delta, Hudi, and Iceberg. All support merge‑on‑read, Z‑ordering, and ACID guarantees, but differ in transaction handling and optimization strategies. Benchmark results show Delta Lake often achieves superior read/write performance, though actual outcomes may vary in production environments.
Given the differences, users should select the format that best matches their specific workload and ecosystem requirements.
Conclusion
The presentation covered the Deletion Vector acceleration, metadata‑driven query optimizations, and the Universal Format that bridges Delta Lake with Iceberg and Hudi, followed by a comparative analysis of the three major lake formats.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.