Delta Lake 3.1: New Features, Metadata Optimization, and Universal Format Overview

This article introduces Delta Lake 3.1, covering its release background, Deletion Vector support in the UPDATE and MERGE commands, metadata-driven COUNT/MIN/MAX optimizations, the Universal Format for cross-engine compatibility, and a comparison with Iceberg and Hudi.

DataFunSummit

Apr 27, 2024

This article shares the latest features of Delta Lake version 3.1.

Delta Lake has evolved from the 2.x series to the 3.x series, with the major 3.0 release followed by the newest 3.1 version, which brings several performance and compatibility enhancements.

The content is organized into three main parts: (1) an introduction to Delta Lake release versions, (2) the new features of Delta Lake 3.1, and (3) an evaluation and comparison with other lake formats.

Delta Lake 3.1 New Features

Deletion Vector support, already available for DELETE, now extends to the UPDATE and MERGE commands, accelerating these operations.

Count/Min/Max queries are optimized through metadata look‑ups, avoiding full data scans.

Universal Format (UniForm) lets a Delta write also generate Iceberg and Hudi metadata, enabling cross-engine data access to the same table.

1. Deletion Vector

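Before Deletion Vector, updating even a single row required rewriting the entire data file that contained it. With Deletion Vector, the existing file is left in place: the positions of the modified rows are recorded in a deletion vector attached to that file, and only the new versions of those rows are written to a new data file. This dramatically speeds up writes while keeping read performance stable, because readers simply skip the marked rows.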

The Deletion Vector implementation records the vector in the table metadata, avoiding extra data reads; this is a key reason why Delta Lake's read performance is not significantly affected.
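To make the write and read paths concrete, here is a minimal pure-Python model of the idea. It is not Delta Lake's implementation or API: it assumes a data file can be represented as an in-memory list of rows and a deletion vector as a plain set of row positions (Delta's real vectors are compressed bitmaps tracked through the transaction log), and the names `DataFile`, `update_with_dv`, and `scan` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical stand-in for an immutable Parquet data file."""
    path: str
    rows: list
    # Hypothetical deletion vector: positions of rows marked as deleted.
    deletion_vector: set = field(default_factory=set)


def update_with_dv(table, predicate, update_fn, new_path):
    """UPDATE without rewriting existing files (hypothetical helper).

    Matching rows are marked in their file's deletion vector, and only
    their updated versions are written to a single new file.
    """
    updated_rows = []
    for f in table:
        for pos, row in enumerate(f.rows):
            if pos not in f.deletion_vector and predicate(row):
                # Simplification: mutate the vector in place. Delta instead
                # records the vector through a new transaction-log commit.
                f.deletion_vector.add(pos)
                updated_rows.append(update_fn(row))
    if updated_rows:
        table.append(DataFile(new_path, updated_rows))
    return table


def scan(table):
    """Read all live rows, skipping positions listed in a deletion vector."""
    return [
        row
        for f in table
        for pos, row in enumerate(f.rows)
        if pos not in f.deletion_vector
    ]


table = [DataFile("part-0.parquet", [{"id": 1, "qty": 5},
                                     {"id": 2, "qty": 7},
                                     {"id": 3, "qty": 9}])]
update_with_dv(table, lambda r: r["id"] == 2, lambda r: {**r, "qty": 70},
               "part-1.parquet")
print(scan(table))
# [{'id': 1, 'qty': 5}, {'id': 3, 'qty': 9}, {'id': 2, 'qty': 70}]
print(table[0].deletion_vector)  # {1}
```

The original file still holds all three rows; the update writes only the new version of row 2, and `scan` hides the stale copy by consulting the vector.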

2. Metadata Query Optimization

Traditionally, an aggregate query first lists all data files from the table metadata and then reads each file to compute the result, which is slow for large tables. Delta Lake records per-file statistics (e.g., row counts, min/max values) in its transaction log, and Delta Lake 3.1 uses them to answer COUNT/MIN/MAX queries from metadata alone, without scanning the data files.
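A sketch of answering these aggregates from statistics alone, assuming simplified `add` actions shaped like those in a Delta commit file, where each action's `stats` field is a JSON string containing `numRecords`, `minValues`, and `maxValues`. The helper `aggregate_from_stats` is hypothetical and skips details a real engine must handle, such as partition columns and truncated string statistics.

```python
import json

# Simplified "add" actions shaped like entries in a Delta commit file.
# Each action's "stats" field is a JSON string of per-file statistics.
ADD_ACTIONS = [
    {"add": {"path": "part-0.parquet",
             "stats": json.dumps({"numRecords": 3,
                                  "minValues": {"qty": 5},
                                  "maxValues": {"qty": 9}})}},
    {"add": {"path": "part-1.parquet",
             "stats": json.dumps({"numRecords": 2,
                                  "minValues": {"qty": 1},
                                  "maxValues": {"qty": 12}})}},
]


def aggregate_from_stats(actions, column):
    """Answer COUNT(*), MIN(column), MAX(column) from file statistics only.

    Returns None when any file lacks the needed statistics, signalling
    that the engine would have to fall back to scanning the data.
    """
    stats = []
    for action in actions:
        if "add" not in action:
            continue
        raw = action["add"].get("stats")
        if raw is None:
            return None  # no statistics recorded for this file
        s = json.loads(raw)
        if ("numRecords" not in s
                or column not in s.get("minValues", {})
                or column not in s.get("maxValues", {})):
            return None  # statistics incomplete for this column
        stats.append(s)
    if not stats:
        return (0, None, None)  # empty table
    count = sum(s["numRecords"] for s in stats)
    return (count,
            min(s["minValues"][column] for s in stats),
            max(s["maxValues"][column] for s in stats))


print(aggregate_from_stats(ADD_ACTIONS, "qty"))  # (5, 1, 12)
```

The sketch ignores deletion vectors; deleted rows would make both the summed count and the min/max bounds inexact, so a real engine has to account for them or fall back to a scan.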

3. Universal Format

The Universal Format concept unifies data access across different engines by generating Iceberg- and Hudi-compatible metadata when writing to Delta Lake, enabling the same table to be queried by engines that support any of these formats.

The process creates Delta’s own transaction log and then converts its metadata to Iceberg or Hudi metadata, submitting the latter to the appropriate catalog (e.g., Hive Metastore) so that different engines can read the same data without additional transformations.
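The conversion step can be modeled as follows, under loudly simplified assumptions: Delta commits are lists of `add`/`remove` actions, the target metadata is a hypothetical `SnapshotMetadata` record rather than Iceberg's or Hudi's real metadata format, and the catalog is a plain dictionary standing in for something like Hive Metastore. What it illustrates is that only metadata is produced; the data files listed in the Delta log are referenced, never copied or rewritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotMetadata:
    """Hypothetical, heavily simplified stand-in for foreign-format metadata."""
    snapshot_id: int
    data_files: tuple  # paths of the data files shared with the Delta table


def replay_delta_log(commits):
    """Compute the live file set by replaying add/remove actions in order."""
    live = set()
    for commit in commits:
        for action in commit:
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


def convert_and_register(commits, catalog, table_name):
    """Translate Delta's current state into the hypothetical metadata record
    and point a catalog entry at it. No data files are rewritten."""
    version = len(commits) - 1  # latest Delta version, reused as snapshot id
    snapshot = SnapshotMetadata(version, tuple(sorted(replay_delta_log(commits))))
    catalog[table_name] = snapshot
    return snapshot


commits = [
    [{"add": {"path": "part-0.parquet"}}],
    [{"remove": {"path": "part-0.parquet"}}, {"add": {"path": "part-1.parquet"}}],
]
catalog = {}
print(convert_and_register(commits, catalog, "sales"))
# SnapshotMetadata(snapshot_id=1, data_files=('part-1.parquet',))
```

Reusing the Delta version as the snapshot id is an illustrative shortcut; a real converter would maintain the target format's own snapshot history.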

Delta Lake Evaluation and Comparison

In North America, the three dominant lake formats are Delta, Hudi, and Iceberg. All three support merge-on-read, Z-ordering, and ACID guarantees, but they differ in transaction handling and optimization strategies. Benchmark results cited in the presentation show Delta Lake often achieving better read/write performance, though actual outcomes may vary in production environments.

Given the differences, users should select the format that best matches their specific workload and ecosystem requirements.

Conclusion

The presentation covered the Deletion Vector acceleration, metadata‑driven query optimizations, and the Universal Format that bridges Delta Lake with Iceberg and Hudi, followed by a comparative analysis of the three major lake formats.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
