
Why Iceberg Is Dropping Positional Deletes in Merge‑on‑Read Tables

The article explains how Apache Iceberg v3 replaces the poorly scaling positional‑delete mechanism in Merge‑on‑Read tables with compact Deletion Vectors. It details the performance, I/O and metadata drawbacks of positional deletes and shows how the new bitmap‑based approach resolves them.

Past Memory Big Data

Copy‑on‑Write vs. Merge‑on‑Read

In Copy‑on‑Write (CoW) mode, Iceberg rewrites entire data files on every update or delete, which slows writes but keeps reads fast. Merge‑on‑Read (MoR) writes delete files instead of rewriting data, reducing write amplification but requiring the query engine to apply those delete files at read time.
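A toy model makes the write-side trade-off concrete. It counts only the rows written when a few rows are deleted from one data file, and the figures in it (a 1,000,000-row file, 10 deleted rows) are hypothetical. Real costs also include file formats, compression and metadata, and MoR moves part of the cost to the read side.

```python
# Toy model of the write cost of deleting rows from ONE data file.
# Assumption: cost is counted in rows written only; file formats,
# compression and metadata are ignored. Not Iceberg code.

def cow_rows_written(file_rows: int, deleted: int) -> int:
    """Copy-on-Write: the whole file is rewritten without the deleted rows."""
    return file_rows - deleted

def mor_rows_written(file_rows: int, deleted: int) -> int:
    """Merge-on-Read: only one delete record per deleted row is written."""
    # file_rows is unused: the data file itself is left untouched.
    return deleted

if __name__ == "__main__":
    rows, deleted = 1_000_000, 10  # hypothetical sizes
    print("CoW rows written:", cow_rows_written(rows, deleted))
    print("MoR rows written:", mor_rows_written(rows, deleted))
```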

Delete Types in MoR

MoR supports two delete file types:

Positional Delete Files – record the exact row positions (file path + row ordinal) to be omitted.

Equality Delete Files – store the values of one or more columns that identify rows to delete.
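The difference in semantics can be sketched in a few lines of Python. This is a simplified model, not Iceberg's implementation: rows are dicts, delete files are in-memory lists, and the function names are hypothetical. Real equality deletes are also scoped by sequence numbers, which is left out here.

```python
# Toy illustration of the two MoR delete-file types.
# Assumptions: a data file is a list of dict rows and delete files are plain
# Python lists. Real delete files are Parquet/Avro/ORC files, and equality
# deletes also follow sequence-number rules that are not modeled here.

from typing import Dict, List, Set, Tuple

Row = Dict[str, object]

def apply_positional_deletes(path: str, rows: List[Row],
                             deletes: List[Tuple[str, int]]) -> List[Row]:
    """Drop rows whose (file path, row ordinal) appears in the delete file."""
    dead: Set[int] = {pos for (p, pos) in deletes if p == path}
    return [r for i, r in enumerate(rows) if i not in dead]

def apply_equality_deletes(rows: List[Row], columns: List[str],
                           deletes: List[Tuple]) -> List[Row]:
    """Drop rows whose values in `columns` match any delete record."""
    dead = set(deletes)
    return [r for r in rows if tuple(r[c] for c in columns) not in dead]

if __name__ == "__main__":
    rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
    # Positional: delete the row at ordinal 1 of data-file-1.parquet.
    print(apply_positional_deletes("data-file-1.parquet", rows,
                                   [("data-file-1.parquet", 1)]))
    # Equality: delete any row whose id equals 3, wherever it is stored.
    print(apply_equality_deletes(rows, ["id"], [(3,)]))
```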

While both work for small workloads, positional deletes expose serious scalability problems.

How Positional Deletes Work

A delete file lists rows by their ordinal within a data file. During a read, the engine loads every matching delete file, builds an in‑memory bitmap (often a Roaring Bitmap), and filters out the marked rows.

Example: suppose the rows at positions 0 and 102 (row ordinals start at 0) are deleted from a data file data-file-1.parquet. Iceberg writes a small delete file containing entries like:

data-file-1.parquet, 0
data-file-1.parquet, 102

When the file is scanned, rows at positions 0 and 102 are skipped.
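The read path described above, applied to this example, might look like the following sketch. It assumes a Python int as a plain bitset in place of a Roaring Bitmap, a hypothetical 200-row data file, and hypothetical helper names build_bitmaps and scan; it only shows the open, read, merge and filter sequence.

```python
# Sketch of the positional-delete read path.
# Assumption: a Python int is a simple bitset standing in for a Roaring
# Bitmap; bit i set means "row at ordinal i is deleted".

from typing import Dict, Iterable, Iterator, List, Tuple

def build_bitmaps(delete_files: Iterable[List[Tuple[str, int]]]) -> Dict[str, int]:
    """Read every delete file and OR its positions into one bitmap per data file."""
    bitmaps: Dict[str, int] = {}
    for entries in delete_files:  # each delete file must be opened and read
        for path, pos in entries:
            bitmaps[path] = bitmaps.get(path, 0) | (1 << pos)
    return bitmaps

def scan(path: str, rows: Iterable[object], bitmaps: Dict[str, int]) -> Iterator[object]:
    """Yield only the rows whose ordinal is not marked in the file's bitmap."""
    bitmap = bitmaps.get(path, 0)
    for ordinal, row in enumerate(rows):
        if not (bitmap >> ordinal) & 1:
            yield row

if __name__ == "__main__":
    delete_file = [("data-file-1.parquet", 0), ("data-file-1.parquet", 102)]
    bitmaps = build_bitmaps([delete_file])
    rows = [f"row-{i}" for i in range(200)]  # hypothetical 200-row file
    kept = list(scan("data-file-1.parquet", rows, bitmaps))
    print(len(kept), "row-0" in kept, "row-102" in kept)  # 198 False False
```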

Scalability Limitations

1. Read I/O grows linearly with the number of delete files because each data split must open every associated delete file, read its contents, and merge the bitmaps.

2. Partition‑scoped deletes combine deletions for many data files into one file, reducing file count but forcing the reader to load unrelated delete entries, wasting I/O.

3. File‑scoped deletes avoid unrelated reads but generate a large number of tiny delete files; each file open incurs latency, and the sheer count inflates metadata.

4. Runtime bitmap construction adds CPU and memory overhead, especially when many rows are deleted; in extreme cases the scan cost can double.

5. Metadata bloat occurs because every delete file is listed in Iceberg manifest files. Hundreds of thousands of delete files mean hundreds of thousands of manifest entries, increasing snapshot size and planner memory usage.

6. Dangling deletes appear when a data file is rewritten (e.g., compacted) but delete entries that reference the old file remain in manifests; removing them requires explicitly running rewrite_position_delete_files.
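Points 1, 2, 3 and 5 can be put side by side with a rough cost model. Everything in it is an assumption for illustration: one split per data file, every delete commit touching every data file in the partition, and hypothetical sizes (1,000 files, 30 commits, 5 rows per file per commit).

```python
# Back-of-the-envelope cost model for the two positional-delete layouts.
# Assumptions (hypothetical): one partition with `data_files` files, one
# split per data file, and `commits` delete commits that each remove
# `rows_per_file` rows from every data file.

from dataclasses import dataclass

@dataclass
class Cost:
    delete_files: int         # files tracked in manifests (one entry each)
    opens_per_split: int      # delete files a single split must open
    entries_per_split: int    # delete entries a single split must read
    unrelated_per_split: int  # entries read that belong to other data files

def partition_scoped(data_files: int, commits: int, rows_per_file: int) -> Cost:
    # One delete file per commit covers every data file in the partition.
    entries = commits * data_files * rows_per_file
    useful = commits * rows_per_file
    return Cost(commits, commits, entries, entries - useful)

def file_scoped(data_files: int, commits: int, rows_per_file: int) -> Cost:
    # One delete file per (commit, data file) pair.
    useful = commits * rows_per_file
    return Cost(commits * data_files, commits, useful, 0)

if __name__ == "__main__":
    print(partition_scoped(data_files=1000, commits=30, rows_per_file=5))
    print(file_scoped(data_files=1000, commits=30, rows_per_file=5))
```

Under these assumptions the partition-scoped layout reads about 150,000 entries per split, almost all of them unrelated, while the file-scoped layout tracks 30,000 delete files in manifests. In both cases the number of delete files each split opens grows with the number of commits.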

Operational Burden

Iceberg does not automatically compact delete files, so users must run periodic maintenance: minor compaction to merge small delete files, and major compaction to rewrite data files with their deletes applied. Without it, tables accumulate huge numbers of delete files, which degrades query performance and increases storage overhead. In one production PB‑scale table, weeks of daily deletions produced tens of millions of delete files.
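The idea behind minor compaction, together with the clean-up of dangling deletes from point 6, can be illustrated with a toy merge. This is not Iceberg's maintenance code (in practice these jobs run as engine actions or procedures): delete files are modeled as lists of (data file, position) pairs, and compact_delete_files is a hypothetical helper. Major compaction, which rewrites the data files themselves, is not modeled.

```python
# Toy "minor compaction" of positional delete files.
# Assumptions: delete files are lists of (data_file_path, position) tuples
# and `live_data_files` is the set of data files in the current snapshot.

from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Entry = Tuple[str, int]

def compact_delete_files(delete_files: Iterable[List[Entry]],
                         live_data_files: Set[str]) -> Dict[str, List[Entry]]:
    """Merge all entries into one sorted delete file per live data file.

    Entries pointing at data files that no longer exist (dangling deletes)
    are dropped, and duplicate positions are collapsed.
    """
    merged: Dict[str, Set[int]] = defaultdict(set)
    for entries in delete_files:
        for path, pos in entries:
            if path in live_data_files:
                merged[path].add(pos)
    return {path: [(path, p) for p in sorted(ps)] for path, ps in merged.items()}

if __name__ == "__main__":
    small_files = [
        [("a.parquet", 3)], [("a.parquet", 1), ("b.parquet", 7)],
        [("a.parquet", 3)], [("old.parquet", 9)],  # old.parquet was rewritten
    ]
    result = compact_delete_files(small_files, {"a.parquet", "b.parquet"})
    print(len(small_files), "->", len(result))  # 4 -> 2
    print(result["a.parquet"])  # [('a.parquet', 1), ('a.parquet', 3)]
```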

Deletion Vectors – The v3 Solution

Iceberg format spec v3 introduces Deletion Vectors (DVs) to replace positional delete files. A DV is a bitmap stored as a binary blob (Roaring Bitmap compression) that marks deleted row positions for a single data file.
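Conceptually, a DV is a set of deleted row ordinals for one data file, encoded as a bitmap. The sketch below uses a Python int as an uncompressed bitset in place of a Roaring Bitmap and a hypothetical byte layout; the actual v3 blob format is defined by the Iceberg spec and differs (it serializes a 64-bit Roaring Bitmap with its own framing and checksum). An uncompressed bitset wastes space for sparse deletes at high ordinals, which is what Roaring compression avoids.

```python
# Conceptual sketch of a deletion vector for ONE data file.
# Assumptions: a Python int stands in for a Roaring Bitmap, and the blob
# layout (little-endian bitset bytes) is hypothetical, not the v3 format.

from typing import Iterable, List

def make_dv(positions: Iterable[int]) -> int:
    """Set bit i for every deleted row ordinal i."""
    bits = 0
    for pos in positions:
        bits |= 1 << pos
    return bits

def dv_to_blob(bits: int) -> bytes:
    """Serialize the bitmap to a binary blob (hypothetical layout)."""
    return bits.to_bytes((bits.bit_length() + 7) // 8, "little")

def blob_to_positions(blob: bytes) -> List[int]:
    """Decode the blob back into the sorted list of deleted positions."""
    bits = int.from_bytes(blob, "little")
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

if __name__ == "__main__":
    blob = dv_to_blob(make_dv([0, 102]))
    print(len(blob), blob_to_positions(blob))  # 13 [0, 102]
```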

Key characteristics:

Each data file can have at most one DV; new deletions are merged into the existing bitmap (UNION) and written as a new DV, discarding the old one.

DVs are stored inside Puffin files – Iceberg’s auxiliary binary container – rather than as separate Parquet/Avro files.

Manifests reference a Puffin file together with offset and length for each DV, eliminating the need to list many tiny delete files.

During a read, the engine retrieves the DV binary block from the Puffin file, decodes the bitmap, and applies it directly, avoiding the cost of opening multiple delete files and building large in‑memory structures.
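The characteristics above (one DV per data file, UNION on update, Puffin storage, and offset/length references in manifests) fit together roughly as in the following sketch. The container is a plain byte buffer and the "manifest" a dict, both hypothetical stand-ins: real Puffin files carry magic bytes, a footer and blob metadata, and real manifests record more fields. The helper names are hypothetical as well.

```python
# Sketch: many DVs packed into one Puffin-like file, with manifest-style
# entries holding (offset, length) per data file.
# Assumptions: hypothetical bitset encoding, bytes container, dict manifest.

from typing import Dict, Set, Tuple

def encode(positions: Set[int]) -> bytes:
    bits = sum(1 << p for p in positions)
    return bits.to_bytes((bits.bit_length() + 7) // 8, "little")

def decode(blob: bytes) -> Set[int]:
    bits = int.from_bytes(blob, "little")
    return {i for i in range(bits.bit_length()) if (bits >> i) & 1}

def write_puffin(dvs: Dict[str, Set[int]]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate one blob per data file; record (offset, length) per file."""
    buf = bytearray()
    manifest: Dict[str, Tuple[int, int]] = {}
    for data_file, positions in dvs.items():
        blob = encode(positions)
        manifest[data_file] = (len(buf), len(blob))
        buf += blob
    return bytes(buf), manifest

def read_dv(puffin: bytes, manifest: Dict[str, Tuple[int, int]], data_file: str) -> Set[int]:
    """Read only this data file's blob: one slice, no other DVs touched."""
    offset, length = manifest[data_file]
    return decode(puffin[offset:offset + length])

if __name__ == "__main__":
    old = {"a.parquet": {0, 102}, "b.parquet": {5}}
    # New deletes for a.parquet are UNIONed into its existing DV, and the
    # result replaces the old DV (at most one DV per data file).
    new = {**old, "a.parquet": old["a.parquet"] | {7}}
    puffin, manifest = write_puffin(new)
    print(manifest)
    print(sorted(read_dv(puffin, manifest, "a.parquet")))  # [0, 7, 102]
```

The offset/length pair lets a reader fetch exactly one contiguous byte range per data file instead of opening and merging several delete files.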

The community designed this mechanism in collaboration with Delta Lake, as explained by Iceberg co‑creator Ryan Blue and Iceberg PMC member Anton Okolnychyi at the Apache Iceberg Summit.

Benefits of Deletion Vectors

Significant reduction in read‑time I/O and metadata size.

Elimination of file‑scoped delete explosion and partition‑scoped unrelated reads.

Simplified maintenance – because each data file has at most one DV, delete files no longer need to be compacted separately.

Consistent row‑level delete semantics with lower runtime overhead.

In summary, positional deletes offered a way to avoid full file rewrites but introduced prohibitive I/O, metadata, and operational costs at scale. Deletion Vectors provide a compact, bitmap‑based alternative that preserves the benefits of Merge‑on‑Read while addressing its core performance and management drawbacks.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Past Memory Big Data, a popular big-data architecture channel with over 100,000 developers that publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com, or search "Past Memory" on Google or Baidu.
