
Optimizing Apache Iceberg Query Performance with Z‑Order Data Organization

This talk explains how Apache Iceberg’s DataSkipping can lose efficiency when queries filter on many columns, and presents a data‑organization redesign based on space‑filling curves (Z‑order) to reduce query I/O, detailing the OPTIMIZE syntax, implementation steps, performance benchmarks, and future roadmap.

DataFunTalk

As enterprise data volumes and formats increase, analysts demand faster query speeds while maintaining accuracy. Apache Iceberg’s DataSkipping, which uses file‑level metrics to prune data, can become ineffective when many filter columns are involved, leading to near‑full‑table scans.

An example query, SELECT count(*) FROM employee WHERE first_name LIKE 'Tho%' AND last_name LIKE 'Frank%' AND birthplace = 'newyork', illustrates Iceberg’s three layers of I/O filtering: partition pruning on the partition column birthplace, file‑level min‑max filtering on first_name and last_name, and Row‑Group filtering inside Parquet files. Together these layers dramatically reduce the amount of data scanned.
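In Python, the min/max overlap test behind file‑level skipping can be sketched as follows. The stats layout and the prune helper are illustrative, not Iceberg’s actual metadata API:

```python
def may_contain(stats, column, low, high):
    """A file may match a range predicate only if its [min, max] overlaps [low, high]."""
    cmin, cmax = stats[column]
    return not (cmax < low or cmin > high)

def prune(files, predicates):
    """Keep only files whose min/max stats overlap every predicate range."""
    return [f for f in files
            if all(may_contain(f["stats"], col, lo, hi) for col, lo, hi in predicates)]

# Hypothetical per-file statistics, mimicking Iceberg's file-level metrics.
files = [
    {"path": "f1.parquet", "stats": {"first_name": ("Aaron", "Moses"),
                                     "last_name": ("Abel", "Klein")}},
    {"path": "f2.parquet", "stats": {"first_name": ("Nadia", "Zoe"),
                                     "last_name": ("Frank", "Young")}},
]

# On a string column, LIKE 'Tho%' behaves like the range ['Tho', 'Thp').
hits = prune(files, [("first_name", "Tho", "Thp")])
# f1 is skipped (its max "Moses" sorts before "Tho"); f2 must be read.
```

The same overlap check runs once per filter column, which is why its usefulness hinges on how rows are laid out across files.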

The problem arises when the filter columns are unsorted, or when many columns must share a single sort order: min‑max statistics lose their discriminative power, and each additional filter field pushes the query toward scanning every file. The presentation shows a diagram in which adding a third filter column forces a full scan.
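This loss of discriminative power is easy to reproduce: sort synthetic two‑column data by the first column only, and the second column’s min/max range in every file spans nearly the whole domain, so no file can be skipped. A small simulation of the effect, with made‑up data:

```python
import random

random.seed(0)
rows = [(random.randint(0, 999), random.randint(0, 999)) for _ in range(10_000)]

# Sort by column a only, then split into 10 "files" of 1000 rows each.
rows.sort(key=lambda r: r[0])
files = [rows[i:i + 1000] for i in range(0, len(rows), 1000)]

# Inside each file, column b's min/max covers almost the full [0, 999] domain,
# so a point predicate like b = 500 cannot skip a single file.
b_ranges = [(min(r[1] for r in f), max(r[1] for r in f)) for f in files]
skippable = sum(1 for lo, hi in b_ranges if hi < 500 or lo > 500)
# skippable == 0: min/max on b has no pruning power at all.
```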

To address these issues, the authors propose a data‑organization redesign based on space‑filling curves. By mapping multi‑dimensional column values to a one‑dimensional Z‑order address (similar to Geohash), related rows become clustered, restoring the usefulness of min‑max statistics. A concrete Geohash example demonstrates how the 2‑D range [x: 2‑3, y: 4‑5] is encoded to the 1‑D interval [100100, 100111], preserving the original search space.
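The encoding in that example is ordinary bit interleaving (a Morton, or Z‑order, address). A minimal sketch, assuming 3‑bit coordinates and placing the y bit before the x bit at each level, reproduces the interval from the example:

```python
def z_address(x, y, bits=3):
    """Interleave the bits of x and y (y bit first at each level) into one Morton address."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((y >> i) & 1)
        z = (z << 1) | ((x >> i) & 1)
    return z

# The 2x2 box x in [2, 3], y in [4, 5] maps exactly onto the
# contiguous 1-D interval [0b100100, 0b100111] = [36, 39].
addrs = sorted(z_address(x, y) for x in (2, 3) for y in (4, 5))
```

Because the box maps to a contiguous address range, a scan ordered by z‑address touches those rows together, which is exactly what makes file‑level min/max statistics useful again.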

Building on this concept, Tencent’s Iceberg implementation introduces a Z‑Order algorithm and a native OPTIMIZE command. The workflow consists of four steps: (1) selecting candidate files (the full table or specific partitions), (2) generating a Z‑order address for each row by interleaving the binary representations of the chosen columns, (3) repartitioning the dataset by the Z‑order address (equivalent to Dataset.repartitionByRange(ZOrderAddress)), and (4) writing the new layout back using Copy‑On‑Write, creating a new snapshot.
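Steps 2‑4 can be approximated in plain Python: compute a z‑address per row, sort by it (standing in for repartitionByRange), and cut the sorted stream into files; each resulting file then covers a tight value range in every participating column. This is a sketch of the idea under those assumptions, not Tencent’s implementation:

```python
def z_address(values, bits=8):
    """Interleave the bits of several column values into one z-order key."""
    z = 0
    for i in range(bits - 1, -1, -1):
        for v in values:
            z = (z << 1) | ((v >> i) & 1)
    return z

# Step 1: candidate rows, standing in for the rows of a table or partition.
rows = [(x, y) for x in range(16) for y in range(16)]

# Steps 2-3: tag each row with its z-address and order by it
# (a single-process stand-in for Dataset.repartitionByRange on that column).
rows.sort(key=z_address)

# Step 4: the copy-on-write rewrite would emit these sorted runs as new files.
# Splitting 256 rows into 4 "files" yields one 8x8 quadrant per file, so both
# columns span only 8 values inside each file instead of the full 16.
files = [rows[i:i + 64] for i in range(0, 256, 64)]
spans = [(max(r[0] for r in f) - min(r[0] for r in f),
          max(r[1] for r in f) - min(r[1] for r in f)) for f in files]
```

Tight per‑file spans are the whole point: after the rewrite, min/max statistics on each column once again exclude most files for a multi‑column filter.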

Performance tests evaluate the impact of data organization. Key‑parameter tests vary aggregation columns, output file size, and CUBE size, showing that more aggregation columns degrade filtering efficiency, while a 1 GB output file and a 150 GB CUBE provide a good balance. An SSB benchmark (queries Q3.1‑Q3.4) demonstrates that Z‑Order reduces query latency to under 1 s, compared with >12 s for unoptimized data, and enables substantial file‑level filtering.

The roadmap for Tencent Iceberg includes enhancing continuous data ingestion, improving query performance (including compute‑storage separation and indexing), boosting operability, and extending the system to new compute engines and cloud catalogs.

Tags: Big Data, Query Optimization, Apache Iceberg, Data Skipping, Z‑Order
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
