Tagged articles
Past Memory Big Data
Dec 26, 2024 · Big Data

Eliminate Shuffle: Deep Dive into Spark’s Storage Partition Join (SPJ)

This article explains how Storage Partition Join (SPJ), available in Spark 3.3 and later, avoids costly shuffle operations when joining Iceberg tables. It outlines the required table properties and Spark configurations, demonstrates the effect with code examples and execution plans, and walks through several realistic join scenarios.

Apache Iceberg · Big Data · SPJ
16 min read