Tag

Hash Shuffle


IT Services Circle
Mar 21, 2022 · Big Data

Understanding Spark Shuffle: Hash, Sort, and Tungsten Sort Mechanisms

This article explains the evolution and inner workings of Spark's shuffle phase. It compares the original Hash‑based shuffle, the default Sort‑based shuffle, and the optimized Tungsten‑Sort shuffle, and covers the configuration options that affect performance and file handling in large‑scale data processing.

Hash Shuffle · Shuffle · Sort Shuffle
17 min read
Big Data Technology Architecture
Apr 28, 2020 · Big Data

Understanding Shuffle in Hadoop MapReduce and Spark

This article explains the concept and workflow of shuffle in Hadoop MapReduce and Spark. It covers map‑side buffering, spill, and merge; the reduce‑side copy, merge, and reduce steps; and the reasons for sorting and file merging. It then compares the Hash‑Shuffle and Sort‑Shuffle implementations in terms of performance.

Hash Shuffle · MapReduce · Performance
16 min read