Tag

Tungsten


IT Services Circle
Mar 21, 2022 · Big Data

Understanding Spark Shuffle: Hash, Sort, and Tungsten Sort Mechanisms

This article explains the evolution and inner workings of Spark's shuffle phase. It compares the original Hash‑based shuffle, the default Sort‑based shuffle, and the optimized Tungsten‑Sort shuffle, and covers the configuration options that affect performance and file handling in large‑scale data processing.

Hash Shuffle · Shuffle · Sort Shuffle
17 min read
Big Data Technology Architecture
Apr 28, 2019 · Big Data

Apache Spark Memory Management: Storage and Execution Memory (Part 2)

This article continues the deep dive into Apache Spark memory management, explaining storage memory handling—including RDD persistence, caching, eviction, and disk spilling—as well as execution memory allocation for multi-tasking and shuffle operations, and detailing Spark’s internal structures such as BlockManager, StorageLevel, and Tungsten page management.

Apache Spark · Memory Management · RDD Persistence
13 min read
Qunar Tech Salon
Aug 29, 2016 · Big Data

Whole‑Stage Code Generation and Vectorization in Apache Spark’s Tungsten Engine

The article explains how Spark 2.0’s second‑generation Tungsten engine replaces the traditional Volcano iterator model with whole‑stage code generation and vectorization, eliminating virtual calls, keeping temporary data in CPU registers, and using loop unrolling and SIMD to achieve order‑of‑magnitude performance gains on large‑scale data workloads.

Apache Spark · Tungsten · Vectorization
12 min read
High Availability Architecture
Jan 6, 2016 · Big Data

Spark Latest Features, Tungsten Project, and Hulu’s Production Practices

This article reviews Spark's evolution from version 1.2 to 1.6, explains the DataFrame and Tungsten projects, shares Hulu’s real‑world Spark deployments, and discusses performance‑related challenges such as stack overflow, streaming receiver latency, and class‑loader deadlocks.

DataFrames · Dataset API · Hulu
17 min read