Tag

RDD Persistence


Data Thinking Notes
Oct 27, 2022 · Big Data

Boost Spark Performance: Proven Code Optimizations & Tuning Tips

This article outlines practical Spark job optimization techniques, including code-level improvements, resource tuning, data skew handling, persistence strategies, shuffle reduction, broadcast variables, Kryo serialization, and efficient data structures, and shows how each can reduce execution time.

Kryo Serialization · Performance Tuning · RDD Persistence
0 likes · 19 min read
Big Data Technology Architecture
Apr 28, 2019 · Big Data

Apache Spark Memory Management: Storage and Execution Memory (Part 2)

This article continues the deep dive into Apache Spark memory management. It explains storage memory handling (RDD persistence, caching, eviction, and disk spilling) and execution memory allocation for multi-tasking and shuffle operations, and details Spark's internal structures such as BlockManager, StorageLevel, and Tungsten page management.

Apache Spark · RDD Persistence · Shuffle
0 likes · 13 min read