
Analysis of Apache Spark 2.2.1 Memory Management Model

This article examines Spark's unified memory manager in version 2.2.1, detailing the on‑heap and off‑heap memory regions, the four on‑heap memory pools, dynamic execution‑storage memory sharing, and task‑level memory accounting, and it works through concrete calculations that explain Spark UI discrepancies and runtime memory limits.

Qunar Tech Salon

This article analyzes the memory management model of Apache Spark, focusing on the UnifiedMemoryManager as implemented in Spark 2.2.1. The analysis is limited to the executor side; the driver side is not covered.

On‑Heap Memory

By default Spark uses only on‑heap memory, which is divided into four regions:

Execution Memory: stores temporary data for shuffles, joins, sorts, aggregations, etc.

Storage Memory: holds cached RDD data and unrolled data.

User Memory: keeps metadata required for RDD transformations, such as dependency information.

Reserved Memory: reserved for Spark's internal objects (fixed at 300 MB in Spark 2.2.1).

The total on‑heap memory breaks down according to the following calculations:

systemMemory = Runtime.getRuntime.maxMemory (the heap sized via spark.executor.memory or --executor-memory)

reservedMemory = 300 MB (hard‑coded, modifiable only in test mode via spark.testing.reservedMemory)

usableMemory = systemMemory - reservedMemory

sparkMemory = usableMemory * spark.memory.fraction (execution plus storage; the default fraction is 0.6, and the remainder of usableMemory is user memory)
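The sizing above can be sketched in a few lines of Python (the 300 MB constant and the fractions are Spark 2.2.1 defaults; the helper name and dictionary layout are ours, not Spark's API):

```python
# Sketch of Spark 2.2.1's on-heap pool sizing (UnifiedMemoryManager defaults).
RESERVED_MEMORY = 300 * 1024 * 1024  # 300 MB, hard-coded in Spark 2.2.1

def on_heap_pools(system_memory, memory_fraction=0.6, storage_fraction=0.5):
    """Split the JVM heap the way the unified memory manager does."""
    usable = system_memory - RESERVED_MEMORY
    spark_memory = int(usable * memory_fraction)    # execution + storage
    storage = int(spark_memory * storage_fraction)  # initial storage share
    execution = spark_memory - storage              # initial execution share
    user = usable - spark_memory                    # user memory
    return {"user": user, "execution": execution, "storage": storage}

# Runtime.maxMemory observed for an 18 GB heap in the example later on:
pools = on_heap_pools(17179869184)
```

Note that the execution/storage split is only the initial boundary; as described below, the two pools can borrow from each other at runtime.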

Off‑Heap Memory

Since Spark 1.6, off‑heap memory can be enabled (via spark.memory.offHeap.enabled=true) and sized with spark.memory.offHeap.size. Off‑heap memory is allocated through the JVM's Unsafe API, so it bypasses garbage collection, but Spark must explicitly allocate and release it.

Dynamic Adjustment of Execution and Storage Memory

In Spark 1.5 and earlier, execution and storage memory were statically partitioned. Since Spark 1.6, the unified memory manager lets them borrow space from each other: if execution memory is insufficient while storage has free space, execution can use it, and vice versa.

The implementation logic is:

The boundary between the two pools is set by spark.memory.storageFraction (default 0.5); it is a soft boundary rather than a hard limit.

Storage may borrow free execution memory, and execution may borrow free storage memory.

When execution needs space that storage has borrowed, cached blocks are evicted (to disk or dropped, chosen by an LRU policy) until the storage pool shrinks back to its configured region.

When storage needs space that execution has borrowed, it cannot force eviction; it must wait until execution releases memory, spilling or dropping its own blocks in the meantime.

Borrowing only works between pools of the same memory mode (both on‑heap or both off‑heap).
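The asymmetry of these rules can be illustrated with a deliberately simplified Python model (the class and method names are ours and do not mirror Spark's actual MemoryPool classes; real Spark also tracks which blocks to evict via LRU):

```python
# Simplified model of the unified pool: execution can evict storage blocks
# back down to the configured storage region, but storage can never evict
# execution memory -- it has to wait.
class UnifiedPool:
    def __init__(self, total, storage_fraction=0.5):
        self.total = total
        self.storage_region = int(total * storage_fraction)
        self.execution_used = 0
        self.storage_used = 0

    def acquire_execution(self, n):
        free = self.total - self.execution_used - self.storage_used
        if free < n:
            # Evict only what storage holds BEYOND its configured region.
            evictable = max(0, self.storage_used - self.storage_region)
            reclaim = min(n - free, evictable)
            self.storage_used -= reclaim
            free += reclaim
        granted = min(n, free)
        self.execution_used += granted
        return granted

    def acquire_storage(self, n):
        free = self.total - self.execution_used - self.storage_used
        if free < n:
            return False  # cannot evict execution; caller must wait or drop
        self.storage_used += n
        return True

pool = UnifiedPool(100, storage_fraction=0.5)
ok = pool.acquire_storage(80)        # storage borrows beyond its 50-unit region
granted = pool.acquire_execution(50)  # execution reclaims the borrowed 30 units
blocked = pool.acquire_storage(10)    # storage cannot reclaim from execution
```

Running the three calls shows the asymmetry: the storage request succeeds while space is free, execution later claws back the borrowed portion by eviction, and the final storage request fails because execution memory cannot be evicted.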

Task‑Level Memory Distribution

All tasks running in an executor share the execution memory pool. Spark maintains a HashMap from task IDs to their current memory usage. When a task requests numBytes :

If enough free execution memory exists, the request is granted and the map is updated.

Otherwise the task may block until other tasks release memory; a task waits only while it cannot obtain its minimum guaranteed share.

Each task can use between 1/(2N) and 1/N of the total execution memory, where N is the number of concurrently running tasks.

Example: with 10 GB execution memory and 5 concurrent tasks, each task may use 1 GB – 2 GB.
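The per‑task bounds are simple arithmetic; a quick Python sketch (the helper name is ours):

```python
# Per-task share of the execution pool: each active task is guaranteed at
# least 1/(2N) and capped at 1/N of the pool, where N = active tasks.
def task_memory_bounds(pool_size, num_active_tasks):
    min_share = pool_size // (2 * num_active_tasks)  # guaranteed minimum
    max_share = pool_size // num_active_tasks        # hard cap
    return min_share, max_share

# The example from the text: 10 GB pool, 5 concurrent tasks.
lo, hi = task_memory_bounds(10 * 1024**3, 5)
```

Because N changes as tasks start and finish, these bounds are recomputed dynamically: a lone task may use up to the whole pool's 1/1 cap, while each of ten tasks is capped at a tenth.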

Example – On‑Heap Only

Configuration:

--executor-memory 18g

Assuming the default fractions ( spark.memory.fraction=0.6 , spark.memory.storageFraction=0.5 ), the Spark UI shows about 10.1 GB of storage memory. Two subtleties matter here: systemMemory is Runtime.getRuntime.maxMemory, which for an 18 GB heap is less than 18 GB (see the next section), and the "Storage Memory" column in the UI reports the whole unified region (execution plus storage), not just the storage half. The calculation steps are:

systemMemory = Runtime.getRuntime.maxMemory = 17179869184 bytes (less than 18 GB; see below)
reservedMemory = 300 MB = 314572800 bytes
usableMemory = systemMemory - reservedMemory = 16865296384 bytes
UI "Storage Memory" = usableMemory * spark.memory.fraction = 10119177830 bytes ≈ 10.1 GB

The UI converts bytes to GB using a divisor of 1000³, which explains the slight difference from a 1024³ conversion.
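The steps above can be checked with a quick calculation (variable names are ours; the byte values come from the example):

```python
# Reproducing the ~10.1 GB "Storage Memory" figure from the Spark UI
# for --executor-memory 18g.
system_memory = 17179869184           # Runtime.getRuntime.maxMemory
reserved = 300 * 1024 * 1024          # hard-coded reserved memory
usable = system_memory - reserved
ui_storage = usable * 0.6             # whole unified region, fraction = 0.6
ui_gb = ui_storage / 1000**3          # the UI divides by 1000^3, not 1024^3
```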

Runtime.maxMemory vs. Configured Executor Memory

Even if --executor-memory 18g is set, Runtime.getRuntime.maxMemory() may return a smaller value (e.g., 17179869184 bytes) because the JVM heap is split into Eden, Survivor, and Tenured spaces. The formula is:

ExecutorMemory = Eden + 2 * Survivor + Tenured
Runtime.maxMemory = Eden + Survivor + Tenured

Because the two survivor spaces are used alternately and only one holds live data at any time, Runtime.maxMemory excludes one of them; the reported maximum is therefore less than the configured executor memory.
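With illustrative generation sizes (the real sizes depend on the JVM and GC configuration; these numbers are ours, chosen to sum to 18 GB), the difference is easy to see:

```python
# Hypothetical generation sizes for an 18 GB heap, illustration only.
GB = 1024**3
eden     = 4 * GB
survivor = 1 * GB    # size of EACH of the two survivor spaces
tenured  = 12 * GB

executor_memory    = eden + 2 * survivor + tenured  # what -Xmx covers: 18 GB
runtime_max_memory = eden + survivor + tenured      # what the JVM reports: 17 GB
```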

Example – On‑Heap + Off‑Heap

Configuration:

spark.executor.memory          18g
spark.memory.offHeap.enabled  true
spark.memory.offHeap.size     10737418240   // 10 GB

Calculations:

systemMemory = 17179869184 bytes
reservedMemory = 300 MB = 314572800 bytes
usableMemory = 16865296384 bytes
onHeapStorage = usableMemory * 0.6 = 10119177830 bytes ≈ 10.12 GB
offHeapStorage = 10737418240 bytes ≈ 10.74 GB
Total Storage = onHeapStorage + offHeapStorage ≈ 20.9 GB (as shown in the UI)

The UI again uses the 1000³ divisor for GB conversion.
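A quick check of the combined figure (variable names are ours; byte values from the example above):

```python
# Reproducing the ~20.9 GB total "Storage Memory" shown in the UI when
# both on-heap and off-heap memory are enabled.
system_memory = 17179869184                  # Runtime.getRuntime.maxMemory
reserved = 300 * 1024 * 1024
on_heap = (system_memory - reserved) * 0.6   # unified on-heap region
off_heap = 10737418240                        # spark.memory.offHeap.size
total_gb = (on_heap + off_heap) / 1000**3     # UI divisor is 1000^3
```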

Overall, understanding these memory calculations helps in fine‑tuning Spark applications and interpreting the values displayed in the Spark UI.

Written by Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
