Mastering JVM Garbage Collection: Algorithms, Collectors, and Tuning

This article explains the theory behind JVM garbage collection algorithms, details various collectors such as Serial, Parallel, CMS, and G1, compares their strengths and weaknesses, and explores advanced concepts like three‑color marking, write barriers, SATB, and memory management parameters.

Ops Development Stories

Garbage Collection Algorithms

The JVM uses several garbage‑collection (GC) algorithms based on generational collection theory. Memory is divided into young and old generations, allowing each generation to use the most suitable algorithm.

Generational Collection Theory

Objects are grouped by lifespan into young and old generations.

Young generation objects are typically short‑lived, so a copying algorithm is used: only the few survivors of each collection have to be copied.

Old generation objects have a high survival rate, so copying them would be expensive; mark‑sweep or mark‑compact is used instead.

Copying (Mark‑Copy) Algorithm

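The available memory is split into two equal regions, and only one of them is in use at a time. During a collection, live objects are copied from the active region to the other region, after which the entire active region is reclaimed in one step. The price is that half of the memory is idle at any moment.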
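
The copy-then-reclaim step can be sketched with a toy heap. This is an illustration only, not how HotSpot lays out memory: the class `SemiSpaceDemo`, its `Obj` type, and the list-based "spaces" are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy semi-space heap: objects are allocated in fromSpace; a collection copies
// the reachable ones into toSpace and then swaps the roles of the two spaces.
class SemiSpaceDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    List<Obj> fromSpace = new ArrayList<>();
    List<Obj> toSpace = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        fromSpace.add(o);
        return o;
    }

    // Copies everything reachable from the roots and returns the relocated roots.
    List<Obj> collect(List<Obj> roots) {
        Map<Obj, Obj> forwarding = new HashMap<>(); // old copy -> new copy
        List<Obj> newRoots = new ArrayList<>();
        for (Obj r : roots) {
            newRoots.add(copy(r, forwarding));
        }
        fromSpace = toSpace;         // survivors now live in the active space
        toSpace = new ArrayList<>(); // the old space is dropped in one step
        return newRoots;
    }

    private Obj copy(Obj old, Map<Obj, Obj> forwarding) {
        Obj existing = forwarding.get(old);
        if (existing != null) {
            return existing; // already copied; this also handles cycles
        }
        Obj fresh = new Obj(old.name);
        forwarding.put(old, fresh);
        toSpace.add(fresh);
        for (Obj child : old.refs) {
            fresh.refs.add(copy(child, forwarding));
        }
        return fresh;
    }
}
```

Note that the work done by `collect` depends only on the number of live objects; unreachable objects are never touched, which is why copying suits a generation where most objects die young.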

Mark‑Sweep Algorithm

Two phases: mark reachable objects, then sweep away the rest.

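Drawbacks: marking and sweeping slow down as the number of objects grows, and the freed space is left fragmented, which can make it hard to allocate large objects.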
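
The two phases can be sketched as follows. This is a minimal model, assuming a heap represented as a plain list of objects with a mark bit; `MarkSweepDemo` and `Obj` are hypothetical names for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class MarkSweepDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    // Phase 1: set the mark bit on every object reachable from the roots.
    void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) {
                continue;
            }
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    // Phase 2: drop every unmarked object and clear the marks for the next cycle.
    // Returns the number of objects reclaimed.
    int sweep() {
        int before = heap.size();
        heap.removeIf(o -> !o.marked);
        for (Obj o : heap) {
            o.marked = false;
        }
        return before - heap.size();
    }
}
```

Unlike copying, `sweep` has to visit every object in the heap, live or dead, and survivors stay where they are, which is where the fragmentation comes from in a real heap.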

Mark‑Compact Algorithm

After marking, live objects are moved to one side of the heap, eliminating fragmentation.

More time‑consuming than pure mark‑sweep, because every moved object must have all references to it updated.
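
The compaction step can be sketched on an array that stands in for the heap. This is a simplified sketch, assuming marking has already produced a `live` bit per slot; `MarkCompactDemo` is a hypothetical name, and the reference-updating pass that a real collector needs is deliberately left out.

```java
import java.util.Arrays;

class MarkCompactDemo {
    // Slides live entries toward index 0, preserving their order, and clears
    // the tail. Returns the index of the first free slot after compaction.
    static int compact(String[] heap, boolean[] live) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live[i]) {
                heap[free++] = heap[i]; // move the survivor down
            }
        }
        Arrays.fill(heap, free, heap.length, null); // one contiguous free area
        return free;
    }
}
```

After compaction all free space is contiguous, so allocation can go back to simply bumping a pointer from the returned index.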

Garbage Collectors

Collectors are concrete implementations of the above algorithms.
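
Which collectors a given JVM is actually running can be checked at runtime through the standard `java.lang.management` API. The sketch below only lists what the JVM reports; the class name `ActiveCollectors` is made up for the example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    // Names of the collectors registered in this JVM.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

The reported names depend on the JVM and the flags in use. On HotSpot they typically come in a young/old pair, for example `PS Scavenge` and `PS MarkSweep` for the parallel collectors, or `ParNew` and `ConcurrentMarkSweep` for CMS.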

Serial Collector

Single‑threaded, stops all application threads (STW) during GC.

Uses copying for young generation and mark‑compact for old generation.

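JVM flag: -XX:+UseSerialGC (selects Serial for the young generation and Serial Old for the old generation; HotSpot has no separate -XX:+UseSerialOldGC option).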

ParNew Collector

Multithreaded version of Serial; it is the young‑generation collector paired with CMS and is selected by default when CMS is enabled.

Also uses copying for the young generation.

JVM flag: -XX:+UseParNewGC

Parallel Scavenge Collector (JDK 1.8 default)

Multithreaded copying collector focused on maximizing throughput.

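Thread count defaults to the number of CPU cores on small machines (HotSpot scales it down above 8 cores) and can be set explicitly with -XX:ParallelGCThreads.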

JVM flags: -XX:+UseParallelGC (young) and -XX:+UseParallelOldGC (old).
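
For the default GC thread count, the sketch below approximates the heuristic HotSpot is commonly described as using: one thread per core up to 8 cores, then 5/8 of a thread for each additional core. Treat it as an estimate rather than a guarantee, since the exact rule can differ between JVM versions and platforms; `GcThreadEstimate` is a hypothetical helper.

```java
class GcThreadEstimate {
    // Approximate default for -XX:ParallelGCThreads given the number of CPUs.
    static int parallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " CPUs -> about " + parallelGcThreads(cpus) + " GC threads");
    }
}
```

On a 16‑core machine this gives 13 threads, so setting -XX:ParallelGCThreads by hand mainly matters when the JVM shares the host with other processes or runs under a CPU quota.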

Serial Old Collector

Single‑threaded mark‑compact collector for the old generation.

Parallel Old Collector (JDK 1.8 default)

Multithreaded mark‑compact collector for the old generation.

Prioritizes throughput; may cause longer STW pauses.

CMS (Concurrent Mark‑Sweep) Collector

Designed for low pause times, CMS uses a mark‑sweep algorithm and performs most work concurrently.

CMS Collection Steps

Initial Mark: A short STW pause that marks only the objects directly reachable from GC roots.

Concurrent Mark: Traverses the object graph from those objects while application threads keep running.

Remark: A short STW pause that corrects the marks for references that changed during concurrent marking.

Concurrent Sweep: Reclaims unmarked objects while the application runs.

CMS Advantages and Drawbacks

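Low pause times, suitable for latency‑sensitive services.

High CPU usage, because GC threads compete with application threads during the concurrent phases.

Floating garbage: objects that become unreachable during a concurrent cycle are not reclaimed until the next cycle.

Mark‑sweep can cause memory fragmentation; -XX:+UseCMSCompactAtFullCollection can mitigate this by compacting during a full GC.

Possible "concurrent mode failure": if the old generation fills up before a concurrent cycle finishes, the JVM falls back to a full STW GC.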

CMS Parameters

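-XX:+UseConcMarkSweepGC – enable CMS.

-XX:ConcGCThreads – number of concurrent GC threads.

-XX:+UseCMSCompactAtFullCollection – compact the old generation during a full GC.

-XX:CMSFullGCsBeforeCompaction – number of full GCs to run without compaction before one that compacts (default 0, meaning every full GC compacts).

-XX:CMSInitiatingOccupancyFraction – start a CMS cycle when old‑generation occupancy reaches this percentage (effective default about 92%).

-XX:+UseCMSInitiatingOccupancyOnly – always use the configured occupancy threshold; without it, the JVM applies the threshold only to the first cycle and then adjusts it on its own.

-XX:+CMSScavengeBeforeRemark – run a minor GC before the remark phase, so that fewer young‑generation objects have to be scanned during remark.

-XX:+CMSParallelInitialMarkEnabled – run the initial mark with multiple threads.

-XX:+CMSParallelRemarkEnabled – run the remark with multiple threads.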
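
The occupancy threshold is a plain percentage check against the old generation's capacity. The sketch below shows that arithmetic and prints the current occupancy of each heap memory pool through the standard `java.lang.management` API. `OldGenOccupancy` and `reached` are hypothetical names, and the check is a simplification of what the collector does internally.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class OldGenOccupancy {
    // True when used/capacity has reached the given percentage.
    static boolean reached(long usedBytes, long capacityBytes, int fractionPercent) {
        return usedBytes * 100 >= capacityBytes * (long) fractionPercent;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            // Skip non-heap pools and pools without a defined maximum.
            if (pool.getType() != MemoryType.HEAP || usage == null || usage.getMax() <= 0) {
                continue;
            }
            long percent = usage.getUsed() * 100 / usage.getMax();
            System.out.println(pool.getName() + ": " + percent + "% used, at 92% threshold: "
                    + reached(usage.getUsed(), usage.getMax(), 92));
        }
    }
}
```

Lowering the fraction starts concurrent cycles earlier, which leaves more headroom for allocation while CMS runs and reduces the risk of a concurrent mode failure, at the cost of more frequent cycles.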

Three‑Color Marking and Write Barriers

Three‑Color Marking Algorithm

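Black: The object has been visited and all of its reference fields have been scanned. Black objects are live and are not scanned again.

Gray: The object has been visited, but at least one of its reference fields has not been scanned yet.

White: The object has not been visited. Objects that are still white when marking ends are unreachable and are treated as garbage.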
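
The color transitions can be sketched as a worklist traversal. This is a single‑threaded illustration with hypothetical names (`TriColorDemo`, `Node`), so it shows the invariant without any concurrency.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TriColorDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Color color = Color.WHITE;
        Node(String name) { this.name = name; }
    }

    static void mark(List<Node> roots) {
        Deque<Node> gray = new ArrayDeque<>();
        for (Node r : roots) {
            shade(r, gray);
        }
        while (!gray.isEmpty()) {
            Node n = gray.poll();
            for (Node child : n.refs) {
                shade(child, gray);
            }
            n.color = Color.BLACK; // every outgoing reference has been scanned
        }
    }

    // White -> gray: the object is known to be reachable but not yet scanned.
    private static void shade(Node n, Deque<Node> gray) {
        if (n.color == Color.WHITE) {
            n.color = Color.GRAY;
            gray.add(n);
        }
    }
}
```

The gray set is the frontier between scanned and unscanned objects. Marking is finished when it is empty, and at that point no black object points to a white one, provided the graph did not change during the traversal.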

Snapshot‑At‑The‑Beginning (SATB)

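SATB treats the object graph as it existed at the start of a concurrent marking cycle as the set to keep alive. Objects allocated after the start are treated as live, and when a reference is overwritten during marking, a pre‑write barrier records the old target so that it is still marked in this cycle. G1 uses SATB, whereas CMS uses incremental update, which records newly added references instead.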
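
The problem SATB solves shows up when the application moves a reference while marking is in progress. The sketch below replays that interleaving on three objects, once with and once without a pre‑write barrier. It is a toy model: `SatbDemo` and its methods are hypothetical, each object has a single reference field, and the old value is shaded directly instead of going through the buffers a real collector would use.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SatbDemo {
    static final class Node {
        final String name;
        Node field;     // single reference field, to keep the sketch small
        boolean marked; // false = white, true = gray or black
        Node(String name) { this.name = name; }
    }

    private final Deque<Node> pending = new ArrayDeque<>(); // gray objects
    private final boolean satbEnabled;
    private boolean marking;

    SatbDemo(boolean satbEnabled) { this.satbEnabled = satbEnabled; }

    void startMarking(Node root) { marking = true; shade(root); }

    // Scans one gray object; returns false when nothing is left to scan.
    boolean step() {
        Node n = pending.poll();
        if (n == null) {
            return false;
        }
        shade(n.field);
        return true;
    }

    void finishMarking() {
        while (step()) { }
        marking = false;
    }

    // A reference store as the application would perform it.
    void store(Node holder, Node newValue) {
        if (satbEnabled && marking) {
            shade(holder.field); // pre-write barrier: keep the overwritten target alive
        }
        holder.field = newValue;
    }

    private void shade(Node n) {
        if (n != null && !n.marked) {
            n.marked = true;
            pending.add(n);
        }
    }

    // Replays the "lost object" interleaving and reports whether c was marked.
    static boolean cSurvives(boolean satbEnabled) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        a.field = b;
        b.field = c;
        SatbDemo gc = new SatbDemo(satbEnabled);
        gc.startMarking(a);
        gc.step();         // a is fully scanned (black), b is gray, c is white
        gc.store(a, c);    // a black object now points at white c
        gc.store(b, null); // the only gray path to c is removed
        gc.finishMarking();
        return c.marked;
    }
}
```

Without the barrier, `c` stays white although `a` still references it, so a reachable object would be reclaimed. With the barrier, `c` is marked. In this run `b` also stays marked even though nothing references it any more, which is the floating garbage that SATB accepts in exchange for correctness.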

Write Barrier

<code>void oop_field_store(oop* field, oop new_value) {
    // pre‑write barrier: runs before the store; SATB records the old value of *field here
    *field = new_value; // the actual reference store
    // post‑write barrier: runs after the store; incremental update records new_value here
}</code>

The barrier ensures that reference changes are logged so the collector can maintain correctness during concurrent marking.
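
The post‑write side of the barrier can be illustrated with a toy model of incremental update. `IncrementalUpdateDemo` and its methods are hypothetical, each object has a single reference field, and the new target is shaded directly, whereas a real collector would record the store and revisit it later.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class IncrementalUpdateDemo {
    static final class Node {
        final String name;
        Node field;     // single reference field, to keep the sketch small
        boolean marked; // false = white, true = gray or black
        Node(String name) { this.name = name; }
    }

    private final Deque<Node> pending = new ArrayDeque<>(); // gray objects
    private boolean marking;

    void startMarking(Node root) { marking = true; shade(root); }

    // Scans one gray object; returns false when nothing is left to scan.
    boolean step() {
        Node n = pending.poll();
        if (n == null) {
            return false;
        }
        shade(n.field);
        return true;
    }

    void finishMarking() {
        while (step()) { }
        marking = false;
    }

    // A reference store as the application would perform it.
    void store(Node holder, Node newValue) {
        holder.field = newValue;
        if (marking) {
            shade(newValue); // post-write barrier: record the newly added target
        }
    }

    private void shade(Node n) {
        if (n != null && !n.marked) {
            n.marked = true;
            pending.add(n);
        }
    }

    // a -> b -> c; after a is scanned, the reference to c moves from b to a.
    static boolean movedObjectIsMarked() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        a.field = b;
        b.field = c;
        IncrementalUpdateDemo gc = new IncrementalUpdateDemo();
        gc.startMarking(a);
        gc.step();
        gc.store(a, c);
        gc.store(b, null);
        gc.finishMarking();
        return c.marked;
    }

    // a -> b -> c; after a is scanned, the only reference to c is deleted.
    static boolean droppedObjectIsMarked() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        a.field = b;
        b.field = c;
        IncrementalUpdateDemo gc = new IncrementalUpdateDemo();
        gc.startMarking(a);
        gc.step();
        gc.store(b, null);
        gc.finishMarking();
        return c.marked;
    }
}
```

The moved object is marked because the store into the already scanned `a` is recorded. The dropped object stays white and can be reclaimed in the same cycle, since recording only new references keeps nothing alive on behalf of deleted ones.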

Read Barrier

<code>oop oop_field_load(oop* field) {
    pre_load_barrier(field); // record the read if in concurrent phase
    return *field;
}</code>

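Read (load) barriers are used by collectors such as ZGC: each reference load is checked, and if the reference is stale for the current GC phase, the barrier marks or remaps the object before the application sees it.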

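Remembered Set and Card Table

To avoid scanning the entire old generation for cross‑generation references, the JVM maintains a remembered set, implemented in HotSpot as a card table. The table has one entry per card of heap memory (512 bytes in HotSpot), and a write barrier marks a card dirty when a reference field inside it is updated, for example when an old object is made to point to a young object. During a minor GC, only the dirty cards are scanned for references into the young generation.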
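
The address‑to‑card mapping is a shift by 9 bits, since 2^9 = 512. The sketch below models it with a byte array. `CardTableDemo` and the `CLEAN`/`DIRTY` values are assumptions made for the example, and the byte values HotSpot actually stores differ.

```java
class CardTableDemo {
    static final int CARD_SHIFT = 9; // 2^9 = 512-byte cards
    static final byte CLEAN = 0;     // illustrative values, not HotSpot's encoding
    static final byte DIRTY = 1;

    private final long heapBase;
    private final byte[] cards;

    CardTableDemo(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        this.cards = new byte[(int) ((heapBytes + 511) >>> CARD_SHIFT)];
    }

    // Index of the card that covers the given heap address.
    int cardIndex(long address) {
        return (int) ((address - heapBase) >>> CARD_SHIFT);
    }

    // Post-write barrier: dirty the card covering the field that was written.
    void onReferenceStore(long fieldAddress) {
        cards[cardIndex(fieldAddress)] = DIRTY;
    }

    boolean isDirty(int index) {
        return cards[index] == DIRTY;
    }

    // Number of cards a minor GC would have to scan.
    int dirtyCount() {
        int n = 0;
        for (byte card : cards) {
            if (card == DIRTY) {
                n++;
            }
        }
        return n;
    }
}
```

With one byte per 512 bytes of heap, the table costs roughly 0.2% of the heap size, and the barrier is a single shift and store on each reference write.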
