Shenandoah GC’s Concurrent Thread Stack Processing Slashes Pause Times to Sub‑millisecond in JDK 17
This article explains how Shenandoah’s new concurrent thread‑stack processing in OpenJDK 17 uses stack watermarks to cut garbage‑collection pause times to sub‑millisecond levels. It also provides practical configuration guidance and benchmark results.
1. Introduction
Shenandoah is an OpenJDK garbage collector (GC) designed to minimize GC pause times. It was integrated into upstream OpenJDK in JDK 12, already evacuating the heap concurrently, and was later backported to JDK 11. Subsequent releases cut pauses further with concurrent class unloading (JDK 14) and concurrent reference processing (JDK 16). JDK 17 finally adds concurrent thread‑stack processing, which delivers reliable sub‑millisecond pauses.
2. Thread handling in Java
Each Java thread has its own stack of frames holding local variables, monitors, and references to heap objects. During a GC cycle, the VM brings all threads to a safepoint and scans their stacks to seed the marking queue. It later updates those same references after objects have been evacuated. Scanning and updating stacks can take milliseconds for small workloads and tens of milliseconds for large server applications, and that time adds directly to pause times.
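For contrast with the concurrent approach, a stop‑the‑world root scan has to visit every frame of every thread before the pause can end. The toy model below is only an illustration: `StwRootScan`, `Frame`, and `scanAllStacks` are hypothetical names, and strings stand in for heap references. It seeds a marking queue that way and counts the frames visited. That count is the work that grows with thread count and stack depth.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy model of a stop-the-world stack scan; all names here are hypothetical.
final class StwRootScan {
    // Hypothetical stand-in for a stack frame: just the heap references it holds.
    record Frame(List<String> refs) { }

    // Runs while every application thread is stopped at the safepoint: visits all
    // frames of all threads and seeds the marking queue with their references.
    // Returns the number of frames visited; the pause lasts until this loop ends.
    static int scanAllStacks(List<List<Frame>> threadStacks, Deque<String> markQueue) {
        int framesVisited = 0;
        for (List<Frame> stack : threadStacks) {
            for (Frame frame : stack) {
                markQueue.addAll(frame.refs());
                framesVisited++;
            }
        }
        return framesVisited;
    }

    public static void main(String[] args) {
        List<List<Frame>> stacks = List.of(
                List.of(new Frame(List.of("a", "b")), new Frame(List.of("c"))), // thread 1
                List.of(new Frame(List.of("d"))));                              // thread 2
        Deque<String> markQueue = new ArrayDeque<>();
        int frames = scanAllStacks(stacks, markQueue);
        System.out.println("frames=" + frames + ", queued refs=" + markQueue.size());
    }
}
```

Running it prints `frames=3, queued refs=4`. Doubling the threads or the stack depth doubles the work done inside the pause. The model uses a record, so it needs Java 16 or later.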
3. Concurrent thread processing in OpenJDK 17
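Shenandoah solves this with a "stack watermark" mechanism, originally developed for ZGC. The key observation is that a running thread can only modify its topmost frame. The caller frames beneath it stay unchanged until the thread returns into them, so GC threads can scan them safely in the meantime. The only conflict arises when a frame is popped, by a normal return or by exception unwinding, and the thread resumes in a caller frame the GC has not processed yet. The watermark records how far stack processing has progressed, so the GC and the executing thread can coordinate at exactly that point.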
4. Using stack watermarks during GC
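During the initial pause of a marking phase, Shenandoah sets each thread’s watermark at its top frame, so only the frame in which the thread resumes is handled inside the pause. Frames above the watermark count as processed. The older frames below it are not processed yet, but the thread cannot modify them until it returns into them. After the safepoint, the application resumes while GC threads process the frames below the watermark concurrently. If a thread returns, or unwinds through an exception, into a frame that is still below the watermark, it runs a barrier that does the following: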
Lower the watermark by one frame.
Prevent GC threads from scanning beyond the new watermark.
Process references in frames that are now above the watermark.
The result is that the same frames and references are scanned as in a stop‑the‑world pause, but the work is overlapped with application execution.
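To make the steps above concrete, here is a deliberately simplified Java model of one thread’s stack. Everything in it (`WatermarkedStack`, `gcProcessNext`, `returnFromTop`, `call`) is a hypothetical name for illustration, not HotSpot’s internal API. Real JVMs work on machine frames and are far more involved. Frames are numbered from 0 (oldest) to `depth - 1` (top), and `synchronized` stands in for the coordination between GC threads and the application thread.

```java
/**
 * Toy model of one thread's stack under a stack watermark. All names are
 * hypothetical; this is not HotSpot's StackWatermark API. Frame 0 is the
 * oldest frame and frame depth-1 is the top one. Frames at or above the
 * watermark index count as processed; frames below it do not.
 */
final class WatermarkedStack {
    private int depth;         // number of live frames
    private int watermark;     // lowest processed frame index
    private int gcFrames;      // frames processed concurrently by GC threads
    private int mutatorFrames; // frames processed by the thread's own barrier

    WatermarkedStack(int depth) {
        if (depth < 1) throw new IllegalArgumentException("need at least one frame");
        this.depth = depth;
        // Initial pause: the watermark sits at the top frame, the only frame handled in the pause.
        this.watermark = depth - 1;
    }

    // GC thread step: lower the watermark by one frame and process that frame.
    // synchronized keeps GC and application threads from processing the same frame.
    synchronized boolean gcProcessNext() {
        if (watermark == 0) return false; // nothing left below the watermark
        watermark--;
        gcFrames++;
        return true;
    }

    // The thread calls a new method: the new frame lands above the watermark.
    synchronized void call() {
        depth++;
    }

    // The thread returns from its top frame. If the caller is below the watermark,
    // the barrier lowers the watermark by one frame and processes the caller first.
    synchronized void returnFromTop() {
        if (depth == 0) throw new IllegalStateException("stack is empty");
        depth--;
        int caller = depth - 1;
        if (caller >= 0 && caller < watermark) {
            watermark--;     // now watermark == caller
            mutatorFrames++; // the caller's references would be processed here
        }
    }

    synchronized int watermark() { return watermark; }
    synchronized int gcFrames() { return gcFrames; }
    synchronized int mutatorFrames() { return mutatorFrames; }

    public static void main(String[] args) {
        WatermarkedStack s = new WatermarkedStack(6); // frames 0..5, watermark at 5
        s.gcProcessNext();
        s.gcProcessNext();                            // GC has processed frames 3..5
        s.returnFromTop();                            // resume in frame 4: already processed
        s.returnFromTop();                            // resume in frame 3: already processed
        s.returnFromTop();                            // resume in frame 2: barrier processes it
        while (s.gcProcessNext()) { }                 // GC finishes frames 1 and 0
        System.out.println("gc=" + s.gcFrames() + " mutator=" + s.mutatorFrames());
    }
}
```

The demo prints `gc=4 mutator=1`. One frame is handled in the pause, four by the GC concurrently, and one by the application thread’s barrier, which adds up to the same six frames a stop‑the‑world scan would visit.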
5. Shenandoah GC benchmarks
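The table below lists Shenandoah pause times in microseconds for three JDK releases. A short Java sketch turns the JDK 11 and JDK 16 rows into relative reductions against JDK 17. The class and method names (`PauseReduction`, `percentReduction`) are hypothetical, and the numbers are copied from the table.

```java
import java.util.Locale;

// Illustrative helper; the class and method names are hypothetical.
final class PauseReduction {
    // Relative reduction, in percent, when a pause shrinks from `before` to `after`.
    static double percentReduction(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // Pause times in microseconds, copied from the benchmark table.
        double[] initialMark = {421, 321, 63}; // JDK 11, JDK 16, JDK 17
        double[] finalMark = {1294, 704, 328}; // JDK 11, JDK 16, JDK 17
        String[] from = {"JDK 11", "JDK 16"};
        for (int i = 0; i < from.length; i++) {
            // Locale.ROOT keeps the decimal point independent of the default locale.
            System.out.printf(Locale.ROOT, "%s -> JDK 17: initial mark %.1f%% shorter, final mark %.1f%% shorter%n",
                    from[i],
                    percentReduction(initialMark[i], initialMark[2]),
                    percentReduction(finalMark[i], finalMark[2]));
        }
    }
}
```

It prints `JDK 11 -> JDK 17: initial mark 85.0% shorter, final mark 74.7% shorter` and `JDK 16 -> JDK 17: initial mark 80.4% shorter, final mark 53.4% shorter`.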
JDK version | Initial Mark (µs) | Final Mark (µs)
------------|-------------------|----------------
JDK 11      | 421               | 1294
JDK 16      | 321               | 704
JDK 17      | 63                | 328
6. Availability
Shenandoah’s availability varies by vendor and JDK version. OpenJDK 12+ builds include Shenandoah by default, while OpenJDK 11 builds must include it explicitly, so JDK 11 binaries differ from vendor to vendor:
Red Hat: OpenJDK 8+ on Fedora 24+, RHEL 7.4+ (Technology Preview), and Windows builds of OpenJDK 8u.
Amazon: Shenandoah available in Amazon Corretto starting with 11.0.9.
Oracle: Not shipped in any version.
Azul: Shenandoah shipped in Azul Zulu starting with 11.0.9.
OpenJDK: Default binary includes Shenandoah from 11.0.9.
Linux distributions: Debian, Gentoo (USE flag), CentOS, Oracle Linux, Amazon Linux, etc., provide Shenandoah.
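Because availability depends on the vendor and version, the most direct test is simply to try it. The sketch below uses a hypothetical class `ShenandoahProbe` to start a throwaway JVM from a given `java.home` with `-XX:+UseShenandoahGC -version`, treating exit code 0 as "supported". It adds `-XX:+UnlockExperimentalVMOptions` because Shenandoah was an experimental option before JDK 15. The sketch assumes a standard JDK layout with the launcher at `bin/java`.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical helper class: probes whether a JDK can start with Shenandoah.
final class ShenandoahProbe {
    // Command line for a throwaway JVM that selects Shenandoah and prints its version.
    static List<String> probeCommand(String javaHome) {
        String launcher = javaHome + File.separator + "bin" + File.separator + "java";
        // Shenandoah was experimental before JDK 15, so unlock experimental options;
        // the unlock flag is accepted but unnecessary on later releases.
        return List.of(launcher, "-XX:+UnlockExperimentalVMOptions", "-XX:+UseShenandoahGC", "-version");
    }

    // True if a JVM from javaHome starts with Shenandoah; builds without it exit with an error.
    static boolean isSupported(String javaHome) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(probeCommand(javaHome))
                .redirectErrorStream(true)                       // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // the version banner is not needed
                .start();
        return process.waitFor() == 0;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String javaHome = System.getProperty("java.home");
        System.out.println("Shenandoah supported by " + javaHome + ": " + isSupported(javaHome));
    }
}
```

Probing a separate JVM avoids guessing from vendor names or version strings. A build that lacks Shenandoah fails at startup, either because it rejects the option as unsupported or because it does not recognize it at all.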
7. Enabling Shenandoah
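Run your Java application with the JVM option -XX:+UseShenandoahGC. JVM options must come before the application (the main class or -jar file); anything after it is passed to the application as a program argument:
```
java -XX:+UseShenandoahGC <PATHTOYOURAPPLICATION>
```
7.1 Modes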
Shenandoah supports three main modes, selectable with -XX:ShenandoahGCMode= :
normal/satb (default): Uses Snapshot‑At‑The‑Beginning marking, similar to G1.
iu (experimental): Uses Incremental Update marking. It mirrors SATB, but its write barrier records the newly stored reference instead of the overwritten one.
passive (diagnostic): Runs stop‑the‑world GC for testing or precise measurement.
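Selecting a non‑default mode also requires unlocking it. In OpenJDK 17, the VM refuses to start an experimental mode such as iu without -XX:+UnlockExperimentalVMOptions, or a diagnostic mode such as passive without -XX:+UnlockDiagnosticVMOptions. Check the startup error on your own build if in doubt. The hypothetical helper below (`ShenandoahModeArgs`) just builds the matching option list for each mode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: JVM options for each Shenandoah mode.
final class ShenandoahModeArgs {
    // The modes from the list above, with the unlock flag each one needs (assumed JDK 17 behaviour).
    enum Mode {
        SATB("satb", null),                                   // default, no unlock needed
        IU("iu", "-XX:+UnlockExperimentalVMOptions"),         // experimental
        PASSIVE("passive", "-XX:+UnlockDiagnosticVMOptions"); // diagnostic

        final String flagValue;
        final String unlockFlag;

        Mode(String flagValue, String unlockFlag) {
            this.flagValue = flagValue;
            this.unlockFlag = unlockFlag;
        }
    }

    // Builds the JVM options for one mode; the unlock flag goes first.
    static List<String> jvmArgs(Mode mode) {
        List<String> args = new ArrayList<>();
        if (mode.unlockFlag != null) {
            args.add(mode.unlockFlag);
        }
        args.add("-XX:+UseShenandoahGC");
        args.add("-XX:ShenandoahGCMode=" + mode.flagValue);
        return args;
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + String.join(" ", jvmArgs(mode)));
        }
    }
}
```

For example, the IU entry yields `-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCMode=iu`. Putting the unlock flag first is the safe ordering for experimental and diagnostic options in general.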
7.2 Basic configuration
Common logging and performance options:
-Xlog:gc (JDK 9+) or -verbose:gc (JDK 8) prints GC timing.
-Xlog:gc+ergo (JDK 9+) or -XX:+PrintGCDetails (JDK 8) logs heuristic decisions.
-Xlog:gc+stats (JDK 9+) or -verbose:gc (JDK 8) prints a summary table of Shenandoah’s internal timings at the end of the run.
-XX:+AlwaysPreTouch : Pre‑touches heap pages to reduce latency spikes.
-Xms and -Xmx set the initial and maximum heap size; setting them to the same value fixes the heap size and eliminates resizing overhead.
-XX:+UseLargePages or -XX:+UseTransparentHugePages enable huge pages for large heaps.
-XX:+UseNUMA : Enables NUMA interleaving on multi‑socket hosts.
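-XX:-UseBiasedLocking : Disables biased locking for latency‑sensitive workloads. This matters only before JDK 15; since JDK 15, biased locking is disabled by default and the option is deprecated.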
-XX:+DisableExplicitGC : Prevents user‑initiated System.gc() from forcing extra GC cycles.
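The options above can be assembled per JDK release. The following sketch uses a hypothetical helper, `lowLatencyFlags`, for three choices. It picks the JDK 8 or JDK 9+ logging syntax, sets -Xms equal to -Xmx, and adds -XX:-UseBiasedLocking only before JDK 15, where biased locking is still on by default. Host‑dependent options such as large pages and NUMA are left out because they depend on the machine.

```java
import java.util.ArrayList;
import java.util.List;

final class ShenandoahFlags {
    // Hypothetical helper: turns the option list above into a command-line flag set.
    static List<String> lowLatencyFlags(int jdkFeature, String heapSize) {
        List<String> flags = new ArrayList<>();
        if (jdkFeature < 15) {
            // Shenandoah was experimental before JDK 15; unlock first (harmless if not needed).
            flags.add("-XX:+UnlockExperimentalVMOptions");
        }
        flags.add("-XX:+UseShenandoahGC");
        if (jdkFeature >= 9) {
            flags.add("-Xlog:gc");       // GC timing
            flags.add("-Xlog:gc+ergo");  // heuristic decisions
            flags.add("-Xlog:gc+stats"); // summary table at the end of the run
        } else {
            flags.add("-verbose:gc");
            flags.add("-XX:+PrintGCDetails");
        }
        flags.add("-Xms" + heapSize);    // equal -Xms and -Xmx: fixed-size heap
        flags.add("-Xmx" + heapSize);
        flags.add("-XX:+AlwaysPreTouch");
        flags.add("-XX:+DisableExplicitGC");
        if (jdkFeature < 15) {
            // Biased locking is already disabled by default from JDK 15 on.
            flags.add("-XX:-UseBiasedLocking");
        }
        return flags;
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature(); // Runtime.Version.feature() needs Java 10+
        System.out.println(String.join(" ", lowLatencyFlags(feature, "4g")));
    }
}
```

On JDK 17 with a 4 GB heap, this yields `-XX:+UseShenandoahGC -Xlog:gc -Xlog:gc+ergo -Xlog:gc+stats -Xms4g -Xmx4g -XX:+AlwaysPreTouch -XX:+DisableExplicitGC`.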
8. Conclusion
This article showed how Shenandoah GC’s concurrent thread‑stack processing in JDK 17 moves stack scanning, the last large piece of work left in its pauses, out of the safepoint, providing reliable sub‑millisecond pauses. For more details, see the Shenandoah GC project’s GitHub repository and the OpenJDK Wiki.
Java Architecture Diary