Understanding ZGC: Architecture, Key Features, and Execution Cycle
ZGC, introduced in JDK 11, is a non‑generational, highly concurrent Java garbage collector that uses colored pointers and load barriers to achieve sub‑10 ms pause times—even on terabyte‑scale heaps—while limiting throughput loss to under 15 % through three brief stop‑the‑world phases and four concurrent phases.
Garbage collection is an unavoidable topic for Java developers. Since JDK 11, Oracle has shipped ZGC, a next‑generation low‑latency collector that aims to support terabyte‑scale heaps, keep pause times under 10 ms (often in the microsecond range), and limit throughput impact to less than 15 %.
ZGC is non‑generational; it divides the heap into pages rather than young and old generations. It is highly concurrent, allowing most GC work to run in parallel with application threads, and it uses a mark‑copy algorithm consisting of a marking phase, a relocation (copy) phase, and a remapping (pointer‑fixup) phase.
The mark‑copy algorithm works as follows: during the marking phase, live objects reachable from GC roots are identified; in the relocation phase, those live objects are copied to new addresses; finally, the remapping phase updates all references to point at the new addresses. Because live objects are compacted as they are copied, this approach prevents heap fragmentation and improves performance compared to traditional collectors such as CMS or G1.
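The three phases can be illustrated with a toy, single‑threaded simulation. This is a sketch only: the class and method names are made up for illustration, and real ZGC works on heap pages with concurrent GC threads rather than on Java objects like this.

```java
import java.util.*;

// Toy single-threaded sketch of a mark-copy collector: mark, relocate, remap.
public class MarkCopyDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Marking phase: everything reachable from the roots is live.
    static Set<Obj> mark(Collection<Obj> roots) {
        Set<Obj> live = new HashSet<>();
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (live.add(o)) stack.addAll(o.refs); // mark once, then scan its fields
        }
        return live;
    }

    // Relocation phase: copy live objects, recording old -> new in a forwarding table.
    static Map<Obj, Obj> relocate(Set<Obj> live) {
        Map<Obj, Obj> forwarding = new HashMap<>();
        for (Obj o : live) forwarding.put(o, new Obj(o.name));
        return forwarding;
    }

    // Remapping phase: rewrite every reference to point at the new copies.
    static void remap(Map<Obj, Obj> forwarding) {
        for (Map.Entry<Obj, Obj> e : forwarding.entrySet())
            for (Obj ref : e.getKey().refs)
                e.getValue().refs.add(forwarding.get(ref));
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), garbage = new Obj("garbage");
        a.refs.add(b);                              // garbage is unreachable
        Set<Obj> live = mark(List.of(a));
        Map<Obj, Obj> fwd = relocate(live);
        remap(fwd);
        System.out.println(live.size() + " live, garbage survived: " + live.contains(garbage));
        // prints: 2 live, garbage survived: false
    }
}
```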
Performance measurements on SPECjbb2015 with a 128 GB heap show that ZGC's pause times are roughly an order of magnitude shorter than G1's while throughput remains comparable.
Two core innovations enable ZGC's low latency: colored pointers and load barriers. A colored pointer embeds metadata in the unused high bits of a 64‑bit pointer to record its state (good or bad) without extra memory accesses. Load barriers, used in place of the write barriers found in other collectors, run whenever a thread loads a reference from the heap and can self‑heal stale pointers by consulting forwarding tables.
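The coloring idea can be sketched in plain Java by treating a `long` as the 64‑bit pointer. Note that the exact bit positions are an assumption here: real ZGC has used different metadata layouts (for example, bits 42 through 45) across JDK versions.

```java
// Sketch of colored-pointer encoding: metadata flags live in high bits of a
// 64-bit "pointer" (a long). Bit positions are illustrative, not ZGC's actual layout.
public class ColoredPointer {
    static final int SHIFT = 42;                  // assumed start of the metadata bits
    static final long MARKED0  = 1L << SHIFT;
    static final long MARKED1  = 1L << (SHIFT + 1);
    static final long REMAPPED = 1L << (SHIFT + 2);
    static final long ADDRESS_MASK = MARKED0 - 1; // low bits hold the raw address

    // Combine a raw address with exactly one color bit.
    static long color(long address, long colorBit) {
        return (address & ADDRESS_MASK) | colorBit;
    }

    // Test a color without any extra memory access: a single bitwise AND.
    static boolean hasColor(long ptr, long colorBit) {
        return (ptr & colorBit) != 0;
    }

    // Strip the metadata bits to recover the raw address.
    static long address(long ptr) {
        return ptr & ADDRESS_MASK;
    }

    public static void main(String[] args) {
        long p = color(0x1000, REMAPPED);
        System.out.println(hasColor(p, REMAPPED) + " " + address(p)); // prints: true 4096
    }
}
```

The payoff is that checking a pointer's state costs one register‑level mask test, with no lookup into a side table.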
Example of a load‑barrier invocation:

var x = obj.field; // loading a reference from the heap triggers the barrier

The underlying implementation distinguishes a fast path (taken when the pointer already carries the good color) from a slow path that computes the correct address and repairs the pointer in place.
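The fast‑path/slow‑path split can be sketched as follows. The bit position, field names, and the stand‑in slow path are all assumptions for illustration; in real ZGC the slow path consults a forwarding table and writes the healed pointer back into the field it was loaded from.

```java
// Illustrative fast-path/slow-path split of a load barrier. "Good" means the
// pointer carries the color expected in the current GC phase.
public class LoadBarrierSketch {
    static long goodColor = 1L << 42;      // the good color flips between GC phases
    static int slowPathCalls = 0;          // instrumentation for the demo

    static long loadBarrier(long ptr) {
        if ((ptr & goodColor) != 0) {
            return ptr;                    // fast path: already good, no extra work
        }
        return slowPath(ptr);              // slow path: remap and self-heal
    }

    static long slowPath(long ptr) {
        slowPathCalls++;
        return ptr | goodColor;            // stand-in for remapping via a forwarding table
    }

    public static void main(String[] args) {
        long bad = 0x2000;
        long healed = loadBarrier(bad);    // first load takes the slow path
        loadBarrier(healed);               // subsequent loads hit the fast path
        System.out.println(slowPathCalls); // prints: 1
    }
}
```

Because the healed value is written back (self‑healing), each stale pointer pays the slow‑path cost at most once.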
Concurrent marking is performed by GC threads while application threads continue running. A simplified version of the concurrent marking loop is shown below:
while (obj in mark_stack) {
    // Mark the object; succeeds only if it was not already marked
    success = mark_obj(obj);
    if (success) {
        for (e in obj->ref_fields()) {
            MarkBarrier(slot_of_e, e);
        }
    }
}
void MarkBarrier(uintptr_t *slot, uintptr_t addr) {
    if (is_null(addr)) return;
    if (is_pointing_into(addr, EC)) {
        // Target sits in an evacuation-candidate page: follow the forwarding table
        good_addr = remap(addr);
    } else {
        // Otherwise just apply the currently good color
        good_addr = good_color(addr);
    }
    mark_stack->add(good_addr);
    self_heal(slot, addr, good_addr);
}
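The `mark_obj` call above must succeed for exactly one GC thread per object, even when several threads reach the same object concurrently. A minimal Java sketch of that idempotent test‑and‑set (names are illustrative, not ZGC internals):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of mark_obj's contract: only the thread that wins the atomic
// unmarked -> marked transition gets to scan the object's reference fields.
public class MarkOnce {
    static class Node {
        final AtomicBoolean marked = new AtomicBoolean(false);
    }

    // Returns true only for the first caller; every later call returns false.
    static boolean markObj(Node n) {
        return n.marked.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        Node n = new Node();
        System.out.println(markObj(n) + " " + markObj(n)); // prints: true false
    }
}
```

This is what keeps the marking loop from scanning the same object's fields twice, no matter how many threads race on it.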
uintptr_t LoadBarrier(uintptr_t *slot, uintptr_t addr) {
    if (is_null(addr)) return addr;
    if (is_pointing_into(addr, EC)) {
        good_addr = remap(addr);
    } else {
        good_addr = good_color(addr);
    }
    mark_stack->add(good_addr);   // during marking, the loaded object is also marked
    self_heal(slot, addr, good_addr);
    return good_addr;
}

ZGC's execution cycle consists of three stop‑the‑world (STW) pauses and four concurrent phases:
Initial Mark (STW1): set up address views, allocate new pages, and mark root objects.
Concurrent Mark (M/R): traverse the object graph in parallel with the application, updating per‑page liveness statistics; stale pointers left over from the previous cycle are also remapped here, which is why the phase is labeled mark/remap (M/R).
Re‑Mark (STW2): finalize marking and handle any remaining objects.
Concurrent Evacuation Preparation (EC): select pages with the most garbage for evacuation.
Initial Evacuation (STW3): adjust address views to the Remapped space and relocate TLAB objects.
Concurrent Evacuation (RE): copy objects from selected pages, update forwarding tables, and free old pages.
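ZGC is enabled with `-XX:+UseZGC` (plus `-XX:+UnlockExperimentalVMOptions` on JDK 11 through 14), and its cycle and pause activity can be observed at runtime through the standard `GarbageCollectorMXBean` API. The bean names exposed under ZGC vary by JDK version (recent releases report names like "ZGC Cycles" and "ZGC Pauses"; treat the exact names as an assumption), so the sketch below simply lists whatever beans the running JVM provides:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Lists the collector beans of the running JVM. Under -XX:+UseZGC these
// describe ZGC's cycles and pauses; under other collectors you will see
// that collector's bean names instead.
public class GcBeans {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
    }
}
```

Running with `-Xlog:gc*` additionally prints per‑phase timings that map directly onto the STW and concurrent phases listed above.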
A simplified relocation routine used during the concurrent evacuation phase is:
for (page in EC) {
    for (obj in page) {
        relocate(obj);
    }
}
uintptr_t relocate(uintptr_t obj) {
    ft = forwarding_tables_get(obj);
    if (ft->exist(obj)) {
        return ft->get(obj);   // already relocated by another thread
    }
    new_obj = copy(obj);
    if (ft->insert(obj, new_obj)) {
        return new_obj;        // this thread won the race to publish the copy
    }
    dealloc(new_obj);          // lost the race: discard the duplicate copy
    return ft->get(obj);
}

In summary, ZGC's highly concurrent design, driven by colored pointers and load barriers, delivers millisecond‑level pause times even for multi‑terabyte heaps, making it a strong candidate for latency‑sensitive Java workloads.
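The race resolution inside `relocate` above, where the losing thread deallocates its duplicate copy, maps naturally onto `ConcurrentHashMap.putIfAbsent` in Java. The sketch below uses made‑up addresses to show the pattern:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of relocate()'s race resolution: if two threads copy the same object
// concurrently, putIfAbsent guarantees exactly one copy is published and the
// loser falls back to the winner's entry. Addresses here are illustrative.
public class ForwardingTable {
    static final ConcurrentHashMap<Long, Long> table = new ConcurrentHashMap<>();

    static long relocate(long obj) {
        Long existing = table.get(obj);
        if (existing != null) return existing;  // already relocated
        long copy = obj + 0x10000;              // stand-in for copying the object
        Long winner = table.putIfAbsent(obj, copy);
        // winner == null means we published our copy; otherwise discard it
        return winner == null ? copy : winner;
    }

    public static void main(String[] args) {
        long first = relocate(0x42);
        long second = relocate(0x42);           // same forwarded address both times
        System.out.println(first == second);    // prints: true
    }
}
```

This lock‑free insert‑or‑lose pattern is what lets many GC threads evacuate pages concurrently without ever handing out two different new addresses for one object.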
DeWu Technology