How Does the JVM’s Three‑Color Marking Algorithm Optimize Garbage Collection?
The article explains JVM garbage‑collection techniques, comparing simple reference‑counting and reachability analysis, then details the three‑color marking algorithm—including its phases, color semantics, step‑by‑step process, and common issues like over‑marking and under‑marking—followed by solutions used in CMS and G1 collectors.
When the JVM performs garbage collection, it must first determine which objects can be reclaimed and which cannot. The following methods are used to decide object survivability:
(1) Reference Counting Each object has a counter that increments when referenced and decrements when a reference is removed. This method is simple and efficient for detection, but cannot handle cyclic references, so it is not widely used.
(2) Reachability Analysis (Reference Chain Method) Starting from GCRoots, the entire reference chain is scanned to find all reachable objects; the remaining objects are considered unreachable garbage.
Reachability analysis is the algorithm currently used to determine object liveness.
1. Why Use the Three‑Color Marking Algorithm
One implementation of reachability analysis starts from GCRoots and uses a mark‑sweep algorithm, which consists of a marking phase and a sweeping phase.
During the marking phase, the entire reference chain is scanned from GCRoots to mark all reachable objects.
In the sweeping phase, the unreachable objects are identified and reclaimed. This approach requires a stop‑the‑world (STW) pause, leading to long GC pauses in earlier collectors.
To address the drawbacks of the mark‑sweep algorithm, the three‑color marking algorithm was introduced and is used by JVM’s CMS and G1 garbage collectors.
2. How the Three‑Color Marking Algorithm Works
The algorithm classifies objects into three colors:
White : Unmarked objects, the default state; these are candidates for reclamation.
Gray : Marked objects whose references have not yet been examined; they are alive, but their referenced objects may still be unmarked.
Black : Objects that have been fully processed; both the object and all objects it references are considered alive.
The execution steps are:
(1) Create three sets (white, gray, black) and initially place all objects in the white set.
(2) Starting from the root nodes, move reachable objects from the white set to the gray set.
(3) Process the gray set: for each gray object, move any reachable white objects it references to the gray set, then move the processed gray object to the black set.
Repeat step (3) until the gray set is empty; the remaining white objects are the garbage to be reclaimed.
The three‑color marking algorithm thus identifies garbage by distinguishing white, gray, and black objects.
3. Problems of the Three‑Color Marking Algorithm
In CMS, concurrent marking can cause over‑marking and under‑marking when user threads modify object references during the marking phase.
(1) Under‑marking occurs when a white object becomes unreachable but is not marked because a black object still references it after being marked.
(2) Over‑marking happens when an object that has become unreachable is still marked gray, leading to it being treated as alive.
4. Solutions to the Three‑Color Marking Issues
Under‑marking requires two conditions: (1) at least one black object points to a white object after being marked, and (2) all gray objects delete references to white objects before their own scanning completes. Breaking either condition prevents under‑marking.
CMS resolves under‑marking with an incremental update scheme using a write‑behind barrier: newly added references are recorded, and after the concurrent phase, those objects are rescanned as gray, requiring a brief STW pause.
G1 addresses under‑marking with an original snapshot approach using a write‑ahead barrier: before a reference is removed, it is saved; after scanning, the saved references are turned gray and rescanned, and new references are tracked with a TAMS pointer.
While the original snapshot can create floating garbage—temporarily reviving objects that should be reclaimed—it is acceptable because the next GC cycle will eventually collect them.
4.3 Over‑marking Issue
During concurrent marking, objects previously marked as alive may lose references and become unreachable, leading to over‑marking. This creates floating garbage but is typically reclaimed in subsequent GC cycles, so the impact is limited.
Summary
(1) The three‑color marking algorithm is an implementation of reachability analysis that identifies all reachable objects.
(2) It improves upon the inefficient mark‑sweep method by using white, gray, and black colors to track object states.
(3) The algorithm can suffer from over‑marking and under‑marking, with under‑marking being the more severe issue because it may reclaim live objects. CMS uses incremental updates, while G1 employs an original snapshot technique to mitigate under‑marking.
Lobster Programming
Sharing insights on technical analysis and exchange, making life better through technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.