Golang Garbage Collection: Algorithms, Memory Management, and Evolution
The article details Go’s runtime memory architecture—including pages, spans, mcache, and size classes—and explains its non‑generational concurrent tri‑color mark‑and‑sweep garbage collector, the required write‑barrier techniques, and the collector’s evolution, phases, trigger mechanisms, and practical code examples across Go 1.0‑1.16.
Modern high‑level languages manage memory automatically. Go (since v1.12) uses a non‑generational, concurrent, tri‑color mark‑and‑sweep garbage collector (GC). The article explains Go's runtime memory layout, allocation structures, GC algorithms, write‑barrier techniques, and the evolution of the collector across Go versions.
1. Go runtime basics
Go schedules work using three entities: G (goroutine), M (OS thread), and P (processor). A goroutine runs only when a G, an M, and a P are combined.
2. Memory management structures (TCMalloc‑inspired)
Page – 8 KB on x64, the basic unit of memory.
Span – a contiguous set of pages; represented by mspan in the runtime.
mcache – per‑P cache for small objects (≤ 32 KB), providing lock‑free allocation.
mcentral – shared cache (similar to TCMalloc’s CentralCache) that supplies spans to mcache.
mheap – heap manager (analogous to PageHeap) that obtains pages from the OS and organizes them into spans.
Stack – each goroutine has its own stack storing locals, arguments, and pointers.
3. Object size classes
Tiny objects: size < 16 B, no pointers, allocated from mcache.
Small objects: 16 B – 32 KB, allocated from the appropriate mspan class.
Large objects: > 32 KB, allocated directly from mheap.
The core idea is multi‑level management to reduce lock contention and fragmentation.
4. Allocation flow
If mcache lacks a free span, it requests one from mcentral; if mcentral is empty, it asks mheap; if mheap lacks pages, it requests new pages (≥ 1 MB) from the OS.
5. Garbage‑collection algorithm
Go uses a tri‑color mark‑and‑sweep algorithm. Objects are colored white (unvisited), gray (reachable but not fully scanned), or black (fully scanned). The collector performs a stop‑the‑world (STW) pause to mark root objects, then proceeds concurrently:
Mark phase: roots are scanned, gray objects are processed, turning them black and turning reachable objects gray.
Mark termination: when the gray set is empty, the collector stops marking.
Sweep phase: unmarked (white) objects are reclaimed.
Because the mutator runs concurrently, write barriers are required to maintain the tri‑color invariant.
6. Write‑barrier techniques
Insert write barrier (Dijkstra, 1978) – shades the newly stored pointer gray before the store:
func DijkstraWritePointer(slot *unsafe.Pointer, ptr unsafe.Pointer) {
shade(ptr) // mark new downstream object gray
*slot = ptr
}
// usage examples
A.AddDownstream(nil, B) // B becomes gray
A.AddDownstream(C, B) // B becomes gray againDelete write barrier (Yuasa, 1990) – shades the old pointer gray when a reference is removed:
func YuasaWritePointer(slot *unsafe.Pointer, ptr unsafe.Pointer) {
shade(*slot) // old object becomes gray
*slot = ptr
}
// usage examples
A.AddDownstream(B, nil) // B becomes gray if it was white
A.AddDownstream(B, C) // B becomes gray againHybrid write barrier (Go 1.8) – combines insert and delete barriers to avoid full stack rescans while keeping the tri‑color invariant:
writePointer(slot, ptr) {
shade(*slot)
if currentStackIsGrey {
shade(ptr)
}
*slot = ptr
}These barriers ensure that newly reachable objects are marked gray and that objects whose last reference is removed are also marked gray, preventing both "floating" garbage and premature reclamation (dangling pointers).
7. Evolution of Go GC (v1.0 → v1.16)
v1.0 – fully stop‑the‑world, serial mark‑and‑sweep.
v1.1 – parallel marking/cleaning on multi‑core.
v1.3 – precise stack scanning.
v1.5 – concurrent tri‑color mark‑sweep, latency < 10 ms.
v1.6 – decentralized GC coordinator, bitmap‑based heap.
v1.7 – parallel stack shrinking, GC pause < 2 ms.
v1.8 – hybrid write barrier, pause < 0.5 ms.
v1.9 – removed full stack rescans.
v1.10 – improved pacing controller.
v1.12 – new mark‑termination algorithm.
v1.13 – Scavenger returns memory to OS.
v1.14 – new page allocator for faster allocation.
v1.15 – compiler changes to reduce write‑barrier overhead.
v1.16 – aggressive MADV_DONTNEED to release unused memory.
8. GC phases in the source (runtime/mgc.go)
Sweep termination – STW, safe‑point entry, clean remaining spans.
Mark phase – switch gcphase to _GCmark, enable write barriers, start concurrent workers, scan roots, process gray queue, use distributed termination detection.
Mark termination – STW, switch to _GCmarktermination, stop workers, perform housekeeping.
Sweep phase – switch to _GCoff, resume program, new allocations start as white, background sweeps free spans.
9. GC trigger conditions
The runtime decides to start a GC via runtime.gcTrigger.test based on three kinds of triggers:
//gcTrigger.test implementation (simplified)
func (t gcTrigger) test() bool {
if !memstats.enablegc || panicking != 0 || gcphase != _GCoff {
return false
}
switch t.kind {
case gcTriggerHeap:
return gcController.heapLive >= gcController.trigger
case gcTriggerTime:
lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
return lastgc != 0 && t.now-lastgc > forcegcperiod
case gcTriggerCycle:
return int32(t.n-work.cycles) > 0
}
return true
}Triggers are invoked from:
runtime.mallocgc – allocation‑driven heap trigger.
runtime.GC – explicit user request.
runtime.forcegchelper – periodic background trigger.
10. Example code snippets
GC‑finished callback:
func gcfinished() *int {
p := 1
runtime.SetFinalizer(&p, func(_ *int) { println("gc finished") })
return &p
}Allocation loop used for tracing:
func allocate() {
_ = make([]byte, int((1<<20)*0.25))
}
func main() {
f, _ := os.Create("trace.out")
defer f.Close()
trace.Start(f)
defer trace.Stop()
gcfinished()
for n := 1; n < 50; n++ {
println("#allocate: ", n)
allocate()
}
println("terminate")
}GC trigger test in mallocgc :
if shouldhelpgc {
if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
gcStart(t)
}
}Manual GC invocation:
func GC() {
n := atomic.Load(&work.cycles)
gcWaitOnMark(n)
gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})
gcWaitOnMark(n + 1)
for atomic.Load(&work.cycles) == n+1 && sweepone() != ^uintptr(0) {
sweep.nbgsweep++
Gosched()
}
for atomic.Load(&work.cycles) == n+1 && !isSweepDone() {
Gosched()
}
// post‑sweep bookkeeping omitted for brevity
}These snippets illustrate how the collector is wired into allocation, how write barriers are implemented, and how GC cycles are started and completed.
11. References
Go Language Design and Implementation (Draveness)
Expert comparison of Go and Java GC algorithms
Go runtime source (runtime/mgc.go)
CMS garbage collector article
Go 1.16 source code repository
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.