Why Does go build -race Crash with Auto‑Instrumentation? Inside Go’s Runtime

This article analyzes why a binary built with the auto-instrumentation command `otel go build -race` crashes, tracing the failure to the injected runtime code, the Go and C calling conventions, and a zeroed race context, and then presents practical fixes to prevent the crash.

Alibaba Cloud Observability

Recently, Alibaba Cloud ARMS, the compiler team, and the MSE team jointly released an open-source compile-time auto-instrumentation tool for Go that provides Java-level monitoring without requiring source changes. Developers replace go build with otel go build to enable full monitoring and governance.

When users replace the normal go build -race with otel go build -race, the generated binary crashes. The -race flag enables the Go race detector, which adds extra checks to detect data races.
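As a reminder of what the race detector checks for, here is a minimal sketch of a shared counter. The function name countWithMutex is made up for illustration; the comment marks the line that would be reported as a data race if the lock were removed.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithMutex increments a shared counter from n goroutines.
// Every access is guarded by the mutex, so a -race build reports nothing.
func countWithMutex(n int) int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		counter int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++ // without Lock/Unlock this write would be a data race
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countWithMutex(100))
}
```

Building this with go build -race makes the compiler add checks around the access to counter, and the same mechanism instruments the code discussed below.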

The crash stack trace shows that the failure originates in __tsan_func_enter, and the frame that matters is runtime.contextPropagate. The tool inserts the following code at the beginning of runtime.newproc1:

func newproc1(fn *funcval, callergp *g, callerpc uintptr) (retVal0 *g) {
    // injected code
    retVal0.otel_trace_context = contextPropagate(callergp.otel_trace_context)
    ...
}

func contextPropagate(tls interface{}) interface{} {
    if tls == nil {
        return nil
    }
    if taker, ok := tls.(ContextSnapshoter); ok {
        return taker.TakeSnapShot()
    }
    return tls
}

func (tc *traceContext) TakeSnapShot() interface{} {
    ...
}

TakeSnapShot is instrumented by the race detector, which inserts calls to racefuncenter() and racefuncexit(). This leads to the following call chain:

racefuncenter (Go) → racecall (Go) → __tsan_func_enter (C)
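Conceptually, an instrumented function looks roughly like the sketch below. raceFuncEnter and raceFuncExit are hypothetical stand-ins written for this example; the real racefuncenter and racefuncexit hooks are emitted by the compiler and cannot be called from user code.

```go
package main

import "fmt"

// trace records the order of hook calls so the example can be checked.
var trace []string

// raceFuncEnter and raceFuncExit are hypothetical stand-ins for the
// runtime's racefuncenter and racefuncexit hooks.
func raceFuncEnter(name string) { trace = append(trace, "enter "+name) }
func raceFuncExit(name string)  { trace = append(trace, "exit "+name) }

// takeSnapshot mimics the shape of an instrumented function: an enter
// hook first, an exit hook on every return path, and the body in between.
func takeSnapshot(values []int) []int {
	raceFuncEnter("takeSnapshot")
	defer raceFuncExit("takeSnapshot")

	trace = append(trace, "body")
	out := make([]int, len(values))
	copy(out, values)
	return out
}

func main() {
	takeSnapshot([]int{1, 2, 3})
	fmt.Println(trace)
}
```

The point is that the enter hook runs before any of the function's own code, so a problem in the hook surfaces even when the body itself is harmless.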

To follow this chain, it helps to recall the Go and C calling conventions on amd64. Under Go's register-based convention, the first nine integer arguments are passed in RAX, RBX, RCX, RDI, RSI, R8, R9, R10, and R11, and R14 holds the current goroutine (g).

Under the System V AMD64 convention (used when Go calls C), the first six integer arguments are passed in RDI, RSI, RDX, RCX, R8, and R9.

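The analysis reveals that g_racectx(R14), the race context that racefuncenter loads from the current goroutine and passes on to __tsan_func_enter, is zero. On amd64, R14 holds the current goroutine, which itself cannot be nil, so the zero must come from its racectx field. Here the current goroutine is g0, and the runtime sets g0.racectx to zero at program start in main: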

// src/runtime/proc.go#main
func main() {
    mp := getg().m
    // g0's racectx is only used as the parent of the main goroutine.
    mp.g0.racectx = 0
    ...
}

Because newproc1 runs on the g0 goroutine, the racefuncenter call inserted into TakeSnapShot reads g0.racectx, which is zero, and passes it to __tsan_func_enter, which then dereferences a null pointer and crashes.
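The failure can be modeled with a toy example. Everything here is a simplification: threadState, the trimmed-down g struct, and funcEnter are hypothetical stand-ins, and the model returns an error where the real C function would crash.

```go
package main

import (
	"errors"
	"fmt"
)

// threadState is a hypothetical stand-in for the race detector's
// per-goroutine state.
type threadState struct{ id int }

// g models only the field that matters here.
type g struct {
	name    string
	racectx *threadState // nil models racectx == 0
}

var errNilRaceCtx = errors.New("race context is zero")

// funcEnter models __tsan_func_enter, which needs a valid thread state.
// The model reports an error instead of dereferencing a null pointer.
func funcEnter(cur *g) error {
	if cur.racectx == nil {
		return errNilRaceCtx
	}
	return nil
}

func main() {
	user := &g{name: "user goroutine", racectx: &threadState{id: 1}}
	g0 := &g{name: "g0"} // racectx left zero, as runtime.main does for g0
	for _, cur := range []*g{user, g0} {
		fmt.Printf("%s: %v\n", cur.name, funcEnter(cur))
	}
}
```

Any instrumented function that runs while g0 is the current goroutine hits the second case, which is the situation the injected code creates inside newproc1.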

One fix is to mark TakeSnapShot with the compiler directive //go:norace, which tells the race detector to ignore memory accesses in that function and so prevents the automatic insertion of racefuncenter(). This alone is not enough, however. The function also initializes and iterates over a map, and the compiler expands those operations into runtime calls such as mapiterinit(), whose race checks are hard-coded and cannot be suppressed with //go:norace. The practical solution is therefore to combine the directive with avoiding map data structures in the code injected into newproc1.
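A minimal sketch of both measures together might look like this. The traceContext layout, the kv type, and the lookup helper are assumptions made for illustration and are not the tool's actual implementation; only the //go:norace directive and the TakeSnapShot name come from the discussion above.

```go
package main

import "fmt"

// kv is one context entry; a slice of kv replaces a map[string]string.
type kv struct{ key, value string }

// traceContext is a hypothetical, simplified version of the article's type.
type traceContext struct {
	entries []kv
}

// TakeSnapShot copies the context without using a map, so the compiler
// emits no map runtime calls such as mapiterinit for it.
//
//go:norace
func (tc *traceContext) TakeSnapShot() interface{} {
	snapshot := &traceContext{entries: make([]kv, len(tc.entries))}
	copy(snapshot.entries, tc.entries)
	return snapshot
}

// lookup does a linear scan, which is adequate for a handful of entries.
func (tc *traceContext) lookup(key string) (string, bool) {
	for _, e := range tc.entries {
		if e.key == key {
			return e.value, true
		}
	}
	return "", false
}

func main() {
	tc := &traceContext{entries: []kv{{"trace_id", "abc"}, {"span_id", "1"}}}
	snap := tc.TakeSnapShot().(*traceContext)
	tc.entries[0].value = "changed" // the snapshot is unaffected
	fmt.Println(snap.lookup("trace_id"))
}
```

The trade-off is a linear lookup instead of a hashed one, which is usually acceptable for the small number of keys a trace context carries.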

The runtime package itself is marked with NoInstrument via the pkgSpecials table, so the compiler skips race instrumentation for its code:

var pkgSpecialsOnce = sync.OnceValue(func() map[string]PkgSpecial {
    for _, pkg := range runtimePkgs {
        set(pkg, func(ps *PkgSpecial) {
            ps.Runtime = true
            ps.NoInstrument = true
        })
    }
    ...
})
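The effect of such a table can be sketched as a simple lookup. The names buildSpecials and shouldInstrument, and the example package path, are hypothetical; the sketch only illustrates the idea of a per-package NoInstrument flag, not the compiler's actual code path.

```go
package main

import "fmt"

// pkgSpecial mirrors just the two flags shown above.
type pkgSpecial struct {
	Runtime      bool
	NoInstrument bool
}

// buildSpecials marks every listed package as a runtime package that
// must not be instrumented.
func buildSpecials(runtimePkgs []string) map[string]pkgSpecial {
	specials := make(map[string]pkgSpecial, len(runtimePkgs))
	for _, pkg := range runtimePkgs {
		specials[pkg] = pkgSpecial{Runtime: true, NoInstrument: true}
	}
	return specials
}

// shouldInstrument reports whether race instrumentation applies to pkgPath.
// Packages absent from the table get the zero value and are instrumented.
func shouldInstrument(pkgPath string, specials map[string]pkgSpecial) bool {
	return !specials[pkgPath].NoInstrument
}

func main() {
	specials := buildSpecials([]string{"runtime"})
	for _, pkg := range []string{"runtime", "example.com/app"} {
		fmt.Printf("%s instrumented: %v\n", pkg, shouldInstrument(pkg, specials))
	}
}
```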

In summary, the crash occurs because the injected contextPropagate calls TakeSnapShot, which the race detector instruments, while running on the g0 goroutine, whose racectx is zero. Adding //go:norace to TakeSnapShot and avoiding map usage in the injected code resolves the issue.
