Go Black Magic: Accessing Runtime Functions and Integrating C/Assembly for Performance
This article demonstrates three Go black‑magic techniques—using go:linkname to call private runtime functions like memmove and growslice, employing cgo to invoke C code, and embedding Plan 9 assembly—to bypass safety checks and achieve significant performance improvements, though they rely on unsafe practices.
This article explores several advanced techniques in Go that allow developers to bypass the language's safety restrictions and achieve higher performance. The focus is on three "black-magic" tricks: calling private runtime functions via go:linkname, invoking C code with cgo, and embedding Plan 9 assembly directly in Go.
1. Calling private runtime functions (memmove)
Go does not export identifiers that begin with a lowercase letter, so such functions are invisible outside their package. With the go:linkname directive, a developer can bind a local, body-less stub to a private runtime function such as runtime.memmove. The article shows the stub declaration and a test that copies a byte slice through the linked memmove.
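package test
import (
	"testing"
	"unsafe" // go:linkname is only accepted in files that import unsafe
)
// private
func abs() {}
// public
func Abs() {}
// memmove is bound to the private runtime.memmove. A body-less declaration
// compiles only if the package also contains a .s file (an empty one suffices).
//go:noescape
//go:linkname memmove runtime.memmove
func memmove(to unsafe.Pointer, from unsafe.Pointer, n uintptr)
// GoSlice mirrors the runtime's slice header.
type GoSlice struct { Ptr unsafe.Pointer; Len int; Cap int }
func Test_memmove(t *testing.T) {
	src := []byte{1, 2, 3, 4, 5, 6}
	dest := make([]byte, 10)
	// Copy 6 bytes from src's backing array into dest's backing array.
	memmove((*GoSlice)(unsafe.Pointer(&dest)).Ptr, (*GoSlice)(unsafe.Pointer(&src)).Ptr, unsafe.Sizeof(byte(0))*6)
	// ... dump results ...
}
The same approach is applied to runtime.growslice, the internal function that the built-in append uses to grow slices. The article defines the necessary Go type representations (GoType, GoEface) and shows how to call growslice via go:linkname to expand a slice manually. Private signatures and the toolchain's tolerance for such references vary between Go releases (the growslice signature shown here matches older runtimes), so check runtime/slice.go for the toolchain in use.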
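For orientation, the effect that growslice has on a slice can be observed from ordinary code by inspecting the slice header before and after an append. The following is only a sketch: it reuses the GoSlice layout from the test above, the helper names header and growOnce are hypothetical, and it does not call any private runtime function.

```go
package main

import (
	"fmt"
	"unsafe"
)

// GoSlice mirrors the runtime's slice header (pointer, length, capacity).
type GoSlice struct {
	Ptr unsafe.Pointer
	Len int
	Cap int
}

// header returns a copy of the slice header of s (hypothetical helper).
func header(s []byte) GoSlice {
	return *(*GoSlice)(unsafe.Pointer(&s))
}

// growOnce appends one byte and returns the header before and after.
func growOnce(s []byte) (before, after GoSlice) {
	before = header(s)
	s = append(s, 0)
	after = header(s)
	return before, after
}

func main() {
	// len == cap: append has to allocate a larger backing array.
	b, a := growOnce(make([]byte, 4))
	fmt.Println(b.Len, b.Cap)                      // 4 4
	fmt.Println(a.Len, a.Cap >= 5, a.Ptr != b.Ptr) // 5 true true

	// Spare capacity: append reuses the existing backing array.
	b, a = growOnce(make([]byte, 4, 8))
	fmt.Println(a.Ptr == b.Ptr, a.Cap) // true 8
}
```

When the slice is full, the data pointer changes and the capacity increases; when spare capacity exists, neither changes. The exact new capacity is an implementation detail, which is why the sketch only checks that it is at least 5.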
type GoType struct {
Size uintptr; PtrData uintptr; Hash uint32; Flags uint8; Align uint8; FieldAlign uint8; KindFlags uint8; Traits unsafe.Pointer; GCData *byte; Str int32; PtrToSelf int32
}
type GoEface struct { Type *GoType; Value unsafe.Pointer }
//go:linkname growslice runtime.growslice
func growslice(et *GoType, old GoSlice, cap int) GoSlice
2. Calling C code with cgo
Using cgo, Go can call arbitrary C functions. The article provides a simple example that calls a C wrapper around sbrk to allocate raw memory, and another example that calls an inline assembly function Add defined in C.
package main
/*
#include <stdlib.h>
#include <unistd.h>
static void* Sbrk(int size) { void *r = sbrk(size); if(r == (void *)-1){ return NULL; } return r; }
*/
import "C"
import "fmt"
func main() {
	mem := C.Sbrk(C.int(100))
	// Memory obtained from sbrk did not come from malloc, so it must not be passed to C.free.
	fmt.Println(mem)
}
cgo adds overhead to every call and prevents some Go optimisations, such as inlining across the call boundary; the article links to a discussion of its drawbacks.
3. Embedding Plan 9 assembly (isspace, u32toa_small)
To avoid cgo's performance penalty, the article shows how to compile C code to AT&T-style assembly with clang, convert it to Plan 9 assembly using the asm2asm tool, and then link it with Go. A simple isspace function and a more complex integer-to-string converter, u32toa_small, are used as examples.
// ./inner/op.h
#ifndef OP_H
#define OP_H
char isspace(char ch);
#endif
// ./inner/op.c
#include "op.h"
char isspace(char ch) { return ch == ' ' || ch == '\r' || ch == '\n' || ch == '\t'; }
The Go side declares a body-less stub for the generated routine:
//go:nosplit
//go:noescape
func __isspace(ch byte) (ret byte)
After conversion, the generated Plan 9 assembly is linked and tested with Go unit tests, confirming correct behaviour.
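The article's unit tests are not shown here. One way such a check could be set up is to compare the assembly routine against a pure-Go reference with the same logic as the C source. The sketch below contains only that reference and a small table of cases; the name isSpaceRef is hypothetical, and the call to __isspace is left out so that the example runs without the generated assembly.

```go
package main

import "fmt"

// isSpaceRef is a hypothetical pure-Go reference that mirrors the C isspace
// above: it returns 1 for space, \r, \n and \t, and 0 for anything else.
func isSpaceRef(ch byte) byte {
	if ch == ' ' || ch == '\r' || ch == '\n' || ch == '\t' {
		return 1
	}
	return 0
}

func main() {
	cases := []struct {
		in   byte
		want byte
	}{
		{' ', 1}, {'\r', 1}, {'\n', 1}, {'\t', 1},
		{'a', 0}, {'0', 0}, {0, 0},
	}
	for _, c := range cases {
		got := isSpaceRef(c.in)
		fmt.Printf("%q -> %d (want %d)\n", c.in, got, c.want)
	}
}
```

In a real test, each table entry would also be passed to __isspace and the two results compared.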
//go:nosplit
//go:noescape
func __u32toa_small(out *byte, val uint32) (ret int)
Benchmarks compare the standard library strconv.Itoa with the custom __u32toa_small. The custom implementation roughly halves the execution time (≈9 ns/op vs ≈19 ns/op).
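To make the comparison concrete, here is a pure-Go sketch of the contract that the stub's signature suggests: write the decimal digits of val into a caller-supplied buffer and return the number of bytes written. That contract is an assumption inferred from the signature, the name u32toaRef is hypothetical, and the sketch takes a slice rather than a raw *byte to stay within safe Go.

```go
package main

import (
	"fmt"
	"strconv"
)

// u32toaRef writes the decimal digits of val into out and returns how many
// bytes were written. out should hold at least 10 bytes, since the largest
// uint32 (4294967295) has 10 digits.
func u32toaRef(out []byte, val uint32) int {
	var tmp [10]byte
	i := len(tmp)
	for {
		i--
		tmp[i] = byte('0' + val%10) // least significant digit first
		val /= 10
		if val == 0 {
			break
		}
	}
	return copy(out, tmp[i:])
}

func main() {
	buf := make([]byte, 10)
	for _, v := range []uint32{0, 42, 9999} {
		n := u32toaRef(buf, v)
		// Compare against the standard library conversion.
		fmt.Println(string(buf[:n]), string(buf[:n]) == strconv.Itoa(int(v)))
	}
}
```

Unlike strconv.Itoa, a routine with this shape does not allocate a new string for each call, which is one plausible source of the difference in the measurements.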
The reported benchmark results:
BenchmarkGoConv-12      60740782    19.52 ns/op
BenchmarkFastConv-12   122945924    9.455 ns/op
Conclusion
These techniques (linking private runtime functions, calling C through cgo, and embedding assembly generated from C) can provide substantial performance gains, as the benchmark above shows, and are already employed in production (e.g., ByteDance's sonic library). While they rely on unsafe and are not encouraged for everyday code, they are valuable tools when the standard library cannot meet strict performance requirements.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.