Why Strict Aliasing Matters: Deep Dive into LLVM TBAA and TypeSanitizer
This article explains the Strict Aliasing Rule and how compilers use Type‑Based Alias Analysis (TBAA) for optimization, walks through LLVM’s metadata‑based implementation, introduces the TypeSanitizer tool for detecting aliasing violations, and offers practical guidance for avoiding common pitfalls in C/C++ code.
Strict Aliasing Rule Overview
The Strict Aliasing Rule is a language rule that lets the compiler assume pointers to incompatible types never refer to the same memory, which allows it to omit redundant loads. Code that accesses an object through an incompatible type violates the rule and has undefined behavior.
Example Demonstrating the Rule
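<code>#include <stdio.h>
#include <math.h>

float i_am_clever(unsigned int *i, float *f) {
    if (!isnan(*f))
        *i ^= 1u << 31; // flip the sign bit through the unsigned int pointer
    return *f; // Do we need to load *f again here?
}

int main() {
    float f = 5;
    // Undefined behavior: the float object is modified through an unsigned int lvalue.
    f = i_am_clever((unsigned int *)&f, &f);
    printf("%f\n", f);
}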
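</code>
Compiled with -O3, both GCC and Clang print 5.000000. With -O0 or -fno-strict-aliasing, the result is -5.000000. The outputs differ because the code violates the Strict Aliasing Rule: a float object is modified through an unsigned int lvalue, which is undefined behavior, so the optimizer may assume the store through i leaves *f unchanged and return the value it already loaded.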
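To make the -O3 result easier to picture, the following is a hand-written sketch of what the optimizer is permitted to turn the function into. The name i_am_clever_as_optimized and the demo helper are hypothetical, and the code assumes a 32-bit unsigned int; real compiler output may differ.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical hand-written equivalent of the optimized function:
// *f is loaded once and the cached value is returned.
float i_am_clever_as_optimized(unsigned int *i, float *f) {
    float cached = *f;          // the only load of *f
    if (!std::isnan(cached))
        *i ^= 1u << 31;         // assumed not to modify *f
    return cached;              // no reload after the store
}

// Well-defined usage: the two pointers refer to distinct objects.
float demo_distinct_objects() {
    unsigned int bits = 0;
    float value = 5.0f;
    float result = i_am_clever_as_optimized(&bits, &value);
    assert(bits == 0x80000000u); // the store through i still happened
    return result;
}
```

When the two pointers really do refer to different objects, this version and the original behave identically, which is why the transformation is legal for every program that obeys the rule.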
C++ Standard Specification
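The C++ standard ([basic.lval]) lists the types through which an object may be accessed without causing undefined behavior. In C++17 the list includes the object's dynamic type, cv‑qualified versions of it, similar types, the corresponding signed or unsigned types, aggregate or union types containing one of those types, base class types, and char, unsigned char, and std::byte. C++20 shortens the list, but the core allowances remain: similar types, the signed or unsigned counterparts, and char, unsigned char, or std::byte.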
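As a minimal sketch of two accesses from that list, the helpers below read an object's bytes through unsigned char and read an int through its unsigned counterpart. The function names are hypothetical, and the sketch assumes a 32-bit float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Allowed: inspect the bytes of an object through unsigned char.
std::uint32_t bits_via_bytes(const float &f) {
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(&f);
    unsigned char copy[sizeof(float)];
    for (std::size_t k = 0; k < sizeof(float); ++k)
        copy[k] = bytes[k];
    std::uint32_t out;
    std::memcpy(&out, copy, sizeof out); // reassemble the bytes into an integer
    return out;
}

// Allowed: access an int through the corresponding unsigned type.
unsigned int read_as_unsigned(int &x) {
    return *reinterpret_cast<unsigned int *>(&x);
}

// Small usage check for the signed/unsigned case.
void usage() {
    int x = -1;
    assert(read_as_unsigned(x) == ~0u);
}

// Not allowed (left as a comment on purpose): reading a float through unsigned int.
//   unsigned int bad = *reinterpret_cast<unsigned int *>(&some_float);
```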
Type‑Based Alias Analysis (TBAA)
TBAA is a classic alias‑analysis algorithm that decides, from the static types of two memory accesses, whether they may refer to the same memory. Compilers such as Clang enable TBAA at optimization levels above -O0; passing -fno-strict-aliasing disables it.
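A small sketch of the distinction TBAA draws, using two hypothetical functions: in the first, the pointee types differ, so the compiler may assume the second store cannot change *a; in the second, both pointers have the same pointee type, so *a has to be reloaded.

```cpp
#include <cassert>

// Different pointee types: under TBAA the compiler may assume *b cannot
// overwrite *a and may fold the return value to 1 without reloading.
int store_int_then_float(int *a, float *b) {
    *a = 1;
    *b = 2.0f;
    return *a;
}

// Same pointee type: a and b may alias, so *a must be reloaded.
int store_int_then_int(int *a, int *b) {
    *a = 1;
    *b = 2;
    return *a;
}

// Usage with well-defined inputs only.
void usage() {
    int x = 0, y = 0;
    float z = 0.0f;
    assert(store_int_then_float(&x, &z) == 1);
    assert(store_int_then_int(&x, &y) == 1);
    assert(store_int_then_int(&x, &x) == 2); // legal aliasing between two int pointers
}
```

Whether a given compiler actually removes the reload depends on its version and flags; inspecting the generated assembly is the reliable way to check.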
LLVM Implementation of TBAA
LLVM implements TBAA through metadata. The front end attaches !tbaa metadata to load and store instructions to describe the high‑level type of each access. Example IR:
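<code>%struct.S1 = type { i32, i64 }
%struct.S2 = type { float, double, %struct.S1 }

define i32 @_Z3fooP2S1P2S2(ptr %p1, ptr %p2) {
entry:
  store i32 1, ptr %p1, align 8, !tbaa !5      ; int access at offset 0
  %s = getelementptr inbounds %struct.S2, ptr %p2, i64 0, i32 2
  store i32 2, ptr %s, align 8, !tbaa !11      ; int access at offset 16 of S2
  %0 = load i32, ptr %p1, align 8, !tbaa !5    ; same tag as the first store
  ret i32 %0
}

; Access tags have the form !{base type, access type, offset}.
!5 = !{!6, !6, i64 0}
!6 = !{!"int", !7, i64 0}              ; scalar type node whose parent is !7
!7 = !{!"omnipotent char", !8, i64 0}  ; char may alias any type
!8 = !{!"Simple C++ TBAA"}             ; root of the type tree
!11 = !{!12, !6, i64 16}
; Struct node for S2: (member type, offset) pairs. !13 and !14 are not shown in this excerpt.
!12 = !{!"_ZTS2S2", !13, i64 0, !14, i64 8, !6, i64 16}
</code>
TypeSanitizer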
TypeSanitizer (TySan) is a dynamic analysis tool that detects type‑based aliasing violations at runtime. It consists of three parts: shadow mapping, compile‑time instrumentation, and a runtime library.
Shadow Mapping
<code>MemToShadow(addr) = (addr & SHADOW_MASK) * sizeof(void*) + SHADOW_OFFSET</code>
Each byte of application memory maps to a pointer‑sized shadow entry that records the type descriptor for that byte, so the shadow region is sizeof(void*) times as large as the memory it covers.
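The mapping can be tried out with a short sketch. The mask and offset values below are hypothetical, picked only to make the arithmetic concrete; they are not the constants TypeSanitizer uses on any particular platform.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout constants, chosen only for illustration.
constexpr std::uint64_t kShadowMask   = 0x00000FFFFFFFFFFFull; // keep the low 44 address bits
constexpr std::uint64_t kShadowOffset = 0x0000100000000000ull; // where the shadow region starts
constexpr std::uint64_t kSlotSize     = sizeof(void *);        // one pointer-sized slot per byte

// Same formula as above, written over integer addresses.
constexpr std::uint64_t MemToShadow(std::uint64_t addr) {
    return (addr & kShadowMask) * kSlotSize + kShadowOffset;
}

void usage() {
    // Adjacent application bytes land in adjacent pointer-sized slots.
    assert(MemToShadow(0x1001) - MemToShadow(0x1000) == kSlotSize);
    // A 4-byte access therefore spans 4 consecutive slots.
    assert(MemToShadow(0x1004) - MemToShadow(0x1000) == 4 * kSlotSize);
}
```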
Instrumentation
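<code>// Simplified pseudocode: the pass inserts a runtime check before each memory access.
void instrumentMemoryAccess(Instruction *I) {
    MemoryLocation ML = MemoryLocation::get(I);
    void *Ptr = ML.Ptr;                         // address being accessed
    uint64_t AccessSize = ML.Size.getValue();   // access width in bytes
    void *TD = TypeDescriptor(ML.AATags.TBAA);  // descriptor derived from the !tbaa tag
    __tysan_check(Ptr, AccessSize, TD);         // call into the runtime library
}
</code>
Runtime Library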
<code>void __tysan_check(void *Ptr, uint64_t AccessSize, void *TD) {
    void **Shadow = MemToShadow(Ptr);
    // The first shadow slot holds the type recorded for the start of the access.
    void *ShadowTD = Shadow[0];
    if (!isAliasingLegal(TD, ShadowTD))
        reportError();
    // The remaining bytes of the access must not belong to an incompatible type either.
    for (uint64_t i = 1; i < AccessSize; ++i) {
        ShadowTD = Shadow[i];
        if (ShadowTD && !isAliasingLegal(TD, ShadowTD))
            reportError();
    }
}
</code>
Case Study
In the earlier i_am_clever example, TypeSanitizer detects the violation when the float object f is accessed through an unsigned int *, an access the compiler is entitled to assume never touches a float. The runtime reports an error with details about the offending memory address and the types involved.
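The detection can be imitated with a toy model. This is only a sketch under simplifying assumptions: ToyType, ToyShadow, and case_study_is_flagged are hypothetical names, the shadow is an ordinary std::map rather than a mapped memory region, and compatibility is reduced to "same type or char" instead of walking the TBAA type tree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical type descriptors: plain tags are enough for this sketch.
enum class ToyType { Unknown, Float, UnsignedInt, Char };

// Toy shadow memory: one recorded type per application byte address.
class ToyShadow {
public:
    // Returns true if the access agrees with the types recorded so far.
    bool access(const void *ptr, std::size_t size, ToyType type) {
        auto base = reinterpret_cast<std::uintptr_t>(ptr);
        bool ok = true;
        for (std::size_t k = 0; k < size; ++k) {
            ToyType &slot = shadow_[base + k]; // new entries start as Unknown
            if (slot == ToyType::Unknown)
                slot = type;                   // first access records the type
            else if (!compatible(type, slot))
                ok = false;                    // a real tool would report here
        }
        return ok;
    }

private:
    // Simplified rule: identical types match, and char may access anything.
    static bool compatible(ToyType access, ToyType recorded) {
        return access == recorded || access == ToyType::Char;
    }
    std::map<std::uintptr_t, ToyType> shadow_;
};

// Replays the two accesses from the case study against the toy shadow.
bool case_study_is_flagged() {
    ToyShadow shadow;
    float f = 5;
    bool as_float = shadow.access(&f, sizeof f, ToyType::Float);          // records float
    assert(as_float);
    bool as_unsigned = shadow.access(&f, sizeof f, ToyType::UnsignedInt); // type mismatch
    return !as_unsigned;
}
```

The model never dereferences the pointer through the wrong type; it only compares the recorded tag with the tag of the new access, which is the essence of the check.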
Avoiding Strict Aliasing Violations
The most common violation is type punning through pointer casts or unions. The safe way in C++ is to copy the bytes with std::memcpy, which reinterprets the bits without breaking the aliasing rule.
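<code>#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <cstring>
void func(double d) {
    static_assert(sizeof(std::int64_t) == sizeof(double), "sizes must match");
    std::int64_t n;
    std::memcpy(&n, &d, sizeof d); // OK: copies the object representation
    std::printf("%" PRId64 "\n", n);
}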
</code>
ByteDance SYS Tech
Focused on system technology, sharing cutting‑edge developments, innovation and practice, and analysis of industry tech hotspots.