
Crash Analysis of C++ LooperObserverMan: Memory Reordering and Segmentation Fault

The crash in asl::LooperObserverMan::notifyIdle was traced to compiler‑level store‑store reordering that published a list node before its observer pointer was initialized, causing an illegal memory access. The fix is to enforce the initialization order with a memory barrier or proper atomic ordering.

Amap Tech

This article replays a crash analysis step by step. Along the way it reviews C++ polymorphism, class memory layout, the program counter (PC) and CPU exception handling, and memory‑barrier concepts.

1.1 View Crash Call Stack

(gdb) bt
#0  0x0000000078432d68 in asl::LooperObserverMan::notifyIdle(this=<optimized out>, looper=0x160eebd40, delay_queue_size=0) at ../../../../src/asl_message_framework/src/BaseMessageLooper.cpp:371
#1  0x00000000784928e4 in asl::MessageQueue::fetchNext(this=<this@entry>=0x160eedfc0, timing=@0xf4e9f60: 0) at ../../../../src/asl_message_framework/src/MessageQueue.cpp:83
#2  0x0000000078492b24 in asl::MessageQueue::next(this=0x160eedfc0, timing=@0xf4e9f60: 0) at ../../../../src/asl_message_framework/src/MessageQueue.cpp:60
#3  0x000000007832036c in asl::Looper::loop(this=0x160eebd40) at ../../../../src/asl_message_framework/src/Looper.cpp:107
#4  0x0000000078495ee0 in asl::MessageThread::run(this=0x7998e678) at ../../../../src/asl_message_framework/src/MessageThread.cpp:56
#5  0x000000007851cc70 in asl::Thread::runCallback(param=0x7998e678) at ../../../../src/asl_message_framework/src/Thread.cpp:183
#6  0x0000000010314e0 in ?? ()

The crash occurs in asl::LooperObserverMan::notifyIdle() at line 371 of BaseMessageLooper.cpp. The source code (simplified) is shown below:

// ... source code snippet ...

1.2 Unexpected Segmentation Fault Location

The crash is reported as a segmentation fault, which usually means an access to an illegal address. The initial suspicion is that the node->observer pointer is either null or dangling.

2.1 Zoom In on the Source with Assembly

One line of C++ can expand to many assembly instructions. The disassembly of notifyIdle is:

(gdb) disas
Dump of assembler code for function asl::LooperObserverMan::notifyIdle(asl::IMessageLooper*, int):
0x0000000078432d30: stp x19, x20, [sp,#-48]!
0x0000000078432d34: stp x21, x22, [sp,#16]
0x0000000078432d38: str x30, [sp,#32]
0x0000000078432d3c: ldr x19, [x0]
0x0000000078432d40: cbz x19, 0x78432d8c
0x0000000078432d44: mov x22, x1
0x0000000078432d48: mov w21, w2
0x0000000078432d4c: adrp x20, 0x786b0000
0x0000000078432d50: b 0x78432d60
0x0000000078432d54: nop
0x0000000078432d58: ldr x19, [x19,#8]
0x0000000078432d5c: cbz x19, 0x78432d8c
0x0000000078432d60: ldr x0, [x19]
0x0000000078432d64: ldr x1, [x20,#1160]
=> 0x0000000078432d68: ldr x2, [x0]
0x0000000078432d6c: ldr x3, [x2,#56]
0x0000000078432d70: cmp x3, x1
0x0000000078432d74: b.eq 0x78432d58
0x0000000078432d78: mov w2, w21
0x0000000078432d7c: mov x1, x22
0x0000000078432d80: blr x3
0x0000000078432d84: ldr x19, [x19,#8]
0x0000000078432d88: cbnz x19, 0x78432d60
0x0000000078432d8c: ldp x21, x22, [sp,#16]
0x0000000078432d90: ldr x30, [sp,#32]
0x0000000078432d94: ldp x19, x20, [sp],#48
0x0000000078432d98: ret
End of assembler dump.

The crash happens at the instruction marked with =>, ldr x2, [x0]. Register x0 holds the value 0x2e002e, which is not an accessible address:

(gdb) i register x0
x0 0x2e002e 3014702
(gdb) x 0x2e002e
0x2e002e: Cannot access memory at address 0x2e002e

Thus the direct cause of the crash is an illegal memory access.

3.1 Analyze Assembly for Clues

Examining the two instructions before the fault, together with the faulting instruction itself:

0x0000000078432d60 <+48>: ldr x0, [x19]
0x0000000078432d64 <+52>: ldr x1, [x20,#1160]
=> 0x0000000078432d68 <+56>: ldr x2, [x0]

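In the core dump, register x19 (value 0x17bb988e0) points to a valid ObserverNode whose observer member is 0x7998e758, and that observer's v‑table is also valid, so the node as it appears in the dump is not corrupted. Yet x0, which was loaded from that same field two instructions before the fault, holds 0x2e002e. The field must therefore have held a different value at the moment it was read.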
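To make the mapping between the loop and these loads easier to follow, here is a minimal sketch of how a virtual call through a list node is lowered. The original source is elided, so the onIdle method name, the empty IMessageLooper stand-in, and the CountingObserver helper are assumptions made for illustration; only the node layout (observer at offset 0, next at offset 8, 0x10 bytes in total) is taken from the listings in this article.

```cpp
#include <cassert>
#include <cstddef>

struct IMessageLooper {};  // opaque stand-in for asl::IMessageLooper

// Hypothetical observer interface: a polymorphic object stores its
// v-table pointer in its first bytes (Itanium C++ ABI, used on AArch64).
struct Observer {
    virtual ~Observer() = default;
    virtual void onIdle(IMessageLooper* looper, int delayQueueSize) = 0;
};

struct ObserverNode {
    Observer* observer;   // offset 0 -> ldr x0,  [x19]
    ObserverNode* next;   // offset 8 -> ldr x19, [x19,#8]
};
static_assert(offsetof(ObserverNode, observer) == 0, "observer comes first");
static_assert(offsetof(ObserverNode, next) == sizeof(void*), "next follows it");

// Illustrative observer that counts how often it was notified.
struct CountingObserver : Observer {
    int calls = 0;
    void onIdle(IMessageLooper*, int) override { ++calls; }
};

// Walks the list and returns the number of observers notified.
int notifyIdle(ObserverNode* head, IMessageLooper* looper, int delayQueueSize) {
    int notified = 0;
    for (ObserverNode* node = head; node != nullptr; node = node->next) {
        // Each virtual call performs three dependent loads before branching:
        //   1. node->observer          (ldr x0, [x19])
        //   2. the v-table pointer     (ldr x2, [x0])   <- faulting load
        //   3. the v-table slot        (ldr x3, [x2,#56])
        node->observer->onIdle(looper, delayQueueSize);
        ++notified;
    }
    return notified;
}
```

If step 1 returns a garbage pointer, step 2 is the first instruction that dereferences it, which matches the faulting address in the dump.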

3.2 Suspected Causes

Two hypotheses were considered:

Memory corruption (dangling pointer).

Use of an uninitialized variable.

Further source inspection shows that new_node->observer = observer is written before _observers = new_node. In program order, then, the observer is initialized before the list head becomes visible to other threads, so the source alone does not explain how a reader could see an uninitialized observer.

3.3 Reorder Possibility

Because the two assignments are independent from the compiler's point of view, it is free to reorder them. The code generated for the QNX platform, shown here as decompiled pseudo‑C, confirms the reorder:

pOVar1 = (ObserverNode *)operator.new(0x10); // allocate new_node
this->_observers = pOVar1;                     // _observers = new_node
pOVar1->observer = observer;                 // new_node->observer = observer
pOVar1->next = (ObserverNode *)0x0;           // new_node->next = NULL

Thus a reading thread can see a non‑null _observers while observer is still uninitialized, leading to the observed crash.
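The interleaving can be replayed deterministically in a single thread. The sketch below is only an illustration of the ordering, not the original reproducer: the function name replayReorderedPublish and the Trace struct are made up for this example, and the stale value stands in for whatever bytes happened to be left in the freshly allocated node.

```cpp
#include <cassert>
#include <cstdint>

// Simplified node: the observer pointer is kept as an integer so that a
// stale bit pattern can be stored without ever being dereferenced.
struct Node {
    std::uintptr_t observer;
    Node* next;
};

struct Trace {
    std::uintptr_t seenByReader;   // what the reader loaded into its register
    std::uintptr_t inMemoryAfter;  // what a later core dump would show
};

// Replays the reordered store sequence with the reader running in the gap.
Trace replayReorderedPublish(std::uintptr_t stale, std::uintptr_t real) {
    Node storage{stale, nullptr};  // "new" memory still holding leftover bytes
    Node* head = nullptr;

    head = &storage;                       // writer, step 1: list head published
    std::uintptr_t seen = head->observer;  // reader: loads the stale field
    storage.observer = real;               // writer, step 2: observer initialized
    storage.next = nullptr;                // writer, step 3: next initialized

    return Trace{seen, head->observer};
}
```

Called with the values from this crash, replayReorderedPublish(0x2e002e, 0x7998e758) yields a reader value of 0x2e002e while memory afterwards holds 0x7998e758, the same mismatch between x0 and the node contents seen in the core dump.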

3.4 Demo Verification

A minimal reproducer was built with a writer thread calling addObserver and a reader thread calling notifyIdle. Stress testing on the client side produced 10 errors out of 217 258 runs, confirming the race.

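Adding an explicit compiler barrier prevented the reorder. Note that an empty asm statement with a "memory" clobber only constrains the compiler; it emits no CPU barrier instruction, so on weakly ordered hardware such as ARM a hardware barrier or a release store is still needed to rule out reordering by the processor: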

bool LooperObserverMan::addObserver(Observer * observer) {
    // ...
    ObserverNode * new_node = new ObserverNode();
    new_node->next = NULL;
    new_node->observer = observer;
    __asm__ __volatile__("":::"memory"); // compiler barrier: memory accesses may not be moved across this point
    if (node == NULL)
        _observers = new_node;
    else
        node->next = new_node;
    return true;
}

After inserting the barrier, the assembly shows the correct order (observer set before the list head is published), and the crash disappears.

4. Final Conclusion

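The immediate cause is an illegal memory access: notifyIdle read node->observer before it was initialized. The root cause is compiler‑level store‑store reordering in the addObserver implementation, which published the node before filling it in. Enforcing the order (a compiler barrier against compiler reordering, plus a hardware barrier or release/acquire atomics on weakly ordered CPUs), or redesigning the code to avoid such lock‑free patterns, resolves the issue.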
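As one possible redesign, the publication can be expressed with std::atomic release/acquire ordering, which constrains both the compiler and the CPU. This is a minimal sketch under stated assumptions, not the fix that was shipped: it assumes a single writer thread, never removes nodes, and the ObserverList class, the parameterless onIdle, and CountingObserver are hypothetical names chosen for the example.

```cpp
#include <atomic>
#include <cassert>

struct Observer {  // hypothetical interface
    virtual ~Observer() = default;
    virtual void onIdle() = 0;
};

struct CountingObserver : Observer {  // illustrative helper
    int calls = 0;
    void onIdle() override { ++calls; }
};

struct ObserverNode {
    Observer* observer = nullptr;
    std::atomic<ObserverNode*> next{nullptr};
};

class ObserverList {
public:
    ObserverList() = default;
    ObserverList(const ObserverList&) = delete;
    ObserverList& operator=(const ObserverList&) = delete;

    ~ObserverList() {
        ObserverNode* node = head_.load(std::memory_order_relaxed);
        while (node != nullptr) {
            ObserverNode* next = node->next.load(std::memory_order_relaxed);
            delete node;
            node = next;
        }
    }

    // Single writer assumed: appends a fully initialized node to the tail.
    bool addObserver(Observer* observer) {
        if (observer == nullptr) return false;
        ObserverNode* newNode = new ObserverNode;
        newNode->observer = observer;  // initialize first ...

        std::atomic<ObserverNode*>* link = &head_;
        while (ObserverNode* cur = link->load(std::memory_order_relaxed)) {
            link = &cur->next;  // find the last link
        }
        // ... then publish. The release store keeps the initialization
        // above from being reordered after the pointer becomes visible.
        link->store(newNode, std::memory_order_release);
        return true;
    }

    // Reader: each acquire load pairs with the writer's release store, so a
    // node that is visible is also fully initialized.
    int notifyIdle() const {
        int notified = 0;
        for (ObserverNode* node = head_.load(std::memory_order_acquire);
             node != nullptr;
             node = node->next.load(std::memory_order_acquire)) {
            node->observer->onIdle();
            ++notified;
        }
        return notified;
    }

private:
    std::atomic<ObserverNode*> head_{nullptr};
};
```

In this sketch one thread may call addObserver while another calls notifyIdle. Supporting multiple writers or node removal would need additional synchronization, for example a mutex around the list.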

Knowledge Recap

C++ polymorphism and class memory layout.

ARM PC/ELR handling for synchronous data‑abort exceptions.

Memory‑ordering guarantees and the need for explicit barriers in lock‑free code.

Takeaways

When the high‑level code does not reveal a bug, inspecting the generated assembly shows what the compiler actually emitted and can expose the cause.

Lock‑free designs must consider compiler and CPU reordering; use std::atomic or memory barriers to enforce ordering.

Understanding the platform’s exception model (e.g., ARM ELR) helps pinpoint the exact faulting instruction.

