Understanding and Solving Android UI Jank and ANR with MDAP LooperMonitor
The article explains how Android UI jank and ANR stem from the Looper‑MessageQueue, and introduces MDAP’s LooperMonitor—a low‑overhead tool that records each message’s target, timing and stack, aggregates data across multiple loopers, visualizes it, and enables developers to quickly pinpoint and resolve the exact cause of stalls.
Jank and ANR (Application Not Responding) are critical performance problems for Android client developers. Jank manifests as dropped frames, stuttered scrolling, or slow touch response, while ANR appears when the main thread is blocked for several seconds, causing the system to show an "Application not responding" dialog.
Both issues stem from the Android Looper‑MessageQueue mechanism. The Looper continuously pulls Message objects from the MessageQueue and dispatches them to their targets. UI rendering, system component scheduling, and developer‑defined Handler messages are all processed through this loop.
The UI rendering path is driven by Choreographer, which requests a VSync signal from the hardware, creates a doFrame message, and posts it to the main thread queue. A doFrame message must finish within one VSync interval (≈16 ms on a 60 Hz display) to keep the device at full frame rate. If the message exceeds the interval, the next VSync cannot be requested in time, a frame is dropped, and the app feels sluggish.
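The frame budget arithmetic can be sketched as follows. This is a simplified model, not Choreographer's actual accounting: it assumes one frame is lost for every VSync interval that elapses while doFrame is still running. The class and method names are hypothetical.

```java
// Hypothetical helper: estimates how many frames one slow doFrame message costs.
final class FrameBudget {
    /** Length of one VSync interval in milliseconds for the given refresh rate. */
    static double vsyncIntervalMs(int refreshRateHz) {
        return 1000.0 / refreshRateHz;
    }

    /** Number of VSync intervals missed by a doFrame message that ran for durationMs. */
    static int droppedFrames(double durationMs, int refreshRateHz) {
        double interval = vsyncIntervalMs(refreshRateHz);
        if (durationMs <= interval) {
            return 0; // finished inside its own interval, nothing dropped
        }
        // The message occupies ceil(duration / interval) intervals; all but the
        // first one are intervals in which no new frame could be produced.
        return (int) Math.ceil(durationMs / interval) - 1;
    }
}
```

For example, FrameBudget.droppedFrames(40.0, 60) evaluates to 2, while a 12 ms doFrame drops nothing at 60 Hz but one frame at 120 Hz, where the budget shrinks to about 8.3 ms.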
ANR occurs when a critical system message (e.g., service creation, input dispatch) cannot be processed within a predefined timeout (5 s for input, up to 20 s for foreground services). The system arms this timeout, often described as planting a "bomb", when it dispatches the request to the app. If the app does not finish handling the message in time to defuse it, the bomb fires and the system reports an ANR.
Most APM SDKs capture ANR by listening for the ANR signal and dumping the current thread stack. This post‑mortem approach often yields stacks that point to generic locations like MessageQueue.nativePollOnce, because by the time the dump is taken the slow message may already have finished and the main thread is idle again. Such stacks are not helpful for root‑cause analysis.
MDAP (Multi‑Dimension‑Analysis‑Platform) introduces LooperMonitor, a monitoring component that records the execution history of every message in the Looper for a configurable time window (e.g., 500 ms for jank, 10 s for ANR). The monitor captures:
Message target (the Handler instance)
Message callback (the Runnable if posted directly)
Message what (system message type)
Start and end timestamps, wall‑time and CPU‑time
Pending messages that have not yet been dispatched
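A per-message record holding the dispatched-message fields listed above might look like the sketch below. The class name, field names, and the use of plain strings for the Handler and Runnable are assumptions made so the example runs on a plain JVM; this is not MDAP's actual schema, and pending messages would need a separate snapshot of the queue.

```java
// Hypothetical record type; names are illustrative, not MDAP's actual schema.
final class MessageRecord {
    final String target;     // class name of the Handler that receives the message
    final String callback;   // class name of the posted Runnable, or null if none
    final int what;          // message type code
    final long startWallMs;  // wall-clock time when dispatch started
    final long endWallMs;    // wall-clock time when dispatch finished
    final long cpuMs;        // CPU time the thread consumed during dispatch

    MessageRecord(String target, String callback, int what,
                  long startWallMs, long endWallMs, long cpuMs) {
        this.target = target;
        this.callback = callback;
        this.what = what;
        this.startWallMs = startWallMs;
        this.endWallMs = endWallMs;
        this.cpuMs = cpuMs;
    }

    /** Elapsed wall-clock time of the dispatch. */
    long wallMs() {
        return endWallMs - startWallMs;
    }

    /** Wall time during which the thread was not executing on a CPU. */
    long offCpuMs() {
        return Math.max(0, wallMs() - cpuMs);
    }
}
```

Keeping both wall time and CPU time per message is what later makes it possible to tell a blocked thread from a busy one.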
Key design points:
1. Monitoring entry
The monitor hooks the Looper dispatch loop. The relevant source snippet (simplified) is:
    for (;;) {
        Message msg = queue.next(); // might block
        final Printer logging = me.mLogging;
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " + msg.callback + ": " + msg.what);
        }
        try {
            msg.target.dispatchMessage(msg);
            // record end time, etc.
        } catch (Exception e) {
            throw e;
        } finally {
            if (logging != null) {
                logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
            }
        }
    }

Instead of relying on string parsing, LooperMonitor replaces the Printer with a lightweight observer that receives the Message object directly, eliminating temporary string allocation.
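The observer idea can be illustrated with a small stand-alone sketch. Everything here is hypothetical: MiniMessage stands in for android.os.Message, DispatchObserver is an invented interface, and MiniLooper is a toy loop, so none of this reflects how LooperMonitor actually installs its hook. The point is only that the callbacks receive the message object itself and no log string is ever built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Stand-in for android.os.Message so the sketch runs on a plain JVM (assumption).
final class MiniMessage {
    final String target;
    final int what;
    final Runnable callback;

    MiniMessage(String target, int what, Runnable callback) {
        this.target = target;
        this.what = what;
        this.callback = callback;
    }
}

// Hypothetical observer: gets the message object directly, no string formatting.
interface DispatchObserver {
    void onDispatchStart(MiniMessage msg);
    void onDispatchEnd(MiniMessage msg);
}

final class TimingObserver implements DispatchObserver {
    final List<long[]> samples = new ArrayList<>(); // each entry: {what, duration}
    private final LongSupplier clock;
    private long startedAt;

    TimingObserver(LongSupplier clock) {
        this.clock = clock;
    }

    @Override
    public void onDispatchStart(MiniMessage msg) {
        startedAt = clock.getAsLong();
    }

    @Override
    public void onDispatchEnd(MiniMessage msg) {
        samples.add(new long[] { msg.what, clock.getAsLong() - startedAt });
    }
}

final class MiniLooper {
    /** Dispatches every queued message, notifying the observer around each one. */
    static void drain(List<MiniMessage> queue, DispatchObserver observer) {
        for (MiniMessage msg : queue) {
            observer.onDispatchStart(msg);
            try {
                msg.callback.run();
            } finally {
                observer.onDispatchEnd(msg); // still called if the callback throws
            }
        }
    }
}
```

The clock is injected so the observer can be exercised deterministically; on a device it would be backed by a monotonic system clock.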
2. Multi‑Looper support
Looper is also used in React‑Native threads (NativeModulesMessageQueueThread, JSMessageQueueThread). LooperMonitor creates a LooperRecorder per Looper instance, maps it by thread ID, and uses a lock‑free design (object pools, single‑threaded recorders) to keep overhead minimal.
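One way to arrange per-thread recorders is sketched below, under stated assumptions: the class names are invented, and the registry uses a ConcurrentHashMap plus a ThreadLocal cache, which is a swapped-in technique rather than the article's confirmed implementation. The map is touched once per thread at registration; after that, each recorder is mutated only by its owning thread, so the dispatch path needs no locking.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical recorder: all mutation happens on the thread that owns it.
final class LooperRecorderSketch {
    final long threadId;
    int dispatchCount; // written only by the owning thread

    LooperRecorderSketch(long threadId) {
        this.threadId = threadId;
    }

    void onDispatched() {
        dispatchCount++;
    }
}

final class RecorderRegistry {
    private final ConcurrentHashMap<Long, LooperRecorderSketch> byThreadId =
            new ConcurrentHashMap<>();

    // Caches the recorder per thread so the hot path never touches the shared map.
    private final ThreadLocal<LooperRecorderSketch> cached =
            ThreadLocal.withInitial(() -> {
                long id = Thread.currentThread().getId();
                return byThreadId.computeIfAbsent(id, key -> new LooperRecorderSketch(key));
            });

    /** Recorder owned by the calling thread, created on first use. */
    LooperRecorderSketch forCurrentThread() {
        return cached.get();
    }

    /** Lookup by thread ID, e.g. when a reporter collects data from all loopers. */
    LooperRecorderSketch forThread(long threadId) {
        return byThreadId.get(threadId);
    }

    int size() {
        return byThreadId.size();
    }
}
```

A reporter thread that reads recorders through forThread would still need a safe hand-off (for example, reading only after the owning thread has published a snapshot); that part is left out of the sketch.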
3. Data collection
Messages are classified by execution time:
Fast messages (<30 ms) are aggregated; only the last message’s target, callback, total duration, and occurrence time are stored.
Medium messages (30 ms – 200 ms) keep full metadata for tracing.
Slow messages (>200 ms) also record a stack trace. Two stack‑capture strategies are supported:
Strategy 1 – Compile‑time instrumentation: Inserts tracing code at method entry/exit, generates compact IDs, and maps them to method names.
Strategy 2 – Runtime timeout detection: Uses coroutines to asynchronously capture the stack when a message exceeds its threshold, filtering out idle periods to avoid noisy nativePollOnce stacks.
When both strategies are available, the system prefers the lightweight instrumented trace.
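The three duration classes and the aggregation of fast messages can be sketched as follows. The thresholds come from the text above; the type names and the treatment of the exact boundaries (30 ms and 200 ms count as medium) are assumptions.

```java
enum Speed { FAST, MEDIUM, SLOW }

final class MessageClassifier {
    static final long FAST_LIMIT_MS = 30;   // below this: aggregate only
    static final long SLOW_LIMIT_MS = 200;  // above this: also capture a stack

    static Speed classify(long wallMs) {
        if (wallMs < FAST_LIMIT_MS) {
            return Speed.FAST;
        }
        if (wallMs <= SLOW_LIMIT_MS) {
            return Speed.MEDIUM;
        }
        return Speed.SLOW;
    }
}

// Hypothetical aggregate for a run of consecutive fast messages: it keeps only
// the last target and callback, the summed duration, and the last occurrence time.
final class FastAggregate {
    String lastTarget;
    String lastCallback;
    long totalWallMs;
    long lastSeenAtMs;

    void add(String target, String callback, long wallMs, long nowMs) {
        lastTarget = target;
        lastCallback = callback;
        totalWallMs += wallMs;
        lastSeenAtMs = nowMs;
    }
}
```

Collapsing fast messages this way means a burst of hundreds of sub-millisecond messages costs one record instead of hundreds, while anything slow enough to matter keeps its own entry.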
4. Memory optimisation
LooperMonitor employs object pooling for MessageInfo records (pool size ≈ 50) and aggregates small messages to keep the record queue bounded (max ≈ 500 entries). Rolling eviction removes the oldest records or those older than the configured window (e.g., 10 s). This limits memory usage to a few hundred kilobytes even under heavy load.
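A bounded record queue with pooling and rolling eviction could be arranged roughly like this. The pool size, queue limit, and window mirror the approximate figures above; the class names and the ArrayDeque-based layout are assumptions, and the sketch expects to be used from a single thread, matching the one-recorder-per-looper design.

```java
import java.util.ArrayDeque;

// Hypothetical stand-in for a pooled MessageInfo record.
final class PooledInfo {
    long startMs;
    long wallMs;
    String target;

    void reset() {
        startMs = 0;
        wallMs = 0;
        target = null;
    }
}

final class RollingRecords {
    static final int POOL_SIZE = 50;    // recycled instances kept for reuse
    static final int MAX_RECORDS = 500; // hard cap on retained records

    private final ArrayDeque<PooledInfo> pool = new ArrayDeque<>();
    private final ArrayDeque<PooledInfo> records = new ArrayDeque<>();
    private final long windowMs;

    RollingRecords(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Returns a recycled instance if one is available, otherwise allocates. */
    PooledInfo obtain() {
        PooledInfo info = pool.pollFirst();
        return info != null ? info : new PooledInfo();
    }

    /** Appends a record, then evicts from the oldest end while over limits. */
    void add(PooledInfo info, long nowMs) {
        records.addLast(info);
        while (!records.isEmpty()
                && (records.size() > MAX_RECORDS
                    || nowMs - records.peekFirst().startMs > windowMs)) {
            recycle(records.pollFirst());
        }
    }

    private void recycle(PooledInfo info) {
        info.reset();
        if (pool.size() < POOL_SIZE) {
            pool.addLast(info); // beyond the pool cap the instance is left to the GC
        }
    }

    int size() {
        return records.size();
    }

    int pooled() {
        return pool.size();
    }
}
```

In steady state every eviction feeds the next obtain call, so the recorder stops allocating once the queue has filled.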
5. Visualisation
Collected data is sent to MDAP where it is displayed in several dashboards:
MDAP Looper board: A timeline of message execution colored by type; the rightmost bar represents the moment of jank/ANR.
Realtime floating window: Live bar chart of recent message durations.
Debug detail view: Separate panels for pending messages (showing timeout offsets) and historical dispatch data (wall‑time vs. CPU‑time).
These visual tools allow engineers to pinpoint the exact message that caused a stall and to see whether the main thread was blocked, starved of CPU, or waiting on I/O.
6. Case studies
The article presents three real‑world cases (two ANRs and one case of severe jank):
Input‑dispatch timeout where the main thread showed a long wall‑duration but near‑zero CPU time, indicating a blocked thread.
Service‑creation timeout where a CREATE_SERVICE message stayed pending for >10 s; the actual long‑running messages were identified by scrolling back 10 s in the history panel.
High kernel‑time usage caused by a misbehaving Realm I/O thread that consumed >40 % CPU, leading to severe jank despite normal system load.
In each case, LooperMonitor’s detailed timeline and stack information enabled developers to locate the root cause far more quickly than traditional ANR dumps.
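The wall-time versus CPU-time reasoning used in these cases can be expressed as a small heuristic. The 20 % and 80 % cut-offs and all names below are assumptions chosen for illustration, not values from the article. On Android, per-thread CPU time could be sampled with SystemClock.currentThreadTimeMillis() at the start and end of a dispatch.

```java
enum StallKind { OFF_CPU, ON_CPU, MIXED }

final class StallDiagnosis {
    // Illustrative cut-offs (assumptions): share of wall time spent on a CPU.
    static final double LOW_CPU_SHARE = 0.2;
    static final double HIGH_CPU_SHARE = 0.8;

    /**
     * OFF_CPU: long wall time, little CPU time; the thread was blocked on a
     *          lock or I/O, or was not being scheduled.
     * ON_CPU:  the thread itself was busy executing for most of the stall.
     * MIXED:   neither pattern dominates.
     */
    static StallKind diagnose(long wallMs, long cpuMs) {
        if (wallMs <= 0) {
            throw new IllegalArgumentException("wallMs must be positive");
        }
        double cpuShare = (double) cpuMs / wallMs;
        if (cpuShare < LOW_CPU_SHARE) {
            return StallKind.OFF_CPU;
        }
        if (cpuShare > HIGH_CPU_SHARE) {
            return StallKind.ON_CPU;
        }
        return StallKind.MIXED;
    }
}
```

Applied to the first case, a message with several seconds of wall time and near-zero CPU time lands in OFF_CPU, which points the investigation at locks, I/O, or scheduling rather than at the message's own code.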
Conclusion
LooperMonitor provides a low‑overhead, high‑precision view of the Android Looper, turning opaque ANR and jank symptoms into actionable data. Combined with MDAP’s aggregation and visualisation, it helps mobile teams reduce latency, improve user experience, and accelerate debugging of complex performance issues.
Shopee Tech Team