How We Detect and Diagnose Main‑Thread Lag in WeChat iOS
This article explains the causes of UI stutters on iOS, outlines a monitoring approach based on dumping thread stacks, describes the detection thresholds, classification method, and sampling strategy, and shares the practical results of deploying the solution in WeChat.
Introduction
WeChat iOS engineers frequently receive reports of occasional UI stutters: the app lags when switching from background to foreground, a dialog freezes for several seconds, or switching tabs feels sluggish. These reports are hard to reproduce because the problem may occur only on specific devices or at particular moments, and the available logs are often insufficient.
Root Causes
Deadlock: the main thread holds lock A while waiting for lock B, and a background thread holds lock B while waiting for lock A.
Lock contention: the main thread accesses the database while a background thread inserts a large amount of data, causing brief pauses.
Heavy I/O on the main thread: writing large amounts of data directly from the UI thread.
Heavy computation on the main thread: inefficient algorithms that consume excessive CPU.
Intensive UI rendering: complex layouts or rich text that require a lot of drawing work.
Diagnosis Strategy
For each cause, different diagnostics are suggested:
Deadlocks are usually accompanied by a crash, so they can be analyzed via crash reports.
For lock contention, it is essential to identify which thread holds the lock.
Heavy I/O and computation can be tracked by adding timing logs at function entry and exit.
UI rendering stalls are often inside system functions, making it difficult to add log points.
The key idea is to start a monitoring thread that watches the main thread; when it detects a stall, it dumps the stacks of all threads.
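The idea can be illustrated with a minimal sketch. It is written in Python purely to show the shape of the technique and is not WeChat's iOS code: the heartbeat mechanism, the `StallMonitor` class, and the function names are assumptions made for this example, and Python's own runtime introspection stands in for whatever stack-capture facility the platform provides.

```python
import sys
import threading
import time
import traceback


def dump_all_thread_stacks():
    """Return {thread name: formatted stack lines} for every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    dump = {}
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, f"thread-{ident}")
        dump[name] = traceback.format_stack(frame)
    return dump


class StallMonitor:
    """Hypothetical helper: the watched thread calls beat() on every loop
    pass, and a monitoring thread calls check_once() periodically."""

    def __init__(self, stall_seconds):
        self.stall_seconds = stall_seconds
        self._last_beat = time.monotonic()
        self._lock = threading.Lock()

    def beat(self, now=None):
        # Record that the watched thread is still making progress.
        with self._lock:
            self._last_beat = time.monotonic() if now is None else now

    def check_once(self, now=None):
        # Return a dump of all thread stacks if the heartbeat is stale,
        # otherwise None.
        now = time.monotonic() if now is None else now
        with self._lock:
            stalled = now - self._last_beat > self.stall_seconds
        return dump_all_thread_stacks() if stalled else None
```

In a real monitor, a background thread would call `check_once()` in a loop and sleep between checks; the explicit `now` parameter exists only so the logic can be exercised without waiting.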
Implementation Details
Building such a monitor raises several questions:
How can the monitor tell that the main thread is lagging?
What monitoring frequency and strategy should the helper thread use without causing noticeable performance or battery impact?
How to classify the captured stack traces?
How often will dump files be generated and how large will they be?
Should reports be sent in full or sampled to balance data usefulness and traffic cost?
Lag Detection Criteria
Two practical thresholds are used:
CPU usage exceeding 100%.
Main‑thread RunLoop execution longer than 2 seconds.
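A small sketch shows how the two thresholds might be evaluated for one sample. The `MainThreadSample` type and its field names are hypothetical, and treating either criterion alone as sufficient is an assumption of this example; the article states the thresholds but not how they are combined.

```python
from dataclasses import dataclass

CPU_THRESHOLD_PERCENT = 100.0    # threshold stated in the article
RUNLOOP_THRESHOLD_SECONDS = 2.0  # threshold stated in the article


@dataclass
class MainThreadSample:
    """Hypothetical snapshot taken by the monitoring thread."""
    cpu_percent: float         # CPU usage at sampling time
    runloop_started_at: float  # when the current run-loop pass began (seconds)
    sampled_at: float          # when the sample was taken (seconds)


def lag_reasons(sample):
    """Return which thresholds the sample exceeds (empty list means no lag)."""
    reasons = []
    if sample.cpu_percent > CPU_THRESHOLD_PERCENT:
        reasons.append("cpu")
    elapsed = sample.sampled_at - sample.runloop_started_at
    if elapsed > RUNLOOP_THRESHOLD_SECONDS:
        reasons.append("runloop")
    return reasons


def is_lagging(sample):
    # Assumption: exceeding either threshold counts as a lag.
    return bool(lag_reasons(sample))
```

Returning the list of reasons rather than a bare boolean keeps the trigger visible, which can be useful later when dumps are grouped and analyzed.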
Detection Strategy
Memory dump: once per second the monitor checks the main thread; if a lag is found, it dumps all thread stacks to memory.
File dump: if the new stack differs from the previous one, it is written to a file; if it is the same, nothing is written and the check interval grows along the Fibonacci sequence (1, 1, 2, 3, 5, 8…), so that one long stall does not produce a series of redundant dumps.
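The back-off rule can be sketched as a small scheduler. This is an illustration under assumptions, not the shipped logic: the `DumpScheduler` class is hypothetical, the interval unit is taken to be the one-second base check, and resetting the interval when the lag disappears or the stack changes is this sketch's reading of the strategy.

```python
class DumpScheduler:
    """Decides whether a lag dump goes to a file and when to check next."""

    def __init__(self, base_interval=1.0):
        self.base_interval = base_interval
        self._last_stack = None
        self._fib = (1, 1)

    def _reset_backoff(self):
        self._fib = (1, 1)

    def _next_interval(self):
        # Yield the current Fibonacci term and advance the pair.
        current, following = self._fib
        self._fib = (following, current + following)
        return current * self.base_interval

    def on_check(self, lagging_stack):
        """lagging_stack is the main-thread stack if a lag was detected,
        or None otherwise. Returns (write_to_file, next_interval)."""
        if lagging_stack is None:
            # No lag: forget the previous stack and return to the base interval.
            self._last_stack = None
            self._reset_backoff()
            return False, self._next_interval()
        if lagging_stack != self._last_stack:
            # New stack: write it out and restart the sequence.
            self._last_stack = lagging_stack
            self._reset_backoff()
            return True, self._next_interval()
        # Same stack as last time: skip the file write and back off.
        return False, self._next_interval()
```

With repeated identical stacks, only the first check writes a file, and the successive intervals are 1, 1, 2, 3, 5, 8 times the base interval.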
Classification Method
Instead of using crash‑report categories, a two‑level classification is applied:
First level: group by the innermost two stack frames.
Second level: within each first‑level group, further group by the innermost four frames, allowing separation of different business scenarios that share the same root cause.
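As a rough illustration of the two-level grouping, the sketch below keys each dump first by its innermost two frames and then by its innermost four. The `classify` function and the frame names in the usage example are made up for this example; stacks are assumed to be lists of frame names ordered innermost first.

```python
from collections import defaultdict


def classify(stacks, first_depth=2, second_depth=4):
    """Group stacks by innermost `first_depth` frames, then by innermost
    `second_depth` frames. Each stack lists frames innermost first."""
    groups = defaultdict(lambda: defaultdict(list))
    for stack in stacks:
        first_key = tuple(stack[:first_depth])
        second_key = tuple(stack[:second_depth])
        groups[first_key][second_key].append(stack)
    # Convert to plain dicts so callers get ordinary mappings.
    return {key: dict(sub) for key, sub in groups.items()}


# Hypothetical frame names, for illustration only.
example_stacks = [
    ["pthread_mutex_lock", "db_query", "load_contacts", "contacts_view_did_load"],
    ["pthread_mutex_lock", "db_query", "load_sessions", "chat_list_will_appear"],
    ["write", "save_image", "on_send_tapped", "handle_touch"],
]
example_groups = classify(example_stacks)
```

In this example the first two stacks land in the same first-level group (both block on the same lock inside the same query function) but in different second-level groups, which separates the two business scenarios that share one root cause.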
Operational Considerations
In a gray‑release test, each user generated about 30 dump files per day, consuming roughly 300 KB of upload traffic. To limit impact, a 5% sampling rate is used, and sampled users upload only the first 20 dumps per day with compression. A whitelist allows forced reporting for critical cases, and dumps are retained for seven days before automatic deletion.
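These policies can be sketched as follows. The numbers come from the text above, but everything else is an assumption made for illustration: the function names are hypothetical, hashing the user ID is just one possible way to pick a stable 5% of users, JSON plus zlib stands in for an unspecified upload format and compression scheme, and a dump is treated as expired once it is seven days old.

```python
import hashlib
import json
import zlib
from datetime import date, timedelta

SAMPLE_RATE = 0.05     # 5% of users
DAILY_UPLOAD_CAP = 20  # first 20 dumps per day
RETENTION_DAYS = 7     # local retention before deletion


def is_sampled(user_id: str, whitelist=frozenset(), rate: float = SAMPLE_RATE) -> bool:
    """Whitelisted users always report; others are chosen by a stable hash."""
    if user_id in whitelist:
        return True
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < rate * 10_000


def build_upload_payload(dumps_today, cap: int = DAILY_UPLOAD_CAP) -> bytes:
    """Keep only the first `cap` dumps of the day and compress them."""
    selected = dumps_today[:cap]
    return zlib.compress(json.dumps(selected).encode("utf-8"))


def is_expired(dump_date: date, today: date, retention_days: int = RETENTION_DAYS) -> bool:
    """True once a dump has been kept for the full retention period."""
    return today - dump_date >= timedelta(days=retention_days)
```

Hashing the user ID, rather than drawing a random number on each launch, keeps the same users in the sample from day to day, which makes their dump history easier to follow; whether that matches the production behavior is not stated in the article.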
Results
Since the gray rollout in WeChat 5.3.1, the main‑thread lag monitor has helped resolve issues that were previously hard to locate, such as severe stutter when switching from a subscription‑heavy account (500+ subscriptions) and occasional delays when loading contacts (over 1,000 friends).
Future Work
Mobile client performance optimization remains a broad and fast‑evolving field. Possible next steps include exploring system‑level hooks (e.g., intercepting objc_msgSend) for fine‑grained timing, making sure the monitoring thread still gets CPU time when the main thread is consuming 100% CPU, and extending the approach to other platforms, such as ANR detection on Android.
WeChat Client Technology Team
Official account of the WeChat mobile client development team, sharing development experience, cutting‑edge tech, and little‑known stories across Android, iOS, macOS, Windows Phone, and Windows.