
How We Detect and Diagnose Main‑Thread Lag in WeChat iOS

This article explains the causes of UI stutters on iOS, outlines a monitoring approach based on dumping thread stacks, describes the detection thresholds, classification method, and sampling strategy, and shares the practical results of deploying the solution in WeChat.

WeChat Client Technology Team

Introduction

WeChat iOS engineers frequently receive reports of occasional UI stutters: the app lags when switching from background to foreground, a dialog freezes for several seconds, or switching tabs feels sluggish. These issues are hard to reproduce because they may occur only on specific devices or at particular moments, and the existing logs are often insufficient to pinpoint the cause.

Root Causes

Deadlock: the main thread holds lock A while waiting for lock B, and a background thread holds lock B while waiting for lock A.

Lock contention: the main thread accesses the database while a background thread inserts a large amount of data, causing brief pauses.

Heavy I/O on the main thread: writing large amounts of data directly from the UI thread.

Heavy computation on the main thread: inefficient algorithms that consume excessive CPU.

Intensive UI rendering: complex layouts or rich text that require a lot of drawing work.

Diagnosis Strategy

Each cause calls for a different diagnostic approach:

Deadlocks are usually accompanied by a crash, so they can be analyzed through crash reports.

For lock contention, it is essential to identify which thread holds the lock.

Heavy I/O and computation can be tracked by adding timing logs at function entry and exit.

UI rendering stalls are often inside system functions, making it difficult to add log points.
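For the heavy I/O and computation cases, the entry/exit timing logs can be as simple as the C sketch below. It assumes a POSIX monotonic clock (`clock_gettime(CLOCK_MONOTONIC, ...)`); the names `timing_begin`, `timing_end`, and the log threshold parameter are hypothetical illustrations, not WeChat's actual logging API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Monotonic time in milliseconds; unaffected by wall-clock changes. */
static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1e6;
}

/* Hypothetical helper: remembers where a timed region started. */
typedef struct {
    const char *name;
    double start_ms;
} timing_scope;

/* Call at function entry. */
static timing_scope timing_begin(const char *name) {
    assert(name != NULL);
    timing_scope s = { name, now_ms() };
    return s;
}

/* Call at function exit: logs only if the region took at least
   log_threshold_ms, and returns the measured duration in ms. */
static double timing_end(timing_scope s, double log_threshold_ms) {
    double elapsed = now_ms() - s.start_ms;
    if (elapsed >= log_threshold_ms)
        fprintf(stderr, "[timing] %s took %.1f ms\n", s.name, elapsed);
    return elapsed;
}
```

For a suspect function (say, a hypothetical `save_drafts_to_disk()`), call `timing_begin` on entry and `timing_end` on every return path. Logging only above a threshold keeps the log volume manageable, which matters because these logs are the only evidence once the stall has passed. As the text notes, this does not help when the time is spent inside system rendering code.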

The key idea is to spawn a monitoring thread that watches the main thread; when a stall is detected, it dumps the stacks of all threads.
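A monitoring thread of this kind can be sketched in portable C with POSIX threads and C11 atomics. The sketch assumes the main thread reports the start and end of each run-loop iteration through two hooks (`lag_monitor_iteration_begin` / `lag_monitor_iteration_end`). On iOS, one common way to drive such hooks is a `CFRunLoopObserver` on the main run loop, but that wiring, and the actual all-thread stack dump behind the `on_stall` callback, are platform-specific and omitted. All names here are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

static uint64_t mono_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

static void sleep_ms(unsigned ms) {
    struct timespec d = { (time_t)(ms / 1000), (long)(ms % 1000) * 1000000L };
    nanosleep(&d, NULL);
}

typedef void (*stall_handler)(uint64_t elapsed_ms, void *ctx);

typedef struct {
    atomic_bool busy;                /* main thread is inside a run-loop iteration */
    _Atomic uint64_t busy_since_ms;  /* start time of that iteration */
    _Atomic uint64_t iteration;      /* incremented at every iteration start */
    atomic_bool running;             /* cleared to stop the monitor thread */
    atomic_uint stall_count;         /* number of stalls reported so far */
    unsigned poll_ms, threshold_ms;
    stall_handler on_stall;          /* e.g. dump all thread stacks (not shown) */
    void *ctx;
    pthread_t thread;
} lag_monitor;

/* Main-thread hook: an iteration starts doing work. */
void lag_monitor_iteration_begin(lag_monitor *m) {
    atomic_store(&m->busy_since_ms, mono_ms());
    atomic_fetch_add(&m->iteration, 1);
    atomic_store(&m->busy, true);
}

/* Main-thread hook: the iteration is done and the run loop goes back to waiting. */
void lag_monitor_iteration_end(lag_monitor *m) {
    atomic_store(&m->busy, false);
}

static void *monitor_main(void *arg) {
    lag_monitor *m = arg;
    uint64_t reported = 0; /* iteration already reported (iterations start at 1) */
    while (atomic_load(&m->running)) {
        sleep_ms(m->poll_ms);
        if (!atomic_load(&m->busy))
            continue;
        uint64_t it = atomic_load(&m->iteration);
        uint64_t since = atomic_load(&m->busy_since_ms); /* read before "now" */
        uint64_t now = mono_ms();
        if (it != reported && now >= since && now - since >= m->threshold_ms) {
            reported = it; /* report each stalled iteration once */
            atomic_fetch_add(&m->stall_count, 1);
            if (m->on_stall)
                m->on_stall(now - since, m->ctx);
        }
    }
    return NULL;
}

/* Returns 0 on success, or the pthread_create error code. */
int lag_monitor_start(lag_monitor *m, unsigned poll_ms, unsigned threshold_ms,
                      stall_handler on_stall, void *ctx) {
    assert(m != NULL && poll_ms > 0 && threshold_ms > 0);
    atomic_init(&m->busy, false);
    atomic_init(&m->busy_since_ms, 0);
    atomic_init(&m->iteration, 0);
    atomic_init(&m->running, true);
    atomic_init(&m->stall_count, 0);
    m->poll_ms = poll_ms;
    m->threshold_ms = threshold_ms;
    m->on_stall = on_stall;
    m->ctx = ctx;
    return pthread_create(&m->thread, NULL, monitor_main, m);
}

void lag_monitor_stop(lag_monitor *m) {
    atomic_store(&m->running, false);
    pthread_join(m->thread, NULL);
}
```

Polling from a separate thread means the main thread only pays for two atomic stores per iteration. With the article's numbers (check once per second, 2-second run-loop threshold), a caller would use something like `lag_monitor_start(&m, 1000, 2000, dump_all_stacks, NULL)`, where `dump_all_stacks` is a hypothetical handler.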

Figure: Detection flowchart

Implementation Details

How can we tell that the main thread is lagging?

What monitoring frequency and strategy should the helper thread use so that it has no noticeable impact on performance or battery life?

How should the captured stack traces be classified?

How often will dump files be generated, and how large will they be?

Should reports be sent in full or sampled, to balance data usefulness against traffic cost?

Lag Detection Criteria

Two practical thresholds are used:

Process CPU usage exceeding 100% (usage is summed over all threads, so it can exceed 100% on a multi-core device).

A single main-thread RunLoop iteration running for longer than 2 seconds.
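The two criteria can be expressed as a small predicate. This sketch treats them as alternatives (either one flags a lag), which is an assumption, and computes process CPU usage as the sum of per-thread percentages. On iOS, per-thread usage is typically read with `task_threads()` and `thread_info(..., THREAD_BASIC_INFO, ...)`, whose `cpu_usage` field is scaled by `TH_USAGE_SCALE`; the sketch takes those percentages as plain input so it stays portable. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LAG_CPU_PERCENT_LIMIT 100.0  /* "CPU usage exceeding 100%" */
#define LAG_RUNLOOP_MS_LIMIT 2000u   /* "RunLoop iteration longer than 2 seconds" */

/* Sum of per-thread CPU percentages; can exceed 100% on multi-core devices. */
double process_cpu_percent(const double *thread_percents, size_t n) {
    assert(thread_percents != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += thread_percents[i];
    return total;
}

/* runloop_busy: the main thread is inside a run-loop iteration.
   runloop_elapsed_ms: how long that iteration has been running. */
bool is_lagging(double cpu_percent, bool runloop_busy, unsigned runloop_elapsed_ms) {
    if (cpu_percent > LAG_CPU_PERCENT_LIMIT)
        return true;
    return runloop_busy && runloop_elapsed_ms > LAG_RUNLOOP_MS_LIMIT;
}
```

Both comparisons are strict, matching "exceeding" and "longer than" in the text; exactly 100% or exactly 2,000 ms does not count as a lag.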

Detection Strategy

Memory dump: every second, the monitoring thread checks the main thread; if it detects a lag, it dumps the stacks of all threads into memory.

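File dump: the newly captured stack is compared with the previous one. If they differ, the dump is written to a file; if they are the same, the check interval is stretched along the Fibonacci sequence (1, 1, 2, 3, 5, 8, … seconds) so that one long stall does not produce a flood of identical dumps.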
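The check-and-back-off logic can be sketched as a small state machine driven by a 1-second timer. Assumptions not stated in the article: stacks are compared by a 64-bit FNV-1a fingerprint of the main thread's frame addresses, and the back-off resets when the lag ends or the stack changes. The stack capture and file writing themselves are omitted, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over the raw frame addresses: a cheap fingerprint for "same stack". */
uint64_t hash_frames(const uintptr_t *frames, size_t n) {
    uint64_t h = 0xcbf29ce484222325ull;
    for (size_t i = 0; i < n; i++)
        for (size_t b = 0; b < sizeof frames[i]; b++) {
            h ^= (uint64_t)((frames[i] >> (8 * b)) & 0xffu);
            h *= 0x100000001b3ull;
        }
    return h;
}

typedef enum { DUMP_NONE, DUMP_WRITE_FILE, DUMP_DUPLICATE } dump_action;

typedef struct {
    bool has_last;               /* a stack was written for the current stall */
    uint64_t last_hash;          /* fingerprint of that stack */
    unsigned fib_prev, fib_cur;  /* Fibonacci back-off state */
    unsigned skip_ticks;         /* 1-second ticks to skip before the next check */
} dump_policy;

static void reset_backoff(dump_policy *p) {
    p->fib_prev = 0;
    p->fib_cur = 1;
    p->skip_ticks = 0;
}

void dump_policy_init(dump_policy *p) {
    assert(p != NULL);
    p->has_last = false;
    p->last_hash = 0;
    reset_backoff(p);
}

/* Called on every 1-second tick; false means "still backing off, skip the check". */
bool dump_policy_should_check(dump_policy *p) {
    if (p->skip_ticks > 0) {
        p->skip_ticks--;
        return false;
    }
    return true;
}

/* Called after a check. stack_hash only matters when lagging, i.e. when the
   stacks have just been dumped to memory. */
dump_action dump_policy_on_check(dump_policy *p, bool lagging, uint64_t stack_hash) {
    if (!lagging) { /* assumption: stall over, so a later stall is dumped afresh */
        p->has_last = false;
        reset_backoff(p);
        return DUMP_NONE;
    }
    if (!p->has_last || stack_hash != p->last_hash) {
        p->has_last = true;
        p->last_hash = stack_hash;
        reset_backoff(p);
        return DUMP_WRITE_FILE;
    }
    /* Same stack again: wait fib_cur seconds (1, 1, 2, 3, 5, ...) before re-checking. */
    p->skip_ticks = p->fib_cur - 1;
    unsigned next = p->fib_prev + p->fib_cur;
    p->fib_prev = p->fib_cur;
    p->fib_cur = next;
    return DUMP_DUPLICATE;
}
```

For a stall whose stack never changes, this checks on ticks 1, 2, 3, 4, 6, 9, 14, … and writes a file only on the first one; a different stack at any check is written immediately and restarts the sequence.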

Classification Method

Instead of reusing the crash-report classification, the dumps are grouped with a two-level scheme:

First level: group by the innermost two stack frames.

Second level: within each first‑level group, further group by the innermost four frames, allowing separation of different business scenarios that share the same root cause.
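A minimal sketch of the two-level keys, assuming the frames are already symbolicated and ordered innermost first. Joining frame names into strings is just one way to build the keys; the helper names and the `" <- "` separator are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define LEVEL1_DEPTH 2  /* innermost two frames */
#define LEVEL2_DEPTH 4  /* innermost four frames */

/* Writes the innermost `depth` frames into out, each caller after its callee
   ("callee <- caller"). frames[0] is the innermost frame.
   Returns 0, or -1 if out is too small. */
static int stack_key(const char *const *frames, size_t n, size_t depth,
                     char *out, size_t out_size) {
    assert(out != NULL && out_size > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n && i < depth; i++) {
        int w = snprintf(out + used, out_size - used, "%s%s",
                         i ? " <- " : "", frames[i]);
        if (w < 0 || (size_t)w >= out_size - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}

/* Fills both classification keys for one dumped main-thread stack. */
int classify_stack(const char *const *frames, size_t n,
                   char *level1, size_t level1_size,
                   char *level2, size_t level2_size) {
    if (stack_key(frames, n, LEVEL1_DEPTH, level1, level1_size) != 0)
        return -1;
    return stack_key(frames, n, LEVEL2_DEPTH, level2, level2_size);
}
```

Two main-thread stacks blocked in `pthread_mutex_lock`, one reached from a contacts screen and one from a chat screen, would share a first-level key (the lock wait) and differ at the second level (the business code that took the lock).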

Figure: First-level classification
Figure: Second-level classification

Operational Considerations

During the gray-release (staged rollout) test, each user generated about 30 dump files per day, or roughly 300 KB of upload traffic. To limit the impact, only a 5% sample of users report, and each sampled user uploads only the first 20 dumps per day, compressed. A whitelist allows reporting to be forced for critical cases, and dumps are deleted automatically after seven days.
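These limits can be sketched as a few predicates. Assumptions the article does not specify: users are sampled deterministically by hashing a numeric user ID with the SplitMix64 finalizer, the whitelist bypasses sampling but not the daily cap, and retention is measured from each dump's creation time. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define SAMPLE_PERCENT 5u     /* 5% of users report */
#define DAILY_UPLOAD_CAP 20u  /* only the first 20 dumps per user per day */
#define RETENTION_DAYS 7      /* dumps are deleted after seven days */

static_assert(SAMPLE_PERCENT <= 100u, "SAMPLE_PERCENT must be a percentage");

/* SplitMix64 finalizer: spreads consecutive IDs evenly over 64 bits. */
static uint64_t mix64(uint64_t x) {
    x += 0x9E3779B97F4A7C15ull;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ull;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBull;
    return x ^ (x >> 31);
}

/* Deterministic: a given user is always in, or always out of, the sample. */
bool user_in_sample(uint64_t user_id) {
    return mix64(user_id) % 100u < SAMPLE_PERCENT;
}

/* uploaded_today: dumps this user has already uploaded today.
   Assumption: the whitelist skips sampling, but the daily cap still applies. */
bool should_upload_dump(uint64_t user_id, bool whitelisted, unsigned uploaded_today) {
    if (!whitelisted && !user_in_sample(user_id))
        return false;
    return uploaded_today < DAILY_UPLOAD_CAP;
}

/* True once a dump is at least RETENTION_DAYS old. */
bool dump_expired(time_t created, time_t now) {
    return difftime(now, created) >= RETENTION_DAYS * 86400.0;
}
```

Hashing the user ID instead of drawing a random number per dump keeps the same 5% of users reporting every day, so a stall pattern can be followed across days for one user; a per-day or per-dump draw would spread the traffic more evenly across users instead.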

Results

Since its gray rollout in WeChat 5.3.1, the main-thread lag monitor has helped resolve issues that were previously hard to locate, such as severe stutters when switching from an account with more than 500 subscriptions, and occasional delays when loading contacts for users with more than 1,000 friends.

Future Work

Mobile client performance optimization remains a broad and fast-evolving field. Possible next steps include exploring system-level hooks (e.g., intercepting objc_msgSend) for fine-grained timing, ensuring the monitoring thread still gets CPU time when the main thread is consuming 100% of the CPU, and extending the approach to other platforms, for example Android's ANR detection.
