Binary Reordering and Clang Instrumentation for iOS App Startup Optimization
This article explains the principles of virtual memory and paging, demonstrates how page faults affect iOS app launch time, and provides a step‑by‑step guide to using clang static instrumentation and Xcode order files to reorder binary symbols, reduce page faults, and achieve measurable startup speed improvements.
Preface
After the Douyin team shared a binary‑reordering solution that reportedly improved app launch speed by more than 15%, this article revisits the underlying concepts and walks through a practical clang‑instrumentation workflow that achieves the same optimization.
Virtual Memory and Physical Memory
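Early computers loaded each application contiguously into physical memory. This was unsafe, because any process could read or overwrite another process's memory, and inefficient, because the whole program had to be resident even if only part of it was used. Virtual memory introduces a mapping table that translates virtual addresses to physical pages, which isolates processes from each other and enables lazy loading.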
Virtual Memory Working Principle
The system treats a large continuous virtual address space (e.g., 0x000000‑0xffffff) as a set of pages; each virtual address is mapped to a physical page via a page table.
CPU Address Translation
The CPU uses the MMU and OS page tables to translate virtual addresses to physical ones during memory accesses.
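To make the translation step concrete, here is a toy model in C. It assumes 16 KB pages and a hypothetical single‑level table (`toy_page_table`, `toy_translate`, and `NO_FRAME` are illustrative names, not system APIs); real MMUs use multi‑level tables and a TLB, so treat this only as a sketch of the page‑number/offset split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 14u                          /* assumed 16 KB pages: 2^14 bytes */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define NO_FRAME   UINT64_MAX                   /* hypothetical "not resident" marker */

/* Toy single-level page table: index = virtual page number, value = physical frame. */
typedef struct {
    const uint64_t *frames;
    size_t          count;
} toy_page_table;

/* Translate vaddr to a physical address.
   Returns false where a real system would raise a page fault. */
bool toy_translate(const toy_page_table *pt, uint64_t vaddr, uint64_t *paddr)
{
    assert(pt != NULL && paddr != NULL);
    uint64_t vpn    = vaddr >> PAGE_SHIFT;        /* virtual page number */
    uint64_t offset = vaddr & (PAGE_SIZE - 1);    /* byte offset inside the page */

    if (vpn >= pt->count || pt->frames[vpn] == NO_FRAME)
        return false;                             /* unmapped page: page fault */

    *paddr = (pt->frames[vpn] << PAGE_SHIFT) | offset;
    return true;
}
```

The offset is carried over unchanged; only the page number is looked up, which is why the cost of a miss is paid once per page rather than once per address.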
How Virtual Memory Solves Efficiency
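Physical memory is divided into pages (4 KB on Intel Macs, 16 KB on arm64 iOS devices and Apple silicon Macs). Only the pages that are actually accessed are loaded, which reduces waste. The first access to a page that is not yet resident triggers a page fault, and the thread stalls while the OS loads that page.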
Binary Reordering
Overview
At link time, functions are laid out in the Mach‑O binary according to link order (the order of the object files, then the order of functions within each file). This has nothing to do with call order, so the functions executed during launch are often scattered across many pages, and each of those pages costs a page fault. By reordering symbols so that the functions called at startup sit next to each other, the number of page faults, and with it launch latency, can be reduced.
Binary Reordering Optimization Principle
Suppose only method1 and method4 run at startup, but they sit on two different pages: launch then incurs two page faults. Placing both functions on the same memory page reduces this to one.
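The effect can be estimated with a small helper that counts how many distinct pages a set of function start addresses falls on. This is a simplified sketch: it assumes 16 KB pages, looks only at start addresses (a function that straddles a page boundary would touch one more page), and `count_pages_touched` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 16384u   /* assumed 16 KB page */

/* Count the distinct pages that contain the given function start addresses.
   A quadratic scan is fine for a sketch; sort first for large inputs. */
size_t count_pages_touched(const uint64_t *addrs, size_t n)
{
    assert(addrs != NULL || n == 0);
    size_t pages = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t page = addrs[i] / PAGE_SIZE;
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (addrs[j] / PAGE_SIZE == page) { seen = 1; break; }
        }
        if (!seen) pages++;   /* first address seen on this page */
    }
    return pages;
}
```

Feeding it the startup functions' addresses from a link map before and after reordering gives a rough upper bound on the code page faults the launch path can trigger.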
How to Detect Page Faults
Open Instruments → System Trace on a real device.
Select the target app, start profiling, and stop after the first screen appears.
Inspect the “Page Faults” metric.
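Alternatively, set the DYLD_PRINT_STATISTICS environment variable in the scheme's Run arguments to print pre‑main timing to the console. This reports time rather than page‑fault counts.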
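As a rough in‑process complement to Instruments, the POSIX `getrusage` call exposes cumulative fault counters for the current process. The sketch below is an assumption about how one might log them (the `fault_counts` type and both function names are hypothetical); how closely these counters track what System Trace reports on iOS is not something the article establishes, so Instruments remains the reference.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Snapshot of the process-wide page-fault counters. */
typedef struct {
    long minor;   /* faults served without I/O */
    long major;   /* faults that required I/O */
} fault_counts;

/* Read the cumulative fault counters for the calling process. */
fault_counts read_fault_counts(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    fault_counts fc = { ru.ru_minflt, ru.ru_majflt };
    return fc;
}

/* Print the counters with a label, e.g. at the end of first-frame rendering. */
void log_fault_counts(const char *label)
{
    fault_counts fc = read_fault_counts();
    printf("%s: minor faults=%ld major faults=%ld\n", label, fc.minor, fc.major);
}
```

Calling `log_fault_counts` once when the first screen appears, in builds with and without the order file, gives two numbers to compare.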
Actual Binary Reordering Steps
Create an order file (e.g., lb.order) listing the symbols you want to prioritize, one per line.
In Build Settings, set Order File to the path of this file; Xcode passes it to the linker as -order_file.
Rebuild; the linker arranges symbols in the specified order.
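For illustration, an order file is plain text with one symbol name per line, and lines starting with # are comments. The symbol names below are hypothetical; C functions carry the leading underscore used in the link map, while Objective‑C methods are written as they appear there.

```
# lb.order: symbols in the order they should be laid out
+[AppDelegate load]
_main
-[AppDelegate application:didFinishLaunchingWithOptions:]
-[ViewController viewDidLoad]
_setupLogging
```

Symbols not listed keep their default link order and follow the listed ones.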
Verifying Symbol Order
Enable “Write Link Map File” in Build Settings, locate the generated .txt file, and inspect the # Symbols: section to confirm the new order.
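Checking the order by eye gets tedious for large binaries. The helper below extracts the symbol name from one row of the # Symbols: section so the rows can be compared against the order file. It assumes each row has the form address, size, a bracketed file index, then the name; `linkmap_symbol_name` is an illustrative name, and the layout should be checked against your own link map.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the symbol name from one "# Symbols:" row into out.
   Assumed row layout: "0xADDR<TAB>0xSIZE<TAB>[ idx] name".
   Returns 1 on success, 0 for comment rows or rows that do not fit. */
int linkmap_symbol_name(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL && out_size > 0);
    if (line[0] == '#') return 0;                 /* section header or column titles */

    const char *p = strchr(line, ']');            /* first ']' closes the file index */
    if (p == NULL) return 0;
    p++;
    while (*p == ' ' || *p == '\t') p++;          /* skip the separator */

    size_t len = strcspn(p, "\r\n");              /* name runs to end of line */
    if (len == 0 || len >= out_size) return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

Because the file index comes before the name, searching for the first closing bracket still works for Objective‑C names that contain brackets themselves.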
Collecting All Startup Symbols via Clang Instrumentation
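Enable clang coverage instrumentation by adding -fsanitize-coverage=trace-pc-guard to “Other C Flags” (or -fsanitize-coverage=func,trace-pc-guard to instrument only function entries, so that loops do not fire the callback on every iteration). Then implement the two required callbacks:
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

// Called once per module: give every guard a unique non-zero ID.
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t N;  // running guard counter
    if (start == stop || *start) return;  // empty range, or already initialized
    for (uint32_t *x = start; x < stop; x++) *x = ++N;
}

// Called at every instrumented point.
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;  // skip callers whose guard is zero
    void *PC = __builtin_return_address(0);  // an address inside the calling function
    // enqueue PC for later processing
}
Use an atomic queue (OSAtomicEnqueue/OSAtomicDequeue) to store PCs safely across threads. The queue is last‑in, first‑out, so reverse the dequeued list to recover call order. Then resolve each PC to a symbol with dladdr:
Dl_info info;
if (dladdr(PC, &info) && info.dli_sname) {  // dli_sname is NULL when no symbol matches
    printf("fname=%s sname=%s\n", info.dli_fname, info.dli_sname);
}
Filter out duplicate symbols while keeping first‑call order, prepend an underscore to every symbol that is not an Objective‑C method, and write the ordered list to a temporary lb.order file (e.g., in /tmp).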
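The filtering step can be sketched in portable C as follows. The names (`symbol_list`, `order_entry`, `symbol_list_add`, `symbol_list_write`) and the fixed capacities are hypothetical choices for this sketch; it takes the names that dladdr returns, treats anything starting with `+[` or `-[` as an Objective‑C method, and prefixes everything else with an underscore.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 256   /* hypothetical capacity for this sketch */
#define MAX_NAME    256

typedef struct {
    char   names[MAX_SYMBOLS][MAX_NAME];
    size_t count;
} symbol_list;

/* Format a dladdr-style name as an order-file entry.
   Returns 1 on success, 0 if the result does not fit in out. */
int order_entry(const char *sname, char *out, size_t out_size)
{
    assert(sname != NULL && out != NULL);
    int is_objc = (sname[0] == '+' || sname[0] == '-') && sname[1] == '[';
    int written = snprintf(out, out_size, "%s%s", is_objc ? "" : "_", sname);
    return written >= 0 && (size_t)written < out_size;
}

/* Append the entry unless it is already present, preserving first-call order.
   Returns 1 if added, 0 if duplicate, too long, or the list is full. */
int symbol_list_add(symbol_list *list, const char *sname)
{
    char entry[MAX_NAME];
    if (!order_entry(sname, entry, sizeof entry)) return 0;
    for (size_t i = 0; i < list->count; i++) {
        if (strcmp(list->names[i], entry) == 0) return 0;   /* duplicate */
    }
    if (list->count == MAX_SYMBOLS) return 0;
    strcpy(list->names[list->count++], entry);
    return 1;
}

/* Write one entry per line to fp. Returns 0 on success, -1 on a write error. */
int symbol_list_write(const symbol_list *list, FILE *fp)
{
    for (size_t i = 0; i < list->count; i++) {
        if (fprintf(fp, "%s\n", list->names[i]) < 0) return -1;
    }
    return 0;
}
```

Feed the names in call order, then pass a `FILE *` opened on the lb.order path to `symbol_list_write`. A hash set would scale better than the linear duplicate check if the startup path covers many thousands of functions.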
Handling Pitfalls
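Multithreading: The callback can fire on any thread, so store PCs in a lock‑free atomic queue rather than a plain array.
Loops: By default, trace-pc-guard instruments every edge, so a loop body fires the callback on each iteration. Use -fsanitize-coverage=func,trace-pc-guard to instrument function entries only.
+load methods: The guard value is zero for +load methods, so the if (!*guard) return; check drops them. Remove that check if they should appear in the order file.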
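For the multithreading point, the sketch below swaps in a C11 `<stdatomic.h>` lock‑free stack in place of the Apple‑specific OSAtomicEnqueue/OSAtomicDequeue pair, so that it compiles anywhere. The names `pc_node`, `pc_push`, and `pc_drain` are hypothetical. Pushes use a compare‑and‑swap loop; draining detaches the whole list with one atomic exchange and reverses it, since a stack yields the newest entry first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct pc_node {
    void           *pc;
    struct pc_node *next;
} pc_node;

static _Atomic(pc_node *) pc_head = NULL;   /* top of the lock-free stack */

/* Push a PC from any thread. Returns 0 on success, -1 if allocation fails. */
int pc_push(void *pc)
{
    assert(pc != NULL);
    pc_node *node = malloc(sizeof *node);
    if (node == NULL) return -1;
    node->pc   = pc;
    node->next = atomic_load_explicit(&pc_head, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(&pc_head, &node->next, node,
                                                  memory_order_release,
                                                  memory_order_relaxed)) {
        /* CAS failed: node->next now holds the current head, so retry. */
    }
    return 0;
}

/* Detach every node in one atomic exchange and return them oldest-first.
   The caller owns the returned list and must free each node. */
pc_node *pc_drain(void)
{
    pc_node *list = atomic_exchange_explicit(&pc_head, NULL, memory_order_acquire);
    pc_node *oldest_first = NULL;
    while (list != NULL) {            /* reverse: the stack is newest-first */
        pc_node *next = list->next;
        list->next = oldest_first;
        oldest_first = list;
        list = next;
    }
    return oldest_first;
}
```

Call `pc_push(PC)` from the guard callback and `pc_drain()` once after the first screen appears. Draining with a single exchange instead of popping node by node sidesteps the ABA problem that a lock‑free pop would otherwise have to handle.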
Swift and Mixed Projects
For Swift or mixed projects, add the flags -sanitize-coverage=func and -sanitize=undefined under “Other Swift Flags”. The same callbacks then record Swift functions as well.
Post‑Optimization Results
Measured on a fresh install, so that pages cached from an earlier run do not mask the effect, page‑fault counts on the startup path dropped from multiple faults to a single fault, yielding a noticeable reduction in launch time (often more than 15% on iOS devices).
Conclusion
By combining virtual‑memory knowledge, clang static instrumentation, and Xcode order‑file linking, developers can automatically discover startup‑critical symbols, generate an order file, and reorder the Mach‑O binary to minimize page faults, thereby accelerating iOS app launch performance.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.