Binary Reordering and Clang Instrumentation for iOS App Startup Optimization

This article explains the principles of virtual memory and paging, demonstrates how page faults affect iOS app launch time, and provides a step‑by‑step guide to using clang static instrumentation and Xcode order files to reorder binary symbols, reduce page faults, and achieve measurable startup speed improvements.

Sohu Tech Products
Preface

The Douyin team has shared a binary‑reordering solution that improved app launch speed by over 15%. This article revisits the underlying concepts and demonstrates a practical clang‑instrumentation workflow for applying the same optimization.

Virtual Memory and Physical Memory

Early computers loaded applications contiguously into physical memory, leading to security and efficiency problems. Virtual memory introduces a mapping table that translates virtual addresses to physical pages, solving inter‑process access issues and enabling lazy loading.

Virtual Memory Working Principle

The system treats a large continuous virtual address space (e.g., 0x000000‑0xffffff) as a set of pages; each virtual address is mapped to a physical page via a page table.

CPU Address Translation

The CPU uses the MMU and OS page tables to translate virtual addresses to physical ones during memory accesses.
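As a rough illustration of the translation step, the following sketch models a tiny page table in C. The names (pte_t, translate, DEMO_PAGE_SIZE, DEMO_NUM_PAGES) are hypothetical and the table is deliberately simplified; real hardware uses multi‑level tables and a TLB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 16384u  /* assumed 16 KB pages */
#define DEMO_NUM_PAGES 8u      /* hypothetical, tiny virtual address space */

/* One page-table entry: is the page resident, and in which physical frame? */
typedef struct {
    bool     present;
    uint32_t frame;
} pte_t;

/* Translate a virtual address to a physical one.
   Returns false when the page is unmapped or not resident (a page fault). */
bool translate(const pte_t table[DEMO_NUM_PAGES], uint32_t vaddr, uint32_t *paddr) {
    assert(table != NULL && paddr != NULL);
    uint32_t page   = vaddr / DEMO_PAGE_SIZE;  /* index into the page table */
    uint32_t offset = vaddr % DEMO_PAGE_SIZE;  /* position inside the page */
    if (page >= DEMO_NUM_PAGES || !table[page].present) return false;
    *paddr = table[page].frame * DEMO_PAGE_SIZE + offset;
    return true;
}
```

The offset within the page is preserved; only the page number is swapped for a physical frame number. A false return corresponds to the page fault the OS would have to service.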

How Virtual Memory Solves Efficiency

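Memory is managed in fixed‑size pages: 4 KB on Intel macOS and 16 KB on arm64 iOS devices. Only the pages that are actually accessed are loaded into physical memory, which reduces waste. When a process touches a page that is not yet resident, the CPU raises a page fault and the OS loads the page before execution continues.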

Binary Reordering

Overview

At link time, functions are placed in Mach‑O sections according to link order, so the functions called during launch are often scattered across many pages, and each first touch of a new page causes a page fault. By reordering symbols so that the functions called at startup sit next to each other on as few pages as possible, the number of page faults, and with it launch latency, can be reduced.

Binary Reordering Optimization Principle

Place startup functions (e.g., method1 and method4) into a single memory page, reducing page faults from two to one.
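To make the effect concrete, here is a small sketch that counts how many distinct pages a set of startup functions occupies. The struct, the function name, the 64‑page cap, and the 16 KB page size are assumptions chosen for illustration; the addresses you would feed it are made up, not taken from a real binary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 16384u  /* assumed 16 KB pages */
#define DEMO_MAX_PAGES 64      /* hypothetical cap on pages tracked */

typedef struct {
    uint64_t addr;  /* start address of the function */
    uint64_t size;  /* size in bytes, must be > 0 */
} func_range_t;

/* Count the distinct pages covered by the given functions. On a cold launch
   this is an upper bound on the page faults they can cause. */
size_t pages_touched(const func_range_t *funcs, size_t n) {
    uint64_t seen[DEMO_MAX_PAGES];
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        assert(funcs[i].size > 0);
        uint64_t first = funcs[i].addr / DEMO_PAGE_SIZE;
        uint64_t last  = (funcs[i].addr + funcs[i].size - 1) / DEMO_PAGE_SIZE;
        for (uint64_t p = first; p <= last; p++) {
            bool dup = false;
            for (size_t j = 0; j < count; j++) {
                if (seen[j] == p) { dup = true; break; }
            }
            if (!dup) {
                assert(count < DEMO_MAX_PAGES);
                seen[count++] = p;
            }
        }
    }
    return count;
}
```

With method1 at 0x0000 and method4 at 0x8000 (each 0x100 bytes) the function reports two pages; moving method4 to 0x0100, directly after method1, brings it down to one.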

How to Detect Page Faults

Open Instruments → System Trace on a real device.

Select the target app, start profiling, and stop after the first screen appears.

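Inspect the page‑fault count for the main thread (in System Trace this typically appears in the virtual‑memory summary as “File Backed Page In”).

Alternatively, set the DYLD_PRINT_STATISTICS environment variable in the scheme’s Run arguments to view pre‑main timing.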

Actual Binary Reordering Steps

Create an order file (e.g., lb.order) listing the symbols you want to prioritize, one per line, in the desired order.

In Build Settings, point the Order File setting at this file; Xcode passes it to the linker as -order_file.

Rebuild; the linker arranges symbols in the specified order.

Verifying Symbol Order

Enable “Write Link Map File” in Build Settings, locate the generated .txt file, and inspect the # Symbols: section to confirm the new order.
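If you want to check the order programmatically rather than by eye, a line of the # Symbols: section can be parsed with sscanf. This sketch assumes each symbol line has the layout address, size, [file index], name (for example 0x100004000, 0x00000090, [  1], then the symbol); confirm that against your own link map before relying on it. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse one symbol line from a link map.
   Returns 1 on success and 0 for comment or malformed lines.
   name must point to a buffer of at least 256 bytes. */
int parse_symbol_line(const char *line, unsigned long long *addr,
                      unsigned long long *size, int *file_index, char name[256]) {
    assert(line != NULL && name != NULL);
    /* %llx accepts the 0x prefix; "[ %d ]" tolerates padding inside the brackets;
       the scanset reads the rest of the line, spaces included. */
    return sscanf(line, "%llx %llx [ %d ] %255[^\n]",
                  addr, size, file_index, name) == 4;
}
```

Reading the section line by line and comparing the parsed names against lb.order shows whether the linker honored the requested order.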

Collecting All Startup Symbols via Clang Instrumentation

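Enable clang's SanitizerCoverage by adding -fsanitize-coverage=trace-pc-guard to Other C Flags (or -fsanitize-coverage=func,trace-pc-guard to instrument only function entries, so that loops do not fire the callback on every iteration). Implement the required callbacks: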

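#include <stdint.h>

// Called once per module at load time; assigns each guard a unique non-zero id.
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t N;  // running guard counter
    if (start == stop || *start) return;  // empty range or already initialized
    for (uint32_t *x = start; x < stop; x++) *x = ++N;
}

// Called at every instrumented point; the return address identifies the caller.
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;  // skips zero guards (see the +load pitfall)
    void *PC = __builtin_return_address(0);
    // enqueue PC for later processing
}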

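Use an atomic queue (OSAtomicEnqueue/OSAtomicDequeue, from <libkern/OSAtomic.h>) to store PCs safely across threads. These functions implement a LIFO list, so reverse the dequeued entries to recover call order. Then resolve each PC to a symbol with dladdr: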

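#include <dlfcn.h>  // dladdr, Dl_info (printf needs <stdio.h>)
Dl_info info;
if (dladdr(PC, &info) && info.dli_sname)  // dli_sname is NULL when no symbol matches
    printf("fname=%s sname=%s\n", info.dli_fname, info.dli_sname);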
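OSAtomicEnqueue and OSAtomicDequeue are Apple‑specific. To show what such a queue does, the following sketch implements the same idea with portable C11 atomics. It is a stand‑in, not the OSAtomic API; pc_node, pc_push, and pc_drain are hypothetical names. Like the OSAtomic functions it stores entries last‑in, first‑out, so the drain step reverses the list to restore call order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct pc_node {
    void *pc;
    struct pc_node *next;
} pc_node;

/* Head of a lock-free LIFO list; zero-initialized (empty) at program start. */
_Atomic(pc_node *) pc_head;

/* Record one PC. Safe to call concurrently from any number of threads. */
void pc_push(void *pc) {
    pc_node *node = malloc(sizeof *node);
    if (node == NULL) return;  /* drop the sample if allocation fails */
    node->pc = pc;
    node->next = atomic_load(&pc_head);
    /* On failure the CAS refreshes node->next with the current head; retry. */
    while (!atomic_compare_exchange_weak(&pc_head, &node->next, node)) {
    }
}

/* Detach the whole list in one atomic step, reverse it into call order, and
   copy up to cap PCs into out. Returns the number of entries written. */
size_t pc_drain(void **out, size_t cap) {
    assert(out != NULL || cap == 0);
    pc_node *node = atomic_exchange(&pc_head, NULL);
    pc_node *ordered = NULL;
    while (node != NULL) {            /* reverse LIFO into first-called-first */
        pc_node *next = node->next;
        node->next = ordered;
        ordered = node;
        node = next;
    }
    size_t n = 0;
    while (ordered != NULL) {
        pc_node *next = ordered->next;
        if (n < cap) out[n++] = ordered->pc;
        free(ordered);
        ordered = next;
    }
    return n;
}
```

In this sketch, pc_push would be called from __sanitizer_cov_trace_pc_guard and pc_drain once the first screen has appeared. Taking the whole list with a single exchange, instead of popping node by node, sidesteps the ABA problem that a concurrent lock‑free pop would have to handle.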

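Filter out duplicate symbols while keeping first‑call order, prepend an underscore to symbols that are not Objective‑C methods (C functions, for example), and write the ordered list to a temporary lb.order file (e.g., in /tmp or the app's temporary directory).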
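The filtering and formatting step might look like the sketch below. It assumes the symbol names have already been collected in call order (for example from dladdr's dli_sname) and that Objective‑C methods can be recognized by their -[ or +[ prefix; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Objective-C method names look like "-[Class sel]" or "+[Class sel]". */
int is_objc_method(const char *s) {
    return (s[0] == '-' || s[0] == '+') && s[1] == '[';
}

/* Write each unique symbol once, in first-seen order, one per line.
   Non-Objective-C symbols get a leading underscore. Returns lines written. */
size_t write_order_entries(FILE *out, const char *const *symbols, size_t n) {
    assert(out != NULL);
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (symbols[i] == NULL) continue;  /* no name was resolved for this PC */
        int dup = 0;
        for (size_t j = 0; j < i && !dup; j++) {
            dup = symbols[j] != NULL && strcmp(symbols[i], symbols[j]) == 0;
        }
        if (dup) continue;
        fprintf(out, "%s%s\n", is_objc_method(symbols[i]) ? "" : "_", symbols[i]);
        written++;
    }
    return written;
}
```

The duplicate check is quadratic, which is simple and adequate for a one‑off collection run; a hash set would be the natural replacement if the symbol list grows large.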

Handling Pitfalls

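Multithreading: The callback can run on any thread, so store PCs in a lock‑free atomic queue rather than a plain shared array.

Loops: With trace-pc-guard alone, every edge is instrumented, so a loop fires the callback on each iteration. Use -fsanitize-coverage=func,trace-pc-guard to instrument only function entries.

Load‑time symbols: Guard values are zero for +load methods, so the if (!*guard) return; check skips them. Either accept that they are missing or remove the guard check to capture them.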

Swift and Mixed Projects

For Swift, add the flags -sanitize-coverage=func and -sanitize=undefined under “Other Swift Flags”. The same instrumentation works for Swift functions.

Post‑Optimization Results

Measure on a fresh install, so that the app's pages are not already cached in memory. After reordering, the page‑fault count during launch drops, which yields a noticeable reduction in launch time (the Douyin team reported over 15%).

Conclusion

By combining virtual‑memory knowledge, clang static instrumentation, and Xcode order‑file linking, developers can automatically discover startup‑critical symbols, generate an order file, and reorder the Mach‑O binary to minimize page faults, thereby accelerating iOS app launch performance.
