
How Kuaishou Optimized iOS App Startup and Prevented Performance Degradation

This article details Kuaishou's systematic approach to iOS app startup optimization, covering premain and postmain phases, dynamic library lazy loading, +load and static initializer monitoring, binary reordering, task scheduling, background fetch, prewarm mechanisms, and a comprehensive anti‑degradation framework to sustain launch performance.

Kuaishou Frontend Engineering

Background

App launch speed directly impacts the user's first impression; faster launches improve retention and drive core business growth.

Kuaishou App Launch Definition

Start point: the process creation time, obtained via sysctl.

End point: for scrollable pages and live streams, the first video frame; for the discovery, follow, and city pages, the first displayed image; for all other scenarios, the completion of viewDidAppear.
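As a rough illustration of the start point, the sketch below reads the process creation time with sysctl and measures the elapsed time at whatever end point the caller chooses. The function names are hypothetical, and the sysctl path is compiled only on Apple platforms; elsewhere the sketch simply reports failure.

```cpp
#include <chrono>
#include <unistd.h>
#if defined(__APPLE__)
#include <sys/sysctl.h>
#endif

// Writes the wall-clock time at which this process was created.
// Returns false when the lookup is unavailable (non-Apple platform) or fails.
bool processStartTime(std::chrono::system_clock::time_point* out) {
#if defined(__APPLE__)
    int mib[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, getpid()};
    struct kinfo_proc info {};
    size_t size = sizeof(info);
    if (sysctl(mib, 4, &info, &size, nullptr, 0) != 0 || size == 0) {
        return false;
    }
    const timeval start = info.kp_proc.p_starttime;
    *out = std::chrono::system_clock::time_point(
        std::chrono::seconds(start.tv_sec) +
        std::chrono::microseconds(start.tv_usec));
    return true;
#else
    (void)out;
    return false;
#endif
}

// Milliseconds since process creation, or -1.0 when the start time is unknown.
// Call this at the chosen end point (first frame, first image, viewDidAppear).
double launchDurationMs() {
    std::chrono::system_clock::time_point start;
    if (!processStartTime(&start)) return -1.0;
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::system_clock::now() - start;
    return elapsed.count();
}
```

Because the start time comes from the kernel rather than from code inside the app, the measurement also covers the premain stage.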

Problem Status

Experience Issues

Users report slow launch in feedback and App Store reviews.

iOS developers report that slow launches hurt debugging efficiency.

Online launch alerts cannot pinpoint the problem.

Technical Issues

Monitoring metrics differ from what users perceive, and premain data is missing.

No launch framework to control tasks; AppDelegate exceeds 20k lines.

No degradation‑prevention mechanism; large teams cause gradual launch slowdown.

Incomplete online monitoring points prevent accurate root‑cause analysis.

Launch Optimization Results

The launch-optimization work reduced P50 launch time by 50% and P90 launch time by 60%. After the degradation-prevention system was built, no online launch slowdown exceeded 50 ms.

Launch Process Overview

The launch consists of two major stages: premain (code parsing, initialization, architecture) and postmain (SDK initialization and home‑page logic). Premain is harder to monitor and control, especially for large apps.

Premain Stage Optimization

1. Premain Execution Flow

Reduce system parsing and business‑logic cost by:

Removing unused code and third‑party libraries, cleaning unused dynamic‑library dependencies, and applying lazy loading for dynamic libraries.

Governing +load methods and static initializers, and applying binary reordering, to cut business-logic cost in premain.

2. Dynamic Library Lazy Loading

Package rarely used business modules and SDKs into separate dynamic libraries and load them after launch based on runtime needs.

Key points:

Lazy‑loaded methods must be invoked through a router that calls dlopen on first use.

Prevent degradation by ensuring lazy libraries are not loaded while launch is still in progress; loading a library through dlopen is slower than having the same code statically linked.
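The router idea can be sketched as a small wrapper that defers dlopen until the first symbol lookup. This is a minimal sketch, not Kuaishou's router: the class name LazyLibrary, the library path, and the symbol name in the usage note are hypothetical.

```cpp
#include <dlfcn.h>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical router entry for one lazily loaded dynamic library.
class LazyLibrary {
public:
    explicit LazyLibrary(std::string path) : path_(std::move(path)) {}

    // True once dlopen has succeeded. A launch-time check can assert this is
    // still false to catch a library that was pulled in too early.
    bool isLoaded() {
        std::lock_guard<std::mutex> lock(mutex_);
        return handle_ != nullptr;
    }

    // Resolves a symbol, calling dlopen on first use. Returns nullptr when
    // the library or the symbol cannot be found.
    void* symbol(const char* name) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (handle_ == nullptr) {
            handle_ = dlopen(path_.c_str(), RTLD_LAZY | RTLD_LOCAL);
            if (handle_ == nullptr) return nullptr;
        }
        return dlsym(handle_, name);
    }

private:
    std::string path_;
    std::mutex mutex_;
    void* handle_ = nullptr;
};
```

A caller would cast the result to the expected function type, for example `reinterpret_cast<void (*)()>(lib.symbol("rare_feature_entry"))`, and handle the nullptr case before calling it.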

3. +load Monitoring and Governance

Adding code to +load is discouraged because it runs very early, crashes inside it cannot be captured by stability SDKs, and it blocks the main thread.

How to monitor +load

Identify the first executed +load by analyzing dylib dependency order and attaching a monitor library.

Traverse __objc_nlclslist and __objc_nlcatlist in the main binary and dynamic libraries to list all +load classes and categories.

From within that first +load, swap each discovered +load with a wrapper that records its execution time.

How to govern +load

For existing +load methods, work with the owners to delete them, delay them, lazy-load them, or move the logic to +initialize (taking care of category method overrides).

For new +load methods, add MR-pipeline checks that warn about and block merges containing the +load keyword.
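A pipeline check of this kind can be approximated by scanning the added lines of a unified diff. The sketch below is an assumption about how such a check might look, not the team's actual tooling; the function name and the regular expression are illustrative.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based line numbers within a unified diff where an added line
// declares a +load method. Removed lines, context lines, and the "+++" file
// header are ignored.
std::vector<int> findAddedLoadMethods(const std::string& unifiedDiff) {
    // The first '+' is the diff marker; the second starts "+ (void)load".
    static const std::regex pattern(R"(^\+\s*\+\s*\(\s*void\s*\)\s*load\b)");
    std::vector<int> hits;
    std::istringstream stream(unifiedDiff);
    std::string line;
    int lineNumber = 0;
    while (std::getline(stream, line)) {
        ++lineNumber;
        if (std::regex_search(line, pattern)) {
            hits.push_back(lineNumber);
        }
    }
    return hits;
}
```

A non-empty result would cause the pipeline to warn the author and hold the merge until the change is approved. The word-boundary match keeps methods such as loadData from being flagged.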

4. Static Initializer Monitoring and Governance

Static initializers run after +load and include patterns such as:

__attribute__((constructor)) functions.

Global variables initialized by function calls.

C++ global object constructors.

Global std::string objects initialized from literals.

Objective‑C objects created in global scope.

Example:

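<code>// Objective-C++ (.mm) source.
#import <Foundation/Foundation.h>
#include <string>
#include <unistd.h>  // usleep

// 1. Constructor attribute: runs before main().
__attribute__((constructor)) static void test1() {
    usleep(2000);
    NSLog(@"test1");
}</code>
<code>// 2. Global variables initialized by a function call.
bool test2() {
    usleep(3000);
    NSLog(@"test2");
    return true;
}
static bool global_c_1 = test2();
bool global_c_2 = test2();</code>
<code>// 3. C++ global objects: each one runs the constructor before main().
class Test3 {
public:
    Test3() {
        usleep(4000);
        NSLog(@"test3");
    }
};
static Test3 test3_1 = Test3();
Test3 test3_2;
Test3 test3_3;</code>
<code>// 4. Global std::string built from a literal.
const std::string test4 = "1234";</code>
<code>// 5. Objective-C objects created at global scope.
static NSDictionary *dictObject5_1 = @{ @"one": @"1" };
NSDictionary *dictObject5_2 = @{ @"one": @"1", @"two": @"2" };</code>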

How to monitor static initializers

Use Instruments' “Static Initializer Calls” feature.

Hook the __mod_init_func section in each Mach‑O to record execution time, fixing permissions with vm_protect when needed.

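How to govern static initializers

Apply the same governance as +load to __attribute__((constructor)) functions.

Replace global variables with local ones where possible, so that initialization happens on first use rather than in premain.

Control newly added static initializers with the same pipeline checks used for +load.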
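One common way to replace a global with a local is a function-local static, which C++ initializes on first use instead of before main(). The sketch below uses a hypothetical ErrorTable type and a counter that exists only to make the laziness observable.

```cpp
#include <map>
#include <string>

// Counts constructions so the laziness can be observed. A plain int with a
// constant initializer does not itself require a static initializer.
int g_tableBuilds = 0;

// Hypothetical lookup table that is expensive enough to matter at launch.
struct ErrorTable {
    std::map<int, std::string> messages;
    ErrorTable() {
        ++g_tableBuilds;
        messages[404] = "not found";
        messages[500] = "server error";
    }
};

// Before: `static ErrorTable g_errorTable;` at namespace scope would run the
// constructor in premain. After: the constructor runs on the first call only.
const ErrorTable& errorTable() {
    static const ErrorTable table;  // initialization is thread-safe since C++11
    return table;
}
```

Callers switch from reading the global to calling `errorTable()`, and the construction cost moves out of premain to the first real use.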

5. Binary Reordering Technique

Basic Principle

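Virtual-to-physical memory mapping is lazy: touching a page that is not yet resident triggers a page-in interrupt, which is costly. The code executed during launch (including +load methods and static initializers) is scattered across the binary, so launch touches many pages and triggers many page-ins.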

By reordering code to be contiguous, multiple page‑ins collapse into one, reducing launch latency.
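The effect can be estimated by counting how many distinct pages the launch-time functions occupy before and after reordering. The sketch below is a simplified model that assumes 16 KB pages (the page size on arm64 iOS devices) and treats every touched page as one potential page-in; the FunctionRange type and the offsets in the usage note are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Position of one function inside the __TEXT segment (illustrative model).
struct FunctionRange {
    std::uint64_t offset;  // byte offset of the function's first instruction
    std::uint64_t size;    // function size in bytes
};

// Counts the distinct pages covered by the given functions. Each distinct
// page is a potential page-in the first time launch code executes from it.
std::size_t pagesTouched(const std::vector<FunctionRange>& functions,
                         std::uint64_t pageSize = 16 * 1024) {
    std::set<std::uint64_t> pages;
    for (const FunctionRange& f : functions) {
        if (f.size == 0) continue;
        const std::uint64_t first = f.offset / pageSize;
        const std::uint64_t last = (f.offset + f.size - 1) / pageSize;
        for (std::uint64_t page = first; page <= last; ++page) {
            pages.insert(page);
        }
    }
    return pages.size();
}
```

For example, four 256-byte functions at offsets 0, 100000, 500000, and 900000 occupy four pages, while the same four functions packed at offsets 0, 256, 512, and 768 occupy one.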

Generate Orderfile

Use Xcode’s Clang instrumentation with -fsanitize-coverage=func,trace-pc-guard for C/C++/ObjC and -sanitize-coverage=func for Swift. Collect symbols via __sanitizer_cov_trace_pc_guard and deduplicate before dladdr lookup.
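The deduplication step can be sketched as follows. This is a minimal sketch under stated assumptions: the program counters are taken to have been recorded already by the coverage callback, and `resolve` is a hypothetical stand-in for a dladdr-based lookup that returns the symbol name in the form the linker expects, or an empty string for an unknown address.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Builds order-file text (one symbol per line) from program counters in the
// order they were first recorded during launch.
std::string buildOrderFile(
    const std::vector<std::uintptr_t>& recordedPCs,
    const std::function<std::string(std::uintptr_t)>& resolve) {
    std::unordered_set<std::uintptr_t> seenPCs;
    std::unordered_set<std::string> seenSymbols;
    std::string orderFile;
    for (std::uintptr_t pc : recordedPCs) {
        // Deduplicate addresses first so each one is resolved at most once.
        if (!seenPCs.insert(pc).second) continue;
        const std::string name = resolve(pc);
        // Skip unknown addresses and symbols that were already emitted.
        if (name.empty() || !seenSymbols.insert(name).second) continue;
        orderFile += name;
        orderFile += '\n';
    }
    return orderFile;
}
```

Keeping first-seen order matters here, because the linker lays symbols out in the order they appear in the file.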

Configure Orderfile

Add the generated orderfile to Xcode’s “Order File” build setting and enable “Write Link Map File” so the resulting symbol order can be verified in the link map.

Full‑process Automation

Automate orderfile generation for each build, including all dynamic libraries, to keep reordering up‑to‑date.

6. Other Premain Optimizations

Renaming text segments and removing unused classes have limited ROI for launch speed but help with binary size.

Postmain Stage Optimization

The postmain stage covers SDK initialization and home-page rendering, and it offers a higher ROI for short-term launch speed gains.

Three‑step Practice

Use a launch framework to list and keep only necessary tasks.

Balance main‑thread and background‑thread work; pre‑fetch network requests early.

Analyze bottlenecks with flame graphs (built by hooking objc_msgSend) and Instruments’ App Launch tool.

Typical cases:

Singleton initialization moved to background.

Global config serialized more efficiently.

Fishhook call deferred to background thread.

Load only the needed language resources.

Prevent premature lazy‑library loading.

Pre‑load heavy resources (images, player, language files) on background threads.

Avoid heavy system‑info calls on the main thread.

Replace simple animations with Bézier drawing.

Lazy‑load non‑primary tabs.

Progressively render non‑critical components.

Task Scheduling Framework

The first version scheduled tasks as a dependency mesh across multiple cores, but this caused main-thread waits on low-end devices. It was revised to run tasks sequentially: main-thread tasks first, then background-thread tasks.

TTI (Time‑to‑Interactive) Task Scheduling

TTI is defined as the interval from launch completion to 300 consecutive frames each under 84 ms.
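The definition can be turned into a small calculation over recorded frame durations. The sketch below assumes that TTI ends at the start of the first qualifying run of frames; the source does not say whether the start or the end of the run is used, so treat that choice, and the function name, as assumptions.

```cpp
#include <cstddef>
#include <vector>

// Returns the time in milliseconds from launch completion to the start of the
// first run of `window` consecutive frames that are each shorter than
// `thresholdMs`. Returns -1.0 when no such run exists in the recording.
double timeToInteractiveMs(const std::vector<double>& frameDurationsMs,
                           std::size_t window = 300,
                           double thresholdMs = 84.0) {
    if (window == 0) return 0.0;
    double elapsed = 0.0;   // time at the start of the current frame
    double runStart = 0.0;  // time at which the current smooth run began
    std::size_t runLength = 0;
    for (double duration : frameDurationsMs) {
        if (duration < thresholdMs) {
            if (runLength == 0) runStart = elapsed;
            if (++runLength == window) return runStart;
        } else {
            runLength = 0;  // a slow frame breaks the run
        }
        elapsed += duration;
    }
    return -1.0;
}
```

With a window of 3 and frames of 100, 16, 90, 16, 16, and 16 ms, the first smooth run starts after 206 ms, so the function returns 206.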

TTI Optimization Ideas

Identify root causes via flame graphs and TimeProfiler.

Distribute tasks across frames using CADisplayLink and run‑loop scheduling.

Record each task’s execution time and schedule only if remaining frame budget exceeds the previous task’s cost.
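The frame-budget idea can be sketched as a queue that is drained a little on every frame. This is a minimal sketch, assuming that the cost compared against the remaining budget is each task's own previously recorded execution time; the type names are hypothetical, and in an app `runFrame` would be driven by a CADisplayLink callback.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct DeferredTask {
    std::string name;
    double recordedCostMs;        // execution time recorded on an earlier run
    std::function<void()> work;
};

class FrameBudgetScheduler {
public:
    void add(DeferredTask task) { queue_.push_back(std::move(task)); }

    // Call once per frame with the time left in that frame. Runs queued tasks
    // in order while the remaining budget exceeds the next task's recorded
    // cost, and returns the names of the tasks that ran.
    std::vector<std::string> runFrame(double remainingMs) {
        std::vector<std::string> ran;
        while (!queue_.empty()) {
            DeferredTask& next = queue_.front();
            const bool fits = remainingMs > next.recordedCostMs;
            // Assumption: always run at least one task per frame so that an
            // oversized task cannot stall the queue forever.
            if (!fits && !ran.empty()) break;
            if (next.work) next.work();
            remainingMs -= next.recordedCostMs;
            ran.push_back(next.name);
            queue_.pop_front();
        }
        return ran;
    }

private:
    std::deque<DeferredTask> queue_;
};
```

With tasks costing 4, 5, and 3 ms and a 10 ms budget, the first frame runs the first two tasks and the third waits for the next frame.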

Background Launch

iOS Background Fetch allows the app to update content in the background, turning a cold launch into a hot launch when the user opens the app.

How to Enable Background Fetch

Enable “Background Fetch” in Xcode > Capabilities > Background Modes.

Call setMinimumBackgroundFetchInterval: in application:didFinishLaunchingWithOptions:.

Implement application:performFetchWithCompletionHandler: to request data.

Debugging Background Fetch

In Xcode scheme, check “Launch due to a background fetch event”.

While the app runs, select “Simulate Background Fetch” from the Debug menu.

Prewarm Mechanism (iOS 15)

Starting with iOS 15, the system may prewarm the app process before the user opens it, reducing perceived launch time. Prewarming runs the launch sequence up to, but not including, the call to UIApplicationMain.

Degradation‑Prevention Technical Construction

The degradation-prevention system combines offline scanning, automated launch-lab jobs, weekly version comparison, and gray-scale data to detect and block performance regressions.

1. Pipeline Static Scan

During MR review, the diff is scanned for new +load methods, static initializers, and launch-framework code; merges are blocked until approved.

2. Launch Lab Scheduled Jobs

Every 6 hours, a scheduled job builds the current dev branch and the dev branch as it stood 6 hours earlier, runs 30 automated launch tests, and compares the metrics to detect degradation.
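The comparison step might look like the sketch below. The source does not say which statistic or threshold the lab uses, so the median and the `thresholdMs` parameter are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of the samples; takes a copy so the caller's data is not reordered.
double median(std::vector<double> samples) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 == 1
               ? samples[mid]
               : (samples[mid - 1] + samples[mid]) / 2.0;
}

// True when the current build's median launch time is slower than the
// baseline build's median by more than thresholdMs.
bool hasLaunchRegression(const std::vector<double>& baselineMs,
                         const std::vector<double>& currentMs,
                         double thresholdMs) {
    if (baselineMs.empty() || currentMs.empty()) return false;
    return median(currentMs) - median(baselineMs) > thresholdMs;
}
```

Using the median of repeated runs rather than a single run reduces the chance that one noisy launch triggers a false alarm.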

3. Weekly Version Comparison

When a dev branch is released, an automated job compares its launch metrics with the previous week’s build and reports any regression.

4. Gray‑scale Data

Limited gray‑scale testing provides pre‑release insight, though Apple’s restrictions keep sample size small.

Online Issue Localization

Online localization relies on telemetry points across the premain, willLaunch/didLaunch, coverShow, and afterLaunch stages to monitor task durations and detect anomalies.

Conclusion

The article presented a comprehensive view of iOS app launch phases, optimization techniques, background launch strategies, and a robust anti‑degradation framework that together safeguard launch performance gains.

References

AppOrderFiles: https://github.com/yulingtianxia/AppOrderFiles

Prepare Your App for Prewarming: https://developer.apple.com/documentation/uikit/app_and_environment/responding_to_the_launch_of_your_app/about_the_app_launch_sequence#3894431
