How Kuaishou Optimized iOS App Startup and Prevented Performance Degradation
This article details Kuaishou's systematic approach to iOS app startup optimization, covering premain and postmain phases, dynamic library lazy loading, +load and static initializer monitoring, binary reordering, task scheduling, background fetch, prewarm mechanisms, and a comprehensive anti‑degradation framework to sustain launch performance.
Background
App launch speed directly impacts the user's first impression; faster launches improve retention and drive core business growth.
Kuaishou App Launch Definition
Start point: the process creation time, obtained via sysctl.
End point: for scrollable pages and live streams, the first video frame; for the discovery, follow, and city pages, the first displayed image; for all other scenarios, the completion of viewDidAppear.
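The start point can be read from the kernel as sketched below. This is a minimal illustration, not Kuaishou's actual code: the function names are hypothetical, only the Apple code path is filled in, and other platforms simply report failure.

```cpp
#include <cassert>
#include <chrono>

#if defined(__APPLE__)
#include <sys/sysctl.h>
#include <sys/types.h>
#include <unistd.h>
#endif

// Wall-clock time (seconds since the Unix epoch) at which this process was
// created, or -1.0 if the kernel query is unavailable or fails.
double processStartTimeSeconds() {
#if defined(__APPLE__)
    int mib[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, getpid()};
    struct kinfo_proc info {};
    size_t size = sizeof(info);
    if (sysctl(mib, 4, &info, &size, nullptr, 0) != 0 || size == 0) {
        return -1.0;
    }
    const struct timeval start = info.kp_proc.p_starttime;
    return static_cast<double>(start.tv_sec) + start.tv_usec / 1e6;
#else
    return -1.0;  // Only the Apple path is sketched here.
#endif
}

// Current wall-clock time in seconds, to be sampled at the launch end point.
double nowSeconds() {
    using namespace std::chrono;
    return duration<double>(system_clock::now().time_since_epoch()).count();
}

// Launch duration in milliseconds between the start point and an end point
// (for example, the moment the first video frame is shown).
double launchDurationMs(double startSeconds, double endSeconds) {
    assert(endSeconds >= startSeconds);
    return (endSeconds - startSeconds) * 1000.0;
}
```

At the end point, the app would call something like launchDurationMs(processStartTimeSeconds(), nowSeconds()) and report the result, after checking that the start time is not -1.0.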
Problem Status
Experience Issues
Users report slow launch in feedback and App Store reviews.
iOS development groups report slow launch affecting debugging efficiency.
Online launch alerts cannot pinpoint the problem.
Technical Issues
Monitoring metrics differ from user perception; lack of premain data.
No launch framework to control tasks; AppDelegate exceeds 20k lines.
No degradation‑prevention mechanism; large teams cause gradual launch slowdown.
Incomplete online monitoring points prevent accurate root‑cause analysis.
Launch Optimization Results
Launch-optimization governance cut P50 launch time by 50% and P90 launch time by 60%. After the degradation-prevention system was built, no online launch slowdown exceeded 50 ms.
Launch Process Overview
The launch consists of two major stages: premain (code parsing, initialization, architecture) and postmain (SDK initialization and home‑page logic). Premain is harder to monitor and control, especially for large apps.
Premain Stage Optimization
1. Premain Execution Flow
Reduce system parsing and business‑logic cost by:
Removing unused code and third‑party libraries, cleaning unused dynamic‑library dependencies, and applying lazy loading for dynamic libraries.
Governing +load methods and static initializers, and applying binary reordering, to cut business-logic cost in premain.
2. Dynamic Library Lazy Loading
Package rarely used business modules and SDKs into separate dynamic libraries and load them after launch based on runtime needs.
Key points:
Lazy‑loaded methods must be invoked through a router that calls dlopen on first use.
Prevent degradation by ensuring lazy libraries are not loaded before launch completes; loading through dlopen is slower than static linking.
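A router along these lines might look as follows. This is a sketch under assumptions: the class name, the library path, and the symbol names are hypothetical, and the premature-load list is just one possible way to detect a lazy library being pulled in before launch completes.

```cpp
#include <dlfcn.h>

#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

class LazyLibraryRouter {
public:
    // Call once the launch end point has been reached.
    void markLaunchFinished() { launchFinished_.store(true); }

    // Returns the address of `symbol` in the library at `path`, loading the
    // library with dlopen on first use. Returns nullptr on failure.
    void *resolve(const std::string &path, const std::string &symbol) {
        assert(!path.empty());
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = handles_.find(path);
        if (it == handles_.end()) {
            // A load attempt during launch defeats the purpose of lazy
            // loading, so remember it for reporting.
            if (!launchFinished_.load()) {
                prematureLoads_.push_back(path);
            }
            void *handle = dlopen(path.c_str(), RTLD_LAZY | RTLD_LOCAL);
            if (handle == nullptr) {
                return nullptr;
            }
            it = handles_.emplace(path, handle).first;
        }
        return dlsym(it->second, symbol.c_str());
    }

    // Paths whose first load was attempted before launch finished.
    std::vector<std::string> prematureLoads() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return prematureLoads_;
    }

private:
    mutable std::mutex mutex_;
    std::atomic<bool> launchFinished_{false};
    std::unordered_map<std::string, void *> handles_;
    std::vector<std::string> prematureLoads_;
};
```

Callers cast the returned pointer to the expected function type. A lab or debug build could assert that prematureLoads() is empty at the end of launch.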
3. +load Monitoring and Governance
Adding code to +load is discouraged: it runs before main, so crashes there may not be captured by stability SDKs, and it blocks the main thread.
How to monitor +load
Identify the first +load to execute by analyzing dylib dependency order, and attach a monitor library there.
Traverse __objc_nlclslist and __objc_nlcatlist in the main binary and dynamic libraries to list all classes and categories that implement +load.
Swap the first +load with a wrapper that records the execution time of each discovered +load.
How to govern +load
For existing +load methods, work with their owners to delete them, delay them, lazy-load them, or move the code to +initialize (taking care of category method overrides).
For new +load methods, add MR-pipeline checks that detect the +load keyword, warn the author, and block the merge.
4. static initializer Monitoring and Governance
Static initializers run after +load and include patterns such as:
__attribute__((constructor)) functions.
Global variables initialized by function calls.
C++ global object constructors.
Global std::string literals.
Objective‑C objects created in global scope.
Example:
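<code>// Objective-C++ (.mm) source: each initializer below runs before main().
#import <Foundation/Foundation.h>
#include <string>
#include <unistd.h>

// 1. Constructor-attribute function
__attribute__((constructor)) static void test1() { usleep(2000); NSLog(@"test1"); }</code>
<code>// 2. Globals initialized by a function call
bool test2() { usleep(3000); NSLog(@"test2"); return true; }
static bool global_c_1 = test2();
bool global_c_2 = test2();</code>
<code>// 3. C++ global objects with non-trivial constructors
class Test3 { public: Test3() { usleep(4000); NSLog(@"test3"); } };
static Test3 test3_1 = Test3();
Test3 test3_2;
Test3 test3_3;</code>
<code>// 4. Global std::string built from a literal
const std::string test4 = "1234";</code>
<code>// 5. Objective-C objects created at global scope
static NSDictionary *dictObject5_1 = @{ @"one": @"1" };
NSDictionary *dictObject5_2 = @{ @"one": @"1", @"two": @"2" };</code>
How to monitor static initializer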
Use Instruments' “Static Initializer Calls” feature.
Hook the __mod_init_func section in each Mach‑O to record execution time, fixing permissions with vm_protect when needed.
How to govern static initializer
Apply the same governance as for +load to __attribute__((constructor)) functions.
Replace global variables with local (function-scope) variables where possible.
Control newly added static initializers with the same techniques as for +load.
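Replacing a global with a local can be sketched in plain C++ as below. The class and function names are hypothetical and the construction counter exists only to make the deferral observable.

```cpp
#include <cassert>
#include <string>

// Counts constructions so the deferral can be observed. A plain int with a
// constant initializer does not need a static initializer itself.
static int g_constructionCount = 0;

class ExpensiveConfig {
public:
    ExpensiveConfig() : name_("default") { ++g_constructionCount; }
    const std::string &name() const { return name_; }

private:
    std::string name_;
};

// Before: `static ExpensiveConfig g_config;` at namespace scope would be
// constructed by a static initializer before main().
// After: a function-local static is constructed on the first call only.
const ExpensiveConfig &sharedConfig() {
    static const ExpensiveConfig config;
    assert(g_constructionCount == 1);  // built exactly once, on first use
    return config;
}
```

Nothing is constructed until some code calls sharedConfig(), so the cost moves out of premain and is paid only by launches that actually need the object.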
5. Binary Reordering Technique
Basic Principle
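Virtual-to-physical memory mapping is lazy: the first access to a page that is not yet resident triggers a page fault (page-in), which is costly. Code that runs during launch, including +load methods and static initializers, is scattered across the binary, so launch touches many pages and incurs many page-ins.
Reordering places that code contiguously, so it occupies far fewer pages and needs fewer page-ins, which reduces launch latency.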
Generate Orderfile
Use Clang's SanitizerCoverage instrumentation in Xcode: -fsanitize-coverage=func,trace-pc-guard for C/C++/Objective-C and -sanitize-coverage=func for Swift. Record each executed function's address in the __sanitizer_cov_trace_pc_guard callback, deduplicate the addresses, and then resolve them to symbol names with dladdr.
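The post-processing step (deduplicate, then symbolize) can be sketched as follows. The function names are hypothetical, and the symbolizer is passed in as a callback so the logic stays testable; on device it would wrap dladdr and return the dli_sname field. The leading underscore for non-Objective-C symbols is an assumption based on how Mach-O names C symbols.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Keeps the first occurrence of each address, preserving call order.
std::vector<std::uintptr_t> dedupePreservingOrder(
    const std::vector<std::uintptr_t> &pcs) {
    std::unordered_set<std::uintptr_t> seen;
    std::vector<std::uintptr_t> unique;
    for (std::uintptr_t pc : pcs) {
        if (seen.insert(pc).second) {
            unique.push_back(pc);
        }
    }
    return unique;
}

// Maps an address to a symbol name, or to an empty string if unknown.
using Symbolizer = std::function<std::string(std::uintptr_t)>;

// Builds order-file text: one symbol per line, in first-call order.
std::string buildOrderFile(const std::vector<std::uintptr_t> &pcs,
                           const Symbolizer &symbolize) {
    assert(static_cast<bool>(symbolize));
    std::unordered_set<std::string> emitted;
    std::string out;
    for (std::uintptr_t pc : dedupePreservingOrder(pcs)) {
        std::string name = symbolize(pc);
        if (name.empty()) {
            continue;  // address could not be resolved
        }
        const bool isObjC = name.compare(0, 2, "+[") == 0 ||
                            name.compare(0, 2, "-[") == 0;
        if (!isObjC) {
            name.insert(0, "_");  // assumed Mach-O prefix for C symbols
        }
        if (emitted.insert(name).second) {
            out += name + "\n";
        }
    }
    return out;
}
```

Deduplicating addresses first keeps the number of comparatively expensive symbol lookups down to one per distinct address.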
Configure Orderfile
Set the generated order file in Xcode’s “Order File” build setting and enable the link map to verify the resulting symbol order.
Full‑process Automation
Automate orderfile generation for each build, including all dynamic libraries, to keep reordering up‑to‑date.
6. Other Premain Optimizations
Renaming text segments and removing unused classes have limited ROI for launch speed but help with binary size.
Postmain Stage Optimization
Focuses on SDK initialization and home‑page rendering; offers higher ROI for short‑term launch speed gains.
Three‑step Practice
Use a launch framework to list and keep only necessary tasks.
Balance main‑thread and background‑thread work; pre‑fetch network requests early.
Analyze bottlenecks with flame graphs (built by hooking objc_msgSend) and Instruments’ App Launch tool.
Typical cases:
Singleton initialization moved to background.
Global config serialized more efficiently.
Fishhook call deferred to background thread.
Load only the needed language resources.
Prevent premature lazy‑library loading.
Pre‑load heavy resources (images, player, language files) on background threads.
Avoid heavy system‑info calls on the main thread.
Replace simple animations with Bézier drawing.
Lazy‑load non‑primary tabs.
Progressively render non‑critical components.
Task Scheduling Framework
The initial scheduler ran a mesh of dependent tasks across multiple cores, but it caused main-thread waits on low-end devices. It was revised to run main-thread tasks sequentially first, followed by background-thread tasks.
TTI (Time‑to‑Interactive) Task Scheduling
TTI is defined as the interval from launch completion to 300 consecutive frames each under 84 ms.
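The definition can be turned into a small calculation over recorded frame durations. This sketch makes one assumption the text does not settle: TTI is measured to the start of the first qualifying run of frames rather than to its end. The function name is hypothetical; the window size and threshold default to the values above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the time in milliseconds from launch completion to the START of
// the first run of `windowSize` consecutive frames that each took less than
// `thresholdMs`. Returns -1.0 if no such run exists in the recording.
// `frameDurationsMs[i]` is the duration of the i-th frame after launch
// completion.
double timeToInteractiveMs(const std::vector<double> &frameDurationsMs,
                           std::size_t windowSize = 300,
                           double thresholdMs = 84.0) {
    assert(windowSize > 0);
    double elapsedMs = 0.0;   // start time of the frame being examined
    double runStartMs = 0.0;  // start time of the current run of fast frames
    std::size_t runLength = 0;
    for (double duration : frameDurationsMs) {
        if (duration < thresholdMs) {
            if (runLength == 0) {
                runStartMs = elapsedMs;
            }
            if (++runLength == windowSize) {
                return runStartMs;
            }
        } else {
            runLength = 0;  // a slow frame breaks the run
        }
        elapsedMs += duration;
    }
    return -1.0;
}
```

With a window of 3 and durations 100, 16, 16, 90, 16, 16, 16, the first qualifying run starts after the 90 ms frame, so the function returns 222.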
TTI Optimization Ideas
Identify root causes via flame graphs and Time Profiler.
Distribute tasks across frames using CADisplayLink and run-loop scheduling.
Record each task’s execution time, and run a task in the current frame only if the remaining frame budget exceeds its recorded cost.
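The frame-budget idea can be sketched as below. This is an illustration, not Kuaishou's scheduler: the class name is hypothetical, the budget check compares against each task's previously recorded cost (one reading of the rule above), and the clock is injected so the logic can be exercised without a real display link.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class FrameBudgetScheduler {
public:
    using Clock = std::function<double()>;  // monotonic time in milliseconds

    explicit FrameBudgetScheduler(Clock now) : now_(std::move(now)) {
        assert(static_cast<bool>(now_));
    }

    // `recordedCostMs` is the task's cost as measured on an earlier run.
    void add(std::string name, std::function<void()> work,
             double recordedCostMs) {
        queue_.push_back(Task{std::move(name), std::move(work), recordedCostMs});
    }

    // Call once per frame, for example from a CADisplayLink callback.
    // Runs queued tasks in order while the remaining budget exceeds the next
    // task's recorded cost, and returns the names of the tasks it ran.
    std::vector<std::string> runFrame(double budgetMs) {
        std::vector<std::string> ran;
        const double frameStart = now_();
        while (!queue_.empty()) {
            const double remaining = budgetMs - (now_() - frameStart);
            // Always run at least one task per frame so that an oversized
            // task cannot starve the queue.
            if (!ran.empty() && remaining <= queue_.front().recordedCostMs) {
                break;
            }
            Task task = std::move(queue_.front());
            queue_.pop_front();
            const double before = now_();
            task.work();
            measuredCostMs_[task.name] = now_() - before;
            ran.push_back(task.name);
        }
        return ran;
    }

    std::size_t pending() const { return queue_.size(); }

    // Cost measured for `name` in this session, or -1.0 if it has not run.
    double measuredCostMs(const std::string &name) const {
        auto it = measuredCostMs_.find(name);
        return it == measuredCostMs_.end() ? -1.0 : it->second;
    }

private:
    struct Task {
        std::string name;
        std::function<void()> work;
        double recordedCostMs;
    };

    Clock now_;
    std::deque<Task> queue_;
    std::unordered_map<std::string, double> measuredCostMs_;
};
```

The measured costs could be persisted and fed back in as recordedCostMs on the next launch, so the estimates track what each task actually costs on the device.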
Background Launch
iOS Background Fetch allows the app to update content in the background, turning a cold launch into a hot launch when the user opens the app.
How to Enable Background Fetch
Enable “Background Fetch” in Xcode > Capabilities > Background Modes.
Call setMinimumBackgroundFetchInterval: in application:didFinishLaunchingWithOptions:.
Implement application:performFetchWithCompletionHandler: to request data.
Debugging Background Fetch
In Xcode scheme, check “Launch due to a background fetch event”.
While the app runs, select “Simulate Background Fetch” from the Debug menu.
Prewarm Mechanism (iOS 15)
The system may prewarm the app process before the user opens it, reducing perceived launch time. Prewarming runs the launch sequence up to, but not including, the call to UIApplicationMain in main().
Degradation‑Prevention Technical Construction
Combines offline scanning, automated launch‑lab jobs, weekly version comparison, and gray‑scale data to detect and block performance regressions.
1. Pipeline Static Scan
During MR review, the diff is scanned for new +load methods, static initializers, and launch-framework code; the merge is blocked until the change is approved.
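A diff scan of this kind can be approximated with two regular expressions over a unified diff. This is a simplified sketch: the rule names and the Finding type are hypothetical, only +load definitions and __attribute__((constructor)) are covered, and a real check would also need to handle other static-initializer patterns and launch-framework files.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

struct Finding {
    std::size_t line;  // 1-based line number within the diff text
    std::string rule;  // which check fired
};

// Scans a unified diff and reports added lines that introduce a +load
// method or a constructor-attribute function.
std::vector<Finding> scanDiff(const std::string &unifiedDiff) {
    // Both patterns require a leading '+', so they only match lines ADDED
    // by the diff. The "+++ b/file" header does not match either pattern.
    static const std::regex loadMethod(
        R"(^\+\s*\+\s*\(\s*void\s*\)\s*load\b)");
    static const std::regex constructorAttr(
        R"(^\+.*__attribute__\s*\(\(\s*constructor)");

    std::vector<Finding> findings;
    std::istringstream stream(unifiedDiff);
    std::string text;
    std::size_t lineNumber = 0;
    while (std::getline(stream, text)) {
        ++lineNumber;
        if (std::regex_search(text, loadMethod)) {
            findings.push_back(Finding{lineNumber, "+load"});
        } else if (std::regex_search(text, constructorAttr)) {
            findings.push_back(Finding{lineNumber, "constructor"});
        }
    }
    return findings;
}

// The pipeline would block the merge while any finding is unapproved.
bool mergeAllowed(const std::vector<Finding> &findings) {
    return findings.empty();
}
```

Removed lines (those starting with '-') and methods whose names merely begin with "load", such as loadData, are not flagged.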
2. Launch Lab Scheduled Jobs
Every 6 hours, a job builds the current dev branch and the dev branch as it was 6 hours earlier, runs 30 automated launch tests, and compares the metrics to detect degradation.
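The comparison step might be as simple as the sketch below. The choice of the median as the metric and the threshold value are assumptions for illustration; the article does not state which statistic or cutoff the lab job uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty sample set (taken by value so it can be sorted).
double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 == 1
               ? samples[mid]
               : (samples[mid - 1] + samples[mid]) / 2.0;
}

struct Comparison {
    double baselineMs;   // median launch time of the older build
    double candidateMs;  // median launch time of the current build
    double deltaMs;      // candidate minus baseline
    bool degraded;       // true if the slowdown exceeds the threshold
};

// Compares launch-time samples (in milliseconds) from two builds.
Comparison compareLaunchRuns(const std::vector<double> &baseline,
                             const std::vector<double> &candidate,
                             double thresholdMs) {
    Comparison result;
    result.baselineMs = median(baseline);
    result.candidateMs = median(candidate);
    result.deltaMs = result.candidateMs - result.baselineMs;
    result.degraded = result.deltaMs > thresholdMs;
    return result;
}
```

Using the median over the repeated runs keeps a single slow outlier from triggering a false alarm.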
3. Weekly Version Comparison
When a dev branch is released, an automated job compares its launch metrics with the previous week’s build and reports any regression.
4. Gray‑scale Data
Limited gray‑scale testing provides pre‑release insight, though Apple’s restrictions keep sample size small.
Online Issue Localization
Telemetry points across the premain, willLaunch/didLaunch, coverShow, and afterLaunch stages record task durations so that anomalies can be detected and localized.
Conclusion
The article presented a comprehensive view of iOS app launch phases, optimization techniques, background launch strategies, and a robust anti‑degradation framework that together safeguard launch performance gains.
References
AppOrderFiles: https://github.com/yulingtianxia/AppOrderFiles
Prepare Your App for Prewarming: https://developer.apple.com/documentation/uikit/app_and_environment/responding_to_the_launch_of_your_app/about_the_app_launch_sequence#3894431
Kuaishou Frontend Engineering