
Baidu App Android Startup Performance Optimization: Theory, Tools, and Practical Implementations

Baidu dramatically accelerated its Android app’s launch by dissecting the cold‑start sequence, applying full‑path analysis, leveraging tracing tools, introducing a priority‑aware task‑scheduling framework, replacing SharedPreferences with the binary UniKV store, eliminating lock contention, and tightening thread, I/O, and library loading, which together cut ANR rates and boosted user retention.

Baidu App Technology

Startup performance is one of the most critical metrics for the Baidu App. Users expect the app to respond quickly; long launch times lead to poor user experience, low store ratings, and even abandonment. Consequently, Baidu has invested heavily in continuous startup performance optimization.

The optimization work is divided into four parts: an overview, tooling, concrete optimization techniques, and anti-degradation measures. Earlier articles in the series cover the overview and tooling in detail.

2. Optimization Theory

How well the Android launch process is understood determines both the direction and the effectiveness of optimization. According to Google's documentation, once the system has created the app process during a cold start, the app process performs these steps:

Create the Application object;

Start the main thread;

Create the main Activity;

Inflate the view hierarchy;

Layout the screen;

Perform the initial draw.

After the first draw, the system swaps the background window with the main Activity, and the user can interact with the app.

In practice, many launch paths exist (icon click, push notification, browser deep link, etc.). Each path can be broken down into four stages: process creation, framework loading, home‑page rendering, and pre‑loading. Optimizing only the icon‑click path is insufficient; a full‑path optimization is required for a truly smooth experience.
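To make full-path optimization measurable, each launch can be tagged with its entry path and timed per stage, so that a slow push-notification launch is visible even when icon launches look fine. The sketch below is a minimal, framework-free illustration under stated assumptions: the enum names, the `StartupTrace` class, and the injected millisecond clock are hypothetical and not part of Baidu's tooling (on a device, the clock could be `SystemClock.uptimeMillis()`).

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.LongSupplier;

// The four stages every launch path is split into.
enum Stage { PROCESS_CREATION, FRAMEWORK_LOADING, HOME_RENDERING, PRELOADING }

// Hypothetical entry points; a real app would have more.
enum LaunchPath { ICON_CLICK, PUSH_NOTIFICATION, DEEP_LINK }

// Times each stage of one launch so results can be compared per launch path.
final class StartupTrace {
    private final LaunchPath path;
    private final LongSupplier clockMs; // injected so the sketch is deterministic and testable
    private final Map<Stage, Long> startedAt = new EnumMap<>(Stage.class);
    private final Map<Stage, Long> durations = new EnumMap<>(Stage.class);

    StartupTrace(LaunchPath path, LongSupplier clockMs) {
        this.path = path;
        this.clockMs = clockMs;
    }

    void begin(Stage stage) {
        startedAt.put(stage, clockMs.getAsLong());
    }

    void end(Stage stage) {
        Long start = startedAt.remove(stage);
        if (start == null) {
            throw new IllegalStateException("end() called without begin() for " + stage);
        }
        durations.put(stage, clockMs.getAsLong() - start);
    }

    LaunchPath path() {
        return path;
    }

    // Returns 0 for stages that were never recorded on this launch.
    long durationMs(Stage stage) {
        return durations.getOrDefault(stage, 0L);
    }

    // Sum of the recorded stage durations (assumes the recorded stages do not overlap).
    long totalMs() {
        long sum = 0;
        for (long d : durations.values()) {
            sum += d;
        }
        return sum;
    }
}
```

Aggregating `totalMs()` and per-stage durations by `LaunchPath` across many launches shows which entry path, and which stage on it, deserves attention first.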

3. Detailed Process and Component Analysis

The launch involves several system processes:

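Launcher: receives the user's tap on the app icon and notifies AMS.

SystemServer: hosts system services such as ActivityManagerService (AMS) and WindowManagerService (WMS), which schedule the app launch, request creation of the app process, and manage windows.

Zygote: forks the app process; because Zygote has already preloaded the VM and core libraries, the forked process starts with them in place, which reduces startup latency.

SurfaceFlinger: handles VSync, window composition, and framebuffer management.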

Understanding these components enables deeper analysis of bottlenecks such as CPU load, memory pressure, and I/O contention.

4. Practical Optimization Measures

Optimization is grouped into three categories: conventional optimization, basic mechanism optimization, and low‑level technical optimization.

4.1 Conventional Optimization

For early‑stage products, quick wins can be achieved by using performance tools (Trace, Thor Hook) to identify hot spots and applying lazy loading, asynchronous execution, or removal of unnecessary work.
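Lazy loading and asynchronous execution are the most common of these quick wins. A minimal sketch, assuming a hypothetical heavy SDK whose construction can be deferred: `Lazy` postpones construction until first use (thread-safe double-checked locking), and `warmUpAsync` optionally performs that construction on a background executor so the first real use is cheap. The class and method names are illustrative, not taken from Baidu's code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Defers expensive construction until the first get(); the supplier runs at most once
// (it must not return null, otherwise it would run again on the next call).
final class Lazy<T> {
    private final Supplier<T> supplier;
    private volatile T value;

    Lazy(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    T get() {
        T v = value;
        if (v == null) {                 // fast path: no lock once initialized
            synchronized (this) {
                v = value;
                if (v == null) {         // re-check under the lock
                    v = supplier.get();
                    value = v;
                }
            }
        }
        return v;
    }

    // Optionally initializes off the calling thread so the first real get() is cheap.
    Future<?> warmUpAsync(ExecutorService background) {
        return background.submit(() -> {
            get();
        });
    }
}
```

Work that is never needed on a given launch is never constructed, and work that is needed but not urgent moves off the main thread.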

4.2 Basic Mechanism Optimization

As the app grows, pre‑loading cannot be eliminated entirely. Baidu introduced a task‑scheduling framework that supports:

Personalized scheduling based on user behavior and task IDs.

Tiered experience strategies that adapt to device capabilities.

Fine‑grained scheduling for splash screens, deep‑link landing pages, etc.

Priority‑aware delayed scheduling.

Parallel rendering of the home page while the splash screen loads.
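The priority-aware delayed scheduling in the list above can be illustrated with a single-threaded, deterministic sketch. It assumes a simple model (integer priorities where higher runs first, millisecond delays, and a `nowMs` value supplied by the caller); `PriorityDelayScheduler` and its methods are hypothetical and do not describe the internals of Baidu's framework.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// A task becomes eligible once its delay has elapsed; among eligible tasks the
// higher priority runs first, and equal priorities keep submission order.
final class PriorityDelayScheduler {
    private static final class Task {
        final String id;
        final int priority;
        final long readyAtMs;
        final long seq;
        final Runnable body;

        Task(String id, int priority, long readyAtMs, long seq, Runnable body) {
            this.id = id;
            this.priority = priority;
            this.readyAtMs = readyAtMs;
            this.seq = seq;
            this.body = body;
        }
    }

    private static final Comparator<Task> BY_PRIORITY =
            Comparator.<Task>comparingInt(t -> t.priority).reversed()
                      .thenComparingLong(t -> t.seq);

    private final PriorityQueue<Task> waiting =
            new PriorityQueue<>(Comparator.comparingLong((Task t) -> t.readyAtMs));
    private long nextSeq = 0;

    void schedule(String id, int priority, long delayMs, long nowMs, Runnable body) {
        waiting.add(new Task(id, priority, nowMs + delayMs, nextSeq++, body));
    }

    // Runs every task whose delay has elapsed by nowMs, highest priority first,
    // and returns their IDs in execution order.
    List<String> runDue(long nowMs) {
        PriorityQueue<Task> due = new PriorityQueue<>(BY_PRIORITY);
        while (!waiting.isEmpty() && waiting.peek().readyAtMs <= nowMs) {
            due.add(waiting.poll());
        }
        List<String> ran = new ArrayList<>();
        while (!due.isEmpty()) {
            Task t = due.poll();
            t.body.run();
            ran.add(t.id);
        }
        return ran;
    }

    int pendingCount() {
        return waiting.size();
    }
}
```

On a device, `runDue` could be driven from a main-thread `Handler` or a `MessageQueue.IdleHandler`; keeping the clock explicit makes the ordering rules easy to test.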

The framework consists of three modules: device scoring, tiered configuration, and tiered dispatch. Device scoring combines static hardware information and dynamic performance metrics to produce a final score that drives configuration selection.
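The scoring-to-configuration flow can be sketched as a weighted blend of a static hardware score and a dynamic measured score, mapped to a tier that then selects a configuration. Every input, weight, and threshold below is a hypothetical placeholder chosen for illustration; the source does not say which hardware fields or runtime metrics Baidu uses or how they are weighted.

```java
// Tiers that the tiered configuration is keyed on.
enum DeviceTier { LOW, MID, HIGH }

// Combines a static hardware score with a dynamic, measured score.
// All weights and thresholds are hypothetical placeholders.
final class DeviceScorer {
    static final double STATIC_WEIGHT = 0.6;
    static final double DYNAMIC_WEIGHT = 0.4;

    // 0..100 from hardware that does not change at runtime (saturates at 8 cores / 8 GB).
    static double staticScore(int cpuCores, double ramGb) {
        double cores = Math.min(cpuCores, 8) / 8.0;
        double ram = Math.min(ramGb, 8.0) / 8.0;
        return 100.0 * (0.5 * cores + 0.5 * ram);
    }

    // 0..100 from recent measurements: a cold start of 500 ms or less scores 100,
    // 3000 ms or more scores 0, linear in between.
    static double dynamicScore(long medianColdStartMs) {
        long clamped = Math.max(500L, Math.min(3000L, medianColdStartMs));
        return 100.0 * (3000L - clamped) / 2500.0;
    }

    static double finalScore(int cpuCores, double ramGb, long medianColdStartMs) {
        return STATIC_WEIGHT * staticScore(cpuCores, ramGb)
                + DYNAMIC_WEIGHT * dynamicScore(medianColdStartMs);
    }

    static DeviceTier tierOf(double score) {
        if (score >= 70.0) return DeviceTier.HIGH;
        if (score >= 40.0) return DeviceTier.MID;
        return DeviceTier.LOW;
    }

    // Example of a tiered setting: how many pre-load tasks may run in parallel.
    static int maxParallelPreloads(DeviceTier tier) {
        switch (tier) {
            case HIGH: return 4;
            case MID:  return 2;
            default:   return 1;
        }
    }
}
```

Blending in a measured score lets a nominally strong device that is throttled or storage-bound fall into a lower tier and receive a lighter startup configuration.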

4.2.1 KV Storage Optimization

SharedPreferences (SP) suffers from slow XML-based reads and writes, poor multi-process support, and the overhead of starting a new thread to load each file. The following excerpt shows how the framework's SP implementation loads a file:

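// Excerpt from the framework's SharedPreferencesImpl (simplified)
private final Object mLock = new Object();
private boolean mLoaded = false; // guarded by mLock

private void startLoadFromDisk() {
    synchronized (mLock) {
        mLoaded = false;
    }
    // A new thread is started for every SP file that is loaded.
    new Thread("SharedPreferencesImpl-load") {
        @Override
        public void run() {
            // Parses the XML file, then sets mLoaded = true and calls mLock.notifyAll().
            loadFromDisk();
        }
    }.start();
}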

If a key is read before loading finishes, the calling thread (often the main thread) blocks until the load completes:

public String getString(String key, @Nullable String defValue) {
    synchronized (mLock) {
        // Loops on mLock.wait() until the loader thread sets mLoaded = true.
        awaitLoadedLocked();
        String v = (String) mMap.get(key);
        return v != null ? v : defValue;
    }
}
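The blocking behavior can be reproduced outside Android with a small model of the same handshake: a loader thread fills the map and calls `notifyAll()`, while readers wait on the same lock. This is a simplified sketch, not the framework class; `MiniPrefs`, the simulated disk delay, and the hard-coded entry are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the SharedPreferencesImpl load/read handshake (not the real class).
final class MiniPrefs {
    private final Object mLock = new Object();
    private boolean mLoaded = false;                      // guarded by mLock
    private Map<String, Object> mMap = new HashMap<>();   // guarded by mLock
    private final long simulatedDiskMs;

    MiniPrefs(long simulatedDiskMs) {
        this.simulatedDiskMs = simulatedDiskMs;
        startLoadFromDisk();
    }

    private void startLoadFromDisk() {
        synchronized (mLock) {
            mLoaded = false;
        }
        new Thread("MiniPrefs-load") {
            @Override
            public void run() {
                loadFromDisk();
            }
        }.start();
    }

    private void loadFromDisk() {
        Map<String, Object> parsed = new HashMap<>();
        try {
            Thread.sleep(simulatedDiskMs);                // stands in for reading and parsing XML
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        parsed.put("user_name", "baidu");
        synchronized (mLock) {
            mMap = parsed;
            mLoaded = true;
            mLock.notifyAll();                            // wake every reader blocked below
        }
    }

    // Must be called with mLock held; wait() releases the lock while sleeping.
    private void awaitLoadedLocked() {
        while (!mLoaded) {
            try {
                mLock.wait();
            } catch (InterruptedException ignored) {
                // keep waiting, as the framework's implementation does
            }
        }
    }

    String getString(String key, String defValue) {
        synchronized (mLock) {
            awaitLoadedLocked();
            String v = (String) mMap.get(key);
            return v != null ? v : defValue;
        }
    }
}
```

Calling `getString` right after construction blocks for roughly the simulated disk time; on a phone, that same wait on the main thread is what surfaces as jank or an ANR when SP files are large.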

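Writes go through either commit, which writes to disk synchronously on the calling thread, or apply, which updates the in-memory map immediately and writes to disk asynchronously. apply can still cause ANRs: each pending write is registered with QueuedWork, and the main thread waits for all of them at certain lifecycle points, such as when an Activity stops or a Service starts or stops: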

QueuedWork.waitToFinish();
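The effect is easy to reproduce with a small model of the same idea: `apply()` returns at once but registers its pending write, and a later lifecycle wait drains all of them on the calling thread. `MiniQueuedWork` is a hypothetical simplification, not the framework's QueuedWork, and the simulated write duration is an assumption.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of apply() + QueuedWork.waitToFinish(); not the Android classes.
final class MiniQueuedWork {
    private final ConcurrentLinkedQueue<CountDownLatch> pendingWrites = new ConcurrentLinkedQueue<>();
    private final AtomicInteger completedWrites = new AtomicInteger();
    private final ExecutorService writer = Executors.newSingleThreadExecutor(new ThreadFactory() {
        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "mini-sp-writer");
            t.setDaemon(true);                     // do not keep the JVM alive
            return t;
        }
    });

    // Like apply(): returns immediately; the disk write happens later on the writer thread.
    void apply(final long simulatedWriteMs) {
        final CountDownLatch done = new CountDownLatch(1);
        pendingWrites.add(done);                   // registered so a later wait can block on it
        writer.execute(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(simulatedWriteMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                completedWrites.incrementAndGet();
                done.countDown();
            }
        });
    }

    // Like waitToFinish() at a lifecycle point: blocks the caller until every queued write lands.
    void waitToFinish() {
        CountDownLatch latch;
        boolean interrupted = false;
        while ((latch = pendingWrites.poll()) != null) {
            while (true) {
                try {
                    latch.await();
                    break;
                } catch (InterruptedException e) {
                    interrupted = true;            // keep waiting, restore the flag afterwards
                }
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    int completedWrites() {
        return completedWrites.get();
    }
}
```

Each `apply()` is cheap on its own, but a burst of them right before an Activity stops turns into one long main-thread wait.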

To address these issues, Baidu adopted two complementary solutions:

UniKV: a drop-in replacement for SP that keeps the SP API but stores data in a binary format with a 4 KB block layout, supports additional data types, and provides multi-process safety through mmap and a custom recursive lock.

System SP optimizations: for third-party SDKs that still use the system SP implementation, Baidu applied dynamic-proxy techniques to bypass the blocking QueuedWork wait, replacing it with a non-blocking handler.
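UniKV's internal format is not published here, but the general idea of a binary store backed by mmap and grown in 4 KB blocks can be sketched with standard Java NIO. This is an illustrative assumption, not UniKV's code: the record layout (`[int keyLen][key][int valueLen][value]`) and the `MmapKvLog` name are hypothetical, and there is no header, CRC, or cross-process locking. The file is extended explicitly before each remap because the `FileChannel.map` documentation leaves mapping beyond end-of-file unspecified.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Append-only binary key/value log on a memory-mapped file that grows in 4 KB blocks.
// Hypothetical layout per record: [int keyLen][key bytes][int valueLen][value bytes].
final class MmapKvLog implements AutoCloseable {
    static final int BLOCK = 4096;
    private final FileChannel channel;
    private MappedByteBuffer buf;
    private int writePos = 0;

    MmapKvLog(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        remap(BLOCK);
    }

    // Rounds a byte count up to a whole number of 4 KB blocks.
    static int roundUpToBlock(int bytes) {
        return ((bytes + BLOCK - 1) / BLOCK) * BLOCK;
    }

    private void remap(int size) throws IOException {
        if (channel.size() < size) {
            // Grow the file first by writing its last byte.
            channel.write(ByteBuffer.wrap(new byte[] {0}), size - 1);
        }
        buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
    }

    void putString(String key, String value) throws IOException {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        int needed = writePos + 4 + k.length + 4 + v.length;
        if (needed > buf.capacity()) {
            remap(roundUpToBlock(needed));      // allocate whole 4 KB blocks, never partial ones
        }
        buf.position(writePos);
        buf.putInt(k.length).put(k).putInt(v.length).put(v);
        writePos = buf.position();
    }

    int mappedBytes() {
        return buf.capacity();
    }

    long fileBytes() throws IOException {
        return channel.size();
    }

    @Override
    public void close() throws IOException {
        buf.force();                            // flush dirty pages to the file
        channel.close();
    }
}
```

Writes land directly in mapped memory with no XML serialization and no per-write system call, and growing in fixed 4 KB steps keeps remapping rare.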

UniKV’s file layout includes a 40‑byte header (version, write count, CRCs, etc.) followed by data blocks allocated in 4 KB units via mmap. Migration from SP to UniKV is performed lazily: if migration is incomplete, the app continues using SP while a background task copies data to the KV file.
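The lazy migration can be sketched as a store that serves reads from the old data until a background copy finishes, then switches over. Plain `Map`s stand in for the SP and UniKV files, and `MigratingStore` and its methods are hypothetical; the sketch shows only the switch-over logic, not the real file formats.

```java
import java.util.Map;
import java.util.concurrent.Executor;

// Reads come from the legacy (SP) data until the background copy finishes, then from the KV data.
// Both maps should be thread-safe (e.g. ConcurrentHashMap) because reads do not take the lock.
final class MigratingStore {
    private final Object lock = new Object();
    private final Map<String, String> legacy;   // stands in for the SharedPreferences file
    private final Map<String, String> kv;       // stands in for the UniKV file
    private volatile boolean migrated = false;

    MigratingStore(Map<String, String> legacy, Map<String, String> kv, Executor background) {
        this.legacy = legacy;
        this.kv = kv;
        background.execute(this::migrate);      // migration never blocks the caller
    }

    private void migrate() {
        synchronized (lock) {
            kv.putAll(legacy);
            migrated = true;                    // flipped only after the full copy
        }
    }

    String getString(String key, String defValue) {
        // A reader that sees migrated == true (volatile) also sees the copied data.
        String v = (migrated ? kv : legacy).get(key);
        return v != null ? v : defValue;
    }

    void putString(String key, String value) {
        // Serialized with migrate(): a write lands in the legacy data before the copy,
        // or in the new store after it, so it is never lost.
        synchronized (lock) {
            if (migrated) {
                kv.put(key, value);
            } else {
                legacy.put(key, value);
            }
        }
    }

    boolean isMigrated() {
        return migrated;
    }
}
```

Holding the lock during the copy briefly delays writers, which is acceptable for a one-time migration; reads never wait.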

4.2.2 Lock Optimization

Excessive synchronization (for example, synchronized methods in the ABTest components) caused monitor contention and ANRs. Baidu refactored these components to use lock-free data structures, reducing first-read latency on low-end devices from 118 ms to 6 ms.
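One common lock-free pattern for read-mostly configuration such as ABTest switches is copy-on-write publication through an `AtomicReference`: readers do a single atomic load and never block, while writers build a new immutable map and publish it with compare-and-set. This illustrates the general technique under that assumption; the source does not specify which lock-free structures Baidu used, and `AbConfig` is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Read-mostly experiment switches: readers never lock, writers publish immutable snapshots.
final class AbConfig {
    private final AtomicReference<Map<String, String>> snapshot =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    // Lock-free read: one atomic load plus a lookup in an immutable map.
    String get(String key, String defValue) {
        String v = snapshot.get().get(key);
        return v != null ? v : defValue;
    }

    // Copy-on-write update; retries if another writer published in between.
    void put(String key, String value) {
        while (true) {
            Map<String, String> current = snapshot.get();
            Map<String, String> next = new HashMap<>(current);
            next.put(key, value);
            if (snapshot.compareAndSet(current, Collections.unmodifiableMap(next))) {
                return;
            }
        }
    }

    // Swaps in a complete configuration at once, e.g. after a fresh download.
    void replaceAll(Map<String, String> fresh) {
        snapshot.set(Collections.unmodifiableMap(new HashMap<>(fresh)));
    }
}
```

Because reads take no lock, a slow writer can no longer stall the main thread; the cost is one map copy per write, which suits configuration that is read far more often than it changes.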

4.2.3 Other Basic Mechanism Optimizations

Additional improvements include:

Thread Optimization: enforce a single unified thread pool, prohibit custom thread-priority changes, and standardize pool parameters.

I/O Optimization: avoid main-thread I/O operations that take longer than 100 ms, and increase buffer sizes to reduce the number of system calls.

SO (native library) Optimization: defer loading native libraries that are not needed at startup, and load essential ones asynchronously.

Binder Optimization: reduce unnecessary inter-process (Binder) calls.

Main-Thread Priority: ensure the UI thread keeps its high priority; an incorrect priority setting (e.g., t.setPriority(3)) can cause severe stalls.

ContentProvider/FileProvider Optimization: lazy-load providers or move them to separate processes, so that provider initialization, which runs after Application.attachBaseContext and before Application.onCreate, does not add heavy I/O to startup.

Image prepareToDraw: call Bitmap.prepareToDraw() ahead of time so that (on Android 7.0 and later) the GPU upload starts asynchronously on the render thread before the bitmap is first drawn, instead of stalling the frame that first displays it.
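The thread rules in the list above can be enforced in code as well as in review. A minimal sketch, assuming a plain-Java `ThreadPoolExecutor`: one shared pool with named worker threads at a standard priority, plus an `afterExecute` hook that detects when a task changed its thread's priority (the `t.setPriority(3)` pattern), restores it, and counts the violation. `UnifiedPool`, the pool size, and the keep-alive value are hypothetical; on Android, thread priorities are usually managed with `android.os.Process.setThreadPriority`, which this sketch does not cover.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// A single shared pool with standardized parameters; tasks may not keep a modified thread priority.
final class UnifiedPool extends ThreadPoolExecutor {
    static final int STANDARD_PRIORITY = Thread.NORM_PRIORITY;
    private final AtomicInteger priorityViolations = new AtomicInteger();

    UnifiedPool(int threads) {
        super(threads, threads, 30L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(), namedThreads());
        allowCoreThreadTimeOut(true);           // idle workers exit instead of lingering
    }

    private static ThreadFactory namedThreads() {
        final AtomicInteger counter = new AtomicInteger();
        return new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "unified-pool-" + counter.incrementAndGet());
                t.setPriority(STANDARD_PRIORITY);
                t.setDaemon(true);
                return t;
            }
        };
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        Thread worker = Thread.currentThread();  // afterExecute runs on the worker thread
        if (worker.getPriority() != STANDARD_PRIORITY) {
            worker.setPriority(STANDARD_PRIORITY);
            priorityViolations.incrementAndGet(); // in an app, this would be reported
        }
    }

    int priorityViolations() {
        return priorityViolations.get();
    }
}
```

Routing all startup work through one pool also makes its concurrency a single tunable number, which pairs naturally with device-tier configuration.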

4.3 Low‑Level Mechanism Optimization

Explorations include VerifyClass, CPU Booster, and GC tuning. These are high‑risk, high‑reward optimizations that affect the entire app and will be detailed in future performance deep‑dives.

Results

After applying these measures, Baidu observed a significant drop in ANR rates, improvements in DAU and retention, and higher write success rates on low-end devices. UniKV eliminated SP-related ANRs, and the lock-free redesign cut read latency by 95% on a Xiaomi 5 device.

Conclusion

Startup performance optimization is a complex, ongoing effort that requires a holistic view of both business logic and system behavior. Full‑path awareness, task scheduling, KV storage redesign, and low‑level system tweaks together form a sustainable strategy for continuous improvement.
