xCrash: An Open-Source Android Crash Capture SDK Overview
xCrash is an open-source Android SDK that captures both Java and native crashes on Android 4.0-9.0 across multiple CPU architectures. It combines async-signal-safe handlers, a dedicated dumper process, and pre-allocated resources to write tombstone-style dump files without root, and it provides richer diagnostics than BreakPad.
Introduction
In 2019 iQIYI open‑sourced xCrash on GitHub. It is a comprehensive Android app crash‑capture SDK that generates tombstone‑style dump files in a user‑specified directory when the app process crashes. It supports both native and Java crashes, works on Android 4.0‑9.0, and runs on armeabi, armeabi‑v7a, arm64‑v8a, x86 and x86_64 architectures.
The SDK has been deployed in more than 20 iQIYI Android apps, including the main iQIYI app, iQIYI Speed, iQIYI Animation, Qixiu, iQIYI VR, and others.
Problem Overview
App crashes are the most severe quality issue on mobile. Java crashes are relatively easy to capture because the Android runtime provides a controlled environment and a built-in hook for uncaught exceptions. Native crashes are harder: the system daemon debuggerd automatically creates a detailed tombstone file, but released apps cannot read it. Developers can obtain tombstones via bugreport or root access, but neither is practical for apps running on end users' devices.
Native Crash Basics
Native crashes are triggered by signals sent by the Linux kernel when a thread performs an illegal operation (e.g., division by zero, illegal instruction, invalid system call, bad memory access). The most common signals are:
SIGFPE: divide-by-zero
SIGILL: illegal CPU instruction
SIGSYS: illegal system call
SIGSEGV: invalid virtual memory access
SIGBUS: invalid physical address access
SIGABRT: abnormal termination requested by the process itself, typically via abort()
Signal handlers may call only async-signal-safe functions. Many everyday functions, such as malloc, snprintf, and gettimeofday, are not on the POSIX async-signal-safe list. Heap allocation is therefore prohibited, and any required buffers must be pre-allocated during initialization.
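To make the constraint concrete, here is a minimal sketch of a handler-friendly way to emit a number: the digits are produced by hand into a buffer reserved at load time and sent out with write(), which POSIX lists as async-signal-safe. The names crash_buf, fmt_unsigned and report_signal are hypothetical, and the buffer size is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Reserved once at load time; the handler never allocates. */
static char crash_buf[256];

/* Writes the decimal form of value into dst (capacity cap) and
 * NUL-terminates it. Returns the number of digits written, or 0 if
 * the buffer is too small. Uses only arithmetic and stores. */
size_t fmt_unsigned(char *dst, size_t cap, unsigned long value)
{
    char tmp[24];               /* enough for a 64-bit value (20 digits) */
    size_t n = 0;

    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    if (n + 1 > cap)
        return 0;
    for (size_t i = 0; i < n; i++)
        dst[i] = tmp[n - 1 - i];    /* digits were produced in reverse */
    dst[n] = '\0';
    return n;
}

/* Builds "signal <n>\n" in crash_buf and writes it to fd.
 * Returns the number of bytes accepted by write(), or -1 on error. */
long report_signal(int fd, int signo)
{
    static const char prefix[] = "signal ";
    size_t len = 0;

    for (; prefix[len] != '\0'; len++)
        crash_buf[len] = prefix[len];

    /* Leave one byte free for the trailing newline. */
    size_t digits = fmt_unsigned(crash_buf + len,
                                 sizeof(crash_buf) - len - 1,
                                 (unsigned long)signo);
    if (digits == 0)
        return -1;
    len += digits;
    crash_buf[len++] = '\n';
    return (long)write(fd, crash_buf, len);
}
```

A handler could call report_signal(fd, signo) with a descriptor opened during initialization; nothing on that path touches the heap or stdio.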
Extreme Conditions Before a Crash
During crash handling the process may face stack overflow, out‑of‑memory, file‑descriptor exhaustion, or flash‑space shortage. Strategies used by xCrash include:
Using sigaltstack() to provide a separate stack for the signal handler (avoids stack overflow).
Avoiding mmap() when virtual address space is exhausted.
Reserving a single file descriptor for writing the crash dump.
Pre‑creating “placeholder” files on low‑flash devices and falling back to in‑memory storage for critical data (e.g., backtrace).
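Two of the strategies above, the alternate signal stack and the reserved file descriptor, can be sketched as follows. This is an illustration under stated assumptions rather than xCrash's actual code: the 64 KiB stack size, the use of /dev/null as the placeholder target, and the names setup_alt_stack, reserve_fd and open_dump_file are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

#define ALT_STACK_SIZE (64 * 1024)   /* assumed size, not taken from xCrash */

static char alt_stack_mem[ALT_STACK_SIZE];  /* static storage, no malloc */
static int  reserved_fd = -1;

/* Gives handlers registered with SA_ONSTACK their own stack, so a
 * handler can still run after the thread's stack has overflowed.
 * sigaltstack() applies to the calling thread only.
 * Returns 0 on success, -1 on failure. */
int setup_alt_stack(void)
{
    stack_t ss;
    ss.ss_sp = alt_stack_mem;
    ss.ss_size = sizeof(alt_stack_mem);
    ss.ss_flags = 0;
    return sigaltstack(&ss, NULL);
}

/* Occupies one descriptor slot at startup. Returns 0 on success. */
int reserve_fd(void)
{
    reserved_fd = open("/dev/null", O_RDWR);
    return reserved_fd >= 0 ? 0 : -1;
}

/* Called at crash time: frees the reserved slot, then opens the dump
 * file, which can reuse that slot even if the descriptor table was
 * otherwise full. Returns the new descriptor, or -1 on failure. */
int open_dump_file(const char *path)
{
    if (reserved_fd >= 0) {
        close(reserved_fd);
        reserved_fd = -1;
    }
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}
```

Both setup calls belong in initialization, long before any crash, so that the crash path itself only has to close one descriptor and open another.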
xCrash Architecture
The system consists of two parts: the in‑process component (running inside the crashing app) and an independent dumper process.
In-process component:
Java side: Java crash capture via the runtime's uncaught-exception mechanism, JNI bridge registration, tombstone parser, and tombstone manager.
Native side: JNI bridge, signal handlers that launch the dumper, and a fallback mode that attempts to collect data directly if the dumper fails.
Dumper process (pure native):
Process: attaches to and detaches from the crashed process, and collects the FD list, logcat output, etc.
Threads: gathers registers, backtrace, and stack for each thread.
Memory Layout: parses the crashed process's /proc/<pid>/maps and /proc/<pid>/smaps.
Memory: reads memory via local buffers, mmap()ed ELF files, or ptrace() remote reads.
Registers: handles architecture-specific register data.
ELF: parses unwind tables from .ARM.exidx, .eh_frame, .debug_frame, and compressed .gnu_debugdata.
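The memory-layout step in the list above boils down to reading a maps file line by line. The sketch below parses one such line in the format documented in the Linux proc(5) manual page. The struct map_entry layout and the function name parse_maps_line are hypothetical, and the 255-character path limit is an assumption made for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One mapping from a Linux maps file (see proc(5)). */
struct map_entry {
    unsigned long long start;   /* first address of the region */
    unsigned long long end;     /* one past the last address */
    char perms[5];              /* e.g. "r-xp" */
    unsigned long long offset;  /* offset into the backing file */
    char path[256];             /* empty for anonymous mappings */
};

/* Parses one line of the form
 *   start-end perms offset dev inode [path]
 * Returns 0 on success, -1 if the line does not match. */
int parse_maps_line(const char *line, struct map_entry *out)
{
    unsigned int dev_major, dev_minor;
    unsigned long inode;
    int consumed = 0;

    assert(line != NULL && out != NULL);
    out->path[0] = '\0';
    if (sscanf(line, "%llx-%llx %4s %llx %x:%x %lu%n",
               &out->start, &out->end, out->perms, &out->offset,
               &dev_major, &dev_minor, &inode, &consumed) < 7)
        return -1;

    /* Skip the padding between the inode and the optional path. */
    const char *rest = line + consumed;
    while (*rest == ' ' || *rest == '\t')
        rest++;

    /* Copy the path up to the newline, truncating if it is too long. */
    size_t len = strcspn(rest, "\n");
    if (len >= sizeof(out->path))
        len = sizeof(out->path) - 1;
    memcpy(out->path, rest, len);
    out->path[len] = '\0';
    return 0;
}
```

With the start, end and offset of every executable mapping in hand, a program counter can be translated into a module name plus an offset inside that module, which is what a tombstone backtrace prints.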
When a signal is caught, the handler quickly forks a new process using clone() + execl() to “escape” the async‑signal‑safe restrictions. The child process can then use ptrace() to suspend all threads, collect registers, backtraces, and other diagnostics without being limited by FD or memory constraints.
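The hand-off to a separate process can be sketched with portable POSIX calls. Note the substitution: the text describes clone() plus execl(), while this illustration uses fork() and execv() so that it stays short and runs on any Linux system. The function name run_dumper and its argument vector are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts the program at path with the given argument vector and waits
 * for it to finish. Returns the child's exit code, or -1 if the child
 * could not be created or did not exit normally.
 * execv(), waitpid() and _exit() are listed as async-signal-safe
 * by POSIX. */
int run_dumper(const char *path, char *const dumper_argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: replace this process image with the dumper. */
        execv(path, dumper_argv);
        _exit(127);             /* only reached if execv() failed */
    }

    /* Parent (the crashing process): block until the dumper is done,
     * retrying if the wait is interrupted by another signal. */
    int status = 0;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the new program starts with a fresh address space and descriptor table, it is free to allocate memory, open files and use ordinary library functions, which is exactly what the handler itself cannot do. The argument vector should be prepared during initialization so the handler does not have to format strings.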
Backtrace Implementation
Standard libc backtrace() is unavailable on Android’s Bionic runtime. The NDK does not expose reliable unwind APIs, and system libraries (libcorkscrew, libunwind, libunwindstack) are either version-locked or require root. xCrash implements its own unwind logic that works on Android 4.0-9.0, handling all relevant unwind tables and supporting LZMA-compressed debug data.
Additional Features
Full FD list with usage details.
Comprehensive memory statistics (global, per‑process, per‑region).
Regex‑based thread whitelist to limit the number of threads whose registers/backtraces are collected.
Zero‑permission operation (no root or special Android permissions required).
Device‑root detection.
High crash‑capture success rate via multiple safeguards (FD reservation, placeholder files, fallback unwind, etc.).
Extensibility: custom user data (play logs, bullet‑screen logs, NLE edit logs, lifecycle trace) can be attached to the dump.
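The regex-based thread whitelist from the list above could look roughly like this when built on the POSIX regex API. It is a sketch only: the function name thread_in_whitelist, the choice of extended regular expressions, and the policy of skipping patterns that fail to compile are assumptions, not documented xCrash behavior. Code like this belongs in the dumper process, since regcomp() allocates memory and is not safe inside a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if name matches at least one of the count patterns
 * (POSIX extended regular expressions), 0 otherwise. Patterns that
 * fail to compile are skipped. */
int thread_in_whitelist(const char *name,
                        const char *const patterns[], size_t count)
{
    assert(name != NULL);

    for (size_t i = 0; i < count; i++) {
        regex_t re;
        if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;
        int matched = (regexec(&re, name, 0, NULL, 0) == 0);
        regfree(&re);
        if (matched)
            return 1;
    }
    return 0;
}
```

The dumper would call this once per thread name and collect registers and backtraces only for the threads that match, which keeps dump time and file size bounded in apps with hundreds of threads.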
Comparison with BreakPad
BreakPad generates binary minidumps that require the original ELF files for post‑mortem debugging, making automated analysis complex and costly. xCrash produces standard tombstone text files that can be parsed directly on the server, simplifying crash aggregation and triage. It also avoids the large codebase and maintenance overhead of BreakPad.
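To show why text tombstones are easy to process, the sketch below extracts the signal header and one backtrace frame using nothing more than sscanf(). It assumes the line layout used by stock Android tombstones; xCrash's output may differ in detail, and the struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of the "signal ..." header line of a tombstone. */
struct crash_signal {
    int  signo;
    char name[32];          /* e.g. "SIGSEGV" */
    int  code;
    char code_name[32];     /* e.g. "SEGV_MAPERR" */
    unsigned long long fault_addr;
};

/* One "#NN pc ..." backtrace frame. */
struct crash_frame {
    unsigned int index;
    unsigned long long pc;
    char module[256];
};

/* Returns 0 if line is a signal header and was parsed, -1 otherwise. */
int parse_signal_line(const char *line, struct crash_signal *out)
{
    if (strncmp(line, "signal ", 7) != 0)
        return -1;
    int n = sscanf(line,
                   "signal %d (%31[^)]), code %d (%31[^)]), fault addr %llx",
                   &out->signo, out->name, &out->code, out->code_name,
                   &out->fault_addr);
    return n == 5 ? 0 : -1;
}

/* Returns 0 if line is a backtrace frame and was parsed, -1 otherwise. */
int parse_frame_line(const char *line, struct crash_frame *out)
{
    int n = sscanf(line, " #%u pc %llx %255s",
                   &out->index, &out->pc, out->module);
    return n == 3 ? 0 : -1;
}
```

A server could group incoming reports by the signal name plus the module and pc of the top few frames, with no symbol files or minidump tooling needed for that first level of triage.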
Future Plans
ANR (Application Not Responding) monitoring.
Strengthening the fallback mode.
Reducing UI jank during dump generation.
Local crash count and timing statistics.
Exploring complementary usage with BreakPad.
The project is hosted on GitHub: https://github.com/iqiyi/xCrash
iQIYI Technical Product Team