
Instrumenting iOS SDKs with LLVM and SanitizerCoverage for BasicBlock‑Level Code Coverage

This article explains how to use LLVM and SanitizerCoverage to insert BasicBlock‑level instrumentation into iOS binaries such as the WeChat SDK, covering the theory of code‑coverage metrics, compilation steps, bitcode extraction, and a practical demo that visualizes runtime execution paths.

Sohu Tech Products

Dec 9, 2020

Background

Inspired by a TikTok R&D article on binary reordering that achieved a 15% app launch speedup, the author notes several scenarios that static scanning cannot cover, such as complex control‑flow constructs, Objective‑C/C function calls, third‑party libraries, and functions marked with __attribute__((constructor)). The goal is to use LLVM and its intermediate representation (IR) to address these gaps.

Effect Demonstration

The solution relies on inserting instrumentation at the BasicBlock level. An example with the WeChat SDK shows how a callback function __sanitizer_cov_trace_pc_guard can be hooked to trace execution.

The BasicBlock concept is explained in the "Instrumentation and Code Coverage" section below.

1. WeChat SDK

The SDK provides three public headers; WXApi.h exposes the class method +[WXApi registerApp:universalLink:].

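/*! @brief WeChat API interface class
 *
 * This class wraps all interfaces of the WeChat client SDK.
 */
@interface WXApi : NSObject

/*! @brief Member function of WXApi that registers a third-party app with the WeChat client.
 *
 * Must be called each time the third-party app launches.
 * @attention Make sure this function is called on the main thread.
 * @param appid WeChat developer ID
 * @param universalLink WeChat developer Universal Link
 * @return YES on success, NO on failure.
 */
+ (BOOL)registerApp:(NSString *)appid universalLink:(NSString *)universalLink;
@end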

2. main.m

A minimal project adds a callback and invokes the WeChat SDK:

@import Darwin;
#import "WXApi.h"

int main(int argc, char * argv[]) {
  // Call into the WeChat SDK
  [WXApi registerApp:@"App" universalLink:@"link"];
  return 0;
}

// Provide the callback that the instrumented code calls
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  Dl_info info;
  void *PC = __builtin_return_address(0);
  dladdr(PC, &info);
  printf("guard:%p started executing:%s\n", PC, info.dli_sname);
}

Running the program with a breakpoint on __sanitizer_cov_trace_pc_guard reveals the call stack.

Instrumentation and Code Coverage

The article distinguishes three levels of coverage granularity:

Function‑Level: records which functions were executed.

BasicBlock‑Level: records execution of basic blocks (straight‑line code sequences).

Edge‑Level: records transitions between basic blocks.

1. Function‑Level

Simple but coarse; records function entry.

The approach described in the TikTok article only just reaches this level.
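To illustrate why this level is coarse, here is a minimal hand‑written sketch with a single probe at each function's entry. The names fn_probe, foo_fn, bar_fn, and functions_covered are hypothetical and exist only for this illustration; a real tool inserts the equivalent probes automatically.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical function IDs; a real tool would derive these automatically. */
enum { FN_FOO, FN_BAR, FN_COUNT };

static unsigned fn_entries[FN_COUNT]; /* entry count per function */

/* Stand-in for the single probe placed at each function's entry. */
static void fn_probe(int fn) {
  assert(fn >= 0 && fn < FN_COUNT);
  fn_entries[fn]++;
}

void foo_fn(int *a) {
  fn_probe(FN_FOO);   /* the only probe in foo: function entry */
  if (a)
    *a = 0;
}

void bar_fn(void) {
  fn_probe(FN_BAR);
}

/* Prints the counters and returns how many functions ran at least once. */
int functions_covered(void) {
  static const char *names[FN_COUNT] = { "foo_fn", "bar_fn" };
  int covered = 0;
  for (int i = 0; i < FN_COUNT; i++) {
    printf("%s: %u\n", names[i], fn_entries[i]);
    if (fn_entries[i]) covered++;
  }
  return covered;
}
```

Calling foo_fn with a null pointer and with a valid pointer leaves the same record, so this level cannot tell whether the store inside the function ever ran.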

2. BasicBlock‑Level

A basic block is a straight‑line sequence of instructions: control enters only at its first instruction and leaves only at its last. Example:

void foo(int *a) {
  if (a)
    *a = 0;
}

When compiled, the function is split into three basic blocks: A (the entry, which tests a), B (the store *a = 0), and C (the function exit). Each of them can be instrumented.

BasicBlock‑Level instrumentation enables line‑coverage measurement.
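To make the three blocks concrete, the following sketch instruments foo by hand with one probe at the start of each basic block. The names bb_probe, foo_bb, and report_blocks are hypothetical; with SanitizerCoverage the compiler inserts the equivalent calls itself.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical block IDs for the three basic blocks of foo. */
enum { BLOCK_A, BLOCK_B, BLOCK_C, BLOCK_COUNT };

static unsigned bb_hit[BLOCK_COUNT]; /* execution count per basic block */

/* Stand-in for the probe the compiler would place in each block. */
static void bb_probe(int block) {
  assert(block >= 0 && block < BLOCK_COUNT);
  bb_hit[block]++;
}

/* foo with one probe per basic block. */
void foo_bb(int *a) {
  bb_probe(BLOCK_A);     /* block A: entry and the test of a */
  if (a) {
    bb_probe(BLOCK_B);   /* block B: the store */
    *a = 0;
  }
  bb_probe(BLOCK_C);     /* block C: function exit */
}

/* Prints the counters and returns how many blocks ran at least once. */
int report_blocks(void) {
  int covered = 0;
  for (int i = 0; i < BLOCK_COUNT; i++) {
    printf("block %c: %u\n", 'A' + i, bb_hit[i]);
    if (bb_hit[i]) covered++;
  }
  return covered;
}
```

After a call with a null pointer, block B stays at zero, so the store line is reported as not covered; a call with a valid pointer covers all three blocks.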

3. Edge‑Level

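Tracks edges such as A→C, the path in foo that skips the store. Block coverage alone cannot show whether this edge was taken, because A and C also execute when the store runs. By inserting a virtual block D on that edge and instrumenting it, one can infer whether the edge was taken.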

Edge‑Level instrumentation enables path‑coverage measurement.
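The virtual block can be sketched by hand as an else branch that contains nothing but a probe. The names edge_probe, foo_edge, and the two query functions are hypothetical and serve only to illustrate the idea.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical probe IDs: three real blocks plus virtual block D,
   which exists only to split the edge A->C. */
enum { PROBE_A, PROBE_B, PROBE_C, PROBE_D, PROBE_COUNT };

static unsigned edge_hit[PROBE_COUNT];

static void edge_probe(int id) {
  assert(id >= 0 && id < PROBE_COUNT);
  edge_hit[id]++;
}

void foo_edge(int *a) {
  edge_probe(PROBE_A);
  if (a) {
    edge_probe(PROBE_B);   /* reached only through edge A->B */
    *a = 0;
  } else {
    edge_probe(PROBE_D);   /* virtual block: reached only through edge A->C */
  }
  edge_probe(PROBE_C);
}

/* Each query returns 1 if the edge was taken at least once. */
int edge_a_to_b_taken(void) { return edge_hit[PROBE_B] > 0; }
int edge_a_to_c_taken(void) { return edge_hit[PROBE_D] > 0; }

/* Prints which of the two outgoing edges of block A have been seen. */
void report_edges(void) {
  printf("A->B: %d, A->C: %d\n", edge_a_to_b_taken(), edge_a_to_c_taken());
}
```

With only blocks A, B, and C instrumented, a run that always passes a valid pointer looks fully covered; the probe in D shows that the A→C edge was never exercised.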

SanitizerCoverage

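LLVM’s SanitizerCoverage provides compile‑time options such as -fsanitize-coverage=trace-pc-guard to insert instrumentation. The granularity is selected by adding func, bb, or edge (the default) to the flag, for example -fsanitize-coverage=bb,trace-pc-guard.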

1. Configure Compilation Flags

(Image omitted)

2. Prepare Source Files

// File A
#import <Foundation/Foundation.h>

int f(void) __attribute__((constructor));

int f(void) {
  NSLog(@"int f() __attribute__((constructor)) was called");
  return 0;
}

// File ViewController.mm
#import <string>
#import "ViewController.h"

static std::string cxx_static_str("cxx_static_str");

@implementation ViewController
+ (void)load {
  NSLog(@"load was executed");
}
@end

// File main.m
@import Darwin;
#import <UIKit/UIKit.h>
#import "AppDelegate.h"

// Called once per module: assign a unique non-zero ID to every guard
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
  static uint32_t N;
  if (start == stop || *start) return;
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;
}

// Called at every instrumented point: print the caller's address and symbol
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  Dl_info info;
  void *PC = __builtin_return_address(0);
  dladdr(PC, &info);
  printf("guard:%p started executing:%s\n", PC, info.dli_sname);
}

void foo(int *a) {
  if (a)
    *a = 0;
}

int main(int argc, char * argv[]) {
  dispatch_async(dispatch_get_main_queue(), ^{ NSLog(@"main block"); });
  int i = 0;
  foo(&i);
  NSString *appDelegateClassName;
  @autoreleasepool {
    // Setup code that might create autoreleased objects goes here.
    appDelegateClassName = NSStringFromClass([AppDelegate class]);
  }
  return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}

3. Run

Running the binary shows coverage of +load methods, C++ static initializers, __attribute__((constructor)) functions, and BasicBlock‑level paths.
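Because the callback fires every time an instrumented point executes, the console output repeats for hot code. A common variation, sketched here under assumptions, records each guard only the first time it fires and then zeroes it so that later hits return early. The buffer first_seen, its capacity MAX_GUARDS, and the helper dump_first_seen are hypothetical names introduced for this sketch, and the code is not thread‑safe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_GUARDS 4096  /* hypothetical capacity chosen for this sketch */

static uint32_t first_seen[MAX_GUARDS]; /* guard IDs in first-execution order */
static uint32_t first_seen_count;

/* Called once per module: give every guard a unique non-zero ID. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
  static uint32_t next_id;
  if (start == stop || *start) return;  /* empty range or already initialized */
  for (uint32_t *g = start; g < stop; g++)
    *g = ++next_id;
}

/* Called at every instrumented point: log the guard once, then disable it. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;                  /* already recorded */
  assert(first_seen_count < MAX_GUARDS);
  first_seen[first_seen_count++] = *guard;
  *guard = 0;                           /* later hits return early */
}

/* Prints the recorded order and returns how many distinct guards fired. */
uint32_t dump_first_seen(void) {
  for (uint32_t i = 0; i < first_seen_count; i++)
    printf("%u: guard %u\n", (unsigned)i, (unsigned)first_seen[i]);
  return first_seen_count;
}
```

The resulting list preserves first‑execution order, which is the kind of information a binary‑reordering workflow like the one mentioned in the Background needs; mapping guard IDs back to symbol names would still require a lookup such as the dladdr call shown earlier.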

Compilation Process Overview

Using a simple main.m example, the article shows the clang command chain (preprocess → LLVM IR/bitcode → assembly → object file → link) and how to inspect the generated .bc and .s files.

cat <<EOF > main.m
int main() {
  return 0;
}
EOF

xcrun clang main.m -save-temps -v -mllvm -debug-pass=Structure -fsanitize-coverage=trace-pc-guard

The resulting graph illustrates the flow from source to final executable.

Practical Demo with WeChat SDK

1. Process the SDK

Identify file type with file (universal binary with multiple architectures).

Extract a single‑arch archive using lipo -thin armv7.

Unpack the .a archive with tar (ar -x also works) to obtain the object files.

For each .o, extract the embedded bitcode (the __LLVM,__bitcode section) via segedit.

Convert bitcode to assembly with

clang -O1 -target armv7-apple-ios7 -S ... -fsanitize-coverage=trace-pc-guard

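Sample snippet from the generated AppCommunicate.s shows the inserted guard call (the x0/x1 registers and adrp instructions indicate that this excerpt comes from an arm64 build rather than the armv7 command above):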

Ltmp0:
	.loc	9 16 0 prologue_end     ; AppCommunicate/AppCommunicate.m:16:0
Lloh0:
	adrp	x0, l___sancov_gen_@PAGE
Ltmp1:
	;DEBUG_VALUE: +[AppCommunicate getDataPasteboardName]:self <- [DW_OP_LLVM_entry_value 1] $x0
Lloh1:
	add	x0, x0, l___sancov_gen_@PAGEOFF
	bl	___sanitizer_cov_trace_pc_guard
Ltmp2:
	;DEBUG_VALUE: +[AppCommunicate getDataPasteboardName]:_cmd <- [DW_OP_LLVM_entry_value 1] $x1

2. Demo

The instrumented object files are placed back into the Xcode project; running the app displays the SDK’s internal execution flow in the console.

3. Run

The console output confirms that the WeChat SDK’s methods are traced via the inserted guard.

Summary

Code coverage can be measured at Function, BasicBlock, and Edge levels.

LLVM’s SanitizerCoverage supports all three levels through compile‑time instrumentation.

Exporting third‑party libraries’ bitcode enables architecture‑agnostic instrumentation.

By combining coverage concepts, SanitizerCoverage, and a detailed compilation pipeline, the article demonstrates how to instrument a real‑world iOS SDK (WeChat) for fine‑grained execution tracing.
