
Mach-O File Analysis and Resource Optimization for Baidu iOS App

The article explains how Baidu’s iOS app reduces package size by examining Mach‑O binaries with tools like otool and MachOView, then applying Python scripts to locate oversized assets, eliminate unused configuration files, and deduplicate resources, achieving a 12 MB reduction and a repeatable optimization pipeline.

Baidu Geek Talk

This article is part of the Baidu APP iOS package size optimization series. It focuses on the analysis of Mach-O files and the practical resource‑optimization techniques applied in Baidu's iOS application.

1. Introduction

The first two articles of the series covered overall size‑reduction strategies, image‑optimization methods, and the concept of large‑resource, unused‑configuration‑file, and duplicate‑resource optimization. This third article dives into the Mach‑O file format, which is essential for both resource and code analysis.

2. Mach‑O Overview

Mach‑O (Mach Object) is the file format used on macOS and iOS for executables, object files, dynamic libraries, and core dumps. Understanding its structure is a prerequisite for any binary‑size analysis.

2.1 Tools for Mach‑O Inspection

• MachOView – a GUI tool that can open a Mach‑O file and display its sections. Download: http://sourceforge.net/projects/machoview/ Source: https://github.com/gdbinit/MachOView

• otool – a command‑line utility installed with Xcode's command‑line tools. Example usage to dump Objective‑C metadata such as the class list: otool -arch arm64 -ov xxx.app/xxx

Sample output (truncated):

Contents of (__DATA,__objc_classlist) section
0000000100008238 0x100009980
    isa 0x1000099a8
    superclass 0x0 _OBJC_CLASS_$_UIViewController
    ...

Common otool commands are illustrated with screenshots in the original article.

2.2 File Format Inspection

The file command reveals the file type, and lipo -info shows the supported CPU architectures:

~ % file /Users/ycx/Desktop/demo.app/demo
/Users/ycx/Desktop/demo.app/demo: Mach-O 64-bit executable arm64
~ % lipo -info /Users/ycx/Desktop/demo.app/demo
Non-fat file: /Users/ycx/Desktop/demo.app/demo is architecture: arm64
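The distinction that file and lipo -info report can also be read from the first four bytes of the file. The following is a minimal sketch, not part of Baidu's tooling: the helper name classify_macho is hypothetical, the magic constants are the ones defined in mach-o/loader.h and mach-o/fat.h, and thin images are assumed to be little-endian (arm64 or x86_64).

```python
import struct

# Magic numbers from mach-o/loader.h and mach-o/fat.h.
MH_MAGIC = 0xFEEDFACE     # thin 32-bit image
MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit image
FAT_MAGIC = 0xCAFEBABE    # fat (universal) header, always stored big-endian


def classify_macho(first_bytes):
    """Classify a file from its first four bytes (hypothetical helper)."""
    if len(first_bytes) < 4:
        return "not Mach-O"
    (big_endian,) = struct.unpack(">I", first_bytes[:4])
    (little_endian,) = struct.unpack("<I", first_bytes[:4])
    if big_endian == FAT_MAGIC:
        # Caveat: Java class files start with the same four bytes.
        return "fat"
    if little_endian == MH_MAGIC_64:
        return "thin 64-bit"
    if little_endian == MH_MAGIC:
        return "thin 32-bit"
    return "not Mach-O"
```

For a real binary, pass the result of open(path, 'rb').read(4).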

2.3 Mach‑O Structure

Mach‑O consists of three main parts: Header, LoadCommands, and Data. At the end of the file there is also a Loader‑Info section containing string tables and symbol tables.

2.3.1 Header

The header describes basic file information (CPU type, file type, etc.). The relevant C structure (from XNU source EXTERNAL_HEADERS/mach‑o/loader.h) is:

struct mach_header_64 {
    uint32_t      magic;       /* mach magic number identifier */
    cpu_type_t    cputype;     /* cpu specifier */
    cpu_subtype_t cpusubtype;  /* machine specifier */
    uint32_t      filetype;    /* type of file */
    uint32_t      ncmds;       /* number of load commands */
    uint32_t      sizeofcmds;  /* the size of all the load commands */
    uint32_t      flags;       /* flags */
    uint32_t      reserved;    /* reserved */
};

Field values can be displayed with otool -hv demo:

% otool -hv demo
demo:
Mach header
      magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
MH_MAGIC_64    ARM64        ALL  0x00     EXECUTE    22       3040   NOUNDEFS DYLDLINK TWOLEVEL PIE

MachOView also provides a visual view of these values (see screenshot in the article).
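The same fields can be decoded programmatically, which is convenient when the analysis is scripted. The sketch below assumes a thin, little-endian 64-bit image (such as an arm64 executable); the names MachHeader64 and parse_mach_header_64 are illustrative and do not come from the original article.

```python
import struct
from collections import namedtuple

MachHeader64 = namedtuple(
    "MachHeader64",
    "magic cputype cpusubtype filetype ncmds sizeofcmds flags reserved",
)

# Eight little-endian 32-bit fields, 32 bytes in total.
# cputype and cpusubtype are signed integers in the C definition.
HEADER_FORMAT = "<IiiIIIII"


def parse_mach_header_64(data):
    """Decode the first 32 bytes of a thin 64-bit Mach-O image."""
    size = struct.calcsize(HEADER_FORMAT)
    return MachHeader64(*struct.unpack(HEADER_FORMAT, data[:size]))
```

With a real file, parse_mach_header_64(open(path, 'rb').read(32)) returns the raw numeric values; otool -hv prints the symbolic names for the same numbers.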

2.3.2 LoadCommands

LoadCommands describe how the file should be mapped into virtual memory. The generic structure is:

struct load_command {
    uint32_t cmd;      /* type of load command */
    uint32_t cmdsize;  /* total size of command in bytes */
};

Common command constants (e.g., LC_SEGMENT_64, LC_SYMTAB, LC_LOAD_DYLIB) are defined in the same header.
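Because every load command starts with the same cmd and cmdsize pair, the whole list can be walked without knowing each command's specific layout. The following is a minimal sketch under the assumption of a thin, little-endian 64-bit image; iter_load_commands is an illustrative name.

```python
import struct

MACH_HEADER_64_SIZE = 32  # sizeof(struct mach_header_64)
NCMDS_OFFSET = 16         # byte offset of the ncmds field inside the header


def iter_load_commands(data):
    """Yield (cmd, cmdsize, offset) for each load command in the image."""
    (ncmds,) = struct.unpack_from("<I", data, NCMDS_OFFSET)
    offset = MACH_HEADER_64_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            # A command can never be smaller than its own cmd/cmdsize pair.
            raise ValueError("corrupt load command at offset %d" % offset)
        yield cmd, cmdsize, offset
        offset += cmdsize  # the next command starts right after this one
```

Each yielded offset can then be handed to a command-specific decoder, for example one for LC_SEGMENT_64.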

2.3.3 Segment Example – LC_SEGMENT_64

The 64‑bit segment command structure:

struct segment_command_64 { /* for 64‑bit architectures */
    uint32_t  cmd;          /* LC_SEGMENT_64 */
    uint32_t  cmdsize;      /* includes sizeof section_64 structs */
    char      segname[16];  /* segment name */
    uint64_t  vmaddr;       /* memory address of this segment */
    uint64_t  vmsize;       /* memory size of this segment */
    uint64_t  fileoff;      /* file offset of this segment */
    uint64_t  filesize;     /* amount to map from the file */
    vm_prot_t maxprot;      /* maximum VM protection */
    vm_prot_t initprot;     /* initial VM protection */
    uint32_t  nsects;       /* number of sections in segment */
    uint32_t  flags;        /* flags */
};

The most relevant segments are __PAGEZERO (null‑pointer trap), __TEXT (executable code and read‑only data such as C strings), __DATA (read‑write data), and __LINKEDIT (symbol tables, string tables, and other link information).
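For size analysis, the interesting field of each segment is filesize, since it is the number of bytes the segment occupies on disk. The sketch below decodes one LC_SEGMENT_64 command with Python's struct module; parse_segment_64 is an illustrative helper, and little-endian layout is assumed.

```python
import struct

LC_SEGMENT_64 = 0x19
# Mirrors struct segment_command_64 field by field: 72 bytes in total.
SEGMENT_FORMAT = "<II16sQQQQiiII"


def parse_segment_64(data, offset=0):
    """Return (segname, vmsize, filesize, nsects) for one LC_SEGMENT_64 command."""
    fields = struct.unpack_from(SEGMENT_FORMAT, data, offset)
    cmd, raw_name, vmsize, filesize, nsects = (
        fields[0], fields[2], fields[4], fields[6], fields[9],
    )
    if cmd != LC_SEGMENT_64:
        raise ValueError("not an LC_SEGMENT_64 command")
    # segname is NUL-padded to 16 bytes; keep only the part before the padding.
    name = raw_name.split(b"\x00")[0].decode("ascii")
    return name, vmsize, filesize, nsects
```

A typical __PAGEZERO segment illustrates why vmsize and filesize must not be confused: it reserves a large range of address space but has a filesize of 0, so it contributes nothing to the package size.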

3. Resource Optimization

Because Baidu APP is a very large, “carrier‑class” app that bundles many frameworks (Hybrid, React‑Native, KMM, etc.), its package contains many large resources (over 40 KB each). The optimization is divided into three parts:

Large‑resource optimization

Unused‑configuration‑file removal

Duplicate‑resource elimination

3.1 Large‑Resource Optimization

A Python script recursively scans the unpacked IPA package and prints files larger than a threshold (40 KB in Baidu’s practice):

import os

def findBigResources(path, threshold):
    """Recursively print files under path that are larger than threshold (in KB)."""
    for name in os.listdir(path):
        child = os.path.join(path, name)
        if os.path.isfile(child):
            # Skip dynamic libraries and compiled asset catalogs.
            end = os.path.splitext(child)[-1]
            if end != ".dylib" and end != ".car":
                fileLen = os.path.getsize(child) / 1024  # size in KB
                if fileLen > threshold:
                    print(child + " length is " + str(fileLen))
        elif os.path.isdir(child):
            findBigResources(child, threshold)

Two mitigation strategies are suggested:

Asynchronous download for resources not needed at first launch.

Compression of frequently used large resources with runtime decompression.
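Before committing to the compression strategy, it helps to estimate how much a given resource would actually shrink, since formats that are already compressed gain little. The article does not say which algorithm Baidu uses; the sketch below assumes zlib purely for illustration, and compression_saving is a hypothetical helper.

```python
import zlib


def compression_saving(data, level=9):
    """Return the bytes saved by zlib-compressing data (may be zero or negative)."""
    return len(data) - len(zlib.compress(data, level))
```

A resource is only worth compressing when the saving clearly outweighs the cost of decompressing it at runtime, so a result near zero suggests leaving the file as it is.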

3.2 Unused Configuration Files

Another script lists candidate configuration files (plist, json, txt, xib) by excluding binaries and known asset types (dylib, car, png, webp, gif, jpg, js, css):

import os

def findProfileResources(path):
    """Recursively print files that are neither binaries nor known asset types."""
    excluded = [".dylib", ".car", ".png", ".webp", ".gif", ".jpg", ".js", ".css"]
    for name in os.listdir(path):
        child = os.path.join(path, name)
        if os.path.isfile(child):
            end = os.path.splitext(child)[-1]
            if end not in excluded:
                print(child + " suffix " + end)
        elif os.path.isdir(child):
            findProfileResources(child)

Static strings referenced by code are stored in the __cstring section of the __TEXT segment; they can be extracted with:

lines = os.popen('/usr/bin/otool -v -s __TEXT __cstring %s' % path).readlines()

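After obtaining the set of used configuration files, a diff against the full list identifies the unused ones, which become candidates for removal. File names that are assembled at runtime do not appear as complete strings in __cstring, so each candidate should be reviewed before it is deleted.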
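The diff step can be sketched as follows. This is an assumption about how the comparison might be done rather than Baidu's actual script: find_unused_configs is a hypothetical helper that takes the configuration file paths and the lines returned by the otool call, and treats a file as used when its name without extension appears anywhere in the extracted strings.

```python
import os


def find_unused_configs(config_paths, cstring_lines):
    """Return config files whose base names never appear in the extracted strings."""
    haystack = "\n".join(cstring_lines)
    unused = []
    for path in config_paths:
        # Code often passes the name and the extension separately,
        # so match on the name without its extension.
        stem = os.path.splitext(os.path.basename(path))[0]
        if stem not in haystack:
            unused.append(path)
    return unused
```

Substring matching is deliberately conservative: a short name may match an unrelated string and be kept, which is safer than deleting a file that is still in use.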

3.3 Duplicate‑Resource Optimization

Files are hashed with MD5 to detect duplicates:

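import hashlib
import os

def get_file_library(path, file_dict):
    """Recursively group files under path by the MD5 of their contents."""
    for name in os.listdir(path):
        child = os.path.join(path, name)
        if os.path.isfile(child):
            md5 = img_to_md5(child)
            # Store the full path so identical files in different folders can be told apart.
            file_dict.setdefault(md5, []).append(child)
        elif os.path.isdir(child):
            get_file_library(child, file_dict)

def img_to_md5(path):
    """Return the hex MD5 digest of the file at path."""
    with open(path, 'rb') as fd:
        return hashlib.md5(fd.read()).hexdigest()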

Together with the large‑resource and unused‑configuration optimizations above, deduplication brought Baidu APP a 12 MB reduction over two quarters.
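Once files are grouped by hash, the groups with more than one entry are the duplicates, and the potential saving is the size of every copy beyond the first. The sketch below is illustrative only: summarize_duplicates is a hypothetical helper, and it assumes the dictionary values are full file paths so that sizes can be looked up.

```python
import os


def summarize_duplicates(file_dict, size_of=os.path.getsize):
    """Return (groups, wasted_bytes) for hash buckets holding more than one file."""
    groups = {}
    wasted = 0
    for digest, paths in file_dict.items():
        if len(paths) > 1:
            groups[digest] = sorted(paths)
            # Files in one bucket share content and size; keeping a single copy
            # saves the size of every additional copy.
            wasted += size_of(paths[0]) * (len(paths) - 1)
    return groups, wasted
```

The size_of parameter defaults to os.path.getsize and exists only so the function can be exercised without touching the file system.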

4. Conclusion

Resource optimization is the most impactful part of package‑size reduction. By analyzing Mach‑O files, identifying large assets, removing unused configuration files, and deduplicating resources, Baidu APP reduced its package size by 12 MB and established a pipeline to keep future growth in check.
