APK Resource Analysis and Optimization Using Python
This article explains how to use Python to analyze Android APK packages, extract basic statistics, identify optimizable resources such as oversized images, duplicate files, and unused assets, and provide data‑driven guidance for reducing APK size and improving distribution efficiency.
Background – Rapid feature growth in the JD.com main app has caused the APK size to increase dramatically, which raises promotion costs, lowers users’ willingness to download, and pushes the package past Google Play’s 100 MB limit. The article describes a Python‑based approach to analyze APKs, gather basic data, and pinpoint optimization opportunities.
APK File Structure – An APK is a zip archive; running aapt l file.apk lists its contents. The article shows typical directories (e.g., res/, assets/, lib/, src/) and notes that Java resources are also packaged.
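Because an APK is an ordinary zip archive, Python's standard zipfile module can produce a first size breakdown without any Android tooling. The following is a minimal sketch, not the article's implementation: the function names and the demo archive are hypothetical, and it simply groups entries by their top-level path component.

```python
import io
import zipfile
from collections import defaultdict


def size_by_top_level_dir(apk_file):
    """Return {top-level entry: (compressed_bytes, uncompressed_bytes)}."""
    totals = defaultdict(lambda: [0, 0])
    with zipfile.ZipFile(apk_file, "r") as apk:
        for info in apk.infolist():
            # Root-level files such as classes.dex are grouped under their own name
            top = info.filename.split("/", 1)[0]
            totals[top][0] += info.compress_size
            totals[top][1] += info.file_size
    return {name: tuple(sizes) for name, sizes in totals.items()}


def build_demo_apk():
    """Build a tiny in-memory zip that imitates an APK layout (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as z:
        z.writestr("classes.dex", b"d" * 100)
        z.writestr("res/drawable/a.png", b"p" * 40)
        z.writestr("res/layout/main.xml", b"x" * 10)
        z.writestr("lib/arm64-v8a/libdemo.so", b"s" * 70)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    breakdown = size_by_top_level_dir(build_demo_apk())
    for name, (packed, raw) in sorted(breakdown.items()):
        print(f"{name:12s} compressed={packed:6d} uncompressed={raw:6d}")
```

Passing a real path such as "app.apk" instead of the in-memory demo archive gives the same breakdown for an actual package; sorting the result by compressed size shows which directory to examine first.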
Primary Analysis Tasks
Download APK and mapping files.
Use AAPT to retrieve package information.
Obtain file system size ( apk_file_size ) and compressed size ( apk_download_size ).
Restore obfuscated resource IDs.
Detect duplicate resources via MD5.
Read DEX header to get class and method counts.
Identify non‑alpha PNG images larger than 10 KB.
Extract .so files with ZipFile and analyze them.
Detect unused resources under res/ .
All steps are implemented in Python; downloading and extracting the APK is performed with resumable download logic (omitted for brevity).
3.1 AAPT Retrieve APK Information
import re
import subprocess

def get_apk_base_info(self):
    # Run `aapt dump badging` and parse the basic APK metadata from its output.
    # Passing the arguments as a list avoids shell quoting problems with paths.
    p = subprocess.Popen([self.aapt_path, "dump", "badging", self.apkPath],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (output, err) = p.communicate()
    text = output.decode("utf-8", errors="replace")
    package_match = re.match(r"package: name='(\S+)' versionCode='(\d+)' versionName='(\S+)'", text)
    if not package_match:
        raise Exception("can't get package,versioncode,version")
    package_name = package_match.group(1)
    version_code = package_match.group(2)
    version_name = package_match.group(3)
    launch_activity_match = re.search(r"launchable-activity: name='(\S+)'", text)
    if not launch_activity_match:
        raise Exception("can't get launch_activity")
    launch_activity = launch_activity_match.group(1)
    # The lowercase "sdkVersion:" line holds the minimum SDK version
    sdk_version_match = re.search(r"sdkVersion:'(\S+)'", text)
    if not sdk_version_match:
        raise Exception("can't get min_sdk_version")
    min_sdk_version = sdk_version_match.group(1)
    target_sdk_version_match = re.search(r"targetSdkVersion:'(\S+)'", text)
    if not target_sdk_version_match:
        raise Exception("can't get target_sdk_version")
    target_sdk_version = target_sdk_version_match.group(1)
    # Labels may contain spaces or non-ASCII characters, so match up to the closing quote
    application_label_match = re.search(r"application-label:'([^']+)'", text)
    if not application_label_match:
        raise Exception("can't get application_label")
    application_label = application_label_match.group(1)
    return package_name, version_name, version_code, launch_activity, min_sdk_version, target_sdk_version, application_label
3.2 apk_file_size & apk_download_size
import os
import zipfile

def get_apk_size(self):
    # APK file size on disk, in bytes
    return os.path.getsize(self.apkPath)

def get_apk_download_size(apk_file_name):
    # Sum of the compressed sizes of all zip entries, in bytes
    with zipfile.ZipFile(apk_file_name, 'r') as zip_file:
        return sum(zip_info.compress_size for zip_info in zip_file.infolist())
3.3 ZipFile Read APK Files
def __get_files_from_apk(apk_file_name, apk_name_without_suffix, mapping_name_without_suffix):
    # Read the obfuscation mapping so shortened resource paths can be restored.
    # reproguard, md5 and FileInfo are helpers that are not shown in this excerpt.
    proguard_map = reproguard.read_proguard_apk(mapping_name_without_suffix)
    zip_file = zipfile.ZipFile(apk_file_name, 'r')
    for file_name in zip_file.namelist():
        if proguard_map and "/" in file_name:
            entry_name = str(reproguard.replace_path_id(file_name, proguard_map))
        else:
            entry_name = file_name
        md5_str = md5.get_md5_value(file_name)
        zip_info = zip_file.getinfo(file_name)
        # file_type is assigned in code omitted from this excerpt
        file_info = FileInfo(path=file_name, entry_name=entry_name, md5_str=md5_str, compress_size=zip_info.compress_size, file_type=file_type, zip_file=zip_info)
        # Further processing for .so, React Native, dex, images, etc.
    zip_file.close()
    # The returned lists are filled by the omitted processing above
    return apk_file_list, aura_bundles, dex_files, react_modules
3.4 Parse DEX Header
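The class and method counts come from fixed offsets in the 112-byte DEX header, where every count is a little-endian 32-bit integer. As a self-contained illustration of that layout, here is a small sketch that parses only the count fields from raw bytes. The names read_dex_counts and make_fake_header are hypothetical, and the fake header exists only so the example runs without a real DEX file.

```python
import struct

# Offsets of the uint32 count fields in the DEX header (header size is 0x70 bytes)
DEX_COUNT_OFFSETS = {
    "string_ids_size": 0x38,
    "type_ids_size": 0x40,
    "proto_ids_size": 0x48,
    "field_ids_size": 0x50,
    "method_ids_size": 0x58,
    "class_defs_size": 0x60,
}


def read_dex_counts(header_bytes):
    """Parse the count fields from the first 0x70 bytes of a DEX file."""
    if len(header_bytes) < 0x70 or header_bytes[:4] != b"dex\n":
        raise ValueError("not a DEX header")
    return {
        name: struct.unpack_from("<I", header_bytes, offset)[0]
        for name, offset in DEX_COUNT_OFFSETS.items()
    }


def make_fake_header(method_count, class_count):
    """Build a zero-filled header with only the magic and two counts set (demo only)."""
    header = bytearray(0x70)
    header[0:8] = b"dex\n035\x00"
    struct.pack_into("<I", header, 0x58, method_count)
    struct.pack_into("<I", header, 0x60, class_count)
    return bytes(header)


if __name__ == "__main__":
    counts = read_dex_counts(make_fake_header(method_count=65000, class_count=1200))
    print(counts["method_ids_size"], counts["class_defs_size"])
```

With a real package, the header bytes can be taken straight from the archive, for example zipfile.ZipFile(path).read("classes.dex")[:0x70]. Note that method_ids_size counts method references, which is the figure that matters for the 64K reference limit of a single DEX file.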
import mmap
import struct

def ReadDexHeader_(self, file_dir):
    # Read DEX file in binary mode and map it read-only
    f = open(file_dir, 'rb')
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    self.mmap = m
    # Extract header fields. Each is a little-endian uint32; the offsets
    # below follow the standard DEX header layout.
    string_ids_size = struct.unpack('<I', m[0x38:0x3C])[0]
    string_ids_off = struct.unpack('<I', m[0x3C:0x40])[0]
    type_ids_size = struct.unpack('<I', m[0x40:0x44])[0]
    type_ids_off = struct.unpack('<I', m[0x44:0x48])[0]
    proto_ids_size = struct.unpack('<I', m[0x48:0x4C])[0]
    proto_ids_off = struct.unpack('<I', m[0x4C:0x50])[0]
    field_ids_size = struct.unpack('<I', m[0x50:0x54])[0]
    field_ids_off = struct.unpack('<I', m[0x54:0x58])[0]
    method_ids_size = struct.unpack('<I', m[0x58:0x5C])[0]
    method_ids_off = struct.unpack('<I', m[0x5C:0x60])[0]
    class_defs_size = struct.unpack('<I', m[0x60:0x64])[0]
    class_defs_off = struct.unpack('<I', m[0x64:0x68])[0]
    data_size = struct.unpack('<I', m[0x68:0x6C])[0]
    data_off = struct.unpack('<I', m[0x6C:0x70])[0]
    header_data = {
        'string_ids_size': string_ids_size,
        'string_ids_off': string_ids_off,
        'type_ids_size': type_ids_size,
        'type_ids_off': type_ids_off,
        'proto_ids_size': proto_ids_size,
        'proto_ids_off': proto_ids_off,
        'field_ids_size': field_ids_size,
        'field_ids_off': field_ids_off,
        'method_ids_size': method_ids_size,
        'method_ids_off': method_ids_off,
        'class_defs_size': class_defs_size,
        'class_defs_off': class_defs_off,
        'data_size': data_size,
        'data_off': data_off
    }
    self.header = header_data
3.5 Identify Non‑Alpha PNG Images
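Whether a PNG can carry transparency is recorded in the file itself: the IHDR chunk stores a color type (4 and 6 include an alpha channel), and palette or grayscale images may add a tRNS chunk. The sketch below checks both using only the standard library, which can help avoid flagging a palette PNG with transparency as non-alpha. It is an illustrative alternative rather than the article's method, and png_declares_alpha and make_png are hypothetical names.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_declares_alpha(data):
    """True if the PNG has an alpha channel (color type 4 or 6) or a tRNS chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        if chunk_type == b"IHDR":
            # IHDR payload: width(4) height(4) bit_depth(1) color_type(1) ...
            color_type = data[pos + 8 + 9]
            if color_type in (4, 6):
                return True
        elif chunk_type == b"tRNS":
            return True
        elif chunk_type in (b"IDAT", b"IEND"):
            break  # tRNS, if present, precedes the first IDAT chunk
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
    return False


def _chunk(chunk_type, payload):
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def make_png(color_type, with_trns=False):
    """Build a chunk sequence for testing the parser (not necessarily a decodable image)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, color_type, 0, 0, 0)
    parts = [PNG_SIGNATURE, _chunk(b"IHDR", ihdr)]
    if with_trns:
        parts.append(_chunk(b"tRNS", b"\x00"))
    parts.append(_chunk(b"IDAT", zlib.compress(b"\x00" * 5)))
    parts.append(_chunk(b"IEND", b""))
    return b"".join(parts)


if __name__ == "__main__":
    print(png_declares_alpha(make_png(6)))                  # RGBA
    print(png_declares_alpha(make_png(2)))                  # RGB without tRNS
    print(png_declares_alpha(make_png(3, with_trns=True)))  # palette with tRNS
```

A PNG that declares alpha may still be fully opaque in practice, so this check only narrows the candidate list; confirming that every pixel is opaque requires decoding the image.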
import io
from PIL import Image

# Fragment of the per-file loop; defaults keep the attributes defined if decoding fails
image_size = None
non_alpha = False
try:
    image_bytes = io.BytesIO(zip_file.read(file_name))
    img = Image.open(image_bytes)
    image_size = img.size  # (width, height)
    # Flag PNGs of 10 KB or more that are not RGBA; 9-patch files are skipped
    if img.mode != "RGBA":
        if image_type == ".png" and not filename_without_suffix.endswith(".9") and zip_info.compress_size >= 10 * 1024:
            non_alpha = True
except OSError:
    # Not a decodable image; keep the defaults
    pass
file_info.image_size = image_size
file_info.non_alpha = non_alpha
apk_file_list.append(file_info)
continue
3.6 Duplicate Resources – Duplicate files are detected by comparing the MD5 hashes of their contents; entries with identical hashes are redundant copies that can be deduplicated.
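A minimal sketch of this duplicate check, assuming the hash is taken over each entry's bytes as read from the archive. The function names are hypothetical, and the in-memory archive stands in for a real APK.

```python
import hashlib
import io
import zipfile
from collections import defaultdict


def find_duplicate_entries(apk_file):
    """Group zip entries by the MD5 of their contents; return groups with 2+ entries."""
    groups = defaultdict(list)
    with zipfile.ZipFile(apk_file, "r") as apk:
        for info in apk.infolist():
            if info.is_dir():
                continue
            digest = hashlib.md5(apk.read(info.filename)).hexdigest()
            groups[digest].append((info.filename, info.compress_size))
    return {digest: entries for digest, entries in groups.items() if len(entries) > 1}


def redundant_bytes(duplicates):
    """Compressed bytes that could be saved by keeping one copy per group."""
    return sum(size for entries in duplicates.values() for _, size in entries[1:])


def build_demo_apk():
    """In-memory archive with one duplicated image (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as z:
        z.writestr("res/drawable-hdpi/bg.png", b"A" * 30)
        z.writestr("res/drawable-xhdpi/bg.png", b"A" * 30)
        z.writestr("res/drawable/icon.png", b"B" * 30)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    dups = find_duplicate_entries(build_demo_apk())
    for digest, entries in dups.items():
        print(digest, [name for name, _ in entries])
    print("redundant bytes:", redundant_bytes(dups))
```

MD5 is used here only as a content fingerprint, not for security. Identical hashes in different density folders may be intentional, so the output is best treated as a review list.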
3.7 Unused Resources
Unused resources are files in res/ and assets/ that are not referenced by the compiled R.java, XML layouts, the AndroidManifest, or DEX code. The analysis proceeds in two parts:
3.7.1 Unused res/ Resources – Parse R.txt to obtain all resource IDs, analyze resources.arsc for actual references, scan XML files for value and non‑value references, and examine DEX/SMALI code for direct resource usage. The sets are merged, and any IDs not present in the merged reference set are considered unused.
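The final step of 3.7.1 is a set difference: declared IDs minus the union of all reference sets. The sketch below shows only that step, assuming R.txt lines in the usual "int type name 0x…" form and assuming the reference sets have already been collected from resources.arsc, XML, and code. parse_r_txt and find_unused are hypothetical names, not the article's functions.

```python
def parse_r_txt(text):
    """Parse R.txt lines such as 'int drawable ic_logo 0x7f020001' into {id: 'type/name'}."""
    declared = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip array entries ('int[] styleable ...') and styleable index entries
        if len(parts) == 4 and parts[0] == "int" and parts[3].startswith("0x"):
            _, res_type, name, value = parts
            declared[int(value, 16)] = f"{res_type}/{name}"
    return declared


def find_unused(declared, *reference_sets):
    """Return names of declared resources whose ID appears in none of the reference sets."""
    referenced = set().union(*reference_sets)
    return sorted(name for res_id, name in declared.items() if res_id not in referenced)


if __name__ == "__main__":
    r_txt = (
        "int drawable ic_logo 0x7f020001\n"
        "int drawable ic_old 0x7f020002\n"
        "int layout main 0x7f030000\n"
        "int[] styleable Foo { 0x7f010000 }\n"
    )
    xml_refs = {0x7F020001}   # IDs referenced from XML (assumed input)
    code_refs = {0x7F030000}  # IDs referenced from DEX/SMALI (assumed input)
    print(find_unused(parse_r_txt(r_txt), xml_refs, code_refs))
```

Resources looked up by name at runtime (for example through Resources.getIdentifier) never appear as ID references, so a list produced this way should be reviewed before anything is deleted.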
3.7.2 Unused assets/ Resources – List all files under assets/, then search SMALI code for string literals that reference those assets; files not referenced are marked as unused.
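The asset search can be sketched as a scan of const-string literals in the disassembled code. This is a heuristic illustration under stated assumptions: the SMALI sources are passed in as strings, an asset counts as referenced when its path relative to assets/ occurs inside any literal, and find_unused_assets is a hypothetical name.

```python
import re

# Matches: const-string v0, "text"  and  const-string/jumbo v1, "text"
CONST_STRING = re.compile(r'const-string(?:/jumbo)?\s+\w+,\s*"((?:[^"\\]|\\.)*)"')


def find_unused_assets(asset_paths, smali_sources):
    """Return asset paths whose assets/-relative path appears in no string literal."""
    literals = set()
    for source in smali_sources:
        literals.update(CONST_STRING.findall(source))
    unused = []
    for path in asset_paths:
        relative = path[len("assets/"):] if path.startswith("assets/") else path
        if not any(relative in literal for literal in literals):
            unused.append(path)
    return unused


if __name__ == "__main__":
    smali = [
        '.method public load()V\n'
        '    const-string v0, "config/app.json"\n'
        '    const-string/jumbo v1, "file:///android_asset/web/index.html"\n'
        '.end method\n'
    ]
    assets = [
        "assets/config/app.json",
        "assets/web/index.html",
        "assets/fonts/old.ttf",
    ]
    print(find_unused_assets(assets, smali))
```

Paths assembled at runtime (for example a directory literal concatenated with a file name) are not caught by this kind of literal search, so flagged files are candidates rather than confirmed dead assets.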
Key code snippets for these steps are provided in the original article (e.g., read_resource_txt_file, read_smali_files, decode_resources, find_asset_file).
Conclusion – By leveraging Python to automate APK resource analysis, developers obtain precise metrics for image size, duplicate detection, DEX method counts, and unused assets, enabling effective APK slimming, reduced distribution costs, and improved user conversion rates.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.