
APK Resource Analysis and Optimization Using Python

This article explains how to use Python to analyze Android APK packages, extract basic statistics, identify optimizable resources such as oversized images, duplicate files, and unused assets, and provide data‑driven guidance for reducing APK size and improving distribution efficiency.

JD Retail Technology

Background – Rapid feature growth in the JD.com main app has caused the APK size to increase dramatically, which raises promotion costs, makes users less willing to download the app, and pushes the package past Google Play’s 100 MB limit. The article describes a Python‑based approach to analyze APKs, gather basic data, and pinpoint optimization opportunities.

APK File Structure – An APK is a zip archive, so running aapt l file.apk lists its contents. The article shows typical directories (e.g., res/, assets/, lib/, src/) and notes that Java resources are also packaged.
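
Because an APK is an ordinary zip archive, the same listing can be produced from Python without aapt. The sketch below groups entries by top-level directory and sums their compressed sizes. The helper names (summarize_top_level, build_demo_apk) and the tiny in-memory archive are illustrative assumptions, not part of the original tooling.

```python
import io
import zipfile
from collections import defaultdict


def summarize_top_level(apk_file):
    """Return {top-level entry: total compressed bytes} for a zip/APK."""
    totals = defaultdict(int)
    with zipfile.ZipFile(apk_file, "r") as zf:
        for info in zf.infolist():
            # "res/drawable/a.png" -> "res/"; root files keep their own name
            head, sep, _ = info.filename.partition("/")
            totals[head + sep] += info.compress_size
    return dict(totals)


def build_demo_apk():
    """Build a tiny in-memory zip that stands in for an APK (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("AndroidManifest.xml", b"<manifest/>")
        zf.writestr("classes.dex", b"dex\n035\x00")
        zf.writestr("res/drawable/icon.png", b"\x89PNG....")
        zf.writestr("lib/arm64-v8a/libdemo.so", b"\x7fELF")
    buf.seek(0)
    return buf


demo_summary = summarize_top_level(build_demo_apk())
for name, size in sorted(demo_summary.items()):
    print(name, size)
```

Passing a real APK path instead of the in-memory buffer gives a quick view of which top-level directory dominates the package.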

Primary Analysis Tasks

Download APK and mapping files.

Use AAPT to retrieve package information.

Obtain file system size (apk_file_size) and compressed size (apk_download_size).

Restore obfuscated resource IDs.

Detect duplicate resources via MD5.

Read DEX header to get class and method counts.

Identify non‑alpha PNG images larger than 10 KB.

Extract .so files with ZipFile and analyze them.

Detect unused resources under res/.

All steps are implemented in Python. The APK is downloaded with resumable download logic and then extracted (omitted for brevity).

3.1 AAPT Retrieve APK Information

import re
import subprocess

def get_apk_base_info(self):
    # Get basic APK info by parsing the output of "aapt dump badging"
    # Arguments are passed as a list so that paths containing spaces are handled correctly
    p = subprocess.Popen([self.aapt_path, "dump", "badging", self.apkPath], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (output, err) = p.communicate()
    output = output.decode()
    package_match = re.compile(r"package: name='(\S+)' versionCode='(\d+)' versionName='(\S+)'").match(output)
    if not package_match:
        raise Exception("can't get package,versioncode,version")
    package_name = package_match.group(1)
    version_code = package_match.group(2)
    version_name = package_match.group(3)
    launch_activity_match = re.compile(r"launchable-activity: name='(\S+)'").search(output)
    if not launch_activity_match:
        raise Exception("can't get launch_activity")
    launch_activity = launch_activity_match.group(1)
    sdk_version_match = re.compile(r"sdkVersion:'(\S+)'").search(output)
    if not sdk_version_match:
        raise Exception("can't get min_sdk_version")
    min_sdk_version = sdk_version_match.group(1)
    target_sdk_version_match = re.compile(r"targetSdkVersion:'(\S+)'").search(output)
    if not target_sdk_version_match:
        raise Exception("can't get target_sdk_version")
    target_sdk_version = target_sdk_version_match.group(1)
    application_label_match = re.compile(r"application-label:'([\u4e00-\u9fa5_a-zA-Z0-9-\S]+)'").search(output)
    if not application_label_match:
        raise Exception("can't get application_label")
    application_label = application_label_match.group(1)
    return package_name, version_name, version_code, launch_activity, min_sdk_version, target_sdk_version, application_label
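
To see what these patterns extract, the following sketch applies them to a made-up badging dump. The sample text and the parse_badging helper are assumptions for illustration; real aapt output contains many more lines. One deliberate difference from the method above: the label pattern here uses [^']+ so that labels containing spaces also match.

```python
import re

# One pattern per field, mirroring the fields parsed by get_apk_base_info.
BADGING_PATTERNS = {
    "package_name": r"package: name='(\S+)'",
    "version_code": r"versionCode='(\d+)'",
    "version_name": r"versionName='(\S+)'",
    "launch_activity": r"launchable-activity: name='(\S+)'",
    "min_sdk_version": r"sdkVersion:'(\S+)'",
    "target_sdk_version": r"targetSdkVersion:'(\S+)'",
    "application_label": r"application-label:'([^']+)'",
}


def parse_badging(text):
    """Extract the basic fields from `aapt dump badging` style output."""
    info = {}
    for field, pattern in BADGING_PATTERNS.items():
        match = re.search(pattern, text)
        if not match:
            raise ValueError("can't get " + field)
        info[field] = match.group(1)
    return info


# Hypothetical sample, trimmed to the lines the patterns need.
SAMPLE_BADGING = (
    "package: name='com.example.demo' versionCode='42' versionName='1.2.3'\n"
    "sdkVersion:'21'\n"
    "targetSdkVersion:'33'\n"
    "application-label:'Demo App'\n"
    "launchable-activity: name='com.example.demo.MainActivity'  label='Demo App' icon=''\n"
)

sample_info = parse_badging(SAMPLE_BADGING)
print(sample_info)
```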

3.2 apk_file_size & apk_download_size

import os
import zipfile

def get_apk_size(self):
    # Get the APK file size on disk, in bytes
    return os.path.getsize(self.apkPath)

def get_apk_download_size(apk_file_name):
    # Sum the compressed sizes of all entries as an estimate of the download size
    download_size = 0
    with zipfile.ZipFile(apk_file_name, 'r') as zip_file:
        for zip_info in zip_file.infolist():
            download_size += zip_info.compress_size
    return download_size
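
The two numbers differ for a simple reason: the file on disk also holds zip headers and the central directory, while compress_size counts only entry payloads. A minimal sketch on an in-memory archive makes this visible. The size_report helper and the demo entries are assumptions for illustration.

```python
import io
import zipfile


def size_report(apk_file):
    """Return summed compressed and uncompressed entry sizes, in bytes."""
    with zipfile.ZipFile(apk_file, "r") as zf:
        infos = zf.infolist()
        return {
            "compressed": sum(info.compress_size for info in infos),
            "uncompressed": sum(info.file_size for info in infos),
        }


# Demo archive: one highly compressible entry and one stored as-is.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("assets/data.txt", b"a" * 10000, compress_type=zipfile.ZIP_DEFLATED)
    zf.writestr("res/raw/blob.bin", bytes(range(100)), compress_type=zipfile.ZIP_STORED)
archive_bytes = len(demo.getvalue())  # what os.path.getsize would report on disk
demo.seek(0)

report = size_report(demo)
print(report, "archive bytes:", archive_bytes)
```

Entries whose compressed and uncompressed sizes are equal are stored without compression, which is worth checking for large assets.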

3.3 ZipFile Read APK Files

def __get_files_from_apk(apk_file_name, apk_name_without_suffix, mapping_name_without_suffix):
    # Result lists, filled by the omitted per-type processing below
    apk_file_list, aura_bundles, dex_files, react_modules = [], [], [], []
    # Read obfuscation mapping
    proguard_map = reproguard.read_proguard_apk(mapping_name_without_suffix)
    with zipfile.ZipFile(apk_file_name, 'r') as zip_file:
        for file_name in zip_file.namelist():
            # Restore the original resource path when a mapping is available
            if proguard_map and "/" in file_name:
                entry_name = str(reproguard.replace_path_id(file_name, proguard_map))
            else:
                entry_name = file_name
            md5_str = md5.get_md5_value(file_name)
            zip_info = zip_file.getinfo(file_name)
            # file_type is determined in code omitted from this excerpt
            file_info = FileInfo(path=file_name, entry_name=entry_name, md5_str=md5_str, compress_size=zip_info.compress_size, file_type=file_type, zip_file=zip_info)
            # Further processing for .so, React Native, dex, images, etc.
    return apk_file_list, aura_bundles, dex_files, react_modules

3.4 Parse DEX Header

import mmap
import struct

def ReadDexHeader_(self, file_dir):
    # Read DEX file in binary mode and map it into memory
    with open(file_dir, 'rb') as f:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    self.mmap = m
    # Extract header fields: each is a little-endian uint32 at a fixed
    # offset defined by the DEX file format (the header is 0x70 bytes long)
    string_ids_size = struct.unpack('<I', m[0x38:0x3C])[0]
    string_ids_off = struct.unpack('<I', m[0x3C:0x40])[0]
    type_ids_size = struct.unpack('<I', m[0x40:0x44])[0]
    type_ids_off = struct.unpack('<I', m[0x44:0x48])[0]
    proto_ids_size = struct.unpack('<I', m[0x48:0x4C])[0]
    proto_ids_off = struct.unpack('<I', m[0x4C:0x50])[0]
    field_ids_size = struct.unpack('<I', m[0x50:0x54])[0]
    field_ids_off = struct.unpack('<I', m[0x54:0x58])[0]
    method_ids_size = struct.unpack('<I', m[0x58:0x5C])[0]
    method_ids_off = struct.unpack('<I', m[0x5C:0x60])[0]
    class_defs_size = struct.unpack('<I', m[0x60:0x64])[0]
    class_defs_off = struct.unpack('<I', m[0x64:0x68])[0]
    data_size = struct.unpack('<I', m[0x68:0x6C])[0]
    data_off = struct.unpack('<I', m[0x6C:0x70])[0]
    header_data = {
        'string_ids_size': string_ids_size,
        'string_ids_off': string_ids_off,
        'type_ids_size': type_ids_size,
        'type_ids_off': type_ids_off,
        'proto_ids_size': proto_ids_size,
        'proto_ids_off': proto_ids_off,
        'field_ids_size': field_ids_size,
        'field_ids_off': field_ids_off,
        'method_ids_size': method_ids_size,
        'method_ids_off': method_ids_off,
        'class_defs_size': class_defs_size,
        'class_defs_off': class_defs_off,
        'data_size': data_size,
        'data_off': data_off
    }
    self.header = header_data
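
The class and method counts mentioned in the task list come straight from two of these fields: class_defs_size and method_ids_size. The sketch below reads only the count fields from raw bytes, so it also works on a classes.dex read directly out of the zip. The function names and the synthetic header are assumptions for illustration. Note that method_ids_size counts method references, which includes methods the DEX calls but does not define.

```python
import struct

DEX_HEADER_SIZE = 0x70
# Offsets of the count fields in the DEX header (little-endian uint32 each).
COUNT_OFFSETS = {
    "string_ids_size": 0x38,
    "type_ids_size": 0x40,
    "proto_ids_size": 0x48,
    "field_ids_size": 0x50,
    "method_ids_size": 0x58,
    "class_defs_size": 0x60,
}


def read_dex_counts(dex_bytes):
    """Return the count fields of a DEX header given the file's raw bytes."""
    if len(dex_bytes) < DEX_HEADER_SIZE or dex_bytes[:4] != b"dex\n":
        raise ValueError("not a DEX file")
    return {
        name: struct.unpack_from("<I", dex_bytes, offset)[0]
        for name, offset in COUNT_OFFSETS.items()
    }


def make_fake_header(method_count, class_count):
    """Build a synthetic header for the demo; all other fields stay zero."""
    header = bytearray(DEX_HEADER_SIZE)
    header[0:8] = b"dex\n035\x00"
    struct.pack_into("<I", header, 0x58, method_count)
    struct.pack_into("<I", header, 0x60, class_count)
    return bytes(header)


counts = read_dex_counts(make_fake_header(65536, 1200))
print(counts["method_ids_size"], counts["class_defs_size"])
```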

3.5 Identify Non‑Alpha PNG Images

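import io
from PIL import Image

# Fragment from the per-entry loop of section 3.3
image_size = None
non_alpha = False
try:
    image_bytes = io.BytesIO(zip_file.read(file_name))
    img = Image.open(image_bytes)
    image_size = img.size  # (width, height)
    if img.mode != "RGBA":
        # Flag PNGs of at least 10 KB without an alpha channel; nine-patch images are skipped
        if image_type == ".png" and not filename_without_suffix.endswith(".9") and zip_info.compress_size >= 10*1024:
            non_alpha = True
except OSError:
    # Pillow could not decode the entry; keep the defaults
    pass
file_info.image_size = image_size
file_info.non_alpha = non_alpha
apk_file_list.append(file_info)
continue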
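
One caveat of testing img.mode != "RGBA" is that palette or RGB PNGs can still carry transparency through a tRNS chunk, and Pillow reports those with a mode other than "RGBA". As an alternative to the Pillow check, not the article's method, the sketch below reads the PNG chunks directly with the standard library. The helper names and the synthetic 1x1 images are assumptions for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha(data):
    """Return True if the PNG has an alpha channel or a tRNS chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG")
    pos = 8
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        if chunk_type == b"IHDR":
            # IHDR payload: width(4) height(4) bit_depth(1) color_type(1) ...
            color_type = data[pos + 8 + 9]
            if color_type in (4, 6):  # gray+alpha or RGBA
                return True
        elif chunk_type == b"tRNS":
            return True
        elif chunk_type in (b"IDAT", b"IEND"):
            break  # tRNS, if present, must appear before the image data
        pos += 12 + length  # length + type + payload + CRC
    return False


def _chunk(chunk_type, payload):
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def make_png(color_type, channels, with_trns=False):
    """Build a synthetic 1x1, 8-bit PNG for the demo."""
    body = _chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, color_type, 0, 0, 0))
    if with_trns:
        body += _chunk(b"tRNS", b"\x00" * 6)
    body += _chunk(b"IDAT", zlib.compress(b"\x00" * (1 + channels)))
    return PNG_SIGNATURE + body + _chunk(b"IEND", b"")


print(png_has_alpha(make_png(6, 4)), png_has_alpha(make_png(2, 3)))
```

Only images for which this returns False are safe candidates for conversion to a format without transparency.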

3.6 Duplicate Resources – Duplicate files are detected by comparing MD5 hashes; identical hashes indicate redundant assets that can be deduplicated.
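
A minimal sketch of this check, assuming the hash is taken over each entry's uncompressed contents, groups entry names by digest and keeps the groups with more than one member. The find_duplicate_entries helper and the demo archive are illustrative, not the article's implementation.

```python
import hashlib
import io
import zipfile
from collections import defaultdict


def find_duplicate_entries(apk_file):
    """Return {md5 hex digest: [entry names]} for contents that occur more than once."""
    by_digest = defaultdict(list)
    with zipfile.ZipFile(apk_file, "r") as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            digest = hashlib.md5(zf.read(info)).hexdigest()
            by_digest[digest].append(info.filename)
    return {d: names for d, names in by_digest.items() if len(names) > 1}


# Demo archive: the same bytes shipped under two density folders.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("res/drawable-hdpi/bg.png", b"same-bytes")
    zf.writestr("res/drawable-xhdpi/bg.png", b"same-bytes")
    zf.writestr("res/drawable-hdpi/icon.png", b"other-bytes")
demo.seek(0)

duplicates = find_duplicate_entries(demo)
for digest, names in duplicates.items():
    print(digest, names)
```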

3.7 Unused Resources

Unused resources include files in res/ and assets/ that are not referenced by the compiled R.java, XML layouts, the AndroidManifest, or DEX code. The analysis proceeds in two parts:

3.7.1 Unused res/ Resources – Parse R.txt to obtain all resource IDs, analyze resources.arsc for actual references, scan XML files for value and non‑value references, and examine DEX/SMALI code for direct resource usage. The sets are merged, and any IDs not present in the merged reference set are considered unused.
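
The final merge-and-subtract step can be sketched as plain set arithmetic. The example assumes R.txt lines of the form "int <type> <name> <0xID>" and takes the reference sets as already extracted. The function names and sample IDs are hypothetical, and the parsing of resources.arsc, XML, and SMALI is not shown.

```python
def parse_r_txt(text):
    """Map resource ID -> "type/name" from R.txt content."""
    resources = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep "int <type> <name> <0xID>"; skip int[] arrays and styleable indices
        if len(parts) == 4 and parts[0] == "int" and parts[1] != "styleable":
            _, res_type, name, value = parts
            resources[int(value, 16)] = res_type + "/" + name
    return resources


def find_unused(resources, *reference_sets):
    """Return names of resources whose ID appears in none of the reference sets."""
    referenced = set().union(*reference_sets)
    return sorted(name for res_id, name in resources.items() if res_id not in referenced)


# Hypothetical inputs for the demo.
R_TXT = """int drawable ic_launcher 0x7f020000
int drawable old_banner 0x7f020001
int layout activity_main 0x7f030000
int[] styleable DemoView { 0x7f010000 }
int styleable DemoView_demoAttr 0
"""
xml_refs = {0x7F020000}   # IDs referenced from XML files
code_refs = {0x7F030000}  # IDs referenced from DEX/SMALI code

unused = find_unused(parse_r_txt(R_TXT), xml_refs, code_refs)
print(unused)
```

Resources looked up by name at runtime will not appear in any reference set, so results from this kind of check should be reviewed before deletion.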

3.7.2 Unused assets/ Resources – List all files under assets/, then search SMALI code for string literals that reference those assets; files not referenced are marked as unused.
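
As a rough sketch of this search, the code below collects const-string literals from SMALI text and treats an asset as referenced when its path relative to assets/ appears inside any literal. The regular expression, helper names, and sample inputs are assumptions. Paths assembled at runtime from several strings would be missed and reported as unused.

```python
import re

# Matches: const-string v0, "..."  and  const-string/jumbo v0, "..."
CONST_STRING = re.compile(r'const-string(?:/jumbo)?\s+[vp]\d+,\s*"((?:[^"\\]|\\.)*)"')


def collect_string_literals(smali_sources):
    """Return the set of string literals found in the given SMALI texts."""
    literals = set()
    for source in smali_sources:
        literals.update(CONST_STRING.findall(source))
    return literals


def find_unused_assets(asset_paths, literals):
    """Return asset paths whose relative path appears in no literal."""
    unused = []
    for path in asset_paths:
        relative = path[len("assets/"):] if path.startswith("assets/") else path
        if not any(relative in literal for literal in literals):
            unused.append(path)
    return sorted(unused)


# Hypothetical inputs for the demo.
smali_sources = [
    '.method public load()V\n'
    '    const-string v0, "fonts/demo.ttf"\n'
    '    invoke-virtual {p0, v0}, Lcom/example/Demo;->open(Ljava/lang/String;)V\n'
    '.end method\n'
]
asset_paths = ["assets/fonts/demo.ttf", "assets/legacy/old.json"]

literals = collect_string_literals(smali_sources)
unused_assets = find_unused_assets(asset_paths, literals)
print(unused_assets)
```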

Key code snippets for these steps are provided in the original article (e.g., read_resource_txt_file, read_smali_files, decode_resources, find_asset_file).

Conclusion – By leveraging Python to automate APK resource analysis, developers obtain precise metrics for image size, duplicate detection, DEX method counts, and unused assets, enabling effective APK slimming, reduced distribution costs, and improved user conversion rates.
