
Optimizing Python Dependency Management: Refactoring pipdeptree for TencentOS

This article details the background, research, demos, discovered issues, and the comprehensive refactor of the pipdeptree tool to replace deprecated pkg_resources APIs with importlib.metadata, improving Python package dependency analysis for AI workloads in TencentOS.

Tencent Architect

Background

As AI ecosystems grow, Python has become their foundational language, and the number of packages and libraries has exploded. Solving Python's dependency‑hell problem brings two main benefits: trimming unnecessary packages reduces AI/Python image size, and removing redundant dependencies keeps development environments cleaner and dependency relationships clearer.

Research

The pipdeptree tool meets our requirements and is already integrated in TencentOS Server 4, installable via dnf install python3-pipdeptree. We also use it when building Python runtime and PyTorch images to produce smaller, lighter images.

pipdeptree is simple to use; the two most common commands are:

<code>pipdeptree -p ABC</code>
<code>pipdeptree -j</code>

Internally, pipdeptree relies on the official pkg_resources library. The core parts are:

Environment.from_paths(None).iter_installed_distributions – retrieves all installed distributions in the current environment.

DistInfoDistribution.requires() – obtains the dependencies of a distribution.

Example of the generated dependency tree (truncated):

<code>{
    "package": {
        "key": "adal",
        "package_name": "adal",
        "installed_version": "1.2.7"
    },
    "dependencies": [
        {
            "key": "cryptography",
            "package_name": "cryptography",
            "installed_version": "41.0.4",
            "required_version": ">=1.1.0"
        },
        {
            "key": "pyjwt",
            "package_name": "PyJWT",
            "installed_version": "2.6.0",
            "required_version": ">=1.0.0,<3"
        }
    ]
}
</code>
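The JSON structure above lends itself to simple post-processing. As a minimal sketch, assuming the output of pipdeptree -j has been parsed into a list of entries shaped like the example, the following finds packages that no other installed package depends on; these are natural starting points when looking for packages to prune. The function name top_level_packages and the sample data are illustrative, not part of pipdeptree.

```python
def top_level_packages(tree):
    """Return keys of packages that no other entry lists as a dependency.

    `tree` is assumed to be the parsed list produced by `pipdeptree -j`,
    where each entry has a "package" dict and a "dependencies" list.
    """
    all_keys = {entry["package"]["key"] for entry in tree}
    required = {
        dep["key"]
        for entry in tree
        for dep in entry["dependencies"]
    }
    return sorted(all_keys - required)


# Illustrative sample in the same shape as the example above.
sample = [
    {
        "package": {"key": "adal", "package_name": "adal", "installed_version": "1.2.7"},
        "dependencies": [
            {"key": "cryptography", "package_name": "cryptography",
             "installed_version": "41.0.4", "required_version": ">=1.1.0"},
            {"key": "pyjwt", "package_name": "PyJWT",
             "installed_version": "2.6.0", "required_version": ">=1.0.0,<3"},
        ],
    },
    {
        "package": {"key": "cryptography", "package_name": "cryptography",
                    "installed_version": "41.0.4"},
        "dependencies": [],
    },
    {
        "package": {"key": "pyjwt", "package_name": "PyJWT", "installed_version": "2.6.0"},
        "dependencies": [],
    },
]

print(top_level_packages(sample))
```

In this sample only adal is reported, because cryptography and pyjwt are both required by it. A top-level package is not necessarily removable; it only means nothing else in the environment declares a dependency on it.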

Important notes:

The generated tree does not include tox.ini test dependencies, which are irrelevant for runtime RPM builds and can be ignored.

The pkg_resources API is deprecated; we should replace it with importlib.metadata and related APIs.

Our own RPM packaging also uses BuildRequires and Requires, which can introduce redundant dependencies when the upstream package does not accurately list its needs.

Proposed Optimizations

Use pipdeptree to prune unnecessary Python packages from the environment.

Parse Python package source code to verify whether declared dependencies are actually used.

Demo – Step 1

<code>import json
import subprocess
import re

FILTERED_DEPENDENCIES = ['python3']

def extract_package_name(dep):
    match = re.match(r'.*python3(?:\.\d+)?dist\(([^)]+)\).*', dep)
    if match:
        return match.group(1)
    return dep.split(' ')[0]

def get_package_dependencies(package_name):
    package_name = get_rpmname(package_name)
    print(package_name)
    try:
        command = f'rpm-dep -i {package_name} -q'
        # check=True makes a failing rpm-dep raise CalledProcessError, handled below
        subprocess.run(command.split(), check=True)
        # f-string so that {package_name} is substituted into the file name
        parse_cmd = f"jq -r '.next[] | .pkg_name' dep_tree__{package_name}__install.json | sort | uniq"
        output = subprocess.getoutput(parse_cmd)
        dependencies = output.strip().split('\n')
        return dependencies
    except subprocess.CalledProcessError:
        return []

def get_rpmname(py_name):
    if not py_name.startswith('python-'):
        command = f"dnf repoquery --whatprovides 'python3dist({py_name})' --latest-limit 1 --queryformat '%{{NAME}}' -q"
        output = subprocess.getoutput(command)
        if output == "":
            command = f"dnf repoquery --whatprovides 'python3dist({py_name.lower()})' --latest-limit 1 --queryformat '%{{NAME}}' -q"
            output = subprocess.getoutput(command)
            if output == "":
                py_name = f"python3-{py_name}"
                # py_name already carries the python3- prefix at this point
                info_command = f'dnf info {py_name}'
                info_result = subprocess.run(info_command.split(), stderr=subprocess.DEVNULL, stdout=subprocess.PIPE, text=True)
                if info_result.returncode != 0:
                    py_name = "ERROR"
        else:
            py_name = output
    else:
        py_name = py_name[7:]
        info_command = f'dnf info python3-{py_name}'
        info_result = subprocess.run(info_command.split(), stderr=subprocess.DEVNULL, stdout=subprocess.PIPE, text=True)
        if info_result.returncode != 0:
            py_name = "ERROR"
        else:
            py_name = f"python3-{py_name}"
    return py_name

def check_dependencies(package_data):
    package_name = package_data['package']['key']
    local_dependencies = [get_rpmname(dep['key']) for dep in package_data['dependencies']]
    repo_dependencies = get_package_dependencies(package_name)
    missing_dependencies = list(set(repo_dependencies) - set(local_dependencies))
    extra_dependencies = list(set(local_dependencies) - set(repo_dependencies))
    missing_dependencies = [dep for dep in missing_dependencies if dep not in FILTERED_DEPENDENCIES]
    extra_dependencies = [dep for dep in extra_dependencies if dep not in FILTERED_DEPENDENCIES]
    print(missing_dependencies)
    print(extra_dependencies)
    return {
        'package_name': get_rpmname(package_name),
        'missing_dependencies': missing_dependencies,
        'extra_dependencies': extra_dependencies
    }

def main():
    with open('packages.json', 'r') as file:
        packages_data = json.load(file)
    result = []
    for package_data in packages_data:
        package_result = check_dependencies(package_data)
        result.append(package_result)
    with open('result.json', 'w') as file:
        json.dump(result, file, indent=2)

if __name__ == '__main__':
    main()
</code>

Analysis of the results shows two main inaccuracies:

Test dependencies from tox.ini are missing, causing RPM dependencies to appear larger than those found by pipdeptree.

Some packages do not follow standard Python packaging conventions, leading to mismatched names (e.g., pycryptodome vs. Crypto).

Demo – Step 2

<code>import ast
import importlib.metadata
import importlib.resources
import json
import os
import sys
import re

builtin_modules = set(sys.builtin_module_names)

def get_standard_library_modules():
    lib_path = os.path.dirname(os.__file__)
    modules = []
    def add_module(root, file):
        module_path = os.path.relpath(os.path.join(root, file), lib_path)
        module_name = os.path.splitext(module_path.replace(os.path.sep, '.'))[0]
        if module_name.endswith('.__init__'):
            module_name = module_name[:-9]
        modules.append(module_name)
    for root, dirs, files in os.walk(lib_path):
        if 'site-packages' in dirs:
            dirs.remove('site-packages')
        if root == lib_path:
            for file in files:
                if file.endswith('.py'):
                    add_module(root, file)
        if '__init__.py' in files:
            add_module(root, '__init__.py')
    return modules

builtin_modules.update(get_standard_library_modules())

def parse_imports(file_path):
    with open(file_path, 'r') as file:
        content = file.read()
    content = re.sub(r'#.*', '', content)
    content = re.sub(r'""".*?"""', '', content, flags=re.DOTALL)
    import_re = re.compile(r'(?:from\s+([.\w]+)(?:\s+import\s+[\w, ()]+)|import\s+([\w, ()]+))')
    matches = import_re.findall(content)
    imports = []
    for match in matches:
        module_names = match[0] if match[0] else match[1]
        if not module_names.startswith('.'):
            module_names = module_names.split(',')
            for module_name in module_names:
                module_name = module_name.strip().split(' as ')[0].split('.')[0]
                if module_name not in builtin_modules and not module_name.startswith('_'):
                    imports.append(module_name)
    return imports

def get_package_imports():
    package_imports = {}
    dists = importlib.metadata.distributions()
    for dist in dists:
        package_name = dist.metadata['Name']
        try:
            package_dir = importlib.resources.files(package_name)
            if package_dir is not None:
                package_imports[package_name] = {}
                for root, dirs, files in os.walk(str(package_dir)):
                    for file in files:
                        if file.endswith('.py'):
                            file_path = os.path.join(root, file)
                            imports = parse_imports(file_path)
                            imports = list(set(imports))
                            if package_name in imports:
                                imports.remove(package_name)
                            package_imports[package_name][file_path] = imports
        except Exception:
            pass
    return package_imports

package_imports = get_package_imports()
json_data = json.dumps(package_imports, indent=4)
print(json_data)
with open('packages.json', 'r') as file:
    package_data = json.load(file)
for package in package_data:
    package_name = package['package']['package_name']
    if package_name in package_imports:
        dependencies = {dep['package_name'] for dep in package['dependencies']}
        for file_path, imports in package_imports[package_name].items():
            for import_name in imports:
                if import_name not in dependencies:
                    print(f'In package {package_name}, file {file_path} imports {import_name} which is not in dependencies.')
                else:
                    print(f'In package {package_name}, file {file_path} imports {import_name} is found in pipdeptree.')
</code>

Further analysis reveals additional problems:

The import parsing used here can mistake relative imports (e.g., from .ABC import DEF) for absolute ones, causing false positives.

Package names may not match module names (e.g., pycryptodome provides Crypto).

Optional dependencies and test‑only dependencies appear as missing but are harmless.

Upstream packages sometimes omit required dependencies (e.g., urllib3 is missing brotli, google, etc.).
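One possible way to reduce the relative-import false positives is to walk the tree produced by the standard ast module and check the level attribute of ast.ImportFrom nodes, which is 0 for absolute imports and positive for relative ones. The following is a minimal sketch of that idea, not the code used in the demo; the function name absolute_imports and the sample source are illustrative.

```python
import ast


def absolute_imports(source):
    """Return the top-level module names imported absolutely in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import such as `from .ABC import DEF`;
            # node.module is None for `from . import name`.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return sorted(names)


sample = '''
import os.path
from .ABC import DEF
from . import sibling
from Crypto.Cipher import AES
"""import fake_in_docstring"""
'''

print(absolute_imports(sample))
```

Because the source is parsed rather than pattern-matched, import statements that appear only inside strings or comments are ignored as well. Filtering out standard-library modules, as the demo does, would still be a separate step.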

Advanced Refactor – Replacing Deprecated APIs

The pkg_resources API is deprecated. We replace it with importlib.metadata and packaging while preserving functionality:

Replace DistInfoDistribution with importlib.metadata.Distribution.

Replace Requirement with packaging.requirements.Requirement.

Implement local_only and user_only logic using sys.prefix, sys.base_prefix, site.getsitepackages(), and site.getusersitepackages().

Handle direct_url.json for editable installs and retrieve the source location.

Adapt version specifiers using packaging.specifiers.SpecifierSet.

Key code snippets for the new implementation:

<code>import site
import sys
from importlib.metadata import distributions

def iter_distributions(local_only=False, user_only=False):
    if local_only and sys.prefix != sys.base_prefix:
        paths = site.getsitepackages([sys.prefix])
        return list(distributions(path=paths))
    if user_only:
        return list(distributions(path=[site.getusersitepackages()]))
    return list(distributions())
</code>
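Once distributions are discovered, their declared dependencies can be read from the requires attribute of each importlib.metadata distribution. The sketch below shows one way that lookup might look using only the standard library. The regex-based name extraction is a deliberate simplification that ignores extras, version specifiers, and environment markers; the refactor itself names packaging.requirements.Requirement for proper parsing. The helper names canonical, requirement_name, and dependency_map are illustrative.

```python
import re
from importlib.metadata import distributions

_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def canonical(name):
    """Normalize a project name as in PEP 503: lowercase, runs of -_. become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Crudely extract the project name from a requirement string.

    Simplification: version specifiers, extras, and markers are dropped.
    """
    match = _NAME_RE.match(requirement)
    return canonical(match.group(1)) if match else None


def dependency_map(dists=None):
    """Map each distribution's canonical name to its declared dependency names."""
    result = {}
    for dist in (distributions() if dists is None else dists):
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with broken metadata
        # dist.requires is None when no requirements are declared
        deps = {requirement_name(req) for req in (dist.requires or [])}
        deps.discard(None)
        result[canonical(name)] = sorted(deps)
    return result


print(requirement_name("PyJWT>=1.0.0,<3"))
print(requirement_name('pytest ; extra == "test"'))
```

Because markers are ignored here, dependencies that apply only to an extra or to another platform are listed alongside unconditional ones, which is one reason a real implementation would rely on packaging instead.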

We also provide compatibility shims for attributes like key, project_name, and editable that were present in the old API.

Testing Adjustments

Tests were updated to include the new packaging dependency, mock editable installs using MagicMock, and simulate virtual environments by monkey‑patching sys.prefix and command‑line arguments.

<code>import subprocess
import sys

import virtualenv
from pipdeptree.__main__ import main

def test_local_only(tmp_path, monkeypatch, capfd):
    prefix = str(tmp_path / 'venv')
    result = virtualenv.cli_run([prefix, '--activators', ''])
    pip_path = str(result.creator.exe.parent / 'pip')
    subprocess.run([pip_path, 'install', 'wrapt', '--prefix', prefix], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    cmd = [str(result.creator.exe.parent / 'python3'), '--local-only']
    # sys.prefix is a string, so patch it with the venv path itself
    monkeypatch.setattr(sys, 'prefix', prefix)
    monkeypatch.setattr(sys, 'argv', cmd)
    main()
    out, _ = capfd.readouterr()
    found = {i.split('==')[0] for i in out.splitlines()}
    expected = {'wrapt', 'pip', 'setuptools', 'wheel'}
    assert found == expected
</code>

Conclusion

After completing the core refactor, the updated pipdeptree now uses modern, non‑deprecated APIs, correctly handles virtual environments, optional dependencies, and editable installs, and passes the extended test suite. The work was reviewed and approved by the upstream maintainers, leading to an invitation to become a maintainer of the project.
