
Introduction to PyMuPDF: Features, Installation, and Usage Guide

This article provides a comprehensive overview of PyMuPDF, the Python binding for the lightweight MuPDF library, detailing its core features, installation steps, and practical code examples for opening documents, extracting metadata, rendering pages, manipulating PDFs, and performing advanced operations such as merging, splitting, and saving files.

Python Programming Learning Circle

PyMuPDF Overview

PyMuPDF is the Python binding for the MuPDF library, a lightweight PDF, XPS, and ebook viewer that offers high‑quality anti‑aliased rendering, fast performance, and extensive document format support.

Key Features

Decrypt files and access metadata, links, and bookmarks.

Render pages as raster images (PNG) or vector graphics (SVG).

Search for text strings within pages.

Extract text and images in multiple formats (plain text, HTML, JSON, XML, etc.).

Full support for embedded files, password protection, and PDF‑specific operations such as creating, merging, splitting, and rearranging pages.

Command‑line utilities for encryption, optimization, sub‑document creation, and more.

Installation

Install PyMuPDF from PyPI using pip:

<code>pip install PyMuPDF</code>

At the time of writing, the package provides wheels for Windows, Linux, and macOS, supporting 64‑bit Python 3.6‑3.9 and, more recently, manylinux2014_aarch64 builds for ARM.
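After installing, it can be useful to confirm which version pip actually put in place. The following is a minimal sketch that relies only on the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is illustrative and not part of PyMuPDF.

```python
from importlib import metadata


def installed_version(dist_name="PyMuPDF"):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("PyMuPDF version:", version if version else "not installed")
```

Querying the distribution metadata avoids importing the package itself, so the check also works in an environment where the installation is broken.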

Basic Usage

Import the library, which is exposed under the module name fitz, and print its version information:

<code>import fitz
print(fitz.__doc__)</code>

Open a document (PDF, XPS, EPUB, CBZ, etc.):

<code>doc = fitz.open("example.pdf")</code>

Access document properties such as page count and metadata:

<code>page_count = doc.page_count
metadata = doc.metadata</code>

Working with Pages

Load a page by its 0‑based number, index the document like a sequence, or iterate over all pages:

<code>page = doc.load_page(0)  # first page
# or simply
page = doc[0]

for page in doc:
    # process each page
    pass
</code>

Render a page to a raster image:

<code>pix = page.get_pixmap()
pix.save("page-0.png")</code>

Or render to SVG:

<code>svg = page.get_svg_image()
</code>

Extract text in various formats (plain text by default; HTML, JSON, XML, and others on request):

<code>text = page.get_text("text")
html = page.get_text("html")
json_data = page.get_text("json")
</code>

Search for a specific string on the page and obtain its bounding rectangles:

<code>areas = page.search_for("mupdf")
</code>

PDF‑Specific Operations

Modify PDFs by inserting, deleting, moving, or copying pages:

<code># Delete the last page
doc.delete_page(-1)
# Insert a new blank page at the end
doc.insert_page(-1, width=595, height=842)
</code>

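Merge two PDFs by appending every page of one open document (here doc2) to the end of another (doc1):

<code>doc1.insert_pdf(doc2)  # doc1 now ends with a copy of doc2's pages
</code>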

Split a PDF by extracting selected pages into a new document:

<code>new_doc = fitz.open()
new_doc.insert_pdf(doc, to_page=9)          # first 10 pages
new_doc.insert_pdf(doc, from_page=len(doc)-10)  # last 10 pages
new_doc.save("first-and-last-10.pdf")
</code>

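Save the result as a new file, or write the changes back to the original PDF incrementally. An incremental save must target the file the document was opened from and must leave its encryption unchanged:

<code>doc.save("output.pdf")  # write a new file
# or: append the changes to the original file
doc.save(doc.name, incremental=True, encryption=fitz.PDF_ENCRYPT_KEEP)
</code>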

Closing the Document

When processing is complete, close the document to release file handles:

<code>doc.close()
</code>

The guide also includes notes on optional dependencies (Pillow, fontTools, pymupdf‑fonts) for advanced image handling and font extraction.

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
