Introduction to PyMuPDF: Features, Installation, and Usage Guide

This article provides a comprehensive overview of PyMuPDF, the Python binding for the lightweight MuPDF library, detailing its core features, installation steps, and practical code examples for opening documents, extracting metadata, rendering pages, manipulating PDFs, and performing advanced operations such as merging, splitting, and saving files.

Python Programming Learning Circle

Aug 29, 2022

PyMuPDF Overview

PyMuPDF is the Python binding for the MuPDF library, a lightweight PDF, XPS, and ebook viewer that offers high‑quality anti‑aliased rendering, fast performance, and extensive document format support.

Key Features

Decrypt files and access metadata, links, and bookmarks.

Render pages as raster images (PNG) or vector graphics (SVG).

Search for text strings within pages.

Extract text and images in multiple formats (plain text, HTML, JSON, XML, etc.).

Full support for embedded files, password protection, and PDF‑specific operations such as creating, merging, splitting, and rearranging pages.

Command‑line utilities for encryption, optimization, sub‑document creation, and more.

Installation

Install PyMuPDF from PyPI using pip:

pip install PyMuPDF

The package provides wheels for Windows, Linux, and macOS, supporting 64-bit Python 3.6 to 3.9 and, more recently, manylinux2014_aarch64 builds for ARM.

Basic Usage

Import the library and view its version information:

import fitz
print(fitz.__doc__)

Open a document (PDF, XPS, EPUB, CBZ, etc.):

doc = fitz.open("example.pdf")

Access document properties such as page count and metadata:

page_count = doc.page_count
metadata = doc.metadata
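
As a rough sketch of how these two properties might be used together, the helper below builds a one-line summary from a page count and a metadata dictionary. The names describe_document and describe_file are illustrative and not part of PyMuPDF; the sketch assumes the metadata dictionary uses the "title" and "author" keys, which can be empty or missing for many files.

```python
def describe_document(page_count, metadata):
    """Return a one-line summary from a page count and a metadata dict."""
    # Metadata fields are often None or empty strings, so keep only
    # the entries that actually carry a value.
    filled = {key: value for key, value in (metadata or {}).items() if value}
    title = filled.get("title", "(untitled)")
    author = filled.get("author", "(unknown author)")
    return f"{title} by {author}, {page_count} page(s)"


def describe_file(path):
    """Open a document with PyMuPDF and summarize it."""
    import fitz  # PyMuPDF; imported here so describe_document has no dependency

    # The document is used as a context manager so it is closed automatically.
    with fitz.open(path) as doc:
        return describe_document(doc.page_count, doc.metadata)
```

Calling describe_file("example.pdf") would then return the summary for the file opened above.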

Working with Pages

Load a page by its 0-based number, index the document directly, or iterate over all pages:

page = doc.load_page(0)  # first page
# or simply
page = doc[0]

for page in doc:
    # process each page
    pass

Render a page to a raster image:

pix = page.get_pixmap()
pix.save("page-0.png")
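
Called without arguments, get_pixmap() renders at 72 dpi, which is often too coarse for print or OCR. One possible way to render at a higher resolution is to pass a scaling matrix, as sketched below. The helper names zoom_for_dpi and render_page and the default output file name are assumptions made for this example.

```python
def zoom_for_dpi(dpi):
    """Return the scale factor for a target resolution.

    PDF user space uses 72 units per inch, so the factor is dpi / 72.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / 72.0


def render_page(path, page_number=0, dpi=144, out_path="page.png"):
    """Render one page to a PNG at the requested resolution."""
    import fitz  # PyMuPDF; imported here so zoom_for_dpi has no dependency

    zoom = zoom_for_dpi(dpi)
    with fitz.open(path) as doc:
        page = doc[page_number]
        # The same factor is applied horizontally and vertically.
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        pix.save(out_path)
        return pix.width, pix.height
```

With dpi=144 the factor is 2, so the image has twice the width and height of the default rendering.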

Or render to SVG:

svg = page.get_svg_image()

Extract text in various formats (default plain text, HTML, JSON, XML, etc.):

text = page.get_text("text")
html = page.get_text("html")
json_data = page.get_text("json")
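
Besides the formats shown above, get_text() also accepts "words", which returns one tuple per word in the form (x0, y0, x1, y1, word, block_no, line_no, word_no). The sketch below regroups such tuples into text lines. The function name words_to_lines is illustrative, and the grouping relies only on the block, line, and word numbers that PyMuPDF reports.

```python
from collections import defaultdict


def words_to_lines(words):
    """Group PyMuPDF word tuples into lines of text, in reading order."""
    lines = defaultdict(list)
    for x0, y0, x1, y1, text, block_no, line_no, word_no in words:
        # Words that share a block and line number belong to the same line.
        lines[(block_no, line_no)].append((word_no, text))
    return [
        " ".join(text for _, text in sorted(items))
        for _, items in sorted(lines.items())
    ]
```

A typical call would be words_to_lines(page.get_text("words")).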

Search for a specific string on the page and obtain its bounding rectangles:

areas = page.search_for("mupdf")
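
The returned rectangles can be unpacked like 4-tuples (x0, y0, x1, y1). As one hypothetical use, the sketch below computes the smallest box that encloses all hits and renders only that region through the clip argument of get_pixmap(). The names union_box and render_hits are made up for this example.

```python
def union_box(rects):
    """Return the smallest (x0, y0, x1, y1) box enclosing all rectangles."""
    boxes = [tuple(rect) for rect in rects]
    if not boxes:
        return None
    return (
        min(box[0] for box in boxes),
        min(box[1] for box in boxes),
        max(box[2] for box in boxes),
        max(box[3] for box in boxes),
    )


def render_hits(page, needle, out_path="hits.png"):
    """Save an image of the page region that contains every match."""
    box = union_box(page.search_for(needle))
    if box is None:
        return False  # nothing found, nothing written
    page.get_pixmap(clip=box).save(out_path)
    return True
```

render_hits(page, "mupdf") would write hits.png only if the page contains at least one match.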

PDF‑Specific Operations

Modify PDFs by inserting, deleting, moving, or copying pages:

# Delete the last page
doc.delete_page(-1)
# Insert a new blank page at the end
doc.insert_page(-1, width=595, height=842)
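
Moving, copying, and rearranging pages are not covered by the snippet above. A minimal sketch, assuming an already opened PDF document, could use move_page(), copy_page(), and select(). The wrapper function names below are illustrative only.

```python
def reversed_order(page_count):
    """Return page numbers from last to first, e.g. [2, 1, 0] for 3 pages."""
    return list(range(page_count - 1, -1, -1))


def reverse_pages(doc):
    """Reorder an open PDF in place so that the last page comes first."""
    # select() keeps exactly the listed pages, in the listed order.
    doc.select(reversed_order(doc.page_count))


def move_first_page_to_end(doc):
    """Move page 0 behind the current last page."""
    doc.move_page(0)  # the default target (to=-1) is the end of the document


def duplicate_page_at_end(doc, page_number):
    """Append a copy of the given page to the end of the document."""
    doc.copy_page(page_number)  # the default target (to=-1) is the end
```

These calls change the document only in memory; the result still has to be saved.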

Merge two PDFs by appending all pages of doc2 to doc1 (both already opened with fitz.open):

doc1.insert_pdf(doc2)

Split a PDF by extracting selected pages into a new document:

new_doc = fitz.open()
new_doc.insert_pdf(doc, to_page=9)          # first 10 pages
new_doc.insert_pdf(doc, from_page=len(doc)-10)  # last 10 pages
new_doc.save("first-and-last-10.pdf")
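
The same insert_pdf() call can also split a PDF into consecutive parts of a fixed size. The sketch below is one way to do that. The names page_ranges and split_pdf and the output naming scheme (part-1.pdf, part-2.pdf, and so on) are assumptions for this example.

```python
def page_ranges(total_pages, chunk_size):
    """Return inclusive (from_page, to_page) pairs covering all pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages) - 1)
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(path, chunk_size, prefix="part"):
    """Write consecutive chunks of a PDF and return the file names."""
    import fitz  # PyMuPDF; imported here so page_ranges has no dependency

    written = []
    with fitz.open(path) as src:
        ranges = page_ranges(src.page_count, chunk_size)
        for index, (first, last) in enumerate(ranges, start=1):
            part = fitz.open()  # new, empty PDF
            part.insert_pdf(src, from_page=first, to_page=last)
            name = f"{prefix}-{index}.pdf"
            part.save(name)
            part.close()
            written.append(name)
    return written
```

For a 25-page file and chunk_size=10, this would produce three files holding 10, 10, and 5 pages.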

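Save changes either as a new file or as an incremental update to the original PDF. An incremental save must target the file the document was opened from and keep its encryption unchanged:

doc.save("output.pdf")  # write a new file
doc.save(doc.name, incremental=True, encryption=fitz.PDF_ENCRYPT_KEEP)  # or append the changes to the original file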

Closing the Document

When processing is complete, close the document to release file handles:

doc.close()

The guide also includes notes on optional dependencies (Pillow, fontTools, pymupdf‑fonts) for advanced image handling and font extraction.
