
PyMuPDF (Python bindings for MuPDF) – Introduction, Features, Installation and Usage Guide

This article provides a comprehensive overview of PyMuPDF, the Python binding for the lightweight MuPDF library, covering its purpose, supported document formats, key features such as rendering, text extraction and PDF manipulation, installation methods, and detailed code examples for common operations.

Sohu Tech Products

Sep 28, 2022


1. Introduction to PyMuPDF

PyMuPDF is the Python binding for MuPDF, a lightweight PDF, XPS and e‑book viewer library. MuPDF offers high‑quality anti‑aliased rendering and precise text layout. It supports formats such as PDF, XPS, OpenXPS, CBZ, EPUB and FictionBook 2. The Python binding (version 1.18.17 at the time of writing) exposes these MuPDF capabilities to Python code.

2. Core Features

Decrypt files

Access metadata, links and bookmarks

Render pages as raster images (PNG, etc.) or vector SVG

Search text

Extract text and images

Convert documents to PDF, (X)HTML, XML, JSON, plain text and more; for PDFs, create, merge or split pages, insert/delete/rearrange pages, and modify annotations and form fields

Extract or insert images and fonts

Full support for embedded files

Reformat PDFs for duplex printing, color separation, watermarks, etc.

Comprehensive password protection handling

Command‑line utility (python -m fitz …) with encryption, decryption, optimization, sub‑document creation, document concatenation, and more

3. Installation

Install PyMuPDF from PyPI with pip install PyMuPDF. At the time of writing, prebuilt wheels were available for Windows, Linux and macOS (Python 3.6‑3.9, 64‑bit; Windows also had 32‑bit wheels). Optional packages extend the feature set:

Pillow: additional image output formats
fontTools: font subsetting
pymupdf‑fonts: extra fonts for text insertion

4. Basic Usage

Import the library:

import fitz

Check the version:

print(fitz.__doc__)

Open a document from a file or from memory:

doc = fitz.open('example.pdf')  # or, with data holding the PDF as bytes: fitz.open(stream=data, filetype='pdf')

5. Document Methods and Properties

Method/Property          Description
Document.page_count      Number of pages (int)
Document.metadata        Metadata dictionary
Document.get_toc()       Table of contents (list)
Document.load_page()     Load a page by its 0-based number (returns a Page)

6. Page Handling

Iterate over the pages of a document (or load a single page with doc.load_page(n) or doc[n]) and access each page's links, annotations and form fields (widgets):

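for page in doc:  # iterate over all pages of the open Document
    # get_links() returns a list of dicts, one per link on the page
    for link in page.get_links():
        print(page.number, link["kind"], link.get("uri"))
    # annotations and form fields (widgets) are exposed as generators
    for annot in page.annots():
        print("annotation:", annot.type[1])
    for widget in page.widgets():
        print("field:", widget.field_name, widget.field_value)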

Render a page to a raster image:

pix = page.get_pixmap()
pix.save('page-%i.png' % page.number)

Render a page to SVG:

svg = page.get_svg_image()

Extract text in various formats ("text", "blocks", "words", "html", "dict", "json", "rawdict", "rawjson", "xhtml", "xml"):

text = page.get_text('text')

Search for a string on a page:

areas = page.search_for('mupdf')

7. PDF Operations

Modify PDFs (create documents; merge, split, reorder or delete pages) using methods such as Document.delete_page(), Document.copy_page(), Document.move_page(), Document.insert_page() and Document.new_page(). Save changes with Document.save(). Passing incremental=True appends only the changes to the existing file, which is fast, but it requires saving back to the same file the document was opened from.

Combine PDFs:

doc1.insert_pdf(doc2)  # append doc2 to doc1

Extract the first 10 and the last 10 pages of a PDF into a new document:

doc2 = fitz.open()
doc2.insert_pdf(doc1, to_page=9)          # first 10 pages
doc2.insert_pdf(doc1, from_page=len(doc1)-10)  # last 10 pages
doc2.save('first-and-last-10.pdf')

Close a document when finished:

doc.close()


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
