Installation

PyMuPDF can be installed from Python wheels for Windows (32-bit and 64-bit), Linux (64-bit, Intel and ARM) and Mac OSX (64-bit, Intel), for Python 3.6 and later:

python -m pip install --upgrade pip
python -m pip install --upgrade pymupdf

PyMuPDF does not support Python versions prior to 3.6. Some older wheels can be found here. Please note that we generally follow the official Python release schedules: once a Python version drops out of official support, we will eventually stop generating wheels for it.
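After installing, a quick way to confirm that the interpreter meets the version requirement and that PyMuPDF is importable is sketched below. It assumes the package is imported under the name fitz (matching the file names mentioned further down this page) and that the module exposes a VersionBind attribute; installation_report is a hypothetical helper, not part of PyMuPDF.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum Python version stated on this page


def installation_report():
    """Return a small dict describing the interpreter and PyMuPDF availability.

    Hypothetical helper for illustration; not part of PyMuPDF.
    """
    report = {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "python_supported": sys.version_info[:2] >= MIN_PYTHON,
        # find_spec returns None when the module cannot be located;
        # it does not actually import the module.
        "pymupdf_installed": importlib.util.find_spec("fitz") is not None,
        "pymupdf_version": None,
    }
    if report["pymupdf_installed"]:
        try:
            import fitz  # PyMuPDF's import name (assumed here)
            # getattr guards against the attribute being absent in some versions.
            report["pymupdf_version"] = getattr(fitz, "VersionBind", None)
        except Exception:  # a broken or unrelated "fitz" module
            pass
    return report


if __name__ == "__main__":
    print(installation_report())
```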

There are no mandatory external dependencies. However, some optional features are available only if additional components are installed:

  • Pillow is required for Pixmap.pil_save() and Pixmap.pil_tobytes().
  • fontTools is required for Document.subset_fonts().
  • pymupdf-fonts is a collection of additional fonts for use with the text output methods.
  • Tesseract-OCR for optical character recognition in images and document pages. Tesseract is separate software, not a Python package. To enable OCR functions in PyMuPDF, the system environment variable "TESSDATA_PREFIX" must be defined and must contain the path of the tessdata folder in the Tesseract installation.
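To see which of these optional components are present, the following sketch may help. It assumes the import names PIL (for Pillow), fontTools and pymupdf_fonts (for pymupdf-fonts); optional_components is a hypothetical helper. For Tesseract it can only check that "TESSDATA_PREFIX" names an existing directory, not that Tesseract itself works.

```python
import importlib.util
import os

# Assumed import names of the optional Python packages listed above.
OPTIONAL_PACKAGES = {
    "Pillow": "PIL",
    "fontTools": "fontTools",
    "pymupdf-fonts": "pymupdf_fonts",
}


def optional_components(environ=None):
    """Report which optional components appear to be available.

    Hypothetical helper; 'environ' defaults to the current process environment.
    """
    environ = os.environ if environ is None else environ
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in OPTIONAL_PACKAGES.items()
    }
    # Tesseract is not a Python package: from Python we can only check
    # whether TESSDATA_PREFIX is set and names an existing directory.
    tessdata = environ.get("TESSDATA_PREFIX")
    status["Tesseract-OCR"] = bool(tessdata) and os.path.isdir(tessdata)
    return status


if __name__ == "__main__":
    for component, available in optional_components().items():
        print("%-15s %s" % (component, "yes" if available else "no"))
```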

Note

You can install these additional components at any time, before or after installing PyMuPDF. PyMuPDF detects their presence during import or when the respective functions are used.

To install from sources, follow these steps:

Step 1: Install MuPDF

For the open-source GNU AGPL license, download MuPDF from here.

If you are a commercial customer, please contact Artifex.

Install MuPDF following the instructions for your platform.

Step 2: Download and Generate PyMuPDF

Download the sources from https://pypi.org/project/PyMuPDF/#files and decompress them.

Adjust the setup.py script if necessary. In particular, make sure that include_dirs and library_dirs point to the folders of your MuPDF installation. The easiest way to do this is to set the environment variable "PYMUPDF_DIRS" to the name of a JSON file that contains a dictionary with these two keys, each having a list of folder names as its value:

{
  "include_dirs": ["folder1", "folder2", "folder3", ...],
  "library_dirs": ["folder1", "folder2", "folder3", ...]
}

Then run python setup.py install.
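The configuration file and the build command can also be prepared from Python, as in the sketch below. It assumes that setup.py reads the JSON file named by "PYMUPDF_DIRS" as described above; write_pymupdf_dirs and install_command are hypothetical helpers, and the folder names in the usage comment are placeholders for your own MuPDF installation.

```python
import json
import os
import sys


def write_pymupdf_dirs(path, include_dirs, library_dirs):
    """Write the JSON file that PYMUPDF_DIRS should point to (hypothetical helper)."""
    config = {
        "include_dirs": list(include_dirs),
        "library_dirs": list(library_dirs),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)  # json.dump never emits trailing commas
    return config


def install_command(config_path):
    """Return the setup.py command and an environment with PYMUPDF_DIRS set."""
    env = dict(os.environ, PYMUPDF_DIRS=os.path.abspath(config_path))
    # "setup.py" is relative: run the command from the PyMuPDF source folder.
    cmd = [sys.executable, "setup.py", "install"]
    return cmd, env


# Example usage (placeholder folder names):
#   write_pymupdf_dirs("pymupdf_dirs.json",
#                      ["/opt/mupdf/include"], ["/opt/mupdf/lib"])
#   cmd, env = install_command("pymupdf_dirs.json")
#   subprocess.run(cmd, env=env, check=True)
```

Passing the variable through the env argument keeps it scoped to the build process instead of changing the user's shell environment.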

Note

You can also install from the sources in the GitHub repository. These do not contain the pre-generated files fitz.py and fitz_wrap.c; instead, these files are generated by the installation script setup.py. This requires SWIG to be installed on your system.
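Before building from the GitHub sources, it can be worth checking that SWIG is reachable. The sketch below assumes the executable is named swig and is on PATH (some Linux distributions install it under a versioned name instead); swig_version is a hypothetical helper.

```python
import shutil
import subprocess


def swig_version():
    """Return the first non-empty line of 'swig -version', or None if SWIG is not on PATH.

    Hypothetical helper; assumes the executable is called 'swig'.
    """
    exe = shutil.which("swig")  # also honors PATHEXT on Windows
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # text mode; works on Python 3.6
    )
    # The version banner may be preceded by a blank line.
    for line in result.stdout.splitlines():
        if line.strip():
            return line.strip()
    return ""


if __name__ == "__main__":
    print(swig_version() or "SWIG not found on PATH")
```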

Enabling Integrated OCR Support

If you do not intend to use this feature, this step can be skipped. Otherwise, it is required for both installation paths: from wheels and from sources.

PyMuPDF contains all the logic needed to support OCR functions. However, Tesseract is not a Python package but separate software that must be installed on the system.

To use it, (Py-)MuPDF needs to be told the location of Tesseract’s language support folder. Currently, this is done by storing that folder's path in the environment variable "TESSDATA_PREFIX".

On Windows, a typical way to define this variable is:

set TESSDATA_PREFIX=C:\Program Files\Tesseract-OCR\tessdata

On Unix systems one might execute:

export TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata
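After setting the variable, a quick sanity check of the folder it names could look like the sketch below, run from the same shell session. It assumes Tesseract's language data files follow the usual <lang>.traineddata naming (for example eng.traineddata); tessdata_languages is a hypothetical helper and does not verify that Tesseract itself is installed correctly.

```python
import os


def tessdata_languages(environ=None):
    """List language codes found in the folder named by TESSDATA_PREFIX.

    Hypothetical helper. Raises RuntimeError if the variable is unset or
    does not name a directory.
    """
    environ = os.environ if environ is None else environ
    folder = environ.get("TESSDATA_PREFIX")
    if not folder:
        raise RuntimeError("TESSDATA_PREFIX is not set")
    if not os.path.isdir(folder):
        raise RuntimeError("TESSDATA_PREFIX does not name a directory: %r" % folder)
    # Assumed naming convention: <lang>.traineddata, e.g. eng.traineddata.
    suffix = ".traineddata"
    return sorted(
        name[: -len(suffix)] for name in os.listdir(folder) if name.endswith(suffix)
    )


if __name__ == "__main__":
    print(tessdata_languages())
```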

Caution

This environment variable must be set outside Python, before your script is started. Manipulating os.environ will not work!
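If you need to choose the folder from Python anyway, one workaround consistent with this rule is to start the OCR script as a new process that already has the variable in its environment. The sketch below does that; run_with_tessdata is a hypothetical helper, and the script name in the usage comment is a placeholder.

```python
import os
import subprocess
import sys


def run_with_tessdata(script_args, tessdata_dir, capture=False):
    """Start a new Python process with TESSDATA_PREFIX set before it starts.

    Hypothetical helper. The child inherits 'env' from its very first
    instruction, unlike a change to os.environ made inside a running script.
    """
    env = dict(os.environ, TESSDATA_PREFIX=tessdata_dir)
    return subprocess.run(
        [sys.executable] + list(script_args),
        env=env,
        stdout=subprocess.PIPE if capture else None,  # capture only on request
        universal_newlines=True,
        check=True,  # raise CalledProcessError if the child fails
    )


# Example usage (placeholder script name):
#   run_with_tessdata(["my_ocr_script.py"],
#                     r"C:\Program Files\Tesseract-OCR\tessdata")
```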