All the examples below assume that you are running inside a Python virtual
environment. See: https://docs.python.org/3/library/venv.html for details.
We also assume that pip is up to date.
For example, on Windows:
py -m venv pymupdf-venv
.\pymupdf-venv\Scripts\activate
python -m pip install --upgrade pip
On Linux and macOS:
python -m venv pymupdf-venv
. pymupdf-venv/bin/activate
python -m pip install --upgrade pip
PyMuPDF should be installed using pip with:
pip install --upgrade pymupdf
This will install from a Python wheel if one is available for your platform.
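To confirm that the install succeeded in the active environment, one option is to ask the standard library for the installed version. This is a minimal sketch: the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the name used in the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "PyMuPDF") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("PyMuPDF is not installed in this environment")
    else:
        print(f"PyMuPDF {version} is installed")
```

Because it only reads package metadata, this check does not import the package itself.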
Installation when a suitable wheel is not available
If a suitable Python wheel is not available, pip will automatically build from source using a Python sdist.
This requires C/C++ development tools to be installed:
On Windows, install Visual Studio 2019. If it is not installed in a standard location, set the environment variable PYMUPDF_SETUP_DEVENV to the location of the
Having other installed versions of Visual Studio, for example Visual Studio 2022, can cause problems because one can end up with MuPDF and PyMuPDF code being compiled with different compiler versions.
As of PyMuPDF-1.20.0, the required MuPDF source code is already in the
sdist and is automatically built into PyMuPDF.
Wheels are available for Windows (32-bit Intel, 64-bit Intel), Linux (64-bit Intel, 64-bit ARM) and macOS (64-bit Intel, 64-bit ARM), Python versions 3.8 and up.
PyMuPDF does not support Python versions prior to 3.8. Older wheels can be found in this repository and on PyPI. Please note that we generally follow the official Python release schedules. For Python versions dropping out of official support, this means that wheels will no longer be generated for them either.
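Before installing, the running interpreter can be compared against these requirements. The following is a minimal sketch with hypothetical helper names; the minimum version is taken from the statement above and would need adjusting if the project changes it.

```python
import struct
import sys

MIN_PYTHON = (3, 8)  # taken from the text above, not queried from PyMuPDF


def python_is_supported(version_info=sys.version_info) -> bool:
    """True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def pointer_bits() -> int:
    """Return 32 or 64 depending on the running interpreter's word size."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("supported Python:", python_is_supported())
    print("interpreter is", pointer_bits(), "bit")
```

The bitness check matters mainly on Windows, where both 32-bit and 64-bit wheels are listed.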
There are no mandatory external dependencies. However, some optional features are available only if additional components are installed:
pymupdf-fonts is a collection of nice fonts to be used for text output methods.
Tesseract-OCR for optical character recognition in images and document pages. Tesseract is separate software, not a Python package. To enable OCR functions in PyMuPDF, the software must be installed and the system environment variable "TESSDATA_PREFIX" must be defined and contain the tessdata folder name of the Tesseract installation location. See below.
You can install these additional components at any time – before or after installing PyMuPDF. PyMuPDF will detect their presence during import or when the respective functions are being used.
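A script can also check for these components itself before relying on them. This is a rough sketch, not PyMuPDF's own detection logic: the function name is hypothetical, it assumes pymupdf-fonts is importable as `pymupdf_fonts`, and it treats OCR as configured when TESSDATA_PREFIX names an existing folder.

```python
import importlib.util
import os


def optional_components(environ=os.environ) -> dict:
    """Report which optional components look available.

    Assumptions: pymupdf-fonts is importable as "pymupdf_fonts", and OCR
    counts as configured when TESSDATA_PREFIX names an existing folder.
    """
    fonts = importlib.util.find_spec("pymupdf_fonts") is not None
    tessdata = environ.get("TESSDATA_PREFIX")
    ocr = bool(tessdata) and os.path.isdir(tessdata)
    return {"pymupdf-fonts": fonts, "tesseract-data": ocr}


if __name__ == "__main__":
    for name, present in optional_components().items():
        print(f"{name}: {'found' if present else 'not found'}")
```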
Build and install from local PyMuPDF checkout and optional local MuPDF checkout
Install C/C++ development tools as described above.
Enter a Python venv and update pip, as described above.
Get a PyMuPDF source tree:
Clone the PyMuPDF git repository:
git clone https://github.com/pymupdf/PyMuPDF.git
Or download and extract a .tar.gz source release from https://github.com/pymupdf/PyMuPDF/releases.
Build and install PyMuPDF:
cd PyMuPDF && pip install .
This will automatically download a specific hard-coded MuPDF source release, and build it into PyMuPDF.
Build and install PyMuPDF using a local MuPDF source tree:
Clone the MuPDF git repository:
git clone --recursive https://ghostscript.com:/home/git/mupdf.git
Build PyMuPDF, specifying the location of the local MuPDF tree with the environment variable PYMUPDF_SETUP_MUPDF_BUILD:
cd PyMuPDF && PYMUPDF_SETUP_MUPDF_BUILD=../mupdf pip install .
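The inline `VAR=value command` form shown above is POSIX shell syntax and is not available in the Windows command prompt. A platform-neutral way to run the same build is to set the variable from Python. This is a sketch under assumptions: the helper names are hypothetical, and the directory names are the ones used in the commands above.

```python
import os
import subprocess
import sys


def build_command(pymupdf_dir, mupdf_dir=None, base_env=None):
    """Return (argv, env, cwd) for 'pip install .' in a PyMuPDF checkout.

    If mupdf_dir is given, PYMUPDF_SETUP_MUPDF_BUILD is set to it.
    """
    env = dict(os.environ if base_env is None else base_env)
    if mupdf_dir is not None:
        env["PYMUPDF_SETUP_MUPDF_BUILD"] = os.fspath(mupdf_dir)
    # Use the running interpreter's pip so the build targets this Python.
    argv = [sys.executable, "-m", "pip", "install", "."]
    return argv, env, os.fspath(pymupdf_dir)


def run_build(pymupdf_dir="PyMuPDF", mupdf_dir="../mupdf"):
    """Run the build; raises CalledProcessError if pip fails."""
    argv, env, cwd = build_command(pymupdf_dir, mupdf_dir)
    subprocess.run(argv, env=env, cwd=cwd, check=True)
```

As in the shell command, a relative MuPDF path is interpreted relative to the PyMuPDF directory, because pip runs with that directory as its working directory.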
Building for different Python versions in the same PyMuPDF tree:
PyMuPDF will build for the version of Python that is being used to run pip. To run pip with a specific Python version, use python -m pip instead of pip.
So for example on Windows one can build different versions with:
cd PyMuPDF && py -3.9 -m pip install .
cd PyMuPDF && py -3.10-32 -m pip install .
When running Python scripts that use PyMuPDF, make sure that the current directory is not the PyMuPDF source directory. Otherwise, confusingly, Python will attempt to import fitz from the local fitz/ directory, which will fail because it only contains source files.
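A script can guard against this situation with a small check at startup. This is only a sketch with a hypothetical function name; it tests for the condition described above (a fitz/ folder in the current directory) and nothing more.

```python
from pathlib import Path


def cwd_shadows_fitz(directory=None) -> bool:
    """True if `directory` (default: the current directory) contains a
    fitz/ folder, the situation the note above warns about."""
    base = Path.cwd() if directory is None else Path(directory)
    return (base / "fitz").is_dir()


if __name__ == "__main__":
    if cwd_shadows_fitz():
        print("Warning: a local fitz/ directory may shadow the installed package")
```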
Having a PyMuPDF tree available allows one to run PyMuPDF’s test suite with pytest:
pip install pytest fontTools
pytest PyMuPDF/tests
Notes about using a non-default MuPDF
Using a non-default build of MuPDF by setting environmental variable
PYMUPDF_SETUP_MUPDF_BUILD can cause various things to go wrong and so is
not generally supported:
If MuPDF’s major version number differs from what PyMuPDF uses by default, PyMuPDF can fail to build, because MuPDF’s API can change between major versions.
Runtime behaviour of PyMuPDF can change because MuPDF’s runtime behaviour changes between different minor releases. This can also break some PyMuPDF tests.
If MuPDF was built with its default config instead of PyMuPDF’s customised config (for example if MuPDF is a system install), it is possible that tests/test_textbox.py:test_textbox3() will fail. One can skip this particular test by adding -k 'not test_textbox3' to the pytest command line.
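When several tests need to be skipped, the `-k` expression can be assembled programmatically. The following sketch uses a hypothetical helper, `pytest_argv`, and assumes the tests live in PyMuPDF/tests as in the command shown earlier.

```python
import sys


def pytest_argv(tests_dir="PyMuPDF/tests", skip=()):
    """Build an argv list that runs pytest on tests_dir, deselecting the
    test names in `skip` through pytest's -k expression option."""
    argv = [sys.executable, "-m", "pytest", tests_dir]
    if skip:
        # "not a and not b" deselects every listed test name.
        argv += ["-k", " and ".join(f"not {name}" for name in skip)]
    return argv


if __name__ == "__main__":
    print(pytest_argv(skip=["test_textbox3"]))
```

The resulting list can be passed to `subprocess.run()`, which avoids shell quoting issues with the expression.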
Enabling Integrated OCR Support
If you do not intend to use this feature, skip this step. Otherwise, it is required for both installation paths: from wheels and from sources.
PyMuPDF already contains all the logic needed to support OCR functions. However, it additionally needs Tesseract’s language support data, so installation of Tesseract-OCR is still required.
The language support folder location must be communicated either via storing it in the environment variable
"TESSDATA_PREFIX", or as a parameter in the applicable functions.
So for a working OCR functionality, make sure to complete this checklist:
- Locate Tesseract’s language support folder. Typically you will find it here:
Windows: C:/Program Files/Tesseract-OCR/tessdata
Unix systems: /usr/share/tesseract-ocr/4.00/tessdata
- Set the environment variable TESSDATA_PREFIX:
Windows: setx TESSDATA_PREFIX "C:/Program Files/Tesseract-OCR/tessdata"
Unix systems: declare -x TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata
On Windows systems, this must happen outside Python – before starting your script. Just manipulating
os.environ will not work!
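The checklist can be verified from Python before any OCR function is called. This is a minimal sketch with a hypothetical helper name; it only checks that a folder exists and does not validate its contents.

```python
import os
from typing import Optional


def find_tessdata(explicit: Optional[str] = None, environ=os.environ) -> Optional[str]:
    """Return a usable tessdata folder, or None.

    An explicitly passed folder is tried first, then TESSDATA_PREFIX;
    either one is accepted only if it is an existing directory.
    """
    for candidate in (explicit, environ.get("TESSDATA_PREFIX")):
        if candidate and os.path.isdir(candidate):
            return candidate
    return None


if __name__ == "__main__":
    folder = find_tessdata()
    print("tessdata folder:", folder if folder else "not configured")
```

A folder found this way could then be passed as the `tessdata` parameter of an OCR function such as `Page.get_textpage_ocr()`, which is the alternative to the environment variable mentioned above.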