Opening Files#

Supported File Types#

PyMuPDF can open files other than just PDF.

The following file types are supported:

Document formats: PDF, XPS, EPUB, MOBI, FB2, CBZ, SVG, TXT
Image formats (input): JPG/JPEG, PNG, BMP, GIF, TIFF, PNM, PGM, PBM, PPM, PAM, JXR, JPX/JP2, PSD
Image formats (output): JPG/JPEG, PNG, PNM, PGM, PBM, PPM, PAM, PSD, PS

How to Open a File#

To open a file, do the following:

import fitz  # PyMuPDF

doc = fitz.open("a.pdf")

Opening with a Wrong File Extension#

If you have a document with a wrong file extension for its type, you can still correctly open it.

Assume that “some.file” is actually an XPS. Open it like so:

doc = fitz.open("some.file", filetype="xps")

Note

PyMuPDF itself does not try to determine the file type from the file contents. You are responsible for supplying the file type information, either implicitly via the file extension or explicitly via the filetype parameter as shown above. Pure Python packages such as filetype can help you do this. Also consult the Document chapter for a full description.

If PyMuPDF encounters a file with an unknown / missing extension, it will try to open it as a PDF. So in these cases there is no need for additional precautions. Similarly, for memory documents, you can just specify doc=fitz.open(stream=mem_area) to open it as a PDF document.

If you attempt to open an unsupported file, PyMuPDF raises a file data error.
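Since the note leaves file type detection to the caller, here is a minimal sketch of one way to do it with the standard library only: compare the first bytes of a file against a few well-known signatures. The helper name guess_filetype and the signature table are illustrative assumptions, not part of PyMuPDF. The table is deliberately incomplete, and the returned names are assumed to be acceptable values for the filetype parameter.

```python
from typing import Optional

# Leading byte signatures for a few formats. Hypothetical, deliberately
# incomplete table; extend it as needed.
_SIGNATURES = (
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def guess_filetype(head: bytes) -> Optional[str]:
    """Return an extension-style type name for the leading bytes, or None."""
    for signature, name in _SIGNATURES:
        if head.startswith(signature):
            return name
    # ZIP-based formats (XPS, EPUB, CBZ) share one container signature,
    # so telling them apart would require looking inside the archive.
    return None
```

After reading the first few bytes of a file into head, the result could be passed on as fitz.open(path, filetype=guess_filetype(head)). When the helper returns None, the call falls back to the extension-based behaviour described in the note.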


Opening Files as Text#

PyMuPDF can open any plain text file as a document. To do this, pass "txt" as the filetype parameter of the fitz.open function.

doc = fitz.open("my_program.py", filetype="txt")

In this way you can open a variety of file types and use the typical features that are not PDF-specific, such as text searching, text extraction and page rendering. Once the text content is available as a document, saving it as a PDF or merging it with other PDF files is no problem.
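As a minimal sketch of the "open as text, then save as PDF" workflow described above: the helper below opens a plain text file with filetype="txt", converts it with Document.convert_to_pdf(), and writes the resulting bytes to disk. The function name text_file_to_pdf and its parameters are hypothetical, and the sketch assumes a PyMuPDF version with TXT support.

```python
def text_file_to_pdf(src_path: str, pdf_path: str) -> int:
    """Open a plain text file as a document, save it as PDF, return its page count."""
    # PyMuPDF is imported here so that merely defining this helper
    # does not require the package to be installed.
    import fitz

    with fitz.open(src_path, filetype="txt") as doc:
        page_count = doc.page_count
        pdf_bytes = doc.convert_to_pdf()  # PDF rendition as a bytes object

    with open(pdf_path, "wb") as out:
        out.write(pdf_bytes)
    return page_count
```

A call such as text_file_to_pdf("my_program.py", "my_program.pdf") would then produce a PDF that can be opened, searched or merged like any other.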

Examples#

Opening a C# file#

doc = fitz.open("MyClass.cs", filetype="txt")

Opening an XML file#

doc = fitz.open("my_data.xml", filetype="txt")

Opening a JSON file#

doc = fitz.open("more_of_my_data.json", filetype="txt")

And so on!

As you can imagine, many text-based file formats can be opened and interpreted by PyMuPDF very simply. This makes data analysis and extraction possible for a wide range of files that were previously out of reach.


This software is provided AS-IS with no warranty, either express or implied. This software is distributed under license and may not be copied, modified or distributed except as expressly authorized under the terms of that license. Refer to licensing information at artifex.com or contact Artifex Software Inc., 39 Mesa Street, Suite 108A, San Francisco CA 94129, United States for further information.
