Opening Files
Supported File Types
PyMuPDF can open files other than just PDF.
The following file types are supported:
Document formats: PDF, XPS, EPUB, MOBI, FB2, CBZ, SVG, TXT
Image formats (input): JPG/JPEG, PNG, BMP, GIF, TIFF, PNM, PGM, PBM, PPM, PAM, JXR, JPX/JP2, PSD
Image formats (output): JPG/JPEG, PNG, PNM, PGM, PBM, PPM, PAM, PSD, PS
How to Open a File
To open a file, do the following:
import fitz  # PyMuPDF
doc = fitz.open("a.pdf")
Opening with a Wrong File Extension
If you have a document with a wrong file extension for its type, you can still correctly open it.
Assume that “some.file” is actually an XPS. Open it like so:
doc = fitz.open("some.file", filetype="xps")
Note
PyMuPDF itself does not try to determine the file type from the file contents. You are responsible for supplying the file type information, either implicitly via the file extension or explicitly via the filetype parameter as shown above. Pure Python packages such as filetype can help you do this. Also consult the Document chapter for a full description.
If PyMuPDF encounters a file with an unknown or missing extension, it will try to open it as a PDF, so no additional precautions are needed in these cases. Similarly, for memory documents, you can just specify doc = fitz.open(stream=mem_area) to open the data as a PDF document.
If you attempt to open an unsupported file, PyMuPDF will raise a file data error.
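Since the note above leaves file type detection to the caller, one option is to inspect the leading bytes of the file before opening it. The following is a minimal sketch, not part of PyMuPDF: the names SIGNATURES, guess_filetype and open_by_content are hypothetical, and the signature table covers only a few formats.

```python
from pathlib import Path
from typing import Optional

# Leading-byte signatures for a few formats. This table is illustrative and
# deliberately incomplete: XPS and EPUB, for example, are ZIP containers and
# cannot be told apart from their first bytes alone.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def guess_filetype(data: bytes) -> Optional[str]:
    """Return a filetype string for a known signature, or None."""
    for magic, filetype in SIGNATURES:
        if data.startswith(magic):
            return filetype
    return None


def open_by_content(path):
    """Open `path` with a filetype guessed from its leading bytes."""
    import fitz  # PyMuPDF, imported here so guess_filetype() works without it

    data = Path(path).read_bytes()
    filetype = guess_filetype(data)
    if filetype is None:
        raise ValueError(f"unrecognized file signature: {path}")
    # Pass the data as a memory document together with the explicit filetype.
    return fitz.open(stream=data, filetype=filetype)
```

Raising ValueError for unknown signatures is a design choice of this sketch; falling back to a plain fitz.open(path) would instead reproduce the default behavior described in the note.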
Opening Files as Text
PyMuPDF can open any plain text file as a document. To do so, pass "txt" as the filetype parameter of the fitz.open function.
doc = fitz.open("my_program.py", filetype="txt")
In this way you can open a variety of file types and use the typical features that are not specific to PDF, such as text searching, text extraction and page rendering. Once the text content has been opened as a document, saving it as a PDF or merging it with other PDF files is no problem.
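As a rough illustration of the PDF step mentioned above, the sketch below opens a text file, converts it with Document.convert_to_pdf() and saves the result. The helper name text_file_to_pdf and the file names in the usage line are hypothetical, and the sketch assumes a PyMuPDF version that supports filetype="txt".

```python
def text_file_to_pdf(src, dst):
    """Render a plain text file with PyMuPDF and save it as a PDF.

    Returns the number of pages in the resulting PDF.
    """
    import fitz  # PyMuPDF

    # Open the source as a text document, regardless of its extension.
    with fitz.open(src, filetype="txt") as doc:
        pdf_bytes = doc.convert_to_pdf()  # PDF version of the document, as bytes

    # Re-open the bytes as a PDF document and write it to disk.
    with fitz.open(stream=pdf_bytes, filetype="pdf") as pdf:
        pdf.save(dst)
        return pdf.page_count
```

A call such as text_file_to_pdf("my_program.py", "my_program.pdf") would then produce a PDF that can be merged with other PDF files in the usual way.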
Examples
Opening a C# file
doc = fitz.open("MyClass.cs", filetype="txt")
Opening an XML file
doc = fitz.open("my_data.xml", filetype="txt")
Opening a JSON file
doc = fitz.open("more_of_my_data.json", filetype="txt")
And so on!
Many text-based file formats can be opened and interpreted by PyMuPDF in this simple way. This makes data analysis and extraction possible for a wide range of files that were previously out of reach.
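When many text-based formats need the same treatment, the filetype argument can be chosen from the file suffix. This is only a sketch: TEXT_SUFFIXES and open_kwargs are hypothetical names, and the suffix list is an assumption to be adapted to your own files.

```python
from pathlib import Path

# Hypothetical list of suffixes to treat as plain text; extend as needed.
TEXT_SUFFIXES = {".py", ".cs", ".xml", ".json", ".md", ".csv", ".log"}


def open_kwargs(path):
    """Return extra keyword arguments for fitz.open() based on the suffix."""
    if Path(path).suffix.lower() in TEXT_SUFFIXES:
        return {"filetype": "txt"}
    # Anything else is left to PyMuPDF's normal extension handling.
    return {}
```

With this helper, a call like doc = fitz.open(path, **open_kwargs(path)) opens the listed formats as text and everything else as usual.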