Opening Files¶

Supported File Types¶

PyMuPDF¶

PyMuPDF can open files other than just PDF.

The following file types are supported:


	Document formats: PDF, XPS, EPUB, MOBI, FB2, CBZ, SVG, TXT, MD
	Image formats (input): JPG/JPEG, PNG, BMP, GIF, TIFF, PNM, PGM, PBM, PPM, PAM, JXR, JPX/JP2, PSD
	Image formats (output): JPG/JPEG, PNG, PNM, PGM, PBM, PPM, PAM, PSD, PS

PyMuPDF Pro¶

PyMuPDF Pro can open Office files.

The following file types are supported:

DOC/DOCX	XLS/XLSX	PPT/PPTX	HWP/HWPX

How to Open a File¶

To open a file, do the following:

doc = pymupdf.open("a.pdf")

Note

The above creates a Document. The instruction doc = pymupdf.Document("a.pdf") does exactly the same. So, open() is just a convenient alias.

To open an empty document, just do:

doc = pymupdf.open()

File Recognizer: Opening with a Wrong File Extension¶

If you have a document with a wrong file extension for its type, do not worry: it will still be opened correctly, thanks to the integrated file "content recognizer".

This component looks at the actual data in the file using a number of heuristics -- independent of the file extension. This of course is also true for file names without an extension.

Here is a list of details about how the file content recognizer works:

When opening from a file name, use the filetype parameter if the file format cannot be determined by content inspection. This is the case, for instance, for all text files: "txt", "html", "xml" or source files. If such a file has a missing or wrong extension, or resides in memory, filetype must be specified. File formats that can be recognized from their content are opened correctly even with a missing or wrong extension, and the filetype parameter is ignored.
Files based on text content do not contain unambiguously recognizable internal structures. This is true for source files (Python, C, etc.) but also for HTML, XML and so on. Here, the file extension and the filetype parameter continue to play a role and are used to create a "Text" / "HTML" / ... document. Correspondingly, text files with other or no extensions can be opened successfully using filetype.

Opening Remote Files¶

For remote files on a server (i.e. non-local files), you will need to stream the file data to PyMuPDF.

For example use the requests library as follows:

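import pymupdf
import requests

# Download the file; fail early on HTTP errors instead of parsing an error page.
r = requests.get('https://mupdf.com/docs/mupdf_explored.pdf', timeout=30)
r.raise_for_status()
data = r.content  # raw bytes of the downloaded file
doc = pymupdf.Document(stream=data)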

Opening Files from Cloud Services¶

For further examples which deal with files held on typical cloud services please see these Cloud Interactions code snippets.

Opening Django Files¶

Django implements a File Storage API to store files. The default is the FileSystemStorage, but the django-storages library provides a number of other storage backends.

You can open the file, move the contents into memory, then pass the contents to PyMuPDF as a stream.

import pymupdf
from django.core.files.storage import default_storage

from .models import MyModel

obj = MyModel.objects.get(id=1)
with default_storage.open(obj.file.name) as f:
    data = f.read()

doc = pymupdf.Document(stream=data)

Please note that if the file you open is large, you may run out of memory.

The File Storage API works well if you're using different storage backends in different environments. If you're only using the FileSystemStorage, you can simply use the obj.file.name to open the file directly with PyMuPDF as shown in an earlier example.

Opening Files as Text¶

PyMuPDF can open any plain text file as a document. To do this, pass "txt" as the filetype parameter of the pymupdf.open function.

doc = pymupdf.open("my_program.py", filetype="txt")

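In this way you can open a variety of file types and use the general features that are not specific to PDF, such as text searching, text extraction and page rendering. Once the txt content has been rendered, saving it as a PDF or merging it with other PDF files is no problem.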

Examples¶

Opening a C# file¶

doc = pymupdf.open("MyClass.cs", filetype="txt")

Opening an XML file¶

doc = pymupdf.open("my_data.xml", filetype="txt")

Opening a JSON file¶

doc = pymupdf.open("more_of_my_data.json", filetype="txt")

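And so on!

As you can imagine, many text-based file formats can be opened and interpreted very simply by PyMuPDF. This can make data analysis and extraction suddenly possible for a wide range of files that were previously out of reach.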

Full Options for Opening a File¶

The pymupdf.open function has a number of parameters that give you full control over how files are opened. For the full API, please see the Document chapter, as open is just an alias for the Document() constructor.

open(filename=None, stream=None, filetype=None, archive=None, rect=None, width=0, height=0, fontsize=11)¶

See the Document() constructor for details.

Returns: A document object.

This software is provided AS-IS with no warranty, either express or implied. This software is distributed under license and may not be copied, modified or distributed except as expressly authorized under the terms of that license. Refer to licensing information at artifex.com or contact Artifex Software Inc., 39 Mesa Street, Suite 108A, San Francisco CA 94129, United States for further information.