Features Comparison#
Feature Matrix#
The following table shows how PyMuPDF compares with other typical solutions.
Feature | PyMuPDF | pikepdf | PyPDF2 | pdfrw | pdfplumber / pdfminer |
---|---|---|---|---|---|
Supports Multiple Document Formats | PDF, XPS, EPUB, MOBI, FB2, CBZ, SVG, TXT, Image, DOCX, XLSX, PPTX, HWPX (see note) | ||||
Implementation | Python and C | Python and C++ | Python | Python | Python |
Render Document Pages | All document types | No rendering | No rendering | No rendering | No rendering |
Write Text to PDF Page | See Page.insert_htmlbox, Page.insert_textbox, or TextWriter | ||||
Supports CJK characters | |||||
Extract Text | All document types | PDF only | PDF only | ||
Extract Text as Markdown (.md) | All document types | ||||
Extract Tables | All document types | PDF only | |||
Extract Vector Graphics | All document types | Limited | |||
Draw Vector Graphics (PDF) | |||||
Based on Existing, Mature Library | MuPDF | QPDF | |||
Automatic Repair of Damaged PDFs | |||||
Encrypted PDFs | Limited | Limited | |||
Linearized PDFs | |||||
Incremental Updates | |||||
Integrates with Jupyter and IPython Notebooks | |||||
Joining / Merging PDF with other Document Types | All document types | PDF only | PDF only | PDF only | PDF only |
OCR API for Seamless Integration with Tesseract | All document types | ||||
Integrated Checkpoint / Restart Feature (PDF) | |||||
PDF Optional Content | |||||
PDF Embedded Files | Limited | Limited | |||
PDF Redactions | |||||
PDF Annotations | Full | Limited | |||
PDF Form Fields | Create, read, update | Limited, no creation | |||
PDF Page Labels | |||||
Support Font Sub-Setting | |||||
Note
A note about Office document types (DOCX, XLSX, PPTX) and Hangul documents (HWPX): these documents can be loaded into PyMuPDF, and you will receive a Document object.
There are some caveats:
- The input is converted to HTML in order to lay out the content.
- Because of this, the original page separation is lost.
When saving the result, a faithful representation of the original layout cannot be expected.
Such input files are therefore mostly useful for text extraction.
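The note above can be illustrated with a minimal sketch of text extraction from an Office-type file. It assumes PyMuPDF is installed and importable as `pymupdf`. The helper names `REFLOWED_SUFFIXES`, `is_reflowed_input` and `extract_plain_text` are hypothetical and not part of the library.

```python
from pathlib import Path

# Suffixes taken from the note above; the set itself is a hypothetical helper.
REFLOWED_SUFFIXES = {".docx", ".xlsx", ".pptx", ".hwpx"}


def is_reflowed_input(path: str) -> bool:
    """Return True for Office / HWPX files, whose original paging is not kept."""
    return Path(path).suffix.lower() in REFLOWED_SUFFIXES


def extract_plain_text(path: str) -> str:
    """Return the plain text of every page, joined with newlines."""
    # Imported lazily so the suffix check above works without PyMuPDF.
    import pymupdf

    with pymupdf.open(path) as doc:
        # Page boundaries of reflowed inputs are the result of the HTML
        # layout, so they should not be read as the original page breaks.
        return "\n".join(page.get_text() for page in doc)
```

A caller might use `is_reflowed_input` to decide whether page numbers in the extracted text are meaningful before relying on them.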
Performance#
PyMuPDF is benchmarked on a range of tasks using a fixed test suite of 8 PDF files, totalling 7,031 pages, that contain text and images.
Here are the current results, grouped by task:
- Copying
This refers to opening a document and then saving it to a new file. This test measures the speed of reading a PDF and re-writing as a new PDF. This process is also at the core of functions like merging / joining multiple documents. The numbers below therefore apply to PDF joining and merging.
The results for all 7,031 pages are: [timing chart not reproduced here]
- Text Extraction
This refers to extracting simple, plain text from every page of the document and storing it in a text file.
The results for all 7,031 pages are: [timing chart not reproduced here]
- Rendering
This refers to making an image (like PNG) from every page of a document at a given DPI resolution. This feature is the basis for displaying a document in a GUI window.
The results for all 7,031 pages are: [timing chart not reproduced here]
Note
For more detail on the methodology behind these performance timings, see Performance Comparison Methodology.
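The three tasks described above (copying, text extraction, rendering) could be timed with a small harness like the following. This is only a sketch, not the methodology used for the published figures. The best-of-N timing, the 150 DPI default and all function names are assumptions made for illustration, and PyMuPDF is assumed to be importable as `pymupdf`.

```python
import time
from typing import Callable


def time_task(task: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        best = min(best, time.perf_counter() - start)
    return best


def copy_pdf(src: str, dst: str) -> None:
    """Copying: open a document and save it to a new file."""
    import pymupdf  # lazy import: the harness itself needs only the stdlib

    with pymupdf.open(src) as doc:
        doc.save(dst)


def extract_text(src: str, dst: str) -> None:
    """Text extraction: write the plain text of every page to a text file."""
    import pymupdf

    with pymupdf.open(src) as doc, open(dst, "w", encoding="utf-8") as out:
        for page in doc:
            out.write(page.get_text())


def render_pages(src: str, dpi: int = 150) -> None:
    """Rendering: make an image of every page at the given DPI."""
    import pymupdf

    with pymupdf.open(src) as doc:
        for page in doc:
            page.get_pixmap(dpi=dpi)  # the pixmap is discarded, only timed
```

For example, `time_task(lambda: copy_pdf("input.pdf", "copy.pdf"))` would time the copying task, where both file names are placeholders.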
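License and Copyright#
PyMuPDF and MuPDF are now available under both the open-source AGPL and commercial license agreements. Please read the full text of the AGPL license agreement, available in the distribution material (file COPYING) and here, to ensure that your use case complies with the guidelines of the license. If you determine that you cannot meet the requirements of the AGPL, please contact Artifex for more information regarding a commercial license.
Artifex is the exclusive commercial licensing agent for MuPDF.
Artifex, the Artifex logo, MuPDF, and the MuPDF logo are registered trademarks of Artifex Software Inc.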
This documentation covers PyMuPDF v1.24.14 features as of 2024-11-19 00:00:01.
The major and minor versions of PyMuPDF and MuPDF will always be the same. Only the third qualifier (patch level) may deviate from that of MuPDF.
Typically PyMuPDF is released more frequently than MuPDF so it will often be the case that the patch level of PyMuPDF will be greater than the embedded MuPDF.
For example PyMuPDF-1.24.5 contains MuPDF-1.24.2.
Also see pymupdf_version and mupdf_version.
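The versioning rule above can be expressed as a small check. The sketch below works on plain version strings of the form `major.minor.patch`. The function names are hypothetical, and in practice the strings would presumably come from the `pymupdf_version` and `mupdf_version` values mentioned above.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def same_major_minor(pymupdf_version: str, mupdf_version: str) -> bool:
    """Return True if both versions share major and minor; patch may differ."""
    return parse_version(pymupdf_version)[:2] == parse_version(mupdf_version)[:2]


# The pairing given as an example in the text above.
print(same_major_minor("1.24.5", "1.24.2"))  # True
```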