Archive#
New in v1.21.0
This class represents a generalization of file folders and container files like ZIP and TAR archives. Archives allow accessing arbitrary collections of file folders, ZIP / TAR files and single binary data elements as if they all were part of one hierarchical tree of folders.
In PyMuPDF, archives are currently only used by Story objects to specify where to look for fonts, images and other resources.
Method / Attribute |
Short Description |
---|---|
add new data to the archive |
|
check if given name is a member |
|
read the data given by the name |
|
list[dict] of archive items |
Class API
- class Archive#
- __init__(self[, content[, path]])#
Creates a new archive. Without parameters, an empty archive is created.
If provided,
content
may be one of the following:another Archive: the archive is being made a sub-archive of the new one.
a string: this must be the name of a local folder or file.
pathlib.Path
objects are also supported.A folder will be converted to a sub-archive, so its files (and any sub-folders) can be accessed by their names.
A file will be read with mode
"rb"
and these binary data (abytes
object) be treated as a single-member sub-archive. In this case, thepath
parameter is mandatory and should be the member name under which this item can be found / retrieved.
a
zipfile.ZipFile
ortarfile.TarFile
object: Will be added as a sub-archive.a Python binary object (
bytes
,bytearray
,io.BytesIO
): this will add a single-member sub-archive. In this case, thepath
parameter is mandatory and should be the member name under which this item can be found / retrieved.a tuple
(data, name)
: This will add a single-member sub-archive with the member namename
.data
may be a Python binary object or a local file name (in which case its binary file content is used). Use this format if you need to specifypath
.a Python sequence: This is a convenience format to specify any combination of the above.
If provided,
path
must be a string.If
content
is either binary data or a file name, this parameter is mandatory and must be the name under which the data can be found.Otherwise this parameter is optional. It can be used to simulate a folder name or a mount point, under which this sub-archive’s elements can be found. For example this specification
Archive((data, "name"), "path")
means thatdata
will be found using the element name"path/name"
. Similar is true for other sub-archives: to retrieve members of a ZIP sub-archive, their names must be prefixed with"path/"
. The main purpose of this parameter probably is to differentiate between duplicate names.
Note
If duplicate entry names exist in the archive, always the last entry with that name will be found / retrieved. During archive creation, or appending more data to an archive (see
Archive.add()
) no check for duplicates will be made. Use thepath
parameter to prevent this from happening.
- add(content[, path])#
Append a sub-archive. The meaning of the parameters are exactly the same as explained above. Of course, parametrer
content
is not optional here.
- has_entry(name)#
Checks whether an entry exists in any of the sub-archives.
- Parameters:
name (str) – The fully qualified name of the entry. So must include any
path
prefix under which the entry’s sub-archive has been added.- Returns:
True
orFalse
.
- read_entry(name)#
Retrieve the data of an entry.
- Parameters:
name (str) – The fully qualified name of the entry. So must include any
path
prefix under which the entry’s sub-archive has been added.- Returns:
The binary data (
bytes
) of the entry. If not found, an exception is raised.
- entry_list#
A list of the archive’s sub-archives. Each list item is a dictionary with the following keys:
entries
– a list of (top-level) entry names in this sub-archive.fmt
– the format of the sub-archive. This is one of the strings “dir” (file folder), “zip” (ZIP archive), “tar” (TAR archive), or “tree” for single binary entries or file content.path
– the value of thepath
parameter under which this sub-archive was added.
Example:
>>> from pprint import pprint >>> import fitz >>> dir1 = "fitz-32" # a folder name >>> dir2 = "fitz-64" # a folder name >>> img = ("nur-ruhig.jpg", "img") # an image file >>> members = (dir1, img, dir2) # we want to append these in one go >>> arch = fitz.Archive() >>> arch.add(members, path="mypath") >>> pprint(arch.entry_list) [{'entries': ['310', '37', '38', '39'], 'fmt': 'dir', 'path': 'mypath'}, {'entries': ['img'], 'fmt': 'tree', 'path': 'mypath'}, {'entries': ['310', '311', '37', '38', '39', 'pypy'], 'fmt': 'dir', 'path': 'mypath'}] >>>