Recipes: Stories

This document showcases some typical use cases for Stories.

As mentioned in the tutorial, stories may be created using up to three input sources: HTML, CSS and Archives. All three are optional, and each can also be provided programmatically.

The following examples show how these inputs can be combined.

Note

The source code of many of these recipes is included as examples in the docs folder.

How to add a line of text with some formatting

Here is the inevitable “Hello World” example. We will show two variants:

  1. Create using an existing HTML source [1], which may come from anywhere.

  2. Create using the Python API.


Variant using an existing HTML source [1] – which in this case is defined as a constant in the script:

import fitz

HTML = """
<p style="font-family: sans-serif;color: blue">Hello World!</p>
"""

MEDIABOX = fitz.paper_rect("letter")  # output page format: Letter
WHERE = MEDIABOX + (36, 36, -36, -36)  # leave borders of 0.5 inches

story = fitz.Story(html=HTML)  # create story from HTML
writer = fitz.DocumentWriter("output.pdf")  # create the writer

more = 1  # will indicate end of input once it is set to 0

while more:  # loop outputting the story
    device = writer.begin_page(MEDIABOX)  # make new page
    more, _ = story.place(WHERE)  # layout into allowed rectangle
    story.draw(device)  # write on page
    writer.end_page()  # finish page

writer.close()  # close output file
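
The margin arithmetic behind WHERE can be checked by hand. The following plain-Python sketch needs no PyMuPDF; the helper name inset is made up for this illustration. It assumes that Letter paper measures 8.5 x 11 inches, that one inch equals 72 points, and that adding a 4-tuple to a rectangle adds the values component-wise, as the script above does.

```python
POINTS_PER_INCH = 72  # PDF unit: 1 point = 1/72 inch

# Letter paper, 8.5 x 11 inches, as (x0, y0, x1, y1) in points
letter = (0, 0, 8.5 * POINTS_PER_INCH, 11 * POINTS_PER_INCH)


def inset(rect, margin):
    """Shrink rect by margin on all four sides (hypothetical helper)."""
    x0, y0, x1, y1 = rect
    return (x0 + margin, y0 + margin, x1 - margin, y1 - margin)


where = inset(letter, 0.5 * POINTS_PER_INCH)  # half-inch borders
print(where)  # (36.0, 36.0, 576.0, 756.0)
```

The result matches the values (36, 36, -36, -36) added to the Letter rectangle in the script.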

Note

The above effect (sans-serif and blue text) could have been achieved by using a separate CSS source like so:

import fitz

CSS = """
body {
    font-family: sans-serif;
    color: blue;
}
"""

HTML = """
<p>Hello World!</p>
"""

# the story would then be created like this:
story = fitz.Story(html=HTML, user_css=CSS)

The Python API variant – everything is created programmatically:

import fitz

MEDIABOX = fitz.paper_rect("letter")
WHERE = MEDIABOX + (36, 36, -36, -36)

story = fitz.Story()  # create an empty story
body = story.body  # access the body of its DOM
with body.add_paragraph() as para:  # store desired content
    para.set_font("sans-serif").set_color("blue").add_text("Hello World!")

writer = fitz.DocumentWriter("output.pdf")

more = 1

while more:
    device = writer.begin_page(MEDIABOX)
    more, _ = story.place(WHERE)
    story.draw(device)
    writer.end_page()

writer.close()

Both variants will produce the same output PDF.


How to use images

Images can be referenced in the provided HTML source, or a reference to the desired image can be stored via the Python API. In either case, an Archive is required, which points to the place where the image can be found.

Note

Images with the binary content embedded in the HTML source are not supported by stories.

We extend our “Hello World” example from above and display an image of our planet right after the text. Assuming the image has the name “world.jpg” and is present in the script’s folder, then this is the modified version of the above Python API variant:

import fitz

MEDIABOX = fitz.paper_rect("letter")
WHERE = MEDIABOX + (36, 36, -36, -36)

# create story, let it look at script folder for resources
story = fitz.Story(archive=".")
body = story.body  # access the body of its DOM

with body.add_paragraph() as para:
    # store desired content
    para.set_font("sans-serif").set_color("blue").add_text("Hello World!")

# another paragraph for our image:
with body.add_paragraph() as para:
    # store image in another paragraph
    para.add_image("world.jpg")

writer = fitz.DocumentWriter("output.pdf")

more = 1

while more:
    device = writer.begin_page(MEDIABOX)
    more, _ = story.place(WHERE)
    story.draw(device)
    writer.end_page()

writer.close()

Reading external HTML and CSS for a Story

These cases are fairly straightforward.

As a general recommendation, HTML and CSS sources should be read as binary files and decoded before using them in a story. The Python pathlib.Path provides convenient ways to do this:

import pathlib
import fitz

htmlpath = pathlib.Path("myhtml.html")
csspath = pathlib.Path("mycss.css")

HTML = htmlpath.read_bytes().decode()
CSS = csspath.read_bytes().decode()

story = fitz.Story(html=HTML, user_css=CSS)
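
Calling decode() without arguments assumes UTF-8. If the encoding of a source file is not known in advance, a slightly more defensive reader may help. This is only a sketch: the helper name read_text_source and the fallback to Latin-1 are assumptions made for illustration and are not part of PyMuPDF.

```python
import pathlib
import tempfile


def read_text_source(path, fallback="latin-1"):
    """Read a file as bytes and decode it to str (hypothetical helper).

    "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
    If the bytes are not valid UTF-8, the fallback encoding is tried instead.
    """
    data = pathlib.Path(path).read_bytes()
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode(fallback)


# Demonstration with a temporary file that starts with a UTF-8 byte order mark
with tempfile.TemporaryDirectory() as tmp:
    htmlpath = pathlib.Path(tmp) / "myhtml.html"
    htmlpath.write_bytes(b"\xef\xbb\xbf<p>Gr\xc3\xbc\xc3\x9fe!</p>")
    HTML = read_text_source(htmlpath)

print(ascii(HTML))  # '<p>Gr\xfc\xdfe!</p>'
```

The returned string can then be passed to fitz.Story in the same way as in the example above.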

How to output database content with Story templates

This script demonstrates how to report SQL database content using an HTML template.

The example SQL database contains two tables:

  1. Table “films” contains one row per film with the fields “title”, “director” and (release) “year”.

  2. Table “actors” contains one row per actor and film title (fields (actor) “name” and (film) “title”).

The story DOM consists of a template for one film, which reports the film data together with a list of its cast.

Files:

  • docs/samples/filmfestival-sql.py

  • docs/samples/filmfestival-sql.db

"""
This is a demo script for using PyMuPDF with its "Story" feature.

The following aspects are being covered here:

* The script produces a report of films that are stored in an SQL database
* The report format is provided as an HTML template

The SQL database contains two tables:
1. Table "films" which has the columns "title" (film title, str), "director"
   (str) and "year" (year of release, int).
2. Table "actors" which has the columns "name" (actor name, str) and "title"
   (the title of the film in which the actor was cast, str).

The script reads all content of the "films" table. For each film title it
reads all rows from table "actors" for the actors who took part in that film.

Comment 1
---------
To keep things easy and free from pesky technical detail, the relevant file
names inherit the name of this script:
- the database's filename is the script name with ".py" extension replaced
  by ".db".
- the output PDF similarly has script file name with extension ".pdf".

Comment 2
---------
The SQLITE database has been created using https://sqlitebrowser.org/, a free
multi-platform tool to maintain or manipulate SQLITE databases.
"""
import os
import sqlite3

import fitz

# ----------------------------------------------------------------------
# HTML template for the film report
# There are four placeholders coded as "id" attributes.
# One "id" allows locating the template part itself, the other three
# indicate where database text should be inserted.
# ----------------------------------------------------------------------
festival_template = (
    "<html><head><title>Just some arbitrary text</title></head>"
    '<body><h1 style="text-align:center">Hook Norton Film Festival</h1>'
    "<ol>"
    '<li id="filmtemplate">'
    '<b id="filmtitle"></b>'
    "<dl>"
    '<dt>Director<dd id="director">'
    '<dt>Release Year<dd id="filmyear">'
    '<dt>Cast<dd id="cast">'
    "</dl>"
    "</li>"
    "</ol>"
    "</body></html>"
)

# -------------------------------------------------------------------
# define database access
# -------------------------------------------------------------------
dbfilename = __file__.replace(".py", ".db")  # the SQLITE database file name
assert os.path.isfile(dbfilename), f'{dbfilename}'
database = sqlite3.connect(dbfilename)  # open database
cursor_films = database.cursor()  # cursor for selecting the films
cursor_casts = database.cursor()  # cursor for selecting actors per film

# select statement for the films - let SQL also sort it for us
select_films = """SELECT title, director, year FROM films ORDER BY title"""

# select statement for actors, a skeleton: sub-select by film title
select_casts = """SELECT name FROM actors WHERE film = "%s" ORDER BY name"""

# -------------------------------------------------------------------
# define the HTML Story and fill it with database data
# -------------------------------------------------------------------
story = fitz.Story(festival_template)
body = story.body  # access the HTML body detail
template = body.find(None, "id", "filmtemplate")  # find the template part

# read the films from the database and put them all in one Python list
# NOTE: instead we might fetch rows one by one (advisable for large volumes)
cursor_films.execute(select_films)  # execute cursor, and ...
films = cursor_films.fetchall()  # read out what was found

for title, director, year in films:  # iterate through the films
    film = template.clone()  # clone template to report each film
    film.find(None, "id", "filmtitle").add_text(title)  # put title in templ
    film.find(None, "id", "director").add_text(director)  # put director
    film.find(None, "id", "filmyear").add_text(str(year))  # put year

    # the actors reside in their own table - find the ones for this film title
    cursor_casts.execute(select_casts % title)  # execute cursor
    casts = cursor_casts.fetchall()  # read actors for the film
    # each actor name appears in its own tuple, so extract it from there
    film.find(None, "id", "cast").add_text("\n".join([c[0] for c in casts]))
    body.append_child(film)

template.remove()  # remove the template

# -------------------------------------------------------------------
# generate the PDF
# -------------------------------------------------------------------
writer = fitz.DocumentWriter(__file__.replace(".py", ".pdf"), "compress")
mediabox = fitz.paper_rect("a4")  # use pages in ISO-A4 format
where = mediabox + (72, 36, -36, -72)  # leave page borders

more = 1  # end of output indicator

while more:
    dev = writer.begin_page(mediabox)  # make a new page
    more, filled = story.place(where)  # arrange content for this page
    story.draw(dev, None)  # write content to page
    writer.end_page()  # finish the page

writer.close()  # close the PDF
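
The select statement for the actors is completed via string formatting, which can fail, for example when a film title contains a double quote. A possible alternative is to let sqlite3 bind the value through a "?" placeholder. The sketch below uses a throwaway in-memory database with invented rows; the column name "film" follows the select statement in the script above, and the function name cast_of is made up for this illustration.

```python
import sqlite3

database = sqlite3.connect(":memory:")  # throwaway database for the demo
database.execute("CREATE TABLE actors (name TEXT, film TEXT)")
database.executemany(
    "INSERT INTO actors (name, film) VALUES (?, ?)",
    [
        ("Actor B", 'The "Quoted" Film'),  # invented sample rows
        ("Actor A", 'The "Quoted" Film'),
        ("Actor C", "Another Film"),
    ],
)

# the "?" placeholder is filled in by sqlite3, so quotes need no escaping
select_casts = "SELECT name FROM actors WHERE film = ? ORDER BY name"


def cast_of(title):
    """Return the actor names for one film title as a list of strings."""
    cursor = database.execute(select_casts, (title,))
    return [row[0] for row in cursor.fetchall()]


print(cast_of('The "Quoted" Film'))  # ['Actor A', 'Actor B']
```

The returned list can be joined with "\n" and passed to add_text() exactly as the script does with its own query result.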


How to integrate with existing PDFs

Because a DocumentWriter can only write to a new file, stories cannot be placed on existing pages directly. This script demonstrates a workaround for this restriction.

The basic idea is to let DocumentWriter output to a PDF in memory. Once the story has finished, we re-open this memory PDF and place its pages at the desired locations on existing pages via method Page.show_pdf_page().

Files:

  • docs/samples/showpdf-page.py

"""
Demo of Story class in PyMuPDF
-------------------------------

This script demonstrates how the results of a fitz.Story output can be
placed in a rectangle of an existing (!) PDF page.

"""
import io
import os

import fitz


def make_pdf(fileptr, text, rect, font="sans-serif", archive=None):
    """Make a memory DocumentWriter from HTML text and a rect.

    Args:
        fileptr: a Python file object. For example an io.BytesIO().
        text: the text to output (HTML format)
        rect: the target rectangle. Will use its width / height as mediabox
        font: (str) font family name, default sans-serif
        archive: fitz.Archive parameter. To be used if e.g. images or special
                fonts should be used.
    Returns:
        The matrix to convert page rectangles of the created PDF back
        to rectangle coordinates in the parameter "rect".
        Normal use will expect to fit all the text in the given rect.
        However, if an overflow occurs, this function will output multiple
        pages, and the caller may decide to either accept or retry with
        changed parameters.
    """
    # use input rectangle as the page dimension
    mediabox = fitz.Rect(0, 0, rect.width, rect.height)
    # this matrix converts mediabox back to input rect
    matrix = mediabox.torect(rect)

    story = fitz.Story(text, archive=archive)
    body = story.body
    body.set_properties(font=font)
    writer = fitz.DocumentWriter(fileptr)
    while True:
        device = writer.begin_page(mediabox)
        more, _ = story.place(mediabox)
        story.draw(device)
        writer.end_page()
        if not more:
            break
    writer.close()
    return matrix


# -------------------------------------------------------------
# We want to put this in a given rectangle of an existing page
# -------------------------------------------------------------
HTML = """
<p>PyMuPDF is a great package! And it still improves significantly from one version to the next one!</p>
<p>It is a Python binding for <b>MuPDF</b>, a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit.<br> Both are maintained and developed by Artifex Software, Inc.</p>
<p>Via MuPDF it can access files in PDF, XPS, OpenXPS, CBZ, EPUB, MOBI and FB2 (e-books) formats,<br> and it is known for its top
<b><i>performance</i></b> and <b><i>rendering quality</i></b>.</p>"""

# Make a PDF page for demo purposes
root = os.path.abspath( f"{__file__}/..")
doc = fitz.open(f"{root}/mupdf-title.pdf")
page = doc[0]

WHERE = fitz.Rect(50, 100, 250, 500)  # target rectangle on existing page

fileptr = io.BytesIO()  # let DocumentWriter use this as its file

# -------------------------------------------------------------------
# call DocumentWriter and Story to fill our rectangle
matrix = make_pdf(fileptr, HTML, WHERE)
# -------------------------------------------------------------------
src = fitz.open("pdf", fileptr)  # open DocumentWriter output PDF
if src.page_count > 1:  # target rect was too small
    raise ValueError("target WHERE too small")

# its page 0 contains our result
page.show_pdf_page(WHERE, src, 0)

doc.ez_save(f"{root}/mupdf-title-after.pdf")
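
The matrix returned by make_pdf maps coordinates on the temporary page to coordinates inside the target rectangle. The plain-Python sketch below shows the underlying arithmetic, a scale followed by a shift, under the assumption that rectangles are (x0, y0, x1, y1) tuples. The function names are made up for this illustration and are not PyMuPDF API.

```python
def rect_to_rect(src, dst):
    """Return (sx, sy, tx, ty) mapping rectangle src onto rectangle dst.

    A point (x, y) in src is mapped to (x * sx + tx, y * sy + ty).
    """
    sx = (dst[2] - dst[0]) / (src[2] - src[0])  # horizontal scale
    sy = (dst[3] - dst[1]) / (src[3] - src[1])  # vertical scale
    tx = dst[0] - src[0] * sx  # horizontal shift
    ty = dst[1] - src[1] * sy  # vertical shift
    return sx, sy, tx, ty


def map_point(point, mapping):
    """Apply the mapping to one (x, y) point."""
    sx, sy, tx, ty = mapping
    return (point[0] * sx + tx, point[1] * sy + ty)


WHERE = (50, 100, 250, 500)  # target rectangle from the script above
mediabox = (0, 0, WHERE[2] - WHERE[0], WHERE[3] - WHERE[1])  # (0, 0, 200, 400)

mapping = rect_to_rect(mediabox, WHERE)
print(mapping)  # (1.0, 1.0, 50.0, 100.0)
print(map_point((200, 400), mapping))  # (250.0, 500.0)
```

Because the temporary page has the same width and height as the target rectangle, the scale is 1 and only the shift remains.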


How to make multi-columned layouts and access fonts from package pymupdf-fonts

This script outputs an article (taken from Wikipedia) that contains text and multiple images and uses a 2-column page layout.

In addition, two “Ubuntu” font families from package pymupdf-fonts are used instead of defaulting to Base-14 fonts.

Yet another feature used here is that all data – the images and the article HTML – are jointly stored in a ZIP file.

Files:

  • docs/samples/quickfox.py

  • docs/samples/quickfox.zip

"""
This is a demo script using PyMuPDF's Story class to output text as a PDF with
a two-column page layout.

The script demonstrates the following features:
* How to fill columns or table cells of complex page layouts
* How to embed images
* How to modify existing, given HTML sources for output (text indent, font size)
* How to use fonts defined in package "pymupdf-fonts"
* How to use ZIP files as Archive

--------------
The example is taken from the somewhat modified Wikipedia article
https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog.
--------------
"""

import io
import os
import zipfile
import fitz


thisdir = os.path.dirname(os.path.abspath(__file__))
myzip = zipfile.ZipFile(os.path.join(thisdir, "quickfox.zip"))
arch = fitz.Archive(myzip)

if fitz.fitz_fontdescriptors:
    # we want to use the Ubuntu fonts for sans-serif and for monospace
    CSS = fitz.css_for_pymupdf_font("ubuntu", archive=arch, name="sans-serif")
    CSS = fitz.css_for_pymupdf_font("ubuntm", CSS=CSS, archive=arch, name="monospace")
else:
    # No pymupdf-fonts available.
    CSS = ""

docname = __file__.replace(".py", ".pdf")  # output PDF file name

HTML = myzip.read("quickfox.html").decode()

# make the Story object
story = fitz.Story(HTML, user_css=CSS, archive=arch)

# --------------------------------------------------------------
# modify the DOM somewhat
# --------------------------------------------------------------
body = story.body  # access HTML body
body.set_properties(font="sans-serif")  # and give it our font globally

# modify certain nodes
para = body.find("p", None, None)  # find relevant nodes (here: paragraphs)
while para is not None:
    para.set_properties(  # method MUST be used for existing nodes
        indent=15,
        fontsize=13,
    )
    para = para.find_next("p", None, None)

# choose PDF page size
MEDIABOX = fitz.paper_rect("letter")
# text appears only within this subrectangle
WHERE = MEDIABOX + (36, 36, -36, -36)

# --------------------------------------------------------------
# define page layout within the WHERE rectangle
# --------------------------------------------------------------
COLS = 2  # layout: 2 cols 1 row
ROWS = 1
TABLE = fitz.make_table(WHERE, cols=COLS, rows=ROWS)
# fill the cells of each page in this sequence:
CELLS = [TABLE[i][j] for i in range(ROWS) for j in range(COLS)]

fileobject = io.BytesIO()  # let DocumentWriter write to memory
writer = fitz.DocumentWriter(fileobject)  # define the writer

more = 1
while more:  # loop until all input text has been written out
    dev = writer.begin_page(MEDIABOX)  # prepare a new output page
    for cell in CELLS:
        # content may be complete after any cell, ...
        if more:  # so check this status first
            more, _ = story.place(cell)
            story.draw(dev)
    writer.end_page()  # finish the PDF page

writer.close()  # close DocumentWriter output

# for housekeeping work re-open from memory
doc = fitz.open("pdf", fileobject)
doc.ez_save(docname)
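
The interplay of the page loop and the cell loop can be tried out without PyMuPDF. In this sketch, FakeStory is a made-up stand-in whose place() merely counts down a number of content units and returns a "more" flag plus a second value, mimicking how the script uses the return value of story.place(). It performs no layout, and the function name count_pages is hypothetical as well.

```python
class FakeStory:
    """Made-up stand-in for a story: holds a number of content units."""

    def __init__(self, units):
        self.units = units

    def place(self, capacity):
        """Consume up to capacity units and return (more, used)."""
        used = min(self.units, capacity)
        self.units -= used
        return (1 if self.units > 0 else 0), used


def count_pages(story, cells_per_page, capacity_per_cell):
    """Run the page loop and the cell loop; return (pages, cells_filled)."""
    pages = cells_filled = 0
    more = 1
    while more:  # one iteration per output page
        pages += 1
        for _ in range(cells_per_page):
            if more:  # content may be complete after any cell
                more, _used = story.place(capacity_per_cell)
                cells_filled += 1
    return pages, cells_filled


# 5 units of content, 2 columns per page, 1 unit fits in each column
print(count_pages(FakeStory(5), cells_per_page=2, capacity_per_cell=1))  # (3, 5)
```

The last page is started with content for only one column, so the second column is skipped by the "if more" check.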


How to make a layout which wraps around predefined “no go areas”

This is a demo script using PyMuPDF’s Story class to output text as a PDF with a two-column page layout.

The script demonstrates the following features:

  • Layout text around images of an existing (“target”) PDF.

  • Based on a few global parameters, areas on each page are identified that can be used to receive text laid out by a Story.

  • These global parameters are not stored anywhere in the target PDF and must therefore be provided in some way:

    • The width of the border(s) on each page.

    • The fontsize to use for text. This value determines whether the provided text will fit in the empty spaces of the (fixed) pages of the target PDF. It cannot be predicted in any way. The script ends with an exception if the target PDF does not have enough pages, and prints a warning message if not all pages receive at least some text. In both cases, the FONTSIZE value (a float) can be changed.

    • Use of a 2-column page layout for the text.

  • The layout creates a temporary (memory) PDF. Its produced page content (the text) is used to overlay the corresponding target page. If text requires more pages than are available in target PDF, an exception is raised. If not all target pages receive at least some text, a warning is printed.

  • The script reads “image-no-go.pdf” in its own folder. This is the “target” PDF. It contains 2 pages with 2 images each (from the original article), which are positioned at places that create a broad overall test coverage. Otherwise the pages are empty.

  • The script produces “quickfox-image-no-go.pdf” which contains the original pages and image positions, but with the original article text laid out around them.

Files:

  • docs/samples/quickfox-image-no-go.py

  • docs/samples/quickfox-image-no-go.pdf

  • docs/samples/quickfox.zip

"""
This is a demo script using PyMuPDF's Story class to output text as a PDF with
a two-column page layout.

The script demonstrates the following features:
* Layout text around images of an existing ("target") PDF.
* Based on a few global parameters, areas on each page are identified that
  can be used to receive text laid out by a Story.
* These global parameters are not stored anywhere in the target PDF and
  must therefore be provided in some way.
  - The width of the border(s) on each page.
  - The fontsize to use for text. This value determines whether the provided
    text will fit in the empty spaces of the (fixed) pages of the target PDF.
    It cannot be predicted in any way. The script ends with an exception if
    the target PDF does not have enough pages, and prints a warning message
    if not all pages receive at least some text. In both cases, the FONTSIZE
    value (a float) can be changed.
  - Use of a 2-column page layout for the text.
* The layout creates a temporary (memory) PDF. Its produced page content
  (the text) is used to overlay the corresponding target page. If text
  requires more pages than are available in target PDF, an exception is raised.
  If not all target pages receive at least some text, a warning is printed.
* The script reads "image-no-go.pdf" in its own folder. This is the "target" PDF.
  It contains 2 pages with 2 images each (from the original article), which are
  positioned at places that create a broad overall test coverage. Otherwise the
  pages are empty.
* The script produces "quickfox-image-no-go.pdf" which contains the original pages
  and image positions, but with the original article text laid out around them.

Note:
--------------
This script version uses just image positions to derive "No-Go areas" for
laying out the text. Other PDF object types are detectable by PyMuPDF and may
be used instead or in addition, without affecting the layout logic.
The following are candidates for other such "No-Go areas". Each can be detected
and located by PyMuPDF:
* Annotations
* Drawings
* Existing text

--------------
The text and images are taken from the somewhat modified Wikipedia article
https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog.
--------------
"""

import io
import os
import zipfile
import fitz


thisdir = os.path.dirname(os.path.abspath(__file__))
myzip = zipfile.ZipFile(os.path.join(thisdir, "quickfox.zip"))

docname = os.path.join(thisdir, "image-no-go.pdf")  # "no go" input PDF file name
outname = os.path.join(thisdir, "quickfox-image-no-go.pdf")  # output PDF file name
BORDER = 36  # global parameter
FONTSIZE = 12.5  # global parameter
COLS = 2  # number of text columns, global parameter


def analyze_page(page):
    """Compute MediaBox and rectangles on page that are free to receive text.

    Notes:
        Assume a BORDER around the page, make 2 columns of the resulting
        sub-rectangle and extract the rectangles of all images on page.
        For demo purposes, the image rectangles are taken as "NO-GO areas"
        on the page when writing text with the Story.
        The function returns free areas for each of the columns.

    Returns:
        (page.number, mediabox, CELLS), where CELLS is a list of free cells.
    """
    prect = page.rect  # page rectangle - will be our MEDIABOX later
    where = prect + (BORDER, BORDER, -BORDER, -BORDER)
    TABLE = fitz.make_table(where, rows=1, cols=COLS)

    # extract rectangles covered by images on this page
    IMG_RECTS = sorted(  # image rects on page (sort top-left to bottom-right)
        [fitz.Rect(item["bbox"]) for item in page.get_image_info()],
        key=lambda b: (b.y1, b.x0),
    )

    def free_cells(column):
        """Return free areas in this column."""
        free_stripes = []  # y-value pairs wrapping a free area stripe
        # intersecting images: block complete intersecting column stripe
        col_imgs = [(b.y0, b.y1) for b in IMG_RECTS if abs(b & column) > 0]
        s_y0 = column.y0  # top y-value of column
        for y0, y1 in col_imgs:  # an image stripe
            if y0 > s_y0 + FONTSIZE:  # image starts below last free btm value
                free_stripes.append((s_y0, y0))  # store as free stripe
            s_y0 = y1  # start of next free stripe

        if s_y0 + FONTSIZE < column.y1:  # enough room to column bottom
            free_stripes.append((s_y0, column.y1))

        if free_stripes == []:  # covers "no image in this column"
            free_stripes.append((column.y0, column.y1))

        # make available cells of this column
        CELLS = [fitz.Rect(column.x0, y0, column.x1, y1) for (y0, y1) in free_stripes]
        return CELLS

    # collection of available Story rectangles on page
    CELLS = []
    for i in range(COLS):
        CELLS.extend(free_cells(TABLE[0][i]))

    return page.number, prect, CELLS


HTML = myzip.read("quickfox.html").decode()

# --------------------------------------------------------------
# Make the Story object
# --------------------------------------------------------------
story = fitz.Story(HTML)

# modify the DOM somewhat
body = story.body  # access HTML body
body.set_properties(font="sans-serif")  # and give it our font globally

# modify certain nodes
para = body.find("p", None, None)  # find relevant nodes (here: paragraphs)
while para is not None:
    para.set_properties(  # method MUST be used for existing nodes
        indent=15,
        fontsize=FONTSIZE,
    )
    para = para.find_next("p", None, None)

# we remove all image references, because the target PDF already has them
img = body.find("img", None, None)
while img is not None:
    next_img = img.find_next("img", None, None)
    img.remove()
    img = next_img

page_info = {}  # contains MEDIABOX and free CELLS per page
doc = fitz.open(docname)
for page in doc:
    pno, mediabox, cells = analyze_page(page)
    page_info[pno] = (mediabox, cells)
doc.close()  # close target PDF for now - re-open later

fileobject = io.BytesIO()  # let DocumentWriter write to memory
writer = fitz.DocumentWriter(fileobject)  # define output writer

more = 1  # stop if this ever becomes zero
pno = 0  # count output pages
while more:  # loop until all HTML text has been written
    try:
        MEDIABOX, CELLS = page_info[pno]
    except KeyError:  # too much text space required: reduce fontsize?
        raise ValueError("text does not fit on target PDF")
    dev = writer.begin_page(MEDIABOX)  # prepare a new output page
    for cell in CELLS:  # iterate over free cells on this page
        if not more:  # need to check this for every cell
            continue
        more, _ = story.place(cell)
        story.draw(dev)
    writer.end_page()  # finish the PDF page
    pno += 1

writer.close()  # close DocumentWriter output

# Re-open writer output, read its pages and overlay target pages with them.
# The generated pages have same dimension as their targets.
src = fitz.open("pdf", fileobject)
doc = fitz.open(docname)  # re-open the target PDF
for page in doc:  # overlay every target page with the prepared text
    if page.number >= src.page_count:
        print(f"Text only uses {src.page_count} target pages!")
        continue  # story did not need all target pages?

    # overlay target page
    page.show_pdf_page(page.rect, src, page.number)

    # DEBUG start --- draw the text rectangles
    # mb, cells = page_info[page.number]
    # for cell in cells:
    #     page.draw_rect(cell, color=(1, 0, 0))
    # DEBUG stop ---

doc.ez_save(outname)
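
The core of free_cells() is a single pass over the vertical extents of the images in one column. The following plain-Python sketch isolates that step so it can be tried with a few numbers. It works on (y0, y1) tuples instead of rectangles, and the function name free_stripes and the sample values are chosen for this illustration only. Unlike the script, it returns an empty list when no stripe is tall enough, rather than falling back to the whole column.

```python
def free_stripes(col_top, col_bottom, blocked, min_height):
    """Return (y0, y1) pairs of free vertical stripes in one column.

    blocked: (y0, y1) pairs of the images intersecting the column,
             sorted top to bottom and assumed not to overlap each other.
    min_height: stripes that are not taller than this are skipped.
    """
    stripes = []
    top = col_top  # start of the current candidate stripe
    for y0, y1 in blocked:
        if y0 > top + min_height:  # enough room above this image
            stripes.append((top, y0))
        top = y1  # the next stripe starts below the image
    if top + min_height < col_bottom:  # enough room down to the column bottom
        stripes.append((top, col_bottom))
    return stripes


# column from y=36 to y=756, two images, minimum stripe height 12.5
print(free_stripes(36, 756, [(100, 200), (400, 500)], 12.5))
# [(36, 100), (200, 400), (500, 756)]
```

Each resulting pair, combined with the x-values of the column, gives one rectangle that can be passed to story.place().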


How to output a table

Support for HTML tables is not yet complete in MuPDF. It is, however, possible to output tables with equal column widths that do not cross page boundaries.

This script demonstrates what is currently supported.

Files:

  • docs/samples/table01.py

import fitz

table_text = (
    (
        "Length",
        "integer",
        """(Required) The number of bytes from the beginning of the line following the keyword stream to the last byte just before the keyword endstream. (There may be an additional EOL marker, preceding endstream, that is not included in the count and is not logically part of the stream data.) See “Stream Extent,” above, for further discussion.""",
    ),
    (
        "Filter",
        "name or array",
        """(Optional) The name of a filter to be applied in processing the stream data found between the keywords stream and endstream, or an array of such names. Multiple filters should be specified in the order in which they are to be applied.""",
    ),
    (
        "FFilter",
        "name or array",
        """(Optional; PDF 1.2) The name of a filter to be applied in processing the data found in the stream's external file, or an array of such names. The same rules apply as for Filter.""",
    ),
)

HTML = """
<html>
<body><h2>TABLE 3.4 Entries common to all stream dictionaries</h2>
<table style="width: 100%">
    <tr>
        <th class="w25">KEY
        <th class="w25">TYPE
        <th class="w50">VALUE
    </tr>
    <tr id="rowtemplate">
        <td id="col0" class="w25"></td>
        <td id="col1" class="w25"></td>
        <td id="col2" class="w50"></td>
    </tr>
"""
CSS = """
body {font-family: sans-serif;}
th {text-align: left;}
td {font-size: 8px;}
.w25 {width: 50px;}
.w50 {width: 300px;}
"""

story = fitz.Story(HTML, user_css=CSS)
body = story.body
template = body.find(None, "id", "rowtemplate")
parent = template.parent

for col0, col1, col2 in table_text:
    row = template.clone()
    row.find(None, "id", "col0").add_text("\n" + col0)
    row.find(None, "id", "col1").add_text("\n" + col1)
    row.find(None, "id", "col2").add_text("\n" + col2)
    parent.append_child(row)
template.remove()

writer = fitz.DocumentWriter(__file__.replace(".py", ".pdf"), "compress")
mediabox = fitz.paper_rect("letter")
where = mediabox + (36, 36, -36, -36)

more = 1
while more:
    dev = writer.begin_page(mediabox)
    more, filled = story.place(where)
    story.draw(dev, None)
    writer.end_page()
writer.close()
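
Cloning a template row via the DOM is one way to fill the table. When the cell contents are plain text, the rows could also be generated as HTML text before the Story is created. The following sketch is an alternative to the script above, not a description of it. The helper name make_rows and the sample row are hypothetical; html.escape from the Python standard library protects characters such as "<" and "&".

```python
import html


def make_rows(rows, classes=("w25", "w25", "w50")):
    """Return HTML <tr> elements for a sequence of 3-tuples of plain text."""
    lines = []
    for row in rows:
        cells = "".join(
            f'<td class="{cls}">{html.escape(text)}</td>'
            for cls, text in zip(classes, row)
        )
        lines.append(f"<tr>{cells}</tr>")
    return "\n".join(lines)


sample = (("Key", "Type", "a < b & c"),)  # invented sample row
print(make_rows(sample))
# <tr><td class="w25">Key</td><td class="w25">Type</td><td class="w50">a &lt; b &amp; c</td></tr>
```

The generated text would take the place of the template row in the HTML source before fitz.Story(HTML, user_css=CSS) is called.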


How to create a simple grid layout

Creating a sequence of Story objects and placing each one in a cell of a grid made with the make_table function produces grid layouts as required.

Files:

  • docs/samples/simple-grid.py

import fitz

MEDIABOX = fitz.paper_rect("letter")  # output page format: Letter
GRIDSPACE = fitz.Rect(100, 100, 400, 400)
GRID = fitz.make_table(GRIDSPACE, rows=2, cols=2)
CELLS = [GRID[i][j] for i in range(2) for j in range(2)]
text_table = ("A", "B", "C", "D")
writer = fitz.DocumentWriter(__file__.replace(".py", ".pdf"))  # create the writer

device = writer.begin_page(MEDIABOX)  # make new page
for i, text in enumerate(text_table):
    story = fitz.Story(em=1)
    body = story.body
    with body.add_paragraph() as para:
        para.set_bgcolor("#ecc")
        para.set_pagebreak_after()  # fills whole cell with bgcolor
        para.set_align("center")
        para.set_fontsize(16)
        para.add_text(f"\n\n\n{text}")
    story.place(CELLS[i])
    story.draw(device)
    del story

writer.end_page()  # finish page

writer.close()  # close output file
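
To see which rectangle each letter ends up in, the grid can be recomputed in plain Python. This sketch assumes that make_table divides the given rectangle into equally sized cells and returns them row by row, which is how the script indexes GRID[i][j]. The function name split_grid is made up for this illustration.

```python
def split_grid(rect, rows, cols):
    """Split (x0, y0, x1, y1) into rows x cols equal cells, row by row."""
    x0, y0, x1, y1 = rect
    width = (x1 - x0) / cols
    height = (y1 - y0) / rows
    return [
        [
            (x0 + j * width, y0 + i * height, x0 + (j + 1) * width, y0 + (i + 1) * height)
            for j in range(cols)
        ]
        for i in range(rows)
    ]


GRID = split_grid((100, 100, 400, 400), rows=2, cols=2)
CELLS = [GRID[i][j] for i in range(2) for j in range(2)]  # same order as the script
for text, cell in zip("ABCD", CELLS):
    print(text, cell)
# A (100.0, 100.0, 250.0, 250.0)
# B (250.0, 100.0, 400.0, 250.0)
# C (100.0, 250.0, 250.0, 400.0)
# D (250.0, 250.0, 400.0, 400.0)
```

The cells are visited left to right and then top to bottom, so "A" and "B" share the top row.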


How to generate a Table of Contents

This script lists the source code of all Python scripts that live in the script’s directory and generates a Table of Contents pointing to the start of each listing.

Files:

  • docs/samples/code-printer.py

"""
Demo script PyMuPDF Story class
-------------------------------

Read the Python sources in the script directory and create a PDF of all their
source codes.

The following features are included as a specialty:
1. HTML source for fitz.Story created via Python API exclusively
2. Separate Story objects for page headers and footers
3. Use of HTML "id" attributes for identifying source start pages
4. Generate a Table of Contents pointing to source file starts. This
   - uses the new Story callback feature
   - uses Story also for making the TOC page(s)

"""
import io
import os
import time

import fitz

THISDIR = os.path.dirname(os.path.abspath(__file__))
TOC = []  # this will contain the TOC list items
CURRENT_ID = ""  # currently processed filename - stored by recorder func
MEDIABOX = fitz.paper_rect("a4-l")  # chosen page size
WHERE = MEDIABOX + (36, 50, -36, -36)  # sub rectangle for source content
# location of the header rectangle
HDR_WHERE = (36, 5, MEDIABOX.width - 36, 40)
# location of the footer rectangle
FTR_WHERE = (36, MEDIABOX.height - 36, MEDIABOX.width - 36, MEDIABOX.height)


def recorder(elpos):
    """Callback function invoked during story.place().
    This function generates / collects all TOC items and updates the value of
    CURRENT_ID - which is used to update the footer line of each page.
    """
    global TOC, CURRENT_ID
    if not elpos.open_close & 1:  # only consider "open" items
        return
    level = elpos.heading
    y0 = elpos.rect[1]  # top of written rectangle (use for TOC)
    if level > 0:  # this is a header (h1 - h6)
        pno = elpos.page + 1  # the page number
        TOC.append(
            (
                level,
                elpos.text,
                pno,
                y0,
            )
        )
        return

    CURRENT_ID = elpos.id if elpos.id else ""  # update for footer line
    return


def header_story(text):
    """Make the page header"""
    header = fitz.Story()
    hdr_body = header.body
    hdr_body.add_paragraph().set_properties(
        align=fitz.TEXT_ALIGN_CENTER,
        bgcolor="#eee",
        font="sans-serif",
        bold=True,
        fontsize=12,
        color="green",
    ).add_text(text)
    return header


def footer_story(text):
    """Make the page footer"""
    footer = fitz.Story()
    ftr_body = footer.body
    ftr_body.add_paragraph().set_properties(
        bgcolor="#eee",
        align=fitz.TEXT_ALIGN_CENTER,
        color="blue",
        fontsize=10,
        font="sans-serif",
    ).add_text(text)
    return footer


def code_printer(outfile):
    """Output the generated PDF to outfile."""
    where = +WHERE  # work on a copy of the content rectangle
    writer = fitz.DocumentWriter(outfile, "")
    print_time = time.strftime("%Y-%m-%d %H:%M:%S (%z)")

    story = fitz.Story()
    body = story.body
    body.set_properties(font="sans-serif")

    text = f"Python sources in folder '{THISDIR}'"

    body.add_header(1).add_text(text)  # the only h1 item in the story

    files = os.listdir(THISDIR)  # list / select Python files in our directory
    i = 1
    for code_file in files:
        if not code_file.endswith(".py"):
            continue

        # read Python file source
        with open(os.path.join(THISDIR, code_file), "rb") as fileinput:
            text = fileinput.read().decode()

        # make level 2 header
        hdr = body.add_header(2)
        if i > 1:
            hdr.set_pagebreak_before()
        hdr.add_text(f"{i}. Listing of file '{code_file}'")

        # Write the file code
        body.add_codeblock().set_bgcolor((240, 255, 210)).set_color("blue").set_id(
            code_file
        ).set_fontsize(10).add_text(text)

        # Indicate end of a source file
        body.add_paragraph().set_align(fitz.TEXT_ALIGN_CENTER).add_text(
            f"---------- End of File '{code_file}' ----------"
        )
        i += 1  # update file counter

    i = 0
    while True:
        i += 1
        device = writer.begin_page(MEDIABOX)
        # create Story objects for header, footer and the rest.
        header = header_story(f"Python Files in '{THISDIR}'")
        hdr_ok, _ = header.place(HDR_WHERE)
        if hdr_ok != 0:
            raise ValueError("header does not fit")
        header.draw(device, None)

        # --------------------------------------------------------------
        # Write the file content.
        # --------------------------------------------------------------
        more, filled = story.place(where)
        # Inform the callback function
        # Args:
        #   recorder: the Python function to call
        #   {}: dictionary containing anything - we pass the page number
        story.element_positions(recorder, {"page": i - 1})
        story.draw(device, None)

        # --------------------------------------------------------------
        # Make / write page footer.
        # We MUST have a paragraph because of background color / alignment
        # --------------------------------------------------------------
        if CURRENT_ID:
            text = f"File '{CURRENT_ID}' printed at {print_time}{chr(160)*5}{'-'*10}{chr(160)*5}Page {i}"
        else:
            text = f"Printed at {print_time}{chr(160)*5}{'-'*10}{chr(160)*5}Page {i}"
        footer = footer_story(text)
        # write the page footer
        ftr_ok, _ = footer.place(FTR_WHERE)
        if ftr_ok != 0:
            raise ValueError("footer does not fit")
        footer.draw(device, None)

        writer.end_page()
        if more == 0:
            break
    writer.close()


if __name__ == "__main__" or os.environ.get('PYTEST_CURRENT_TEST'):
    fileptr1 = io.BytesIO()
    t0 = time.perf_counter()
    code_printer(fileptr1)  # make the PDF
    t1 = time.perf_counter()
    doc = fitz.open("pdf", fileptr1)
    old_count = doc.page_count
    # -----------------------------------------------------------------------------
    # Post-processing step to make / insert the toc
    # This also works using fitz.Story:
    # - make a new PDF in memory which contains pages with the TOC text
    # - add these TOC pages to the end of the original file
    # - search item text on the inserted pages and cover each with a PDF link
    # - move the TOC pages to the front of the document
    # -----------------------------------------------------------------------------
    story = fitz.Story()
    body = story.body
    body.add_header(1).set_font("sans-serif").add_text("Table of Contents")
    # prefix TOC with an entry pointing to this page
    TOC.insert(0, [1, "Table of Contents", old_count + 1, 36])

    for item in TOC[1:]:  # write the file name headers as TOC lines
        body.add_paragraph().set_font("sans-serif").add_text(
            item[1] + f" - ({item[2]})"
        )
    fileptr2 = io.BytesIO()  # put TOC pages to a separate PDF initially
    writer = fitz.DocumentWriter(fileptr2)
    i = 1
    more = 1
    while more:
        device = writer.begin_page(MEDIABOX)
        header = header_story(f"Python Files in '{THISDIR}'")
        # write the page header
        hdr_ok, _ = header.place(HDR_WHERE)
        header.draw(device, None)

        more, filled = story.place(WHERE)
        story.draw(device, None)

        footer = footer_story(f"TOC-{i}")  # separate page numbering scheme
        # write the page footer
        ftr_ok, _ = footer.place(FTR_WHERE)
        footer.draw(device, None)
        writer.end_page()
        i += 1

    writer.close()
    doc2 = fitz.open("pdf", fileptr2)  # open TOC pages as another PDF
    doc.insert_pdf(doc2)  # and append to the main PDF
    new_range = range(old_count, doc.page_count)  # the TOC page numbers
    pages = [doc[i] for i in new_range]  # these are the TOC pages within main PDF
    for item in TOC:  # search for TOC item text to get its rectangle
        for page in pages:
            rl = page.search_for(item[1], flags=~fitz.TEXT_PRESERVE_LIGATURES)
            if rl != []:  # found the item text on this page
                break
        rect = rl[0]  # rectangle of TOC item text
        link = {  # make a link from it
            "kind": fitz.LINK_GOTO,
            "from": rect,
            "to": fitz.Point(0, item[3]),
            "page": item[2] - 1,
        }
        page.insert_link(link)

    # insert the TOC in the main PDF
    doc.set_toc(TOC)
    # move all the TOC pages to the desired place (1st page here)
    for i in new_range:
        doc.move_page(doc.page_count - 1, 0)
    doc.ez_save(__file__.replace(".py", ".pdf"))

It features the following capabilities:

  • Automatic generation of a Table of Contents (TOC) on separately numbered pages at the start of the document - using a specialized Story.

  • Use of 3 separate Story objects per page: header story, footer story and the story for printing the Python sources.

    • The page footer is automatically changed to show the name of the current Python file.

  • Use of Story.element_positions() to collect the data for the TOC and for the dynamic adjustment of page footers. This is an example of bidirectional communication between the story output process and the script.

  • The main PDF with the Python sources is written to memory by its DocumentWriter. Another Story / DocumentWriter pair then creates an in-memory PDF for the TOC pages. Finally, the two PDFs are joined and the result is saved to disk.
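
The callback logic behind the capabilities listed above can be tried out without producing a PDF. The following is a minimal sketch, assuming stand-in objects built with types.SimpleNamespace that carry only the attributes the recorder reads (open_close, heading, text, id, rect, page). The sample values and the names state and fake_positions are invented for illustration; in the script, the real objects are delivered by Story.element_positions().

```python
from types import SimpleNamespace

# Collected results: TOC tuples and the id of the element seen last.
state = {"toc": [], "current_id": ""}


def recorder(elpos):
    """Collect headings for the TOC and remember the latest element id."""
    if not elpos.open_close & 1:  # skip "close" notifications
        return
    if elpos.heading > 0:  # h1 - h6: store level, title, page number, top y
        state["toc"].append(
            (elpos.heading, elpos.text, elpos.page + 1, elpos.rect[1])
        )
        return
    state["current_id"] = elpos.id or ""  # would drive the footer text


# Hypothetical positions, shaped like those the script hands to recorder().
fake_positions = [
    SimpleNamespace(open_close=1, heading=2, text="1. Listing of file 'a.py'",
                    id=None, rect=(36, 50, 800, 70), page=0),
    SimpleNamespace(open_close=1, heading=0, text="", id="a.py",
                    rect=(36, 70, 800, 500), page=0),
    SimpleNamespace(open_close=2, heading=0, text="", id="a.py",
                    rect=(36, 70, 800, 500), page=0),
]
for pos in fake_positions:
    recorder(pos)

print(state["toc"])
print(state["current_id"])
```

Keeping the collected data in one dictionary avoids module-level globals, which is a design choice of this sketch rather than of the original script.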


How to display a list from JSON data

This example takes JSON input data and uses it to populate a Story. It also applies some visual text formatting and shows how to add links.
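
The data handling in this recipe can be looked at separately from the layout. The following is a minimal sketch that parses one record in the recipe's input format and builds a map link from its "position" value. The helper name maps_url, the zoom parameter and the https:// prefix are assumptions made for illustration.

```python
import json

# One record in the same shape as the recipe's input data.
sample_json = """
[
    {
        "name": "Five-storied Pagoda",
        "temple": "Rurikō-ji",
        "position": "34.190181,131.472917"
    }
]
"""


def maps_url(position, zoom=14):
    """Build a map URL from a "latitude,longitude" string (format assumed)."""
    lat, lon = (part.strip() for part in position.split(","))
    return f"https://www.google.com/maps/@{lat},{lon},{zoom}z"


entries = json.loads(sample_json)  # a list holding one dict per entry
for entry in entries:
    for attribute, value in entry.items():
        if attribute == "position":
            print(maps_url(value))  # would become the link target
        else:
            print(f"{attribute}: {value}")  # would become a text paragraph
```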

Files:

  • docs/samples/json-example.py

import fitz
import json

my_json =  """
[
    {
         "name" :           "Five-storied Pagoda",
         "temple" :         "Rurikō-ji",
         "founded" :        "middle Muromachi period, 1442",
         "region" :         "Yamaguchi, Yamaguchi",
         "position" :       "34.190181,131.472917"
     },
     {
         "name" :           "Founder's Hall",
         "temple" :         "Eihō-ji",
         "founded" :        "early Muromachi period",
         "region" :         "Tajimi, Gifu",
         "position" :       "35.346144,137.129189"
     },
     {
         "name" :           "Fudōdō",
         "temple" :         "Kongōbu-ji",
         "founded" :        "early Kamakura period",
         "region" :         "Kōya, Wakayama",
         "position" :       "34.213103,135.580397"
     },
     {
         "name" :           "Goeidō",
         "temple" :         "Nishi Honganji",
         "founded" :        "Edo period, 1636",
         "region" :         "Kyoto",
         "position" :       "34.991394,135.751689"
     },
     {
         "name" :           "Golden Hall",
         "temple" :         "Murō-ji",
         "founded" :        "early Heian period",
         "region" :         "Uda, Nara",
         "position" :       "34.536586819357986,136.0395548452301"
     },
     {
         "name" :           "Golden Hall",
         "temple" :         "Fudō-in",
         "founded" :        "late Muromachi period, 1540",
         "region" :         "Hiroshima",
         "position" :       "34.427014,132.471117"
     },
     {
         "name" :           "Golden Hall",
         "temple" :         "Ninna-ji",
         "founded" :        "Momoyama period, 1613",
         "region" :         "Kyoto",
         "position" :       "35.031078,135.713811"
     },
     {
         "name" :           "Golden Hall",
         "temple" :         "Mii-dera",
         "founded" :        "Momoyama period, 1599",
         "region" :         "Ōtsu, Shiga",
         "position" :       "35.013403,135.852861"
     },
     {
         "name" :           "Golden Hall",
         "temple" :         "Tōshōdai-ji",
         "founded" :        "Nara period, 8th century",
         "region" :         "Nara, Nara",
         "position" :       "34.675619,135.784842"
     },
     {
         "name" :           "Golden Hall",
         "temple" :         "Tō-ji",
         "founded" :        "Momoyama period, 1603",
         "region" :         "Kyoto",
         "position" :       "34.980367,135.747686"
     },
     {
         "name" :           "Golden Hall",
         "temple" :         "Tōdai-ji",
         "founded" :        "middle Edo period, 1705",
         "region" :         "Nara, Nara",
         "position" :       "34.688992,135.839822"
     },
     {
         "name" :           "Golden Hall",
         "temple" :         "Hōryū-ji",
         "founded" :        "Asuka period, by 693",
         "region" :         "Ikaruga, Nara",
         "position" :       "34.614317,135.734458"
     },
     {
         "name" :           "Golden Hall",
         "temple" :         "Daigo-ji",
         "founded" :        "late Heian period",
         "region" :         "Kyoto",
         "position" :       "34.951481,135.821747"
     },
     {
         "name" :           "Keigū-in Main Hall",
         "temple" :         "Kōryū-ji",
         "founded" :        "early Kamakura period, before 1251",
         "region" :         "Kyoto",
         "position" :       "35.015028,135.705425"
     },
     {
         "name" :           "Konpon-chūdō",
         "temple" :         "Enryaku-ji",
         "founded" :        "early Edo period, 1640",
         "region" :         "Ōtsu, Shiga",
         "position" :       "35.070456,135.840942"
     },
     {
         "name" :           "Korō",
         "temple" :         "Tōshōdai-ji",
         "founded" :        "early Kamakura period, 1240",
         "region" :         "Nara, Nara",
         "position" :       "34.675847,135.785069"
     },
     {
         "name" :           "Kōfūzō",
         "temple" :         "Hōryū-ji",
         "founded" :        "early Heian period",
         "region" :         "Ikaruga, Nara",
         "position" :       "34.614439,135.735428"
     },
     {
         "name" :           "Large Lecture Hall",
         "temple" :         "Hōryū-ji",
         "founded" :        "middle Heian period, 990",
         "region" :         "Ikaruga, Nara",
         "position" :       "34.614783,135.734175"
     },
     {
         "name" :           "Lecture Hall",
         "temple" :         "Zuiryū-ji",
         "founded" :        "early Edo period, 1655",
         "region" :         "Takaoka, Toyama",
         "position" :       "36.735689,137.010019"
     },
     {
         "name" :           "Lecture Hall",
         "temple" :         "Tōshōdai-ji",
         "founded" :        "Nara period, 763",
         "region" :         "Nara, Nara",
         "position" :       "34.675933,135.784842"
     },
     {
         "name" :           "Lotus Flower Gate",
         "temple" :         "Tō-ji",
         "founded" :        "early Kamakura period",
         "region" :         "Kyoto",
         "position" :       "34.980678,135.746314"
     },
     {
         "name" :           "Main Hall",
         "temple" :         "Akishinodera",
         "founded" :        "early Kamakura period",
         "region" :         "Nara, Nara",
         "position" :       "34.703769,135.776189"
     }
]

"""

# the result is a Python list of dictionaries (one per JSON object):
my_dict = json.loads(my_json)

MEDIABOX = fitz.paper_rect("letter")  # output page format: Letter
WHERE = MEDIABOX + (36, 36, -36, -36)
writer = fitz.DocumentWriter("json-example.pdf")  # create the writer

story = fitz.Story()
body = story.body

for i, entry in enumerate(my_dict):

    for attribute, value in entry.items():
        para = body.add_paragraph()

        if attribute == "position":
            para.set_fontsize(10)
            para.add_link(f"https://www.google.com/maps/@{value},14z")
        else:
            para.add_span()
            para.set_color("#990000")
            para.set_fontsize(14)
            para.set_bold()
            para.add_text(f"{attribute} ")
            para.add_span()
            para.set_fontsize(18)
            para.add_text(f"{value}")

    body.add_horizontal_line()

# story.place() returns `more` as 0 once all story content has been written,
# and as 1 while content is still waiting to be placed.
# Loop until it is 0.
more = 1
while more:
    device = writer.begin_page(MEDIABOX)  # make new page
    more, _ = story.place(WHERE)
    story.draw(device)
    writer.end_page()  # finish page

writer.close()  # close output file

del story


Using the alternative Story.write*() functions

The Story.write*() functions offer a different way to use the Story functionality: the calling code no longer needs its own loop around Story.place() and Story.draw(), but it must provide at least a rectfn() callback.
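
To make the callback contract concrete, here is a minimal sketch of a rectfn() that yields two columns per page. It uses plain tuples so that it runs without PyMuPDF; in a real script the page box and the rectangle would be fitz.Rect objects, as in the recipes that follow. The page size, the margin and the name two_column_rectfn are assumptions chosen for illustration.

```python
PAGE_W, PAGE_H, MARGIN = 600, 800, 50  # illustrative values, in points


def two_column_rectfn(rect_num, filled):
    """Return (mediabox, rect, ctm) for the rect_num-th rectangle.

    A mediabox is only returned for the first rectangle of a page, which
    signals that a new page must be started; otherwise it is None.
    """
    col_w = (PAGE_W - 3 * MARGIN) / 2
    column = rect_num % 2  # 0 = left column, 1 = right column
    mediabox = (0, 0, PAGE_W, PAGE_H) if column == 0 else None
    x0 = MARGIN + column * (col_w + MARGIN)
    rect = (x0, MARGIN, x0 + col_w, PAGE_H - MARGIN)
    return mediabox, rect, None  # None: no extra transformation matrix


for n in range(4):
    print(n, two_column_rectfn(n, None))
```

The filled argument is ignored here; a callback could inspect it to learn how much of the previous rectangle was actually used.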

How to do basic layout with Story.write()

This script lays out multiple copies of its own source code into four rectangles per page.

Files:

  • docs/samples/story-write.py

"""
Demo script for PyMuPDF's `Story.write()` method.

This is a way of laying out a story into a PDF document, that avoids the need
to write a loop that calls `story.place()` and `story.draw()`.

Instead just a single function call is required, albeit with a `rectfn()`
callback that returns the rectangles into which the story is placed.
"""

import html

import fitz


# Create html containing multiple copies of our own source code.
#
with open(__file__) as f:
    text = f.read()
text = html.escape(text)
html = f'''
<!DOCTYPE html>
<body>

<h1>Contents of {__file__}</h1>

<h2>Normal</h2>
<pre>
{text}
</pre>

<h2>Strong</h2>
<strong>
<pre>
{text}
</pre>
</strong>

<h2>Em</h2>
<em>
<pre>
{text}
</pre>
</em>

</body>
'''


def rectfn(rect_num, filled):
    '''
    We return four rectangles per page in this order:
    
        1 3
        2 4
    '''
    page_w = 800
    page_h = 600
    margin = 50
    rect_w = (page_w - 3*margin) / 2
    rect_h = (page_h - 3*margin) / 2
    
    if rect_num % 4 == 0:
        # New page.
        mediabox = fitz.Rect(0, 0, page_w, page_h)
    else:
        mediabox = None
    # Return one of four rects in turn.
    rect_x = margin + (rect_w+margin) * ((rect_num // 2) % 2)
    rect_y = margin + (rect_h+margin) * (rect_num % 2)
    rect = fitz.Rect(rect_x, rect_y, rect_x + rect_w, rect_y + rect_h)
    #print(f'rectfn(): rect_num={rect_num} filled={filled}. Returning: rect={rect}')
    return mediabox, rect, None

story = fitz.Story(html, em=8)

out_path = __file__.replace('.py', '.pdf')
writer = fitz.DocumentWriter(out_path)

story.write(writer, rectfn)
writer.close()


How to do iterative layout for a table of contents with Story.write_stabilized()

This script creates html content dynamically, adding a contents section based on ElementPosition items that have non-zero .heading values.

The contents section is at the start of the document, so modifications to the contents can change page numbers in the rest of the document, which in turn can cause page numbers in the contents section to be incorrect.

So the script uses Story.write_stabilized() to lay out the document repeatedly until the page numbers stop changing.
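
The idea of laying out repeatedly until nothing changes can be illustrated with a toy model that needs no PDF library. The sketch below assumes a "page" is simply a fixed number of text lines, and that the contents block lists only the headings found in the previous pass. All names and numbers are invented for illustration, and this is not how the library performs its layout.

```python
LINES_PER_PAGE = 5  # toy page capacity, an assumption for this sketch

# Body of the toy document: (heading title, total lines incl. the heading).
SECTIONS = [("First", 4), ("Second", 7), ("Third", 3)]


def build_content(heading_pages):
    """Return all lines: a contents block followed by the sections."""
    lines = ["Contents"]
    for title, page in heading_pages.items():  # only headings known so far
        lines.append(f"{title} ... page {page}")
    for title, length in SECTIONS:
        lines.append(f"# {title}")
        lines.extend(f"{title} text {i}" for i in range(length - 1))
    return lines


def layout(lines):
    """Place lines on pages and report the page of every heading."""
    pages = {}
    for index, line in enumerate(lines):
        if line.startswith("# "):
            pages[line[2:]] = index // LINES_PER_PAGE + 1
    return pages


heading_pages = {}
for iteration in range(1, 11):  # hard limit guards against oscillation
    new_pages = layout(build_content(heading_pages))
    if new_pages == heading_pages:
        break  # the contents block now matches the layout
    heading_pages = new_pages

print(iteration, heading_pages)
```

With these toy numbers, the first pass has no contents entries, the second pass adds them and thereby pushes the last heading to a later page, and the third pass confirms that the page numbers no longer change.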

Files:

  • docs/samples/story-write-stabilized.py

"""
Demo script for PyMuPDF's `fitz.Story.write_stabilized()`.

`fitz.Story.write_stabilized()` is similar to `fitz.Story.write()`,
except instead of taking a fixed html document, it does iterative layout
of dynamically-generated html content (provided by a callback) to a
`fitz.DocumentWriter`.

For example this allows one to add a dynamically-generated table of contents
section while ensuring that page numbers are patched up until stable.
"""

import textwrap

import fitz


def rectfn(rect_num, filled):
    '''
    We return one rect per page.
    '''
    rect = fitz.Rect(10, 20, 290, 380)
    mediabox = fitz.Rect(0, 0, 300, 400)
    #print(f'rectfn(): rect_num={rect_num} filled={filled}')
    return mediabox, rect, None


def contentfn(positions):
    '''
    Returns html content, with a table of contents derived from `positions`.
    '''
    ret = ''
    ret += textwrap.dedent('''
            <!DOCTYPE html>
            <body>
            <h2>Contents</h2>
            <ul>
            ''')
    
    # Create table of contents with links to all <h1..6> sections in the
    # document.
    for position in positions:
        if position.heading and (position.open_close & 1):
            text = position.text if position.text else ''
            if position.id:
                ret += f"    <li><a href=\"#{position.id}\">{text}</a>\n"
            else:
                ret += f"    <li>{text}\n"
            ret += f"        <ul>\n"
            ret += f"        <li>page={position.page_num}\n"
            ret += f"        <li>depth={position.depth}\n"
            ret += f"        <li>heading={position.heading}\n"
            ret += f"        <li>id={position.id!r}\n"
            ret += f"        <li>href={position.href!r}\n"
            ret += f"        <li>rect={position.rect}\n"
            ret += f"        <li>text={text!r}\n"
            ret += f"        <li>open_close={position.open_close}\n"
            ret += f"        </ul>\n"
    
    ret += '</ul>\n'
    
    # Main content.
    ret += textwrap.dedent(f'''
    
            <h1>First section</h1>
            <p>Contents of first section.
            
            <h1>Second section</h1>
            <p>Contents of second section.
            <h2>Second section first subsection</h2>
            
            <p>Contents of second section first subsection.
            
            <h1>Third section</h1>
            <p>Contents of third section.
            
            </body>
            ''')
    ret = ret.strip()
    with open(__file__.replace('.py', '.html'), 'w') as f:
        f.write(ret)
    return ret


out_path = __file__.replace('.py', '.pdf')
writer = fitz.DocumentWriter(out_path)
fitz.Story.write_stabilized(writer, contentfn, rectfn)
writer.close()


How to do iterative layout and create PDF links with Story.write_stabilized_with_links()

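This script is similar to the one described in “How to do iterative layout for a table of contents with Story.write_stabilized()” above, except that the generated PDF also contains links that correspond to the internal links in the original html.

This is done by using Story.write_stabilized_with_links(), which differs slightly from Story.write_stabilized(): as the script below shows, it takes no DocumentWriter argument and instead returns a PDF Document, which the script then saves.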

[The reasons for this are a little involved; for example a DocumentWriter is not necessarily a PDF writer, so doesn’t really work in a PDF-specific API.]
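
As a conceptual illustration only, the sketch below shows how internal "#name" references could be matched to the elements carrying that id once a layout pass has reported every element's page and rectangle. It uses types.SimpleNamespace stand-ins with invented values and handles id targets only. The function name resolve_internal_links and the result dictionary keys are hypothetical, and this is not the library's implementation.

```python
from types import SimpleNamespace

# Hypothetical element positions, as a layout pass might report them.
positions = [
    SimpleNamespace(id=None, href="#idtest", page_num=1, rect=(10, 60, 120, 72)),
    SimpleNamespace(id="idtest", href=None, page_num=2, rect=(10, 150, 290, 162)),
    SimpleNamespace(id=None, href="#missing", page_num=1, rect=(10, 80, 120, 92)),
]


def resolve_internal_links(positions):
    """Pair every '#name' href with the position of the element so named."""
    targets = {pos.id: pos for pos in positions if pos.id}
    links = []
    for pos in positions:
        if not pos.href or not pos.href.startswith("#"):
            continue  # not an internal link
        target = targets.get(pos.href[1:])
        if target is None:
            continue  # dangling link: nothing to point at
        links.append({
            "from_page": pos.page_num,
            "from_rect": pos.rect,
            "to_page": target.page_num,
            "to_y": target.rect[1],  # top of the target element
        })
    return links


print(resolve_internal_links(positions))
```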

Files:

  • docs/samples/story-write-stabilized-links.py

"""
Demo script for PyMuPDF's `fitz.Story.write_stabilized_with_links()`.

`fitz.Story.write_stabilized_with_links()` is similar to
`fitz.Story.write_stabilized()` except that it creates a PDF `fitz.Document`
that contains PDF links generated from all internal links in the original html.
"""

import textwrap

import fitz


def rectfn(rect_num, filled):
    '''
    We return one rect per page.
    '''
    rect = fitz.Rect(10, 20, 290, 380)
    mediabox = fitz.Rect(0, 0, 300, 400)
    #print(f'rectfn(): rect_num={rect_num} filled={filled}')
    return mediabox, rect, None


def contentfn(positions):
    '''
    Returns html content, with a table of contents derived from `positions`.
    '''
    ret = ''
    ret += textwrap.dedent('''
            <!DOCTYPE html>
            <body>
            <h2>Contents</h2>
            <ul>
            ''')
    
    # Create table of contents with links to all <h1..6> sections in the
    # document.
    for position in positions:
        if position.heading and (position.open_close & 1):
            text = position.text if position.text else ''
            if position.id:
                ret += f"    <li><a href=\"#{position.id}\">{text}</a>\n"
            else:
                ret += f"    <li>{text}\n"
            ret += f"        <ul>\n"
            ret += f"        <li>page={position.page_num}\n"
            ret += f"        <li>depth={position.depth}\n"
            ret += f"        <li>heading={position.heading}\n"
            ret += f"        <li>id={position.id!r}\n"
            ret += f"        <li>href={position.href!r}\n"
            ret += f"        <li>rect={position.rect}\n"
            ret += f"        <li>text={text!r}\n"
            ret += f"        <li>open_close={position.open_close}\n"
            ret += f"        </ul>\n"
    
    ret += '</ul>\n'
    
    # Main content.
    ret += textwrap.dedent(f'''
    
            <h1>First section</h1>
            <p>Contents of first section.
            <ul>
            <li><a href="#idtest">Link to IDTEST</a>.
            <li><a href="#nametest">Link to NAMETEST</a>.
            </ul>
            
            <h1>Second section</h1>
            <p>Contents of second section.
            <h2>Second section first subsection</h2>
            
            <p>Contents of second section first subsection.
            <p id="idtest">IDTEST
            
            <h1>Third section</h1>
            <p>Contents of third section.
            <p><a name="nametest">NAMETEST</a>.
            
            </body>
            ''')
    ret = ret.strip()
    with open(__file__.replace('.py', '.html'), 'w') as f:
        f.write(ret)
    return ret


out_path = __file__.replace('.py', '.pdf')
document = fitz.Story.write_stabilized_with_links(contentfn, rectfn)
document.save(out_path)


Footnotes

1

HTML & CSS support

Note

At the time of writing, the HTML engine for Stories is fairly basic and supports only a subset of CSS2 attributes.

Some important CSS support to consider:

  • The only available layout is relative layout.

  • background is unavailable, use background-color instead.

  • float is unavailable.
