Story

  • New in v1.21.0

Method / Attribute                      Short Description
Story.reset()                           “rewind” story output to its beginning
Story.place()                           compute story content to fit in provided rectangle
Story.draw()                            write the computed content to current page
Story.element_positions()               callback function logging currently processed story content
Story.body                              the story’s underlying body
Story.write()                           places and draws Story to a DocumentWriter
Story.write_stabilized()                iterative layout of html content to a DocumentWriter
Story.write_with_links()                like write() but also creates PDF links
Story.write_stabilized_with_links()     like write_stabilized() but also creates PDF links

Class API

class Story
__init__(self, html=None, user_css=None, em=12, archive=None)

Create a story, optionally providing HTML and CSS source. The HTML is parsed, and held within the Story as a DOM (Document Object Model).

This structure may be modified: content (text, images) may be added, copied, modified or removed by using methods of the Xml class.

When finished, the story can be written to any device; in typical usage the device may be provided by a DocumentWriter to make new pages.

Here are some general remarks:

  • The Story constructor parses and validates the provided HTML to create the DOM.

  • PyMuPDF provides a number of ways to manipulate the HTML source by providing access to the nodes of the underlying DOM. Documents can be completely built from ground up programmatically, or the existing DOM can be modified pretty arbitrarily. For details of this interface, please see the Xml class.

  • If no (or no more) changes to the DOM are required, the story is ready to be laid out and to be fed to a series of devices (typically devices provided by a DocumentWriter to produce new pages).

  • The next step is to place the story and write it out. This can either be done directly, by looping around calls to place() and draw(), or the looping can be handled for you by the write() or write_stabilized() methods. Which approach you choose is largely a matter of taste.

    • To work in the first of these styles, the following loop should be used:

      1. Obtain a suitable device to write to; typically by requesting a new, empty page from a DocumentWriter.

      2. Determine one or more rectangles on the page, that should receive story data. Note that not every page needs to have the same set of rectangles.

      3. Pass each rectangle to the story to place it, learning what part of that rectangle has been filled, and whether there is more story data that did not fit. This step can be repeated several times with adjusted rectangles until the caller is happy with the results.

      4. Optionally, at this point, we can request details of where interesting items have been placed, by calling the element_positions() method. Items are deemed to be interesting if their integer heading attribute is non-zero (corresponding to HTML tags h1 - h6), if their id attribute is not None (corresponding to the HTML attribute id), or if their href attribute is not None (corresponding to the HTML attribute href). This can conveniently be used for automatic generation of a Table of Contents, an index of images or the like.

      5. Next, draw that rectangle out to the device with the draw() method.

      6. If the most recent call to place() indicated that all the story data had fitted, stop now.

      7. Otherwise, we can loop back. If there are more rectangles to be placed on the current device (page), we jump back to step 3 - if not, we jump back to step 1 to get a new device.

    • Alternatively, in the case where you are using a DocumentWriter, the write() or write_stabilized() methods can be used. These handle all the looping for you, in exchange for being provided with callbacks that control the behaviour (notably a callback that enumerates the rectangles/pages to use).

  • Which part of the story will land on which rectangle / which page, is fully under control of the Story object and cannot be predicted.

  • Images may be part of a story. They will be placed together with any surrounding text.

  • Multiple stories may - independently from each other - write to the same page. For example, one may have separate stories for page header, page footer, regular text, comment boxes, etc.

Parameters
  • html (str) – HTML source code. If omitted, a basic minimum is generated (see below). If provided, it need not be a complete HTML document: the built-in parser forgives many HTML syntax errors and also accepts HTML fragments like “<b>Hello, <i>World!</i></b>”.

  • user_css (str) – CSS source code. If provided, must contain valid CSS specifications.

  • em (float) – the default text font size.

  • archive

    an Archive from which to load resources for rendering. Currently supported resource types are images and text fonts. If omitted, the story will not try to look up any such data and may thus produce incomplete output.

    Note

    Instead of an actual archive, valid arguments for creating an Archive can also be provided, in which case an archive will be constructed temporarily. So, instead of story = fitz.Story(archive=fitz.Archive("myfolder")), one can write the shorter story = fitz.Story(archive="myfolder").

place(where)

Calculate the part of the story’s content that will fit in the provided rectangle. The method maintains a pointer to the part of the story’s content that has already been written, and the next invocation resumes from that pointer’s position.

Parameters

where (rect_like) – lay out the current part of the content to fit into this rectangle. This must be a sub-rectangle of the page’s MediaBox.

Return type

tuple[bool, rect_like]

Returns

a bool (int) more and a rectangle filled. If more == 0, all content of the story has been written; otherwise, more content is waiting to be written to subsequent rectangles / pages. The rectangle filled is the part of where that has actually been filled.

draw(dev, matrix=None)

Write the content part prepared by Story.place() to the page.

Parameters
  • dev – the Device created by dev = writer.begin_page(mediabox). The device knows how to call all MuPDF functions needed to write the content.

  • matrix (matrix_like) – a matrix for transforming content when writing to the page. An example may be writing rotated text. The default means no transformation (i.e. the Identity matrix).

element_positions(function, args=None)

Let the Story provide positioning information about certain HTML elements once their place on the current page has been computed - i.e. invoke this method directly after Story.place().

Parameters
  • function – a Python callback function taking an ElementPosition instance, which will be invoked by this method to process positioning information.

  • args (dict) – an optional dictionary with any additional information that should be added to the ElementPosition instance passed to function, for example the current output page number. Every key in this dictionary must be a string that conforms to the rules for a valid Python identifier. The complete set of information is explained below.

reset()

Rewind the story to its beginning so that its output can start over.

body

The body part of the story’s DOM. Even if html=None has been used at story creation, the following minimum HTML source will always be available:

<html>
    <head></head>
    <body></body>
</html>

This attribute contains the Xml node of body. All relevant content for PDF production is contained between “<body>” and “</body>”.

write(writer, rectfn, positionfn=None, pagefn=None)

Places and draws Story to a DocumentWriter. Avoids the need for calling code to implement a loop that calls Story.place() and Story.draw() etc, at the expense of having to provide at least the rectfn() callback.

Parameters
  • writer – a DocumentWriter or None.

  • rectfn

    a callable taking (rect_num: int, filled: Rect) and returning (mediabox, rect, ctm):

    mediabox:

    None or rect for new page.

    rect:

    The next rect into which content should be placed.

    ctm:

    None or a Matrix.

  • positionfn

    None, or a callable taking (position: ElementPosition):

    position:

    An ElementPosition with an extra .page_num member.

    Typically called multiple times as we generate elements that are headings or have an id.

  • pagefn – None, or a callable taking (page_num, mediabox, dev, after); called at start (after=0) and end (after=1) of each page.

static write_stabilized(writer, contentfn, rectfn, user_css=None, em=12, positionfn=None, pagefn=None, archive=None, add_header_ids=True)

Static method that does iterative layout of html content to a DocumentWriter.

For example this allows one to add a table of contents section while ensuring that page numbers are patched up until stable.

Repeatedly creates a new Story from (contentfn(), user_css, em, archive) and lays it out with an internal call to Story.write(); uses a None writer and extracts the list of ElementPosition objects, which is passed to the next call of contentfn().

When the html from contentfn() becomes unchanged, we do a final iteration using writer.

Parameters
  • writer – A DocumentWriter.

  • contentfn – A function taking a list of ElementPositions and returning a string containing html. The returned html can depend on the list of positions, for example with a table of contents near the start.

  • rectfn

    A callable taking (rect_num: int, filled: Rect) and returning (mediabox, rect, ctm):

    mediabox:

    None or rect for new page.

    rect:

    The next rect into which content should be placed.

    ctm:

    A Matrix.

  • pagefn – None, or a callable taking (page_num, mediabox, dev, after); called at start (after=0) and end (after=1) of each page.

  • archive

    an Archive passed to each Story that is created, as described for the Story constructor.

  • add_header_ids – If true, we add unique ids to all header tags that don’t already have an id. This can help automatic generation of tables of contents.

Returns:

None.
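A contentfn for write_stabilized() might prepend a table of contents built from the positions of the previous iteration. The sketch below is plain Python; the sample body, the markup of the contents list and the stand-in position object are assumptions used only to show what the function returns.

```python
from types import SimpleNamespace

BODY = (
    "<h1 id='intro'>Introduction</h1><p>First chapter.</p>"
    "<h1 id='usage'>Usage</h1><p>Second chapter.</p>"
)

def contentfn(positions):
    """Return the document HTML, preceded by a contents list built from positions."""
    items = []
    for pos in positions:
        # Consider headings only, counted once (when the element is opened).
        if pos.heading and (pos.open_close & 1):
            target = f' href="#{pos.id}"' if pos.id else ""
            items.append(f"<li><a{target}>{pos.text} (page {pos.page_num})</a></li>")
    if not items:
        return BODY                      # first iteration: no positions known yet
    return "<p><b>Contents</b></p><ul>" + "".join(items) + "</ul>" + BODY

# Stand-in position object, only to show the output without running a layout.
fake = SimpleNamespace(heading=1, open_close=1, id="intro",
                       text="Introduction", page_num=1)
first_pass = contentfn([])
second_pass = contentfn([fake])
```

With PyMuPDF available, such a function could be passed as fitz.Story.write_stabilized(writer, contentfn, rectfn); the layout is repeated until the returned HTML no longer changes.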

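write_with_links(rectfn, positionfn=None, pagefn=None)

Similar to write(), except that there is no writer argument; instead, a PDF Document is returned in which links have been created for each internal HTML link.

static write_stabilized_with_links(contentfn, rectfn, user_css=None, em=12, positionfn=None, pagefn=None, archive=None, add_header_ids=True)

Similar to write_stabilized(), except that there is no writer argument; instead, a PDF Document is returned in which links have been created for each internal HTML link.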

Element Positioning CallBack function

The callback function can be used to log information about story output. The function’s access to the information is read-only: it has no way to influence the story’s output.

A typical loop for executing a story using this method would look like this:

import fitz

HTML = """
<html>
    <head></head>
    <body>
        <h1>Header level 1</h1>
        <h2>Header level 2</h2>
        <p>Hello MuPDF!</p>
    </body>
</html>
"""

def recorder(elpos):  # must be defined before the loop that uses it
    pass  # inspect or store the ElementPosition here

MEDIABOX = fitz.paper_rect("letter")  # size of a page
WHERE = MEDIABOX + (36, 36, -36, -36)  # leave borders of 0.5 inches
story = fitz.Story(html=HTML)  # make the story
writer = fitz.DocumentWriter("test.pdf")  # make the writer
pno = 0  # current page number
more = 1  # will be set to 0 when done
while more:  # loop until all story content is processed
    dev = writer.begin_page(MEDIABOX)  # make a device to write on the page
    more, filled = story.place(WHERE)  # compute content positions on page
    story.element_positions(recorder, {"page": pno})  # provide page number in addition
    story.draw(dev)  # write the placed content to the page
    writer.end_page()  # finish the page
    pno += 1  # increase page number
writer.close()  # close output file

Attributes of the ElementPosition class

The parameter passed to the recorder function is an object with the following attributes:

  • elpos.depth (int) – depth of this element in the box structure.

  • elpos.heading (int) – the header level, 0 if no header, 1-6 for h1 - h6.

  • elpos.href (str) – value of the href attribute, or None if not defined.

  • elpos.id (str) – value of the id attribute, or None if not defined.

  • elpos.rect (tuple) – element position on page.

  • elpos.text (str) – immediate text of the element.

  • elpos.open_close (int bit field) – bit 0 set: opens element, bit 1 set: closes element. Relevant for elements that may contain other elements and thus may not immediately be closed after being created / opened.

  • elpos.rect_num (int) – count of rectangles filled by the story so far.

  • elpos.page_num (int) – page number; only present when using fitz.Story.write*() functions.
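The attributes above, in particular the open_close bit field, can be decoded as in the following sketch. It is plain Python; the helper name describe, the constant names and the stand-in object are hypothetical and not part of PyMuPDF.

```python
from types import SimpleNamespace

OPENS, CLOSES = 1, 2   # bit 0 and bit 1 of elpos.open_close

def describe(elpos):
    """Return a one-line summary of an ElementPosition-like object."""
    events = []
    if elpos.open_close & OPENS:
        events.append("open")
    if elpos.open_close & CLOSES:
        events.append("close")
    kind = f"h{elpos.heading}" if elpos.heading else "element"
    return f"{kind} [{'+'.join(events)}] id={elpos.id!r} rect={tuple(elpos.rect)}"

# Stand-in object with the same attribute names, for demonstration only.
sample = SimpleNamespace(heading=2, open_close=3, id="sec",
                         rect=(36.0, 50.0, 200.0, 70.0))
summary = describe(sample)
print(summary)
```

A function like describe() could be called from inside a recorder callback to log each reported element.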
