pdfreader.viewer submodule

class pdfreader.viewer.SimplePDFViewer(*args, **kwargs)
Simple PDF document interpreter (viewer).
  • uses PDFDocument as document navigation engine
  • renders document page content onto SimpleCanvas
  • maintains a graphics state

On initialization, the viewer automatically navigates to the first page.

Parameters:
  • fobj – file-like object: binary file descriptor, BytesIO stream etc.
  • password – Optional. Password to access PDF content.
current_page_number

The current page number. Pages are numbered from 1.

gss

Reflects the current graphics state. GraphicsStateStack instance.

canvas

Current page canvas. SimpleCanvas instance.

resources

Current page resources. Resources instance.

render()

Renders the current page onto the current canvas by interpreting the commands of its content stream(s). Side-effects: changes the graphics state and the canvas.

navigate(n)

Navigates the viewer to the n-th page of the document. Side-effects: clears canvas, resets page resources, resets graphics state.

Parameters: n – page number. The very first page has number 1.
Raises: PageDoesNotExist – if there is no n-th page
next()

Navigates the viewer to the next page of the document. Side-effects: clears canvas, resets page resources, resets graphics state.

Raises: PageDoesNotExist – if there is no next page
prev()

Navigates the viewer to the previous page of the document. Side-effects: clears canvas, resets page resources, resets graphics state.

Raises: PageDoesNotExist – if there is no previous page
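The navigation methods above can be combined into a simple page walk. The following is a minimal sketch, not part of the library: the helper name render_all_pages is hypothetical, and the fallback PageDoesNotExist class exists only so the snippet loads when pdfreader is not installed. It uses only the members documented here (render(), next(), current_page_number, canvas.strings).

```python
try:
    from pdfreader.viewer import PageDoesNotExist
except ImportError:
    # Assumption for illustration only: a stand-in so the sketch still
    # loads on a machine without pdfreader installed.
    class PageDoesNotExist(Exception):
        pass


def render_all_pages(viewer):
    """Render each page from the current one onward.

    Returns a dict mapping page number -> list of decoded strings.
    (Hypothetical helper, not a pdfreader function.)
    """
    pages = {}
    while True:
        viewer.render()  # interprets the content stream and fills viewer.canvas
        pages[viewer.current_page_number] = list(viewer.canvas.strings)
        try:
            viewer.next()  # clears canvas, resets resources and graphics state
        except PageDoesNotExist:
            break  # the last page has been processed
    return pages
```

With a real document, the viewer would be created from a file opened in binary mode and passed to SimplePDFViewer before calling the helper. Because next() clears the canvas, the strings are copied before navigating away.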
__iter__()

Returns document’s canvas iterator.

iter_pages()

Returns document’s pages iterator.
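Since __iter__() returns a canvas iterator, a viewer can be used directly in a for loop. A short sketch under that reading, assuming each yielded canvas is already rendered and exposes text_content; the helper name page_texts is hypothetical.

```python
def page_texts(viewer):
    """Return text_content for every canvas the viewer yields, in iteration order.

    (Hypothetical helper; it relies only on the viewer being iterable
    and on each yielded canvas having a text_content attribute.)
    """
    texts = []
    for canvas in viewer:
        texts.append(canvas.text_content)
    return texts
```

The value is read inside the loop rather than keeping references to the canvases, in case the viewer reuses or clears a canvas when it moves to the next page.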

class pdfreader.viewer.SimpleCanvas

Very simple canvas for PDF viewer: can contain page images (inline and XObject), strings, forms and text content.

text_content

Shall be a meaningful string representation of page content for further usage (decoded strings + markdown for example)

strings

Shall be a list of decoded strings, no PDF commands

images

Shall be a dict of name -> Image XObject, for images rendered with the Do operator

inline_images

Shall be a list of InlineImage objects in the order they appear in the page content stream (BI/ID/EI operators)

forms

Shall be a dict of name -> SimpleCanvas, built from Form XObjects displayed with the Do operator
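Because forms maps names to further SimpleCanvas instances, a canvas is a nested structure. The sketch below walks it recursively and reports what each level holds. The helper name summarize_canvas is hypothetical, and it reads only the attributes listed above.

```python
def summarize_canvas(canvas):
    """Summarize a canvas and, recursively, the canvases of its forms.

    (Hypothetical helper, not part of pdfreader.)
    """
    return {
        "strings": len(canvas.strings),               # number of decoded strings
        "images": sorted(canvas.images),              # names of Image XObjects
        "inline_images": len(canvas.inline_images),   # number of inline images
        "forms": {
            name: summarize_canvas(form)              # each form is itself a canvas
            for name, form in canvas.forms.items()
        },
    }
```

Called on a rendered viewer.canvas, this gives a quick overview of a page before deciding which images or forms to extract.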

class pdfreader.viewer.GraphicsState(**kwargs)

Viewer’s graphics state. See PDF 1.7 specification:
  • sec. 8.4 - Graphics State
  • sec. 9.3 - Text State Parameters and Operators

Parameters: kwargs – dict of attributes to set
CTM

current transformation matrix

LW

line width

LC

line cap

LJ

line join style

ML

miter limit

D

line dash

RI

color rendering intent

I

flatness tolerance

Font [font_name, font_size]

if set, shall be a list [font_name, font_size] (Tf operator)

Tc

character spacing

Tw

word spacing

Tz

horizontal scaling

TL

text leading

Tr

text rendering mode

Ts

text rise

class pdfreader.viewer.GraphicsStateStack

Graphics state stack. See PDF 1.7 specification sec. 8.4.2 - Graphics State Stack

save_state()

Copies the current state and pushes the copy onto the top of the stack

restore_state()

Restores the previously saved state from the top of the stack
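The save/restore behaviour can be illustrated with a small stand-alone model. This is a sketch of the general technique (a stack whose top entry is the current state), not the library's implementation: the class name StateStackSketch, the state property, and the use of plain dicts for states are all assumptions made for illustration.

```python
import copy


class StateStackSketch:
    """Illustrative model of a graphics state stack (not pdfreader code)."""

    def __init__(self, initial=None):
        # The top of the stack is always the current state.
        self._stack = [dict(initial or {})]

    @property
    def state(self):
        """The current state (top of the stack)."""
        return self._stack[-1]

    def save_state(self):
        # Push a deep copy so later changes do not leak into the saved state.
        self._stack.append(copy.deepcopy(self._stack[-1]))

    def restore_state(self):
        # Drop the current state; the previously saved one becomes current.
        if len(self._stack) == 1:
            raise IndexError("no saved state to restore")
        self._stack.pop()


stack = StateStackSketch({"LW": 1.0, "Font": ["F1", 12]})
stack.save_state()
stack.state["LW"] = 5.0        # change line width inside the saved scope
stack.state["Font"][1] = 24    # mutate a nested value as well
stack.restore_state()
print(stack.state)             # the values from before save_state() are back
```

A deep copy is used because attributes such as Font are lists; a shallow copy would share them between the saved and the current state.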

class pdfreader.viewer.Resources(**kwargs)

Page resources. See PDF 1.7 specification sec. 7.8.3 - Resource Dictionaries

class pdfreader.viewer.PageDoesNotExist

Exception raised by PDF viewers when navigating to a page that does not exist.