pdfreader 0.1.4 Documentation

Overview

pdfreader is a Pythonic API for PDF documents that follows the PDF-1.7 specification.

It lets you parse documents; extract text, images, fonts, CMaps, and other data; and access the different objects within a PDF document.

Features:

Extracts text (plain and formatted)
Extracts form data (plain and formatted)
Extracts images and image masks as Pillow/PIL Images
Supports all PDF encodings, CMaps, and predefined CMaps
Lets you browse any document object or resource and extract any data you need (fonts, annotations, metadata, multimedia, etc.)
Gives access to the document history and to previous document versions when incremental updates are in place
Follows the PDF-1.7 specification
Processes documents quickly thanks to lazy object access
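
The idea behind lazy object access can be pictured with a small, library-independent sketch. This is only an illustration of the general technique, not pdfreader's actual implementation: the names `LazyObject`, `loader`, and `parse_page` are hypothetical and do not come from the pdfreader API.

```python
from typing import Any, Callable


class LazyObject:
    """Hypothetical wrapper: defers an expensive load until first access."""

    _UNSET = object()  # sentinel meaning "not loaded yet"

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._value = self._UNSET

    @property
    def value(self) -> Any:
        # Run the loader only on the first request, then reuse the result.
        if self._value is self._UNSET:
            self._value = self._loader()
        return self._value


calls = []


def parse_page() -> dict:
    """Stand-in for an expensive parsing step (assumed for illustration)."""
    calls.append("parsed")
    return {"Type": "Page"}


page = LazyObject(parse_page)  # nothing has been parsed at this point
first = page.value             # triggers the single parse
second = page.value            # served from the cached result
```

With this pattern, objects that are never touched are never parsed, which is why lazy access tends to help on large documents where only a few pages or resources are needed.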

Installing / Upgrading: Instructions on how to get and install the distribution.
Tutorial: A quick overview of how to get started.
Examples and HowTos: Examples of how to perform specific tasks.
pdfreader API: API documentation, organized by module.

Issues, Support and Feature Requests

If you’re having trouble, have questions about pdfreader, or need a feature, the best place to ask is the GitHub issue tracker. Once you get an answer, it’d be great if you could work it back into this documentation and contribute!

Contributing

pdfreader is an open source project. You’re welcome to contribute:

Code patches
Bug reports
Patch reviews
New features
Documentation improvements

pdfreader uses GitHub issues to keep track of bugs, feature requests, etc.

See project sources

Donation

If this project is helpful, you can treat me to coffee :-)

About This Documentation

This documentation is generated using the Sphinx documentation generator. The source files for the documentation are located in the doc/ directory of the pdfreader distribution. To generate the docs locally, run the following command from the root directory of the pdfreader source:

$ python setup.py doc