pdfreader 0.1.13dev Documentation
Overview
pdfreader is a Pythonic API for PDF documents that follows the PDF-1.7 specification.
It allows you to parse documents; extract text, images, fonts, CMaps, and other data; and access the various objects within PDF documents.
Features:
- Extracts text (plain and formatted)
- Extracts form data (plain and formatted)
- Extracts images and image masks as Pillow/PIL Images
- Supports all PDF encodings, CMaps, and predefined CMaps.
- Lets you browse any document object or resource and extract any data you need (fonts, annotations, metadata, multimedia, etc.)
- Provides access to the document history and to previous document versions when incremental updates are present.
- Follows the PDF-1.7 specification
- Fast document processing thanks to lazy object access

- Installing / Upgrading: instructions on how to get and install the distribution.
- Tutorial: a quick overview of how to get started.
- Examples and HowTos: examples of how to perform specific tasks.
- pdfreader API: API documentation, organized by module.
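The document-history feature listed above relies on how PDF incremental updates work: each update appends new or changed objects, a new cross-reference section, a trailer, and a closing `%%EOF` marker to the end of the file, so every earlier revision is a byte prefix of the current file. The following is a minimal sketch of that idea in plain Python, not pdfreader's own API; `pdf_revisions` is a hypothetical helper introduced here for illustration. It assumes the file is not linearized (linearized files may carry an extra `%%EOF`) and that `%%EOF` never occurs inside stream data.

```python
import re
from typing import List

# An end-of-file marker plus any end-of-line bytes that follow it.
_EOF_MARKER = re.compile(rb"%%EOF[\r\n]*")


def pdf_revisions(data: bytes) -> List[bytes]:
    """Split raw PDF bytes into revisions (hypothetical helper, not pdfreader API).

    Each revision is the prefix of the file that ends at one %%EOF marker, so the
    last entry is the current document and earlier entries are previous versions.
    Heuristic: assumes %%EOF never occurs inside stream data and that the file
    is not linearized.
    """
    return [data[: m.end()] for m in _EOF_MARKER.finditer(data)]


if __name__ == "__main__":
    # Toy bytes standing in for a file with one incremental update
    # (xref sections and trailers are omitted for brevity).
    sample = (
        b"%PDF-1.7\n1 0 obj\n<< >>\nendobj\n%%EOF\n"
        b"1 0 obj\n<< /Updated true >>\nendobj\n%%EOF\n"
    )
    for index, revision in enumerate(pdf_revisions(sample)):
        print(index, len(revision))
```

Slicing rather than copying objects keeps each earlier revision a complete file in its own right: writing `pdf_revisions(data)[0]` to disk should, under the assumptions above, reproduce the original document before any updates were applied.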
Issues, Support and Feature Requests
If you're having trouble, have questions about pdfreader, or need a feature, the best place to ask is the GitHub issue tracker. Once you get an answer, it would be great if you could work it back into this documentation and contribute!
Contributing
pdfreader is an open source project. You’re welcome to contribute:
- Code patches
- Bug reports
- Patch reviews
- New features
- Documentation improvements
pdfreader uses GitHub issues to keep track of bugs, feature requests, etc.
See the project sources.
About This Documentation
This documentation is generated using the Sphinx documentation generator. The source files for the documentation are located in the doc/ directory of the pdfreader distribution. To generate the docs locally, run the following command from the root directory of the pdfreader source tree:
$ python setup.py doc