PDFDocument vs. SimplePDFViewer
pdfreader provides two different interfaces for PDFs:
- PDFDocument
- SimplePDFViewer
What is the difference?
PDFDocument:
- knows nothing about the interpretation of content-level PDF operators
- knows everything about the PDF file and document structure (types, objects, indirect objects, references, etc.)
- can be used to access any document object: the XRef table, DocumentCatalog, page tree nodes (aka Pages), binary streams such as Font, CMap, Form, Page, etc.
- can be used to access raw object content (a raw page content stream, for example)
- has no graphical state
SimplePDFViewer:
- uses PDFDocument as its document navigation engine
- can render document content, properly decoding it and interpreting PDF operators
- has graphical state
Use PDFDocument to navigate a document and access raw data.
Use SimplePDFViewer to extract the content you see in your favorite viewer (Adobe Acrobat Reader, hehe :-).
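To make the distinction concrete, here is a toy sketch in plain Python. It does not use pdfreader and is not how the library is implemented. The hand-written content stream, the `raw_view` and `interpreted_view` helpers, and the tiny tokenizer are all assumptions made up for illustration. The first helper behaves like a document-level interface (it hands back raw bytes untouched), the second behaves like a viewer (it interprets a couple of text operators and keeps a minimal state).

```python
import re

# Hypothetical, hand-written page content stream: the kind of raw bytes
# a document-level interface returns without looking inside.
RAW_CONTENT = b"BT /F1 12 Tf 72 720 Td (Hello) Tj 0 -14 Td (World) Tj ET"

# Toy tokenizer: literal strings, names, numbers, operators.
# Real PDF syntax is richer (nested parentheses, hex strings, arrays, ...).
TOKEN = re.compile(rb"\((?:[^()\\]|\\.)*\)|/[^\s/()]+|[-+]?\d*\.?\d+|[A-Za-z]+")


def raw_view(stream):
    """Document-level view: return the stream as-is, no interpretation."""
    return stream


def interpreted_view(stream):
    """Viewer-level view (toy): interpret Tf and Tj, ignore everything else.

    Returns the list of shown strings and the final (minimal) text state.
    """
    state = {"font": None, "size": None}
    strings = []
    operands = []
    for tok in TOKEN.findall(stream):
        first = tok[:1]
        if first in b"(/" or first in b"+-." or first.isdigit():
            operands.append(tok)  # operand: collect until an operator arrives
            continue
        if tok == b"Tf":  # set font name and size
            state["font"] = operands[0][1:].decode("ascii")
            state["size"] = float(operands[1].decode("ascii"))
        elif tok == b"Tj":  # show a text string
            strings.append(operands[0][1:-1].decode("latin-1"))
        operands.clear()  # every operator consumes its operands
    return strings, state


strings, state = interpreted_view(RAW_CONTENT)
print(raw_view(RAW_CONTENT))  # the untouched bytes
print(strings, state)         # the text a viewer would show, plus state
```

The raw view is what you want when inspecting or re-processing a stream yourself; the interpreted view is what you want when you only care about the text that ends up on the page.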
Let’s look at several use cases.