PDFDocument vs. SimplePDFViewer

pdfreader provides two different interfaces for PDFs: PDFDocument and SimplePDFViewer.

What is the difference?

PDFDocument:
  • knows nothing about interpretation of content-level PDF operators
  • knows all about PDF file and document structure (types, objects, indirect objects, references, etc.)
  • can be used to access any document object: XRef table, DocumentCatalog, page tree nodes (aka Pages), binary streams like Font, CMap, Form, Page, etc.
  • can be used to access raw object content (a raw page content stream, for example)
  • has no graphical state

SimplePDFViewer:
  • uses PDFDocument as its document navigation engine
  • can render document content, decoding it properly and interpreting PDF operators
  • has graphical state

Use PDFDocument to navigate the document and access raw data.

Use SimplePDFViewer to extract the content you see in your favorite viewer (Adobe Acrobat Reader, hehe :-).

Let’s look at several use cases.