PDFDocument vs. SimplePDFViewer
What is the difference?

PDFDocument:
- knows nothing about the interpretation of content-level PDF operators
- knows all about the PDF file and document structure (types, objects, indirect objects, references, etc.)
- can be used to access any document object: the XRef table, DocumentCatalog, page tree nodes (aka Pages), binary streams like Font, CMap, Form, Page, etc.
- can be used to access raw object content (the raw page content stream, for example)
- has no graphical state
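
To make the PDFDocument side concrete, here is a toy model in plain Python. It does not use pdfreader, and every name in it (`IndirectReference`, `OBJECTS`, `resolve`, `iter_pages`, `raw_page_content`) is hypothetical. It assumes an already-parsed, in-memory object table and a single unfiltered content stream, and it shows the kind of work a structure-level reader does: following indirect references, walking the page tree, and returning a page's content stream as bytes without interpreting a single operator.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Tuple


@dataclass(frozen=True)
class IndirectReference:
    """A PDF indirect reference such as `4 0 R` (object number, generation)."""
    num: int
    gen: int = 0


# Hypothetical, already-parsed object table keyed by (object number, generation).
# In a real file these objects live in the file body and are located via the XRef table.
OBJECTS: Dict[Tuple[int, int], Any] = {
    (1, 0): {"Type": "Catalog", "Pages": IndirectReference(2)},
    (2, 0): {"Type": "Pages", "Kids": [IndirectReference(3)], "Count": 1},
    (3, 0): {"Type": "Page", "Parent": IndirectReference(2),
             "Contents": IndirectReference(4)},
    # Content stream kept as uninterpreted bytes (assumed unfiltered for simplicity).
    (4, 0): b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET",
}


def resolve(obj: Any) -> Any:
    """Replace an indirect reference with the object it points to; pass other values through."""
    if isinstance(obj, IndirectReference):
        return OBJECTS[(obj.num, obj.gen)]
    return obj


def iter_pages(node: Any) -> Iterator[dict]:
    """Walk the page tree depth-first, yielding leaf Page dictionaries in document order."""
    node = resolve(node)
    if node["Type"] == "Pages":
        for kid in node["Kids"]:
            yield from iter_pages(kid)
    else:
        yield node


def raw_page_content(page: dict) -> bytes:
    """Return the page's content stream as bytes, without interpreting any operator."""
    return resolve(page["Contents"])


catalog = resolve(IndirectReference(1))
first_page = next(iter_pages(catalog["Pages"]))
print(raw_page_content(first_page))  # b'BT /F1 12 Tf 72 712 Td (Hello) Tj ET'
```

Note that nothing here knows what `Tf` or `Tj` mean: the content stream is just data attached to the Page object, and no graphical state is ever created.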

SimplePDFViewer:
- uses PDFDocument as its document navigation engine
- can render document content, properly decoding it and interpreting PDF operators
- has graphical state
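
The SimplePDFViewer side can be illustrated with an equally small toy interpreter. Again this is plain Python, not pdfreader's implementation, and `TOKEN`, `TextState` and `interpret` are hypothetical names. It tokenizes a content stream and keeps a tiny slice of graphics state (current font, font size and text line origin) so that each `Tj` operator can be reported together with how and where its string would be drawn. It assumes literal strings without escapes or nested parentheses, and it ignores `Tm`, text matrix scaling and glyph advances.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Toy tokenizer: literal strings (no escapes, no nesting), names, numbers, operators.
TOKEN = re.compile(
    rb"\((?P<str>[^()\\]*)\)"
    rb"|/(?P<name>[^\s/()\[\]<>]+)"
    rb"|(?P<num>[+-]?(?:\d+\.?\d*|\.\d+))"
    rb"|(?P<op>[A-Za-z*'\"]+)"
)


@dataclass
class TextState:
    """A tiny slice of graphics state: current font, size and text line origin."""
    font: str = ""
    size: float = 0.0
    x: float = 0.0
    y: float = 0.0


ShownText = Tuple[str, str, float, float, float]  # (text, font, size, x, y)


def interpret(content: bytes) -> List[ShownText]:
    """Execute BT, Tf, Td and Tj; every other operator just clears the operand stack."""
    state = TextState()
    operands: list = []
    shown: List[ShownText] = []
    for m in TOKEN.finditer(content):
        if m.group("str") is not None:
            operands.append(m.group("str").decode("latin-1"))
        elif m.group("name") is not None:
            operands.append(m.group("name").decode("latin-1"))
        elif m.group("num") is not None:
            operands.append(float(m.group("num").decode("ascii")))
        else:
            op = m.group("op").decode("latin-1")
            if op == "BT":
                # Beginning a text object resets the text position; font settings persist.
                state.x = state.y = 0.0
            elif op == "Tf":
                state.font, state.size = operands[0], operands[1]
            elif op == "Td":
                # Move the start of the line by (tx, ty).
                state.x += operands[0]
                state.y += operands[1]
            elif op == "Tj":
                shown.append((operands[0], state.font, state.size, state.x, state.y))
            operands = []
    return shown


print(interpret(b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET"))
# [('Hello', 'F1', 12.0, 72.0, 712.0)]
```

The same bytes that the structure-level reader hands back untouched only become "text at a position in a font" once an interpreter like this one runs over them. A real viewer also has to map string bytes through the font's encoding or ToUnicode CMap before the text is readable.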

Use PDFDocument to navigate the document and access raw data.
Use SimplePDFViewer to extract the content as a PDF viewer renders it.

Let’s look at several use cases.
- How to extract XObject images, inline images and image masks
- How to parse PDF text
- How to parse PDF Forms
- How to extract the CMap for a font from a PDF
- How to extract font data from a PDF
- How to browse PDF objects