PDF page reader
Reads a PDF file page by page and, optionally, chunk by chunk. For example:
A PDF file
# Page 1
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
# Page 2
Quisque ac dolor massa.
Reader configuration
Chunk size: 20
Output
Page 1; Chunk 0; Content: Lorem ipsum dolor si
Page 1; Chunk 1; Content: t amet, consectetur 
Page 1; Chunk 2; Content: adipiscing elit.
Page 2; Chunk 0; Content: Quisque ac dolor mas
Page 2; Chunk 1; Content: sa.
Parameters
Chunk size
If a chunk size is specified, the reader splits each page into multiple chunks. The chunk size is the maximum number of characters per chunk; the last chunk of a page may be shorter.
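The splitting rule can be illustrated with a minimal sketch. This is not the reader's actual implementation: the function name chunk_page and the choice to return None as the chunk index when no chunk size is set are assumptions made for illustration. It only shows fixed-size character chunking applied to text that has already been extracted from a page.

```python
from typing import Iterator, Optional, Tuple


def chunk_page(text: str, chunk_size: Optional[int] = None) -> Iterator[Tuple[Optional[int], str]]:
    """Yield (chunk_index, content) pairs for the text of one page.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if chunk_size is None:
        # No chunk size configured: the whole page is emitted as one unit,
        # without a chunk index (an assumption of this sketch).
        yield None, text
        return
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Step through the text in strides of chunk_size characters;
    # the final slice may be shorter than chunk_size.
    for index, start in enumerate(range(0, len(text), chunk_size)):
        yield index, text[start:start + chunk_size]


# Sample page texts, keyed by page index.
pages = {
    1: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
    2: "Quisque ac dolor massa.",
}
for page_index, text in pages.items():
    for chunk_index, content in chunk_page(text, chunk_size=20):
        print(f"Page {page_index}; Chunk {chunk_index}; Content: {content}")
```

With a chunk size of 20, the 56-character first page yields chunks of 20, 20, and 16 characters, and the 23-character second page yields chunks of 20 and 3 characters. Spaces count as characters, so a chunk may begin or end with whitespace.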
Output
Page index
The index of the current page being read from the PDF file.
Chunk index
The index of the current chunk within the current page (only present if a chunk size has been specified). As in the example above, chunk numbering restarts at 0 on each page.
Content
The content of the current page or chunk being read from the PDF file.
Metadata
Metadata of the current file.
Metadata content
- author (optional)
- title (optional)
- subject (optional)
- keywords (optional)
- creator (optional)
- producer (optional)
- creation_date (optional)
- modification_date (optional)
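One way to picture a single output record is sketched below. The class name PageRecord, the helper get_metadata, and the use of plain strings for every metadata value are assumptions for illustration; only the output field names and the metadata keys come from the lists above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Metadata keys listed in this document; every one of them is optional.
METADATA_KEYS = (
    "author", "title", "subject", "keywords",
    "creator", "producer", "creation_date", "modification_date",
)


@dataclass
class PageRecord:
    """Hypothetical container for one item emitted by the reader."""
    page_index: int
    content: str
    # None when no chunk size was configured (assumption of this sketch).
    chunk_index: Optional[int] = None
    # Only the keys actually present in the file are stored.
    metadata: Dict[str, str] = field(default_factory=dict)

    def get_metadata(self, key: str, default: Optional[str] = None) -> Optional[str]:
        """Return a metadata value, or `default` if the file does not define it."""
        if key not in METADATA_KEYS:
            raise KeyError(f"unknown metadata key: {key}")
        return self.metadata.get(key, default)


record = PageRecord(
    page_index=1,
    chunk_index=0,
    content="Lorem ipsum dolor si",
    metadata={"title": "A PDF file"},
)
print(record.get_metadata("title"))   # A PDF file
print(record.get_metadata("author"))  # None
```

Because every metadata field is optional, consumers should look values up with a default rather than assume a key is present.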