Mistral: OCR
Run Mistral's OCR API on a document (passed as either a URL or a file). Returns the extracted text as JSONL, one page per line, with optional structured annotations per bounding box and per document.
Prerequisite: Install a Mistral AI application from Profile > {Organization} > Applications.
Parameters
Mistral AI application.
OCR model identifier (e.g. mistral-ocr-latest). Defaults to mistral-ocr-latest.
Page selection expression: a comma-separated list of page indices and ranges (e.g. 0,1,2 or 0-5 or 0,2-4). When omitted, all pages are processed.
When enabled, response pages are written to a JSONL file (one page per line) instead of an inline JSON object. Recommended for large documents.
When enabled, page headers are extracted as a separate field.
When enabled, page footers are extracted as a separate field.
Output format for detected tables: none, markdown, or html.
When enabled, every detected image is returned as base64 data inside the response. Significantly increases response size.
JSON Schema describing the structured annotation Mistral should return for each detected bounding box. Leave empty to skip bbox-level annotation.
JSON Schema describing the structured annotation Mistral should return for the whole document. When set, the node also exposes the Document annotation output connector. Starter templates available: summary, table-of-contents, flat table-of-contents.
Free-form prompt guiding the document-level annotation. Ignored when Document Annotation Format is empty.
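To illustrate the page selection expression, here is a minimal Python sketch that expands an expression into a list of page indices. It is not the node's own parser: the function name parse_page_selection is hypothetical, and the sketch assumes that indices are zero-based and that ranges include both endpoints, neither of which is stated above.

```python
def parse_page_selection(expr: str) -> list[int]:
    """Expand an expression such as "0,2-4" into a sorted list of page indices.

    Assumptions (not confirmed by the node documentation):
    indices are zero-based and ranges such as "2-4" include both endpoints.
    """
    pages: set[int] = set()
    for part in expr.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas and surrounding whitespace
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))  # inclusive upper bound (assumed)
        else:
            pages.add(int(part))
    return sorted(pages)


# The three example expressions from the parameter description.
for example in ("0,1,2", "0-5", "0,2-4"):
    print(example, "->", parse_page_selection(example))
```

A helper like this can be handy for validating an expression before passing it to the node, for example to catch a reversed range early.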
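As an illustration of a document-level annotation schema, the following Python sketch builds a small JSON Schema and prints it as text that could be pasted into the Document Annotation Format parameter. The field names (title, summary, languages) are hypothetical and chosen only for the example; this is not one of the node's starter templates, whose contents are not shown here.

```python
import json

# Hypothetical document-level schema; the field names are illustrative only.
document_annotation_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "languages": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "summary"],
    "additionalProperties": False,
}

# Serialize the schema to the JSON text the parameter expects.
schema_text = json.dumps(document_annotation_schema, indent=2)
print(schema_text)
```

Keeping the schema small and marking only the essential fields as required tends to make the returned annotation easier to validate downstream.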
Input
Either a PlainText URL pointing to a publicly fetchable document, or a File containing the document directly.
Output
JSONL file with one page per line containing the extracted text and any enabled per-page extras (header, footer, tables, bbox annotations).
Only present when Document Annotation Format is set. JSON document matching the schema, summarizing the whole document.
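The JSONL output can be consumed one line at a time. The sketch below is a minimal Python example that parses each line as one page and joins the extracted text. It assumes that every line is a JSON object and that the text is stored under a "markdown" key, as in Mistral's OCR page objects; the exact keys written by this node are not specified above, so the key is a parameter. The helper names iter_pages and join_page_text are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_pages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed page per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def join_page_text(pages: Iterable[dict], text_key: str = "markdown") -> str:
    """Concatenate the text of all pages, separated by blank lines.

    text_key is an assumption; adjust it to the key the node actually writes.
    """
    return "\n\n".join(str(page.get(text_key, "")) for page in pages)


# In-memory stand-in for the node's JSONL file; an open file handle works the same way.
sample_lines = [
    '{"index": 0, "markdown": "# Title"}',
    "",
    '{"index": 1, "markdown": "Body text"}',
]
pages = list(iter_pages(sample_lines))
print(join_page_text(pages))
```

Because iter_pages accepts any iterable of strings, a large output file can be streamed page by page without loading the whole document into memory.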