
Mistral: OCR

Runs Mistral's OCR API on a document, passed either as a URL or as a file. Returns the extracted text one page per JSONL line, optionally with structured annotations for each bounding box and for the whole document.

Prerequisite: install a Mistral AI application from Profile > {Organization} > Applications.

Parameters

Application (required)

The configured Mistral AI application to use.

Model

OCR model identifier (e.g. mistral-ocr-latest). Defaults to mistral-ocr-latest.

Pages

Page selection expression using zero-based page indices (e.g. 0,1,2, 0-5, or 0,2-4). When omitted, all pages are processed.
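
A minimal sketch of how such an expression could be expanded into a list of page indices, assuming ranges are inclusive on both ends and duplicates are dropped. `parse_pages` is a hypothetical helper, not part of the node or of Mistral's SDK.

```python
def parse_pages(expr: str) -> list[int]:
    """Expand an expression like "0,2-4" into sorted, de-duplicated indices."""
    pages: set[int] = set()
    for part in expr.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "0,,1"
        if "-" in part:
            # Assumed semantics: "a-b" includes both a and b.
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_pages("0,2-4"))  # [0, 2, 3, 4]
```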

Pages as JSONL

When enabled, response pages are written to a JSONL file (one page per line) instead of an inline JSON object. Recommended for large documents.

Extract Header

When enabled, page headers are extracted as a separate field.

Extract Footer

When enabled, page footers are extracted as a separate field.

Table Format

Output format for detected tables: none, markdown, or html.

Include Image Base64

When enabled, every detected image is returned as base64 data inside the response. Significantly increases response size.
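
Base64 encodes every 3 bytes as 4 ASCII characters, so each inlined image costs roughly a third more than its raw size. The sketch below shows that ratio and decodes an image string back to bytes. It assumes the string may carry a `data:<mime>;base64,` prefix; `decode_image` is a hypothetical helper.

```python
import base64


def decode_image(image_base64: str) -> bytes:
    """Decode a base64 image string, stripping an optional data-URI prefix."""
    # Assumed format: either bare base64 or "data:<mime>;base64,<payload>".
    if image_base64.startswith("data:") and "," in image_base64:
        image_base64 = image_base64.split(",", 1)[1]
    return base64.b64decode(image_base64)


raw = bytes(range(256)) * 12  # 3072 bytes of stand-in image data
encoded = base64.b64encode(raw).decode("ascii")
print(len(raw), len(encoded))  # 3072 4096
```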

Bbox Annotation Format

JSON Schema describing the structured annotation Mistral should return for each detected bounding box. Leave empty to skip bbox-level annotation.
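
An illustrative bbox annotation schema, written as a Python dict and printed as the JSON you would paste into the field. The property names (`image_type`, `short_description`) are hypothetical examples, not defaults of the node or of Mistral.

```python
import json

# Hypothetical schema: field names are illustrative only.
bbox_schema = {
    "type": "object",
    "properties": {
        "image_type": {
            "type": "string",
            "description": "Kind of region, e.g. chart, photo, diagram",
        },
        "short_description": {
            "type": "string",
            "description": "One-sentence description of the region",
        },
    },
    "required": ["image_type", "short_description"],
    "additionalProperties": False,
}

# Sanity check: every required key must be declared under "properties".
assert set(bbox_schema["required"]) <= set(bbox_schema["properties"])
print(json.dumps(bbox_schema, indent=2))
```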

Document Annotation Format

JSON Schema describing the structured annotation Mistral should return for the whole document. When set, the node also exposes the Document annotation output connector. Starter templates are available: summary, table-of-contents, and flat table-of-contents.
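
A hypothetical flat table-of-contents schema, plus a helper that renders a matching annotation as indented text. This is an assumption about what such a schema might look like, not the node's actual starter template; nesting is expressed with a `level` field rather than recursive children.

```python
# Hypothetical "flat table-of-contents" schema: one entry per heading.
flat_toc_schema = {
    "type": "object",
    "properties": {
        "entries": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "level": {"type": "integer", "minimum": 1},
                    "page": {"type": "integer", "minimum": 0},
                },
                "required": ["title", "level", "page"],
            },
        }
    },
    "required": ["entries"],
}


def render_toc(annotation: dict) -> str:
    """Render an annotation shaped like flat_toc_schema as indented lines."""
    lines = []
    for entry in annotation["entries"]:
        indent = "  " * (entry["level"] - 1)  # two spaces per level below 1
        lines.append(f"{indent}{entry['title']} (p. {entry['page']})")
    return "\n".join(lines)


sample = {"entries": [
    {"title": "Introduction", "level": 1, "page": 0},
    {"title": "Scope", "level": 2, "page": 1},
]}
print(render_toc(sample))
```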

Document Annotation Prompt

Free-form prompt guiding the document-level annotation. Ignored when Document Annotation Format is empty.

Input

URL or File (required)

Either a PlainText URL pointing to a publicly accessible document, or a File containing the document itself.

Output

Document pages

JSONL file with one page per line containing the extracted text and any enabled per-page extras (header, footer, tables, bbox annotations).
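
A minimal reader for the pages file that streams one page per line instead of loading the whole document at once. The field names `index` and `markdown` are assumptions borrowed from Mistral's OCR page objects; check the actual file before relying on them.

```python
import io
import json
from typing import Iterator, TextIO


def iter_pages(fp: TextIO) -> Iterator[dict]:
    """Yield one parsed page object per non-blank JSONL line."""
    for line in fp:
        if line.strip():
            yield json.loads(line)


# Stand-in file content; the field names are assumed.
sample = io.StringIO('{"index": 0, "markdown": "# Title"}\n{"index": 1, "markdown": "Body text"}\n')
for page in iter_pages(sample):
    print(page["index"], page["markdown"][:40])
```

With a real output file, pass `open(path, encoding="utf-8")` instead of the `StringIO` stand-in.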

Document annotation

Only present when Document Annotation Format is set. A JSON document conforming to that schema and describing the document as a whole.