
Fixed Chunker

Deterministic, non-LLM chunker that splits each record's text into fixed-size character windows with optional overlap. Cheap, fast, and reproducible. Use this when content boundaries don't matter and you just want consistently sized chunks. For semantic splitting use AI::Chunker; for markdown-aware splitting use AI::MarkdownChunker; for separator-based splitting use AI::RecursiveChunker.

Parameters

Source Attribute

Field on each incoming JSON record holding the text to chunk. Leave empty to chunk the full record (serialized to JSON).

Output Attribute REQUIRED

Name of the field on each outgoing record that holds the chunk text. Defaults to "chunk" when left empty.

Chunk Size

Maximum number of characters per chunk. Defaults to the platform default chunk size when left empty.

Overlap

Number of characters shared between consecutive chunks. Higher overlap preserves context across boundaries at the cost of duplicated content. Defaults to 0.
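
The interaction between Chunk Size and Overlap can be sketched in a few lines of Python. This is an illustrative approximation, not the component's actual implementation: the function name fixed_chunks is hypothetical, and the handling of a short final window and of an overlap greater than or equal to the chunk size are assumptions, since this page does not specify them.

```python
def fixed_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into character windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters. Hypothetical helper;
    edge-case behavior here is assumed, not documented.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        # Assumption: an overlap >= chunk_size would never advance, so reject it.
        raise ValueError("overlap must be at least 0 and less than chunk_size")

    step = chunk_size - overlap  # how far each new window starts past the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(fixed_chunks("abcdefghij", chunk_size=4, overlap=1))
    print(fixed_chunks("abcdefghij", chunk_size=4))
```

With a chunk size of 4 and an overlap of 1, the ten-character input yields "abcd", "defg", "ghij": each chunk begins with the last character of the previous one. With no overlap the same input yields "abcd", "efgh", "ij".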

Input

File REQUIRED

JSONL file with one record per line.

Output

File

JSONL file with one line per produced chunk. The original record is preserved with the chunk text written to the output attribute.
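
As a rough sketch of the input-to-output mapping described above, the following Python reads JSONL lines, chunks one field, and emits one JSONL line per chunk with the original record's fields kept alongside the chunk text. The function name chunk_records and its parameter names are hypothetical, and several details are assumptions rather than documented behavior: the exact JSON serialization used when no source attribute is set, overwriting an existing field that has the same name as the output attribute, and skipping blank lines.

```python
import json


def chunk_records(lines, chunk_size, overlap=0, source_attr="", output_attr="chunk"):
    """Return (output_lines, input_count, output_count). Hypothetical helper."""
    step = chunk_size - overlap
    if chunk_size <= 0 or step <= 0 or overlap < 0:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    output_lines = []
    input_count = 0
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines are ignored
        record = json.loads(line)
        input_count += 1

        # Empty source attribute: chunk the whole record serialized to JSON.
        text = str(record[source_attr]) if source_attr else json.dumps(record)

        for start in range(0, len(text), step):
            # Copy the original fields and add the chunk under output_attr.
            out = {**record, output_attr: text[start:start + chunk_size]}
            output_lines.append(json.dumps(out))
            if start + chunk_size >= len(text):
                break
    return output_lines, input_count, len(output_lines)


if __name__ == "__main__":
    sample = ['{"id": 1, "body": "abcdefghij"}']
    out, n_in, n_out = chunk_records(sample, chunk_size=4, overlap=1, source_attr="body")
    for row in out:
        print(row)
    print(n_in, n_out)
```

For the single sample record this produces three output lines, each carrying id and body unchanged plus a chunk field, and the two counts returned correspond to the Input chunks and Output chunks metrics.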

Input chunks

Number of input records that were processed.

Output chunks

Number of chunk records that were produced.