Recursive Chunker

Deterministic, non-LLM chunker that splits each record's text by trying a list of separators from coarsest (e.g. \n\n) to finest (e.g. " "), recursively re-splitting any chunk that exceeds the size budget. Falls back to fixed-size splitting with overlap when no separator works.

This is the standard "recursive character text splitter" pattern — a good default for general prose. For semantic splitting use AI::Chunker; for markdown-aware splitting use AI::MarkdownChunker.
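The coarse-to-fine recursion described above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not this component's actual implementation: the function name `recursive_split` is hypothetical, separators are dropped from the output, small neighbouring pieces are not merged back together, and the fallback here uses no overlap.

```python
def recursive_split(text, separators, chunk_size):
    """Illustrative sketch: split text so that no chunk exceeds chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []

    for i, sep in enumerate(separators):
        # An empty separator cannot be passed to str.split, so it is skipped
        # here and handled by the fixed-size fallback below.
        if sep == "" or sep not in text:
            continue
        chunks = []
        for piece in text.split(sep):
            if not piece:
                continue
            # Pieces that are still too large are re-split with the finer separators.
            chunks.extend(recursive_split(piece, separators[i + 1:], chunk_size))
        return chunks

    # No separator applies: fall back to fixed-size slices (no overlap in this sketch).
    return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]


chunks = recursive_split("aaa bbb\n\nccccccccc", ["\n\n", " ", ""], 4)
print(chunks)  # ['aaa', 'bbb', 'cccc', 'cccc', 'c']
```

In the example, the paragraph break splits the text first, the space then splits "aaa bbb", and the run of "c" characters has no usable separator, so it is cut into fixed-size slices.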

Parameters

Source Attribute

Field on each incoming JSON record holding the text to chunk. Leave empty to chunk the full record.

Output Attribute (REQUIRED)

Field name written on each outgoing record holding the chunk text. Defaults to chunk when left empty.

Chunk Size

Maximum number of characters per chunk.

Fallback Overlap

Overlap (characters) used when the splitter falls back to fixed-size splitting because no separator could break a chunk down further. Defaults to 0.
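To make the effect of the overlap concrete, here is a minimal sketch of fixed-size splitting with overlap. The helper name `fixed_size_split` and the exact windowing rule (each window starts `chunk_size - overlap` characters after the previous one) are assumptions for illustration; the component may handle the final window or edge cases differently.

```python
def fixed_size_split(text, chunk_size, overlap=0):
    """Illustrative sketch: cut text into windows of at most chunk_size characters,
    where each window repeats the last `overlap` characters of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(fixed_size_split("abcdefghij", 4, overlap=1))  # ['abcd', 'defg', 'ghij']
print(fixed_size_split("abcdefghij", 4))             # ['abcd', 'efgh', 'ij']
```

With an overlap of 1, the last character of each chunk reappears as the first character of the next, which helps preserve context across a hard cut.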

Separators (REQUIRED)

Ordered list of separator strings tried from first to last. The chunker walks the list and uses the first separator that successfully breaks the content into pieces under the size budget. A common default is ["\n\n", "\n", ". ", " ", ""].

Input

File (REQUIRED)

JSONL file with one record per line.

Output

File

JSONL file with one line per produced chunk.

Input chunks

Number of input records that were processed.

Output chunks

Number of chunk records that were produced.
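The record flow described above (one JSONL record in, one JSONL line per chunk out, plus the two counters) might look roughly like the following sketch. Several details are assumptions rather than documented behavior: the function name `chunk_jsonl` is hypothetical, each output record is assumed to carry over the input record's other fields, and an empty Source Attribute is assumed to mean chunking the record serialized as JSON.

```python
import json


def chunk_jsonl(lines, split, source_attr="", output_attr="chunk"):
    """Illustrative sketch: apply `split` (any function str -> list of str)
    to each JSONL record and emit one output line per chunk."""
    out_lines = []
    input_count = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        input_count += 1

        if source_attr:
            text = str(record.get(source_attr, ""))
        else:
            # Assumption: "the full record" means its JSON serialization.
            text = json.dumps(record, ensure_ascii=False)

        for chunk in split(text):
            out = dict(record)  # assumption: other fields are copied through
            out[output_attr or "chunk"] = chunk
            out_lines.append(json.dumps(out, ensure_ascii=False))

    # Returns the output lines plus the "Input chunks" and "Output chunks" counts.
    return out_lines, input_count, len(out_lines)


records = ['{"id": 1, "text": "one two"}', '{"id": 2, "text": "three"}']
out_lines, n_in, n_out = chunk_jsonl(records, str.split, source_attr="text")
print(n_in, n_out)  # 2 3
```

Here `str.split` stands in for the chunker so the example stays short; any splitter with the same shape could be passed instead.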