Recursive Chunker
Deterministic, non-LLM chunker that splits each record's text by trying a list
of separators from coarsest (e.g. "\n\n") to finest (e.g. " "), recursively
re-splitting any chunk that still exceeds the size budget. Falls back to
fixed-size splitting with overlap when no separator can break a chunk down
further.
This is the standard "recursive character text splitter" pattern, a good
default for general prose. For semantic splitting use AI::Chunker; for
Markdown-aware splitting use AI::MarkdownChunker.
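The splitting strategy described above can be sketched in Python. This is an illustrative sketch only, not the chunker's actual implementation: the names recursive_split, fixed_size_split, and max_chars are hypothetical, the greedy packing of adjacent small pieces into one chunk is an assumption, and so is treating an empty-string separator as "go straight to the fixed-size fallback".

```python
from typing import List


def fixed_size_split(text: str, size: int, overlap: int = 0) -> List[str]:
    """Cut text into windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def recursive_split(text: str, separators: List[str], max_chars: int,
                    overlap: int = 0) -> List[str]:
    """Split on the coarsest usable separator, recursing into oversized pieces."""
    if len(text) <= max_chars:
        return [text] if text else []
    for i, sep in enumerate(separators):
        if sep == "" or sep not in text:
            continue  # this separator cannot split the text; try a finer one
        chunks: List[str] = []
        current = ""
        for piece in text.split(sep):
            # Assumption: adjacent pieces are packed together while they fit.
            candidate = current + sep + piece if current else piece
            if len(candidate) <= max_chars:
                current = candidate
                continue
            if current:
                chunks.append(current)
            if len(piece) <= max_chars:
                current = piece
            else:
                # The piece alone is too large: retry with the finer separators.
                current = ""
                chunks.extend(
                    recursive_split(piece, separators[i + 1:], max_chars, overlap)
                )
        if current:
            chunks.append(current)
        return chunks
    # No separator applies: fall back to fixed-size windows with overlap.
    return fixed_size_split(text, max_chars, overlap)


print(recursive_split("aaaa bbbb\n\ncccccccccccc", ["\n\n", " ", ""], 5))
# → ['aaaa', 'bbbb', 'ccccc', 'ccccc', 'cc']
```

In this sketch the overlap only affects the fallback path, which matches the description of the overlap setting; chunks produced by separator splits do not overlap.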
Parameters
Field on each incoming JSON record holding the text to chunk. Leave empty to chunk the full record.
Field name written on each outgoing record holding the chunk text.
Defaults to chunk when left empty.
Overlap, in characters, between consecutive chunks when the splitter falls
back to fixed-size splitting because no separator could break a chunk down
further. Defaults to 0.
Ordered list of separator strings, tried from first to last and normally
ordered from coarsest to finest. The chunker walks the list and uses the first
separator that breaks the content into pieces within the size budget. A common
default is ["\n\n", "\n", ". ", " ", ""].
Input
Output
JSONL file with one line per produced chunk.
Number of input records that were processed.
Number of chunk records that were produced.
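The record flow described above (read a text field, write one output line per chunk, count records and chunks) might look like the following Python sketch. Everything named here is hypothetical: chunk_records, text_field, chunk_field, and the "records"/"chunks" counter keys are illustrative stand-ins. It also assumes that an empty text field means the whole record is serialized as JSON before chunking, and that each output record keeps the input record's other fields; the source does not confirm either behavior.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple


def chunk_records(
    lines: Iterable[str],
    split: Callable[[str], List[str]],
    text_field: str = "",
    chunk_field: str = "",
) -> Tuple[List[str], Dict[str, int]]:
    """Turn JSONL input lines into JSONL output lines, one per chunk."""
    out_field = chunk_field or "chunk"  # documented default when left empty
    out_lines: List[str] = []
    records = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        records += 1
        if text_field:
            text = str(record.get(text_field, ""))
        else:
            # Assumption: "the full record" means its JSON serialization.
            text = json.dumps(record, ensure_ascii=False)
        for piece in split(text):
            out = dict(record)  # assumption: other fields are carried over
            out[out_field] = piece
            out_lines.append(json.dumps(out, ensure_ascii=False))
    return out_lines, {"records": records, "chunks": len(out_lines)}


# Usage with a stand-in splitter that breaks on blank lines only.
sample = ['{"id": 1, "body": "a\\n\\nb"}', '{"id": 2, "body": "c"}']
jsonl, counts = chunk_records(sample, lambda t: t.split("\n\n"), text_field="body")
print(counts)
# → {'records': 2, 'chunks': 3}
```

Passing the splitter in as a function keeps this sketch independent of any particular splitting strategy.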