Markdown Chunker
Deterministic chunker that splits markdown text along heading boundaries while respecting a maximum chunk size. Optionally prefixes each chunk with the heading hierarchy it sits under, so the chunk is interpretable in isolation.
Use this for markdown documentation, articles, or anything where the heading
structure carries meaning. For arbitrary prose use AI::RecursiveChunker;
for fixed-size character windows use AI::FixedChunker; for LLM-driven
semantic splitting use AI::Chunker.
Parameters
Field on each incoming JSON record holding the markdown text. Leave empty to chunk the full record.
Field name written on each outgoing record holding the chunk text.
Defaults to chunk when left empty.
Deepest heading level (1 = #, 2 = ##, …) at which the chunker is allowed
to break. Headings deeper than this are kept inside their parent chunk.
When enabled, each chunk is prefixed with the chain of parent headings
(e.g. # Title > ## Section > ### Subsection) so the chunk is
self-contained.
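The heading-level limit and the parent-heading prefix described above can be illustrated with a minimal Python sketch. This is not the component's implementation: `chunk_markdown`, `max_level`, and `include_path` are hypothetical names, and the sketch assumes the prefix lists only the ancestors of the chunk's own heading. It does not enforce the maximum chunk size, and it does not skip `#` lines inside fenced code blocks.

```python
import re

# Hypothetical sketch; names and behavior are assumptions, not the component's code.
HEADING_RE = re.compile(r"^(#{1,6})\s")

def chunk_markdown(text, max_level=2, include_path=True):
    """Split markdown at headings of level <= max_level.

    When include_path is true, each chunk is prefixed with the chain of
    parent headings that were open when the chunk started.
    """
    chunks = []
    stack = []    # currently open headings as (level, heading line)
    current = []  # lines of the chunk being built
    parents = ""  # parent-heading chain recorded when the chunk started

    for line in text.splitlines() + [None]:  # None marks end of input
        m = HEADING_RE.match(line) if line is not None else None
        level = len(m.group(1)) if m else None
        is_break = line is None or (level is not None and level <= max_level)
        if is_break and current:
            body = "\n".join(current)
            chunks.append(f"{parents}\n{body}" if include_path and parents else body)
            current = []
        if line is None:
            break
        if m:
            # Close headings at the same or a deeper level than the new one.
            while stack and stack[-1][0] >= level:
                stack.pop()
            if level <= max_level:
                parents = " > ".join(h for _, h in stack)
            stack.append((level, line.strip()))
        current.append(line)
    return chunks

doc = "# Title\nintro\n## Section\ntext\n### Sub\ndeep\n## Other\nmore"
chunks = chunk_markdown(doc, max_level=3)
for chunk in chunks:
    print(chunk, end="\n---\n")
```

Lowering `max_level` to 2 in this sketch keeps the `### Sub` block inside the `## Section` chunk instead of starting a new one.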
Input
Output
JSONL file with one line per produced chunk.
Number of input records that were processed.
Number of chunk records that were produced.
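As a rough illustration of the output shape, the sketch below turns input records into JSONL lines and reports the two counts. Everything here is an assumption made for the example: `write_chunks` and its arguments are hypothetical, the splitter is a stand-in for the real chunker, and copying the remaining record fields onto each outgoing record is a guess rather than documented behavior. The case where the input field is left empty is not covered.

```python
import json

# Hypothetical sketch of the record flow; not the component's actual code.
def write_chunks(records, splitter, input_field, output_field=""):
    """Return (jsonl_text, input_record_count, chunk_record_count)."""
    out_field = output_field or "chunk"  # falls back to "chunk" when left empty
    lines = []
    for record in records:
        for piece in splitter(record[input_field]):
            # Assumption: other fields of the record are carried over unchanged.
            lines.append(json.dumps({**record, out_field: piece}))
    return "\n".join(lines), len(records), len(lines)

records = [{"id": 1, "body": "first\n\nsecond"}]
# Stand-in splitter: break on blank lines instead of headings.
jsonl, n_in, n_out = write_chunks(records, lambda t: t.split("\n\n"), "body")
print(jsonl)
print(n_in, n_out)
```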