
Markdown Chunker

Deterministic chunker that splits markdown text along heading boundaries while respecting a maximum chunk size. Optionally prepends each chunk with the heading hierarchy it sits under so the chunk is interpretable in isolation.
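As an illustration of the idea, the following is a minimal Python sketch of heading-boundary splitting with a size cap. It is not the component's implementation: the function name `split_on_headings` is hypothetical, and cutting oversized sections into fixed-size windows is an assumption, since this page does not say how sections longer than the chunk size are handled.

```python
import re

# ATX heading at the start of a line: 1-6 '#' characters, a space or tab, then text.
# Note: this simple pattern also matches '#' lines inside fenced code blocks.
HEADING = re.compile(r"^(#{1,6})[ \t]+\S", re.MULTILINE)


def split_on_headings(text: str, chunk_size: int) -> list[str]:
    """Split text at heading lines, then cap every piece at chunk_size characters."""
    # Start offsets of every heading, plus 0 so text before the first heading is kept.
    starts = sorted({0, *(m.start() for m in HEADING.finditer(text))})
    sections = [text[a:b] for a, b in zip(starts, starts[1:] + [len(text)])]

    chunks = []
    for section in sections:
        section = section.strip()
        # Assumption: an oversized section is cut into fixed-size windows.
        for i in range(0, len(section), chunk_size):
            chunks.append(section[i:i + chunk_size])
    return chunks
```

Because the split points depend only on the text and the size limit, running the function twice on the same input yields the same chunks, which is the sense in which the chunker is deterministic.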

Use this for markdown documentation, articles, or anything where the heading structure carries meaning. For arbitrary prose use AI::RecursiveChunker; for fixed-size character windows use AI::FixedChunker; for LLM-driven semantic splitting use AI::Chunker.

Parameters

Source Attribute

Field on each incoming JSON record holding the markdown text. Leave empty to chunk the full record.

Output Attribute (REQUIRED)

Name of the field on each outgoing record that holds the chunk text. Defaults to "chunk" when left empty.

Chunk Size

Maximum number of characters per chunk.

Max Heading Level

Deepest heading level (1 = #, 2 = ##, …) at which the chunker is allowed to break. Headings deeper than this are kept inside their parent chunk.
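To make the effect of this setting concrete, here is a small Python sketch that lists the character offsets where a break would be allowed. The function name `break_offsets` and the regular expression are illustrative assumptions, not the component's actual code.

```python
import re

# ATX heading at the start of a line; group 1 holds the run of '#' characters.
HEADING = re.compile(r"^(#{1,6})[ \t]+\S", re.MULTILINE)


def break_offsets(text: str, max_heading_level: int) -> list[int]:
    """Return offsets of headings whose level is at most max_heading_level."""
    return [
        m.start()
        for m in HEADING.finditer(text)
        # The heading level is the number of leading '#' characters.
        if len(m.group(1)) <= max_heading_level
    ]
```

With the text `"# T\nx\n## S\ny\n### Sub\nz\n"` and a max heading level of 2, only the `#` and `##` headings are break candidates, so the `### Sub` section stays inside the chunk that starts at `## S`.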

Include Heading Path

When enabled, each chunk is prefixed with the chain of parent headings (e.g. # Title > ## Section > ### Subsection) so the chunk is self-contained.
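One way to compute such a path is to keep a stack of the headings currently in scope while walking the document. The Python sketch below is illustrative only: the function name `heading_paths` is hypothetical, and it assumes the path includes the section's own heading as its last element, which this page does not state explicitly.

```python
import re

# ATX heading line: group 1 is the '#' run, group 2 is the heading text.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.*\S)[ \t]*$")


def heading_paths(text: str) -> list[str]:
    """Return the heading chain in effect at each heading line, in document order."""
    stack: list[tuple[int, str]] = []  # (level, heading line) for headings in scope
    paths = []
    for line in text.splitlines():
        m = HEADING.match(line)
        if not m:
            continue
        level = len(m.group(1))
        # Headings at the same or a deeper level are no longer ancestors.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, line.strip()))
        paths.append(" > ".join(heading for _, heading in stack))
    return paths
```

A chunk cut from the middle of a long subsection would then be prefixed with the path of the heading it sits under, so a reader or retriever can see where it belongs without the surrounding chunks.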

Input

File (REQUIRED)

JSONL file containing markdown text.

Output

File

JSONL file with one line per produced chunk.

Input chunks

Number of input records that were processed.

Output chunks

Number of chunk records that were produced.
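The record flow and the two counters can be pictured with the Python sketch below. Everything in it is an assumption made for illustration: the function name `chunk_jsonl`, the plain fixed-size splitter standing in for the real heading-aware one, and the choice to emit records that contain only the chunk field (this page does not say whether other input fields are carried over).

```python
import json


def chunk_jsonl(jsonl: str, source_attribute: str,
                output_attribute: str = "", chunk_size: int = 1000):
    """Return (output lines, input record count, output chunk count)."""
    out_key = output_attribute or "chunk"  # documented default field name
    out_lines = []
    input_count = 0
    for line in jsonl.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        input_count += 1
        text = record[source_attribute]
        # Stand-in splitter: fixed-size windows instead of heading-aware splitting.
        for i in range(0, len(text), chunk_size):
            out_lines.append(json.dumps({out_key: text[i:i + chunk_size]}))
    return out_lines, input_count, len(out_lines)
```

For example, two input records whose texts are 6 and 2 characters long, chunked with a size of 4, produce three output lines, so the input count is 2 and the output count is 3.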