Math: BM25

Embed text as a BM25 sparse vector (indices + values) suitable for hybrid search in vector databases like Qdrant. Pair this with a dense embedding upstream to power a hybrid (dense + sparse) retrieval pipeline.

Example output
{
  "indices": [12, 84, 132, ...],
  "values":  [0.42, 0.31, 0.18, ...]
}
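A sparse vector in this shape is typically compared with another one by a dot product over the indices the two share. The sketch below illustrates that scoring in plain Python. The function name `sparse_dot` and the sample vectors are hypothetical, and a given database's scoring may differ in detail.

```python
def sparse_dot(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors shaped {"indices": [...], "values": [...]}."""
    # Map each index of `a` to its value for constant-time lookup.
    a_map = dict(zip(a["indices"], a["values"]))
    # Only indices present in both vectors contribute to the score.
    return sum(a_map[i] * v for i, v in zip(b["indices"], b["values"]) if i in a_map)


# Hypothetical query and document vectors, not real node output.
query = {"indices": [12, 132], "values": [1.0, 1.0]}
document = {"indices": [12, 84, 132], "values": [0.5, 0.25, 0.125]}

score = sparse_dot(query, document)
```

Index 84 appears only in the document vector, so it adds nothing to the score; only the overlapping indices 12 and 132 count.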

Parameters

avgdl

Average document length, in tokens, used by the BM25 length-normalisation term. Set it to the typical token count of a document in your corpus. Defaults to 256.

b

Length-normalisation parameter: 0.0 disables length normalisation, 1.0 applies it fully. Defaults to 0.0.

k1

Term-frequency saturation parameter. Higher values delay saturation, so repeated occurrences of a term keep adding weight; typical range 1.2–2.0. Defaults to 1.2.
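To make the interaction of these parameters concrete, here is a minimal sketch of the textbook BM25 term-frequency component, using the conventional names `k1` for the saturation parameter and `b` for the length-normalisation parameter. It assumes the standard formula; the source does not state which BM25 variant is used. The function name `bm25_tf_weight` and the arguments `tf` and `doc_len` are illustrative, and the IDF factor is left out.

```python
def bm25_tf_weight(tf: int, doc_len: int, k1: float = 1.2,
                   b: float = 0.0, avgdl: float = 256.0) -> float:
    """Textbook BM25 term-frequency component (IDF factor omitted)."""
    # Length normalisation: 1.0 when b == 0, doc_len / avgdl when b == 1.
    norm = 1.0 - b + b * (doc_len / avgdl)
    # Saturating term frequency: approaches k1 + 1 as tf grows.
    return tf * (k1 + 1.0) / (tf + k1 * norm)


# With b = 0.0 the document length has no effect on the weight.
short = bm25_tf_weight(tf=3, doc_len=64)
long_ = bm25_tf_weight(tf=3, doc_len=1024)

# With b = 1.0 a longer-than-average document is penalised.
penalised = bm25_tf_weight(tf=3, doc_len=1024, b=1.0)
```

With the defaults listed above, `short` and `long_` are equal because length normalisation is switched off, while `penalised` is smaller because the document is four times the average length.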

language

Language hint for tokenisation / stop-word handling. Use detect (default) to auto-detect, or an ISO-639 code (en, fr, …) when you know the language up front.

Input

Text (required)

The text to embed.

Output

Results

JSON object holding the sparse vector (indices + values arrays).
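As an illustration of how such an object might be assembled, the sketch below counts tokens, hashes each token to an integer index, and serialises the result. Everything in it is an assumption rather than the node's actual implementation: the whitespace tokeniser, the CRC32 hashing, the `VOCAB_SIZE` constant, and the use of term-frequency weights without an IDF factor.

```python
import json
import zlib
from collections import Counter

VOCAB_SIZE = 2 ** 20  # hypothetical size of the hashed index space


def to_sparse_vector(text: str, k1: float = 1.2) -> dict:
    """Build {"indices": [...], "values": [...]} from text (illustrative only)."""
    # Naive tokenisation: lowercase and split on whitespace.
    counts = Counter(text.lower().split())
    weights = {}
    for token, tf in counts.items():
        # Hash the token to a stable integer index (collisions are possible).
        index = zlib.crc32(token.encode("utf-8")) % VOCAB_SIZE
        # Saturating term-frequency weight with length normalisation disabled.
        weight = tf * (k1 + 1.0) / (tf + k1)
        weights[index] = weights.get(index, 0.0) + weight
    indices = sorted(weights)
    return {"indices": indices, "values": [weights[i] for i in indices]}


vector = to_sparse_vector("sparse vectors pair with dense vectors")
print(json.dumps(vector))
```

The two arrays are parallel: the value at position i is the weight of the term hashed to the index at position i, which is why they must always have the same length.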
