BM25 statistics for growing and dynamic data
[Figure: two-pass query flow for globally accurate BM25 across slabs]

Pass 1 (terms out, term statistics back): The client sends a query, for example "machine learning", to the query coordinator. The coordinator sends the query terms to slab_0, slab_1, and slab_2, and each slab returns its precomputed term statistics.

Merge: The coordinator combines the per-slab statistics into merged global statistics: total_docs, total_tokens, and term_doc_freqs.

Pass 2 (global statistics out, top-k back): The coordinator sends the merged global statistics to every slab. All slabs score and rank against the same merged statistics, and each returns its top-k. The coordinator merges the top-k lists and returns the ranked results to the client.
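The two-pass flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: the `Slab` class, `merge_stats`, and `search` are hypothetical names, tokenization is a plain whitespace split, and the scoring uses the common Lucene-style BM25 variant with assumed defaults `k1=1.2` and `b=0.75`. Only the statistic names `total_docs`, `total_tokens`, and `term_doc_freqs` come from the diagram.

```python
import heapq
import math
from collections import Counter


class Slab:
    """Hypothetical slab: a shard of tokenized docs with precomputed statistics."""

    def __init__(self, docs):
        self.docs = docs  # doc_id -> list of tokens
        # Precompute local statistics once, when the slab is built.
        self.total_tokens = sum(len(toks) for toks in docs.values())
        self.doc_freqs = Counter(t for toks in docs.values() for t in set(toks))

    def term_stats(self, terms):
        # Pass 1: return local statistics for the query terms only.
        return {
            "total_docs": len(self.docs),
            "total_tokens": self.total_tokens,
            "term_doc_freqs": {t: self.doc_freqs.get(t, 0) for t in terms},
        }

    def score_top_k(self, terms, g, k, k1=1.2, b=0.75):
        # Pass 2: score local docs against the merged global statistics.
        avgdl = g["total_tokens"] / g["total_docs"]
        scored = []
        for doc_id, toks in self.docs.items():
            tf = Counter(toks)
            score = 0.0
            for t in terms:
                if tf[t] == 0:
                    continue
                df = g["term_doc_freqs"][t]
                # Assumed IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
                idf = math.log(1 + (g["total_docs"] - df + 0.5) / (df + 0.5))
                norm = tf[t] + k1 * (1 - b + b * len(toks) / avgdl)
                score += idf * tf[t] * (k1 + 1) / norm
            if score > 0:
                scored.append((score, doc_id))
        return heapq.nlargest(k, scored)


def merge_stats(per_slab):
    # Coordinator step: sum per-slab counts into one set of global statistics.
    merged = {"total_docs": 0, "total_tokens": 0, "term_doc_freqs": Counter()}
    for s in per_slab:
        merged["total_docs"] += s["total_docs"]
        merged["total_tokens"] += s["total_tokens"]
        merged["term_doc_freqs"].update(s["term_doc_freqs"])
    return merged


def search(slabs, query, k=3):
    # Deduplicate terms while keeping their order.
    terms = list(dict.fromkeys(query.lower().split()))
    g = merge_stats([s.term_stats(terms) for s in slabs])  # pass 1 + merge
    if g["total_docs"] == 0:
        return []
    partials = [s.score_top_k(terms, g, k) for s in slabs]  # pass 2
    # Merge the per-slab top-k lists into one global top-k.
    return heapq.nlargest(k, (hit for part in partials for hit in part))


if __name__ == "__main__":
    slabs = [
        Slab({"a": ["machine", "learning", "basics"], "b": ["deep", "learning"]}),
        Slab({"c": ["machine", "shop", "tools"], "d": ["cooking", "recipes"]}),
        Slab({"e": ["machine", "learning", "machine", "learning"]}),
    ]
    for score, doc_id in search(slabs, "machine learning"):
        print(f"{doc_id}\t{score:.4f}")
```

Because every slab scores with the same merged `total_docs`, `total_tokens`, and `term_doc_freqs`, the scores in this sketch match what a single index over all documents would produce. A new slab only has to precompute its own local counts, which is why the scheme suits growing data.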