SimpleTokenizer → RemoveLongFilter(max_term_len=256) → LowerCaser → StopWordFilter → Stemmer
SimpleTokenizer (step 1 of 5): split on whitespace and punctuation
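
The five stages above can be sketched in plain Python to show how a string flows through the chain. This is only an illustrative approximation, not the library's implementation. The stop-word list, the `toy_stem` suffix stripper (a stand-in for a real stemmer), the helper names, and the choice to keep tokens of exactly `max_term_len` characters are all assumptions made for the example.

```python
import re

# Hypothetical stop-word list; the source does not say which list is used.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def simple_tokenize(text):
    """Stage 1: split on whitespace and punctuation, keeping alphanumeric runs."""
    return re.findall(r"[^\W_]+", text)


def remove_long(tokens, max_term_len=256):
    """Stage 2: drop overly long tokens (boundary handling is an assumption)."""
    return [t for t in tokens if len(t) <= max_term_len]


def lower_case(tokens):
    """Stage 3: lowercase every token."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens):
    """Stage 4: drop tokens found in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def toy_stem(token):
    """Stage 5 stand-in: strip one common English suffix (not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, max_term_len=256):
    """Run the stages in the same order as the pipeline shown above."""
    tokens = simple_tokenize(text)
    tokens = remove_long(tokens, max_term_len)
    tokens = lower_case(tokens)
    tokens = remove_stop_words(tokens)
    return [toy_stem(t) for t in tokens]
```

With these assumptions, `analyze("The Quick foxes are JUMPING over the dogs!")` returns `['quick', 'fox', 'jump', 'over', 'dog']`. Order matters here: lowercasing runs before stop-word removal, so "The" is matched against the lowercase stop-word list.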