Tokenization
Definition
The process of breaking text into smaller units called tokens, which can be words, subwords, or characters. Tokenization is the first step in natural language processing pipelines and determines how text is represented for model input.
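The three granularities mentioned above can be sketched in plain Python. This is an illustrative toy, not any specific library's tokenizer; the tiny `vocab` set and the `subword_tokenize` helper are hypothetical, invented here to show the idea of greedy longest-match subword splitting.

```python
# Toy sketch of three tokenization granularities (not a real tokenizer).

text = "Tokenization matters."

# Word-level: split on whitespace, treating final punctuation as its own token.
word_tokens = text.replace(".", " .").split()

# Character-level: every character is a token.
char_tokens = list(text)

# Subword-level (toy): greedily take the longest prefix found in a
# hypothetical vocabulary; unknown characters fall back to themselves.
vocab = {"Token", "ization", "matter", "s", ".", " "}

def subword_tokenize(s, vocab):
    tokens = []
    while s:
        for end in range(len(s), 0, -1):  # longest match first
            if s[:end] in vocab:
                tokens.append(s[:end])
                s = s[end:]
                break
        else:
            tokens.append(s[0])  # no vocab match: emit single character
            s = s[1:]
    return tokens

print(word_tokens)                    # ['Tokenization', 'matters', '.']
print(char_tokens[:5])                # ['T', 'o', 'k', 'e', 'n']
print(subword_tokenize(text, vocab))  # ['Token', 'ization', ' ', 'matter', 's', '.']
```

Production subword tokenizers (e.g. byte-pair encoding or WordPiece) learn their vocabulary from data rather than hand-picking it, but the longest-match splitting step follows the same pattern.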