- It can range from being a Word Token to being a Punctuation Token.
- It can be associated with a POS Tag.
- It can be associated with a Text Token Location.
- It can range from being a Single-Token Word Mention to being a Multi-Token Word Mention.
- It can be a String Member of a Text Token String.
- It can be represented by a Text Token Predictor Feature (from a text token feature space).
- It can be identified by a Text Tokenization Task.
- For example, "sentence" is the 4th token in "This is a sentence."
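The position and location of a token can be illustrated with a minimal sketch. It assumes a simple regular-expression tokenizer in which a token is either a run of word characters or a single punctuation character; the pattern and the names `TOKEN_PATTERN` and `tokenize` are hypothetical choices for illustration, not a standard tokenization scheme.

```python
import re

# Assumed token definition: a run of word characters, or one non-space symbol.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Return (token, start_offset, end_offset) triples in reading order."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_PATTERN.finditer(text)]

tokens = tokenize("This is a sentence.")

for position, (token, start, end) in enumerate(tokens, start=1):
    # Classify each token as a word token or a punctuation token.
    kind = "word" if re.fullmatch(r"\w+", token) else "punctuation"
    print(position, repr(token), (start, end), kind)
```

Under this assumed pattern the sentence yields five tokens: four word tokens followed by one punctuation token, with "sentence" in 4th position at character offsets 10 to 18.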
- See: Contiguous Substring, Text Token Window.
- (Manning et al., 2008) ⇒ Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. (2008). “Introduction to Information Retrieval.” Cambridge University Press. ISBN:0521865719.
- QUOTE: Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation. ... A token is an instance of a sequence of characters in some particular document that are grouped together as a useful semantic unit for processing. A type is the class of all tokens containing the same character sequence. A term is a (perhaps normalized) type that is included in the IR system's dictionary.
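The distinction between tokens, types, and terms in the quote above can be sketched as follows. The sample sentence, the regular-expression tokenizer, and the normalization rule (lowercasing and discarding punctuation) are all assumptions made for illustration; a real IR system would choose its own normalization and dictionary.

```python
import re

def simple_tokenize(text):
    """Assumed tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

text = "To be, or not to be."

# Tokens: every occurrence counts, including repeats and punctuation.
tokens = simple_tokenize(text)

# Types: the distinct character sequences among the tokens.
types = set(tokens)

# Terms: normalized types kept in the dictionary
# (assumed normalization: lowercase, punctuation thrown away).
terms = {t.lower() for t in types if t.isalnum()}

print(len(tokens), len(types), len(terms))  # prints: 8 7 4
```

Here "To" and "to" are distinct types because their character sequences differ, but they collapse into the single term "to" once the assumed lowercasing is applied.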