- AKA: Tokenized Text Document.
- See: Annotated Document, CoNLL Format.
- (Manning et al., 2008) ⇒ Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. (2008). “Introduction to Information Retrieval.” Cambridge University Press. ISBN:0521865719.
- (Reiss et al., 2008) ⇒ Frederick Reiss, Sriram Raghavan, Rajasekar Krishnamurthy, Huaiyu Zhu, and Shivakumar Vaithyanathan. (2008). “An Algebraic Approach to Rule-Based Information Extraction.” In: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering (ICDE 2008). doi:10.1109/ICDE.2008.4497502
- QUOTE: Dictionary matching is a fairly expensive operation that involves tokenizing the current document’s text and looking for all occurrences of the set of words and phrases listed in a specified dictionary. ... Even when documents are tokenized at the very beginning of the processing pipeline, an entire pass over these tokens for each Ed operator requires thousands of probes into the dictionary data structures. ... A DictEval that produces a set of matching spans given a dictionary and a tokenized document.
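The dictionary-matching step described in the quote above can be illustrated with a minimal Python sketch. It is not the implementation from Reiss et al. (2008): the function names `tokenize`, `build_dictionary`, and `dict_eval` are hypothetical, and the regex word tokenizer, lowercase normalization, and character-offset spans are assumptions chosen for illustration. The document is tokenized once, and each token position is then probed against a dictionary indexed by the first word of each entry.

```python
import re
from typing import Dict, List, Set, Tuple

Token = Tuple[str, int, int]   # (normalized text, start offset, end offset)
Span = Tuple[int, int]         # (start offset, end offset) in the original text
Index = Dict[str, Set[Tuple[str, ...]]]


def tokenize(text: str) -> List[Token]:
    """Split text into lowercase word tokens with character offsets (assumed scheme)."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def build_dictionary(entries: List[str]) -> Index:
    """Index each dictionary entry (word or phrase) by its first token."""
    index: Index = {}
    for entry in entries:
        words = tuple(tok for tok, _, _ in tokenize(entry))
        if words:
            index.setdefault(words[0], set()).add(words)
    return index


def dict_eval(index: Index, tokens: List[Token]) -> List[Span]:
    """Return the spans of all dictionary entries found in a tokenized document."""
    words = [tok for tok, _, _ in tokens]
    spans: List[Span] = []
    for i, word in enumerate(words):
        # One probe per token position; only entries starting with this word are checked.
        for phrase in index.get(word, ()):
            n = len(phrase)
            if tuple(words[i:i + n]) == phrase:
                spans.append((tokens[i][1], tokens[i + n - 1][2]))
    return sorted(spans)


if __name__ == "__main__":
    text = "IBM Research and New York University met in New York."
    tokens = tokenize(text)  # tokenize once, reuse for every dictionary
    index = build_dictionary(["New York", "New York University", "IBM"])
    for start, end in dict_eval(index, tokens):
        print(start, end, text[start:end])
```

Because the tokenized document is passed in as an argument, the same token list can be reused across several dictionaries, which reflects the quote's point that tokenization is done once while each dictionary operator still requires its own pass of probes over the tokens.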