TextRunner System

A TextRunner System is an Information Extraction System developed at the University of Washington that can solve a Web-based Open Information Extraction Task.

It consists of three modules:
1. Self-Supervised Learner: Given a small corpus sample as input, the Learner outputs a classifier that labels candidate extractions as "trustworthy" or not. The Learner requires no hand-tagged data.
2. Single-Pass Extractor: The Extractor makes a single pass over the entire corpus to extract tuples for all possible relations. It does not use a parser. For each sentence, it generates one or more candidate tuples, sends each candidate to the classifier, and retains those labeled trustworthy.
3. Redundancy-Based Assessor: The Assessor assigns a probability to each retained tuple using a probabilistic model of redundancy in text introduced by Downey et al. (2005): the more distinct sentences a tuple is extracted from, the higher its probability of being correct.
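The three modules above can be wired together in a minimal Python sketch. Everything beyond the module boundaries is an assumption, not TextRunner's actual implementation: candidates are pairs of adjacent capitalized word runs (a hypothetical stand-in for real shallow linguistic processing), a hypothetical length and clause-boundary rule replaces TextRunner's automatic labeling step, the classifier is a small Bernoulli naive Bayes model over two hypothetical features, and the Assessor uses a simple noisy-or combination, 1 - (1 - p)^k for a tuple supported by k sentences, in place of the Downey et al. (2005) model. All names (candidates, self_label, learn, extract, assess, p_single) and the toy sentences are illustrative only.

```python
import re
from collections import Counter
from math import log
from typing import Callable, NamedTuple


class Triple(NamedTuple):
    arg1: str
    rel: str
    arg2: str


# Hypothetical shallow heuristics, not TextRunner's actual features.
ENTITY = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")
BOUNDARY_WORDS = {"which", "who", "that", "because", "although"}


def candidates(sentence: str) -> list[Triple]:
    """Pair adjacent capitalized spans; the text between is the relation."""
    spans = list(ENTITY.finditer(sentence))  # no parser involved
    out = []
    for left, right in zip(spans, spans[1:]):
        rel = sentence[left.end():right.start()].strip(" ,").lower()
        if rel:
            out.append(Triple(left.group(), rel, right.group()))
    return out


def features(t: Triple) -> tuple[bool, bool]:
    words = t.rel.split()
    # (short relation phrase?, crosses a clause boundary?)
    return len(words) <= 4, any(w in BOUNDARY_WORDS for w in words)


def self_label(t: Triple) -> bool:
    """Hypothetical labeling rule standing in for TextRunner's automatic
    labeling of the sample; no hand-tagged data is needed."""
    words = t.rel.split()
    return len(words) <= 4 and not any(w in BOUNDARY_WORDS for w in words)


def learn(sample: list[str]) -> Callable[[Triple], bool]:
    """Learner: fit Bernoulli naive Bayes to self-labeled candidates."""
    on = {True: Counter(), False: Counter()}  # per-class feature-on counts
    totals = Counter()                        # per-class example counts
    for sentence in sample:
        for t in candidates(sentence):
            y = self_label(t)
            totals[y] += 1
            for i, f in enumerate(features(t)):
                on[y][i] += int(f)
    n = sum(totals.values())

    def classify(t: Triple) -> bool:
        """Return True if the candidate is labeled trustworthy."""
        score = {}
        for y in (True, False):
            s = log((totals[y] + 1) / (n + 2))  # Laplace-smoothed prior
            for i, f in enumerate(features(t)):
                p = (on[y][i] + 1) / (totals[y] + 2)
                s += log(p if f else 1 - p)
            score[y] = s
        return score[True] > score[False]

    return classify


def extract(corpus: list[str], classify: Callable[[Triple], bool]) -> Counter:
    """Extractor: a single pass, counting each kept triple once per sentence."""
    found = Counter()
    for sentence in corpus:
        for t in set(candidates(sentence)):
            if classify(t):
                found[t] += 1
    return found


def assess(found: Counter, p_single: float = 0.6) -> dict[Triple, float]:
    """Noisy-or stand-in for the redundancy model: 1 - (1 - p)^k."""
    return {t: 1 - (1 - p_single) ** k for t, k in found.items()}


SAMPLE = [
    "Microsoft is headquartered in Redmond.",
    "Edison was born in Ohio.",
    "Paris is a city that attracts many visitors from Japan.",
    "Berlin has many museums which are visited by tourists from Spain.",
]
CORPUS = SAMPLE[:3] + [
    "Microsoft is headquartered in Redmond, according to its website.",
]

if __name__ == "__main__":
    classify = learn(SAMPLE)
    for triple, prob in assess(extract(CORPUS, classify)).items():
        print(f"{prob:.2f}  {triple}")
```

Running the script trains the classifier on SAMPLE, makes one pass over CORPUS, and prints each retained tuple with its probability. The Paris sentence is rejected because its long relation phrase crosses the clause boundary marked by "that", while the Microsoft tuple, found in two sentences, scores higher than the Edison tuple, found in one. Noisy-or was chosen only as the simplest way to turn redundancy counts into a probability; it treats every supporting sentence as independent evidence with the same fixed reliability p_single, which is a strong simplification.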