Eugene Agichtein

Eugene Agichtein is a person.





  • (Agichtein, 2005a) ⇒ Eugene Agichtein. (2005). “Scaling Information Extraction to Large Document Collections.” In: IEEE Data Eng. Bull., 28(4).
  • (Agichtein, 2005b) ⇒ Eugene Agichtein. (2005). “Extracting Relations from Large Text Collections.” PhD thesis, Columbia University, New York.
    • ABSTRACT: A wealth of information is hidden within unstructured text. Often, this information can be best exploited in structured or relational form, which is well suited for sophisticated query processing, for integration with relational database management systems, and for data mining. This thesis addresses two fundamental problems in extracting relations from large text collections: (1) portability: tuning extraction systems for new domains and (2) scalability: scaling up information extraction to large collections of documents. To address the first problem, we developed the Snowball information extraction system, a domain-independent system that learns to extract relations from unstructured text based on only a handful of user-provided example relation instances. Snowball can then be adapted to extract new relations with minimum human effort. Snowball improves the extraction accuracy by automatically evaluating the quality of both the acquired extraction patterns and the extracted relation instances. To address the second problem, we developed the QXtract system, which learns search engine queries that retrieve the documents that are relevant to a given information extraction system and extraction task. QXtract can dramatically improve the efficiency of the information extraction process, and provides a building block for extracting structured information and text data mining from the web at large.