Keywords: Relation Recognition from Text Algorithm, Dependency Grammar-based Relation Recognition Algorithm, LEILA
Summary
Contributions
- Applies [[Syntactic_Parsing?]] information to Supervised Relation Recognition with high F-score.
- First known application of [[kNN?]] and [[SVM?]] to pattern selection. Past approaches use heuristics or …
- Proposes a [[Text_Graph?]] representation that is expressive and robust.
- The patterns reported in the empirical studies are interesting and reasonable (see Comments).
- Achieves good empirical results on several datasets.
- Their system is available to test.
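The text-graph idea can be illustrated with a minimal sketch. It assumes each sentence is an undirected graph whose nodes are words and whose edges are parser links, and that the shortest path between the two entity nodes forms the core of a pattern. The `shortest_path` function and the link list are hypothetical, not LEILA's actual parser output. Nodes are keyed by word strings for brevity; real code would key them by token position so that repeated words stay distinct.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Return the nodes on a shortest path from source to target, or None."""
    # Build an undirected adjacency list from the link pairs.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    # Breadth-first search, recording each node's predecessor.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    if target not in previous:
        return None
    # Walk predecessors back from the target, then reverse.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical links for "Chopin was great among the composers of his time."
links = [("Chopin", "was"), ("was", "great"), ("great", "among"),
         ("among", "composers"), ("the", "composers"), ("composers", "of"),
         ("of", "time"), ("his", "time")]
print(shortest_path(links, "Chopin", "composers"))
```

Words off the path ("the", "of his time") are dropped, which is the robustness argument for path-based patterns.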
Input/Output
Input
- positive examples
- (relationship direction and cardinality: one-to-one, one-to-many, or many-to-one?)
- annotated data (POS tags, Link Grammar linkages, named-entity tags, e.g. person, date, ...)
Output
- relation instances (including duplicates)
- and their associated patterns
Patterns
Questions, Patterns
- How are the boundaries of multi-word entities detected?
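The paper's answer to this question is not recorded here. One common heuristic is to merge adjacent tokens that carry the same NER tag. The sketch below is hypothetical and not necessarily what LEILA does; the tag names and the `group_entities` helper are assumptions.

```python
def group_entities(tokens, tags):
    """Merge runs of adjacent tokens sharing the same non-"O" NER tag into spans."""
    spans = []
    current_words, current_tag = [], None
    for word, tag in zip(tokens, tags):
        # Extend the open span if this token continues the same entity tag.
        if tag != "O" and tag == current_tag:
            current_words.append(word)
            continue
        # Otherwise close any open span...
        if current_words:
            spans.append((" ".join(current_words), current_tag))
        # ...and start a new one unless the token is outside any entity.
        current_words, current_tag = ([word], tag) if tag != "O" else ([], None)
    if current_words:
        spans.append((" ".join(current_words), current_tag))
    return spans


tokens = ["Frederic", "Chopin", "was", "born", "in", "Zelazowa", "Wola"]
tags = ["PERSON", "PERSON", "O", "O", "O", "LOCATION", "LOCATION"]
print(group_entities(tokens, tags))
```

A known weakness of this heuristic is that two distinct entities with the same tag that happen to be adjacent get merged into one span. This is why the boundary question matters.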
Algorithm
1. Discovery Phase (Train Phase)
- patterns are constructed from all sentences that contain both entities of a positive example.
2. Assessment Phase (Train Phase)
- negative examples (counterexamples) are gathered
- negative patterns are constructed
- a discriminative model (kNN or SVM) is induced
3. Harvesting Phase (Test Phase)
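The three phases can be sketched as follows, under several assumptions:

- Each sentence is already reduced to (x, y, pattern), where the pattern is a tuple of path words.
- The classifier is a plain k-nearest-neighbour vote using Jaccard similarity. This is swapped in for the paper's actual kNN/SVM features and kernels.

All names (`train`, `classify`, `harvest`) and the toy data are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between the word sets of two patterns."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def train(sentences, examples, counterexamples):
    """Discovery + assessment: label the pattern of each sentence mentioning a known pair."""
    training = []
    for x, y, pattern in sentences:
        if (x, y) in examples:
            training.append((pattern, True))   # discovery: positive pattern
        elif (x, y) in counterexamples:
            training.append((pattern, False))  # assessment: negative pattern
    return training


def classify(training, pattern, k=3):
    """Majority vote among the k training patterns most similar to `pattern`."""
    neighbours = sorted(training, key=lambda item: jaccard(item[0], pattern),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes[True] > votes[False]


def harvest(training, sentences, k=3):
    """Harvesting: keep every test sentence whose pattern is classified positive."""
    return [(x, y, p) for x, y, p in sentences if classify(training, p, k)]


train_sentences = [
    ("Chopin", "1810", ("was", "born", "in")),
    ("Mozart", "1756", ("was", "born", "in")),
    ("Chopin", "1849", ("died", "in")),
    ("Mozart", "1791", ("died", "in")),
]
examples = {("Chopin", "1810"), ("Mozart", "1756")}
counterexamples = {("Chopin", "1849"), ("Mozart", "1791")}
training = train(train_sentences, examples, counterexamples)

test_sentences = [("Bach", "1685", ("was", "born", "in")),
                  ("Bach", "1750", ("died", "in"))]
print(harvest(training, test_sentences))
```

Sentences whose entity pair is neither an example nor a counterexample are ignored during training. This is one reason the counterexample question below matters.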
Questions, Algorithm
- How does counterexample selection know which entity must be held fixed? E.g. in "Chopin"/"1957", would the algorithm know that the relationship is one-to-many and that the 'many' is associated with the year?
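The question can be made concrete with a small sketch. Any counterexample rule has to assume which argument is functional. The hypothetical `counterexamples` helper below fixes the first argument of each pair. For a birth-year relation stored as (person, year), it labels other years for a known person as counterexamples. If the pair order were reversed, however, other people born in the same year would be wrongly labelled as counterexamples. The helper and data are illustrative assumptions, not the paper's procedure.

```python
def counterexamples(candidate_pairs, examples):
    """Hypothetical rule assuming the relation is functional in its FIRST argument:
    a candidate (x, y2) is a counterexample if x has a known example (x, y) with y2 != y."""
    known = {}
    for x, y in examples:
        known.setdefault(x, set()).add(y)
    return {(x, y) for x, y in candidate_pairs if x in known and y not in known[x]}


# Stored as (person, year): only the wrong year for Chopin is flagged;
# Liszt has no known example, so his pair is left unlabelled.
good = counterexamples({("Chopin", "1810"), ("Chopin", "1849"), ("Liszt", "1811")},
                       {("Chopin", "1810")})

# Stored as (year, person): Schumann, also born in 1810, is wrongly flagged.
flipped = counterexamples({("1810", "Schumann")}, {("1810", "Chopin")})
print(good, flipped)
```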
General Questions
- What is the relationship of their approach with [Bunescu and Mooney, 2005] and with [Chiang and Yu, 2005]?
- E.g. how do their definitions of "shortest path" differ?
- It would be interesting to see all of these competitors evaluated on the same datasets.
- The patterns reported in the empirical studies do not demonstrate the benefit of the shortest path or of replacing nouns and adjectives with placeholders. For example, the pattern used in Figure 3, "<X> was <ADJECTIVE> among <Y>", is not one of the patterns reported in Section 4.2.1.
- Why is there a discrepancy between the Snowball F-score of ~30% that they report and the 80%+ figures reported in [Agichtein and Gravano, 2000]? This is especially disconcerting given that the main pattern reported is the same one that Snowball (and DIPRE) have been shown to discover: "Y-based X".
- The claims about the contribution of anaphora resolution are unconvincing: Snowball would also benefit directly from accurate tagging of anaphors as entities.
- Unlike Snowball, there is no support for text that occurs before or after the entities. This is a minor issue given that the value of prefiller and postfiller patterns …
- The test sets are very small (in the hundreds).
- The sizes of the training sets are not reported.
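The kind of placeholder generalization questioned above can be sketched briefly. The sketch assumes the path is given as (word, POS) pairs with Penn Treebank-style tags. The tag-to-placeholder mapping and the `generalize` helper are hypothetical, not LEILA's implementation, which works from Link Grammar output.

```python
# Hypothetical mapping from Penn Treebank-style tags to pattern placeholders.
PLACEHOLDERS = {"JJ": "<ADJECTIVE>", "NN": "<NOUN>", "NNS": "<NOUN>"}


def generalize(path, x, y):
    """Replace the two entities with <X>/<Y> and mapped POS classes with placeholders."""
    out = []
    for word, pos in path:
        if word == x:
            out.append("<X>")
        elif word == y:
            out.append("<Y>")
        else:
            # Words whose tag is not in the mapping are kept verbatim.
            out.append(PLACEHOLDERS.get(pos, word))
    return " ".join(out)


path = [("Chopin", "NNP"), ("was", "VBD"), ("great", "JJ"),
        ("among", "IN"), ("composers", "NNS")]
print(generalize(path, "Chopin", "composers"))
```

With this toy path, the result is the Figure 3 pattern. That pattern is the one the review notes is missing from the reported results.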
BibTeX
@InProceedings{conf/kdd/SuchanekIW06,
title = "Combining linguistic and statistical analysis to
extract relations from web documents",
author = "Fabian M. Suchanek and Georgiana Ifrim and Gerhard
Weikum",
bibdate = "2006-10-05",
bibsource = "DBLP,
http://dblp.uni-trier.de/db/conf/kdd/kdd2006.html#SuchanekIW06",
booktitle = "Proceedings of the Twelfth {ACM} {SIGKDD}
International Conference on Knowledge Discovery and
Data Mining, Philadelphia, {PA}, {USA}, August 20-23,
2006",
publisher = "ACM",
year = "2006",
editor = "Tina Eliassi-Rad and Lyle H. Ungar and Mark Craven and
Dimitrios Gunopulos",
ISBN = "1-59593-339-5",
pages = "712--717",
URL = "http://doi.acm.org/10.1145/1150402.1150492",}