PPLRE Research Topics - Sentences with Many Relations
- Synopsis: Past Relation Recognition Algorithms have mainly been applied to tasks where the Sentences contain at most one instance of the sought relation and few, if any, extraneous entities to confound the pattern search. For example, a sentence with a Company/Headquarter proposition typically will not state more than one such relation, nor mention other companies or locations. There is an opportunity to improve both recall and precision in domains such as PPLRE, whose corpus consists of summarized information written in a technical domain. One idea is to build a model that predicts whether two entities participate in all of the same relations stated in the sentence.
- This challenge may manifest itself in determining which organism the relationship refers to. It is exacerbated by the regular presence of other organisms used to support a statement; E. coli, for example, is often mentioned.
- Increased need for Coreference Resolution and Anaphora Resolution (e.g. “this protein is localized”).
- Reference: PPLRE Corpus 491.a.2
- "In this study, Escherichia coli<ORGANISM> TonB<PROTEIN-1> was found to be distributed in sucrose density gradients approximately equally between the cytoplasmic membrane<LOCATION-1> and the outer membrane<LOCATION-2> fractions, while two proteins with which it is known to interact, ExbB<PROTEIN-2> and ExbD<PROTEIN-3>, as well as the NADH<PROTEIN-4> oxidase activity characteristic of the cytoplasmic membrane<LOCATION-3>, were localized in the cytoplasmic membrane<LOCATION-4> fraction."
- It is a long sentence.
- It contains many relations: 5.
- Not all permutations are valid: of the 16 candidate protein/location pairings (4 proteins × 4 location mentions), 11 are invalid.
- The problem may be simplified by inducing a model that groups entities, e.g. one that places "ExbB and ExbD, as well as the NADH oxidase" into a single group.
- Reference: PPLRE Corpus 6061.a.6
- "Isolated T. pallidum<ORGANISM> outer membrane<LOCATION-1> was devoid of the 19-kDa 4D<PROTEIN-1> protein and the normally abundant 47-kDa<PROTEIN-2> lipoprotein known to be associated with the cytoplasmic membrane<LOCATION-2>; only trace amounts of the periplasmic endoflagella<LOCATION-3> were detected."
- It contains only one relation out of the six possible permutations (2 proteins × 3 locations).
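
The candidate counts in the two examples can be reproduced with a short sketch. It assumes the inline tag format shown in the quoted sentences (angle brackets or parentheses) and treats every tagged protein/location mention pair as a candidate. The function names and the hand-specified entity grouping are illustrative assumptions, not part of PPLRE; the grouping would in practice come from the induced model.

```python
import re
from itertools import product

# Matches inline tags such as <PROTEIN-1> or (LOCATION-2).
# Both bracket styles are accepted because both appear in the quoted examples.
TAG = re.compile(r"[<(](ORGANISM|PROTEIN|LOCATION)(?:-(\d+))?[>)]")

def entity_tags(annotated):
    """Return a dict mapping each entity type to the tag labels found, in order."""
    found = {"ORGANISM": [], "PROTEIN": [], "LOCATION": []}
    for match in TAG.finditer(annotated):
        kind, index = match.group(1), match.group(2)
        found[kind].append(f"{kind}-{index}" if index else kind)
    return found

def candidate_pairs(annotated):
    """Enumerate every (protein, location) mention pair in one sentence."""
    tags = entity_tags(annotated)
    return list(product(tags["PROTEIN"], tags["LOCATION"]))

def grouped_candidates(groups, locations):
    """Pair each entity group (a tuple of tags) with each location tag."""
    return list(product(groups, locations))

# Tag skeletons of the two corpus sentences (surrounding words shortened).
sentence_491 = ("E. coli<ORGANISM> TonB<PROTEIN-1> ... membrane<LOCATION-1> ... "
                "membrane<LOCATION-2> ... ExbB<PROTEIN-2> and ExbD<PROTEIN-3> ... "
                "NADH<PROTEIN-4> ... membrane<LOCATION-3> ... membrane<LOCATION-4>")
sentence_6061 = ("T. pallidum<ORGANISM> outer membrane(LOCATION-1) ... 4D(PROTEIN-1) ... "
                 "47-kDa(PROTEIN-2) ... membrane(LOCATION-2) ... endoflagella(LOCATION-3)")

pairs_491 = candidate_pairs(sentence_491)
pairs_6061 = candidate_pairs(sentence_6061)
print(len(pairs_491), len(pairs_491) - 5)    # candidates, invalid given 5 stated relations
print(len(pairs_6061), len(pairs_6061) - 1)  # candidates, invalid given 1 stated relation

# Hypothetical grouping: TonB alone; ExbB, ExbD and NADH oxidase together.
groups = [("PROTEIN-1",), ("PROTEIN-2", "PROTEIN-3", "PROTEIN-4")]
grouped_491 = grouped_candidates(groups, entity_tags(sentence_491)["LOCATION"])
print(len(grouped_491))                      # group-level decisions instead of pair-level
```

Under this grouping the first sentence needs a decision per group/location pair rather than per protein/location pair, which is the simplification the grouping idea is after.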