2017 Chains of Reasoning over Entities, Relations, and Text Using Recurrent Neural Networks

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Abstract

Our goal is to combine the rich multistep inference of symbolic logical reasoning with the generalization capabilities of neural networks. We are particularly interested in complex reasoning about entities and relations in text and large-scale knowledge bases (KBs). Neelakantan et al. (2015) use RNNs to compose the distributed semantics of multi-hop paths in KBs; however, for multiple reasons, the approach lacks accuracy and practicality. This paper proposes three significant modeling advances: (1) we learn to jointly reason about relations, entities, and entity-types; (2) we use neural attention modeling to incorporate multiple paths; (3) we learn to share strength in a single RNN that represents logical composition across all relations. On a large-scale Freebase + ClueWeb prediction task, we achieve 25% error reduction, and a 53% error reduction on sparse relations due to shared strength. On chains of reasoning in WordNet we reduce error in mean quantile by 84% versus previous state-of-the-art. The code and data are publicly available. [1]

1 Introduction

There is a rising interest in extending neural networks to perform more complex reasoning, formerly addressed only by symbolic and logical reasoning systems. So far this work has mostly focused on small or synthetic data (Grefenstette, 2013; Bowman et al., 2015; Rocktäschel and Riedel, 2016). Our interest is primarily in reasoning about large knowledge bases (KBs) with diverse semantics, populated from text. One method for populating a KB from text (and for representing diverse semantics in the KB) is Universal Schema (Riedel et al., 2013; Verga et al., 2016), which learns vector embeddings capturing the semantic positioning of relation types: the union of all input relation types, both from the schemas of multiple structured KBs, as well as expressions of relations in natural language text.

i. place.birth(a, b) ← 'was born in'(a, x) ∧ 'commonly known as'(x, b)
ii. location.contains(a, b) ← (nationality)⁻¹(a, x) ∧ place.birth(x, b)
iii. book.characters(a, b) ← 'aka'(a, x) ∧ (theater.character.plays)⁻¹(x, b)
iv. cause.death(a, b) ← 'contracted'(a, b)

Table 1: Several highly probable clauses learnt by our model. The textual relations are shown in quotes and italicized. Our model has the ability to combine textual and schema relations. r⁻¹ is the inverse of relation r, i.e. r(a, b) ⇔ r⁻¹(b, a).

An important reason to populate a KB is to support not only look-up-style question answering, but reasoning on its entities and relations in order to make inferences not directly stored in the KB. KBs are often highly incomplete (Min et al., 2013), and reasoning can fill in these missing facts. The “matrix completion” mechanism that underlies the common implementation of Universal Schema can thus be seen as a simple type of reasoning, as can other work in tensor factorization (Nickel et al., 2011; Bordes et al., 2013; Socher et al., 2013). However, these methods can be understood as operating on single pieces of evidence: for example, inferring that Microsoft–located-in–Seattle implies Microsoft–HQ-in–Seattle.
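To make the contrast with multi-hop reasoning concrete, the sketch below scores a single candidate fact as the dot product between an entity-pair embedding and a relation-type embedding, in the spirit of Universal Schema matrix completion. The embedding size, toy vocabulary, and random vectors are illustrative assumptions, not the paper's actual code or data.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # illustrative embedding size

# Toy vocabularies: entity pairs (matrix rows) and relation types (columns),
# mixing structured-KB relations with textual relation patterns.
entity_pairs = ["(Microsoft, Seattle)", "(Melinda, Seattle)"]
relations = ["located-in", "HQ-in", "'is based in'"]

# Learned embeddings would come from matrix factorization; here they are random.
pair_emb = {p: rng.normal(size=DIM) for p in entity_pairs}
rel_emb = {r: rng.normal(size=DIM) for r in relations}

def score(pair, relation):
    """Plausibility of a single fact: dot product of row and column embeddings."""
    return float(pair_emb[pair] @ rel_emb[relation])

def prob(pair, relation):
    """Squash the score into a probability with a logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-score(pair, relation)))

# A single observed cell, e.g. located-in(Microsoft, Seattle), supports
# predicting a nearby unobserved cell such as HQ-in(Microsoft, Seattle).
print(prob("(Microsoft, Seattle)", "HQ-in"))
```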

A highly desirable, richer style of reasoning makes inferences from Horn clauses that form multi-hop paths containing three or more entities in the KB’s entity-relation graph. For example, we may have no evidence directly linking Melinda Gates and Seattle; however, we may infer with some likelihood that Melinda–lives-in–Seattle by observing that the KB contains the path Melinda–spouse–Bill–chairman–Microsoft–HQ-in–Seattle (Fig. 1a).

Symbolic rules of this form are learned by the Path Ranking Algorithm (PRA) (Lao et al., 2011). Dramatic improvement in generalization can be obtained by reasoning about paths not in terms of relation symbols, but in terms of Universal Schema-style relation-vector embeddings. This is done by Neelakantan et al. (2015), where RNNs semantically compose the per-edge relation embeddings along an arbitrary-length path, and output a vector embedding representing the inferred relation between the two entities at the end-points of the path. This approach thus represents a key example of complex reasoning over Horn clause chains using neural networks. However, for multiple reasons detailed below, it is inaccurate and impractical.
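A minimal sketch of this path-composition idea follows, assuming a plain vanilla RNN and toy, randomly initialized embeddings (the paper's actual model is richer and also consumes entity information, as described below): the RNN folds the relation embedding at each edge of a path into its hidden state, and the final state is compared by dot product against the embedding of the relation being predicted.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8  # illustrative relation-embedding / hidden-state size

# Toy relation vocabulary with random embeddings standing in for learned ones.
rel_emb = {r: rng.normal(size=DIM)
           for r in ["spouse", "chairman", "HQ-in", "lives-in"]}

# Vanilla-RNN parameters (learned in practice, random here).
W_h = rng.normal(size=(DIM, DIM)) * 0.1   # hidden-to-hidden weights
W_x = rng.normal(size=(DIM, DIM)) * 0.1   # input-to-hidden weights

def compose_path(path_relations):
    """Fold the per-edge relation embeddings into a single path embedding."""
    h = np.zeros(DIM)
    for r in path_relations:
        h = np.tanh(W_h @ h + W_x @ rel_emb[r])
    return h

def path_score(path_relations, target_relation):
    """Dot-product similarity between the composed path and the target relation."""
    return float(compose_path(path_relations) @ rel_emb[target_relation])

# Melinda -spouse-> Bill -chairman-> Microsoft -HQ-in-> Seattle
print(path_score(["spouse", "chairman", "HQ-in"], "lives-in"))
```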

This paper presents multiple modeling advances that significantly increase the accuracy and practicality of RNN-based reasoning on Horn clause chains in large-scale KBs. (1) Previous work, including (Lao et al., 2011; Neelakantan et al., 2015; Guu et al., 2015), reasons about chains of relations, but not the entities that form the nodes of the path. Ignoring entities and entity-types leads to frequent errors, such as inferring that Yankee Stadium serves as a transportation hub for NY state. In our work, we jointly learn and reason about relation-types, entities, and entity-types. (2) The same previous work takes only a single path as evidence in inferring new predictions. However, as shown in Figure 1b, multiple paths can provide evidence for a prediction. In our work, we use neural attention mechanisms to reason about multiple paths. We use a novel pooling function that performs soft attention during the gradient step and find it to work better. (3) The most problematic impracticality of the above previous work [2] for application to KBs with broad semantics is their requirement to train a separate model for each relation-type to be predicted. In contrast, we train a single, high-capacity RNN that can predict all relation types. In addition to efficiency advantages, our approach significantly increases accuracy because the multitask nature of the training shares strength in the common RNN parameters.
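The description of advance (2) above suggests pooling the scores of multiple paths so that every path receives some gradient, weighted by a softmax over the path scores. A LogSumExp pool is one function with exactly this behaviour; the sketch below is an illustrative assumption based on that description, not a verbatim excerpt of the released code, and it reuses the hypothetical path_score function from the previous sketch.

```python
import numpy as np

def logsumexp_pool(path_scores):
    """Pool scores from multiple paths connecting the same entity pair.

    LogSumExp acts like a smooth maximum: each path contributes to the
    gradient in proportion to softmax(path_scores), which is one way to
    realize soft attention over paths during the gradient step.
    """
    scores = np.asarray(path_scores, dtype=float)
    m = scores.max()
    return float(m + np.log(np.exp(scores - m).sum()))

def attention_weights(path_scores):
    """The implicit per-path attention: the gradient of LogSumExp w.r.t. each score."""
    scores = np.asarray(path_scores, dtype=float)
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Scores for three paths between the same entity pair (e.g. from path_score above).
scores = [2.1, -0.3, 1.4]
print(logsumexp_pool(scores))      # pooled evidence for the prediction
print(attention_weights(scores))   # how much each path contributes to the gradient
```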

We evaluate our new approach on a large-scale dataset of Freebase entities, relations, and ClueWeb text. In comparison with the previous best on this data, we achieve an error reduction of 25% in mean average precision (MAP). In an experiment specially designed to explore the benefits of sharing strength with a single RNN, we show a 54% error reduction in relations that are available only sparsely at training time. We also evaluate on a second dataset, chains of reasoning in WordNet. In comparison with previous state-of-the-art (Guu et al., 2015), our model achieves an 84% reduction in error in mean quantile.

References


Arvind Neelakantan, Rajarshi Das, David Belanger, and Andrew McCallum (2017). "Chains of Reasoning over Entities, Relations, and Text Using Recurrent Neural Networks."
  1. The code and data are available at https://rajarshd.github.io/ChainsofReasoning/
  2. With the exception of (Guu et al., 2015).