2003 InducingHyperlinkingRulesInTextCollections

From GM-RKB

Subject Headings: Related Document Linking Task.

Notes

Cited By

~4 http://scholar.google.com/scholar?cites=9960369812068339373

Quotes

Abstract

  • Automatic hyperlinking methods based on Information Extraction techniques and on linking rules firing on salient facts have been proposed to connect documents with “typed” relations. However, defining link types and writing linking rules may be cumbersome due to the large number of possibilities. In this paper, we tackle this issue by proposing a model for automatically extracting link types and, as a consequence, linking rules from large text collections. The novel idea is to exploit relations among facts expressed within documents and to use them in deciding the hyperlink types. The viability of our approach has been investigated using a collection of financial documents.

1. Introduction

  • Hyperlinked text collections are often seen as an added value. For instance, online news agencies and newspapers tend to offer news items enriched with links to so-called “related articles” in order to better serve their customers. A journalist can write an article on the current breaking news more rapidly if he or she can easily access related facts. Similarly, a market analyst could better understand the sudden rise of a share if he or she is provided with the news items related to the acquisition activities of the involved company.
  • Tracing hyperlinks between documents involves the ability to find relations among concepts or facts. This is the same ability used when writing, and it is therefore an inherently difficult task. As pointed out in (Ellis et al. 94), the agreement among linking annotators may be very low even when they are only asked to produce links between “related texts”. The disagreement may be even greater when more than one link type is relevant. For instance, a “cause-effect” link type may help users decide whether it is worth traversing the provided link, and thus filter out the information they are not interested in.
  • Computational models able to suggest automatic procedures for linking documents (e.g. (Green 97)) and for typing the drawn links (e.g. (Allan 96)) have been proposed. However, the notion of “relatedness” provided by automatic approaches, such as those based on the bag-of-words model (e.g. (Allan 96)) or those based on more “semantic” models (such as the lexical chains (Morris & Hirst 91) used in (Green 97)), is not sufficient to classify links into types such as “cause-effect”. In (Allan 96), “cause-effect” is considered a “manual” link type, whereas the computational model described there types links into six classes (i.e. revision, summary/expansion, equivalence, comparison/contrast, tangent, and aggregate), which are called the “automatic” links. This dichotomy (“automatic” vs. “manual” link types) reflects the perceived inherent limitations of the above automatic approaches. As also suggested in (Allan 96), these boundaries may be pushed forward using deeper text understanding models. In (Basili et al. 01), a hyperlinking method based on Information Extraction techniques is proposed to connect documents with typed relations. Linking among documents is based on an intermediate representation of the documents, called the “objective representation”: a surrogate of the document containing only the events judged relevant according to an underlying knowledge base that models the given domain. Rules firing on the event classes justify a link according to the events (and the involved entities) appearing in the two documents. The model offers a language in which specific linking rules may be written manually on top of the supported event classes. However, defining link types and writing the related linking rules is not an easy task, especially when a large number of fact classes is foreseen.
  • In this paper, we want to tackle this last issue by proposing a model for the automatic definition of link types and the related linking rules in the context of a rule-based hyper-linking method. The basic assumption we make is that the activity of building hypertexts is very similar to the process of writing. Therefore, the novel idea is to exploit the relations among facts as they appear inside the domain documents for inducing hyperlinking rules. In our opinion, the discourse structures of the domain documents are valuable resources for defining the types of relevant relations and the related linking rules.
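
The rule-based linking over objective representations attributed to (Basili et al. 01) can be sketched roughly as follows. This is a minimal Python sketch under assumed conventions, not the authors' rule language: the `Event` and `LinkingRule` classes, the `fire_rules` function, the event types `acquisition` and `share_rise`, and the entity-overlap condition are all hypothetical. Here a typed rule fires from document A to document B when A contains an event of the rule's source type, B contains an event of its target type, and the two events share at least one entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One event in a document's objective representation (hypothetical schema)."""
    event_type: str          # event class from the domain knowledge base
    entities: frozenset      # entities involved in the event


@dataclass(frozen=True)
class LinkingRule:
    """Hypothetical typed rule: link_type holds between a source_type event
    in one document and a target_type event in another."""
    link_type: str
    source_type: str
    target_type: str


def fire_rules(doc_a, doc_b, rules):
    """Return sorted (link_type, shared_entities) pairs justifying links from doc_a to doc_b.

    A rule fires when doc_a has an event of rule.source_type, doc_b has an
    event of rule.target_type, and the two events share at least one entity.
    """
    links = set()
    for rule in rules:
        for ea in doc_a:
            if ea.event_type != rule.source_type:
                continue
            for eb in doc_b:
                if eb.event_type != rule.target_type:
                    continue
                shared = ea.entities & eb.entities
                if shared:
                    links.add((rule.link_type, tuple(sorted(shared))))
    return sorted(links)


# Usage: an acquisition news item linked to a share-rise news item.
doc1 = [Event("acquisition", frozenset({"AcmeCorp", "BetaInc"}))]
doc2 = [Event("share_rise", frozenset({"AcmeCorp"}))]
rules = [LinkingRule("cause-effect", "acquisition", "share_rise")]

print(fire_rules(doc1, doc2, rules))  # [('cause-effect', ('AcmeCorp',))]
print(fire_rules(doc2, doc1, rules))  # []
```

The shared-entity test is just one simple way to make a link "justified" by common participants; the rule language described in the paper may express richer constraints.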

5. Conclusions

  • Building on the basic idea that relevant link types are already expressed in collections of domain documents, we presented a novel method for automatically deriving hyperlink types lt(et1, et2) and, consequently, hyperlinking rules. The proposed method, based on natural language processing techniques, seems to be a viable solution to the definition of such types and rules: when documents are covered by the domain knowledge model, the stated relations between event types can be recovered. As we have seen, the proposed method reduces the number of linking rules that have to be considered.
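
The derivation of link types lt(et1, et2) can be illustrated with a deliberately simplified sketch. It is not the authors' method: plain co-occurrence of event types in discourse order stands in for the discourse relations the paper exploits, and the `induce_link_types` function, the `min_support` threshold, and the event-type sequences below are all hypothetical. It does show why induced types reduce the rules to be considered: only pairs attested in the collection survive, instead of every ordered pair of event types.

```python
from collections import Counter
from itertools import combinations


def induce_link_types(documents, min_support=2):
    """Induce candidate link types lt(et1, et2) from event-type sequences.

    Simplifying assumption (not the paper's method): et1 -> et2 is a candidate
    whenever an event of type et1 precedes one of type et2 in the same document;
    a pair is kept if it occurs in at least min_support documents.
    """
    support = Counter()
    for events in documents:
        pairs = set()  # count each ordered pair at most once per document
        for i, j in combinations(range(len(events)), 2):  # i < j: events[i] precedes events[j]
            if events[i] != events[j]:
                pairs.add((events[i], events[j]))
        support.update(pairs)
    return sorted(pair for pair, n in support.items() if n >= min_support)


# Hypothetical event-type sequences, one per document, in discourse order.
docs = [
    ["acquisition", "share_rise"],
    ["acquisition", "share_rise", "board_change"],
    ["merger", "share_rise"],
    ["acquisition", "board_change"],
]

induced = induce_link_types(docs)
event_types = {et for doc in docs for et in doc}
all_pairs = len(event_types) * (len(event_types) - 1)  # ordered pairs of distinct types

print(induced)  # [('acquisition', 'board_change'), ('acquisition', 'share_rise')]
print(f"{len(induced)} of {all_pairs} candidate link types kept")  # 2 of 12 candidate link types kept
```

Each surviving pair would then become the head of one linking rule, so the rule writer starts from 2 candidates here rather than all 12 ordered pairs.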

References


Roberto Basili, Maria Teresa Pazienza, and Fabio Massimo Zanzotto. (2003). “Inducing Hyperlinking Rules in Text Collections.” http://ai-nlp.info.uniroma2.it/zanzotto/2003 RANLP BasiliPazienzaZanzotto.pdf