2006 IdentifyingComparativeSentences

From GM-RKB

Subject Headings: Comparative Sentence.

Notes

Quotes

Abstract

  • This paper studies the problem of identifying comparative sentences in text documents. The problem is related to but quite different from sentiment/opinion sentence identification or classification. Sentiment classification studies the problem of classifying a document or a sentence based on the subjective opinion of the author. An important application area of sentiment/opinion identification is business intelligence as a product manufacturer always wants to know consumers' opinions on its products. Comparisons on the other hand can be subjective or objective. Furthermore, a comparison is not concerned with an object in isolation. Instead, it compares the object with others. An example opinion sentence is "the sound quality of CD player X is poor". An example comparative sentence is "the sound quality of CD player X is not as good as that of CD player Y". Clearly, these two sentences give different information. Their language constructs are quite different too. Identifying comparative sentences is also useful in practice because direct comparisons are perhaps one of the most convincing ways of evaluation, which may even be more important than opinions on each individual object. This paper proposes to study the comparative sentence identification problem. It first categorizes comparative sentences into different types, and then presents a novel integrated pattern discovery and supervised learning approach to identifying comparative sentences from text documents. Experiment results using three types of documents, news articles, consumer reviews of products, and Internet forum postings, show a precision of 79% and recall of 81%. More detailed results are given in the paper.
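The abstract quotes precision and recall separately. As a small worked illustration, the two figures can be combined into a single F1 score (their harmonic mean). The F1 value is derived here from the quoted 79% and 81%; it is not a figure stated in the abstract, and the function name `f1_score` is chosen only for this sketch.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract: precision 79%, recall 81%.
f1 = f1_score(0.79, 0.81)
print(f"F1 = {f1:.3f}")  # prints: F1 = 0.800
```

Because the two values are close, the harmonic mean lands almost exactly between them, at roughly 80%.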

3.3 Problem Statement

  • In this work, we study comparatives at the sentence level. Thus, we state the problem based on sentences.
    • Definition (comparative sentence): A comparative sentence is a sentence that expresses a relation based on similarities or differences of more than one object.
    • Definition (objects and their features): An object is an entity that can be a person, a product, an action, etc., under comparison in a comparative sentence. Each object has a set of features, which are used to compare objects.
  • A comparison can be between two or more objects, groups of objects, or one object and the rest of the objects. It can also be between an object and its previous or future versions.
  • Types of comparatives: We group comparatives into four types. The first three are gradable comparatives and the fourth is a non-gradable comparative. The gradable types are defined based on the relationships of greater or less than, equal to, and greater or less than all others.
    • 1) Non-Equal Gradable: Relations of the type greater or less than that express an ordering of some objects with regard to certain features. This type includes user preferences, and also those comparatives that do not use JJR and RBR words.
    • 2) Equative: Relations of the type equal to that state two objects as equal with respect to some features.
    • 3) Superlative: Relations of the type greater or less than all others that rank one object over all others.
    • 4) Non-Gradable: Sentences which compare features of two or more objects, but do not grade them. Sentences which imply:
      • 1. Object A is similar to or different from Object B with regard to some features.
      • 2. Object A has feature F1, Object B has feature F2 (F1 and F2 are usually substitutable).
      • 3. Object A has feature F, but Object B does not have it.
    • Incidentally, these definitions are also used as guidelines to annotate (or label) sentences for the evaluation of our technique.
    • Tasks: We identify two main tasks in dealing with comparisons:
      • Identifying comparative sentences from a given text data set.
      • Extracting comparative relations from sentences.
    • In this work, we focus on the first task, i.e. identifying comparative sentences from text documents. The second task is studied in [12].
    • Challenges: Two main challenges of this work are as follows:
      • 1. Not all sentences with POS tags JJR, RBR, JJS and RBS are comparisons, e.g., “In the context of speed, faster means better.”
      • 2. Some sentences are comparisons but do not use any indicator word. For example, “Coffee is expensive, but Tea is cheap.”
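The two challenges above can be illustrated with a deliberately naive baseline that flags a sentence as comparative whenever any token carries one of the tags JJR, RBR, JJS or RBS. This is only a sketch to show why tags alone are insufficient; it is not the pattern discovery and supervised learning approach the paper proposes. The Penn Treebank tags below are assigned by hand for illustration (an assumption, not tagger output), and the names `COMPARATIVE_TAGS` and `has_comparative_tag` are hypothetical.

```python
# Penn Treebank tags for comparative and superlative adjectives/adverbs.
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}


def has_comparative_tag(tagged_sentence):
    """Naive baseline: True if any (token, tag) pair has a comparative tag."""
    return any(tag in COMPARATIVE_TAGS for _, tag in tagged_sentence)


# Challenge 1 example: contains JJR words but is not a comparison.
# Tags are hand-assigned for illustration.
challenge_1 = [
    ("In", "IN"), ("the", "DT"), ("context", "NN"), ("of", "IN"),
    ("speed", "NN"), (",", ","), ("faster", "JJR"), ("means", "VBZ"),
    ("better", "JJR"), (".", "."),
]

# Challenge 2 example: a comparison with no comparative-tagged word.
challenge_2 = [
    ("Coffee", "NNP"), ("is", "VBZ"), ("expensive", "JJ"), (",", ","),
    ("but", "CC"), ("Tea", "NNP"), ("is", "VBZ"), ("cheap", "JJ"),
    (".", "."),
]

# Gold labels follow the source text: (tagged sentence, is_comparative).
examples = [(challenge_1, False), (challenge_2, True)]

for tagged, gold in examples:
    predicted = has_comparative_tag(tagged)
    text = " ".join(token for token, _ in tagged)
    outcome = "correct" if predicted == gold else "wrong"
    print(f"{outcome}: predicted={predicted}, gold={gold}: {text}")
```

Under these hand-assigned tags the baseline gets both sentences wrong: the first is a false positive and the second a false negative, which is the gap a tag-only filter leaves open.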


2006 IdentifyingComparativeSentences
  • Author: Bing Liu, Nitin Jindal
  • Title: Identifying Comparative Sentences in Text Documents
  • URL: http://www.cs.uic.edu/~liub/publications/sigir06-comp.pdf