Data-Driven Sentiment Analysis Task

(Redirected from sentiment analysis)
Jump to: navigation, search

A Data-Driven Sentiment Analysis Task is a sentiment analysis task that is a data-driven analysis task.



  • (Wikipedia, 2015) ⇒ Retrieved:2015-5-25.
    • Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.

      Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).




  • (Archak et al., 2007) ⇒ Nikolay Archak, Anindya Ghose, and Panagiotis G. Ipeirotis. (2007). “Show Me the Money!: Deriving the pricing power of product features by mining consumer reviews.” In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2007). doi:10.1145/1281192.1281202.
    • ABSTRACT: The increasing pervasiveness of the Internet has dramatically changed the way that consumers shop for goods. Consumer-generated product reviews have become a valuable source of information for customers, who read the reviews and decide whether to buy the product based on the information provided. In this paper, we use techniques that decompose the reviews into segments that evaluate the individual characteristics of a product (e.g., image quality and battery life for a digital camera). Then, as a major contribution of this paper, we adapt methods from the econometrics literature, specifically the hedonic regression concept, to estimate: (a) the weight that customers place on each individual product feature, (b) the implicit evaluation score that customers assign to each feature, and (c) how these evaluations affect the revenue for a given product. Towards this goal, we develop a novel hybrid technique combining text mining and econometrics that models consumer product reviews as elements in a tensor product of feature and evaluation spaces. We then impute the quantitative impact of consumer reviews on product demand as a linear functional from this tensor product space. We demonstrate how to use a low-dimension approximation of this functional to significantly reduce the number of model parameters, while still providing good experimental results. We evaluate our technique using a data set from consisting of sales data and the related consumer reviews posted over a 15-month period for 242 products. Our experimental evaluation shows that we can extract actionable business intelligence from the data and better understand the customer preferences and actions. We also show that the textual portion of the reviews can improve product sales prediction compared to a baseline technique that simply relies on numeric data.


  • (Esuli & Sebastiani, 2006) ⇒ Andrea Esuli, and Fabrizio Sebastiani. (2006). “SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining.” In: Proceedings of LREC 2006.
    • QUOTE:Opinion mining (OM) is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. OM has a rich set of applications, ranging from tracking users’ opinions about products or about political candidates as expressed in online forums, to customer relationship management. In order to aid the extraction of opinions from text, recent research has tried to automatically determine the “PN-polarity” of subjective terms, i.e. identify whether a term that is a marker of opinionated content has a positive or a negative connotation. Research on determining whether a term is indeed a marker of opinionated content (a subjective term) or not (an objective term) has been, instead, much more scarce. In this work we describe SENTIWORDNET, a lexical resource in which eachWORDNET synset s is associated to three numerical scores Obj(s), Pos(s) and Neg(s), describing how objective, positive, and negative the terms contained in the synset are. The method used to develop SENTIWORDNET is based on the quantitative analysis of the glosses associated to synsets, and on the use of the resulting vectorial term representations for semi-supervised synset classification. The three scores are derived by combining the results produced by a committee of eight ternary classifiers, all characterized by similar accuracy levels but different classification behaviour. SENTIWORDNET is freely available for research purposes, and is endowed with a Web-based graphical user interface.
  • (Choi et al., 2006) ⇒ Yejin Choi, Eric Breck, and Claire Cardie. (2006). “Joint Extraction of Entities and Relations for Opinion Recognition.” In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2006).
    • QUOTE: We present an approach for the joint extraction of entities and relations in the context of opinion recognition and analysis. We identify two types of opinion-related entities — expressions of opinions and sources of opinions — along with the linking relation that exists between them. …


  • (Choi et al., 2005) ⇒ Yejin Choi, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan. (2005). “Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns.” In: Proceedings of the Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP 2005).
    • QUOTE: Recent systems have been developed for sentiment classification, opinion recognition, and opinion analysis (e.g., detecting polarity and strength). We pursue another aspect of opinion analysis: identifying the sources of opinions, emotions, and sentiments. We view this problem as an information extraction task and adopt a hybrid approach that combines Conditional Random Fields (Lafferty et al., 2001) and a variation of AutoSlog (Riloff, 1996a). While CRFs model source identification as a sequence tagging task, AutoSlog learns extraction patterns. Our results show that the combination of these two methods performs better than either one alone. ...
  • (Popescu & Etzioni, 2005) ⇒ Ana-Maria Popescu, and Oren Etzioni. (2005). “Extracting Product Features and Opinions from Reviews.” In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT-EMNLP 2005).
    • QUOTE:Consumers are often forced to wade through many on-line reviews in order to make an informed product choice. This paper introduces Opine, an unsupervised information-extraction system which mines reviews in order to build a model of important product features, their evaluation by reviewers, and their relative quality across products. …


  • (Hu & Liu, 2004) ⇒ Minqing Hu, and Bing Liu . (2004). “Mining and Summarizing Customer Reviews.” In: Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004).
    • QUOTE: … This summarization task is different from traditional text summarization because we only mine the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative. We do not summarize the reviews by selecting a subset or rewrite some of the original sentences from the reviews to capture the main points as in the classic text summarization. Our task is performed in three steps: (1) mining product features that have been commented on by customers; (2) identifying opinion sentences in each review and deciding whether each opinion sentence is positive or negative; (3) summarizing the results. This paper proposes several novel techniques to perform these tasks. Our experimental results using reviews of a number of products sold online demonstrate the effectiveness of the techniques.
  • (Kim & Hovy, 2004) ⇒ Soo-Min Kim, and Eduard Hovy. (2004). “Determining the Sentiment of Opinions.” In: Proceedings of the 20th International Conference on Computational Linguistics (ACL 2004). doi:10.3115/1220355.1220555
    • QUOTE: Identifying sentiments (the affective parts of opinions) is a challenging problem. We present a system that, given a topic, automatically finds the people who hold opinions about that topic and the sentiment of each opinion. The system contains a module for determining word sentiment and another for combining sentiments within a sentence. We experiment with various models of classifying and combining sentiment at word and sentence levels, with promising results.
  • (Wiebe et al., 2004) ⇒ Janyce M. Wiebe, Theresa Wilson, Rebecca Bruce, Matthew Bell, and Melanie Martin. (2004). “Learning Subjective Language.” In: Computational Linguistics, 30(3). doi:10.1162/0891201041850885