- (Glorot et al., 2011) ⇒ Xavier Glorot, Antoine Bordes, and Yoshua Bengio. (2011). “Domain Adaptation for Large-scale Sentiment Classification: A Deep Learning Approach.” In: Proceedings of the 28th International Conference on Machine Learning (ICML-11).
The exponential increase in the availability of online reviews and recommendations makes sentiment classification an interesting topic in academic and industrial research. Reviews can span so many different domains that it is difficult to gather annotated training data for all of them. Hence, this paper studies the problem of domain adaptation for sentiment classifiers, whereby a system is trained on labeled reviews from one source domain but is meant to be deployed on another. We propose a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. Sentiment classifiers trained with this high-level feature representation clearly outperform state-of-the-art methods on a benchmark composed of reviews of 4 types of Amazon products. Furthermore, this method scales well and allowed us to successfully perform domain adaptation on a larger industrial-strength dataset of 22 domains.
With the rise of social media such as blogs and social networks, reviews, ratings and recommendations are rapidly proliferating; being able to automatically filter them is a current key challenge for businesses looking to sell their wares and identify new market opportunities. This has created a surge of research in sentiment classification (or sentiment analysis), which aims to determine the judgment of a writer with respect to a given topic based on a given textual comment. Sentiment analysis is now a mature machine learning research topic, as illustrated with this review (Pang and Lee, 2008). Applications to many different domains have been presented, ranging from movie reviews (Pang et al., 2002) and congressional floor debates (Thomas et al., 2006) to product recommendations (Snyder and Barzilay, 2007; Blitzer et al., 2007).
3.2 Stacked Denoising Auto-encoders
The basic framework for our models is the Stacked Denoising Auto-encoder (Vincent et al., 2008). An auto-encoder is comprised of an encoder function [math]h(\cdot)[/math] and a decoder function [math]g(\cdot)[/math], typically with the dimension of [math]h(\cdot)[/math] smaller than that of its argument. The reconstruction of input [math]x[/math] is given by [math]r(x) = g(h(x))[/math], and auto-encoders are typically trained to minimize a form of reconstruction error [math]\mathrm{loss}(x, r(x))[/math]. Examples of reconstruction error include the squared error, or like here, when the elements of [math]x[/math] or [math]r(x)[/math] can be considered as probabilities of a discrete event, the Kullback-Leibler divergence between elements of [math]x[/math] and elements of [math]r(x)[/math]. When the encoder and decoder are linear and the reconstruction error is quadratic, one recovers in [math]h(x)[/math] the space of the principal components (PCA) of [math]x[/math]. Once an auto-encoder has been trained, one can stack another auto-encoder on top of it, by training a second one which sees the encoded output of the first one as its training data. Stacked auto-encoders were one of the first methods for building deep architectures (Bengio et al., 2006), along with Restricted Boltzmann Machines (RBMs) (Hinton et al., 2006). Once a stack of auto-encoders or RBMs has been trained, their parameters describe multiple levels of representation for [math]x[/math] and can be used to initialize a supervised deep neural network (Bengio, 2009) or directly feed a classifier, as we do in this paper.
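The encoder, decoder, reconstruction and greedy stacking described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the configuration used in the paper: the sigmoid activations, the untied weights, plain full-batch gradient descent, and all names (`AutoEncoder`, `train_stack`, `kl_reconstruction_loss`) and hyperparameters are choices made here for the example.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def kl_reconstruction_loss(x, r, eps=1e-7):
    """Mean over examples of the summed element-wise Bernoulli KL(x_i || r_i)."""
    x = np.clip(x, eps, 1 - eps)
    r = np.clip(r, eps, 1 - eps)
    kl = x * np.log(x / r) + (1 - x) * np.log((1 - x) / (1 - r))
    return float(np.mean(np.sum(kl, axis=1)))


class AutoEncoder:
    """One-hidden-layer auto-encoder (sigmoid units are an assumption here)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))  # encoder weights
        self.b = np.zeros(n_hidden)
        self.V = rng.normal(0.0, 0.1, size=(n_hidden, n_in))  # decoder weights
        self.c = np.zeros(n_in)

    def encode(self, x):       # h(x)
        return sigmoid(x @ self.W + self.b)

    def decode(self, h):       # g(h)
        return sigmoid(h @ self.V + self.c)

    def reconstruct(self, x):  # r(x) = g(h(x))
        return self.decode(self.encode(x))

    def train_step(self, x, lr):
        h = self.encode(x)
        r = self.decode(h)
        # With a sigmoid decoder, the gradient of the KL (equivalently
        # cross-entropy) loss w.r.t. the decoder pre-activation is r - x.
        d_out = (r - x) / x.shape[0]
        d_hid = (d_out @ self.V.T) * h * (1.0 - h)
        self.V -= lr * (h.T @ d_out)
        self.c -= lr * d_out.sum(axis=0)
        self.W -= lr * (x.T @ d_hid)
        self.b -= lr * d_hid.sum(axis=0)


def train_stack(x, hidden_sizes, rng, epochs=200, lr=0.5):
    """Greedy stacking: each new layer is trained on the codes of the previous one."""
    layers, data = [], x
    for n_hidden in hidden_sizes:
        ae = AutoEncoder(data.shape[1], n_hidden, rng)
        for _ in range(epochs):
            ae.train_step(data, lr)
        layers.append(ae)
        data = ae.encode(data)  # training data for the next layer
    return layers, data


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy binary "bag-of-words" matrix; purely synthetic data.
    X = (rng.random((100, 12)) < 0.3).astype(float)
    layers, codes = train_stack(X, hidden_sizes=[6, 3], rng=rng)
    print("first-layer loss:", kl_reconstruction_loss(X, layers[0].reconstruct(X)))
    print("top-level code shape:", codes.shape)
```

The array returned as `codes` plays the role of the high-level representation that, per the passage, can directly feed a classifier; the per-layer parameters could alternatively initialize a supervised network.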
An interesting alternative to the ordinary auto-encoder is the Denoising Auto-encoder (Vincent et al., 2008) or DAE, in which the input vector [math]x[/math] is stochastically corrupted into a vector [math]\tilde{x}[/math], and the model is trained to denoise, i.e., to minimize a denoising reconstruction error [math]\mathrm{loss}(x, r(\tilde{x}))[/math]. Hence the DAE cannot simply copy its input [math]\tilde{x}[/math] in its code layer [math]h(\tilde{x})[/math], even if the dimension of [math]h(\tilde{x})[/math] is greater than that of [math]\tilde{x}[/math]. The denoising error can be linked in several ways to the likelihood of a generative model of the distribution of the uncorrupted examples [math]x[/math] (Vincent, 2011).
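The denoising criterion can be illustrated with a small NumPy sketch: the encoder sees the corrupted input, while the loss compares the reconstruction with the clean input. The passage does not specify the corruption process, so masking noise (zeroing each element with probability `p`) is used here as one common choice; the sigmoid units, the overcomplete hidden layer, and the helper names (`mask_corrupt`, `dae_step`, and so on) are likewise assumptions for illustration.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def mask_corrupt(x, p, rng):
    """Stochastic corruption (assumed masking noise): zero each element with probability p."""
    keep = rng.random(x.shape) >= p
    return x * keep


def init_params(n_in, n_hidden, rng):
    return {
        "W": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),  # encoder weights
        "b": np.zeros(n_hidden),
        "V": rng.normal(0.0, 0.1, size=(n_hidden, n_in)),  # decoder weights
        "c": np.zeros(n_in),
    }


def reconstruct(params, x):
    """r(x) = g(h(x)) with sigmoid encoder and decoder."""
    h = sigmoid(x @ params["W"] + params["b"])
    return sigmoid(h @ params["V"] + params["c"])


def cross_entropy(x, r, eps=1e-7):
    """Element-wise cross-entropy; for binary x it equals the Bernoulli KL divergence."""
    r = np.clip(r, eps, 1 - eps)
    return float(-np.mean(np.sum(x * np.log(r) + (1 - x) * np.log(1 - r), axis=1)))


def dae_step(params, x, p, lr, rng):
    """One gradient step on loss(x, r(x_tilde)): corrupted input, clean target."""
    x_tilde = mask_corrupt(x, p, rng)
    h = sigmoid(x_tilde @ params["W"] + params["b"])
    r = sigmoid(h @ params["V"] + params["c"])
    d_out = (r - x) / x.shape[0]  # error is measured against the clean x
    d_hid = (d_out @ params["V"].T) * h * (1.0 - h)
    params["V"] -= lr * (h.T @ d_out)
    params["c"] -= lr * d_out.sum(axis=0)
    params["W"] -= lr * (x_tilde.T @ d_hid)
    params["b"] -= lr * d_hid.sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = (rng.random((200, 20)) < 0.3).astype(float)  # synthetic binary data
    params = init_params(n_in=20, n_hidden=30, rng=rng)  # code larger than input
    for _ in range(300):
        dae_step(params, X, p=0.3, lr=0.5, rng=rng)
    X_noisy = mask_corrupt(X, 0.3, rng)
    print("denoising loss:", cross_entropy(X, reconstruct(params, X_noisy)))
```

Because the target is the clean `x` rather than the network's own input `x_tilde`, the identity mapping no longer minimizes the loss, which is why a hidden layer wider than the input (30 versus 20 units in the toy run) remains a meaningful choice.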