Spectral Learning Algorithm
A Spectral Learning Algorithm is a learning algorithm that ...
- See: [[]].
References
2013
- (Cohen, Collins et al., 2013) ⇒ Shay Cohen, Michael Collins, Dean Foster, Karl Stratos, and Lyle Ungar. (2013). "Spectral Learning Algorithms for Natural Language Processing." Tutorial at NAACL 2013.
- QUOTE: Recent work in machine learning and NLP has developed spectral algorithms for many learning tasks involving latent variables. Spectral algorithms rely on singular value decomposition as a basic operation, usually followed by some simple estimation method based on the method of moments. From a theoretical point of view, these methods are appealing in that they offer consistent estimators (and PAC-style guarantees of sample complexity) for several important latent-variable models. This is in contrast to the EM algorithm, which is an extremely successful approach, but which only has guarantees of reaching a local maximum of the likelihood function.
From a practical point of view, the methods (unlike EM) have no need for careful initialization, and have recently been shown to be highly efficient (as one example, in work under submission by the authors on learning of latent-variable PCFGs, a spectral algorithm performs at identical accuracy to EM, but is around 20 times faster).
In this tutorial we will aim to give a broad overview of spectral methods, describing theoretical guarantees, as well as practical issues. We will start by covering the basics of singular value decomposition and describe efficient methods for doing singular value decomposition. The SVD operation is at the core of most spectral algorithms that have been developed.
We will then continue to cover canonical correlation analysis (CCA). CCA is an early method from statistics for dimensionality reduction. With CCA, two or more views of the data are created, and they are all projected into a lower dimensional space which maximizes the correlation between the views. We will review the basic algorithms underlying CCA, give some formal results giving guarantees for latent-variable models and also describe how they have been applied recently to learning lexical representations from large quantities of unlabeled data. This idea of learning lexical representations can be extended further, where unlabeled data is used to learn underlying representations which are subsequently used as additional information for supervised training.
We will also cover how spectral algorithms can be used for structured prediction problems with sequences and parse trees. A striking recent result by Hsu, Kakade and Zhang (2009) shows that HMMs can be learned efficiently using a spectral algorithm. HMMs are widely used in NLP and speech, and previous algorithms (typically based on EM) were guaranteed to only reach a local maximum of the likelihood function, so this is a crucial result. We will review the basic mechanics of the HMM learning algorithm, describe its formal guarantees, and also cover practical issues.
Last, we will cover work about spectral algorithms in the context of natural language parsing. We will show how spectral algorithms can be used to estimate the parameter models of latent-variable PCFGs, a model which serves as the base for state-of-the-art parsing models such as the one of Petrov et al. (2007). We will show what are the practical steps that are needed to be taken in order to make spectral algorithms for L-PCFGs (or other models in general) practical and comparable to state of the art.
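The CCA step described in the abstract above (two or more views of the data projected so that the correlation between the views is maximized) reduces to an SVD of a whitened cross-covariance matrix. The following is a minimal numpy sketch of that reduction, not code from the tutorial; the two views X and Y, the rank k, and the regularizer reg are illustrative assumptions.

```python
# Minimal sketch of canonical correlation analysis via SVD, as background for
# the CCA-based representation learning the tutorial mentions. X and Y are two
# "views" of the same n samples (e.g., a word's left and right contexts);
# the regularizer `reg` and all variable names are illustrative.
import numpy as np

def cca(X, Y, k, reg=1e-3):
    # Center both views.
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]

    # (Regularized) covariance and cross-covariance matrices.
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    # Whiten each view; the SVD of the whitened cross-covariance gives the
    # directions that maximize correlation between the two views.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    A = Wx @ U[:, :k]        # projection for view X
    B = Wy @ Vt[:k].T        # projection for view Y
    return A, B, s[:k]       # s[:k] are the top canonical correlations
```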
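The HMM result of Hsu, Kakade and Zhang (2009) cited above replaces EM with moment estimation followed by an SVD. The sketch below illustrates the general shape of that construction (empirical unigram/bigram/trigram moments, an SVD of the bigram matrix, and "observable operators" for scoring sequences); it is a rough illustration under simplifying assumptions, not the paper's exact algorithm, and all function and variable names are illustrative.

```python
# Rough sketch of spectral HMM parameter estimation in the style of
# Hsu, Kakade and Zhang (2009): estimate low-order moments of the first
# three observations, take a rank-k SVD of the bigram matrix, and build
# observable operators that score sequences without EM.
import numpy as np

def spectral_hmm(sequences, n_symbols, rank):
    # Empirical moments: P1[x] = P(x1), P21[x2, x1] = P(x2, x1),
    # P3x1[x][x3, x1] = P(x3, x2 = x, x1)  (one slice per middle symbol).
    P1 = np.zeros(n_symbols)
    P21 = np.zeros((n_symbols, n_symbols))
    P3x1 = np.zeros((n_symbols, n_symbols, n_symbols))
    n1 = n2 = n3 = 0
    for seq in sequences:
        if len(seq) >= 1:
            P1[seq[0]] += 1; n1 += 1
        if len(seq) >= 2:
            P21[seq[1], seq[0]] += 1; n2 += 1
        if len(seq) >= 3:
            P3x1[seq[1]][seq[2], seq[0]] += 1; n3 += 1
    P1 /= max(n1, 1); P21 /= max(n2, 1); P3x1 /= max(n3, 1)

    # The rank-k SVD of the bigram matrix is the core spectral step.
    U, _, _ = np.linalg.svd(P21)
    U = U[:, :rank]

    # Observable operators B_x = (U^T P_{3,x,1}) (U^T P_{2,1})^+,
    # plus start and stop vectors b1 and binf.
    pinv = np.linalg.pinv(U.T @ P21)
    B = {x: (U.T @ P3x1[x]) @ pinv for x in range(n_symbols)}
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    return b1, binf, B

def sequence_probability(params, seq):
    # Estimated P(x1..xT) = binf^T B_{xT} ... B_{x1} b1.
    b1, binf, B = params
    state = b1
    for x in seq:
        state = B[x] @ state
    return float(binf @ state)
```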
2003
- (Kamvar et al., 2003) ⇒ Sepandar D. Kamvar, Dan Klein, and Christopher D. Manning. (2003). "Spectral Learning." In: Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI 2003).
- QUOTE: We present a simple, easily implemented spectral learning algorithm which applies equally whether we have no supervisory information, pairwise link constraints, or labeled examples. In the unsupervised case, it performs consistently with other spectral clustering algorithms.
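For context, the unsupervised case the quote refers to can be sketched with a generic spectral clustering recipe: build an affinity matrix, take the top eigenvectors of its normalized form, and run k-means in that spectral space. This is a standard recipe, not the paper's exact normalization or its constrained and supervised variants, and parameters such as sigma are illustrative assumptions.

```python
# Minimal sketch of a generic unsupervised spectral clustering step, offered
# as background for the Kamvar/Klein/Manning quote above; not their exact
# algorithm. `sigma` and all names are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster(X, n_clusters, sigma=1.0):
    # Gaussian affinities between all pairs of points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # Symmetric normalization D^{-1/2} A D^{-1/2}.
    d = A.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = D_inv_sqrt @ A @ D_inv_sqrt

    # Embed each point with the top eigenvectors, row-normalize, then cluster.
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    V = vecs[:, -n_clusters:]              # top-k eigenvectors
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(V)
```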