Stacked Denoising Autoencoding (SdA) Algorithm

From GM-RKB

A Stacked Denoising Autoencoding (SdA) Algorithm is a feed-forward neural network learning algorithm that produces a stacked denoising autoencoding network (consisting of layers of denoising autoencoders in which the outputs of each layer are wired to the inputs of the successive layer).



References

2017

  • (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Autoencoder#Denoising_autoencoder Retrieved:2017-6-5.
    • Denoising autoencoders take a partially corrupted input whilst training to recover the original undistorted input. This technique has been introduced with a specific approach to good representation. A good representation is one that can be obtained robustly from a corrupted input and that will be useful for recovering the corresponding clean input. This definition contains the following implicit assumptions:
      • The higher level representations are relatively stable and robust to the corruption of the input;
      • It is necessary to extract features that are useful for representation of the input distribution.
    • To train an autoencoder to denoise data, it is necessary to perform a preliminary stochastic mapping [math]\displaystyle{ \mathbf{x}\rightarrow\mathbf{\tilde{x}} }[/math] in order to corrupt the data, and then to use [math]\displaystyle{ \mathbf{\tilde{x}} }[/math] as the input for a normal autoencoder, with the only exception being that the loss should still be computed for the initial input [math]\displaystyle{ \mathcal{L}(\mathbf{x},\mathbf{\tilde{x}'}) }[/math] instead of [math]\displaystyle{ \mathcal{L}(\mathbf{\tilde{x}},\mathbf{\tilde{x}'}) }[/math].
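The corruption-then-reconstruction setup above can be sketched in a few lines of NumPy (a minimal illustration: masking noise is one common choice of corruption process, all names are assumptions, and the reconstruction is a placeholder standing in for a trained autoencoder's forward pass):

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, p=0.3):
    """Stochastic mapping x -> x~: zero out each component with probability p."""
    mask = rng.random(x.shape) >= p
    return x * mask

x = rng.random(8)        # clean input
x_tilde = corrupt(x)     # corrupted copy fed to the autoencoder
x_hat = x_tilde          # placeholder for the autoencoder's reconstruction
loss = np.mean((x - x_hat) ** 2)   # loss compares against the CLEAN input x
```

The key point is the last line: the loss is computed between the reconstruction and the clean input x, not the corrupted x̃.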



  • (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Deep_learning#Stacked Retrieved:2017-6-5.
    • The auto encoder idea is motivated by the concept of a good representation. For example, for a classifier, a good representation can be defined as one that will yield a better performing classifier.

      An encoder is a deterministic mapping [math]\displaystyle{ f_\theta }[/math] that transforms an input vector x into hidden representation y, where [math]\displaystyle{ \theta = \{\boldsymbol{W}, b\} }[/math] , [math]\displaystyle{ \boldsymbol{W} }[/math] is the weight matrix and b is an offset vector (bias). A decoder maps back the hidden representation y to the reconstructed input z via [math]\displaystyle{ g_\theta }[/math] . The whole process of auto encoding is to compare this reconstructed input to the original and try to minimize this error to make the reconstructed value as close as possible to the original.
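The encoder and decoder mappings can be illustrated with a short NumPy sketch (a minimal example; the dimensions, the sigmoid nonlinearity, and the tied decoder weights W′ = Wᵀ are all assumptions made for illustration):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
d, h = 6, 3                              # input and hidden dimensions
W = rng.normal(scale=0.1, size=(h, d))   # encoder parameters theta = {W, b}
b = np.zeros(h)
W_prime = W.T                            # tied decoder weights (a common choice)
b_prime = np.zeros(d)

x = rng.random(d)
y = sigmoid(W @ x + b)                   # encoder f_theta: x -> hidden y
z = sigmoid(W_prime @ y + b_prime)       # decoder g_theta: y -> reconstruction z
error = np.mean((x - z) ** 2)            # reconstruction error to minimize
```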

      In stacked denoising auto encoders, the partially corrupted output is cleaned (de-noised). This idea was introduced in 2010 by Vincent et al. with a specific approach to good representation: a good representation is one that can be obtained robustly from a corrupted input and that will be useful for recovering the corresponding clean input. Implicit in this definition are the following ideas:

      • The higher level representations are relatively stable and robust to input corruption;
      • It is necessary to extract features that are useful for representation of the input distribution.
    • The algorithm consists of multiple steps. It starts with a stochastic mapping of [math]\displaystyle{ \boldsymbol{x} }[/math] to [math]\displaystyle{ \tilde{\boldsymbol{x}} }[/math] through [math]\displaystyle{ q_D(\tilde{\boldsymbol{x}}|\boldsymbol{x}) }[/math]; this is the corrupting step. Then the corrupted input [math]\displaystyle{ \tilde{\boldsymbol{x}} }[/math] passes through a basic auto encoder process and is mapped to a hidden representation [math]\displaystyle{ \boldsymbol{y} = f_\theta(\tilde{\boldsymbol{x}}) = s(\boldsymbol{W}\tilde{\boldsymbol{x}}+b) }[/math]. From this hidden representation, we can reconstruct [math]\displaystyle{ \boldsymbol{z} = g_\theta(\boldsymbol{y}) }[/math]. In the last stage, a minimization algorithm runs in order to make z as close as possible to the uncorrupted input [math]\displaystyle{ \boldsymbol{x} }[/math]. The reconstruction error [math]\displaystyle{ L_H(\boldsymbol{x},\boldsymbol{z}) }[/math] might be either the cross-entropy loss with an affine-sigmoid decoder, or the squared error loss with an affine decoder.
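The steps above (corrupt, encode, reconstruct, minimize) can be sketched as a single denoising auto encoder trained by hand-derived gradient descent. This is a minimal illustration only: it uses masking noise as the corruption, the squared-error loss, a single training sample, and arbitrarily chosen sizes and learning rate.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

d, h, lr = 8, 4, 0.5
W  = rng.normal(scale=0.1, size=(h, d)); b  = np.zeros(h)   # encoder f_theta
W2 = rng.normal(scale=0.1, size=(d, h)); b2 = np.zeros(d)   # decoder g_theta
x  = rng.random(d)                                          # one clean sample

def forward(v):
    y = sigmoid(W @ v + b)        # hidden representation
    z = sigmoid(W2 @ y + b2)      # reconstruction
    return y, z

initial_error = 0.5 * np.sum((forward(x)[1] - x) ** 2)

for _ in range(3000):
    x_tilde = x * (rng.random(d) >= 0.3)   # corrupting step q_D
    y, z = forward(x_tilde)
    # squared-error loss against the CLEAN input x, backpropagated by hand
    delta_out = (z - x) * z * (1 - z)
    delta_hid = (W2.T @ delta_out) * y * (1 - y)
    W2 -= lr * np.outer(delta_out, y);       b2 -= lr * delta_out
    W  -= lr * np.outer(delta_hid, x_tilde); b  -= lr * delta_hid

final_error = 0.5 * np.sum((forward(x)[1] - x) ** 2)
```

After training, the reconstruction of the clean input should be noticeably better than at initialization, even though the network only ever saw corrupted copies.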

      In order to make a deep architecture, auto encoders are stacked one on top of another.[1] Once the encoding function [math]\displaystyle{ f_\theta }[/math] of the first denoising auto encoder has been learned, it is applied to the uncorrupted input, and the resulting representation is used to train the second level.
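Greedy layer-wise stacking can be sketched as follows: a hypothetical train_dae helper fits one denoising auto encoder and returns its encoder, which is then applied to the uncorrupted inputs to produce the training data for the next layer (all names, sizes, and hyperparameters here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_dae(X, h, steps=500, lr=0.5, p=0.3):
    """Fit one denoising autoencoder on rows of X; return the learned encoder."""
    n, d = X.shape
    W  = rng.normal(scale=0.1, size=(h, d)); b  = np.zeros(h)
    W2 = rng.normal(scale=0.1, size=(d, h)); b2 = np.zeros(d)
    for _ in range(steps):
        x = X[rng.integers(n)]
        x_t = x * (rng.random(d) >= p)        # corrupting step
        y = sigmoid(W @ x_t + b)
        z = sigmoid(W2 @ y + b2)
        do = (z - x) * z * (1 - z)            # squared-error backprop
        dh = (W2.T @ do) * y * (1 - y)
        W2 -= lr * np.outer(do, y);  b2 -= lr * do
        W  -= lr * np.outer(dh, x_t); b -= lr * dh
    return lambda V: sigmoid(V @ W.T + b)     # learned encoder f_theta

X = rng.random((20, 10))
enc1 = train_dae(X, h=6)    # level 1: trained on raw inputs
H1 = enc1(X)                # UNCORRUPTED inputs encoded for the next level
enc2 = train_dae(H1, h=3)   # level 2: trained on level-1 representations
H2 = enc2(H1)
```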

      Once the stacked auto encoder is trained, its output can be used as the input to a supervised learning algorithm such as a support vector machine classifier or multi-class logistic regression.
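The hand-off to a supervised learner can be sketched with a small multi-class logistic regression trained on the stack's output features. Here a fixed random projection stands in for the trained encoder, and all names, sizes, and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for the trained stack's encoder (hypothetical): in practice this
# would be the composition of the learned encoding functions f_theta.
W_stack = rng.normal(size=(4, 10))
def encode(X):
    return 1.0 / (1.0 + np.exp(-(X @ W_stack.T)))

X = rng.random((30, 10))
ylab = rng.integers(0, 3, size=30)
H = encode(X)                     # stacked-AE output = classifier input

# multi-class logistic regression on the learned features
C = 3
Wc = np.zeros((C, H.shape[1]))
for _ in range(300):
    logits = H @ Wc.T
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)     # softmax class probabilities
    Y = np.eye(C)[ylab]                   # one-hot targets
    Wc -= 0.1 * (P - Y).T @ H / len(H)    # gradient step on cross-entropy

pred = (H @ Wc.T).argmax(axis=1)
train_acc = np.mean(pred == ylab)
```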

  1. Dana H. Ballard (1987). Modular learning in neural networks. Proceedings of AAAI, pages 279–284.
