Variational Autoencoding (VAE) Algorithm

From GM-RKB

A Variational Autoencoding (VAE) Algorithm is an autoencoding algorithm that makes strong assumptions concerning the distribution of latent variables.



References

2022

  • (Wikipedia, 2022) ⇒ https://en.wikipedia.org/wiki/Variational_autoencoder Retrieved:2022-12-12.
    • In machine learning, a variational autoencoder (VAE), is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling, belonging to the families of probabilistic graphical models and variational Bayesian methods. Variational autoencoders are often associated with the autoencoder model because of its architectural affinity, but with significant differences in the goal and mathematical formulation. Variational autoencoders are probabilistic generative models that require neural networks as only a part of their overall structure, as e.g. in VQ-VAE. The neural network components are typically referred to as the encoder and decoder for the first and second component respectively. The first neural network maps the input variable to a latent space that corresponds to the parameters of a variational distribution. In this way, the encoder can produce multiple different samples that all come from the same distribution. The decoder has the opposite function, which is to map from the latent space to the input space, in order to produce or generate data points. Both networks are typically trained together with the usage of the reparameterization trick, although the variance of the noise model can be learned separately. Although this type of model was initially designed for unsupervised learning, its effectiveness has been proven for semi-supervised learning and supervised learning.
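The encoder, sampling, and decoder roles described in this passage can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the linear maps `W_mu`, `W_logvar`, and `W_dec`, the dimensions, and the diagonal-Gaussian form of the variational distribution are hypothetical choices made here, not details taken from the quoted text, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and randomly initialised (untrained) linear maps.
x_dim, z_dim = 6, 2
W_mu = rng.normal(size=(z_dim, x_dim)) * 0.1
W_logvar = rng.normal(size=(z_dim, x_dim)) * 0.1
W_dec = rng.normal(size=(x_dim, z_dim)) * 0.1


def encode(x):
    """Map an input to the parameters (mean, log-variance) of q(z|x)."""
    return W_mu @ x, W_logvar @ x


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode(z):
    """Map a latent sample back to input space (sigmoid outputs in (0, 1))."""
    return 1.0 / (1.0 + np.exp(-(W_dec @ z)))


x = rng.random(x_dim)
mu, logvar = encode(x)

# Several different latent samples, all drawn from the same q(z|x).
samples = np.stack([reparameterize(mu, logvar, rng) for _ in range(3)])
recons = np.stack([decode(z) for z in samples])
print(samples.shape, recons.shape)
```

Because the randomness enters only through `eps`, each sample is a deterministic function of `mu` and `logvar`, which is what lets both networks be trained together by gradient descent.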

2018a

  • https://en.wikipedia.org/wiki/Autoencoder#Variational_autoencoder_(VAE)
    • QUOTE: Variational autoencoder models inherit autoencoder architecture, but make strong assumptions concerning the distribution of latent variables. They use variational approach for latent representation learning, which results in an additional loss component and specific training algorithm called Stochastic Gradient Variational Bayes (SGVB). It assumes that the data is generated by a directed graphical model [math]\displaystyle{ p(\mathbf{x}|\mathbf{z}) }[/math] and that the encoder is learning an approximation [math]\displaystyle{ q_{\phi}(\mathbf{z}|\mathbf{x}) }[/math] to the posterior distribution [math]\displaystyle{ p_{\theta}(\mathbf{z}|\mathbf{x}) }[/math] where [math]\displaystyle{ \mathbf{\phi} }[/math] and [math]\displaystyle{ \mathbf{\theta} }[/math] denote the parameters of the encoder (recognition model) and decoder (generative model) respectively. The objective of the variational autoencoder in this case has the following form:
      [math]\displaystyle{ \mathcal{L}(\mathbf{\phi},\mathbf{\theta},\mathbf{x})=D_{KL}(q_{\phi}(\mathbf{z}|\mathbf{x})||p_{\theta}(\mathbf{z}))-\mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}\big(\log p_{\theta}(\mathbf{x}|\mathbf{z})\big) }[/math]
      Here, [math]\displaystyle{ D_{KL} }[/math] stands for the Kullback–Leibler divergence. The prior over the latent variables is usually set to be the centred isotropic multivariate Gaussian [math]\displaystyle{ p_{\theta}(\mathbf{z})=\mathcal{N}(\mathbf{0,I}) }[/math]; however, alternative configurations have also been recently considered, e.g. [1]
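The objective above can be evaluated numerically as in the following sketch. It assumes, beyond what the quote states, that the encoder distribution is a diagonal Gaussian (so the KL term to the standard normal prior has a closed form), that the decoder is a Bernoulli likelihood, and that the expectation is approximated with a single latent sample. The function names are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)


def bernoulli_log_likelihood(x, x_hat, eps=1e-7):
    """log p(x|z) for a Bernoulli decoder with means x_hat."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    return np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))


def vae_loss(x, x_hat, mu, logvar):
    """Single-sample estimate of L = KL(q(z|x) || p(z)) - E_q[log p(x|z)]."""
    return kl_to_standard_normal(mu, logvar) - bernoulli_log_likelihood(x, x_hat)


# Toy values: q(z|x) equals the prior, so the KL term vanishes and only
# the reconstruction term contributes.
x = np.array([1.0, 0.0])
x_hat = np.array([0.5, 0.5])
loss = vae_loss(x, x_hat, mu=np.zeros(2), logvar=np.zeros(2))
print(loss)
```

Minimising this quantity trades off reconstruction quality against keeping the approximate posterior close to the prior.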

2016

  • (Doersch, 2016) ⇒ Carl Doersch. (2016). “Tutorial on Variational Autoencoders.” arXiv preprint arXiv:1606.05908
    • ABSTRACT: In just three years, Variational Autoencoders (VAEs) have emerged as one of the most popular approaches to unsupervised learning of complicated distributions. VAEs are appealing because they are built on top of standard function approximators (neural networks), and can be trained with stochastic gradient descent. VAEs have already shown promise in generating many kinds of complicated data, including handwritten digits, faces, house numbers, CIFAR images, physical models of scenes, segmentation, and predicting the future from static images. This tutorial introduces the intuitions behind VAEs, explains the mathematics behind them, and describes some empirical behavior. No prior knowledge of variational Bayesian methods is assumed.

2013

  • (Kingma & Welling, 2013) ⇒ Diederik P. Kingma, and Max Welling. (2013). “Auto-encoding Variational Bayes.” arXiv preprint arXiv:1312.6114
    • ABSTRACT: How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
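The reparameterization idea in this abstract can be illustrated on a toy problem where the exact answer is known. The sketch below is not the paper's estimator for the full lower bound; it assumes a one-dimensional Gaussian z ~ N(mu, sigma^2) and a hypothetical test function f(z) = z^2, for which E[f(z)] = mu^2 + sigma^2 and the exact gradients are 2·mu and 2·sigma.

```python
import numpy as np


def reparam_grad(mu, sigma, n_samples, rng):
    """Monte Carlo gradient of E[z^2] w.r.t. mu and sigma via z = mu + sigma * eps."""
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps          # sample written as a deterministic function of eps
    df_dz = 2.0 * z               # derivative of f(z) = z^2
    grad_mu = np.mean(df_dz)      # chain rule: dz/dmu = 1
    grad_sigma = np.mean(df_dz * eps)  # chain rule: dz/dsigma = eps
    return grad_mu, grad_sigma


rng = np.random.default_rng(0)
g_mu, g_sigma = reparam_grad(mu=1.5, sigma=0.5, n_samples=200_000, rng=rng)
print(g_mu, g_sigma)  # should land near the exact values 3.0 and 1.0
```

Writing the sample as a differentiable function of the parameters and an independent noise variable is what allows standard stochastic gradient methods to be applied to the expectation.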

  1. Harris Partaourides and Sotirios P. Chatzis. (2017). “Asymmetric Deep Generative Models.” Neurocomputing, vol. 241, pp. 90-96, June 2017.