Local Gaussian Regression (LGR) Task

From GM-RKB

A Local Gaussian Regression (LGR) Task is a Nonparametric Regression Task that fits a target function as a combination of cooperating local linear models, whose parameters are learned jointly through Gaussian hidden variables and a variational approximation (Meier et al., 2014).



References

2017

  • (Ting et al., 2017) ⇒ Jo-Anne Ting, Franziska Meier, Sethu Vijayakumar, and Stefan Schaal (2017). "Locally Weighted Regression for Control." In: "Encyclopedia of Machine Learning and Data Mining," pp. 759-772.
    • QUOTE: Meier et al. (2014) offer an alternative approach to local learning. They start out with the global objective (Eq.3) and reformulate it to capture the idea of local models that cooperate to generate a function fit, resulting in

      [math]\displaystyle{ J = E\left [\frac{1} {2}\sum _{i=1}^{N}\left (\mathbf{t}_{ i} -\sum _{k=1}^{K}w_{ k,i}(\mathbf{x}_{i}^{T}\boldsymbol{\beta }_{ k})\right )^{2}\right ]\quad\quad }[/math] (11)

      With this change, a local model’s contribution [math]\displaystyle{ \hat{y}_{k} = \mathbf{x}_{i}^{T}\boldsymbol{\beta }_{k} }[/math] toward the fit of target [math]\displaystyle{ \mathbf{t}_i }[/math] is localized through weight [math]\displaystyle{ w_{k,i} }[/math]. However, this form of localization couples all local models. For efficient learning, local Gaussian regression (LGR) thus employs approximations to decouple learning of parameters. The main ideas of LGR are:

      • Introduce Gaussian hidden variables [math]\displaystyle{ \mathbf{f}_k }[/math] that form virtual targets for the weighted contribution of the [math]\displaystyle{ k }[/math]th local model:

        [math]\displaystyle{ f_{k,i} = \mathcal{N}\left (w_{k,i}(\mathbf{x}_{i}^{T}\boldsymbol{\beta }_{ k}),\;\beta _{m}^{-1}\right )\quad\quad }[/math](12)

        Assume that the target [math]\displaystyle{ t }[/math] is observed with Gaussian noise and that the hidden variables [math]\displaystyle{ f_k }[/math] need to sum up to the noisy target [math]\displaystyle{ t_i }[/math]:

        [math]\displaystyle{ t_{i} = \mathcal{N}\left (\sum _{k}f_{k,i},\;\beta _{y}^{-1}\right )\quad\quad }[/math](13)

        In its exact form, this model learning procedure will couple all local models’ parameters.

      • Employ a variational approximation to decouple local models. This results in an iterative (EM-style) learning procedure that alternates between updating the posteriors over hidden variables [math]\displaystyle{ f_k }[/math] and updating the posteriors over regression parameters [math]\displaystyle{ \boldsymbol{\beta }_{k} }[/math], for all local models [math]\displaystyle{ k = 1,\cdots,K }[/math].
      • The updates over the hidden variables [math]\displaystyle{ f_k }[/math] turn out to be a form of message passing between local model predictions. This step allows the redistribution of virtual target values for each local model. This communication between local models is what distinguishes LGR from typical LWR approaches. This update is linear in the number of local models and in the number of data points.
      • The parameter updates ([math]\displaystyle{ \boldsymbol{\beta }_{k} }[/math] and [math]\displaystyle{ \mathbf{D}_k }[/math]) per local model become completely independent through the variational approximation, resulting in a localized learning algorithm, similar in spirit to LWR.
      • Place Gaussian priors over regression parameters [math]\displaystyle{ \boldsymbol{\beta }_{k} \sim \mathcal{N}(\boldsymbol{\beta }_{k};0,\mathrm{diag}\left (\boldsymbol{\alpha }_{k}\right )) }[/math] that allow for automatic relevance determination of the input dimensions.
      • For incrementally incoming data, apply recursive Bayesian updates that use the posterior over parameters at time step [math]\displaystyle{ t − 1 }[/math] as the prior over parameters at time step [math]\displaystyle{ t }[/math]. Furthermore, new local models are added if no existing local model is activated with some minimal activation weight, similar to LWPR.
      • Prediction for a query input [math]\displaystyle{ x_q }[/math] becomes a weighted average of the local models’ predictions

        [math]\displaystyle{ y_{q} =\sum _{ k=1}^{K}w_{ k,q}(\mathbf{x}_{q}^{T}\boldsymbol{\beta }_{ k})\quad\quad }[/math]

      More details and a pseudo-algorithm for incremental LGR can be found in Meier et al. (2014).
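
The following is a minimal batch-mode sketch in Python/NumPy meant only to make Eq. (11) and the decoupled EM-style updates quoted above concrete. It is not the authors' implementation: the function names (rbf_weights, objective, fit_lgr), the fixed Gaussian weighting kernel with a shared lengthscale, and the simple weight-proportional redistribution of the residual that stands in for the variational posterior updates over [math]\displaystyle{ f_k }[/math] are assumptions of this example; the ARD priors over [math]\displaystyle{ \boldsymbol{\beta }_{k} }[/math], learning of the distance metrics [math]\displaystyle{ \mathbf{D}_k }[/math], noise-precision updates, and the incremental version are omitted.

```python
import numpy as np

def rbf_weights(X, centers, lengthscale=0.5):
    """Local weights w_{k,i}: activation of local model k at input x_i."""
    # X: (N, D), centers: (K, D)  ->  weights: (K, N)
    sq_dist = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

def objective(X, t, centers, beta, lengthscale=0.5):
    """Global cost of Eq. (11): squared error of the summed local contributions."""
    Xb = np.hstack([X, np.ones((len(X), 1))])       # append a bias feature
    W = rbf_weights(X, centers, lengthscale)        # (K, N)
    fit = (W * (beta @ Xb.T)).sum(axis=0)           # sum_k w_{k,i} x_i^T beta_k
    return 0.5 * np.sum((t - fit) ** 2)

def fit_lgr(X, t, centers, lengthscale=0.5, n_iter=100):
    """EM-style loop: alternate between virtual targets f_k and per-model
    parameters beta_k. The f_k update below is a crude, assumed stand-in for
    the variational message passing of Meier et al. (2014)."""
    N, D = X.shape
    K = centers.shape[0]
    Xb = np.hstack([X, np.ones((N, 1))])
    W = rbf_weights(X, centers, lengthscale)        # (K, N)
    beta = np.zeros((K, D + 1))
    for _ in range(n_iter):
        # Virtual targets: each model keeps its own weighted prediction and
        # receives a share of the global residual proportional to its weight.
        preds = W * (beta @ Xb.T)                   # (K, N), w_{k,i} x_i^T beta_k
        resid = t - preds.sum(axis=0)               # (N,)
        share = W / (W.sum(axis=0, keepdims=True) + 1e-12)
        f = preds + share * resid[None, :]
        # Decoupled per-model update: weighted least squares against f_k,
        # i.e. solve w_{k,i} x_i^T beta_k ~= f_{k,i} independently for each k.
        for k in range(K):
            A = Xb * W[k][:, None]
            beta[k] = np.linalg.lstsq(A, f[k], rcond=None)[0]
    return beta
```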
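
Prediction with the sketch above follows the last equation quoted, i.e. a weighted combination of the local linear predictions at the query point; the toy data and settings below are illustrative only.

```python
def predict_lgr(Xq, centers, beta, lengthscale=0.5):
    """y_q = sum_k w_{k,q} (x_q^T beta_k), evaluated for each query row in Xq."""
    Xb = np.hstack([Xq, np.ones((len(Xq), 1))])
    Wq = rbf_weights(Xq, centers, lengthscale)       # (K, Nq)
    return (Wq * (beta @ Xb.T)).sum(axis=0)

# Toy 1-D usage of the sketch (hypothetical settings):
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200)[:, None]
t = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(200)
centers = np.linspace(-3, 3, 12)[:, None]            # fixed local-model centers
beta = fit_lgr(X, t, centers)
y_q = predict_lgr(np.array([[0.5]]), centers, beta)  # prediction at x_q = 0.5
```

Keeping the centers and the lengthscale fixed keeps the example short; in LGR proper, local models are added online when no existing model is sufficiently activated, and their distance metrics are adapted.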