# Memory-Based Locally Weighted Regression Task

A Memory-Based Locally Weighted Regression Task is a Nonparametric Regression Task that retains all training data in memory and, at prediction time, fits a locally weighted linear model centered at each query point (a "lazy" learning approach).

**AKA:** LWR, Memory-Based Locally Weighted Regression.

**See:** Initialism, Scattergram, Non-Parametric Regression, k-Nearest Neighbor Algorithm, Classical Statistics, Least Squares Regression, Nonlinear Regression.

## References

### 2017

- (Ting et al., 2017) ⇒ Jo-Anne Ting, Franziska Meier, Sethu Vijayakumar, and Stefan Schaal (2017). "Locally Weighted Regression for Control." In: Encyclopedia of Machine Learning and Data Mining, pp. 759-772.
- QUOTE:
**Memory-Based Locally Weighted Regression (LWR)** The original locally weighted regression algorithm was introduced by Cleveland (1979) and popularized in the machine learning and learning control community by Atkeson (1989). The algorithm – categorized as a "lazy" approach – can be summarized as follows (for algorithmic pseudo-code, see Schaal et al. 2002):

- All training data is collected in the rows of the matrix [math]\mathbf{X}[/math] and the vector [math]\mathbf{t}[/math]. (For simplicity, only functions with a scalar output are addressed; vector-valued outputs can be learned either by fitting a separate learning system for each output or by modifying the algorithms to fit multiple outputs, similar to multi-output linear regression.)
- For every query point [math]\mathbf{x}_q[/math], the weighting kernel is centered at [math]\mathbf{x}_q[/math].
- The weights are computed with Eq. (4), and all data points' weights are collected in the diagonal weight matrix [math]\mathbf{W}_q[/math].
- The local regression coefficients are computed as [math]\boldsymbol{\beta}_{q} = \left(\mathbf{X}^{T}\mathbf{W}_{q}\mathbf{X}\right)^{-1}\mathbf{X}^{T}\mathbf{W}_{q}\mathbf{t}[/math] (5).
- A prediction is formed with [math]y_{q} = \left[\mathbf{x}_{q}^{T}\;1\right]\boldsymbol{\beta}_{q}[/math].
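
The steps above can be illustrated with a short, self-contained sketch. This is not the authors' implementation: Eq. (4) is not reproduced in this excerpt, so the sketch assumes the common Gaussian weighting kernel [math]w_i = \exp\left(-\tfrac{1}{2}(\mathbf{x}_i - \mathbf{x}_q)^T \mathbf{D} (\mathbf{x}_i - \mathbf{x}_q)\right)[/math], and it augments [math]\mathbf{X}[/math] with a bias column so the prediction matches [math]y_q = \left[\mathbf{x}_q^T\;1\right]\boldsymbol{\beta}_q[/math].

```python
import numpy as np

def lwr_predict(X, t, x_q, D):
    """Memory-based LWR prediction at a single query point.

    X : (n, d) stored training inputs; t : (n,) scalar targets;
    x_q : (d,) query point; D : (d, d) positive-definite distance metric.
    """
    # Center the weighting kernel at the query point and compute the
    # per-point weights (assumed Gaussian form of Eq. (4)).
    diff = X - x_q
    w = np.exp(-0.5 * np.einsum("ij,jk,ik->i", diff, D, diff))
    W_q = np.diag(w)  # diagonal weight matrix W_q

    # Augment inputs with a bias column so beta_q carries an intercept,
    # matching the prediction y_q = [x_q^T 1] beta_q.
    Xa = np.hstack([X, np.ones((len(X), 1))])

    # Eq. (5): beta_q = (X^T W_q X)^{-1} X^T W_q t, solved as a linear
    # system rather than via an explicit matrix inverse.
    beta_q = np.linalg.solve(Xa.T @ W_q @ Xa, Xa.T @ W_q @ t)
    return np.append(x_q, 1.0) @ beta_q

# Example: noisy sine data; the local linear fit tracks the curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
print(lwr_predict(X, t, np.array([1.0]), D=np.array([[4.0]])))  # near sin(1)
```

Note that, being a lazy method, nothing is fit in advance: all work happens per query, which is exactly the trade-off the "memory-based" label describes.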

As in all kernel methods, it is important to optimize the kernel parameters in order to get optimal function fitting quality. For LWR, the critical parameter determining the bias-variance trade-off is the distance metric [math]\mathbf{D}_q[/math]. If the kernel is too narrow, it starts fitting noise; if it is too broad, oversmoothing will occur. [math]\mathbf{D}_q[/math] can be optimized with leave-one-out cross validation to obtain a *globally* optimal value, i.e., the same [math]\mathbf{D}_q = \mathbf{D}[/math] is used throughout the entire input space of the data. Alternatively, [math]\mathbf{D}_q[/math] can be *locally* optimized as a function of the query point, i.e., a separate [math]\mathbf{D}_q[/math] is obtained for each query (as indicated by the subscript "q"). In the recent machine learning literature (in particular, work related to kernel methods), such input-dependent kernels are referred to as nonstationary kernels.
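
For the *globally* optimal case, a minimal leave-one-out sketch might look as follows, reusing `lwr_predict` and the data from the sketch above; the helper `loo_error` and the isotropic parameterization [math]\mathbf{D} = \mathbf{I}/h^2[/math] with a single bandwidth [math]h[/math] are illustrative simplifications, not from the source.

```python
# Hypothetical helper: leave-one-out squared error for a fixed global D.
def loo_error(X, t, D):
    n = len(t)
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i          # hold out point i
        y_i = lwr_predict(X[mask], t[mask], X[i], D)
        total += (y_i - t[i]) ** 2
    return total / n

# Grid-search the bandwidth; D = I / h^2 is one simple parameterization.
d = X.shape[1]
best_h = min([0.1, 0.25, 0.5, 1.0, 2.0],
             key=lambda h: loo_error(X, t, np.eye(d) / h**2))
print("selected bandwidth:", best_h)
```

A small [math]h[/math] (narrow kernel) drives the LOO error up by fitting noise, and a large [math]h[/math] drives it up by oversmoothing, so the minimizer sits at the bias-variance trade-off the excerpt describes.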
