Kernel Density Estimation (KDE) Algorithm


A Kernel Density Estimation (KDE) Algorithm is a non-parametric probability density estimation algorithm that estimates a random variable's density by averaging kernel functions centered at the observed data points.



References

2020

  • (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Kernel_density_estimation#Definition Retrieved:2020-12-16.
    • Let [math]\displaystyle{ (x_1, x_2, \ldots, x_n) }[/math] be a univariate independent and identically distributed sample drawn from some distribution with an unknown density ƒ at any given point x. We are interested in estimating the shape of this function ƒ. Its kernel density estimator is : [math]\displaystyle{ \widehat{f}_h(x) = \frac{1}{n}\sum_{i=1}^n K_h (x - x_i) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big), }[/math] where K is the kernel — a non-negative function — and [math]\displaystyle{ h \gt 0 }[/math] is a smoothing parameter called the bandwidth. A kernel with subscript h is called the scaled kernel and defined as [math]\displaystyle{ K_h(x) = \frac{1}{h} K\left(\frac{x}{h}\right) }[/math]. Intuitively one wants to choose h as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below.

      A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov, normal, and others. The Epanechnikov kernel is optimal in a mean square error sense, though the loss of efficiency is small for the kernels listed previously. Due to its convenient mathematical properties, the normal kernel is often used, which means [math]\displaystyle{ K(x) = \phi(x) }[/math], where ϕ is the standard normal density function. The construction of a kernel density estimate finds interpretations in fields outside of density estimation. For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at each data point location [math]\displaystyle{ x_i }[/math]. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning (e.g. diffusion map).
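
As an illustration of the estimator [math]\displaystyle{ \widehat{f}_h(x) }[/math] defined in the quote above, the following minimal Python sketch evaluates a kernel density estimate with a Gaussian kernel and a fixed, user-chosen bandwidth h; the function and variable names (gaussian_kernel, kde, sample) are illustrative and not taken from the quoted source.

    # Minimal sketch: f_hat_h(x) = (1/(n*h)) * sum_i K((x - x_i)/h),
    # using the standard normal density as the kernel K.
    import math

    def gaussian_kernel(u):
        # Standard normal density phi(u).
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    def kde(x, sample, h):
        # Evaluate the kernel density estimate at a single point x.
        n = len(sample)
        return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)

    # Example: estimate the density of a small sample at a few points.
    sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
    for x in (-1.0, 0.0, 2.0):
        print(x, kde(x, sample, h=1.0))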

2008

  • http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/xlghtmlnode33.html
    • QUOTE: The goal of density estimation is to approximate the probability density function [math]\displaystyle{ f(\bullet) }[/math] of a random variable [math]\displaystyle{ X }[/math]. Assume we have [math]\displaystyle{ n }[/math] independent observations [math]\displaystyle{ x_1,\ldots,x_n }[/math] from the random variable [math]\displaystyle{ X }[/math]. The kernel density estimator [math]\displaystyle{ \widehat{f}_{h}(x) }[/math] for the estimation of the density value [math]\displaystyle{ f(x) }[/math] at point [math]\displaystyle{ x }[/math] is defined as : [math]\displaystyle{ \widehat{f}_{h}(x)= \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x_i-x}{h}\right), \qquad (6.1) }[/math] with [math]\displaystyle{ K(\bullet) }[/math] denoting a so-called kernel function, and [math]\displaystyle{ h }[/math] denoting the bandwidth. A number of possible kernel functions are listed in the following table.

      Table 6.1: Kernel functions.

      Kernel: [math]\displaystyle{ K(u) }[/math]
      • Uniform [math]\displaystyle{ \frac{1}{2} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
      • Triangle [math]\displaystyle{ (1-\vert u \vert) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
      • Epanechnikov [math]\displaystyle{ \frac{3}{4}(1-u^{2}) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
      • Quartic [math]\displaystyle{ \frac{15}{16}(1-u^{2})^{2} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
      • Triweight [math]\displaystyle{ \frac{35}{32}(1-u^{2})^{3} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
      • Gaussian [math]\displaystyle{ \frac{1}{\sqrt{2\pi}} \exp(-\frac{1}{2}u^2) }[/math]
      • Cosinus [math]\displaystyle{ \frac{\pi}{4}\cos(\frac{\pi}{2}u) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
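
The kernels in Table 6.1 can be written directly as functions of u; the sketch below, with illustrative names, encodes the indicator [math]\displaystyle{ {\boldsymbol{I}}(\vert u \vert \le 1) }[/math] as a bounds check and lets any of them be plugged into the estimator of equation (6.1).

    # Sketch of the kernel functions from Table 6.1; names are illustrative.
    import math

    def uniform(u):
        return 0.5 if abs(u) <= 1 else 0.0

    def triangle(u):
        return (1 - abs(u)) if abs(u) <= 1 else 0.0

    def epanechnikov(u):
        return 0.75 * (1 - u ** 2) if abs(u) <= 1 else 0.0

    def quartic(u):
        return (15 / 16) * (1 - u ** 2) ** 2 if abs(u) <= 1 else 0.0

    def triweight(u):
        return (35 / 32) * (1 - u ** 2) ** 3 if abs(u) <= 1 else 0.0

    def gaussian(u):
        return math.exp(-0.5 * u ** 2) / math.sqrt(2 * math.pi)

    def cosinus(u):
        return (math.pi / 4) * math.cos(math.pi / 2 * u) if abs(u) <= 1 else 0.0

    def kde_at(x, sample, h, kernel):
        # Equation (6.1); the source writes K((x_i - x)/h), which equals
        # K((x - x_i)/h) for these symmetric kernels.
        n = len(sample)
        return sum(kernel((xi - x) / h) for xi in sample) / (n * h)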

  • (Upton & Cook, 2008) ⇒ Graham Upton, and Ian Cook. (2008). “A Dictionary of Statistics, 2nd edition revised." Oxford University Press. ISBN:0199541450
    • QUOTE:
      • Kernel Method (Kernel Density Estimation): A method for the estimation of probability density functions. Suppose [math]\displaystyle{ X }[/math] is a continuous random variable with unknown probability density function f. A random sample of observations of [math]\displaystyle{ X }[/math] is taken. If the sample values are denoted by [math]\displaystyle{ x_1, x_2, \ldots, x_n }[/math], the estimate of f is given by
        [math]\displaystyle{ f(x) = A \sum_{j=1}^{n}K(x, x_j), }[/math]
        where [math]\displaystyle{ K }[/math] is a kernel function and the constant A is chosen so that [math]\displaystyle{ \int_{-\infty}^{\infty} f(x)dx = 1 }[/math]. The observation [math]\displaystyle{ x_j }[/math] may be regarded as being spread out between [math]\displaystyle{ x_j - a }[/math] and [math]\displaystyle{ x_j + b }[/math] (usually with [math]\displaystyle{ a = b }[/math]). The result is that the naive estimate of f as a function capable of taking values only at [math]\displaystyle{ x_1, x_2, \ldots, x_n }[/math] is replaced by a continuous function having peaks where the data are densest. Examples of kernel functions are the Gaussian kernel,
        [math]\displaystyle{ K(x, t) = \exp \bigg \{ - {(x-t)^2 \over 2h^2} \bigg \}, \quad - \infty \lt x \lt \infty, }[/math]
        and the Epanechnikov kernel,
        [math]\displaystyle{ K(x, t) = \left\{ \begin{array}{l l} 1 - {(x-t)^2 \over 5h^2}, & - \sqrt{5}h \lt x - t \lt \sqrt{5}h,\\ 0, & \text{otherwise.}\\ \end{array} \right. }[/math]

        The constant [math]\displaystyle{ h }[/math] is the window width or bandwidth. The choice of [math]\displaystyle{ h }[/math] is critical: small values may retain the spikes of the naive estimate, and large values may oversmooth so that important aspects of [math]\displaystyle{ f }[/math] are lost.
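
To make the role of the window width concrete, the sketch below (with an illustrative two-cluster sample) evaluates a Gaussian-kernel estimate at a point lying between the clusters for several values of h: a very small h keeps the spiky, near-naive estimate, while a very large h oversmooths and fills in the gap between the clusters.

    # Sketch of the effect of the bandwidth h on a Gaussian-kernel estimate;
    # the sample values and evaluation point are illustrative.
    import math

    def kde(x, sample, h):
        n = len(sample)
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) / math.sqrt(2 * math.pi)
                   for xi in sample) / (n * h)

    sample = [1.0, 1.2, 1.4, 5.0, 5.3, 5.5]  # two well-separated clusters
    for h in (0.05, 0.5, 5.0):
        # A moderate h keeps the estimate low between the clusters;
        # a very large h oversmooths and fills the gap in.
        print(f"h = {h}: f_hat(3.0) = {kde(3.0, sample, h):.4f}")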
