Epanechnikov Kernel

An Epanechnikov Kernel is a quadratic bounded statistical kernel function that achieves optimal efficiency for kernel density estimation tasks.

AKA: Parabolic Kernel, Epanechnikov Function, Optimal Kernel Function.
Context:
- It can typically compute Kernel Weights using the formula [math]\displaystyle{ K(u) = \frac{3}{4}(1-u^2)\mathbf{1}_{|u| \leq 1} }[/math].
- It can typically achieve Minimal Mean Integrated Squared Error among all bounded kernel functions.
- It can typically provide Compact Support on the interval [-1, 1], yielding computational efficiency.
- It can typically satisfy Kernel Properties including non-negativity, symmetry, and unit integration.
- It can typically balance Bias-Variance Tradeoffs optimally in density estimation.
- ...
- It can often be implemented in Kernel Density Estimators as [math]\displaystyle{ \hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x-x_i}{h}\right) }[/math].
- It can often outperform other kernel functions in terms of asymptotic mean squared error.
- It can often be generalized to Multivariate Epanechnikov Kernels for high-dimensional density estimation.
- It can often require Bandwidth Selection to control the smoothing parameter.
- ...
- It can range from being a Narrow Bandwidth Epanechnikov Kernel to being a Wide Bandwidth Epanechnikov Kernel, depending on its kernel bandwidth parameter.
- It can range from being a Univariate Epanechnikov Kernel to being a Multivariate Epanechnikov Kernel, depending on its kernel dimensionality.
- It can range from being a Standard Epanechnikov Kernel to being a Modified Epanechnikov Kernel, depending on its kernel normalization.
- ...
- It can have Mathematical Characteristics such as:
  - Zero Values outside the support interval |u| > 1.
  - Quadratic Decay from center to boundaries.
  - Efficiency Factor of 1.0 (optimal among all kernels).
  - Roughness Value R(K) = 3/5.
- It can satisfy Optimality Conditions including:
  - Minimum Asymptotic Mean Integrated Squared Error (AMISE).
  - Maximum Efficiency relative to the gaussian kernel.
  - Optimal Convergence Rate for nonparametric density estimation.
- ...
Example(s):
- Standard Epanechnikov Kernel Implementations, such as:
  - One-Dimensional Epanechnikov Kernel: [math]\displaystyle{ K(x) = \frac{3}{4}(1-x^2) }[/math] for [math]\displaystyle{ |x| \leq 1 }[/math].
  - Scaled Epanechnikov Kernel: [math]\displaystyle{ K_h(x) = \frac{1}{h}K\left(\frac{x}{h}\right) }[/math] with bandwidth parameter h.
- Multivariate Epanechnikov Kernels, such as:
  - Product Epanechnikov Kernel: [math]\displaystyle{ K(\mathbf{x}) = \prod_{j=1}^d K(x_j) }[/math] for independent dimensions.
  - Spherical Epanechnikov Kernel: [math]\displaystyle{ K(\mathbf{x}) = \frac{d+2}{2c_d}(1-\|\mathbf{x}\|^2)\mathbf{1}_{\|\mathbf{x}\| \leq 1} }[/math].
- Application-Specific Epanechnikov Kernels, such as:
- Modified Epanechnikov Kernels, such as:
- ...
Counter-Example(s):
- Uniform Kernel Function, which lacks optimal efficiency property and has discontinuous derivatives.
- Triangular Kernel Function, which has lower efficiency and linear decay.
- Gaussian Kernel Function, which has unbounded support and infinite tails.
- Quartic Kernel Function, which has higher polynomial order and different efficiency.
- Triweight Kernel Function, which uses sixth-order polynomial instead of quadratic form.
- Tricube Kernel Function, which has different support interval and polynomial structure.
- Cosine Kernel Function, which uses trigonometric functions rather than polynomials.
- Logistic Kernel Function, which has unbounded support and exponential tails.
- Sigmoid Kernel Function, which lacks symmetry property required for density estimation.
- Silverman Kernel Function, which combines gaussian and uniform characteristics.
See: Kernel Density Estimation, Nonparametric Statistics, Bandwidth Selection, Mean Integrated Squared Error, Kernel Smoothing, Multivariate Density Estimation, Kernel Regression, Optimal Kernel Theory, Epanechnikov Distribution.

References

2017a

(Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use Retrieved:2017-7-16.
- Several types of kernel functions are commonly used: uniform, triangle, Epanechnikov, ^[1] quartic (biweight), tricube, triweight, Gaussian, quadratic and cosine.
  In the table below, if [math]\displaystyle{ K }[/math] is given with a bounded support, then [math]\displaystyle{ K(u) = 0 }[/math] for values of u lying outside the support.

Epanechnikov (parabolic)	[math]\displaystyle{ K(u) = \frac{3}{4}(1-u^2) }[/math] Support: [math]\displaystyle{ \|u\|\leq 1 }[/math]	...

2017b

(Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Kernel_density_estimation#Definition Retrieved:2017-7-16.
- Let (x₁, x₂, …, x_n) be an independent and identically distributed sample drawn from some distribution with an unknown density ƒ. We are interested in estimating the shape of this function ƒ. Its kernel density estimator is : [math]\displaystyle{ \hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^n K_h (x - x_i) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big), }[/math] where K(•) is the kernel — a non-negative function that integrates to one and has mean zero — and is a smoothing parameter called the bandwidth. A kernel with subscript h is called the scaled kernel and defined as . Intuitively one wants to choose h as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below.
  A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov, normal, and others. The Epanechnikov kernel is optimal in a mean square error sense, though the loss of efficiency is small for the kernels listed previously,^[2] and due to its convenient mathematical properties, the normal kernel is often used, which means , where ϕ is the standard normal density function. The construction of a kernel density estimate finds interpretations in fields outside of density estimation.^[3] For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at each data point locations x_i. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning. Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. To see this, we compare the construction of histogram and kernel density estimators, using these 6 data points: x₁ = −2.1, x₂ = −1.3, x₃ = −0.4, x₄ = 1.9, x₅ = 5.1, x₆ = 6.2. For the histogram, first the horizontal axis is divided into sub-intervals or bins which cover the range of the data. In this case, we have 6 bins each of width 2. Whenever a data point falls inside this interval, we place a box of height 1/12. If more than one data point falls inside the same bin, we stack the boxes on top of each other. For the kernel density estimate, we place a normal kernel with variance 2.25 (indicated by the red dashed lines) on each of the data points x_i. The kernels are summed to make the kernel density estimate (solid blue curve). The smoothness of the kernel density estimate is evident compared to the discreteness of the histogram, as kernel density estimates converge faster to the true underlying density for continuous random variables.

1992

(Scott, 1992) ⇒ David W. Scott. (1992). “Multivariate Density Estimation: theory, practice, and visualization." Wiley. ISBN:0471547700
- BOOK PREVIEW: Density estimation has long been recognized as an important tool when used with univariate and bivariate data. But the computer revolution of recent years has provided access to data of unprecedented complexity in ever-growing volume. New tools are required to detect and summarize the multivariate structure of these difficult data. Multivariate Density Estimation: Theory, Practice, and Visualization demonstrates that density estimation retains its explicative power even when applied to trivariate and quadrivariate data. By presenting the major ideas in the context of the classical histogram, the text simplifies the understanding of advanced estimators and develops links between the intuitive histogram and other methods that are more statistically efficient. The theoretical results covered are those particularly relevant to application and understanding. The focus is on methodology, new ideas, and practical advice. A hierarchical approach draws attention to the similarities among different estimators. Also, detailed discussions of nonparametric dimension reduction, nonparametric regression, additive modeling, and classification are included. Because visualization is a key element in effective multivariate nonparametric analysis, more than 100 graphic illustrations supplement the numerous problems and examples presented in the text. In addition, sixteen four-color plates help to convey an intuitive feel for both the theory and practice of density estimation in several dimensions. Ideal as an introductory textbook, Multivariate Density Estimation is also an indispensable professional reference for statisticians, biostatisticians, electrical engineers, econometricians, and other scientists involved in data analysis.

1969

(Epanechnikov, 1969) ⇒ Epanechnikov, V. A. (1969). “Non-parametric estimation of a multivariate probability density. Theory of Probability & Its Applications", 14(1), 153-158. DOI:10.1137/1114019
- QUOTE: Introduction
  Let

[math]\displaystyle{ X_i= X(x_1^{(i)},x_2^{(i)},\cdots,x_k^{(i)}),\quad i=1,\cdots, n }[/math],

be a given sample of [math]\displaystyle{ n }[/math] independent realizations of a k-dimensional random variable [math]\displaystyle{ X(x_1^{(i)},x_2^{(i)},\cdots,x_k^{(i)}) }[/math] from a population characterized by a continuous k-variate probability density [math]\displaystyle{ f(x_1,x_2,\cdots,x_k) }[/math]. We define the multivariate empirical probability density [math]\displaystyle{ f_n(x_1,x_2,\cdots,x_k) }[/math] to be the function of sample values [math]\displaystyle{ X_i }[/math] given by

[math]\displaystyle{ (1) \quad f_n(x_1,x_2,\cdots,x_k)=\frac{1}{n}\sum_{i=1}^n\prod_{i=1}^k\frac{1}{h_\ell(n)}K_\ell\Big(\frac{x_\ell-x_\ell^i}{h_\ell(n)}\Big) }[/math]

Each “kernel” [math]\displaystyle{ K_\ell(y) }[/math] has the following properties:

(a) [math]\displaystyle{ 0 \leq K_\ell(y)\lt C \lt \infty }[/math]

(b) [math]\displaystyle{ K_\ell(y)=K_\ell(-y) }[/math]

(c) [math]\displaystyle{ \int_{-\infty}^{+\infty} K_\ell(y)dy=1 }[/math]

(2)

(d) [math]\displaystyle{ \int_{-\infty}^{+\infty} K_\ell(y)y^2dy=1 }[/math]

(e) [math]\displaystyle{ \int_{-\infty}^{+\infty} K_\ell(y)y^mdy\lt \infty }[/math] for [math]\displaystyle{ 0\leq m \lt \infty }[/math]

and the "spreading" coefficients [math]\displaystyle{ h_\ell(n) }[/math] of the kernels depend in general on the sample size [math]\displaystyle{ n }[/math] and tend to zero as [math]\displaystyle{ n\rightarrow \infty }[/math].

↑ Named for Epanechnikov, V. A. (1969). “Non-Parametric Estimation of a Multivariate Probability Density". Theory Probab. Appl. 14 (1): 153–158. doi:10.1137/1114019.
↑ Wand, M.P; Jones, M.C. (1995). Kernel Smoothing. London: Chapman & Hall/CRC. ISBN 0-412-55270-1.
↑ Botev, Z.I.; Grotowski, J.F.; Kroese, D.P. (2010). “Kernel density estimation via diffusion". Annals of Statistics. 38 (5): 2916–2957. doi:10.1214/10-AOS799.

[1] Named for Epanechnikov, V. A. (1969). “Non-Parametric Estimation of a Multivariate Probability Density". Theory Probab. Appl. 14 (1): 153–158. doi:10.1137/1114019.

[WJ1995-2] Wand, M.P; Jones, M.C. (1995). Kernel Smoothing. London: Chapman & Hall/CRC. ISBN 0-412-55270-1.

[bo10-3] Botev, Z.I.; Grotowski, J.F.; Kroese, D.P. (2010). “Kernel density estimation via diffusion". Annals of Statistics. 38 (5): 2916–2957. doi:10.1214/10-AOS799.

[1]

[2]

[3]