Kernel Function
A Kernel Function is a Mathematical Function that computes inner products in a feature space without explicitly mapping data points to that feature space.
- AKA: Kernel, Kernel Method, Kernel Mapping.
- Context:
- It can typically compute Inner Products between data points in a high-dimensional feature space through the kernel trick.
- It can typically enable Nonlinear Pattern Recognition by implicitly mapping input data to a higher-dimensional space.
- It can typically satisfy Mercer's Conditions to guarantee the existence of a corresponding feature space mapping.
- It can typically preserve Computational Efficiency by avoiding explicit feature space computations.
- It can typically incorporate Domain Knowledge through custom kernel design.
- ...
- It can often support Machine Learning Algorithms including support vector machines, kernel principal component analysis, and gaussian processes.
- It can often enable Similarity Computation between complex structures such as strings, trees, and graphs.
- It can often combine with other Kernel Functions through kernel composition operations.
- It can often require Hyperparameter Tuning for optimal model performance.
- ...
- It can range from being a Simple Linear Kernel Function to being a Complex Non-Linear Kernel Function, depending on its kernel complexity.
- It can range from being a Local Kernel Function to being a Global Kernel Function, depending on its kernel influence scope.
- It can range from being a Stationary Kernel Function to being a Non-Stationary Kernel Function, depending on its kernel translation invariance.
- ...
- It can have Mathematical Properties such as:
- Positive Semi-Definiteness ensuring valid inner product computation.
- Closure Properties under addition, multiplication, and scalar multiplication.
- Reproducing Property in its associated reproducing kernel Hilbert space.
- It can be characterized by its Kernel Parameters such as:
- ...
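The kernel trick described above can be illustrated with a short sketch. The function and variable names below are illustrative only; the sketch shows that the quadratic kernel [math]\displaystyle{ (\langle x, z \rangle + c)^2 }[/math], computed implicitly in O(n) time, agrees with the inner product of an explicit O(n²)-dimensional feature map that is never needed in practice:

```python
import numpy as np

def quadratic_kernel(x, z, c=1.0):
    # Implicit computation: O(n) work, no feature vectors built.
    return (np.dot(x, z) + c) ** 2

def quadratic_features(x, c=1.0):
    # Explicit feature map phi(x) for the same kernel, with O(n^2) features:
    # all pairwise products x_i * x_j, the scaled inputs sqrt(2c) * x_i,
    # and the constant c, so that phi(x) . phi(z) = (<x, z> + c)^2.
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.outer(x, x).ravel(), np.sqrt(2.0 * c) * x, [c]])

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])
print(quadratic_kernel(x, z))                                # 30.25
print(np.dot(quadratic_features(x), quadratic_features(z)))  # 30.25 (up to round-off)
```

The two values coincide, which is why kernel-based algorithms can work in high-dimensional feature spaces while only ever paying the cost of the implicit computation.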
- Example(s):
- Linear Kernel Functions, such as:
- Standard Linear Kernel: K(x,z) = [math]\displaystyle{ \langle x, z \rangle }[/math]
- Inhomogeneous Linear Kernel: K(x,z) = [math]\displaystyle{ \langle x, z \rangle + c }[/math]
- Polynomial Kernel Functions, such as:
- Quadratic Kernel Function: K(x,z) = [math]\displaystyle{ (\langle x, z \rangle + c)^2 }[/math]
- Cubic Kernel Function: K(x,z) = [math]\displaystyle{ (\langle x, z \rangle + c)^3 }[/math]
- General Polynomial Kernel: K(x,z) = [math]\displaystyle{ (\langle x, z \rangle + c)^d }[/math]
- Radial Basis Function Kernels, such as:
- Gaussian RBF Kernel: K(x,z) = [math]\displaystyle{ \exp(-\gamma||x-z||^2) }[/math]
- Laplacian RBF Kernel: K(x,z) = [math]\displaystyle{ \exp(-\gamma||x-z||_1) }[/math]
- Exponential Kernel: K(x,z) = [math]\displaystyle{ \exp(-\gamma||x-z||) }[/math]
- Statistical Kernel Functions, such as:
- Epanechnikov Kernel: [math]\displaystyle{ K(u) = \frac{3}{4}(1-u^{2})\mathbf{1}_{|u| \leq 1} }[/math]
- Uniform Kernel: [math]\displaystyle{ K(u) = \frac{1}{2}\mathbf{1}_{|u| \leq 1} }[/math]
- Triangular Kernel: [math]\displaystyle{ K(u) = (1-|u|)\mathbf{1}_{|u| \leq 1} }[/math]
- Quartic Kernel Function: [math]\displaystyle{ K(u) = \frac{15}{16}(1-u^{2})^{2}\mathbf{1}_{|u| \leq 1} }[/math]
- Triweight Kernel Function: [math]\displaystyle{ K(u) = \frac{35}{32}(1-u^{2})^{3}\mathbf{1}_{|u| \leq 1} }[/math]
- Gaussian Kernel Function: [math]\displaystyle{ K(u) = \frac{1}{\sqrt{2\pi}} \exp(-\frac{1}{2}u^2) }[/math]
- Cosine Kernel Function: [math]\displaystyle{ K(u) = \frac{\pi}{4}\cos(\frac{\pi}{2}u)\mathbf{1}_{|u| \leq 1} }[/math]
- Structured Data Kernels, such as:
- String Kernel Functions measuring similarity between sequences.
- Tree Kernel Functions comparing tree-structured data.
- Graph Kernel Functions comparing graph-structured data.
- Specialized Kernel Functions, such as:
- Fisher Kernel Function incorporating probabilistic models.
- Sigmoid Kernel Function: K(x,z) = [math]\displaystyle{ \tanh(\alpha \langle x, z \rangle + c) }[/math]
- Spectral-Mixture Kernel Function for multi-scale patterns.
- Positive-Definite Kernel Functions guaranteeing valid inner product (feature) spaces.
- ...
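Several of the kernel families listed above can be written down directly. A minimal sketch (function names and parameter defaults are illustrative, not from any particular library):

```python
import numpy as np

def linear_kernel(x, z):
    # K(x, z) = <x, z>
    return np.dot(x, z)

def polynomial_kernel(x, z, c=1.0, d=3):
    # K(x, z) = (<x, z> + c)^d
    return (np.dot(x, z) + c) ** d

def gaussian_rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-gamma * np.dot(diff, diff))

def laplacian_rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||_1)
    return np.exp(-gamma * np.sum(np.abs(np.asarray(x) - np.asarray(z))))

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])
# Any RBF kernel evaluates to 1 at zero distance, since ||x - x|| = 0.
print(gaussian_rbf_kernel(x, x))  # 1.0
```

Note that the RBF kernels depend only on the difference x − z, which is what makes them stationary in the sense of the ranges above.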
- Counter-Example(s):
- Distance Functions, which measure dissimilarity rather than similarity (though some can be converted to kernels).
- Loss Functions, which evaluate prediction errors rather than data similarity.
- Activation Functions, which perform element-wise transformations rather than pairwise computations.
- Hash Functions, which map to discrete spaces rather than computing continuous similarities.
- See: Kernel Trick, Reproducing Kernel Hilbert Space, Kernel-based Learning Algorithm, Support Vector Machine, Gaussian Process, Kernel Matrix, Feature Space, Kernel Density Estimation, Multiple Kernel Learning, Deep Kernel Learning.
References
- tbd
- If X is the instance space, a kernel function is a mapping [math]\displaystyle{ K: X \times X \to [0, \infty) }[/math] such that given two instances [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math], [math]\displaystyle{ K(x, y) = \sum_i t_i(x)\, t_i(y) = t(x) \cdot t(y) }[/math], where [math]\displaystyle{ t_i(x) }[/math] is some feature function over the instance [math]\displaystyle{ x }[/math].
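The feature-sum definition above can be checked numerically. The particular feature functions below are hypothetical, chosen only to illustrate that summing t_i(x)·t_i(y) is the same as taking the dot product of the feature vectors:

```python
import numpy as np

# Hypothetical feature functions t_i over a one-dimensional instance space.
feature_functions = [
    lambda x: 1.0,     # t_1: constant feature
    lambda x: x,       # t_2: identity feature
    lambda x: x ** 2,  # t_3: quadratic feature
]

def t(x):
    # The feature vector t(x) = (t_1(x), ..., t_k(x)).
    return np.array([f(x) for f in feature_functions])

def kernel(x, y):
    # K(x, y) = sum_i t_i(x) * t_i(y), i.e. the dot product t(x) . t(y).
    return sum(f(x) * f(y) for f in feature_functions)

x, y = 1.5, -0.5
print(kernel(x, y), np.dot(t(x), t(y)))  # 0.8125 0.8125
```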
2008
- http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/xlghtmlnode33.html
- QUOTE: Assume we have [math]\displaystyle{ n }[/math] independent observations [math]\displaystyle{ x_1,\ldots,x_n }[/math] from the random variable [math]\displaystyle{ X }[/math]. The kernel density estimator [math]\displaystyle{ \widehat{f}_{h}(x) }[/math] for the estimation of the density value [math]\displaystyle{ f(x) }[/math] at point [math]\displaystyle{ x }[/math] is defined as [math]\displaystyle{ \displaystyle \widehat{f}_{h}(x)= \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x_i-x}{h}\right), \qquad (6.1) }[/math] with [math]\displaystyle{ K(\bullet) }[/math] denoting a so-called kernel function and [math]\displaystyle{ h }[/math] denoting the bandwidth. A number of possible kernel functions is listed in the following table.
Table 6.1: Kernel functions.
- Kernel [math]\displaystyle{ K(u) }[/math]
- Uniform [math]\displaystyle{ \frac{1}{2} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Triangle [math]\displaystyle{ (1-\vert u \vert) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Epanechnikov [math]\displaystyle{ \frac{3}{4}(1-u^{2}) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Quartic [math]\displaystyle{ \frac{15}{16}(1-u^{2})^{2} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Triweight [math]\displaystyle{ \frac{35}{32}(1-u^{2})^{3} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Gaussian [math]\displaystyle{ \frac{1}{\sqrt{2\pi}} \exp(-\frac{1}{2}u^2) }[/math]
- Cosinus [math]\displaystyle{ \frac{\pi}{4}\cos(\frac{\pi}{2}u) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
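The estimator in equation (6.1) is straightforward to sketch with one of the tabulated kernels. Function names and the sample data are illustrative only:

```python
import numpy as np

def epanechnikov(u):
    # K(u) = 3/4 * (1 - u^2) * 1{|u| <= 1}
    u = np.asarray(u, dtype=float)
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def kde(x, samples, h, kernel=epanechnikov):
    # Equation (6.1): f_h(x) = (1 / (n h)) * sum_i K((x_i - x) / h)
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    return kernel((samples - x) / h).sum() / (n * h)

samples = np.array([-0.3, 0.0, 0.2, 0.9, 1.4])
print(kde(0.0, samples, h=1.0))  # density estimate at x = 0
```

Observations farther than h from the evaluation point receive zero weight under the Epanechnikov kernel, so the bandwidth h directly controls how local the estimate is.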
2004
- (Lanckriet et al., 2004a) ⇒ Gert R. G. Lanckriet, Nello Cristianini, Peter Bartlett, Laurent El Ghaoui, and Michael I. Jordan. (2004). “Learning the Kernel Matrix with Semidefinite Programming.” In: The Journal of Machine Learning Research, 5.
- QUOTE: Kernel-based learning algorithms (see, for example, Cristianini and Shawe-Taylor, 2000; Scholkopf and Smola, 2002; Shawe-Taylor and Cristianini, 2004) work by embedding the data into a Hilbert space, and searching for linear relations in such a space. The embedding is performed implicitly, by specifying the inner product between each pair of points rather than by giving their coordinates explicitly. This approach has several advantages, the most important deriving from the fact that the inner product in the embedding space can often be computed much more easily than the coordinates of the points themselves. …
… Definition 1: A kernel is a function [math]\displaystyle{ k }[/math], such that [math]\displaystyle{ k(\mathbf{x}, \mathbf{z}) = \langle \Phi(\mathbf{x}), \Phi(\mathbf{z}) \rangle }[/math] for all [math]\displaystyle{ \mathbf{x}, \mathbf{z} \in X }[/math], where [math]\displaystyle{ \Phi }[/math] is a mapping from [math]\displaystyle{ X }[/math] to an (inner product) feature space [math]\displaystyle{ F }[/math]. …
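The kernel matrix central to this line of work can be inspected directly: for a valid kernel, the Gram matrix of pairwise kernel evaluations is symmetric positive semi-definite. A minimal sketch, assuming a Gaussian RBF kernel on random data (function names are illustrative):

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    # Kernel (Gram) matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2),
    # computed via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.clip(d2, 0.0, None))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = rbf_gram(X)

# Mercer's condition in matrix form: all eigenvalues of the Gram matrix
# are non-negative (up to numerical round-off).
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-8)  # True
```

This positive semi-definiteness is exactly what semidefinite programming approaches to learning the kernel matrix exploit as a constraint.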
1997
- (Mitchell, 1997) ⇒ Tom M. Mitchell. (1997). “Machine Learning." McGraw-Hill.
- Much of the literature on nearest-neighbor methods and weighted local regression uses a terminology that has arisen from the field of statistical pattern recognition....
- Regression means approximating a real-valued target function.
- Residual is the error [math]\displaystyle{ \hat{f}(x) - f(x) }[/math] in approximating the target function.
- Kernel function is the function of distance that is used to determine the weight of each training example. In other words, the kernel function is the function [math]\displaystyle{ K }[/math] such that [math]\displaystyle{ w_i = K(d(x_i, x_q)) }[/math].
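Mitchell's usage of "kernel function" as a distance-based weighting can be sketched as a locally weighted average. The names and the Gaussian weighting choice below are illustrative, not taken from the text:

```python
import numpy as np

def gaussian_weight(d, sigma=1.0):
    # A kernel function of distance: weight decays as distance grows.
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def locally_weighted_average(xq, X, y, kernel=gaussian_weight):
    # w_i = K(d(x_i, x_q)); predict the weight-normalized average target.
    distances = np.linalg.norm(X - xq, axis=1)
    w = kernel(distances)
    return np.sum(w * y) / np.sum(w)

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
print(locally_weighted_average(np.array([1.0]), X, y))  # approximately 1.0 (symmetric neighbors)
```

Note that this statistical sense of "kernel" (a weighting function of distance) is related to, but distinct from, the inner-product sense defined at the top of this article.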