Regularized Learning Task

Context:
- It can be solved by a Regularized Learning System (that implements a regularized learning algorithm).
- …
Counter-Example(s):
- a Classification Tree Pruning Task.
See: L1 Regularization, L2 Regularization.

References

(Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/regularization_(mathematics) Retrieved:2015-11-8.
- Regularization, in mathematics and statistics and particularly in the fields of machine learning and inverse problems, refers to a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting. This information is usually of the form of a penalty for complexity, such as restrictions for smoothness or bounds on the vector space norm.
  A theoretical justification for regularization is that it attempts to impose Occam's razor on the solution. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters.
  The same idea arose in many fields of science. For example, the least-squares method can be viewed as a very simple form of regularization. A simple form of regularization applied to integral equations, generally termed Tikhonov regularization after Andrey Nikolayevich Tikhonov, is essentially a trade-off between fitting the data and reducing a norm of the solution. More recently, non-linear regularization methods, including total variation regularization have become popular.
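
The trade-off between fitting the data and reducing a norm of the solution can be illustrated with a minimal sketch. It assumes the simplest possible setting, a single weight w with no intercept, fitted by minimising sum((y - w*x)^2) + alpha * w^2. The function name `ridge_slope` and the parameter name `alpha` are illustrative choices, not taken from the quoted source.

```python
def ridge_slope(xs, ys, alpha):
    """Closed-form minimiser of sum((y - w*x)**2) + alpha * w**2 over a single weight w.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + alpha).
    alpha = 0 recovers ordinary least squares; a larger alpha shrinks w toward 0.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


# Toy data lying exactly on the line y = 2x (made up for illustration).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_ols = ridge_slope(xs, ys, alpha=0.0)     # no penalty: fits the data exactly
w_ridge = ridge_slope(xs, ys, alpha=14.0)  # penalised: smaller weight, worse fit
```

With alpha = 0 the data are fitted exactly. Increasing alpha trades some of that fit for a smaller weight, which is the compromise the passage describes.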

(Mohamed, 2015) ⇒ Shakir Mohamed (2015). “A Statistical View of Deep Learning (V): Generalisation and Regularisation.” In: Personal Blog, 10 May 2015
- QUOTE: The principle technique for addressing overfitting in deep learning is by regularisation — adding additional penalties to our training objective that prevents the model parameters from becoming large and from fitting to the idiosyncrasies of the training data. This transforms our estimation framework from maximum likelihood into a maximum penalised likelihood, or more commonly maximum a posteriori (MAP) estimation (or a shrinkage estimator). For a deep model with loss function L(θ) and parameters θ, we instead use the modified loss that includes a regularisation function R:
  $L(\theta) = -\sum_n \log p(y_n \mid x_n, \theta) + \frac{1}{\lambda} R(\theta)$
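
A minimal sketch of this modified loss follows. It rests on two assumptions that the quote does not make: the model is a Bernoulli (logistic) likelihood, and R is the squared L2 norm of the parameters. All function names are hypothetical and chosen for illustration only.

```python
import math


def neg_log_likelihood(theta, xs, ys):
    """-sum_n log p(y_n | x_n, theta), assuming a logistic (Bernoulli) model."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = sum(t * xi for t, xi in zip(theta, x))  # linear score theta . x
        p = 1.0 / (1.0 + math.exp(-z))              # p(y = 1 | x, theta)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def l2_penalty(theta):
    """R(theta) = sum of squared parameters (an assumed choice of R)."""
    return sum(t * t for t in theta)


def penalised_loss(theta, xs, ys, lam):
    """Negative log-likelihood plus (1 / lam) * R(theta), as in the quoted formula."""
    return neg_log_likelihood(theta, xs, ys) + (1.0 / lam) * l2_penalty(theta)


# Toy inputs, made up for illustration.
xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [1, 0]
theta = [1.0, -1.0]

plain = neg_log_likelihood(theta, xs, ys)
penalised = penalised_loss(theta, xs, ys, lam=2.0)
```

Because the penalty enters with weight 1/λ, a smaller λ penalises large parameters more strongly. Under the MAP reading in the quote, this squared-norm choice of R corresponds to a zero-mean Gaussian prior on θ.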