# Trained Point-Estimation Structure

A Trained Point-Estimation Structure is a point estimation function that is a trained prediction function.

**AKA:** Fitted Regressor, Regressed Point Estimation Model, Fitted Point Estimator.

**Context:**
- It can (typically) be produced by a Supervised Point Estimation System (that solves a point estimation model training task).
- It can (often) be an attempt to approximate some underlying True Probability Function.
- It can range from being a Regressed Linear Model to being a Regressed Nonlinear Model, depending on the regression metamodel.
- It can (typically) contain an Error Term.
- It can be an input to a Point Estimator Evaluation Task.
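As a minimal sketch of the idea (not part of the definition above, and with illustrative names), a trained point-estimation structure can be viewed as the fitted prediction function returned by a training procedure. Here a linear point estimator is fit by closed-form least squares:

```python
def train_point_estimator(xs, ys):
    """Fit a linear point-estimation structure y ≈ a + b*x by least squares.

    Returns a trained prediction function (a fitted regressor) that maps
    an input to a real-valued point estimate.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx          # learned slope
    a = mean_y - b * mean_x  # learned intercept

    def fitted_regressor(x):
        # The trained structure: a closure over the learned parameters (a, b).
        return a + b * x

    return fitted_regressor

# Demo: noiseless samples of y = 1 + 2x recover the line exactly.
f = train_point_estimator([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(f(1.5))  # → 4.0
```

The returned closure is the "structure": the training data is discarded and only the fitted parameters remain.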

**Example(s):**
- a specific Trained Regression Tree.
- a Fitted Linear Function.
- …

**Counter-Example(s):**
- …

**See:** Loss Function, Independent Random Variable, Range Estimation Function, Supervised Parametric Regression.

## References

### 2011

- (Quadrianto & Buntine, 2011c) ⇒ Novi Quadrianto, and Wray L. Buntine. (2011). “Regression.” In: (Sammut & Webb, 2011).
- QUOTE: Regression is a fundamental problem in statistics and machine learning. In regression studies, we are typically interested in inferring a real-valued function (called a regression function) whose values correspond to the mean of a dependent (or response or output) variable conditioned on one or more independent (or input) variables. Many different techniques for estimating this regression function have been developed, including parametric, semi-parametric, and nonparametric methods.
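The conditional-mean view in the quote above can be illustrated with a tiny nonparametric estimator (a sketch with illustrative names, not from the cited chapter): the point estimate of E[Y | X = x] is the average target over the k training inputs nearest to x.

```python
def knn_regression_function(xs, ys, k=3):
    """Nonparametric estimate of the regression function E[Y | X = x]:
    average the targets of the k training inputs nearest to x."""
    def estimate(x):
        # Indices of the k inputs closest to the query point x.
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return sum(ys[i] for i in nearest) / k
    return estimate

# Samples roughly following y = x**2; the estimate smooths over neighbors.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 4.2, 8.8, 16.1]
m = knn_regression_function(xs, ys, k=1)
```

With k = 1 the estimate interpolates the nearest sample; larger k trades variance for bias.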

### 2007

- (Caponnetto & De Vito, 2007) ⇒ Andrea Caponnetto, and Ernesto De Vito. (2007). “Optimal Rates for the Regularized Least-Squares Algorithm.” In: Foundations of Computational Mathematics. doi:10.1007/s10208-006-0196-8
- QUOTE: ... The aim of a regression algorithm is estimating a particular invariant of the unknown distribution: the *regression function*, using only the available empirical samples.

### 2005

- (Dekel et al., 2005) ⇒ Ofer Dekel, Shai Shalev-Shwartz, and Yoram Singer. (2005). “Smooth ε-Insensitive Regression by Loss Symmetrization.” In: The Journal of Machine Learning Research, 6. doi:10.1007/978-3-540-45167-9_32
- QUOTE: The focus of this paper is supervised learning of real-valued functions. We observe a sequence [math]\displaystyle{ S = {(x_1,y_1),...,(x_m,y_m)} }[/math] of instance-target pairs, where the instances are vectors in [math]\displaystyle{ \R^n }[/math] and the targets are real-valued scalars, [math]\displaystyle{ y_i \in \mathbb{R} }[/math]. Our goal is to learn a function [math]\displaystyle{ f : \mathbb{R}^n \rightarrow \mathbb{R} }[/math] which provides a good approximation of the target values from their corresponding instance vectors. Such a function is often referred to as a regression function or a regressor for short. Regression problems have long been the focus of research papers in statistics and learning theory (see for instance the book by Hastie, Tibshirani, and Friedman (2001) and the references therein). In this paper we discuss learning of linear regressors, that is, [math]\displaystyle{ f }[/math] is of the form [math]\displaystyle{ f(\bf{x}) = \lambda \cdot \bf{x} }[/math]. This setting is also suitable for learning a linear combination of base regressors of the form [math]\displaystyle{ f(\bf{x}) = \Sigma^l_{j=1} \lambda_j h_j(\bf{x}) }[/math] where each base regressor [math]\displaystyle{ h_j }[/math] is a mapping from an instance domain [math]\displaystyle{ X }[/math] into [math]\displaystyle{ \R }[/math]. The latter form enables us to employ kernels by setting [math]\displaystyle{ h_j(\bf{x}) = K(x_j,\bf{x}) }[/math].
The class of linear regressors is rather restricted. Furthermore, in real applications both the instances and the target values are often corrupted by noise and a perfect mapping such that for all [math]\displaystyle{ (x_i, y_i) \in S }[/math], [math]\displaystyle{ f(x_i) = y_i }[/math] is usually unobtainable. Hence, we employ a loss function [math]\displaystyle{ L : \mathbb{R} \times \mathbb{R} \rightarrow \R_+ }[/math] which determines the penalty for a discrepancy between the predicted target, [math]\displaystyle{ f(\bf{x}) }[/math], and the true (observed) target [math]\displaystyle{ y }[/math]. As we discuss shortly, the loss functions we consider in this paper depend only on the discrepancy between the predicted target and the true target [math]\displaystyle{ \delta = f(\bf{x}) - y }[/math], hence [math]\displaystyle{ L }[/math] can be viewed as a function from [math]\displaystyle{ \R }[/math] into [math]\displaystyle{ \R_+ }[/math]. We therefore allow ourselves to overload our notation and denote [math]\displaystyle{ L(\delta) = L(f(\bf{x}),y) }[/math].

Given a loss function [math]\displaystyle{ L }[/math], the goal of a regression algorithm is to find a regressor [math]\displaystyle{ f }[/math] which attains a small total loss on the training set [math]\displaystyle{ S }[/math]: [math]\displaystyle{ \text{Loss}(\lambda,S) = \Sigma^m_{i=1}L(f(x_i)-y_i) = \Sigma^m_{i=1}L(\lambda \cdot x_i - y_i) }[/math]
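The total-loss objective in the quote above can be written down directly (a sketch under the quote's definitions; the function names are illustrative, and squared loss stands in for a generic discrepancy-based loss L):

```python
def total_loss(lam, S, loss=lambda d: d * d):
    """Total loss of the linear regressor f(x) = lam . x on sample S:
    Loss(lam, S) = sum_i L(lam . x_i - y_i), with squared loss by default."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return sum(loss(dot(lam, x) - y) for x, y in S)

# 1-d sample generated by y = 2x: the weight vector (2.0,) fits perfectly.
S = [((1.0,), 2.0), ((2.0,), 4.0)]
print(total_loss((2.0,), S))  # → 0.0
```

Swapping in a different `loss` callable (e.g. absolute or ε-insensitive loss) changes the objective without changing the regressor's linear form.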

### 1998

- (Kohavi & Provost, 1998) ⇒ Ron Kohavi, and Foster Provost. (1998). “Glossary of Terms.” In: Machine Learning, 30(2-3).
- QUOTE: **Regressor**: A mapping from unlabeled instances to a value within a predefined metric space (e.g., a continuous range).