# Least-Squares Function Fitting Algorithm

A Least-Squares Function Fitting Algorithm is a function fitting algorithm can be implemented by a least-squares system to solve a least-squares task (to provide a minimizing least-squares solution).

**AKA:**Method of Least Squares, LSE Fitting/Regression.**Context:**- It consists of a minimization algorithm applied to the objective function of the form: [math]S=\sum _{i=1}^{n}|r_i|^{2}[/math].
- It can (often) be applied as a Model-based Supervised Learning Algorithm, such as a model-based point estimation algorithm.
- It can be applied by a Least Squares Estimation System (that can solve a least squares estimation task).
- It can be applied by a Least-Squares System (to solve a least-squares task).
- It can range from being an Ordinary Least Squares Algorithm and a Generalized Least Squares Algorithm.
- It can range from being an Ordinary Least Squares Algorithm to being a Regularized Least Squares Algorithm.
- It can range from being a Linear Least Squares Algorithm to being a Non-Linear Least Squares Algorithm.
- It can range from being a Non-Regularized Least Squares Algorithm to being a Regularized Least Squares Algorithm.
- It can range from being a Least-Squares Point Estimation Algorithm to being a Least-Squares Ranking Algorithm to being a Least-Squares Classification Algorithm???

**Example(s):****Counter-Example(s):****See:**Regression Algorithm, Root Mean Squared Error, Partial Least Squares (PLS) Regression, Overdetermined System, Closed-Form Expression, Method of Moments.

## References

### 2015

- http://en.wikipedia.org/wiki/Category:Least_squares
- Coefficient of determination, Discrete least squares meshless method, Explained sum of squares, Fraction of variance unexplained, Gauss–Newton algorithm, Generalized least squares, Iteratively reweighted least squares, Lack-of-fit sum of squares, Least squares (function approximation), Least squares adjustment, Least squares support vector machine, Levenberg–Marquardt algorithm, Linear least squares (mathematics), Mean squared error, Moving least squares, Non-linear iterative partial least squares, Non-linear least squares, Non-negative least squares, Numerical smoothing and differentiation, Ordinary least squares, Partial least squares path modeling, Partial least squares regression, Partition of sums of squares, Polynomial least squares, Proofs involving ordinary least squares, Residual sum of squares, Total least squares, Total sum of squares.

### 2014a

- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/least_squares Retrieved:2014-11-23.
- The method of
**least squares**is a standard approach to the approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. "Least squares" means that the overall solution minimizes the sum of the squares of the errors made in the results of every single equation.The most important application is in data fitting. The best fit in the least-squares sense minimizes the sum of squared residuals, a residual being the difference between an observed value and the fitted value provided by a model. When the problem has substantial uncertainties in the independent variable (the 'x' variable), then simple regression and least squares methods have problems; in such cases, the methodology required for fitting errors-in-variables models may be considered instead of that for least squares.

Least squares problems fall into two categories: linear or ordinary least squares and non-linear least squares, depending on whether or not the residuals are linear in all unknowns. The linear least-squares problem occurs in statistical regression analysis; it has a closed-form solution. A closed-form solution (or closed-form expression) is any formula that can be evaluated in a finite number of standard operations. The non-linear problem has no closed-form solution and is usually solved by iterative refinement; at each iteration the system is approximated by a linear one, and thus the core calculation is similar in both cases.

When the observations come from an exponential family and mild conditions are satisfied, least-squares estimates and maximum-likelihood estimates are identical. The method of least squares can also be derived as a method of moments estimator. The following discussion is mostly presented in terms of linear functions but the use of least-squares is valid and practical for more general families of functions. Also, by iteratively applying local quadratic approximation to the likelihood (through the Fisher information), the least-squares method may be used to fit a generalized linear model. For the topic of approximating a function by a sum of others using an objective function based on squared distances, see least squares (function approximation). The least-squares method is usually credited to Carl Friedrich Gauss (1795),

^{[1]}but it was first published by Adrien-Marie Legendre.

- The method of

- ↑ Cite error: Invalid
`<ref>`

tag; no text was provided for refs named`brertscher`

### 2014b

- http://en.wikipedia.org/wiki/Least_squares#Problem_statement
- The objective consists of adjusting the parameters of a model function to best fit a data set. A simple data set consists of
*n*points (data pairs) [math](x_i,y_i)\![/math], i*= 1, ...,*n*, where [math]x_i\![/math] is an independent variable and [math]y_i\![/math] is a dependent variable whose value is found by observation. The model function has the form [math]f(x,\beta)[/math], where the*m*adjustable parameters are held in the vector [math]\boldsymbol \beta[/math]. The goal is to find the parameter values for the model which "best" fits the data. The least squares method finds its optimum when the sum,*S*, of squared residuals :[math]S=\sum_{i=1}^{n}{r_i}^2[/math] is a minimum. A residual is defined as the difference between the actual value of the dependent variable and the value predicted by the model. :[math]r_i=y_i-f(x_i,\boldsymbol \beta)[/math].*

- The objective consists of adjusting the parameters of a model function to best fit a data set. A simple data set consists of

### 2012

- http://www.mathworks.com/help/optim/ug/least-squares-model-fitting-algorithms.html
- Least squares, in general, is the problem of finding a vector [math]x[/math] that is a local minimizer to a function that is a sum of squares, possibly subject to some constraints: [math]\underset{x}{\operatorname{min}} \, ||F(x)||^2_2 = \underset{x}{\operatorname{min}} \, \Sigma_iF^2_i(x)[/math] such that [math]A·x ≤ b; Aeq·x = beq; lb ≤ x ≤ ub[/math].
There are several Optimization Toolbox™ solvers available for various types of F(x) and various types of constraints:

- Least squares, in general, is the problem of finding a vector [math]x[/math] that is a local minimizer to a function that is a sum of squares, possibly subject to some constraints: [math]\underset{x}{\operatorname{min}} \, ||F(x)||^2_2 = \underset{x}{\operatorname{min}} \, \Sigma_iF^2_i(x)[/math] such that [math]A·x ≤ b; Aeq·x = beq; lb ≤ x ≤ ub[/math].

Solver | F(x) | Constraints |
---|---|---|

\ | C·x – d | None |

lsqnonneg | C·x – d | x ≥ 0 |

lsqlin | C·x – d | Bound, linear |

lsqnonlin | General F(x) | Bound |

lsqcurvefit | F(x, xdata) – ydata | Bound |

### 2009

- (WordNet, 2009) ⇒ http://wordnetweb.princeton.edu/perl/webwn?s=least%20squares
- S: (n) least squares, method of least squares (a method of fitting a curve to data points so as to minimize the sum of the squares of the distances of the points from the curve)

### 2007

- (Rifkin, 2007) ⇒ Ryan Rifkin. (2007). “Regularized Least Squares.” In: MIT Course, 9.520: Statistical Learning Theory and Applications, Spring 2007
- QUOTE: "You should be asking how the answers will be used and what is really needed from the computation. Time and time again someone will ask for the inverse of a matrix when all that is needed is the solution of a linear system; for an interpolating polynomial when all that is needed is its values at some point; for the solution of an ODE at a sequence of points when all that is needed is the limiting, steady-state value. A common complaint is that least squares curve-fitting couldn’t possibly work on this data set and some more complicated method is needed; in almost all such cases, least squares curve-fitting will work just fine because it is so very robust”

Leader, Numerical Analysis and Scientific Computation

- QUOTE: "You should be asking how the answers will be used and what is really needed from the computation. Time and time again someone will ask for the inverse of a matrix when all that is needed is the solution of a linear system; for an interpolating polynomial when all that is needed is its values at some point; for the solution of an ODE at a sequence of points when all that is needed is the limiting, steady-state value. A common complaint is that least squares curve-fitting couldn’t possibly work on this data set and some more complicated method is needed; in almost all such cases, least squares curve-fitting will work just fine because it is so very robust”

### 2003

- (Myung, 2003) ⇒ In Jae Myung. (2003). “Tutorial on Maximum Likelihood Estimation.” In: Journal of Mathematical Psychology, 47.
- QUOTE: There are two general methods of parameter estimation. They are least-squares estimation (LSE) and maximum likelihood estimation (MLE). The former has been a popular choice of model fitting in psychology (e.g., Rubin, Hinton, & Wenzel, 1999; Lamberts, 2000 but see Usher & McClelland, 2001) and is tied to many familiar statistical concepts such as linear regression, sum of squares error, proportion variance accounted for (i.e. [math]r^2[/math]), and root mean squared deviation. LSE, which unlike MLE requires no or minimal distributional assumptions, is useful for obtaining a descriptive measure for the purpose of summarizing observed data, but it has no basis for testing hypotheses or constructing confidence intervals. … Unlike least-squares estimation which is primarily a descriptive tool, MLE is a preferred method of parameter estimation in statistics and is an indispensable tool for many statistical modeling techniques, in particular in non-linear modeling with non-normal data. … MLE has many optimal properties in estimation: sufficiency (complete information about the parameter of interest contained in its MLE estimator); consistency (true parameter value that generated the data recovered asymptotically, i.e. for data of sufficiently large samples); efficiency (lowest-possible variance of parameter estimates achieved asymptotically); and parameterization invariance (same MLE solution obtained independent of the parametrization used). In contrast, no such things can be said about LSE. As such, most statisticians would not view LSE as a general method for parameter estimation, but rather as an approach that is primarily used with linear regression models. Further, many of the inference methods in statistics are developed based on MLE.

### 1974

- (Lawson & Lawson, 1974) ⇒ Charles L. Lawson, and Richard J. Hanson. (1974). “Solving Least Squares Problems." Prentice Hall. ISBN:0138225850

### 1805

- (Legendre, 1805) ⇒ Adrien-Marie Legendre. (1805). “Nouvelle formula pour réduire en distances vraies les distances apparentes de la Lune au Soleil ou à une étoile."
- https://archive.org/stream/sourcebookinmath00smit#page/576/mode/2up
- QUOTE: … Of all the principles which can be proposed for [making estimates from a sample], I think there is none more general, more exact, and more easy of application, than that of which we have made use… which consists of rendering the sum of the squares of the errors a minimum. ...