# 2013 A Risk Comparison of Ordinary Least Squares Vs Ridge Regression

- (Dhillon et al., 2013) ⇒ Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade, and Lyle H. Ungar. (2013). “A Risk Comparison of Ordinary Least Squares Vs Ridge Regression.” In: The Journal of Machine Learning Research, 14(1).

**Subject Headings:** Ordinary Least Squares Estimate.

## Notes

## Cited By

- http://scholar.google.com/scholar?q=%222013%22+A+Risk+Comparison+of+Ordinary+Least+Squares+Vs+Ridge+Regression
- http://dl.acm.org/citation.cfm?id=2567709.2567711&preflayout=flat#citedby

## Quotes

### Abstract

We compare the risk of ridge regression to a simple variant of ordinary least squares, in which one simply projects the data onto a finite dimensional subspace (as specified by a principal component analysis) and then performs an ordinary (un-regularized) least squares regression in this subspace. This note shows that the risk of this ordinary least squares method (PCA-OLS) is within a constant factor (namely 4) of the risk of ridge regression (RR).

### 1. Introduction

Consider the fixed design setting where we have a set of [math]\displaystyle{ n }[/math] vectors [math]\displaystyle{ \{X_i\} }[/math], and let [math]\displaystyle{ X }[/math] denote the matrix whose [math]\displaystyle{ i }[/math]th row is [math]\displaystyle{ X_i }[/math]. The observed label vector is [math]\displaystyle{ Y \in \mathbb{R}^n }[/math].

Suppose that [math]\displaystyle{ Y = X\beta + \epsilon }[/math], where [math]\displaystyle{ \epsilon }[/math] is independent noise in each coordinate, with the variance of [math]\displaystyle{ \epsilon_i }[/math] being [math]\displaystyle{ \sigma^2 }[/math]. The objective is to learn [math]\displaystyle{ \mathbb{E}[Y] = X\beta }[/math]. The expected loss of an estimator [math]\displaystyle{ b }[/math] is [math]\displaystyle{ L(b) = \frac{1}{n} \mathbb{E}_Y[\| Y - Xb \|^2] }[/math]. Let [math]\displaystyle{ \hat{\beta} }[/math] be an estimator of [math]\displaystyle{ \beta }[/math] (constructed with a sample [math]\displaystyle{ Y }[/math]). Denoting [math]\displaystyle{ \Sigma := \frac{1}{n} X^\top X }[/math],

we have that the risk (i.e., expected excess loss) is [math]\displaystyle{ \text{Risk}(\hat{\beta}) := \mathbb{E}_{\hat{\beta}}[L(\hat{\beta}) - L(\beta)] = \mathbb{E}_{\hat{\beta}} \| \hat{\beta} - \beta \|^2_\Sigma }[/math], where [math]\displaystyle{ \| x \|^2_\Sigma = x^\top \Sigma x }[/math] and where the expectation is with respect to the randomness in [math]\displaystyle{ Y }[/math].

We show that a simple variant of ordinary (un-regularized) least squares always compares favorably to ridge regression (as measured by the risk). This observation is based on the following bias-variance decomposition:

[math]\displaystyle{ \text{Risk}(\hat{\beta}) = \underbrace{\mathbb{E}\| \hat{\beta} - \bar{\beta} \|^2_\Sigma}_{\text{Variance}} + \underbrace{\| \bar{\beta} - \beta \|^2_\Sigma}_{\text{Prediction Bias}} \quad (1) }[/math]

where [math]\displaystyle{ \bar{\beta} = \mathbb{E}[\hat{\beta}] }[/math].
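As a toy numeric illustration of the decomposition in Eq. (1), the sketch below evaluates the two terms in closed form for the ridge estimator, in the rotated basis where [math]\displaystyle{ \Sigma }[/math] is diagonal. The spectrum, coefficients, and noise level are made up for illustration, and the coordinate-wise shrinkage form of ridge used to derive the closed forms is the one given in Section 1.1:

```python
# Bias-variance split of Eq. (1) for ridge regression, in the rotated
# basis where Sigma = diag(l_1, ..., l_p).  There [b_lam]_j shrinks the
# OLS coordinate by l_j / (l_j + lam), which gives closed forms:
#   Variance        = (sigma2 / n) * sum_j l_j^2 / (l_j + lam)^2
#   Prediction bias = sum_j l_j * beta_j^2 * lam^2 / (l_j + lam)^2

def ridge_bias_variance(evals, beta, lam, sigma2, n):
    var = sum((sigma2 / n) * l ** 2 / (l + lam) ** 2 for l in evals)
    bias = sum(l * b * b * lam ** 2 / (l + lam) ** 2
               for l, b in zip(evals, beta))
    return bias, var

# illustrative values (not from the paper)
evals = [4.0, 1.0, 0.25, 0.01]     # eigenvalues l_1 >= ... >= l_p of Sigma
beta = [1.0, 1.0, 1.0, 1.0]        # true coefficients in the PCA basis
sigma2, n = 1.0, 50

bias, var = ridge_bias_variance(evals, beta, 0.5, sigma2, n)
print(f"bias = {bias:.4f}, variance = {var:.4f}, risk = {bias + var:.4f}")
```

At [math]\displaystyle{ \lambda = 0 }[/math] the bias term vanishes and the variance term reduces to [math]\displaystyle{ p\sigma^2/n }[/math], the familiar OLS risk; increasing [math]\displaystyle{ \lambda }[/math] trades variance for bias.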

#### 1.1 The Risk of Ridge Regression (RR)

Ridge regression, or Tikhonov regularization (Tikhonov, 1963), penalizes the [math]\displaystyle{ l_2 }[/math] norm of a parameter vector [math]\displaystyle{ b }[/math] and “shrinks” it towards zero, penalizing large values more. The estimator is [math]\displaystyle{ \hat{\beta}_\lambda = \arg\min_b \left\{ \frac{1}{n}\| Y - Xb \|^2 + \lambda \| b \|^2 \right\} }[/math]. The closed form estimate is then [math]\displaystyle{ \hat{\beta}_\lambda = (\Sigma + \lambda I)^{-1} \left( \frac{1}{n} X^\top Y \right) }[/math].

Note that [math]\displaystyle{ \hat{\beta}_0 = \hat{\beta}_{\lambda=0} = \arg\min_b \| Y - Xb \|^2 }[/math] is the ordinary least squares estimator. Without loss of generality, rotate [math]\displaystyle{ X }[/math] such that [math]\displaystyle{ \Sigma = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p) }[/math], where the [math]\displaystyle{ \lambda_i }[/math]’s are ordered in decreasing order.

To see the nature of this shrinkage, observe that [math]\displaystyle{ [\hat{\beta}_\lambda]_j := \frac{\lambda_j}{\lambda_j + \lambda} [\hat{\beta}_0]_j }[/math], where [math]\displaystyle{ \hat{\beta}_0 }[/math] is the ordinary least squares estimator.
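A quick numeric sketch of this coordinate-wise shrinkage (the eigenvalues and [math]\displaystyle{ \lambda }[/math] below are illustrative, not from the paper):

```python
# Ridge shrinkage in the rotated basis:
#   [b_lam]_j = (l_j / (l_j + lam)) * [b_0]_j.
# High-variance directions (large l_j) are barely shrunk, while
# low-variance directions are pulled strongly toward zero.

def shrinkage_factors(evals, lam):
    return [l / (l + lam) for l in evals]

evals = [10.0, 1.0, 0.1, 0.001]    # decreasing eigenvalues of Sigma
for l, f in zip(evals, shrinkage_factors(evals, 0.1)):
    print(f"l_j = {l:6.3f}  ->  shrink factor {f:.4f}")
```

The factors decrease monotonically with [math]\displaystyle{ \lambda_j }[/math], which is exactly the soft, graded behavior that the hard-thresholding PCA estimator of Section 2 replaces with a keep-or-drop rule.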

…

### 2. Ordinary Least Squares with PCA (PCA-OLS)

Now let us construct a simple estimator based on [math]\displaystyle{ \lambda }[/math]. Note that our rotated coordinate system, where [math]\displaystyle{ \Sigma = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p) }[/math], corresponds to the PCA coordinate system.

Consider the following ordinary least squares estimator on the “top” PCA subspace — it uses the least squares estimate on coordinate [math]\displaystyle{ j }[/math] if [math]\displaystyle{ \lambda_j \geq \lambda }[/math] and [math]\displaystyle{ 0 }[/math] otherwise:

[math]\displaystyle{ [\hat{\beta}_{\text{PCA},\lambda}]_j = \begin{cases} [\hat{\beta}_0]_j & \text{if } \lambda_j \geq \lambda \\ 0 & \text{otherwise.} \end{cases} }[/math]
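This keep-or-drop rule is a hard-threshold counterpart of ridge’s soft shrinkage. A minimal sketch, with made-up OLS coefficients and spectrum:

```python
# PCA-OLS: keep the OLS coordinate unchanged when l_j >= lam,
# zero it out otherwise (hard thresholding on the spectrum of Sigma).

def pca_ols(b0, evals, lam):
    return [b if l >= lam else 0.0 for b, l in zip(b0, evals)]

b0 = [2.0, -1.0, 0.5, 3.0]          # illustrative OLS estimates, PCA basis
evals = [10.0, 1.0, 0.1, 0.001]     # decreasing eigenvalues of Sigma
print(pca_ols(b0, evals, 0.5))      # -> [2.0, -1.0, 0.0, 0.0]
```

Coordinates with [math]\displaystyle{ \lambda_j \lt \lambda }[/math] are dropped entirely, no matter how large their OLS estimate is, which is why the kept coordinates carry pure variance and the dropped ones pure bias.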

…

### 3. Experiments

First, we generated synthetic data with [math]\displaystyle{ p = 100 }[/math] and varying values of [math]\displaystyle{ n \in \{20, 50, 80, 110\} }[/math]. …

…

### 4. Conclusion

We showed that the risk inflation of a particular ordinary least squares estimator (on the “top” PCA subspace) is within a factor of 4 of the risk of the ridge estimator. It turns out the converse is not true — this PCA estimator may be arbitrarily better than the ridge one.
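The factor-of-4 claim can be sanity-checked numerically. The sketch below compares the closed-form risks of the two estimators on a made-up spectrum (all numbers illustrative; the closed forms follow from the shrinkage rule in Section 1.1 and the thresholding rule in Section 2, in the rotated basis where [math]\displaystyle{ \Sigma }[/math] is diagonal):

```python
# Toy check of the bound Risk(PCA-OLS at lam) <= 4 * Risk(ridge at lam),
# using closed-form risks in the basis where Sigma = diag(l_1, ..., l_p).

def ridge_risk(evals, beta, lam, sigma2, n):
    bias = sum(l * b * b * lam ** 2 / (l + lam) ** 2
               for l, b in zip(evals, beta))
    var = sum((sigma2 / n) * l ** 2 / (l + lam) ** 2 for l in evals)
    return bias + var

def pca_ols_risk(evals, beta, lam, sigma2, n):
    # kept coordinates (l_j >= lam) contribute pure variance sigma2/n;
    # dropped coordinates contribute pure bias l_j * beta_j^2
    return sum(sigma2 / n if l >= lam else l * b * b
               for l, b in zip(evals, beta))

evals = [4.0, 1.0, 0.25, 0.01]     # illustrative eigenvalues of Sigma
beta = [1.0, -2.0, 0.5, 3.0]       # illustrative true coefficients
sigma2, n = 1.0, 50

for lam in [0.01, 0.1, 0.5, 1.0, 5.0]:
    ratio = (pca_ols_risk(evals, beta, lam, sigma2, n)
             / ridge_risk(evals, beta, lam, sigma2, n))
    print(f"lam = {lam:5.2f}  risk ratio (PCA-OLS / ridge) = {ratio:.3f}")
```

On this example the ratio stays well under 4 across the whole range of [math]\displaystyle{ \lambda }[/math], and for small and large [math]\displaystyle{ \lambda }[/math] the PCA estimator is actually the better of the two, consistent with the one-sided nature of the bound.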

## References

- D. P. Foster and E. I. George. (1994). “The Risk Inflation Criterion for Multiple Regression.” In: *The Annals of Statistics*, pages 1947-1975.
- A. N. Tikhonov. (1963). “Solution of Incorrectly Formulated Problems and the Regularization Method.” In: *Soviet Math Dokl 4*, pages 501-504.


Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year
---|---|---|---|---|---|---|---|---|---
Lyle H. Ungar, Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade | 14(1) | 2013 | A Risk Comparison of Ordinary Least Squares Vs Ridge Regression | | The Journal of Machine Learning Research | | | | 2013