Expanded Basis Function

An Expanded Basis Function is a series expansion of a basis function (used, for example, in a linear regression model).



References

2010

[math]\displaystyle{ f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} }[/math]
Many machine learning textbooks, however, introduce linear methods with an explicit intercept[1] term [math]\displaystyle{ w_0 }[/math], as something like
[math]\displaystyle{ f(\mathbf{x}) = w_0 + \mathbf{w} \cdot \mathbf{x} \quad (1.1) }[/math]
In learning, both the parameters [math]\displaystyle{ \mathbf{w} }[/math] and [math]\displaystyle{ w_0 }[/math] need to be adjusted. We have not bothered with this because our original model can be made equivalent by “tacking” a constant term onto [math]\displaystyle{ \mathbf{x} }[/math]. Define the function [math]\displaystyle{ \phi }[/math], which just takes the vector [math]\displaystyle{ \mathbf{x} }[/math] and prepends a constant of 1, by
[math]\displaystyle{ \phi(\mathbf{x}) = (1, \mathbf{x})\quad (1.2) }[/math]
Then, if we take all our training data and replace each element [math]\displaystyle{ (\hat{y}, \hat{\mathbf{x}}) }[/math] by [math]\displaystyle{ (\hat{y}, \phi(\hat{\mathbf{x}})) }[/math], we will have done the equivalent of adding an intercept term. This is a special example of a straightforward but powerful idea known as “basis expansion”.
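The following is a minimal sketch (in Python with NumPy; not from the quoted source) of the expansion in (1.2): prepending a constant-1 column to the training inputs and running an ordinary least-squares fit recovers the intercept [math]\displaystyle{ w_0 }[/math] as the first learned coefficient. The simulated data and the function name phi are illustrative assumptions.
<syntaxhighlight lang="python">
import numpy as np

def phi(X):
    """Prepend a constant-1 column to each row of X (the expansion in Equation 1.2)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # 100 examples, 3 input features (illustrative)
true_w0, true_w = 2.0, np.array([1.0, -0.5, 0.25])
y = true_w0 + X @ true_w + 0.01 * rng.normal(size=100)

# Ordinary least squares on the expanded inputs phi(X); no explicit intercept term is needed.
coef, *_ = np.linalg.lstsq(phi(X), y, rcond=None)
w0_hat, w_hat = coef[0], coef[1:]                  # w0_hat is close to 2.0, w_hat close to true_w
</syntaxhighlight>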

2009

[math]\displaystyle{ f_\theta(x) = \sum_{k=1}^K h_k(x)\theta_k \quad (2.30) }[/math]
where the [math]\displaystyle{ h_k }[/math] are a suitable set of functions or transformations of the input vector [math]\displaystyle{ x }[/math].
Denote by [math]\displaystyle{ h_m(X) : I\!R^p \rightarrow I\!R }[/math] the m-th transformation of [math]\displaystyle{ X,\; m = 1, \cdots, M }[/math]. We then model
[math]\displaystyle{ f(X) = \sum_{m=1}^M\beta_m h_m(X) \quad (5.1) }[/math]
a linear basis expansion in [math]\displaystyle{ X }[/math]. The beauty of this approach is that once the basis functions [math]\displaystyle{ h_m }[/math] have been determined, the models are linear in these new variables, and the fitting proceeds as before.
Some simple and widely used examples of the [math]\displaystyle{ h_m }[/math] are the following:
  • [math]\displaystyle{ h_m(X) = X_m,\; m = 1, \cdots, p }[/math] recovers the original linear model.
  • [math]\displaystyle{ h_m(X) = X_j^2 }[/math] or [math]\displaystyle{ h_m(X) = X_jX_k }[/math] allows us to augment the inputs with polynomial terms to achieve higher-order Taylor expansions. Note, however, that the number of variables grows exponentially in the degree of the polynomial. A full quadratic model in [math]\displaystyle{ p }[/math] variables requires [math]\displaystyle{ O(p^2) }[/math] square and cross-product terms, or more generally [math]\displaystyle{ O(p^d) }[/math] for a degree-d polynomial.
  • [math]\displaystyle{ h_m(X) = \log(X_j ), \sqrt{X_j} , \cdots }[/math] permits other nonlinear transformations of single inputs. More generally one can use similar functions involving several inputs, such as [math]\displaystyle{ h_m(X) = ||X|| }[/math].
  • [math]\displaystyle{ h_m(X) = I(L_m \leq X_k \lt U_m) }[/math], an indicator for a region of [math]\displaystyle{ X_k }[/math]. Breaking the range of [math]\displaystyle{ X_k }[/math] up into [math]\displaystyle{ M_k }[/math] such non-overlapping regions results in a model with a piecewise constant contribution for [math]\displaystyle{ X_k }[/math].
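The sketch below (in Python with NumPy; an assumption, not from the quoted source) illustrates the examples above: a hand-chosen set of basis functions [math]\displaystyle{ h_m }[/math] (the original inputs, a square and a cross-product term, a log and a square-root transform, and an indicator for a region of one input) is stacked into a new design matrix, and the [math]\displaystyle{ \beta_m }[/math] of (5.1) are then fit by ordinary linear least squares, since the model is linear in the expanded variables. The simulated data and all names are illustrative.
<syntaxhighlight lang="python">
import numpy as np

def expand(X):
    """Stack hand-chosen basis functions h_m(X) as columns of a new design matrix."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([
        np.ones(len(X)),             # constant term
        x1, x2,                      # h_m(X) = X_m: recovers the original linear model
        x1**2, x1 * x2,              # square and cross-product (polynomial) terms
        np.log1p(np.abs(x2)),        # a nonlinear transform of a single input
        np.sqrt(np.abs(x1)),
        (0.0 <= x1) & (x1 < 1.0),    # indicator I(L_m <= X_1 < U_m) for one region of X_1
    ])

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0]**2 - X[:, 0] * X[:, 1] + 0.05 * rng.normal(size=200)

H = expand(X)                                    # the model is linear in these new variables
beta, *_ = np.linalg.lstsq(H, y, rcond=None)     # least-squares estimates of the beta_m in (5.1)
</syntaxhighlight>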

  1. The terminology “bias” is more common, but we will stick to “intercept”, since this has nothing to do with the “bias” we discuss in the bias-variance tradeoff.