One-half Squared-Error Cost Function

A One-half Squared-Error Cost Function is a Squared-Error Cost Function whose overall value is an average sum-of-squares error term plus a regularization (weight decay) term.



References

2014

  • (DL, 2014) ⇒ http://deeplearning.stanford.edu/wiki/index.php/Backpropagation_Algorithm
    • QUOTE: Suppose we have a fixed training set [math]\displaystyle{ \{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \} }[/math] of [math]\displaystyle{ m }[/math] training examples. We can train our neural network using batch gradient descent. In detail, for a single training example [math]\displaystyle{ (x,y) }[/math], we define the cost function with respect to that single example to be:

      [math]\displaystyle{ \begin{align} J(W,b; x,y) = \frac{1}{2} \left\| h_{W,b}(x) - y \right\|^2. \end{align} }[/math]

      This is a (one-half) squared-error cost function. Given a training set of [math]\displaystyle{ m }[/math] examples, we then define the overall cost function to be:

      [math]\displaystyle{ \begin{align} J(W,b) &= \left[ \frac{1}{m} \sum_{i=1}^m J(W,b;x^{(i)},y^{(i)}) \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \\ &= \left[ \frac{1}{m} \sum_{i=1}^m \left( \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2 \right) \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \end{align} }[/math]

      The first term in the definition of [math]\displaystyle{ J(W,b) }[/math] is an average sum-of-squares error term. The second term is a regularization term (also called a weight decay term) that tends to decrease the magnitude of the weights, and helps prevent overfitting.
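The overall cost [math]\displaystyle{ J(W,b) }[/math] above can be evaluated directly from a forward pass over the training set. The following is a minimal NumPy sketch, not part of the quoted tutorial: the 3-5-2 layer sizes, the sigmoid activation, and the helper names forward and cost are illustrative assumptions, and the biases are left unregularized to match the weight-decay term in the quoted definition.

import numpy as np

def sigmoid(z):
    # Element-wise logistic activation (an assumption; the quoted cost is activation-agnostic).
    return 1.0 / (1.0 + np.exp(-z))

def forward(W_list, b_list, X):
    # Forward pass h_{W,b}(x) for all examples at once; examples are the columns of X.
    # W_list[l] has shape (s_{l+1}, s_l); b_list[l] has shape (s_{l+1}, 1).
    A = X
    for W, b in zip(W_list, b_list):
        A = sigmoid(W @ A + b)
    return A

def cost(W_list, b_list, X, Y, lam):
    # Overall cost J(W,b): the average of (1/2)||h_{W,b}(x^(i)) - y^(i)||^2 over the m examples,
    # plus the weight-decay term (lambda/2) * (sum of squared weights); biases are not regularized.
    m = X.shape[1]
    errors = forward(W_list, b_list, X) - Y
    data_term = 0.5 * np.sum(errors ** 2) / m
    decay_term = 0.5 * lam * sum(np.sum(W ** 2) for W in W_list)
    return data_term + decay_term

# Tiny illustrative setup: a 3-5-2 network evaluated on m = 4 random examples.
rng = np.random.default_rng(0)
sizes = [3, 5, 2]
W_list = [rng.normal(scale=0.1, size=(sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
b_list = [np.zeros((sizes[l + 1], 1)) for l in range(len(sizes) - 1)]
X = rng.normal(size=(sizes[0], 4))
Y = rng.normal(size=(sizes[-1], 4))
print(cost(W_list, b_list, X, Y, lam=1e-3))

Setting lam to 0 recovers the plain average of the per-example one-half squared errors; increasing it penalizes large weight magnitudes, which is the weight-decay behavior described in the quote.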