# Backpropagation of Errors (BP)-based Training Algorithm

(Redirected from back propagation)

A Backpropagation of Errors (BP)-based Training Algorithm is a feed-forward supervised neural network training algorithm that applies a delta rule to feed the intermediate NNets loss function gradient (which is used to update the neuron weights to minimize the loss function).

## References

### 2017b

\begin{align}J(W,b; x,y) = \frac{1}{2} \left\| h_{W,b}(x) - y \right\|^2.\end{align}

This is a (one-half) squared-error cost function. Given a training set of $m$ examples, we then define the overall cost function to be:

\begin{align}J(W,b)&= \left[ \frac{1}{m} \sum_{i=1}^m J(W,b;x^{(i)},y^{(i)}) \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \\ &= \left[ \frac{1}{m} \sum_{i=1}^m \left( \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2 \right) \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \end{align}

The first term in the definition of $J(W,b)$ is an average sum-of-squares error term. The second term is a regularization term (also called a weight decay term) that tends to decrease the magnitude of the weights, and helps prevent overfitting.