# 2-Layer ANN Training System

A 2-Layer ANN Training System is an ANN Training System that implements a 2-Layer ANN Training Algorithm to solve a 2-Layer ANN Training Task.

**AKA:** Single Hidden-Layer ANN Training System.

**Context:**
- It is based on a 2-Layer ANN Mathematical Model.

**Example(s):**
- A 2-Layer Feed-Forward Neural Network Training System applied to Sigmoid Neurons such as:
- A 2-Layer Feed-Forward Neural Network Training System applied to Rectified Linear Neurons such as:
- Google's TensorFlow Neural Network Training System

**Counter-Examples:**

**See:** 2-Layer Neural Network, Artificial Neural Network, Neural Network Layer, Artificial Neuron, Neuron Activation Function, Neural Network Topology.

## References

### 2016

- (Zhao, 2016) ⇒ Peng Zhao, (2016). “Build Neural Network: Architecture, Prediction, and Training”. In: “R for Deep Learning (I): Build Fully Connected Neural Network from Scratch”
- QUOTE: Training is to search the optimization parameters (weights and bias) under the given network architecture and minimize the classification error or residuals. This process includes two parts: feed forward and back propagation. Feed forward is going through the network with input data (as prediction parts) and then compute data loss in the output layer by loss function (cost function). “Data loss measures the compatibility between a prediction (e.g. the class scores in classification) and the ground truth label.” In our example code, we selected cross-entropy function to evaluate data loss, see detail in here.
After getting data loss, we need to minimize the data loss by changing the weights and bias. The very popular method is to back-propagate the loss into every layers and neuron by gradient descent or stochastic gradient descent which requires derivatives of data loss for each parameter (W1, W2, b1, b2). And back propagation will be different for different activation functions and see here and here for their derivatives formula and method, and Stanford CS231n for more training tips.

In our example, the point-wise derivative for ReLu is:
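Assuming the standard definition of the rectified linear unit (the excerpt does not state it), the function and the point-wise derivative commonly used in back propagation can be written as follows. The value at $x = 0$ is a convention, since the derivative is undefined there.

$$
f(x) = \max(0, x), \qquad
f'(x) =
\begin{cases}
1 & \text{if } x > 0 \\
0 & \text{otherwise}
\end{cases}
$$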

R code: train.dnn
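The procedure described in the quote (feed forward, cross-entropy data loss, back propagation to W1, W2, b1, b2) can be sketched in plain Python. This is not Zhao's R `train.dnn` code: the function names, the toy dataset `X_toy`/`y_toy`, the per-sample stochastic gradient descent updates, and all hyperparameters (`n_hidden`, `lr`, `epochs`, `seed`) are assumptions chosen for illustration, and only the standard library is used.

```python
import math
import random

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, W1, b1, W2, b2):
    # Hidden layer: affine map followed by ReLU.
    z1 = [sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j]
          for j in range(len(b1))]
    h = [max(0.0, v) for v in z1]
    # Output layer: affine map followed by softmax class probabilities.
    z2 = [sum(h[j] * W2[j][k] for j in range(len(h))) + b2[k]
          for k in range(len(b2))]
    return z1, h, softmax(z2)

def train(X, y, n_hidden=8, n_classes=2, lr=0.1, epochs=500, seed=0):
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_classes)] for _ in range(n_hidden)]
    b2 = [0.0] * n_classes
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(X, y):
            z1, h, p = forward(x, W1, b1, W2, b2)
            total += -math.log(p[t])  # cross-entropy data loss for this sample
            # Gradient of the loss w.r.t. the output scores: p - one_hot(t).
            d2 = [p[k] - (1.0 if k == t else 0.0) for k in range(n_classes)]
            # Back-propagate through W2, then gate by the ReLU derivative
            # (1 where the pre-activation is positive, 0 elsewhere).
            d1 = [sum(d2[k] * W2[j][k] for k in range(n_classes))
                  * (1.0 if z1[j] > 0 else 0.0) for j in range(n_hidden)]
            # Stochastic gradient descent step on W2, b2, W1, b1.
            for j in range(n_hidden):
                for k in range(n_classes):
                    W2[j][k] -= lr * h[j] * d2[k]
            for k in range(n_classes):
                b2[k] -= lr * d2[k]
            for i in range(n_in):
                for j in range(n_hidden):
                    W1[i][j] -= lr * x[i] * d1[j]
            for j in range(n_hidden):
                b1[j] -= lr * d1[j]
        losses.append(total / len(X))  # mean loss seen during this epoch
    return (W1, b1, W2, b2), losses

def predict(x, params):
    p = forward(x, *params)[2]
    return p.index(max(p))  # class with the highest probability

# Hypothetical toy data: class 0 near (0, 0), class 1 near (1, 1).
X_toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
y_toy = [0, 0, 0, 1, 1, 1]

params, losses = train(X_toy, y_toy)
print("first/last epoch loss:", losses[0], losses[-1])
print("predictions:", [predict(x, params) for x in X_toy])
```

The gradient with respect to the output scores takes the simple form `p - one_hot(t)` only because softmax is paired with cross-entropy; a different loss or output activation would need a different `d2`.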

### 2015

- (Trask, 2015) ⇒ Trask (July 2015). “Part 1: A Tiny Toy Network”. In: “A Neural Network in 11 lines of Python (Part 1)”
- QUOTE: A neural network trained with backpropagation is attempting to use input to predict output.

| Input 1 | Input 2 | Input 3 | Output |
|---|---|---|---|
| 0 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 |

- Consider trying to predict the output column given the three input columns. We could solve this problem by simply measuring statistics between the input values and the output values. If we did so, we would see that the leftmost input column is perfectly correlated with the output. Backpropagation, in its simplest form, measures statistics like this to make a model. Let's jump right in and use it to do this.

**2 Layer Neural Network:** Python code: nn.py

Output After Training:

    [[ 0.00966449]
     [ 0.00786506]
     [ 0.99358898]
     [ 0.99211957]]
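The toy problem above can be sketched in plain Python as a network whose three inputs feed a single sigmoid output unit. This is not the `nn.py` listing from the quoted post: the function names, the weight initialisation range, the iteration count, the seed, and the implicit learning rate of 1 are assumptions made for illustration, and counting the input and output layers as the "2 layers" is an assumed reading of the post's terminology. The rows of `X` and `y` are copied from the table in the quote.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training data from the quoted table: three inputs and one output per row.
X = [[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]]
y = [0, 1, 1, 0]

def predict(w, row):
    # Weighted sum of the inputs passed through the sigmoid.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)))

def train(X, y, iterations=10000, seed=1):
    rng = random.Random(seed)
    # One weight per input column, initialised uniformly in [-1, 1].
    w = [rng.uniform(-1.0, 1.0) for _ in range(len(X[0]))]
    for _ in range(iterations):
        outputs = [predict(w, row) for row in X]
        # Error scaled by the sigmoid slope o * (1 - o) at each output.
        deltas = [(t - o) * o * (1.0 - o) for t, o in zip(y, outputs)]
        # Full-batch update: each weight moves by the sum of input * delta.
        for i in range(len(w)):
            w[i] += sum(d * row[i] for d, row in zip(deltas, X))
    return w

weights = train(X, y)
predictions = [predict(weights, row) for row in X]
for row, target, p in zip(X, y, predictions):
    print(row, "target:", target, "prediction:", round(p, 4))
```

After training, the weight on the leftmost input should dominate, which is the learned counterpart of the correlation the quote points out.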