Perceptron Training Algorithm
A Perceptron Training Algorithm is a supervised learning algorithm that can train a Perceptron-based Classifier (a binary linear classifier).
- AKA: Perceptron Learning Algorithm.
- Context:
- It can identify a Hyperplane that separates a Linearly Separable set of Training Vectors.
- The Predictive Classifier is h(x) = Sign(f(x)), where f() is the Hyperplane Decision Boundary (see the sketch after this list).
- Example(s): the classic Rosenblatt perceptron update rule quoted in the references below; variants such as an Averaged Perceptron Training Algorithm or a Voted Perceptron Training Algorithm.
- See: Dual Optimization Task, Neural Network Training Algorithm, Linear Model-based Classification Algorithm.
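A minimal sketch of the Predictive Classifier h(x) = Sign(f(x)) mentioned above, assuming a linear decision function f(x) = w·x + b; the names `predict`, `weights`, and `bias` are illustrative and not taken from the cited sources.

```python
# Minimal sketch: h(x) = sign(f(x)) with f(x) = w . x + b as the hyperplane decision function.
# Names are illustrative, not from the cited sources.

def predict(weights, bias, x):
    """Return +1 if x falls on the positive side of the hyperplane, else -1."""
    f = sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias
    return 1 if f >= 0 else -1
```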
References
2011
- (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Perceptron#Learning_algorithm
- Below is an example of a learning algorithm for a single-layer (no hidden layer) perceptron. For multilayer perceptrons, more complicated algorithms such as backpropagation must be used. Alternatively, methods such as the delta rule can be used if the function is non-linear and differentiable, although the one below will work as well.
The learning algorithm we demonstrate is the same across all the output neurons, therefore everything that follows is applied to a single neuron in isolation. We first define some variables:
- [math]\displaystyle{ y = f(\mathbf{z}) \, }[/math] denotes the output from the perceptron for an input vector [math]\displaystyle{ \mathbf{z} }[/math].
- [math]\displaystyle{ b \, }[/math] is the bias term, which in the example below we take to be 0.
- [math]\displaystyle{ D = \{(\mathbf{x}_1,d_1),\dots,(\mathbf{x}_s,d_s)\} \, }[/math] is the training set of [math]\displaystyle{ s }[/math] samples, where:
- [math]\displaystyle{ \mathbf{x}_j }[/math] is the [math]\displaystyle{ n }[/math]-dimensional input vector.
- [math]\displaystyle{ d_j \, }[/math] is the desired output value of the perceptron for that input.
- We show the values of the nodes as follows:
- [math]\displaystyle{ x_{j,i} \, }[/math] is the value of the [math]\displaystyle{ i }[/math]th node of the [math]\displaystyle{ j }[/math]th training input vector.
- [math]\displaystyle{ x_{j,0} = 1 \, }[/math].
- To represent the weights:
- [math]\displaystyle{ w_i \, }[/math] is the [math]\displaystyle{ i }[/math]th value in the weight vector, to be multiplied by the value of the [math]\displaystyle{ i }[/math]th input node.
- An extra dimension, with index [math]\displaystyle{ n+1 }[/math], can be added to all input vectors, with [math]\displaystyle{ x_{j,n+1}=1 \, }[/math], in which case [math]\displaystyle{ w_{n+1} \, }[/math] replaces the bias term.
To show the time-dependence of [math]\displaystyle{ \mathbf{w} }[/math], we use:
- [math]\displaystyle{ w_i(t) \, }[/math] is the weight [math]\displaystyle{ i }[/math] at time [math]\displaystyle{ t }[/math].
- [math]\displaystyle{ \alpha \, }[/math] is the learning rate, where [math]\displaystyle{ 0 \lt \alpha \leq 1 }[/math].
- Too high a learning rate makes the perceptron periodically oscillate around the solution unless additional steps are taken.
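The excerpt above defines the variables but stops short of the update step itself. Below is a sketch, under those definitions (training set D of pairs (x_j, d_j), learning rate α, and the bias absorbed as a constant input x_{j,0} = 1), of the standard single-layer update rule w_i(t+1) = w_i(t) + α(d_j − y_j(t))·x_{j,i}, which is not reproduced in the excerpt. Outputs and targets are assumed to lie in {0, 1}; the function name is illustrative.

```python
# Sketch of the single-layer perceptron update rule, assuming outputs/targets in {0, 1}
# and the bias folded in as a constant input x_{j,0} = 1.  Names are illustrative.

def train_perceptron(D, alpha=0.1, epochs=100):
    """D is a list of (x_j, d_j) pairs; x_j is an n-dimensional list, d_j in {0, 1}."""
    n = len(D[0][0])
    w = [0.0] * (n + 1)                      # w[0] plays the role of the bias term
    for _ in range(epochs):
        errors = 0
        for x_j, d_j in D:
            x = [1.0] + list(x_j)            # prepend the constant input x_{j,0} = 1
            y = 1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) > 0 else 0
            if y != d_j:
                errors += 1
                for i in range(n + 1):       # w_i(t+1) = w_i(t) + alpha * (d_j - y) * x_{j,i}
                    w[i] += alpha * (d_j - y) * x[i]
        if errors == 0:                      # converged: every training sample classified correctly
            break
    return w
```

For example, train_perceptron([([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]) learns a separating hyperplane for logical AND, since that training set is linearly separable.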
2009
- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Perceptron#Learning_algorithm
- The learning algorithm is the same across all neurons, therefore everything that follows is applied to a single neuron in isolation. ... Learning is modeled as the weight vector being updated for multiple iterations over all training examples. ...
Sample: (x_i, t_i), with t_i ∈ {-1, +1}
repeat
    if t_i ⟨w_k, x_i⟩ < 0 then   /* error */
        w_{k+1} = w_k + t_i x_i
        k = k + 1
until (error == false)
return k, (w_k, b_k), where k is the number of mistakes
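A sketch of this mistake-driven formulation, assuming targets t_i ∈ {-1, +1}. It departs slightly from the strict "< 0" test above by treating a zero inner product as an error, so a zero-initialised weight vector still gets updated; the bias b_k is handled here as a constant input feature, and all names are illustrative.

```python
# Sketch of the mistake-driven perceptron: update only on errors, i.e. whenever
# t_i * <w_k, x_i> <= 0 (zero counted as an error).  Names are illustrative.

def perceptron_mistake_driven(samples, epochs=100):
    """samples is a list of (x_i, t_i) pairs with t_i in {-1, +1}."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)          # w[0] acts as the bias term b_k
    k = 0                        # number of mistakes (i.e. updates) so far
    for _ in range(epochs):
        error = False
        for x_i, t_i in samples:
            x = [1.0] + list(x_i)                                    # constant feature for the bias
            if t_i * sum(w_j * x_j for w_j, x_j in zip(w, x)) <= 0:  # misclassified
                w = [w_j + t_i * x_j for w_j, x_j in zip(w, x)]      # w_{k+1} = w_k + t_i * x_i
                k += 1
                error = True
        if not error:            # a full pass with no errors: stop, as in "until (error == false)"
            break
    return k, (w[1:], w[0])      # mistake count k, weights w_k, and bias b_k
```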
2007
- (Surdeanu and Ciaramita, 2007) ⇒ Mihai Surdeanu, and Massimiliano Ciaramita. (2007). "Robust Information Extraction with Perceptrons." In: Proceedings of NIST 2007 Automatic Content Extraction Workshop.