Perceptron Training Algorithm
A Perceptron Training Algorithm is a supervised learning algorithm that can train a Perceptron-based Classifier (a binary linear classifier).
- AKA: Perceptron Learning Algorithm.
- Context:
- It can identify a Hyperplane that separates a Linearly Separable set of Training Vectors.
- The Predictive Classifier is h(x) = Sign(f(x)), where f() is the Hyperplane Decision Boundary (see the sketch after this list).
- Example(s): the classic Rosenblatt perceptron update rule quoted in the references below; variants such as an Averaged Perceptron Training Algorithm or a Voted Perceptron Training Algorithm.
- See: Dual Optimization Task, Neural Network Training Algorithm, Linear Model-based Classification Algorithm.
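A minimal sketch of the Predictive Classifier h(x) = Sign(f(x)) mentioned above, assuming a linear decision function f(x) = w·x + b; the names `predict`, `weights`, and `bias` are illustrative and not taken from the cited sources.

```python
# Minimal sketch: h(x) = sign(f(x)) with f(x) = w . x + b as the hyperplane decision function.
# Names are illustrative, not from the cited sources.

def predict(weights, bias, x):
    """Return +1 if x falls on the positive side of the hyperplane, else -1."""
    f = sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias
    return 1 if f >= 0 else -1
```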
References
2011
- (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Perceptron#Learning_algorithm
- Below is an example of a learning algorithm for a single-layer (no hidden layer) perceptron. For multilayer perceptrons, more complicated algorithms such as backpropagation must be used. Alternatively, methods such as the delta rule can be used if the function is non-linear and differentiable, although the one below will work as well.
The learning algorithm we demonstrate is the same across all the output neurons, therefore everything that follows is applied to a single neuron in isolation. We first define some variables:
- [math]\displaystyle{ y = f(\mathbf{z}) \, }[/math] denotes the output from the perceptron for an input vector [math]\displaystyle{ \mathbf{z} }[/math].
- [math]\displaystyle{ b \, }[/math] is the bias term, which in the example below we take to be 0.
- [math]\displaystyle{ D = \{(\mathbf{x}_1,d_1),\dots,(\mathbf{x}_s,d_s)\} \, }[/math] is the training set of [math]\displaystyle{ s }[/math] samples, where:
- [math]\displaystyle{ \mathbf{x}_j }[/math] is the [math]\displaystyle{ n }[/math]-dimensional input vector.
- [math]\displaystyle{ d_j \, }[/math] is the desired output value of the perceptron for that input.
- We show the values of the nodes as follows:
- [math]\displaystyle{ x_{j,i} \, }[/math] is the value of the [math]\displaystyle{ i }[/math]th node of the [math]\displaystyle{ j }[/math]th training input vector.
- [math]\displaystyle{ x_{j,0} = 1 \, }[/math].
- To represent the weights:
- [math]\displaystyle{ w_i \, }[/math] is the [math]\displaystyle{ i }[/math]th value in the weight vector, to be multiplied by the value of the [math]\displaystyle{ i }[/math]th input node.
- An extra dimension, with index [math]\displaystyle{ n+1 }[/math], can be added to all input vectors, with [math]\displaystyle{ x_{j,n+1}=1 \, }[/math], in which case [math]\displaystyle{ w_{n+1} \, }[/math] replaces the bias term.
To show the time-dependence of [math]\displaystyle{ \mathbf{w} }[/math], we use:
- [math]\displaystyle{ w_i(t) \, }[/math] is the weight [math]\displaystyle{ i }[/math] at time [math]\displaystyle{ t }[/math].
- [math]\displaystyle{ \alpha \, }[/math] is the learning rate, where [math]\displaystyle{ 0 \lt \alpha \leq 1 }[/math].
- Too high a learning rate makes the perceptron periodically oscillate around the solution unless additional steps are taken.
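The excerpt above defines the variables but stops short of the update step itself. Below is a sketch, under those definitions (training set D of pairs (x_j, d_j), learning rate α, and the bias absorbed as a constant input x_{j,0} = 1), of the standard single-layer update rule w_i(t+1) = w_i(t) + α(d_j − y_j(t))·x_{j,i}, which is not reproduced in the excerpt. Outputs and targets are assumed to lie in {0, 1}; the function name is illustrative.

```python
# Sketch of the single-layer perceptron update rule, assuming outputs/targets in {0, 1}
# and the bias folded in as a constant input x_{j,0} = 1.  Names are illustrative.

def train_perceptron(D, alpha=0.1, epochs=100):
    """D is a list of (x_j, d_j) pairs; x_j is an n-dimensional list, d_j in {0, 1}."""
    n = len(D[0][0])
    w = [0.0] * (n + 1)                      # w[0] plays the role of the bias term
    for _ in range(epochs):
        errors = 0
        for x_j, d_j in D:
            x = [1.0] + list(x_j)            # prepend the constant input x_{j,0} = 1
            y = 1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) > 0 else 0
            if y != d_j:
                errors += 1
                for i in range(n + 1):       # w_i(t+1) = w_i(t) + alpha * (d_j - y) * x_{j,i}
                    w[i] += alpha * (d_j - y) * x[i]
        if errors == 0:                      # converged: every training sample classified correctly
            break
    return w
```

For example, train_perceptron([([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]) learns a separating hyperplane for logical AND, since that training set is linearly separable.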
2009
- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Perceptron#Learning_algorithm
- The learning algorithm is the same across all neurons, therefore everything that follows is applied to a single neuron in isolation. ... Learning is modeled as the weight vector being updated for multiple iterations over all training examples. ...
Sample: (x_i, t_i), with t_i ∈ {-1, +1}
repeat
    if t_i ⟨w_k, x_i⟩ < 0 then   /* error */
        w_{k+1} = w_k + t_i x_i
        k = k + 1
until (error == false)
return k, (w_k, b_k), where k is the number of mistakes
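A sketch of this mistake-driven formulation, assuming targets t_i ∈ {-1, +1}. It departs slightly from the strict "< 0" test above by treating a zero inner product as an error, so a zero-initialised weight vector still gets updated; the bias b_k is handled here as a constant input feature, and all names are illustrative.

```python
# Sketch of the mistake-driven perceptron: update only on errors, i.e. whenever
# t_i * <w_k, x_i> <= 0 (zero counted as an error).  Names are illustrative.

def perceptron_mistake_driven(samples, epochs=100):
    """samples is a list of (x_i, t_i) pairs with t_i in {-1, +1}."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)          # w[0] acts as the bias term b_k
    k = 0                        # number of mistakes (i.e. updates) so far
    for _ in range(epochs):
        error = False
        for x_i, t_i in samples:
            x = [1.0] + list(x_i)                                    # constant feature for the bias
            if t_i * sum(w_j * x_j for w_j, x_j in zip(w, x)) <= 0:  # misclassified
                w = [w_j + t_i * x_j for w_j, x_j in zip(w, x)]      # w_{k+1} = w_k + t_i * x_i
                k += 1
                error = True
        if not error:            # a full pass with no errors: stop, as in "until (error == false)"
            break
    return k, (w[1:], w[0])      # mistake count k, weights w_k, and bias b_k
```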
2007
- (Surdeanu and Ciaramita, 2007) ⇒ Mihai Surdeanu, and Massimiliano Ciaramita. (2007). "Robust Information Extraction with Perceptrons." In: Proceedings of NIST 2007 Automatic Content Extraction Workshop.