Parametric Rectified Linear Activation Function


A Parametric Rectified Linear Activation Function is a Rectified-based Activation Function that is based on the mathematical function: [math]\displaystyle{ f(x)=\max(0,x)+\alpha \cdot \min(0,x) }[/math], where [math]\displaystyle{ \alpha }[/math] is a Neural Network Learnable Parameter.
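
A minimal NumPy sketch of this function, assuming a fixed value for [math]\displaystyle{ \alpha }[/math] (in practice [math]\displaystyle{ \alpha }[/math] is learned jointly with the other network weights):

    import numpy as np

    def prelu(x, alpha):
        # Elementwise PReLU: identity for x > 0, slope alpha for x <= 0.
        return np.maximum(0, x) + alpha * np.minimum(0, x)

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    print(prelu(x, alpha=0.25))   # negatives scaled by 0.25: [-0.5, -0.125, 0.0, 1.5]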



References

2018a

  • (PyTorch, 2018) ⇒ http://pytorch.org/docs/master/nn.html#prelu Retrieved: 2018-2-18.
    • QUOTE: class torch.nn.PReLU(num_parameters=1, init=0.25)

      Applies element-wise the function [math]\displaystyle{ \mathrm{PReLU}(x)=\max(0,x)+a \cdot \min(0,x) }[/math]. Here “[math]\displaystyle{ a }[/math]” is a learnable parameter. When called without arguments, nn.PReLU() uses a single parameter “[math]\displaystyle{ a }[/math]” across all input channels. If called with nn.PReLU(nChannels), a separate “[math]\displaystyle{ a }[/math]” is used for each input channel.

      Note:

      weight decay should not be used when learning “[math]\displaystyle{ a }[/math]” for good performance.

      Parameters:

      • num_parameters – number of “[math]\displaystyle{ a }[/math]” to learn. Default: 1
      • init – the initial value of “[math]\displaystyle{ a }[/math]”. Default: 0.25
Shape:
  • Input: (N,∗) where ∗ means any number of additional dimensions
  • Output: (N,∗), same shape as the input
Examples:
>>> m = nn.PReLU()
>>> input = autograd.Variable(torch.randn(2))
>>> print(input)
>>> print(m(input))
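
A complementary sketch of the per-channel form described above (nn.PReLU(nChannels)), written against a more recent PyTorch API in which tensors no longer need to be wrapped in autograd.Variable; the channel count and shapes are arbitrary illustrations:

    import torch
    import torch.nn as nn

    m = nn.PReLU(num_parameters=3, init=0.25)   # one learnable "a" per input channel
    x = torch.randn(4, 3, 8, 8)                 # (N, C, H, W) batch with C = 3 channels
    y = m(x)
    print(y.shape)    # torch.Size([4, 3, 8, 8]) -- same shape as the input
    print(m.weight)   # the three learnable coefficients, all initialized to 0.25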

2018b

  • (Chainer, 2018) ⇒ http://docs.chainer.org/en/stable/reference/generated/chainer.functions.prelu.html Retrieved: 2018-2-18.
    • QUOTE: chainer.functions.prelu(x, W)

       Parametric ReLU function.

      It accepts two arguments: an input x and a weight array W, and computes the output as [math]\displaystyle{ \mathrm{PReLU}(x)=\max(x, W \ast x) }[/math], where [math]\displaystyle{ \ast }[/math] is an elementwise multiplication for each sample in the batch.

      When the PReLU function is combined with two-dimensional convolution, the elements of the parameter W are typically shared across the same filter at different pixel positions. To support such usage, this function accepts a parameter array W whose shape matches the leading dimensions of the input array, excluding the batch dimension.

      For example, if [math]\displaystyle{ W }[/math] has the shape [math]\displaystyle{ (2,3,4) }[/math], then [math]\displaystyle{ x }[/math] must have the shape [math]\displaystyle{ (B,2,3,4,S_1,...,S_N) }[/math], where [math]\displaystyle{ B }[/math] is the batch size and the number of trailing dimensions [math]\displaystyle{ N }[/math] is an arbitrary non-negative integer.

      Parameters:

      • x (Variable) – Input variable. Its first dimension is assumed to be the minibatch dimension.
      • W (Variable) – Weight variable.
Returns: Output variable.
Return type: Variable.
See also: PReLU
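
A small usage sketch of the shape convention described above, assuming Chainer is installed; the batch size and array shapes are arbitrary illustrations:

    import numpy as np
    import chainer.functions as F

    x = np.random.randn(5, 2, 3, 4).astype(np.float32)   # batch of B = 5 samples
    W = np.full((2, 3, 4), 0.25, dtype=np.float32)        # one slope per non-batch element
    y = F.prelu(x, W)
    print(y.shape)   # (5, 2, 3, 4) -- same shape as x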

2018c

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Rectifier_(neural_networks)#Leaky_ReLUs Retrieved: 2018-2-4.
    • Leaky ReLUs allow a small, non-zero gradient when the unit is not active.[1] : [math]\displaystyle{ f(x) = \begin{cases} x & \mbox{if } x \gt 0 \\ 0.01x & \mbox{otherwise} \end{cases} }[/math]

      Parametric ReLUs take this idea further by making the coefficient of leakage into a parameter that is learned along with the other neural network parameters.[2] : [math]\displaystyle{ f(x) = \begin{cases} x & \mbox{if } x \gt 0 \\ a x & \mbox{otherwise} \end{cases} }[/math]

      Note that for [math]\displaystyle{ a\leq1 }[/math], this is equivalent to : [math]\displaystyle{ f(x) = \max(x, ax) }[/math] and thus has a relation to "maxout" networks.
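
A quick numerical check of this equivalence, using an arbitrary coefficient [math]\displaystyle{ a \leq 1 }[/math] and an arbitrary grid of inputs:

    import numpy as np

    a = 0.25
    x = np.linspace(-3, 3, 7)
    piecewise = np.where(x > 0, x, a * x)    # the case-by-case definition
    max_form = np.maximum(x, a * x)          # the max(x, ax) form
    print(np.allclose(piecewise, max_form))  # True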

2017

  • (Mate Labs, 2017) ⇒ Mate Labs (Aug 23, 2017). Secret Sauce behind the beauty of Deep Learning: Beginner's Guide to Activation Functions
    • QUOTE: Parametric Rectified Linear Unit (PReLU) — It makes the coefficient of leakage into a parameter that is learned along with the other neural network parameters. Alpha (α) is the coefficient of leakage here.

      For [math]\displaystyle{ \alpha \leq 1, \quad f(x) = \max(x, \alpha x) }[/math]

      Range: [math]\displaystyle{ (-\infty, +\infty) }[/math]

      [math]\displaystyle{ f(\alpha, x) = \begin{cases} \alpha x, & \mbox{for } x \lt 0 \\ x, & \mbox{for } x \geq 0 \end{cases} }[/math]

2015

  • (He et al., 2015) ⇒ Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. (2015). “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.” arXiv:1502.01852.

  1. Maas, Andrew L.; Hannun, Awni Y.; Ng, Andrew Y. (2013). “Rectifier Nonlinearities Improve Neural Network Acoustic Models”.
  2. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”. arXiv:1502.01852 [cs.CV].