Inverse Square Root Linear Unit (ISRLU) Activation Function


An Inverse Square Root Linear Unit (ISRLU) Activation Function is a neuron activation function based on the piecewise function:

[math]\displaystyle{ f(x) = \begin{cases}\frac{x}{\sqrt{1 + \alpha x^2}} & \text{for } x \lt 0\\ x & \text{for } x \ge 0\end{cases} }[/math]

where [math]\displaystyle{ \alpha \gt 0 }[/math] is a hyperparameter that controls how quickly the negative branch saturates: as [math]\displaystyle{ x \to -\infty }[/math], [math]\displaystyle{ f(x) \to -1/\sqrt{\alpha} }[/math].
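
For concreteness, here is a minimal NumPy sketch of ISRLU and its derivative. The function names, the vectorized np.where formulation, and the default α = 1 are illustrative choices, not part of the cited paper:

```python
import numpy as np

def isrlu(x, alpha=1.0):
    # x / sqrt(1 + alpha * x^2) for x < 0; identity for x >= 0.
    return np.where(x < 0, x / np.sqrt(1.0 + alpha * np.square(x)), x)

def isrlu_grad(x, alpha=1.0):
    # (1 / sqrt(1 + alpha * x^2))^3 for x < 0; 1 for x >= 0.
    return np.where(x < 0, 1.0 / np.sqrt(1.0 + alpha * np.square(x)) ** 3, 1.0)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(isrlu(x))       # large negative inputs approach -1/sqrt(alpha) = -1.0
print(isrlu_grad(x))  # derivative is exactly 1 for all x >= 0
```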



References

2018

{| class="wikitable"
! Name !! Equation !! Derivative (with respect to x) !! Range !! Order of continuity !! Monotonic !! Derivative monotonic !! Approximates identity near the origin
|-
| Identity || [math]\displaystyle{ f(x)=x }[/math] || [math]\displaystyle{ f'(x)=1 }[/math] || [math]\displaystyle{ (-\infty,\infty) }[/math] || [math]\displaystyle{ C^\infty }[/math] || Yes || Yes || Yes
|-
| Binary step || [math]\displaystyle{ f(x) = \begin{cases} 0 & \text{for } x \lt 0\\ 1 & \text{for } x \ge 0\end{cases} }[/math] || [math]\displaystyle{ f'(x) = \begin{cases} 0 & \text{for } x \ne 0\\ ? & \text{for } x = 0\end{cases} }[/math] || [math]\displaystyle{ \{0,1\} }[/math] || [math]\displaystyle{ C^{-1} }[/math] || Yes || No || No
|-
| Logistic (a.k.a. Soft step) || [math]\displaystyle{ f(x)=\frac{1}{1+e^{-x}} }[/math] || [math]\displaystyle{ f'(x)=f(x)(1-f(x)) }[/math] || [math]\displaystyle{ (0,1) }[/math] || [math]\displaystyle{ C^\infty }[/math] || Yes || No || No
|-
| (...) || (...) || (...) || (...) || (...) || (...) || (...)
|-
| Inverse square root unit (ISRU)[1] || [math]\displaystyle{ f(x) = \frac{x}{\sqrt{1 + \alpha x^2}} }[/math] || [math]\displaystyle{ f'(x) = \left(\frac{1}{\sqrt{1 + \alpha x^2}}\right)^3 }[/math] || [math]\displaystyle{ \left(-\frac{1}{\sqrt{\alpha}},\frac{1}{\sqrt{\alpha}}\right) }[/math] || [math]\displaystyle{ C^\infty }[/math] || Yes || No || Yes
|-
| (...) || (...) || (...) || (...) || (...) || (...) || (...)
|-
| Inverse square root linear unit (ISRLU) || [math]\displaystyle{ f(x) = \begin{cases} \frac{x}{\sqrt{1 + \alpha x^2}} & \text{for } x \lt 0\\ x & \text{for } x \ge 0\end{cases} }[/math] || [math]\displaystyle{ f'(x) = \begin{cases} \left(\frac{1}{\sqrt{1 + \alpha x^2}}\right)^3 & \text{for } x \lt 0\\ 1 & \text{for } x \ge 0\end{cases} }[/math] || [math]\displaystyle{ \left(-\frac{1}{\sqrt{\alpha}},\infty\right) }[/math] || [math]\displaystyle{ C^2 }[/math] || Yes || Yes || Yes
|-
| Adaptive piecewise linear (APL)[2] || [math]\displaystyle{ f(x) = \max(0,x) + \sum_{s=1}^{S}a_i^s \max(0, -x + b_i^s) }[/math] || [math]\displaystyle{ f'(x) = H(x) - \sum_{s=1}^{S}a_i^s H(-x + b_i^s) }[/math] || [math]\displaystyle{ (-\infty,\infty) }[/math] || [math]\displaystyle{ C^0 }[/math] || No || No || No
|-
| SoftPlus[3] || [math]\displaystyle{ f(x)=\ln(1+e^x) }[/math] || [math]\displaystyle{ f'(x)=\frac{1}{1+e^{-x}} }[/math] || [math]\displaystyle{ (0,\infty) }[/math] || [math]\displaystyle{ C^\infty }[/math] || Yes || Yes || No
|-
| Bent identity || [math]\displaystyle{ f(x)=\frac{\sqrt{x^2 + 1} - 1}{2} + x }[/math] || [math]\displaystyle{ f'(x)=\frac{x}{2\sqrt{x^2 + 1}} + 1 }[/math] || [math]\displaystyle{ (-\infty,\infty) }[/math] || [math]\displaystyle{ C^\infty }[/math] || Yes || Yes || Yes
|-
| SoftExponential[4] || [math]\displaystyle{ f(\alpha,x) = \begin{cases} -\frac{\ln(1-\alpha (x + \alpha))}{\alpha} & \text{for } \alpha \lt 0\\ x & \text{for } \alpha = 0\\ \frac{e^{\alpha x} - 1}{\alpha} + \alpha & \text{for } \alpha \gt 0\end{cases} }[/math] || [math]\displaystyle{ f'(\alpha,x) = \begin{cases} \frac{1}{1-\alpha (\alpha + x)} & \text{for } \alpha \lt 0\\ e^{\alpha x} & \text{for } \alpha \ge 0\end{cases} }[/math] || [math]\displaystyle{ (-\infty,\infty) }[/math] || [math]\displaystyle{ C^\infty }[/math] || Yes || Yes || Depends
|-
| Sinusoid[5] || [math]\displaystyle{ f(x)=\sin(x) }[/math] || [math]\displaystyle{ f'(x)=\cos(x) }[/math] || [math]\displaystyle{ [-1,1] }[/math] || [math]\displaystyle{ C^\infty }[/math] || No || No || Yes
|-
| Sinc || [math]\displaystyle{ f(x)=\begin{cases} 1 & \text{for } x = 0\\ \frac{\sin(x)}{x} & \text{for } x \ne 0\end{cases} }[/math] || [math]\displaystyle{ f'(x)=\begin{cases} 0 & \text{for } x = 0\\ \frac{\cos(x)}{x} - \frac{\sin(x)}{x^2} & \text{for } x \ne 0\end{cases} }[/math] || [math]\displaystyle{ [\approx -0.217234, 1] }[/math] || [math]\displaystyle{ C^\infty }[/math] || No || No || No
|-
| Gaussian || [math]\displaystyle{ f(x)=e^{-x^2} }[/math] || [math]\displaystyle{ f'(x)=-2xe^{-x^2} }[/math] || [math]\displaystyle{ (0,1] }[/math] || [math]\displaystyle{ C^\infty }[/math] || No || No || No
|}

Here, H is the Heaviside step function.

α is a stochastic variable sampled from a uniform distribution at training time and fixed to the expectation value of the distribution at test time.
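
As a quick sanity check on the ISRLU row above, the tabulated closed-form derivative can be compared against a central finite difference. The snippet below is a self-contained sketch; the helper names, grid, and tolerance are illustrative:

```python
import numpy as np

def isrlu(x, alpha=1.0):
    return np.where(x < 0, x / np.sqrt(1.0 + alpha * np.square(x)), x)

def isrlu_grad(x, alpha=1.0):
    return np.where(x < 0, 1.0 / np.sqrt(1.0 + alpha * np.square(x)) ** 3, 1.0)

# Central differences should match the tabulated derivative everywhere,
# including across x = 0, where the two pieces join smoothly.
x = np.linspace(-4.0, 4.0, 81)   # grid includes x = 0
h = 1e-6
fd = (isrlu(x + h) - isrlu(x - h)) / (2.0 * h)
assert np.allclose(fd, isrlu_grad(x), atol=1e-5)
print("max |finite difference - closed form|:", np.max(np.abs(fd - isrlu_grad(x))))
```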



  1. Carlile, Brad; Delamarter, Guy; Kinney, Paul; Marti, Akiko; Whitney, Brian (2017-11-09). "Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs)". arXiv:1710.09967 [cs.LG].
  2. Agostinelli, Forest; Hoffman, Matthew; Sadowski, Peter; Baldi, Pierre (2014-12-21). "Learning Activation Functions to Improve Deep Neural Networks". arXiv:1412.6830 [cs.NE].
  3. Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua (2011). "Deep Sparse Rectifier Neural Networks" (PDF). International Conference on Artificial Intelligence and Statistics.
  4. Godfrey, Luke B.; Gashler, Michael S. (2016-02-03). "A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks". 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (KDIR) 1602: 481–486. arXiv:1602.01321. Bibcode:2016arXiv160201321G.
  5. Gashler, Michael S.; Ashmore, Stephen C. (2014-05-09). "Training Deep Fourier Neural Networks To Fit Time-Series Data". arXiv:1405.2262 [cs.NE].