# Neuron Activation Function

(Redirected from activation function)

## References

### 2018c

• (CS231n, 2018) ⇒ Commonly used activation functions. In: CS231n Convolutional Neural Networks for Visual Recognition Retrieved: 2018-01-28.
• QUOTE: Every activation function (or non-linearity) takes a single number and performs a certain fixed mathematical operation on it. There are several activation functions you may encounter in practice:
• Sigmoid. The sigmoid non-linearity has the mathematical form $\sigma(x)=1/(1+e^{−x})$ and is shown in the image above on the left. As alluded to in the previous section, it takes a real-valued number and “squashes” it into range between 0 and 1. In particular, large negative numbers become 0 and large positive numbers become 1. The sigmoid function has seen frequent use historically since it has a nice interpretation as the firing rate of a neuron: from not firing at all (0) to fully-saturated firing at an assumed maximum frequency (1) (...)
• Tanh. The tanh non-linearity is shown on the image above on the right. It squashes a real-valued number to the range [-1, 1]. Like the sigmoid neuron, its activations saturate, but unlike the sigmoid neuron its output is zero-centered. Therefore, in practice the tanh non-linearity is always preferred to the sigmoid nonlinearity. Also note that the tanh neuron is simply a scaled sigmoid neuron, in particular the following holds: $tanh(x)=2\sigma(2x)−1$ (...)
• ReLU. The Rectified Linear Unit has become very popular in the last few years. It computes the function $f(x)=max(0,x)$. (...)
• Leaky ReLU. Leaky ReLUs are one attempt to fix the “dying ReLU” problem. Instead of the function being zero when $x \lt 0$, a leaky ReLU will instead have a small negative slope (of 0.01, or so). That is, the function computes $f(x)=1(x\lt 0)(αx)+1(x\gt =0)(x)$ where $\alpha$ is a small constant (...)
• Maxout. Other types of units have been proposed that do not have the functional form $f(w^Tx+b)$ where a non-linearity is applied on the dot product between the weights and the data (...)

### 2005

a. linear function,
b. threshold function,
c. sigmoid function.

### 1986

• (Williams, 1986) ⇒ Ronald J. Williams. (1986). “The Logic of Activation Functions.” In: (Rumelhart & McClelland, 1986).
• QUOTE: The notion of logical computation, in some form or other, seems to provide a convenient language for describing the operation of many of the networks we seek to understand. Digital computers are built out such constituents as AND and OR gates. Feature-detecting neurons in biological sensory system are often idealized as signaling the presence or absence of their preferred features by becoming highly active or inactively, respectively. It seems a relatively simple extension of this concept to allow the activity of units in the network to range over some interval rather than over just two values; in this case the activity of a unit is regarded as signaling its degree of conﬁdence that its preferred feature is present, rather than just the presence or absence of this feature. There are several ways one might attempt to formalize this degree-of-conﬁdence notion. For example, if the activation values range over the closed unit interval [0,1], one might treat such an activation value as a conditional probability: alternatively, it might be viewed as a measure of truth in some unit-interval-valued logic, such as fuzzy logic (Zadeh, 1965).