# Adaptive Piecewise Linear Activation Function

An Adaptive Piecewise Linear (APL) Activation Function is a neuron activation function that is based on the piecewise linear function [math]f(x) = \max(0,x) + \sum_{s=1}^{S}a_i^s \max(0, -x + b_i^s)[/math].

**AKA:** APL.

**Context:**
- It can (typically) be used in the activation of Adaptive Piecewise Linear Neurons.

**Example(s):**
- ...

**Counter-Example(s):**
- a Softmax-based Activation Function,
- a Rectified-based Activation Function,
- a Heaviside Step Activation Function,
- a Ramp Function-based Activation Function,
- a Logistic Sigmoid-based Activation Function,
- a Hyperbolic Tangent-based Activation Function,
- a Gaussian-based Activation Function,
- a Softsign Activation Function,
- a Softshrink Activation Function,
- a Bent Identity Activation Function,
- a Maxout Activation Function.

**See:** Artificial Neural Network, Artificial Neuron, Neural Network Topology, Neural Network Layer, Neural Network Learning Rate.

## References

### 2017

- (Mate Labs, 2017) ⇒ Mate Labs (Aug 23, 2017). "Secret Sauce behind the beauty of Deep Learning: Beginners guide to Activation Functions".
- QUOTE:
**Adaptive Piecewise Linear (APL)** Range: [math](-\infty,+\infty)[/math]

[math]f(x) = \max(0,x) + \sum_{s=1}^{S}a_i^s \max(0, -x + b_i^s)[/math]

### 2014

- (Agostinelli et al., 2014) ⇒ Agostinelli, F., Hoffman, M., Sadowski, P., & Baldi, P. (2014). "Learning activation functions to improve deep neural networks". arXiv preprint arXiv:1412.6830.
- QUOTE: Here we define the adaptive piecewise linear (APL) activation unit. Our method formulates the activation function [math]h_i(x)[/math] of an APL unit i as a sum of hinge-shaped functions,
[math]h_i(x) = \max(0,x) + \sum_{s=1}^{S}a_i^s \max(0, -x + b_i^s) \qquad (1)[/math]

The result is a piecewise linear activation function. The number of hinges, [math]S[/math], is a hyperparameter set in advance, while the variables [math]a^s_i , b^s_i[/math] for [math]i \in 1,\cdots, S[/math] are learned using standard gradient descent during training. The [math]a^s_i[/math] variables control the slopes of the linear segments, while the [math]b^s_i[/math] variables determine the locations of the hinges. The number of additional parameters that must be learned when using these APL units is [math]2SM[/math], where [math]M[/math] is the total number of hidden units in the network. This number is small compared to the total number of weights in typical networks. Figure 1 shows example APL functions for [math]S = 1[/math]. Note that unlike maxout, the class of functions that can be learned by a single unit includes non-convex functions.

*Figure 1: Sample activation functions obtained from changing the parameters. Notice that figure b shows that the activation function can also be non-convex. Asymptotically, the activation functions tend to [math]g(x) = x[/math] as [math]x \rightarrow \infty[/math] and [math]g(x) = \alpha x - c[/math] as [math]x \rightarrow -\infty[/math] for some [math]\alpha[/math] and [math]c[/math]. [math]S = 1[/math] for all plots.*
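
The definition quoted above can be illustrated with a minimal NumPy sketch of the forward pass of a single APL unit. This is not the authors' implementation: the function names `apl` and `apl_extra_params` and the parameter values in the usage note are hypothetical, chosen only for illustration, and the parameters are passed in as fixed arrays rather than learned by gradient descent.

```python
import numpy as np


def apl(x, a, b):
    """Evaluate h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s]).

    x : scalar or array of pre-activations for one unit
    a : array of shape (S,), the slope parameters a^s
    b : array of shape (S,), the hinge locations b^s
    (Hypothetical helper; in the paper a and b are learned per unit.)
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # One hinge term per s; result has shape x.shape + (S,).
    hinges = np.maximum(0.0, -x[..., None] + b)
    # Weighted sum over the S hinges, added to the ReLU term.
    return np.maximum(0.0, x) + hinges @ a


def apl_extra_params(S, M):
    """Number of additional learned parameters: 2 * S * M (a and b per unit)."""
    return 2 * S * M
```

For example, with [math]S = 1[/math], `a = [0.5]` and `b = [1.0]`, the call `apl([-3.0, 0.0, 2.0], [0.5], [1.0])` evaluates the unit at three points. For inputs above every hinge location all hinge terms vanish and the output equals [math]x[/math]; for inputs below zero and below every hinge location the output is linear with slope [math]-\sum_s a^s[/math], which is consistent with the asymptotic behaviour described in the figure caption. Choosing a negative slope parameter, such as `a = [-0.5]` with `b = [1.0]`, gives a function that is not convex.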
