# Maxout Activation Function

A Maxout Activation Function is a neuron activation function based on the mathematical function: $f_i(x) = \max_j \left(W^T_{ij} x + b_{ij}\right)$.
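This definition can be sketched in NumPy as follows; the function and parameter names (`maxout`, `num_units`, `pool_size`) are illustrative, not from any library:

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit: f_i(x) = max_j (W[i, j].T @ x + b[i, j]).

    Assumed shapes (illustrative): W is (num_units, pool_size, in_size),
    b is (num_units, pool_size), x is (in_size,).
    """
    # Affine pre-activations for every (unit, piece) pair: shape (num_units, pool_size)
    z = np.einsum('ijk,k->ij', W, x) + b
    # Each output unit takes the maximum over its pool of linear pieces
    return z.max(axis=1)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3, 5))   # 4 units, pool of 3 pieces, 5 inputs
b = rng.normal(size=(4, 3))
x = rng.normal(size=5)
y = maxout(x, W, b)
print(y.shape)  # (4,)
```

Each output unit is thus a piecewise-linear, convex function of the input, the maximum of `pool_size` learned affine functions.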

## References

### 2018a

• (Chainer, 2018) ⇒ http://docs.chainer.org/en/stable/reference/generated/chainer.functions.maxout.html Retrieved: 2018-2-18
• QUOTE: `chainer.functions.maxout(x, pool_size, axis=1)`

It accepts an input tensor x, reshapes the axis dimension (say its size is M * pool_size) into the two dimensions (M, pool_size), and takes the maximum along the pool_size dimension.

Parameters:

• x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. An n-dimensional (n ≥ axis) float array. In general, its first dimension is assumed to be the minibatch dimension. The other dimensions are treated as one concatenated dimension.
• pool_size (int) – The size of the pooling window, i.e. the number of linear pieces over which the maximum is taken.
• axis (int) – The axis dimension to be reshaped. The size of the axis dimension must be M * pool_size.
Returns: Output variable. The shape of the output is the same as that of x, except that the axis dimension is transformed from M * pool_size to M.
Return type: Variable.
Example:

Typically, x is the output of a linear layer or a convolution layer. The following is an example where we use maxout() in combination with a Linear link.

```python
>>> import numpy as np
>>> import chainer.functions as F
>>> import chainer.links as L
>>> in_size, out_size, pool_size = 10, 10, 10
>>> bias = np.arange(out_size * pool_size).astype('f')
>>> l = L.Linear(in_size, out_size * pool_size, initial_bias=bias)
>>> x = np.zeros((1, in_size), 'f')  # prepare data
>>> x = l(x)
>>> y = F.maxout(x, pool_size)
>>> x.shape
(1, 100)
>>> y.shape
(1, 10)
>>> x.reshape((out_size, pool_size)).data
array([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14., 15., 16., 17., 18., 19.],
       [20., 21., 22., 23., 24., 25., 26., 27., 28., 29.],
       [30., 31., 32., 33., 34., 35., 36., 37., 38., 39.],
       [40., 41., 42., 43., 44., 45., 46., 47., 48., 49.],
       [50., 51., 52., 53., 54., 55., 56., 57., 58., 59.],
       [60., 61., 62., 63., 64., 65., 66., 67., 68., 69.],
       [70., 71., 72., 73., 74., 75., 76., 77., 78., 79.],
       [80., 81., 82., 83., 84., 85., 86., 87., 88., 89.],
       [90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]], dtype=float32)
>>> y.data
array([[ 9., 19., 29., 39., 49., 59., 69., 79., 89., 99.]], dtype=float32)
```


### 2018b

| Name | Equation | Derivatives | Range | Order of continuity |
|------|----------|-------------|-------|---------------------|
| Softmax | $f_i(\vec{x}) = \frac{e^{x_i}}{\sum_{j=1}^J e^{x_j}}$ for $i = 1, \ldots, J$ | $\frac{\partial f_i(\vec{x})}{\partial x_j} = f_i(\vec{x})(\delta_{ij} - f_j(\vec{x}))$ | $(0,1)$ | $C^\infty$ |
| Maxout[1] | $f(\vec{x}) = \max_i x_i$ | $\frac{\partial f}{\partial x_j} = \begin{cases} 1 & \text{for } j = \underset{i}{\operatorname{argmax}} \, x_i \\ 0 & \text{for } j \ne \underset{i}{\operatorname{argmax}} \, x_i \end{cases}$ | $(-\infty,\infty)$ | $C^0$ |

Here, $\delta_{ij}$ is the Kronecker delta.
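The piecewise derivative of maxout given in the table (1 at the argmax, 0 elsewhere) can be checked numerically with a central finite difference, valid away from ties, where the function is $C^0$ but not differentiable:

```python
import numpy as np

x = np.array([0.5, 2.0, -1.0])      # pool of pre-activations (example values)
j_star = int(np.argmax(x))          # index of the active piece

# Analytic gradient from the table: 1 at the argmax, 0 elsewhere
grad = np.zeros_like(x)
grad[j_star] = 1.0

# Central finite-difference approximation of d(max x)/dx_j
eps = 1e-6
num = np.array([(np.max(x + eps * e) - np.max(x - eps * e)) / (2 * eps)
                for e in np.eye(len(x))])
print(grad, num)  # [0. 1. 0.] [0. 1. 0.]
```

Only the perturbation of the maximal entry changes the output, so the numerical gradient agrees with the indicator-style derivative in the table.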

### 2013

• (Goodfellow et al., 2013) ⇒ Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. arXiv preprint arXiv:1302.4389.
• ABSTRACT: We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model averaging technique. We empirically verify that the model successfully accomplishes both of these tasks. We use maxout and dropout to demonstrate state of the art classification performance on four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN.

1. Goodfellow, Ian J.; Warde-Farley, David; Mirza, Mehdi; Courville, Aaron; Bengio, Yoshua (2013-02-18). "Maxout Networks". JMLR W&CP 28 (3): 1319–1327. arXiv:1302.4389. Bibcode: 2013arXiv1302.4389G.