Neural Network Pooling Layer


A Neural Network Pooling Layer is a neural network hidden layer that applies a downsampling operation to reduce the spatial dimension of the input data.
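
A minimal NumPy sketch of this operation (illustrative only; the function name max_pool_2x2 is chosen here for the example): downsampling a 4x4 feature map to 2x2 by keeping the maximum of each non-overlapping 2x2 block.

```python
import numpy as np

def max_pool_2x2(x):
    # Group the map into non-overlapping 2x2 blocks and keep each block's
    # maximum, halving both spatial dimensions.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x).shape)  # (2, 2): spatial size reduced from 4x4 to 2x2
```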



References

2018

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling Retrieved: 2018-3-4.
    • QUOTE: Convolutional networks may include local or global pooling layers, which combine the outputs of neuron clusters at one layer into a single neuron in the next layer[1][2]. For example, max pooling uses the maximum value from each of a cluster of neurons at the prior layer[3]. Another example is average pooling, which uses the average value from each of a cluster of neurons at the prior layer.
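
A minimal NumPy sketch of these variants, assuming non-overlapping 2x2 neuron clusters and a made-up 4x4 activation map:

```python
import numpy as np

x = np.array([[1., 3., 0., 2.],
              [4., 2., 1., 1.],
              [0., 0., 5., 6.],
              [1., 2., 7., 8.]])

clusters = x.reshape(2, 2, 2, 2)        # group into non-overlapping 2x2 clusters

local_max = clusters.max(axis=(1, 3))   # max pooling: one maximum per cluster
local_avg = clusters.mean(axis=(1, 3))  # average pooling: one mean per cluster
global_max = x.max()                    # global pooling: whole map -> one value

print(local_max)   # [[4. 2.] [2. 8.]]
print(local_avg)   # [[2.5 1.] [0.75 6.5]]
print(global_max)  # 8.0
```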


2016a

  • (Cox, 2016) ⇒ Jonathan A. Cox (2016). https://www.quora.com/What-is-a-downsampling-layer-in-Convolutional-Neural-Network-CNN
    • QUOTE: The main type of pooling layer in use today is a "max pooling" layer, where the feature map is downsampled in such a way that the maximum feature response within a given sample size is retained. This is in contrast with average pooling, where you basically just lower the resolution by averaging together a group of pixels. Max pooling tends to do better because it is more responsive to kernels that are "lit up" or respond to patterns detected in the data.
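
A toy numeric check of this contrast, using a hypothetical 3x3 window in which a single unit is strongly "lit up":

```python
import numpy as np

# Hypothetical 3x3 window of feature responses: mostly silent, one strong detection.
window = np.array([[0., 0., 0.],
                   [0., 9., 0.],
                   [0., 0., 0.]])

print(window.max())   # 9.0 -- max pooling keeps the strong response intact
print(window.mean())  # 1.0 -- average pooling dilutes it across the window
```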

2016b

  • (CS231n, 2016) ⇒ http://cs231n.github.io/convolutional-networks/#pool Retrieved: 2016
    • QUOTE: It is common to periodically insert a Pooling layer in between successive Conv layers in a ConvNet architecture. Its function is to progressively reduce the spatial size of the representation to reduce the number of parameters and computation in the network, and hence to also control overfitting. The Pooling Layer operates independently on every depth slice of the input and resizes it spatially, using the MAX operation. The most common form is a pooling layer with filters of size 2x2 applied with a stride of 2, which downsamples every depth slice in the input by 2 along both width and height, discarding 75% of the activations. Every MAX operation would in this case be taking a max over 4 numbers (little 2x2 region in some depth slice). The depth dimension remains unchanged. More generally, the pooling layer:
      • Accepts a volume of size W1×H1×D1
      • Requires two hyperparameters:
        • their spatial extent F,
        • the stride S,
      • Produces a volume of size W2×H2×D2, where:
        • W2=(W1−F)/S+1
        • H2=(H1−F)/S+1
        • D2=D1
      • Introduces zero parameters since it computes a fixed function of the input
      • Note that it is not common to use zero-padding for Pooling layers
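
A minimal NumPy sketch of the layer as specified in the list above (a straightforward reading of the CS231n description, not its reference code; the function name pool is chosen here for illustration):

```python
import numpy as np

def pool(volume, F, S, op=np.max):
    """Pool a W1 x H1 x D1 volume with spatial extent F and stride S.

    The output shape follows the formulas above: W2 = (W1 - F)/S + 1,
    H2 = (H1 - F)/S + 1, D2 = D1. The layer introduces zero parameters:
    it computes a fixed function (op) of its input.
    """
    W1, H1, D1 = volume.shape
    W2, H2 = (W1 - F) // S + 1, (H1 - F) // S + 1
    out = np.empty((W2, H2, D1))
    for i in range(W2):
        for j in range(H2):
            # Each depth slice is pooled independently over its F x F window.
            patch = volume[i * S:i * S + F, j * S:j * S + F, :]
            out[i, j, :] = op(patch.reshape(F * F, D1), axis=0)
    return out

x = np.random.rand(8, 8, 3)  # W1 = 8, H1 = 8, D1 = 3
y = pool(x, F=2, S=2)        # the most common setting: 2x2 filters, stride 2
print(y.shape)               # (4, 4, 3): width and height halved, depth unchanged
```

With F=2 and S=2, each output value is a max over 4 numbers and 75% of the activations are discarded, matching the quoted description.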



  1. Ciresan, Dan; Meier, Ueli; Masci, Jonathan; Gambardella, Luca M.; Schmidhuber, Jürgen (2011). "Flexible, High Performance Convolutional Neural Networks for Image Classification" (PDF). Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Two. 2: 1237–1242. Retrieved 17 November 2013.
  2. Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks" (PDF). Advances in Neural Information Processing Systems 25. Retrieved 17 November 2013.
  3. Ciresan, Dan; Meier, Ueli; Schmidhuber, Jürgen (June 2012). "Multi-column deep neural networks for image classification". 2012 IEEE Conference on Computer Vision and Pattern Recognition. New York, NY: Institute of Electrical and Electronics Engineers (IEEE): 3642–3649. arXiv:1202.2745v1. doi:10.1109/CVPR.2012.6248110. ISBN 978-1-4673-1226-4. OCLC 812295155. Retrieved 2013-12-09.