Residual Neural Network (ResNet) Architecture


A Residual Neural Network (ResNet) Architecture is a deep neural network architecture that uses batch normalization and consists of residual units with skip connections.



References

2021

  • (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Residual_neural_network Retrieved:2021-1-24.
    • A residual neural network (ResNet) is an artificial neural network (ANN) of a kind that builds on constructs known from pyramidal cells in the cerebral cortex. Residual neural networks do this by utilizing skip connections, or shortcuts, to jump over some layers. Typical ResNet models are implemented with double- or triple-layer skips that contain nonlinearities (ReLU) and batch normalization in between. An additional weight matrix may be used to learn the skip weights; these models are known as HighwayNets. Models with several parallel skips are referred to as DenseNets. In the context of residual neural networks, a non-residual network may be described as a plain network.

      One motivation for skipping over layers is to avoid the problem of vanishing gradients, by reusing activations from a previous layer until the adjacent layer learns its weights. During training, the weights adapt to mute the upstream layer, and amplify the previously-skipped layer. In the simplest case, only the weights for the adjacent layer's connection are adapted, with no explicit weights for the upstream layer. This works best when a single nonlinear layer is stepped over, or when the intermediate layers are all linear. If not, then an explicit weight matrix should be learned for the skipped connection (a HighwayNet should be used).

      Skipping effectively simplifies the network, using fewer layers in the initial training stages. This speeds learning by reducing the impact of vanishing gradients, as there are fewer layers to propagate through. The network then gradually restores the skipped layers as it learns the feature space. Towards the end of training, when all layers are expanded, it stays closer to the manifold and thus learns faster. A neural network without residual parts explores more of the feature space. This makes it more vulnerable to perturbations that cause it to leave the manifold, and necessitates extra training data to recover.
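
The distinction drawn above between a plain identity skip and a HighwayNet-style gated skip can be made concrete with a small sketch. The following PyTorch code is purely illustrative; the framework choice, class names, and the use of fully connected layers are assumptions, not something specified in the quoted passage:

```python
import torch
import torch.nn as nn

class IdentitySkip(nn.Module):
    """Plain residual unit: y = x + F(x); the shortcut carries x unchanged."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)

class HighwaySkip(nn.Module):
    """Highway-style unit: a learned gate t(x) weights the skip path."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x):
        t = self.gate(x)                     # learned skip weights
        return t * self.f(x) + (1 - t) * x   # gated mix of transform and shortcut
```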


2020

The output of the previous layer is added to the output of the layer after it in the residual block. The hop or skip could be 1, 2, or even 3 layers. When adding, the dimensions of $x$ may differ from those of $F(x)$, because the convolutions can reduce the spatial dimensions; in that case an additional 1 × 1 convolution layer is used to change the dimensions of $x$ to match.
A residual block has a 3 × 3 convolution layer followed by a batch normalization layer and a ReLU activation function. This is continued by another 3 × 3 convolution layer and a batch normalization layer. The skip connection bypasses both of these layers, and its output is added just before the final ReLU activation function. Such residual blocks are repeated to form a residual network.
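
A minimal sketch of the block just described (two 3 × 3 convolutions, each followed by batch normalization, the skip added before the final ReLU, and a 1 × 1 convolution on the shortcut when dimensions change) is given below. PyTorch is one possible realization; the class name, channel counts, and stride handling are assumptions:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 convolution on the shortcut when x and F(x) differ in shape
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)   # skip connection adds x before the final ReLU
        return self.relu(out)
```

For example, ResidualBlock(64, 128, stride=2) maps a 64-channel input to a 128-channel output at half the spatial resolution, with the 1 × 1 shortcut matching the new shape so the addition is well defined.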

2019

2019 DeepResidualNeuralNetworksforAu Fig1.png
Figure 1: Model architecture for the Spec-ResNet model. The detailed structure of the residual blocks is shown in Figure 2.

2019 DeepResidualNeuralNetworksforAur Fig2.png
Figure 2: Detailed architecture of the convolution block with residual connection.

2018

2018 MultiScaleResidualNetworkforIma Fig3.png
Figure 3: The structure of multi-scale residual block (MSRB).

2016a

2016 DeepResidualLearningforImageRec Fig2.png
Figure 2: Residual learning: a building block.

2016b

2016 IdentityMappingsinDeepResidualN Fig2A.png
2016 IdentityMappingsinDeepResidualN Fig2B.png
2016 IdentityMappingsinDeepResidualN Fig2C.png
Figure 2: Various types of shortcut connections used in Table 1. The grey arrows indicate the easiest paths for the information to propagate. The shortcut connections in (b-f) are impeded by different components. For simplifying illustrations we do not display the BN layers, which are adopted right after the weight layers for all units here.

2016c

[math]\displaystyle{ \mathbf{x}_{l+1}=\mathbf{x}_{l}+\mathcal{F}\left(\mathbf{x}_{l}, \mathcal{W}_{l}\right) }[/math] (1)
where $\mathbf{x}_{l}$ and $\mathbf{x}_{l+1}$ are the input and output of the $l$-th unit in the network, $\mathcal{F}$ is a residual function, and $\mathcal{W}_{l}$ are the parameters of the block. A residual network consists of sequentially stacked residual blocks.
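
Unrolling equation (1) from unit $l$ to any deeper unit $L > l$ gives

[math]\displaystyle{ \mathbf{x}_{L}=\mathbf{x}_{l}+\sum_{i=l}^{L-1} \mathcal{F}\left(\mathbf{x}_{i}, \mathcal{W}_{i}\right) }[/math]

so every shallower representation $\mathbf{x}_{l}$ is carried additively to every deeper unit, and gradients can flow back through the identity term without passing through the stacked weight layers.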

2016 WideResidualNetwork Fig1.png
Figure 1: Various residual blocks used in the paper. Batch normalization and ReLU precede each convolution (omitted for clarity).
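
The ordering noted in the caption, with batch normalization and ReLU placed before each convolution, corresponds to a "pre-activation" residual unit. A minimal sketch, again in PyTorch and with the class name and channel count as assumptions, is:

```python
import torch.nn as nn

class PreActBlock(nn.Module):
    """Residual block where BN and ReLU precede each convolution."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        return x + self.body(x)  # identity shortcut stays unimpeded
```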