2017 DenselyConnectedConvolutionalNe

Subject Headings: DenseNet; Dense Block; Deep Convolutional Neural Network.

Notes

Cited By

Quotes

Author Keywords

Abstract

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections (one between each layer and its subsequent layer), our network has L (L + 1) / 2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
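
The $\frac{L(L+1)}{2}$ count can be checked directly (a short derivation, not part of the abstract itself): the $\ell$-th layer receives one direct connection from each of the $\ell$ feature-map sources that precede it (the network input plus layers $1, \dots, \ell - 1$), so summing over all $L$ layers gives

$\displaystyle \sum_{\ell=1}^{L} \ell = \frac{L(L+1)}{2}$ direct connections.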

1. Introduction

Convolutional neural networks (CNNs) have become the dominant machine learning approach for visual object recognition. Although they were originally introduced over 20 years ago (LeCun et al., 1989), improvements in computer hardware and network structure have enabled the training of truly deep CNNs only recently. The original LeNet5 (LeCun et al., 1998) consisted of 5 layers, VGG featured 19 (Simonyan & Zisserman, 2015), and only last year Highway Networks (Srivastava et al., 2015) and Residual Networks (ResNets) (He et al., 2016) have surpassed the 100-layer barrier.

As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and “wash out” by the time it reaches the end (or beginning) of the network. Many recent publications address this or related problems. ResNets (He et al., 2016) and Highway Networks (Srivastava et al., 2015) bypass signal from one layer to the next via identity connections. Stochastic depth (Huang et al., 2016) shortens ResNets by randomly dropping layers during training to allow better information and gradient flow. FractalNets (Larsson et al., 2016) repeatedly combine several parallel layer sequences with different numbers of convolutional blocks to obtain a large nominal depth, while maintaining many short paths in the network. Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.

In this paper, we propose an architecture that distills this insight into a simple connectivity pattern: to ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. Figure 1 illustrates this layout schematically. Crucially, in contrast to ResNets, we never combine features through summation before they are passed into a layer; instead, we combine features by concatenating them. Hence, the $\ell$-th layer has $\ell$ inputs, consisting of the feature-maps of all preceding convolutional blocks. Its own feature-maps are passed on to all $L - \ell$ subsequent layers. This introduces $\frac{L (L + 1)}{2}$ connections in an $L$-layer network, instead of just $L$, as in traditional architectures. Because of its dense connectivity pattern, we refer to our approach as Dense Convolutional Network (DenseNet).
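
In the notation of Section 3 of the paper (of which only the heading is quoted below), this concatenation rule is written as

$x_{\ell} = H_{\ell}\left(\left[x_0, x_1, \ldots, x_{\ell-1}\right]\right)$

where $[x_0, x_1, \ldots, x_{\ell-1}]$ denotes the concatenation of the feature-maps produced by the block input and layers $1, \ldots, \ell-1$, and $H_{\ell}(\cdot)$ is the layer's composite function of operations such as batch normalization, ReLU, and convolution.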

2017 DenselyConnectedConvolutionalNe Fig1.png
Figure 1: A 5-layer dense block with a growth rate of $k = 4$. Each layer takes all preceding feature-maps as input.
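
For concreteness, a minimal PyTorch-style sketch of such a dense block (an illustration of the concatenation pattern with the caption's setting of 5 layers and $k = 4$, not the authors' released implementation; the input channel count and spatial size are made up):

```python
import torch
import torch.nn as nn


class DenseLayer(nn.Module):
    """One layer of a dense block: BN -> ReLU -> 3x3 Conv producing k feature-maps."""

    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.norm(x)))


class DenseBlock(nn.Module):
    """Each layer receives the concatenation of all preceding feature-maps."""

    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList([
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate everything produced so far along the channel axis.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)


# Figure 1's setting: 5 layers with growth rate k = 4 (16 input channels are illustrative).
block = DenseBlock(num_layers=5, in_channels=16, growth_rate=4)
out = block(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 36, 32, 32]): 16 + 5 * 4 channels
```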

(...)

2. Related Work

3. DenseNets

2017 DenselyConnectedConvolutionalNe Fig2.png
Figure 2: A deep DenseNet with three dense blocks. The layers between two adjacent blocks are referred to as transition layers and change feature-map sizes via convolution and pooling.
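
A companion sketch of such a transition layer, assuming the batch normalization, 1x1 convolution, and 2x2 average pooling composition the paper describes (halving the channel count mirrors the DenseNet-BC compression factor $\theta = 0.5$; exact sizes are illustrative):

```python
import torch.nn as nn


class Transition(nn.Module):
    """Between two dense blocks: reduce the number of feature-maps with a 1x1
    convolution and halve the spatial resolution with 2x2 average pooling."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(self.norm(x)))


# e.g. compress the 36-channel output of the block sketched above by half.
trans = Transition(in_channels=36, out_channels=18)
```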

4. Experiments

2017 DenselyConnectedConvolutionalNe Fig3.png
Figure 3: Comparison of the DenseNets and ResNets top-1 error rates (single-crop testing) on the ImageNet validation dataset as a function of learned parameters (left) and FLOPs during test-time (right).
2017 DenselyConnectedConvolutionalNe Fig4a.png
2017 DenselyConnectedConvolutionalNe Fig4b.png
Figure 4: Left: Comparison of the parameter efficiency on C10+ between DenseNet variations. Middle: Comparison of the parameter efficiency between DenseNet-BC and (pre-activation) ResNets. DenseNet-BC requires about 1/3 of the parameters as ResNet to achieve comparable accuracy. Right: Training and testing curves of the 1001-layer pre-activation ResNet (He et al., 2016) with more than 10M parameters and a 100-layer DenseNet with only 0.8M parameters.

5. Discussion

2017 DenselyConnectedConvolutionalNe Fig5.png
Figure 5: The average absolute filter weights of convolutional layers in a trained DenseNet. The color of pixel $(s, \ell)$ encodes the average L1 norm (normalized by number of input feature-maps) of the weights connecting convolutional layer $s$ to $\ell$ within a dense block. Three columns highlighted by black rectangles correspond to two transition layers and the classification layer. The first row encodes weights connected to the input layer of the dense block.
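
One plausible way to compute a single pixel of such a heat-map from trained weights is sketched below; the grouping of input channels by source layer is an assumption matching the dense-block sketch earlier (block input channels first, then $k$ channels per preceding layer), and the normalized L1 norm is read here as the mean absolute weight over that channel group:

```python
import torch


def connection_strength(weight, source_slice):
    """Mean absolute weight over the input channels contributed by one source
    layer: one Figure-5-style heat-map entry for a (source, target) pair.

    weight:       conv weight of shape (out_channels, in_channels, kH, kW)
    source_slice: slice along dim 1 selecting that source's channels
    """
    return weight[:, source_slice, :, :].abs().mean().item()


# Illustrative shapes only: a target layer with growth rate k = 4 whose input is
# 16 block-input channels followed by 4 channels from each of 3 earlier layers.
w = torch.randn(4, 28, 3, 3)
strength_from_input = connection_strength(w, slice(0, 16))    # source s = 0
strength_from_layer2 = connection_strength(w, slice(20, 24))  # source s = 2
```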

6. Conclusion

We proposed a new convolutional network architecture, which we refer to as Dense Convolutional Network (DenseNet). It introduces direct connections between any two layers with the same feature-map size. We showed that DenseNets scale naturally to hundreds of layers, while exhibiting no optimization difficulties. In our experiments, DenseNets tend to yield consistent improvement in accuracy with a growing number of parameters, without any signs of performance degradation or overfitting. Under multiple settings, they achieved state-of-the-art results across several highly competitive datasets. Moreover, DenseNets require substantially fewer parameters and less computation to achieve state-of-the-art performance. Because we adopted hyperparameter settings optimized for residual networks in our study, we believe that further gains in accuracy of DenseNets may be obtained by more detailed tuning of hyperparameters and learning rate schedules.

Whilst following a simple connectivity rule, DenseNets naturally integrate the properties of identity mappings, deep supervision, and diversified depth. They allow feature reuse throughout the networks and can consequently learn more compact and, according to our experiments, more accurate models. Because of their compact internal representations and reduced feature redundancy, DenseNets may be good feature extractors for various computer vision tasks that build on convolutional features (e.g., Gardner et al., 2015; Gatys et al., 2015). We plan to study such feature transfer with DenseNets in future work.

Acknowledgements

The authors are supported in part by the III-1618134, III-1526012, and IIS-1149882 grants from the National Science Foundation, and by the Bill and Melinda Gates Foundation. Gao Huang is supported by the International Postdoctoral Exchange Fellowship Program of the China Postdoctoral Council (No. 20150015). Zhuang Liu is supported by the National Basic Research Program of China Grants 2011CBA00300 and 2011CBA00301, and the National Natural Science Foundation of China Grant 61361136003. We also thank Daniel Sedra, Geoff Pleiss and Yu Sun for many insightful discussions.

References

BibTeX

@inproceedings{2017_DenselyConnectedConvolutionalNe,
  author    = {Gao Huang and
               Zhuang Liu and
               Laurens van der Maaten and
               Kilian Q. Weinberger},
  title     = {Densely Connected Convolutional Networks},
  booktitle = {Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)},
  pages     = {2261--2269},
  publisher = {IEEE Computer Society},
  year      = {2017},
  url       = {https://doi.org/10.1109/CVPR.2017.243},
  doi       = {10.1109/CVPR.2017.243},
}


Author: Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger, Geoff Pleiss, Danlu Chen, Tongcheng Li
Title: Densely Connected Convolutional Networks
DOI: 10.1109/CVPR.2017.243
Year: 2017