Neural Network Pruning Algorithm


A Neural Network Pruning Algorithm is a model pruning algorithm that can be applied to an artificial neural network, typically by removing low-importance weights, connections, filters, or neurons to reduce the network's size and computational cost while largely preserving its accuracy.



References

2019a

2019b

2018

  • (Huang et al., 2018) ⇒ Qiangui Huang, Kevin Zhou, Suya You, and Ulrich Neumann. (2018). “Learning to Prune Filters in Convolutional Neural Networks.” In: The Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV-2018).
    • ABSTRACT: Many state-of-the-art computer vision algorithms use large scale convolutional neural networks (CNNs) as basic building blocks. These CNNs are known for their huge number of parameters, high redundancy in weights, and tremendous computing resource consumption. This paper presents a learning algorithm to simplify and speed up these CNNs. Specifically, we introduce a “try-and-learn” algorithm to train pruning agents that remove unnecessary CNN filters in a data-driven way. With the help of a novel reward function, our agents remove a significant number of filters in CNNs while maintaining performance at a desired level. Moreover, this method provides an easy control of the tradeoff between network performance and its scale. Performance of our algorithm is validated with comprehensive pruning experiments on several popular CNNs for visual recognition and semantic segmentation tasks.
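
    The agent-based “try-and-learn” training itself is not reproduced here; the sketch below is a minimal illustration (assuming PyTorch, with an L1-norm keep criterion standing in for the learned agent's keep/drop decisions) of the structural step of removing filters from a convolutional layer.

        import torch
        import torch.nn as nn

        def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
            # Keep only the filters with the largest L1 norms (an illustrative
            # stand-in for the learned pruning agent's binary keep/drop mask).
            with torch.no_grad():
                norms = conv.weight.abs().sum(dim=(1, 2, 3))  # one norm per output filter
                n_keep = max(1, int(keep_ratio * conv.out_channels))
                keep = torch.argsort(norms, descending=True)[:n_keep].sort().values

                pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                                   stride=conv.stride, padding=conv.padding,
                                   bias=conv.bias is not None)
                pruned.weight.copy_(conv.weight[keep])
                if conv.bias is not None:
                    pruned.bias.copy_(conv.bias[keep])
            return pruned

    Any downstream layer that consumes the pruned layer's output would also need its input channels adjusted, and the network is then fine-tuned to recover the desired performance level.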

2017

  • (Zhu & Gupta, 2017) ⇒ Michael Zhu, and Suyog Gupta. (2017). “To Prune, Or Not to Prune: Exploring the Efficacy of Pruning for Model Compression.” In: Proceedings of ICLR 2018 (ICLR-2018).
    • ABSTRACT: Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the cost of only a marginal loss in accuracy and achieve a sizable reduction in model size. This hints at the possibility that the baseline models in these experiments are perhaps severely over-parameterized at the outset and a viable alternative for model compression might be to simply reduce the number of hidden units while maintaining the model's dense connection structure, exposing a similar trade-off in model size and accuracy. We investigate these two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning and can be seamlessly incorporated within the training process. We compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint. Across a broad range of neural network architectures (deep CNNs, stacked LSTM, and seq2seq LSTM models), we find large-sparse models to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy.
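
    The gradual pruning technique ramps the target sparsity from an initial to a final value over a number of pruning steps; the sketch below assumes a cubic ramp of this kind (function and variable names are illustrative, not the paper's API).

        def sparsity_at_step(step, s_initial, s_final, begin_step, end_step):
            # Cubic ramp from s_initial to s_final between begin_step and end_step;
            # sparsity rises quickly at first and flattens out near the end.
            if step <= begin_step:
                return s_initial
            if step >= end_step:
                return s_final
            progress = (step - begin_step) / (end_step - begin_step)
            return s_final + (s_initial - s_final) * (1.0 - progress) ** 3

        # e.g. halfway through a 0% -> 90% schedule:
        # sparsity_at_step(5000, 0.0, 0.9, 0, 10000) == 0.7875

    At each pruning step the smallest-magnitude weights are masked until the current target sparsity is reached, and training continues with the mask in place.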

2015a

  • (Han et al., 2015) ⇒ Song Han, Jeff Pool, John Tran, and William Dally. (2015). “Learning Both Weights and Connections for Efficient Neural Network.” In: Advances in Neural Information Processing Systems, pp. 1135–1143.
    • ABSTRACT: Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9×, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG-16 found that the total number of parameters can be reduced by 13×, from 138 million to 10.3 million, again with no loss of accuracy.
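
    A minimal sketch of the three-step idea (train, prune by magnitude, retrain), assuming PyTorch; the top-k cutoff used here is a stand-in for the paper's per-layer threshold derived from a quality parameter times the layer's weight standard deviation.

        import torch

        def magnitude_mask(weights: torch.Tensor, sparsity: float) -> torch.Tensor:
            # 0/1 mask that zeroes the smallest-magnitude weights of one layer.
            k = int(sparsity * weights.numel())
            if k == 0:
                return torch.ones_like(weights)
            threshold = weights.abs().flatten().kthvalue(k).values
            return (weights.abs() > threshold).float()

        # Step 1: train the dense network as usual.
        # Step 2: prune the unimportant connections, e.g.
        #   mask = magnitude_mask(layer.weight.data, sparsity=0.9)
        #   layer.weight.data.mul_(mask)
        # Step 3: retrain, re-applying the mask after each optimizer step so the
        #   pruned connections stay at zero while the surviving weights are fine-tuned.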

2015b

  • (Hinton et al., 2015) ⇒ Geoffrey E. Hinton, Oriol Vinyals, and Jeff Dean. (2015). “Distilling the Knowledge in a Neural Network.” In: NIPS Deep Learning and Representation Learning Workshop (2015).
    • ABSTRACT: A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
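
    The soft-target idea can be summarized as a loss that blends the teacher's temperature-softened predictions with the usual hard-label term; the sketch below assumes PyTorch, and the temperature and mixing weight are illustrative defaults rather than values from the paper.

        import torch
        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, labels,
                              temperature=2.0, alpha=0.5):
            # KL divergence between temperature-softened student and teacher
            # distributions, scaled by T^2 so its gradient magnitude stays
            # comparable, plus the ordinary cross-entropy on the hard labels.
            soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
            log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
            soft_loss = F.kl_div(log_soft_student, soft_teacher,
                                 reduction="batchmean") * temperature ** 2
            hard_loss = F.cross_entropy(student_logits, labels)
            return alpha * soft_loss + (1.0 - alpha) * hard_loss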