Pruning Deep Neural Networks



A required composition for effectiveness


Imagine yourself as a gardener. You own a big garden with trees, various plants, and bushes growing around the corner. Since it is your own garden, it has a boundary: a limited plot of land. While watering the trees and plants every day, you notice that some branches are extending toward your neighbor’s property. So you decide to cut the branches that grow beyond your boundary. This process is known as pruning. According to Wikipedia, “Pruning is a horticultural and silvicultural practice involving the selective removal of certain parts of a plant, such as branches, buds, or roots.” Through pruning, we keep what is necessary and effective without the plant losing its functionality.


The same logic can be applied to a deep neural network. A deep neural network contains a large number of parameters, often because the data is complicated and, to avoid underfitting, we design a network with many layers. Such models may perform well on a variety of tasks, but they can be computationally expensive and require a lot of storage. Moreover, some of their parameters may turn out to be redundant and contribute little to the output. The idea of pruning a deep neural network is to remove those parameters.


How do you prune a neural network?

Pruning is usually done iteratively, to avoid removing neurons the network needs. Because a neural network is largely a black box, removing a little at a time also ensures that an important part of it is not lost. The first step is to determine which neurons are important and which aren’t. Next, the least important neuron is removed, and the network is fine-tuned. At this point, a decision can be made to continue pruning or to stop.
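To make “removing a neuron” concrete, here is a minimal NumPy sketch of one removal step in a small two-layer network. The helper names (`forward`, `remove_hidden_neuron`) are hypothetical, and scoring a neuron by the L1 norm of its outgoing weights is an assumption made for illustration; any importance criterion could be substituted. Removing hidden neuron k means deleting its incoming weights, its bias, and its outgoing weights, so the layer genuinely shrinks.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Two-layer MLP: ReLU hidden layer followed by a linear output layer."""
    h = np.maximum(0.0, x @ W1 + b1)   # hidden activations, shape (batch, hidden)
    return h @ W2 + b2                 # outputs, shape (batch, out)

def remove_hidden_neuron(W1, b1, W2, k):
    """Delete hidden neuron k: its incoming weights (column k of W1),
    its bias b1[k], and its outgoing weights (row k of W2)."""
    return np.delete(W1, k, axis=1), np.delete(b1, k), np.delete(W2, k, axis=0)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 3)), rng.normal(size=3)
W2[2] = 0.0                       # hidden neuron 2 no longer affects the output

x = rng.normal(size=(8, 4))
scores = np.abs(W2).sum(axis=1)   # assumed criterion: L1 norm of outgoing weights
k = int(np.argmin(scores))        # the least important neuron
W1p, b1p, W2p = remove_hidden_neuron(W1, b1, W2, k)

print(k, W1p.shape, W2p.shape)    # 2 (4, 4) (4, 3)
print(np.allclose(forward(x, W1, b1, W2, b2), forward(x, W1p, b1p, W2p, b2)))  # True
```

Because the removed neuron contributed nothing, the smaller network produces the same outputs. In practice the weakest neuron contributes a little, which is why each removal is followed by fine-tuning.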


Methods of pruning the network

A neural network is a mathematically heavy model because of the large number of mathematical operations it performs.


Pruning comes in various forms, and which method to apply depends on what the developer needs from the result. In some cases speed matters most; in others, the storage footprint must be reduced.
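The speed-versus-storage distinction can be illustrated with a small NumPy sketch. This is our own illustration with an arbitrary random layer and arbitrary pruning amounts. Zeroing individual small weights leaves the layer’s shape unchanged, so it mainly saves storage when only the nonzeros are kept in a sparse format. Removing whole neurons shrinks the weight matrix, which reduces the work a dense matrix multiply has to do.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))   # a dense layer: 256 inputs feeding 128 neurons

# Weight-level pruning: zero the 90% of weights with the smallest magnitude.
# The shape is unchanged, so a dense matmul costs the same; the saving is
# in storage, if only the nonzero entries are stored.
threshold = np.quantile(np.abs(W), 0.9)
W_sparse = np.where(np.abs(W) >= threshold, W, 0.0)

# Neuron-level pruning: drop the 64 neurons (columns) with the smallest L2 norm.
# The matrix itself shrinks, so a dense matmul does half the work.
keep = np.sort(np.argsort(np.linalg.norm(W, axis=0))[64:])
W_small = W[:, keep]

print(W_sparse.shape, np.count_nonzero(W_sparse))  # same shape, roughly 10% nonzero
print(W_small.shape)                                # (256, 64)
```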


One method is to rank neurons by their contribution. Contribution can be measured with the L1 or L2 norm of a neuron’s weights, the mean of its weights, its mean activation, the number of times it was nonzero on a validation set, and other criteria. If you can rank the neurons in the network by how much they contribute, you can then remove the low-ranking ones, resulting in a smaller and faster network. Pruning too much, however, can lead to a drop in accuracy, so we have to be careful.
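The ranking criteria listed above can be sketched for a single fully connected layer as follows. This is a minimal NumPy illustration, not a reference implementation. The function name `rank_neurons` is hypothetical, and using the absolute value of the mean weight for the “mean of weights” criterion is our assumption. The activation-based criteria expect post-ReLU outputs collected on a validation set.

```python
import numpy as np

def rank_neurons(W, activations, criterion="l1"):
    """Return neuron indices sorted from least to most important (hypothetical helper).

    W: incoming weights, shape (in_features, n_neurons)
    activations: post-ReLU outputs on a validation set, shape (n_samples, n_neurons)
    """
    if criterion == "l1":
        scores = np.abs(W).sum(axis=0)
    elif criterion == "l2":
        scores = np.linalg.norm(W, axis=0)
    elif criterion == "mean_weight":          # assumption: magnitude of the mean weight
        scores = np.abs(W.mean(axis=0))
    elif criterion == "mean_activation":
        scores = activations.mean(axis=0)
    elif criterion == "nonzero_count":        # how often the neuron fired
        scores = (activations > 0).sum(axis=0)
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return np.argsort(scores, kind="stable")

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 6))
W[:, 3] *= 0.01                       # neuron 3 has tiny incoming weights
X_val = rng.normal(size=(200, 10))
acts = np.maximum(0.0, X_val @ W)     # ReLU activations on a validation set

order = rank_neurons(W, acts, "l1")
print(order[0])                       # 3
keep = np.sort(order[2:])             # remove the two lowest-ranked neurons
W_pruned = W[:, keep]                 # the next layer's matching rows must be dropped too
```

Different criteria can disagree about the middle of the ranking, which is one reason pruning is done in small steps with fine-tuning in between.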


In a paper published by NVIDIA in 2017 [1], the authors interleave greedy criteria-based pruning with fine-tuning by backpropagation, a computationally efficient procedure that maintains good generalization in the pruned network. They also propose a new criterion, based on a Taylor expansion, that approximates the change in the cost function induced by pruning network parameters.
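The Taylor criterion can be sketched in NumPy as follows. This is our reading of [1], not the authors’ code. The idea is that a first-order Taylor expansion of the cost around an activation h gives |ΔC| ≈ |(∂C/∂h) · h| when h is set to zero. For each feature map, the score is the absolute value of the average of activation times gradient over the map’s spatial positions. It is computed per example, averaged over the batch, and then L2-normalized within the layer. We assume the activations and the gradients of the cost with respect to them have already been captured, for example with framework hooks. The toy linear cost in the usage lines is hypothetical; for a linear cost the first-order approximation is exact.

```python
import numpy as np

def taylor_criterion(activations, grads):
    """First-order Taylor estimate of |change in cost| when each channel is zeroed.

    activations, grads: arrays of shape (batch, channels, height, width) holding a
    layer's feature maps and the gradients of the cost with respect to them.
    """
    # Average gradient * activation over each map's spatial positions, per example.
    per_example = np.abs((activations * grads).mean(axis=(2, 3)))  # (batch, channels)
    theta = per_example.mean(axis=0)                                 # (channels,)
    # Layer-wise L2 normalisation so that scores are comparable across layers.
    return theta / np.sqrt((theta ** 2).sum())

rng = np.random.default_rng(0)
z = rng.normal(size=(1, 4, 3, 3))             # one example, four 3x3 feature maps
v = np.array([2.0, 0.0, -1.0, 0.5])           # toy cost: C = sum_c v[c] * sum(z[:, c])
g = np.broadcast_to(v[None, :, None, None], z.shape)   # dC/dz for this linear cost

scores = taylor_criterion(z, g)
true_change = np.abs(v * z[0].sum(axis=(1, 2)))        # exact |C(z) - C(z, channel c zeroed)|
print(np.array_equal(np.argsort(scores), np.argsort(true_change)))  # True
```

The channel with the lowest score (here channel 1, whose weight in the cost is zero) is the first candidate for pruning.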

The proposed method for pruning consists of the following steps:

  1. Fine-tune the network until convergence on the target task;

  2. Alternate iterations of pruning and further fine-tuning;

  3. Stop pruning after reaching the target trade-off between accuracy and pruning objective
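The three steps can be sketched on a deliberately tiny problem. In this hypothetical example, the “network” is a linear model whose input features play the role of prunable units, and “fine-tuning” is a least-squares refit. Importance is the absolute weight times the feature’s standard deviation, and the pruning objective is simply to keep as few units as possible. The stopping rule lets validation error rise by at most a chosen fraction (`max_increase`) over the unpruned model. None of these choices come from [1], which prunes convolutional feature maps and ranks them with the Taylor criterion.

```python
import numpy as np

def fit(X, y):
    """'Fine-tune' the toy model: least-squares weights for the kept units."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def val_error(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

def iterative_prune(X_tr, y_tr, X_val, y_val, max_increase):
    kept = list(range(X_tr.shape[1]))
    w = fit(X_tr[:, kept], y_tr)                       # step 1: train to convergence
    base = val_error(X_val[:, kept], y_val, w)
    while len(kept) > 1:                               # step 2: alternate prune / fine-tune
        scores = np.abs(w) * X_tr[:, kept].std(axis=0)  # importance of each kept unit
        drop = int(np.argmin(scores))
        trial = kept[:drop] + kept[drop + 1:]
        w_trial = fit(X_tr[:, trial], y_tr)
        if val_error(X_val[:, trial], y_val, w_trial) > base * (1 + max_increase):
            break                                      # step 3: trade-off limit reached
        kept, w = trial, w_trial
    return kept, w

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 4] + 1.5 * X[:, 7] + 0.1 * rng.normal(size=400)
kept, w = iterative_prune(X[:300], y[:300], X[300:], y[300:], max_increase=0.2)
print(sorted(kept))  # [0, 4, 7]
```

Only the three informative units survive. Removing any of them would raise the validation error far beyond the allowed 20%, so the loop stops there.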


Consider a training set in which each example pairs an input with a target output. The network’s parameters are optimized to minimize a cost value on this set. The most common choice of cost function is the negative log-likelihood. The cost function is selected independently of pruning and depends only on the task the original network solves. In the case of transfer learning, we adapt a large network initialized with parameters pre-trained on a related but distinct dataset.

During pruning, we refine a subset of parameters that preserves the accuracy of the adapted network.

The process is iterative and it removes the least important parameters until it reaches the trade-off between accuracy and pruning objective.
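Written out, the setup above looks as follows. This follows the formulation in [1] as we understand it, with the negative log-likelihood shown as the cost and B standing for the budget on the number of nonzero parameters the pruned network may keep.

```latex
% D = {(x_n, y_n)}_{n=1}^{N}: inputs and target outputs;  W: network parameters
C(\mathcal{D} \mid \mathcal{W}) = -\frac{1}{N} \sum_{n=1}^{N} \log p\left(y_n \mid x_n; \mathcal{W}\right)

% Pruning: find a reduced parameter set W' whose cost stays close to that of W,
% subject to a budget B on the number of non-zero parameters
\min_{\mathcal{W}'} \; \left| C(\mathcal{D} \mid \mathcal{W}') - C(\mathcal{D} \mid \mathcal{W}) \right|
\quad \text{s.t.} \quad \|\mathcal{W}'\|_0 \le B
```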


The following year, a new approach was proposed to speed up and simplify heavily parameterized CNNs [2]: a “try-and-learn” algorithm that trains a pruning agent to remove unnecessary CNN filters in a data-driven way. Guided by a reward function, the agent can remove a significant number of filters from CNN models while keeping performance at the desired level.


In the figure above, the filters with red borders are kept by the algorithm and the rest are discarded.

The authors of this paper trained a pruning agent, modeled as a neural network, that takes the filter weights as input and outputs a binary decision to remove or keep each filter. The agent is trained with a novel reward function that encourages high pruning ratios and guarantees that the pruned network’s performance remains above a specified level. In other words, the reward function provides easy control over the trade-off between network performance and scale. Since the reward function is non-differentiable with respect to the parameters of the pruning agent, the authors use the policy gradient method to update those parameters during training.

The pruning agent starts by guessing which filters to prune. Each pruning action it takes is evaluated by the reward function, and the reward is fed back to the agent, steering it toward actions that earn higher rewards.
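The try-and-learn loop can be illustrated with a toy policy-gradient (REINFORCE) sketch. Several simplifications are ours, not the paper’s. The agent in [2] is a neural network that reads filter weights, whereas here each filter simply has its own keep-logit. The “CNN” is replaced by hypothetical filters with fixed usefulness values, and performance is the share of usefulness kept. The reward is also a hypothetical stand-in, not the paper’s formula: it pays the fraction of filters removed when performance meets the target and a fixed penalty otherwise.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical stand-in for a CNN layer: each filter has a fixed "usefulness",
# and performance is the share of total usefulness that is kept.
usefulness = np.array([5.0, 4.0, 3.0, 0.1, 0.1, 0.1, 0.1, 0.1])
target = 0.9   # performance must stay at or above this level

def performance(keep):
    return usefulness[keep.astype(bool)].sum() / usefulness.sum()

def reward(keep):
    """Hypothetical reward: fraction of filters removed if the target is met, else -1."""
    return 1.0 - keep.mean() if performance(keep) >= target else -1.0

rng = np.random.default_rng(0)
logits = np.zeros(len(usefulness))   # policy: keep filter k with probability sigmoid(logits[k])
baseline, lr = 0.0, 0.5

for step in range(2000):
    p = sigmoid(logits)
    keep = (rng.random(len(p)) < p).astype(float)   # sample a keep/remove action per filter
    r = reward(keep)                                 # evaluate the pruning action
    # REINFORCE: the gradient of log Bernoulli(keep | p) w.r.t. the logits is (keep - p).
    logits += lr * (r - baseline) * (keep - p)
    baseline = 0.9 * baseline + 0.1 * r              # running-mean baseline reduces variance

decision = sigmoid(logits) > 0.5
print(decision)  # expected: keep the three useful filters, remove the other five
```

The reward never needs to be differentiated; the agent only has to sample actions and observe their rewards, which is why a policy gradient method suits a non-differentiable reward.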

The authors argue that their method is fully automatic and data-driven. It automatically discovers redundant filters and removes them while keeping the performance of the model intact within a certain tolerance.

Both approaches were designed to remove unnecessary, redundant parameters that contribute little or nothing to a network. Over time, new model compression methods keep emerging that not only preserve accuracy but also produce smaller, faster models.


Why Pruning?


With the rise of mobile inference and on-device machine learning, pruning is more relevant than ever. Lightweight algorithms are needed as more and more applications make use of neural networks.

A recent example comes from Apple’s new products, which use neural networks to power a multitude of privacy and security features. Given how disruptive the technology is, it is easy to see why many companies are adopting it.

The varied nature of these applications also requires neural networks to be easy to deploy. Their move to mobile is complemented by the on-device compute in flagship devices, further creating the need for efficient programs that do the most work while consuming the fewest resources.

This is the reason pruning is more relevant today, as the applications need to get lighter and faster without sacrificing accuracy.


Further Reading:

  1. Pruning Convolutional Neural Networks for Resource Efficient Inference. Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, Jan Kautz (NVIDIA).

  2. Learning to Prune Filters in Convolutional Neural Networks. Qiangui Huang, Kevin Zhou, Suya You, Ulrich Neumann.