J. Syst. Archit. | 2021
CURATING: A multi-objective based pruning technique for CNNs
Abstract
As convolutional neural networks (CNNs) have improved in accuracy, their model sizes and computational overheads have also grown. These overheads make it challenging to deploy CNNs on resource-constrained devices. Pruning is a promising technique for mitigating them. In this paper, we propose a novel pruning technique called CURATING that treats the pruning of CNNs as a multi-objective optimization problem. CURATING retains filters that (i) are very different from (i.e., less redundant with) each other in terms of their representation, (ii) have high saliency scores, i.e., pruning them would drastically reduce model accuracy, and (iii) are likely to produce higher activations. To measure the similarity between filters, we treat a filter specific to an output channel as a probability distribution over spatial filters. The resulting similarity matrix is used to create filter embeddings, and we constrain our optimization problem to retain a diverse set of filters based on these embeddings. On a range of CNNs over well-known datasets, CURATING achieves a tradeoff between model size, accuracy, and inference latency that is better than or comparable to that of existing techniques. For example, when pruning VGG16 on the ILSVRC-12 dataset, CURATING achieves higher accuracy and a smaller model size than previous techniques. We plan to release the source code of CURATING and the pruned models as open source.
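The idea of treating an output-channel filter as a probability distribution over its spatial filters, and then keeping a salient but diverse subset, can be illustrated with a small sketch. The abstract does not specify the similarity measure, how the filter embeddings are built, or how the optimization is solved, so everything below is an assumption rather than the CURATING method: a spatial filter's probability mass is taken to be its summed absolute weight, similarity is 1 minus the Jensen-Shannon divergence normalized by ln 2, and the embedding-constrained optimization is replaced by a simple greedy rule that trades saliency against redundancy with already-kept filters. The activation criterion (iii) is omitted. The function names `to_distribution`, `similarity_matrix`, and `select_filters` and the weight `alpha` are hypothetical.

```python
import math
from typing import List

Filter = List[List[float]]  # one output-channel filter: in_channels x (k*k flattened weights)


def to_distribution(filt: Filter) -> List[float]:
    """Map a filter to a probability distribution over its spatial filters.

    Assumption (not from the paper): the mass of each spatial filter is the
    sum of its absolute weights, normalized over all input channels.
    """
    masses = [sum(abs(w) for w in spatial) for spatial in filt]
    total = sum(masses)
    if total == 0:  # all-zero filter: fall back to a uniform distribution
        return [1.0 / len(masses)] * len(masses)
    return [m / total for m in masses]


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence (natural log), bounded above by ln 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: List[float], b: List[float]) -> float:
        # Terms with a_i == 0 contribute 0; b_i >= a_i / 2 > 0 otherwise.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def similarity_matrix(filters: List[Filter]) -> List[List[float]]:
    """Pairwise similarity in [0, 1]: 1 - JS divergence / ln 2 (assumed measure)."""
    dists = [to_distribution(f) for f in filters]
    n = len(dists)
    return [[1.0 - js_divergence(dists[i], dists[j]) / math.log(2)
             for j in range(n)] for i in range(n)]


def select_filters(sim: List[List[float]], saliency: List[float],
                   k: int, alpha: float = 0.5) -> List[int]:
    """Greedily keep k filters, scoring each candidate as
    saliency - alpha * (max similarity to any filter already kept).

    This greedy rule is a stand-in for the paper's multi-objective,
    embedding-constrained optimization, not a reproduction of it.
    """
    selected: List[int] = []
    remaining = set(range(len(saliency)))
    while len(selected) < k and remaining:
        def score(i: int) -> float:
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return saliency[i] - alpha * redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)
```

For example, given two filters whose distributions coincide and a third whose mass sits on a different input channel, `select_filters(..., k=2)` keeps the most salient filter together with the dissimilar one, even when the redundant filter has a higher saliency score than the dissimilar one. A larger `alpha` weights diversity more heavily relative to saliency.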