CNNPruner: Pruning Convolutional Neural Networks with Visual Analytics

Guan Li, Junpeng Wang, Han-Wei Shen, Kaixin Chen, Guihua Shan, and Zhonghua Lu
Fig. 1. CNNPruner: (a) the Tree view helps to track different pruning plans; (b) the Statistics view presents model-critical statistics to monitor the pruned models; (c) the Model view enables users to interactively conduct the pruning with informative visual hints from different criteria; (d) the Filter view presents details of individual filters for users to investigate and interactively prune them.
Abstract—Convolutional neural networks (CNNs) have demonstrated extraordinarily good performance in many computer vision tasks. The increasing size of CNN models, however, prevents them from being widely deployed to devices with limited computational resources, e.g., mobile/embedded devices. The emerging topic of model pruning strives to address this problem by removing less important neurons and fine-tuning the pruned networks to minimize the accuracy loss. Nevertheless, existing automated pruning solutions often rely on a numerical threshold of the pruning criteria, lacking the flexibility to optimally balance the trade-off between efficiency and accuracy. Moreover, the complicated interplay between the stages of neuron pruning and model fine-tuning makes this process opaque and therefore difficult to optimize. In this paper, we address these challenges through a visual analytics approach named CNNPruner. It considers the importance of convolutional filters through both instability and sensitivity, and allows users to interactively create pruning plans according to a desired goal on model size or accuracy. CNNPruner also integrates state-of-the-art filter visualization techniques to help users understand the roles that different filters play and refine their pruning plans. Through comprehensive case studies on CNNs of real-world sizes, we validate the effectiveness of CNNPruner.
Index Terms—visualization, model pruning, convolutional neural network, explainable artificial intelligence

• Guan Li, Kaixin Chen, Guihua Shan, and Zhonghua Lu are with Computer Network Information Center, Chinese Academy of Sciences. They are also with University of Chinese Academy of Sciences. E-mail: {liguan, sgh, zhlu}@sccas.cn, [email protected].
• Junpeng Wang is with Visa Research. E-mail: [email protected].
• Han-Wei Shen is with The Ohio State University. E-mail: [email protected].
• Guihua Shan is the corresponding author.
1 INTRODUCTION
Convolutional neural networks (CNNs) have demonstrated extraordinarily good performance in many applications, such as image classification, object detection, and speech recognition [11, 20, 21, 34, 35]. The recent improvements in CNNs' performance often come at the cost of model size, and it is increasingly common to see models with hundreds of layers and millions of parameters. For example, VGG-16 [35], a commonly used model for classification tasks, has ~138 million parameters and takes hundreds of megabytes of storage, which makes it hard to deploy on resource-limited devices. Model pruning strives to mitigate this cost: LeCun et al. [23] first improved the efficiency of their neural networks by removing unimportant model parameters (weights) based on information theory metrics. In general, model pruning algorithms can be divided into structured pruning and unstructured pruning [17]. Compared to unstructured pruning, which requires support from additional hardware to achieve excellent performance, structured pruning has gradually dominated recent developments and has become a hot research topic. Most notably, filter pruning is an effective structured pruning method, which directly prunes filters that are less relevant to the prediction outcomes to reduce a model's size. There are three key steps in a typical filter pruning pipeline: 1) filter evaluation; 2) filter pruning; 3) model fine-tuning. Frequently, the pipeline is executed in an automated yet iterative manner (Fig. 2), where the filters are removed based on hard thresholds, and the model is pruned multiple times to achieve the desired compression goal without significantly compromising its accuracy.

Fig. 2. The iterative model pruning process of CNN models.

Nevertheless, existing automated CNN pruning solutions lack the flexibility to optimally balance the trade-off between pruning efficiency and prediction accuracy. Automated pruning usually removes a fixed number or a fixed percentage of convolutional filters in each pruning iteration. If too many filters are removed in one iteration, the model will be severely damaged and difficult to recover. Conversely, if too few filters are deleted, the effectiveness of pruning will be significantly undermined. In practice, the degree of "over-parameterization" is related to the corresponding CNN's size and the type of computer vision task, and in turn, different models have different abilities to recover from the damage caused by filter removal. Using a fixed numerical threshold as the criterion to remove filters in each pruning iteration ignores the characteristics of each model and may not lead to the optimal pruning solution. Moreover, the complicated interplay between the stages of filter pruning and model fine-tuning makes this process difficult to control, i.e., the automated pruning focuses solely on the accuracy of the pruned model, but pays little attention to the intermediate state changes.
As a result, the anomalies accumulated in the iterative pruning process may get enlarged, which affects the pruning efficiency and eventually impacts the accuracy of the pruned model.

Focusing on the above challenges, we propose CNNPruner, a visual analytics system to help deep learning experts create interactive pruning plans and evaluate the pruning process. CNNPruner contains four main visualization components (Fig. 1): (a) the Tree view helps to overview and track the altered models from iterative pruning stages; (b) the Statistics view presents the loss/accuracy fluctuation, model recovery capability, and recovery cost to help users adjust the pruning strategy in time; (c) the Model view, facilitated with two new metrics of instability and sensitivity, evaluates the importance of different CNN filters and enables users to interactively create pruning plans; (d) the Filter view reveals the roles that different filters play in the prediction process and helps users interpret and prune the CNN model. We conducted case studies with CNNPruner on CNNs of real-world sizes to validate its effectiveness. To sum up, the contributions of our work are:

• We design and develop a visual analytics system to help deep learning experts progressively analyze the CNN pruning process and introduce interactive intervention to the process on demand.
• We introduce two metrics (instability and sensitivity) to assist model designers in better estimating filters' importance before pruning, and three criteria (recovery capability, loss fluctuation, and recovery cost) to evaluate the pruned model.
• We examine data instances where the intermediate pruned models behave differently, to study the critical filters and interpret the internal working mechanism of CNNs.
2 RELATED WORKS
In this section, we introduce the concept of model pruning and its development in the field of deep learning. We also review visual analytics works on interpreting and diagnosing deep learning models.
2.1 Model Pruning
Model pruning compresses deep learning models by removing less important parameters, seeking a trade-off between model size and prediction accuracy. To date, many works have achieved good performance in neural network pruning, and we can roughly divide them into two categories [17]: weight pruning and filter pruning.

Weight pruning is an unstructured pruning method that deletes or compresses the weights in a filter. Han et al. [15] proposed a method to reduce the model size by removing unimportant connections. They applied this method to CNN models trained on the ImageNet dataset and reduced the parameters by about 89% for the AlexNet model and about 92% for the VGG-16 model. Han et al. [14] used weight sharing on the basis of removing the unimportant connections [15], and employed Huffman encoding to compress the weights to maximize the compression rate. Their experiment shows that, by compressing the VGG-16 model, the method can reduce the memory consumption from 552 MB to 11.3 MB without compromising the accuracy. Carreira-Perpinan et al. [6] proposed a method to find unimportant weights by minimizing loss changes while compressing those weights. Other studies have also achieved good results through weight pruning [8, 13, 16, 37, 38, 46], but weight pruning may cause unstructured sparsity and requires support from additional hardware to achieve excellent performance.

Filter pruning is a structured pruning method that directly removes convolutional filters from CNNs. Luo et al. [28] proposed a framework named ThiNet to help the user identify the unimportant filters by computing statistical information of adjacent layers. Li et al. [24] proposed an acceleration method for CNNs by removing filters and their feature maps, which reduces the computation of the VGG-16 model by 34%. Molchanov et al. [31] proposed a new Taylor expansion criterion to find the filters that have little influence on the loss value and remove them to reduce the model size. Other pruning studies along this line also show good results [9, 17, 18, 27, 43]. Filter pruning keeps the regular structure of the model but significantly reduces the computation and storage cost, making it a popular solution for model compression. We focus on this approach in this work as well.

Most of the aforementioned model pruning studies focus on proposing new pruning criteria and use a small fixed numerical threshold to determine the number of filters to be removed. This is because they mainly focus on the accuracy of the final model and concern themselves less with the intermediate pruning process. The small number of removals makes the model recover easily from the pruning and often leads to an optimal pruning result. However, it prolongs the pruning process, as more pruning iterations are needed to achieve the compression goal. The process is usually not efficient and may incur a higher computational cost to perform fine-tuning in each pruning iteration.
2.2 Visual Analytics for Deep Learning
Based on the taxonomy from [7, 26, 44], the visualizations for deep neural network (DNN) interpretation can roughly be categorized into three groups, targeting model understanding [19, 26, 29, 41], model debugging [25, 32, 33, 42, 45], and model refinement [5, 40].

To understand a DNN model, researchers usually use visualization techniques to show the internal structure and state information of the model. For example, CNNVis [26] uses directed acyclic graphs to formulate the model architecture and help domain experts understand CNNs through visualization. GANViz [41] helps the user understand generative adversarial networks (GANs) [12] by visualizing and comparing the internal model states (i.e., hidden activations) over the training process. GAN Lab [19] is an interactive visualization tool for non-experts to learn GAN models, and it significantly reduces the difficulty of understanding complex generative neural networks through visualization techniques.

To debug/diagnose a DNN model, researchers usually define visual evaluation methods to assist the analysis of the model. For example, DeepEyes [32] helps the user diagnose a CNN model by visualizing the convolutional layers and convolutional filters. Based on the activation level of different filters, this system improves the efficiency of model design by optimizing the network structure. DGMTracker [25] monitors and diagnoses the training process of deep generative models through the visualization of a large amount of time-series information.

For model improvement, researchers usually use visualizations to help users identify the weaknesses of the model. For example, DQNViz [40] exposes the details of the training process of deep Q-networks [30] and uses visualization techniques to extract useful patterns of the model to better control the training. Blocks [5] uses visualization techniques to analyze the impact of class hierarchy on the training of CNN models; using the analysis results, the tool can accelerate model convergence and alleviate the problem of overfitting.

These studies have proved the effectiveness of visualization and visual analytics in the machine learning field. Our work focuses on CNN model pruning and uses visualization to help deep learning experts better understand and improve the pruning process of CNN models. We believe that, with visualization and visual analytics, our system can effectively improve the efficiency of model pruning.
3 BACKGROUND AND CONCEPTS
This section introduces the basic concepts of model pruning and a state-of-the-art filter visualization technique. Following them, we introducethe metrics used in this work and propose a novel evaluation concept.
3.1 Filter Pruning
This section describes the details of each step in the filter pruning process and introduces the Taylor expansion based filter evaluation.
3.1.1 Filter Evaluation via Taylor Expansion
Our work uses the Taylor expansion criterion [31] for filter pruning. Its idea is to remove filters and check how significantly the removal impacts the loss function, i.e., to examine the importance of filters by perturbation. The resulting importance values can then be used to prioritize filters during pruning. Mathematically, this process can be denoted as:

ΔL(f_i) = |L(D, f_i = 0) − L(D, f_i)|    (1)

where D is the training data, L(·) is the loss function, f_i is the output (i.e., feature map) produced by filter i, L(D, f_i) is the loss before any model perturbation, and L(D, f_i = 0) is the loss when f_i is removed.

Physically removing individual filters and recomputing the loss for each removal is computationally expensive. However, the process can be approximated through a Taylor expansion, as demonstrated in [31], i.e.,

L(D, f_i = 0) ≈ L(D, f_i) − (∂L/∂f_i) f_i    (2)

ΔL(f_i) can then be transformed as follows:

ΔL(f_i) = |L(D, f_i) − (∂L/∂f_i) f_i − L(D, f_i)| = |(∂L/∂f_i) f_i|    (3)

In Equation 3, we need to calculate the product of the feature map and the gradient (of the loss function w.r.t. the feature map) to get the estimated cost of removing the corresponding filter, and this value can be calculated through back-propagation. After the calculation, ℓ2-normalization is used to normalize the set of ΔL values resulting from removing individual filters. With the normalized importance values, we can prioritize all filters and prune the less important ones. We call this process of choosing a proper importance criterion to prioritize filters and deciding the number of less important ones to remove a pruning plan. Our objective is to derive efficient and effective pruning plans through interactive visual analytics.

3.1.2 Fine-Tuning and Pruning Iterations
After removing the less important filters, the model structure is slightly damaged, and its accuracy drops. To recover the accuracy, we need to retrain the model using the training dataset. As most of the important filters are still retained in the model, the original accuracy can usually be recovered within a few training epochs. This process, i.e., retraining the CNN model to recover its accuracy, is called fine-tuning.

As described in Fig. 2, filter evaluation, filter pruning, and fine-tuning constitute one pruning iteration. Repeating the process multiple times, we generate the final pruned CNN model.
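Since our back-end is built on PyTorch, the Taylor criterion of Equation 3 can be computed with forward/backward hooks. The following is a minimal, illustrative sketch (the function and variable names are ours, not the system's actual code): it accumulates |(∂L/∂f_i) f_i| for every filter of one convolutional layer over a data loader and ℓ2-normalizes the result.

```python
import torch
import torch.nn as nn

def taylor_importance(model, data_loader, loss_fn, conv_layer, device="cpu"):
    # Estimate per-filter importance |(dL/df_i) * f_i| (Equation 3)
    # for one nn.Conv2d layer, then l2-normalize across its filters.
    model.to(device).eval()
    store, scores = {}, None
    fwd = conv_layer.register_forward_hook(
        lambda m, inp, out: store.update(act=out))
    bwd = conv_layer.register_full_backward_hook(
        lambda m, gin, gout: store.update(grad=gout[0]))

    for images, labels in data_loader:
        images, labels = images.to(device), labels.to(device)
        model.zero_grad()
        loss_fn(model(images), labels).backward()
        # Average grad*activation over spatial dims, take |.|, then
        # average over the batch: one importance value per filter.
        contrib = (store["grad"] * store["act"].detach()) \
            .mean(dim=(2, 3)).abs().mean(dim=0)
        scores = contrib if scores is None else scores + contrib

    fwd.remove()
    bwd.remove()
    return scores / scores.norm(p=2)  # normalized sensitivity per filter
```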
3.2 Filter Visualization
Our primary goal in this work is to remove less important filters. Therefore, we need a proper filter visualization technique to reveal what features individual filters have captured and to verify their importance. Guided back-propagation [36], one of the state-of-the-art filter visualization techniques, is adopted in our work.

Given an input image, this algorithm first performs a forward pass to the target network layer. It sets all activations of that layer to zero, except the one extracted by the filter that we want to analyze. Next, the algorithm propagates the non-zero activations back to the input image to highlight what was extracted by the corresponding filter. The resulting filter visualization image therefore has the same size as the input image and highlights what the individual filter has captured. We adopt this filter visualization technique, as it works well in interpreting filters in deeper CNN layers [36]. It has also been adopted by multiple other model interpretation works [40].

Fig. 3 shows some filter visualization examples produced by this guided back-propagation technique. Four filters from Layer 0 of a 6-layer CNN are visualized, taking a mountain image as input. From the highlighted regions in the filter visualization results, Filter 0 and Filter 3 capture the silhouette features of the mountain, whereas Filter 1 and Filter 2 capture its texture features.
Fig. 3. An example of filter visualization. The input is a mountain image.Filter 0 and Filter 3 capture the silhouette features of the mountain,whereas Filter 1 and Filter 2 capture the texture features of the mountain.
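The procedure above can be sketched with PyTorch hooks as follows. This is an illustrative implementation of guided back-propagation under the assumption that the model uses non-inplace nn.ReLU modules; the names are ours, not the system's actual code.

```python
import torch
import torch.nn as nn

def guided_backprop(model, image, target_layer, filter_idx):
    # Visualize one filter: forward the image, keep only the target
    # channel's activation at `target_layer`, and propagate it back
    # to the input image.
    model.eval()
    handles, store = [], {}

    # Guided ReLU: the standard ReLU backward already zeroes gradients
    # where the forward input was negative; additionally clamp negative
    # gradients to zero on the way back.
    def clamp_grad(module, grad_in, grad_out):
        return (torch.clamp(grad_in[0], min=0.0),)

    for m in model.modules():
        if isinstance(m, nn.ReLU):  # assumes ReLU(inplace=False)
            handles.append(m.register_full_backward_hook(clamp_grad))
    handles.append(target_layer.register_forward_hook(
        lambda m, i, o: store.update(act=o)))

    x = image.clone().requires_grad_(True)
    model(x)                                   # forward pass
    act = store["act"]                         # (1, C, H, W) feature maps
    seed = torch.zeros_like(act)
    seed[:, filter_idx] = act[:, filter_idx]   # keep only the chosen filter
    act.backward(gradient=seed)

    for h in handles:
        h.remove()
    return x.grad                              # same size as the input image
```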
3.3 Sensitivity and Instability
Based upon the Taylor expansion algorithm explained in Sect. 3.1.1, we define one criterion and propose a new metric as another criterion for filter pruning in our work.

The Sensitivity of a filter reflects the filter's impact on the model's loss when it is removed. It is calculated using the ℓ2-normalized ΔL (Equation 3). A filter with a lower Sensitivity value should be removed first to reduce the impact on the model.

Notice that repeating the sensitivity calculation for the same filter multiple times may result in different sensitivity values, due to the randomness inherited from the statistical parameter update process. Specifically, the updates of CNN model parameters are often performed in units of data batches. Feeding data batches into a CNN in shuffled orders results in different parameter update orders and scales. The impact of this randomness is usually marginal for important filters, as their sensitivity values are always large. However, for less important filters, the sensitivity values are minimal and can easily be influenced by this randomness. Therefore, the sensitivity orders of these less important filters may differ considerably across calculations.

We introduce the metric Instability to accommodate the above issue, which is defined as the mean absolute deviation of a filter's ranks from different calculations, i.e.,
Instability(f_j) = (1/n) Σ_{i=1}^{n} |Rank_i(f_j) − Rank(f_j)|    (4)

where n is the total number of times we computed the sensitivity for individual filters, Rank_i(f_j) is the ranking of the j-th filter in the i-th computation, and Rank(f_j) is the average ranking of filter j over the n computations. The instability of a filter reflects the uncertainty of its removal order, and often, a filter with a higher instability is less important. We set n to a fixed value in all of our experiments.
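As an illustration, Equation 4 can be computed from n repeated sensitivity calculations as follows (a minimal NumPy sketch with our own names):

```python
import numpy as np

def instability(sensitivity_runs):
    # sensitivity_runs: (n, F) array holding n repeated sensitivity
    # computations for F filters. Rank filters within each run, then
    # take the mean absolute deviation of each filter's rank (Eq. 4).
    ranks = sensitivity_runs.argsort(axis=1).argsort(axis=1).astype(float)
    mean_rank = ranks.mean(axis=0)                 # average rank per filter
    return np.abs(ranks - mean_rank).mean(axis=0)  # (F,) instability values
```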
3.4 Degenerated Instances and Improved Instances
Each pruning iteration improves and degenerates the model a little bit, and its prediction accuracy also changes, i.e., some data instances in the test data receive different predictions from the original and the pruned models. To better index the subset of instances with different predictions from the two models, we define the following two concepts:

Degenerated Instances are images that are correctly predicted by the original model but incorrectly predicted by the pruned model, i.e., the pruning hurts the model's recognition ability on these images.
Improved Instances are images that are incorrectly predicted by the original model but correctly predicted by the pruned model, i.e., the pruning improves the model's recognition ability on these images.

The test dataset used by a CNN model usually contains many images, and it is difficult to analyze the effect of the pruning on every single image. The degenerated and improved instances help users quickly locate analysis targets among the massive number of images, which improves the analysis efficiency.
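For illustration, the two instance sets can be collected by comparing the per-image correctness of the original and the pruned model. The sketch below is a minimal example, assuming a non-shuffled PyTorch test loader; the function and variable names are ours, not the system's actual code.

```python
import torch

@torch.no_grad()
def split_instances(original, pruned, test_loader, device="cpu"):
    # Collect indices of degenerated instances (correct -> incorrect)
    # and improved instances (incorrect -> correct). Assumes the test
    # loader is NOT shuffled, so batch order maps back to image indices.
    degenerated, improved = [], []
    original.to(device).eval()
    pruned.to(device).eval()
    for batch_idx, (images, labels) in enumerate(test_loader):
        images, labels = images.to(device), labels.to(device)
        ok_orig = original(images).argmax(dim=1) == labels
        ok_pruned = pruned(images).argmax(dim=1) == labels
        base = batch_idx * test_loader.batch_size
        degenerated += [base + i.item()
                        for i in torch.nonzero(ok_orig & ~ok_pruned).flatten()]
        improved += [base + i.item()
                     for i in torch.nonzero(~ok_orig & ok_pruned).flatten()]
    return degenerated, improved
```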
4 DESIGN REQUIREMENTS
We worked with a couple of deep learning researchers and had discussions/interviews with them during the system design stages. We also investigated related works on model pruning to identify the challenges that deep learning experts are facing. From these discussions and literature reviews, we found that proposing effective pruning criteria is an important research topic, and the criteria are often evaluated by the accuracy of the pruned model. Based on different criteria, people often use a fixed number or a mathematical formula to decide the number of filters to be removed in each pruning iteration, which lacks flexibility and is usually not efficient. For example, a small removal count is often used to guarantee the model's recovery capability. However, the small number often leads to more pruning iterations, which inevitably prolongs the pruning process, costing more computing resources for model fine-tuning. Additionally, we noticed that even if the original and pruned models have similar prediction accuracy, their recognition power for different classes may be very different. Revealing these details, along with other model-level details (e.g., model architecture evolution, recovery capabilities from pruning), is very important to understanding the pruning process. Through the responses from the experts and our studies of the existing works, we have identified the following design requirements for CNNPruner.

• R1: Display different levels of information about the CNN models during pruning. Many intermediate CNN models are generated in the iterative process of model pruning, and our system needs to track and display the details of those models. Displaying this model information is the basis for understanding and exploring the pruning process, which requires CNNPruner to:
  – R1.1: track the intermediate models generated over the pruning process and index the models effectively.
  – R1.2: display the states of the pruned models and monitor the evolution of these states over the pruning process.
  – R1.3: visualize the internal structure of a selected CNN model (e.g., the original/intermediate/final pruned model) and its filters' attributes.

• R2: Interactively analyze and decide the number of filters to be removed in each pruning iteration. After each pruning, the model needs to be fine-tuned, and its prediction accuracy will change. The experts want to minimize the computational cost for fine-tuning but restore the accuracy as much as possible. Therefore, they expect CNNPruner to help them analyze the impact of pruning and select the appropriate removal amount in each pruning. We, therefore, design CNNPruner to be able to:
  – R2.1: estimate the influence of a pruning plan on the model before the pruning actually happens (i.e., pre-estimation).
  – R2.2: evaluate the quality of the pruning plan and the pruned model after each pruning (i.e., post-evaluation).
  – R2.3: assist the user in better selecting or optimizing the number of filters to be removed in each pruning iteration.

• R3: Understand the model pruning process and refine the pruning plan. The convolutional filters are the basic units to be removed in each pruning. An in-depth analysis of them can help the user better understand the pruning process and identify abnormal changes in the accuracy values for different classes of the studied dataset. Therefore, CNNPruner needs to be able to:
  – R3.1: visualize the filters of interest and help the user understand the roles that different filters played during pruning.
  – R3.2: interactively refine the pruning plan by adding or removing filters to be pruned to reduce undesired changes of the model over the pruning.
5 SYSTEM OVERVIEW
Fig. 4 shows the architecture of CNNPruner, which contains a back-end powered by PyTorch [4] and a web-based front-end for visualization and interaction. We use the Flask [3] library to support the communication between the back-end and the front-end.
Fig. 4. The architecture of CNNPruner, including a back-end powered by PyTorch and a web-based front-end visualization interface.
CNNPruner takes a pre-trained CNN model as input and outputs the pruned model. Users can flexibly interact with the four visualization components from the front-end to complete the above process. In detail, the Tree view lays out the pre-trained (tree root) and post-pruned CNN (tree leaves), as well as all intermediate pruned models, through a tree structure (R1.1). An estimator (R2.3) is equipped in this view to help users estimate a proper number of filters to be removed between adjacent tree nodes (i.e., CNN models). The Statistics view (Fig. 1-b) shows the evolution of the model's statistics over the process of pruning (R1.2), where users can evaluate the pruning scheme through these statistics (R2.2). The Model view (Fig. 1-c) presents the internal structure and the filter attributes of a selected tree node (i.e., a CNN model) from the Tree view (R1.3). It is the main component that allows users to interactively prune the selected model, and it provides immediate feedback on the pruning operation to guide users toward an optimal pruning plan. The Filter view (Fig. 1-d) presents details of the individual filters to help users interpret them and interactively refine the pruning plan (R3.1, R3.2). All four visualization components are coordinated, and they work together to meet the objective of helping experts understand, diagnose, and refine the pruning process of CNNs.
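As a concrete illustration of this architecture, the back-end could expose the pre-computed filter metrics to the front-end through a Flask route. The paper does not document the actual API, so the route, payload layout, and names below are hypothetical:

```python
# A minimal sketch of the back-end/front-end communication. Only the
# Flask+PyTorch split itself is described in the paper; this endpoint
# is illustrative.
from flask import Flask, jsonify

app = Flask(__name__)

# Filled by the PyTorch back-end: (model_id, layer_id) -> list of
# {"filter": index, "sensitivity": float, "instability": float}.
filter_metrics = {}

@app.route("/api/filters/<int:model_id>/<int:layer_id>")
def get_filter_metrics(model_id, layer_id):
    # The Model view's bubble plot is rendered from this payload.
    return jsonify(filter_metrics.get((model_id, layer_id), []))

if __name__ == "__main__":
    app.run(port=5000)
```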
6 VISUAL ANALYTICS SYSTEM: CNNPRUNER
CNNPruner (Fig. 1) is composed of four visualization components, demonstrating different levels of CNN information and the pruning process. We provide the details of the individual components in this section.
6.1 Tree View
The Tree view provides an overview of the iterative model pruning process (R1.1). The root and leaves of the tree are the original and the pruned models, respectively. Each branch of the tree (connecting the root to a leaf) chains a sequence of intermediate models from the iterative pruning process. For each node, we use two horizontal filled rectangles to denote the corresponding model's prediction accuracy and compression ratio, and one vertical rectangle to display the model ID. The system automatically generates the ID, and the ID of the root model is 0. The vertical position of a node is decided by the number of filters in the corresponding model (see the left vertical axis). The edges connecting a pair of parent-child nodes represent the fine-tuning process, where we use gray and purple lines to indicate whether the fine-tuning process converged or not after reaching the user-specified stop conditions (e.g., the maximum number of epochs). The Tree view can track the whole pruning process, and each node in the tree represents a model. When the mouse hovers over a node, a prompt bar appears to show the storage size of the corresponding model. The user can click on individual nodes of the tree to update the data displayed in other views. The node with the red border is the currently selected one.

The Tree view is also equipped with a pruning estimator to help users balance the trade-off between model size and prediction accuracy (R2.3). It estimates the number of filters in a model and the model's prediction accuracy by linearly interpolating these values from the pair of most adjacent nodes in the tree. This rough linear estimation works well in our experiments, and we expect more sophisticated interpolation algorithms to yield better results. The pruning estimator node is only visible when users press the black box icon on the top right corner of the Tree view, and users can flexibly drag it vertically to generate estimations dynamically.

This view has two important parameters to be configured before any pruning (using the buttons on the top of this view). One is the pruning mode, which can be automated or manual pruning. The other is the termination criteria for fine-tuning. They are explained as follows:
Auto/Manual-Pruning. For auto-pruning, users specify a fixed ratio of filters to be removed, e.g., 1/2, 1/3, or 1/4 of the total amount, and CNNPruner will iteratively remove the specified amount of filters (based on the pruning criteria) and fine-tune the model. The iterative process runs until the pruned model fails to meet the desired need (e.g., the prediction accuracy no longer meets the requirement). This process is automated but lacks pruning flexibility. Conversely, in manual-pruning, users can flexibly specify the number of filters to be removed (based on their distribution) in individual pruning iterations.
Termination Criteria for Fine-Tuning. CNNPruner has three termination criteria to finish the fine-tuning process: (1) the Delta Loss, i.e., the change of loss values, (2) the Target Accuracy, and (3) the Maximum Epoch. The fine-tuning is terminated if any of them is met.
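For concreteness, the pruning estimator's linear interpolation described above can be sketched as follows. This is a simplified reading of the description, with hypothetical tuple inputs; the system's actual implementation may differ.

```python
def estimate_filters(node_a, node_b, target_accuracy):
    # node_a, node_b: (num_filters, accuracy) of the two most adjacent
    # tree nodes. Linearly interpolate to estimate how many filters a
    # model reaching `target_accuracy` should keep.
    (f_a, acc_a), (f_b, acc_b) = node_a, node_b
    if acc_a == acc_b:
        return min(f_a, f_b)  # flat segment: prefer the smaller model
    t = (target_accuracy - acc_a) / (acc_b - acc_a)
    return round(f_a + t * (f_b - f_a))

# e.g., between a 201-filter model at 92.7% and a 150-filter model at
# 91.8%, a 92.5% target maps to roughly 190 filters (hypothetical numbers).
```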
6.2 Statistics View
The Statistics view (Fig. 1-b) displays detailed statistical information of the CNNs (R1.2). When the user selects a model in the Tree view, the system finds a path from the current node to the root node, and the models along the path form a pruning process. This view shows statistical information from multiple dimensions over the pruning process, organized in five components.
Confusion Matrix.
Fig. 1-b1 shows the confusion matrix of the current model (R1.2). Diagonal cells of the matrix represent the accuracy of true-positive instances (i.e., the percentage of correctly predicted images in one category), whereas non-diagonal cells represent the percentage of incorrectly predicted images. Cell values from small to large are mapped to colors from light blue to dark blue. Clicking any cell of the matrix shows a line chart presenting the value changes of the corresponding cell during the pruning process (i.e., the X-axis is the pruning iteration and the Y-axis is the cell value at each iteration). The line chart reflects the model's prediction power for a particular category across the pruning process.
Recovery Capability.
This sub-view (Fig. 1-b2) reveals the model's recovery capability after each pruning (R1.2), i.e., how difficult it is to regain the prediction power over the fine-tuning process. The X-axis is the model ID, representing different pruning iterations, whereas the Y-axis denotes the model's prediction accuracy in individual iterations. The gray curve connects the model's final accuracy values after the individual fine-tuning processes. The rectangular color stripe at each iteration shows the distribution of the accuracy values from different epochs of the corresponding fine-tuning. A longer stripe indicates a more significant accuracy change before and after the fine-tuning. If the pruned filters have little effect on the model, the recovery region will be very short and the accuracy fluctuation very small (i.e., a short strip with dark blue color). The information from this sub-view is an important criterion to evaluate the pruning plan.
Loss Fluctuation.
The Loss Fluctuation sub-view shows the loss changes in the process of fine-tuning (R1.2). The X-axis in the chart is the model ID, and the Y-axis is the loss value. The curve between two IDs represents the fluctuation of the training loss in the fine-tuning process between the two models. The importance of a filter is estimated based on how significantly the loss changes when it is removed, and the loss values, quantifying the inconsistency between the predicted label and the true label, can effectively monitor the model's evolution. If the pruning plan is good, its impact on the loss will be small. Therefore, the loss fluctuation is another important criterion to measure the pruning plan, and this sub-view helps users analyze the fine-tuning process by displaying it.
Recovery Cost.
The Recovery Cost sub-view shows the number of epochs in the fine-tuning process through a bar chart (R1.2). The X-axis of the chart is the model ID, and the Y-axis is the epoch count. If the pruning plan has little effect on the model, only a small number of training epochs is needed in the fine-tuning process to recover the accuracy. Conversely, if over-pruning happens, it is difficult to recover the accuracy even with many training epochs. Therefore, the recovery cost is another criterion to evaluate the pruning plan, and this sub-view gives the user an intuitive understanding of the recovery cost in the pruning process.
Parameters and Computation.
This sub-view displays the reduction of the model parameters and the computational cost. As shown in Fig. 1-b5, the line chart displays the reduced number of parameters, and the histogram displays the reduced amount of computation (R1.2). The pruning process removes filters from the network, thus reducing the number of parameters; meanwhile, the number of parameters is proportional to the amount of computation in the model. By calculating the amount of computation needed to process one image of the test dataset, users can estimate the running efficiency of the model on mobile/embedded devices and verify whether the pruned model meets the computation requirements.

All sub-views, except the Confusion Matrix, can be scaled horizontally to take the full space of the Statistics view (by double-clicking the corresponding sub-view). This interaction helps the system scale when the pruning process is long or involves many pruning iterations. It also reduces the information that users need to watch at once, helping them focus on a single metric at a time (rather than being overwhelmed by all five statistical metrics).
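To make the Parameters and Computation statistics concrete, the sketch below counts the parameters and multiply-accumulate operations (MACs) of the convolutional layers for one forward pass. It is a rough, illustrative accounting (grouped convolutions and non-convolutional layers are ignored), not the system's exact computation.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def conv_params_and_macs(model, input_shape=(1, 3, 224, 224)):
    stats = {"params": 0, "macs": 0}
    handles = []

    def count(module, inputs, output):
        params = module.weight.numel()
        if module.bias is not None:
            params += module.bias.numel()
        h, w = output.shape[2], output.shape[3]
        k_h, k_w = module.kernel_size
        stats["params"] += params
        # Each output pixel of each output channel costs k_h*k_w*C_in MACs.
        stats["macs"] += module.out_channels * h * w * module.in_channels * k_h * k_w

    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            handles.append(m.register_forward_hook(count))
    model(torch.zeros(input_shape))  # one dummy forward pass records sizes
    for hd in handles:
        hd.remove()
    return stats["params"], stats["macs"]
```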
6.3 Model View
Visualizing the internal information of a CNN model can help users understand the state of the CNN and make proper pruning plans. As shown in Fig. 1-c, we designed the Model view to display the architecture of the studied CNN model (Fig. 1-c1), the evaluation of filters from the model (Fig. 1-c2), and the pruning plan (Fig. 1-c3).

The architecture of the model selected in the Tree view is displayed in Fig. 1-c1 (R1.3, R2.1). Each box in the architecture diagram represents a layer of the model, and different colors represent different types of layers. In particular, we use a red box to represent the deleted filters, and the width of this box denotes the percentage of deleted filters in the current convolutional layer. The height of each box is proportional to the size of the feature map, and the number on the box is the number of filters in the corresponding convolutional layer.

The visualization of the filter evaluation is shown in Fig. 1-c2, which consists of a radar plot and a bubble plot (R1.3, R2.1). The radar plot shows the impact of the pruning plan on the current model. There are three dimensions of information in the radar chart, namely, the number of filters, the remaining sensitivity percentage, and the remaining instability percentage. The remaining percentage is the ratio of the metric between the model after this pruning iteration and the current model. The bubble plot on the right shows the sensitivity and instability of each filter in each layer. Each bubble represents a filter, and different layers have different colors. The X-axis represents the sensitivity value, and hence the bubbles closer to the right are the filters with more impact on the loss (i.e., important ones that should not be pruned). The size of a bubble represents the corresponding filter's instability, i.e., bigger bubbles correspond to larger values.

The pruning plan is shown in Fig. 1-c3, which lists the indices of the filters to be removed from each layer (R2.1). Each circle represents one filter, and the number on the circle is the index of the filter. The circles of different layers use different colors, consistent with the bubble plot above. The multi-color line under the circles is an overview of the number of filters to be removed in the pruning plan. Different colors represent different convolutional layers, and the length of a color segment represents the percentage of removed filters in the corresponding convolutional layer.

This view displays information of the model selected from the Tree view. The icons (i.e., the layer legends) on the right of the model architecture support the filtering of different layers. For example, when clicking the icon for convolutional layers, other layers, e.g., pooling and linear layers, become transparent to help users better focus on the convolutional layers in the analysis. There is a vertical slider in the bubble plot, and users can drag it to specify the pruning threshold. The bubbles on the left of the slider are shown in the pruning plan and represent the filters that will be removed in the current pruning. Meanwhile, the radar plot on the left shows the influence of the pruning on the number of filters, the sensitivity, and the instability (R2.1). Dragging the slider also changes the width of the red boxes in the model architecture diagram and the proportion of different colors in the multi-color segment of the pruning plan (R2.1). Additionally, the system provides a set of buttons on the right of the bubble plot to help users quickly move the slider to certain positions. Users can scale the bubble plot horizontally along the sensitivity axis to reduce occlusion between bubbles. They can also switch among different convolutional layers in the Filter view through the convolutional buttons between the radar plot and the bubble plot.
6.4 Filter View
The Filter view allows the user to conduct an in-depth analysis of a specific convolutional layer (R3.1, R3.2). As shown in Fig. 1-d, this view consists of a scatter plot and a filter visualization matrix. The points in the scatter plot represent the degenerated and improved instances in the test dataset, and the color represents the category of the exemplars. We use the t-SNE [39] algorithm to process the image instances and display them in the scatter plot. Our system uses the degenerated and improved instances to distinguish sensitive images, which efficiently narrows down the analysis scope. The selected image in the middle of the Filter view shows the point that the user clicked in the scatter plot. There are two lines of text at the bottom of the image: the first line shows the image name and its true label, and the second line shows the labels of the image before and after the pruning, separated by an arrow. In the filter visualization matrix, each item represents a filter, and the items with red borders will be deleted in the current pruning. The image in each item is the visualization of the filter. The area chart on the top right of the item shows the distribution of pixel values of the filter visualization image. The blue and green bars below the area chart represent the sensitivity and instability of the filter, respectively.

When the user selects a node in the Tree view, the system retrieves the degenerated and improved data instances according to the selected node and its child node. The user can switch the displayed convolutional layer in the Filter view by clicking on the convolutional buttons in the Model view (between the radar plot and the bubble plot). The scatter plot supports the filtering of different types of data instances through the icons on the upper right corner. After the user clicks one point in the scatter plot, the selected image and the matrix view on the right are updated accordingly to reflect the selection. In the matrix view, the user can double-click any item to add/delete the corresponding filter to/from the current pruning plan.
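As an illustration of how the scatter plot can be produced, the sketch below projects the degenerated and improved instances into 2-D with scikit-learn's t-SNE [39]. Flattened raw pixels are used as features here for simplicity; the actual system may use a different representation.

```python
import numpy as np
from sklearn.manifold import TSNE

def project_instances(images):
    # images: (N, C, H, W) array of degenerated + improved instances.
    feats = images.reshape(len(images), -1)  # flatten each image to a vector
    # 2-D coordinates for the Filter view's scatter plot.
    return TSNE(n_components=2, init="pca").fit_transform(feats)
```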
7 CASE STUDIES
In this section, we present three case studies showing how CNNPruner can assist pruning, improve pruning efficiency, and optimize pruning plans.
The MNIST dataset [22] is a commonly used classification dataset. It contains 60,000 images for training and 10,000 images for testing. We train a two-layer CNN to perform this classification task, with 32 filters in the first convolutional layer and 64 filters in the second. The network structure is shown in Fig. 6-c1, and the well-trained model reaches a prediction accuracy above 98%.

Fig. 5. The sensitivity and instability distribution of the root model. The radar chart shows the influence of removing one third of the filters.

Before pruning with CNNPruner, we need to set some necessary parameters. First, we configure the dataset parameters to tell the system where the dataset is. Then, we set the fine-tuning parameters (e.g., the Delta Loss and the Maximum Epoch). The root model appears in the Tree view after setting the above parameters. By selecting this root node, we can observe the sensitivity and instability distribution of the model in the Model view (Fig. 5). For one pruning iteration, we want to minimize the impact on sensitivity while maximally decreasing the instability and the number of filters. From the estimated pruning results (in the radar chart), we see that removing one third of the filters will preserve 96% of the sensitivity and reduce 38% of the instability. We therefore believe we can use the 1/3 auto-pruning plan. In the Tree view, we automatically prune the model and generate a pruning tree.

Fig. 6-a is the pruning tree for this auto-pruning process. It shows that the number of CNN filters is reduced from 96 to 10 after six pruning iterations. The prediction accuracy changes marginally in the first five iterations, and the fine-tuning process converges well. The pruned model from the sixth iteration failed to meet our requirement, i.e., its accuracy dropped below the desired level. We then used CNNPruner for further analysis of the auto-pruning process. Fig. 6-b1 shows the recovery ability and the volatility of the six pruned models. As demonstrated by the short and light blue bars, the "damage" introduced by the first three pruning operations is small, and the pruned models can easily recover from it. Starting from the fourth iteration, the resilience of the model decreases, and the accuracy fluctuates more significantly. Fig. 6-b2 shows the model's loss over the six fine-tuning iterations: the loss can be reduced to the same level after each of the first five fine-tuning iterations, but for Model 6 the pruning has a large impact on the loss, and the model cannot recover its accuracy even after 30 epochs of retraining. Therefore, we believe that the parameters of Model 6 are not enough to support the original accuracy. From the statistics in Fig. 6-b1 and 6-b2, we conclude that Model 5 is the best candidate model to meet the compression goal. Fig. 6-b3 and 6-b4 show how much the amount of computation in one forward pass and the number of parameters of Model 5 have been reduced.

Using CNNPruner, we can reveal model-pruning details, such as model convergence, model accuracy, recovery ability, loss fluctuation, and recovery cost. These details help the user better understand the state changes of the model in the pruning process and evaluate the fine-tuning process.

Fig. 6. The result of CNN pruning. The system executed six prunings to get six models. The Statistics view shows the information for Model 6.

Fig. 7. The Cat&Dog dataset and the CNN model architecture.
Our second study presents the case of using the Cat&Dog dataset [1] to interactively achieve a pruning goal. The Cat&Dog dataset contains 25,000 images of cats or dogs (the two classes). We randomly select 10,000 cat images and 10,000 dog images as the training dataset; the rest of the images are used for testing. A CNN with six convolutional layers is trained to differentiate cats from dogs, and its structure is shown in Fig. 7. The original well-trained model, before any compression, achieves a prediction accuracy of 92.76%. The model contains 2,200 filters and 6.88 million parameters, with a size of 26.30 MB. A single forward pass of the CNN needs 4.6 GFLOPs.

The desired pruning goal is to maximally shrink the model while maintaining the prediction accuracy above 92%. CNNPruner can help the user choose the optimal pruning solution by analyzing the pruning process and revealing the pruning details, so as to improve the pruning efficiency and ensure the accuracy of the pruned model. To demonstrate this, we use manual+estimator pruning in this study, which includes two major stages. The first stage relies on statistical information and immediate visual feedback from the system to remove the filters. The second stage uses the estimator to remove filters interactively at a much finer granularity. In addition, this section compares the manual+estimator pruning with the automated-only pruning and the automated+estimator pruning to show its advantages.
Stage 1: Rough-Pruning with Interactive Estimation of Thresholds (R2.2, R2.3). After setting the dataset parameters and fine-tuning parameters, we use the bubble plot in the Model view to interactively probe and determine the number of filters to be removed (Fig. 8). As shown in Fig. 8-b, removing 50% of the filters does not seem to significantly impact the model's sensitivity (a change of 6%), while considerably reducing the instability (a change of 85%). Therefore, we decided to remove 50% of the filters. After one round of fine-tuning, we get Model 1 and the statistical information corresponding to this model, as shown in Fig. 8-e. These statistics reflect the difficulty level of the fine-tuning process. For example, although the accuracy of Model 1 meets the requirements, the accuracy fluctuated significantly over fine-tuning (reflected by the long strip in Fig. 8-e1). Also, the model's training loss reduced a lot over the fine-tuning process (Fig. 8-e2). With these observations, we decided to remove fewer filters in the next iteration to guarantee a quick recovery. Note that, if the pruned model could not be recovered after pruning 50% of the filters, we would restart from the root node.

Fig. 8. The first stage of manual pruning. The Statistics view shows the information corresponding to Model 8.

In the second pruning iteration, we decided to delete 25% of the filters (based on our observations of Model 1's statistics). As expected, the accuracy fluctuation and the training loss changed much less in the pruning from Model 1 to Model 2 (i.e., the second pruning did not damage the model as significantly as the first pruning iteration).

We repeated the above pruning process with on-demand human intervention until the model no longer met the requirements. Over this iterative process, we get a pruning tree, as shown in Fig. 8-a. As the pruning proceeds, the instability of the model gradually decreases (i.e., from Fig. 8-b, c, to d, the instability changes from 15%, over 66%, to 73%). Meanwhile, the accuracy fluctuation becomes more and more violent (i.e., from Model 2 to Model 8, the fluctuation range grows from 92%~94% to a much wider range starting around 80%). With such information, users of CNNPruner can directly control the pruning strategy to improve pruning efficiency and prevent the model from being excessively damaged.
Stage 2: Fine-Pruning with a Real-Time Estimator (R2.3). From the pruning tree obtained in the first stage (Fig. 8-a), we can see that the number of filters in the target model should be between that of Model 7 and Model 8. At this stage, the estimator of CNNPruner can be used to help the user better estimate the number of filters to be removed next. In the first estimation, the target number of filters given by the estimator is 182. Therefore, we prune Model 7 to Model 9, i.e., we remove 19 (201−182) filters. Using the estimator again, we find that the number of filters in the target model is 174 (Fig. 9). At this point, the gap in filter numbers between the target model and the current model is only 8, so we decided to terminate the pruning.

The pruning process reduced the storage of the model from 26.30 MB to 188 KB. The accuracy of the final pruned model is 92.64% (92.96% for the cat class and 92.32% for the dog class, Fig. 9), i.e., the accuracy is reduced by only 0.12% compared with the root model. The parameters of the model are reduced by 99.44%, and the computation needed for processing an image is reduced by 98.58%.
Comparison of three pruning strategies. To highlight the pruning efficiency of the manual+estimator pruning, we compare it with two other pruning strategies, i.e., automated pruning and automated+estimator pruning, as shown in Fig. 10. The automated pruning in Fig. 10-a uses the 1/2 auto-pruning plan, i.e., removing half of the filters in each pruning iteration. From the result, we can see that Model 3 is the final pruned model, and the results are worse than those of the other two strategies. If we used a smaller removal number, e.g., removing 1% of the filters in each pruning, we would get a better result, but it would also increase the number of pruning iterations, costing more computing resources and making the pruning less efficient. Therefore, automated pruning is inflexible and can hardly achieve the best performance. The automated+estimator pruning in Fig. 10-b contains two stages: the first stage uses the 1/2 auto-pruning plan, and the second stage uses the estimator for finer-granularity pruning. From the result, we can see that the estimator provides guidance for fine-pruning to help the user get an optimal model. However, the large range between Model 3 and Model 4 is not preferable for the second stage of estimation, as it may affect the estimator's performance. Besides, the pruning strategy in Fig. 10-b used about 21% more fine-tuning time (in terms of total training epochs) than the strategy in Fig. 10-c (manual+estimator pruning). From these comparisons, we can clearly see how human intervention in the pruning process helps improve the pruning efficiency.

Fig. 9. The second stage of manual+estimator pruning. The Statistics view shows the information corresponding to Model 9.

Fig. 10. Comparison of three pruning strategies: (a) automated pruning, (b) automated+estimator pruning, (c) manual+estimator pruning.

As shown in [10], there should be an optimized sparse sub-network structure in a complex DNN, which can use fewer parameters to achieve the same accuracy. Model pruning is an effective way to find this kind of sparse sub-network structure. Our system aims to detect whether the sub-network has been damaged during pruning, and in turn, improve the effectiveness and efficiency of model pruning.
Our third study presents the case of using an image dataset of nature scenes [2] to diagnose the pruning process. The dataset contains 17,034 images in 6 classes, with 14,034 for training and 3,000 for testing. The 6 categories are: 'buildings', 'forest', 'glacier', 'mountain', 'sea', and 'street'. Example images from individual classes are shown in Fig. 11.
Fig. 11. Example images from the scene classification dataset.
A CNN classifier with six convolutional layers is used in this case, and its structure is shown in Fig. 12-b. The original well-trained model, before any compression, achieves a prediction accuracy of 86.10%. Our pruning goal is to maximally shrink the model while maintaining the prediction accuracy above 85.00%. We used CNNPruner to prune the model and obtained the pruning tree in Fig. 12-a. After pruning, we reduced the number of filters in the model to 130; the changes in the model structure are shown in Fig. 12-b, c, d. Model 6 is our final pruned model, and its accuracy is 85.16%. By analyzing the confusion matrix, we found that the model's recognition accuracy for 'buildings' dropped sharply from Model 4 to Model 6 (see Fig. 12-e2).

It is worth mentioning that a model's recognition power for different classes may not be equally important in various tasks. For example, in autonomous driving, recognizing pedestrians around a car is far more important than recognizing the mountains several miles away. Therefore, in some model pruning tasks, domain experts care more about maintaining the model's recognition power for certain classes. In this case, we use CNNPruner to present an in-depth analysis of the abnormal changes of the accuracy value, and demonstrate how the system can help refine the pruning plan to reduce such impact.
Refining the Pruning Plan (R3.2). CNNPruner can be used to secure the prediction accuracy for the 'buildings' class while maximally compressing the model. From Model 4 to Model 6, the model's overall accuracy descends by 0.3%, resulting in 168 degenerated images and 159 improved images. 40 of the 168 degenerated images and 10 of the 159 improved images have the true label 'buildings'.

We analyze the degenerated 'buildings' instances to find out why the pruning affects the recognition of this particular class. Fig. 13 shows two degenerated instances of the class 'buildings'. From the filter visualization matrix, we can see that the system deletes the filters that have the lowest sensitivity and highest instability, i.e., Filter 0 and Filter 5 (see the blue and green bars on the right of the filter visualizations). However, for the class 'buildings', the features captured by these two filters are not the least important. The area chart in the upper right of each filter visualization displays the distribution of pixel values of the filter visualization image (feature map). In general, the more concentrated the distribution is, the sharper the extracted features are. Comparing the eight distributions, Filter 1 and Filter 6 are the least important ones (for 'buildings'): the pixel value distributions of these two filters are more chaotic than the others, and there are more noises in the corresponding feature maps. The decision of deleting Filter 0 and Filter 5, rather than Filter 1 and Filter 6, reduces the model's power in recognizing 'buildings', which is hard to recover in the subsequent fine-tuning process.

Based on the above observation, we decided to refine the pruning plan by removing Filter 1 and Filter 6 but keeping Filter 0 and Filter 5. We set up a new branch from Model 4 and pruned it with the refined plan to get Model 7; the result is shown in Fig. 1. The accuracy of Model 7 is 85.40% overall and 87.41% for the class 'buildings'. Therefore, our system optimized the pruning plan through this in-depth analysis of the filters.

To avoid the influence of randomness introduced during the fine-tuning process, we repeated the pruning multiple times to validate whether our refined pruning plan is indeed better. Specifically, we pruned Model 4 twenty times: ten times with the original plan and ten times with the refined plan, yielding 20 pruned models.
Fig. 12. The result after model pruning. (e2) The accuracy changes for the class 'buildings'. (e3) The accuracy changes for the class 'mountain'.

Fig. 13. Examples of degenerated instances from Model 4 to Model 6.
                                Original Plan    Refined Plan
  All Categories   Degenerated        158.3           157.0
                   Improved           149.7           152.1
                   Accuracy          85.18%          85.30%
  'buildings' Only Degenerated         35.1            20.2
                   Improved            12.1            17.8
                   Accuracy              —               —

Table 1. The statistics of two pruning plans (averaged over 10 runs).
Their statistics are shown in Table 1. From the table, we can see that the refined pruning plan effectively mitigates the accuracy drop of the class 'buildings' when pruning Model 4.
Interpreting the Pruning Process (R3.1). From Model 4 to Model 6, the accuracy for the class 'mountain' increased by 11.62% (Fig. 12-e3), resulting in 20 degenerated instances and 81 improved instances for this class. With CNNPruner, we can interpret what has contributed to the model improvement over the pruning. As shown in Fig. 14, we selected some images to analyze why the pruning plan improved the accuracy of 'mountain'. The image in Fig. 14-a was initially mis-classified as 'sea' by the model. The pruning removed Filter 5, which extracted the majority of the pixels for 'sea' in the image. As a result, the pruned model believes the image is more like a 'mountain' than a 'sea'. Similarly, in Fig. 14-b, Filter 5 mostly extracted the glacier features, which is probably why the image was mis-classified as 'glacier' before pruning. Removing these noisy features makes the model concentrate more on the mountain and generate the correct prediction of 'mountain'.
Fig. 14. Examples of the improved instances from Model 4 to Model 6.
Identifying Confusing Images. Additionally, from the investigations of the degenerated image instances (from Model 4 to Model 6) with CNNPruner, we also found images with improper labels. For example, the image in Fig. 15 is one of the degenerated instances with the true label 'buildings'. The original image contains both a street and buildings, and the street takes up a major portion of the image. Although the image is labeled as 'buildings', we feel 'street' is more proper for it. As this image only confuses the model, we recommend removing it from the test dataset, which can make the model evaluation more objective.

Fig. 15. The confusing image example from Model 4 to Model 6.
8 DISCUSSION AND DOMAIN EXPERTS' FEEDBACK
We conducted open-ended interviews with two machine learning experts (E1 and E2) to discuss the strengths, weaknesses, and potential extensions of CNNPruner. The experts' research interests lie in accelerating deep neural networks, and model pruning is an important portion of their research work. We first introduced the design goal of CNNPruner and the individual visualization components to them (in about 30 minutes). With their background in model pruning, the experts quickly picked up the domain-related concepts and understood the functionality of individual components, though it still took them some time (about 60 minutes) to get familiar with the visualization and interaction of the system. We then went through the cases presented in Sect. 7 and asked them to freely play with the system and provide feedback.

In general, both experts felt positive about CNNPruner, and they believed that the model pruning process can be clearly and intuitively presented through visualization techniques. E1 liked the Tree view the most, as it can quickly reveal the evolution of the pruned models and allows users to reprocess the pruning interactively. The estimator in the Tree view was very interesting to him, and he agreed that it could effectively help users determine the pruning depth in the last pruning stage. E2 appreciated the progressive pruning method proposed in CNNPruner: through the proposed criteria (i.e., recovery capability, loss fluctuation, and recovery cost), domain experts can evaluate the pruning process more objectively. Both experts were glad to see the effectiveness of the Filter view in interpreting CNNs and refining pruning plans. With existing techniques, it is still hard for them to thoroughly understand the model pruning process from numerical statistics alone. CNNPruner provides a practical way for them to interpret individual filters visually and understand their roles over the pruning process. Moreover, both experts agreed that the concepts of degenerated and improved instances are beneficial in effectively identifying images of interest.

The experts also pointed out several limitations of CNNPruner, as well as some improvements that can be applied in the future. For example, one expert mentioned that for models with many classes, the Confusion Matrix view may not scale well; we plan to improve this view by supporting the filtering of different classes. The experts also provided domain feedback on how to proceed further along this research direction. One suggested extending model pruning to fully connected layers, as the parameters from these layers can take a considerably large portion of the network in many scenarios. The other recommended enhancing the system by supporting comparisons of different pruning criteria. As model pruning is still a fast-growing topic, he believed more and more criteria will be proposed; with our system, researchers can more intuitively compare different pruning plans, which, in turn, will help them optimize the pruning process.

9 CONCLUSION
In this work, we proposed CNNPruner, a visual analytics system that helps machine learning experts understand, diagnose, and refine the CNN pruning process. CNNPruner contains four visualization components that work together to reveal model details at different levels over the iterative pruning process. Two criteria and three metrics are used in CNNPruner to estimate filters' importance before pruning and to evaluate the pruned model's quality after pruning; both the pre-estimation and the post-evaluation help users make and refine their pruning plans. Moreover, the capability of CNNPruner to thoroughly examine the degenerated and improved data instances within one pruning iteration plays an essential role in interpreting and diagnosing the pruned model. Through multiple case studies on CNN models with real-world sizes, we validated the effectiveness of CNNPruner.

ACKNOWLEDGMENTS
This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, grant No. XDA19080102. The work was started at The Ohio State University when Guan was visiting the GRAVITY research group. The authors would like to thank all GRAVITY members for their suggestions and insightful discussions.