CNNPruner: Pruning Convolutional Neural Networks with Visual Analytics

Guan Li, Junpeng Wang, Han-Wei Shen, Kaixin Chen, Guihua Shan, and Zhonghua Lu
Fig. 1. CNNPruner: (a) the Tree view helps to track different pruning plans; (b) the Statistics view presents model-critical statistics to monitor the pruned models; (c) the Model view enables users to interactively conduct the pruning with informative visual hints from different criteria; (d) the Filter view presents details of individual filters for users to investigate and interactively prune them.
Abstract—Convolutional neural networks (CNNs) have demonstrated extraordinarily good performance in many computer vision tasks. The increasing size of CNN models, however, prevents them from being widely deployed to devices with limited computational resources, e.g., mobile/embedded devices. The emerging topic of model pruning strives to address this problem by removing less important neurons and fine-tuning the pruned networks to minimize the accuracy loss. Nevertheless, existing automated pruning solutions often rely on a numerical threshold of the pruning criteria, lacking the flexibility to optimally balance the trade-off between efficiency and accuracy. Moreover, the complicated interplay between the stages of neuron pruning and model fine-tuning makes this process opaque and therefore difficult to optimize. In this paper, we address these challenges through a visual analytics approach named CNNPruner. It considers the importance of convolutional filters through both instability and sensitivity, and allows users to interactively create pruning plans according to a desired goal on model size or accuracy. CNNPruner also integrates state-of-the-art filter visualization techniques to help users understand the roles that different filters play and refine their pruning plans. Through comprehensive case studies on CNNs of real-world sizes, we validate the effectiveness of CNNPruner.
Index Terms—visualization, model pruning, convolutional neural network, explainable artificial intelligence

• Guan Li, Kaixin Chen, Guihua Shan, and Zhonghua Lu are with Computer Network Information Center, Chinese Academy of Sciences. They are also with University of Chinese Academy of Sciences. E-mail: {liguan, sgh, zhlu}@sccas.cn, [email protected].
• Junpeng Wang is with Visa Research. E-mail: [email protected].
• Han-Wei Shen is with The Ohio State University. E-mail: [email protected].
• Guihua Shan is the corresponding author.
1 INTRODUCTION
Convolutional neural networks (CNNs) have demonstrated extraordinarily good performance in many applications, such as image classification, object detection, and speech recognition [11, 20, 21, 34, 35]. The recent improvements in CNNs' performance often come at the cost of model size, and it is increasingly common to see models with hundreds of layers and millions of parameters. For example, VGG-16 [35], a commonly used model for classification tasks, has ~138 million parameters and takes hundreds of megabytes of storage, which makes it hard to deploy on resource-limited devices. Model pruning strives to mitigate this cost: LeCun et al. [23] first improved the efficiency of their neural networks by removing unimportant model parameters (weights) based on information theory metrics. In general, model pruning algorithms can be divided into structured pruning and unstructured pruning [17]. Compared to unstructured pruning, which requires support from additional hardware to achieve excellent performance, structured pruning has gradually dominated recent developments and has become a hot research topic. Most notably, filter pruning is an effective structured pruning method, which directly prunes filters that are less relevant to the prediction outcomes to reduce a model's size. There are three key steps in a typical filter pruning pipeline: 1) filter evaluation; 2) filter pruning; 3) model fine-tuning. Frequently, the pipeline is executed in an automated yet iterative manner (Fig. 2), where the filters are removed based on hard thresholds, and the model is pruned multiple times to achieve the desired compression goal without significantly compromising its accuracy.

Fig. 2. The iterative model pruning process of CNN models.

Nevertheless, existing automated CNN pruning solutions lack the flexibility to optimally balance the trade-off between pruning efficiency and prediction accuracy. Automated pruning usually removes a fixed number or a fixed percentage of convolutional filters in each pruning iteration. If too many filters are removed in one iteration, the model will be severely damaged and difficult to recover. Conversely, if too few filters are deleted, the effectiveness of pruning will be significantly undermined. In practice, the degree of "over-parameterization" is related to the corresponding CNN's size and the type of computer vision task, and in turn, different models have different abilities to recover from the damage caused by filter removal. Using a fixed numerical threshold as the criterion to remove filters in each pruning iteration ignores the characteristics of each model and may not lead to the optimal pruning solution. Moreover, the complicated interplay between the stages of filter pruning and model fine-tuning makes this process difficult to control, i.e., the automated pruning focuses solely on the accuracy of the pruned model, but pays little attention to the intermediate state changes.
As a result, the anomalies accumulated in the iterative pruning process may get enlarged, which affects the pruning efficiency and eventually impacts the accuracy of the pruned model.

Focusing on the above challenges, we propose CNNPruner, a visual analytics system to help deep learning experts create interactive pruning plans and evaluate the pruning process. CNNPruner contains four main visualization components (Fig. 1): (a) the Tree view helps to overview and track the altered models from iterative pruning stages; (b) the Statistics view presents the loss/accuracy fluctuation, model recovery capability, and recovery cost to help users adjust the pruning strategy in time; (c) the Model view, facilitated with two new metrics of instability and sensitivity, evaluates the importance of different CNN filters and enables users to interactively create pruning plans; (d) the Filter view reveals the roles that different filters play in the prediction process and helps users interpret and prune the CNN model. We conducted case studies with CNNPruner on CNNs of real-world sizes to validate its effectiveness. To sum up, the contributions of our work are:

• We design and develop a visual analytics system to help deep learning experts progressively analyze the CNN pruning process and introduce interactive intervention to the process on demand.
• We introduce two metrics (instability and sensitivity) to assist model designers in better estimating filters' importance before pruning, and three criteria (recovery capability, loss fluctuation, and recovery cost) to evaluate the pruned model.
• We examine data instances where the intermediate pruned models behave differently, to study the critical filters and interpret the internal working mechanism of CNNs.
2 RELATED WORKS
In this section, we introduce the concept of model pruning and its development in the field of deep learning. We also review visual analytics works on interpreting and diagnosing deep learning models.
2.1 Model Pruning
Model pruning compresses deep learning models by removing less important parameters, seeking a trade-off between model size and prediction accuracy. To date, many works have achieved good performance in neural network pruning, and we can roughly divide them into two categories [17]: weight pruning and filter pruning.

Weight pruning is an unstructured pruning method that deletes or compresses the weights in a filter. Han et al. [15] proposed a method to reduce the model size by removing unimportant connections. They applied this method to CNN models trained on the ImageNet dataset and reduced the parameters by about 89% for the AlexNet model and about 92% for the VGG-16 model. Han et al. [14] used weight sharing on the basis of removing the unimportant connections [15], and employed Huffman encoding to compress the weights to maximize the compression rate. Their experiment shows that, by compressing the VGG-16 model, the method can reduce the memory consumption from 552 MB to 11.3 MB without compromising the accuracy. Carreira-Perpinan et al. [6] proposed a method to find unimportant weights by minimizing loss changes while compressing those weights. Other studies have also achieved good results through weight pruning [8, 13, 16, 37, 38, 46], but weight pruning may cause unstructured sparsity and requires support from additional hardware to achieve excellent performance.

Filter pruning is a structured pruning method that directly removes convolutional filters from CNNs. Luo et al. [28] proposed a framework named ThiNet to help the user identify the unimportant filters by computing statistical information of adjacent layers. Li et al. [24] proposed an acceleration method for CNNs by removing filters and their feature maps, which reduces the computation of the VGG-16 model by 34%. Molchanov et al. [31] proposed a new Taylor expansion criterion to find the filters that have little influence on the loss value and remove them to reduce the model size. Other pruning studies along this line also show good results [9, 17, 18, 27, 43]. Filter pruning keeps the regular structure of the model but significantly reduces the computation and storage cost, making it a popular solution for model compression. We focus on this approach in this work as well.

Most of the aforementioned model pruning studies focus on proposing new pruning criteria and use a small fixed numerical threshold to determine the number of filters to be removed. This is because they mainly focus on the accuracy of the final model and concern themselves less with the intermediate pruning process. The small number of removals makes the model recover easily from the pruning and often leads to an optimal pruning result. However, it prolongs the pruning process, as more pruning iterations are needed to achieve the compression goal. The process is usually not efficient and may incur a higher computational cost to perform fine-tuning in each pruning iteration.
2.2 Visual Analytics for Deep Learning
Based on the taxonomy from [7, 26, 44], the visualizations for deep neural network (DNN) interpretation can roughly be categorized into three groups, targeting model understanding [19, 26, 29, 41], model debugging [25, 32, 33, 42, 45], and model refinement [5, 40].

To understand a DNN model, researchers usually use visualization techniques to show the internal structure and state information of the model. For example, CNNVis [26] uses directed acyclic graphs to formulate the model architecture and help domain experts understand CNNs through visualization. GANViz [41] helps the user understand generative adversarial networks (GANs) [12] by visualizing and comparing the internal model states (i.e., hidden activations) over the training process. GAN Lab [19] is an interactive visualization tool for non-experts to learn GAN models, and it significantly reduces the difficulty of understanding complex generative neural networks through visualization techniques.

To debug/diagnose a DNN model, researchers usually define visual evaluation methods to assist the analysis of the model. For example, DeepEyes [32] helps the user diagnose a CNN model by visualizing the convolutional layers and convolutional filters. Based on the activation level of different filters, this system improves the efficiency of model design by optimizing the network structure. DGMTracker [25] monitors and diagnoses the training process of deep generative models through the visualization of a large amount of time-series information.

For model improvement, researchers usually use visualizations to help users identify the weaknesses of the model. For example, DQNViz [40] exposes the details of the training process of deep Q-networks [30] and uses visualization techniques to extract useful patterns of the model to better control the training. Blocks [5] uses visualization techniques to analyze the impact of class hierarchy on the training of CNN models; using the analysis results, the tool can accelerate model convergence and alleviate the problem of overfitting.

These studies have proved the effectiveness of visualization and visual analytics in the machine learning field. Our work focuses on CNN model pruning and uses visualization to help deep learning experts better understand and improve the pruning process of CNN models. We believe that, with visualization and visual analytics, our system can effectively improve the efficiency of model pruning.
3 BACKGROUND AND CONCEPTS
This section introduces the basic concepts of model pruning and a state-of-the-art filter visualization technique. Following them, we introducethe metrics used in this work and propose a novel evaluation concept.
3.1 Filter Pruning
This section describes the details of each step in the filter pruning process and introduces the Taylor expansion based filter evaluation.
3.1.1 Filter Evaluation via Taylor Expansion
Our work uses the Taylor expansion criterion [31] for filter pruning. Its idea is to remove filters and check how significantly the removal impacts the loss function, i.e., to examine the importance of filters by perturbation. The resulting importance values can then be used to prioritize filters during pruning. Mathematically, this process can be denoted as:

ΔL(f_i) = |L(D, f_i = 0) − L(D, f_i)|    (1)

where D is the training data, L(·) is the loss function, f_i is the output (i.e., feature map) produced by filter i, L(D, f_i) is the loss before any model perturbation, and L(D, f_i = 0) is the loss when f_i is removed.

Physically removing individual filters and recomputing the loss for each removal is computationally expensive. However, the process can be approximated through a Taylor expansion, as demonstrated in [31], i.e.,

L(D, f_i = 0) ≈ L(D, f_i) − (∂L/∂f_i) f_i    (2)

ΔL(f_i) can then be transformed as follows:

ΔL(f_i) = |L(D, f_i) − (∂L/∂f_i) f_i − L(D, f_i)| = |(∂L/∂f_i) f_i|    (3)

In Equation 3, we need to calculate the product of the feature map and the gradient (of the loss function w.r.t. the feature map) to get the estimated cost of removing the corresponding filter, and this value can be calculated through back-propagation. After the calculation, ℓ2-normalization is used to normalize the set of ΔL values resulting from removing individual filters. With the normalized importance values, we can prioritize all filters and prune the less important ones. We call this process of choosing a proper importance criterion to prioritize filters and deciding the number of less important ones to remove a pruning plan. Our objective is to derive efficient and effective pruning plans through interactive visual analytics.

3.1.2 Fine-Tuning and Pruning Iterations
After removing the less important filters, the model structure is slightly damaged, and its accuracy drops. To recover the accuracy, we need to retrain the model using the training dataset. As most of the important filters are still retained in the model, the original accuracy can usually be recovered within a few training epochs. This process, i.e., retraining the CNN model to recover its accuracy, is called fine-tuning.

As described in Fig. 2, filter evaluation, filter pruning, and fine-tuning constitute one pruning iteration. Repeating the process multiple times, we generate the final pruned CNN model.
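Since our back-end is built on PyTorch, the Taylor criterion of Equation 3 can be computed with forward/backward hooks. The following is a minimal, illustrative sketch (the function and variable names are ours, not the system's actual code): it accumulates |(∂L/∂f_i) f_i| for every filter of one convolutional layer over a data loader and ℓ2-normalizes the result.

```python
import torch
import torch.nn as nn

def taylor_importance(model, data_loader, loss_fn, conv_layer, device="cpu"):
    # Estimate per-filter importance |(dL/df_i) * f_i| (Equation 3)
    # for one nn.Conv2d layer, then l2-normalize across its filters.
    model.to(device).eval()
    store, scores = {}, None
    fwd = conv_layer.register_forward_hook(
        lambda m, inp, out: store.update(act=out))
    bwd = conv_layer.register_full_backward_hook(
        lambda m, gin, gout: store.update(grad=gout[0]))

    for images, labels in data_loader:
        images, labels = images.to(device), labels.to(device)
        model.zero_grad()
        loss_fn(model(images), labels).backward()
        # Average grad*activation over spatial dims, take |.|, then
        # average over the batch: one importance value per filter.
        contrib = (store["grad"] * store["act"].detach()) \
            .mean(dim=(2, 3)).abs().mean(dim=0)
        scores = contrib if scores is None else scores + contrib

    fwd.remove()
    bwd.remove()
    return scores / scores.norm(p=2)  # normalized sensitivity per filter
```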
3.2 Filter Visualization
Our primary goal in this work is to remove less important filters. Therefore, we need a proper filter visualization technique to reveal what features individual filters have captured and to verify their importance. Guided back-propagation [36], one of the state-of-the-art filter visualization techniques, is adopted in our work.

Given an input image, this algorithm first performs a forward pass to the target network layer. It sets all activations of that layer to zero, except the one extracted by the filter that we want to analyze. Next, the algorithm propagates the non-zero activations back to the input image to highlight what was extracted by the corresponding filter. The resulting filter visualization image therefore has the same size as the input image and highlights what the individual filter has captured. We adopt this filter visualization technique, as it works well in interpreting filters in deeper CNN layers [36]. It has also been adopted by multiple other model interpretation works [40].

Fig. 3 shows some filter visualization examples produced by this guided back-propagation technique. Four filters from Layer 0 of a 6-layer CNN are visualized, taking a mountain image as input. From the highlighted regions in the filter visualization results, Filter 0 and Filter 3 capture the silhouette features of the mountain, whereas Filter 1 and Filter 2 capture its texture features.
Fig. 3. An example of filter visualization. The input is a mountain image.Filter 0 and Filter 3 capture the silhouette features of the mountain,whereas Filter 1 and Filter 2 capture the texture features of the mountain.
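The procedure above can be sketched with PyTorch hooks as follows. This is an illustrative implementation of guided back-propagation under the assumption that the model uses non-inplace nn.ReLU modules; the names are ours, not the system's actual code.

```python
import torch
import torch.nn as nn

def guided_backprop(model, image, target_layer, filter_idx):
    # Visualize one filter: forward the image, keep only the target
    # channel's activation at `target_layer`, and propagate it back
    # to the input image.
    model.eval()
    handles, store = [], {}

    # Guided ReLU: the standard ReLU backward already zeroes gradients
    # where the forward input was negative; additionally clamp negative
    # gradients to zero on the way back.
    def clamp_grad(module, grad_in, grad_out):
        return (torch.clamp(grad_in[0], min=0.0),)

    for m in model.modules():
        if isinstance(m, nn.ReLU):  # assumes ReLU(inplace=False)
            handles.append(m.register_full_backward_hook(clamp_grad))
    handles.append(target_layer.register_forward_hook(
        lambda m, i, o: store.update(act=o)))

    x = image.clone().requires_grad_(True)
    model(x)                                   # forward pass
    act = store["act"]                         # (1, C, H, W) feature maps
    seed = torch.zeros_like(act)
    seed[:, filter_idx] = act[:, filter_idx]   # keep only the chosen filter
    act.backward(gradient=seed)

    for h in handles:
        h.remove()
    return x.grad                              # same size as the input image
```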
3.3 Sensitivity and Instability
Based upon the Taylor expansion algorithm explained in Sect. 3.1.1, we define one criterion and propose a new metric as another criterion for filter pruning in our work.

The Sensitivity of a filter reflects the filter's impact on the model's loss when it is removed. It is calculated using the ℓ2-normalized ΔL (Equation 3). A filter with a lower Sensitivity value should be removed first to reduce the impact on the model.

Notice that repeating the sensitivity calculation for the same filter multiple times may result in different sensitivity values, due to the randomness inherited from the statistical parameter update process. Specifically, the updates of CNN model parameters are often performed in units of data batches. Feeding data batches into a CNN in shuffled orders results in different parameter update orders and scales. The impact of this randomness is usually marginal for important filters, as their sensitivity values are always large. However, for less important filters, the sensitivity values are minimal and can easily be influenced by this randomness. Therefore, the sensitivity orders of these less important filters may differ considerably across calculations.

We introduce the metric Instability to accommodate the above issue, which is defined as the mean absolute deviation of a filter's ranks from different calculations, i.e.,
Instability(f_j) = (1/n) Σ_{i=1}^{n} |Rank_i(f_j) − Rank(f_j)|    (4)

where n is the total number of times we computed the sensitivity for individual filters, Rank_i(f_j) is the ranking of the j-th filter in the i-th computation, and Rank(f_j) is the average ranking of filter j over the n computations. The instability of a filter reflects the uncertainty of its removal order, and often, a filter with a higher instability is less important. We set n to a fixed value in all of our experiments.
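As an illustration, Equation 4 can be computed from n repeated sensitivity calculations as follows (a minimal NumPy sketch with our own names):

```python
import numpy as np

def instability(sensitivity_runs):
    # sensitivity_runs: (n, F) array holding n repeated sensitivity
    # computations for F filters. Rank filters within each run, then
    # take the mean absolute deviation of each filter's rank (Eq. 4).
    ranks = sensitivity_runs.argsort(axis=1).argsort(axis=1).astype(float)
    mean_rank = ranks.mean(axis=0)                 # average rank per filter
    return np.abs(ranks - mean_rank).mean(axis=0)  # (F,) instability values
```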
3.4 Degenerated Instances and Improved Instances
Each pruning iteration improves and degenerates the model a little bit, and its prediction accuracy also changes, i.e., some data instances in the test data receive different predictions from the original and the pruned models. To better index the subset of instances with different predictions from the two models, we define the following two concepts:

Degenerated Instances are images that are correctly predicted by the original model but incorrectly predicted by the pruned model, i.e., the pruning hurts the model's recognition ability on these images.
Improved Instances are images that are incorrectly predicted by the original model but correctly predicted by the pruned model, i.e., the pruning improves the model's recognition ability on these images.

The test dataset used by a CNN model usually contains many images, and it is difficult to analyze the effect of the pruning on every single image. The degenerated and improved instances help users quickly locate analysis targets among the massive number of images, which improves the analysis efficiency.
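For illustration, the two instance sets can be collected by comparing the per-image correctness of the original and the pruned model. The sketch below is a minimal example, assuming a non-shuffled PyTorch test loader; the function and variable names are ours, not the system's actual code.

```python
import torch

@torch.no_grad()
def split_instances(original, pruned, test_loader, device="cpu"):
    # Collect indices of degenerated instances (correct -> incorrect)
    # and improved instances (incorrect -> correct). Assumes the test
    # loader is NOT shuffled, so batch order maps back to image indices.
    degenerated, improved = [], []
    original.to(device).eval()
    pruned.to(device).eval()
    for batch_idx, (images, labels) in enumerate(test_loader):
        images, labels = images.to(device), labels.to(device)
        ok_orig = original(images).argmax(dim=1) == labels
        ok_pruned = pruned(images).argmax(dim=1) == labels
        base = batch_idx * test_loader.batch_size
        degenerated += [base + i.item()
                        for i in torch.nonzero(ok_orig & ~ok_pruned).flatten()]
        improved += [base + i.item()
                     for i in torch.nonzero(~ok_orig & ok_pruned).flatten()]
    return degenerated, improved
```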
4 DESIGN REQUIREMENTS
We worked with a couple of deep learning researchers and had discussions/interviews with them during the system design stages. We also investigated related works on model pruning to identify the challenges that deep learning experts are facing. From these discussions and literature reviews, we found that proposing effective pruning criteria is an important research topic, and the criteria are often evaluated by the accuracy of the pruned model. Based on different criteria, people often use a fixed number or a mathematical formula to decide the number of filters to be removed in each pruning iteration, which lacks flexibility and is usually not efficient. For example, a small removal count is often used to guarantee the model's recovery capability. However, the small number often leads to more pruning iterations, which inevitably prolongs the pruning process, costing more computing resources for model fine-tuning. Additionally, we noticed that even if the original and pruned models have similar prediction accuracy, their recognition power for different classes may be very different. Revealing these details, along with other model-level details (e.g., model architecture evolution, recovery capabilities from pruning), is very important to understanding the pruning process. Through the responses from the experts and our studies of the existing works, we have identified the following design requirements for CNNPruner.

• R1: Display different levels of information about the CNN models during pruning. Many intermediate CNN models are generated in the iterative process of model pruning, and our system needs to track and display the details of those models. Displaying this model information is the basis for understanding and exploring the pruning process, which requires CNNPruner to:
  – R1.1: track the intermediate models generated over the pruning process and index the models effectively.
  – R1.2: display the states of the pruned models and monitor the evolution of these states over the pruning process.
  – R1.3: visualize the internal structure of a selected CNN model (e.g., the original/intermediate/final pruned model) and its filters' attributes.

• R2: Interactively analyze and decide the number of filters to be removed in each pruning iteration. After each pruning, the model needs to be fine-tuned, and its prediction accuracy will change. The experts want to minimize the computational cost for fine-tuning but restore the accuracy as much as possible. Therefore, they expect CNNPruner to help them analyze the impact of pruning and select the appropriate removal amount in each pruning. We, therefore, design CNNPruner to be able to:
  – R2.1: estimate the influence of a pruning plan on the model before the pruning actually happens (i.e., pre-estimation).
  – R2.2: evaluate the quality of the pruning plan and the pruned model after each pruning (i.e., post-evaluation).
  – R2.3: assist the user in better selecting or optimizing the number of filters to be removed in each pruning iteration.

• R3: Understand the model pruning process and refine the pruning plan. The convolutional filters are the basic units to be removed in each pruning. An in-depth analysis of them can help the user better understand the pruning process and identify abnormal changes in the accuracy values for different classes of the studied dataset. Therefore, CNNPruner needs to be able to:
  – R3.1: visualize the filters of interest and help the user understand the roles that different filters played during pruning.
  – R3.2: interactively refine the pruning plan by adding or removing filters to be pruned to reduce undesired changes of the model over the pruning.
5 SYSTEM OVERVIEW
Fig. 4 shows the architecture of CNNPruner, which contains a back-end powered by PyTorch [4] and a web-based front-end for visualization and interaction. We use the Flask [3] library to support the communication between the back-end and the front-end.
Fig. 4. The architecture of CNNPruner, including a back-end powered by PyTorch and a web-based front-end visualization interface.
CNNPruner takes a pre-trained CNN model as input and outputs the pruned model. Users can flexibly interact with the four visualization components from the front-end to complete the above process. In detail, the Tree view lays out the pre-trained (tree root) and post-pruned CNN (tree leaves), as well as all intermediate pruned models, through a tree structure (R1.1). An estimator (R2.3) is equipped in this view to help users estimate a proper number of filters to be removed between adjacent tree nodes (i.e., CNN models). The Statistics view (Fig. 1-b) shows the evolution of the model's statistics over the process of pruning (R1.2), where users can evaluate the pruning scheme through these statistics (R2.2). The Model view (Fig. 1-c) presents the internal structure and the filter attributes of a selected tree node (i.e., a CNN model) from the Tree view (R1.3). It is the main component that allows users to interactively prune the selected model, and it provides immediate feedback on the pruning operation to guide users toward an optimal pruning plan. The Filter view (Fig. 1-d) presents details of the individual filters to help users interpret them and interactively refine the pruning plan (R3.1, R3.2). All four visualization components are coordinated, and they work together to meet the objective of helping experts understand, diagnose, and refine the pruning process of CNNs.
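As a concrete illustration of this architecture, the back-end could expose the pre-computed filter metrics to the front-end through a Flask route. The paper does not document the actual API, so the route, payload layout, and names below are hypothetical:

```python
# A minimal sketch of the back-end/front-end communication. Only the
# Flask+PyTorch split itself is described in the paper; this endpoint
# is illustrative.
from flask import Flask, jsonify

app = Flask(__name__)

# Filled by the PyTorch back-end: (model_id, layer_id) -> list of
# {"filter": index, "sensitivity": float, "instability": float}.
filter_metrics = {}

@app.route("/api/filters/<int:model_id>/<int:layer_id>")
def get_filter_metrics(model_id, layer_id):
    # The Model view's bubble plot is rendered from this payload.
    return jsonify(filter_metrics.get((model_id, layer_id), []))

if __name__ == "__main__":
    app.run(port=5000)
```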
6 VISUAL ANALYTICS SYSTEM: CNNPRUNER
CNNPruner (Fig. 1) is composed of four visualization components, demonstrating different levels of CNN information and the pruning process. We provide the details of the individual components in this section.
6.1 Tree View
The Tree view provides an overview of the iterative model pruning process (R1.1). The root and leaves of the tree are the original and the pruned models, respectively. Each branch of the tree (connecting the root to a leaf) chains a sequence of intermediate models from the iterative pruning process. For each node, we use two horizontal filled rectangles to denote the corresponding model's prediction accuracy and compression ratio, and one vertical rectangle to display the model ID. The system automatically generates the ID, and the ID of the root model is 0. The vertical position of a node is decided by the number of filters in the corresponding model (see the left vertical axis). The edges connecting a pair of parent-child nodes represent the fine-tuning process, where we use gray and purple lines to indicate whether the fine-tuning process converged or not after reaching the user-specified stop conditions (e.g., the maximum number of epochs). The Tree view can track the whole pruning process, and each node in the tree represents a model. When the mouse hovers over a node, a prompt bar appears to show the storage size of the corresponding model. The user can click on individual nodes of the tree to update the data displayed in other views. The node with the red border is the currently selected one.

The Tree view is also equipped with a pruning estimator to help users balance the trade-off between model size and prediction accuracy (R2.3). It estimates the number of filters in a model and the model's prediction accuracy by linearly interpolating these values from the pair of most adjacent nodes in the tree. This rough linear estimation works well in our experiments, and we expect more sophisticated interpolation algorithms to yield better results. The pruning estimator node is only visible when users press the black box icon on the top right corner of the Tree view, and users can flexibly drag it vertically to generate estimations dynamically.

This view has two important parameters to be configured before any pruning (using the buttons on the top of this view). One is the pruning mode, which can be automated or manual pruning. The other is the termination criteria for fine-tuning. They are explained as follows:
Auto/Manual-Pruning. For auto-pruning, users specify a fixed ratio of filters to be removed, e.g., 1/2, 1/3, or 1/4 of the total amount, and CNNPruner will iteratively remove the specified amount of filters (based on the pruning criteria) and fine-tune the model. The iterative process runs until the pruned model fails to meet the desired need (e.g., the prediction accuracy no longer meets the requirement). This process is automated but lacks pruning flexibility. Conversely, in manual-pruning, users can flexibly specify the number of filters to be removed (based on their distribution) in individual pruning iterations.
Termination Criteria for Fine-Tuning. CNNPruner has three termination criteria to finish the fine-tuning process: (1) the Delta Loss, i.e., the change of loss values, (2) the Target Accuracy, and (3) the Maximum Epoch. The fine-tuning is terminated if any of them is met.
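For concreteness, the pruning estimator's linear interpolation described above can be sketched as follows. This is a simplified reading of the description, with hypothetical tuple inputs; the system's actual implementation may differ.

```python
def estimate_filters(node_a, node_b, target_accuracy):
    # node_a, node_b: (num_filters, accuracy) of the two most adjacent
    # tree nodes. Linearly interpolate to estimate how many filters a
    # model reaching `target_accuracy` should keep.
    (f_a, acc_a), (f_b, acc_b) = node_a, node_b
    if acc_a == acc_b:
        return min(f_a, f_b)  # flat segment: prefer the smaller model
    t = (target_accuracy - acc_a) / (acc_b - acc_a)
    return round(f_a + t * (f_b - f_a))

# e.g., between a 201-filter model at 92.7% and a 150-filter model at
# 91.8%, a 92.5% target maps to roughly 190 filters (hypothetical numbers).
```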
6.2 Statistics View
The Statistics view (Fig. 1-b) displays detailed statistical information of the CNNs (R1.2). When the user selects a model in the Tree view, the system finds a path from the current node to the root node, and the models along the path form a pruning process. This view shows statistical information from multiple dimensions over the pruning process, organized in five components.
Confusion Matrix.
Fig. 1-b1 shows the confusion matrix of the current model (R1.2). Diagonal cells of the matrix represent the accuracy of true-positive instances (i.e., the percentage of correctly predicted images in one category), whereas non-diagonal cells represent the percentage of incorrectly predicted images. Cell values from small to large are mapped to colors from light blue to dark blue. Clicking any cell of the matrix shows a line chart presenting the value changes of the corresponding cell during the pruning process (i.e., the X-axis is the pruning iteration and the Y-axis is the cell value at each iteration). The line chart reflects the model's prediction power for a particular category across the pruning process.
Recovery Capability.
This sub-view (Fig. 1-b2) reveals the model's recovery capability after each pruning (R1.2), i.e., how difficult it is to regain the prediction power over the fine-tuning process. The X-axis is the model ID, representing different pruning iterations, whereas the Y-axis denotes the model's prediction accuracy in individual iterations. The gray curve connects the model's final accuracy values after the individual fine-tuning processes. The rectangular color stripe at each iteration shows the distribution of the accuracy values from different epochs of the corresponding fine-tuning. A longer stripe indicates a more significant accuracy change before and after the fine-tuning. If the pruned filters have little effect on the model, the recovery region will be very short and the accuracy fluctuation very small (i.e., a short strip with dark blue color). The information from this sub-view is an important criterion to evaluate the pruning plan.
Loss Fluctuation.
The Loss Fluctuation sub-view shows the loss changes in the process of fine-tuning (R1.2). The X-axis in the chart is the model ID, and the Y-axis is the loss value. The curve between two IDs represents the fluctuation of the training loss in the fine-tuning process between the two models. The importance of a filter is estimated based on how significantly the loss changes when it is removed, and the loss values, quantifying the inconsistency between the predicted label and the true label, can effectively monitor the model's evolution. If the pruning plan is good, its impact on the loss will be small. Therefore, the loss fluctuation is another important criterion to measure the pruning plan, and this sub-view helps users analyze the fine-tuning process by displaying it.
Recovery Cost.
The Recovery Cost sub-view shows the number of epochs in the fine-tuning process through a bar chart (R1.2). The X-axis of the chart is the model ID, and the Y-axis is the epoch count. If the pruning plan has little effect on the model, only a small number of training epochs is needed in the fine-tuning process to recover the accuracy. Conversely, if over-pruning happens, it is difficult to recover the accuracy even with many training epochs. Therefore, the recovery cost is another criterion to evaluate the pruning plan, and this sub-view gives the user an intuitive understanding of the recovery cost in the pruning process.
Parameters and Computation.
This sub-view displays the reduction of the model parameters and the computational cost. As shown in Fig. 1-b5, the line chart displays the reduced number of parameters, and the histogram displays the reduced amount of computation (R1.2). The pruning process removes filters from the network, thus reducing the number of parameters; meanwhile, the number of parameters is proportional to the amount of computation in the model. By calculating the amount of computation needed to process one image of the test dataset, users can estimate the running efficiency of the model on mobile/embedded devices and verify whether the pruned model meets the computation requirements.

All sub-views, except the Confusion Matrix, can be scaled horizontally to take the full space of the Statistics view (by double-clicking the corresponding sub-view). This interaction helps the system scale when the pruning process is long or involves many pruning iterations. It also reduces the information that users need to watch at once, helping them focus on a single metric at a time (rather than being overwhelmed by all five statistical metrics).
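To make the Parameters and Computation statistics concrete, the sketch below counts the parameters and multiply-accumulate operations (MACs) of the convolutional layers for one forward pass. It is a rough, illustrative accounting (grouped convolutions and non-convolutional layers are ignored), not the system's exact computation.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def conv_params_and_macs(model, input_shape=(1, 3, 224, 224)):
    stats = {"params": 0, "macs": 0}
    handles = []

    def count(module, inputs, output):
        params = module.weight.numel()
        if module.bias is not None:
            params += module.bias.numel()
        h, w = output.shape[2], output.shape[3]
        k_h, k_w = module.kernel_size
        stats["params"] += params
        # Each output pixel of each output channel costs k_h*k_w*C_in MACs.
        stats["macs"] += module.out_channels * h * w * module.in_channels * k_h * k_w

    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            handles.append(m.register_forward_hook(count))
    model(torch.zeros(input_shape))  # one dummy forward pass records sizes
    for hd in handles:
        hd.remove()
    return stats["params"], stats["macs"]
```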
6.3 Model View
Visualizing the internal information of a CNN model can help users understand the state of the CNN and make proper pruning plans. As shown in Fig. 1-c, we designed the Model view to display the architecture of the studied CNN model (Fig. 1-c1), the evaluation of filters from the model (Fig. 1-c2), and the pruning plan (Fig. 1-c3).

The architecture of the model selected in the Tree view is displayed in Fig. 1-c1 (R1.3, R2.1). Each box in the architecture diagram represents a layer of the model, and different colors represent different types of layers. In particular, we use a red box to represent the deleted filters, and the width of this box denotes the percentage of deleted filters in the current convolutional layer. The height of each box is proportional to the size of the feature map, and the number on the box is the number of filters in the corresponding convolutional layer.

The visualization of the filter evaluation is shown in Fig. 1-c2, which consists of a radar plot and a bubble plot (R1.3, R2.1). The radar plot shows the impact of the pruning plan on the current model. There are three dimensions of information in the radar chart, namely, the number of filters, the remaining sensitivity percentage, and the remaining instability percentage. The remaining percentage is the ratio of the metric between the model after this pruning iteration and the current model. The bubble plot on the right shows the sensitivity and instability of each filter in each layer. Each bubble represents a filter, and different layers have different colors. The X-axis represents the sensitivity value, and hence the bubbles closer to the right are the filters with more impact on the loss (i.e., important ones that should not be pruned). The size of a bubble represents the corresponding filter's instability, i.e., bigger bubbles correspond to larger values.

The pruning plan is shown in Fig. 1-c3, which lists the indices of the filters to be removed from each layer (R2.1). Each circle represents one filter, and the number on the circle is the index of the filter. The circles of different layers use different colors, consistent with the bubble plot above. The multi-color line under the circles is an overview of the number of filters to be removed in the pruning plan. Different colors represent different convolutional layers, and the length of a color segment represents the percentage of removed filters in the corresponding convolutional layer.

This view displays information of the model selected from the Tree view. The icons (i.e., the layer legends) on the right of the model architecture support the filtering of different layers. For example, when clicking the icon for convolutional layers, other layers, e.g., pooling and linear layers, become transparent to help users better focus on the convolutional layers in the analysis. There is a vertical slider in the bubble plot, and users can drag it to specify the pruning threshold. The bubbles on the left of the slider are shown in the pruning plan and represent the filters that will be removed in the current pruning. Meanwhile, the radar plot on the left shows the influence of the pruning on the number of filters, the sensitivity, and the instability (R2.1). Dragging the slider also changes the width of the red boxes in the model architecture diagram and the proportion of different colors in the multi-color segment of the pruning plan (R2.1). Additionally, the system provides a set of buttons on the right of the bubble plot to help users quickly move the slider to certain positions. Users can scale the bubble plot horizontally along the sensitivity axis to reduce occlusion between bubbles. They can also switch among different convolutional layers in the Filter view through the convolutional buttons between the radar plot and the bubble plot.
6.4 Filter View
The Filter view allows the user to conduct an in-depth analysis of a specific convolutional layer (R3.1, R3.2). As shown in Fig. 1-d, this view consists of a scatter plot and a filter visualization matrix. The points in the scatter plot represent the degenerated and improved instances in the test dataset, and the color represents the category of the exemplars. We use the t-SNE [39] algorithm to process the image instances and display them in the scatter plot. Our system uses the degenerated and improved instances to distinguish sensitive images, which efficiently narrows down the analysis scope. The selected image in the middle of the Filter view shows the point that the user clicked in the scatter plot. There are two lines of text at the bottom of the image: the first line shows the image name and its true label, and the second line shows the labels of the image before and after the pruning, separated by an arrow. In the filter visualization matrix, each item represents a filter, and the items with red borders will be deleted in the current pruning. The image in each item is the visualization of the filter. The area chart on the top right of the item shows the distribution of pixel values of the filter visualization image. The blue and green bars below the area chart represent the sensitivity and instability of the filter, respectively.

When the user selects a node in the Tree view, the system retrieves the degenerated and improved data instances according to the selected node and its child node. The user can switch the displayed convolutional layer in the Filter view by clicking on the convolutional buttons in the Model view (between the radar plot and the bubble plot). The scatter plot supports the filtering of different types of data instances through the icons on the upper right corner. After the user clicks one point in the scatter plot, the selected image and the matrix view on the right are updated accordingly to reflect the selection. In the matrix view, the user can double-click any item to add/delete the corresponding filter to/from the current pruning plan.
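As an illustration of how the scatter plot can be produced, the sketch below projects the degenerated and improved instances into 2-D with scikit-learn's t-SNE [39]. Flattened raw pixels are used as features here for simplicity; the actual system may use a different representation.

```python
import numpy as np
from sklearn.manifold import TSNE

def project_instances(images):
    # images: (N, C, H, W) array of degenerated + improved instances.
    feats = images.reshape(len(images), -1)  # flatten each image to a vector
    # 2-D coordinates for the Filter view's scatter plot.
    return TSNE(n_components=2, init="pca").fit_transform(feats)
```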
7 CASE STUDIES
In this section, we present three case studies showing how CNNPruner can assist pruning, improve pruning efficiency, and optimize pruning plans.
The MNIST dataset [22] is a commonly used classification dataset. It contains 60,000 images for training and 10,000 images for testing. We train a two-layer CNN to perform this classification task, with 32 filters in the first convolutional layer and 64 filters in the second. The network structure is shown in Fig. 6-c1, and the well-trained model reaches a prediction accuracy above 98%.

Fig. 5. The sensitivity and instability distribution of the root model. The radar chart shows the influence of removing one third of the filters.

Before pruning with CNNPruner, we need to set some necessary parameters. First, we configure the dataset parameters to tell the system where the dataset is. Then, we set the fine-tuning parameters (e.g., the Delta Loss and the Maximum Epoch). The root model appears in the Tree view after setting the above parameters. By selecting this root node, we can observe the sensitivity and instability distribution of the model in the Model view (Fig. 5). For one pruning iteration, we want to minimize the impact on sensitivity while maximally decreasing the instability and the number of filters. From the estimated pruning results (in the radar chart), we see that removing one third of the filters will preserve 96% of the sensitivity and reduce 38% of the instability. We therefore believe we can use the 1/3 auto-pruning plan. In the Tree view, we automatically prune the model and generate a pruning tree.

Fig. 6-a is the pruning tree for this auto-pruning process. It shows that the number of CNN filters is reduced from 96 to 10 after six pruning iterations. The prediction accuracy changes marginally in the first five iterations, and the fine-tuning process converges well. The pruned model from the sixth iteration failed to meet our requirement, i.e., its accuracy dropped below the desired level. We then used CNNPruner for further analysis of the auto-pruning process. Fig. 6-b1 shows the recovery ability and the volatility of the six pruned models. As demonstrated by the short and light blue bars, the "damage" introduced by the first three pruning operations is small, and the pruned models can easily recover from it. Starting from the fourth iteration, the resilience of the model decreases, and the accuracy fluctuates more significantly. Fig. 6-b2 shows the model's loss over the six fine-tuning iterations: the loss can be reduced to the same level after each of the first five fine-tuning iterations, but for Model 6 the pruning has a large impact on the loss, and the model cannot recover its accuracy even after 30 epochs of retraining. Therefore, we believe that the parameters of Model 6 are not enough to support the original accuracy. From the statistics in Fig. 6-b1 and 6-b2, we conclude that Model 5 is the best candidate model to meet the compression goal. Fig. 6-b3 and 6-b4 show how much the amount of computation in one forward pass and the number of parameters of Model 5 have been reduced.

Using CNNPruner, we can reveal model-pruning details, such as model convergence, model accuracy, recovery ability, loss fluctuation, and recovery cost. These details help the user better understand the state changes of the model in the pruning process and evaluate the fine-tuning process.

Fig. 6. The result of CNN pruning. The system executed six prunings to get six models. The Statistics view shows the information for Model 6.

Fig. 7. The Cat&Dog dataset and the CNN model architecture.
Our second study presents the case of using the Cat&Dog dataset [1] to interactively achieve a pruning goal. The Cat&Dog dataset contains 25,000 images of cats or dogs (the two classes). We randomly select 10,000 cat images and 10,000 dog images as the training dataset; the rest of the images are used for testing. A CNN with six convolutional layers is trained to differentiate cats from dogs, and its structure is shown in Fig. 7. The original well-trained model, before any compression, achieves a prediction accuracy of 92.76%. The model contains 2,200 filters and 6.88 million parameters, with a size of 26.30 MB. A single forward pass of the CNN needs 4.6 GFLOPs.

The desired pruning goal is to maximally shrink the model while maintaining the prediction accuracy above 92%. CNNPruner can help the user choose the optimal pruning solution by analyzing the pruning process and revealing the pruning details, so as to improve the pruning efficiency and ensure the accuracy of the pruned model. To demonstrate this, we use manual+estimator pruning in this study, which includes two major stages. The first stage relies on statistical information and immediate visual feedback from the system to remove the filters. The second stage uses the estimator to remove filters interactively at a much finer granularity. In addition, this section compares the manual+estimator pruning with the automated-only pruning and the automated+estimator pruning to show its advantages.
Stage 1: Rough-Pruning with Interactive Estimation of Thresholds (R2.2, R2.3). After setting the dataset parameters and fine-tuning parameters, we use the bubble plot in the Model view to interactively probe and determine the number of filters to be removed (Fig. 8). As shown in Fig. 8-b, removing 50% of the filters does not seem to significantly impact the model's sensitivity (a change of 6%), while considerably reducing the instability (a change of 85%). Therefore, we decided to remove 50% of the filters. After one round of fine-tuning, we get Model 1 and the statistical information corresponding to this model, as shown in Fig. 8-e. These statistics reflect the difficulty level of the fine-tuning process. For example, although the accuracy of Model 1 meets the requirements, the accuracy fluctuated significantly over fine-tuning (reflected by the long strip in Fig. 8-e1). Also, the model's training loss reduced a lot over the fine-tuning process (Fig. 8-e2). With these observations, we decided to remove fewer filters in the next iteration to guarantee a quick recovery. Note that, if the pruned model could not be recovered after pruning 50% of the filters, we would restart from the root node.

Fig. 8. The first stage of manual pruning. The Statistics view shows the information corresponding to Model 8.

In the second pruning iteration, we decided to delete 25% of the filters (based on our observations of Model 1's statistics). As expected, the accuracy fluctuation and the training loss changed much less in the pruning from Model 1 to Model 2 (i.e., the second pruning did not damage the model as significantly as the first pruning iteration).

We repeated the above pruning process with on-demand human intervention until the model no longer met the requirements. Over this iterative process, we get a pruning tree, as shown in Fig. 8-a. As the pruning proceeds, the instability of the model gradually decreases (i.e., from Fig. 8-b, c, to d, the instability changes from 15%, over 66%, to 73%). Meanwhile, the accuracy fluctuation becomes more and more violent (i.e., from Model 2 to Model 8, the fluctuation range grows from 92%~94% to a much wider range starting around 80%). With such information, users of CNNPruner can directly control the pruning strategy to improve pruning efficiency and prevent the model from being excessively damaged.
Stage 2: Fine-Pruning with a Real-Time Estimator (R2.3). From the pruning tree obtained in the first stage (Fig. 8-a), we can see that the number of filters in the target model should be between that of Model 7 and Model 8. At this stage, the estimator of CNNPruner can be used to help the user better estimate the number of filters to be removed next. In the first estimation, the target number of filters given by the estimator is 182. Therefore, we prune Model 7 to Model 9, i.e., we remove 19 (201−182) filters. Using the estimator again, we find that the number of filters in the target model is 174 (Fig. 9). At this point, the gap in filter numbers between the target model and the current model is only 8, so we decided to terminate the pruning.

The pruning process reduced the storage of the model from 26.30 MB to 188 KB. The accuracy of the final pruned model is 92.64% (92.96% for the cat class and 92.32% for the dog class, Fig. 9), i.e., the accuracy is reduced by only 0.12% compared with the root model. The parameters of the model are reduced by 99.44%, and the computation needed for processing an image is reduced by 98.58%.
Comparison of three pruning strategies. To highlight the pruning efficiency of the manual+estimator pruning, we compare it with two other pruning strategies, i.e., automated pruning and automated+estimator pruning, as shown in Fig. 10. The automated pruning in Fig. 10-a uses the 1/2 auto-pruning plan, i.e., removing half of the filters in each pruning iteration. From the result, we can see that Model 3 is the final pruned model, and the results are worse than those of the other two strategies. If we used a smaller removal number, e.g., removing 1% of the filters in each pruning, we would get a better result, but it would also increase the number of pruning iterations, costing more computing resources and making the pruning less efficient. Therefore, automated pruning is inflexible and can hardly achieve the best performance. The automated+estimator pruning in Fig. 10-b contains two stages: the first stage uses the 1/2 auto-pruning plan, and the second stage uses the estimator for finer-granularity pruning. From the result, we can see that the estimator provides guidance for fine-pruning to help the user get an optimal model. However, the large range between Model 3 and Model 4 is not preferable for the second stage of estimation, as it may affect the estimator's performance. Besides, the pruning strategy in Fig. 10-b used about 21% more fine-tuning time (in terms of total training epochs) than the strategy in Fig. 10-c (manual+estimator pruning). From these comparisons, we can clearly see how human intervention in the pruning process helps improve the pruning efficiency.

Fig. 9. The second stage of manual+estimator pruning. The Statistics view shows the information corresponding to Model 9.

Fig. 10. Comparison of three pruning strategies: (a) automated pruning, (b) automated+estimator pruning, (c) manual+estimator pruning.

As shown in [10], there should be an optimized sparse sub-network structure in a complex DNN, which can use fewer parameters to achieve the same accuracy. Model pruning is an effective way to find this kind of sparse sub-network structure. Our system aims to detect whether the sub-network has been damaged during pruning, and in turn, improve the effectiveness and efficiency of model pruning.
Our third study presents the case of using an image dataset of nature scenes [2] to diagnose the pruning process. The dataset contains 17,034 images in 6 classes, with 14,034 for training and 3,000 for testing. The 6 categories are: 'buildings', 'forest', 'glacier', 'mountain', 'sea', and 'street'. Example images from individual classes are shown in Fig. 11.
Fig. 11. Example images from the scene classification dataset.
A CNN classifier with six convolutional layers is used in this case, and its structure is shown in Fig. 12-b. The original well-trained model, before any compression, achieves a prediction accuracy of 86.10%. Our pruning goal is to maximally shrink the model while maintaining the prediction accuracy above 85.00%. We used CNNPruner to prune the model and obtained the pruning tree in Fig. 12-a. After pruning, we reduced the number of filters in the model to 130; the changes in the model structure are shown in Fig. 12-b, c, d. Model 6 is our final pruned model, and its accuracy is 85.16%. By analyzing the confusion matrix, we found that the model's recognition accuracy for 'buildings' dropped sharply from Model 4 to Model 6 (see Fig. 12-e2).

It is worth mentioning that a model's recognition power for different classes may not be equally important in various tasks. For example, in autonomous driving, recognizing pedestrians around a car is far more important than recognizing the mountains several miles away. Therefore, in some model pruning tasks, domain experts care more about maintaining the model's recognition power for certain classes. In this case, we use CNNPruner to present an in-depth analysis of the abnormal changes of the accuracy value, and demonstrate how the system can help refine the pruning plan to reduce such impact.
Refining the Pruning Plan (R3.2). CNNPruner can be used to secure the prediction accuracy for the 'buildings' class while maximally compressing the model. From Model 4 to Model 6, the model's overall accuracy descends by 0.3%, resulting in 168 degenerated images and 159 improved images. 40 of the 168 degenerated images and 10 of the 159 improved images have the true label 'buildings'.

We analyze the degenerated 'buildings' instances to find out why the pruning affects the recognition of this particular class. Fig. 13 shows two degenerated instances of the class 'buildings'. From the filter visualization matrix, we can see that the system deletes the filters that have the lowest sensitivity and highest instability, i.e., Filter 0 and Filter 5 (see the blue and green bars on the right of the filter visualizations). However, for the class 'buildings', the features captured by these two filters are not the least important. The area chart in the upper right of each filter visualization displays the distribution of pixel values of the filter visualization image (feature map). In general, the more concentrated the distribution is, the sharper the extracted features are. Comparing the eight distributions, Filter 1 and Filter 6 are the least important ones (for 'buildings'): the pixel value distributions of these two filters are more chaotic than the others, and there are more noises in the corresponding feature maps. The decision of deleting Filter 0 and Filter 5, rather than Filter 1 and Filter 6, reduces the model's power in recognizing 'buildings', which is hard to recover in the subsequent fine-tuning process.

Based on the above observation, we decided to refine the pruning plan by removing Filter 1 and Filter 6 but keeping Filter 0 and Filter 5. We set up a new branch from Model 4 and pruned it with the refined plan to get Model 7; the result is shown in Fig. 1. The accuracy of Model 7 is 85.40% overall and 87.41% for the class 'buildings'. Therefore, our system optimized the pruning plan through this in-depth analysis of the filters.

To avoid the influence of randomness introduced during the fine-tuning process, we repeated the pruning multiple times to validate whether our refined pruning plan is indeed better. Specifically, we pruned Model 4 twenty times: ten times with the original plan and ten times with the refined plan, yielding 20 pruned models.
Fig. 12. The result after model pruning. (e2) The accuracy changes for the class 'buildings'. (e3) The accuracy changes for the class 'mountain'.

Fig. 13. Examples of degenerated instances from Model 4 to Model 6.
                                Original Plan    Refined Plan
  All Categories   Degenerated        158.3           157.0
                   Improved           149.7           152.1
                   Accuracy          85.18%          85.30%
  'buildings' Only Degenerated         35.1            20.2
                   Improved            12.1            17.8
                   Accuracy              —               —

Table 1. The statistics of two pruning plans (averaged over 10 runs).
Their statistics are shown in Table 1. From the table, we can see that the refined pruning plan effectively mitigates the accuracy drop of the class 'buildings' when pruning Model 4.
Interpreting the Pruning Process (R3.1). From Model 4 to Model 6, the accuracy for the class 'mountain' increased by 11.62% (Fig. 12-e3), resulting in 20 degenerated instances and 81 improved instances for this class. With CNNPruner, we can interpret what has contributed to the model improvement over the pruning. As shown in Fig. 14, we selected some images to analyze why the pruning plan improved the accuracy of 'mountain'. The image in Fig. 14-a was initially mis-classified as 'sea' by the model. The pruning removed Filter 5, which extracted the majority of the pixels for 'sea' in the image. As a result, the pruned model believes the image is more like a 'mountain' than a 'sea'. Similarly, in Fig. 14-b, Filter 5 mostly extracted the glacier features, which is probably why the image was mis-classified as 'glacier' before pruning. Removing these noisy features makes the model concentrate more on the mountain and generate the correct prediction of 'mountain'.
Fig. 14. Examples of the improved instances from Model 4 to Model 6.
Identifying Confusing Images. Additionally, from the investigations of the degenerated image instances (from Model 4 to Model 6) with CNNPruner, we also found images with improper labels. For example, the image in Fig. 15 is one of the degenerated instances with the true label 'buildings'. The original image contains both a street and buildings, and the street takes up a major portion of the image. Although the image is labeled as 'buildings', we feel 'street' is more proper for it. As this image only confuses the model, we recommend removing it from the test dataset, which can make the model evaluation more objective.

Fig. 15. The confusing image example from Model 4 to Model 6.
8 DISCUSSION AND DOMAIN EXPERTS' FEEDBACK
We conducted open-ended interviews with two machine learning experts (E1 and E2) to discuss the strengths, weaknesses, and potential extensions of CNNPruner. The experts' research interests lie in accelerating deep neural networks, and model pruning is an important portion of their research work. We first introduced the design goal of CNNPruner and the individual visualization components to them (in about 30 minutes). With their background in model pruning, the experts quickly picked up the domain-related concepts and understood the functionality of individual components, though it still took them some time (about 60 minutes) to get familiar with the visualization and interaction of the system. We then went through the cases presented in Sect. 7 and asked them to freely play with the system and provide feedback.

In general, both experts felt positive about CNNPruner, and they believed that the model pruning process can be clearly and intuitively presented through visualization techniques. E1 liked the Tree view the most, as it can quickly reveal the evolution of the pruned models and allows users to reprocess the pruning interactively. The estimator in the Tree view was very interesting to him, and he agreed that it could effectively help users determine the pruning depth in the last pruning stage. E2 appreciated the progressive pruning method proposed in CNNPruner: through the proposed criteria (i.e., recovery capability, loss fluctuation, and recovery cost), domain experts can evaluate the pruning process more objectively. Both experts were glad to see the effectiveness of the Filter view in interpreting CNNs and refining pruning plans. With existing techniques, it is still hard for them to thoroughly understand the model pruning process from numerical statistics alone. CNNPruner provides a practical way for them to interpret individual filters visually and understand their roles over the pruning process. Moreover, both experts agreed that the concepts of degenerated and improved instances are beneficial in effectively identifying images of interest.

The experts also pointed out several limitations of CNNPruner, as well as some improvements that can be applied in the future. For example, one expert mentioned that for models with many classes, the Confusion Matrix view may not scale well; we plan to improve this view by supporting the filtering of different classes. The experts also provided domain feedback on how to proceed further along this research direction. One suggested extending model pruning to fully connected layers, as the parameters from these layers can take a considerably large portion of the network in many scenarios. The other recommended enhancing the system by supporting comparisons of different pruning criteria. As model pruning is still a fast-growing topic, he believed more and more criteria will be proposed; with our system, researchers can more intuitively compare different pruning plans, which, in turn, will help them optimize the pruning process.

9 CONCLUSION
In this work, we proposed CNNPruner, a visual analytics system that helps machine learning experts understand, diagnose, and refine the CNN pruning process. CNNPruner contains four visualization components that work together to reveal model details at different levels over the iterative pruning process. Two criteria and three metrics are used in CNNPruner to estimate filters' importance before pruning and to evaluate the pruned model's quality after pruning; both the pre-estimation and the post-evaluation help users make and refine their pruning plans. Moreover, the capability of CNNPruner to thoroughly examine the degenerated and improved data instances within one pruning iteration plays an essential role in interpreting and diagnosing the pruned model. Through multiple case studies on CNN models with real-world sizes, we validated the effectiveness of CNNPruner.

ACKNOWLEDGMENTS
This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, grant No. XDA19080102. The work was started at The Ohio State University when Guan was visiting the GRAVITY research group. The authors would like to thank all GRAVITY members for their suggestions and insightful discussions.