EasyConvPooling: Random Pooling with Easy Convolution for Accelerating Training and Testing
Jianzhong Sheng∗
Huazhong University of Science and Technology
City University of Hong Kong
[email protected]
Chuanbo Chen
Huazhong University of Science and Technology
[email protected]
Chenchen Fu
City University of Hong Kong
[email protected]
Chun Jason Xue†
City University of Hong Kong
[email protected]

∗Author  †Corresponding Author
ABSTRACT
Convolution operations dominate the overall execution time of Convolutional Neural Networks (CNNs). This paper proposes an easy yet efficient technique for both Convolutional Neural Network training and testing. The conventional convolution and pooling operations are replaced by Easy Convolution and Random Pooling (ECP). In ECP, we randomly select one pixel out of four and only conduct convolution operations for the selected pixel. As a result, only a quarter of the conventional convolution computations are needed. Experiments demonstrate that the proposed EasyConvPooling can achieve a 1.45x speedup on training time and 1.64x on testing time. What's more, a speedup of 5.09x on pure Easy Convolution operations is obtained compared to conventional convolution operations.
CCS CONCEPTS

• Computer systems organization → Embedded systems; Redundancy; Robotics; • Networks → Network reliability;
KEYWORDS
Easy Convolution, Random Pooling, Training, Testing
ACM Reference Format:
Jianzhong Sheng, Chuanbo Chen, Chenchen Fu, and Chun Jason Xue. 2018. EasyConvPooling: Random Pooling with Easy Convolution for Accelerating Training and Testing. In Proceedings of Archive. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION

Convolutional Neural Networks (CNNs) are a promising class of machine learning algorithms that achieve remarkable performance in various computer vision tasks, e.g., image classification [18]. One of the key reasons for this success is their deep architecture [23]. It has been shown that deeper architectures yield better performance, and as a result the performance of CNNs over the past few years has been improved mainly by designing deeper architectures. It is not uncommon for a neural network to have a massive number of parameters in its model, which increases the time needed to train and test the network.

In this study, we propose an effective technique called EasyConvPooling (ECP) to accelerate both training and testing. EasyConvPooling consists of two parts: Easy Convolution and Random Pooling. In Random Pooling, we select one pixel out of four randomly, and then compute the convolution of the selected pixel only. This reduces convolution computations by 75% compared to conventional convolution, and thus reduces both training time and testing time.

To realize the proposed method, we face two questions. The first question is how to determine the selected pixel in Random Pooling and how to obtain its index for conducting Easy Convolution in the upper layer. For pixel selection, we randomly appoint one pixel out of four to be the "lucky" pixel; for its index, we keep the index of the "lucky" pixel that is appointed before pooling. This does not lead to a significant loss in accuracy. The second question is how to conduct Easy Convolution in the selected mode while keeping the shape of the output feature map unchanged. To solve this problem, we determine the mode of Easy Convolution according to the index of the selected pixel, ensuring that they match.

Based on the experiments, we find that conducting Random Pooling alone already reduces training time and testing time, and the reduction becomes more significant when it is combined with Easy Convolution. Experimental results demonstrate that the proposed ECP achieves a 1.45x speedup on training time and a 1.64x speedup on testing time. In addition, we obtain a speedup of 5.09x on pure Easy Convolution operations compared to conventional convolution operations.

The contributions of this work are as follows:
• We propose a novel EasyConvPooling technique to conduct convolution and pooling, in which only 25% of the conventional convolution operations are needed.
• We propose a universal technique to accelerate both training and testing.
• The proposed technique (ECP) can be transferred to any platform supporting Python.

The remainder of this paper is organized as follows. Section 2 summarizes the related work. Section 3 presents the proposed method. Section 4 compares the proposed method with state-of-the-art schemes. Finally, we conclude the paper in Section 5.
2 RELATED WORK

Many algorithms have been proposed for accelerating convolutional neural networks. Han et al. [8–12] proposed pruning methods that cut off unimportant connections in fully connected layers to avoid useless computation. These methods can be applied to CPUs, GPUs, FPGAs and ASICs, achieving speedups of 13x for VGG-16 [24] and 10x for LSTM [13]. These optimizations target fully connected layers and thus differ from ours; we focus on the most time-consuming part of convolutional neural networks.

Liu et al. [22], Yuan et al. [29], Feng et al. [7], Lebedev et al. [19], Wen et al. [27], Denil et al. [5], Denton et al. [6], Jaderberg et al. [16], Ioannou et al. [15] and Tai et al. [26] proposed weight-sparsity methods that exploit the sparsity in weights. They increase the number of zero elements in the weight matrices to make them sparse, and thus reduce the data to be stored and computed. Their methods use sparsity to reduce computation; however, it remains unclear how much time is needed to make these weights sparse.

Courbariaux et al. [3, 4], Lin et al. [21], Baldassi et al. [1], Cheng et al. [2], Kim et al. [17] and Hwang et al. [14] proposed binary and ternary networks that constrain the weights to 0 and ±1. As a result, many multiplications are eliminated, which makes FPGA implementations feasible. Their methods convert the weights to 0 and ±1 and exploit these values to reduce multiplications.

Max Pooling [28] and Average Pooling [20] are the most widely used pooling methods in conventional Convolutional Neural Networks. Max Pooling selects the maximum-value pixel to represent the output of the pooling window, while Average Pooling outputs the average of the pixels. Both Max Pooling and Average Pooling require full convolution operations for all four pixels in the pooling window, yet output only one pixel. This is where we can eliminate 75% of the convolution operations by conducting Easy Convolution and Random Pooling.
3 PROPOSED METHOD

We propose an easy yet efficient technique called EasyConvPooling (ECP) for Convolutional Neural Networks to conduct convolution and pooling operations. In the proposed ECP, only 25% of the original convolution operations are performed, which removes 75% of the multiplications in convolutions with little loss in accuracy. ECP consists of two parts: Easy Convolution and Random Pooling. ECP is conducted as follows (a short sketch of the first two steps is given after this list):
• Randomly set Mode K.
• Determine the positions of the selected pixels for Random Pooling and Easy Convolution.
• Conduct Easy Convolution on the selected pixels and pad the neighbor pixels to recover the output shape for the pooling layer.
• Conduct Random Pooling.
In the following subsections, we first present the architecture of the network and conventional convolution and pooling, and then describe Random Pooling and Easy Convolution in detail.
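As a rough illustration of the first two steps, the following minimal Python sketch (our own, not the authors' released code; the mapping from Mode K to a (row, column) offset is an assumption consistent with Figure 3) draws a Mode K and converts it into the offset of the selected pixel inside every 2 × 2 window:

import numpy as np

rng = np.random.default_rng()

# Step 1: randomly set Mode K for an Easy Convolution / Random Pooling pair.
k = rng.integers(0, 4)

# Step 2: Mode K fixes which pixel of every 2x2 pooling window is selected.
# Assumed mapping: K = 0, 1, 2, 3 -> offsets (0, 0), (0, 1), (1, 0), (1, 1).
row_off, col_off = divmod(k, 2)

print(f"Mode K = {k}, selected offset inside each 2x2 window: ({row_off}, {col_off})")

Steps 3 and 4 are sketched after Figures 3 and 4 below.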
Figure 1 shows the overall architecture of the proposed network for conducting ECP compared with conventional convolution and pooling. In Figure 1, we design a two-convolution neural network with two fully connected layers. Each convolution layer is followed by a pooling layer and a ReLU layer. In the fully connected part, we add one ReLU layer after the first fully connected layer and connect the second fully connected layer directly to the Softmax layer.

The upper part of Figure 1 is the proposed ECP technique and the lower part is the conventional way of doing convolution and pooling, such as Average Pooling and Max Pooling. In ECP, we replace the conventional convolution and pooling operations with Easy Convolution and Random Pooling. Both Easy Convolution and Random Pooling have a Mode K that controls their operation mode. To ensure that they are matched under the same Mode K, each Easy Convolution layer is followed by a Random Pooling layer.
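For concreteness, the layer sequence of the ECP path in Figure 1 can be written out as a simple Python list (our own shorthand; the kernel and channel sizes follow Table 2):

ecp_network = [
    ("easy_conv",       {"kernel": (5, 5), "channels": 20}),
    ("random_pool",     {"kernel": (2, 2), "stride": 2}),
    ("relu",            {}),
    ("easy_conv",       {"kernel": (5, 5), "channels": 32}),
    ("random_pool",     {"kernel": (2, 2), "stride": 2}),
    ("relu",            {}),
    ("fully_connected", {"channels": 100}),
    ("relu",            {}),
    ("fully_connected", {"channels": 10}),
    ("softmax",         {}),
]

In the conventional path of Figure 1, each easy_conv/random_pool pair is replaced by conventional convolution followed by Average or Max Pooling.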
Convolution operations occupy most of the execution time of CNNs, so in Figure 2 we examine conventional convolution and pooling to give an overall view of them. In the following subsections, we describe and compare conventional convolution and pooling with the proposed Easy Convolution and Random Pooling in detail.

In Figure 2, every sliding (convolution) window consists of four pixels, taking a kernel size of 2 × 2 as an example. The convolution output is computed as

W ∗ x(m, n) = \sum_{u} \sum_{v} W(u, v) \, x(m + u, n + v),

where x is an input image and W is a weight matrix of the convolution filter. The operator '∗' denotes 2D convolution.

In the pooling layer, the output is calculated by selecting one pixel out of four to represent the whole window. In Average Pooling, the output is the average value of the four pixels in the feature map; in Max Pooling, we select the maximum-value pixel as the output pixel.

In short, we compute four conventional convolutions to form the elements required by one pooling window, yet the output of both Average Pooling and Max Pooling is one pixel only, so 75% of the convolutions are wasted.
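To make the formula and the wasted work concrete, the following NumPy sketch (our own illustration, not the authors' code) evaluates the convolution at every position and then applies a 2 × 2 pooling with stride 2; all four convolved pixels of every pooling window are computed, although only one value survives:

import numpy as np

def conv2d_valid(x, w):
    # Conventional convolution in the form of the formula above:
    # out[m, n] = sum_u sum_v w[u, v] * x[m + u, n + v]
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for m in range(oh):
        for n in range(ow):
            out[m, n] = np.sum(w * x[m:m + kh, n:n + kw])
    return out

def pool2x2(fmap, mode="max"):
    # Conventional 2x2 pooling with stride 2: four convolved pixels go in,
    # one pixel comes out, so 75% of the convolution work is discarded.
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)   # toy "input image"
w = np.ones((2, 2)) / 4.0                      # toy 2x2 kernel
pooled = pool2x2(conv2d_valid(x, w), mode="max")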
Figure 1: Architecture of the Network.
Figure 2: Conventional Convolution and Pooling.

If we can determine in advance which pixel will be selected in the pooling layer, we can eliminate the extra 75% of convolutions in the convolution layer. This is where the proposed Easy Convolution and Random Pooling gain their benefit.
In conventional convolution, we need to calculate the output of every convolution window to build the feature map; however, in the pooling layer, only one pixel out of four (stride = 2) is chosen to represent the output of the pooling window. In Average Pooling, we compute the average of the four pixels as the output of the pooling window, and in Max Pooling we output the maximum-value pixel. Random Pooling simply selects one pixel out of four at random to represent the output of the pooling window, eliminating the 75% of extra convolutions.

In Random Pooling, we obtain the index of the selected pixel by setting the Random Pooling Mode K. Mode K stands for the position index of the four pixels in the pooling window; it ranges from 0 to 3, ordered from top to bottom and left to right. Mode K is set randomly before pooling so that we know how to conduct Easy Convolution in the upper layer.
Figure 3: Random Pooling vs Average/ Max Pooling.
Figure 3 demonstrates how Random Pooling works. Once the Random Pooling Mode K is set, we can determine which pixel is selected in the dotted pooling window. Mode 0 means that pixel 0 is selected from the dotted pooling window every time. After selecting the first pixel 0 element, we slide the pooling window two steps to the right to obtain the second pixel 0 element. The pooling window slides from left to right and top to bottom with a stride of two. Finally, the output feature map of Random Pooling is formed by those pixel 0 elements. In Modes 1, 2 and 3, the same operations are applied to the pixel 1, 2 and 3 elements. Random Pooling thus always selects the pixel at the same position in each pooling window to make up the outputs of the pooling windows and form the output feature map of the pooling layer. Different values of Mode K correspond to different pixel positions in the pooling window.

In conventional Average Pooling/Max Pooling, the output of the pooling window is always the averaged/maximum value of the pooling window. The middle of Figure 3 shows the proposed Random Pooling, with conventional Average/Max Pooling beside it.
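A minimal NumPy sketch of Random Pooling as described above (our own illustration, assuming a 2 × 2 pooling window with stride 2 and the K-to-offset mapping of Figure 3):

import numpy as np

def random_pool(fmap, k):
    # Random Pooling: in every 2x2 window (stride 2), keep only the pixel at
    # the position selected by Mode K (0..3, top to bottom and left to right).
    row_off, col_off = divmod(k, 2)
    return fmap[row_off::2, col_off::2]

fmap = np.arange(16, dtype=float).reshape(4, 4)
out = random_pool(fmap, k=0)   # same 2x2 output shape as Max/Average Pooling

Because the pooled output is a plain strided slice, no comparison or averaging over the four pixels is needed, which is what allows the matching Easy Convolution layer to skip computing three of them in the first place.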
To match and control the Easy Convolution mode with the Random Pooling mode, we use the same K to control the Easy Convolution mode. Due to the overlapping of convolution sliding windows, we need two variables to locate the position of a selected convolution window. At the programming level, we make these two variables in Easy Convolution match the Random Pooling Mode K, so that the same K can be used in the Easy Convolution layer to extract the elements at the positions matching Random Pooling.

In the Random Pooling subsection, we obtained the selected pixel's index by setting the Random Pooling Mode K; we then conduct Easy Convolution using the same Mode K. Figure 4 shows how the Easy Convolution operations are conducted, which is very similar to Random Pooling.

In Figure 4, every convolution window on the input image contains several weights. To obtain the feature map for the next layer, convolution operations are carried out on the input image with a sliding convolution window. Window 0 is the window selected to produce the pixel 0 element for the pooling window in the pooling layer. Every window 0 is a sliding convolution window over the input image under Mode 0. The outputs of these convolution windows are the pixels needed by Random Pooling in the pooling layer. The first window 0 produces the first pixel 0 element of the pooling window in Figure 3, and the second window 0 produces the second pixel 0 element. The convolution window slides over the input image to produce the output feature map of the convolution layer. Different values of Mode K determine different windows K, which produce the pixel K elements needed in the pooling window. After sliding from left to right and top to bottom, the feature map of the convolution layer is formed.

In conventional convolution, the sliding convolution window slides over the whole input image area to produce the feature map elements. In Easy Convolution, the sliding convolution window visits only the window K positions to extract the selected data, discarding 75% of the extra data and leaving a quarter of the original shape. After extracting the data we need from the input image, we can compute the convolution as usual, saving 75% of the convolutions.
Figure 4: Easy Convolution.

The remaining problem is how to recover the pruned shape. In some situations, we use padding to keep the output shape unchanged. We add padding to the output of Easy Convolution to restore the shape of the feature map so that the rest of the network can run as usual. For Easy Convolution, we pad the neighboring empty pixels with the same value, as shown in Figure 4.
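A companion NumPy sketch of Easy Convolution with this padding scheme (again our own simplification, assuming a stride-1 convolution followed by 2 × 2 pooling windows; edge positions that do not fall inside a complete window are left at zero in this sketch):

import numpy as np

def easy_conv(x, w, k):
    # Easy Convolution: convolve only at the positions that Mode K will select
    # in the following Random Pooling layer, then pad each computed value into
    # its 2x2 neighborhood so the feature map keeps the conventional shape.
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    row_off, col_off = divmod(k, 2)
    out = np.zeros((oh, ow))
    for m in range(row_off, oh, 2):            # only about 25% of the positions
        for n in range(col_off, ow, 2):
            v = np.sum(w * x[m:m + kh, n:n + kw])
            out[m - row_off:m - row_off + 2, n - col_off:n - col_off + 2] = v
    return out

x = np.arange(64, dtype=float).reshape(8, 8)   # toy input image
w = np.ones((5, 5)) / 25.0                     # toy 5x5 kernel
fmap = easy_conv(x, w, k=0)                    # same 4x4 shape as full convolution

Applying the random_pool sketch after Figure 3 with the same k to this feature map simply picks the computed values back out, which is exactly the matching that Mode K is meant to enforce.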
4 EXPERIMENTS

To demonstrate that the proposed ECP technique is effective and reliable, we perform experiments on networks with various hidden layers, comparing against Average Pooling and Max Pooling under different values of Mode K. We coded a one-convolution-layer network and a two-convolution-layer network to evaluate ECP's performance at different depths. All the code is written in Python without any framework, and all the experiments are conducted on a CPU, making the technique portable to any platform supporting Python.
We set the batch size to 50 and the learning rate to 0.001 in the experiments. For MNIST [20], the input dimension is 28 × 28 = 784 and the output dimension is 10. The experiments are conducted on an Intel(R) Core i7-7700HQ 2.80 GHz CPU with Python 3.6.3 installed on the Windows 10 operating system. Table 1 and Table 2 show the parameters of the networks with various hidden layers.
Table 1: Network Parameters of the One-convolution-layer CNN.

Layer Name      | Parameter
Input           | image size: 28 × 28, channel: 1
Convolution     | kernel: 5 × 5, channel: 20
Pooling         | kernel: 2 × 2, stride: 2
ReLU            |
Fully connected | channel: 100
ReLU            |
Fully connected | channel: 10
Softmax         |
Table 2: Network Parameters of the Two-convolution-layer CNN.

Layer Name      | Parameter
Input           | image size: 28 × 28, channel: 1
Convolution     | kernel: 5 × 5, channel: 20
Pooling         | kernel: 2 × 2, stride: 2
ReLU            |
Convolution     | kernel: 5 × 5, channel: 32
Pooling         | kernel: 2 × 2, stride: 2
ReLU            |
Fully connected | channel: 100
ReLU            |
Fully connected | channel: 10
Softmax         |
To verify the time performance of the proposed ECP technique, we evaluate both training time and testing time. Furthermore, we design a dedicated test on pure convolution, comparing the proposed ECP with the conventional convolution method in terms of the actual time spent on convolution. This test is conducted on the MNIST database for 100 epochs and repeated several times.

Table 3 shows the overall time performance of the proposed ECP compared with conventional Max/Average Pooling. The first column shows the epoch at which the testing accuracy first reaches 98%, and the other columns report time performance. After applying ECP, we achieve a 1.45x speedup on training time and a 1.64x speedup on testing time compared with Average Pooling. In terms of pure convolution time, we achieve a speedup of 5.09x. This speedup is even larger than the theoretical value (4x, since only a quarter of the convolutions are performed) because of memory-space limitations: less convolution data saves memory and thus avoids the context switching caused by memory shortage. In addition, based on the results in the Iter column, ECP does not require noticeably more training iterations.
Besides time performance, accuracy is another critical metric in both the training and testing steps. Considering the randomness of ECP, it might cause a drop in accuracy. To examine this, we first train on the MNIST dataset for 200 epochs to make sure we capture the best accuracy during the experiment. The results indicate that 100 epochs are already enough: in most cases the networks achieve their best accuracy within 80 epochs, and sometimes further training even lowers accuracy due to overfitting. Therefore, in the experiments we evaluate accuracy within 100 epochs, which is more worthwhile than training another 100 epochs to gain an accuracy improvement of less than 0.5%.

The experiments are conducted on the two-convolution-layer network shown in Figure 1. The results in Table 4 indicate that the proposed ECP achieves a good speedup with little loss in accuracy.

The experiment not only compares the accuracy of ECP with Max Pooling and Average Pooling but also takes Mode K into consideration to evaluate the robustness of the proposed ECP. The results are reliable, and Mode K is discussed in detail in the following subsection.
Another interesting question is the role of Mode K. In Random Pooling and Easy Convolution, Mode K is a randomly set parameter that determines how Random Pooling is conducted and where Easy Convolution is applied. In the following experiment, we test what happens when Mode K is varied. To find out what role Mode K plays in the proposed ECP, we run tests on a two-convolution network using the Random Pooling technique alone and the full ECP technique, respectively. Figure 5 shows Random Pooling under different values of Mode K, and Figure 6 shows the results for ECP.
Figure 5: Random Pooling Convergence under Mode K.
Figure 6: ECP Convergence under Mode K.
From the figures, we can conclude that the randomly set Mode K is not crucial to the results, although it does affect the training process in some respects. A randomly selected Mode K does not affect the overall convergence of the network, whether Random Pooling is conducted alone or together with Easy Convolution, and Mode K has little influence on training time and testing time according to Table 4.

However, in Figure 5 and Figure 6, the randomly selected Mode K does seem to affect convergence at the very beginning of training and appears to have some effect on the final accuracy. Recall, though, that the kernel weights are initialized randomly from a uniform distribution. Overall, Mode K has little influence on ECP's time performance as well as its accuracy.
Table 3: Overall Time Performance of ECP vs Max/Average Pooling.

Method      | Iter (98%) | Training Time (ms) | Testing Time (ms) | Pure Conv Time (ms)
Max Pooling | 3          | 299650.15          | 23814.87          | 4496.77
ECP         | 4          |                    |                   |

Method      | Iter (98%) | Training Time (ms) | Testing Time (ms) | Pure Conv Time (ms)
Ave Pooling | 5          | 352544.96          | 21426.24          | 4496.77
ECP         | 4          |                    |                   |
Table 4: Time Performance and Accuracy under Different Mode K.

Method      | Iter (98%) | Training Time (ms) | Testing Time (ms) | Best Accuracy (%)
Max Pooling | 3          | 299650.15          | 23814.87          | 99.17
Random k=0  | 4          |                    |                   |
Random k=1  | 4          |                    |                   |
Random k=2  | 5          |                    |                   |
Random k=3  | 4          |                    |                   |
ECP k=0     | 4          |                    |                   |
ECP k=1     | 5          |                    |                   |
ECP k=2     | 5          |                    |                   |
ECP k=3     | 5          |                    |                   |

Method      | Iter (98%) | Training Time (ms) | Testing Time (ms) | Best Accuracy (%)
Ave Pooling | 5          | 352544.96          | 21426.24          | 98.69
Random k=0  | 4          |                    |                   |
Random k=1  | 4          |                    |                   |
Random k=2  | 5          |                    |                   |
Random k=3  | 4          |                    |                   |
ECP k=0     | 4          |                    |                   |
ECP k=1     | 5          |                    |                   |
ECP k=2     | 5          |                    |                   |
ECP k=3     | 5          |                    |                   |
Figure 7: Random Pooling Convergence vs Average/Max Pooling under One-convolution Layer.
In this part, we evaluate the proposed ECP with different numbers of hidden layers: a one-convolution network and a two-convolution network. Table 5 reports ECP compared with Max Pooling, and Table 6 reports the comparison with Average Pooling.

From the tables, we notice that the improvement over Average Pooling is larger than that over Max Pooling: compared with Average Pooling, we gain a larger speedup together with a small accuracy improvement rather than a drop. Moreover, compared with Max Pooling, the time performance of the proposed ECP becomes even better as the network gets deeper, with little accuracy loss.
The proposed ECP consists of two parts: Easy Convolution and Random Pooling. To compare the convergence of Random Pooling alone with Average Pooling/Max Pooling, we use the same conventional convolution in the convolution layer above Random Pooling.

Figure 7 and Figure 8 show the Random Pooling convergence versus Average/Max Pooling under the one-convolution and two-convolution networks, respectively. Figure 9 and Figure 10 show the ECP convergence versus Average/Max Pooling under the one-convolution and two-convolution networks, respectively.
Table 5: ECP vs Max Pooling under Various Convolution Layers.
ConvLayer | Method      | Iter (98%) | Training Time (ms) | Testing Time (ms) | Best Accuracy (%)
1         | Max Pooling | 8          | 215387.59          | 12500.73          | 98.37
1         | ECP         | 5          |                    |                   |
Table 6: ECP vs Average Pooling under Various Convolution Layers.

ConvLayer | Method      | Iter (98%) | Training Time (ms) | Testing Time (ms) | Best Accuracy (%)
1         | Ave Pooling | 12         | 303580.05          | 18579.89          | 98.29
1         | ECP         | 5          |                    |                   |
Figure 8: Random Pooling Convergence vs Average/Max Pooling under Two-convolution Layer.

Based on the experiments above, both Random Pooling and ECP achieve good convergence compared with conventional Average/Max Pooling.
Based on the experiments above, we summarize the major characteristics of the proposed ECP technique as follows:
• The speedup on testing is consistently larger than the speedup on training.
• ECP has a larger advantage over Average Pooling than over Max Pooling because of the training speedup.
• ECP yields a larger performance improvement on deeper networks, with little loss in accuracy.
Figure 9: ECP Convergence vs Average/Max Pooling under One-convolution Layer.

Figure 10: ECP Convergence vs Average/Max Pooling under Two-convolution Layer.
5 CONCLUSION

Deeper network architectures usually lead to better performance; as a result, Convolutional Neural Networks are becoming more and more difficult to train. Considering that the overall execution time of Convolutional Neural Networks is dominated by convolution operations, we propose a novel technique named EasyConvPooling (ECP) to address this problem. In ECP, we conduct convolution operations according to the index required by the following pooling layer, which removes 75% of the original convolution operations. The experiments demonstrate that we achieve a 1.45x speedup on training time and a 1.64x speedup on testing time with little loss in accuracy. Moreover, we achieve a speedup of 5.09x on pure Easy Convolution operations compared with conventional convolution operations.
REFERENCES

[1] Carlo Baldassi, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, and Riccardo Zecchina. 2015. Subdominant dense clusters allow for simple learning and high computational performance in neural networks with discrete synapses. Physical Review Letters (2015).
[2] Zhiyong Cheng, Daniel Soudry, Zexi Mao, and Zhenzhong Lan. 2015. Training binary multilayer neural networks for image classification using expectation backpropagation. arXiv preprint arXiv:1503.03562 (2015).
[3] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2015. BinaryConnect: Training deep neural networks with binary weights during propagations. In Advances in Neural Information Processing Systems. 3123–3131.
[4] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830 (2016).
[5] Misha Denil, Babak Shakibi, Laurent Dinh, Nando De Freitas, et al. 2013. Predicting parameters in deep learning. In Advances in Neural Information Processing Systems. 2148–2156.
[6] Emily L Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus. 2014. Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in Neural Information Processing Systems. 1269–1277.
[7] Jiashi Feng and Trevor Darrell. 2015. Learning the structure of deep convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision. 2749–2757.
[8] Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, et al. 2017. ESE: Efficient speech recognition engine with sparse LSTM on FPGA. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 75–84.
[9] Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz, and William J Dally. 2016. EIE: Efficient inference engine on compressed deep neural network. In Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 243–254.
[10] Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149 (2015).
[11] Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Shijian Tang, Erich Elsen, Bryan Catanzaro, John Tran, and William J Dally. 2016. DSD: Regularizing deep neural networks with dense-sparse-dense training flow. arXiv preprint arXiv:1607.04381 3, 6 (2016).
[12] Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems. 1135–1143.
[13] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[14] Kyuyeon Hwang and Wonyong Sung. 2014. Fixed-point feedforward deep neural network design using weights +1, 0, and -1. In Signal Processing Systems (SiPS), 2014 IEEE Workshop on. IEEE, 1–6.
[15] Yani Ioannou, Duncan Robertson, Jamie Shotton, Roberto Cipolla, and Antonio Criminisi. 2015. Training CNNs with low-rank filters for efficient image classification. arXiv preprint arXiv:1511.06744 (2015).
[16] Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. 2014. Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866 (2014).
[17] Minje Kim and Paris Smaragdis. 2016. Bitwise neural networks. arXiv preprint arXiv:1601.06071 (2016).
[18] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.
[19] Vadim Lebedev and Victor Lempitsky. 2016. Fast ConvNets using group-wise brain damage. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on. IEEE, 2554–2564.
[20] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278–2324.
[21] Min Lin, Qiang Chen, and Shuicheng Yan. 2013. Network in network. arXiv preprint arXiv:1312.4400 (2013).
[22] Baoyuan Liu, Min Wang, Hassan Foroosh, Marshall Tappen, and Marianna Pensky. 2015. Sparse convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 806–814.
[23] Guido F Montufar, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. 2014. On the number of linear regions of deep neural networks. In Advances in Neural Information Processing Systems. 2924–2932.
[24] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[25] Xu Sun, Xuancheng Ren, Shuming Ma, and Houfeng Wang. 2017. meProp: Sparsified back propagation for accelerated deep learning with reduced overfitting. arXiv preprint arXiv:1706.06197 (2017).
[26] Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, et al. 2015. Convolutional neural networks with low-rank regularization. arXiv preprint arXiv:1511.06067 (2015).
[27] Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning structured sparsity in deep neural networks. In Advances in Neural Information Processing Systems. 2074–2082.
[28] Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang. 2009. Linear spatial pyramid matching using sparse coding for image classification. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 1794–1801.
[29] Ming Yuan and Yi Lin. 2006. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B 68, 1 (2006), 49–67.