DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression
Simon Wiedemann, Heiner Kirchhoffer, Stefan Matlage, Paul Haase, Arturo Marban, Talmaj Marinc, David Neumann, Ahmed Osman, Detlev Marpe, Heiko Schwarz, Thomas Wiegand, Wojciech Samek
Abstract
We present DeepCABAC, a novel context-adaptive binary arithmetic coder for compressing deep neural networks. It quantizes each weight parameter by minimizing a weighted rate-distortion function, which implicitly takes the impact of quantization on the accuracy of the network into account. Subsequently, it compresses the quantized values into a bitstream representation with minimal redundancies. We show that DeepCABAC is able to reach very high compression ratios across a wide set of different network architectures and datasets. For instance, we are able to compress the VGG16 ImageNet model by 63.6× with no loss of accuracy, thus being able to represent the entire network with merely 8.7 MB.
1. Introduction
In spite of their state-of-the-art performance across a wide spectrum of problems (LeCun et al., 2015), deep neural networks have the well-known caveat that they most often have high memory complexity. This does not only imply high storage requirements, but also high energy demands and slower runtimes for execution (Horowitz, 2014; Sze et al., 2017; Wang et al., 2019). This greatly limits their adoption in industrial applications and their deployment on resource-constrained devices. Moreover, it also hinders their transmission over communication channels with limited capacity, which becomes an obstacle for distributed training scenarios such as federated learning (McMahan et al., 2016; Sattler et al., 2018; 2019).

As a reaction, a plethora of work has been published on the topic of deep neural network compression (Cheng et al., 2017; Cheng et al., 2018). Among the different proposed methods, sparsification followed by weight quantization and entropy coding arguably belongs to the most popular approaches, since very high compression ratios can be achieved under this paradigm (Han et al., 2015a; Louizos et al., 2017; Wiedemann et al., 2018a;b). Whereas much of the research has focused on the sparsification part, substantially less attention has been paid to improving the latter two steps. In fact, most of the proposed (post-sparsity) compression algorithms come with at least one of the following caveats: 1) they decouple the quantization procedure from the subsequent lossless compression algorithm, 2) they ignore correlations between the parameters, and 3) they apply a lossless compression algorithm that produces a bitstream with more redundancies than principally needed (e.g., scalar Huffman coding). Moreover, some of the proposed compression algorithms also do not take the impact of quantization on the accuracy of the network into account.

In this work we present DeepCABAC, a compression algorithm that overcomes all of the above limitations. It is based on applying a context-adaptive binary arithmetic coder (CABAC) to the quantized parameters, which is the state of the art for lossless compression. It also couples the quantization procedure with CABAC by minimizing a rate-distortion cost function where the rate explicitly measures the bit-size of the network parameters as determined by CABAC. Moreover, it implicitly takes the impact of quantization on the network's accuracy into account by weighting the distortion with a term that measures the "robustness" of the network's parameters. In our experiments we show that we can significantly boost the compression performance of a wide set of pre-sparsified network architectures, consequently achieving new state-of-the-art results for the VGG16 model.
2. CABAC
Context-adaptive binary arithmetic coding (CABAC) is a form of lossless coding which was originally designed for the video compression standard H.264/AVC (Marpe et al., 2003), but is also an integral part of its successor H.265/HEVC. CABAC does not only offer high flexibility of adaptation, but also a highly efficient implementation, thus attaining higher compression performance as well as throughput compared to other entropy coding methods (Marpe & Wiegand, 2003). In short, it applies three powerful coding techniques: 1) Firstly, it binarizes the data to be encoded. That is, it predefines a series of binary decisions (also called bins) under which each data element (or symbol) is uniquely identified. 2) Then, it assigns a binary probability model to each bin (also named context model), which is updated on-the-fly by the local statistics of the data. This provides CABAC with a high degree of adaptation to different data distributions. 3) Finally, it employs an arithmetic coder in order to optimally and efficiently code each bin, based on the respective context model. To recall, arithmetic coding is a form of entropy coding which encodes entire strings of symbols into a single number. It is well known to outperform other coding techniques such as the Huffman code (Huffman, 1952) with regards to both compactness of the data representation and coding efficiency (Witten et al., 1987).

Due to the above reasons, we chose CABAC as our lossless compression method and adapted it to the task of neural network compression. Inspired by a prior analysis of the empirical weight distribution of different neural network architectures, we adopted the following binarization procedure. Given a quantized weight tensor in its matrix form (trivial for fully-connected layers; convolutional layers are converted into their respective matrix form according to (Chetlur et al., 2014)), DeepCABAC scans the weight elements in row-major order (left to right, top to bottom) and encodes each quantized weight element as follows: 1) It first determines whether the weight element is a significant element or not. That is, each weight element is assigned a bit which indicates whether the element is 0 or not. This bit is then encoded using a binary arithmetic coder, according to its respective context model. The context model is initially set to 0.5 (thus, 50% probability that a weight element is 0 or not), but is automatically adapted to the local statistics of the weight parameters as DeepCABAC encodes more elements. 2) Then, if the element is not 0, the sign bit is encoded analogously, according to its respective context model. 3) Subsequently, a series of bits is encoded analogously, which determine whether the element is greater than 1, 2, ..., n, with n ∈ N. The number n becomes a hyperparameter of the encoder. 4) Finally, the remainder is encoded using a fixed-length binary code. The decoding process is performed analogously. An example of the binarization procedure is depicted in Figure 1.
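To make the notion of a context model concrete, the following is a minimal sketch of an adaptive binary probability estimate that adapts to the local statistics of the bins it observes. It is not the actual CABAC probability estimator (which is a table-driven finite-state machine); the class name, the exponential update rule and the adaptation rate are illustrative assumptions.

```python
import math

class ContextModel:
    """Adaptive binary probability model for one bin type (illustrative sketch)."""

    def __init__(self, p_one: float = 0.5, adaptation_rate: float = 0.05):
        self.p_one = p_one                      # current estimate of P(bin = 1)
        self.adaptation_rate = adaptation_rate  # how quickly the model adapts

    def update(self, bin_value: int) -> None:
        # Move the estimate towards the observed bin value (local statistics).
        target = 1.0 if bin_value else 0.0
        self.p_one += self.adaptation_rate * (target - self.p_one)

    def bits(self, bin_value: int) -> float:
        # Ideal code length (in bits) an arithmetic coder would spend on this
        # bin under the current probability estimate.
        p = self.p_one if bin_value else 1.0 - self.p_one
        return -math.log2(max(p, 1e-12))
```

Encoding a bin thus costs roughly the negative log-probability assigned to it by its context model, which is the quantity that the rate term in Section 3 accumulates.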
Figure 1.
DeepCABAC binarization of neural networks. Each weight element is encoded by performing the following steps: 1) a bit named sigflag is encoded, which determines whether the weight is a significant element or not (in other words, whether it is 0 or not). 2) If it is not 0, then the sign bit, signflag, is encoded. 3) Subsequently, a series of bits is encoded which indicate whether the weight value is greater than or equal to 1, 2, ..., n, with n ∈ N (the so-called AbsGr(n)Flag). 4) Finally, the remainder is encoded. The grey bits (also named regular bins) represent bits that are encoded using an arithmetic coder according to a context model. The other bits, the so-called bypass bins, are encoded in fixed-point form. The decoder is analogous.
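The following is a simplified sketch of this binarization for a single quantized (integer-valued) weight. It only produces the sequence of bins and labels them as regular (context-coded) or bypass; the arithmetic-coding stage is omitted, and the function name, the default n and the fixed remainder width are illustrative assumptions rather than the exact DeepCABAC configuration.

```python
def binarize_weight(q: int, n: int = 4, remainder_bits: int = 8):
    """Return the list of (bin_value, bin_type) pairs for one quantized weight q."""
    bins = []

    # 1) sigflag: is the weight a significant (non-zero) element?
    sig = int(q != 0)
    bins.append((sig, "regular"))
    if not sig:
        return bins

    # 2) signflag: sign of the non-zero weight.
    bins.append((int(q < 0), "regular"))

    # 3) AbsGr(k)Flags: one flag per threshold k = 1..n, "is |q| greater than k?".
    mag = abs(q)
    for k in range(1, n + 1):
        greater = int(mag > k)
        bins.append((greater, "regular"))
        if not greater:
            return bins

    # 4) Remainder beyond the last threshold, written as a fixed-length
    #    (bypass) binary code.
    remainder = mag - n - 1
    for i in reversed(range(remainder_bits)):
        bins.append(((remainder >> i) & 1, "bypass"))
    return bins

# Example: binarize_weight(3) yields sigflag=1, signflag=0, AbsGr1=1, AbsGr2=1, AbsGr3=0.
```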
3. Weighted rate-distortion function
Before applying CABAC, we first have to quantize the weight parameters of the network. We do this by minimizing a generalized form of a rate-distortion function. Namely, we quantize each weight parameter w_i to the quantization point q_{k*} that minimizes the cost function

    w_i → q_{k*},    k* = arg min_k  η_i (w_i − q_k)² + λ R_{ik}        (1)

where R_{ik} is the bit-size of the quantization point q_k as determined by DeepCABAC, and λ is the Lagrangian multiplier that specifies the desired trade-off between the bit-size and the distortion incurred by the quantization. Notice how the bit-size R_{ik} now also depends on the index i of the weight to be encoded. This is due to the context-adaptive models, which update their probabilities as new elements are encoded; the probabilities, and consequently the bit-size of each quantization point q_k, therefore differ for each weight w_i.

Moreover, (1) introduces a weight-specific parameter η_i which takes into account the relative impact that the distortion of a particular weight incurs on the accuracy of the network. In this work we take a Bayesian approach in order to estimate this parameter. Namely, we assume a Gaussian prior for each weight parameter and apply scalable Bayesian techniques in order to estimate their sufficient statistics (Kingma et al., 2015; Molchanov et al., 2017; Louizos et al., 2017). As a result, we obtain a mean and a standard deviation value for each weight parameter of the network, where the former can be interpreted as its (new) value and the latter as a measure of its "robustness". Thus, when quantizing each w_i, we set η_i = 1/σ_i in (1), with σ_i being the respective standard deviation. This is also theoretically motivated, since (Achille et al., 2017) established a connection between the variances and the diagonal elements of the Fisher information matrix.

In order to minimize (1), we also need to define a set of quantization points q_k. We chose them to be equidistant to each other with a particular distance ∆ ∈ R, namely,

    q_k = ∆ I_k,    ∆ = 2 |w_max| / (|w_max| / σ_min + S),    S, I_k ∈ Z        (2)

where σ_min is the smallest standard deviation and w_max the parameter with the highest magnitude. S is then a hyperparameter which controls the "coarseness" of the quantization points. By selecting ∆ in this manner we ensure that the quantization points lie within the range of the standard deviation of each weight parameter, in particular for values S ≥ 0. Moreover, by constraining them to be equidistant to each other we encourage fixed-point representations, which can be exploited in order to perform inference with lower complexity (QNN; TFL).

Table 1. Compression ratios achieved when combining DeepCABAC with a sparsification method. Values in parentheses are the results from previous work, (Han et al., 2015a) and (Louizos et al., 2017).

Model           Dataset   Org. acc. (top-1) [%]  Org. size  Spars. |w≠0|/|w| [%]  Comp. ratio [%]  Acc. (top-1) [%]
VGG16           ImageNet  69.43                  553.43 MB  9.85 (7.5)            (2.05)           (68.83)
ResNet50        ImageNet  76.13                  102.23 MB  25.40 (29.0)          (5.95)           (76.15)
MobileNet-v1    ImageNet  70.69                  16.93 MB   50.73
Small-VGG16     CIFAR10   91.35                  59.9 MB    7.57 (5.5)            (0.86)           (90.8)
LeNet5          MNIST     99.22                  1722 KB    1.90 (8.0)(0.6)       (2.55)(0.13)     (99.26)(99.00)
LeNet-300-100   MNIST     98.29                  1066 KB    9.05 (8.0)(2.2)       (2.49)(0.88)     (98.42)(98.20)
FCAE            CIFAR10   30.14 PSNR             304.72 KB  55.69
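As a concrete illustration of (1) and (2), the following sketch quantizes a layer's weights against an equidistant grid using η_i = 1/σ_i. It is a simplification under stated assumptions: the rate term is only a crude per-point proxy for the number of bins spent by the binarization, whereas DeepCABAC uses the actual bit cost produced by its context models, which evolves during encoding. All function and variable names are illustrative.

```python
import numpy as np

def quantization_grid(weights: np.ndarray, sigma: np.ndarray, S: int) -> np.ndarray:
    """Equidistant quantization points q_k = Delta * I_k, following Eq. (2)."""
    w_max = float(np.max(np.abs(weights)))
    sigma_min = float(np.min(sigma))
    delta = 2.0 * w_max / (w_max / sigma_min + S)        # step size Delta
    n = int(np.ceil(w_max / delta))
    return delta * np.arange(-n, n + 1)                  # symmetric grid around 0

def rd_quantize(weights: np.ndarray, sigma: np.ndarray, S: int = 32,
                lam: float = 1e-4) -> np.ndarray:
    """Assign each weight w_i to the point minimizing eta_i*(w_i - q_k)^2 + lam*R_k."""
    q = quantization_grid(weights, sigma, S)
    delta = q[1] - q[0]

    # Crude rate proxy R_k: one sigflag bin for every element, plus a sign bin
    # and a unary-style magnitude part for non-zero points (see Section 2).
    idx = np.round(np.abs(q) / delta)
    rate = 1.0 + (idx > 0) * (1.0 + idx)

    eta = (1.0 / sigma).reshape(-1, 1)                   # robustness weights eta_i
    cost = eta * (weights.reshape(-1, 1) - q.reshape(1, -1)) ** 2 \
           + lam * rate.reshape(1, -1)
    return q[np.argmin(cost, axis=1)]

# Example on a random toy layer:
# w = np.random.randn(1000) * 0.05
# s = np.full_like(w, 0.01)
# w_q = rd_quantize(w, s, S=32, lam=1e-4)
```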
4. Experiments
We applied DeepCABAC to the set of models described in the evaluation framework (MPEG Requirements, 2019a) of the MPEG call for proposals on neural network representations (MPEG Requirements, 2019b). This includes the VGG16, ResNet50 and MobileNet-v1 models, as well as a fully-convolutional autoencoder pretrained on a task of end-to-end image compression (which we named FCAE). In addition, we also applied DeepCABAC to the LeNet-300-100 and LeNet5 models and to a smaller version of the VGG16 model (which we named Small-VGG16).

We applied the variational sparsification method introduced in (Molchanov et al., 2017) to the LeNet-300-100, LeNet5, Small-VGG16, FCAE and MobileNet-v1 models. However, due to the high training complexity that this method requires, we adopted a slightly different approach for VGG16 and ResNet50. Namely, we first sparsified them by applying the iterative algorithm of (Han et al., 2015b), and subsequently applied the method of (Molchanov et al., 2017), but only for estimating the variances of the distributions (thus fixing the mean values during training). After sparsification, we applied DeepCABAC to the weight parameters of each layer separately, excluding biases and normalization parameters. Since the compression result can be sensitive to the parameter S in (2), we probed the compression performance for a range of values of S and selected the best performing model. A sketch of this per-layer procedure is given below.

The resulting sparsities as well as the compression ratios are displayed in Table 1. Notice that for most networks we are not able to reproduce the sparsity ratios reported in the literature. In addition, we did not perform any fine-tuning after compression, thus having a particularly challenging setup for achieving good post-sparsity compression ratios. Nevertheless, in spite of these two disadvantages, DeepCABAC is able to compress the models significantly further, boosting the compression by 74% on average.
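The per-layer procedure with the sweep over S can be summarized by the following sketch. The helper arguments (quantize_fn, size_fn, accuracy_drop_fn) and the selection criterion (smallest bitstream subject to a maximum accuracy drop) are illustrative assumptions; the paper only states that the best performing setting was selected.

```python
def compress_model(layers, candidate_S, quantize_fn, size_fn, accuracy_drop_fn,
                   max_acc_drop=0.0):
    """Sweep S and keep the best setting.

    layers:           list of (weights, sigma) pairs, one per layer
                      (biases and normalization parameters are excluded)
    quantize_fn:      e.g. rd_quantize from the sketch above
    size_fn:          returns the encoded bit-size of one quantized layer
    accuracy_drop_fn: evaluates the quantized model and returns the accuracy drop
    """
    best = None
    for S in candidate_S:
        quantized = [quantize_fn(w, s, S) for (w, s) in layers]
        size_bits = sum(size_fn(qw) for qw in quantized)
        if accuracy_drop_fn(quantized) <= max_acc_drop:
            if best is None or size_bits < best[0]:
                best = (size_bits, S, quantized)
    return best
```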
5. Conclusion
We show that one can significantly boost the compression gains by applying state-of-the-art coding techniques to pre-sparsified deep neural networks. In particular, our proposed coding scheme, DeepCABAC, is able to increase the compression rates of pre-sparsified networks by 74% on average, thus attaining compression ratios comparable to (and sometimes higher than) the current state of the art. In future work we will benchmark DeepCABAC also on non-sparsified networks, as well as apply it in the context of distributed learning scenarios where memory complexity is critical (e.g., federated learning).

References
QNNPACK: open source library for optimized mobile deep learning. https://github.com/pytorch/QNNPACK. Accessed: 28.02.2019.

TensorFlow Lite. Accessed: 28.02.2019.

Achille, A., Rovere, M., and Soatto, S. Critical learning periods in deep neural networks. arXiv:1711.08856, 2017.

Cheng, Y., Wang, D., Zhou, P., and Zhang, T. A survey of model compression and acceleration for deep neural networks. arXiv:1710.09282, 2017.

Cheng, Y., Wang, D., Zhou, P., and Zhang, T. Model compression and acceleration for deep neural networks: The principles, progress, and challenges. IEEE Signal Processing Magazine, 35(1):126–136, Jan. 2018.

Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., and Shelhamer, E. cuDNN: Efficient primitives for deep learning. arXiv:1410.0759, 2014.

Han, S., Mao, H., and Dally, W. J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv:1510.00149, 2015a.

Han, S., Pool, J., Tran, J., and Dally, W. J. Learning both weights and connections for efficient neural networks. In Advances in Neural Information Processing Systems (NIPS), pp. 1135–1143, 2015b.

Horowitz, M. 1.1 Computing's energy problem (and what we can do about it). In IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 10–14, Feb. 2014.

Huffman, D. A. A method for the construction of minimum-redundancy codes. Proceedings of the IRE, 40(9):1098–1101, Sep. 1952.

Kingma, D. P., Salimans, T., and Welling, M. Variational dropout and the local reparameterization trick. In Advances in Neural Information Processing Systems (NIPS), pp. 2575–2583, 2015.

LeCun, Y., Bengio, Y., and Hinton, G. E. Deep learning. Nature, 521:436–444, 2015.

Louizos, C., Ullrich, K., and Welling, M. Bayesian compression for deep learning. In Advances in Neural Information Processing Systems (NIPS), pp. 3288–3298, 2017.

Marpe, D. and Wiegand, T. A highly efficient multiplication-free binary arithmetic coder and its application in video coding. In Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429), volume 2, pp. 263–266, Sep. 2003.

Marpe, D., Schwarz, H., and Wiegand, T. Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(7):620–636, July 2003.

McMahan, H. B., Moore, E., Ramage, D., and y Arcas, B. A. Federated learning of deep networks using model averaging. arXiv:1602.05629, 2016.

Molchanov, D., Ashukha, A., and Vetrov, D. Variational dropout sparsifies deep neural networks. In International Conference on Machine Learning (ICML), pp. 2498–2507, 2017.

MPEG Requirements. Updated evaluation framework for compressed representation of neural networks. N18162. Technical report, Moving Picture Experts Group (MPEG), Marrakech, MA, Jan. 2019a.

MPEG Requirements. Updated call for proposals on neural network compression. N18129. CfP, Moving Picture Experts Group (MPEG), Marrakech, MA, Jan. 2019b.

Sattler, F., Wiedemann, S., Müller, K.-R., and Samek, W. Sparse binary compression: Towards distributed deep learning with minimal communication. arXiv:1805.08768, 2018.

Sattler, F., Wiedemann, S., Müller, K.-R., and Samek, W. Robust and communication-efficient federated learning from non-iid data. arXiv:1903.02891, 2019.

Sze, V., Chen, Y.-H., Yang, T.-J., and Emer, J. S. Efficient processing of deep neural networks: A tutorial and survey. arXiv:1703.09039, 2017.

Wang, E., Davis, J. J., Zhao, R., Ng, H.-C., Niu, X., Luk, W., Cheung, P. Y. K., and Constantinides, G. A. Deep neural network approximation for custom hardware: Where we've been, where we're going. arXiv:1901.06955, 2019.

Wiedemann, S., Marbán, A., Müller, K.-R., and Samek, W. Entropy-constrained training of deep neural networks. arXiv:1812.07520, 2018a.

Wiedemann, S., Müller, K.-R., and Samek, W. Compact and computationally efficient representation of deep neural networks. arXiv:1805.10692, 2018b.

Witten, I. H., Neal, R. M., and Cleary, J. G. Arithmetic coding for data compression. Communications of the ACM, 30(6):520–540, 1987.