Privacy-preserving Cloud-based DNN Inference
Shangyu Xie, Bingyu Liu and Yuan Hong
Illinois Institute of Technology
{sxie14, bliu40}@hawk.iit.edu, [email protected]

ABSTRACT
Deep learning as a service (DLaaS) has been intensively studied to facilitate the wider deployment of emerging deep learning applications. However, DLaaS may compromise the privacy of both clients and cloud servers. Although some privacy-preserving deep neural network (DNN) techniques have been proposed by composing cryptographic primitives, the challenges on computational efficiency have not been fully addressed due to the complexity of DNN models and the expensive cryptographic primitives. In this paper, we propose a novel privacy-preserving cloud-based DNN inference framework ("PROUD"), which greatly improves the computational efficiency. Finally, we conduct experiments on two datasets to validate the effectiveness and efficiency of PROUD while benchmarking with the state-of-the-art techniques.
1. INTRODUCTION
Deep neural network (DNN) models have been widely deployed in a variety of real-world applications, such as image classification [1], video recognition [2], and voice assistants (e.g., Apple Siri and Google Assistant). Meanwhile, cloud computing technologies (e.g., Microsoft Azure Machine Learning, Google Inference API, and Amazon AWS Machine Learning) have promoted deep learning as a service (DLaaS) to make DNNs widely accessible. Users can outsource their own data for inference based on the pre-trained DNN models provided by the cloud service provider.

However, severe privacy concerns may arise in such applications. First, if the data of the clients are explicitly disclosed to the cloud, sensitive personal information included in the outsourced data would be leaked. Second, if the fine-tuned DNN models are shared for inference [3], the parameters might be reconstructed by untrusted parties [4]. To address such privacy concerns, several recent works [5, 6, 7, 8] have proposed cryptographic protocols to ensure privacy in inference via garbled circuits [9] and/or homomorphic encryption [10], which rely on expensive cryptographic primitives. Consequently, such protocols may incur fairly high computation and communication overheads. Since the volume of outsourced data grows rapidly and DNN models usually require substantial computational resources in the cloud, such techniques may not be suitable for practical deployment due to limited scalability. Thus, we seek an efficient scheme to securely implement DNN inference in the cloud.

Specifically, we propose a privacy-preserving cloud-based DNN inference framework ("PROUD") by co-designing cryptographic primitives, deep learning, and cloud computing technologies. We mainly take advantage of a novel matrix permutation with ciphertext packing and parallelization to improve the computational efficiency of the linear layers. With the privacy guarantee provided via homomorphic encryption, PROUD supports all types of non-linear activation functions by leveraging an interactive paradigm. Above all, PROUD integrates cloud container technology to further improve the performance via parallel execution, and it can be readily adapted to various DNNs by configuring container images.
2. THE PROUD SYSTEM
Figure 1 illustrates the framework of the proposed system for the users (clients) and the cloud service provider (cloud server). The client locally holds the private data, which is encrypted with the client's public key and sent to the cloud server. The cloud server then initializes container instances (pre-compiled with the secure protocols, i.e., MatF and NlnF) to execute the DNN inference on the encrypted input. Finally, the client receives and decrypts the classification result.

Fig. 1. The PROUD Framework.
Automated Backend Execution. The backend system can automatically deploy the cryptographic protocols for secure data inference in the cloud. Specifically, once the server receives encrypted data from the client, it composes a configuration file to initialize a set of container instances from a pre-compiled image (containing the source code), in which the secure protocols (i.e., MatF and NlnF) are executed for DNN inference until the final result is returned, as sketched below. The automation of the backend ensures that the secure protocols can be delivered efficiently, and enables the full system to serve a large number of clients (if necessary).
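As a concrete illustration of this automated backend, the sketch below launches a set of pre-compiled container instances via the Docker SDK for Python. The image name proud-image, the PROTOCOL environment variable, and the orchestration logic are hypothetical stand-ins for exposition, not the actual PROUD source:

```python
import docker  # Docker SDK for Python (docker-py)

def launch_backend(num_instances: int, protocol: str):
    """Start pre-compiled PROUD containers that execute the secure protocols."""
    client = docker.from_env()
    containers = []
    for i in range(num_instances):
        # Each instance runs the same pre-compiled image; the subprotocol
        # (e.g., "MatF" or "NlnF") is selected via the configuration.
        c = client.containers.run(
            image="proud-image:latest",  # hypothetical image name
            environment={"PROTOCOL": protocol, "WORKER_ID": str(i)},
            detach=True,
        )
        containers.append(c)
    return containers
```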
3. THE PROUD PROTOCOL DESIGN

3.1. Problem Formulation
PROUD securely computes the DNN model on encrypted inputs in the cloud. We first denote an ℓ-layer DNN model as M = {L_i, i ∈ [1, ℓ]}, and the input video as V. The inference model M can be viewed as a complex function f(·) composing linear functions (corresponding to linear layers, e.g., convolution layers and fully-connected layers) and non-linear functions (activation functions, e.g., Sigmoid and ReLU). Denoting the inference result as S, we have:

S = f(V) = L_ℓ(L_{ℓ-1}(··· L_2(L_1(V)) ···))    (1)

Threat Model. We consider the semi-honest model, where both parties honestly execute the protocol but are curious to learn private information. PROUD preserves privacy for both parties against possible leakage: (1) the client's private input videos are not leaked to the cloud service provider; (2) the cloud service provider's DNN model (e.g., linear/non-linear weight parameters and bias values) is not revealed to the client during the computation. We also assume that all communications are executed over a secure and authenticated channel.
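For intuition, Equation (1) is simply function composition over the layer list; a minimal plaintext Python sketch follows (the layer functions below are arbitrary stand-ins, not an actual DNN):

```python
from functools import reduce

# Toy stand-ins for the layers L_1, ..., L_ell (plaintext, for intuition only).
layers = [lambda x: 2 * x,      # a "linear" layer
          lambda x: max(x, 0),  # a "non-linear" layer (ReLU)
          lambda x: x + 1]      # another "linear" layer

def f(V):
    # S = L_ell(L_{ell-1}(... L_2(L_1(V)) ...))  -- Equation (1)
    return reduce(lambda acc, layer: layer(acc), layers, V)

S = f(-3)  # 2*(-3) = -6 -> max(-6, 0) = 0 -> 0 + 1 = 1
```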
3.2. Protocol Overview

Algorithm 1 illustrates the overall PROUD protocol. In the initialization phase, the client generates a key pair and encrypts the private data V (Line 1), while the server prepares the computation of the DNN functions (Equation 1) with two subprotocols: (1) MatF for the linear functions; (2) NlnF for the non-linear activation functions (Line 2). With these two subprotocols, PROUD is jointly executed by the client and the server. Specifically, the server performs the computation of the linear layers directly on the encrypted data received from the client using the subprotocol MatF (Line 5). For the non-linear layers, the output data are sent back to the client for computation by the subprotocol NlnF (Line 6); the client then re-encodes and encrypts the data and sends them to the server for the next layer's computation. Once the computations of all the layers in the DNN model are completed, the client receives the ciphertext and decrypts it to obtain the classification result. The two subprotocols are detailed in Sections 3.3 and 3.4, respectively.

Algorithm 1: PROUD Protocol
  Input: Input data V, DNN model M
  Output: Classification result S
  1: Client: encode and encrypt V to get τ_0
  2: Server: (MatF, NlnF) ← M
  3: for i ∈ [1, ℓ] do
  4:   switch L_i do
  5:     case Linear: τ_i ← MatF(τ_{i-1})
  6:     case Non-Linear: τ_i ← NlnF(τ_{i-1})
  7: Client: decrypt τ_ℓ to get S

3.3. MatF: Secure Computation for Linear Layers

To ensure privacy for the linear layers, a naive method is to apply homomorphic encryption (HE) to the arithmetic operations on encrypted matrices (e.g., in fully-connected layers), which can be inefficient since the input data tensors are usually high-dimensional. To mitigate this issue, PROUD utilizes a novel matrix permutation method [3] to efficiently perform matrix computations with ciphertext packing and parallelization [11], where a matrix multiplication equals the sum of the component-wise products of specific permutations of the matrices themselves.

Given the input matrix V, the linear layer (weight) matrix W, and the bias parameter B, PROUD securely computes the function of a linear layer as W ∗ V + B (w.l.o.g., we consider a fully-connected layer with bias, where W and V are square matrices of size n × n). Consider a square matrix A of size n × n. To compute the multiplication, the server first derives n permutations of the matrix via the following permutations:

σ(A)_{i,j} = A_{i,i+j},   τ(A)_{i,j} = A_{i+j,j}    (2)
φ(A)_{i,j} = A_{i,j+1},   ψ(A)_{i,j} = A_{i+1,j}    (3)

where the indices are taken modulo n, and φ, ψ are the column and row shifting operations, respectively. Then, the product of W and V can be computed as:

W ∗ V = Σ_{k=0}^{n-1} W_k ⊙ V_k    (4)

where W_k = φ^k(σ(W)), V_k = ψ^k(τ(V)), ⊙ denotes the component-wise product, and φ^k (resp. ψ^k) applies the permutation φ(·) (resp. ψ(·)) k times to the matrix. We denote by permut(·) the function that computes the n permutation matrices of a given matrix, as verified in the sketch below.
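To make the permutation identity of Equation (4) concrete, the following plaintext NumPy sketch checks it against ordinary matrix multiplication (plaintext only, for exposition; in PROUD the component-wise products and sums are evaluated under HE):

```python
import numpy as np

def sigma(A):   # σ(A)_{i,j} = A_{i, i+j mod n}
    n = A.shape[0]
    return np.array([[A[i, (i + j) % n] for j in range(n)] for i in range(n)])

def tau(A):     # τ(A)_{i,j} = A_{i+j mod n, j}
    n = A.shape[0]
    return np.array([[A[(i + j) % n, j] for j in range(n)] for i in range(n)])

def phi(A, k):  # φ^k(A)_{i,j} = A_{i, j+k mod n}  (column shift, applied k times)
    return np.roll(A, -k, axis=1)

def psi(A, k):  # ψ^k(A)_{i,j} = A_{i+k mod n, j}  (row shift, applied k times)
    return np.roll(A, -k, axis=0)

def matf_plaintext(W, V, B):
    # Equation (4): W * V = sum_k φ^k(σ(W)) ⊙ ψ^k(τ(V)), then add the bias B.
    n = W.shape[0]
    return sum(phi(sigma(W), k) * psi(tau(V), k) for k in range(n)) + B

n = 4
W, V, B = (np.random.randint(0, 10, (n, n)) for _ in range(3))
assert np.array_equal(matf_plaintext(W, V, B), W @ V + B)
```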
Ciphertext Packing and Parallelization. To improve the efficiency, we also leverage vectorized homomorphic encryption (a.k.a. "ciphertext packing") [3, 5], which transforms a matrix of size d × d into a single plaintext vector via an encoding map function, denoted as Encode; the Decode function transforms the plaintext vector back to matrix form. For simplicity of notation, we denote the encryption, evaluation, and decryption functions of an HE scheme as Enc(), Eval(), and Dec(), respectively.

Then, the component-wise product (Equation 4) of the ciphertexts V_k and W_k, denoted as Enc(pk, O_k), can be securely computed with the multiplicative property of HE:

Enc(pk, O_k) = Eval(pk, Encode(W_k^{(l,m)}), Enc(pk, Encode(V_k^{(l,m)})), ∗)    (5)

where l, m ∈ [1, n] are the entry indices of the matrices W and V, and pk is the public key. The sum of all n component-wise products of the matrices W_k and V_k can then be computed using the additive property of HE, and finally the bias parameter B is added, again using the additive property of HE. The protocol is detailed in Algorithm 2.

Given the large number of plaintexts to be encrypted by ciphertext packing, we further expedite the matrix computation with parallelization [3]. To this end, we modify the encoding map function to a "1-to-1 map" such that an n-dimensional vector can be transformed into a g-tuple of square matrices of order d, where g = n/d. This parallelization technique can also be realized with parallel computation in the cloud framework (using a set of containers), which reduces the computational complexity to O(d/g) per matrix.
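A toy illustration of the packing idea, assuming a simple row-major encoding (real HEAAN encodings differ in detail): once a d × d matrix is packed into a single plaintext vector, one slot-wise ciphertext multiplication realizes all d² component-wise products of Equation (5) at once.

```python
import numpy as np

d = 4

def encode(A):  # pack a d x d matrix into one plaintext vector (row-major)
    return A.reshape(-1)

def decode(v):  # unpack the plaintext vector back into matrix form
    return v.reshape(d, d)

Wk = np.random.randn(d, d)
Vk = np.random.randn(d, d)

# One slot-wise product over the packed vectors equals the full
# component-wise matrix product W_k ⊙ V_k; under HE, this would be a
# single Eval(pk, ..., *) call on the packed ciphertexts.
packed = encode(Wk) * encode(Vk)
assert np.allclose(decode(packed), Wk * Vk)
```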
Algorithm 2: MatF
  Input: Input V, weight matrix W, bias B
  Output: O = Enc(pk, W ∗ V + B)
  1: {V_k}_{k=0}^{n-1} ← Enc(pk, Encode(permut(V)))
  2: {W_k}_{k=0}^{n-1} ← Encode(permut(W))
  3: for k ∈ [0, n-1] do
  4:   O_k ← Eval(pk, W_k^{(l,m)}, V_k^{(l,m)}, ∗)
  5: Enc(pk, O) ← Eval(pk, {O_k, k ∈ [0, n-1]}, +)
  6: return Eval(pk, Enc(pk, O), B, +)

3.4. NlnF: Secure Computation for Non-Linear Layers

The NlnF protocol securely computes the non-linear layers of DNNs. Most existing works rely on either garbled circuits [5] or replacing the activation with a square function [3], which may incur high computational overheads or reduce the accuracy. In our protocol, the computation of the non-linear function (e.g., ReLU) is executed at the client side on the decrypted data to preserve privacy. As Algorithm 3 shows, the client first decrypts the received output of MatF from the server with its private key. The client then computes the output of the non-linear function φ and returns the output to the server for the computation of the next network layer. During the execution of this protocol, the client does not leak any private information to the server, and the server does not expose its sensitive weight parameters to the client.
Algorithm 3: NlnF
  Input: Input V (from MatF), activation function φ(·)
  Output: O = φ(Decode(Dec(sk, V)))
  1: Server: sends V to the client
  2: Client: r ← Decode(Dec(sk, V))
  3: return O ← φ(r)

Security and Practicality. For the linear computations (MatF), the server learns nothing about the plaintext since all the computations are performed on ciphertexts ("no leakage" can be theoretically proven). For the non-linear computations, the client receives some encrypted intermediate results from the server and decrypts them to obtain trivial intermediate data (which do not result in privacy leakage). This release of trivial, non-private data is traded for a light-weight cryptographic protocol, which is far more efficient than cryptographic protocols built on secure polynomial approximation and/or garbled circuits. Since the protocol is composed modularly, many neural network based applications (e.g., image classification [1] and natural language processing [12]) and video learning models (e.g., C3D [2] and I3D [13]) can be readily integrated into our system. Pre-trained DNNs can be adapted with appropriate extensions and integrated into the PROUD protocol (for feature extraction and/or inference on encrypted data). Moreover, the PROUD system can be easily deployed on practical cloud platforms (e.g., AWS), since PROUD is a cloud-based system prototype.
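The round trip of Algorithm 3 can be sketched as follows with stub primitives; Enc/Dec/Encode/Decode below are hypothetical toy placeholders (not a real HE scheme), and a real deployment would use a CKKS-style library such as HEAAN:

```python
import numpy as np

d = 4

# Hypothetical stand-ins for the HE primitives (toy wrappers, no security).
def Enc(pk, v):  return ("ct", np.array(v))
def Dec(sk, ct): return ct[1]
def Encode(A):   return A.reshape(-1)
def Decode(v):   return v.reshape(d, d)

phi = lambda r: np.maximum(r, 0.0)  # the activation function, e.g., ReLU

# Server: sends the encrypted MatF output V to the client.
ct = Enc("pk", Encode(np.random.randn(d, d)))

# Client (Algorithm 3): decrypt and decode, apply φ, then re-encode and
# re-encrypt so the server can compute the next layer on ciphertexts.
r = Decode(Dec("sk", ct))
ct_next = Enc("pk", Encode(phi(r)))
```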
4. EXPERIMENTS

Experimental Setup. Our system is implemented on the NSF CloudLab platform, in which one machine works as the client and the other as the server. Both machines have eight 64-bit ARMv8 cores at 2.4 GHz and 64 GB of memory, and run Ubuntu 16.04. We implement the homomorphic encryption with HEAAN [11] (which supports efficient approximate computation over real numbers) for the secure matrix operations. We leverage Docker to develop the PROUD prototype: the container image (with all the source code) is pre-compiled with the specific functions (i.e., MatF and NlnF) in Python.

We evaluate our framework on two datasets: (1) the MNIST dataset [14], which includes 70K handwritten digit images of size 28 × 28 with gray levels in 0-255; (2) the IDC dataset for invasive ductal carcinoma (IDC) classification (IDC-negative or IDC-positive), which contains about 28K patches of 50 × 50 pixels. We employ LeNet5 [14] as the test network model. In addition, we compare the performance of PROUD with four representative schemes (CryptoNets [6], GAZELLE [5], BAYHENN [7], and DELPHI [15]) on the MNIST and IDC datasets for image classification.

Results. The results on the two datasets are shown in Tables 1 and 2, respectively. From Table 1, we observe that PROUD yields the lowest average latency (e.g., 13 times faster than GAZELLE) and the lowest communication overheads for digit classification, compared with the four existing schemes.

Table 1. Benchmarking on the MNIST dataset
Framework    Accuracy (%)   Latency (s)   Comm. (MB)
CryptoNets   81.25          1942.6        1621.3
GAZELLE      83.74          24.64         263.4
BAYHENN      83.26          9.36          67.32
DELPHI       82.72          2.47          2.95
PROUD        84.01          1.12          3.27
PROUD significantly outperforms the other schemes since we adopt a highly light-weight matrix computation scheme, compared with the existing approaches (which rely on garbled circuits or heavyweight matrix encryption). As for the classification accuracy, PROUD works almost identically to GAZELLE (in which the optimal approximation of the non-linear function achieves negligible loss relative to the original activation function). It is worth noting that CryptoNets performs the worst, since it replaces all the activation functions with square functions and all the pooling functions with sum pooling, which also greatly increases the computational overhead and the communication bandwidth (due to the larger ciphertext size). BAYHENN uses a Bayesian inference model with some randomness for the DNN, which decreases the classification accuracy to some extent. Also, DELPHI's computation overheads lie mainly in the offline preparation (heavy cryptographic computations), which reduces its online computation overhead. Table 2 shows similar results for IDC classification. All of these results illustrate that our proposed framework can significantly improve the computational efficiency of secure DNN inference compared with other state-of-the-art techniques.

Table 2. Benchmarking on the IDC dataset

We also report the latency and communication bandwidth of each step of PROUD processing one image instance in Table 3. Specifically, the client takes about 23.4 ms, including the runtime for encoding and encrypting the image. The server then initializes the DNN model in 107.2 ms (note that this step can be processed simultaneously with the first step at the client). Moreover, we observe that the DNN computation at the server dominates the latency. Regarding the communication overheads, they mainly occur when the client sends the encrypted image to the server (0.58 MB); additional communication is incurred during the DNN inference, since the NlnF protocol follows an interactive paradigm.

Table 3. Performance of PROUD on the MNIST dataset
Phase                     Latency (ms)   Comm. (MB)
Client Encode + Encrypt   23.4           0.58
Server Set Model          107.2          -
Server DNN Computation    410.8          0.34
Client Decrypt + Decode   2.7            0.03
Total                     544.1          0.95
5. RELATED WORK
The generic secure computation techniques (e.g., secure two-party computation [9, 16], fully homomorphic encryption [17], and secret sharing [18]) can be directly used to tackle the privacy concerns in DNN inference. However, such cryptographic primitives incur high computation and communication overheads. For instance, the size of the garbled circuits in MPC protocols grows exponentially as the number of parties increases, and such protocols also require multiple rounds of communication among the parties. Recently, although multiple works have improved the efficiency of FHE [19, 20, 21], the high computational costs still make it impractical for performing inference.

Therefore, it seems to be necessary to design specific protocols for secure learning, and there have been several works on designing secure protocols tailored to DNN models [6, 22, 5, 7]. SecureML [22] is one of the first systems that focuses on machine learning on encrypted data with neural network models; however, it mainly depends on generic two-party protocols with very poor performance. Jiang et al. [3] proposed an efficient secure matrix computation protocol to improve the performance of computation with neural networks; however, it only supports limited activation functions (e.g., only the square function in the case study). GAZELLE [5] builds its protocol on heavy garbled circuits, while BAYHENN [7] leverages inefficient ciphertext packing of matrices for the linear computations. Although DELPHI improves the bandwidth of the online protocol, it still depends on offline computation and neural architecture search (NAS).
6. CONCLUSION
We have proposed a novel privacy-preserving DNN inference framework with cloud container technology, which ensures both privacy protection and high efficiency under complex neural network settings. It employs efficient homomorphic matrix operations to securely execute inference interactively. Furthermore, we have designed and implemented a prototype of PROUD. Finally, we conducted experiments to evaluate the performance on two common datasets. PROUD has been shown to outperform the existing schemes, and it can be readily integrated into practical cloud infrastructure.
Acknowledgments
This work is partially supported by the NSF under Grant No. CNS-1745894. The authors would like to thank the anonymous reviewers for their constructive comments.

7. REFERENCES

[1] Weibo Liu, Zidong Wang, Xiaohui Liu, Nianyin Zeng, Yurong Liu, and Fuad E. Alsaadi, "A survey of deep neural network architectures and their applications," Neurocomputing, vol. 234, pp. 11-26, 2017.

[2] Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri, "Learning spatiotemporal features with 3D convolutional networks," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4489-4497.

[3] Xiaoqian Jiang, Miran Kim, Kristin Lauter, and Yongsoo Song, "Secure outsourced matrix computation and application to neural networks," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2018, pp. 1209-1222.

[4] Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart, "Stealing machine learning models via prediction APIs," in USENIX Security Symposium, 2016, pp. 601-618.

[5] Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan, "GAZELLE: A low latency framework for secure neural network inference," in USENIX Security Symposium, 2018, pp. 1651-1669.

[6] Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing, "CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy," in International Conference on Machine Learning, 2016, pp. 201-210.

[7] Peichen Xie, Bingzhe Wu, and Guangyu Sun, "BAYHENN: Combining Bayesian deep learning and homomorphic encryption for secure DNN inference," arXiv preprint arXiv:1906.00639, 2019.

[8] Qiao Zhang, Cong Wang, Hongyi Wu, Chunsheng Xin, and Tran V. Phuong, "GELU-Net: A globally encrypted, locally unencrypted deep neural network for privacy-preserved learning."

[9] A. C. Yao, "How to generate and exchange secrets," in Annual Symposium on Foundations of Computer Science (FOCS), Oct 1986, pp. 162-167.

[10] Pascal Paillier, "Public-key cryptosystems based on composite degree residuosity classes," in International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 1999, pp. 223-238.

[11] Jung Hee Cheon, Andrey Kim, Miran Kim, and Yongsoo Song, "Homomorphic encryption for arithmetic of approximate numbers," in International Conference on the Theory and Application of Cryptology and Information Security. Springer, 2017, pp. 409-437.

[12] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.

[13] Joao Carreira and Andrew Zisserman, "Quo vadis, action recognition? A new model and the Kinetics dataset," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6299-6308.

[14] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.

[15] Pratyush Mishra, Ryan Lehmkuhl, Akshayaram Srinivasan, Wenting Zheng, and Raluca Ada Popa, "Delphi: A cryptographic inference service for neural networks," in USENIX Security Symposium, 2020.

[16] O. Goldreich, S. Micali, and A. Wigderson, "How to play any mental game," in Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, 1987, pp. 218-229.

[17] Craig Gentry, "Fully homomorphic encryption using ideal lattices," in Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing (STOC '09), New York, NY, USA, 2009, pp. 169-178, ACM.

[18] Elette Boyle, Geoffroy Couteau, Niv Gilboa, Yuval Ishai, and Michele Orrù, "Homomorphic secret sharing: Optimizations and applications," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 2105-2122.

[19] Junfeng Fan and Frederik Vercauteren, "Somewhat practical fully homomorphic encryption," Cryptology ePrint Archive, Report 2012/144, 2012.

[20] Shai Halevi and Victor Shoup, "Faster homomorphic linear transformations in HElib," in Annual International Cryptology Conference. Springer, 2018, pp. 93-120.

[21] Shai Halevi, Yuriy Polyakov, and Victor Shoup, "An improved RNS variant of the BFV homomorphic encryption scheme," in Cryptographers' Track at the RSA Conference. Springer, 2019, pp. 83-105.

[22] Payman Mohassel and Yupeng Zhang, "SecureML: A system for scalable privacy-preserving machine learning," in 2017 IEEE Symposium on Security and Privacy (SP), 2017.