An Extension of Fano's Inequality for Characterizing Model Susceptibility to Membership Inference Attacks
Sumit Kumar Jha, Susmit Jha, Rickard Ewetz, Sunny Raj, Alvaro Velasquez, Laura L. Pullum, Ananthram Swami
University of Texas at San Antonio, TX 78249; SRI International, Menlo Park, CA 94025; University of Central Florida, Orlando, FL 32816; Oakland University, Rochester, MI 48309; Air Force Research Laboratory, Rome, NY 13441; Oak Ridge National Laboratory, Oak Ridge, TN 37831; Army Research Laboratory, Adelphi, MD 20783
Abstract
Deep neural networks have been shown to be vulnerable to membership inference attacks, wherein the attacker aims to detect whether specific input data were used to train the model. These attacks can potentially leak private or proprietary data. We present a new extension of Fano's inequality and employ it to theoretically establish that the probability of success for a membership inference attack on a deep neural network can be bounded using the mutual information between its inputs and its activations and/or outputs. This enables the use of mutual information to measure the susceptibility of a DNN model to membership inference attacks. In our empirical evaluation, we show that the correlation between the mutual information and the susceptibility of the DNN model to membership inference attacks is 0.966, 0.996, and 0.955 for the CIFAR-10, GTSRB, and SVHN models, respectively.
Deep neural network (DNN) models have achieved remarkable accuracy levels on tasks such as image classification, activity recognition, speech translation, autonomous driving, and medical diagnosis. This has fueled the emergence of a market for DNN models that could be trained on proprietary or private data and then made available to users either directly or as a service over cloud platforms. Recently, it has been shown that black-box access to a DNN model can be used to detect whether a specific data item is a member of the training data set. Such membership inference attacks (MIA) pose a significant security and privacy risk.

The "Dalenius desideratum" (Dwork 2011) was first proposed in the literature on statistical disclosure control and attempts to characterize this notion of expected privacy for training data. It states that the model should reveal no more about the input to which it is applied than would have been known about this input without applying the model. Another closely related notion of privacy considers the leak of the values of sensitive protected attributes of an input through the model's output (Fredrikson et al. 2014). But such absolute notions of privacy for all training inputs cannot be achieved by any useful model (Dwork and Naor 2010). A membership inference attack using the neural network's top-layer output was shown in (Shokri and Shmatikov 2015), and a recent improvement, incorporating the activation and gradient output of layers, was proposed in (Nasr, Shokri, and Houmansadr 2019). Techniques such as those employing differential privacy during model training have also been shown to be not immune to privacy attacks without deterioration of the model's accuracy (Rahman et al. 2018). A useful model must preserve some information about the training data to make accurate predictions. The literature on generalization in deep learning (Zhang et al. 2016; Neyshabur et al. 2017) studies a closely related problem of understanding whether the model has memorized training data or distilled a generalized model from it. Some theories of generalization in deep learning connect it to the mutual information between the input and output of the model (Shwartz-Ziv and Tishby 2017; Xu and Raginsky 2017). We make the following contributions in this paper:

• Fano's inequality establishes an information-theoretic relationship between the average information lost in a noisy channel and the probability of the categorization error (Fano 1961). We extend Fano's inequality to establish that the probability of success for a membership inference attack on a deep neural network can be bounded by an expression that depends on the mutual information between its inputs and its activations and/or outputs.

• Inspired by our theoretical results, we use the mutual information between the input and the outputs/activations of a DNN model as a metric for computing its susceptibility to membership inference attacks (MIA). Our evaluation over a set of deep learning benchmarks and membership attack methods (Shokri and Shmatikov 2015; Nasr, Shokri, and Houmansadr 2019) demonstrates that mutual information strongly correlates with the success probability of membership inference attacks. Our experimental results show that the correlation between the mutual information and MIA susceptibility is 0.966, 0.996, and 0.955 for the CIFAR-10, GTSRB, and SVHN data sets.
Figure 1: The training of a DNN model $N$ uses the data set $D$ with $n$ $d$-dimensional inputs $D_i$ and corresponding labels $Y_{c_i}$. An MIA method relies on feeding $m$ inputs $D'_i$ from $\mathcal{D} \supseteq D$ to the trained DNN model $N$ to obtain its probabilistic predictions/activations. These are, in turn, fed to the MIA model, which makes a prediction $A_i$ of whether the data input $D'_i$ is present in $D$. The ground truth of whether $D'_i$ is present in $D$ is denoted by $X_i$. The outputs/activations are denoted by $Y$. The empirical MIA success probability is computed from the predictions of the MIA model on whether the attacker-provided input data $D'_i$ belong to the training set $D$. Our derived bound on the attack success probability is computed by estimating the mutual information between the input and the activation and gradient output of the top layers of the DNN model $N$. The very high correlation demonstrates the practical utility of our theoretical bound.

MIA Attacker Model:
We consider an adversary mounting a membership inference attack against a DNN model $N$, where the adversary can have black-box (Shokri et al. 2017) or white-box (Nasr, Shokri, and Houmansadr 2019) access to the target DNN model. It can issue arbitrary queries $D'_i$ and retrieve the model's prediction $Y_i$. $\mathcal{D} \supseteq D$ is the population from which the training data set $D$ is drawn. The adversary can obtain the model output $Y$, which could be the softmax output in a black-box setting and include the activation and gradient output of the top layers in a white-box setting. The adversary can access a set of input data that are drawn independently from that population. The attacker's inputs $D'$ might contain only elements of interest to the attacker, for which it wants to infer whether these were used in training the model $N$. The adversary has no other information about whether these input data are present in the training set.

Susceptibility of a Model to MIA Attacks:
Given a specific input $D_i$ from a data set $\mathcal{D}$ and a neural network $N$ learned from the training data $D \subseteq \mathcal{D}$, an MIA attack $M$ determines whether $D_i \in D$, i.e., whether the input $D_i$ is present in the training data set $D$. Let $X = (X_1, \ldots, X_m)$ be a random variable that indicates the ground truth of whether the attack inputs are present in the training data set $D$. Here, $X_i = 1$ if the training data set $D$ contains the corresponding data $D_i$; otherwise, $X_i = 0$. $A = (A_1, \ldots, A_m)$ denotes a random variable describing whether the MIA algorithm labels the data $D_i$ as being present in the training set for model $N$. $A_i = 1$ if the MIA algorithm predicts that the input $D_i$ has been used for training; otherwise, $A_i = 0$. In this paper, we seek to answer the following question: Can we establish a theoretical lower bound on the robustness of a DNN model against MIA attacks by analyzing the mutual information between its inputs and outputs?

Key Observation:
As shown in our empirical evaluation, the observed correlations between the MIA success probability and the mutual information metric are 0.966, 0.996, and 0.955 for the CIFAR-10, GTSRB, and SVHN data sets, respectively. The fact that these correlations are close to unity suggests that we can compute the mutual information between the inputs and the outputs/activations of a model to estimate its susceptibility to MIA attacks.

Extension of Fano's Inequality:
Given a DNN model $N$, the probability $p_\alpha$ that a membership inference attack algorithm considering all inputs from a data set $\mathcal{D}$ makes more than $\alpha$ prediction errors satisfies

$$p_\alpha \;\ge\; \frac{H(D) - I(D;Y) - 1 - \log\left(\binom{|\mathcal{D}|}{0} + \cdots + \binom{|\mathcal{D}|}{\alpha}\right)}{|\mathcal{D}| - \log\left(\binom{|\mathcal{D}|}{0} + \cdots + \binom{|\mathcal{D}|}{\alpha}\right)}.$$

Here, $H(D)$ is the entropy of the training data set $D$, $|\mathcal{D}|$ denotes the size of the total data set available to the MIA algorithm, and $I(D;Y)$ denotes the mutual information between the training data $D$ and the neural network outputs/activations $Y$. Since DNNs are typically deterministic functions, we add small noise to the DNN weights to compute this mutual information (Achille, Paolini, and Soatto 2019). Only the mutual information term $I(D;Y)$ depends on the DNN model; hence, we can compute $I(D;Y)$ to determine the robustness of the model: the higher the $I(D;Y)$, the lower the robustness to MIA attacks.

Theoretical Bounds on MIA Success using an Extension of Fano's Inequality
The supervised training of a DNN model $N$ uses the data set $D$ with inputs $D_i$ and corresponding labels $Y_{c_i}$. The probabilistic output of the DNN model on the data set $D$ is denoted by $Y$. An MIA method relies on feeding inputs $D'_i$ from a data set $\mathcal{D} \supseteq D$ to the trained DNN model to obtain its probabilistic prediction (softmax layer output) $Y_i$. This is, in turn, fed to the MIA model, which makes a prediction $A_i$ of whether the data input $D'_i$ is present in $D$. The ground truth of whether $D'_i$ is present in $D$ is denoted by $X_i$. The error $\xi$ of the attack model is given by $\xi = \sum_i \mathbb{1}(X_i \neq A_i)$, where $\mathbb{1}$ is the indicator function. For a threshold $\alpha$, we can define an indicator random variable $E_\alpha$ that has the value $1$ when $\xi > \alpha$ and $0$ otherwise. We use the notation $p_\alpha = Pr(E_\alpha = 1)$ to denote the probability of the event $E_\alpha$.
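To make these definitions concrete, the short sketch below (our illustration, not part of the paper's evaluation pipeline; the toy attacker and all array names are hypothetical) computes the attack error $\xi$, the indicator $E_\alpha$, and an empirical estimate of $p_\alpha$ over repeated attack trials.

    import numpy as np

    rng = np.random.default_rng(0)

    def attack_error(X, A):
        # xi = number of membership predictions A_i that disagree
        # with the ground-truth membership bits X_i
        return int(np.sum(X != A))

    def estimate_p_alpha(trials, alpha):
        # trials: list of (X, A) pairs, one per attack run;
        # p_alpha = Pr(E_alpha = 1) = Pr(xi > alpha), estimated by counting
        errors = [attack_error(X, A) for X, A in trials]
        return float(np.mean([e > alpha for e in errors]))

    # Hypothetical example: m = 100 attack inputs, 1000 simulated runs of
    # an attacker that guesses each membership bit correctly w.p. 0.7.
    m, runs = 100, 1000
    trials = []
    for _ in range(runs):
        X = rng.integers(0, 2, size=m)         # ground-truth membership bits
        flip = rng.random(m) < 0.3             # 30% of predictions are wrong
        A = np.where(flip, 1 - X, X)           # attacker's predictions
        trials.append((X, A))

    print(estimate_p_alpha(trials, alpha=25))  # ~ Pr(more than 25 errors)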
Fano's Inequality

We briefly recall a classical result from information theory, Fano's inequality, which establishes a relationship between the average information lost in a noisy channel and the probability of the categorization error (Fano 1961).

Let $X_F$ represent the input to a noisy channel being analyzed by Fano's inequality, and let $Y_F$ represent the corresponding output of this channel. Further, let $P(x_F, y_F)$ denote the joint probability of the input and the output of this noisy channel. Suppose the random variable $e_F$ represents the occurrence of an error in the noisy channel, i.e., the approximate recovered signal $\tilde{X}_F = f(Y_F)$ is not the same as the input signal. Formally, $e_F$ corresponds to the event $X_F \neq \tilde{X}_F$. We denote the support of the random variable $X_F$ by the notation $\mathcal{X}_F$. Fano's inequality establishes a fundamental information-theoretic relationship between the conditional entropy $H(X_F \mid Y_F)$ and the probability of error $P(e_F)$ in a noisy channel:

$$H(X_F \mid Y_F) \;\le\; H(e_F) + P(e_F)\,\log(|\mathcal{X}_F| - 1). \qquad (1)$$

Our mathematical results in this paper are an extension of Fano's inequality that relates the probability of error of a membership inference attack to the mutual information between the inputs and outputs/activations of a neural network.
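As a quick sanity check (our own illustration, not from the paper), the sketch below evaluates both sides of Eqn. 1 for a 4-ary symmetric channel with a maximum a posteriori decoder $f(y) = y$; for this symmetric channel, Fano's inequality in fact holds with equality.

    import numpy as np

    def H(probs):
        # Shannon entropy in bits, ignoring zero-probability outcomes
        p = np.asarray(probs, dtype=float)
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    q, p_err = 4, 0.2                  # 4-ary symmetric channel, flip prob 0.2
    # Posterior of X given any observed Y = y: the correct symbol has
    # probability 1 - p_err; each other symbol has p_err / (q - 1).
    posterior = [1 - p_err] + [p_err / (q - 1)] * (q - 1)
    H_X_given_Y = H(posterior)         # same for every y, so equals H(X|Y)

    # The MAP decoder f(y) = y errs exactly when the channel flips the symbol.
    P_e = p_err
    fano_rhs = H([P_e, 1 - P_e]) + P_e * np.log2(q - 1)

    print(H_X_given_Y, "<=", fano_rhs) # equality for the symmetric channel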
Extension of Fano's Inequality to MIA Success

We theoretically establish a relationship between the probability of an MIA model making $\alpha$ prediction errors on a neural network $N$ and the mutual information $I(D;Y)$ between the inputs $D$ and the outputs/activations $Y$ of the neural network $N$. Our proof procedure first establishes two lemmas on the conditional entropy $H(E_\alpha, X \mid A)$, and then uses these results to prove a theorem relating MIA prediction errors to the mutual information $I(D;Y)$. Our proof of the bound on $p_\alpha$ is applicable to any classifier with input $D$ and output $Y$, not just to a neural network.

Lemma 1. $H(E_\alpha, X \mid A) = H(X \mid A)$.

Proof.
Since the error $E_\alpha$ is deterministically known given $X$ and $A$, the entropy $H(E_\alpha \mid X, A) = 0$. We can evaluate $H(E_\alpha, X \mid A)$ using the chain rule of conditional entropy:

$$H(E_\alpha, X \mid A) = H(X \mid A) + H(E_\alpha \mid X, A) \qquad (2)$$
$$= H(X \mid A) + 0 = H(X \mid A). \qquad (3)$$

Lemma 2. $H(E_\alpha, X \mid A) \;\le\; 1 + (1 - p_\alpha)\log\left(\binom{|\mathcal{D}|}{0} + \cdots + \binom{|\mathcal{D}|}{\alpha}\right) + p_\alpha |\mathcal{D}|$.

Proof.
We expand $H(E_\alpha, X \mid A)$ using the chain rule of conditional entropy:

$$H(E_\alpha, X \mid A) = H(E_\alpha \mid A) + H(X \mid E_\alpha, A). \qquad (4)$$

Now, we know that $H(E_\alpha \mid A) \le H(E_\alpha)$, as conditional entropy is no more than unconditional entropy. Further, since $E_\alpha$ is a binary-valued random variable, $H(E_\alpha) \le 1$ by the definition of entropy (logarithms are base 2 throughout). Thus, we can rewrite Eqn. 4 as follows:

$$H(E_\alpha, X \mid A) \;\le\; 1 + H(X \mid E_\alpha, A). \qquad (5)$$

We can expand the second term $H(X \mid E_\alpha, A)$ by splitting $E_\alpha$ into the two cases $E_\alpha = 0$ and $E_\alpha = 1$:

$$H(X \mid E_\alpha, A) = Pr(E_\alpha = 0)\, H(X \mid E_\alpha = 0, A) + Pr(E_\alpha = 1)\, H(X \mid E_\alpha = 1, A). \qquad (6)$$

We can simplify the above expression by obtaining bounds on the quantity $H(X \mid E_\alpha = 0, A)$. If $E_\alpha = 0$, the random variable $X$ can differ from the random variable $A$ in at most $\alpha$ positions. Thus, given a particular value of the random variable $A$, the random variable $X$ can take at most $\binom{|\mathcal{D}|}{0} + \cdots + \binom{|\mathcal{D}|}{\alpha} = V(\alpha)$ values. The highest entropy is achieved when all these values are equally likely, i.e., $H(X \mid E_\alpha = 0, A) \le -\sum_{j=1}^{V(\alpha)} \frac{1}{V(\alpha)} \log \frac{1}{V(\alpha)} = \log V(\alpha)$. Hence, Eqn. 6 can be rewritten as:

$$H(X \mid E_\alpha, A) \;\le\; (1 - p_\alpha) \log V(\alpha) + p_\alpha\, H(X \mid E_\alpha = 1, A). \qquad (7)$$

In the above equation, we have used $p_\alpha$ as a shorthand for the probability $Pr(E_\alpha = 1)$. Since $X$ can take at most $2^{|\mathcal{D}|}$ different values, the term $H(X \mid E_\alpha = 1, A)$ on the right can be upper bounded by $\log 2^{|\mathcal{D}|} = |\mathcal{D}|$ using the definition of entropy. Thus, Eqn. 7 can be simplified as:

$$H(X \mid E_\alpha, A) \;\le\; (1 - p_\alpha) \log V(\alpha) + p_\alpha |\mathcal{D}|. \qquad (8)$$

Putting together Eqns. 5 and 8, we get the following:

$$H(E_\alpha, X \mid A) \;\le\; 1 + (1 - p_\alpha) \log V(\alpha) + p_\alpha |\mathcal{D}|. \qquad (9)$$

Theorem 1. Given a neural network $N$ and an MIA model that considers all inputs from a data set $\mathcal{D}$ and only observes the outputs/activations $Y$ of the neural network $N$, the probability of such an MIA model making more than $\alpha$ prediction errors satisfies

$$p_\alpha \;\ge\; \frac{H(D) - I(D;Y) - 1 - \log\left(\binom{|\mathcal{D}|}{0} + \cdots + \binom{|\mathcal{D}|}{\alpha}\right)}{|\mathcal{D}| - \log\left(\binom{|\mathcal{D}|}{0} + \cdots + \binom{|\mathcal{D}|}{\alpha}\right)}.$$

Here, $H(D)$ is the entropy of the training data set $D$, $|\mathcal{D}|$ denotes the size of the total data set available to the MIA, and $I(D;Y)$ denotes the mutual information between the training data $D$ and the outputs/activations $Y$ of the neural network.

Proof. Putting together the results from Lemma 1 and Lemma 2, we obtain the following:

$$H(X \mid A) \;\le\; 1 + (1 - p_\alpha) \log V(\alpha) + p_\alpha |\mathcal{D}|$$
$$\implies\; p_\alpha \;\ge\; \frac{H(X \mid A) - 1 - \log V(\alpha)}{|\mathcal{D}| - \log V(\alpha)}. \qquad (10)$$

Note that $X$ is determined given the training data $D$ used to train the neural network $N$; hence, $H(X \mid D, A) = 0$. Thus, using the chain rule of conditional entropy, we get $H(D, X \mid A) = H(D \mid A) + H(X \mid D, A) = H(D \mid A) + 0 = H(D \mid A)$. Also, applying the chain rule of conditional entropy in the other order, we get $H(D, X \mid A) = H(X \mid A) + H(D \mid X, A)$. Combining these two results, we obtain $H(X \mid A) = H(D \mid A) - H(D \mid X, A)$. Putting this together with Eqn.
10, we obtain the following:

$$p_\alpha \;\ge\; \frac{H(D \mid A) - H(D \mid X, A) - 1 - \log V(\alpha)}{|\mathcal{D}| - \log V(\alpha)}$$
$$\;\ge\; \frac{H(D) - I(D;A) - H(D \mid X, A) - 1 - \log V(\alpha)}{|\mathcal{D}| - \log V(\alpha)} \quad \text{as } I(D;A) = H(D) - H(D \mid A)$$
$$\;\ge\; \frac{H(D) - I(D;A) - 1 - \log V(\alpha)}{|\mathcal{D}| - \log V(\alpha)} \quad \text{since } H(D \mid X, A) = 0. \qquad (11)$$
Also, since $Y$ is obtained from $D$ by using the neural network $N$, and the adversarial prediction $A$ is obtained from the neural network response $Y$, the data processing inequality implies that $I(D;A) \le I(D;Y)$. Applying these results to Eqn. 11, we get the following:

$$p_\alpha \;\ge\; \frac{H(D) - I(D;Y) - 1 - \log V(\alpha)}{|\mathcal{D}| - \log V(\alpha)}. \qquad (12)$$

The training of a neural network does not influence the entropy of the training data set $H(D)$ or the size of the complete data set $\mathcal{D}$ used by the membership inference attack. Our analysis shows that the probability of a membership inference attack making more than $\alpha$ prediction errors depends on the mutual information $I(D;Y)$ between the inputs and the outputs/activations of a neural network. Thus, the mutual information between the inputs and the outputs/activations of a neural network can be used to characterize its susceptibility to membership inference attacks.

Example 1 (Theorem 1 with $I(D;Y) = 0$, $\alpha = c$ where $c$ is a constant such that $c \ll |\mathcal{D}|$, and $H(D) = |\mathcal{D}|$). Consider an untrained neural network such that the mutual information between its input $D$ and its output $Y$ is zero. Further, assume that $H(D) = |\mathcal{D}|$. Then, Theorem 1 states that the probability $p_\alpha$ of a membership inference attack making more than $c$ prediction errors is:

$$p_\alpha \;\ge\; \frac{H(D) - I(D;Y) - 1 - \log V(c)}{|\mathcal{D}| - \log V(c)} \;=\; \frac{|\mathcal{D}| - 1 - \log V(c)}{|\mathcal{D}| - \log V(c)} \quad \text{since } I(D;Y) = 0 \text{ and } H(D) = |\mathcal{D}|$$
$$\;=\; 1 - \frac{1}{|\mathcal{D}| - \log V(c)}.$$

As the data set becomes large, i.e., $|\mathcal{D}| \to \infty$, we have $p_\alpha \to 1$ for $\alpha = c \ll |\mathcal{D}|$; i.e., the membership inference attack will almost surely make at least $c$ prediction errors if $I(D;Y) = 0$ and $H(D) = |\mathcal{D}|$.

Example 1 shows how the probability bound established by Theorem 1 ties in with our intuition in the specific setting of a poorly trained neural network with $I(D;Y) = 0$. Now, we look at another example of a neural network where $I(D;Y) = |\mathcal{D}|/c$ for some constant $c > 1$.

Example 2 (Theorem 1 with $I(D;Y) = |\mathcal{D}|/c$ for some constant $c > 1$, $\alpha = 0$, and $H(D) = |\mathcal{D}|$). Consider a neural network whose mutual information is given by $I(D;Y) = |\mathcal{D}|/c$. Applying Theorem 1, the probability of making one or more prediction errors is:

$$p_\alpha \;\ge\; \frac{H(D) - I(D;Y) - 1 - \log \binom{|\mathcal{D}|}{0}}{|\mathcal{D}| - \log \binom{|\mathcal{D}|}{0}} \;=\; \frac{|\mathcal{D}| - \frac{|\mathcal{D}|}{c} - 1}{|\mathcal{D}|} \quad \text{since } I(D;Y) = \frac{|\mathcal{D}|}{c} \text{ and } H(D) = |\mathcal{D}|$$
$$\;=\; 1 - \frac{1}{c} - \frac{1}{|\mathcal{D}|}.$$

Thus, according to Theorem 1, a membership inference attack may make at least one prediction error with probability $1 - \frac{1}{c}$ as $|\mathcal{D}| \to \infty$.
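The bound in Theorem 1 is straightforward to evaluate numerically. The sketch below (our illustration; the function and variable names are ours) computes the lower bound on $p_\alpha$ with entropies in bits, and reproduces the limiting behavior of Examples 1 and 2.

    from math import comb, log2

    def mia_error_lower_bound(H_D, I_DY, n_total, alpha):
        # Theorem 1: p_alpha >= (H(D) - I(D;Y) - 1 - log2 V(alpha))
        #                       / (|data| - log2 V(alpha)),
        # with V(alpha) = C(n,0) + ... + C(n,alpha) and entropies in bits.
        log_V = log2(sum(comb(n_total, j) for j in range(alpha + 1)))
        return (H_D - I_DY - 1 - log_V) / (n_total - log_V)

    n = 10_000
    # Example 1: I(D;Y) = 0, H(D) = |data|, alpha = c = 5  ->  bound near 1.
    print(mia_error_lower_bound(H_D=n, I_DY=0, n_total=n, alpha=5))

    # Example 2: I(D;Y) = |data|/c with c = 4, alpha = 0  ->  bound ~ 1 - 1/c.
    print(mia_error_lower_bound(H_D=n, I_DY=n / 4, n_total=n, alpha=0))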
Measuring Mutual Information: The entropy of any $d$-dimensional random variable $x$ can be computed using a non-parametric estimator (Gao, Ver Steeg, and Galstyan 2015) based on $k$-nearest neighbors (kNN), with a correction applied for the local non-uniformity of the underlying joint distribution of the $d$ features. A simple kNN-based estimator for the entropy from samples $x_1, x_2, \ldots, x_n$ is $H(x) = -\frac{1}{n} \sum_{i=1}^{n} \log p_k(x_i)$, where the probability density is given by $p_k(x_i) = \frac{k}{n-1} \cdot \frac{\Gamma(d/2+1)}{\pi^{d/2}} \cdot r_k(x_i)^{-d}$. Here, $r_k(x_i)$ is the distance between $x_i$ and its $k$-th nearest neighbor in the data set. This can be used to compute the entropies of the training data $H(D)$, $H(Y)$, and $H(D, Y)$. The empirical estimate of the mutual information between the training inputs and the outputs/activations of a DNN model is then obtained as $I(D;Y) = H(D) + H(Y) - H(D,Y)$.
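A minimal implementation sketch of this kind of estimator follows (our code, not the authors'; it uses the Kozachenko–Leonenko digamma corrections, reports entropies in nats, and omits the local non-uniformity correction of Gao, Ver Steeg, and Galstyan (2015)).

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.special import digamma, gammaln

    def knn_entropy(x, k=3):
        # kNN entropy estimate in nats; the paper's plain estimator
        # corresponds to this up to the standard digamma corrections.
        x = np.asarray(x, dtype=float)
        n, d = x.shape
        r = cKDTree(x).query(x, k=k + 1)[0][:, k]   # distance to k-th neighbor
        log_c_d = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # unit-ball volume
        return digamma(n) - digamma(k) + log_c_d + d * np.mean(np.log(r + 1e-12))

    def mutual_information(d_samples, y_samples, k=3):
        # I(D;Y) = H(D) + H(Y) - H(D,Y), each term estimated from samples
        joint = np.hstack([d_samples, y_samples])
        return (knn_entropy(d_samples, k) + knn_entropy(y_samples, k)
                - knn_entropy(joint, k))

    # Hypothetical smoke test: correlated Gaussians with known ground truth.
    rng = np.random.default_rng(0)
    d = rng.normal(size=(5000, 1))
    y = d + 0.5 * rng.normal(size=(5000, 1))
    print(mutual_information(d, y))  # analytic value 0.5*ln(5) ~ 0.80 nats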
Related Work

We survey related work on membership inference attacks and discuss privacy-preserving approaches to machine learning. We also sketch the relationship between regularization, mutual information, and generalization in deep neural networks.
Membership Inference Attacks
A membership inference attack on neural networks essentially generalizes the well-studied problem of identifying whether a specific data record is present in a data set given some statistic about this data set (Shokri et al. 2017; Nasr, Shokri, and Houmansadr 2019; Jacobs et al. 2009; Sankararaman et al. 2009). This is a severe privacy concern. For example, membership in the training data set of a model associated with an addiction or disease can reveal otherwise private information about the patient (Liu et al. 2019; Pyrgelis, Troncoso, and Cristofaro 2017). A number of MIA methods have been proposed recently in the literature. One approach (Shokri et al. 2017) trains a number of shadow models independently using subsets of the training data set; the final attacker model learns from all these shadow models and can then predict whether a data element was in or out of the target model's training data (a sketch of this pipeline is given below). Another, training-time, attack is based on augmenting the training data with additional synthetic inputs whose labels encode information that the model needs to leak (Song, Ristenpart, and Shmatikov 2017); no other component of the training pipeline is perturbed. Yet another approach (Melis et al. 2019) exploits the fact that deep neural networks construct multiple internal representations of all kinds of features related to the input data, including those irrelevant to the current task. These attacks have also been extended to collaborative and federated settings (Melis et al. 2019). Robust learning techniques that defend against adversarial attacks have been shown to increase susceptibility to MIA attacks (Song, Shokri, and Mittal 2019). Finally, these attacks have also been shown to be largely transferable (Truex et al. 2018). These observations further underline the need for addressing MIA attacks.
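The following is a highly simplified sketch of the shadow-model idea of Shokri et al. (2017), not their implementation; the shadow models and data splits are assumed to be supplied by the caller and to expose an sklearn-style predict_proba interface.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def train_attack_model(shadow_models, shadow_splits):
        # shadow_models[i] is a classifier trained on the "in" half of
        # shadow_splits[i] = (x_in, x_out). Membership labels are known
        # by construction: 1 for x_in (member), 0 for x_out (non-member).
        feats, labels = [], []
        for model, (x_in, x_out) in zip(shadow_models, shadow_splits):
            feats.append(model.predict_proba(x_in))
            labels.append(np.ones(len(x_in)))
            feats.append(model.predict_proba(x_out))
            labels.append(np.zeros(len(x_out)))
        attack = RandomForestClassifier(n_estimators=100)
        attack.fit(np.vstack(feats), np.concatenate(labels))
        return attack  # maps a softmax vector to a membership prediction A_i

    # Usage against a target model N with black-box access:
    #   A = attack.predict(N.predict_proba(attacker_inputs))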
Privacy-Preserving Machine Learning
Differential privacy is used for privacy-preserving statistical analysis over sensitive data, where the privacy–utility trade-off is controlled by a privacy budget parameter. Differential privacy can provide formal guarantees that the model trained on a given dataset will produce statistically similar predictions as a model trained on a different dataset that differs in exactly one instance (Dwork, Roth et al. 2014). Differential training privacy has been proposed as a way to measure model susceptibility by computing this worst-case difference among all training data points (Long, Bindschaedler, and Gunter 2017). These approaches are particularly useful for simple convex machine learning algorithms (Chaudhuri, Monteleoni, and Sarwate 2011; Zhang, Rubinstein, and Dimitrakakis 2016; Jayaraman et al. 2018). But differentially private deep learning often requires a large privacy budget (Shokri and Shmatikov 2015), with ongoing efforts to reduce it (Abadi et al. 2016; Hynes, Cheng, and Song 2018). Differential privacy methods can provide worst-case bounds on the privacy loss, but these do not provide an understanding of privacy attacks in practice. Membership and attribute inference attacks, on the other hand, provide an empirical lower bound on the privacy loss of training data. The relationship between the standard worst-case definition of differential privacy and the average-case mutual-information notion is an active area of study in the security and privacy literature (Cuff and Yu 2016; Wang, Ying, and Zhang 2016). Further, MIA attacks are a restricted form of privacy attack that does not aim at discovering the training data but only at detecting the presence of given data in the training set. In contrast to the differential privacy bounds, we focus entirely on MIA attacks and formulate an information-theoretic bound on the probability of such an attack being successful, instead of characterizing worst-case privacy leakage. This allows a scalable and practical approach to measure and regulate the average-case susceptibility of DNN models to existing MIA attacks. To make DNN models more robust to privacy attacks, there are broadly two classes of techniques. The first relies on adding noise directly to the training inputs (Zhang, He, and Lee 2018) or to the stochastic gradient descent updates (Abadi et al. 2016) to control the effects of the training data on the model parameters (see the sketch below). The second class uses an aggregation of teacher ensembles (Dwork and Feldman 2018; Papernot et al. 2018; Pyrgelis, Troncoso, and Cristofaro 2017), where privacy is enforced by training each teacher on a separate subset of the training data and relying on a noisy aggregation of the teachers' responses.
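The sketch below is a schematic numpy illustration of the noisy-gradient idea in the spirit of Abadi et al. (2016), not their implementation; the hyperparameter names are ours, and a real DP-SGD implementation would also track the cumulative privacy budget.

    import numpy as np

    def noisy_sgd_step(params, per_example_grads, lr=0.1,
                       clip_norm=1.0, noise_mult=1.1, rng=None):
        # Clip each per-example gradient so that no single training record
        # dominates the update, average the clipped gradients, and add
        # Gaussian noise calibrated to the clipping norm.
        rng = rng if rng is not None else np.random.default_rng()
        clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
                   for g in per_example_grads]
        mean_grad = np.mean(clipped, axis=0)
        noise = rng.normal(0.0, noise_mult * clip_norm / len(clipped),
                           size=mean_grad.shape)
        return params - lr * (mean_grad + noise)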
Generalization and Memorization in DNNs
A desirable property of any model is a low generalization error, that is, good performance on unseen examples from the population. The connection between overfitting and membership inference attacks has been investigated (Yeom et al. 2018). Regularization techniques aimed at controlling model complexity have traditionally been used to reduce overfitting and improve generalization, but recent work has demonstrated that these regularization techniques do not reduce the susceptibility to MIA attacks (Long et al. 2018). In contrast, we use mutual information to characterize the susceptibility of DNNs to MIA attacks. One explanation of generalization in deep learning states that training initially increases the mutual information between the input and the output of the model, and then decreases the mutual information, removing relations irrelevant to the task and improving generalization (Shwartz-Ziv and Tishby 2017). A related effort focuses on the ability of a deep learning model to unintentionally memorize unique or rare sequences in the training data (Carlini et al. 2018), and uses it to measure the model's propensity for leaking training data. Prior work has shown that deep learning models can be trained to perfectly fit completely random data (Zhang et al. 2016), which indicates the high memorization capacity of DNNs. Hence, MIA attacks are not an oddity of a particular learning technique or model, but a result of the widely observed memorization in deep learning models. Our approach of relating the MIA susceptibility of models to mutual information is, thus, a first step in a promising direction that connects privacy and generalization of DNNs.
Figure 2: Mutual information between the inputs and the output layers of a neural network correlates strongly with the success probability of membership inference attack models. The Pearson correlations between the mutual information and the success probability of a contemporary MIA attack (Shokri and Shmatikov 2015) are 0.966, 0.996, and 0.955 for neural networks trained on the CIFAR-10, GTSRB, and SVHN data sets, respectively. Our evaluation considers three different variants of the MIA attack (Attack 1, Attack 2, and Attack 3). The three panels plot the attack success probability per model for CIFAR-10 (models with MI = 2.01, 1.52, 1.24, and 1.17), GTSRB (MI = 2.29, 1.62, 1.32, and 1.21), and SVHN (MI = 1.09, 0.88, 0.51, and 0.40).

Data Set | Correlation between MI and Attack Probability
CIFAR-10 | 0.966
GTSRB | 0.996
SVHN | 0.955

Empirical Evaluation

Our experiments are performed on a system with 128 GB RAM, a 16-core AMD processor, and 2 NVIDIA RTX 2080 Ti GPUs running Ubuntu 20.04. Three popular data sets are used for our investigations: (i) CIFAR-10 (Krizhevsky, Nair, and Hinton 2014), (ii) SVHN (Netzer et al. 2011), and (iii) GTSRB (Houben et al. 2013). In our experimental evaluation, we investigate whether we can use the mutual information between the input and output of a DNN model to estimate the success probability of MIA attacks on the model.
CIFAR-10:
We study 4 DNN models for the CIFAR-10 data set, with the mutual information decreasing from 2.01 to 1.17 nats across the models. Using three different variants of a contemporary membership inference attack (Nasr, Shokri, and Houmansadr 2019) with different numbers of shadow models, the success probability of each of the three attack variants decreases along with the mutual information (Figure 2). A decrease in mutual information is coupled with a decrease in the success probability of the MIA model. The Pearson correlation between the mutual information and the attack probability for CIFAR-10 is 0.966.

GTSRB:
The GTSRB data set is also used to train four different neural network models, with the mutual information decreasing from 2.29 to 1.21 nats. As shown in Fig. 2, the success probability of the most powerful MIA model falls correspondingly as the mutual information decreases.

SVHN:
A similar reduction in the success of MIA models is observed on the SVHN data set: as the mutual information falls from 1.09 to 0.40 nats, the success probability of the most successful MIA model falls as well. We find that the Pearson correlations between the mutual information and the success probability of a contemporary MIA attack (Nasr, Shokri, and Houmansadr 2019) are 0.966, 0.996, and 0.955 for neural networks trained on the CIFAR-10, GTSRB, and SVHN data sets, respectively. The strongly positive Pearson correlation across data sets confirms our theoretical finding that mutual information is related to the success probability of MIA models.
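For completeness, the correlations reported above are ordinary Pearson correlations between the per-model mutual information values and the measured attack success probabilities. The sketch below shows the computation for the CIFAR-10 models; the MI values come from Figure 2, but the attack_success values are placeholders, not the paper's measurements.

    import numpy as np
    from scipy.stats import pearsonr

    # MI values (nats) for the four CIFAR-10 models, from Figure 2.
    mi = np.array([2.01, 1.52, 1.24, 1.17])
    # Placeholder attack success probabilities; substitute measured values.
    attack_success = np.array([0.75, 0.62, 0.55, 0.52])

    r, _ = pearsonr(mi, attack_success)
    print(f"Pearson correlation: {r:.3f}")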
Broader Applicability of Our Lower Bound

Figure 3: The approximate lower bound (LB) on $p_\alpha$ is positive only when $H(D \mid Y)$ exceeds a threshold (left) and $z(\alpha) = \log\left(\binom{|\mathcal{D}|}{0} + \cdots + \binom{|\mathcal{D}|}{\alpha}\right) / |\mathcal{D}|$ exceeds another threshold (right).

While Theorem 1 enables a theoretical understanding of the relationship between the mutual information $I(D;Y)$ and MIA success, in this section we investigate an orthogonal question: when does Theorem 1 produce positive lower bounds on $p_\alpha$?

Figure 3 (left) shows a plot of a threshold on the ratio of the conditional entropy $H(D \mid Y)$ to the size of the data set $|\mathcal{D}|$ such that conditional entropy values higher than this approximate threshold are required for a positive lower bound on $p_\alpha$ in Theorem 1. We can verify that the results agree with our intuition for various values of the ratio of the number of errors $\alpha$ to the size of the data set $|\mathcal{D}|$. For example, if we are only interested in a small number of errors $\alpha \ll |\mathcal{D}|$, our lower bound on $p_\alpha$ is positive when $H(D \mid Y) > |\mathcal{D}|/2$, i.e., when the conditional entropy $H(D \mid Y)$ is comparable to at least half the size of the data set $|\mathcal{D}|$. On the other hand, as the size of the data set increases and the number of errors becomes a larger fraction of $|\mathcal{D}|$, the curves corresponding to the threshold show that the conditional entropy $H(D \mid Y)$ needs to become a correspondingly larger fraction of $|\mathcal{D}|$ for our lower bound to produce a positive result. This again makes intuitive sense, as the conditional entropy must be high in order for even the best membership inference attack to suffer a large number of errors.

Using the identity $H(D) - I(D;Y) = H(D \mid Y)$, the bound in Theorem 1 can also be stated as

$$p_\alpha \;\ge\; \frac{c - 1/|\mathcal{D}| - z(\alpha)}{1 - z(\alpha)}, \quad \text{where } c = \frac{H(D \mid Y)}{|\mathcal{D}|} \text{ and } z(\alpha) = \frac{\log\left(\binom{|\mathcal{D}|}{0} + \cdots + \binom{|\mathcal{D}|}{\alpha}\right)}{|\mathcal{D}|}.$$

Figure 3 (right) shows how the value of $z(\alpha)$ required for a positive lower bound changes with the ratio $\alpha/|\mathcal{D}|$ in one setting.

In summary, our lower bound on $p_\alpha$ is useful in a large non-degenerate regime where the conditional entropy $H(D \mid Y)$ is not too low compared to the size of the data set $|\mathcal{D}|$. If the conditional entropy $H(D \mid Y)$ is too low, our bound is not positive, and this ties in well with our intuition that a good adversary can launch embarrassingly successful membership inference attacks in this setting.

Conclusions

Fano's inequality is a classical information-theoretic result that relates the probability of an error in a channel to the conditional entropy between the input and output of a noisy channel. We present a new extension of Fano's inequality (Fano 1961) that establishes a bound on the success probability of a membership inference attack using the mutual information between the inputs and the outputs/activations of a DNN model. We mathematically prove that our mutual-information-based bound can measure a DNN model's susceptibility to any membership attack. In our empirical evaluation, the correlation between the mutual information and the susceptibility of the DNN model to membership inference attacks is 0.966, 0.996, and 0.955 for CIFAR-10, GTSRB, and SVHN, respectively. Thus, we address the challenge of making DNNs less susceptible to membership inference attacks and reduce the risk of an inadvertent leak of information about the training data.

Several directions for future research remain open. While this paper focuses on the use of mutual information as a susceptibility metric, another interesting line of research may focus on computing $p_\alpha$ directly as a metric of susceptibility to membership inference attacks.
Since the mutual information $I(D;Y)$ is the only term in the lower bound of Theorem 1 that arises from the design and training of the neural network, we have chosen to focus on mutual information as a susceptibility metric. Because of recent advances in neural-network-based estimation of mutual information, our results on $I(D;Y)$ as a metric can be used to create an effective regularization approach for training neural networks that are more robust against membership inference attacks. Another interesting direction of research is a deeper understanding of the tightness of our bound based on mutual information. While we have presented experimental evidence on three different data sets showing that mutual information is a good metric for measuring model susceptibility to membership inference attacks, a theoretical investigation into the tightness of the bound may lead to deeper insights.

Ethical and Broader Impact

There is an emerging trend of providing DNN models to users either directly or through cloud services, where the model has been trained on proprietary or private data. The recently proposed membership inference attacks show that the user of the model can infer whether a training data item was used in a model or not. MIA attacks violate the expected privacy of the individual participants contributing to the training data and cause unauthorized leakage of the training data set, which could be of business value or even a trade secret. For example, membership in the training data set of a model associated with a disease or addiction can reveal otherwise private information about a patient. As yet another example, consider an anomaly detection DNN model for an engine made available to customers by the engine manufacturer; the discovery of the training data employed for anomaly detection could leak crucial proprietary information. These concerns create a hurdle to the broader adoption of DNN models. We address this socially important challenge in this paper. We present a way to analyze a machine learning model to understand its susceptibility to membership inference attacks using the mutual information between the inputs and the outputs of the model. Our approach will make machine learning models more robust and privacy-aware, and will thus have a positive impact on society.
References
Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H. B.; Mironov, I.; Talwar, K.; and Zhang, L. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 308–318.

Achille, A.; Paolini, G.; and Soatto, S. 2019. Where is the information in a deep neural network? arXiv preprint arXiv:1905.12213.

Carlini, N.; Liu, C.; Kos, J.; Erlingsson, Ú.; and Song, D. 2018. The secret sharer: Measuring unintended neural network memorization & extracting secrets. arXiv preprint arXiv:1802.08232.

Chaudhuri, K.; Monteleoni, C.; and Sarwate, A. D. 2011. Differentially private empirical risk minimization. Journal of Machine Learning Research 12: 1069–1109.

Cuff, P.; and Yu, L. 2016. Differential privacy as a mutual information constraint. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 43–54.

Dwork, C. 2011. A firm foundation for private data analysis. Communications of the ACM 54(1): 86–95.

Dwork, C.; and Feldman, V. 2018. Privacy-preserving prediction. arXiv preprint arXiv:1803.10266.

Dwork, C.; and Naor, M. 2010. On the difficulties of disclosure prevention in statistical databases or the case for differential privacy. Journal of Privacy and Confidentiality 2(1).

Dwork, C.; Roth, A.; et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9(3–4): 211–407.

Fano, R. M. 1961. Transmission of Information: A Statistical Theory of Communications. American Journal of Physics 29: 793–794.

Fredrikson, M.; Lantz, E.; Jha, S.; Lin, S.; Page, D.; and Ristenpart, T. 2014. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In 23rd USENIX Security Symposium, 17–32.

Gao, S.; Ver Steeg, G.; and Galstyan, A. 2015. Efficient estimation of mutual information for strongly dependent variables. In Artificial Intelligence and Statistics, 277–286.

Houben, S.; Stallkamp, J.; Salmen, J.; Schlipsing, M.; and Igel, C. 2013. Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark. In The 2013 International Joint Conference on Neural Networks (IJCNN), 1–8. IEEE.

Hynes, N.; Cheng, R.; and Song, D. 2018. Efficient deep learning on multi-source private data. arXiv preprint arXiv:1807.06689.

Jacobs, K. B.; Yeager, M.; Wacholder, S.; Craig, D.; Kraft, P.; Hunter, D. J.; Paschal, J.; Manolio, T. A.; Tucker, M.; Hoover, R. N.; et al. 2009. A new statistic and its power to infer membership in a genome-wide association study using genotype frequencies. Nature Genetics 41(11): 1253–1257.

Jayaraman, B.; Wang, L.; Evans, D.; and Gu, Q. 2018. Distributed learning without distress: Privacy-preserving empirical risk minimization. In Advances in Neural Information Processing Systems, 6343–6354.

Krizhevsky, A.; Nair, V.; and Hinton, G. 2014. The CIFAR-10 dataset. URL http://www.cs.toronto.edu/~kriz/cifar.html.

Liu, G.; Wang, C.; Peng, K.; Huang, H.; Li, Y.; and Cheng, W. 2019. SocInf: Membership inference attacks on social media health data with machine learning. IEEE Transactions on Computational Social Systems 6(5): 907–921.

Long, Y.; Bindschaedler, V.; and Gunter, C. A. 2017. Towards measuring membership privacy. arXiv preprint arXiv:1712.09136.

Long, Y.; Bindschaedler, V.; Wang, L.; Bu, D.; Wang, X.; Tang, H.; Gunter, C. A.; and Chen, K. 2018. Understanding membership inferences on well-generalized learning models. arXiv preprint arXiv:1802.04889.

Melis, L.; Song, C.; De Cristofaro, E.; and Shmatikov, V. 2019. Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE Symposium on Security and Privacy (SP), 691–706. IEEE.

Nasr, M.; Shokri, R.; and Houmansadr, A. 2019. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy (SP), 739–753. IEEE.

Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; and Ng, A. Y. 2011. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011. URL http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf.

Neyshabur, B.; Bhojanapalli, S.; McAllester, D.; and Srebro, N. 2017. Exploring generalization in deep learning. In Advances in Neural Information Processing Systems, 5947–5956.

Papernot, N.; Song, S.; Mironov, I.; Raghunathan, A.; Talwar, K.; and Erlingsson, Ú. 2018. Scalable private learning with PATE. arXiv preprint arXiv:1802.08908.

Pyrgelis, A.; Troncoso, C.; and Cristofaro, E. D. 2017. Knock knock, who's there? Membership inference on aggregate location data. arXiv preprint.

Rahman, M. A.; Rahman, T.; Laganière, R.; Mohammed, N.; and Wang, Y. 2018. Membership inference attack against differentially private deep learning model. Transactions on Data Privacy 11(1): 61–79.

Sankararaman, S.; Obozinski, G.; Jordan, M. I.; and Halperin, E. 2009. Genomic privacy and limits of individual detection in a pool. Nature Genetics 41(9): 965–967.

Shokri, R.; and Shmatikov, V. 2015. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 1310–1321.

Shokri, R.; Stronati, M.; Song, C.; and Shmatikov, V. 2017. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy, 3–18. IEEE.

Shwartz-Ziv, R.; and Tishby, N. 2017. Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810.

Song, C.; Ristenpart, T.; and Shmatikov, V. 2017. Machine learning models that remember too much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 587–601.

Song, L.; Shokri, R.; and Mittal, P. 2019. Privacy risks of securing machine learning models against adversarial examples. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 241–257.

Truex, S.; Liu, L.; Gursoy, M. E.; Yu, L.; and Wei, W. 2018. Towards demystifying membership inference attacks. arXiv preprint arXiv:1807.09173.

Wang, W.; Ying, L.; and Zhang, J. 2016. On the relation between identifiability, differential privacy, and mutual-information privacy. IEEE Transactions on Information Theory.

Xu, A.; and Raginsky, M. 2017. Information-theoretic analysis of generalization capability of learning algorithms. In Advances in Neural Information Processing Systems, 2524–2533.

Yeom, S.; Giacomelli, I.; Fredrikson, M.; and Jha, S. 2018. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), 268–282. IEEE.

Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; and Vinyals, O. 2016. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530.

Zhang, T.; He, Z.; and Lee, R. B. 2018. Privacy-preserving machine learning through data obfuscation. arXiv preprint arXiv:1807.01860.

Zhang, Z.; Rubinstein, B. I.; and Dimitrakakis, C. 2016. On the differential privacy of Bayesian inference. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence.