Analyzing Information Leakage of Updates to Natural Language Models
Marc Brockschmidt, Boris Köpf, Olga Ohrimenko, Andrew Paverd, Victor Rühle, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin
Marc Brockschmidt [email protected]
Boris Köpf [email protected]
Olga Ohrimenko [email protected] University of Melbourne
Andrew Paverd [email protected]
Victor Rühle [email protected]
Shruti Tople [email protected]
Lukas Wutschitz [email protected]
Santiago Zanella-Béguelin [email protected]
ABSTRACT
To continuously improve quality and reflect changes in data, machine learning applications have to regularly retrain and update their core models. We show that a differential analysis of language model snapshots before and after an update can reveal a surprising amount of detailed information about changes in the training data. We propose two new metrics—differential score and differential rank—for analyzing the leakage due to updates of natural language models. We perform leakage analysis using these metrics across models trained on several different datasets using different methods and configurations. We discuss the privacy implications of our findings, propose mitigation strategies and evaluate their effect.
Over the last few years, deep learning has made sufficient progress to be integrated into intelligent, user-facing systems, which means that machine learning models are now part of the software development lifecycle. As part of this cycle, models are regularly updated to accommodate three different scenarios:
• data update, to improve performance when new and more data becomes available;
• data specialization, to fine-tune a model towards a specific dataset, or to handle distributional shift as usage patterns change; or
• data deletion, to respect user requests for removal of their data.
Motivated by these scenarios, we study privacy implications for text data that is added (or removed) during retraining of generative natural language models (LMs). Specifically, we consider an adversary who obtains access to multiple snapshots of a model and wishes to learn information about differences in the data used to train them. This threat model is motivated by the combination of three factors: (1) the current trend to fine-tune pretrained public high-capacity LMs to smaller private datasets; (2) the established ability of such LMs to memorize out-of-distribution training samples [4]; and (3) the widespread deployment of LMs to end-user systems (e.g., predictive keyboards on smartphones), allowing adversaries to analyze them in detail.
We show that data that is added or removed between model updates can be extracted in this threat model, having severe implications for deploying machine learning models trained on private data. Some of the implications are counter-intuitive: for example, honoring a request to remove a user's data (as per GDPR) from the training corpus can mean that their data becomes exposed by releasing an updated model trained without it. Similarly, fine-tuning a public snapshot of a high-capacity model (e.g., BERT [6] or GPT-2 [15]) with data from a single organization exposes this additional data to anyone who obtains access to both the fine-tuned model and the original public model (e.g., employees of this organization).
In order to extract information about the difference in the data used to train two language models, we develop a novel notion of differential score. The differential score of a token sequence captures the difference between the probability that each of the two models assigns to it. The intuition is that token sequences with the highest differential score are likely to have been added during a model update. We devise an algorithm based on beam search that efficiently identifies such token sequences, even if the individual models assign low probability to them. This algorithm allows us to recover information about the difference between the datasets used for training without any background knowledge of their contents or distribution.
When given some background knowledge, the advantage of having access to two model snapshots becomes crisper. For example, we train a recurrent neural network (RNN) on 20M tokens of general Reddit comments, and update it by retraining it on these comments plus 25K tokens from 940 messages of the talk.politics.mideast newsgroup. When prompted with the word "Turkey", our algorithm produces "Turkey searched an American plane" as the 2nd most likely result, although this phrase occurs only 6 times in newsgroup messages and none in Reddit comments (that is, it represents less than 0.000002% of the training data).
An equivalent search using only the updated network does not produce this sentence among the top 10,000 results; it would take the longer prompt "Turkey searched an" for this phrase to surface to the top 100 results.
We perform experiments where we use differential score to study the effect of changes to the training data in the three scenarios mentioned above. As a proxy for the updated dataset, we perform experiments using synthetically generated sentences (or canaries) and real-world sentences from newsgroup messages. Using both canaries and real-world data, we analyze the effect on leakage of (1) different training types for model updates, ranging from retraining a model from scratch with an updated dataset to fine-tuning as is common for modern high-capacity language models; (2) the proportion of private and public data used for the update; and (3) an adversary's background knowledge. For robustness, we consider datasets of different sizes on both RNNs as well as modern transformer architectures.
Summary of Contributions.
We present the first systematic study of the privacy implications of releasing snapshots of language models trained on overlapping data. The results we obtain validate that model updates pose a substantial risk to content added to training data in terms of information leakage. Our key findings are:
• By comparing two models, an adversary can extract specific sentences or fragments of discourse from the difference between the data used to train them. This does not require any information about the training data or the model architecture, and is possible even when the change to the data is as small as 0.0001% of the original dataset. Smaller changes become exposed when given partial knowledge about the data.
• We show that analyzing two model snapshots reveals substantially more about the data that was added or removed than considering only a single snapshot at a time, as in [4].
• Adding or removing additional non-sensitive training data between model updates is not a reliable method to hide data that should be kept private.
• Training with differential privacy mitigates the attack, but incurs substantial computational cost and reduces the utility of the trained models.
• Restricting access to the model by providing clients with a subset of prediction results is a promising mitigation, as it reduces the effectiveness of our attack without reducing utility of the model.
These findings apply to models that are fine-tuned on a smaller dataset, as well as models that are retrained on the union of original and new data.
Structure of the Paper.
We provide background on language models and describe our adversary model and attack scenarios in the next section. We define the notion of differential score and describe how to efficiently approximate it in Section 3. In Section 4 we describe our experiments to analyze the effect of different factors on leakage. In Section 5 we investigate the source of leakage in model updates, e.g., by comparing with leakage from access to only a single model. Finally, we consider mitigation strategies in Section 6, before describing related work and concluding.
We consider machine learning models capable of generating natural language. These models are used in a variety of applications, including automatic caption generation, language translation, and next-word prediction. Generative language models usually operate on a fixed set of known tokens T (often referred to as the model's vocabulary) and are autoregressive, modeling the probability p(t_1 ... t_n) of a sequence of tokens t_1 ... t_n ∈ T^n as the product of the per-token probabilities conditional on their prefix p(t_i | t_1 ... t_{i-1}), i.e.,

$$p(t_1 \ldots t_n) = \prod_{1 \le i \le n} p(t_i \mid t_1 \ldots t_{i-1}).$$

Training an autoregressive generative language model M hence requires learning a function (which we also refer to as M) that maps token sequences of arbitrary length to a probability distribution over the vocabulary T, modeling the likelihood of each token to appear next. We will use M(t_{<i}) to denote the probability distribution over tokens computed by model M after reading the sequence t_1 ... t_{i-1} ∈ T^*, and M(t_{<i})(t_i) to denote the probability of a specific token t_i.

Given such a model M, a simple predictive screen keyboard can be implemented by feeding M the words typed so far (e.g., from the start of the current sentence) and displaying the, say, three most likely tokens as one-tap options to the user.

A variety of different architectures exist for the generation of natural language using machine learning models. The most prominent are Recurrent Neural Networks (RNNs) using Long Short-Term Memory [10] cells (or variants thereof) and the more recent Transformers [15, 21]. These architectures differ substantially in how they implement the modeling of the per-token probability distribution, but as our experiments will show, they behave nearly identically for the purposes of our analysis.

Given a model architecture, a dataset D ⊆ T^* is required as training data to obtain a concrete model. We write M_D to emphasize that a model was trained on a dataset D. Throughout the paper, we use the standard measure of perplexity

$$\mathrm{perp}_M(t_1 \ldots t_n) = p_M(t_1 \ldots t_n)^{-\frac{1}{n}}$$

of a model M on test data t_1 ... t_n, using the probability p_M(t_1 ... t_n) assigned to the sequence by model M. Unlike the more familiar accuracy, which only captures the correctness of the most probable choice, this metric captures models being "almost right." Intuitively, perplexity can be thought of as how "surprised" a model is by a next-word choice, and hence, lower perplexity values indicate a better match between data and model.

Language models are regularly updated for a variety of reasons, either by adding and/or removing data from the training set. We use the term model update to refer to any update in the parameters of the model caused by training on different data. This is distinct from an update to the model architecture, which changes the number or use of parameters. Each update creates a new version of the model, which we refer to as a snapshot.

We consider an adversary that has concurrent query access to two snapshots, M_D and M_{D'}, of a language model trained on datasets D and D' respectively, where D ⊊ D'. We write M, M' as shorthand for M_D, M_{D'}. The adversary can query the snapshots with any sequence s ∈ T^* and observe the corresponding probability distributions M(s) and M'(s). The adversary's goal is to infer information about D' \ D, the difference between D and D'. (Our findings also hold for the content that is removed, simply by swapping the models.)
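To make the quantities above concrete, the following minimal Python sketch computes the sequence probability and perplexity defined above. It assumes a hypothetical model object exposing a next_token_probs(prefix) method that returns a mapping from vocabulary tokens to conditional probabilities; this interface is illustrative and not part of the paper's implementation.

    import math

    def sequence_probability(model, tokens):
        """p(t_1 ... t_n) as the product of per-token probabilities
        conditioned on the prefix read so far."""
        prob = 1.0
        for i, token in enumerate(tokens):
            # Hypothetical interface: distribution over the vocabulary given the prefix.
            dist = model.next_token_probs(tokens[:i])
            prob *= dist[token]
        return prob

    def perplexity(model, tokens):
        """perp_M(t_1 ... t_n) = p_M(t_1 ... t_n)^(-1/n); lower is better."""
        p = sequence_probability(model, tokens)
        return p ** (-1.0 / len(tokens))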
We refer to an adversary who has access to two snapshots of the model as a snapshot attacker.

2.3 Analysis Scenarios

To guide our analysis, we focus on three concrete scenarios in which an adversary can gain concurrent access to two (or more) snapshots of a language model.
Data Updates.
Many applications require language models that reflect recent patterns in language use. For example, a predictive keyboard on a mobile device requires regular updates to suggest terms that have become more common recently (e.g., following news trends or internet memes). To achieve this, vendors often regularly retrain an (otherwise unchanged) model on an updated dataset, for example by simply adding more recent data to the training dataset. In such cases, an adversary can easily gain access to two snapshots M_D and M_{D'} with D ⊊ D' and may be interested in learning details about the update D' \ D. We will show that we can extract entire sentences from this difference by comparing M_D and M_{D'}, revealing not only aggregate user behavior, but specific conversations.

Data Specialization.
Some applications with little task-specific data build on top of generic, pretrained high-capacity language models such as GPT-2 [15]. In such settings, training starts from the pretrained model, but then uses a significantly smaller private dataset. As an example, an organization could simply use a publicly available off-the-shelf language model to create an email authoring autocompletion system. However, by additionally training the model with some historical email data, it can be adapted to organization-specific terms, acronyms and concepts. In such a scenario, if an adversary can gain access to the specialized model M', they can easily also obtain the (publicly available) model M used as a basis. We will show that by treating these as different snapshots of the same model, the adversary can extract parts of the private dataset used for specialization.

User Data Deletion.
Art. 17 of the GDPR, Right to erasure ("right to be forgotten", https://gdpr-info.eu/art-17-gdpr/), gives data owners the right to request erasure of their personal data from a party who has collected and processed it. Language models trained on emails, text messages and other user-generated content may contain personal information that a user can at any point request to delete. Though some models that use personal data may only be available internally to the data collector's organization, there are numerous scenarios where the models are released either to the public or to other users via services provided by the data collector (e.g., text prediction and auto-correct services in text editors and mobile keyboards). In order to comply with the regulation, the data collector would be required to delete the user's data and retrain any models in which this data had been used.

This scenario also falls into our adversary setting, albeit in reverse chronological order. Here the dataset D' contains the data that will be deleted, whilst D does not (i.e., the difference D' \ D represents the user's data). With access to M_D and M_{D'}, the attacker can attempt to infer the user's data. Even if the retrained model overwrites the old model, it may not be possible to erase all instances of the old model simultaneously. For example, some users may be slow to download the new version or the old model may have been copied by other parties.

Naturally, this scenario can be extended to other settings where data is deleted between model updates. Though not considered in this paper, this scenario raises an interesting question on whether deletion of data is in the user's best interest or whether it makes their data more susceptible to information leakage.

We introduce two metrics called differential rank and differential score to analyze data exposure between two snapshots of a generative language model.
We aim to identify token sequences whose probability differs most between models M and M'. Intuitively, such sequences are most likely to be related to the differences between their corresponding training datasets D and D'. To capture this notion formally, we define the differential score (DS) of token sequences, which is simply the sum of the differences of (contextualized) per-token probabilities. We also define a relative variant $\widetilde{DS}$ based on the relative change in probabilities, which we found to be more robust w.r.t. the noise introduced by different random initializations of the models M and M'.

Definition 3.1.
Given two language models M, M' and a token sequence t_1 ... t_n ∈ T^*, we define the differential score of a token as the increase in its probability and the relative differential score as the relative increase in its probability. We lift these concepts to token sequences by defining

$$DS_M^{M'}(t_1 \ldots t_n) = \sum_{i=1}^{n} M'(t_{<i})(t_i) - M(t_{<i})(t_i),$$

$$\widetilde{DS}_M^{M'}(t_1 \ldots t_n) = \sum_{i=1}^{n} \frac{M'(t_{<i})(t_i) - M(t_{<i})(t_i)}{M(t_{<i})(t_i)}.$$

The differential score of a token sequence is best interpreted relative to that of other token sequences. This motivates ranking sequences according to their differential score.
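As an illustration, a minimal Python sketch of Definition 3.1, reusing the hypothetical next_token_probs(prefix) interface from the earlier sketch, could look as follows.

    def differential_scores(model_old, model_new, tokens, relative=False):
        """Sum of per-token probability increases from model_old (M) to
        model_new (M'), optionally normalized by the old probability."""
        total = 0.0
        for i, token in enumerate(tokens):
            p_old = model_old.next_token_probs(tokens[:i])[token]
            p_new = model_new.next_token_probs(tokens[:i])[token]
            delta = p_new - p_old
            total += delta / p_old if relative else delta
        return total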
Definition 3.2.
We define the differential rank DR(s) of s ∈ T^* as the number of token sequences of length |s| with differential score higher than s:

$$DR(s) = \left| \left\{ s' \in T^{|s|} \;\middle|\; DS_M^{M'}(s') > DS_M^{M'}(s) \right\} \right|.$$

The lower the differential rank of a sequence, the more the sequence is exposed by a model update, with the most exposed sequence having rank 0.
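Read literally, Definition 3.2 corresponds to the following deliberately naive sketch, which enumerates all |T|^|s| candidate sequences; it reuses the differential_scores helper sketched above and is only meant to make the definition concrete, since it is infeasible for realistic vocabularies.

    from itertools import product

    def differential_rank_bruteforce(model_old, model_new, s, vocabulary):
        """Count sequences of length |s| whose differential score exceeds DS(s).
        Exponential in len(s); for illustration only."""
        target = differential_scores(model_old, model_new, s)
        rank = 0
        for candidate in product(vocabulary, repeat=len(s)):
            if differential_scores(model_old, model_new, list(candidate)) > target:
                rank += 1
        return rank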
Computing the differential rank DR(s) of a sequence s of length |s| = n requires searching a space of size |T|^n. To avoid this exponential blow-up, we rely on Algorithm 1, which approximates the differential rank based on beam search. At iteration i, the algorithm maintains a set S of k (called the beam width) candidate sequences of length i, together with their differential scores. The algorithm then considers all k · |T| single-token extensions of these sequences, computes their differential scores, and keeps the k highest-scoring sequences of length i + 1 in S.

Algorithm 1
Beam search for Differential Rank
    In:  M, M' = models, T = tokens, k = beam width, n = length
    Out: S = set of (n-gram, DS) pairs
      S ← {(ε, 0)}                                        ▷ Initialize with empty sequence ε
      for i = 1 ... n do
          S' ← {(s ∘ t, r + DS_M^{M'}(s)(t)) | (s, r) ∈ S, t ∈ T}
          S ← take(k, S')                                 ▷ Take top k items from S'
      return S = {(s_1, r_1), ..., (s_k, r_k)} such that r_1 ≥ ··· ≥ r_k

Algorithm 1 returns a set of token sequences s_i, together with their differential scores r_i. With this we can approximate the differential rank DR(s) by the number of token sequences in S with differential score higher than s. For large enough beam widths this yields the true rank of s. For smaller beam widths, the result is a lower bound on DR(s), as the search may miss sequences with higher differential score than those in S.

Proposition 3.3. If Algorithm 1 returns a set S = {(s_1, r_1), ..., (s_k, r_k)} with r_1 ≥ ··· ≥ r_k, then DS_M^{M'}(s_i) = r_i and DR(s_i) ≥ i − 1.

Optimizing for Speed. The beam width k governs the trade-off between computational cost and the precision of the approximation. In experiments, we found that shrinking the beam width as the search progresses speeds up the search considerably without compromising on the quality of results. Typically, we use an initial beam width of |T|, which we halve at each iteration; that is, we keep |T|/2 two-token sequences, |T|/4 three-token sequences, and so on.
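A compact Python sketch of Algorithm 1, again using the hypothetical next_token_probs(prefix) interface and omitting the halving-beam-width speed optimization for clarity, might look as follows.

    def differential_beam_search(model_old, model_new, vocabulary, beam_width, length):
        """Approximate the highest differential-score sequences of a given length.
        Returns (sequence, score) pairs sorted by decreasing differential score."""
        beam = [([], 0.0)]  # start from the empty sequence with score 0
        for _ in range(length):
            candidates = []
            for prefix, score in beam:
                dist_old = model_old.next_token_probs(prefix)
                dist_new = model_new.next_token_probs(prefix)
                for token in vocabulary:
                    # per-token differential score of `token` given `prefix`
                    delta = dist_new[token] - dist_old[token]
                    candidates.append((prefix + [token], score + delta))
            # keep the k highest-scoring extensions
            candidates.sort(key=lambda item: item[1], reverse=True)
            beam = candidates[:beam_width]
        return beam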
Optimizing for Diversity. Since the sequences returned by vanilla beam search typically share a common prefix, we rely on group beam search as a technique for increasing diversity: we split the initial |T| one-token sequences into multiple groups according to their differential score, and run parallel beam searches extending each of the groups independently. See [22] for more sophisticated techniques for increasing diversity.

We use our new metrics to perform leakage analyses for various datasets across various model update scenarios. We first describe our benchmark datasets with their model configurations and the model training scenarios we consider. Then, we discuss research questions relevant to the analysis scenarios described in Section 2.3. We then show experiments investigating these questions in detail, first using synthetically generated canaries as a proxy for updates, where we can precisely control the differences between the datasets used to create model snapshots, and then in a realistic setting, in which we use a set of standard real-world datasets.
We consider three datasets of different size and complexity, matched with standard model architectures whose capacity we adapted to the data size and implemented in TensorFlow. We will release the source code as well as the analysis tools used in our experimental evaluation.

Concretely, we use the Penn Treebank [12] (PTB) dataset as a representative of low-data scenarios, as the standard training dataset has only around 900,000 tokens and a vocabulary size of 10,000. As the corresponding model, we use a two-layer recurrent neural network using LSTM cells with 200-dimensional embeddings and hidden states and no additional regularization (this corresponds to the small configuration of Zaremba et al. [24]).

Second, we use a dataset of Reddit comments with 20 million tokens overall, of which we split off 5% as a validation set. We use a vocabulary size of 10,000. We rely on two different model configurations for this dataset, which allows us to understand the impact of model size on information leakage using DR as a metric: (1) a one-layer RNN using an LSTM cell with 512-dimensional hidden states and 160-dimensional embeddings, with dropout on inputs and outputs; and (2) a Transformer model. The third dataset is Wikitext-103 [14]; Table 1 reports the perplexities of the trained models.
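For concreteness, a minimal Keras sketch of the kind of word-level LSTM language model described above is shown below, using the PTB-style configuration of two LSTM layers with 200-dimensional embeddings and hidden states; the optimizer choice and other training details are illustrative and not taken from the paper.

    import tensorflow as tf

    VOCAB_SIZE = 10_000   # vocabulary size used for PTB and Reddit
    EMBED_DIM = 200       # PTB "small" configuration
    HIDDEN_DIM = 200

    # Two-layer LSTM language model predicting a distribution over the
    # vocabulary for every position in the input sequence.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        tf.keras.layers.LSTM(HIDDEN_DIM, return_sequences=True),
        tf.keras.layers.LSTM(HIDDEN_DIM, return_sequences=True),
        tf.keras.layers.Dense(VOCAB_SIZE),  # logits; softmax applied in the loss
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )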
Retraining.
Given an updated dataset D', a fresh model snapshot M' can be obtained by simply training a fresh model from scratch, which we refer to as retraining. This also involves a fresh (random) initialization of the model parameters, and in practice, retraining a model repeatedly on the same dataset will yield slightly different models.

The User Deletion scenario legally requires retraining for model updates, as all data stemming from users requesting deletion needs to be pruned from the dataset D'. This is because there is no known effective way of deleting user-specific information from previously trained model parameters apart from a handful of special cases [9].

Continued Training.
In this approach, a fresh model snapshot M' is obtained by taking an existing model M and continuing to train it on additional data. This is the core of the data specialization scenario and is sometimes also used in data update scenarios to avoid the computational cost of training on a large dataset from scratch; a sketch of both update procedures follows below.
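The distinction between the two update types can be summarized in a short Keras-style sketch. The build_model helper stands for any model constructor such as the one above; dataset objects and training arguments are illustrative assumptions rather than the paper's actual training setup.

    import tensorflow as tf

    def retrain(build_model, updated_dataset, epochs=10):
        """Retraining: a fresh model, with fresh random initialization,
        is trained from scratch on the full updated dataset D'."""
        model_new = build_model()
        model_new.fit(updated_dataset, epochs=epochs)
        return model_new

    def continue_training(model_old, new_data, epochs=10):
        """Continued training: the existing snapshot M is fine-tuned
        only on the additional data D' \\ D."""
        model_new = tf.keras.models.clone_model(model_old)
        model_new.set_weights(model_old.get_weights())
        model_new.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        )
        model_new.fit(new_data, epochs=epochs)
        return model_new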
With the training techniques outlined for the different model update scenarios, we consider four research questions in our experiments.

RQ0: Can an attacker learn private information from model updates?
Here we address the basic question of whether private data used to update a model can be leaked in our adversarial setting and how. We first answer this question by using differential score to find information about private sequences used in a model update. We then investigate the influence of other parameters of the system on the differential score in more detail.
RQ1: How does masking private data with additional non-sensitive data (D_extra) affect leakage? This is particularly important for the user deletion scenario, for which we need to answer whether it is possible to safely remove data of a single user, or whether such dataset changes need to be hidden among other substantial changes. Concretely, we analyze whether including a large enough additional dataset D_extra in an update can prevent leakage of information about the rest of the data used. D_extra can be any dataset which is either available publicly or is non-sensitive from the point of view of the model provider or users.

RQ2: How do retraining and continued training differ with respect to information leakage?
In the continued training approach, the parameters of a previously trained model M_D are updated based only on new data D' \ D. In contrast, in the retraining strategy parameters are updated using all data in D'. The most recent updates to model parameters depend only on new data in the continued training case, whereas they depend on the whole training data D' when retraining a model from scratch. We analyze the effect of this seemingly more pronounced dependence.

RQ3: How is leakage affected by an adversary's background knowledge?
Prior attacks on language models assume that the adversary has background knowledge about the context in which a secret appears. We analyze the effect of such knowledge for inferring private data from model updates.
We create a number of canary phrases—grammatically correct phrases that do not appear in the original dataset—that serve as a proxy for private data that the adversary is trying to extract. We consider different word frequency characteristics to control the influence on the used vocabulary. Specifically, we fix the length of the canary phrase to 5, choose a valid phrase structure (e.g., Subject, Verb, Adverb, Compound Object), and instantiate each placeholder with a token from the dataset vocabulary. We create canaries in which the frequencies of tokens are all low (all tokens are from the least frequent quintile of words), mixed (one token from each quintile), increasing from low to high, and decreasing from high to low. For example, the mixed phrase across all the datasets is "NASA used deadly carbon devices", and the all-low phrase for PTB is "nurses nervously trusted incompetent graduates". As the vocabularies differ between the different datasets, the canaries are in general dataset-dependent; a sketch of this construction appears below.

We vary the amount of private data, C, by inserting a canary phrase s a number of times proportional to the number of tokens in the training corpus: (1) for PTB, we use insertion counts corresponding to 1 canary token per 18K training tokens, 1 per 3.6K, and 1 per 1.8K; (2) for the Reddit dataset, we use insertion counts corresponding to 1 in 1M, 1 in 100K, and 1 in 10K; (3) for the Wikitext-103 data, we use insertion counts corresponding to 1 in 1M and 1 in 200K. We train the model M on D and the model M' on D with k copies of the canary s. We then compute the differential rank of the canaries for different values of k.
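A minimal sketch of the canary construction is shown below; the quintile computation and the placeholder structure are illustrative, and the actual phrases used in the paper were chosen by hand to be grammatical.

    from collections import Counter

    def frequency_quintiles(corpus_tokens, vocabulary):
        """Split the vocabulary into five buckets by descending corpus frequency;
        bucket 0 holds the most frequent tokens, bucket 4 the least frequent."""
        counts = Counter(corpus_tokens)
        ranked = sorted(vocabulary, key=lambda t: counts[t], reverse=True)
        size = len(ranked) // 5
        return [ranked[i * size:(i + 1) * size] for i in range(5)]

    def make_canary(quintiles, bucket_choice, rng):
        """Build a 5-token canary by drawing one token per position from the
        requested frequency bucket, e.g. [4, 4, 4, 4, 4] for an all-low canary
        or [0, 1, 2, 3, 4] for a mixed one. Grammaticality is checked by hand."""
        return [rng.choice(quintiles[b]) for b in bucket_choice]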
RQ0: Can an attacker learn private information from model updates?

We use our differential-score-based beam search (Algorithm 1) to extract canary phrases that correspond to the change in training data between M and M'. The results of varying the number of inserted canaries are summarized in Table 1. We highlight the following findings:
• For most combinations of k and types of canaries, we successfully recover the canary. This is indicated by the cells with white background, where the canary phrase has the maximum differential score among all token sequences found by our beam search, i.e., it ranks first.
• The signal for extraction is strong even when the inserted canaries account for only 0.0001% of the tokens in the dataset. This is visible in the first row of Table 1, where differential scores approach 4, which is close to the upper bound of 5 (for 5-token canaries).
• Private phrases that occur more often in the training data are more exposed via a model update, as expected. This is visible in the monotonic growth of the differential score of canaries with the number of insertions.
• Phrases composed of rare words are more easily extracted, as seen in the high differential score of canaries constructed from low-frequency tokens. In contrast, canaries with descending token frequencies tolerate a much higher number of insertions before being exposed. This is expected, as our beam search is biased towards finding high-scoring prefixes.
RQ1: Effect of amount of public vs. private data.
We vary the amount of public data by partitioning the dataset D into D_orig ⊎ D_extra such that the size of D_extra is 20%, 50%, or 100% of that of D_orig (cf. Table 2). The results of varying the amount of public data and canary insertions are displayed in Table 2, where the 0% column is identical to the result from Table 1. The retraining column in Table 2 shows that DS_M^{M'} does not change significantly across the different dataset splits. That is, canaries can be extracted from the trained model even when they are contained in a substantially larger dataset extension. Hence, the amount of public data used in the update does not significantly affect the leakage of the private data.

Table 1: Differential score (DS) for PTB, Reddit, and Wikitext-103 datasets for different canaries and insertion frequencies. Cells with a white background correspond to a differential rank DR of 0 (as approximated by beam search), gray cells correspond to DR > 0.

Dataset                  Penn Treebank           Reddit                                        Wikitext-103
Model Type (Perplexity)  RNN (120.90)            RNN (79.63)          Transformer (69.29)      RNN (48.59)
Canary Token Freq.       1:18K  1:3.6K  1:1.8K   1:1M   1:100K 1:10K  1:1M   1:100K 1:10K      1:1M   1:200K
All Low                  3.40   3.94    3.97     2.83   3.91   3.96   3.22   3.97   3.99       1.39   3.81
Low to High              3.52   3.85    3.97     0.42   3.66   3.98   0.25   3.66   3.97       0.07   3.21
Mixed                    3.02   3.61    3.90     0.23   3.04   3.92   0.39   3.25   3.96       0.25   3.02
High to Low              1.96   2.83    3.46     0.74   1.59   2.89   0.18   1.87   3.10       0.08   1.22
Figure 1: Differential score of tokens in canaries given a prefix, for the Reddit dataset. Solid (dashed) lines represent experiments with k insertions of canaries with all-low (resp. high-to-low) token frequencies, indicated by LL-k (resp. HL-k).

RQ2: Effect of training type.
We train a model M on a dataset D_orig to convergence, and then continue training M using D_extra and the canaries C, obtaining M'. We compare the differential rank of the canaries on the models obtained using continued training with that on the models retrained from scratch. The results of this experiment are shown in the middle column of Table 2. We observe that in all cases the differential score is higher for continued training than for retraining. As expected, the differential score of the canary phrase decreases as additional extra data is used for fine-tuning.

RQ3: Effect of background knowledge.
We evaluate the differential score of suffixes of a canary phrase s assuming knowledge of a prefix. For i = 1, ..., n we take the prefix t_1 ... t_{i-1} of the canary phrase and compute the differential score r of the token t_i conditional on having read the prefix, i.e., M'(t_{<i})(t_i) − M(t_{<i})(t_i). The relationship between i and r indicates how much knowledge about s is required to expose the remainder of the canary phrase.

Figure 1 depicts the result of this analysis for canaries with high-to-low and all-low token frequencies on the Reddit dataset. Our results show that, while the differential score of the first token without context is close to 0, the score of subsequent tokens quickly grows for all-low canaries, even with a low number of canary insertions. In contrast, more context is required before observing a change in the score of high-to-low canaries, as the model is less influenced by the small number of additional occurrences of frequent tokens. This suggests that, even in cases where we fail to extract the canary without additional knowledge, an adversary can use the differential rank to complete a partially known phrase, or confirm that a phrase was used to update the model.

We simulate real-world scenarios by sourcing training data from real-world conversations on specific topics, and using it as a proxy for private data included in the training data used in model updates. The adversary's goal is to extract specific phrases occurring in the proxy dataset, or phrases that do not occur literally but nonetheless reveal the topic of conversations. We mimic the data distribution shift by choosing conversations on topics that are not dominant in the original dataset, so that we can better judge whether phrases extracted using differential score are on-topic and thus represent meaningful leakage of private information. Specifically, we compare models trained only on data from the Reddit dataset against models trained on data from the Reddit dataset plus messages from one of two newsgroups from the 20 Newsgroups dataset [11]: a) rec.sport.hockey, containing around 184K tokens, ≈ 1% of the original training data; and b) talk.politics.mideast, containing around 430K tokens, ≈ 2% of the original training data.
We train a model M on the entire Reddit dataset and retrain M' from scratch on the same dataset plus all messages from one of the two newsgroups. For both model architectures (RNN and Transformer) described in Section 4.1 and each newsgroup, we compute the sequences with highest relative differential score. Since the sequences returned by vanilla beam search typically share a common prefix, we run a group beam search (see Section 3.2) to get a more diverse sample.

RQ0: Can an attacker learn private information from model updates?
Table 2: Differential score (DS_M^{M'}) of the mixed-frequency canary phrase for the Reddit (RNN) model using different update techniques. Model M is trained on D_orig. For the Retraining column, M' is trained on D_orig ∪ D_extra ∪ C starting from random initial parameters. For the Cont'd Training 1 column, M' is trained on D_extra ∪ C starting from M. For the Cont'd Training 2 column, we first train a model M̃ on D_extra ∪ C starting from M, and then train model M' from M̃ using additional public data D'_extra. A white cell background means that the differential rank DR (as approximated by our beam search) of the phrase is 0; a gray cell background means that DR is >1000.
                      Retraining                     Cont'd Training 1        Cont'd Training 2
|D_extra|/|D_orig|    0%     20%    50%    100%      20%    50%    100%       100%
1:1M                  0.23   0.224  0.223  0.229     0.52   0.34   0.46       0.01
1:100K                3.04   3.032  3.031  3.038     3.56   3.25   3.27       0.26

Tables 3 and 4 display the highest-scoring sequences of length 4 in each group of a relative-differential-score-based group beam search with 5 groups. The exposed sentences are on-topic w.r.t. the newsgroup included, e.g., the hockey theme dominates the top-ranked sequences in Table 3. This suggests that information about the private data used for the update is leaked. It is noteworthy that these results are obtained assuming a weak adversarial model that requires neither background knowledge about the dataset distribution nor about the information the adversary wishes to extract. In contrast, concurrent work on updates of image classification models [16] requires knowledge about the data distribution to train shadow models, while prior work on single language models [4] requires a known prefix for extraction of a secret. Given some background knowledge in the form of a long enough prefix of a phrase occurring in the private data, we show that the complete phrase can be extracted by a beam search directed by differential score (see Table 6).
RQ1: Effect of amount of public vs. private data.
To answer this, we consider partitions of the Reddit dataset D into D_orig and D_extra of different relative sizes. For each partition, we train a model M on D_orig and a model M' on D_orig ∪ D_extra ∪ N, where N are all messages from talk.politics.mideast. We highlight the following observations:
• For all phrases, the proportion of public data used in the update, ranging from 5% to 100%, does not significantly affect their relative differential scores, which confirms our findings for canaries.
• The top two phrases resemble canaries in that they occur multiple times in the datasets, which explains their high scores. An exception is
"Little resistance was offered", which appears 12 times in the dataset but still has a low score. Other phrases do not occur literally in newsgroup messages, but digest recurrent discussions or contain n-grams that do occur.

RQ2: Effect of training type.
We train a model M on D_orig to convergence, and then continue training M using D_extra ∪ N to produce a model M'. To understand the effect of the training type on information leakage, we sample a set of representative phrases and compare their relative differential scores w.r.t. M and M' against their scores w.r.t. M and a model trained on D ∪ N from scratch. The results are shown in Table 5, together with the perplexity decrease after the model update. Retrained models correspond to the data update and data deletion scenarios, and their perplexity drop is greater the more data is used during retraining. Continued training corresponds to the data specialization scenario. The perplexity drop in the updated model is greater the larger the proportion of newsgroup data used in the update, for which the initial model is not specialized.

The last two rows in Table 5 correspond to phrases found by group beam search in the continued training scenario, but that have too low a score to be found when M' is retrained from scratch instead. The converse, i.e., phrases that have a low score when continuing training and a high score when retraining, seems to occur rarely and less consistently (e.g., "Saudi troops surrounded village"). For phrases that occur literally in the dataset, the results are in line with those for canaries (see Table 2), with scores decreasing as more data is used during the fine-tuning stage. For other phrases, the results are not as clear-cut. While fine-tuning a model exclusively on private data yields scores that are significantly higher than when retraining a model from scratch, this effect vanishes as more additional data is used; in some cases continued training yields scores lower than when retraining a model on the same data.
RQ3: Effect of background knowledge.
An adversary wishing to extract information about a dataset used to update a language model M to M' may direct a search using as prompt a known prefix of a phrase in the dataset. We study how long this prefix needs to be to recover the rest of the phrase. We consider an RNN model M trained on the full Reddit dataset and a model M' trained on the union of the full Reddit dataset and all messages of the talk.politics.mideast newsgroup. We sample 4 phrases in newsgroup messages beginning with the name of a Middle Eastern country and containing only tokens in the model vocabulary. We believe it is natural to venture a guess at a short prefix of such phrases from the description of the newsgroup or the geopolitical context. For each phrase s and prefix length i = 0, ..., |s| − 1, we run a relative-differential-score-based beam search for phrases of the same length, with constant beam width 10,000 and 100 groups, starting from s_1 ... s_i. We report the rank of s among the search results (or ∞ if it is absent) in Table 6.

We observe a correlation between the score of a phrase and the minimum prefix sufficient to recover it. However, a dip in the score of two consecutive tokens is much more consequential: a common word like "the", which has a similar distribution in the original and private datasets, contributes little to the score of a phrase.

Table 3: Top ranked phrases in group beam search for a model updated with rec.sport.hockey. For the layperson: the Los Angeles Kings, Minnesota North Stars, and Toronto Maple Leafs are National Hockey League teams; Norm Green was the owner of the North Stars; an ice hockey game consists of three periods, with overtime to break ties. Capitalization added for emphasis.

Phrases (RNN and Transformer models):
Angeles Kings prize pools
Minnesota North Stars playoff
National Hockey League champions
Arsenal Maple Leaf fans
Norm ’s advocate is
Overtime no scoring chance
Intention you lecture me
Period 2 power play
Covering yourself basically means
Penalty shot playoff results
Table 4: Top ranked phrases in a group beam search for a model updated with talk.politics.mideast. Center for Policy Research is a prolific newsgroup poster; many of the posts around the time the 20 Newsgroups dataset [11] was collected discuss tensions between Turkey and Armenia.
Phrases (RNN and Transformer models):
Turkey searched first aid
Center for Policy Research
Doll flies lay scattered
Escaped of course ...
Arab governments invaded Turkey
Holocaust %UNK% museum museum
Lawsuit offers crime rates
Troops surrounded village after
Sanity boosters health care
Turkey searched neither Arab
Table 5: Relative differential score of phrases found by beam search when retraining from scratch and when continuing training from a previous model. The results are for RNN models trained on partitions of the Reddit dataset with N = talk.politics.mideast. Cells for which continued training yields a higher score than retraining appear in bold font. Capitalization added for emphasis.
                                            Retraining (|D_extra|/|D_orig|)          Continued Training (|D_extra|/|D_orig|)
Phrase (frequency in N)                     0%      5%      10%     20%     100%     0%     5%     10%    20%    100%
Perplexity decrease                         0.79    1.17    2.45    3.82    11.82    73.97  18.45  10.29  6.08   8.28
Center for Policy Research (93)             99.77   101.38  97.11   98.65   91.53    –      –      –      –      –
Troops surrounded village after (12)        44.50   44.50   44.50   44.41   44.54    –      –      –      –      –
Partition of northern Israel (0)            27.61   16.81   38.48   26.10   38.76    –      –      –      –      –
West Bank peace talks (0)                   25.68   25.64   25.69   25.71   25.75    –      –      –      –      –
Spiritual and political leaders (0)         25.23   25.98   17.04   24.21   23.47    –      –      –      –      –
Saudi troops surrounded village (0)         24.31   24.31   24.31   24.31   24.30    –      –      –      –      5.05
Arab governments invaded Turkey (0)         22.59   22.62   22.80   22.78   22.80    –      –      –      –      –
Little resistance was offered (12)          22.24   22.09   25.12   22.34   25.59    –      –      –      –      –
Buffer zone aimed at protecting (0)         4.00    4.47    5.30    5.25    5.69     –      –      –      –      –
Capital letters racial discrimination (0)   3.76    3.32    3.40    3.60    3.84     –      –      –      –      –
Prior work has primarily studied information leakage when an attacker has access only to a single model snapshot. Here, we first analyze how much our analysis gains from having access to two model snapshots, and then consider the influence of common causes of leakage in the single-model case. The central ones are overfitting [23] to the training data, and unintended memorization [4] of data items that is independent of the distribution to be learned.
RQ4: How important is access to a second model snapshot?
We want to analyze how much leakage of sensitive information is increased by having access to two model snapshots M_D and M_{D'}, in contrast to having access only to a single model M_{D'}. This is a challenging analysis in a realistic setting.

Table 6: Results of beam searches for different prefix lengths. A rank of 0 means that the search recovers the complete phrase. Due to the heuristic nature of the search, the rank reported may be lower than the true rank of s. Conversely, a beam search may not encounter s at all despite it having a lower rank than most phrases encountered. For instance, this occurs for "Turkey searched an American plane", where all but 7 search results with no prompt have higher rank (lower score).
Phrase s                                 Rank by prefix length i = 0, 1, 2, …
Turkey searched an American plane        ∞ …
Israel allows freedom of religion        ∞ ∞ 788 55 0 –
Iraq with an elected government          ∞ ∞ ∞ …
Israel sealed off the occupied lands     ∞ ∞ ∞ ∞ …
Intuitively, our goal is to establish that a sentence extracted by comparing M_D and M_{D'} is (a) more likely to be part of D' than of D, (b) not very common in D', and (c) that (a) and (b) are more true for the results of the differential analysis than for the analysis of M_{D'} alone.

We quantify how likely a given sentence is to be part of a dataset using a simpler, well-understood model of natural language data, namely an n-gram model. n-gram models define the probability of a token t_{n+1} appearing after a sequence of tokens t_1 ... t_n as the number of times t_1 ... t_n t_{n+1} appeared in the dataset divided by the number of times t_1 ... t_n appeared. Consequently, such models are incapable of reasoning about synonyms or unusual grammatical structure, necessitating more complex architectures such as RNNs or Transformers. In our experiments, we use the perplexity of 3-gram models trained on D (resp. N) to capture how likely a given extracted sentence is to be part of the dataset D (resp. N). We compare these perplexity values for sequences extracted using group beam search from the models M_D (resp. M_{D'}) and for sequences extracted using our differential-rank-based search, following the setup of Section 4.5. Concretely, we use the entire Reddit comment data as dataset D, and the messages N from talk.politics.mideast as data update. We are concerned with information an attacker can gain about the contents of N.

Figure 2a shows the results of our analysis when we train M_{D'} on D' = D ∪ N from scratch. Points above the main diagonal are closer in distribution to the (private) data update N than to the base data D. This shows that our attack extracts sequences using differential score (represented by red crosses) that are more likely to be part of N than of D, and that these sequences differ substantially from the sequences obtained by a single-model analysis. In fact, the sequences obtained by single-model analysis for M_D and M_{D'} show little significant difference. Note that the perplexity values perp(D) are very high for some of the extracted sentences, as they use combinations of tokens that never appear in the original training dataset D.

Similarly, Figure 2b shows the results of this analysis in the scenario where we obtain M_{D'} by specializing the model M_D by continuing training on the dataset N. While our differential analysis again captures sequences more likely to be part of the updated data N than of the original data D, the single-model analysis now also shows some of this effect.

Figure 2: Sensitivity of extracted content, with re-training from scratch (a) vs. continued training (b). + depicts sentences extracted from M, × depicts sentences extracted from M', and ∗ depicts sentences extracted from (M, M') using differential score. The vertical axis depicts the perplexity w.r.t. data D, the horizontal axis depicts perplexity w.r.t. data update N. Points above the diagonal are closer in distribution to the (private) data update N than to the base data D.
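For reference, a small sketch of the 3-gram perplexity measure used on the axes of Figure 2 is shown below; it uses naive maximum-likelihood counts and a simple probability floor for unseen n-grams, whereas a real implementation would use proper smoothing.

    import math
    from collections import Counter

    def train_trigram_counts(tokens):
        """Count trigram and bigram occurrences in a token list."""
        trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
        bigrams = Counter(zip(tokens, tokens[1:]))
        return trigrams, bigrams

    def trigram_perplexity(trigrams, bigrams, sentence):
        """Perplexity of `sentence` under the maximum-likelihood 3-gram model:
        p(t3 | t1 t2) = count(t1 t2 t3) / count(t1 t2)."""
        log_prob, n = 0.0, 0
        for t1, t2, t3 in zip(sentence, sentence[1:], sentence[2:]):
            p = trigrams[(t1, t2, t3)] / max(bigrams[(t1, t2)], 1)
            log_prob += math.log(max(p, 1e-12))  # floor to avoid log(0) for unseen n-grams
            n += 1
        return math.exp(-log_prob / max(n, 1))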
RQ5: Is leakage due to overfitting or intended memorization?

All models are trained using an early-stopping criterion that halts training when the model does not improve on a separate validation set. This effectively rules out overfitting to the training data. Additionally, model training employs regularization strategies such as dropout to further encourage the trained models to generalize to unseen data. We refer to the model's ability to reproduce verbatim fragments of the training data as memorization, and call it intended if this is necessary for the model to serve its purpose of generating natural language (e.g., a model needs to memorize the token pair "United States", as it is an extremely common combination) and unintended otherwise.

In the experimental results in Table 5, we have included the number of times that the phrases with the highest differential scores appear in the data. While "Center for Policy Research" is a clear case of intended memorization, as the name appears many times in the signatures of emails, the other results appear rarely or never, indicating that our analysis extracts fragments that the model need not memorize to serve its purpose. This is further supported by the results in Table 6, where extraction of complete sentences such as "Israel allows freedom of religion", occurring as few as three times in the dataset, is possible. Overall, this indicates that intended memorization cannot explain our results.
In this section, we discuss and analyze three strategies to mitigate information leakage in model updates: (1) differential privacy, (2) continued training with public data, and (3) truncating the output of the updated model.
Differential privacy (DP) [8] provides strong guarantees on the amount of information leaked by a released output. Given a computation over records, it guarantees a bound on the effect that any input record can have on the output. Formally, F is an (ϵ, δ)-differentially-private computation if for any datasets D and D' that differ in one record and for any subset O of F's range we have

$$\Pr(F(D) \in O) \le \exp(\epsilon) \cdot \Pr(F(D') \in O) + \delta. \quad (1)$$

Differential privacy is a natural candidate for defending against membership-like inferences about data. The exact application of differential privacy for protecting the information in the model update depends on what one wishes to protect w.r.t. the new data: individual sentences in the new data or all information present in the update. For the former, sequence-level privacy can suffice, while for the latter group DP can serve as a mitigation technique, where the size of the group is proportional to the number of sequences in the update. Recall that an ϵ-DP algorithm F is kϵ-differentially private for groups of size k [8].

At a high level, differential privacy can be achieved in gradient-based optimization computations [1, 3, 20] by clipping the gradient of every record in a batch according to some bound L, then adding noise proportional to L to the sum of the clipped gradients, averaging over the batch size, and using this noisy average gradient during backpropagation.

We evaluate the extent to which DP mitigates the attacks considered in this paper by training models on the Penn Treebank (PTB) dataset with canaries, with sequence-level differential privacy. We train DP models using the TensorFlow Privacy library [2] for two sets of (ϵ, δ) parameters, corresponding to ϵ values of 5 and 111, for two datasets: PTB and PTB with 50 insertions of the all-low-frequency canary. We rely on [2] to train models with differentially private stochastic gradient descent using a Gaussian noise mechanism and to compute the overall privacy loss of the training phase.
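The clip-and-noise step described above can be sketched as follows in plain NumPy. Per-example gradients are assumed to be given, and the clipping bound and noise multiplier are illustrative parameters; this is a sketch of the general DP-SGD aggregation idea, not the TensorFlow Privacy implementation used in the paper.

    import numpy as np

    def dp_sgd_gradient(per_example_grads, l2_clip, noise_multiplier, rng):
        """One DP-SGD aggregation step: clip each record's gradient to norm
        l2_clip, sum, add Gaussian noise scaled to the clip bound, and average."""
        clipped = []
        for g in per_example_grads:
            norm = np.linalg.norm(g)
            clipped.append(g * min(1.0, l2_clip / max(norm, 1e-12)))
        total = np.sum(clipped, axis=0)
        noise = rng.normal(0.0, noise_multiplier * l2_clip, size=total.shape)
        return (total + noise) / len(per_example_grads)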
As expected, the performance of models trained with DP degrades, in our case from ≈ 23% accuracy in predicting the next token on the validation dataset to 11.89% and 13.34% for ϵ values of 5 and 111, respectively.

While the beam search with the parameters of Section 4.4 no longer returns the canary phrase for the DP-trained models, we note that the models have degraded so far that they are essentially only predicting the most common words from each class (e.g., "is" when a verb is required), and thus the result is unsurprising. We note that the guarantees of sequence-level DP formally do not apply to the case where canary phrases are inserted as multiple sequences, and that the ϵ values for our models are high. However, the ϵ-analysis is an upper bound, and similar observations about the effectiveness of training with DP with high ϵ were reported by Carlini et al. [4].

We further investigate the effect of DP training on the differential rank of a canary phrase that was inserted 50 times. Instead of using our beam search method to approximate the differential rank, we fully explore the space of subsequences of length two, and find that the two-token prefix of our canary phrase drops from differential rank 0 to ranks 9,458,399 and 849,685 for the models with ϵ = 5 and ϵ = 111, respectively.
In addition, we compare the differential score of the whole phrase and observe that it drops from 3.94 for the original model to well below 1 for the models with ϵ = 5 and ϵ = 111.

We also consider a possible mitigation strategy where we perform continued training in two stages. For this, we split the dataset into three equal parts D_orig, D_extra and D'_extra. We proceed as in the continued training setting in RQ2, but add a final step in which we train on another dataset after training on the canaries. This resembles a setting where an attacker does not have access to two consecutive snapshots. The results are in the right column of Table 2, showing that the differential score of the canary phrase drops substantially after the second training stage. Thus, two- or multi-stage continued training, where only the last trained model is released, might be a path forward for mitigating leakage of private data.

Finally, we analyze the effect of truncating the output of the updated model for each query. Specifically, the adversary still has full access to the original model M but only receives the top k tokens from the updated model M'. This is a slight weakening of our adversary model, but is realizable for some applications. For example, in the Data Specialization scenario, the adversary may have full access to the public base model, but can only access the specialized model via an API that truncates the results for each query. In the
Data Update scenario, even if models are deployed to client devices, it may be possible to enforce this by running the model in a Trusted Execution Environment (TEE), such as Intel SGX (https://software.intel.com/en-us/sgx) or ARM TrustZone (https://developer.arm.com/ip-products/security-ip/trustzone), on the client device.

Figure 3: Sentences extracted from (M, M') using differential score when the adversary only receives the top k tokens from the updated model M' for each query. (a) Re-training from scratch; (b) continued training. The axes have the same meaning as in Figures 2a and 2b.

To evaluate the impact of this mitigation, we repeat the experiment described in Section 5 and plot only the sentences extracted using differential score (i.e., the snapshot attack) for different values of k. To facilitate comparison, we use the same beam width as in Figures 2a and 2b. As shown in Figure 3, decreasing the value of k brings the extracted sequences closer to the main diagonal, where they have a similar likelihood of being drawn from either dataset. Similarly to Figures 2a and 2b, we also observe a difference between re-training from scratch and continued training; for the same value of k, the sentences extracted after continued training are more likely to be private than those extracted after the model is re-trained from scratch. Additionally, if the adversary only has access to the top k outputs of the original model M, this would further reduce the leakage. In applications where this mitigation is realizable, returning only the top k outputs can thus reduce leakage without decreasing the utility of the provided outputs; a minimal sketch of such output truncation follows below.
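A minimal sketch of this mitigation on the serving side is shown below: the per-query response simply withholds everything but the k most likely tokens. The next_token_probs(prefix) interface is the same hypothetical one used in the earlier sketches.

    def truncated_prediction(model, prefix, k=3):
        """Return only the k most likely next tokens and their probabilities,
        instead of the full distribution over the vocabulary."""
        dist = model.next_token_probs(prefix)  # hypothetical interface, see above
        top_k = sorted(dist.items(), key=lambda item: item[1], reverse=True)[:k]
        return dict(top_k)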
In recent years, several works have identified that machine learning models can leak information about private training data. Membership inference attacks introduced by Shokri et al. [18] show that one can identify whether a record belongs to the training dataset of a classification model given black-box access to the model and shadow models trained on data from a similar distribution. Salem et al. [17] demonstrate that similar attacks are effective under weaker adversary models.

Carlini et al. [4] is closest to our work, as it also considers information leakage of language models. The authors assess the risk of (unintended) memorization of rare sequences in the training data. They show that canaries inserted into training data can be retrieved from a character-level language model. The key differences to our approach are that 1) we consider a different attack scenario where an adversary has access to two snapshots of a model, and 2) our canaries follow the distribution of the data, whereas Carlini et al. [4] add a random sequence of numbers in a fixed context into a dataset of financial news articles (e.g., "The random number is ..."), where such phrases are rare. We instead are able to extract canaries without any context, even when the canary token frequency in the training dataset is as low as one in a million.

Song and Shmatikov [19] also study sequence-to-sequence language models and show how a user can check if their data has been used for training. In their setting, an auditor needs an auxiliary dataset to train shadow models with the same algorithm as the target model, and queries the target model for predictions on a sample of the user's data. The auxiliary dataset does not need to be drawn from the same distribution as the original training data (unlike [18]), and the auditor only observes a list of several top-ranked tokens. In contrast, our approach requires no auxiliary dataset, but assumes access to the probability distributions over all tokens from two different model snapshots. From this, we are able to recover full sequences from the differences in training data rather than binary information about data presence. Like them, we find that sequences with infrequent tokens provide a stronger signal to the adversary/auditor.

Salem et al. [16] consider reconstruction of training data that was used to update a model. While their goal is similar to ours, their adversarial model and setup differ: 1) similar to Song and Shmatikov [19] and Shokri et al. [18], their attacker uses shadow models trained on auxiliary data drawn from the same distribution as the target training dataset, while in our setting the attacker has no prior knowledge of this distribution and does not need auxiliary data; 2) the updated model is obtained by fine-tuning the target model with additional data rather than re-training it from scratch on the changed dataset; 3) the focus is on classification models and not on (generative) language models.

Information leakage from updates has also been considered for searchable encryption: an attacker who has control over data in an update to an encrypted database can learn information about its content and previous encrypted searches on it [5]. Pan-privacy [7], on the other hand, studies the problem of maintaining differential privacy when an attacker observes snapshots of the internal state of a DP algorithm between updates.

In terms of defenses, McMahan et al. [13] study how to train LSTM models with DP guarantees at a user level. They investigate utility and privacy trade-offs of the trained models depending on a range of parameters (e.g., clipping bound and batch size). Carlini et al. [4] show that DP protects against leakage of canaries in character-level models, while Song and Shmatikov [19] show that an audit as described above fails when training language models with user-level DP using the techniques of [13].

Ginart et al. [9] define deletion of a training data point from a model as a stochastic operation returning the same distribution as re-training from scratch without that point, and develop deletion algorithms for k-means clustering with low amortized cost. Publishing snapshots of a model before and after a deletion matches our adversarial model, and our results apply.

We presented a first systematic study of the privacy implications of releasing snapshots of a language model trained on overlapping data. Our results show that updates pose a realistic threat, which needs to be considered in the lifecycle of machine learning applications. We encourage the research community to work towards quantifying and reducing unintended information leakage caused by model updates, and hope to make practitioners aware of the privacy implications of deploying and updating high-capacity language models.
REFERENCES
[1] Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In CCS 2016. ACM, 308–318.
[2] Galen Andrew, Steve Chien, and Nicolas Papernot. 2019. TensorFlow Privacy. https://github.com/tensorflow/privacy. (2019). [Online; accessed 09-Sep-2019].
[3] Raef Bassily, Adam Smith, and Abhradeep Thakurta. 2014. Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. In FOCS 2014. IEEE Computer Society, 464–473.
[4] Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, and Dawn Song. 2018. The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks. CoRR abs/1802.08232 (2018). http://arxiv.org/abs/1802.08232
[5] David Cash, Paul Grubbs, Jason Perry, and Thomas Ristenpart. 2015. Leakage-Abuse Attacks Against Searchable Encryption. In CCS 2015. ACM, 668–679.
[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT 2019, Vol. 1. Association for Computational Linguistics, 380–385.
[7] Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N. Rothblum, and Sergey Yekhanin. 2010. Pan-Private Streaming Algorithms. In Innovations in Computer Science, ICS 2010. Tsinghua University Press, 66–80.
[8] Cynthia Dwork and Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science 9, 3-4 (2014), 211–407.
[9] Antonio Ginart, Melody Y. Guan, Gregory Valiant, and James Zou. 2019. Making AI Forget You: Data Deletion in Machine Learning. CoRR abs/1907.05012 (2019). http://arxiv.org/abs/1907.05012
[10] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (1997), 1735–1780.
[11] Ken Lang. 1995. NewsWeeder: Learning to Filter Netnews. In ICML 1995. Morgan Kaufmann, 331–339.
[12] Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19, 2 (1993), 313–330.
[13] H. Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. 2018. Learning Differentially Private Recurrent Language Models. In ICLR 2018. OpenReview.net.
[14] Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. 2017. Pointer sentinel mixture models. In ICLR 2017. OpenReview.net.
[15] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. (2019).
[16] Ahmed Salem, Apratim Bhattacharyya, Michael Backes, Mario Fritz, and Yang Zhang. 2019. Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning. CoRR abs/1904.01067 (2019). http://arxiv.org/abs/1904.01067
[17] Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, and Michael Backes. 2019. ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models. In NDSS 2019. The Internet Society.
[18] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership Inference Attacks Against Machine Learning Models. In IEEE Symposium on Security and Privacy 2017. IEEE Computer Society, 3–18.
[19] Congzheng Song and Vitaly Shmatikov. 2018. Auditing Data Provenance in Text-Generation Models. CoRR abs/1811.00513 (2018). http://arxiv.org/abs/1811.00513
[20] S. Song, K. Chaudhuri, and A. D. Sarwate. 2013. Stochastic gradient descent with differentially private updates. In GlobalSIP 2013. 245–248.
[21] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, NIPS 2017. 5998–6008.
[22] Ashwin K. Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David J. Crandall, and Dhruv Batra. 2018. Diverse Beam Search for Improved Description of Complex Scenes. In AAAI. AAAI Press, 7371–7379.
[23] Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. 2018. Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting. In CSF. IEEE Computer Society, 268–282.
[24] Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent Neural Network Regularization. CoRR abs/1409.2329 (2014). http://arxiv.org/abs/1409.2329