Location Trace Privacy Under Conditional Priors
Casey Meehan (UC San Diego) [email protected]
Kamalika Chaudhuri (UC San Diego) [email protected]
Abstract
Providing meaningful privacy to users of location based services is particularly challenging when multiple locations are revealed in a short period of time. This is primarily due to the tremendous degree of dependence that can be anticipated between points. We propose a Rényi divergence based privacy framework for bounding expected privacy loss for conditionally dependent data. Additionally, we demonstrate an algorithm for achieving this privacy under Gaussian process conditional priors. This framework both exemplifies why conditionally dependent data is so challenging to protect and offers a strategy for preserving privacy to within a fixed radius for sensitive locations in a user's trace.
1 Introduction

Location data is acutely sensitive information, detailing where we live, work, eat, shop, worship, and often when, too. Yet increasingly, location data is being uploaded for smartphone services such as ride hailing and weather forecasting and then being brokered in a thriving user location aftermarket to advertisers and even investors (Valentino-DeVryes, 2018). Users share location 'traces' when they release a sequence of locations, often across a short period of time. These traces are then used by central servers to monitor traffic trends, track individual fitness, target marketing, and even to study the effectiveness of social-distancing ordinances (Fowler, 2020). Here, we aim to provide a local privacy guarantee, wherein traces are sanitized at the user level before being transmitted to a centralized service. Note that this requires different guarantees and mechanisms
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130. Copyright 2021 by the author(s).

than in aggregate applications making queries on large location trace databases.

Specifically, we guarantee a radius r of privacy at any sensitive time point or combination of time points within a given trace. This is challenging due to the fact that the locations within traces are highly inter-dependent. Informally, traces tend to follow relatively smooth trajectories in time. If not sanitized carefully, that knowledge alone may be exploited to infer actual locations from the released version of the trace. This work centers on designing meaningful privacy definitions and corresponding mechanisms that take this dependence into account.

Broadly speaking, the vast majority of prior work on rigorous data privacy can be divided into two classes that differ by the kind of guarantee offered: differential and inferential privacy. Differential privacy (DP) guarantees that the participation of a single person in a dataset does not change the probability of any outcome by much. In contrast, inferential privacy guarantees that an adversary who has a certain degree of prior knowledge cannot make certain sensitive inferences.

DP for releasing aggregate statistics of a spatio-temporal dataset has been well studied (Fan et al., 2013; Cao et al., 2017; Yang et al., 2015). There, the idea is to add enough noise to released statistics such that the effect of any user's participation is obscured, even if their locations are highly correlated to each other or to those of other users. Here, such a guarantee does not apply since we aim to release a sanitized version of a single user's trace.

In this local case we cannot rule out the possibility that the data curator knows who each individual is and who participated. Instead, we want to guarantee that event level information about each trace remains private.
In this work, at any sensitive time t we mask whether the user visited location A or location B for any A, B less than r apart. Without ad hoc modifications, standard DP tools are insufficient for achieving this for the primary reasons that 1) the domain of location is virtually unbounded and 2) locations are highly dependent across a short period of time. To see this, consider the following instinctual approaches to achieving location trace privacy.

Approach A: apply Local Differential Privacy (LDP) to each trace. Imagine a dataset of traces, each from a separate individual. Applying LDP implies that every trace has nearly the same probability of releasing the same sanitized version. This would be robust to arbitrary side information about dependence between locations in any one trace. Unfortunately, the amount of additive noise needed to achieve this would destroy nearly all utility: sanitized traces from California would have almost the same probability of showing up in Connecticut as do those from New York. Even if we constrained the domain to just Manhattan, this definition would not permit enough utility to perform e.g. traffic monitoring.
Approach B: apply LDP to each location within a trace. To preserve some utility, imagine a single trace as a dataset of n locations, each of which enjoys ε-LDP guarantees. This alone is not robust to arbitrary dependence between locations. By the logic of group LDP, it does satisfy kε-LDP regardless of the dependence between any k locations. This approach has two setbacks. First, how to set k is unclear. Technically, all points in the trace are correlated, so to ward off worst-case correlations one might set it to the length of the trace, which is identical to Approach A. Second, even if location is bounded to a single city or county, satisfying this definition would still destroy nearly all utility. We cannot use sanitized traces for traffic monitoring if locations from either side of town have about the same probability of being sanitized to the same value.

Approach C: apply LDP guarantees to each location within a trace, but only within any region less than width r. This definition is known as Geo-Indistinguishability (GI) (Andrés et al., 2012). GI provides a substitute for restricting the domain of location, allowing us to salvage some utility. Here, only locations within r of each other are required to have ε-LDP guarantees. In DP parlance, we might say that 'neighboring traces' have one location altered by ≤ r and are identical everywhere else. This gives us the guarantee we want for a trace with one location, but not with more than one location. To see why, compare with Approach B. Analogously, (ε, r)-GI along a trace provides (kε, r)-GI to any subset of k locations. Like Approach B, setting k is unclear. Yet unlike Approach B, GI is not resistant to arbitrary dependence between any k locations. Any dependence where a change in one or more location(s) by r implies a change in some other location(s) by ≥ r breaks the GI guarantee. Even with the simplest models of dependence (e.g.
if we know the true trace ought to move in a straight line) this is a problem.

To reiterate, applying LDP to traces or to locations within traces (Approaches A & B) does not provide a principled method for meaningful privacy with reasonable utility. GI adapts LDP by giving guarantees only within a radius r. But in relaxing LDP, GI compromises the standard DP tools for handling obvious dependences between data-points like group DP. In our eyes, this warrants an inferentially private approach. Here, we continue to provide privacy within a radius r, thus allowing for utility. Yet instead of providing resistance to arbitrary dependence across any k locations, we aim to provide resistance to natural models of dependence between all locations. One may view such models as an adversary's prior beliefs about what traces are likely, like the straight-line prior mentioned earlier.

In contrast with differential privacy, providing inferential privacy guarantees is more complex, and has been less studied. It is however appropriate for applications such as ours, where information must be released based on a single person's data, the features of which are private and dependent. Kifer & Machanavajjhala (2014) provide a formal inferential privacy framework called Pufferfish, and design mechanisms for specific Pufferfish instances. As these instances do not apply to our setting, we adapt the Pufferfish framework to location privacy and more broadly to releasing any sequence of real-valued private information.

Contributions:
In this work, we propose an inferentially private approach to guaranteeing a radius r of privacy for sensitive points in location traces in three parts:

• First, we propose an adaptable privacy framework tailored to sequences of highly dependent data-points that adapts Pufferfish privacy (Kifer & Machanavajjhala, 2014) to use Rényi Differential Privacy (RDP) (Mironov, 2017). Given a model of dependence between points, this framework more appropriately estimates the risk of inference within radius r on points of interest than do vanilla LDP approaches.

• We then demonstrate how to implement our framework for the highly flexible and expressive setting of Gaussian process (GP) priors. These nonparametric models capture the spatiotemporal aspect of location data (Liang & Haas, 1999; Liu et al., 1998; Chen et al., 2015). GPs have a natural synergy with Rényi privacy enabling an interpretable upper bound on privacy loss for additive Gaussian privacy mechanisms (that add Gaussian noise to each point). Using this, we design a semidefinite program (SDP) that optimizes the correlation of such mechanisms to minimize privacy loss without destroying utility, efficiently thwarting the inference of sensitive locations.

• Finally, we provide experiments on both location trace and home temperature data to demonstrate the advantage of these techniques over Approach C mechanisms like GI. We find that our mechanisms successfully obscure sensitive locations while respecting utility constraints, even when the prior model is misspecified.

Figure 1: (a) An example graphical model of a four point trace X. (b) The more general grouped version of the model in (a), with the secret set X_{I_S} and the remaining set X_{I_U}.

Ultimately, by resisting only reasonable kinds of dependence in the data we are able to offer both meaningful privacy and utility.
We show that our framework is robust to misspecification of this reasonable dependence and offers a privacy loss that is both tractable and interpretable.

A user transmits a sequence of n real-valued random variables X = {X_1, X_2, . . . , X_n}. A trace of 10 2d locations has n = 2 ×
10 = 20 random variables X_i. Instead of releasing the raw trace X, the user releases a private version Z = {Z_1, Z_2, . . . , Z_n}, by way of an additive noise mechanism Z = X + G, where G = {G_1, G_2, . . . , G_n} is random noise produced by a privacy mechanism.

An adversary, receiving the obscured trace Z, then reasons about the true locations at some sensitive time(s). To reference the sensitive times, we use index set I_S. If the sensitive indices are I_S = {1, 2}, the corresponding location values are X_{I_S} = {X_1, X_2} (e.g. referring to the two coordinates of one location). When inferring the true value of X_{I_S}, the adversary makes use of the remaining points in the trace at indices I_U = [n] \ I_S, denoted X_{I_U}, with obscured values Z_{I_U}. This separation of points into X_{I_S} and X_{I_U} is represented in Figure 1. We use location as a guiding example, but such inter-dependent traces X could take the form of home temperature time series data or spatial data like 3D facial maps used for identification. Going forward, we will continue to denote X = {X_1, X_2, . . . , X_n} with the understanding that any subsequence of d points, e.g. X_{I_S} = {X_1, . . . , X_d}, could represent a d-dimensional sensitive value, or N·d points could represent
N d-dimensional sensitive values.

For the real-valued distributions considered here, P_P(·) refers to a density of distribution P on r.v. · and P_P(·|∗) is its regular conditional density given ∗.

GI limits what can be inferred about the sensitive X_{I_S} from its corresponding Z_{I_S}, but not from the remaining locations Z_{I_U}. To do so we need a privacy definition that specifies what events of random variable X_{I_S} we wish to obscure, which realistic priors of inter-dependence to protect against, and a privacy loss. We borrow heavily from the Pufferfish framework (Kifer & Machanavajjhala, 2014), and specialize it for the setting of location traces. We define our own set of secrets — the collection of events we wish to obscure — and discriminative pairs, the pairs of secret events we do not want an adversary to tell between.
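The notation above can be sketched in a few lines of numpy; all numbers here are illustrative (the noise is a placeholder, not a mechanism from later sections):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20                       # e.g. 10 time steps x 2 coordinates, unrolled
X = rng.standard_normal(n)   # stand-in for a true (unrolled) trace

I_S = [2, 3]                                   # hypothetical sensitive indices
I_U = [i for i in range(n) if i not in I_S]    # I_U = [n] \ I_S

# Additive-noise release: Z = X + G
G = rng.standard_normal(n)   # placeholder i.i.d. noise; later sections shape its covariance
Z = X + G

X_S = X[I_S]                 # what the adversary wants to infer
Z_U = Z[I_U]                 # released values it can also exploit
```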
Basic Secrets & Pairs
After releasing Z, we do not want an adversary with a reasonable prior on X, P ∈ Θ, to have sharp posterior beliefs about the user's location at some sensitive time (e.g. one of the sensitive times in Figure 3 of Appendix 7.1). As such, the adversary cannot distinguish whether the user visited location A or some nearby location B at that time. Let x_s ∈ R^{|I_S|} represent a possible assignment to X_{I_S}, hypothesizing the true sensitive location. Any such assignment is a secret, S = {X_{I_S} = x_s : x_s ∈ R^{|I_S|}}. Specifically, we want the posterior probability of any two assignments to X_{I_S} within a radius r to be close: S_pairs = {(x_s, x′_s) : ‖x_s − x′_s‖ ≤ r}. This protects a single time within a trace of locations. More generally, in the context of spatiotemporal data of any dimension, we call this a basic secret.

Compound Secrets & Pairs
Suppose we havethree sensitive times (again as in
Figure 3). A mechanism that blocks inference on each of these separately does not prevent inference on the combination of them simultaneously. To obscure hypotheses on all three of these, we modify our set of secrets to any combination of assignments to each secret location:

S = { {X_{I_{S1}} = x_{s1}} ∩ {X_{I_{S2}} = x_{s2}} ∩ {X_{I_{S3}} = x_{s3}} : x_{si} ∈ R^{|I_{Si}|}, i ∈ [3] }.

Now, the set of discriminative pairs is any two assignments to all three secret locations:

S_pairs = { ({x_{s1}, x_{s2}, x_{s3}}, {x′_{s1}, x′_{s2}, x′_{s3}}) : ‖x_{si} − x′_{si}‖ ≤ r, i ∈ [3] }.

This protects against compound hypotheses: if daycare and work are within r of each other, this keeps an adversary from inferring X_{I_{S1}} = 'daycare' and X_{I_{S2}} = 'work' versus X_{I_{S1}} = 'work' and X_{I_{S2}} = 'daycare'. More generally, in the context of spatiotemporal data of any dimension, we call this a compound secret. Intuitively, a mechanism that protects a compound secret of locations close together in time prevents a Bayesian adversary from leveraging the remainder of the trace to infer direction of motion at those sensitive times. Note that bounding the privacy loss of a compound secret does not bound the privacy loss of its constituent basic secrets. Going forward, we refer to I_S as the 'secret set'.

For the purpose of location privacy, it is important to choose a prior class Θ such that the conditional distribution P_P(X_{I_U} | X_{I_S}) is simple to compute for any secret set I_S and any prior P ∈ Θ. Of course, it is also critical that the prior class naturally models the data, and thus consists of 'reasonable assumptions' for adversaries. GPs satisfy both these requirements. We model a full d-dimensional trace sampled at N times by 'unrolling' it into an n = dN dimensional GP.

Definition 2.1.
Gaussian process
A trace X is a Gaussian process if X_{I_M} has a multivariate normal distribution for any set of indices I_M ⊂ [n]. If X is a Gaussian process, then the function i → E[X_i] is called the mean function and the function (i, j) → Cov(X_i, X_j) is called the kernel function.

In this work, the kernel uses locations' time stamps to compute their covariance, (t_i, t_j) → Cov(X_i, X_j), but generally could use any side information provided with each location.

GPs have simple, closed form conditional distributions. Let X ∼ N(µ, Σ), where µ ∈ R^n and Σ ∈ R^{n×n}. Then the random variable X_{I_U} | {X_{I_S} = x_s} ∼ N(µ_{u|s}, Σ_{u|s}), where µ_{u|s} = µ_u + Σ_{us} Σ_{ss}^{−1} (x_s − µ_s) and Σ_{u|s} = Σ_{uu} − Σ_{us} Σ_{ss}^{−1} Σ_{su}. Here, µ_s denotes the mean vector µ accessed at indices I_S and Σ_{su} denotes the covariance matrix Σ accessed at rows I_S and columns I_U.

For GP priors, we will use additive noise G ∼ N(0, Σ^{(g)}). Thus Z = X + G, too, is multivariate normal. Furthermore, the distribution of any set of variables conditioned on any other set of variables in Figure 1 belongs to some multivariate normal distribution.

GPs have been shown to successfully model mobility (Chen et al., 2015; Liang & Haas, 1999; Liu et al., 1998), even in the domain of surveillance video (Kim et al., 2011). Furthermore, although these non-parametric models are characterized by second order statistics, GPs are capable of complexity rivaling that of deep neural networks (Lee et al., 2018), allowing for scalability to more complex models and domains. Our proposed results and algorithms may be applied regardless of the complexity of the chosen GP.
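The closed-form conditionals above are straightforward to compute. A minimal numpy sketch, using a toy RBF kernel and hypothetical index sets (not the paper's experimental setup):

```python
import numpy as np

def rbf_kernel(n, l=3.0):
    """Toy RBF kernel over integer time indices, unit prior variance."""
    t = np.arange(n)
    return np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * l ** 2))

n = 8
mu = np.zeros(n)
Sigma = rbf_kernel(n)

I_S = [3, 4]                                   # hypothetical secret set
I_U = [i for i in range(n) if i not in I_S]
x_s = np.array([1.0, 1.2])                     # hypothesized sensitive values

# Blocks of Sigma: ss, us (rows I_U, cols I_S), su, uu
S_ss = Sigma[np.ix_(I_S, I_S)]
S_us = Sigma[np.ix_(I_U, I_S)]
S_su = Sigma[np.ix_(I_S, I_U)]
S_uu = Sigma[np.ix_(I_U, I_U)]

# X_{I_U} | X_{I_S} = x_s  ~  N(mu_{u|s}, Sigma_{u|s})
A = S_us @ np.linalg.inv(S_ss)                 # Sigma_us Sigma_ss^{-1}
mu_u_s = mu[np.array(I_U)] + A @ (x_s - mu[np.array(I_S)])
Sigma_u_s = S_uu - A @ S_su

# Conditioning never increases marginal variance
assert np.all(np.diag(Sigma_u_s) <= np.diag(S_uu) + 1e-9)
```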
In the following section, we propose a privacy definition that adapts Rényi Differential Privacy (RDP) (Mironov, 2017) to the Pufferfish framework. RDP resembles Differential Privacy (Dwork, 2006), except instead of bounding the maximum probability ratio or max divergence of the distribution on outputs for two neighboring databases, it bounds the Rényi divergence of order λ, defined in Equation (1) for distributions P_1 and P_2. The Rényi divergence bears a nice synergy with Gaussian processes. If P_1 = N(µ_1, Σ) and P_2 = N(µ_2, Σ) — two mean-shifted normal distributions — the Rényi divergence takes on a simple closed form shown in Equation (2).

D_λ(P_1 ‖ P_2) = (1/(λ − 1)) log E_{x∼P_2}[ (P_{P_1}(X = x) / P_{P_2}(X = x))^λ ]   (1)

D_λ(P_1 ‖ P_2) = (λ/2) (µ_1 − µ_2)^⊤ Σ^{−1} (µ_1 − µ_2)   (2)

We will make use of this in defining and bounding privacy loss in the next section.

We now propose a privacy framework that is tailored to sequences of correlated data, Conditional Inferential Privacy (CIP). CIP guarantees a radius r of indistinguishability for the basic or compound secrets associated with any secret set I_S. Specifically, CIP protects against any adversary with a specific prior on the shape of the trace, and is agnostic to their prior on the absolute location of the trace. We call the set of such prior distributions a Conditional Prior Class.

Definition 3.1.
Conditional Prior Class
For X = {X_1, . . . , X_n}, prior distributions P_i, P_j on X are said to belong to the same conditional prior class Θ if a constant shift in the conditioned x_s results in a constant shift on the distribution of X_{I_U}. Formally, if the conditional distributions satisfy P_{P_i}(X_{I_U} | X_{I_S} = x_s) = P_{P_j}(X_{I_U} + c^u_{ij} | X_{I_S} = x_s + c^s_{ij}) for all x_s, where c^u_{ij} and c^s_{ij} are constant shift vectors.

For instance, prior P_i may concentrate probability on traces passing through Los Angeles, while P_j concentrates on traces passing through London. Conditioning on each secret in the pair (x_s, x′_s) in L.A. is analogous to conditioning on each secret in the pair (x_s + c^s_{ij}, x′_s + c^s_{ij}) in London. The corresponding pair of conditional distributions on X_{I_U} in London (P_j) are copies of those in L.A. (P_i) shifted by c^u_{ij}. What matters is that the set of all pairs of conditional distributions under P_i induced by secret pairs (x_s, x′_s) is identical to those under P_j up to a mean shift. See Appendix 7.5 for a more detailed discussion of conditional prior classes.

Definition 3.2. (ε, λ)-Conditional Inferential Privacy (S_pairs, r, Θ): Given compound or basic discriminative pairs S_pairs associated with I_S, a radius of privacy r, a conditional prior class Θ, and a privacy parameter ε > 0, a privacy mechanism Z = A(X) satisfies (ε, λ)-CIP (S_pairs, r, Θ) if for all (s_i, s_j) ∈ S_pairs and all prior distributions P ∈ Θ with P_P(s_i), P_P(s_j) > 0,

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) ) ≤ ε   (3)

CIP departs from DP type notions of privacy like Approaches A–C primarily by resisting only a restricted class of inter-dependence — the conditional prior class — as opposed to arbitrary dependence of any k locations. Unlike Approaches A and B, we are able to preserve utility for tasks like traffic monitoring.
Unlike Approach C, CIP is still resistant to realistic models of location inter-dependence.

While this definition borrows heavily from the Pufferfish framework, it has a few key modifications. Pufferfish is generally described from a central, not local, model. We specialize the kinds of secrets and discriminative pairs for the case of local location trace privacy. Additionally, we specialize the type of prior distribution class needed for this local setting: the conditional prior class. Finally, we relax the strict max divergence (max log odds) criterion of the Pufferfish definition to a Rényi divergence. This guarantees that — with high probability on draws of realistic traces Z | X_{I_S} — the log odds will be bounded by ε. As λ → ∞, the log odds are bounded for all traces, i.e. the max divergence is bounded. We formalize this in Theorem 3.1.

The Rényi criterion of CIP greatly improves its flexibility. Unlike the standard DP Approaches A–C, which only take probabilities over the mechanism, we do not have full control over the randomness at play: it comes partially from A, defined by us, and partially from P, intrinsic to the data. Unlike max divergence, Rényi divergence is available in closed form for many distributions, allowing for a more flexible privacy framework. The λ parameter helps us tune how strict a CIP definition is and how much noise we need to add. This allows us to design mechanisms that are resistant to natural models of dependence while preserving utility.

We now identify key properties that make the CIP guarantee interpretable and robust.
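The closed form of Equation (2) that this flexibility relies on can be sanity-checked against a direct Monte Carlo estimate of Equation (1). A sketch with illustrative parameters (one dimension, shared unit variance):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 5.0
mu1, mu2, sigma = 0.0, 0.3, 1.0   # two mean-shifted 1-d Gaussians

# Closed form (Equation (2) in one dimension): lam * (mu1 - mu2)^2 / (2 sigma^2)
closed = lam * (mu1 - mu2) ** 2 / (2 * sigma ** 2)

# Monte Carlo estimate of Equation (1):
# (1/(lam - 1)) * log E_{x ~ P2} [ (p1(x) / p2(x))^lam ]
x = rng.normal(mu2, sigma, size=500_000)
log_ratio = ((x - mu2) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)  # log p1(x)/p2(x)
mc = np.log(np.mean(np.exp(lam * log_ratio))) / (lam - 1)

assert abs(mc - closed) < 0.02    # the two estimates agree
```

Note that the Monte Carlo estimator becomes heavy-tailed for large λ or large mean shifts, which is one reason the closed form is so convenient here.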
Interpretability:
CIP guarantees that a Bayesian adversary with any prior distribution on traces P in the conditional prior class Θ does not learn much about basic or compound secrets from the released trace Z. For basic secrets, this means that the adversary's posterior beliefs regarding sensitive location X_{I_S} are not much sharper than their prior beliefs before witnessing Z.

Theorem 3.1.
Prior-Posterior Gap: An (ε, λ)-CIP mechanism with conditional prior class Θ guarantees that for any event O on sanitized trace Z,

| log( P_{P,A}(s_i | Z ∈ O) / P_{P,A}(s_j | Z ∈ O) ) − log( P_P(s_i) / P_P(s_j) ) | ≤ ε′

for any P ∈ Θ with probability ≥ 1 − δ over draws of Z | X_{I_S} = s_i or Z | X_{I_S} = s_j, where ε′ and δ are related by ε′ = ε + log(1/δ)/(λ − 1). This holds under the condition that Z | X_{I_S} = s_i and Z | X_{I_S} = s_j have identical support.

A CIP mechanism depends only on the conditional prior describing the data, not the data itself. Suppose an adversary's prior beliefs on X_{I_S} are uniform over some region. By the relation above, for λ = 5 the probability that their posterior odds on s_i, s_j exceed e^{ε′} is at most δ = e^{−(ε′−ε)(λ−1)}: for small ε, odds above 3.5 are far less likely than odds above 2. This 'chance' is over draws of likely remaining locations X_{I_U} and the additive noise G. Proofs of all results are in Appendix 7.2.

For additive noise mechanisms like A(X) = X + G = Z, the CIP loss can be split into two terms: one accounting for the direct privacy loss of Z_{I_S} on X_{I_S} and a second accounting for the inferential privacy loss of Z_{I_U} on X_{I_S} via X_{I_U}.

Lemma 3.2.
Conditional Independence
For an additive noise mechanism, a fully dependent trace as in
Figure 1a, and any prior P on X, the CIP loss may be expressed as

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) )   (4)
  = Σ_{k ∈ I_S} D_λ( P_A(Z_k | X_k = (s_i)_k) ‖ P_A(Z_k | X_k = (s_j)_k) )
  + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = s_i) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = s_j) )

One interpretation of GI is that it assumes all locations X_k are independent. In this case, the second term vanishes and the privacy loss only depends on randomness of the mechanism, not the prior.

Robustness:
Kifer & Machanavajjhala (2011) show that it is impossible to achieve both utility and privacy resistant to all priors. CIP provides resistance to a reasonable class of priors
P ∈ Θ, but it is possible that the true distribution Q ∉ Θ. In this case, the privacy guarantees degrade gracefully as the divergence between Q and P ∈ Θ grows.

Theorem 3.3.
Robustness to Prior Misspecification
Mechanism A satisfies ε(λ)-CIP for prior class Θ. Suppose the finite mean true distribution Q is not in Θ. The CIP loss of A against prior Q is bounded by

D_λ( P_{A,Q}(Z | X_{I_S} = s_i) ‖ P_{A,Q}(Z | X_{I_S} = s_j) ) ≤ ε′(λ)

where

ε′(λ) = ((λ − 1/2)/(λ − 1)) ∆(2λ) + ∆(4λ − 3) + ((2λ − 3/2)/(2λ − 2)) ε(4λ − 2)

and where ∆(λ) is

inf_{P ∈ Θ} sup_{s_i ∈ S} max{ D_λ( P_P(X_{I_U} | X_{I_S} = s_i) ‖ P_Q(X_{I_U} | X_{I_S} = s_i) ),
                               D_λ( P_Q(X_{I_U} | X_{I_S} = s_i) ‖ P_P(X_{I_U} | X_{I_S} = s_i) ) }

As long as the conditional distribution on X_{I_U} | X_{I_S} = s_i of prior Q is close to that of some P ∈ Θ, the privacy guarantees should change only marginally. This bound is tightest when ε(λ) does not grow quickly with order λ.

A GP conditional prior class is the set of all GP prior distributions with the same kernel function (i, j) → Cov(X_i, X_j) and any mean function i → E[X_i]. With an additive Gaussian mechanism G ∼ N(0, Σ^{(g)}), the CIP loss of Equation (4) can be bounded for any GP conditional prior class. See Appendix 7.5 for further discussion of the GP conditional prior class.

Theorem 3.4.
CIP loss bound for GP conditionalpriors:
Let Θ be a GP conditional prior class. Let Σ be the covariance matrix for X produced by its kernel function. Let S_pairs be the basic or compound discriminative pairs associated with I_S, and S be the number of unique times in I_S. The mechanism A(X) = X + G = Z, where G ∼ N(0, Σ^{(g)}), then satisfies (ε, λ)-Conditional Inferential Privacy (S_pairs, r, Θ), where

ε ≤ (λ S r² / 2) ( 1/σ_s² + α* )   (5)

where σ_s² is the variance of each G_i ∈ G_{I_S} (diagonal entries of Σ^{(g)}_{ss}) and α* is the maximum eigenvalue of

Σ_eff = (Σ_{us} Σ_{ss}^{−1})^⊤ (Σ_{u|s} + Σ^{(g)}_{uu})^{−1} (Σ_{us} Σ_{ss}^{−1}).

The above bound is tight for basic secrets (S = 1). The two terms of Equation (5) represent the direct (σ_s²) and inferential (α*) loss terms of Equation (4). We assume that each diagonal entry of Σ^{(g)}_{ss} equals some σ_s², so that each X_i ∈ X_{I_S} experiences identical direct privacy loss, which is optimal under utility constraints. The above bound composes gracefully when multiple traces of an individual are released.

Corollary 3.4.1.
Graceful Composition in Time
Suppose a user releases two traces X and X̂ with additive noise G ∼ N(0, Σ^{(g)}) and Ĝ ∼ N(0, Σ̂^{(g)}), respectively. Then basic or compound secret X_{I_S} of X enjoys (ε̄, λ)-CIP, where

ε̄ ≤ (λ S r² / 2) ( 1/σ_s² + ᾱ* )

and where ᾱ* is the maximum eigenvalue of Σ̄_eff = (Σ_{us} Σ_{ss}^{−1})^⊤ (Σ_{u|s} + Σ̄^{(g)}_{uu})^{−1} (Σ_{us} Σ_{ss}^{−1}). Here Σ is the covariance matrix of the joint distribution on X, X̂ and

Σ̄^{(g)} = [ Σ^{(g)}   0
             0        Σ̂^{(g)} ]

This bound is identical to that of Theorem 3.4, only using the joint distribution over X, X̂ and G, Ĝ. This provides some insight into the fact that, unlike DP, even parallel composition guarantees are not automatic. Composition depends on the conditional prior. In the GP setting, if the chosen kernel function decays over time, we can expect composition to have minimal effects on privacy for traces separated by long durations.

To reduce the upper bound of Theorem 3.4, we optimize the correlation (off-diagonal) of Σ^{(g)} to minimize α*, and optimize its variance (diagonal) to balance a noise budget between lowering inferential (α*) and direct (σ_s²) loss.

Theorem 3.4 characterizes the privacy loss for GP conditional priors. We next show how to use this theorem to design mechanisms that can strategically reduce CIP loss given a utility constraint. We measure 'utility loss' as the total mean squared error (MSE) between the released (Z) and true (X) traces: MSE(Σ^{(g)}) = Σ_{i=1}^n E[(Z_i − X_i)²] = tr(Σ^{(g)}). We bound the utility loss by tr(Σ^{(g)}) ≤ n·o_t, where o_t is the average per-point utility loss.

It can be shown that optimizing the privacy loss under this utility constraint can be described by a semidefinite program (SDP) (formalization/derivation of SDPs in Appendix 7.3). For a given trace X, define its covariance matrix Σ using the kernel of the GP conditional prior, Σ_ij = k(i, j). Then pass Σ, the secret set I_S, and the utility constraint o_t to our first program, SDP_A, which returns noise covariance Σ^{(g)}. This defines an additive noise mechanism G ∼ N(0, Σ^{(g)}) that minimizes CIP loss to I_S:

Σ^{(g)} = SDP_A(Σ, I_S, o_t)

We can thus use a SDP to minimize the CIP loss to any single compound or basic secret. However, a trace may contain multiple locations or combinations thereof that one wishes to protect.
It remains to produce a single mechanism Σ^{(g)} that bounds the CIP loss to multiple basic and/or compound secrets in a single trace. For this we propose SDP_B, which uses the fact that if Σ^{(g)′} ≽ Σ^{(g)} it will have lower CIP loss (see Appendix 7.3.2). SDP_B takes in a set of covariance matrices F = {Σ^{(g)}_1, . . . , Σ^{(g)}_m}, each designed to minimize CIP loss for a single compound or basic secret I_{Si}. It then returns a single covariance matrix Σ^{(g)} ≽ Σ^{(g)}_i, i ∈ [m], that maintains the privacy guarantee each Σ^{(g)}_i offered its corresponding I_{Si}, while minimizing utility loss. In our experiments, we use Algorithm 1 to design a single mechanism that protects all locations in the trace — all basic secrets — while minimizing utility loss.
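The quantity these programs minimize — the Theorem 3.4 bound — is cheap to evaluate for any candidate noise covariance. A sketch (this evaluates the bound as stated above for a 1-d trace with a toy RBF prior and i.i.d. candidate noise; it is not the SDP itself, which would be solved with a tool such as CVXPY):

```python
import numpy as np

def rbf_cov(n, l=3.0):
    """Toy RBF prior covariance over integer time indices."""
    t = np.arange(n)
    return np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * l ** 2))

def cip_bound(Sigma, Sigma_g, I_S, lam=5.0, r=1.0):
    """Theorem 3.4 upper bound on epsilon for one secret set I_S."""
    n = Sigma.shape[0]
    I_U = [i for i in range(n) if i not in I_S]
    S_ss = Sigma[np.ix_(I_S, I_S)]
    S_us = Sigma[np.ix_(I_U, I_S)]
    S_uu = Sigma[np.ix_(I_U, I_U)]
    A = S_us @ np.linalg.inv(S_ss)                    # Sigma_us Sigma_ss^{-1}
    Sigma_u_s = S_uu - A @ Sigma[np.ix_(I_S, I_U)]
    # Sigma_eff = A^T (Sigma_{u|s} + Sigma^{(g)}_{uu})^{-1} A
    M = np.linalg.inv(Sigma_u_s + Sigma_g[np.ix_(I_U, I_U)])
    alpha_star = np.linalg.eigvalsh(A.T @ M @ A).max()   # inferential term
    sigma_s2 = Sigma_g[I_S[0], I_S[0]]                   # direct noise variance
    S_times = len(I_S)                                   # unique sensitive times
    return lam * S_times * r**2 / 2 * (1 / sigma_s2 + alpha_star)

n, I_S = 10, [4]
Sigma = rbf_cov(n)
eps_half = cip_bound(Sigma, 0.5 * np.eye(n), I_S)   # candidate i.i.d. noise
eps_one = cip_bound(Sigma, 1.0 * np.eye(n), I_S)
assert eps_one < eps_half   # more noise everywhere can only lower the bound
```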
Multiple Secrets
Input: I S , . . . , I Sm , o t , Σ Output: Σ ( g ) F = ∅ ; for i ∈ [ m ] do Σ ( g ) i = SDP A (Σ , I Si , o t ) ; F = F ∪ Σ ( g ) i ; end Σ ( g ) = SDP B ( F ) ; return Σ ( g ) ; Here, we aim to empirically answer: Do our SDPmechanisms maintain high posterior uncertainty ofsensitive locations? How do they compare to ApproachC baselines of equal MSE? How robust is the SDP A mechanism when the prior covariance Σ is misspecified? Methods
To answer these questions, we look at the range of conditional prior classes that fit real-world data. For location trace data, we use the GeoLife GPS Trajectories dataset (Zheng et al., 2010), containing 10k human mobility traces after preprocessing (see Appendix 7.4 for details). We also consider the privacy risk of room temperature data (Nef et al., 2015), using the SML2010 dataset (Zamora-Martinez et al., 2014), which contains approximately 40 days of room temperature data sampled every 15 minutes.

For the location data, having observed that the correlation between latitude and longitude is low, we treat each dimension as independent. By way of Corollary 7.2.1, this allows us to bound privacy loss and design mechanisms for each dimension separately. Furthermore, having observed that each dimension fits nearly the same conditional prior, we treat our dataset of 10k 2-dimensional traces as a dataset of 20k 1-dimensional traces, where each trace represents one dimension of a 2d location trajectory.

We model the location trace data with a Radial Basis Function (RBF) kernel GP and the temperature series data with a periodic kernel GP:

k_RBF(t_i, t_j) = σ_x² exp( −(t_i − t_j)² / (2l²) )
k_PER(t_i, t_j) = σ_x² exp( −2 sin²(π|t_i − t_j|/p) / l² )

In both kernels, the intrinsic degree of dependence between points is captured by the lengthscale l. However, the fact that sampling rates vary significantly between traces means that traces with equal lengthscales can have very different degrees of correlation.

Figure 2: Posterior uncertainty interval (higher = better privacy) on X_{I_S} of a GP Bayesian adversary. A larger l_eff corresponds to greater inter-dependence and reduces posterior uncertainty. The gray interval depicts the middle 50% of the MLE l_eff among traces in each dataset, and the black dotted line the median l_eff. Panels (a)-(c) and (e)-(g) show SDP mechanisms (blue) maintaining relatively high uncertainty compared to two GI (Approach C) baselines of equal utility (MSE). Panels (d) and (h) show the (minor) change in posterior uncertainty when the prior covariance Σ used in SDP_A is misspecified: when it is identical to the true covariance Σ* known to the adversary (blue), is more correlated (orange), or is less correlated (green).

To encapsulate both of these effects, we study the empirical distribution of the effective lengthscale of each trace,

l_eff,x = l_x / P    and    l_eff,y = l_y / P,

where P is the trace's sampling period and l_x, l_y are its optimal lengthscales for each dimension. l_eff,x and l_eff,y tell us the average number of neighboring locations that are highly correlated, instead of a time period. For instance, a given trace with an optimal l_eff,x = 8 tells us that roughly every eight neighboring location samples in the x dimension are highly correlated. The empirical distribution of effective lengthscales across all traces describes, over a range of logging devices (sampling rates), users, and movement patterns, how many neighboring points are highly correlated in location trace data. After this preprocessing, we are able to use kernels that take indices (not times) as arguments:

k_RBF(i, j) = exp( −(i − j)² / (2 l_eff²) )
k_PER(i, j) = exp( −2 sin²(π|i − j|/p) / l_eff² )

See Appendix 7.4 for a more detailed discussion of how the empirical distribution of l_eff across traces is measured. To impart the range of realistic conditional priors, the gray interval of each plot depicts the middle 50% of the empirical l_eff among traces in each dataset; the dashed vertical line reports the median l_eff. Each figure increases the degree of dependence, l_eff, used by the kernel to compute the prior covariance Σ(l_eff).
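A minimal sketch of the index-based kernels above, and of the GP adversary's posterior uncertainty they induce. The kernel normalizations follow the standard GP forms we assume here, and the conditioning rule Cov(X | Z) = Σ − Σ(Σ + Σ^(g))^{-1}Σ for Z = X + G is ordinary joint-Gaussian algebra; the function names and parameter values are ours, not from the paper's code.

```python
import numpy as np

def rbf_cov(n, l_eff):
    """Index-based RBF kernel: K[i, j] = exp(-(i - j)^2 / (2 l_eff^2))."""
    d = np.subtract.outer(np.arange(n), np.arange(n))
    return np.exp(-d**2 / (2 * l_eff**2))

def periodic_cov(n, l_eff, p):
    """Index-based periodic kernel: K[i, j] = exp(-2 sin^2(pi|i-j|/p) / l_eff^2)."""
    d = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.exp(-2 * np.sin(np.pi * d / p)**2 / l_eff**2)

def posterior_sigma(Sigma, Sigma_g, sensitive_idx):
    """1-sigma posterior uncertainty on X_{I_S} for a Bayesian adversary with
    prior X ~ N(mu, Sigma) who observes Z = X + G, G ~ N(0, Sigma_g).
    Joint-Gaussian conditioning: Cov(X | Z) = Sigma - Sigma (Sigma + Sigma_g)^{-1} Sigma."""
    post = Sigma - Sigma @ np.linalg.solve(Sigma + Sigma_g, Sigma)
    return np.sqrt(np.diag(post)[sensitive_idx])

# A more correlated trace (larger l_eff) leaves the adversary with LESS
# posterior uncertainty under the same independent noise budget.
unc_8 = posterior_sigma(rbf_cov(50, 8.0), 0.5 * np.eye(50), [25])
unc_2 = posterior_sigma(rbf_cov(50, 2.0), 0.5 * np.eye(50), [25])
```

This mirrors the qualitative trend in Figure 2: as l_eff grows, the gray-interval adversary's uncertainty at the sensitive index shrinks.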
Σ(l_eff) is then used in one of the SDP routines of Section 4 to produce a mechanism Σ^(g)(l_eff) that protects a basic secret (SDP_A), a compound secret (SDP_A), or the union of all basic secrets (Multiple Secrets). We then observe the 68% confidence interval of the Gaussian posterior on the sensitive points X_{I_S} (blue line). This is the 1σ uncertainty of a Bayesian adversary with a GP prior represented by Σ(l_eff) (see Appendix 7.4 for how this is computed). As l_eff increases, their posterior uncertainty will shrink; our aim is to mitigate this as much as possible under the given utility constraint. For scale, recall that the prior variance diag(Σ) is normalized to one. In the case of all basic secrets, we report the average posterior uncertainty over locations.

We compare the SDP mechanisms with two mechanisms using the logic of Approach C (all three of equal MSE utility loss): independent/uniform and independent/concentrated. The uniform approach adds independent Gaussian noise evenly along the whole trace regardless of I_S, i.e. Σ^(g) = o_t I. The concentrated approach allocates the entire noise budget to the sensitive set I_S.

Results
For our first question, see Figures 2a-2c and 2e-2g. For both location and temperature data, our SDP mechanisms maintain higher posterior uncertainty than the baselines with identical utility cost for a single basic secret, a compound secret, and all basic secrets. By actively considering the conditional prior class parametrized by Σ, the SDP mechanisms can strategize to both correlate noise samples and concentrate noise power such that posterior inference is thwarted at the sensitive set I_S. For an intuitive illustration of the chosen Σ^(g)'s, see Appendix 7.1.2.

To answer our second question, see Figures 2d and 2h. When the prior covariance Σ does not represent the true data distribution known to the adversary, a smaller posterior uncertainty may be achieved. The orange line indicates the uncertainty interval of an adversary who knows the data is less correlated than we believe, i.e. the true Σ* is computed with a smaller l_eff than ours. The blue line represents an adversary who knows the data is more correlated than we believe, i.e. the true Σ* is computed with a larger l_eff. Both plots confirm the robustness of our privacy guarantees stated by Theorem 3.3. Particularly around the median l_eff, we see that the change in posterior uncertainty with this change in prior is indeed marginal.

Related Work
Few works have proposed solutions to the local guarantee when releasing individual traces. A mechanism offered in Bindschaedler & Shokri (2016) releases synthesized traces satisfying the notion of plausible deniability (Bindschaedler et al., 2017), but this is distinctly different from providing a radius of privacy to sensitive locations. Meanwhile, the frameworks proposed in Xiao & Xiong (2015) and Cao et al. (2019) nicely characterize the risk of inference in location traces, but use only first-order Markov models of correlation between points, do not offer a radius of indistinguishability as in this work, and are not suited to continuous-valued spatiotemporal traces.

Perhaps more technically similar to this work, Song et al. (2017) provide a general mechanism that applies to any Pufferfish framework, as well as a more computationally efficient mechanism that applies when the joint distribution of an individual's features can be described by a graphical model. The first is too computationally intensive for our setting. The second is for discrete settings, and cannot accommodate spatiotemporal effects.
Conclusion
This work proposes a framework for both identifying and quantifying the inferential privacy risk for highly dependent sequences of spatiotemporal data. As a starting point, we have provided a simple bound on the privacy loss for Gaussian process priors, and an SDP-based privacy mechanism for minimizing this bound without destroying utility. We hope to extend this work to other data domains with different conditional priors, and different sets of secrets.
Acknowledgements
KC and CM would like to thank ONR under N00014-20-1-2334 and UC Lab Fees under LFR 18-548554 for research support. We would also like to thank our reviewers for their insightful feedback.
References
Liu, C., Chakraborty, S., and Mittal, P. Dependence makes you vulnerable: Differential privacy under dependent tuples. In Network and Distributed System Security Symposium (NDSS), San Diego, CA, 2016. ISBN 978-1-891562-41-9.

Andrés, M. E., Bordenabe, N. E., Chatzikokolakis, K., and Palamidessi, C. Geo-indistinguishability: Differential privacy for location-based systems. arXiv preprint arXiv:1212.1984, 2012.

Bindschaedler, V. and Shokri, R. Synthesizing plausible privacy-preserving location traces. In IEEE Symposium on Security and Privacy (SP), pp. 546–563, May 2016. doi: 10.1109/SP.2016.39.

Bindschaedler, V., Shokri, R., and Gunter, C. A. Plausible deniability for privacy-preserving data synthesis. Proceedings of the VLDB Endowment, 10(5):481–492, January 2017. doi: 10.14778/3055540.3055542. URL https://doi.org/10.14778/3055540.3055542.

Cao, Y., Yoshikawa, M., Xiao, Y., and Xiong, L. Quantifying differential privacy under temporal correlations. In IEEE International Conference on Data Engineering (ICDE), pp. 821–832, April 2017. doi: 10.1109/ICDE.2017.132.

Cao, Y., Xiao, Y., Xiong, L., and Bai, L. PriSTE: From location privacy to spatiotemporal event privacy. In IEEE International Conference on Data Engineering (ICDE), pp. 1606–1609, April 2019. doi: 10.1109/ICDE.2019.00153.

Chen, J., Low, K. H., Yao, Y., and Jaillet, P. Gaussian process decentralized data fusion and active sensing for spatiotemporal traffic modeling and prediction in mobility-on-demand systems. IEEE Transactions on Automation Science and Engineering, 12(3):901–921, July 2015. doi: 10.1109/TASE.2015.2422852.

Dwork, C. Differential privacy. In Automata, Languages and Programming (ICALP), volume 4052 of LNCS, July 2006. ISBN 978-3-540-35907-4.

Fan, L., Xiong, L., and Sunderam, V. Differentially private multi-dimensional time series release for traffic monitoring. In IFIP Annual Conference on Data and Applications Security and Privacy, pp. 33–48. Springer, 2013.

Fowler, G. A. Smartphone data reveal which Americans are social distancing (and not). Washington Post, 2020. ISSN 0190-8286.

Kifer, D. and Machanavajjhala, A. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD '11, pp. 193–204, Athens, Greece, June 2011. doi: 10.1145/1989323.1989345.

Kifer, D. and Machanavajjhala, A. Pufferfish: A framework for mathematical privacy definitions. ACM Transactions on Database Systems (TODS), 39(1):3, 2014.

Kim, K., Lee, D., and Essa, I. Gaussian process regression flow for analysis of motion trajectories. In IEEE International Conference on Computer Vision (ICCV), pp. 1164–1171, November 2011. doi: 10.1109/ICCV.2011.6126365.

Lee, J., Bahri, Y., Novak, R., Schoenholz, S. S., Pennington, J., and Sohl-Dickstein, J. Deep neural networks as Gaussian processes. arXiv:1711.00165 [cs, stat], March 2018. URL http://arxiv.org/abs/1711.00165.

Liang, B. and Haas, Z. Predictive distance-based mobility management for PCS networks. In IEEE INFOCOM '99 (Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies), volume 3, pp. 1377–1384, March 1999. doi: 10.1109/INFCOM.1999.752157.

Liu, T., Bahl, P., and Chlamtac, I. Mobility modeling, location tracking, and trajectory prediction in wireless ATM networks. IEEE Journal on Selected Areas in Communications, 16(6):922–936, August 1998. doi: 10.1109/49.709453.

Mironov, I. Rényi differential privacy. In IEEE Computer Security Foundations Symposium (CSF), pp. 263–275. IEEE, 2017.

Nef, T., Urwyler, P., Büchler, M., Tarnanas, I., Stucki, R., Cazzoli, D., Müri, R., and Mosimann, U. Evaluation of three state-of-the-art classifiers for recognition of activities of daily living from smart home ambient data. Sensors, 15(5):11725–11740, 2015.

Song, S., Wang, Y., and Chaudhuri, K. Pufferfish privacy mechanisms for correlated data. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, pp. 1291–1306, Chicago, Illinois, USA, May 2017. doi: 10.1145/3035918.3064025. URL https://doi.org/10.1145/3035918.3064025.

Valentino-DeVryes, J., Singer, N., Keller, M. H., and Krolik, A. Your apps know where you were last night, and they're not keeping it secret. The New York Times, 2018.

Vandenberghe, L. The CVXOPT linear and quadratic cone program solvers. Online: http://cvxopt.org/documentation/coneprog.pdf, 2010.

Vandenberghe, L. and Boyd, S. Semidefinite programming. SIAM Review, 38(1):49–95, March 1996. doi: 10.1137/1038003.

Xiao, Y. and Xiong, L. Protecting locations with differential privacy under temporal correlations. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1298–1309. ACM, 2015.

Yang, B., Sato, I., and Nakagawa, H. Bayesian differential privacy on correlated data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp. 747–762, Melbourne, Victoria, Australia, May 2015. doi: 10.1145/2723372.2747643.

Zamora-Martinez, F., Romeu, P., Botella-Rocamora, P., and Pardo, J. On-line learning of indoor temperature forecasting models towards energy efficiency. Energy and Buildings, 83:162–172, 2014.

Zheng, Y., Xie, X., Ma, W.-Y., et al. GeoLife: A collaborative social networking service among user, location and trajectory. IEEE Data Eng. Bull., 33(2):32–39, 2010.
For documented code demonstrating our SDP mechanisms used to generate the plots of Figure 2, please visit our repo: https://github.com/casey-meehan/location_trace_privacy
The following sections include proofs of results, derivations of algorithms, and explanations of experimental procedures.

Figure 3: Example of a sensitive location trace of an NYC mayoral staff member exposed by Valentino-DeVryes (2018). (b) and (c) depict the posterior uncertainty (green) P_{A,P}(X_i | Z) for each 2d location. (a) depicts three sensitive times (red with blue outline): Gracie Mansion (the Mayor's home), an event on Staten Island that the mayor attended, and finally the staff member's home on Long Island. (b) provides an example of Approach C: adding independent Gaussian noise to each location (red dotted line). A GP posterior still maintains high confidence within a small radius along the trace, including at the sensitive times. (c) provides an example of the optimized noise of Multiple Secrets with identical aggregate MSE as (b). By focusing correlated noise around the three sensitive times, there is high uncertainty at sensitive times and high confidence elsewhere.

The following figures aim to illustrate the difference between the covariance matrices used in the experimental baselines (indep./uniform and indep./concentrated) and those chosen by our SDP algorithms for both the RBF and periodic prior. Note that here we presume the different dimensions of location to be independent and, by Corollary 7.2.1, are able to treat a 2d location trace as two 1d traces. As such, the following examples demonstrate the mechanism covariance matrices and additive noise samples used for either a single dimension of location data (for the RBF kernel) or for the one dimension of temperature data (for the periodic kernel).

The first figure (a) shows the covariance of the Approach C baselines used in the experiments. The second figure (b) shows the covariance of our SDP mechanisms for the RBF kernel used on location data. The third figure (c) shows the covariance of our SDP mechanisms for the periodic kernel used for temperature data. In each figure the covariance matrix is depicted as a heat map, with warmer colors indicating higher values (normalized to the largest and smallest value in the covariance matrix).
The drawn noise samples G are plotted against their time index. So, the sequence of plotted (x, y) values is [(1, G_1), (2, G_2), ..., (n, G_n)], where n = 50 for the RBF case and n = 48 for the periodic case.

(a) Covariance matrices and mechanism samples for the baselines used in experiments. The first figure demonstrates the uniform approach that distributes the independent Gaussian noise budget along the entire trace, regardless of I_S. The second and third show the concentrated approach that allocates the entire noise budget to only the sensitive locations in I_S: first for a basic secret (one location) and then for a compound secret of 3 evenly spaced locations.

(b) Covariance matrices and mechanism samples for the median RBF prior. The first noise mechanism (Mech. basic) demonstrates the covariance matrix chosen by SDP_A for a basic secret of a single location X_i in the middle of the trace. The uncorrelated dot in the middle of the covariance matrix, Σ^(g)_ii, represents the independent noise G_i added at the sensitive location to mitigate direct loss. To mitigate inferential loss, the SDP optimizes the remainder of the matrix to be positively correlated, with maximum variance allocated to locations near X_i in time. This thwarts GP inference of the true location at time t_i. The second mechanism (Mech. comp.) depicts the covariance chosen by SDP_A to protect a compound secret of two adjacent locations in the trace (visible as the uncorrelated '+' through the middle consuming 2 rows/columns). Recall that a compound secret ought to protect directional information: did the user visit B first and then A, or A and then B? That is precisely what this mechanism does by randomizing the angle of approach to the two locations in the middle with positively and negatively correlated noise.
Also note that the SDP does not allocate a large share of the noise budget to the actual locations themselves. This highlights the fact that protecting a compound secret does not protect its constituent basic secrets. The third and final mechanism (Mech. all basic) is the noise covariance chosen by SDP_B in the Multiple Secrets algorithm. To protect all basic secrets with a utility constraint, the SDP converges to a mechanism that looks similar to the uniform baseline. However, this mechanism adds a subtle degree of off-diagonal correlation along with greater noise power towards the beginning and end of the trace. The off-diagonal correlation is noticeable when the samples are compared to those of the uniform baseline in the previous figure. While this change appears to be minor, it makes a significant change in the posterior confidence of a GP adversary (as seen in Figure 2c).

(c) Covariance matrices and mechanism samples for the median periodic prior, with a period of half the trace length. The first noise mechanism (Mech. basic) shows the covariance chosen by SDP_A to protect a single location (temperature) in the middle of the trace. As in the RBF case, significant noise power is allocated to the sensitive location itself, X_i, to limit direct privacy loss. However, the noise added to the remainder of the trace is significantly different: it is tailored to thwart inference by a periodic prior, wherein the location one period away has correlation 1. The second noise mechanism (Mech. comp.) shows the covariance chosen by SDP_A to protect a compound secret of two locations, X_i, X_j, 16 timesteps apart (not quite a full period). Here, we see the SDP randomize the phase of the additive noise such that periodic inference cannot tell directional information like X_i > X_j or vice versa. The third noise mechanism (Mech. all basic) is identical to the all basic secrets mechanism chosen for the RBF case above, except using a periodic prior Σ. The mechanism chosen looks similar to the uniform baseline, except with slightly periodic off-diagonal correlation imitating the prior covariance. Additionally, noise power is mitigated towards the middle and ends of the trace. Again, Figure 2g indicates that this subtle change makes a significant difference in thwarting Bayesian adversaries.
Prior-Posterior Gap: An (ε, λ)-CIP mechanism with conditional prior class Θ guarantees that for any event O on the sanitized trace Z,

| log [ P_{P,A}(s_i | Z ∈ O) / P_{P,A}(s_j | Z ∈ O) ] − log [ P_P(s_i) / P_P(s_j) ] | ≤ ε′

for any P ∈ Θ with probability ≥ 1 − δ over draws of Z | X_{I_S} = s_i or Z | X_{I_S} = s_j, where ε′ and δ are related by ε′ = ε + log(1/δ)/(λ − 1). This holds under the condition that Z | X_{I_S} = s_i and Z | X_{I_S} = s_j have identical support.

Proof. This result makes use of a Rényi divergence property identified in Mironov (2017):
Lemma 7.1.
Let P, Q be two distributions on X of identical support such that

max{ D_λ( P_P(X) ‖ P_Q(X) ), D_λ( P_Q(X) ‖ P_P(X) ) } ≤ ε.

Then for any event O,

P_P(X ∈ O) ≤ max{ e^{ε′} P_Q(X ∈ O), δ }    and    P_Q(X ∈ O) ≤ max{ e^{ε′} P_P(X ∈ O), δ },

where ε′ = ε + log(1/δ)/(λ − 1).

CIP guarantees that for all P ∈ Θ and all discriminative pairs (s_i, s_j) ∈ S_pairs (which also includes (s_j, s_i)),

D_λ( P_{P,A}(Z | X_{I_S} = s_i) ‖ P_{P,A}(Z | X_{I_S} = s_j) ) ≤ ε,

and thus by Lemma 7.1 we have, for any event O on Z,

P_{P,A}(Z ∈ O | X_{I_S} = s_i) ≤ max{ e^{ε′} P_{P,A}(Z ∈ O | X_{I_S} = s_j), δ }    and
P_{P,A}(Z ∈ O | X_{I_S} = s_j) ≤ max{ e^{ε′} P_{P,A}(Z ∈ O | X_{I_S} = s_i), δ }.

As such, given that X_{I_S} = s_i, the probability of some event {Z ∈ W} such that P_{P,A}(Z ∈ W | X_{I_S} = s_i) ≥ e^{ε′} P_{P,A}(Z ∈ W | X_{I_S} = s_j) is no more than δ. The same is true swapping s_j for s_i. So, over draws of Z | X_{I_S} = s_i or Z | X_{I_S} = s_j, we have that

P_{P,A}(Z ∈ O | X_{I_S} = s_i) / P_{P,A}(Z ∈ O | X_{I_S} = s_j) ≤ e^{ε′}    and    P_{P,A}(Z ∈ O | X_{I_S} = s_j) / P_{P,A}(Z ∈ O | X_{I_S} = s_i) ≤ e^{ε′}

with probability ≥ 1 − δ, which is equivalent to the statement that

−ε′ ≤ log [ P_{P,A}(Z ∈ O | X_{I_S} = s_i) / P_{P,A}(Z ∈ O | X_{I_S} = s_j) ] ≤ ε′.

By Bayes' rule, the log-ratio of posteriors equals this log-ratio of likelihoods plus the log-ratio of priors, so this is in turn equivalent to

| log [ P_{P,A}(s_i | Z ∈ O) / P_{P,A}(s_j | Z ∈ O) ] − log [ P_P(s_i) / P_P(s_j) ] | ≤ ε′.

Lemma 3.2 (CIP loss for additive mechanisms). For an additive noise mechanism, a fully dependent trace as in
Figure 1b, and any prior P on X, the CIP loss may be expressed as

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) )
  = Σ_{i ∈ I_S} D_λ( P_A(Z_i | X_i = s_i) ‖ P_A(Z_i | X_i = s_j) ) + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = s_i) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = s_j) ).

Proof.

D_λ( P_{A,P}(Z | X_{I_S} = x_s) ‖ P_{A,P}(Z | X_{I_S} = x′_s) )
  = D_λ( P_A(Z_{I_S} | X_{I_S} = x_s) P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_A(Z_{I_S} | X_{I_S} = x′_s) P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) )   (1)
  = D_λ( P_A(Z_{I_S} | X_{I_S} = x_s) ‖ P_A(Z_{I_S} | X_{I_S} = x′_s) ) + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) )   (2)
  = D_λ( Π_{i ∈ I_S} P_A(Z_i | X_i = x_i) ‖ Π_{i ∈ I_S} P_A(Z_i | X_i = x′_i) ) + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) )   (3)
  = Σ_{i ∈ I_S} D_λ( P_A(Z_i | X_i = x_i) ‖ P_A(Z_i | X_i = x′_i) ) + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) )   (4)

Line (1) uses the conditional independence seen in the graphical model of Figure 1. Line (2) is due to the fact that the two terms in line (1) are conditionally independent, allowing the divergence of the product to separate into the sum of two divergences (an easily verifiable property of Rényi divergence evident from its definition in Equation 1). Line (3) is again from the conditional independence between the Z_i for each i ∈ I_S when conditioned on X_{I_S}. Line (4) uses the same property of Rényi divergence used in line (2): the terms in the product are conditionally independent, allowing the separation into the sum of multiple divergences.

Robustness to Prior Misspecification
Mechanism A satisfies ε(λ)-CIP for prior class Θ. Suppose the finite-mean true distribution Q is not in Θ. The CIP loss of A against prior Q is bounded by

D_λ( P_{A,Q}(Z | X_{I_S} = s_i) ‖ P_{A,Q}(Z | X_{I_S} = s_j) ) ≤ ε′(λ),

where

ε′(λ) = (λ − 1/2)/(λ − 1) · ∆(2λ) + ∆(4λ − 3) + (2λ − 3/2)/(2λ − 2) · ε(4λ − 2)

and where ∆(λ) is

inf_{P ∈ Θ} sup_{s_i ∈ S} max{ D_λ( P_P(X_{I_U} | X_{I_S} = s_i) ‖ P_Q(X_{I_U} | X_{I_S} = s_i) ), D_λ( P_Q(X_{I_U} | X_{I_S} = s_i) ‖ P_P(X_{I_U} | X_{I_S} = s_i) ) }.

Proof.
By a 'finite mean' distribution Q, we mean that all conditionals of Q given some X_{I_S} have finite mean. Since a conditional prior class contains the conditionals of one distribution with any offset (any mean value), this guarantees that ∆(λ) is achieved for some P ∈ Θ. Intuitively, this prevents the pathological case of the inf over P ∈ Θ being a limit as the mean of P → ∞, only asymptotically approaching ∆(λ). If the mean of Q is finite, then the closest P ∈ Θ (in Rényi divergence) must also have finite mean, since any mean is attainable in a conditional prior class Θ. With this in mind, we make use of the following triangle inequality provided in Mironov (2017):

Lemma 7.2.
For distributions P, Q, R on X with common support, we have

D_λ( P_P(X) ‖ P_Q(X) ) ≤ (λ − 1/2)/(λ − 1) · D_{2λ}( P_P(X) ‖ P_R(X) ) + D_{2λ−1}( P_R(X) ‖ P_Q(X) ).

In our case, we assume that the mechanism A gives Z | X_{I_S} = x_s identical support for all I_S, x_s. Using this, we have

D_λ( P_{A,Q}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,Q}(Z_{I_U} | X_{I_S} = x′_s) )
  ≤ (λ − 1/2)/(λ − 1) · D_{2λ}( P_{A,Q}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ) + D_{2λ−1}( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,Q}(Z_{I_U} | X_{I_S} = x′_s) ).

By a data processing inequality, the first divergence is bounded by ∆(2λ), and the second term may be bounded by a second application of the triangle inequality:

D_{2λ−1}( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,Q}(Z_{I_U} | X_{I_S} = x′_s) )
  ≤ (2λ − 3/2)/(2λ − 2) · D_{4λ−2}( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) ) + D_{4λ−3}( P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) ‖ P_{A,Q}(Z_{I_U} | X_{I_S} = x′_s) ).

The first divergence here is bounded by ε(4λ − 2) and the second divergence is bounded by ∆(4λ − 3). Putting all this together, we have the following upper bound:

D_λ( P_{A,Q}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,Q}(Z_{I_U} | X_{I_S} = x′_s) ) ≤ (λ − 1/2)/(λ − 1) · ∆(2λ) + ∆(4λ − 3) + (2λ − 3/2)/(2λ − 2) · ε(4λ − 2).

CIP loss bound for GP conditional priors:
Let Θ be a GP conditional prior class. Let Σ be the covariance matrix for X produced by its kernel function. Let S be the basic or compound secret associated with I_S, and let S also denote the number of unique times in I_S. The mechanism A(X) = X + G = Z, where G ∼ N(0, Σ^(g)), then satisfies (ε, λ)-Conditional Inferential Privacy (S_pairs, r, Θ), where

ε ≤ (λ/2) S r² ( 1/σ_s² + α* ),

where σ_s² is the variance of each G_i ∈ G_{I_S} (the diagonal entries of Σ^(g)_ss) and α* is the maximum eigenvalue of

Σ_eff = (Σ_us Σ_ss^{-1})ᵀ (Σ_{u|s} + Σ^(g)_uu)^{-1} (Σ_us Σ_ss^{-1}).

Proof. Again, the conditional prior class Θ is defined by a kernel function (i, j) → Cov(i, j), which, given the indices of the trace X, induces a covariance matrix Σ between all X_i, X_j. In practice, when the sampling rate of locations is non-uniform, the kernel function may use the time-stamps of the points in the trace to assign high correlation to X_i that are close in time and low correlation to X_i that are far apart in time. Of course, correlation between X_i of different dimensions (e.g. latitude and longitude) must be designed for the given application and may be completely independent; the kernel function can encode this as well.

Recall from Equation 1 that the Rényi divergence between two mean-shifted multivariate normal distributions, P_0 = N(μ_0, Σ) and P_1 = N(μ_1, Σ), is

D_λ( P_0 ‖ P_1 ) = (λ/2) (μ_0 − μ_1)ᵀ Σ^{-1} (μ_0 − μ_1).

Now, for any prior
P ∈ Θ, we have that X ∼ N(μ, Σ) for some μ and for Σ defined by the kernel function. Again, G ∼ N(0, Σ^(g)). I_S encodes the indices of a single-location basic secret or a multi-location compound secret. Then, the divergence to bound for (ε, λ)-CIP (S_pairs, r, Θ) is

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) )

for any (s_i, s_j) ∈ S_pairs = { (x_s, x′_s) : ‖x_s − x′_s‖ ≤ r } if I_S encodes a basic secret, or for any

(s_i, s_j) ∈ S_pairs = { ({x_s1, x_s2, ...}, {x′_s1, x′_s2, ...}) : ‖x_sk − x′_sk‖ ≤ r, ∀k }

if I_S encodes a compound secret. A discriminative pair (s_i, s_j) is two real-valued vectors in R^{|I_S|}, representing two hypotheses about the true values of X_{I_S}. We denote the m-th elements as s_im, s_jm. Let f : I_S → [|I_S|] be a mapping from each index w ∈ I_S to its corresponding position in the vector s_i or s_j (where the value of X_w is hypothesized). By Lemma 3.2, the divergence can be written as

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) )
  = Σ_{w ∈ I_S} D_λ( P_A(Z_w | X_w = s_{i,f(w)}) ‖ P_A(Z_w | X_w = s_{j,f(w)}) ) + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) ),

where P_A(Z_w | X_w = x) = N(x, σ_s²) for all w ∈ I_S. Recall from the statement of the theorem that we assume the diagonal entries of Σ^(g)_ss all equal some value σ_s²: we add the same noise variance to each point in the secret set, which is optimal under MSE constraints. Additionally, note that for the hypothesis X_{I_S} = x_s, we know the distribution of X_{I_U} | X_{I_S} = x_s ∼ N(μ_{u|s}, Σ_{u|s}), where

μ_{u|s} = μ_u + Σ_us Σ_ss^{-1} (x_s − μ_s)    and    Σ_{u|s} = Σ_uu − Σ_us Σ_ss^{-1} Σ_su.
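These conditional-Gaussian identities can be checked numerically; in particular, the difference μ_{u|s_i} − μ_{u|s_j} = Σ_us Σ_ss^{-1}(s_i − s_j) used in the next simplification does not depend on μ. The small RBF prior and the specific hypotheses below are arbitrary illustrative choices.

```python
import numpy as np

# A small RBF prior over 10 indices (arbitrary illustrative choice).
idx = np.arange(10)
Sigma = np.exp(-np.subtract.outer(idx, idx)**2 / (2 * 3.0**2))
I_S = [4, 5]
I_U = [i for i in range(10) if i not in I_S]
S_ss = Sigma[np.ix_(I_S, I_S)]
S_us = Sigma[np.ix_(I_U, I_S)]
S_uu = Sigma[np.ix_(I_U, I_U)]

def mu_cond(mu, x_s):
    """mu_{u|s} = mu_u + Sigma_us Sigma_ss^{-1} (x_s - mu_s)."""
    return mu[I_U] + S_us @ np.linalg.solve(S_ss, x_s - mu[I_S])

# Sigma_{u|s} = Sigma_uu - Sigma_us Sigma_ss^{-1} Sigma_su (independent of x_s).
S_u_s = S_uu - S_us @ np.linalg.solve(S_ss, S_us.T)

mu = np.linspace(-1.0, 1.0, 10)       # any mean offset gives the same gap
s_i = np.array([0.3, -0.1])
s_j = np.array([0.1, 0.2])
gap = mu_cond(mu, s_i) - mu_cond(mu, s_j)
direct = S_us @ np.linalg.solve(S_ss, s_i - s_j)
```

The posterior covariance Σ_{u|s} is the Schur complement of Σ_ss in Σ, hence positive semidefinite.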
Notice that only μ_{u|s} depends on the actual value of x_s; Σ_{u|s} depends only on the indices of I_S. Being the sum of two normally distributed variables, we have that

(Z_{I_U} | X_{I_S} = x_s) =_d (X_{I_U} | X_{I_S} = x_s) + G_{I_U} = N(μ_{u|s}, Σ_{u|s} + Σ^(g)_uu).

Substituting this into the sum of divergences above:

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) )
  = Σ_{m=1}^{|I_S|} D_λ( N(s_im, σ_s²) ‖ N(s_jm, σ_s²) ) + D_λ( N(μ_{u|s_i}, Σ_{u|s} + Σ^(g)_uu) ‖ N(μ_{u|s_j}, Σ_{u|s} + Σ^(g)_uu) )   (1)
  = (λ/(2σ_s²)) Σ_{m=1}^{|I_S|} (s_im − s_jm)² + (λ/2) (μ_{u|s_i} − μ_{u|s_j})ᵀ (Σ_{u|s} + Σ^(g)_uu)^{-1} (μ_{u|s_i} − μ_{u|s_j})   (2)
  = (λ/(2σ_s²)) (s_i − s_j)ᵀ (s_i − s_j) + (λ/2) ( Σ_us Σ_ss^{-1} (s_i − s_j) )ᵀ (Σ_{u|s} + Σ^(g)_uu)^{-1} ( Σ_us Σ_ss^{-1} (s_i − s_j) )   (3)
  = (λ/(2σ_s²)) (s_i − s_j)ᵀ (s_i − s_j) + (λ/2) (s_i − s_j)ᵀ Σ_ss^{-1} Σ_su (Σ_{u|s} + Σ^(g)_uu)^{-1} Σ_us Σ_ss^{-1} (s_i − s_j)   (4)

Line (1) substitutes in the normal distributions given by our mechanism and conditional prior class. Line (2) substitutes in the closed-form expression for the Rényi divergence between two mean-shifted normal distributions given in Equation 1. Line (3) substitutes in the expression for μ_{u|s} given above, and simplifies. To expand this simplification in explicit steps:

μ_{u|s_i} − μ_{u|s_j} = ( μ_u + Σ_us Σ_ss^{-1} (s_i − μ_s) ) − ( μ_u + Σ_us Σ_ss^{-1} (s_j − μ_s) ) = Σ_us Σ_ss^{-1} s_i − Σ_us Σ_ss^{-1} s_j = Σ_us Σ_ss^{-1} (s_i − s_j).

Line (4) distributes the transpose in the right term of line (3):

( Σ_us Σ_ss^{-1} (s_i − s_j) )ᵀ = (s_i − s_j)ᵀ (Σ_us Σ_ss^{-1})ᵀ = (s_i − s_j)ᵀ (Σ_ss^{-1})ᵀ Σ_usᵀ = (s_i − s_j)ᵀ Σ_ss^{-1} Σ_su,

where the final step is a consequence of Σ being symmetric.
Σ_ss is also a symmetric matrix (so its inverse is symmetric) and Σ_usᵀ = Σ_su. Returning to line (4) above, we simplify this expression by substituting ∆ = s_i − s_j:

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) ) = (λ/(2σ_s²)) ∆ᵀ∆ + (λ/2) ∆ᵀ Σ_ss^{-1} Σ_su (Σ_{u|s} + Σ^(g)_uu)^{-1} Σ_us Σ_ss^{-1} ∆   (5)
  = (λ/(2σ_s²)) ‖∆‖² + (λ/2) ∆ᵀ Σ_eff ∆   (6)

where Σ_eff = Σ_ss^{-1} Σ_su (Σ_{u|s} + Σ^(g)_uu)^{-1} Σ_us Σ_ss^{-1}. The left term of line (6) attributes the direct loss of Z_{I_S} on X_{I_S}, and the right term attributes the inferential loss of Z_{I_U} on X_{I_S}.

We are interested in bounding the expression of line (6) for all (s_i, s_j) ∈ S_pairs. We do this by bounding it for all vectors ∆ in

D = { s_i − s_j : ‖s_i − s_j‖ ≤ √S r },

where S is the number of basic secrets (locations) contained in I_S, which may be a basic or compound secret set. For a basic secret (S = 1), this bound is tight, since D = { s_i − s_j : (s_i, s_j) ∈ S_pairs }: the set of ∆ ∈ D is exactly any two hypotheses (s_i, s_j) that are within any circle of radius r. For a compound secret, this bound is not guaranteed to be tight. Recall once again that the set of S_pairs for a compound secret is given by the set of (s_i, s_j) in

S_pairs = { ({x_s1, x_s2, ...}, {x′_s1, x′_s2, ...}) : ‖x_sk − x′_sk‖ ≤ r, ∀k }.

For concreteness, consider the 2d location trace example in
Figure 3, where we have a compound secret of $S = 3$ locations. Here, $s_i, s_j \in \mathbb{R}^6$, where 6 comes from the fact that we have three 2d locations. So, $(s_i, s_j)$ represents a pair of hypotheses on all three locations. $s_i$'s hypothesis of the first secret location, written as $x_{s_1} \in \mathbb{R}^2$ above, is within $r$ of $s_j$'s hypothesis of the first secret location, written as $x'_{s_1} \in \mathbb{R}^2$ above. The same goes for the second and third locations. So, the $L_2$ norm of $\Delta = s_i - s_j$ is no greater than

$$\sup_{(s_i, s_j) \in \mathcal{S}_{\mathrm{pairs}}} \|s_i - s_j\| = \sup_{(s_i, s_j) \in \mathcal{S}_{\mathrm{pairs}}} \sqrt{\sum_{m=1}^{6} (s_{im} - s_{jm})^2} = \sup_{(s_i, s_j) \in \mathcal{S}_{\mathrm{pairs}}} \sqrt{\sum_{k=1}^{3} \|x_{s_k} - x'_{s_k}\|^2} = \sqrt{\sum_{k=1}^{3} r^2} = \sqrt{3}\, r$$

For compound secrets, $\mathcal{D}$ represents the $L_2$ ball enclosing all $\Delta \in \{s_i - s_j : (s_i, s_j) \in \mathcal{S}_{\mathrm{pairs}}\}$. However, $\mathcal{D}$ also includes some values of $\Delta = s_i - s_j$ not covered by $\mathcal{S}_{\mathrm{pairs}}$. Suppose an adversary considers the hypotheses $s_i = \{x_{s_1}, x_{s_2}, x_{s_3}\}$ and $s_j = \{x'_{s_1}, x'_{s_2}, x'_{s_3}\}$ where $x_{s_1} = 0$, $x'_{s_1} = \sqrt{3}\, r$, $x_{s_2} = x'_{s_2}$, and $x_{s_3} = x'_{s_3}$. Since $x_{s_1}, x'_{s_1}$ are not within $r$ of each other, this pair is not in $\mathcal{S}_{\mathrm{pairs}}$. However, it is covered by $\mathcal{D}$, and thus is covered by our bound on CIP loss and our mechanisms.

With $\mathcal{D}$ defined, we may return to bounding the expression in line (6):

$$D_\lambda\!\left(\frac{P_{A,P}(Z \mid X_{I_S}=s_i)}{P_{A,P}(Z \mid X_{I_S}=s_j)}\right) \le \sup_{\Delta \in \mathcal{D}} \left( \frac{\lambda}{2\sigma_s^2}\|\Delta\|^2 + \frac{\lambda}{2}\Delta^\top \Sigma_{\mathrm{eff}}\,\Delta \right) \quad (7)$$

$$\le \frac{\lambda}{2}\left( \frac{S r^2}{\sigma_s^2} + S r^2\, \mathrm{maxeig}(\Sigma_{\mathrm{eff}}) \right) \quad (8)$$

$$= \frac{\lambda S r^2}{2} \left( \frac{1}{\sigma_s^2} + \alpha^* \right) \quad (9)$$

where line (8) distributes the supremum. For the right term, the supremum is given by the maximum squared magnitude of all $\Delta \in \mathcal{D}$ times the maximum eigenvalue of $\Sigma_{\mathrm{eff}}$, which equals $S r^2\, \mathrm{maxeig}(\Sigma_{\mathrm{eff}})$. Line (9) simply substitutes $\alpha^* = \mathrm{maxeig}(\Sigma_{\mathrm{eff}})$.
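To make the bound in line (9) concrete, the following numpy sketch computes $\Sigma_{\mathrm{eff}}$, $\alpha^*$, and the resulting CIP loss bound from a prior covariance over the trace. The function name, interface, and toy kernel values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cip_loss_bound(Sigma, I_S, Sigma_g_uu, sigma_s2, r, lam):
    """Evaluate the bound of line (9): eps <= (lam*S*r^2/2) * (1/sigma_s^2 + alpha*).

    Sigma      : prior covariance of the full trace (n x n)
    I_S        : indices of the secret locations
    Sigma_g_uu : noise covariance added to the non-secret points
    sigma_s2   : variance of the i.i.d. noise added to each secret point
    """
    n = Sigma.shape[0]
    I_U = [i for i in range(n) if i not in I_S]
    S_ss = Sigma[np.ix_(I_S, I_S)]
    S_us = Sigma[np.ix_(I_U, I_S)]
    S_uu = Sigma[np.ix_(I_U, I_U)]
    A = S_us @ np.linalg.inv(S_ss)
    # Conditional covariance of the unknown points given the secrets
    S_u_given_s = S_uu - A @ S_us.T
    # Sigma_eff = Sigma_ss^{-1} Sigma_su (Sigma_{u|s} + Sigma^{(g)}_{uu})^{-1} Sigma_us Sigma_ss^{-1}
    Sigma_eff = A.T @ np.linalg.inv(S_u_given_s + Sigma_g_uu) @ A
    alpha_star = np.linalg.eigvalsh(Sigma_eff).max()
    S = len(I_S)  # number of basic secrets
    return lam * S * r ** 2 / 2 * (1.0 / sigma_s2 + alpha_star)

# Toy example: RBF prior over a 10-point trace, one secret index.
t = np.arange(10.0)
Sigma = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 3.0 ** 2))
eps = cip_loss_bound(Sigma, [4], 0.5 * np.eye(9), sigma_s2=0.5, r=1.0, lam=2.0)
```

Increasing either noise parameter shrinks both the direct term $1/\sigma_s^2$ and $\alpha^*$, so the bound decreases monotonically in the noise.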
Graceful Composition in Time
Suppose a user releases two traces $X$ and $\hat{X}$ with additive noise $G \sim \mathcal{N}(0, \Sigma^{(g)})$ and $\hat{G} \sim \mathcal{N}(0, \hat{\Sigma}^{(g)})$, respectively. Then any basic or compound secret $X_{I_S}$ of $X$ enjoys $(\bar{\varepsilon}, \lambda)$-CIP, where

$$\bar{\varepsilon} \le \frac{\lambda S r^2}{2}\left( \frac{1}{\sigma_s^2} + \bar{\alpha}^* \right)$$

and where $\bar{\alpha}^*$ is the maximum eigenvalue of $\bar{\Sigma}_{\mathrm{eff}} = \big(\Sigma_{us}\Sigma_{ss}^{-1}\big)^\top \big(\Sigma_{u|s} + \bar{\Sigma}^{(g)}_{uu}\big)^{-1} \big(\Sigma_{us}\Sigma_{ss}^{-1}\big)$. Here $\Sigma$ is the covariance matrix of the joint distribution on $X, \hat{X}$ and

$$\bar{\Sigma}^{(g)} = \begin{bmatrix} \Sigma^{(g)} & 0 \\ 0 & \hat{\Sigma}^{(g)} \end{bmatrix}$$

Proof.
Here, we record two traces (presumably) far apart in time, $(X_1, \dots, X_n)$ and $(\hat{X}_1, \dots, \hat{X}_m)$, and release $(Z_1, \dots, Z_n) = (X_1 + G_1, \dots, X_n + G_n)$ and $(\hat{Z}_1, \dots, \hat{Z}_m) = (\hat{X}_1 + \hat{G}_1, \dots, \hat{X}_m + \hat{G}_m)$. The first trace protects secret locations $X_{I_S}$ and the second protects $\hat{X}_{I_S}$, so we have that

$$D_\lambda\!\left(\frac{P_{A,P}(Z \mid X_{I_S}=s_i)}{P_{A,P}(Z \mid X_{I_S}=s_j)}\right) \le \varepsilon \qquad D_\lambda\!\left(\frac{P_{A,P}(\hat{Z} \mid \hat{X}_{I_S}=\hat{s}_i)}{P_{A,P}(\hat{Z} \mid \hat{X}_{I_S}=\hat{s}_j)}\right) \le \hat{\varepsilon}$$

We aim to update the losses:

$$D_\lambda\!\left(\frac{P_{A,P}(Z, \hat{Z} \mid X_{I_S}=s_i)}{P_{A,P}(Z, \hat{Z} \mid X_{I_S}=s_j)}\right) \le \varepsilon' \qquad D_\lambda\!\left(\frac{P_{A,P}(\hat{Z}, Z \mid \hat{X}_{I_S}=\hat{s}_i)}{P_{A,P}(\hat{Z}, Z \mid \hat{X}_{I_S}=\hat{s}_j)}\right) \le \hat{\varepsilon}'$$

Fortunately, our framework accommodates this directly: we need only update the 'inferential loss terms' $\alpha^*$ and $\hat{\alpha}^*$, the maximum eigenvalues used to compute $\varepsilon$ and $\hat{\varepsilon}$, respectively. Let's focus on $\varepsilon'$, since the same analysis follows for $\hat{\varepsilon}'$.

Recall that $\alpha^*$ is given by the maximum eigenvalue of

$$\Sigma_{\mathrm{eff}} = \big(\Sigma_{us}\Sigma_{ss}^{-1}\big)^\top \big(\Sigma_{u|s} + \Sigma^{(g)}_{uu}\big)^{-1} \big(\Sigma_{us}\Sigma_{ss}^{-1}\big)$$

where $\Sigma$ is the covariance matrix of $X_1, \dots, X_n$ and $\Sigma^{(g)}$ is the noise covariance matrix added. Simply augment $\Sigma$ to become the joint covariance matrix $\Sigma_J$ of $X, \hat{X}$, and augment $\Sigma^{(g)}$ to become

$$\Sigma^{(g)}_J = \begin{bmatrix} \Sigma^{(g)} & 0 \\ 0 & \hat{\Sigma}^{(g)} \end{bmatrix}$$

then update $\Sigma_{\mathrm{eff}}$ to $\Sigma_{\mathrm{eff},J}$, which uses both $\Sigma_J$ and $\Sigma^{(g)}_J$. Using the corresponding maximum eigenvalue $\alpha^*_J$ in the loss expression of Theorem 3.2 gives us $\varepsilon'$.

Note that for kernels like RBF, $\varepsilon' \to \varepsilon$ as the traces $X$ and $\hat{X}$ move further and further apart in time. This is not the case for traces using a purely periodic kernel with no time decay, where we should expect much worse composition.

In many cases, the different dimensions of the trace may be probabilistically independent, and it may be more convenient to design separate privacy mechanisms for each. For a 2d trace $X$, suppose $I_x$ and $I_y$ store the indices of the latitude points $X_{I_x}$ and longitude points $X_{I_y}$, such that $X = X_{I_x} \cup X_{I_y}$. If latitude and longitude are independent, it may be more convenient to characterize the conditional priors of $X_{I_x}$ and $X_{I_y}$ separately. The question is whether privacy guarantees remain for the full trace $X$. To answer this, we provide the following corollary:

Corollary 7.2.1.
CIP loss of independent dimensions
Let $\Theta$ be a GP conditional prior class on a 2d trace $X$ such that the dimensions are independent. Let $I_S$ be some secret set of time indices corresponding to some basic or compound secret. For the trace $X = X_{I_x} \cup X_{I_y}$, the Gaussian mechanism $A(X) = Z_{I_x} \cup Z_{I_y}$, where $Z_{I_x} = A_x(X_{I_x}) = X_{I_x} + G_{I_x}$ and $Z_{I_y} = A_y(X_{I_y}) = X_{I_y} + G_{I_y}$, satisfies $(\varepsilon, \lambda)$-CIP where

$$\varepsilon \le \frac{\lambda S r^2}{2}\left( \frac{1}{\sigma_s^2} + \alpha^*_x + \alpha^*_y \right)$$

when $A_x$ and $A_y$ provide $\frac{\lambda S r^2}{2}\big( \frac{1}{\sigma_s^2} + \alpha^*_x \big)$ and $\frac{\lambda S r^2}{2}\big( \frac{1}{\sigma_s^2} + \alpha^*_y \big)$ to $I_S \cap I_x$ and $I_S \cap I_y$, respectively.

The gist of this corollary is that a mechanism can be designed to achieve the bound of Theorem 3.4 for each dimension independently, and the combined release still carries a meaningful privacy guarantee. The reason is that the per-dimension guarantees still cover all secret pairs in $\mathcal{S}_{\mathrm{pairs}}$.

Proof.
By independence, $X_{I_x}$ and $X_{I_y}$ can be treated as two unconnected traces of the type seen in Figure 1. As such, the privacy guarantee of Theorem 3.4 can be upheld for each. The question is whether bounding CIP loss for the one-dimensional basic or compound secret associated with secret sets $I_S \cap I_x$ and $I_S \cap I_y$ still provides guarantees for the full secret set $I_S$.

Without loss of generality, we demonstrate for one basic and one compound secret. Consider a basic secret set $I_S$ containing a single 2d location, where $I_S \cap I_x$ holds its latitude index and $I_S \cap I_y$ its longitude index. We again assume that independent Gaussian noise of variance $\sigma_s^2$ is added to all of $X_{I_S}$, since this is optimal under utility constraints. We have now bounded the Rényi divergence when conditioning on pairs of hypotheses on latitude and longitude separately:

$$\mathcal{S}_{\mathrm{pairs},x} = \mathcal{S}_{\mathrm{pairs},y} = \big\{\, (x_s, x'_s) : x_s \in \mathbb{R},\ \|x_s - x'_s\| \le r \,\big\}$$

By independence, this also bounds the Rényi divergence conditioning on pairs of hypotheses on latitude and longitude jointly:

$$\mathcal{S}_{\mathrm{pairs},xy} = \big\{\, (x_s, x'_s) : x_s \in \mathbb{R}^2,\ |x_{s,1} - x'_{s,1}| \le r,\ |x_{s,2} - x'_{s,2}| \le r \,\big\}$$

In effect, we have guaranteed privacy for any pair of hypotheses $(s_i, s_j)$ in the square circumscribing the circle of radius $r$ that we wish to provide. The analysis of the direct privacy loss is exactly the same as in the more general case. Since the Rényi divergences of $X_{I_U} \cap X_{I_x}$ and of $X_{I_U} \cap X_{I_y}$ add, the $\alpha^*$'s add.

The same goes for a compound secret. Consider a three-location compound secret with pairs given by

$$\mathcal{S}_{\mathrm{pairs},xy} = \Big\{ \big(\{x_{s_1}, x_{s_2}, \dots\},\ \{x'_{s_1}, x'_{s_2}, \dots\}\big) : x_{s_i} \in \mathbb{R}^2,\ \|x_{s_k} - x'_{s_k}\| \le r,\ \forall k \Big\}$$

Instead, we bound the privacy loss for

$$\mathcal{S}_{\mathrm{pairs},x} = \mathcal{S}_{\mathrm{pairs},y} = \Big\{ \big(\{x_{s_1}, x_{s_2}, \dots\},\ \{x'_{s_1}, x'_{s_2}, \dots\}\big) : x_{s_i} \in \mathbb{R},\ \|x_{s_k} - x'_{s_k}\| \le r,\ \forall k \Big\}$$

separately, giving us $\alpha^*_x$ and $\alpha^*_y$. This again includes any two hypotheses on the three locations such that each pair $x_{s_k}, x'_{s_k}$ is within a square circumscribing a circle of radius $r$. We achieve this by bounding privacy loss for all $\Delta_x$ in a 3d $L_2$ ball of radius $\sqrt{S}\, r$, and likewise for $\Delta_y$.

This corollary can be extended to traces of any dimension whose dimensions are probabilistically independent. We make use of the above proof in the Experiments section.

In this section, we derive the three SDP-based algorithms of Section 4 and their properties.

A. SDP A

SDP A minimizes the privacy loss bound of Theorem 3.4 for any compound or basic secret encoded by secret set $I_S$. As is clarified in its proof (Appendix 7.2.4), the bound is tight when $I_S$ encodes a basic secret. If $I_S$ encodes a compound secret, the tightness depends on the conditional prior class $\Theta$.

Our variable for minimizing this bound is the noise covariance matrix $\Sigma^{(g)}$. Due to the conditional independence exhibited by Lemma 3.2, $G_{I_S}$ and $G_{I_U}$ may be independent. The additive noise components $G_i \in G_{I_S}$ are all independent Gaussian with variance $\sigma_s^2$. This is because, conditioning on $\{X_{I_S} = x_s\}$, $Z_{I_S}$ is independent of $X_{I_U}$ and $Z_{I_U}$. So, $G_{I_S} \sim \mathcal{N}(0, \sigma_s^2 I)$ and $\Sigma^{(g)}_{ss} = \sigma_s^2 I$. The additive noise components $G_i \in G_{I_U}$ are dependent as described by $\Sigma^{(g)}_{uu}$, with $G_{I_U} \sim \mathcal{N}(0, \Sigma^{(g)}_{uu})$. Consequently, $\Sigma^{(g)}$ is completely characterized by $\Sigma^{(g)}_{uu}$ and $\sigma_s^2$.

To see how the bound of Theorem 3.4 can be redrafted as an SDP, first notice that its two terms may be written as the maximum eigenvalue of a matrix product.
Here, $\Sigma_{\mathrm{eff}} = A^\top B A$, where $A = \Sigma_{us}\Sigma_{ss}^{-1}$ and $B = \big(\Sigma_{u|s} + \Sigma^{(g)}_{uu}\big)^{-1}$:

$$\frac{1}{\sigma_s^2} + \alpha^* = \mathrm{maxeig}\Big( \frac{1}{\sigma_s^2} I + A^\top B A \Big) = \mathrm{maxeig}\left( \begin{bmatrix} I & A^\top \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_s^2} I & 0 \\ 0 & B \end{bmatrix} \begin{bmatrix} I \\ A \end{bmatrix} \right) = \mathrm{maxeig}\big( \tilde{A}^\top \tilde{B} \tilde{A} \big)$$

This expression uses all parameters of $\Sigma^{(g)}$: $\sigma_s^2$ parametrizes $\Sigma^{(g)}_{ss}$, and $\Sigma^{(g)}_{uu} = B^{-1} - \Sigma_{u|s}$, where $\Sigma_{u|s}$ is given by the kernel function of $\Theta$.

Before casting this as an SDP, we provide a formal definition from Vandenberghe & Boyd (1996):

Definition 7.1.
Semidefinite Program
The problem of minimizing a linear function of a variable $x \in \mathbb{R}^n$ subject to a matrix inequality,

$$\min_{x \in \mathbb{R}^n} c^\top x \quad \text{s.t.} \quad F_0 + \sum_{i=1}^n x_i F_i \succeq 0, \qquad Ax = b$$

where the $F_i \in \mathbb{R}^{n \times n}$ are all symmetric and $A \in \mathbb{R}^{p \times n}$, is a semidefinite program, or SDP.

The task of minimizing $\mathrm{maxeig}\big(\tilde{A}^\top \tilde{B} \tilde{A}\big)$ under MSE constraints can almost be formulated as an SDP:

$$\min_{B \succeq 0,\ 1/\sigma_s^2 \ge 0} \beta^* \quad \text{s.t.} \quad \beta^* I \succeq \tilde{A}^\top \tilde{B} \tilde{A}, \qquad B \preceq \Sigma_{u|s}^{-1}, \qquad \mathrm{tr}\big(\Sigma^{(g)}_{uu}\big) + |I_S|\, \sigma_s^2 \le n\, o_t$$

Here, the first constraint guarantees that the maximum eigenvalue of $\tilde{A}^\top \tilde{B} \tilde{A}$ is bounded by $\beta^*$, which the objective minimizes. At program completion, we set $\Sigma^{(g)}_{uu} = B^{-1} - \Sigma_{u|s}$, and the second constraint ensures that this is still PSD. The final constraint bounds the MSE of the mechanism $\Sigma^{(g)}$. Note that $\mathrm{tr}\big(\Sigma^{(g)}_{uu}\big) + |I_S|\, \sigma_s^2 = \mathrm{tr}\big(\Sigma^{(g)}\big)$. The trouble lies in the last constraint: our program variable is $B$, but the final linear constraint requires $\Sigma^{(g)}$, which is expressed using the inverse of $B$. This is not immediately available in the SDP framework.

To make the final linear constraint available, we invert the above program using the observation that the maximum eigenvalue of $\tilde{A}^\top \tilde{B} \tilde{A}$ is the inverse of the minimum eigenvalue of $\big(\tilde{A}^\top \tilde{B} \tilde{A}\big)^{-1}$. Instead of optimizing over $B$ and $1/\sigma_s^2$, we optimize over $B^{-1}$ and $\sigma_s^2$. Since $B^{-1} = \Sigma_{u|s} + \Sigma^{(g)}_{uu}$, we may now place a utility constraint directly on the trace of $\Sigma^{(g)}$. To make $B^{-1}$ our program variable, we approximate $\big(\tilde{A}^\top \tilde{B} \tilde{A}\big)^{-1}$ with $\tilde{A}^- \tilde{B}^{-1} \tilde{A}^{-\top}$. First note that $\tilde{A} \in \mathbb{R}^{n \times |I_S|}$ has full column rank for the covariances we work with. So, $\tilde{A}^- = \big(\tilde{A}^\top \tilde{A}\big)^{-1} \tilde{A}^\top \in \mathbb{R}^{|I_S| \times n}$ is the left inverse of $\tilde{A}$, satisfying $\tilde{A}^- \tilde{A} = I$ (we denote its transpose as $\tilde{A}^{-\top}$).
It is also the least-squares solution to $\tilde{A}\tilde{A}^- = I$ (equivalently, $\tilde{A}^{-\top}\tilde{A}^\top = I$). Thus, we have an approximation of the inverse $\big(\tilde{A}^\top \tilde{B} \tilde{A}\big)^{-1}$:

$$\big(\tilde{A}^\top \tilde{B} \tilde{A}\big)\big(\tilde{A}^- \tilde{B}^{-1} \tilde{A}^{-\top}\big) \approx \tilde{A}^\top \tilde{B} \tilde{B}^{-1} \tilde{A}^{-\top} = \tilde{A}^\top \tilde{A}^{-\top} \approx I$$

We can now optimize in terms of $B^{-1}$ with the augmented matrix

$$\tilde{B}^{-1} = \begin{bmatrix} \sigma_s^2 I & 0 \\ 0 & B^{-1} \end{bmatrix}$$

We then optimize the following SDP:

$$\max_{B^{-1} \succeq 0,\ \sigma_s^2 \ge 0} \beta^* \quad \text{s.t.} \quad \beta^* I \preceq \tilde{A}^- \tilde{B}^{-1} \tilde{A}^{-\top}, \qquad B^{-1} \succeq \Sigma_{u|s}, \qquad \mathrm{tr}\big(\tilde{B}^{-1}\big) - \mathrm{tr}\big(\Sigma_{u|s}\big) \le n\, o_t$$

Upon program completion we recover $\sigma_s^2$ and $\Sigma^{(g)}_{uu} = B^{-1} - \Sigma_{u|s}$, which we know is PSD due to the second constraint. The first constraint guarantees that the minimum eigenvalue of the approximated inverse is at least $\beta^*$, which the objective maximizes. If the minimum eigenvalue of the approximate inverse is close to that of the true inverse, then we successfully minimize the maximum eigenvalue of $\tilde{A}^\top \tilde{B} \tilde{A}$, and thus minimize the direct and indirect privacy loss. The third constraint limits the MSE of $\Sigma^{(g)}$, since $\mathrm{tr}\big(\tilde{B}^{-1}\big) - \mathrm{tr}\big(\Sigma_{u|s}\big) = \big(\mathrm{tr}(\Sigma^{(g)}_{uu}) + |I_S|\,\sigma_s^2 + \mathrm{tr}(\Sigma_{u|s})\big) - \mathrm{tr}\big(\Sigma_{u|s}\big) = \mathrm{tr}\big(\Sigma^{(g)}\big)$. By inverting $\tilde{A}^\top \tilde{B} \tilde{A}$, this constraint becomes available in the SDP framework.

By expressing the above program in terms of the variable $\Sigma^{(g)}$ instead of indirectly via $B^{-1}$ and $\sigma_s^2$, we get SDP A:

$$\text{SDP A:} \quad \arg\max_{\Sigma^{(g)} \succeq 0} \beta^* \quad \text{s.t.} \quad \tilde{A}^- \tilde{B}^{-1} \tilde{A}^{-\top} \succeq \beta^* I, \qquad \mathrm{tr}\big(\Sigma^{(g)}\big) \le n\, o_t$$

It is straightforward to write this SDP in the form seen in Definition 7.1. The program variables $x$ would be the diagonal and upper (or lower) triangular part of $\Sigma^{(g)}$ along with $\beta^*$. With some linear algebra, the first constraint can be written in the form $F_0 + \sum_{i=1}^n x_i F_i \succeq 0$, and the second constraint can be written as $Ax = b$.
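As a sanity check on the inversion step, the following numpy sketch builds $\tilde{A}$ and its left inverse $\tilde{A}^-$, then compares $\mathrm{maxeig}\big(\tilde{A}^\top\tilde{B}\tilde{A}\big)$ against the reciprocal of $\mathrm{mineig}\big(\tilde{A}^-\tilde{B}^{-1}\tilde{A}^{-\top}\big)$. The kernel, secret indices, and noise values are arbitrary assumptions chosen for illustration.

```python
import numpy as np

# Toy setup: RBF prior on an 8-point trace with two secret indices (assumed values).
n, I_S = 8, [2, 5]
I_U = [i for i in range(n) if i not in I_S]
t = np.arange(float(n))
Sigma = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 2.0 ** 2))
S_ss = Sigma[np.ix_(I_S, I_S)]
S_us = Sigma[np.ix_(I_U, I_S)]
S_uu = Sigma[np.ix_(I_U, I_U)]

A = S_us @ np.linalg.inv(S_ss)                  # |I_U| x |I_S|
S_u_s = S_uu - A @ S_us.T                       # Sigma_{u|s}
sigma_s2 = 0.4                                  # example secret-noise variance
Sg_uu = 0.4 * np.eye(len(I_U))                  # example noise on I_U
B = np.linalg.inv(S_u_s + Sg_uu)                # B = (Sigma_{u|s} + Sigma^{(g)}_{uu})^{-1}

k = len(I_S)
A_tilde = np.vstack([np.eye(k), A])             # n x |I_S|, full column rank
A_left = np.linalg.solve(A_tilde.T @ A_tilde, A_tilde.T)  # left inverse of A_tilde

B_tilde = np.block([[np.eye(k) / sigma_s2, np.zeros((k, n - k))],
                    [np.zeros((n - k, k)), B]])
B_tilde_inv = np.block([[sigma_s2 * np.eye(k), np.zeros((k, n - k))],
                        [np.zeros((n - k, k)), np.linalg.inv(B)]])

exact = np.linalg.eigvalsh(A_tilde.T @ B_tilde @ A_tilde).max()
approx = 1.0 / np.linalg.eigvalsh(A_left @ B_tilde_inv @ A_left.T).min()
print(exact, approx)  # the two agree to the extent the left-inverse approximation holds
```

The identity $\tilde{A}^-\tilde{A} = I$ holds exactly, while $\tilde{A}\tilde{A}^- \approx I$ only in the least-squares sense, which is where the approximation error enters.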
With the use of contemporary convex programming tools like CVXOPT (Vandenberghe, 2010), rewriting into this form is unnecessary.

B. SDP B

SDP B takes a set of covariance matrices $F = \{\Sigma_1, \dots, \Sigma_k\}$, each of which is designed to protect some secret set $I_{S_i}$, and returns a covariance matrix $\Sigma^{(g)}$ that preserves the privacy loss bound of each $\Sigma_i$ for its $I_{S_i}$. It does so while minimizing the utility loss of $\Sigma^{(g)}$. This algorithm is also expressed as an SDP. It is based on the following corollary, which we have omitted from the main text:

Corollary 7.2.2.
More PSD, More Private:
For a basic or compound secret denoted by indices $I_S$, the CIP loss bound of Equation 5 provided by a Gaussian noise mechanism with covariance $\Sigma^{(g)}$ is lower than it would be for any $\Sigma^{(g)\prime} \prec \Sigma^{(g)}$.

Proof. First note that if $\Sigma^{(g)} \succ \Sigma^{(g)\prime}$, then the same is true for its sub-matrices:

$$\Sigma^{(g)}_{ss} \succ \Sigma^{(g)\prime}_{ss} \qquad \Sigma^{(g)}_{uu} \succ \Sigma^{(g)\prime}_{uu}$$

Recall the privacy loss bound of Equation 5:

$$\varepsilon \le \frac{\lambda S r^2}{2}\left( \frac{1}{\sigma_s^2} + \alpha^* \right)$$

Also recall that $\Sigma^{(g)}_{ss} = \sigma_s^2 I$ and $\Sigma^{(g)\prime}_{ss} = \sigma_s^{\prime 2} I$. Since $\Sigma^{(g)}_{ss} \succ \Sigma^{(g)\prime}_{ss}$, we already know that $\sigma_s^2 > \sigma_s^{\prime 2}$, and thus the first term of Equation 5 is lower for $\Sigma^{(g)}$.

It remains to show that the second term is also lower, $\alpha^* < \alpha^{*\prime}$. Starting with what we're given:

$$\Sigma^{(g)}_{uu} \succ \Sigma^{(g)\prime}_{uu}$$
$$\Sigma^{(g)}_{uu} + \Sigma_{u|s} \succ \Sigma^{(g)\prime}_{uu} + \Sigma_{u|s}$$
$$\big(\Sigma^{(g)}_{uu} + \Sigma_{u|s}\big)^{-1} \prec \big(\Sigma^{(g)\prime}_{uu} + \Sigma_{u|s}\big)^{-1}$$
$$B \prec B'$$
$$A^\top B A \prec A^\top B' A$$
$$\mathrm{maxeig}\big(A^\top B A\big) < \mathrm{maxeig}\big(A^\top B' A\big)$$
$$\alpha^* < \alpha^{*\prime}$$

Therefore $\frac{1}{\sigma_s^2} + \alpha^* < \frac{1}{\sigma_s^{\prime 2}} + \alpha^{*\prime}$, and the CIP bound of Equation 5 is lower for $\Sigma^{(g)}$ than it is for $\Sigma^{(g)\prime}$.

With Corollary 7.2.2 in mind, SDP B is natural:

$$\text{SDP B:} \quad \arg\min_{\Sigma^{(g)}} \mathrm{tr}\big(\Sigma^{(g)}\big) \quad \text{s.t.} \quad \Sigma^{(g)} \succeq \Sigma^{(g)}_i, \ \forall\, \Sigma^{(g)}_i \in F$$

SDP B minimizes, but does not constrain, the utility loss of the chosen $\Sigma^{(g)}$. To provide an upper bound on the resulting utility loss, we provided the following claim in the main text:

Claim
Utility loss of SDP B: The utility loss of $\Sigma^{(g)} = \text{SDP B}(F)$ is no greater than $\sum_{\Sigma_i \in F} \mathrm{tr}(\Sigma_i)$.

Proof. The covariance $\Sigma^{(g)\prime} = \sum_{\Sigma^{(g)}_i \in F} \Sigma^{(g)}_i$, with MSE $\sum_{\Sigma^{(g)}_i \in F} \mathrm{tr}\big(\Sigma^{(g)}_i\big)$, is in the feasible set of the SDP B problem, since $\Sigma^{(g)\prime} \succeq \Sigma^{(g)}_i,\ \forall\, \Sigma^{(g)}_i \in F$. Unless $\Sigma^{(g)\prime}$ has the lowest MSE of all $\Sigma^{(g)}$ in the feasible set, a covariance matrix with better utility will be chosen.

Multiple Secrets combines SDP A and SDP B to minimize the privacy loss to each basic secret within a trace. The basic mechanism is useful in cases where the inference at each time within the trace (each basic secret) is sensitive. Let $I_{S_i}$ be the secret set representing basic secret $i$, of which there are $N$ (e.g. if location is sampled at $N$ times). Then $I_{S_b} = \{I_{S_1}, \dots, I_{S_N}\}$ contains the indices corresponding to each. Multiple Secrets works by first producing $N$ covariance matrices, $\Sigma^{(g)}_i = \text{SDP A}(I_{S_i}, \Sigma, o_t)$, one per basic secret. It then uses $\text{SDP B}(F = \{\Sigma^{(g)}_1, \dots, \Sigma^{(g)}_N\})$ to produce a single covariance matrix $\Sigma^{(g)}$ that preserves the privacy loss bound for each basic secret (note that, being basic secrets, the privacy loss bound that SDP A optimizes is tight).

By virtue of using SDP B, the MSE of the resultant $\Sigma^{(g)}$ is minimized but not constrained. To bound the MSE of the Basic Mechanism by $O$, we may simply bound the MSE of each $\Sigma^{(g)}_i$ by $o_t = O/N$. Then, by the above Claim, the MSE of the solution cannot be greater than $O$. In practice, this bound may be too loose. We hope to tighten it in future work.

We use a 2d location trace dataset and a 1d home temperature dataset. For the location data, having observed that the correlation between latitude and longitude is low, we treat each dimension as independent.
By way of Corollary 7.2.1, this allows us to bound privacy loss and design mechanisms for each dimension separately. Furthermore, having observed that each dimension fits nearly the same conditional prior, we treat our dataset of 10k 2-dimensional traces as a dataset of 20k 1-dimensional traces, where each trace represents one dimension of a 2d location trajectory.

The one-dimensional traces of temperature and location are indexed by timestamps, for which we use the following kernel functions to determine the covariance between two points sampled at times $t_i$ and $t_j$:

$$k_{\mathrm{RBF}}(t_i, t_j) = \sigma_x^2 \exp\!\Big( -\frac{(t_i - t_j)^2}{2 l^2} \Big) \qquad k_{\mathrm{PER}}(t_i, t_j) = \sigma_x^2 \exp\!\Big( -\frac{2 \sin^2(\pi |t_i - t_j| / p)}{l^2} \Big) \quad (6)$$

The parameters include the variance $\sigma_x^2$ and the length scale $l$. The length scale determines the window of time in which two sampled points are highly correlated.

Preprocessing of location data
We first limit the dataset to traces of under 50 locations that are between 4.5 and 5.5 minutes in duration. Caring only about the conditional dependence between locations, we then de-mean each trace and normalize its variance to one. Normalizing the variance of traces implicitly sets $\sigma_x = 1$ in the above RBF kernel, in essence assuming that the adversary has a decent prior for the user's average speed in a given trace and could perform the same operation.

Fitting of location data
We then find the maximum likelihood RBF kernel for each distinct trace. Having fixed the variance $\sigma_x$, this amounts to fitting only the length scale for each dimension, $l_x$ and $l_y$, individually. The length scale represents the average window of time during which neighboring locations are highly correlated. Relatively smooth traces will have large length scales and chaotic traces will have small length scales. However, the fact that sampling rates vary significantly between traces means that traces with equal length scales can have very different degrees of correlation. To encapsulate both of these effects, we study the empirical distribution of the effective length scale of each trace,

$$l_{\mathrm{eff},x} = \frac{l_x}{P} \qquad l_{\mathrm{eff},y} = \frac{l_y}{P}$$

where $P$ is the trace's sampling period and $l_x, l_y$ are its optimal length scales. $l_{\mathrm{eff},x}$ and $l_{\mathrm{eff},y}$ tell us the average number of neighboring locations that are highly correlated, rather than a time period. For instance, a trace with $l_{\mathrm{eff},x} = 8$ tells us that roughly every eight neighboring location samples in the $x$ dimension are highly correlated. The empirical distribution of effective length scales across all traces describes, over a range of logging devices (sampling rates), users, and movement patterns, how many neighboring points are highly correlated in location trace data. After this preprocessing, we are able to use kernels that take indices (not time) as arguments:

$$k_{\mathrm{RBF}}(i, j) = \exp\!\Big( -\frac{(i - j)^2}{2\, l_{\mathrm{eff}}^2} \Big) \qquad k_{\mathrm{PER}}(i, j) = \exp\!\Big( -\frac{2 \sin^2(\pi |i - j| / p)}{l_{\mathrm{eff}}^2} \Big)$$

In each plot we then observe a spectrum of conditional priors by sweeping the effective length scale and plotting posterior uncertainty for various noise mechanisms of equal utility loss. This ranges from a prior assuming nearly independent location samples (a chaotic trace) on the left up to highly dependent location samples (traveling in a straight line or standing still) on the right.
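The conversion from a fitted length scale to an effective length scale, and the resulting index-based kernel, can be sketched as follows. The particular values of $l_x$ and $P$ are hypothetical, chosen only to illustrate the conversion.

```python
import numpy as np

def rbf_index_kernel(n, l_eff):
    """RBF kernel over sample indices: k(i, j) = exp(-(i - j)^2 / (2 * l_eff^2))."""
    idx = np.arange(n)
    d2 = (idx[:, None] - idx[None, :]) ** 2
    return np.exp(-d2 / (2 * l_eff ** 2))

# Hypothetical trace: length scale fitted in seconds, sampled every P seconds.
l_x, P = 12.0, 1.5
l_eff_x = l_x / P            # roughly how many neighboring samples are highly correlated
K = rbf_index_kernel(50, l_eff_x)
```

Because correlation now decays with index distance at a rate set by $l_{\mathrm{eff}}$ alone, traces recorded at different sampling rates become directly comparable.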
To understand how realistic these conditional prior parameters are, we display the middle 50% of the empirical distribution of $l_{\mathrm{eff}}$ ($x$ and $y$ together) from the GeoLife dataset. Note that the distributions of $l_{\mathrm{eff},x}$ and $l_{\mathrm{eff},y}$ are nearly identical.

To compute posterior uncertainty, we consider a 50-point one-dimensional location trace. The basic secret is a single index in the middle of the trace, and the compound secret consists of two neighboring indices, also in the middle of the trace. For each value of $l_{\mathrm{eff}}$, we compute the $50 \times 50$ conditional prior covariance matrix $\Sigma$ using the RBF kernel above. We then compare the posterior uncertainty when $\Sigma^{(g)}$ is an Approach C baseline, or an optimized covariance matrix produced by one of the three algorithms. We re-optimize $\Sigma^{(g)}$ for each $l_{\mathrm{eff}}$, since each $l_{\mathrm{eff}}$ represents a different conditional prior class. The MSE is fixed in all figures except the two exhibiting "All Basic Secrets", where SDP B is used. Recall that this algorithm minimizes utility loss while maintaining a series of privacy guarantees. Here, the MSE is identical across mechanisms for each $l_{\mathrm{eff}}$, but changes from one $l_{\mathrm{eff}}$ to another.

For the temperature data, our preprocessing steps were nearly identical, except that we use the periodic kernel instead of the RBF kernel, and we did not need to remove any traces from the dataset, as the data was much cleaner.

Computation of Posterior Uncertainty Interval
Each of the plots in