Location Trace Privacy Under Conditional Priors
Casey Meehan (UC San Diego) [email protected]
Kamalika Chaudhuri (UC San Diego) [email protected]
Abstract
Providing meaningful privacy to users of location based services is particularly challenging when multiple locations are revealed in a short period of time. This is primarily due to the tremendous degree of dependence that can be anticipated between points. We propose a Rényi divergence based privacy framework for bounding expected privacy loss for conditionally dependent data. Additionally, we demonstrate an algorithm for achieving this privacy under Gaussian process conditional priors. This framework both exemplifies why conditionally dependent data is so challenging to protect and offers a strategy for preserving privacy to within a fixed radius for sensitive locations in a user's trace.
1 Introduction

Location data is acutely sensitive information, detailing where we live, work, eat, shop, worship, and often when, too. Yet increasingly, location data is being uploaded for smartphone services such as ride hailing and weather forecasting and then being brokered in a thriving user location aftermarket to advertisers and even investors (Valentino-DeVryes, 2018). Users share location 'traces' when they release a sequence of locations, often across a short period of time. These traces are then used by central servers to monitor traffic trends, track individual fitness, target marketing, and even to study the effectiveness of social-distancing ordinances (Fowler, 2020). Here, we aim to provide a local privacy guarantee, wherein traces are sanitized at the user level before being transmitted to a centralized service. Note that this requires different guarantees and mechanisms
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130. Copyright 2021 by the author(s).

than in aggregate applications making queries on large location trace databases.

Specifically, we guarantee a radius r of privacy at any sensitive time point or combination of time points within a given trace. This is challenging due to the fact that the locations within traces are highly inter-dependent. Informally, traces tend to follow relatively smooth trajectories in time. If not sanitized carefully, that knowledge alone may be exploited to infer actual locations from the released version of the trace. This work centers on designing meaningful privacy definitions and corresponding mechanisms that take this dependence into account.

Broadly speaking, the vast majority of prior work on rigorous data privacy can be divided into two classes that differ by the kind of guarantee offered: differential and inferential privacy. Differential privacy (DP) guarantees that the participation of a single person in a dataset does not change the probability of any outcome by much. In contrast, inferential privacy guarantees that an adversary who has a certain degree of prior knowledge cannot make certain sensitive inferences.

DP for releasing aggregate statistics of a spatio-temporal dataset has been well studied (Fan et al., 2013; Cao et al., 2017; Yang et al., 2015). There, the idea is to add enough noise to released statistics such that the effect of any user's participation is obscured, even if their locations are highly correlated to each other or to those of other users. Here, such a guarantee does not apply since we aim to release a sanitized version of a single user's trace.

In this local case we cannot rule out the possibility that the data curator knows who each individual is and who participated. Instead, we want to guarantee that event level information about each trace remains private.
In this work, at any sensitive time t we mask whether the user visited location A or location B for any A, B less than r apart. Without ad hoc modifications, standard DP tools are insufficient for achieving this for the primary reasons that 1) the domain of location is virtually unbounded and 2) locations are highly dependent across a short period of time. To see this, consider the following instinctual approaches to achieving location trace privacy.

Approach A: apply Local Differential Privacy (LDP) to each trace. Imagine a dataset of traces, each from a separate individual. Applying LDP implies that every trace has nearly the same probability of releasing the same sanitized version. This would be robust to arbitrary side information about dependence between locations in any one trace. Unfortunately, the amount of additive noise needed to achieve this would destroy nearly all utility: sanitized traces from California would have almost the same probability of showing up in Connecticut as do those from New York. Even if we constrained the domain to just Manhattan, this definition would not permit enough utility to perform e.g. traffic monitoring.
Approach B: apply LDP to each location within a trace. To preserve some utility, imagine a single trace as a dataset of n locations, each of which enjoys ε-LDP guarantees. This alone is not robust to arbitrary dependence between locations. By the logic of group LDP, it does satisfy kε-LDP regardless of the dependence between any k locations. This approach has two setbacks. First, how to set k is unclear. Technically, all points in the trace are correlated, so to ward off worst-case correlations one might set it to the length of the trace, which is identical to Approach A. Second, even if location is bounded to a single city or county, satisfying this definition would still destroy nearly all utility. We cannot use sanitized traces for traffic monitoring if locations from either side of town have about the same probability of being sanitized to the same value.

Approach C: apply LDP guarantees to each location within a trace, but only within any region less than width r. This definition is known as Geo-Indistinguishability (GI) (Andrés et al., 2012). GI provides a substitute for restricting the domain of location, allowing us to salvage some utility. Here, only locations within r of each other are required to have ε-LDP guarantees. In DP parlance, we might say that 'neighboring traces' have one location altered by ≤ r and are identical everywhere else. This gives us the guarantee we want for a trace with one location, but not with more than one location. To see why, compare with Approach B. Analogously, (ε, r)-GI along a trace provides (kε, r)-GI to any subset of k locations. Like Approach B, setting k is unclear. Yet unlike Approach B, GI is not resistant to arbitrary dependence between any k locations. Any dependence where a change in one or more location(s) by r implies a change in some other location(s) by ≥ r breaks the GI guarantee. Even with the simplest models of dependence (e.g.
if we know the true trace ought to move in a straight line) this is a problem.

To reiterate, applying LDP to traces or to locations within traces (Approaches A & B) does not provide a principled method for meaningful privacy with reasonable utility. GI adapts LDP by giving guarantees only within a radius r. But in relaxing LDP, GI compromises the standard DP tools for handling obvious dependences between data-points like group DP. In our eyes, this warrants an inferentially private approach. Here, we continue to provide privacy within a radius r, thus allowing for utility. Yet instead of providing resistance to arbitrary dependence across any k locations, we aim to provide resistance to natural models of dependence between all locations. One may view such models as an adversary's prior beliefs about what traces are likely, like the straight-line prior mentioned earlier.

In contrast with differential privacy, providing inferential privacy guarantees is more complex, and has been less studied. It is however appropriate for applications such as ours, where information must be released based on a single person's data, the features of which are private and dependent. Kifer & Machanavajjhala (2014) provide a formal inferential privacy framework called Pufferfish, and design mechanisms for specific Pufferfish instances. As these instances do not apply to our setting, we adapt the Pufferfish framework to location privacy and more broadly to releasing any sequence of real-valued private information.

Contributions:
In this work, we propose an inferentially private approach to guaranteeing a radius r of privacy for sensitive points in location traces in three parts:

• First, we propose an adaptable privacy framework tailored to sequences of highly dependent data-points that adapts Pufferfish privacy (Kifer & Machanavajjhala, 2014) to use Rényi Differential Privacy (RDP) (Mironov, 2017). Given a model of dependence between points, this framework more appropriately estimates the risk of inference within radius r on points of interest than do vanilla LDP approaches.

• We then demonstrate how to implement our framework for the highly flexible and expressive setting of Gaussian process (GP) priors. These nonparametric models capture the spatiotemporal aspect of location data (Liang & Haas, 1999; Liu et al., 1998; Chen et al., 2015). GPs have a natural synergy with Rényi privacy enabling an interpretable upper bound on privacy loss for additive Gaussian privacy mechanisms (that add Gaussian noise to each point). Using this, we design a semidefinite program (SDP) that optimizes the correlation of such mechanisms to minimize privacy loss without destroying utility, efficiently thwarting the inference of sensitive locations.

• Finally, we provide experiments on both location trace and home temperature data to demonstrate the advantage of these techniques over Approach C mechanisms like GI. We find that our mechanisms successfully obscure sensitive locations while respecting utility constraints, even when the prior model is misspecified.

Figure 1: (a) An example graphical model of a four point trace X. (b) The more general grouped version of the model in (a), with the secret set X_{I_S} and the remaining set X_{I_U}.

Ultimately, by resisting only reasonable kinds of dependence in the data we are able to offer both meaningful privacy and utility.
We show that our framework is robust to misspecification of this reasonable dependence and offers a privacy loss that is both tractable and interpretable.

A user transmits a sequence of n real-valued random variables X = {X_1, X_2, . . . , X_n}. A trace of 10 2d locations has n = 2 ×
10 = 20 random variables X_i. Instead of releasing the raw trace X, the user releases a private version Z = {Z_1, Z_2, . . . , Z_n}, by way of an additive noise mechanism Z = X + G, where G = {G_1, G_2, . . . , G_n} is random noise produced by a privacy mechanism.

An adversary, receiving the obscured trace Z, then reasons about the true locations at some sensitive time(s). To reference the sensitive times, we use index set I_S. If the sensitive indices are I_S = {1, 2}, the corresponding location values are X_{I_S} = {X_1, X_2} (e.g. referring to the two coordinates of one location). When inferring the true value of X_{I_S}, the adversary makes use of the remaining points in the trace at indices I_U = [n] \ I_S, denoted X_{I_U}, with obscured values Z_{I_U}. This separation of points into X_{I_S} and X_{I_U} is represented in Figure 1. We use location as a guiding example, but such inter-dependent traces X could take the form of home temperature time series data or spatial data like 3D facial maps used for identification. Going forward, we will continue to denote X = {X_1, X_2, . . . , X_n} with the understanding that any subsequence of d points, e.g. X_{I_S} = {X_1, . . . , X_d}, could represent a d-dimensional sensitive value, or N·d points could represent
N d-dimensional sensitive values.

For the real-valued distributions considered here, P_P(·) refers to a density of distribution P on r.v. · and P_P(·|∗) is its regular conditional density given ∗.

GI limits what can be inferred about the sensitive X_{I_S} from its corresponding Z_{I_S}, but not from the remaining locations Z_{I_U}. To do so we need a privacy definition that specifies what events of random variable X_{I_S} we wish to obscure, which realistic priors of inter-dependence to protect against, and a privacy loss. We borrow heavily from the Pufferfish framework (Kifer & Machanavajjhala, 2014), and specialize it for the setting of location traces. We define our own set of secrets — the collection of events we wish to obscure — and discriminative pairs, the pairs of secret events we do not want an adversary to tell between.
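The notation above can be sketched in a few lines of numpy; all numbers here are illustrative (the noise is a placeholder, not a mechanism from later sections):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20                       # e.g. 10 time steps x 2 coordinates, unrolled
X = rng.standard_normal(n)   # stand-in for a true (unrolled) trace

I_S = [2, 3]                                   # hypothetical sensitive indices
I_U = [i for i in range(n) if i not in I_S]    # I_U = [n] \ I_S

# Additive-noise release: Z = X + G
G = rng.standard_normal(n)   # placeholder i.i.d. noise; later sections shape its covariance
Z = X + G

X_S = X[I_S]                 # what the adversary wants to infer
Z_U = Z[I_U]                 # released values it can also exploit
```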
Basic Secrets & Pairs
After releasing Z, we do not want an adversary with a reasonable prior on X, P ∈ Θ, to have sharp posterior beliefs about the user's location at some sensitive time (e.g. one of the sensitive times in Figure 3 of Appendix 7.1). As such, the adversary cannot distinguish whether the user visited location A or some nearby location B at that time. Let x_s ∈ R^{|I_S|} represent a possible assignment to X_{I_S}, hypothesizing the true sensitive location. Any such assignment is a secret, S = {X_{I_S} = x_s : x_s ∈ R^{|I_S|}}. Specifically, we want the posterior probability of any two assignments to X_{I_S} within a radius r to be close: S_pairs = {(x_s, x′_s) : ‖x_s − x′_s‖ ≤ r}. This protects a single time within a trace of locations. More generally, in the context of spatiotemporal data of any dimension, we call this a basic secret.

Compound Secrets & Pairs
Suppose we havethree sensitive times (again as in
Figure 3). A mechanism that blocks inference on each of these separately does not prevent inference on the combination of them simultaneously. To obscure hypotheses on all three of these, we modify our set of secrets to any combination of assignments to each secret location:

S = { {X_{I_{S1}} = x_{s1}} ∩ {X_{I_{S2}} = x_{s2}} ∩ {X_{I_{S3}} = x_{s3}} : x_{si} ∈ R^{|I_{Si}|}, i ∈ [3] }.

Now, the set of discriminative pairs is any two assignments to all three secret locations:

S_pairs = { ({x_{s1}, x_{s2}, x_{s3}}, {x′_{s1}, x′_{s2}, x′_{s3}}) : ‖x_{si} − x′_{si}‖ ≤ r, i ∈ [3] }.

This protects against compound hypotheses: if daycare and work are within r of each other, this keeps an adversary from inferring X_{I_{S1}} = 'daycare' and X_{I_{S2}} = 'work' versus X_{I_{S1}} = 'work' and X_{I_{S2}} = 'daycare'. More generally, in the context of spatiotemporal data of any dimension, we call this a compound secret. Intuitively, a mechanism that protects a compound secret of locations close together in time prevents a Bayesian adversary from leveraging the remainder of the trace to infer direction of motion at those sensitive times. Note that bounding the privacy loss of a compound secret does not bound the privacy loss of its constituent basic secrets. Going forward, we refer to I_S as the 'secret set'.

For the purpose of location privacy, it is important to choose a prior class Θ such that the conditional distribution P_P(X_{I_U} | X_{I_S}) is simple to compute for any secret set I_S and any prior P ∈ Θ. Of course, it is also critical that the prior class naturally models the data, and thus consists of 'reasonable assumptions' for adversaries. GPs satisfy both these requirements. We model a full d-dimensional trace sampled at N times by 'unrolling' it into an n = dN dimensional GP.

Definition 2.1.
Gaussian process
A trace X is a Gaussian process if X_{I_M} has a multivariate normal distribution for any set of indices I_M ⊂ [n]. If X is a Gaussian process, then the function i → E[X_i] is called the mean function and the function (i, j) → Cov(X_i, X_j) is called the kernel function.

In this work, the kernel uses locations' time stamps to compute their covariance, (t_i, t_j) → Cov(X_i, X_j), but generally could use any side information provided with each location.

GPs have simple, closed form conditional distributions. Let X ∼ N(µ, Σ), where µ ∈ R^n and Σ ∈ R^{n×n}. Then the random variable X_{I_U} | {X_{I_S} = x_s} ∼ N(µ_{u|s}, Σ_{u|s}), where µ_{u|s} = µ_u + Σ_{us} Σ_{ss}^{−1} (x_s − µ_s) and Σ_{u|s} = Σ_{uu} − Σ_{us} Σ_{ss}^{−1} Σ_{su}. Here, µ_s denotes the mean vector µ accessed at indices I_S and Σ_{su} denotes the covariance matrix Σ accessed at rows I_S and columns I_U.

For GP priors, we will use additive noise G ∼ N(0, Σ^{(g)}). Thus Z = X + G, too, is multivariate normal. Furthermore, the distribution of any set of variables conditioned on any other set of variables in Figure 1 belongs to some multivariate normal distribution.

GPs have been shown to successfully model mobility (Chen et al., 2015; Liang & Haas, 1999; Liu et al., 1998), even in the domain of surveillance video (Kim et al., 2011). Furthermore, although these non-parametric models are characterized by second order statistics, GPs are capable of complexity rivaling that of deep neural networks (Lee et al., 2018), allowing for scalability to more complex models and domains. Our proposed results and algorithms may be applied regardless of the complexity of the chosen GP.
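The closed-form conditionals above are straightforward to compute. A minimal numpy sketch, using a toy RBF kernel and hypothetical index sets (not the paper's experimental setup):

```python
import numpy as np

def rbf_kernel(n, l=3.0):
    """Toy RBF kernel over integer time indices, unit prior variance."""
    t = np.arange(n)
    return np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * l ** 2))

n = 8
mu = np.zeros(n)
Sigma = rbf_kernel(n)

I_S = [3, 4]                                   # hypothetical secret set
I_U = [i for i in range(n) if i not in I_S]
x_s = np.array([1.0, 1.2])                     # hypothesized sensitive values

# Blocks of Sigma: ss, us (rows I_U, cols I_S), su, uu
S_ss = Sigma[np.ix_(I_S, I_S)]
S_us = Sigma[np.ix_(I_U, I_S)]
S_su = Sigma[np.ix_(I_S, I_U)]
S_uu = Sigma[np.ix_(I_U, I_U)]

# X_{I_U} | X_{I_S} = x_s  ~  N(mu_{u|s}, Sigma_{u|s})
A = S_us @ np.linalg.inv(S_ss)                 # Sigma_us Sigma_ss^{-1}
mu_u_s = mu[np.array(I_U)] + A @ (x_s - mu[np.array(I_S)])
Sigma_u_s = S_uu - A @ S_su

# Conditioning never increases marginal variance
assert np.all(np.diag(Sigma_u_s) <= np.diag(S_uu) + 1e-9)
```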
In the following section, we propose a privacy definition that adapts Rényi Differential Privacy (RDP) (Mironov, 2017) to the Pufferfish framework. RDP resembles Differential Privacy (Dwork, 2006), except instead of bounding the maximum probability ratio or max divergence of the distribution on outputs for two neighboring databases, it bounds the Rényi divergence of order λ, defined in Equation (1) for distributions P_1 and P_2. The Rényi divergence bears a nice synergy with Gaussian processes. If P_1 = N(µ_1, Σ) and P_2 = N(µ_2, Σ) — two mean-shifted normal distributions — the Rényi divergence takes on a simple closed form shown in Equation (2).

D_λ(P_1 ‖ P_2) = (1/(λ − 1)) log E_{x∼P_2}[ (P_{P_1}(X = x) / P_{P_2}(X = x))^λ ]   (1)

D_λ(P_1 ‖ P_2) = (λ/2) (µ_1 − µ_2)^⊤ Σ^{−1} (µ_1 − µ_2)   (2)

We will make use of this in defining and bounding privacy loss in the next section.

We now propose a privacy framework that is tailored to sequences of correlated data, Conditional Inferential Privacy (CIP). CIP guarantees a radius r of indistinguishability for the basic or compound secrets associated with any secret set I_S. Specifically, CIP protects against any adversary with a specific prior on the shape of the trace, and is agnostic to their prior on the absolute location of the trace. We call the set of such prior distributions a Conditional Prior Class.

Definition 3.1.
Conditional Prior Class
For X = {X_1, . . . , X_n}, prior distributions P_i, P_j on X are said to belong to the same conditional prior class Θ if a constant shift in the conditioned x_s results in a constant shift on the distribution of X_{I_U}. Formally, if the conditional distributions satisfy P_{P_i}(X_{I_U} | X_{I_S} = x_s) = P_{P_j}(X_{I_U} + c^u_{ij} | X_{I_S} = x_s + c^s_{ij}) for all x_s, where c^u_{ij} and c^s_{ij} are constant shift vectors.

For instance, prior P_i may concentrate probability on traces passing through Los Angeles, while P_j concentrates on traces passing through London. Conditioning on each secret in the pair (x_s, x′_s) in L.A. is analogous to conditioning on each secret in the pair (x_s + c^s_{ij}, x′_s + c^s_{ij}) in London. The corresponding pair of conditional distributions on X_{I_U} in London (P_j) are copies of those in L.A. (P_i) shifted by c^u_{ij}. What matters is that the set of all pairs of conditional distributions under P_i induced by secret pairs (x_s, x′_s) is identical to those under P_j up to a mean shift. See Appendix 7.5 for a more detailed discussion of conditional prior classes.

Definition 3.2. (ε, λ)-Conditional Inferential Privacy (S_pairs, r, Θ): Given compound or basic discriminative pairs S_pairs associated with I_S, a radius of privacy r, a conditional prior class Θ, and a privacy parameter ε > 0, a privacy mechanism Z = A(X) satisfies (ε, λ)-CIP (S_pairs, r, Θ) if for all (s_i, s_j) ∈ S_pairs and all prior distributions P ∈ Θ with P_P(s_i), P_P(s_j) > 0,

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) ) ≤ ε   (3)

CIP departs from DP type notions of privacy like Approaches A–C primarily by resisting only a restricted class of inter-dependence — the conditional prior class — as opposed to arbitrary dependence of any k locations. Unlike Approaches A and B, we are able to preserve utility for tasks like traffic monitoring.
Unlike Approach C, CIP is still resistant to realistic models of location inter-dependence.

While this definition borrows heavily from the Pufferfish framework, it has a few key modifications. Pufferfish is generally described from a central, not local, model. We specialize the kinds of secrets and discriminative pairs for the case of local location trace privacy. Additionally, we specialize the type of prior distribution class needed for this local setting: the conditional prior class. Finally, we relax the strict max divergence (max log odds) criterion of the Pufferfish definition to a Rényi divergence. This guarantees that — with high probability on draws of realistic traces Z | X_{I_S} — the log odds will be bounded by ε. As λ → ∞, the log odds are bounded for all traces, i.e. the max divergence is bounded. We formalize this in Theorem 3.1.

The Rényi criterion of CIP greatly improves its flexibility. Unlike the standard DP Approaches A–C, which only take probabilities over the mechanism, we do not have full control over the randomness at play: it comes partially from A, defined by us, and partially from P, intrinsic to the data. Unlike max divergence, Rényi divergence is available in closed form for many distributions, allowing for a more flexible privacy framework. The λ parameter helps us tune how strict a CIP definition is and how much noise we need to add. This allows us to design mechanisms that are resistant to natural models of dependence while preserving utility.

We now identify key properties that make the CIP guarantee interpretable and robust.
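The closed form of Equation (2) that this flexibility relies on can be sanity-checked against a direct Monte Carlo estimate of Equation (1). A sketch with illustrative parameters (one dimension, shared unit variance):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 5.0
mu1, mu2, sigma = 0.0, 0.3, 1.0   # two mean-shifted 1-d Gaussians

# Closed form (Equation (2) in one dimension): lam * (mu1 - mu2)^2 / (2 sigma^2)
closed = lam * (mu1 - mu2) ** 2 / (2 * sigma ** 2)

# Monte Carlo estimate of Equation (1):
# (1/(lam - 1)) * log E_{x ~ P2} [ (p1(x) / p2(x))^lam ]
x = rng.normal(mu2, sigma, size=500_000)
log_ratio = ((x - mu2) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)  # log p1(x)/p2(x)
mc = np.log(np.mean(np.exp(lam * log_ratio))) / (lam - 1)

assert abs(mc - closed) < 0.02    # the two estimates agree
```

Note that the Monte Carlo estimator becomes heavy-tailed for large λ or large mean shifts, which is one reason the closed form is so convenient here.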
Interpretability:
CIP guarantees that a Bayesian adversary with any prior distribution on traces P in the conditional prior class Θ does not learn much about basic or compound secrets from the released trace Z. For basic secrets, this means that the adversary's posterior beliefs regarding sensitive location X_{I_S} are not much sharper than their prior beliefs before witnessing Z.

Theorem 3.1.
Prior-Posterior Gap: An (ε, λ)-CIP mechanism with conditional prior class Θ guarantees that for any event O on sanitized trace Z,

| log( P_{P,A}(s_i | Z ∈ O) / P_{P,A}(s_j | Z ∈ O) ) − log( P_P(s_i) / P_P(s_j) ) | ≤ ε′

for any P ∈ Θ with probability ≥ 1 − δ over draws of Z | X_{I_S} = s_i or Z | X_{I_S} = s_j, where ε′ and δ are related by ε′ = ε + log(1/δ)/(λ − 1). This holds under the condition that Z | X_{I_S} = s_i and Z | X_{I_S} = s_j have identical support.

A CIP mechanism depends only on the conditional prior describing the data, not the data itself. Suppose an adversary's prior beliefs on X_{I_S} are uniform over some region. By the relation above, for λ = 5 the probability that their posterior odds on s_i, s_j exceed e^{ε′} is at most δ = e^{−(ε′−ε)(λ−1)}: for small ε, odds above 3.5 are far less likely than odds above 2. This 'chance' is over draws of likely remaining locations X_{I_U} and the additive noise G. Proofs of all results are in Appendix 7.2.

For additive noise mechanisms like A(X) = X + G = Z, the CIP loss can be split into two terms: one accounting for the direct privacy loss of Z_{I_S} on X_{I_S} and a second accounting for the inferential privacy loss of Z_{I_U} on X_{I_S} via X_{I_U}.

Lemma 3.2.
Conditional Independence
For an additive noise mechanism, a fully dependent trace as in
Figure 1a, and any prior P on X, the CIP loss may be expressed as

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) )   (4)
  = Σ_{k ∈ I_S} D_λ( P_A(Z_k | X_k = (s_i)_k) ‖ P_A(Z_k | X_k = (s_j)_k) )
  + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = s_i) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = s_j) )

One interpretation of GI is that it assumes all locations X_k are independent. In this case, the second term vanishes and the privacy loss only depends on randomness of the mechanism, not the prior.

Robustness:
Kifer & Machanavajjhala (2011) show that it is impossible to achieve both utility and privacy resistant to all priors. CIP provides resistance to a reasonable class of priors
P ∈ Θ, but it is possible that the true distribution Q ∉ Θ. In this case, the privacy guarantees degrade gracefully as the divergence between Q and P ∈ Θ grows.

Theorem 3.3.
Robustness to Prior Misspecification
Mechanism A satisfies ε(λ)-CIP for prior class Θ. Suppose the finite mean true distribution Q is not in Θ. The CIP loss of A against prior Q is bounded by

D_λ( P_{A,Q}(Z | X_{I_S} = s_i) ‖ P_{A,Q}(Z | X_{I_S} = s_j) ) ≤ ε′(λ)

where

ε′(λ) = ((λ − 1/2)/(λ − 1)) ∆(2λ) + ∆(4λ − 3) + ((2λ − 3/2)/(2λ − 2)) ε(4λ − 2)

and where ∆(λ) is

inf_{P ∈ Θ} sup_{s_i ∈ S} max{ D_λ( P_P(X_{I_U} | X_{I_S} = s_i) ‖ P_Q(X_{I_U} | X_{I_S} = s_i) ),
                               D_λ( P_Q(X_{I_U} | X_{I_S} = s_i) ‖ P_P(X_{I_U} | X_{I_S} = s_i) ) }

As long as the conditional distribution on X_{I_U} | X_{I_S} = s_i of prior Q is close to that of some P ∈ Θ, the privacy guarantees should change only marginally. This bound is tightest when ε(λ) does not grow quickly with order λ.

A GP conditional prior class is the set of all GP prior distributions with the same kernel function (i, j) → Cov(X_i, X_j) and any mean function i → E[X_i]. With an additive Gaussian mechanism G ∼ N(0, Σ^{(g)}), the CIP loss of Equation (4) can be bounded for any GP conditional prior class. See Appendix 7.5 for further discussion of the GP conditional prior class.

Theorem 3.4.
CIP loss bound for GP conditionalpriors:
Let Θ be a GP conditional prior class. Let Σ be the covariance matrix for X produced by its kernel function. Let S_pairs be the basic or compound discriminative pairs associated with I_S, and S be the number of unique times in I_S. The mechanism A(X) = X + G = Z, where G ∼ N(0, Σ^{(g)}), then satisfies (ε, λ)-Conditional Inferential Privacy (S_pairs, r, Θ), where

ε ≤ (λ S r² / 2) ( 1/σ_s² + α* )   (5)

where σ_s² is the variance of each G_i ∈ G_{I_S} (diagonal entries of Σ^{(g)}_{ss}) and α* is the maximum eigenvalue of

Σ_eff = (Σ_{us} Σ_{ss}^{−1})^⊤ (Σ_{u|s} + Σ^{(g)}_{uu})^{−1} (Σ_{us} Σ_{ss}^{−1}).

The above bound is tight for basic secrets (S = 1). The two terms of Equation (5) represent the direct (σ_s²) and inferential (α*) loss terms of Equation (4). We assume that each diagonal entry of Σ^{(g)}_{ss} equals some σ_s², so that each X_i ∈ X_{I_S} experiences identical direct privacy loss, which is optimal under utility constraints. The above bound composes gracefully when multiple traces of an individual are released.

Corollary 3.4.1.
Graceful Composition in Time
Suppose a user releases two traces X and X̂ with additive noise G ∼ N(0, Σ^{(g)}) and Ĝ ∼ N(0, Σ̂^{(g)}), respectively. Then basic or compound secret X_{I_S} of X enjoys (ε̄, λ)-CIP, where

ε̄ ≤ (λ S r² / 2) ( 1/σ_s² + ᾱ* )

and where ᾱ* is the maximum eigenvalue of Σ̄_eff = (Σ_{us} Σ_{ss}^{−1})^⊤ (Σ_{u|s} + Σ̄^{(g)}_{uu})^{−1} (Σ_{us} Σ_{ss}^{−1}). Here Σ is the covariance matrix of the joint distribution on X, X̂ and

Σ̄^{(g)} = [ Σ^{(g)}   0
             0        Σ̂^{(g)} ]

This bound is identical to that of Theorem 3.4, only using the joint distribution over X, X̂ and G, Ĝ. This provides some insight into the fact that, unlike DP, even parallel composition guarantees are not automatic. Composition depends on the conditional prior. In the GP setting, if the chosen kernel function decays over time, we can expect composition to have minimal effects on privacy for traces separated by long durations.

To reduce the upper bound of Theorem 3.4, we optimize the correlation (off-diagonal) of Σ^{(g)} to minimize α*, and optimize its variance (diagonal) to balance a noise budget between lowering inferential (α*) and direct (σ_s²) loss.

Theorem 3.4 characterizes the privacy loss for GP conditional priors. We next show how to use this theorem to design mechanisms that can strategically reduce CIP loss given a utility constraint. We measure 'utility loss' as the total mean squared error (MSE) between the released (Z) and true (X) traces: MSE(Σ^{(g)}) = Σ_{i=1}^n E[(Z_i − X_i)²] = tr(Σ^{(g)}). We bound the utility loss by tr(Σ^{(g)}) ≤ n·o_t, where o_t is the average per-point utility loss.

It can be shown that optimizing the privacy loss under this utility constraint can be described by a semidefinite program (SDP) (formalization/derivation of SDPs in Appendix 7.3). For a given trace X, define its covariance matrix Σ using the kernel of the GP conditional prior, Σ_ij = k(i, j). Then pass Σ, the secret set I_S, and the utility constraint o_t to our first program, SDP_A, which returns noise covariance Σ^{(g)}. This defines an additive noise mechanism G ∼ N(0, Σ^{(g)}) that minimizes CIP loss to I_S:

Σ^{(g)} = SDP_A(Σ, I_S, o_t)

We can thus use a SDP to minimize the CIP loss to any single compound or basic secret. However, a trace may contain multiple locations or combinations thereof that one wishes to protect.
It remains to produce a single mechanism Σ^{(g)} that bounds the CIP loss to multiple basic and/or compound secrets in a single trace. For this we propose SDP_B, which uses the fact that if Σ^{(g)′} ≽ Σ^{(g)} it will have lower CIP loss (see Appendix 7.3.2). SDP_B takes in a set of covariance matrices F = {Σ^{(g)}_1, . . . , Σ^{(g)}_m}, each designed to minimize CIP loss for a single compound or basic secret I_{Si}. It then returns a single covariance matrix Σ^{(g)} ≽ Σ^{(g)}_i, i ∈ [m], that maintains the privacy guarantee each Σ^{(g)}_i offered its corresponding I_{Si}, while minimizing utility loss. In our experiments, we use Algorithm 1 to design a single mechanism that protects all locations in the trace — all basic secrets — while minimizing utility loss.
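The quantity these programs minimize — the Theorem 3.4 bound — is cheap to evaluate for any candidate noise covariance. A sketch (this evaluates the bound as stated above for a 1-d trace with a toy RBF prior and i.i.d. candidate noise; it is not the SDP itself, which would be solved with a tool such as CVXPY):

```python
import numpy as np

def rbf_cov(n, l=3.0):
    """Toy RBF prior covariance over integer time indices."""
    t = np.arange(n)
    return np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * l ** 2))

def cip_bound(Sigma, Sigma_g, I_S, lam=5.0, r=1.0):
    """Theorem 3.4 upper bound on epsilon for one secret set I_S."""
    n = Sigma.shape[0]
    I_U = [i for i in range(n) if i not in I_S]
    S_ss = Sigma[np.ix_(I_S, I_S)]
    S_us = Sigma[np.ix_(I_U, I_S)]
    S_uu = Sigma[np.ix_(I_U, I_U)]
    A = S_us @ np.linalg.inv(S_ss)                    # Sigma_us Sigma_ss^{-1}
    Sigma_u_s = S_uu - A @ Sigma[np.ix_(I_S, I_U)]
    # Sigma_eff = A^T (Sigma_{u|s} + Sigma^{(g)}_{uu})^{-1} A
    M = np.linalg.inv(Sigma_u_s + Sigma_g[np.ix_(I_U, I_U)])
    alpha_star = np.linalg.eigvalsh(A.T @ M @ A).max()   # inferential term
    sigma_s2 = Sigma_g[I_S[0], I_S[0]]                   # direct noise variance
    S_times = len(I_S)                                   # unique sensitive times
    return lam * S_times * r**2 / 2 * (1 / sigma_s2 + alpha_star)

n, I_S = 10, [4]
Sigma = rbf_cov(n)
eps_half = cip_bound(Sigma, 0.5 * np.eye(n), I_S)   # candidate i.i.d. noise
eps_one = cip_bound(Sigma, 1.0 * np.eye(n), I_S)
assert eps_one < eps_half   # more noise everywhere can only lower the bound
```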
Multiple Secrets
Input: I S , . . . , I Sm , o t , Σ Output: Σ ( g ) F = ∅ ; for i ∈ [ m ] do Σ ( g ) i = SDP A (Σ , I Si , o t ) ; F = F ∪ Σ ( g ) i ; end Σ ( g ) = SDP B ( F ) ; return Σ ( g ) ; Here, we aim to empirically answer: Do our SDPmechanisms maintain high posterior uncertainty ofsensitive locations? How do they compare to ApproachC baselines of equal MSE? How robust is the SDP A mechanism when the prior covariance Σ is misspecified? Methods
To answer these questions, we look at the range of conditional prior classes that fit real-world data. For location trace data, we use the GeoLife GPS Trajectories dataset (Zheng et al., 2010), containing 10k human mobility traces after preprocessing (see Appendix 7.4 for details). We also consider the privacy risk of room temperature data (Nef et al., 2015), using the SML2010 dataset (Zamora-Martinez et al., 2014), which contains approximately 40 days of room temperature data sampled every 15 minutes.

For the location data, having observed that the correlation between latitude and longitude is low, we treat each dimension as independent. By way of Corollary 7.2.1, this allows us to bound privacy loss and design mechanisms for each dimension separately. Furthermore, having observed that each dimension fits nearly the same conditional prior, we treat our dataset of 10k 2-dimensional traces as a dataset of 20k 1-dimensional traces, where each trace represents one dimension of a 2d location trajectory.

We model the location trace data with a Radial Basis Function (RBF) kernel GP and the temperature series data with a periodic kernel GP:

k_RBF(t_i, t_j) = σ_x² exp( −(t_i − t_j)² / (2l²) )
k_PER(t_i, t_j) = σ_x² exp( −2 sin²(π|t_i − t_j|/p) / l² )

In both kernels, the intrinsic degree of dependence between points is captured by the lengthscale l. However, the fact that sampling rates vary significantly between traces means that traces with equal lengthscales can have very different degrees of correlation.

Figure 2: Posterior uncertainty interval (higher = better privacy) on X_{I_S} of a GP Bayesian adversary. A larger l_eff corresponds to greater inter-dependence and reduces posterior uncertainty. The gray interval depicts the middle 50% of the MLE l_eff among traces in each dataset, and the black dotted line the median l_eff. Panels (a)-(c) and (e)-(g) show SDP mechanisms (blue) maintaining relatively high uncertainty compared to two GI (Approach C) baselines of equal utility (MSE). Panels (d) and (h) show the (minor) change in posterior uncertainty when the prior covariance Σ used in SDP_A is misspecified: when it is identical to the true covariance Σ* known to the adversary (blue), is more correlated (orange), or is less correlated (green).

To encapsulate both of these effects, we study the empirical distribution of the effective lengthscale of each trace,

l_eff,x = l_x / P    and    l_eff,y = l_y / P,

where P is the trace's sampling period and l_x, l_y are its optimal lengthscales for each dimension. l_eff,x and l_eff,y tell us the average number of neighboring locations that are highly correlated, instead of a time period. For instance, a given trace with an optimal l_eff,x = 8 tells us that roughly every eight neighboring location samples in the x dimension are highly correlated. The empirical distribution of effective lengthscales across all traces describes, over a range of logging devices (sampling rates), users, and movement patterns, how many neighboring points are highly correlated in location trace data. After this preprocessing, we are able to use kernels that take indices (not times) as arguments:

k_RBF(i, j) = exp( −(i − j)² / (2 l_eff²) )
k_PER(i, j) = exp( −2 sin²(π|i − j|/p) / l_eff² )

See Appendix 7.4 for a more detailed discussion of how the empirical distribution of l_eff across traces is measured. To impart the range of realistic conditional priors, the gray interval of each plot depicts the middle 50% of the empirical l_eff among traces in each dataset; the dashed vertical line reports the median l_eff. Each figure increases the degree of dependence, l_eff, used by the kernel to compute the prior covariance Σ(l_eff).
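A minimal sketch of the index-based kernels above, and of the GP adversary's posterior uncertainty they induce. The kernel normalizations follow the standard GP forms we assume here, and the conditioning rule Cov(X | Z) = Σ − Σ(Σ + Σ^(g))^{-1}Σ for Z = X + G is ordinary joint-Gaussian algebra; the function names and parameter values are ours, not from the paper's code.

```python
import numpy as np

def rbf_cov(n, l_eff):
    """Index-based RBF kernel: K[i, j] = exp(-(i - j)^2 / (2 l_eff^2))."""
    d = np.subtract.outer(np.arange(n), np.arange(n))
    return np.exp(-d**2 / (2 * l_eff**2))

def periodic_cov(n, l_eff, p):
    """Index-based periodic kernel: K[i, j] = exp(-2 sin^2(pi|i-j|/p) / l_eff^2)."""
    d = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.exp(-2 * np.sin(np.pi * d / p)**2 / l_eff**2)

def posterior_sigma(Sigma, Sigma_g, sensitive_idx):
    """1-sigma posterior uncertainty on X_{I_S} for a Bayesian adversary with
    prior X ~ N(mu, Sigma) who observes Z = X + G, G ~ N(0, Sigma_g).
    Joint-Gaussian conditioning: Cov(X | Z) = Sigma - Sigma (Sigma + Sigma_g)^{-1} Sigma."""
    post = Sigma - Sigma @ np.linalg.solve(Sigma + Sigma_g, Sigma)
    return np.sqrt(np.diag(post)[sensitive_idx])

# A more correlated trace (larger l_eff) leaves the adversary with LESS
# posterior uncertainty under the same independent noise budget.
unc_8 = posterior_sigma(rbf_cov(50, 8.0), 0.5 * np.eye(50), [25])
unc_2 = posterior_sigma(rbf_cov(50, 2.0), 0.5 * np.eye(50), [25])
```

This mirrors the qualitative trend in Figure 2: as l_eff grows, the gray-interval adversary's uncertainty at the sensitive index shrinks.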
Σ(l_eff) is then used in one of the SDP routines of Section 4 to produce a mechanism Σ^(g)(l_eff) that protects a basic secret (SDP_A), a compound secret (SDP_A), or the union of all basic secrets (Multiple Secrets). We then observe the 68% confidence interval of the Gaussian posterior on the sensitive points X_{I_S} (blue line). This is the 1σ uncertainty of a Bayesian adversary with a GP prior represented by Σ(l_eff) (see Appendix 7.4 for how this is computed). As l_eff increases, their posterior uncertainty will shrink; our aim is to mitigate this as much as possible under the given utility constraint. For scale, recall that the prior variance diag(Σ) is normalized to one. In the case of all basic secrets, we report the average posterior uncertainty over locations.

We compare the SDP mechanisms with two mechanisms using the logic of Approach C (all three of equal MSE utility loss): independent/uniform and independent/concentrated. The uniform approach adds independent Gaussian noise evenly along the whole trace regardless of I_S, i.e. Σ^(g) = o_t I. The concentrated approach allocates the entire noise budget to the sensitive set I_S.

Results
For our first question, see Figures 2a-2c and 2e-2g. For both location and temperature data, our SDP mechanisms maintain higher posterior uncertainty than the baselines with identical utility cost for a single basic secret, a compound secret, and all basic secrets. By actively considering the conditional prior class parametrized by Σ, the SDP mechanisms can strategize to both correlate noise samples and concentrate noise power such that posterior inference is thwarted at the sensitive set I_S. For an intuitive illustration of the chosen Σ^(g)'s, see Appendix 7.1.2.

To answer our second question, see Figures 2d and 2h. When the prior covariance Σ does not represent the true data distribution known to the adversary, a smaller posterior uncertainty may be achieved. The orange line indicates the uncertainty interval of an adversary who knows the data is less correlated than we believe, i.e. the true Σ* is computed with a smaller l_eff than ours. The blue line represents an adversary who knows the data is more correlated than we believe, i.e. the true Σ* is computed with a larger l_eff. Both plots confirm the robustness of our privacy guarantees stated by Theorem 3.3. Particularly around the median l_eff, we see that the change in posterior uncertainty with this change in prior is indeed marginal.

Related Work
Few works have proposed solutions to the local guarantee when releasing individual traces. A mechanism offered in Bindschaedler & Shokri (2016) releases synthesized traces satisfying the notion of plausible deniability (Bindschaedler et al., 2017), but this is distinctly different from providing a radius of privacy to sensitive locations. Meanwhile, the frameworks proposed in Xiao & Xiong (2015) and Cao et al. (2019) nicely characterize the risk of inference in location traces, but use only first-order Markov models of correlation between points, do not offer a radius of indistinguishability as in this work, and are not suited to continuous-valued spatiotemporal traces.

Perhaps more technically similar to this work, Song et al. (2017) provide a general mechanism that applies to any Pufferfish framework, as well as a more computationally efficient mechanism that applies when the joint distribution of an individual's features can be described by a graphical model. The first is too computationally intensive for our setting. The second is for discrete settings, and cannot accommodate spatiotemporal effects.
Conclusion
This work proposes a framework for both identifying and quantifying the inferential privacy risk for highly dependent sequences of spatiotemporal data. As a starting point, we have provided a simple bound on the privacy loss for Gaussian process priors, and an SDP-based privacy mechanism for minimizing this bound without destroying utility. We hope to extend this work to other data domains with different conditional priors, and different sets of secrets.
Acknowledgements
KC and CM would like to thank ONR under N00014-20-1-2334 and UC Lab Fees under LFR 18-548554 for research support. We would also like to thank our reviewers for their insightful feedback.
References
Liu, C., Chakraborty, S., and Mittal, P. Dependence makes you vulnerable: Differential privacy under dependent tuples. In Network and Distributed System Security Symposium (NDSS), San Diego, CA, 2016. ISBN 978-1-891562-41-9.

Andrés, M. E., Bordenabe, N. E., Chatzikokolakis, K., and Palamidessi, C. Geo-indistinguishability: Differential privacy for location-based systems. arXiv preprint arXiv:1212.1984, 2012.

Bindschaedler, V. and Shokri, R. Synthesizing plausible privacy-preserving location traces. In IEEE Symposium on Security and Privacy (SP), pp. 546–563, May 2016. doi: 10.1109/SP.2016.39.

Bindschaedler, V., Shokri, R., and Gunter, C. A. Plausible deniability for privacy-preserving data synthesis. Proceedings of the VLDB Endowment, 10(5):481–492, January 2017. doi: 10.14778/3055540.3055542. URL https://doi.org/10.14778/3055540.3055542.

Cao, Y., Yoshikawa, M., Xiao, Y., and Xiong, L. Quantifying differential privacy under temporal correlations. In IEEE International Conference on Data Engineering (ICDE), pp. 821–832, April 2017. doi: 10.1109/ICDE.2017.132.

Cao, Y., Xiao, Y., Xiong, L., and Bai, L. PriSTE: From location privacy to spatiotemporal event privacy. In IEEE International Conference on Data Engineering (ICDE), pp. 1606–1609, April 2019. doi: 10.1109/ICDE.2019.00153.

Chen, J., Low, K. H., Yao, Y., and Jaillet, P. Gaussian process decentralized data fusion and active sensing for spatiotemporal traffic modeling and prediction in mobility-on-demand systems. IEEE Transactions on Automation Science and Engineering, 12(3):901–921, July 2015. doi: 10.1109/TASE.2015.2422852.

Dwork, C. Differential privacy. In Automata, Languages and Programming (ICALP), volume 4052 of LNCS, July 2006. ISBN 978-3-540-35907-4.

Fan, L., Xiong, L., and Sunderam, V. Differentially private multi-dimensional time series release for traffic monitoring. In IFIP Annual Conference on Data and Applications Security and Privacy, pp. 33–48. Springer, 2013.

Fowler, G. A. Smartphone data reveal which Americans are social distancing (and not). Washington Post, 2020. ISSN 0190-8286.

Kifer, D. and Machanavajjhala, A. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD '11, pp. 193–204, Athens, Greece, June 2011. doi: 10.1145/1989323.1989345.

Kifer, D. and Machanavajjhala, A. Pufferfish: A framework for mathematical privacy definitions. ACM Transactions on Database Systems (TODS), 39(1):3, 2014.

Kim, K., Lee, D., and Essa, I. Gaussian process regression flow for analysis of motion trajectories. In IEEE International Conference on Computer Vision (ICCV), pp. 1164–1171, November 2011. doi: 10.1109/ICCV.2011.6126365.

Lee, J., Bahri, Y., Novak, R., Schoenholz, S. S., Pennington, J., and Sohl-Dickstein, J. Deep neural networks as Gaussian processes. arXiv:1711.00165 [cs, stat], March 2018. URL http://arxiv.org/abs/1711.00165.

Liang, B. and Haas, Z. Predictive distance-based mobility management for PCS networks. In IEEE INFOCOM '99 (Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies), volume 3, pp. 1377–1384, March 1999. doi: 10.1109/INFCOM.1999.752157.

Liu, T., Bahl, P., and Chlamtac, I. Mobility modeling, location tracking, and trajectory prediction in wireless ATM networks. IEEE Journal on Selected Areas in Communications, 16(6):922–936, August 1998. doi: 10.1109/49.709453.

Mironov, I. Rényi differential privacy. In IEEE Computer Security Foundations Symposium (CSF), pp. 263–275. IEEE, 2017.

Nef, T., Urwyler, P., Büchler, M., Tarnanas, I., Stucki, R., Cazzoli, D., Müri, R., and Mosimann, U. Evaluation of three state-of-the-art classifiers for recognition of activities of daily living from smart home ambient data. Sensors, 15(5):11725–11740, 2015.

Song, S., Wang, Y., and Chaudhuri, K. Pufferfish privacy mechanisms for correlated data. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, pp. 1291–1306, Chicago, Illinois, USA, May 2017. doi: 10.1145/3035918.3064025. URL https://doi.org/10.1145/3035918.3064025.

Valentino-DeVryes, J., Singer, N., Keller, M. H., and Krolik, A. Your apps know where you were last night, and they're not keeping it secret. The New York Times, 2018.

Vandenberghe, L. The CVXOPT linear and quadratic cone program solvers. Online: http://cvxopt.org/documentation/coneprog.pdf, 2010.

Vandenberghe, L. and Boyd, S. Semidefinite programming. SIAM Review, 38(1):49–95, March 1996. doi: 10.1137/1038003.

Xiao, Y. and Xiong, L. Protecting locations with differential privacy under temporal correlations. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1298–1309. ACM, 2015.

Yang, B., Sato, I., and Nakagawa, H. Bayesian differential privacy on correlated data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp. 747–762, Melbourne, Victoria, Australia, May 2015. doi: 10.1145/2723372.2747643.

Zamora-Martinez, F., Romeu, P., Botella-Rocamora, P., and Pardo, J. On-line learning of indoor temperature forecasting models towards energy efficiency. Energy and Buildings, 83:162–172, 2014.

Zheng, Y., Xie, X., Ma, W.-Y., et al. GeoLife: A collaborative social networking service among user, location and trajectory. IEEE Data Eng. Bull., 33(2):32–39, 2010.
For documented code demonstrating our SDP mechanisms used to generate the plots of Figure 2, please visit our repo: https://github.com/casey-meehan/location_trace_privacy
The following sections include proofs of results, derivations of algorithms, and explanations of experimental procedures.

Figure 3: Example of a sensitive location trace of an NYC mayoral staff member exposed by Valentino-DeVryes (2018). (b) and (c) depict the posterior uncertainty (green) P_{A,P}(X_i | Z) for each 2d location. (a) depicts three sensitive times (red with blue outline): Gracie Mansion (the Mayor's home), an event on Staten Island that the mayor attended, and finally the staff member's home on Long Island. (b) provides an example of Approach C: adding independent Gaussian noise to each location (red dotted line). A GP posterior still maintains high confidence within a small radius along the trace, including at the sensitive times. (c) provides an example of the optimized noise of Multiple Secrets with identical aggregate MSE as (b). By focusing correlated noise around the three sensitive times, there is high uncertainty at sensitive times and high confidence elsewhere.

The following figures aim to illustrate the difference between the covariance matrices used in the experimental baselines (indep./uniform and indep./concentrated) and those chosen by our SDP algorithms for both the RBF and periodic prior. Note that here we presume the different dimensions of location to be independent and, by Corollary 7.2.1, are able to treat a 2d location trace as two 1d traces. As such, the following examples demonstrate the mechanism covariance matrices and additive noise samples used for either a single dimension of location data (for the RBF kernel) or for the one dimension of temperature data (for the periodic kernel).

The first figure (a) shows the covariance of the Approach C baselines used in the experiments. The second figure (b) shows the covariance of our SDP mechanisms for the RBF kernel used on location data. The third figure (c) shows the covariance of our SDP mechanisms for the periodic kernel used for temperature data. In each figure the covariance matrix is depicted as a heat map, with warmer colors indicating higher values (normalized to the largest and smallest value in the covariance matrix).
The drawn noise samples G are plotted against their time index. So, the sequence of plotted (x, y) values is [(1, G_1), (2, G_2), ..., (n, G_n)], where n = 50 for the RBF case and n = 48 for the periodic case.

(a) Covariance matrices and mechanism samples for the baselines used in experiments. The first figure demonstrates the uniform approach that distributes the independent Gaussian noise budget along the entire trace, regardless of I_S. The second and third show the concentrated approach that allocates the entire noise budget to only the sensitive locations in I_S: first for a basic secret (one location) and then for a compound secret of 3 evenly spaced locations.

(b) Covariance matrices and mechanism samples for the median RBF prior. The first noise mechanism (Mech. basic) demonstrates the covariance matrix chosen by SDP_A for a basic secret of a single location X_i in the middle of the trace. The uncorrelated dot in the middle of the covariance matrix, Σ^(g)_ii, represents the independent noise G_i added at the sensitive location to mitigate direct loss. To mitigate inferential loss, the SDP optimizes the remainder of the matrix to be positively correlated, with maximum variance allocated to locations near X_i in time. This thwarts GP inference of the true location at time t_i. The second mechanism (Mech. comp.) depicts the covariance chosen by SDP_A to protect a compound secret of two adjacent locations in the trace (visible as the uncorrelated '+' through the middle consuming 2 rows/columns). Recall that a compound secret ought to protect directional information: did the user visit B first and then A, or A and then B? That is precisely what this mechanism does by randomizing the angle of approach to the two locations in the middle with positively and negatively correlated noise.
Also note that the SDP does not allocate a large share of the noise budget to the actual locations themselves. This highlights the fact that protecting a compound secret does not protect its constituent basic secrets. The third and final mechanism (Mech. all basic) is the noise covariance chosen by SDP_B in the Multiple Secrets algorithm. To protect all basic secrets with a utility constraint, the SDP converges to a mechanism that looks similar to the uniform baseline. However, this mechanism adds a subtle degree of off-diagonal correlation along with greater noise power towards the beginning and end of the trace. The off-diagonal correlation is noticeable when the samples are compared to those of the uniform baseline in the previous figure. While this change appears to be minor, it makes a significant change in the posterior confidence of a GP adversary (as seen in Figure 2c).

(c) Covariance matrices and mechanism samples for the median periodic prior, with a period of half the trace length. The first noise mechanism (Mech. basic) shows the covariance chosen by SDP_A to protect a single location (temperature) in the middle of the trace. As in the RBF case, significant noise power is allocated to the sensitive location itself, X_i, to limit direct privacy loss. However, the noise added to the remainder of the trace is significantly different: it is tailored to thwart inference by a periodic prior, wherein the location one period away has correlation 1. The second noise mechanism (Mech. comp.) shows the covariance chosen by SDP_A to protect a compound secret of two locations, X_i, X_j, 16 timesteps apart (not quite a full period). Here, we see the SDP randomize the phase of the additive noise such that periodic inference cannot tell directional information like X_i > X_j or vice versa. The third noise mechanism (Mech. all basic) is identical to the all basic secrets mechanism chosen for the RBF case above, except using a periodic prior Σ. The mechanism chosen looks similar to the uniform baseline, except with slightly periodic off-diagonal correlation imitating the prior covariance. Additionally, noise power is mitigated towards the middle and ends of the trace. Again, Figure 2g indicates that this subtle change makes a significant difference in thwarting Bayesian adversaries.
Prior-Posterior Gap: An (ε, λ)-CIP mechanism with conditional prior class Θ guarantees that for any event O on the sanitized trace Z,

| log [ P_{P,A}(s_i | Z ∈ O) / P_{P,A}(s_j | Z ∈ O) ] − log [ P_P(s_i) / P_P(s_j) ] | ≤ ε′

for any P ∈ Θ with probability ≥ 1 − δ over draws of Z | X_{I_S} = s_i or Z | X_{I_S} = s_j, where ε′ and δ are related by ε′ = ε + log(1/δ)/(λ − 1). This holds under the condition that Z | X_{I_S} = s_i and Z | X_{I_S} = s_j have identical support.

Proof. This result makes use of a Rényi divergence property identified in Mironov (2017):
Lemma 7.1.
Let P, Q be two distributions on X of identical support such that

max{ D_λ( P_P(X) ‖ P_Q(X) ), D_λ( P_Q(X) ‖ P_P(X) ) } ≤ ε.

Then for any event O,

P_P(X ∈ O) ≤ max{ e^{ε′} P_Q(X ∈ O), δ }    and    P_Q(X ∈ O) ≤ max{ e^{ε′} P_P(X ∈ O), δ },

where ε′ = ε + log(1/δ)/(λ − 1).

CIP guarantees that for all P ∈ Θ and all discriminative pairs (s_i, s_j) ∈ S_pairs (which also includes (s_j, s_i)),

D_λ( P_{P,A}(Z | X_{I_S} = s_i) ‖ P_{P,A}(Z | X_{I_S} = s_j) ) ≤ ε,

and thus by Lemma 7.1 we have, for any event O on Z,

P_{P,A}(Z ∈ O | X_{I_S} = s_i) ≤ max{ e^{ε′} P_{P,A}(Z ∈ O | X_{I_S} = s_j), δ }    and
P_{P,A}(Z ∈ O | X_{I_S} = s_j) ≤ max{ e^{ε′} P_{P,A}(Z ∈ O | X_{I_S} = s_i), δ }.

As such, given that X_{I_S} = s_i, the probability of some event {Z ∈ W} such that P_{P,A}(Z ∈ W | X_{I_S} = s_i) ≥ e^{ε′} P_{P,A}(Z ∈ W | X_{I_S} = s_j) is no more than δ. The same is true swapping s_j for s_i. So, over draws of Z | X_{I_S} = s_i or Z | X_{I_S} = s_j, we have that

P_{P,A}(Z ∈ O | X_{I_S} = s_i) / P_{P,A}(Z ∈ O | X_{I_S} = s_j) ≤ e^{ε′}    and    P_{P,A}(Z ∈ O | X_{I_S} = s_j) / P_{P,A}(Z ∈ O | X_{I_S} = s_i) ≤ e^{ε′}

with probability ≥ 1 − δ, which is equivalent to the statement that

−ε′ ≤ log [ P_{P,A}(Z ∈ O | X_{I_S} = s_i) / P_{P,A}(Z ∈ O | X_{I_S} = s_j) ] ≤ ε′.

By Bayes' rule, the log-ratio of posteriors equals this log-ratio of likelihoods plus the log-ratio of priors, so this is in turn equivalent to

| log [ P_{P,A}(s_i | Z ∈ O) / P_{P,A}(s_j | Z ∈ O) ] − log [ P_P(s_i) / P_P(s_j) ] | ≤ ε′.

Lemma 3.2 (CIP loss for additive mechanisms). For an additive noise mechanism, a fully dependent trace as in
Figure 1b, and any prior P on X, the CIP loss may be expressed as

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) )
  = Σ_{i ∈ I_S} D_λ( P_A(Z_i | X_i = s_i) ‖ P_A(Z_i | X_i = s_j) ) + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = s_i) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = s_j) ).

Proof.

D_λ( P_{A,P}(Z | X_{I_S} = x_s) ‖ P_{A,P}(Z | X_{I_S} = x′_s) )
  = D_λ( P_A(Z_{I_S} | X_{I_S} = x_s) P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_A(Z_{I_S} | X_{I_S} = x′_s) P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) )   (1)
  = D_λ( P_A(Z_{I_S} | X_{I_S} = x_s) ‖ P_A(Z_{I_S} | X_{I_S} = x′_s) ) + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) )   (2)
  = D_λ( Π_{i ∈ I_S} P_A(Z_i | X_i = x_i) ‖ Π_{i ∈ I_S} P_A(Z_i | X_i = x′_i) ) + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) )   (3)
  = Σ_{i ∈ I_S} D_λ( P_A(Z_i | X_i = x_i) ‖ P_A(Z_i | X_i = x′_i) ) + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) )   (4)

Line (1) uses the conditional independence seen in the graphical model of Figure 1. Line (2) is due to the fact that the two terms in line (1) are conditionally independent, allowing the divergence of the product to separate into the sum of two divergences (an easily verifiable property of Rényi divergence evident from its definition in Equation 1). Line (3) is again from the conditional independence between the Z_i for each i ∈ I_S when conditioned on X_{I_S}. Line (4) uses the same property of Rényi divergence used in line (2): the terms in the product are conditionally independent, allowing the separation into the sum of multiple divergences.

Robustness to Prior Misspecification
Mechanism A satisfies ε(λ)-CIP for prior class Θ. Suppose the finite-mean true distribution Q is not in Θ. The CIP loss of A against prior Q is bounded by

D_λ( P_{A,Q}(Z | X_{I_S} = s_i) ‖ P_{A,Q}(Z | X_{I_S} = s_j) ) ≤ ε′(λ),

where

ε′(λ) = (λ − 1/2)/(λ − 1) · ∆(2λ) + ∆(4λ − 3) + (2λ − 3/2)/(2λ − 2) · ε(4λ − 2)

and where ∆(λ) is

inf_{P ∈ Θ} sup_{s_i ∈ S} max{ D_λ( P_P(X_{I_U} | X_{I_S} = s_i) ‖ P_Q(X_{I_U} | X_{I_S} = s_i) ), D_λ( P_Q(X_{I_U} | X_{I_S} = s_i) ‖ P_P(X_{I_U} | X_{I_S} = s_i) ) }.

Proof.
By a 'finite mean' distribution Q, we mean that all conditionals of Q given some X_{I_S} have finite mean. Since a conditional prior class contains the conditionals of one distribution with any offset (any mean value), this guarantees that ∆(λ) is achieved for some P ∈ Θ. Intuitively, this prevents the pathological case of the inf over P ∈ Θ being a limit as the mean of P → ∞, only asymptotically approaching ∆(λ). If the mean of Q is finite, then the closest P ∈ Θ (in Rényi divergence) must also have finite mean, since any mean is attainable in a conditional prior class Θ. With this in mind, we make use of the following triangle inequality provided in Mironov (2017):

Lemma 7.2.
For distributions P, Q, R on X with common support, we have

D_λ( P_P(X) ‖ P_Q(X) ) ≤ (λ − 1/2)/(λ − 1) · D_{2λ}( P_P(X) ‖ P_R(X) ) + D_{2λ−1}( P_R(X) ‖ P_Q(X) ).

In our case, we assume that the mechanism A gives Z | X_{I_S} = x_s identical support for all I_S, x_s. Using this, we have

D_λ( P_{A,Q}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,Q}(Z_{I_U} | X_{I_S} = x′_s) )
  ≤ (λ − 1/2)/(λ − 1) · D_{2λ}( P_{A,Q}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ) + D_{2λ−1}( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,Q}(Z_{I_U} | X_{I_S} = x′_s) ).

By a data processing inequality, the first divergence is bounded by ∆(2λ), and the second term may be bounded by a second application of the triangle inequality:

D_{2λ−1}( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,Q}(Z_{I_U} | X_{I_S} = x′_s) )
  ≤ (2λ − 3/2)/(2λ − 2) · D_{4λ−2}( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) ) + D_{4λ−3}( P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) ‖ P_{A,Q}(Z_{I_U} | X_{I_S} = x′_s) ).

The first divergence here is bounded by ε(4λ − 2) and the second divergence is bounded by ∆(4λ − 3). Putting all this together, we have the following upper bound:

D_λ( P_{A,Q}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,Q}(Z_{I_U} | X_{I_S} = x′_s) ) ≤ (λ − 1/2)/(λ − 1) · ∆(2λ) + ∆(4λ − 3) + (2λ − 3/2)/(2λ − 2) · ε(4λ − 2).

CIP loss bound for GP conditional priors:
Let Θ be a GP conditional prior class. Let Σ be the covariance matrix for X produced by its kernel function. Let S be the basic or compound secret associated with I_S, and let S also denote the number of unique times in I_S. The mechanism A(X) = X + G = Z, where G ∼ N(0, Σ^(g)), then satisfies (ε, λ)-Conditional Inferential Privacy (S_pairs, r, Θ), where

ε ≤ (λ/2) S r² ( 1/σ_s² + α* ),

where σ_s² is the variance of each G_i ∈ G_{I_S} (the diagonal entries of Σ^(g)_ss) and α* is the maximum eigenvalue of

Σ_eff = (Σ_us Σ_ss^{-1})ᵀ (Σ_{u|s} + Σ^(g)_uu)^{-1} (Σ_us Σ_ss^{-1}).

Proof. Again, the conditional prior class Θ is defined by a kernel function (i, j) → Cov(i, j), which, given the indices of the trace X, induces a covariance matrix Σ between all X_i, X_j. In practice, when the sampling rate of locations is non-uniform, the kernel function may use the time-stamps of the points in the trace to assign high correlation to X_i that are close in time and low correlation to X_i that are far apart in time. Of course, correlation between X_i of different dimensions (e.g. latitude and longitude) must be designed for the given application and may be completely independent; the kernel function can encode this as well.

Recall from Equation 1 that the Rényi divergence between two mean-shifted multivariate normal distributions, P_0 = N(μ_0, Σ) and P_1 = N(μ_1, Σ), is

D_λ( P_0 ‖ P_1 ) = (λ/2) (μ_0 − μ_1)ᵀ Σ^{-1} (μ_0 − μ_1).

Now, for any prior
P ∈ Θ, we have that X ∼ N(μ, Σ) for some μ and for Σ defined by the kernel function. Again, G ∼ N(0, Σ^(g)). I_S encodes the indices of a single-location basic secret or a multi-location compound secret. Then, the divergence to bound for (ε, λ)-CIP (S_pairs, r, Θ) is

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) )

for any (s_i, s_j) ∈ S_pairs = { (x_s, x′_s) : ‖x_s − x′_s‖ ≤ r } if I_S encodes a basic secret, or for any

(s_i, s_j) ∈ S_pairs = { ({x_s1, x_s2, ...}, {x′_s1, x′_s2, ...}) : ‖x_sk − x′_sk‖ ≤ r, ∀k }

if I_S encodes a compound secret. A discriminative pair (s_i, s_j) is two real-valued vectors in R^{|I_S|}, representing two hypotheses about the true values of X_{I_S}. We denote the m-th elements as s_im, s_jm. Let f : I_S → [|I_S|] be a mapping from each index w ∈ I_S to its corresponding position in the vector s_i or s_j (where the value of X_w is hypothesized). By Lemma 3.2, the divergence can be written as

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) )
  = Σ_{w ∈ I_S} D_λ( P_A(Z_w | X_w = s_{i,f(w)}) ‖ P_A(Z_w | X_w = s_{j,f(w)}) ) + D_λ( P_{A,P}(Z_{I_U} | X_{I_S} = x_s) ‖ P_{A,P}(Z_{I_U} | X_{I_S} = x′_s) ),

where P_A(Z_w | X_w = x) = N(x, σ_s²) for all w ∈ I_S. Recall from the statement of the theorem that we assume the diagonal entries of Σ^(g)_ss all equal some value σ_s²: we add the same noise variance to each point in the secret set, which is optimal under MSE constraints. Additionally, note that for the hypothesis X_{I_S} = x_s, we know the distribution of X_{I_U} | X_{I_S} = x_s ∼ N(μ_{u|s}, Σ_{u|s}), where

μ_{u|s} = μ_u + Σ_us Σ_ss^{-1} (x_s − μ_s)    and    Σ_{u|s} = Σ_uu − Σ_us Σ_ss^{-1} Σ_su.
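These conditional-Gaussian identities can be checked numerically; in particular, the difference μ_{u|s_i} − μ_{u|s_j} = Σ_us Σ_ss^{-1}(s_i − s_j) used in the next simplification does not depend on μ. The small RBF prior and the specific hypotheses below are arbitrary illustrative choices.

```python
import numpy as np

# A small RBF prior over 10 indices (arbitrary illustrative choice).
idx = np.arange(10)
Sigma = np.exp(-np.subtract.outer(idx, idx)**2 / (2 * 3.0**2))
I_S = [4, 5]
I_U = [i for i in range(10) if i not in I_S]
S_ss = Sigma[np.ix_(I_S, I_S)]
S_us = Sigma[np.ix_(I_U, I_S)]
S_uu = Sigma[np.ix_(I_U, I_U)]

def mu_cond(mu, x_s):
    """mu_{u|s} = mu_u + Sigma_us Sigma_ss^{-1} (x_s - mu_s)."""
    return mu[I_U] + S_us @ np.linalg.solve(S_ss, x_s - mu[I_S])

# Sigma_{u|s} = Sigma_uu - Sigma_us Sigma_ss^{-1} Sigma_su (independent of x_s).
S_u_s = S_uu - S_us @ np.linalg.solve(S_ss, S_us.T)

mu = np.linspace(-1.0, 1.0, 10)       # any mean offset gives the same gap
s_i = np.array([0.3, -0.1])
s_j = np.array([0.1, 0.2])
gap = mu_cond(mu, s_i) - mu_cond(mu, s_j)
direct = S_us @ np.linalg.solve(S_ss, s_i - s_j)
```

The posterior covariance Σ_{u|s} is the Schur complement of Σ_ss in Σ, hence positive semidefinite.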
Notice that only μ_{u|s} depends on the actual value of x_s; Σ_{u|s} depends only on the indices of I_S. Being the sum of two normally distributed variables, we have that

(Z_{I_U} | X_{I_S} = x_s) =_d (X_{I_U} | X_{I_S} = x_s) + G_{I_U} = N(μ_{u|s}, Σ_{u|s} + Σ^(g)_uu).

Substituting this into the sum of divergences above:

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) )
  = Σ_{m=1}^{|I_S|} D_λ( N(s_im, σ_s²) ‖ N(s_jm, σ_s²) ) + D_λ( N(μ_{u|s_i}, Σ_{u|s} + Σ^(g)_uu) ‖ N(μ_{u|s_j}, Σ_{u|s} + Σ^(g)_uu) )   (1)
  = (λ/(2σ_s²)) Σ_{m=1}^{|I_S|} (s_im − s_jm)² + (λ/2) (μ_{u|s_i} − μ_{u|s_j})ᵀ (Σ_{u|s} + Σ^(g)_uu)^{-1} (μ_{u|s_i} − μ_{u|s_j})   (2)
  = (λ/(2σ_s²)) (s_i − s_j)ᵀ (s_i − s_j) + (λ/2) ( Σ_us Σ_ss^{-1} (s_i − s_j) )ᵀ (Σ_{u|s} + Σ^(g)_uu)^{-1} ( Σ_us Σ_ss^{-1} (s_i − s_j) )   (3)
  = (λ/(2σ_s²)) (s_i − s_j)ᵀ (s_i − s_j) + (λ/2) (s_i − s_j)ᵀ Σ_ss^{-1} Σ_su (Σ_{u|s} + Σ^(g)_uu)^{-1} Σ_us Σ_ss^{-1} (s_i − s_j)   (4)

Line (1) substitutes in the normal distributions given by our mechanism and conditional prior class. Line (2) substitutes in the closed-form expression for the Rényi divergence between two mean-shifted normal distributions given in Equation 1. Line (3) substitutes in the expression for μ_{u|s} given above, and simplifies. To expand this simplification in explicit steps:

μ_{u|s_i} − μ_{u|s_j} = ( μ_u + Σ_us Σ_ss^{-1} (s_i − μ_s) ) − ( μ_u + Σ_us Σ_ss^{-1} (s_j − μ_s) ) = Σ_us Σ_ss^{-1} s_i − Σ_us Σ_ss^{-1} s_j = Σ_us Σ_ss^{-1} (s_i − s_j).

Line (4) distributes the transpose in the right term of line (3):

( Σ_us Σ_ss^{-1} (s_i − s_j) )ᵀ = (s_i − s_j)ᵀ (Σ_us Σ_ss^{-1})ᵀ = (s_i − s_j)ᵀ (Σ_ss^{-1})ᵀ Σ_usᵀ = (s_i − s_j)ᵀ Σ_ss^{-1} Σ_su,

where the final step is a consequence of Σ being symmetric.
Σ_ss is also a symmetric matrix (so its inverse is symmetric) and Σ_usᵀ = Σ_su. Returning to line (4) above, we simplify this expression by substituting ∆ = s_i − s_j:

D_λ( P_{A,P}(Z | X_{I_S} = s_i) ‖ P_{A,P}(Z | X_{I_S} = s_j) ) = (λ/(2σ_s²)) ∆ᵀ∆ + (λ/2) ∆ᵀ Σ_ss^{-1} Σ_su (Σ_{u|s} + Σ^(g)_uu)^{-1} Σ_us Σ_ss^{-1} ∆   (5)
  = (λ/(2σ_s²)) ‖∆‖² + (λ/2) ∆ᵀ Σ_eff ∆   (6)

where Σ_eff = Σ_ss^{-1} Σ_su (Σ_{u|s} + Σ^(g)_uu)^{-1} Σ_us Σ_ss^{-1}. The left term of line (6) attributes the direct loss of Z_{I_S} on X_{I_S}, and the right term attributes the inferential loss of Z_{I_U} on X_{I_S}.

We are interested in bounding the expression of line (6) for all (s_i, s_j) ∈ S_pairs. We do this by bounding it for all vectors ∆ in

D = { s_i − s_j : ‖s_i − s_j‖ ≤ √S r },

where S is the number of basic secrets (locations) contained in I_S, which may be a basic or compound secret set. For a basic secret (S = 1), this bound is tight, since D = { s_i − s_j : (s_i, s_j) ∈ S_pairs }: the set of ∆ ∈ D is exactly any two hypotheses (s_i, s_j) that are within any circle of radius r. For a compound secret, this bound is not guaranteed to be tight. Recall once again that the set of S_pairs for a compound secret is given by the set of (s_i, s_j) in

S_pairs = { ({x_s1, x_s2, ...}, {x′_s1, x′_s2, ...}) : ‖x_sk − x′_sk‖ ≤ r, ∀k }.

For concreteness, consider the 2d location trace example in
Figure 3, where we have a compound secret of $S = 3$ locations. Here, $s_i, s_j \in \mathbb{R}^6$, where 6 comes from the fact that we have three 2d locations. So, $(s_i, s_j)$ represents a pair of hypotheses on all three locations. $s_i$'s hypothesis of the first secret location, written as $x_{s_1} \in \mathbb{R}^2$ above, is within $r$ of $s_j$'s hypothesis of the first secret location, written as $x'_{s_1} \in \mathbb{R}^2$ above. The same goes for the second and third locations. So, the $L_2$ norm of $\Delta = s_i - s_j$ is no greater than

$$\sup_{(s_i, s_j) \in \mathcal{S}_{\mathrm{pairs}}} \|s_i - s_j\| = \sup_{(s_i, s_j) \in \mathcal{S}_{\mathrm{pairs}}} \sqrt{\sum_{m=1}^{6} (s_{im} - s_{jm})^2} = \sup_{(s_i, s_j) \in \mathcal{S}_{\mathrm{pairs}}} \sqrt{\sum_{k=1}^{3} \|x_{s_k} - x'_{s_k}\|^2} = \sqrt{\sum_{k=1}^{3} r^2} = \sqrt{3}\, r$$

For compound secrets, $\mathcal{D}$ represents the $L_2$ ball enclosing all $\Delta \in \{s_i - s_j : (s_i, s_j) \in \mathcal{S}_{\mathrm{pairs}}\}$. However, $\mathcal{D}$ also includes some values of $\Delta = s_i - s_j$ not covered by $\mathcal{S}_{\mathrm{pairs}}$. Suppose an adversary considers the hypotheses $s_i = \{x_{s_1}, x_{s_2}, x_{s_3}\}$ and $s_j = \{x'_{s_1}, x'_{s_2}, x'_{s_3}\}$ where $x_{s_1} = 0$, $x'_{s_1} = \sqrt{3}\, r$, $x_{s_2} = x'_{s_2}$, and $x_{s_3} = x'_{s_3}$. Since $x_{s_1}, x'_{s_1}$ are not within $r$ of each other, this pair is not in $\mathcal{S}_{\mathrm{pairs}}$. However, it is covered by $\mathcal{D}$, and thus is covered by our bound on CIP loss and our mechanisms.

With $\mathcal{D}$ defined, we may return to bounding the expression in line (6):

$$D_\lambda\!\left(\frac{P_{A,P}(Z \mid X_{I_S}=s_i)}{P_{A,P}(Z \mid X_{I_S}=s_j)}\right) \le \sup_{\Delta \in \mathcal{D}} \left( \frac{\lambda}{2\sigma_s^2}\|\Delta\|^2 + \frac{\lambda}{2}\Delta^\top \Sigma_{\mathrm{eff}}\,\Delta \right) \quad (7)$$

$$\le \frac{\lambda}{2}\left( \frac{S r^2}{\sigma_s^2} + S r^2\, \mathrm{maxeig}(\Sigma_{\mathrm{eff}}) \right) \quad (8)$$

$$= \frac{\lambda S r^2}{2} \left( \frac{1}{\sigma_s^2} + \alpha^* \right) \quad (9)$$

where line (8) distributes the supremum. For the right term, the supremum is given by the maximum squared magnitude of all $\Delta \in \mathcal{D}$ times the maximum eigenvalue of $\Sigma_{\mathrm{eff}}$, which equals $S r^2\, \mathrm{maxeig}(\Sigma_{\mathrm{eff}})$. Line (9) simply substitutes $\alpha^* = \mathrm{maxeig}(\Sigma_{\mathrm{eff}})$.
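To make the bound in line (9) concrete, the following numpy sketch computes $\Sigma_{\mathrm{eff}}$, $\alpha^*$, and the resulting CIP loss bound from a prior covariance over the trace. The function name, interface, and toy kernel values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cip_loss_bound(Sigma, I_S, Sigma_g_uu, sigma_s2, r, lam):
    """Evaluate the bound of line (9): eps <= (lam*S*r^2/2) * (1/sigma_s^2 + alpha*).

    Sigma      : prior covariance of the full trace (n x n)
    I_S        : indices of the secret locations
    Sigma_g_uu : noise covariance added to the non-secret points
    sigma_s2   : variance of the i.i.d. noise added to each secret point
    """
    n = Sigma.shape[0]
    I_U = [i for i in range(n) if i not in I_S]
    S_ss = Sigma[np.ix_(I_S, I_S)]
    S_us = Sigma[np.ix_(I_U, I_S)]
    S_uu = Sigma[np.ix_(I_U, I_U)]
    A = S_us @ np.linalg.inv(S_ss)
    # Conditional covariance of the unknown points given the secrets
    S_u_given_s = S_uu - A @ S_us.T
    # Sigma_eff = Sigma_ss^{-1} Sigma_su (Sigma_{u|s} + Sigma^{(g)}_{uu})^{-1} Sigma_us Sigma_ss^{-1}
    Sigma_eff = A.T @ np.linalg.inv(S_u_given_s + Sigma_g_uu) @ A
    alpha_star = np.linalg.eigvalsh(Sigma_eff).max()
    S = len(I_S)  # number of basic secrets
    return lam * S * r ** 2 / 2 * (1.0 / sigma_s2 + alpha_star)

# Toy example: RBF prior over a 10-point trace, one secret index.
t = np.arange(10.0)
Sigma = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 3.0 ** 2))
eps = cip_loss_bound(Sigma, [4], 0.5 * np.eye(9), sigma_s2=0.5, r=1.0, lam=2.0)
```

Increasing either noise parameter shrinks both the direct term $1/\sigma_s^2$ and $\alpha^*$, so the bound decreases monotonically in the noise.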
Graceful Composition in Time
Suppose a user releases two traces $X$ and $\hat{X}$ with additive noise $G \sim \mathcal{N}(0, \Sigma^{(g)})$ and $\hat{G} \sim \mathcal{N}(0, \hat{\Sigma}^{(g)})$, respectively. Then any basic or compound secret $X_{I_S}$ of $X$ enjoys $(\bar{\varepsilon}, \lambda)$-CIP, where

$$\bar{\varepsilon} \le \frac{\lambda S r^2}{2}\left( \frac{1}{\sigma_s^2} + \bar{\alpha}^* \right)$$

and where $\bar{\alpha}^*$ is the maximum eigenvalue of $\bar{\Sigma}_{\mathrm{eff}} = \big(\Sigma_{us}\Sigma_{ss}^{-1}\big)^\top \big(\Sigma_{u|s} + \bar{\Sigma}^{(g)}_{uu}\big)^{-1} \big(\Sigma_{us}\Sigma_{ss}^{-1}\big)$. Here $\Sigma$ is the covariance matrix of the joint distribution on $X, \hat{X}$ and

$$\bar{\Sigma}^{(g)} = \begin{bmatrix} \Sigma^{(g)} & 0 \\ 0 & \hat{\Sigma}^{(g)} \end{bmatrix}$$

Proof.
Here, we record two traces (presumably) far apart in time, $(X_1, \dots, X_n)$ and $(\hat{X}_1, \dots, \hat{X}_m)$, and release $(Z_1, \dots, Z_n) = (X_1 + G_1, \dots, X_n + G_n)$ and $(\hat{Z}_1, \dots, \hat{Z}_m) = (\hat{X}_1 + \hat{G}_1, \dots, \hat{X}_m + \hat{G}_m)$. The first trace protects secret locations $X_{I_S}$ and the second protects $\hat{X}_{I_S}$, so we have that

$$D_\lambda\!\left(\frac{P_{A,P}(Z \mid X_{I_S}=s_i)}{P_{A,P}(Z \mid X_{I_S}=s_j)}\right) \le \varepsilon \qquad D_\lambda\!\left(\frac{P_{A,P}(\hat{Z} \mid \hat{X}_{I_S}=\hat{s}_i)}{P_{A,P}(\hat{Z} \mid \hat{X}_{I_S}=\hat{s}_j)}\right) \le \hat{\varepsilon}$$

We aim to update the losses:

$$D_\lambda\!\left(\frac{P_{A,P}(Z, \hat{Z} \mid X_{I_S}=s_i)}{P_{A,P}(Z, \hat{Z} \mid X_{I_S}=s_j)}\right) \le \varepsilon' \qquad D_\lambda\!\left(\frac{P_{A,P}(\hat{Z}, Z \mid \hat{X}_{I_S}=\hat{s}_i)}{P_{A,P}(\hat{Z}, Z \mid \hat{X}_{I_S}=\hat{s}_j)}\right) \le \hat{\varepsilon}'$$

Fortunately, our framework accommodates this directly: we need only update the 'inferential loss terms' $\alpha^*$ and $\hat{\alpha}^*$, the maximum eigenvalues used to compute $\varepsilon$ and $\hat{\varepsilon}$, respectively. Let's focus on $\varepsilon'$, since the same analysis follows for $\hat{\varepsilon}'$.

Recall that $\alpha^*$ is given by the maximum eigenvalue of

$$\Sigma_{\mathrm{eff}} = \big(\Sigma_{us}\Sigma_{ss}^{-1}\big)^\top \big(\Sigma_{u|s} + \Sigma^{(g)}_{uu}\big)^{-1} \big(\Sigma_{us}\Sigma_{ss}^{-1}\big)$$

where $\Sigma$ is the covariance matrix of $X_1, \dots, X_n$ and $\Sigma^{(g)}$ is the noise covariance matrix added. Simply augment $\Sigma$ to become the joint covariance matrix $\Sigma_J$ of $X, \hat{X}$, and augment $\Sigma^{(g)}$ to become

$$\Sigma^{(g)}_J = \begin{bmatrix} \Sigma^{(g)} & 0 \\ 0 & \hat{\Sigma}^{(g)} \end{bmatrix}$$

then update $\Sigma_{\mathrm{eff}}$ to $\Sigma_{\mathrm{eff},J}$, which uses both $\Sigma_J$ and $\Sigma^{(g)}_J$. Using the corresponding maximum eigenvalue $\alpha^*_J$ in the loss expression of Theorem 3.2 gives us $\varepsilon'$.

Note that for kernels like RBF, $\varepsilon' \to \varepsilon$ as the traces $X$ and $\hat{X}$ move further and further apart in time. This is not the case for traces using a purely periodic kernel with no time decay, where we should expect much worse composition.

In many cases, the different dimensions of the trace may be probabilistically independent, and it may be more convenient to design separate privacy mechanisms for each. For a 2d trace $X$, suppose $I_x$ and $I_y$ store the indices of the latitude points $X_{I_x}$ and longitude points $X_{I_y}$, such that $X = X_{I_x} \cup X_{I_y}$. If latitude and longitude are independent, it may be more convenient to characterize the conditional priors of $X_{I_x}$ and $X_{I_y}$ separately. The question is whether privacy guarantees remain for the full trace $X$. To answer this, we provide the following corollary:

Corollary 7.2.1.
CIP loss of independent dimensions
Let $\Theta$ be a GP conditional prior class on a 2d trace $X$ such that the dimensions are independent. Let $I_S$ be some secret set of time indices corresponding to some basic or compound secret. For the trace $X = X_{I_x} \cup X_{I_y}$, the Gaussian mechanism $A(X) = Z_{I_x} \cup Z_{I_y}$, where $Z_{I_x} = A_x(X_{I_x}) = X_{I_x} + G_{I_x}$ and $Z_{I_y} = A_y(X_{I_y}) = X_{I_y} + G_{I_y}$, satisfies $(\varepsilon, \lambda)$-CIP where

$$\varepsilon \le \frac{\lambda S r^2}{2}\left( \frac{1}{\sigma_s^2} + \alpha^*_x + \alpha^*_y \right)$$

when $A_x$ and $A_y$ provide $\frac{\lambda S r^2}{2}\big( \frac{1}{\sigma_s^2} + \alpha^*_x \big)$ and $\frac{\lambda S r^2}{2}\big( \frac{1}{\sigma_s^2} + \alpha^*_y \big)$ to $I_S \cap I_x$ and $I_S \cap I_y$, respectively.

The gist of this corollary is that a mechanism can be designed to achieve the bound of Theorem 3.4 for each dimension independently, and the combined release still carries a meaningful privacy guarantee. The reason is that the per-dimension guarantees still cover all secret pairs in $\mathcal{S}_{\mathrm{pairs}}$.

Proof.
By independence, $X_{I_x}$ and $X_{I_y}$ can be treated as two unconnected traces of the type seen in Figure 1. As such, the privacy guarantee of Theorem 3.4 can be upheld for each. The question is whether bounding CIP loss for the one-dimensional basic or compound secret associated with secret sets $I_S \cap I_x$ and $I_S \cap I_y$ still provides guarantees for the full secret set $I_S$.

Without loss of generality, we demonstrate for one basic and one compound secret. Consider a basic secret set $I_S$ containing a single 2d location, where $I_S \cap I_x$ holds its latitude index and $I_S \cap I_y$ its longitude index. We again assume that independent Gaussian noise of variance $\sigma_s^2$ is added to all of $X_{I_S}$, since this is optimal under utility constraints. We have now bounded the Rényi divergence when conditioning on pairs of hypotheses on latitude and longitude separately:

$$\mathcal{S}_{\mathrm{pairs},x} = \mathcal{S}_{\mathrm{pairs},y} = \big\{\, (x_s, x'_s) : x_s \in \mathbb{R},\ \|x_s - x'_s\| \le r \,\big\}$$

By independence, this also bounds the Rényi divergence conditioning on pairs of hypotheses on latitude and longitude jointly:

$$\mathcal{S}_{\mathrm{pairs},xy} = \big\{\, (x_s, x'_s) : x_s \in \mathbb{R}^2,\ |x_{s,1} - x'_{s,1}| \le r,\ |x_{s,2} - x'_{s,2}| \le r \,\big\}$$

In effect, we have guaranteed privacy for any pair of hypotheses $(s_i, s_j)$ in the square circumscribing the circle of radius $r$ that we wish to provide. The analysis of the direct privacy loss is exactly the same as in the more general case. Since the Rényi divergences of $X_{I_U} \cap X_{I_x}$ and of $X_{I_U} \cap X_{I_y}$ add, the $\alpha^*$'s add.

The same goes for a compound secret. Consider a three-location compound secret with pairs given by

$$\mathcal{S}_{\mathrm{pairs},xy} = \Big\{ \big(\{x_{s_1}, x_{s_2}, \dots\},\ \{x'_{s_1}, x'_{s_2}, \dots\}\big) : x_{s_i} \in \mathbb{R}^2,\ \|x_{s_k} - x'_{s_k}\| \le r,\ \forall k \Big\}$$

Instead, we bound the privacy loss for

$$\mathcal{S}_{\mathrm{pairs},x} = \mathcal{S}_{\mathrm{pairs},y} = \Big\{ \big(\{x_{s_1}, x_{s_2}, \dots\},\ \{x'_{s_1}, x'_{s_2}, \dots\}\big) : x_{s_i} \in \mathbb{R},\ \|x_{s_k} - x'_{s_k}\| \le r,\ \forall k \Big\}$$

separately, giving us $\alpha^*_x$ and $\alpha^*_y$. This again includes any two hypotheses on the three locations such that each pair $x_{s_k}, x'_{s_k}$ is within a square circumscribing a circle of radius $r$. We achieve this by bounding privacy loss for all $\Delta_x$ in a 3d $L_2$ ball of radius $\sqrt{S}\, r$, and likewise for $\Delta_y$.

This corollary can be extended to traces of any dimension whose dimensions are probabilistically independent. We make use of the above proof in the Experiments section.

In this section, we derive the three SDP-based algorithms of Section 4 and their properties.

A. SDP A

SDP A minimizes the privacy loss bound of Theorem 3.4 for any compound or basic secret encoded by secret set $I_S$. As is clarified in its proof (Appendix 7.2.4), the bound is tight when $I_S$ encodes a basic secret. If $I_S$ encodes a compound secret, the tightness depends on the conditional prior class $\Theta$.

Our variable for minimizing this bound is the noise covariance matrix $\Sigma^{(g)}$. Due to the conditional independence exhibited by Lemma 3.2, $G_{I_S}$ and $G_{I_U}$ may be independent. The additive noise components $G_i \in G_{I_S}$ are all independent Gaussian with variance $\sigma_s^2$. This is because, conditioning on $\{X_{I_S} = x_s\}$, $Z_{I_S}$ is independent of $X_{I_U}$ and $Z_{I_U}$. So, $G_{I_S} \sim \mathcal{N}(0, \sigma_s^2 I)$ and $\Sigma^{(g)}_{ss} = \sigma_s^2 I$. The additive noise components $G_i \in G_{I_U}$ are dependent as described by $\Sigma^{(g)}_{uu}$, with $G_{I_U} \sim \mathcal{N}(0, \Sigma^{(g)}_{uu})$. Consequently, $\Sigma^{(g)}$ is completely characterized by $\Sigma^{(g)}_{uu}$ and $\sigma_s^2$.

To see how the bound of Theorem 3.4 can be redrafted as an SDP, first notice that its two terms may be written as the maximum eigenvalue of a matrix product.
Here, $\Sigma_{\mathrm{eff}} = A^\top B A$, where $A = \Sigma_{us}\Sigma_{ss}^{-1}$ and $B = \big(\Sigma_{u|s} + \Sigma^{(g)}_{uu}\big)^{-1}$:

$$\frac{1}{\sigma_s^2} + \alpha^* = \mathrm{maxeig}\Big( \frac{1}{\sigma_s^2} I + A^\top B A \Big) = \mathrm{maxeig}\left( \begin{bmatrix} I & A^\top \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_s^2} I & 0 \\ 0 & B \end{bmatrix} \begin{bmatrix} I \\ A \end{bmatrix} \right) = \mathrm{maxeig}\big( \tilde{A}^\top \tilde{B} \tilde{A} \big)$$

This expression uses all parameters of $\Sigma^{(g)}$: $\sigma_s^2$ parametrizes $\Sigma^{(g)}_{ss}$, and $\Sigma^{(g)}_{uu} = B^{-1} - \Sigma_{u|s}$, where $\Sigma_{u|s}$ is given by the kernel function of $\Theta$.

Before casting this as an SDP, we provide a formal definition from Vandenberghe & Boyd (1996):

Definition 7.1.
Semidefinite Program
The problem of minimizing a linear function of a variable $x \in \mathbb{R}^n$ subject to a matrix inequality,

$$\min_{x \in \mathbb{R}^n} c^\top x \quad \text{s.t.} \quad F_0 + \sum_{i=1}^n x_i F_i \succeq 0, \qquad Ax = b$$

where the $F_i \in \mathbb{R}^{n \times n}$ are all symmetric and $A \in \mathbb{R}^{p \times n}$, is a semidefinite program, or SDP.

The task of minimizing $\mathrm{maxeig}\big(\tilde{A}^\top \tilde{B} \tilde{A}\big)$ under MSE constraints can almost be formulated as an SDP:

$$\min_{B \succeq 0,\ 1/\sigma_s^2 \ge 0} \beta^* \quad \text{s.t.} \quad \beta^* I \succeq \tilde{A}^\top \tilde{B} \tilde{A}, \qquad B \preceq \Sigma_{u|s}^{-1}, \qquad \mathrm{tr}\big(\Sigma^{(g)}_{uu}\big) + |I_S|\, \sigma_s^2 \le n\, o_t$$

Here, the first constraint guarantees that the maximum eigenvalue of $\tilde{A}^\top \tilde{B} \tilde{A}$ is bounded by $\beta^*$, which the objective minimizes. At program completion, we set $\Sigma^{(g)}_{uu} = B^{-1} - \Sigma_{u|s}$, and the second constraint ensures that this is still PSD. The final constraint bounds the MSE of the mechanism $\Sigma^{(g)}$. Note that $\mathrm{tr}\big(\Sigma^{(g)}_{uu}\big) + |I_S|\, \sigma_s^2 = \mathrm{tr}\big(\Sigma^{(g)}\big)$. The trouble lies in the last constraint: our program variable is $B$, but the final linear constraint requires $\Sigma^{(g)}$, which is expressed using the inverse of $B$. This is not immediately available in the SDP framework.

To make the final linear constraint available, we invert the above program using the observation that the maximum eigenvalue of $\tilde{A}^\top \tilde{B} \tilde{A}$ is the inverse of the minimum eigenvalue of $\big(\tilde{A}^\top \tilde{B} \tilde{A}\big)^{-1}$. Instead of optimizing over $B$ and $1/\sigma_s^2$, we optimize over $B^{-1}$ and $\sigma_s^2$. Since $B^{-1} = \Sigma_{u|s} + \Sigma^{(g)}_{uu}$, we may now place a utility constraint directly on the trace of $\Sigma^{(g)}$. To make $B^{-1}$ our program variable, we approximate $\big(\tilde{A}^\top \tilde{B} \tilde{A}\big)^{-1}$ with $\tilde{A}^- \tilde{B}^{-1} \tilde{A}^{-\top}$. First note that $\tilde{A} \in \mathbb{R}^{n \times |I_S|}$ has full column rank for the covariances we work with. So, $\tilde{A}^- = \big(\tilde{A}^\top \tilde{A}\big)^{-1} \tilde{A}^\top \in \mathbb{R}^{|I_S| \times n}$ is the left inverse of $\tilde{A}$, satisfying $\tilde{A}^- \tilde{A} = I$ (we denote its transpose as $\tilde{A}^{-\top}$).
It is also the least-squares solution to $\tilde{A}\tilde{A}^- = I$ (equivalently, $\tilde{A}^{-\top}\tilde{A}^\top = I$). Thus, we have an approximation of the inverse $\big(\tilde{A}^\top \tilde{B} \tilde{A}\big)^{-1}$:

$$\big(\tilde{A}^\top \tilde{B} \tilde{A}\big)\big(\tilde{A}^- \tilde{B}^{-1} \tilde{A}^{-\top}\big) \approx \tilde{A}^\top \tilde{B} \tilde{B}^{-1} \tilde{A}^{-\top} = \tilde{A}^\top \tilde{A}^{-\top} \approx I$$

We can now optimize in terms of $B^{-1}$ with the augmented matrix

$$\tilde{B}^{-1} = \begin{bmatrix} \sigma_s^2 I & 0 \\ 0 & B^{-1} \end{bmatrix}$$

We then optimize the following SDP:

$$\max_{B^{-1} \succeq 0,\ \sigma_s^2 \ge 0} \beta^* \quad \text{s.t.} \quad \beta^* I \preceq \tilde{A}^- \tilde{B}^{-1} \tilde{A}^{-\top}, \qquad B^{-1} \succeq \Sigma_{u|s}, \qquad \mathrm{tr}\big(\tilde{B}^{-1}\big) - \mathrm{tr}\big(\Sigma_{u|s}\big) \le n\, o_t$$

Upon program completion we recover $\sigma_s^2$ and $\Sigma^{(g)}_{uu} = B^{-1} - \Sigma_{u|s}$, which we know is PSD due to the second constraint. The first constraint guarantees that the minimum eigenvalue of the approximated inverse is at least $\beta^*$, which the objective maximizes. If the minimum eigenvalue of the approximate inverse is close to that of the true inverse, then we successfully minimize the maximum eigenvalue of $\tilde{A}^\top \tilde{B} \tilde{A}$, and thus minimize the direct and indirect privacy loss. The third constraint limits the MSE of $\Sigma^{(g)}$, since $\mathrm{tr}\big(\tilde{B}^{-1}\big) - \mathrm{tr}\big(\Sigma_{u|s}\big) = \big(\mathrm{tr}(\Sigma^{(g)}_{uu}) + |I_S|\,\sigma_s^2 + \mathrm{tr}(\Sigma_{u|s})\big) - \mathrm{tr}\big(\Sigma_{u|s}\big) = \mathrm{tr}\big(\Sigma^{(g)}\big)$. By inverting $\tilde{A}^\top \tilde{B} \tilde{A}$, this constraint becomes available in the SDP framework.

By expressing the above program in terms of the variable $\Sigma^{(g)}$ instead of indirectly via $B^{-1}$ and $\sigma_s^2$, we get SDP A:

$$\text{SDP A:} \quad \arg\max_{\Sigma^{(g)} \succeq 0} \beta^* \quad \text{s.t.} \quad \tilde{A}^- \tilde{B}^{-1} \tilde{A}^{-\top} \succeq \beta^* I, \qquad \mathrm{tr}\big(\Sigma^{(g)}\big) \le n\, o_t$$

It is straightforward to write this SDP in the form seen in Definition 7.1. The program variables $x$ would be the diagonal and upper (or lower) triangular part of $\Sigma^{(g)}$ along with $\beta^*$. With some linear algebra, the first constraint can be written in the form $F_0 + \sum_{i=1}^n x_i F_i \succeq 0$, and the second constraint can be written as $Ax = b$.
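As a sanity check on the inversion step, the following numpy sketch builds $\tilde{A}$ and its left inverse $\tilde{A}^-$, then compares $\mathrm{maxeig}\big(\tilde{A}^\top\tilde{B}\tilde{A}\big)$ against the reciprocal of $\mathrm{mineig}\big(\tilde{A}^-\tilde{B}^{-1}\tilde{A}^{-\top}\big)$. The kernel, secret indices, and noise values are arbitrary assumptions chosen for illustration.

```python
import numpy as np

# Toy setup: RBF prior on an 8-point trace with two secret indices (assumed values).
n, I_S = 8, [2, 5]
I_U = [i for i in range(n) if i not in I_S]
t = np.arange(float(n))
Sigma = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 2.0 ** 2))
S_ss = Sigma[np.ix_(I_S, I_S)]
S_us = Sigma[np.ix_(I_U, I_S)]
S_uu = Sigma[np.ix_(I_U, I_U)]

A = S_us @ np.linalg.inv(S_ss)                  # |I_U| x |I_S|
S_u_s = S_uu - A @ S_us.T                       # Sigma_{u|s}
sigma_s2 = 0.4                                  # example secret-noise variance
Sg_uu = 0.4 * np.eye(len(I_U))                  # example noise on I_U
B = np.linalg.inv(S_u_s + Sg_uu)                # B = (Sigma_{u|s} + Sigma^{(g)}_{uu})^{-1}

k = len(I_S)
A_tilde = np.vstack([np.eye(k), A])             # n x |I_S|, full column rank
A_left = np.linalg.solve(A_tilde.T @ A_tilde, A_tilde.T)  # left inverse of A_tilde

B_tilde = np.block([[np.eye(k) / sigma_s2, np.zeros((k, n - k))],
                    [np.zeros((n - k, k)), B]])
B_tilde_inv = np.block([[sigma_s2 * np.eye(k), np.zeros((k, n - k))],
                        [np.zeros((n - k, k)), np.linalg.inv(B)]])

exact = np.linalg.eigvalsh(A_tilde.T @ B_tilde @ A_tilde).max()
approx = 1.0 / np.linalg.eigvalsh(A_left @ B_tilde_inv @ A_left.T).min()
print(exact, approx)  # the two agree to the extent the left-inverse approximation holds
```

The identity $\tilde{A}^-\tilde{A} = I$ holds exactly, while $\tilde{A}\tilde{A}^- \approx I$ only in the least-squares sense, which is where the approximation error enters.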
With the use of contemporary convex programming tools like CVXOPT (Vandenberghe, 2010), rewriting into this form is unnecessary.

B. SDP B

SDP B takes a set of covariance matrices $F = \{\Sigma_1, \dots, \Sigma_k\}$, each of which is designed to protect some secret set $I_{S_i}$, and returns a covariance matrix $\Sigma^{(g)}$ that preserves the privacy loss bound of each $\Sigma_i$ for its $I_{S_i}$. It does so while minimizing the utility loss of $\Sigma^{(g)}$. This algorithm is also expressed as an SDP. It is based on the following corollary, which we have omitted from the main text:

Corollary 7.2.2.
More PSD, More Private:
For a basic or compound secret denoted by indices $I_S$, the CIP loss bound of Equation 5 provided by a Gaussian noise mechanism with covariance $\Sigma^{(g)}$ is lower than it would be for any $\Sigma^{(g)\prime} \prec \Sigma^{(g)}$.

Proof. First note that if $\Sigma^{(g)} \succ \Sigma^{(g)\prime}$, then the same is true for its sub-matrices:

$$\Sigma^{(g)}_{ss} \succ \Sigma^{(g)\prime}_{ss} \qquad \Sigma^{(g)}_{uu} \succ \Sigma^{(g)\prime}_{uu}$$

Recall the privacy loss bound of Equation 5:

$$\varepsilon \le \frac{\lambda S r^2}{2}\left( \frac{1}{\sigma_s^2} + \alpha^* \right)$$

Also recall that $\Sigma^{(g)}_{ss} = \sigma_s^2 I$ and $\Sigma^{(g)\prime}_{ss} = \sigma_s^{\prime 2} I$. Since $\Sigma^{(g)}_{ss} \succ \Sigma^{(g)\prime}_{ss}$, we already know that $\sigma_s^2 > \sigma_s^{\prime 2}$, and thus the first term of Equation 5 is lower for $\Sigma^{(g)}$.

It remains to show that the second term is also lower, $\alpha^* < \alpha^{*\prime}$. Starting with what we're given:

$$\Sigma^{(g)}_{uu} \succ \Sigma^{(g)\prime}_{uu}$$
$$\Sigma^{(g)}_{uu} + \Sigma_{u|s} \succ \Sigma^{(g)\prime}_{uu} + \Sigma_{u|s}$$
$$\big(\Sigma^{(g)}_{uu} + \Sigma_{u|s}\big)^{-1} \prec \big(\Sigma^{(g)\prime}_{uu} + \Sigma_{u|s}\big)^{-1}$$
$$B \prec B'$$
$$A^\top B A \prec A^\top B' A$$
$$\mathrm{maxeig}\big(A^\top B A\big) < \mathrm{maxeig}\big(A^\top B' A\big)$$
$$\alpha^* < \alpha^{*\prime}$$

Therefore $\frac{1}{\sigma_s^2} + \alpha^* < \frac{1}{\sigma_s^{\prime 2}} + \alpha^{*\prime}$, and the CIP bound of Equation 5 is lower for $\Sigma^{(g)}$ than it is for $\Sigma^{(g)\prime}$.

With Corollary 7.2.2 in mind, SDP B is natural:

$$\text{SDP B:} \quad \arg\min_{\Sigma^{(g)}} \mathrm{tr}\big(\Sigma^{(g)}\big) \quad \text{s.t.} \quad \Sigma^{(g)} \succeq \Sigma^{(g)}_i, \ \forall\, \Sigma^{(g)}_i \in F$$

SDP B minimizes, but does not constrain, the utility loss of the chosen $\Sigma^{(g)}$. To provide an upper bound on the resulting utility loss, we provided the following claim in the main text:

Claim
Utility loss of SDP B: The utility loss of $\Sigma^{(g)} = \text{SDP B}(F)$ is no greater than $\sum_{\Sigma_i \in F} \mathrm{tr}(\Sigma_i)$.

Proof. The covariance $\Sigma^{(g)\prime} = \sum_{\Sigma^{(g)}_i \in F} \Sigma^{(g)}_i$, with MSE $\sum_{\Sigma^{(g)}_i \in F} \mathrm{tr}\big(\Sigma^{(g)}_i\big)$, is in the feasible set of the SDP B problem, since $\Sigma^{(g)\prime} \succeq \Sigma^{(g)}_i,\ \forall\, \Sigma^{(g)}_i \in F$. Unless $\Sigma^{(g)\prime}$ has the lowest MSE of all $\Sigma^{(g)}$ in the feasible set, a covariance matrix with better utility will be chosen.

Multiple Secrets combines SDP A and SDP B to minimize the privacy loss to each basic secret within a trace. The basic mechanism is useful in cases where the inference at each time within the trace (each basic secret) is sensitive. Let $I_{S_i}$ be the secret set representing basic secret $i$, of which there are $N$ (e.g. if location is sampled at $N$ times). Then $I_{S_b} = \{I_{S_1}, \dots, I_{S_N}\}$ contains the indices corresponding to each. Multiple Secrets works by first producing $N$ covariance matrices, $\Sigma^{(g)}_i = \text{SDP A}(I_{S_i}, \Sigma, o_t)$, one per basic secret. It then uses $\text{SDP B}(F = \{\Sigma^{(g)}_1, \dots, \Sigma^{(g)}_N\})$ to produce a single covariance matrix $\Sigma^{(g)}$ that preserves the privacy loss bound for each basic secret (note that, being basic secrets, the privacy loss bound that SDP A optimizes is tight).

By virtue of using SDP B, the MSE of the resultant $\Sigma^{(g)}$ is minimized but not constrained. To bound the MSE of the Basic Mechanism by $O$, we may simply bound the MSE of each $\Sigma^{(g)}_i$ by $o_t = O/N$. Then, by the above Claim, the MSE of the solution cannot be greater than $O$. In practice, this bound may be too loose. We hope to tighten it in future work.

We use a 2d location trace dataset and a 1d home temperature dataset. For the location data, having observed that the correlation between latitude and longitude is low, we treat each dimension as independent.
By way of Corollary 7.2.1, this allows us to bound privacy loss and design mechanisms for each dimension separately. Furthermore, having observed that each dimension fits nearly the same conditional prior, we treat our dataset of 10k 2-dimensional traces as a dataset of 20k 1-dimensional traces, where each trace represents one dimension of a 2d location trajectory.

The one-dimensional traces of temperature and location are indexed by timestamps, for which we use the following kernel functions to determine the covariance between two points sampled at times $t_i$ and $t_j$:

$$k_{\mathrm{RBF}}(t_i, t_j) = \sigma_x^2 \exp\!\Big( -\frac{(t_i - t_j)^2}{2 l^2} \Big) \qquad k_{\mathrm{PER}}(t_i, t_j) = \sigma_x^2 \exp\!\Big( -\frac{2 \sin^2(\pi |t_i - t_j| / p)}{l^2} \Big) \quad (6)$$

The parameters include the variance $\sigma_x^2$ and the length scale $l$. The length scale determines the window of time in which two sampled points are highly correlated.

Preprocessing of location data
We first limit the dataset to traces of under 50 locations that are between 4.5 and 5.5 minutes in duration. Caring only about the conditional dependence between locations, we then de-mean each trace and normalize its variance to one. Normalizing the variance of traces implicitly sets $\sigma_x = 1$ in the above RBF kernel, in essence assuming that the adversary has a decent prior for the user's average speed in a given trace and could perform the same operation.

Fitting of location data
We then find the maximum likelihood RBF kernel for each distinct trace. Having fixed the variance $\sigma_x$, this amounts to fitting only the length scale for each dimension, $l_x$ and $l_y$, individually. The length scale represents the average window of time during which neighboring locations are highly correlated. Relatively smooth traces will have large length scales and chaotic traces will have small length scales. However, the fact that sampling rates vary significantly between traces means that traces with equal length scales can have very different degrees of correlation. To encapsulate both of these effects, we study the empirical distribution of the effective length scale of each trace,

$$l_{\mathrm{eff},x} = \frac{l_x}{P} \qquad l_{\mathrm{eff},y} = \frac{l_y}{P}$$

where $P$ is the trace's sampling period and $l_x, l_y$ are its optimal length scales. $l_{\mathrm{eff},x}$ and $l_{\mathrm{eff},y}$ tell us the average number of neighboring locations that are highly correlated, rather than a time period. For instance, a trace with $l_{\mathrm{eff},x} = 8$ tells us that roughly every eight neighboring location samples in the $x$ dimension are highly correlated. The empirical distribution of effective length scales across all traces describes, over a range of logging devices (sampling rates), users, and movement patterns, how many neighboring points are highly correlated in location trace data. After this preprocessing, we are able to use kernels that take indices (not time) as arguments:

$$k_{\mathrm{RBF}}(i, j) = \exp\!\Big( -\frac{(i - j)^2}{2\, l_{\mathrm{eff}}^2} \Big) \qquad k_{\mathrm{PER}}(i, j) = \exp\!\Big( -\frac{2 \sin^2(\pi |i - j| / p)}{l_{\mathrm{eff}}^2} \Big)$$

In each plot we then observe a spectrum of conditional priors by sweeping the effective length scale and plotting posterior uncertainty for various noise mechanisms of equal utility loss. This ranges from a prior assuming nearly independent location samples (a chaotic trace) on the left up to highly dependent location samples (traveling in a straight line or standing still) on the right.
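The conversion from a fitted length scale to an effective length scale, and the resulting index-based kernel, can be sketched as follows. The particular values of $l_x$ and $P$ are hypothetical, chosen only to illustrate the conversion.

```python
import numpy as np

def rbf_index_kernel(n, l_eff):
    """RBF kernel over sample indices: k(i, j) = exp(-(i - j)^2 / (2 * l_eff^2))."""
    idx = np.arange(n)
    d2 = (idx[:, None] - idx[None, :]) ** 2
    return np.exp(-d2 / (2 * l_eff ** 2))

# Hypothetical trace: length scale fitted in seconds, sampled every P seconds.
l_x, P = 12.0, 1.5
l_eff_x = l_x / P            # roughly how many neighboring samples are highly correlated
K = rbf_index_kernel(50, l_eff_x)
```

Because correlation now decays with index distance at a rate set by $l_{\mathrm{eff}}$ alone, traces recorded at different sampling rates become directly comparable.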
To understand how realistic these conditional prior parameters are, we display the middle 50% of the empirical distribution of $l_{\mathrm{eff}}$ ($x$ and $y$ together) from the GeoLife dataset. Note that the distributions of $l_{\mathrm{eff},x}$ and $l_{\mathrm{eff},y}$ are nearly identical.

To compute posterior uncertainty, we consider a 50-point one-dimensional location trace. The basic secret is a single index in the middle of the trace, and the compound secret consists of two neighboring indices, also in the middle of the trace. For each value of $l_{\mathrm{eff}}$, we compute the $50 \times 50$ conditional prior covariance matrix $\Sigma$ using the RBF kernel above. We then compare the posterior uncertainty when $\Sigma^{(g)}$ is an Approach C baseline, or an optimized covariance matrix produced by one of the three algorithms. We re-optimize $\Sigma^{(g)}$ for each $l_{\mathrm{eff}}$, since each $l_{\mathrm{eff}}$ represents a different conditional prior class. The MSE is fixed in all figures except the two exhibiting "All Basic Secrets", where SDP B is used. Recall that this algorithm minimizes utility loss while maintaining a series of privacy guarantees. Here, the MSE is identical across mechanisms for each $l_{\mathrm{eff}}$, but changes from one $l_{\mathrm{eff}}$ to another.

For the temperature data, our preprocessing steps were nearly identical, except that we use the periodic kernel instead of the RBF kernel, and we did not need to remove any traces from the dataset, as the data was much cleaner.

Computation of Posterior Uncertainty Interval
Each of the plots in