Identification and Formal Privacy Guarantees
Tatiana Komarova a and Denis Nekipelov b ∗ This version: June 2020
Abstract
Empirical economic research crucially relies on highly sensitive individual datasets. At the same time, the increasing availability of public individual-level data that comes from social networks, public government records and directories makes it possible for adversaries to potentially de-identify anonymized records in sensitive research datasets. This increasing disclosure risk has incentivised large data curators, most notably the US Census Bureau and several large companies including Apple, Facebook and Microsoft, to look for algorithmic solutions that provide formal non-disclosure guarantees for their secure data. The most commonly accepted formal data security concept in the Computer Science community is referred to as differential privacy. Differential privacy restricts the interaction of the researcher with the data by allowing her to issue queries that evaluate functions of the data. The differential privacy mechanism then replaces the actual outcome of the query with a randomised outcome, with the amount of randomness determined by the sensitivity of the outcome to individual observations in the data.

While differential privacy does provide formal data security guarantees, its impact on the identification of empirical economic models, as well as on the performance of estimators in nonlinear empirical econometric models, has not been sufficiently studied. Since privacy protection mechanisms are inherently finite-sample procedures, we define the notion of identifiability of the parameter of interest as a property of the limit of experiments. It is naturally characterized by concepts from the theory of random sets and is linked to the asymptotic behavior in measure of differentially private estimators.

We demonstrate that particular instances of regression discontinuity design and average treatment effect estimation may be problematic for inference with differential privacy. Under differential privacy their estimators can only be ensured to converge weakly, with their asymptotic limit remaining random, and thus may not be estimated consistently. This result is clearly supported by our simulation evidence. Our analysis suggests that many other estimators that rely on nuisance parameters may have similar properties under the requirement of differential privacy.
JEL Classification:
C35, C14, C25, C13.
Keywords:
Differential privacy; average treatment effect; regression discontinuity; random sets; identification

∗ a Department of Economics, London School of Economics and Political Science and b Department of Economics and Computer Science, University of Virginia. Support from STICERD and the NSF is gratefully acknowledged.

A large portion of empirical work in Social Sciences, and most notably in Economics, relies on highly sensitive data. Sensitivity of the data may be well understood from the personal experiential perspective and is largely associated with risks of potential exposure of individuals in the data, when the data may reveal their personal or financial information which consequently would make them vulnerable to adversaries or embarrass them. At the same time, there has been a long struggle in attempts to formalize the concept of sensitivity of the data and the related concept of a "privacy guarantee" that would measure how and to what extent sensitive attributes of the data are protected.

The most significant progress in the efforts to formalize privacy protection has been made in the Computer Science literature. The mainstream approach there considers any attribute of the data as "sensitive." At the same time, the security risks are considered in a worst-case scenario setting where public components of the protected dataset or its summaries are accessed by an adversary. The adversary is assumed to have an arbitrary amount of auxiliary information that he or she can use to expose individuals in the data.

The concept of differential privacy, first introduced in [14], provides arguably the most accepted formal definition of the security of the data in the Computer Science literature. Differential privacy is built on the idea of a data analyst communicating with a secure dataset by issuing queries which are then evaluated by the privacy protection mechanism.
The privacy protection mechanism alters the actual query outcome using independent random noise. The privacy guarantee is measured by the maximum possible change in the distribution of the randomized query if any one data entry is deleted or altered. In other words, differential privacy ensures that no single individual/observation in the data can have a significant impact on the distribution of the randomized query. Differential privacy gives a broad set of guarantees of data protection from adversarial attacks.

Given the universal theoretical appeal of differential privacy, its adoption as a security standard has been considered by a variety of private enterprises and government bodies. For instance, Facebook recently announced ([3]) that it will provide public access to the data on shared links within the social network via a differentially private protocol. In 2019 the US Census Bureau announced ([1]) that it will use differential privacy as the baseline for privacy protection in the 2020 Census. The transition to the differential privacy standards has caused an outcry in the Social Science community and resulted in an open letter signed by a large number of academic researchers ([2]).

In the evaluation of privacy technologies, the Computer Science literature focuses on the "privacy-utility" tradeoff. In this literature (e.g., as discussed in [34], [24], [8], [50]), the main idea is that privacy protection makes the data "noisier," which reduces its "utility" but at the same time does not preclude one from recovering the parameter of interest from the data, albeit possibly less accurately. In our previous work [28] we showed that this thought process is flawed. Since privacy protection is fundamentally a finite sample paradigm, its impact on the properties of estimators has to be studied using the concept of limits of experiments.
From this perspective, the asymptotic behavior of the estimator of interest should be viewed as the limit of experiments where an estimator with privacy constraints is produced from samples of increasing size. It is important to note, however, that in our earlier work [28] (as well as in [26], [27]) we do not analyze differential privacy. There we consider a framework where the data needed for estimation has to be combined from different split datasets, and we analyze the identification of the parameter of interest in the presence of privacy guarantees. In particular, we show that the concept of k-anonymity, which was one of the first attempts at a formal definition of privacy and is discussed below in more detail, is incompatible with parameter identification in conditional moment models, which, of course, include many commonly used econometric models. With any degree of k-anonymity imposed on the data one can only recover a pseudo-identified set for the parameter of interest. This set is generally a non-singleton subset of the parameter space and it does not include the true parameter. In other words, the "privacy-utility" tradeoff is meaningless if one wants to enforce k-anonymity guarantees.

In this paper we demonstrate that a similar drawback is inherent in the concept of differential privacy. There are, without a doubt, situations when differentially private estimation is compatible with parameter identification. These are usually situations when the maximum of the influence function of the estimator – the differential privacy literature refers to it as global sensitivity – converges to zero. In this paper we show that differentially private versions of several important estimators, including regression discontinuity design (RDD) estimators and standard average treatment effect (ATE) estimators, are incompatible with parameter identification.
We show that the main reason for this is the fact that the weights of observations used in these estimation techniques are data-driven. That is, because the weights of some observations may drastically change with a change in one data point, these estimators have global sensitivities bounded away from zero (and in some cases even infinite) even asymptotically. This leads to the loss of identification. Since the issue of data-driven weights is typical in many important econometric approaches, we believe that our negative findings will extend to other important frameworks.

(Footnote: When differentially private estimation is compatible with parameter identification, the issue of the asymptotic distribution of the differentially private estimator arises. It is a different problem and we do not discuss it in detail in this paper. However, we briefly touch upon this issue in Example 1 in Section 2, where one can see that to ensure traditional rates one would need the global sensitivity to converge to zero fast enough.)

Before we review the formal notion of a differentially private mechanism and present our findings, we want to touch upon some important issues in privacy-related research. It is clear that in order to talk about privacy-preserving approaches and their properties, one has to be able to characterize formally the level of exposure induced by the adversary. This can be done, for instance, in accordance with the US Census Bureau's analysis in [30], which distinguishes between identity disclosure, where an adversary is able to identify if a specific data entry belongs to a certain individual, and attribute disclosure, where an adversary is able to find out if a particular individual has a particular characteristic (e.g., belongs to a certain group). Formal legal approaches to the protection of individual data, for instance behind HIPAA and FERPA, are based on a clear prioritization of identity disclosure over attribute disclosure and mandate the removal of specific demographic and personal identifiers to make the data "less sensitive."

The Computer Science literature, at the same time, clearly demonstrates that the reduction in "sensitivity" of the data based on potential exposure level by removing individual characteristics is highly ineffective. The examples of successful attacks on "anonymized" data led to the early work on the formal definition of privacy guarantees that resulted, in particular, in the development and the implementation (see [45], [46], [47], [32], [5], [33], [12], among others) of the so-called k-anonymity approach. A database instance is said to provide k-anonymity, for some number k, if every way of singling an individual out of the database returns records for at least k individuals. In other words, anyone whose information is stored in the database can be "confused" with k others. It is important to note that the k-anonymity approach was primarily targeted at preventing identity disclosure and, as is well known, does not prevent attribute disclosure. It also implicitly requires that the data curator responsible for protecting the data is aware of all possible auxiliary information that could be available to the adversary. These features make k-anonymity and its variants rather impractical.

The concept of differential privacy developed in [13] provides a measure of privacy guarantees without these complications and addresses both identity and attribute disclosure concerns. Differential privacy formalizes the interaction of a user with a "sensitive" database via queries that are submitted through a secure server with a privacy protection mechanism. These queries are functions that need to be computed on the data, representing summaries or tabulations of the data. The assumed risk to the database is that the queries to the database can be issued by an adversary.
The privacy protection mechanism has two main elements that allow it to be effective against arbitrary adversaries. First, it takes into account the "sensitivity" of the query to the data, which measures the maximum change in the output of the function computed on the data if any data entry is deleted or altered. Second, it independently randomizes the outcome of the query, which ensures that the produced outcome is not correlated with any auxiliary information that an adversary may have.

More formally, when the query to the data is an estimator $\hat{\theta}$, for instance representing a mean or a median of a particular variable in the data or the vector of estimated coefficients in a linear model, differential privacy requires one to replace the point estimator of interest $\hat{\theta}$ with a randomized function $\theta(P_N, \nu)$, where $P_N$ is the empirical distribution of the dataset and $\nu$ is an independent random element. In other words, it is required that the estimator be randomized using independent noise. Independence of the random element $\nu$ from the data sample is essential, since any correlation may allow an adversary to recover potentially sensitive attributes of observations in the data. While the distribution of the random element can be adjusted to the general properties of the population data distribution $P_N$, such as the sample size $N$ and the number of variables, it may not depend on the specific values of observations identifying the location of point masses of $P_N$.

(Footnote: For instance, [47] identified the medical records of William Weld, then governor of Massachusetts, by linking voter registration records to "anonymized" Massachusetts Group Insurance Commission (GIC) medical encounter data, which retained the birthdate, sex, and zip code of the patient. In another example, in [41] the risk of disclosure was identified in the so-called "Netflix prize dataset." In 2009 Netflix announced a competition with a grand prize of $1M for developers of a prediction algorithm that would use the information on the past viewership history of a given consumer and how this consumer rated those movies, and be able to predict how that consumer would rate the movies that he or she has not seen yet. For this competition Netflix released an "anonymized" dataset containing 100,480,507 ratings produced by 480,189 consumers. [41] used the public movie review data of users from imdb.com and were able to link a significant fraction of "anonymous" consumers on Netflix to imdb.com users based on the uniqueness of watch histories.)

We now provide a formal definition of differential privacy in application to the randomized estimator $\theta(P_N, \nu)$.

DEFINITION 1 (($\varepsilon, \delta$)-differential privacy [13]). A randomized estimator $\theta(P_N, \nu)$ is $(\varepsilon, \delta)$-differentially private if for any two empirical distributions $P_N$ and $P'_N$ over $N$ support points and differing (arbitrarily) in only one support point, we have that for all measurable sets $A$ of possible outputs the following holds:

$$P_\nu[\theta(P_N, \nu) \in A] \le e^{\varepsilon} P_\nu[\theta(P'_N, \nu) \in A] + \delta, \qquad (1.1)$$

where $\varepsilon > 0$, $\delta \in [0, 1)$ are privacy parameters and the probabilities are taken over the randomness in $\nu$. In addition, if $\delta = 0$, then the estimator $\theta(P_N, \nu)$ is referred to as $\varepsilon$-differentially private.

In this definition we use the notation $P_\nu(\cdot)$ to emphasize that the differentially private estimator $\theta(P_N, \nu)$ is based on the distribution of the random element $\nu$ while the distributions of the two adjacent datasets $P_N$ and $P'_N$ are fixed. The bound in the definition has to be valid for any possible empirical distributions of the data $P_N$ and $P'_N$ differing in one support point, no matter what the probability of realization of these datasets is.

In Section 2 we discuss this notion in detail and review some common differential privacy practice. We then develop a notion of identification under differential privacy.
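The bound (1.1) can be checked numerically for a simple instance of the Laplace mechanism applied to a sample mean of data with support $[0,1]$ (an illustrative sketch of ours, not part of the paper's formal development; the noise scale $\mathrm{diam}(\Theta)/(N\varepsilon)$ anticipates Example 1 in Section 2):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.5                      # privacy parameter epsilon
N = 100
x = rng.uniform(0, 1, N)       # data with support [0, 1], so diam(Theta) = 1
x_adj = x.copy()
x_adj[0] = 1.0 - x_adj[0]      # adjacent dataset: one entry altered

scale = 1.0 / (N * eps)        # Laplace scale diam(Theta) / (N * eps)

def lap_density(z, mu, b):
    """Density of the Laplace distribution Lap(mu, b) at z."""
    return np.exp(-np.abs(z - mu) / b) / (2 * b)

# (eps, 0)-differential privacy requires the density ratio of the
# randomized estimator under the two adjacent datasets to be at most
# e^eps at every output value z.
zs = np.linspace(-1, 2, 1001)
ratios = lap_density(zs, x.mean(), scale) / lap_density(zs, x_adj.mean(), scale)
print(ratios.max() <= np.exp(eps) + 1e-9)   # True
```

Since the two sample means differ by at most $1/N$, the ratio of the two shifted Laplace densities is bounded by $e^{\varepsilon}$ everywhere, which is exactly the bound in Definition 1 with $\delta = 0$.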
As we can see from Definition 1, the differential privacy approach is implemented in finite samples whereas, as is well known, identification in Economics and Econometrics is a population property. Thus, to bring these two worlds – the population and the finite sample – together, we propose to look at identification (or the lack of it) as a property of the limit of experiments, an idea we used in our previous work [28] for privacy under data combination. However, the identification approach we suggest under differential privacy is different from that in [28], which is not surprising given that differential privacy is a fundamentally distinct approach to privacy protection. We first consider the set of estimators that can be obtained in a finite sample by applying a differentially private mechanism. This can be viewed as a random set and, thus, it is only natural for us to rely on the well-developed and extremely useful theory of random sets of [39] and [40] when defining the notion of identification in the limit of experiments. We argue that the selection expectation of the random set of estimators, or other deterministic characteristics of the random sets, are not suitable bases for the notion of identification in this case, as they are in conflict with the practice and fundamental requirements of differential privacy. Instead, we argue that it is suitable to base the notion of identification on the weak limit of these random sets in the limit of the sequence of experiments and its corresponding containment functional. We discuss the notions of identification and the pseudo-identified set and illustrate them with some examples.

In Section 3 we conduct the identification analysis for differentially private estimators in regression discontinuity design models. We establish the lack of identification for differentially private versions of nonparametric regression at the boundary and local linear estimators.
We also discuss implications of differential privacy for specification tests in the RDD framework and present some simulation evidence. In Section 4 we conduct the identification analysis and study the performance of differentially private estimators in models with treatment effects, and show that generally the ATE under differential privacy is not identified. Section 5 concludes. The proofs of all the results are collected in the Appendix.

In our previous work [28] we considered identification in models where a researcher has access to a dataset that was obtained by combining split datasets subject to constraints on the prevention of identity disclosure. There, both the data combination procedure and the imposition of privacy constraints (such as k-anonymity) were intrinsically finite sample procedures. To reconcile the nature of these procedures with the population nature of identification, we argued that the identification notion for econometric models from combined and privacy-protected data can only be defined as a limit of the combined output of the privacy preserving procedure and the finite sample distribution of the data. In this paper, we aim to develop an approach to analyzing identification of econometric models when the data curator requires the output of the econometric procedure to be differentially private. For that, we can use ideas related to those in [28].

As discussed in the introduction, in the context of point estimation differential privacy requires one to replace the point estimator of interest $\hat{\theta}$ with a randomized functional $\theta(P_N, \nu)$, where $P_N$ is the empirical distribution of a given dataset and $\nu \in V$ is an independent random element which is assumed to belong to a Banach space $V$. In other words, it is required that the estimator be randomized using independent noise orthogonal to the distribution inducing $P_N$.
Independence of the random element $\nu$ from the data sample is essential, as any correlation between them may allow an adversary to recover potentially sensitive attributes of observations in the data. The randomized estimator $\theta(P_N, \nu)$ in Definition 1 only ensures that information regarding an individual data entry cannot be reverse-engineered from its values. There are two important features of the privacy preserving methodology we want to emphasize. The first one is that even though the distribution of $\nu$ can be adjusted to the general properties of the population data distribution in $P_N$ (such as the number of variables and their supports) and can depend on the sample size $N$, it may not depend on the specific values of observations producing $P_N$. The second feature is that the privacy protection has to be guaranteed for every possible realization of the data $P_N$. These two features contribute to the powerful privacy-preserving framework delivered by differential privacy. But at the same time they contribute to the possible lack of identification of parameters in differentially private versions of some important econometric models (examples of those are given in Sections 3 and 4), which, as we will see, will be closely related to poor asymptotic properties of differentially private estimators in these models.

It is clear from the definition of differential privacy that smaller values of the parameters $\varepsilon \ge 0$ and $\delta \ge 0$ correspond to stricter privacy restrictions ($\varepsilon$ measures the range of the likelihood ratio of the distributions of randomized estimators with two adjacent datasets, while $\delta$ measures the (lack of) overlap between the sets of these randomized estimators); these parameters have to be chosen by a data curator. As noted in [14], it is advisable that the parameters in the definition of $(\varepsilon, \delta)$-differential privacy are calibrated such that both of them are allowed to approach zero as the sample size increases. In this case we can write them as $(\varepsilon_N, \delta_N)$.
Our coverage and the identification notion will allow for a variety of situations – we can allow $(\varepsilon_N, \delta_N)$ to be constant as well as decreasing with $N$; this is formally given in Section 2.2. To give readers a more complete coverage of differential privacy approaches, we start with an example of a statistical procedure that admits consistent parameter estimation even with the requirement of differential privacy (such examples can be found in [15], among others).

EXAMPLE 1 (Sample mean of a random variable with a bounded support). Suppose that our goal is the estimation of the mean of a random variable $X$ with a bounded support from the sample of i.i.d. observations $\{X_i\}_{i=1}^N$ with empirical distribution $P_N$. Consider the so-called Laplace mechanism, where the estimator $\theta(P_N, \nu_N)$ is obtained in the following additively separable fashion:

$$\theta(P_N, \nu_N) = \bar{X} + a_N(\nu_N), \qquad a_N(\nu_N) \sim \mathrm{Lap}(0, \lambda_N),$$

where the Laplace distribution $\mathrm{Lap}(\mu, \lambda)$ has density $p(x; \mu, \lambda) = \frac{1}{2\lambda} \exp\left(-\frac{|x - \mu|}{\lambda}\right)$. If we choose $\mu = 0$ and $\lambda_N = \frac{\mathrm{diam}(\Theta)}{N \varepsilon_N}$ (or greater), where $\Theta$ denotes the support of $X$, then $\theta(P_N, \nu_N)$ is $(\varepsilon_N, 0)$-differentially private because for any $z \in \mathbb{R}$,

$$f_{a_N(\nu_N)}\left(z - \bar{X}\right) \le e^{\varepsilon_N} f_{a_N(\nu_N)}\left(z - \bar{X} - \tfrac{1}{N}(X'_i - X_i)\right),$$

for any $X_i$, $X'_i$ and $\bar{X}$.

If $\varepsilon_N$ remains bounded away from zero, or even if $\varepsilon_N \to 0$ as $N \to \infty$ but $N \varepsilon_N \to \infty$, then $\theta(P_N, \nu_N)$ is obviously a consistent estimator of the population mean of $X$, as the variance of the noise term $a_N(\nu_N)$ decreases to zero. If, however, $\varepsilon_N = O(1/N)$, then $\theta(P_N, \nu_N)$ can be shown to be no longer consistent. If instead of looking at just $\theta(P_N, \nu_N)$ one wants to analyze the asymptotic behavior of

$$\sqrt{N}\left(\theta(P_N, \nu_N) - E[X]\right) \qquad (2.1)$$

traditionally used for econometric inference, then one can show that if $\sqrt{N}\, \varepsilon_N \to \infty$, then the weak limit of (2.1) is the same as for $\sqrt{N}(\bar{X} - E[X])$ – that is, $N(0, \mathrm{Var}[X])$.
If $\sqrt{N}\, \varepsilon_N \to c$ for some constant $c > 0$, then the weak limit of (2.1) still exists but has a larger variance, thus leading to less accuracy in the estimation. If $\sqrt{N}\, \varepsilon_N \to 0$, then the weak limit of (2.1) does not exist.

Thus, as we can see in Example 1, it is possible to have differentially private versions of sample means with inference being pretty much the same as for the original estimator. It does not mean, however, that differential privacy is broadly compatible with econometric inference. Example 2 below demonstrates that even in the case of the sample mean a differentially private estimator may lose its nice asymptotic properties if we take $X$ whose support is unbounded.

EXAMPLE 2 (Sample mean of a random variable with unbounded support). Suppose that, in contrast with the situation in Example 1, the support of $X$ is unbounded (for simplicity, we will take it to be $\mathbb{R}$) but the variance of $X$ is still finite. Then an $(\varepsilon, \delta)$-differentially private estimator $\theta(P_N, \nu_N)$ obtained by the addition of mean zero noise to the sample mean $\bar{X} = \frac{1}{N} \sum_{i=1}^N X_i$ may not be consistent.

Indeed, suppose that the noise component has the density $f_{a_N(\nu_N)}(\cdot)$. Definition 1 requires that at each point and, in particular, at $\bar{X}$,

$$f_{a_N(\nu_N)}\left(\bar{X}\right) \le e^{\varepsilon} f_{a_N(\nu_N)}\left(\bar{X} + \tfrac{1}{N}(X'_i - X_i)\right) + \delta,$$

for any $X_i$, $X'_i$ and $\bar{X}$. The existence of the finite mean of the random variable $a_N(\nu_N)$ is equivalent to the convergence of the improper integral $\int_v^\infty t \, f_{a_N(\nu_N)}(t) \, dt$ for each $v$, which requires that $\lim_{t \to \infty} t \, f_{a_N(\nu_N)}(t) = 0$. This implies that we can choose $X'_i$ and $X_i$ such that

$$f_{a_N(\nu_N)}\left(\bar{X} + \tfrac{1}{N}(X'_i - X_i)\right) \le e^{-\varepsilon} f_{a_N(\nu_N)}\left(\bar{X}\right).$$
Given that $\bar{X}$ can be an arbitrary value on the real line, and that the difference $X'_i - X_i$ can be made arbitrarily large, Definition 1 then requires that for all $N$ and all points $t \in \mathbb{R}$ it holds that $f_{a_N(\nu_N)}(t) \le \delta$, which is clearly incompatible with the consistency of the estimator, since consistency would require "concentration" of the distribution of $a_N(\nu_N)$ around zero as $N \to \infty$.

(Footnote: Of course, some values of $\bar{X}$ may be considered to be very unlikely, but we want to note here that the differential privacy notion requires considering all possible realizations of the samples regardless of their likelihood, and for any value on the real line we can certainly find realizations of $P_N$ that will give the sample mean equal to that value.)

Note that we have inconsistency of our differentially private estimator here even for fixed $(\varepsilon, \delta)$. If the parameters of differential privacy are drifting to zero with $N$, the inconsistency problem becomes even more severe. One approach to "fix" the behavior of the estimator is to consider trimmed or winsorised versions of the sample mean. Trimming would bound the scale of the noise that needs to be added to the estimator. That, however, may interfere with the asymptotic distribution of the mean, depending on the tail behavior of the distribution of $X$, leading to the domination of the distribution of the differentially private estimator by the added noise.

We would like to emphasize that Example 2 demonstrates the non-existence of consistent differentially private estimators for means of unbounded random variables generated by additive noise.
One may ask whether some non-additive way of incorporating noise would result in a consistent differentially private estimator. Our later discussion in Section 2.4 shows that in this situation inconsistency is an intrinsic property of any differentially private estimator satisfying some basic smoothness requirements. As is evident from Example 1, support restrictions may mitigate this issue, but the fact that Definition 1 is incompatible with consistent estimation of means of many commonly used random variables appears to be an unfortunate shortcoming of differential privacy. Our detailed analysis of popular applied econometrics methods in Sections 3 and 4 will cover and discuss similar shortcomings even in models with bounded supports for all the variables.

One reason we have focused on the consistency property so far is that it is a minimum desirable property of a good estimator. The second reason is directly related to what we do in the remainder of this section – our notion of identification is related to weak limits of differentially private estimators, and the property of consistency describes a special case of those weak limits.
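The consistency dichotomy in Example 1 can be illustrated with a short simulation (an illustrative sketch with our own choice of constants: $X$ is uniform on $[0,1]$, so $\mathrm{diam}(\Theta) = 1$; the regime $\varepsilon_N = 5/N$ keeps $N\varepsilon_N$ bounded, which is the inconsistent case):

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_mean(x, eps, diam=1.0):
    """Laplace-mechanism sample mean for bounded support, as in Example 1."""
    n = len(x)
    return x.mean() + rng.laplace(0.0, diam / (n * eps))

def rmse(n, eps, reps=500, mu=0.5):
    """Monte Carlo root-mean-squared error of the DP mean around E[X] = 0.5."""
    draws = np.array([dp_mean(rng.uniform(0, 1, n), eps) for _ in range(reps)])
    return np.sqrt(np.mean((draws - mu) ** 2))

# Fixed eps: the noise scale 1/(N*eps) vanishes, so the estimator is consistent.
print(rmse(100, 0.5), rmse(10_000, 0.5))        # RMSE shrinks with N

# eps_N = 5/N: the noise scale stays at 1/5, so the RMSE does not vanish.
print(rmse(100, 5 / 100), rmse(10_000, 5 / 10_000))
```

With fixed $\varepsilon$ the RMSE drops by roughly an order of magnitude as $N$ grows from $10^2$ to $10^4$, while in the $\varepsilon_N = 5/N$ regime it stays near the standard deviation of the fixed-scale Laplace noise.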
In this section, we present our formal approach to identification for models with differentially private outcomes. We consider a sequence of statistical experiments indexed by the sample size $N$ (with $N \to \infty$ along this sequence), where for each $N$ we generate an i.i.d. sample $\{z_i\}_{i=1}^N$ from the joint distribution of a $d$-dimensional random vector $Z$, leading to the empirical distribution $P_N$. We assume that the parameter of interest $\theta$ is in the interior of a $p$-dimensional convex compact parameter space $\Theta \subset \mathbb{R}^p$. We then consider randomized estimators $\theta(P_N, \nu_N) \in \Theta$, where the random element $\nu_N$ ensures that these randomized estimators are $(\varepsilon_N, \delta_N)$-differentially private for some sequences $\varepsilon_N$ and $\delta_N$. To ensure protection from adversarial attacks on the data $P_N$, the added random element $\nu_N$ has to be statistically independent from the data.

In our analysis in this section we use techniques from the theory of random sets, which reflect the spirit of data analysis with differential privacy: the random element $\nu_N$, or the technique that is used to produce the randomized estimator $\theta(P_N, \nu_N)$ aiming to represent the parameter of interest $\theta$, may not be available to the researcher. Instead, the data curator controlling the dataset inducing the empirical distribution $P_N$ reports parameters $(\varepsilon_N, \delta_N)$ which yield the upper bound guarantee for differential privacy of a given estimated output (and, possibly, the algorithm used for the implementation of $\theta(\cdot, \cdot)$). This means that there can be an entire class of estimators for the parameter $\theta$ that satisfy differential privacy with these parameters.

While we provided examples of mechanisms that can be used to achieve differential privacy in the previous section, we need to formally define the structure of a general differentially private estimator. A differentially private estimator takes as an input a data sample that produced the empirical distribution $P_N$ and a random element $\nu_N$, and outputs a point in $\Theta$.
We treat $\nu_N$ as a "seed" for randomness, represented by a fixed standardized random variable that is then transformed by the estimator into the random variable used in a particular mechanism for differential privacy. For example, $\nu_N$ can be a uniformly distributed random variable (or a vector of such variables) on $[0, 1]$ that is then transformed by a mechanism into a Laplace random variable. We now give a formal description of the class of estimators we consider.
ASSUMPTION 1. The class of estimators is formed by a class of bounded operators $\mathcal{M}$ such that:

(i) $\mathcal{M}$ is a collection of parametric families of operators such that the operators $M_{\theta,\nu} \in \mathcal{M}$ are well-defined for each $\theta \in \Theta$ and $\nu \in V$;

(ii) $M_{\theta,\nu}: D(\mathbb{R}^d; [0,1]) \mapsto \mathbb{R}^p$ for all $M_{\theta,\nu} \in \mathcal{M}$ and all $\theta \in \Theta$ and $\nu \in V$ (where $D(\mathbb{R}^d; [0,1])$ is the Skorohod space of functions);

(iii) for each $F \in D(\mathbb{R}^d; [0,1])$ and parametric family $\{M_{\theta,\nu}: \theta \in \Theta, \nu \in V\}$, $M_{\theta,\nu}(F)$ is Lipschitz-continuous in $\theta$ and $\nu$;

(iv) the differentially private estimator $\theta(P_N, \nu_N)$ is defined as a solution of the system of equations

$$M_{\theta, \nu_N}(P_N) = 0$$

for $\theta$ over the parametric family $\{M_{\theta,\nu}: \theta \in \Theta, \nu \in V\}$, where $P_N$ is the empirical distribution of the sample $\{z_i\}_{i=1}^N$.

We can illustrate this assumption for the differentially private estimate of a sample mean using the Laplace mechanism for i.i.d. draws $\{X_i\}_{i=1}^N$ of a random variable with bounded support and the random element $\nu_N \sim U[0,1]$:

$$M_{\theta,\nu}(F) = \int_{-\infty}^{+\infty} z \, dF(z) + \frac{\mathrm{diam}(\Theta)}{N \varepsilon_N} F^{-}(\nu) - \theta,$$

where $F^{-}(\cdot)$ is the inverse cdf of the standard Laplace distribution. We note that the empirical moment induced by this operator produces an $(\varepsilon_N, 0)$-differentially private estimator. We consider sequences $(\varepsilon_N, \delta_N)$ such that $\varepsilon_N \le \bar{\varepsilon}$ and $\delta_N \le \bar{\delta}$ for all $N$, for some universal constants $\bar{\varepsilon}$ and $\bar{\delta}$.

DEFINITION 2.
For a given sequence of $(\varepsilon_N, \delta_N)$, we say that an $(\varepsilon_N, \delta_N)$-differentially private estimator $\theta(\cdot, \cdot): Z^N \times V \to \Theta$ satisfying Assumption 1 is regular for the parameter of interest $\theta$ if the following conditions hold:

(i) $\theta(P_N, \nu_N)$ is a continuous random variable with respect to the Lebesgue measure;

(ii) in the absence of the mechanism noise – that is, when the estimator is $\theta(P_N, 0)$ – there exists a function $\bar{R}(N, \kappa)$, with $\lim_{N \to \infty} \bar{R}(N, \kappa) = 0$, such that for all $N$ and for all $\kappa > 0$:

$$P\left(\|\theta(P_N, 0) - \theta\| > \kappa\right) \le \bar{R}(N, \kappa); \qquad (2.2)$$

(iii) $\theta(P_N, \nu_N)$ has a weak limit if the sequence $(\varepsilon_N, \delta_N)$ is convergent.

Condition (i) states that the distribution of $\theta(\cdot, \cdot)$ has a density. Condition (ii) implies that in the absence of any mechanism noise the estimator is informative for the parameter of interest $\theta$ – in particular, the estimator is consistent and has a guaranteed rate of convergence (in most practical scenarios $\bar{R}(N, \kappa)$ would be required to be exponentially decreasing in $N$ and $\kappa$). The condition that the function $\theta(\cdot, \cdot)$ takes values in $\Theta$ only ensures that in cases when the mechanism noise would drive the estimator outside of the parameter space $\Theta$, such an estimator would be projected on the boundary of $\Theta$. Finally, condition (iii) requires the differentially private estimator to converge in distribution.

Now, having defined the class of regular differentially private estimators, we make the next step towards the notion of identification. Our next notion will depend on the set of sequences of $(\varepsilon_N, \delta_N)$ a data curator is willing to consider. For example, it could be the set of sequences where $\varepsilon_N$ and $\delta_N$ do not change with $N$, or it could be a set of sequences converging to zero at a certain rate. We will refer to a fixed set of sequences as $\mathcal{E}$.
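Returning to the Laplace-mechanism illustration of Assumption 1, the operator view can be written out in code: the estimator is the root in $\theta$ of $M_{\theta,\nu}(P_N)$ (an illustrative sketch; the function names are ours, and $\nu$ is the uniform seed described above):

```python
import numpy as np

def lap_ppf(u):
    """Inverse cdf of the standard Laplace distribution at u in (0, 1)."""
    return -np.sign(u - 0.5) * np.log(1.0 - 2.0 * np.abs(u - 0.5))

def M(theta, nu, x, eps, diam=1.0):
    """Operator M_{theta,nu}(P_N) from the Assumption 1 illustration:
    empirical mean plus scaled Laplace noise quantile, minus theta."""
    n = len(x)
    return x.mean() + (diam / (n * eps)) * lap_ppf(nu) - theta

# The DP estimator solves M_{theta, nu_N}(P_N) = 0 for theta; here the
# solution is available in closed form.
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 200)
nu = rng.uniform()                      # the uniform "seed" nu_N
theta_hat = x.mean() + (1.0 / (200 * 0.5)) * lap_ppf(nu)
print(abs(M(theta_hat, nu, x, eps=0.5)) < 1e-12)   # True: theta_hat solves M = 0
```

For this additively separable operator the root is simply the noised sample mean; for more complicated families in $\mathcal{M}$ the root would be found numerically, which is why Assumption 1 imposes Lipschitz continuity in $\theta$ and $\nu$.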
We will suppose that this set of sequences is a join-semilattice in the coordinate-wise partial order for (ε_N, δ_N) – that is, the join of any two sequences from E is also in E. Following [39], we use the concept of measurable selection to define the set of all regular differentially private estimators.
DEFINITION 3.
Consider the set T*_{N,E} of all random variables θ(P_N, ν_N): Z^N × V ↦ Θ satisfying Definition 2 and corresponding to sequences (ε_N, δ_N) from E. We define the random set T_{N,E}, referred to as the set of regular differentially private estimators for θ for a given E, as the completion of the set T*_{N,E} ∩ L¹(P) with respect to the L¹(P)-norm.
L¹(P) is the space of measurable functions that map the elements of the σ-algebra on Z^N (with the product measure defined by the probability measure on Z) and the σ-algebra associated with the random elements ν_N into R^p, and for which the Euclidean norm is integrable. By the definition of θ(·,·) in Definition 2 and the compactness of Θ, all elements in T*_{N,E} and, thus, in T_{N,E} as well, are bounded in the L¹-norm. The random set T_{N,E} is compact and convex in the sense of Definitions 1.30 and 4.32 in [39], as shown in Lemma 1.
LEMMA 1. T_{N,E} is a convex and compact random set.
A natural requirement would be that selections of T_{N,E} overlap with an arbitrary neighborhood of θ with probability approaching 1. However, the notion of the probability limit is too strong, as it will not allow us to talk about the limit in the following simple instance of regular differentially private estimators: θ(P_N, ν_N) = θ(P_N, 0) + a_N(ν_N), where the variance of a_N(ν_N) remains constant or increases as N → ∞ (the estimator θ(P_N, 0) in the absence of the mechanism noise is, of course, consistent by condition (ii) in Definition 2). As we will see later, such situations will be prevalent in the estimation of ATE and RDD models. In fact, we already conveyed our intention to consider weak limits in condition (iii) in Definition 2. The next lemma demonstrates that weak convergence is the strongest plausible convergence concept to consider unless the weak limits of regular differentially private estimators are constant.
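The failure of convergence in probability described above is easy to see numerically. The following is a minimal Python sketch (ours, not the paper's; all parameter values are illustrative): the noise-free estimator θ(P_N, 0) concentrates at the true mean as N grows, while an additive Laplace-mechanism term a_N(ν_N) with a non-vanishing scale keeps the dispersion of the randomized estimator bounded away from zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_mean(x, scale):
    """Sample mean plus Laplace-mechanism noise with a fixed scale."""
    return x.mean() + rng.laplace(0.0, scale)

theta0 = 0.5  # true mean of U[0, 1]
for n in (100, 10_000, 1_000_000):
    x = rng.uniform(0.0, 1.0, size=n)
    clean = x.mean()                                   # theta(P_N, 0)
    noisy = [dp_mean(x, scale=0.3) for _ in range(2_000)]
    # clean concentrates at theta0, but the spread of noisy stays near
    # the Laplace standard deviation 0.3 * sqrt(2), whatever n is:
    print(n, round(abs(clean - theta0), 4), round(np.std(noisy), 3))
```

The weak limit of the noisy estimator is therefore non-degenerate even though its noise-free version is consistent.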
LEMMA 2.
Suppose that θ(P_N, ν_N) →_W τ as N → ∞. Then if τ is non-constant with positive probability, there exist κ̄ > 0 and γ > 0 such that for all κ ≤ κ̄
lim sup_{N→∞} P(|θ(P_N, ν_N) − τ| > κ) > γ.
Theorem 1 establishes weak convergence of the convex compact random set T_{N,E} when all the sequences of (ε_N, δ_N) in E are convergent.
THEOREM 1.
Let E be a join-semilattice that consists only of convergent sequences of (ε_N, δ_N). The random set T_{N,E} as defined in Definition 3 weakly converges to a random set T_E, which is the closure of all weak limits of estimators in the respective random set T*_{N,E}.
The convergence result of Theorem 1 is essentially a result about what happens in the limit of statistical experiments. Naturally, we will base our notion of identifiability and, more generally, pseudo-identified sets, on some characteristics of the random set T_E.
Notions of identification and (pseudo)-identified set.
Our next step will be to produce a tangible characterization of the random set T_E and define the information content delivered by this set with regard to the parameter of interest. The best case scenario from the information content point of view is the case when T_E is the degenerate distribution concentrated at θ, which essentially means that the sequence of statistical experiments delivers the true parameter value in the limit. Generally, however, this may not be the case. One of the main difficulties in characterizing this set and its information content is that it may not contain the target parameter θ (i.e., it can be “biased”) and that the distribution of T_E may not be degenerate (i.e., it is not “consistently” estimating θ). Some important work in the random sets literature, such as [7] and [6], defined the information content (or, in other words, the identified set) as the selection expectation of a random set. In our previous work [28] we used a concept related to selection expectation to analyze the impact of privacy guarantees (in particular, k-anonymity) under data combination.
Thus, selection expectation might seem like a promising approach to explore in our framework as well, especially given that such a characterization is deterministic; but for the reasons explained below (the privacy budget of differentially private mechanisms and the impossibility of repeated experiments), in the differentially private setting we do not see this approach as a fruitful one. Instead, our pseudo-identified set (“pseudo-” because it does not necessarily contain the true parameter value) will be the random set T_E itself. The notion of a random identified or a random pseudo-identified set is not traditional in econometrics, as usually researchers are able to extract a deterministic consensus about which parameter values can be driving the observables. We argue that in the differentially private setting this is the preferable approach. Some other work, such as [25], has employed the notion of a random identified set. In [25], the source of probability that induces the random identified set is the posterior uncertainty about the identifiable parameters. In our case, this source is a combination of the sampling uncertainty of the observations and the mechanism noise, and it is not possible to separate these two sources.
Having given the gist of the content of this section, we now turn to a more detailed discussion. First, as mentioned above, in the context of converging random sets, as for converging random variables, the notion of the expectation or the median (or some other quantile) might appear a natural way to characterize the limit (and consequently provide the framework for identification). Recall that the selection (or Aumann) expectation of the random set T_{N,E}, denoted E T_{N,E}, is the closure of the set {E ξ: ξ ∈ T_{N,E}}. However, the selection expectation (along with any first-order statistic, such as Vorob’ev’s expectation) fails to be “representative” for the limiting random set T_E.
This lack of representativeness stems from the inherent impossibility of replicating a statistical experiment whose outcome is the regular differentially private estimator, driven by the required structure of differentially private systems as discussed in [15]. Indeed, in the context of differentially private systems no function evaluated on the data can be considered in isolation. Differential privacy is the property of the entirety of all functions that have ever been or will ever be evaluated from a given dataset. By the composition property of differential privacy, two different functions that, for instance, are each (ε/2, 0)-differentially private are jointly (ε, 0)-differentially private. If ε in the definition of differential privacy is considered to be a policy parameter, then it determines the “privacy budget” of a given database. The more functions need to be evaluated from the data, the more noise will need to be added to each function to ensure the entire set of functions is within the “privacy budget.” The evaluation of K functions within an overall (ε, 0) guarantee requires each of them to be (ε/K, 0)-differentially private, i.e., K times less sensitive to arbitrary changes in individual observations in the dataset. (As discussed in the introduction, k-anonymity is a formal privacy guarantee that predates differential privacy.) As a result, the random element ν_N used to produce the regular differentially private estimator θ(P_N, ν_N) is generated only once, and then any data user who wants to estimate a given parameter θ will observe exactly the same value of the randomized estimator θ(P_N, ν_N). In other words, repeated identical queries to the data always result in the same (randomized only once) output. From this perspective, the concept of the selection expectation or other related statistics is clearly misleading for a characterization of the limiting random set T_E.
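The budget arithmetic behind this composition argument can be made concrete. A minimal sketch (our illustrative numbers, not from the paper) of how splitting an (ε, 0) budget across K Laplace-mechanism queries inflates each query's noise scale:

```python
# Sequential composition: K queries that are each (eps/K, 0)-DP are jointly (eps, 0)-DP.
total_eps = 1.0          # policy-level privacy budget (illustrative)
K = 5                    # number of functions evaluated from the data
per_query_eps = total_eps / K

# The Laplace-mechanism noise scale for a query with global sensitivity s
# is s / eps, so splitting the budget K ways multiplies every scale by K.
sensitivity = 1.0
scale_single = sensitivity / total_eps       # one query using the whole budget
scale_split = sensitivity / per_query_eps    # each of the K queries
print(scale_split / scale_single)            # K-fold noise inflation
```

This K-fold inflation is exactly why a data user cannot "average out" the mechanism noise by re-issuing the same query.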
While it may be the case that E_{ν_N} θ(P_N, ν_N) is close to θ with high probability (as N → ∞) for all measurable selections θ(P_N, ν_N) of T_{N,E}, there is no guarantee that θ(P_N, ν_N) itself is also close to θ with high probability; under “privacy budget” considerations there is no way a researcher can access repeated samples from the distribution of θ(P_N, ν_N) corresponding to a given empirical distribution P_N to “average out” the added noise ν_N. To put it differently, given the “privacy budget” considerations, the selection expectation E_{ν_N} θ(P_N, ν_N) is not “feasible” in the differentially private framework. Moreover, as we illustrate in the example in Section 2.3, differential privacy may be in conflict even with the identification of expectations. It is clear that we need to use a different approach to characterizing the limiting random set T_E. To provide a comprehensive characterization of the random sets T_{N,E} and T_E we use the notion of the containment functional adopted from [39]:
DEFINITION 4.
The functional C_X(K) = P(X ⊂ K), for K a convex compact subset of Θ, is referred to as the containment functional of the random set X.
By Theorem 1.7.8 in [39] the containment functional provides a complete characterization of a convex compact random set. Moreover, it is sufficient to choose the “test sets” K to be convex polytopes. The containment functional preserves the property of weak convergence of a sequence of random sets. We summarize this in the following theorem.
THEOREM 2.
Under the conditions of Theorem 1, for any convex polytope K ⊂ Θ,
C_{T_{N,E}}(K) → C_{T_E}(K), as N → ∞.
This theorem is a simple corollary of Theorem 1.6.5 in [39], and it ensures that the containment functional preserves the properties of the converging sequence of random sets T_{N,E} and, more importantly, of its limit T_E. The characterization of the limiting containment functional equivalently characterizes the limiting random set. In other words, the analysis of weak convergence of random sets can be replaced with the analysis of pointwise convergence of the containment functional on the set of convex polytopes contained in Θ. We now formulate the notion of identifiability of the parameter of interest.
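To fix ideas, the containment functional C_X(K) = P(X ⊂ K) can be estimated by simulation whenever draws of the random set are available. A minimal Python sketch with a hypothetical random interval (all distributions here are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def containment(draw_set, K_lo, K_hi, n_sim=20_000):
    """Monte-Carlo estimate of C_X(K) = P(X subset of K) for a random
    interval X returned by draw_set() as (lo, hi) and a box K = [K_lo, K_hi]."""
    hits = 0
    for _ in range(n_sim):
        lo, hi = draw_set()
        hits += (K_lo <= lo) and (hi <= K_hi)
    return hits / n_sim

def draw():
    # Hypothetical random set: interval of half-width 0.1 centred at a N(0.5, 0.05) draw.
    c = rng.normal(0.5, 0.05)
    return c - 0.1, c + 0.1

print(containment(draw, 0.2, 0.8))    # near 1: K almost always contains X
print(containment(draw, 0.45, 0.55))  # 0: K is too narrow to ever contain X
```

A degenerate limiting set would push the first quantity to 1 for every polytope containing θ, which is exactly the identifiability condition formalized next.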
DEFINITION 5 (Identifiability of the parameter under differential privacy). Let E include only some converging sequences of (ε_N, δ_N). We will say that the parameter θ is identified in the regular (ε_N, δ_N)-differentially private framework, where the sequences of (ε_N, δ_N) belong to E, if and only if for any α ∈ (0, 1) and any convex polytope K ∋ θ,
C_{T_E}(K) ≥ 1 − α.
Theorem 3 below gives necessary and sufficient conditions for the identification of the parameter θ.
THEOREM 3.
Suppose the conditions of Theorem 1 hold. For any sequence of (ε_N, δ_N) from E, any regular (ε_N, δ_N)-differentially private estimator θ(P_N, ν_N) is such that θ(P_N, ν_N) →_p θ if and only if for any α ∈ (0, 1) and any convex polytope K ∋ θ we have C_{T_E}(K) ≥ 1 − α, and, thus, the parameter θ is identifiable even under differential privacy.
Theorem 3 provides our characterization of identifiability, which corresponds to the convergence of the sequence of random sets to a singleton. In other words, this parallels consistency for sequences of ordinary random variables. Based on the same principles we can characterize the case of non-identifiability.
DEFINITION 6 (Non-identifiability of the parameter under differential privacy). Let E consist of converging sequences of (ε_N, δ_N). We will say that the parameter θ is non-identified in the regular (ε_N, δ_N)-differentially private framework, where the sequences of (ε_N, δ_N) belong to E, if and only if there exist β ∈ (0, 1) and a convex polytope K_β ∋ θ such that
C_{T_E}(K_β) ≤ 1 − β.
Non-identifiability implies that the limiting random set is not degenerate. Therefore, it becomes impossible to pinpoint the true parameter θ by tracing a “mass point” of the containment functional of that limiting random set T_E. This makes the analysis of partial identification in our case different from the traditional approach, where partial identification aims to construct a deterministic set that contains the parameter of interest θ. In our case, where the containment functional is non-degenerate in the limit, it is impossible to construct such a deterministic set. At the same time, the containment functional itself may be difficult to work with in practice. To address this we define the (pseudo)-identified set as a set of probability distributions:
DEFINITION 7.
The pseudo-identified set for the parameter of interest θ produced by regular differentially private estimators is the class of distribution functions F_{θ,E} such that for each F ∈ F_{θ,E} there exists a measurable selection ξ ∈ T_E such that F is the distribution function of ξ.
One of the mechanisms most commonly used to induce differential privacy in the theoretical literature (e.g., [15]) is the Laplace mechanism, in which the original estimator is augmented by the addition of independent double exponential (or Laplace) noise calibrated in a specific way. In this section, we consider an example in which we illustrate the construction of a more general family of differentially private estimators by combining the Laplace mechanism with a random subsampling procedure. The resulting combination produces a random set of regular differentially private estimators. For simplicity and for the sake of highlighting important issues related to identifiability, we will even assume that the researcher is informed of the Bernoulli-Laplace mechanism being used to deliver a differentially private output, even though in practice a data curator may not release that information (hence, our generic notation for a “seed” ν_N).
The reason why we want to highlight this specific mechanism is that, as we discuss further in this paper, many relevant estimators in Economics can be viewed as being constructed from weighted means. In fact, in the RDD models estimation in Section 3 and in the ATE estimation in Section 4 it will be clear that the lack of identifiability under differential privacy and the poor statistical performance of regular differentially private estimators (even with relatively weak privacy requirements) will stem, in particular, from the weights of observations not being fixed but decided by the data. These weights may vary in a certain fixed interval even when only one observation in the dataset becomes different.
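As a preview of this section's example, the Bernoulli-Laplace mechanism (random Bernoulli subsampling followed by Laplace noise) admits a short sketch in Python. All parameter values below are illustrative, and the clipping to [0, 1] stands in for the projection onto Θ used throughout the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def bernoulli_laplace_mean(x, pi_n, lam_n, theta_lo=0.0, theta_hi=1.0):
    """Bernoulli subsampling plus Laplace noise, projected onto Theta = [lo, hi].
    Each observation enters with probability pi_n; the noise scale is lam_n."""
    n = len(x)
    d = rng.binomial(1, pi_n, size=n)           # Bernoulli inclusion indicators
    est = (d * x).sum() / (n * pi_n)            # downsampled, reweighted mean
    est += rng.laplace(0.0, lam_n)              # Laplace-mechanism noise
    return min(max(est, theta_lo), theta_hi)    # projection onto Theta

x = rng.uniform(0.0, 1.0, size=50_000)          # X supported on [0, 1], theta = E[X] = 0.5
print(bernoulli_laplace_mean(x, pi_n=0.5, lam_n=0.01))  # close to 0.5 here
```

Which of the regimes analyzed below applies depends on how pi_n and lam_n are taken to vary with the sample size.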
The example in this section considers an extreme version of such situations, in which the weights of observations in the weighted mean are 0/1 with some probabilities and these weights are independent of other available data. It provides useful support for our subsequent discussion of RDD and ATE differentially private estimators.
We will take X to have support on [0, 1] and set θ = E[X]. We consider the following mechanism for obtaining a regular differentially private estimator (the choice of the (ε_N, δ_N) parameters is discussed later) on the basis of the i.i.d. sample {x_i}_{i=1}^N.
1. First, we create a subsample from {x_i}_{i=1}^N that independently includes each observation with probability π_N and excludes it with probability 1 − π_N.
2. We compute the weighted average of the included observations and output that weighted average with an added random variable u_N, where u_N ~ Lap(0, λ_N):
θ(P_N, ν_N) = (1/(N π_N)) Σ_{i=1}^N d_i x_i + u_N,
where d_i is a Bernoulli random variable with parameter π_N. The variables d_i, i = 1, ..., N, are mutually independent. To ensure that θ(P_N, ν_N) ∈ Θ, we consider the estimator as the projection of θ(P_N, ν_N) on Θ.
This estimator is differentially private with parameter ε_N and with δ_N = 0 if, for each pair of samples differing from each other by a single observation (suppose this is the N-th observation), the likelihood ratio
L_N ≡ (c + a)/(c + b) = 1 + (a − b)/(c + b),
where
a = π_N Σ_{S_υ ⊆ S∖{x_N}} π_N^{|S_υ|} (1 − π_N)^{N−1−|S_υ|} · exp(−|t − θ̃_{S_υ∪{x_N}}|/λ_N),
b = π_N Σ_{S_υ ⊆ S∖{x_N}} π_N^{|S_υ|} (1 − π_N)^{N−1−|S_υ|} · exp(−|t − θ̃_{S_υ∪{x'_N}}|/λ_N),
c = (1 − π_N) Σ_{S_υ ⊆ S∖{x_N}} π_N^{|S_υ|} (1 − π_N)^{N−1−|S_υ|} · exp(−|t − θ̃_{S_υ}|/λ_N),
does not exceed e^{ε_N}. Note that the maximum absolute change in our estimator with the change in one observation (the so-called global sensitivity) is 1/(N π_N). Using the partition of unity Σ_{S_υ ⊆ S∖{x_N}} π_N^{|S_υ|} (1 − π_N)^{N−1−|S_υ|} = 1, we can write the upper bound on L_N as follows:
L_N ≤ 1 − π_N + π_N e^{1/(N λ_N π_N)}.
Therefore, (ε_N, 0)-differential privacy is guaranteed whenever
1 − π_N + π_N exp(1/(N λ_N π_N)) ≤ exp(ε_N),
for which a sufficient condition is
ε_N ≥ π_N exp(1/(N λ_N π_N)). (2.3)
As a result, for a converging sequence ε_N → ε there will be a family of sequences π_N and λ_N that ensure differential privacy.
Let us first discuss the extreme cases. This estimator can be made (0, 0)-differentially private with π_N ≡ 1 and λ_N ≡ +∞, which means that the Laplace distribution for the mechanism noise has an infinite variance. From Definition 2 we can clearly see that this estimator is not a regular differentially private estimator. Indeed, the requirement of ε_N = 0 is very strong. At the other extreme, we can consider (+∞, 0)-differential privacy with π_N ≡ 1 and λ_N ≡ 0. That is, the mechanism noise is zero and this estimator is consistent with (2.2) in Definition 2, with R̄(N, κ) = 2d exp(−2 N κ² / diam(Θ)²) obtained from the Hoeffding bound.
We can now focus on the cases when 0 < ε_N < +∞. Note that we can always find a family of sequences λ_N and π_N such that ε_N ≥ π_N exp(1/(N λ_N π_N)). In this family, to ensure weak convergence of the differentially private estimator (one of the requirements of Definition 2), the sequence π_N can converge to any limit in [0, ε]. At the same time, the non-negative-valued sequence λ_N can have limits in [0, +∞), with +∞ indicating a divergent sequence. Without loss of generality, we assume that the sequences λ_N and π_N are monotonic. We then consider the behavior of the resulting regular differentially private estimator in the following series of regimes.
Regime 1: λ_N → 0 as N → ∞.
Regime 1A.
In this regime, whenever π_N ≫ 1/N, the variance of the randomized estimator θ(P_N, ν_N) is O(1/(N π_N) + λ_N²) = o(1), meaning that the estimator converges weakly (and, of course, in probability) to E[X]. Thus, the distribution of the random element τ in the limit is degenerate. Note that in order to guarantee (ε_N, 0)-differential privacy with λ_N → 0, the sequence ε_N may not converge to zero too quickly. For E that consists of converging sequences (ε_N, 0), ε_N ≤ ε̄, that satisfy (2.3), the random set T_E is degenerate at θ = E[X], which means that in this regime the parameter of interest is identified. Thus, in this case, if one is willing to impose weaker privacy restrictions as N → ∞, in the sense that ε_N goes to zero slowly enough, then the parameter is identified. However, even in this optimistic scenario one has to remember that point identification is obtained given knowledge of the asymptotic behavior of λ_N and π_N, which a data curator may not release in such detail (imagine, e.g., the data curator releasing the rate for π_N but at the same time releasing only a lower bound on λ_N)!
Regime 1B.
When lim_{N→∞} N π_N = c ∈ (0, +∞), the downsampled mean converges weakly to the random variable Λ(c) = (1/c) Σ_{j=1}^k X_j, where the X_j are independent random variables distributed as X and k is a Poisson random variable with parameter c. Given that λ_N → 0, a weak limit τ of the randomized estimator θ(P_N, ν_N) projected on Θ is distributed as Λ(c) projected on Θ. The distribution of Λ(c) projected on Θ is the pseudo-identified set in this regime.
Note that in order to guarantee (ε_N, 0)-differential privacy with λ_N → 0, any sequence ε_N comprising E that converges to zero would have to do so more slowly than in Regime 1A, leading to a further weakening of the differential privacy guarantees.
Regime 1C.
When lim_{N→∞} N π_N = 0, the downsampled sample average converges weakly to a mass point at 0. Differential privacy will be guaranteed for all sequences ε_N ≫ 1/N with λ_N converging to 0 sufficiently slowly to satisfy (2.3).
Regime 2: λ_N → λ ∈ (0, +∞) as N → ∞.
Since in this regime the variance of the mechanism noise does not diminish, one can impose stronger differential privacy guarantees, in the sense of taking sequences ε_N converging to zero faster than in the respective cases in Regime 1.
Regime 2A.
For any sequence π_N ≫ 1/N, the downsampled sample average (1/(N π_N)) Σ_{i=1}^N d_i x_i converges in probability to E[X]. Its variance is O(1/(N π_N)) = o(1), which means that the Laplace noise will dominate the asymptotic behavior of regular differentially private estimators. The randomized differentially private estimator θ(P_N, ν_N) projected on Θ will converge weakly to Lap(E[X], λ) projected on Θ, with this distribution being our pseudo-identified set.
Regime 2B.
When lim_{N→∞} N π_N = c > 0, the resulting downsampled mean weakly converges to the random variable Λ(c) = (1/c) Σ_{j=1}^k X_j, where the X_j are independent random variables distributed as X and k is a Poisson random variable with parameter c. As a result, the randomized estimator θ(P_N, ν_N) projected on Θ will converge weakly to the distribution of Λ(c) + Lap(0, λ) projected on Θ.
Regime 2C.
When lim_{N→∞} N π_N = 0, the downsampled sample average converges weakly to a mass point at 0. As a result, the variance of the additive noise dominates the distribution of the randomized estimator θ(P_N, ν_N), which, when projected on Θ, will converge weakly to Lap(0, λ) projected on Θ.
Regime 3: λ_N → +∞ as N → ∞.
This is the case when privacy guarantees can be strongest. In this regime, however, the Laplace noise increasingly dominates the element of the estimator corresponding to the sample average. The randomized differentially private estimator θ(P_N, ν_N) diverges as N → ∞, meaning that, according to our convention, we need to consider its projection on the parameter space Θ. The distribution of the projected estimator will concentrate on the boundary of the parameter space. The resulting weak limit τ is a discrete random variable with support {Argmin Θ, Argmax Θ}, taking each value with equal probability.
Suppose for simplicity that E consists of one converging sequence of (ε_N, 0). Depending on which of the regimes above are compatible with this sequence, T_E may contain measurable selections with distributions corresponding to those regimes. When a data curator releases the information about the regime, this narrows down the class of regular differentially private estimators T_{N,E} and, thus, results in a “smaller” limiting random set T_E. One has to keep in mind, however, that a data curator may release information that gives only partial knowledge about the regimes compatible with a given differential privacy restriction (as discussed above), which once again will lead to an “increase” in T_E. In an extreme case, when E consists of all converging sequences of (ε_N, 0) with ε_N ≤ ε̄, we have the largest T_{N,E} possible, containing measurable selections with distributions collected across all the regimes.
In Section 2.2.1, we argued that the selection expectation is not a natural object to study under differential privacy.
With a limited privacy budget of a given dataset, it is impossible to construct a function of the data that can reliably converge to an expectation of the differentially private estimator (taken both with respect to the distribution of the data and the random element inducing differential privacy). For the sake of having a more comprehensive discussion, in the Appendix we briefly discuss the properties of the selection expectation of the limiting random set T_E in the context of the example in this section. The very nature of that discussion ultimately ignores the very important issue of the privacy budget.
While our previous discussion considers a general, possibly non-separable form for regular differentially private estimators, all existing approaches to inducing differential privacy lead to much simpler (approximately) separable estimators. We now further narrow down the class of regular differentially private estimators to reflect this property.
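The separable structure ψ(P_N) + a_N(ν_N) can be sketched as follows (a hypothetical example of ours with the sample median as the data functional; for a purely additive mechanism such as this one the residual term is identically zero):

```python
import numpy as np

rng = np.random.default_rng(3)

def separable_dp_estimator(x, noise_scale):
    """A separable randomized estimator: a data functional psi(P_N) plus
    data-independent additive noise a_N(nu_N); the residual Delta_N is 0."""
    psi = np.median(x)                    # any consistent data functional
    a_n = rng.laplace(0.0, noise_scale)   # Laplace-mechanism noise
    return psi + a_n

x = rng.normal(1.0, 1.0, size=100_000)
print(separable_dp_estimator(x, noise_scale=0.01))  # close to the true median 1.0
```

The data-dependent part and the noise part never interact, which is the property the next definition formalizes.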
DEFINITION 8.
We say that a regular differentially private estimator θ(P_N, ν_N) is smooth if there exist a functional ψ(·) and a function a_N with range in V such that
θ(P_N, ν_N) = ψ(P_N) + a_N(ν_N) + Δ_N, with E[(√(−log R̄(N, κ)) Δ_N)²] → 0 as N → ∞ for all κ > 0,
where R̄(N, κ) is provided in Definition 2.
Definition 8 focuses on the regular differentially private estimators that are approximately separable in the way they depend on the data sample and on the noise added to achieve differential privacy. The residual Δ_N is required to be “small” relative to the rate of convergence of the version of the estimator θ(P_N, 0) that is infused with the “trivial” noise that does not perturb it.
We note that for the two early mechanisms proposed to provide differential privacy – the Laplace and Gaussian mechanisms – Definition 8 applies trivially. In the Laplace mechanism differential privacy is achieved by the addition of double-exponential noise ν_N to the original estimator, and in the Gaussian mechanism ν_N is additive normal noise. In other words, by construction, for both of these mechanisms Δ_N ≡ 0.
One interesting example of a popular non-separable mechanism for differential privacy is the exponential mechanism developed in [37]. In application to extremum estimators, the mechanism replaces the extremum estimator θ̂ that maximizes the sample objective function Q(θ; P_N) over θ ∈ Θ with a draw from the quasi-posterior distribution implied by Q(·; P_N). This class of estimators is directly related to the randomized estimators developed in [11]. [11] consider the case where the population analog of the objective function Q(θ; P_N) satisfies the information matrix equality. To form the estimator they propose to consider a prior distribution π(·) over θ and a quasi-likelihood function exp(Q(θ; P_N)) (so that the original objective function is the quasi-log-likelihood function). Then the estimator is the mean of the quasi-posterior distribution ∝ exp(Q(θ; P_N)) π(θ), which consistently estimates the maximizer of the population objective function regardless of the shape of the prior distribution π(·) (under mild regularity conditions). Moreover, the variance of this quasi-posterior distribution accurately estimates the asymptotic variance of the original extremum estimator. A significant advantage of this estimator over the original extremum estimator θ̂ is that it does not require maximization of a potentially non-smooth or hard-to-optimize function Q(θ; P_N).
In the follow-up work in [29], for the cases where Q(θ; P_N) may be steep in the vicinity of the maximum, which may lead to slow convergence of the simulations required to sample from the quasi-posterior, it is proposed to scale the exponent in the pseudo-likelihood function as exp(λ Q(θ; P_N)) using a constant λ which is selected based on the speed of mixing of the simulated Markov chain (produced using the new pseudo-posterior). The corresponding posterior mean remains a consistent estimator for the maximizer of the population objective function, while its asymptotic variance can be estimated by scaling the variance of the quasi-posterior using λ.
The exponential mechanism for differential privacy considered in [37] is a simple implementation of the idea in [11]: the estimator is a single draw from the quasi-posterior ∝ exp(λ Q(θ; P_N)) π(θ).
The resulting estimator turns out to be (2λΔ_Q, 0)-differentially private, where Δ_Q = sup_{θ∈Θ, P_N, P'_N} |Q(θ; P'_N) − Q(θ; P_N)| is the global sensitivity of the objective function Q(θ; P_N), evaluated over all empirical distributions P'_N that differ from P_N in any one single support point.
[11] and later [29] focus on the cases where Q(θ; P_N) is stochastically equicontinuous and the quasi-posterior is asymptotically equivalent to
∝ exp( −(λ/2) (θ − θ̂)' H (θ − θ̂) + o_p(‖θ − θ̂‖²) ),
where H is the Hessian of the population objective function. This means that a single draw from this quasi-posterior, corresponding to the exponential mechanism for differential privacy, can be represented as
θ̃ = θ̂ + (1/√λ) ξ + o_p(1),
where ξ is a multivariate normal random vector with mean zero and covariance matrix H⁻¹. The extremum estimator θ̂ depends only on the data distribution P_N and is not affected by the noise. Therefore, the exponential mechanism is smooth in the sense of Definition 8.
We now present a simple lemma that outlines the additive representation of smooth differentially private estimators, which we will use in our applications.
LEMMA 3.
Consider the additive randomized estimator θ*(P_N, ν_N) = ψ(P_N) + a_N(ν_N), where
• for each κ > 0, lim_{N→∞} P(|ψ(P_N) − θ| > κ) = 0;
• a_N(0) = 0 for all N.
Then the estimator θ*(P_N, ν_N) is regular in the sense of Definition 2. Moreover, for any converging sequence (ε_N, δ_N) and any smooth (ε_N, δ_N)-differentially private estimator θ(P_N, ν_N) (i.e., as in Definition 8) there exists an asymptotically equivalent regular estimator θ*(P_N, ν_N) with the same weak limit in T_E.
We now move on to analyzing the performance of smooth differentially private estimators in some important econometric models.
Regression discontinuity design is an important empirical tool for the estimation of treatment effects in a variety of disciplines. The literature on RDD goes back to the work by [48]. A lot of important theoretical and empirical work on RDD has emerged in Economics over the last two decades, with too many papers to list here. For a general review of this literature, see [19], [22], [31], [10].
In the usual setting for the RD design, the object of interest is the causal effect of a binary treatment on the outcome, and units are either exposed or not exposed to a treatment. The effect of the treatment can be heterogeneous across units, but we will focus on the case when this effect is homogeneous, as our main goal is to highlight the loss of identification of the treatment effect in these models when the estimator is subject to differential privacy guarantees.
Following the tradition of the treatment effect literature, we let Y_0 and Y_1 denote the pair of potential outcomes (without the treatment and with exposure to the treatment, respectively), with the actual outcome denoted by Y and defined as
Y = W · Y_1 + (1 − W) · Y_0,
where W is the treatment indicator. The goal is to evaluate the average treatment effect (ATE) of the treatment. The observables are (W, Y, X), where X is a pre-treatment covariate (the so-called forcing or running variable).
We first give a review of the two main designs used in this literature. We then formulate conditions under which a differentially private mechanism applied to traditional RDD methods leads to the lack of identification of the treatment effect. We then show that these situations of non-identifiability are generic due to the global sensitivity of RDD estimators being bounded away from zero even as the sample size increases.
These findings will no doubt be of interest to researchers, as RD design is usually considered to be one of the most credible identification strategies for causal inference, and it loses this powerful feature under differential privacy.
In the sharp design there is a deterministic relation between the running variable and the treatment indicator:
W = 1(X ≥ c),
and the average causal effect β is given by the discontinuity in the conditional expectation of the outcome given the covariate:
β = lim_{X↓c} E[Y | X] − lim_{X↑c} E[Y | X].
Even though sometimes researchers rely on parametric methods by estimating, for instance, the linear model
Y = α + γ · X + β · W + δ · X · W + ε
(or analogous models with more polynomial terms), these approaches may work poorly in practice due to their reliance on a functional form.
The state-of-the-art methods are local and rely on learning β from a small neighborhood of c, the size of which becomes increasingly smaller as the sample size increases. In other words, these approaches rely on employing only observations in a small neighborhood (c − h, c + h) of c, where the size of the neighborhood is described by a bandwidth h = h(N) that depends on the sample size N, with h(N) → 0 as N → ∞. Such estimators can be roughly classified into two categories: nonparametric regression at the boundary and local linear regression.
Nonparametric regression at the boundary
This method selects a kernel $K(\cdot)$ and takes the estimator
$$\hat{\tau}_{S,NR} = \hat{\tau}_r(c) - \hat{\tau}_l(c), \qquad (3.1)$$
where
$$\hat{\tau}_r(c) = \frac{\sum_{X_i \geq c} Y_i \cdot K\left(\frac{X_i - c}{h}\right)}{\sum_{X_i \geq c} K\left(\frac{X_i - c}{h}\right)}, \qquad \hat{\tau}_l(c) = \frac{\sum_{X_i < c} Y_i \cdot K\left(\frac{X_i - c}{h}\right)}{\sum_{X_i < c} K\left(\frac{X_i - c}{h}\right)}.$$

Local linear regression. This method solves two optimization problems by fitting linear regression functions to the observations within an $h$-neighborhood on either side of the discontinuity point:
$$(\hat{\alpha}_L, \hat{\beta}_L) = \arg\min_{\alpha_L, \beta_L} \sum_{i:\, c - h < X_i < c} \left( Y_i - \alpha_L - \beta_L (X_i - c) \right)^2, \qquad (\hat{\alpha}_R, \hat{\beta}_R) = \arg\min_{\alpha_R, \beta_R} \sum_{i:\, c \leq X_i < c + h} \left( Y_i - \alpha_R - \beta_R (X_i - c) \right)^2,$$
with the treatment effect estimated as the difference of the two intercepts:
$$\hat{\tau}_{S,LL} = \hat{\alpha}_R - \hat{\alpha}_L. \qquad (3.2)$$

In the fuzzy design, the analogous local linear estimator (3.3) can be obtained by the two-stage least squares regression of $Y_i$ on the regressors $1$, $1(X_i - c < 0)(X_i - c)$, $1(X_i - c \geq 0)(X_i - c)$ and the endogenous $W_i$, while using the indicator $1(X_i \geq c)$ as the excluded instrument. This estimator can, of course, be easily generalized to include more polynomial terms in each estimation. In the definition of $\hat{\alpha}_{y,L}$, $\hat{\alpha}_{y,R}$ and $\hat{\alpha}_{w,L}$, $\hat{\alpha}_{w,R}$ we, for simplicity, used the uniform kernel; however, one could use other kernels.

In this section, we establish general results that connect the asymptotic behavior of the global sensitivity with the inconsistency of regular differentially private estimators, allowing us, in light of our notion of identification in Section 2, to draw conclusions about the non-identifiability of the parameter of interest under differentially private mechanisms.

We consider smooth estimators as given in Definition 8. The smoothness property allows us to essentially consider estimators with an additive mechanism noise $\xi_N$:
$$\hat{\theta} = \psi(P_N) + \xi_N \qquad (3.4)$$
(as discussed in the introduction, $\nu_N$ in Definitions 2 and 8, among others, plays the role of the "seed", and thus the actual additive noise $\xi_N$, independent of the data, is a transformation of $\nu_N$). For this estimator to satisfy the regularity requirements in Definition 2, it has to, among other things, be consistent in the absence of any mechanism noise, thus giving the identification of the parameter in the limit of statistical experiments.
This immediately leads us to the condition that
$$\psi(P_N) \xrightarrow{p} \theta_0, \qquad (3.5)$$
where $\theta_0$ denotes the true parameter value.

Suppose the family from which the distribution of $\xi_N$ is drawn is described by the density $f_{N;\sigma}$, where $\sigma^2$ denotes the variance (the actual value of $\sigma$ in practice depends on the sample size). The notation $f_{N;\sigma}$ is not meant to say that the distributional family for the additive noise is fully described by the variance parameter. This parameter is introduced explicitly in the notation because the situations when differentially private mechanisms do not prevent the identification of $\theta_0$ in the limit are usually characterized by the behavior of this parameter as the sample size increases. Even though for now we consider just one distributional family, one has to keep in mind that potentially several different distributions could be used, in which case one has to consider all of them. To keep the exposition simple, we focus on one family $f_{N;\sigma}$, as this already gives a rather comprehensive analysis.

For any $\Delta > 0$, $\sigma > 0$ and $a < b$, define
$$D_N(\Delta, \sigma, a, b) \equiv \sup_{z \in [a,b]} \left| \log f_{N;\sigma}(z) - \log f_{N;\sigma}(z + \Delta) \right|, \qquad (3.6)$$
which is the discrepancy, on the interval $[a, b]$, between the logarithms of two densities with a fixed variance, one of which is evaluated with a shift $\Delta$. As can be seen from Definition 1 of differential privacy, the behaviour of this object across different $\Delta$ and $\sigma$ is directly related to the parameters of differential privacy. The properties of a density imply that if $[a, b]$ is large enough, then for a fixed sample size $N$ we have
$$D_N(\Delta, \sigma, a, b) \to +\infty \text{ as } \sigma \to 0, \; \Delta \neq 0. \qquad (3.7)$$
In the majority of applications the family of distributions $f_{N;\sigma}$ is not indexed by $N$, and the change in the distribution of the estimators with the sample size may be driven by different variances. In fact, this is the case for all commonly used mechanisms – the Laplace, Gaussian and exponential mechanisms and their variations.
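The discrepancy (3.6) is easy to compute numerically. The sketch below does so for mean-zero Gaussian mechanism noise, approximating the supremum by a grid maximum; the values of $\Delta$, $\sigma$ and $[a, b]$ are illustrative. It exhibits the blow-up in (3.7) as $\sigma \to 0$ for a fixed shift.

```python
import numpy as np

def D(delta, sigma, a, b, n_grid=100_001):
    """Discrepancy (3.6) for mean-zero Gaussian noise, approximating the
    supremum over [a, b] by a maximum over a fine grid."""
    z = np.linspace(a, b, n_grid)
    logf = lambda v: -v**2 / (2 * sigma**2) - np.log(sigma * np.sqrt(2 * np.pi))
    return np.max(np.abs(logf(z) - logf(z + delta)))

# For a fixed shift Delta != 0 the discrepancy blows up as sigma -> 0,
# in line with (3.7): a shrinking noise variance is incompatible with a
# non-vanishing sensitivity.
vals = [D(delta=0.5, sigma=s, a=-1.0, b=1.0) for s in (1.0, 0.1, 0.01)]
```

For the Gaussian density the discrepancy has the closed form $\sup_{z \in [a,b]} |2z\Delta + \Delta^2| / (2\sigma^2)$, which the grid maximum reproduces.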
To make our discussion more general, we allow for different distributional families across different $N$, in which case we require the following condition.

Algorithm condition 1 (AC1). For any $\Delta$, $\sigma$ and $a < b$,
$$\sup_N D_N(\Delta, \sigma, a, b) \leq D(\Delta, \sigma, a, b) \qquad (3.8)$$
and
$$D(\Delta, \sigma, a, b) \to +\infty \text{ as } \sigma \to 0, \; \Delta \neq 0. \qquad (3.9)$$

As is clear from the discussion later, in the differential privacy implementation $\Delta$ will be associated with the global sensitivity of $\psi(P_N)$. Condition AC1 tells us that if the global sensitivity remains bounded away from 0 as the sample size increases, then a diminishing variance of the noise and $(\epsilon_N, \delta_N)$, $\epsilon_N \leq \bar{\epsilon}$, $\delta_N \leq \bar{\delta}$, differential privacy guarantees are incompatible with each other. This will allow us to immediately draw conclusions about the non-identifiability of the parameter of interest in the limit of statistical experiments.

We give another algorithmic condition on the family of distributions of the noise variable, which holds generally for differentially private mechanisms and which also helps to establish further results related to the identifiability (or lack thereof) of the parameter. For the very popular mean-zero Laplace and Gaussian mechanisms, the distribution of $\xi_N$ is fully characterized by the variance parameter.

Algorithm condition 2 (AC2). For any $\Delta_N \to 0$ as $N \to \infty$, it is possible to indicate $\sigma_N \to 0$ and $[a_N, b_N]$ such that $H([a_N, b_N], \mathbb{R}) \to 0$ and
$$D(\Delta_N, \sigma_N, a_N, b_N) \to 0, \qquad (3.10)$$
where $D(\Delta, \sigma, a, b)$ is as defined in (3.8).

To give an example, consider the Laplace mechanism (discussed in Example 1) and note that $\log f_\lambda(z + \Delta) = -\frac{|z + \Delta|}{\lambda} - \log(2\lambda)$, and thus $D(\Delta, \sigma, a, b) \leq \frac{|\Delta|}{\lambda}$, which in particular implies that we can even take $a = -\infty$ and $b = +\infty$ in AC2.
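The Laplace bound $D(\Delta, \sigma, a, b) \leq |\Delta|/\lambda$ can be checked numerically; below is a sketch with illustrative values of $\lambda$ and $\Delta$, again using a grid maximum in place of the supremum.

```python
import numpy as np

def log_laplace(z, lam):
    # log density of a mean-zero Laplace(lam) variable
    return -np.abs(z) / lam - np.log(2 * lam)

# Numerical check of D(Delta, sigma, a, b) <= |Delta| / lambda for the
# Laplace mechanism; by the triangle inequality the bound is attained
# for z far to one side of the origin.
lam, delta = 0.7, 0.3
z = np.linspace(-50, 50, 200_001)
D_lap = np.max(np.abs(log_laplace(z, lam) - log_laplace(z + delta, lam)))
```

Since $||z| - |z + \Delta|| \leq |\Delta|$ everywhere, with equality for $z \leq -\Delta$, the computed maximum equals $\Delta/\lambda$ exactly.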
We can see that this case trivially satisfies AC2 and, of course, condition AC1 if one chooses the Laplace mechanism for every $N$.

It is exactly condition AC2 that gives hope for the identifiability of the parameter of interest in cases when the global sensitivity of $\psi(P_N)$ goes to 0. Indeed, the convergence $\Delta_N \to 0$ can then be accompanied by $\sigma_N \to 0$ and intervals $[a_N, b_N]$ converging to the whole real line at the right rates (which clearly depend on the rate at which the global sensitivity decreases) in such a way that $D(\Delta_N, \sigma_N, a_N, b_N)$ remains bounded by a small $\varepsilon_N \to 0$ while the probability mass outside $[a_N, b_N]$ remains below $\delta_N$. This ensures that the differential privacy criteria are satisfied for some sequences $(\epsilon_N, \delta_N)$ converging to $(0, 0)$ while delivering a consistent differentially private estimator. This is given in Proposition 1 below.

Proposition 1. Consider a smooth $(\epsilon_N, \delta_N)$-differentially private estimator, without loss of generality represented as (3.4), and suppose that in the absence of the mechanism noise this estimator (denoted as $\psi(P_N)$) is consistent, i.e. (3.5) holds. Suppose that the global sensitivity of $\psi(P_N)$, denoted as $G(N)$, converges to 0 with the sample size and the mean of $\xi_N$ converges to 0. If AC2 holds, then the differentially private estimator is consistent if $\epsilon_N$ and $\delta_N$ both converge to 0 at slow enough rates.

Note that Proposition 1 implies that an $(\epsilon, \delta)$-differentially private estimator is consistent for fixed $(\epsilon, \delta)$ as well, since these requirements are weaker than requiring that $(\epsilon_N, \delta_N)$ converge to 0.
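A simulation sketch of the logic behind Proposition 1 for the simple case of a mean with known bounded support, where the global sensitivity is $(b - a)/N$; the rate $\epsilon_N = N^{-1/4}$ is an illustrative slow rate, not one prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_mean(x, a, b, eps):
    """Laplace-mechanism mean of data with known support [a, b]: the
    global sensitivity of the sample mean is (b - a) / len(x), so the
    noise scale is that sensitivity divided by eps."""
    sens = (b - a) / len(x)
    return np.clip(x, a, b).mean() + rng.laplace(scale=sens / eps)

# With sensitivity O(1/N), even eps_N -> 0 slowly (here eps_N = N**-0.25)
# leaves the mechanism noise o_p(1), so the DP estimator stays consistent.
errs = []
for N in (10**3, 10**4, 10**5):
    x = rng.uniform(0, 1, N)        # true mean 0.5
    eps_N = N ** -0.25
    errs.append(abs(dp_mean(x, 0, 1, eps_N) - 0.5))
```

The noise scale here is $N^{-3/4}$, which vanishes faster than the sampling error, so the error at the largest sample size is small.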
The result of Proposition 1 gives hope that the parameter of interest can be identified in the limit of experiments with a suitable choice of the set $E$ of sequences $(\epsilon_N, \delta_N)$. Our next step is to establish that a generic $(\epsilon_N, \delta_N)$-differentially private algorithm gives an inconsistent estimator if the global sensitivity remains bounded away from zero as the sample size increases, even if the mean of the mechanism noise $\xi_N$ converges to 0.

THEOREM 4. Consider a smooth $(\epsilon_N, \delta_N)$-differentially private estimator $\hat{\theta}$, without loss of generality represented as (3.4), and suppose that in the absence of the mechanism noise this estimator (denoted as $\psi(P_N)$) is consistent, i.e. (3.5) holds. Suppose that AC1 holds and the global sensitivity of $\psi(P_N)$, denoted as $G(N)$, does not converge to 0 with the sample size, whereas the mean of $\xi_N$ converges to 0. Then this $(\epsilon_N, \delta_N)$-differentially private estimator is inconsistent even if $\epsilon_N$ does not change with $N$.

We end this section by giving a sufficient condition for when the parameter of interest is not identified from differentially private estimation in the limit of statistical experiments.

COROLLARY 1. Consider a class of smooth $(\epsilon_N, \delta_N)$-differentially private estimators $\hat{\theta}$, without loss of generality represented as (3.4), and suppose that in the absence of the mechanism noise these estimators (corresponding to $\psi(P_N)$) are consistent, i.e. (3.5) holds. Suppose that AC1 holds and the global sensitivity of $\psi(P_N)$, denoted as $G(N)$, does not converge to 0 with the sample size, whereas the mean of $\xi_N$ converges to 0. Then for any join-semilattice $E$ of sequences of $(\epsilon_N, \delta_N)$ with $\epsilon_N \leq \bar{\epsilon}$, $\delta_N \leq \bar{\delta}$, the parameter $\theta_0$ is not identified in the limit of experiments.

This corollary directly follows from Theorem 4.
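The mechanics behind Theorem 4 can be previewed in a toy simulation: with the global sensitivity fixed at $G = 1$ (an illustrative value) and a fixed $\epsilon$, the Laplace noise scale $G/\epsilon$ cannot shrink, so the released estimator's dispersion stays bounded away from zero no matter how large $N$ is, even though $\psi(P_N)$ itself is consistent.

```python
import numpy as np

rng = np.random.default_rng(5)

# Non-vanishing sensitivity forces a non-vanishing noise scale G / eps,
# so the released estimator does not concentrate on theta.
eps, G, theta = 1.0, 1.0, 0.5
spreads = []
for N in (10**2, 10**4, 10**6):
    psi = theta + rng.normal(0, 1 / np.sqrt(N), 20_000)   # consistent core
    released = psi + rng.laplace(scale=G / eps, size=20_000)
    spreads.append(released.std())
```

Across the three sample sizes the standard deviation of the released estimator stays near $\sqrt{2}\,G/\epsilon$, the standard deviation of the Laplace noise, instead of shrinking.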
In this section, we use the results of Section 3.2 to analyze the identifiability of the average treatment effect under differentially private regression discontinuity design estimators. Even though this section focuses on the issue of identifiability, there are other important issues one would want to explore in the RDD framework. One of them is the question of how differential privacy requirements would affect the visual analysis of the data, which is one of the fundamental steps in the practice of RDD. Another is the question of the credibility of specification tests (continuity of the density of the running variable at the cutoff, placebo tests with pre-treatment covariates) under differential privacy. Even though our main focus is on the identifiability of the average treatment effect under differential privacy, we discuss these other related issues in Section 3.4, albeit in less detail.

Even though there are some parametric RDD estimation methods that are global in nature, the state-of-the-art RDD techniques are local and employ elements of nonparametric methods. These methods focus on a neighborhood around the switch point, with the size of this neighborhood determined by a kernel $K(\cdot)$ and a respective bandwidth $h = h(N)$. We suppose that $h(N)$ is chosen by the differentially private algorithm according to some rule in such a way that $h(N) = o(1)$ as $N \to \infty$. Then the expected number of observations from a sample of size $N$ in a right-hand side neighborhood of $c$ is $N \cdot Pr(c \leq X < c + h(N))$, and in the left-hand side neighborhood it is $N \cdot Pr(c - h(N) < X < c)$. There are, of course, well known approaches for selecting a bandwidth, such as [21] and [9], among others. Our analysis applies to general bandwidth choices.

We start with the analysis of differentially private RDD estimators for the sharp design.
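To fix ideas before the formal results, here is a sketch of the boundary estimator (3.1) on simulated sharp-design data. The data-generating process, cutoff, effect size and bandwidth are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def tau_boundary(X, Y, c, h, K):
    """Nonparametric regression at the boundary, eq. (3.1): the
    kernel-weighted mean of Y on each side of c, then the difference."""
    w = K((X - c) / h)
    right, left = X >= c, X < c
    tau_r = np.sum(w[right] * Y[right]) / np.sum(w[right])
    tau_l = np.sum(w[left] * Y[left]) / np.sum(w[left])
    return tau_r - tau_l

# Triangular kernel: continuous, with bounded support [-1, 1].
triangular = lambda u: np.clip(1 - np.abs(u), 0, None)

rng = np.random.default_rng(1)
N, c, beta = 100_000, 0.0, 0.5       # illustrative sharp-design DGP
X = rng.uniform(-1, 1, N)
Y = 0.3 * X + beta * (X >= c) + rng.normal(0, 0.1, N)
est = tau_boundary(X, Y, c, h=0.05, K=triangular)
```

With a shrinking bandwidth the estimator recovers the jump $\beta$ up to a small smoothing bias of order $h$.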
We begin our series of formal results by establishing the results on the global sensitivity of nonparametric regression at the boundary and local linear (polynomial) estimation. Propositions 2-4 look at nonparametric regression at the boundary and the various properties of kernels that affect the global sensitivity result. Proposition 5 looks at local linear estimation. In light of the results in Section 3.2, the knowledge of the asymptotic behavior of the global sensitivity of these estimators allows us to analyze whether smooth differentially private approaches are compatible with the identifiability of the ATE of interest. As we show, the exact results on the global sensitivity depend even on the type of kernel used in the above-mentioned estimation techniques.

Before formulating Proposition 2, we define what we mean by kernels with a bounded support.

DEFINITION 9. We say that the kernel function $K(\cdot): \mathbb{R} \to \mathbb{R}_+$ has a bounded support if there is a value $\bar{u} > 0$ such that $K(u) = 0$ when $|u| > \bar{u}$. If this condition is not satisfied, then we say that the kernel has an unbounded support.

The uniform, Epanechnikov and triangular kernels are examples of kernel functions with bounded supports. The Gaussian and logistic kernels are examples of kernel functions with unbounded supports. Even if one considers kernels with a bounded support, we will see that it makes a difference whether the kernel is continuous (like the triangular kernel) or has discontinuities (like the uniform kernel).

DEFINITION 10. For a given kernel function $K(\cdot)$ with a bounded support and a given bandwidth $h$, we define a $K$-$h$-neighborhood to the right of $c$ as a set $[c, c + \Delta_{K,r}(h))$, where $\Delta_{K,r}(h) > 0$, such that $K\left(\frac{u - c}{h}\right) > 0$ for $u > c$ if and only if $u \in [c, c + \Delta_{K,r}(h))$.
In other words, this is the set of points to the right of $c$ that will be used in the nonparametric regression. Analogously, we define a $K$-$h$-neighborhood to the left of $c$ as a set $(c - \Delta_{K,l}(h), c)$, where $\Delta_{K,l}(h) > 0$, such that $K\left(\frac{u - c}{h}\right) > 0$ for $u < c$ if and only if $u \in (c - \Delta_{K,l}(h), c)$.

A differentially private algorithm takes the support of the observed variables as given and usually depends on this support; thus, it uses the supports of $Y$ in both the right-hand side and left-hand side $K$-$h$-neighborhoods as inputs. For a kernel with a bounded support, these supports can be denoted as $\mathcal{Y}_r(h)$ and $\mathcal{Y}_l(h)$, respectively, and may generally depend on $h$. However, they are naturally approximated by
$$\mathcal{Y}_r = \lim_{h \downarrow 0} \mathcal{Y}_r(h) \quad \text{and} \quad \mathcal{Y}_l = \lim_{h \downarrow 0} \mathcal{Y}_l(h), \qquad (3.11)$$
which no longer depend on the bandwidth choice (these limits are well defined as $\{\mathcal{Y}_r(h)\}$ and $\{\mathcal{Y}_l(h)\}$ are sequences of monotonically decreasing events).

We will suppose that $\mathcal{Y}_r$ and $\mathcal{Y}_l$ are convex non-singleton sets. As further notation, we will use
$$\bar{Y}_r = \sup \mathcal{Y}_r, \quad \underline{Y}_r = \inf \mathcal{Y}_r, \quad \bar{Y}_l = \sup \mathcal{Y}_l, \quad \underline{Y}_l = \inf \mathcal{Y}_l.$$
We now present a series of results on the global sensitivity.

Proposition 2.
Consider a nonparametric regression at the boundary estimator that uses a continuous kernel with a bounded support. Suppose that for a data-driven choice of bandwidth $h = h(N)$, for any sample size $N$ it is possible to have realizations of the data $\{(Y_i, W_i, X_i)\}_{i=1}^N$ that deliver the minimum number of observations $m_r(N) \geq 1$ in the $K$-$h$-neighborhood to the right of $c$ and the minimum number of observations $m_l(N) \geq 1$ in the $K$-$h$-neighborhood to the left of $c$.

(a) If the supports $\mathcal{Y}_r$ and $\mathcal{Y}_l$ are bounded, then the global sensitivity of the nonparametric regression at the boundary estimator is $(\bar{Y}_r - \underline{Y}_r) + (\bar{Y}_l - \underline{Y}_l)$ and, hence, it does not depend on the sample size.

(b) If at least one of the supports $\mathcal{Y}_r$ and $\mathcal{Y}_l$ is unbounded, the global sensitivity of the nonparametric regression at the boundary estimator is $+\infty$.

We next consider the case of a kernel function with a bounded support and discontinuities at the support boundaries $-\bar{u}$ and $\bar{u}$, with the main example here being the uniform kernel. We define
$$\underline{K} \equiv \inf_{u \in (-\bar{u}, \bar{u})} K(u). \qquad (3.12)$$
The discontinuities of $K(\cdot)$ at $-\bar{u}$ and $\bar{u}$ imply that $\underline{K} > 0$. For expositional simplicity, in the formulation of Proposition 3 we only indicate the rate of the global sensitivity. However, the proof of this proposition in the Appendix gives the exact expression for this sensitivity.

Proposition 3. Consider a nonparametric regression at the boundary estimator that uses a kernel with a bounded support and $\underline{K} > 0$, where $\underline{K}$ is as defined in (3.12). Suppose that for a data-driven choice of bandwidth $h$, for any sample size $N$ it is somehow possible to guarantee the minimum number of observations $m_r(N) \geq 1$ in the $K$-$h$-neighborhood to the right of $c$ and the minimum number of observations $m_l(N) \geq 1$ in the $K$-$h$-neighborhood to the left of $c$.

(a) If the supports $\mathcal{Y}_r$ and $\mathcal{Y}_l$ are bounded, then the global sensitivity of the nonparametric regression at the boundary estimator is proportional to $1/\min\{m_r(N), m_l(N)\}$.
(b) If at least one of the supports $\mathcal{Y}_r$ or $\mathcal{Y}_l$ is unbounded, the global sensitivity of the nonparametric regression at the boundary estimator is $+\infty$.

Part (a) of Proposition 3 seemingly gives some hope of achieving a situation in which the global sensitivity goes to zero as $N \to \infty$, provided it can be ensured that $\min\{m_r(N), m_l(N)\} \to \infty$. This hope, however, is a false one, as being able to guarantee a minimal number (growing to $\infty$) of observations in each neighborhood for any sample $\{X_i\}_{i=1}^N$ with a given support of $X$ is a rather hypothetical scenario. Indeed, the probabilities that the number of observations in the right- and left-hand side neighborhoods is strictly less than $m_r(N)$ and $m_l(N)$, respectively, are
$$\sum_{k=0}^{m_r(N)} \binom{N}{k} F_X(c)^{N-k} \left(1 - F_X(c)\right)^{k} \quad \text{(the probability of fewer than } m_r(N) \text{ observations to the right of } c\text{)},$$
$$\sum_{k=0}^{m_l(N)} \binom{N}{k} F_X(c)^{k} \left(1 - F_X(c)\right)^{N-k} \quad \text{(the probability of fewer than } m_l(N) \text{ observations to the left of } c\text{)}.$$
Both these probabilities are clearly strictly positive when $c$ is an interior point of the support of $X$. The non-stochastic nature of the global sensitivity concept effectively leads to the global sensitivity always being bounded away from 0 as $N \to \infty$ in the case of a kernel with a bounded support and $\underline{K} > 0$.

Finally, we consider kernels with unbounded support (such as the Gaussian kernel). In this case we take it that a differentially private algorithm uses the supports of $Y \mid X \geq c$ and $Y \mid X < c$, since the kernel weights are technically never equal to zero. Let us denote these supports as $\mathcal{Y}_{r,all}$ and $\mathcal{Y}_{l,all}$, respectively. Also denote
$$\bar{Y}_{r,all} = \sup \mathcal{Y}_{r,all}, \quad \underline{Y}_{r,all} = \inf \mathcal{Y}_{r,all}, \quad \bar{Y}_{l,all} = \sup \mathcal{Y}_{l,all}, \quad \underline{Y}_{l,all} = \inf \mathcal{Y}_{l,all}.$$
Since the kernel approaches zero arbitrarily closely, the global sensitivity results for this case are similar to those in Proposition 2, where the infimum of the values of the kernel on its bounded support is 0.

Proposition 4.
Consider a nonparametric regression at the boundary estimator that uses a kernel function with an unbounded support.

(a) If the supports $\mathcal{Y}_{r,all}$ and $\mathcal{Y}_{l,all}$ are bounded, then the global sensitivity of the nonparametric regression at the boundary estimator is $(\bar{Y}_{r,all} - \underline{Y}_{r,all}) + (\bar{Y}_{l,all} - \underline{Y}_{l,all})$; that is, it does not depend on the sample size.

(b) If at least one of the supports $\mathcal{Y}_{r,all}$ and $\mathcal{Y}_{l,all}$ is unbounded, then the global sensitivity of the nonparametric regression at the boundary estimator is $+\infty$.

(Differentially private algorithms could potentially use a more complicated support for $Y \mid X$ that depends on $X$. This would not change our qualitative findings on the global sensitivity being bounded away from 0, even though the exact numerical values for the global sensitivities may be different.)

Thus, the results of Propositions 2-4 and the discussion following Proposition 3 lead us to conclude that the global sensitivity of a nonparametric regression at the boundary estimator is always bounded away from zero. The implications of this for the asymptotic properties of differentially private estimators are given in Theorem 4, and Corollary 1 allows us to conclude that the ATE is not identified in the limit of statistical experiments of smooth differentially private nonparametric regression at the boundary estimators.

Our next step is to analyze whether things get better with a local linear (and, more generally, polynomial) estimator as defined in (3.2). Proposition 5 below establishes that this is not the case: the global sensitivity of this estimator is bounded away from zero, and may even be infinite, even if the support of the outcome variable is bounded.

Proposition 5. Consider the local linear estimator as defined in (3.2). The global sensitivity of this estimator is bounded away from zero as $N \to \infty$.

Theorem 4 and Corollary 1 allow us to conclude that the ATE is not identified in the limit of statistical experiments of smooth differentially private local linear estimators. Our findings for the sharp design are summarized in Theorem 5.
THEOREM 5. In the sharp regression discontinuity design case, any smooth $(\varepsilon_N, \delta_N)$-differentially private nonparametric regression at the boundary estimator and any smooth $(\varepsilon_N, \delta_N)$-differentially private local linear estimator is inconsistent for any bounded sequences of positive $\{\varepsilon_N\}$ and non-negative $\{\delta_N\}$.

If we add other covariates to our estimation or use more terms in the local polynomial estimation, the conclusion of Theorem 5 remains exactly the same, even though quantitatively the global sensitivities of the estimators in Propositions 2-5 could be different. Indeed, from the proofs in the Appendix one can easily see that the global sensitivity for the local polynomial estimator would once again rely on the formula for the OLS estimator of the intercept, whereas for the nonparametric regression at the boundary estimator the lower bounds on the global sensitivities could be obtained in the same way as in the proofs of Propositions 2-4.

In the case of the fuzzy design, the results for the global sensitivities of the estimators are analogous to the sharp design case. Naturally, we are also able to conclude that differentially private versions of traditional estimators in this framework are inconsistent. Indeed, as discussed in Section 3.1.2, the estimator (3.1) may be used in the fuzzy design case as well, with asymptotic properties analogous to the sharp design scenario. The global sensitivity of this estimator remains the same and, therefore, its properties are described by Propositions 2-4. As for the local linear estimator, the result is the same, once again, even though the proof is slightly more elaborate than in the sharp design case. For the sake of completeness, we establish this result formally in Proposition 6 below.

Proposition 6. Consider the local linear estimator as defined in (3.3). The global sensitivity of this estimator is bounded away from 0 as $N \to \infty$.

For simplicity, the proof of Proposition 6 in the Appendix does not employ other covariates.
It is worth mentioning, however, that the situation with other covariates (call them $S_i$, with support $\mathcal{S}$) may be even worse, as in that case we may have
$$\inf_{s \in \mathcal{S}} \left| \lim_{x \downarrow c} P(W_i = 1 \mid X_i = x, S_i = s) - \lim_{x \uparrow c} P(W_i = 1 \mid X_i = x, S_i = s) \right| = 0.$$
If this situation occurs, then even for the nonparametric regression at the boundary type of estimators the global sensitivities may converge to $\infty$ as $N \to \infty$, with even more severe implications for the asymptotic properties of smooth differentially private estimators.

In the definition of the local linear estimator in Section 3.1.2 we used the uniform kernel for simplicity. However, the result of Proposition 6 remains true if other kernels are used. Relying on the results in Propositions 2-5 and 6 and Theorem 4, we immediately obtain Theorem 6 below.

THEOREM 6. In the fuzzy regression discontinuity design, any smooth $(\varepsilon_N, \delta_N)$-differentially private nonparametric regression at the boundary estimator and any smooth $(\varepsilon_N, \delta_N)$-differentially private local linear estimator is inconsistent for any bounded sequences of positive $\{\varepsilon_N\}$ and non-negative $\{\delta_N\}$.

To summarize the results of Theorems 5 and 6 in Sections 3.3.1 and 3.3.2: in the sharp and fuzzy regression discontinuity designs, the requirements of $(\varepsilon, \delta)$-differential privacy, either with fixed $\varepsilon, \delta$ or with these parameters decreasing with the sample size, are incompatible with the consistent estimation of the average treatment effect. Therefore, given our notion of identifiability in Section 2, they are also incompatible with the identifiability of the average treatment effect in the limit of statistical experiments.

So far we have mostly focused on the non-identifiability of the average treatment effect under smooth differentially private mechanisms. However, every regression discontinuity design analysis is traditionally accompanied by specification testing.
This includes checking for the possibility of other changes at the cutoff value $c$ of the forcing variable $X_i$, and also checking for manipulation of $X_i$.

The first type of check includes testing the null hypothesis of a zero average effect on pseudo-outcomes known not to be affected by the treatment. The outcomes of such placebo tests would also have to be $(\varepsilon_N, \delta_N)$-differentially private. A traditional approach in the differential privacy literature would add noise to the true test statistic and then adjust the asymptotic distribution to compute correct $p$-values (see e.g. [49]). It is not surprising that, once again, this brings a range of issues in the context of placebo tests in regression discontinuity designs. Indeed, let $\hat{\tau}_{pl}$ denote the regression discontinuity design estimator in the analysis of the treatment of pseudo-outcomes, and let $\tau_{pl}$ denote the true parameter. The testing of the null $H_0: \tau_{pl} = 0$ is based on the $t$-ratio
$$\frac{\hat{\tau}_{pl}}{se(\hat{\tau}_{pl})}.$$
Without giving formal results on this, we nevertheless want to point out that, utilizing the techniques in the proofs of Propositions 2-5 and 6, we can establish that the global sensitivity of this ratio either increases to $\infty$ with the sample size or is already $\infty$ in a finite sample. This implies that the noise added to this ratio would asymptotically dominate it. Even if the critical values are corrected to account for the added noise, it is clear that the conclusions of the $(\varepsilon_N, \delta_N)$-differentially private test based on such a procedure are not credible and, in particular, result in a lower power of the test (in the limit, the power of this test is trivial). To make our point more transparent, let us focus on a stylized version of the test in which the asymptotic variance of $\sqrt{Nh}(\hat{\tau}_{pl} - \tau_{pl})$ is known. We denote it as $Avar(\hat{\tau}_{pl})$.
In this stylized version we want to create a differentially private version of the test statistic
$$t_N = \frac{\sqrt{Nh}\,\hat{\tau}_{pl}}{\sqrt{Avar(\hat{\tau}_{pl})}}.$$
As we have shown in Sections 3.3.1 and 3.3.2, the global sensitivity of $\hat{\tau}_{pl}$ may be constant and bounded away from zero, or may even be infinite for every $N$ (in situations when we add other covariates, it may increase to $\infty$ with the sample size). Given that $h = h(N)$ is chosen in a way that gives $Nh \to \infty$, this immediately implies that the global sensitivity of $t_N$ either increases to infinity with the sample size or is infinite. This means that the variance of the independent noise added by the differentially private algorithm will increasingly dominate the asymptotically constant variance of $t_N$. Instead of using the standard normal critical values, one would take the critical values from the distribution that suitably combines the standard normal distribution and the distribution of the noise. However, as $N$ increases, the testing essentially becomes inference about the mechanism noise, leading to a decreasing power of the test, which asymptotically diminishes to the trivial power.

(There are also approaches to hypothesis testing in the differential privacy literature that are based on adding noise to the inputs; see e.g. [23]. They may, however, be considered less reliable than approaches based on output perturbation. In contrast, in the simpler case of the estimation of a mean, as in Example 1, the global sensitivity of the $t$-ratio would be finite and bounded away from zero as $N$ increases, so that, upon the correction of the critical values, the test would remain informative.)

The second type of test is for manipulation of the forcing variable. We illustrate the issues associated with $(\varepsilon, \delta)$-differentially private versions of these tests by considering the test of the continuity of the density at the cutoff by [36].
The test is based on the ratio $\hat{\theta} / \hat{\sigma}_\theta$, where
$$\hat{\theta} = \ln \hat{f}^+ - \ln \hat{f}^-, \qquad \hat{\sigma}_\theta = \sqrt{\frac{1}{Nh}\left(\frac{1}{\hat{f}^+} + \frac{1}{\hat{f}^-}\right)},$$
$h$ is the bandwidth, and $\hat{f}^+$ is the local linear density estimator to the right of the cutoff,
$$\hat{f}^+ = \sum_{X_i > c} K\left(\frac{X_i - c}{h}\right) \frac{S^+_{N,2} - S^+_{N,1}(X_i - c)}{S^+_{N,2} S^+_{N,0} - \left(S^+_{N,1}\right)^2}\, Y_i, \qquad S^+_{N,k} = \sum_{X_i > c} K\left(\frac{X_i - c}{h}\right)(X_i - c)^k, \qquad (3.13)$$
with $\hat{f}^-$ defined analogously over the observations with $X_i < c$.

We illustrate our findings in two simulation scenarios.

Scenario 1. The forcing variable $X$ has a uniform distribution on $[-1, 1]$, the treatment indicator is $W_i = 1(X_i \geq 0)$, and the outcome is $Y_i = m(X_i) + u_i$ with
$$m(x) = \begin{cases} 0.35 + 1.27\,x + 7.18\,x^2 + 20.21\,x^3 + 21.54\,x^4 + 7.33\,x^5, & \text{if } x < 0, \\ 0.65 + 0.84\,x - 3.00\,x^2 + 7.99\,x^3 - 9.01\,x^4 + 3.56\,x^5, & \text{if } x \geq 0, \end{cases}$$
and the error $u$ having a symmetric uniform distribution (so that its support is $[-\sigma_u \sqrt{3}, \sigma_u \sqrt{3}]$ for standard deviation $\sigma_u$).

Panels 1 and 2 in Figure 1 depict the paths of the differentially private local linear estimator when the mechanism noise variance is held fixed at small values (0.002 in Panel 2) for any sample size; in each case, a conservative lower bound on the global sensitivity determines the corresponding value of $\epsilon_N$, with $\delta_N = 0$. Panel 3 in Figure 1 shows the paths of the estimator when the mechanism noise variance is equal to 2 for any sample size. (See our discussion in the proof of Proposition 5 in the Appendix.) Finally, Panel 4 in Figure 1 illustrates the paths of the estimator when the mechanism noise variance is equal to 200 for any sample size (for a conservative lower bound on the global

N = 500:  1  1  0.6846  0.3706  0.0666  0.0286  0.0560  0.0230
N = 2000: 1  1  0.7252  0.3880  0.0664  0.0290  0.0666  0.0272
N = 5000: 1  1  0.7260  0.3844  0.0694  0.0284  0.0594  0.0250

Table 1: Rejection rates in 5000 simulations of the false null hypothesis $H_0: \tau = 0$ in Scenario 1. $N$ denotes the number of observations.
sensitivity of the estimator, this would correspond to $\epsilon_N$ being equal to one-tenth of this value, with $\delta_N = 0$). Note the different ranges of the values on the vertical axes of these panels.

In Table 1 we focus on the rejection of the null $H_0: \tau = 0$ against $H_1: \tau \neq 0$ when a researcher uses differentially private estimates and their standard errors (note that this is different from our discussion of the differentially private release of $t$-tests in Section 3.4).

Scenario 2. The only difference from Scenario 1 is that $u$ is normally distributed with mean zero, so that the support of the outcome is unbounded. In this case, to guarantee $\epsilon_N$-differential privacy with $\epsilon_N \leq \bar{\epsilon}$, the noise in the Laplace differentially private mechanism has to be drawn from a distribution with an infinite variance. Figure 2 shows the paths of differentially private local linear estimators when the variance is equal to 10. Here we could have conducted a similar power analysis based on a large number of simulations, as in Scenario 1, and we would have found that the power of the test of $H_0: \tau = 0$ vs $H_1: \tau \neq 0$ based on differentially private estimates is very low.

A central problem in evaluation studies is that the potential outcome that a program participant would have received in the absence of the program is not observed. Letting $D_i$ denote a binary variable taking the value 1 if treatment was given to agent $i$, and 0 otherwise, and letting $Y_i(0)$, $Y_i(1)$ denote the potential outcome variables, we refer to $Y_i(1) - Y_i(0)$ as the treatment effect for the $i$-th individual. A parameter of interest for identification and estimation is the average treatment effect, defined as
$$\theta_0 = E[Y_i(1) - Y_i(0)]. \qquad (4.1)$$
As in the previous section, our notation is to denote realizations of random variables by lower case letters and the random variables themselves by capital letters. One identification strategy for $\theta_0$ was proposed in [44], under the following assumption:

ASSUMPTION 2 (ATE under Conditional Independence).
Let the following hold:
(i) There exists an observed variable $X_i$ such that $D_i \perp (Y_i(0), Y_i(1)) \mid X_i$;
(ii) $0 < P(D_i = 1 \mid X_i) < 1$ for all $X_i$.

See also [18], [20], [4]. The above assumption can be used to identify $\theta_0$ as
$$\theta_0 = E\left[ E[Y \mid D = 1, X] - E[Y \mid D = 0, X] \right]. \qquad (4.2)$$
This parameter can also be written as
$$\theta_0 = E\left[ \frac{Y (D - p(X))}{p(X)(1 - p(X))} \right], \qquad (4.3)$$
where $p(X) = P(D = 1 \mid X)$ is the propensity score. This parameter is a weighted moment condition whose denominator gets small when the propensity score approaches 0 or 1. Also, identification is lost when we remove any region of the support of $X$ (so fixed trimming will not identify $\theta_0$ above).

Consider the general setting of the treatment effect model under unconfoundedness with two potential continuous outcomes $Y(0)$ and $Y(1)$ and treatment $D$, along with the vector of (continuous and discrete) covariates $X$. We assume that $(Y(0), Y(1)) \perp D \mid X$. The observed outcome is
$$Y = Y(1) D + Y(0)(1 - D).$$
In our setup the propensity score needs to be estimated as a function of $X$. In the further discussion, without loss of generality, we assume that $X$ is single-dimensional. Our theory is based on the following structure:

ASSUMPTION 3. (i) $X$ has a support $\mathcal{X}$ that is a closed and connected (but possibly unbounded) set.
(ii) $(Y(0), Y(1)) \mid X = x$ has an absolutely continuous density for each $x \in \mathcal{X}$. Moreover, the support of $Y(k)$ for $k = 0, 1$ is bounded.
(iii) The propensity score is strictly positive, $p(\cdot) > 0$, on its support.

We consider the following procedure to implement an estimator for $\theta_0$. First, a nonparametric estimator is used to estimate the propensity score:
$$\hat{P}(x) = \frac{\frac{1}{N}\sum_{i=1}^N D_i K\left(\frac{x - X_i}{h_N}\right)}{\frac{1}{N}\sum_{i=1}^N K\left(\frac{x - X_i}{h_N}\right)}, \qquad (4.4)$$
where $K(\cdot)$ is a symmetric kernel and $h_N$ is the bandwidth. Then the average treatment effect $\theta_0$ is estimated as
$$\hat{\theta} = \psi(P_N) \equiv \frac{1}{N}\sum_{i=1}^N \left( \frac{Y_i D_i}{\hat{P}(X_i)} - \frac{Y_i (1 - D_i)}{1 - \hat{P}(X_i)} \right).$$
(4.5)

In our analysis we focus on the kernel-based estimator for the propensity score without loss of generality. One can use a different approach, such as a series estimator, where the number of terms used to approximate the function would play the role of the tuning parameter equivalent to the bandwidth.

We consider kernel functions $K(\cdot)$ with sub-polynomial tail behavior. In particular, we assume that there exists a natural number $d > 3$ such that for all $k \le d$, $\lim_{|z|\to\infty} |z|^k K(|z|) = 0$. This ensures the existence of moments of kernel-weighted statistics over the distribution of $X$, which is particularly helpful when the support of $X$ is unbounded. We note that all "standard" kernel functions, such as the bounded-support uniform, quadratic and Epanechnikov kernels, as well as the most commonly used Gaussian kernel, satisfy this condition.

The bandwidth is required to satisfy $h_N \gg \frac{\log N}{N}$ (see [42]) to ensure uniform convergence of the propensity score estimator, and is typically chosen so that $h_N = o(N^{-1/4})$ to avoid the propagation of the non-parametric bias to the estimator of the average treatment effect.

We now consider the impact of differential privacy on the estimation of $\theta$. As in Section 3, we rely on our smoothness assumption that allows us to focus on additive mechanisms to induce differential privacy. Also, as in our previous analysis of the regression discontinuity design, we start with the analysis of the global sensitivity of $\psi(P_N)$.

Proposition 7. Suppose that the average treatment effect estimator (4.5) uses the propensity score estimator (4.4) and the kernel function $K(\cdot)$ is such that $|K(\cdot)| \le \bar K$ and there exists a natural number $d > 3$ such that for all $k \le d$, $\lim_{|z|\to\infty} |z|^k K(|z|) = 0$. Then:
(i) If the support $\mathcal X$ is bounded and $h_N = o(N^{-1/4})$, the global sensitivity of the functional $\psi(P_N)$ in (4.5) is bounded away from zero as $N \to \infty$.
(ii) If the support $\mathcal X$ is unbounded, then the global sensitivity of the functional $\psi(P_N)$ in (4.5) is $+\infty$.
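To make the global sensitivity concrete, the following sketch implements the two-step procedure in (4.4)-(4.5) and then recomputes the estimate after changing a single observation, which is the thought experiment behind the sensitivity bound. This is an illustrative implementation under our own assumptions, not the paper's code: the Gaussian kernel, the data-generating process, the bandwidth exponent, and the numerical clipping of $\widehat P$ are all our choices (the text notes that fixed trimming destroys identification, so the clip here is purely a numerical guard, not part of the estimator).

```python
import numpy as np

def propensity(x, X, D, h):
    """Nadaraya-Watson estimate of P(D=1|X=x) with a Gaussian kernel, as in (4.4)."""
    w = np.exp(-0.5 * ((x - X) / h) ** 2)  # kernel weights K((x - X_i)/h)
    return (w * D).sum() / w.sum()

def ate_ipw(X, D, Y, h):
    """Propensity-score-weighted ATE estimator, as in (4.5)."""
    p = np.array([propensity(x, X, D, h) for x in X])
    p = np.clip(p, 1e-3, 1 - 1e-3)  # numerical guard only, not trimming
    return np.mean(Y * D / p - Y * (1 - D) / (1 - p))

rng = np.random.default_rng(0)
N = 2000
X = rng.uniform(-1, 1, N)
p_true = 1 / (1 + np.exp(-X))                     # true propensity score
D = (rng.uniform(size=N) < p_true).astype(float)
Y1 = 1.0 + X + rng.normal(0, 0.2, N)              # potential outcomes, true ATE = 1
Y0 = X + rng.normal(0, 0.2, N)
Y = D * Y1 + (1 - D) * Y0
h = N ** (-0.3)                                   # undersmoothing: h_N = o(N^{-1/4})

theta_hat = ate_ipw(X, D, Y, h)

# One-observation perturbation: the empirical analogue of global sensitivity.
X2, D2, Y2 = X.copy(), D.copy(), Y.copy()
X2[0], D2[0], Y2[0] = 0.99, 1.0, Y.max()
delta = abs(ate_ipw(X2, D2, Y2, h) - theta_hat)
print(round(theta_hat, 2), delta > 0)
```

The perturbation `delta` stays strictly positive however large $N$ is when the perturbed point lands where the estimated propensity score is extreme, which is the mechanism behind part (i) of the proposition.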
This result allows us to formulate the following theorem.

THEOREM 7. For estimation of the average treatment effect, any smooth $(\varepsilon_N, \delta_N)$-differentially private propensity score-weighted estimator is inconsistent for any bounded sequence $\{\varepsilon_N\}$ and non-negative $\{\delta_N\}$. As a result, the limiting random set $T_{\mathcal E}$ contains at least one non-degenerate element different from $\{\theta\}$.

Differential privacy is a powerful data security concept that precludes a potential adversary from linking sensitive data with outside information, inferring data attributes, or determining whether a particular individual is included in the dataset. The implementation of differentially private data analysis is based on the consideration of randomized estimators, where independent randomness is the key instrument that provides the differential privacy guarantee.

In this paper we focused on the identification of econometric models under differential privacy. We concluded that even for relatively simple models, identification in this context requires concepts and methods from random set theory. We consider identification from the perspective of the limit of statistical experiments where a differentially private implementation of the estimator is applied to datasets of increasing size. Identification in this case is a property of the set of weak limits of such estimators. Under our mild regularity conditions this limiting set is a convex compact random set and, thus, it needs to be characterized in probabilistic terms, for instance using the containment functional.

We apply our theory to two popular econometric models: the regression discontinuity design (RDD) and the average treatment effect (ATE). In the RDD setting we consider both sharp and fuzzy designs. We show that for both models the random set of weak limits of differentially private estimators contains non-degenerate random elements, which precludes point identification of the parameters of interest.
We illustrate this finding in a series of Monte Carlo simulations. Our result is, in part, driven by the structure of the estimators, which have to rely on local properties of the underlying distribution. This may indicate that under differential privacy a similar behavior is to be expected for other econometric models that rely on nuisance parameters.

Proof of Lemma 1. Let $\omega_\nu$ be the element of the $\sigma$-algebra $\mathcal F_\nu$ associated with the random element $\nu_N$, and let $\omega_S$ be the element of the $\sigma$-algebra of the subsets of $\mathcal Z^n$. Since $T_N$ is the closure of the set of measurable selections that form a closed and bounded space, for almost all elements $(\omega_\nu, \omega_S)$ the set of values $\theta(P_N(\cdot;\omega_S), \nu_N(\omega_\nu))$ is closed and bounded and, therefore, compact. Thus, the random set $T_N$ is compact.

To prove convexity, it is enough to consider $\theta(P_N, \nu_N)$ and $\theta'(P_N, \nu_N)$ that are realizations of two regular $(\varepsilon_N, \delta_N)$-differentially private and $(\varepsilon'_N, \delta'_N)$-differentially private estimators, respectively, where the sequences $(\varepsilon_N, \delta_N)$ and $(\varepsilon'_N, \delta'_N)$ are in $\mathcal E$. Then by the union bound their convex combination satisfies (2.2) with a right-hand side bound of at most $2\bar R(n, \kappa)$. Also, any convex combination $\tau\theta(S_N, \nu_N) + (1-\tau)\theta'(S_N, \nu_N)$ is a realization of the estimator $\tau\theta(\cdot,\cdot) + (1-\tau)\theta'(\cdot,\cdot)$. This estimator is differentially private for the sequence $(\max\{\varepsilon_N, \varepsilon'_N\}, \max\{\delta_N, \delta'_N\})$, which belongs to $\mathcal E$ by our assumption that $\mathcal E$ is a join-semilattice. Finally, note that the estimator $\tau\theta(\cdot,\cdot) + (1-\tau)\theta'(\cdot,\cdot)$ has a weak limit by the continuous mapping theorem, as it is straightforward to show that $(\theta, \theta')^\top(\cdot,\cdot)$ has a joint weak limit (of course, we use the fact that $\theta, \theta'$ do not depend on $N$). Thus, the set $T_N$ is a convex random set.
□

Proof of Lemma 2: Assume, contrary to the statement of the lemma, that $\Delta_N = \theta(S_N, \nu_N) - \tau \xrightarrow{p} 0$. Then $\theta(S_N, \nu_N) = \tau + \Delta_N$, and because $\tau$ is not constant, conditional on $S_N$ and $S_{N+1}$ the estimators $\theta(S_N, \nu_N)$ and $\theta(S_{N+1}, \nu_{N+1})$ cannot be independent. This, in turn, contradicts the independence of the elements $\nu_N$ and $\nu_{N+1}$, which is a fundamental requirement for differential privacy. □

Proof of Theorem 1. Provided that the $\theta(\cdot,\cdot)$ belong to a compact subset of a separable space in $L_2$, for each $\chi$ there exists $K$ such that, for a finite set of elements $\{\theta^{(1)}(\cdot,\cdot), \dots, \theta^{(K)}(\cdot,\cdot)\}$, their convex hull $\Theta_N^K$ is within $\chi$-Hausdorff distance from the set $T_{N,\mathcal E}$. Since each $\theta^{(k)}(\cdot,\cdot)$ is a function of the same elements $(\omega_\nu, \omega_S)$, weak convergence of $\theta^{(k)}(P_N, \nu_N)$ implies joint weak convergence of the set $\{\theta^{(1)}(\cdot,\cdot), \dots, \theta^{(K)}(\cdot,\cdot)\}$ and, therefore, weak convergence of their convex hull. We then choose a sequence $\chi_N$, which induces a sequence $K_N$, such that $\sup_A \left| C_{\Theta_N^{K_N}}(A) - C_{T_{N,\mathcal E}}(A) \right|$ is a decreasing function of $N$. Then by Theorem 6.26 in [39] the sequence of random sets $T_{N,\mathcal E}$ converges weakly. □

Proof of Theorem 3. a) Suppose that for any sequence $(\varepsilon_N, \delta_N)$ from $\mathcal E$, any regular $(\varepsilon_N, \delta_N)$-differentially private estimator $\theta(P_N, \nu_N)$ is such that $\theta(P_N, \nu_N) \xrightarrow{p} \theta$. Then $T_{\mathcal E} = \{\theta\}$ (a degenerate distribution at $\theta$). Then, clearly, for any convex polytope $K \ni \theta$ we have $C_{T_{\mathcal E}}(K) = 1 \ge 1 - \alpha$ for any $\alpha \in (0, 1)$.

b) Suppose that for any $\alpha \in (0, 1)$ and any convex polytope $K \ni \theta$ we have $C_{T_{\mathcal E}}(K) \ge 1 - \alpha$. Since $\alpha$ can be taken arbitrarily close to 0, this means that $C_{T_{\mathcal E}}(K) = 1$. Since the convex polytope $K \ni \theta$ can be taken to have arbitrarily small volume, this means that $T_{\mathcal E} = \{\theta\}$ (a degenerate distribution at $\theta$).
Indeed, take a decreasing sequence $K_m \ni \theta$ of convex polytopes such that $\cap_{m=1}^\infty K_m = \{\theta\}$. By the continuity theorem for monotone sequences of events, $C_{T_{\mathcal E}}(\cap_{m=1}^\infty K_m) = \lim_{m\to\infty} C_{T_{\mathcal E}}(K_m) = 1$, which immediately implies that $T_{\mathcal E} = \{\theta\}$, meaning that every $\theta(S_N, \eta_N)$ converges weakly to $\theta$ and, thus, $\theta(S_N, \eta_N) \xrightarrow{p} \theta$. □

Selection expectation of the limiting random set of estimators for $\theta$ in Section 2.3

Here we briefly discuss the properties of the selection expectation of the limiting random set $T_{\mathcal E}$ of regular differentially private estimators for $\theta$ in the context of the example in Section 2.3.

Let $\Theta_{N,\mathcal E}$ denote the selection expectation of the random set $T_{N,\mathcal E}$ of all regular $(\varepsilon_N, \delta_N)$-differentially private estimators for $\theta$ with sequences $(\varepsilon_N, \delta_N)$ from $\mathcal E$. The Hausdorff limit of $\Theta_{N,\mathcal E}$ as $N \to \infty$ is denoted $\Theta_{\infty,\mathcal E}$. Loosely speaking, $\Theta_{\infty,\mathcal E}$ contains all limits of expectations of regular differentially private estimators. Ideally, if differentially private estimators are compatible with consistency, then for a broad range of sequences $\varepsilon_N$ and $\delta_N$ converging to zero the set $\Theta_{\infty,\mathcal E}$ should be a singleton $\{\theta\}$.

Theorem 1.45 in [39] links $\Theta_{\infty,\mathcal E}$ to the selection expectation of the limit random set $T_{\mathcal E}$. Indeed, that theorem immediately implies that, under the conditions of Theorem 1, $E T_{N,\mathcal E}$ converges to $E T_{\mathcal E}$ in the Hausdorff metric and the Lebesgue measure of $E T_{N,\mathcal E}$ converges to the Lebesgue measure of $E T_{\mathcal E}$ as $N \to \infty$. In other words, weak convergence of a sequence of random sets implies the convergence of the selection expectation.

Going back to the discussion of our example in Section 2.3, we note that even though in that example the differential privacy-inducing mechanisms perturb the estimator with random noise symmetric around zero, there is no guarantee that the limiting $\Theta_{\infty,\mathcal E}$ is a singleton at $\theta$.
If we collect differentially private estimators across all three regimes in that example, we find that the corresponding limiting set of selection expectations will include
$$E\,U\big(\{\operatorname{Argmin}_{\theta \in \Theta} \theta,\ \operatorname{Argmax}_{\theta \in \Theta} \theta\}\big), \qquad \{E\,\Lambda(c),\ c \in [0, +\infty)\}, \qquad E[X].$$
We note that in this case the target expectation $E[X]$ belongs to $\Theta_{\infty,\mathcal E}$. At the same time, the set $\Theta_{\infty,\mathcal E}$ itself is clearly large. We also note that if we exclude the elements of the selection expectation that result from Regime 3, where the scale of the double exponential noise asymptotically increases, the selection expectation of our considered family of estimators will be a linear segment in $\Theta$ that connects the points 0 and $E[X]$, since the set $\{E\,\Lambda(c),\ c \in [0, +\infty)\}$ is a line in $\Theta$ that connects 0 and $E[X]$.

Treat $G(N)$ as $\Delta_N$ in AC2 and choose the respective $\sigma_N$ and $[a_N, b_N]$ as in condition AC2. Take the definition of the differentially private estimator and note that for two estimators $\psi(S_N)$ and $\psi(S'_N)$ based on two datasets that differ in one observation only, we have
$$\begin{aligned}
P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N + \psi(S_N) \in B) &= P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N + \psi(S_N) \in B,\ \nu_N \in [a_N, b_N]) \\
&\quad + P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N + \psi(S_N) \in B,\ \nu_N \notin [a_N, b_N]) \\
&\le e^{D(G(N), \sigma_N, a_N, b_N)}\, P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N + \psi(S'_N) \in B,\ \nu_N \in [a_N, b_N]) \\
&\quad + P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N \notin [a_N, b_N]),
\end{aligned}$$
where $B$ is any subset of $\mathbb R$.
Thus, if $\varepsilon_N$ is greater than or equal to $D(G(N), \sigma_N, a_N, b_N)$ (which also gives acceptable rates of convergence of $\varepsilon_N$ to 0) and $P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N \notin [a_N, b_N]) \le \delta_N$ (which also gives acceptable rates of convergence of $\delta_N$ to 0), then the $(\varepsilon_N, \delta_N)$-differential privacy criterion is satisfied.

At the same time, because both the mean and the variance of $\xi_N$ converge to 0, $\xi_N$ converges in probability to 0 and, therefore, in light of (3.5) the estimator $\widehat\theta$ in (3.4) is consistent. □

Proof of Theorem 4. As we will show, the inconsistency of the estimator $\widehat\theta$ in (3.4) stems from the fact that the variance of the mechanism noise $\xi_N$ does not go to 0 with the sample size, which in turn is explained by the fact that the global sensitivity of $\psi(S_N)$ does not go to 0 with the sample size.

Indeed, suppose that $G(N)$ is bounded away from zero as $N \to \infty$ and is also bounded from above. Then, for a fixed large enough interval $[a_N, b_N]$, the value of $D(G(N), \sigma_N, a_N, b_N)$ has to be bounded from above by $\varepsilon_N$. Since $G(N)$ is bounded away from 0, from AC1 we have that $D(G(N), \sigma_N, a, b) \to +\infty$ if $\sigma_N \to 0$ for a fixed interval $[a, b]$. The definition of $D(\cdot)$ as a supremum implies that the same property will hold if instead of the fixed interval $[a, b]$ we take $[a_N, b_N]$ converging to $\mathbb R$. This implies, of course, that $\sigma_N$ has to be bounded away from zero. This, in turn, implies that $\xi_N$ does not converge in probability to zero even if the mean of $\xi_N$ converges to 0. Hence, $\widehat\theta$ does not converge in probability to the true parameter value.

If $G(N) = +\infty$, then to guarantee $(\varepsilon_N, \delta_N)$-differential privacy one would have to take $\sigma_N = +\infty$, clearly leading to the inconsistency of $\widehat\theta$. Note that this inconsistency result applies even to $(\varepsilon_N, \delta_N)$ not changing with $N$. It will also be true under the stronger requirements of differential privacy when both parameters converge to 0. □
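The mechanism behind this proof, namely that a noise scale bounded away from zero keeps the released estimator from concentrating even as the sampling noise vanishes, can be illustrated with a small simulation. The sample-mean statistic and the fixed-scale Laplace mechanism below are our own stand-ins for $\psi(S_N)$ and $f_{N;\sigma_N}$, chosen for transparency rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0      # mechanism noise scale bounded away from zero
reps = 20000
spread = {}
for N in (100, 10_000, 1_000_000):
    # The sample mean of N standard normal draws is exactly N(0, 1/N)-distributed,
    # so we draw it directly instead of averaging N draws each time.
    theta_hat = rng.normal(0.0, 1.0 / np.sqrt(N), size=reps)
    released = theta_hat + rng.laplace(scale=sigma, size=reps)
    spread[N] = released.std()

# The sampling dispersion of the mean shrinks like 1/sqrt(N), but the dispersion
# of the released estimator stays near the Laplace std sqrt(2)*sigma for every N.
print({N: round(s, 2) for N, s in spread.items()})
```

Whatever the sample size, the standard deviation of the released value stays close to $\sqrt{2}\,\sigma$, so the privatized estimator cannot converge in probability to the true parameter, exactly as the proof concludes for $\sigma_N$ bounded away from zero.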
Lemmas 4 and 5 will help to establish the results in Propositions 2-5.

LEMMA 4. Consider two weighted averages
$$q_1 = \sum_{i=1}^{T} w_i a_i + w_{T+1} a_{T+1}, \quad \text{where } w_i = \frac{b_i}{\sum_{j=1}^{T+1} b_j},\ i = 1, \dots, T+1,$$
$$q_2 = \sum_{i=1}^{T} \tilde w_i a_i + \tilde w_{T+1} \tilde a_{T+1}, \quad \text{where } \tilde w_i = \frac{b_i}{\sum_{j=1}^{T} b_j + \tilde b_{T+1}},\ i = 1, \dots, T, \quad \tilde w_{T+1} = \frac{\tilde b_{T+1}}{\sum_{j=1}^{T} b_j + \tilde b_{T+1}},$$
and
$$0 \le c_0 \le (\text{or} <)\ b_i, \tilde b_{T+1} \le c_1, \quad (6.1)$$
$$d_1 \le a_i \le d_2,\ i = 1, \dots, T+1, \quad \tilde a_{T+1} \in [d_1, d_2], \quad d_1 < d_2. \quad (6.2)$$

Then, with the maximum taken over all $a_i$, $b_i$, $\tilde a_{T+1}$, $\tilde b_{T+1}$ satisfying (6.1)-(6.2):
(a) if $c_0 = 0$ and $|d_1|, |d_2| < \infty$, then $\max |q_1 - q_2| = d_2 - d_1$;
(b) if $c_0 > 0$ and $|d_1|, |d_2| < \infty$, then $\max |q_1 - q_2| = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1}$;
(c) if $d_1 = -\infty$ or $d_2 = +\infty$, then $\max |q_1 - q_2| = +\infty$.

In cases (a)-(c), $\max |q_1 - q_2|$ can be attained by a positive change as well as by a negative change; that is, there are values of the $a_t$'s, $b_t$'s and $\tilde a_{T+1}, \tilde b_{T+1}$ such that $q_1 - q_2 = \max |q_1 - q_2|$, and there are values such that $q_1 - q_2 = -\max |q_1 - q_2|$.

Proof of Lemma 4. (a) In this case we can take
- $b_1 = \dots = b_T \approx 0$, $b_{T+1} = \tilde b_{T+1} = c_1$;
- $a_1, \dots, a_T$ arbitrary values that satisfy (6.2); $a_{T+1} = d_2$, $\tilde a_{T+1} = d_1$.
This gives $q_1 - q_2 \approx d_2 - d_1$. Therefore we have $\max |q_1 - q_2| \ge d_2 - d_1$. At the same time, each weighted average $q_1$ and $q_2$ has to belong to $[d_1, d_2]$, which is the range of the $a_i$'s; therefore, necessarily $\max |q_1 - q_2| \le d_2 - d_1$. This implies $\max |q_1 - q_2| = d_2 - d_1$. Note that if above we instead take $a_{T+1} = d_1$, $\tilde a_{T+1} = d_2$, then $q_1 - q_2 = -(d_2 - d_1)$.

(b) In this case, to evaluate the largest change in the weighted average we have to consider extreme situations.
The first extreme situation is when $q_2 = d_1$ and the $(T+1)$-th component in this average has the largest weight and changes to the other extreme $d_2$ in the new average $q_1$. This situation can be described as
- $b_1 = \dots = b_T = c_0$; $b_{T+1} = \tilde b_{T+1} = c_1$;
- $a_1 = \dots = a_T = d_1$; $a_{T+1} = d_2$, $\tilde a_{T+1} = d_1$.
This gives $q_1 - q_2 = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1} > 0$. In the second extreme situation, where the $b_t$'s and $\tilde b_{T+1}$ are the same as above but $q_2 = d_2$ and the $(T+1)$-th component in this average has the largest weight and changes to the other extreme $d_1$ in the new average $q_1$, we obtain $q_1 - q_2 = -\frac{c_1 (d_2 - d_1)}{T c_0 + c_1} < 0$. These two extreme scenarios give exactly the same $|q_1 - q_2|$. Thus, $\max |q_1 - q_2| = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1}$.

(c) In this case, let $b_i$, $i = 1, \dots, T+1$, and $\tilde b_{T+1}$ be any values that satisfy (6.1). Suppose $d_2 = +\infty$. Let $a_i$, $i = 1, \dots, T+1$, take any finite values while $\tilde a_{T+1}$ is arbitrarily large. This gives $q_1 - q_2 = -\infty$ and, thus, $|q_1 - q_2| = +\infty$. Therefore, in this case $\max |q_1 - q_2| = +\infty$. If instead $\tilde a_{T+1}$ takes a finite value while $a_{T+1}$ is arbitrarily large, then $q_1 - q_2 = +\infty$. The case where $d_2$ is finite but $d_1 = -\infty$ is analyzed analogously. □

LEMMA 5. Consider two weighted averages
$$q_1 = \sum_{i=1}^{T} w_i a_i + w_{T+1} a_{T+1}, \quad \text{where } w_i = \frac{b_i}{\sum_{j=1}^{T+1} b_j},\ i = 1, \dots, T+1,$$
$$q_2 = \sum_{i=1}^{T} \tilde w_i a_i, \quad \text{where } \tilde w_i = \frac{b_i}{\sum_{j=1}^{T} b_j},\ i = 1, \dots, T,$$
where the $b_i$ and $a_i$ satisfy conditions (6.1) and (6.2), respectively. Then, with the maximum over all $a_i$, $b_i$ satisfying (6.1)-(6.2):
(a) if $c_0 = 0$ and $|d_1|, |d_2| < \infty$, then $\max |q_1 - q_2| = d_2 - d_1$;
(b) if $c_0 > 0$ and $|d_1|, |d_2| < \infty$, then $\max |q_1 - q_2| = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1}$;
(c) if $d_1 = -\infty$ or $d_2 = +\infty$, then $\max |q_1 - q_2| = +\infty$.
In cases (a)-(c), $\max |q_1 - q_2|$ can be attained by a positive change as well as by a negative change; that is, there are values of the $a_t$'s and $b_t$'s such that $q_1 - q_2 = \max |q_1 - q_2|$, and there are values such that $q_1 - q_2 = -\max |q_1 - q_2|$.

Proof of Lemma 5. (a) In this case we can take
- $b_1 = \dots = b_T \approx 0$, $b_{T+1} = c_1$;
- $a_1 = \dots = a_T = d_1$; $a_{T+1} = d_2$.
This gives $q_1 \approx d_2$, $q_2 = d_1$ and, thus, $q_1 - q_2 \approx d_2 - d_1 > 0$, so $\max |q_1 - q_2| \ge d_2 - d_1$. At the same time, each weighted average $q_1$ and $q_2$ has to belong to $[d_1, d_2]$, which is the range of the $a_i$'s; therefore, necessarily $\max |q_1 - q_2| \le d_2 - d_1$. This implies $\max |q_1 - q_2| = d_2 - d_1$.

(b) In this case, to evaluate the largest change in the weighted average we have to consider extreme situations. An extreme situation is when in $q_1$ the $(T+1)$-th component (which is later dropped when defining $q_2$) has the largest weight and a value that is maximally different from the values of the first $T$ components. The first extreme situation can be described as
- $b_1 = \dots = b_T = c_0$; $b_{T+1} = c_1$;
- $a_1 = \dots = a_T = d_1$; $a_{T+1} = d_2$.
This gives $|q_1 - q_2| = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1}$. The second extreme situation, where the $b_t$'s are the same as above but $a_1 = \dots = a_T = d_2$ and $a_{T+1} = d_1$, gives exactly the same value of $|q_1 - q_2|$. Thus, $\max |q_1 - q_2| = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1}$.

(c) In this case, let $b_i$, $i = 1, \dots, T+1$, be any values that satisfy (6.1). Let $a_i$, $i = 1, \dots, T$, take any finite values while $a_{T+1}$ is arbitrarily large in absolute value. This gives $|q_1 - q_2| = +\infty$. Therefore, in this case $\max |q_1 - q_2| = +\infty$. □

Proof of Proposition 2. (a) The global sensitivity of the estimator is calculated by comparing the results of the estimation for two datasets that differ in only one data point.
In order to calculate the global sensitivity, we need to keep in mind the following possibilities:
(i) the new data point enters a $K$-$h$-neighborhood of $c$ (thus, the old data point was outside of both $K$-$h$-neighborhoods of $c$);
(ii) the new data point falls outside of both $K$-$h$-neighborhoods of $c$ (thus, the old data point was inside one of the $K$-$h$-neighborhoods of $c$);
(iii) the new data point remains in the same neighborhood;
(iv) the new data point switches neighborhoods.
In order to find the global sensitivity, it is enough to find the maximum absolute change in the estimate in each of these four situations and then take the maximum over them. Let us consider the four situations listed above. In this proof we use Lemmas 4 and 5 with $c_1 = \bar K$, where $\bar K$ denotes the maximum value of the kernel $K(\cdot)$, and $c_0 = 0$, since $K(\cdot)$ is continuous and, therefore, $\underline K = 0$.

(i) Suppose the new data point enters the $K$-$h$-neighborhood to the left of $c$ while the old data point was outside of both $K$-$h$-neighborhoods of $c$. Then, by part (a) of Lemma 5, the maximum absolute change $G_L^{in}$ in the estimate in this case is $G_L^{in} = \bar Y_l - \underline Y_l$. Analogously, we can consider the case when a new data point enters the $K$-$h$-neighborhood to the right of $c$. The maximum absolute change $G_R^{in}$ in the estimate in this case is $G_R^{in} = \bar Y_r - \underline Y_r$.

(ii) In this case we have two situations: in one, the old data point was in the left $K$-$h$-neighborhood, and in the other, the old data point was in the right $K$-$h$-neighborhood. In both situations the new data point falls outside of both neighborhoods.
In the former case the maximum absolute change in the estimate coincides with $G_L^{in}$, so $G_L^{out} = G_L^{in}$; in the latter case it coincides with $G_R^{in}$, so $G_R^{out} = G_R^{in}$.

(iii) When the observation remains in the left $K$-$h$-neighborhood, we apply part (a) of Lemma 4 to obtain that the maximum absolute change $G_{LL}$ in the estimate in this case is $G_{LL} = \bar Y_l - \underline Y_l$. When the observation remains in the right $K$-$h$-neighborhood, we consider the maximum absolute change $G_{RR}$ in the estimate and analogously show that $G_{RR} = \bar Y_r - \underline Y_r$.

(iv) Suppose an observation moves from the left $K$-$h$-neighborhood to the right $K$-$h$-neighborhood. Our estimator of interest is the difference between the weighted means in the right and the left $K$-$h$-neighborhoods of $c$; therefore, the move of the observation from one neighborhood to the other affects both parts of the estimator. As we know from part (a) of Lemma 4, the maximum absolute change in the weighted average for the right-hand side is $\bar Y_r - \underline Y_r$, and this degree of change can be attained as a positive change (increase). Similarly, the maximum absolute change in the weighted average for the left-hand side is $G_L^{out} = \bar Y_l - \underline Y_l$, and this degree of change can be attained as a negative change (decrease). In order to obtain the maximum absolute change for the difference in weighted means, we look at the case when these two weighted means change in opposite directions, which leads to the maximum change $G_{LR} = \bar Y_r - \underline Y_r + \bar Y_l - \underline Y_l$. Analogously, we can consider an observation that moves from the right $K$-$h$-neighborhood to the left $K$-$h$-neighborhood and show that in this case the maximum absolute change is $G_{RL} = \bar Y_r - \underline Y_r + \bar Y_l - \underline Y_l$.

To sum up the results of part (a), the global sensitivity is
$$G(N) = \max\{G_L^{in}, G_R^{in}, G_L^{out}, G_R^{out}, G_{LR}, G_{RL}, G_{LL}, G_{RR}\} = \bar Y_r - \underline Y_r + \bar Y_l - \underline Y_l.$$

(b) Suppose, for instance, that the support of $Y_r$ is unbounded.
Then part (c) of Lemmas 4 and 5 immediately gives that, for the $G_R^{in}, G_R^{out}, G_{RL}, G_{RR}$ defined above, $G_R^{in} = G_R^{out} = G_{RL} = G_{RR} = +\infty$, which implies this part of the proposition. The other cases in this part of the proposition lead to the same conclusion. □

Proof of Proposition 3. Just as in Proposition 2, the global sensitivity is
$$G(N) = \max\{G_L^{in}, G_R^{in}, G_L^{out}, G_R^{out}, G_{LR}, G_{RL}, G_{LL}, G_{RR}\},$$
where $G_L^{in}, G_R^{in}, G_L^{out}, G_R^{out}, G_{LL}, G_{RR}, G_{LR}, G_{RL}$ are defined as in the proof of Proposition 2. Once again we rely on the results of Lemmas 4 and 5, but this time on part (b) of both lemmas, as we take $c_0 = \underline K$ and $c_1 = \bar K$, where $\underline K$ and $\bar K$ are the minimum and the maximum values of the kernel $K$ over the relevant neighborhood.

(a) Applying the result of part (b) of Lemma 5 and noting that the minimum numbers of observations in the left and the right $K$-$h$-neighborhoods of $c$ are $m_l(N)$ and $m_r(N)$, respectively, we obtain
$$G_L^{in} = G_L^{out} = \frac{\bar K (\bar Y_l - \underline Y_l)}{m_l(N)\,\underline K + \bar K}, \qquad G_R^{in} = G_R^{out} = \frac{\bar K (\bar Y_r - \underline Y_r)}{m_r(N)\,\underline K + \bar K}.$$
Applying the result of part (b) of Lemma 4, we have
$$G_{LL} = \frac{\bar K (\bar Y_l - \underline Y_l)}{(m_l(N) - 1)\,\underline K + \bar K}, \qquad G_{RR} = \frac{\bar K (\bar Y_r - \underline Y_r)}{(m_r(N) - 1)\,\underline K + \bar K}.$$
We next consider $G_{LR}$, which quantifies the case when an observation from the left $K$-$h$-neighborhood of $c$ moves into the right-hand side neighborhood. Suppose we started with $T + 1$ observations in the left $K$-$h$-neighborhood, where $m_l(N) \le T \le N - m_r(N)$. We need to evaluate the biggest change in the left-hand side neighborhood, the biggest change in the right-hand side neighborhood, and their directions (whether these changes act in the same or opposite directions).
Relying on the result of part (b) of Lemma 5, we can establish that, given $T$, the largest absolute change in the weighted mean in the left $K$-$h$-neighborhood of $c$ is $\frac{\bar K (\bar Y_l - \underline Y_l)}{T\,\underline K + \bar K}$, and the largest absolute change in the weighted mean in the right neighborhood from acquiring an extra point is $\frac{\bar K (\bar Y_r - \underline Y_r)}{(N - T - 1)\,\underline K + \bar K}$.

As shown in Lemma 5, these changes can be either positive or negative. Since our estimator of interest is the difference between the weighted means in the right and the left $K$-$h$-neighborhoods of $c$, to obtain the maximum absolute change for a given $T$ we look at the case when these two weighted means change in opposite directions. For a given $T$ this gives the maximum absolute change
$$\frac{\bar K (\bar Y_l - \underline Y_l)}{T\,\underline K + \bar K} + \frac{\bar K (\bar Y_r - \underline Y_r)}{(N - T - 1)\,\underline K + \bar K},$$
which we then maximize over $T$ such that $m_l(N) \le T \le N - m_r(N)$. If $\bar Y_r - \underline Y_r > \bar Y_l - \underline Y_l$, the maximum is attained at $T = N - m_r(N)$; otherwise it is attained at $T = m_l(N)$. To summarize,
$$G_{LR} = \max\left\{\frac{\bar K (\bar Y_l - \underline Y_l)}{m_l(N)\,\underline K + \bar K} + \frac{\bar K (\bar Y_r - \underline Y_r)}{(N - m_l(N) - 1)\,\underline K + \bar K},\ \frac{\bar K (\bar Y_l - \underline Y_l)}{(N - m_r(N))\,\underline K + \bar K} + \frac{\bar K (\bar Y_r - \underline Y_r)}{(m_r(N) - 1)\,\underline K + \bar K}\right\}.$$
The case of $G_{RL}$, which quantifies the case when an observation from the right $K$-$h$-neighborhood of $c$ moves into the left-hand side neighborhood, is considered analogously. In this case,
$$G_{RL} = \max\left\{\frac{\bar K (\bar Y_l - \underline Y_l)}{(N - m_r(N) - 1)\,\underline K + \bar K} + \frac{\bar K (\bar Y_r - \underline Y_r)}{m_r(N)\,\underline K + \bar K},\ \frac{\bar K (\bar Y_l - \underline Y_l)}{(m_l(N) - 1)\,\underline K + \bar K} + \frac{\bar K (\bar Y_r - \underline Y_r)}{(N - m_l(N))\,\underline K + \bar K}\right\}.$$
This gives the result that $G(N)$ is of the rate $\frac{1}{\min\{m_l(N), m_r(N)\}}$.

(b) Suppose, e.g., that the support of $Y_r$ is unbounded. Then part (c) of Lemmas 4 and 5 gives us that $G_R^{in} = G_R^{out} = G_{RR} = +\infty$, which implies that $G(N) = +\infty$. □

Proof of Proposition 4. The proof in this case is analogous to the proof of Proposition 2.
Since the kernel has an unbounded support, there is no longer a case of observations falling outside of either neighborhood or entering a neighborhood. Therefore, the global sensitivity is
$$G(N) = \max\{G_{LR}, G_{RL}, G_{LL}, G_{RR}\},$$
where $G_{LL}, G_{RR}, G_{LR}, G_{RL}$ are defined as in the proof of Proposition 2. Throughout the proof we apply Lemmas 4 and 5 with the strict inequality version ($0 = c_0 < b_i$) in (6.1).

(a) When the observation remains to the left of $c$, we apply part (a) of Lemma 4 to obtain that the maximum absolute change $G_{LL}$ in the estimate in this case is $G_{LL} = \bar Y_{l,all} - \underline Y_{l,all}$. When the observation remains to the right of $c$, we consider the maximum absolute change $G_{RR}$ in the estimate and analogously show that $G_{RR} = \bar Y_{r,all} - \underline Y_{r,all}$. Suppose an observation moves from the left of $c$ to the right of $c$, or from the right of $c$ to the left of $c$. Analogously to the proof of Proposition 2, we can establish that
$$G_{LR} = G_{RL} = \bar Y_{r,all} - \underline Y_{r,all} + \bar Y_{l,all} - \underline Y_{l,all}.$$
(b) Analogous to the proof in Proposition 2. □

Proof of Proposition 5.
Just as in Propositions 2 and 3, we want to find
$$G(N) = \max\{G_L^{in}, G_R^{in}, G_L^{out}, G_R^{out}, G_{LR}, G_{RL}, G_{LL}, G_{RR}\},$$
where $G_L^{out}$ and $G_R^{out}$ are the sensitivities in situations of an observation leaving the left or the right $h$-neighborhood, respectively; $G_L^{in}$ and $G_R^{in}$ are the sensitivities in situations of a new observation entering the left or the right $h$-neighborhood, respectively; $G_{LR}$ and $G_{RL}$ are the sensitivities in cases of an observation switching neighborhoods; and $G_{LL}$ and $G_{RR}$ are the sensitivities in cases when an observation changes within the same neighborhood.

Since the local linear estimator effectively considers observations whose running variable values are in a small neighborhood of $c$, we employ (3.11) as approximations of the support for the outcome in one-sided neighborhoods of $c$. As we know,
$$\widehat\alpha_R = \bar y_R - (\bar x_R - c)\,\frac{\sum_{i=1}^N q_i (x_i - \bar x_R)\, y_i\, \mathbf 1(c \le x_i)}{\sum_{i=1}^N q_i (x_i - \bar x_R)^2\, \mathbf 1(c \le x_i)}, \qquad
\widehat\alpha_L = \bar y_L - (\bar x_L - c)\,\frac{\sum_{i=1}^N q_i (x_i - \bar x_L)\, y_i\, \mathbf 1(x_i < c)}{\sum_{i=1}^N q_i (x_i - \bar x_L)^2\, \mathbf 1(x_i < c)},$$
where $q_i = K\!\left(\frac{x_i - c}{h_N}\right)$,
$$\bar y_R = \frac{\sum_{i=1}^N q_i y_i \mathbf 1(c \le x_i)}{\sum_{i=1}^N q_i \mathbf 1(c \le x_i)}, \qquad \bar x_R = \frac{\sum_{i=1}^N q_i x_i \mathbf 1(c \le x_i)}{\sum_{i=1}^N q_i \mathbf 1(c \le x_i)},$$
and $\bar y_L$, $\bar x_L$ are defined analogously with $\mathbf 1(x_i < c)$.

For a kernel $K(\cdot)$ with a bounded support, let $(-u_0, u_0)$, $u_0 > 0$, be the support of this kernel. If $K(\cdot)$ has an unbounded support, then we can take $u_0$ to be a very large positive number. In either case, we can take
$$q_i \approx K(0),\ i = 1, \dots, m_r(N) - 1, \qquad q_T \approx \underline K, \qquad q'_T = K(0),$$
where $\underline K = \inf_{u \in (-u_0, u_0)} K(u)$. Suppose that $y_T = y'_T$.
With these realizations, a direct calculation expresses $\widehat\alpha_R$ and $\widehat\alpha'_R$ through $\Delta_N$, $T$, $h_N$ and a quantity $\delta$ that controls the spread of the running variable values in the right-hand side neighborhood. For fixed $T$, $h_N$, $\Delta_N$, it is possible to have $\delta \downarrow 0$, in which case $|\widehat\alpha'_R - \widehat\alpha_R| \to \infty$. Since there are no changes in $\widehat\alpha_L$, we conclude that $G_{RR} = \infty$ and, thus, $G(N) = \infty$.

Note that when the kernel either has an unbounded support or has a bounded support with $\underline K = 0$, then even without using $\delta \downarrow 0$ we can establish that the global sensitivity is bounded away from zero for any $N$, using techniques similar to those in Propositions 2 and 4. The proof above is based on the ability to have realizations of the data such that the minimum eigenvalue of the matrix $\frac{1}{T}\tilde X_r^\top \tilde X_r$ can be arbitrarily close to zero, where $\tilde X_r$ is the $T \times 2$ matrix with rows $(1, x_i - c)$ for the observations with $x_i \ge c$. If in the implementation of differential privacy a data curator wants to establish a strictly positive lower bound on the minimum eigenvalue of this matrix, then in the case of a kernel with a bounded support and $\underline K > 0$ the global sensitivity is of the rate $\frac{1}{\min\{m_r(N), m_l(N)\}}$. However, an issue with this is given in the discussion after Proposition 3 and is related to the fact that there is always a strictly positive probability of the number of observations being strictly less than $m_r(N)$ in the $K$-$h$-neighborhood to the right or strictly less than $m_l(N)$ in the $K$-$h$-neighborhood to the left.

It is obvious that when the support of $Y \mid X$ in the neighborhood to the right of $c$ or to the left of $c$ is unbounded, then $G(N) = +\infty$, which can be shown by changing one value of $Y_i$ only. □
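The degeneracy exploited in this proof, namely realizations for which the minimum eigenvalue of $\frac{1}{T}\tilde X_r^\top \tilde X_r$ is nearly zero, can be reproduced numerically. The sketch below is our own stylized illustration (uniform kernel, $T = 20$, design points piled at a single location $c + 0.5$), not the paper's construction: changing a single outcome by one unit changes the fitted intercept at the cutoff by exactly $0.5/\delta$ in this design, which blows up as $\delta \downarrow 0$.

```python
import numpy as np

def local_linear_intercept(x, y, c):
    """OLS of y on (1, x - c); returns the fitted intercept at the cutoff c."""
    Xmat = np.column_stack([np.ones_like(x), x - c])
    beta, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return beta[0]

c, T = 0.0, 20
rng = np.random.default_rng(2)
changes = []
for delta in (1e-1, 1e-3, 1e-5):
    # T-1 design points piled at c + 0.5 and one point delta away: the minimum
    # eigenvalue of (1/T) X~'X~ shrinks with delta, as in the proof.
    x = np.r_[np.full(T - 1, c + 0.5), c + 0.5 + delta]
    y = rng.normal(size=T)
    y2 = y.copy()
    y2[-1] += 1.0  # "neighboring" dataset: one outcome moved by one unit
    changes.append(abs(local_linear_intercept(x, y2, c)
                       - local_linear_intercept(x, y, c)))
print([f"{ch:.3g}" for ch in changes])
```

The reported changes grow without bound as $\delta$ shrinks, so no finite noise scale calibrated to this sensitivity can both protect privacy and vanish asymptotically.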
Proof of Proposition 6. Just as in Proposition 5, we can formulate the problem as that of finding
$$G(N) = \max\{G_L^{in}, G_R^{in}, G_L^{out}, G_R^{out}, G_{LR}, G_{RL}, G_{LL}, G_{RR}\},$$
where $G_L^{out}$ and $G_R^{out}$ are the sensitivities in situations of an observation leaving the left or the right $h$-neighborhood, respectively; $G_L^{in}$ and $G_R^{in}$ are the sensitivities in situations of a new observation entering the left or the right $h$-neighborhood, respectively; $G_{LR}$ and $G_{RL}$ are the sensitivities in cases of an observation switching neighborhoods; and $G_{LL}$ and $G_{RR}$ are the sensitivities in cases when an observation changes within the same neighborhood. When we say "leaves a neighborhood" or "enters a neighborhood", we mean with respect to the value of $X_i$.

If we follow the textbook definition of differential privacy and, thus, consider all possible realizations of the data, no matter how small the probability of these realizations is, as long as it is strictly positive, then we can show that $G(N) = +\infty$. Indeed, let us show, e.g., that under the textbook definition of differential privacy we have $G_{RR} = \infty$.

Consider a situation when the first $T \le N$ observations in our data are in the right-hand side neighborhood in the values of $X_i$. Consider, for example, realizations of the datasets when only the $T$-th observation in the right-hand side neighborhood changes its value $x_T$ while its values $W_T$ and $Y_T$ do not change. Then, of course, $\widehat\alpha_{y,L} = \widehat\alpha'_{y,L}$ and $\widehat\alpha_{w,L} = \widehat\alpha'_{w,L}$. Now we will use the same realizations of $X_1, \dots, X_T, X'_T$ as given in (6.3)-(6.4) in the proof of Proposition 5. Also, we will take the realization of the dataset where $W_i = 1$, $i = 1, \dots, T$, or $W_i = 0$, $i = 1, \dots, T$ (again, due to the fuzzy scenario, the probability of this realization may be perceived as low, but it is strictly positive and, thus, has to be taken into account by a differentially private mechanism).
Then we can take, of course, $\widehat{\alpha}_{w,R} = \widehat{\alpha}_{w,R}'$, as the values of the indicators $1(X_i \geq c)$, $i = 1, \ldots, T$, do not add any explanatory power in the local linear regression of $W_i$ on a constant and $1(X_i \geq c)$ in the right-hand side neighborhood (one can think of this as a situation of perfect fit in the reduced form of the IV regression, even though technically $\widehat{\alpha}_{w,R}$ and $\widehat{\beta}_{w,R}$ may not be separately estimated in a sample like the one we suggested). Thus, changes in the value of $\widehat{\tau}_{F,LocLin}$ in (3.3) happen only because of changes in the numerator. These changes are, of course, the same as the changes in $\widehat{\alpha}_R$ described in the proof of Proposition 5 and, thus, by manipulating $\delta$ we can make the change arbitrarily large in absolute value, leading us to the conclusion that $G_{RR} = +\infty$.

Even if one wanted to deviate from the textbook definition of differential privacy and restrict $W_i$, $i = 1, \ldots, T$, to have some variation in each neighborhood (e.g., by requiring a minimum number of zeros and ones in each neighborhood, or a fixed proportion), it would be straightforward to show that the global sensitivity remains bounded away from zero as $N \to \infty$. Indeed, in this case, even using the same example with $T$ realizations of $X_1, \ldots$
, $X_T$, $X_T'$ in the right-hand side neighborhood with the values given in (6.3)-(6.4) in the proof of Proposition 5, we would obtain that, as $\delta$ approaches 0, the changes in both the numerator $\widehat{\alpha}_{y,R} - \widehat{\alpha}_{y,L}$ and the denominator $\widehat{\alpha}_{w,R} - \widehat{\alpha}_{w,L}$ are arbitrarily large in absolute value. However, they become arbitrarily large at the same rate in $\delta$, which allows us to conclude that the change in the ratio is constant and to show that this constant change may not diminish to 0 with the sample size.

It is obvious that when the support of $Y|X$ in the neighborhood to the right of $c$ or to the left of $c$ is unbounded, then $G(N) = +\infty$, which can be shown by changing just one value of $Y_i$.

Note that for simplicity we used the uniform kernel to define (3.3). If one were instead using a kernel with a bounded support but $\bar{K} > 0$, similar arguments would apply. $\blacksquare$

Proof of Proposition 7. Since the support of $Y$ is bounded by Assumption 3, the global sensitivity is determined by the variation of the empirical weight $1/\widehat{P}(x)$ over $\mathcal{X}$. Then
$$\sup_{x, x' \in \mathcal{X}} \left| 1/\widehat{P}(x) - 1/\widehat{P}(x') \right| \geq \bar{K} / \left( h_N K(\mathrm{diam}(\mathcal{X})/h_N) \right).$$
Note that whenever $\mathrm{diam}(\mathcal{X})$ is infinite, an infinite lower bound applies and (ii) immediately follows. When $\mathrm{diam}(\mathcal{X})$ is finite, the global sensitivity of $\psi(P_N)$ is bounded from below by
$$\bar{K} / \left( N h_N K(\mathrm{diam}(\mathcal{X})/h_N) \right).$$
For $h_N = o(N^{-1/2})$, since $\lim_{|z| \to \infty} |z|^k K(|z|) = 0$ for $k \leq d$ with $d > 1$, we have $N h_N K(\mathrm{diam}(\mathcal{X})/h_N) = o(1)$ and, thus, the global sensitivity of $\psi(P_N)$ does not decrease as $N \to \infty$. $\blacksquare$

Figure: Illustration to Scenario 1. Twenty independent paths of differentially private local linear estimators for increasing sample sizes, for various degrees of differential privacy protection.