Identification and Formal Privacy Guarantees
Tatiana Komarova a and Denis Nekipelov b ∗ This version: June 2020
Abstract
Empirical economic research crucially relies on highly sensitive individual datasets. At the same time, the increasing availability of public individual-level data that comes from social networks, public government records and directories makes it possible for adversaries to potentially de-identify anonymized records in sensitive research datasets. This increasing disclosure risk has incentivised large data curators, most notably the US Census Bureau and several large companies including Apple, Facebook and Microsoft, to look for algorithmic solutions that provide formal non-disclosure guarantees for their secure data. The most commonly accepted formal data security concept in the Computer Science community is referred to as differential privacy. Differential privacy restricts the interaction of the researcher with the data by allowing her to issue queries that evaluate functions of the data. The differential privacy mechanism then replaces the actual outcome of the query with a randomised outcome, with the amount of randomness determined by the sensitivity of the outcome to individual observations in the data.

While differential privacy does provide formal data security guarantees, its impact on the identification of empirical economic models, as well as on the performance of estimators in nonlinear empirical econometric models, has not been sufficiently studied. Since privacy protection mechanisms are inherently finite-sample procedures, we define the notion of identifiability of the parameter of interest as a property of the limit of experiments. It is naturally characterized by concepts from the theory of random sets and is linked to the asymptotic behavior in measure of differentially private estimators.

We demonstrate that particular instances of regression discontinuity design and average treatment effect estimation may be problematic for inference with differential privacy. Under differential privacy their estimators can only be ensured to converge weakly, with their asymptotic limit remaining random, and thus may not be estimated consistently. This result is clearly supported by our simulation evidence. Our analysis suggests that many other estimators that rely on nuisance parameters may have similar properties under the requirement of differential privacy.
JEL Classification:
C35, C14, C25, C13.
Keywords:
Differential privacy; average treatment effect; regression discontinuity; random sets; identification

∗ a Department of Economics, London School of Economics and Political Science and b Department of Economics and Computer Science, University of Virginia. Support from STICERD and the NSF is gratefully acknowledged.

A large portion of empirical work in Social Sciences, and most notably in Economics, relies on highly sensitive data. Sensitivity of the data may be well understood from the personal experiential perspective and is largely associated with risks of potential exposure of individuals in the data, when the data may reveal their personal or financial information which consequently would make them vulnerable to adversaries or embarrass them. At the same time, there has been a long struggle in attempts to formalize the concept of sensitivity of the data and the related concept of a "privacy guarantee" that would measure how and to what extent sensitive attributes of the data are protected.

The most significant progress in the efforts to formalize privacy protection has been made in the Computer Science literature. The mainstream approach there considers any attribute of the data as "sensitive." At the same time, the security risks are considered in a worst-case scenario setting where public components of the protected dataset or its summaries are accessed by an adversary. The adversary is assumed to have an arbitrary amount of auxiliary information that he or she can use to expose individuals in the data.

The concept of differential privacy, first introduced in [14], provides arguably the most accepted formal definition of the security of the data in the Computer Science literature. Differential privacy is built on the idea of a data analyst communicating with a secure dataset by issuing queries which are then evaluated by the privacy protection mechanism.
The privacy protection mechanism alters the actual query outcome using independent random noise. The privacy guarantee is measured by the maximum possible change in the distribution of the randomized query if any one data entry is deleted or altered. In other words, differential privacy ensures that no single individual/observation in the data can have a significant impact on the distribution of the randomized query. Differential privacy gives a broad set of guarantees of data protection from adversarial attacks.

Given the universal theoretical appeal of differential privacy, its adoption as a security standard has been considered by a variety of private enterprises and government bodies. For instance, Facebook recently announced ([3]) that it will provide public access to the data on shared links within the social network via a differentially private protocol. In 2019 the US Census Bureau announced ([1]) that it will use differential privacy as the baseline for privacy protection in the 2020 Census. The transition to the differential privacy standards has caused an outcry in the Social Science community and resulted in an open letter signed by a large number of academic researchers ([2]).

In the evaluation of privacy technologies, the Computer Science literature focuses on the "privacy-utility" tradeoff. In this literature (e.g., as discussed in [34], [24], [8], [50]), the main idea is that privacy protection makes the data "noisier," which reduces its "utility" but at the same time does not preclude one from recovering the parameter of interest from the data, albeit possibly less accurately. In our previous work [28] we showed that this thought process is flawed. Since privacy protection is fundamentally a finite sample paradigm, its impact on the properties of estimators has to be studied using the concept of limits of experiments.
From this perspective, the asymptotic behavior of the estimator of interest should be viewed as the limit of experiments where an estimator with privacy constraints is produced from samples of increasing size. It is important to note, however, that in our earlier work [28] (as well as in [26], [27]) we do not analyze differential privacy. There we consider a framework where the data needed for estimation has to be combined from different split datasets, and we analyze the identification of the parameter of interest in the presence of privacy guarantees. In particular, we show that the concept of k-anonymity, which was one of the first attempts at a formal definition of privacy and is discussed below in more detail, is incompatible with parameter identification in conditional moment models, which, of course, include many commonly used econometric models. With any degree of k-anonymity imposed on the data one can only recover a pseudo-identified set for the parameter of interest. This set is generally a non-singleton subset of the parameter space and it does not include the true parameter. In other words, the "privacy-utility" tradeoff is meaningless if one wants to enforce k-anonymity guarantees.

In this paper we demonstrate that a similar drawback is inherent in the concept of differential privacy. There are, without a doubt, situations when differentially private estimation is compatible with parameter identification. These are usually situations when the maximum of the influence function of the estimator – the differential privacy literature refers to it as global sensitivity – converges to zero. In this paper we show that differentially private versions of several important estimators, including regression discontinuity design (RDD) estimators and standard average treatment effect (ATE) estimators, are incompatible with parameter identification.
We show that the main reason for this is the fact that the weights of observations used in these estimation techniques are data-driven. That is, because the weights of some observations may drastically change with a change in one data point, these estimators have global sensitivities bounded away from zero (and in some cases even infinite) even asymptotically. This leads to the loss of identification. Since the issue of data-driven weights is typical in many important econometric approaches, we believe that our negative findings will extend to other important frameworks.

(Footnote: When differentially private estimation is compatible with parameter identification, the issue of the asymptotic distribution of the differentially private estimator arises. It is a different problem and we do not discuss it in detail in this paper. However, we briefly touch upon this issue in Example 1 in Section 2, where one can see that to ensure traditional rates one would need the global sensitivity to converge to zero fast enough.)

Before we review the formal notion of a differentially private mechanism and present our findings, we want to touch upon some important issues in privacy-related research. It is clear that in order to talk about privacy-preserving approaches and their properties, one has to be able to characterize formally the level of exposure induced by the adversary. This can be done, for instance, in accordance with the US Census Bureau's analysis in [30], which distinguishes between identity disclosure, where an adversary is able to identify if a specific data entry belongs to a certain individual, and attribute disclosure, where an adversary is able to find out if a particular individual has a particular characteristic (e.g., belongs to a certain group). Formal legal approaches to the protection of individual data, for instance behind HIPAA and FERPA, are based on a clear prioritization of identity disclosure over attribute disclosure and mandate the removal of specific demographic and personal identifiers to make the data "less sensitive."

The Computer Science literature, at the same time, clearly demonstrates that the reduction in "sensitivity" of the data based on potential exposure level by removing individual characteristics is highly ineffective. The examples of successful attacks on "anonymized" data led to the early work on the formal definition of privacy guarantees that resulted, in particular, in the development and the implementation (see [45], [46], [47], [32], [5], [33], [12], among others) of the so-called k-anonymity approach. A database instance is said to provide k-anonymity, for some number k, if every way of singling an individual out of the database returns records for at least k individuals. In other words, anyone whose information is stored in the database can be "confused" with k others. It is important to note that the k-anonymity approach was primarily targeted at preventing identity disclosure and, as is well known, does not prevent attribute disclosure. It also implicitly requires that the data curator responsible for protecting the data is aware of all possible auxiliary information that could be available to the adversary. These features make k-anonymity and its variants rather impractical.

The concept of differential privacy developed in [13] provides a measure of privacy guarantees without these complications and addresses both identity and attribute disclosure concerns. Differential privacy formalizes the interaction of a user with a "sensitive" database via queries that are submitted through a secure server with a privacy protection mechanism. These queries are functions that need to be computed on the data, representing summaries or tabulations of the data. The assumed risk to the database is that the queries to the database can be issued by an adversary.
The privacy protection mechanism has two main elements that allow it to be effective against arbitrary adversaries. First, it takes into account the "sensitivity" of the query to the data, which measures the maximum change in the output of the function computed on the data if any data entry is deleted or altered. Second, it independently randomizes the outcome of the query, which ensures that the produced outcome is not correlated with any auxiliary information that an adversary may have.

More formally, when the query to the data is an estimator $\hat{\theta}$, for instance representing a mean or a median of a particular variable in the data or the vector of estimated coefficients in a linear model, differential privacy requires one to replace the point estimator of interest $\hat{\theta}$ with a randomized function $\theta(P_N, \nu)$, where $P_N$ is the empirical distribution of the dataset and $\nu$ is an independent random element. In other words, it is required that the estimator be randomized using independent noise. Independence of the random element $\nu$ from the data sample is essential, since any correlation may allow an adversary to recover potentially sensitive attributes of observations in the data. While the distribution of the random element can be adjusted to the general properties of the population data distribution $P_N$, such as the sample size $N$ and the number of variables, it may not depend on the specific values of observations identifying the location of point masses of $P_N$.

(Footnote: For instance, [47] identified the medical records of William Weld, then governor of Massachusetts, by linking voter registration records to "anonymized" Massachusetts Group Insurance Commission (GIC) medical encounter data, which retained the birthdate, sex, and zip code of the patient. In another example, in [41] the risk of disclosure was identified in the so-called "Netflix prize dataset." In 2009 Netflix announced a competition with a grand prize of $1M for developers of a prediction algorithm that would use the information on the past viewership history of a given consumer and how this consumer rated those movies, and be able to predict how that consumer would rate the movies that he or she has not seen yet. For this competition Netflix released an "anonymized" dataset containing 100,480,507 ratings produced by 480,189 consumers. [41] used the public movie review data of users from imdb.com and were able to link a significant fraction of "anonymous" consumers on Netflix to imdb.com users based on the uniqueness of watch histories.)

We now provide a formal definition of differential privacy in application to the randomized estimator $\theta(P_N, \nu)$.

DEFINITION 1 (($\varepsilon, \delta$)-differential privacy [13]). A randomized estimator $\theta(P_N, \nu)$ is $(\varepsilon, \delta)$-differentially private if for any two empirical distributions $P_N$ and $P'_N$ over $N$ support points and differing (arbitrarily) in only one support point, we have that for all measurable sets $A$ of possible outputs the following holds:

$$P_\nu[\theta(P_N, \nu) \in A] \le e^{\varepsilon} P_\nu[\theta(P'_N, \nu) \in A] + \delta, \qquad (1.1)$$

where $\varepsilon > 0$, $\delta \in [0, 1)$ are privacy parameters and the probabilities are taken over the randomness in $\nu$. In addition, if $\delta = 0$, then the estimator $\theta(P_N, \nu)$ is referred to as $\varepsilon$-differentially private.

In this definition we use the notation $P_\nu(\cdot)$ to emphasize that the differentially private estimator $\theta(P_N, \nu)$ is based on the distribution of the random element $\nu$ while the distributions of the two adjacent datasets $P_N$ and $P'_N$ are fixed. The bound in the definition has to be valid for any possible empirical distributions of the data $P_N$ and $P'_N$ differing in one support point, no matter what the probability of realization of these datasets is.

In Section 2 we discuss this notion in detail and review some common differential privacy practice. We then develop a notion of identification under differential privacy.
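The bound (1.1) can be checked numerically for a simple instance of the Laplace mechanism applied to a sample mean of data with support $[0,1]$ (an illustrative sketch of ours, not part of the paper's formal development; the noise scale $\mathrm{diam}(\Theta)/(N\varepsilon)$ anticipates Example 1 in Section 2):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.5                      # privacy parameter epsilon
N = 100
x = rng.uniform(0, 1, N)       # data with support [0, 1], so diam(Theta) = 1
x_adj = x.copy()
x_adj[0] = 1.0 - x_adj[0]      # adjacent dataset: one entry altered

scale = 1.0 / (N * eps)        # Laplace scale diam(Theta) / (N * eps)

def lap_density(z, mu, b):
    """Density of the Laplace distribution Lap(mu, b) at z."""
    return np.exp(-np.abs(z - mu) / b) / (2 * b)

# (eps, 0)-differential privacy requires the density ratio of the
# randomized estimator under the two adjacent datasets to be at most
# e^eps at every output value z.
zs = np.linspace(-1, 2, 1001)
ratios = lap_density(zs, x.mean(), scale) / lap_density(zs, x_adj.mean(), scale)
print(ratios.max() <= np.exp(eps) + 1e-9)   # True
```

Since the two sample means differ by at most $1/N$, the ratio of the two shifted Laplace densities is bounded by $e^{\varepsilon}$ everywhere, which is exactly the bound in Definition 1 with $\delta = 0$.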
As we can see from Definition 1, the differential privacy approach is implemented in finite samples whereas, as is well known, identification in Economics and Econometrics is a population property. Thus, to bring these two worlds – the population and the finite sample – together, we propose to look at identification (or the lack of it) as a property of the limit of experiments, an idea we used in our previous work [28] for privacy under data combination. However, the identification approach we suggest under differential privacy is different from that in [28], which is not surprising given that differential privacy is a fundamentally distinct approach to privacy protection. We first consider the set of estimators that can be obtained in a finite sample by applying a differentially private mechanism. This can be viewed as a random set and, thus, it is only natural for us to rely on the well-developed and extremely useful theory of random sets of [39] and [40] when defining the notion of identification in the limit of experiments. We argue that the selection expectation of the random set of estimators, or other deterministic characteristics of the random sets, are not suitable bases for the notion of identification in this case, as they are in conflict with the practice and fundamental requirements of differential privacy. Instead, we argue that it is suitable to base the notion of identification on the weak limit of these random sets in the limit of the sequence of experiments and its corresponding containment functional. We discuss the notions of identification and the pseudo-identified set and illustrate them with some examples.

In Section 3 we conduct the identification analysis for differentially private estimators in regression discontinuity design models. We establish the lack of identification for differentially private versions of nonparametric regression at the boundary and local linear estimators.
We also discuss implications of differential privacy for specification tests in the RDD framework and present some simulation evidence. In Section 4 we conduct the identification analysis and study the performance of differentially private estimators in models with treatment effects, and show that generally the ATE under differential privacy is not identified. Section 5 concludes. The proofs of all the results are collected in the Appendix.

In our previous work [28] we considered identification in models where a researcher has access to a dataset that was obtained by combining split datasets subject to constraints on the prevention of identity disclosure. There, both the data combination procedure and the imposition of privacy constraints (such as k-anonymity) were intrinsically finite sample procedures. To reconcile the nature of these procedures with the population nature of identification, we argued that the identification notion for econometric models from combined and privacy-protected data can only be defined as a limit of the combined output of the privacy preserving procedure and the finite sample distribution of the data. In this paper, we aim to develop an approach to analyzing identification of econometric models when the data curator requires the output of the econometric procedure to be differentially private. For that, we can use ideas related to those in [28].

As discussed in the introduction, in the context of point estimation differential privacy requires one to replace the point estimator of interest $\hat{\theta}$ with a randomized functional $\theta(P_N, \nu)$, where $P_N$ is the empirical distribution of a given dataset and $\nu \in V$ is an independent random element which is assumed to belong to a Banach space $V$. In other words, it is required that the estimator be randomized using independent noise orthogonal to the distribution inducing $P_N$.
Independence of the random element $\nu$ from the data sample is essential, as any correlation between them may allow an adversary to recover potentially sensitive attributes of observations in the data. The randomized estimator $\theta(P_N, \nu)$ in Definition 1 only ensures that information regarding an individual data entry cannot be reverse-engineered from its values. There are two important features of the privacy preserving methodology we want to emphasize. The first one is that even though the distribution of $\nu$ can be adjusted to the general properties of the population data distribution in $P_N$ (such as the number of variables and their supports) and can depend on the sample size $N$, it may not depend on the specific values of observations producing $P_N$. The second feature is that the privacy protection has to be guaranteed for every possible realization of the data $P_N$. These two features contribute to the powerful privacy-preserving framework delivered by differential privacy. But at the same time they contribute to the possible lack of identification of parameters in differentially private versions of some important econometric models (examples of those are given in Sections 3 and 4), which, as we will see, will be closely related to poor asymptotic properties of differentially private estimators in these models.

It is clear from the definition of differential privacy that smaller values of the parameters $\varepsilon \ge 0$ and $\delta \ge 0$ correspond to stricter privacy restrictions ($\varepsilon$ measures the range of the likelihood ratio of the distributions of randomized estimators with two adjacent datasets, while $\delta$ measures the (lack of) overlap between the sets of these randomized estimators); these parameters have to be chosen by a data curator. As noted in [14], it is advisable that the parameters in the definition of $(\varepsilon, \delta)$-differential privacy are calibrated such that both of them are allowed to approach zero as the sample size increases. In this case we can write them as $(\varepsilon_N, \delta_N)$.
Our coverage and the identification notion will allow for a variety of situations – we can allow $(\varepsilon_N, \delta_N)$ to be constant as well as decreasing with $N$; this is formally given in Section 2.2. To give readers a more complete coverage of differential privacy approaches, we start with an example of a statistical procedure that admits consistent parameter estimation even with the requirement of differential privacy (such examples can be found in [15], among others).

EXAMPLE 1 (Sample mean of a random variable with a bounded support). Suppose that our goal is the estimation of the mean of a random variable $X$ with a bounded support from the sample of i.i.d. observations $\{X_i\}_{i=1}^N$ with empirical distribution $P_N$. Consider the so-called Laplace mechanism, where the estimator $\theta(P_N, \nu_N)$ is obtained in the following additively separable fashion:

$$\theta(P_N, \nu_N) = \bar{X} + a_N(\nu_N), \qquad a_N(\nu_N) \sim \mathrm{Lap}(0, \lambda_N),$$

where the Laplace distribution $\mathrm{Lap}(\mu, \lambda)$ has density $p(x; \mu, \lambda) = \frac{1}{2\lambda} \exp\left(-\frac{|x - \mu|}{\lambda}\right)$. If we choose $\mu = 0$ and $\lambda_N = \frac{\mathrm{diam}(\Theta)}{N \varepsilon_N}$ (or greater), where $\Theta$ denotes the support of $X$, then $\theta(P_N, \nu_N)$ is $(\varepsilon_N, 0)$-differentially private because for any $z \in \mathbb{R}$,

$$f_{a_N(\nu_N)}\left(z - \bar{X}\right) \le e^{\varepsilon_N} f_{a_N(\nu_N)}\left(z - \bar{X} - \tfrac{1}{N}(X'_i - X_i)\right),$$

for any $X_i$, $X'_i$ and $\bar{X}$.

If $\varepsilon_N$ remains bounded away from zero, or even if $\varepsilon_N \to 0$ as $N \to \infty$ but $N \varepsilon_N \to \infty$, then $\theta(P_N, \nu_N)$ is obviously a consistent estimator of the population mean of $X$, as the variance of the noise term $a_N(\nu_N)$ decreases to zero. If, however, $\varepsilon_N = O(1/N)$, then $\theta(P_N, \nu_N)$ can be shown to be no longer consistent. If instead of looking at just $\theta(P_N, \nu_N)$ one wants to analyze the asymptotic behavior of

$$\sqrt{N}\left(\theta(P_N, \nu_N) - E[X]\right) \qquad (2.1)$$

traditionally used for econometric inference, then one can show that if $\sqrt{N}\, \varepsilon_N \to \infty$, then the weak limit of (2.1) is the same as for $\sqrt{N}(\bar{X} - E[X])$ – that is, $N(0, \mathrm{Var}[X])$.
If $\sqrt{N}\, \varepsilon_N \to c$ for some constant $c > 0$, then the weak limit of (2.1) still exists but has a larger variance, thus leading to less accuracy in the estimation. If $\sqrt{N}\, \varepsilon_N \to 0$, then the weak limit of (2.1) does not exist.

Thus, as we can see in Example 1, it is possible to have differentially private versions of sample means with inference being pretty much the same as for the original estimator. It does not mean, however, that differential privacy is broadly compatible with econometric inference. Example 2 below demonstrates that even in the case of the sample mean a differentially private estimator may lose its nice asymptotic properties if we take $X$ whose support is unbounded.

EXAMPLE 2 (Sample mean of a random variable with unbounded support). Suppose that, in contrast with the situation in Example 1, the support of $X$ is unbounded (for simplicity, we will take it to be $\mathbb{R}$) but the variance of $X$ is still finite. Then an $(\varepsilon, \delta)$-differentially private estimator $\theta(P_N, \nu_N)$ obtained by the addition of mean zero noise to the sample mean $\bar{X} = \frac{1}{N} \sum_{i=1}^N X_i$ may not be consistent.

Indeed, suppose that the noise component has the density $f_{a_N(\nu_N)}(\cdot)$. Definition 1 requires that at each point and, in particular, at $\bar{X}$,

$$f_{a_N(\nu_N)}\left(\bar{X}\right) \le e^{\varepsilon} f_{a_N(\nu_N)}\left(\bar{X} + \tfrac{1}{N}(X'_i - X_i)\right) + \delta,$$

for any $X_i$, $X'_i$ and $\bar{X}$. The existence of the finite mean of the random variable $a_N(\nu_N)$ is equivalent to the convergence of the improper integral $\int_v^\infty t \, f_{a_N(\nu_N)}(t) \, dt$ for each $v$, which requires that $\lim_{t \to \infty} t \, f_{a_N(\nu_N)}(t) = 0$. This implies that we can choose $X'_i$ and $X_i$ such that

$$f_{a_N(\nu_N)}\left(\bar{X} + \tfrac{1}{N}(X'_i - X_i)\right) \le e^{-\varepsilon} f_{a_N(\nu_N)}\left(\bar{X}\right).$$
Given that $\bar{X}$ can be an arbitrary value on the real line, and that the difference $X'_i - X_i$ can be made arbitrarily large, Definition 1 then requires that for all $N$ and all points $t \in \mathbb{R}$ it holds that $f_{a_N(\nu_N)}(t) \le \delta$, which is clearly incompatible with the consistency of the estimator, since consistency would require "concentration" of the distribution of $a_N(\nu_N)$ around zero as $N \to \infty$.

(Footnote: Of course, some values of $\bar{X}$ may be considered to be very unlikely, but we want to note here that the differential privacy notion requires considering all possible realizations of the samples regardless of their likelihood, and for any value on the real line we can certainly find realizations of $P_N$ that will give the sample mean equal to that value.)

Note that we have inconsistency of our differentially private estimator here even for fixed $(\varepsilon, \delta)$. If the parameters of differential privacy are drifting to zero with $N$, the inconsistency problem becomes even more severe. One approach to "fix" the behavior of the estimator is to consider trimmed or winsorised versions of the sample mean. Trimming would bound the scale of the noise that needs to be added to the estimator. That, however, may interfere with the asymptotic distribution of the mean, depending on the tail behavior of the distribution of $X$, leading to the domination of the distribution of the differentially private estimator by the added noise.

We would like to emphasize that Example 2 demonstrates the non-existence of consistent differentially private estimators for means of unbounded random variables generated by additive noise.
One may ask whether some non-additive way of incorporating noise would result in a consistent differentially private estimator. Our later discussion in Section 2.4 shows that in this situation inconsistency is an intrinsic property of any differentially private estimator satisfying some basic smoothness requirements. As is evident from Example 1, support restrictions may mitigate this issue, but the fact that Definition 1 is incompatible with consistent estimation of means of many commonly used random variables appears to be an unfortunate shortcoming of differential privacy. Our detailed analysis of popular applied econometrics methods in Sections 3 and 4 will cover and discuss similar shortcomings even in models with bounded supports for all the variables.

One reason we have focused on the consistency property so far is that it is a minimum desirable property of a good estimator. The second reason is directly related to what we do in the remainder of this section – our notion of identification is related to weak limits of differentially private estimators, and the property of consistency describes a special case of those weak limits.
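The consistency dichotomy in Example 1 can be illustrated with a short simulation (an illustrative sketch with our own choice of constants: $X$ is uniform on $[0,1]$, so $\mathrm{diam}(\Theta) = 1$; the regime $\varepsilon_N = 5/N$ keeps $N\varepsilon_N$ bounded, which is the inconsistent case):

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_mean(x, eps, diam=1.0):
    """Laplace-mechanism sample mean for bounded support, as in Example 1."""
    n = len(x)
    return x.mean() + rng.laplace(0.0, diam / (n * eps))

def rmse(n, eps, reps=500, mu=0.5):
    """Monte Carlo root-mean-squared error of the DP mean around E[X] = 0.5."""
    draws = np.array([dp_mean(rng.uniform(0, 1, n), eps) for _ in range(reps)])
    return np.sqrt(np.mean((draws - mu) ** 2))

# Fixed eps: the noise scale 1/(N*eps) vanishes, so the estimator is consistent.
print(rmse(100, 0.5), rmse(10_000, 0.5))        # RMSE shrinks with N

# eps_N = 5/N: the noise scale stays at 1/5, so the RMSE does not vanish.
print(rmse(100, 5 / 100), rmse(10_000, 5 / 10_000))
```

With fixed $\varepsilon$ the RMSE drops by roughly an order of magnitude as $N$ grows from $10^2$ to $10^4$, while in the $\varepsilon_N = 5/N$ regime it stays near the standard deviation of the fixed-scale Laplace noise.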
In this section, we present our formal approach to identification for models with differentially private outcomes. We consider a sequence of statistical experiments indexed by the sample size $N$ (with $N \to \infty$ along this sequence), where for each $N$ we generate an i.i.d. sample $\{z_i\}_{i=1}^N$ from the joint distribution of a $d$-dimensional random vector $Z$, leading to the empirical distribution $P_N$. We assume that the parameter of interest $\theta$ is in the interior of a $p$-dimensional convex compact parameter space $\Theta \subset \mathbb{R}^p$. We then consider randomized estimators $\theta(P_N, \nu_N) \in \Theta$, where the random element $\nu_N$ ensures that these randomized estimators are $(\varepsilon_N, \delta_N)$-differentially private for some sequences $\varepsilon_N$ and $\delta_N$. To ensure protection from adversarial attacks on the data $P_N$, the added random element $\nu_N$ has to be statistically independent from the data.

In our analysis in this section we use techniques from the theory of random sets, which reflect the spirit of data analysis with differential privacy: the random element $\nu_N$, or the technique that is used to produce the randomized estimator $\theta(P_N, \nu_N)$ aiming to represent the parameter of interest $\theta$, may not be available to the researcher. Instead, the data curator controlling the dataset inducing the empirical distribution $P_N$ reports parameters $(\varepsilon_N, \delta_N)$ which yield the upper bound guarantee for differential privacy of a given estimated output (and, possibly, the algorithm used for the implementation of $\theta(\cdot, \cdot)$). This means that there can be an entire class of estimators for the parameter $\theta$ that satisfy differential privacy with these parameters.

While we provided examples of mechanisms that can be used to achieve differential privacy in the previous section, we need to formally define the structure of a general differentially private estimator. A differentially private estimator takes as an input a data sample that produced the empirical distribution $P_N$ and a random element $\nu_N$, and outputs a point in $\Theta$.
We treat $\nu_N$ as a "seed" for randomness, represented by a fixed standardized random variable that is then transformed by the estimator into the random variable used in a particular mechanism for differential privacy. For example, $\nu_N$ can be a uniformly distributed random variable (or a vector of such variables) on $[0, 1]$ that is then transformed by a mechanism into a Laplace random variable. We now give a formal description of the class of estimators we consider.
ASSUMPTION 1. The class of estimators is formed by a class of bounded operators $\mathcal{M}$ such that:

(i) $\mathcal{M}$ is a collection of parametric families of operators such that the operators $M_{\theta,\nu} \in \mathcal{M}$ are well-defined for each $\theta \in \Theta$ and $\nu \in V$;

(ii) $M_{\theta,\nu}: D(\mathbb{R}^d; [0,1]) \mapsto \mathbb{R}^p$ for all $M_{\theta,\nu} \in \mathcal{M}$ and all $\theta \in \Theta$ and $\nu \in V$ (where $D(\mathbb{R}^d; [0,1])$ is the Skorohod space of functions);

(iii) for each $F \in D(\mathbb{R}^d; [0,1])$ and parametric family $\{M_{\theta,\nu}: \theta \in \Theta, \nu \in V\}$, $M_{\theta,\nu}(F)$ is Lipschitz-continuous in $\theta$ and $\nu$;

(iv) the differentially private estimator $\theta(P_N, \nu_N)$ is defined as a solution of the system of equations

$$M_{\theta, \nu_N}(P_N) = 0$$

for $\theta$ over the parametric family $\{M_{\theta,\nu}: \theta \in \Theta, \nu \in V\}$, where $P_N$ is the empirical distribution of the sample $\{z_i\}_{i=1}^N$.

We can illustrate this assumption for the differentially private estimate of a sample mean using the Laplace mechanism for i.i.d. draws $\{X_i\}_{i=1}^N$ of a random variable with bounded support and the random element $\nu_N \sim U[0,1]$:

$$M_{\theta,\nu}(F) = \int_{-\infty}^{+\infty} z \, dF(z) + \frac{\mathrm{diam}(\Theta)}{N \varepsilon_N} F^{-}(\nu) - \theta,$$

where $F^{-}(\cdot)$ is the inverse cdf of the standard Laplace distribution. We note that the empirical moment induced by this operator produces an $(\varepsilon_N, 0)$-differentially private estimator. We consider sequences $(\varepsilon_N, \delta_N)$ such that $\varepsilon_N \le \bar{\varepsilon}$ and $\delta_N \le \bar{\delta}$ for all $N$, for some universal constants $\bar{\varepsilon}$ and $\bar{\delta}$.

DEFINITION 2.
For a given sequence of $(\varepsilon_N, \delta_N)$, we say that an $(\varepsilon_N, \delta_N)$-differentially private estimator $\theta(\cdot, \cdot): Z^N \times V \to \Theta$ satisfying Assumption 1 is regular for the parameter of interest $\theta$ if the following conditions hold:

(i) $\theta(P_N, \nu_N)$ is a continuous random variable with respect to the Lebesgue measure;

(ii) in the absence of the mechanism noise – that is, when the estimator is $\theta(P_N, 0)$ – there exists a function $\bar{R}(N, \kappa)$, with $\lim_{N \to \infty} \bar{R}(N, \kappa) = 0$, such that for all $N$ and for all $\kappa > 0$:

$$P\left(\|\theta(P_N, 0) - \theta\| > \kappa\right) \le \bar{R}(N, \kappa); \qquad (2.2)$$

(iii) $\theta(P_N, \nu_N)$ has a weak limit if the sequence $(\varepsilon_N, \delta_N)$ is convergent.

Condition (i) states that the distribution of $\theta(\cdot, \cdot)$ has a density. Condition (ii) implies that in the absence of any mechanism noise the estimator is informative for the parameter of interest $\theta$ – in particular, the estimator is consistent and has a guaranteed rate of convergence (in most practical scenarios $\bar{R}(N, \kappa)$ would be required to be exponentially decreasing in $N$ and $\kappa$). The condition that the function $\theta(\cdot, \cdot)$ takes values in $\Theta$ only ensures that in cases when the mechanism noise would drive the estimator outside of the parameter space $\Theta$, such an estimator would be projected on the boundary of $\Theta$. Finally, condition (iii) requires the differentially private estimator to converge in distribution.

Now, having defined the class of regular differentially private estimators, we make the next step towards the notion of identification. Our next notion will depend on the set of sequences of $(\varepsilon_N, \delta_N)$ a data curator is willing to consider. For example, it could be the set of sequences where $\varepsilon_N$ and $\delta_N$ do not change with $N$, or it could be a set of sequences converging to zero at a certain rate. We will refer to a fixed set of sequences as $\mathcal{E}$.
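Returning to the Laplace-mechanism illustration of Assumption 1, the operator view can be written out in code: the estimator is the root in $\theta$ of $M_{\theta,\nu}(P_N)$ (an illustrative sketch; the function names are ours, and $\nu$ is the uniform seed described above):

```python
import numpy as np

def lap_ppf(u):
    """Inverse cdf of the standard Laplace distribution at u in (0, 1)."""
    return -np.sign(u - 0.5) * np.log(1.0 - 2.0 * np.abs(u - 0.5))

def M(theta, nu, x, eps, diam=1.0):
    """Operator M_{theta,nu}(P_N) from the Assumption 1 illustration:
    empirical mean plus scaled Laplace noise quantile, minus theta."""
    n = len(x)
    return x.mean() + (diam / (n * eps)) * lap_ppf(nu) - theta

# The DP estimator solves M_{theta, nu_N}(P_N) = 0 for theta; here the
# solution is available in closed form.
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 200)
nu = rng.uniform()                      # the uniform "seed" nu_N
theta_hat = x.mean() + (1.0 / (200 * 0.5)) * lap_ppf(nu)
print(abs(M(theta_hat, nu, x, eps=0.5)) < 1e-12)   # True: theta_hat solves M = 0
```

For this additively separable operator the root is simply the noised sample mean; for more complicated families in $\mathcal{M}$ the root would be found numerically, which is why Assumption 1 imposes Lipschitz continuity in $\theta$ and $\nu$.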
We will suppose that this set of sequences is a join-semilattice in the coordinate-wise partial order for (ε_N, δ_N) – that is, the join of any two sequences from E is also in E. Following [39], we use the concept of measurable selection to define the set of all regular differentially private estimators.
DEFINITION 3.
Consider the set T*_{N,E} of all random variables θ(P_N, ν_N): Z^N × V ↦ Θ satisfying Definition 2 and corresponding to sequences (ε_N, δ_N) from E. We define the random set T_{N,E}, referred to as the set of regular differentially private estimators for θ for a given E, as the completion of the set T*_{N,E} ∩ L¹(P) with respect to the L¹(P)-norm.
L¹(P) is the space of measurable functions that map the elements of the σ-algebra on Z^N (with the product measure defined by the probability measure on Z) and the σ-algebra associated with the random elements ν_N into R^p, and for which the Euclidean norm is integrable. By the definition of θ(·,·) in Definition 2 and the compactness of Θ, all elements in T*_{N,E} and, thus, in T_{N,E} as well, are bounded in the L¹-norm. The random set T_{N,E} is compact and convex in the sense of Definitions 1.30 and 4.32 in [39], as shown in Lemma 1.
LEMMA 1. T_{N,E} is a convex and compact random set.
A natural requirement would be that selections of T_{N,E} overlap with an arbitrary neighborhood of θ with probability approaching 1. However, the notion of the probability limit is too strong, as it will not allow us to talk about the limit in the following simple instance of regular differentially private estimators: θ(P_N, ν_N) = θ(P_N, 0) + a_N(ν_N), where the variance of a_N(ν_N) remains constant or increases as N → ∞ (the estimator θ(P_N, 0) in the absence of the mechanism noise is, of course, consistent by condition (ii) in Definition 2). As we will see later, such situations will be prevalent in the estimation of ATE and RDD models. In fact, we already conveyed our intention to consider weak limits in condition (iii) in Definition 2. The next lemma demonstrates that weak convergence is the strongest plausible convergence concept to consider unless the weak limits of regular differentially private estimators are constant.
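The failure of convergence in probability described above is easy to see numerically. The following is a minimal Python sketch (ours, not the paper's; all parameter values are illustrative): the noise-free estimator θ(P_N, 0) concentrates at the true mean as N grows, while an additive Laplace-mechanism term a_N(ν_N) with a non-vanishing scale keeps the dispersion of the randomized estimator bounded away from zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_mean(x, scale):
    """Sample mean plus Laplace-mechanism noise with a fixed scale."""
    return x.mean() + rng.laplace(0.0, scale)

theta0 = 0.5  # true mean of U[0, 1]
for n in (100, 10_000, 1_000_000):
    x = rng.uniform(0.0, 1.0, size=n)
    clean = x.mean()                                   # theta(P_N, 0)
    noisy = [dp_mean(x, scale=0.3) for _ in range(2_000)]
    # clean concentrates at theta0, but the spread of noisy stays near
    # the Laplace standard deviation 0.3 * sqrt(2), whatever n is:
    print(n, round(abs(clean - theta0), 4), round(np.std(noisy), 3))
```

The weak limit of the noisy estimator is therefore non-degenerate even though its noise-free version is consistent.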
LEMMA 2.
Suppose that θ(P_N, ν_N) →_W τ as N → ∞. Then if τ is non-constant with positive probability, there exist κ̄ > 0 and γ > 0 such that for all κ ≤ κ̄
lim sup_{N→∞} P(|θ(P_N, ν_N) − τ| > κ) > γ.
Theorem 1 establishes weak convergence of the convex compact random set T_{N,E} when all the sequences of (ε_N, δ_N) in E are convergent.
THEOREM 1.
Let E be a join-semilattice that consists only of convergent sequences of (ε_N, δ_N). The random set T_{N,E} as defined in Definition 3 weakly converges to a random set T_E, which is the closure of all weak limits of estimators in the respective random set T*_{N,E}.
The convergence result of Theorem 1 is essentially a result about what happens in the limit of statistical experiments. Naturally, we will base our notion of identifiability and, more generally, pseudo-identified sets, on some characteristics of the random set T_E.
Notions of identification and (pseudo)-identified set.
Our next step will be to produce a tangible characterization of the random set T_E and define the information content delivered by this set with regard to the parameter of interest. The best case scenario from the information content point of view is the case when T_E is the degenerate distribution concentrated at θ, which essentially means that the sequence of statistical experiments delivers the true parameter value in the limit. Generally, however, this may not be the case. One of the main difficulties in characterizing this set and its information content is that it may not contain the target parameter θ (i.e., it can be “biased”) and that the distribution of T_E may not be degenerate (i.e., it is not “consistently” estimating θ). Some important work in the random sets literature, such as [7] and [6], defined the information content (or, in other words, the identified set) as the selection expectation of a random set. In our previous work [28] we used a concept related to selection expectation to analyze the impact of privacy guarantees (in particular, k-anonymity) under data combination.
Thus, selection expectation might seem like a promising approach to explore in our framework as well, especially given that such a characterization is deterministic; but for the reasons explained below (the privacy budget of differentially private mechanisms and the impossibility of repeated experiments), in the differentially private setting we do not see this approach as a fruitful one. Instead, our pseudo-identified set (“pseudo-” because it does not necessarily contain the true parameter value) will be the random set T_E itself. The notion of a random identified or a random pseudo-identified set is not traditional in econometrics, as usually researchers are able to extract a deterministic consensus about which parameter values can be driving the observables. We argue that in the differentially private setting this is the preferable approach. Some other work, such as [25], has employed the notion of a random identified set. In [25], the source of probability that induces the random identified set is the posterior uncertainty about the identifiable parameters. In our case, this source is a combination of the sampling uncertainty of the observations and the mechanism noise, and it is not possible to separate these two sources.
Having given the gist of the content of this section, we now turn to a more detailed discussion. First, as mentioned above, in the context of converging random sets, as for converging random variables, the notion of the expectation or the median (or some other quantile) might appear a natural way to characterize the limit (and consequently provide the framework for identification). Recall that the selection (or Aumann) expectation of the random set T_{N,E}, denoted E T_{N,E}, is the closure of the set {E ξ: ξ ∈ T_{N,E}}. However, the selection expectation (along with any first-order statistic, such as Vorob’ev’s expectation) fails to be “representative” for the limiting random set T_E.
This lack of representativeness stems from the inherent impossibility of replicating a statistical experiment whose outcome is the regular differentially private estimator, driven by the required structure of differentially private systems as discussed in [15]. Indeed, in the context of differentially private systems no function evaluated on the data can be considered in isolation. Differential privacy is the property of the entirety of all functions that have ever been or will ever be evaluated from a given dataset. By the composition property of differential privacy, two different functions that, for instance, are each (ε/2, 0)-differentially private are jointly (ε, 0)-differentially private. If ε in the definition of differential privacy is considered to be a policy parameter, then it determines the “privacy budget” of a given database. The more functions need to be evaluated from the data, the more noise will need to be added to each function to ensure the entire set of functions is within the “privacy budget.” The evaluation of K functions within an overall (ε, 0) guarantee requires each of them to be (ε/K, 0)-differentially private, i.e., K times less sensitive to arbitrary changes in individual observations in the dataset. (As discussed in the introduction, k-anonymity is a formal privacy guarantee that predates differential privacy.) As a result, the random element ν_N used to produce the regular differentially private estimator θ(P_N, ν_N) is generated only once, and then any data user who wants to estimate a given parameter θ will observe exactly the same value of the randomized estimator θ(P_N, ν_N). In other words, repeated identical queries to the data always result in the same (randomized only once) output. From this perspective, the concept of the selection expectation or other related statistics is clearly misleading for a characterization of the limiting random set T_E.
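The budget arithmetic behind this composition argument can be made concrete. A minimal sketch (our illustrative numbers, not from the paper) of how splitting an (ε, 0) budget across K Laplace-mechanism queries inflates each query's noise scale:

```python
# Sequential composition: K queries that are each (eps/K, 0)-DP are jointly (eps, 0)-DP.
total_eps = 1.0          # policy-level privacy budget (illustrative)
K = 5                    # number of functions evaluated from the data
per_query_eps = total_eps / K

# The Laplace-mechanism noise scale for a query with global sensitivity s
# is s / eps, so splitting the budget K ways multiplies every scale by K.
sensitivity = 1.0
scale_single = sensitivity / total_eps       # one query using the whole budget
scale_split = sensitivity / per_query_eps    # each of the K queries
print(scale_split / scale_single)            # K-fold noise inflation
```

This K-fold inflation is exactly why a data user cannot "average out" the mechanism noise by re-issuing the same query.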
While it may be the case that E_{ν_N} θ(P_N, ν_N) is close to θ with high probability (as N → ∞) for all measurable selections θ(P_N, ν_N) of T_{N,E}, there is no guarantee that θ(P_N, ν_N) itself is also close to θ with high probability; under “privacy budget” considerations there is no way a researcher can access repeated samples from the distribution of θ(P_N, ν_N) corresponding to a given empirical distribution P_N to “average out” the added noise ν_N. To put it differently, given the “privacy budget” considerations, the selection expectation E_{ν_N} θ(P_N, ν_N) is not “feasible” in the differentially private framework. Moreover, as we illustrate in the example in Section 2.3, differential privacy may be in conflict even with the identification of expectations. It is clear that we need to use a different approach to characterizing the limiting random set T_E. To provide a comprehensive characterization of the random sets T_{N,E} and T_E we use the notion of the containment functional adopted from [39]:
DEFINITION 4.
The functional C_X(K) = P(X ⊂ K), for K a convex compact subset of Θ, is referred to as the containment functional of the random set X.
By Theorem 1.7.8 in [39] the containment functional provides a complete characterization of a convex compact random set. Moreover, it is sufficient to choose the “test sets” K to be convex polytopes. The containment functional preserves the property of weak convergence of a sequence of random sets. We summarize this in the following theorem.
THEOREM 2.
Under the conditions of Theorem 1, for any convex polytope K ⊂ Θ,
C_{T_{N,E}}(K) → C_{T_E}(K), as N → ∞.
This theorem is a simple corollary of Theorem 1.6.5 in [39], and it ensures that the containment functional preserves the properties of the converging sequence of random sets T_{N,E} and, more importantly, of its limit T_E. The characterization of the limiting containment functional equivalently characterizes the limiting random set. In other words, the analysis of weak convergence of random sets can be replaced with the analysis of pointwise convergence of the containment functional on the set of convex polytopes contained in Θ. We now formulate the notion of identifiability of the parameter of interest.
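To fix ideas, the containment functional C_X(K) = P(X ⊂ K) can be estimated by simulation whenever draws of the random set are available. A minimal Python sketch with a hypothetical random interval (all distributions here are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def containment(draw_set, K_lo, K_hi, n_sim=20_000):
    """Monte-Carlo estimate of C_X(K) = P(X subset of K) for a random
    interval X returned by draw_set() as (lo, hi) and a box K = [K_lo, K_hi]."""
    hits = 0
    for _ in range(n_sim):
        lo, hi = draw_set()
        hits += (K_lo <= lo) and (hi <= K_hi)
    return hits / n_sim

def draw():
    # Hypothetical random set: interval of half-width 0.1 centred at a N(0.5, 0.05) draw.
    c = rng.normal(0.5, 0.05)
    return c - 0.1, c + 0.1

print(containment(draw, 0.2, 0.8))    # near 1: K almost always contains X
print(containment(draw, 0.45, 0.55))  # 0: K is too narrow to ever contain X
```

A degenerate limiting set would push the first quantity to 1 for every polytope containing θ, which is exactly the identifiability condition formalized next.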
DEFINITION 5 (Identifiability of the parameter under differential privacy). Let E include only some converging sequences of (ε_N, δ_N). We will say that the parameter θ is identified in the regular (ε_N, δ_N)-differentially private framework, where the sequences of (ε_N, δ_N) belong to E, if and only if for any α ∈ (0, 1) and any convex polytope K ∋ θ,
C_{T_E}(K) ≥ 1 − α.
Theorem 3 below gives necessary and sufficient conditions for the identification of the parameter θ.
THEOREM 3.
Suppose the conditions of Theorem 1 hold. For any sequence of (ε_N, δ_N) from E, any regular (ε_N, δ_N)-differentially private estimator θ(P_N, ν_N) is such that θ(P_N, ν_N) →_p θ if and only if for any α ∈ (0, 1) and any convex polytope K ∋ θ we have C_{T_E}(K) ≥ 1 − α, and, thus, the parameter θ is identifiable even under differential privacy.
Theorem 3 provides our characterization of identifiability, which corresponds to the convergence of the sequence of random sets to a singleton. In other words, this parallels consistency for sequences of ordinary random variables. Based on the same principles we can characterize the case of non-identifiability.
DEFINITION 6 (Non-identifiability of the parameter under differential privacy). Let E consist of converging sequences of (ε_N, δ_N). We will say that the parameter θ is non-identified in the regular (ε_N, δ_N)-differentially private framework, where the sequences of (ε_N, δ_N) belong to E, if and only if there exist β ∈ (0, 1) and a convex polytope K_β ∋ θ such that
C_{T_E}(K_β) ≤ 1 − β.
Non-identifiability implies that the limiting random set is not degenerate. Therefore, it becomes impossible to pinpoint the true parameter θ by tracing a “mass point” of the containment functional of that limiting random set T_E. This makes the analysis of partial identification in our case different from the traditional approach, where partial identification aims to construct a deterministic set that contains the parameter of interest θ. In our case, where the containment functional is non-degenerate in the limit, it is impossible to construct such a deterministic set. At the same time, the containment functional itself may be difficult to work with in practice. To address this we define the (pseudo)-identified set as a set of probability distributions:
DEFINITION 7.
The pseudo-identified set for the parameter of interest θ produced by regular differentially private estimators is the class of distribution functions F_{θ,E} such that for each F ∈ F_{θ,E} there exists a measurable selection ξ ∈ T_E such that F is the distribution function of ξ.
One of the mechanisms most commonly used to induce differential privacy in the theoretical literature (e.g., [15]) is the Laplace mechanism, in which the original estimator is augmented by the addition of independent double exponential (or Laplace) noise calibrated in a specific way. In this section, we consider an example in which we illustrate the construction of a more general family of differentially private estimators by combining the Laplace mechanism with a random subsampling procedure. The resulting combination produces a random set of regular differentially private estimators. For simplicity and for the sake of highlighting important issues related to identifiability, we will even assume that the researcher is informed of the Bernoulli-Laplace mechanism being used to deliver a differentially private output, even though in practice a data curator may not release that information (hence, our generic notation for a “seed” ν_N).
The reason why we want to highlight this specific mechanism is that, as we discuss further in this paper, many relevant estimators in Economics can be viewed as being constructed from weighted means. In fact, in the RDD models estimation in Section 3 and in the ATE estimation in Section 4 it will be clear that the lack of identifiability under differential privacy and the poor statistical performance of regular differentially private estimators (even with relatively weak privacy requirements) will stem, in particular, from the weights of observations not being fixed but decided by the data. These weights may vary in a certain fixed interval even when only one observation in the dataset becomes different.
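As a preview of this section's example, the Bernoulli-Laplace mechanism (random Bernoulli subsampling followed by Laplace noise) admits a short sketch in Python. All parameter values below are illustrative, and the clipping to [0, 1] stands in for the projection onto Θ used throughout the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def bernoulli_laplace_mean(x, pi_n, lam_n, theta_lo=0.0, theta_hi=1.0):
    """Bernoulli subsampling plus Laplace noise, projected onto Theta = [lo, hi].
    Each observation enters with probability pi_n; the noise scale is lam_n."""
    n = len(x)
    d = rng.binomial(1, pi_n, size=n)           # Bernoulli inclusion indicators
    est = (d * x).sum() / (n * pi_n)            # downsampled, reweighted mean
    est += rng.laplace(0.0, lam_n)              # Laplace-mechanism noise
    return min(max(est, theta_lo), theta_hi)    # projection onto Theta

x = rng.uniform(0.0, 1.0, size=50_000)          # X supported on [0, 1], theta = E[X] = 0.5
print(bernoulli_laplace_mean(x, pi_n=0.5, lam_n=0.01))  # close to 0.5 here
```

Which of the regimes analyzed below applies depends on how pi_n and lam_n are taken to vary with the sample size.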
The example in this section considers an extreme version of such situations, in which the weights of observations in the weighted mean are 0/1 with some probabilities and these weights are independent of other available data. It provides useful support for our subsequent discussion of RDD and ATE differentially private estimators.
We will take X to have support on [0, 1] and set θ = E[X]. We consider the following mechanism for obtaining a regular differentially private estimator (the choice of the (ε_N, δ_N) parameters is discussed later) on the basis of the i.i.d. sample {x_i}_{i=1}^N.
1. First, we create a subsample from {x_i}_{i=1}^N that independently includes each observation with probability π_N and excludes it with probability 1 − π_N.
2. We compute the weighted average of the included observations and output that weighted average with an added random variable u_N, where u_N ~ Lap(0, λ_N):
θ(P_N, ν_N) = (1/(N π_N)) Σ_{i=1}^N d_i x_i + u_N,
where d_i is a Bernoulli random variable with parameter π_N. The variables d_i, i = 1, ..., N, are mutually independent. To ensure that θ(P_N, ν_N) ∈ Θ, we consider the estimator as the projection of θ(P_N, ν_N) on Θ.
This estimator is differentially private with parameter ε_N and with δ_N = 0 if, for each pair of samples differing from each other by a single observation (suppose this is the N-th observation), the likelihood ratio
L_N ≡ (c + a)/(c + b) = 1 + (a − b)/(c + b),
where
a = π_N Σ_{S_υ ⊆ S∖{x_N}} π_N^{|S_υ|} (1 − π_N)^{N−1−|S_υ|} · exp(−|t − θ̃_{S_υ∪{x_N}}|/λ_N),
b = π_N Σ_{S_υ ⊆ S∖{x_N}} π_N^{|S_υ|} (1 − π_N)^{N−1−|S_υ|} · exp(−|t − θ̃_{S_υ∪{x'_N}}|/λ_N),
c = (1 − π_N) Σ_{S_υ ⊆ S∖{x_N}} π_N^{|S_υ|} (1 − π_N)^{N−1−|S_υ|} · exp(−|t − θ̃_{S_υ}|/λ_N),
does not exceed e^{ε_N}. Note that the maximum absolute change in our estimator with the change in one observation (the so-called global sensitivity) is 1/(N π_N). Using the partition of unity Σ_{S_υ ⊆ S∖{x_N}} π_N^{|S_υ|} (1 − π_N)^{N−1−|S_υ|} = 1, we can write the upper bound on L_N as follows:
L_N ≤ 1 − π_N + π_N e^{1/(N λ_N π_N)}.
Therefore, (ε_N, 0)-differential privacy is guaranteed whenever
1 − π_N + π_N exp(1/(N λ_N π_N)) ≤ exp(ε_N),
for which a sufficient condition is
ε_N ≥ π_N exp(1/(N λ_N π_N)). (2.3)
As a result, for a converging sequence ε_N → ε there will be a family of sequences π_N and λ_N that ensure differential privacy.
Let us first discuss the extreme cases. This estimator can be made (0, 0)-differentially private with π_N ≡ 1 and λ_N ≡ +∞, which means that the Laplace distribution for the mechanism noise has an infinite variance. From Definition 2 we can clearly see that this estimator is not a regular differentially private estimator. Indeed, the requirement of ε_N = 0 is very strong. At the other extreme, we can consider (+∞, 0)-differential privacy with π_N ≡ 1 and λ_N ≡ 0. That is, the mechanism noise is zero and this estimator is consistent with (2.2) in Definition 2, with R̄(N, κ) = 2d exp(−2 N κ² / diam(Θ)²) obtained from the Hoeffding bound.
We can now focus on the cases when 0 < ε_N < +∞. Note that we can always find a family of sequences λ_N and π_N such that ε_N ≥ π_N exp(1/(N λ_N π_N)). In this family, to ensure weak convergence of the differentially private estimator (one of the requirements of Definition 2), the sequence π_N can converge to any limit in [0, ε]. At the same time, the non-negative-valued sequence λ_N can have limits in [0, +∞), with +∞ indicating a divergent sequence. Without loss of generality, we assume that the sequences λ_N and π_N are monotonic. We then consider the behavior of the resulting regular differentially private estimator in the following series of regimes.
Regime 1: λ_N → 0 as N → ∞.
Regime 1A.
In this regime, whenever π_N ≫ 1/N, the variance of the randomized estimator θ(P_N, ν_N) is O(1/(N π_N) + λ_N²) = o(1), meaning that the estimator converges weakly (and, of course, in probability) to E[X]. Thus, the distribution of the random element τ in the limit is degenerate. Note that in order to guarantee (ε_N, 0)-differential privacy with λ_N → 0, the sequence ε_N may not converge to zero too quickly. For E that consists of converging sequences (ε_N, 0), ε_N ≤ ε̄, that satisfy (2.3), the random set T_E is degenerate at θ = E[X], which means that in this regime the parameter of interest is identified. Thus, in this case, if one is willing to impose weaker privacy restrictions as N → ∞, in the sense that ε_N goes to zero slowly enough, then the parameter is identified. However, even in this optimistic scenario one has to remember that point identification is obtained given knowledge of the asymptotic behavior of λ_N and π_N, which a data curator may not release in such detail (imagine, e.g., the data curator releasing the rate for π_N but at the same time releasing only a lower bound on λ_N)!
Regime 1B.
When lim_{N→∞} N π_N = c ∈ (0, +∞), the downsampled mean converges weakly to the random variable Λ(c) = (1/c) Σ_{j=1}^k X_j, where the X_j are independent random variables distributed as X and k is a Poisson random variable with parameter c. Given that λ_N → 0, a weak limit τ of the randomized estimator θ(P_N, ν_N) projected on Θ is distributed as Λ(c) projected on Θ. The distribution of Λ(c) projected on Θ is the pseudo-identified set in this regime.
Note that in order to guarantee (ε_N, 0)-differential privacy with λ_N → 0, any sequence ε_N comprising E that converges to zero would have to do so more slowly than in Regime 1A, leading to a further weakening of the differential privacy guarantees.
Regime 1C.
When lim_{N→∞} N π_N = 0, the downsampled sample average converges weakly to a mass point at 0. Differential privacy will be guaranteed for all sequences ε_N ≫ 1/N with λ_N converging to 0 sufficiently slowly to satisfy (2.3).
Regime 2: λ_N → λ ∈ (0, +∞) as N → ∞.
Since in this regime the variance of the mechanism noise does not diminish, one can impose stronger differential privacy guarantees, in the sense of taking sequences ε_N converging to zero faster than in the respective cases in Regime 1.
Regime 2A.
For any sequence π_N ≫ 1/N, the downsampled sample average (1/(N π_N)) Σ_{i=1}^N d_i x_i converges in probability to E[X]. Its variance is O(1/(N π_N)) = o(1), which means that the Laplace noise will dominate the asymptotic behavior of regular differentially private estimators. The randomized differentially private estimator θ(P_N, ν_N) projected on Θ will converge weakly to Lap(E[X], λ) projected on Θ, with this distribution being our pseudo-identified set.
Regime 2B.
When lim_{N→∞} N π_N = c > 0, the resulting downsampled mean weakly converges to the random variable Λ(c) = (1/c) Σ_{j=1}^k X_j, where the X_j are independent random variables distributed as X and k is a Poisson random variable with parameter c. As a result, the randomized estimator θ(P_N, ν_N) projected on Θ will converge weakly to the distribution of Λ(c) + Lap(0, λ) projected on Θ.
Regime 2C.
When lim_{N→∞} N π_N = 0, the downsampled sample average converges weakly to a mass point at 0. As a result, the variance of the additive noise dominates the distribution of the randomized estimator θ(P_N, ν_N), which, when projected on Θ, will converge weakly to Lap(0, λ) projected on Θ.
Regime 3: λ_N → +∞ as N → ∞.
This is the case when privacy guarantees can be strongest. In this regime, however, the Laplace noise increasingly dominates the element of the estimator corresponding to the sample average. The randomized differentially private estimator θ(P_N, ν_N) diverges as N → ∞, meaning that, according to our convention, we need to consider its projection on the parameter space Θ. The distribution of the projected estimator will concentrate on the boundary of the parameter space. The resulting weak limit τ is a discrete random variable with support {Argmin Θ, Argmax Θ}, taking each value with equal probability.
Suppose for simplicity that E consists of one converging sequence of (ε_N, 0). Depending on which of the regimes above are compatible with this sequence, T_E may contain measurable selections with distributions corresponding to those regimes. When a data curator releases the information about the regime, this narrows down the class of regular differentially private estimators T_{N,E} and, thus, results in a “smaller” limiting random set T_E. One has to keep in mind, however, that a data curator may release information that gives only partial knowledge about the regimes compatible with a given differential privacy restriction (as discussed above), which once again will lead to an “increase” in T_E. In an extreme case, when E consists of all converging sequences of (ε_N, 0) with ε_N ≤ ε̄, we have the largest T_{N,E} possible, containing measurable selections with distributions collected across all the regimes.
In Section 2.2.1, we argued that the selection expectation is not a natural object to study under differential privacy.
With a limited privacy budget of a given dataset, it is impossible to construct a function of the data that can reliably converge to an expectation of the differentially private estimator (taken both with respect to the distribution of the data and the random element inducing differential privacy). For the sake of having a more comprehensive discussion, in the Appendix we briefly discuss the properties of the selection expectation of the limiting random set T_E in the context of the example in this section. The very nature of that discussion ultimately ignores the very important issue of the privacy budget.
While our previous discussion considers a general, possibly non-separable form for regular differentially private estimators, all existing approaches to inducing differential privacy lead to much simpler (approximately) separable estimators. We now further narrow down the class of regular differentially private estimators to reflect this property.
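The separable structure ψ(P_N) + a_N(ν_N) can be sketched as follows (a hypothetical example of ours with the sample median as the data functional; for a purely additive mechanism such as this one the residual term is identically zero):

```python
import numpy as np

rng = np.random.default_rng(3)

def separable_dp_estimator(x, noise_scale):
    """A separable randomized estimator: a data functional psi(P_N) plus
    data-independent additive noise a_N(nu_N); the residual Delta_N is 0."""
    psi = np.median(x)                    # any consistent data functional
    a_n = rng.laplace(0.0, noise_scale)   # Laplace-mechanism noise
    return psi + a_n

x = rng.normal(1.0, 1.0, size=100_000)
print(separable_dp_estimator(x, noise_scale=0.01))  # close to the true median 1.0
```

The data-dependent part and the noise part never interact, which is the property the next definition formalizes.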
DEFINITION 8.
We say that a regular differentially private estimator θ(P_N, ν_N) is smooth if there exist a functional ψ(·) and a function a_N with range in V such that
θ(P_N, ν_N) = ψ(P_N) + a_N(ν_N) + Δ_N, with E[(√(−log R̄(N, κ)) Δ_N)²] → 0 as N → ∞ for all κ > 0,
where R̄(N, κ) is provided in Definition 2.
Definition 8 focuses on the regular differentially private estimators that are approximately separable in the way they depend on the data sample and on the noise added to achieve differential privacy. The residual Δ_N is required to be “small” relative to the rate of convergence of the version of the estimator θ(P_N, 0) that is infused with the “trivial” noise that does not perturb it.
We note that for the two early mechanisms proposed to provide differential privacy – the Laplace and Gaussian mechanisms – Definition 8 applies trivially. In the Laplace mechanism differential privacy is achieved by the addition of double-exponential noise ν_N to the original estimator, and in the Gaussian mechanism ν_N is additive normal noise. In other words, by construction, for both of these mechanisms Δ_N ≡ 0.
One interesting example of a popular non-separable mechanism for differential privacy is the exponential mechanism developed in [37]. In application to extremum estimators, the mechanism replaces the extremum estimator θ̂ that maximizes the sample objective function Q(θ; P_N) over θ ∈ Θ with a draw from the quasi-posterior distribution implied by Q(·; P_N). This class of estimators is directly related to the randomized estimators developed in [11]. [11] consider the case where the population analog of the objective function Q(θ; P_N) satisfies the information matrix equality. To form the estimator they propose to consider a prior distribution π(·) over θ and a quasi-likelihood function exp(Q(θ; P_N)) (so that the original objective function is the quasi-log-likelihood function). Then the estimator is the mean of the quasi-posterior distribution ∝ exp(Q(θ; P_N)) π(θ), which consistently estimates the maximizer of the population objective function regardless of the shape of the prior distribution π(·) (under mild regularity conditions). Moreover, the variance of this quasi-posterior distribution accurately estimates the asymptotic variance of the original extremum estimator. A significant advantage of this estimator over the original extremum estimator θ̂ is that it does not require maximization of a potentially non-smooth or hard-to-optimize function Q(θ; P_N).
In the follow-up work in [29], for the cases where Q(θ; P_N) may be steep in the vicinity of the maximum, which may lead to slow convergence of the simulations required to sample from the quasi-posterior, it is proposed to scale the exponent in the pseudo-likelihood function as exp(λ Q(θ; P_N)) using a constant λ which is selected based on the speed of mixing of the simulated Markov chain (produced using the new pseudo-posterior). The corresponding posterior mean remains a consistent estimator for the maximizer of the population objective function, while its asymptotic variance can be estimated by scaling the variance of the quasi-posterior using λ.
The exponential mechanism for differential privacy considered in [37] is a simple implementation of the idea in [11]: the estimator is a single draw from the quasi-posterior ∝ exp(λ Q(θ; P_N)) π(θ).
The resulting estimator turns out to be (2λΔ_Q, 0)-differentially private, where Δ_Q = sup_{θ∈Θ, P_N, P'_N} |Q(θ; P'_N) − Q(θ; P_N)| is the global sensitivity of the objective function Q(θ; P_N), evaluated over all empirical distributions P'_N that differ from P_N in any one single support point.
[11] and later [29] focus on the cases where Q(θ; P_N) is stochastically equicontinuous and the quasi-posterior is asymptotically equivalent to
∝ exp( −(λ/2) (θ − θ̂)' H (θ − θ̂) + o_p(‖θ − θ̂‖²) ),
where H is the Hessian of the population objective function. This means that a single draw from this quasi-posterior, corresponding to the exponential mechanism for differential privacy, can be represented as
θ̃ = θ̂ + (1/√λ) ξ + o_p(1),
where ξ is a multivariate normal random vector with mean zero and covariance matrix H⁻¹. The extremum estimator θ̂ depends only on the data distribution P_N and is not affected by the noise. Therefore, the exponential mechanism is smooth in the sense of Definition 8.
We now present a simple lemma that outlines the additive representation of smooth differentially private estimators, which we will use in our applications.
LEMMA 3.
Consider the additive randomized estimator θ*(P_N, ν_N) = ψ(P_N) + a_N(ν_N), where
• for each κ > 0, lim_{N→∞} P(|ψ(P_N) − θ| > κ) = 0;
• a_N(0) = 0 for all N.
Then the estimator θ*(P_N, ν_N) is regular in the sense of Definition 2. Moreover, for any converging sequence (ε_N, δ_N) and any smooth (ε_N, δ_N)-differentially private estimator θ(P_N, ν_N) (i.e., as in Definition 8) there exists an asymptotically equivalent regular estimator θ*(P_N, ν_N) with the same weak limit in T_E.
We now move on to analyzing the performance of smooth differentially private estimators in some important econometric models.
Regression discontinuity design is an important empirical tool for the estimation of treatment effects in a variety of disciplines. The literature on RDD goes back to the work by [48]. A lot of important theoretical and empirical work on RDD has emerged in Economics over the last two decades, with too many papers to list here. For a general review of this literature, see [19], [22], [31], [10].
In the usual setting for the RD design, the object of interest is the causal effect of a binary treatment on the outcome, and units are either exposed or not exposed to a treatment. The effect of the treatment can be heterogeneous across units, but we will focus on the case when this effect is homogeneous, as our main goal is to highlight the loss of identification of the treatment effect in these models when the estimator is subject to differential privacy guarantees.
Following the tradition of the treatment effect literature, we let Y_0 and Y_1 denote the pair of potential outcomes (without the treatment and with exposure to the treatment, respectively), with the actual outcome denoted by Y and defined as
Y = W · Y_1 + (1 − W) · Y_0,
where W is the treatment indicator. The goal is to evaluate the average treatment effect (ATE) of the treatment. The observables are (W, Y, X), where X is a pre-treatment covariate (the so-called forcing or running variable).
We first give a review of the two main designs used in this literature. We then formulate conditions under which a differentially private mechanism applied to traditional RDD methods leads to the lack of identification of the treatment effect. We then show that these situations of non-identifiability are generic due to the global sensitivity of RDD estimators being bounded away from zero even as the sample size increases.
These findings will no doubt be of interest to researchers, as RD design is usually considered to be one of the most credible identification strategies for causal inference, and it loses this powerful feature under differential privacy.
In the sharp design there is a deterministic relation between the running variable and the treatment indicator:
W = 1(X ≥ c),
and the average causal effect β is given by the discontinuity in the conditional expectation of the outcome given the covariate:
β = lim_{X↓c} E[Y | X] − lim_{X↑c} E[Y | X].
Even though sometimes researchers rely on parametric methods by estimating, for instance, the linear model
Y = α + γ · X + β · W + δ · X · W + ε
(or analogous models with more polynomial terms), these approaches may work poorly in practice due to their reliance on a functional form.
The state-of-the-art methods are local and rely on learning β from a small neighborhood of c, the size of which becomes increasingly smaller as the sample size increases. In other words, these approaches rely on employing only observations in a small neighborhood (c − h, c + h) of c, where the size of the neighborhood is described by a bandwidth h = h(N) that depends on the sample size N, with h(N) → 0 as N → ∞. Such estimators can be roughly classified into two categories: nonparametric regression at the boundary and local linear regression.
Nonparametric regression at the boundary
This method selects a kernel $K(\cdot)$ and takes the estimator
$$\hat{\tau}_{S,NR} = \hat{\tau}_r(c) - \hat{\tau}_l(c), \qquad (3.1)$$
where
$$\hat{\tau}_r(c) = \frac{\sum_{X_i \geq c} Y_i \cdot K\left(\frac{X_i - c}{h}\right)}{\sum_{X_i \geq c} K\left(\frac{X_i - c}{h}\right)}, \qquad \hat{\tau}_l(c) = \frac{\sum_{X_i < c} Y_i \cdot K\left(\frac{X_i - c}{h}\right)}{\sum_{X_i < c} K\left(\frac{X_i - c}{h}\right)}.$$

Local linear regression. This method solves two optimization problems by fitting linear regression functions to the observations within an $h$-neighborhood on either side of the discontinuity point:
$$(\hat{\alpha}_L, \hat{\beta}_L) = \arg\min_{\alpha_L, \beta_L} \sum_{i:\, c - h < X_i < c} \left( Y_i - \alpha_L - \beta_L (X_i - c) \right)^2, \qquad (\hat{\alpha}_R, \hat{\beta}_R) = \arg\min_{\alpha_R, \beta_R} \sum_{i:\, c \leq X_i < c + h} \left( Y_i - \alpha_R - \beta_R (X_i - c) \right)^2,$$
with the treatment effect estimated as the difference of the two intercepts:
$$\hat{\tau}_{S,LL} = \hat{\alpha}_R - \hat{\alpha}_L. \qquad (3.2)$$

In the fuzzy design, the analogous local linear estimator (3.3) can be obtained by the two-stage least squares regression of $Y_i$ on the regressors $1$, $1(X_i - c < 0)(X_i - c)$, $1(X_i - c \geq 0)(X_i - c)$ and the endogenous $W_i$, while using the indicator $1(X_i \geq c)$ as the excluded instrument. This estimator can, of course, be easily generalized to include more polynomial terms in each estimation. In the definition of $\hat{\alpha}_{y,L}$, $\hat{\alpha}_{y,R}$ and $\hat{\alpha}_{w,L}$, $\hat{\alpha}_{w,R}$ we, for simplicity, used the uniform kernel; however, one could use other kernels.

In this section, we establish general results that connect the asymptotic behavior of the global sensitivity with the inconsistency of regular differentially private estimators, allowing us, in light of our notion of identification in Section 2, to draw conclusions about the non-identifiability of the parameter of interest under differentially private mechanisms.

We consider smooth estimators as given in Definition 8. The smoothness property allows us to essentially consider estimators with an additive mechanism noise $\xi_N$:
$$\hat{\theta} = \psi(P_N) + \xi_N \qquad (3.4)$$
(as discussed in the introduction, $\nu_N$ in Definitions 2 and 8, among others, plays the role of the "seed", and thus the actual additive noise $\xi_N$, independent of the data, is a transformation of $\nu_N$). For this estimator to satisfy the regularity requirements in Definition 2, it has to, among other things, be consistent in the absence of any mechanism noise, thus giving the identification of the parameter in the limit of statistical experiments.
This immediately leads us to the condition that
$$\psi(P_N) \xrightarrow{p} \theta_0, \qquad (3.5)$$
where $\theta_0$ denotes the true parameter value.

Suppose the family from which the distribution of $\xi_N$ is drawn is described by the density $f_{N;\sigma}$, where $\sigma^2$ denotes the variance (the actual value of $\sigma$ in practice depends on the sample size). The notation $f_{N;\sigma}$ is not meant to say that the distributional family for the additive noise is fully described by the variance parameter. This parameter is introduced explicitly in the notation because the situations when differentially private mechanisms do not prevent the identification of $\theta_0$ in the limit are usually characterized by the behavior of this parameter as the sample size increases. Even though for now we consider just one distributional family, one has to keep in mind that potentially several different distributions could be used, in which case one has to consider all of them. To keep the exposition simple, we focus on one family $f_{N;\sigma}$, as this already gives a rather comprehensive analysis.

For any $\Delta > 0$, $\sigma > 0$ and $a < b$, define
$$D_N(\Delta, \sigma, a, b) \equiv \sup_{z \in [a,b]} \left| \log f_{N;\sigma}(z) - \log f_{N;\sigma}(z + \Delta) \right|, \qquad (3.6)$$
which is the discrepancy, on the interval $[a, b]$, between the logarithms of two densities with a fixed variance, one of which is evaluated with a shift $\Delta$. As can be seen from Definition 1 of differential privacy, the behaviour of this object across different $\Delta$ and $\sigma$ is directly related to the parameters of differential privacy. The properties of a density imply that if $[a, b]$ is large enough, then for a fixed sample size $N$ we have
$$D_N(\Delta, \sigma, a, b) \to +\infty \text{ as } \sigma \to 0, \; \Delta \neq 0. \qquad (3.7)$$
In the majority of applications the family of distributions $f_{N;\sigma}$ is not indexed by $N$, and the change in the distribution of the estimators with the sample size may be driven by different variances. In fact, this is the case for all commonly used mechanisms – the Laplace, Gaussian and exponential mechanisms and their variations.
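The discrepancy (3.6) is easy to compute numerically. The sketch below does so for mean-zero Gaussian mechanism noise, approximating the supremum by a grid maximum; the values of $\Delta$, $\sigma$ and $[a, b]$ are illustrative. It exhibits the blow-up in (3.7) as $\sigma \to 0$ for a fixed shift.

```python
import numpy as np

def D(delta, sigma, a, b, n_grid=100_001):
    """Discrepancy (3.6) for mean-zero Gaussian noise, approximating the
    supremum over [a, b] by a maximum over a fine grid."""
    z = np.linspace(a, b, n_grid)
    logf = lambda v: -v**2 / (2 * sigma**2) - np.log(sigma * np.sqrt(2 * np.pi))
    return np.max(np.abs(logf(z) - logf(z + delta)))

# For a fixed shift Delta != 0 the discrepancy blows up as sigma -> 0,
# in line with (3.7): a shrinking noise variance is incompatible with a
# non-vanishing sensitivity.
vals = [D(delta=0.5, sigma=s, a=-1.0, b=1.0) for s in (1.0, 0.1, 0.01)]
```

For the Gaussian density the discrepancy has the closed form $\sup_{z \in [a,b]} |2z\Delta + \Delta^2| / (2\sigma^2)$, which the grid maximum reproduces.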
To make our discussion more general, we allow for different distributional families across different $N$, in which case we require the following condition.

Algorithm condition 1 (AC1). For any $\Delta$, $\sigma$ and $a < b$,
$$\sup_N D_N(\Delta, \sigma, a, b) \leq D(\Delta, \sigma, a, b) \qquad (3.8)$$
and
$$D(\Delta, \sigma, a, b) \to +\infty \text{ as } \sigma \to 0, \; \Delta \neq 0. \qquad (3.9)$$

As is clear from the discussion later, in the differential privacy implementation $\Delta$ will be associated with the global sensitivity of $\psi(P_N)$. Condition AC1 tells us that if the global sensitivity remains bounded away from 0 as the sample size increases, then a diminishing variance of the noise and $(\epsilon_N, \delta_N)$, $\epsilon_N \leq \bar{\epsilon}$, $\delta_N \leq \bar{\delta}$, differential privacy guarantees are incompatible with each other. This will allow us to immediately draw conclusions about the non-identifiability of the parameter of interest in the limit of statistical experiments.

We give another algorithmic condition on the family of distributions of the noise variable, which holds generally for differentially private mechanisms and which also helps to establish further results related to the identifiability (or lack thereof) of the parameter. For the very popular mean-zero Laplace and Gaussian mechanisms, the distribution of $\xi_N$ is fully characterized by the variance parameter.

Algorithm condition 2 (AC2). For any $\Delta_N \to 0$ as $N \to \infty$, it is possible to indicate $\sigma_N \to 0$ and $[a_N, b_N]$ such that $H([a_N, b_N], \mathbb{R}) \to 0$ and
$$D(\Delta_N, \sigma_N, a_N, b_N) \to 0, \qquad (3.10)$$
where $D(\Delta, \sigma, a, b)$ is as defined in (3.8).

To give an example, consider the Laplace mechanism (discussed in Example 1) and note that $\log f_\lambda(z + \Delta) = -\frac{|z + \Delta|}{\lambda} - \log(2\lambda)$, and thus $D(\Delta, \sigma, a, b) \leq \frac{|\Delta|}{\lambda}$, which in particular implies that we can even take $a = -\infty$ and $b = +\infty$ in AC2.
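The Laplace bound $D(\Delta, \sigma, a, b) \leq |\Delta|/\lambda$ can be checked numerically; below is a sketch with illustrative values of $\lambda$ and $\Delta$, again using a grid maximum in place of the supremum.

```python
import numpy as np

def log_laplace(z, lam):
    # log density of a mean-zero Laplace(lam) variable
    return -np.abs(z) / lam - np.log(2 * lam)

# Numerical check of D(Delta, sigma, a, b) <= |Delta| / lambda for the
# Laplace mechanism; by the triangle inequality the bound is attained
# for z far to one side of the origin.
lam, delta = 0.7, 0.3
z = np.linspace(-50, 50, 200_001)
D_lap = np.max(np.abs(log_laplace(z, lam) - log_laplace(z + delta, lam)))
```

Since $||z| - |z + \Delta|| \leq |\Delta|$ everywhere, with equality for $z \leq -\Delta$, the computed maximum equals $\Delta/\lambda$ exactly.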
We can see that this case trivially satisfies AC2 and, of course, condition AC1 if one chooses the Laplace mechanism for every $N$.

It is exactly condition AC2 that gives hope for the identifiability of the parameter of interest in cases when the global sensitivity of $\psi(P_N)$ goes to 0. Indeed, the convergence $\Delta_N \to 0$ can then be accompanied by $\sigma_N \to 0$ and intervals $[a_N, b_N]$ converging to the whole real line at the right rates (which clearly depend on the rate at which the global sensitivity decreases) in such a way that $D(\Delta_N, \sigma_N, a_N, b_N)$ remains bounded by a small $\varepsilon_N \to 0$ while the probability mass outside $[a_N, b_N]$ remains below $\delta_N$. This ensures that the differential privacy criteria are satisfied for some sequences $(\epsilon_N, \delta_N)$ converging to $(0, 0)$ while delivering a consistent differentially private estimator. This is given in Proposition 1 below.

Proposition 1. Consider a smooth $(\epsilon_N, \delta_N)$-differentially private estimator, without loss of generality represented as (3.4), and suppose that in the absence of the mechanism noise this estimator (denoted as $\psi(P_N)$) is consistent, i.e. (3.5) holds. Suppose that the global sensitivity of $\psi(P_N)$, denoted as $G(N)$, converges to 0 with the sample size and the mean of $\xi_N$ converges to 0. If AC2 holds, then the differentially private estimator is consistent if $\epsilon_N$ and $\delta_N$ both converge to 0 at slow enough rates.

Note that Proposition 1 implies that an $(\epsilon, \delta)$-differentially private estimator is consistent for fixed $(\epsilon, \delta)$ as well, since these requirements are weaker than requiring that $(\epsilon_N, \delta_N)$ converge to 0.
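A simulation sketch of the logic behind Proposition 1 for the simple case of a mean with known bounded support, where the global sensitivity is $(b - a)/N$; the rate $\epsilon_N = N^{-1/4}$ is an illustrative slow rate, not one prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_mean(x, a, b, eps):
    """Laplace-mechanism mean of data with known support [a, b]: the
    global sensitivity of the sample mean is (b - a) / len(x), so the
    noise scale is that sensitivity divided by eps."""
    sens = (b - a) / len(x)
    return np.clip(x, a, b).mean() + rng.laplace(scale=sens / eps)

# With sensitivity O(1/N), even eps_N -> 0 slowly (here eps_N = N**-0.25)
# leaves the mechanism noise o_p(1), so the DP estimator stays consistent.
errs = []
for N in (10**3, 10**4, 10**5):
    x = rng.uniform(0, 1, N)        # true mean 0.5
    eps_N = N ** -0.25
    errs.append(abs(dp_mean(x, 0, 1, eps_N) - 0.5))
```

The noise scale here is $N^{-3/4}$, which vanishes faster than the sampling error, so the error at the largest sample size is small.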
The result of Proposition 1 gives hope that the parameter of interest can be identified in the limit of experiments with a suitable choice of the set $E$ of sequences $(\epsilon_N, \delta_N)$. Our next step is to establish that a generic $(\epsilon_N, \delta_N)$-differentially private algorithm gives an inconsistent estimator if the global sensitivity remains bounded away from zero as the sample size increases, even if the mean of the mechanism noise $\xi_N$ converges to 0.

THEOREM 4. Consider a smooth $(\epsilon_N, \delta_N)$-differentially private estimator $\hat{\theta}$, without loss of generality represented as (3.4), and suppose that in the absence of the mechanism noise this estimator (denoted as $\psi(P_N)$) is consistent, i.e. (3.5) holds. Suppose that AC1 holds and the global sensitivity of $\psi(P_N)$, denoted as $G(N)$, does not converge to 0 with the sample size, whereas the mean of $\xi_N$ converges to 0. Then this $(\epsilon_N, \delta_N)$-differentially private estimator is inconsistent even if $\epsilon_N$ does not change with $N$.

We end this section by giving a sufficient condition for when the parameter of interest is not identified from differentially private estimation in the limit of statistical experiments.

COROLLARY 1. Consider a class of smooth $(\epsilon_N, \delta_N)$-differentially private estimators $\hat{\theta}$, without loss of generality represented as (3.4), and suppose that in the absence of the mechanism noise these estimators (corresponding to $\psi(P_N)$) are consistent, i.e. (3.5) holds. Suppose that AC1 holds and the global sensitivity of $\psi(P_N)$, denoted as $G(N)$, does not converge to 0 with the sample size, whereas the mean of $\xi_N$ converges to 0. Then for any join-semilattice $E$ of sequences of $(\epsilon_N, \delta_N)$ with $\epsilon_N \leq \bar{\epsilon}$, $\delta_N \leq \bar{\delta}$, the parameter $\theta_0$ is not identified in the limit of experiments.

This corollary directly follows from Theorem 4.
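The mechanics behind Theorem 4 can be previewed in a toy simulation: with the global sensitivity fixed at $G = 1$ (an illustrative value) and a fixed $\epsilon$, the Laplace noise scale $G/\epsilon$ cannot shrink, so the released estimator's dispersion stays bounded away from zero no matter how large $N$ is, even though $\psi(P_N)$ itself is consistent.

```python
import numpy as np

rng = np.random.default_rng(5)

# Non-vanishing sensitivity forces a non-vanishing noise scale G / eps,
# so the released estimator does not concentrate on theta.
eps, G, theta = 1.0, 1.0, 0.5
spreads = []
for N in (10**2, 10**4, 10**6):
    psi = theta + rng.normal(0, 1 / np.sqrt(N), 20_000)   # consistent core
    released = psi + rng.laplace(scale=G / eps, size=20_000)
    spreads.append(released.std())
```

Across the three sample sizes the standard deviation of the released estimator stays near $\sqrt{2}\,G/\epsilon$, the standard deviation of the Laplace noise, instead of shrinking.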
In this section, we use the results of Section 3.2 to analyze the identifiability of the average treatment effect under differentially private regression discontinuity design estimators. Even though this section focuses on the issue of identifiability, there are other important issues one would want to explore in the RDD framework. One of them is the question of how differential privacy requirements would affect the visual analysis of the data, which is one of the fundamental steps in the practice of RDD. Another is the question of the credibility of specification tests (continuity of the density of the running variable at the cutoff, placebo tests with pre-treatment covariates) under differential privacy. Even though our main focus is on the identifiability of the average treatment effect under differential privacy, we discuss these other related issues in Section 3.4, albeit in less detail.

Even though there are some parametric RDD estimation methods that are global in nature, the state-of-the-art RDD techniques are local and employ elements of nonparametric methods. These methods focus on a neighborhood around the switch point, with the size of this neighborhood determined by a kernel $K(\cdot)$ and a respective bandwidth $h = h(N)$. We suppose that $h(N)$ is chosen by the differentially private algorithm according to some rule in such a way that $h(N) = o(1)$ as $N \to \infty$. Then the expected number of observations from a sample of size $N$ in a right-hand side neighborhood of $c$ is $N \cdot Pr(c \leq X < c + h(N))$, and in the left-hand side neighborhood it is $N \cdot Pr(c - h(N) < X < c)$. There are, of course, well known approaches for selecting a bandwidth, such as [21] and [9], among others. Our analysis applies to general bandwidth choices.

We start with the analysis of differentially private RDD estimators for the sharp design.
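To fix ideas before the formal results, here is a sketch of the boundary estimator (3.1) on simulated sharp-design data. The data-generating process, cutoff, effect size and bandwidth are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def tau_boundary(X, Y, c, h, K):
    """Nonparametric regression at the boundary, eq. (3.1): the
    kernel-weighted mean of Y on each side of c, then the difference."""
    w = K((X - c) / h)
    right, left = X >= c, X < c
    tau_r = np.sum(w[right] * Y[right]) / np.sum(w[right])
    tau_l = np.sum(w[left] * Y[left]) / np.sum(w[left])
    return tau_r - tau_l

# Triangular kernel: continuous, with bounded support [-1, 1].
triangular = lambda u: np.clip(1 - np.abs(u), 0, None)

rng = np.random.default_rng(1)
N, c, beta = 100_000, 0.0, 0.5       # illustrative sharp-design DGP
X = rng.uniform(-1, 1, N)
Y = 0.3 * X + beta * (X >= c) + rng.normal(0, 0.1, N)
est = tau_boundary(X, Y, c, h=0.05, K=triangular)
```

With a shrinking bandwidth the estimator recovers the jump $\beta$ up to a small smoothing bias of order $h$.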
We begin our series of formal results by establishing the results on the global sensitivity of nonparametric regression at the boundary and local linear (polynomial) estimation. Propositions 2-4 look at nonparametric regression at the boundary and the various properties of kernels that affect the global sensitivity result. Proposition 5 looks at local linear estimation. In light of the results in Section 3.2, the knowledge of the asymptotic behavior of the global sensitivity of these estimators allows us to analyze whether smooth differentially private approaches are compatible with the identifiability of the ATE of interest. As we show, the exact results on the global sensitivity depend even on the type of kernel used in the above-mentioned estimation techniques.

Before formulating Proposition 2, we define what we mean by kernels with a bounded support.

DEFINITION 9. We say that the kernel function $K(\cdot): \mathbb{R} \to \mathbb{R}_+$ has a bounded support if there is a value $\bar{u} > 0$ such that $K(u) = 0$ when $|u| > \bar{u}$. If this condition is not satisfied, then we say that the kernel has an unbounded support.

The uniform, Epanechnikov and triangular kernels are examples of kernel functions with bounded supports. The Gaussian and logistic kernels are examples of kernel functions with unbounded supports. Even if one considers kernels with a bounded support, we will see that it makes a difference whether the kernel is continuous (like the triangular kernel) or has discontinuities (like the uniform kernel).

DEFINITION 10. For a given kernel function $K(\cdot)$ with a bounded support and a given bandwidth $h$, we define a $K$-$h$-neighborhood to the right of $c$ as a set $[c, c + \Delta_{K,r}(h))$, where $\Delta_{K,r}(h) > 0$, such that $K\left(\frac{u - c}{h}\right) > 0$ for $u > c$ if and only if $u \in [c, c + \Delta_{K,r}(h))$.
In other words, this is the set of points to the right of $c$ that will be used in the nonparametric regression. Analogously, we define a $K$-$h$-neighborhood to the left of $c$ as a set $(c - \Delta_{K,l}(h), c)$, where $\Delta_{K,l}(h) > 0$, such that $K\left(\frac{u - c}{h}\right) > 0$ for $u < c$ if and only if $u \in (c - \Delta_{K,l}(h), c)$.

A differentially private algorithm takes the support of the observed variables as given and usually depends on this support; thus, it uses the supports of $Y$ in both the right-hand side and left-hand side $K$-$h$-neighborhoods as inputs. For a kernel with a bounded support, these supports can be denoted as $\mathcal{Y}_r(h)$ and $\mathcal{Y}_l(h)$, respectively, and may generally depend on $h$. However, they are naturally approximated by
$$\mathcal{Y}_r = \lim_{h \downarrow 0} \mathcal{Y}_r(h) \quad \text{and} \quad \mathcal{Y}_l = \lim_{h \downarrow 0} \mathcal{Y}_l(h), \qquad (3.11)$$
which no longer depend on the bandwidth choice (these limits are well defined as $\{\mathcal{Y}_r(h)\}$ and $\{\mathcal{Y}_l(h)\}$ are sequences of monotonically decreasing events).

We will suppose that $\mathcal{Y}_r$ and $\mathcal{Y}_l$ are convex non-singleton sets. As further notation, we will use
$$\bar{Y}_r = \sup \mathcal{Y}_r, \quad \underline{Y}_r = \inf \mathcal{Y}_r, \quad \bar{Y}_l = \sup \mathcal{Y}_l, \quad \underline{Y}_l = \inf \mathcal{Y}_l.$$
We now present a series of results on the global sensitivity.

Proposition 2.
Consider a nonparametric regression at the boundary estimator that uses a continuous kernel with a bounded support. Suppose that for a data-driven choice of bandwidth $h = h(N)$, for any sample size $N$ it is possible to have realizations of the data $\{(Y_i, W_i, X_i)\}_{i=1}^N$ that deliver the minimum number of observations $m_r(N) \geq 1$ in the $K$-$h$-neighborhood to the right of $c$ and the minimum number of observations $m_l(N) \geq 1$ in the $K$-$h$-neighborhood to the left of $c$.

(a) If the supports $\mathcal{Y}_r$ and $\mathcal{Y}_l$ are bounded, then the global sensitivity of the nonparametric regression at the boundary estimator is $(\bar{Y}_r - \underline{Y}_r) + (\bar{Y}_l - \underline{Y}_l)$ and, hence, it does not depend on the sample size.

(b) If at least one of the supports $\mathcal{Y}_r$ and $\mathcal{Y}_l$ is unbounded, the global sensitivity of the nonparametric regression at the boundary estimator is $+\infty$.

We next consider the case of a kernel function with a bounded support and discontinuities at the support boundaries $-\bar{u}$ and $\bar{u}$, with the main example here being the uniform kernel. We define
$$\underline{K} \equiv \inf_{u \in (-\bar{u}, \bar{u})} K(u). \qquad (3.12)$$
The discontinuities of $K(\cdot)$ at $-\bar{u}$ and $\bar{u}$ imply that $\underline{K} > 0$. For expositional simplicity, in the formulation of Proposition 3 we only indicate the rate of the global sensitivity. However, the proof of this proposition in the Appendix gives the exact expression for this sensitivity.

Proposition 3. Consider a nonparametric regression at the boundary estimator that uses a kernel with a bounded support and $\underline{K} > 0$, where $\underline{K}$ is as defined in (3.12). Suppose that for a data-driven choice of bandwidth $h$, for any sample size $N$ it is somehow possible to guarantee the minimum number of observations $m_r(N) \geq 1$ in the $K$-$h$-neighborhood to the right of $c$ and the minimum number of observations $m_l(N) \geq 1$ in the $K$-$h$-neighborhood to the left of $c$.

(a) If the supports $\mathcal{Y}_r$ and $\mathcal{Y}_l$ are bounded, then the global sensitivity of the nonparametric regression at the boundary estimator is proportional to $1/\min\{m_r(N), m_l(N)\}$.
(b) If at least one of the supports $\mathcal{Y}_r$ or $\mathcal{Y}_l$ is unbounded, the global sensitivity of the nonparametric regression at the boundary estimator is $+\infty$.

Part (a) of Proposition 3 seemingly gives some hope of achieving a situation in which the global sensitivity goes to zero as $N \to \infty$, provided it can be ensured that $\min\{m_r(N), m_l(N)\} \to \infty$. This hope, however, is a false one, as being able to guarantee a minimal number (growing to $\infty$) of observations in each neighborhood for any sample $\{X_i\}_{i=1}^N$ with a given support of $X$ is a rather hypothetical scenario. Indeed, the probabilities that the number of observations in the right- and left-hand side neighborhoods is strictly less than $m_r(N)$ and $m_l(N)$, respectively, are
$$\sum_{k=0}^{m_r(N)} \binom{N}{k} F_X(c)^{N-k} \left(1 - F_X(c)\right)^{k} \quad \text{(the probability of fewer than } m_r(N) \text{ observations to the right of } c\text{)},$$
$$\sum_{k=0}^{m_l(N)} \binom{N}{k} F_X(c)^{k} \left(1 - F_X(c)\right)^{N-k} \quad \text{(the probability of fewer than } m_l(N) \text{ observations to the left of } c\text{)}.$$
Both these probabilities are clearly strictly positive when $c$ is an interior point of the support of $X$. The non-stochastic nature of the global sensitivity concept effectively leads to the global sensitivity always being bounded away from 0 as $N \to \infty$ in the case of a kernel with a bounded support and $\underline{K} > 0$.

Finally, we consider kernels with unbounded support (such as the Gaussian kernel). In this case we take it that a differentially private algorithm uses the supports of $Y \mid X \geq c$ and $Y \mid X < c$, since the kernel weights are technically never equal to zero. Let us denote these supports as $\mathcal{Y}_{r,all}$ and $\mathcal{Y}_{l,all}$, respectively. Also denote
$$\bar{Y}_{r,all} = \sup \mathcal{Y}_{r,all}, \quad \underline{Y}_{r,all} = \inf \mathcal{Y}_{r,all}, \quad \bar{Y}_{l,all} = \sup \mathcal{Y}_{l,all}, \quad \underline{Y}_{l,all} = \inf \mathcal{Y}_{l,all}.$$
Since the kernel approaches zero arbitrarily closely, the global sensitivity results for this case are similar to those in Proposition 2, where the infimum of the values of the kernel on its bounded support is 0.

Proposition 4.
Consider a nonparametric regression at the boundary estimator that uses a kernel function with an unbounded support.

(a) If the supports $\mathcal{Y}_{r,all}$ and $\mathcal{Y}_{l,all}$ are bounded, then the global sensitivity of the nonparametric regression at the boundary estimator is $(\bar{Y}_{r,all} - \underline{Y}_{r,all}) + (\bar{Y}_{l,all} - \underline{Y}_{l,all})$; that is, it does not depend on the sample size.

(b) If at least one of the supports $\mathcal{Y}_{r,all}$ and $\mathcal{Y}_{l,all}$ is unbounded, then the global sensitivity of the nonparametric regression at the boundary estimator is $+\infty$.

(Differentially private algorithms could potentially use a more complicated support for $Y \mid X$ that depends on $X$. This would not change our qualitative findings on the global sensitivity being bounded away from 0, even though the exact numerical values for the global sensitivities may be different.)

Thus, the results of Propositions 2-4 and the discussion following Proposition 3 lead us to conclude that the global sensitivity of a nonparametric regression at the boundary estimator is always bounded away from zero. The implications of this for the asymptotic properties of differentially private estimators are given in Theorem 4, and Corollary 1 allows us to conclude that the ATE is not identified in the limit of statistical experiments of smooth differentially private nonparametric regression at the boundary estimators.

Our next step is to analyze whether things get better with a local linear (and, more generally, polynomial) estimator as defined in (3.2). Proposition 5 below establishes that this is not the case: the global sensitivity of this estimator is bounded away from zero, and may even be infinite, even if the support of the outcome variable is bounded.

Proposition 5. Consider the local linear estimator as defined in (3.2). The global sensitivity of this estimator is bounded away from zero as $N \to \infty$.

Theorem 4 and Corollary 1 allow us to conclude that the ATE is not identified in the limit of statistical experiments of smooth differentially private local linear estimators. Our findings for the sharp design are summarized in Theorem 5.
THEOREM 5. In the sharp regression discontinuity design case, any smooth $(\varepsilon_N, \delta_N)$-differentially private nonparametric regression at the boundary estimator and any smooth $(\varepsilon_N, \delta_N)$-differentially private local linear estimator is inconsistent for any bounded sequences of positive $\{\varepsilon_N\}$ and non-negative $\{\delta_N\}$.

If we add other covariates to our estimation or use more terms in the local polynomial estimation, the conclusion of Theorem 5 remains exactly the same, even though quantitatively the global sensitivities of the estimators in Propositions 2-5 could be different. Indeed, from the proofs in the Appendix one can easily see that the global sensitivity for the local polynomial estimator would once again rely on the formula for the OLS estimator of the intercept, whereas for the nonparametric regression at the boundary estimator the lower bounds on the global sensitivities could be obtained in the same way as in the proofs of Propositions 2-4.

In the case of the fuzzy design, the results for the global sensitivities of the estimators are analogous to the sharp design case. Naturally, we are also able to conclude that differentially private versions of traditional estimators in this framework are inconsistent. Indeed, as discussed in Section 3.1.2, the estimator (3.1) may be used in the fuzzy design case as well, with asymptotic properties analogous to the sharp design scenario. The global sensitivity of this estimator remains the same and, therefore, its properties are described by Propositions 2-4. As for the local linear estimator, the result is the same, once again, even though the proof is slightly more elaborate than in the sharp design case. For the sake of completeness, we establish this result formally in Proposition 6 below.

Proposition 6. Consider the local linear estimator as defined in (3.3). The global sensitivity of this estimator is bounded away from 0 as $N \to \infty$.

For simplicity, the proof of Proposition 6 in the Appendix does not employ other covariates.
It is worth mentioning, however, that the situation with other covariates (call them $S_i$, with support $\mathcal{S}$) may be even worse, as in that case we may have
$$\inf_{s \in \mathcal{S}} \left| \lim_{x \downarrow c} P(W_i = 1 \mid X_i = x, S_i = s) - \lim_{x \uparrow c} P(W_i = 1 \mid X_i = x, S_i = s) \right| = 0.$$
If this situation occurs, then even for the nonparametric regression at the boundary type of estimators the global sensitivities may converge to $\infty$ as $N \to \infty$, with even more severe implications for the asymptotic properties of smooth differentially private estimators.

In the definition of the local linear estimator in Section 3.1.2 we used the uniform kernel for simplicity. However, the result of Proposition 6 remains true if other kernels are used. Relying on the results in Propositions 2-5 and 6 and Theorem 4, we immediately obtain Theorem 6 below.

THEOREM 6. In the fuzzy regression discontinuity design, any smooth $(\varepsilon_N, \delta_N)$-differentially private nonparametric regression at the boundary estimator and any smooth $(\varepsilon_N, \delta_N)$-differentially private local linear estimator is inconsistent for any bounded sequences of positive $\{\varepsilon_N\}$ and non-negative $\{\delta_N\}$.

To summarize the results of Theorems 5 and 6 in Sections 3.3.1 and 3.3.2: in the sharp and fuzzy regression discontinuity designs, the requirements of $(\varepsilon, \delta)$-differential privacy, either with fixed $\varepsilon, \delta$ or with these parameters decreasing with the sample size, are incompatible with the consistent estimation of the average treatment effect. Therefore, given our notion of identifiability in Section 2, they are also incompatible with the identifiability of the average treatment effect in the limit of statistical experiments.

So far we have mostly focused on the non-identifiability of the average treatment effect under smooth differentially private mechanisms. However, every regression discontinuity design analysis is traditionally accompanied by specification testing.
This includes checking for the possibility of other changes at the cutoff value $c$ of the forcing variable $X_i$, and also checking for manipulation of $X_i$.

The first type of check includes testing the null hypothesis of a zero average effect on pseudo-outcomes known not to be affected by the treatment. The outcomes of such placebo tests would also have to be $(\varepsilon_N, \delta_N)$-differentially private. A traditional approach in the differential privacy literature would add noise to the true test statistic and then adjust the asymptotic distribution to compute correct $p$-values (see e.g. [49]). It is not surprising that, once again, this brings a range of issues in the context of placebo tests in regression discontinuity designs. Indeed, let $\hat{\tau}_{pl}$ denote the regression discontinuity design estimator in the analysis of the treatment of pseudo-outcomes, and let $\tau_{pl}$ denote the true parameter. The testing of the null $H_0: \tau_{pl} = 0$ is based on the $t$-ratio
$$\frac{\hat{\tau}_{pl}}{se(\hat{\tau}_{pl})}.$$
Without giving formal results on this, we nevertheless want to point out that, utilizing the techniques in the proofs of Propositions 2-5 and 6, we can establish that the global sensitivity of this ratio either increases to $\infty$ with the sample size or is already $\infty$ in a finite sample. This implies that the noise added to this ratio would asymptotically dominate it. Even if the critical values are corrected to account for the added noise, it is clear that the conclusions of the $(\varepsilon_N, \delta_N)$-differentially private test based on such a procedure are not credible and, in particular, result in a lower power of the test (in the limit, the power of this test is trivial). To make our point more transparent, let us focus on a stylized version of the test in which the asymptotic variance of $\sqrt{Nh}(\hat{\tau}_{pl} - \tau_{pl})$ is known. We denote it as $Avar(\hat{\tau}_{pl})$.
In this stylized version we want to create a differentially private version of the test statistic
$$t_N = \frac{\sqrt{Nh}\,\hat{\tau}_{pl}}{\sqrt{Avar(\hat{\tau}_{pl})}}.$$
As we have shown in Sections 3.3.1 and 3.3.2, the global sensitivity of $\hat{\tau}_{pl}$ may be constant and bounded away from zero, or may even be infinite for every $N$ (in situations when we add other covariates, it may increase to $\infty$ with the sample size). Given that $h = h(N)$ is chosen in a way that gives $Nh \to \infty$, this immediately implies that the global sensitivity of $t_N$ either increases to infinity with the sample size or is infinite. This means that the variance of the independent noise added by the differentially private algorithm will increasingly dominate the asymptotically constant variance of $t_N$. Instead of using the standard normal critical values, one would take the critical values from the distribution that suitably combines the standard normal distribution and the distribution of the noise. However, as $N$ increases, the testing essentially becomes inference about the mechanism noise, leading to a decreasing power of the test, which asymptotically diminishes to the trivial power.

(There are also approaches to hypothesis testing in the differential privacy literature that are based on adding noise to the inputs; see e.g. [23]. They may, however, be considered less reliable than approaches based on output perturbation. In contrast, in the simpler case of the estimation of a mean, as in Example 1, the global sensitivity of the $t$-ratio would be finite and bounded away from zero as $N$ increases, so that, upon the correction of the critical values, the test would remain informative.)

The second type of test is for manipulation of the forcing variable. We illustrate the issues associated with $(\varepsilon, \delta)$-differentially private versions of these tests by considering the test of the continuity of the density at the cutoff by [36].
The test is based on the ratio $\hat{\theta} / \hat{\sigma}_\theta$, where
$$\hat{\theta} = \ln \hat{f}^+ - \ln \hat{f}^-, \qquad \hat{\sigma}_\theta = \sqrt{\frac{1}{Nh}\left(\frac{1}{\hat{f}^+} + \frac{1}{\hat{f}^-}\right)},$$
$h$ is the bandwidth, and $\hat{f}^+$ is the local linear density estimator to the right of the cutoff,
$$\hat{f}^+ = \sum_{X_i > c} K\left(\frac{X_i - c}{h}\right) \frac{S^+_{N,2} - S^+_{N,1}(X_i - c)}{S^+_{N,2} S^+_{N,0} - \left(S^+_{N,1}\right)^2}\, Y_i, \qquad S^+_{N,k} = \sum_{X_i > c} K\left(\frac{X_i - c}{h}\right)(X_i - c)^k, \qquad (3.13)$$
with $\hat{f}^-$ defined analogously over the observations with $X_i < c$.

We illustrate our findings in two simulation scenarios.

Scenario 1. The forcing variable $X$ has a uniform distribution on $[-1, 1]$, the treatment indicator is $W_i = 1(X_i \geq 0)$, and the outcome is $Y_i = m(X_i) + u_i$ with
$$m(x) = \begin{cases} 0.35 + 1.27\,x + 7.18\,x^2 + 20.21\,x^3 + 21.54\,x^4 + 7.33\,x^5, & \text{if } x < 0, \\ 0.65 + 0.84\,x - 3.00\,x^2 + 7.99\,x^3 - 9.01\,x^4 + 3.56\,x^5, & \text{if } x \geq 0, \end{cases}$$
and the error $u$ having a symmetric uniform distribution (so that its support is $[-\sigma_u \sqrt{3}, \sigma_u \sqrt{3}]$ for standard deviation $\sigma_u$).

Panels 1 and 2 in Figure 1 depict the paths of the differentially private local linear estimator when the mechanism noise variance is held fixed at small values (0.002 in Panel 2) for any sample size; in each case, a conservative lower bound on the global sensitivity determines the corresponding value of $\epsilon_N$, with $\delta_N = 0$. Panel 3 in Figure 1 shows the paths of the estimator when the mechanism noise variance is equal to 2 for any sample size. (See our discussion in the proof of Proposition 5 in the Appendix.) Finally, Panel 4 in Figure 1 illustrates the paths of the estimator when the mechanism noise variance is equal to 200 for any sample size (for a conservative lower bound on the global

N = 500:  1  1  0.6846  0.3706  0.0666  0.0286  0.0560  0.0230
N = 2000: 1  1  0.7252  0.3880  0.0664  0.0290  0.0666  0.0272
N = 5000: 1  1  0.7260  0.3844  0.0694  0.0284  0.0594  0.0250

Table 1: Rejection rates in 5000 simulations of the false null hypothesis $H_0: \tau = 0$ in Scenario 1. $N$ denotes the number of observations.
sensitivity of the estimator, this would correspond to $\epsilon_N$ being equal to one-tenth of this value, with $\delta_N = 0$). Note the different ranges of the values on the vertical axes of these panels.

In Table 1 we focus on the rejection of the null $H_0: \tau = 0$ against $H_1: \tau \neq 0$ when a researcher uses differentially private estimates and their standard errors (note that this is different from our discussion of the differentially private release of $t$-tests in Section 3.4).

Scenario 2. The only difference from Scenario 1 is that $u$ is normally distributed with mean zero, so that the support of the outcome is unbounded. In this case, to guarantee $\epsilon_N$-differential privacy with $\epsilon_N \leq \bar{\epsilon}$, the noise in the Laplace differentially private mechanism has to be drawn from a distribution with an infinite variance. Figure 2 shows the paths of differentially private local linear estimators when the variance is equal to 10. Here we could have conducted a similar power analysis based on a large number of simulations, as in Scenario 1, and we would have found that the power of the test of $H_0: \tau = 0$ vs $H_1: \tau \neq 0$ based on differentially private estimates is very low.

A central problem in evaluation studies is that the potential outcome that a program participant would have received in the absence of the program is not observed. Letting $D_i$ denote a binary variable taking the value 1 if treatment was given to agent $i$, and 0 otherwise, and letting $Y_i(0)$, $Y_i(1)$ denote the potential outcome variables, we refer to $Y_i(1) - Y_i(0)$ as the treatment effect for the $i$-th individual. A parameter of interest for identification and estimation is the average treatment effect, defined as
$$\theta_0 = E[Y_i(1) - Y_i(0)]. \qquad (4.1)$$
As in the previous section, our notation is to denote realizations of random variables by lower case letters and the random variables themselves by capital letters. One identification strategy for $\theta_0$ was proposed in [44], under the following assumption:

ASSUMPTION 2 (ATE under Conditional Independence).
Let the following hold:
(i) There exists an observed variable $X_i$ such that $D_i \perp (Y_i(0), Y_i(1)) \mid X_i$;
(ii) $0 < P(D_i = 1 \mid X_i) < 1$ for all $X_i$.

See also [18], [20], [4]. The above assumption can be used to identify $\theta_0$ as
$$\theta_0 = E\left[ E[Y \mid D = 1, X] - E[Y \mid D = 0, X] \right]. \qquad (4.2)$$
This parameter can also be written as
$$\theta_0 = E\left[ \frac{Y (D - p(X))}{p(X)(1 - p(X))} \right], \qquad (4.3)$$
where $p(X) = P(D = 1 \mid X)$ is the propensity score. This parameter is a weighted moment condition whose denominator gets small when the propensity score approaches 0 or 1. Also, identification is lost when we remove any region of the support of $X$ (so fixed trimming will not identify $\theta_0$ above).

Consider the general setting of the treatment effect model under unconfoundedness with two potential continuous outcomes $Y(0)$ and $Y(1)$ and treatment $D$, along with the vector of (continuous and discrete) covariates $X$. We assume that $(Y(0), Y(1)) \perp D \mid X$. The observed outcome is
$$Y = Y(1) D + Y(0)(1 - D).$$
In our setup the propensity score needs to be estimated as a function of $X$. In the further discussion, without loss of generality, we assume that $X$ is single-dimensional. Our theory is based on the following structure:

ASSUMPTION 3. (i) $X$ has a support $\mathcal{X}$ that is a closed and connected (but possibly unbounded) set.
(ii) $(Y(0), Y(1)) \mid X = x$ has an absolutely continuous density for each $x \in \mathcal{X}$. Moreover, the support of $Y(k)$ for $k = 0, 1$ is bounded.
(iii) The propensity score is strictly positive, $p(\cdot) > 0$, on its support.

We consider the following procedure to implement an estimator for $\theta_0$. First, a nonparametric estimator is used to estimate the propensity score:
$$\hat{P}(x) = \frac{\frac{1}{N}\sum_{i=1}^N D_i K\left(\frac{x - X_i}{h_N}\right)}{\frac{1}{N}\sum_{i=1}^N K\left(\frac{x - X_i}{h_N}\right)}, \qquad (4.4)$$
where $K(\cdot)$ is a symmetric kernel and $h_N$ is the bandwidth. Then the average treatment effect $\theta_0$ is estimated as
$$\hat{\theta} = \psi(P_N) \equiv \frac{1}{N}\sum_{i=1}^N \left( \frac{Y_i D_i}{\hat{P}(X_i)} - \frac{Y_i (1 - D_i)}{1 - \hat{P}(X_i)} \right).$$
(4.5)

In our analysis we focus on the kernel-based estimator for the propensity score without loss of generality. One can use a different approach, such as a series estimator, where the number of terms used to approximate the function would play the role of the tuning parameter equivalent to the bandwidth.

We consider kernel functions $K(\cdot)$ with sub-polynomial tail behavior. In particular, we assume that there exists a natural number $d > 3$ such that for all $k \le d$, $\lim_{|z|\to\infty} |z|^k K(|z|) = 0$. This ensures the existence of moments of kernel-weighted statistics over the distribution of $X$, which is particularly helpful when the support of $X$ is unbounded. We note that all "standard" kernel functions, such as the bounded-support uniform, quadratic and Epanechnikov kernels, as well as the most commonly used Gaussian kernel, satisfy this condition.

The bandwidth is required to satisfy $h_N \gg \frac{\log N}{N}$ (see [42]) to ensure uniform convergence of the propensity score estimator, and is typically chosen so that $h_N = o(N^{-1/4})$ to avoid the propagation of the non-parametric bias to the estimator of the average treatment effect.

We now consider the impact of differential privacy on the estimation of $\theta$. As in Section 3, we rely on our smoothness assumption that allows us to focus on additive mechanisms to induce differential privacy. Also, as in our previous analysis of the regression discontinuity design, we start with the analysis of the global sensitivity of $\psi(P_N)$.

Proposition 7. Suppose that the average treatment effect estimator (4.5) uses the propensity score estimator (4.4) and the kernel function $K(\cdot)$ is such that $|K(\cdot)| \le \bar K$ and there exists a natural number $d > 3$ such that for all $k \le d$, $\lim_{|z|\to\infty} |z|^k K(|z|) = 0$. Then:
(i) If the support $\mathcal X$ is bounded and $h_N = o(N^{-1/4})$, the global sensitivity of the functional $\psi(P_N)$ in (4.5) is bounded away from zero as $N \to \infty$.
(ii) If the support $\mathcal X$ is unbounded, then the global sensitivity of the functional $\psi(P_N)$ in (4.5) is $+\infty$.
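To make the global sensitivity concrete, the following sketch implements the two-step procedure in (4.4)-(4.5) and then recomputes the estimate after changing a single observation, which is the thought experiment behind the sensitivity bound. This is an illustrative implementation under our own assumptions, not the paper's code: the Gaussian kernel, the data-generating process, the bandwidth exponent, and the numerical clipping of $\widehat P$ are all our choices (the text notes that fixed trimming destroys identification, so the clip here is purely a numerical guard, not part of the estimator).

```python
import numpy as np

def propensity(x, X, D, h):
    """Nadaraya-Watson estimate of P(D=1|X=x) with a Gaussian kernel, as in (4.4)."""
    w = np.exp(-0.5 * ((x - X) / h) ** 2)  # kernel weights K((x - X_i)/h)
    return (w * D).sum() / w.sum()

def ate_ipw(X, D, Y, h):
    """Propensity-score-weighted ATE estimator, as in (4.5)."""
    p = np.array([propensity(x, X, D, h) for x in X])
    p = np.clip(p, 1e-3, 1 - 1e-3)  # numerical guard only, not trimming
    return np.mean(Y * D / p - Y * (1 - D) / (1 - p))

rng = np.random.default_rng(0)
N = 2000
X = rng.uniform(-1, 1, N)
p_true = 1 / (1 + np.exp(-X))                     # true propensity score
D = (rng.uniform(size=N) < p_true).astype(float)
Y1 = 1.0 + X + rng.normal(0, 0.2, N)              # potential outcomes, true ATE = 1
Y0 = X + rng.normal(0, 0.2, N)
Y = D * Y1 + (1 - D) * Y0
h = N ** (-0.3)                                   # undersmoothing: h_N = o(N^{-1/4})

theta_hat = ate_ipw(X, D, Y, h)

# One-observation perturbation: the empirical analogue of global sensitivity.
X2, D2, Y2 = X.copy(), D.copy(), Y.copy()
X2[0], D2[0], Y2[0] = 0.99, 1.0, Y.max()
delta = abs(ate_ipw(X2, D2, Y2, h) - theta_hat)
print(round(theta_hat, 2), delta > 0)
```

The perturbation `delta` stays strictly positive however large $N$ is when the perturbed point lands where the estimated propensity score is extreme, which is the mechanism behind part (i) of the proposition.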
This result allows us to formulate the following theorem.

THEOREM 7. For estimation of the average treatment effect, any smooth $(\varepsilon_N, \delta_N)$-differentially private propensity score-weighted estimator is inconsistent for any bounded sequence $\{\varepsilon_N\}$ and non-negative $\{\delta_N\}$. As a result, the limiting random set $T_{\mathcal E}$ contains at least one non-degenerate element different from $\{\theta\}$.

Differential privacy is a powerful data security concept that precludes a potential adversary from linking sensitive data with outside information, inferring data attributes, or determining whether a particular individual is included in the dataset. The implementation of differentially private data analysis is based on the consideration of randomized estimators, where independent randomness is the key instrument that provides the differential privacy guarantee.

In this paper we focused on the identification of econometric models under differential privacy. We concluded that even for relatively simple models, identification in this context requires concepts and methods from random set theory. We consider identification from the perspective of the limit of statistical experiments where a differentially private implementation of the estimator is applied to datasets of increasing size. Identification in this case is a property of the set of weak limits of such estimators. Under our mild regularity conditions this limiting set is a convex compact random set and, thus, it needs to be characterized in probabilistic terms, for instance using the containment functional.

We apply our theory to two popular econometric models: the regression discontinuity design (RDD) and the average treatment effect (ATE). In the RDD setting we consider both sharp and fuzzy designs. We show that for both models the random set of weak limits of differentially private estimators contains non-degenerate random elements, which precludes point identification of the parameters of interest.
We illustrate this finding in a series of Monte Carlo simulations. Our result is, in part, driven by the structure of the estimators, which have to rely on local properties of the underlying distribution. This may indicate that under differential privacy a similar behavior is to be expected for other econometric models that rely on nuisance parameters.

Proof of Lemma 1. Let $\omega_\nu$ be the element of the $\sigma$-algebra $\mathcal F_\nu$ associated with the random element $\nu_N$, and let $\omega_S$ be the element of the $\sigma$-algebra of the subsets of $\mathcal Z^n$. Since $T_N$ is the closure of the set of measurable selections that form a closed and bounded space, for almost all elements $(\omega_\nu, \omega_S)$ the set of values $\theta(P_N(\cdot;\omega_S), \nu_N(\omega_\nu))$ is closed and bounded and, therefore, compact. Thus, the random set $T_N$ is compact.

To prove convexity, it is enough to consider $\theta(P_N, \nu_N)$ and $\theta'(P_N, \nu_N)$ that are realizations of two regular $(\varepsilon_N, \delta_N)$-differentially private and $(\varepsilon'_N, \delta'_N)$-differentially private estimators, respectively, where the sequences $(\varepsilon_N, \delta_N)$ and $(\varepsilon'_N, \delta'_N)$ are in $\mathcal E$. Then by the union bound their convex combination satisfies (2.2) with a right-hand side bound of at most $2\bar R(n, \kappa)$. Also, any convex combination $\tau\theta(S_N, \nu_N) + (1-\tau)\theta'(S_N, \nu_N)$ is a realization of the estimator $\tau\theta(\cdot,\cdot) + (1-\tau)\theta'(\cdot,\cdot)$. This estimator is differentially private for the sequence $(\max\{\varepsilon_N, \varepsilon'_N\}, \max\{\delta_N, \delta'_N\})$, which belongs to $\mathcal E$ by our assumption that $\mathcal E$ is a join-semilattice. Finally, note that the estimator $\tau\theta(\cdot,\cdot) + (1-\tau)\theta'(\cdot,\cdot)$ has a weak limit by the continuous mapping theorem, as it is straightforward to show that $(\theta, \theta')^\top(\cdot,\cdot)$ has a joint weak limit (of course, we use the fact that $\theta, \theta'$ do not depend on $N$). Thus, the set $T_N$ is a convex random set.
□

Proof of Lemma 2: Assume, contrary to the statement of the lemma, that $\Delta_N = \theta(S_N, \nu_N) - \tau \xrightarrow{p} 0$. Then $\theta(S_N, \nu_N) = \tau + \Delta_N$, and because $\tau$ is not constant, conditional on $S_N$ and $S_{N+1}$ the estimators $\theta(S_N, \nu_N)$ and $\theta(S_{N+1}, \nu_{N+1})$ cannot be independent. This, in turn, contradicts the independence of the elements $\nu_N$ and $\nu_{N+1}$, which is a fundamental requirement for differential privacy. □

Proof of Theorem 1. Provided that the $\theta(\cdot,\cdot)$ belong to a compact subset of a separable space in $L_2$, for each $\chi$ there exists $K$ such that, for a finite set of elements $\{\theta^{(1)}(\cdot,\cdot), \dots, \theta^{(K)}(\cdot,\cdot)\}$, their convex hull $\Theta_N^K$ is within $\chi$-Hausdorff distance from the set $T_{N,\mathcal E}$. Since each $\theta^{(k)}(\cdot,\cdot)$ is a function of the same elements $(\omega_\nu, \omega_S)$, weak convergence of $\theta^{(k)}(P_N, \nu_N)$ implies joint weak convergence of the set $\{\theta^{(1)}(\cdot,\cdot), \dots, \theta^{(K)}(\cdot,\cdot)\}$ and, therefore, weak convergence of their convex hull. We then choose a sequence $\chi_N$, which induces a sequence $K_N$, such that $\sup_A \left| C_{\Theta_N^{K_N}}(A) - C_{T_{N,\mathcal E}}(A) \right|$ is a decreasing function of $N$. Then by Theorem 6.26 in [39] the sequence of random sets $T_{N,\mathcal E}$ converges weakly. □

Proof of Theorem 3. a) Suppose that for any sequence $(\varepsilon_N, \delta_N)$ from $\mathcal E$, any regular $(\varepsilon_N, \delta_N)$-differentially private estimator $\theta(P_N, \nu_N)$ is such that $\theta(P_N, \nu_N) \xrightarrow{p} \theta$. Then $T_{\mathcal E} = \{\theta\}$ (a degenerate distribution at $\theta$). Then, clearly, for any convex polytope $K \ni \theta$ we have $C_{T_{\mathcal E}}(K) = 1 \ge 1 - \alpha$ for any $\alpha \in (0, 1)$.

b) Suppose that for any $\alpha \in (0, 1)$ and any convex polytope $K \ni \theta$ we have $C_{T_{\mathcal E}}(K) \ge 1 - \alpha$. Since $\alpha$ can be taken arbitrarily close to 0, this means that $C_{T_{\mathcal E}}(K) = 1$. Since the convex polytope $K \ni \theta$ can be taken to have arbitrarily small volume, this means that $T_{\mathcal E} = \{\theta\}$ (a degenerate distribution at $\theta$).
Indeed, take a decreasing sequence $K_m \ni \theta$ of convex polytopes such that $\cap_{m=1}^\infty K_m = \{\theta\}$. By the continuity theorem for monotone sequences of events, $C_{T_{\mathcal E}}(\cap_{m=1}^\infty K_m) = \lim_{m\to\infty} C_{T_{\mathcal E}}(K_m) = 1$, which immediately implies that $T_{\mathcal E} = \{\theta\}$, meaning that every $\theta(S_N, \eta_N)$ converges weakly to $\theta$ and, thus, $\theta(S_N, \eta_N) \xrightarrow{p} \theta$. □

Selection expectation of the limiting random set of estimators for $\theta$ in Section 2.3

Here we briefly discuss the properties of the selection expectation of the limiting random set $T_{\mathcal E}$ of regular differentially private estimators for $\theta$ in the context of the example in Section 2.3.

Let $\Theta_{N,\mathcal E}$ denote the selection expectation of the random set $T_{N,\mathcal E}$ of all regular $(\varepsilon_N, \delta_N)$-differentially private estimators for $\theta$ with sequences $(\varepsilon_N, \delta_N)$ from $\mathcal E$. The Hausdorff limit of $\Theta_{N,\mathcal E}$ as $N \to \infty$ is denoted $\Theta_{\infty,\mathcal E}$. Loosely speaking, $\Theta_{\infty,\mathcal E}$ contains all limits of expectations of regular differentially private estimators. Ideally, if differentially private estimators are compatible with consistency, then for a broad range of sequences $\varepsilon_N$ and $\delta_N$ converging to zero the set $\Theta_{\infty,\mathcal E}$ should be a singleton $\{\theta\}$.

Theorem 1.45 in [39] links $\Theta_{\infty,\mathcal E}$ to the selection expectation of the limit random set $T_{\mathcal E}$. Indeed, that theorem immediately implies that, under the conditions of Theorem 1, $E T_{N,\mathcal E}$ converges to $E T_{\mathcal E}$ in the Hausdorff metric and the Lebesgue measure of $E T_{N,\mathcal E}$ converges to the Lebesgue measure of $E T_{\mathcal E}$ as $N \to \infty$. In other words, weak convergence of a sequence of random sets implies the convergence of the selection expectation.

Going back to the discussion of our example in Section 2.3, we note that even though in that example the differential privacy-inducing mechanisms perturb the estimator with random noise symmetric around zero, there is no guarantee that the limiting $\Theta_{\infty,\mathcal E}$ is a singleton at $\theta$.
If we collect differentially private estimators across all three regimes in that example, we find that the corresponding limiting set of selection expectations will include
$$E\,U\big(\{\operatorname{Argmin}_{\theta \in \Theta} \theta,\ \operatorname{Argmax}_{\theta \in \Theta} \theta\}\big), \qquad \{E\,\Lambda(c),\ c \in [0, +\infty)\}, \qquad E[X].$$
We note that in this case the target expectation $E[X]$ belongs to $\Theta_{\infty,\mathcal E}$. At the same time, the set $\Theta_{\infty,\mathcal E}$ itself is clearly large. We also note that if we exclude the elements of the selection expectation that result from Regime 3, where the scale of the double exponential noise asymptotically increases, the selection expectation of our considered family of estimators will be a linear segment in $\Theta$ that connects the points 0 and $E[X]$, since the set $\{E\,\Lambda(c),\ c \in [0, +\infty)\}$ is a line in $\Theta$ that connects 0 and $E[X]$.

Treat $G(N)$ as $\Delta_N$ in AC2 and choose the respective $\sigma_N$ and $[a_N, b_N]$ as in condition AC2. Take the definition of the differentially private estimator and note that for two estimators $\psi(S_N)$ and $\psi(S'_N)$ based on two datasets that differ in one observation only, we have
$$\begin{aligned}
P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N + \psi(S_N) \in B) &= P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N + \psi(S_N) \in B,\ \nu_N \in [a_N, b_N]) \\
&\quad + P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N + \psi(S_N) \in B,\ \nu_N \notin [a_N, b_N]) \\
&\le e^{D(G(N), \sigma_N, a_N, b_N)}\, P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N + \psi(S'_N) \in B,\ \nu_N \in [a_N, b_N]) \\
&\quad + P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N \notin [a_N, b_N]),
\end{aligned}$$
where $B$ is any subset of $\mathbb R$.
Thus, if $\varepsilon_N$ is greater than or equal to $D(G(N), \sigma_N, a_N, b_N)$ (which also gives acceptable rates of convergence of $\varepsilon_N$ to 0) and $P_{\nu_N \sim f_{N;\sigma_N}}(\nu_N \notin [a_N, b_N]) \le \delta_N$ (which also gives acceptable rates of convergence of $\delta_N$ to 0), then the $(\varepsilon_N, \delta_N)$-differential privacy criterion is satisfied.

At the same time, because both the mean and the variance of $\xi_N$ converge to 0, $\xi_N$ converges in probability to 0 and, therefore, in light of (3.5) the estimator $\widehat\theta$ in (3.4) is consistent. □

Proof of Theorem 4. As we will show, the inconsistency of the estimator $\widehat\theta$ in (3.4) stems from the fact that the variance of the mechanism noise $\xi_N$ does not go to 0 with the sample size, which in turn is explained by the fact that the global sensitivity of $\psi(S_N)$ does not go to 0 with the sample size.

Indeed, suppose that $G(N)$ is bounded away from zero as $N \to \infty$ and is also bounded from above. Then, for a fixed large enough interval $[a_N, b_N]$, the value of $D(G(N), \sigma_N, a_N, b_N)$ has to be bounded from above by $\varepsilon_N$. Since $G(N)$ is bounded away from 0, from AC1 we have that $D(G(N), \sigma_N, a, b) \to +\infty$ if $\sigma_N \to 0$ for a fixed interval $[a, b]$. The definition of $D(\cdot)$ as a supremum implies that the same property will hold if instead of the fixed interval $[a, b]$ we take $[a_N, b_N]$ converging to $\mathbb R$. This implies, of course, that $\sigma_N$ has to be bounded away from zero. This, in turn, implies that $\xi_N$ does not converge in probability to zero even if the mean of $\xi_N$ converges to 0. Hence, $\widehat\theta$ does not converge in probability to the true parameter value.

If $G(N) = +\infty$, then to guarantee $(\varepsilon_N, \delta_N)$-differential privacy one would have to take $\sigma_N = +\infty$, clearly leading to the inconsistency of $\widehat\theta$. Note that this inconsistency result applies even to $(\varepsilon_N, \delta_N)$ not changing with $N$. It will also be true under the stronger requirements of differential privacy when both parameters converge to 0. □
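The mechanism behind this proof, namely that a noise scale bounded away from zero keeps the released estimator from concentrating even as the sampling noise vanishes, can be illustrated with a small simulation. The sample-mean statistic and the fixed-scale Laplace mechanism below are our own stand-ins for $\psi(S_N)$ and $f_{N;\sigma_N}$, chosen for transparency rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0      # mechanism noise scale bounded away from zero
reps = 20000
spread = {}
for N in (100, 10_000, 1_000_000):
    # The sample mean of N standard normal draws is exactly N(0, 1/N)-distributed,
    # so we draw it directly instead of averaging N draws each time.
    theta_hat = rng.normal(0.0, 1.0 / np.sqrt(N), size=reps)
    released = theta_hat + rng.laplace(scale=sigma, size=reps)
    spread[N] = released.std()

# The sampling dispersion of the mean shrinks like 1/sqrt(N), but the dispersion
# of the released estimator stays near the Laplace std sqrt(2)*sigma for every N.
print({N: round(s, 2) for N, s in spread.items()})
```

Whatever the sample size, the standard deviation of the released value stays close to $\sqrt{2}\,\sigma$, so the privatized estimator cannot converge in probability to the true parameter, exactly as the proof concludes for $\sigma_N$ bounded away from zero.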
Lemmas 4 and 5 will help to establish the results in Propositions 2-5.

LEMMA 4. Consider two weighted averages
$$q_1 = \sum_{i=1}^{T} w_i a_i + w_{T+1} a_{T+1}, \quad \text{where } w_i = \frac{b_i}{\sum_{j=1}^{T+1} b_j},\ i = 1, \dots, T+1,$$
$$q_2 = \sum_{i=1}^{T} \tilde w_i a_i + \tilde w_{T+1} \tilde a_{T+1}, \quad \text{where } \tilde w_i = \frac{b_i}{\sum_{j=1}^{T} b_j + \tilde b_{T+1}},\ i = 1, \dots, T, \quad \tilde w_{T+1} = \frac{\tilde b_{T+1}}{\sum_{j=1}^{T} b_j + \tilde b_{T+1}},$$
and
$$0 \le c_0 \le (\text{or} <)\ b_i, \tilde b_{T+1} \le c_1, \quad (6.1)$$
$$d_1 \le a_i \le d_2,\ i = 1, \dots, T+1, \quad \tilde a_{T+1} \in [d_1, d_2], \quad d_1 < d_2. \quad (6.2)$$

Then, with the maximum taken over all $a_i$, $b_i$, $\tilde a_{T+1}$, $\tilde b_{T+1}$ satisfying (6.1)-(6.2):
(a) if $c_0 = 0$ and $|d_1|, |d_2| < \infty$, then $\max |q_1 - q_2| = d_2 - d_1$;
(b) if $c_0 > 0$ and $|d_1|, |d_2| < \infty$, then $\max |q_1 - q_2| = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1}$;
(c) if $d_1 = -\infty$ or $d_2 = +\infty$, then $\max |q_1 - q_2| = +\infty$.

In cases (a)-(c), $\max |q_1 - q_2|$ can be attained by a positive change as well as by a negative change; that is, there are values of the $a_t$'s, $b_t$'s and $\tilde a_{T+1}, \tilde b_{T+1}$ such that $q_1 - q_2 = \max |q_1 - q_2|$, and there are values such that $q_1 - q_2 = -\max |q_1 - q_2|$.

Proof of Lemma 4. (a) In this case we can take
- $b_1 = \dots = b_T \approx 0$, $b_{T+1} = \tilde b_{T+1} = c_1$;
- $a_1, \dots, a_T$ arbitrary values that satisfy (6.2); $a_{T+1} = d_2$, $\tilde a_{T+1} = d_1$.
This gives $q_1 - q_2 \approx d_2 - d_1$. Therefore we have $\max |q_1 - q_2| \ge d_2 - d_1$. At the same time, each weighted average $q_1$ and $q_2$ has to belong to $[d_1, d_2]$, which is the range of the $a_i$'s; therefore, necessarily $\max |q_1 - q_2| \le d_2 - d_1$. This implies $\max |q_1 - q_2| = d_2 - d_1$. Note that if above we instead take $a_{T+1} = d_1$, $\tilde a_{T+1} = d_2$, then $q_1 - q_2 = -(d_2 - d_1)$.

(b) In this case, to evaluate the largest change in the weighted average we have to consider extreme situations.
The first extreme situation is when $q_2 = d_1$ and the $(T+1)$-th component in this average has the largest weight and changes to the other extreme $d_2$ in the new average $q_1$. This situation can be described as
- $b_1 = \dots = b_T = c_0$; $b_{T+1} = \tilde b_{T+1} = c_1$;
- $a_1 = \dots = a_T = d_1$; $a_{T+1} = d_2$, $\tilde a_{T+1} = d_1$.
This gives $q_1 - q_2 = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1} > 0$. In the second extreme situation, where the $b_t$'s and $\tilde b_{T+1}$ are the same as above but $q_2 = d_2$ and the $(T+1)$-th component in this average has the largest weight and changes to the other extreme $d_1$ in the new average $q_1$, we obtain $q_1 - q_2 = -\frac{c_1 (d_2 - d_1)}{T c_0 + c_1} < 0$. These two extreme scenarios give exactly the same $|q_1 - q_2|$. Thus, $\max |q_1 - q_2| = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1}$.

(c) In this case, let $b_i$, $i = 1, \dots, T+1$, and $\tilde b_{T+1}$ be any values that satisfy (6.1). Suppose $d_2 = +\infty$. Let $a_i$, $i = 1, \dots, T+1$, take any finite values while $\tilde a_{T+1}$ is arbitrarily large. This gives $q_1 - q_2 = -\infty$ and, thus, $|q_1 - q_2| = +\infty$. Therefore, in this case $\max |q_1 - q_2| = +\infty$. If instead $\tilde a_{T+1}$ takes a finite value while $a_{T+1}$ is arbitrarily large, then $q_1 - q_2 = +\infty$. The case where $d_2$ is finite but $d_1 = -\infty$ is analyzed analogously. □

LEMMA 5. Consider two weighted averages
$$q_1 = \sum_{i=1}^{T} w_i a_i + w_{T+1} a_{T+1}, \quad \text{where } w_i = \frac{b_i}{\sum_{j=1}^{T+1} b_j},\ i = 1, \dots, T+1,$$
$$q_2 = \sum_{i=1}^{T} \tilde w_i a_i, \quad \text{where } \tilde w_i = \frac{b_i}{\sum_{j=1}^{T} b_j},\ i = 1, \dots, T,$$
where the $b_i$ and $a_i$ satisfy conditions (6.1) and (6.2), respectively. Then, with the maximum over all $a_i$, $b_i$ satisfying (6.1)-(6.2):
(a) if $c_0 = 0$ and $|d_1|, |d_2| < \infty$, then $\max |q_1 - q_2| = d_2 - d_1$;
(b) if $c_0 > 0$ and $|d_1|, |d_2| < \infty$, then $\max |q_1 - q_2| = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1}$;
(c) if $d_1 = -\infty$ or $d_2 = +\infty$, then $\max |q_1 - q_2| = +\infty$.
In cases (a)-(c), $\max |q_1 - q_2|$ can be attained by a positive change as well as by a negative change; that is, there are values of the $a_t$'s and $b_t$'s such that $q_1 - q_2 = \max |q_1 - q_2|$, and there are values such that $q_1 - q_2 = -\max |q_1 - q_2|$.

Proof of Lemma 5. (a) In this case we can take
- $b_1 = \dots = b_T \approx 0$, $b_{T+1} = c_1$;
- $a_1 = \dots = a_T = d_1$; $a_{T+1} = d_2$.
This gives $q_1 \approx d_2$, $q_2 = d_1$ and, thus, $q_1 - q_2 \approx d_2 - d_1 > 0$, so $\max |q_1 - q_2| \ge d_2 - d_1$. At the same time, each weighted average $q_1$ and $q_2$ has to belong to $[d_1, d_2]$, which is the range of the $a_i$'s; therefore, necessarily $\max |q_1 - q_2| \le d_2 - d_1$. This implies $\max |q_1 - q_2| = d_2 - d_1$.

(b) In this case, to evaluate the largest change in the weighted average we have to consider extreme situations. An extreme situation is when in $q_1$ the $(T+1)$-th component (which is later dropped when defining $q_2$) has the largest weight and a value that is maximally different from the values of the first $T$ components. The first extreme situation can be described as
- $b_1 = \dots = b_T = c_0$; $b_{T+1} = c_1$;
- $a_1 = \dots = a_T = d_1$; $a_{T+1} = d_2$.
This gives $|q_1 - q_2| = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1}$. The second extreme situation, where the $b_t$'s are the same as above but $a_1 = \dots = a_T = d_2$ and $a_{T+1} = d_1$, gives exactly the same value of $|q_1 - q_2|$. Thus, $\max |q_1 - q_2| = \frac{c_1 (d_2 - d_1)}{T c_0 + c_1}$.

(c) In this case, let $b_i$, $i = 1, \dots, T+1$, be any values that satisfy (6.1). Let $a_i$, $i = 1, \dots, T$, take any finite values while $a_{T+1}$ is arbitrarily large in absolute value. This gives $|q_1 - q_2| = +\infty$. Therefore, in this case $\max |q_1 - q_2| = +\infty$. □

Proof of Proposition 2. (a) The global sensitivity of the estimator is calculated by comparing the results of the estimation for two datasets that differ in only one data point.
In order to calculate the global sensitivity, we need to keep in mind the following possibilities:
(i) the new data point enters a $K$-$h$-neighborhood of $c$ (thus, the old data point was outside of both $K$-$h$-neighborhoods of $c$);
(ii) the new data point falls outside of both $K$-$h$-neighborhoods of $c$ (thus, the old data point was inside one of the $K$-$h$-neighborhoods of $c$);
(iii) the new data point remains in the same neighborhood;
(iv) the new data point switches neighborhoods.
In order to find the global sensitivity, it is enough to find the maximum absolute change in the estimate in each of these four situations and then take the maximum over them. Let us consider the four situations listed above. In this proof we use Lemmas 4 and 5 with $c_1 = \bar K$, where $\bar K$ denotes the maximum value of the kernel $K(\cdot)$, and $c_0 = 0$, since $K(\cdot)$ is continuous and, therefore, $\underline K = 0$.

(i) Suppose the new data point enters the $K$-$h$-neighborhood to the left of $c$ while the old data point was outside of both $K$-$h$-neighborhoods of $c$. Then, by part (a) of Lemma 5, the maximum absolute change $G_L^{in}$ in the estimate in this case is $G_L^{in} = \bar Y_l - \underline Y_l$. Analogously, we can consider the case when a new data point enters the $K$-$h$-neighborhood to the right of $c$. The maximum absolute change $G_R^{in}$ in the estimate in this case is $G_R^{in} = \bar Y_r - \underline Y_r$.

(ii) In this case we have two situations: in one, the old data point was in the left $K$-$h$-neighborhood, and in the other, the old data point was in the right $K$-$h$-neighborhood. In both situations the new data point falls outside of both neighborhoods.
In the former case the maximum absolute change in the estimate coincides with $G_L^{in}$, so $G_L^{out} = G_L^{in}$; in the latter case it coincides with $G_R^{in}$, so $G_R^{out} = G_R^{in}$.

(iii) When the observation remains in the left $K$-$h$-neighborhood, we apply part (a) of Lemma 4 to obtain that the maximum absolute change $G_{LL}$ in the estimate in this case is $G_{LL} = \bar Y_l - \underline Y_l$. When the observation remains in the right $K$-$h$-neighborhood, we consider the maximum absolute change $G_{RR}$ in the estimate and analogously show that $G_{RR} = \bar Y_r - \underline Y_r$.

(iv) Suppose an observation moves from the left $K$-$h$-neighborhood to the right $K$-$h$-neighborhood. Our estimator of interest is the difference between the weighted means in the right and the left $K$-$h$-neighborhoods of $c$; therefore, the move of the observation from one neighborhood to the other affects both parts of the estimator. As we know from part (a) of Lemma 4, the maximum absolute change in the weighted average for the right-hand side is $\bar Y_r - \underline Y_r$, and this degree of change can be attained as a positive change (increase). Similarly, the maximum absolute change in the weighted average for the left-hand side is $G_L^{out} = \bar Y_l - \underline Y_l$, and this degree of change can be attained as a negative change (decrease). In order to obtain the maximum absolute change for the difference in weighted means, we look at the case when these two weighted means change in opposite directions, which leads to the maximum change $G_{LR} = \bar Y_r - \underline Y_r + \bar Y_l - \underline Y_l$. Analogously, we can consider an observation that moves from the right $K$-$h$-neighborhood to the left $K$-$h$-neighborhood and show that in this case the maximum absolute change is $G_{RL} = \bar Y_r - \underline Y_r + \bar Y_l - \underline Y_l$.

To sum up the results of part (a), the global sensitivity is
$$G(N) = \max\{G_L^{in}, G_R^{in}, G_L^{out}, G_R^{out}, G_{LR}, G_{RL}, G_{LL}, G_{RR}\} = \bar Y_r - \underline Y_r + \bar Y_l - \underline Y_l.$$

(b) Suppose, for instance, that the support of $Y_r$ is unbounded.
Then part (c) of Lemmas 4 and 5 immediately gives that, for the $G_R^{in}, G_R^{out}, G_{RL}, G_{RR}$ defined above, $G_R^{in} = G_R^{out} = G_{RL} = G_{RR} = +\infty$, which implies this part of the proposition. The other cases in this part of the proposition lead to the same conclusion. □

Proof of Proposition 3. Just as in Proposition 2, the global sensitivity is
$$G(N) = \max\{G_L^{in}, G_R^{in}, G_L^{out}, G_R^{out}, G_{LR}, G_{RL}, G_{LL}, G_{RR}\},$$
where $G_L^{in}, G_R^{in}, G_L^{out}, G_R^{out}, G_{LL}, G_{RR}, G_{LR}, G_{RL}$ are defined as in the proof of Proposition 2. Once again we rely on the results of Lemmas 4 and 5, but this time on part (b) of both lemmas, as we take $c_0 = \underline K$ and $c_1 = \bar K$, where $\underline K$ and $\bar K$ are the minimum and the maximum values of the kernel $K$ over the relevant neighborhood.

(a) Applying the result of part (b) of Lemma 5 and noting that the minimum numbers of observations in the left and the right $K$-$h$-neighborhoods of $c$ are $m_l(N)$ and $m_r(N)$, respectively, we obtain
$$G_L^{in} = G_L^{out} = \frac{\bar K (\bar Y_l - \underline Y_l)}{m_l(N)\,\underline K + \bar K}, \qquad G_R^{in} = G_R^{out} = \frac{\bar K (\bar Y_r - \underline Y_r)}{m_r(N)\,\underline K + \bar K}.$$
Applying the result of part (b) of Lemma 4, we have
$$G_{LL} = \frac{\bar K (\bar Y_l - \underline Y_l)}{(m_l(N) - 1)\,\underline K + \bar K}, \qquad G_{RR} = \frac{\bar K (\bar Y_r - \underline Y_r)}{(m_r(N) - 1)\,\underline K + \bar K}.$$
We next consider $G_{LR}$, which quantifies the case when an observation from the left $K$-$h$-neighborhood of $c$ moves into the right-hand side neighborhood. Suppose we started with $T + 1$ observations in the left $K$-$h$-neighborhood, where $m_l(N) \le T \le N - m_r(N)$. We need to evaluate the biggest change in the left-hand side neighborhood, the biggest change in the right-hand side neighborhood, and their directions (whether these changes act in the same or opposite directions).
Relying on the result of part (b) of Lemma 5, we can establish that, given $T$, the largest absolute change in the weighted mean in the left $K$-$h$-neighborhood of $c$ is $\frac{\bar K (\bar Y_l - \underline Y_l)}{T\,\underline K + \bar K}$, and the largest absolute change in the weighted mean in the right neighborhood from acquiring an extra point is $\frac{\bar K (\bar Y_r - \underline Y_r)}{(N - T - 1)\,\underline K + \bar K}$.

As shown in Lemma 5, these changes can be either positive or negative. Since our estimator of interest is the difference between the weighted means in the right and the left $K$-$h$-neighborhoods of $c$, to obtain the maximum absolute change for a given $T$ we look at the case when these two weighted means change in opposite directions. For a given $T$ this gives the maximum absolute change
$$\frac{\bar K (\bar Y_l - \underline Y_l)}{T\,\underline K + \bar K} + \frac{\bar K (\bar Y_r - \underline Y_r)}{(N - T - 1)\,\underline K + \bar K},$$
which we then maximize over $T$ such that $m_l(N) \le T \le N - m_r(N)$. If $\bar Y_r - \underline Y_r > \bar Y_l - \underline Y_l$, the maximum is attained at $T = N - m_r(N)$; otherwise it is attained at $T = m_l(N)$. To summarize,
$$G_{LR} = \max\left\{\frac{\bar K (\bar Y_l - \underline Y_l)}{m_l(N)\,\underline K + \bar K} + \frac{\bar K (\bar Y_r - \underline Y_r)}{(N - m_l(N) - 1)\,\underline K + \bar K},\ \frac{\bar K (\bar Y_l - \underline Y_l)}{(N - m_r(N))\,\underline K + \bar K} + \frac{\bar K (\bar Y_r - \underline Y_r)}{(m_r(N) - 1)\,\underline K + \bar K}\right\}.$$
The case of $G_{RL}$, which quantifies the case when an observation from the right $K$-$h$-neighborhood of $c$ moves into the left-hand side neighborhood, is considered analogously. In this case,
$$G_{RL} = \max\left\{\frac{\bar K (\bar Y_l - \underline Y_l)}{(N - m_r(N) - 1)\,\underline K + \bar K} + \frac{\bar K (\bar Y_r - \underline Y_r)}{m_r(N)\,\underline K + \bar K},\ \frac{\bar K (\bar Y_l - \underline Y_l)}{(m_l(N) - 1)\,\underline K + \bar K} + \frac{\bar K (\bar Y_r - \underline Y_r)}{(N - m_l(N))\,\underline K + \bar K}\right\}.$$
This gives the result that $G(N)$ is of the rate $\frac{1}{\min\{m_l(N), m_r(N)\}}$.

(b) Suppose, e.g., that the support of $Y_r$ is unbounded. Then part (c) of Lemmas 4 and 5 gives us that $G_R^{in} = G_R^{out} = G_{RR} = +\infty$, which implies that $G(N) = +\infty$. □

Proof of Proposition 4. The proof in this case is analogous to the proof of Proposition 2.
Since the kernel has an unbounded support, there is no longer a case of observations falling outside of either neighborhood or entering a neighborhood. Therefore, the global sensitivity is
$$G(N) = \max\{G_{LR}, G_{RL}, G_{LL}, G_{RR}\},$$
where $G_{LL}, G_{RR}, G_{LR}, G_{RL}$ are defined as in the proof of Proposition 2. Throughout the proof we apply Lemmas 4 and 5 with the strict inequality version ($0 = c_0 < b_i$) in (6.1).

(a) When the observation remains to the left of $c$, we apply part (a) of Lemma 4 to obtain that the maximum absolute change $G_{LL}$ in the estimate in this case is $G_{LL} = \bar Y_{l,all} - \underline Y_{l,all}$. When the observation remains to the right of $c$, we consider the maximum absolute change $G_{RR}$ in the estimate and analogously show that $G_{RR} = \bar Y_{r,all} - \underline Y_{r,all}$. Suppose an observation moves from the left of $c$ to the right of $c$, or from the right of $c$ to the left of $c$. Analogously to the proof of Proposition 2, we can establish that
$$G_{LR} = G_{RL} = \bar Y_{r,all} - \underline Y_{r,all} + \bar Y_{l,all} - \underline Y_{l,all}.$$
(b) Analogous to the proof in Proposition 2. □

Proof of Proposition 5.
Just as in Propositions 2 and 3, we want to find
$$G(N) = \max\{G_L^{in}, G_R^{in}, G_L^{out}, G_R^{out}, G_{LR}, G_{RL}, G_{LL}, G_{RR}\},$$
where $G_L^{out}$ and $G_R^{out}$ are the sensitivities in situations of an observation leaving the left or the right $h$-neighborhood, respectively; $G_L^{in}$ and $G_R^{in}$ are the sensitivities in situations of a new observation entering the left or the right $h$-neighborhood, respectively; $G_{LR}$ and $G_{RL}$ are the sensitivities in cases of an observation switching neighborhoods; and $G_{LL}$ and $G_{RR}$ are the sensitivities in cases when an observation changes within the same neighborhood.

Since the local linear estimator effectively considers observations whose running variable values are in a small neighborhood of $c$, we employ (3.11) as approximations of the support for the outcome in one-sided neighborhoods of $c$. As we know,
$$\widehat\alpha_R = \bar y_R - (\bar x_R - c)\,\frac{\sum_{i=1}^N q_i (x_i - \bar x_R)\, y_i\, \mathbf 1(c \le x_i)}{\sum_{i=1}^N q_i (x_i - \bar x_R)^2\, \mathbf 1(c \le x_i)}, \qquad
\widehat\alpha_L = \bar y_L - (\bar x_L - c)\,\frac{\sum_{i=1}^N q_i (x_i - \bar x_L)\, y_i\, \mathbf 1(x_i < c)}{\sum_{i=1}^N q_i (x_i - \bar x_L)^2\, \mathbf 1(x_i < c)},$$
where $q_i = K\!\left(\frac{x_i - c}{h_N}\right)$,
$$\bar y_R = \frac{\sum_{i=1}^N q_i y_i \mathbf 1(c \le x_i)}{\sum_{i=1}^N q_i \mathbf 1(c \le x_i)}, \qquad \bar x_R = \frac{\sum_{i=1}^N q_i x_i \mathbf 1(c \le x_i)}{\sum_{i=1}^N q_i \mathbf 1(c \le x_i)},$$
and $\bar y_L$, $\bar x_L$ are defined analogously with $\mathbf 1(x_i < c)$.

For a kernel $K(\cdot)$ with a bounded support, let $(-u_0, u_0)$, $u_0 > 0$, be the support of this kernel. If $K(\cdot)$ has an unbounded support, then we can take $u_0$ to be a very large positive number. In either case, we can take
$$q_i \approx K(0),\ i = 1, \dots, m_r(N) - 1, \qquad q_T \approx \underline K, \qquad q'_T = K(0),$$
where $\underline K = \inf_{u \in (-u_0, u_0)} K(u)$. Suppose that $y_T = y'_T$.
With these realizations, a direct calculation expresses $\widehat\alpha_R$ and $\widehat\alpha'_R$ through $\Delta_N$, $T$, $h_N$ and a quantity $\delta$ that controls the spread of the running variable values in the right-hand side neighborhood. For fixed $T$, $h_N$, $\Delta_N$, it is possible to have $\delta \downarrow 0$, in which case $|\widehat\alpha'_R - \widehat\alpha_R| \to \infty$. Since there are no changes in $\widehat\alpha_L$, we conclude that $G_{RR} = \infty$ and, thus, $G(N) = \infty$.

Note that when the kernel either has an unbounded support or has a bounded support with $\underline K = 0$, then even without using $\delta \downarrow 0$ we can establish that the global sensitivity is bounded away from zero for any $N$, using techniques similar to those in Propositions 2 and 4. The proof above is based on the ability to have realizations of the data such that the minimum eigenvalue of the matrix $\frac{1}{T}\tilde X_r^\top \tilde X_r$ can be arbitrarily close to zero, where $\tilde X_r$ is the $T \times 2$ matrix with rows $(1, x_i - c)$ for the observations with $x_i \ge c$. If in the implementation of differential privacy a data curator wants to establish a strictly positive lower bound on the minimum eigenvalue of this matrix, then in the case of a kernel with a bounded support and $\underline K > 0$ the global sensitivity is of the rate $\frac{1}{\min\{m_r(N), m_l(N)\}}$. However, an issue with this is given in the discussion after Proposition 3 and is related to the fact that there is always a strictly positive probability of the number of observations being strictly less than $m_r(N)$ in the $K$-$h$-neighborhood to the right or strictly less than $m_l(N)$ in the $K$-$h$-neighborhood to the left.

It is obvious that when the support of $Y \mid X$ in the neighborhood to the right of $c$ or to the left of $c$ is unbounded, then $G(N) = +\infty$, which can be shown by changing one value of $Y_i$ only. □
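The degeneracy exploited in this proof, namely realizations for which the minimum eigenvalue of $\frac{1}{T}\tilde X_r^\top \tilde X_r$ is nearly zero, can be reproduced numerically. The sketch below is our own stylized illustration (uniform kernel, $T = 20$, design points piled at a single location $c + 0.5$), not the paper's construction: changing a single outcome by one unit changes the fitted intercept at the cutoff by exactly $0.5/\delta$ in this design, which blows up as $\delta \downarrow 0$.

```python
import numpy as np

def local_linear_intercept(x, y, c):
    """OLS of y on (1, x - c); returns the fitted intercept at the cutoff c."""
    Xmat = np.column_stack([np.ones_like(x), x - c])
    beta, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return beta[0]

c, T = 0.0, 20
rng = np.random.default_rng(2)
changes = []
for delta in (1e-1, 1e-3, 1e-5):
    # T-1 design points piled at c + 0.5 and one point delta away: the minimum
    # eigenvalue of (1/T) X~'X~ shrinks with delta, as in the proof.
    x = np.r_[np.full(T - 1, c + 0.5), c + 0.5 + delta]
    y = rng.normal(size=T)
    y2 = y.copy()
    y2[-1] += 1.0  # "neighboring" dataset: one outcome moved by one unit
    changes.append(abs(local_linear_intercept(x, y2, c)
                       - local_linear_intercept(x, y, c)))
print([f"{ch:.3g}" for ch in changes])
```

The reported changes grow without bound as $\delta$ shrinks, so no finite noise scale calibrated to this sensitivity can both protect privacy and vanish asymptotically.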
Proof of Proposition 6. Just as in Proposition 5, we can formulate the problem as that of finding
$$G(N) = \max\{G_L^{in}, G_R^{in}, G_L^{out}, G_R^{out}, G_{LR}, G_{RL}, G_{LL}, G_{RR}\},$$
where $G_L^{out}$ and $G_R^{out}$ are the sensitivities in situations of an observation leaving the left or the right $h$-neighborhood, respectively; $G_L^{in}$ and $G_R^{in}$ are the sensitivities in situations of a new observation entering the left or the right $h$-neighborhood, respectively; $G_{LR}$ and $G_{RL}$ are the sensitivities in cases of an observation switching neighborhoods; and $G_{LL}$ and $G_{RR}$ are the sensitivities in cases when an observation changes within the same neighborhood. When we say "leaves a neighborhood" or "enters a neighborhood", we mean with respect to the value of $X_i$.

If we follow the textbook definition of differential privacy and, thus, consider all possible realizations of the data, no matter how small the probability of these realizations is, as long as it is strictly positive, then we can show that $G(N) = +\infty$. Indeed, let us show, e.g., that under the textbook definition of differential privacy we have $G_{RR} = \infty$.

Consider a situation when the first $T \le N$ observations in our data are in the right-hand side neighborhood in the values of $X_i$. Consider, for example, realizations of the datasets when only the $T$-th observation in the right-hand side neighborhood changes its value $x_T$ while its values $W_T$ and $Y_T$ do not change. Then, of course, $\widehat\alpha_{y,L} = \widehat\alpha'_{y,L}$ and $\widehat\alpha_{w,L} = \widehat\alpha'_{w,L}$. Now we will use the same realizations of $X_1, \dots, X_T, X'_T$ as given in (6.3)-(6.4) in the proof of Proposition 5. Also, we will take the realization of the dataset where $W_i = 1$, $i = 1, \dots, T$, or $W_i = 0$, $i = 1, \dots, T$ (again, due to the fuzzy scenario, the probability of this realization may be perceived as low, but it is strictly positive and, thus, has to be taken into account by a differentially private mechanism).
Then we can take, of course, $\widehat{\alpha}_{w,R} = \widehat{\alpha}_{w,R}'$, as the values of the indicators $1(X_i \geq c)$, $i = 1, \ldots, T$, do not add any explanatory power in the local linear regression of $W_i$ on a constant and $1(X_i \geq c)$ in the right-hand side neighborhood (one can think of this as a situation of perfect fit in the reduced form of the IV regression, even though technically $\widehat{\alpha}_{w,R}$ and $\widehat{\beta}_{w,R}$ may not be separately estimated in a sample like the one we suggested). Thus, changes in the value of $\widehat{\tau}_{F,LocLin}$ in (3.3) happen only because of changes in the numerator. These changes are, of course, the same as the changes in $\widehat{\alpha}_R$ described in the proof of Proposition 5 and, thus, by manipulating $\delta$ we can make the change arbitrarily large in absolute value, leading us to the conclusion that $G_{RR} = +\infty$.

Even if one wanted to deviate from the textbook definition of differential privacy and restrict $W_i$, $i = 1, \ldots, T$, to have some variation in each neighborhood (e.g., by requiring a minimum number of zeros and ones in each neighborhood, or a fixed proportion), it would be straightforward to show that the global sensitivity remains bounded away from zero as $N \to \infty$. Indeed, in this case, even using the same example with $T$ realizations of $X_1, \ldots$
, $X_T$, $X_T'$ in the right-hand side neighborhood with the values given in (6.3)-(6.4) in the proof of Proposition 5, we would obtain that, as $\delta$ approaches 0, the changes in both the numerator $\widehat{\alpha}_{y,R} - \widehat{\alpha}_{y,L}$ and the denominator $\widehat{\alpha}_{w,R} - \widehat{\alpha}_{w,L}$ are arbitrarily large in absolute value. However, they become arbitrarily large at the same rate in $\delta$, which allows us to conclude that the change in the ratio is constant and to show that this constant change may not diminish to 0 with the sample size.

It is obvious that when the support of $Y|X$ in the neighborhood to the right of $c$ or to the left of $c$ is unbounded, then $G(N) = +\infty$, which can be shown by changing just one value of $Y_i$.

Note that for simplicity we used the uniform kernel to define (3.3). If one were instead using a kernel with a bounded support but $\bar{K} > 0$, similar arguments would apply. $\blacksquare$

Proof of Proposition 7. Since the support of $Y$ is bounded by Assumption 3, the global sensitivity is determined by the variation of the empirical weight $1/\widehat{P}(x)$ over $\mathcal{X}$. Then
$$\sup_{x, x' \in \mathcal{X}} \left| 1/\widehat{P}(x) - 1/\widehat{P}(x') \right| \geq \bar{K} / \left( h_N K(\mathrm{diam}(\mathcal{X})/h_N) \right).$$
Note that whenever $\mathrm{diam}(\mathcal{X})$ is infinite, an infinite lower bound applies and (ii) immediately follows. When $\mathrm{diam}(\mathcal{X})$ is finite, the global sensitivity of $\psi(P_N)$ is bounded from below by
$$\bar{K} / \left( N h_N K(\mathrm{diam}(\mathcal{X})/h_N) \right).$$
For $h_N = o(N^{-1/2})$, since $\lim_{|z| \to \infty} |z|^k K(|z|) = 0$ for $k \leq d$ with $d > 1$, we have $N h_N K(\mathrm{diam}(\mathcal{X})/h_N) = o(1)$ and, thus, the global sensitivity of $\psi(P_N)$ does not decrease as $N \to \infty$. $\blacksquare$

Figure: Illustration to Scenario 1. Twenty independent paths of differentially private local linear estimators for increasing sample sizes, for various degrees of differential privacy protection.