Differentially private depth functions and their associated medians
Kelly Ramsay, Shoja’eddin Chenouri

January 2021
Abstract
In this paper, we investigate the differentially private estimation of data depth functions and their associated medians. We start with several methods for privatizing depth values at a fixed point, and show that for some depth functions, when the depth is computed at an out-of-sample point, privacy can be gained for free as n → ∞. We also present a method for privately estimating the vector of sample depth values, and show that privacy is not gained for free asymptotically. We introduce estimation methods for depth-based medians, both for depth functions with low global sensitivity and for depth functions with only highly probable, low local sensitivity. We provide a general result (Lemma 1) which can be used to prove consistency of an estimator produced by the exponential mechanism, provided the asymptotic cost function is uniquely minimized and is sufficiently smooth. We introduce a general algorithm to privately estimate minimizers of a cost function which has low local sensitivity but high global sensitivity. This algorithm combines propose-test-release with the exponential mechanism. An application of this algorithm to generate consistent estimates of the projection depth-based median is presented. For these private depth-based medians, we show that it is possible for privacy to be free as n → ∞.

Keywords: Differential Privacy, Depth function, Multivariate Median, Propose-test-release
Introduction

There is a large body of literature showing that simply removing the identifying information about subjects from a database is not enough to ensure data privacy [see Dwork et al., 2017, and the references therein]. Even if only certain summary statistics are released, an adversary can still learn a surprising amount about individuals in a database [Dwork et al., 2017]. This phenomenon is largely due to auxiliary information known to the adversary. Given the large amount of information about individuals that is publicly available, it is not infeasible to assume that an adversary already knows some information about the individual they wish to learn about. By contrast, if a statistic is differentially private, an adversary cannot learn about the attributes of specific individuals in the original database, regardless of the amount of initial information the adversary possesses. This property, coupled with the lack of assumptions on the data itself needed to ensure privacy, accounts for the volume of recent literature on differentially private statistics.

One part of this literature represents a growing interest in the statistical community in differentially private inference [see, e.g., Wasserman and Zhou, 2010, Awan et al., 2019, Cai et al., 2019, Brunel and Avella-Medina, 2020]. One burgeoning area is the connection between robust statistics and differentially private statistics, first discussed by Dwork and Lei [2009]. Private M-estimators were studied by several authors [Lei, 2011, Avella-Medina, 2019]. A connection between private estimators and gross error sensitivity was formalized by Chaudhuri and Hsu [2012], who present upper and lower bounds on the convergence of differentially private estimators in relation to their gross error sensitivity. This connection has been further exploited in order to construct differentially private statistics [Avella-Medina, 2019].
Brunel and Avella-Medina [2020] greatly expanded the propose-test-release paradigm of Dwork and Lei [2009] using the concept of the finite sample breakdown point. The same authors use this idea to construct private median estimators with sub-Gaussian errors [Avella-Medina and Brunel, 2019]. Our present work is inspired by these recent papers; we explore the privatization of depth functions, a robust and nonparametric data analysis tool. Given the recent success of robust procedures in the private setting, it is worthwhile to develop and study privatized depth functions and their associated medians.

Depth functions facilitate the extension of, among other things, medians and rank statistics to the multivariate setting. The robustness properties of depth functions, including breakdown and gross error sensitivities, are well studied and favourable [Romanazzi, 2001, Chen and Tyler, 2002, Zuo, 2004, Dang et al., 2009], making them a promising direction of study for use in the private setting. We take some of the first steps in privatizing depth-based inference. The contributions are as follows:

• We present several approaches for the privatization of sample depth functions, including a discussion of the advantages and disadvantages of each approach.

• We present algorithms for the private release of sample depth values of several popular depth functions. These include halfspace depth [Tukey, 1974], simplicial depth [Liu, 1990], IRW depth [Ramsay et al., 2019] and projection depth [Zuo, 2003]. Our algorithms and analysis can also be applied to depth functions with similar characteristics. We present asymptotic results concerning these private depth value estimates, showing that pointwise, private depth values can be consistently estimated.

• We present algorithms for generating consistent, private depth-based medians, using the exponential mechanism and the propose-test-release framework of Dwork and Lei [2009] and Brunel and Avella-Medina [2020].
• We extend the propose-test-release algorithm of Brunel and Avella-Medina [2020] to be used with the exponential mechanism. We present a general algorithm for releasing a private maximizer of an objective function (or minimizer of a cost function) which may have infinite global sensitivity. The objective function should be such that observing high local sensitivity is unlikely.

• We present a lemma that can be used to prove weak consistency of private estimators generated from the exponential mechanism, even when the cost function is not necessarily differentiable.

Some work has been done on the private computation of halfspace depth regions and the halfspace median [Beimel et al., 2019, Gao and Sheffet, 2020] from a computational geometry point of view. Though Beimel et al. [2019] mention that the Tukey depth function can be used with the exponential mechanism, they do not study the estimator’s properties from a statistical point of view; it is used as a method of finding a point in the convex hull of a set of points. To the best of our knowledge, no one has attempted to privatize other depth functions.
Differential Privacy

In this section we introduce the fundamentals of differential privacy. Two essential concepts are that of a mechanism and that of adjacent databases. In order for a statistic (or database) to be differentially private, it must be stochastically computed [Dwork and Roth, 2014]. This differs from typical data analysis in that all differentially private statistics are generated from a distribution T̃(Xₙ) ∼ Q_Xₙ rather than being deterministically computed. In other words, all differentially private statistics T̃(Xₙ) admit measures Q_Xₙ given the data Xₙ. We assume here that the data is a random sample of size n such that each observation is in ℝᵈ; we denote the sample by Xₙ = {X₁, …, Xₙ}. We call the procedure that determines Q_Xₙ and then outputs T̃(Xₙ) ∼ Q_Xₙ a mechanism. With an abuse of notation, we may also refer to the mechanism by T̃.

Along with mechanisms, we must also define adjacent databases. We say that Xₙ and Yₙ (another random sample of size n) are adjacent if they differ by one observation. In other words, Xₙ and Yₙ are adjacent if they have symmetric difference equal to one. Equipped with these concepts, we can define differential privacy:

Definition 1.
A mechanism T̃ is ε-differentially private for ε > 0 if

Q_Xₙ(B) / Q_Yₙ(B) ≤ e^{ε}    (1)

holds for all measurable sets B and all adjacent Xₙ and Yₙ.

The parameter ε should be small, implying that Q_Xₙ(B) / Q_Yₙ(B) ≈ 1, which gives the interpretation that the two measures Q_Xₙ and Q_Yₙ are almost equivalent. To understand this definition, it helps to think of the problem from the adversary’s point of view. Suppose that we are the adversary and that we have access to all the entries in the database except for one, call it θ, which we are trying to learn about. If T̃ is released, how can we use it to conduct inference about θ? In order to test H₀ : θ = θ₀ vs. H₁ : θ ≠ θ₀, we, as statisticians, would ask two questions: how likely was it to observe T̃ under H₀, and how likely was it to observe T̃ under H₁? Differential privacy stipulates that both of these questions have practically the same answer, making it impossible to infer anything about θ from T̃. Definition 1 implies that if someone in the dataset was replaced, we are just as likely to have seen T̃ (or some value very close to T̃ if Q_Xₙ is continuous). Another way to interpret the definition is to observe that differential privacy implies that KL(Q_Xₙ, Q_Yₙ) < ε, where KL is the Kullback–Leibler divergence, implying that the distributions are necessarily close. Differential privacy is a worst-case restriction, in that the inequality covers all databases and all possible outcomes of the mechanism.

Definition 1 can be difficult to satisfy because the umbrella of ‘all databases and mechanism outputs’ can include both extreme databases and extreme mechanism outputs. One may wish to relax this definition over unlikely mechanism outputs; one way to do this is to allow the bound to fail on sets B for which Q_Xₙ(B) is very small.
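To make Definition 1 concrete, the following sketch (our own illustration, not from the paper) checks the bound (1) numerically for a query with sensitivity 1 privatized with Laplace noise; the log-ratio of the two output densities never exceeds ε at any output value.

```python
import math

def laplace_logpdf(v, loc, scale):
    """Log-density of a Laplace(loc, scale) random variable at v."""
    return -math.log(2 * scale) - abs(v - loc) / scale

eps = 0.5
t_x, t_y = 12, 13        # query values on two adjacent databases; |T(Xn) - T(Yn)| <= 1
scale = 1.0 / eps        # Laplace scale calibrated to sensitivity 1

# sup over outputs v of |log Q_Xn(v) - log Q_Yn(v)| is at most eps,
# which is the bound (1) applied to the two output densities.
worst = max(
    abs(laplace_logpdf(v, t_x, scale) - laplace_logpdf(v, t_y, scale))
    for v in [k / 10 for k in range(-300, 301)]
)
assert worst <= eps + 1e-12
```

The bound is attained exactly (worst = ε) for outputs lying outside the interval between the two query values, which is why the noise scale 1/ε cannot be reduced without violating (1).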
This is called approximate differential privacy or (ε, δ)-differential privacy, in which we have

Q_Xₙ(B) ≤ e^{ε} Q_Yₙ(B) + δ    (2)

in place of condition (1). Typically, δ ≪ ε, and δ can be interpreted as the probability under which the bound is allowed to fail. To see this, observe that for B such that Q_Xₙ(B) < δ, (2) holds regardless of ε. For the remainder of the paper, ε and δ are always assumed to be positive; sometimes the privacy parameters are a function of the sample size, which we indicate with a subscript n: εₙ, δₙ.

Central to many private algorithms is the concept of sensitivity. Consider some function T : (ℝᵈ)ⁿ → ℝᵏ, where (ℝᵈ)ⁿ denotes the sample space. Usually T represents a statistic or a data-driven objective function. Sensitivity measures how sensitive T is to exchanging one sample point for another. Two important types of sensitivity are local sensitivity and global sensitivity, defined as

LS(T; Xₙ) = sup_{Yₙ adjacent to Xₙ} ‖T(Xₙ) − T(Yₙ)‖  and  GS(T) = sup_{adjacent Xₙ, Yₙ} ‖T(Xₙ) − T(Yₙ)‖.

In some cases it is necessary to use different norms, and so we add a subscript, GS_p, to indicate global sensitivity computed with respect to the p-norm.

We can now introduce some important building blocks of differentially private algorithms. Let W₁, …, Wₖ, … and Z₁, …, Zₖ, … represent a sequence of independent, standard Laplace random variables and a sequence of independent, standard Gaussian random variables, respectively. The Laplace and Gaussian mechanisms are essential differentially private mechanisms; they define how much an estimator must be perturbed in order for it to be differentially private.

Mechanism 1 (Dwork et al. [2006]). Given a statistic T : (ℝᵈ)ⁿ → ℝᵏ, the mechanism that outputs

T̃(Xₙ) = T(Xₙ) + (W₁, …, Wₖ) GS₁(T)/ε

is ε-differentially private.

Mechanism 2 (Dwork et al. [2006], Dwork and Roth [2014]). Given a statistic T : (ℝᵈ)ⁿ → ℝᵏ, the mechanism that outputs

T̃(Xₙ) = T(Xₙ) + (Z₁, …, Zₖ) √(2 log(1.25/δ)) GS₂(T)/ε

is (ε, δ)-differentially private.

This can be improved in strict privacy scenarios [Balle and Wang, 2018]. We can also add noise based on smooth sensitivity [Nissim et al., 2007]; using smooth sensitivity allows the user to leverage improbable, worst-case local sensitivities. Often in practice, statistics are computed by maximizing a data-driven objective function φ_Xₙ(·). We can privatize such a procedure via the exponential mechanism, which can be defined as follows:

Mechanism 3 (McSherry and Talwar [2007]). Given the data, consider a function φ_Xₙ : ℝᵈ → ℝ and define the global sensitivity of such a function as GS(φ) = sup_{adjacent Xₙ, Yₙ} ‖φ_Xₙ − φ_Yₙ‖_∞. Then a random draw from the density f(·; φ_Xₙ, ε) that satisfies

f(x; φ_Xₙ, ε) ∝ exp( ε φ_Xₙ(x) / (2 GS(φ)) )

is an ε-differentially private mechanism. It is assumed that ∫_{ℝᵈ} exp( ε φ_Xₙ(x) / (2 GS(φ)) ) dx < ∞.

The factor of 2 can be removed if the normalizing term is independent of the sample. All of the mechanisms discussed so far require that the statistic has finite global sensitivity. This is a somewhat strict requirement; under the normal model, neither the sample mean nor the sample median has finite global sensitivity. The sample median does, however, have low local sensitivity, viz.

LS(Med(Xₙ)) ≤ |F⁻ₙ(1/2 + 1/n) − F⁻ₙ(1/2 − 1/n)|,

where F⁻ refers to the left-continuous quantile function for a distribution F and Fₙ is the empirical distribution of a univariate sample Xₙ. Since 1/n → 0, we expect this value to be small (assuming the sample comes from a distribution which is continuous at its median). Throughout the paper we define the median of a continuous distribution by Med(F) = F⁻(1/2); Med(Xₙ) and Med(Fₙ) are taken to be the usual sample median.

The propose-test-release mechanism, or PTR, can be used to generate private versions of statistics with infinite global sensitivity but highly probable, low local sensitivity. The propose-test-release idea was introduced by Dwork and Lei [2009] and was greatly expanded in the recent paper by Brunel and Avella-Medina [2020]. The PTR algorithm of Brunel and Avella-Medina [2020] relies on the truncated breakdown point A_η, which is the minimum number of points that must be changed in order to move an estimator by η:

A_η(T; Xₙ) = min{ k : sup_{Yₙ ∈ D(Xₙ, k)} |T(Xₙ) − T(Yₙ)| > η },    (3)

where D(Xₙ, k) is the set of all samples that differ from Xₙ by k observations. Unlike the traditional breakdown point, the dependence of A_η(T; Xₙ) on Xₙ is important. PTR works by proposing a statistic, testing if it is insensitive, and then releasing it if it is, in fact, insensitive. A private version of A_η(T; Xₙ) is used to check the sensitivity.

Mechanism 4.
Given a statistic T : (ℝᵈ)ⁿ → ℝᵏ, the mechanism that outputs

T̃(Xₙ) = ⊥ if A_η(T; Xₙ) + W/ε ≤ log(2/δ)/ε, and T̃(Xₙ) = T(Xₙ) + ηW/ε otherwise,

is (2ε, δ)-differentially private, and the statistic

T̃(Xₙ) = ⊥ if A_η(T; Xₙ) + √(2 log(1.25/δ)) Z/ε ≤ 2 log(1.25/δ)/ε, and T̃(Xₙ) = T(Xₙ) + η √(2 log(1.25/δ)) Z/ε otherwise,

is (ε, e^{ε}δ + δ)-differentially private.

The release of ⊥ means that the dataset was too sensitive for the statistic to be released. The goal is to choose a T such that releasing ⊥ is incredibly unlikely; A_η(T; Xₙ) should be large with high probability.

All of the mechanisms discussed thus far can be combined to produce more sophisticated algorithms; combining two or more mechanisms is called composition. One type of composition is computing a function of a differentially private statistic, where the function is defined independently of the data. Such statistics are also differentially private. It is also true that sums and products of k differentially private procedures, each with privacy budget εᵢ, are (Σᵢ₌₁ᵏ εᵢ)-differentially private [Dwork et al., 2006]. This can be improved with advanced composition [Dwork and Roth, 2014].

Theorem 1 (Dwork and Roth [2014]). For given 0 < ε < 1 and δ′ > 0, the composition of k mechanisms which are each (ε/(2√(2k log(1/δ′))), δ)-differentially private is (ε, kδ + δ′)-differentially private.

Data Depth

A data depth function is a robust, nonparametric tool used for a variety of inference procedures in multivariate spaces, as well as in more general spaces. A data depth function gives meaning to centrality, order and outlyingness in spaces beyond ℝ. Data depth functions do this by giving all points in the working space a rating based on how central the point is in the sample.
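Returning briefly to Mechanism 4, the propose-test-release pattern can be sketched for the univariate sample median. This is a minimal, assumption-laden illustration, not the paper's implementation: the order-statistic computation of A_η, the function names, and the use of `None` for ⊥ are all our own simplifications.

```python
import math
import random

def a_eta_median(xs, eta):
    """Illustrative truncated breakdown point A_eta for the sample median:
    the smallest number of changed points that can move it by more than eta.
    Moving the k smallest (largest) points to +inf (-inf) shifts the median
    to the order statistic k places above (below) it."""
    xs = sorted(xs)
    n = len(xs)
    m = (n - 1) // 2                       # index of the lower sample median
    for k in range(1, n + 1):
        hi = xs[m + k] if m + k < n else float("inf")
        lo = xs[m - k] if m - k >= 0 else float("-inf")
        if hi - xs[m] > eta or xs[m] - lo > eta:
            return k
    return n

def ptr_median(xs, eta, eps, delta, rng=random):
    """Propose-test-release: release a noisy median, or None (standing in
    for the symbol ⊥) when the privatized A_eta fails the threshold test."""
    lap = lambda: rng.expovariate(1.0) - rng.expovariate(1.0)  # standard Laplace draw
    a_hat = a_eta_median(xs, eta) + lap() / eps
    if a_hat <= math.log(2 / delta) / eps:
        return None                        # dataset too sensitive: release ⊥
    return sorted(xs)[(len(xs) - 1) // 2] + (eta / eps) * lap()
```

For concentrated data, A_η is of order n, so the test passes with overwhelming probability and the released value is the median plus Laplace(η/ε) noise.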
Precisely, we can write multivariate depth functions as D : ℝᵈ × Fₙ → ℝ⁺; given the empirical distribution of a sample Fₙ and a point in the domain, the depth function assigns a real-valued depth to that point. Figure 1(a) shows a sample of 20 points labelled by their depth values; we can see that the points in the centre of the data cloud have larger values. Note that it is not necessary to restrict the domain of the depth function to points in the sample; we can compute depth values for each point in the sample space. The heatmap in Figure 1(a) gives the depth value for each point in the plot. Writing depth functions as functions of the empirical distribution D(·; Fₙ) rather than functions of the sample provides a natural definition for the population depth function D(·; F). Figure 1(b) shows the population depth values when F is the two-dimensional standard normal distribution.

Figure 1: (a) Sample halfspace depth values, i.e., D(Xᵢ; Fₙ), are displayed in white text. The heatmap of the sample depth function, i.e., D(·; Fₙ), is also displayed. This sample is drawn from a standard, two-dimensional normal distribution. (b) Theoretical halfspace depth contours for the standard, two-dimensional normal distribution.

Depth functions provide an immediate definition of order statistics; observations can be ordered by their depth values. However, since the ordering of the sample is centre-outward, the depth-based order statistics have a different interpretation than univariate order statistics. Nevertheless, data depth-based order can be used to define multivariate analogues of many univariate, nonparametric inference procedures. For example, the depth-based median is defined as

Med(F; D) = argmax_{x ∈ ℝᵈ} D(x; F).

Depth-based medians are generally robust, in the sense that they are not affected by outliers.
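Sample depth values like those in Figure 1(a) can be approximated with a short sketch (our own illustration, not the paper's code). It estimates halfspace depth in ℝ² by taking the smallest projected rank over a finite set of random directions, standing in for the infimum over all directions.

```python
import math
import random

def halfspace_depth(x, sample, n_dirs=500, rng=random):
    """Approximate HD(x; F_n) in R^2: over many random unit directions u,
    record the fraction of sample points whose projection onto u lies at
    or below the projection of x, and return the smallest such fraction."""
    n = len(sample)
    best = 1.0
    for _ in range(n_dirs):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        u = (math.cos(theta), math.sin(theta))
        proj_x = x[0] * u[0] + x[1] * u[1]
        below = sum(1 for p in sample if p[0] * u[0] + p[1] * u[1] <= proj_x)
        best = min(best, below / n)
    return best
```

Points near the centre of the data cloud receive values close to 1/2, while points far outside the cloud receive values near 0, matching the pattern in Figure 1.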
Many depth-based medians have a high breakdown point and favourable properties related to the influence function [Chen and Tyler, 2002, Zuo, 2004]. Furthermore, depth-based medians inherit any transformation invariance properties possessed by the depth function. We can subsequently define sample depth ranks as

Rᵢ = #{Xⱼ : D(Xⱼ; Fₙ) ≤ D(Xᵢ; Fₙ)},

which are the building block of various multivariate depth-based rank tests [Liu and Singh, 1993, Serfling, 2002, Chenouri et al., 2011], as well as providing a method to construct trimmed means [Zuo, 2002]. Depth values can also be used directly in testing procedures [Li and Liu, 2004]. Depth functions have also been used for visualization, including the bivariate extension of the boxplot (bagplots) and dd-plots, which allow analysts to visually compare two samples of any dimension [Liu et al., 1999, Li and Liu, 2004]. In the same vein of data exploration, we can visualise multivariate distributions through one-dimensional curves based on depth values [Liu et al., 1999]. In the past decade this depth-based inference framework has expanded to include solutions to clustering [Jörnsten, 2004, Baidari and Patil, 2019], classification [Jörnsten, 2004, Lange et al., 2014], outlier detection [Chen et al., 2009, Cárdenas-Montes, 2014], process monitoring [Liu, 1995], change-point problems [Chenouri et al., 2019] and discriminant analysis [Chakraborti and Graham, 2019]. In summary, depth functions facilitate a framework for robust, nonparametric inference in ℝᵈ. A major motivating factor for this work is that by privatizing depth functions, we consequently privatize many of the procedures in this framework.
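Given precomputed sample depth values, the ranks Rᵢ above reduce to a one-line computation; the depth values in the example are hypothetical.

```python
def depth_ranks(depths):
    """R_i = #{ j : D(X_j; F_n) <= D(X_i; F_n) }, computed from a list of
    precomputed sample depth values."""
    return [sum(1 for dj in depths if dj <= di) for di in depths]

# five sample points: the deepest point receives the largest rank, n
ranks = depth_ranks([0.2, 0.4, 0.6, 0.4, 0.2])
```

Note the centre-outward ordering: the two shallowest points tie at rank 2 and the deepest point gets rank n = 5, the reverse of how univariate order statistics rank extremes.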
This means that private depth values imply access to private procedures for nonparametrically estimating location and scale, rank tests, building classifiers and more.

In their seminal paper, Zuo and Serfling [2000] give a concrete set of mathematical properties which a multivariate depth function should satisfy in order to be considered a statistical depth function. These properties include:

1. Affine invariance: any depth-based analysis is independent of the coordinate system, particularly the scales used to measure the data.

2. Maximality at centres of symmetry: if a distribution is symmetric about a point, then surely this point should be regarded as the most central point.

3. Decreasing along rays: as one moves away from the deepest point, the depth decreases.

4. Vanishing at infinity: as a point moves toward infinity along some ray, the depth vanishes.

A depth function which satisfies these four properties is known as a statistical depth function. The last three properties are all related to centrality, whereas the first ensures there is no dependence on the measurement system. Not all popular depth functions satisfy all four of these properties, but they typically satisfy most of them. Affine invariance, as discussed previously, ensures that the function is not dependent on the coordinate system which, from a practical point of view, means that the measurement scales can be adjusted freely. Maximality at centre means that if a distribution is symmetric about some point θ, the depth is maximal at that point; think of the median coinciding with the mean in the univariate case. Decreasing along rays means that as one moves away from the deepest point along some ray, i.e., moves away from the centre, the depth decreases. This property can be replaced with upper semi-continuity. Vanishing at infinity means that as the point moves along a ray to infinity, its depth approaches 0. Note that if all four of these properties are not satisfied, it does not necessarily mean that a depth function is invalid or not useful in data analysis; it is merely a limitation to consider.

Aside from coordinate invariance and centrality, there are other properties that are desirable for a depth function to satisfy. We list the main ones here:

• Robustness:
A robust depth function implies subsequent inference will be robust, and may make it more amenable to privatization.

• Consistency/Limiting distribution:
Consistency for a population depth value and the existence of a limiting distribution are useful for developing inference procedures.

• Continuity:
It can be a building block for consistency and for optimizing the depth function.

• Computation:
In order to apply depth-based inference, it is necessary that the depth values can be computed quickly. Specifically, being able to compute or approximate the depth values in polynomial time with respect to both d and n is useful.

On top of having these properties, a depth function that is to be used in the private setting should be insensitive. In other words, the depth function should have low global sensitivity and/or highly probable, low local sensitivity.

We now introduce several depth functions and evaluate their sensitivities. The first depth function we discuss is halfspace depth [Tukey, 1974].

Definition 2 (Halfspace depth). Let S^{d−1} = {x ∈ ℝᵈ : ‖x‖ = 1} be the set of unit vectors in ℝᵈ. Define the halfspace depth HD of a point x ∈ ℝᵈ with respect to some distribution X ∼ F as

HD(x; F) = inf_{u ∈ S^{d−1}} Pr(X⊤u ≤ x⊤u).

Halfspace depth is the minimum of the projected mass above and below the projection of x, over all univariate projections. We can interpret the sample depth of some point x as the minimum normalised, univariate, centre-outward rank of x’s projections amongst the sample’s projections, over all univariate directions. Therefore, if a point is exchanged, all the ranks are shifted by at most one, and the global sensitivity of the unnormalised halfspace depth is 1. We get GS(HD) = 1/n, which leads us to conclude that this depth function is relatively insensitive. In terms of known properties, halfspace depth is a statistical depth function. Its sample depth function is also uniformly consistent [Massé, 2004]. Halfspace depth is frequently cited as being computationally complex [Serfling, 2006]; however, an algorithm for computing halfspace depth in high dimensions has recently been proposed [Zuo, 2019].

We can replace the minimum in Definition 2 with an average [Ramsay et al., 2019].

Definition 3 (Integrated rank-weighted depth). Define integrated rank-weighted depth as
IRW(x; F) = ∫_{S^{d−1}} min( Pr(X⊤u ≤ x⊤u), 1 − Pr(X⊤u < x⊤u) ) dν(u),

where ν is the uniform measure on S^{d−1}.

It immediately follows from the discussion on the sensitivity of halfspace depth that GS(IRW) = 1/n; this depth function has the interpretation of the average normalised univariate centre-outward rank over all projections. Therefore, IRW depth is also insensitive. Aside from being insensitive, IRW depth also vanishes at infinity and is maximal at points of symmetry. It is invariant under similarity transformations, which is weaker than affine invariance. It is conjectured that this function also has the decreasing-along-rays property. This depth function is also continuous, and can be approximately computed very quickly [Ramsay et al., 2019]. Its sample depths are also uniformly consistent and asymptotically normal under mild assumptions.

Another asymptotically normal depth function is simplicial depth, which was introduced by Liu [1988].

Definition 4 (Simplicial depth). Suppose that Y₁, …, Y_{d+1} are i.i.d. from F. Define simplicial depth as

SMD(x; F) = Pr( x ∈ Δ(Y₁, …, Y_{d+1}) ),

where Δ(Y₁, …, Y_{d+1}) is the simplex with vertices Y₁, …, Y_{d+1}.

We can show that sample simplicial depth has finite global sensitivity. Note that

SMD(x; Fₙ) = (n choose d+1)⁻¹ Σ_{1 ≤ i₁ < ⋯ < i_{d+1} ≤ n} 1{ x ∈ Δ(X_{i₁}, …, X_{i_{d+1}}) };

exchanging one observation changes at most (n−1 choose d) of the (n choose d+1) indicator terms, so GS(SMD) ≤ (d+1)/n. Simplicial depth is maximal at the centre when F is angularly symmetric, but fails to satisfy maximality at centre and decreasing along rays for some discrete distributions.

The investigation by Zuo and Serfling [2000] led to the study of a general and powerful statistical depth function based on outlyingness functions. Outlyingness functions O(·; F) : ℝᵈ → ℝ⁺ measure the degree of outlyingness of a point. A particular version of depth based on outlyingness is projection depth.

Definition 5 (Projection Depth).
Given a univariate location measure µ which is translation and scale equivariant, and a univariate measure of scale ς which is scale equivariant and translation invariant, we can define projected outlyingness as

O(x; F; µ, ς) = sup_{u ∈ S^{d−1}} |u⊤x − µ(F_u)| / ς(F_u),

and thus projection depth as

PD(x; F; µ, ς) = 1 / (1 + O(x; F; µ, ς)).

Typically, µ and ς refer to the median and the median absolute deviation, but the properties of projection depth have been investigated for general µ and ς. One idea is to design µ and ς such that O(x; Fₙ) has low global sensitivity, but that is left to later work. Here, we will use either

O₁(x; Fₙ) := O(x; Fₙ; Med, MAD) = sup_{‖u‖=1} |u⊤x − Med(Xₙ⊤u)| / MAD(Xₙ⊤u)

or

O₂(x; Fₙ) := O(x; Fₙ; Med, IQR) = sup_{‖u‖=1} |u⊤x − Med(Xₙ⊤u)| / IQR(Xₙ⊤u).

The global sensitivities of O₁ and O₂ are unbounded, implying that the global sensitivity of PD is equal to 1, seeing as the range of projection depth is [0, 1]. However, O₁ and O₂ have bounded local sensitivities, making projection depth a good candidate for the propose-test-release procedure. Note that we use a slight abuse of notation, where Xₙ⊤u refers to the sample {X₁⊤u, …, Xₙ⊤u}. We may also refer to the empirical distribution implied by this sample as F_{n,u}. A thorough investigation of the properties of projection depth was done in the successive papers Zuo [2003, 2004]. As a result of these papers, it has been shown that projection depth is a statistical depth function; it also has a limiting distribution and is quite robust against outliers.

Private Data Depth
There are several ways in which we could approach privatizing depth functions. A natural and easy way to do this is to start with a differentially private estimate of the distribution of the data, F̃ₙ, and use D(x, F̃ₙ), which is differentially private. Computing F̃ₙ relies on existing methods for generating private multidimensional empirical distribution functions. This method fails to take advantage of any robustness properties of depth functions; it does not leverage the low sensitivities of the depth function itself. This method also does not give a method for computing the sample depth values D(Xᵢ, Fₙ), since D(Xᵢ, F̃ₙ) is not private. Computing the sample depth values is often included in depth-based inference [see, e.g., Li and Liu, 2004, Lange et al., 2014]. In this paper, we aim to study the advantages of the robustness properties of depth functions in the private setting, and so we forgo study of D(x, F̃ₙ).

If the global sensitivity of D is finite, then an obvious private estimate is

D̃(x; Fₙ) = D(x; Fₙ) + V_{δ,ε} GS(D),

where V_{δ,ε} is independent noise from the Laplace or Gaussian distribution, with scale calibrated to ensure privacy. If a depth function has infinite or large global sensitivity, then, since it is likely robust, it makes sense to apply a propose-test-release algorithm.

We can also produce a more direct privatized estimate of D(·, F) based on the sample, as has been done with histogram bins [Wasserman and Zhou, 2010]. For example, many depth functions are defined based on functions of projections: h(·; Xₙ⊤u), u ∈ Uₙ, where Uₙ is some set of directions, i.e., Uₙ ⊂ S^{d−1}. We could then produce private versions of Xₙ⊤u, or private versions of h(·; Xₙ⊤u) if h is insensitive.
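For a depth function with GS(D) = C/n (C = 1 for halfspace and IRW depth), the additive-noise estimate D̃(x; Fₙ) above can be sketched in a few lines; the function name and interface are our own illustration, not the paper's code.

```python
import random

def private_depth_value(depth_value, n, eps, C=1.0, rng=random):
    """Laplace-mechanism release of a single depth value, assuming the
    depth has global sensitivity GS(D) = C/n."""
    w = rng.expovariate(1.0) - rng.expovariate(1.0)   # standard Laplace draw
    return depth_value + w * C / (n * eps)
```

Because the noise scale is GS(D)/ε = C/(nε), the perturbation shrinks as the sample grows, which is the source of the asymptotic "privacy for free" behaviour discussed in this paper.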
The advantage of this approach would be that the entire depth function could be privatized at once, including the sample depth values. In the same vein, recalling that ν is the uniform measure on S^{d−1}, there exists an image measure, µ_{x,Fₙ}(A) = ν(h⁻¹(A; x, Fₙ)), on the Borel sets of the range of the depth, i.e., A ∈ B(I_D). If µ_{x,Fₙ} is insensitive, then we can construct a differentially private estimator based on random draws from µ_{x,Fₙ}. This approach is somewhat complicated, and could be tedious since we must set up a sampler for each x at which we want to compute a depth value. We leave these projection-type approaches for future research.

From the discussion above, it is clear that a key question is: at which points would we like to estimate depth values? Algorithms which estimate the depth of a single point are of course of interest; they can be composed to compute depth values at several points privately. Additionally, simple algorithms to compute the depth of a single point can be used as building blocks for private versions of depth-based inference procedures. As mentioned previously, it is also of interest to compute the depth values of the sample points:

D̂(Fₙ) := (D(X₁; Fₙ), D(X₂; Fₙ), …, D(Xₙ; Fₙ)).

Since Xᵢ appears in both arguments, the sensitivity of D(Xᵢ; Fₙ) is larger than that of D(x; Fₙ) for x ∉ Xₙ. We investigate private methods of estimating the vector of sample depth values. A further question is whether or not we can estimate several depth values from different samples simultaneously, e.g., for use in depth-based clustering. To elaborate, if Xₙ contains the samples for J groups, then Xₙ = X¹ₙ ∪ ⋯ ∪ Xᴶₙ.
For example, if we privatize the one-dimensional projections of the entire sample, $\mathcal X_n^\top u$, we can then compute the depth of each point in $\mathcal X_n$ with respect to each group $\mathcal X_{jn}$.

An important question is how well the privatized inference procedures perform when compared to their non-private counterparts. Do the privatized depth values converge to their non-private counterparts? If so, what is the rate of convergence? Does the private estimate have a limiting distribution? If so, is the limiting distribution different from the non-private limiting distribution? We investigate some of these questions in the next section.

As mentioned previously, for depth functions with finite global sensitivity, we can make use of the Gaussian and Laplace mechanisms.
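Before stating the mechanism formally, the noise-addition approach can be sketched in Python for the simplest case: univariate halfspace depth at a fixed out-of-sample point $x$, where changing one of the $n$ observations moves the depth value by at most $1/n$, so that $\mathrm{GS}(\mathrm D) = 1/n$. The helper names and the choice of depth function are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def halfspace_depth_1d(x, sample):
    """Univariate halfspace (Tukey) depth of a fixed point x."""
    n = len(sample)
    return min(np.sum(sample <= x), np.sum(sample >= x)) / n

def private_depth(x, sample, eps, delta=None):
    """Noise-added depth value: Laplace gives eps-DP, Gaussian (eps, delta)-DP.

    Changing one observation moves the depth at a fixed x by at most 1/n,
    i.e. GS(D) = 1/n, which is the C(D)/n form used in the text.
    """
    gs = 1.0 / len(sample)
    d = halfspace_depth_1d(x, sample)
    if delta is None:
        return d + rng.laplace(scale=gs / eps)
    return d + rng.normal(scale=gs * np.sqrt(2 * np.log(1.25 / delta)) / eps)

sample = rng.normal(size=2000)
d_priv = private_depth(0.0, sample, eps=1.0)  # noise scale 1/(n * eps) = 5e-4
```

Because the noise scale is $1/(n\epsilon)$, the perturbation vanishes relative to the sampling error of the depth value itself as $n \to \infty$, which is the sense in which privacy can be free at out-of-sample points.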
Mechanism 5.
For $x$ given independently of the data, the following estimators,
\[ \tilde{\mathrm D}_1(x; F_n) = \mathrm D(x; F_n) + W\,\mathrm{GS}_1(\mathrm D)/\epsilon \quad\text{and}\quad \tilde{\mathrm D}_2(x; F_n) = \mathrm D(x; F_n) + Z\,\mathrm{GS}_2(\mathrm D)\sqrt{2\log(1.25/\delta)}/\epsilon, \]
are $\epsilon$-differentially private and $(\epsilon,\delta)$-differentially private, respectively.

The fact that these mechanisms are differentially private follows from the differential privacy of Mechanisms 1 and 2. The following results are immediate.

Theorem 2. For a given depth function, suppose that $\sqrt n\,(\tilde{\mathrm D}_\ell(x; F_n) - \mathrm D(x; F)) \stackrel{d}{\to} V_{\mathrm D}(x)$, where $\stackrel{d}{\to}$ denotes convergence in distribution. Suppose we can write $\mathrm{GS}(\mathrm D) = C(\mathrm D)/n$, where $C(\mathrm D)$ does not depend on $n$. Let $r > 0$ and $\ell = 1$ or $\ell = 2$. For depth values generated under Mechanism 5, the following holds:
1. For $\delta_n = o(n^{-k})$ and $\epsilon_n \propto n^{-r}$ with $r < 1$, $\tilde{\mathrm D}_\ell(x; F_n) \stackrel{p}{\to} \mathrm D(x; F)$, where $\stackrel{p}{\to}$ denotes convergence in probability.
2. For $\delta_n = o(n^{-k})$ and $\epsilon_n \propto n^{-1/2+r}$, $\sqrt n\,(\tilde{\mathrm D}_\ell(x; F_n) - \mathrm D(x; F)) \stackrel{d}{\to} V_{\mathrm D}(x)$.

It should be noted that choosing $\delta_n = o(n^{-k})$ and $\epsilon_n \propto n^{-1/2+r}$ maintains a reasonable level of privacy. For example, choosing $\epsilon \in O(1)$ and $\delta < 1/n$ is "the most-permissive setting under which $(\epsilon, \delta)$-differential privacy is a nontrivial guarantee" [Cai et al., 2019]. From Theorem 2 we can conclude that for large samples and small privacy parameters, depth value estimates generated via Mechanism 5 are minimally affected by privatization.

What if we want to calculate depth at a sample point? How can we estimate the vector of depth values at the sample points,
\[ \hat{\boldsymbol{\mathrm D}}(F_n) = \left(\mathrm D(X_1; F_n),\ \mathrm D(X_2; F_n),\ \ldots,\ \mathrm D(X_n; F_n)\right), \]
privately? The sample values now appear in both arguments of $\mathrm D$, so we must do a bit more work to compute sensitivities. First we look at halfspace and IRW depth.
Consider one set of projections $\mathcal X_n^\top u$ and their corresponding empirical distribution $F_{n,u}$. We want to compute the sensitivity of the vector $R_{n,u} = (R_{1,u}, \ldots, R_{n,u})$, with
\[ R_{i,u} = \min\left\{ F_{n,u}(X_i^\top u),\ 1 - F_{n,u}(X_i^\top u\,-) \right\}, \qquad F(x-) = P(X < x). \]
If we change one observation, at most $\lfloor (n+1)/2\rfloor - 1$ of the remaining ranks change, each by at most $1/n$, while the rank of the changed observation itself can change by as much as $(\lfloor (n+1)/2\rfloor - 1)/n$. This gives that
\[ \mathrm{GS}_1(R_{n,u}) = \frac{2}{n}\left(\left\lfloor \frac{n+1}{2}\right\rfloor - 1\right) \approx 1, \]
and that
\[ \mathrm{GS}_2(R_{n,u}) = \frac{1}{n}\sqrt{\left(\left\lfloor \frac{n+1}{2}\right\rfloor - 1\right)^2 + \left\lfloor \frac{n+1}{2}\right\rfloor - 1} \approx \frac{1}{2}. \]
Since averaging or taking the supremum over such $R_{n,u}$ does not affect these sensitivities, it follows that for halfspace depth and IRW depth, $\mathrm{GS}_\ell(\boldsymbol{\mathrm D}) = \mathrm{GS}_\ell(R_{n,u})$. Concerning simplicial depth, with respect to some adjacent dataset, the depth values of the unchanged points can each change by at most $(d+1)/n$. For the point that is different, we can bound the sensitivity above by $1 - (d+1)/n$. It follows that
\[ \mathrm{GS}_2(\boldsymbol{SMD}) \le \sqrt{\left(1 - \frac{d+1}{n}\right)^2 + (n-1)\left(\frac{d+1}{n}\right)^2} \approx 1, \]
where $\boldsymbol{SMD}$ is the vector $\boldsymbol{\mathrm D}$ with $\mathrm D = \mathrm{SMD}$. In summary, the global sensitivities of the vector of sample depth values for halfspace, IRW and simplicial depth are all close to 1. Considering that we pay $\epsilon/n$ of the privacy budget for each depth value in Mechanism 5, we pay the same privacy budget for $n$ depth values at $n$ arbitrary points as we do for the depth values at the $n$ sample points. In other words, we do not use any extra privacy budget for the fact that we are computing the depth at the sample values. We can then use the following mechanism to estimate the vector of depth values:

Mechanism 6.
The following estimators for the vector of depth values of the sample points,
\[ \tilde{\boldsymbol{\mathrm D}}_1(F_n) = \boldsymbol{\mathrm D}(F_n) + (W_1,\ldots,W_n)\,\mathrm{GS}_1(\boldsymbol{\mathrm D})/\epsilon \quad\text{and}\quad \tilde{\boldsymbol{\mathrm D}}_2(F_n) = \boldsymbol{\mathrm D}(F_n) + (Z_1,\ldots,Z_n)\,\mathrm{GS}_2(\boldsymbol{\mathrm D})\sqrt{2\log(1.25/\delta)}/\epsilon, \]
are $\epsilon$-differentially private and $(\epsilon,\delta)$-differentially private, respectively.

The fact that these mechanisms are differentially private follows from the differential privacy of Mechanisms 1 and 2. For the full vector of sample depth values, we do not get privacy for free in the limit. Observe that
\[ \left\| \tilde{\boldsymbol{\mathrm D}}_1(F_n) - \boldsymbol{\mathrm D}(F)\right\| \le \left\|\tilde{\boldsymbol{\mathrm D}}_1(F_n) - \boldsymbol{\mathrm D}(F_n)\right\| + \left\|\boldsymbol{\mathrm D}(F_n) - \boldsymbol{\mathrm D}(F)\right\| = \left\|(W_1,\ldots,W_n)\right\|\,\mathrm{GS}_1(\boldsymbol{\mathrm D})/\epsilon + O_p(n^{1/2}) = O_p(n^{1/2}). \qquad (4) \]
The level of noise is of the same order as the sampling error, and therefore must be accounted for in inference procedures. For $\tilde{\boldsymbol{\mathrm D}}_2$ with $\delta \propto n^{-k}$, we have that
\[ \left\|\tilde{\boldsymbol{\mathrm D}}_2(F_n) - \boldsymbol{\mathrm D}(F)\right\| \le O_p(n^{1/2}\log^{1/2} n); \]
the level of noise introduced by the privatization is larger than that of the sampling error.

We now turn our attention to a depth function with high global sensitivity: projection depth. For projection depth, we would like to generate private outlyingness values, which have unbounded sensitivity. Note that Med, MAD and IQR are all robust statistics, in the sense that they are not perturbed by extreme data points. This implies that $\mathrm O_1$ and $\mathrm O_2$ have an unlikely chance of worst-case sensitivity, which makes projection depth a good candidate for the propose-test-release framework [Dwork and Lei, 2009, Brunel and Avella-Medina, 2020]. Suppose that $\mathrm{IQR}(\mathcal X_n^\top u) \approx 1$ for all $u$. If
\[ \eta \gtrsim \max\left( F_{n,u}^{-1}(1/2) - F_{n,u}^{-1}(1/2 - 1/n),\ F_{n,u}^{-1}(1/2 + 1/n) - F_{n,u}^{-1}(1/2) \right) \]
for all $u$, then $A_\eta \approx \lfloor n/4\rfloor - 1$, which means it is very unlikely that Mechanism 4 will return $\perp$.

Mechanism 7.
For $\ell = 1$ or $\ell = 2$, define privatized projection depth as
\[ \widetilde{\mathrm{PD}}_\ell(x; F_n) = \frac{1}{1 + \tilde{\mathrm O}_\ell(x; F_n)}, \quad\text{where}\quad \tilde{\mathrm O}_\ell(x; F_n) = \begin{cases} \perp & \text{if } A_\eta\left(\mathrm O_\ell(x; F_n); \mathcal X_n\right) + \frac{a_\delta}{\epsilon}V_1 \le \frac{b_\delta}{\epsilon} \\[4pt] \mathrm O_\ell(x; F_n) + \frac{\eta\, a_\delta}{\epsilon}V_2 & \text{otherwise}, \end{cases} \]
where $a_\delta$, $b_\delta$, the level of privacy and the $V_j$ are according to Mechanism 4.

We can actually show that this algorithm is consistent for the population depth values when using $\mathrm O_1$ as the outlyingness measure.

Theorem 3.
Let $\xi_{p,u}$ be the $p$th quantile of $F_u$. Suppose that for all $h > 0$,
\[ \sup_u \left| F_u(\xi_{p,u} + h) - F_u(\xi_{p,u}) \right| = M|h|^q\left(1 + O(|h|^{q/2})\right), \]
with $M > 0$, $q > 0$, for $p = 1/4, 1/2, 3/4$. Suppose that $\sup_u \xi_{p,u} < \infty$ for $p = 1/4, 3/4$. For $\eta \propto \frac{\log n}{n^{1/2-r}}$ with $r > 0$, $\delta_n = O(n^{-k})$ and
\[ \frac{n^{1/2}}{(\log\log n)^{1/2}} \cdot \frac{\epsilon_n}{\log(2/\delta_n)} \to \infty, \]
we have that $|\widetilde{\mathrm{PD}}_1(x; F_n) - \mathrm{PD}(x; F; \mathrm{Med}, \mathrm{IQR})| \stackrel p\to 0$.

Theorem 3 shows we can choose both $\epsilon_n$ and $\eta_n$ decreasing in $n$ and still maintain a consistent estimator. In fact, $\eta_n$ can be chosen to be quite small relative to the size of the sample. In the Laplace case, recall that the scale parameter is proportional to $\eta/\epsilon$ once the statistic is released; given the statistic is released, Mechanism 7 therefore allows a smaller amount of noise to be added to the depth value than Mechanism 5 does. However, this is paid for with an approximate differential privacy budget of $2\epsilon$, rather than a budget of just $\epsilon$. The difference in noise is on the order of $(\log\log n)^{-1/2} n^{-r'}$, $r' > 0$; in smaller samples the gain would be negligible.

In terms of computing this estimator, the difficulty lies in computing $A_\eta$ for a given dataset. This is non-trivial for projection depth, as the ratio of estimators makes the computation difficult. We can approximate the depth: instead of computing
\[ \mathrm O_\ell = \sup_{u\in S^{d-1}} \frac{|x^\top u - \mathrm{Med}(\mathcal X_n^\top u)|}{\varsigma_\ell(\mathcal X_n^\top u)}, \]
we can compute
\[ \hat{\mathrm O}_\ell = \max_{u\in \{U_1,\ldots,U_m\}} \frac{|x^\top u - \mathrm{Med}(\mathcal X_n^\top u)|}{\varsigma_\ell(\mathcal X_n^\top u)}, \]
where the $U_j$ are sampled uniformly from $S^{d-1}$. Then, we can compute the truncated breakdown point of each
\[ \hat{\mathrm O}^{U_j}_\ell(x) = \frac{|x^\top U_j - \mathrm{Med}(\mathcal X_n^\top U_j)|}{\varsigma_\ell(\mathcal X_n^\top U_j)} \]
to construct an approximation of the breakdown $A_\eta$. It may also be possible to compute this estimator exactly using techniques from computational geometry, e.g., [Liu and Zuo, 2014].

We can present an algorithm to check if $A_\eta\left(\hat{\mathrm O}^{U_j}(x); \mathcal X_n\right) \le k^*$, where $k^* = 1 + \frac{b_\delta}{\epsilon} - \frac{a_\delta}{\epsilon}V$. Suppose that $\mathcal Y_n \in \mathcal D(\mathcal X_n, k^*)$. We will actually check if $A_\eta(\mathrm O^u, \mathcal X_n) > k^*$. To this end, note that
\[ |x^\top u - \mathrm{Med}(\mathcal Y_n^\top u)| \le \max\left( |x^\top u - F_{n,u}^{-1}(1/2 + k^*/n)|,\ |x^\top u - F_{n,u}^{-1}(1/2 - k^*/n)| \right) := \mathrm{up}(\mathrm{Med}, u), \]
and that
\[ |x^\top u - \mathrm{Med}(\mathcal Y_n^\top u)| \ge \min\left( |x^\top u - F_{n,u}^{-1}(1/2 + k^*/n)|,\ |x^\top u - F_{n,u}^{-1}(1/2 - k^*/n)|,\ |x^\top u - m_1(u)|,\ |x^\top u - m_2(u)| \right) := \mathrm{lo}(\mathrm{Med}, u), \]
with $m_1(u)$ being the median of a dataset the same as $\mathcal X_n^\top u$, except that the smallest $k^*$ observations of $\mathcal X_n^\top u$ are replaced with $x^\top u$, and $m_2(u)$ being the same as $m_1(u)$, except that instead the largest $k^*$ observations of $\mathcal X_n^\top u$ are replaced. Define
\[ B = \left\{ F_{n,u}^{-1}(3/4 + k_1/n) - F_{n,u}^{-1}(1/4 + k_2/n) : -k^* \le k_1, k_2 \le k^*,\ |k_1| + |k_2| = k^* \right\}. \]
We can then write
\[ \mathrm{lo}(\mathrm{IQR}, u) := \min B \le \mathrm{IQR}(\mathcal Y_n^\top u) \le \max B := \mathrm{up}(\mathrm{IQR}, u). \]
Therefore, it holds that
\[ \hat{\mathrm O}^u_\ell(x) \in \left[ \frac{\mathrm{lo}(\mathrm{Med},u)}{\mathrm{up}(\mathrm{IQR},u)},\ \frac{\mathrm{up}(\mathrm{Med},u)}{\mathrm{lo}(\mathrm{IQR},u)} \right] = \left[ \mathrm{lo}\left(\hat{\mathrm O}^u_\ell(x)\right),\ \mathrm{up}\left(\hat{\mathrm O}^u_\ell(x)\right) \right], \]
and we can check if
\[ \max\left( \hat{\mathrm O}^u_\ell(x) - \mathrm{lo}\left(\hat{\mathrm O}^u_\ell(x)\right),\ \mathrm{up}\left(\hat{\mathrm O}^u_\ell(x)\right) - \hat{\mathrm O}^u_\ell(x) \right) < \eta. \]
If this holds for all $u$, we must have that $A_\eta\left(\hat{\mathrm O}(x; F_n), \mathcal X_n\right) \ge k^*$, which gives a lower bound on the truncated breakdown point. This lower bound can be used in the 'test' portion of the algorithm when implementing Mechanism 7.

The methods used to construct private depth values discussed in this section can be used to privatize inference procedures based solely on functions of depth values. For example, a common way to compare scale between two multivariate samples, say $\mathcal X_{n_1}$ and $\mathcal Y_{n_2}$, is to compute the sample depth values with respect to the empirical distribution of the pooled sample $\mathcal X_{n_1} \cup \mathcal Y_{n_2}$ [Li and Liu, 2004, Chenouri et al., 2011]. We can denote this empirical distribution by $G_{n_1+n_2}$. Private depth-based ranks could then be defined as
\[ \tilde R_{ji} = \#\left\{ X_{k\ell} : \tilde{\mathrm D}(X_{k\ell}; G_{n_1+n_2}) \le \tilde{\mathrm D}(X_{ji}; G_{n_1+n_2}) \right\}, \]
where $X_{ji}$ is the $i$th observation from sample $j$. We can use these ranks to privately test for a difference in scale between the two groups with the rank sum test statistic, viz.
\[ \tilde T(\mathcal X_{n_1} \cup \mathcal Y_{n_2}) = \sum_{i=1}^{n_1} \tilde R_{1i}. \]
The distribution of such a statistic remains the same under the null hypothesis, and (4) can be used to assess its performance under the alternative hypothesis. It is clear that the power will be lowered, as the noise biases the statistic toward failing to reject the null hypothesis. We can also take a similar approach in multivariate, covariance change-point models [Chenouri et al., 2019, Ramsay and Chenouri, 2020].
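The private rank-sum test above can be sketched as follows, again with univariate halfspace depth standing in for a general depth function; the helper names and the sensitivity constant `gs_vec` (close to 1 for the depth vectors discussed earlier) are placeholders, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def halfspace_depths(pooled):
    """Univariate halfspace depth of every pooled point w.r.t. the pooled sample."""
    n = len(pooled)
    below = np.searchsorted(np.sort(pooled), pooled, side="right")
    return np.minimum(below, n - below + 1) / n

def private_rank_sum(x, y, eps, gs_vec=1.0):
    """Rank-sum statistic computed from a privatized vector of pooled depths.

    One Laplace draw per coordinate with scale gs_vec/eps privatizes the
    whole vector of sample depth values at once (the Mechanism 6 idea).
    """
    pooled = np.concatenate([x, y])
    noisy = halfspace_depths(pooled) + rng.laplace(scale=gs_vec / eps, size=len(pooled))
    ranks = np.argsort(np.argsort(noisy)) + 1  # rank of each noisy depth value
    return ranks[: len(x)].sum()  # rank sum for the first sample

x = rng.normal(size=100)
y = rng.normal(scale=3.0, size=100)  # larger scale: x-points tend to be deeper
t = private_rank_sum(x, y, eps=2.0)
```

With noise of scale `gs_vec`$/\epsilon$ comparable to the depth values themselves, the power loss discussed above is visible: the noisy ranks are pulled toward their null behaviour.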
The algorithms of this section cannot be used to compute private depth-based medians, i.e., private maximizers of the depth functions, and so we investigate algorithms to compute depth-based medians in the next section.
For depth functions with finite global sensitivity, it is natural to estimate the depth-based median using the exponential mechanism. As such, we could generate an observation from
\[ f(v; F_n) \propto \exp\left( \frac{\epsilon\, \mathrm D(v; F_n)}{2\,\mathrm{GS}(\mathrm D)} \right), \]
to be used as a private estimate of the $\mathrm D$-based median. One issue is that this density is not necessarily valid. For example,
\[ f(v; F_n) = \frac{\exp\left(\frac{\epsilon\, \mathrm D(v; F_n)}{2\,\mathrm{GS}(\mathrm D)}\right)}{\int_{\mathbb R^d}\exp\left(\frac{\epsilon\, \mathrm D(v; F_n)}{2\,\mathrm{GS}(\mathrm D)}\right)dv} \]
is not a valid density, since $\int_{\mathbb R^d}\exp\left(\frac{\epsilon\, \mathrm D(v; F_n)}{2\,\mathrm{GS}(\mathrm D)}\right)dv = \infty$. To see this, note that
\[ 1 \le \exp\left(\frac{\epsilon\, \mathrm D(v; F_n)}{2\,\mathrm{GS}(\mathrm D)}\right) < \infty, \]
and so even if we transform this to $\exp\left(-\frac{\epsilon(\alpha - \mathrm{HD}(v; F_n))}{2\,\mathrm{GS}(\mathrm{HD})}\right)$, it is still bounded below for any $\alpha$. This implies that
\[ \int_{\mathbb R^d} \exp\left(\frac{\epsilon\, \mathrm D(v; F_n)}{2\,\mathrm{GS}(\mathrm D)}\right) dv = \infty. \]
Similar results follow for the remaining depth functions, since they all have a range that lies in an interval. If the data for which we would like to estimate the median lies within some compact set $B$, then we can easily reduce the range of the estimator to $B$, and the density
\[ f(v; F_n) = \frac{\exp\left(\frac{\epsilon\, \mathrm D(v; F_n)}{2\,\mathrm{GS}(\mathrm D)}\right)\mathbb 1\{v \in B\}}{\int_B \exp\left(\frac{\epsilon\, \mathrm D(v; F_n)}{2\,\mathrm{GS}(\mathrm D)}\right)dv} \]
is valid. If there is no clear set $B$ in which the median will lie, then we propose a Bayesian approach, and recommend using a prior $\pi(v)$ on the median such that
\[ f(v; F_n) = \frac{\exp\left(\frac{\epsilon\, \mathrm D(v; F_n)}{2\,\mathrm{GS}(\mathrm D)}\right)\pi(v)}{\int_{\mathbb R^d}\exp\left(\frac{\epsilon\, \mathrm D(v; F_n)}{2\,\mathrm{GS}(\mathrm D)}\right)\pi(v)\,dv} \]
is a valid density. Seeing as $\mathbb 1\{v \in B\}$ normalized by $\int_B dv$ is a special case of a prior, we can summarise this procedure as follows:

Mechanism 8.
Suppose that $\mathrm{GS}(\mathrm D) = C(\mathrm D)/n$. Suppose also that $\pi(v)$ is a density chosen independently of the data. Provided
\[ f(v; F_n) = \frac{\exp\left(\frac{n\epsilon}{2C(\mathrm D)}\mathrm D(v; F_n)\right)\pi(v)}{\int_{\mathbb R^d}\exp\left(\frac{n\epsilon}{2C(\mathrm D)}\mathrm D(v; F_n)\right)\pi(v)\,dv} \]
is a valid Lebesgue density, a random draw from $f(v; F_n)$ is an $\epsilon$-differentially private estimate of the depth-based median of $\mathcal X_n$.

It is imperative that this prior is chosen independently of the data, or the privacy of the procedure will be violated. For any depth function bounded above and below, it is easy to see that this is a valid density. Suppose that the range of $\mathrm D$ is $[0,1]$; then the following inequality holds:
\[ 1 = \int_{\mathbb R^d}\pi(v)\,dv \le \int_{\mathbb R^d}\exp\left(\frac{n\epsilon}{2C(\mathrm D)}\mathrm D(v; F_n)\right)\pi(v)\,dv \le \int_{\mathbb R^d}\exp\left(\frac{n\epsilon}{2C(\mathrm D)}\right)\pi(v)\,dv = \exp\left(\frac{n\epsilon}{2C(\mathrm D)}\right). \]
Some asymptotics of the exponential mechanism have been investigated by Awan et al. [2019], but their result requires that the cost function be twice differentiable and convex. Depth functions do not typically satisfy these requirements. The following lemma is useful for proving some asymptotic results related to the exponential mechanism, when the cost function is not necessarily differentiable, but is smooth at the limiting minimizer.
Lemma 1.
Let $\boldsymbol 0$ be the zero vector in $\mathbb R^d$ and $\pi(v)$ be a density on $\mathbb R^d$. Suppose that $\phi_n(\omega, v): \Omega\times\mathbb R^d \to \mathbb R^+$ is a sequence of random functions on the probability space $(\Omega, \mathcal A, P)$. Assume that:
1. $\lambda_n = Cn^r$ for some $C, r > 0$.
2. For some $s > r$, $\|\phi_n(\omega,\cdot) - \phi(\omega,\cdot)\|_\infty = o_p(n^{-s})$.
3. For some $\alpha > 0$, $\phi(\omega, v)$ is $\alpha$-Hölder continuous in a neighborhood around $\boldsymbol 0$, $P$-almost surely. This means that $|\phi(\omega, v) - \phi(\omega, \boldsymbol 0)| \le C\|v\|^\alpha$ for some constant $C$.
4. $\phi(\omega, v) = 0$ if and only if $v = \boldsymbol 0$; $\phi$ is uniquely minimized at $v = \boldsymbol 0$.
5. $\pi(v)$ is a bounded Lebesgue density which is positive in some neighborhood around $\boldsymbol 0$.
Let $V_n$ be a sequence of random variables whose measure on $(\mathbb R^d, \mathcal B(\mathbb R^d))$ is given by
\[ Q_n(A) = \int_\Omega \frac{\int_A e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}{\int_{\mathbb R^d} e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}\,dP, \qquad A \in \mathcal B(\mathbb R^d). \]
Then $V_n \stackrel p\to \boldsymbol 0$.

Lemma 1 may be applied outside the context of depth functions, and can be used to prove weak consistency of an estimator based on the exponential mechanism. When applying Lemma 1, the sequence $\lambda_n$ should be replaced by the ratio of the privacy parameter and the global sensitivity of the cost function. This lemma shows that smoother, insensitive cost functions will allow the estimator to be consistent for smaller privacy budgets, provided that the prior is positive in a region around the maximizer. Additionally, if $e^{-\lambda_n\phi_n(\omega,v)}$ is integrable, then we can let $\pi(v) = 1$ and the result still holds. We can apply Lemma 1 to data depth functions, which results in the following theorem.

Theorem 4.
Suppose that $\sup_v |\mathrm D(v; F_n) - \mathrm D(v; F)| = o_p(n^{-s})$ where $s > 0$, $\mathrm{GS}(\mathrm D) = C(\mathrm D)/n$, the maximum of $\mathrm D(x; F)$ occurs uniquely at $\theta$, and $\mathrm D$ is $\alpha$-Hölder continuous at $\theta$ for some $\alpha > 0$. Additionally, suppose that $\pi(v)$ is a bounded Lebesgue density which is positive in a neighborhood around $\theta$. Let $\rho := \mathrm D(\theta; F) < \infty$. Then, for $\tilde T(\mathcal X_n)$ drawn from the density
\[ f(v; F_n) = \frac{\exp\left(-\frac{n\epsilon_n(\rho - \mathrm D(v; F_n))}{2C(\mathrm D)}\right)\pi(v)}{\int_{\mathbb R^d}\exp\left(-\frac{n\epsilon_n(\rho - \mathrm D(v; F_n))}{2C(\mathrm D)}\right)\pi(v)\,dv}, \]
it holds that:
1. $\tilde T(\mathcal X_n) \stackrel p\to \theta$ when $\epsilon_n \propto n^{r'}$ with $-1 < r' < s - 1$.
2. $\tilde T(\mathcal X_n) \stackrel d\to T$ when $n\epsilon_n \to K < \infty$, with the density of $T$ proportional to $\exp\left(-\frac{K(\rho - \mathrm D(v; F))}{2C(\mathrm D)}\right)$.

Remark 1.
The continuity condition is weak, in the sense that $\alpha$ can be very small. Halfspace depth is $\alpha$-Hölder continuous if the $F_u$ are $\alpha$-Hölder continuous and $F$ is continuous. IRW depth is $\alpha$-Hölder continuous if the $F_u$ are $\alpha$-Hölder continuous. For simplicial depth, we only need $F$ to be $\alpha$-Hölder continuous.

Remark 2.
For many depth functions, we can choose $s$ arbitrarily close to $1/2$ and the convergence requirement is still satisfied. Therefore, choices of $r'$ close to $-1/2$ give the fastest rates at which the privacy parameter can decrease to 0 while maintaining consistency of the estimator.

Theorem 4 can then be applied to the three depths of [Tukey, 1974, Liu, 1988, Ramsay et al., 2019]; they all satisfy the uniform consistency requirement for any $s < 1/2$. Now suppose that $\phi_{\mathcal X_n}: \mathbb R^d \to \mathbb R^+$ is some cost function which we would like to minimize. Then, define
\[ A_\eta(\phi_{\mathcal X_n}; \mathcal X_n) = \min\left\{ k \in \mathbb N : \sup_{\mathcal Y_n \in \mathcal D(\mathcal X_n; k)}\ \sup_x \left|\phi_{\mathcal X_n}(x) - \phi_{\mathcal Y_n}(x)\right| > \eta \right\}, \]
the truncated breakdown point of the cost function. This is a direct extension of the truncated breakdown point of [Brunel and Avella-Medina, 2020] to the functional context; we are essentially using the infinity norm and writing $\|\phi_{\mathcal X_n} - \phi_{\mathcal Y_n}\|_\infty > \eta$. We can write the estimator as follows:

Mechanism 9.
Suppose that
\[ Q_{\mathcal X_n}(A) = \frac{\int_A \exp\left(-\frac{\phi_{\mathcal X_n}(v)\,\epsilon}{2\eta}\right)dv}{\int_{\mathbb R^d}\exp\left(-\frac{\phi_{\mathcal X_n}(v)\,\epsilon}{2\eta}\right)dv} \]
is a valid measure on $(\mathbb R^d, \mathcal B(\mathbb R^d))$. Then the estimator
\[ \tilde T(\mathcal X_n) = \begin{cases} \perp & \text{if } A_\eta(\phi_{\mathcal X_n}; \mathcal X_n) + \frac{a_\delta}{\epsilon}V \le \frac{b_\delta}{\epsilon} \\[4pt] \hat T(\mathcal X_n) \sim Q_{\mathcal X_n} & \text{otherwise}, \end{cases} \]
with $a_\delta$, $b_\delta$ and $V$ as in Mechanism 4, is a differentially private estimate of $\operatorname{argmin}_v \phi_{\mathcal X_n}(v)$. Under both the Laplace and the Gaussian versions, the estimator is $(2\epsilon, \delta)$-differentially private.

This shows that we can still use PTR with the exponential mechanism. The level of privacy under the Gaussian version is slightly higher than that of the original PTR mechanism, which is due to the pure differential privacy of the exponential mechanism.

Theorem 5.
Suppose that the sequence of random functions $\phi_{\mathcal X_n}(v): \mathbb R^d \to \mathbb R^+$ satisfies the conditions of Lemma 1, and that
\[ \int_{\mathbb R^d} \exp\left(-\frac{K\phi(v)}{2}\right) dv < \infty \]
for any $K > 0$. Suppose that $\tilde T(\mathcal X_n)$ is generated according to Mechanism 9. If the sequences $\epsilon_n, \delta_n, \eta_n$ imply that
\[ \Pr\left( A_{\eta_n}(\phi_{\mathcal X_n}; \mathcal X_n) + V\frac{a_{\delta_n}}{\epsilon_n} \le \frac{b_{\delta_n}}{\epsilon_n} + 1 \right) \to 0, \]
then it holds that:
1. $\tilde T(\mathcal X_n) \stackrel p\to \operatorname{argmin}_{v\in\mathbb R^d}\phi(v)$ when $\epsilon_n/\eta_n = O(n^r)$, $r < 1/2$.
2. $\tilde T(\mathcal X_n) \stackrel d\to T$ when $\epsilon_n/\eta_n \to K < \infty$, with the density of $T$ proportional to $\exp\left(-\frac{K\phi(v)}{2}\right)$.

We can now substitute in the outlyingness function and see how this algorithm works for the purposes of privately estimating the projection depth median. A first question is whether or not the density
\[ f(v) = \frac{\exp\left(-\frac{\epsilon \sup_u |v^\top u - \mathrm{Med}(\mathcal X_n^\top u)|/\varsigma(\mathcal X_n^\top u)}{2\eta}\right)}{\int_{\mathbb R^d}\exp\left(-\frac{\epsilon \sup_u |v^\top u - \mathrm{Med}(\mathcal X_n^\top u)|/\varsigma(\mathcal X_n^\top u)}{2\eta}\right)dv} \]
even exists. If $\sup_u \mathrm{Med}(\mathcal X_n^\top u) < \infty$ and $\inf_u \varsigma(\mathcal X_n^\top u) > 0$, then
\[ \int_{\mathbb R^d}\exp\left(-\frac{\epsilon\sup_u|v^\top u - \mathrm{Med}(\mathcal X_n^\top u)|/\varsigma(\mathcal X_n^\top u)}{2\eta}\right)dv \le C\int_{\|v\|>1}\exp\left(-\frac{\epsilon\|v\|}{2\eta}\right)dv + C\int_{\|v\|\le 1} dv < \infty. \]
Unfortunately, immediately using PTR with the exponential mechanism gives no gains in estimating the projection median over using the global sensitivity of projection depth (which is 1). If the points in $\mathcal X_n$ are distinct, we have that $A_\eta(\mathrm O_\ell(\cdot\,;\mathcal X_n); \mathcal X_n) = 1$ for any $\eta$. To see this, suppose that $\mathcal Y_n$ is a neighboring dataset, with $X_1$ changed to some observation such that $\varsigma(\mathcal Y_n^\top u) \ne \varsigma(\mathcal X_n^\top u)$.
It follows that for any $u$,
\[ \sup_x \left|\mathrm O^u_\ell(x; \mathcal X_n) - \mathrm O^u_\ell(x; \mathcal Y_n)\right| \approx \sup_x \left| x^\top u\, \frac{\varsigma(\mathcal X_n^\top u) - \varsigma(\mathcal Y_n^\top u)}{\varsigma(\mathcal X_n^\top u)\,\varsigma(\mathcal Y_n^\top u)} \right| = \infty. \]
In order to estimate the projection depth-based median privately, we can truncate the outlyingness function $\mathrm O_\ell$: for $\|x\| > M_n$, set $\mathrm O_\ell(x) = \infty$. Define this function to be
\[ \mathrm O_\ell(x; \mathcal X_n; M_n) = \begin{cases} \mathrm O_\ell(x; \mathcal X_n) & \|x\| < M_n \\ \infty & \|x\| \ge M_n. \end{cases} \]
We can now apply Mechanism 9 and Theorem 5 to $\mathrm O_\ell(\cdot\,; \mathcal X_n; M_n)$ in order to privately estimate the projection depth-based median. The following theorem gives reasonable choices of $\eta$ and $\epsilon$ that maintain consistency of the estimate.

Theorem 6.
Suppose that $M_n \to \infty$ with $M_n = o(n^{1/2})$, $\sup_u |\mathrm{Med}(F_u)| < \infty$, $\sup_u \mathrm{IQR}(F_u) < \infty$, $\inf_u \mathrm{IQR}(F_u) > 0$, and the conditions of Theorem 3 hold. Further, suppose that $\mathrm{PD}(x; F)$ is uniquely maximized at $\theta$. Let $\tilde T(\mathcal X_n)$ be the estimator of Mechanism 9 with cost function $\phi_{\mathcal X_n}(v) = \mathrm O(v; \mathcal X_n; M_n)$. Then:
1. $\tilde T(\mathcal X_n) \stackrel p\to \theta$ when $\epsilon_n/\eta_n = O(n^r)$, $r < 1/2$.
2. $\tilde T(\mathcal X_n)$ converges in distribution to a random variable with measure
\[ Q(A) = \frac{\int_A \exp\left(-\frac{K(\mathrm O(v;F) - \mathrm O(\theta;F))}{2}\right)dv}{\int_{\mathbb R^d}\exp\left(-\frac{K(\mathrm O(v;F) - \mathrm O(\theta;F))}{2}\right)dv} \]
when $\epsilon_n/\eta_n = K$, $K < \infty$.

The obvious issue is choosing $M_n$ in practice, which can be partially informed by the above theorem. Clearly, if the data is known to be bounded, it is easy to choose $M_n$. If the data is not bounded, one can choose $M_n$ independent of the data, given domain knowledge. The choice of $M_n$ should not depend on the data, which would violate the consistency theorem; if $M_n$ is chosen based on the data, then $M_n$ could differ between two datasets, implying that $\mathrm O^u_\ell(x; \mathcal X_n) - \mathrm O^u_\ell(x; \mathcal Y_n) = \infty$ for some $x$, and consequently the truncated breakdown point of the outlyingness function is 1. Computationally, the difficulty again lies in computing $A_\eta(\phi_{\mathcal X_n}; \mathcal X_n)$, for which we can use methods similar to those of the previous section.

Concluding Remarks
We have introduced several mechanisms for private, depth-based inference. These mechanisms include private estimates of point-wise depth values for population depth functions, such as halfspace and projection depth. Such mechanisms have been shown to output consistent estimators, even when the privacy budget is small, i.e., $\epsilon_n \to 0$. Notably, we have shown that one can obtain consistent estimates of projection depth, even though it has high global sensitivity. We have also introduced algorithms for estimating popular depth-based medians, including the simplicial, halfspace, IRW and projection medians. These algorithms all provide differentially private, consistent estimators of the population median under some very mild conditions. These conditions include the existence of a unique depth-based population median and that the population depth function is smooth at its median. The privacy budget is allowed to decrease to 0, provided it does not do so too quickly, e.g., $\epsilon_n \propto n^{-r}$, $r < 1/2$.

Proof of Theorem 2.
The first property follows directly from consistency of the sample depths and the fact that the noise scale $\mathrm{GS}(\mathrm D)/\epsilon_n \to 0$. The second case is true for the same reasons.
Proof of Theorem 3.
We show the result for the Laplace case; the Gaussian case follows the same path. First, we want to show that
\[ \Pr\left( A_\eta\left(\mathrm O(x; F_n); \mathcal X_n\right) \le \frac{\log(2/\delta_n) - W}{\epsilon_n} \right) \to 0. \]
Note that
\[ \Pr\left(|W| > \log(2/\delta_n)\right) = e^{-\log(2/\delta_n)} = \delta_n/2 = O(n^{-k}), \]
from the properties of the Laplace distribution and the rate of convergence of $\delta_n$. We can then write
\begin{align*}
\Pr\left( A_\eta\left(\mathrm O(x; F_n); \mathcal X_n\right) \le \frac{\log(2/\delta_n) - W}{\epsilon_n} \right) &\le \Pr\left( A_\eta\left(\mathrm O(x; F_n); \mathcal X_n\right) \le \frac{\log(2/\delta_n) - W}{\epsilon_n},\ W \ge -\log(2/\delta_n) \right) + \Pr\left( W < -\log(2/\delta_n) \right) \\
&\le \Pr\left( A_\eta\left(\mathrm O(x; F_n); \mathcal X_n\right) \le \frac{2\log(2/\delta_n)}{\epsilon_n} \right) + O(n^{-k}).
\end{align*}
Now, let $\rho_n = 2\log(2/\delta_n)/\epsilon_n$; we want to show that $\Pr\left( A_\eta\left(\mathrm O(x; F_n); \mathcal X_n\right) \le \rho_n \right) \to 0$. It holds that
\[ \Pr\left( A_\eta\left(\mathrm O(x; F_n); \mathcal X_n\right) \le \rho_n \right) = \Pr\left( \bigcup_{j=1}^{\rho_n} \left\{ \sup_{\mathcal Y_n\in\mathcal D(\mathcal X_n, j)} \left|\mathrm O(x; \mathcal X_n) - \mathrm O(x; \mathcal Y_n)\right| \ge \eta \right\} \right). \]
Consider the Taylor series expansion of $f(x, y) = x/y$ about $\left(|x^\top u - \mathrm{Med}(F_u)|,\ \mathrm{IQR}(F_u)\right)$:
\begin{align*}
\mathrm O^u(x; \mathcal X_n) &= \frac{|x^\top u - \mathrm{Med}(F_u)|}{\mathrm{IQR}(F_u)} + \frac{|x^\top u - \mathrm{Med}(\mathcal X_n^\top u)| - |x^\top u - \mathrm{Med}(F_u)|}{\mathrm{IQR}(F_u)} - \frac{|x^\top u - \mathrm{Med}(F_u)|}{\mathrm{IQR}(F_u)^2}\left(\mathrm{IQR}(\mathcal X_n^\top u) - \mathrm{IQR}(F_u)\right) + R_{n,u} \\
&= \frac{|x^\top u - \mathrm{Med}(\mathcal X_n^\top u)|}{\mathrm{IQR}(F_u)} - \frac{|x^\top u - \mathrm{Med}(F_u)|}{\mathrm{IQR}(F_u)^2}\left(\mathrm{IQR}(\mathcal X_n^\top u) - \mathrm{IQR}(F_u)\right) + O_p(n^{-1}).
\end{align*}
It is easy to see that $R_{n,u} = O_p(n^{-1})$, since $\left(\mathrm{IQR}(\mathcal X_n^\top u) - \mathrm{IQR}(F_u)\right)^2 = O_p(n^{-1})$. We can then write
\begin{align*}
\left|\mathrm O^u(x; \mathcal X_n) - \mathrm O^u(x; \mathcal Y_n)\right| &= \left| \frac{|x^\top u - \mathrm{Med}(\mathcal X_n^\top u)|}{\mathrm{IQR}(\mathcal X_n^\top u)} - \frac{|x^\top u - \mathrm{Med}(\mathcal Y_n^\top u)|}{\mathrm{IQR}(\mathcal Y_n^\top u)} \right| \\
&= \left| \frac{|x^\top u - \mathrm{Med}(\mathcal X_n^\top u)| - |x^\top u - \mathrm{Med}(\mathcal Y_n^\top u)|}{\mathrm{IQR}(F_u)} - \frac{|x^\top u - \mathrm{Med}(F_u)|}{\mathrm{IQR}(F_u)^2}\left(\mathrm{IQR}(\mathcal X_n^\top u) - \mathrm{IQR}(\mathcal Y_n^\top u)\right) + O_p(n^{-1}) \right| \\
&\le \left| \frac{|x^\top u - \mathrm{Med}(\mathcal X_n^\top u)| - |x^\top u - \mathrm{Med}(\mathcal Y_n^\top u)|}{\mathrm{IQR}(F_u)} \right| + \left| \frac{|x^\top u - \mathrm{Med}(F_u)|}{\mathrm{IQR}(F_u)^2}\left(\mathrm{IQR}(\mathcal X_n^\top u) - \mathrm{IQR}(\mathcal Y_n^\top u)\right) \right| + O_p(n^{-1}) \\
&\le \frac{\left|\mathrm{Med}(\mathcal Y_n^\top u) - \mathrm{Med}(\mathcal X_n^\top u)\right|}{\mathrm{IQR}(F_u)} + \left| \frac{|x^\top u - \mathrm{Med}(F_u)|}{\mathrm{IQR}(F_u)^2}\left(\mathrm{IQR}(\mathcal X_n^\top u) - \mathrm{IQR}(\mathcal Y_n^\top u)\right) \right| + O_p(n^{-1}),
\end{align*}
where the last line follows from the reverse triangle inequality. Now, recall that $\mathcal Y_n$ differs from $\mathcal X_n$ by at most $\rho_n + 1$ points. If $F_{n,u}$ is the empirical distribution corresponding to $\mathcal X_n^\top u$ and $G_{n,u}(x) = 1 - F_{n,u}(x)$, it holds that
\begin{align*}
\left|\mathrm{Med}(\mathcal Y_n^\top u) - \mathrm{Med}(\mathcal X_n^\top u)\right| &\le \max\left\{ \left|F_{n,u}^{-1}(1/2) - F_{n,u}^{-1}(1/2 + (\rho_n+1)/n)\right|,\ \left|F_{n,u}^{-1}(1/2) - F_{n,u}^{-1}(1/2 - (\rho_n+1)/n)\right| \right\} \\
&= \left|F_{n,u}^{-1}(1/2) - F_{n,u}^{-1}(1/2 \pm (\rho_n+1)/n)\right| \\
&= \left|F_u^{-1}(1/2) - F_{n,u}^{-1}(1/2) + F_{n,u}^{-1}(1/2 \pm (\rho_n+1)/n) - F_u^{-1}(1/2)\right| \\
&= O(n^{-1/2}\log n) \quad a.s.,
\end{align*}
where the last line follows from applying a Bahadur-type representation of quantiles to both terms, which is valid as long as $\frac{1}{n}\left(\frac{\log(2/\delta_n)}{\epsilon_n}\right) = O\left((\log\log n/n)^{1/2}\right)$ [see Theorem 2 of de Haan and Taconis-Haantjes, 1979]. However, we know that $\frac{1}{n}\left(\frac{\log(2/\delta_n)}{\epsilon_n}\right) = O\left((\log\log n/n)^{1/2}\right)$ holds from the assumptions on $\epsilon_n$ and $\delta_n$. We can show something similar for the inter-quartile range by simply replacing $1/2$ with $1/4$ and $3/4$. Now, we must show that
\[ \left| \sup_u \mathrm O^u(x; \mathcal X_n) - \sup_u \mathrm O^u(x; \mathcal Y_n)\right| \to 0 \quad a.s. \]
We see that
\[ \left| \sup_u \mathrm O^u(x; \mathcal X_n) - \sup_u \mathrm O^u(x; \mathcal Y_n)\right| \le \sup_u \left|\mathrm O^u(x; \mathcal X_n) - \mathrm O^u(x; \mathcal Y_n)\right| = O(n^{-1/2}\log n) \quad a.s., \]
where the last equality follows from the fact that $\xi_{1/4,u}$ and $\xi_{3/4,u}$ are bounded as functions of $u$, implying that the Bahadur remainders $R'_{n,u}, R''_{n,u}$ are also bounded in $u$. See the proof of Theorem 2' of [de Haan and Taconis-Haantjes, 1979] for the exact expressions of $R'_{n,u}, R''_{n,u}$. This implies that for $\eta \propto \frac{\log n}{n^{1/2-r}}$ it holds that $\Pr\left[ A_\eta\left(\mathrm O(x; \mathcal X_n); \mathcal X_n\right) \le \rho_n \right] \to 0$. Now, since $\frac{\eta}{\epsilon_n}W \stackrel p\to 0$ and $\mathrm{PD}_1(x; F_n; \mathrm{Med}, \mathrm{IQR}) \stackrel p\to \mathrm{PD}(x; F; \mathrm{Med}, \mathrm{IQR})$, we have that $\widetilde{\mathrm{PD}}_1(x; F_n) \stackrel p\to \mathrm{PD}(x; F; \mathrm{Med}, \mathrm{IQR})$.

For the Gaussian case, we have that
\[ \Pr\left( |Z|\sqrt{2\log(1.25/\delta_n)} > 2\log(1.25/\delta_n) \right) \le 2e^{-\log(1.25/\delta_n)} = 2(\delta_n/1.25) = O(n^{-k}), \]
from the properties of the normal distribution and the rate of convergence of $\delta_n$. We can then write, using the same argument as above,
\[ \Pr\left( A_\eta\left(\mathrm O(x; F_n); \mathcal X_n\right) \le \frac{2\log(1.25/\delta_n) - Z\sqrt{2\log(1.25/\delta_n)}}{\epsilon_n} \right) \le \Pr\left( A_\eta\left(\mathrm O(x; F_n); \mathcal X_n\right) \le \frac{4\log(1.25/\delta_n)}{\epsilon_n} \right) + O(n^{-k}). \]
Now, the probability is of the same form as that of the Laplace case, and the same argument applies.
Proof of Lemma 1.
It is clear that
\[
\frac{e^{-\lambda_n\phi_n(\omega,v)}\pi(v)}{\int_{\mathbb{R}^d}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}
\]
is a valid density, since $e^{-x}$ is bounded for all $x\in\mathbb{R}^+$. The goal is to show that $Q_n$ converges weakly to $\delta_0(\cdot)$, since this is equivalent to $V_n\overset{p}{\to}0$. Note that we use $\delta_0(B)$ as shorthand for $\mathbb{1}\{0\in B\}$. We use the Portmanteau theorem and show that, for all $\delta_0(\cdot)$-continuity sets $A$, $Q_n(A)\to\delta_0(A)$. Let $B_n=\{\omega\colon\|\phi_n(\omega,\cdot)-\phi(\omega,\cdot)\|_\infty<n^{-s}\}$ and let $Q_n^{\omega}(\cdot)$ be the measure corresponding to $V_n$, conditional on $\omega$, viz.
\[
Q_n^{\omega}(A)=\frac{\int_A e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}{\int_{\mathbb{R}^d}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}.
\]
We can now write
\[
\lim_{n\to\infty}Q_n(A)=\lim_{n\to\infty}\int_\Omega\frac{\int_A e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}{\int_{\mathbb{R}^d}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}\,dP=\lim_{n\to\infty}\int_{B_n}Q_n^{\omega}(A)\,dP+\lim_{n\to\infty}\int_{B_n^c}Q_n^{\omega}(A)\,dP.
\]
It is easy to see that
\[
0\le\lim_{n\to\infty}\int_{B_n^c}Q_n^{\omega}(A)\,dP\le\lim_{n\to\infty}\int_{B_n^c}dP=0,
\]
where the last equality comes from Assumption 2. Using this, we can write
\[
\lim_{n\to\infty}Q_n(A)=\lim_{n\to\infty}\int_{B_n}Q_n^{\omega}(A)\,dP=\int_\Omega\lim_{n\to\infty}\mathbb{1}\{\omega\in B_n\}\,Q_n^{\omega}(A)\,dP,
\]
where the last equality follows from the dominated convergence theorem, noting that $Q_n^{\omega}(A)<1$. We now consider $\lim_{n\to\infty}Q_n^{\omega}(A)$ for fixed $\omega\in B_n$. Note that for any $\delta_0(\cdot)$-continuity set, $0$ is either an interior point of the set or not in the set. Keeping this in mind, let $A_I$ be a $\delta_0(\cdot)$-continuity set such that $0$ is interior to $A_I$. We can write
\[
\lim_{n\to\infty}Q_n^{\omega}(A_I)=\lim_{n\to\infty}\frac{\int_{A_I}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}{\int_{\mathbb{R}^d}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}=\lim_{n\to\infty}\frac{\int_{A_I}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}{\int_{A_I}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv+\int_{A_I^c}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}.
\]
Observe that for fixed $\omega\in B_n$, $\lambda_n(\phi_n(\omega,v)-\phi(\omega,v))=O(n^r)\,o(n^{-s})=o(n^{r-s}):=o(n^{-\beta})$, where $\beta>0$. Thus, Assumptions 1 and 2 imply that $\lambda_n(\phi_n(\omega,v)-\phi(\omega,v))\le Cn^{-\beta}$ for some $\beta>0$, uniformly in $v$. Therefore, we can write
\[
\int_{A_I}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv=\int_{A_I}e^{-\lambda_n(\phi_n(\omega,v)-\phi(\omega,v))}e^{-\lambda_n(\phi(\omega,v)-\phi(\omega,0))}\pi(v)\,dv\ge\int_{A_I}e^{-Cn^{-\beta}}e^{-\lambda_n(\phi(\omega,v)-\phi(\omega,0))}\pi(v)\,dv.
\]
For large $n$ and fixed $\xi>0$, the neighborhood $N_{n^{-\xi}}(0)=\{x\in\mathbb{R}^d\colon\|x\|<n^{-\xi}\}$ is contained in $A_I$, since $0$ is interior to $A_I$. From Assumption 3 (H\"older continuity) we have that
\[
\sup_{v\in N_{n^{-\xi}}(0)}|\phi(\omega,v)-\phi(\omega,0)|<C\,n^{-\xi\alpha}.
\]
Choose $\xi$ such that $\alpha':=\xi\alpha>r$, and write
\[
\int_{A_I}e^{-\lambda_n(\phi(\omega,v)-\phi(\omega,0))}e^{-Cn^{-\beta}}\pi(v)\,dv\ge\int_{N_{n^{-\xi}}(0)}e^{-\lambda_n(\phi(\omega,v)-\phi(\omega,0))}e^{-Cn^{-\beta}}\pi(v)\,dv\ge C_1n^{-d\xi}e^{-C_2n^{-\alpha'}\lambda_n}e^{-Cn^{-\beta}}\ge C_1n^{-d\xi}e^{-C_2n^{-\alpha'+r}}e^{-Cn^{-\beta}}=O(n^{-d\xi}).
\]
Note that Assumption 5 implies that there exists some $N$ such that for all $n>N$, $\pi(v)$ is bounded below on $N_{n^{-\xi}}(0)$, and so $\pi$ is absorbed into the constant $C_1$. Now, consider $A_I^c$, where $A_I$ is a $\delta_0(\cdot)$-continuity set in which $0$ is interior. There then exists a neighborhood of $0$, call it $N_k(0)$, such that $N_k(0)\cap A_I^c=\emptyset$. By Assumptions 3 and 4, we also have that $\phi(\omega,v)>k'>0$ on $A_I^c$, for some $k'$ independent of $n$. It follows easily that
\[
\int_{A_I^c}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv=\int_{A_I^c}e^{-\lambda_n\phi(\omega,v)}e^{-\lambda_n(\phi_n(\omega,v)-\phi(\omega,v))}\pi(v)\,dv=O(1)\int_{A_I^c}e^{-\lambda_n\phi(\omega,v)}\pi(v)\,dv\le O(1)\int_{A_I^c}e^{-\lambda_nk'}\pi(v)\,dv=O(e^{-\lambda_nk'}),
\]
where the second equality comes from the fact that $\lambda_n(\phi_n(\omega,v)-\phi(\omega,v))=o(1)$ uniformly in $v$, and the last equality comes from the fact that $\pi(v)$ is a density with respect to the Lebesgue measure. It then follows that
\[
\lim_{n\to\infty}Q_n^{\omega}(A_I)=\lim_{n\to\infty}\frac{\int_{A_I}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}{\int_{A_I}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv+\int_{A_I^c}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}=\lim_{n\to\infty}\frac{O(n^{-d\xi})}{O(n^{-d\xi})+O(e^{-\lambda_nk'})}=1,
\]
which immediately gives that $\lim_{n\to\infty}Q_n^{\omega}(A)=\delta_0(A)$ for $\delta_0(\cdot)$-continuity sets $A$. Then, since $\lim_{n\to\infty}\mathbb{1}\{\omega\in B_n\}=1$ $P$-almost surely by Assumption 2, we have that
\[
\lim_{n\to\infty}Q_n(A)=\int_\Omega\lim_{n\to\infty}\mathbb{1}\{\omega\in B_n\}\,Q_n^{\omega}(A)\,dP=\delta_0(A),
\]
which implies that $Q_n\overset{d}{\to}\delta_0(\cdot)$.

Lemma 2.
Suppose the conditions of Lemma 1 hold, except that $\lim_{n\to\infty}\lambda_n=K$. Let $V_n$ be a sequence of random variables whose measure on $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ is given by
\[
Q_n(A)=\int_\Omega\frac{\int_A e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}{\int_{\mathbb{R}^d}e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv}\,dP,\qquad A\in\mathcal{B}(\mathbb{R}^d).
\]
Then $V_n\overset{d}{\to}Q$, where
\[
Q(A)=\frac{\int_A e^{-K\phi(v)}\pi(v)\,dv}{\int_{\mathbb{R}^d}e^{-K\phi(v)}\pi(v)\,dv}.
\]

Proof.
From the proof of Lemma 1, it suffices to consider
\[
\lim_{n\to\infty}\int_A e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv
\]
on $B_n$. On $B_n$, $\|\phi_n(\omega,\cdot)-\phi(\omega,\cdot)\|_\infty\to 0$, which implies that $a_n=e^{-\lambda_n(\phi_n(\omega,v)-\phi(\omega,v))}\to 1$, uniformly in $v$. We can then say that
\[
\lim_{n\to\infty}\int_A e^{-\lambda_n\phi_n(\omega,v)}\pi(v)\,dv=\lim_{n\to\infty}\int_A e^{-\lambda_n\phi(\omega,v)}a_n\,\pi(v)\,dv=\int_A\lim_{n\to\infty}e^{-\lambda_n\phi(\omega,v)}a_n\,\pi(v)\,dv=\int_A e^{-K\phi(\omega,v)}\pi(v)\,dv,
\]
where the last equality follows from the dominated convergence theorem and the fact that $\pi$ is a density function. The result follows from the proof of Lemma 1.

Proof of Theorem 4.
We can write this problem as an application of Lemmas 1 and 2. For the first part, set $\phi_n=\rho-\mathrm{D}(v;F_n)$. Then we have that $\lambda_n=\frac{C(\mathrm{D})\,n\epsilon_n}{2}=\frac{C(\mathrm{D})\,n^{r'+1}}{2}$, with $r'+1<s$. Assumption 1 of Lemma 1 is then satisfied and the result follows. The second result is a direct application of Lemma 2.

Proof of Remark 1.

For halfspace depth, consider two points $x,y\in\mathbb{R}^d$. If the $F_u$ are $\alpha$-H\"older continuous, then
\[
|F_u(x^\top u)-F_u(y^\top u)|\le C|(x-y)^\top u|^\alpha=C\|x-y\|^\alpha\left|\left(\frac{x-y}{\|x-y\|}\right)^{\top}u\right|^\alpha\le C\|x-y\|^\alpha.\qquad(5)
\]
Now, without loss of generality, suppose that $\mathrm{HD}(x,F)>\mathrm{HD}(y,F)$. Suppose further that $u^*$ is such that $F_{u^*}(y^\top u^*)=\inf_u F_u(y^\top u)$. Such a $u^*$ exists because $F$ is continuous, which implies that $F_u(y^\top u)$ is continuous in $u$ and is therefore a continuous function on a compact set. It follows that
\[
|\mathrm{HD}(x,F)-\mathrm{HD}(y,F)|=\inf_u F_u(x^\top u)-F_{u^*}(y^\top u^*)\le F_{u^*}(x^\top u^*)-F_{u^*}(y^\top u^*)\le C\|x-y\|^\alpha.
\]
For IRW depth, it holds that
\[
|\mathrm{IRW}(x,F)-\mathrm{IRW}(y,F)|\le\int_{\mathcal{S}^{d-1}}\big|\min\big(F_u(x^\top u),1-F_u(x^\top u)\big)-\min\big(F_u(y^\top u),1-F_u(y^\top u)\big)\big|\,d\nu(u)\le\int_{\mathcal{S}^{d-1}}\big|F_u(x^\top u)-F_u(y^\top u)\big|+\big|\big(1-F_u(x^\top u)\big)-\big(1-F_u(y^\top u)\big)\big|\,d\nu(u)\le 2C\|x-y\|^\alpha\int_{\mathcal{S}^{d-1}}d\nu(u),
\]
which is a result of (5) and the fact that $|(1-F_u(x^\top u))-(1-F_u(y^\top u))|=|F_u(x^\top u)-F_u(y^\top u)|$.

For simplicial depth, if $F$ is $\alpha$-H\"older continuous, then we must show that $\Pr(x\in\Delta(X_1,\ldots,X_{d+1}))$ is also $\alpha$-H\"older continuous. It is easiest to begin with two dimensions. Consider $\Pr(x\in\Delta(X_1,X_2,X_3))-\Pr(y\in\Delta(X_1,X_2,X_3))$; as per Liu [1990], we need to show that
\[
\Pr\big(\overline{X_1X_2}\ \text{intersects}\ \overline{xy}\big)\le C\|x-y\|^\alpha.
\]
In order for this event to occur, we must have that $X_1$ is above the line through $x$ and $y$ and $X_2$ is below it, but both fall onto the segment $\overline{xy}$ when projected onto the line running through $x$ and $y$. The affine invariance of simplicial depth implies that we can assume, without loss of generality, that $x$ and $y$ lie on the axis of the first coordinate. Let $x_1$ and $y_1$ be the first coordinates of $x$ and $y$, and let $X_{11}$ be the first coordinate of $X_1$. It then follows from the $\alpha$-H\"older continuity of $F$ that
\[
\Pr\big(\overline{X_1X_2}\ \text{intersects}\ \overline{xy}\big)\le\Pr(x_1<X_{11}<y_1)\le C|x_1-y_1|^\alpha\le C\|x-y\|^\alpha.
\]
In dimensions greater than two, a similar line of reasoning can be used. We can again assume, without loss of generality, that $x$ and $y$ lie on the axis of the first coordinate. It holds that
\[
\Pr(x\in\Delta(X_1,\ldots,X_{d+1}))-\Pr(y\in\Delta(X_1,\ldots,X_{d+1}))\le\binom{d+1}{d}\Pr(A_d),
\]
where $A_d$ is the event that the face generated by $d$ of the $d+1$ points randomly drawn from $F$ intersects the line segment $\overline{xy}$. It is easy to see that
\[
\Pr(A_d)\le\Pr(x_1<X_{11}<y_1)\le C|x_1-y_1|^\alpha\le C\|x-y\|^\alpha.
\]

Proof of Differential Privacy of Mechanism 9.
The proof has the same outline as that of Brunel and Avella-Medina [2020], as well as the proof that the exponential mechanism is differentially private, which can be found in McSherry and Talwar [2007] and Dwork and Roth [2014]. First, assume that $|\phi_{X_n}(x)-\phi_{Y_n}(x)|\le\eta$ for all $x$. Then
\[
\frac{f_{X_n}(v)}{f_{Y_n}(v)}=\frac{\exp(-\phi_{X_n}(v)\epsilon/2\eta)}{\exp(-\phi_{Y_n}(v)\epsilon/2\eta)}\cdot\frac{\int\exp(-\phi_{Y_n}(v)\epsilon/2\eta)\,dv}{\int\exp(-\phi_{X_n}(v)\epsilon/2\eta)\,dv}\le e^{\epsilon/2}\,\frac{\int\exp(-\phi_{Y_n}(v)\epsilon/2\eta)\,dv}{\int\exp(-\phi_{X_n}(v)\epsilon/2\eta)\,dv}\le e^{\epsilon/2}e^{\epsilon/2}\,\frac{\int\exp(-\phi_{X_n}(v)\epsilon/2\eta)\,dv}{\int\exp(-\phi_{X_n}(v)\epsilon/2\eta)\,dv}=e^{\epsilon}.
\]
Note that, for $B\in\mathcal{B}(\mathbb{R}^d)$ (the Borel sets of $\mathbb{R}^d$), this implies that
\[
\Pr(\widehat{T}(X_n)\in B)\le e^{\epsilon}\Pr(\widehat{T}(Y_n)\in B).\qquad(6)
\]
It follows from Brunel and Avella-Medina [2020] that $A_\eta(\phi_{X_n};X_n)$ has global sensitivity equal to 1, since changing one point can change the breakdown by at most 1. Then
\[
\Pr(\widetilde{T}(X_n)\in B)=\Pr\left(A_\eta(\phi_{X_n};X_n)+\tfrac{1}{\epsilon}V>1+\tfrac{\log(2/\delta)}{\epsilon},\ \widehat{T}(X_n)\in B\right)\le e^{\epsilon}\Pr\left(A_\eta(\phi_{Y_n};Y_n)+\tfrac{1}{\epsilon}V>1+\tfrac{\log(2/\delta)}{\epsilon}\right)\Pr(\widehat{T}(X_n)\in B)\le e^{2\epsilon}\Pr\left(A_\eta(\phi_{Y_n};Y_n)+\tfrac{1}{\epsilon}V>1+\tfrac{\log(2/\delta)}{\epsilon}\right)\Pr(\widehat{T}(Y_n)\in B)=e^{2\epsilon}\Pr(\widetilde{T}(Y_n)\in B).
\]
The first inequality is from independence and the fact that $A_\eta(\phi_{X_n};X_n)+\frac{1}{\epsilon}V$ is an $\epsilon$-differentially private estimator; the second inequality is from (6). Now, what if there exists an $x$ such that $|\phi_{X_n}(x)-\phi_{Y_n}(x)|\ge\eta$? This implies that $A_\eta(\phi_{X_n};X_n)=1$, and
\[
\Pr(\widetilde{T}(X_n)\in B)\le\Pr\left(A_\eta(\phi_{X_n};X_n)+\tfrac{1}{\epsilon}V>1+\tfrac{\log(2/\delta)}{\epsilon}\right)=\Pr(V>\log(2/\delta))\le\delta\le\delta+e^{2\epsilon}\Pr(\widetilde{T}(Y_n)\in B).
\]
This shows that we get $(2\epsilon,\delta)$-differential privacy when $B$ is restricted to $\mathcal{B}(\mathbb{R}^d)$. For completeness, we need to include sets of the form $B=B'\cup\{\perp\}$, where $B'\in\mathcal{B}(\mathbb{R}^d)$. Consider
\[
\Pr(\widetilde{T}(X_n)\in B)=\Pr\left(\widehat{T}(X_n)\in B',\ A_\eta(\phi_{X_n};X_n)+\tfrac{1}{\epsilon}V>1+\tfrac{\log(2/\delta)}{\epsilon}\right)+\Pr\left(A_\eta(\phi_{X_n};X_n)+\tfrac{1}{\epsilon}V\le 1+\tfrac{\log(2/\delta)}{\epsilon}\right)\le e^{2\epsilon}\left(\Pr(\widetilde{T}(Y_n)\in B')+\Pr\left(A_\eta(\phi_{Y_n};Y_n)+\tfrac{1}{\epsilon}V\le 1+\tfrac{\log(2/\delta)}{\epsilon}\right)\right)+\delta=e^{2\epsilon}\Pr(\widetilde{T}(Y_n)\in B)+\delta.
\]
The inequality comes from the fact that we have $(2\epsilon,\delta)$-differential privacy when $B$ is restricted to $\mathcal{B}(\mathbb{R}^d)$ and the fact that $A_\eta(\phi_{Y_n};Y_n)+\frac{1}{\epsilon}V$ is $\epsilon$-differentially private.

Now, suppose that $V$, $a_\delta$ and $b_\delta$ correspond to the Gaussian version of propose-test-release. Following the same steps as for the Laplace version gives, for $B\in\mathcal{B}(\mathbb{R}^d)$,
\[
\Pr(\widetilde{T}(X_n)\in B)\le e^{2\epsilon}\Pr(\widetilde{T}(Y_n)\in B)+\delta
\]
when $\|\phi_{X_n}-\phi_{Y_n}\|_\infty<\eta$. When $\|\phi_{X_n}-\phi_{Y_n}\|_\infty\ge\eta$,
\[
\Pr(\widetilde{T}(X_n)\in B)\le\Pr\left(Z\ge\sqrt{2\log(2.5/\delta)}\right)\le\delta.
\]
We then have that $\Pr(\widetilde{T}(X_n)\in B)\le e^{2\epsilon}\Pr(\widetilde{T}(Y_n)\in B)+\delta$. Again, we need to include sets of the form $B=B'\cup\{\perp\}$, where $B'\in\mathcal{B}(\mathbb{R}^d)$. Consider
\[
\Pr(\widetilde{T}(X_n)\in B)=\Pr(\widetilde{T}(X_n)\in B')+\Pr\left(A_\eta(\phi_{X_n};X_n)+\frac{\sqrt{2\log(2.5/\delta)}}{\epsilon}Z\le\frac{2\log(2.5/\delta)}{\epsilon}+1\right)\le e^{2\epsilon}\left(\Pr(\widetilde{T}(Y_n)\in B')+\Pr\left(A_\eta(\phi_{Y_n};Y_n)+\frac{\sqrt{2\log(2.5/\delta)}}{\epsilon}Z\le\frac{2\log(2.5/\delta)}{\epsilon}+1\right)\right)+2\delta=e^{2\epsilon}\Pr(\widetilde{T}(Y_n)\in B)+2\delta.
\]
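The $e^{\epsilon}$ bound on the density ratio at the start of this proof can be checked numerically. The sketch below is illustrative only: the quadratic and sinusoidal cost functions stand in for $\phi_{X_n}$ and $\phi_{Y_n$ on neighbouring datasets, chosen so that their sup-distance is at most $\eta$. It discretizes the two exponential-mechanism densities $f\propto\exp(-\phi\,\epsilon/2\eta)$ and verifies that their pointwise ratio never exceeds $e^{\epsilon}$.

```python
import numpy as np

eps, eta = 1.0, 0.5
v = np.linspace(-5.0, 5.0, 2001)
dv = v[1] - v[0]

# Hypothetical cost functions on neighbouring datasets with
# sup_v |phi_X(v) - phi_Y(v)| <= eta, as assumed in the proof.
phi_X = (v - 0.1) ** 2
phi_Y = phi_X + eta * np.sin(3.0 * v)

def em_density(phi):
    # Exponential-mechanism density f(v) proportional to exp(-phi(v)*eps/(2*eta)),
    # normalized by a Riemann sum over the grid.
    w = np.exp(-phi * eps / (2.0 * eta))
    return w / (w.sum() * dv)

ratio = em_density(phi_X) / em_density(phi_Y)
# Each of the two factors in the proof contributes at most e^(eps/2).
assert ratio.max() <= np.exp(eps) + 1e-9
```

The two factors of $e^{\epsilon/2}$ in the proof (one from the unnormalized weights, one from the normalizing constants) are exactly what the final assertion exercises.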
Proof of Theorem 5.
The only difference from the previous proofs of results in this section is that we do not have a prior $\pi$. The assumed existence of the measures $Q_{X_n}$ and
\[
Q(A)=\int_A\frac{\exp\left(-\frac{K\phi(v)}{2}\right)}{\int_{\mathbb{R}^d}\exp\left(-\frac{K\phi(v)}{2}\right)dv}\,dv
\]
remedies this. The proofs of Lemma 1 and Lemma 2 then imply that $\widetilde{T}(X_n)$ satisfies the above convergence results provided that $\Pr(\widetilde{T}(X_n)=\perp)\to 0$ as $n\to\infty$. This statement is, however, assumed by the theorem.

Proof of Theorem 6.

First, note that because $\sup_u|\operatorname{Med}(F_u)|<\infty$, $\sup_u\operatorname{IQR}(F_u)<\infty$ and $\inf_u\operatorname{IQR}(F_u)>0$, we have that
\[
O(x,F)\ge\frac{\|x\|-\sup_{\|u\|=1}\mu(F_u)}{\sup_{\|u\|=1}\sigma(F_u)},
\]
see page 1477 of Zuo [2003]. It immediately follows that
\[
\int_{\mathbb{R}^d}\exp\left(-\frac{\epsilon_n\,O(v;F)}{2\eta_n}\right)dv\le C_1\int_{\mathbb{R}^d}\exp\left(-\frac{C_2\,\epsilon_n\|v\|}{\eta_n}\right)dv<\infty.
\]
The density of the sample version exists by the same argument, since for large $n$ the sample medians and interquartile ranges over all directions are bounded away from zero and infinity [see Remark 2.5 of Zuo, 2003]. Assumption 4 is implied by the theorem assumptions. We need to show that
\[
\Pr\left(A_{\eta_n}(\phi_{X_n};X_n)+V\,\frac{a_{\delta_n}}{\epsilon_n}\le\frac{b_{\delta_n}}{\epsilon_n}+1\right)\to 0.
\]
First, suppose that $\|x\|<M_n$. From the proof of Theorem 3, we have directly that
\[
\lim_{n\to\infty}\Pr\left(A_{\eta_n}(\phi_{X_n};X_n)+V\,\frac{a_{\delta_n}}{\epsilon_n}\le\frac{\log(2/\delta_n)}{\epsilon_n}\right)=0.
\]
The fact that $\|x\|<M_n$ allows us to bound the remainder in the Taylor series expansion used in the proof of Theorem 3. Now, suppose that $\|x\|>M_n$. Then $O_{u_\ell}(x;X_n)-O_{u_\ell}(x;Y_n)=0$, which implies that
\[
\Pr\left(A_{\eta_n}(\phi_{X_n};X_n)+V\,\frac{a_{\delta_n}}{\epsilon_n}\le\frac{\log(2/\delta_n)}{\epsilon_n}\right)=0.
\]
The conditions of Theorem 5 are satisfied and the result follows.
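The lower bound on the projection outlyingness used above can be illustrated numerically. The sketch below (the function name and the random-direction maximization are ours; with the median and MAD as the location and scale, this is a standard approximation to the sup over the unit sphere, not the paper's algorithm) shows the bound's qualitative content: points far from the data cloud have outlyingness growing roughly linearly in $\|x\|$.

```python
import numpy as np

rng = np.random.default_rng(1)

def outlyingness(x, X, n_dir=500):
    # Approximate O(x; X_n) = sup_u |u'x - Med(u'X)| / MAD(u'X) by
    # maximizing over random unit directions u (the exact sup is over
    # the whole unit sphere, so this is a lower approximation).
    U = rng.standard_normal((n_dir, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    proj = X @ U.T                                  # (n, n_dir) projections
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    return float(np.max(np.abs(U @ x - med) / mad))

X = rng.standard_normal((500, 2))
center = outlyingness(np.zeros(2), X)
far = outlyingness(np.array([10.0, 0.0]), X)
# Consistent with O(x, F) >= (||x|| - sup mu(F_u)) / sup sigma(F_u):
# a point far outside the cloud has much larger outlyingness.
assert far > center + 5
```

This linear growth in $\|x\|$ is exactly what makes $\exp(-\epsilon_n O(v;F)/2\eta_n)$ integrable over $\mathbb{R}^d$ in the proof.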
References
M. Avella-Medina. Privacy-preserving parametric inference: A case for robust statistics.
Journal of the American Statistical Association, pages 1–45, 2019. doi: 10.1080/01621459.2019.1700130.
M. Avella-Medina and V.-E. Brunel. Differentially private sub-Gaussian location estimators. arXiv e-prints, pages 1–16, 2019. URL http://arxiv.org/abs/1906.11923.
J. Awan, A. Kenney, M. Reimherr, and A. Slavković. Benefits and pitfalls of the exponential mechanism with applications to Hilbert spaces and functional PCA. arXiv e-prints, art. arXiv:1901.10864, Jan. 2019.
I. Baidari and C. Patil. K-data depth based clustering algorithm. In Computational Intelligence: Theories, Applications and Future Directions, volume 1 of Advances in Intelligent Systems and Computing, pages 13–24. Springer, Singapore, 2019. doi: 10.1007/978-981-13-1132-1_2.
B. Balle and Y.-X. Wang. Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. arXiv e-prints, art. arXiv:1805.06530, May 2018.
A. Beimel, S. Moran, K. Nissim, and U. Stemmer. Private center points and learning of halfspaces. arXiv e-prints, art. arXiv:1902.10731, Feb. 2019.
V.-E. Brunel and M. Avella-Medina. Propose, test, release: Differentially private estimation with high probability. arXiv e-prints, art. arXiv:2002.08774, Feb. 2020.
T. Cai, Y. Wang, and L. Zhang. The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. arXiv e-prints, art. arXiv:1902.04495, Feb. 2019.
M. Cárdenas-Montes. Depth-based outlier detection algorithm. In M. Polycarpou, A. C. P. L. F. de Carvalho, J.-S. Pan, M. Woźniak, H. Quintian, and E. Corchado, editors, Hybrid Artificial Intelligence Systems, pages 122–132, Cham, 2014. Springer International Publishing. ISBN 978-3-319-07617-1.
S. Chakraborti and M. A. Graham. Nonparametric (distribution-free) control charts: An updated overview and some results. Quality Engineering, pages 1–22, May 2019. doi: 10.1080/08982112.2018.1549330.
K. Chaudhuri and D. Hsu. Convergence rates for differentially private statistical estimation. arXiv e-prints, art. arXiv:1206.6395, June 2012.
Y. Chen, X. Dang, H. Peng, and H. Bart. Outlier detection with the kernelized spatial depth function. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31:288–305, 2009.
Z. Chen and D. E. Tyler. The influence function and maximum bias of Tukey's median. Annals of Statistics, 30(6):1737–1759, 2002. doi: 10.1214/aos/1043351255.
S. Chenouri, C. G. Small, and T. J. Farrar. Data depth-based nonparametric scale tests. Canadian Journal of Statistics, 39(2):356–369, 2011. doi: 10.1002/cjs.10099.
S. Chenouri, A. Mozaffari, and G. Rice. Robust multivariate change point analysis based on data depth. To appear in Canadian Journal of Statistics, 2019.
X. Dang, R. Serfling, and W. Zhou. Influence functions of some depth functions, and application to depth-weighted L-statistics. Journal of Nonparametric Statistics, 21(1):49–66, Jan. 2009. doi: 10.1080/10485250802447981.
L. de Haan and E. Taconis-Haantjes. On Bahadur's representation of sample quantiles. Annals of the Institute of Statistical Mathematics, 31(2):299–308, Dec. 1979. doi: 10.1007/BF02480286.
C. Dwork and J. Lei. Differential privacy and robust statistics. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC '09), page 371, 2009. doi: 10.1145/1536414.1536466.
C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014. doi: 10.1561/0400000042.
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265–284, 2006.
C. Dwork, A. Smith, T. Steinke, and J. Ullman. Exposed! A survey of attacks on private data. Annual Review of Statistics and Its Application, 4(1):61–84, Mar. 2017. doi: 10.1146/annurev-statistics-060116-054123.
L. Dümbgen. Limit theorems for the simplicial depth. Statistics & Probability Letters, 14(2):119–128, 1992. doi: 10.1016/0167-7152(92)90075-G.
Y. Gao and O. Sheffet. Private approximations of a convex hull in low dimensions. arXiv e-prints, art. arXiv:2007.08110, July 2020.
R. Jörnsten. Clustering and classification based on the $L_1$ data depth. Journal of Multivariate Analysis, 90(1):67–89, July 2004. doi: 10.1016/j.jmva.2004.02.013.
T. Lange, K. Mosler, and P. Mozharovskyi. Fast nonparametric classification based on data depth. Statistical Papers, 55(1):49–69, Feb. 2014. doi: 10.1007/s00362-012-0488-4.
J. Lei. Differentially private M-estimators. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24, pages 361–369. Curran Associates, Inc., 2011.
J. Li and R. Y. Liu. New nonparametric tests of multivariate locations and scales using data depth. Statistical Science, 19(4):686–696, Nov. 2004. doi: 10.1214/088342304000000594.
R. Y. Liu. On a notion of simplicial depth. Proceedings of the National Academy of Sciences, 85(6):1732–1734, 1988. doi: 10.1073/pnas.85.6.1732.
R. Y. Liu. On a notion of data depth based on random simplices. Annals of Statistics, 18(1):405–414, 1990. doi: 10.1214/aos/1176347507.
R. Y. Liu. Control charts for multivariate processes. Journal of the American Statistical Association, 90(432):1380–1387, Dec. 1995. doi: 10.1080/01621459.1995.10476643.
R. Y. Liu and K. Singh. A quality index based on data depth and multivariate rank tests. Journal of the American Statistical Association, 88(421):252–260, 1993.
R. Y. Liu, J. M. Parelius, and K. Singh. Multivariate analysis by data depth: Descriptive statistics, graphics and inference. The Annals of Statistics, 27(3):783–840, 1999.
X. Liu and Y. Zuo. Computing projection depth and its associated estimators. Statistics and Computing, 24(1):51–63, Jan. 2014. doi: 10.1007/s11222-012-9352-6.
J.-C. Massé. Asymptotics for the Tukey depth process, with an application to a multivariate trimmed mean. Bernoulli, 10(3):397–419, 2004. doi: 10.3150/bj/1089206404.
F. McSherry and K. Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS '07), pages 94–103. IEEE, Oct. 2007. doi: 10.1109/FOCS.2007.4389483.
K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, STOC '07, pages 75–84, New York, NY, USA, 2007. Association for Computing Machinery. ISBN 9781595936318. doi: 10.1145/1250790.1250803.
K. Ramsay and S. Chenouri. Robust, multiple change-point detection for covariance matrices using data depth. arXiv e-prints, art. arXiv:2011.09558, Nov. 2020.
K. Ramsay, S. Durocher, and A. Leblanc. Integrated rank-weighted depth. Journal of Multivariate Analysis, 173:51–69, 2019. doi: 10.1016/j.jmva.2019.02.001.
M. Romanazzi. Influence function of halfspace depth. Journal of Multivariate Analysis, 77(1):138–161, 2001. doi: 10.1006/jmva.2000.1929.
R. Serfling. A depth function and a scale curve based on spatial quantiles. In Statistical Data Analysis Based on the L1-Norm and Related Methods, pages 25–38. Birkhäuser Basel, Basel, 2002. doi: 10.1007/978-3-0348-8201-9_3.
R. J. Serfling. Depth functions in nonparametric multivariate inference. Data Depth: Robust Multivariate Analysis, Computational Geometry, and Applications, pages 1–16, 2006.
J. W. Tukey. Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, 1974.
L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010. doi: 10.1198/jasa.2009.tm08651.
Y. Zuo. Multivariate trimmed means based on data depth. In Y. Dodge, editor, Statistical Data Analysis Based on the L1-Norm and Related Methods, pages 313–322, Basel, 2002. Birkhäuser Basel. ISBN 978-3-0348-8201-9.
Y. Zuo. Projection-based depth functions and associated medians. The Annals of Statistics, 31(5):1460–1490, 2003. doi: 10.1214/aos/1065705115.
Y. Zuo. Influence function and maximum bias of projection depth based estimators. Annals of Statistics, 32(1):189–218, 2004.
Y. Zuo. A new approach for the computation of halfspace depth in high dimensions. Communications in Statistics - Simulation and Computation, 48(3):900–921, Mar. 2019. doi: 10.1080/03610918.2017.1402040.
Y. Zuo and R. Serfling. General notions of statistical depth function. Annals of Statistics, 28(2):461–482, 2000.