Deconvoluting Kernel Density Estimation and Regression for Locally Differentially Private Data
Farhad Farokhi ∗ August 31, 2020
Abstract
Local differential privacy has become the gold standard of the privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differentially private data can distort the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of privacy-preserving data (no matter how many samples we gather) is always flatter in comparison with the density function of the original data points due to convolution with the privacy-preserving noise density function. The effect is especially pronounced when using slow-decaying privacy-preserving noises, such as the Laplace noise. This can result in under/over-estimation of the heavy-hitters. This is an important challenge facing social scientists due to the use of differential privacy in the 2020 Census in the United States. In this paper, we develop density estimation methods using smoothing kernels. We use the framework of deconvoluting kernel density estimators to remove the effect of privacy-preserving noise. This approach also allows us to adapt the results from non-parametric regression with errors-in-variables to develop regression models based on locally differentially private data. We demonstrate the performance of the developed methods on financial and demographic datasets.
Introduction
Government regulations, such as the roll-out of the General Data Protection Regulation in the European Union (EU)¹, the California Consumer Privacy Act², and the development of the Data Sharing and Release Bill in Australia, increasingly prohibit sharing customers' data without explicit consent [1].

A strong candidate for ensuring privacy is differential privacy. Differential privacy intuitively uses randomization to provide plausible deniability for the data of an individual by ensuring that the statistics of privacy-preserving outputs do not change significantly by varying the data of an individual [2, 3]. Companies like Apple, Google³, Microsoft, and LinkedIn⁴ have rushed to develop projects and to integrate differential privacy into their products. Even the US Census Bureau has decided to implement differential privacy in the 2020 Census [4]. Of course, this has created much controversy, pointing to a "ripple effect on the many public and private organizations that conduct surveys based on census data" [5].

A variant of differential privacy is local differential privacy, in which all data points are randomized before being used by the aggregator, who attempts to infer the data distribution or some of its properties [6–8]. This is in contrast with differential privacy, in which the data is first processed and then obfuscated by noise. Local differential privacy ensures that the data is kept private from the aggregator by adding noise to the individual data entries before the aggregation process. This is a preferred choice when dealing with untrusted aggregators, e.g., third-party service providers or commercial retailers with financial interests, or when it is desired to release an entire dataset publicly for research in a privacy-preserving manner [9].

Locally differentially private data can significantly distort our estimates of the probability density of the data because of the additive noise used to ensure privacy. The density of privacy-preserving data can become flatter in comparison with the density function of the original data points due to convolution of its density with the privacy-preserving noise density. The situation can be even more troubling when using slow-decaying privacy-preserving noises, such as the Laplace noise. This concern is true irrespective of how many samples are gathered. This can result in under/over-estimation of the heavy-hitters, a common and worrying criticism of using differential privacy in the US Census [10].

Estimating probability distributions/densities under differential privacy is of extreme importance as it is often the first step in gaining more important insights into the data, such as regression analysis. However, most of the existing work on probability distribution estimation based on locally differentially private data focuses on categorical data [11–15].

∗ The author is with the Department of Electrical and Electronic Engineering at the University of Melbourne. e-mail: [email protected]
¹ https://gdpr-info.eu
² https://oag.ca.gov/privacy/ccpa
³ https://developers.googleblog.com/2019/09/enabling-developers-and-organizations.html
⁴ https://engineering.linkedin.com/blog/2019/04/privacy-preserving-analytics-and-reporting-at-linkedin
For categorical data (in contrast with numerical data), the privacy-preserving noise is no longer additive; e.g., the so-called exponential mechanism [16] or other boutique differential privacy mechanisms [17] are often employed, which are not on offer in the 2020 US Census. The work on continuous domains is often done by binning or quantizing the domain. However, finding the optimal number of bins or the quantization resolution, depending on the privacy parameters, the data distribution, and the number of data points, is a challenging task.

In this paper, we take a different approach to density estimation by using kernels, thus eliminating the need to quantize the domain. We particularly use the framework of deconvoluting kernel density estimators [18–21] to remove the effect of privacy-preserving noise, which is often in the form of Laplace noise [22]. This approach also allows us to adapt the results from non-parametric regression with errors-in-variables [23–25] to develop regression models based on locally differentially private data. These are important challenges facing social science researchers and demographers in the face of changes administered in the 2020 Census in the United States [4].

Methods
Consider independently distributed data points $\{x[i]\}_{i=1}^{n} \subset \mathbb{R}^q$, for some fixed dimension $q \geq 1$, from a common probability density function $\phi_x$. Each data point $x[i] \in \mathbb{R}^q$ belongs to an individual. Under no privacy restrictions, the data points can be provided to the central aggregator to construct an estimate of the density $\phi_x$, denoted by $\hat{\phi}_x$. We may use a kernel $K$, which is a bounded even probability density function, to generate the density estimate $\hat{\phi}_x$. A widely recognized example of a kernel is the Gaussian kernel [26] in

$$K(x) = \frac{1}{\sqrt{(2\pi)^q}} \exp\left(-\frac{x^\top x}{2}\right). \quad (1)$$

In the big data regime $n \gg 1$, the choice of the kernel is not crucial to the accuracy of kernel density estimators so long as it meets the conditions in [18]. In this paper, we keep the kernel general. By using the kernel $K$, we can construct the estimate

$$\hat{\phi}^{\mathrm{np}}_x(x) = \frac{1}{nh^q} \sum_{i=1}^{n} K((x - x[i])/h), \quad (2)$$

where $h > 0$ is the bandwidth, chosen such that $h \to 0$ as $n \to \infty$. The optimal rate of decay for the bandwidth has been established for families of distributions [18, 21].

As discussed in the introduction, due to privacy restrictions, the exact data points $\{x[i]\}_{i=1}^{n}$ might not be available to generate the density estimate in (2). The aggregator may only have access to noisy versions of these data points:

$$z[i] = x[i] + n[i], \quad (3)$$

where $n[i]$ is a privacy-preserving additive noise. To ensure differential privacy, Laplace additive noise is often used [22]. For any probability density $\phi$, we use the notation $\mathrm{supp}(\phi)$ to denote its support set, i.e., $\mathrm{supp}(\phi) := \{\xi : \phi(\xi) > 0\}$.

Assumption 1 (Bounded Support). $\mathrm{supp}(\phi_x) \subseteq \prod_{i=1}^{q} [\underline{x}_i, \overline{x}_i]$ for finite constants $\underline{x}_i \leq \overline{x}_i$.

Assumption 1 is without loss of generality as we are always dealing with bounded domains in the social sciences, with a priori known bounds on the data (e.g., the population of a region).
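Before privacy enters the analysis, the following is a minimal sketch of the noiseless kernel density estimate in (2) with the Gaussian kernel in (1) for scalar data ($q = 1$). It uses NumPy; the function names, parameter values, and synthetic data are our own illustrative choices, not from the paper.

```python
import numpy as np

def gaussian_kernel(u):
    # Gaussian kernel in (1) for scalar inputs (q = 1).
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x_samples, grid, h):
    # Noiseless kernel density estimate in (2):
    # phi_np(x) = (1 / (n h)) * sum_i K((x - x[i]) / h).
    n = len(x_samples)
    u = (grid[:, None] - x_samples[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (n * h)

# Example: estimate the density of 10,000 synthetic samples on a grid.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=10_000)
grid = np.linspace(-4.0, 4.0, 200)
density = kde(x, grid, h=0.3)
```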
Definition 1 (Local Differential Privacy). The reporting mechanism in (3) is $\epsilon$-(locally) differentially private for $\epsilon \geq 0$ if

$$\mathbb{P}\{x[i] + n[i] \in \mathcal{Z} \mid x[i] = x\} \leq \exp(\epsilon)\, \mathbb{P}\{x[i] + n[i] \in \mathcal{Z} \mid x[i] = x'\}, \quad \forall x, x' \in \mathrm{supp}(\phi_x),$$

for any Borel-measurable set $\mathcal{Z} \subseteq \mathbb{R}^q$.

Definition 1 ensures that the statistics of the privacy-preserving output $x[i] + n[i]$, determined by its distribution, do not change "significantly" (the magnitude of the change is bounded by the privacy parameter $\epsilon$) if the data of individual $x[i]$ changes. If $\epsilon \to 0$, the output becomes more noisy and a higher privacy guarantee is achieved. Laplace additive noise is generally used to ensure differential privacy. This is formalized in the following theorem, which is borrowed from [22].
Theorem 1.
Let $\{n[i]\}_{i=1}^{n}$ be distributed according to the common multivariate Laplace density

$$\phi_n(n) = \frac{1}{2^q \prod_{j=1}^{q} b_j} \exp\left(-\sum_{j=1}^{q} \frac{|n_j|}{b_j}\right),$$

where $n_j$ is the $j$-th component of $n \in \mathbb{R}^q$. The reporting mechanism in (3) is $\epsilon$-locally differentially private if $b_j = q(\overline{x}_j - \underline{x}_j)/\epsilon$ for $j \in \{1, \ldots, q\}$.

In what follows, we assume that the reporting policy in Theorem 1 is used to generate the locally differentially private data points. Since $\{n[i]\}_{i=1}^{n}$ are distributed according to the common density $\phi_n(n)$, $\{z[i]\}_{i=1}^{n}$ also follow a common probability density, which is denoted by $\phi_z$. Note that

$$\Phi_z(t) = \Phi_x(t)\,\Phi_n(t), \quad (4)$$

where $\Phi_z$, $\Phi_x$, and $\Phi_n$ are the characteristic functions of $\phi_z$, $\phi_x$, and $\phi_n$, respectively. Using (4), we can use any approximation of $\Phi_z$ to construct an approximation of $\Phi_x$ and thus estimate $\phi_x$. If we use the kernel $K$ for estimating the density of $z[i]$, $\forall i$, we get

$$\hat{\phi}_z(z) = \frac{1}{nh^q} \sum_{i=1}^{n} K((z - z[i])/h).$$

Here, $\hat{\phi}_z$ is used to denote the approximation of $\phi_z$. The characteristic function of $\hat{\phi}_z$ is given by $\hat{\Phi}_z(t) = \Phi_K(ht)\,\hat{\Phi}(t)$, where $\Phi_K(t)$ is the characteristic function of $K$ and $\hat{\Phi}(t)$ is the empirical characteristic function of the measurements $\{z[i]\}_{i=1}^{n}$, defined as

$$\hat{\Phi}(t) = \frac{1}{n} \sum_{i=1}^{n} \exp\left(\mathrm{i}\, t^\top z[i]\right).$$

Therefore, the characteristic function of $\hat{\phi}_x$ is given by

$$\hat{\Phi}_x(t) = \Phi_K(ht)\, \frac{\hat{\Phi}(t)}{\Phi_n(t)}.$$

Further, note that

$$\Phi_n(t) = \mathbb{E}\left\{\exp\left(\mathrm{i}\, t^\top n\right)\right\} = \mathbb{E}\{\exp(\mathrm{i} t_1 n_1)\}\, \mathbb{E}\{\exp(\mathrm{i} t_2 n_2)\} \cdots \mathbb{E}\{\exp(\mathrm{i} t_q n_q)\} = \prod_{j=1}^{q} \frac{1}{1 + b_j^2 t_j^2},$$

where $t_j$ is the $j$-th component of $t \in \mathbb{R}^q$. We get

$$\hat{\phi}_x(x) = \frac{1}{nh^q} \sum_{i=1}^{n} \hat{K}_h((x - z[i])/h), \quad (5)$$

where

$$\hat{K}_h(x) = \frac{1}{(2\pi)^q} \int_{\mathbb{R}^q} \exp(-\mathrm{i}\, t^\top x)\, \frac{\Phi_K(t)}{\Phi_n(t/h)}\, \mathrm{d}t = \frac{1}{(2\pi)^q} \int_{\mathbb{R}^q} \exp(-\mathrm{i}\, t^\top x) \prod_{j=1}^{q} \left(1 + \frac{b_j^2 t_j^2}{h^2}\right) \Phi_K(t)\, \mathrm{d}t = \prod_{j=1}^{q} \left(1 - \frac{b_j^2}{h^2} \frac{\partial^2}{\partial x_j^2}\right) K(x),$$

where $x_j$ is the $j$-th component of $x \in \mathbb{R}^q$.

Under appropriate conditions on the kernel $K$ [18], we can see that

$$\mathbb{E}\{\hat{\phi}_x(x) \mid \{x[i]\}_{i=1}^{n}\} = \hat{\phi}^{\mathrm{np}}_x(x). \quad (6)$$

Therefore, $\hat{\phi}_x(x)$ in (5) is effectively an unbiased estimate of $\hat{\phi}^{\mathrm{np}}_x(x)$ in (2). On average, we are canceling the effect of the differential privacy noise. Furthermore, if $h$ decays with $n$ at an appropriate rate (see the rates established in [18, 21]), $\hat{\phi}_x(x)$ is a consistent estimator of $\phi_x$ as $n \to \infty$, i.e., $\hat{\phi}_x(x)$ converges to $\phi_x(x)$ point-wise for all $x \in \mathrm{supp}(\phi_x)$.

For regression analysis, we consider independently distributed data points $\{(x[i], y[i])\}_{i=1}^{n}$ from a common probability density function. We would like to understand the relationship between the inputs $x[i]$ and the outputs $y[i]$ for all $i$. Similarly, we assume that we can only access the noisy privacy-preserving inputs $\{z[i]\}_{i=1}^{n}$ instead of the accurate inputs $\{x[i]\}_{i=1}^{n}$. Following the argument above, we can also construct the Nadaraya-Watson kernel regression (see, e.g., [27]) as

$$\hat{m}(x) := \frac{\sum_{i=1}^{n} \hat{K}_h((x - z[i])/h)\, y[i]}{\sum_{i=1}^{n} \hat{K}_h((x - z[i])/h)}. \quad (7)$$

Under appropriate conditions on the kernel $K$ and the bandwidth $h$ [25], $\hat{m}(x)$ converges to $\mathbb{E}\{y \mid x\}$ almost surely. In practice, the bandwidth can be computed by minimizing the cross-validation cost, i.e., the error of estimating each $y[\ell]$ using the Nadaraya-Watson kernel regression constructed from $\{(z[i], y[i])\}_{i \in \{1, \ldots, n\} \setminus \{\ell\}}$, averaged over all choices of $\ell$.
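To make the construction above concrete, the following is a minimal sketch of the full pipeline for scalar data ($q = 1$): generating $\epsilon$-locally differentially private reports with the Laplace mechanism of Theorem 1, and evaluating the deconvoluting estimates (5) and (7). We use the Gaussian kernel for illustration (the experiments below use a Cauchy kernel); since $K''(u) = (u^2 - 1)K(u)$ for the Gaussian kernel, the adjusted kernel $\hat{K}_h$ has the closed form used below. The synthetic data, clipping, and parameter values are our own illustrative choices.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def laplace_ldp_report(x, x_min, x_max, eps, rng):
    # Theorem 1 with q = 1: b = (x_max - x_min) / eps gives
    # eps-local differential privacy for the additive mechanism (3).
    b = (x_max - x_min) / eps
    return x + rng.laplace(scale=b, size=x.shape), b

def adjusted_gaussian_kernel(u, b, h):
    # K_hat_h(u) = (1 - (b^2/h^2) d^2/du^2) K(u); for the Gaussian
    # kernel, K''(u) = (u^2 - 1) K(u), yielding this closed form.
    r = (b / h) ** 2
    return gaussian_kernel(u) * (1.0 + r - r * u**2)

def deconvoluting_kde(z_samples, grid, b, h):
    # Deconvoluting density estimate (5) built from noisy reports z[i].
    n = len(z_samples)
    u = (grid[:, None] - z_samples[None, :]) / h
    return adjusted_gaussian_kernel(u, b, h).sum(axis=1) / (n * h)

def nadaraya_watson(z_samples, y, grid, b, h):
    # Deconvoluting Nadaraya-Watson regression (7).
    u = (grid[:, None] - z_samples[None, :]) / h
    w = adjusted_gaussian_kernel(u, b, h)
    return (w @ y) / w.sum(axis=1)

# Example: private density estimation on bounded synthetic data.
rng = np.random.default_rng(0)
x = np.clip(rng.normal(0.5, 0.15, size=50_000), 0.0, 1.0)
z, b = laplace_ldp_report(x, 0.0, 1.0, eps=5.0, rng=rng)
grid = np.linspace(0.0, 1.0, 100)
density = deconvoluting_kde(z, grid, b=b, h=0.05)
```

Note that $\hat{K}_h$ can take negative values, so the finite-sample estimate in (5) may dip below zero; clipping at zero and renormalizing is a common practical guard for deconvoluting kernel estimators in general, not a prescription from this paper.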
Results

In this section, we demonstrate the performance of the developed methods on financial and demographic datasets.

Figure 1: Estimates of the probability density function of the credit score using original noiseless data with the original kernel $\hat{\phi}^{\mathrm{np}}_x(x) = \frac{1}{nh} \sum_{i=1}^{n} K((x - x[i])/h)$ (solid gray), $\epsilon$-locally differentially private data with the original kernel $\tilde{\phi}_x(x) = \frac{1}{nh} \sum_{i=1}^{n} K((x - z[i])/h)$ (dashed black), and $\epsilon$-locally differentially private data with the adjusted kernel $\hat{\phi}_x(x) = \frac{1}{nh} \sum_{i=1}^{n} \hat{K}_h((x - z[i])/h)$ (solid black) for $\epsilon = 5.0$ and $h = 0.1$. (Horizontal axis: log(credit rating − ...); vertical axis: probability density function.)

Lending Club Dataset
The dataset contains information about 2,260,701 accepted and 27,648,741 rejected loan applications on Lending Club, a peer-to-peer lending platform, over 2007 to 2018. The dataset is available for download on Kaggle [28]. For the accepted loans, the dataset contains the interest rates of the loans per annum and loan attributes, such as the total loan size, and borrower information, such as the number of credit lines, credit rating, state of residence, and age. Here, we only focus on data from 2010 (to avoid possible yearly fluctuations of the interest rate), which contains 12,537 accepted loans. We also focus on the relationship between the FICO credit score (low range) and the interest rates of the loans. This is an interesting relationship pointing to the value of credit rating reports [29]. The FICO credit score is very sensitive (as it relates to the financial health of an individual) and possesses a significant commercial value (as it is sold by a for-profit corporation). Thus, we assume that it is made available publicly in a privacy-preserving manner using (3). Note that the original data in [28] provides this data in an anonymized manner without privacy-preserving noise.

We use the following original kernel:

$$K(x) = \frac{1}{\pi} \frac{1}{1 + x^2}.$$

Note that $x = x_1$ is a scalar as we are only considering the credit score as an input. This is the Cauchy distribution. We get the adjusted kernel in

$$\hat{K}_h(x) = \left(1 - \frac{b^2}{h^2} \frac{\mathrm{d}^2}{\mathrm{d}x^2}\right) K(x) = \frac{1}{\pi} \left[\frac{1}{1 + x^2} - \frac{b^2}{h^2} \frac{8x^2}{(x^2 + 1)^3} + \frac{b^2}{h^2} \frac{2}{(x^2 + 1)^2}\right].$$

We use cross-validation to find the bandwidth in the following experiments.

Figure 1 illustrates estimates of the probability density function of the credit score $\phi_x(x)$ using the original noiseless data with the original kernel $\hat{\phi}^{\mathrm{np}}_x(x)$ in (2) (solid gray), $\epsilon$-locally differentially private data with the original kernel $\tilde{\phi}_x(x) = \frac{1}{nh} \sum_{i=1}^{n} K((x - z[i])/h)$ (dashed black), and $\epsilon$-locally differentially private data with the adjusted kernel in (5) (solid black) for $\epsilon = 5.0$ and $h = 0.1$. Note that $\tilde{\phi}_x(x)$ is a naive density estimate as it does not try to cancel the effect of the privacy-preserving noise. Clearly, using the original kernel on the noisy privacy-preserving data flattens the density estimate $\tilde{\phi}_x(x)$. This is because we are in fact observing a convolution of the original probability density with the probability density of the Laplace noise.
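A minimal sketch of this adjusted Cauchy kernel, mirroring the closed form above (the helper names are ours):

```python
import numpy as np

def cauchy_kernel(u):
    # Original kernel: standard Cauchy density, K(u) = 1 / (pi (1 + u^2)).
    return 1.0 / (np.pi * (1.0 + u**2))

def adjusted_cauchy_kernel(u, b, h):
    # K_hat_h(u) = (1 - (b^2/h^2) d^2/du^2) K(u) for the Cauchy kernel,
    # using d^2/du^2 [1/(1+u^2)] = 8u^2/(1+u^2)^3 - 2/(1+u^2)^2.
    r = (b / h) ** 2
    s = 1.0 + u**2
    return (1.0 / s - 8.0 * r * u**2 / s**3 + 2.0 * r / s**2) / np.pi

# Plug into (5): phi_hat(x) = adjusted_cauchy_kernel((x - z) / h, b, h).sum() / (n * h)
```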
Upon using the adjusted kernel $\hat{K}_h(x)$, the estimate of the probability density using the noisy privacy-preserving data matches the estimate of the probability density with the original data (with additional fluctuations due to the presence of noise). This provides a numerical validation of (6).

Now, let us focus on the regression analysis. Figure 2 shows the kernel regression model (solid black) and the linear regression model (dashed black) based on the original data with bandwidth $h = 0.02$ superimposed on the original noiseless data (gray dots). The mean squared error for the kernel regression model is 4.42 and the mean squared error for the linear regression model is 4.61. The kernel regression model is thus slightly superior (roughly 4%) to the linear regression model; however, the gap is narrow. Figure 3 illustrates the kernel regression model (solid black) and the linear regression model (dashed black) based on the $\epsilon$-locally differentially private data with $\epsilon = 5$ and bandwidth $h = 0.20$ superimposed on the original noiseless data (gray dots). The mean squared error for the kernel regression model is 5.70 and the mean squared error for the linear regression model is 7.11. In this case, the kernel regression model is considerably (roughly 20%) better. In Figure 4, we observe the mean squared error for the kernel regression model and the linear regression model based on the $\epsilon$-locally differentially private data versus the privacy budget $\epsilon$. Clearly, the kernel regression model is consistently superior to the linear regression model. As $\epsilon$ grows larger, the performance of the kernel regression model and the linear regression model based on the $\epsilon$-locally differentially private data converges to the performance of the corresponding models based on the original noiseless data. This intuitively makes sense as, by increasing the privacy budget, the magnitude of the privacy-preserving noise becomes smaller.

Figure 2: The kernel regression model (solid black) and the linear regression model (dashed black) based on the original data with bandwidth $h = 0.02$ superimposed on the original noiseless data (gray dots). The mean squared error for the kernel regression model is 4.42 and the mean squared error for the linear regression model is 4.61. (Horizontal axis: log(credit rating − ...); vertical axis: interest rate (percentage). Legend: real data; kernel regression without privacy; linear regression without privacy.)

Figure 3: The kernel regression model (solid black) and the linear regression model (dashed black) based on the $\epsilon$-locally differentially private data with $\epsilon = 5$ and bandwidth $h = 0.20$ superimposed on the original noiseless data (gray dots). The mean squared error for the kernel regression model is 5.70 and the mean squared error for the linear regression model is 7.11. (Horizontal axis: log(credit rating − ...); vertical axis: interest rate (percentage).)

Figure 4: The mean squared error for the kernel regression model and the linear regression model based on the $\epsilon$-locally differentially private ($\epsilon$-LDP in the legend) data versus the privacy budget $\epsilon$. The horizontal lines show the mean squared error for the kernel regression model and the linear regression model based on the original noiseless data.
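The bandwidths reported above were selected by cross-validation, as described in the Methods section. A minimal sketch of that leave-one-out procedure for the regression estimator (7) is given below; it assumes an adjusted-kernel callable such as the `adjusted_gaussian_kernel` sketch earlier, and the grid of candidate bandwidths is our own illustrative choice.

```python
import numpy as np

def loo_cv_cost(z, y, b, h, kernel):
    # Leave-one-out cross-validation cost for the Nadaraya-Watson
    # estimator (7): predict each y[l] from the remaining points.
    u = (z[:, None] - z[None, :]) / h
    w = kernel(u, b, h)
    np.fill_diagonal(w, 0.0)  # exclude the l-th point from its own prediction
    denom = w.sum(axis=1)
    pred = (w @ y) / np.where(denom == 0.0, np.nan, denom)
    return np.nanmean((y - pred) ** 2)

def select_bandwidth(z, y, b, kernel, candidates):
    # Grid search over candidate bandwidths, minimizing the CV cost.
    costs = [loo_cv_cost(z, y, b, h, kernel) for h in candidates]
    return candidates[int(np.argmin(costs))]

# Example: h = select_bandwidth(z, y, b, adjusted_gaussian_kernel,
#                               candidates=np.linspace(0.01, 0.5, 50))
```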
Adult Dataset
The dataset contains information about 32,561 individuals from the 1994 Census database. The dataset is available for download from the UCI Machine Learning Repository [30]. The dataset contains attributes, such as education, age, work type, gender, and race, and a binary report of whether the individual earns more than $50,000 per year. We also focus on the relationship between education (in years) and an individual's ability to earn more than $50,000 per year. The education attribute is assumed to be made public in a privacy-preserving form following (3). This information can be considered private as it can be used in conjunction with other information to de-anonymize the dataset.

Figure 5 shows the kernel regression model (solid black) and the logistic regression model (dashed black) based on the original data with bandwidth $h = 0.17$. The logarithm of the likelihood for the kernel regression model is $-0.49$ and the logarithm of the likelihood for the logistic regression model is $-0.50$. The kernel regression model is thus slightly superior (roughly 2%) to the logistic regression model; however, the gap is almost negligible. Figure 6 illustrates the kernel regression model (solid black) and the logistic regression model (dashed black) based on the $\epsilon$-locally differentially private data with $\epsilon = 5.0$ and $h = 2.98$. The logarithm of the likelihood for the kernel regression model is $-0.51$ and the logarithm of the likelihood for the logistic regression model is $-0.53$. In this case, the kernel regression model is slightly (roughly 4%) better. In Figure 7, we observe the logarithm of the likelihood for the kernel regression model and the logistic regression model based on the $\epsilon$-locally differentially private data versus the privacy budget $\epsilon$. The horizontal lines show the logarithm of the likelihood for the kernel regression model and the logistic regression model based on the original noiseless data. Again, the kernel regression model is consistently superior to the logistic regression model. However, the effect is not as pronounced as for the linear regression comparison in the previous subsection. Finally, again, as $\epsilon$ grows larger, the performance of the kernel regression model and the logistic regression model based on the $\epsilon$-locally differentially private data converges to the performance of the corresponding models based on the original noiseless data.

Figure 5: The kernel regression model (solid black) and the logistic regression model (dashed black) based on the original data with bandwidth $h = 0.17$. The logarithm of the likelihood for the kernel regression model is $-0.49$ and the logarithm of the likelihood for the logistic regression model is $-0.50$. (Horizontal axis: education (years); vertical axis: $\mathbb{P}\{\text{income} \geq \$50{,}000\}$. Legend: kernel regression without privacy; logistic regression without privacy.)

Figure 6: The kernel regression model (solid black) and the logistic regression model (dashed black) based on the $\epsilon$-locally differentially private data with $\epsilon = 5.0$ and $h = 2.98$. The logarithm of the likelihood for the kernel regression model is $-0.51$ and the logarithm of the likelihood for the logistic regression model is $-0.53$. (Horizontal axis: education (years); vertical axis: $\mathbb{P}\{\text{income} \geq \$50{,}000\}$.)

Figure 7: The logarithm of the likelihood for the kernel regression model and the logistic regression model based on the $\epsilon$-locally differentially private data versus the privacy budget $\epsilon$. The horizontal lines show the logarithm of the likelihood for the kernel regression model and the logistic regression model based on the original noiseless data.
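The magnitudes of the reported values suggest they are average Bernoulli log-likelihoods of the binary outcome under the model's predicted probabilities; the following is a minimal sketch under that reading, where the kernel regression output is interpreted as $\mathbb{P}\{\text{income} \geq \$50{,}000 \mid \text{education}\}$ and the clipping guard is our own addition, not from the paper.

```python
import numpy as np

def average_log_likelihood(y, p_hat, eps=1e-9):
    # Average Bernoulli log-likelihood of binary outcomes y under
    # predicted probabilities p_hat (clipped away from 0 and 1,
    # since deconvoluting kernel estimates can leave [0, 1]).
    p = np.clip(p_hat, eps, 1.0 - eps)
    return float(np.mean(y * np.log(p) + (1.0 - y) * np.log1p(-p)))
```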
Discussion
The density of privacy-preserving data is always flatter in comparison with the density function of the original data points due to convolution with the privacy-preserving noise density function. This is certainly a cause for concern due to the addition of differential-privacy noise in the 2020 US Census. This unfortunate effect is always present irrespective of how many samples we gather because we observe the convolution of the original probability density with the probability density of the privacy-preserving noise. This can result in mis-estimation of the heavy-hitters, which often play an important role in the social sciences due to their ties to minority groups. We developed density estimation methods using smoothing kernels and used the framework of deconvoluting kernel density estimators to remove the effect of the privacy-preserving noise. This can result in superior performance both for estimating probability density functions and for kernel regression in comparison with popular regression techniques, such as linear and logistic regression models. In the case of estimating the probability density function, we could entirely remove the flattening effect of the privacy-preserving noise at the cost of additional fluctuations. The fluctuations, however, could be reduced by gathering more data.

References
Regulation & Governance , vol. 14, no. 3, pp. 447–464, 2020.[2] C. Dwork, “Differential privacy: A survey of results,” in
Theory and Applications of Models of Compu-tation (M. Agrawal, D. Du, Z. Duan, and A. Li, eds.), vol. 4978 of
Lecture Notes in Computer Science ,pp. 1–19, Springer Berlin Heidelberg, 2008. 8 -0.58-0.56-0.54-0.52-0.5-0.48 privacy budget l og li k e li h oo d Figure 7:
The logarithm of the likelihood for the kernel regression model and the logistic regression model based on the (cid:15) -locally differential private data versus privacy budget (cid:15) . The horizontal lines show the logarithm of the likelihood for thekernel regression model and the logistic regression model based on original noiseless data. [3] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private dataanalysis,” in
Theory of Cryptography Conference , pp. 265–284, 2006.[4] J. M. Abowd, “The US Census Bureau adopts differential privacy,” in
Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Discovery & Data Mining , pp. 2867–2867, 2018.[5] J. Mervis, “Researchers object to census privacy measure,”
Science , vol. 363, no. 6423, pp. 114–114,2019.[6] R. Dewri, “Local differential perturbations: Location privacy under approximate knowledge attackers,”
IEEE Transactions on Mobile Computing , vol. 12, no. 12, pp. 2360–2372, 2013.[7] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Local privacy and statistical minimax rates,” in , pp. 429–438, 2013.[8] P. Kairouz, S. Oh, and P. Viswanath, “Extremal mechanisms for local differential privacy,” in
Advancesin Neural Information Processing Systems , pp. 2879–2887, 2014.[9] P. Liu, Y. Xu, Q. Jiang, Y. Tang, Y. Guo, L.-e. Wang, and X. Li, “Local differential privacy for socialnetwork publishing,”
Neurocomputing , vol. 391, pp. 273–279, 2020.[10] G. Wezerek and D. V. Riper, “Changes to the census could make small towns disappear.”The New York Times, , Date Accessed: 27 August 2020.[11] J. Acharya, Z. Sun, and H. Zhang, “Hadamard response: Estimating distributions privately, efficiently,and with little communication,” in
The 22nd International Conference on Artificial Intelligence andStatistics , pp. 1120–1129, 2019.[12] R. Bassily and A. Smith, “Local, private, efficient protocols for succinct histograms,” in
Proceedings ofthe forty-seventh annual ACM symposium on Theory of computing , pp. 127–135, 2015.[13] ´U. Erlingsson, V. Pihur, and A. Korolova, “Rappor: Randomized aggregatable privacy-preserving ordi-nal response,” in
Proceedings of the 2014 ACM SIGSAC Conference on Computer and CommunicationsSecurity , pp. 1054–1067, 2014.[14] T. Wang, J. Blocki, N. Li, and S. Jha, “Locally differentially private protocols for frequency estimation,”in , pp. 729–745, 2017.915] M. Ye and A. Barg, “Optimal schemes for discrete distribution estimation under locally differentialprivacy,”
IEEE Transactions on Information Theory , vol. 64, no. 8, pp. 5662–5676, 2018.[16] F. McSherry and K. Talwar, “Mechanism design via differential privacy,” in , pp. 94–103, IEEE, 2007.[17] Z. Li, T. Wang, M. Lopuhaa-Zwakenberg, N. Li, and B. Skoric, “Estimating numerical distributionsunder local differential privacy,” in
Proceedings of the 2020 ACM SIGMOD International Conferenceon Management of Data , pp. 621–635, 2020.[18] L. A. Stefanski and R. J. Carroll, “Deconvolving kernel density estimators,”
Statistics , vol. 21, no. 2,pp. 169–184, 1990.[19] M. H. Neumann and O. H¨ossjer, “On the effect of estimating the error density in nonparametric decon-volution,”
Journal of Nonparametric Statistics , vol. 7, no. 4, pp. 307–330, 1997.[20] A. Delaigle, P. Hall, A. Meister, et al. , “On deconvolution with repeated measurements,”
The Annalsof Statistics , vol. 36, no. 2, pp. 665–685, 2008.[21] R. J. Carroll and P. Hall, “Optimal rates of convergence for deconvolving a density,”
Journal of theAmerican Statistical Association , vol. 83, no. 404, pp. 1184–1186, 1988.[22] C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,”
Foundations and Trendsin Theoretical Computer Science , vol. 9, no. 3-4, pp. 211–407, 2014.[23] A. Delaigle and A. Meister, “Nonparametric regression estimation in the heteroscedastic errors-in-variables problem,”
Journal of the American Statistical Association , vol. 102, no. 480, pp. 1416–1426,2007.[24] D. Ioannides and P. Alevizos, “Nonparametric regression with errors in variables and applications,”
Statistics & Probability Letters , vol. 32, no. 1, pp. 35–43, 1997.[25] J. Fan and Y. K. Truong, “Nonparametric regression with errors in variables,”
The Annals of Statistics ,pp. 1900–1925, 1993.[26] M. P. Wand and M. C. Jones,
Kernel smoothing . Chapman & Hall/CRC, 1994.[27] W. H¨ardle,
Applied nonparametric regression . Cambridge university press, 1990.[28] N. George, “All Lending Club loan data: 2007 through current Lending Club accepted and rejected loandata.” , Date Accessed: 20 Aug 2020.[29] D. Czarnitzki and K. Kraft, “Are credit ratings valuable information?,”
Applied Financial Economics ,vol. 17, no. 13, pp. 1061–1070, 2007.[30] D. Dua and C. Graff, “University of California (UCI) machine learning repository.” http://archive.ics.uci.edu/ml , Date Accessed: 20 Aug 2020.
Acknowledgements
The work of F.F. is in part supported by a startup grant from the Melbourne School of Engineering at the University of Melbourne.
Author contributions statement
F.F. is the sole author of the paper.