Prospect Theoretic Analysis of Privacy-Preserving Mechanism
Guocheng Liao, Student Member, IEEE, Xu Chen, Member, IEEE, and Jianwei Huang, Fellow, IEEE
Abstract—We study a problem of privacy-preserving mechanism design. A data collector wants to obtain data from individuals to perform some computations. To relieve the privacy threat to the contributors, the data collector adopts a privacy-preserving mechanism by adding random noise to the computation result, at the cost of reduced accuracy. Individuals decide whether to contribute data when faced with the privacy issue. Due to the intrinsic uncertainty in privacy protection, we model individuals' privacy-related decisions using Prospect Theory. Such a theory models individuals' behavior under uncertainty more accurately than the traditional expected utility theory, whose predictions often deviate from practical human behavior. We show that the data collector's utility maximization problem involves a polynomial of high and fractional order, the root of which is difficult to compute analytically. We get around this issue by considering a large population approximation, and obtain a closed-form solution that well approximates the precise solution. We discover that a data collector who considers the more realistic Prospect Theory based individual decision modeling would adopt a more conservative privacy-preserving mechanism, compared with the case based on expected utility theory modeling. We also study the impact of Prospect Theory parameters, and conclude that more loss-averse or risk-seeking individuals will trigger a more conservative mechanism. When individuals have different Prospect Theory parameters, simulations demonstrate that the privacy protection first becomes stronger and then becomes weaker as the heterogeneity increases from a low value to a high one.
Index Terms—Privacy protection, ε-differential privacy, Prospect Theory
I. INTRODUCTION
A. Background and Motivation
Personal data collection is becoming increasingly common in our daily life, in various industries such as online social networks and medical treatment, to better understand individuals and gain new inspiration for knowledge generation.
This work was supported in part by the General Research Fund CUHK14219016 from Hong Kong UGC, the Presidential Fund from the Chinese University of Hong Kong, Shenzhen, and the Shenzhen Institute of Artificial Intelligence and Robotics for Society, and in part by the National Science Foundation of China (No. U1711265, No. 61972432), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (No. 2017ZT07X355), and the Pearl River Talent Recruitment Program (No. 2017GC010465). Part of this paper was presented in GLOBECOM 2017 [1].

Guocheng Liao is with the Department of Information Engineering, The Chinese University of Hong Kong (e-mail: [email protected]). Jianwei Huang is with the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China, the Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), and the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong, China (e-mail: [email protected]). Xu Chen is with the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China (e-mail: [email protected]).
A privacy-aware individual, however, may have privacy concerns when being requested for data contribution. He worries about potential personal information leakage if the data collector does not provide enough privacy protection. A recent example of this is the data leakage incident of Facebook due to the illegal data usage of the third party Cambridge Analytica [2]. He would carefully evaluate the protection consequence promised by the data collector.

The growing privacy concern brings great challenges to the data collector regarding the data analysis. She conducts some computations with the collected data (e.g., exploiting individuals' smoking habits to study their relation with the chance of getting lung cancer). She would adopt a privacy-preserving mechanism, which adds some random noise to the computation result such that an adversary cannot easily infer participants' actual information. The downside is that the added noise will reduce the accuracy of the result. For example, an inaccurate average number of cigarettes that people smoke per day may not fully indicate the accurate relation between the smoking habit and lung cancer. The collector needs to carefully design the privacy-preserving mechanism to trade off the individual satisfaction and the computation accuracy.

One key feature of this privacy-preserving data collection problem is the uncertainty of the privacy protection level, as individuals are not always sure about the privacy level promised by the data collector. As suggested in [3], [4], the uncertainty of outcomes plays a significant role in privacy-related decision making, and behavioral economics can help better understand the individual's decision concerning privacy. When dealing with uncertainty, most prior studies applied the Expected Utility Theory (EUT) to model an individual's decision that maximizes his expected utility. However, experimental evidence (e.g., [5]) showed that in practice human behavior could significantly deviate from the EUT, due to complex psychological perception and subjectivity [6]. This indicates that the traditional EUT is not accurate enough to capture an individual's decision pattern.

Alternatively, Prospect Theory (PT) [5], [7], a Nobel-Prize-winning theory in behavioral economics, can provide a more accurate prediction of an individual's behavior. Supported by a large number of real-world experiments (e.g., [8], [9]), Prospect Theory both normatively and descriptively interprets how individuals make decisions by evaluating uncertain gains and losses. This motivates us to consider Prospect Theory to model individuals' privacy-related decisions under uncertainty.

In this paper, for presentation clarity, we will use "he" to refer to an individual and use "she" to refer to the data collector. Such a terminology choice does not reflect any gender bias.

Furthermore, Prospect Theory has been successfully applied in many areas, such as cognitive radio network management (e.g., [10], [11]) and smart grid management (e.g., [12], [13]). However, there do not exist any theoretical studies regarding the application of Prospect Theory in the area of privacy-preserving mechanism design. Our paper presents the first step towards this important and under-explored area.

In this paper, with prospect theoretic modeling, we will explore the answers to the following key questions:

• From the individuals' perspective, how would they decide whether to participate in the data collection considering the privacy protection uncertainty?
• From the data collector’s perspective, how would shedesign the privacy-preserving mechanism considering theindividuals’ behavior?
To answer the above questions, we model the interaction between the data collector and individuals as a two-stage Stackelberg game. At the lower level, we use Prospect Theory to capture individuals' subjective decision-making under the privacy protection uncertainty. At the higher level, a better privacy protection by the data collector will attract more individuals, but the corresponding higher level of perturbation (due to the added noise) will degrade the accuracy of the analysis. We compute the data collector's optimal strategy based on her prediction of individuals' participation decisions under various privacy protection levels.
B. Key Contributions
The main contributions of this paper are as follows.

• Prospect Theory-based individual behavior model: Due to the effectiveness uncertainty of the privacy-preserving mechanism, we model individuals' decisions based on Prospect Theory, which is more accurate compared with the widely adopted approach of EUT. In particular, we focus on understanding the impact of both the level of loss aversion and the shift of the reference point.

• Analysis of the data collector's utility maximization problem: Since this problem involves a polynomial of high and fractional order, it is difficult to obtain the analytical solution. We consider a large individual population approximation that allows us to compute a unique optimal solution that is close to the optimal solution in realistic settings.

• Design insights based on the impact of the prospect theoretic model: We compare the results under the prospect theoretic model and those under the EUT model, and conclude that the data collector should adopt a more conservative privacy-preserving mechanism based on Prospect Theory. Regarding the impact of Prospect Theory parameters, we find that more loss-averse individuals lead to a more conservative privacy-preserving mechanism. However, the shift of the reference point towards a more tolerant attitude does not always indicate a less conservative mechanism. Taking into account the heterogeneity of different individuals' Prospect Theory parameters, we find that the privacy protection first becomes stronger and then becomes weaker as the heterogeneity increases from a low value to a high one.

The rest of this paper is organized as follows. We first introduce the related work in Section II. In Section III, we introduce the preliminaries of differential privacy, which is the privacy metric that we use in this paper. In Section IV, we discuss the system model regarding the individual's participation problem and the data collector's utility maximization problem, respectively. In Section V, we solve these problems and analyze the impact of different Prospect Theory parameters. In Section VI and Section VII, we provide simulation results and discuss the corresponding insights. We conclude the paper in Section VIII.
II. LITERATURE REVIEW
The problem of privacy-preserving data collection has been widely considered based on the concept of differential privacy [14], which is regarded as a powerful tool to quantify privacy in the literature (e.g., [15]–[20]). Differential privacy requires that an individual's data only has a limited impact on the output of a computation, so it is hard for an adversary to infer an individual's data from the exposed output. Motivated by this, we use differential privacy as the privacy model in this paper. However, different from previous literature [15]–[20], we model the individuals' subjective reactions to the privacy metric with Prospect Theory due to the intrinsic uncertainty.

Regarding the problem of privacy-preserving data collection, the studies in [15]–[22] considered a stylized case: the data collector, with a certain computation goal, wants to obtain personal data from privacy-aware individuals. However, the detailed problem model and analysis can be different based on the incentive methods adopted by different applications. There are mainly three categories: no incentive (e.g., [15]), monetary payments (e.g., [16]–[20], [23]–[26]), and non-monetary rewards (e.g., [21], [22]). For the first category, Ghosh et al. in [15] considered that an individual would participate if the privacy protection offered by the data collector satisfies his privacy requirement. For the second category of monetary payments, the studies in [16]–[20] focused on the mechanism design that minimizes the data collector's total payment subject to computation accuracy constraints. Crowd sensing (e.g., [23]–[26]) is one business practice of this category. For example, Jin et al. in [23] designed an auction-based incentive mechanism with data perturbation that ensures workers' privacy protection. Zhang et al. in [25] designed a contract-based incentive mechanism that optimizes accuracy subject to a monetary budget by satisfying different privacy preferences of different users. For the third category, the studies in [21], [22], [27], [28] considered that individuals can benefit from the data result computation, and focused on the individuals' trade-off between the privacy loss and the benefit. References [27], [28] further studied the privacy-preserving mechanism based on the individuals' trade-off.

Our model and analysis fall into the third category, since we do not rely on monetary payments but consider the analysis-based reward. In our problem, an individual needs to decide whether to participate in the data collection. In the studies in [21], [22], [27], [28], however, an individual decides on the variance of noise to be added to his reported data (as he will always participate). Moreover, regarding the individuals' decision modeling, none of the previous studies (e.g., [15]–[28]) explored the behavioral economics modeling of individuals.

Acquisti and Grossklags in [3], [4] suggested that behavioral economics is a powerful tool to better understand privacy-related decisions with empirical studies. However, they did not detail any behavioral theoretic model together with a realistic privacy metric. In our work, we fill in this gap based on a practical privacy-preserving framework, and capture individuals' behavior under uncertainty by applying the Prospect Theory in behavioral economics.

The adoption of Prospect Theory to better design and analyze engineering systems only emerged recently (e.g., [10]–[13], [29]–[40]). For example, Li et al.
in [10] studied a random access game in wireless networks, and compared the performance under Nash Equilibrium for both Prospect Theory and EUT settings. Yang et al. in [11] considered the impact of user decision-making on radio resource pricing, and proposed prospect pricing to improve radio resource management. Xiao et al. in [13] formulated a static energy exchange game between microgrids, and derived the Nash Equilibrium to analyze the selling and buying decisions based on Prospect Theory. Saad et al. in [33] considered the security of integrated circuit outsourcing, and captured the subjectivity of attacker and defender with the weighting effect in Prospect Theory. These previous studies (e.g., [10], [11], [13], [30]–[33]) applied one perspective of probability distortion in Prospect Theory. In our work, we consider the two perspectives of the S-shape valuation function and the reference point to capture individuals' subjective perception of uncertainty. Due to the continuous nature of the privacy outcome, we will leave the more complicated analysis of probability distortion to future work. Furthermore, to the best of our knowledge, our work is the first theoretical research exploring the application of Prospect Theory in privacy-preserving mechanism design.
III. PRELIMINARIES
Our individuals’ decision model is based on the ProspectTheory, and our privacy model is based on the widely usedtheoretical framework of differential privacy with privacyguarantee. In this section, we introduce the preliminaries onProspect Theory and differential privacy.
A. Prospect Theory
Compared with the widely-used modeling approach of expected utility theory in economics, Prospect Theory better captures practical human behavioral characteristics under uncertainty. Prospect Theory suggests that an individual evaluates an outcome with his subjective perception due to psychological loss and risk preference. An S-shape asymmetrical valuation function together with a reference point [7] (as shown in Fig. 1) characterizes such a pattern:

v(x) = \begin{cases} (x - x_{ref})^{\beta}, & \text{if } x \geq x_{ref}, \\ -\lambda (x_{ref} - x)^{\beta}, & \text{if } x < x_{ref}. \end{cases}   (1)
Fig. 1: S-shape asymmetrical valuation function.

Here x represents the actual outcome, e.g., the amount of money earned in gambling. A larger x indicates a better outcome. Parameter x_{ref} represents the reference point, and λ ≥ 1 and β ∈ (0, 1] are the loss aversion parameter and the risk parameter, respectively. We will explain the physical meaning of these parameters later. The valuation v(x) describes how the individual subjectively evaluates the actual outcome.
1) Reference Points:
Reference point serves as the individual's benchmark to evaluate the actual outcome. If the actual outcome is higher than the reference point, the individual will perceive it as a gain, due to a better outcome than his anticipation (represented by his reference point). Otherwise, he will perceive it as a loss. For example, a gambler targets earning 100 dollars, which is his reference point. Whenever he earns less than 100 dollars (even though he indeed earns some money), he would consider it as a loss.
2) Loss and Risk Parameters:
The loss penalty parameter λ captures the loss aversion level. As λ is often greater than one, the impact of a loss is larger than that of a gain of the same absolute value. For example, when a gambler loses 100 dollars, he feels as if he lost more than 100 dollars, due to his loss aversion attitude. A larger λ indicates that he is more loss averse. The parameter β describes the concavity of the gain part of the function and the convexity of the loss part of the function, capturing the risk aversion level toward the gain and the risk-seeking level toward the loss. When β = λ = 1, the S-shape function degenerates to a straight line, which corresponds to the Expected Utility Theory [41] in the literature.

Finally, the prospect outcome is obtained by associating the valuation function with probability weighting:

U^{PT} = \sum_{i} v(x_i) \cdot p_i,   (2)

where p_i is the probability weighting associated with the outcome x_i.

The general formula of a prospect outcome also integrates probability weighting distortion into p_i. In this work, however, we do not consider the weighting distortion for simplicity, since the continuous and infinite nature of the privacy outcome would complicate the associated weighting analysis.
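As a concrete illustration of (1)–(2), the following minimal Python sketch evaluates the S-shape valuation and the resulting prospect outcome; the outcome values, weights, and parameter choices below are hypothetical examples and are not values used elsewhere in this paper.

def valuation(x, x_ref, lam=1.5, beta=0.8):
    # S-shape valuation in (1): outcomes above x_ref are gains, below are losses.
    if x >= x_ref:
        return (x - x_ref) ** beta
    return -lam * (x_ref - x) ** beta

def prospect_value(outcomes, weights, x_ref, lam=1.5, beta=0.8):
    # Prospect outcome in (2): probability-weighted sum of subjective valuations.
    return sum(p * valuation(x, x_ref, lam, beta) for x, p in zip(outcomes, weights))

# A gambler whose reference point is 100 dollars perceives earning only 50 as a loss.
print(valuation(50, x_ref=100))
print(prospect_value([150, 50], [0.5, 0.5], x_ref=100))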
B. Differential Privacy

Definition 1 defines ε-differential privacy, which serves as a basic framework to offer reliable privacy protection in the related literature [16]–[20]. In the definition, one entry in a database corresponds to one individual's reported data. A differentially private mechanism ensures that the result of the computation will not change significantly when an individual's data is added to the database. This ensures that when the computation result is revealed, an adversary cannot easily infer the information of an individual's data without extra information.
Definition 1 (ε-differential privacy) [14]. A randomized mechanism A is ε-differentially private if for any two neighboring databases D, D' that differ in only one entry and for any set S of outputs:

Pr[A(D) \in S] \leq \exp(\epsilon) \cdot Pr[A(D') \in S],

where Pr[·] denotes the probability of the event.

Consider the extreme case of ε = 0, i.e., exp(ε) = 1. Definition 1 then implies that Pr[A(D) ∈ S] = Pr[A(D') ∈ S]. This means that any two neighboring databases will have the same output distribution regardless of any single entry difference, which corresponds to perfect privacy protection. When the value ε becomes larger, the privacy protection becomes weaker.

C. Laplace Mechanism
The Laplace mechanism is a commonly used mechanism that can ensure differential privacy for numerical data [14]–[17]. Such a mechanism adds random noise with a Laplace distribution to the computation result and calibrates the standard deviation of the noise according to the sensitivity of the computation function (defined as follows).
Definition 2 (Sensitivity) [14].
The sensitivity of a function h : D → R is:

S(h) = \max_{D, D'} \| h(D) - h(D') \|_1,

for all neighboring databases D and D' that differ in only one entry.

The sensitivity of a function measures the maximum variation that any single entry can cause to the computation result. For example, the mean function with the representation h(X) = \sum_{i} x_i / n, where X = [x_1, x_2, \ldots, x_n] is the collected data, has the sensitivity S(h) = x_{max}/n. Here, x_{max} = \max_i |x_i|. The frequency function (e.g., h(X) = |\{x_i \mid x_i > 0\}| / n, which indicates the proportion of positive data in the whole dataset) has the sensitivity 1/n.

A Laplace distribution, denoted by Lap(b) with scale parameter b, has the probability density function:

p(x) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right).   (3)

It has a zero mean and a standard deviation of \sqrt{2}\, b. By calibrating the standard deviation of the noise according to the sensitivity
of the computation, the mechanism can achieve differential privacy.

Fig. 2: System model: in Stage I, the data collector initiates a collection with a privacy-preserving mechanism. In Stage II, individuals decide whether to participate in the data collection.
Theorem 1 (Laplace mechanism) [14], [42].
For any function h : D → R, the following Laplace mechanism can achieve ε-differential privacy:

A(D) = h(D) + Y,

where Y is a random variable drawn from the Laplace distribution Lap(S(h)/ε) (see Section 3.3 in [42] for a detailed proof).
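As a minimal sketch of Theorem 1 for the mean computation used later in the paper (whose sensitivity is x_max/n per Definition 2); the data, x_max, and ε values below are illustrative assumptions.

import numpy as np

def laplace_mean(data, epsilon, x_max=1.0):
    # Release the mean of the data under epsilon-differential privacy (Theorem 1).
    n = len(data)
    sensitivity = x_max / n                                          # S(h) for the mean (Definition 2)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)  # Y ~ Lap(S(h)/epsilon)
    return float(np.mean(data)) + noise

data = np.random.rand(1000)   # data normalized to [0, 1]
print(laplace_mean(data, epsilon=0.1))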
IV. SYSTEM MODEL
Fig. 2 illustrates the system model of the privacy-preserving data collection process. In this model, a data collector wants to collect data from individuals, and provides an analysis-based reward as an incentive. First, the data collector designs a privacy-preserving mechanism. Second, individuals decide whether to report data based on the trade-off between the reward and the privacy loss. Next, we describe the individual's participation problem and the data collector's utility maximization problem in Sections IV-A and IV-B, respectively.
A. An individual’s Participation Problem
In this subsection, we formulate the individuals' participation problem under the privacy protection uncertainty. We use Prospect Theory to model individuals' behavioral characteristics in this context.
1) Privacy Measurement Based on Differential Privacy:
Based on Definition 1 of ε-differential privacy, we regard ε as the privacy level of a given mechanism. Recall that a larger value of ε means a weaker privacy protection, and a smaller value of ε means a stronger privacy protection.

As we can see from Definition 1, the privacy level involves some inherent uncertainty. The parameter ε corresponds to the weakest privacy protection (or the highest privacy leakage) among all possible neighboring databases. More specifically, data from different participants (i.e., different entries) may have different effects on the output, and this definition measures the most significant effect among all possible single entries. So the actual privacy level of a particular participant under this mechanism can be uncertain. This motivates us to use Prospect Theory to model how a participant subjectively responds when he is concerned about the potential risk of his actual privacy level.
2) Prospect Theoretic Model of an Individual’s Preference:
We first characterize an individual's subjective valuation of a particular actual privacy level. By applying the S-shape asymmetrical valuation function in (1), we obtain the valuation function in our context:

v(\epsilon) = \begin{cases} (\epsilon_{ref} - \epsilon)^{\beta}, & \text{if } \epsilon \leq \epsilon_{ref}, \\ -\lambda (\epsilon - \epsilon_{ref})^{\beta}, & \text{if } \epsilon > \epsilon_{ref}, \end{cases}   (4)

where 0 < β ≤ 1, λ ≥ 1, and ε_{ref} is the reference point. One main difference is that in (1), a larger value of x corresponds to a better outcome, while in the differential privacy setting, a smaller value of ε corresponds to a better outcome of protection. So if the privacy level ε is lower than the reference point ε_{ref}, the individual would treat it as a gain. Otherwise, he would treat it as a loss.

Next, we characterize the prospect privacy level given an ε-differentially private mechanism. A challenge of applying the classical Prospect Theory in our setting comes from the fact that we consider a continuous privacy level (while most prospect theoretic analysis considered discrete outcomes, e.g., [10], [11], [13], [29]–[32]). We adopt the result from [43] to get around this issue, by approximating the infinite number of continuous outcomes with a finite number of discrete outcomes. More specifically, we decompose the set of all possible continuous outcomes [0, ε] into m discrete outcomes iε/m, i = 1, 2, ..., m. Then a participant's prospect privacy level is the summation of weighted valuations of all discrete outcomes, p_i v(iε/m), i = 1, 2, ..., m, where p_i is the weighting (or probability) assigned to the corresponding outcome. For computational simplicity, we assume that the probability is evenly assigned to all the outcomes; we then obtain the prospect privacy level under an ε-differentially private mechanism as in (2):

\epsilon_p = \frac{1}{m} \sum_{i=1}^{m} v\left(\frac{i}{m}\epsilon\right).   (5)
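For concreteness, a Python sketch of the discretized prospect privacy level in (4)–(5) follows; the value of m and the Prospect Theory parameters here are illustrative choices only.

def prospect_privacy_level(eps, lam=1.5, beta=0.8, eps_ref=0.0, m=100):
    # eps_p in (5): average subjective valuation of the m discretized privacy outcomes.
    total = 0.0
    for i in range(1, m + 1):
        x = i * eps / m                            # one possible actual privacy level
        if x <= eps_ref:
            total += (eps_ref - x) ** beta         # perceived gain, from (4)
        else:
            total -= lam * (x - eps_ref) ** beta   # perceived loss, from (4)
    return total / m

print(prospect_privacy_level(eps=0.5))             # negative under a zero reference point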
3) Individuals’ Utility Maximization Problem:
We will derive the individuals' utility maximization problem by discussing the privacy cost and the reward gain. We first introduce the participants' privacy cost and participation benefit, and then introduce the non-participants' privacy cost. The individuals decide whether to participate after comparing the two options.
Privacy Cost of a Participant:
When a participant’s datais used in an (cid:15) -differentially private mechanism, he willexperience a privacy cost that is associated with the privacylevel. Similar to [15]–[17], [25], we model this privacy cost asa linear function of differential privacy level. Let g ( (cid:15) p ) denotethe linear function that maps the prospect differential privacylevel to privacy cost, i.e., g ( (cid:15) p ) = c · (cid:15) p . Here the parameter c By adopting the same method in [43], the calculation can be extended toother distributions. We do not consider the privacy cost from membership or non-membershipto avoid trivial case. We only consider the privacy cost from participation whenhis data is being used. measures the privacy cost per privacy level. Since the datacollector gathers the same type of data for a computation(e.g., income or movie rating), we assume that participantsexperience the same cost parameter c (similar as in [19], [20]).Regarding the prospect theoretic model parameters (i.e., (cid:15) ref , λ , and β ), we will first assume that they are homogeneousacross all individuals in the analysis. Later in Section VII, wewill further numerically explore the impact of heterogeneousparameters. Participation Benefit:
We assume that participants can benefit from the non-monetary reward (e.g., computation analysis-based reward and other formats of benefit) from the data collector. For example, the participants contribute ratings on movies and obtain movie recommendations in return. Consumers report their experience of a product surveyed by a company, which could help improve the product and provide better service. The individuals' benefit from data contribution (e.g., their interest in uncovering new results) could be utilized to incentivize their participation, which helps the data collector avoid further monetary cost. The consideration of non-monetary rewards has not only been theoretically considered and analyzed in [21], [22], but has also been implemented in practical businesses [44]. To characterize the impact of the benefit on individuals' decisions, we consider an individual i's valuation of the benefit, denoted by W_i (measured in the same unit as the privacy cost). Different individuals might value the benefit differently.

Privacy Cost of a Non-Participant:
For a non-participating individual, he will not get a reward from the data collector. Furthermore, his actual privacy level is zero, i.e., perfect privacy protection. Hence his utility is g(v(0)) = g(\epsilon_{ref}^{\beta}). If the reference point ε_{ref} is positive, then not participating indicates a "gain" of privacy. If the reference point ε_{ref} is zero, then the utility would be zero, meaning no gain or loss of privacy.

Utility Maximization Problem:
The utility of a participant is the summation of the benefit valuation and the privacy cost, i.e., W_i + g(ε_p), where the cost takes a negative value. The utility of a non-participant is just his privacy cost. Each individual i decides whether to participate in the data collection by solving the following optimization problem:

\max_{a_i} \; U_i(a_i) = a_i \left( W_i + g(\epsilon_p) \right) + (1 - a_i)\, g(v(0)), \quad \text{s.t. } a_i \in \{0, 1\}.   (6)

Action a_i = 1 means participation, and a_i = 0 otherwise. Similar to [16], [17], we assume the data collector is trustworthy, hence all participants will truthfully report their data to the trusted data collector.
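The decision rule in (6) reduces to comparing the benefit valuation with the subjective privacy cost. A small sketch follows, reusing the prospect_privacy_level helper sketched earlier; the cost coefficient c and the value of W_i are illustrative assumptions.

def participates(W_i, eps, c=1.0, lam=1.5, beta=0.8, eps_ref=0.0):
    # Participate iff U_i(1) >= U_i(0) in (6).
    g_participate = c * prospect_privacy_level(eps, lam, beta, eps_ref)  # g(eps_p), negative
    g_opt_out = c * (eps_ref ** beta)                                    # g(v(0))
    return W_i + g_participate >= g_opt_out

print(participates(W_i=0.8, eps=0.5))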
B. Data Collector's Utility Maximization Problem

In this subsection, we discuss the data collector's computation and utility function.

A positive reference point could correspond to the case where the individual believes that participating in the data collection is the social norm. The mechanism design problem involving untruthful data reporting due to a potentially untrustworthy data collector would be much more complicated [18]–[20], and we will study it in future work.
The data collector obtains data from individuals to perform a data-driven computation. Throughout this paper, we consider the case where the data collector wants to calculate the mean of the collected numerical data. This kind of analysis has a wide range of application scenarios. For example, the data collector may want to investigate the average salary level of residents in a district, or obtain the popularity of a new movie by investigating the average score rated by audiences.

Then we discuss the components of the data collector's utility. She benefits more if she manages to collect more data, as it enables a more convincing computation result [45]. Meanwhile, she adopts a differentially private mechanism (e.g., the Laplace mechanism) that adds some random noise to the computation result, which leads to an accuracy penalty. This implies that the data collector's utility function depends on two factors: the amount of collected data and the accuracy penalty. Recall that the data collector would provide a non-monetary reward to incentivize individuals. The reward naturally comes from the data-driven analysis without incurring a significant additional cost, as such an incentive is a by-product of the data analysis. The data collector does not have an additional cost (in terms of the incentives to individuals).
1) Data Collector’s Benefit:
We use R(n) to denote the data collector's benefit of collecting data from n participants. We assume that R(n) is non-negative, monotonically increasing, strictly concave, and upper bounded. As n grows large, the marginal benefit of collecting data from one more participant reduces, hence the concave shape of the function. Furthermore, the data amount is not the only factor that affects the computation result. Other factors such as methods of representation [45] and optimization all influence the computation to some extent. Hence the function R(n) would be bounded even when n goes to infinity. For ease of exposition, we follow [46] and use the following benefit function with parameters k and l in our analysis:

R(n, [k, l]) = k - \frac{k}{1 + l \cdot n}.   (7)

Here k > 0 and l > 0. Fig. 3 shows some examples of the function with different values of k and l. The data collector needs to adjust the values of k and l to match the exact benefit of a particular application.

Fig. 3: Data amount benefit function R(n).

Our general prospect theoretic privacy-preserving mechanism works for any other scenarios or applications involving privacy protection uncertainty.
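A small sketch of the bounded, concave benefit function in (7) follows, using parameter values of the kind shown in Fig. 3 (illustrative only):

def collector_benefit(n, k=0.989, l=9.8e-4):
    # R(n, [k, l]) in (7): increases from 0 at n = 0 and saturates at k as n grows.
    return k - k / (1.0 + l * n)

for n in (0, 1000, 10000, 100000):
    print(n, round(collector_benefit(n), 3))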
2) Data Collector’s Accuracy Penalty:
Define l(x) as the penalty if the added noise level is x. The penalty is more significant if the noise magnitude |x| is larger, so l(x) is non-negative and nondecreasing in |x|. Similar to [47], [48], we consider one of the possible representations, l(x) = x^2, which emphasizes the variance of the error. Meanwhile, recall from Theorem 1 that the data collector can add Laplacian random noise to ensure ε-differential privacy. Combining this with the probability density function in (3), we get the expected accuracy penalty L(ε) under ε-differential privacy as follows:

L(\epsilon) = \int_{-\infty}^{+\infty} l(x)\, p(x)\, dx = \frac{2 S(h)^2}{\epsilon^2}.   (8)

The data collector needs to choose the privacy level ε to maximize her utility, i.e.,

\max_{\epsilon > 0} \; U_c(\epsilon) = R(n(\epsilon)) - L(\epsilon).   (9)

Here n(ε) is the number of participants under the ε-differentially private mechanism, which will be derived based on the individuals' responses to the mechanism (as in Section IV-A).
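A sketch of the accuracy penalty in (8) for the mean computation (where S(h) = 1/n, as used in Section V) and the collector utility in (9), reusing the collector_benefit helper above; here n denotes the induced number of participants n(ε).

def accuracy_penalty(eps, n):
    # L(eps) in (8) with l(x) = x^2 and S(h) = 1/n: the Laplace noise variance 2/(n^2 eps^2).
    return 2.0 / (n ** 2 * eps ** 2)

def collector_utility(eps, n):
    # U_c(eps) in (9): data-amount benefit minus the accuracy penalty.
    return collector_benefit(n) - accuracy_penalty(eps, n)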
C. Problem Formulation

We formulate the overall system as a two-stage game, as illustrated in Fig. 2. In Stage I, the data collector designs an ε-differentially private mechanism to maximize her utility in (9). In Stage II, each individual decides whether to participate in the data collection to maximize his utility in (6). We use backward induction to solve this two-stage optimization problem.
V. SOLVING THE TWO-STAGE PROBLEM
A. Individual’s Decision-Making
We first consider the case where the individuals' reference point is zero, which means that individuals are intolerant and expect perfect privacy protection. Hence any privacy level induced in the data collection process will be considered as a loss. We will consider the case of a positive reference point later in Section V-D.

We will derive the number of participants under an ε-differentially private mechanism. We first analyze individuals' participation condition. In an individual's participation problem (6), the individual will decide to participate if and only if U_i(1) ≥ U_i(0). Under the zero reference point, we have g(v(0)) = g(\epsilon_{ref}^{\beta}) = 0, and the condition becomes W_i ≥ g(v(0)) − g(ε_p) = −g(ε_p).

We consider a group of N individuals. Let p_v(W) denote the probability density function of the reward valuation W among the individuals. Then the number of individuals choosing to participate in the data collection is:

n(\epsilon) = N \cdot \Pr\left(W > -g(\epsilon_p)\right) = N \int_{-g(\epsilon_p)}^{\infty} p_v(W)\, dW.   (10)

We can show that n(ε) is non-increasing in ε. For the convenience of analysis, we let each individual's reward valuation W follow a uniform distribution on [0, W_{max}] in the theoretical analysis in Section V, i.e.,

p_v(W) = \begin{cases} \frac{1}{W_{max}}, & \text{if } W \in [0, W_{max}]; \\ 0, & \text{otherwise}. \end{cases}   (11)

In Section VI, we will perform simulation studies based on a more general truncated normal distribution (which includes the uniform distribution as a special case).
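Under the uniform valuation distribution in (11), the participation count in (10) takes the simple form below; N, W_max, and c are illustrative, and prospect_privacy_level is the helper sketched earlier.

def num_participants(eps, N=10000, W_max=1.0, c=1.0, lam=1.5, beta=0.8):
    # n(eps) in (10)-(11): N * Pr(W > -g(eps_p)) for W uniform on [0, W_max].
    threshold = -c * prospect_privacy_level(eps, lam, beta)   # minimum benefit valuation needed
    fraction = max(0.0, min(1.0, (W_max - threshold) / W_max))
    return N * fraction

print(num_participants(eps=0.5))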
B. Data Collector's Optimal Differentially Private Mechanism

Next we solve the data collector's utility maximization problem in (9). We assume that she possesses adequate information about the target individuals, i.e., the privacy loss coefficient c, the distribution p_v(W) of the reward valuation W, and the Prospect Theory parameters, from abundant previous data-related investigations. Then she can decide the optimal ε to maximize her utility based on the anticipation of individuals' reactions.

Recall that we consider the mean analysis of numerical data as an example in this work. The range of the data is normalized from zero to one. The sensitivity of this computation is S(h) = 1/n. Then the accuracy penalty is:

L(\epsilon) = \frac{2 S(h)^2}{\epsilon^2} = \frac{2}{n^2 \epsilon^2}.   (12)

We substitute (10), (11), and (12) into the data collector's utility maximization problem (9) and obtain a one-variable optimization problem. It is difficult to obtain the closed-form optimal solution, as the derivative of the objective function involves a high-and-fractional-order polynomial. We can apply many effective one-dimensional search methods [49] to numerically solve this problem. To gain useful insights for the practical implementation, nevertheless, we would like to derive an analytical solution by taking some reasonable approximations.

Next, we describe how to derive the approximated optimal solution of (9) under a large population approximation. We first simplify the derivative formulation and reduce the order by considering that the population size N is large enough, and further get around the fractional order by considering that the parameter β = 1. We can then compute the unique root of the derivative in the feasible set, which is the approximated optimal solution.

More specifically, let us consider the feasible set of ε such that there is at least one participant (i.e., n(ε) > 0) according to (10) and (11):

\{\epsilon : -g(\epsilon_p) < W_{max}\}.   (13)

We can compute the data collector's objective function in (9) together with its derivative as follows:

U_c(\epsilon) = k - \frac{k}{1 + lN\frac{W_{max} + g(\epsilon_p)}{W_{max}}} - \frac{2}{N^2 \epsilon^2 \left(\frac{W_{max} + g(\epsilon_p)}{W_{max}}\right)^2},   (14)

and

U_c'(\epsilon) = \frac{k\, l\, N\, g'(\epsilon_p)}{W_{max}\left[1 + lN\frac{W_{max} + g(\epsilon_p)}{W_{max}}\right]^2} + \frac{4\left(W_{max} + g(\epsilon_p) + g'(\epsilon_p)\,\epsilon\right)}{N^2\, W_{max} \left[\frac{W_{max} + g(\epsilon_p)}{W_{max}}\right]^3 \epsilon^3}.   (15)

One of the key challenges of analytically computing the root of U_c'(\epsilon) = 0 is the sixth-order polynomial at the numerator after combining the terms. However, we can approximate 1 + lN\left(W_{max} + g(\epsilon_p)\right)/W_{max} with lN\left(W_{max} + g(\epsilon_p)\right)/W_{max} when the population size N is large. Hence we can obtain an approximated (denoted by the superscript a) version of (15):

U_c'^{a}(\epsilon) = \frac{f(\epsilon)}{N^2 \epsilon^3 \left(\frac{W_{max} + g(\epsilon_p)}{W_{max}}\right)^3},   (16)

where

f(\epsilon) = \left(\frac{W_{max} + g(\epsilon_p)}{W_{max}}\right)\left(4 + \frac{k\, N\, \epsilon^3\, g'(\epsilon_p)}{l\, W_{max}}\right) + \frac{4\, \epsilon\, g'(\epsilon_p)}{W_{max}}.   (17)

We can compute the root of f(ε) = 0 in the feasible set as the approximated optimal solution, which is unique.
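In practice, the exact optimizer of (9) can also be found by a one-dimensional search over ε (the exhaustive-search baseline used in Section VI). A brute-force sketch follows, reusing the num_participants and collector_utility helpers above; the grid range is an arbitrary choice.

import numpy as np

def optimal_epsilon(grid=None):
    # Exhaustive search for the eps that maximizes U_c(eps) in (9) over the feasible set (13).
    if grid is None:
        grid = np.linspace(1e-3, 2.0, 2000)
    best_eps, best_utility = None, -np.inf
    for eps in grid:
        n = num_participants(eps)
        if n < 1:                      # no participants: outside the feasible set
            continue
        u = collector_utility(eps, n)
        if u > best_utility:
            best_eps, best_utility = eps, u
    return best_eps, best_utility

print(optimal_epsilon())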
Theorem 2. Under the approximation of (15), we obtain the unique approximated optimal solution \tilde{\epsilon}^* of the problem (9) in closed form as the root of f(ε) = 0.

We provide the proof of Theorem 2 in Appendix A, including the detailed closed-form expression of the approximated optimal solution. The approximation enables us to derive a simplified polynomial function f(ε), which makes it mathematically tractable and straightforward to study the impact of the Prospect Theory parameters on the optimal mechanism solution (as we will do in Sections V-C and V-D). The polynomial function serves as a bridge connecting the Prospect Theory parameters and the optimal solution.

C. Comparison with EUT
In this section, we focus on the traditional EUT case [41], which has been widely used in most prior literature (e.g., [15]–[28]) when dealing with uncertainty. Basically, EUT explains an individual's decision by evaluating the expected outcome (or utility) under uncertainty without considering risk and loss attitudes. EUT can be considered as a special case of Prospect Theory when we choose λ = 1 and β = 1 in (1). We compare the result of the data collector's approximated optimal solution under the EUT case with that under the general Prospect Theory case (excluding the special case of EUT).

Corollary 1.
The data collector’s approximated optimal ˜ (cid:15) ∗ e under the EUT case is higher than that under the generalProspect Theory case. We provide the proof of Corollary 1 in Appendix B. Weconclude that compared with traditional EUT modeling, thedata collector should adopt a more conservative privacy-preserving mechanism when considering the individuals’ lossattitude predicted by the Prospect Theory. Without doingso, she will suffer a utility loss for not properly capturingindividuals’ behavioral characteristics. We use notation ˜ to represent an approximated optimal solution. D. Impact of Prospect Theory Parameters
To better understand the insights from Theorem 2, we study the impact of the Prospect Theory parameters on the optimal differentially private mechanism in this section. Note that these parameters are intrinsic properties of individuals that the data collector cannot control. The data collector needs to design the mechanism based on the individuals' prospect properties.
1) Impact of the Loss Aversion Parameter λ: We first look at the impact of the Prospect Theory parameter λ under the case of a zero reference point.

Corollary 2.
The data collector’s optimal ˜ (cid:15) ∗ decreases in λ . We provide the proof of Corollary 2 in Appendix C. Corol-lary 2 suggests that if the individuals are more loss averse (i.e.,with a larger loss aversion parameter λ ), they are less likelyto participate in the data collection due to serious concernsof privacy loss. In order to attract individuals’ participation,the data collector needs to adopt a more conservative privacy-preserving mechanism to alleviate the concerns.
2) Impact of the Reference Point ε_{ref}: Recall that a positive reference point means that individuals are tolerant. They would perceive a privacy level outcome as a gain if it is better than the reference, even though it is not a perfect protection. Intuitively, the data collector can take advantage of individuals' tolerance towards privacy, and adopt a less conservative privacy-preserving mechanism compared with the case of a zero reference point. However, is this intuition always true? Next, we will try to answer this question.

We compare the case of a positive reference point with that of a zero reference point. The methodology of computing the data collector's approximated optimal privacy-preserving mechanism under a large population approximation is similar to that under the case of a zero reference point. The main difference lies in the characterization of the prospect privacy levels of both participation and non-participation.

Under the case of a positive reference point, the prospect privacy level of participation based on (5) is (with the subscript pos and superscript p):

\epsilon_{pos}^{p} = \frac{t}{m}\cdot\frac{1}{t}\sum_{i=1}^{t}\left(\epsilon_{ref} - \frac{i}{m}\epsilon\right)^{\beta} - \left(1 - \frac{t}{m}\right)\lambda\cdot\frac{1}{m-t}\sum_{i=t+1}^{m}\left(\frac{i}{m}\epsilon - \epsilon_{ref}\right)^{\beta},
for t satisfying \frac{t}{m}\epsilon < \epsilon_{ref} and \frac{t+1}{m}\epsilon > \epsilon_{ref}.   (18)

The first summation term of (18) corresponds to the gain part, and the second summation term corresponds to the loss part. The prospect privacy level of non-participation is (with the subscript pos and superscript n):

\epsilon_{pos}^{n} = \epsilon_{ref}^{\beta}.   (19)

Individuals would always enjoy a certain gain from non-participation due to a better privacy protection than the positive reference level.

Let U_{pos}'^{a}(\epsilon) (formulated below) be the approximated derivative of the objective function and f_{pos}(\epsilon) be the polynomial numerator in U_{pos}'^{a}(\epsilon). Comparing the polynomial part f_{pos}(\epsilon) with f(\epsilon) in (17) under the case of a zero reference point, we can see that the term g(\epsilon_p) in (17) is replaced by the term g(\epsilon_{pos}^{p}) - g(\epsilon_{pos}^{n}) in (21). Then the approximated optimal solution \tilde{\epsilon}^*_{pos} under the case of a positive reference point is the root of f_{pos}(\epsilon) = 0 in the feasible set \{\epsilon : g(\epsilon_{pos}^{n}) - g(\epsilon_{pos}^{p}) < W_{max}\}.

U_{pos}'^{a}(\epsilon) = \frac{f_{pos}(\epsilon)}{N^2 \epsilon^3 \left(\frac{W_{max} + g(\epsilon_{pos}^{p}) - g(\epsilon_{pos}^{n})}{W_{max}}\right)^3},   (20)

and

f_{pos}(\epsilon) = \left(\frac{W_{max} + g(\epsilon_{pos}^{p}) - g(\epsilon_{pos}^{n})}{W_{max}}\right)\left(4 + \frac{k\, N\, \epsilon^3 \left(g'(\epsilon_{pos}^{p}) - g'(\epsilon_{pos}^{n})\right)}{l\, W_{max}}\right) - \frac{4\,\epsilon\left(g'(\epsilon_{pos}^{n}) - g'(\epsilon_{pos}^{p})\right)}{W_{max}}.   (21)

We compare the data collector's optimal privacy-preserving mechanism under both the case of a positive reference point and that of a zero reference point. Recall that \tilde{\epsilon}^* is the approximated optimal solution under the case of a zero reference point, i.e., f(\tilde{\epsilon}^*) = 0.

Theorem 3.
Comparing \tilde{\epsilon}^*_{pos} with \tilde{\epsilon}^*, we have the following result:

• When f_{pos}(\tilde{\epsilon}^*) < 0, we have \tilde{\epsilon}^*_{pos} < \tilde{\epsilon}^*.
• When f_{pos}(\tilde{\epsilon}^*) > 0, we have \tilde{\epsilon}^*_{pos} > \tilde{\epsilon}^*.
• When f_{pos}(\tilde{\epsilon}^*) = 0, we have \tilde{\epsilon}^*_{pos} = \tilde{\epsilon}^*.

We provide the proof of Theorem 3 in Appendix D. Theorem 3 indicates that a positive reference point (representing individuals' tolerance of the privacy issue) does not necessarily lead to a less conservative privacy-preserving mechanism. In contrast, the data collector needs to adopt a more conservative mechanism under a certain condition, i.e., f_{pos}(\tilde{\epsilon}^*) < 0, under the case of a positive reference point. More specifically, the condition f_{pos}(\tilde{\epsilon}^*) < 0 holds if the loss aversion parameter λ is relatively small or the risk parameter β is relatively high. Both indicate that individuals are not sensitive to loss. When the reference point ε_{ref} changes from zero to positive, the perceived privacy protection gain from non-participation is more significant and outweighs the loss reduction of participation. So individuals would prefer not to participate. In this case, the data collector needs to enforce a more conservative privacy protection to encourage the individuals to participate.

VI. NUMERICAL RESULTS AND INSIGHTS
In this section, we evaluate the proposed privacy-preserving data collection mechanism from a variety of perspectives, including the approximation performance, the impact of accurate prospect theoretic modeling, and the impact of the Prospect Theory parameters.
A. The Accuracy of the Approximated Solution
First, we compare the optimal solution with and without approximation under different values of the population size N and the parameter β. The result with approximation is calculated according to Theorem 2, and the result without approximation is obtained by an exhaustive search. We change the population size N, and we compare the approximated solution \tilde{\epsilon}^* under β = 1 with the optimal solution ε* under different values of β.

Fig. 4: Comparison between optimal ε* with and without approximation vs. N.

Fig. 5: Optimal ε* under different parameters λ and β for the zero reference point case.

Fig. 6: ε*_pos − ε* under different parameters λ and β.

Fig. 7: Utility loss (%) under different λ and β for the zero reference point case.

Fig. 4 shows that both the approximated and the optimal solutions decrease in N. This is because a larger number of participants can potentially provide a higher accuracy in the data computation, which reduces the data collector's accuracy penalty due to the added noise. To attract more participants, the data collector would prefer to adopt a more conservative privacy-preserving mechanism.

On the other hand, this does not mean that the data collector would include all the individuals. Since the optimal ε* is non-zero, i.e., the privacy protection is imperfect, there may always exist some individuals who are reluctant to participate. The data collector needs to balance the trade-off between the data amount benefit and the accuracy penalty by optimizing the value of ε*.

Comparing the two curves under β = 1, we see that the gap due to the approximation is relatively small, and such a gap decreases in N. The relative difference between \tilde{\epsilon}^* with approximation and ε* without approximation shrinks as N grows, because when N becomes larger, the large population approximation becomes more accurate.

B. The Impact of Prospect Theory Parameters
In this subsection, we study the impact of the Prospect Theory parameters λ, β, and the reference point ε_{ref}. We consider a general truncated normal distribution [50] for the individuals' reward valuation. The uniform distribution used in the theoretical analysis is a special case of the truncated normal distribution when the variance approaches infinity. In this and the later simulations, we use the optimal solution ε* without approximation as the measurement of the privacy-preserving mechanism. We will see that the simulation results match the theoretical results that we obtained through approximation in Section V.

We first focus on the parameters λ and β for the zero reference point case. Fig. 5 shows that the data collector's optimal solution ε* decreases in the loss aversion parameter λ and increases in the risk parameter β. Intuitively, if the loss aversion parameter λ is larger or the parameter β is smaller, individuals would subjectively experience a more serious privacy loss once choosing participation. So the data collector needs to adopt a more conservative privacy-preserving mechanism to encourage individuals' participation.

We then focus on the reference point ε_{ref}. A higher ε_{ref} means that individuals are more tolerant about their privacy loss. We set the reference point ε_{ref} to a positive value. Fig. 6 shows the difference of the optimal solution between the case of a positive reference point and that of a zero reference point (i.e., ε*_pos − ε*) under different values of the parameters λ and β.

We first focus on the bottom right region in Fig. 6, where λ is large and β is small. The difference ε*_pos − ε* is positive in this region, i.e., the data collector offers a less conservative privacy-preserving mechanism with a positive reference point. This is because in this region individuals are very sensitive to loss (see Fig. 1). When ε_{ref} increases from zero, a participant is less likely to experience a loss, hence the subjectively perceived privacy cost from participation significantly decreases. Such a privacy loss tolerance attitude allows the data collector to adopt a less conservative privacy-preserving mechanism.

Next, we consider the upper left region in Fig. 6, where λ is small or β is large. The difference ε*_pos − ε* is negative in this region, i.e., the data collector offers a more conservative privacy-preserving mechanism with a positive reference point. This is because in this region individuals are less sensitive to loss (see Fig. 1). So when ε_{ref} increases from zero, the reduction of the privacy protection loss from participation is less significant compared with the privacy protection gain from non-participation. Hence the data collector needs to enforce a more conservative privacy protection to encourage the individuals to participate.

C. The Impact of Prospect Theoretic Modeling Accuracy
In this subsection, we show the importance of using an accurate prospect theoretic model. In Section V-C, Corollary 1 shows that a data collector should adopt a more conservative privacy-preserving mechanism when considering the prospect theoretic characteristics of individuals. However, if the data collector assumes that individuals make decisions based on EUT (while the actual decisions are made based on Prospect Theory), she can suffer a significant utility loss.

Fig. 7 shows the relative utility loss (normalized by the maximum utility) when this mismatch happens, under different Prospect Theory parameters λ and β for the zero reference point case. We see that the loss increases significantly when the parameters λ and β deviate from the EUT case (which corresponds to λ = 1 and β = 1). For example, the utility loss can be about 14% for the larger values of λ and smaller values of β shown in Fig. 7. This indicates the importance for the data collector of having an accurate estimation of the users' Prospect Theory parameters through extensive data collection and analysis.

VII. IMPACT OF THE HETEROGENEITY OF INDIVIDUALS
Previous analysis and simulations in Sections V and VI are based on the assumption of homogeneous individuals. Here we numerically study a more realistic situation where different individuals may have different behavioral characterizations [51]–[54], and explore the impact of such heterogeneity on the data collector's optimal privacy-preserving mechanism. We will leave the analytical study of the heterogeneous parameter model to future work.

We first want to understand the distribution of the Prospect Theory parameters in real life. The data comes from the literature in the area of psychology and behavioral economics that investigated the Prospect Theory parameters of each subject in the experiments (e.g., [51]–[54]). More specifically, we utilize the experimental results in [52]. Fig. 8 shows the empirical probability density of the parameters β and λ based on the data from [52].

The experiments are based on subjects' reactions under monetary reward, instead of privacy protection. Monetary reward is widely used in psychological experiments and the corresponding literature. The purpose of utilizing these reported data in the literature is to provide a relatively realistic context in terms of the Prospect Theory parameter choices, instead of randomly generating these parameters. Our theoretical results apply for any Prospect Theory parameter settings, and it is important future work to perform field studies to understand the actual parameters in privacy-preserving contexts for populations of different age, sex, education background, and countries.
Fig. 8: Probability density of the parameters λ and β based on the data from [52]. (a) λ ∼ Gamma(3.24, 0.60). (b) β ∼ Gamma(12.87, 0.06).

We perform data fitting using the chi-squared test [55], and we conclude that the parameter λ follows a Gamma distribution with a shape parameter k_λ = 3.24 and a scale parameter θ_λ = 0.60, and that the parameter β follows a Gamma distribution with a shape parameter k_β = 12.87 and a scale parameter θ_β = 0.06.

Utilizing the above empirical data, we will study the impact of the heterogeneity of the parameter λ. We generate this parameter through the Gamma distribution with a fixed mean (µ_λ = k_λ × θ_λ = 1.94) based on Fig. 8a and different values of the variance. We fix the other parameters, including the parameter β = k_β × θ_β = 0.77 and the reference point ε_{ref} = 0.

The mean of a Gamma distributed random variable is the product of the shape parameter k and the scale parameter θ, i.e., k · θ.

Fig. 9: Impact of the variance of λ on the optimal ε*.

Fig. 10: Participation under different variances of λ. (a) Var(λ) = 0.1952. (b) Var(λ) = 0.7807. (c) Var(λ) = 2.3421.

Fig. 9 shows how the data collector's optimal ε* changes with the variance of λ. We can see that the optimal ε* first decreases in the variance and then increases in the variance. To better understand such an impact, we visualize individuals' participation in Fig. 10 given the optimal ε* under different variances of λ (in an increasing order). Those with a lower value of λ (less loss aversion) would participate, and the corresponding threshold changes with the variance.

Combining Fig. 9 and Fig. 10, we are able to illustrate the intuition. At a low diversity level (small variance, such as in Fig. 10a), most individuals' λ values are close to the mean. Comparing Fig. 10a (small variance) and Fig. 10b (medium variance), we can see that as the variance becomes larger, there are more individuals with λ values further away from the mean. To attract those individuals with λ values higher than the mean, the data collector needs to adopt a more conservative privacy-preserving mechanism when the variance increases. A more conservative mechanism corresponds to a larger participation threshold (3.13) in Fig. 10b than that (2.82) in Fig. 10a.

However, at a high diversity level (as in Fig. 10c), individuals' λ values spread over a big range. Comparing Fig. 10b (medium variance) and Fig. 10c (large variance), we can see that as the variance becomes very large, more individuals have very high λ values. Those individuals are more loss averse and are difficult to motivate to participate. In this case, a more conservative mechanism to attract those individuals would result in a larger accuracy penalty due to a larger variance of the added noise. Instead, it would be better for the data collector to ignore those with very high λ values and to consider a relatively less conservative mechanism. A less conservative mechanism corresponds to a smaller participation threshold (2.95) in Fig. 10c than that (3.13) in Fig. 10b.

Fig. 11: Impact of the variance of β on the optimal ε*.

We also study the impact of the heterogeneity of the parameter β.
We also study the impact of the heterogeneity of the parameter β. Similarly, we generate the parameter β from a Gamma distribution with a fixed mean (μ_β = k_β × θ_β ≈ 0.77) based on Fig. 8b and different values of the variance. We fix λ = k_λ × θ_λ ≈ 1.94 and the same fixed reference point ε_ref. Fig. 11 shows the optimal ε* under different variances of β. The pattern is similar to that in Fig. 9: the optimal ε* first decreases and then increases in the variance of β. The corresponding insights are the same as those for the parameter λ.

[Fig. 11: Impact of the variance of β on the optimal ε*.]

VIII. CONCLUSION
In this paper, we analyzed a privacy-preserving data collection problem under privacy protection uncertainty. To the best of our knowledge, this is the first theoretical study of the application of Prospect Theory in the area of privacy protection. We demonstrated the importance of realistic and accurate individual decision modeling to privacy-preserving mechanism design. Considering the loss and risk attitudes captured by prospect theoretic modeling, the data collector should adopt a more conservative privacy-preserving mechanism compared with the one derived from expected utility theory modeling. Moreover, a more tolerant attitude towards privacy loss induced by a positive reference point does not always lead to a less conservative mechanism.

For future work, we will consider the case where participants can misreport their data. For example, a participant may wish to protect his privacy on his own by reporting a noisy version of his data. Considering both the risk aversion and loss aversion of the participants, the data collector then needs to design an incentive mechanism that effectively induces truthful reporting from the users.
REFERENCES

[1] G. Liao, X. Chen, and J. Huang, "Optimal privacy-preserving data collection: A prospect theory perspective," in Proc. IEEE GLOBECOM, 2017.
[2] "https://en.wikipedia.org/wiki/Facebook%E2%80%93Cambridge_Analytica_data_scandal."
[3] A. Acquisti and J. Grossklags, "What can behavioral economics teach us about privacy," Digital Privacy: Theory, Technologies and Practices, vol. 18, pp. 363–377, 2007.
[4] ——, "Privacy and rationality in individual decision making," IEEE Security & Privacy, vol. 3, no. 1, pp. 26–33, 2005.
[5] D. Kahneman and A. Tversky, "Prospect theory: An analysis of decision under risk," Econometrica, vol. 47, no. 2, pp. 263–291, 1979.
[6] D. Prelec, "The probability weighting function," Econometrica, vol. 66, no. 3, pp. 497–527, 1998.
[7] A. Tversky and D. Kahneman, "Advances in prospect theory: Cumulative representation of uncertainty," Journal of Risk and Uncertainty, vol. 5, no. 4, pp. 297–323, 1992.
[8] T. Tanaka, C. F. Camerer, and Q. Nguyen, "Risk and time preferences: Linking experimental and household survey data from Vietnam," American Economic Review, vol. 100, no. 1, pp. 557–571, 2010.
[9] E. K. Zervoudi, "Value functions for prospect theory investors: An empirical evaluation for US style portfolios," Journal of Behavioral Finance, vol. 19, no. 3, pp. 319–333, 2018.
[10] T. Li and N. B. Mandayam, "When users interfere with protocols: Prospect theory in wireless networks using random access and data pricing as an example," IEEE Transactions on Wireless Communications, vol. 13, no. 4, pp. 1888–1907, 2014.
[11] Y. Yang, L. T. Park, N. B. Mandayam, I. Seskar, A. L. Glass, and N. Sinha, "Prospect pricing in cognitive radio networks," IEEE Transactions on Cognitive Communications and Networking, vol. 1, no. 1, pp. 56–70, 2015.
[12] Y. Wang, W. Saad, N. B. Mandayam, and H. V. Poor, "Integrating energy storage into the smart grid: A prospect theoretic approach," in Proc. IEEE ICASSP, 2014.
[13] L. Xiao, N. B. Mandayam, and H. V. Poor, "Prospect theoretic analysis of energy exchange among microgrids," IEEE Transactions on Smart Grid, vol. 6, no. 1, pp. 63–72, 2015.
[14] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Theory of Cryptography Conference, 2006.
[15] A. Ghosh and K. Ligett, "Privacy and coordination: Computing on databases with endogenous participation," in Proc. ACM Conference on Electronic Commerce, 2013.
[16] L. K. Fleischer and Y.-H. Lyu, "Approximately optimal auctions for selling privacy when costs are correlated with data," in Proc. ACM Conference on Electronic Commerce, 2012.
[17] A. Ghosh and A. Roth, "Selling privacy at auction," Games and Economic Behavior, vol. 91, pp. 334–346, 2015.
[18] A. Ghosh, K. Ligett, A. Roth, and G. Schoenebeck, "Buying private data without verification," in Proc. ACM EC, 2014.
[19] W. Wang, L. Ying, and J. Zhang, "The value of privacy: Strategic data subjects, incentive mechanisms, and fundamental limits," ACM Transactions on Economics and Computation, vol. 6, no. 2, p. 8, 2018.
[20] ——, "A game-theoretic approach to quality control for collecting privacy-preserving data," in Proc. IEEE Annual Allerton Conference on Communication, Control, and Computing, Sept. 2015.
[21] M. Chessa, J. Grossklags, and P. Loiseau, "A game-theoretic study on non-monetary incentives in data analytics projects with privacy implications," in Proc. IEEE CSF, 2015.
[22] S. Ioannidis and P. Loiseau, "Linear regression as a non-cooperative game," in International Conference on Web and Internet Economics, 2013.
[23] H. Jin, L. Su, H. Xiao, and K. Nahrstedt, "Inception: Incentivizing privacy-preserving data aggregation for mobile crowd sensing systems," in Proc. ACM MobiHoc, 2016.
[24] J. Lin, D. Yang, M. Li, J. Xu, and G. Xue, "Bidguard: A framework for privacy-preserving crowdsensing incentive mechanisms," in Proc. IEEE CNS, 2016.
[25] Z. Zhang, S. He, J. Chen, and J. Zhang, "REAP: An efficient incentive mechanism for reconciling aggregation accuracy and individual privacy in crowdsensing," IEEE Transactions on Information Forensics and Security, vol. 13, no. 12, pp. 2995–3007, 2018.
[26] Z. Zhang, H. Zhang, S. He, and P. Cheng, "Bilateral privacy-preserving utility maximization protocol in database-driven cognitive radio networks," IEEE Transactions on Dependable and Secure Computing (Early Access), 2017.
[27] J. Pawlick and Q. Zhu, "A mean-field Stackelberg game approach for obfuscation adoption in empirical risk minimization," in Proc. IEEE GlobalSIP, 2017.
[28] G. Liao, X. Chen, and J. Huang, "Social-aware privacy-preserving correlated data collection," in Proc. ACM MobiHoc, 2018.
[29] J. Yu, M. H. Cheung, and J. Huang, "Spectrum investment under uncertainty: A behavioral economics perspective," IEEE Journal on Selected Areas in Communications, vol. 34, no. 10, pp. 2667–2677, 2016.
[30] L. Xiao, D. Xu, C. Xie, N. B. Mandayam, and H. V. Poor, "Cloud storage defense against advanced persistent threats: A prospect theoretic study," IEEE Journal on Selected Areas in Communications, vol. 35, no. 3, pp. 534–544, 2017.
[31] L. Xiao, J. Liu, Q. Li, N. B. Mandayam, and H. V. Poor, "User-centric view of jamming games in cognitive radio networks," IEEE Transactions on Information Forensics and Security, vol. 10, no. 12, pp. 2578–2590, 2015.
[32] L. Xiao, C. Xie, M. Min, and W. Zhuang, "User-centric view of unmanned aerial vehicle transmission against smart attacks," IEEE Transactions on Vehicular Technology, vol. 67, no. 4, pp. 3420–3430, 2018.
[33] W. Saad, A. Sanjab, Y. Wang, C. A. Kamhoua, and K. A. Kwiat, "Hardware trojan detection game: A prospect-theoretic approach," IEEE Transactions on Vehicular Technology, vol. 66, no. 9, pp. 7697–7710, 2017.
[34] G. El Rahi, S. R. Etesami, W. Saad, N. Mandayam, and H. V. Poor, "Managing price uncertainty in prosumer-centric energy trading: A prospect-theoretic Stackelberg game approach," IEEE Transactions on Smart Grid, vol. 10, no. 1, pp. 702–713, 2019.
[35] B. Lu, C. Zhao, and S. Zhang, "Allocation strategy based on prospect theory for power supplier," in Proc. IEEE International Conference on DRPT, 2015.
[36] D. Easley and A. Ghosh, "Behavioral mechanism design: Optimal crowdsourcing contracts and prospect theory," in Proc. ACM EC, 2015.
[37] C.-H. Lee, "Prospect theoretic user satisfaction in wireless communications networks," in Proc. IEEE WOCC, 2015.
[38] A. Sanjab, W. Saad, and T. Basar, "Prospect theory for enhanced cyber-physical security of drone delivery systems: A network interdiction game," in Proc. IEEE ICC, 2017.
[39] G. El Rahi, A. Sanjab, W. Saad, N. B. Mandayam, and H. V. Poor, "Prospect theory for enhanced smart grid resilience using distributed energy storage," in Proc. IEEE Annual Allerton Conference on Communication, Control, and Computing, 2016.
[40] S. R. Etesami, W. Saad, N. B. Mandayam, and H. V. Poor, "Stochastic games for the smart grid energy management with prospect prosumers," IEEE Transactions on Automatic Control, vol. 63, no. 8, pp. 2327–2342, 2018.
[41] O. Morgenstern and J. Von Neumann, Theory of Games and Economic Behavior. Princeton University Press, 1953.
[42] C. Dwork and A. Roth, "The algorithmic foundations of differential privacy," Foundations and Trends in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
[43] M. O. Rieger and M. Wang, "Prospect theory for continuous distributions," Journal of Risk and Uncertainty, vol. 36, no. 1, pp. 83–102, 2008.
[44] Alibaba User Experience Survey, https://survey.alibaba.com/survey/kwRcnFft3.
[45] P. Domingos, "A few useful things to know about machine learning," Communications of the ACM, vol. 55, no. 10, pp. 78–87, 2012.
[46] D. Niyato, M. A. Alsheikh, P. Wang, D. I. Kim, and Z. Han, "Market model and optimal pricing scheme of big data and internet of things (IoT)," in Proc. IEEE ICC, 2016.
[47] A. Ghosh, T. Roughgarden, and M. Sundararajan, "Universally utility-maximizing privacy mechanisms," SIAM Journal on Computing, vol. 41, no. 6, pp. 1673–1693, 2012.
[48] M. Gupte and M. Sundararajan, "Universally optimal privacy mechanisms for minimax agents," in Proc. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2010.
[49] K. E. Atkinson, An Introduction to Numerical Analysis. John Wiley & Sons, 2008.
[50] "https://en.wikipedia.org/wiki/Truncated_normal_distribution."
[51] M. Abdellaoui, O. L'Haridon, and C. Paraschiv, "Experienced vs. described uncertainty: Do we need two prospect theory specifications?" Management Science, vol. 57, no. 10, pp. 1879–1895, 2011.
[52] M. Abdellaoui, H. Bleichrodt, and O. l'Haridon, "A tractable method to measure utility and loss aversion under prospect theory," Journal of Risk and Uncertainty, vol. 36, no. 3, p. 245, 2008.
[53] M. Abdellaoui, H. Bleichrodt, and C. Paraschiv, "Loss aversion under prospect theory: A parameter-free measurement," Management Science, vol. 53, no. 10, pp. 1659–1674, 2007.
[54] H. Fehr-Duda, M. De Gennaro, and R. Schubert, "Gender, financial risk, and probability weights," Theory and Decision, vol. 60, no. 2-3, pp. 283–313, 2006.
[55] A. M. Law, W. D. Kelton, and W. D. Kelton, Simulation Modeling and Analysis. McGraw-Hill, New York, 2007.
Guocheng Liao received the B.E. degree from Sun Yat-sen University in 2016. He is now pursuing the Ph.D. degree with the Department of Information Engineering, The Chinese University of Hong Kong. His current research interests include data privacy and game theory. He received the 2017 IEEE GLOBECOM ComSoc Young Professional Best Paper Award.

Xu Chen received the Ph.D. degree in information engineering from the Chinese University of Hong Kong in 2012. He is a Full Professor with Sun Yat-sen University, Guangzhou, China, and the Vice Director of the National and Local Joint Engineering Laboratory of Digital Home Interactive Applications. He was a Post-Doctoral Research Associate with Arizona State University, Tempe, USA, from 2012 to 2014, and a Humboldt Scholar Fellow with the Institute of Computer Science, University of Goettingen, Germany, from 2014 to 2016. He was a recipient of the prestigious Humboldt Research Fellowship awarded by the Alexander von Humboldt Foundation of Germany, the 2014 Hong Kong Young Scientist Runner-Up Award, the 2016 Thousand Talents Plan Award for Young Professionals of China, the 2017 IEEE Communications Society Asia-Pacific Outstanding Young Researcher Award, the 2017 IEEE ComSoc Young Professional Best Paper Award, the Honorable Mention Award of the 2010 IEEE International Conference on Intelligence and Security Informatics, the Best Paper Runner-Up Award of the 2014 IEEE International Conference on Computer Communications (INFOCOM), and the Best Paper Award of the 2017 IEEE International Conference on Communications. He is currently an Area Editor of IEEE Open Journal of the Communications Society, and an Associate Editor of IEEE Transactions on Wireless Communications, IEEE Internet of Things Journal, and IEEE Journal on Selected Areas in Communications (JSAC) Series on Network Softwarization and Enablers.

Jianwei Huang is a Presidential Chair Professor and the Associate Dean of the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen. He is also the Associate Director of the Shenzhen Institute of Artificial Intelligence and Robotics for Society, and a Professor in the Department of Information Engineering, The Chinese University of Hong Kong. He received the Ph.D. degree from Northwestern University in 2005, and worked as a Postdoc Research Associate at Princeton University during 2005-2007. He has been an IEEE Fellow, a Distinguished Lecturer of the IEEE Communications Society, and a Clarivate Analytics Highly Cited Researcher in Computer Science. He is the co-author of 9 Best Paper Awards, including the IEEE Marconi Prize Paper Award in Wireless Communications in 2011. He has co-authored six books, including the textbook "Wireless Network Pricing." He received the CUHK Young Researcher Award in 2014 and the IEEE ComSoc Asia-Pacific Outstanding Young Researcher Award in 2009. He has served as an Associate Editor of IEEE Transactions on Mobile Computing, IEEE/ACM Transactions on Networking, IEEE Transactions on Network Science and Engineering, IEEE Transactions on Wireless Communications, IEEE Journal on Selected Areas in Communications - Cognitive Radio Series, and IEEE Transactions on Cognitive Communications and Networking. He has served as an Editor of the Wiley Information and Communication Technology Series, Springer Encyclopedia of Wireless Networks, and Springer Handbook of Cognitive Radio. He has served as the Chair of the IEEE ComSoc Cognitive Network Technical Committee and Multimedia Communications Technical Committee. He is the Associate Editor-in-Chief of IEEE Open Journal of the Communications Society. He is the recipient of the IEEE ComSoc Multimedia Communications Technical Committee Distinguished Service Award in 2015 and the IEEE GLOBECOM Outstanding Service Award in 2010. More detailed information can be found at http://jianwei.ie.cuhk.edu.hk/.

APPENDIX
A. Proof of Theorem 2
The proof consists of three steps. We first derive the number of participants in Stage II as a function of $\epsilon$. We then obtain the data collector's utility as a function of $\epsilon$ and its derivative. We finally approximate the derivative and obtain the unique root in the feasible set, which is the approximated optimal solution.
Step 1: Based on (10) and (11), we compute the number of participants among all individuals as a function of $\epsilon$:
$$ n(\epsilon)=\begin{cases} N\,\dfrac{W_{\max}+g(\epsilon^{p})}{W_{\max}}, & \text{if } -g(\epsilon^{p})<W_{\max},\\ 0, & \text{otherwise}. \end{cases} \qquad (22) $$
Recall that $N$ is the number of all individuals. Here $g(\epsilon^{p})=-M\epsilon^{\beta}$ based on (5), where $M=\frac{c\lambda}{m}\sum_{i=1}^{m}\left(\frac{i}{m}\right)^{\beta}$. Since $\beta=1$, we have $g(\epsilon^{p})=-M\epsilon$ with $M=\frac{c\lambda(m+1)}{2m}$.

Step 2: We derive the data collector's utility as a function of $\epsilon$ and its derivative. We consider the nontrivial case $-g(\epsilon^{p})<W_{\max}$, so that there always exist some individuals who are willing to participate. This gives the feasible set
$$ \{\epsilon:\,-g(\epsilon^{p})<W_{\max}\}. \qquad (23) $$
Recall that $\epsilon^{p}$ is the prospect privacy level of participation under the $\epsilon$-differentially private mechanism. The data collector's utility function and its derivative are
$$ U_{c}(\epsilon)=-\frac{k}{1+lN\frac{W_{\max}+g(\epsilon^{p})}{W_{\max}}}-\frac{\sigma^{2}}{\left(N\epsilon\,\frac{W_{\max}+g(\epsilon^{p})}{W_{\max}}\right)^{2}}, \qquad (24) $$
and
$$ U_{c}'(\epsilon)=\frac{klN\,g'(\epsilon^{p})/W_{\max}}{\left[1+lN\frac{W_{\max}+g(\epsilon^{p})}{W_{\max}}\right]^{2}}+\frac{2\sigma^{2}}{N^{2}}\cdot\frac{\big(W_{\max}+g(\epsilon^{p})+g'(\epsilon^{p})\epsilon\big)/W_{\max}}{\left[\frac{W_{\max}+g(\epsilon^{p})}{W_{\max}}\right]^{3}\epsilon^{3}}. \qquad (25) $$

Step 3: We approximate the derivative to find the approximated optimal solution. Considering a large population size $N$, we approximate $1+lN\big(W_{\max}+g(\epsilon^{p})\big)/W_{\max}$ by $lN\big(W_{\max}+g(\epsilon^{p})\big)/W_{\max}$. Then we obtain an approximated (denoted by the superscript $a$) version of (25):
$$ U_{c}'^{\,a}(\epsilon)=\frac{2\sigma^{2}}{N^{2}}\left(\frac{W_{\max}+g(\epsilon^{p})}{W_{\max}}\right)^{-3}\epsilon^{-3}\,f(\epsilon). \qquad (26) $$
Here
$$ f(\epsilon)=\left(\frac{W_{\max}+g(\epsilon^{p})}{W_{\max}}\right)\left(1+\frac{kN}{2l\sigma^{2}}\,\frac{\epsilon^{3}g'(\epsilon^{p})}{W_{\max}}\right)+\frac{\epsilon\,g'(\epsilon^{p})}{W_{\max}}, \qquad (27) $$
where $g(\epsilon^{p})=-M\epsilon$ and $g'(\epsilon^{p})$ is the derivative, i.e., $g'(\epsilon^{p})=-M$.

Since $W_{\max}+g(\epsilon^{p})>0$ in the feasible set (23), computing the root of $U_{c}'^{\,a}=0$ in (26) is equivalent to computing the root of the polynomial part $f(\epsilon)=0$. With $g(\epsilon^{p})=-M\epsilon$, the equation $f(\epsilon)=0$ is a quartic in $\epsilon$ and has two real roots,
$$ \epsilon_{L}=\frac{W_{\max}}{4M}+r_{2}-r_{1}, \qquad \epsilon_{H}=\frac{W_{\max}}{4M}+r_{2}+r_{1}, \qquad (28) $$
where $r_{1}$ and $r_{2}$ are nested-radical expressions in $W_{\max}$, $M$, and the constants in (27), obtained from the standard closed-form solution of a quartic equation.

Next, we show that there is a unique root of $f(\epsilon)=0$ in the feasible set $(0, W_{\max}/M)$ (which is equivalent to (23)), while the other root is not in the feasible set. First, the function $f(\epsilon)$ is continuous, $f(0)\cdot f(W_{\max}/M)<0$, and $f(W_{\max}/M)\cdot f(+\infty)<0$. This implies that the equation $f(\epsilon)=0$ has at least one root in $(0, W_{\max}/M)$ and at least one root in $(W_{\max}/M,+\infty)$. Together with (28), we know that the unique root in $(0, W_{\max}/M)$ is $\epsilon_{L}$. □
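As a numerical companion to Step 3, the following minimal Python sketch finds the unique root of the first-order condition $f(\epsilon)=0$ in the feasible interval $(0, W_{\max}/M)$ by Brent's method. It assumes the form of $f$ reconstructed in (27); the parameter values, including the lumped constant standing in for $kN/(2l\sigma^{2})$, are illustrative placeholders rather than the paper's calibration.

```python
from scipy.optimize import brentq

# Illustrative parameter values only (not the paper's calibration).
W_max = 1.0                      # maximum benefit of participation
c, lam, m = 1.0, 1.94, 10        # privacy cost scale, loss aversion, number of outcome levels
C_lump = 5.0e4                   # lumped constant standing in for k*N/(2*l*sigma^2) in (27)

M = c * lam * (m + 1) / (2 * m)  # slope of the privacy cost when beta = 1 (Step 1)
g = lambda eps: -M * eps         # privacy cost g(eps^p)
g_prime = lambda eps: -M         # its derivative g'(eps^p)

def f(eps):
    """Polynomial part of the approximated derivative, following the reconstructed (27)."""
    h = (W_max + g(eps)) / W_max
    return h * (1.0 + C_lump * eps**3 * g_prime(eps) / W_max) + eps * g_prime(eps) / W_max

# f(0+) > 0 and f(W_max/M) < 0, so the unique feasible root lies inside (0, W_max/M).
eps_star = brentq(f, 1e-12, W_max / M - 1e-12)
print(f"approximated optimal epsilon: {eps_star:.6g}")
```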
B. Proof of Corollary 1

The proof consists of two steps. We first obtain the approximated optimal solution under the EUT case, similarly to the PT case. We then compare the approximated optimal solutions of the two cases.
Step 1: We obtain the approximated optimal solution under the EUT case similarly to the PT case. Recall that when $\lambda=1$ and $\beta=1$, (4) reduces to the EUT representation. We denote the privacy level of participation with the EUT representation under the $\epsilon$-differentially private mechanism as $\epsilon^{e}$ (we use this to characterize the feasible set later, for ease of presentation). The privacy cost of participation for the EUT case is given by $g(\epsilon^{e})=-M_{e}\epsilon$, where $M_{e}=\frac{c(m+1)}{2m}$ based on (5). Let
$$ f_{e}(\epsilon)=\left(\frac{W_{\max}+g(\epsilon^{e})}{W_{\max}}\right)\left(1+\frac{kN}{2l\sigma^{2}}\,\frac{\epsilon^{3}g'(\epsilon^{e})}{W_{\max}}\right)+\frac{\epsilon\,g'(\epsilon^{e})}{W_{\max}} \qquad (29) $$
be the polynomial part of the approximated derivative. Here $g'(\epsilon^{e})$ is the derivative, i.e., $g'(\epsilon^{e})=-M_{e}$. Let $\tilde{\epsilon}^{*}_{e}$ be the root of $f_{e}(\epsilon)=0$ in the feasible set $\{\epsilon:\,-g(\epsilon^{e})<W_{\max}\}$, i.e., the approximated optimal solution for the EUT case.

Step 2: We compare the approximated optimal solution $\tilde{\epsilon}^{*}_{e}$ for the EUT case with $\tilde{\epsilon}^{*}$ for the general PT case (excluding the EUT case). Recall from the proof of Theorem 2 that
$$ f(\epsilon)=\left(\frac{W_{\max}+g(\epsilon^{p})}{W_{\max}}\right)\left(1+\frac{kN}{2l\sigma^{2}}\,\frac{\epsilon^{3}g'(\epsilon^{p})}{W_{\max}}\right)+\frac{\epsilon\,g'(\epsilon^{p})}{W_{\max}} $$
and that $\tilde{\epsilon}^{*}$ is the root of $f(\epsilon)=0$. Based on the difference between the PT modeling and the EUT modeling, we have
$$ f(\tilde{\epsilon}^{*}_{e})=\left(\frac{W_{\max}+g(\epsilon^{p})}{W_{\max}}\right)\left(1+\frac{kN}{2l\sigma^{2}}\,\frac{(\tilde{\epsilon}^{*}_{e})^{3}g'(\epsilon^{p})}{W_{\max}}\right)+\frac{\tilde{\epsilon}^{*}_{e}\,g'(\epsilon^{p})}{W_{\max}} < \left(\frac{W_{\max}+g(\epsilon^{e})}{W_{\max}}\right)\left(1+\frac{kN}{2l\sigma^{2}}\,\frac{(\tilde{\epsilon}^{*}_{e})^{3}g'(\epsilon^{e})}{W_{\max}}\right)+\frac{\tilde{\epsilon}^{*}_{e}\,g'(\epsilon^{e})}{W_{\max}} = f_{e}(\tilde{\epsilon}^{*}_{e})=0. \qquad (30) $$
The inequality holds for the following reasons. First, we have
$$ g(\epsilon^{p})<g(\epsilon^{e}) \;\Leftrightarrow\; -\lambda c\,\epsilon^{\beta} < -c\,\epsilon. \qquad (31) $$
That is, the absolute value of the privacy cost under the general Prospect Theory modeling is larger than that under the EUT modeling. Second, we have
$$ g'(\epsilon^{p})<g'(\epsilon^{e}) \;\Leftrightarrow\; -\lambda c\beta\,\epsilon^{\beta-1} < -c. \qquad (32) $$
That is, the absolute value of the privacy cost increases faster under the general Prospect Theory modeling than under the EUT modeling.

Since $f(0)>0$ and $f(\tilde{\epsilon}^{*}_{e})<0$, we have $f(0)\cdot f(\tilde{\epsilon}^{*}_{e})<0$. So $\tilde{\epsilon}^{*}$, the root of $f(\epsilon)=0$, lies in the interval $(0,\tilde{\epsilon}^{*}_{e})$, i.e., $\tilde{\epsilon}^{*}<\tilde{\epsilon}^{*}_{e}$. This completes the proof. □
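Corollary 1 can be spot-checked numerically with the same root-finding idea. The sketch below compares the root obtained with PT parameters ($\lambda>1$, $\beta<1$) against the EUT parameters ($\lambda=\beta=1$), again using the assumed form of $f$ and purely illustrative constants (the lumped constant and all numbers are hypothetical).

```python
from scipy.optimize import brentq

W_max, c, m = 1.0, 1.0, 10
C_lump = 5.0e4                           # stands in for k*N/(2*l*sigma^2); illustrative only

def opt_eps(lam, beta):
    """Root of the assumed polynomial part f(eps) = 0 in the feasible set
    (0, (W_max / M)**(1/beta)), for given loss aversion lam and sensitivity beta."""
    M = c * lam * sum((i / m) ** beta for i in range(1, m + 1)) / m
    def f(eps):
        g, gp = -M * eps**beta, -beta * M * eps ** (beta - 1.0)
        h = (W_max + g) / W_max
        return h * (1.0 + C_lump * eps**3 * gp / W_max) + eps * gp / W_max
    return brentq(f, 1e-9, (W_max / M) ** (1.0 / beta) - 1e-9)

eps_pt = opt_eps(lam=1.94, beta=0.77)    # Prospect Theory individuals
eps_eut = opt_eps(lam=1.0, beta=1.0)     # Expected Utility Theory individuals
print(eps_pt, eps_eut, eps_pt < eps_eut) # PT leads to a smaller, more conservative epsilon*
```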
C. Proof of Corollary 2

The proof consists of two steps. We first characterize the approximated optimal solution under each case. We then compare the approximated optimal solutions of the two cases.
Step 1: We characterize the approximated optimal solution through the approximated derivative of the objective function under each case. Consider two cases, $\lambda=\lambda_{1}$ and $\lambda=\lambda_{2}$, with $\lambda_{1}<\lambda_{2}$. Let
$$ f_{\lambda_{1}}(\epsilon)=\left(\frac{W_{\max}+g(\epsilon^{\lambda_{1}})}{W_{\max}}\right)\left(1+\frac{kN}{2l\sigma^{2}}\,\frac{\epsilon^{3}g'(\epsilon^{\lambda_{1}})}{W_{\max}}\right)+\frac{\epsilon\,g'(\epsilon^{\lambda_{1}})}{W_{\max}} \qquad (33) $$
be the corresponding polynomial part of the approximated derivative for the case $\lambda=\lambda_{1}$, where the privacy cost is $g(\epsilon^{\lambda_{1}})=-\lambda_{1}c\,\frac{m+1}{2m}\,\epsilon^{\beta}$. Let
$$ f_{\lambda_{2}}(\epsilon)=\left(\frac{W_{\max}+g(\epsilon^{\lambda_{2}})}{W_{\max}}\right)\left(1+\frac{kN}{2l\sigma^{2}}\,\frac{\epsilon^{3}g'(\epsilon^{\lambda_{2}})}{W_{\max}}\right)+\frac{\epsilon\,g'(\epsilon^{\lambda_{2}})}{W_{\max}} \qquad (34) $$
be the corresponding polynomial part of the approximated derivative for the case $\lambda=\lambda_{2}$, where the privacy cost is $g(\epsilon^{\lambda_{2}})=-\lambda_{2}c\,\frac{m+1}{2m}\,\epsilon^{\beta}$.

Let $\tilde{\epsilon}^{*}_{\lambda_{1}}$ be the root of $f_{\lambda_{1}}(\epsilon)=0$ and $\tilde{\epsilon}^{*}_{\lambda_{2}}$ be the root of $f_{\lambda_{2}}(\epsilon)=0$. Then $\tilde{\epsilon}^{*}_{\lambda_{1}}$ and $\tilde{\epsilon}^{*}_{\lambda_{2}}$ are the approximated optimal solutions for the cases $\lambda=\lambda_{1}$ and $\lambda=\lambda_{2}$, respectively.

Step 2: We compare the approximated optimal solutions of the two cases by comparing the polynomial parts of the approximated derivatives. Based on the difference in the PT characteristics between the two cases, we have
$$ f_{\lambda_{2}}(\tilde{\epsilon}^{*}_{\lambda_{1}}) = \left(\frac{W_{\max}+g(\epsilon^{\lambda_{2}})}{W_{\max}}\right)\left(1+\frac{kN}{2l\sigma^{2}}\,\frac{(\tilde{\epsilon}^{*}_{\lambda_{1}})^{3}g'(\epsilon^{\lambda_{2}})}{W_{\max}}\right)+\frac{\tilde{\epsilon}^{*}_{\lambda_{1}}\,g'(\epsilon^{\lambda_{2}})}{W_{\max}} < \left(\frac{W_{\max}+g(\epsilon^{\lambda_{1}})}{W_{\max}}\right)\left(1+\frac{kN}{2l\sigma^{2}}\,\frac{(\tilde{\epsilon}^{*}_{\lambda_{1}})^{3}g'(\epsilon^{\lambda_{1}})}{W_{\max}}\right)+\frac{\tilde{\epsilon}^{*}_{\lambda_{1}}\,g'(\epsilon^{\lambda_{1}})}{W_{\max}} = f_{\lambda_{1}}(\tilde{\epsilon}^{*}_{\lambda_{1}})=0. \qquad (35) $$
The inequality holds for the following reasons. First, we have
$$ g(\epsilon^{\lambda_{2}})<g(\epsilon^{\lambda_{1}}) \;\Leftrightarrow\; -\lambda_{2}c\,\frac{m+1}{2m}\,\epsilon^{\beta} < -\lambda_{1}c\,\frac{m+1}{2m}\,\epsilon^{\beta}. \qquad (36) $$
Second, we have
$$ g'(\epsilon^{\lambda_{2}})<g'(\epsilon^{\lambda_{1}}) \;\Leftrightarrow\; -\beta\lambda_{2}c\,\frac{m+1}{2m}\,\epsilon^{\beta-1} < -\beta\lambda_{1}c\,\frac{m+1}{2m}\,\epsilon^{\beta-1}. \qquad (37) $$
Since $f_{\lambda_{2}}(0)>0$ and $f_{\lambda_{2}}(\tilde{\epsilon}^{*}_{\lambda_{1}})<0$, we have $f_{\lambda_{2}}(0)\cdot f_{\lambda_{2}}(\tilde{\epsilon}^{*}_{\lambda_{1}})<0$. So $\tilde{\epsilon}^{*}_{\lambda_{2}}$, the root of $f_{\lambda_{2}}(\epsilon)=0$, lies in the interval $(0,\tilde{\epsilon}^{*}_{\lambda_{1}})$, i.e., $\tilde{\epsilon}^{*}_{\lambda_{2}}<\tilde{\epsilon}^{*}_{\lambda_{1}}$. Thus, the approximated optimal solution $\tilde{\epsilon}^{*}$ decreases in the parameter $\lambda$. This completes the proof. □
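The monotonicity claimed in Corollary 2 can likewise be spot-checked with a short sweep over $\lambda$, mirroring the previous sketch; the expanded form of $f$ used below is the same assumed polynomial, and all constants remain hypothetical placeholders.

```python
from scipy.optimize import brentq

W_max, c, m, beta = 1.0, 1.0, 10, 0.77
C_lump = 5.0e4                           # stands in for k*N/(2*l*sigma^2); illustrative only

def opt_eps(lam):
    # Root of the assumed polynomial part f(eps) = 0 in the feasible set, for loss aversion lam.
    M = c * lam * sum((i / m) ** beta for i in range(1, m + 1)) / m
    f = lambda e: (1 - M * e**beta / W_max) * (1 - C_lump * beta * M * e**(beta + 2) / W_max) \
        - beta * M * e**beta / W_max
    return brentq(f, 1e-9, (W_max / M) ** (1.0 / beta) - 1e-9)

for lam in (1.5, 2.0, 2.5, 3.0):
    print(lam, opt_eps(lam))             # the optimal epsilon* shrinks as loss aversion grows
```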
D. Proof of Theorem 3

The proof consists of two steps. We first obtain the approximated optimal solution under the case of a positive reference point. We then compare the approximated optimal solutions of the two cases.
Step 1: We characterize the approximated optimal solution through the approximated derivative of the objective function under the case of a positive reference point. The prospect privacy level of participation based on (5) is as follows (with subscript $pos$ and superscript $p$):
$$ \epsilon^{p}_{pos}=\frac{t}{m}\sum_{i=1}^{t}\left(\epsilon_{ref}-\frac{i}{m}\epsilon\right)^{\beta}-\left(1-\frac{t}{m}\right)\lambda\sum_{i=t+1}^{m}\left(\frac{i}{m}\epsilon-\epsilon_{ref}\right)^{\beta}, \qquad (38) $$
for $t$ satisfying $\frac{t}{m}\epsilon<\epsilon_{ref}$ and $\frac{t+1}{m}\epsilon>\epsilon_{ref}$. The first summation term corresponds to the gain part, and the second summation term corresponds to the loss part.

The prospect privacy level of non-participation is as follows (with subscript $pos$ and superscript $n$):
$$ \epsilon^{n}_{pos}=\epsilon_{ref}^{\beta}. \qquad (39) $$
The derivation of the approximated derivative of the objective function is similar to that under the case of a zero reference point. Let $U_{pos}'^{\,a}(\epsilon)$ (formulated below) be the approximated derivative of the objective function and $f_{pos}(\epsilon)$ the polynomial part of $U_{pos}'^{\,a}(\epsilon)$:
$$ U_{pos}'^{\,a}(\epsilon)=\frac{2\sigma^{2}}{N^{2}}\left(\frac{W_{\max}+g(\epsilon^{p}_{pos})-g(\epsilon^{n}_{pos})}{W_{\max}}\right)^{-3}\epsilon^{-3}\,f_{pos}(\epsilon), \qquad (40) $$
and
$$ f_{pos}(\epsilon)=\left(\frac{W_{\max}+g(\epsilon^{p}_{pos})-g(\epsilon^{n}_{pos})}{W_{\max}}\right)\left(1+\frac{kN}{2l\sigma^{2}}\,\frac{\epsilon^{3}\big(g'(\epsilon^{p}_{pos})-g'(\epsilon^{n}_{pos})\big)}{W_{\max}}\right)-\frac{\big(g'(\epsilon^{n}_{pos})-g'(\epsilon^{p}_{pos})\big)\,\epsilon}{W_{\max}}. \qquad (41) $$
Let $\tilde{\epsilon}^{*}_{pos}$ be the root of $f_{pos}(\epsilon)=0$ in the feasible set $\{\epsilon:\,g(\epsilon^{n}_{pos})-g(\epsilon^{p}_{pos})<W_{\max}\}$, i.e., the approximated optimal solution for the case of a positive reference point.

Step 2: We compare the approximated optimal solutions of the two cases by comparing the polynomial parts of the approximated derivatives. Recall that $\tilde{\epsilon}^{*}$ is the root of $f(\epsilon)=0$, i.e., the approximated optimal solution under the case of a zero reference point. If
$$ f_{pos}(\tilde{\epsilon}^{*})<0, \qquad (42) $$
then, since $f_{pos}(0)>0$, we have $f_{pos}(0)\cdot f_{pos}(\tilde{\epsilon}^{*})<0$, i.e., $\tilde{\epsilon}^{*}_{pos}$ lies in the interval $(0,\tilde{\epsilon}^{*})$. So we have $\tilde{\epsilon}^{*}_{pos}<\tilde{\epsilon}^{*}$. If
$$ f_{pos}(\tilde{\epsilon}^{*})>0, \qquad (43) $$
then $\tilde{\epsilon}^{*}_{pos}$ is outside the interval $(0,\tilde{\epsilon}^{*})$, so we have $\tilde{\epsilon}^{*}<\tilde{\epsilon}^{*}_{pos}$. If
$$ f_{pos}(\tilde{\epsilon}^{*})=0, \qquad (44) $$
then $\tilde{\epsilon}^{*}$ is also a root of $f_{pos}(\epsilon)=0$, and based on the definition of $\tilde{\epsilon}^{*}_{pos}$ we have $\tilde{\epsilon}^{*}=\tilde{\epsilon}^{*}_{pos}$. Recall from the proof of Theorem 2 that there is only one root in the feasible set. This completes the proof. □
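To make the gain/loss split in (38) concrete, the short sketch below evaluates a prospect privacy level of participation for a positive reference point: outcomes below the reference are treated as gains and outcomes above it as losses weighted by the loss-aversion factor. The weighting scheme follows the reconstruction of (38) above and is stated as an assumption; all numeric values are illustrative.

```python
import numpy as np

def prospect_privacy_level(eps, eps_ref, lam=1.94, beta=0.77, m=10):
    """Prospect valuation of the m equally likely privacy outcomes (i/m)*eps relative to
    a positive reference point eps_ref. Outcomes below the reference count as gains and
    outcomes above it as losses weighted by lam (one reading of (38); weights assumed)."""
    i = np.arange(1, m + 1)
    outcomes = (i / m) * eps
    gains = outcomes < eps_ref
    t = gains.sum()
    gain_part = (t / m) * np.sum((eps_ref - outcomes[gains]) ** beta)
    loss_part = (1 - t / m) * lam * np.sum((outcomes[~gains] - eps_ref) ** beta)
    return gain_part - loss_part

# Example: a privacy level twice the reference point yields a net prospect loss.
print(prospect_privacy_level(eps=0.02, eps_ref=0.01))
```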