Aggregating Votes with Local Differential Privacy: Usefulness, Soundness vs. Indistinguishability
Shaowei Wang, Jiachun Du, Wei Yang, Xinrong Diao, Zichun Liu, Yiwen Nie, Liusheng Huang, Hongli Xu
University of Science and Technology of China
∗ Corresponding Authors: Wei Yang, Liusheng Huang
ABSTRACT
Voting plays a central role in bringing crowd wisdom to collective decision making, while data privacy has been a common ethical/legal issue in eliciting preferences from individuals. This work studies the problem of aggregating individuals' voting data under the local differential privacy setting, where usefulness and soundness of the aggregated scores are of major concern. One naive approach to the problem is adding Laplace random noises; however, it makes aggregated scores extremely fragile to new types of strategic behaviors tailored to the local privacy setting: the data amplification attack and the view disguise attack. In the data amplification attack, an attacker's manipulation power is amplified by the privacy-preserving procedure when contributing a fraudulent vote. The view disguise attack happens when an attacker disguises malicious data as valid private views to manipulate the voting result.

In this work, after theoretically quantifying the estimation error bound and the manipulation risk bound of the Laplace mechanism, we propose two mechanisms improving usefulness and soundness simultaneously: the weighted sampling mechanism and the additive mechanism. The former interprets the score vector as probabilistic data. Compared to the Laplace mechanism for the Borda voting rule with d candidates, it reduces the mean squared error bound by half and lowers the maximum magnitude risk bound from +∞ to O(d³/(nϵ)). The latter randomly outputs a subset of candidates according to their total scores; its mean squared error bound is further optimized from O(d⁵/(nϵ²)) to O(d⁴/(nϵ²)), and its maximum magnitude risk bound is reduced to O(d²/(nϵ)). Experimental results validate that our proposed approaches reduce estimation error by 50% on average and are more robust to adversarial attacks.

CCS CONCEPTS
• Security and privacy → Privacy protections.

KEYWORDS
privacy, vote, security, aggregation, election
1 INTRODUCTION

Collective decision making is widely adopted by governing organizations and commercial service providers, which benefit from the wisdom of the crowd via aggregating individual preferences. For example, during an election for social choice, a profile of ranking data from voters is summarized to determine the final preference
Submission for Review, Journal or Conference © 2019

ordering over several options; when online service providers are choosing their next movement, millions of users' preference data are aggregated to measure relative popularity between alternative treatments.

The mapping from many individual preferences to a single resulting ordering is called the voting rule. An intuitive and fundamental class of voting rules is the positional voting rule, the general idea of which is assigning each candidate a score according to its position/rank in each voter's preference. Examples of positional voting rules include the Borda counts [7], plurality voting and Nauru voting [59]. Specifically, in Borda voting, the i-th candidate in a vote scores d − i points, where d is the number of options or candidates. If 4 voters' preferences over 5 candidates are:

voter 1: A3 ≻ A2 ≻ A1 ≻ A4 ≻ A5;    voter 2: A2 ≻ A3 ≻ A5 ≻ A4 ≻ A1;
voter 3: A5 ≻ A2 ≻ A3 ≻ A4 ≻ A1;    voter 4: A1 ≻ A2 ≻ A5 ≻ A3 ≻ A4,

then, following the rule of Borda counts, the candidate A3 scores 10 points, and the candidate A2 wins the voting with the highest 13 points.

Privacy is a basic requirement in secured voting systems [34], as providing secrecy of votes avoids leaking personal preferences and helps to elicit honest responses, especially when voting on sensitive topics. Privacy threats come not only from parties outside the voting system, but also from the administrator, the counter or other voters who might want to infer certain voters' votes. Secure multi-party computation [11] alleviates these problems by securely aggregating scores, but is still fragile to collusion between the counter and other voters, and may have efficiency issues for large-scale online voting systems involving millions of voters. Differential privacy [22] could also be employed for privacy-preserving voting, as it ensures indistinguishability of results no matter whether any single vote is present in the voting profile or not.
However, it relies on the existence of a trustful data curator/counter for all voters.

Another paradigm for privacy-preserving voting is local differential privacy, which sanitizes the vote locally and independently on the voter's side, and ensures up to exp(ϵ) distinguishability of outputting probabilities no matter what true vote a voter holds. Local privacy has the advantages of being information-theoretically rigorous, computationally efficient and operationally flexible. The voter has full controllability during the privacy-preserving procedure without trusting any parties; the counter/administrator is also tolerant to voters' unsynchronized opt-out, withdrawal and modification actions on votes. These advantages make local differential privacy the best fit for the recently enacted General Data Protection Regulation (GDPR [67]) in the European Union, which emphasizes data owners' controllability over the contribution, storage, analysis and transfer of their data.

A straightforward way to realize local differential privacy for voting data is adding Laplace random noises. After representing votes in a score form as in Table 1, Laplace noises of scale ∆/ϵ are independently added to each score in a vote, where ϵ is the privacy level and ∆ is the maximum absolute difference between any two scored votes. If d is odd, we have ∆ = (d² − 1)/2 for Borda voting. For example, after adding Laplace noises of scale ∆/ϵ, the scored vote v^(1) possibly becomes a real-valued private view ṽ^(1) whose entries deviate far from the true scores. In order to reach the criterion of local differential privacy, the (unbiased) private view ṽ might deviate far from the true vote v: the expected deviation of the private view here is E[|ṽ − v|₁] = d∆/ϵ. This formula indicates that preserving privacy comes at the cost of data usefulness, improving which is the central focus of the current local privacy literature (e.g., in [9, 20, 41, 42]).
Table 1: An example of scored votes on 5 candidates in Borda voting.

         A1   A2   A3   A4   A5
v^(1)     2    3    4    1    0
v^(2)     0    4    3    1    2
v^(3)     0    3    2    1    4
v^(4)     4    3    1    0    2

For voting or other data aggregation systems whose results have critical consequences, the soundness of the system against strategic participants is also fundamental. We observe that local privacy preservation also comes at the cost of soundness. The power of adversarial attacks (e.g., vote fraud) might be amplified by the privacy-preserving procedure, and disguising malicious data as private views also makes it much easier for adversaries to manipulate the voting result.
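To make the naive approach concrete, the following sketch (ours, not code from the paper; standard-library Python, with an inverse-CDF Laplace sampler as an illustrative stand-in) recovers the scored votes of Table 1 from the four preference orders, computes the Borda sensitivity ∆, and produces one noisy private view:

```python
import math
import random

def borda_scores(prefs, d):
    """Scored votes: the candidate ranked i-th (1-based) receives d - i points."""
    votes = []
    for ranking in prefs:                    # ranking: candidate indices, best first
        v = [0] * d
        for position, candidate in enumerate(ranking):
            v[candidate] = d - 1 - position
        votes.append(v)
    return votes

def sensitivity(w):
    """L1 sensitivity of scored votes; attained by a vote and its reverse."""
    d = len(w)
    return sum(abs(w[j] - w[d - 1 - j]) for j in range(d))

def laplace(scale):
    """Stdlib-only Lap(scale) sample via the inverse CDF of a uniform draw."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def laplace_view(v, eps, delta):
    """Naive LDP approach: add Lap(delta/eps) noise to every score."""
    return [x + laplace(delta / eps) for x in v]

# The four preference orders behind Table 1 (candidates 0-indexed).
prefs = [[2, 1, 0, 3, 4], [1, 2, 4, 3, 0], [4, 1, 2, 3, 0], [0, 1, 4, 2, 3]]
votes = borda_scores(prefs, 5)
totals = [sum(col) for col in zip(*votes)]
print(totals)                          # -> [6, 13, 10, 3, 8]: A2 wins with 13
delta = sensitivity([4, 3, 2, 1, 0])   # (d^2 - 1)/2 = 12 for Borda with d = 5
print(laplace_view(votes[0], eps=1.0, delta=delta))
```

The noisy view is real-valued and unbounded, which is exactly what the two attacks below exploit.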
Data amplification attack.
In the case that an adversary could contribute fraudulent data but is unable to skip the privacy-preserving procedure, the effects that the fraudulent data can have on the result might be amplified due to the intrinsic randomness of privacy preservation. A more rigid level of privacy preservation means the private view will have more randomness, and hence more (maximum or expected) magnitude. Take the Laplace approach as an example: in the non-private setting (i.e., ϵ = +∞), the magnitude of one vote is |v|₁ = Σ_{j∈[1,d]} (d − j) = d(d − 1)/2 = 10, but the maximum magnitude of a private view ṽ becomes infinite as Laplace random noises are unbounded. The expected magnitude of the private view is a function of the privacy level as follows:

E[|ṽ|₁] = (∆/ϵ) · (e^{ϵ/∆} − e^{(1−d)ϵ/∆}) / (e^{ϵ/∆} − 1) + d(d − 1)/2.

Although local differential privacy ensures indistinguishability of the outputs of any possible votes, and hence the constructive power of one vote on the voting result is diminished, its deconstructive power is amplified.
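The expected-magnitude formula can be checked against the standard per-coordinate identity E|μ + Lap(b)| = |μ| + b·e^(−|μ|/b). The following sketch (ours; function names are illustrative) compares the summed identity with the closed form above for Borda votes:

```python
import math

def expected_view_magnitude_sum(d, eps):
    """E[|v~|_1] for a Borda vote: sum the per-coordinate identity
    E|mu + Lap(b)| = |mu| + b*exp(-|mu|/b) over scores mu = d-1, ..., 1, 0."""
    delta = (d * d - 1) / 2          # Borda sensitivity for odd d
    b = delta / eps
    return sum(k + b * math.exp(-k / b) for k in range(d))

def expected_view_magnitude_closed(d, eps):
    """The geometric-sum form quoted in the text."""
    delta = (d * d - 1) / 2
    r = eps / delta
    geo = (math.exp(r) - math.exp((1 - d) * r)) / (math.exp(r) - 1)
    return (delta / eps) * geo + d * (d - 1) / 2

print(expected_view_magnitude_sum(5, 1.0))     # ~ 61.1: amplified vs. the
print(expected_view_magnitude_closed(5, 1.0))  # non-private magnitude of 10
```

At ϵ = 1 a single Borda vote's expected magnitude grows from 10 to roughly 61, illustrating the amplification; as ϵ → +∞ both forms fall back to 10.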
View disguise attack.
In the case that an adversary has direct control over the private view sent to the aggregator, the adversary is able to disguise a malicious private view as an ordinary (randomized) one, and thus make constructive/deconstructive changes to the aggregation result. The domain of private views is broader than that of the true data and grows with the level of privacy, hence an adversary's constructive/deconstructive power becomes larger. For example, in the non-private setting, the domain of a scored vote is bounded by [0, d − 1]^d, while in the Laplace approach, the domain of the private view is scaled to (−∞, +∞)^d. Even though we can filter out private views that are extremely unlikely to be observed, the filtered domain [−Θ̃(1/ϵ), +Θ̃(1/ϵ)] also grows with the level of privacy preservation (see Section 4.3 for detail). Compared to selecting a value from the domain of scored votes in the non-private setting, it is easier for an adversary to manipulate the election result by selecting a value from the (filtered) domain of private views.

As a remedy to the above issues in the naive approach to locally differentially private vote aggregation, this work aims to develop novel mechanisms improving both usefulness and soundness. The main content and contributions of this work are summarized as follows:

I. We identify soundness issues due to privacy preservation in local private data aggregation systems and categorize them as the data amplification attack and the view disguise attack, based on an adversary's controllability over the privacy-preserving procedure. The data amplification attack captures an adversary's deconstructive power on the aggregation result by contributing fraudulent data. The view disguise attack captures an adversary's constructive/deconstructive power on the aggregation result by directly disguising private views. Formally quantified metrics are defined (in Section 3) to measure the power of adversarial attacks and the soundness of local private voting systems.

II.
We present thorough analyses of the Laplace mechanism for local private vote aggregation (in Section 4, partially in the former paragraphs), including sensitivity bounds for general positional voting rules, error bounds of estimated votes, and risk bounds under strategic attacks.

III. A novel alternative to the Laplace approach, the weighted sampling mechanism, is proposed for general positional voting rules (in Section 5). The mechanism samples an option with a probability mass proportional to its score and then applies well-studied local private methods to the categorical option. For Borda counts, this mechanism reduces the estimation error bound by half (both bounds being of order d⁵/(nϵ²)), and reduces the maximum manipulation risk bound from +∞ to O(d³/(nϵ)), compared to the Laplace mechanism.

IV. Given that the weighted sampling mechanism works unsatisfactorily in the high privacy regime, we further propose the additive mechanism (in Section 6), which samples a subset of candidates according to the summation of their scores. The sampling problem underlying the mechanism is a strict case of the weighted random sampling problem [28]; we provide a recursive algorithm as a solution. The additive mechanism has an estimation error bound of O(d⁴/(nϵ²)), and expected/maximum manipulation risk bounds both at O(d²/(nϵ)).

V. We discuss the interaction/trade-off between usefulness, soundness, truthfulness and indistinguishability in local private data aggregation (in Section 7). A quantified relation between the estimation error bound and the manipulation risk bound is built, which shows that optimizing usefulness usually benefits soundness.

VI. Experiments on extensive voting scenarios are conducted (in Section 8) to validate the usefulness and soundness of the proposed mechanisms.
Results demonstrate that estimation errors decrease by 1/2 on average and that the proposed mechanisms are more robust to adversarial attacks.

2 RELATED WORK

Security requirements in voting systems cover many aspects, such as privacy, verifiability and soundness. Here we review some representative works on privacy/anonymity/soundness in the area of electronic voting and computational social choice, then retrospect recent works on privacy-preserving data analyses within the differential privacy framework.
Since the seminal work of Chaum [12], plenty of cryptographic schemes have contributed to keeping voters or (and) votes secret in electronic voting systems. Schemes based on homomorphic encryption operate on encrypted votes to compute the sum/average of votes without knowing the plain text of votes (e.g., ElGamal encryption in [38, 53], Paillier encryption in [60, 71], and [39, 56]), hence keeping votes private from the vote counter or adversaries. Further combining cryptographic techniques with anonymous channels (e.g., the mixnet in [1, 8, 46, 55]) that randomly shuffle a bundle of messages from voters, votes (or ciphertexts) become unlinkable to their source voters. Having one single vote counter in a voting system is intolerant to failures or attacks; several works hence let multiple authorities secret-share the decryption key [18] to improve the robustness of the system. In the case of multiple authorities, the secrecy of every vote could be further improved by decomposing each vote into several parts, each part then being sent to some of the authorities (e.g., in [5, 14, 17]). Consequently, corrupted authorities fewer than a threshold number are unable to derive complete information about a vote.

Another line of works that could be employed for privacy-preserving voting systems is data perturbation, which uses techniques of generalization (e.g., k-anonymity [62] in [10, 74]) and randomization (e.g., Gaussian noise adding [23], randomized response [70] and differential privacy [22]) to hide the exact values of each vote and the voting result. Compared to cryptographic techniques, which provide computational secrecy and anonymity of votes/voters, data perturbation approaches are usually much more efficient to implement. Among them, classic privacy notions and techniques like k-anonymity and Gaussian noise adding have been shown to be risky against adversaries with prior knowledge [43, 48, 50].
Considering strategic behaviors in voting systems like vote manipulation, fraud and bribery, many works have contributed to finding countermeasures for various voting rules. One approach is putting restrictions on voters' preferences. Specifically, the works of [21, 29, 30, 52] show that voting with single-peaked preferences and quasilinear preferences is truthful and non-manipulable. Another approach is ensuring the computational hardness of finding constructive/deconstructive manipulation strategies (e.g., in [3, 15, 32, 57]). However, for the positional voting rules considered in this work, there exist simple greedy algorithms finding strategic votes that manipulate the result in polynomial time [2]. Some works also propose to introduce randomness into the voting process (e.g., sampling voters at random) for mitigating manipulation attacks, but the usefulness of the voting result is severely harmed [35]. As a comparison, this work introduces randomness into votes for the purpose of privacy preservation and demonstrates that local differential privacy helps to defend against vote manipulation but makes the voting result more vulnerable to fraudulent votes, when compared to non-private settings (see Section 7.2 for discussion).
As the state-of-the-art data perturbation notion and technique for databases, differential privacy [22] in the centralized setting ensures information-theoretical privacy. For numerical outputs such as counts and histograms, injecting Laplace random noises [24] is the most popular mechanism (e.g., in [37, 47, 72]). For categorical outputs such as choosing a winning candidate, the exponential mechanism [51] satisfies differential privacy by randomly selecting a category with a probability according to its utility loss (e.g., in [6, 61]). For more complex data analysis and mining tasks (e.g., classification learning, clustering), a sequential combination of Laplace and exponential mechanisms needs to be used (e.g., in [13, 49]).

Besides the functionality of data privacy preservation, differential privacy also has a close relation to stability [25, 40], and could avoid false discoveries in scientific experiments [26, 27, 36]. Since the output results are almost indistinguishable whether any single individual's data is present or not, the exponential mechanism is shown to be sound against data manipulation and data fraud [51]. However, due to the discrepancy in soundness performance between unbounded and bounded differential privacy [45], local differentially private data aggregation is fragile to data fraud attacks (see Section 7.2 for further discussion).
When defining neighboring datasets as any pair of values an individual may hold, differential privacy is preserved in the local setting [20] (LDP). Because of the solidity of its privacy guarantee and its flexibility for deployment, LDP has gained massive attention from both industry and academia. Giant internet service providers are collecting user preferences (e.g., the browser's homepage) and usage records (e.g., typed words) from their users in the local differential privacy manner, such as Google [31, 33], Apple [64, 65], and Microsoft [19]. Research works have explored local private data analyses and modeling tasks on various kinds of data, such as distribution estimation on categorical data [4, 41] and set-valued data [58, 69], joint distribution estimation and
Table 2: List of notations.

Notation    Description
A           The set of candidates/options
d           The number of candidates/options
n           The number of voters/participants
w           A voting rule's score vector
v^(i)       A scored vote of voter i that is a permutation of w
D_v         The set of possible permutations of w
ṽ^(i)       An estimator (private view) of the scored vote v^(i)
D_ṽ         The set of all possible private views
θ           Average scores of candidates
θ̃           Estimator of average scores
ϵ           The privacy budget

frequent itemset mining on multidimensional data [16, 73], and mean estimation on numerical data [20]. There are also theoretical contributions giving lower error bounds for local private data analyses (e.g., in [20, 42, 66]).

Existing works on local differential privacy focus mostly on the usefulness aspect; some also consider computational and communication efficiency. This work calls for attention to the soundness aspect of local private data analyses, which is severe in real-world systems (e.g., the RAPPOR of Google [31] and the iOS/macOS data collection of Apple [63]) where there are malicious and adversarial clients.

It is worth noting that for some specific voting rules, such as plurality/k-approval voting (see Section 3), the scored vote can be directly seen as categorical/set-valued data and then processed with existing approaches (e.g., in [41, 58, 69]). This work intends to deal with positional voting with an arbitrary design of the score vector. The local private vote aggregation problem can also be cast as the multidimensional mean estimation problem, one approach to which is adding Laplace noises to every score (see details in Section 4); another is first randomly sampling one (data-independent) candidate without knowing the vote and then adding Laplace noises to that candidate's points (e.g., in [54, 68]). However, the latter approach assumes independence of each vote and cannot obtain an unbiased estimator of a whole vote, hence is beyond the discussion of this work.
3 PRELIMINARIES

This section formally introduces definitions of positional voting, local differential privacy, and the model of local private vote aggregation. Usefulness and robustness metrics are also defined in this section. We summarize the notations used throughout this work in Table 2.
A vote π is a linear ordering over all candidates A = {A1, A2, ..., Ad}, where the relation ≻ between two candidates expresses the preference of a voter. In positional voting, the j-th candidate π_j in a vote is assigned a score of w_j. For reasonable positional voting rules, the score vector w = {w1, w2, ..., wd} is non-increasing, which means w_j ≥ w_{j+1}.

(Figure 1 shows four scored votes, [2 3 4 1 0], [0 4 3 1 2], [0 3 2 1 4] and [4 3 1 0 2], being randomized into real-valued LDP views [1.1 −0.2 6.4 1.5 0.2], [−1.3 2.9 1.8 0.2 1.3], [−0.7 1.6 3.8 1.5 1.8] and [4.6 0.6 1.2 −0.6 0.1] before reaching the counter.)

Examples of score vectors for popular positional voting rules with 5 candidates are as follows:
Figure 1: Demonstration of vote aggregation with ϵ-LDP.

• Borda: {4, 3, 2, 1, 0};
• Nauru: {1, 1/2, 1/3, 1/4, 1/5};
• Plurality: {1, 0, 0, 0, 0};
• Anti-plurality: {1, 1, 1, 1, 0};
• k-Approval: {1, 1, 0, 0, 0} (k = 2).

For simplicity of reference, we rewrite voter i's vote π^(i) as numerical scores for each candidate: v^(i) = [v^(i)_1, v^(i)_2, ..., v^(i)_d], where v^(i)_j is the score of candidate A_j.

The local differential privacy notion ensures bounded distinguishability of outputs for any two possible inputs, hence blocking adversaries from inferring much information from outputs. Let D_π denote the domain of votes, which represents all possible orderings over the candidates A, let M denote a randomized mechanism, and let D_M denote the output domain of the mechanism. Definition 3.1 formally defines local differential privacy.

Definition 3.1 (ϵ-LDP). A randomized mechanism M satisfies local ϵ-differential privacy iff for any possible vote pair π, π′ ∈ D_π, and any possible output t ∈ D_M,

P[M(π) = t] ≤ exp(ϵ) · P[M(π′) = t].

Here the parameter ϵ is called the privacy budget, which controls the level of privacy preservation: the smaller ϵ is, the stronger the privacy guarantee.

Consider n voters N = {1, 2, ..., n}, where voter i holds a vote π^(i) (or a scored vote v^(i)). For the purpose of privacy preservation, the voter sanitizes π^(i) to get the private view ṽ^(i) by running an ϵ-LDP mechanism M locally and independently. The private view ṽ^(i) from a meaningful mechanism is an estimator of the true scored vote v^(i), hence the counter in the voting system can estimate the actual average scores θ = (1/n) Σ_i v^(i) as:

θ̃ = (1/n) Σ_i ṽ^(i).   (1)

Figure 1 demonstrates the above procedure of local private vote aggregation. In adversarial environments, the counter may filter out some potentially malicious private views.
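The scored-vote representation and the averaging in Equation 1 can be sketched as follows (our illustration; the non-private average is shown here, whereas the counter would average private views ṽ^(i)):

```python
from fractions import Fraction

def score_vector(rule, d, k=2):
    """Score vectors for the positional rules listed above."""
    if rule == "borda":
        return [d - 1 - j for j in range(d)]
    if rule == "nauru":
        return [Fraction(1, j + 1) for j in range(d)]
    if rule == "plurality":
        return [1] + [0] * (d - 1)
    if rule == "anti-plurality":
        return [1] * (d - 1) + [0]
    if rule == "k-approval":
        return [1] * k + [0] * (d - k)
    raise ValueError(rule)

def scored_vote(pi, w):
    """Rewrite a preference order pi (candidate indices, best first) as the
    score vector v, where v[j] is the score candidate A_{j+1} receives."""
    v = [0] * len(w)
    for position, candidate in enumerate(pi):
        v[candidate] = w[position]
    return v

def average_scores(scored_votes):
    """theta = (1/n) * sum of scored votes; Equation 1 replaces v by views."""
    n = len(scored_votes)
    return [sum(col) / n for col in zip(*scored_votes)]

w = score_vector("borda", 5)
print(scored_vote([2, 1, 0, 3, 4], w))   # voter 1 of Table 1 -> [2, 3, 4, 1, 0]
```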
Usefulness metrics.
Estimators of the average scores θ̃ given by different mechanisms have varied accuracy; here we use three usefulness metrics:

• mean squared error: err_MSE = E[ |θ̃ − θ|₂² ];
• total variation error: err_TVE = E[ |θ̃ − θ|₁ ];
• maximum absolute error: err_MAE = E[ max_{j∈[1,d]} |θ̃_j − θ_j| ].

Since the average scores eventually determine one winning candidate, let j_max = argmax_{j∈[1,d]} θ_j denote the index of the candidate with the maximum average score in the true average scores θ, and let j̃_max = argmax_{j∈[1,d]} θ̃_j denote the winning candidate's index in the estimated average scores θ̃. We define the following metrics:

• accuracy of winner: accuracy_AOW = Pr[ j̃_max = j_max ];
• loss of winner: err_LOW = E[ θ_{j_max} − θ_{j̃_max} ].

Soundness metrics.
To measure an adversary's deconstructive power in the data amplification attack by contributing one extra vote v, we use the following metrics:

• maximum magnitude: risk_MM = max_{ṽ∈D_ṽ} |ṽ|₁/n;
• expected magnitude: risk_EM = E[ |ṽ|₁/n ].

These two metrics respectively measure the maximum possible/expected absolute difference that one single private view can make to the average scores. For further measuring an adversary's constructive/deconstructive power in the view disguise attack by controlling one private view ṽ, we define the diameter of the output space D_ṽ that an adversary could choose a value from as follows:

• domain diameter: risk_DD = max_{ṽ,ṽ′∈D_ṽ} |ṽ − ṽ′|₁.

4 LAPLACE MECHANISM

For numerical values like scored votes, the Laplace mechanism is the most popular approach to (local) differential privacy. The scale of the Laplace noises is calibrated to the privacy budget ϵ and the sensitivity ∆ of the vote, as in Algorithm 1. Lemma 4.1 gives the exact bound of the sensitivity ∆ for all positional voting rules; Theorem 4.2 gives the formal ϵ-LDP guarantee.

Lemma 4.1. For any positional voting rule with non-increasing score vector w, the sensitivity of scored votes is:

∆ = max_{v,v′∈D_v} |v − v′|₁ = Σ_{j∈[1,d]} |w_j − w_{d−j+1}|.

Proof. Given that every v ∈ D_v is a permutation of the score vector w, we have ∆ = max_{v∈D_v} |w − v|₁. Consider any scored vote v: if there exist two indexes j, j′ with j < j′ and v_j > v_{j′}, denote by →v the scored vote that swaps (and only swaps) v_j and v_{j′} in v. Then we have:

|w − v|₁ − |w − →v|₁ = |w_j − v_j| + |w_{j′} − v_{j′}| − |w_j − v_{j′}| − |w_{j′} − v_j|
  = (|w_j − v_j| − |w_{j′} − v_j|) − (|w_j − v_{j′}| − |w_{j′} − v_{j′}|) ≤ 0,

since f(x) = |w_j − x| − |w_{j′} − x| is a non-increasing function for x ∈ R when w_j ≥ w_{j′}.
By iteratively swapping value pairs with v_j > v_{j′} (j < j′) in any scored vote v until →v is the reverse of w, we finally have |w − v|₁ ≤ Σ_{j∈[1,d]} |w_j − w_{d−j+1}|. □

Theorem 4.2.
Algorithm 1 satisfies ϵ-LDP.

Proof. Because every vote π is mapped to one scored vote v ∈ D_v, to prove that P[M(π) = t] ≤ exp(ϵ) · P[M(π′) = t] holds for any π and π′, it is enough to show that Pr[ṽ = t | v] ≤ exp(ϵ) · Pr[ṽ = t | v′] holds for any input scored votes v, v′ and output t ∈ D_ṽ. By the density of a Laplace random variable, p_{Lap(s)}(x) = (1/(2s)) · exp(−|x|/s), we have:

Pr[ṽ = t | v] = (ϵ^d / (2^d ∆^d)) · exp(−ϵ · |t − v|₁ / ∆),

hence

Pr[ṽ = t | v] / Pr[ṽ = t | v′] = exp(ϵ · (|t − v′|₁ − |t − v|₁) / ∆) ≤ exp(ϵ),

since |t − v′|₁ − |t − v|₁ ≤ |v − v′|₁ ≤ ∆. □

Algorithm 1
Laplace mechanism
Input:
A scored vote v ∈ D_v, privacy budget ϵ and the score vector w of the voting rule.
Output:
An unbiased private view ṽ ∈ R^d that satisfies ϵ-LDP.
  ▷ Compute sensitivity
  ∆ ← Σ_{j∈[1,d]} |w_j − w_{d−j+1}|
  ▷ Randomization by adding Laplace noises
  for j ← 1 to d do
      ṽ_j ← v_j + Lap(∆/ϵ)
  end for
  return ṽ = {ṽ_1, ṽ_2, ..., ṽ_d}

The usefulness bound of the average score estimator (refer to Equation 1) given by the Laplace mechanism is analyzed in Theorem 4.3, the proof of which is a simple application of the variance formula for Laplace random variables.

Theorem 4.3.
The mean squared error of the average score estimator given by the Laplace mechanism in Algorithm 1 is:

err_MSE = 2d · (Σ_{j∈[1,d]} |w_j − w_{d−j+1}|)² / (nϵ²).

Consider the average score estimator θ̃ given by the Laplace mechanism; its risks under the data amplification attack and the view disguise attack are presented in Theorem 4.4, whose proof is omitted as being almost trivial.

Theorem 4.4. The risks of the Laplace mechanism under adversarial attacks are:

risk_MM = +∞;
risk_EM = (1/n) Σ_{j∈[1,d]} [ (∆/ϵ) · exp(−|w_j|ϵ/∆) + |w_j| ];
risk_DD = +∞.

These bounds show that the maximum possible risk of the Laplace mechanism is infinite and that the expected risk grows linearly with 1/ϵ. Consequently, imposing a stringent level of privacy harms the soundness of the voting result. One possible solution to restricting the unlimited maximum possible risk is filtering out private views that are extremely unlikely to be observed. For example, we may define the allowable output area (with threshold probability β) as:

D̃_p = { ṽ | ṽ ∈ R^d, Pr[ṽ | v] ≥ β for some v ∈ D_v }.

For the Laplace mechanism, this is equivalent to:

{ ṽ | ṽ ∈ R^d, |ṽ − v|₁ ≤ ∆ · (log(1/β) + d log(∆/ϵ)) / ϵ for some v ∈ D_v }.

As a result, even if we can filter out outliers among private views, the diameter or volume of the allowable output area, which determines the maximum possible risks, also grows with 1/ϵ.

5 WEIGHTED SAMPLING MECHANISM

Another common technique for achieving ϵ-LDP on numerical vectors is selecting one (data-dependent) option according to its corresponding value (e.g., for set-valued data [58], for probabilistic data [44]), and then sanitizing the selected option. This paradigm transforms the numerical ϵ-LDP problem into a well-studied categorical one. The naive sampling strategy for a scored vote v would be sampling candidate j with probability |v_j|/|v|₁, as in [44, 58]. Intuitively, however, the sampling probability should not be related to the absolute magnitude of v_j, since adding a constant value to the vector should not change the sampling probabilities.
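This shift-sensitivity of the naive strategy is easy to see numerically (a small illustrative sketch of ours):

```python
def naive_masses(v):
    # Naive strategy: sample candidate j with probability |v_j| / |v|_1
    total = sum(abs(x) for x in v)
    return [abs(x) / total for x in v]

v = [2, 3, 4, 1, 0]
shifted = [x + 10 for x in v]      # adding a constant should be irrelevant
print(naive_masses(v))             # -> [0.2, 0.3, 0.4, 0.1, 0.0]
print(naive_masses(shifted))       # -> masses now depend on the shift
```

The first candidate's mass moves from 0.2 to 12/60, so the naive masses are not invariant under a constant shift of the scores.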
Hence we propose a general and flexible weighted sampling strategy with an intercept value c in Algorithm 2. The state-of-the-art binary randomized response mechanism [20, 41] is used as the base randomizer for the subsequent ϵ-LDP protection of the selected categorical candidate. The weighted sampling masses m = {m_1, m_2, ..., m_d} form the vector of sampling probabilities for each rank; we assume m_j ≥ 0 and Σ_{j∈[1,d]} m_j = 1. The ϵ-LDP guarantee of the algorithm is presented in Theorem 5.1; the unbiasedness of the private view ṽ with respect to the scored vote v is described in Lemma 5.2 (see Appendix 10.1 for proof).

Theorem 5.1. Algorithm 2 satisfies ϵ-LDP.

Proof. Note that the probabilistic rank selection sub-process at lines 1 to 7 in Algorithm 2 uses no information about π, hence consumes no privacy budget. The only step that directly uses information about π is at line 10, which maps the rank j* to an index π_{j*} of candidates. Thus, in order to prove that the final private view satisfies ϵ-LDP, it is enough to show that for any π_{j*}, π′_{j*} ∈ [1, d], the corresponding probabilities Pr[ṽ = t] have up to exp(ϵ) discrepancy. Another observation is that ṽ is derived from the vector B̃ without directly using information about π; hence we only need to prove that the resulting vector B̃ is ϵ-LDP for any π_{j*}, π′_{j*} ∈ [1, d]. Following

Algorithm 2
Weighted sampling mechanism
Input:
A vote π, privacy budget ϵ, the score vector w of the voting rule, weighted sampling masses m and the intercept value c.
Output:
An unbiased private view ṽ ∈ R^d that satisfies ϵ-LDP.
  ▷ Select one rank
  r ← UniformRandom(0.0, 1.0)
  j* ← 0
  while r ≥ 0 do
      j* ← j* + 1
      r ← r − m_{j*}
  end while
  ▷ Randomization by binary randomized response
  B ← {0}^d;  B_{π_{j*}} ← 1;  B̃ ← B
  for j ← 1 to d do
      r ← UniformRandom(0.0, 1.0)
      if r < 1/(√exp(ϵ) + 1) then
          B̃_j ← 1 − B_j
      end if
  end for
  ▷ Derive unbiased scores
  for j ← 1 to d do
      ṽ_j ← ((√exp(ϵ) + 1) · B̃_j − 1)/(√exp(ϵ) − 1) · (w_{j*} − c)/m_{j*} + c
  end for
  return ṽ = {ṽ_1, ṽ_2, ..., ṽ_d}

the proof sketch in [31], we have:

Pr[B̃ = T | π_{j*}] / Pr[B̃ = T | π′_{j*}]
  = √exp(ϵ · 1[T_{π_{j*}} = 1] + ϵ · 1[T_{π′_{j*}} = 0]) / √exp(ϵ · 1[T_{π_{j*}} = 0] + ϵ · 1[T_{π′_{j*}} = 1])
  ≤ √exp(2ϵ) / √exp(0) = exp(ϵ). □

Lemma 5.2.
The private view ṽ given by Algorithm 2 is an unbiased estimator of the scored vote v.

The accuracy/usefulness of the weighted sampling mechanism depends on the choice of its parameters, including the design of the sampling masses m and the intercept value c. Lemma 5.3 formulates the score estimator's error bound as a function of these parameters (see Appendix 10.2 for proof). The error can be decomposed into two parts: the first part, (1/n) Σ_{j∈[1,d]} (w_j − c)²/m_j, is the variance due to weighted sampling; the second part, (1/n) · (d√e^ϵ/(√e^ϵ − 1)²) Σ_{j∈[1,d]} (w_j − c)²/m_j, is the variance due to binary randomized response. Further, in Theorem 5.4, we establish achievable error bounds via choosing optimal sampling and interception parameters. Compared to the error bound of the Laplace mechanism in Theorem 4.3, when the privacy budget is low (e.g., when e^ϵ ≈ 1 + ϵ), the mean squared error bound is reduced from 2d(Σ_{j∈[1,d]} |w_j − w_{d−j+1}|)²/(nϵ²) to roughly 4d(Σ_{j∈[1,d]} |w_j − w_{⌈d/2⌉}|)²/(nϵ²), i.e., by half for Borda counts. For the simplicity of notation,
Figure 2: Theoretical mean squared estimation error of the Laplace, weighted sampling and additive mechanisms under the Borda, Nauru, plurality and anti-plurality voting rules.

we define: Ω_w = Σ_{j∈[1,d]} |w_j − w_{⌈d/2⌉}|.

Lemma 5.3.
The mean squared error of the average score estimator given by the weighted sampling mechanism in Algorithm 2 with sampling masses m and intercept value c is:

err_MSE = (1/n) · (1 + d√e^ε/(√e^ε − 1)²) · Σ_{j∈[1,d]} (w_j − c)²/m_j.

Theorem 5.4. The mean squared error of the average score estimator given by the weighted sampling mechanism in Algorithm 2 is bounded as follows:

min_{m, c} err_MSE ≤ (1/n) · (1 + d√e^ε/(√e^ε − 1)²) · (Σ_{j∈[1,d]} |w_j − w_{⌈d/2⌉}|)²,

with corresponding sampling masses m* = [|w_1 − c*|/Σ|w − c*|, |w_2 − c*|/Σ|w − c*|, ..., |w_d − c*|/Σ|w − c*|], where the normalization factor Σ|w − c*| is Σ_{j∈[1,d]} |w_j − c*|, and the intercept value c* is median(w), i.e., w_{⌈d/2⌉} or w_{⌊d/2⌋+1}.

Proof. The proof follows two steps: the first step (in Lemma 5.3) writes the mean squared error as a function of the sampling masses m and the intercept value c; the second step finds optimal parameters by solving the following equivalent problem:

min_{m, c} Σ_{j∈[1,d]} (w_j − c)²/m_j, s.t. m_j ≥ 0 for j ∈ [1,d], and Σ_{j∈[1,d]} m_j = 1.

When fixing the variable c, the sub-problem min_m Σ_{j∈[1,d]} (w_j − c)²/m_j has the closed-form solution m_j = |w_j − c| / Σ_{j∈[1,d]} |w_j − c|. Consequently the optimization problem becomes:

min_c (Σ_{j∈[1,d]} |w_j − c|)²,

which is minimized when c is a median of the vector w, i.e., w_{⌈d/2⌉} or w_{⌊d/2⌋+1}. Substituting the optimal parameters m and c into the formula in Lemma 5.3, we have:

err_MSE ≤ (1/n) · (1 + d√e^ε/(√e^ε − 1)²) · (Σ_{j∈[1,d]} |w_j − median(w)|)². □

Consider a private view ṽ from the weighted sampling mechanism; its risks to the voting result are presented in Lemma 10.1 (see Appendix 10.3 for proof). Specifically, with the parameters for optimal usefulness (see the former subsection), the risk bounds of the weighted sampling mechanism are presented in Theorem 5.5. Compared to the Laplace mechanism, the maximum difference risk and the domain diameter risk are shrunk from +∞ to finite values that grow with √exp(ε)/(√exp(ε) − 1).
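To make the mechanism concrete, the following Python sketch simulates Algorithm 2 with the optimal parameters of Theorem 5.4, i.e., intercept c = median(w) and masses m_j ∝ |w_j − c| (the function name, the Borda instantiation, and the Monte Carlo unbiasedness check are our illustrative choices, not from the paper):

```python
import numpy as np

def weighted_sampling_view(pi, w, m, c, eps, rng):
    """One private view of a vote; pi[k] = candidate ranked k-th by the voter.
    Assumes all sampling masses m_j > 0."""
    d = len(w)
    s = np.sqrt(np.exp(eps))
    j_star = rng.choice(d, p=m)             # sample one rank with masses m
    B = np.zeros(d)
    B[pi[j_star]] = 1.0                     # one-hot at the candidate holding rank j_star
    flip = rng.random(d) < 1.0 / (s + 1.0)  # binary randomized response
    B_tilde = np.where(flip, 1.0 - B, B)
    debiased = ((s + 1.0) * B_tilde - 1.0) / (s - 1.0)
    return debiased * (w[j_star] - c) / m[j_star] + c

# Borda scores for d = 4, optimal intercept c = median(w), masses m ∝ |w - c|
w = np.array([3.0, 2.0, 1.0, 0.0])
c = np.median(w)
m = np.abs(w - c) / np.abs(w - c).sum()
pi = np.array([1, 3, 0, 2])                 # one voter's ranking (a permutation)
rng = np.random.default_rng(0)
views = np.array([weighted_sampling_view(pi, w, m, c, 1.0, rng)
                  for _ in range(100_000)])
v = np.empty(4); v[pi] = w                  # the scored vote: v[pi[k]] = w[k]
print(np.abs(views.mean(axis=0) - v).max())  # close to 0 (unbiasedness)
```

Averaging many private views of the same vote recovers the scored vote, which is the unbiasedness property of Lemma 5.2.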
Numerical comparisons of the expected magnitude risks of the Laplace mechanism and the weighted sampling mechanism for Borda voting are presented in Figure 3.

Theorem 5.5. The manipulation risks of the weighted sampling mechanism with intercept value c = w_{⌈d/2⌉} and sampling masses m = {|w_1 − c|/Ω_w, |w_2 − c|/Ω_w, ..., |w_d − c|/Ω_w} are:

risk_MM = (d/n) · max[ |−√e^ε·Ω_w/(√e^ε − 1) + c|, |√e^ε·Ω_w/(√e^ε − 1) + c| ];

risk_EM = Σ_{j∈[1,d]} m_j·[w_j > c] · ( (√e^ε + d − 1)·t'_+ + (√e^ε·(d − 1) + 1)·f'_+ ) / (n·(√e^ε + 1)) + Σ_{j∈[1,d]} m_j·[w_j < c] · ( (√e^ε + d − 1)·t'_− + (√e^ε·(d − 1) + 1)·f'_− ) / (n·(√e^ε + 1));

risk_DD = (d·√e^ε/(√e^ε − 1)) · Ω_w,

where t'_+ and t'_− denote |√e^ε/(√e^ε − 1)·Ω_w + c| and |−√e^ε/(√e^ε − 1)·Ω_w + c| respectively, and f'_+ and f'_− denote |−1/(√e^ε − 1)·Ω_w + c| and |1/(√e^ε − 1)·Ω_w + c| respectively.

The usefulness/soundness performance of the Laplace mechanism and the weighted sampling mechanism largely depends on Δ_w = Σ_{j∈[1,d]} |w_j − w_{d−j+1}| and Ω_w = Σ_{j∈[1,d]} |w_j − w_{⌈d/2⌉}|. When w_{⌈d/2⌉} is relatively close to w_d (e.g., in k-Approval voting and Reciprocal voting), the difference between Ω_w and Δ_w is insignificant. As opposed to the sampling-then-randomized-response paradigm of the weighted sampling mechanism, here we propose an end-to-end approach: the additive mechanism.

Let C_k = {S | S ⊆ C and |S| = k} denote the set of candidate subsets of size k, let w_kmax = Σ_{j∈[1,k]} w_j denote the maximum total weight of one subset, and let w_kmin = Σ_{j∈[d−k+1,d]} w_j denote the minimum total weight of one subset; the additive mechanism is presented in Definition 6.1. As its name suggests, the additive mechanism randomly responds with a subset of candidates S with probability linear in their total score Σ_{C_j'∈S} v_j'.
The mechanism is a variant of the popular exponential mechanism for achieving differential privacy, which usually responds with probability proportional to the exponential of candidates' scores. In the additive mechanism, the probability is proportional to an additive (linear) function of candidates' scores, which is designed for deriving an unbiased estimation of the average scores.

Definition 6.1 (Additive Mechanism).
In an ε-LDP positional voting system with candidate set C and score vector w, taking a scored vote v as input, the additive mechanism randomly outputs an S ∈ C_k according to the following probability design:

Pr[S | v] = ( (Σ_{C_j'∈S} v_j' − w_kmin)/(w_kmax − w_kmin) · (exp(ε) − 1) + 1 ) / Φ,

where the normalizer is Φ = C(d,k) · ( (k/d)·(e^ε − 1)·Σ_{j∈[1,d]} w_j − e^ε·w_kmin + w_kmax ) / (w_kmax − w_kmin).

The estimator of the scored vote v is (for j ∈ [1,d]):

ṽ_j = a_k · [C_j ∈ S] − b_k,

where a_k = ( Σ_{j'∈[1,d]} w_j'·(e^ε − 1) − (d/k)·e^ε·w_kmin + (d/k)·w_kmax ) · (d − 1)/((d − k)·(e^ε − 1)), and b_k = ( ((k − 1)·(e^ε − 1))/(d − 1) · Σ_{j'∈[1,d]} w_j' − e^ε·w_kmin + w_kmax ) · (d − 1)/((d − k)·(e^ε − 1)).

After giving the formal ε-LDP guarantee and the unbiasedness guarantee of the additive mechanism in Theorem 6.2 (see Appendix 10.5 for proof) and Lemma 6.3 respectively, we turn to efficient implementation of the additive mechanism; a naive sampling approach would incur C(d,k) computational cost. Actually, selecting a subset of size k from d options in the additive mechanism is a stricter case of weighted reservoir sampling [28], where each option is randomly selected with given marginal weights (probabilities), but no restriction is put on the joint probabilities of selected options. We present a recursive implementation in Algorithm 3 (see Appendix 10.6), which decomposes sampling from C_k into sub-problems with O(d·k) computational cost.

Theorem 6.2. The additive mechanism satisfies ε-LDP.

Lemma 6.3.
The private view ṽ given by the additive mechanism is an unbiased estimation of the scored vote v.

Proof. Consider the probability that an option C_j appears in the output S when the input scored vote is v. Summing Pr[S | v] over the subsets containing C_j (v_j appears in C(d−1, k−1) such subsets, and each other v_j' appears in C(d−2, k−2) of them):

Σ_{S∈C_k, C_j∈S} Pr[S | v] = ( [C(d−1, k−1)·v_j + C(d−2, k−2)·((Σ_{j'∈[1,d]} w_j') − v_j) − C(d−1, k−1)·w_kmin] · (e^ε − 1)/(w_kmax − w_kmin) + C(d−1, k−1) ) / Φ

= (C(d−1, k−1)/C(d,k)) · ( ((d − k)·(e^ε − 1))/(d − 1) · v_j + ((k − 1)·(e^ε − 1))/(d − 1) · Σ_{j'∈[1,d]} w_j' − e^ε·w_kmin + w_kmax ) / ( (k/d)·Σ_{j'∈[1,d]} w_j'·(e^ε − 1) − e^ε·w_kmin + w_kmax ).

Consequently, v_j = a_k · E[[C_j ∈ S]] − b_k, with a_k and b_k as in Definition 6.1. Hence a_k·[C_j ∈ S] − b_k is an unbiased estimation of v_j. □

The estimation error bound of the additive mechanism is given in Theorem 6.4. The bound depends on the score vector of the voting rule. In Borda voting, we have Σ_{j'∈[1,d]} ŵ_j' = O(d²) when ε = O(1); hence the mean squared error bound is O(d⁴/(nε²)). As a comparison, the mean squared error bounds of the Laplace mechanism and the weighted sampling mechanism are O(d⁵/(nε²)). For better illustration, we depict the numerical bound of the additive mechanism for the Borda rule in Figure 2, along with a comparison with the Laplace and weighted sampling mechanisms. The numerical results show on average a 75% error reduction compared to the Laplace mechanism, and on average a 40% error reduction compared to the weighted sampling mechanism.

Theorem 6.4.
The mean squared error E[|θ̃ − θ|²] of the average score estimator given by the additive mechanism is bounded as follows:

err_MSE ≤ ( (Σ_{j'∈[1,d]} ŵ_j')² − Σ_{j'∈[1,d]} ŵ_j'² ) / ( n·(e^ε − 1)² ),

where ŵ_j = w_j·(e^ε − 1) − e^ε·w_d + w_1.

Proof. The parameter k can be tuned to the score vector w (e.g., k = 1 suits plurality voting). Hence, to characterize the usefulness performance of the additive mechanism, we only need to analyze the case when k =
1. Given that a_1 = Σ_{j'∈[1,d]} w_j' − (d·e^ε/(e^ε − 1))·w_d + (d/(e^ε − 1))·w_1, b_1 = −(e^ε/(e^ε − 1))·w_d + (1/(e^ε − 1))·w_1, and

Pr[C_j ∈ S | v] = ( v_j·(e^ε − 1) − e^ε·w_d + w_1 ) / ( Σ_{j'∈[1,d]} w_j'·(e^ε − 1) − e^ε·d·w_d + d·w_1 ),

the variance of the Bernoulli variable [C_j ∈ S] is Pr[C_j ∈ S | v]·(1 − Pr[C_j ∈ S | v]); then the variance of ṽ_j = a_1·[C_j ∈ S] − b_1 is a_1²·Pr[C_j ∈ S | v]·(1 − Pr[C_j ∈ S | v]).
Figure 3: Theoretical expected magnitude risks of the Laplace, weighted sampling and additive mechanisms under the Borda, Nauru, plurality and anti-plurality rules.

Consequently, the total variance E[|ṽ − v|²] is ( (Σ_{j'∈[1,d]} ŵ_j')² − Σ_{j'∈[1,d]} ŵ_j'² ) / (e^ε − 1)². □

The adversarial risks of the additive mechanism are presented in Theorem 6.5. When applying the parameter k = 1, we have a_1 = O(d²/ε) and b_1 = O(d/ε); hence the risk bounds of risk_MM and risk_EM are both O(d²/(nε)), and the risk bound of risk_DD is O(d²/ε). As a comparison, in the weighted sampling mechanism, the risk bounds of risk_MM and risk_EM are O(d³/(nε)), and the risk bound of risk_DD is O(d³/ε). Figure 3 presents numerical results of these risks of the additive mechanism compared with the Laplace and weighted sampling mechanisms. In most cases, the additive mechanism reduces the expected magnitude risk by about 70%.

Theorem 6.5. The manipulation risks of the additive mechanism are:

risk_MM = ( k·|a_k − b_k| + (d − k)·|b_k| ) / n;
risk_EM = ( k·|a_k − b_k| + (d − k)·|b_k| ) / n;
risk_DD ≤ 2k·|a_k|.

Proof. For any intermediate result S ∈ C_k, the corresponding private view ṽ = {ṽ_1, ṽ_2, ..., ṽ_d} contains k entries equal to a_k − b_k and d − k entries equal to −b_k; hence both risk_MM and risk_EM are (k·|a_k − b_k| + (d − k)·|b_k|)/n. Consider risk_DD: for any pair of intermediate results S, S' ∈ C_k, the corresponding private views differ in at most 2k positions, and the magnitude of each difference is |a_k|; hence the maximum possible total difference is 2k·|a_k|. □

The privacy budget of ε-LDP controls the distinguishability of probabilistic outputs, and thus limits the mutual information between the private view and the scored vote.
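As a concrete sketch of the k = 1 case analyzed in the proof of Theorem 6.4 (using the formulas as reconstructed above; the function name, the Borda instantiation, and the Monte Carlo check of Lemma 6.3's unbiasedness are ours):

```python
import numpy as np

def additive_view_k1(v, w, eps, rng):
    """One private view of the additive mechanism with k = 1.
    Assumes w is sorted descending, so w[0] = w_1 and w[-1] = w_d."""
    d = len(w)
    ee = np.exp(eps)
    # Pr[C_j in S | v] is proportional to v_j(e^eps - 1) - e^eps*w_d + w_1
    num = v * (ee - 1.0) - ee * w[-1] + w[0]
    j = rng.choice(d, p=num / num.sum())
    a1 = w.sum() - d * ee * w[-1] / (ee - 1.0) + d * w[0] / (ee - 1.0)
    b1 = -ee * w[-1] / (ee - 1.0) + w[0] / (ee - 1.0)
    view = np.full(d, -b1)
    view[j] += a1                    # estimator a_1·[C_j in S] − b_1
    return view

w = np.array([3.0, 2.0, 1.0, 0.0])   # Borda scores, d = 4
v = np.array([1.0, 3.0, 0.0, 2.0])   # one voter's scored vote (a permutation of w)
rng = np.random.default_rng(1)
views = np.array([additive_view_k1(v, w, 1.0, rng) for _ in range(100_000)])
print(np.abs(views.mean(axis=0) - v).max())  # close to 0 (unbiasedness)
```

Note that a single view is one of only d distinct vectors, so its maximum possible magnitude equals its expected magnitude, which is the soundness property discussed below.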
Vote data, as ordinal data with fixed and exclusive positions, can be treated either as numerical data (e.g., in the Laplace mechanism) or as categorical data (e.g., in the weighted sampling mechanism). In either case, the mean squared error suffers a factor of Θ(1/ε²) when ε = O(1), matching the known lower bounds for numerical and categorical data (e.g., the lower bounds in [20]). An extra factor in the mean squared error is determined by the score vector of the voting rule and the concrete design of the ε-LDP mechanism. As the numerical error bounds in Figure 2 demonstrate, for most voting rules, the error of the weighted sampling mechanism is about 50% of the Laplace mechanism's, while that of the additive mechanism is about 25%.

Since the probabilistic outputs are almost indistinguishable regardless of the value of the manipulated/true vote, a more rigid level of privacy protection in ε-LDP has the advantage of limiting an adversary's constructive power; hence the resulting scores are less led by the preferences of a manipulated vote. However, when an adversary can falsely contribute an extra vote to the voting system, a more rigorous level of privacy means a larger magnitude of the private view, and thus the expected amount of score an adversary adds to the resulting scores is amplified. As shown in the soundness analyses of the Laplace mechanism and our proposed mechanisms, the expected magnitude of a private view is linear in 1/ε: the deconstructive power of every voter (or possible adversary) grows with the noise injected for privacy preservation. A desirable property of the additive mechanism is that the maximum possible magnitude of a private view equals the expected magnitude, a property that the Laplace and weighted sampling mechanisms do not have.
In voting systems for business decision making, a possible counter-measure to the data amplification attack is broadening the survey population: increasing the number of voters decreases the relative magnitude of one possibly adversarial vote.

Consider the view disguise attack, where an adversary directly sends a fraudulent private view. The ability/power of the adversary is closely related to the domain of the private view, from which the adversary can choose a value to destroy or reform the final result. The private view's domain in the Laplace mechanism spreads over R^d; in the weighted sampling mechanism and the additive mechanism, the domain is reduced to [−Θ(1/ε), Θ(1/ε)]^d. These results suggest that a higher level of privacy preservation grants an adversary a higher ability to manipulate the voting result. As counter-measures, imposing a lower level of privacy protection or bringing in more voters strengthens the soundness of a voting system. Another approach is putting the soundness metric into the design of the ε-LDP mechanism; for example, the risk bounds of the additive mechanism are much better than those of the Laplace and weighted sampling mechanisms.

Another interesting aspect of soundness is the difference between the centralized differential privacy model and the local differential privacy model. Centralized differential privacy (of the unbounded differential privacy notion [45]) assumes all votes have been collected by a trusted database curator, and ensures that the probability distribution of the final scores of candidates is almost indistinguishable, regardless of whether one single vote is in the database. Consequently, one fraudulent vote cannot significantly change the differentially private output in the probabilistic perspective; enforcing a more rigid level of unbounded differential privacy suppresses adversaries' fraud power/income, and is helpful for truthfulness in social welfare maximization [51].
There is also a notion of differential privacy defined by almost indistinguishable outputs when one single vote changes its value, termed bounded differential privacy [45]. ε-unbounded differential privacy implies 2ε-bounded differential privacy, but not vice versa. Local differential privacy is equivalent to bounded differential privacy defined on a single vote. ε-LDP, unbounded, and bounded differential privacy all put limitations on an adversary's data manipulation power/income, but ε-LDP/bounded differential privacy suffers from data fraud.

Inspired by the difference in soundness performance between unbounded differential privacy and bounded (or local) differential privacy, in order to improve soundness against fraudulent votes, the voting counter may retain each vote only with a probability p (i.e., force a discard/opt-out probability 1 − p for every vote), similar to data sampling for budget saving in unbounded differential privacy. Consequently, the expected difference a fraudulent vote can make to the final total scores is shrunk by a factor of p. However, this cannot change the expected difference a fraudulent vote makes to the final average scores, such as the soundness metrics risk_MM and risk_EM, since the expected number of available voters also shrinks by the factor p.

Recall that the weighted sampling mechanism and the additive mechanism improve the usefulness and soundness metrics simultaneously, compared to the Laplace mechanism. Here we explore interactions between these two performance metrics. Considering the soundness metric of expected magnitude risk_EM and the usefulness metric of mean squared error err_MSE, we have:

risk_EM ≤ ( √(d·n·err_MSE) + Σ_{j∈[1,d]} |w_j| ) / n,

which is derived as follows, using the norm inequality |x|₁ ≤ √(d·|x|₂²) and the concavity of the square root (Jensen's inequality):

risk_EM = E[|ṽ|₁]/n ≤ ( E[|ṽ − v|₁] + |v|₁ )/n ≤ ( E[√(d·|ṽ − v|₂²)] + Σ_{j∈[1,d]} |w_j| )/n ≤ ( √(d·E[|ṽ − v|₂²]) + Σ_{j∈[1,d]} |w_j| )/n = ( √(d·n·err_MSE) + Σ_{j∈[1,d]} |w_j| )/n.
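A quick numeric sanity check of this inequality, plugging in the reconstructed closed forms of the additive mechanism with k = 1 under the Borda rule (the concrete constants d = 4, n = 10 000, ε = 1 are illustrative choices of ours):

```python
import math

d, n, eps = 4, 10_000, 1.0
w = [3.0, 2.0, 1.0, 0.0]                    # Borda scores, sorted descending
ee = math.exp(eps)

# closed forms for k = 1 (as reconstructed above)
a1 = sum(w) - d * ee * w[-1] / (ee - 1.0) + d * w[0] / (ee - 1.0)
b1 = -ee * w[-1] / (ee - 1.0) + w[0] / (ee - 1.0)
risk_EM = (abs(a1 - b1) + (d - 1) * abs(b1)) / n      # Theorem 6.5 with k = 1

# Theorem 6.4 error bound, with w_hat_j = w_j(e^eps - 1) - e^eps*w_d + w_1
w_hat = [wj * (ee - 1.0) - ee * w[-1] + w[0] for wj in w]
err_MSE = (sum(w_hat) ** 2 - sum(x * x for x in w_hat)) / (n * (ee - 1.0) ** 2)

rhs = (math.sqrt(d * n * err_MSE) + sum(abs(wj) for wj in w)) / n
print(risk_EM <= rhs)   # prints True: the bound holds in this instance
```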
This inequality implies that a mechanism with good usefulness performance usually has good soundness performance too.

In some scenarios where voting administrators pay much attention to soundness, they may put hard constraints on risk_DD. A natural question arises: how do such soundness constraints affect the usefulness of the voting system? Here we give a negative result: an ε-LDP mechanism (having an unbiased estimator) may not even exist under these constraints. Based on Popoviciu's inequality on variances, we have:

risk_DD ≥ 2·√(n·err_MSE).

Given that err_MSE goes to infinity as ε → 0 while n is fixed, constraints on risk_DD eventually become unsatisfiable.

We now evaluate the usefulness and soundness performance of the weighted sampling and additive mechanisms, and compare them with the Laplace mechanism [24] and the original sampling-based approach in [44, 58].
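For reference, the Laplace baseline being compared can be sketched in a few lines (a minimal sketch; we assume the per-vote L1 sensitivity Δ_w = Σ_j |w_j − w_{d−j+1}|, and the vectorized Monte Carlo check is ours):

```python
import numpy as np

w = np.array([3.0, 2.0, 1.0, 0.0])           # Borda scores
v = np.array([1.0, 3.0, 0.0, 2.0])           # one voter's scored vote
eps = 1.0
delta_w = np.abs(w - w[::-1]).sum()          # assumed L1 sensitivity: sum_j |w_j - w_{d-j+1}| = 8
rng = np.random.default_rng(2)
views = v + rng.laplace(scale=delta_w / eps, size=(200_000, 4))
print(np.abs(views.mean(axis=0) - v).max())  # unbiased, but the noise scale is delta_w/eps
```

The mean of many views recovers v, yet a single view is unbounded, which is exactly the +∞ maximum magnitude risk analyzed earlier.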
Datasets.
In order to thoroughly assess the performance of the mechanisms in extensive settings, we use synthetic datasets. In each vote aggregation simulation, each candidate C_j is assigned a uniform random scale α_j ∈ [0, 1). Each voter's numerical preference β(i, j) on candidate C_j is an independent uniform random value r(i, j) ∈ [0, 1) multiplied by the scale α_j; the voter's ranking over candidates is then determined by sorting β(i, j). In these simulations, the number of candidates d ranges from 4 to 32, and the number of voters n ranges from 1000 to 1 000 000.

Adversarial votes in the simulation of the data amplification attack are uniform-randomly selected from the vote domain D_v. The number of adversarial votes n′ ranges from n · 0.1% to n · 5%. Adversarial private views in the simulation of the view disguise attack are crafted so that the runner-up candidate (in the non-adversarial voting result) benefits most. That is, we assume the adversary has prior knowledge of the 1st- and 2nd-ranked candidates, and the adversarial private view ṽ has the maximum ṽ_{j₂} − ṽ_{j₁} over the domain of private views. Specifically, for the Laplace mechanism, whose private view domain is [−∞, +∞]^d, we use the 95% confidence interval of the Laplace distribution as the filtered domain, and assign ṽ_{j₂} = −log(1 − 0.95)·Δ_w/ε + w_1 and ṽ_{j₁} = log(1 − 0.95)·Δ_w/ε + w_d. The number of adversarial private views n″ in the simulations ranges from n · 0.1% to n · 5%.

Table 3: Enumeration of experiment settings; default values are marked with *.

Parameter: Enumerated values
voting rule: Borda*, Nauru
number of candidates d: 4, 8, 16, 32
number of voters n: 1000, 10 000*, 100 000, 1 000 000
n′: n · 0.1%, n · 1%, n · 5%
n″: n · 0.1%, n · 1%, n · 5%
privacy level ε: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9

Evaluation Metrics.
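The vote-generation process above can be sketched as follows (the function name and the concrete sizes are our illustrative choices):

```python
import numpy as np

def synthetic_rankings(n, d, rng):
    """Generate n voters' rankings over d candidates following the paper's
    described process: scaled uniform preferences, then sort."""
    alpha = rng.uniform(0.0, 1.0, size=d)              # per-candidate scale alpha_j
    beta = rng.uniform(0.0, 1.0, size=(n, d)) * alpha  # preference beta(i, j)
    return np.argsort(-beta, axis=1)                   # row i: candidates, best to worst

rankings = synthetic_rankings(1000, 8, np.random.default_rng(3))
print(rankings.shape)   # (1000, 8); each row is a permutation of 0..7
```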
We use the usefulness metrics in Section 3.4 to evaluate the performance of the mechanisms in the non-adversarial and adversarial settings. Since the mean squared error and the soundness metrics of the mechanisms are theoretically analyzed and numerically compared in the former sections, their results are omitted here.
Varying number of candidates.
Simulated with n = 10 000 voters, the err_TVE results in Figures 4 and 5 show that the weighted sampling mechanism on average reduces err_TVE by 25%, while the additive mechanism on average reduces err_TVE by 50%. The performance discrepancy between the weighted sampling and additive mechanisms grows with the number of candidates, which confirms our theoretical analyses of the mean squared error bounds.
Varying number of voters.
With the number of candidates fixed at its default, err_TVE and winner-accuracy results with n = 100 000 voters are demonstrated in Figures 9 and 11 (results with n = 1000 voters are in Figures 8 and 10; additional results can be found in Appendix 10.7). Comparing them with the results on n = 10 000 voters, the estimation error of every mechanism decreases as the number of voters grows. When the number of voters is 100 000, all mechanisms achieve nearly 100% accuracy of winner even when the privacy budget is low.

Simulated with the default number of candidates and n = 10 000 honest voters, err_TVE results under the Borda rule with extra n′ = 0, 10, 100,
500 adversarial votes are presented in Figure 6 (and also in Appendix 10.9). The results show that fewer than 1% adversarial votes have no effective impact on the voting result, but more than 5% adversarial votes significantly harm the usefulness of the result. The weighted sampling and additive mechanisms outperform the Laplace mechanism in all adversarial settings with fraudulent votes.
Simulated with the default number of candidates and n = 10 000 honest voters, err_TVE results under the Borda rule with extra n″ = 0, 10, 100, 500 adversarial private views are presented in Figure 7 (and also in Appendix 10.10). The results show that fewer than 0.1% adversarial private views have no effective impact on the voting result.

Considering adversarial behaviors existing in real-world locally private data aggregation systems, this work pays attention to both the usefulness and soundness aspects of privacy-preserving mechanisms. Adversarial behaviors tailored to the local privacy setting are classified into the data amplification attack and the view disguise attack, which are then quantitatively measured by their manipulation power over the aggregation result. In the context of vote aggregation, two optimized mechanisms, the weighted sampling mechanism and the additive mechanism, are proposed to improve usefulness and soundness upon the naive Laplace mechanism. Besides theoretical analyses showing an order-of-d reduction in estimation error bounds and manipulation risk bounds for the Borda voting rule, their performance improvements are further validated by extensive experiments in both non-adversarial and adversarial scenarios. This work also discusses subtle relations among usefulness, soundness and indistinguishability, and calls for further research on solving the dilemmas/conflicts between these fundamental requirements of practical locally private data aggregation systems.
Figure 4: Total variation error under the Borda rule over 4, 8, 16 and 32 candidates.
Figure 5: Accuracy of winner under the Borda rule over 4, 8, 16 and 32 candidates.
Figure 6: Total variation error under the Borda rule with n′ = 0, 10, 100, 500 extra adversarial votes.
Figure 7: Total variation error under the Borda rule with n″ = 0, 10, 100, 500 extra adversarial private views.
Figure 8: Total variation error under the Borda and Nauru rules with 1000 voters.
Figure 9: Total variation error under the Borda and Nauru rules with 100 000 voters.
Figure 10: Accuracy of winner under the Borda and Nauru rules with 1000 voters.
Figure 11: Accuracy of winner under the Borda and Nauru rules with 100 000 voters.

REFERENCES

[1] Masayuki Abe. 1998. Universally verifiable mix-net with verification work independent of the number of mix-servers. In
International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 437–447.
[2] John J. Bartholdi, Craig A. Tovey, and Michael A. Trick. 1989. The computational difficulty of manipulating an election. Social Choice and Welfare 6, 3 (1989), 227–241.
[3] John J. Bartholdi III, Craig A. Tovey, and Michael A. Trick. 1992. How hard is it to control an election? Mathematical and Computer Modelling 16, 8-9 (1992), 27–40.
[4] Raef Bassily and Adam Smith. 2015. Local, private, efficient protocols for succinct histograms. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing. ACM, 127–135.
[5] Josh C. Benaloh and Moti Yung. 1986. Distributing the power of a government to enhance the privacy of voters. In PODC, Vol. 86. 52–62.
[6] Raghav Bhaskar, Srivatsan Laxman, Adam Smith, and Abhradeep Thakurta. 2010. Discovering frequent patterns in sensitive data. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 503–512.
[7] Duncan Black, Robert Albert Newing, Iain McLean, Alistair McMillan, and Burt L. Monroe. 1958. The theory of committees and elections. (1958).
[8] Philippe Bulens, Damien Giry, Olivier Pereira, et al. 2011. Running Mixnet-Based Elections with Helios. EVT/WOTE 11 (2011).
[9] Mark Bun, Jelani Nelson, and Uri Stemmer. 2018. Heavy hitters and the structure of local privacy. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. ACM, 435–447.
[10] Konstantinos Chatzikokolakis, Catuscia Palamidessi, and Prakash Panangaden. 2008. Anonymity protocols as noisy channels. Information and Computation (2008).
[11] In Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing. ACM, 11–19.
[12] David L. Chaum. 1981. Untraceable electronic mail, return addresses, and digital pseudonyms. Commun. ACM 24, 2 (1981), 84–90.
[13] Rui Chen, Noman Mohammed, Benjamin C. M. Fung, Bipin C. Desai, and Li Xiong. 2011. Publishing set-valued data via differential privacy. Proceedings of the VLDB Endowment 4, 11 (2011), 1087–1098.
[14] Josh D. Cohen and Michael J. Fischer. 1985. A robust and verifiable cryptographically secure election scheme. Yale University, Department of Computer Science.
[15] Vincent Conitzer, Tuomas Sandholm, and Jérôme Lang. 2007. When are elections with few candidates hard to manipulate? Journal of the ACM (JACM) 54, 3 (2007), 14.
[16] Graham Cormode, Tejas Kulkarni, and Divesh Srivastava. 2018. Marginal release under local differential privacy. In Proceedings of the 2018 International Conference on Management of Data. ACM, 131–146.
[17] Ronald Cramer, Matthew Franklin, Berry Schoenmakers, and Moti Yung. 1996. Multi-authority secret-ballot elections with linear work. In International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 72–83.
[18] Ronald Cramer, Rosario Gennaro, and Berry Schoenmakers. 1997. A secure and optimally efficient multi-authority election scheme. European Transactions on Telecommunications 8, 5 (1997), 481–490.
[19] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. 2017. Collecting telemetry data privately. In Advances in Neural Information Processing Systems. 3571–3580.
[20] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. 2013. Local privacy and statistical minimax rates. In FOCS. IEEE, 429–438.
[21] Michael Dummett and Robin Farquharson. 1961. Stability in voting. Econometrica: Journal of the Econometric Society (1961), 33–43.
[22] Cynthia Dwork. 2011. Differential privacy. Encyclopedia of Cryptography and Security (2011), 338–340.
[23] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. 2006. Our data, ourselves: Privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 486–503.
[24] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference. Springer, 265–284.
[25] Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9, 3–4 (2014), 211–407.
[26] Cynthia Dwork, Weijie Su, and Li Zhang. 2015. Private false discovery rate control. arXiv preprint arXiv:1511.03803 (2015).
[27] Cynthia Dwork, Weijie J. Su, and Li Zhang. 2018. Differentially Private False Discovery Rate Control. arXiv preprint arXiv:1807.04209 (2018).
[28] Pavlos S. Efraimidis and Paul G. Spirakis. 2006. Weighted random sampling with a reservoir. Inform. Process. Lett. 97, 5 (2006), 181–185.
[29] Eithan Ephrati and Jeffrey S. Rosenschein. 1991. The Clarke Tax as a Consensus Mechanism Among Automated Agents. In AAAI, Vol. 91. 173–178.
[30] Eithan Ephrati, Jeffrey S. Rosenschein, et al. 1993. Multi-agent planning as a dynamic search for social consensus. In IJCAI, Vol. 93. 423–429.
[31] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, 1054–1067.
[32] Piotr Faliszewski, Edith Hemaspaandra, and Lane A. Hemaspaandra. 2009. How hard is bribery in elections? Journal of Artificial Intelligence Research 35 (2009), 485–532.
[33] Giulia Fanti, Vasyl Pihur, and Úlfar Erlingsson. 2016. Building a RAPPOR with the unknown: Privacy-preserving learning of associations and data dictionaries. Proceedings on Privacy Enhancing Technologies (2016).
[34] In International Workshop on the Theory and Application of Cryptographic Techniques. Springer, 244–251.
[35] Allan Gibbard et al. 1977. Manipulation of schemes that mix voting with chance. Econometrica 45, 3 (1977), 665–681.
[36] Moritz Hardt and Jonathan Ullman. 2014. Preventing false discovery in interactive data analysis is hard. In FOCS. IEEE, 454–463.
[37] Michael Hay, Vibhor Rastogi, Gerome Miklau, and Dan Suciu. 2010. Boosting the accuracy of differentially private histograms through consistency. Proceedings of the VLDB Endowment 3, 1-2 (2010), 1021–1032.
[38] Martin Hirt. 2010. Receipt-free K-out-of-L voting based on ElGamal encryption. In Towards Trustworthy Elections. Springer, 64–82.
[39] Martin Hirt and Kazue Sako. 2000. Efficient receipt-free voting based on homomorphic encryption. In International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 539–556.
[40] Prateek Jain, Vivek Kulkarni, Abhradeep Thakurta, and Oliver Williams. 2015. To drop or not to drop: Robustness, consistency and differential privacy properties of dropout. arXiv preprint arXiv:1503.02031 (2015).
[41] Peter Kairouz, Keith Bonawitz, and Daniel Ramage. 2016. Discrete Distribution Estimation under Local Privacy. In International Conference on Machine Learning. 2436–2444.
[42] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. 2014. Extremal mechanisms for local differential privacy. In Advances in Neural Information Processing Systems. 2879–2887.
[43] Hillol Kargupta, Souptik Datta, Qi Wang, and Krishnamoorthy Sivakumar. 2003. On the Privacy Preserving Properties of Random Data Perturbation Techniques. In ICDM, Vol. 3. Citeseer, 99–106.
[44] Yusuke Kawamoto and Takao Murakami. 2018. Differentially Private Obfuscation Mechanisms for Hiding Probability Distributions. arXiv preprint arXiv:1812.00939 (2018).
[45] Daniel Kifer and Ashwin Machanavajjhala. 2011. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. ACM, 193–204.
[46] Byoungcheon Lee, Colin Boyd, Ed Dawson, Kwangjo Kim, Jeongmo Yang, and Seungjae Yoo. 2003. Providing receipt-freeness in mixnet-based voting protocols. In International Conference on Information Security and Cryptology. Springer, 245–258.
[47] Chao Li, Michael Hay, Vibhor Rastogi, Gerome Miklau, and Andrew McGregor. 2010. Optimizing linear counting queries under differential privacy. In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, 123–134.
[48] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. 2007. t-closeness: Privacy beyond k-anonymity and l-diversity. In ICDE. IEEE, 106–115.
[49] Ninghui Li, Wahbeh Qardaji, Dong Su, and Jianneng Cao. 2012. PrivBasis: Frequent itemset mining with differential privacy. Proceedings of the VLDB Endowment 5, 11 (2012), 1340–1351.
[50] Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, and Muthuramakrishnan Venkitasubramaniam. 2006. l-diversity: Privacy beyond k-anonymity. In ICDE. IEEE, 24–24.
[51] Frank McSherry and Kunal Talwar. 2007. Mechanism Design via Differential Privacy. In FOCS, Vol. 7. 94–103.
[52] Hervé Moulin. 1980. On strategy-proofness and single peakedness. Public Choice 35, 4 (1980), 437–455.
[53] Yi Mu and Vijay Varadharajan. 1998. Anonymous secure e-voting over a network. In Proceedings of the 14th Annual Computer Security Applications Conference. IEEE, 293–299.
[54] Thông T. Nguyên, Xiaokui Xiao, Yin Yang, Siu Cheung Hui, Hyejin Shin, and Junbum Shin. 2016. Collecting and analyzing data from smart device users with local differential privacy. arXiv preprint arXiv:1606.05053 (2016).
[55] Choonsik Park, Kazutomo Itoh, and Kaoru Kurosawa. 1993. Efficient anonymous channel and all/nothing election scheme. In Workshop on the Theory and Application of Cryptographic Techniques. Springer, 248–259.
[56] Kun Peng, Riza Aditya, Colin Boyd, Ed Dawson, and Byoungcheon Lee. 2004. Multiplicative homomorphic e-voting. In International Conference on Cryptology in India. Springer, 61–72.
[57] Ariel D. Procaccia and Jeffrey S. Rosenschein. 2007. Junta distributions and the average-case complexity of manipulating elections.
Journal of ArtificialIntelligence Research
28 (2007), 157–181.[58] Zhan Qin, Yin Yang, Ting Yu, Issa Khalil, Xiaokui Xiao, and Kui Ren. 2016.Heavy hitter estimation over set-valued data with local differential privacy. In
Proceedings of the 2016 ACM SIGSAC Conference on Computer and CommunicationsSecurity . ACM, 192–203.[59] Benjamin Reilly. 2002. Social choice in the south seas: Electoral innovation andthe borda count in the pacific island countries.
International Political ScienceReview
23, 4 (2002), 355–372.[60] Peter YA Ryan. 2008. Prêt à Voter with Paillier encryption.
Mathematical andComputer Modelling
48, 9-10 (2008), 1646–1662.[61] Thomas Steinke and Jonathan Ullman. 2017. Tight lower bounds for differentiallyprivate selection. In . IEEE, 552–563.[62] Latanya Sweeney. 2002. k-anonymity: A model for protecting privacy.
Inter-national Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
10, 05(2002), 557–570.[63] Jun Tang, Aleksandra Korolova, Xiaolong Bai, Xueqiang Wang, and XiaofengWang. 2017. Privacy loss in Apple’s implementation of differential privacy onMacOS 10.12. arXiv preprint arXiv:1709.02753 (2017).[64] Abhradeep Guha Thakurta, Andrew H Vyrros, Umesh S Vaishampayan, GauravKapoor, Julien Freudiger, Vivek Rangarajan Sridhar, and Doug Davidson. 2019.Learning new words. US Patent App. 16/159,473.[65] Abhradeep Guha Thakurta, Andrew H Vyrros, Umesh S Vaishampayan, GauravKapoor, Julien Freudinger, Vipul Ved Prakash, Arnaud Legendre, and StevenDuplinsky. 2017. Emoji frequency detection and deep link frequency. US Patent9,705,908.[66] Jonathan Ullman. 2018. Tight lower bounds for locally differentially privateselection. arXiv preprint arXiv:1802.02638 (2018).[67] Paul Voigt and Axel Von dem Bussche. 2017. The EU General Data ProtectionRegulation (GDPR).
A Practical Guide, 1st Ed., Cham: Springer InternationalPublishing (2017).[68] Ning Wang, Xiaokui Xiao, Yin Yang, Jun Zhao, Siu Cheung Hui, Hyejin Shin,Junbum Shin, and Ge Yu. 2019. Collecting and Analyzing Multidimensional Datawith Local Differential Privacy. In
Proceedings of IEEE ICDE .[69] Shaowei Wang, Liusheng Huang, Yiwen Nie, Pengzhan Wang, Hongli Xu, andWei Yang. 2018. PrivSet: Set-Valued Data Analyses with Locale DifferentialPrivacy. In
IEEE INFOCOM 2018-IEEE Conference on Computer Communications .IEEE, 1088–1096.[70] Stanley L Warner. 1965. Randomized response: A survey technique for eliminatingevasive answer bias.
J. Amer. Statist. Assoc.
60, 309 (1965), 63–69.[71] Zhe Xia, Steve A Schneider, James Heather, and Jacques Traoré. 2008. Analysis,Improvement, and Simplification of Prêt à Voter with Paillier Encryption.. In
EVT’08 Proceedings of the Conference on Electronic Voting Technology .[72] Jia Xu, Zhenjie Zhang, Xiaokui Xiao, Yin Yang, Ge Yu, and Marianne Winslett.2013. Differentially private histogram publication.
The VLDB JournalâĂŤTheInternational Journal on Very Large Data Bases
22, 6 (2013), 797–822.[73] Zhikun Zhang, Tianhao Wang, Ninghui Li, Shibo He, and Jiming Chen. 2018.Calm: Consistent adaptive local marginal for marginal release under local differ-ential privacy. In
Proceedings of the 2018 ACM SIGSAC Conference on Computerand Communications Security . ACM, 212–229.[74] Quanyu Zhao and Yining Liu. 2016. E-Voting scheme using secret sharing andk-anonymity. In
International Conference on Broadband and Wireless Computing,Communication and Applications . Springer, 893–900. ubmission for Review, Journal or Conference
10 APPENDICES

10.1 Proof of Lemma 5.2
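The two subprocedures analyzed in this proof can be sanity-checked numerically. The sketch below is our own illustration (the function names are not from the paper): it implements binary randomized response with keep probability $\sqrt{e^{\epsilon}}/(\sqrt{e^{\epsilon}}+1)$, the debiased estimator $((\sqrt{e^{\epsilon}}+1)\tilde{B}-1)/(\sqrt{e^{\epsilon}}-1)$, and the mean-squared-error bound of Equation 6.

```python
import math

def rr_probs(eps):
    """Probabilities of reporting a 1 under binary randomized response:
    (Pr[report 1 | true bit 1], Pr[report 1 | true bit 0])."""
    s = math.sqrt(math.exp(eps))
    return s / (s + 1.0), 1.0 / (s + 1.0)

def debias(bit, eps):
    """Unbiased estimator ((sqrt(e^eps)+1)*bit - 1) / (sqrt(e^eps)-1)."""
    s = math.sqrt(math.exp(eps))
    return ((s + 1.0) * bit - 1.0) / (s - 1.0)

def expected_estimate(b, eps):
    """E[debias(randomized bit)] for a true bit b, computed exactly."""
    p, q = rr_probs(eps)
    pr_one = p if b == 1 else q
    return pr_one * debias(1, eps) + (1 - pr_one) * debias(0, eps)

def mse_bound(weights, masses, c, eps, n):
    """Mean squared error bound of Equation 6 for the weighted sampling
    mechanism with score vector `weights`, sampling masses `masses`,
    intercept c, privacy budget eps, and n voters."""
    s = math.sqrt(math.exp(eps))
    coeff = 1.0 + len(weights) * s / (s - 1.0) ** 2
    return coeff / n * sum((w - c) ** 2 / m for w, m in zip(weights, masses))
```

For any $\epsilon$, `expected_estimate` returns exactly the true bit, mirroring Equation 2; `mse_bound` evaluates Equation 6, e.g. for Borda weights [3, 2, 1, 0] with uniform masses.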
Proof. We first prove that the numerical vector $\frac{(\sqrt{e^{\epsilon}}+1)\cdot\tilde{B}-1}{\sqrt{e^{\epsilon}}-1}$ is an unbiased estimator of the binary vector $B$. Consider an element $\tilde{B}_j$ in $\tilde{B}$, which equals 1 with probability $\frac{\sqrt{e^{\epsilon}}B_j+(1-B_j)}{\sqrt{e^{\epsilon}}+1}$; we have:

$$\mathbb{E}\Big[\frac{(\sqrt{e^{\epsilon}}+1)\cdot\tilde{B}_j-1}{\sqrt{e^{\epsilon}}-1}\Big] = \frac{(\sqrt{e^{\epsilon}}+1)\cdot\frac{\sqrt{e^{\epsilon}}B_j+(1-B_j)}{\sqrt{e^{\epsilon}}+1}-1}{\sqrt{e^{\epsilon}}-1} = \frac{(\sqrt{e^{\epsilon}}-1)B_j}{\sqrt{e^{\epsilon}}-1} = B_j. \quad (2)$$

Next, we prove that the vector $B\cdot\frac{w_{j^*}-c}{m_{j^*}}+c$ is an unbiased estimator of the scored vote $v$ defined by $\pi$ as $v_{\pi_j}=w_j$. Denote by $B[j^*]$ a vector having only the $j^*$-th bit equal to 1. By the definition of the vector $B$ at lines 9 and 10, we have:

$$\sum_{j^*\in[1,d]} w_{j^*}\cdot B[\pi_{j^*}] = \sum_{j^*\in[1,d]} v_{\pi_{j^*}}\cdot B[\pi_{j^*}] = v.$$

Consequently, since the rank $j^*$ is sampled with probability $m_{j^*}$, we have:

$$\mathbb{E}\Big[B\cdot\frac{w_{j^*}-c}{m_{j^*}}+c\Big] = \sum_{j^*\in[1,d]} m_{j^*}\cdot\Big(B[\pi_{j^*}]\cdot\frac{w_{j^*}-c}{m_{j^*}}+c\Big) = \Big(\sum_{j^*\in[1,d]} w_{j^*}\cdot B[\pi_{j^*}]\Big)-\Big(\sum_{j^*\in[1,d]} B[\pi_{j^*}]\cdot c\Big)+c = v-c+c = v. \quad (3)$$

Combining the unbiasedness of the randomized response subprocedure in Equation 2 and of the weighted sampling subprocedure in Equation 3, we conclude that $\tilde{v}_j$ at line 19 is an unbiased estimator of $v_j$. □

Proof. According to the score estimator's definition in Equation 1, the independence of each $\tilde{v}$, and the unbiasedness of $\tilde{v}$ in Lemma 5.2, we have:

$$\mathbb{E}[\|\tilde{\theta}-\theta\|_2^2] = \mathbb{E}\Big[\Big\|\frac{1}{n}\sum_{i\in[1,n]}\tilde{v}^{(i)}-\frac{1}{n}\sum_{i\in[1,n]}v^{(i)}\Big\|_2^2\Big] = \frac{1}{n^2}\sum_{i\in[1,n]}\mathbb{E}[\|\tilde{v}^{(i)}-v^{(i)}\|_2^2] = \frac{1}{n}\sum_{j\in[1,d]}\Big(\mathbb{E}[(\tilde{v}_{\pi_j})^2]-(v_{\pi_j})^2\Big). \quad (4)$$

Consider the score estimator $\tilde{v}_{\pi_j}$ of the $j$-th ranked candidate $C_{\pi_j}$ in a vote $\pi$. According to the sampling strategy, the binary value $B_{\pi_j}$ has probability $m_j$ of being 1 (which happens when $j^*=j$) and probability $1-m_j$ of being 0. Further, according to the rule of binary randomized response on $B_{\pi_j}$, the randomized bit $\hat{B}_{\pi_j}$ has probability $\frac{\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}+1}$ of being 1 when $j^*=j$, and has probability $\frac{1}{\sqrt{e^{\epsilon}}+1}$ of being 1 when $j^*\neq j$. Separately considering the random rank $j^*$, we have:

$$\mathbb{E}[(\tilde{v}_{\pi_j})^2] = \sum_{j^*\in[1,d]} m_{j^*}\frac{[j^*=j]\sqrt{e^{\epsilon}}+[j^*\neq j]}{\sqrt{e^{\epsilon}}+1}\Big(\frac{\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^*}-c}{m_{j^*}}+c\Big)^2 + \sum_{j^*\in[1,d]} m_{j^*}\frac{[j^*=j]+[j^*\neq j]\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}+1}\Big(\frac{-1}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^*}-c}{m_{j^*}}+c\Big)^2.$$

Summing over $j\in[1,d]$, we then have:

$$\sum_{j\in[1,d]}\mathbb{E}[(\tilde{v}_{\pi_j})^2] = \sum_{j^*\in[1,d]} m_{j^*}\frac{\sqrt{e^{\epsilon}}+d-1}{\sqrt{e^{\epsilon}}+1}\Big(\frac{\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^*}-c}{m_{j^*}}+c\Big)^2 + \sum_{j^*\in[1,d]} m_{j^*}\frac{1+(d-1)\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}+1}\Big(\frac{-1}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^*}-c}{m_{j^*}}+c\Big)^2 = \frac{e^{\epsilon}+d\sqrt{e^{\epsilon}}-2\sqrt{e^{\epsilon}}+1}{(\sqrt{e^{\epsilon}}-1)^2}\sum_{j^*\in[1,d]}\frac{(w_{j^*}-c)^2}{m_{j^*}} + \sum_{j^*\in[1,d]} w_{j^*}^2. \quad (5)$$

Combining the results of Equation 5 and Equation 4, we have:

$$\mathbb{E}[\|\tilde{\theta}-\theta\|_2^2] = \frac{1}{n}\Big(1+\frac{d\sqrt{e^{\epsilon}}}{(\sqrt{e^{\epsilon}}-1)^2}\Big)\sum_{j\in[1,d]}\frac{(w_j-c)^2}{m_j}. \quad (6)$$

□

Lemma 10.1.
The manipulation risks of the weighted sampling mechanism with sampling masses $m$ and intercept constant $c$ are:

$$\mathrm{risk}_{MM} = \frac{d}{n}\max_{j\in[1,d]}\max\Big[\Big|\frac{-1}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_j-c}{m_j}+c\Big|,\ \Big|\frac{\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_j-c}{m_j}+c\Big|\Big];$$

$$\mathrm{risk}_{EM} = \frac{\sum_{j\in[1,d]} m_j\big[(\sqrt{e^{\epsilon}}+d-1)\,t_j+(\sqrt{e^{\epsilon}}(d-1)+1)\,f_j\big]}{n\cdot(\sqrt{e^{\epsilon}}+1)};$$

$$\mathrm{risk}_{DD} = d\Big(\max_{j^*\in[1,d],\,\hat{B}_j\in\{0,1\}} g_{j^*,\hat{B}_j}-\min_{j^*\in[1,d],\,\hat{B}_j\in\{0,1\}} g_{j^*,\hat{B}_j}\Big),$$

where $t_j$ denotes $\big|\frac{\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_j-c}{m_j}+c\big|$, $f_j$ denotes $\big|\frac{-1}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_j-c}{m_j}+c\big|$, and $g_{j^*,\hat{B}_j}$ denotes $\frac{[\hat{B}_j=1]\sqrt{e^{\epsilon}}-[\hat{B}_j=0]}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^*}-c}{m_{j^*}}$.

Proof. The proof contains three parts; each part deals with one of the risks in the theorem.
Part MM: Recall that the maximum is taken over all possible $j^*\in[1,d]$ and $\hat{B}\in\{0,1\}^d$. For a given rank $j^*$, the maximum is apparently achieved when $\hat{B}$ is either $[0]^d$ or $[1]^d$; hence we have the result. Additionally, when the intercept value $c$ is no less than 0, the result can be trimmed to $\frac{d}{n}\max_{j\in[1,d]}\big|\frac{\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_j-c}{m_j}+c\big|$.

Part EM: Consider the conditional expectation $\mathbb{E}\big[\frac{\|\tilde{v}\|_1}{n}\mid j^*\big]$ given the rank $j^*$. By the randomization subprocedure of binary randomized response, the randomized vector $\hat{B}\in\{0,1\}^d$ has $\frac{\sqrt{e^{\epsilon}}+d-1}{\sqrt{e^{\epsilon}}+1}$ ones and $\frac{\sqrt{e^{\epsilon}}(d-1)+1}{\sqrt{e^{\epsilon}}+1}$ zeros in expectation. When $\hat{B}_j$ is 1, the element $|\tilde{v}_j|$ is $\big|\frac{\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^*}-c}{m_{j^*}}+c\big|$; when $\hat{B}_j$ is 0, the element $|\tilde{v}_j|$ is $\big|\frac{-1}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^*}-c}{m_{j^*}}+c\big|$. Consequently we have the result.

Part DD: Consider one element $j$; we have $\max|\tilde{v}_j-\tilde{v}'_j|$ as follows:

$$\max\Big|\frac{[\hat{B}_j=1]\sqrt{e^{\epsilon}}-[\hat{B}_j=0]}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^*}-c}{m_{j^*}}-\frac{[\hat{B}'_j=1]\sqrt{e^{\epsilon}}-[\hat{B}'_j=0]}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^+}-c}{m_{j^+}}\Big|,$$

for any $j^*, j^+\in[1,d]$ and $\hat{B}_j, \hat{B}'_j\in\{0,1\}$. Due to symmetry, the former formula is equivalent to:

$$\max_{j^*\in[1,d],\,\hat{B}_j\in\{0,1\}}\frac{[\hat{B}_j=1]\sqrt{e^{\epsilon}}-[\hat{B}_j=0]}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^*}-c}{m_{j^*}}\ -\ \min_{j^*\in[1,d],\,\hat{B}_j\in\{0,1\}}\frac{[\hat{B}_j=1]\sqrt{e^{\epsilon}}-[\hat{B}_j=0]}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^*}-c}{m_{j^*}},$$

hence we have the result. □

Proof. The proof contains three parts; each part deals with one of the risks in the theorem.
Part MM: With the given sampling masses $m$ and intercept value $c$, we can derive that $\frac{w_j-c}{m_j}$ is either $\Omega_w$, $-\Omega_w$, or 0; then we have $\mathrm{risk}_{MM}$ as follows:

$$\frac{d}{n}\max\Big[\Big|\frac{-\sqrt{e^{\epsilon}}\,\Omega_w}{\sqrt{e^{\epsilon}}-1}+c\Big|,\ \Big|\frac{\sqrt{e^{\epsilon}}\,\Omega_w}{\sqrt{e^{\epsilon}}-1}+c\Big|,\ \Big|\frac{\Omega_w}{\sqrt{e^{\epsilon}}-1}+c\Big|,\ \Big|\frac{-\Omega_w}{\sqrt{e^{\epsilon}}-1}+c\Big|\Big].$$

Since $\sqrt{e^{\epsilon}}>1$, $\max\big[\big|\frac{-\sqrt{e^{\epsilon}}\Omega_w}{\sqrt{e^{\epsilon}}-1}+c\big|,\big|\frac{\sqrt{e^{\epsilon}}\Omega_w}{\sqrt{e^{\epsilon}}-1}+c\big|\big]$ is no less than $\max\big[\big|\frac{\Omega_w}{\sqrt{e^{\epsilon}}-1}+c\big|,\big|\frac{-\Omega_w}{\sqrt{e^{\epsilon}}-1}+c\big|\big]$; consequently we have the final result.

Part EM: When $w_j>c$, $\big|\frac{\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_j-c}{m_j}+c\big|$ equals $t'_+$ and $\big|\frac{-1}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_j-c}{m_j}+c\big|$ equals $f'_+$; when $w_j<c$, $\big|\frac{\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_j-c}{m_j}+c\big|$ equals $t'_-$ and $\big|\frac{-1}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_j-c}{m_j}+c\big|$ equals $f'_-$. Consequently we have the final result.

Part DD: Recall that $g_{j^*,\hat{B}_j}=\frac{[\hat{B}_j=1]\sqrt{e^{\epsilon}}-[\hat{B}_j=0]}{\sqrt{e^{\epsilon}}-1}\cdot\frac{w_{j^*}-c}{m_{j^*}}$, where $\frac{w_j-c}{m_j}$ is either $\Omega_w$, $-\Omega_w$, or 0. When $\Omega_w>0$, the value of $\frac{w_j-c}{m_j}$ enumerates $[\Omega_w,-\Omega_w]$; further, because $\sqrt{e^{\epsilon}}>1$, we have:

$$\max_{j^*\in[1,d],\,\hat{B}_j\in\{0,1\}} g_{j^*,\hat{B}_j}=\frac{\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}-1}\,\Omega_w;\qquad \min_{j^*\in[1,d],\,\hat{B}_j\in\{0,1\}} g_{j^*,\hat{B}_j}=-\frac{\sqrt{e^{\epsilon}}}{\sqrt{e^{\epsilon}}-1}\,\Omega_w.$$

Consequently we have the final results. □

Proof. Since $\tilde{v}$ is mapped from $S$, to prove that the private view $\tilde{v}$ satisfies $\epsilon$-LDP, it is enough to show that the intermediate view $S$ satisfies $\epsilon$-LDP.

Firstly, we need to prove that $\Pr[S|v]$ is a valid probability distribution, that is, $\Pr[S|v]\geq 0$ and $\sum_{S\in\mathcal{C}_k}\Pr[S|v]=1$ for any $v\in\mathcal{D}_v$. Since $\sum_{C_{j'}\in S} v_{j'}\geq w_{kmin}$, we have $\Pr[S|v]\geq\Phi>0$. Now consider $\sum_{S\in\mathcal{C}_k}\Pr[S|v]$; we have:

$$\sum_{S\in\mathcal{C}_k}\Pr[S|v]=\binom{d}{k}\Phi+\sum_{S\in\mathcal{C}_k}\frac{\sum_{C_{j'}\in S} v_{j'}-w_{kmin}}{w_{kmax}-w_{kmin}}\cdot(e^{\epsilon}-1)\,\Phi=\binom{d}{k}\Phi+\binom{d-1}{k-1}\sum_{C_{j'}\in\mathcal{C}}\frac{v_{j'}-w_{kmin}/k}{w_{kmax}-w_{kmin}}\cdot(e^{\epsilon}-1)\,\Phi$$
$$=\binom{d}{k}\Phi+\binom{d-1}{k-1}\frac{\sum_{j\in[1,d]} w_j-d\,w_{kmin}/k}{w_{kmax}-w_{kmin}}\cdot(e^{\epsilon}-1)\,\Phi=\binom{d}{k}\Phi\cdot\Big(\frac{\frac{k}{d}\sum_{j\in[1,d]} w_j-w_{kmin}}{w_{kmax}-w_{kmin}}\cdot(e^{\epsilon}-1)+1\Big)=1.$$

Secondly, for any paired inputs $v,v'\in\mathcal{D}_v$ and any output value $S\in\mathcal{C}_k$, we have:

$$\frac{\Pr[S|v]}{\Pr[S|v']}\leq\frac{\max_{S\in\mathcal{C}_k}\Pr[S|v]}{\min_{S\in\mathcal{C}_k}\Pr[S|v']}\leq\frac{\frac{w_{kmax}-w_{kmin}}{w_{kmax}-w_{kmin}}\cdot(e^{\epsilon}-1)\,\Phi+\Phi}{\frac{w_{kmin}-w_{kmin}}{w_{kmax}-w_{kmin}}\cdot(e^{\epsilon}-1)\,\Phi+\Phi}\leq\exp(\epsilon). \quad\Box$$

Algorithm 3 implements the additive mechanism (in Definition 6.1). It contains a core procedure, additive_select, which recursively selects a top ranking position $j^*$ from the remaining positions $[1,d]$ (see Algorithm 4). The relative weights $z_j$ in the algorithm are proportional to the probability that candidate $C_j$ shows up in the private view $S$.

Algorithm 3
Additive mechanism

Input: A vote $\pi$, privacy budget $\epsilon$, the voting rule's score vector $w$, and parameter $k$.
Output: An unbiased private view $\tilde{v}\in\mathbb{R}^d$ that satisfies $\epsilon$-LDP.

▷ Select $k$ ranking positions
for $j\in[1,d]$ do
  ▷ Compute the weight of presence for a ranking position
  $z_j \leftarrow \frac{w_j-w_{kmin}/k}{w_{kmax}-w_{kmin}}\cdot(e^{\epsilon}-1)+\frac{1}{k}$
end for
$T \leftarrow$ additive_select$(d, k, z)$
▷ Derive the unbiased estimator
$S \leftarrow \{\pi_j \mid j\in T\}$
for $j\in[1,d]$ do
  $\tilde{v}_j \leftarrow [C_j\in S]\cdot a_k - b_k$
end for
return $\tilde{v}=\{\tilde{v}_1,\tilde{v}_2,...,\tilde{v}_d\}$

Algorithm 4 additive_select$(d, k, z)$

Input: The number of positions $d$, parameter $k$, and positions' weights $z$.
Output: $k$ ranking positions $T\subseteq[1,d]$.

▷ Compute probabilities $p_j=\Pr[\min(T)=j]$
for $j\in[1,d-k+1]$ do
  $p_j \leftarrow \binom{d-j}{k-1}\cdot\big(z_j+\frac{k-1}{d-j}\sum_{j'\in[j+1,d]} z_{j'}\big)$
end for
▷ Sample the minimum rank $j^*$
$j^* \leftarrow 0$; draw $r$ uniformly at random from $[0,1)$
while $r\geq 0$ do
  $j^* \leftarrow j^*+1$
  $r \leftarrow r-p_{j^*}/\sum_{j\in[1,d-k+1]} p_j$
end while
▷ Update the weights of the remaining positions
for $j\in[j^*+1,d]$ do
  $z'_{j-j^*} \leftarrow z_j+z_{j^*}/(k-1)$
end for
▷ Recursively select the remaining $k-1$ positions
$T' \leftarrow$ additive_select$(d-j^*, k-1, z')$
return $T=\{j^*\}\cup\{j+j^*\mid j\in T'\}$

Results of maximum absolute error and loss of winner error under the Borda and Nauru rules are demonstrated in Figures 12, 13, 14, and 15.
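Both the output distribution proved above and the sampling loop of Algorithm 4 can be checked by brute force for small $d$ and $k$. The sketch below is our own illustration (names such as `additive_distribution` are not from the paper): it assigns each $k$-subset $S$ the probability $\Phi\cdot\big(\frac{\sum_{C_j\in S} v_j-w_{kmin}}{w_{kmax}-w_{kmin}}(e^{\epsilon}-1)+1\big)$, and it implements the inverse-CDF step of Algorithm 4 with the random draw $r$ passed in explicitly for testability.

```python
import math
from itertools import combinations

def additive_distribution(weights, k, eps):
    """Pr[S | v] over all k-subsets of candidate indices, where the j-th
    ranked candidate of the vote receives score weights[j]."""
    d = len(weights)
    w_kmax = sum(sorted(weights, reverse=True)[:k])  # largest k-subset total score
    w_kmin = sum(sorted(weights)[:k])                # smallest k-subset total score
    raw = {}
    for S in combinations(range(d), k):
        score = sum(weights[j] for j in S)
        raw[S] = (score - w_kmin) / (w_kmax - w_kmin) * (math.exp(eps) - 1.0) + 1.0
    phi = 1.0 / sum(raw.values())  # normalizer Phi
    return {S: phi * r for S, r in raw.items()}

def select_min_rank(p, r):
    """Inverse-CDF step of additive_select: walk down p, subtracting
    normalized masses from r until it turns negative."""
    z = sum(p)
    j_star = 0
    while r >= 0.0:
        j_star += 1
        r -= p[j_star - 1] / z
    return j_star
```

For Borda weights [3, 2, 1, 0] with $k=2$ and $\epsilon=1$, the probabilities sum to 1, the most and least likely subsets differ by exactly a factor $e^{\epsilon}$, and the numerically obtained $\Phi$ matches the closed-form normalizer derived in the proof above.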
Figure 12: Maximum absolute error under Borda and Nauru rules.

Figure 13: Loss of winner error under Borda and Nauru rules.

Figure 14: Maximum absolute error under Borda and Nauru rules.
Figure 15: Loss of winner error under Borda and Nauru rules.

Results of maximum absolute error and loss of winner error under the Borda rule with a varying number of candidates are demonstrated in Figures 16 and 17, respectively. The total variation error, maximum absolute error, accuracy of winner, and loss of winner error results under the Nauru rule with a varying number of candidates are demonstrated in Figures 18, 19, 20, and 21, respectively.

Results of maximum absolute error, accuracy of winner, and loss of winner error under the Borda rule with a varying number of adversarial votes are demonstrated in Figures 22, 23, and 24, respectively. The total variation error, maximum absolute error, accuracy of winner, and loss of winner error results under the Nauru rule with a varying number of adversarial votes are demonstrated in Figures 25, 26, 27, and 28, respectively.

Results of maximum absolute error, accuracy of winner, and loss of winner error under the Borda rule with a varying number of adversarial private views are demonstrated in Figures 29, 30, and 31, respectively. The total variation error, maximum absolute error, accuracy of winner, and loss of winner error results under the Nauru rule with a varying number of adversarial private views are demonstrated in Figures 32, 33, 34, and 35, respectively.
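The metrics named above are not redefined in this appendix, so the sketch below encodes our plausible readings of two of them; treat both definitions as assumptions rather than the paper's exact formulas. It reads err_MAE as the largest per-candidate deviation between the estimated and true aggregated scores, and the accuracy-of-winner indicator as whether the estimated and true winners coincide.

```python
def max_absolute_error(theta_est, theta_true):
    """Assumed err_MAE: largest absolute deviation over candidates'
    aggregated scores."""
    return max(abs(a - b) for a, b in zip(theta_est, theta_true))

def winner_recovered(theta_est, theta_true):
    """Assumed accuracy-of-winner indicator: 1 if the estimated winner
    matches the true winner (ties broken by index), else 0."""
    est_winner = max(range(len(theta_est)), key=theta_est.__getitem__)
    true_winner = max(range(len(theta_true)), key=theta_true.__getitem__)
    return int(est_winner == true_winner)
```

Averaging `winner_recovered` over repeated runs would give an accuracy curve of the kind plotted in Figures 20, 23, 27, 30, and 34.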
Figure 16: Maximum absolute error under Borda rule over 4, 8, 16, 32 candidates.

Figure 17: Loss of winner error under Borda rule over 4, 8, 16, 32 candidates.

Figure 18: Total variation error under Nauru rule over 4, 8, 16, 32 candidates.

Figure 19: Maximum absolute error under Nauru rule over 4, 8, 16, 32 candidates.

Figure 20: Accuracy of winner under Nauru rule over 4, 8, 16, 32 candidates.

Figure 21: Loss of winner error under Nauru rule over 4, 8, 16, 32 candidates.

Figure 22: Maximum absolute error under Borda rule with honest voters and n′ = 0, 10, 100, 500 adversarial votes.

Figure 23: Accuracy of winner under Borda rule with honest voters and n′ = 0, 10, 100, 500 adversarial votes.

Figure 24: Loss of winner error under Borda rule with honest voters and n′ = 0, 10, 100, 500 adversarial votes.

Figure 25: Total variation error under Nauru rule with honest voters and n′ = 0, 10, 100, 500 adversarial votes.

Figure 26: Maximum absolute error under Nauru rule with honest voters and n′ = 0, 10, 100, 500 adversarial votes.

Figure 27: Accuracy of winner under Nauru rule with honest voters and n′ = 0, 10, 100, 500 adversarial votes.

Figure 28: Loss of winner error under Nauru rule with honest voters and n′ = 0, 10, 100, 500 adversarial votes.

Figure 29: Maximum absolute error under Borda rule with honest voters and n″ = 0, 10, 100, 500 adversarial private views.

Figure 30: Accuracy of winner under Borda rule with honest voters and n″ = 0, 10, 100, 500 adversarial private views.

Figure 31: Loss of winner error under Borda rule with honest voters and n″ = 0, 10, 100, 500 adversarial private views.

Figure 32: Total variation error under Nauru rule with honest voters and n″ = 0, 10, 100, 500 adversarial private views.

Figure 33: Maximum absolute error under Nauru rule with honest voters and n″ = 0, 10, 100, 500 adversarial private views.

Figure 34: Accuracy of winner under Nauru rule with honest voters and n″ = 0, 10, 100, 500 adversarial private views.

Figure 35: Loss of winner error under Nauru rule with honest voters and n″ = 0, 10, 100, 500 adversarial private views.