SAFELearning: Enable Backdoor Detectability In Federated Learning With Secure Aggregation
Zhuosheng Zhang, Stevens Institute of Technology, [email protected]
Jiarui Li, Stevens Institute of Technology, [email protected]
Shucheng Yu, Stevens Institute of Technology, [email protected]
Christian Makaya, [email protected]
Abstract
For model privacy, local model parameters in federated learning shall be obfuscated before being sent to the remote aggregator. This technique is referred to as secure aggregation. However, secure aggregation makes model poisoning attacks, e.g., backdoor insertion, more convenient, given that existing anomaly detection methods mostly require access to plaintext local models. This paper proposes SAFELearning, which supports backdoor detection for secure aggregation. We achieve this through two new primitives: oblivious random grouping (ORG) and partial parameter disclosure (PPD). ORG partitions participants into one-time random subgroups with group configurations oblivious to participants; PPD allows secure partial disclosure of aggregated subgroup models for anomaly detection without leaking individual model privacy. SAFELearning is able to significantly reduce backdoor model accuracy without jeopardizing main task accuracy under common backdoor strategies. Extensive experiments show that SAFELearning reduces backdoor accuracy from 100% to 8.2% for ResNet-18 over CIFAR-10 when 10% of participants are malicious.
1 Introduction

Federated learning [23] has become increasingly attractive in emerging applications [1, 2]. As compared to centralized learning (i.e., training models at the central server), federated learning allows participants (i.e., users) to locally train models with their private data sets and only transmit the trained model parameters (or gradients) to the remote server. The latter aggregates local parameters to obtain a global model and returns it to users for the next iteration. However, recent research has discovered that disclosing local models also poses threats to data privacy, either directly or under subtle attacks such as reconstruction attacks and model inversion attacks [4]. To protect local models against disclosure, nobody except the participant shall know her own local model, while the global model will be revealed to all participants. This problem is known as
secure aggregation, which can generally be realized using cryptographic primitives such as secure multi-party computation (SMC) and homomorphic encryption. However, there has yet to be a practical cryptographic tool that supports efficient training of complex networks (e.g., for deep learning tasks), though promising progress has been made toward small networks, especially for inference tasks. Differential privacy [11, 33] provides efficient alternative solutions for model privacy protection. However, it remains a challenge to maintain an appropriate trade-off between privacy and model quality (in terms of accuracy loss caused by added noise) in deep learning tasks.

Pairwise masking [17, 36] has recently attracted attention for its efficiency in secure aggregation. Specifically, assume $A$ and $B$ have respective parameters $x_a$ and $x_b$ and a shared pairwise mask $s_{a,b}$. They simply hide their parameters by uploading $y_a = x_a + s_{a,b}$ and $y_b = x_b - s_{a,b}$ (with an appropriate modulus), respectively. The shared mask is cancelled during aggregation without distorting the parameters. While the shared mask can be conveniently generated using well-known key exchange protocols and pseudo-random generators, the main problem is to deal with dropout users who go offline in the middle of the process and leave the shared mask uncancellable. To address this problem, Bonawitz et al. [9] proposed a protocol that allows online users to recover offline users' secrets through secret sharing. Without heavy cryptographic primitives, [9] supports secure (linear) aggregation for federated learning without accuracy loss as long as the number of malicious users is less than a threshold number $t$.

One outstanding problem with secure aggregation is that it can make model poisoning attacks stealthier. This is because local models are no longer revealed to the aggregator, although they are required by existing model anomaly detection techniques [15]. By uploading scaled erroneous parameters, attackers can launch model poisoning attacks through label-flipping [7] or model backdoors [5], with the aim of manipulating the global model at the attacker's will. As shown in Fig. 1, a backdoor can be conveniently inserted into the global model in [9] even if only one attacker is present. Concurrent work [29] solves the problem by introducing a verifiable secret sharing scheme that requires applying secret sharing and homomorphic encryption on users' local models, while the conventional secure aggregation protocol itself only needs secret sharing on secret keys. To our best knowledge, there is not yet a design that enables backdoor detectability on secure aggregation while preserving efficiency.
Figure 1: Backdoor/main model accuracy of [9] with one attacker.
In this paper we design a new protocol, namely SAFELearning, to address backdoor attacks in secure aggregation. We achieve this with two new primitives: oblivious random grouping (ORG) and partial parameter disclosure (PPD). ORG partitions users into one-time random subgroups at each iteration, with both group membership information and subgroup configurations (including the number of subgroups) oblivious to users (membership information is also oblivious to the aggregation server). Neither users nor the aggregation server can predict or determine which user is assigned to which subgroup before local parameters are uploaded. This property forces attackers to work independently even if they are willing to collude. By making subgroup configurations oblivious, we further thwart opportunistic attackers who independently manipulate local parameters based on statistical estimation of the distribution of attackers in subgroups. PPD supports secure partial disclosure of aggregated subgroup models with privacy leakage no more than what the global model leaks. With ORG and PPD, the aggregation server is able to randomly evaluate subgroup models for anomaly detection without additional privacy leakage (i.e., leakage beyond what the global model leaks).

As compared to Bonawitz et al., the computational complexity of SAFELearning is reduced from $O(N^2 + mN)$ (at the user side) and $O(mN^2)$ (at the server side) to $O(N + n + m)$ and $O(mnN + N)$, respectively, where $N$ is the total number of users, $n$ ($\ll N$) the number of users in a subgroup, and $m$ the size of the model. This is attributed to the hierarchical subgroup design in ORG. SAFELearning is provably secure under the simulation-based model. We conducted extensive experiments with the ResNet-18 network over the CIFAR-10 dataset under well-known backdoor strategies. Experimental results show that SAFELearning is able to reduce the backdoor accuracy of a poisoned model from 100% to 8.2% on average when 10% of users are malicious.

The main contributions of this paper can be summarized as follows: 1) we design a new secure aggregation protocol that simultaneously supports backdoor attack detection and model privacy; to our best knowledge, this work is the first that can detect model-poisoning attacks on encrypted model parameters for federated learning; 2) the proposed scheme significantly improves system scalability in terms of both computational and communication complexity as compared to state-of-the-art secure aggregation; 3) the proposed scheme is applicable to other model-poisoning attacks wherein attackers attempt to manipulate the global model via local parameter scaling.

The rest of the paper is organized as follows. Section 2 presents models and technical preliminaries. An overview of our design is given in Section 3. Section 4 elaborates our ORG primitive and secure aggregation protocol. Section 5 explains the PPD design and our backdoor detection mechanism. Section 7 presents the performance analysis and Section 8 presents the experimental evaluation of our backdoor detection mechanism. Section 9 reviews related work. We conclude the paper in Section 10.
2 Models and Technical Preliminaries

We assume two parties in a federated learning system: an aggregation server $S$ and a set of $N$ participating users $U$. The server holds a global model $X_i$ of size $m$, and each user $u \in U$ possesses a private training data set. Users train the global model shared by the server with their private training data at each iteration and upload the local model parameters to the server. The server aggregates local parameters and computes $\sum_{u \in U} x_u$, where $x_u$ (also of size $m$) is the local model parameter trained by $u$ using $X_i$ and her local data. The server returns the latest global model to each user at the end of each iteration. The server communicates with each user through a secure (private and authenticated) channel. A trusted third party authenticates each user and generates a private-public key pair $\langle s^{SK}_u, s^{PK}_u \rangle$ for her before the protocol execution. Each user $u$ is assigned a unique "logical identity" $Id_u$ in a full order. Users may drop out at any phase of the protocol, but at least $t$ users are online for any randomly selected subgroup of $n$ users at any time. In the training phase, we assume the server uses the baseline federated learning algorithm FedSGD [26] with the following update rule, where $\eta$ is the learning rate:

$$X_{i+1} = X_i + \frac{\eta}{N}\sum_{x_u \in U}(x_u - X_i)$$
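For reference, the following minimal sketch (ours; the function and variable names such as fedsgd_update are illustrative, not part of the protocol specification) implements this FedSGD-style server update:

```python
import numpy as np

def fedsgd_update(global_model, local_params, lr):
    """One FedSGD round: X_{i+1} = X_i + (eta / N) * sum_u (x_u - X_i).

    global_model : np.ndarray of shape (m,)  -- current global model X_i
    local_params : list of np.ndarray (m,)   -- local models x_u uploaded by N users
    lr           : float                     -- learning rate eta
    """
    N = len(local_params)
    update = sum(x_u - global_model for x_u in local_params)
    return global_model + (lr / N) * update

# toy usage: 3 users, model of size 4
X_i = np.zeros(4)
local_models = [X_i + 0.1 * np.random.randn(4) for _ in range(3)]
X_next = fedsgd_update(X_i, local_models, lr=0.01)
```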
We consider two types of attackers with distinct objectives: type I attackers are interested in learning the values of local parameters to compromise model privacy; type II attackers are motivated to poison the global model so that it carries semantic backdoors. The global model under a type II attack shall exhibit good accuracy on its main task but also behave in a way of the attacker's choosing on attacker-chosen backdoor inputs.

Type I attackers: Both the server and malicious users are considered potential type I attackers, but the server is assumed to be honest-but-curious. The server may collude with malicious users in order to compromise the model privacy of benign users; however, the number of users it can collude with does not exceed $rN$ for any randomly chosen subgroup of users. We assume the server is against type II attackers, since it shares the benefit of an accurate and robust global model.
Type II attackers: For type II attackers, we consider assumptions similar to [5]: the attacker has full control over one or several users, including their local training data, models, and training settings such as learning rate, batch size, and number of epochs. However, the attacker is not able to influence the behavior of benign users. The number of type II attackers (malicious or compromised users) is assumed to be much smaller than the number of benign users. Furthermore, we assume the attackers can fully cooperate with each other, including sharing their secret keys, whenever necessary. Specifically, we define the objective of the set $U_a$ of type II attackers in each iteration as replacing the aggregated global model $X_{i+1}$ with a target model $X_{target}$ (note that $X_{target}$ can be a transitional model in continuous attacks), as shown in (1). $U_h$ ($U_a$) denotes the set of benign users (attackers) in this iteration.

$$X_{i+1} = X_i + \frac{\eta}{N}\sum_{x_u \in U_h}(x_u - X_i) + \frac{\eta}{N}\sum_{x_a \in U_a}(x_a - X_i) \approx X_{target} \quad (1)$$

To achieve this objective, each attacker on average needs to construct local parameters $x_a$ based on the following:

$$x_a - X_i = \frac{N}{\eta |U_a|}(X_{target} - X_i) - \frac{1}{|U_a|}\sum_{x_u \in U_h}(x_u - X_i) \approx \frac{N}{\eta |U_a|}(X_{target} - X_i) \quad (2)$$
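To make Eq. (2) concrete, the following sketch (a simplified illustration, not the attack implementation of [5]; names such as craft_malicious_update are ours) computes the scaled update a single attacker would submit to steer the aggregate toward $X_{target}$:

```python
import numpy as np

def craft_malicious_update(X_i, X_target, N, lr, num_attackers):
    """Model-replacement update per Eq. (2): scale (X_target - X_i) by
    gamma = N / (lr * |U_a|) so that, after FedSGD averaging, the global
    model lands near X_target (benign updates are neglected here)."""
    gamma = N / (lr * num_attackers)
    return X_i + gamma * (X_target - X_i)   # the local "model" x_a the attacker uploads

# toy check: one attacker among 100 users, benign updates assumed ~0
N, lr = 100, 0.01
X_i, X_target = np.zeros(4), np.ones(4)
x_a = craft_malicious_update(X_i, X_target, N, lr, num_attackers=1)
benign = [X_i for _ in range(N - 1)]
X_next = X_i + (lr / N) * sum(x - X_i for x in benign + [x_a])
assert np.allclose(X_next, X_target)
```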
We consider the following possible strategies that the attacker can adopt to make the attack more effective and stealthy.

Sybil attacks: In order to reduce the scaling factor $\gamma = \frac{N}{\eta |U_a|}$ and make the attack stealthy, attackers tend to deploy as many adversarial participants as possible, e.g., via sybil attacks [14], to increase $|U_a|$. In this work, we assume the trusted party authenticates each user when issuing public/private key pairs, which thwarts sybil attacks.
Adaptive attacks: Strategic attackers can launch adaptive attacks [5, 6] by including the distance $Distance(X_{target}, X_i)$ between $X_{target}$ and $X_i$ (either the geometric distance or the cosine distance of gradients) in the loss function while training $X_{target}$. The purpose is to reduce the term $X_{target} - X_i$ in (2) and make the attack more imperceptible. However, this term cannot be arbitrarily optimized as long as the attackers' objective differs from the main task of the model.

Figure 2: Continuous attacks: minimum scale factor vs. number of iterations (learning rates 10e-2 and 10e-4).
Continuous Attacks:
One efficient backdoor attack is to inject erroneous parameters only in the last round before convergence. However, this "one-shot" attack usually results in large updates from the attackers due to the large disparity between the attackers' and benign users' parameters, because the parameter updates from benign users tend to be very small in the last round. To make the attack stealthier, attackers may instead perform the attack continuously over multiple iterations, expecting a relatively smaller scale factor at each iteration. However, unless the malicious model $X_{target}$ is very close to the final benign model $X_{true}$, which is very likely not the case, continuous attacks will boost benign users' updates $x_u - X_i$ because of the "mismatch" between the malicious model and benign users' training data. This in turn requires the attackers to scale up their parameters in order to dominate the global model. Our preliminary experimental results show that the minimum scale factor needed by continuous attacks is proportional to the inverse of the learning rate, as shown in Fig. 2; this observation is consistent with Eq. (2). Note that the learning rate is controlled by the server, who is against model poisoning attacks, and is not necessarily revealed to users. It is practically difficult for attackers to choose an appropriate scale factor for continuous attacks. The continuous attack in fact remains a challenging problem even in the plaintext setting of backdoor detection; existing methods [15, 29] usually require the update history of each user, which violates local model privacy.

In this paper, we consider all of the above attack strategies and their combinations, except for sybil attacks. Following existing research [9], we assume the attackers only account for a small portion of the entire population of users (i.e., $|U_a| \ll N$).

Recently, pairwise additive masking [9, 17, 36] has been utilized as an efficient cryptographic primitive for secure aggregation in federated learning, even for complex deep networks. As this paper utilizes pairwise masking for secure aggregation, we provide an overview of the recent secure aggregation scheme by Bonawitz et al. [9] as follows.

Let $x_u$ denote the $m$-dimensional vector of parameters that user $u \in U$ generates locally, where $U$ is the set of all users. Assume a total order of users and that each user $u$ is assigned a private-public key pair $(s^{SK}_u, s^{PK}_u)$. Each pair of users $(u, v)$, $u < v$, can agree on a random common seed $s_{u,v}$ using Diffie-Hellman key agreement [13]. With the seed, a common mask vector $PRG(s_{u,v})$ can be computed by $u$ and $v$ using a pseudo-random generator ($PRG$), e.g., a hash function. When $u$ obfuscates her parameter vector $x_u$ by adding the mask vector and $v$ by subtracting it, the mask vector is cancelled when the server aggregates the obfuscated parameter vectors, without revealing their actual values. Specifically, each user $u$ obfuscates her parameters $x_u$ as follows:

$$y_u = x_u + \sum_{\forall v \in U: u < v} PRG(s_{u,v}) - \sum_{\forall v \in U: u > v} PRG(s_{v,u}) \pmod R$$

and sends $y_u$ to the server. Then the server computes:

$$z = \sum_{u \in U}\Big(x_u + \sum_{v \in U: u < v} PRG(s_{u,v}) - \sum_{v \in U: u > v} PRG(s_{v,u})\Big) = \sum_{u \in U} x_u \pmod R$$
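The following sketch (ours, for illustration; a real implementation would derive $s_{u,v}$ via Diffie-Hellman and use a cryptographic PRG) shows how the pairwise masks cancel during aggregation:

```python
import hashlib
import numpy as np

R = 2**32          # modulus for masked parameters
M = 4              # model size (toy)

def prg(seed: int, m: int = M) -> np.ndarray:
    """Toy PRG: expand a shared seed into an m-dimensional mask vector."""
    digest = hashlib.sha256(seed.to_bytes(8, "big")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return rng.integers(0, R, size=m, dtype=np.uint64)

def mask(u, x_u, users, seeds):
    """y_u = x_u + sum_{v>u} PRG(s_{u,v}) - sum_{v<u} PRG(s_{v,u})  (mod R)."""
    y = x_u.astype(np.uint64).copy()
    for v in users:
        if v == u:
            continue
        s = seeds[(min(u, v), max(u, v))]
        y = (y + prg(s)) % R if u < v else (y - prg(s)) % R
    return y

users = [1, 2, 3]
seeds = {(1, 2): 11, (1, 3): 12, (2, 3): 13}                 # pairwise DH seeds (toy)
xs = {u: np.array([u, u, u, u], dtype=np.uint64) for u in users}
z = sum(mask(u, xs[u], users, seeds) for u in users) % R      # server-side aggregation
assert np.array_equal(z, sum(xs.values()) % R)                # masks cancel: z = sum_u x_u
```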
To address user dropouts, each user $u$ creates $N$ shares of her secret $s^{SK}_u$ using Shamir's $t$-out-of-$N$ secret sharing scheme and sends the shares to the other users. Additionally, each user $u$ generates another random seed $b_u$, mainly to prevent the aggregation server from learning her parameter vector in case she is delayed and her secret has already been recovered by other users before she comes back online and sends out $y_u$. Random shares of $b_u$ are also generated and sent to other users. Each user $u$ obfuscates the parameter vector $x_u$ with a mask $PRG(b_u)$ in addition to the pairwise mask vectors:

$$y_u = x_u + PRG(b_u) + \sum_{\forall v \in U: u < v} PRG(s_{u,v}) - \sum_{\forall v \in U: u > v} PRG(s_{v,u}) \pmod R$$

In the unmask round, for each dropped user $v$, online users reveal the shares of $s^{SK}_v$; for each online user $u$, the other online users reveal the shares of $b_u$. The server is then able to compute $PRG(s_{v,u})$ and $PRG(b_u)$ for any online user $u$ and cancel them out from the sum $z$ to obtain the aggregate model of the online users. Note that an honest user $u$ never reveals shares of either $s^{SK}_v$ or $b_v$ for any user $v$ before the unmask round.

This work saliently and efficiently protects the confidentiality of local parameters while accounting for user dropouts in practical distributed systems. However, it makes model poisoning attacks (e.g., backdoor attacks) convenient. As pointed out in recent work by Bagdasaryan et al. [5], even a single malicious user is able to manipulate the global model through a model replacement attack. This is possible because secure aggregation fully encrypts the users' local models, which allows the attacker to submit arbitrary erroneous parameters. As the aggregation server does not necessarily have access to validation datasets, such attacks are difficult to detect by simple model validation.

3 Overview

In SAFELearning we organize users into subgroups with a hierarchical k-ary tree structure, as shown in Fig. 3. At the leaves are equal-sized subgroups of $n$ users. At the aggregation server, the models of each subgroup are first aggregated; the aggregated models of the subgroups are further aggregated at the next level of subgroups; the process repeats recursively toward the root of the tree. It is trivial to show that the aggregated global model remains the same as in existing federated learning algorithms. With the tree structure, users in the same subgroup pairwise "mask" each other during the secure aggregation process. To protect the privacy of the aggregated model of each subgroup, a pairwise mask is also generated for each subgroup at the internal layers of the tree, as shown by the dashed lines in Fig. 3. Similarly, user secrets (i.e., $s^{SK}_u$ and $b_u$ for user $u$, as discussed in Section 2.4) can be securely shared within subgroups. Intuitively, secure aggregation with the tree structure provides a similar level of protection for users' model privacy as [9], and user dropouts can be handled similarly as well. As secret sharing is performed within subgroups, we directly enjoy the benefit of reduced complexity, i.e., from $O(N)$ in [9] to $O(n)$, because of the hierarchical group design.
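A minimal sketch of this hierarchical aggregation (illustrative only; the subgroup sizes and tree degree below are assumptions):

```python
from typing import List
import numpy as np

def aggregate_tree(leaf_subgroups: List[List[np.ndarray]], k: int = 3) -> np.ndarray:
    """Aggregate (masked) local models along a k-ary tree.

    leaf_subgroups : list of subgroups, each a list of local model vectors.
    Returns the root aggregate, which equals the plain sum of all local models
    once intra-group masks cancel at the leaves and inter-group masks cancel
    at the internal layers.
    """
    # layer 1: subgroup aggregated models
    layer = [sum(models) for models in leaf_subgroups]
    # internal layers: repeatedly merge k siblings until only the root remains
    while len(layer) > 1:
        layer = [sum(layer[i:i + k]) for i in range(0, len(layer), k)]
    return layer[0]

# toy usage: 9 users in 3 subgroups of 3, model of size 4
users = [np.full(4, float(i)) for i in range(9)]
groups = [users[0:3], users[3:6], users[6:9]]
assert np.allclose(aggregate_tree(groups), sum(users))
```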
However, strategic type I attackers can compromise local model privacy by deploying an overwhelming number of malicious users in a target subgroup. This is possible when $n \ll N$ and $|U_a| \not\ll n$. To prevent such attacks, we shall not allow either the server or any user to determine which users belong to which subgroups. Specifically, the assignment of users to subgroups shall be randomized so that nobody in the system can assign herself or others to a target subgroup with a non-negligibly higher probability than under random assignment.

The randomized subgroup assignment also provides the opportunity to detect type II attackers, whose purpose is to manipulate the aggregated global model, e.g., to insert backdoors. Specifically, model poisoning attackers need to amplify their local parameters dramatically in order to influence the global model, as discussed in Section 2.3, whether in "one-shot" or continuous attacks. If each attacker were to work independently, the magnitudes of the aggregated models at the subgroups would differ significantly from each other unless each subgroup had exactly the same number of attackers. Due to the randomness of the tree-based subgroup assignment, it is difficult for the attackers to maintain an exact number of attackers within each subgroup. The aggregation server can also frequently change subgroup configurations to make it difficult to maintain an even distribution of attackers across subgroups. Considering collaborative attackers, however, such randomness alone is not enough for backdoor attack detection (or model poisoning detection in general), because collaborative attackers can intentionally adjust the scales of their local parameters to make the distribution of aggregated subgroup models uniform, unless the subgroups (at the leaf layer) outnumber the attackers. To defeat such collaboration, we need to make the attackers oblivious to each other's subgroup membership information. This means that an attacker shall not know which subgroup a given user or attacker belongs to. We call such subgroup assignment oblivious random grouping (ORG).

Figure 3: The random tree structure (blue and green dotted lines indicate subgroups at levels 2 and 3, respectively).
With the tree-based ORG, we can detect model poisoning attacks by evaluating the aggregated models of the subgroups (i.e., those at the leaf layer of the tree). Specifically, subgroups with more attackers will have much higher magnitudes in their aggregated models because of the scale-up of the attackers' parameters, unless each subgroup has the exact same number of attackers, the chance of which is very low in large-scale systems. However, directly revealing the aggregated models of subgroups may also lead to some privacy leakage, depending on the number of users in the subgroup. To address this issue, SAFELearning partially reveals some higher bits of the subgroup aggregated models, as long as the privacy leakage is no more than what is disclosed by the global model, which is public to all users. Such partial parameter disclosure (PPD) allows us to compare subgroup aggregated models and even to conduct some statistical analysis while preserving model privacy.

Fig. 4 gives the high-level workflow of our protocol: (a) first, the users and the server work together to generate the random tree with our tree generation sub-protocol; at the end of this step, the tree structure and the full orders for secret sharing and pairwise masking are determined; (b) next, each user shares his secret keys $s^{SK}_u$ and $b_u$ with the users in the same subgroup, obliviously masks his input $x_u$ according to the tree structure (without knowing the subgroup membership information), and sends the encrypted input to the server. After collecting the inputs and secret shares from the users, (c) the server compares the partial information of the aggregated model from each subgroup to detect abnormal subgroup(s). It computes the global model and returns it to users if no model poisoning attack is detected. ORG is implemented through steps (a) and (b), and PPD is realized in steps (b) and (c). The next two sections elaborate our design of tree-based secure aggregation and poisoning attack detection.
Figure 4: (a) Users and the server together generate the random full order and the tree structure. (b) Users share their secret keys with the other assigned users and send their masked local models to the server. (c) The server inspects the subgroup aggregated models for anomaly detection; meanwhile, surviving users reconstruct the secret keys of dropped users so the server can compute the global aggregated model.
4 Tree-Based Secure Aggregation

We define two independent random tree structures, $T_{share}$ and $T_{masking}$, both of which share the structure of Fig. 3 but with independent subgroup membership assignments. $T_{share}$ is used for secret sharing of the secret keys $s^{SK}_u$ and $b_u$, and $T_{masking}$ for pairwise masking of model parameters. The two trees are decoupled because of the different security requirements of secret sharing and pairwise masking. For secret sharing, tree $T_{share}$ divides users into small subgroups (the rounded rectangles in Fig. 3); secret sharing is performed among users inside the same subgroup at the leaf layer of the tree.

Secure aggregation is performed as in Section 2.4, but with pairwise masks within subgroups. Each user applies two types of masking to her parameters: intra-group masking and inter-group masking. Intra-group masking protects individual local models using pairwise masks between users within the same subgroup at the leaf level of the tree; the aggregated model obtained at this level is called the subgroup aggregated model. Inter-group masking protects the subgroup aggregated models, and these pairwise masks are generated between peers at higher, non-leaf layers of the tree. As shown in Fig. 3, we logically form groups (denoted $G_{ij}$, where $i$ is the layer number and $j$ indicates the $j$-th child node of its parent) over $k$ subgroups at each non-leaf layer of the tree. This process repeats recursively toward the root. Specifically, for each user $u$, a pairwise masking peer $v$ should comply with the following rules:

• The ancestors of $u$ and $v$ immediately under their least common ancestor (LCA) shall be within a $\kappa$-immediate neighborhood based on the ancestors' total order at that layer, where $\kappa$ is a system parameter.

• The positions of $u$ and $v$ shall be the same by their total orders in their respective sub-trees, and so shall the positions of their ancestors below the immediate children of their LCA.

• Only two peers are needed at each layer for each node.

Based on these rules, when $\kappa = 1$, user 2 of subgroup $G_{11}$ in Fig. 3 has the following peers: users 1 and 3 in the same subgroup; the user at position 2 in the corresponding subgroups of the neighboring (mod $k$) groups at layer 1; the user at position 2 in the corresponding subgroups of the neighboring (mod $k$) groups at layer 2; and so on. This is illustrated by the dotted lines in Fig. 3. Our inter-group masking therefore has the following properties: first, the number of pairwise masking operations for each user is $2\log\frac{N}{n}$, twice the tree height; second, those pairwise masks cannot be cancelled until the server aggregates all the subgroups' aggregated models at that layer of the tree. Let $G^s_u$ denote the intra-group masking peers of user $u$ (i.e., pairwise peers in the same subgroup as $u$) and $G^p_u$ her inter-group masking peers. The masking equation for user $u$ can be written as

$$y_u = x_u + PRG(b_u) + \sum_{\forall v \in \{G^s_u, G^p_u\}: u < v} PRG(s_{u,v}) - \sum_{\forall v \in \{G^s_u, G^p_u\}: u > v} PRG(s_{v,u}) \pmod R \quad (3)$$

The server is able to recover a dropped user $u$'s pairwise masks of either type through secure recovery of her private key $s^{SK}_u$, as long as no fewer than $t$ honest users in the dropped user's secret-sharing subgroup survive.

For secure ORG, the first requirement is the randomness of the subgroup membership assignment. As users are grouped in the total order of their identities, the randomness can be assured if user identities are randomly generated, i.e., they are random and not solely determined by either the user herself or the server.
(The procedure below applies to both $T_{share}$ and $T_{masking}$; we take $T_{share}$ as the example in the description.) Specifically, the identity $Id_u$ of user $u$ is generated as follows:

$$Id_u = HASH(R_s \,\|\, c^{PK}_u \,\|\, R_u) \quad (4)$$

In Eq. (4), $c^{PK}_u$ (or $s^{PK}_u$ in the case of the pairwise masking tree $T_{masking}$) is the public key used in Diffie-Hellman key agreement and is used only once per iteration. The random numbers $R_s$ and $R_u$ are generated by the server and user $u$, respectively. Because of the randomness of the hash function, $Id_u$ is randomly distributed and not predictable by either the users or the server.

For random subgroup assignment, however, the order in which the tree structure $T$ (generated by the server) and $Id_u$ (jointly produced by the server and user $u$) are disclosed is important. In particular, if $Id_u$ is disclosed before $T$, the server might intentionally group certain users into a subgroup by adjusting the tree structure. On the other hand, if $T$ is disclosed before $Id_u$, malicious users could attempt to group themselves together by manipulating their identities (e.g., by searching for special hash results). Both could lead to model privacy disclosure of victim users. To defeat such potential attacks, we design a commitment protocol, shown in Fig. 5, with which $T$ and $Id_u$ are committed before disclosure. Specifically, the server first broadcasts its commitment to a random number $R_s$. Users then send their public keys $c^{PK}_u$ (or $s^{PK}_u$ for the pairwise masking tree) and commitments to $R_u$ to the server. After collecting enough users, the server decides the tree structure (degree and layers) $T$ based on the number of users $N$. At this point, $Id_u$ has been uniquely determined but not yet disclosed; neither the users nor the server can compute or predict it, and the tree structure is determined by the server independently of user identities. The server then broadcasts the commitment to $T$, the value $R_s$, and the list of commitments to the $R_u$ values. On receiving the broadcast message, each user discloses $R_u$ to the server, allowing the server to compute the user's $Id_u$ and make the subgroup assignment. Users can verify the correctness of the protocol by requesting the server to broadcast $T$ and the lists of $c^{PK}_u$ (or $s^{PK}_u$) and $R_u$ values after local parameters have been sent.
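The following sketch (ours; the commitment is a plain hash commitment and names such as commit and derive_identity are illustrative) walks through the commit-then-reveal ordering described above:

```python
import hashlib
import secrets

def commit(value: bytes) -> bytes:
    """Simple hash commitment; a real deployment would add a random opening nonce."""
    return hashlib.sha256(value).digest()

def derive_identity(R_s: bytes, c_pk: bytes, R_u: bytes) -> bytes:
    """Id_u = HASH(R_s || c_PK_u || R_u), Eq. (4)."""
    return hashlib.sha256(R_s + c_pk + R_u).digest()

# (1) the server commits to R_s before seeing any user value
R_s = secrets.token_bytes(32)
server_commitment = commit(R_s)

# (2) each user sends her one-time public key and a commitment to R_u
R_u = secrets.token_bytes(32)
c_pk = secrets.token_bytes(32)            # stand-in for the one-time DH public key
user_commitment = commit(R_u)

# (3) the server fixes the tree structure T from N only, then reveals R_s and the
#     commitment list; (4) users reveal R_u, and only now can anyone compute Id_u
Id_u = derive_identity(R_s, c_pk, R_u)
assert commit(R_u) == user_commitment     # the server checks the user opened honestly
```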
Figure 5: Tree structure generation protocol.

The protocol above accounts for a misbehaving server and misbehaving users. However, a risk remains when the server colludes with malicious users. For example, they can pre-compute $Id_u$ by exhaustively testing different $R_s$ and $R_u$ values to obtain special identities, e.g., those with leading zero(s), as in blockchain proof-of-work. Since benign users are unlikely to have such special identities due to the randomness of the hash function, the malicious users would be assigned to the same subgroup and dominate that group. To defeat such attacks, instead of using $Id_u$ directly, we generate the final identity of user $u$ as

$$HASH\Big(\bigoplus_{\forall v \in G:\, v \neq u} Id_v\Big)$$

where $\bigoplus$ denotes XOR. In this way, the randomness is determined by everyone except the user herself. With a random, trustworthy full order (i.e., identities) for all users and an independently generated tree structure, we achieve random subgroup assignment as shown in Fig. 3.

As discussed, users shall be oblivious to subgroup membership information for secure ORG. Otherwise, malicious users can coordinate and manipulate the distribution of the subgroup aggregated model parameters to bypass anomaly detection. In particular, if the attackers (malicious users) outnumber the subgroups (at the leaf layer), they can coordinate and strategically adjust local parameters to make the distribution of subgroup aggregated models uniform. If the subgroups outnumber the attackers, however, the distribution is doomed to be imbalanced unless the attackers give up the attack. Without coordination, the chance that each subgroup contains exactly the same number of attackers is very low; for example, if there are $x$ attackers and $x$ subgroups, the probability that each subgroup has exactly one attacker is $\frac{x!}{x^x}$.

However, if users are oblivious to subgroup membership, they cannot identify their peers and generate pairwise masks to encrypt local parameters. To solve this problem, we let the server (who is against model poisoning attacks for its own benefit) directly send each user the list of public keys of all the users who are her pairwise masking peers. However, directly sending the original public keys may allow malicious users to recognize each other and learn their group membership information. To address this problem, we design a randomized D-H key exchange protocol wherein the server randomizes each user's public key before sending it out. With this randomized public key, two malicious users cannot tell whether they belong to the same subgroup unless they are pairwise peers. Specifically, our construction is as follows.
Randomized D-H key exchange. Assume the server is to coordinate the exchange of public keys between users $u$ and $v$, with respective public keys $s^{PK}_u = g^{s^{SK}_u}$ and $s^{PK}_v = g^{s^{SK}_v}$. To prevent them from recognizing each other's public key, the server "randomizes" the public keys before sending them out. Specifically, it first produces a random number $r_{u,v}$, then sends the randomized public key $s^{(PK,v)}_u = (s^{PK}_u)^{r_{u,v}}$ to user $v$ and $s^{(PK,u)}_v = (s^{PK}_v)^{r_{u,v}}$ to user $u$. After a D-H style key exchange, the shared key $s_{u,v}$ takes the following form:

$$s_{u,v} = \big(s^{(PK,v)}_u\big)^{s^{SK}_v} = \big(s^{(PK,u)}_v\big)^{s^{SK}_u} = g^{s^{SK}_u \cdot s^{SK}_v \cdot r_{u,v}}$$

Note that pairwise peers are still able to verify that they are in the same subgroup by comparing the shared keys they compute. The purpose of the randomized D-H key exchange, however, is to prevent users who are in the same subgroup but are not peers from learning that fact. This is achieved because of the unique random number $r_{u,v}$ for each pair of peers. Without the randomization, two attackers would easily learn that they are in the same group if they received common public keys of benign user(s).
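A toy sketch of this randomization (ours; it uses a small Mersenne prime as the modulus purely for illustration, far below cryptographic size, and the names are not from the paper):

```python
import secrets

p = 2**127 - 1    # a Mersenne prime; toy-sized modulus, illustrative only
g = 3

def randomized_dh_demo():
    sk_u, sk_v = secrets.randbelow(p - 2) + 1, secrets.randbelow(p - 2) + 1
    pk_u, pk_v = pow(g, sk_u, p), pow(g, sk_v, p)

    # the server blinds both public keys with the same per-pair randomizer r_{u,v}
    r_uv = secrets.randbelow(p - 2) + 1
    blinded_pk_u = pow(pk_u, r_uv, p)    # sent to v
    blinded_pk_v = pow(pk_v, r_uv, p)    # sent to u

    # both peers derive the same seed s_{u,v} = g^(sk_u * sk_v * r_uv)
    seed_at_u = pow(blinded_pk_v, sk_u, p)
    seed_at_v = pow(blinded_pk_u, sk_v, p)
    assert seed_at_u == seed_at_v
    return seed_at_u

randomized_dh_demo()
```

Because each pair gets a fresh $r_{u,v}$, two non-peer attackers in the same subgroup never see a common (blinded) public key and thus cannot link themselves to the same group.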
Figure 6: Undirected graph representation of users inside a subgroup. White (red) nodes represent benign users (attackers); edges indicate pairwise masking relationships. (a) Complete graph topology; (b) circular topology.

Circular topology for pairwise masking relationship
Consider the users inside a subgroup as the vertices of an undirected graph whose edges represent the pairwise masking peer relationship, as shown in Fig. 6. If users choose all the other users in the subgroup as peers, the graph is complete; in this case attackers will know each other's membership information because each pair of peers shares the same seed $s_{u,v}$. To avoid this situation, we use a circular graph as shown in Fig. 6 (b), in which each user $u$ pairs only with $2\kappa$ users: her $\kappa$ immediate previous and next neighbors based on the total order of $Id_u$. For a successful attack, the attackers need to know the membership information of all of these peers, which is possible only when the attackers inside the same subgroup form a chain. Assuming a total of $k$ attackers in a subgroup, the chance of this is at most $\frac{k^{\kappa}}{n^{k-1}}$, where $k$ is the number of attackers in the same subgroup and $n$ is the size of the subgroup.

5 Backdoor Attack Detection
In the machine learning community, one approach to detecting model poisoning attacks (e.g., backdoor attacks) is to statistically analyze the magnitude of the model parameter vectors [8, 32]. As discussed in Section 2.3, attackers need to amplify their model parameters by a factor of $\gamma = \frac{N}{\eta |U_a|}$ on average to successfully launch model poisoning attacks. With random subgroup assignment (Section 4.2) and oblivious secure aggregation (Section 4.3), users are partitioned into random subgroups at each iteration and perform pairwise masking without knowing others' group information. Moreover, the random tree structure (including the subgroup configuration) is not revealed before the local parameters of all users have been uploaded (Section 4.2). Attackers therefore have to work independently, and it is difficult for any of them to control or even predict the distribution of subgroup aggregated models. This provides the opportunity to detect backdoor attacks by comparing the subgroup aggregated models.

In our secure aggregation scheme, local parameters are masked by both intra-group and inter-group masks. After aggregation at the leaf-layer subgroups, the subgroup aggregated models are protected only by inter-group masks. By changing the inter-group masks from full-bit masks to partial masks, i.e., by revealing a few higher bits, we can observe partial information about the subgroup aggregated models and perform anomaly detection. Because attackers need to significantly scale their parameters, non-zero values may appear in the higher bits of the subgroups that contain attackers. However, disclosing these higher bits is acceptable only if it does not lead to extra privacy leakage compared to what the global model discloses. To this end, we first derive the number of bits that can be disclosed by the partial parameter disclosure (PPD) mechanism and then present our backdoor detection algorithm.

Bit format of model parameters and masking:
Assume each element of $x_u$ is in the range $[0, R_U]$ and can be stored in fixed-point format (previous research [19] found that 14-bit fixed-point data incurs only a 0.05% accuracy drop compared to 32-bit floating point on MNIST and CIFAR-10), and that the high-bit segment $H(x_u)$ covering the range $[R_H, R_U]$ can be disclosed. The extracted high bits of the aggregated model of subgroup $G_i$ are $\{\sum_{x_u \in G_i} H(x_u)\}_{i \in \mathbb{N},\, i \le N/n}$, where $G_i$ is the $i$-th subgroup at the leaf layer (i.e., layer 1 in Fig. 3). To support disclosure of the higher bits, we adjust the pairwise masking equation for user $u$ as follows:

$$y_u = x_u + PRG(b_u) + \sum_{\forall v \in G^s_u: u < v} PRG(s_{u,v}) - \sum_{\forall v \in G^s_u: u > v} PRG(s_{v,u}) + \sum_{\forall v \in G^p_u: u < v} \big(\Lambda_{R_H} \wedge PRG(s_{u,v})\big) - \sum_{\forall v \in G^p_u: u > v} \big(\Lambda_{R_H} \wedge PRG(s_{v,u})\big) \pmod R \quad (5)$$

where $\Lambda_{R_H}$ is a vector of binary masks of length $m$ and $G^p_u$ is the set of inter-group peers. The bits of $\Lambda_{R_H}$ at positions in $[\log R_H, \log R_U]$ are set to 0 and the remaining bits are 1. In this way, the higher bits starting from $\log R_H$ are not masked.
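A small sketch of this partial masking (ours; 32-bit words and the split point $\log R_H$ are chosen arbitrarily for illustration):

```python
WORD_BITS = 32
LOG_RH = 24                                   # illustrative split point log(R_H)
LAMBDA_RH = (1 << LOG_RH) - 1                 # 0-bits at positions >= LOG_RH, 1-bits below
MOD = 1 << WORD_BITS

def inter_group_mask(prg_output: int) -> int:
    """Lambda_RH AND PRG(s): keep only the low bits of an inter-group mask."""
    return prg_output & LAMBDA_RH

def high_bits(value: int) -> int:
    """H(.): the disclosed high-bit segment above LOG_RH."""
    return (value % MOD) >> LOG_RH

# toy demo with one pair of peer subgroups: the low-bit mask hides nothing above
# LOG_RH and cancels when the two peers are aggregated at the parent layer.
sum_g1, sum_g2 = 37 << LOG_RH, 21 << LOG_RH   # pretend subgroup aggregates (low bits zero)
mask = inter_group_mask(0x00ABCDEF)
y_g1, y_g2 = (sum_g1 + mask) % MOD, (sum_g2 - mask) % MOD
assert high_bits(y_g1) == 37                                  # subgroup high bits readable
assert (y_g1 + y_g2) % MOD == (sum_g1 + sum_g2) % MOD         # masks cancel at the parent
```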
To estimate the privacy disclosure from revealing the higher bits of the subgroup aggregated models, we analyze the posterior probability $P(|X_i - y| < \varepsilon \mid Y)$, where $X_i \in \mathbb{R}^m$ is a local parameter vector, $y$ is the aggregation result, and $Y$ is its range. We compare the probability bound when aggregating all $N$ users but revealing all bits (i.e., the disclosure made by the global model) with the bound when aggregating $n$ users but revealing only the bits in $[R_H, R_U]$ (i.e., disclosing the higher $\log R_U - \log R_H$ bits of the subgroup aggregated models). We prove that when $R_H = \big(1 - \sqrt{\frac{n-1}{n}}\big)\varepsilon$, the expected bounds are the same in the two situations.

Theorem 1. Let $\{X_i\}_{i \in [1,N]}$ be $N$ samples from an arbitrary distribution with mean vector $E(X)$ and variance $\sigma_N^2$. The probability bound is

$$P\big(\|X_i - y\| \ge \varepsilon \mid y = E(X)\big) \le \frac{\sigma_N^2}{\varepsilon^2}, \quad \varepsilon > 0.$$
Proof. This theorem follows directly from the Chebyshev inequality.

Theorem 1 shows that if the server aggregates only at the root node, which is equivalent to $Y = y = E(X)$, the difference between a local vector $X_i$ and the aggregate result $y$ is bounded by the variance of $X$.
Theorem 2. Let $\{X'_i\}_{i \in [1,n]}$ be $n$ samples randomly selected from $\{X_i\}_{i \in [1,N]}$ in Theorem 1, and let $Y = y \in [E(X') - R_H/2,\, E(X') + R_H/2]$, where $E(X')$ is the mean vector of $\{X'_i\}_{i \in [1,n]}$. If $R_H = \big(1 - \sqrt{\frac{n-1}{n}}\big)\varepsilon$, then the expectation of the bound of $P(|X_i - y| < \varepsilon \mid Y)$ is $\frac{\sigma_N^2}{\varepsilon^2}$.

Proof. Please see Appendix A for the detailed proof.

Theorem 2 indicates that the information leakage can be reduced by increasing the subgroup size $n$ or by decreasing the number of bits revealed (i.e., increasing $R_H$). The server can therefore adjust the parameters $n$ and $R_H$ to minimize the privacy disclosure risk. In particular, when $R_H = \big(1 - \sqrt{\frac{n-1}{n}}\big)\varepsilon$, the privacy leaked by disclosing the higher $\log R_U - \log R_H$ bits is the same as what is disclosed by the global model.
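For intuition, the snippet below (ours; the values of $\varepsilon$ and $R_U$ are arbitrary illustrative choices) computes how many high bits Theorem 2 allows the server to reveal for a given subgroup size:

```python
import math

def disclosable_high_bits(n: int, eps: float, R_U: float) -> float:
    """Number of high bits, log2(R_U) - log2(R_H), that can be revealed when
    R_H = (1 - sqrt((n - 1) / n)) * eps, per Theorem 2."""
    R_H = (1.0 - math.sqrt((n - 1) / n)) * eps
    return math.log2(R_U) - math.log2(R_H)

# larger subgroups shrink R_H, so more high bits can be disclosed safely
for n in (8, 32, 128):
    print(n, round(disclosable_high_bits(n, eps=0.1, R_U=1.0), 1))
```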
5.2 Backdoor Attack Detection

With the partially revealed subgroup aggregated parameters, we design a backdoor attack detection algorithm, shown as Algorithm 1. In the algorithm, $Out(d, D)$ is a function that checks whether a value $d$ is an outlier of the set $D$; it can be implemented in multiple ways. In this paper we use the following abnormal factor $AF(d, D)$ as the decision condition for $Out(d, D)$:

$$AF(d, D) = \frac{\big(d - Mean(D \setminus \{d\})\big)\,(D_{max} - D_{min})}{Std(D \setminus \{d\})} \quad (7)$$

where the term $D_{max} - D_{min}$ is included to avoid false positives during the later stage of training: when training is about to converge, the standard deviation of $D$ is very small because every user's local model is very close to the others'. We use the following outlier function for anomaly detection, where $\tau$ denotes a fixed threshold:

$$Out(d, D) = \begin{cases} \text{TRUE} & AF(d, D) \ge \tau \\ \text{FALSE} & AF(d, D) < \tau \end{cases}$$
Algorithm 1: Suspicious subgroup detection
Data: $X^L$, the set of (partially revealed) aggregated parameter vectors of the subgroup set $G^L$ at layer $L$; $X_i$, the global model of the current round; $Eucl(\cdot,\cdot)$, the Euclidean distance between two vectors over the revealed bits; $Out(d, D)$, a function that checks whether $d$ is an outlier in the set $D$; $M = \emptyset$; $D = \emptyset$.
Result: attacker-inclusion set $M$.

for each vector $x_i \in X^L$ do
    // compute the distance to the global model
    $d_i = Eucl(H(x_i), H(X_i))$;
    $D = D \cup \{d_i\}$;
end
do
    // repeat the outlier function until no new outlier is found
    $D = D - \{d_i\}_{i \in M}$;
    $M' = \emptyset$;
    for each $d_i \in D$ do
        if $Out(d_i, D)$ returns TRUE, add $i$ to $M'$;
    end
    $M = M \cup M'$;
while $M' \neq \emptyset$;

After having detected subgroup(s) with attackers, there are a few possible reactions the server can adopt: (1) reject this round of aggregation, which may indirectly warn the attacker and allow the attacker to adjust the attack parameters; or (2) drop or replace the aggregation result contributed by the malicious subgroup(s). In this work, we replace the higher bits of a malicious subgroup with the higher bits of the global model $X_i$.
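A compact Python rendering of this detection loop (ours; the threshold is a free parameter, since its value is not fixed here, and the helper names are not from the paper):

```python
import numpy as np

def abnormal_factor(d, D):
    """AF(d, D) = (d - mean(D \\ {d})) * (max(D) - min(D)) / std(D \\ {d}), Eq. (7)."""
    rest = np.array([x for x in D if x is not d])
    return (d - rest.mean()) * (max(D) - min(D)) / (rest.std() + 1e-12)

def detect_suspicious_subgroups(subgroup_models, global_model, threshold):
    """Algorithm 1 sketch: flag subgroups whose (partially revealed) aggregates are
    outliers in Euclidean distance from the global model, repeating until stable."""
    dists = [float(np.linalg.norm(x - global_model)) for x in subgroup_models]
    flagged = set()
    while True:
        remaining = [d for i, d in enumerate(dists) if i not in flagged]
        new = {i for i, d in enumerate(dists)
               if i not in flagged and abnormal_factor(d, remaining) >= threshold}
        if not new:
            return flagged
        flagged |= new
```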
6 Security Analysis

We show the security of our protocol with the following theorems, where $t \ge |U_a|$ denotes the threshold number of attackers, $C \subseteq U \cup \{S\}$ an arbitrary subset of parties, $x_*$ the input of a party or set of parties $*$, $U$ the set of all users, $S$ the server, and $k$ the security parameter.

Theorem 3 (Local Model Privacy under Type I Attackers). There exists a PPT simulator SIM such that for all $t$, $U$, $x_U$ and $C \subseteq U \cup \{S\}$ with $|C \setminus \{S\}| \le t$, the output of SIM is computationally indistinguishable from the output of $\text{Real}^{U,t,k}_{C}(x_U, U)$:

$$\text{Real}^{U,t,k}_{C}(x_U, U) \approx \text{SIM}^{U,t,k}_{C}(x_C, z, U), \quad \text{where } z = \begin{cases} \sum_{u \in U \setminus C} x_u & \text{if } |U| \ge t \\ \perp & \text{otherwise} \end{cases}$$

Proof. Due to space limits, the detailed security proof is presented in Appendix B.
Theorem 4 (Random Tree Structure Secrecy). There exists a PPT simulator SIM such that for all $t$, $U$, $x_U$ and $C \subseteq U \cup \{S\}$ with $|C \setminus \{S\}| \le t$, the output of SIM is indistinguishable from the output of the real protocol:

$$\text{Real}^{U,t,k}_{C}(R_u, R_S, c^{PK}_u) \approx \text{SIM}^{U,t,k}_{C}(R_u, R_S, c^{PK}_u)$$

Proof. We prove this by a standard hybrid argument. The detailed proof is in Appendix B.
Theorem 5 (Indistinguishability of Type II Attackers in the Same Subgroup). For any type II attacker A in a certain subgroup, if there is another type II attacker B in the same subgroup who is not peered with A for pairwise masking, the joint view of A and B, $View_{A,B}$, is indistinguishable from $View_A$, the view of A alone:

$$View_A \approx View_{A,B}$$

Proof. The detailed proof is in Appendix C.
7 Performance Analysis

As shown in Table 1, the overall cost of each user in our protocol is proportional to the number of users $n$ in a subgroup rather than the total number of users $N$. In the following analysis, we use $h$ and $d$ to denote the height and degree of the tree structure, respectively.

Table 1: Overall computational and communication costs of the tree-based secure aggregation protocol.
                  User            Server
Computation       $O(N + n + m)$  $O(mN + N)$
Communication     $O(N + m)$      $O(N + mN)$

User's computational cost: $O(N + n + m)$. Each user $u$'s computational cost includes: (1) performing $n + (\kappa + \log\frac{N}{n})$ key agreements, which is $O(n)$; (2) creating $t$-out-of-$n$ Shamir secret shares of $s^{SK}_u$ and $b_u$, which is $O(n)$; (3) verifying the random full-order generation, which is $O(N)$; and (4) performing intra-group and inter-group masking on the data vector $x_u$, which is $O(m(\kappa + h)) = O(m)$. The overall computational cost for each user is $O(N + n + m)$.

User's communication cost: $O(N + m)$. Each user $u$'s communication cost includes: (1) exchanging keys with all other users in the verification phase, which is $O(N)$; (2) transmitting constant-size parameters such as $R_u$, which is $O(1)$; (3) sending and receiving $2(n - 1)$ encrypted secret shares of constant size, which is $O(n)$; (4) sending a masked data vector of size $m$ to the server, which is $O(m)$; and (5) sending the server $n$ secret shares, which is $O(n)$. The overall communication cost for each user is $O(N + m)$.

Server's computational cost: $O(mN + N)$. The server's computational cost includes: (1) reconstructing $N$ $t$-out-of-$n$ Shamir secrets, which takes a total time of $O(Nn)$ (note that for users in the same subgroup, the Lagrange basis polynomials for reconstructing Shamir's secrets remain the same); (2) random full-order generation, which is $O(N)$; (3) for each dropped user, unmasking its masks for all surviving users in the same group, with expected cost $O(md(\kappa + h))$, where $d$ here is the number of dropped users; and for each surviving user, unmasking its own mask, which costs $O(m(N - d))$. The overall computational cost for the server is $O(mN + N)$.

Server's communication cost: $O(N + mN)$. The server's communication cost is dominated by (1) its mediation of pairwise communications between users, which is $O(N)$, and (2) receiving masked data vectors from each user, which is $O(mN)$. The overall communication cost is $O(N + mN)$.

To evaluate the performance of our protocol, we test it with different tree structures, with 1000 and 1500 participants under a 15% drop rate. A comparison of our protocol with the original secure aggregation protocol is presented in Appendix D. The experiments were performed on a Windows desktop with an Intel i5-8400 (2.80 GHz) and 16 GB of RAM. We assume that users only drop out of the protocol after sending their secret shares to other users in the same subgroup, but before sending their masked input to the server; this is the "worst-case" dropout because the server must perform mask recovery for the other users in that subgroup. As shown in Table 2, when the height or degree of the tree structure increases, the user's running time and communication costs decrease significantly. Therefore, by adjusting the tree structure, SAFELearning can control its complexity by limiting the size of subgroups, which makes it highly scalable.
8 Experimental Evaluation

In this section, we evaluate our protocol against type II attackers by testing our backdoor attack detection mechanism under the state-of-the-art semantic backdoor attack [5], performed against the training of the CIFAR-10 dataset with the ResNet-18 network [20]. In our experiments, at each iteration, users and attackers each have 500 training samples randomly selected from the CIFAR-10 training set. In addition, each attacker owns a group of data with a backdoor trigger. As shown in Fig. 7, we use three different types of car images (green cars, cars with racing stripes, and cars with vertical-stripe backgrounds) as backdoor triggers. A standard federated learning setting is used and the learning rate is set to 0.01. In the anomaly detection experiments, we use a 32-bit fixed-point number format and reveal the integer bits of the subgroup aggregated models to the server. All experimental results in this section are averages over repeated experiments.
Attackers' strategy. In our experiments, attackers adopt two basic strategies: sybil attacks and adaptive attacks. We test one-time attacks and continuous attacks (continuous attacks run for 5 rounds) separately. Specifically, in our experiments the global model is already converging, and the main (backdoor) task accuracy is 92.46% (0%). Note that, because the cooperating attackers have no knowledge of the grouping information, we assume that all cooperating attackers behave identically.
Figure 7: Examples of backdoor data. Cars with certain attributes are classified as birds: (a) cars painted green; (b) cars with a racing stripe; (c) cars in front of a vertical-stripe background.
Table 2: Users' running time, total data transfer, and server running time for different tree structures. The data vector size is fixed to 100K entries, each entry is 10 bytes, and the drop rate is fixed to 15%. (Columns: tree structure (heights × degree), running time per user, total data transfer per user, server running time.)

Semantic backdoor attackers [5] are one type of adaptive attacker with two critical strategies for making the attack efficient. One is to scale the local model using Eq. (2) in Section 2.2. The other is to adaptively include the detection criterion, in our case the Euclidean distance, in the loss function used to train the backdoor model $X_{target}$:

$$l_{model} = \alpha\, l_{class} + (1 - \alpha)\, l_{ano} \quad (8)$$

In this equation, besides the categorical cross-entropy loss $l_{class}$, the anomaly loss $l_{ano}$ is included to minimize the difference between the backdoor model and the benign model. By adjusting the hyperparameter $\alpha$, the attacker can control the trade-off between model accuracy and model similarity to the global model; in our experiments $\alpha$ is fixed to a constant value.
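A minimal sketch of this adaptive objective (ours, not the implementation of [5]; the value alpha=0.5 and the function names are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def attacker_loss(outputs, labels, model_params, global_params, alpha=0.5):
    """Adaptive attacker objective l_model = alpha * l_class + (1 - alpha) * l_ano,
    where l_ano penalizes the Euclidean distance between the backdoor model and
    the current global model (the anomaly criterion assumed at the server)."""
    l_class = F.cross_entropy(outputs, labels)                 # main/backdoor task loss
    l_ano = sum(((p - g.detach()) ** 2).sum()                  # squared L2 distance
                for p, g in zip(model_params, global_params))
    return alpha * l_class + (1 - alpha) * l_ano
```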
Figure 8: After-attack accuracy for three different backdoor datasets under [5] versus the scale factor; (a) one-time attack, (b) continuous attack. 1000 users in total, a fixed fraction of which are malicious.

Semantic backdoor attack on the conventional secure aggregation protocol [5].
We first tested how the attacker performs under the conventional secure aggregation protocol. As shown in Fig. 8 (a), the attacker can always achieve the attack goal by using an appropriate scale factor around $\frac{N}{\eta N_A}$. Because the users' parameters are fully encrypted in the conventional protocol, those malicious updates are invisible to the server. Moreover, because of the insignificant drop in main task accuracy, the server may not even notice that the global model has already been poisoned. If the attacker uses continuous attack strategies, as shown in Fig. 8 (b), the needed scale factor in each round is much smaller. Note that, as the number of attack iterations increases, the needed scale factor does not keep dropping but converges to a minimum value proportional to the inverse of the learning rate, as shown in Section 2.2. Another notable benefit of the continuous attack is that when the attacker uploads an over-scaled update, the main accuracy does not drop as it does in the one-time attack: if the attacker over-scales in one round, it adaptively adjusts its private vectors using Eq. (2) based on the difference from the current global model $X_i$, and the term $X_{target} - X_i$ guides the global model to converge to $X_{target}$ in the following rounds.

Semantic backdoor attack on our protocol.
To show how our protocol defends against the attacker, we define the following evaluation criteria: (1) detection rate (DR): the number of rounds with attackers detected divided by the number of rounds with attackers present; (2) correction rate (CR): the number of attackers in abnormal subgroups divided by the total number of attackers; (3) false positive rate (FPR): the number of benign subgroups classified as malicious divided by the total number of subgroups; and (4) after-attack accuracy, as tested with the conventional protocol.

In Fig. 9 we show the performance of our protocol against the same attack. In our experiments, we use a 3-ary tree of height 3, which divides 1000 participants into 27 subgroups. As shown in Fig. 9 (a), our protocol achieves a 100% detection rate and 0% false positive rate in most situations, which supports the distribution analysis that there is a significant disparity between benign subgroups and subgroups with attackers. For correction, we found that the correction rate is independent of the scale factor and of whether the attack is continuous or one-time, because increasing the scale factor only influences the numerical distribution of the aggregated model. In contrast, as shown in Fig. 9 (b), when we change the number of attackers but keep the total scale factor of all attackers fixed, the correction rate drops as the number of attackers increases, because increasing the number of attackers changes the distribution of attackers.

To show how our protocol mitigates the backdoor attacker, we evaluate the after-attack accuracy with different scale factors and different numbers of attackers. As shown in Fig. 9 (c), when we increase the scale factor, because the correction rate is constant, the residual attackers affect the global model and increase the backdoor accuracy of the poisoned model.
(a) DR, CR and FPR vs. local parameter scale factor. (b) DR, CR and FPR vs. number of attackers. (c) After-attack accuracy vs. local parameter scale factor. (d) After-attack accuracy vs. number of attackers.
Figure 9: Performance of our protocol against the backdoor attack with different backdoor triggers.

However, compared to the conventional protocol, our protocol mitigates the attack by suppressing the backdoor accuracy; in other words, it increases the minimum scale factor needed by the attacker. This allows the server to take actions such as adjusting system hyperparameters, e.g., the learning rate or the allowed numerical range of users' private vectors, to avoid the attack. For example, if the numerical range is smaller than the minimum scale factor required by the attackers, the attacker will not be able to achieve the attack goal.

On the other hand, different numbers of attackers also lead to different attack performance. As shown in Fig. 9 (d), when the number of attackers is relatively smaller than the number of benign subgroups, most of the malicious subgroups are corrected and the global model shows a backdoor accuracy close to 0 under both continuous and one-time attacks. But when the number of attackers exceeds the number of subgroups, the server cannot correct all the malicious subgroups, and the residual malicious groups poison the global model. In the case of the one-time attack, because the contribution from the residual malicious groups is not enough to achieve the attack goal, the backdoor accuracy stays around 0%-0.1% when the tree structure is changed from 2× to 3×2 (heights × degree); Table 3 summarizes the after-attack accuracy for different tree structures and fractions of malicious users. (The probability that $n$ attackers are evenly distributed over $n$ subgroups is $\frac{n!}{n^n}$, which is a negligible function.)

Table 3: After-attack accuracy of our protocol for different tree structures and fractions of malicious users; the scale factor is fixed. (Columns: tree structure (heights × degree), malicious fraction, main task accuracy, backdoor accuracy.)
9 Related Work

In distributed machine learning, users and the aggregation server, if any, do not have access to the training data sets possessed by other users. This provides the opportunity for various malicious attacks. Data poisoning [7] and model replacement [5] are the two most common attacks that aim at generating malicious global models at the attacker's will. Traditional data poisoning attacks [12, 18, 25], which usually target cloud-centric learning, can influence the behavior of the global model by constructing poisoning samples and uploading poisoned features. According to recent studies, the attacker needs to pollute about 20% to 40% of the training data in the targeted classes [7], or focus on training data with rare features [21], in order to launch data poisoning attacks. In large-scale machine learning the attacker usually needs to compromise 10% to 50% of the participants [8, 32], who continuously upload malicious models.

Model replacement attackers, on the other hand, leverage information about the global model and locally construct malicious inputs that modify the global model precisely in the way they desire. To make the attack more effective, attackers can adjust and amplify their local model parameters to dominate the global model during the aggregation process. Compared to data poisoning attacks, model poisoning is more efficient and effective: the objective of the attack, even if it is to completely replace the global model, can be achieved in one shot with one or a few attackers. As sufficient validation datasets are not always available to the aggregator, detecting such attacks is nontrivial. Existing model replacement detection techniques such as Byzantine-robust federated learning [8] address the problem by evaluating the consistency of the models provided by all participating users, either by the amplitudes [8] or by the angles [15] of the received gradient updates. As these techniques need to evaluate the models provided by individual users at each iteration, they introduce prohibitive complexity in large-scale systems with many participants. Moreover, all of these detection techniques require access to the plaintext of individual models, which is not available under secure aggregation. There has not yet been a model poisoning attack detection mechanism for encrypted model parameters.
Privacy-preserving machine learning [4] aims to prevent or constrain disclosure of training data or models to unauthorized parties. Various techniques have been proposed to this end, which can be roughly categorized into secure multiparty computation (MPC) and differential privacy, based on the underlying techniques they employ. In distributed machine learning, existing MPC-based proposals usually rely on heavy cryptographic primitives, including garbled circuits, partially or fully homomorphic encryption, oblivious transfer, etc. Recently, promising progress has been made toward small networks, especially for inference tasks [10, 22, 24, 27, 28, 30, 31, 34]. However, there has yet to be a practical cryptographic tool that supports efficient training of complex models, e.g., for deep learning tasks.

Differential privacy (DP), on the other hand, focuses on publishing aggregated information with limited disclosure of private information. For example, one approach [3, 16] protects data, models, or outputs by letting users add zero-mean statistical noise (e.g., Laplace noise) so that individual contributions become indistinguishable, while the aggregate retains an affordable variance. To maintain a desired accuracy, one needs to carefully design the random noise without degrading the level of privacy protection when the DP mechanism is repeated throughout the training process. As a result, it remains a challenge in DP to maintain an appropriate trade-off between privacy and model quality (in terms of accuracy loss caused by the added noise), especially in deep learning tasks.
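As a minimal illustration of the noise-adding approach sketched above (not the mechanism used in this paper), each client could clip its local update and perturb it with zero-mean Laplace noise before uploading; the clipping bound, noise scale, and function names below are illustrative assumptions, and a real DP mechanism would calibrate the noise to the sensitivity and the privacy budget:

import numpy as np

def dp_perturb_update(update, clip_bound=1.0, noise_scale=1.0):
    """Clip a local update and add zero-mean Laplace noise (simplified illustration)."""
    # Bound the client's contribution so a single user cannot dominate the aggregate.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_bound / (norm + 1e-12))
    # Zero-mean Laplace noise hides the individual contribution.
    noise = np.random.laplace(loc=0.0, scale=noise_scale, size=update.shape)
    return clipped + noise

# The server averages noisy updates; the noise has zero mean, so it partially
# averages out across many clients, but repeated rounds consume privacy budget.
updates = [np.random.normal(0.0, 0.1, size=10) for _ in range(100)]
noisy_mean = np.mean([dp_perturb_update(u) for u in updates], axis=0)
print(noisy_mean[:3])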
10 Conclusion
In this paper we propose a novel secure aggregation scheme for federated learning that supports backdoor detection and secure aggregation simultaneously through oblivious random grouping and partial parameter disclosure. Compared to the conventional secure aggregation protocol, our protocol reduces the computation complexity from O(N² + mN) (at the user side) and O(mN²) (at the server side) to O(N + n + m) and O(mnN + N), respectively, where N is the total number of users, n the number of users per subgroup, and m the size of the model. We validated our design through experiments with 1000 simulated users. Experimental results demonstrate the efficiency and scalability of the proposed design.

References

[1] Federated learning: Collaborative machine learning without centralized training data. https://ai.googleblog.com/2017/04/federated-learning-collaborative.html. [Online; accessed June-2019].
[2] Under the hood of the Pixel 2: How AI is supercharging hardware. https://ai.google/stories/ai-in-hardware/. [Online; accessed June-2019].
[3] Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318. ACM, 2016.
[4] Mohammad Al-Rubaie and J. Morris Chang. Privacy preserving machine learning: Threats and solutions. IEEE Security and Privacy Magazine, 17(2):49–58, 2019.
[5] Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. arXiv preprint arXiv:1807.00459, 2018.
[6] Arjun Nitin Bhagoji, Supriyo Chakraborty, Prateek Mittal, and Seraphin Calo. Analyzing federated learning through an adversarial lens. In International Conference on Machine Learning, pages 634–643, 2019.
[7] Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against support vector machines. arXiv preprint arXiv:1206.6389, 2012.
[8] Peva Blanchard, Rachid Guerraoui, Julien Stainer, et al. Machine learning with adversaries: Byzantine tolerant gradient descent. Pages 119–129, 2017.
[9] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175–1191. ACM, 2017.
[10] Raphael Bost, Raluca Ada Popa, Stephen Tu, and Shafi Goldwasser. Machine learning classification over encrypted data. In NDSS, volume 4324, page 4325, 2015.
[11] T-H. Hubert Chan, Elaine Shi, and Dawn Song. Privacy-preserving stream aggregation with fault tolerance. In International Conference on Financial Cryptography and Data Security, pages 200–214. Springer, 2012.
[12] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526, 2017.
[13] Whitfield Diffie and Martin Hellman. New directions in cryptography. IEEE Transactions on Information Theory, 22(6):644–654, 1976.
[14] John R. Douceur. The sybil attack. In International Workshop on Peer-to-Peer Systems, pages 251–260. Springer, 2002.
[15] Clement Fung, Chris J. M. Yoon, and Ivan Beschastnikh. Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866, 2018.
[16] Robin C. Geyer, Tassilo Klein, and Moin Nabi. Differentially private federated learning: A client level perspective. arXiv preprint arXiv:1712.07557, 2017.
[17] Slawomir Goryczka and Li Xiong. A comprehensive comparison of multiparty secure additions with differential privacy. In IEEE Transactions on Dependable and Secure Computing. IEEE, 2015.
[18] Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733, 2017.
[19] Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. Deep learning with limited numerical precision. In International Conference on Machine Learning, pages 1737–1746, 2015.
[20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[21] Ling Huang, Anthony D. Joseph, Blaine Nelson, Benjamin I. P. Rubinstein, and J. Doug Tygar. Adversarial machine learning. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pages 43–58, 2011.
[22] Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. Gazelle: A low latency framework for secure neural network inference. USENIX, 2018.
[23] Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016.
[24] Jian Liu, Mika Juuti, Yao Lu, and N. Asokan. Oblivious neural network predictions via MiniONN transformations. ACM, 2017.
[25] Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. Trojaning attack on neural networks. 2017.
[26] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017.
[27] Payman Mohassel and Peter Rindal. ABY3: A mixed protocol framework for machine learning. ACM, 2018.
[28] Payman Mohassel and Yupeng Zhang. SecureML: A system for scalable privacy-preserving machine learning. Pages 19–38. IEEE, 2017.
[29] Luis Muñoz-González, Kenneth T. Co, and Emil C. Lupu. Byzantine-robust federated machine learning through adaptive model averaging. arXiv preprint arXiv:1909.05125, 2019.
[30] Valeria Nikolaenko, Udi Weinsberg, Stratis Ioannidis, Marc Joye, Dan Boneh, and Nina Taft. Privacy-preserving ridge regression on hundreds of millions of records. Pages 334–348. IEEE, 2013.
[31] M. Sadegh Riazi, Christian Weinert, Oleksandr Tkachenko, Ebrahim M. Songhori, Thomas Schneider, and Farinaz Koushanfar. Chameleon: A hybrid secure computation framework for machine learning applications. ACM, 2018.
[32] Shiqi Shen, Shruti Tople, and Prateek Saxena. Auror: Defending against poisoning attacks in collaborative deep learning systems. 2016.
[33] Elaine Shi, T-H. Hubert Chan, Eleanor Rieffel, Richard Chow, and Dawn Song. Privacy-preserving aggregation of time-series data. In Annual Network & Distributed System Security Symposium (NDSS). Citeseer, 2011.
[34] Sameer Wagh, Divya Gupta, and Nishanth Chandran. SecureNN: Efficient and private neural network training. In PETS 2019, 2019.
[35] Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers, and Keying Ye. Probability and Statistics for Engineers and Scientists, volume 5. Macmillan, New York, 1993.
[36] Gergely Ács and Claude Castelluccia. I have a dream! (Differentially private smart metering). Springer, 2011.
Appendices
A Proof of Theorem 2
Theorem 2: Let {X'_i}_{i∈[1,n]} be n samples randomly selected from {X_i}_{i∈[1,N]} in Theorem 1 and Y = y ∈ [E(X') − R_H/2, E(X') + R_H/2], where E(X') is the mean vector of {X'_i}_{i∈[1,n]}. If R_H = (1 − √((n−1)/n))·ε, the expectation of the bound of P(‖X_i − y‖ ≥ ε | Y) is σ²_N/ε².

Proof.
Assume {X'_i}_{i∈[1,n]} has variance σ²_n. Using the Chebyshev inequality we get

  P(‖X_i − E(X')‖ ≥ ε) ≤ σ²_n/ε²,  ε > R_H.

Since Y = y ∈ [E(X') − R_H/2, E(X') + R_H/2], we can derive the bound

  P(‖X_i − y‖ ≥ ε | y ∈ [E(X') − R_H/2, E(X') + R_H/2]) ≤ σ²_n/(ε − R_H)²,  ε > R_H.

Because {X'_i}_{i∈[1,n]} are randomly sampled from {X_i}_{i∈[1,N]}, by the distribution of the sample variance [35] the expectation of σ²_n is ((n−1)/n)·σ²_N. Therefore the expectation of the bound can be written as

  E(σ²_n/(ε − R_H)²) = (n−1)·σ²_N / (n·(ε − R_H)²).

When we let R_H = (1 − √((n−1)/n))·ε, the expectation becomes σ²_N/ε², which is the same as in Theorem 1.
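As a quick numeric sanity check (added for illustration; not part of the original proof): with n = 32 we get R_H = (1 − √(31/32))·ε ≈ 0.0157·ε, so (ε − R_H)² = (31/32)·ε², and the expected bound (n−1)·σ²_N/(n·(ε − R_H)²) indeed collapses to σ²_N/ε².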
B Type I Attacker

Here we give the formal proof of Theorem 3. We use the following parameter setting in our protocol execution: a security parameter k with which the underlying cryptographic primitives are instantiated, a threshold t limiting the number of corrupted parties, a server S, and a user set U containing N users u_i, i ∈ {1, 2, ..., N}. During the execution we require the total number of dropouts to be limited by a certain threshold. We denote the input of each user u as x_u and write x_{U'} = {x_u}_{u∈U'} for the inputs of any subset of users U' ⊆ U.

In the protocol execution, the view of a party consists of its inputs, its randomness, and all messages this party received from other parties. If a party drops out during the process, its view remains the same as before the dropout occurred.

Given any subset C ⊆ U ∪ {S} of the parties, let Real^{U,t,k}_C(x_U, U) be a random variable representing the joint view of all parties in C, which may include some honest-but-curious users and an honest-but-curious server who can combine the knowledge of these users. Our theorem shows that the joint view of any subset of honest-but-curious parties can be simulated given the inputs of these users and only the sum of the inputs of the remaining users within the same group. In particular, we prove that, with a certain threshold t, the joint view of the server and any set of fewer than t corrupted parties leaks no information about the other honest parties beyond the output, which is known to all parties and the server.

Theorem 3 (Local Model Privacy under Type I Attackers). There exists a PPT simulator SIM such that for all t, U, x_U and C ⊆ U ∪ {S} with |C \ {S}| ≤ t, the output of SIM is computationally indistinguishable from the output of Real^{U,t,k}_C(x_U, U):

  Real^{U,t,k}_C(x_U, U) ≈ SIM^{U,t,k}_C(x_C, z, U),

where z = Σ_{u∈U\C} x_u if |U \ C| ≥ t, and z = ⊥ otherwise.

Proof.
We prove the theorem by a standard hybrid argument. We will define a simulator SIM through a series of subsequent modifications to the random variable Real, so that any two consecutive random variables are computationally indistinguishable.
Hybrid 0. This random variable is distributed exactly as Real, the joint view of the parties C in a real execution of the protocol.

Hybrid 1. In this hybrid, we change the behavior of the honest parties in the simulator. Instead of using the secret sharing keys (c^{SK}_u, c^{PK}_v) produced by Diffie-Hellman key exchange, these parties use a uniformly random encryption key c_{u,v} chosen by the simulator. This hybrid is computationally indistinguishable from the previous one, guaranteed by our server-involved decisional Diffie-Hellman key exchange. Recall that in our server-involved D-H key exchange, the only modification compared to the D-H key exchange of the original protocol is that we add randomness to the public key: instead of exchanging the public keys {s^{PK}_u, s^{PK}_v}, users u and v exchange {(s^{PK}_u)^{r_{u,v}}, (s^{PK}_v)^{r_{u,v}}}. We show that any honest-but-curious party given this pair of public keys cannot distinguish the secret s_{u,v} computed from these keys from a uniformly random string. Assume there exists a PPT algorithm A which breaks the server-involved D-H key exchange with advantage ε; then there exists a PPT algorithm B which breaks the DDH assumption with the same advantage. The proof follows the setting of the original protocol. Consider a multiplicative cyclic group G of order q with generator g. For a, b, c chosen uniformly at random in Z_q and a randomness r_{a,b} produced by the VRF, if A can distinguish g^{ab·r_{a,b}} from g^c, then B raises the result of the D-H key exchange to the power r_{a,b}, obtaining (g^{ab})^{r_{a,b}}, invokes A, and thereby breaks DDH with the same advantage.

Hybrid 2. In this hybrid, we change the ciphertexts encrypted by the honest parties. Instead of encrypting the correct shares of s^{SK}_u and b_u, each honest party encrypts 0^*, which has the same length as those shares, and sends it to the other honest parties. This hybrid is computationally indistinguishable from the previous one, guaranteed by the IND-CPA security of the encryption scheme.

Hybrid 3. In this hybrid, we change the shares of b_u given to the corrupted parties to shares of 0. Note that in our design the view of the corrupted parties contains no more than ⌈t/n⌉ shares of each b_u, since the honest users do not reveal b_u to corrupted parties during the Unmasking procedure. This hybrid is distributed the same as the previous one, guaranteed by the properties of Shamir's secret sharing scheme.

Hybrid 4. In this hybrid, we change part of the pairwise masking information of all the parties in U. Instead of computing PRG(b_u), the simulator uses a uniformly random vector of the same size. Note that in the previous hybrid, since b_u is uniformly random and its shares were substituted by shares of 0, the output does not depend on the seed of the PRG. Thus, when we substitute the result of the PRG with a random value, the result is guaranteed to have the same distribution by the security of the PRG, which means this hybrid is distributed the same as the previous one.
Hybrid 5. In this hybrid, we substitute the masked input with one that does not depend on the users' inputs. That is, instead of sending

  y_u = x_u + PRG(b_u) + Σ_{∀v∈G^s_u: u<v} PRG(s_{u,v}) − Σ_{∀v∈G^s_u: u>v} PRG(s_{v,u}) + Σ_{∀v∈G^p_u: u<v} (Λ_{R_H} ∧ PRG(s_{u,v})) − Σ_{∀v∈G^p_u: u>v} (Λ_{R_H} ∧ PRG(s_{v,u}))  (mod R),   (9)

users send

  y_u = PRG(b_u) + Σ_{∀v∈G^s_u: u<v} PRG(s_{u,v}) − Σ_{∀v∈G^s_u: u>v} PRG(s_{v,u}) + Σ_{∀v∈G^p_u: u<v} (Λ_{R_H} ∧ PRG(s_{u,v})) − Σ_{∀v∈G^p_u: u>v} (Λ_{R_H} ∧ PRG(s_{v,u}))  (mod R).   (10)

Recall that in the previous hybrid PRG(b_u) was changed to be uniformly random, so x_u + PRG(b_u) is also uniformly random. Thus this hybrid is identical to the previous one; furthermore, all the following hybrids do not depend on the values of x_u.

Hybrid 6. In this hybrid, we change the behavior by sending shares of 0 instead of the shares of s^{SK}_u generated by the honest parties to all other parties.
As in Hybrid 3, the properties of Shamir's secret sharing scheme guarantee that this hybrid is identical to the previous one.

Hybrid 7. In this hybrid, for a specific honest user u', in order to compute y_u we substitute the keys s_{u,v} and s_{v,u} for all other users in G^p_u with two uniformly random values. This hybrid is computationally indistinguishable from the previous one, guaranteed by the Decisional Diffie-Hellman assumption.

Hybrid 8. In this hybrid, instead of using PRG(s_{u,v}) and PRG(s_{v,u}) in Λ_{R_H} ∧ PRG(s_{u,v}) and Λ_{R_H} ∧ PRG(s_{v,u}), we compute y_u with two uniformly random values of the same size. Recall that the bits of Λ_{R_H}'s elements are 0 if they lie in the range [R_H, R_U] and 1 otherwise. This operation does not affect the randomness, since all the values are ANDed with 0 only on the high bits; this hybrid is indistinguishable from the previous one, guaranteed by the security of the PRG.

Hybrid 9. In this hybrid, for a specific honest user u', in order to compute y_u we substitute the keys s_{u,v} and s_{v,u} for all other users in G^s_u with two uniformly random values. This hybrid is computationally indistinguishable from the previous one, guaranteed by the Decisional Diffie-Hellman assumption.

Hybrid 10. In this hybrid, instead of using PRG(s_{u,v}) and PRG(s_{v,u}), we compute y_u with two uniformly random values of the same size. Similar to Hybrid 8, this hybrid is indistinguishable from the previous one, guaranteed by the security of the PRG.

Hybrid 11. In this hybrid, for all the honest users, instead of sending

  y_u = x_u + PRG(b_u) + Σ_{∀v∈G^s_u: u<v} PRG(s_{u,v}) − Σ_{∀v∈G^s_u: u>v} PRG(s_{v,u}) + Σ_{∀v∈G^p_u: u<v} (Λ_{R_H} ∧ PRG(s_{u,v})) − Σ_{∀v∈G^p_u: u>v} (Λ_{R_H} ∧ PRG(s_{v,u}))  (mod R),   (11)

users send

  y_u = w_u + PRG(b_u) + Σ_{∀v∈G^s_u: u<v} PRG(s_{u,v}) − Σ_{∀v∈G^s_u: u>v} PRG(s_{v,u}) + Σ_{∀v∈G^p_u: u<v} (Λ_{R_H} ∧ PRG(s_{u,v})) − Σ_{∀v∈G^p_u: u>v} (Λ_{R_H} ∧ PRG(s_{v,u}))  (mod R),   (12)

where w_u is uniformly random and satisfies Σ_{u∈U\C} w_u = Σ_{u∈U\C} x_u = z. This hybrid is identically distributed as the previous one.

Thus, we can define a PPT simulator SIM as described by the last hybrid, and the output of this simulator is computationally indistinguishable from the output of Real.

In the next part we prove that, for our random tree structure under the honest-but-curious setting, the final order output by the protocol is indistinguishable from a uniformly random value in the co-domain of the selected hash function.

Theorem 4 (Random Tree Structure Secrecy). There exists a PPT simulator SIM such that for all t, U, x_U and C ⊆ U ∪ {S} with |C \ {S}| ≤ t, the output of SIM is indistinguishable from the output of the real protocol:

  Real^{U,t,k}_C(R_u, R_S, c^{PK}_u) ≈ SIM^{U,t,k}_C(R_u, R_S, c^{PK}_u).

Proof. We show this by a standard hybrid argument. We will define a simulator SIM through a series of modifications to the random variable Real, so that any two consecutive random variables are indistinguishable.
Hybrid 0. This hybrid is exactly the same as Real.

Hybrid 1. In this hybrid, we fix a specific honest user u' and change its behavior. Instead of sending c^{PK}_u, u' sends a uniformly random key of the appropriate size to the server. Notice that Id_u = HASH(R_s || c^{PK}_u || R_u); thus this hybrid is distributed exactly the same as the previous one, guaranteed by the properties of the hash function.

Hybrid 2. In this hybrid, for the specific user defined in the previous hybrid, the final output HASH(Σ_{∀v∈G: v≠u} Id_v) is indistinguishable from that of the previous one. It is easy to show that if there exists at least one more honest user besides u', the result is guaranteed to be uniformly distributed by the properties of the hash function.

Moreover, we can prove that the result still holds for malicious users and a compromised server under the security of the commitment protocol. After commitment, these corrupted users and the server cannot make any change to R_s and R_u, and then the result of Id_u is indistinguishable from a random value in the co-domain of the hash function by the properties of the hash function.
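To illustrate the grouping mechanism analyzed above, the following sketch derives per-user IDs and a subgroup ordering seed from committed randomness; the hash choice, field formats, and helper functions are illustrative assumptions rather than the paper's exact specification:

import hashlib

def h(data: bytes) -> int:
    """Hash to an integer (SHA-256 used here purely for illustration)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def user_id(server_rand: bytes, user_pk: bytes, user_rand: bytes) -> int:
    # Id_u = HASH(R_s || c_u^PK || R_u): once R_s and R_u are committed,
    # no single party controls the input, so Id_u looks uniformly random.
    return h(server_rand + user_pk + user_rand)

def group_order_seed(ids_of_other_members) -> int:
    # Ordering seed HASH(sum of the other members' Ids): one honest member's
    # random Id already makes the seed unpredictable to the rest of the group.
    return h(sum(ids_of_other_members).to_bytes(64, "big"))

# Toy usage with made-up randomness and keys:
R_s = b"server-commit"
ids = [user_id(R_s, bytes([i]) * 4, bytes([i + 1]) * 8) for i in range(4)]
print(group_order_seed(ids[1:]) % 4)   # e.g., a position assignment within the subgroup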
C Type II Attacker

In this part we give a formal statement of our claim. For a fixed attacker A in a certain subgroup, we show that even if there is another attacker B in the same subgroup, they cannot tell that they are in the same subgroup; that is, the joint view of A and B is indistinguishable from the view of A alone.

Theorem 5 (Indistinguishability of Type II attackers in the same subgroup). For any type II attacker A in a certain subgroup, if there is another type II attacker B in the same subgroup that is not peered with A, then View_{A,B}, the joint view of A and B, is indistinguishable from View_A, the view of A:

  View_A ≈ View_{A,B}.

Proof.
We prove the theorem by a standard hybrid argument. Recall that there are n users in a subgroup and each user can only pairwise mask with the previous and next κ users, according to Id_u, in our circle-like topology design. We first fix the position of a given isolated attacker A, which is peered with κ honest users in this subgroup, and an unassigned attacker B.

Hybrid 0. This hybrid is exactly the same as our setting. In this hybrid, the view of attacker A contains the transcript produced by running the protocol with all the pairwise-mask peers of A and all the key pairs owned by the attackers, besides the key pairs shared via the secret sharing scheme, since we allow the attackers to cooperate.

Hybrid 1. In this hybrid, we substitute all the information of these κ honest users with random strings, and the view of A remains the same. This part of the proof follows the same lines as our proof for the type I attacker and we omit it.

Hybrid 2. In this hybrid, for a specific user u who is not a peer of A, and with v denoting the pairwise-mask peers of u, we substitute the key c^{SK}_u with c^{SK}_B. This modification does not change the view of A, since A has no information about c^{SK}_u, guaranteed by the fact that the server will only send the public keys from user u's pairwise companions.

Hybrid 3. In this hybrid, instead of using s^{SK}_u and b_u, user u shares the shares of B. Similar to Hybrid 2, the server will only relay the encrypted secret shares to the peers of user u. Thus, for attacker A, the view remains the same.

Hybrid 4. In this hybrid, we substitute the pairwise key pairs (s^{SK}_v, s^{PK}_u) with the keys of B. Similarly, this does not change the view of A: the view of A already contains the keys of B, and the circle-like topology guarantees that A cannot execute pairwise masking with u, so they share no information about the pairwise mask keys.

By now we have substituted all the information of an honest user u with the information of an attacker B. The joint view of A and B contains the transcripts produced by running the protocol. Since we have already proven that the protocol is privacy-preserving, the joint view of A and B is indistinguishable from the joint view of A and an honest user u. Thus, if two attackers are not peered with each other for pairwise masking, neither of them can obtain any information beyond what it already owns.
D Performance Comparison

We compare the overhead of our protocol with Bonawitz et al. [9] (abbreviated as "original") on the server and user sides with simulation experiments. In the experiments we use a tree structure of height 2, and each user u pairwise-masks with its previous and next κ users in the subgroup.
(a) Running time for the server as the number of clients increases; the data vector size is fixed to 100K entries. (b) Running time for the server as the size of the data vector increases; the number of clients is fixed to 500.
Figure 10: Server running time. Solid lines represent our tree-based secure aggregation protocol; dashed lines represent the original secure aggregation protocol.
Fig. 10 compares the server's running time between the two protocols as the number of clients and the data vector size increase. Note that, with a 0% dropout rate, the server's running time is almost the same for both protocols. This is because, as long as all pairwise masks cancel out, the server of both protocols only needs to unmask b_u for each surviving user. When there are dropped users, however, the server in the original protocol needs to unmask O(N) pairwise masks for each dropped user, whereas our protocol only needs to unmask a constant number per dropped user. So, in Fig. 10, the running time of the original protocol grows significantly faster as the dropout rate rises, while our protocol keeps the same growth rate as in the 0%-dropout case.

As seen in Fig. 11 (a), the users' running time of both protocols increases linearly with the number of users. But our protocol has a significantly lower growth rate because the computation complexity of each user in our protocol is O(N + n + m) instead of O(N² + mN). Moreover, because the number of pairwise maskings needed in our protocol is constant per user, the user's computation cost does not change significantly as the data vector size increases. So in Fig. 11 (b), the growth of our protocol's running time is barely perceptible compared to the original protocol, which needs N pairwise masking operations per user. Fig. 11 (c) compares the total data transferred per client as the number of clients increases.
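The dropout-cost difference discussed above comes from how many pairwise masks must be reconstructed per dropped user. The following toy sketch (simplified: plain integers, no PRG expansion, no secret sharing; the group and neighbor sizes are illustrative assumptions) shows that pairwise masks cancel in the aggregate and that a dropout forces the server to remove one mask per masking partner of the dropped user:

import random

def pairwise_masks(num_users, partners):
    """partners[u] = set of users u shares a pairwise seed with."""
    seed = {}
    for u in range(num_users):
        for v in partners[u]:
            if u < v:
                seed[(u, v)] = random.randrange(1 << 16)
    return seed

def masked_upload(u, x, partners, seed):
    y = x
    for v in partners[u]:
        s = seed[(min(u, v), max(u, v))]
        y += s if u < v else -s     # +s_{u,v} for u < v, -s_{v,u} for u > v
    return y

N, kappa = 8, 1
x = [random.randrange(100) for _ in range(N)]
# Original protocol: every pair shares a mask, so a dropout costs N-1 recoveries.
full = {u: {v for v in range(N) if v != u} for u in range(N)}
# Circle-like topology: each user masks only with its 2*kappa neighbors (constant cost).
ring = {u: {(u + d) % N for d in range(-kappa, kappa + 1) if d != 0} for u in range(N)}

for partners in (full, ring):
    seed = pairwise_masks(N, partners)
    uploads = [masked_upload(u, x[u], partners, seed) for u in range(N)]
    assert sum(uploads) == sum(x)               # masks cancel when nobody drops
    dropped = 3                                  # if this user drops, its masks remain
    print("masks to recover for the dropped user:", len(partners[dropped]))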
Figure 11: (a) Running time per user as the number of clients increases; the data vector size is fixed to 100K entries. (b) Running time per user as the data vector size increases; the number of users is fixed to 500. (c) Total data transferred per client as the number of clients increases, assuming no dropouts.