Federated Dynamic GNN with Secure Aggregation
Meng Jiang
University of Notre Dame [email protected]
Taeho Jung
University of Notre Dame [email protected]
Ryan Karl
University of Notre Dame [email protected]
Tong Zhao
University of Notre Dame [email protected]
Abstract
Given video data from multiple personal devices or street cameras, can we exploit the structural and dynamic information to learn dynamic representations of objects for applications such as distributed surveillance, without storing data at a central server, which leads to a violation of user privacy? In this work, we introduce Federated Dynamic Graph Neural Network (Feddy), a distributed and secured framework to learn object representations from multi-user graph sequences: i) It aggregates structural information from nearby objects in the current graph as well as dynamic information from those in the previous graph, and it uses a self-supervised loss of predicting the trajectories of objects. ii) It is trained in a federated learning manner. The centrally located server sends the model to user devices; local models on the respective user devices learn and periodically send their learning to the central server without ever exposing the user's data to the server. iii) Studies showed that the aggregated parameters could be inspected, once decrypted, when broadcast to clients for model synchronization after the server performs a weighted average. We design an appropriate aggregation mechanism out of secure aggregation primitives that can protect security and privacy in federated learning with scalability. Experiments on four video camera datasets (in four different scenes) as well as a simulation demonstrate that Feddy achieves great effectiveness and security.
Introduction

Distributed surveillance systems have the ability to detect, track, and snapshot objects moving around in a certain space [43]. Such a system is composed of several smart cameras (e.g., personal devices or street cameras) that are equipped with a high-performance onboard computing and communication infrastructure, and a central server that processes data [9]. Smart surveillance is getting increasingly popular as machine learning (ML) technologies become easier to use [10]. Traditionally, Convolutional Neural Networks (CNNs) were integrated in the cameras and employed to identify and segment objects from video streams [45]. Then raw features (e.g., color, position) of objects were sent to the server and used to train a learning model for tracking and/or anomaly detection. In order to render a model that makes accurate decisions, it has to learn higher-level features and patterns from multi-user or multi-source video data. The model is expected to capture complex patterns, such as vehicles slowing down at traffic circles, stopping at traffic lights, and bicyclists cutting in and out of traffic, in an unsupervised way.

Graph neural networks (GNNs) have been applied to capture deep patterns in vision data across different problems such as object detection [45], situation recognition [25], and traffic forecasting [26, 46]. When objects are identified, a video frame can be presented as a graph where nodes are the
Preprint. Under review.

objects and links describe the spatial relationships between objects. Yet there are three challenges we identify when designing and deploying GNN models in a distributed surveillance system. First, an effective, annotation-free task is desired for training a GNN model on long graph sequences. The models are expected to preserve the deep spatial and dynamic moving patterns described above in the latent representations of objects [29, 47, 38, 35]. Second, collecting raw video or graph data from a large number of devices and training on a central server would not be a feasible solution, though we need one global model [22, 6, 44]. After being deployed on real hardware, the GNN model should be updated locally at each device as needed, or "fine-tuned" with newly collected local data, to adapt to unique scenarios so it can make decisions on-the-fly. When there are sufficient resources (e.g., communication bandwidth), the model updates can be shared among the devices to let them agree on the global model, which captures the various deep patterns learned from individual devices or users. Third, distributed systems are vulnerable to various inference attacks [14, 42, 31]. Namely, adversaries who observe the updates of individual models are able to infer significant information about the individual training datasets (e.g., the distribution of, or even samples from, the training datasets) [1, 32]. When the central server is compromised, individual datasets are compromised as well via these inference attacks.

In this work, we propose a novel approach called Federated Dynamic Graph Neural Network (Feddy) to address the three challenges. Generally, it is an unsupervised, distributed, secured framework to learn the object representations in graph sequences from several users or devices. i) We define an MSE loss on the task of future position prediction to train the GNN model.
Given node attributes and relational links in a graph sequence before time t, the model generates the latent representations of nodes in the graph at time t via neural aggregation functions, and it learns the parameters to predict the node positions at time t + ∆t. In our study, ∆t is 5 seconds, i.e., 150 graphs by default for 30-fps video. The node attributes include the objects' horizontal and vertical positions, object box size, and RGB colors in the center, on the left, right, top, and bottom of the box. The relational links include the spatial relationships between nodes within a graph and the dynamic relationship of the same node in neighboring graphs. Spatial and dynamic patterns are preserved by this dynamic GNN model. The boxes were identified by CNN tools, so no human annotation is needed in the training process. ii) We use federated learning (FL) to train one dynamic GNN model across devices without exchanging training data. With FL, individual cameras compute the model updates (e.g., gradients, updated weights) locally, and only these model updates are shared with the central server (called the parameter server), which trains a global model using the aggregated updates collected from individual devices. The parameter server then shares the trained model with all other devices. FL provides a viable platform for state-of-the-art ML, and it is privacy-friendly because the training data never leave individuals. iii)
We employ secure aggregation to prevent inference attacks launched by malicious parameter servers. Secure aggregation is a primitive that allows a third-party aggregator to efficiently compute an aggregate function (e.g., product, sum, average) over individuals' private input values. When this primitive is applied to FL, the parameter server can access the aggregated model only, and adversaries can no longer launch the aforementioned inference attacks to infer individuals' training data.

In our experiments, we use the Stanford Drone Dataset published by the Stanford Computational Vision and Geometry Lab. The large-scale dataset collects images and videos of various types of agents (e.g., pedestrians, bicyclists, skateboarders, cars, buses, and golf carts) that navigate a university campus [37]. We use videos in four scenes: "bookstore," "coupa," "hyang," and "little." Experimental results demonstrate that the proposed framework can consistently deliver good performance in an unsupervised, distributed, secured manner.

Given a video, suppose the video frame at time t has been processed to form an attributed graph G^(t) = (V^(t), p^(t), g^(t), e^(t)) using object detection techniques (e.g., CNN-based, or as processed in the Stanford Drone Dataset: https://cvgl.stanford.edu/projects/uav_data/), where
Figure 1: Dynamic GNN aggregates spatial information within graphs and temporal information across neighboring graphs; it is trained by the MSE loss on the task of predicting future positions of objects (left). We protect federated averaging functions with novel secure aggregation primitives to prevent inference attacks by malicious parameter servers (right).

• V^(t) is the set of nodes (i.e., moving objects [45]);
• p^(t)(v) = [p_x^(t)(v), p_y^(t)(v)]: V^(t) → R² gives the two position values (i.e., horizontal and vertical positions) of the center of object v in the frame;
• g^(t)(v): V^(t) → R^k gives k features of object v, such as red/green/blue values on the left and right, at the top and bottom, and in the center;
• e^(t)(u, v): V^(t) × V^(t) → R gives the weight of the link between nodes u and v; the weight can be the Euclidean distance between the two nodes on the frame:

  e^(t)(u, v) = ‖p^(t)(u) − p^(t)(v)‖,  u, v ∈ V^(t).   (1)

Given a video with frames at 1 ... T, we have the attributed graph sequence G^(t)|_{t=1}^T. We denote the set of nodes in the graph sequence by V = ∪_{t=1}^T V^(t). The goal of our approach is to learn representations of each node (called node embeddings) in each graph that preserve spatial information in the graph and dynamic information of the node through past graphs: f(v, t): V × {1 ... T} → R^d, denoted by v^(t), where d is the number of dimensions of the node embeddings. The node embeddings can be used for tasks such as object tracking, forecasting, and malicious behavior detection.

Our proposed GNN model has two parts: one is an algorithm for node embedding generation given raw data and model parameters; the other is a set of loss functions based on self-supervised tasks for training the model parameters.

Node embedding generation:
First, we use a matrix M ∈ R^{d×(k+2)} to transform the node's raw features [p^(t)(v), g^(t)(v)] into the initial latent embedding v_0^(t) ∈ R^d:

  v_0^(t) = σ(M · [p^(t)(v), g^(t)(v)]),   (2)

where σ(·) is an activation function, which can be sigmoid, hyperbolic tangent, ReLU, etc.

Second, for the i-th layer of the neural network (i ∈ {1 ... n}, where n is the number of layers), which means in the i-th iteration of the embedding generation algorithm, we generate the embedding vector v_i^(t) ∈ R^d for node v in graph G^(t) as follows:

  v_i^(t) = σ( α · B_i · AGG_{u ∈ V^(t−1)\{v}} [π(e^(t−1)(u, v)) · u_{i−1}^(t−1)] + β · W_i · v_{i−1}^(t−1) + (1 − α − β) · v_{i−1}^(t) ),   (3)

where (1) u_{i−1}^(t−1) is the embedding vector of node u as a neighboring node of v in graph G^(t−1) at the (i−1)-th iteration; v_{i−1}^(t−1) is the embedding vector of node v in graph G^(t−1) at the (i−1)-th iteration; v_{i−1}^(t) is the embedding vector of node v in graph G^(t) at the (i−1)-th iteration; (2) B_i ∈ R^{d×d} is the transformation matrix from the aggregated information of neighboring nodes on the (i−1)-th layer to the i-th layer; W_i ∈ R^{d×d} is the transformation matrix from the embedding of a node on the (i−1)-th layer to the i-th layer; (3) α is a hyperparameter weighting the aggregation of neighboring nodes in the previous graph; β is a hyperparameter weighting the node in the previous graph; (4) AGG is an aggregation function, which can be a mean-pooling, max-pooling, or LSTM aggregator; and (5) π(e) defines the importance of aggregating from a neighboring node: a shorter distance (i.e., a smaller weight on the link between the two nodes) indicates a higher importance, so one choice is π(e) = e^{−1}.

The final embedding vectors are {v̂^(t) ≡ v_n^(t)}|_{t=1}^T, generated with the model parameters M, B_i|_{i=1}^n, and W_i|_{i=1}^n from the graph sequence data G^(t)|_{t=1}^T.
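For concreteness, the generation procedure of Equations (2)-(3) can be sketched in NumPy as below. The function names, the mean-pooling aggregator, the tanh activation, and the inverse-distance weighting for π(e) are our illustrative choices under the stated intuition (nearer neighbors matter more), not an implementation fixed by the text:

```python
import numpy as np

def sigma(x):
    """Activation sigma(.) for Eqs. (2)-(3); tanh is one of the named choices."""
    return np.tanh(x)

def dynamic_gnn(raw, M, B, W, alpha=0.4, beta=0.3):
    """Sketch of the embedding generation in Eqs. (2)-(3).

    raw : list over time t of dicts {node_id: raw features in R^(k+2)},
          whose first two entries are the (x, y) position p^(t)(v).
    M   : (d, k+2) input transform.  B, W : lists of n (d, d) matrices.
    Returns, per graph, the embeddings {node_id: v_hat^(t)} after n layers.
    """
    # Eq. (2): initial latent embeddings v_0^(t)
    emb = [{v: sigma(M @ x) for v, x in g.items()} for g in raw]
    for B_i, W_i in zip(B, W):                  # layers i = 1..n, Eq. (3)
        new_emb = []
        for t, g in enumerate(raw):
            layer_t = {}
            for v in g:
                h = (1.0 - alpha - beta) * emb[t][v]
                if t > 0 and v in raw[t - 1]:
                    # mean-pooling over spatial neighbors u in V^(t-1)\{v},
                    # weighted by pi(e): inverse distance (illustrative choice)
                    msgs = []
                    for u in raw[t - 1]:
                        if u == v:
                            continue
                        dist = np.linalg.norm(raw[t - 1][u][:2] - raw[t - 1][v][:2])
                        msgs.append(emb[t - 1][u] / (dist + 1e-6))
                    if msgs:
                        h = h + alpha * (B_i @ np.mean(msgs, axis=0))
                    h = h + beta * (W_i @ emb[t - 1][v])
                layer_t[v] = sigma(h)
            new_emb.append(layer_t)
        emb = new_emb
    return emb
```

Note that all embeddings of layer i are computed from layer i−1 before the next layer starts, matching the iteration structure of Equation (3).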
Each final embedding vector v̂^(t) preserves the structural and dynamic information of node v in G^(max{t−n,1}), ..., G^(t−1), G^(t).

Self-supervised loss:
We expect the final embedding vector v̂^(t) to be predictive of the future values of the node's positions, denoted by [p_x^(t+∆t)(v), p_y^(t+∆t)(v)], i.e., v's positions after ∆t frames. So we have the loss below, introducing A ∈ R^{2×d} to map from the latent space to the position space:

  L(M, B_i|_{i=1}^n, W_i|_{i=1}^n, A) = Σ_{t=2}^{T−∆t} Σ_{v ∈ V^(t)} ‖p^(t+∆t)(v) − A · v̂^(t)‖².   (4)

Complexity analysis:
The time complexity of the dynamic GNN is O(r n d² Σ_{t=1}^T |V^(t)|), where n is the number of layers, r ≤ |V^(t)| is the number of spatial neighbors for each node, and d is the number of dimensions of the node embeddings. The memory complexity is O(r n d + n d²). Usually, the number of objects |V^(t)| in a single graph is not too big (often between 2 and 10), and n is usually 1, 2, or 3. When |V^(t)| turns out to be too big, we can sample r spatial neighbors for training [16].

Joint optimization with multi-user graph sequences from distributed cameras.
Given m videos, which can be represented as G^(j,t)|_{j=1,t=1}^{j=m,t=T_j}, where T_j is the number of graphs (i.e., video frames) in the j-th graph sequence (i.e., video), we extend the dynamic GNN algorithm presented above to generate the final embedding vectors v̂^(j,t)|_{j=1,t=1}^{j=m,t=T_j} and extend the self-supervised loss as follows:

  L = Σ_{j=1}^m Σ_{t=2}^{T_j−∆t} Σ_{v ∈ V^(j,t)} ‖p^(j,t+∆t)(v) − A · v̂^(j,t)‖².   (5)

Federated optimization.
Privacy, security, and scalability have become critical concerns in distributed surveillance [6]. An approach that has the potential to address these problems is federated learning [22]. Federated learning (FL) is an emerging decentralized privacy-protection training technology that enables clients to learn a shared global model without uploading their private local data to a central server. In each training round, a local device downloads the shared model from the central server, trains the downloaded model on the individual's local data, and then sends the updated weights or gradients back to the server. On the server, the uploaded models from the clients are aggregated to obtain a new global model [48, 44]. Federated dynamic GNN aims to minimize the loss function in Equation (5) but in a distributed scheme:

  min_Θ L(Θ) = Σ_{j=1}^m (N_j / N) · L^(j)(Θ),  where L^(j)(Θ) = (1 / N_j) Σ_{v ∈ V^(j,:)} L_v(Θ),   (6)

where the data sizes are N_j = Σ_t |V^(j,t)| and N = Σ_j N_j, j is the index over the m clients, L^(j)(Θ) is the loss function of the j-th local client, and Θ represents the neural parameters of the dynamic GNN model. Optimizing the loss function L(Θ) in FL is equivalent to minimizing the weighted average of the local loss functions L^(j)(Θ).

Each user performs local training to calculate the individual loss L^(j)(M, B_i|_{i=1}^n, W_i|_{i=1}^n, A) (for the j-th user), after which the loss gradient ∇L^(j)(M, B_i|_{i=1}^n, W_i|_{i=1}^n, A) is calculated. Each user submits this individual gradient to a central parameter server, which takes the following gradient descent step:

  Θ_{t+1} ← Θ_t − η · FedAvg_{j=1...m}( α_j · ∇L^(j)(M, B_i|_{i=1}^n, W_i|_{i=1}^n, A) ),   (7)

where t is the iteration index (not a time or graph index), Θ denotes the neural network parameters, η is the learning rate, and α_j is the weight of the j-th user's individual gradient in the weighted average FedAvg. α_j is usually proportional to the size of the user's dataset.
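The server-side update of Equation (7) reduces to a size-weighted average of client gradients followed by one gradient step. A minimal sketch (the helper name and the dict-of-arrays parameter layout are our assumptions):

```python
import numpy as np

def fedavg_step(theta, client_grads, client_sizes, lr=0.01):
    """One server update per Eq. (7): average the client gradients with
    weights alpha_j = N_j / N, then take a gradient descent step.

    theta        : dict {param_name: np.ndarray} of global parameters.
    client_grads : list of dicts with the same keys (local gradients).
    client_sizes : list of data sizes N_j, one per client.
    Returns the new global parameters (the input dict is left untouched).
    """
    total = float(sum(client_sizes))
    avg = {name: sum((n / total) * g[name]
                     for n, g in zip(client_sizes, client_grads))
           for name in theta}
    return {name: theta[name] - lr * avg[name] for name in theta}
```

For example, with two clients holding 1 and 3 samples, the second client's gradient receives weight 3/4 in the averaged update.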
The algorithm based on FedAvg can effectively reduce communication rounds by simultaneously increasing local training epochs and decreasing local mini-batch sizes [30, 48].

Regular FL without explicit security mechanisms has been shown to be vulnerable to various inference attacks [31, 32, 42, 1, 14]. Namely, adversaries who observe the updates of individual models become able to infer significant information about the individual training datasets (e.g., the distribution of the training datasets or even samples from them). When the parameter server in FL is compromised, individual data (e.g., raw videos or their features) is compromised as well via the inference attacks.

A secure aggregation scheme allows a group of mutually distrustful users ν ∈ U (U is the set of m users), each with private input x_ν, to compute an aggregate value FedAvg_{ν∈U} x_ν, such as Σ_{ν∈U} x_ν, without disclosing any individual x_ν to others. Similar to existing work [4-6], we leverage secure aggregation to thwart attacks. However, we take advantage of different approaches and mitigate their shortcomings.

Secure aggregation with pair-wise one-time pads.
In the secure aggregation adopted by Bonawitz et al. [4-6], a user ν chooses a random number s_{ν,μ} ∈ Z_q for every other user μ, where Z_q is the set of integers {0, 1, ..., q−1}. Specially, s_{ν,μ} = 0 if ν = μ. Then, all pairs of users ν and μ exchange s_{ν,μ} and s_{μ,ν} over secure communication channels and compute the one-time pads as p_{ν,μ} = s_{ν,μ} − s_{μ,ν}. Then, each user masks x_ν as y_ν = x_ν + Σ_{μ∈U} p_{ν,μ} mod q. Every user ν sends y_ν to the server, which computes the sum over the y_ν. Then, it follows that

  Σ_{ν∈U} y_ν = Σ_{ν∈U} x_ν + Σ_{ν∈U} Σ_{μ∈U} p_{ν,μ} = Σ_{ν∈U} x_ν + Σ_{ν∈U} Σ_{μ∈U} s_{ν,μ} − Σ_{ν∈U} Σ_{μ∈U} s_{μ,ν} = Σ_{ν∈U} x_ν (mod q).   (8)

Such masking with one-time padding guarantees perfect secrecy (i.e., no information about x_ν is revealed by y_ν) as long as the bit length of q is larger than the bit length of x_ν. Such secure aggregation requires that all users freshly share one-time pads at every aggregation, because the one-time pads cannot be reused. This leads to high communication overhead among the users. The benefit of such a scheme is that it does not rely on a trusted key dealer, unlike the following scheme.

Secure aggregation of time-series data.
In the secure aggregation for time-series data [41, 19], a trusted key dealer randomly samples for each user ν ∈ U a one-time pad p_ν from Z_{N²} such that Σ_{ν∈U} p_ν = 0. This can be done trivially by choosing the first |U|−1 numbers randomly and letting the last number be the negative sum of the |U|−1 numbers (note that −x is equal to q−x modulo q for any x ∈ Z_q). Then, each p_ν is securely distributed to each user via secure communication channels. Each user ν then masks x_ν as y_ν = (1+N)^{x_ν} · H(t)^{p_ν} mod N², where H: T → Z_{N²} is a cryptographic hash function, t is the time at which the aggregation needs to be performed, and N is the product of two distinct unknown prime numbers (i.e., an RSA number). Then, it follows that:

  Π_{ν∈U} y_ν = Π_{ν∈U} H(t)^{p_ν} · Π_{ν∈U} (1+N)^{x_ν} = (1+N)^{Σ_{ν∈U} x_ν} = 1 + N · Σ_{ν∈U} x_ν (mod N²),   (9)

where the last equality holds due to the binomial theorem, i.e., (1+N)^x = 1 + xN (mod N²). Then, it further follows that:

  ( (Π_{ν∈U} y_ν mod N²) − 1 ) / N mod N = Σ_{ν∈U} x_ν.   (10)

Such masking guarantees semantic security (i.e., no statistical information about x_ν is disclosed by y_ν) as long as N is sufficiently large and the Decisional Composite Residuosity (DCR) problem [19] is hard. Such secure aggregation requires a trusted key dealer who performs the computation; however, such an entity is hard to find in real-life applications. Even if there is one, it becomes a single point of failure whose compromise leads to the compromise of the whole system. The benefit of such a scheme is that it does not require frequent key sharing, because H(t)^{p_ν} is computationally indistinguishable from a random number in Z_{N²} as long as p_ν is kept secret, and users can re-use p_ν over and over as long as the same t is never used twice in the aggregation.

Our improved secure aggregation.
We present our secure aggregation scheme that takes the best of both worlds. Namely, we combine the two types of secure aggregation schemes above to let users re-use the shared pads/keys without extra sharing or trusted key dealers.

We first let all users {ν}|_{ν∈U} generate the one-time pads p_{ν,μ} in the same way as in the secure aggregation with pair-wise one-time pads. Then, we let every user calculate p_ν = Σ_{μ∈U} p_{ν,μ}, whose sum Σ_{ν∈U} p_ν is equal to 0 for some modulus. Then, we let all users use these {p_ν}|_{ν∈U} to participate in the secure aggregation of time-series data. By doing so, users can use the H(t)'s and p_ν's to mask their inputs without relying on a trusted key dealer. At the same time, they can re-use the same pads p_ν repeatedly as long as H(t) is different every time. Informally, such masking guarantees correct aggregation at the parameter server side, and it also guarantees that adversaries cannot infer any information about individual users' input data other than the length of the masked data. We present formal definitions and proofs in the appendix.

Mapping between real numbers and integers.
All the computations in our secure aggregation scheme are integer computations, so we need to use integers to represent real numbers. We leverage the fixed-point representation [20]. Given a real number x, its fixed-point representation is given by [x] = ⌊x · 2^e⌉ for a fixed integer e. Then, it follows that [x ± y] = [x] ± [y]. With such homomorphism, users can convert a real number x to its integer version [x] and participate in the secure aggregation. The third-party aggregator can compute the sum Σ_{ν∈U} [x_ν], which is equal to [Σ_{ν∈U} x_ν], and one can approximately recover Σ_{ν∈U} x_ν by computing the division Σ_{ν∈U} x_ν ≈ [Σ_{ν∈U} x_ν] / 2^e. The approximation error is bounded above by 2^{−(e+1)} |U|.
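Putting the pieces together, the following toy end-to-end sketch (tiny, insecure parameters chosen only for illustration; all function names are ours, and Python 3.8+ is assumed for `pow()` with negative exponents) shows that the pair-wise pads sum to exactly zero, the hash-based masking cancels at aggregation, and the fixed-point sum is recovered exactly:

```python
import hashlib
import math
import secrets

# Toy modulus for illustration only; a real deployment uses a 2048-bit RSA
# modulus N with vetted parameter sizes (112-bit security per NIST).
N = 1009 * 1013
N2 = N * N

def H(t):
    """Hash the round tag t into an invertible element of Z_{N^2}."""
    h = int.from_bytes(hashlib.sha256(str(t).encode()).digest(), "big") % N2
    while math.gcd(h, N2) != 1:   # ensure h has an inverse mod N^2
        h += 1
    return h

def pairwise_keys(m):
    """Each ordered pair (v, u) shares a random s[v][u]; user v's reusable key
    is p_v = sum_u (s[v][u] - s[u][v]), so the m keys sum to exactly zero."""
    s = [[secrets.randbelow(N2) for _ in range(m)] for _ in range(m)]
    return [sum(s[v][u] - s[u][v] for u in range(m)) for v in range(m)]

def encode(x, e=16):
    """Fixed-point mapping [x] = round(x * 2^e) from reals to integers."""
    return round(x * 2 ** e)

def mask(x, p_v, t):
    """User side: y = (1+N)^x * H(t)^{p_v} mod N^2
    (a negative key exponent is handled as a modular inverse)."""
    return pow(1 + N, x, N2) * pow(H(t), p_v, N2) % N2

def aggregate(ys):
    """Server side: the H(t)^{p_v} factors cancel, leaving (1+N)^{sum x},
    which is 1 + N * sum(x) mod N^2; recovery is exact while sum(x) < N."""
    prod = 1
    for y in ys:
        prod = prod * y % N2
    return (prod - 1) // N
```

Because the keys sum to zero over the integers, the same keys can be reused with a fresh round tag t at every aggregation, which is the point of the combined scheme.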
Graph sequence data.
We transform videos in the Stanford Drone Dataset into graph sequences.Each 30-fps video spans over 10,000 frames ( i.e. , over 333 seconds). We use the first minute fortraining, last 3 minutes for testing, and the last 4 th for validation. Statistics can be found in Table 1. Parameter settings.
The number of object’s raw features is k + 2 : is for the object box’s horizontaland vertical positions. k = 17 is for box width, box height, and RGB colors at the center, left, right,top, and bottom of the box. The hyperparameters α is set as . , β is set as . and the number oflayers is set as for the best performance. Here α is the weight for aggregating spatial informationfrom neighbors; β is the weight for aggregating dynamic information from neighboring graphs. Theaggregation is applied every 10 epochs.We will set the number of dimensions of node embeddings d a value in { , , , , } . Wewill simulate federated learning with the number of users m in { , , , } . m = 1 means we disablefederated optimization but have all data on a single user.6 R M S E ( H o r i z on t a l po s iti on p x ) Number of dimensions d no GNN 1 user2 users 5 users10 users R M S E ( V e r ti ca l po s iti on p y ) Number of dimensions d no GNN 1 user2 users 5 users10 users R M S E ( H o r i z on t a l po s iti on p x ) Number of dimensions d no GNN 1 user2 users 5 users10 users R M S E ( V e r ti ca l po s iti on p y ) Number of dimensions d no GNN 1 user2 users 5 users10 users T i m ec o s t ( s ec ond s / e po c h )
32 64 128 256 512 T i m ec o s t ( s ec ond s / e po c h )
32 64 128 256 512 (a) Stanford Bookstore (bookstore): 1424 × × R M S E ( H o r i z on t a l po s iti on p x ) Number of dimensions d no GNN 1 user2 users 5 users10 users R M S E ( V e r ti ca l po s iti on p y ) Number of dimensions d no GNN 1 user2 users 5 users10 users R M S E ( H o r i z on t a l po s iti on p x ) Number of dimensions d no GNN 1 user2 users 5 users10 users R M S E ( V e r ti ca l po s iti on p y ) Number of dimensions d no GNN 1 user2 users 5 users10 users T i m ec o s t ( s ec ond s / e po c h )
32 64 128 256 512 T i m ec o s t ( s ec ond s / e po c h )
32 64 128 256 512 (c) Huang Y2E2 Buildings (hyang): 1340 × × p x (top left), predicting vertical position p y (topright), and running time (bottom left) as well as an example video frame (bottom right). Computational resource.
Our machine is an iMac with a 4.2 GHz Quad-Core Intel Core i7, 32 GB 2400 MHz DDR4 memory, and a Radeon Pro 580 8 GB graphics card.
Figure 2 presents experimental results in four different scenes. We use Root Mean Square Error (RMSE) to evaluate the performance of predicting the horizontal position p_x and the vertical position p_y. We observe, first, that the RMSE of Federated Dynamic GNN (Feddy) for any number of users and any number of dimensions is smaller than that of the best method (i.e., MLP) learning on raw features. Second, Feddy performs better when the number of dimensions d is bigger. When d is sufficiently large, the performances under different numbers of users (1, 2, 5, or 10) show very small differences, meaning that federated optimization achieves a consistent global model.

From the bar charts in Figure 2 we observe that when the number of users m > 1, which enables federated optimization, the time cost is higher than that of m = 1. However, it does not become significantly higher as m becomes bigger: the time cost under different m values is comparable. The time cost of non-federated learning with the number of dimensions d = 512 is close to that of federated learning with d = 32; Feddy's time cost is a small constant multiple of that of the non-federated GNN.

Masking at the user side and aggregation at the parameter server side are computationally expensive; however, the masking and the aggregation for all weights can be computed in parallel. Therefore, we implemented a parallel program for the secure aggregation algorithms, where masking and aggregation are parallelized, and performed a simulation to measure the computation costs. COTS smart cameras are equipped with quad-core processors (e.g., NEON-1040 by ADLINK), so we performed the simulation with 4 threads and measured the end-to-end elapsed time for each user. We repeated each simulation 50 times and report the average. All parameters are chosen such that we have 112-bit security as recommended by NIST [3] (the bitwidth of the p_ν's is 118 bits, the bitwidth of N is 2048 bits, etc.).

Figure 3: Computation and communication costs of secure aggregation per user: (a) key generation, (b) masking, (c) aggregation, (d) communication.

The cost of key generation (Figure 3(a)) is negligible. Although it grows linearly w.r.t. the number of users, key generation is a one-time process, so its overhead is negligible. The masking cost (Figure 3(b)), however, is not negligible. It grows linearly with the number of users and quadratically with the dimension. The size of the keys grows linearly w.r.t. the number of users (since p_ν = Σ_{μ∈U} p_{ν,μ}) up to the modulus, so the costs will not grow after the number of users reaches a certain point. However, the quadratic growth is unavoidable because the number of parameters that need to be masked and shared is quadratic w.r.t. the dimension.
Still, the costs of masking are acceptable because the aggregation via Feddy occurs only once every 10 epochs, and the cost of the masking is comparable to the total cost of the GNN training during those epochs. This is a tradeoff one needs to make to achieve strong, provable data security. The aggregation performed by the parameter server (Figure 3(c)) is negligible, since the aggregation occurs once every 10 epochs; its cost grows linearly with the number of users, but reasonably strong servers can handle such computation. Finally, we present the total size of the messages one user needs to receive from all other users (Figure 3(d)). There are non-negligible communication costs caused by the masking, and this is also the tradeoff for the sake of provable security.

Dynamic/evolutionary GNN.
GNN models have been developed to learn static graph data [18, 34, 13, 17, 39, 40]. Dynamic graph embeddings are expected to preserve specific structures such as triadic closure processes [47], attribute-value dynamics [24], continuous-time patterns [33], interaction trajectories [23], and out-of-sample nodes [28]. Graph convolutional networks are equipped with self-attention mechanisms [38], Markov mechanisms [36], or recurrent models [35] for dynamic graph learning [29]. We focus on modeling spatial and dynamic information in video graph sequences.
Federated machine learning.
As clients have become more powerful and communication-efficient, developing deep networks for decentralized data has attracted a lot of research [30, 22]. Federated optimization algorithms can distribute optimization beyond datacenters [21]. Bonawitz et al. proposed a new system design to implement federated learning at scale [6, 48]. Yang et al. presented concepts (forming a new ontology) and methods for employing federated learning in machine learning applications [44].

Secure aggregation on time-series data.

Bonawitz et al. developed practical secure aggregation for federated learning on user-held data [4] and for privacy-preserving machine learning [5]. However, those methods cannot be directly applied to time-series or sequential data. Existing work on privacy-preserving aggregation of time-series data needs semantic analysis on dynamic user groups using state-of-the-art machine learning [41, 19, 20].
Conclusions

We presented Federated Dynamic Graph Neural Network (Feddy), a distributed and secure framework to learn object representations from multi-user graph sequences. It aggregates both spatial and dynamic information and uses a self-supervised loss of predicting the trajectories of objects. It is trained in a federated learning manner: the centrally located server sends the model to user devices, and local models on the respective user devices learn and periodically send their learning to the central server without ever exposing the user's data to the server. We design secure aggregation primitives that protect security and privacy in federated learning with scalability. Experiments on four real-world video camera datasets demonstrated that Feddy achieves great effectiveness and security.
Broader impacts
This paper presents provably secure federated learning based on a novel secure aggregation scheme. Specifically, it addresses the data security issues in federated learning for GNNs, a promising deep learning framework for time-series video datasets. Considering the sensitivity of the video data collected by surveillance cameras, the broader impacts of this paper lie in the enhanced data security, which leads to enhanced cybersecurity and individual privacy.

The increased computation and communication costs may offset these positive impacts; however, there is active research in applied cryptography on accelerating cryptographic primitives, so the negative impacts of the increased overhead will be mitigated over time.
References

[1] Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318, 2016.
[2] Prabhanjan Ananth and Abhishek Jain. Indistinguishability obfuscation from compact functional encryption. In Annual Cryptology Conference, pages 308–326. Springer, 2015.
[3] Elaine Barker and Allen Roginsky. Transitioning the use of cryptographic algorithms and key lengths. Technical report, National Institute of Standards and Technology, 2018.
[4] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for federated learning on user-held data. In NIPS Workshop on Private Multi-Party Machine Learning, 2016.
[5] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175–1191, 2017.
[6] Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konecny, Stefano Mazzocchi, H. Brendan McMahan, et al. Towards federated learning at scale: System design. arXiv preprint arXiv:1902.01046, 2019.
[7] Dan Boneh. The decision Diffie-Hellman problem. In International Algorithmic Number Theory Symposium, pages 48–63. Springer, 1998.
[8] Elette Boyle, Niv Gilboa, and Yuval Ishai. Group-based secure computation: Optimizing rounds, communication, and computation. In Eurocrypt, pages 163–193. Springer, 2017.
[9] Michael Bramberger, Andreas Doblander, Arnold Maier, Bernhard Rinner, and Helmut Schwabach. Distributed embedded smart cameras for surveillance applications. Computer, 39(2):68–75, 2006.
[10] Jianguo Chen, Kenli Li, Qingying Deng, Keqin Li, and S. Yu Philip. Distributed deep learning model for intelligent video surveillance systems with edge computing. IEEE Transactions on Industrial Informatics, 2019.
[11] Jean-Sébastien Coron. On the exact security of full domain hash. In Annual International Cryptology Conference, pages 229–235. Springer, 2000.
[12] Ivan Damgård and Gert Læssøe Mikkelsen. Efficient, robust and constant-round distributed RSA key generation. In Theory of Cryptography Conference, pages 183–200. Springer, 2010.
[13] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.
[14] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1322–1333, 2015.
[15] S. Dov Gordon, Feng-Hao Liu, and Elaine Shi. Constant-round MPC with fairness and guarantee of output delivery. In CRYPTO, pages 63–82. Springer, 2015.
[16] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017.
[17] William L. Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584, 2017.
[18] Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163, 2015.
[19] Marc Joye and Benoît Libert. A scalable scheme for privacy-preserving aggregation of time-series data. In International Conference on Financial Cryptography and Data Security, pages 111–125. Springer, 2013.
[20] Taeho Jung, Junze Han, and Xiang-Yang Li. PDA: semantically secure time-series data analytics with dynamic user groups. IEEE Transactions on Dependable and Secure Computing, 15(2):260–274, 2016.
[21] Jakub Konečný, Brendan McMahan, and Daniel Ramage. Federated optimization: Distributed optimization beyond the datacenter. arXiv preprint arXiv:1511.03575, 2015.
[22] Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016.
[23] Srijan Kumar, Xikun Zhang, and Jure Leskovec. Predicting dynamic embedding trajectory in temporal interaction networks. In ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1269–1278. ACM, 2019.
[24] Jundong Li, Harsh Dani, Xia Hu, Jiliang Tang, Yi Chang, and Huan Liu. Attributed network embedding for learning in a dynamic environment. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 387–396. ACM, 2017.
[25] Ruiyu Li, Makarand Tapaswi, Renjie Liao, Jiaya Jia, Raquel Urtasun, and Sanja Fidler. Situation recognition with graph neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 4173–4182, 2017.
[26] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926, 2017.
[27] Yehuda Lindell. How to simulate it – a tutorial on the simulation proof technique. In Tutorials on the Foundations of Cryptography, pages 277–346. Springer, 2017.
[28] Jianxin Ma, Peng Cui, and Wenwu Zhu. DepthLGP: learning embeddings of out-of-sample nodes in dynamic networks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[29] Franco Manessi, Alessandro Rozza, and Mario Manzo. Dynamic graph convolutional networks. arXiv preprint arXiv:1704.06199, 2017.
[30] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, et al. Communication-efficient learning of deep networks from decentralized data. arXiv preprint arXiv:1602.05629, 2016.
[31] Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. Exploiting unintended feature leakage in collaborative learning. In IEEE Symposium on Security and Privacy (SP), pages 691–706. IEEE, 2019.
[32] Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In IEEE Symposium on Security and Privacy (SP), pages 739–753. IEEE, 2019.
[33] Giang Hoang Nguyen, John Boaz Lee, Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, and Sungchul Kim. Continuous-time dynamic network embeddings. In Companion of The Web Conference 2018, pages 969–976. International World Wide Web Conferences Steering Committee, 2018.
[34] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pages 2014–2023, 2016.
[35] Aldo Pareja, Giacomo Domeniconi, Jie Chen, Tengfei Ma, Toyotaro Suzumura, Hiroki Kanezashi, Tim Kaler, and Charles E. Leiserson. EvolveGCN: Evolving graph convolutional networks for dynamic graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
[36] Meng Qu, Yoshua Bengio, and Jian Tang. GMNN: Graph Markov neural networks. In Proceedings of the 36th International Conference on Machine Learning, 2019.
[37] Alexandre Robicquet, Amir Sadeghian, Alexandre Alahi, and Silvio Savarese. Learning social etiquette: Human trajectory understanding in crowded scenes. In European Conference on Computer Vision, pages 549–565. Springer, 2016.
[38] Aravind Sankar, Yanhong Wu, Liang Gou, Wei Zhang, and Hao Yang. Dynamic graph representation learning via self-attention networks. arXiv preprint arXiv:1812.09430, 2018.
[39] Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607. Springer, 2018.
[40] Youngjoo Seo, Michaël Defferrard, Pierre Vandergheynst, and Xavier Bresson. Structured sequence modeling with graph convolutional recurrent networks. In International Conference on Neural Information Processing, pages 362–373. Springer, 2018.
[41] Elaine Shi, T.-H. Hubert Chan, Eleanor Rieffel, Richard Chow, and Dawn Song. Privacy-preserving aggregation of time-series data. In Proc. NDSS, volume 2, pages 1–17. Citeseer, 2011.
[42] Stacey Truex, Ling Liu, Mehmet Emre Gursoy, Lei Yu, and Wenqi Wei. Demystifying membership inference attacks in machine learning as a service. IEEE Transactions on Services Computing, 2019.
[43] Maria Valera and Sergio A. Velastin. Intelligent distributed surveillance systems: a review. IEE Proceedings - Vision, Image and Signal Processing, 152(2):192–204, 2005.
[44] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2):1–19, 2019.
[45] Mehran Yazdi and Thierry Bouwmans. New trends on moving object detection in video images captured by a moving camera: A survey. Computer Science Review, 28:157–177, 2018.
[46] Bing Yu, Haoteng Yin, and Zhanxing Zhu. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875, 2017.
[47] Lekui Zhou, Yang Yang, Xiang Ren, Fei Wu, and Yueting Zhuang. Dynamic network embedding by modeling triadic closure process. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[48] Hangyu Zhu and Yaochu Jin. Multi-objective evolutionary federated learning. IEEE Transactions on Neural Networks and Learning Systems, 2019.
A Correctness of Feddy
We rigorously prove that the secure aggregation employed in Feddy lets the parameter server correctly compute the aggregated values.
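Theorem 1 below can also be checked numerically. The sketch that follows is a minimal illustrative implementation of the masking $y_\nu = (1+N)^{x_\nu} H(t)^{p_\nu} \bmod N^2$ and the aggregation $((\prod_\nu y_\nu - 1) \bmod N^2)/N \bmod N$; the toy parameter sizes are assumptions for illustration only (a real deployment uses a 2048-bit $N$ and a proper construction of $H$, per the paper).

```python
import random

def keygen(num_users, N):
    # Per-user secret keys p_v with sum(p_v) = 0 (mod N), as Theorem 1 requires.
    keys = [random.randrange(N) for _ in range(num_users - 1)]
    keys.append(-sum(keys) % N)
    return keys

def H(t, r, phi_N, N2):
    # H(t) := (r^phi(N))^t mod N^2, an element of the order-N subgroup G_H.
    return pow(r, phi_N * t, N2)

def mask(x, p, t, r, phi_N, N):
    # y_v = (1+N)^{x_v} * H(t)^{p_v} mod N^2
    N2 = N * N
    return pow(1 + N, x, N2) * pow(H(t, r, phi_N, N2), p, N2) % N2

def aggregate(ys, N):
    # ((prod y_v - 1) mod N^2) / N mod N = sum x_v, provided sum x_v < N.
    N2 = N * N
    prod = 1
    for y in ys:
        prod = prod * y % N2
    return ((prod - 1) % N2) // N % N
```

With toy primes p, q = 10007, 10009 (far too small for security), masking inputs 3, 14, 27 under keys that sum to zero mod N and then aggregating recovers the sum 44 without the server ever seeing an individual value.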
Theorem 1.
Suppose $\sum_{\nu \in U} p_\nu = 0 \pmod{N}$ and $H$ is defined as $H : T \to G_H$, where $G_H$ is a multiplicative cyclic group of order $N$ with multiplication modulo $N^2$ being the group operator. Then, we have:
$$\frac{\big( (\prod_{\nu \in U} y_\nu) - 1 \big) \bmod N^2}{N} \bmod N = \sum_{\nu \in U} x_\nu.$$
Note that the $H$ above can be easily constructed as $H(t) := (r^{\phi(N)})^t$ if $r^{\phi(N)}$ is included in the system-wide parameters. This can be done securely either by employing a crypto server who generates the system-wide parameters and destroys all secret values [20], or by employing a secure multi-party computation protocol that generates RSA keys [12]. We adopt the former approach in this paper.

Proof.
We start by simplifying $\frac{((\prod_{\nu \in U} y_\nu) - 1) \bmod N^2}{N}$:
\begin{align*}
\frac{\big((\prod_{\nu \in U} y_\nu) - 1\big) \bmod N^2}{N}
&= \frac{\big((\prod_{\nu \in U} (1+N)^{x_\nu} H(t)^{p_\nu}) - 1\big) \bmod N^2}{N} \\
&= \frac{\big((1+N)^{\sum_{\nu \in U} x_\nu} H(t)^{\sum_{\nu \in U} p_\nu} - 1\big) \bmod N^2}{N} \\
&= \frac{\big((1 + N\sum_{\nu \in U} x_\nu) H(t)^{\sum_{\nu \in U} p_\nu} - 1\big) \bmod N^2}{N} && \text{(due to the binomial theorem)} \\
&= \frac{\big((1 + N\sum_{\nu \in U} x_\nu) - 1\big) \bmod N^2}{N} && \text{(because the order of the group $G_H$ is $N$)} \\
&= \frac{N\sum_{\nu \in U} x_\nu \bmod N^2}{N} \\
&= \frac{N\sum_{\nu \in U} x_\nu - kN^2}{N} \quad \text{for some integer } k \\
&= \sum_{\nu \in U} x_\nu - kN \quad \text{for some integer } k
\end{align*}
Then, it follows that $\frac{((\prod_{\nu \in U} y_\nu) - 1) \bmod N^2}{N} \bmod N = \sum_{\nu \in U} x_\nu$. Note that $N$ is much larger than the $x_\nu$'s, so $\sum_{\nu \in U} x_\nu$ will not likely exceed the modulus $N$. For example, in our implementation, $N$'s bitwidth is 2048 bits while the $x_\nu$'s bitwidths are less than 100 bits. In such a parameter setting, we can add at least $2^{1948}$ $x_\nu$'s without exceeding the modulus $N$. Therefore, in practical settings where we are dealing with gradients of the neural networks, we do not need to worry about the result being incorrect due to modulus overflow issues.

Then, since $\sum_{\nu \in U} x_\nu$ can be calculated correctly with Feddy, the parameter server can map the integer sum to the real-number sum, which yields the aggregated gradients. Recall that the approximation error caused by integer-real approximation with the fixed-point representation is bounded above by $2^{-(e+1)}|U|$ as described in Section 4.

B Security of Feddy
Here we present a privacy analysis of the framework inspired by [20]. In general, this uses standard techniques for proving indistinguishability (IND-CPA) [19, 2].
B.1 Adversary Model
Note that due to privacy concerns, the result of the analytical computation should only be given to the aggregator (parameter server), and any user's data should be kept secret from anyone else but the owner unless it is deducible from the aggregated value. Additionally, both users and the aggregator are assumed to be semi-honest adaptive adversaries. Informally, these adversaries will follow the protocol specifications correctly, but they may perform extra computation to try to infer others' private values (i.e., semi-honest), and the computation they perform can be based on their historical observation (i.e., adaptive). For our scenario, if users tamper with the protocol (i.e., do not follow the protocol specifications correctly), it is highly likely that the aggregator will detect it since the outcome of the protocol will not be in a valid range due to the cryptographic operations on large integers. However, the aggregator is interested in recovering the correct result, so they will not be motivated to maliciously tamper with the protocol. Note that users may report a value with a small deviation such that the analytic result still appears reasonable for many reasons (by mistake, etc.), but evaluating the reliability of the reported value is beyond the scope of this paper. Also note that adversaries are adaptive in the sense that they may produce their public values adaptively after seeing others' public values. We assume all the communication channels are open to anyone (i.e., as a result, anyone can overhear/synthesize any message).
B.2 Security Definition
To formally define the security of the framework, we present a precise definition of Feddy:
Definition B.1.
Our Federated Dynamic Graph Neural Network (Feddy) is the collection of the following four polynomial time algorithms: Setup, KeyGen, Mask, and Aggregate.

Setup$(1^\kappa) \to params$ is a probabilistic setup algorithm run by a crypto server (which is different from the parameter server) to generate, given a security parameter $\kappa$ as input, the system-wide public parameters $params$ that define the integer groups/rings the protocols will be operated on.

KeyGen$(params) \to \{EK_\nu\}_\nu$ is a probabilistic and distributed algorithm jointly run by the users. Each user $\nu \in U$ will secretly receive his own secret key $EK_\nu$.

Mask$(x_\nu, EK_\nu, T) \to y_\nu = (1+N)^{x_\nu} H(t)^{p_\nu} \bmod N^2$ is a deterministic algorithm run by each user $\nu$ to mask his private value $x_\nu$ into the masked value $y_\nu$ using $t \in T$. The output $y_\nu$ is published in an insecure channel.

Aggregate$(\{y_\nu \mid \nu \in U\}) \to \sum_{\nu \in U} x_\nu$ is run by the aggregator to aggregate all the masked values $y_\nu$'s to calculate the sum over $\{x_\nu\}_{\nu \in U}$.

The security of the aggregation is formally defined via a data publishing game (Figure 4), similar to [20]. We define the security of the masking in Feddy as follows:

Setup:) Three disjoint time domains are chosen: $T_1$ for Phase 1, $T_2$ for Phase 2, and $T_c$ for the challenge phase.

Init:)
The adversary declares their role in the scheme (i.e., aggregator or user), and the challenger controls the remaining users. The users engage in the key generation.
Phase 1 in $T_1$:) The adversary submits polynomially many queries to the masking oracle (note a) and receives the masked values for any $x$, any time window $T \subseteq T_1$, and any user in $U$ including those that are not adversaries. If the declared time windows do not overlap with each other, the masking oracle returns all masked values to the adversary; otherwise, adversaries receive nothing.

Challenge in $T_c$:) The adversary declares the target time window $T_c$. Then, they submit two sets of values $\{x_{\nu,0}\}, \{x_{\nu,1}\}$, such that $\sum_{\nu \in U} x_{\nu,0} = \sum_{\nu \in U} x_{\nu,1}$, to the challenger. The challenger flips a fair binary coin $b$ and generates the masked values $\{y_{\nu,b}\}$ based on $x_{\nu,b}$, which are given to the adversary.

Phase 2 in $T_2$:) Phase 1 is repeated adaptively, but the time window $T$ should be a subset of $T_2$.

Guess:)
The adversary gives a guess $b'$ on $b$. The advantage of the adversary in this game is defined as $adv_A = |\Pr[b' = b] - \frac{1}{2}|$.

Note a: The masking oracle's role is to return the masked values when $x$, $t$, and $\nu$ are given.

Figure 4: Data Publishing Game
Definition B.2.
The random masking in Feddy is indistinguishable against the chosen-plaintext attack (IND-CPA) if all polynomial time adversaries' advantages in the game are a negligible function w.r.t. the security parameter $\kappa$ when $T_1$, $T_2$, and $T_c$ are three disjoint time domains.

Next, we present a standard simulation-based definition of security [27, 8, 15] that is achieved by our Feddy protocol. Note that Feddy is not an encryption scheme, and although we leverage Definition B.2 to prove security later, techniques such as proving IND-CCA or IND-CPA alone do not directly demonstrate the security of the entire protocol. Informally speaking, the Feddy scheme is private if adversaries do not gain more information than the input that they control, the output, and what can be inferred from each of them. We define the security of Feddy as follows:
Definition B.3.
The aggregation scheme Feddy for a class of summation functions $F$ is said to be private for $F$ against semi-honest adversaries if for any $f \in F$ and for any probabilistic polynomial time adversary $\mathcal{A}$ controlling a subset $A$ of all players, there exists a probabilistic polynomial-time simulator $\mathcal{S}$ such that for any set of inputs $X := (x_1, \cdots, x_n)$ in the domain of $f$ where the $i$-th player $P_i$ controls $x_i$,
$$\{\mathcal{S}(f(X), A, \{x_j \mid P_j \in A\})\}_\kappa \overset{c}{\equiv} \{View^{Feddy}_A(X)\}_\kappa$$
where $\overset{c}{\equiv}$ refers to computational indistinguishability, $\kappa$ is the security parameter, and $View^{Feddy}_A(X)$ represents the messages received by members of $A$ during execution of protocol Feddy.

B.3 Security Proof
Before we present the security proofs, we present the computational problems and the hardness assumptions Feddy relies on.
Definition B.4.
The Decisional Diffie-Hellman (DDH) problem in a group $G$ with generator $g$ is to decide whether $g^c = g^{ab}$ given a triple $(g^a, g^b, g^c)$, where $a, b, c \in \mathbb{Z}$. An algorithm $\mathcal{A}$'s advantage in solving the DDH problem is defined as
$$adv^{DDH}_{\mathcal{A},G} = \Big| \Pr\big[1 \leftarrow \mathcal{A}(g^a, g^b, g^{ab} \in G)\big] - \Pr\big[1 \leftarrow \mathcal{A}(g^a, g^b, g^c \in G) \,\big|\, c \leftarrow_R \mathbb{Z}\big] \Big|$$
where $1 \leftarrow \mathcal{A}(\cdot)$ if the algorithm outputs 'yes' and 0 otherwise, and the probabilities are taken over the uniform random selection $c \leftarrow_R \mathbb{Z}$ as well as the random bits of $\mathcal{A}$.

Definition B.5. The Decisional Composite Residuosity (DCR) problem in $\mathbb{Z}^*_{N^2}$ is to decide whether a given element $x \in \mathbb{Z}^*_{N^2}$ is an $N$-th residue modulo $N^2$ or not.

The DDH problem in $\mathbb{Z}^*_{N^2}$ and the DCR problem are widely believed to be intractable [20, 19, 7]. We will prove the security of Feddy by proving the following theorem.

Theorem 2.
With the assumptions that the DDH problem is hard in $\mathbb{Z}^*_{N^2}$ and that the DCR problem is hard, the random masking in our Feddy scheme is indistinguishable against chosen-plaintext attacks (IND-CPA) under the random oracle model. Namely, for any PPTA $\mathcal{A}$, its advantage $adv_A$ in the data publishing game is bounded as follows:
$$adv_A \le \frac{e(q_c+1)^2}{q_c} \cdot adv^{DDH}_A$$
where $e$ is the base of the natural logarithm, $q_c$ is the number of adversaries' queries submitted to the masking oracle, and $adv^{DDH}_A$ is the advantage in solving the DDH problem. Note that $adv^{DDH}_A$ is negligibly small since the DDH problem is widely believed to be hard. We prove the theorem by adapting the proofs from [20, 19].

Proof.
To prove the theorem, we present three games Game 1, Game 2, and Game 3, in which we use $\mathcal{A}$ and $\mathcal{B}$ to denote the adversary and the challenger. For each $l \in \{1, 2, 3\}$, we denote $E_l$ as the event that $\mathcal{B}$ outputs 1 in Game $l$, and we define $adv_l = |\Pr[E_l] - \frac{1}{2}|$.

Game 1:)
This game is exactly identical to the earlier data publishing game. $\mathcal{A}$'s masking queries $(T, \{x_\nu\}_\nu)$ are answered by returning the masked values $\{y_\nu\}_\nu$. In the challenge phase, the adversary $\mathcal{A}$ declares the associated time window $T_c$ and two sets of values $\{x_{\nu,0}\}, \{x_{\nu,1}\}$ which satisfy $\sum_{\nu \in U} x_{\nu,0} = \sum_{\nu \in U} x_{\nu,1}$. Then, the challenger $\mathcal{B}$ returns the corresponding masked values to the adversary $\mathcal{A}$. When the game terminates, $\mathcal{B}$ outputs 1 if $b' = b$ and 0 otherwise. By definition, $adv_1 = |\Pr[E_1] - \frac{1}{2}| = adv_A$.

Game 2:)
In Game 2, the adversary $\mathcal{A}$ and the challenger $\mathcal{B}$ repeat the same operations as in Game 1 using the same time windows of those operations. However, for each masking query in Game 1 at time $t \in T_1 \cup T_2$, the challenger $\mathcal{B}$ flips a biased binary coin $\mu_T$ for the entire time window $T$ which takes 1 with probability $\frac{1}{q_c+1}$ and 0 with probability $\frac{q_c}{q_c+1}$. When Game 2 terminates, $\mathcal{B}$ checks whether any $\mu_T = 1$. If there is any, $\mathcal{B}$ outputs a random bit. Otherwise, $\mathcal{B}$ outputs 1 if $b' = b$ and 0 if $b' \ne b$. If we denote $F$ as the event that $\mu_{T_f} = 1$ for any $T_f$, the analysis in [11] shows that $\Pr[\bar{F}] = \frac{1}{e(q_c+1)}$. According to [19], Game 1 to Game 2 is a transition based on a failure event of large probability, and therefore we have $adv_2 = adv_1 \Pr[\bar{F}] = \frac{adv_1}{e(q_c+1)}$.

Game 3:)
In this game, the adversary $\mathcal{A}$ and the challenger $\mathcal{B}$ repeat the same operations as in Game 1 using the same time windows of those operations. However, there is a change in the answers to the masking queries $(T, \{x_\nu\}_\nu)$. The oracle will respond to the query with the following masked values:
$$\forall \nu: \quad y_\nu = \begin{cases} (1+N)^{x_\nu} H(t)^{p_\nu} & \mu_T = 1 \\ (1+N)^{x_\nu} (H(t)^s)^{p_\nu} & \mu_T = 0 \end{cases}$$
where $s$ is a uniformly randomly chosen element from $\mathbb{Z}_N$ that is fixed for the same aggregation. When Game 3 terminates, $\mathcal{B}$ outputs 1 if $b' = b$ and 0 otherwise. Due to Lemma 8 from [20], distinguishing Game 3 from Game 2 is at least as hard as a DDH problem in $\mathbb{Z}^*_{N^2}$ for any adversary $\mathcal{A}$ in Game 2 and Game 3. It follows then: $|\Pr[E_2] - \Pr[E_3]| \le adv^{DDH}_A$. The answers to masking queries in Game 3 are identical to those of Game 2 with probability $\frac{1}{q_c+1}$ and different from those of Game 2 with probability $\frac{q_c}{q_c+1}$. In the latter case, due to the random element $s$, $y_\nu$ is uniformly distributed in the subgroup $\langle H(t) \rangle$, which completely blinds $(1+N)^{x_\nu}$, and $\mathcal{A}$ can only randomly guess $b'$ unless he can solve the DCR problem with non-negligible advantage (which is false under our assumption that the DCR problem is hard). Then, $b' = b$ with probability $1/2$, and the total probability $\Pr[E_3] = \frac{\Pr[E_2]}{q_c+1} + \frac{q_c}{2(q_c+1)}$. Then, we have:
$$|\Pr[E_2] - \Pr[E_3]| = \Big| (\Pr[E_2] - 1/2) \cdot \frac{q_c}{q_c+1} \Big| = adv_2 \cdot \frac{q_c}{q_c+1} = \frac{adv_A \cdot q_c}{e(q_c+1)^2} \le adv^{DDH}_A$$
thus completing the proof.

With the above, we are now ready to prove the security of Feddy in the well known simulation security model [27] in the following theorem:
Theorem 3.
Assuming $|C| = O(\log \kappa)$, all parties properly instantiate and use the Feddy algorithms in Definition B.1, and the random masking in Definition B.2 is used, our scheme Feddy privately computes according to Definition B.3.
Proof.
Without loss of generality, assume the adversary controls the first $m < n$ variables, where $n$ is the total number of all variables in the scheme (i.e., each person $j$ participating in the protocol inputs a collection of private variables $(x_{j,1}, \cdots, x_{j,r_j})$ where user $j$ controls $r_j$ variables, and the total number of variables for all participants sums to $n$). We show that a probabilistic polynomial time simulator can generate an entire simulated view, given $z = f(x_1, \cdots, x_m, x_{m+1}, \cdots, x_n)$ and $x_1, \cdots, x_m$, that is indistinguishable to an adversary from the view the adversary sees in a real execution of the protocol. Note the simulator is able to find $x'_{m+1}, \cdots, x'_n$ such that $z = f(x'_1, \cdots, x'_m, x'_{m+1}, \cdots, x'_n)$ in polynomial time since $|C| = O(\log \kappa)$. Besides this, the adversary follows the protocol as described in Definition B.1, pretending to be honest. Note that in any world (i.e., Ideal or Real), the adversary can only compute the function over the fixed values submitted by the honest users, because the adversary can only access the fully aggregated set of submitted values. Individual values submitted by the honest players are secure by the IND-CPA property of the underlying masking scheme as shown in the proof of Theorem 2. Now the simulator $\mathcal{S}$ generates a view indistinguishable from that of a real execution, since all parameters broadcast, i.e., $x'_\nu$ masked by $p'_\nu$, are indistinguishable from the corresponding ones in the real protocol, i.e., $x_\nu$ masked by $p_\nu$, as they are generated identically and have the exact same distribution, i.e., a uniformly random distribution. More specifically, all masked values are indistinguishable in both worlds, as both sets of masks are generated at random and will have the same random distribution; i.e., the masked values in both worlds will be indistinguishable from random, and thus indistinguishable from each other. Recall that the aggregator does not send the result to users (note that if the users tamper with their submissions the aggregator will likely detect it since the outcome of the protocol will not be in a valid range; the aggregator can abort if necessary depending on the use case of the protocol). Similarly, masked values sent to the aggregator in both worlds will also be indistinguishable by the assumption that the users correctly follow the protocol, which utilizes IND-CPA secure masking, and the fact that we chose inputs that give the same output. By the security of the DCR and DDH problems, no information can be gained by an adversary intercepting messages. This demonstrates that for the class of functions $F$, our protocol is secure against adversaries $A$ since:
$$\{\mathcal{S}(f(X), A, \{x_j \mid P_j \in A\})\}_\kappa \overset{c}{\equiv} \{View^{Feddy}_A(X)\}_\kappa$$