A Federated Multi-View Deep Learning Framework for Privacy-Preserving Recommendations
Mingkai Huang, Hao Li, Bing Bai, Chang Wang, Kun Bai, Fei Wang
Mingkai Huang*, Hao Li*, Bing Bai*, Chang Wang*, Kun Bai*, Fei Wang†
*Tencent Inc., †Cornell University
Email: *{mingkhuang, leehaoli, icebai, coracwang, kunbai}@tencent.com, †[email protected]

Abstract—Privacy-preserving recommendations are recently gaining momentum, since decentralized user data is increasingly harder for recommendation service providers to collect, due to serious concerns over user privacy and data security. This situation is further exacerbated by strict government regulations such as Europe's General Data Protection Regulation (GDPR). Federated Learning (FL) is a newly developed privacy-preserving machine learning paradigm that bridges data repositories without compromising data security and privacy. Many federated recommendation (FedRec) algorithms have thus been proposed to realize personalized privacy-preserving recommendations. However, existing FedRec algorithms, mostly extended from the traditional collaborative filtering (CF) method, cannot address the cold-start problem well. In addition, their performance overhead w.r.t. model accuracy, when trained in a federated setting, is often non-negligible compared to centralized recommendations. This paper studies this issue and presents FL-MV-DSSM, a generic content-based federated multi-view recommendation framework that not only addresses the cold-start problem, but also significantly boosts recommendation performance by learning a federated model from multiple data sources to capture richer user-level features. The new federated multi-view setting, proposed by FL-MV-DSSM, opens new usage models and brings new security challenges to FL in recommendation scenarios. We prove the security guarantees of FL-MV-DSSM, and empirical evaluations of FL-MV-DSSM and its variations on public datasets demonstrate its effectiveness. Our code will be released if this paper is accepted.
Index Terms—Federated Learning, Privacy-Preserving Recommendation, Multi-View, Federated Multi-View
I. INTRODUCTION
Privacy-preserving recommendation, motivated by increasing interest in user privacy, data security, and strict government regulations like Europe's GDPR [25], is recently gaining momentum. Federated Learning (FL) has been recognized as one of the effective privacy-preserving machine learning paradigms for bridging data repositories while respecting data privacy [26], thanks to its decentralized collaborative learning process that never exposes any FL participant's local raw data. Therefore, the combination of recommendation and FL has received widespread attention, leading to many federated recommendation (FedRec) algorithms.

Existing FedRec algorithms are mostly derived from collaborative filtering (CF) [3, 23], which privately uses a user's previous interaction history to predict the most relevant items for recommendation. The performance overhead of CF-based FedRec [2, 7] is observable but acceptable compared to traditional CF. In addition, inherited from CF, CF-based FedRec suffers from the cold-start problem [24]. This drawback motivated an initial attempt at content-based FedRec, FedNewsRec [21], which simply applies FL's FedAvg [19] to a deep learning model designed specifically for news recommendation and is therefore hard to generalize to other FedRec scenarios. The limitations of existing FedRec thus motivate us to take further steps on addressing both the cold-start problem and recommendation performance.

To achieve the goals mentioned above, in this work we propose FL-MV-DSSM, a generic content-based federated multi-view recommendation framework. First, by transforming a generic deep learning model, the Deep Structured Semantic Model (DSSM [15]), which maps users and items to a shared semantic space for content-based recommendation, into a federated setting, FL-MV-DSSM is able to handle existing FedRec's cold-start problem. Then, by designing a novel approach for FL-MV-DSSM to learn a federated model from multiple data sources that captures richer user-level features, FL-MV-DSSM greatly boosts its recommendation performance. Moreover, FL-MV-DSSM presents a new federated multi-view setting, which potentially opens new usage models, e.g., jointly learning a federated model using data from different mobile phone Apps (e.g., gaming Apps and mobile App markets, for accurate mobile game recommendations), and inevitably brings in new challenges, e.g., preventing data leakage among different phone Apps. We address these challenges in this paper.

The contribution of this paper is threefold. First, to the best of our knowledge, we present the first generic content-based federated multi-view framework, along with several algorithms, that addresses the cold-start problem and recommendation performance simultaneously. Second, we extend the vanilla FedAvg [4, 19] from the traditional federated setting to a new federated multi-view setting, and correspondingly present a novel approach for securely learning and aggregating multiple local models that share a single model. Third, we carefully study the challenges of the new federated multi-view setting and present a solution that guarantees its security requirements. Empirical evaluations of FL-MV-DSSM and its variations on public datasets demonstrate the effectiveness of our framework.

II. RELATED WORK
Recommendation Systems: In general, traditional recommendation can be divided into CF-based recommendation [3, 23] and content-based recommendation [15, 17, 18]. CF relies on a user's considerable history data before it can predict the most relevant items for recommendation, and thus suffers from the cold-start problem [24]. Content-based recommendation [15, 17, 18], on the other hand, relies on similarity computed from latent features, learned from user and item information, to recommend items for users; it therefore handles the cold-start problem better than CF, but may fail to deliver high-quality recommendations when sufficient user and item information is difficult to acquire [10]. In addition, traditional recommendation relies on centralized data for training and prediction.

Federated Learning (FL): FL is a newly developed privacy-preserving machine learning paradigm for bridging data repositories without compromising data security and privacy [26]. By transmitting only summative information about the training process (gradients [19], dot products [13], etc.), rather than collecting raw user data, FL is in essence a distributed, decentralized collaborative machine learning framework that never exposes an FL participant's raw data. In practice, to prevent sensitive information leakage through the summative information itself [27], FL is further secured by privacy-preserving mechanisms such as DP [8], HE [22], MPC [12], and so on.
Federated Recommendation Algorithms (FedRec): Similar to traditional recommendation, existing FedRec algorithms can be mainly divided into CF-based, e.g., FCF [2] and FedMF [7], and content-based, e.g., FedNewsRec [21]. Both FCF and FedMF locally update the user matrix on FL participants and globally aggregate the item matrix, through item gradients, on the FL server. The difference between FCF and FedMF is that FedMF further protects item gradients with homomorphic encryption to prevent private information leakage through gradients. Like traditional CF, neither FCF nor FedMF can handle the cold-start problem [24]. Unlike existing CF-based FedRec, FL-MV-DSSM and its variations are all content-based. FedNewsRec is the first content-based FedRec; it uses complicated deep learning models designed specifically for news recommendation, and its recommendation quality mainly depends on its model design. Unlike FedNewsRec, our methods use the general DSSM as the basic model, enabling further research on performance optimization in the federated multi-view setting.
Multi-View Recommendation Algorithms: There are some multi-view recommendation algorithms, e.g., MV-DSSM [10] and FED-MVMF [11]. MV-DSSM [10] extends DSSM by utilizing information from different Apps and trains a shared user sub-model, leading to better performance on item recommendation. However, MV-DSSM was originally proposed for multi-item-view learning and requires a centralized dataset, so it cannot work in a FL setting. Moreover, our methods consider multi-user-view learning to enrich the user features when training a shared item sub-model, and they preserve data privacy between user views, since we believe the user sub-model contains private user information that should not be shared across different views. FED-MVMF extends FCF to matrix factorization with multiple data sources, and thus outperforms FCF, which uses only a single data source. Compared to FED-MVMF, our methods are content-based FedRec supporting deep learning models, which is more flexible.

III. PRELIMINARIES
This section gives our problem definition and reviews techniques related to FL-MV-DSSM.

Problem Definition. We define a federated multi-view setting, the aim of which is to learn a model over data that not only resides on $m$ distributed nodes, but also sits in $n$ isolated views on each distributed node. One possible form of a "view" is a user's mobile phone App. As a running example, consider recommending a mobile game App in an App market: not only the user's behaviors in App markets (e.g., Google Play) matter, but her behaviors in gaming Apps (e.g., Fortnite) also contribute a lot. However, such data collaboration among multiple business entities is often unachievable, due to data security issues, without privacy-preserving FL methods.

We denote the decentralized federated multi-view datasets on each distributed node, or FL client, as $D_n = \{(U_1, I), \ldots, (U_i, I), \ldots, (U_n, I)\}$, where the user view datasets $U \in \mathbb{R}^{n \times d_U}$ are generated by different views, and the item dataset $I \in \mathbb{R}^{d_I}$ is downloaded from a server, e.g., a mobile App's backend service platform. Semantic vectors can be extracted from each view-level user dataset $U_i \in \mathbb{R}^{d_{U_i}}$ and the item dataset $I$ by deep models like DSSM. Our goal is then to find a non-linear mapping $f(\cdot)$ for each view such that the sum of similarities, in the semantic space, between the mappings of all user view datasets $U$ and the item dataset $I$ is maximized on each client. The objective on each client is defined as follows:

$$\operatorname*{argmax}_{W_I, W_{U_1}, \ldots, W_{U_n}} \sum_{j=1}^{S} \frac{\exp(\gamma \cos(y_I, y_{i,j}))}{\sum_{X' \in U_i} \exp(\gamma \cos(y_I, f_i(X', W_i)))}, \qquad (1)$$

where $S$ denotes the number of positive user-item pairs $(X_{U_i,j}, X_{I_j})$, $i$ is the index of the view $U_i$ in sample $j$, $y$ denotes the mapping result of $f(\cdot)$, and $\gamma$ is the temperature parameter.
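To make Eq. (1) concrete, here is a minimal NumPy sketch of one objective term, i.e., the softmax-normalized similarity of a single positive pair within one view; the toy vectors and helper names are illustrative, not part of the paper's implementation:

import numpy as np

def cosine(a, b):
    # Cosine similarity between two semantic vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def view_objective_term(y_item, y_user_pos, y_user_all, gamma=10.0):
    """One term of Eq. (1): exp(gamma*cos(y_I, y_pos)) normalized over
    all candidate user mappings in this view."""
    num = np.exp(gamma * cosine(y_item, y_user_pos))
    den = sum(np.exp(gamma * cosine(y_item, y)) for y in y_user_all)
    return num / den

# Toy example: a 16-dim semantic space with 5 candidate user vectors.
rng = np.random.default_rng(0)
y_item = rng.normal(size=16)
y_users = [rng.normal(size=16) for _ in range(5)]
print(f"objective term: {view_objective_term(y_item, y_users[0], y_users):.4f}")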
Security Definition. In addition to the security requirements of traditional FL, FL-MV-DSSM requires extra security guarantees. In our federated multi-view setting, although all views collaboratively train a model with datasets $U$ and $I$ on each FL client, there should be no raw data interaction between views, since each dataset $U_i$ contains private view-specific information that must be protected. Moreover, each view's contribution to the item sub-model, learned from the shared local dataset $I$, should be protected as well, since a malicious view could otherwise infer an innocent view's raw data from her gradients [27] by monitoring her changes to the shared local item sub-model.

Threat Models. We consider the following threat models in our federated multi-view setting:
• [Traditional FL]: FL clients and/or the FL server are active adversaries who deviate from the FL protocol, e.g., sending incorrect and/or arbitrarily chosen messages to honest users, aborting, omitting messages, and sharing their entire view of the protocol with each other, and also with the server if the server is an active adversary.
• [Federated Multi-View]: A view can be fully malicious, which means that, as an App, it may act arbitrarily, e.g., monitoring the network interface to observe an innocent view's network traffic, making null updates to the shared local item sub-model to infer an innocent view's update, monitoring changes of the item sub-model, etc., in order to infer an innocent view's data.

In addition, we make the following assumption:
• [View-Level Isolation]: Each view's dataset $U_i$ and model $W_{U_i}$ are only accessible to the $i$-th view, so that a malicious view cannot access $U_i$ and $W_{U_i}$. The isolation can be achieved through encryption or a TEE [20].

Federated Learning.
FedAvg [19], which allows training multiple epochs locally on each FL client before aggregating a global model on the FL server, is widely used in FL settings. The aggregation process is defined as:

$$W = \sum_{k=1}^{K} \frac{m_k}{m} W_k, \qquad (2)$$

where $K$ is the number of clients, $m$ is the total number of decentralized samples, and $m_k$ is the number of samples on the $k$-th FL client.
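For illustration, a minimal NumPy sketch of the weighted aggregation in Eq. (2); representing a model as a list of weight arrays is our assumption:

import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted FedAvg aggregation (Eq. 2): W = sum_k (m_k / m) * W_k."""
    m = sum(client_sizes)
    agg = [np.zeros_like(w) for w in client_weights[0]]
    for weights_k, m_k in zip(client_weights, client_sizes):
        for agg_w, w in zip(agg, weights_k):
            agg_w += (m_k / m) * w
    return agg

# Three clients holding 10, 30, and 60 local samples.
clients = [[np.full((2, 2), v)] for v in (1.0, 2.0, 3.0)]
print(fedavg(clients, [10, 30, 60])[0])  # 0.1*1 + 0.3*2 + 0.6*3 = 2.5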
Deep Structured Semantic Models (DSSM). The DSSM [15], originally designed for web search, extracts semantic vectors from a user's query words and candidate documents via multi-layer neural networks, and then employs cosine similarity to measure the relevance between query and documents in the semantic space. In our generic federated multi-view recommendation setting, we adopt DSSM as our basic model, shown in Figure 1a, and extend it into FL-MV-DSSM. Specifically, in FL-MV-DSSM, DSSM's user query corresponds to the user feature of the $i$-th view $U_i$, and the document corresponds to the item $I$.

More formally, if we denote $x$ as the original feature vector of query words or documents, $y$ as the semantic vector, $l_i$, $i = 1, \ldots, N-1$, as the intermediate hidden layers, $W_i$ as the $i$-th weight matrix, and $b_i$ as the $i$-th bias term, DSSM's forward propagation is defined as:

$$l_1 = W_1 x, \quad l_i = f(W_i l_{i-1} + b_i),\ i = 2, \ldots, N-1, \quad y = f(W_N l_{N-1} + b_N). \qquad (3)$$

The semantic relevance score between a query $Q$ and a document $D$ is then measured as:

$$R(Q, D) = \operatorname{cosine}(y_Q, y_D) = \frac{y_Q^{\top} y_D}{\|y_Q\| \cdot \|y_D\|}, \qquad (4)$$

where $y_Q$ and $y_D$ are the semantic vectors of the query and document, respectively.

We assume that a query is relevant to the documents that are clicked on for that query, and the parameters of the DSSM, i.e., the weight matrices $W$, are optimized using this information to maximize the conditional likelihood of the clicked documents given the queries. The posterior probability of a document given a query is calculated from the semantic relevance score between them through a softmax function:

$$P(D \mid Q) = \frac{\exp(\gamma R(Q, D))}{\sum_{D' \in \mathbf{D}} \exp(\gamma R(Q, D'))}, \qquad (5)$$

where $\gamma$ is the temperature parameter in the softmax function and $\mathbf{D}$ denotes the set of candidate documents to be ranked. In practice, for each pair $\langle$query, clicked-document$\rangle$, denoted by $(Q, D^+)$ where $Q$ is a query and $D^+$ is the clicked document, we approximate $\mathbf{D}$ by including $D^+$ and $N$ randomly selected unclicked documents, denoted by $\{D_j^-;\ j = 1, \ldots, N\}$. In training, the loss function we minimize is:

$$L(\Lambda) = -\log \prod_{(Q, D^+)} P(D^+ \mid Q), \qquad (6)$$

where $\Lambda$ denotes the parameter set of the neural networks.
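The following NumPy sketch mirrors Eqs. (3)-(6) for a single (query, clicked-document) pair, assuming tanh for the activation $f$; the shapes and names are illustrative:

import numpy as np

def tower(x, Ws, bs):
    """DSSM tower (Eq. 3): l1 = W1 x; l_i = f(W_i l_{i-1} + b_i); f = tanh."""
    l = Ws[0] @ x                    # first layer carries no bias in Eq. (3)
    for W, b in zip(Ws[1:], bs):     # bs holds b_2 .. b_N
        l = np.tanh(W @ l + b)
    return l

def relevance(y_q, y_d):
    # Eq. (4): cosine similarity between query and document semantic vectors.
    return float(y_q @ y_d / (np.linalg.norm(y_q) * np.linalg.norm(y_d)))

def neg_log_likelihood(y_q, y_pos, y_negs, gamma=10.0):
    """Eqs. (5)-(6): softmax over the clicked document and unclicked ones."""
    logits = gamma * np.array([relevance(y_q, y) for y in [y_pos] + list(y_negs)])
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

# Toy usage: 3-layer towers mapping 10-dim features to an 8-dim semantic space.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=s) for s in [(32, 10), (16, 32), (8, 16)]]
bs = [np.zeros(16), np.zeros(8)]
y_q = tower(rng.normal(size=10), Ws, bs)
y_pos, y_neg = (tower(rng.normal(size=10), Ws, bs) for _ in range(2))
print(neg_log_likelihood(y_q, y_pos, [y_neg]))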
IV. FL-MV-DSSM: GENERIC MULTI-VIEW FEDERATED RECOMMENDATION FRAMEWORK
This section presents our generic multi-view federated learning framework. Specifically, for the recommendation scenario, we first introduce FL-MV-DSSM's training and prediction algorithms, and then show how FL-MV-DSSM guarantees data privacy and user privacy between different views in the federated multi-view setting. We also present some variations of FL-MV-DSSM, such as FL-DSSM and SEMI-FL-MV-DSSM, as shown in Figure 1.
A. FL-MV-DSSM Training Algorithm
We assume each FL client has $N$ views (Apps) of user-level features, denoted as $U_i$ for the $i$-th view. The $i$-th view (App) can only access the $U_i$ dataset. The item dataset $I$ is downloaded from the recommendation provider, and all views can access this local shared dataset. Regarding a FedRec task, old users are assumed to have some behavior data that can generate $y$, while new users do not have any behavior data. FL-MV-DSSM builds on the traditional FedAvg [19] algorithm, which requires the FL server to provide initial models.

Algorithm 1 shows FL-MV-DSSM's training algorithm. We assume that in FL-MV-DSSM's training phase, all FL clients are old users with behavior data w.r.t. the item dataset $I$ to generate $y$. Within each view $i$, the gradients of the user sub-model and the item sub-model are calculated from the $i$-th view's private user data $U_i$ and the local shared item data $I$. Although FL-MV-DSSM is a content-based FedRec, we empirically found that aggregating gradients of the item sub-model leads to better recommendation performance than aggregating only gradients of the user sub-model, a result also observed in CF-based FedRec [2, 7]. Thus in FL-MV-DSSM, gradients of the item sub-model are always aggregated in a FL manner, while the aggregation of user gradients is configurable via the "aggregate_user_sub-model" flag in Algorithm 1, which leads to one variation of FL-MV-DSSM, SEMI-FL-MV-DSSM. After each FL training round, both user and item sub-models are updated according to the new global gradients distributed by the FL server, in a FedAvg manner.
[Fig. 1: FL-MV-DSSM and its variations: (a) DSSM; (b) FL-DSSM; (c) FL-MV-DSSM; (d) SEMI-FL-MV-DSSM.]

The gradients of both the user and item sub-models contain view-specific information that should be protected. FL-MV-DSSM therefore provides two secure aggregation primitives, local_secure_aggregate() and remote_secure_aggregate(), to secure local and remote gradient aggregation, respectively. We discuss both primitives further in Section IV-C.
B. FL-MV-DSSM Prediction Algorithm
Algorithm 2 shows FL-MV-DSSM's prediction algorithm. For each item $x_{I_j}$, old or new, the item sub-model outputs its result $y_{I_j}$. Meanwhile, the user's output $y_U$, which is locally and securely aggregated from the user sub-models of the multiple views, is compared with all $y_{I_j}$ to determine their similarities. Based on the similarity results, FL-MV-DSSM outputs the top-$K$ items for the user, old or new.
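A compact sketch of the prediction flow of Algorithm 2, with a plain mean standing in for local_secure_aggregate(); the function and argument names are illustrative:

import numpy as np

def predict_topk(y_item_all, y_user_views, gamma=10.0, k=10):
    """Top-K recommendation per Algorithm 2: per-view softmax scores over
    all items, then averaged across views (a stand-in for the secure
    local aggregation of Section IV-C)."""
    y_items = np.stack(y_item_all)                    # (M, d) item vectors
    scores = np.zeros(len(y_items))
    for y_u in y_user_views:                          # one vector per view
        cos = y_items @ y_u / (
            np.linalg.norm(y_items, axis=1) * np.linalg.norm(y_u))
        p = np.exp(gamma * cos)
        scores += p / p.sum()                         # P(x_Ij | x_Ui)
    scores /= len(y_user_views)                       # stand-in aggregation
    return np.argsort(-scores)[:k]                    # indices of top-K items

# Toy usage: 20 items, two user views, 4-dim semantic space.
items = [np.ones(4) + i for i in range(20)]
views = [np.ones(4), np.arange(4.0) + 1.0]
print(predict_topk(items, views, k=5))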
C. Secure Primitives for Privacy Protection

To defend against the threat models defined in Section III, FL-MV-DSSM mainly adopts two secure primitives, local_secure_aggregate() and remote_secure_aggregate(), used in Algorithms 1 and 2. The purpose of both primitives is to securely aggregate $N$ vectors, locally or remotely, and return the aggregation result, without exposing any participant's raw data to other participants or to the curator, i.e., either the local FL-MV-DSSM framework or the remote FL server. Different execution environments, however, lead to different implementations of the two primitives.
1) local_secure_aggregate():
It is mainly used on users' mobile phones and has two usages: first, securely aggregating the $N$ gradients of the item sub-models in the $N$ different views in FL-MV-DSSM's training algorithm; second, securely aggregating $N$ probabilities, each of which is the probability that an item is interesting to a user according to user view $U_i$, in FL-MV-DSSM's prediction algorithm. In practice, $N \geq 2$. Because $N$ can be as small as 2, and the aggregated gradient is sent back to each view for the next round of training, solutions based on Secure Multi-Party Computation (MPC) [5] cannot be used to implement local_secure_aggregate(): the aggregation result is exact, so a view could expose another view's gradients by subtracting its own gradients from the aggregate. In addition, the computation cost of MPC-based solutions is quadratic in the number of participants [5]. We therefore leverage differential privacy (DP) to realize local_secure_aggregate().

Specifically, we apply the Gaussian Mechanism [9] of DP to the local aggregation step of Algorithm 1, adding noise to each view's item sub-model gradient $(g_I^k)_i$ before aggregation. For Algorithm 2, we add Gaussian noise to each probability before aggregation.

Algorithm 1: FL-MV-DSSM Training Algorithm

FL Client:
Input: Number of views $N$. Dataset $D = \{(X_i, y), i \in \{1, \ldots, N\}\}$, where $y$ is user-item behavior data and $X_i = (U_i, I)$, with $U_i$ the user dataset from view (e.g., App) $i$ and $I$ the downloaded item dataset to be recommended. Number of FL training rounds $T$. Learning rate $\eta$. Initial user sub-models $\{W_{U_1}, \ldots, W_{U_N}\}$. Initial item sub-model weights $W_I$.
Output: User sub-model weights $\{W_{U_1}^T, \ldots, W_{U_N}^T\}$; item sub-model weights $W_I^T$.

for k = 1 : T do
  for each view i = 1 : N do
    $(g_I^k)_i = \partial L(W_I^k, W_{U_i}^k, X_i, y) / \partial W_I^k$
    $(g_{U_i}^k)_i = \partial L(W_I^k, W_{U_i}^k, X_i, y) / \partial W_{U_i}^k$
  end for
  $g_I^k$ = local_secure_aggregate($\{(g_I^k)_i\}, i \in \{1, \ldots, N\}$)
  $G_I^k$ = remote_secure_aggregate($g_I^k$)
  if aggregate_user_sub-model then
    for each view i = 1 : N do
      $(G_{U_i}^k)_i$ = remote_secure_aggregate($(g_{U_i}^k)_i$)
    end for
  end if
  $W_I^{k+1} = W_I^k - \eta G_I^k$
  for each view i = 1 : N do
    $W_{U_i}^{k+1} = W_{U_i}^k - \eta (G_{U_i}^k)_i$
  end for
end for

FL Server:
Input: Number of FL clients $M$. Number of FL training rounds $T$. Client updates $g_j^k$ in FL round $k$.
Output: Securely aggregated FL client summative information, e.g., gradients.

for k = 1 : T do
  server_secure_aggregate($\{g_j^k\}, j \in S \subseteq \{1, \ldots, M\}$)
end for

Following the moments accountant work [1], the concrete steps are:
• Step 1: Random sub-sampling. For the $N$ views on each client, a random subset $B$ ($|B| \leq N$) is sampled in each FL round.
• Step 2: Gradient clipping. Each gradient is clipped in $\ell_2$ norm, i.e., the gradient $(g_I^k)_i$ is replaced by $(g_I^k)_i / \max(1, \|(g_I^k)_i\|_2 / C)$, for a clipping threshold $C$.
• Step 3: Distorting. A Gaussian mechanism is used to distort the sum of all updates:

$$\widetilde{g_I^k} = \frac{1}{|B|} \Big( \sum_{i \in B} (g_I^k)_i + \mathcal{N}(0, \sigma^2 C^2) \Big), \qquad (7)$$

where the value of $\sigma$ satisfies Theorem 1 in [1].

Algorithm 2: FL-MV-DSSM Prediction Algorithm

FL Client:
Input: Number of views $N$. Number of items $M$. User sub-models $\{W_{U_i}\}, i \in \{1, \ldots, N\}$, where $W_{U_i}$ is the user sub-model for view $i$. Item sub-model $W_I$. User feature $x_{U_i}$ for the $i$-th view. Item feature $x_{I_j}$ for the $j$-th item.
Output: List of top-$K$ items for recommendation.

for each item j = 1 : M do
  compute $y_{I_j} = f(W_I, x_{I_j})$ according to Eq. (3)
end for
for each item j = 1 : M do
  for each view i = 1 : N do
    compute $y_{U_i} = f(W_{U_i}, x_{U_i})$ according to Eq. (3)
    $P(x_{I_j} \mid x_{U_i}) = \exp(\gamma \cos(y_{U_i}, y_{I_j})) / \sum_{y'_I \in \{y_{I_j}\}} \exp(\gamma \cos(y_{U_i}, y'_I))$
  end for
  $P(x_{I_j} \mid x_U)$ = local_secure_aggregate($\{P(x_{I_j} \mid x_{U_i})\}, i \in \{1, \ldots, N\}$)
end for
Select the top-$K$ items by sorting $\{P(x_{I_j} \mid x_U)\}$, $j \in \{1, \ldots, M\}$

From Eq. (7), the average distortion is governed by the values of $\sigma$ and $C$, and from Theorem 1 in [1], the value of $\sigma$ is inversely proportional to the privacy budget $\epsilon$. For example, with $|B| = N$, $\sigma = 4$, a fixed $\delta$, and $T = 100$ training rounds, the moments accountant theory [1] yields a concrete bound on $\epsilon$. We can reduce the noise scale by reducing the value of $|B|$ and the number of training rounds $T$ while maintaining model performance.
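A minimal NumPy sketch of Steps 1-3 and Eq. (7); the parameter values and the function name are illustrative:

import numpy as np

def dp_local_aggregate(grads, clip_c=1.0, sigma=4.0, batch_frac=1.0, rng=None):
    """Gaussian-mechanism aggregation of per-view item gradients (Eq. 7):
    sub-sample views, clip each gradient to L2 norm C, sum, add
    N(0, sigma^2 C^2) noise, and average."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(grads)
    b = max(1, int(round(batch_frac * n)))
    idx = rng.choice(n, size=b, replace=False)      # Step 1: sub-sampling
    total = np.zeros_like(grads[0])
    for i in idx:                                   # Step 2: gradient clipping
        g = grads[i]
        total += g / max(1.0, np.linalg.norm(g) / clip_c)
    noise = rng.normal(0.0, sigma * clip_c, size=total.shape)
    return (total + noise) / b                      # Step 3: distort, average

rng = np.random.default_rng(0)
grads = [rng.normal(size=8) for _ in range(4)]      # four views' item gradients
print(dp_local_aggregate(grads, clip_c=1.0, sigma=4.0, rng=rng))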
2) remote_secure_aggregate():
It is used in FL-MV-DSSM's training algorithm to securely aggregate the $(g_{U_i}^k)_i$ and $g_I^k$ across FL clients. Secure aggregation is well studied in traditional FL [5], so we follow the methods and proofs of the proposed "Secure Aggregation Protocol" to realize remote_secure_aggregate() in FL-MV-DSSM. In this way, the FL server only sees the aggregated result, without learning any individual FL client's update. In addition, the "Secure Aggregation Protocol" handles the client drop-out problem well.
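For intuition, a toy sketch of the pairwise-masking idea underlying the Secure Aggregation Protocol [5]; the full protocol additionally performs key agreement and secret sharing to tolerate drop-outs, which this sketch omits:

import numpy as np

def masked_updates(client_vecs, rng=None):
    """Toy pairwise masking: client i adds mask m_ij for j > i and subtracts
    it for j < i, so all masks cancel in the server-side sum while each
    individual update stays hidden."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(client_vecs)
    masks = {(i, j): rng.normal(size=client_vecs[0].shape)
             for i in range(n) for j in range(i + 1, n)}
    masked = []
    for i, v in enumerate(client_vecs):
        out = v.copy()
        for j in range(n):
            if i < j:
                out += masks[(i, j)]
            elif j < i:
                out -= masks[(j, i)]
        masked.append(out)
    return masked

vecs = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
print(np.sum(masked_updates(vecs), axis=0))  # == [9., 12.]: masks cancel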
D. FL-MV-DSSM Variations

FL-MV-DSSM has several variations.

FL-DSSM. Based on the FL-MV-DSSM algorithms introduced in the previous sections, both the FL-DSSM training and prediction algorithms can be derived by simply setting the number of views $N$ to 1.

SEMI-FL-MV-DSSM. We obtain SEMI-FL-MV-DSSM by setting the "aggregate_user_sub-model" flag in Algorithm 1 to false. SEMI-FL-MV-DSSM conducts secure aggregation only on the gradients of the item sub-model, rather than also aggregating the gradients of the user sub-models.

TABLE I: Structures of the item sub-model and the user sub-models of the two views.

          Item          User (View-1)  User (View-2)
Input     Input (4739)  Input (23)     Input (30)
Layer1    Dense (64)    Dense (64)     Dense (64)
Layer2    Dense (32)    Dense (32)     Dense (32)
Layer3    Dense (16)    Dense (16)     Dense (16)
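A minimal tf.keras sketch of the Table I towers; the layer sizes come from the table, while the tanh activation and the naming are our assumptions:

import tensorflow as tf

def make_tower(input_dim, name):
    # Dense 64 -> 32 -> 16 tower, matching Table I; tanh is assumed.
    return tf.keras.Sequential(
        [tf.keras.layers.InputLayer(input_shape=(input_dim,))] +
        [tf.keras.layers.Dense(u, activation="tanh") for u in (64, 32, 16)],
        name=name)

item_tower   = make_tower(4739, "item")        # title 3-grams + genre bits
user_tower_1 = make_tower(23,   "user_view1")  # age, gender, occupation
user_tower_2 = make_tower(30,   "user_view2")  # SVD user embedding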
We evaluate these FL-MV-DSSM variations further in Section V.

V. EXPERIMENTS
This section evaluates FL-MV-DSSM. We have shown in Section IV-C that FL-MV-DSSM protects data privacy among different views. Here we mainly address the following questions: (Q1) Is FL-MV-DSSM able to address the cold-start problem? (Q2) How is the recommendation performance of FL-MV-DSSM and its variations?
A. Environmental Setup
We implement FL-MV-DSSM and its variations in Google's TensorFlow Federated (TFF) simulation framework [16]. For FL-MV-DSSM, we take two user views as an example. Table I shows the DSSM model architectures we use in the evaluations.
Data Pre-Processing.
Similar to existing FedRec algorithms, we use the popular public dataset MovieLens-100K [14] for our evaluations. MovieLens-100K not only consists of 100K ratings from 943 users on 1682 items (movies), but also contains user and item information, e.g., a user's age and a movie's title. For label pre-processing, we create implicit feedback of 1 for all ⟨user, item⟩ pairs where a user explicitly interacted with an item in the dataset, and 0 for the rest. For user-feature pre-processing, we randomly sample a portion of the MovieLens data and select age (normalized to at most 1), gender (binary feature), and occupation (one-hot vector) as the user features for one view (View-1); meanwhile, we use user embeddings learned by singular value decomposition (SVD) from MovieLens' interaction matrix, orthogonal to the data of View-1, as the user features for the other view (View-2). For item-feature pre-processing, we select title and genre, coded as a 3-gram representation and a series of bits, respectively.
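The following pandas/NumPy sketch illustrates this pre-processing under assumed column names (user_id, item_id, age, gender, occupation) and an assumed user ordering; it is not the paper's actual pipeline:

import numpy as np
import pandas as pd

def build_views(ratings: pd.DataFrame, users: pd.DataFrame, svd_dim: int = 30):
    """Illustrative MovieLens-100K pre-processing: implicit labels plus two
    orthogonal user views, as in Section V-A; `users` is assumed ordered
    by user_id."""
    # Implicit feedback: 1 for every observed (user, item) interaction.
    mat = np.zeros((ratings["user_id"].max(), ratings["item_id"].max()),
                   dtype=np.float32)
    mat[ratings["user_id"] - 1, ratings["item_id"] - 1] = 1.0
    # View-1: normalized age, binary gender, one-hot occupation.
    view1 = np.hstack([
        (users["age"].to_numpy() / users["age"].max())[:, None],
        (users["gender"] == "M").to_numpy(float)[:, None],
        pd.get_dummies(users["occupation"]).to_numpy(float)])
    # View-2: user embedding from SVD of the interaction matrix.
    u, s, _ = np.linalg.svd(mat, full_matrices=False)
    view2 = u[:, :svd_dim] * s[:svd_dim]
    return mat, view1, view2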
Evaluation Metrics.
We consider the following evaluation metrics: Precision@10, Recall@10, NDCG@10, and AUC. Among these, Precision@10, Recall@10, and NDCG@10 concentrate on the very top of the recommendation list, while AUC evaluates the overall accuracy of the recommendations.
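For reference, a small sketch of the per-user top-K metrics (AUC omitted), using the standard definitions:

import numpy as np

def precision_recall_ndcg_at_k(ranked_items, relevant, k=10):
    """Precision@K, Recall@K, and NDCG@K for one user, given a ranked item
    list and the set of relevant (held-out positive) items."""
    topk = ranked_items[:k]
    hits = [1.0 if item in relevant else 0.0 for item in topk]
    precision = sum(hits) / k
    recall = sum(hits) / max(1, len(relevant))
    dcg = sum(h / np.log2(rank + 2) for rank, h in enumerate(hits))
    ideal = sum(1.0 / np.log2(r + 2) for r in range(min(k, len(relevant))))
    ndcg = dcg / ideal if ideal > 0 else 0.0
    return precision, recall, ndcg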
Hyper-parameter setting.
The Adam optimizer with learning rate 0.001 is used for centralized training and for server-side aggregation in the FL setting. The SGD optimizer with learning rate 0.2 is used for client-side training in the FL setting. The batch size is set to 20. The dimension is set to 30 for matrix factorization. For the DP parameters, $\sigma = 1$ and $|B| = N$, with clipping threshold $C < 1$ and failure probability $\delta < 1$.

B. Cold-Start Recommendations
To evaluate cold-start recommendations, we focus on FL-MV-DSSM and conduct three cold-start scenarios: cold-start users (CS-Users), cold-start items (CS-Items), and cold-start users-items (CS-Users-Items). For cold-start users, a random subset of 10% of the users and their interaction data are completely excluded during model training, and the model parameters are learned with the remaining 90% of the users and their interaction data. For cold-start items, a random subset of 10% of the items is left out during model training, and the model parameters are learned with the remaining 90% of the items. For cold-start users-items, a random subset of 10% of the users and items is excluded from model training, and the model parameters are learned with the remaining users, interaction data, and items. In all three scenarios, we use the 10% held-out data as our testing datasets (a sketch of these splits is given at the end of this subsection).

Table II shows the results of the three cold-start recommendation scenarios. The results demonstrate that, without loss of generality, FL-MV-DSSM can be used reliably for cold-start recommendation. In particular, FL-MV-DSSM achieves good cold-start prediction performance for new users, which is valuable for privacy-preserving recommendations since new users are continuously enrolled in the recommendation service. However, the performance on cold-start items and users-items is lower than on cold-start users. The reason may be that in our datasets the differences between users (age, gender, occupation, etc.) are smaller than those between items (movie title, genres, etc.), and FL-MV-DSSM learns this difference correctly and recommends with higher precision.
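A minimal sketch of the three hold-out splits, assuming ratings are given as (user_id, item_id) pairs:

import numpy as np

def cold_start_split(ratings, frac=0.1, mode="users", seed=0):
    """Hold out 10% of users, items, or both, as in the three scenarios of
    Section V-B; `ratings` is an integer array of (user_id, item_id) pairs."""
    rng = np.random.default_rng(seed)
    users = np.unique(ratings[:, 0])
    items = np.unique(ratings[:, 1])
    held_u = set(rng.choice(users, int(frac * len(users)), replace=False)) \
        if mode in ("users", "users-items") else set()
    held_i = set(rng.choice(items, int(frac * len(items)), replace=False)) \
        if mode in ("items", "users-items") else set()
    is_test = np.array([u in held_u or i in held_i for u, i in ratings])
    return ratings[~is_test], ratings[is_test]  # train on rest, test held-out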
C. Recommendation Performance
To evaluate recommendation performance, we examine FL-MV-DSSM, FL-DSSM, SEMI-FL-MV-DSSM, centralized DSSM, and existing FedRec algorithms including FCF [2] and FED-MVMF [11]. Specifically, for the FL-MV-DSSM-related evaluations, we use the two generated user views, View-1 and View-2, described in Section V-A to train two user sub-models. For FL-MV-DSSM and SEMI-FL-MV-DSSM, we randomly select 100 users within each FL training round, and for each user, the two user sub-models share a single item sub-model, as shown in Figures 1c and 1d, to form a FL-MV-DSSM task. For FL-DSSM, we launch two FL training tasks, each of which randomly selects 100 users within each FL training round, and each user sub-model is paired with an item sub-model. For centralized DSSM, the datasets of all users are centralized and trained in a single place. For both FCF and FED-MVMF, we follow the experimental setups in their papers. All datasets, centralized or decentralized, are randomly divided into an 80% training set and a 20% testing set.

Figure 2 shows the recommendation performance (precision and recall) of FL-MV-DSSM, SEMI-FL-MV-DSSM, FL-DSSM, and centralized DSSM during the FL training process. Final results for FL-MV-DSSM and the related methods are listed in Table III. From the results we can see that, among FedRec algorithms, FL-MV-DSSM achieves better performance than FL-DSSM, since FL-MV-DSSM can incorporate more user features from multiple views, e.g., from multiple user Apps, to jointly train a better model. One interesting result is that SEMI-FL-MV-DSSM, which aggregates only the shared item sub-model and not the user sub-models, achieves the best performance among the FedRec algorithms, including FCF and FED-MVMF, and its result is even better than one of the centralized DSSM results after 60 FL training rounds. This is understandable: for all FedRec algorithms, the performance data are collected through "federated evaluation" [6], and the user sub-model fits the user's local data quickly when contributions from other FL participants are not aggregated. We also observe some variance in SEMI-FL-MV-DSSM's performance, caused by randomness in user selection.

[Fig. 2: Recommendation performance of FL-MV-DSSM and its variations: (a) Precision; (b) Recall.]

TABLE II: Cold-start recommendation performance of FL-MV-DSSM on the MovieLens dataset in three scenarios. The values denote the mean ± standard deviation across 3 different model builds.

                Precision@10  Recall@10  NDCG@10  AUC
CS-Users           — ± —        — ± —     — ± —   — ± —
CS-Items           — ± —        — ± —     — ± —   — ± —
CS-Users-Items     — ± —        — ± —     — ± —   — ± —

TABLE III: Recommendation performance of FL-MV-DSSM, its variations, and existing FedRec algorithms on the MovieLens dataset, after 100 FL training rounds. The values denote the mean ± standard deviation across 3 different model builds.

                           Precision@10  Recall@10  NDCG@10  AUC
Centralize-DSSM (View-1)      — ± —        — ± —     — ± —   — ± —
Centralize-DSSM (View-2)      — ± —        — ± —     — ± —   — ± —
FCF [2]                       — ± —        — ± —     — ± —   — ± —
FED-MVMF [11]                 — ± —        — ± —     — ± —   — ± —
FL-DSSM (View-1)              — ± —        — ± —     — ± —   — ± —
FL-DSSM (View-2)              — ± —        — ± —     — ± —   — ± —
FL-MV-DSSM                    — ± —        — ± —     — ± —   — ± —
SEMI-FL-MV-DSSM               — ± —        — ± —     — ± —   — ± —

VI. CONCLUSIONS
This paper presents FL-MV-DSSM, the first generic content-based federated multi-view framework that addresses the cold-start problem and recommendation quality at the same time. In addition, this paper extends the traditional federated setting into a new federated multi-view setting, which potentially enables new usage models of FL in recommendation scenarios and brings in new security challenges. By carefully studying these challenges, this paper presents a novel solution addressing the security requirements. Thorough empirical evaluations of FL-MV-DSSM and its variations on public datasets demonstrate that FL-MV-DSSM can address the cold-start problem and boost recommendation performance significantly.

REFERENCES
[1] Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318, 2016.
[2] Muhammad Ammad-ud-din, Elena Ivannikova, Suleiman A. Khan, Were Oyomno, Qiang Fu, Kuan Eeik Tan, and Adrian Flanagan. Federated collaborative filtering for privacy-preserving personalized recommendation system. CoRR, abs/1901.09888, 2019.
[3] Robert M. Bell and Yehuda Koren. Improved neighborhood-based collaborative filtering. 2007.
[4] Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe M. Kiddon, Jakub Konecny, Stefano Mazzocchi, Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, and Jason Roselander. Towards federated learning at scale: System design. In SysML 2019, 2019. To appear.
[5] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS '17, pages 1175–1191, New York, NY, USA, 2017. Association for Computing Machinery.
[6] Sebastian Caldas, Peter Wu, Tian Li, Jakub Konecný, H. Brendan McMahan, Virginia Smith, and Ameet Talwalkar. LEAF: A benchmark for federated settings. CoRR, abs/1812.01097, 2018.
[7] Di Chai, Leye Wang, Kai Chen, and Qiang Yang. Secure federated matrix factorization. CoRR, abs/1906.05108, 2019.
[8] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265–284. Springer, 2006.
[9] Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
[10] Ali Mamdouh Elkahky, Yang Song, and Xiaodong He. A multi-view deep learning approach for cross domain user modeling in recommendation systems. In Proceedings of the 24th International Conference on World Wide Web, WWW '15, pages 278–288, Republic and Canton of Geneva, CHE, 2015. International World Wide Web Conferences Steering Committee.
[11] Adrian Flanagan, Were Oyomno, Alexander Grigorievskiy, Kuan Eeik Tan, Suleiman A. Khan, and Muhammad Ammad-Ud-Din. Federated multi-view matrix factorization for personalized recommendations. arXiv preprint arXiv:2004.04256, 2020.
[12] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, STOC '87, pages 218–229, New York, NY, USA, 1987. Association for Computing Machinery.
[13] Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Richard Nock, Giorgio Patrini, Guillaume Smith, and Brian Thorne. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. CoRR, abs/1711.10677, 2017.
[14] F. Maxwell Harper and Joseph A. Konstan. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):1–19, 2015.
[15] Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM '13, pages 2333–2338, 2013.
[16] TensorFlow Federated: Machine learning on decentralized data. https://www.tensorflow.org/federated.
[17] Greg Linden, Brent Smith, and Jeremy York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
[18] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. Personalized news recommendation based on click behavior. In Proceedings of the 15th International Conference on Intelligent User Interfaces, IUI '10, pages 31–40, New York, NY, USA, 2010. Association for Computing Machinery.
[19] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Aarti Singh and Xiaojin (Jerry) Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20–22 April 2017, Fort Lauderdale, FL, USA, volume 54 of Proceedings of Machine Learning Research, pages 1273–1282. PMLR, 2017.
[20] Sandro Pinto and Nuno Santos. Demystifying ARM TrustZone: A comprehensive survey. ACM Computing Surveys, 51(6), January 2019.
[21] Tao Qi, Fangzhao Wu, Chuhan Wu, Yongfeng Huang, and Xing Xie. FedRec: Privacy-preserving news recommendation with federated learning. arXiv preprint arXiv:2003.09592, 2020.
[22] R. L. Rivest, L. Adleman, and M. L. Dertouzos. On data banks and privacy homomorphisms. Foundations of Secure Computation, Academia Press, pages 169–179, 1978.
[23] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, WWW '01, pages 285–295, New York, NY, USA, 2001. Association for Computing Machinery.
[24] Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, and David M. Pennock. Methods and metrics for cold-start recommendations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '02, pages 253–260, New York, NY, USA, 2002. Association for Computing Machinery.
[25] Paul Voigt and Axel von dem Bussche. The EU General Data Protection Regulation (GDPR): A Practical Guide. Springer Publishing Company, Incorporated, 1st edition, 2017.
[26] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2), January 2019.
[27] Ligeng Zhu, Zhijian Liu, and Song Han. Deep leakage from gradients. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, 2019.