Collaborative Adversarial Learning for Relational Learning on Multiple Bipartite Graphs
Jingchao Su, Xu Chen, Ya Zhang, Siheng Chen, Dan Lv, Chenyang Li
Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
Mitsubishi Electric Research Laboratories, Cambridge, MA, USA
StataCorp LLC, College Station, TX, USA
{sujingchao, xuchen2016, ya zhang}@sjtu.edu.cn, [email protected], [email protected], [email protected]

Abstract—Relational learning aims to make relation inference by exploiting the correlations among different types of entities. Exploring relational learning on multiple bipartite graphs has been receiving attention because of its popular applications such as recommendation. The main problem on multiple bipartite graphs is how to make efficient relation inference with few observed links. Most existing approaches attempt to solve the sparsity problem by learning shared representations that integrate knowledge from multi-source data for the shared entities. However, they merely model the correlations from one aspect (e.g., distribution, representation) and cannot impose sufficient constraints on the different relations of the shared entities. One effective way of modeling multi-domain data is to learn the joint distribution of the shared entities across domains. In this paper, we propose Collaborative Adversarial Learning (CAL), which explicitly models the joint distribution of the shared entities across multiple bipartite graphs. The objective of CAL is formulated from a variational lower bound that maximizes the joint log-likelihood of the observations. In particular, CAL consists of distribution-level and feature-level alignments of knowledge from multiple bipartite graphs. The two-level alignment acts as two different constraints on the relations of the shared entities and facilitates better knowledge transfer for relational learning on multiple bipartite graphs. Extensive experiments on two real-world datasets show that the proposed model outperforms existing methods.
Index Terms—relational learning, bipartite graph, joint distribution matching, cross-domain recommendation
I. INTRODUCTION
In the age of abundant yet fragmented information on the Internet, it is challenging to explore the relationships between entities from heterogeneous data sources. As relational learning is an effective technique for learning representations of multi-source data [1], it is widely applied in data mining tasks such as recommender systems [2] and association prediction [3]. In this work, we focus on the scenario with multiple bipartite graphs, where one type of entity is shared between domains while the other types each have relations with the shared one. The goal is to conduct link prediction between the shared entities and the other entities. A typical example is cross-domain recommendation, as Figure 1 shows, where the domain-shared entities are users and the domain-specific entities are music, books and movies. Since a user's limited interactions with items in a certain domain may lead to unsatisfactory recommendations, leveraging his/her information in other domains via relational learning can boost the recommendation performance.
Fig. 1. An example of multiple bipartite graphs in recommendation, where the user is the entity type shared between the bipartite graphs.
Relational learning on multiple bipartite graphs can generally be regarded as correlation exploration and knowledge transfer of the shared entities between multiple domains. The existing methods for integrating knowledge from different domains mainly fall into two categories: 1) cross-domain representation alignment, which aims at capturing the relationship across domains through representation alignment [4]–[7]; and 2) cross-domain representation fusion, which fuses the features from different domains into a single representation [2], [8]–[10].

In general, these methods attempt to learn joint representations that are shared in a latent space across domains. Despite their promising performance, they merely enhance cross-domain correlations from one aspect (e.g., distribution, representation), which does not impose sufficient constraints on the different relations of the shared entities. For instance, DARec [4] employs the domain adaptation technique [11] to learn domain-invariant representations via distribution matching. However, distribution matching emphasizes consistency between whole distributions, which is inadequate for individual prediction tasks. CMF [2] factors the relation matrices in all domains and directly shares the representations of the shared entities. However, when knowledge from different domains is captured with only one representation, the predictive capacity in a single domain may decline, since irrelevant information may be introduced from other domains. Hence there remains a challenge: how to capture the cross-domain correlation in a holistic way that effectively improves the prediction performance in each single domain.

Recent work on knowledge transfer [12], [13] has shown that modeling the joint distribution of entities from different domains facilitates better knowledge transfer, because the joint distribution essentially captures the relevance of entities from different domains.
To this end, we propose Collaborative Adversarial Learning (CAL), which explicitly pursues this goal. Since the joint distribution is intractable due to its complexity, a variational lower bound is formulated as the training objective. In the objective, the latent representations are aligned by matching their parametrized distributions and by enforcing a cross reconstruction. We therefore propose a two-level alignment criterion involving distribution-level and feature-level alignment, in agreement with the objective. A CAL model composed of two components, adversarial distribution matching and shared representation matching, is built in accordance with this criterion. Compared with previous methods, the two-level alignment in our model ensures the integrity of joint distribution matching and sufficient bidirectional knowledge transfer across domains.

Our major contributions can be summarized as follows:
• To explore relational learning on multiple bipartite graphs, we explicitly model the joint distribution of the shared entities. The learning objective is formulated from the variational lower bound that maximizes the joint log-likelihood.
• To optimize the variational learning objective, we propose a two-level alignment criterion involving distribution-level and feature-level alignment, and build a model according to the criterion.
• Experiments are conducted on two real-world datasets to verify the superiority of our model. Further experiments show that our model performs better with three domains than with two, demonstrating its extensibility to multi-domain tasks.

II. RELATED WORK
As we regard relational learning on multiple bipartite graphs as representation learning of the shared entities in different domains, we focus on work in multi-domain representation learning, which is broadly applied in various tasks, e.g., recommender systems, computer vision, and cross-media retrieval. Considering the ways of modeling the correlations across domains, the methods can be mainly divided into cross-domain representation alignment and cross-domain representation fusion [14].
A. Cross-domain representation alignment
Cross-domain representation alignment transforms the original data from different domains into a shared latent space with certain constraints [4]–[7], [15].
Correlation-based alignment aims at maximizing the correlations of representations between domains via CCA [6]. Further, in order to capture deep non-linear associations between different domains, deep CCA is proposed by using multiple stacked layers of nonlinear mappings [15].
Similarity-based alignment learns a scoring or mapping function to measure the relevance of paired entities across domains [5], [16], [17]. For instance, DeViSE [5] employs dot-product similarity and a hinge rank loss, aiming at producing a higher score for paired input entities than for unpaired ones.
Distribution-based alignment matches the distributions of the representations of the shared entities in different domains. A domain classifier is usually employed as a discriminator to distinguish the inputs from different domains, while the GRL [18] or GAN [19] technique is used to generate representations that deceive the discriminator [4], [20].
B. Cross-domain representation fusion
Cross-domain representation fusion fuses the features from different domains into a single representation.
Non-deep-learning-based fusion learns a probabilistic model over the joint space of the shared latent representations. A typical model is Collective Matrix Factorization (CMF) [2], which factors the rating matrices in all domains and directly shares the representations of the shared entities. As deep neural networks have been widely applied recently, deep-learning-based fusion facilitates stronger inter-domain connections and has shown its superiority for knowledge integration [3], [8]–[10], [21]. Taking multimodal deep learning [8] as an example, it leverages a bimodal deep autoencoder that exploits the concatenated representations from different modalities as a shared representation, from which both modalities are reconstructed.
Fig. 2. A graphical model of CAL. x and y are variables indicating a shared entity's relations in two domains. z is the shared latent representation of the entity that bridges the gap between the domains.

III. THE PROPOSED MODEL
In this section, we first describe the relational learning problem on multiple bipartite graphs. We then propose an optimizing objective and formulate a variational lower bound of the objective for training. Finally, we introduce the model framework in accordance with the variational objective, followed by the details of each component.
A. Problem Description
Consider n+1 types of entities forming n bipartite graphs in n domains, where the |I_U| entities of type U are shared by every bipartite graph. Take a two-domain scenario as an example: there are |I_X| entities in domain X and |I_Y| entities in domain Y. The bipartite graph of domain X defines the relations between the entities in U and X. It is represented as a matrix X = {x_1, x_2, ..., x_{|I_U|}}, where x_u = {x_{u,1}, x_{u,2}, ..., x_{u,|I_X|}}^T is a vector describing the relationship between the shared entity u and all the entities in X, with x_{u,i} = 1 indicating that u has an observed relationship with entity x_i and x_{u,i} = 0 indicating that it does not. We use the corresponding notation in domain Y. Our goal is to exploit the relations among the domains X and Y and conduct link prediction by completing the matrices X̂ = {x̂_1, x̂_2, ..., x̂_{|I_U|}} and Ŷ = {ŷ_1, ŷ_2, ..., ŷ_{|I_U|}}.

Fig. 3. The overall architecture of CAL. Discriminators D_X and D_Y play a minimax game with generators G_X, G_Y and decoders F_X, F_Y to carry out the distribution-level alignment. The stream in blue with G_X, F_X and F_Y and the stream in orange with G_Y, F_Y and F_X illustrate the feature-level matching. Thus the two-level alignment ensures joint distribution matching.

B. The Variational Learning Objective

1) Maximum Joint Likelihood:
To capture the correlations across domains, it is natural to maximize the log-likelihood of the joint distribution. As Figure 2 illustrates, x and y denote random variables that represent the relations between the shared entity u and the entities in the two domains. A shared latent representation z of entity u is assumed to bridge the gap between the two domains. The inference and generation nets are represented by the conditional distributions q_φ(z|x), q_φ(z|y) and p_θ(x|z), p_θ(y|z), where θ and φ denote their parameters. q_φ(x, y) is any joint distribution of the random variables (x, y) parametrized by φ. Given the above description, our goal is to find the parameters that maximize the joint log-likelihood log p_θ(x, y), which can be written as

E_{q_φ(x,y)}[log p_θ(x, y)] = (1/2) E_{q_φ(x,y)}[log p_θ(x|y) + log p_θ(y) + log p_θ(y|x) + log p_θ(x)].   (1)

Hence the objective can be decomposed into matching the distributions of the marginal likelihoods and the conditional likelihoods.
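The symmetric form of Eq. (1) follows from averaging the two chain-rule factorizations of the joint log-likelihood, a step worth making explicit:

```latex
\log p_\theta(x, y) = \log p_\theta(x \mid y) + \log p_\theta(y)
                    = \log p_\theta(y \mid x) + \log p_\theta(x)
\;\Longrightarrow\;
\log p_\theta(x, y) = \tfrac{1}{2}\bigl[\log p_\theta(x \mid y) + \log p_\theta(y)
                    + \log p_\theta(y \mid x) + \log p_\theta(x)\bigr].
```

Taking the expectation of both sides under q_φ(x, y) yields Eq. (1).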
2) The Variational Lower Bound:
For the marginal and conditional distribution matching, the computation requires the marginalization of the latent variable z. However, since the inference of z is intractable, we resort to optimizing a variational lower bound.

Theorem 1.
A variational lower bound of Equation (1) is

L = − KL(q_φ(z|y) || p_θ(z)) − KL(q_φ(z|x) || p_θ(z))   (2)
    + E_{q_φ(z|x)}[log p_θ(x|z) + log p_θ(y|z)]   (3)
    + E_{q_φ(z|y)}[log p_θ(y|z) + log p_θ(x|z)],   (4)

where KL(·||·) denotes the Kullback–Leibler divergence.

The first two terms in (2) ensure that the distributions of the encoded latent representations are close to a prior distribution p(z), while the terms in (3) and (4) indicate that the latent representation encoded from each domain should be decoded to reconstruct the input space in both domains. We now sketch the proof.

Proof.
For the conditional distribution matching, the computation requires the marginalization of the latent variable z, which is formulated as p(x|y) = ∫ p(x, z|y) dz. However, since the inference of the latent variable z is intractable, we resort to the variational approach, approximating the true posterior distribution p_θ(z|x, y) with a tractable distribution q_φ(z|x, y). In addition, we assume q_φ(z|x, y) = q_φ(z|x) = q_φ(z|y), which indicates that the latent representation z is independent of one domain given the other.

The optimization of the conditional log-likelihood p_θ(x|y) can be formulated as

E_{q_φ(z|y)} log p_θ(x|y) = E_{q_φ(z|y)} log [p_θ(x, z|y) / p_θ(z|x, y)]
                          = KL(q_φ(z|y) || p_θ(z|x, y)) + E_{q_φ(z|y)} log [p_θ(x, z|y) / q_φ(z|y)]   (5)
                          ≥ E_{q_φ(z|y)} log [p_θ(x, z|y) / q_φ(z|y)],   (6)

where KL denotes the KL divergence, which is always non-negative. Equation (6) is the variational lower bound of the conditional log-likelihood and acts as a surrogate objective function. It can be rewritten as

L_con = − KL(q_φ(z|y) || p_θ(z|y)) + E_{q_φ(z|y)} log p_θ(x|z, y).   (7)

The prior of the latent variable z can be relaxed to make the latent variable statistically independent of the input variables [22], i.e., p(z|y) = p(z). Besides, p(x|z, y) = p(x|z) as Figure 2 illustrates. The final objective is

L_con = − KL(q_φ(z|y) || p_θ(z)) + E_{q_φ(z|y)} log p_θ(x|z).   (8)

p(y|x) is optimized in the same way. For the marginal likelihoods, considering the constraint on the latent variable z, which is consistent with the conditional matching, the optimization is the same as in the conventional variational autoencoder (VAE) [22].
Its variational lower bound is written as

E_{q_φ(z|y)} log p_θ(y) = KL(q_φ(z|y) || p_θ(z|y)) + E_{q_φ(z|y)} log [p_θ(y, z) / q_φ(z|y)]   (9)
                        ≥ − KL(q_φ(z|y) || p_θ(z)) + E_{q_φ(z|y)} log p_θ(y|z) = L_mar.   (10)

Incorporating the conditional and marginal distribution matching, the final variational objective is formulated as Theorem 1 presents.

C. Collaborative Adversarial Learning
Guided by the variational lower bound in Theorem 1, we build the Collaborative Adversarial Learning framework; the overall framework is shown in Figure 3. The framework can be generally applied to multiple domains; here we show a model with two domains for simplicity. To optimize the variational objective, a two-level alignment criterion is proposed:
• Distribution-level alignment: the whole distributions of the shared entities' latent representations, p(z^x) and p(z^y), should match a fixed prior distribution, corresponding to Equation (2);
• Feature-level alignment: the latent representations of a shared entity, z_u^x and z_u^y, should share the same latent space, corresponding to Equations (3) and (4).
Under these conditions, our model mainly consists of two components: adversarial distribution matching and shared representation matching.
1) Adversarial Distribution Matching:
According to the distribution-level alignment, the distributions of the shared entities' latent representations, p(z^x) and p(z^y), should be aligned to a fixed prior distribution p(z). An intuitive example in cross-domain recommendation is that if statistics show that users prefer pop culture to classics when choosing a movie, a similar tendency will be reflected in their choice of books or music. An important approach to distribution matching is adversarial learning [19], [23], which employs a generator and a discriminator playing a minimax game until the distributions of the latent representations from different domains are indistinguishable. It encourages the aggregated posterior p(z^x) = ∫ p(z^x|x) p(x) dx to be aligned to p(z). Besides, the mode collapse problem is avoided in our model, since the reconstruction operation ensures that the latent representations can reconstruct their own input space.

For conciseness, we take the domain X as an example. The discriminator D_X is a domain classifier that predicts the domain label c ∈ {0, 1} of the input vectors z_u^x, where 1 denotes that the vector is drawn from the prior distribution and 0 denotes that the vector belongs to the encoded latent representations in X. Meanwhile, the generator G_X along with the decoder F_X tries to generate latent representations z_u^x that cannot be distinguished by the discriminator D_X. Therefore, the adversarial training procedure maximizes the objective L_adv by optimizing θ_D and minimizes the same objective by optimizing θ_G and θ_F, where θ_G, θ_F and θ_D are the parameters of the generators, decoders and discriminators, respectively. The minimax objective is

min_{θ_G, θ_F} max_{θ_D} L_adv(θ_G, θ_F, θ_D) = Σ_{u=1}^{|U|} [log D_X(z) + log D_Y(z) + log(1 − D_X(z_u^x)) + log(1 − D_Y(z_u^y))],   (11)

where z is randomly drawn from the prior distribution.
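As a concrete illustration, the per-domain terms of Eq. (11) can be written out as follows. This is a NumPy sketch for exposition only, not the paper's released PyTorch code; the function names and the logit-valued `discriminator` callable are assumptions, and the generator term uses the common non-saturating variant of the minimax loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adversarial_losses(discriminator, z_enc, rng):
    """Forward values of the Eq. (11) terms for one domain.

    `discriminator` maps latent vectors to real-valued logits; the prior
    is the standard Gaussian N(0, I) used in the experiments.
    """
    z_prior = rng.standard_normal(z_enc.shape)
    p_prior = sigmoid(discriminator(z_prior))  # D should output ~1 on prior samples
    p_enc = sigmoid(discriminator(z_enc))      # ... and ~0 on encoded representations
    # Discriminator maximizes log D(z) + log(1 - D(z_u)); we return its negation.
    d_loss = -np.mean(np.log(p_prior) + np.log(1.0 - p_enc))
    # Generator/decoder instead push D(z_u) toward 1 (non-saturating form).
    g_loss = -np.mean(np.log(p_enc))
    return d_loss, g_loss
```

In a full model, `d_loss` would drive updates of the discriminator parameters and `g_loss` the updates of the generator and decoder parameters.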
2) Shared Representation Matching:
Under the feature-level alignment condition, the latent representation z_u^x / z_u^y of a shared entity u encoded from either domain is supposed to reconstruct the relation vectors x_u and y_u in both domains. The condition indicates that, given a shared entity's relations in a certain domain, its potential relations in both the current domain and the other domain can be inferred. Accordingly, we divide the reconstruction into two parts: self reconstruction and cross reconstruction.

Inspired by the Collaborative Denoising AutoEncoder (CDAE) [24], we obtain x̃_u and ỹ_u by dropping out the non-zero values of x_u and y_u independently with probability q. We then feed x̃_u into the encoder G_X. The shared entity's latent representation z_u^x is learned by adding a shared-entity embedding v_u to the output of the encoder G_X, and z_u^y is obtained accordingly. G_X and G_Y are multi-layer perceptrons. The process is formulated as

z_u^x = G_X(x̃_u) + v_u,   z_u^y = G_Y(ỹ_u) + v_u.   (12)

The latent representations z_u^x and z_u^y are then decoded to reconstruct the relation vectors in both the original domain and the other domain. F_X and F_Y are multi-layer perceptrons. The reconstruction loss can be formulated as

L_rec = L_rec^x + L_rec^y + L_crs^xy + L_crs^yx
      = Σ_{i=1}^{|U|} [l_bce(x_i, F_X(z_i^x)) + l_bce(y_i, F_Y(z_i^y)) + l_bce(y_i, F_Y(z_i^x)) + l_bce(x_i, F_X(z_i^y))],   (13)

where l_bce is the binary cross entropy:

l_bce(a_u, â_u) = Σ_{i=1}^{N} [− a_{u,i} log â_{u,i} − (1 − a_{u,i}) log(1 − â_{u,i})].   (14)

TABLE I
DATASETS AND STATISTICS.

Dataset  Users   Domain  Items   Interactions  Density
Douban   16930   Movie   24323   2505980       0.609%
                 Book    40002   804818        0.119%
Amazon   35530   Movie   27797   724189        0.073%
                 Book    49697   733711        0.042%
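The corruption, the reconstruction loss, and the binary cross entropy of Eqs. (12)-(14) can be sketched as follows. This is a minimal NumPy illustration with hypothetical names; the decoders here are plain callables standing in for the paper's multi-layer perceptrons.

```python
import numpy as np

def corrupt(a, q, rng):
    """Drop out the entries of a relation vector independently with probability q."""
    mask = rng.random(a.shape) >= q
    return a * mask

def l_bce(a, a_hat, eps=1e-7):
    """Binary cross entropy of Eq. (14), with clipping for numerical safety."""
    a_hat = np.clip(a_hat, eps, 1.0 - eps)
    return float(-np.sum(a * np.log(a_hat) + (1.0 - a) * np.log(1.0 - a_hat)))

def reconstruction_loss(x_u, y_u, z_xu, z_yu, F_X, F_Y):
    """Self + cross reconstruction terms of Eq. (13) for one shared entity u."""
    return (l_bce(x_u, F_X(z_xu)) + l_bce(y_u, F_Y(z_yu))     # self reconstruction
            + l_bce(y_u, F_Y(z_xu)) + l_bce(x_u, F_X(z_yu)))  # cross reconstruction
```

Per Eq. (12), `z_xu` would be obtained as `G_X(corrupt(x_u, q, rng)) + v_u` before being passed to both decoders.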
3) Full Objective:
With all the components, the full objective of our model is

L(θ_G, θ_F, θ_D) = L_adv + L_rec.   (15)

We aim to solve

(θ_G*, θ_F*) = arg min_{θ_G, θ_F} max_{θ_D} L(θ_G, θ_F, θ_D),   (16)

where θ_G* and θ_F* are the optimal parameters of the generators and decoders.

D. Implementation Details
An illustration of the implemented model is shown in Figure 3, where u is a shared entity between domains X and Y. The relation vectors of u, x_u and y_u, are fed into the generators G_X and G_Y respectively. The latent representation z^x / z^y is obtained by adding the output of G_X / G_Y to the shared-entity embedding from G_U. z^x / z^y and a randomly sampled z from the prior distribution are further fed into the discriminators D_X / D_Y. The discriminators try to distinguish the latent representations z^x / z^y from z, while the generators try to produce representations similar to z that deceive the discriminators. Besides, the latent representation from each single domain is fed into both decoders F_X and F_Y, reconstructing x_u and y_u simultaneously. Note that the generators, decoders and discriminators are implemented as two-layer perceptrons, and G_U is a learnable embedding lookup table.

To train our model, we optimize Equation (16) by alternately updating the parameters of the generators & decoders and the discriminators in mini-batches. Stochastic gradient descent (SGD) is adopted to optimize the discriminators, while Adam is used to optimize the generators. In the testing stage, the relation in domain X is predicted by computing x̂_u = F_X*(G_X*(x_u)), and ŷ_u is obtained accordingly.

IV. EXPERIMENTS
In this section, we evaluate the CAL model on the task of cross-domain recommendation. The users can be viewed as the shared entities, and the different types of items can be viewed as entities in different domains. We validate the superiority of our model by comparing it with several state-of-the-art methods. Sparsity and embedding-size experiments are conducted to prove the robustness of our model in various situations. Further experiments on multiple domains are also carried out to verify the generality of our model in multi-domain tasks.
A. Datasets
• Douban is a Chinese social networking website that contains users' rating records of movies, books, etc. We select users that rate both movies and books and convert the rated items to positive samples for each user.
• Amazon is an e-commerce platform selling multiple categories of products. We choose the Movies and Books categories since they contain rich rating data among all categories. For the multi-domain experiments, the users who have records in all of the Movies, Books and Music categories are chosen. The positive samples are obtained in the same way as for the Douban dataset.
B. Baselines
We compare our method with the following baselines for both single-domain and cross-domain recommendation.
1) Methods for single-domain recommendation:
• PMF [25] is a widely-used model for matrix completion. The missing relations are predicted via the inner product of the interacting entities' representations.
• MLP [26] is a basic deep learning approach that models the relation of interacting entities by feeding the concatenation of their representations into multilayer neural networks.
• CDAE [27] is a denoising extension of the classical autoencoder and is designed for top-N recommendation.
• AAE [23] uses a GAN to match the aggregated posterior of the latent vector with a prior distribution. We modify AAE for our top-N recommendation task by replacing its autoencoder with CDAE.
2) Methods for cross-domain recommendation:
• CMF [2] simultaneously factors the matrices in different domains and shares representations among the shared entities to handle information from multiple relations.
• Conet [10] is an approach based on MLP. It enables dual knowledge transfer across domains via a cross-switch network between the adjacent layers in different domains.
• DARec [4] is the state-of-the-art method. It leverages domain-adversarial neural networks [18] to extract shared representations from two domains. We replace its encoder with CDAE to apply it to the top-N recommendation scenario.
• AAE+ takes the union of the entities in all domains and runs a single AAE model.
• AAE++ combines two AAE models [23] and shares their prior distribution. It is a degraded version of CAL without the cross reconstruction loss.
C. Experimental Settings

1) Evaluation Protocol:
For the top-N recommendation task, we adopt the leave-one-out evaluation, which is widely used
in the literature of recommendation [26], [28]. For each user, one interacted item is reserved as the positive test sample, and 99 non-interacted items are randomly chosen as negative samples. We evaluate the performance by ranking the positive sample among the 100 items and computing its hit ratio (HR), normalized discounted cumulative gain (NDCG) and mean reciprocal rank (MRR).

TABLE II
COMPARISON WITH BASELINES ON DOUBAN AND AMAZON. THE RANKED LIST IS CUT OFF AT TOP N=10.

Dataset  Domain  Metric  PMF     MLP     CDAE    AAE     CMF     AAE+    AAE++   Conet   DARec   CAL
Douban   Movies  hr      0.8711  0.8746  0.8955  0.8961  0.8963  0.8853  0.9005  0.8944  0.9003  -
                 ndcg    0.6281  0.6286  0.7038  0.6919  0.6699  0.6683  0.7037  0.6725  0.7004  -
                 mrr     0.5509  0.5505  0.6426  0.6269  0.5675  0.5991  0.6408  0.6112  0.6365  -
         Books   hr      0.7579  0.7325  0.7908  0.7829  0.8017  0.7609  0.7906  0.7644  0.7877  -
                 ndcg    0.5573  0.5250  0.6199  0.6054  0.6034  0.5776  0.6223  0.5651  0.6130  -
                 mrr     0.4938  0.4597  0.5660  0.5469  0.5406  0.5197  0.5694  0.5234  0.5577  -
Amazon   Movies  hr      0.6068  0.5349  0.6349  0.6050  0.6127  0.6006  0.6331  0.5771  0.6411  -
                 ndcg    0.3792  0.3214  0.4222  0.3918  0.3850  0.3852  0.4193  0.3210  0.4202  -
                 mrr     0.3089  0.2558  0.3563  0.3259  0.3147  0.3187  0.3532  0.3520  0.3519  -
         Books   hr      0.4983  0.4103  0.5410  0.5191  0.5409  0.5298  0.5297  0.4668  0.5350  -
                 ndcg    0.3166  0.2419  0.3482  0.3310  0.3411  0.3332  0.3431  0.2720  0.3454  -
                 mrr     0.2606  0.1904  0.2889  0.2731  0.2795  0.2727  0.2856  0.2787  0.2871  -

Fig. 4. Sparsity analysis when the number of user-item interactions ranges from 20% to 100% on Douban and Amazon (panels (a)-(h): HR@10 and NDCG@10 for Douban and Amazon, Movies and Books).
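The three ranking metrics under this protocol can be computed from the rank of the positive item among the 100 candidates. The sketch below uses illustrative names, and whether MRR is also truncated at N is an assumption left out here:

```python
import numpy as np

def rank_metrics(scores, pos_index, topn=10):
    """HR@N, NDCG@N and MRR for one user under leave-one-out evaluation.

    `scores` holds predicted scores for the positive item and the 99
    sampled negatives; `pos_index` is the positive item's position.
    """
    order = np.argsort(-np.asarray(scores))         # item indices sorted by score, descending
    rank = int(np.where(order == pos_index)[0][0])  # 0-based rank of the positive item
    hr = 1.0 if rank < topn else 0.0
    ndcg = 1.0 / np.log2(rank + 2.0) if rank < topn else 0.0
    mrr = 1.0 / (rank + 1.0)
    return hr, ndcg, mrr
```

The reported numbers would then be these per-user values averaged over all test users.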
2) Experimental Implementation:
For PMF, MLP, CDAE, AAE, CMF and Conet, we use the code released by the authors. AAE+ and AAE++ are implemented based on AAE. DARec is implemented following the paper. For PMF, MLP, CMF and Conet, which require user-item pairs as input, we utilize a negative sampling approach in which 5 negative instances are sampled per positive instance.

Our model is implemented with PyTorch. The optimizer for the discriminators is SGD, and the optimizer for the rest of the networks is Adam. The parameters are updated in mini-batches, and the learning rate is 0.001. The prior distribution is set to the standard Gaussian N(0, I), a common setting in recent adversarial methods [19], [23]. For CDAE, AAE, AAE+, AAE++ and CAL, the encoders and decoders have two layers, and the embedding size is 200.

D. Overall Performance
Fig. 5. Embedding size analysis when the embedding size of users' latent representations varies from 100 to 500 on Amazon (panels (a)-(d): HR@10 and NDCG@10 for Amazon Movies and Books).

Table II shows the results of the different methods on the two datasets under three ranking metrics, from which we have the following findings:
• The cross-domain methods outperform their corresponding single-domain counterparts that are similar in structure (e.g., PMF & CMF, MLP & Conet, AAE & AAE++). This suggests that relational learning on multiple bipartite graphs facilitates knowledge transfer from relevant domains, which enriches the training data and thus improves the recommendation performance.
• Comparing deep methods with shallow ones, we find that the shallow methods get relatively low NDCG and MRR even though their HR does not differ much on Douban. This indicates that shallow methods may be appropriate for link prediction in dense datasets, but are not as satisfactory in sparse datasets or in ranking tasks.
• AAE++ performs better than AAE+; the main difference is that AAE++ uses one network per domain while AAE+ uses one network for all domains. This shows that the shared representations of different domains are hard to learn with a single network due to the distinctions between domains. CAL achieves superior performance to AAE++, the difference being that CAL employs an extra cross reconstruction loss. This demonstrates the importance of CAL's two-level alignment for an adequate transfer of knowledge, as it allows more holistic joint distribution matching. This also accounts for our better performance than DARec.
• CAL outperforms all the single-domain and cross-domain methods on both datasets. In particular, CAL improves over the state-of-the-art methods with relative HR@10 gains of 0.28%, 1.98%, 2.03% and 10.83% in Douban Movie, Douban Book, Amazon Movie and Amazon Book respectively. Generally, the gain is more pronounced as the data gets sparser.
The superior performance of CALshows its strong ability of relation inference betweenbipartite graphs, especially in sparse datasets.
E. Sparsity Analysis
To investigate whether CAL can still outperform other methods under sparser conditions, we conduct sparsity experiments by gradually reducing the number of the target bipartite graph's links. Specifically, we randomly draw 80%, 60%, 40% and 20% of the links as training sets. Our method is compared with the competitive single-domain and cross-domain methods, and the results are shown in Figure 4.

CAL consistently outperforms the other methods under almost all sparsity conditions. In particular, all the methods show little difference on the dense Douban Movie dataset. But as the dataset gets sparser and the number of user-item links decreases, our method increasingly outperforms the baselines in most situations. However, in the extremely sparse 20% case on Amazon, the gap among the methods is not obvious, because the observed links in the target bipartite graph are too few to train a reliable model even though the other bipartite graphs have rich links.
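The sparsity settings above amount to randomly retaining a fixed fraction of the observed links of the target bipartite graph as the training set; a small sketch with illustrative names:

```python
import random

def subsample_links(links, keep_fraction, seed=0):
    """Randomly keep `keep_fraction` of the observed (user, item) links,
    as in the 20%-80% sparsity settings described above."""
    rng = random.Random(seed)
    k = int(len(links) * keep_fraction)
    return rng.sample(links, k)
```

The held-out evaluation protocol itself is unchanged; only the training links of the target graph are thinned.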
TABLE III
STATISTICS FOR THREE DOMAINS IN AMAZON.

Dataset  Users   Domain  Items   Interactions  Density
Amazon   11781   Movies  18411   359348        0.166%
                 Books   20648   229559        0.094%
                 Music   13708   168504        0.104%
TABLE IV
THREE-DOMAIN RECOMMENDATION IN AMAZON.

Domain  Metric  Movie&Book  Book&Music  Music&Movie  Three domains
Movies  hr      0.5725      -           0.5791
        ndcg    0.3566      -
Books   ndcg    0.3416      0.3356      -
        mrr     0.2792      0.2758      -            0.
Music   hr      -           0.5253      0.5422
        ndcg    -           0.3281      0.3420
        mrr     -           0.2675      0.2806
F. Embedding Size Analysis
The dimension of the latent representations is an important factor for representation learning. We thus investigate the performance of CDAE, AAE, AAE++ and CAL under different user embedding sizes on the Amazon dataset. The results are shown in Figure 5.

The results show the tested methods' ability to adapt to changes in the embedding size. The performance of CDAE declines rapidly as the embedding size increases, due to overfitting. AAE is stable but does not perform well. AAE++ suffers from underfitting when the embedding size is below 300 for the movie domain and below 400 for the book domain. CAL, however, is stable with preeminent performance regardless of the changes, showing its capability of handling latent representations with different embedding sizes.
G. Multi-Domain Analysis
To prove the generality of CAL on more bipartite graphs, we first conduct experiments on each pair of the three domains in the Amazon dataset. The model is then trained with all three domains. Tables III and IV show the statistics of the data and the results of the experiments, respectively.

In most cases, introducing the bipartite graphs of extra domains improves the performance. This indicates that although more domains may bring more noise, our model can effectively extract useful knowledge from heterogeneous data, demonstrating its extensibility to multiple domains. However, for movie recommendation, the result of music & movie slightly surpasses that of the three-domain combination. This is intuitively reasonable: when people watch a movie, they tend to listen to its soundtrack, leading to a closer connection between these two bipartite graphs. It should still be noticed that an increasing number of bipartite graphs may lead to a decline in performance, because the shared representations are then inclined to capture only the knowledge shared among all domains.

V. CONCLUSIONS AND FUTURE WORK