HHSR: Hyperbolic Social Recommender
Anchen Li
College of Computer Science and Technology, Jilin University, [email protected]
Bo Yang† College of Computer Science and Technology, Jilin University, [email protected]
ABSTRACT
With the prevalence of online social media, users' social connections have been widely studied and utilized to enhance the performance of recommender systems. In this paper, we explore the use of hyperbolic geometry for social recommendation. We present Hyperbolic Social Recommender (HSR), a novel social recommendation framework that utilizes hyperbolic geometry to boost performance. With the help of hyperbolic spaces, HSR can learn high-quality user and item representations for better modeling of user-item interactions and user-user social relations. Via a series of extensive experiments, we show that our proposed HSR outperforms its Euclidean counterpart and state-of-the-art social recommenders in click-through rate prediction and top-$K$ recommendation, demonstrating the effectiveness of social recommendation in the hyperbolic space.

KEYWORDS
Social Recommendation; Social Network; Hyperbolic Space.
In the era of information explosion, recommender systems have been playing an indispensable role in meeting user preferences by recommending relevant items. Collaborative Filtering (CF) is one of the dominant techniques used in recommender systems [19, 20, 37, 45]. Traditional CF-based methods mainly rely on the history of user-item interactions to generate recommendations. However, they are often impeded by data sparsity and cold-start issues in real recommendation scenarios.

With the emergence of online social media, many E-commerce sites have become popular social platforms in which users can not only select items they love but also follow other users. According to social influence theory [2, 3, 23], users' preferences are similar to or influenced by those of their social neighbors. Therefore, researchers propose using the social network as another information stream to mitigate the lack of user-item interactions and enhance recommendation performance, a setting also known as social recommendation [27]. To tackle the social recommendation problem, a plethora of social-aware models have been proposed, which utilize different techniques to integrate social relations into recommendation, such as matrix factorization [16, 28], multi-layer perceptrons [11], and graph neural networks [25, 44].

Notably, all of these social recommenders operate in Euclidean spaces. In real-world scenarios, user-item interactions and user-user social relations exhibit power-law structures. Also, social networks present a hierarchical structure [1, 10, 15]. Recent research shows that hyperbolic geometry enables embeddings with much smaller distortion when embedding data with power-law distributions and hierarchical structure [6, 31]. This motivates us to consider whether we can utilize hyperbolic geometry to boost the performance of social recommendation.

† Corresponding author.
Our approach.
In this work, we propose a novel social recommendation model, namely, Hyperbolic Social Recommender (HSR). Specifically, HSR learns user and item representations in the hyperbolic space, or more precisely in a Poincaré ball. The key component of our framework is a hyperbolic aggregator over the users' social neighbor sets that takes full advantage of the social information. With the help of hyperbolic geometry, HSR can better model user-item interactions and user-user social relations to enhance the performance of social recommendation. Our contributions.
In summary, our main contributions in this paper are listed as follows:
• We propose a novel framework, HSR, which utilizes hyperbolic geometry for social recommendation. To the best of our knowledge, this is the first study to make use of hyperbolic space for the social recommendation task.
• Experimental results show that our proposed HSR not only outperforms its Euclidean counterpart but also boosts the performance over state-of-the-art social recommenders in click-through rate prediction and top-$K$ recommendation, demonstrating the effectiveness of social recommendation in hyperbolic geometry.

The remainder of this paper is organized as follows. Section 2 discusses the relevant background that forms the basis of our work. In Section 3, we introduce the proposed method HSR. In Section 4, we conduct experiments on four real-world datasets and present the experimental results. In Section 5, we review work related to our methods, followed by a conclusion in Section 6.

In this section, we review the background of hyperbolic geometry and gyrovector spaces, which form the basis of our method.
Hyperbolic space is uniquely defined as a complete and simply connected Riemannian manifold with constant negative curvature [22]. A key property of hyperbolic spaces is that they expand faster than Euclidean spaces. There are multiple commonly used models of hyperbolic geometry, such as the Poincaré ball model, the hyperboloid model, and the Klein model [47]. These models are all connected and can be converted into each other. In this paper, we work with the Poincaré ball model because it is well-suited for gradient-based optimization [31].

Poincaré ball model. Let $\mathbb{D}^d = \{\mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\| < 1\}$ be the $d$-dimensional open unit ball, where $\|\cdot\|$ denotes the Euclidean norm. The Poincaré ball model is the Riemannian manifold $(\mathbb{D}^d, g^{\mathbb{D}})$, defined by the manifold $\mathbb{D}^d$ equipped with the Riemannian metric tensor $g^{\mathbb{D}}_{\mathbf{x}} = \lambda_{\mathbf{x}}^2 g^E$, where $\lambda_{\mathbf{x}} = \frac{2}{1-\|\mathbf{x}\|^2}$, $\mathbf{x} \in \mathbb{D}^d$, and $g^E = \mathbf{I}$ denotes the Euclidean metric tensor.

The framework of gyrovector spaces provides vector operations for hyperbolic geometry [14]. We will make extensive use of these gyrovector operations to design our model. Specifically, these operations are defined in an open $d$-dimensional ball $\mathbb{D}^d_c = \{\mathbf{x} \in \mathbb{R}^d : c\|\mathbf{x}\|^2 < 1\}$ of radius $1/\sqrt{c}$ ($c \geq 0$). Some widely used gyrovector space operations are defined as follows:

• Möbius addition: For $\mathbf{x}, \mathbf{y} \in \mathbb{D}^d_c$, the Möbius addition of $\mathbf{x}$ and $\mathbf{y}$ is defined as:
$$\mathbf{x} \oplus_c \mathbf{y} = \frac{(1 + 2c\langle\mathbf{x},\mathbf{y}\rangle + c\|\mathbf{y}\|^2)\,\mathbf{x} + (1 - c\|\mathbf{x}\|^2)\,\mathbf{y}}{1 + 2c\langle\mathbf{x},\mathbf{y}\rangle + c^2\|\mathbf{x}\|^2\|\mathbf{y}\|^2}. \quad (1)$$
In general, this operation is neither commutative nor associative.

• Möbius scalar multiplication: For $c > 0$, the Möbius scalar multiplication of $\mathbf{x} \in \mathbb{D}^d_c \setminus \{\mathbf{0}\}$ by $r \in \mathbb{R}$ is defined as:
$$r \otimes_c \mathbf{x} = \frac{1}{\sqrt{c}} \tanh\left(r \tanh^{-1}(\sqrt{c}\,\|\mathbf{x}\|)\right) \frac{\mathbf{x}}{\|\mathbf{x}\|}, \quad (2)$$
and $r \otimes_c \mathbf{0} = \mathbf{0}$.
This operation satisfies associativity.

• Möbius matrix-vector multiplication: For $\mathbf{M} \in \mathbb{R}^{d' \times d}$ and $\mathbf{x} \in \mathbb{D}^d_c$, if $\mathbf{M}\mathbf{x} \neq \mathbf{0}$, the Möbius matrix-vector multiplication of $\mathbf{M}$ and $\mathbf{x}$ is defined as:
$$\mathbf{M} \otimes_c \mathbf{x} = \frac{1}{\sqrt{c}} \tanh\left(\frac{\|\mathbf{M}\mathbf{x}\|}{\|\mathbf{x}\|} \tanh^{-1}(\sqrt{c}\,\|\mathbf{x}\|)\right) \frac{\mathbf{M}\mathbf{x}}{\|\mathbf{M}\mathbf{x}\|}. \quad (3)$$
This operation satisfies associativity.

• Möbius exponential and logarithmic maps: Each $\mathbf{x} \in \mathbb{D}^d_c$ has a tangent space $T_{\mathbf{x}}\mathbb{D}^d_c$, which is a local first-order approximation of the manifold $\mathbb{D}^d_c$ around $\mathbf{x}$. The exponential and logarithmic maps move representations between the manifold and the tangent space in a principled manner. For any $\mathbf{x} \in \mathbb{D}^d_c$, given $\mathbf{v} \neq \mathbf{0}$ and $\mathbf{y} \neq \mathbf{x}$, the Möbius exponential map $\exp^c_{\mathbf{x}}: T_{\mathbf{x}}\mathbb{D}^d_c \to \mathbb{D}^d_c$ and logarithmic map $\log^c_{\mathbf{x}}: \mathbb{D}^d_c \to T_{\mathbf{x}}\mathbb{D}^d_c$ are defined as:
$$\exp^c_{\mathbf{x}}(\mathbf{v}) = \mathbf{x} \oplus_c \left(\tanh\left(\sqrt{c}\,\frac{\lambda^c_{\mathbf{x}}\|\mathbf{v}\|}{2}\right) \frac{\mathbf{v}}{\sqrt{c}\,\|\mathbf{v}\|}\right), \quad (4)$$
$$\log^c_{\mathbf{x}}(\mathbf{y}) = \frac{2}{\sqrt{c}\,\lambda^c_{\mathbf{x}}} \tanh^{-1}\left(\sqrt{c}\,\|-\mathbf{x} \oplus_c \mathbf{y}\|\right) \frac{-\mathbf{x} \oplus_c \mathbf{y}}{\|-\mathbf{x} \oplus_c \mathbf{y}\|}, \quad (5)$$
where $\lambda^c_{\mathbf{x}} = \frac{2}{1 - c\|\mathbf{x}\|^2}$ is the conformal factor of $(\mathbb{D}^d_c, g^c)$, and $g^c$ is the generalized hyperbolic metric tensor.

• Distance: For $\mathbf{x}, \mathbf{y} \in \mathbb{D}^d_c$, the generalized distance between them in gyrovector spaces is defined as:
$$d_c(\mathbf{x}, \mathbf{y}) = \frac{2}{\sqrt{c}} \tanh^{-1}\left(\sqrt{c}\,\|-\mathbf{x} \oplus_c \mathbf{y}\|\right). \quad (6)$$

We will make use of these Möbius gyrovector space operations to design our recommendation framework.

In this section, we first introduce the notations and formulate the problem. We then elaborate our proposed HSR method. Based on that, we introduce two strategies to further improve our model. Finally, we discuss the learning algorithm of our model.
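To make the gyrovector operations above concrete, here is a minimal plain-Python sketch of Möbius addition (Eq. 1), the exponential and logarithmic maps at the origin (the special cases of Eqs. 4-5 with $\mathbf{x}=\mathbf{0}$), and the distance (Eq. 6). The helper names are ours, and the curvature $c$ is passed explicitly:

```python
import math

def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def scale(a, s): return [s * x for x in a]

def mobius_add(x, y, c):
    """Moebius addition (Eq. 1)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * xi + (1 - c * x2) * yi) / den
            for xi, yi in zip(x, y)]

def exp0(v, c):
    """Exponential map at the origin (Eq. 4 with x = 0); assumes v != 0."""
    n = norm(v)
    return scale(v, math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n))

def log0(y, c):
    """Logarithmic map at the origin (Eq. 5 with x = 0); assumes y != 0."""
    n = norm(y)
    return scale(y, math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n))

def dist(x, y, c):
    """Gyrovector-space distance (Eq. 6)."""
    diff = mobius_add(scale(x, -1.0), y, c)
    return (2.0 / math.sqrt(c)) * math.atanh(math.sqrt(c) * norm(diff))
```

At the origin the exponential and logarithmic maps are mutual inverses, and Möbius addition is order dependent, which is exactly the property the acceleration strategy of Section 3 works around.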
In a typical recommendation scenario, we suppose there are $m$ users $\mathcal{U} = \{u_1, u_2, ..., u_m\}$ and $n$ items $\mathcal{V} = \{v_1, v_2, ..., v_n\}$. We define $\mathbf{Y} \in \mathbb{R}^{m \times n}$ as the user-item historical interaction matrix, whose element $y_{ij} = 1$ if user $i$ is interested in item $j$ and zero otherwise. In addition to the interaction matrix $\mathbf{Y}$, we also have a social network $\mathbf{S} \in \mathbb{R}^{m \times m}$ among the users $\mathcal{U}$, whose element $s_{ij} = 1$ if $u_i$ trusts $u_j$ and zero otherwise.

The goal of HSR is to utilize the historical interactions and the social information to predict users' personalized interests in items. Specifically, given the user-item interaction matrix $\mathbf{Y}$ as well as the user social network $\mathbf{S}$, HSR aims to learn a prediction function $\hat{y}_{ij} = \mathcal{F}(u_i, v_j \mid \Theta, \mathbf{Y}, \mathbf{S})$, where $\hat{y}_{ij}$ is the preference probability of user $u_i$ for an item $v_j$ that she has never engaged with before, and $\Theta$ denotes the parameters of the function $\mathcal{F}$.

We now present our method HSR. There are three components in the model: the embedding layer, the aggregation layer, and the prediction layer. Details of each part are described in the following.
The embedding layer takes a user and an item as input and encodes them with dense low-dimensional embedding vectors. Specifically, given one-hot representations of a target user $u_i$ and a target item $v_j$, the embedding layer outputs their embeddings $\mathbf{u}_i$ and $\mathbf{v}_j$, respectively. We learn the user and item embedding vectors in the hyperbolic space $\mathbb{D}^d_c$.

According to social influence theories [2, 3, 23], a user's preference is indirectly influenced by her social friends, so we should utilize social information for better user embedding modeling. Specifically, we devise a social aggregator over the users' trusted neighbors to refine users' embeddings in the hyperbolic space, formulating the aggregation process with two major operations: neighbor aggregation and feature update. The neighbor aggregation stage learns the representation of a neighborhood by transforming and aggregating the feature information from the neighborhood, and then combines the neighborhood representation with the features of the central node.
Neighbor aggregation.
Given a user, neighbor aggregation first aggregates the user's neighbor representations into a single embedding, and then combines the neighborhood representation with the feature of the given user. We can directly utilize Möbius addition to achieve the neighbor aggregation. Denoting $N(i)$ as user $u_i$'s social neighbor set, the neighbor aggregation for user $u_i$ is defined as:
$$\mathbf{u}_{N(i)} = \sum^{\oplus}_{j \in N(i)} \mathbf{u}_j, \qquad \mathbf{u}^{AGG}_i = \mathbf{u}_i \oplus_c \left(\gamma \otimes_c \mathbf{u}_{N(i)}\right), \quad (7)$$
where $\sum^{\oplus}$ denotes accumulation with Möbius addition, and $\gamma$ is a coefficient that controls the social influence.
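Read literally, Eq. (7) accumulates the neighbor embeddings one $\oplus_c$-step at a time. A small sketch (helper names ours; the values of $\gamma$ and $c$ used below are illustrative only), which also exposes that the result depends on the accumulation order:

```python
import math

def dot(a, b): return sum(p * q for p, q in zip(a, b))
def scale(a, s): return [s * p for p in a]

def mobius_add(x, y, c):
    """Moebius addition (Eq. 1)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * xi + (1 - c * x2) * yi) / den
            for xi, yi in zip(x, y)]

def mobius_scalar(r, x, c):
    """Moebius scalar multiplication (Eq. 2), with r (x) 0 = 0."""
    n = math.sqrt(dot(x, x))
    if n == 0:
        return list(x)
    return scale(x, math.tanh(r * math.atanh(math.sqrt(c) * n)) / (math.sqrt(c) * n))

def aggregate(u, neighbors, gamma, c):
    """Eq. (7): u (+) (gamma (x) Moebius-sum of neighbors), accumulated left to right."""
    acc = [0.0] * len(u)
    for v in neighbors:
        acc = mobius_add(acc, v, c)
    return mobius_add(u, mobius_scalar(gamma, acc, c), c)
```

Because Möbius addition is neither commutative nor associative, feeding the neighbors in a different order generally yields a different aggregate, which is why a fixed sequential order is required here.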
Feature update.
In this stage, we further transform the aggregated representation to obtain sufficient representational power:
$$\mathbf{u}'_i = \sigma\left(\mathbf{M} \otimes_c \mathbf{u}^{AGG}_i\right), \quad (8)$$
where $\mathbf{M} \in \mathbb{R}^{d \times d}$ is a trainable matrix and $\sigma$ is the nonlinear activation function, for which we use LeakyReLU [29]. High-order aggregation.
Through a single aggregation layer, a user's representation depends on herself as well as her direct social neighbors. We can further stack more layers to obtain high-order information from multi-hop neighbors. More formally, in the $\ell$-th layer, the representation of user $u_i$ is defined as:
$$\mathbf{u}^{\ell}_i = \sigma\left(\mathbf{M}^{\ell} \otimes_c \left(\mathbf{u}^{\ell-1}_i \oplus_c \left(\gamma \otimes_c \sum^{\oplus}_{j \in N(i)} \mathbf{u}^{\ell-1}_j\right)\right)\right), \quad (9)$$
where $\mathbf{M}^{\ell}$ is the trainable matrix of the $\ell$-th layer and $\mathbf{u}^0_i = \mathbf{u}_i$.

After $L$ aggregation layers, we obtain the final representation $\mathbf{u}^L_i$ of the target user $u_i$. We then feed the user representation $\mathbf{u}^L_i$ and the target item representation $\mathbf{v}_j$ into a function $f$ that predicts the probability of $u_i$ engaging with $v_j$: $\hat{y}_{ij} = f(\mathbf{u}^L_i, \mathbf{v}_j)$. Here we implement the prediction function $f$ as the Fermi-Dirac decoder [22, 31], a generalization of the sigmoid function, to compute the probability score between $u_i$ and $v_j$:
$$\hat{y}_{ij} = \frac{1}{e^{(d_c(\mathbf{u}^L_i, \mathbf{v}_j) - r)/t} + 1}, \quad (10)$$
where $r$ and $t$ are hyper-parameters.

In this subsection, we further improve our method from two aspects.
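The Fermi-Dirac decoder of Eq. (10) is straightforward to sketch; the default values of $r$ and $t$ below are illustrative, not the tuned ones:

```python
import math

def fermi_dirac(d, r=2.0, t=1.0):
    """Eq. (10): map a hyperbolic distance d to a probability in (0, 1).
    r and t are hyper-parameters; the defaults here are illustrative."""
    return 1.0 / (math.exp((d - r) / t) + 1.0)
```

The decoder is monotonically decreasing in the distance: items whose embeddings lie close to the user's embedding get probabilities near 1, distant ones near 0, and the probability is exactly 0.5 at $d = r$.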
Since the Möbius addition operation is neither commutative nor associative [14], we have to calculate the Möbius accumulation in Equation (7) in a fixed sequential order:
$$\mathbf{u}^{AGG}_i = \mathbf{u}_i \oplus_c \left(\gamma \otimes_c \sum^{\oplus}_{j \in N(i)} \mathbf{u}_j\right) = \mathbf{u}_i \oplus_c \left(\gamma \otimes_c \left(\left(\left(\mathbf{u}_{j_1} \oplus_c \mathbf{u}_{j_2}\right) \oplus_c \mathbf{u}_{j_3}\right) \oplus_c \cdots\right)\right). \quad (11)$$
Some active users have many social connections, so the sequential calculation in Equation (11) is seriously time-consuming, which affects the efficiency of HSR. Therefore, it is necessary to devise a faster way to calculate the aggregation.

We resort to the Möbius logarithmic and exponential maps, as illustrated in Figure 1. Specifically, we first utilize the logarithmic map to project the user representations into the tangent space at the origin, then perform the accumulation in the tangent space, and finally project the aggregated representation back to the hyperbolic space with the exponential map:
$$\mathbf{u}^{AGG}_i = \exp^c_{\mathbf{0}}\left(\log^c_{\mathbf{0}}(\mathbf{u}_i) + \gamma \cdot \sum_{j \in N(i)} \log^c_{\mathbf{0}}(\mathbf{u}_j)\right). \quad (12)$$
Different from Equation (7), Equation (12) can be calculated in parallel because the accumulation in the tangent space is commutative and associative, which makes our model more efficient.

Figure 1: Illustration of the acceleration strategy of second-order embedding propagation for the target user: social neighbors are mapped to the tangent space with $\log^c_{\mathbf{0}}(\cdot)$, aggregated there, and mapped back with $\exp^c_{\mathbf{0}}(\cdot)$.
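A sketch of the tangent-space aggregation of Eq. (12) (helper names ours). Because the sum in the tangent space is commutative and associative, the result does not depend on the neighbor order:

```python
import math

def norm(a): return math.sqrt(sum(p * p for p in a))
def scale(a, s): return [s * p for p in a]

def exp0(v, c):
    """Exponential map at the origin (Eq. 4 with x = 0)."""
    n = norm(v)
    return list(v) if n == 0 else scale(v, math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n))

def log0(y, c):
    """Logarithmic map at the origin (Eq. 5 with x = 0)."""
    n = norm(y)
    return list(y) if n == 0 else scale(y, math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n))

def aggregate_tangent(u, neighbors, gamma, c):
    """Eq. (12): map to the tangent space at the origin, sum there, map back."""
    acc = log0(u, c)
    for v in neighbors:
        acc = [a + gamma * b for a, b in zip(acc, log0(v, c))]
    return exp0(acc, c)
```

In a real implementation the tangent-space sum is a single batched tensor reduction, which is what makes this variant parallelizable, unlike the sequential Möbius accumulation of Eq. (11).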
For a user, the social influence strengths of her friends should be different and dynamic. Therefore, we introduce an attention mechanism that assigns non-uniform weights to a user's social neighbors. We denote by $\pi_{ij}$ the social influence strength of user $u_i$'s social neighbor $u_j$ on $u_i$, computed as:
$$\pi_{ij} = \left(\log^c_{\mathbf{0}}(\mathbf{u}_i) \odot \log^c_{\mathbf{0}}(\mathbf{u}_j)\right)^{\top} \tanh\left(\mathbf{W}^{\top}\left[\log^c_{\mathbf{0}}(\mathbf{u}_j), \log^c_{\mathbf{0}}(\mathbf{v}_k)\right]\right), \quad (13)$$
where $[\cdot, \cdot]$ and $\odot$ denote the concatenation operation and the element-wise product of two vectors in the tangent space, and $\mathbf{W} \in \mathbb{R}^{2d \times d}$ is the trainable weight matrix of the attention mechanism. We also employ $\tanh$ as the nonlinear activation function.

We consider the target user, the target item, and the target user's social neighbor when designing the attention mechanism. The first term calculates the compatibility between user $u_i$ and her social neighbor $u_j$, and the second term computes the opinion of the neighbor $u_j$ on the target item $v_k$. Here we simply employ the inner product of the two terms; one could design a more sophisticated attention mechanism.

Figure 2: Illustration of the aggregation operation.
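Eq. (13) together with the temperature softmax of Eq. (15) can be sketched as follows. The inputs are assumed to be tangent-space vectors (i.e., already mapped through $\log^c_{\mathbf{0}}$), and the matrix shape and helper names are our own illustration:

```python
import math

def matvec_T(W, x):
    """Compute W^T x, with W given as a list of rows (shape 2d x d)."""
    d = len(W[0])
    return [sum(W[i][j] * x[i] for i in range(len(W))) for j in range(d)]

def attention_score(tu_i, tu_j, tv_k, W):
    """Eq. (13): (tu_i . tu_j)^T tanh(W^T [tu_j ; tv_k]), all in the tangent space."""
    concat = tu_j + tv_k                         # [.,.] concatenation (length 2d)
    hidden = [math.tanh(h) for h in matvec_T(W, concat)]
    prod = [a * b for a, b in zip(tu_i, tu_j)]   # element-wise product (length d)
    return sum(p * h for p, h in zip(prod, hidden))

def softmax_T(scores, tau):
    """Eq. (15): softmax with temperature tau (max-shifted for stability)."""
    m = max(scores)
    exps = [math.exp((s - m) / tau) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

Lowering $\tau$ sharpens the distribution toward the strongest neighbor, while a large $\tau$ approaches uniform weighting.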
By integrating Equations (12) and (13), the final aggregation rule (illustrated in Figure 2) for each user $u_i$ is:
$$\mathbf{u}^{\ell}_i = \sigma\left(\mathbf{M}^{\ell} \otimes_c \exp^c_{\mathbf{0}}\left(\log^c_{\mathbf{0}}(\mathbf{u}^{\ell-1}_i) + \gamma \cdot \sum_{j \in N(i)} \tilde{\pi}_{ij} \cdot \log^c_{\mathbf{0}}(\mathbf{u}^{\ell-1}_j)\right)\right), \quad (14)$$
where $\tilde{\pi}_{ij}$ is the normalized attention weight, calculated with a softmax function with temperature parameter $\tau$:
$$\tilde{\pi}_{ij} = \frac{\exp(\pi_{ij}/\tau)}{\sum_{j' \in N(i)} \exp(\pi_{ij'}/\tau)}. \quad (15)$$

The complete objective function consists of two parts: the recommendation loss and the social relation loss.
Recommendation Loss.
For the recommendation task, the loss function is defined as:
$$\mathcal{L}_r = -\sum_{(u,i,j) \in \mathcal{D}_r} \left(y_{ui}\log(\hat{y}_{ui}) + (1 - y_{uj})\log(1 - \hat{y}_{uj})\right), \quad (16)$$
where $\mathcal{D}_r$ is the set of training triplets. Denoting by $\mathcal{V}_u$ the set of items that user $u$ has interacted with, $\mathcal{D}_r$ is defined as:
$$\mathcal{D}_r = \{(u, i, j) \mid u \in \mathcal{U} \wedge i \in \mathcal{V}_u \wedge j \in \mathcal{V} \setminus \mathcal{V}_u\}. \quad (17)$$
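A sketch of the triplet cross-entropy of Eq. (16), where each triplet pairs a positive item ($y_{ui}=1$) with a sampled negative ($y_{uj}=0$); the dictionary-based interface is our own simplification:

```python
import math

def rec_loss(triplets, y_hat):
    """Eq. (16): sum of cross-entropy terms over (u, i, j) triplets,
    where i is a positive item (y_ui = 1) and j a sampled negative (y_uj = 0).
    y_hat maps (user, item) pairs to predicted probabilities."""
    loss = 0.0
    for u, i, j in triplets:
        loss += -(math.log(y_hat[(u, i)]) + math.log(1.0 - y_hat[(u, j)]))
    return loss
```

The loss is near zero when positives score close to 1 and negatives close to 0, and grows as the predictions invert.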
To model the user social relations in the social network, we devise a social relation loss. We first define the score (similarity) function for a user pair $(u_i, u_j)$ according to their distance in the hyperbolic space:
$$score(u_i, u_j) = -d_c(\mathbf{u}_i, \mathbf{u}_j). \quad (18)$$
We want observed trusted user pairs to have higher scores than unobserved user pairs in the social network. We utilize the Bayesian personalized ranking (BPR) method to define $\mathcal{L}_s$:
$$\mathcal{L}_s = -\sum_{(u,i,j) \in \mathcal{D}_s} \log \sigma\left(score(u, i) - score(u, j)\right), \quad (19)$$
where $\sigma$ is the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$, and $\mathcal{D}_s$ is defined as:
$$\mathcal{D}_s = \{(u, i, j) \mid u \in \mathcal{U} \wedge i \in N(u) \wedge j \in \mathcal{U} \setminus N(u)\}. \quad (20)$$
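Eqs. (18)-(19) can be sketched end to end for a single triplet: score user pairs by negative hyperbolic distance and apply the BPR objective (helper names ours; curvature passed explicitly):

```python
import math

def mobius_add(x, y, c):
    """Moebius addition (Eq. 1)."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * xi + (1 - c * x2) * yi) / den
            for xi, yi in zip(x, y)]

def dist(x, y, c):
    """Gyrovector-space distance (Eq. 6)."""
    diff = mobius_add([-a for a in x], y, c)
    n = math.sqrt(sum(d * d for d in diff))
    return (2.0 / math.sqrt(c)) * math.atanh(math.sqrt(c) * n)

def score(x, y, c):
    """Eq. (18): similarity of a user pair is the negative hyperbolic distance."""
    return -dist(x, y, c)

def bpr_social_loss(u, pos, neg, c):
    """Eq. (19) for one (u, i, j) triplet: i is a trusted neighbor, j is not."""
    margin = score(u, pos, c) - score(u, neg, c)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss is below ln 2 when the trusted neighbor is embedded closer to the user than the non-neighbor, and rises above it when the ordering is violated.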
We integrate the recommendation loss $\mathcal{L}_r$ and the social relation loss $\mathcal{L}_s$ in an end-to-end fashion through a multi-task learning framework, utilizing $\lambda$ to balance the two terms. The complete loss function of HSR is:
$$\min_{\Theta} \mathcal{L} = \mathcal{L}_r + \lambda \mathcal{L}_s, \quad (21)$$
where $\Theta$ is the total parameter space, including the user embeddings $\{\mathbf{u}_i\}_{i=1}^{|\mathcal{U}|}$, the item embeddings $\{\mathbf{v}_j\}_{j=1}^{|\mathcal{V}|}$, and the weight parameters of the networks $\{\mathbf{M}^{\ell}, \mathbf{W}^{\ell}\}_{\ell=1}^{L}$. Since the Poincaré ball has a Riemannian manifold structure, we utilize Riemannian stochastic gradient descent (RSGD) to optimize our model [4]. Similar to [31], the parameter updates take the form:
$$\theta_{t+1} = \mathfrak{R}_{\theta_t}\left(-\eta_t \nabla_R \mathcal{L}(\theta_t)\right), \quad (22)$$
where $\mathfrak{R}_{\theta_t}$ denotes a retraction onto $\mathbb{D}$ at $\theta_t$ and $\eta_t$ denotes the learning rate at time $t$. The Riemannian gradient $\nabla_R$ can be computed by rescaling the Euclidean gradient $\nabla_E$ with the inverse of the Poincaré ball metric tensor, $\nabla_R = \frac{(1 - \|\theta_t\|^2)^2}{4} \nabla_E$. The training process of HSR is summarized in Algorithm 1.
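A sketch of one RSGD update (Eq. 22). Following the common practice of [31], the Euclidean gradient is rescaled by $(1-\|\theta\|^2)^2/4$ and the retraction is approximated by a Euclidean step followed by a projection back inside the ball; the projection margin is our implementation choice, not specified in the paper:

```python
import math

EPS = 1e-5  # margin keeping parameters strictly inside the unit ball (our choice)

def rsgd_step(theta, euclid_grad, lr):
    """One RSGD update (Eq. 22): rescale the Euclidean gradient by
    (1 - ||theta||^2)^2 / 4, take a Euclidean step as the retraction,
    and project back into the Poincare ball if the step overshoots."""
    sq = sum(t * t for t in theta)
    factor = ((1.0 - sq) ** 2) / 4.0
    new = [t - lr * factor * g for t, g in zip(theta, euclid_grad)]
    n = math.sqrt(sum(t * t for t in new))
    if n >= 1.0:
        new = [t * (1.0 - EPS) / n for t in new]
    return new
```

Note that the rescaling factor shrinks near the boundary, so points deep in the ball move faster than points near the rim, matching the geometry of the metric.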
Algorithm 1: Training algorithm of HSR
Input: Interaction matrix $\mathbf{Y}$; social network $\mathbf{S}$
Output: Prediction function $\mathcal{F}(u, v \mid \Theta, \mathbf{Y}, \mathbf{S})$
  Initialize all parameters in $\Theta$
  repeat
    Draw a mini-batch of $(u, i, j)$ from $\mathcal{D}_r$
    Draw a mini-batch of $(u, i, j)$ from $\mathcal{D}_s$
    Calculate $\mathcal{L}_r$ according to Equations (7) to (17)
    Calculate $\mathcal{L}_s$ according to Equations (18) to (20)
    Calculate $\mathcal{L} \leftarrow \mathcal{L}_r + \lambda \mathcal{L}_s$
    Update the parameters of $\mathcal{F}$ by using RSGD to optimize $\mathcal{L}$
  until $\mathcal{L}$ converges or is sufficiently small
  return $\mathcal{F}(u, v \mid \Theta, \mathbf{Y}, \mathbf{S})$

In this subsection, we introduce the datasets, baselines, evaluation protocols, and the choice of hyper-parameters.

We experimented with four datasets: Ciao, Yelp, Epinion, and Douban. Each dataset contains users' ratings of items and the social connections between users. In the data preprocessing step, we transform the ratings into implicit feedback (denoted by "1") indicating that the user has rated the item positively. Then, for each user, we sample the same number of negative samples (denoted by "0") as positive samples from unwatched items. We also filtered out users with no links in the social network. The statistics of the datasets are summarized in Table 1.

Table 1: Statistical details of the four datasets. "Interactions" means user-item historical records, and "relations" denotes user-friend connections in the social network.
To verify the performance of our proposed method HSR, we compare it with the following state-of-the-art social recommendation methods:
• FM is a feature-enhanced factorization model. Here we utilize the social information as additional input features. Specifically, we concatenate the user embedding, the item embedding, and the average embedding of the user's social neighbors as the input [33].
• DeepFM is also a feature-enhanced factorization model, which combines factorization machines and deep neural networks. We provide the same inputs as for FM [17].
• SoReg models users' social relations as regularization terms to constrain the matrix factorization framework [28].
• TrustSVD extends the SVD++ [21] framework by modeling both user-user social relations and user-item interactions [16].
• DeepSoR uses deep neural networks to learn users' preferences from social networks and integrates these preferences into the PMF [36] framework for recommendation [11].

Epinion: http://alchemy.cs.washington.edu/data/epinions/
Douban: http://book.douban.com
• DiffNet is a graph neural network based social recommender, which designs a layer-wise influence propagation structure for better user embedding modeling [44].
• HOSR is another graph neural network based social recommender, which propagates user embeddings along the social network and designs an attention mechanism to study the importance of different neighbor orders [25].
• ESR is the Euclidean counterpart of HSR, which replaces Möbius addition, Möbius matrix-vector multiplication, and the gyrovector space distance with Euclidean addition, Euclidean matrix multiplication, and Euclidean distance, and removes the Möbius logarithmic and exponential maps.
• HSR is our complete model.

Table 2: The results of AUC and Accuracy in CTR prediction on four datasets. ** denotes the best values among all methods, and * denotes the best values among all competitors.

Method   Ciao            Yelp            Epinion         Douban
         AUC     ACC     AUC     ACC     AUC     ACC     AUC     ACC
FM       0.7538  0.6828  0.8153  0.7574  0.8048  0.7378  0.8304  0.7595
ESR      0.7645  0.6881  0.8330  0.7682  0.8063  0.7405  0.8307  0.7581
We implemented our methods with PyTorch, a Python library for deep learning. Each dataset is randomly split into training, validation, and test sets with a 7 : 1 : 2 ratio. The hyper-parameters were tuned on the validation set by grid search, covering the learning rate $\eta$, the embedding size $d$, the balancing factor $\lambda$, and the temperature $\tau$. For the aggregation layer size $L$, we set $L = 1$ for all datasets and find this sufficiently good; using 2 or more layers can slightly increase the performance but takes much longer to train. The remaining settings, including the batch size $B$, the curvature $c$, the social coefficient $\gamma$, and the Fermi-Dirac decoder parameters $r$ and $t$, are fixed across datasets. We will further study the impact of key hyper-parameters in a following subsection. The best settings for the hyper-parameters of all baselines are reached by either empirical study or following their original papers.

We evaluate our methods in two scenarios: (i) in click-through rate (CTR) prediction, we adopt two metrics, AUC (area under the curve) and Accuracy, which are widely utilized in binary classification problems; and (ii) in top-$K$ recommendation, we use the model obtained in CTR prediction to generate the top-$K$ items. Since it is time-consuming to rank all items for each user in the evaluation procedure, to reduce the computational cost we follow the strategy in [18, 44]: for each user, we randomly sample 500 unrated items and combine them with the positive items in the ranking process. We use Precision@K and Recall@K to evaluate the recommended sets. We repeated each experiment 5 times and report the average performance.
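The sampled-ranking evaluation can be sketched as follows; in the real protocol the candidate set is the user's positive test items plus 500 sampled unrated items, while the toy scores below are ours:

```python
def topk_metrics(scores, positives, k):
    """Rank candidate items by predicted score and compute Precision@K / Recall@K.
    scores: dict item -> predicted score; positives: set of held-out positive items."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = sum(1 for item in ranked[:k] if item in positives)
    precision = hits / k
    recall = hits / len(positives)
    return precision, recall
```

Per-user values computed this way are then averaged over all test users to obtain the reported Precision@K and Recall@K.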
Research shows that data with a power-law structure can be naturally modeled in hyperbolic space [22, 31, 43]. Therefore, we conduct an empirical study to check whether the power-law distribution also exists in the user-item interactions and user-user social relations. For the user-item interactions, we present the distributions of the number of interactions for users and for items, respectively. For the user-user social relations, we present the distribution of the number of social behaviors per user. We show these distributions on the Ciao and Epinion datasets in Figure 3. We observe that they follow the power-law distribution: (i) for the user-item interactions, a majority of users/items have very few interactions while a few users/items have a huge number of interactions; and (ii) for the user-user social relations, a majority of users have very few social behaviors while a few users have a huge number of social behaviors. These findings empirically demonstrate that user-item interactions and user-user social relations exhibit a power-law structure; we therefore believe that hyperbolic geometry is well suited to social recommendation.

Figure 3: Distributions of user-item interactions and user-user social relations on the Ciao (top row) and Epinion (bottom row) datasets: (a)/(d) distribution of users' interactions, (b)/(e) distribution of items' interactions, (c)/(f) distribution of social behaviors.
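The empirical check behind Figure 3 amounts to building a degree histogram and inspecting its tail; a minimal sketch on toy counts (the `tail_heavy` helper and its threshold are our own illustration):

```python
from collections import Counter

def degree_histogram(degrees):
    """Map each degree value (e.g., #interactions per user) to how many nodes have it."""
    return Counter(degrees)

def tail_heavy(degrees, low=2):
    """Fraction of nodes with degree <= low: close to 1 for heavy-tailed data."""
    return sum(1 for d in degrees if d <= low) / len(degrees)
```

On the real datasets, plotting the histogram on log-log axes yields the roughly straight lines characteristic of a power law, as shown in Figure 3.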
Figure 4: The results of Precision@K in top-$K$ recommendation on four datasets (Ciao, Yelp, Epinion, Douban).

Figure 5: The results of Recall@K in top-$K$ recommendation on four datasets (Ciao, Yelp, Epinion, Douban).

Table 2 and Figures 4 and 5 show the performance of all compared methods in CTR prediction and top-$K$ recommendation, respectively (ESR is not plotted in Figures 4 and 5 for clarity). From the results, we have the following main observations:

(i) Deep learning based models (i.e., DeepFM, DeepSoR, DiffNet, and HOSR) generally outperform shallow representation methods (i.e., FM, SoReg, and TrustSVD), which indicates the effectiveness of applying neural components to social recommendation.

(ii) Among the baselines, the graph neural network based social recommenders DiffNet and HOSR achieve strongly competitive performance. This improvement verifies that graph neural networks are powerful at representation learning for graph data, since they integrate users' social information as well as users' topological structure in the social network.

(iii) HSR makes clear improvements over the state-of-the-art baselines in both recommendation scenarios. For the CTR prediction task, HSR consistently yields the best performance on the four datasets. For example, HSR improves over the strongest baselines w.r.t. Accuracy by 3.73%, 3.03%, 1.81%, and 1.49% on the Ciao, Yelp, Epinion, and Douban datasets, respectively. In top-$K$ recommendation, HSR achieves 3.79%, 5.68%, 5.97%, and 2.46% improvement over the strongest baseline w.r.t. Recall@20 on the Ciao, Yelp, Epinion, and Douban datasets, respectively. HSR also achieves better scores than its Euclidean counterpart ESR, which indicates the effectiveness of social recommendation in the hyperbolic space.

The data sparsity problem is a great challenge for most recommender systems. To investigate the effect of data sparsity, we bin the test users into four groups with different sparsity levels based on the number of observed ratings in the training data, while keeping each group containing a similar number of interactions.
For example, [11, 26) in the Ciao dataset means that each user in this group has at least 11 and fewer than 26 interaction records. Figure 6 shows the Accuracy results of the different models on the different user groups of the Ciao and Epinion datasets. From the results, we observe that HSR consistently outperforms the other methods, including state-of-the-art social enhanced methods like DiffNet and HOSR, which verifies that our method maintains a decent performance at different sparsity levels.

Figure 6: Performance comparison over the sparsity distribution of user groups on the Ciao and Epinion datasets.
We explore the impact of three hyper-parameters: the embedding size $d$, the balancing factor $\lambda$, and the temperature $\tau$. The results on the Ciao and Yelp datasets are plotted in Figure 7. We have the following observations: (i) A proper embedding size $d$ is needed: if it is too small, the model lacks expressiveness, while a too large $d$ increases the complexity of the recommendation framework and may overfit the datasets. In addition, we observe that HSR always significantly outperforms ESR regardless of the embedding size, especially in low-dimensional spaces, which shows that utilizing hyperbolic geometry can effectively learn high-quality representations for social recommendation. (ii) For the balancing factor $\lambda$, a moderate value is good enough on the Ciao and Yelp datasets, because a larger $\lambda$ makes the model focus on the social relation modeling task. (iii) For the temperature $\tau$, the performance on the Ciao and Yelp datasets increases as $\tau$ is tuned from 0 up to an empirically optimal value and then drops down afterwards, which indicates that a proper value of $\tau$ can distill useful information for social neighbor aggregation.

Figure 7: Parameter sensitivity (AUC) of HSR and ESR on the Ciao and Yelp datasets w.r.t. (a)/(d) embedding size, (b)/(e) balancing factor, and (c)/(f) temperature.
To explore the effect of our neighbor attention mechanism and the social relation modeling task, we conducted experiments with two variants each of HSR and ESR: (1) HSR-A and ESR-A, which perform a mean operation in Equation (12); and (2) HSR-S and ESR-S, which only model the recommendation task in Equation (21). Table 3 shows the AUC results on the four datasets. From the results, we find that removing any component decreases the recommendation performance of our models. For example, HSR-A performs worse than the complete model HSR, which shows that considering the different influences of social neighbors in the aggregation operation is necessary to model user preference. HSR also achieves better scores than HSR-S, which validates that explicitly modeling social relations helps to improve model performance.
Social networks often present a hierarchical structure [1, 10, 15]. In this case study, we evaluate whether
Table 3: Effect of the attention mechanism and social relation modeling (AUC) on four datasets.

Method  Ciao    Yelp    Epinion Douban
HSR     0.7889  0.8552  0.8295  0.8477
HSR-A   0.7773  0.8490  0.8234  0.8431
HSR-S   0.7784  0.8465  0.8247  0.8426
ESR     0.7645  0.8330  0.8063  0.8307
ESR-A   0.7593  0.8278  0.8002  0.8267
ESR-S   0.7581  0.8295  0.8036  0.8253

our models can reflect the hierarchical structure of social behaviors. In general, the distance between an embedding and the origin can reflect the latent hierarchy of a graph [31, 43]. We utilize the gyrovector space distance and the Euclidean distance to calculate the distance to the origin for HSR and ESR, respectively. We bin the nodes of the user-user social graph into four groups according to their distance to the origin (from near to distant), while keeping each group containing a similar number of nodes; nodes in group 1 are nearest to the origin, while nodes in group 4 are furthest from it. To evaluate the nodes' activity in the social graph, we compute the average number of social behaviors of the nodes in each group. Figure 8 shows the results on the Epinion and Douban datasets. We can see that the average number of interaction behaviors decreases from group 1 to group 4. This indicates that the hierarchy of social behaviors is captured by both HSR and ESR. Compared with ESR, HSR reflects the hierarchical structure more clearly, which indicates that hyperbolic space is more suitable than Euclidean space for embedding data with a hierarchical structure.
Figure 8: Analysis of hierarchical structure in social networks on the Epinion and Douban datasets. Each panel plots the average degree of nodes in groups 1-4 for HSR and ESR ((a) Epinion, (b) Douban).
Attention Analysis.
Benefiting from the attention mechanism, we can visualize the attention weights placed on a user's social neighbors, which reflects what the model has learned. We randomly selected one user u with six friends from the Yelp dataset, and three relevant items v1, v2, v3 (from the test set). Figure 9 shows the attention weights of user u's social neighbors for the three user-item pairs. For convenience, we label neighbor IDs starting from 1, which may not reflect the true IDs in the dataset. From the heatmap, we have the following findings: (i) Not all neighbors contribute equally when generating recommendations. For instance, for the user-item pair (u, v1), the attention weights concentrate on the neighbors who rated item v1 in the training set.

Figure 9: Attention heatmap for the neighbors of three user-item pairs (u, v1), (u, v2), (u, v3) from the Yelp dataset.
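As a rough illustration of how such neighbor-level weights arise (a Euclidean stand-in, not the paper's hyperbolic attention; `user_emb` and `neighbor_embs` are hypothetical inputs), attention weights can be computed as a softmax over user-neighbor scores:

```python
import numpy as np

def neighbor_attention(user_emb, neighbor_embs):
    """Softmax-normalized attention weights over a user's social neighbors,
    scored here by a simple dot product."""
    scores = neighbor_embs @ user_emb  # one score per neighbor
    scores -= scores.max()             # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()     # weights sum to 1
```

Visualizing these weights for each user-item pair, as in Figure 9, shows which neighbors the model attends to.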
RELATED WORK
In this section, we provide a brief overview of two areas that are highly relevant to our work.
Social Recommendation.
With the prevalence of online social media, social relations have been widely studied and exploited to boost recommendation performance. The most common approach in social recommendation is to design loss terms for social influence and integrate both the social loss and the recommendation loss into a unified objective for joint optimization [24, 28, 39]. For instance, SoReg assumes that social neighbors share similar feature representations and devises regularization terms to smooth connected users' embeddings [28]; CSR models the characteristics of social relations and designs a characterized regularization term to improve SoReg [24]. Beyond regularization-based social recommenders, a more direct way is to explicitly model social relations in the predictive model [7, 16, 26]. For example, TrustSVD extends SVD++ [21] by incorporating each user's social neighbors' preferences as auxiliary feedback [16]. Inspired by the immense success of deep learning, some recently proposed social recommenders utilize neural components to enhance recommendation performance [8, 9, 11, 25, 44]. For instance, DeepSoR combines a multi-layer perceptron, which learns users' latent preferences from social networks, with probabilistic matrix factorization [11]; SAMN considers both aspect-level and friend-level differences and utilizes memory networks and attention mechanisms for social recommendation [8]; DiffNet stacks multiple graph convolutional layers and propagates user embeddings along the social network to capture high-order neighbor information [44]. Different from the above social recommenders, which are based on Euclidean representation methods, we propose HSR, a hyperbolic representation model for social recommendation that utilizes hyperbolic geometry to learn high-quality user and item representations in a non-Euclidean space.
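To illustrate the regularization-based approach, a minimal sketch in the style of SoReg (not its exact objective) adds a term that penalizes the squared distance between the embeddings of socially connected users; `user_embs` and `edges` are hypothetical inputs:

```python
import numpy as np

def social_regularization(user_embs, edges, beta=0.1):
    """Social regularization term: beta * sum over social edges (u, f)
    of ||p_u - p_f||^2, which smooths connected users' embeddings."""
    total = 0.0
    for u, f in edges:
        diff = user_embs[u] - user_embs[f]
        total += float(diff @ diff)
    return beta * total
```

This term would be added to the recommendation loss and the combined objective minimized jointly.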
Hyperbolic Representation Learning.
In recent years, representation learning in hyperbolic spaces has attracted increasing attention. [31] embedded hierarchical data into the Poincaré ball, showing that hyperbolic embeddings can outperform Euclidean embeddings in terms of both representation capacity and generalization ability. [32] focused on learning embeddings in the Lorentz model and showed that it leads to substantially improved embeddings. [13] extended Poincaré embeddings to directed acyclic graphs by utilizing hyperbolic entailment cones. [35] analyzed representation trade-offs for hyperbolic embeddings and proposed a novel combinatorial algorithm for embedding learning in hyperbolic space. Researchers have also begun combining hyperbolic embeddings with deep learning. [14] introduced hyperbolic neural networks, which define core neural network operations in hyperbolic space, such as Möbius addition, Möbius scalar multiplication, and the exponential and logarithmic maps. Subsequently, hyperbolic analogues of other algorithms have been proposed, such as Poincaré GloVe [40] and hyperbolic attention networks [46].

Some recent works use hyperbolic representation learning for recommendation [5, 12, 30, 38, 41, 42]. For instance, HyperBPR learns hyperbolic user and item representations and utilizes the BPR [34] framework for collaborative filtering [42]. HME solves the next-Point-of-Interest (POI) recommendation task by projecting check-in data into a hyperbolic space for representation learning [12]. Different from the above literature, we propose HSR, which utilizes hyperbolic geometry for social recommendation. To our knowledge, HSR is the first work that uses hyperbolic space for social recommendation tasks.

CONCLUSION
In this work, we develop a novel method called HSR, which leverages hyperbolic geometry for social recommendation.
The key component of HSR is a hyperbolic aggregator over users' social neighbor sets that takes full advantage of the social information. We further introduce an acceleration strategy and an attention mechanism to improve HSR. Additionally, we devise a social relation modeling task to make the user embeddings reflective of the relational structure of the social network, and train it jointly with the recommendation task. To the best of our knowledge, HSR is the first model to explore hyperbolic space for social recommendation. We conduct extensive experiments on four real-world datasets. The results demonstrate (i) the superiority of HSR compared to strong baselines, and (ii) the effectiveness of social recommendation in hyperbolic geometry.

For future work, we will (i) extend our model to temporal social recommendation to consider users' dynamic preferences, and (ii) try to generate recommendation explanations for comprehending user behaviors and item attributes.
REFERENCES
[1] Aaron Clauset, Cristopher Moore, and Mark E. J. Newman. 2008. Hierarchical structure and the prediction of missing links in networks. Nature.
[2] In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[3] Robert M. Bond, Christopher J. Fariss, Jason J. Jones, Adam D. I. Kramer, Cameron Marlow, Jaime E. Settle, and James H. Fowler. 2012. A 61-million-person experiment in social influence and political mobilization. (2012), 295-298.
[4] Silvere Bonnabel. 2013. Stochastic Gradient Descent on Riemannian Manifolds. IEEE Trans. Automat. Control 58, 9 (2013), 2217-2229.
[5] Benjamin Paul Chamberlain, Stephen R. Hardwick, David R. Wardrope, Fabon Dzogang, Fabio Daolio, and Saúl Vargas. 2019. Scalable Hyperbolic Recommender Systems. CoRR abs/1902.08648.
[6] Ines Chami, Zhitao Ying, Christopher Ré, and Jure Leskovec. 2019. Hyperbolic Graph Convolutional Neural Networks. In Advances in Neural Information Processing Systems 32. 4869-4880.
[7] Allison June-Barlow Chaney, David M. Blei, and Tina Eliassi-Rad. 2015. A Probabilistic Model for Using Social Networks in Personalized Item Recommendation. In Proceedings of the 9th ACM Conference on Recommender Systems. 43-50.
[8] Chong Chen, Min Zhang, Yiqun Liu, and Shaoping Ma. 2019. Social Attentional Memory Network: Modeling Aspect- and Friend-Level Differences in Recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 177-185.
[9] Chong Chen, Min Zhang, Chenyang Wang, Weizhi Ma, Minming Li, Yiqun Liu, and Shaoping Ma. 2019. An Efficient Adaptive Transfer Neural Network for Social-aware Recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 225-234.
[10] Fengjiao Chen and Kan Li. 2015. Detecting hierarchical structure of community members in social networks. Knowledge-Based Systems 87 (2015), 3-15.
[11] Wenqi Fan, Qing Li, and Min Cheng. 2018. Deep Modeling of Social Relations for Recommendation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. 8075-8076.
[12] Shanshan Feng, Lucas Vinh Tran, Gao Cong, Lisi Chen, Jing Li, and Fan Li. 2020. HME: A Hyperbolic Metric Embedding Approach for Next-POI Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1429-1438.
[13] Octavian-Eugen Ganea, Gary Bécigneul, and Thomas Hofmann. 2018. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings. In Proceedings of the 35th International Conference on Machine Learning. 1632-1641.
[14] Octavian-Eugen Ganea, Gary Bécigneul, and Thomas Hofmann. 2018. Hyperbolic Neural Networks. In Annual Conference on Neural Information Processing Systems 2018. 5350-5360.
[15] Frédéric Gilbert, Paolo Simonetto, Faraz Zaidi, Fabien Jourdan, and Romain Bourqui. 2011. Communities and hierarchical structures in dynamic social networks: analysis and visualization. Social Network Analysis and Mining 1, 2 (2011), 83-95.
[16] Guibing Guo, Jie Zhang, and Neil Yorke-Smith. 2015. TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 123-129.
[17] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. 1725-1731.
[18] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web. 173-182.
[19] Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. 2016. Fast Matrix Factorization for Online Recommendation with Implicit Feedback. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. 549-558.
[20] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In Proceedings of the 8th IEEE International Conference on Data Mining. 263-272.
[21] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 426-434.
[22] Dmitri V. Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, and Marián Boguñá. 2010. Hyperbolic Geometry of Complex Networks. CoRR abs/1006.5169.
[23] Kevin Lewis, Marco Gonzalez, and Jason Kaufman. 2012. Social selection and peer influence in an online social network. (2012), 68-72.
[24] Tzu-Heng Lin, Chen Gao, and Yong Li. 2018. Recommender Systems with Characterized Social Regularization. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 1767-1770.
[25] Yang Liu, Liang Chen, Xiangnan He, Jiaying Peng, Zibin Zheng, and Jie Tang. 2020. Modelling High-Order Social Relations for Item Recommendation. CoRR abs/2003.10149.
[26] Hao Ma, Irwin King, and Michael R. Lyu. 2009. Learning to recommend with social trust ensemble. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 203-210.
[27] Hao Ma, Haixuan Yang, Michael R. Lyu, and Irwin King. 2008. SoRec: social recommendation using probabilistic matrix factorization. In Proceedings of the 17th ACM Conference on Information and Knowledge Management. 931-940.
[28] Hao Ma, Dengyong Zhou, Chao Liu, Michael R. Lyu, and Irwin King. 2011. Recommender systems with social regularization. In Proceedings of the 4th International Conference on Web Search and Web Data Mining. 287-296.
[29] Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng. 2013. Rectifier nonlinearities improve neural network acoustic models. In ICML.
[30] Leyla Mirvakhabova, Evgeny Frolov, Valentin Khrulkov, Ivan V. Oseledets, and Alexander Tuzhilin. 2020. Performance of Hyperbolic Geometry Models on Top-N Recommendation Tasks. In Fourteenth ACM Conference on Recommender Systems. 527-532.
[31] Maximilian Nickel and Douwe Kiela. 2017. Poincaré Embeddings for Learning Hierarchical Representations. In Advances in Neural Information Processing Systems 30. 6338-6347.
[32] Maximilian Nickel and Douwe Kiela. 2018. Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry. In Proceedings of the 35th International Conference on Machine Learning. 3776-3785.
[33] Steffen Rendle. 2010. Factorization Machines. In Proceedings of the 10th IEEE International Conference on Data Mining. 995-1000.
[34] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. 452-461.
[35] Frederic Sala, Christopher De Sa, Albert Gu, and Christopher Ré. 2018. Representation Tradeoffs for Hyperbolic Embeddings. In Proceedings of the 35th International Conference on Machine Learning. 4457-4466.
[36] Ruslan Salakhutdinov and Andriy Mnih. 2007. Probabilistic Matrix Factorization. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems. 1257-1264.
[37] Badrul Munir Sarwar, George Karypis, Joseph A. Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International World Wide Web Conference. 285-295.
[38] Timothy Schmeier, Joseph Chisari, Sam Garrett, and Brett Vintch. 2019. Music recommendations in hyperbolic space: an application of empirical bayes and hierarchical Poincaré embeddings. In Proceedings of the 13th ACM Conference on Recommender Systems. 437-441.
[39] Jiliang Tang, Suhang Wang, Xia Hu, Dawei Yin, Yingzhou Bi, Yi Chang, and Huan Liu. 2016. Recommendation with Social Dimensions. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. 251-257.
[40] Alexandru Tifrea, Gary Bécigneul, and Octavian-Eugen Ganea. 2019. Poincaré Glove: Hyperbolic Word Embeddings.
[41] Lucas Vinh Tran, Yi Tay, Shuai Zhang, Gao Cong, and Xiaoli Li. 2020. HyperML: A Boosting Metric Learning Approach in Hyperbolic Space for Recommender Systems. In The Thirteenth ACM International Conference on Web Search and Data Mining. 609-617.
[42] Tran Dang Quang Vinh, Yi Tay, Shuai Zhang, Gao Cong, and Xiao-Li Li. 2018. Hyperbolic Recommender Systems. CoRR abs/1809.01703.
[43] Xiao Wang, Yiding Zhang, and Chuan Shi. 2019. Hyperbolic Heterogeneous Information Network Embedding. In The Thirty-Third AAAI Conference on Artificial Intelligence. 5337-5344.
[44] Le Wu, Peijie Sun, Yanjie Fu, Richang Hong, Xiting Wang, and Meng Wang. 2019. A Neural Influence Diffusion Model for Social Recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 235-244.
[45] Hanwang Zhang, Fumin Shen, Wei Liu, Xiangnan He, Huanbo Luan, and Tat-Seng Chua. 2016. Discrete Collaborative Filtering. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. 325-334.
[46] Yiding Zhang, Xiao Wang, Xunqiang Jiang, Chuan Shi, and Yanfang Ye. 2019. Hyperbolic Graph Attention Network. CoRR abs/1912.03046.
[47] Çaglar Gülçehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter W. Battaglia, Victor Bapst, David Raposo, Adam Santoro, and Nando de Freitas. 2019. Hyperbolic Attention Networks.