Learning Fair Representations for Recommendation: A Graph-based Perspective
Le Wu, Lei Chen, Pengyang Shao, Richang Hong, Xiting Wang, Meng Wang
Le Wu
Hefei University of Technology; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; Intelligent Interconnected Systems Laboratory of Anhui Province
[email protected]

Lei Chen∗
Hefei University of Technology; Intelligent Interconnected Systems Laboratory of Anhui Province
[email protected]

Pengyang Shao
Hefei University of Technology; Intelligent Interconnected Systems Laboratory of Anhui Province
[email protected]

Richang Hong
Hefei University of Technology; Intelligent Interconnected Systems Laboratory of Anhui Province
[email protected]

Xiting Wang
Microsoft
[email protected]

Meng Wang∗
Hefei University of Technology; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; Intelligent Interconnected Systems Laboratory of Anhui Province
[email protected]
ABSTRACT
As a key application of artificial intelligence, recommender systems are among the most pervasive computer-aided systems that help users find potential items of interest. Recently, researchers have paid considerable attention to fairness issues in artificial intelligence applications. Most of these approaches assume independence of instances and design sophisticated models to eliminate sensitive information and facilitate fairness. However, recommender systems differ greatly from these settings, as users and items naturally form a user-item bipartite graph and are collaboratively correlated through the graph structure. In this paper, we propose a novel graph-based technique for ensuring the fairness of any recommendation model. Here, the fairness requirement refers to not exposing a sensitive feature set in the user modeling process. Specifically, given the original embeddings from any recommendation model, we learn a composition of filters that transform each user's and each item's original embeddings into a filtered embedding space based on the sensitive feature set. For each user, this transformation is achieved under adversarial learning over a user-centric graph, in order to obfuscate each sensitive feature between both the filtered user embedding and the subgraph structure of this user. Finally, extensive experimental results clearly show the effectiveness of our proposed model for fair recommendation. We publish the source code at https://github.com/newlei/FairGo.

∗ Lei Chen and Meng Wang are corresponding authors.

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW '21, April 19–23, 2021, Ljubljana, Slovenia
© 2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-8312-7/21/04.
https://doi.org/10.1145/3442381.3450015
CCS CONCEPTS
• Information systems → Collaborative filtering; • Human-centered computing → User models.
KEYWORDS
graph-based recommendation, user modeling, fairness, fair representation learning, fair recommendation
ACM Reference Format:
Le Wu, Lei Chen, Pengyang Shao, Richang Hong, Xiting Wang, and Meng Wang. 2021. Learning Fair Representations for Recommendation: A Graph-based Perspective. In Proceedings of the Web Conference 2021 (WWW '21), April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3442381.3450015
INTRODUCTION

With the information explosion, recommender systems have been widely deployed on most platforms and have penetrated our daily life [5, 17, 21, 32, 35]. These systems shape the news we consume, the movies we watch, the restaurants we choose, the jobs we seek, and so on. While recommender systems can better help users find potentially interesting items, the recommendation results are also vulnerable to biases and unfairness. E.g., current recommendation results have been empirically shown to favor particular demographic groups over others [8, 9]. Career recommendation shows apparent gender-based discrimination even for equally qualified men and women [20]. Ad recommendation results display racial biases between users with similar preferences [27].

As biases in algorithms are ubiquitous in these human-centric artificial intelligence applications, how to evaluate and improve algorithmic fairness to benefit all users has become a hot research topic [14, 25]. Given a specific sensitive attribute, researchers have designed metrics for measuring fairness in supervised settings [6, 14]. These metrics encourage the proportion of sensitive
(a) Data processing step: [diagram omitted; a CF model produces user embeddings from the user-item graph, and first-, second-, and third-order average aggregations of the user-centric graph are used to predict gender, age, and occupation]

(b) Attribute prediction performance:

Model  Input           Gender (AUC)  Age (F1)  Occupation (F1)
PMF    User embedding  0.6615        0.3821    0.1332
PMF    First order     0.6181        0.3569    0.1407
PMF    Second order    0.5102        0.3431    0.1405
PMF    Third order     0.5004        0.3234    0.1289
GCN    User embedding  0.7041        0.4215    0.1485
GCN    First order     0.6804        0.3782    0.1474
GCN    Second order    0.5811        0.3509    0.1418
GCN    Third order     0.5129        0.3449    0.1296
Figure 1: Performance of two recommendation models (PMF [24] and GCN [21]) for sensitive attribute prediction on the MovieLens dataset. After learning user and item embeddings, we extract the l-th user-centric subgraph embedding of each user. The learned l-th order embedding vector is treated as feature input for sensitive attribute prediction. We observe that each l-th order user-centric graph representation is helpful for attribute prediction. Details can be found in the experiments.

attribute values in a protected group classified as positive is identical to that of the unprotected group [7, 14]. Among all debiasing models, fair representation learning has become popular and widely studied due to its simplicity, generality, and the advances of representation learning techniques [2, 3, 7, 33, 37]. These fair representation learning approaches learn data representations that maintain the main task while filtering out any sensitive information hidden in the data representations. The fairness requirements are achieved by specific fairness regularization terms [34, 37, 38], or rely on adversarial learning techniques [12] that try to make the conditional distributions of representations given each sensitive attribute value identical [2, 3, 7, 33].

In this paper, we focus on fair representation learning for fair recommendation, which tries to eliminate sensitive information in the representation learning process [2, 3, 37]. Here, the fairness requirement refers to the fact that recommender systems should not expose any sensitive user attribute, such as gender or occupation. In fact, state-of-the-art recommender systems rely on learning user and item embeddings for recommendation.
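As a concrete illustration of the probing setup in Figure 1, the l-th order user-centric features can be computed by repeated neighbor averaging over the bipartite graph and then fed to an attribute classifier. Below is a minimal numpy sketch; the helper name and the toy graph are illustrative, not the paper's implementation:

```python
import numpy as np

def order_l_embeddings(A, E, L):
    """Return the l-th order neighbor-averaged embeddings, l = 1..L.

    A: (n, n) adjacency weights of the user-item graph.
    E: (n, d) node embeddings from any recommendation model.
    The l-th order representation of a node is the degree-normalized
    average of its neighbors' (l-1)-th order representations.
    """
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                  # avoid division by zero for isolated nodes
    P = A / deg                          # row-normalized averaging operator
    reps, H = [], E
    for _ in range(L):
        H = P @ H                        # one round of neighbor averaging
        reps.append(H)
    return reps

# toy bipartite graph: 2 users, 2 items
R = np.array([[1.0, 0.0],
              [1.0, 1.0]])
A = np.block([[np.zeros((2, 2)), R],
              [R.T, np.zeros((2, 2))]])
E = np.eye(4)                            # 4 nodes, one-hot "embeddings"
feats = order_l_embeddings(A, E, L=2)    # feats[l-1]: probe features of order l
```

Each `feats[l-1]` row would then serve as the feature input of a sensitive-attribute classifier, as in Figure 1(b).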
E.g., the popular latent factor models learn free user and item embeddings for recommendation [24, 26]. Recently, researchers argued that users and items naturally form a user-item bipartite graph structure [31, 32], and neural graph-based models that learn user and item embeddings by injecting the graph structure into the embedding process achieve state-of-the-art recommendation performance [21, 35]. As learning user and item representations has become the key building block of modern recommender systems, we also focus on learning fair user and item embeddings, such that fair representation learning can be integrated into modern recommendation architectures. In other words, the fair recommendation problem turns into learning fair user and item representations, such that no sensitive information can be exposed from the learned embeddings.

In fact, even though the user-item interaction behavior does not explicitly contain any sensitive user information, directly applying state-of-the-art user and item representation learning would lead to sensitive information leakage, due to the widely supported correlation between a user's behavior and her attributes in social theories [18, 30, 31]. E.g., a large-scale study shows that users' private traits (e.g., gender, political views) are predictable from their like behaviors on Facebook. Therefore, a naive idea is to borrow current fairness-aware supervised machine learning techniques to ensure fairness of the user embeddings. This solution alleviates the unfairness of user representation learning to some extent. However, we argue that it is still far from satisfactory due to the uniqueness of the recommendation problem. Most fairness-based machine learning tasks assume independence of entities, and eliminate unfairness for each entity independently without modeling the correlations with other entities.
In recommender systems, users and items naturally form a user-item bipartite graph and are collaboratively correlated in the system. Each user's embedding is not only related to her own behavior, but is also implicitly correlated with similar users' behaviors, and with the user's behavior on similar items. The collaborative correlation between users breaks the independence assumption of previous fairness-based models, and is the foundation of collaborative filtering based recommendation. As such, even though a user's sensitive attributes are eliminated from her own embedding, the user-centric structure may still expose her sensitive attributes and lead to unfairness. To validate this assumption, we show an example of how a user's attribute can be inferred from the local graph structure of this user with state-of-the-art embedding models. It can be observed from Figure 1b that users' attributes are exposed not only through their own embeddings, but also through surrounding neighbors' embeddings. This preliminary study empirically shows that each user's sensitive attributes are also related to the user-centric graph. As users and items form a graph structure, it is important to learn fair representations for recommendation from a graph-based perspective.

To this end, in this paper, we propose a graph-based perspective for fairness-aware representation learning of any recommendation model. We argue that, as the recommendation models deployed in real production environments are diversified and complicated, the proposed model should preferably be model-agnostic. Given a sensitive feature set, our proposed model takes the user and item embeddings from any recommendation model as input, and learns a filter space that obfuscates any sensitive information in the sensitive attribute set while simultaneously maintaining recommendation accuracy.
Specifically, we learn a composition of sensitive attribute filters that transforms each user's and item's original embeddings into a filtered embedding space. As each user can be represented by an ego-centric graph structure, the filters are learned under a graph-based adversarial training process. Each discriminator tries to predict the corresponding attribute, and the filters are trained to eliminate any sensitive information that may be exposed by the user-centric graph structure. Finally, we perform extensive experiments on two real-world datasets with varying sensitive information. The results clearly show the effectiveness of our proposed model for fair recommendation.
Recommendation Algorithms.
In a recommender system, there are two sets of entities: a user set U (|U| = M) and an item set V (|V| = N). Users interact with items to form a user-item interaction matrix R ∈ R^{M×N}. If user u has rated item v, then r_{uv} is the detailed rating value; otherwise r_{uv} = 0. Naturally, we can formulate a user-item bipartite graph G = ⟨U ∪ V, A⟩, where A is formulated based on the rating matrix R as:

\mathbf{A} = \begin{pmatrix} \mathbf{0}^{M \times M} & \mathbf{R} \\ \mathbf{R}^{T} & \mathbf{0}^{N \times N} \end{pmatrix}. \quad (1)

Learning high-quality user and item embeddings has become the building block of successful recommender systems [21, 24, 28, 32]. Let E ∈ R^{D×(M+N)} denote the embeddings of users and items learned by a recommendation encoder Enc: E = Enc(G) = [e_1, ..., e_u, ..., e_v, ..., e_{M+N}]. After that, the predicted preference r̂_{uv} of user u to item v is calculated as the inner product of the corresponding user and item embeddings: r̂_{uv} = e_u^T e_v.

Currently, there are two classes of embedding approaches: classical latent factor models [24, 26] and neural graph-based models [21, 28]. Latent factor models adopt matrix factorization to learn free user and item ID embeddings. In contrast, neural graph-based models iteratively stack multiple graph convolution layers for node embedding in the user-item graph. At each iteration l +
1, each node's embedding is a convolutional neighborhood aggregation of its neighbors' embeddings at layer l. Empirically, these neural graph-based models show better performance by injecting the collaborative signal hidden in the graph into user and item embedding learning [21, 28].

Algorithmic Fairness and Applications.
As machine learning and data mining are widely applied for knowledge discovery to guide automated decision making, there is much interest in discovering, measuring, and ensuring fairness [1, 23, 25]. Among all fairness metrics, group fairness is widely used to measure how the underrepresented group is treated in this process [14]. Current solutions for fairness requirements can be classified into causal-based approaches [16, 19], ranking-based models [1], and fair representation learning models [3, 7, 23, 37]. In this paper, we focus on fair representation learning due to its generality and the recent rapid progress of representation learning techniques. Fair representation learning models either add fairness-based regularization terms [34, 37, 38] to the objective function or rely on adversarial learning to ensure group fairness [3, 23]. Borrowing from the success of GANs [12], adversarial fair representation models have a feature learning module and an additional discriminator that guesses the sensitive information. These two parts play a minimax game, and adversarial upper bounds on group fairness metrics can be achieved [23]. Compared to manually defined fairness regularization terms, adversarial training for fairness shows theoretical elegance, and the learned representations can be transferred to many downstream tasks. Most current fair representation learning has focused on binary supervised tasks. A recent work tackled the problem of learning fair representations from graphs [3]. This approach advanced previous works with state-of-the-art graph embedding based representation learning models, and a composition of discriminators for modeling the correlation of sensitive features [3].
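The encoder-discriminator minimax game described above can be sketched in a few lines of numpy. This is a deliberately tiny illustration (toy data, a linear "encoder", and a logistic discriminator are all assumptions of this sketch, not any specific paper's method): the discriminator ascends its log-likelihood of the sensitive attribute while the encoder descends it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a binary sensitive attribute s leaks into the first feature of X.
n = 200
s = rng.integers(0, 2, n)
X = np.column_stack([s + 0.1 * rng.standard_normal(n),
                     rng.standard_normal(n)])

W = np.eye(2)        # "feature learning" module: z = X @ W
w = np.zeros(2)      # discriminator: p(s=1 | z) = sigmoid(z @ w)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30, 30)))

for _ in range(200):
    Z = X @ W
    # discriminator step: ascend its log-likelihood of the sensitive attribute
    p = sigmoid(Z @ w)
    w += 0.1 * Z.T @ (s - p) / n
    # encoder step: descend the same log-likelihood (the adversarial move)
    p = sigmoid(Z @ w)
    W -= 0.5 * X.T @ ((s - p)[:, None] * w[None, :]) / n

# discriminator accuracy on the encoded features after adversarial training
acc = float((((X @ W) @ w > 0) == s.astype(bool)).mean())
```

With sufficient capacity and training, the encoder should drive the discriminator toward chance-level accuracy on the sensitive attribute.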
However, the graph structure is only utilized for accurate node embedding learning, and fairness is still achieved by independently filtering out each node's sensitive information. E.g., in recommender systems with a user-item bipartite graph, this model may lead to unfairness, as users' sensitive information is exposed by the items they like.
Recommendation Fairness.
In recommender systems, researchers have observed popularity and demographic disparities in current user-centric applications and recommender systems, with different demographic groups obtaining different utility from the recommender systems [8, 9, 20]. Researchers empirically showed that post-processing techniques that improve recommendation diversity can amplify user unfairness [22]. Researchers proposed four new metrics for collaborative filtering based recommendation with a binary sensitive attribute, in order to measure the discrepancy between the prediction behavior for disadvantaged and advantaged users. These fairness metrics are treated as fairness regularization terms for group fairness in recommendation [34]. A fairness-aware tensor-based recommendation approach was proposed that isolates sensitive attributes in the latent factor matrix and regularizes the remaining features to stay away from the sensitive attributes [38]. Instead of directly debiasing results in the model learning process, re-ranking models have also been applied in search and recommendation systems, with well-designed fairness metrics guiding the learning process to mitigate disparity [1, 10]. Some studies tried to find the causal effects of, or design explainable models for, users' behaviors, in order to ensure fairness grounded in causal effects or explainable components [16]. We differ from these recommendation fairness models as we argue that users naturally form a user-item bipartite graph structure, and a user's sensitive information can be exposed by her local graph structure. Therefore, we consider the fairness issue from a graph-based perspective.
FAIRGO MODEL
Most recommender systems are based on embedding models, which can be very complex and time-consuming to train due to the large volume of users and heterogeneous data [5, 35]. Therefore, the user and item embedding learning process is performed offline, and it is nearly impossible to retrain the embedding models from time to time. We attempt to design a model that takes the user and item embeddings from any recommendation model as input, i.e., E, and our goal is to achieve model-agnostic fair representation learning in the filtered space. Here, the fairness requirements refer to a protected or sensitive user attribute set X ∈ R^{K×M} with K sensitive attributes (e.g., gender and age), which are not encouraged to help recommendation. In the following, we introduce our proposed Fair Graph based RecOmmendation (FairGo) model for fairness requirements in recommender systems, followed by a theoretical analysis.
Given the original embedding matrix E and the sensitive attributes X, FairGo designs a combination of K sub-filters as the filter structure F to remove information about the protected user attributes X, such that each node (user node or item node) is filtered from the original embedding space E into a filtered embedding space: F = F(G, E, X), with F = [F_U, F_V] ∈ R^{D×(M+N)}. As there are K sensitive attributes, the filter network F is composed of K sub-filters as F = [F_k]_{k=1}^{K}, with each sensitive attribute k associated with a sub-filter F_k. Then, each entity (user or item) is filtered and represented in the filtered embedding space as:

\mathbf{f}_i = \frac{\sum_{k=1}^{K} \mathcal{F}_k(\mathbf{e}_i)}{K}. \quad (2)
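The filter composition of Eq.(2) and the filtered-space prediction can be sketched as follows (a minimal numpy sketch; the single dense layer with tanh standing in for each sub-filter is an illustrative assumption — the paper's filters can be any network):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 3  # embedding size, number of sensitive attributes

# One sub-filter per sensitive attribute (here: a random dense layer + tanh).
filters = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(K)]

def apply_filters(e):
    """Eq.(2): the filtered embedding is the average of the K sub-filter outputs."""
    return np.mean([np.tanh(Wk.T @ e) for Wk in filters], axis=0)

e_u, e_v = rng.standard_normal(D), rng.standard_normal(D)
f_u, f_v = apply_filters(e_u), apply_filters(e_v)
r_hat = float(f_u @ f_v)   # inner product in the filtered space
```

The same `apply_filters` is applied to every user and item node, so recommendation proceeds unchanged except that it operates on the filtered space.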
Given the filtered embedding space, the predicted preference r̂_{uv} of user u to item v is calculated as:

\hat{r}_{uv} = \mathbf{f}_u^T \mathbf{f}_v. \quad (3)

With the overall filter network structure that filters the original embeddings into a filtered space, we argue that fairness-aware recommender systems need to satisfy two goals: being representative of users' personalized preferences while being fair with respect to the sensitive attributes. On one hand, the filtered embeddings should be representative of users' preferences to facilitate recommendation accuracy. On the other hand, these filtered embeddings should be fair and should not leak any information that correlates with each user's sensitive information.

In this paper, we adopt adversarial training techniques to achieve fairness. Specifically, given the filter networks [F_k]_{k=1}^{K}, there are K discriminator sub-networks. Taking the filtered embedding f_u as input, the k-th sub-discriminator attempts to predict the value of the k-th sensitive attribute. In other words, each sub-discriminator D_k works as a classifier to guess the k-th attribute. The filter network and the discriminator network play the following two-player minimax game with value function V(F, D):

\arg\max_{\mathcal{F}} \arg\min_{\mathcal{D}} V(\mathcal{F}, \mathcal{D}) = V_R - \lambda V_G \quad (4)

= \mathbb{E}_{(u,v,r,x_u) \sim p(\mathbf{E},\mathbf{R},\mathbf{X})} \left[ \ln q_R(r \mid (\mathbf{f}_u, \mathbf{f}_v) = \mathcal{F}(\mathbf{e}_u, \mathbf{e}_v)) - \lambda \ln q_D(x_u \mid (\mathbf{f}_u, \mathbf{p}_u) = \mathcal{F}(\mathbf{e}_u, \mathbf{e}_v)) \right], \quad (5)

where V_R is the log-likelihood of the rating distribution and V_G is the log-likelihood of the predicted attribute distribution. λ is a balance parameter between these two value functions. When λ equals zero, the fairness requirements disappear.

For the rating distribution, we assume it follows a Gaussian distribution, with the mean of the Gaussian being the predicted rating shown in Eq.(3).
Therefore, the value function V_R is modeled as:

V_R = -\sum_{u=1}^{M} \sum_{v=1}^{N} (r_{uv} - \hat{r}_{uv})^2, \quad (6)

where the precision parameter of the Gaussian distribution is omitted, as we can perform a reweighting trick by tuning the balance parameter λ between the two tasks.

Given the sensitive attribute vector x_u, a naive idea is to design the value function based on the current node's embedding as:

V_N = \mathbb{E}_{(u,v,r,x_u)} \sum_{k=1}^{K} x_{uk} \ln \mathcal{D}_k(\mathbf{f}_u). \quad (7)

In fact, the above value function only considers fairness in the filtered embedding space under an independence assumption on users. In recommender systems, users and items form a user-item bipartite graph. For each user u, we use G_u to denote the ego-centric network of user u in the user-item graph G. Specifically, the ego-centric network G_u takes u as the central node and is a local neighborhood network that spans from u. With the ego-centric network G_u of u, the goal towards the fairness requirement is that u's sensitive attributes are not exposed by her local network G_u.

The above Eq.(7) simplifies the user-centric graph G_u to a filtered node-level representation, i.e., f_u. Nevertheless, the independence assumption among users is not well supported in the user-item bipartite graph. In fact, the collaborative correlations between users are the foundation for building recommender systems. In the user-item bipartite graph G, users are correlated through items in the graph structure. The single global representation f_u of each user u may not well capture the local graph structure of this user.
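The two value functions in Eqs.(6)–(7) can be written down directly. In the sketch below (illustrative numpy; the linear-softmax discriminator is an assumption of this sketch), `v_r` is the negative squared rating error and `v_n` is the log-probability the discriminator assigns to a user's true attribute value:

```python
import numpy as np

def v_r(r, r_hat):
    """Eq.(6): negative sum of squared rating errors
    (the Gaussian log-likelihood up to a constant scale)."""
    return -np.sum((r - r_hat) ** 2)

def v_n(f_u, x_u, D_k):
    """One user/attribute term of Eq.(7): the log-probability that the
    discriminator D_k assigns to the user's true attribute value x_u.
    The filter wants this small; the discriminator wants it large."""
    logits = D_k @ f_u
    logits = logits - logits.max()                    # numerical stability
    logp = logits - np.log(np.sum(np.exp(logits)))    # log-softmax
    return logp[x_u]

rng = np.random.default_rng(0)
f_u = rng.standard_normal(8)          # filtered user embedding
D_k = rng.standard_normal((2, 8))     # binary-attribute discriminator (illustrative)
score = v_n(f_u, 1, D_k)
```

Since `v_n` is a log-probability it is always non-positive; adversarial training pushes it toward ln(1/C) for a C-class attribute, i.e., toward an uninformed discriminator.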
Therefore, given the filtered node embedding space, we also seek an ego-centric graph structure representation of each user u:

\mathbf{p}_u = P(G_u, \mathbf{F}) = P(G_u, \mathcal{F}(G, \mathbf{E}, \mathbf{X})), \quad (8)

where P is a structure representation function of the local graph summary of a user, and p_u is the output of the patch network that summarizes user u from her ego-centric graph structure G_u. E.g., it can be an aggregation of a user's up to L-th order neighborhood representations, or it can be implemented with state-of-the-art sophisticated graph representation learning models [36].

Similar to Eq.(7), given the local graph structure summary p_u of each user u, we also employ adversarial training to ensure that each user's sensitive attributes are not exposed by the local graph structure:

V_S = \mathbb{E}_{(u,v,r,x_u)} \sum_{k=1}^{K} x_{uk} \ln \mathcal{D}_k(\mathbf{p}_u). \quad (9)

As such, the fairness requirement is defined under a graph-based adversarial learning process, with each user's ego-centric network summarized by both the node-level value function in Eq.(7) and the graph structure level value function in Eq.(9). Then, the fairness-based value function V_G is a combination of these two parts: V_G = V_N + V_S, where the first part captures node-level fairness and the second part models ego-centric fairness. Now we focus on how to model the ego-centric fairness based value function V_S. In other words, we need to build a summary network p_u for better ego-centric representation.

Weighted Average Pooling.
A simple yet effective implementation of the ego-centric graph summary structure is:

\mathbf{p}_u = P(G_u, \mathbf{F}) = \frac{\sum_{v \in A_u} r_{uv} \mathbf{f}_v}{\sum_{v \in A_u} r_{uv}}, \quad (10)

where p_u is the average filtered embedding of the local first-order neighbors of user u given the graph G_u.

[Figure 2: The overall structure of our proposed FairGo model. The original embedding matrix E is transformed by the filter network F_1, ..., F_K from the original embedding space into the filtered embedding space; discriminators D_1, ..., D_K enforce node-level fairness on f_u and ego-centric level fairness on p_u over the ego-centric network G_u. Image omitted.]

The above pooling technique is adopted for summarizing the first-order user-centric network, i.e., the directly connected neighbors of user u. For modeling the up to L-th higher-order user-centric network, we extend Eq.(10) to aggregate the up to L-th order ego-centric graph structure as:

\mathbf{h}_i^1 = \frac{\sum_{j \in A_i} a_{ij} \mathbf{f}_j}{\sum_{j \in A_i} a_{ij}}, \qquad \forall l \ge 2: \; \mathbf{h}_i^l = \frac{\sum_{j \in A_i} a_{ij} \mathbf{h}_j^{l-1}}{\sum_{j \in A_i} a_{ij}}, \quad (11)

\mathbf{p}_u = \frac{1}{L} \sum_{l=1}^{L} \mathbf{h}_u^l, \quad (12)

where a_{ij} is an edge weight in the edge weight matrix A (Eq.(1)), and A_i is the set of nodes directly connected to node i in this matrix. Eq.(11) calculates each node's l-th order ego-centric graph representation, and Eq.(12) averages each layer's representation as the ego-centric representation.

However, this simple average aggregation fails, as it does not account for the different higher-order graph structures in the modeling process. As illustrated in Figure 1b, as l increases, the l-th order neighbors become more distant from the current user, and the ability of distant neighbors to expose this user's sensitive information becomes smaller compared to that of closer neighbors.

Local Value Aggregation. For each user u, instead of directly modeling the up to L-th order subgraph representation, we argue that her sensitive attributes should not be exposed by any l-th layer user-centric graph structure representation h_u^l. Let V_S^l denote the value function of the l-th subgraph structure:

V_S^l = \mathbb{E}_{(u,v,r,x_u)} \sum_{k=1}^{K} x_{uk} \ln \mathcal{D}_k(\mathbf{h}_u^l). \quad (13)

After that, the subgraph-based value function in Eq.(9) is a combination of the up to L-th order value functions:

V_S = \lambda_1 V_S^1 + ... + \lambda_l V_S^l + ... + \lambda_L V_S^L = \sum_{l=1}^{L} \lambda_l V_S^l, \quad (14)

where λ_l is a balance parameter that needs to be tuned to weight the l-th order value function. The larger λ_l is, the more important the l-th order value function.

Learning based Aggregation.
The above local value aggregation function involves human labor to manually tune the balance parameters λ_l. For each user u, we therefore propose to directly learn the ego-centric representation p_u from the l-th layer representations h_u^l. We adopt deep neural networks to learn the subgraph representation. Specifically, we use a Multilayer Perceptron (MLP) to model the non-linear aggregation of all layers for subgraph representation, as MLPs are universal approximators of complex functions [11]. The learning based aggregation uses an MLP to learn the final ego-centric graph embedding as:

\mathbf{p}_u = MLP(\mathbf{h}_u^1, \mathbf{h}_u^2, ..., \mathbf{h}_u^L), \quad (15)

where the learnable parameters are those of the MLP structure, which can be learned with the other parameters in a unified training procedure.

Please note that one may argue that there are advanced graph embedding models with carefully designed architectures for learning the ego-centric graph representation. As the focus of this paper is not to design more sophisticated graph embedding models, we use a simple yet effective summary network for the ego-centric graph representation, and focus on whether modeling the graph structure is effective for fair representation learning.

THEORETICAL ANALYSIS

In this section, we theoretically analyze the implications of our proposed model. Specifically, in the supplementary material, we show that the overall value function in Eq.(4) can be seen as an independent combination of each sub-discriminator D_k with attribute k. Without loss of generality, we consider the overall value function with regard to the k-th attribute:

V(\mathcal{F}, \mathcal{D}_k) = \mathbb{E}_{(u,v,r,x) \sim p(\mathbf{E},\mathbf{R},\mathbf{X})} \left[ \ln q_R(r \mid \mathcal{F}(G_u, \mathbf{E}, \mathbf{X})) - \frac{\lambda}{K} \ln q_{\mathcal{D}_k}(x_{uk} \mid \mathcal{F}(G_u, \mathbf{E}, \mathbf{X})) \right]. \quad (16)
Since both the rating prediction part and the discriminator rely on the filtered embeddings F = F(G, E, X), which are directly filtered from the original embeddings E of any recommendation model, we define an alternative distribution over the filtered embedding space F as follows:

\hat{p}(\mathbf{f}_u, \mathbf{f}_v, \mathbf{p}_u, r, x) = \int_{\mathbf{e}_u, \mathbf{e}_v} \hat{p}(\mathbf{e}_u, \mathbf{e}_v, \mathbf{f}_u, \mathbf{f}_v, \mathbf{p}_u, r, x) \, d(\mathbf{e}_u, \mathbf{e}_v)
= \int_{\mathbf{e}_u, \mathbf{e}_v} p(\mathbf{e}_u, \mathbf{e}_v, r, x) \, p_{\mathcal{F}}(\mathbf{f}_u, \mathbf{f}_v, \mathbf{p}_u \mid \mathbf{e}_u, \mathbf{e}_v) \, d(\mathbf{e}_u, \mathbf{e}_v)
= \int_{\mathbf{e}_u, \mathbf{e}_v} p(\mathbf{e}_u, \mathbf{e}_v, r, x) \, \delta(\mathcal{F}(G_u, \mathbf{E}, \mathbf{X}) = (\mathbf{f}_u, \mathbf{f}_v, \mathbf{p}_u)) \, d(\mathbf{e}_u, \mathbf{e}_v). \quad (17)

With this alternative distribution over the filtered embedding space in Eq.(17), we rewrite Eq.(16) as:
V(\mathcal{F}, \mathcal{D}_k) = \mathbb{E}_{(\mathbf{f}_u, \mathbf{f}_v, \mathbf{p}_u, r, x) \sim \hat{p}(\mathbf{f}_u, \mathbf{f}_v, \mathbf{p}_u, r, x)} \left[ \ln q_R(r \mid \mathcal{F}(G_u, \mathbf{E}, \mathbf{X})) - \frac{\lambda}{K} \ln q_{\mathcal{D}_k}(x_{uk} \mid \mathcal{F}(G_u, \mathbf{E}, \mathbf{X})) \right]. \quad (18)

After that, we have the following propositions.

Lemma 1.
If the discriminator network has enough capacity, the optimal solution q*_{D_k} is p̂(x_{uk} | f_u, p_u).

Proof. Given the equality constraint on the predicted probability distribution, Σ_x q_{D_k}(x_{uk} | (f_u, p_u)) = 1, we can formulate the Lagrangian dual optimization problem and solve it. We show the details of this proof in the supplementary material. □

Lemma 2.
When λ → ∞, if both the filter F and the discriminator D have enough capacity, and at each step the discriminator and the filter are allowed to reach their optimal values, then the filtered embedding space is conditionally independent of the sensitive attributes.

Proof. In fact, when λ → ∞, V_R disappears from Eq.(4). The proposition can then be easily validated by checking the proofs in Section 4 of the original GAN paper [12]. □

However, the above proposition is too strict, as when λ → ∞ the rating prediction objective V_R is discarded from the proposed model. In the following, we place no restriction on λ, and give a detailed analysis of the objective function in Eq.(4).

Theorem 3. Given enough capacity of the discriminator network, the objective function in Eq.(18) is equivalent to min_F H(R | F) − (λ/K) H(X_k | (F, P)), i.e., minimizing the conditional entropy between the ratings and the filtered embeddings, while maximizing the conditional entropy between the sensitive attribute and the filtered embeddings.

Proof. By substituting the best discriminator from Lemma 1, the objective in Eq.(18) becomes:

\arg\max_{\mathcal{F}} V(\mathcal{F}, \mathcal{D}_k^*) = \mathbb{E}_{(\mathbf{f}_u, \mathbf{f}_v, \mathbf{p}_u, r, x) \sim \hat{p}(\mathbf{f}_u, \mathbf{f}_v, \mathbf{p}_u, r, x)} \left[ \ln q_R(r \mid (\mathbf{f}_u, \mathbf{f}_v)) - \frac{\lambda}{K} \ln \hat{p}(x_{uk} \mid (\mathbf{f}_u, \mathbf{p}_u)) \right] \quad (19)

= -H(\mathbf{R} \mid \mathbf{F}) + \frac{\lambda}{K} H(\mathbf{X}_k \mid (\mathbf{F}, \mathbf{P})), \quad (20)

which is equivalent to min_F H(R | F) − (λ/K) H(X_k | (F, P)). □

By combining Eq.(4) and Theorem 3, we can easily extend the above result to multiple sensitive attributes:

\arg\max_{\mathcal{F}} V(\mathcal{F}, \mathcal{D}) = -H(\mathbf{R} \mid \mathbf{F}) + \frac{\lambda}{K} \sum_{k=1}^{K} H(\mathbf{X}_k \mid (\mathbf{F}, \mathbf{P})). \quad (21)

Therefore, we have the following theorem.

Theorem 4. Given enough capacity of the discriminator network, the objective function in Eq.(4) is equivalent to min_F [H(R | F) − (λ/K) Σ_{k=1}^{K} H(X_k | (F, P))], i.e., minimizing the conditional entropy between the ratings and the filtered embeddings, while maximizing the conditional entropy between each sensitive attribute and the filtered embeddings.

EXPERIMENTS

Datasets.
MovieLens-1M is a benchmark dataset for recommender systems [15]. The dataset contains 6,040 users' 1 million rating records for about 4,000 movies. Users are associated with three attributes: gender (two classes), age (seven classes), and occupation (21 classes). Similar to previous works on fairness-based recommendation [3], we split the historical ratings into training and test sets with a ratio of 9:1.

Lastfm-360K is a music recommendation dataset that contains users' ratings of artists collected from the music website of
Last.fm [4]. The dataset contains about 360 thousand users' 17 million records on 290 thousand artists. We treat the play counts as rating values. As the raw values span a large range, we first preprocess the ratings with a log transformation and then normalize them into the range 1 to 5. Users are associated with a profile including gender (two classes) and age. For the age attribute, we transform ages into three classes. Following many classical recommendation data split approaches, we split the historical ratings into training, validation, and test parts with a ratio of 7:1:2.

Experimental Setup and Evaluation.
Our pipeline involves three steps: pretraining a base recommendation algorithm, applying the proposed
FairGo model for fairness consideration, and finally evaluating the fairness performance. We use the training data to complete the first two steps, with the rating records in the training data as ground-truth preference data and the user attributes in the training data as ground-truth sensitive information. The validation data is used for model parameter tuning. After model training, to evaluate whether sensitive information is exposed by the learned model, similar to many works on fairness models [3, 23], we randomly select 80% of users' attributes as ground truth and train a linear classification model on the learned fair representations; we test its classification accuracy on the remaining 20% of users for fairness evaluation.

For measuring recommendation performance, we use the Root Mean Squared Error (RMSE) metric [17]. For measuring the fairness goal, we calculate the classification performance on the 20% test users. As the binary attribute (i.e., gender) is imbalanced on both datasets, with about 70% males and 30% females, we use the Area Under the Curve (AUC) metric for binary classification performance. For the remaining attributes with multiple values, we use the micro-averaged F1 measure [13]. AUC or F1 thus measures whether the sensitive information is exposed in the representation learning process; smaller values of these classification metrics denote better fairness performance with less sensitive information leakage.

As our proposed model is model-agnostic and can be applied to fair recommendation with multiple attributes, we design several experimental settings for model evaluation. First, we choose two base recommendation models: the free latent embedding model PMF [24] and a state-of-the-art GCN based recommendation model [21].
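The two evaluation metrics just described are straightforward to implement; below is a minimal NumPy sketch (these helper functions are ours, not part of the released FairGo code):

```python
import numpy as np

def rmse(pred, truth):
    """Root Mean Squared Error between predicted and true ratings."""
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def auc(scores, labels):
    """Rank-based AUC for the binary-attribute probe: the probability that
    a random positive user is scored above a random negative one
    (ties count one half)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins / (len(pos) * len(neg)))
```

For the fairness goal, an AUC close to 0.5 (random guessing) indicates that the probe cannot recover the sensitive attribute from the filtered embeddings.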
As this GCN based recommendation model is originally designed with a ranking based loss function, we modify it to a rating based loss function and add the detailed rating values

https://grouplens.org/datasets/movielens/
http://ocelma.net/MusicRecommendationDataset/lastfm-360K.html

Learning Fair Representations for Recommendation: A Graph-based Perspective. WWW ’21, April 19–23, 2021, Ljubljana, Slovenia.

Table 1: Performance on MovieLens-1M. We test performance on both the single attribute setting and the compositional setting with multiple sensitive attributes (denoted as Com.). Smaller values mean better performance.
(Each cell reports RMSE, AUC/F1; "\" marks values not reported for that setting.)

| Sensitive Att. | PMF | GCN | Non-parity | ICML_2019 | FairGo_PMF | FairGo_GCN |
| --- | --- | --- | --- | --- | --- | --- |
| Gen. | 0.8681, 0.6615 | 0.8564, 0.7041 | 0.8621, 0.8428 | 0.9203, 0.5175 | 0.9150, 0.5042 | 0.9068, 0.5042 |
| Age | \, 0.3821 | \, 0.4215 | \ | 0.9203, 0.3420 | 0.9059, 0.3220 | 0.9051, 0.3140 |
| Occ. | \, 0.1332 | \, 0.1485 | \ | 0.9186, 0.1190 | 0.9367, 0.1130 | 0.9069, 0.1070 |
| Com.-Gen. | \, 0.6615 | \, 0.7041 | \ | 0.9191, 0.5389 | 0.9325, 0.5026 | 0.9185, 0.5134 |
| Com.-Age | \, 0.3821 | \, 0.4215 | \ | \, 0.3620 | \, 0.3380 | \, 0.3260 |
| Com.-Occ. | \, 0.1332 | \, 0.1485 | \ | \, 0.1240 | \, 0.1060 | \, 0.1250 |
Table 2: Performance on Lastfm-360K.
(Each cell reports RMSE, AUC/F1; "\" marks values not reported for that setting.)

| Sensitive Att. | PMF | GCN | Non-parity | ICML_2019 | FairGo_PMF | FairGo_GCN |
| --- | --- | --- | --- | --- | --- | --- |
| Gen. | 0.7112, 0.5506 | 0.7034, 0.5696 | 0.7346, 0.6649 | 0.7259, 0.5409 | 0.7096, 0.5428 | 0.7072, 0.5354 |
| Age | \, 0.4695 | \, 0.4716 | \ | 0.7204, 0.4682 | 0.7195, 0.4689 | 0.7061, 0.4672 |
| Com.-Gen | \, 0.5506 | \, 0.5696 | \ | 0.7173, 0.5379 | 0.7081, 0.5347 | 0.7049, 0.5367 |
| Com.-Age | \, 0.4695 | \, 0.4716 | \ | \, 0.4688 | \, 0.4681 | \, 0.4662 |

in the graph convolution process to facilitate our setting. Second, as the sensitive attribute setting varies, we perform experiments in both the single sensitive attribute setting and the compositional setting with multiple sensitive attributes. For example, on the MovieLens-1M dataset we have three single-attribute settings (gender, age, and occupation) and one compositional setting with all three sensitive attributes.
Baselines and Parameter Setting.
We compare our proposed model with the following baselines, including the state-of-the-art recommendation models PMF [24] and GCN [21]. To explicitly model the fairness metrics, we choose a state-of-the-art model that can leverage multiple sensitive attributes, ICML_2019 [3], as a baseline. Besides, we choose a fairness regularization based model, Non-parity [29], as a baseline. Given a binary valued sensitive attribute, Non-parity defines unfairness metrics and incorporates the corresponding unfairness regularization terms into the recommendation objective. Each unfairness metric is based on the average rating predictions of the advantaged group (attribute value 1) and of the remaining group (attribute value 0). As Non-parity is only suitable for binary valued attributes, we apply this baseline to the gender attribute on both datasets.

In practice, in our proposed FairGo model, we choose MLPs as the detailed architecture of each filter and each discriminator. The filtered embedding size is set as D = 64. For the MovieLens dataset, each filter network has 3 layers with hidden layer sizes of 128 and 64, and each discriminator has 4 layers with hidden layer sizes of 16 and 8. For the Lastfm-360K dataset, each filter network has 4 layers with hidden layer sizes of 128, 64, and 32, and each discriminator has 4 layers with hidden layer sizes of 16, 8, and 4. We use LeakyReLU as the activation function. The balance parameter λ in Eq.(4) is set to 0.1 on MovieLens and 0.2 on Lastfm-360K. All parameters are differentiable in the objective function, and we use the Adam optimizer with an initial learning rate of 0.005.

We report the overall results in Table 1 and Table 2. In these two tables, our proposed
FairGo adopts the simple ego-centric graph representation with weighted first-order aggregation in Eq.(10). We have several observations from these tables. First, comparing the two state-of-the-art recommendation models PMF and GCN, GCN achieves better recommendation performance (smaller RMSE values) but exposes more sensitive information (larger classification metric values). This is because GCN directly models the graph structure for embedding learning, which alleviates the sparsity issue but also discovers hidden features that are correlated with the sensitive feature set. Second, we observe that all models that directly filter sensitive information decrease the recommendation performance by 5% to 10%, as we need to eliminate latent dimensions that are useful for rating prediction but may expose the sensitive attributes. Non-parity does not achieve satisfactory performance on these two datasets. We guess a possible reason is that the Non-parity baseline measures the discrepancy of the predicted ratings of the two groups, and does not directly remove sensitive attribute information from the embeddings. Comparing the fairness-aware models, FairGo_GCN considers the correlation of entities from a graph perspective and reaches the best performance for both the rating prediction and the fairness elimination task. As for FairGo_PMF, it has better fairness performance than ICML_2019, but its RMSE is not consistently better, as it shows worse performance in the compositional setting. This is because the base model (i.e., PMF) in FairGo_PMF does not perform as well as the graph based embedding model. Please note that the fairness results on occupation have a large variance. We guess a possible reason is that the occupation attribute is imbalanced and has 21 distinct values; given only 6040 users, the adversary network is hard to train in practice.
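The Non-parity unfairness term discussed above can be sketched as follows; this is our hedged reading of the regularizer in [29, 34], and the exact formulation and weighting used by the baseline may differ:

```python
import numpy as np

def non_parity_penalty(pred_ratings, attribute):
    """Absolute gap between the mean predicted rating of the advantaged
    group (attribute value 1) and the remaining group (attribute value 0)."""
    pred_ratings = np.asarray(pred_ratings, float)
    attribute = np.asarray(attribute)
    gap = pred_ratings[attribute == 1].mean() - pred_ratings[attribute == 0].mean()
    return abs(float(gap))

# Schematically, the regularized objective then reads:
#   total_loss = rating_loss + beta * non_parity_penalty(pred, attr)
# where beta is a hypothetical regularization weight.
```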
For the Lastfm dataset, Table 2 shows a similar overall trend. Therefore, we conclude that our proposed FairGo framework can improve fairness with very little loss of recommendation accuracy. With a more advanced base recommendation model, our proposed FairGo_GCN reaches the best performance for both recommendation and fairness. In the following, we choose FairGo_GCN for detailed analysis, since it shows better recommendation and fairness performance than FairGo_PMF.
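For concreteness, the filters and discriminators used throughout are small MLPs with LeakyReLU activations. The following NumPy forward pass sketches the MovieLens filter shape (64-d input, hidden size 128, 64-d filtered output, per the parameter settings); the random stand-in weights are ours and not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, negative_slope=0.01):
    """LeakyReLU activation, as used in both filters and discriminators."""
    return np.where(x > 0, x, negative_slope * x)

def mlp_forward(x, layer_sizes):
    """Forward pass through an MLP, applying LeakyReLU after each layer."""
    for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(0.0, 0.1, size=(d_in, d_out))  # stand-in weights
        x = leaky_relu(x @ w)
    return x

# MovieLens filter: 64-d original embedding -> 128 -> 64-d filtered embedding.
e_u = rng.normal(size=(1, 64))        # a user's original embedding
f_u = mlp_forward(e_u, [64, 128, 64])  # the filtered embedding
```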
Table 3: Performance of different summary networks for the ego-centric structure on MovieLens-1M, where "value" denotes the local value function aggregation and "learning" denotes the learning based aggregation.
(Each cell reports RMSE, AUC/F1.)

| Sensitive Att. | FairGo_PMF L=1 | FairGo_PMF L=2 (value) | FairGo_PMF L=2 (learning) | FairGo_GCN L=1 | FairGo_GCN L=2 (value) | FairGo_GCN L=2 (learning) |
| --- | --- | --- | --- | --- | --- | --- |
| Gen. | 0.9150, 0.5042 | 0.9082, 0.5045 | 0.9055, 0.5018 | 0.9068, 0.5042 | 0.9070, 0.5065 | 0.9004, 0.5014 |
| Age | 0.9059, 0.3220 | 0.9077, 0.3200 | 0.9001, 0.3160 | 0.9051, 0.3140 | 0.9036, 0.3080 | 0.9045, 0.3100 |
| Occ. | 0.9367, 0.1130 | 0.9332, 0.1150 | 0.9186, 0.1060 | 0.9069, 0.1070 | 0.9079, 0.1090 | 0.9010, 0.1000 |
Table 4: Performance of different summary networks for ego-centric structure on Lastfm-360K.
(Each cell reports RMSE, AUC/F1.)

| Sensitive Att. | FairGo_PMF L=1 | FairGo_PMF L=2 (value) | FairGo_PMF L=2 (learning) | FairGo_GCN L=1 | FairGo_GCN L=2 (value) | FairGo_GCN L=2 (learning) |
| --- | --- | --- | --- | --- | --- | --- |
| Gen. | 0.7096, 0.5428 | 0.7025, 0.5361 | 0.7020, 0.5357 | 0.7072, 0.5354 | 0.7091, 0.5442 | 0.7068, 0.5337 |
| Age | 0.7195, 0.4689 | 0.7082, 0.4678 | 0.7099, 0.4666 | 0.7061, 0.4672 | 0.7015, 0.4691 | 0.7047, 0.4669 |
Performance of different user-centric subgraph modeling.
In this part, we explore the performance under different higher-order graph modeling techniques, focusing on the single-attribute experimental settings. We conduct experiments on the two proposed approaches with a second-order user-centric subgraph: local value function aggregation (Eq.(14)) and learning based aggregation (Eq.(15)). Specifically, the local value function aggregation is computed as $V_S = \lambda_1 V_{S_1} + \lambda_2 V_{S_2}$, where the ratio of the two parameters $\lambda_1 : \lambda_2$ is set to 4:1 in FairGo_PMF and 1:1 in FairGo_GCN. The MLP in the learning based aggregation has two non-linear layers and one linear layer. The results on MovieLens and Lastfm-360K are shown in Table 3 and Table 4. As can be observed from both tables, the learning based aggregation shows the best performance in all settings. The local value function aggregation outperforms first-order neighborhood modeling in most, but not all, settings, since it relies on manual tuning of the balance parameters. Therefore, we empirically conclude that modeling the higher-order graph structure can achieve better fairness results: with the learning based subgraph modeling, our proposed model further improves both recommendation accuracy and fairness. However, modeling the higher-order graph structure also increases runtime and makes model training more difficult.

Please note that when considering the second-order local graph structure, each user's ego-centric graph on average includes 10% of the nodes on MovieLens-1M and about 5000 nodes on Lastfm-360K. If we further increased the layer size to 3, each user's subgraph would largely overlap with the subgraphs of other users. Therefore, we do not report results with three or more layers.

Figure 3: Performance of the statistical parity measure on (a) MovieLens-1M and (b) Lastfm-360K.
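The two ingredients above can be sketched abstractly: a weighted first-order summary of each user's ego-centric graph (in the spirit of Eq.(10)), and the fixed-weight combination of the first- and second-order adversarial value functions (Eq.(14)). Function names and the toy inputs are ours:

```python
import numpy as np

def first_order_summary(neighbor_embeds, rating_weights):
    """Ego-centric graph summary p_u: a rating-weighted average of the
    filtered embeddings of the user's rated items."""
    w = np.asarray(rating_weights, float)
    w = w / w.sum()
    return (w[:, None] * np.asarray(neighbor_embeds, float)).sum(axis=0)

def local_value_aggregation(v_s1, v_s2, lam1=4.0, lam2=1.0):
    """Fixed-weight combination V_S = lam1 * V_S1 + lam2 * V_S2 of the
    first- and second-order value functions (the 4:1 ratio used for
    FairGo_PMF; 1:1 for FairGo_GCN)."""
    return lam1 * v_s1 + lam2 * v_s2
```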
Relation to group fairness.
As there are many fairness metrics, in this part we show the results of our proposed model on group fairness measures. Among group fairness based metrics, statistical parity and equal opportunity are widely used. For attributes with multiple values, we borrow the idea of statistical parity; for attributes with binary values, we use equal opportunity. The concrete formulas of the two group fairness metrics are given in the supplementary material. We show the results of statistical parity and equal opportunity in Figure 3 and Figure 4. In short, our proposed model achieves the best results for the binary gender attribute, and it also reaches the best results on both group based fairness metrics on the Lastfm-360K dataset.

Figure 4: Performance of the equal opportunity measure on (a) MovieLens-1M and (b) Lastfm-360K.
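The equal opportunity measure for the binary gender attribute (its formula is given in Appendix B.2) compares the mean absolute prediction error of the two groups per item; a NumPy sketch with our own helper name:

```python
import numpy as np

def equal_opportunity_binary(pred, truth, is_male):
    """Mean over items of | E_male[|r_hat - r|] - E_female[|r_hat - r|] |,
    where rows index users and columns index items."""
    err = np.abs(np.asarray(pred, float) - np.asarray(truth, float))
    is_male = np.asarray(is_male, bool)
    male_err = err[is_male].mean(axis=0)     # per-item mean error, male users
    female_err = err[~is_male].mean(axis=0)  # per-item mean error, female users
    return float(np.abs(male_err - female_err).mean())
```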
In this paper, we argued that most current fairness based models assume independence of instances and thus cannot be directly applied to the recommendation scenario. To this end, we proposed FairGo, a model that considers fairness from a graph perspective for any current recommendation model. The proposed framework is model-agnostic and can be applied to multiple sensitive attributes. Experimental results on real-world datasets clearly showed the effectiveness of our proposed model. In the future, we would like to explore the potential of our proposed model in domain-specific applications, such as job or education recommendation.
ACKNOWLEDGEMENTS
This work was supported in part by grants from the National Natural Science Foundation of China (Grant No. 61972125, U19A2079, U1936219, 61932009, 91846201) and the CAAI-Huawei MindSpore Open Fund.
REFERENCES
[1] Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H Chi, et al. 2019. Fairness in recommendation ranking through pairwise comparisons. In SIGKDD. 2212–2220.
[2] Alex Beutel, Jilin Chen, Zhe Zhao, and Ed H Chi. 2017. Data decisions and theoretical implications when adversarially learning fair representations. In FAT/ML.
[3] Avishek Joey Bose and William Hamilton. 2019. Compositional fairness constraints for graph embeddings. In ICML. 715–724.
[4] Òscar Celma Herrada et al. 2009. Music recommendation and discovery in the long tail. Universitat Pompeu Fabra.
[5] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In RecSys. 191–198.
[6] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In ITCS. 214–226.
[7] Harrison Edwards and Amos Storkey. 2016. Censoring representations with an adversary. In ICLR.
[8] Michael D Ekstrand, Mucun Tian, Ion Madrazo Azpiazu, Jennifer D Ekstrand, Oghenemaro Anuyah, David McNeill, and Maria Soledad Pera. 2018. All the cool kids, how do they fit in?: Popularity and demographic biases in recommender evaluation and effectiveness. In FAT. 1–15.
[9] Michael D Ekstrand, Mucun Tian, Mohammed R Imran Kazi, Hoda Mehrpouyan, and Daniel Kluver. 2018. Exploring author gender in book rating and recommendation. In RecSys. 242–250.
[10] Sahin Cem Geyik, Stuart Ambler, and Krishnaram Kenthapadi. 2019. Fairness-aware ranking in search & recommendation systems with application to LinkedIn talent search. In SIGKDD. 2221–2231.
[11] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep learning. Vol. 1. MIT Press, Cambridge.
[12] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In NIPS. 2672–2680.
[13] David Hand and Peter Christen. 2018. A note on using the F-measure for evaluating record linkage algorithms. Statistics and Computing 28, 3 (2018), 539–547.
[14] Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. In NIPS. 3315–3323.
[15] F Maxwell Harper and Joseph A Konstan. 2015. The movielens datasets: History and context. TIIS 5, 4 (2015), 1–19.
[16] Aria Khademi, Sanghack Lee, David Foley, and Vasant Honavar. 2019. Fairness in algorithmic decision making: An excursion through the lens of causality. In WWW. 2907–2914.
[17] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
[18] Michal Kosinski, David Stillwell, and Thore Graepel. 2013. Private traits and attributes are predictable from digital records of human behavior. PNAS 110, 15 (2013), 5802–5805.
[19] Matt J Kusner, Joshua R Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. In NIPS. 4066–4076.
[20] Anja Lambrecht and Catherine E Tucker. 2018. Algorithmic bias? An empirical study into apparent gender-based discrimination in the display of STEM career ads. Management Science (2018).
[21] Chen Lei, Wu Le, Hong Richang, Zhang Kun, and Wang Meng. 2020. Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach. In AAAI.
[22] Jurek Leonhardt, Avishek Anand, and Megha Khosla. 2018. User Fairness in Recommender Systems. In WWW. 101–102.
[23] David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. 2018. Learning adversarially fair and transferable representations. In ICML. 3381–3390.
[24] Andriy Mnih and Ruslan R Salakhutdinov. 2008. Probabilistic matrix factorization. In NIPS. 1257–1264.
[25] Dino Pedreshi, Salvatore Ruggieri, and Franco Turini. 2008. Discrimination-aware data mining. In SIGKDD. 560–568.
[26] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI. 452–461.
[27] Latanya Sweeney. 2013. Discrimination in online ad delivery. Queue 11, 3 (2013), 10–29.
[28] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In SIGIR. 165–174.
[29] Honghao Wei, Fuzheng Zhang, Nicholas Jing Yuan, Chuan Cao, Hao Fu, Xing Xie, Yong Rui, and Wei-Ying Ma. 2017. Beyond the words: Predicting user personality from heterogeneous information. In WSDM. 305–314.
[30] Le Wu, Yong Ge, Qi Liu, Enhong Chen, Richang Hong, Junping Du, and Meng Wang. 2017. Modeling the evolution of users' preferences and social links in social networking services. TKDE 29, 6 (2017), 1240–1253.
[31] Le Wu, Peijie Sun, Yanjie Fu, Richang Hong, Xiting Wang, and Meng Wang. 2019. A neural influence diffusion model for social recommendation. In SIGIR. 235–244.
[32] Le Wu, Yonghui Yang, Kun Zhang, Richang Hong, Yanjie Fu, and Meng Wang. 2020. Joint item recommendation and attribute inference: An adaptive graph convolutional network approach. In SIGIR. 679–688.
[33] Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, and Graham Neubig. 2017. Controllable invariance through adversarial feature learning. In NIPS. 585–596.
[34] Sirui Yao and Bert Huang. 2017. Beyond parity: Fairness objectives for collaborative filtering. In NIPS. 2921–2930.
[35] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In SIGKDD. 974–983.
[36] Zhitao Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, and Jure Leskovec. 2018. Hierarchical graph representation learning with differentiable pooling. In NIPS. 4800–4810.
[37] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. Learning fair representations. In ICML. 325–333.
[38] Ziwei Zhu, Xia Hu, and James Caverlee. 2018. Fairness-aware tensor-based recommendation. In CIKM. 1153–1162.
A PROOFS
We give the details of some proofs in Section 4: the correlation between the overall value function (Eq.(4)) and the sub value function (Eq.(16)), and the proof of Lemma 1.
A.1 Correlation between the overall value function (Eq.(4)) with multiple attributes and the sub value function (Eq.(16)) that deals with a single attribute
The overall value function can be written as:

$$V(\mathcal{F},\mathcal{D}) = \mathbb{E}_{(u,v,r,x)\sim p(E,R,X)}\Big[\ln q_R(r|(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u)) - \lambda \ln q_{\mathcal{D}}(x_u|(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u))\Big]$$
$$= \mathbb{E}_{(u,v,r,x)\sim p(E,R,X)}\big[\ln q_R(r|(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u))\big] - \lambda \sum_{k=1}^{K} \mathbb{E}_{(u,v,r,x)\sim p(E,R,X)}\ln q_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u))$$
$$= \frac{1}{K}\sum_{k=1}^{K} \mathbb{E}_{(u,v,r,x)\sim p(E,R,X)}\Big[\ln q_R(r|(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u)) - \lambda K \ln q_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u))\Big], \quad (22)$$

where $(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u) = \mathcal{F}(G_u, E, X)$ is the mapping function from the original embedding space to the filtered embedding space, and $\mathbf{p}_u$ is summarized from the filtered embedding space. Eq.(22) corresponds to Eq.(4) in Section 3. Thus, the overall value function can easily be seen as a combination of the sub value functions of each discriminator $D_k$ with attribute $k$. Without loss of generality, we consider the value function with regard to the $k$-th attribute:

$$V(\mathcal{F}, D_k) = \mathbb{E}_{(u,v,r,x)\sim p(E,R,X)}\Big[\ln q_R(r|(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u)) - \lambda K \ln q_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u))\Big]. \quad (23)$$

Eq.(23) corresponds to Eq.(16) in Section 4. Since both the rating prediction part and the discriminator rely on the filtered embeddings $F = \mathcal{F}(G_u, E, X)$, we define an alternative distribution over the filtered embedding space as follows:

$$\hat{p}(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u,r,x) = \int_{\mathbf{e}_u,\mathbf{e}_v} \hat{p}(\mathbf{e}_u,\mathbf{e}_v,\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u,r,x)\, d(\mathbf{e}_u,\mathbf{e}_v)$$
$$= \int_{\mathbf{e}_u,\mathbf{e}_v} p(\mathbf{e}_u,\mathbf{e}_v,r,x)\, p_{\mathcal{F}}(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u|\mathbf{e}_u,\mathbf{e}_v)\, d(\mathbf{e}_u,\mathbf{e}_v)$$
$$= \int_{\mathbf{e}_u,\mathbf{e}_v} p(\mathbf{e}_u,\mathbf{e}_v,r,x)\, \delta\big(\mathcal{F}(G_u,E,X) = (\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u)\big)\, d(\mathbf{e}_u,\mathbf{e}_v). \quad (24)$$

With this alternative distribution over the filtered embedding space in Eq.(24), Eq.(23) becomes:

$$V(\mathcal{F}, D_k) = \mathbb{E}_{(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u,r,x)\sim \hat{p}}\Big[\ln q_R(r|(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u)) - \lambda K \ln q_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u))\Big]$$
$$= \mathbb{E}_{(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u,r,x)\sim \hat{p}}\Big[\ln q_R(r|\mathcal{F}(G_u,E,X)) - \lambda K \ln q_{D_k}(x_{uk}|\mathcal{F}(G_u,E,X))\Big]. \quad (25)$$

This Eq.
(25) corresponds to Eq.(18) in Section 4. From the above, we can split the case of multiple attributes into independent combinations of single attributes for analysis; thus, the analysis of a single attribute extends naturally to multiple attributes.
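As a sanity check on the entropy equivalences used in Theorems 3 and 4, the identity $\mathbb{E}[-\ln \hat{p}(x|f)] = H(X|F)$ at the optimal discriminator can be verified numerically on a toy discrete joint distribution (the numbers below are made up for illustration):

```python
import numpy as np

# Hypothetical joint distribution p(f, x): 3 filtered-embedding cells
# (rows) and a binary sensitive attribute (columns).
p_joint = np.array([[0.20, 0.10],
                    [0.05, 0.25],
                    [0.30, 0.10]])
p_f = p_joint.sum(axis=1)                 # marginal p(f)
p_x_given_f = p_joint / p_f[:, None]      # optimal discriminator q* = p(x|f)

# Expected negative log-likelihood of the optimal discriminator.
expected_nll = -(p_joint * np.log(p_x_given_f)).sum()

# Conditional entropy H(X|F) = sum_f p(f) * H(X | F = f).
per_cell_entropy = -(p_x_given_f * np.log(p_x_given_f)).sum(axis=1)
h_x_given_f = (p_f * per_cell_entropy).sum()

# Any other discriminator (e.g., the uniform one) has a larger expected
# NLL, so the adversarial term equals H(X|F) exactly at the optimum.
uniform_nll = -(p_joint * np.log(0.5)).sum()
```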
A.2 Proofs of Lemma 1
Lemma 1 (restated). If the discriminator network has enough capacity, the optimal solution $q^*_{D_k}$ is $\hat{p}(x_{uk}|\mathbf{f}_u,\mathbf{p}_u)$.

Proof. We begin with the value function with regard to the $k$-th attribute in the filtered embedding space:

$$V(\mathcal{F}, D_k) = \mathbb{E}_{(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u,r,x)\sim \hat{p}}\Big[\ln q_R(r|\mathcal{F}(G_u,E,X)) - \lambda K \ln q_{D_k}(x_{uk}|\mathcal{F}(G_u,E,X))\Big]. \quad (26)$$

Note that $\mathbf{p}_u$ is an aggregation of $\mathbf{f}_u$ and $\mathbf{f}_v$, and $\mathbf{f}_v$ is irrelevant to the best solution for the discriminator. In the above value function, with fixed embeddings $F$, only the second term $-\lambda K \ln q_{D_k}(x_{uk}|\mathcal{F}(G_u,E,X))$ involves the discriminator. Given the equality constraint on the predicted probability distribution, $\sum_x q_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{p}_u)) = 1$, we obtain the Lagrangian dual optimization problem:

$$L(\alpha(h)) = \sum_h \alpha(h)\Big(1 - \sum_x q_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{p}_u))\Big) - \mathbb{E}_{(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u,r,x)\sim \hat{p}}\, \lambda K \ln q_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{p}_u)). \quad (27)$$

To find the stationary point of $L(\alpha(h))$, we take the partial derivative with respect to $q_{D_k}$ and set it to zero:

$$\frac{\partial L(\alpha(h))}{\partial q^*_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{p}_u))} = -\sum_h \alpha(h) - \frac{\partial\, \mathbb{E}_{(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u,r,x)\sim \hat{p}}\, \lambda K \ln q_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{p}_u))}{\partial q^*_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{p}_u))} = -\sum_h \alpha(h) - \mathbb{E}_{(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u,r,x)\sim \hat{p}}\Big(\frac{\lambda K}{q^*_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{p}_u))}\Big). \quad (28)$$

Setting Eq.(28) to zero and employing the equality constraint $\sum_x q_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{p}_u)) = 1$, we get:

$$\sum_h \alpha(h) = -\lambda K \frac{\sum_r \hat{p}(\mathbf{f}_u,\mathbf{f}_v,\mathbf{p}_u,r,x)}{q^*_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{p}_u))} = -\lambda K \hat{p}(\mathbf{f}_u,\mathbf{p}_u). \quad (29)$$

Substituting Eq.(29) back into Eq.(28) yields the optimal discriminator:

$$q^*_{D_k}(x_{uk}|(\mathbf{f}_u,\mathbf{p}_u)) = \hat{p}(x_{uk}|\mathbf{f}_u,\mathbf{p}_u). \quad (30)$$
□

B DETAILS OF GROUP FAIRNESS RESULTS
In the main content, we measured fairness based on the classification accuracy of each sensitive attribute. In this part, we show the results of our proposed model on group fairness measures.
B.1 Statistical Parity
Among group fairness based metrics, statistical parity is widely used to measure the predicted rating discrepancy for a binary valued sensitive attribute [3, 34]. Correspondingly, we measure the statistical parity of the binary attribute (i.e., gender) in recommendation as $\frac{1}{N}\sum_{v=1}^{N}\big\| E_{u\in male}[\hat{r}_{uv}] - E_{u\in female}[\hat{r}_{uv}] \big\|$. For attributes with multiple values, we borrow the idea of statistical parity and bin users into different groups based on the attribute values. Then, we take the standard deviation of the predicted ratings of each user group to measure statistical parity.
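The statistical parity computation just described can be written as follows (a sketch with our own helper names; pred is a users x items matrix of predicted ratings):

```python
import numpy as np

def statistical_parity_binary(pred, is_male):
    """Mean over items of | E_male[r_hat] - E_female[r_hat] |."""
    pred = np.asarray(pred, float)
    is_male = np.asarray(is_male, bool)
    male_mean = pred[is_male].mean(axis=0)    # per-item mean over male users
    female_mean = pred[~is_male].mean(axis=0)
    return float(np.abs(male_mean - female_mean).mean())

def statistical_parity_multi(pred, group_ids):
    """For multi-valued attributes: standard deviation over attribute
    groups of the per-item mean predicted rating, averaged over items."""
    pred = np.asarray(pred, float)
    group_ids = np.asarray(group_ids)
    group_means = np.stack([pred[group_ids == g].mean(axis=0)
                            for g in np.unique(group_ids)])
    return float(group_means.std(axis=0).mean())
```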
B.2 Equal Opportunity
Besides statistical parity, equal opportunity is also a widely used group fairness metric. Equal opportunity advances statistical group fairness by considering the parity of prediction accuracy across groups [14]. It measures the group fairness of a binary attribute (i.e., gender) as:

$$\frac{1}{N}\sum_{v=1}^{N}\Big\| E_{u\in male}\big[\,|\hat{r}_{uv} - r_{uv}|\,\big] - E_{u\in female}\big[\,|\hat{r}_{uv} - r_{uv}|\,\big] \Big\|.$$

For attributes with multiple values, we follow the idea of equal opportunity for binary attributes and take the standard deviation of the equal opportunity of each user group to measure group fairness. We only list our proposed FairGo under GCN, as it shows better performance than under PMF. Besides, the performance of Non-parity is only calculated for the binary attributes. As shown in Figure 3 and Figure 4, our proposed model achieves the best results for the binary gender attribute, and it also reaches the best results for the two group based fairness metrics on the Lastfm-360K dataset. However, our proposed model does not perform the best for sensitive attributes with multiple values on MovieLens under the equal opportunity metric. We guess a possible reason is that the adversarial training process of