"And the Winner Is...": Dynamic Lotteries for Multi-group Fairness-Aware Recommendation
Nasim Sonboli, Robin Burke, Nicholas Mattei, Farzad Eskandanian, Tian Gao
NASIM SONBOLI, ROBIN BURKE,
University of Colorado, Boulder, USA
NICHOLAS MATTEI,
Tulane University, USA
FARZAD ESKANDANIAN,
DePaul University, USA
TIAN GAO,
IBM Watson Research Center, USA
As recommender systems are being designed and deployed for an increasing number of socially-consequential applications, it has become important to consider what properties of fairness these systems exhibit. There has been considerable research on recommendation fairness. However, we argue that the previous literature has been based on simple, uniform, and often uni-dimensional notions of fairness that do not recognize the real-world complexities of fairness-aware applications. In this paper, we explicitly represent the design decisions that enter into the trade-off between accuracy and fairness across multiply-defined and intersecting protected groups, supporting multiple fairness metrics. The framework also allows the recommender to adjust its performance based on the historical view of recommendations that have been delivered over a time horizon, dynamically rebalancing between fairness concerns. Within this framework, we formulate lottery-based mechanisms for choosing between fairness concerns, and demonstrate their performance in two recommendation domains.
ACM Reference Format:
Nasim Sonboli, Robin Burke, Nicholas Mattei, Farzad Eskandanian, and Tian Gao. 2020. “And the Winner Is...”: Dynamic Lotteries for Multi-group Fairness-Aware Recommendation. 1, 1 (September 2020), 17 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
In addition to the core property of accurate personalization – delivering results that match user interests and preferences – recommender systems may need to satisfy other, non-accuracy, constraints in certain applications. One property of interest that has received significant attention recently is fairness: a constraint that a recommender system should try to distribute its benefits fairly across different stakeholder groups [7, 13, 20, 28, 33, 54]. For example, all else being equal, a job recommender system should not recommend executive jobs to male users and clerical jobs to female users. Key stakeholder groups in recommender systems are often identified as consumers, individuals who receive recommendations from the systems; providers, stakeholders for the items that are being recommended; and the system or platform. Fairness concerns may arise from either consumers or providers [12]. In the job recommendation case above, it was fairness across groups of consumers that was of interest. In this paper, we focus on the problem of provider fairness: namely, how to ensure that a recommender system, over time, is recommending items from protected groups in a fair manner relative to others. We are interested in a multi-aspect version of this problem, where items may be associated with multiple, intersecting, protected groups. Our motivating example is drawn from the peer-to-peer micro-lending platform, Kiva.org. The users of Kiva are lenders, who support entrepreneurs, usually from developing countries, by lending small amounts. The organization has
the goal of providing equitable access to capital across different regions, economic sectors, and borrower demographics. This organizational mission needs to be embedded in any system deployed to recommend loans to funders. Without some control over the characteristics of the recommendations delivered, it is easy to imagine that a positive feedback loop [50] could develop in which some types of loans are increasingly disadvantaged by the algorithm.

In general, we may anticipate that a fairness-aware recommender system will need to respond to multiple fairness concerns simultaneously. In this work, we adopt a social choice perspective [9] on balancing different fairness concerns, which gives us a rich set of normative properties and algorithms. Social choice is fundamentally concerned with combining preferences from multiple parties into a single outcome in which all parties participate, voting being a paradigmatic example of such a choice [59]. We can think of each fairness concern as a kind of actor, with preferences over which recommendations should be delivered. Combining the preferences of multiple such concerns fits squarely into the social choice realm.

We believe that social choice is a more flexible and realistic framework for representing fairness-aware issues in machine learning than the optimization frameworks typically employed.

Authors’ addresses: Nasim Sonboli, Robin Burke, University of Colorado, Boulder, Boulder, CO, USA, [email protected]; Nicholas Mattei, Tulane University, New Orleans, LA, USA; Farzad Eskandanian, DePaul University, Chicago, IL, USA, [email protected]; Tian Gao, IBM Watson Research Center, Yorktown Heights, NY, USA, [email protected]. Manuscript submitted to ACM
Social choice is inherently multi-agent, and therefore the idea of the integration of multiple fairness concerns naturally emerges, rather than being a complex add-on. Importantly for recommendation problems, social choice naturally allows for heterogeneity and hence personalization across decision instances, since a user is just another agent with preferences over the outcome. Finally, fairness is inherently a social and political construct, and a social choice formalization allows the preferences of different actors to be foregrounded, rather than relegated to the black box of machine learning optimization. The study of fairness has a long history in the social choice literature [55, 59].
Contribution.
In this paper we propose a novel framework for recommender systems that we call
Social Choice for Re-ranking Under Fairness (SCRUF). SCRUF uses multiple fairness metrics to evaluate the history of recommendation delivery and to determine whether and how to adjust its performance. It uses feature-specific re-rankers to improve fairness, selecting one re-ranker (possibly non-deterministically) at each time point. We use a set of social-choice-inspired algorithms to allocate re-rankers to users based on the users' preferences. This framework abstracts the particular fairness metrics away from the recommendation algorithm design; embedding such metrics directly in the algorithm becomes unwieldy when attempting to incorporate multiple fairness concerns. It also supports a dynamic balance between personalization and fairness and is therefore sensitive to context and individual differences. We demonstrate the efficacy of our design on two recommendation domains.
A substantial body of research on fairness in machine learning, especially in classification settings, has emerged in the past ten years, including formalizing definitions of fairness [16, 17, 22, 38] and offering algorithmic techniques to mitigate unfairness [24, 40, 56, 57]. Fairness in recommender systems emerged as a research topic more recently, first in the work of Kamishima et al. in 2012 [25], and the topic has since drawn increased attention [12, 13, 18, 19, 27, 36, 46, 54]. Recommender systems, while a subclass of machine learning systems, are different enough that the results from classification cannot be readily applied. Chief among these challenges is the issue of personalization. A recommender system is supposed to deliver suggestions tailored to each user's preferences, providing every user with a different experience. As such, it differs from a classifier that establishes a classification function with a single decision boundary for all cases.
Both in the machine learning and the recommender systems formulations of fairness, there has been little recognition of the intersection of multiple fairness definitions and dimensions, although recent work has noted the benefits of combining multiple fairness definitions [7]. Most existing research considers only a single protected class, and even in cases where multiple groups are considered, as in [11, 23, 29, 58], fairness is conceived the same way for all groups. In recognition of the complexity of the fairness concept, we seek to accommodate different definitions of fairness put forward by different stakeholders, all of which must be integrated in a single framework. This nuanced understanding of the value of fairness is essential for capturing the richness of this social construct in real-world settings such as those studied by scholars of organizational justice, anti-discrimination law, and social justice.

Two standard approaches have emerged for integrating fairness concerns into recommender systems. The integrated approach builds a fairness constraint into the recommendation model itself, for example as a regularization constraint, balancing between accuracy and fairness in the optimization process for the recommendation model [26, 54]. The re-ranking approach applies fairness to the output of a recommendation algorithm, reordering the results. Re-ranking approaches offer a number of advantages. First, the trade-off between accuracy and fairness can be tuned without re-learning the recommendation model. Second, researchers have found that re-ranking can achieve better trade-offs between fairness and accuracy than integrated models [2, 19, 33]. Due to these advantages, we adopt the re-ranking approach.

There has been some recent work in recommendation fairness that incorporates multiple fairness dimensions, most notably [48].
This work integrates user tolerance towards different types of variation in item features with a representation of protected groups that spans multiple dimensions. The authors were able to show a beneficial trade-off between fairness and accuracy and improved results across different categories of protected groups. One drawback of the method of [48] is that it relies on weights associated with item features to boost the inclusion of protected group items into recommendation lists. Balancing across different groups requires careful setting of these weights, and sometimes unexpected interactions arise.

In addition, this and similar methods are list-wise approaches, which aim to increase protected group representation in individual lists. However, as noted above, the real objective of fairness-aware recommendation is to enhance fairness as measured historically, across the behavior of the system as it provides recommendations to many users over time. The personalization element of recommendation means that this objective cannot be targeted directly: any given user may or may not constitute a good opportunity to enhance fairness relative to a particular protected group. In [48], this aspect of the problem was addressed by incorporating user-specific weighting, which can be interpreted as a preference in a social choice setting.

Analyzing user characteristics enables the system to determine which users constitute good opportunities to pursue different fairness goals. However, there is another side of the problem. At any point in time, the system's historical performance may have been more favorable to one protected group than another. If we only look at the users, we ignore the signal from past performance about where fairness needs are most critical.
Fairness has been extensively studied in the economic field of social choice, including work focusing on the division of continuous resources such as land or water [37], on more discrete, indivisible settings such as goods and services [52, 53], and more fundamentally in the areas of political economy having to do with justice and the fair distribution of resources to individuals [41, 42, 55]. In its classical formulation, social choice concerns itself with the study of how groups, where each member is endowed with their own preferences, make decisions that must then be shared by
that group [47]. To these considerations the field of computational social choice adds computational tools including algorithms, complexity, and big data [9, 35].

From the literature on social choice we will focus on the allocation setting (which is a generalization of the classical matching setting) [9]. In allocation, the items within A are to be distributed, or allocated, to the set of agents in N. Hence, the social part of social choice reinforces the idea that a set of preferences needs to be considered and combined, since the outcome of a social choice process will affect all the agents. There have been many practical applications of matching and allocation, from kidney allocation [44] to conference paper reviewing [31]. There are extensive studies of algorithms for a variety of settings [34], and the study of fair allocations in multi-agent systems is a popular topic in the broad area of artificial intelligence [4]. Equity and other concerns, formalized as economic axioms, have a long history in social choice, both in allocation [55] and in voting [59]. It is this long history of studying the axioms, or properties, of these algorithms, including aspects such as fairness, that we hope to leverage.

Rather than integrating the concerns of protected groups into re-ranking decisions indirectly, in the form of weights for particular feature values as in [48], our approach, Social Choice for Re-ranking Under Fairness (SCRUF), conceives of both users and protected groups as actors with preferences over the items that may be recommended. The goal is to achieve an integration of these preferences over the whole recommendation history. We explore a class of solutions to this problem that assumes multiple fairness criteria can be pursued at the same time by deciding which objectives to address and which users to address them with.
We make this choice non-deterministically, taking into account both the current state of the recommendation history, which we can think of as defining immediate needs, and the user's propensity towards different item categories, which defines current opportunities, in the context of providing personalized recommendation results.

A social choice perspective on recommendation has emerged in recent research as a possible source of methods to integrate the viewpoints of multiple agents or priorities [14]. Chakraborty et al. [15] build a recommendation system for finding fair group recommendations by viewing them as elections between various signals of popularity, leading to a shared group recommendation. This does not take into account the important aspect of personalization that we address in SCRUF. Sühr et al. [49] explore driver assignment in two-sided matching markets with an emphasis on producer and rider fairness. Patro et al. [39] propose a recommender system for two-sided matching markets with the goal of fair exposure amongst producers. This differs from our work in that the fairness metrics are fixed and embedded into the matching algorithms themselves. Finally, Lee et al. [30] propose a system that uses social choice to embed normative properties for algorithmic governance into the algorithms themselves, as demonstrated on a food bank matching scenario. However, none of this research considers multiple fairness concerns on the provider side of a recommendation system, as required in the Kiva case, or the dynamic response to historical fairness outcomes, as embodied in SCRUF.
To leverage the power of recommender systems for personalized fairness, we define a set of choice functions to promote fairness. We first detail the formal notation for our recommender system and how to view it as a social choice problem. We then describe our overall system in terms of choice functions and how these can be used to promote fairness.
Table 1. Set of Potential Loans.

     F1: Region    F2: Gender   F3: Sector    F4: Amount
v1   Africa        Male         Agriculture   $0-$500
v2   Africa        Female       Health        $500-$1,000
v3   Middle-East   Female       Clothing      $0-$500
In a recommendation system setting we have a set of users U = {u_1, ..., u_n} and a set of items V = {v_1, ..., v_m}. For each item v_i ∈ V we have a k-dimensional feature vector v⃗_i = ⟨f_{i1}, ..., f_{ik}⟩ over a set of categorical features F = {F_1, ..., F_k}, where each feature F_i has finite domain D_i. We assume that all elements in V share the same set of features. Consider a running example of a funding site that shows micro-loans in emerging markets to potential funders. In this example we have m = 3 loans and k = 4 features, F = {Region, Gender, Sector, Amount}, where each feature has its own domain. For example, D_1 = {Africa, Middle-East, India}. This setting is illustrated in Table 1.

Though our items are described by a set of features, we start with the view that not all features should be treated the same. We assume that there is a subset of the features S ⊆ F that are denoted as sensitive, and that there is a subset of values for each such feature, i.e., P_i ⊆ D_i, that constitute the protected values of the sensitive feature. That is, given all items in the recommendation system, we have a subset of sensitive features, each of which may contain protected values. Turning back to our running example, we may wish to target F_1 = Region as a sensitive feature and the values P_1 = {Africa, India} as the protected values. We could designate sensitive features and protected values based on operational goals, such as regions or genders that are funded less frequently.

For our recommender system we have a personalized ranking function R(u_i, V) → σ_i(V), which, given user u_i and set of items V, produces a permutation, i.e., a ranking, over the set of items for that user, i.e., a recommendation. As a practical matter, the recommendation results will always contain a subset of the total set of items, typically the head (prefix) of the permutation σ_i up to some cutoff number of items.
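To make the notation concrete, the item model just described can be sketched as below. The `Item` class, the `SENSITIVE` table, and the `is_protected` helper are illustrative names of ours, not part of the SCRUF implementation.

```python
from dataclasses import dataclass

@dataclass
class Item:
    # Feature name -> categorical value, e.g. {"Region": "Africa"},
    # mirroring the feature vector over F = {Region, Gender, Sector, Amount}.
    features: dict

# Sensitive features S and their protected values P_i from the running example.
SENSITIVE = {"Region": {"Africa", "India"}}

def is_protected(item: Item, feature: str) -> bool:
    """True if the item carries a protected value of the given sensitive feature."""
    return item.features.get(feature) in SENSITIVE.get(feature, set())

v1 = Item({"Region": "Africa", "Gender": "Male",
           "Sector": "Agriculture", "Amount": "$0-$500"})
print(is_protected(v1, "Region"))  # True: Africa is a protected Region value
```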
In order to promote fairness, we assume that we are also given a set of re-ranking functions K = {κ_1, ..., κ_|S|}, one for each sensitive feature. For feature j, the re-ranking function κ_j(σ) → σ′ will take a permutation σ and produce a new permutation σ′ of the set of items that is more “fair” towards the particular protected feature values associated with S_j. In real applications, the final recommendation slate is a short list of the most preferred items from this final, re-ranked permutation.

For our system we assume a common form for all re-ranking functions, where the permutation is achieved by sorting items based on a score, and the score is a linear combination of the score from the recommender system (the determiner of the original σ ranking) and a score based on the presence of the protected feature, such that protected group items are moved up in the ranking list [3]. The scoring function ρ for user u, an item v, and a sensitive feature j is defined as follows:

ρ(u, v, j) ≜ λ_j · R(u, v) + (1 − λ_j) · 1{v ∈ S_j}    (1)

The indicator function 1{v ∈ S_j} has the value 1 if the item v has a protected value of sensitive feature S_j, and 0 otherwise. λ_j is a feature-specific parameter that controls the trade-off between accuracy (as represented by the original
σ ranking) and fairness (as represented by the boost given to protected items). All of the items in the list σ are re-scored using ρ, sorted in decreasing order, and truncated to produce the final σ′ recommendation list.

A wide variety of metrics has been proposed for measuring the fairness of a recommendation result or set of recommendation results. In our setting we are not concerned with the fairness of a particular recommendation but rather with the history of recommendations the system has generated within some time window. Hence we track the prior history of recommendation lists that have been generated, L⃗ = [ℓ_1, ..., ℓ_{t−1}], for the users (with a slight abuse of notation) U⃗ = [u_1, ..., u_{t−1}] that have appeared to the system.

Rather than commit to one particular metric in our system, we assume a family of functions M_j : L⃗ × U⃗ → ℝ, one for each sensitive feature S_j, mapping from a set of recommendation results L⃗ and the set of users U⃗ to whom each of those results has been delivered, to a value indicating the degree of fairness in the total set of results. We assume that higher M_j values indicate a fairer result. Without loss of generality, we assume that each metric has values in the range [0, 1].

We will assume that the re-ranking functions have a non-decreasing impact on their associated fairness metric. That is, given a recommendation result σ, M_j(σ, u) ≤ M_j(κ_j(σ), u). Because of this property, we can interpret a fairness score as indicating the relative number of times we want to select the different re-ranking functions.
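The linear re-scoring of Eq. (1), followed by the sort and truncation, can be sketched as follows. This is a minimal sketch of ours, not the SCRUF code: `base_score` stands in for the recommender's R(u, v), and the tuple encoding of items is an assumption.

```python
def rescore(base_score: float, protected: bool, lambda_j: float) -> float:
    # rho(u, v, j) = lambda_j * R(u, v) + (1 - lambda_j) * 1{v in S_j}
    return lambda_j * base_score + (1.0 - lambda_j) * (1.0 if protected else 0.0)

def rerank(scored_items, lambda_j: float, cutoff: int = 10):
    """scored_items: list of (item_id, base_score, is_protected) tuples.
    Returns item ids sorted by the fairness-adjusted score, truncated."""
    rescored = [(i, rescore(s, p, lambda_j)) for i, s, p in scored_items]
    rescored.sort(key=lambda pair: pair[1], reverse=True)  # decreasing order
    return [i for i, _ in rescored[:cutoff]]
```

For instance, with λ_j = 0.5 a protected item with base score 0.3 re-scores to 0.65 and outranks an unprotected item with base score 0.9 (re-scored to 0.45), illustrating how the boost trades accuracy for protected-group exposure.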
If the metrics were all equal, then the different re-ranking functions would be equally desirable. Note that the inclusion of U⃗ as an argument to the M_j functions allows us to include a family of fairness metrics that are sensitive to the user's level of interest in items that vary on different feature dimensions. Each recommendation result is then evaluated relative to the user to whom it is delivered. For example, even if our recommendation history tells us we should be favoring loans in the textile sector, it may not be as valuable to recommend such loans to an agriculture-focused user, as opposed to a user who has proved to be more flexible in which sectors they support.

To incorporate the social choice aspects of the problem, user preferences over both the overall set of items as well as preferences about the re-ranking functions themselves need to be taken into account. In a traditional social choice setting we have a finite set of agents N = {1, ..., n} and a finite set of alternatives A = {1, ..., m}. Each agent i ∈ N has a preference ≿_i over the alternatives. Typically these preferences are expressed as a binary relation (weak or linear order) over the set of alternatives A.

While the user preferences are handled by the personalized ranking function R(u_i, ·), we will also incorporate preferences over the fairness functions themselves. To this end, replacing the preferences ≿_i above, we assume that for each user we are also given a vector of real numbers, τ⃗_{u_i} = ⟨τ_1, ..., τ_k⟩, of length k, where τ_j indicates the tolerance (preference) of u_i for variation relative to feature F_j. We can then view our problem as one of allocating re-ranking functions to users based both on their preferences and on the current fairness status.

The concept of personalized diversity in collaborative filtering was introduced in [21], using a user-specific measure based on information entropy.
High entropy in the categorical distribution of a user's profile represents a high interest of the user in diversity. Liu et al. [32, 33] integrated this concept for the first time into recommendation re-ranking using a quantity τ_u, a user-specific measure of interest in diversity.
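This entropy-based tolerance can be sketched as below; the `tolerance` function name and the example profiles are illustrative assumptions of ours, computing standard Shannon entropy over the feature values in a user's profile.

```python
import math
from collections import Counter

def tolerance(profile_values):
    """Shannon entropy of the feature-value distribution in a user's profile:
    higher entropy means more tolerance for variation on that feature."""
    counts = Counter(profile_values)
    n = len(profile_values)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())

# A user who only funded Agriculture has zero tolerance on Sector;
# a user split evenly across two sectors has tolerance log 2.
print(tolerance(["Agriculture"] * 5))            # 0.0
print(tolerance(["Agriculture", "Health"] * 3))  # log(2), about 0.693
```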
Fig. 1. SCRUF framework, a snapshot at time t: On the left are the recommendation lists L computed at prior time points. Fairness metrics M compute the fairness state, which is input to the choice function C, selecting a re-ranker κ_t that processes the recommendations ℓ from R into a final re-ranked slate ℓ′.

τ⃗_u(F_j) ≜ − Σ_{f ∈ F_j} P(f | u) log P(f | u),    (2)

where P(f | u) is computed as the fraction of items in the user's profile that have the feature value f. This can be interpreted as the user's likelihood of liking items with that value. The higher the entropy value for a user on a feature, the higher their tolerance for diversity within that feature. We assume that we can interpret this as a preference in the social choice sense. In our running example, a user may be particularly dedicated to a particular economic sector, agriculture for example, and may have supported loans only in this sector in the past. Hence, they would have a low tolerance for variation in this feature. Note that since these tolerances map onto ℝ, we could interpret them either as ordinal rankings, i.e., a preference order ≿_i for user u_i over the set of features F_j, or as cardinal valuations.

SCRUF is our framework for explicitly representing the design decisions that enter into trading off between accuracy and fairness across multiply-defined and intersecting protected groups in the setting described above. Figure 1 shows the general process that the framework instantiates by looking at a snapshot in time. A user u_t arrives at the system and the base recommender algorithm R(u_t, V) generates a recommendation list ℓ_t. SCRUF is able to accommodate different metrics, possibly one for each sensitive feature F_j, M_j : L⃗ × U⃗ → ℝ. Since we have access to the history of all recommendations, we can derive particular fairness results relative to the different sensitive dimensions, as indicated by the meters associated with each metric M_j in Figure 1.
This set of metrics is input to a choice function C, which picks one dimension to prioritize and selects the corresponding re-ranking function κ_c, which is applied to ℓ_t, resulting in a final set of recommendations ℓ′_t that is displayed to the user. (As a practical matter, our implementation described below groups users into batches and calculates the fairness metrics only once per batch.) Note that the arrows from the user u_t point both to the recommendation algorithm R, where the algorithm takes into account the user's inferred preferences over items in attempting to predict the user's preferences, and also to the choice
function C, which may take into account the user's inferred preferences, via their tolerance scores τ⃗_{u_i} over item features, in choosing a re-ranking algorithm.

As noted above, we are investigating both deterministic and non-deterministic mechanisms for selecting, at each point when a recommendation is generated, a single κ_j function to use to re-rank those recommendations for a specific user u_j. The choice function C uses the current state of the recommendation history, as defined by the M_j metrics over the recommendation history so far, and optionally the identity of the current user, to compute a feature c ∈ S whose corresponding re-ranking function κ_c will be applied to the recommendations for this user, i.e., κ_c(R(u_j, V)). We prefer this simple uni-dimensional re-ranking scheme over one in which multiple fairness dimensions are considered at once, because it creates independence between re-ranking operations and avoids the complex parameter interactions that might occur in attempting to compute a single re-ranking incorporating multiple fairness dimensions. One issue that may arise, and which we discuss in the next section, is that we need to decide when to stop running the re-rankers for a particular feature. If the historical data at time t show that we have been fair towards a particular feature, we do not need to promote it in this iteration. In the following we describe how we select which features to consider.

What remains to be specified within the SCRUF framework is the choice function C. There is a wide variety of ways such a function could be formulated.
In this work, we explore three different variants of the lottery, in which probabilities are set for each re-ranker and a single re-ranker is chosen by sampling from this distribution.

In order to pick a choice function at time t we will, for operational reasons, also have a parameter w_b which defines the historical (backward) window over which we are concerned with our fairness metrics. This means that our fairness metrics M_j are run over the set of users and lists between [t − w_b − 1, t − 1]. Recall that our fairness metrics all lie in the range [0.0, 1.0], with 1.0 indicating a perfectly fair history; we use M⃗ to represent this list of values. Also, for operational reasons, we are given an ϵ which represents a non-zero cutoff or tolerance for each metric. We will only consider running re-rankers when 1 − M_j − ϵ > 0. This allows us to focus on the sensitive features with greater unfairness and provides a way to guard against an over-emphasis on protected groups at the expense of accuracy.

Let the unfairness vector be (point-wise) U⃗F = 1 − (M⃗ − ϵ). Intuitively, this captures how unfair we are being towards a particular feature. The following discussion treats this vector as a probability distribution, so it is normalized to sum to one: U⃗F = U⃗F / Σ_i(UF_i). Note that this implicitly assumes that our target is for each element of U⃗F to equal 1/|U⃗F|, i.e., that we want unfairness to be equal across all aspects. This is a byproduct of our metrics taking values in [0.0, 1.0], since if all metrics were 1.0, every element of U⃗F would equal ϵ before normalization. Normatively, this makes sense: if we are being unfair by the same amount to all sensitive features, then we want an equal probability distribution over all features.

We employ two simple baseline techniques to contrast with the more dynamic options discussed below. The simplest is the
Fixed Lottery, in which each re-ranker is chosen with equal probability for each set of recommendations delivered.
This method has the benefit of great simplicity and does not require any bookkeeping about the historical fairness of the system.

If we want to use the information in the U⃗F vector, another simple alternative is a deterministic Least Misery algorithm, in which we identify the feature in U⃗F with the highest value (most unfair) and choose the associated re-ranker. This method directs the system's attention to the dimension with the worst historical performance and attempts to correct it. It is dynamic in the sense that as the performance improves in one dimension, another may be chosen.
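These two baselines can be sketched in a few lines; the function names and the dictionary encoding of the unfairness vector are our own illustrative choices.

```python
import random

def fixed_lottery(rerankers, rng=random):
    """Baseline: pick a re-ranker uniformly at random, ignoring history."""
    return rng.choice(rerankers)

def least_misery(unfairness):
    """Deterministic baseline: pick the feature with the highest unfairness.
    `unfairness` maps feature name -> its UF value."""
    return max(unfairness, key=unfairness.get)

print(least_misery({"Region": 0.2, "Gender": 0.7, "Sector": 0.1}))  # Gender
```

Least misery is fully determined by the unfairness vector, while the fixed lottery ignores it entirely; the dynamic lottery below sits between the two.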
We have found that it is typical for some dimensions to be more difficult to achieve fairness for than others. In particular, some types of items are rarely retrieved by the base recommendation algorithm, and therefore only small improvements can be had through re-ranking. Applying the least misery algorithm in such a setting could lead the system to concentrate all of its effort on one of these intractable dimensions and miss opportunities to achieve fairness in other parts of the item space.

To avoid this problem, we can use U⃗F as a lottery over the re-rankers and select a re-ranker with probability proportional to its weight, so that the poorest-performing (most unfair) dimensions have the highest probabilities of being chosen. This is a Dynamic Lottery, as opposed to the fixed version above, because the probability associated with each re-ranker changes as a function of system performance.
While the above method is sensitive to the dynamic properties of the system, it is not sensitive to each user's particular propensity or interest towards different dimensions. In prior work, the ability to re-rank selectively based on user characteristics was found to yield a better trade-off between accuracy and fairness [33, 48]. For this reason, we consider in this section randomized allocation mechanisms that take into account both users and fairness concerns. In such a mechanism, we compute a fractional allocation that we can then sample from in order to compute an assignment. So, for a given set of n agents and m objects, we compute a bi-stochastic matrix of size n × m which represents the fraction of a particular object that is allocated to an agent.

We use a modification of the probabilistic serial (PS) mechanism [8]. In PS, also known as the simultaneous eating algorithm, each object is considered to have an infinitely divisible probability weight of one. To find the allocation, every agent, simultaneously and at the same speed, begins “eating” their most preferred object that has not been completely consumed already. Once an object is consumed, the agents move to their next most preferred object, until all objects have been consumed. The random allocation of an agent under PS is the amount of each object they have eaten. PS satisfies a number of important fairness and efficiency criteria [5, 6] and has been used in real allocation settings such as course selection at universities [10].

In translating our recommendation system setting to use PS, we again use the tolerance values τ of the agents as representative of their preferences. As PS only requires ordinal preferences, we simply use the ordering and not the actual values (breaking ties randomly when needed). A key concept in PS is the idea of an object's capacity: how much of it is available to be allocated. We set the capacity of each re-ranker to mirror the sampling lottery probability from above.
Specifically, each re-ranked has weight w f ∗ U F i , thus limiting the amount of that re-ranker to allocate. Wethen run the PS algorithm and get a fractional allocation for each user for each re-ranker. We interpret this fractionalallocation (normalized into a distribution) as the probability that the user should be assigned that particular re-ranker. Manuscript submitted to ACM
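A minimal sketch of this capacity-limited simultaneous eating procedure (data structures and function names are ours; the paper's actual implementation may differ):

```python
def probabilistic_serial(prefs, capacity):
    """Simultaneous eating algorithm with per-object capacities.
    prefs: {agent: [objects in decreasing preference order]}
    capacity: {object: divisible supply available to allocate}
    Returns {agent: {object: fraction eaten}}."""
    remaining = dict(capacity)
    alloc = {a: {} for a in prefs}

    def favourite(agent):
        # Most preferred object that still has supply left.
        for o in prefs[agent]:
            if remaining.get(o, 0) > 1e-12:
                return o
        return None

    active = {a: favourite(a) for a in prefs}
    while any(o is not None for o in active.values()):
        # Group agents by the object they are currently eating.
        eaters = {}
        for a, o in active.items():
            if o is not None:
                eaters.setdefault(o, []).append(a)
        # Advance time until the first object is exhausted.
        dt = min(remaining[o] / len(ag) for o, ag in eaters.items())
        for o, ag in eaters.items():
            for a in ag:
                alloc[a][o] = alloc[a].get(o, 0.0) + dt
            remaining[o] -= dt * len(ag)
        active = {a: favourite(a) for a in prefs}
    return alloc
```

With two agents who share the same ranking, each object is split equally; with opposed rankings, each agent consumes its favorite entirely — the standard PS behavior.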
As our work here concentrates on ranking performance, we use normalized discounted cumulative gain (nDCG) as our measure of recommendation accuracy. Note that we are only evaluating re-ranking algorithms, so nDCG is limited to some extent by the performance of the base algorithm to which the re-ranking is applied.

Provider-side fairness metrics come in two basic varieties: exposure metrics, which respond to the appearance of protected items in a recommendation list, and hit-based metrics, which also take into account the suitability of the items for the target user [1]. In this work, we concentrate on exposure metrics, in particular protected class exposure, which calculates the fraction of a retrieved recommendation list that belongs to a particular protected class. This value is related to the fairness concept of "statistical parity," measured relative to items' level of promotion within the recommender system. Because list lengths are fixed (10 in our case), the exposure of unprotected items is just one minus the protected group exposure. We note, however, that exposure metrics may overstate the effectiveness of re-ranking, since they do not evaluate the quality of the protected items promoted into the recommendation list. Exposure e_j of the protected class items relative to feature S_j is defined as:

e_j(ℓ) = ( Σ_{v ∈ ℓ} 1{v ∈ S_j} ) / |ℓ|    (3)

Given this definition, our fairness metrics use the notion of absolute unfairness [54], and have the following form:

M_j(L, U) = 1 − | 0.5 − ( Σ_{ℓ′ ∈ L} e_j(ℓ′) ) / |L| |    (4)

where L is the list of recommendation lists L = [ℓ_{t−w_b}, ..., ℓ_{t−1}]. Note that this definition implies that ideal fairness consists of equal exposure, that is, recommendation lists containing 50% protected group items.
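A sketch of these exposure and fairness computations (assuming a 0.5 exposure target, as in our experiments; names are illustrative):

```python
def exposure(rec_list, protected):
    """Fraction of a recommendation list belonging to protected class S_j (Eq. 3)."""
    return sum(1 for v in rec_list if v in protected) / len(rec_list)

def fairness_metric(lists, protected, target=0.5):
    """Absolute-unfairness style metric (Eq. 4): 1 minus the distance of the
    window-averaged exposure from the ideal (equal, 50% exposure)."""
    avg = sum(exposure(l, protected) for l in lists) / len(lists)
    return 1 - abs(target - avg)
```

A window of lists averaging exactly 50% protected items scores 1; a window with no protected exposure scores 0.5, reflecting the absolute deviation from parity.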
We plan to explore other characterizations of exposure and other fairness metrics in future work.

Each metric has a maximum fairness of 1, and therefore it is possible to calculate regret ω_j as the difference between this ideal M*_j and the current state of the metric M_j. For reasons of space, we report only on the average regret over all metrics, and leave more detailed analysis for future work. To understand the consistency of algorithm performance, we also compute the variance of the average regret across time periods.

We tested our model on two datasets. The first is The Movies Dataset, which was obtained from the Kaggle website and contains the metadata of 45,000 movies listed in the Full MovieLens Dataset which were released on or before July 2017. Although movies are not a domain to which important fairness concerns are typically applied, we use this dataset as a well-known example with a rich set of provider-side features. Additionally, we extracted two features that contain demographic information on the movie directors and screenplay writers.

The dataset contains 26 million ratings from 270,000 users for all 45,000 movies. Ratings are on a scale of 1-5. Each movie contains a set of features, from which the following were used in this project: genres, original language, release date, run-time, popularity, director gender and writer gender. A sample of this dataset was extracted which contained 361,468 ratings from 6,000 users on 6,037 items (density of 0.99%).

(Footnote: The metric as defined penalizes lists with more than 50% protected items, which might seem counterproductive. However, as a practical matter, in our experiments higher exposure values for protected items were never achieved.)
(Footnote: https://grouplens.org/datasets/movielens)

Dataset     Feature          Protected Values                            Unprotected Values
Kiva        Activity         Bicycle Repair, Gardening, Souvenir Sales   Taxi, Fishing, Vehicle Repairs
Kiva        Country          Indonesia, Nigeria, Yemen                   Cameroon, Armenia, Lebanon
Kiva        Gender           Male                                        Female
MovieLens   Genres           Documentary, Foreign, War, Western          Adventure, Crime, Action, Comedy
MovieLens   Writer Gender    {'01', '012'}                               {'0', '02', '12', '2', '1'}
MovieLens   Director Gender  {'01', '1'}                                 {'0', '12', '012', '02', '2'}
Table 2. Examples of sensitive features and their values.
All the features with numerical values were transformed into categorical values. Release date is bucketed into four groups, run-time into six groups, and popularity into five groups. In this dataset, three types of gender were present: 0, 1, 2, and each movie can be directed or written by a group of directors or writers. To capture this diversity, gender was discretized into seven groups. For example, if a movie is directed by all the genders, we assign '012' for the gender information, and if it is directed by only one gender, a single number is assigned to that movie, e.g. '0', '1' or '2'. All the categorical features were transformed into dummy variables, resulting in a total of 335 binary features. Table 2 shows some examples of the sensitive features and their protected values.

In a fielded application, the choice of sensitive features and of protected groups within those features may be determined by legal liability or business model considerations. Lacking this type of insight, we chose to identify protected features as those associated with rarely-recommended items. To determine the protected values of each feature, we performed a trial run of recommendation generation over the data set, and examined the distribution of features in the results. In a live system, historical recommendation data would be available over which to calculate this distribution. The values in the 25th percentile of the distribution were selected as the protected group for that feature.

Our algorithm is also evaluated on a proprietary dataset obtained from Kiva.org, including all lending transactions over a 12-month period. Initially, there were 1,084,521 transactions involving 122,464 loans and 207,875 Kiva users. Of these loans, we found that 116,650 were funded, that is, they received their full funding amount from Kiva users by the 30-day deadline imposed by the site. We selected only the funded loans for analysis.
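The percentile-based selection of protected values can be sketched as follows (the function and data layout are hypothetical illustrations of the procedure described above, not code from the paper):

```python
from collections import Counter

def protected_values(recommended_items, item_features, percentile=0.25):
    """Given the items produced by a trial recommendation run, mark as
    protected the feature values whose recommendation frequency falls in
    the bottom quartile of the observed distribution.
    recommended_items: list of item ids appearing in generated lists
    item_features: {item_id: [feature values for that item]}"""
    counts = Counter()
    for item in recommended_items:
        counts.update(item_features[item])
    # Rank feature values from least to most frequently recommended.
    values = sorted(counts, key=lambda v: counts[v])
    cutoff = max(1, int(len(values) * percentile))
    return set(values[:cutoff])
```

Here the rarely-recommended genre ends up protected while frequently-promoted genres do not, mirroring the 25th-percentile rule.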
Each loan is specified by features including borrower's name/id, gender, borrower's country, loan purpose, funded date, posted date, loan amount, loan sector, and geographical coordinates. To reduce the feature space, and to avoid multicollinearity, highly correlated features were removed. The percentage funding rate (PFR) was added as a new feature, computed as follows:
PFR = ( 1 / (funded date − posted date) ) ∗ 100    (5)

where the difference between dates is measured in days. The percentage funding rate captures the speed at which a loan goes from being introduced in the system to being fully funded. For example, a loan with a PFR of 25% is accumulating a quarter of its needed capital each day. After preparing the data, the final features for each loan reduced to borrower's gender, borrower's country, loan purpose, loan amount (binned to 10 equal-sized buckets), and the loan's percentage funding rate.

We found that this dataset was highly sparse (density = 4.e−) and could not support effective collaborative recommendation, because a loan can only attract a limited amount of support (up to that needed for its funding). There are no "blockbuster" loans with thousands of lenders. We therefore created pseudo-items that represent groups of items with shared features. We applied agglomerative hierarchical clustering [43] using the features of borrower gender, borrower country, loan purpose, loan amount (binned to 10 equal-sized buckets), and percentage funding rate (4 equal-sized buckets). We chose the clustering with the highest Silhouette Coefficient [45], around 0.69, which indicates a reasonable cohesion of the clusters. Then we applied a 10-core transformation, selecting pseudo-items with at least 10 lenders who had funded at least 10 pseudo-items. The retained dataset has 2,673 pseudo-items, 4,005 lenders and 110,371 ratings / lending actions.

To identify the protected values for each feature, we applied the same method as for the MovieLens data set: we assigned the values whose frequencies fall in the 25th percentile of the distribution to the protected group for each feature. The final number of features for this dataset is 231.

(Footnote: Loans not fully funded within 30 days are dropped from the system and the money raised is returned to lenders.)
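The pseudo-item construction step described above can be sketched with scikit-learn's agglomerative clustering and silhouette scoring (the candidate cluster counts and the assumption of a one-hot feature matrix are our own illustrative choices):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

def build_pseudo_items(loan_features, candidate_ks=(5, 10, 20)):
    """Cluster loans on their encoded feature vectors and keep the
    clustering with the best Silhouette Coefficient; the resulting
    cluster ids then serve as pseudo-item ids.
    loan_features: 2-D array, one row of encoded features per loan."""
    best_labels, best_score = None, -1.0
    for k in candidate_ks:
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(loan_features)
        score = silhouette_score(loan_features, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score
```

Lending actions are then re-indexed from individual loans to their pseudo-item ids, which densifies the interaction matrix enough for collaborative filtering.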
Our experimental methodology is designed to highlight differences between these choice functions. We followed a typical recommendation evaluation process, with each user's profile split into 80% training and 20% testing. We chose non-negative matrix factorization (NMF) as our base algorithm [51] based on prior experience with these data sets. We plan to explore the interaction between base algorithm and choice functions in future work. The factorization model was built using the training data and then used to generate a recommendation list ℓ for each user. Arrival time was simulated in our experiments. Users were shuffled randomly and grouped into batches of size 0.5% of all the users, where each batch was considered to be a single time step. For each batch, we computed fairness metrics M over the previous 20 batches, so that the backward window w_b equals approximately 10% of the test data. The experiment was run for each of the four choice functions described above: Fixed Lottery, Deterministic Least Misery, Dynamic Lottery, and Allocation Lottery. For the choice functions dependent on M, we computed the lottery probabilities once per batch. The results of the different algorithms were compared in summary and over the course of each experiment's iterations. Overall nDCG was compared to establish the accuracy loss for each choice function. Over the course of each experiment, we computed cumulative fairness regret on each fairness dimension and on average.

Table 3a shows the overall results for the MovieLens data set. The first point to notice is that fairness is greatly improved (5x) over the base algorithm for all of the re-ranking methods, which is to be expected. Interestingly, the Fixed choice function, which chooses among the re-rankers with equal probability, has the best fairness over all experiment iterations taken as a whole. The other re-rankers are similar. All of the re-rankers show a reduction in ranking accuracy, around 25% of nDCG.
We did not seek to minimize nDCG loss in these experiments, as doing so would reduce the impact of any given re-ranking operation and require a longer experiment to tease out differences.

Table 3b shows similar results for the Kiva data set. Here we do not see as much accuracy loss. The Allocation algorithm, which here has the best nDCG, is only 5.5% below the original base algorithm. For this data set, the re-rankers also improve fairness, although not as dramatically as in the MovieLens case. The Allocation method has the highest fairness score in addition to the best nDCG. Figure 2 shows the average fairness regret over time for the experiment. The algorithms all move within a fairly narrow regret bound, indicating the difficulty of achieving fairness in these data sets.

(Footnote: For the first batch, when no backward window exists, the Fixed Lottery was performed.)

Algorithm     nDCG   Fairness  Fairness Variance
Base (NMF)    0.143  0.039     5.3e-6
Fixed         0.107  0.179     3.8e-3
Least Misery  0.106  0.178     1.3e-3
Dynamic       0.104  0.170     1.5e-3
Allocation    0.109  0.171     2.3e-4
(a) MovieLens data set

Algorithm     nDCG   Fairness  Fairness Variance
Base (NMF)    0.057  0.214     2e-4
Fixed         0.045  0.323     1.1e-3
Least Misery  0.043  0.322     9e-4
Dynamic       0.045  0.325     1e-3
Allocation    0.048  0.327     1e-4
(b) Kiva data set

Table 3. Summary results. Fairness measured by percentage of protected item exposure in recommendation lists.

The Fixed lottery shows the lowest regret over most epochs for the MovieLens data set, but does not do as well with Kiva. Similar inconsistency is shown with the Least Misery algorithm. The low variance of the fairness of the Allocation algorithm can be seen, as its regret does not show the swings of the other algorithms.

Fig. 2. Average fairness regret over time: (a) MovieLens data set; (b) Kiva data set.
Across both tables and in the time-series figure, we see that the Allocation method has lower variance in the fairness it achieves across iterations. Another way to see this consistency is the distribution of the average regret values. Figures 3a and 3b show the distribution of regret for the different choice functions on each data set. The distribution of the Allocation method (shown in red) falls within a much narrower band than any of the other methods, particularly in the MovieLens data set, where we see the Fixed method in blue taking on a wide range of regret values. At any given time, the Allocation algorithm is producing consistently fair results without the large variations in regret seen in the other algorithms. As its fairness results are similar to those of the other lottery mechanisms, this consistency is a good reason to prefer it.

Fig. 3. Distribution of fairness regret: (a) MovieLens; (b) Kiva.
In this paper, we conceptualize algorithmic fairness, and recommendation fairness in particular, as a problem of social choice. That is, we define the task of computing a recommendation as a problem of arbitrating among the preferences of different individual agents to arrive at a single outcome. For our purposes, the agents in question include the user and also multiple fairness concerns that may be active within a particular organization.

The move to frame fairness as a problem of social choice has several important consequences. First, it highlights the multiplicity and diversity of fairness (and other stakeholder) concerns that might be relevant in a given application. This approach allows us to be agnostic to different definitions and metrics of fairness and does not impose any particular structure on stakeholder preferences. Second, we are able to make use of the large body of research in computational social choice, including the study of fairness, that has emerged in the past decades.

Building on these ideas, we demonstrate the SCRUF framework for dynamic adaptation of recommendation fairness, using social choice to arbitrate between different re-ranking methods. We define a set of choice functions, ranging from a simple fixed lottery to an adaptation of the probabilistic serial mechanism, and demonstrate their performance on two data sets where multiple fairness concerns have been defined. We found relatively minor differences between the different lottery mechanisms, except that the Allocation mechanism, which takes user preferences over features into account, provides lower variance in fairness over time and therefore a more consistently fair output.
Authors Burke and Sonboli were supported by the National Science Foundation under Grant No. 1911025.
REFERENCES
[1] Himan Abdollahpouri, Gediminas Adomavicius, Robin Burke, Ido Guy, Dietmar Jannach, Toshihiro Kamishima, Jan Krasnodebski, and Luiz Pizzato. [n.d.]. Multistakeholder recommendation: Survey and research directions. User Modeling and User-Adapted Interaction ([n.d.]), 1–32. https://doi.org/10.1007/s11257-019-09256-1
[2] Himan Abdollahpouri, Robin Burke, and Bamshad Mobasher. 2019. Managing popularity bias in recommender systems with personalized re-ranking. In The Thirty-Second International Flairs Conference.
[3] Gediminas Adomavicius and Y. Kwon. 2009. Improving recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering 10 (2009).
[4] Haris Aziz. 2019. Developments in Multi-Agent Fair Allocation. CoRR abs/1911.09852 (2019). arXiv:1911.09852 http://arxiv.org/abs/1911.09852
[5] Haris Aziz, Jiashu Chen, Aris Filos-Ratsikas, Simon Mackenzie, and Nicholas Mattei. 2015. Egalitarianism of Random Assignment Mechanisms. CoRR abs/1507.06827 (2015). arXiv:1507.06827 http://arxiv.org/abs/1507.06827
[6] Haris Aziz, Serge Gaspers, Simon Mackenzie, Nicholas Mattei, Nina Narodytska, and Toby Walsh. 2015. Equilibria Under the Probabilistic Serial Rule. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, Qiang Yang and Michael J. Wooldridge (Eds.). AAAI Press, 1105–1112. http://ijcai.org/Abstract/15/160
[7] Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi, et al. 2019. Fairness in Recommendation Ranking through Pairwise Comparisons. arXiv preprint arXiv:1903.00780 (2019).
[8] Anna Bogomolnaia and Hervé Moulin. 2001. A new solution to the random assignment problem. Journal of Economic Theory.
[9] Felix Brandt, Vincent Conitzer, Ulle Endriss, Jérôme Lang, and Ariel D. Procaccia (Eds.). 2016. Handbook of Computational Social Choice. Cambridge University Press.
[10] Eric Budish, Yeon-Koo Che, Fuhito Kojima, and Paul Milgrom. 2013. Designing random allocation mechanisms: Theory and applications. American Economic Review.
[11] Joy Buolamwini and Timnit Gebru. 2018. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In Conference on Fairness, Accountability and Transparency. 77–91.
[12] Robin Burke. 2017. Multisided Fairness for Recommendation. In Workshop on Fairness, Accountability and Transparency in Machine Learning (FATML). Halifax, Nova Scotia.
[13] Robin Burke, Nasim Sonboli, and Aldo Ordonez-Gauger. 2018. Balanced Neighborhoods for Multi-sided Fairness in Recommendation. In Conference on Fairness, Accountability and Transparency. 202–214.
[14] Robin Burke, Amy Voida, Nicholas Mattei, and Nasim Sonboli. 2020. Algorithmic Fairness, Institutional Logics, and Social Choice. In Harvard CRCS Workshop: AI for Social Good.
[15] Abhijnan Chakraborty, Gourab K. Patro, Niloy Ganguly, Krishna P. Gummadi, and Patrick Loiseau. 2019. Equality of voice: Towards fair representation in crowdsourced top-k recommendations. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 129–138.
[16] Alexandra Chouldechova. 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data 5, 2 (2017), 153–163.
[17] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. ACM, 214–226.
[18] Michael Ekstrand and Amit Sharma (Eds.). 2017. Held at RecSys 2017, Como, Italy.
[19] Michael D. Ekstrand, Mucun Tian, Ion Madrazo Azpiazu, Jennifer D. Ekstrand, Oghenemaro Anuyah, David McNeill, and Maria Soledad Pera. 2018. All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (Proceedings of Machine Learning Research), Sorelle A. Friedler and Christo Wilson (Eds.), Vol. 81. PMLR, New York, NY, USA, 172–186.
[20] Michael D. Ekstrand, Mucun Tian, Mohammed R. Imran Kazi, Hoda Mehrpouyan, and Daniel Kluver. 2018. Exploring author gender in book rating and recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems. 242–250.
[21] Farzad Eskandanian, Bamshad Mobasher, and Robin Burke. 2017. A Clustering Approach for Personalizing Diversity in Collaborative Recommender Systems. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization (Bratislava, Slovakia) (UMAP '17). Association for Computing Machinery, New York, NY, USA, 280–284. https://doi.org/10.1145/3079628.3079699
[22] Moritz Hardt, Eric Price, Nati Srebro, et al. 2016. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems. 3315–3323.
[23] Úrsula Hébert-Johnson, Michael Kim, Omer Reingold, and Guy Rothblum. 2018. Multicalibration: Calibration for the (Computationally-identifiable) masses. In International Conference on Machine Learning. 1944–1953.
[24] Faisal Kamiran, Toon Calders, and Mykola Pechenizkiy. 2010. Discrimination aware decision tree learning. In Data Mining (ICDM), 2010 IEEE 10th International Conference on. IEEE, 869–874.
[25] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. 2012. Enhancement of the Neutrality in Recommendation. In Workshop on Human Decision Making in Recommender Systems (DecisionsRecSys). 8–14.
[26] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. 2012. Fairness-aware classifier with prejudice remover regularizer. Machine Learning and Knowledge Discovery in Databases (2012), 35–50.
[27] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. 2018. Recommendation Independence. In Conference on Fairness, Accountability and Transparency (Proceedings of Machine Learning Research), Sorelle A. Friedler and Christo Wilson (Eds.), Vol. 81. PMLR, New York, NY, USA, 187–201.
[28] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Issei Sato. 2016. Model-based approaches for independence-enhanced recommendation. IEEE, New York, 860–867.
[29] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2017. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness. arXiv preprint arXiv:1711.05144 (2017).
[30] Min Kyung Lee, Daniel Kusbit, Anson Kahng, Ji Tae Kim, Xinran Yuan, Allissa Chan, Daniel See, Ritesh Noothigattu, Siheon Lee, Alexandros Psomas, et al. 2019. WeBuildAI: Participatory framework for algorithmic governance. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–35.
[31] J. W. Lian, N. Mattei, R. Noble, and T. Walsh. 2018. The Conference Paper Assignment Problem: Using Order Weighted Averages to Assign Indivisible Goods. In Proc. of the 32nd AAAI Conference.
[32] Weiwen Liu and Robin Burke. 2018. Personalizing Fairness-aware Re-ranking. arXiv:1809.02921 [cs.IR]. 6 pages.
[33] Weiwen Liu, Jun Guo, Nasim Sonboli, Robin Burke, and Shengyu Zhang. 2019. Personalized fairness-aware re-ranking for microlending. In Proceedings of the 13th ACM Conference on Recommender Systems. 467–471.
[34] David F. Manlove. 2013. Algorithmics of Matching Under Preferences. Series on Theoretical Computer Science, Vol. 2. World Scientific. https://doi.org/10.1142/8591
[35] Nicholas Mattei. 2020. Closing the Loop: Bringing Humans into Empirical Computational Social Choice and Preference Reasoning. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, Christian Bessiere (Ed.). ijcai.org, 5169–5173. https://doi.org/10.24963/ijcai.2020/729
[36] Rishabh Mehrotra, James McInerney, Hugues Bouchard, Mounia Lalmas, and Fernando Diaz. 2018. Towards a Fair Marketplace: Counterfactual Evaluation of the Trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems. In Proceedings of the Conference on Information and Knowledge Management. 2243–2251.
[37] Hervé Moulin. 2004. Fair Division and Collective Welfare. MIT Press.
[38] Arvind Narayanan. 2018. Translation tutorial: 21 fairness definitions and their politics. In Proc. Conf. Fairness Accountability Transp., New York, USA.
[39] Gourab K. Patro, Arpita Biswas, Niloy Ganguly, Krishna P. Gummadi, and Abhijnan Chakraborty. 2020. FairRec: Two-Sided Fairness for Personalized Recommendations in Two-Sided Platforms. In Proceedings of The Web Conference 2020. 1194–1204.
[40] Dino Pedreshi, Salvatore Ruggieri, and Franco Turini. 2008. Discrimination-aware data mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 560–568.
[41] J. Rawls. 1971. A Theory of Justice. Harvard University Press.
[42] Nicholas Rescher. 2002. Fairness: Theory and Practice of Distributive Justice. Transaction Publishers.
[43] Lior Rokach and Oded Maimon. 2005. Clustering methods. In Data Mining and Knowledge Discovery Handbook. Springer, 321–352.
[44] Alvin E. Roth, Tayfun Sönmez, and M. Utku Ünver. 2005. Pairwise kidney exchange. J. Econ. Theory.
[45] Peter J. Rousseeuw. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20 (1987), 53–65.
[46] Pierre-Nicolas Schwab, Toshiro Kamishima, and Michael Ekstrand (Eds.). 2018. Held at RecSys 2018, Vancouver, Canada.
[47] Amartya Sen. 2018. Collective Choice and Social Welfare. Harvard University Press.
[48] Nasim Sonboli, Farzad Eskandanian, Robin Burke, Weiwen Liu, and Bamshad Mobasher. 2020. Opportunistic Multi-Aspect Fairness through Personalized Re-Ranking. In Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization (Genoa, Italy) (UMAP '20). Association for Computing Machinery, New York, NY, USA, 239–247. https://doi.org/10.1145/3340631.3394846
[49] Tom Sühr, Asia J. Biega, Meike Zehlike, Krishna P. Gummadi, and Abhijnan Chakraborty. 2019. Two-sided fairness for repeated matchings in two-sided markets: A case study of a ride-hailing platform. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 3082–3092.
[50] Wenlong Sun, Sami Khenissi, Olfa Nasraoui, and Patrick Shafto. 2019. Debiasing the human-recommender system feedback loop in collaborative filtering. In Companion Proceedings of The 2019 World Wide Web Conference. 645–651.
[51] Gábor Takács, István Pilászy, Bottyán Németh, and Domonkos Tikk. 2008. Investigation of various matrix factorization methods for large recommender systems. IEEE, 553–562.
[52] William Thomson. 2011. Fair allocation rules. In Handbook of Social Choice and Welfare. Vol. 2. Elsevier, 393–506.
[53] William Thomson. 2016. Introduction to the Theory of Fair Allocation. In Handbook of Computational Social Choice, Felix Brandt, Vincent Conitzer, Ulle Endriss, Jérôme Lang, and Ariel D. Procaccia (Eds.). Cambridge University Press, 261–283. https://doi.org/10.1017/CBO9781107446984.012
[54] Sirui Yao and Bert Huang. 2017. Beyond parity: Fairness objectives for collaborative filtering. In Advances in Neural Information Processing Systems. 2921–2930.
[55] H. Peyton Young. 1995. Equity: In Theory and Practice. Princeton University Press.
[56] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning (ICML-13). 325–333.
[57] Lu Zhang and Xintao Wu. 2017. Anti-discrimination learning: a causal modeling-based framework. International Journal of Data Science and Analytics (2017), 1–16.
[58] Ziwei Zhu, Xia Hu, and James Caverlee. 2018. Fairness-Aware Tensor-Based Recommendation. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 1153–1162.
[59] William S. Zwicker. 2016. Introduction to the Theory of Voting. In Handbook of Computational Social Choice. Cambridge University Press.