Fair and Optimal Cohort Selection for Linear Utilities

Abstract

The rise of algorithmic decision-making has created an explosion of research around the fairness of those algorithms. While there are many compelling notions of individual fairness, beginning with the work of Dwork et al., these notions typically do not satisfy desirable composition properties. To this end, Dwork and Ilvento introduced the fair cohort selection problem, which captures a specific application where a single fair classifier is composed with itself to pick a group of candidates of size exactly k. In this work we introduce a specific instance of cohort selection where the goal is to choose a cohort maximizing a linear utility function. We give approximately optimal polynomial-time algorithms for this problem in both an offline setting where the entire fair classifier is given at once, or an online setting where candidates arrive one at a time and are classified as they arrive.


arXiv [cs.DS]

Konstantina Bairaktari, Huy Le Nguyen, Jonathan Ullman
Khoury College of Computer Sciences, Northeastern University

February 17, 2021

The rise of algorithmic decision-making has created an explosion of research around the fairness of those algorithms. Beginning with the seminal work of Dwork, Hardt, Pitassi, Reingold, and Zemel [DHP+12], [...] fair cohort selection problem introduced in [DI19]. In this problem, we would like to select $k$ candidates for a job from a universe of $n$ possible candidates. We are given a classifier that assigns a score $s_u \in [0,1]$ to each candidate, with the guarantee that the scores are individually fair with respect to some similarity measure $\mathcal{D}$. We would like to select a cohort of $k$ candidates such that the probability $p_u$ of selecting any candidate is fair with respect to the same metric.

We can trivially solve the fair cohort selection problem by ignoring the scores and selecting a uniformly random set of $k$ candidates, so that every user is chosen with the same probability $k/n$. Thus Smedemark-Margulies, Langton, and Nguyen [SMLN20] proposed to consider fair algorithms for cohort selection that optimize some utility function related to the scores, and constructed algorithms for a particular choice of utility function.

In this work we consider a setting where the utility function is known and simple. Specifically, we assume a linear utility function where the utility of a cohort is the sum of the scores of all candidates in the cohort: $\sum_{u=1}^{n} p_u s_u$. This type of utility implicitly assumes that the scores $s_u$ are themselves doing a good job of approximating the utility of a candidate, and that there are no complements or supplements among participants in the cohort.

KB is supported by NSF award CCF-1750640. HLN is supported by NSF awards CCF-1909314 and CCF-1750716. JU is supported by NSF awards CCF-1750640, CNS-1816028, and CNS-1916020.

In the model of Dwork et al. [DHP+12] [...]

Offline Setting.

In this model, we are given all the scores $s_1, \dots, s_n$ at once, and must choose the optimal cohort. We present a polynomial-time algorithm that computes a fair cohort with optimal expected utility.

Online Setting.

In this model we are given the scores $s_1, s_2, \dots$ as a stream. Ideally, after receiving $s_u$, we would like to immediately accept candidate $u$ to the cohort or reject them from the cohort. However, this goal is too strong, so we consider a relaxation where candidate $u$ can be either accepted, rejected, or kept on hold until later, and we would like to output a fair cohort with optimal expected utility while minimizing the number of candidates who are kept on hold. We give an approximately fair and optimal algorithm in this online model. Specifically, we present a polynomial-time algorithm for this online setting where the fairness constraints are satisfied and utility is optimized up to an additive error of $\varepsilon$, for any desired $\varepsilon > 0$, while ensuring that the number of users on hold never exceeds $2k + 1/\varepsilon$.

To interpret our additive error guarantee, note that the fairness constraint applies to the probabilities $p_1, \dots, p_n$. Allowing the fairness constraint to be violated by an additive error of, say, $\varepsilon = 0.01$ therefore means that every candidate is selected with a probability that is within $\pm 1\%$ of the probability with which that candidate would have been selected by some fair mechanism.

We start with a brief overview of the key steps in our algorithms for each model.

Offline Setting.

Our offline algorithm is based on two main properties of the fair cohort selection problem. First, we use a dependent-rounding procedure from [SMLN20] that takes a set of probabilities $p_1, \dots, p_n$ such that $\sum_{i=1}^{n} p_i = k$ and outputs a random cohort $C$, represented by an indicator vector $\tilde{p}_1, \dots, \tilde{p}_n$ with $\sum_{i=1}^{n} \tilde{p}_i = k$, such that $\mathbb{E}(\tilde{p}_u) = p_u$ for every candidate $u$. Thus, it is enough for our algorithm to come up with a set of marginal probabilities for each candidate to appear in the cohort, rather than directly finding the cohort.

To find the marginal probabilities that maximize utility with respect to the scores $s_1, \dots, s_n \in [0,1]$, we would like to solve the linear program

$$
\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{n} p_i s_i \\
\text{s.t.} \quad & \sum_{i=1}^{n} p_i \le k, \\
& |p_i - p_j| \le |s_i - s_j| \quad \forall i, j, \\
& 0 \le p_i \le 1 \quad \forall i,
\end{aligned}
$$

with $p_u$ representing the marginal probability of selecting candidate $u$ in a cohort of size $k$. The second constraint ensures that these probabilities $p$ are fair with respect to the same measure $\mathcal{D}$ as the original scores $s$. Specifically, $|p_u - p_v| \le |s_u - s_v| \le \mathcal{D}(u, v)$. Although we could have used the stronger constraint $|p_u - p_v| \le \mathcal{D}(u, v)$, writing the LP this way means that our algorithm doesn't need to know the underlying measure, and means our solution will preserve any stronger fairness that the scores $s$ happen to satisfy.

While we could solve this linear program explicitly, we can get a faster solution that is more useful for extending to the online setting by noting that this LP has a specific closed form based on "water filling." Specifically, if $\sum_{i=1}^{n} s_i \le k$, then the optimal solution simply adds some number $c \ge 0$ to every score and caps the result at 1, giving $p_u = \min\{s_u + c, 1\}$, and an analogous solution works when $\sum_{i=1}^{n} s_i > k$.

Online Setting.

In the online model we do not have all the scores in advance, thus we cannot determine the solution to the linear program, and do not even know the value of the constant $c$ that determines the solution. We give two algorithms for addressing this problem. The basic algorithm begins by introducing some approximation, in which we group users into $1/\varepsilon$ groups based on their scores, where group $g$ contains users with scores in $[g\varepsilon, (g+1)\varepsilon]$. This grouping can only reduce utility by $\varepsilon$, and can only lead to violating the fairness constraint by $\varepsilon$. Since users in each bucket are treated identically, we know that when we reach the end of the algorithm, we can express the final cohort as containing a random set of $n_g$ members from each group $g$. Thus, to run the algorithm we use reservoir sampling to maintain a random set of at most $k$ candidates from each group, reject all other members, and then run the offline algorithm at the end to determine how many candidates to select from each group.

The drawback of this method is that it keeps as many as $k/\varepsilon$ candidates on hold until the end. Thus, we develop an improved algorithm that solves the linear program in an online fashion, and uses the information it obtains along the way to more carefully choose how many candidates to keep from each group. This final algorithm reduces the number of candidates on hold by as much as a quadratic factor, down to $2k + 1/\varepsilon$.

Our work fits into the line of work initiated by Dwork et al. [DHP+12] on individual fairness, which imposes constraints on how algorithms may distinguish specific individuals. Within this framework, issues with composing fair algorithms were first explored by Dwork and Ilvento [DI19], and later studied by [DIJ20] and [SMLN20].

A complementary line of work, initiated by Hardt, Price, and Srebro [HPS16], considers notions of group fairness, which imposes constraints on how algorithms may distinguish in aggregate between individuals from different groups. While our work is technically distinct from work on group fairness, issues of composition also arise in that setting, as noted in several works [BKN+17, KRZ19, AKRZ21].

For our model, we consider a set of $n$ individuals $U$ and a binary classifier $C$ that chooses individual $u \in U$ with probability $p_u$. The classifier outputs 1 when the individual is chosen and 0 otherwise. If we restrict the definition of individual fairness from [DHP+12] to this particular model, we obtain the following definition.

Definition 1 (Individual Fairness [DHP+12]). Given a metric $\mathcal{D}$ over the individuals of set $U$ and a randomized binary classifier $C$ with outputs in $\{0, 1\}$ that assigns selection probability $p_u$ to any individual $u$ in $U$, we say that the classifier is individually fair if and only if for all $u, v$ in $U$, $\mathcal{D}(u, v) \ge |p_u - p_v|$.

In this work, we want to select a cohort of exactly $k$ individuals out of the set $U$, by consulting an individually fair classifier $C$ that assigns individual selection probabilities $s_1, s_2, \dots, s_n$ to the $n$ individuals. We are interested in maintaining the fairness property for the probabilities of selection for the cohort. The general setting of this problem was defined in [DI19]. At the same time, we attempt to eliminate trivial solutions that would be fair, such as a uniform probability distribution. To achieve this, we define a linear utility function of the probabilities of selection, which we want to maximize during the cohort selection.

Definition 2 (Utility). Given a set of $n$ candidates $U$, a randomized binary classifier $C$ that assigns individual selection probabilities $s_1, s_2, \dots, s_n$ to the members of $U$, and a cohort selection algorithm $S$ which chooses individuals with probabilities $p_1, p_2, \dots, p_n$, we define the utility to be $\sum_{i=1}^{n} p_i s_i$.

We consider two variations of the cohort selection problem. The first one is the offline cohort selection, where the algorithm has full access to the set $U$ and makes decisions offline. The second one is the streaming version of the cohort selection problem, for which we use a relaxation of individual fairness.

Definition 3 ($\varepsilon$-Individual Fairness). Given a metric $\mathcal{D}$ over the individuals of set $U$, a randomized binary classifier $C$ with outputs in $\{0, 1\}$ that assigns selection probability $p_u$ to any individual $u$ in $U$, and $0 \le \varepsilon \le 1$, we say that the classifier is $\varepsilon$-individually fair if and only if for all $u, v$ in $U$, $\mathcal{D}(u, v) + \varepsilon \ge |p_u - p_v|$.

Now, we have the tools we need to define the problem we study in the following sections.
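Definitions 1 to 3 can be checked mechanically on small instances. The following is a minimal sketch, not part of the original paper: the function names `is_individually_fair` and `utility`, the tolerance `tol`, and the example metric (taken here to be the score gap itself) are all illustrative assumptions.

```python
from itertools import combinations


def is_individually_fair(p, metric, eps=0.0, tol=1e-12):
    """Check D(u, v) + eps >= |p_u - p_v| for every pair (Definitions 1 and 3).

    `metric` is any callable on index pairs; eps = 0 gives exact fairness.
    `tol` is a small slack for floating-point comparisons (an assumption).
    """
    return all(
        abs(p[u] - p[v]) <= metric(u, v) + eps + tol
        for u, v in combinations(range(len(p)), 2)
    )


def utility(p, s):
    """Linear utility of Definition 2: sum_i p_i * s_i."""
    return sum(pi * si for pi, si in zip(p, s))


# Hypothetical instance: three candidates, metric equal to the score gap.
scores = [0.1, 0.4, 0.9]
metric = lambda u, v: abs(scores[u] - scores[v])
uniform = [2 / 3] * 3  # the trivial fair solution for k = 2, n = 3
```

On this instance the uniform solution passes the fairness check but obtains lower utility than solutions that favor high-scoring candidates, which is the reason for introducing the utility objective.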

Definition 4 (Fair and Useful Cohort Selection Problem). Given a set of $n$ individuals $U$ and the probabilities of selection an individually fair classifier $C$ assigns to the members of $U$ with respect to a metric $\mathcal{D}$ (and a constant $\varepsilon$ in $(0, 1]$), choose a cohort of $k$ individuals such that:
1. the probability of selection for the cohort is individually fair (or $\varepsilon$-individually fair) with respect to $\mathcal{D}$, and
2. it achieves the optimal cohort selection utility.

Since the input and the output probabilities of the cohort selection problem satisfy fairness conditions for the same metric, it will be implied in the theorems of the following sections that the result of the cohort selection is fair with respect to the same metric as the input.

Both the offline and the streaming solutions for the cohort selection problem will be based on a dependent rounding algorithm that solves the offline fair and useful cohort selection problem when the sum of the input scores of all the candidates is equal to the number of people we want to choose, $k$. The output of this algorithm consists of indicators $\tilde{s}_i$ which are 1 if the $i$-th candidate is selected to be part of the cohort and 0 otherwise. This rounding procedure is a special case of rounding a fractional solution in a matroid polytope (in this case, we have a uniform matroid). This problem has been studied extensively with rounding procedures satisfying additional desirable properties (see e.g. [CVZ10]). Here we describe a simple and very efficient rounding algorithm for the special case of the problem arising in our work.

Algorithm 1: Rounding

Input: list of scores $s_1, s_2, \dots, s_n \in [0, 1]$ for the $n$ candidates with $\sum_{i=1}^{n} s_i = k$
Output: list of selection indicators $\tilde{s}_1, \tilde{s}_2, \dots, \tilde{s}_n \in \{0, 1\}$
    pendingIndex ← 1
    for $i$ from 2 to $n$ do
        if $i$ = pendingIndex then
            continue to next $i$
        end
        $a$ ← $s_i$
        $b$ ← $s_{\text{pendingIndex}}$
        choose $u$ randomly from unif(0, 1)
        if $a + b \le 1$ then
            if $u < \frac{a}{a+b}$ then
                $s_i$ ← $a + b$
                $s_{\text{pendingIndex}}$ ← 0
                pendingIndex ← $i$
            else
                $s_i$ ← 0
                $s_{\text{pendingIndex}}$ ← $a + b$
            end
        else
            if $u < \frac{1-b}{2-a-b}$ then
                $s_i$ ← 1
                $s_{\text{pendingIndex}}$ ← $a + b - 1$
            else
                $s_i$ ← $a + b - 1$
                $s_{\text{pendingIndex}}$ ← 1
                pendingIndex ← $i$
            end
        end
    end
    return $s_1, s_2, \dots, s_n$

Lemma 1 (see e.g. [SMLN20]). Let $s_1, s_2, \dots, s_n \in [0, 1]$ be a list of scores with $\sum_{i=1}^{n} s_i = k \in \mathbb{N}$. Algorithm 1 outputs randomized $\tilde{s}_1, \dots, \tilde{s}_n \in \{0, 1\}$ such that $k$ elements will be equal to 1, the rest will be equal to 0, and for all $i \in \{1, \dots, n\}$, $\mathbb{E}[\tilde{s}_i] = s_i$.

For the streaming algorithm which solves the cohort selection problem we use a procedure called random reservoir sampling. In particular, we want to maintain an upper-bounded number of people in memory. We do this by choosing a subset of people uniformly at random if the number of people exceeds a constant, and rejecting the rest. However, since our algorithm works in the streaming setting, we need a sampling method which creates the sample on the fly.
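The two primitives introduced above, the pairwise rounding of Algorithm 1 and reservoir sampling, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names, the `rng` parameter, the guards against zero denominators, and the final `round` that absorbs floating-point error are additions, and the reservoir routine is the textbook single-pass scheme, which the text does not spell out.

```python
import random


def dependent_round(scores, rng=random):
    """Pairwise rounding in the spirit of Algorithm 1.

    Assumes every score lies in [0, 1] and the scores sum to an integer k.
    Returns a 0/1 list with k ones whose marginals match the input scores.
    """
    s = list(scores)
    pending = 0  # index of the single entry that may still be fractional
    for i in range(1, len(s)):
        a, b = s[i], s[pending]
        u = rng.random()
        if a + b <= 1:
            # Merge both masses into one entry; the other becomes 0.
            if a + b > 0 and u < a / (a + b):
                s[i], s[pending] = a + b, 0.0
                pending = i
            else:
                s[i], s[pending] = 0.0, a + b
        else:
            # One entry is rounded up to 1; the other keeps the remainder.
            denom = 2 - a - b  # zero only when a = b = 1
            if denom <= 0 or u < (1 - b) / denom:
                s[i], s[pending] = 1.0, a + b - 1
            else:
                s[i], s[pending] = a + b - 1, 1.0
                pending = i
    # The total is an integer, so the last pending entry is 0 or 1
    # up to floating-point error.
    return [int(round(x)) for x in s]


def reservoir_sample(stream, k, rng=random):
    """Keep a uniformly random size-k subset of a stream of unknown length."""
    reservoir = []
    for t, item in enumerate(stream):
        if t < k:
            reservoir.append(item)
        else:
            j = rng.randrange(t + 1)  # uniform in {0, ..., t}
            if j < k:
                reservoir[j] = item  # replace a uniformly chosen kept item
    return reservoir
```

Passing a seeded `random.Random` instance as `rng` makes both routines reproducible, which is convenient when checking empirically that the selection frequencies match the input marginals.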

In this version of the problem, the algorithm has full access to the set $U$ and makes decisions offline. We can formalize it as the following constrained utility maximization problem, where we want to compute the selection probabilities for all $n$ individuals.

$$
\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{n} p_i s_i \\
\text{s.t.} \quad & \sum_{i=1}^{n} p_i \le k, \\
& |p_i - p_j| \le |s_i - s_j| \quad \forall i, j, \\
& 0 \le p_i \le 1 \quad \forall i,
\end{aligned}
$$

where $0 \le s_i \le 1$ for all $i$.

By Lemma 1, Algorithm 1 can receive as input the list of individual cohort selection probabilities and form a cohort that respects these probabilities. Nevertheless, the sum of the initial probabilities generated by classifier $C$ might not be equal to $k$. If the sum is less than $k$, then the new selection probabilities become $p_i = \min\{s_i + c, 1\}$, where the constant $c$ is calculated so that $\sum_{i=1}^{n} p_i = k$. This adjustment maintains the probability differences between the pairs of candidates, unless one of them becomes 1, in which case the difference will become smaller. As a result, the differences of the new probabilities will remain bounded by the same metric as the initial probabilities. The case where the sum is greater than $k$ is treated similarly. More specifically, the new probabilities are $p_i = \max\{0, s_i - c\}$. After the adjustment, Algorithm 1 can decide which people will constitute the cohort based on the adjusted probabilities.

Theorem 1. Given individually fair scores $s_1, s_2, \dots, s_n$ for a set of $n$ candidates, Algorithm 2 solves the Offline Cohort Selection problem by selecting $k$ individuals with marginal probabilities $p_1, p_2, \dots, p_n$ that:
1. are individually fair, and
2. optimize the cohort selection utility function.

Proof.

We can rewrite the procedure that Algorithm 2 runs before the rounding in a succinct way. In particular, if the sum is less than $k$ then we have $p_i = \min\{s_i + b, 1\}$, for a real number $b$ so that $\sum_{i=1}^{n} p_i = k$. The value of the sum remains the same after the initialization of the $p_i$s because the algorithm only redistributes the mass between individuals. All $p_i$s that are not set to 1 are equal to the sum of $s_i$ plus the same constant, which consists of the initial $c$ and the fractions of the $p_i$s that exceeded 1 and, therefore, were set to 1. In the end no $p_i$ can be greater than 1 because then its value will be set to 1 and the remaining mass will be uniformly redistributed to all $p_j$s that are less than 1. Similarly, if the sum is greater than $k$ the formula is $p_i = \max\{s_i - b, 0\}$, for a real number $b$ so that $\sum_{i=1}^{n} p_i = k$.

Algorithm 2: Offline Cohort Selection

Input: list of scores $s_1, s_2, \dots, s_n \in [0, 1]$ for the $n$ candidates, the number of individuals that must be selected $k$
Output: list of selection indicators $\tilde{s}_1, \tilde{s}_2, \dots, \tilde{s}_n \in \{0, 1\}$
    sum ← $\sum_{i=1}^{n} s_i$
    if sum $< k$ then
        $c$ ← $\frac{k - \text{sum}}{n}$
        $p_i$ ← $s_i + c$, $\forall i \in [n]$
        while $\exists\, p_i > 1$ do
            set $n_{<1}$ equal to the number of individuals with $p_j < 1$
            $p_j$ ← $p_j + \frac{p_i - 1}{n_{<1}}$, $\forall j : p_j < 1$
            $p_i$ ← 1
        end
    else
        $c$ ← $\frac{\text{sum} - k}{n}$
        $p_i$ ← $s_i - c$, $\forall i \in [n]$
        while $\exists\, p_i < 0$ do
            set $n_{>0}$ equal to the number of individuals with $p_j > 0$
            $p_j$ ← $p_j + \frac{p_i}{n_{>0}}$, $\forall j : p_j > 0$
            $p_i$ ← 0
        end
    end
    return Rounding($p_1, \dots, p_n$)

Individual fairness.

The algorithm adjusts the scores of the individuals according to the value of $\sum_{i=1}^{n} s_i$ before running the rounding algorithm. Hence, there are three distinct cases.

1. $\sum_{i=1}^{n} s_i = k$. The input scores are used unaltered as the marginal probabilities for the cohort selection and, thus, we have for any pair of different indices $i, j$ that $|p_i - p_j| = |s_i - s_j|$.

2. $\sum_{i=1}^{n} s_i < k$. The cohort selection probabilities are of the form $p_i = \min\{s_i + b, 1\}$. For any pair of individuals $i, j$, if both have probability of being selected 1 then $|p_i - p_j| = 0 \le |s_i - s_j|$. If $p_i = 1$ and $p_j = s_j + b$, we have $|p_i - p_j| < |s_i - s_j|$. Finally, if $p_i = s_i + b$ and $p_j = s_j + b$, then it holds that $|p_i - p_j| = |s_i - s_j|$. Combining these, we conclude that for any pair of individuals $i, j$ we have $|p_i - p_j| \le |s_i - s_j|$.

3. $\sum_{i=1}^{n} s_i > k$. Now, the cohort selection probabilities are of the form $p_i = \max\{s_i - b, 0\}$. For any pair of individuals $i, j$, if both have zero probability of being selected then $|p_i - p_j| = 0 \le |s_i - s_j|$. If $p_i = 0$ and $p_j = s_j - b$, we have $|p_i - p_j| < |s_i - s_j|$. Finally, if $p_i = s_i - b$ and $p_j = s_j - b$, then it holds that $|p_i - p_j| = |s_i - s_j|$. We conclude that for any pair of individuals $i, j$ we have $|p_i - p_j| \le |s_i - s_j|$.

Therefore, individual fairness is ensured by the assumption of the theorem that the input scores satisfy the individual fairness condition. More specifically, if the scores are individually fair with respect to a metric $\mathcal{D}$, we obtain $\forall i, j \in \{1, \dots, n\}$, $|p_i - p_j| \le |s_i - s_j| \le \mathcal{D}(i, j)$.

Optimal utility.

Let $p_i$ for $i \in \{1, \dots, n\}$ denote the individual selection probabilities $P$ of the solution of Algorithm 2. For the optimal solution the first constraint, which refers to the sum, is satisfied with equality. This happens because otherwise we would be able to increase the $p_i$s by setting $p'_i = \min\{p_i + d, 1\}$ so that $\sum_{i=1}^{n} p'_i = k$ and obtain greater utility while not violating any constraint. We assume that there exists a different solution $P'$ which is also individually fair and is the optimal one, i.e. $\sum_{i=1}^{n} p'_i s_i > \sum_{i=1}^{n} p_i s_i$. Without loss of generality we can assume that the original scores $s_i$ are sorted in increasing order. In particular, we have $s_1 \le s_2 \le \dots \le s_n$ and, consequently, $p_1 \le p_2 \le \dots \le p_n$.

Let $i$ be the individual with the smallest index for whom the two solutions differ. There are two possible cases.

If $p'_i < p_i$, then because $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} p'_i = k$ there exists an individual with $j > i$ such that $p'_j > p_j$. In addition, $p_i > 0$ and $p_j < 1$. The values of the probabilities depend on the sum of the scores in comparison to $k$. If the sum is greater than $k$, all the $p_i$s that are not equal to 0 are of the form $p_i = s_i - b$. Hence, considering that $p'_i < p_i \le p_j < p'_j$, we obtain

$$|p'_i - p'_j| = p'_j - p'_i > s_j - b - (s_i - b) = s_j - s_i = |s_i - s_j|.$$

Similarly, if the sum is less than $k$, the $p_i$s that are not equal to 1 are of the form $p_i = s_i + b$, therefore giving

$$|p'_i - p'_j| = p'_j - p'_i > s_j + b - (s_i + b) = s_j - s_i = |s_i - s_j|.$$

If the sum is equal to $k$, we have

$$|p'_i - p'_j| = p'_j - p'_i > p_j - p_i = s_j - s_i = |s_i - s_j|.$$

In all three cases one of the constraints of the optimization is violated.

If $p'_i > p_i$, then there exists an individual with $j > i$ such that $p'_j < p_j$. Let $j$ be the smallest such index. If there exists another index $l > j$ such that $p'_l \ge p_l$, then by the previous argument the constraints are not satisfied. Therefore, for all individuals $l$ after $j$ it must hold that $p'_l < p_l$. We can now separate the individuals into three groups:

$$G_1(P') = \{i : p_i = p'_i\}, \qquad G_2(P') = \{i : p'_i > p_i\}, \qquad G_3(P') = \{i : p'_i < p_i\}.$$

All individuals in $G_3$ have greater indices than those in $G_1$ and $G_2$. We can now take mass uniformly from the individuals in $G_2$ and distribute it uniformly to individuals in $G_3$ in a way that all individuals remain in the same group as before. Let $p''_i$ be the new score of individual $i$. The value of the objective function will increase because individuals in $G_3$, who have higher $s_i$s than those in $G_2$, will get higher new scores. The constraints are still satisfied because the total sum remains the same and for any pair of individuals $i, j$ we have $|p''_i - p''_j| \le |p'_i - p'_j|$. By constructing a solution $P''$ which gives a greater objective than $P'$, we arrive at a contradiction. As a result, $P$ is the optimal solution.
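The closed form used in the proof, $p_i = \min\{s_i + b, 1\}$ or $p_i = \max\{s_i - b, 0\}$, can also be computed directly by searching for the shift $b$. The sketch below is an assumption-laden illustration, not Algorithm 2 itself: it replaces the iterative redistribution loop with bisection on a signed shift, and the name `water_fill` and the iteration count `iters` are hypothetical.

```python
def water_fill(scores, k, iters=60):
    """Marginals p_i = clip(s_i + b, 0, 1) with sum(p) = k, via bisection on b.

    Assumes 0 <= k <= len(scores) and every score lies in [0, 1].
    A positive b corresponds to the case sum(scores) < k, a negative b
    to the case sum(scores) > k.
    """
    def total(b):
        return sum(min(1.0, max(0.0, s + b)) for s in scores)

    lo, hi = -1.0, 1.0  # total(-1) = 0 <= k <= n = total(1)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total(mid) < k:
            lo = mid
        else:
            hi = mid
    b = hi  # total(hi) >= k is maintained throughout the search
    return [min(1.0, max(0.0, s + b)) for s in scores]


# Hypothetical instance with sum(scores) = 1.4.
scores = [0.1, 0.4, 0.9]
p_up = water_fill(scores, 2)    # shift up: approximately [0.35, 0.65, 1.0]
p_down = water_fill(scores, 1)  # shift down: approximately [0.0, 0.25, 0.75]
```

Because clipping to $[0, 1]$ is 1-Lipschitz, the output satisfies $|p_i - p_j| \le |s_i - s_j|$ for every pair, matching the fairness part of Theorem 1. The marginals would then be passed to the rounding procedure of Algorithm 1 to draw the cohort.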
□

In this section, we consider the streaming setting of the cohort selection problem. Specifically, we propose Algorithms 3 and 5, which read the initial scores from a stream and solve the cohort selection problem while achieving high utility and keeping a small number of people in memory. In particular, due to the result in [DI19], which states that if the number of individuals in the input is unknown the online version of the cohort selection problem has no solution, a streaming algorithm cannot choose a cohort without having seen the entire stream. However, it can make the process more efficient for the candidates by rejecting some candidates throughout the process. We say that a candidate is pending if they have not yet been rejected. It is important to note though that no person is accepted before the algorithm reaches the end of the stream.

Both algorithms described in this section divide the people into groups with similar initial selection scores and treat any member of a group as equivalent. Hence, the final probability of being selected for the cohort is equal for any person assigned to the same group. This leads to a relaxation of the individual fairness property achieved, which is the reason why we defined $\varepsilon$-individual fairness. Even though for each group we keep a number of people pending and reject everyone else, we maintain the information of the rejected candidates by considering that each person pending represents themselves as well as a fraction of the rejected people from their group. The fraction of people represented by the $i$-th candidate is denoted by $n_i \ge 0$. Algorithm 3 maintains at most $k$ people pending per group uniformly at random, and every representative within a certain group represents the same fraction of people. Algorithm 5 reduces the memory required by allowing pending candidates of the same group to represent different numbers of people. Algorithm 4 is used to eliminate people and determine how many people each candidate represents.

We split the people we see into groups according to their initial score. For example, we can split the interval $[0, 1]$ into $m$ intervals of size $\varepsilon$ each, where $m = \lceil 1/\varepsilon \rceil$. Person $i$ is assigned to group $g \in \{1, \dots, m\}$ if $s_i \in [(g-1)\varepsilon, g\varepsilon]$. Once person $i$ is in the group, they get a new score $\hat{s}_i$, equal to the score of all other members in the group, $\hat{s}_g = g\varepsilon$.

The use of the groups affects the performance of the cohort selection algorithm in terms of individual fairness and utility. Lemma 2 shows that the performance compromise of the streaming Algorithms 3 and 5 is caused by the use of groups, as it is also observed in the offline Algorithm 2 when the input scores are grouped in multiples of $\varepsilon$.

Lemma 2. Given individually fair scores $s_1, s_2, \dots, s_n$ for a set of $n$ candidates and $\varepsilon \in (0, 1]$, we split the interval $[0, 1]$ into $m = \lceil 1/\varepsilon \rceil$ intervals of length $\varepsilon$ and for all $i \in \{1, \dots, n\}$ we set $\hat{s}_i = g\varepsilon$ if $s_i \in [(g-1)\varepsilon, g\varepsilon]$. Algorithm 2 with input the modified scores $\hat{s}_1, \dots, \hat{s}_n$ solves the Cohort Selection problem by choosing a cohort of size $k \in \mathbb{N}$ with individual selection probabilities $p_1, p_2, \dots, p_n$ that:
1. are $\varepsilon$-individually fair, and
2. achieve cohort selection utility $OPT - k\varepsilon \le \sum_{i=1}^{n} p_i s_i \le OPT$, where the optimal utility is with respect to the original scores $s_1, s_2, \dots, s_n$.

Proof.

We have that for all individuals $i$ in $\{1, \dots, n\}$, $0 \le \hat{s}_i - s_i \le \varepsilon$. Therefore, we obtain that for any pair of individuals $i, j$, $|\hat{s}_i - \hat{s}_j| \le |s_i - s_j| + \varepsilon$. By Theorem 1 we have that $|p_i - p_j| \le |\hat{s}_i - \hat{s}_j|$. Combining the two inequalities, we obtain that the selection process is $\varepsilon$-individually fair.

We denote by $p^*_1, \dots, p^*_n$ the optimal solution, which offers utility $\sum_{i=1}^{n} p^*_i s_i \ge \sum_{i=1}^{n} p_i s_i$. The utility achieved by the algorithm is

$$
\begin{aligned}
\sum_{i=1}^{n} p_i s_i &= \sum_{i=1}^{n} p_i \hat{s}_i + \sum_{i=1}^{n} p_i (s_i - \hat{s}_i) \\
&\ge \sum_{i=1}^{n} p_i \hat{s}_i - \sum_{i=1}^{n} p_i \varepsilon && (\hat{s}_i - s_i \le \varepsilon,\ \forall i \in \{1, \dots, n\}) \\
&= \sum_{i=1}^{n} p_i \hat{s}_i - k\varepsilon && \Big(\sum_{i=1}^{n} p_i = k\Big) \\
&\ge \sum_{i=1}^{n} p^*_i \hat{s}_i - k\varepsilon && \Big(\sum_{i=1}^{n} p_i \hat{s}_i \text{ is the optimal utility for scores } \hat{s}_i\Big).
\end{aligned}
$$

Since we want a final result that involves $\sum_{i=1}^{n} p^*_i s_i$, we can rewrite $\sum_{i=1}^{n} p^*_i \hat{s}_i$ as

$$
\begin{aligned}
\sum_{i=1}^{n} p^*_i \hat{s}_i &= \sum_{i=1}^{n} p^*_i s_i + \sum_{i=1}^{n} p^*_i (\hat{s}_i - s_i) \\
&\ge \sum_{i=1}^{n} p^*_i s_i && (\hat{s}_i - s_i \ge 0,\ \forall i \in \{1, \dots, n\}).
\end{aligned}
$$

Thus, we see that $\sum_{i=1}^{n} p_i s_i \ge \sum_{i=1}^{n} p^*_i s_i - k\varepsilon$. □

This algorithm adds each new person to the appropriate group and maintains at most $k$ people from each group using reservoir sampling if the size of the group exceeds $k$. If there are more than $k$ people in one group, each of the people from this group who are not rejected represents $1/k$ of the size of the group. For all people that are waiting for the decision, the algorithm keeps the score and the number of people they represent. When the stream ends, the algorithm adjusts the scores of the remaining people so that the sum of their scores is equal to $k$. The process followed is equivalent to that of the offline algorithm. To be more precise, it is modified to work with the limited information kept by the streaming algorithm so that the expected value of any score is equal to the value computed by the corresponding offline version.

Algorithm 3: Basic streaming cohort selection

Input: list of scores $s_1, s_2, \dots, s_n \in [0, 1]$ for the $n$ candidates, the number of individuals that must be selected $k$, constant $\varepsilon \in (0, 1]$
Output: list of selection indicators $\tilde{s}_1, \tilde{s}_2, \dots, \tilde{s}_n \in \{0, 1\}$
 1: while stream has individuals do
 2:     add the new person $e$ to group $g$ for which $s_e \in [(g-1)\varepsilon, g\varepsilon]$
 3:     $n_e$ ← 1  // $e$ represents only themselves
 4:     $\hat{s}_e$ ← $g\varepsilon$
 5:     if group $g$ has $n_g > k$ people then
 6:         keep $k$ people from group $g$ with uniform random reservoir sampling of size $k$
 7:         for any rejected person $j$ set $n_j$ ← 0, $\tilde{s}_j$ ← 0
 8:         for any pending person $i$ in group $g$ set $n_i$ ← $\frac{n_g}{k}$
 9:     end
10: end
11: sum ← $\sum_{i=1}^{n} \hat{s}_i$
12: if sum $< k$ then
13:     $c$ ← $\frac{k - \text{sum}}{n}$
14:     for any person $i$ pending set $\tilde{s}_i$ ← $n_i \hat{s}_i + n_i c$
15:     while $\exists\, \tilde{s}_i > 1$ do
16:         set $n_{<1}$ equal to the number of individuals with $\tilde{s}_j < 1$
17:         for any pending person $j$ with $\tilde{s}_j < 1$ set $\tilde{s}_j$ ← $\tilde{s}_j + n_j \frac{\tilde{s}_i - 1}{n_{<1}}$
18:         $\tilde{s}_i$ ← 1
19:     end
20: else if sum $> k$ then
21:     $c$ ← $\frac{\text{sum} - k}{n}$
22:     for any person $i$ pending set $\tilde{s}_i$ ← $n_i \hat{s}_i - n_i c$
23:     while $\exists\, \tilde{s}_i < 0$ do
24:         set $n_{>0}$ equal to the number of individuals with $\tilde{s}_j > 0$
25:         for any pending person $j$ with $\tilde{s}_j > 0$ set $\tilde{s}_j$ ← $\tilde{s}_j + n_j \frac{\tilde{s}_i}{n_{>0}}$
26:         $\tilde{s}_i$ ← 0
27:     end
28: else
29:     for any pending person $i$ set $\tilde{s}_i$ ← $n_i \hat{s}_i$
30: end
31: return Rounding($\tilde{s}_1, \tilde{s}_2, \dots, \tilde{s}_n$)

Theorem 2.

Given individually fair scores $s_1, s_2, \dots, s_n$ for a set of $n$ candidates, the streaming Algorithm 3 solves the cohort selection problem for any $\varepsilon \in (0, 1]$ by choosing a cohort with individual selection probabilities $p_1, p_2, \dots, p_n$ that:
1. are $\varepsilon$-individually fair, and
2. achieve cohort selection utility $OPT - k\varepsilon \le \sum_{i=1}^{n} p_i s_i \le OPT$.

In addition, it keeps at most $O(k/\varepsilon)$ candidates pending.

Proof.

We want to show that Algorithm 3 gives the same marginal probabilities of selection as the offline Algorithm 2 whose input is rounded into multiples of $\varepsilon$, so that the scores have become $\hat{s}_1, \dots, \hat{s}_n$. In other words, our goal is to show that for any candidate $i \in [n]$, $p_i = q_i$, where $q_i$ is the probability that the candidate is selected by the offline algorithm. To prove this, we can show that two properties hold for the scores $\tilde{s}_1, \dots, \tilde{s}_n$ that the candidates have right before the final rounding:
1. $\sum_{i=1}^{n} \tilde{s}_i = k$, and
2. $\forall i \in [n]$, $\mathbb{E}[\tilde{s}_i] = q_i$.

In the offline algorithm, all members of a group $g$ have the same score, denoted by $s_g$, which, as seen in the proof of Theorem 1, means that the individual selection probabilities are $q_i = q_j$ for any pair of candidates $i, j$ from group $g$. The basic streaming version redistributes the score mass of the group among its members so that if the candidates are more than $k$, each of the pending candidates of the group represents $\frac{n_g}{k}$ of $s_g$ and the rejected get score $\tilde{s} = 0$. We will show that $\tilde{s}_i = n_i q_i$ for any individual $i$, where $\tilde{s}_i$ is the score of $i$ before the final rounding and $n_i$ signifies the number of people represented by $i$. There are three ways the new scores $\tilde{s}_1, \dots, \tilde{s}_n$ are calculated, depending on the value of the sum of scores $\hat{s}_1, \dots, \hat{s}_n$ when the stream has ended.

Case 1: $\sum_{i=1}^{n} \hat{s}_i = k$. The streaming algorithm sets $\tilde{s}_i = n_i \hat{s}_i$ in line 29 for the people who are pending, which also holds for those rejected because their $n_i$ and $\tilde{s}_i$ are zero. The offline algorithm makes no modifications to the scores and, hence, every person $i$ keeps their score $\hat{s}_i$. By Lemma 1, the probability of candidate $i$ being selected by the offline algorithm is $q_i = \hat{s}_i$. Thus, we obtain that any person $i$ has score $\tilde{s}_i = n_i q_i$ before the final rounding.

Case 2: $\sum_{i=1}^{n} \hat{s}_i < k$. First, we see that if the size of a group $g$ is $n_g > k$, then for any member $i$ of group $g$ the probability of selection by the offline algorithm $q_i$ is at most $\frac{k}{n_g}$. This holds because if there exists a person $i$ in group $g$ with $q_i > \frac{k}{n_g}$, then any other member $j$ of this group has selection probability $q_j = q_i > \frac{k}{n_g}$. By adding together the probabilities of the candidates in this group, we obtain probability mass greater than $k$, which leads to a contradiction because the offline algorithm gives a solution with $\sum_{i=1}^{n} q_i = k$. Therefore, only the probabilities of people in groups of size smaller than $k$ (who have $n_i = 1$) can reach 1. This observation is useful for the analysis of lines 13-19 of the streaming algorithm, which are executed in this case. In line 14 all scores of pending candidates are initialized with $n_i(\hat{s}_i + c)$. The offline algorithm initializes the scores of all candidates with $\hat{s}_i + c$. The total amount of mass added to the candidates in the streaming algorithm is equal to that in the offline algorithm, but instead of distributing it uniformly, it is added according to the fraction represented by each individual. If no score exceeds 1 in both the streaming and the offline algorithm, we have the property we want, because the offline algorithm will select $i$ with probability $q_i = \hat{s}_i + c$. Otherwise, the loop is executed. Due to our observation, only people with $n_i = 1$ can have a score that exceeds 1, and for them $n_i(\hat{s}_i + c) = \hat{s}_i + c$. As a consequence, the term $\frac{\tilde{s}_i - 1}{n_{<1}}$ of line 17 involves only the original scores, which are the same as in the offline algorithm. Since any term added to any candidate $i$ is multiplied by $n_i$, we have that if the offline algorithm has computed that the score of the $i$-th candidate is $\hat{s}_i + b$, then the streaming algorithm has computed score $\tilde{s}_i = n_i(\hat{s}_i + b)$, and if the offline algorithm has assigned score 1, then the streaming algorithm has also assigned score 1. Therefore, any person $i$ is assigned score $\tilde{s}_i = n_i q_i$ by the streaming algorithm before the final rounding.

Case 3: $\sum_{i=1}^{n} \hat{s}_i > k$. The adjusting process is analogous to that of Case 2 and, hence, it can be similarly shown that for any person $i$ we have $\tilde{s}_i = n_i q_i$.

Additionally, the new sum of scores is

$$\sum_{i=1}^{n} \tilde{s}_i = \sum_{i=1}^{n} n_i q_i = \sum_{g=1}^{m} \sum_{i \text{ in group } g} n_i q_i = \sum_{g=1}^{m} \sum_{i \text{ in group } g} \frac{n_g}{k} s_g = \sum_{g=1}^{m} n_g s_g = k.$$

The value of the $n_i$s is randomized because of the random reservoir sampling. The value of $n_i$ for a candidate $i$ who is in a group with at most $k$ members is equal to 1, which gives $\mathbb{E}[n_i] = 1$. If candidate $i$ is in a group $g$ with $n_g > k$ people, then $\mathbb{E}[n_i] = \frac{n_g}{k} \cdot \frac{k}{n_g} = 1$. Combining the expected value of $n_i$ with the fact that $\tilde{s}_i = n_i q_i$, we can compute the expected value of $\tilde{s}_i$:

$$\mathbb{E}[\tilde{s}_i] = \mathbb{E}[n_i q_i] = q_i\, \mathbb{E}[n_i] = q_i.$$

Since the indicators $\tilde{s}_1, \dots, \tilde{s}_n$ have values in $\{0, 1\}$, the probability $p_i$ of person $i$ getting selected by the streaming algorithm is $p_i = q_i$. Thus, we see that Algorithm 3 is equivalent to the algorithm of Lemma 2, and properties 1 and 2 of the lemma are satisfied.

The number of groups is $m = \lceil 1/\varepsilon \rceil$ and each group keeps at most $k$ candidates. As a result, the algorithm keeps at most $mk = \lceil 1/\varepsilon \rceil k \le \left(\frac{1}{\varepsilon} + 1\right) k$ candidates pending in total. □

The size of memory needed by the basic streaming algorithm is determined by the way the representatives are defined within the groups. By keeping a subset of $k$ people for each group that has size greater than $k$ and distributing the numbers of people they represent uniformly, Algorithm 3 keeps at most $O(k/\varepsilon)$ candidates pending. We can improve this by optimizing the process that determines who gets rejected and who represents which subgroup of rejected people. Instead of distributing the mass of a group uniformly to the candidates who have not been rejected, we perform a rounding process presented in Algorithm 4. Its input consists of the representation numbers $n_1, \dots, n_l$, where $n_i$ is the number of people the $i$-th candidate represents, and a value $v$, which determines the maximum number of people a pending candidate can represent. At every iteration, it rounds two entries: if their sum is less than $v$, it randomly sets one equal to zero and the other equal to their sum; otherwise it sets one entry equal to $v$ and the other equal to the remainder, so that their sum is maintained. Its goal is to maximize the number of people it can reject while having each pending candidate represent at most $v$ people.

Lemma 3.

Let $n_1, n_2, \ldots, n_{n_g} \in \{0, 1, \ldots, v\}$ be a list of the number of people each candidate of the same group $g$ represents. Algorithm 4 rounds each $n_i$ up to value $v$ so that $\lfloor \frac{n_g}{v} \rfloor$ elements will be equal to $v$, one element will be equal to $n_g - \lfloor \frac{n_g}{v} \rfloor v$ and the rest will be equal to 0. Moreover, for all $i \in [n_g]$, $\mathbb{E}[\tilde{n}_i] = n_i$.

Proof.

We begin by showing that for all $i \in [n_g]$, $\mathbb{E}[\tilde{n}_i] = n_i$. At step $i$, we have two candidates $a$ and $b$ who get rounded and obtain new values denoted by $\tilde{a}$ and $\tilde{b}$, respectively. The rounding depends on the value of $a + b$. If $a + b$ is at most $v$, then

$$\mathbb{E}[\tilde{a}] = (a + b) \frac{a}{a + b} = a.$$

If $a + b$ is greater than $v$ and at most $2v$, then

$$\mathbb{E}[\tilde{a}] = v \, \frac{v - b}{2v - a - b} + (a + b - v) \frac{v - a}{2v - a - b} = a.$$

Similarly, we obtain $\mathbb{E}[\tilde{b}] = b$. Since the expected values remain constant throughout the process, we conclude that $\mathbb{E}[\tilde{n}_i] = n_i$.

We notice that during the procedure, every $j$ with $j < i$ and $j \neq$ pendingIndex has $n_j = 0$ or $n_j = v$. At the end, all the candidates with index $j < i$ and $j \neq$ pendingIndex, and one of the candidates with index $n_g$ or pendingIndex, have value 0 or $v$. Since $\sum_{i=1}^{n_g} n_i = n_g$, we obtain that $\lfloor \frac{n_g}{v} \rfloor$ of the people with $j < i$ and $j \neq$ pendingIndex have value $v$ and the rest of them have value 0. The remaining mass is assigned to either the $n_g$-th candidate or the pendingIndex. □

The structure of algorithm 5 is similar to that of algorithm 3. The most important change is the rounding process, which determines who represents whom and who gets rejected before the stream ends. Since all the members of one group have the same score $\hat{s}$, we adjust the scores on the level of groups as new people are considered. Specifically, in every iteration we calculate the scores of the groups so that $\sum_{g=1}^{m} n_g s_g = k$, where $n_g$ is the size of group $g$ and $s_g$ is its adjusted score. Then, we perform the rounding process while making sure that for all groups $g$ no candidate has $n_i s_g > 1$, where $n_i$ is the number of people candidate $i$ represents. This is ensured by setting the argument $v = \lfloor \frac{1}{s_g} \rfloor$. In the end, every person who has not been rejected while the algorithm reads from the stream is assigned score $\tilde{s}_i = n_i s_g$. The final step calls algorithm 1 with input $\tilde{s}_1, \ldots, \tilde{s}_n$.

Algorithm 4: IRounding

Input: list of representation numbers $n_1, n_2, \ldots, n_\ell \in \mathbb{N}$ for $\ell$ candidates and maximum number of people one person can represent $v \in \mathbb{N}$
Output: list of adjusted representation numbers $\tilde{n}_1, \tilde{n}_2, \ldots, \tilde{n}_\ell \in \mathbb{N}$

    pendingIndex $\leftarrow 1$
    for $i$ from $1$ to $\ell$ do
        if $i$ = pendingIndex then
            continue to next $i$
        end
        $a \leftarrow n_i$
        $b \leftarrow n_{\text{pendingIndex}}$
        choose $u$ randomly from $\mathrm{unif}(0, 1)$
        if $a + b \le v$ then
            if $u < \frac{a}{a + b}$ then
                $n_i \leftarrow a + b$
                $n_{\text{pendingIndex}} \leftarrow 0$
                pendingIndex $\leftarrow i$
            else
                $n_i \leftarrow 0$
                $n_{\text{pendingIndex}} \leftarrow a + b$
            end
        else
            if $u < \frac{v - b}{2v - a - b}$ then
                $n_i \leftarrow v$
                $n_{\text{pendingIndex}} \leftarrow a + b - v$
            else
                $n_i \leftarrow a + b - v$
                $n_{\text{pendingIndex}} \leftarrow v$
                pendingIndex $\leftarrow i$
            end
        end
    end
    return $n_1, n_2, \ldots, n_\ell$

Algorithm 5: Improved streaming cohort selection

Input: list of scores $s_1, s_2, \ldots, s_n \in [0, 1]$ for the $n$ candidates, the number of individuals that must be selected $k$, constant $\varepsilon \in (0, 1]$
Output: list of selection indicators $\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_n \in \{0, 1\}$

    while stream has individuals do
        add the new person $e$ to group $g$ for which $s_e \in [(g - 1)\varepsilon, g\varepsilon]$
        $\hat{s}_e \leftarrow g\varepsilon$
        $n_e \leftarrow 1$
        sum $\leftarrow \sum_{i=1}^{e} \hat{s}_i$
        if sum $< k$ then
            $c \leftarrow \frac{k - \text{sum}}{n}$
            for any group $g$ set $s_g \leftarrow g\varepsilon + c$
            while $\exists$ group $g$ with $s_g > 1$ do
                set $n_{<1}$ equal to the number of individuals from groups with $s_f < 1$
                for any group $f$ with $s_f < 1$: $s_f \leftarrow s_f + \frac{n_g (s_g - 1)}{n_{<1}}$
                $s_g \leftarrow 1$
            end
        else if sum $> k$ then
            $c \leftarrow \frac{\text{sum} - k}{n}$
            for any group $g$ set $s_g \leftarrow g\varepsilon - c$
            while $\exists$ group $g$ with $s_g < 0$ do
                set $n_{>0}$ equal to the number of individuals from groups with $s_f > 0$
                for any group $f$ with $s_f > 0$: $s_f \leftarrow s_f + \frac{n_g s_g}{n_{>0}}$
                $s_g \leftarrow 0$
            end
        else
            for any group $g$ set $s_g \leftarrow g\varepsilon$
        end
        for group $g$ do
            if $s_g > 0$ then
                /* $v$ is the max number of people a person $i$ can represent s.t. $n_i s_g \le 1$ */
                $\{n_i\}_{i \text{ in group } g}$ = IRounding($\{n_i\}_{i \text{ in group } g}$, $v = \lfloor \frac{1}{s_g} \rfloor$)
                for any person $i$ in group $g$ with $n_i = 0$: $s_i \leftarrow 0$
            end
        end
    end
    for any group $g$ and any person $i$ in $g$ set $\tilde{s}_i \leftarrow n_i s_g$
    return Rounding($\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_n$)

Theorem 3. Given individually fair scores $s_1, s_2, \ldots, s_n$ for a set of $n$ candidates, the streaming algorithm 5 solves the cohort selection problem for any $\varepsilon \in (0, 1]$ by choosing a cohort with individual selection probabilities $p_1, p_2, \ldots, p_n$ that:

1. are $\varepsilon$-individually fair
2. achieve cohort selection utility $OPT - k\varepsilon \le \sum_{i=1}^{n} p_i s_i \le OPT$.

In addition, it keeps at most $O\left(k + \frac{1}{\varepsilon}\right)$ candidates pending.

Proof.

By comparing algorithm 5 to the algorithm described in lemma 2, we want to prove that they are equivalent. In particular, we show that the probability $p_i$ that individual $i$ is selected by algorithm 5 is equal to the probability $q_i$ that $i$ is selected by the offline algorithm with input the modified scores $\hat{s}_1, \ldots, \hat{s}_n$, where $\hat{s}_i = g\varepsilon$ if $s_i \in [(g - 1)\varepsilon, g\varepsilon]$. The first step is to prove that the scores $\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_n$, which are the input of the final rounding in line 31, satisfy the following properties:

1. $\sum_{i=1}^{n} \tilde{s}_i = k$
2. $\mathbb{E}[\tilde{s}_i] = q_i$, $\forall i \in [n]$.

We saw in the proof of theorem 1 that for the offline algorithm all candidates of the same group have the same selection probability $q_i$. We will show that for any person $i$ we have $q_i = s_g$, where $s_g$ is the final calculated score of the group $g$ to which $i$ belongs. The $s_g$ we are referring to is calculated by the final execution of lines 5-24. Lines 6-24 of algorithm 5 describe the same procedure as lines 2-18 of algorithm 2, but from the point of view of groups instead of individuals. We consider three cases that depend on the value of the sum of the initial scores.

Case 1: $\sum_{i=1}^{n} \hat{s}_i = k$. The offline algorithm does not change the scores of the individuals and, thus, assigns probability $q_i = \hat{s}_i = g\varepsilon$ to candidate $i$ of group $g$. Similarly, algorithm 5 assigns score $s_g = g\varepsilon$ to group $g$ and in line 30 sets $\tilde{s}_i = n_i s_g = n_i g\varepsilon = n_i q_i$ for the people who have not been rejected.

Case 2: $\sum_{i=1}^{n} \hat{s}_i < k$. The process that calculates $s_g$ starts by setting $s_g = g\varepsilon + c$. The offline algorithm initializes the probability of $i$, who is a member of $g$, as $q_i = \hat{s}_i + c = g\varepsilon + c$. If $g\varepsilon + c \le 1$ for all groups $g$, then the adjustment stops for both algorithms and every member $i$ of every group $g$ has $q_i = s_g$. If there exists $g$ such that $q_i > 1$, then the corresponding $s_g$ exceeds 1 by the same amount. Additionally, at this point the probabilities of all people in the same group as $i$ exceed 1 by this amount. Therefore, the $n_{<1}$ of the two algorithms is the same. The offline algorithm runs the loop for all members of all groups that have individuals with $q_i > 1$. The streaming version aggregates the excess mass from all $n_g$ members of group $g$ and redistributes it all at once, instead of running separate iterations for every member of the group as the offline algorithm does. No extra mass is added after the initialization; it is only moved from group to group. Hence, we obtain that at the end of this process $q_i = s_g$.

Case 3: $\sum_{i=1}^{n} \hat{s}_i > k$. Analogously to case 2, if $i$ is a member of group $g$, then $q_i = s_g$.

In every case, the final assignment of algorithm 5 gives each person who has not been rejected the score $\tilde{s}_i = n_i s_g = n_i q_i$. Those who were rejected have $\tilde{s}_i = n_i = 0$. As a result, we see that for any person $i$, $\tilde{s}_i = n_i q_i$. From this we can infer that

$$\sum_{i=1}^{n} \tilde{s}_i = \sum_{i=1}^{n} n_i q_i = \sum_{g=1}^{m} \sum_{i \in g} n_i s_g = \sum_{g=1}^{m} n_g s_g = \sum_{g=1}^{m} \sum_{i \in g} q_i = k.$$

As new people are added to the groups, the group scores $s_g$ become smaller in order for the sum of scores $\sum_{g=1}^{m} n_g s_g$ to be equal to $k$. Therefore, the maximum number $v$ of people that can be represented by a candidate in a given group either stays the same or increases after every iteration. By lemma 3, the rounding process maintains the expected value of the number of people each candidate represents equal to its initial value. Since every person begins by representing only themselves, we have that for the $i$-th candidate $\mathbb{E}[n_i] = 1$, and hence

$$\mathbb{E}[\tilde{s}_i] = \mathbb{E}[n_i q_i] = \mathbb{E}[n_i] \, q_i = q_i,$$

because the calculation of the $s_g$ is deterministic. The final rounding process makes the final decisions and outputs 0 if the candidate is rejected and 1 if the candidate is selected. Due to properties 1 and 2, the probability of candidate $i$ being selected by the streaming algorithm is $q_i$. Thus, algorithm 5 and the offline algorithm with scores rounded to multiples of $\varepsilon$ have the same selection probabilities. The theorem follows by the application of lemma 2.

Because of the online score adjustments, for $n \ge k$ the sum of all the scores is equal to $k$ at the end of each loop. Therefore, we have $\sum_{g=1}^{m} n_g s_g = k$. If $s_g > 0$, each person can represent at most $\lfloor \frac{1}{s_g} \rfloor$ candidates. By lemma 3, the number of representatives per group is at most

$$\left\lceil \frac{n_g}{\lfloor 1/s_g \rfloor} \right\rceil \le \frac{n_g}{\lfloor 1/s_g \rfloor} + 1 \le 2 n_g s_g + 1,$$

since $\frac{n_g}{\lfloor 1/s_g \rfloor} \le 2 n_g s_g$. If we sum the number of representatives for all groups we obtain

$$\sum_{g=1}^{m} \left\lceil \frac{n_g}{\lfloor 1/s_g \rfloor} \right\rceil \le \sum_{g=1}^{m} (2 n_g s_g + 1) = 2k + m = 2k + \left\lceil \frac{1}{\varepsilon} \right\rceil.$$

This completes the proof. □
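The pairwise rounding of algorithm 4 and the two claims of lemma 3 can be checked empirically. The following Python sketch is illustrative only and is not the authors' implementation. The function names `i_rounding` and `empirical_means`, the 0-based indexing, and the guard for the degenerate cases $a + b = 0$ and $a = b = v$ (where the probabilities in the pseudocode would be $0/0$) are assumptions introduced here. The loop starts at the second entry, which is equivalent to skipping the iteration in which $i$ equals pendingIndex.

```python
import random


def i_rounding(counts, v, rng=random):
    """Pairwise rounding in the spirit of algorithm 4 (IRounding).

    counts: non-negative integers, each at most v.
    v: positive integer cap on how many people one candidate may represent.
    Returns a new list with the same total as counts.
    """
    n = list(counts)
    pending = 0  # index of the single entry allowed to hold a partial value
    for i in range(1, len(n)):
        a, b = n[i], n[pending]
        # Guard (an assumption, not in the pseudocode): nothing to round
        # here, and skipping avoids a 0/0 probability.
        if a + b == 0 or (a == v and b == v):
            continue
        u = rng.random()
        if a + b <= v:
            # Merge: entry i keeps the whole sum with probability a / (a + b).
            if u < a / (a + b):
                n[i], n[pending] = a + b, 0
                pending = i
            else:
                n[i], n[pending] = 0, a + b
        else:
            # Split: one entry becomes v, the other keeps the remainder.
            if u < (v - b) / (2 * v - a - b):
                n[i], n[pending] = v, a + b - v
            else:
                n[i], n[pending] = a + b - v, v
                pending = i
    return n


def empirical_means(counts, v, trials, seed=0):
    """Average each rounded entry over many independent runs."""
    rng = random.Random(seed)
    totals = [0] * len(counts)
    for _ in range(trials):
        for j, value in enumerate(i_rounding(counts, v, rng)):
            totals[j] += value
    return [t / trials for t in totals]


if __name__ == "__main__":
    out = i_rounding([1] * 10, 3)
    # The multiset of values is the same for every random seed:
    # floor(10 / 3) = 3 entries equal to 3, one entry equal to 1, six zeros.
    print(sorted(out))
    # Each empirical mean should be close to the initial value 1.
    print(empirical_means([1] * 10, 3, trials=20000))
```

In this sketch a group of ten candidates with $v = 3$ always ends with $\lceil 10/3 \rceil = 4$ pending representatives, which matches the per-group count used in the memory bound above, while the averages returned by `empirical_means` stay near 1, as the expectation claim of lemma 3 predicts.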

References

[AKRZ21] Eshwar Ram Arunachaleswaran, Sampath Kannan, Aaron Roth, and Juba Ziani. Pipeline Interventions. In Innovations in Theoretical Computer Science Conference, ITCS '21, 2021. https://arxiv.org/abs/2002.06592.

[BKN+17] Amanda Bower, Sarah N Kitchen, Laura Niss, Martin J Strauss, Alexander Vargas, and Suresh Venkatasubramanian. Fair pipelines. arXiv preprint arXiv:1707.00391, 2017.

[CVZ10] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In IEEE Symposium on Foundations of Computer Science, FOCS '10. IEEE, 2010. https://arxiv.org/abs/0909.4348.

[DHP+12] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science, ITCS '12. ACM, 2012.

[DI19] Cynthia Dwork and Christina Ilvento. Fairness under composition. Innovations in Theoretical Computer Science, 2019. http://arxiv.org/abs/1806.06122.

[DIJ20] Cynthia Dwork, Christina Ilvento, and Meena Jagadeesan. Individual Fairness in Pipelines. In Symposium on Foundations of Responsible Computing, FORC '20, 2020. https://arxiv.org/abs/2004.05167.

[HPS16] Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. In Conference on Neural Information Processing Systems, NIPS '16, 2016. https://arxiv.org/abs/1610.02413.

[KRZ19] Sampath Kannan, Aaron Roth, and Juba Ziani. Downstream effects of affirmative action. In Conference on Fairness, Accountability, and Transparency, FAT* '19. ACM, 2019. https://arxiv.org/abs/1808.09004.

[SMLN20] Niklas Smedemark-Margulies, Paul Langton, and Huy L Nguyen. Fair and useful cohort selection. arXiv preprint arXiv:2009.02207.