Fair and Useful Cohort Selection
Niklas Smedemark-Margulies∗, Northeastern, [email protected]
Paul Langton∗, Northeastern, [email protected]
Huy Lê Nguyễn, Northeastern, [email protected]

∗ Equal contribution

September 7, 2020
Abstract
As important decisions about the distribution of society's resources become increasingly automated, it is essential to consider the measurement and enforcement of fairness in these decisions. In this work we build on the results of Dwork and Ilvento [1], which laid the foundations for the study of fair algorithms under composition. In particular, we study the cohort selection problem, where we wish to use a fair classifier to select $k$ candidates from an arbitrarily ordered set of size $n > k$, while preserving individual fairness and maximizing utility. We define a linear utility function to measure performance relative to the behavior of the original classifier. We develop a fair, utility-optimal $O(n)$-time cohort selection algorithm for the offline setting, and, as our primary result, a solution to the problem in the streaming setting that keeps no more than $O(k)$ pending candidates at all times.

1 Introduction

Many aspects of life are now determined by automated systems, leading to increasing concerns about the fairness of such systems [2, 3]. Following the pioneering work of Dwork et al. [4], a large body of work has been developed simultaneously in the machine learning, theoretical computer science, and economics communities. There are two common families of fairness notions: individual fairness and group fairness. In this work, we focus on the setting of individual fairness. Intuitively, individual fairness requires the algorithm under consideration to treat similarly-qualified individuals (as measured by a task-specific metric) similarly. Under this notion, algorithms have been developed for many tasks that are individually fair and efficient. However, in practice, most systems are complex and consist of many separate building blocks. A natural question is whether it is possible to combine fair building blocks into a fair composition. Dwork and Ilvento initiated a rigorous study of this topic in [1]. Interestingly, even for simple compositions, the competition among different tasks for the same individual and the competition among individuals for the same task already lead to dependent outcomes, affecting fairness. This situation arises naturally, for example, when there is a limited resource to distribute and individuals must be considered in order of appearance (see Example 1).
Example 1 (Cohort Selection). Suppose we are building an automated selection system that will select $k$ candidates by the end of an application period. To ensure our system is useful, we wish to maximize the benefit of the cohort, but also to ensure that each applicant is evaluated solely on their relevant qualifications for the task.

More precisely, our goal is to select $k$ individuals from a universe $U$ of $n$ candidates. To do so, we are given access to a fair classifier $C$ to evaluate each candidate. It is important to note that the fair classifier is oblivious to the limit $k$; thus, if more than $k$ candidates are qualified, we must enforce the limit in a fair way. This problem was previously studied in [1] as an archetype of fairness under composition. They developed several algorithms for the offline setting, i.e., where all individuals are available for consideration at the same time, and the online setting, i.e., where the individuals appear one by one and the system needs to make an irrevocable decision before seeing the next person.

When considering fairness, we can always compare to a simple baseline: ignore the classifier $C$ and select $k$ random individuals. In some settings, this baseline shows that a fair algorithm trivially exists, and in other settings, showing that this baseline cannot be implemented can suggest that the problem has no fair solution.

In this work, our contribution is twofold:

• The offline setting.
When multiple algorithms exist, to differentiate them from the baseline, we consider a natural notion of utility and develop a new fair algorithm achieving the optimal utility. Interestingly, a simple example shows that even in the offline setting, the algorithms from [1] might lose a constant factor compared with the optimal utility.

• The streaming setting.
The work [1] shows that no fair online algorithm exists (even the baseline of uniform random selection is not implementable). We develop a relaxed model where decisions need not be immediate, but at any given point in time, the number of pending decisions must be as small as possible, thus retaining the efficiency of online algorithms. We give an algorithm for fair cohort selection which leaves no more than $O(k)$ candidates pending at any given step and achieves the optimal utility.

1.1 Related Work

The baseline algorithm mentioned above, when implemented in the online setting, is precisely the canonical reservoir sampling algorithm. When also taking the classifier results into account, the problem is related to weighted reservoir sampling (see for example [5] and the references therein). Interestingly, the special case $k = 1$ is exactly weighted reservoir sampling, but even for $k = 2$, the problem is already very different from weighted reservoir sampling with replacement.

In Dwork and Ilvento's [1] discussion of equal degrees of composition, it is apparent that theoretically fair systems can lose a great deal of their usefulness in practice. The authors propose utility as an important metric for understanding the loss incurred by enforcing fairness. [6] discusses the implications of composing relaxed-fair systems. It identifies scenarios where the composed system can be fair, but does not focus heavily on the tradeoffs made to achieve fairness. This makes it difficult for potential implementors of fair algorithms to assess the costs and risks. Armed with the tools for utility described in this paper, [6] could provide critical understanding for the reader about the practical implications of enforcing the fairness they describe.

We make heavy use of the individual fairness definition of [4] (refined in [1]), who source their inspiration for fairness formalisms directly from [7]. The relevance, impact, and importance of algorithmic fairness in the current cultural context are discussed throughout our works cited, but especially in [3] and [8]. [9], for example, underscores the importance of algorithmic fairness in analyzing recidivism.

2 Preliminaries

In this work we consider a set of $n$ individuals from a universe $U$, from which we need to select exactly $k$ individuals fairly based on the output of an individually fair classifier $C$. In this section we review the definition of fairness we are considering, build a measure of utility that expresses the discriminative power of $C$, and develop a fundamental method for later use, which we call integer exact marginal rounding (EMR).

2.1 Notation

We will use $Y$ to denote a set or list, $y$ to denote an element of that list, and $y_i$ for the $i$th element of $Y$. A call to a function named "example" will appear in monospaced font. We include a short index of commonly used symbols and their meanings for quick reference at the end of the paper.

2.2 Individual Fairness

We follow the definition of individual fairness proposed by Dwork et al. in [4].
Definition 1 (Individual Fairness). For a given metric $D$ over a universe $U$ of individuals to be classified, and a randomized classifier $C : U \to \{0, 1\}$, we say that the classifier is individually fair with respect to that metric if and only if, $\forall u, v \in U$,
$$D(u, v) \ge |\Pr[C(u) = 1] - \Pr[C(v) = 1]|.$$
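To make Definition 1 operational, here is a minimal Python sketch (our illustration, not part of the original paper; the function name and data layout are hypothetical) that checks a finite table of selection probabilities against a metric $D$:

```python
from itertools import combinations

def is_individually_fair(prob, D, universe, tol=1e-12):
    """prob[u] = Pr[C(u) = 1]; D(u, v) = task-specific distance.
    Definition 1 requires |Pr[C(u)=1] - Pr[C(v)=1]| <= D(u, v) for all pairs."""
    return all(abs(prob[u] - prob[v]) <= D(u, v) + tol
               for u, v in combinations(universe, 2))
```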
2.3 Utility

We begin by establishing a basic notion of utility for any group of candidates. Even as we try to measure and enforce fair outcomes from some classification procedure, we must also define and optimize a utility objective to maintain the ability to distinguish between well-qualified and poorly-qualified candidates. Without a utility objective, we can almost always resort to trivially "fair" solutions, such as treating all individuals the same and using no information about their qualifications. This approach is intuitively not useful, because there are valid reasons for distinguishing between individuals, and a fair allocation of resources need not be uniform (discussed in Related Work and further in [1]). Instead, we consider a simple linear utility function.

Definition 2 (Utility). Consider a set of candidates $W$ whose individual probabilities of selection by $C$ are $\vec{s} = s_1, \ldots, s_n$, and whose utilities are $\vec{u} \in [0, 1]^n$ with $\sum_i u_i > 0$. We define the utility of $W$ to be
$$\sum_{i=1}^{n} s_i u_i. \quad (1)$$

Even though we would like to optimize the utility, following [1], we consider only composition algorithms that do not have access to the utility values $\vec{u}$. In many cases, utility may be derived from metrics not relevant to the task itself; for example, utility may be gained from team synergy, role-model status, company culture fit, diversity, etc. Thus this is a natural setting to consider. Intuitively, in this setting, the algorithm needs to optimize for the worst case and tries to match the probabilities of $C$ as closely as possible. We formalize this intuition in the following simple definition and lemma.

Definition 3 (Optimal Utility Cohort Selection). An algorithm $A$ with selection probabilities $p_1, \ldots, p_n$ achieves optimal utility for cohort selection on $\vec{s}$ if it is a solution to
$$\max_{\text{alg}} \min_{\vec{u}} \frac{\sum_i p_i u_i}{\sum_i s_i u_i}. \quad (2)$$

The following lemma shows that we can optimize the worst-case utility even when the composition algorithm only has access to the probabilities $s_i$ from classifier $C$, and does not have access to the utilities $\vec{u}$.

Lemma 1.
$$\min_{\vec{u}} \frac{\sum_{i=1}^{n} p_i u_i}{\sum_{i=1}^{n} s_i u_i} \le \min_i \frac{p_i}{s_i}. \quad (3)$$
Proof. Let $j$ be the coordinate with the smallest ratio between $p_j$ and $s_j$: $p_j / s_j \le p_i / s_i, \forall i \ne j$. Consider the vector $\vec{u}^*$ with $u^*_j = 1$ and $u^*_i = 0, \forall i \ne j$. Then we have:
$$\min_{\vec{u}} \frac{\sum_{i=1}^{n} p_i u_i}{\sum_{i=1}^{n} s_i u_i} \le \frac{\sum_{i=1}^{n} p_i u^*_i}{\sum_{i=1}^{n} s_i u^*_i} = \frac{p_j u^*_j}{s_j u^*_j} = \frac{p_j}{s_j} = \min_i \frac{p_i}{s_i}.$$

Thus we see that the ratio of probabilities at the most extreme coordinate gives an upper bound on the utility that any algorithm can achieve, relative to the original utility. In order to improve the worst-case utility, an algorithm must improve its most extreme coordinate.
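Lemma 1 reduces worst-case utility to a single quantity, $\min_i p_i / s_i$. The one-line Python helper below (our illustration; the function name is hypothetical) computes this ratio for a proposed algorithm, and is the quantity the later examples compare:

```python
def worst_case_utility_ratio(p, s):
    """Lemma 1: the worst-case relative utility of an algorithm with
    selection probabilities p, against classifier probabilities s, is
    governed by the most extreme coordinate-wise ratio p_i / s_i."""
    return min(pi / si for pi, si in zip(p, s) if si > 0)
```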
2.4 Dependent Rounding

In this section we develop a solution for Fair Useful Cohort Selection in the special case where the probabilities from $C$ sum exactly to the desired number of selections $k$. The solutions for the general case, in both the offline and streaming settings, will build on this subroutine. The solution to this special case is a simple dependent rounding, but it is worth noting that several common strategies fail to achieve the optimal utility even here.
Example 2. Consider the example with 3 candidates, $s_1 = 0.5$, $s_2 = 0.5$, $s_3 = 1$, from which we would like to select $k = 2$ individuals. The optimal solution is to pick $\{1, 3\}$ and $\{2, 3\}$ each with probability $0.5$. The WeightedSampling algorithm of [1] selects each subset with probability proportional to its weight, and would select $\{1, 2\}$ with probability $1/4$, $\{1, 3\}$ with probability $3/8$, and $\{2, 3\}$ with probability $3/8$, losing a factor of $3/4$ in utility in the worst case. Weighted reservoir sampling with replacement selects $\{1, 2\}$ with probability $1/5$, $\{1, 3\}$ with probability $2/5$, and $\{2, 3\}$ with probability $2/5$, losing a factor of $4/5$ in utility in the worst case.
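As a sanity check of the arithmetic in Example 2 (our addition, using the subset distributions stated above), the snippet below recovers each candidate's marginal selection probability and the resulting worst-case utility factor from Lemma 1:

```python
from fractions import Fraction as F

def marginals(dist, n):
    """dist maps a frozenset cohort to its selection probability."""
    return [sum(p for S, p in dist.items() if i in S) for i in range(1, n + 1)]

s = [F(1, 2), F(1, 2), F(1)]
weighted_sampling = {frozenset({1, 2}): F(1, 4),
                     frozenset({1, 3}): F(3, 8),
                     frozenset({2, 3}): F(3, 8)}
reservoir_with_replacement = {frozenset({1, 2}): F(1, 5),
                              frozenset({1, 3}): F(2, 5),
                              frozenset({2, 3}): F(2, 5)}
for dist in (weighted_sampling, reservoir_with_replacement):
    p = marginals(dist, 3)
    print(min(pi / si for pi, si in zip(p, s)))  # prints 3/4, then 4/5
```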
Our procedure, called Exact Marginal Rounding (EMR), randomly rounds a list of probabilities that sum to $k \in \mathbb{Z}$ into a list of integers so that the expected value of each element after rounding is equal to its original marginal probability. The algorithm iteratively rounds two fractional entries at a time until no fractional entry is left. Suppose the two fractional values are $a$ and $b$. If $a + b \le 1$, the algorithm randomly rounds one of them to $a + b$ and the other to $0$, with appropriate probabilities to preserve the marginal probabilities. In the case $a + b > 1$, it instead randomly rounds one of them to $1$ and the other to $a + b - 1$. As notation, we will write "EMR(list)" or "EMR(list, 1)", referring to using Integer Exact Marginal Rounding from Lemma 2 on list, or we will write "EMR(list, a)", referring to using Non-Integer Exact Marginal Rounding from Corollary 2 to round entries up to a value of $a < 1$.
Algorithm 1: Integer Exact Marginal Rounding

input: array S of length N s.t. S[i] ∈ [0, 1] and ΣS = K
output: S̃ ∈ {0, 1}^N

set S̃ to a copy of S
set pendingIndex to 1
for i from 2 to N do
    set a to S̃[i]
    set b to S̃[pendingIndex]
    set u to Unif(0, 1)
    if a + b ≤ 1 then
        // One gets all, the other gets 0
        if u < a/(a + b) then
            set S̃[i] to a + b; set S̃[pendingIndex] to 0; set pendingIndex to i
        else
            set S̃[i] to 0; set S̃[pendingIndex] to a + b
    else
        // One gets 1, the other gets the remainder
        if u < (1 − b)/(2 − a − b) then
            set S̃[i] to 1; set S̃[pendingIndex] to a + b − 1
        else
            set S̃[i] to a + b − 1; set S̃[pendingIndex] to 1; set pendingIndex to i
return S̃

Lemma 2 (Integer Exact Marginal Rounding). Let $s_1, \ldots, s_n \in [0, 1]$ be a list of probabilities with $\sum s_i = k \in \mathbb{N}$. Algorithm 1 rounds each $s_i$ such that $k$ elements will be equal to $1$, $n - k$ elements will be equal to $0$, and $\mathbb{E}[EMR(s_i)] = s_i$.
Proof. We first show that $\mathbb{E}[EMR(s_i)] = s_i, \forall i$. At the current step, let the two candidates under consideration be $A$ and $B$, let their current probabilities of selection be $a$ and $b$, and let their adjusted values after the current step be $\tilde{a}$ and $\tilde{b}$. We must consider two cases.

Case 1: $a + b \le 1$. With probability $\frac{a}{a+b}$ we set $\tilde{a} = a + b$, and with probability $1 - \frac{a}{a+b}$ we set $\tilde{a} = 0$. Thus we have for individual $A$:
$$\mathbb{E}[\tilde{a}] = \left(\frac{a}{a+b}\right)(a + b) + \left(1 - \frac{a}{a+b}\right)(0) = a.$$
The same holds analogously for $B$.

Case 2: $1 < a + b \le 2$. With probability $\frac{1-b}{2-a-b}$ we set $\tilde{a} = 1$, and with probability $1 - \frac{1-b}{2-a-b} = \frac{1-a}{2-a-b}$ we set $\tilde{a} = a + b - 1$. Thus we have for individual $A$:
$$\mathbb{E}[\tilde{a}] = \left(\frac{1-b}{2-a-b}\right)(1) + \left(\frac{1-a}{2-a-b}\right)(a + b - 1) = \frac{(1-b) + (1-a)(a+b-1)}{2-a-b} = \frac{a(2-a-b)}{2-a-b} = a.$$
Again, this holds analogously for $B$.

Next, we show that there are exactly $k$ items equal to $1$ and $n - k$ items equal to $0$. Note that at every step of the process, either one of the two elements was already an integer, or the number of integer elements increases by at least $1$ (in a rare case, $a + b = 1$ and both elements become integers in a single step). After all $n - 1$ steps, at most one fractional element remains pending; since probability mass is conserved throughout, the final list $\tilde{S}$ still sums to the integer $k$, so this last pending element must also be an integer. Since $\tilde{S}$ contains all integer elements in $[0, 1]$ and sums to $k$, at the end exactly $k$ individuals have $1$ and the other $n - k$ have $0$.
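The Case 2 algebra can also be checked symbolically; the short SymPy snippet below (our addition) confirms that the rounding probabilities preserve the marginal:

```python
import sympy as sp

a, b = sp.symbols('a b', positive=True)
# Case 2 of Lemma 2: with probability (1-b)/(2-a-b), round a up to 1;
# otherwise (probability (1-a)/(2-a-b)), round a to a+b-1.
expected = (1 - b) / (2 - a - b) * 1 + (1 - a) / (2 - a - b) * (a + b - 1)
print(sp.simplify(expected - a))  # 0: the expected value equals a
```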
Corollary 1. Algorithm 1 is individually fair.

Proof. By Lemma 2, each individual is rounded with exactly their original marginal probability.
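To make the subroutine concrete, here is a minimal Python sketch of EMR (our illustration; the function name and signature are hypothetical). It follows Algorithm 1; the optional beta parameter anticipates the rescaling of Corollary 2 below. When the input sums to an integer multiple of beta, every output lands in {0, beta}:

```python
import random

def emr(probs, beta=1.0):
    """Exact Marginal Rounding (sketch of Algorithm 1 / Corollary 2).
    Pairs the single pending fractional entry with each next entry and
    rounds so that E[output_i] = probs[i] for every i."""
    s = [p / beta for p in probs]   # Corollary 2: rescale [0, beta] -> [0, 1]
    pending = 0                     # index of the (at most one) fractional entry
    for i in range(1, len(s)):
        a, b = s[i], s[pending]
        if a + b == 0:
            continue                # both already zero; nothing to round
        if a + b <= 1:
            # one entry gets a+b, the other gets 0
            if random.random() < a / (a + b):
                s[i], s[pending], pending = a + b, 0.0, i
            else:
                s[i], s[pending] = 0.0, a + b
        else:
            # one entry gets 1, the other gets the remainder a+b-1
            if random.random() < (1 - b) / (2 - a - b):
                s[i], s[pending] = 1.0, a + b - 1
            else:
                s[i], s[pending], pending = a + b - 1, 1.0, i
    return [x * beta for x in s]
```

For instance, averaging many runs of emr([0.5, 0.5, 1.0]) gives each candidate an empirical selection frequency close to their original probability, and every single output contains exactly two ones.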
Corollary 2 (Non-Integer Exact Marginal Rounding). Let $s_1, \ldots, s_n \in [0, \beta]$, $\beta < 1$, be a list of probabilities whose sum is equal to $\beta K$. We can use Algorithm 1 to round each $s_i$ such that $K$ elements will be equal to $\beta$ and $n - K$ elements will be equal to $0$, and $\mathbb{E}[EMR(s_i)] = s_i$.

Proof. Note that there is a 1-to-1 mapping to an input for Algorithm 1: we simply scale all entries $s_i$ by $1/\beta$. We know that Algorithm 1 results in a rounded list $\tilde{S}$ containing exactly $K$ elements equal to $1$, so we can then map back to a rounded list containing exactly $K$ elements equal to $\beta$.

3 Offline Cohort Selection

We begin with the scenario of selecting $k$ of $n$ individuals when all candidates are known in advance. Considering Example 1, this corresponds to the idealized case in which all candidates simultaneously submit their applications and are evaluated in a single batch. Dwork and Ilvento give a solution to this scenario in [1] (with a constant factor lost in utility, as shown above); we present an alternate one that achieves the optimal utility.

In contrast with the above special case, the sum of probabilities $\sum_i s_i$ need not add up to $k$. If the sum exceeds $k$, the algorithm simply scales down all probabilities so that the new sum is $k$; it is not hard to show that this operation preserves fairness. The harder case is when the sum is smaller than $k$. Scaling up the probabilities is not possible, since it increases the gaps among candidates and can be unfair. Intuitively, the solution is to additively increase all probabilities by the same amount, thus preserving the gaps and the fairness. However, some care is needed, as no probability can exceed 1.

Algorithm 2: Useful Offline Cohort Selection

input: S: a list of Pr[C(w_i) = 1] for candidates w_1, ..., w_n; k: number of individuals that must be selected
output: cohort: list of k indices corresponding to chosen candidates

set sum to Σ_i s_i
set P to S
if sum < k then
    set c to (k − sum)/n
    set p_i to p_i + c, ∀ i ∈ {1, ..., n}
    while ∃ p_i > 1 in P do
        distribute (p_i − 1) uniformly to all p_j < 1 in P
        set p_i to 1
    // Now the list sums to exactly k and each item is in [0, 1]
    set rounded to EMR(P)
    set cohort to all x ∈ rounded with x = 1
else
    set P to P · (k/sum)
    set rounded to EMR(P)
    set cohort to all x ∈ rounded with x = 1
return cohort
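A compact Python sketch of Algorithm 2, reusing the emr() helper from Section 2.4 (our illustration; the function name is hypothetical and we assume $k < n$):

```python
def offline_cohort(s, k):
    """Sketch of Algorithm 2: lift or scale the classifier probabilities so
    they sum to exactly k, then round with EMR and keep the k winners."""
    n, total = len(s), sum(s)
    p = list(s)
    if total < k:
        c = (k - total) / n
        p = [pi + c for pi in p]               # uniform additive lift
        while any(pi > 1 for pi in p):         # clip at 1, redistribute overflow
            i = next(j for j, pj in enumerate(p) if pj > 1)
            under = [j for j in range(n) if p[j] < 1 and j != i]
            share = (p[i] - 1) / len(under)
            p[i] = 1.0
            for j in under:
                p[j] += share
    else:
        p = [pi * k / total for pi in p]       # uniform multiplicative scale
    rounded = emr(p)                            # sum(p) == k, each p_i in [0, 1]
    return [i for i, x in enumerate(rounded) if x > 0.5]
```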
Table 1: Behavior of Algorithm 2

n = 5, k = 2, Σ_i s_i < k, no candidates adjusted to 1:
  s_i value:      0.1   0.1   0.2   0.2   0.9
  A(s_i) output:  0.2   0.2   0.3   0.3   1.0

n = 5, k = 2, Σ_i s_i < k, some candidates adjusted to 1:
  s_i value:      0.1   0.1   0.2   0.3   0.8
  A(s_i) output:  0.2   0.2   0.3   0.4   0.9

n = 5, k = 1, Σ_i s_i > k:
  s_i value:      0.3   0.2   0.5   0.4   0.6
  A(s_i) output:  0.15  0.1   0.25  0.2   0.3

Problem 1 (Fair Offline Cohort Selection). Given a set of candidates $W$, $|W| = n$, with utilities $\vec{u}$ and an individually fair classifier $C$, where $\Pr[C(w_i) = 1] \equiv s_i$, select exactly $k$ candidates from $W$ while maintaining fairness with respect to $C$.

Lemma 3. If $\text{sum} < k$, Algorithm 2 increases each probability $s_i$ to $p_i = s_i + \alpha_i \ge s_i$, and all of the candidates $j$ with $p_j < 1$ receive the same adjustment value $\alpha_j = v$.

Proof. First we observe that $p_i \ge s_i, \forall i$: at each step of the algorithm the value of $p_i$ either increases or is clipped at 1, and since $s_i \in [0, 1], \forall i$, we therefore know that $p_i \ge s_i$. Algorithm 2 initially adds a constant amount $c$ to every candidate, then repeatedly redistributes overflow evenly across all candidates with $s_i + \alpha_i < 1$.
We prove by induction that all candidates $i$ with final $p_i < 1$ have the same increment $\alpha_i$. Consider two candidates $i, j$ such that their final $p_i, p_j < 1$. Since the values only increase, both $p_i, p_j$ remain smaller than 1 throughout the execution of the algorithm. At the beginning, $\alpha_i = \alpha_j = c$, so the claim holds. For the inductive step, assume that after step $d$ of redistribution we have $\alpha_i = p_i - s_i = \alpha_j = p_j - s_j$. We will argue that this equality still holds after step $d + 1$. Suppose that in step $d + 1$ the algorithm distributes an overflow value $x$ to $m$ candidates. As noted before, $i, j$ are among these $m$ candidates. Thus both $p_i$ and $p_j$ are increased by $x/m$, and $\alpha_i = \alpha_j$ after step $d + 1$.
Theorem 1. Algorithm 2 is individually fair.

Proof. By assumption, the $s_i$ we are given come from an individually fair $C$; thus $D(u, v) \ge |\mathbb{E}[C(u)] - \mathbb{E}[C(v)]|, \forall u, v \in U$. To show that Algorithm 2 is individually fair, we must show that it either preserves or decreases the difference between any pair of individuals. Let $A(u)$ for $u \in U$ denote the probability that $u$ is selected by Algorithm 2. Thus we must show, for all $(u, v)$:
$$|\mathbb{E}[C(u)] - \mathbb{E}[C(v)]| \ge |A(u) - A(v)|.$$
We apply $C$ to a set of candidates from $U$ and obtain two disjoint sets of output candidates: let $S^{(=1)}$ be the set of candidates whose adjusted probability equals 1, and let $S^{(<1)}$ be the set of candidates whose adjusted probability is less than 1. Let $S^{\text{all}}$ be the union of these two disjoint sets.

1. First, consider when $\sum_i s_i < k$. Here, each element $s_i \in S^{\text{all}}$ is increased by an amount $\alpha_i$ s.t. $\sum_i (s_i + \alpha_i) = k$. Then we apply EMR, where we know by Lemma 2 that $\Pr[\text{EMR selects } s_i + \alpha_i] = s_i + \alpha_i$. We thus need to cover three cases.

(a) First, we consider two elements $s_i, s_j \in S^{(=1)}$:
$$|A(s_i) - A(s_j)| = |s_i + \alpha_i - (s_j + \alpha_j)| = |1 - 1| = 0 \le |s_i - s_j|.$$

(b) Next, we have $s_i, s_j \in S^{(<1)}$. By Lemma 3, these candidates all receive the same adjustment; let this adjustment be called $b$:
$$|A(s_i) - A(s_j)| = |s_i + b - (s_j + b)| = |s_i - s_j|.$$

(c) Finally, we have $s_i \in S^{(=1)}$, $s_j \in S^{(<1)}$. Note that $\alpha_i - \alpha_j \le 0$ and $s_i - s_j > 0$:
$$|A(s_i) - A(s_j)| = |s_i + \alpha_i - (s_j + \alpha_j)| = |s_i - s_j + \alpha_i - \alpha_j| \le |s_i - s_j|.$$

2. Next, consider $\sum_i s_i \ge k$. Here we know $\frac{k}{\sum_i s_i} \le 1$. By Lemma 2, $A(s_i) = \frac{k s_i}{\sum_i s_i}$. Then, for candidates $i$ and $j$, we have:
$$|A(s_i) - A(s_j)| = \left|\frac{k s_i}{\sum_i s_i} - \frac{k s_j}{\sum_i s_i}\right| = \left|\frac{k}{\sum_i s_i}(s_i - s_j)\right| \le |s_i - s_j|.$$

Thus $A$ is individually fair.
Theorem 2. Algorithm 2 achieves optimal utility for Fair Useful Offline Cohort Selection.

Proof. We must show that Algorithm 2 is a solution to
$$\max_{\text{alg}} \min_i \frac{p_i}{s_i}. \quad (4)$$
Let $S^{(<1)}$ and $S^{(=1)}$ be defined as in Theorem 1, and note that $S^{(=1)}$ may be empty. Consider for contradiction an arbitrary fair algorithm $A'$ with probabilities of selection $p'_i$ s.t.
$$\min_i \frac{p'_i}{s_i} > \min_i \frac{p_i}{s_i}.$$

1. First, consider the case where $\sum_i s_i < k$ and $S^{(=1)}$ is nonempty. Let individual $j$ have the most extreme ratio: $\frac{p_j}{s_j} = \min_i \frac{p_i}{s_i}$. In order for this ratio to be minimal, individual $j$ must have $s_j \ge s_i, \forall i$, and be a member of $S^{(=1)}$. Therefore $s_j + \alpha_j = 1$, and we have that for Algorithm $A$, $\min_i \frac{p_i}{s_i} = \frac{s_j + \alpha_j}{s_j} = \frac{1}{s_j}$. By our definition of $A'$, there is some, potentially different, most extreme element $l$ s.t. $\frac{p'_l}{s_l} = \min_i \frac{p'_i}{s_i} > \frac{p_j}{s_j}$.

• If $s_l = s_j$, the largest minimum ratio $\frac{p'_l}{s_l}$ that $A'$ can achieve is $\frac{1}{s_l} = \frac{1}{s_j}$, which is the same ratio as Algorithm $A$ and contradicts our assumption.

• If $s_l \ne s_j$, we know $s_l < s_j$, since we already know that $s_j$ is at least as large as all other elements. Then, by fairness, $\alpha'_j \le \alpha'_l$ (and every $\alpha'_i > 0$, since $\min_i p'_i/s_i > 1/s_j \ge 1$). Then we can construct the following:
$$\frac{\alpha'_l}{s_l} > \frac{\alpha'_j}{s_j} \implies \frac{s_l + \alpha'_l}{s_l} > \frac{s_j + \alpha'_j}{s_j}. \quad (5)$$
This contradicts our assumption that $\frac{p'_l}{s_l}$ is the minimum ratio for $A'$.

2. Next, consider the case where $\sum_i s_i < k$ and $S^{(=1)}$ is empty. Again, let element $j$ be the most extreme element for Algorithm $A$, i.e. $\min_i \frac{p_i}{s_i} = \frac{p_j}{s_j}$. Consider $A'$ on its most extreme element $l$, constructed as in Item 1.

(a) This time, if $s_l = s_j$, we may have $p'_l = 1 > p_j$, achieving greater utility. As before, $s_j \ge s_i, \forall i \ne j$. By fairness, if $A'$ adjusts element $l$ by a certain amount, it must adjust all smaller elements by at least the same amount. However, we know that $\sum_i p_i = k$, so this required extra adjustment would mean that $\sum_i p'_i > k$, and $A'$ would not be selecting exactly $k$ candidates.

(b) If $s_l < s_j$, then we still know that $\alpha'_j \le \alpha'_l$, and we again have the contradiction in (5).

3. Finally, consider the case where $\sum_i s_i \ge k$. In Algorithm 2, all elements are multiplied by a factor of $\frac{k}{\sum_i s_i}$, and then by Lemma 2 any element satisfies $p_i = \frac{k s_i}{\sum_i s_i}$, so we have:
$$\min_i \frac{p_i}{s_i} = \min_i \frac{k s_i / \sum_i s_i}{s_i} = \frac{k}{\sum_i s_i}.$$
Notice that in this case, all elements are adjusted by the same constant ratio. Using the extreme element $l$ for algorithm $A'$ as constructed previously, and again supposing that this element is more extreme than any element in Algorithm $A$, we derive the following:
$$\min_i \frac{p'_i}{s_i} = \frac{p'_l}{s_l} > \frac{k}{\sum_i s_i} \implies p'_l > s_l \cdot \frac{k}{\sum_i s_i}.$$
Since element $l$ attains the minimum ratio for algorithm $A'$, it follows that for all $i$, $\frac{p'_i}{s_i} > \frac{k}{\sum_i s_i}$, i.e. $p'_i > s_i \cdot \frac{k}{\sum_i s_i}$. Taking the sum on both sides, we see
$$\sum_i p'_i > \sum_i s_i \cdot \frac{k}{\sum_i s_i} = k.$$
As before, the number of elements chosen by algorithm $A'$ would be more than $k$, which is a contradiction.

Thus we conclude $A'$ cannot exist, and Algorithm 2 achieves optimal utility for Fair Offline Cohort Selection in all cases.
4 Streaming Cohort Selection

We now consider the streaming scenario, in which decisions are made before all candidates have been seen. Dwork and Ilvento [1] show that in the true online setting, where decisions must be made before the next candidate is observed, a fair selection is impossible. Specifically, consider an individually fair algorithm that receives a stream of exactly $k$ individuals; it must clearly select all $k$ individuals in order to complete the cohort selection task. Now, let the algorithm receive an identical stream, followed by one additional candidate. We know it must select the first $k$ candidates, based on its behavior on the previous stream; however, this means that the $(k+1)$st candidate has no chance to be chosen, violating individual fairness. Therefore, the requirement for strict online decision making must be relaxed to make progress on this problem.

Motivated in part by the deferred-acceptance algorithm in the context of matching [10], we consider the setting where candidates are tentatively accepted but might be rejected later on as more candidates come. To maintain the efficiency of online algorithms, our goal is to minimize the number of candidates with a pending decision at all points in time. It is clear that the number of candidates on hold must be at least $k$, since we need to accept $k$ candidates at the end.

Problem 2 (Fair Useful Streaming Cohort Selection). Consider the fair classifier $C$ and candidates $W = \{w_1, \ldots, w_n\}$ with utilities $x_i$. Given $\pi$, a stream of classifications from $C$ on $w_i \in W$, select exactly $k$ candidates from $\pi$ while maximizing utility, respecting individual fairness, and keeping as few candidates pending as possible.

4.1 Algorithm Overview

Below, we provide an algorithm that achieves optimal utility, as defined in Definition 3, for the Streaming Cohort Selection problem, while leaving only $O(k)$ candidates pending. The algorithm uses a parameter $\alpha \in (0, 1)$; the number of pending candidates grows with both $1/\alpha$ and $1/(1-\alpha)$, and we can set $\alpha$ to minimize this number; we will pick $\alpha = 1/2$ for concreteness.

The algorithm maintains a set of $\lceil k/\alpha \rceil$ candidates with the highest probabilities, called top, and a set rest of the remaining candidates. As a new candidate $s$ appears, $s$ is either added to top (and thus bumps some other candidate from top to rest), or added to rest. The algorithm needs to make sure the list rest is not too big, by rounding and eliminating candidates. It is tempting to use the idea from the offline case: take two candidates from rest with probabilities $a$ and $b$ and round them to $a + b$ and $0$. However, if at the end of the stream the total probability is less than $k$ and we need to increase all probabilities, then some probability might exceed 1. The key idea (and the motivation for having top) is that if the total is less than $k$, the probabilities of candidates not among the top $\lceil k/\alpha \rceil$ are at most $\alpha$. Furthermore, since the adjustment that makes the total probability equal to $k$ preserves the ordering of the probabilities, their increments are also at most $\alpha$. Thus, it is safe to round the probabilities in rest to $0$ or $1 - \alpha < 1$.

Suppose that at the end of the stream the total probability is less than $k$; then we need to increase all probabilities. Observe that all candidates in rest have the same increment (due to the additive adjustment procedure, and the fact that all their probabilities are less than 1 after the increment). We can compute exactly the increment for the probabilities in top and the increment for all probabilities outside of top. Before the increment, we store all candidates with non-zero probabilities in top ∪ rest; refer to this set as $A$.
After the increment, we still have the probabilities of everyone in $A$, and we know everyone in $W \setminus A$ has the same probability (the value of their increment). The second key idea is that we can round the probabilities in $W \setminus A$ by maintaining a set called randoms, consisting of $k$ uniformly random samples from $W \setminus A$. Each member of randoms receives probability equal to $1/k$ times the total probability of $W \setminus A$. Since everyone in $W \setminus A$ has the same probability, this rounding clearly preserves the marginal probabilities. Finally, we can use the offline algorithm to select the cohort from the union of randoms and $A$.

The case where the sum of probabilities exceeds $k$ is simpler. The algorithm always scales down all probabilities so that they add up to exactly $k$. When a new candidate appears, they receive a probability equal to what $C$ gives them times the current scaling factor, and are added to top ∪ rest. The algorithm then scales down all probabilities so that the sum is exactly $k$, and reduces the size of top ∪ rest by repeatedly rounding pairs of candidates as in the offline setting. In this case, the list randoms is not used.

We present the algorithm as a series of update rules to be applied when a new stream element $s$ is encountered. The algorithm keeps track of the sum of the probabilities encountered so far (including the new element $s$) and performs different updates depending on the comparison between the sum and $k$. At the end of the stream, depending on the comparison between the sum and $k$, the algorithm also selects the cohort from among the pending candidates differently.

4.2 Before k
Algorithm 3: Streaming Update for new element s, with sum + s < k

set sum to sum + s
if top has fewer than ⌈k/α⌉ elements then
    add s to top
else if the minimum element in top < s then
    remove the minimum element from top and add it to rest
    add s to top
else
    add s to rest
set rounded to EMR(rest, 1 − α)
for each x with probability 0 in rounded do
    feed x to a uniform random reservoir sample of size k
set randoms to this sample
set rest to all x with probability > 0
if π ends then go to Algorithm 4
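A Python sketch of this update (our illustration; the state layout and names are hypothetical, and emr() is the sketch from Section 2.4). Candidates are (probability, id) pairs, top is a min-heap, and eliminated candidates feed a standard size-k uniform reservoir:

```python
import heapq, math, random

def update_before_k(cid, s, st, k, alpha=0.5):
    """Sketch of Algorithm 3. st holds sum, top, rest, randoms, n_zeros."""
    st['sum'] += s
    if len(st['top']) < math.ceil(k / alpha):
        heapq.heappush(st['top'], (s, cid))
    elif st['top'][0][0] < s:
        st['rest'].append(heapq.heappushpop(st['top'], (s, cid)))  # bump minimum
    else:
        st['rest'].append((s, cid))
    # Round rest to {0, 1 - alpha}; eliminated zeros feed a size-k reservoir.
    rounded = emr([p for p, _ in st['rest']], beta=1 - alpha)
    new_rest = []
    for x, (_, c) in zip(rounded, st['rest']):
        if x > 0:
            new_rest.append((x, c))
        else:                                   # eliminated from rest
            st['n_zeros'] += 1
            if len(st['randoms']) < k:
                st['randoms'].append(c)
            elif random.random() < k / st['n_zeros']:
                st['randoms'][random.randrange(k)] = c
    st['rest'] = new_rest
```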
Algorithm 4: Stream End Adjustment, sum < k

set P to top ∪ rest
set c to (k − sum)/|π|
set p_i to p_i + c, ∀ p_i ∈ P
for each e ∈ randoms do
    set e to (k − Σ_{p_i ∈ P} p_i) / |randoms|
while ∃ p_i > 1 in P do
    set c to (p_i − 1)/(|π| − 1)
    set p_j to p_j + c, ∀ p_j ∈ P \ {p_i}
    set e to e + c · (|π| − |P|)/|randoms|, ∀ e ∈ randoms
    set p_i to 1
output EMR(P ∪ randoms, 1)
4.3 Transition

Algorithm 5: Transition Step Streaming Update, sum + s ≥ k and sum < k

set sum to sum + s
add top and rest to pending
set top, rest, and randoms to empty
set scale to k/sum
set pending to (pending ∪ {s}) · scale
set pending to all x ∈ EMR(pending, 1) with x > 0
if π ends then output pending

4.4 After k
Algorithm 6: Streaming Update when sum ≥ k

// scale is the product of previous incremental weights, and renormalizes the incoming s
set sum to sum + s
set incr to (sum − s)/sum
set s to s · scale
set pending to (pending ∪ {s}) · incr
set scale to scale · incr
set pending to all x ∈ EMR(pending, 1) with x > 0
if π ends then output pending
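A matching Python sketch of the after-k update (our illustration, same hypothetical state layout as before). The invariant is that scale always equals k/sum, so every pending probability equals $s_i \cdot k/\text{sum}$:

```python
def update_after_k(cid, s, st, k):
    """Sketch of Algorithm 6: renormalize all pending probabilities so they
    sum to exactly k, then shrink pending with EMR."""
    st['sum'] += s
    incr = (st['sum'] - s) / st['sum']          # old_sum / new_sum
    st['pending'].append((s * st['scale'], cid))
    st['pending'] = [(p * incr, c) for p, c in st['pending']]
    st['scale'] *= incr                          # invariant: scale == k / sum
    rounded = emr([p for p, _ in st['pending']])
    st['pending'] = [(x, c) for x, (_, c) in zip(rounded, st['pending']) if x > 0]
```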
Lemma 4. Given a list $L$ of candidates, let $p_i$ denote the probability that the $i$th candidate is selected by the Streaming Cohort Selection Algorithm acting on $L$ as a stream, and let $q_i$ denote the same for Algorithm 2 acting on $L$ as an offline array. Then
$$p_i = q_i, \forall i. \quad (6)$$
Proof. Let the process outlined in Algorithms 3, 4, 5, and 6 be denoted $A$, and let $A$ have probability $p_i$ of selecting candidate $i$. Let Algorithm 2 be denoted $B$, with probability $q_i$ of selecting candidate $i$. $A$ never accepts candidates until the end of $\pi$; thus there are two major cases.

1. $\pi$ ends with sum $\ge k$. Thus we finished after using Algorithm 6; this procedure selects all candidates who were in pending at the end of the stream. A candidate $i$ is added to pending in one of three ways:

(a) Candidate $i$ was in top when the sum reached $k$. At each round, they must survive EMR, which by Lemma 2 happens with exactly their marginal probability. Therefore we only need to consider the effect of the incremental scaling adjustments. First, let $\text{scale}_i$, $\text{sum}_i$, $\text{incr}_i$ represent the values of these variables at some step $i$. Assume we entered Algorithm 5 and initialized scale at some step $j$: $\text{scale}_j = \frac{k}{\text{sum}_j}$, and
$$\text{scale}_{j+1} = \text{scale}_j \cdot \frac{\text{sum}_{j+1} - s_{j+1}}{\text{sum}_{j+1}} = \frac{k}{\text{sum}_j} \cdot \frac{\text{sum}_j}{\text{sum}_{j+1}} = \frac{k}{\text{sum}_{j+1}}.$$
Thus we see by induction that $\text{scale}_i = \frac{k}{\text{sum}_i}, \forall i$. Furthermore, we can express the final value of a single element $s_i$, which by definition is at position $i$ in the stream, as:
$$p_i = s_i \cdot \text{scale}_i \cdot \prod_{t=i+1}^{|\pi|} \text{incr}_t = s_i \cdot \frac{k}{\text{sum}_i} \cdot \prod_{t=i+1}^{|\pi|} \frac{\text{sum}_t - s_t}{\text{sum}_t} = s_i \cdot \frac{k}{\text{sum}_i} \cdot \prod_{t=i+1}^{|\pi|} \frac{\text{sum}_{t-1}}{\text{sum}_t} = s_i \cdot \frac{k}{\text{sum}_{|\pi|}}.$$
Thus we see that the final $p_i$ is adjusted exactly as it would be in the offline case.

(b) Candidate $i$ was in rest when the sum reached $k$. This means there was less than $k$ total probability in top, excluding $i$. First, we know $s_i \le 1 - \alpha$, as follows: every candidate in top is at least as large as $s_i$ (who is in rest); there are $\lceil k/\alpha \rceil$ top candidates; if $s_i > 1 - \alpha$, then the total probability in top would be greater than $\lceil k/\alpha \rceil \cdot (1 - \alpha) \ge k$, which is a contradiction. Candidate $i$ joined rest using EMR up to $1 - \alpha$, which respects its marginal. Then it was added to pending via EMR up to 1, which again respects the original marginal. As in Case 1a, their chance of being selected is then $p_i = \frac{k s_i}{\text{sum}}$, which is the same as in Algorithm 2.

(c) Candidate $i$ was encountered when sum $\ge k$. This is the same as Case 1a.

2. The stream $\pi$ ends with sum $< k$. $A$ has three arrays at this point: top (the greatest $\lceil k/\alpha \rceil$ elements in $\pi$), rest (at most $\lceil k/(1-\alpha) \rceil$ elements rounded to $1 - \alpha$), and randoms (a randomly selected group of $k$ candidates, disjoint from top and rest). No acceptances are made in Algorithm 3 until $\pi$ ends. The top candidates are placed into top, and elements are added to rest via EMR, which preserves the original marginal probabilities exactly by Lemma 2. The only change in probabilities then comes from Algorithm 4.

For the purpose of the proof, consider a conceptual algorithm $A'$ that behaves exactly the same, except that instead of randoms it keeps zeros, a list of all candidates outside of top and rest. This conceptual algorithm ends by collecting all of the mass in zeros into a set of $k$ random candidates, which becomes the equivalent of the list randoms in $A$, and performing a final EMR step to select the outputs. Each member of top, rest, and zeros is given $\frac{k - \text{sum}}{|\pi|}$, and any overflow is redistributed evenly.
Any candidate $i$ now has the same probability of selection under both $A'$ and Algorithm 2, namely $s_i + \alpha_i$; thus in $A'$, top, rest, and zeros are given exactly the same treatment they would receive in the offline algorithm. The only difference between $A$ and $A'$ is that, before the final step of selecting candidates by EMR, $A'$ selects $k$ random elements from zeros and evenly distributes the mass of zeros across these $k$ elements, thereby constructing a list equivalent to randoms. This step does not change the marginal probabilities of any element in zeros, because they have a uniform probability before and after, and we did not add or subtract mass. $A'$ then selects candidates using EMR on the entire list top ∪ rest ∪ randoms, which does not affect the adjusted marginals $s_i + \alpha_i$. Thus $p_i = s_i + \alpha_i = q_i, \forall i$.
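The telescoping product in Case 1a is easy to sanity-check numerically (our addition; the running sums below are arbitrary made-up values):

```python
# An element arriving at step i with probability s_i enters with weight
# s_i * k / sum_i, then shrinks by incr_t = sum_{t-1} / sum_t at each later
# step; the product telescopes to the offline weight s_i * k / sum_n.
k = 2
sums = [2.4, 3.1, 3.9, 5.0]         # running sums (all >= k) at steps 0..3
s_i, i = 0.7, 1                      # element arriving at step i
w = s_i * k / sums[i]
for t in range(i + 1, len(sums)):
    w *= sums[t - 1] / sums[t]       # incr_t = (sum_t - s_t) / sum_t
print(w, s_i * k / sums[-1])         # both approximately 0.28
```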
Theorem 3. The Streaming Cohort Selection Algorithm $A$, given by Algorithms 3, 4, 5, and 6, is individually fair.

Proof. We have shown Algorithm 2 to be individually fair. By Lemma 4, $A$ has the same outcome distribution as Algorithm 2; thus $A$ must also be individually fair.
Theorem 4. The Streaming Cohort Selection Algorithm $A$, given by Algorithms 3, 4, 5, and 6, achieves optimal utility for Fair Useful Cohort Selection.

Proof. We have shown that Algorithm 2 achieves optimal utility for Fair Cohort Selection. By Lemma 4, $A$ has the same outcome distribution as Algorithm 2; thus $A$ must also achieve optimal utility.
Theorem 5. The Streaming Cohort Selection Algorithm $A$, given by Algorithms 3, 4, 5, and 6, keeps no more than $O(k)$ candidates pending.

Proof. Any candidate not in one of top, rest, randoms, or pending is considered rejected. The sizes of top and randoms are explicitly bounded by $\lceil k/\alpha \rceil$. rest is not bounded in size explicitly, but note that the array is maintained by constantly applying EMR(rest, $1 - \alpha$). If we ever had more than $\lceil k/(1-\alpha) \rceil$ elements, we would have
$$\text{sum} \ge \left\lceil \frac{k}{1 - \alpha} \right\rceil (1 - \alpha) > k.$$
This puts us in the sum $\ge k$ case, which no longer adds elements to rest; thus rest is bounded in length by $\lceil k/(1-\alpha) \rceil$. pending is created initially by adding top and rest ($\lceil k/(1-\alpha) \rceil + \lceil k/\alpha \rceil$ pending), but in subsequent steps it is never longer than $k$, by Lemma 2. Thus the worst we can do is remain under a sum of $k$ for the entire stream: top, rest, and randoms will all be kept pending until the end of the stream, but still we have at most $\lceil k/\alpha \rceil + \lceil k/(1-\alpha) \rceil + \lceil k/\alpha \rceil = O(k)$ candidates pending.
5 Conclusion

Quantifying and optimizing the utility of fair algorithms is an important step toward increasing their adoption. Our key contribution is to solve the streaming cohort selection problem both with respect to fairness and with respect to utility, while allowing only $O(k)$ candidates to remain pending. An immediate open question following our work is to develop an algorithm with exactly $k$ pending candidates. From a broader perspective, as demonstrated in this problem, there is a strong connection between fair composition algorithms and the dependent rounding algorithms used in approximation algorithms. Perhaps this connection can be exploited in other composition settings.

This work focuses only on Individual Fairness, but other definitions of fairness could be considered. For example, using Group Fairness (Conditional Parity), discussed in [1], would change the nature of this problem by removing the strict linear constraint imposed by Individual Fairness. With this definition, decisions could be made sooner in the online case, and potentially with better utility.
Table 2: Notation and symbols

$k$ — number of candidates to select
$n$ — length of input list of candidates
$U$ — universe of possible candidates
$C$ — a classifier $C : U \to \{0, 1\}$
$W, w_i$ — the candidate set and its $i$th candidate
$\vec{x}$ — utilities for candidate set $W$
$\vec{s}$ — $s_i = \Pr[C(w_i) = 1]$
$p_i$ — the algorithm's probability of selecting candidate $w_i$
$\alpha$ — for $\sum S < k$, the maximum constant increase any candidate can receive
$\alpha_i$ — the actual constant increase $w_i$ receives in a candidate pool, $\alpha_i \le \alpha$
References

[1] C. Dwork and C. Ilvento, "Fairness under composition," CoRR, vol. abs/1806.06122, 2018.
[2] G. N. Rothblum and G. Yona, "Probably approximately metric-fair learning," CoRR, vol. abs/1803.03242, 2018.
[3] L. T. Liu, S. Dean, E. Rolf, M. Simchowitz, and M. Hardt, "Delayed impact of fair machine learning," CoRR, vol. abs/1803.04383, 2018.
[4] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. S. Zemel, "Fairness through awareness," CoRR, vol. abs/1104.3913, 2011.
[5] R. Jayaram, G. Sharma, S. Tirthapura, and D. P. Woodruff, "Weighted reservoir sampling from distributed streams," CoRR, vol. abs/1904.04126, 2019.
[6] A. Bower, S. N. Kitchen, L. Niss, M. J. Strauss, A. Vargas, and S. Venkatasubramanian, "Fair pipelines," CoRR, vol. abs/1707.00391, 2017.
[7] C. Dwork, "Differential privacy," in Automata, Languages and Programming (M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, eds.), (Berlin, Heidelberg), pp. 1–12, Springer Berlin Heidelberg, 2006.
[8] M. K. Lee, "Understanding perception of algorithmic decisions: Fairness, trust, and emotion in response to algorithmic management," Big Data & Society, vol. 5, no. 1, p. 2053951718756684, 2018.
[9] A. Chouldechova, "Fair prediction with disparate impact: A study of bias in recidivism prediction instruments," Big Data, vol. 5, no. 2, 2017.
[10] D. Gale and L. S. Shapley, "College admissions and the stability of marriage," The American Mathematical Monthly, vol. 69, no. 1, pp. 9–15, 1962.