Algorithmic Monoculture and Social Welfare
Jon Kleinberg Manish Raghavan
Abstract
As algorithms are increasingly applied to screen applicants for high-stakes decisions in employment, lending, and other domains, concerns have been raised about the effects of algorithmic monoculture, in which many decision-makers all rely on the same algorithm. This concern invokes analogies to agriculture, where a monocultural system runs the risk of severe harm from unexpected shocks. Here we show that the dangers of algorithmic monoculture run much deeper, in that monocultural convergence on a single algorithm by a group of decision-making agents, even when the algorithm is more accurate for any one agent in isolation, can reduce the overall quality of the decisions being made by the full collection of agents. Unexpected shocks are therefore not needed to expose the risks of monoculture; it can hurt accuracy even under "normal" operations, and even for algorithms that are more accurate when used by only a single decision-maker. Our results rely on minimal assumptions, and involve the development of a probabilistic framework for analyzing systems that use multiple noisy estimates of a set of alternatives.
The rise of algorithms used to shape societal choices has been accompanied by concerns over monoculture — the notion that choices and preferences will become homogeneous in the face of algorithmic curation. One of many canonical articulations of this concern was expressed in the New York Times by Farhad Manjoo, who wrote, "Despite the barrage of choice, more of us are enjoying more of the same songs, movies and TV shows" [15]. Because of algorithmic curation, trained on collective social feedback [20], our choices are converging.

When we move from the influence of algorithms on media consumption and entertainment to their influence on high-stakes screening decisions about whom to offer a job or whom to offer a loan, the concerns about algorithmic monoculture become even starker. Even if algorithms are more accurate on a case-by-case basis, a world in which everyone uses the same algorithm is susceptible to correlated failures when the algorithm finds itself in adverse conditions. This type of concern invokes an analogy to agriculture, where monoculture makes crops susceptible to the attack of a single pathogen [18]; the analogy has become a mainstay of the computer security literature [3], and it has recently become a source of concern about screening decisions for jobs or loans as well. Discussing the post-recession financial system, Citron and Pasquale write, "Like monocultural-farming technology vulnerable to one unanticipated bug, the converging methods of credit assessment failed spectacularly when macroeconomic conditions changed" [6].

The narrative around algorithmic monoculture thus suggests a trade-off: in "normal" conditions, a more accurate algorithm will improve the average quality of screening decisions, but when conditions change through an unexpected shock, the results can be dramatically worse. But is this trade-off genuine?
In the absence of shocks, does monocultural convergence on a single, more accurate screening algorithm necessarily lead to better average outcomes?

In this work, we show that algorithmic monoculture poses risks even in the absence of shocks. We investigate a model involving minimal assumptions, in which two competing firms can either use their own independent heuristics to perform screening decisions or they can use a more accurate algorithm that is accessible to both of them. (Again, we think of screening job applicants or loan applicants as a motivating scenario.) We find that even though it would be rational for each firm in isolation to adopt the algorithm, it is possible for the use of the algorithm by both firms to result in decisions that are worse on average. This in turn leads, in the language of game theory, to a type of "Braess' paradox" [5] for screening algorithms: the introduction of a more accurate algorithm can drive the firms into a unique equilibrium that is worse for society than the one that was present before the algorithm existed.

Note that the harm here is to overall performance. Another common concern about algorithmic monoculture in screening decisions is the harm it can cause to specific individuals: if all employers or lenders use the same algorithm for their screening decisions, then particular applicants might find themselves locked out of the market when this shared algorithm doesn't like their application for some reason.
While this is clearly also a significant concern, our results show that it would be a mistake to view the harm to particular applicants as necessarily balanced against the gains in overall accuracy — rather, it is possible for algorithmic monoculture to cause harm not just to particular applicants but also to the average quality of decisions as well.

Our results thus have a counterintuitive flavor to them: if an algorithm is clearly more accurate than the alternatives when one entity uses it, why does the accuracy become worse than the alternatives when multiple entities use it? The analysis relies on deriving some novel probabilistic properties of rankings, establishing that when we are constructing a ranking from a probability distribution representing a "noisy" version of a true ordering, we can sometimes achieve less error through an incremental construction of the ranking — building it one element at a time — than we can by constructing it in a single draw from the distribution. We now set up the basic model, and then frame the probabilistic questions that underpin its analysis.

Figure 1: Ranking candidates by algorithmically generated scores. (Source: https://business.linkedin.com/talent-solutions/blog/recruiting-strategy/2018/the-new-way-companies-are-evaluating-candidates-soft-skills-and-discovering-high-potential-talent)

To instantiate the ideas introduced thus far, we'll focus on the case of algorithmic hiring, where recruiters make decisions based in part on scores or recommendations provided by data-driven algorithms. In this setting, we'll propose and analyze a stylized model of algorithmic hiring with which we can begin to investigate the effects of algorithmic monoculture. Informally, we can think of a simplified hiring process as follows: rank all of the candidates (see Figure 1) and select the first available one.
We suppose that each firm has two options to form this ranking: either develop their own, private ranking (which we will refer to as using a "human evaluator"), or use an algorithmically produced ranking. We assume that there is a single vendor of algorithmic rankings, so all firms choosing to use the algorithm receive the same ranking. The firms proceed sequentially, each hiring their favorite remaining candidate according to the ranking they're using — human-generated or algorithmic. Thus, we can frame the effects of monoculture as follows: are firms better off using the more accurate, common algorithm, or should they instead employ their own less accurate, but private, evaluations? In what follows, we'll introduce a formal model of evaluation and selection, using it to analyze a setting in which firms seek to hire candidates.
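As a minimal sketch of the selection mechanics just described (our illustration, not code from the paper; the candidate names are hypothetical), each firm in turn hires its top-ranked candidate who has not yet been taken:

```python
def run_hiring(rankings):
    """Simulate sequential hiring: firms, in the given order, each hire
    their top-ranked candidate who has not yet been taken."""
    taken = set()
    hires = []
    for ranking in rankings:
        for candidate in ranking:
            if candidate not in taken:
                hires.append(candidate)
                taken.add(candidate)
                break
    return hires

# If both firms use the same ranking, the second firm gets the runner-up;
# with an independent ranking, the second firm may still get its own
# first choice.
print(run_hiring([["alice", "bob", "carol"], ["alice", "bob", "carol"]]))
print(run_hiring([["alice", "bob", "carol"], ["bob", "alice", "carol"]]))
```

In the first call the second firm is bumped to its second choice; in the second call it obtains its own first choice, "bob".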
More formally, we model the n candidates as having intrinsic values x_1, . . . , x_n, where any employer would derive utility x_i from hiring candidate i. Throughout the paper, we assume without loss of generality that x_1 > x_2 > · · · > x_n. These values, however, are unknown to the employer; instead, they must use some noisy procedure to rank the candidates. We model such a procedure as a randomized mechanism R that takes in the true candidate values and draws a permutation π over those candidates from some distribution. Our main results hold for families of distributions over permutations as defined below:

Definition 1 (Noisy permutation family). A noisy permutation family F_θ is a family of distributions over permutations that satisfies the following conditions for any θ > 0 and set of candidates x:

1. (Differentiability) For any permutation π, Pr_{F_θ}[π] is continuous and differentiable in θ.
2. (Asymptotic optimality) For the true ranking π*, lim_{θ→∞} Pr_{F_θ}[π*] = 1.
3. (Monotonicity) For any (possibly empty) S ⊂ x, let π^(−S) be the partial ranking produced by removing the items in S from π. Let π_1^(−S) denote the value of the top-ranked candidate according to π^(−S). For any θ′ > θ,

    E_{F_{θ′}}[π_1^(−S)] ≥ E_{F_θ}[π_1^(−S)].   (1)

Moreover, for S = ∅, (1) holds with strict inequality.

θ serves as an "accuracy parameter": for large θ, the noisy ranking converges to the true ranking over candidates. The monotonicity condition states that a higher value of θ leads to a better first choice, even if some of the candidates are removed after ranking. Removal after ranking (as opposed to before) is important because some of the ranking models we will consider later do not satisfy Independence of Irrelevant Alternatives.
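As a concrete illustration of Definition 1 (our sketch, not the paper's code), consider a random utility model with Gaussian noise: each candidate's value is perturbed by noise with standard deviation 1/θ, and candidates are ranked by their perturbed scores. Larger θ means less noise, so the sampled permutation concentrates on the true ranking as θ → ∞, matching the asymptotic-optimality condition.

```python
import random

def sample_noisy_permutation(values, theta, rng=random):
    """Gaussian RUM as a noisy permutation family: rank candidates by
    x_i + eps_i with eps_i ~ N(0, 1/theta^2); return candidate indices,
    best first."""
    return sorted(range(len(values)),
                  key=lambda i: values[i] + rng.gauss(0.0, 1.0 / theta),
                  reverse=True)

random.seed(0)
# With a large accuracy parameter, the noisy ranking almost surely
# recovers the true ordering x_1 > x_2 > x_3.
perm = sample_noisy_permutation([3.0, 2.0, 1.0], theta=100.0)
```

With θ = 100 the noise standard deviation is 0.01, far smaller than the gaps between values, so `perm` comes out as the true ordering `[0, 1, 2]`.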
Examples of noisy permutation families include Random Utility Models [23] and the Mallows Model [14], both of which we will discuss in detail later.

As an objective function to evaluate the effects of different approaches to ranking and selection, we'll consider each individual employer's utility as well as the sum of employers' utilities. We think of this latter sum as the social welfare, since it represents the total quality of the applicants who are hired by any firm. (For example, if all firms deterministically used the correct ranking, then the top applicants would be the ones hired, leading to the highest possible social welfare.)

Each firm in our model has access to the same underlying pool of n candidates, which they rank using a randomized mechanism R to get a permutation π as described above. Then, in a random order, each firm hires the highest-ranked remaining candidate according to their ranking. Thus, if two firms both rank candidate i first, only one of them can hire i; the other must hire the next available candidate according to their ranking. In our model, candidates automatically accept the offer they get from a firm. For the sake of simplicity, throughout this paper, we restrict ourselves to the case where there are two firms hiring one candidate each, although our model readily generalizes to more complex cases.

As described earlier, each firm can choose to use either a private "human evaluator" or an algorithmically generated ranking as its randomized mechanism R. We assume that both candidate mechanisms come from a noisy permutation family F_θ, with differing values of the accuracy parameter θ: human evaluators all have the same accuracy θ_H, and the algorithm has accuracy θ_A. However, while the human evaluator produces a ranking independent of any other firm, the algorithmically generated ranking is identical for all firms who choose to use it.
In other words, if two firms choose to use the algorithmically generated ranking, they will both receive the same permutation π.

The choice of which ranking mechanism to use leads to a game-theoretic setting: both firms know the accuracy parameters of the human evaluators (θ_H) and the algorithm (θ_A), and they must decide whether to use a human evaluator or the algorithm. This choice introduces a subtlety: for many ranking models, a firm's rational behavior depends not only on the accuracy of the ranking mechanism, but also on the underlying candidate values x_1, . . . , x_n. Thus, to fully specify a firm's behavior, we assume that x_1, . . . , x_n are drawn from a known joint distribution D. Our main results will hold for any D, meaning they apply even when the candidate values (but not their identities) are deterministically known.

Our main result is a pair of intuitive conditions under which a Braess' Paradox-style result occurs — in other words, conditions under which there are accuracy parameters for which both firms rationally choose to use the algorithmic ranking, but social welfare (and each individual firm's utility) would be higher if both firms used independent human evaluators. Recall that the two firms hire in a random order. For a permutation π, let π_i denote the value of the i-th-ranked candidate according to π. We first state the two conditions, and then the theorem based on them.

Definition 2 (Preference for the first position). A candidate distribution D and noisy permutation family F_θ exhibit a preference for the first position if for all θ > 0, for π, σ ∼ F_θ,

    E[π_1 − π_2 | π_1 ≠ σ_1] > 0.

In other words, for any θ > 0, suppose we draw two permutations π and σ independently from F_θ, and suppose that the first-ranked candidates differ in π and σ. Then the expected value of the first-ranked candidate in π is strictly greater than the expected value of the second-ranked candidate in π.

Definition 3 (Preference for weaker competition). A candidate distribution D and noisy permutation family F_θ exhibit a preference for weaker competition if the following holds: for all θ_1 > θ_2, for σ ∼ F_{θ_1} and π, τ ∼ F_{θ_2},

    E[π_1^(−{σ_1})] < E[π_1^(−{τ_1})].

Intuitively, suppose we have a higher accuracy parameter θ_1 and a lower accuracy parameter θ_2 < θ_1; we draw a permutation π from F_{θ_2}; and we then derive two permutations from π: π^(−{σ_1}), obtained by deleting the first-ranked element of a permutation σ drawn from the more accurate distribution F_{θ_1}, and π^(−{τ_1}), obtained by deleting the first-ranked element of a permutation τ drawn from the less accurate distribution F_{θ_2}. Then the expected value of the first-ranked candidate in π^(−{τ_1}) is strictly greater than the expected value of the first-ranked candidate in π^(−{σ_1}) — that is, when a random candidate is removed from π, the best remaining candidate is better in expectation when the randomly removed candidate is chosen based on a noisier ranking.

Using these two conditions, we can state our theorem.

Theorem 1.
Suppose that a given candidate distribution D and noisy permutation family F_θ satisfy Definition 2 (preference for the first position) and Definition 3 (preference for weaker competition). Then, for any θ_H, there exists θ_A > θ_H such that using the algorithmic ranking is a strictly dominant strategy for both firms, but social welfare would be higher if both firms used human evaluators.

Before we prove Theorem 1, we provide some intuition for the two conditions in Definitions 2 and 3. The second condition essentially says that it is better to have a worse competitor: the firm randomly selected to hire second is better off if the firm that hires first uses a less accurate ranking (in this case, a human evaluator instead of the algorithmic ranking). The first condition states that when two identically distributed permutations disagree on their first element, the first-ranked candidate according to either permutation is still better, in expectation, than the second-ranked candidate according to either permutation. In what follows, we'll demonstrate that this condition implies that firms in our model rationally prefer to make decisions using independent (but equally accurate) rankings.

To do so, we need to introduce some notation. Recall that the two firms hire in a random order. Given a candidate distribution D, let U_1^s(θ_A, θ_H) denote the expected utility of the first firm to hire a candidate when using ranking s, where s ∈ {A, H} is either the algorithmic ranking or the ranking generated by a human evaluator respectively. Similarly, let U_2^{s_1 s_2}(θ_A, θ_H) be the expected utility of the second firm to hire given that the first firm used strategy s_1 and the second firm uses strategy s_2, where again s_1, s_2 ∈ {A, H}. Finally, let π, σ ∼ F_θ. In what follows, we will show that for any θ,

    E[π_1 − π_2 | π_1 ≠ σ_1] > 0  ⟺  U_2^{AH}(θ, θ) > U_2^{AA}(θ, θ).   (2)

In other words, whenever a ranking model meets Definition 2, the firm chosen to select second will prefer to use a ranking mechanism independent from its competitor's, given that the ranking mechanisms are equally accurate. First, we can write

    U_2^{AH}(θ_A, θ_H) = E[π_1 · 1[π_1 ≠ σ_1] + π_2 · 1[π_1 = σ_1]]
    U_2^{AA}(θ_A, θ_H) = E[σ_2] = E[σ_2 · 1[π_1 ≠ σ_1] + σ_2 · 1[π_1 = σ_1]].

Thus,

    U_2^{AH}(θ_A, θ_H) − U_2^{AA}(θ_A, θ_H) = E[(π_1 − σ_2) · 1[π_1 ≠ σ_1] + (π_2 − σ_2) · 1[π_1 = σ_1]].

Conditioned on either π_1 = σ_1 or π_1 ≠ σ_1, π and σ are identically distributed and therefore have equal expectations. As a result,

    U_2^{AH}(θ_A, θ_H) − U_2^{AA}(θ_A, θ_H) = E[(π_1 − π_2) · 1[π_1 ≠ σ_1]],   (3)

which implies (2). Thus, whenever a ranking model meets Definition 2, firms rationally prefer independent assessments, all else equal.

To provide some intuition for what this preference for independence entails, consider a setting where a hiring committee seeks to hire two candidates. They meet, produce a ranking σ, and hire σ_1 (the best candidate according to σ). Suppose they have the option to either hire σ_2 or reconvene the next day to form an independent ranking π and hire the best remaining candidate according to π; which option should they choose? It's not immediately clear why one option should be better than the other. However, whenever Definition 2 is met, the committee should prefer to reconvene and make their second hire according to a new ranking π. After proving Theorem 1, we will provide natural ranking models that meet Definition 2, implying that under these ranking models, independent re-ranking can be beneficial. With this intuition, we are ready to prove Theorem 1.
Proof of Theorem 1.
For given values of θ_A and θ_H, using the algorithmic ranking is a strictly dominant strategy as long as

    U_1^A(θ_A, θ_H) + U_2^{AA}(θ_A, θ_H) > U_1^H(θ_A, θ_H) + U_2^{AH}(θ_A, θ_H)   (4)
    U_1^A(θ_A, θ_H) + U_2^{HA}(θ_A, θ_H) > U_1^H(θ_A, θ_H) + U_2^{HH}(θ_A, θ_H).   (5)

Note that (5) is always true for θ_A > θ_H by the monotonicity assumption on F_θ: U_1^A(θ_A, θ_H) ≥ U_1^H(θ_A, θ_H) because a more accurate ranking produces a top-ranked candidate with higher expected value, and U_2^{HA}(θ_A, θ_H) ≥ U_2^{HH}(θ_A, θ_H) because this holds even conditioned on removing any candidate from the pool (in this case, the candidate randomly selected by the firm that hires first). Crucially, in (5), the first firm's random selection is independent from the second firm's selection; the same logic could not be used to argue that (4) always holds for θ_A ≥ θ_H. Moreover, when θ_A > θ_H, U_1^A(θ_A, θ_H) > U_1^H(θ_A, θ_H) strictly by the monotonicity assumption, meaning (5) holds.

Let W^{s_1 s_2}(θ_A, θ_H) denote social welfare when the two firms employ strategies s_1, s_2 ∈ {A, H}. Then, when both firms use the algorithmic ranking, social welfare is

    W^{AA}(θ_A, θ_H) = U_1^A(θ_A, θ_H) + U_2^{AA}(θ_A, θ_H).

By (2), Definition 2 implies that for any θ, U_2^{AA}(θ, θ) < U_2^{AH}(θ, θ), implying

    U_1^A(θ_H, θ_H) + U_2^{AA}(θ_H, θ_H) < U_1^H(θ_H, θ_H) + U_2^{AH}(θ_H, θ_H).

However, by the optimality assumption on F_θ in Definition 1, for sufficiently large θ̂_A,

    U_1^A(θ̂_A, θ_H) + U_2^{AA}(θ̂_A, θ_H) > U_1^H(θ̂_A, θ_H) + U_2^{AH}(θ̂_A, θ_H).

Note that U_1^s(θ_A, θ_H) and U_2^{s_1 s_2}(θ_A, θ_H) are continuous with respect to θ_A for any s, s_1, s_2 ∈ {A, H}, since they are expectations over discrete distributions with probabilities that are by assumption differentiable with respect to θ_A.
Therefore, by the Differentiability assumption on F_θ from Definition 1, there is some θ_A^* > θ_H such that

    U_1^A(θ_A^*, θ_H) + U_2^{AA}(θ_A^*, θ_H) = U_1^H(θ_A^*, θ_H) + U_2^{AH}(θ_A^*, θ_H),   (6)

i.e., given that its competitor uses the algorithmic ranking, a firm is indifferent between the two strategies. For such θ_A^*, using the algorithmic ranking is still a weakly dominant strategy. By definition of W^{AA},

    W^{AA}(θ_A^*, θ_H) = U_1^H(θ_A^*, θ_H) + U_2^{AH}(θ_A^*, θ_H).

If both firms had instead used human evaluators, social welfare would be

    W^{HH}(θ_A^*, θ_H) = U_1^H(θ_A^*, θ_H) + U_2^{HH}(θ_A^*, θ_H).

By Definition 3, for σ ∼ F_{θ_A^*} and π, τ ∼ F_{θ_H},

    E[π_1^(−{σ_1})] < E[π_1^(−{τ_1})].

Note that

    U_2^{AH}(θ_A^*, θ_H) = E[π_1^(−{σ_1})]
    U_2^{HH}(θ_A^*, θ_H) = E[π_1^(−{τ_1})].

Thus, Definition 3 implies that for θ_A^* > θ_H, U_2^{HH}(θ_A^*, θ_H) > U_2^{AH}(θ_A^*, θ_H). As a result, for θ_A^* > θ_H, using the algorithmic ranking is a weakly dominant strategy, but

    W^{HH}(θ_A^*, θ_H) = U_1^H(θ_A^*, θ_H) + U_2^{HH}(θ_A^*, θ_H)
                       > U_1^H(θ_A^*, θ_H) + U_2^{AH}(θ_A^*, θ_H)
                       = U_1^A(θ_A^*, θ_H) + U_2^{AA}(θ_A^*, θ_H)
                       = W^{AA}(θ_A^*, θ_H),

meaning social welfare would have been higher had both firms used human evaluators.

We can show that this effect persists for a value θ′_A such that using the algorithmic ranking is a strictly dominant strategy; intuitively, this works by slightly increasing θ_A^* so the algorithmic ranking becomes strictly dominant. For fixed θ_H, define

    f(θ_A) = U_1^A(θ_A, θ_H) + U_2^{AA}(θ_A, θ_H)
    g(θ_A) = U_1^H(θ_A, θ_H) + U_2^{AH}(θ_A, θ_H)
    h(θ_A) = U_1^H(θ_A, θ_H) + U_2^{HH}(θ_A, θ_H).

Because (5) always holds for θ_A > θ_H, it suffices to show that there exists θ′_A such that g(θ′_A) < f(θ′_A) < h(θ′_A); since f(θ_A^*) = g(θ_A^*) < h(θ_A^*), such a θ′_A slightly larger than θ_A^* exists by continuity.
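The preference for independence derived in (3) can also be checked numerically. The sketch below (our illustration with assumed parameters, not the paper's code) simulates the committee example under a Gaussian random utility model: it compares reusing the original ranking (hiring σ_2) against reconvening for an independent, equally accurate ranking π, and estimates the gap E[(π_1 − π_2) · 1[π_1 ≠ σ_1]].

```python
import random

def noisy_ranking(values, theta, rng):
    """Gaussian RUM: rank values by x + N(0, 1/theta^2), best first."""
    return sorted(values, key=lambda x: x + rng.gauss(0.0, 1.0 / theta),
                  reverse=True)

def second_hire_gap(values, theta, trials=50000, seed=0):
    """Monte Carlo estimate of E[(pi_1 - pi_2) * 1[pi_1 != sigma_1]],
    the gain (3) from reconvening with a fresh ranking pi over reusing
    sigma_2."""
    rng = random.Random(seed)
    reuse = reconvene = 0.0
    for _ in range(trials):
        sigma = noisy_ranking(values, theta, rng)  # day-one ranking; hire sigma[0]
        pi = noisy_ranking(values, theta, rng)     # independent day-two ranking
        reuse += sigma[1]                          # option 1: hire sigma_2
        # option 2: hire the best remaining candidate according to pi
        reconvene += next(x for x in pi if x != sigma[0])
    return (reconvene - reuse) / trials

gap = second_hire_gap([2.0, 1.0, 0.0], theta=1.0)
# Gaussian noise with three candidates satisfies Definition 2, so the
# estimated gap should come out positive: reconvening is better.
```

The same estimator can be pointed at any noisy permutation family that can be sampled, which is the general recipe for testing Definition 2 on a particular candidate distribution.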
Theorem 2. Let F_θ be the family of RUMs with either Gaussian or Laplacian noise with standard deviation 1/θ. Then, for any candidate distribution D over 3 candidates, the conditions of Theorem 1 are satisfied.

It might be tempting to generalize Theorem 2 to other distributions and more candidates; however, certain noise and candidate distributions violate the conditions of Theorem 1. Even for 3-candidate RUMs, there exist distributions for which each of the conditions is violated; we provide such examples in Appendix B. Moreover, while Gaussian and Laplacian distributions provably meet Definitions 2 and 3 with only 3 candidates, this doesn't necessarily extend to larger candidate sets. Figure 2 shows that Definition 2 can be violated under a particular candidate distribution D for Laplacian noise with 15 candidates. This challenges the intuition that independence is preferable — under some conditions, it can actually be better in expectation for a firm to use the same algorithmic ranking as its competitor, even if an independent human evaluator is equally accurate overall. Unlike Theorem 2, which applies for any candidate distribution D, certain noise models may violate Definition 2 only for particular D. It is an open question whether Theorem 2 can be extended to larger numbers of candidates under Gaussian noise.

Finally, there exist noise distributions that violate Definition 2 for any candidate distribution D. In particular, the RUM family defined by the Gumbel distribution is well-known to be equivalent to the Plackett-Luce model of ranking, which is generated by sequentially selecting candidate i with probability

    exp(θ x_i) / Σ_{j ∈ S} exp(θ x_j),   (7)

where S is the set of remaining candidates [12, 4]. Under the Plackett-Luce model, for any θ, U_2^{AH}(θ, θ) = U_2^{AA}(θ, θ). To see this, suppose the firm that hires first selects candidate i^*. Then, the firm that hires second gets each candidate i with probability given by (7) with S = {1, . . . , n} \ {i^*}.
As a result, by (3), if π, σ ∼ F_θ,

    E[π_1 − π_2 | π_1 ≠ σ_1] = 0

for any candidate distribution D, meaning the Plackett-Luce model never meets Definition 2. Thus, under the Plackett-Luce model, monoculture has no effect — the optimal strategy is always to use the best available ranking, regardless of competitors' strategies.

Given the analytic intractability of most RUMs, it might appear that testing the conditions of Theorem 1, especially for particular noise and candidate distributions, may not be possible; however, they can be efficiently tested via simulation: as long as the noise distribution E and the candidate distribution D can be sampled from, it is possible to test whether the conditions of Theorem 1 are satisfied. Thus, even if the conditions of Theorem 1 are not met for every candidate distribution D, it is possible to efficiently determine whether they are met for any particular D.

It is also interesting to ask about the magnitude of the negative impact produced by monoculture. Our model allows the qualities of candidates to be either positive or negative (capturing the fact that a worker's productivity can be either more or less than their cost to the firm in wages); using this, we can construct instances of the model in which the optimal social welfare is positive but the welfare under the (unique) monocultural equilibrium implied by Theorem 1 is negative. This is a strong type of negative result, in which sub-optimality reverses the sign of the objective function, and it means that in general we cannot compare the optimum and equilibrium by taking a ratio of two non-negative quantities, as is standard in Price of Anarchy results. However, as a future direction, it would be interesting to explore such Price of Anarchy bounds in special cases of the problem where structural assumptions on the input are sufficient to guarantee that the welfare at both the social optimum and the equilibrium are non-negative.
As one simple example, if the qualities for three candidates are drawn independently from a uniform distribution centered at 0, and the noise distribution is Gaussian, then there exist parameters θ_A > θ_H such that expected social welfare at the equilibrium where both firms use the algorithmic ranking is non-negative, and approximately 4% less than it would be had both firms used human evaluators instead.

The Mallows Model also appears frequently in the ranking literature [8, 11], and is much more analytically tractable than RUMs. Under the Mallows Model, the likelihood of a permutation is related to its distance from the true ranking π*:

    Pr[π] = (1/Z) φ^{−d(π, π*)},   (8)

where Z is a normalizing constant. In this model, φ > 1; the larger φ is, the more likely the ranking procedure is to output a ranking π that is close to the true ranking π*. To instantiate this model, we need a notion of distance d(·, ·) over permutations. For this, we'll use Kendall tau distance, another standard notion in the literature, which is simply the number of pairs of elements in π that are incorrectly ordered [10]. In Appendix D, we verify that the family of distributions F_θ given by the Mallows Model satisfies Definition 1, defining θ = φ − 1 (so that θ is well-defined on (0, ∞)). In contrast to RUMs, the Mallows Model always satisfies the conditions of Theorem 1 for any candidate distribution D, which we prove in Appendix E.

Theorem 3.
Let F_θ be the family of Mallows Model distributions with parameter θ = φ − 1. Then, for any candidate distribution D, the conditions of Theorem 1 are satisfied.

Figure 3: Regions of the (θ_H, θ_A) plane under the Mallows Model. The decrease in social welfare found in Theorem 3 is depicted by the shaded portion of the green region labeled AA, where social welfare would be higher if both firms used human evaluators.

While the result of Theorem 3 is certainly stronger than that of Theorem 2, in that it applies to all instances of the Mallows Model without restrictions, it should be interpreted with some caution. The Mallows Model does not depend on the underlying candidate values, so according to this model, monoculture can produce arbitrarily large negative effects. While insensitivity to candidate values may not necessarily be reasonable in practice, our results hold for any candidate distribution D. Thus, to the extent that the Mallows Model can reasonably approximate ranking in particular contexts, our results imply that monoculture can have negative welfare effects.

Our main focus in this work has been on models with two competing firms. However, it is also interesting to consider the case of more than two firms; we will see that the complex and sometimes counterintuitive effects that we found in the two-firm case are further enriched by additional phenomena. Primarily, we will present the results of computational experiments with the model, exposing some fundamental structural properties in the multi-firm problem for which a formal analysis remains an intriguing open problem. For concreteness, we will focus on a model in which rankings are drawn from the Mallows model. As before, each firm must choose to order candidates according to either an independent, human-produced ranking or an algorithmic ranking common to all firms who choose it. These rankings come from instances of the Mallows model with accuracy parameters φ_H and φ_A respectively, as defined in (8).

Braess' Paradox for k > 2 firms.
First, we ask whether the Braess' Paradox effect persists with k > 2 firms. It does: for example, when n = 4, k = 3, φ_A = 2, φ_H = 1.75, and candidate qualities are drawn from a uniform distribution, equilibrium social welfare is lower than it would be if all firms used human evaluators. It remains an open question whether this effect can arise for any candidate distribution D and any value of φ_H for k > 2.

Sequential decision-making.
Since the equilibrium behaviors we are studying take place in a model where firms make decisions in a random order, a crucial first step is to characterize firms' optimal behavior when making decisions sequentially — that is, when firms hire in a fixed, known order as opposed to a random order. In this context, consider the rational behavior of each firm: given a distribution over candidate values, which ranking should each firm use? Clearly, the first firm to make a selection should use the more accurate ranking mechanism; however, as shown previously, subsequent firms' decisions are less clear-cut. For a fixed number of firms, number of candidates, and distribution over candidate values, we can explore the firms' optimal strategies over the possible space of (φ_H, φ_A) values.

An optimal choice of strategies for the k firms moving sequentially can be written as a sequence of length k made up of the symbols A and H; the i-th term in the sequence is equal to A if the i-th firm to move sequentially uses the algorithm as its optimal strategy (given the choices of the previous i − 1 firms), and equal to H if the i-th firm uses an independent human evaluation. We can therefore represent the choice of optimal strategies, as the parameters (φ_H, φ_A) vary, by a labeling of the (φ_H, φ_A)-plane: we label each point (φ_H, φ_A) with the length-k sequence that specifies the optimal sequence of strategies. We can make the following initial formal observation about these optimal sequences:

Theorem 4.
When φ_H ≥ φ_A, one optimal sequence is for all firms to choose H. When φ_H > φ_A, the unique optimal sequence is for all firms to choose H.

We prove this formally in Appendix F.1, but we provide a sketch here. When φ_H ≥ φ_A, the first firm to move in sequence will simply use the more accurate strategy, and hence will choose H. Now, proceeding by induction, suppose that the first i firms have all chosen H, and consider the (i + 1)-st firm to move in sequence. Regardless of whether this firm chooses A or H, it will be making a selection that is independent of the previous i selections, and hence it is optimal for it to choose H as well. Hence, by induction, it is an optimal solution for all firms to choose H when φ_H ≥ φ_A. (This argument, slightly adapted, also directly establishes that it is uniquely optimal for all firms to choose H when φ_H > φ_A.)

Beyond this observation, if we wish to extend to the case when φ_A > φ_H, the mathematical analysis of this multi-firm model remains an open question; but it is possible to determine optimal strategies computationally for each choice of (φ_H, φ_A), and then to look at how these strategies vary over the (φ_H, φ_A)-plane. Figure 4 shows the result of doing this — producing a labeling of the (φ_H, φ_A)-plane as described above — for k = 5 firms and n = 6 candidates, with the values of the candidates drawn from a uniform distribution.

Figure 4: Regions for different optimal strategy profiles, where each strategy profile is a sequence of 'A' and 'H' representing the optimal strategies of each firm sequentially. For this plot, there are 5 firms (k = 5) and 6 candidates (n = 6) whose values are drawn from a uniform distribution. Note that when φ_A is much larger than φ_H, all firms use the algorithmic ranking, but when φ_A is only slightly larger than φ_H, only the first firm uses the algorithmic ranking.

We observe a number of interesting phenomena from this labeling of the plane.
First, the region where φ_H ≥ φ_A is labeled with the all-H sequence, reflecting the argument above; for the half-plane φ_A > φ_H, on the other hand, all optimal sequences begin with A, since it is always optimal for the first firm to use the more accurate method. The labeling of the half-plane φ_A > φ_H becomes quite complex; in principle, any sequence over the binary alphabet {A, H} that begins with A could be possible, and in fact we see that all 2^4 = 16 of these sequences appear as labels in some portion of the plane. This means that the sequential choice of optimal strategies for the firms can display arbitrary non-monotonicities in the choice of algorithmic or human decisions, with firms alternating between them; for example, even after the first firm chooses A and the second chooses H, the third may choose A or H depending on the values (φ_H, φ_A). The boundaries of the regions labeled by different optimal sequences are similarly complex; some of the regions (such as AAAHH) appear to be bounded, while others (such as AHAHA and AHHAH) appear to emerge only for sufficiently large values of φ_H.

Perhaps the most intriguing observation about the arrangement of regions is the following. Suppose we think of the sequences of symbols over {A, H} as binary representations of numbers, with A corresponding to the binary digit 1 and H corresponding to the binary digit 0. (Thus, for example, AAAHH would correspond to the number 16 + 8 + 4 = 28, while AHAHA would correspond to the number 16 + 4 + 1 = 21.) The observation is then the following: if we choose any vertical line φ_H = x (for a fixed x), and we follow it upward in the plane, we encounter regions in increasing order of the numbers corresponding to their labels in this binary representation. (First HHHHH, then AHHHH, then AHHHA, then AHHAH, and so forth.)

We do not know a proof of this fact, or how generally it holds, but we can verify it computationally for the regions of the (φ_H, φ_A)-plane mapped out in Figure 4, as well as in similar computational experiments, not shown here, for other choices of k and n. This binary-counter property suggests a rich body of additional structure to the optimal strategies in the k-firm case, and we leave it as an open question to analyze this structure mathematically.

Concerns about monoculture in the use of algorithms have focused on the danger of unexpected, correlated shocks, and on the harm to particular individuals who may fare poorly under the algorithm's decision. Our work here shows that concerns about algorithmic monoculture are in a sense more fundamental, in that it is possible for monoculture to cause decisions of globally lower average quality, even in the absence of shocks. In addition to telling us something about the pervasiveness of the phenomenon, this also suggests that it might be difficult to notice its negative effects even while they are occurring: these effects can persist at low levels even without a shock-like disruption to call our attention to them. Our results also make clear that algorithmic monoculture in decision-making does not always lead to adverse outcomes; rather, we give natural conditions under which such outcomes become possible, and show that these conditions hold in a wide range of standard models.

Our results suggest a number of natural directions for further work. To begin with, we have noted earlier in the paper that it would be interesting to give more comprehensive quantitative bounds on the magnitude of monoculture's possible negative effects in decisions such as hiring: how much worse can the quality of candidates be when selected with an equilibrium strategy involving shared algorithms than with a socially optimal one?
In formulating such questions, it will be important to take into account how the noise model for rankings relates to the numerical qualities of the candidates.

We have also focused here on the case of two firms and a single shared algorithm that is available to both. It would be natural to consider generalizations involving more firms and potentially more algorithms as well. With more algorithms, we might see solutions in which firms cluster around different algorithms of varying accuracies, as they balance the level of accuracy and the amount of correlation in their decisions. It would also be interesting to explore the ways in which correlations in firms' decisions can be decomposed into constituent parts, such as the use of standardized tests that form input features for algorithms, and how quantifying these forms of correlation might help firms assess their decisions.

Finally, it will be interesting to consider how these types of results apply to further domains. While the analysis presented here illustrates the consequences of monoculture as applied to algorithmic hiring, our findings have potential implications in a broader range of settings. Algorithmic monoculture not only leads to a lack of heterogeneity in decision-making; by allowing valuable options to slip through the cracks, be they job candidates, potential hit songs, or budding entrepreneurs, it reduces total social welfare, even when the individual decisions are more accurate on a case-by-case basis. These concerns extend beyond the use of algorithms: whenever decision-makers rely on identical or highly correlated evaluations, they miss out on hidden gems, and in this way diminish the overall quality of their decisions.
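As a toy illustration of the kind of welfare comparison raised by these open questions, one can compute expected welfare exactly in a small discrete model. Every specific number below (candidate values, noise levels, probabilities) is an illustrative assumption of this sketch, not a quantity from our analysis; the key structural feature, as in our model, is that all firms using the algorithm share one realized ranking.

```python
from itertools import product

# Illustrative toy model: three candidates with assumed values.
VALUES = (1.0, 0.6, 0.0)

def rankings(scale, p):
    """All (probability, ranking) pairs for one evaluator: candidate i is
    perceived as VALUES[i] + scale * v_i, with v_i = -1, 0, +1 having
    probabilities p, 1 - 2p, p respectively."""
    out = []
    for vs in product((-1, 0, 1), repeat=3):
        w = 1.0
        for v in vs:
            w *= p if v != 0 else 1.0 - 2.0 * p
        perceived = [x + scale * v for x, v in zip(VALUES, vs)]
        out.append((w, sorted(range(3), key=lambda i: -perceived[i])))
    return out

def welfare(profile, scale_A=0.35, scale_H=0.35, p=0.2):
    """Exact expected total value hired by firms moving in the order given
    by `profile`: all 'A'-firms share ONE realized algorithmic ranking,
    while each 'H'-firm draws an independent human ranking."""
    alg = rankings(scale_A, p) if 'A' in profile else [(1.0, None)]
    hum = rankings(scale_H, p)
    total = 0.0
    for wa, ra in alg:
        for draws in product(hum, repeat=profile.count('H')):
            w = wa
            for wh, _ in draws:
                w *= wh
            taken, h_idx, value = set(), 0, 0.0
            for s in profile:
                if s == 'A':
                    order = ra
                else:
                    order = draws[h_idx][1]
                    h_idx += 1
                pick = next(i for i in order if i not in taken)
                taken.add(pick)
                value += VALUES[pick]
            total += w * value
    return total
```

Comparing welfare('AA'), welfare('AH'), and welfare('HH') in such a model gives a feel for the size of the gaps; as Appendix B shows, the direction of these comparisons can depend on the noise model, so no particular ordering should be assumed in general.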
Acknowledgements.
This work has been supported in part by a Simons Investigator Award, a Vannevar Bush Faculty Fellowship, a MURI grant, an NSF Graduate Research Fellowship, and grants from the ARO, AFOSR, and the MacArthur Foundation.
References

[1] Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, and David C. Parkes. Generalized random utility models with multiple types. In Advances in Neural Information Processing Systems, pages 73–81, 2013.
[2] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Random utility theory for social choice. In Advances in Neural Information Processing Systems, pages 126–134, 2012.
[3] Kenneth P. Birman and Fred B. Schneider. The monoculture risk put into context. IEEE Security & Privacy, 7(1):14–17, 2009.
[4] H. D. Block and J. Marschak. Random orderings and stochastic theories of responses. In I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow, and H. B. Mann, editors, Contributions to Probability and Statistics, pages 97–132. Stanford University Press, 1960.
[5] Dietrich Braess. Über ein Paradoxon aus der Verkehrsplanung. Unternehmensforschung, 12(1):258–268, 1968.
[6] Danielle Keats Citron and Frank Pasquale. The scored society: Due process for automated predictions. Washington Law Review, 89:1, 2014.
[7] H. E. Daniels. Rank correlation and population models. Journal of the Royal Statistical Society, Series B (Methodological), 12(2):171–191, 1950.
[8] Sanmay Das and Zhuoshu Li. The role of common and private signals in two-sided matching with interviews. In International Conference on Web and Internet Economics, pages 492–497. Springer, 2014.
[9] Harry Joe. Inequalities for random utility models, with applications to ranking and subset choice data. Methodology and Computing in Applied Probability, 2(4):359–372, 2000.
[10] Maurice G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.
[11] Tyler Lu and Craig Boutilier. Learning Mallows models with pairwise preferences. In International Conference on Machine Learning, 2011.
[12] R. Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959.
[13] Rahul Makhijani and Johan Ugander. Parametric models for intransitivity in pairwise rankings. In The World Wide Web Conference, pages 3056–3062, 2019.
[14] Colin L. Mallows. Non-null ranking models. I. Biometrika, 44(1/2):114–130, 1957.
[15] Farhad Manjoo. This summer stinks. But at least we've got 'Old Town Road.' New York Times Opinion, 2019.
[16] Charles F. Manski. The structure of random utility models. Theory and Decision, 8(3):229, 1977.
[17] John P. Mills. Table of the ratio: area to bounding ordinate, for any portion of normal curve. Biometrika, pages 395–400, 1926.
[18] J. F. Power and R. F. Follett. Monoculture. Scientific American, 256(3):78–87, 1987.
[19] Stephen Ragain and Johan Ugander. Pairwise choice Markov chains. In Advances in Neural Information Processing Systems, pages 3198–3206, 2016.
[20] Matthew J. Salganik, Peter Sheridan Dodds, and Duncan J. Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311(5762):854–856, 2006.
[21] Michael R. Sampford. Some inequalities on Mill's ratio and related functions. The Annals of Mathematical Statistics, 24(1):130–132, 1953.
[22] David Strauss. Some results on random utility models. Journal of Mathematical Psychology, 20(1):35–52, 1979.
[23] Louis L. Thurstone. A law of comparative judgment. Psychological Review, 34(4):273, 1927.
[24] John I. Yellott Jr. The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology, 15(2):109–144, 1977.
[25] Zhibing Zhao, Tristan Villamil, and Lirong Xia. Learning mixtures of random utility models. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
A Random Utility Models satisfying Definition 1
Theorem 5.
Let f be the pdf of E. The family of RUMs F_θ given by ranking the candidates according to x_i + ε_i/θ with ε_i ∼ E satisfies the conditions of Definition 1 if:
• f is differentiable
• f has positive support on (−∞, ∞)

Proof.
We need to show that F_θ satisfies the differentiability, asymptotic optimality, and monotonicity conditions in Definition 1.

Differentiability:
The probability density of any realization of the n noise samples ε_i/θ is ∏_{i=1}^n f(ε_i/θ). Let ε = [ε_1/θ, ..., ε_n/θ] be the vector of noise values, and let M(π) ⊆ R^n be the region such that any ε ∈ M(π) produces the ranking π. The probability of any permutation π is

Pr_θ[π] = ∫_{M(π)} ∏_{i=1}^n f(ε_i/θ) d^n ε.

Because f is differentiable,

(d/dθ) f(x/θ) = f′(x/θ) · (−x/θ²).

Because Pr_θ(π) is an integral of the product of differentiable functions over a fixed region, it is differentiable.

Asymptotic optimality:
We will show that for any pair of adjacent elements and any δ > 0, there exists a sufficiently large θ such that the probability that they are incorrectly ranked is at most 2δ. We will conclude with a union bound over the n − 1 adjacent pairs, choosing θ such that the probability of outputting the correct ranking is at least 1 − 2(n − 1)δ.

Consider two candidates x_i > x_{i+1}, and let ν = x_i − x_{i+1} be the difference between them. They will be correctly ranked if ε_i/θ > −ν/2 and ε_{i+1}/θ < ν/2. Let q_1 and q_2 be the 1 − δ and δ quantiles of E respectively, and let q = max(|q_1|, |q_2|). For θ > 2q/ν,

Pr[ε_i/θ < −ν/2] = Pr[ε_i < −νθ/2] ≤ Pr[ε_i < −q] ≤ Pr[ε_i < q_2] = δ
Pr[ε_{i+1}/θ > ν/2] = Pr[ε_{i+1} > νθ/2] ≤ Pr[ε_{i+1} > q] ≤ Pr[ε_{i+1} > q_1] = δ.

For such θ, the probability that x_i and x_{i+1} are incorrectly ordered is therefore at most 2δ. Repeating this analysis for all n − 1 adjacent pairs, taking the largest of the resulting θ's, and applying a union bound yields that the probability of incorrectly ordering any adjacent pair of elements is at most 2(n − 1)δ, meaning the probability of outputting the correct ranking is at least 1 − 2(n − 1)δ. Since δ is arbitrary, this probability can be made arbitrarily close to 1, satisfying the asymptotic optimality condition.

Monotonicity:
The removal of any elements does not alter the distribution of the remaining elements, meaning that the distribution of π(−S) is equivalent to a RUM with n − |S| elements. Thus, it suffices to show that for a RUM whose noise has positive support on (−∞, ∞), the probability of ranking the best candidate first strictly increases with θ.

Recall that by definition, the candidates are ranked according to x_i + ε_i/θ; without loss of generality, let x_1 be the unique largest value. The probability that x_1 is ranked first is

Pr[x_1 + ε_1/θ > max_{2≤i≤n} x_i + ε_i/θ]
= Pr[ε_1/θ > max_{2≤i≤n} (x_i − x_1) + ε_i/θ]
= Pr[ε_1 > max_{2≤i≤n} θ(x_i − x_1) + ε_i]
= E_{ε_2,...,ε_n} Pr[ε_1 > max_{2≤i≤n} θ(x_i − x_1) + ε_i | ε_2, ..., ε_n].   (A.1)

We want to show that (A.1) is increasing in θ. Intuitively, this is because as θ increases, the right-hand side of the inequality inside the probability decreases. To prove this formally, it suffices to show that the subderivative of (A.1) with respect to θ only includes strictly positive numbers. First, we have

∂/∂θ E_{ε_2,...,ε_n} Pr[ε_1 > max_{2≤i≤n} θ(x_i − x_1) + ε_i | ε_2,...,ε_n] ⊂ R_{>0}
⟸ ∂/∂θ Pr[ε_1 > max_{2≤i≤n} θ(x_i − x_1) + ε_i | ε_2,...,ε_n] ⊂ R_{>0}.

Let F and f be the cumulative distribution function and probability density function of E respectively. Then,

Pr[ε_1 > max_{2≤i≤n} θ(x_i − x_1) + ε_i | ε_2,...,ε_n] = 1 − F(max_{2≤i≤n} θ(x_i − x_1) + ε_i).

Note that F(·) is strictly increasing (since f is assumed to have positive support on (−∞, ∞)), so it suffices to show that

∂/∂θ max_{2≤i≤n} θ(x_i − x_1) + ε_i ⊂ R_{<0}.

For any i ≥ 2,

d/dθ [θ(x_i − x_1) + ε_i] = x_i − x_1 < 0.

Thus, the subderivative of the max of such functions includes only strictly negative numbers, which completes the proof.
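The monotonicity condition just established is easy to check numerically for a concrete instance of the RUM; the candidate values, trial count, and θ values below are arbitrary choices made for illustration.

```python
import random

def p_best_first(theta, values=(3.0, 2.0, 1.0, 0.0), trials=20000, seed=1):
    """Monte Carlo estimate of the probability that the best candidate is
    ranked first when candidates are ordered by x_i + eps_i / theta with
    standard Gaussian noise eps_i (an instance of the RUM of Theorem 5)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        scores = [x + rng.gauss(0, 1) / theta for x in values]
        if max(range(len(values)), key=lambda i: scores[i]) == 0:
            hits += 1
    return hits / trials
```

With these values, the estimated probability of ranking the best candidate first increases with θ, as the monotonicity condition requires.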
B 3-candidate RUM Counterexamples
B.1 Violating Definition 2
Here, we provide a noise model E, accuracy parameter θ, and candidate distribution D such that U_AH < U_AA. Choose the noise distribution E and accuracy parameter θ such that

ε/θ = 1 w.p. δ,  0 w.p. 1 − 2δ,  −1 w.p. δ.

This noise model does not satisfy Definition 1, since it does not have positive support on (−∞, ∞); however, we can provide a "smooth" approximation to this distribution by expressing it as the sum of arbitrarily tightly concentrated Gaussians with the same results.

We choose the candidate distribution D such that x_1 − 1 > x_2 > x_3 > x_1 − 2. For example,

x_1 = 7/4,  x_2 = 1/2,  x_3 = 0.

Under this condition, assuming x_3 = 0 without loss of generality, a direct calculation expresses U_AH(θ, θ) − U_AA(θ, θ) as a polynomial in δ whose lowest-power term is negative. Therefore, for sufficiently small δ, the difference is negative; in particular, plugging in the values given above with a small positive δ yields U_AH(θ, θ) − U_AA(θ, θ) < 0.

B.2 Violating Definition 3
Next, we give a 3-candidate RUM for which U_AH < U_HH does not hold in general. Consider the following 3-candidate example:

x_1 = 3,  x_2 = 2,  x_3 = 0.

Choose E and θ such that

ε/θ = 1 w.p. (1 − δ)/2,  −1 w.p. (1 − δ)/2,  10 w.p. δ/2,  −10 w.p. δ/2.

Again, while this noise model does not satisfy Definition 1, we can approximate it arbitrarily closely with the sum of tightly concentrated Gaussians. Let θ_A = 1.1θ and θ_H = 0.9θ.

We will show that for these parameters, U_AH(θ_A, θ_H) > U_HH(θ_A, θ_H); i.e., it is somehow better to choose after a better opponent than after a worse opponent. At a high level, the reasoning is as follows:

1. When choosing first, the only difference between the algorithm and the human evaluator is that the algorithm is more likely to choose x_2 than x_3. Both strategies have identical probabilities of selecting x_1.

2. When choosing second, the human evaluator's utility is higher when x_2 has already been chosen than when x_3 has already been chosen. This is because when x_2 is unavailable, the human evaluator is almost guaranteed to get x_1; when x_3 is unavailable, the human evaluator will choose x_1 with probability ≈ 3/4.

Let τ and π be rankings generated by the algorithm and the human evaluator respectively. First, we will show that

Pr[τ_1 = x_1] = Pr[π_1 = x_1]   (B.1)
Pr[τ_1 = x_2] > Pr[π_1 = x_2]   (B.2)

To do so, consider the realizations of ε_1, ε_2, ε_3 that result in different rankings under θ_A and θ_H. In fact, the only realizations that result in different rankings are those with ε_2/θ = −1 and ε_3/θ = 1. Thus, the algorithm and the human evaluator always rank x_1 in the same position, conditioned on a realization, which proves (B.1); the only difference is that the algorithm sometimes ranks x_2 above x_3 when the human evaluator does not. Moreover, whenever ε_2/θ = −1 and ε_3/θ = 1, x_2 is strictly more likely to be ranked first under the algorithm than under the human evaluator, which proves (B.2).

Next, we must show that when choosing second, the human evaluator is better off when x_2 is unavailable than when x_3 is unavailable. This is true because, for the human evaluator,

Pr[x_1 + ε_1/θ_H > x_3 + ε_3/θ_H] ≈ 1 − O(δ)
Pr[x_1 + ε_1/θ_H > x_2 + ε_2/θ_H] ≈ 3/4.

Thus, conditioned on x_2 being unavailable, the human evaluator gets utility ≈ 3, whereas when x_3 is unavailable, the human evaluator gets utility ≈ 2.75. Let u_{−i} be the expected utility for the human evaluator when x_i is unavailable. Putting this together, we get

U_AH(θ_A, θ_H) − U_HH(θ_A, θ_H)
= Σ_{i=1}^{3} (Pr[τ_1 = x_i] − Pr[π_1 = x_i]) u_{−i}
= (Pr[τ_1 = x_2] − Pr[π_1 = x_2]) u_{−2} + (Pr[τ_1 = x_3] − Pr[π_1 = x_3]) u_{−3}   (by (B.1))
= (Pr[τ_1 = x_2] − Pr[π_1 = x_2]) (u_{−2} − u_{−3})   (since Σ_i Pr[τ_1 = x_i] = Σ_i Pr[π_1 = x_i])
> 0,

where the last step uses (B.2) and u_{−2} > u_{−3}.

C Proof of Theorem 2
C.1 Verifying Definition 2
By (2), we can equivalently show that for any θ, U_AH(θ, θ) > U_AA(θ, θ). Let τ and π be the algorithmic and human-generated rankings respectively. Note that they are identically distributed because θ_A = θ_H. Define

Y ≜ π_1 if π_1 ≠ τ_1, and π_2 otherwise.

Note that U_AH(θ, θ) = E[x_Y] and U_AA(θ, θ) = E[x_{τ_2}]. We want to show that U_AH(θ, θ) − U_AA(θ, θ) = E[x_Y − x_{τ_2}] > 0. It is sufficient to show that for any k, E[x_Y − x_{τ_2} | τ_1 = x_k] > 0. Let X_i = x_i + ε_i/θ. Note that for distinct i, j, k with x_i > x_j,

E[x_Y − x_{τ_2} | τ_1 = x_k] > 0
⟸ Pr[Y = x_i | τ_1 = x_k] / Pr[Y = x_j | τ_1 = x_k] > Pr[τ_2 = x_i | τ_1 = x_k] / Pr[τ_2 = x_j | τ_1 = x_k]
⟺ Pr[Y = x_i | τ_1 = x_k] > Pr[τ_2 = x_i | τ_1 = x_k]   (numerator and denominator sum to 1)
⟺ Pr[X_i > X_j] > Pr[X_i > X_j | X_k > X_i ∩ X_k > X_j]
⟺ Pr[X_i > X_j] > E_{X_k} [Pr[X_i > X_j | X_k = a, X_i < a, X_j < a]].

Thus, it suffices to show that for any a,

Pr[X_i > X_j] > Pr[X_i > X_j | X_i < a, X_j < a].   (C.1)

Since Pr[X_i > X_j] = lim_{a→∞} Pr[X_i > X_j | X_i < a, X_j < a], it suffices to show that for all a,

(d/da) Pr[X_i > X_j | X_i < a, X_j < a] ≥ 0,   (C.2)

and that it is strictly positive for some a. In other words, the higher a is, the more likely i and j are to be correctly ordered. In Theorems 7 and 8, we show that (C.2) holds for Laplacian and Gaussian noise respectively, which proves that RUMs based on both distributions satisfy Definition 2.

C.2 Verifying Definition 3
Next, we show that for both Laplacian and Gaussian distributions, U_AH(θ_A, θ_H) < U_HH(θ_A, θ_H) for all θ_A > θ_H. In fact, for 3-candidate RUM families, we will show that this is true for any well-ordered distribution, defined as follows.

Definition 4.
A noise model with density f(·) is well-ordered if for any a > b and c > d,

f(a − c) f(b − d) > f(a − d) f(b − c).

In other words, for a well-ordered noise model, given two realized numbers, two candidates are more likely to be correctly ordered than inverted conditioned on realizing those two numbers in some order. Lemma 1 shows that both Gaussian and Laplacian distributions are well-ordered. Thus, it suffices to show that for any 3-candidate RUM with a well-ordered noise model, U_AH(θ_A, θ_H) < U_HH(θ_A, θ_H) whenever θ_A > θ_H.

Theorem 6.
For 3 candidates with unique values x_1 > x_2 > x_3 and well-ordered i.i.d. noise with support (−∞, ∞), if θ_A > θ_H, then U_AH(θ_A, θ_H) < U_HH(θ_A, θ_H).

Proof. Define u_{−i} to be the expected utility of the maximum element of the human-generated ranking when x_i is not available. Because we are in the 3-candidate setting, we have

u_{−1} = λ_1 x_2 + (1 − λ_1) x_3
u_{−2} = λ_2 x_1 + (1 − λ_2) x_3
u_{−3} = λ_3 x_1 + (1 − λ_3) x_2,

where 1/2 < λ_i < 1. This is because the noise has support everywhere, so it is impossible to correctly rank any two candidates with probability 1, and any two candidates are more likely than not to be correctly ordered: for any gap ν > 0,

Pr[ε_i/θ − ε_j/θ > −ν] = Pr[ε_i − ε_j ≥ 0] + Pr[0 > (ε_i − ε_j)/θ > −ν] > 1/2.

Moreover, λ_2 > λ_1 and λ_2 > λ_3, since

λ_2 = Pr[ε_1 − ε_3 > −θ_H(x_1 − x_3)] > max{Pr[ε_1 − ε_2 > −θ_H(x_1 − x_2)], Pr[ε_2 − ε_3 > −θ_H(x_2 − x_3)]} = max{λ_3, λ_1}.

Let τ ∼ F_{θ_A} and π ∼ F_{θ_H}. With this, we can write

U_AH(θ_A, θ_H) = Σ_{i=1}^{3} Pr[τ_1 = x_i] u_{−i}
U_HH(θ_A, θ_H) = Σ_{i=1}^{3} Pr[π_1 = x_i] u_{−i}.

Define Δp_i = Pr[τ_1 = x_i] − Pr[π_1 = x_i]. Using Lemmas 2 and 3, we have

Δp_1 ≥ Δp_2,  Δp_3 ≤ 0,  Δp_1 + Δp_2 + Δp_3 = 0.

We must show that

U_AH(θ_A, θ_H) − U_HH(θ_A, θ_H) = Σ_{i=1}^{3} Δp_i u_{−i} < 0.

We consider 2 cases.

Case 1: Δp_2 ≤ 0. Then Δp_1 = −(Δp_2 + Δp_3) ≥ 0. This yields

Σ_{i=1}^{3} Δp_i u_{−i} = Δp_1 u_{−1} + Δp_2 u_{−2} + Δp_3 u_{−3}
≤ Δp_1 u_{−1} − Δp_1 min(u_{−2}, u_{−3})
= Δp_1 (λ_1 x_2 + (1 − λ_1) x_3 − min{λ_2 x_1 + (1 − λ_2) x_3, λ_3 x_1 + (1 − λ_3) x_2})
≤ Δp_1 (λ_1 x_2 + (1 − λ_1) x_3 − min{λ_2 x_1 + (1 − λ_2) x_3, x_2}).

We can show that this is at most 0 regardless of which term attains the minimum. Because λ_1 < λ_2,

λ_1 x_2 + (1 − λ_1) x_3 − λ_2 x_1 − (1 − λ_2) x_3 = λ_1 (x_2 − x_3) + λ_2 (x_3 − x_1)
< λ_2 (x_2 − x_3) + λ_2 (x_3 − x_1) = λ_2 (x_2 − x_1) < 0,

and

λ_1 x_2 + (1 − λ_1) x_3 − x_2 = (1 − λ_1)(x_3 − x_2) < 0.

Thus, Σ_{i=1}^{3} Δp_i u_{−i} < 0.

Case 2: Δp_2 > 0. Note that u_{−1} < x_2 < u_{−3}. Then, using Δp_3 = −(Δp_1 + Δp_2),

Σ_{i=1}^{3} Δp_i u_{−i} = Δp_1 u_{−1} + Δp_2 u_{−2} + Δp_3 u_{−3}
= Δp_1 (u_{−1} − u_{−3}) + Δp_2 (u_{−2} − u_{−3})
≤ Δp_2 (u_{−1} − u_{−3}) + Δp_2 (u_{−2} − u_{−3})   (Δp_1 ≥ Δp_2 and u_{−1} < u_{−3})
= Δp_2 (u_{−1} + u_{−2} − 2u_{−3})
≤ Δp_2 (x_2 + x_1 − 2(λ_3 x_1 + (1 − λ_3) x_2))
< Δp_2 (x_1 + x_2 − 2((1/2) x_1 + (1/2) x_2))   (λ_3 > 1/2)
= 0.

Thus, U_AH(θ_A, θ_H) < U_HH(θ_A, θ_H).

C.3 Supplementary Lemmas for Random Utility Models
Lemma 1.
Both Gaussian and Laplacian distributions are well-ordered.

Proof. The Gaussian noise model is well-ordered: for a > b and c > d,

f(a − c) f(b − d) = (1/(2πσ²)) exp(−((a − c)² + (b − d)²)/(2σ²))
= (1/(2πσ²)) exp(−((a − d)² + (b − c)² − 2(a − b)(c − d))/(2σ²))
= f(a − d) f(b − c) exp((a − b)(c − d)/σ²)
> f(a − d) f(b − c).

Laplacian noise is as well:

f(a − c) f(b − d) = (1/4) exp(−|a − c| − |b − d|)
f(a − d) f(b − c) = (1/4) exp(−|a − d| − |b − c|).

It suffices to show that for a > b and c > d, |a − c| + |b − d| < |a − d| + |b − c|. To show this, plot (a, b) and (c, d) in the (x, y) plane. Note that they are both below the y = x line, and that the ℓ_1 distance between them is |a − c| + |b − d|. Moreover, the ℓ_1 distance between any two points must be realized by some Manhattan path, which is a combination of horizontal and vertical line segments. Consider the point (b, a), which is above the y = x line. Any Manhattan path from (b, a) to (c, d) must cross the y = x line at some point (w, w). Since (b, a) and (a, b) are equidistant from (w, w), for any Manhattan path from (b, a) to (c, d), there exists a Manhattan path from (a, b) to (c, d) passing through (w, w) of the same length, meaning the ℓ_1 distance from (a, b) to (c, d) is smaller than the ℓ_1 distance from (b, a) to (c, d). As a result, |a − c| + |b − d| < |a − d| + |b − c|.

Next, we show a few basic facts. Let f_A(r) be the density function of the joint realization R = [X_1, ..., X_n] = [x_1 + ε_1/θ_A, ..., x_n + ε_n/θ_A] under the algorithmic ranking, and let f_H(r) be the similarly defined density function under the human-generated ranking. Consider the "contraction" operation r′ = cont(r) such that

r′_i = x_i + (r_i − x_i) · (θ_H/θ_A).

Essentially, the contraction defines a coupling between f_A(·) and f_H(·), since for r′ = cont(r), f_A(r′) dr′ = f_H(r) dr.
Let π(r) be the ranking induced by r. Note that contraction cannot introduce any new inversions in π(r): that is, if x_i is ranked above x_j in π(r) for i < j, then x_i is ranked above x_j in π(cont(r)). Intuitively, this is because contraction pulls values closer to their means, and can therefore only correct existing inversions, not introduce new ones. This fact will allow us to prove some useful lemmas.

Lemma 2. If F_θ is a RUM family satisfying Definition 1, then for θ_A ≥ θ_H, τ ∼ F_{θ_A}, and π ∼ F_{θ_H},

Pr[τ_1 = x_n] ≤ Pr[π_1 = x_n].

Proof.
Consider any realization r, and let r′ = cont(r). Because inversions can only be corrected, not generated, by contraction, if π(r′)_1 = x_n, then π(r)_1 = x_n. Since r′ and r have equal measure under f_A and f_H respectively, we have

Pr[π_1 = x_n] = ∫_{R^n} f_H(r) 1{π(r)_1 = x_n} dr
= ∫_{R^n} f_A(cont(r)) 1{π(r)_1 = x_n} d cont(r)
≥ ∫_{R^n} f_A(cont(r)) 1{π(cont(r))_1 = x_n} d cont(r)
= ∫_{R^n} f_A(r) 1{π(r)_1 = x_n} dr
= Pr[τ_1 = x_n].

Next, we prove the following result for well-ordered noise models.
For any i > 1, if the noise model E is well-ordered, then for θ_A ≥ θ_H, τ ∼ F_{θ_A}, and π ∼ F_{θ_H},

Pr[τ_1 = x_1] − Pr[π_1 = x_1] ≥ Pr[τ_1 = x_i] − Pr[π_1 = x_i].

Proof.
For j ≠ i, let S_{j→i} ⊆ R^n be the set of realizations r such that π(r)_1 = x_j and π(cont(r))_1 = x_i. Note that S_{j→i} = ∅ for j < i, because contraction cannot create inversions. Then, we have that

Pr[τ_1 = x_i] − Pr[π_1 = x_i] = Σ_{j>i} ∫_{R^n} f_H(r) 1{r ∈ S_{j→i}} dr − Σ_{j<i} ∫_{R^n} f_H(r) 1{r ∈ S_{i→j}} dr
≤ Σ_{j>i} ∫_{R^n} f_H(r) 1{r ∈ S_{j→i}} dr.

Define swap_i(r) = r′, where

r′_j = r_j for j ∉ {1, i},  r′_1 = r_i,  r′_i = r_1.

Intuitively, the swap_i operation simply swaps the realizations in positions 1 and i. Note that this is a bijection. Also, if r ∈ S_{j→i}, then swap_i(r) ∈ S_{j→1}, since

cont(swap_i(r))_1 ≥ cont(r)_i ≥ max_j cont(r)_j ≥ max_{j ∉ {1, i}} cont(swap_i(r))_j
cont(swap_i(r))_1 ≥ cont(r)_i ≥ cont(r)_1 ≥ cont(swap_i(r))_i.

Furthermore, for r ∈ S_{j→i}, f_H(r) ≤ f_H(swap_i(r)), since

f_H(swap_i(r)) / f_H(r) = [f(r_i − x_1) f(r_1 − x_i)] / [f(r_1 − x_1) f(r_i − x_i)] ≥ 1

by well-orderedness, because r ∈ S_{j→i} implies r_i > r_1. Thus,

Σ_{j>i} ∫_{R^n} f_H(r) 1{r ∈ S_{j→i}} dr ≤ Σ_{j>i} ∫_{R^n} f_H(swap_i(r)) 1{r ∈ S_{j→i}} dr
= Σ_{j>i} ∫_{R^n} f_H(r) 1{swap_i(r) ∈ S_{j→i}} dr
≤ Σ_{j>i} ∫_{R^n} f_H(r) 1{r ∈ S_{j→1}} dr
≤ Σ_{j>1} ∫_{R^n} f_H(r) 1{r ∈ S_{j→1}} dr
= Pr[τ_1 = x_1] − Pr[π_1 = x_1].

Finally, we show that (C.2) holds for both Laplacian and Gaussian noise.

Theorem 7.
For any a ∈ R and X_i = x_i + σε_i, where the ε_i are Laplacian with unit variance,

(d/da) Pr[X_i > X_j | X_i < a, X_j < a] ≥ 0.

Moreover, it is strictly positive for some a.

Proof. First, we must derive an expression for Pr[X_i > X_j | X_i < a, X_j < a]. Recall that the Laplace distribution parameterized by μ and λ has pdf

f(x; μ, λ) = (λ/2) exp(−λ|x − μ|)

and cdf

F(x; μ, λ) = (1/2) exp(−λ(μ − x)) for x < μ,  and  1 − (1/2) exp(−λ(x − μ)) for x ≥ μ.

Let x_i and x_j be the respective means of X_i and X_j, with x_i > x_j. Because the Laplace distribution is piecewise defined, we must consider 3 cases and show that in all 3 cases, (C.2) holds. Note that

Pr[X_i > X_j | X_i < a, X_j < a] = [∫_{−∞}^{a} f(x; x_i, λ) F(x; x_j, λ) dx] / [F(a; x_i, λ) F(a; x_j, λ)].   (C.3)

Case 1: a ≤ x_j. The numerator of (C.3) is

∫_{−∞}^{a} (λ/2) exp(−λ(x_i − x)) · (1/2) exp(−λ(x_j − x)) dx
= (λ/4) exp(−λ(x_i + x_j)) ∫_{−∞}^{a} exp(2λx) dx
= (λ/4) exp(−λ(x_i + x_j)) · (1/(2λ)) exp(2λa)
= (1/8) exp(−λ(x_i + x_j − 2a)).

The denominator is

(1/2) exp(−λ(x_i − a)) · (1/2) exp(−λ(x_j − a)) = (1/4) exp(−λ(x_i + x_j − 2a)).

Thus, Pr[X_i > X_j | X_i < a, X_j < a] = 1/2, so its derivative is trivially nonnegative.

Case 2: x_j < a ≤ x_i. The numerator of (C.3) is

∫_{−∞}^{x_j} (λ/2) exp(−λ(x_i − x)) · (1/2) exp(−λ(x_j − x)) dx + ∫_{x_j}^{a} (λ/2) exp(−λ(x_i − x)) (1 − (1/2) exp(−λ(x − x_j))) dx
= (1/8) exp(−λ(x_i − x_j)) + (1/2)(exp(−λ(x_i − a)) − exp(−λ(x_i − x_j))) − (λ/4)(a − x_j) exp(−λ(x_i − x_j))
= (1/2) exp(−λ(x_i − a)) − (3/8 + (λ/4)(a − x_j)) exp(−λ(x_i − x_j)).

The denominator is

(1/2) exp(−λ(x_i − a)) · (1 − (1/2) exp(−λ(a − x_j))) = (1/2) exp(−λ(x_i − a)) − (1/4) exp(−λ(x_i − x_j)).

We can factor exp(−λ(x_i − x_j)) out of both, so

Pr[X_i > X_j | X_i < a, X_j < a] = [2 exp(λ(a − x_j)) − (3/2 + λ(a − x_j))] / [2 exp(λ(a − x_j)) − 1]
= 1 − (1/2 + λ(a − x_j)) / (2 exp(λ(a − x_j)) − 1).

Then,

(d/da) Pr[X_i > X_j | X_i < a, X_j < a] > 0
⟺ (d/da) [(1/2 + λ(a − x_j)) / (2 exp(λ(a − x_j)) − 1)] < 0
⟺ (2 exp(λ(a − x_j)) − 1) λ < (1/2 + λ(a − x_j)) · 2λ exp(λ(a − x_j))
⟺ 2 − exp(−λ(a − x_j)) < 2 (1/2 + λ(a − x_j))
⟺ 1 − exp(−λ(a − x_j)) < 2λ(a − x_j)
⟺ exp(−λ(a − x_j)) > 1 − 2λ(a − x_j).

This is true because λ(a − x_j) > 0, and for z > 0, exp(−z) > 1 − z > 1 − 2z.

Case 3: a > x_i. The numerator of (C.3) is

∫_{−∞}^{x_j} (λ/2) exp(−λ(x_i − x)) · (1/2) exp(−λ(x_j − x)) dx
+ ∫_{x_j}^{x_i} (λ/2) exp(−λ(x_i − x)) (1 − (1/2) exp(−λ(x − x_j))) dx
+ ∫_{x_i}^{a} (λ/2) exp(−λ(x − x_i)) (1 − (1/2) exp(−λ(x − x_j))) dx
= 1/2 − (3/8 + (λ/4)(x_i − x_j)) exp(−λ(x_i − x_j)) + (1/2)(1 − exp(−λ(a − x_i))) + (1/8)(exp(−λ(2a − x_i − x_j)) − exp(−λ(x_i − x_j)))
= 1 − (1/2 + (λ/4)(x_i − x_j)) exp(−λ(x_i − x_j)) − (1/2) exp(−λ(a − x_i)) + (1/8) exp(−λ(2a − x_i − x_j)).

The denominator is

(1 − (1/2) exp(−λ(a − x_i))) (1 − (1/2) exp(−λ(a − x_j)))
= 1 − (1/2) exp(−λ(a − x_i)) − (1/2) exp(−λ(a − x_j)) + (1/4) exp(−λ(2a − x_i − x_j)).

Multiplying both by 8, we must show that the derivative of

[8 − (4 + 2λ(x_i − x_j)) exp(−λ(x_i − x_j)) − 4 exp(−λ(a − x_i)) + exp(−λ(2a − x_i − x_j))] / [8 − 4 exp(−λ(a − x_i)) − 4 exp(−λ(a − x_j)) + 2 exp(−λ(2a − x_i − x_j))]

with respect to a is positive. By the quotient rule, this amounts to an inequality between products of exponentials in a; expanding both sides, cancelling common terms, and writing z = λ(x_i − x_j) > 0 and exp(−λ(a − x_i)) < 1, the inequality reduces to the following two elementary facts. First,

(4 + 2z)(1 + exp(−z)) − 8 ≥ 0,

which holds because (2 + z)(1 + exp(−z)) ≥ 4 ⟺ z + 2 exp(−z) + z exp(−z) ≥ 2; this holds with equality at z = 0, and the left-hand side is increasing, since

(d/dz) [z + 2 exp(−z) + z exp(−z)] ≥ 0 ⟺ 1 ≥ exp(−z) + z exp(−z) ⟺ 1 + z ≤ exp(z).

Second,

3(1 − exp(−z)) > z exp(−z) ⟺ 3(exp(z) − 1) > z ⟺ exp(z) > 1 + z/3,

which is true for z > 0. This completes the proof for Case 3.

As a result, we have that (d/da) Pr[X_i > X_j | X_i < a, X_j < a] ≥ 0 for all a, with strict inequality for some a, which proves the theorem.
For any a ∈ ℝ and X_i = x_i + σε_i where ε_i ∼ N(0, 1), (d/da) Pr[X_i > X_j | X_i < a, X_j < a] > 0.

Proof. Assume σ = 1/√2. This is without loss of generality because for any instance with arbitrary σ′, rescaling all values by σ/σ′ yields an equivalent instance with σ = 1/√2. First, we have

Pr[X_i > X_j | X_i < a, X_j < a] = ∫_{−∞}^{a} Pr[X_i = x] Pr[X_j < x] dx / (Pr[X_i < a] Pr[X_j < a])
= [∫_{−∞}^{a} exp(−(x − x_i)²)/√π · (1 + erf(x − x_j))/2 dx] / [(1 + erf(a − x_i))/2 · (1 + erf(a − x_j))/2]
= (2/√π) ∫_{−∞}^{a} exp(−(x − x_i)²)(1 + erf(x − x_j)) dx / [(1 + erf(a − x_i))(1 + erf(a − x_j))].

The derivative with respect to a is positive if and only if

(1 + erf(a − x_i))(1 + erf(a − x_j))² exp(−(a − x_i)²) > ∫_{−∞}^{a} exp(−(x − x_i)²)(1 + erf(x − x_j)) dx · (2/√π)·((1 + erf(a − x_i)) exp(−(a − x_j)²) + (1 + erf(a − x_j)) exp(−(a − x_i)²)). (C.5)

Let t = a − x_i and δ = x_i − x_j. Then, using the fact that

∫_{−∞}^{a} exp(−(x − x_i)²)(1 + erf(x − x_j)) dx = ∫_{−∞}^{a − x_i} exp(−x²) dx + ∫_{−∞}^{a − x_i} exp(−x²) erf(x + δ) dx = (√π/2)(1 + erf(a − x_i)) + ∫_{−∞}^{a − x_i} exp(−x²) erf(x + δ) dx,

dividing (C.5) through by (2/√π)·((1 + erf(t)) exp(−(t + δ)²) + (1 + erf(t + δ)) exp(−t²)) gives

(√π/2) · (1 + erf(t))(1 + erf(t + δ))² exp(−t²) / [(1 + erf(t)) exp(−(t + δ)²) + (1 + erf(t + δ)) exp(−t²)] > (√π/2)(1 + erf(t)) + ∫_{−∞}^{t} exp(−x²) erf(x + δ) dx,

or equivalently, after dividing by √π/2,

f(t) := (1 + erf(t))(1 + erf(t + δ))² exp(−t²) / [(1 + erf(t)) exp(−(t + δ)²) + (1 + erf(t + δ)) exp(−t²)] − (1 + erf(t)) − (2/√π) ∫_{−∞}^{t} exp(−x²) erf(x + δ) dx > 0.

To establish f(t) > 0 for all t, it suffices to show that:

1. f(t) is continuous and differentiable everywhere.
2. lim_{t→−∞} f(t) = 0.
3. (d/dt) f(t) > 0 for all t.

For the limit, the last two terms of f vanish as t → −∞, so

lim_{t→−∞} f(t) = lim_{t→−∞} (1 + erf(t))(1 + erf(t + δ))² exp(−t²) / [(1 + erf(t)) exp(−(t + δ)²) + (1 + erf(t + δ)) exp(−t²)]. (C.7)

Observe that both the numerator and denominator of (C.7) are positive, so this limit must be at least 0. We can upper bound it by dropping the first (positive) term of the denominator:

lim_{t→−∞} (1 + erf(t))(1 + erf(t + δ))² exp(−t²) / [(1 + erf(t + δ)) exp(−t²)] = lim_{t→−∞} (1 + erf(t))(1 + erf(t + δ)) = 0.

Thus, the limit is 0. Now, we must show that the derivative is positive.
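As an aside, the theorem's overall claim can be sanity-checked numerically, independent of the proof. The sketch below (a numerical check, not part of the argument; the means x_i = 0.5, x_j = −0.5, the integration bounds, and the step count are arbitrary choices) evaluates Pr[X_i > X_j | X_i < a, X_j < a] by trapezoidal quadrature and confirms it increases in a:

```python
import math

def Phi(z):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pdf(z):
    # standard normal density
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def cond_prob(a, xi, xj, sigma=1.0, lo=-12.0, n=20000):
    """Pr[X_i > X_j | X_i < a, X_j < a] for X_k = x_k + sigma * eps_k.

    Numerator: integral over x < a of f_i(x) * F_j(x);
    denominator: F_i(a) * F_j(a). Trapezoidal rule on [lo, a].
    """
    h = (a - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        w = 0.5 if k in (0, n) else 1.0  # trapezoid endpoint weights
        total += w * pdf((x - xi) / sigma) / sigma * Phi((x - xj) / sigma)
    return total * h / (Phi((a - xi) / sigma) * Phi((a - xj) / sigma))

# the conditional probability should be strictly increasing in a (Theorem 8)
vals = [cond_prob(a, xi=0.5, xj=-0.5) for a in (-1.0, 0.0, 1.0, 2.0, 5.0)]
assert all(v2 > v1 for v1, v2 in zip(vals, vals[1:]))
```

As a → ∞ the values approach the unconditioned probability Pr[X_i > X_j] = Φ((x_i − x_j)/(σ√2)) from below, consistent with monotonicity.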
The derivative is

(d/dt) f(t) = (d/dt)[(1 + erf(t))(1 + erf(t + δ))² exp(−t²) / ((1 + erf(t)) exp(−(t + δ)²) + (1 + erf(t + δ)) exp(−t²))] − (2/√π) exp(−t²) − (2/√π) exp(−t²) erf(t + δ). (C.8)

Taking this derivative and factoring out a positive common quantity, we get that (C.8) is positive if and only if

δ√π exp((t + δ)²)(1 + erf(t))(1 + erf(t + δ)) − exp(2δt + δ²)(1 + erf(t + δ)) + (1 + erf(t)) > 0
⟺ δ√π exp((t + δ)²)(1 + erf(t)) + (1 + erf(t))/(1 + erf(t + δ)) − exp(2δt + δ²) > 0
⟺ δ√π exp(t²)(1 + erf(t)) + exp(−2δt − δ²) · (1 + erf(t))/(1 + erf(t + δ)) − 1 > 0
⟺ (1 + erf(t)) [δ√π exp(t²) + exp(−2δt − δ²)/(1 + erf(t + δ))] − 1 > 0
⟺ ((1 + erf(t))/exp(−t²)) · [δ√π + exp(−(t + δ)²)/(1 + erf(t + δ))] − 1 > 0. (C.9)

Let g(t) ≜ (1 + erf(t))/exp(−t²). Then, (C.9) is

g(t) [δ√π + 1/g(t + δ)] − 1 > 0 ⟺ 1/g(t) − 1/g(t + δ) < δ√π.

By the Mean Value Theorem,

1/g(t) − 1/g(t + δ) = −δ · (d/dt)(1/g(t)) |_{t = t*}

for some t ≤ t* ≤ t + δ. Thus, it suffices to show that

(d/dt)(1/g(t)) > −√π (C.10)

for all t. To do this, consider the Mills ratio [17]

R(t) ≜ exp(t²/2) ∫_t^∞ exp(−x²/2) dx.

Note that this is quite similar in functional form to g(t), and with some manipulation, we can relate the two:

R(√2 t) = exp(t²) ∫_{√2 t}^∞ exp(−x²/2) dx = √2 exp(t²) ∫_t^∞ exp(−x²) dx = √2 exp(t²) ∫_{−∞}^{−t} exp(−x²) dx   (exp(−x²) is symmetric)
R(−√2 t) = √2 exp(t²) ∫_{−∞}^{t} exp(−x²) dx = √2 exp(t²) · (√π/2)(1 + erf(t)) = √(π/2) · (1 + erf(t))/exp(−t²)
R(−√2 t) = √(π/2) g(t).

Sampford [21, Eq. (3)] proved that (d/dt)(1/R(t)) < 1 for all t. Thus,

(d/dt)(1/g(t)) = (d/dt)[√(π/2) · 1/R(−√2 t)] = −√2 · √(π/2) · (1/R)′(−√2 t) > −√2 · √(π/2) = −√π,

which proves (C.10) and completes the proof.

D Verifying that the Mallows Model Satisfies Definition 1
Theorem 9.
The family of distributions F_θ produced by the Mallows model with Kendall tau distance and θ = φ − 1 satisfies the conditions of Definition 1.

Proof. We must show that F_θ satisfies the differentiability, asymptotic optimality, and monotonicity conditions of Definition 1.

Differentiability: Let Π be the set of all permutations on n candidates. The probability of realizing a particular permutation π under the Mallows model is

Pr_θ[π] = φ^{−d(π, π∗)} / Σ_{π′ ∈ Π} φ^{−d(π′, π∗)}.

Both the numerator and denominator are differentiable with respect to θ = φ − 1, so Pr_θ[π] is differentiable with respect to θ.

Asymptotic optimality:
For the correct ranking π∗,

Pr_θ[π∗] = 1/Z,

where the normalizing constant Z is

Z = Σ_{π ∈ Π} φ^{−d(π, π∗)}.

In the limit,

lim_{θ→∞} Z = lim_{φ→∞} Σ_{π ∈ Π} φ^{−d(π, π∗)} = 1 + Σ_{π ≠ π∗} lim_{φ→∞} φ^{−d(π, π∗)} = 1,

because for any π ≠ π∗, d(π, π∗) ≥ 1. Therefore,

lim_{θ→∞} Pr_θ[π∗] = lim_{θ→∞} 1/Z = 1.

Monotonicity:
We must show that for any S ⊂ x, if π_1^{(−S)} denotes the value of the top-ranked candidate according to π excluding candidates in S, then for θ′ > θ,

E_{F_{θ′}}[π_1^{(−S)}] ≥ E_{F_θ}[π_1^{(−S)}].

For any i ∉ S, let j be the smallest index such that j > i and j ∉ S. Consider any π such that π_1^{(−S)} = x_j. Then, swapping i and j yields a permutation π̂ such that π̂_1^{(−S)} = x_i. Moreover,

Pr[π̂] = Pr[π] · φ^{inv(π) − inv(π̂)}.

Since i < j, inv(π) − inv(π̂) ≥ 1. Finally, note that swapping i and j is a bijection between {π : π_1^{(−S)} = x_j} and {π : π_1^{(−S)} = x_i}. Thus,

Pr[π_1^{(−S)} = x_i] / Pr[π_1^{(−S)} = x_j] = Σ_{π : π_1^{(−S)} = x_j} (Pr[π] / Pr[π_1^{(−S)} = x_j]) · φ^{inv(π) − inv(π̂)}.

Note that the terms Pr[π] / Pr[π_1^{(−S)} = x_j] sum to 1, so this sum is a polynomial in φ with nonnegative weights and positive integer powers of φ. As a result, it must have a positive derivative with respect to φ, i.e., for i < j,

(d/dφ) (Pr[π_1^{(−S)} = x_i] / Pr[π_1^{(−S)} = x_j]) > 0.

Let φ′ > φ. Then,

Pr_φ[π_1^{(−S)} = x_i] / Pr_φ[π_1^{(−S)} = x_j] < Pr_{φ′}[π_1^{(−S)} = x_i] / Pr_{φ′}[π_1^{(−S)} = x_j].

Rearranging,

Pr_φ[π_1^{(−S)} = x_i] / Pr_{φ′}[π_1^{(−S)} = x_i] < Pr_φ[π_1^{(−S)} = x_j] / Pr_{φ′}[π_1^{(−S)} = x_j]. (D.1)

For θ′ = φ′ − 1 > θ = φ − 1,

E_{F_θ}[π_1^{(−S)}] = Σ_{i ∉ S} Pr_φ[π_1^{(−S)} = x_i] x_i and E_{F_{θ′}}[π_1^{(−S)}] = Σ_{i ∉ S} Pr_{φ′}[π_1^{(−S)} = x_i] x_i.

By Lemma 4, E_{F_{θ′}}[π_1^{(−S)}] > E_{F_θ}[π_1^{(−S)}], which completes the proof. Note that we apply Lemma 4 indexing backwards from n to 1, ignoring elements in S, with p_i = Pr_φ[π_1^{(−S)} = x_i] and q_i = Pr_{φ′}[π_1^{(−S)} = x_i]. (D.1) provides the condition that p_i/q_i is decreasing (as i decreases, since we are indexing backwards).

E Proof of Theorem 3
E.1 Verifying Definition 2
We must show that when π, τ ∼ F_θ independently,

E[π_1 − π_2 | π_1 ≠ τ_1] > 0. (E.1)

We begin by expanding:

E[π_1 − π_2 | π_1 ≠ τ_1] = Σ_{i=1}^n Σ_{j=1}^n (x_i − x_j) Pr[π_1 = x_i ∩ π_2 = x_j | π_1 ≠ τ_1]
= Σ_{i=1}^{n−1} Σ_{j>i} (x_i − x_j) (Pr[π_1 = x_i ∩ π_2 = x_j | π_1 ≠ τ_1] − Pr[π_1 = x_j ∩ π_2 = x_i | π_1 ≠ τ_1]).

Since x_i > x_j for i < j, it suffices to show that for all i < j,

Pr[π_1 = x_i ∩ π_2 = x_j | π_1 ≠ τ_1] ≥ Pr[π_1 = x_j ∩ π_2 = x_i | π_1 ≠ τ_1], (E.2)

and that this holds strictly for some i < j. We simplify (E.2) as follows:

Pr[π_1 = x_i ∩ π_2 = x_j | π_1 ≠ τ_1] > Pr[π_1 = x_j ∩ π_2 = x_i | π_1 ≠ τ_1]
⟺ Pr[π_1 = x_i ∩ π_2 = x_j ∩ π_1 ≠ τ_1] / Pr[π_1 ≠ τ_1] > Pr[π_1 = x_j ∩ π_2 = x_i ∩ π_1 ≠ τ_1] / Pr[π_1 ≠ τ_1]
⟺ Pr[π_1 = x_i ∩ π_2 = x_j ∩ π_1 ≠ τ_1] > Pr[π_1 = x_j ∩ π_2 = x_i ∩ π_1 ≠ τ_1]
⟺ Pr[π_1 = x_i ∩ π_2 = x_j ∩ τ_1 ≠ x_i] > Pr[π_1 = x_j ∩ π_2 = x_i ∩ τ_1 ≠ x_j]
⟺ Pr[π_1 = x_i ∩ π_2 = x_j] Pr[τ_1 ≠ x_i] > Pr[π_1 = x_j ∩ π_2 = x_i] Pr[τ_1 ≠ x_j] (E.3)

We can simplify (E.3) using Lemmas 5 and 6. Let |i − j| denote the difference in rank between x_i and x_j.

Pr[π_1 = x_i ∩ π_2 = x_j] Pr[τ_1 ≠ x_i] − Pr[π_1 = x_j ∩ π_2 = x_i] Pr[τ_1 ≠ x_j]
= Pr[π_1 = x_i ∩ π_2 = x_j](1 − Pr[τ_1 = x_i]) − φ^{−1} Pr[π_1 = x_i ∩ π_2 = x_j](1 − Pr[τ_1 = x_j])
= Pr[π_1 = x_i ∩ π_2 = x_j](1 − Pr[τ_1 = x_i]) − φ^{−1} Pr[π_1 = x_i ∩ π_2 = x_j](1 − φ^{−|i−j|} Pr[τ_1 = x_i])
= Pr[π_1 = x_i ∩ π_2 = x_j](1 − Pr[τ_1 = x_i] − φ^{−1}(1 − φ^{−|i−j|} Pr[τ_1 = x_i])).

This is positive if and only if

1 − Pr[τ_1 = x_i] − φ^{−1}(1 − φ^{−|i−j|} Pr[τ_1 = x_i]) > 0
⟺ Pr[τ_1 = x_i](1 − φ^{−|i−j|−1}) < 1 − φ^{−1}
⟺ Pr[τ_1 = x_i] < (1 − φ^{−1}) / (1 − φ^{−|i−j|−1})
⟺ (1 − φ^{−1}) / (φ^{i−1}(1 − φ^{−n})) < (1 − φ^{−1}) / (1 − φ^{−|i−j|−1})
⟺ φ^{i−1}(1 − φ^{−n}) > 1 − φ^{−|i−j|−1}.

This is weakly true for any i < j because φ^{i−1} ≥ 1 and |i − j| + 1 ≤ n, and it is strictly true for any i, j other than i = 1 and j = n.
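As an aside, (E.1) can be checked directly by exact enumeration for a small instance. The sketch below (a numerical check, not part of the proof; the choices n = 4, φ = 2, and the candidate values x are arbitrary) builds the Mallows distribution Pr[π] ∝ φ^{−d(π, π∗)} explicitly and computes E[π_1 − π_2 | π_1 ≠ τ_1] over two independent rankings:

```python
import itertools

def inversions(p):
    # Kendall tau distance from the identity ranking (0, 1, ..., n-1)
    return sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])

def mallows(n, phi):
    # exact Mallows distribution: Pr[p] proportional to phi**(-inversions(p))
    perms = list(itertools.permutations(range(n)))
    weights = [phi ** -inversions(p) for p in perms]
    z = sum(weights)
    return [(p, w / z) for p, w in zip(perms, weights)]

n, phi = 4, 2.0
x = [3.0, 2.0, 1.0, 0.0]           # candidate values, decreasing (index 0 is best)
dist = mallows(n, phi)

num = den = 0.0
for p, wp in dist:                  # p = ranking pi; p[0], p[1] are its top two picks
    for t, wt in dist:              # t = independent ranking tau
        if p[0] != t[0]:            # condition on pi_1 != tau_1
            num += wp * wt * (x[p[0]] - x[p[1]])
            den += wp * wt

assert num / den > 0                # E[pi_1 - pi_2 | pi_1 != tau_1] > 0, matching (E.1)
```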
Thus, E[π_1 − π_2 | π_1 ≠ τ_1] > 0.

E.2 Verifying Definition 3
Recall that Definition 3 is equivalent to U_AH(θ_A, θ_H) < U_HH(θ_A, θ_H) for θ_A > θ_H. Let τ be the algorithmic ranking, and let π be a ranking from a human evaluator. Recall that U_H(θ_A, θ_H) = E[π_1]. Throughout this proof, we will drop the (θ_A, θ_H) notation and simply write U_H, U_AH, and U_HH.

U_AH = Σ_{i=1}^n (Pr[π_1 = x_i ∩ τ_1 ≠ x_i] + Pr[π_2 = x_i ∩ π_1 = τ_1]) x_i
= Σ_{i=1}^n Pr[π_1 = x_i ∩ τ_1 ≠ x_i] x_i + Σ_{i=1}^n Pr[π_2 = x_i ∩ π_1 = τ_1] x_i
= Σ_{i=1}^n (Pr[π_1 = x_i] − Pr[π_1 = x_i ∩ τ_1 = x_i]) x_i + Σ_{i=1}^n Σ_{j ≠ i} Pr[π_1 = x_j ∩ τ_1 = x_j ∩ π_2 = x_i] x_i
= U_H − Σ_{i=1}^n Pr[π_1 = x_i ∩ τ_1 = x_i] x_i + Σ_{i=1}^n Pr[π_1 = x_i ∩ τ_1 = x_i] E[π_2 | π_1 = x_i ∩ τ_1 = x_i]
= U_H + Σ_{i=1}^n Pr[π_1 = x_i] Pr[τ_1 = x_i] (E[π_2 | π_1 = x_i] − x_i),

using the independence of π and τ. Similarly, because two human evaluators are independent,

U_HH = U_H + Σ_{i=1}^n Pr[π_1 = x_i]² (E[π_2 | π_1 = x_i] − x_i).

Let V_{−i} = E[π_2 | π_1 = x_i]. Note that conditioned on π_1 = x_i, the remaining elements of π follow a Mallows model distribution over the other n − 1 candidates, so V_{−i} increases as i increases (since x_i, the value of the unavailable candidate, decreases). Moreover, x_i is strictly decreasing in i, so V_{−i} − x_i is strictly increasing in i. With this, we have

U_AH − U_H = Σ_{i=1}^n Pr[π_1 = x_i] Pr[τ_1 = x_i] (V_{−i} − x_i)
U_HH − U_H = Σ_{i=1}^n Pr[π_1 = x_i]² (V_{−i} − x_i)

Let C_A = Pr[π_1 = τ_1] = Σ_{i=1}^n Pr[π_1 = x_i] Pr[τ_1 = x_i], and similarly let C_H = Σ_{i=1}^n Pr[π_1 = x_i]². C_A > C_H by Lemma 4 with y′_i = Pr[π_1 = x_{n−i+1}], p′_i = Pr[π_1 = x_{n−i+1}], and q′_i = Pr[τ_1 = x_{n−i+1}].

Let p_i = Pr[τ_1 = x_i] Pr[π_1 = x_i] / C_A, q_i = Pr[π_1 = x_i]² / C_H, and y_i = V_{−i} − x_i. Then, we have

(U_AH − U_H) / C_A = Σ_{i=1}^n p_i y_i and (U_HH − U_H) / C_H = Σ_{i=1}^n q_i y_i.

With φ_A = θ_A + 1 and φ_H = θ_H + 1,

p_i / q_i = (C_H / C_A) · [(1 − φ_A^{−1}) / (φ_A^{i−1}(1 − φ_A^{−n}))] / [(1 − φ_H^{−1}) / (φ_H^{i−1}(1 − φ_H^{−n}))] ∝ φ_H^{i−1} / φ_A^{i−1},

which is decreasing in i since φ_H < φ_A. By Lemma 4, Σ_{i=1}^n p_i y_i < Σ_{i=1}^n q_i y_i. Finally, note that U_HH − U_H < 0 by Lemma 7. Then,

(U_AH − U_H) / C_A = Σ_{i=1}^n p_i y_i < Σ_{i=1}^n q_i y_i = (U_HH − U_H) / C_H
⟹ U_AH − U_H < (C_A / C_H)(U_HH − U_H) < U_HH − U_H,

since C_A > C_H and U_HH − U_H < 0. Therefore, U_AH < U_HH, as desired.

F Supplementary Lemmas for the Mallows Model
Lemma 4.
Let {y_i}_{i=1}^n, {p_i}_{i=1}^n, and {q_i}_{i=1}^n be sequences such that
• y_i is strictly increasing,
• Σ_{i=1}^n p_i = Σ_{i=1}^n q_i = 1,
• p_i/q_i is decreasing.
Then, Σ_{i=1}^n p_i y_i < Σ_{i=1}^n q_i y_i.

Proof. First, note that there exists j such that p_i > q_i for i < j and p_i ≤ q_i for i ≥ j. To see this, let j be the smallest index such that p_j ≤ q_j. Such a j must exist because p_i and q_i both sum to 1, so it cannot be the case that p_i > q_i for all i. This implies p_j/q_j ≤ 1, and since p_i/q_i is decreasing, p_i ≤ q_i for i ≥ j.

Next, note that

0 = Σ_{i=1}^n (p_i − q_i) = Σ_{i=1}^{j−1} (p_i − q_i) + Σ_{i=j}^n (p_i − q_i),

so

Σ_{i=1}^{j−1} (p_i − q_i) = Σ_{i=j}^n (q_i − p_i).

Using this choice of j, we can write

Σ_{i=1}^n p_i y_i − Σ_{i=1}^n q_i y_i = Σ_{i=1}^n (p_i − q_i) y_i
= Σ_{i=1}^{j−1} (p_i − q_i) y_i − Σ_{i=j}^n (q_i − p_i) y_i
≤ Σ_{i=1}^{j−1} (p_i − q_i) y_{j−1} − Σ_{i=j}^n (q_i − p_i) y_j
= Σ_{i=1}^{j−1} (p_i − q_i) y_{j−1} − Σ_{i=1}^{j−1} (p_i − q_i) y_j
= Σ_{i=1}^{j−1} (p_i − q_i)(y_{j−1} − y_j) < 0.

Lemma 5.
For x_i > x_j,

Pr[π_1 = x_i ∩ π_2 = x_j] = φ Pr[π_1 = x_j ∩ π_2 = x_i]. (F.1)

Proof. Let π_{−ij} be a permutation of all of the candidates except x_i and x_j. Then, we have

Pr[π_1 = x_i ∩ π_2 = x_j] = Σ_{π_{−ij}} Pr[π_1 = x_i ∩ π_2 = x_j | π_{−ij}] Pr[π_{−ij}]
= Σ_{π_{−ij}} φ Pr[π_1 = x_j ∩ π_2 = x_i | π_{−ij}] Pr[π_{−ij}]
= φ Pr[π_1 = x_j ∩ π_2 = x_i].

Intuitively, given that x_i and x_j are in the top two positions, x_i followed by x_j is φ times more likely than x_j followed by x_i regardless of the remainder of the permutation, and therefore, x_i followed by x_j is φ times more likely overall.

Lemma 6.
For 1 ≤ i ≤ n,

Pr[π_1 = x_i] = (1 − φ^{−1}) / (φ^{i−1}(1 − φ^{−n})). (F.2)

Proof. Let π_{−i} be a permutation over all items except i. Then,

Pr[π_1 = x_i] = Σ_{π_{−i}} Pr[π_1 = x_i | π_{−i}] Pr[π_{−i}] = Σ_{π_{−i}} φ^{−(i−1)} Pr[π_{−i}] = φ^{−(i−1)} Σ_{π_{−i}} Pr[π_{−i}].

Note that Σ_{π_{−i}} Pr[π_{−i}] does not depend on which of the n candidates is excluded. Moreover, Σ_{i=1}^n Pr[π_1 = x_i] = 1. Therefore, we have

Pr[π_1 = x_i] ∝ φ^{−(i−1)}.

Normalizing, we get

Pr[π_1 = x_i] = φ^{−(i−1)} / Σ_{j=1}^n φ^{−(j−1)} = φ^{−(i−1)} (1 − φ^{−1}) / (1 − φ^{−n}) = (1 − φ^{−1}) / (φ^{i−1}(1 − φ^{−n})).

Intuitively, any permutation over the n − 1 remaining candidates is equally compatible with each choice of top candidate, and ranking x_i first costs exactly i − 1 additional inversions.

Lemma 7.
For the Mallows model, U_H(θ_A, θ_H) > U_HH(θ_A, θ_H).

Proof. Intuitively, this is because selecting first is better than selecting second. To prove this, let π and τ be rankings generated by independent human evaluators under the Mallows model, i.e., π, τ ∼ F_{θ_H}.

U_H(θ_A, θ_H) − U_HH(θ_A, θ_H) = E[π_1] − E[τ_1 · 1[π_1 ≠ τ_1] + τ_2 · 1[π_1 = τ_1]]
= E[(τ_1 − τ_2) · 1[π_1 = τ_1]]   (since E[π_1] = E[τ_1])
= E[(π_1 − π_2) · 1[π_1 = τ_1]]   (exchanging the roles of π and τ).

For any i < j, conditioned on π_1 = τ_1, the pair is more likely to be correctly ordered than not:

E[(π_1 − π_2) · 1[π_1 = τ_1]] = Σ_{i<j} (x_i − x_j)(Pr[π_1 = x_i ∩ π_2 = x_j] Pr[τ_1 = x_i] − Pr[π_1 = x_j ∩ π_2 = x_i] Pr[τ_1 = x_j]) > 0,

since by Lemmas 5 and 6, Pr[π_1 = x_i ∩ π_2 = x_j] Pr[τ_1 = x_i] = φ_H^{1+(j−i)} Pr[π_1 = x_j ∩ π_2 = x_i] Pr[τ_1 = x_j] for i < j, and φ_H > 1.
To prove this theorem, we make use of the following lemma.
Lemma 8.
Under the Mallows model, the probability that any two items i < j are correctly ranked increases monotonically with the accuracy parameter φ.

Alternatively, we could prove this by showing that for any permutation with i in front, the permutation in which i and i − 1 are swapped is φ times more likely, and thus, i − 1 is φ times more likely to be in front than i.

Proof. Let inv(π) be the number of inversions in a permutation π. Under the Mallows model, the probability of observing π is proportional to φ^{−inv(π)}. Let S_{i≻j} (resp. S_{j≻i}) be the set of permutations where i is ranked before j (resp. j is ranked before i). Then, the probability i is ranked before j is

Pr[i ≻ j] = Σ_{π ∈ S_{i≻j}} φ^{−inv(π)} / (Σ_{π ∈ S_{i≻j}} φ^{−inv(π)} + Σ_{π ∈ S_{j≻i}} φ^{−inv(π)}).

We will show that (d/dφ) Pr[i ≻ j] > 0. Note that this is equivalent to showing (d/dφ) (Pr[i ≻ j] / Pr[j ≻ i]) > 0. Note that

Pr[i ≻ j] / Pr[j ≻ i] = Σ_{π ∈ S_{i≻j}} φ^{−inv(π)} / Σ_{π ∈ S_{j≻i}} φ^{−inv(π)}.

Let π_{i:j} be the subsequence of π containing elements i through j. Then, we have

Pr[i ≻ j] / Pr[j ≻ i] = [Σ_{π_{i:j} : π ∈ S_{i≻j}} φ^{−inv(π_{i:j})} Σ_{π′ : π′_{i:j} = π_{i:j}} φ^{inv(π_{i:j}) − inv(π′)}] / [Σ_{π_{i:j} : π ∈ S_{j≻i}} φ^{−inv(π_{i:j})} Σ_{π′ : π′_{i:j} = π_{i:j}} φ^{inv(π_{i:j}) − inv(π′)}] = Σ_{π_{i:j} : π ∈ S_{i≻j}} φ^{−inv(π_{i:j})} / Σ_{π_{i:j} : π ∈ S_{j≻i}} φ^{−inv(π_{i:j})}.

Intuitively, the term Σ_{π′ : π′_{i:j} = π_{i:j}} φ^{inv(π_{i:j}) − inv(π′)} does not depend on π_{i:j} because for any π_{i:j}, if we fix the order and positions of the remaining elements, the number of inversions involving at least one element outside of i:j (i.e., inv(π′) − inv(π_{i:j})) is a constant. For fixed π_{i:j}, there is a bijection between permutations π′ : π′_{i:j} = π_{i:j} and fixed orders and positions of the remaining elements (excluding i:j), meaning this sum does not depend on π_{i:j}. Thus, for the remainder of this proof, we can assume without loss of generality that i = 1 and j = n. The quantity of interest becomes

Pr[1 ≻ n] / Pr[n ≻ 1] = Σ_{π ∈ S_{1≻n}} φ^{−inv(π)} / Σ_{π ∈ S_{n≻1}} φ^{−inv(π)}.

Next, we observe that we can similarly ignore inversions between two elements that are neither 1 nor n. To see this, let inv_{1,n}(π) be the number of inversions involving at least one of 1 and n. Then, if we fix the order and positions of 1 and n, all possible permutations of the remaining elements 2 through n − 1 yield the same value of inv(π) − inv_{1,n}(π). More formally, let π(1) and π(n) be the respective positions of elements 1 and n. Then, we have

Σ_{π ∈ S_{1≻n}} φ^{−inv(π)} = Σ_{k<ℓ} Σ_{π : π(1)=k, π(n)=ℓ} φ^{−inv(π)} = Σ_{k<ℓ} Σ_{π : π(1)=k, π(n)=ℓ} φ^{−inv_{1,n}(π)} · φ^{inv_{1,n}(π) − inv(π)} = Σ_{k<ℓ} φ^{−(k−1)−(n−ℓ)} Σ_{π : π(1)=k, π(n)=ℓ} φ^{inv_{1,n}(π) − inv(π)}.

As noted above, Σ_{π : π(1)=k, π(n)=ℓ} φ^{inv_{1,n}(π) − inv(π)} does not depend on k or ℓ, since every permutation of the remaining elements yields the same number of inversions among them regardless of k and ℓ. A similar argument yields

Σ_{π ∈ S_{n≻1}} φ^{−inv(π)} = Σ_{k>ℓ} φ^{−(k−1)−(n−ℓ)+1} Σ_{π : π(1)=k, π(n)=ℓ} φ^{inv_{1,n}(π) − inv(π)}.

Thus,

Pr[1 ≻ n] / Pr[n ≻ 1] = Σ_{k<ℓ} φ^{−(k−1)−(n−ℓ)} / Σ_{k>ℓ} φ^{−(k−1)−(n−ℓ)+1} = (Σ_{k<ℓ} φ^{−(k−1)−(n−ℓ)} / Σ_{k>ℓ} φ^{−(k−1)−(n−ℓ)+1}) · (φ^{n−1} / φ^{n−1}) = Σ_{k<ℓ} φ^{ℓ−k} / Σ_{k>ℓ} φ^{ℓ−k+1}.

Note that each term in the numerator is strictly increasing in φ, while each term in the denominator is weakly decreasing in φ (since ℓ − k + 1 ≤ 0 when k > ℓ). As a result, (d/dφ) Pr[1 ≻ n] / Pr[n ≻ 1] > 0.
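Lemma 8 can likewise be checked by exact enumeration for a small instance. The sketch below (a numerical check, not part of the proof; n = 4 and the grid of φ values are arbitrary test choices) computes Pr[i ≻ j] under Pr[π] ∝ φ^{−inv(π)} for the extreme pair i = 1, j = n and confirms it is increasing in φ:

```python
import itertools

def inversions(p):
    # number of out-of-order pairs relative to the identity ordering
    return sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])

def prob_ranked_before(n, phi, i, j):
    # Pr[i is ranked before j] under the Mallows model: Pr[p] proportional to phi**(-inversions(p))
    num = den = 0.0
    for p in itertools.permutations(range(n)):
        w = phi ** -inversions(p)
        den += w
        if p.index(i) < p.index(j):
            num += w
    return num / den

phis = (1.0, 1.5, 2.0, 3.0, 5.0)
vals = [prob_ranked_before(4, phi, i=0, j=3) for phi in phis]

assert abs(vals[0] - 0.5) < 1e-12                       # phi = 1 is the uniform distribution
assert all(v2 > v1 for v1, v2 in zip(vals, vals[1:]))   # increasing in phi, matching Lemma 8
```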