Best vs. All: Equity and Accuracy of Standardized Test Score Reporting

Abstract

We study a game theoretic model of standardized testing for college admissions. Students are of two types: High and Low. There is a college that would like to admit the High type students. Students take a potentially costly standardized exam which provides a noisy signal of their type. The students come from two populations, which are identical in talent (i.e. the type distribution is the same), but differ in their access to resources: the higher resourced population can, at their option, take the exam multiple times, whereas the lower resourced population can only take the exam once. We study two models of score reporting, which capture existing policies used by colleges. The first policy (sometimes known as "super-scoring") allows students to report the max of the scores they achieve. The other policy requires that all scores be reported. We find in our model that requiring that all scores be reported results in superior outcomes in equilibrium, both from the perspective of the college (the admissions rule is more accurate) and from the perspective of equity across populations: a student's probability of admission is independent of their population, conditional on their type. In particular, the false positive rates and false negative rates are identical in this setting across the highly and poorly resourced student populations. This is the case despite the fact that the more highly resourced students can, at their option, either report a more accurate signal of their type or pool with the lower resourced population under this policy.


arXiv [cs.GT]

Sampath Kannan∗, Mingzi Niu†, Aaron Roth‡, Rakesh Vohra§. February 17, 2021


Worldwide, standardized tests are an essential input to University admission decisions. Their use is considered to benefit both institutions and individuals. Institutions want to avoid the false-positive error of selection so as not to admit those who will not thrive. False-negative errors (the rejection of would-be successes) deprive the institution, and hence society, of potential college graduates. They also harm individual students, as they prevent a potentially successful student from attending University.

Given the importance of standardized tests, there is an extensive literature on the fairness of such tests and how to compensate for possible unfairness. Thorndike, for example, pointed out in 1971 that the distribution of false negatives between groups, and not just their absolute number, matters. His proposed remedy involved setting two selection cutoffs, with a lower cutoff for the members of the minority group [Tho71].

In this paper we consider a source of unfairness that arises from the fact that some, but not all, applicants have the resources to take the test a number of times and have discretion over which scores they submit. Until 2008, the custom was for US Universities to ask for scores from all attempts on either the SAT or the ACT, and these were provided by the relevant testing companies. In 2008, the SAT introduced a policy under which applicants can pick and choose which of their SAT scores to submit. In 2020, the ACT went a step further and allowed students to retake individual sections of a test (a test has multiple sections, e.g., mathematics, verbal, writing). Furthermore, it will include the superscore, i.e. the component-wise maximum, in all ACT score reports.

∗ University of Pennsylvania, Department of Computer and Information Sciences
† Rice University, Department of Economics
‡ University of Pennsylvania, Department of Computer and Information Sciences
§ University of Pennsylvania, Department of Economics and Electrical and Systems Engineering
In turn, many Universities have switched away from a policy of requiring all scores to be reported. At present, Universities employ one of the following policies:
1. Require all scores: applicants must submit all test scores.
2. Score choice: the applicant is free to choose which scores to submit. Some universities encourage students to submit all scores but commit to using only the highest reported scores, which is equivalent.

Georgetown University is one of the shrinking number of Universities that require all scores. In 2019, their Dean of Admissions said:

“If you take the SAT five times and score 600-650 on verbal on four of them but 750 on one, that is useful information compared with allowing the student to cherry-pick their best score.”

Superscoring raises a fairness concern in that not all groups retest at the same rates. The ACT, for example, reports that the retest rate for Hispanic students is 34%, compared to 49% for both White and Asian students. Students whose parents did not attend college had a retest rate of 36%, compared to 62% for students whose highest parental education level exceeded a bachelor’s degree. This is in spite of subsidies offered to low income students by the testing companies. In this paper, we ask what happens if applicants differ in their ability to access signals like standardized tests. In particular, how are false negative and false positive rates affected in different populations, and how can policy improve these effects?

To answer this question we propose a simple model in which students are of two types, High ($H$) or Low ($L$). The probability that a student is of type $H$ is $p < 0.5$, capturing the idea that high types are scarce (if $p > 0.5$, the College would accept all students even if there were no testing). Students know their type, but cannot credibly convey it except through a test. The test generates a score $s \in \{A, B\}$, where $A$ stands for above the bar and $B$ for below the bar. The probability that an $H$ type student generates the score $A$ is $\alpha > 0.5$. Similarly, an $L$ type student generates a score of $B$ with probability $\alpha$.

Students belong to one of two categories. Category 1 students are only able to take the test once. The proportion of Category 1 students is $\phi$. Category 2 students, however, can (in our baseline model) take the test up to twice. In Section 5 we extend our results to the case in which Category 2 students can take the test $k$ times, for an arbitrary $k > 1$. This reflects the idea that students differ in their ability to access multiple signals. We might imagine, for example, that “Category 2” students come from a wealthier demographic. We emphasize that students in Category 2 can make their testing decisions adaptively after observing the outcomes of their previous tests. So, for example, a student could decide to take the test a second time if their first score was a $B$, but decline to retest if their first score was an $A$.

Footnotes:
In part this is driven by competitive pressure. See https://youtu.be/yFm5fAt0zIo for a discussion between Louisiana State University’s VP for enrollment management and the University’s Board of Supervisors held on December 05, 2019. For a more detailed analysis see [...].
This is the recommendation of a number of College coaching services, which was determined by entering the following query into a search engine: “How many times should I take the SAT?”.
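The two-category signal model described above can be simulated directly. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the equilibrium admission rules characterized later in the paper (under “Report Max”, admit if and only if the best score is $A$, with Category 2 students retesting once after a $B$; under “Report All”, admit if and only if the first score is $A$). The function name `simulate` and the parameter values in the usage block are hypothetical. It estimates the false negative rate for High types and the false positive rate for Low types in each category.

```python
import random


def simulate(alpha, p, phi, n=200_000, seed=0):
    """Monte Carlo estimate of error rates keyed by (policy, category, type).

    Returns the false negative rate for H types and the false positive rate
    for L types under two ASSUMED admission rules (hypothetical helper):
      "max": admit iff the best score is A (Category 2 retests once after a B);
      "all": admit iff the first (or only) score is A.
    """
    rng = random.Random(seed)
    # counts[(policy, category, type)] = [number admitted, number of students]
    counts = {(pol, c, t): [0, 0] for pol in ("max", "all") for c in (1, 2) for t in "HL"}
    for _ in range(n):
        category = 1 if rng.random() < phi else 2
        t = "H" if rng.random() < p else "L"
        q = alpha if t == "H" else 1 - alpha  # probability of scoring A
        first = rng.random() < q               # True means the first score is A
        retest = rng.random() < q              # independent second attempt
        # Only a Category 2 student whose first score is B actually uses the retest.
        best = first or (category == 2 and retest)
        for pol, admitted in (("max", best), ("all", first)):
            counts[(pol, category, t)][0] += admitted
            counts[(pol, category, t)][1] += 1
    return {
        key: (1 - adm / tot) if key[2] == "H" else adm / tot
        for key, (adm, tot) in counts.items()
    }


if __name__ == "__main__":
    rates = simulate(alpha=0.8, p=0.4, phi=0.5)
    for key in sorted(rates):
        print(key, round(rates[key], 3))
```

With $\alpha = 0.8$, the Report Max estimates for Category 2 should land near $(1-\alpha)^2 = 0.04$ (false negatives) and $1-\alpha^2 = 0.36$ (false positives), while every other cell should be near $1-\alpha = 0.2$. Sampling is seeded, so repeated runs give identical numbers.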

In our model, neither $p$ nor $\alpha$ depends on which category a student belongs to. Hence, Category 1 and 2 students are ex ante identical and the test itself is fair; the only distinction between populations is the resources that they have available to take the test twice (or not).

Scores on the test are submitted to a College, which decides whether to accept or reject a student based on the scores submitted. The College receives a payoff of 1 for admitting a type $H$ student and $-1$ for admitting a type $L$ student. Changing these payoffs only affects the magnitudes of various cutoffs, not the qualitative conclusions. We consider two policies. The first requires that students report a single score. Hence, Category 2 students can take the test up to twice and report their best score. We call this policy “Report Max”; it is akin to superscoring. The second policy requires students to report all scores. We call this “Report All”. The College can observe the reported test scores, but cannot observe the category that the student comes from. Thus, under Report Max, the College has no information at all about whether a particular student comes from Category 1 or Category 2. Under “Report All”, if the College is sent a single test score, it has no information about which category the student belongs to, but if it receives multiple test scores, it knows the student must be from Category 2.

The two policies affect the incentives of Category 2 students only. Under “Report Max”, it is intuitive that Category 2 students will take the test twice and report their best score. Under “Report All”, the trade-offs are more complicated. Category 2 type $H$ students who initially score an $A$, for example, might choose to take the test a second time to separate themselves from Category 2 type $L$ students. Category 2 type $L$ students who score an $A$ on their first attempt face a choice: stop and mimic Category 1 type $H$ students, or take the test a second time to mimic Category 2 type $H$ students.
These actions will affect the beliefs that the College forms given the reported scores, and therefore its admissions policy.

In certain parameter regimes, both policies admit trivial equilibria of the form “everyone is rejected” (when $p$ is sufficiently small) or “everyone is accepted” (when $p$ is sufficiently large). So, we restrict attention to non-trivial pure strategy equilibria only. These occur for values of $p \in [\hat p, 0.5)$ for some cutoff $\hat p$.

For $p \in [\hat p, 0.5)$ there is a unique non-trivial pure strategy equilibrium under “Report Max”. Category 2 students take the test twice and report their best score. The College accepts any student who reports an $A$ score and rejects all others. The effect is that Category 2 students have a different (better) signal distribution at their disposal, and hence the admissions statistics are biased in favor of Category 2 students, with lower false negative rates and higher false positive rates. In other words, both low and high type students from Category 2 are more likely to be offered admission than their Category 1 counterparts.

Under “Report All”, students must report all test scores, and the equilibrium outcomes are not obvious. Category 2 students appear to retain an advantage: they no longer have access to a signal distribution that is clearly “better”, but they have access to a more informative signal distribution, which results from taking the test twice and reporting both outcomes. One might expect high types to take advantage of this. However, they also now have the strategic option of attempting to pool with Category 1 students; one might expect low types to take advantage of this. What we show, however, is that for $p \in [\hat p, 0.5)$ there is a unique equilibrium outcome in which Category 2 students have no advantage, because the College accepts any student whose first (or only) score is $A$ and rejects those whose first (or only) score is $B$. Category 2 students might take the test either once or twice, but only the first score matters. This makes the equilibrium outcome entirely symmetric for Category 1 and Category 2 students (since only a single score is relevant for admissions in both categories). Hence, “Report All” is superior in our model to “Report Max” from the perspective of equity: the false positive and false negative rates are equal across both categories of students. In other words, a student’s probability of admission, conditional on her type, is independent of her category. The positive predictive value of “Report All” exceeds that of “Report Max”, meaning that the admitted class will have a higher proportion of High types. The expected payoff to the College is also higher under “Report All” than under “Report Max”, and hence “Report All” is better not just from the perspective of student equity, but from the College’s perspective as well. These results generalize to $k \geq 2$ [...] $k = 2$, and even when $k > 2$, all equilibria under Report All are strictly preferable, both from the perspective of the College and from the perspective of equity (difference between false negative rates across populations), to the Report Max equilibrium.

Footnotes:
We want to emphasize that while several sources of potential unfairness of standardized tests have been identified in the literature, we focus exclusively on unfairness arising from the choice of reporting policy.
It assumes students can’t conceal scores.

An interesting feature of the equilibrium outcome in “Report All” is that while the College asks for all tests, it conditions its admissions decisions on the first test result only. This is a best response for the College given the behavior of students and not a matter of commitment.
Thus, Category 1 and 2 students are treated fairly. In conditioning on the first test score only, it appears that the College is throwing away information that it could have used to improve its ability to distinguish between types. It is just that in equilibrium the second test score is not informative.

There is a half-century debate about the best method for treating multiple scores from the ACT, SAT, and LSAT; see [BCC86] for a survey. The focus has been on the strength of the relationship between various aggregates (average, maximum) of test scores and future GPA. That focus has, as far as we are aware, continued to the present day. None of this work considers the possibility of simply ignoring some scores, as our equilibrium analysis suggests.

Concerns for fairness in standardized testing arose the instant such tests were introduced. In the US this dates to 1845, when Horace Mann deployed standardized written exams as a replacement for the existing oral examination used for public school admission in Boston. Reese [Ree13] writes:

“What transpired then still sounds eerily familiar: cheating scandals, poor performance by minority groups, the narrowing of curriculum, the public shaming of teachers, the appeal of more sophisticated measures of assessment, the superior scores in other nations, all amounting to a constant drumbeat about school failure.”

For a recent survey of the fairness issues that standardized testing raises, see Grodsky et al. [GWF08], and see Hutchinson and Mitchell [HM19] for a retrospective of the long history of thought on fairness in standardized testing, contextualized within the current literature on fairness in machine learning.

More recently, the widespread application of statistical techniques to high-stakes decision making has led to a resurgence of interest in fairness in classification and prediction. False positive and false negative rates have once again been focal measures of unfairness across populations; see, e.g.,
[KMR16, Cho17, HPS16]. This literature is broad; here we provide a brief overview of the most relevant subset of it, which uses equilibrium analysis to make predictions about policy choices. Hu and Chen [HC18] consider a two-stage model of a labor market, and study interventions at the first (“internship”) stage that can lead to more equitable outcomes in equilibrium (see also Coate and Loury [CL93] and Foster and Vohra [FV92] for related models of self-confirming equilibria in labor markets from the economics literature). Liu et al. [LWH+20] consider a model of the labor market with higher dimensional signals, and study equilibrium effects of “subsidy” interventions which can lessen the cost of exerting effort. Kannan, Roth, and Ziani [KRZ19] study a two-stage pipeline in which colleges admit students from one of two populations based on a noisy signal about their type, and then commit to a grading policy which is used by a rational downstream employer to make hiring decisions. They study how policy decisions at the level of the college affect various measures of equity at the level of the employer’s hiring decisions. Three papers [MMDH19, HIV19, BG20] study “strategic classification” problems in which individuals rationally manipulate their features in response to a deployed classifier in an attempt to optimize their own outcome, and study the equilibrium effects of different populations having different costs for manipulation. Immorlica, Ligett, and Ziani [ILZ19] consider the related problem of population-level signalling, in which a third party (i.e. a high school) is able to commit to a signalling scheme for an entire population (i.e. a grading policy), in a model similar to Bayesian persuasion. They show that if one population of students is associated with a high school that is able to signal optimally, but another population is associated with a high school that signals naively, then against a Bayesian college, inequities arise in favor of the population associated with the optimal signalling technology, and that, counter-intuitively, the introduction of an unbiased standardized test (administered uniformly across all populations, unlike in our model) can sometimes exacerbate this issue. Jung et al. [JKL+20] study an equilibrium model of criminal justice, in which individuals from populations that differ in their access to legal employment opportunities make rational decisions about whether or not to commit crime as a function of the criminal justice policy, and conclude that the crime-minimizing policy should commit to ignoring demographic information so as to equalize false positive and false negative rates across populations.

It is clear that if $p$ is small enough, the College will be better off rejecting all students. This is an uninteresting outcome, and we exclude it by assuming that $p$ is sufficiently large. Below, we identify a threshold $\hat p$ such that for $p < \hat p$ the only equilibrium under “Report Max” is to reject all students, and we restrict attention to values of $p$ above that threshold, i.e. $p \in [\hat p, 0.5)$. In this range there is a separating equilibrium in which the College accepts a student who reports $A$ and rejects otherwise. This is the case when the test is effective in selecting students. For a narrow range of values in $[\hat p, 0.5)$ there is also an equilibrium where all students are rejected. This is discussed in Section 2.2.

For convenience of exposition, in the following we write $\bar x \equiv 1 - x$ for any variable $x \in [0, 1]$.

Under the separating equilibrium, assuming it exists, only the best score needs to be reported. Therefore a Category 2 student will take the test twice if needed to get a score of $A$. Hence, a Category 2 type $L$ student, denoted by $(2, L)$, will report $B$ with probability $\alpha^2$. Similarly, a Category 2 type $H$ student, denoted by $(2, H)$, will report $A$ with probability $1 - (1-\alpha)^2$. Table 1 summarizes the outcomes.

$$
\begin{array}{c|cccc}
\text{Best score / Type} & (1,H) & (1,L) & (2,H) & (2,L) \\
\hline
A & \alpha\phi p & \bar\alpha\phi\bar p & (1-\bar\alpha^2)\bar\phi p & (1-\alpha^2)\bar\phi\bar p \\
B & \bar\alpha\phi p & \alpha\phi\bar p & \bar\alpha^2\bar\phi p & \alpha^2\bar\phi\bar p \\
\hline
\text{Total} & \phi p & \phi\bar p & \bar\phi p & \bar\phi\bar p
\end{array}
$$

Table 1: Probability Distribution with Best Test Reported [Separating Case]

In the following theorem, we characterize exactly when this separating equilibrium exists under the “Report Max” policy:

Theorem 1.

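Let
$$\hat p = \frac{\bar\alpha\,(1 + \alpha\bar\phi)}{1 + 2\alpha\bar\alpha\bar\phi} \in \left(1-\alpha, \tfrac12\right).$$
When $p \in [\hat p, 0.5)$, “Report Max” has a separating equilibrium in which the College accepts a student if the score is $A$ and rejects otherwise. Category 2 students take the test twice if they receive a score of $B$ on the first attempt.

Proof. From Table 1, the probability that a student is of type $H$ given they report $A$ is
$$\frac{\alpha\phi p + (1-\bar\alpha^2)\bar\phi p}{\alpha\phi p + \bar\alpha\phi\bar p + (1-\bar\alpha^2)\bar\phi p + (1-\alpha^2)\bar\phi\bar p}.$$
If this is at least 0.5, admitting anyone who reports $A$ is a best response for the College. This happens if
$$\left[\alpha\phi + (1-\bar\alpha^2)\bar\phi\right] p \ge \left[\bar\alpha\phi + (1-\alpha^2)\bar\phi\right](1-p) \;\Longleftrightarrow\; p \ge \hat p,$$
where the equivalence uses $\alpha\phi + (1-\bar\alpha^2)\bar\phi + \bar\alpha\phi + (1-\alpha^2)\bar\phi = 1 + 2\alpha\bar\alpha\bar\phi$ and $\bar\alpha\phi + (1-\alpha^2)\bar\phi = \bar\alpha(1 + \alpha\bar\phi)$. From Table 1 we also see that the probability of a high type given a score of $B$ is
$$\frac{\bar\alpha\phi p + \bar\alpha^2\bar\phi p}{\bar\alpha\phi p + \alpha\phi\bar p + \bar\alpha^2\bar\phi p + \alpha^2\bar\phi\bar p}.$$
If this is at most 0.5, then anyone with a $B$ score is rejected. This happens if
$$\left[\bar\alpha\phi + \bar\alpha^2\bar\phi\right] p \le \left[\alpha\phi + \alpha^2\bar\phi\right](1-p),$$
which holds whenever $p \le \tfrac12$, since $\bar\alpha < \alpha$. Remark: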

Unsurprisingly, the more accurate the test, the larger the range of values of $p$ for which there is a separating equilibrium. As the accuracy of the test approaches 1, one can have a separating equilibrium even if a vanishing fraction of students are of type $H$.

We identify a threshold $\hat{\hat p}$ such that for $p \in [\hat p, \hat{\hat p}] \subset [\hat p, 0.5]$ there is also a reject-all equilibrium. For $p \in (\hat{\hat p}, 0.5)$ [...]. Let $f_L(B)$ be the probability that a $(2, L)$ student stops after taking one test with score $B$. Let $f_H(B)$ denote the probability that a $(2, H)$ student stops after one test with score $B$. Table 2 summarizes the possible outcomes.

$$
\begin{array}{c|cccc}
\text{Best score / Type} & (1,H) & (1,L) & (2,H) & (2,L) \\
\hline
A & \alpha\phi p & \bar\alpha\phi\bar p & \alpha\bar\phi p\,[1 + \bar\alpha\bar f_H(B)] & \bar\alpha\bar\phi\bar p\,[1 + \alpha\bar f_L(B)] \\
B & \bar\alpha\phi p & \alpha\phi\bar p & \bar\alpha\bar\phi p\,[1 - \alpha\bar f_H(B)] & \alpha\bar\phi\bar p\,[1 - \bar\alpha\bar f_L(B)] \\
\hline
\text{Total} & \phi p & \phi\bar p & \bar\phi p & \bar\phi\bar p
\end{array}
$$

Table 2: Probability Distribution with Best Test Reported [General Case]

Suppose the College rejects all students regardless of reported scores. Since the score does not matter for the admission decision, any $f_L(B) \in [0, 1]$ and $f_H(B) \in [0, 1]$ are best responses for Category 2 students in this case.

As the College rejects students with a score of $A$, the following must hold:
$$\frac{\alpha\phi p + \alpha\bar\phi p\,[1 + \bar\alpha\bar f_H(B)]}{\alpha\phi p + \bar\alpha\phi\bar p + \alpha\bar\phi p\,[1 + \bar\alpha\bar f_H(B)] + \bar\alpha\bar\phi\bar p\,[1 + \alpha\bar f_L(B)]} \le \frac12 \;\Longrightarrow\; \bar f_L(B) \ge \frac{p - \bar\alpha + \alpha\bar\alpha\bar\phi p\,\bar f_H(B)}{\alpha\bar\alpha\bar\phi\bar p}. \tag{1}$$
As the College rejects students with signal $B$, the following must be true:
$$\frac{\bar\alpha\phi p + \bar\alpha\bar\phi p\,[1 - \alpha\bar f_H(B)]}{\bar\alpha\phi p + \alpha\phi\bar p + \bar\alpha\bar\phi p\,[1 - \alpha\bar f_H(B)] + \alpha\bar\phi\bar p\,[1 - \bar\alpha\bar f_L(B)]} \le \frac12 \;\Longrightarrow\; \bar f_L(B) \le \frac{\alpha - p + \alpha\bar\alpha\bar\phi p\,\bar f_H(B)}{\alpha\bar\alpha\bar\phi\bar p}. \tag{2}$$
Combining (1) and (2) we deduce that
$$\frac{\alpha - p + \alpha\bar\alpha\bar\phi p\,\bar f_H(B)}{\alpha\bar\alpha\bar\phi\bar p} \ge \frac{p - \bar\alpha + \alpha\bar\alpha\bar\phi p\,\bar f_H(B)}{\alpha\bar\alpha\bar\phi\bar p} \;\Longrightarrow\; p \le \frac12,$$
$$\frac{\alpha - p + \alpha\bar\alpha\bar\phi p\,\bar f_H(B)}{\alpha\bar\alpha\bar\phi\bar p} \ge 0 \;\Longleftarrow\; p \le \frac12,$$
$$\frac{p - \bar\alpha + \alpha\bar\alpha\bar\phi p\,\bar f_H(B)}{\alpha\bar\alpha\bar\phi\bar p} \le 1 \;\Longrightarrow\; p \le \frac{\alpha\bar\alpha\bar\phi + \bar\alpha}{\alpha\bar\alpha\bar\phi + 1}, \qquad \bar f_H(B) \le \frac{\alpha\bar\alpha\bar\phi\bar p + \bar\alpha - p}{\alpha\bar\alpha\bar\phi p}.$$
Hence, for $p \le \hat{\hat p}$, where
$$\hat{\hat p} = \min\left\{\frac12, \frac{\alpha\bar\alpha\bar\phi + \bar\alpha}{\alpha\bar\alpha\bar\phi + 1}\right\} \in \left(\hat p, \tfrac12\right],$$
there exists an equilibrium in which the College rejects all students. In such an equilibrium, $(2, L)$ students take a second test with probability $\bar f_L(B)$ when the first score is $B$, $(2, H)$ students take a second test with probability $\bar f_H(B)$ when the first score is $B$, and
$$\bar f_H(B) \in \left[0, \frac{\alpha\bar\alpha\bar\phi\bar p + \bar\alpha - p}{\alpha\bar\alpha\bar\phi p}\right], \qquad \bar f_L(B) \in \left[\frac{p - \bar\alpha + \alpha\bar\alpha\bar\phi p\,\bar f_H(B)}{\alpha\bar\alpha\bar\phi\bar p}, \frac{\alpha - p + \alpha\bar\alpha\bar\phi p\,\bar f_H(B)}{\alpha\bar\alpha\bar\phi\bar p}\right].$$

Here we characterize the equilibrium outcome under “Report All”.

Let $u_B$ be 1 if the College accepts a student reporting a single $B$ score and 0 otherwise. Similarly define $u_A, u_{BA}, u_{AB}, u_{BB}, u_{AA}$ to be the indicators of whether the College accepts a student reporting the corresponding sequence of exam scores. We proceed by examining whether various combinations of values for these $u$ variables can be supported in equilibrium.
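For fixed parameters, the search over the $u$ variables can also be brute-forced numerically. The Python sketch below is illustrative and makes simplifying assumptions that the text does not: Category 2 students are restricted to pure stop/retake choices (each $f$ in $\{0, 1\}$), the College's decision on an off-path score sequence is left unrestricted, and the helpers `masses`, `admission_prob`, and `find_pure_equilibria` are hypothetical names. For each of the $2^6$ admission rules it enumerates the Category 2 best responses, computes the outcome masses (the pure-strategy case of Table 3 below), and keeps the combinations in which the College admits exactly the on-path sequences whose posterior exceeds $1/2$.

```python
from itertools import product

SEQS = ("A", "B", "AA", "AB", "BA", "BB")  # reportable score sequences


def masses(alpha, phi, p, stop):
    """Mass of each (type, reported sequence) under "Report All" with pure strategies.

    stop[(t, s)] is True if a Category 2 student of type t stops after first score s.
    """
    m = {(t, s): 0.0 for t in "HL" for s in SEQS}
    for t, prior in (("H", p), ("L", 1 - p)):
        q = alpha if t == "H" else 1 - alpha  # probability this type scores A
        for first, pf in (("A", q), ("B", 1 - q)):
            m[(t, first)] += phi * prior * pf  # Category 1 always tests once
            w = (1 - phi) * prior * pf         # Category 2 with this first score
            if stop[(t, first)]:
                m[(t, first)] += w
            else:
                m[(t, first + "A")] += w * q
                m[(t, first + "B")] += w * (1 - q)
    return m


def admission_prob(alpha, category, t, u, stop):
    """Probability that a student of this (category, type) is admitted under rule u."""
    q = alpha if t == "H" else 1 - alpha
    total = 0.0
    for first, pf in (("A", q), ("B", 1 - q)):
        if category == 1 or stop[(t, first)]:
            total += pf * u[first]
        else:
            total += pf * (q * u[first + "A"] + (1 - q) * u[first + "B"])
    return total


def find_pure_equilibria(alpha, phi, p, tol=1e-12):
    """Enumerate all 2^6 admission rules and all pure Category 2 best responses."""
    found = []
    for bits in product((0, 1), repeat=len(SEQS)):
        u = dict(zip(SEQS, bits))
        options = []  # best-response choices after each (type, first score)
        for t in "HL":
            q = alpha if t == "H" else 1 - alpha
            for first in "AB":
                v_stop = u[first]
                v_retake = q * u[first + "A"] + (1 - q) * u[first + "B"]
                best = []
                if v_stop >= v_retake - tol:
                    best.append(((t, first), True))
                if v_retake >= v_stop - tol:
                    best.append(((t, first), False))
                options.append(best)
        for choice in product(*options):
            stop = dict(choice)
            m = masses(alpha, phi, p, stop)
            # On-path sequences: admit iff P(H | s) > 1/2, i.e. H mass exceeds L mass.
            # Off-path sequences (zero mass) and exact ties are left unrestricted.
            ok = all(
                m[("H", s)] + m[("L", s)] == 0.0
                or abs(m[("H", s)] - m[("L", s)]) <= tol
                or u[s] == int(m[("H", s)] > m[("L", s)])
                for s in SEQS
            )
            if ok:
                found.append((u, stop))
    return found


if __name__ == "__main__":
    alpha, phi, p = 0.8, 0.5, 0.4  # illustrative values with 1 - alpha < p < 0.5
    for u, stop in find_pure_equilibria(alpha, phi, p):
        probs = {(c, t): admission_prob(alpha, c, t, u, stop) for c in (1, 2) for t in "HL"}
        print(u, probs)
```

For $1 - \alpha < p < 0.5$, Theorem 2 predicts that every equilibrium found this way gives each type the same admission probability in both categories, namely the probability that its first score is $A$ ($\alpha$ for High types, $1-\alpha$ for Low types), even though the admission rules $u$ and the retesting choices may differ across the equilibria listed.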
It might seem that some combinations could be eliminated immediately, but things are not so simple. Sometimes our intuition will be confirmed. For example, it may seem obvious that we should have $u_B = u_{BB} = 0$. Why would the College accept a student who only reports $B$s? If $p$ were large, say close to 1, then it would. But in our case $p < 0.5$, and for the range of $p$ we consider, we have $u_B = u_{BB} = 0$ in equilibrium.

Now a more counter-intuitive case: should $u_{AB} = u_{BA}$? After all, why should the timing of a $B$ score matter? If these test scores were exogenously given to us, then by symmetry they would induce the same posterior belief about a student's type. But our analysis shows that in equilibrium there is an important distinction, because of the incentives induced in equilibrium for students to take a second test, and this in turn affects the inferences the College makes.

We introduce the following variables to track the actions of the Category 2 students as a function of their first test score:
1. $f_L(A)$: the probability that a $(2, L)$ student stops after one test with a score of $A$.
2. $f_L(B)$: the probability that a $(2, L)$ student stops after one test with a score of $B$.
3. $f_H(A)$: the probability that a $(2, H)$ student stops after one test with a score of $A$.
4. $f_H(B)$: the probability that a $(2, H)$ student stops after one test with a score of $B$.

We remark that we can interpret these probabilities as the fraction of a student population that takes the corresponding action, so we do not have to imagine that individual students randomize.

Fixing student strategies, the probabilities of each testing outcome are displayed in Table 3.

$$
\begin{array}{c|cccc}
\text{Score / Type} & (1,H) & (1,L) & (2,H) & (2,L) \\
\hline
A & \alpha\phi p & \bar\alpha\phi\bar p & \alpha\bar\phi p\,f_H(A) & \bar\alpha\bar\phi\bar p\,f_L(A) \\
B & \bar\alpha\phi p & \alpha\phi\bar p & \bar\alpha\bar\phi p\,f_H(B) & \alpha\bar\phi\bar p\,f_L(B) \\
AA & 0 & 0 & \alpha^2\bar\phi p\,\bar f_H(A) & \bar\alpha^2\bar\phi\bar p\,\bar f_L(A) \\
AB & 0 & 0 & \alpha\bar\alpha\bar\phi p\,\bar f_H(A) & \alpha\bar\alpha\bar\phi\bar p\,\bar f_L(A) \\
BA & 0 & 0 & \alpha\bar\alpha\bar\phi p\,\bar f_H(B) & \alpha\bar\alpha\bar\phi\bar p\,\bar f_L(B) \\
BB & 0 & 0 & \bar\alpha^2\bar\phi p\,\bar f_H(B) & \alpha^2\bar\phi\bar p\,\bar f_L(B) \\
\hline
\text{Total} & \phi p & \phi\bar p & \bar\phi p & \bar\phi\bar p
\end{array}
$$

Table 3: Distribution of Testing Outcomes

The following theorem characterizes equilibrium outcomes under “Report All” within the parameter range of interest.

Theorem 2 (First-Score Equilibrium). For $p \in (1-\alpha, 0.5)$, the unique equilibrium outcome for “Report All” is that a student is admitted if their first (or only) score is $A$ and rejected otherwise.

Remark:

For clarity, we prove Theorem 2 here, but we remark that it can also be derived as a corollary of the more general Theorem 7 for the case in which Category 2 students may take $k \geq 2$ tests (Section 5.2).

Proof.

Let the College’s posterior belief that a student is of type $H$ after observing some (sequence of) reported scores $s \in \{B, A, BB, BA, AB, AA\}$ be $p_s \in [0, 1]$. The College's expected payoff from admitting a student who reports $s$ is $1 \cdot p_s + (-1)\cdot(1 - p_s) = 2p_s - 1$. Hence, the optimal admission policy for the College is
$$u_s = \begin{cases} 1, & p_s > \tfrac12, \\ 0 \text{ or } 1, & p_s = \tfrac12, \\ 0, & p_s < \tfrac12. \end{cases}$$
From Table 3, we can compute:
$$p_A = \frac{\phi p\alpha + \bar\phi p\alpha f_H(A)}{\phi(p\alpha + \bar p\bar\alpha) + \bar\phi[p\alpha f_H(A) + \bar p\bar\alpha f_L(A)]} \in (0, 1), \tag{3}$$
$$p_B = \frac{\phi p\bar\alpha + \bar\phi p\bar\alpha f_H(B)}{\phi(p\bar\alpha + \bar p\alpha) + \bar\phi[p\bar\alpha f_H(B) + \bar p\alpha f_L(B)]} \in (0, 1), \tag{4}$$
$$p_{AA} = \frac{p\alpha^2\bar f_H(A)}{p\alpha^2\bar f_H(A) + \bar p\bar\alpha^2\bar f_L(A)} \quad \text{if } p\alpha^2\bar f_H(A) + \bar p\bar\alpha^2\bar f_L(A) > 0, \tag{5}$$
$$p_{AB} = \frac{p\bar f_H(A)}{p\bar f_H(A) + \bar p\bar f_L(A)} \quad \text{if } p\bar f_H(A) + \bar p\bar f_L(A) > 0, \tag{6}$$
$$p_{BA} = \frac{p\bar f_H(B)}{p\bar f_H(B) + \bar p\bar f_L(B)} \quad \text{if } p\bar f_H(B) + \bar p\bar f_L(B) > 0, \tag{7}$$
$$p_{BB} = \frac{p\bar\alpha^2\bar f_H(B)}{p\bar\alpha^2\bar f_H(B) + \bar p\alpha^2\bar f_L(B)} \quad \text{if } p\bar\alpha^2\bar f_H(B) + \bar p\alpha^2\bar f_L(B) > 0. \tag{8}$$
Note that $p_{sB} < p_{sA}$ if $f_L(s) < 1$ and $f_H(s) < 1$, for $s \in \{A, B\}$. Also, $p_{AB} = p_{BA}$ if any of the following scenarios appears in equilibrium: (i) all students test once when their first score is $A$; (ii) all students test once when their first score is $B$; or (iii) $\bar f_H(A)/\bar f_L(A) = \bar f_H(B)/\bar f_L(B)$.

Next we prove by contradiction that for all $p \in (\bar\alpha, 0.5)$, the College admits students submitting $A$ and rejects students submitting $B$, $BA$, or $BB$ in equilibrium. Thus, in any equilibrium, the admission outcome depends solely on the first score. We split the discussion according to whether the student's first score is $A$ or $B$.

Case 1: The First Score is A

Suppose, for contradiction, the College rejects students who submit a single score of $A$. Then we have the following two cases:
1. If the College rejects all scores starting with $A$, namely $A$, $AA$, $AB$, then we need $\max\{p_A, p_{AA}, p_{AB}\} \le \tfrac12$ to rationalize this admission rule. Hence, by the law of total probability,
$$\begin{aligned}
p\alpha &= p_A\{\phi(p\alpha + \bar p\bar\alpha) + \bar\phi[p\alpha f_H(A) + \bar p\bar\alpha f_L(A)]\} + p_{AA}\,\bar\phi[p\alpha^2\bar f_H(A) + \bar p\bar\alpha^2\bar f_L(A)] + p_{AB}\,\bar\phi\alpha\bar\alpha[p\bar f_H(A) + \bar p\bar f_L(A)] \\
&\le \tfrac12\{\phi(p\alpha + \bar p\bar\alpha) + \bar\phi[p\alpha f_H(A) + \bar p\bar\alpha f_L(A)] + \bar\phi[p\alpha^2\bar f_H(A) + \bar p\bar\alpha^2\bar f_L(A)] + \bar\phi\alpha\bar\alpha[p\bar f_H(A) + \bar p\bar f_L(A)]\} \\
&= \tfrac12(p\alpha + \bar p\bar\alpha).
\end{aligned}$$
This implies $\frac{p\alpha}{p\alpha + \bar p\bar\alpha} \le \frac12$, i.e., $p \le \bar\alpha$, a contradiction.
2. If the College admits students reporting either $AA$ or $AB$, then all Category 2 students take the test a second time after obtaining a first score of $A$ (i.e., $f_H(A) = f_L(A) = 0$). In this case, the only students reporting a single score of $A$ are from Category 1, and so by Equation (3) we have $p_A = \frac{p\alpha}{p\alpha + \bar p\bar\alpha} \le \frac12$. This implies $p \le \bar\alpha$, a contradiction again.

Therefore, as long as $p \in (\bar\alpha, 0.5)$, a single score of $A$ yields admission.

Case 2: The First Score is B

Suppose, for a contradiction, the College admits students reporting any of $B$, $BA$, or $BB$. Then we have the following three cases:
1. If the College admits all scores starting with $B$, namely $B$, $BA$, $BB$, then we need $\min\{p_B, p_{BA}, p_{BB}\} \ge \tfrac12$ to rationalize this admission rule. Hence, by the law of total probability,
$$\begin{aligned}
p\bar\alpha &= p_B\{\phi(p\bar\alpha + \bar p\alpha) + \bar\phi[p\bar\alpha f_H(B) + \bar p\alpha f_L(B)]\} + p_{BA}\,\alpha\bar\alpha\bar\phi[p\bar f_H(B) + \bar p\bar f_L(B)] + p_{BB}\,\bar\phi[p\bar\alpha^2\bar f_H(B) + \bar p\alpha^2\bar f_L(B)] \\
&\ge \tfrac12\{\phi(p\bar\alpha + \bar p\alpha) + \bar\phi[p\bar\alpha f_H(B) + \bar p\alpha f_L(B)] + \alpha\bar\alpha\bar\phi[p\bar f_H(B) + \bar p\bar f_L(B)] + \bar\phi[p\bar\alpha^2\bar f_H(B) + \bar p\alpha^2\bar f_L(B)]\} \\
&= \tfrac12(p\bar\alpha + \bar p\alpha).
\end{aligned}$$
This implies $\frac{p\bar\alpha}{p\bar\alpha + \bar p\alpha} \ge \frac12$, i.e., $p \ge \alpha > 0.5$, a contradiction.
2. If the College admits students reporting $B$ but rejects students reporting either $BA$ or $BB$, then all Category 2 students take the test once if their first score is $B$, since they will not run the risk of being rejected due to the second test result. Thus, by Equation (4), $p_B = \frac{p\bar\alpha}{p\bar\alpha + \bar p\alpha} \ge \frac12$. This implies $p \ge \alpha > 0.5$, a contradiction.
3. If the College rejects students reporting $B$ but admits students reporting either $BA$ or $BB$, then all Category 2 students take the test twice if their first score is $B$. In this case, by Equations (7) and (8), $p_{BB} < p_{BA} = p < \tfrac12$, contradicting the assumption that the College admits either $BA$ or $BB$.

Therefore, as long as $p \in (\bar\alpha, 0.5)$, a first score of $B$ yields rejection. This completes the proof.

In what follows, we refer to the set of equilibria in which the admissions outcome is determined solely by the first score as the first-score equilibrium.

In this section, we compare the separating equilibrium under “Report Max” with the (unique) first-score equilibrium under “Report All”, both from the perspective of student equity and from the perspective of the College’s objective.

We recall that the false negative rate is the proportion of High type students who are rejected, and the false positive rate is the proportion of Low type students who are accepted. It is straightforward to compare the false positive and false negative rates of the separating equilibrium under “Report Max” with the first-score equilibrium under “Report All”. We summarize the results in the table below.

$$
\begin{array}{l|cc}
\text{FN} & (1,H) & (2,H) \\
\hline
\text{Max} & 1-\alpha & (1-\alpha)^2 \\
\text{All} & 1-\alpha & 1-\alpha
\end{array}
\qquad
\begin{array}{l|cc}
\text{FP} & (1,L) & (2,L) \\
\hline
\text{Max} & 1-\alpha & 1-\alpha^2 \\
\text{All} & 1-\alpha & 1-\alpha
\end{array}
$$

Table 4: False Negative (left) and False Positive (right) Rates Across Categories of Students

We observe that the “Report Max” equilibrium favors the advantaged (Category 2) students, in that for each type $L, H$, their probability of admission is strictly higher than that of the disadvantaged (Category 1) students of the same type. This manifests itself as both a higher false positive rate and a lower false negative rate, compared to the “Report All” equilibrium. In contrast, because the College in the “Report All” equilibrium makes decisions only as a function of the first test score, the probability of admission conditional on type is identical across advantaged and disadvantaged students. This manifests itself as identical false positive and false negative rates across categories. We conclude that, from the perspective of equity across advantaged and disadvantaged students, the “Report All” policy is preferable to the “Report Max” policy.

Next, we compare the positive predictive value of the equilibrium outcomes for both policies, i.e. the probability, in equilibrium, that a student is a High type conditional on being admitted to the College. Higher positive predictive values correspond to admissions outcomes with a higher proportion of High type students among the admitted class, and are hence desirable. We find that the positive predictive value is strictly higher for the “Report All” policy:

Theorem 3.

For any $\alpha \in (\frac12, 1)$ and $p < \frac12$, the positive predictive value of the “Report All” policy strictly exceeds that of the “Report Max” policy in any nontrivial equilibrium.

Proof. The positive predictive value of “Report Max” is

$$\frac{\alpha\phi p + (1-\bar\alpha^2)\bar\phi p}{\alpha\phi p + \bar\alpha\phi\bar p + (1-\bar\alpha^2)\bar\phi p + (1-\alpha^2)\bar\phi\bar p} = \frac{(1+\bar\alpha\bar\phi)\alpha p}{\alpha p + \bar\alpha\bar p + \alpha\bar\alpha\bar\phi}.$$

The positive predictive value of “Report All” coincides with that of the policy that looks only at the first score, so it equals $\frac{\alpha p}{\alpha p + \bar\alpha\bar p}$. It is straightforward to verify that the first expression is strictly smaller than the second if $\alpha \in (0.5, 1)$: cross-multiplying, the inequality reduces to $\alpha p + \bar\alpha\bar p < \alpha$, i.e., $\bar\alpha\bar p < \alpha\bar p$.

We next compare the negative predictive value of the College’s admissions rule under both policies. The negative predictive value is the probability, in equilibrium, that a student is a Low type, conditional on being rejected from the college.

Theorem 4.

For any $\alpha \in (\frac12, 1)$ and $p < 0.5$, the negative predictive value of the “Report All” policy is strictly smaller than that of the “Report Max” policy in any nontrivial equilibrium.

Proof. The negative predictive value of “Report Max” is

$$\frac{\alpha\phi\bar p + \alpha^2\bar\phi\bar p}{\bar\alpha\phi p + \alpha\phi\bar p + \bar\alpha^2\bar\phi p + \alpha^2\bar\phi\bar p}.$$

The negative predictive value of “Report All” is $\frac{\bar p\alpha}{p\bar\alpha + \bar p\alpha}$. It is straightforward to verify that the first is larger than the second for $\alpha > \frac12$: cross-multiplying, the inequality reduces to $\alpha^2\bar\alpha > \alpha\bar\alpha^2$.

Finally, we compare the College’s utility (i.e., its classification accuracy for the task of distinguishing High and Low type students) in the equilibrium outcomes for both policies. We find that the College has higher utility under the “Report All” policy, demonstrating that “Report All” is not only better from the perspective of student equity, but strictly better from the perspective of the College as well.

Theorem 5.

For any $\alpha \in (\frac12, 1)$ and $p < 0.5$, the College’s expected payoff per student under “Report All” exceeds that under “Report Max” in equilibrium.

Proof. The expected payoff per student under “Report All” is $\alpha p - \bar\alpha\bar p$. The expected payoff per student under “Report Max” is $\phi(\alpha p - \bar\alpha\bar p) + \bar\phi[(1-\bar\alpha^2)p - (1-\alpha^2)\bar p]$. Hence, the difference between the College’s expected payoffs under these two schemes is

$$\alpha p - \bar\alpha\bar p - \{\phi(\alpha p - \bar\alpha\bar p) + \bar\phi[(1-\bar\alpha^2)p - (1-\alpha^2)\bar p]\} = \bar\phi[\alpha p - \bar\alpha\bar p - (1-\bar\alpha^2)p + (1-\alpha^2)\bar p] = \bar\phi\alpha\bar\alpha(1-2p) > 0$$

for $\alpha \in (0, 1)$ and $p < \frac12$.

Note that for all the quantities we have considered, equality occurs when $\alpha = 1$ (i.e., when the test is a perfect, noiseless signal of student type). In this case, the College is able to admit exactly the High type students and reject exactly the Low type students under both reporting schemes, and neither scheme has an advantage over the other in terms of equity or accuracy. As the accuracy $\alpha$ of the test decreases, all the disparities we have measured grow monotonically.

Thus far we have assumed that the advantaged students (Category 2) are able to take up to two tests, and that High type students are rare, i.e., that $p < \frac12$. In this section, we generalize our results to a broader setting. Now, Category 2 students can adaptively choose to take the test up to $k$ times for an arbitrary $k \ge 2$, and we consider the full parameter range $p \in (0, 1)$. As one might imagine, the set of possible equilibria under “Report All” expands. Nevertheless, we identify a wide range of parameters for which basing admission only on the first reported test score continues to be an equilibrium outcome. This equilibrium outcome maintains the advantages of the “Report All” equilibrium outcome we studied in the special case of $k = 2$. Moreover, we prove that all equilibria of the “Report All” policy dominate the equilibrium of the “Report Max” policy in terms of both equity and accuracy: the College continues to strictly prefer outcomes under the “Report All” policy for all possible equilibria, and, similarly, all such equilibria have smaller false positive and false negative gaps between categories of students than the “Report Max” equilibrium.

We start by characterizing the equilibrium outcome under “Report Max”. As before, we focus on a separating equilibrium in which the College accepts a student if the reported score is $A$ and rejects otherwise. Under this separating equilibrium, assuming it exists, only the best score will be reported. Therefore, Category 2 students will take the test as often as needed to get an $A$ score. Hence a $(2, L)$ student will report $B$ with probability $\alpha^k$, and a $(2, H)$ student will report $A$ with probability $1-(1-\alpha)^k$.

Theorem 6.

Let

$$\hat p_k = \frac{\phi(1-\alpha) + (1-\phi)(1-\alpha^k)}{\phi + (1-\phi)[2 - \alpha^k - (1-\alpha)^k]} \quad\text{and}\quad \hat p'_k = \frac{\phi\alpha + (1-\phi)\alpha^k}{\phi + (1-\phi)[\alpha^k + (1-\alpha)^k]}$$

for any $k \ge 2$. “Report Max” has a nontrivial (separating) equilibrium if and only if $p \in [\hat p_k, \hat p'_k]$.

Of course, the nontrivial equilibrium is unique: the College accepts a student if the reported score is $A$ and rejects otherwise. Category 2 students take the exam as many times as they need to get an $A$ score (up to $k$ times). Note that the parameter range in which a separating equilibrium exists is always nontrivial, since $\hat p'_k > \alpha > \frac12 > \hat p_k > \bar\alpha$ for all $k \ge 2$. Observe also that $\hat p_k$ strictly increases in $k$, which captures the intuition that under “Report Max”, increased testing makes a report of $A$ less indicative of a High type.

Proof. We can calculate that the probability that a student is of type $H$ given that they report $A$ is

$$\frac{\alpha\phi p + (1-\bar\alpha^k)\bar\phi p}{\alpha\phi p + \bar\alpha\phi\bar p + (1-\bar\alpha^k)\bar\phi p + (1-\alpha^k)\bar\phi\bar p}.$$

If this is at least 0.5, then admitting anyone who reports $A$ is optimal for the College. This happens exactly when

$$[\alpha\phi + (1-\bar\alpha^k)\bar\phi]\,p \ge [\bar\alpha\phi + (1-\alpha^k)\bar\phi](1-p) \iff p \ge \hat p_k.$$

The probability of being a High type given a report of $B$ is

$$\frac{\bar\alpha\phi p + \bar\alpha^k\bar\phi p}{\bar\alpha\phi p + \alpha\phi\bar p + \bar\alpha^k\bar\phi p + \alpha^k\bar\phi\bar p}.$$

If this is at most 0.5, then rejecting anyone with a $B$ score is optimal for the College. This happens exactly when

$$[\bar\alpha\phi + \bar\alpha^k\bar\phi]\,p \le [\alpha\phi + \alpha^k\bar\phi](1-p) \iff p \le \hat p'_k.$$

We now turn to the “Report All” policy. We find that within a wide range of parameters ($p \in [1-\alpha, \alpha]$) there still exists an equilibrium of the sort we had in the $k = 2$ case, namely one in which the College makes decisions based only on the first test score and hence treats both populations equally. We also characterize the parameter range in which other nontrivial equilibria exist.

Theorem 7.

Let $p^*_k = \frac{\bar\alpha^{k-2}}{\alpha^{k-2} + \bar\alpha^{k-2}}$ for $k \ge 1$. Under “Report All”, for any $\alpha \in (\frac12, 1)$:

1. There exists an equilibrium in which the admission outcome depends solely on the first score if and only if $p \in [1-\alpha, \alpha]$.

2. There exists an equilibrium in which the admission outcome depends on more than the first score if and only if $p \in [p^*_{k+2}, 1-\alpha] \cup [p^*_k, \alpha]$.

3. For any $p \in (1-\alpha, \alpha)$, a reported (single) score of $A$ yields admission and a reported score sequence that consists entirely of $B$ scores yields rejection in all equilibria.

Proof. We write $p_s \in [0, 1]$ to denote the College’s posterior belief that the student is of High type after observing a reported score sequence $s \in \cup_{i=1}^k \{A, B\}^i$, where

$$\cup_{i=1}^k \{A, B\}^i = \{A, B, AA, AB, BA, BB, \ldots, \underbrace{A\ldots A}_{k}, \ldots, \underbrace{B\ldots B}_{k}\}$$

denotes the set of all possible score sequences that can result from taking the test up to $k$ times. For the analysis, we will also be interested in the proportion of High type students among all students whose reported scores start with $s$ (and might have an arbitrary continuation). We denote this by $p_{s*} \in [0, 1]$, with the convention that $p_{s*} = 0$ if no score starts in $s$ in equilibrium. For any $s$, the optimal admissions policy for the College remains

$$u_s = \begin{cases} 1, & p_s > \frac12,\\ \text{any value in } [0, 1], & p_s = \frac12,\\ 0, & p_s < \frac12.\end{cases}$$
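The Bayesian updates used throughout this proof can be checked mechanically. The following is a minimal Python sketch, not the authors’ code: the helper names (`score_sequences`, `likelihood`, `posterior_high`, `p_star`) are hypothetical, and `posterior_high` assumes that every student reporting a sequence took exactly that many tests and shares the common prior $p$ (as for Category 2 students who never stop early). It enumerates $\cup_{i=1}^k \{A, B\}^i$ and confirms that the posterior of $B$ followed by $n-1$ $A$’s is exactly $\frac12$ at $p = p^*_n$.

```python
from fractions import Fraction
from itertools import product


def score_sequences(k):
    """All score sequences of length 1..k, i.e. the union of {A, B}^i for i = 1..k."""
    return ["".join(t) for i in range(1, k + 1) for t in product("AB", repeat=i)]


def likelihood(seq, high, alpha):
    """Probability that one student of the given type produces exactly `seq`.

    Each test independently shows the type-consistent score (A for High,
    B for Low) with probability alpha.
    """
    prob = Fraction(1)
    for score in seq:
        consistent = (score == "A") == high
        prob *= alpha if consistent else 1 - alpha
    return prob


def posterior_high(seq, p, alpha):
    """Posterior that a student reporting `seq` is High.

    Hypothetical simplification: everyone reporting `seq` took exactly
    len(seq) tests and shares the prior p (e.g. Category 2 students who
    never stop early).
    """
    h = p * likelihood(seq, True, alpha)
    low = (1 - p) * likelihood(seq, False, alpha)
    return h / (h + low)


def p_star(n, alpha):
    """Threshold p*_n = abar^(n-2) / (alpha^(n-2) + abar^(n-2)), abar = 1 - alpha."""
    abar = 1 - alpha
    return abar ** (n - 2) / (alpha ** (n - 2) + abar ** (n - 2))


alpha = Fraction(3, 4)
for n in range(1, 5):
    seq = "B" + "A" * (n - 1)  # B followed by n - 1 A's
    # each row: n, p*_n, and the posterior of seq at p = p*_n (always 1/2)
    print(n, p_star(n, alpha), posterior_high(seq, p_star(n, alpha), alpha))
```

With $\alpha = 3/4$, `p_star` gives $p^*_1 = 3/4 = \alpha$, $p^*_2 = 1/2$ and $p^*_3 = 1/4 = \bar\alpha$, matching the values noted later in this proof. Exact `Fraction` arithmetic is used so that the $\frac12$ boundary is hit exactly rather than up to rounding.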

It will be useful to group testing outcomes by whether they consist of a single score or of more than one score. This distinction is important because Category 2 students can pool with Category 1 students only if they take the test only once. Just as in Table 3, we display the probabilities of testing outcomes in Table 5, in which $s*$ denotes the set of all score sequences starting in $s$.

| Score / Type | (1, H) | (1, L) | (2, H) | (2, L) |
|---|---|---|---|---|
| $A$ | $\alpha\phi p$ | $\bar\alpha\phi\bar p$ | $\alpha\bar\phi p f_H(A)$ | $\bar\alpha\bar\phi\bar p f_L(A)$ |
| $B$ | $\bar\alpha\phi p$ | $\alpha\phi\bar p$ | $\bar\alpha\bar\phi p f_H(B)$ | $\alpha\bar\phi\bar p f_L(B)$ |
| $AA*$ | | | $\alpha^2\bar\phi p\bar f_H(A)$ | $\bar\alpha^2\bar\phi\bar p\bar f_L(A)$ |
| $AB*$ | | | $\alpha\bar\alpha\bar\phi p\bar f_H(A)$ | $\alpha\bar\alpha\bar\phi\bar p\bar f_L(A)$ |
| $BA*$ | | | $\alpha\bar\alpha\bar\phi p\bar f_H(B)$ | $\alpha\bar\alpha\bar\phi\bar p\bar f_L(B)$ |
| $BB*$ | | | $\bar\alpha^2\bar\phi p\bar f_H(B)$ | $\alpha^2\bar\phi\bar p\bar f_L(B)$ |
| Total | $\phi p$ | $\phi\bar p$ | $\bar\phi p$ | $\bar\phi\bar p$ |

Table 5: Distribution of Testing Outcomes with k Tests Available

From Table 5, we can compute:

$$
\begin{aligned}
p_A &= \frac{\phi p\alpha + \bar\phi p\alpha f_H(A)}{\phi(p\alpha + \bar p\bar\alpha) + \bar\phi(p\alpha f_H(A) + \bar p\bar\alpha f_L(A))} \in (0, 1),\\
p_B &= \frac{\phi p\bar\alpha + \bar\phi p\bar\alpha f_H(B)}{\phi(p\bar\alpha + \bar p\alpha) + \bar\phi(p\bar\alpha f_H(B) + \bar p\alpha f_L(B))} \in (0, 1),\\
p_{AA*} &= \frac{p\alpha^2\bar f_H(A)}{p\alpha^2\bar f_H(A) + \bar p\bar\alpha^2\bar f_L(A)}\,\mathbf{1}\{\bar f_H(A) + \bar f_L(A) > 0\},\\
p_{AB*} &= \frac{p\bar f_H(A)}{p\bar f_H(A) + \bar p\bar f_L(A)}\,\mathbf{1}\{\bar f_H(A) + \bar f_L(A) > 0\},\\
p_{BA*} &= \frac{p\bar f_H(B)}{p\bar f_H(B) + \bar p\bar f_L(B)}\,\mathbf{1}\{\bar f_H(B) + \bar f_L(B) > 0\},\\
p_{BB*} &= \frac{p\bar\alpha^2\bar f_H(B)}{p\bar\alpha^2\bar f_H(B) + \bar p\alpha^2\bar f_L(B)}\,\mathbf{1}\{\bar f_H(B) + \bar f_L(B) > 0\},
\end{aligned}
$$

where $\mathbf{1}\{\cdot\}$ is the indicator function, whose value is 1 if the statement in brackets is true and 0 otherwise. Note that for any $s \in \{A, B\}$, $p_{sB*} \le p_{sA*}$ if $\bar f_H(s) + \bar f_L(s) > 0$ (i.e., if some Category 2 students continue testing after a first score of $s$). Furthermore, to identify the relationship between the value of $p_{s*}$ and the corresponding equilibrium admissions decision for a reported score sequence $s$, we show the following useful lemma.

Lemma 5.1.

If a score sequence $s \in \cup_{i=1}^k \{A, B\}^i$ yields admission on the equilibrium path (i.e., $s$ is obtained with positive probability in equilibrium), then $p_{s*} \ge \frac12$. Additionally, for $s \in \{A, B\}$, if $p_{s*} > \frac12$, then the score sequence $s$ yields admission.

Proof. If all score sequences starting with $s$ yield admission, then, by the law of total probability, $p_{s*} \ge \frac12$, as desired.

Otherwise, the College rejects some score sequence starting in $s$ after some number of tests. We write $T$ to denote the length of the longest continuation of $s$ that cannot result in rejection, i.e.,

$$T = |s| + \max\{m \in \mathbb{N} \mid u_{s\tilde s} = 1,\ \forall \tilde s \in \cup_{i=1}^m \{A, B\}^i\},$$

where $|s|$ denotes the length of the score sequence $s$. If the set $\{m \in \mathbb{N} \mid u_{s\tilde s} = 1,\ \forall \tilde s \in \cup_{i=1}^m \{A, B\}^i\}$ is empty, then $T = |s|$. After having taken $T$ tests, students with score sequences starting with $s$ would not take […] $p_{s*} \ge \frac12$.

The proof above gives us a necessary condition for $s$ to yield admission on the equilibrium path, namely $p_{s*} \ge \frac12$. Now we investigate the special case of single scores, and characterize a sufficient condition for $s \in \{A, B\}$ to be admitted. We show by contradiction that $s \in \{A, B\}$ yields admission whenever $p_{s*} > \frac12$. If, otherwise, a length-one score sequence $s$ yields rejection, then we have the following two cases: (i) if the College rejects all scores starting in $s$, then, by the law of total probability, $p_{s*} \le \frac12$, a contradiction; (ii) if the College admits students submitting scores starting in $s$ for some continuation, then all Category 2 students take the test more than once if their first score is $s$, and thus $p_s = p_{s*} > \frac12$, a contradiction again. Therefore, as long as $p_{s*} > \frac12$, $s \in \{A, B\}$ yields admission.

With this lemma in hand, we are now in a position to discuss the equilibrium outcomes when $k$ tests are available. By Lemma 5.1, a single score $A$ yields admission (i.e., $u_A = 1$) if $p > \bar\alpha$ and only if $p \ge \bar\alpha$ (i.e., $p_{A*} = \frac{p\alpha}{p\alpha + \bar p\bar\alpha} \ge \frac12$). A single score $B$ yields admission (i.e., $u_B = 1$) if $p > \alpha$ and only if $p \ge \alpha$ (i.e., $p_{B*} = \frac{p\bar\alpha}{p\bar\alpha + \bar p\alpha} \ge \frac12$). Therefore, for a nontrivial admission outcome to depend only on the first score, we need $p \in [\bar\alpha, \alpha]$.

Now we characterize the condition for a non-first-score equilibrium to exist. Suppose a single $B$ yields admission; then $p_{A*} > p_{B*} \ge \frac12$, and thus a single $A$ also yields admission and all students are admitted. Therefore, for an equilibrium admission rule to depend on more than one score, a single $B$ must yield rejection, i.e., $p \le \alpha$. Furthermore, for some subsequent score after $B$ to matter in the admission outcome, that is, to make some score sequence starting in $B$ yield admission, we need $\frac{p\bar\alpha\alpha^{k-1}}{p\bar\alpha\alpha^{k-1} + \bar p\alpha\bar\alpha^{k-1}} \ge \frac12$. To see why, we discuss the following two cases:

1. If either $BA$ or $BB$ gives admission, then Category 2 students whose first score is $B$ always take a second test, and thus, by Lemma 5.1, $\max\{p_{BA*}, p_{BB*}\} = p_{BA*} = p \ge \frac12$.
Then we have $\frac{p\bar\alpha\alpha^{k-1}}{p\bar\alpha\alpha^{k-1} + \bar p\alpha\bar\alpha^{k-1}} \ge p \ge \frac12$, as desired.

2. If, otherwise, both $BA$ and $BB$ give rejection, then denote by $M$ the maximum length of a continuation of tests such that every continuation of length $\le M$ results in rejection, i.e., $M = \max\{m \in \mathbb{N} \mid u_{Bs} = 0,\ \forall s \in \cup_{i=1}^m \{A, B\}^i\}$. [Footnote: The maximum exists since the set $\{m \in \mathbb{N} \mid u_{Bs} = 0,\ \forall s \in \cup_{i=1}^m \{A, B\}^i\}$ is nonempty (the College rejects both $BA$ and $BB$) and bounded (the number of scores is an integer with upper bound $k-1$).] Observe that $1 \le M \le k-2$, since some score sequence starting with $B$ yields admission. Students who take the test no more than $M+1$ times with a first score of $B$ are rejected. By definition, there exists $s \in \{A, B\}$ such that $B\tilde s s$ gives admission for some $\tilde s \in \cup_{i=1}^M \{A, B\}^i$. Thus $f_H(B\tilde s) = f_L(B\tilde s) = 0$, and so $p_{B\tilde s s*} \ge \frac12$ by Lemma 5.1. Let $G$ be the number of $A$ scores in $B\tilde s s$ (note that $0 \le G \le M+1$). We have

$$\frac{p\bar\alpha\alpha^{k-1}}{p\bar\alpha\alpha^{k-1} + \bar p\alpha\bar\alpha^{k-1}} \ge \frac{p\bar\alpha\alpha^{M+1}}{p\bar\alpha\alpha^{M+1} + \bar p\alpha\bar\alpha^{M+1}} \ge p_{B\tilde s s*} = \frac{p\bar\alpha^{M+2-G}\alpha^G}{p\bar\alpha^{M+2-G}\alpha^G + \bar p\alpha^{M+2-G}\bar\alpha^G} \ge \frac12.$$

Therefore, $\frac{p\bar\alpha\alpha^{k-1}}{p\bar\alpha\alpha^{k-1} + \bar p\alpha\bar\alpha^{k-1}} \ge \frac12$ (i.e., $p \ge p^*_k \equiv \frac{\bar\alpha^{k-2}}{\alpha^{k-2} + \bar\alpha^{k-2}}$) is a necessary condition for some subsequent score after $B$ to affect the admission outcome.

[Footnote: The condition $p_{s*} \ge \frac12$ alone may not suffice to tell us whether or not $s$ yields admission in equilibrium. For example, if $p_{BB*} \ge \frac12$ (i.e., $p \ge \frac{\alpha^2}{\alpha^2 + \bar\alpha^2}$), there may exist an equilibrium in which only $A$ and $BBA$ yield admission. Hence, $p_{BB*} \ge \frac12$ alone is not sufficient to conclude that $BB$ is admitted in equilibrium.]

To see why $p \in [p^*_k, \alpha]$ is also sufficient for a non-first-score equilibrium to arise under “Report All”, define a partition $\{p^*_n\}_{n=1}^k$ of the range $[p^*_k, \alpha]$, in which $p^*_n = \frac{\bar\alpha^{n-2}}{\alpha^{n-2} + \bar\alpha^{n-2}}$ for all $n = 1, \ldots, k$. [Footnote: Note that $p^*_1 = \alpha$, $p^*_2 = \frac12$, $\ldots$, $p^*_k \equiv \frac{\bar\alpha^{k-2}}{\alpha^{k-2} + \bar\alpha^{k-2}}$, and $\cup_{n=1}^{k-1} [p^*_{n+1}, p^*_n] = [p^*_k, \alpha]$.] If $p \in [p^*_n, p^*_{n-1}]$, there is a non-first-score equilibrium in which the College accepts students with first score $A$, and rejects all students with first score $B$ unless their sequence of scores is $B\underbrace{A\ldots A}_{n-1}$. In this equilibrium, Category 2 […] $B$, and both score sequences starting with $A$ and those realizing $B\underbrace{A\ldots A}_{n-1}$ yield admission.

Similarly, we can derive the necessary and sufficient condition for some subsequent score after $A$ to matter in the admission outcome, namely $p \in [\frac{\bar\alpha^k}{\alpha^k + \bar\alpha^k}, \bar\alpha]$. In this case, students with a single score are rejected.

Finally, we can study possible equilibrium outcomes if $p \in (\bar\alpha, \alpha)$. By Lemma 5.1, if $p > \bar\alpha$ (i.e., $p_{A*} > \frac12$), a single $A$ yields admission; and if $p < \alpha$ (i.e., $p_{B*} < \frac12$), a single $B$ yields rejection. What remains to be proved is that any score sequence consisting only of $B$’s leads to rejection. Denote $M_B = \max\{m \in \mathbb{N} \mid u_s = 0,\ \forall s \in \cup_{i=1}^m \{B\}^i\}$, and write $B^m$ for $m$ consecutive $B$ scores. Note that $1 \le M_B \le k$ by definition. To show $M_B = k$, suppose by contradiction that $M_B \le k-1$. Then students with score sequence $B^{M_B}$ are rejected and students with score sequence $B^{M_B+1}$ are admitted. Therefore, Category 2 students who have thus far only obtained $B$ scores won’t stop until they take $M_B + 1$ tests. Therefore we have

$$p_{B^{M_B+1}} = \frac{\bar\alpha^{M_B+1}p}{\bar\alpha^{M_B+1}p + \alpha^{M_B+1}\bar p} < \frac{\bar\alpha^{M_B}p}{\bar\alpha^{M_B}p + \alpha^{M_B}\bar p} = p_{B^{M_B}} \le \frac12,$$

a contradiction.

A direct consequence of Theorem 7 is that when $p \in (\bar\alpha, p^*_k)$, the admission outcome is unique and depends only on the first score. Note that $p^*_k > \bar\alpha$ only when $k = 2$. This gives us the following corollary, which is a generalization of Theorem 2.

Corollary 5.2. If $k = 2$ and $p \in (1-\alpha, p^*_2) = (1-\alpha, \frac12)$, the unique equilibrium under “Report All” is the first-score equilibrium.
If $k \ge 3$, a first-score equilibrium always coexists with an equilibrium in which the admission outcome can be affected by more than the first score.

Once again, it is straightforward to compare the false positive and false negative rates of the separating equilibria under “Report Max” with those of the first-score equilibrium under “Report All”, whenever it exists. The relevant parameter range is $p \in [\hat p_k, \hat p'_k] \cap [\bar\alpha, \alpha] = [\hat p_k, \alpha]$, which is nontrivial for all $k \ge 2$ since $\hat p_k < \frac12 < \alpha$. We summarize the results in the table below. In the corresponding parameter range $p \in [\hat p_k, \alpha]$, “Report All” achieves parity across categories in false positive and false negative rates, whereas “Report Max” necessarily produces a discrepancy in both false positive and false negative rates in favor of the advantaged (Category 2) students, which increases as the number of tests $k$ available to the advantaged population increases.

| Policy | FN, (1, H) | FN, (2, H) | FP, (1, L) | FP, (2, L) |
|---|---|---|---|---|
| Report Max | $1-\alpha$ | $(1-\alpha)^k$ | $1-\alpha$ | $1-\alpha^k$ |
| Report All | $1-\alpha$ | $1-\alpha$ | $1-\alpha$ | $1-\alpha$ |

Table 6: False Negative (left) and False Positive (right) Rates with k Tests Available [Using First-Score Equilibrium]
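The entries of Table 6 and the existence range from Theorem 6 are closed forms, so they are easy to tabulate. The sketch below is illustrative only: the function names are hypothetical, $\phi$ is the share of Category 1 students as in Table 5, and it assumes the separating “Report Max” equilibrium and the first-score “Report All” equilibrium described above. It prints, for several $k$, the lower end $\hat p_k$ of the comparison range together with the false negative and false positive gaps between categories under “Report Max” (both gaps are zero under “Report All”).

```python
from fractions import Fraction


def report_max_thresholds(alpha, phi, k):
    """(p_hat_k, p_hat_prime_k) from Theorem 6.

    phi is the share of Category 1 (single-test) students; "Report Max" has a
    separating equilibrium iff p lies in [p_hat_k, p_hat_prime_k].
    """
    abar, phibar = 1 - alpha, 1 - phi
    p_hat = (phi * abar + phibar * (1 - alpha ** k)) / (
        phi + phibar * (2 - alpha ** k - abar ** k))
    p_hat_prime = (phi * alpha + phibar * alpha ** k) / (
        phi + phibar * (alpha ** k + abar ** k))
    return p_hat, p_hat_prime


def error_rates(alpha, k):
    """Table 6: (Category 1, Category 2) false negative and false positive rates.

    "max": separating Report Max equilibrium (Category 2 retests until an A).
    "all": first-score Report All equilibrium (admit iff first score is A).
    """
    abar = 1 - alpha
    return {
        "max": {"FN": (abar, abar ** k), "FP": (abar, 1 - alpha ** k)},
        "all": {"FN": (abar, abar), "FP": (abar, abar)},
    }


alpha, phi = Fraction(3, 4), Fraction(1, 2)
for k in range(2, 6):
    rates = error_rates(alpha, k)["max"]
    fn_gap = rates["FN"][0] - rates["FN"][1]  # Category 1 minus Category 2
    fp_gap = rates["FP"][1] - rates["FP"][0]  # Category 2 minus Category 1
    print(k, report_max_thresholds(alpha, phi, k)[0], fn_gap, fp_gap)
```

For $\alpha = 3/4$ and $\phi = 1/2$, `report_max_thresholds(alpha, phi, 2)` returns $(11/38, 21/26)$, so the comparison range $[\hat p_2, \alpha]$ is roughly $[0.29, 0.75]$; the two gaps equal $\bar\alpha - \bar\alpha^k$ and $\alpha - \alpha^k$, which grow with $k$.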

What about equilibria other than the first-score equilibrium? Here, students from Category 2 again have an advantage, but we can quantify the degree of that advantage by measuring the disparities in false positive and false negative rates between Category 1 and Category 2 students. What we can show is that although some inequity may remain under “Report All”, it is reduced (always weakly, and strictly if either $k > 3$ or $p < \frac12$) compared to “Report Max”. [Footnote: $\cup_{i=1}^m \{B\}^i = \{B, BB, BBB, \ldots, \underbrace{B\ldots B}_{m}\}$.]

We proceed by comparing the unique nontrivial “Report Max” equilibrium to an equilibrium under “Report All”. First, note that under both “Report All” and “Report Max”, a single $A$ yields admission while a single $B$ yields rejection when $p \in [\hat p_k, \alpha]$. Therefore, the false positive and false negative rates of Category 1 students are the same under both admissions policies. Next, we note that in any equilibrium under “Report All”, a score sequence consisting entirely of $B$’s results in rejection, and hence the admission probability of Category 2 students (regardless of their type) can only be smaller under “Report All” than under “Report Max”, since any other sequence of scores would lead to admission under “Report Max”, but possibly not under “Report All”. Finally, we observe that if either $k > 3$ or $p < \frac12$, there is some sequence of scores that leads to rejection under “Report All” but not under “Report Max”, showing that the probability of admission for Category 2 students is strictly lower under “Report All” for both types, which implies the corresponding reduction in false positive and false negative disparities. To see this, suppose otherwise: then the College must accept students with the length-$k$ score sequence $\hat s = \underbrace{B\ldots B}_{k-1}A$ while rejecting all score sequences $s \in \cup_{i=1}^k \{B\}^i$. Under this admissions policy, Category 2 students who have only received $B$’s thus far would continue to take the test (up to $k$ times), and thus, to rationalize the admissions rule, we must have

$$\frac{p\bar\alpha^{k-1}\alpha}{p\bar\alpha^{k-1}\alpha + \bar p\alpha^{k-1}\bar\alpha} \ge \frac12, \quad\text{i.e.,}\quad p \ge \frac{\alpha^{k-1}\bar\alpha}{\alpha^{k-1}\bar\alpha + \bar\alpha^{k-1}\alpha} \ge \frac12.$$

Note that a nontrivial equilibrium under “Report All” exists only if $p \le \alpha$, and $\frac{\alpha^{k-1}\bar\alpha}{\alpha^{k-1}\bar\alpha + \bar\alpha^{k-1}\alpha} \le \alpha$ only when $k \in \{2, 3\}$. Hence, if $p < \frac12$ or $k > 3$, there exists at least one score sequence ($\hat s$) which is rejected under “Report All” but admitted under “Report Max”.

As in the $k = 2$ case, we find that in the first-score equilibrium under “Report All”, the positive predictive value is strictly higher than in the separating equilibrium under “Report Max”, i.e., the admitted class has a higher proportion of High types.

Theorem 8.

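For any $\alpha \in (\frac12, 1)$, the positive predictive value in the first-score equilibrium under the “Report All” policy exceeds that in the “Report Max” separating equilibrium.

Proof. The positive predictive value of “Report Max” is

$$\frac{\alpha\phi p + (1-\bar\alpha^k)\bar\phi p}{\alpha\phi p + \bar\alpha\phi\bar p + (1-\bar\alpha^k)\bar\phi p + (1-\alpha^k)\bar\phi\bar p}.$$

The positive predictive value of the first-score equilibrium under “Report All” is $\frac{\alpha p}{\alpha p + \bar\alpha\bar p}$. It is straightforward to verify that the first term is strictly smaller than the second if $\alpha \in (0.5, 1)$: cross-multiplying, the inequality reduces to $\bar\alpha(1-\bar\alpha^k) < \alpha(1-\alpha^k)$, which holds for $\alpha > \frac12$ and $k \ge 2$.

Theorem 9.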

For any $\alpha \in (\frac12, 1)$, the negative predictive value in the first-score equilibrium under the “Report All” policy is strictly smaller than that in the “Report Max” separating equilibrium.

Proof. The negative predictive value of “Report Max” is

$$\frac{\alpha\phi\bar p + \alpha^k\bar\phi\bar p}{\bar\alpha\phi p + \alpha\phi\bar p + \bar\alpha^k\bar\phi p + \alpha^k\bar\phi\bar p}.$$

The negative predictive value of the first-score equilibrium under “Report All” is $\frac{\bar p\alpha}{p\bar\alpha + \bar p\alpha}$. It is straightforward to verify that the first term is larger than the second for $\alpha \in (\frac12, 1)$.

[…] Theorem 10 below, which establishes the result for all equilibria under “Report All”.

Lemma 5.3.

Let $p^{**}_k = \frac{\alpha - \alpha^k}{1 - \alpha^k - \bar\alpha^k} \in [\frac12, \alpha)$. For any $\alpha \in (\frac12, 1)$ and $p < p^{**}_k$, the College’s expected payoff in the first-score equilibrium under the “Report All” policy exceeds its expected payoff in the “Report Max” separating equilibrium. The expected payoff gap increases with the number of tests $k$ available to Category 2 students.

Proof. The expected payoff per student in the first-score equilibrium under “Report All” is $\alpha p - \bar\alpha\bar p$. The expected payoff per student under “Report Max” is $\phi(\alpha p - \bar\alpha\bar p) + \bar\phi[(1-\bar\alpha^k)p - (1-\alpha^k)\bar p]$. Hence, the difference between the College’s expected payoffs under these two schemes is

$$
\begin{aligned}
&\alpha p - \bar\alpha\bar p - \{\phi(\alpha p - \bar\alpha\bar p) + \bar\phi[(1-\bar\alpha^k)p - (1-\alpha^k)\bar p]\}\\
&= \bar\phi[\alpha p - \bar\alpha\bar p - (1-\bar\alpha^k)p + (1-\alpha^k)\bar p]\\
&= \bar\phi\alpha\bar\alpha\sum_{i=0}^{k-2}(\alpha^i\bar p - \bar\alpha^i p)\\
&= \bar\phi[(\alpha - \alpha^k)\bar p - (\bar\alpha - \bar\alpha^k)p] > 0
\end{aligned}
$$

if $p < \frac{\alpha - \alpha^k}{1 - \alpha^k - \bar\alpha^k} = p^{**}_k$ and $\alpha > \frac12$.

We can strengthen Lemma 5.3 by using it to prove that, for every nontrivial equilibrium under “Report All”, the College has a higher expected payoff than under “Report Max”. Hence, the College has an incentive to prefer this policy even without the ability to perform equilibrium selection.
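The positive predictive value, negative predictive value and payoff comparisons (Theorems 8 and 9 and Lemma 5.3) all reduce to admitted and rejected masses by type. The helper below (`compare_policies` and `p_double_star` are hypothetical names) recomputes them with exact fractions. It assumes a payoff of $+1$ for admitting a High type and $-1$ for admitting a Low type, which is consistent with the payoff expressions above, and it is only meaningful when $p$ lies in the range where the “Report Max” separating equilibrium exists.

```python
from fractions import Fraction


def compare_policies(p, alpha, phi, k):
    """(PPV, NPV, payoff per student) for each policy.

    Assumes payoff +1 for admitting a High type and -1 for admitting a Low
    type, the first-score "Report All" equilibrium, and the separating
    "Report Max" equilibrium (so p should lie in [p_hat_k, p_hat_prime_k]).
    """
    abar, pbar, phibar = 1 - alpha, 1 - p, 1 - phi
    # Report All (first score only): admit iff the first score is A.
    all_ppv = alpha * p / (alpha * p + abar * pbar)
    all_npv = alpha * pbar / (abar * p + alpha * pbar)
    all_pay = alpha * p - abar * pbar
    # Report Max: Category 2 students retest until they obtain an A (up to k times).
    adm_h = phi * alpha * p + phibar * (1 - abar ** k) * p
    adm_l = phi * abar * pbar + phibar * (1 - alpha ** k) * pbar
    rej_h, rej_l = p - adm_h, pbar - adm_l
    max_ppv = adm_h / (adm_h + adm_l)
    max_npv = rej_l / (rej_h + rej_l)
    max_pay = adm_h - adm_l
    return {"all": (all_ppv, all_npv, all_pay), "max": (max_ppv, max_npv, max_pay)}


def p_double_star(alpha, k):
    """p**_k = (alpha - alpha^k) / (1 - alpha^k - abar^k) from Lemma 5.3."""
    return (alpha - alpha ** k) / (1 - alpha ** k - (1 - alpha) ** k)


alpha, phi, p = Fraction(3, 4), Fraction(1, 2), Fraction(2, 5)
for k in range(2, 6):
    res = compare_policies(p, alpha, phi, k)
    # k, the threshold p**_k, and the payoff gap (Report All minus Report Max)
    print(k, p_double_star(alpha, k), res["all"][2] - res["max"][2])
```

At $\alpha = 3/4$, $\phi = 1/2$, $p = 2/5$ and $k = 2$, the payoff gap is exactly $3/160 = \bar\phi\alpha\bar\alpha(1-2p)$, as in Theorem 5, and the printed gap grows with $k$, in line with the monotonicity claim of Lemma 5.3.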

Theorem 10.

For any $\alpha \in (\frac12, 1)$ and $p < p^{**}_k$, the College obtains strictly higher utility under the “Report All” policy than under the “Report Max” policy in every nontrivial equilibrium. Additionally, among the “Report All” equilibria, the College has a higher expected payoff in equilibria in which the admissions outcome depends on more than the first reported score.

Proof. The statement holds vacuously for $p < \hat p_k \in (0, p^{**}_k)$, since there only exists the trivial equilibrium of rejecting all students under “Report Max”. Recall that when $p \in [\hat p_k, p^{**}_k) \subset (1-\alpha, \alpha)$, a student whose first (or only) score is $A$ is admitted under both schemes. Therefore, the only difference between equilibria under “Report All” and “Report Max” lies in how they treat students whose first (or only) score is $B$. By Lemma 5.3, we already know that for the first-score equilibrium under “Report All”, the College strictly prefers “Report All” to “Report Max”.

Therefore, it remains to study the case in which the equilibrium admissions outcome depends on more than the first score. Fix any such equilibrium and define $S$ to be the set of score sequences of length at least 2 that yield admission [Footnote: The set $S$ is nonempty; otherwise we are back to the case in which only the first score matters for the admissions outcome.]:

$$S \equiv \{s \in \cup_{i=2}^k \{A, B\}^i \mid u_s = 1\}.$$

For any $s \in S$, denote the fraction of students who obtain score $s$ in the equilibrium by $q_s \in [0, \bar\phi]$. Then the expected payoff for the College under this equilibrium of the “Report All” policy is

$$\underbrace{\alpha p - \bar\alpha\bar p}_{A\ \text{yields admission}} + \underbrace{\sum_{s \in S} q_s[p_s + (-1)(1-p_s)]}_{\text{longer score sequences yield admission}}.$$

For the College to admit $s \in S$, we need $p_s \ge \frac12$, and thus the last term satisfies $\sum_{s \in S} q_s[p_s + (-1)(1-p_s)] = \sum_{s \in S} q_s(2p_s - 1) \ge 0$. Note that the first two terms are the expected payoff for the College in the first-score equilibrium. Therefore, the College prefers a non-first-score equilibrium to the first-score equilibrium under “Report All”, which in turn is strictly better than the separating equilibrium under “Report Max” by Lemma 5.3.

Note that $p^{**}_k = \frac{\alpha - \alpha^k}{1 - \alpha^k - \bar\alpha^k} \in [\frac12, \alpha)$ strictly increases in $k$ and $\lim_{k \to \infty} p^{**}_k = \alpha$, so the range of interest expands as the number of available tests increases, and in the limit, the results in Theorem 10 hold in every nontrivial equilibrium comparison for $p \in (0, \alpha)$. […] $p < \frac12 \le p^{**}_k$. In this case, the findings in Theorem 10 are valid.

Discussion

Allowing students to retake standardized tests and report only the best scores obtained (a currently common practice known as “super-scoring”) clearly gives an advantage to well-resourced students who have the ability to take the test multiple times. A natural fix would seem to be to require that all students take the exam only once, thereby enforcing equity; but for various reasons, including that tests are administered by third-party entities with their own interests and that different colleges have different admissions policies, this seems unworkable.

A priori, the effects of a traditional alternative, requiring students to report all of their scores, are less transparent. This is because it still seems to give well-resourced students an advantage as a population: it provides the option for the more talented students to report a more accurate signal (by taking the exam several times), while allowing the less talented students to pool with the lower-resourced students by taking the exam only once, thereby providing a less accurate signal and an increased chance of admission.

Nevertheless, we show that in equilibrium, the traditional policy of requiring that all scores be reported has the same effect as enforcing that students take the exam only once. Moreover, this policy is preferable to super-scoring both from the perspective of equity (in the “Report All” equilibrium, the chance that a student is admitted is independent of their population, conditional on their type) and from the perspective of the college. This represents an unusual but important situation in which the goals of accuracy and equity are in alignment.

Acknowledgements

We thank Chris Jung, Changhwa Lee, and Mallesh Pai for helpful conversations at an early stage of this work. We gratefully acknowledge support from NSF grants CCF-1763307 and CCF-1763349 and the Simons Collaboration on the Theory of Algorithmic Fairness.

References

[BCC86] R. F. Boldt, J. A. Centra, and R. G. Courtney. The validity of various methods of treating multiple SAT scores. ETS Research Report Series, 1986(1):i–8, 1986.
[BG20] Mark Braverman and Sumegha Garg. The role of randomness and noise in strategic classification. arXiv preprint arXiv:2005.08377, 2020.
[Cho17] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.
[CL93] Stephen Coate and Glenn C. Loury. Will affirmative-action policies eliminate negative stereotypes? The American Economic Review, pages 1220–1240, 1993.
[FV92] Dean P. Foster and Rakesh V. Vohra. An economic argument for affirmative action. Rationality and Society, 4(2):176–188, 1992.
[GWF08] Eric Grodsky, John Robert Warren, and Erika Felts. Testing and social stratification in American education. Annual Review of Sociology, 34:385–404, 2008.
[HC18] Lily Hu and Yiling Chen. A short-term intervention for long-term fairness in the labor market. In Proceedings of the 2018 World Wide Web Conference, pages 1389–1398, 2018.
[HIV19] Lily Hu, Nicole Immorlica, and Jennifer Wortman Vaughan. The disparate effects of strategic manipulation. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 259–268, 2019.
[HM19] Ben Hutchinson and Margaret Mitchell. 50 years of test (un)fairness: Lessons for machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 49–58, 2019.
[HPS16] Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. arXiv preprint arXiv:1610.02413, 2016.
[ILZ19] Nicole Immorlica, Katrina Ligett, and Juba Ziani. Access to population-level signaling as a source of inequality. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 249–258, 2019.
[JKL+20] Christopher Jung, Sampath Kannan, Changhwa Lee, Mallesh Pai, Aaron Roth, and Rakesh Vohra. Fair prediction with endogenous behavior. In Proceedings of the 21st ACM Conference on Economics and Computation, pages 677–678, 2020.
[KMR16] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016.
[KRZ19] Sampath Kannan, Aaron Roth, and Juba Ziani. Downstream effects of affirmative action. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 240–248, 2019.
[LWH+20] Lydia T. Liu, Ashia Wilson, Nika Haghtalab, Adam Tauman Kalai, Christian Borgs, and Jennifer Chayes. The disparate equilibria of algorithmic decision making when individuals invest rationally. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 381–391, 2020.
[MMDH19] Smitha Milli, John Miller, Anca D. Dragan, and Moritz Hardt. The social cost of strategic classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 230–239, 2019.
[Ree13] William J. Reese. Testing Wars in the Public Schools: A Forgotten History. Harvard University Press, 2013.
[Tho71] Robert L. Thorndike. Concepts of culture-fairness.