Multi-Stage Decentralized Matching Markets: Uncertain Preferences and Strategic Behaviors
Xiaowu Dai and Michael I. Jordan
University of California at Berkeley
Abstract
Matching markets are often organized in a multi-stage and decentralized manner. Moreover, participants in real-world matching markets often have uncertain preferences. This article develops a framework for learning optimal strategies in such settings, based on a nonparametric statistical approach and variational analysis. We propose an efficient algorithm, built upon concepts of “lower uncertainty bound” and “calibrated decentralized matching,” for maximizing the participants’ expected payoff. We show that there exists a welfare-versus-fairness trade-off that is characterized by the uncertainty level of acceptance. Participants will strategically act in favor of a low uncertainty level to reduce competition and increase expected payoff. We study signaling mechanisms that help to clear the congestion in such decentralized markets and find that the effects of signaling are heterogeneous, showing a dependence on the participants and matching stages. We prove that participants can be better off with multi-stage matching compared to single-stage matching. The deferred acceptance procedure assumes no limit on the number of stages and attains efficiency and fairness but may make some participants worse off than multi-stage matching. We demonstrate aspects of the theoretical predictions through simulations and an experiment using real data from college admissions.
Key Words: Multi-stage matching, Decentralized markets, Uncertain preference, Fairness, College admissions, Reproducing kernel Hilbert spaces
Two-sided matching markets have played an important role in microeconomics for several decades (Roth and Sotomayor, 1990). Matching markets are used to allocate indivisible “goods” to multiple decision-making agents based on mutual compatibility as assessed via sets of preferences. Such a market does not clear through prices. For example, a student applicant cannot simply demand the college she prefers but must also be chosen by the college. Matching markets are often organized in a decentralized way. Each agent makes their decision independently of others’ decisions, and each agent can have multiple stages of interactions with the other side of the market. College admissions with waiting lists and academic job markets are notable examples. We refer to such markets as multi-stage decentralized matching markets.

Uncertain preference is ubiquitous in multi-stage decentralized matching markets. For instance, colleges competing for students lack information on students’ preferences. An admitted student may receive offers from other colleges. She needs to accept one or reject all offers within a short period during each stage of early, regular, and waiting-list admissions (Avery et al., 2003). This admission process provides little opportunity for colleges to learn students’ preferences, which are uncertain due to competition among colleges and variability in the relative popularity of colleges over time. Such uncertain preferences pose a challenge for colleges in their attempt to formulate an optimal admission strategy. Consequently, colleges may end up enrolling too many or too few students relative to their capacity, or enrolling students whose quality is far from the attainable optimum.

This paper addresses the following two research questions:
• Given the uncertain preferences on one side of the market (e.g., students), how can agents (e.g., colleges) learn an optimal strategy that maximizes expected payoffs based on historical data?
• What are the fundamental implications of multi-stage decentralized matching on the welfare and fairness for both sides of the market?

We study these two questions using nonparametric statistical methodology and variational analysis. We propose a new algorithm for maximizing agents’ expected payoffs that is based on learning stage-wise optimal strategies and calibrating state parameters based on historical data. In particular, the calibration balances the opportunity cost against the penalty for exceeding the quota. Based on the calibrated state, the algorithm efficiently learns an optimal strategy using statistical machine learning methods. The statistical model not only provides a foundation for the algorithm but also provides an analytical framework for understanding the implications of the approach for welfare and fairness. We show that agents will favor arms with realistic and stable opportunities for matching instead of only targeting the top-ranked arms. Moreover, we show that agents are better off with multi-stage decentralized matching as compared to single-stage decentralized matching.

Adopting terminology from the bandit literature, our model has a set of agents, each with limited capacity, and a set of arms. Each agent values two attributes of an arm: a “score” that is common to all agents and a “fit” that is agent-specific and independent across agents. Agents rank arms according to their scores and fits. An agent’s strategy consists of how many and which arms to pull at each stage. On the other hand, there is no restriction on the preferences of arms. The model allows uncertainty in the preferences, which is incorporated into the arms’ stage-wise acceptance probabilities. The acceptance probability depends on the unknown state of the world and the competition of agents at each stage. We consider a simple timeline for multi-stage markets. At each stage, agents simultaneously pull sets of arms. Each arm accepts at most one of the agents that pulled it.
The arms have to make irreversible decisions at each stage without knowing which other agents might select them in later stages.

Decentralized matching markets can serve as a recommendation platform through which participants decide who they would like to connect to on the other side of the market. This matching system involves scarcity as each arm is allowed to accept at most one agent. Our results add three features to such a two-sided platform. First, our algorithm predicts match compatibility using historical data. Each agent can predict the probability of successfully pulling a specific arm. Prediction is also possible in the other direction: arms can learn how much an agent may like them. This feature distinguishes the two-sided platform from a one-sided recommendation engine that only considers which arms an agent may like, but not which arms may also like the agent in return. Second, we show that strategic behaviors arise in multi-stage decentralized matching markets, where agents favor arms that have a realistic potential for being in a stable matching. In turn, arms can strategically enhance their chance of being pulled by preferred agents. For example, if arms have resources to invest in improvement, we show that they have incentives to raise the agent-specific fits instead of improving their scores. Third, we find that the market efficiency is improved if participants reduce the uncertainty of preferences. Due to agents’ competition, the top arms will be pulled by a deluge of agents while the next-best arms receive many fewer offers; most of the arms and agents turn out to yield unfruitful connections. We show that if the market allows arms to signal their interests or suggests that agents restrict the capacity of their waiting lists, the market achieves a better clearance of congestion.
There are two main contributions in this paper, which correspond to the two questions above. Our first contribution is to propose a new algorithm that maximizes the agent’s expected payoff in multi-stage decentralized matching markets. The algorithm sequentially learns the optimal strategy at each stage and is built upon notions of lower uncertainty bound (LUB) and calibrated decentralized matching (CDM). The key idea is to calibrate the state parameter in a data-driven approach and take the opportunity cost and penalty for exceeding the quota into account. The calibration can be performed under both average-case and worst-case metrics, depending on whether we are maximizing the averaged or minimal expected payoff with respect to the uncertain state. Given the calibrated state, the algorithm efficiently learns the optimal strategy using historical data via statistical machine learning methods.

The second contribution is providing an analytical framework for understanding the welfare and fairness implications. We show that agents favor arms with low uncertainty in levels of acceptance, suggesting that agents prefer arms with a realistic and stable chance for matching instead of only targeting the top-ranked arms. Such strategic behavior improves the agent’s expected payoff since otherwise, by the time that arms have rejected that agent, the next-best arms that the agent has in mind may already have accepted other agents. However, the strategic behavior leads to unfair outcomes for arms because some arms are not pulled by their favorite agents even though these agents pull arms ranked below them. Our analysis shows that if arms signal their interests or agents restrict the waiting list’s capacity, the market will have reduced uncertainty. Consequently, the market congestion is better cleared as more arms are matched to agents. We also find that the effects of signaling are heterogeneous depending on the participants and matching stages.
We prove that agents are better off in multi-stage decentralized matching markets compared to single-stage decentralized matching markets. Moreover, although the well-known deferred acceptance (DA) algorithm (Gale and Shapley, 1962) assumes no limit on the number of stages and attains efficiency and fairness, we show that DA may still make some participants worse off compared to multi-stage decentralized matching.
This paper is related to several strands of literature. The first line of work is on the decentralized interactions in matching markets (Roth and Xing, 1997; Das and Kamenica, 2005; Niederle and Yariv, 2009; Diamantoudi et al., 2015; Dai and Jordan, 2020) and the search literature (Montgomery, 1991; Peters, 1991). Our paper contributes to this line of work via its analysis of multi-stage markets that allow uncertain preferences on one side of the market. Moreover, we develop a statistical model to learn the optimal strategy using historical data and we analyze the implications for strategic behavior in multi-stage decentralized markets.

A second related body of literature is preference signaling in matching markets (Spence, 1973; Crawford and Sobel, 1982; Lee and Schwarz, 2009; Hoppe et al., 2009; Coles et al., 2013; Abdulkadiroğlu et al., 2015). Our analysis differs from the standard model in the existing literature in that we consider uncertain preferences in decentralized markets. We find that if arms signal their interest or agents restrict the waiting list’s capacity, the market will have reduced uncertainty and better clearance of congestion. Moreover, we show that signaling has heterogeneous effects on the payoffs depending on the agents and matching stages.

The third related body of literature is on algorithmic studies of college admissions. The celebrated work of Gale and Shapley (1962) introduced the deferred acceptance algorithm implemented under central clearinghouses. Recently, Epple et al. (2006) and Fu (2014) modeled equilibrium admissions and studied application costs, financial aid, and determining tuition. By contrast, we emphasize students’ multidimensional abilities and colleges’ uncertainties over students’ acceptance probability. Chade and Smith (2006) and Hafalir et al. (2018) provided an algorithm focused on students’ efforts, as compared to the focus on the strategy of the colleges in our analysis.
Avery and Levin (2010) studied early admissions as a way for students to signal college-specific quality. Chade et al. (2014) developed a decentralized Bayesian model under a particular preference structure. Azevedo and Leshno (2016) characterized a college admission equilibrium in terms of supply and demand in a centralized market. Che and Koh (2016) considered enrollment uncertainty with two colleges and a continuum of students. By contrast, our model considers incomplete information with multiple colleges and a finite number of arms, compared to a continuum of agents or arms common in this literature. Moreover, we develop a nonparametric statistical model to learn optimal strategies using historical data.

The rest of the paper is organized as follows. We introduce the background of multi-stage decentralized matching markets in Section 2, and propose a new algorithm for the learning of the optimal strategy in Section 3. We study the fairness implications and signaling mechanisms of multi-stage decentralized matching markets in Section 4 and compare with other matching markets in Section 5. We investigate the numerical performance of our algorithm in Section 6, and illustrate with a real data example in Section 7. We provide additional numerical examples and all proofs in the Supplementary Appendix.
Denote the set of $m$ agents by $\mathcal{P} = \{P_1, P_2, \ldots, P_m\}$ and the set of $n$ arms by $\mathcal{A} = \{A_1, A_2, \ldots, A_n\}$. Then $\mathcal{P}$ and $\mathcal{A}$ are the sets of participants on the two sides of the matching market. Each agent $P_i$ has a quota $q_i \ge 1$. We assume that $q_1 + q_2 + \cdots + q_m \le n$. There are a total of $K \ge 1$ stages, and we write $[m] \equiv \{1, \ldots, m\}$, $[n] \equiv \{1, \ldots, n\}$, and $[K] \equiv \{1, \ldots, K\}$.

Matching markets are typically organized in one of two forms: centralized and decentralized. Centralized matching markets use clearinghouses to coordinate and implement stable matching (Roth, 1984; Abdulkadiroğlu and Sönmez, 2003). By contrast, decentralized matching markets require participants to make their decisions independently of others’ decisions (Roth and Xing, 1997; Roth, 2008). From an agent’s perspective, the goal is to find a strategy for maximizing the expected payoff. Here a strategy consists of deciding how many and which arms to pull at each stage. An agent’s decision-making in decentralized markets faces incomplete information about other agents’ decisions and arms’ preferences. In this paper, we study decentralized matching markets with multiple stages of the matching process.

Notable examples of such markets include college admissions in the United States, Korea, and Japan, where $\mathcal{P}$ and $\mathcal{A}$ represent the sets of colleges and students, respectively. Colleges can use mechanisms such as early admission and waiting lists. Colleges process the early admission applications before the regular admission with the requirement of binding or non-binding terms for students to accept offers early (Avery et al., 2003; Avery and Levin, 2010). The waiting-list mechanism allows colleges to wait-list some students during regular admission. Later, colleges admit students from the waiting list when some regular admission offers are rejected. The waiting list procedure may repeat over multiple rounds (NACAC, 2019). Each student accepts at most one offer or rejects all offers at each stage of early, regular, and waiting list admissions. Once a student accepts an offer, she exits the market. The college’s goal is an entering class whose size is reasonably close to the quota and whose quality is close to the attainable optimum.
The strategy of the colleges consists in deciding how many and which applicants to admit at each admission stage. The uncertainty of outcomes arises from the competition between colleges and the unknown preferences of the students. Specifically, each college has little information on other colleges’ strategies and how student applicants rank colleges.

The agents’ preferences are based on the arms’ latent utilities. We consider the following latent utility model:

$$U_i(A_j) = v_j + e_{ij}, \quad \forall\, i \in [m],\ j \in [n]. \tag{1}$$

Here, $v_j \in [0, 1]$ is arm $A_j$’s systematic score considered by all agents, and $e_{ij} \in [0, 1]$ is an agent-specific idiosyncratic fit considered only by agent $P_i$, $i \in [m]$. A utility model with a similar separable structure has been widely used in the matching market literature (Choo and Siow, 2006; Menzel, 2015; Chiappori and Salanié, 2016; Ashlagi et al., 2020).

The arms’ preferences have no restrictions and can involve uncertainty. From an agent’s perspective, arms accept offers with probabilities dependent on opponents’ strategies and arms’ preferences. Let the parameter $s_{i,k} \in [0, 1]$ be the state of the world (Savage, 1972) for agent $P_i$, such that the probability that an arm $A_j$ accepts $P_i$ at stage $k$ is

$$\pi_{i,k}(s_{i,k}, v_j), \quad \forall\, i \in [m],\ j \in [n],\ k \in [K].$$

Since agents compete for arms with a higher score, the acceptance probability $\pi_{i,k}(s_{i,k}, v_j)$ models the agents’ competition through the dependence on the score $v_j$. Moreover, $\pi_{i,k}(s_{i,k}, v_j)$ incorporates the arm’s uncertain preference into the state $s_{i,k}$. It is known that there exists a valid probability mass function $\pi_{i,k}(s_{i,k}, v_j)$ (Dai and Jordan, 2020). We assume that $\pi_{i,k}(s_{i,k}, v_j)$ is strictly increasing and continuous in $s_{i,k}$. Thus, a larger value of the state $s_{i,k}$ corresponds to the case that agent $P_i$ is more popular. In practice, the true state is unknown a priori to $P_i$ and needs to be estimated from data. For instance, the yield in college admissions is defined as the rate at which a college’s admitted students accept the offers. However, the yield is unknown a priori to the college in the current year (Che and Koh, 2016). Colleges can only estimate the distribution of the yield from historical data. In this paper, we study a nonparametric model of $\pi_{i,k}(\cdot, \cdot)$ by assuming it belongs to a reproducing kernel Hilbert space (RKHS) (Aronszajn, 1950; Wahba, 1990). Later, in Section 3.2, we propose an algorithm for calibrating $s_{i,k}$ and efficiently estimating $\pi_{i,k}(\cdot, \cdot)$ using historical data.

Given the latent utility $U_i(A_j)$ and the acceptance probability $\pi_{i,k}(s_{i,k}, v_j)$, agent $P_i$’s expected utility of pulling arm $A_j$ at stage $k$ is

$$\mathbb{E}[\text{utility}] = (v_j + e_{ij}) \cdot \pi_{i,k}(s_{i,k}, v_j). \tag{2}$$

We consider the following timeline for multi-stage decentralized matching. First, Nature draws a state such that arms’ preferences are realized. Denote by $s^*_{i,k}$ the true state for agent $P_i$ at stage $k$.

Next, arms simultaneously display their interests to all agents. For example, students apply to colleges in a given period.
Under the assumption that students incur negligible application costs, submitting applications to all colleges is the dominant strategy as students lack information on how colleges evaluate their academic ability or personal essays (Avery and Levin, 2010; Che and Koh, 2016).

Next, at each stage $k \in [K]$, agents simultaneously pull available arms that have not previously rejected them. Each arm either accepts one of the agents that pulled it (if any) or rejects all. An arm exits the market once it accepts an agent, and agents are allowed to exit the market at any time. The arms act simultaneously at each stage. They cannot “hold” offers for accepting or rejecting at a later stage. Hence, agents make “exploding” offers, and arms have to make irreversible decisions without knowing what other offers are coming in later stages.

Finally, this multi-stage matching process ends when all agents have exited or when a pre-specified number of stages has been reached. If there remain arms in the market when the matching has terminated, these arms are unmatched. See an example of such a matching in the left plot of Figure 1.

Figure 1:
The left plot shows the process of a two-stage matching without a central clearinghouse for coordination (Roth, 2008). The solid lines with arrows represent the actions at the first stage, and the dotted lines with arrows are the actions at the second stage. The middle plot shows the cutoff $\widehat{e}_{i,k}(s_{i,k}, v)$ of Theorem 2 when Eq. (9) holds. The shaded area represents $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$. The dotted curve represents the function $e^-_{i,k}(s_{i,k}, v)$ used in verifying Eq. (9). The dashed curve represents $\widehat{b}_{i,k}(s_{i,k})[1 - \eta_{i,k}\delta_{i,k}(v)\pi^{-1}_{i,k}(s_{i,k}, v)]^{-1} - v$, which, after thresholding to $[0, 1]$, gives the cutoff $\widehat{e}_{i,k}(s_{i,k}, v)$ denoted by the blue solid curve. The right plot shows the cutoff $\widehat{e}_{i,k}(s_{i,k}, v)$ when Eq. (9) does not hold.

2.4 Agent’s Expected Payoff

An agent’s goal is to maximize the expected payoff, which consists of two parts: the expected utilities and the penalty for exceeding the quota. Let $\mathcal{A}_k$ be the set of arms that are available in the market at stage $k \in [K]$. Suppose that agent $P_i$ pulls arms from the set $\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}$ at stage $k$, where $A \setminus B$ denotes the set $A$ minus the set $B$. Let $\mathcal{C}_{i,k} \subseteq \mathcal{B}_{i,k}$ be the set of arms that accept $P_i$ at stage $k$. Then $\mathcal{C}_{i,k}$ is unknown until stage $k + 1$, where $k \le K - 1$, and $\mathcal{C}_{i,K}$ is unknown until the end of the matching process. By Eq. (2), $P_i$’s expected payoff at stage $k \in [K]$ is

$$\mathcal{U}_{i,k}[\mathcal{B}_{i,k}] \equiv \sum_{j \in \mathcal{B}_{i,k}} (v_j + e_{ij})\,\pi_{i,k}(s^*_{i,k}, v_j) - \gamma_i \max\Big\{ \sum_{j \in \mathcal{B}_{i,k}} \pi_{i,k}(s^*_{i,k}, v_j) + \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) - q_i,\ 0 \Big\}. \tag{3}$$

Here $\mathrm{card}(\cdot)$ denotes the size of a set, and $\gamma_i$ is the marginal penalty for exceeding the quota. We assume that $\gamma_i > \max_{j \in \mathcal{A}} \{v_j + e_{ij}\}$. Hence the penalty is greater than any arm’s latent utility. Note that since our model studies unknown strategies of the opponents and uncertain preferences of the arms in a decentralized market, we consider the optimal expected payoff in Eq. (3) instead of the optimal realized payoff.

We consider a variational formulation of the optimal strategy in Section 3.1 and propose a two-step algorithm using a statistical machine learning method in Section 3.2.
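As a concrete reading of the stage-wise expected payoff in Eq. (3), the following is a minimal sketch, not the authors' implementation. The function name `expected_payoff`, the constant acceptance probability, and all numeric values are hypothetical choices made only for illustration; the sketch also assumes the acceptance probability at the true state is already known.

```python
def expected_payoff(pulled, accept_prob, gamma, quota, already_accepted=0):
    """Stage-wise expected payoff in the spirit of Eq. (3).

    pulled: list of (v_j, e_ij) pairs for the arms pulled at this stage.
    accept_prob: callable v -> acceptance probability at the (assumed known) state.
    gamma: marginal penalty for exceeding the quota.
    quota: the agent's quota q_i.
    already_accepted: number of arms that accepted in earlier stages.
    """
    # Sum of latent utility times acceptance probability over pulled arms.
    expected_utility = sum((v + e) * accept_prob(v) for v, e in pulled)
    # Expected class size: earlier acceptances plus expected new acceptances.
    expected_enrolled = sum(accept_prob(v) for v, _ in pulled) + already_accepted
    # The penalty applies only to the expected excess over the quota.
    penalty = gamma * max(expected_enrolled - quota, 0.0)
    return expected_utility - penalty


# Hypothetical example: every arm accepts with probability 0.5, quota of 1.
half = lambda v: 0.5
two_arms = [(0.8, 0.6), (0.6, 0.4)]
three_arms = two_arms + [(0.4, 0.2)]
payoff_two = expected_payoff(two_arms, half, gamma=2.5, quota=1)
payoff_three = expected_payoff(three_arms, half, gamma=2.5, quota=1)
```

In this toy setting, pulling the third arm pushes the expected class size above the quota, so the penalty outweighs the added expected utility and the payoff drops.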
The problem of finding the optimal set of arms, and the corresponding optimal value $\bar{\mathcal{U}}_i$, can be described as follows:

$$\bar{\mathcal{U}}_i = \max_{\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\},\ k \in [K]}\ \sum_{k \in [K]} \mathcal{U}_{i,k}[\mathcal{B}_{i,k}], \tag{4}$$

where the expected payoff $\mathcal{U}_{i,k}$ is defined in Eq. (3). Finding and checking an optimal solution to Eq. (4) is difficult. Suppose that an arm set $\cup_{k \in [K]} \bar{\mathcal{B}}_{i,k}$ is given and that it is claimed to be the optimal solution to Eq. (4). The problem of verifying that $\cup_{k \in [K]} \bar{\mathcal{B}}_{i,k}$ is optimal is NP-complete, roughly because we need to individually check a significant fraction of the combinations of $\mathrm{card}(\cup_{k \in [K]} \mathcal{A}_k)$ arms to determine which combination might give a larger expected payoff than the given arm set $\cup_{k \in [K]} \bar{\mathcal{B}}_{i,k}$. Since the number of combinations grows exponentially with the number of arms, the complexity of any systematic algorithm becomes impractically large. Moreover, the expected payoff $\mathcal{U}_{i,k}$ depends on the unknown true state $s^*_{i,k}$.

We exploit the underlying structure of the optimization problem in Eq. (4) to find a practical methodology. For an arm set $\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}$, its loss can be formulated by comparing its expected payoff to the expected payoff of $\bar{\mathcal{B}}_{i,k}$, where we suppose that $\cup_{k \in [K]} \bar{\mathcal{B}}_{i,k}$ achieves the optimal value $\bar{\mathcal{U}}_i$ in Eq. (4). Then the loss of $\mathcal{B}_{i,k}$ becomes

$$\begin{aligned} \mathcal{L}_{i,k}[\mathcal{B}_{i,k}] ={}& \mathbf{1}\Big\{ \sum_{j \in \mathcal{B}_{i,k}} \pi_{i,k}(s^*_{i,k}, v_j) > q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) \Big\}\, \mathrm{OE}[\mathcal{B}_{i,k}] \\ &+ \mathbf{1}\Big\{ \sum_{j \in \mathcal{B}_{i,k}} \pi_{i,k}(s^*_{i,k}, v_j) \le q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) \Big\}\, \mathrm{UE}[\mathcal{B}_{i,k}], \quad \forall\, k \in [K]. \end{aligned} \tag{5}$$

Here the over-enrollment (OE) loss in Eq. (5) is defined as

$$\begin{aligned} \mathrm{OE}[\mathcal{B}_{i,k}] \equiv{}& \gamma_i \Big\{ \sum_{j \in \mathcal{B}_{i,k}} \pi_{i,k}(s^*_{i,k}, v_j) + \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) - q_i \Big\} \\ &- \Big\{ \sum_{j \in \mathcal{B}_{i,k}} (v_j + e_{ij})\,\pi_{i,k}(s^*_{i,k}, v_j) - \sum_{j \in \bar{\mathcal{B}}_{i,k}} (v_j + e_{ij})\,\pi_{i,k}(s^*_{i,k}, v_j) \Big\}, \quad \forall\, k \in [K], \end{aligned}$$

where we recall that the penalty parameter $\gamma_i$ is defined in Eq. (3). The under-enrollment (UE) loss in Eq. (5) is given by

$$\mathrm{UE}[\mathcal{B}_{i,k}] \equiv \begin{cases} \rho_{i,k} \Big[ \sum_{j \in \bar{\mathcal{B}}_{i,k}} (v_j + e_{ij})\,\pi_{i,k}(s^*_{i,k}, v_j) - \sum_{j \in \mathcal{B}_{i,k}} (v_j + e_{ij})\,\pi_{i,k}(s^*_{i,k}, v_j) \Big], & k \le K - 1, \\[4pt] \sum_{j \in \bar{\mathcal{B}}_{i,K}} (v_j + e_{ij})\,\pi_{i,K}(s^*_{i,K}, v_j) - \sum_{j \in \mathcal{B}_{i,K}} (v_j + e_{ij})\,\pi_{i,K}(s^*_{i,K}, v_j), & k = K, \end{cases} \tag{6}$$

where $\rho_{i,k} \in (0, 1)$ is a discount factor for $k \le K - 1$. Note that $\rho_{i,k} < 1$ since $P_i$ can fill the remaining quota (if any) in subsequent stages of the matching process. On the other hand, $\rho_{i,k} > 0$ due to the hierarchical structure of the multi-stage matching, which is defined as follows: for any agent $P_i$, the $j$th best arm available at the subsequent stage has lower latent utility than the $j$th best arm available at the current stage, where $j \ge 1$. The hierarchical structure has been noted in college admissions with waiting lists (Che and Koh, 2016). The report of the NACAC (2019) corroborates the hierarchical structure: the admission rate of the waiting list is significantly lower than that of regular admission. The top students in a college’s waiting list, uncertain about their rankings in the list and whether the college would admit them later, may have accepted offers from their less preferred colleges. However, unlike the stages $k \le K - 1$, the last stage $k = K$ has a discount factor equal to 1 since the agent cannot fill the remaining quota (if any) after the last stage.

The formulation in Eq. (5) allows one to study stage-wise optimal sets $\mathcal{B}_{i,k}$ that minimize the loss $\mathcal{L}_{i,k}$ for each $k \in [K]$. This makes the optimization problem easier compared to jointly finding $\mathcal{B}_{i,k}$ for all $k \in [K]$ such that $\cup_{k \in [K]} \mathcal{B}_{i,k}$ maximizes the expected payoff in Eq. (4). We introduce the following notation:

$$\delta_{i,k}(v) \equiv \Big[ \max_{s_{i,k}} \pi_{i,k}(s_{i,k}, v) - \min_{s_{i,k}} \pi_{i,k}(s_{i,k}, v) \Big], \tag{7}$$

which measures the uncertainty of the acceptance probability with respect to the unknown state. Using this notation, we show that a solution to a variational problem gives a minimizer of the loss in Eq. (5).

Theorem 1.
There exist parameters $\eta_{i,k} > 0$, for $k \le K - 1$, and $\eta_{i,K} = 0$ such that with probability approaching one, the minimizer of the following loss function is also a minimizer of the loss in Eq. (5):

$$\mathcal{L}^{\dagger}_{i,k}[\mathcal{B}_{i,k}] = \sum_{j \in \mathcal{B}_{i,k}} (v_j + e_{ij}) \big[ \eta_{i,k}\,\delta_{i,k}(v_j) - \pi_{i,k}(s^*_{i,k}, v_j) \big] + \gamma_i \max\Big\{ \sum_{j \in \mathcal{B}_{i,k}} \pi_{i,k}(s^*_{i,k}, v_j) + \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) - q_i,\ 0 \Big\}, \quad \forall\, k \in [K], \tag{8}$$

where $\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}$.
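To make the uncertainty measure in Eq. (7) and the variational loss in Eq. (8) concrete, here is a minimal sketch under stated assumptions. The acceptance-probability function `toy_accept_prob`, the grid of candidate states, and the parameter values are all hypothetical. Because the acceptance probability is assumed increasing in the state, the maximum and minimum over a grid are attained at the grid endpoints; the grid merely stands in for the range of plausible states.

```python
def uncertainty(accept_prob, v, state_grid):
    """Eq. (7): spread of the acceptance probability over candidate states."""
    probs = [accept_prob(s, v) for s in state_grid]
    return max(probs) - min(probs)


def variational_loss(pulled, accept_prob, true_state, state_grid,
                     eta, gamma, quota, already_accepted=0):
    """Eq. (8): uncertainty-regularized utility term plus over-quota penalty.

    pulled: list of (v_j, e_ij) pairs; eta: weight on the uncertainty term.
    """
    regularized = sum(
        (v + e) * (eta * uncertainty(accept_prob, v, state_grid)
                   - accept_prob(true_state, v))
        for v, e in pulled
    )
    excess = sum(accept_prob(true_state, v) for v, _ in pulled)
    excess += already_accepted - quota
    return regularized + gamma * max(excess, 0.0)


def toy_accept_prob(s, v):
    # Hypothetical form: increasing in the state s, decreasing in the score v.
    return s * (1.0 - 0.5 * v)


grid = [0.2, 0.5, 0.8]  # hypothetical range of plausible states
delta_at_zero = uncertainty(toy_accept_prob, 0.0, grid)
loss_one_arm = variational_loss([(0.0, 1.0)], toy_accept_prob, true_state=0.5,
                                state_grid=grid, eta=0.5, gamma=3.0, quota=1)
```

A larger `eta` raises the loss of arms whose acceptance probability swings widely across states, which is how the formulation steers an agent toward arms with stable acceptance.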
We make two remarks regarding this theorem. First, the regularization parameter $\eta_{i,k}$ is induced by the hierarchical structure where the arms available at subsequent stages might be much worse than current ones. Hence, each agent prefers arms with a stable acceptance probability, and $\eta_{i,k}$ controls the penalty on the uncertainty. Second, we emphasize the difference between the multi-stage decentralized matching problem and the multi-armed bandit problem (Bubeck and Cesa-Bianchi, 2012). A bandit problem is a sequential allocation problem in which an environment repeatedly provides an agent with a fixed set of arms. Although similar in that it involves sequential decision making under limited information, the multi-stage matching market involves multiple agents competing for arms. An arm exits the market once it accepts an offer. The competition induces a hierarchical structure which makes the optimization in Eq. (8) different from the optimization in multi-armed bandits.

Although the variational problem in Eq. (8) requires only stage-wise optimization and can be solved sequentially for each $k \in [K]$, the finding and checking of an optimal solution is still NP-complete. This is because we need to individually check a significant fraction of the combinations of $\mathrm{card}(\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l})$ arms at each stage $k \in [K]$ to determine the optimal solution for Eq. (8). The number of combinations grows exponentially with the number $\mathrm{card}(\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l})$ for any $k \in [K]$.

We propose a greedy algorithm that gives an approximate solution to the optimization problem in Eq. (8). Suppose the true state is fixed at $s^*_{i,k} = s_{i,k}$. We refer to $(v_j + e_{ij})[\pi_{i,k}(s_{i,k}, v_j) - \eta_{i,k}\,\delta_{i,k}(v_j)]$ as arm $A_j$’s variational expected utility.
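The greedy idea can be sketched in a few lines. This is a simplified illustration, not the paper's full procedure: it ranks arms by variational expected utility per unit of acceptance probability and keeps adding arms while the expected number of acceptances stays within the remaining quota, and it omits the boundary comparison that the text formalizes in Eq. (9). The function `greedy_pull`, the uncertainty function `delta`, and the numbers are hypothetical.

```python
def greedy_pull(arms, accept_prob, delta, eta, remaining_quota):
    """Simplified greedy selection of arms for one agent at one stage.

    arms: dict arm_id -> (v_j, e_ij).
    accept_prob: callable v -> acceptance probability (assumed positive).
    delta: callable v -> uncertainty of the acceptance probability.
    """
    def ratio(arm_id):
        v, e = arms[arm_id]
        p = accept_prob(v)
        # Variational expected utility per unit of acceptance probability.
        return (v + e) * (p - eta * delta(v)) / p

    ranked = sorted(arms, key=ratio, reverse=True)
    chosen, expected_accepts = [], 0.0
    for arm_id in ranked:
        p = accept_prob(arms[arm_id][0])
        # Stop at the cutoff: negative ratio or expected over-enrollment.
        if ratio(arm_id) < 0 or expected_accepts + p > remaining_quota:
            break
        chosen.append(arm_id)
        expected_accepts += p
    return chosen


# Hypothetical data: arms "a" and "b" have equal latent utility 1.4,
# but "a" has the higher score and therefore the larger uncertainty.
toy_arms = {"a": (0.9, 0.5), "b": (0.5, 0.9), "c": (0.3, 0.1)}
selected = greedy_pull(toy_arms, accept_prob=lambda v: 0.5,
                       delta=lambda v: 0.2 * v, eta=1.0, remaining_quota=1.0)
# -> ['b', 'a']
```

In this toy example arm "b" is ranked ahead of "a" despite identical latent utility, because its acceptance probability is less uncertain.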
For each $A_j \in \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}$, the greedy algorithm computes the variational expected utility per unit of acceptance probability, that is,

$$r(A_j) \equiv (v_j + e_{ij}) \big[ \pi_{i,k}(s_{i,k}, v_j) - \eta_{i,k}\,\delta_{i,k}(v_j) \big] / \pi_{i,k}(s_{i,k}, v_j).$$

Then the algorithm ranks arms according to their associated values of $r$ so that $r_{(1)} \ge r_{(2)} \ge \cdots \ge r_{(\mathrm{card}(\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}))}$. Starting with the first arm corresponding to $r_{(1)}$ and continuing in order, the algorithm selects the arm if its variational expected utility is larger than the expected penalty of exceeding the quota. This algorithm terminates when it arrives at a cutoff value of $r$. Then only arms whose associated $r$ value is at least the cutoff are selected for agent $P_i$ to pull at stage $k \in [K]$. In the following, we present the formalized cutoff for $r$.

Let $b_{i,k}$ be the value of $r$ of those arms on the cutoff. That is, arms on the cutoff satisfy $b_{i,k} = (v + e_i)[1 - \eta_{i,k}\,\delta_{i,k}(v)\,\pi^{-1}_{i,k}(s_{i,k}, v)] \ge 0$. Let $\Pi_{i,k}(b_{i,k})$ be the expected number of arms in $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$ that would accept $P_i$. That is,

$$\Pi_{i,k}(b_{i,k}) = \sum_{j \in \mathcal{A}} \mathbf{1}\Big( e_{ij} \ge \min\big\{ \max\big\{ b_{i,k}[1 - \eta_{i,k}\,\delta_{i,k}(v_j)\,\pi^{-1}_{i,k}(s_{i,k}, v_j)]^{-1} - v_j,\ 0 \big\},\ 1 \big\} \Big)\, \pi_{i,k}(s_{i,k}, v_j).$$

If there exists some $b_{i,k} \ge 0$ such that $\Pi_{i,k}(b_{i,k}) = q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l})$, we let $\widehat{b}_{i,k}(s_{i,k}) = b_{i,k}$ and the cutoff $\widehat{e}_{i,k}(s_{i,k}, v) = \min\{\max\{\widehat{b}_{i,k}(s_{i,k})[1 - \eta_{i,k}\,\delta_{i,k}(v)\,\pi^{-1}_{i,k}(s_{i,k}, v)]^{-1} - v,\ 0\},\ 1\}$. However, if there is no solution to $\Pi_{i,k}(b_{i,k}) = q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l})$, we let

$$b^+_{i,k}(s_{i,k}) = \arg\max_{b_{i,k} \ge 0} \big\{ \Pi_{i,k}(b_{i,k}) > q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) \big\}, \qquad b^-_{i,k}(s_{i,k}) = \arg\min_{b_{i,k} \ge 0} \big\{ \Pi_{i,k}(b_{i,k}) < q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) \big\}.$$

To choose between $b^+_{i,k}$ and $b^-_{i,k}$, it is necessary to balance the expected utility and the expected penalty for exceeding the quota due to pulling arms on the boundary. Define two cutoffs

$$e^+_{i,k}(s_{i,k}, v) \equiv \min\big\{ \max\big\{ b^+_{i,k}(s_{i,k})[1 - \eta_{i,k}\,\delta_{i,k}(v)\,\pi^{-1}_{i,k}(s_{i,k}, v)]^{-1} - v,\ 0 \big\},\ 1 \big\},$$

$$e^-_{i,k}(s_{i,k}, v) \equiv \min\big\{ \max\big\{ b^-_{i,k}(s_{i,k})[1 - \eta_{i,k}\,\delta_{i,k}(v)\,\pi^{-1}_{i,k}(s_{i,k}, v)]^{-1} - v,\ 0 \big\},\ 1 \big\}.$$

The two cutoffs correspond to two sets, $\mathcal{B}^+_{i,k}(s_{i,k}) = \{ j \mid e_{ij} \ge e^+_{i,k}(s_{i,k}, v_j) \}$ and $\mathcal{B}^-_{i,k}(s_{i,k}) = \{ j \mid e_{ij} \ge e^-_{i,k}(s_{i,k}, v_j) \}$, respectively. Consider the following condition for the arms on the boundary $\{\mathcal{B}^+_{i,k}(s_{i,k}) \setminus \mathcal{B}^-_{i,k}(s_{i,k})\}$. This condition formalizes the comparison of the variational expected utility and the expected penalty of exceeding the quota:

$$\sum_{j \in \mathcal{B}^+_{i,k}(s_{i,k}) \setminus \mathcal{B}^-_{i,k}(s_{i,k})} (v_j + e_{ij}) \big[ \pi_{i,k}(s_{i,k}, v_j) - \eta_{i,k}\,\delta_{i,k}(v_j) \big] \ge \gamma_i \sum_{j \in \mathcal{B}^+_{i,k}(s_{i,k})} \pi_{i,k}(s_{i,k}, v_j) - \gamma_i \big[ q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) \big]. \tag{9}$$

If Eq. (9) holds, let $\widehat{b}_{i,k}(s_{i,k}) = b^+_{i,k}(s_{i,k})$ and otherwise, let $\widehat{b}_{i,k}(s_{i,k}) = b^-_{i,k}(s_{i,k})$. Then the cutoff is

$$\widehat{e}_{i,k}(s_{i,k}, v) = \min\big\{ \max\big\{ \widehat{b}_{i,k}(s_{i,k})[1 - \eta_{i,k}\,\delta_{i,k}(v)\,\pi^{-1}_{i,k}(s_{i,k}, v)]^{-1} - v,\ 0 \big\},\ 1 \big\}. \tag{10}$$

Finally, the greedy algorithm pulls arms from $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$ for agent $P_i$, where

$$\widehat{\mathcal{B}}_{i,k}(s_{i,k}) = \big\{ j \mid A_j \in \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\} \text{ with } (v_j, e_{ij}) \text{ satisfying } e_{ij} \ge \widehat{e}_{i,k}(s_{i,k}, v_j) \big\}. \tag{11}$$

The middle and right plots in Figure 1 illustrate the nonlinear cutoff $\widehat{e}_{i,k}(s_{i,k}, v)$.

Theorem 2.
Suppose the true state is fixed at $s^*_{i,k} = s_{i,k}$. The arm set $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$ in Eq. (11) is near-optimal as its loss satisfies

$$\min_{\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}} \mathcal{L}^{\dagger}_{i,k}[\mathcal{B}_{i,k}] \le \mathcal{L}^{\dagger}_{i,k}[\widehat{\mathcal{B}}_{i,k}(s_{i,k})] \le \min_{\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}} \mathcal{L}^{\dagger}_{i,k}[\mathcal{B}_{i,k}] + \mathrm{UE}^{\dagger},$$

where the loss function $\mathcal{L}^{\dagger}_{i,k}$ is defined in Eq. (8), and $\mathrm{UE}^{\dagger}$ is defined by

$$\mathrm{UE}^{\dagger} \equiv \Big[ \min_{j \in \mathcal{B}^-_{i,k}(s_{i,k})} (v_j + e_{ij}) \big( 1 - \eta_{i,k}\,\pi^{-1}_{i,k}(s_{i,k}, v_j)\,\delta_{i,k}(v_j) \big) \Big] \Big[ q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) - \sum_{j \in \mathcal{B}^-_{i,k}(s_{i,k})} \pi_{i,k}(s_{i,k}, v_j) \Big].$$

Moreover, if there is a continuum of arms and $\pi_{i,k}(\cdot, v)$ is continuous in $v$, then $\mathrm{UE}^{\dagger} = 0$.

Since the true state and the acceptance probability are unknown a priori in practice, the near-optimal strategy in Eq. (11) is unknown a priori to the agent $P_i$. We propose a two-step algorithm to implement the strategy in Eq. (11) by using historical data and statistical machine learning methods. The two-step algorithm is built upon the ideas of lower uncertainty bound (LUB) and calibrated decentralized matching (CDM) (Dai and Jordan, 2020). In the first step, we compute an estimated expected utility of each arm and its lower uncertainty bound. Many machine learning methods can be applied here for the modeling of historical data. In the second step, we calibrate the state parameter in a data-driven approach that takes the opportunity cost and penalty for exceeding the quota into account. Based on the calibrated state, an agent selects arms with the largest lower uncertainty bounds of the expected utility. The key idea of the two-step algorithm is to select arms which have large expected utility or little uncertainty in the expected utility. We apply the two-step algorithm to each of the stages $k = 1, \ldots, K$.

3.2.1 Step 1: Estimation of Expected Utility and Lower Uncertainty Bound

Let $\mathcal{A}^t = \{A^t_1, A^t_2, \ldots, A^t_{n_t}\}$ be the arm set at $t \in [T] \equiv \{1, \ldots, T\}$.
Let $s^t_{i,k}$ be the state of agent $P_i$ at stage $k$ and time $t$. The state $s^t_{i,k}$ is unknown until the next stage or the next time point, and it varies over time. For instance, the yield rate of a college may change over the years. For any arm $A^t_j \in \mathcal A^t$, there is an associated pair of score and fit values $(v^t_j, e^t_{ij})$ obtained from Eq. (1), where $i \in [m]$, $j \in [n_t]$. We let $(v^t_j, e^t_{ij})$ denote the attributes of arm $A^t_j$. Define the set
$$\mathcal B^t_{i,k} = \{ j \mid P_i \text{ pulls arm } A^t_j \text{ at time } t \text{ and stage } k,\ 1 \le j \le n_t \},$$
where $|\mathcal B^t_{i,k}| = n^t_{i,k}$ and $n^t_{i,k} \le n_t$. For any $j \in \mathcal B^t_{i,k}$, the outcome that $P_i$ observes is whether arm $A^t_j$ accepted $P_i$, that is, $y^t_{ij} = \mathbf 1\{A^t_j \text{ accepts } P_i\}$. Here, $y^t_{ij} = 1$ with probability $\pi_{i,k}(s^t_{i,k}, v^t_j)$ and $y^t_{ij} = 0$ with probability $1 - \pi_{i,k}(s^t_{i,k}, v^t_j)$.
We want to estimate $\pi_{i,k}$ based on the historical data
$$\mathcal D = \{ (s^t_{i,k}, v^t_j, e^t_{ij}, y^t_{ij}) : i \in [m];\ j \in \cup_{k=1}^{K} \mathcal B^t_{i,k};\ t \in [T] \}.$$
A wide range of machine learning methods, e.g., reproducing kernel methods, random forests, or neural networks, can be applied here to learn $\pi_{i,k}$ (Hastie et al., 2009). For concreteness, we consider a penalized estimator in a reproducing kernel Hilbert space (RKHS; Wahba, 1990). Let the log odds ratio be
$$f_{i,k}(s_{i,k}, v) = \log\left(\frac{\pi_{i,k}(s_{i,k}, v)}{1 - \pi_{i,k}(s_{i,k}, v)}\right).$$
We assume that $f_{i,k}$ resides in an RKHS $\mathcal H_{K_{i,k}}$, which corresponds to a reproducing kernel $K_{i,k}$.
Then we solve for the estimator $\hat f_{i,k} \in \mathcal H_{K_{i,k}}$ that minimizes the following objective function:
$$\sum_{t=1}^{T} \sum_{j \in \mathcal B^t_{i,k}} \Big[ -y^t_{ij}\, f_{i,k}(s^t_{i,k}, v^t_j) + \log\big(1 + \exp\big(f_{i,k}(s^t_{i,k}, v^t_j)\big)\big) \Big] + \frac{1}{2} \sum_{t=1}^{T} n^t_{i,k}\, \lambda_{i,k}\, \| f_{i,k} \|^2_{\mathcal H_{K_{i,k}}}, \tag{12}$$
where $\| \cdot \|_{\mathcal H_{K_{i,k}}}$ is the RKHS norm and $\lambda_{i,k} \ge 0$ is a tuning parameter. The RKHS $\mathcal H_{K_{i,k}}$ has the product kernel $K_{i,k}((s_i, v), (s'_i, v')) = K^s_{i,k}(s_i, s'_i)\, K^v_{i,k}(v, v')$ with some kernel functions $K^s_{i,k}$ and $K^v_{i,k}$ (Wahba et al., 1995). It is known that $\hat f_{i,k}$ is minimax rate-optimal and satisfies (see, e.g., Dai and Jordan, 2020)
$$\mathbb E[(\hat f_{i,k} - f_{i,k})^2] \le c_f \big[ T (\log T)^{-1} \big]^{-2r/(2r+1)}, \quad \forall i \in [m]. \tag{13}$$
Here, the constant $c_f > 0$ does not depend on $T$, and $r \ge 1$ is such that $K^s_{i,k}(s, \cdot)$ and $K^v_{i,k}(v, \cdot)$ have square-integrable $r$th-order derivatives.
The value of learning from historical data is particularly significant when a new arm is introduced into the problem. Let $\mathcal A^{T+1} = \{A_1, \ldots, A_n\}$ be the new arm set at time $T+1$, where each arm $A_j$ has attributes obtained from Eq. (1). Then the estimated probability that $A_j$ accepts $P_i$ at time $T+1$ and stage $k$ is
$$\hat\pi_{i,k}(s_{i,k}, v_j) = \big[ 1 + \exp\big(-\hat f_{i,k}(s_{i,k}, v_j)\big) \big]^{-1}. \tag{14}$$
Moreover, the expected utility of $A_j$ is $\hat\pi_{i,k}(s_{i,k}, v_j)(v_j + e_{ij})$ for any $j \in [n]$.
Finally, we construct a lower uncertainty bound for the probability $\pi_{i,k}(s_{i,k}, v_j)$:
$$\hat\pi^{L}_{i,k}(s_{i,k}, v_j) = \begin{cases} \hat\pi_{i,k}(s_{i,k}, v_j) - \eta_{i,k}\, \hat\delta_{i,k}(v_j), & \text{if } v_j \in \big[\min\{ v^t_j \mid j \in \cup_{t=1}^{T} \mathcal B^t_i \},\ \max\{ v^t_j \mid j \in \cup_{t=1}^{T} \mathcal B^t_i \}\big]; \\ 1, & \text{otherwise}, \end{cases} \tag{15}$$
where $\hat\pi_{i,k}(s_{i,k}, v_j)$ is obtained by Eq. (14) and $\hat\delta_{i,k}(v_j) = \max_{s_{i,k}} \hat\pi_{i,k}(s_{i,k}, v_j) - \min_{s_{i,k}} \hat\pi_{i,k}(s_{i,k}, v_j)$. With the regularization parameter $\eta_{i,k} \ge 0$, the lower uncertainty bound for the expected utility is
$$\hat\pi^{L}_{i,k}(s_{i,k}, v_j)(v_j + e_{ij}), \quad \forall j \in [n]. \tag{16}$$
Note that Eq. (15) assigns probability one to arms whose scores lie outside the range that agent $P_i$ has pulled before. Hence it encourages the exploration of previously untried arms.
Since the true state is unknown in practice, a natural question is how to calibrate the state parameter $s_{i,k}$ in Eq. (11). The calibration can be performed under either an average-case or a worst-case metric, depending on whether one minimizes the average-case or the worst-case loss with respect to the unknown true state. First, we consider the average-case loss, $\mathbb E_{s^*_{i,k}}\{\mathcal L^{\dagger}_{i,k}[\hat{\mathcal B}_{i,k}(s_{i,k})]\}$, over the true state $s^*_{i,k}$, where the loss $\mathcal L^{\dagger}_{i,k}$ is defined in Eq. (8). Let the marginal set be
$$\partial \hat{\mathcal B}_{i,k}(s_{i,k}) \equiv \lim_{\delta_s \to 0^{+}} \big\{ \hat{\mathcal B}_{i,k}(s_{i,k} - \delta_s) \setminus \hat{\mathcal B}_{i,k}(s_{i,k}) \big\}.$$
The set $\partial \hat{\mathcal B}_{i,k}(s_{i,k})$ represents the change of $\hat{\mathcal B}_{i,k}(s_{i,k})$ with a perturbation of $s_{i,k}$.
Proposition 1.
The average-case loss $\mathbb E_{s^*_{i,k}}\{\mathcal L^{\dagger}_{i,k}[\hat{\mathcal B}_{i,k}(s_{i,k})]\}$ is minimized if $s_{i,k} \in (0, 1)$ is chosen as the solution to
$$\mathbb P(s^*_{i,k} \ne s_{i,k}) \sum_{j \in \partial \hat{\mathcal B}_{i,k}(s_{i,k})} (v_j + e_{ij})\, \mathbb E_{s^*_{i,k}}\big[ \pi_{i,k}(s^*_{i,k}, v_j) - \eta_{i,k}\, \delta_{i,k}(v_j) \mid s^*_{i,k} \ne s_{i,k} \big] = \gamma_i \big[ 1 - F_{s^*_{i,k}}(s_{i,k}) \big] \sum_{j \in \partial \hat{\mathcal B}_{i,k}(s_{i,k})} \mathbb E_{s^*_{i,k}}\big[ \pi_{i,k}(s^*_{i,k}, v_j) \mid s_{i,k} < s^*_{i,k} \le 1 \big], \tag{17}$$
where $F_{s^*_{i,k}}$ is the cumulative distribution function of $s^*_{i,k} \in [0, 1]$.
Since the calibration method in Proposition 1 minimizes the average-case loss, it is called the mean calibration. The key idea is to balance the trade-off between the opportunity cost and the penalty for exceeding the quota (Dai and Jordan, 2020). If Eq. (17) has more than one solution, then $s_{i,k}$ is chosen as the largest one. If the distribution $F_{s^*_{i,k}}$ has discrete support, the criterion in Proposition 1 is changed as follows: choose the minimal $s_{i,k} \in [0, 1]$ such that the left side of Eq. (17) is not less than the right side of Eq. (17), where the search over $s_{i,k}$ starts from the maximum value in the support and decreases to the minimal value.
Besides the average-case loss in Proposition 1, we also consider the worst-case loss with respect to the unknown $s^*_{i,k}$. Proposition 2 gives the minimax calibration, which calibrates $s_{i,k}$ to minimize the maximum loss $\max_{s^*_{i,k}} \{\mathcal L^{\dagger}_{i,k}[\hat{\mathcal B}_{i,k}(s_{i,k})]\}$ over the unknown $s^*_{i,k}$.
Proposition 2.
The worst-case loss $\max_{s^*_{i,k}} \{\mathcal L^{\dagger}_{i,k}[\hat{\mathcal B}_{i,k}(s_{i,k})]\}$ is minimized if $s_{i,k} \in [0, 1]$ is chosen as the solution to
$$\sum_{j \in \hat{\mathcal B}_{i,k}(s_{i,k})} (v_j + e_{ij})\, \delta_{i,k}(v_j) + \sum_{j \in \hat{\mathcal B}_{i,k}(0)} (v_j + e_{ij}) \big[ \pi_{i,k}(0, v_j) - \eta_{i,k}\, \delta_{i,k}(v_j) \big] = \sum_{j \in \hat{\mathcal B}_{i,k}(1)} (v_j + e_{ij}) \big[ \pi_{i,k}(1, v_j) - \eta_{i,k}\, \delta_{i,k}(v_j) \big] + \gamma_i \sum_{j \in \hat{\mathcal B}_{i,k}(s_{i,k})} \pi_{i,k}(1, v_j) - \gamma_i q_i.$$
Using the lower uncertainty bound in Eq. (16) and the calibrations in Propositions 1 and 2, we propose to pull arms from the following set at stage $k$ for agent $P_i$:
$$\hat{\mathcal B}^{L}_{i,k}(s_{i,k}) = \big\{ j \mid A_j \in \{\mathcal A^{T+1}_k \setminus \cup_{l \le k-1} \mathcal B_{i,l}\} \text{ with } (v_j, e_{ij}) \text{ satisfying } e_{ij} \ge \hat e^{L}_{i,k}(s_{i,k}, v_j) \big\}. \tag{18}$$
Here $\mathcal A^{T+1}_k$ is the set of arms that are available at stage $k$ of time $T+1$. The cutoff is $\hat e^{L}_{i,k}(s_{i,k}, v) = \min\{\max\{\hat b_{i,k}(s_{i,k})\, \hat\pi_{i,k}(s_{i,k}, v) / \hat\pi^{L}_{i,k}(s_{i,k}, v) - v,\ 0\},\ 1\}$, where $\hat b_{i,k}(s_{i,k})$ is obtained by Eq. (10) and the state $s_{i,k}$ is calibrated by Proposition 1 or 2. Applying the convergence result in Eq. (13), we have the following consistency result:
$$\hat{\mathcal B}^{L}_{i,k}(s_{i,k}) \to \hat{\mathcal B}_{i,k}(s_{i,k}), \quad \text{as } T \to \infty,$$
where the arm set $\hat{\mathcal B}_{i,k}(s_{i,k})$ is defined in Eq. (11). We summarize this two-step algorithm in Algorithm 1.
Algorithm 1: The LUB-CDM algorithm for multi-stage decentralized matching
Input: Historical data for an agent $P_i$: $\{(s^t_{i,k}, v^t_j, e^t_{ij}, y^t_{ij}) : j \in \mathcal B^t_{i,k};\ t = 1, 2, \ldots, T\}$; new arm set $\mathcal A^{T+1}$ at time $T+1$, where the arms have attributes $\{(v_j, e_{ij}) : j \in [n]\}$; penalty $\gamma_i$ for exceeding the quota; regularization parameter $\eta_{i,k} \ge 0$.
for stage $k = 1, 2, \ldots, K$ do
    Predict the acceptance probability $\hat\pi_{i,k}(s_{i,k}, v)$ by Eq. (14);
    Construct the lower uncertainty bound $\hat\pi^{L}_{i,k}(s_{i,k}, v_j)$ by Eq. (15);
    Estimate the distribution $F_{s^*_{i,k}}(\cdot)$ by the kernel density method (Silverman, 1986);
    Calibrate the state $s_{i,k}$ according to Proposition 1 or 2;
    Determine the arm set $\hat{\mathcal B}^{L}_{i,k}(s_{i,k})$ in Eq. (18);
    Calculate the remaining quota $q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal C_{i,l})$ and the available arms.
end for
Output: The arm set $\hat{\mathcal B}^{L}_{i,k}(s_{i,k})$ for agent $P_i$ at each stage.
Agents in a multi-stage decentralized matching market cannot observe other agents' quotas or the choices of the arms that accept other agents. Each agent only observes the arms that are left in the market at each stage. Theorem 1 implies that, due to the hierarchical structure, agents prefer arms with a stable acceptance probability. This preference leads to strategic behavior on the part of the agents as follows. Define the uncertainty level as the uncertainty measure $\delta_{i,k}(v)$ in Eq. (7) relative to the acceptance probability $\pi_{i,k}(s_{i,k}, v)$. That is,
$$\text{uncertainty level} \equiv \delta_{i,k}(v) / \pi_{i,k}(s_{i,k}, v). \tag{19}$$
Note that the optimal cutoff $\hat e_{i,k}(s_{i,k}, v)$ in Eq. (10) is strictly increasing in the uncertainty level for any $v \in [0, 1]$ and $k \le K - 1$, which implies that an agent favors arms with a low uncertainty level. Hence, an agent's strategic behavior in this market is to select arms with a low uncertainty level. We now study the implications of such strategic behavior on fairness and welfare, and the signaling mechanisms that better clear the congestion in this market.
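The lower uncertainty bound in Eq. (15), the uncertainty spread $\hat\delta_{i,k}$, and the cutoff $\hat e^{L}_{i,k}$ used in Eq. (18) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy log-odds function `toy_f_hat`, the state grid, the observed score range, and the value of `b_hat` are all hypothetical (in the paper, $\hat f_{i,k}$ comes from Eq. (12) and $\hat b_{i,k}$ from Eqs. (9) and (10)).

```python
import math

def sigmoid(x):
    """Eq. (14): map estimated log odds to an acceptance probability."""
    return 1.0 / (1.0 + math.exp(-x))

def lower_uncertainty_bound(f_hat, s, v, states, eta, v_min, v_max):
    """Eq. (15): pi_hat - eta * delta_hat inside the observed score range, else 1."""
    if not (v_min <= v <= v_max):
        return 1.0  # untried score range: the optimistic value encourages exploration
    probs = [sigmoid(f_hat(state, v)) for state in states]
    delta_hat = max(probs) - min(probs)  # spread of pi_hat over the state grid
    return sigmoid(f_hat(s, v)) - eta * delta_hat

def cutoff(f_hat, s, v, states, eta, v_min, v_max, b_hat):
    """Cutoff min{max{b_hat * pi_hat / pi_hat^L - v, 0}, 1} used in Eq. (18)."""
    pi_hat = sigmoid(f_hat(s, v))
    pi_low = lower_uncertainty_bound(f_hat, s, v, states, eta, v_min, v_max)
    if pi_low <= 0.0:
        return 1.0  # assumption: use the strictest cutoff if the bound is not positive
    return min(max(b_hat * pi_hat / pi_low - v, 0.0), 1.0)

def select_arms(arms, **kwargs):
    """Eq. (18): indices j whose fit e_ij clears the cutoff at score v_j."""
    return [j for j, (v, e) in enumerate(arms) if e >= cutoff(v=v, **kwargs)]

def toy_f_hat(s, v):
    """Hypothetical log odds: flat in v at s = 1, decreasing in v at s = 0."""
    return 1.0 - 4.0 * (1.0 - s) * v

common = dict(f_hat=toy_f_hat, s=0.5, states=[0.0, 0.5, 1.0],
              v_min=0.0, v_max=1.0, b_hat=1.0)
arms = [(0.8, 0.5), (0.4, 0.8)]  # hypothetical (score v_j, fit e_ij) pairs
pulled_plain = select_arms(arms, eta=0.0, **common)  # linear cutoff b_hat - v
pulled_lub = select_arms(arms, eta=0.2, **common)    # nonlinear LUB cutoff
```

In this toy setting both arms are pulled when `eta = 0`, whereas with `eta = 0.2` only the second arm is pulled, even though the first arm has the higher latent utility (1.3 versus 1.2). The first arm's acceptance probability varies more across states, so its uncertainty level is higher and its cutoff rises further.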
The fairness studied here is defined in terms of no justified envy; see, e.g., Balinski and Sönmez (1999) and Abdulkadiroğlu and Sönmez (2003). Specifically, an arm $A_j$ has justified envy if, at a stage $k \in [K]$, $A_j$ prefers an agent $P_{i'}$ to another agent $P_i$ that pulls $A_j$, even though $P_{i'}$ pulls an arm $A_{j'}$ which ranks below $A_j$ according to the true preference of $P_{i'}$. We define a multi-stage matching procedure to be fair if no arm has justified envy at any stage.
We note that the optimal cutoff $\hat e_{i,k}(s_{i,k}, v)$ given in Eq. (10) satisfies $d\hat e_{i,k}(s_{i,k}, v)/dv \ne -1$ for $k \le K - 1$. Thus, $P_i$'s strategy is not a cutoff strategy with respect to the arm's latent utility defined by Eq. (1). That is, $P_i$ may prefer an arm with a smaller latent utility. For example, given two arms with the same fit to $P_i$, $P_i$ may skip the one with the higher score and pull the other arm. Formally, the following proposition shows that agents' strategic behavior results in unfairness for arms.
Proposition 3.
The probability that an arm has justified envy is strictly increasing in the arm's uncertainty level defined in Eq. (19).
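The mechanism behind this monotonicity can be read off Eq. (10). As a sketch only (not the proof of the proposition), write $u = \delta_{i,k}(v)/\pi_{i,k}(s_{i,k}, v)$ for the uncertainty level in Eq. (19); the shorthand $u$ is introduced here purely for illustration. Assume that the cutoff is interior (strictly between 0 and 1), that $\hat b_{i,k} > 0$ and $\eta_{i,k} > 0$ are held fixed, and that $\eta_{i,k} u < 1$. Then
$$\hat e_{i,k} = \frac{\hat b_{i,k}}{1 - \eta_{i,k}\, u} - v, \qquad \frac{\partial \hat e_{i,k}}{\partial u} = \frac{\hat b_{i,k}\, \eta_{i,k}}{(1 - \eta_{i,k}\, u)^2} > 0.$$
Holding the score fixed, a higher uncertainty level therefore raises the fit that an arm needs in order to be pulled, which makes it more likely that an arm with a high latent utility is skipped.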
The fairness issue has been noted in practical multi-stage matching markets. For example, candidates in job markets may "fall through the cracks": an employer that values a candidate highly perceives that the candidate is unlikely to accept the job offer and hence decides not to interview the candidate; as a result, candidates may have justified envy (Coles et al., 2010). Figure 2 illustrates the unfairness, where the arms from $\mathcal B^{(1)}_{i,k}$ may have justified envy towards the arms from $\mathcal B^{(2)}_{i,k}$, as the former group has better latent utility but is ignored by the agent $P_i$.
Figure 2: Fairness issue in multi-stage decentralized matching. The dotted line segments represent the linear cutoff with respect to latent utility (see Supplementary Appendix B.5), which yields a fair outcome. The solid curve denotes the nonlinear cutoff in Eq. (10). Here, arms in $\mathcal B^{(1)}_{i,k}$ have justified envy towards arms in $\mathcal B^{(2)}_{i,k}$.
As shown in Proposition 3, an arm with a high uncertainty level is likely to have justified envy. We now give an example of arms that have high uncertainty levels. Consider the arms with top scores, which are likely to receive more offers than other arms. The acceptance probabilities of top-scoring arms are expected to have higher uncertainty levels as defined in Eq. (19). We formalize this intuitive argument in a two-state example.
Proposition 4.
Suppose that there are two states for agent $P_i$ at stage $k$: $0 \le s^{(1)}_{i,k} < s^{(2)}_{i,k} \le 1$. Then arms with larger scores have higher uncertainty levels in Eq. (19), given that
$$\pi_{i,k}(s^{(1)}_{i,k}, v) / \pi_{i,k}(s^{(2)}_{i,k}, v) \text{ is strictly decreasing in } v \in [0, 1]. \tag{20}$$
Condition (20) holds in practical markets; for instance, it holds when the acceptance probability at the popular state (i.e., $\pi_{i,k}(s^{(2)}_{i,k}, v)$) is insensitive to the score while the acceptance probability at the less popular state (i.e., $\pi_{i,k}(s^{(1)}_{i,k}, v)$) is decreasing in the score. Proposition 4 and the agent's strategic behavior imply that agent $P_i$ would strategically skip over some arms with top scores and pull the next-best arms, as the latter group of arms has a lower uncertainty level. This result confirms that "falling through the cracks" can happen (Coles et al., 2010). The strategic behavior improves the agent's expected payoff for the following reason: if $P_i$ only pulls top-scoring arms, the next-best arms, which are uncertain about whether $P_i$ would pull them in a later stage, may accept their less preferred agents. Then, by the time some arms reject $P_i$, the next-best arms that $P_i$ has in mind may be unavailable, and $P_i$ is worse off.
We analyze the relationship between the unfairness and the regularization parameter $\eta_{i,k}$ in Eq. (8). We define the level of unfairness as the number of arms that have justified envy.
Proposition 5.
The level of unfairness is strictly increasing in $\eta_{i,k} \ge 0$.
We note that by Theorem 1, an agent has a higher expected payoff under $\eta_{i,k} > 0$ than under $\eta_{i,k} = 0$ for all stages $k \le K - 1$. Together with Proposition 5, this implies a trade-off between welfare and fairness, since both the level of unfairness and welfare increase when changing $\eta_{i,k} = 0$ to $\eta_{i,k} > 0$. A reason for this trade-off is the agent's strategic behavior in terms of preferring arms with low uncertainty levels of acceptance.
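The direction of this trade-off can be illustrated with a small numerical sketch. The code below applies the cutoff of Eq. (10) for several values of $\eta_{i,k}$ and counts the skipped arms whose latent utility $v_j + e_{ij}$ exceeds that of some pulled arm. This count is only an agent-side proxy for the level of unfairness, since justified envy also depends on the arms' own preferences. The arm list, the uncertainty-level function `level`, and the value `b = 1.0` are hypothetical choices made for the example.

```python
def cutoff(v, eta, b, level):
    """Eq. (10): clip b / (1 - eta * level(v)) - v to the interval [0, 1]."""
    denom = 1.0 - eta * level(v)
    if denom <= 0.0:
        return 1.0  # assumption: strictest cutoff when the expression degenerates
    return min(max(b / denom - v, 0.0), 1.0)

def skipped_but_better(arms, eta, b, level):
    """Count skipped arms whose latent utility v + e beats some pulled arm."""
    pulled = [v + e for v, e in arms if e >= cutoff(v, eta, b, level)]
    skipped = [v + e for v, e in arms if e < cutoff(v, eta, b, level)]
    if not pulled:
        return 0
    worst_pulled = min(pulled)
    return sum(1 for utility in skipped if utility > worst_pulled)

def level(v):
    """Hypothetical uncertainty level that grows with the score (cf. Proposition 4)."""
    return 0.5 * v

# Hypothetical (score v_j, fit e_ij) pairs.
arms = [(0.9, 0.30), (0.7, 0.50), (0.5, 0.65), (0.3, 0.80), (0.2, 0.70)]
counts = [skipped_but_better(arms, eta, 1.0, level) for eta in (0.0, 0.4, 0.6)]
```

With these inputs the count is 0 under the linear cutoff (`eta = 0`) and rises to 1 and then 3 as `eta` grows, in line with the qualitative statement of Proposition 5.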
We study signaling mechanisms that can clear congestion in the market, and we discuss the heterogeneity of signaling in practice.
Since agents may strategically target arms with low uncertainty levels, we argue that a signaling mechanism through which each arm can credibly signal its interest to agents would facilitate matches in decentralized matching. An example of such a credible signaling mechanism is letting each arm send a signal of interest to at most one agent. In that case, the scarcity of signals induces credibility (Coles et al., 2013). This signaling mechanism is useful because credibility reduces an arm's uncertainty measure $\delta_{i,k}(v)$ in Eq. (7) and hence reduces the arm's uncertainty level in Eq. (19). Moreover, the market congestion is better cleared with the credible signaling mechanism, as arms are more likely to match their favorite agents. This finding agrees with the college admissions evidence, where colleges have an incentive to favor early admissions compared to regular admissions, as the early admission serves as a mechanism for applicants to signal their interest (Avery and Levin, 2010).
Next, we consider a second signaling mechanism for arms. Suppose now that arms have resources to invest in improvement. We argue that arms will have increased incentives to raise the agent-specific fit $e_{ij}$ instead of improving the score $v_j$, for two reasons. First, Proposition 4 shows that agents perceive that the top-scored arms have high uncertainty levels and may strategically skip over those arms. Hence, the investment in raising the score cannot generally improve an arm's chance of being matched to the favorite agent. Second, since agent-specific fits are unknown to other agents, agents cannot compete based on fits, and they prefer arms with higher fits, which is shown in Eq. (11). Therefore, the investment in raising the agent-specific fit can serve as a signaling mechanism.
Finally, we discuss a signaling mechanism for agents. We argue that if agents limit the number of arms on the waiting list, this provides a mechanism of credibly signaling information to arms.
In particular, the arms on the waiting list can estimate that their chances of being pulled by the agent in a later stage are non-negligible. Hence the arms are more likely to wait compared to the case where there is no restriction on the waiting list.
The two signaling mechanisms for arms (i.e., credible signaling and investment in fits) have heterogeneous effects. First, the effects of signaling depend on the agents. For example, if an agent $P_i$ estimates that the arms' uncertainty levels $\delta_{i,k}(v_j)\, \pi^{-1}_{i,k}(s_{i,k}, v_j)$, $j \in [n]$, are small, then arms' signals have a low value to $P_i$'s decision-making. On the other hand, the signals have a high value to $P_i$ if arms' uncertainty levels of acceptance are high.
Moreover, the effects of signaling depend on the matching stages. For instance, if the quality of the arms decays quickly in the stage number $k \in [K]$, then the discount factor $\rho_{i,k}$ in Eq. (6) increases as $k$ gets large. Since the proof of Theorem 1 shows that the regularization parameter $\eta_{i,k}$ decreases when $\rho_{i,k}$ increases, there is less strategic behavior when $k$ gets large, by Proposition 5. In particular, at the last stage, Theorem 1 shows that it is optimal to choose $\eta_{i,K} = 0$, which implies no strategic behavior. In this example, signaling effects are larger at the early stages (i.e., $k$ is small) than at the late stages (i.e., $k$ is large).
We now compare multi-stage decentralized matching markets with single-stage decentralized matching markets and with centralized markets in which the matching is implemented via the deferred acceptance algorithm.
We have shown in Section 4.1 that unfairness arises in multi-stage matching markets. By contrast, the optimal strategy that maximizes an agent's expected payoff in single-stage matching gives a fair outcome for arms (Dai and Jordan, 2020). However, we show that agents are better off in multi-stage decentralized matching markets compared to single-stage markets.
Proposition 6.
Agents have higher welfare under multi-stage decentralized matching than under single-stage decentralized matching.
Next, we prove that agents behave more conservatively in multi-stage markets compared to single-stage markets. To show this result, we need to establish a relationship between the state parameter $s_{i,k}$ and the regularization parameter $\eta_{i,k}$.
Proposition 7.
The calibrated parameters $s_{i,k}$ by Propositions 1 and 2 are strictly increasing in the regularization parameter $\eta_{i,k} \ge 0$.
Since single-stage matching corresponds to $K = 1$, Theorem 1 shows that the corresponding regularization parameter is $\eta_{i,K} = 0$. Together with Proposition 7, this implies that the agent would calibrate a larger value for the state parameter $s_{i,k}$ in the first stage of the multi-stage markets with $\eta_{i,1} > 0$ than in single-stage markets. Note that a larger state parameter in the optimal arm set in Eq. (11) corresponds to selecting fewer arms. Hence agents behave more conservatively in multi-stage markets, as they would select fewer arms in the first stage of multi-stage markets than in single-stage markets.
Many centralized matching markets are implemented by employing the celebrated deferred acceptance (DA) algorithm (Gale and Shapley, 1962). Notable examples are the national medical residency matching program (Roth, 1984) and public school choice (Abdulkadiroğlu and Sönmez, 2003). In the arm-proposing version of DA (e.g., student-proposing in college admissions), agents and arms report their ordinal preferences to a clearinghouse, which simulates the following multi-stage procedure. At each stage, every arm shows its interest to the most preferred agent that has not yet rejected it. Every agent tentatively pulls the most preferred arms up to its quota limit and permanently rejects the remaining arms that have indicated their interest to the agent. Once the process terminates, each arm is assigned to the agent that has tentatively pulled it or otherwise remains unmatched. DA has several desirable properties, as summarized in the following proposition.
Proposition 8.
The outcome of arm-proposing DA is fair and weakly Pareto optimal for arms and agents. Hence, there is no matching in which all arms and agents are weakly better off and at least one arm or one agent is strictly better off than in the matching yielded by the arm-proposing DA. The arm-proposing DA also makes it a weakly dominant strategy for all arms to report true preferences.
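The arm-proposing DA procedure described above can be sketched as follows. This is a minimal illustration of the standard Gale and Shapley procedure with quotas; the function name, the data layout (preference lists as Python lists, best first), and the three-student example are assumptions made for the sketch rather than anything specified in the paper.

```python
def deferred_acceptance(arm_prefs, agent_prefs, quotas):
    """Arm-proposing DA: returns {agent: [arms held at termination, best first]}."""
    # rank[agent][arm] is the position of arm in the agent's preference list.
    rank = {p: {a: r for r, a in enumerate(prefs)} for p, prefs in agent_prefs.items()}
    next_choice = {a: 0 for a in arm_prefs}  # index of the next agent to propose to
    held = {p: [] for p in agent_prefs}      # tentative acceptances
    free = list(arm_prefs)
    while free:
        arm = free.pop()
        prefs = arm_prefs[arm]
        if next_choice[arm] >= len(prefs):
            continue  # the arm has exhausted its list and stays unmatched
        agent = prefs[next_choice[arm]]
        next_choice[arm] += 1
        if arm not in rank[agent]:
            free.append(arm)  # the agent finds the arm unacceptable
            continue
        held[agent].append(arm)
        held[agent].sort(key=lambda a: rank[agent][a])
        if len(held[agent]) > quotas[agent]:
            free.append(held[agent].pop())  # permanently reject the least preferred arm
    return held

# Hypothetical example: two colleges with quota 1 and three students.
arm_prefs = {"A1": ["P1", "P2"], "A2": ["P1", "P2"], "A3": ["P1", "P2"]}
agent_prefs = {"P1": ["A1", "A2", "A3"], "P2": ["A2", "A1", "A3"]}
matching = deferred_acceptance(arm_prefs, agent_prefs, {"P1": 1, "P2": 1})
```

Here acceptance is tentative until the loop ends, which is exactly the feature that decentralized matching lacks: `A2` is held by `P1` for a while and is released only when `A1` proposes.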
There exists a formal connection between multi-stage decentralized matching and the DA algorithm; see, e.g., Adachi (2003) and Hitsch et al. (2010). Specifically, suppose the market only allows one-to-one matching, such as in the marriage market, and there is no restriction on the number of stages. Then, as agents and arms become more and more patient, the set of equilibria obtainable in multi-stage decentralized matching is the same as the set of stable matches in the model of Gale and Shapley (1962).
However, multi-stage decentralized matching is different from DA in practice. The main reason is that the acceptance is not tentative (i.e., non-deferrable) in decentralized matching. Moreover, there is usually a restriction on the number of stages in decentralized matching. For example, the time cost at each stage of multi-stage decentralized matching is not negligible. Proposition 8 shows that agents and arms are collectively better off under DA compared to decentralized matching. However, we show in an example in Section A.3 that some agents are better off by using strategic admission decisions in decentralized markets. This observation gives a partial explanation of the prevalence of decentralized college admissions in many countries.
In this section, we study Algorithm 1 numerically in a simulated graduate school admission example. We provide extensive numerical comparisons of Algorithm 1 with other methods in the Supplementary Appendix.
Suppose there are 50 graduate schools from three tiers of colleges: five top colleges $\{P_1, \ldots, P_5\}$, ten good colleges $\{P_6, \ldots, P_{15}\}$, and 35 other colleges $\{P_{16}, \ldots, P_{50}\}$. Each has the same quota $q = 5$ and penalty $\gamma = 2.5$. The simulation generates students' preferences with ten different states $\{s_1, \ldots, s_{10}\} \subset [0, 1]$. The number of students varies over $\{250, 260, 270, 280, 290, 300\}$. For each number of students, ten students have scores $v_j$ drawn i.i.d. uniformly from [0.[...], 1], 100 students have scores $v_j$ drawn i.i.d. uniformly from [0.[...], [...]], and the remaining students have scores $v_j$ randomly chosen from [0, [...]]. The fits $e_{ij}$ for all college-student pairs are drawn uniformly and i.i.d. from $[0, 1]$. We compare Algorithm 1 with the simple cutoff strategy, where the latter method allows each college to choose the most preferred students up to the remaining quota at each stage. The training
data are simulated from colleges' random proposing, in which each college pulls a random number of arms according to the latent utilities. We run 20 rounds of random proposing under each of the arms' preference structures. This training data simulates admission over 20 years. The testing data draws a random state from $\{s_1, \ldots, s_{10}\}$, which gives the corresponding arms' preferences. The LUB-CDM estimates the lower uncertainty bound using Eq. (16). Figure 3 reports the average payoffs of three colleges $P_1$, $P_6$, and $P_{16}$ over 500 data replications. Here colleges $P_1$, $P_6$, and $P_{16}$ belong to the three different tiers, respectively. In Figure 3, all colleges other than the one being evaluated use the LUB-CDM with mean calibration, while the evaluated college uses one of the two methods: the LUB-CDM or the simple cutoff strategy. The LUB-CDM gives the largest average payoffs for all of $P_1$, $P_6$, and $P_{16}$. In particular, the LUB-CDM performs significantly better than the simple cutoff strategy for two of the three colleges.
Figure 3: Performance of the proposed Algorithm 1 (i.e., LUB-CDM) and the simple cutoff strategy with varying numbers of students (horizontal axis: number of students, from 250 to 300; vertical axis: college's payoff). The results are averaged over 500 data replications. (a): College $P_1$ from tier 1. (b): College $P_6$ from tier 2. (c): College $P_{16}$ from tier 3.
We study admission data from the New York Times "The Choice" blog (available at https://thechoice.blogs.nytimes.com/category/admissions-data). In this dataset, 37 U.S. colleges reported their admission yields and waiting list offers for 2015–17 applicants. As we discussed in Section 2.2, a college's yield is a proxy for the state $s_{i,k}$, as it indicates the college's popularity. The set of 37 colleges consists of liberal arts colleges, national universities, and other undergraduate programs. We excluded two colleges, Harvard and Yale, from the sample due to a significant proportion of missing values.
Figure 4:
Regression of the yield on the size of the admitted class and on the ranking, respectively. We fit the dashed curves using smoothing splines with the tuning parameter chosen by GCV. The numeric label of each point indicates the college's ranking according to U.S. News and World Report, where two (or more) colleges might tie in the ranking, and liberal arts colleges, national universities, and other undergraduate programs are ranked separately within their categories. Gray and black points denote colleges with insignificant and significant p-values, respectively, in chi-squared tests under an FDR control.
We test whether the yields of colleges changed over 2015–17. The null hypothesis is that the state is the same. We use a simultaneous chi-squared test for all colleges with the count data on accepted and enrolled students, under an FDR control at the 0.05 significance level (Benjamini and Hochberg, 1995). There are 13 colleges with significant p-values; see the details in Tables 3 and 4 of the Supplementary Appendix A.4. Figure 4 shows that colleges with large numbers of admitted students are likely to have significantly varied yields. Moreover, top-ranked national universities and liberal arts colleges are likely to have significantly varied yields. This observation corroborates the uncertainty in applicants' preferences facing colleges.
Figure 5: Regression of the uncertainty level on the size of the admitted class and on the ranking, respectively. The two dashed curves are fitted using smoothing splines with the tuning parameter chosen by GCV.
We estimate the uncertainty level $\delta_{i,k}(v)\, \pi^{-1}_{i,k}(s_{i,k}, v)$ defined in Eq. (19) and study colleges' strategic responses. While conclusive evidence on individual students' acceptance probabilities is difficult to obtain, we estimate the college-wise uncertainty on the yield: $\sqrt{\mathrm{Var}(s_{i,k})}\, s^{-1}_{i,k}$. Since the choice set for admitted students differs across years, the yield's uncertainty underestimates the uncertainty facing a college. Two colleges, College of the Holy Cross and Emory University, are excluded from the sample due to their extreme uncertainty levels and the change in accounting methods (Che and Koh, 2016). We show in the Supplementary Appendix A.4 that the hierarchical structure of arms is evident in this example. Figure 5 shows that colleges' uncertainty levels are much smaller than one, which, together with Theorem 3, implies that students face limited unfairness. In particular, the yield uncertainty is robust to the size of the admitted class; see the left plot of Figure 5. On the other hand, top-ranked national universities may have higher uncertainty levels; see the right plot of Figure 5, where the outlier is the University of Chicago at the 0.19 uncertainty level. We verify the higher uncertainty level for top universities using the waiting list data. We perform Fisher's exact test for the rank data on the difference of rates of accepted waiting list students to total enrolled students over 2015–16. This statistic reflects the uncertainty on both the regular admission yield and the wait-listed students' quality. We reject the null hypothesis that the uncertainty of acceptance is the same for all national universities at the 0.05 significance level. The higher uncertainty for top-ranked national universities may arise due to the intense competition. Those universities are better off by employing strategic admission to reduce the enrollment uncertainty. This result implies that students are more likely to experience unfairness when applying to top national universities.
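Two of the computations used in this analysis are simple enough to sketch directly: the Benjamini and Hochberg step-up rule for FDR control and the college-wise yield uncertainty $\sqrt{\mathrm{Var}(s_{i,k})}\, s^{-1}_{i,k}$. The code below is an illustrative sketch, not the authors' analysis script. The p-values and yields are made-up numbers, and two modeling choices are assumptions: the variance is the sample variance with divisor $n - 1$, and the yield in the denominator is taken to be the mean yield across years.

```python
import math

def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of the hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k_max = rank  # largest rank whose p-value is under its threshold
    return sorted(order[:k_max])

def yield_uncertainty(yields):
    """sqrt(Var(s)) / mean(s) for one college's yields across years."""
    mean = sum(yields) / len(yields)
    var = sum((y - mean) ** 2 for y in yields) / (len(yields) - 1)
    return math.sqrt(var) / mean

# Hypothetical per-college p-values from chi-squared tests of equal yield across years.
p_values = [0.001, 0.03, 0.035, 0.20]
rejected = benjamini_hochberg(p_values, alpha=0.05)

# Hypothetical yields of one college over three years.
level = yield_uncertainty([0.30, 0.32, 0.28])
```

In this example the second p-value (0.03) exceeds its own threshold (0.025) but is still rejected, because the third p-value (0.035) falls under its threshold (0.0375). This is the step-up feature of the procedure.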
This paper develops a nonparametric statistical model to learn optimal strategies in multi-stage decentralized matching markets. The model provides insight into the interplay between learning and economic objectives in decentralized matching markets. In the model, arms have uncertain preferences that depend on the unknown state of the world and competition among the agents. We propose an algorithm, built upon the ideas of lower uncertainty bound and calibrated decentralized matching, for learning optimal strategies using historical data. We find that agents act strategically in favor of arms with low uncertainty levels of acceptance. The strategic targeting improves an agent's welfare but leads to unfairness for arms. Our theory suggests several signaling mechanisms to better clear the congestion in multi-stage decentralized markets. It also allows analytical comparisons between single-stage decentralized markets and centralized markets. We have not studied algorithmic strategies when agents' preferences show complementarities or indifference. For instance, firms may demand workers that complement one another in terms of their skills and roles, or some applicants may be indistinguishable to a firm. We leave these questions for future work.
A Supplementary Numerical Results
A.1 Comparison with the Straightforward Strategy
In this example, we compare the proposed Algorithm 1 with the straightforward strategy, where the latter method pulls arms according to the latent utility defined in Eq. (1) and calibrates the state in the same way as Algorithm 1.
Suppose there are $n$ arms $\mathcal A = \{A_1, A_2, \ldots, A_n\}$ and three agents $\mathcal P = \{P_1, P_2, P_3\}$, where each agent has a quota $q < n/3$. There are two equally likely states: $s_a$ and $s_b$ with $s_a = 1 - s_b > 1/2$. All arms prefer $P_1$ and $P_2$ to $P_3$, but the arms prefer $P_3$ to being unmatched. Agents $P_1$ and $P_2$ evaluate each arm based on the score $v$, and with probability $p^* \in (0, 1)$, $P_1$ and $P_2$ find an arm unacceptable. Agent $P_3$ evaluates each arm only based on the score. For each state $j \in \{a, b\}$, a fraction $s_j$ of arms receives utility $u_1$ when matched to $P_1$ and utility $u_2$ when matched to $P_2$, where $u_1 > u_2$, and the remaining fraction $(1 - s_j)$ of arms receives the opposite utilities. Hence, $P_1$ is more popular under the state $s_a$ and $P_2$ is more popular under the state $s_b$. In each state, an arm gets utility $u_3$ from $P_3$, where $(1 - p^*) u_1 < u_3 < u_2$. This condition implies that an arm is better off by accepting $P_3$ than waiting for $P_1$ or $P_2$. We consider a two-stage matching, where at the first stage, each agent pulls a set of arms and wait-lists other arms. An arm pulled by an agent must accept or reject the agent immediately.
Proposition A.9.
Agent $P_1$ is better off by using Algorithm 1 than by using the straightforward strategy, where the expected payoff is improved by $O(\eta_{1,1})$. Here $\eta_{1,1}$ is the regularization parameter defined in Theorem 1.
Figure 6: Comparison of the proposed Algorithm 1 (i.e., LUB-CDM) and the straightforward strategy (horizontal axis in both panels: regularization parameter). The results are averaged over 500 data replications. (a) The relative increase of $P_1$'s payoff when $P_1$ changes from the straightforward strategy to the LUB-CDM, where the improvement is $O(\eta_{1,1})$. (b) The relative decrease of $P_2$'s payoff when $P_1$ changes from the straightforward strategy to the LUB-CDM.
To illustrate the improvement, we consider the states $s_a = 0.6$, $s_b = 0.4$, the number of arms $n = 100$, the quota $q = 10$, the utilities $u_1 = 1$, $u_2 = 0.9$, $u_3 = 0.8$, and the probability $p^* = 0.3$. Suppose that the score $v$ follows a deterministic uniform design with points [...], and that the penalties are $\gamma_1 = \gamma_2 = \gamma_3 = 5$. We compare the proposed Algorithm 1 (i.e., LUB-CDM) with the straightforward strategy (i.e., CDM). The latter method is a straightforward strategy, as it pulls arms according to the latent utilities in Eq. (1) without strategic behaviors. Figure 6 reports the relative changes in the payoffs of $P_1$ and $P_2$ when $P_1$ changes from using the CDM to using the LUB-CDM. The results are averaged over 500 data replications. Here $P_1$ using the LUB-CDM and the CDM correspond to $\eta_{1,1} > 0$ and $\eta_{1,1} = 0$, respectively. Agent $P_2$ uses the CDM. The LUB-CDM improves $P_1$'s expected payoff, and the improvement comes at the cost of $P_2$'s payoff.
A.2 Comparison with the Patient Strategy in Adachi's Model
In this example, we compare the proposed Algorithm 1 with the patient strategy, where the latter method pulls arms according to the latent utility at the beginning stage but has more strategic behaviors as the matching proceeds. We consider a search model due to Adachi (2003) that captures the search process in matching markets and builds a connection between the multi-stage decentralized matching markets and the centralized matching markets; see the discussions in Section 5.2.

Suppose there are $n$ arms $\mathcal{A} = \{A_1, A_2, \ldots, A_n\}$ and $m$ agents $\mathcal{P} = \{P_1, P_2, \ldots, P_m\}$, where each agent has quota $q = 1$. At each stage, each agent comes across a randomly sampled arm. Let $v_{\mathcal{P}}(i)$ and $v_{\mathcal{A}}(j)$ be the reservation utilities of agent $P_i$ and arm $A_j$ from staying unmatched and continuing the search. Recall the latent utility $U_i(A_j)$ in Section 2.2. Similarly, we define $U_j(P_i)$ as the utility that arm $A_j$ receives when matched to $P_i$. Hence $\{P_i \text{ pulls } A_j\} = \{U_i(A_j) \ge v_{\mathcal{P}}(i)\}$ and $\{A_j \text{ accepts } P_i\} = \{U_j(P_i) \ge v_{\mathcal{A}}(j)\}$. The utility that agent $P_i$ gets upon coming across arm $A_j$ is

$$\bar{U}_i(A_j) = U_i(A_j)\,\mathbf{1}\{U_i(A_j) \ge v_{\mathcal{P}}(i)\}\,\mathbf{1}\{U_j(P_i) \ge v_{\mathcal{A}}(j)\} + v_{\mathcal{P}}(i)\left[1 - \mathbf{1}\{U_i(A_j) \ge v_{\mathcal{P}}(i)\}\,\mathbf{1}\{U_j(P_i) \ge v_{\mathcal{A}}(j)\}\right],$$

where the first term on the right-hand side is the utility from a successful match and the second term is the utility when no match occurs. Adachi's model involves a stage discount factor $\rho > 0$, where the Bellman equations for the optimal reservation values and search rules are

$$v_{\mathcal{P}}(i) = \rho \int \bar{U}_i(A_j)\, dF_{\mathcal{A}}(j) \quad \text{and} \quad v_{\mathcal{A}}(j) = \rho \int \bar{U}_j(P_i)\, dF_{\mathcal{P}}(i), \tag{A.21}$$

where $F_{\mathcal{A}}$ and $F_{\mathcal{P}}$ are the distributions of the arms and agents that each agent and each arm, respectively, comes across. Adachi (2003) shows that the Bellman equations in Eq. (A.21) define an iterative mapping that converges to the equilibrium reservation utilities $(v^*_{\mathcal{P}}(i), v^*_{\mathcal{A}}(j))$. Furthermore, as $\rho \to 1$, the Bellman equations lead to matching outcomes that are stable in the sense of Gale and Shapley (1962).
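The iterative mapping behind Eq. (A.21) can be illustrated with a small sketch. It is not the authors' implementation: it assumes finitely many agents and arms, uniform meeting distributions $F_{\mathcal{A}}$ and $F_{\mathcal{P}}$, a starting point of zero reservation utilities, and hypothetical function names (`bellman_update`, `solve_reservation`). The loop simply stops when successive iterates agree or the iteration budget runs out.

```python
def bellman_update(U_P, U_A, v_P, v_A, rho):
    """Apply the map of Eq. (A.21) once, assuming uniform meeting distributions.

    U_P[i][j]: utility of agent i from arm j; U_A[j][i]: utility of arm j from agent i.
    """
    m, n = len(U_P), len(U_A)
    new_v_P, new_v_A = [], []
    for i in range(m):
        total = 0.0
        for j in range(n):
            # A match needs the agent to pull and the arm to accept.
            match = U_P[i][j] >= v_P[i] and U_A[j][i] >= v_A[j]
            total += U_P[i][j] if match else v_P[i]
        new_v_P.append(rho * total / n)
    for j in range(n):
        total = 0.0
        for i in range(m):
            match = U_P[i][j] >= v_P[i] and U_A[j][i] >= v_A[j]
            total += U_A[j][i] if match else v_A[j]
        new_v_A.append(rho * total / m)
    return new_v_P, new_v_A


def solve_reservation(U_P, U_A, rho, max_iter=1000, tol=1e-10):
    """Iterate the update from zero until iterates agree within tol (or budget ends)."""
    v_P = [0.0] * len(U_P)
    v_A = [0.0] * len(U_A)
    for _ in range(max_iter):
        new_v_P, new_v_A = bellman_update(U_P, U_A, v_P, v_A, rho)
        gap = max(
            max(abs(a - b) for a, b in zip(new_v_P, v_P)),
            max(abs(a - b) for a, b in zip(new_v_A, v_A)),
        )
        v_P, v_A = new_v_P, new_v_A
        if gap < tol:
            break
    return v_P, v_A
```

For a single agent and a single arm with utilities 10 and 8 and $\rho = 0.5$, the iteration settles at reservation utilities 5 and 4, since both sides keep accepting each other.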
[Figure 7 plots: (a) LUB-CDM, reservation utility versus stage; (b) patient strategy, reservation utility versus stage; (c) agent's payoff versus number of arms (axis ticks from 200 to 1000) for the LUB-CDM and the patient strategy.]

Figure 7: Performance of the proposed Algorithm 1 (i.e., LUB-CDM) and the patient strategy. The results are averaged over 500 data replications. (a) $P_1$'s reservation utility under the LUB-CDM given by 50 − k ), where the stage k = 1 , . . . , $P_1$'s reservation utility under the patient strategy given by $50 + 5\log\left(\frac{N+1-k}{N}\right)$. (c) $P_1$'s payoffs with varying number of arms.

Since the equilibrium reservation utilities $(v^*_{\mathcal{P}}(i), v^*_{\mathcal{A}}(j))$ are unknown in practice, agents need to learn an optimal strategy for choosing the reservation utility $v_{\mathcal{P}}(i)$ at different stages. We compare the proposed Algorithm 1 (i.e., LUB-CDM) with the patient strategy, where the latter is defined as the strategy with $\rho = 1$ at the beginning stage $k = 1$ and decreasing $\rho$ as the matching proceeds in Eq. (A.21). Note that the LUB-CDM has fewer strategic behaviors as the matching proceeds. Hence it corresponds to the case that $v_{\mathcal{P}}(i)$ is a convex function of the stages. On the other hand, the patient strategy has more strategic behaviors as the matching proceeds. Hence it corresponds to the case that $v_{\mathcal{P}}(i)$ is a concave function of the stages. Suppose that different arms receive the same utility for matching the same agent, that is, $U_j(P_i) = U_{j'}(P_i)$, $\forall j \ne j'$, where this utility is unknown to $P_i$. Similarly, different agents receive the same utility for matching the same arm, that is, $U_i(A_j) = U_{i'}(A_j)$, $\forall i \ne i'$, where this utility is known to $A_j$. Then $P_i$ matches with $A_j$ if the event $\{U_j(P_i) \ge U_i(A_j) \ge v_{\mathcal{P}}(i)\}$ holds. Suppose that agent $P_1$'s utility is $U_j(P_1) = 40$, and m = n ∈ { , , . . . , }. Let the reservation utility $v_{\mathcal{P}}(1)$ at the stage $k$ be 50 − k ) and $50 + 5\log\left(\frac{N+1-k}{N}\right)$ for the LUB-CDM and the patient strategy, respectively; see Figure 7(a) and (b). Figure 7(c) reports $P_1$'s payoff under the two methods, where the LUB-CDM outperforms the patient strategy. Therefore, the strategic behavior at early stages improves the agent's payoff in practice, a result that corroborates Theorem 1.

A.3 Comparison of Multi-Stage Matching and DA
In this example, we compare the multi-stage decentralized matching with the DA algorithm (Gale and Shapley, 1962).

Suppose there are four arms $\mathcal{A} = \{A_1, A_2, A_3, A_4\}$ and three agents $\mathcal{P} = \{P_1, P_2, P_3\}$. Agents have varied quotas: q = 2 and q = q = 1. Arms' attributes are given by v = v = v = 2 , v = 1, and e = e = e = 0 , e = e = e = 0 . , e = e = e = 1, e = 0 . e = 0 . e = 0 . 8. The latent utilities and arms' true preferences are shown in Table 1. For the decentralized matching, suppose that at each stage, every agent uses the straightforward strategy by pulling its most preferred arms up to the quota. Arms accept their most preferred agent (if any) or wait until the next stage. Then the decentralized matching has the outcome ( A , P ) , ( A , P ) , ( A , P ) , ( A , P ). On the other hand, the DA algorithm gives the outcome ( A , P ) , ( A , P ) , ( A , P ) , ( A , P ), which is the unique stable matching outcome. Here both P and P strictly prefer the decentralized matching outcome to the DA outcome. This result agrees with the remark in Section 5.2 that some agents are better off under the decentralized matching.

Second, we study the incentive of agents in the multi-stage decentralized matching. We show that it is not a dominant strategy for each agent to use the straightforward strategy by pulling arms according to the latent utility. For example, consider the preferences in Table 1. If P skips over A and first pulls A , and other agents pull their most preferred arms up to their quotas, then the decentralized matching has the outcome ( A , P ) , ( A , P ) , ( A , P ) , ( A , P ), where P is strictly better off compared to the outcome when P first pulls A .

Table 1: (a) Arms' latent utilities for each agent, which correspond to Eq. (1). (b) Arms' preferences, with the number indicating the arms' ranking of agents. For example, A ranks P first, P second, P third. These preferences are unknown to agents.

[Table 1 body: panels (a) "Arm's latent utility" and (b) "Arm's preference", with columns for the four arms and rows for the three agents.]

Table 2: (a) Arms' latent utilities for each agent. (b) Arms' preferences, with the number indicating the arms' ranking of agents. For example, A ranks P first, P second, P third, P fourth.

[Table 2 body: panels (a) "Arm's latent utility" and (b) "Arm's preference".]

( A , P ) , ( A , P ) , ( A , P ) , ( A , P ). However, suppose arms are strategic, where A rejects P as P is A 's least favorite agent and A believes the coming agent will not be worse. The outcome becomes ( A , P ) , ( A , P ) , ( A , P ) , ( A , P ). Hence A is strictly better off. Besides, if A also rejects { P , P } as they are A 's two least favorite agents, the decentralized matching gives the outcome ( A , P ) , ( A , P ) , ( A , P ) , ( A , P ). Hence A and A are both strictly better off. Moreover, suppose there is a coordination mechanism among arms such that each arm only accepts the most preferred agent. The decentralized matching gives the outcome ( A , P ) , ( A , P ) , ( A , P ) , ( A , P ), which is the arm-optimal stable matching.

A.4 Supplementary Results for the Real Data Application

We give supplementary results to the real data analysis in Section 7.
A.4.1 Chi-Squared Test with FDR Control
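This subsection applies the step-up procedure of Benjamini and Hochberg (1995) to 35 p-values. As a hedged sketch of that procedure (the function name `benjamini_hochberg` is ours, and the inputs are the p-values as printed in Tables 3 and 4, so rounding may differ from the values used in the actual analysis):

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected under BH at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: find the largest rank whose p-value is below rank * alpha / m.
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# p-values as printed in Table 3 (13 colleges) and Table 4 (22 colleges).
table3 = [0.0013, 0.0012, 0.0003, 2.20e-16, 2.20e-16, 0.0022, 0.0065,
          8.31e-12, 2.50e-06, 2.20e-06, 0.0001, 2.31e-11, 0.0008]
table4 = [0.8994, 0.6159, 0.0798, 0.0584, 0.4988, 0.2227, 0.9512, 0.2217,
          0.4727, 0.6872, 0.0309, 0.1799, 0.8012, 0.8719, 0.5317, 0.0285,
          0.6511, 0.0587, 0.4438, 0.0277, 0.3665, 0.7576]

rejected = benjamini_hochberg(table3 + table4, alpha=0.05)
num_rejected = sum(rejected)
```

With these inputs the sketch rejects exactly the 13 hypotheses from Table 3. It also shows why p-values such as .0277 in Table 4 are not significant: the threshold at rank 14 of 35 is $14 \times 0.05 / 35 = 0.02$.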
We show the simultaneous chi-squared tests for 35 colleges for the null hypothesis that the yield does not change over the three years 2015–17. We apply the false discovery rate (FDR) control at the 0.05 significance level (Benjamini and Hochberg, 1995). Tables 3 and 4 report the 13 colleges with significant $p$-values and the 22 colleges with insignificant $p$-values, respectively.

Table 3: The 13 chi-squared tests with significant $p$-values under the FDR control at the 0.05 significance level. Colleges' ranking data are from U.S. News and World Report. "Y/N" means that the use of the waiting list varied during 2015–17.

| College | p-value | Category | Ranking | Waiting list |
|---|---|---|---|---|
| Boston University | .0013 | National University | 40 | Yes |
| Brown University | .0012 | National University | 14 | No |
| Claremont McKenna College | .0003 | Liberal Arts College | 7 | Y/N |
| College of Holy Cross | 2.20E-16 | Liberal Arts College | 27 | Yes |
| Emory University | 2.20E-16 | National University | 21 | Yes |
| Georgia Tech | .0022 | National University | 29 | Yes |
| Middlebury College | .0065 | Liberal Arts College | 7 | Y/N |
| Princeton University | 8.31E-12 | National University | 1 | Yes |
| Stanford University | 2.50E-06 | National University | 6 | Y/N |
| University of Chicago | 2.20E-06 | National University | 6 | Y/N |
| University of Rochester | .0001 | National University | 29 | Y/N |
| USC | 2.31E-11 | National University | 22 | No |
| University of Wisconsin | .0008 | National University | 46 | Y/N |

A.4.2 Hierarchical Structure
We present evidence on the hierarchical structure, in the sense that students who were invited to the waiting list and remain available at a later stage are likely to be far worse than the admitted students at the regular admission stage. The report of NACAC (2019) shows that the admission rate of the waiting list is significantly lower than that of regular admission. The top students in a college's waiting list, uncertain about their rankings in the list and whether the college would admit them later, may have accepted offers from their less preferred colleges. We calculate the admission rate of the waiting list as follows:

$$\frac{\text{the number of offers sent to wait-listed students}}{\text{the total number of students invited to the waiting list}}.$$

Figure 8 reports that the majority ( >

Table 4: The 22 chi-squared tests with insignificant $p$-values under the FDR control at the 0.05 significance level. Colleges' ranking data are from U.S. News and World Report. "Y/N" means that the use of the waiting list varied during 2015–17.

| College | p-value | Category | Ranking | Waiting list |
|---|---|---|---|---|
| Babson College | .8994 | Other Program | 31 | Yes |
| Barnard College | .6159 | Liberal Arts College | 25 | Yes |
| Bates College | .0798 | Liberal Arts College | 21 | Yes |
| CalTech | .0584 | National University | 12 | Y/N |
| Carnegie Mellon University | .4988 | National University | 25 | Yes |
| College of William&Mary | .2227 | National University | 40 | Yes |
| Cooper Union | .9512 | Other Program | 3 | Yes |
| Dartmouth College | .2217 | National University | 12 | Y/N |
| Dickinson College | .4727 | Liberal Arts College | 46 | Y/N |
| Elon University | .6872 | National University | 84 | Y/N |
| George Washington University | .0309 | National University | 70 | Yes |
| Johns Hopkins University | .1799 | National University | 10 | Yes |
| Kenyon College | .8012 | Liberal Arts College | 27 | Yes |
| Lafayette College | .8719 | Liberal Arts College | 39 | Yes |
| Olin College of Engineering | .5317 | Other Program | 5 | Y/N |
| Rensselaer Polytech | .0285 | National University | 50 | Y/N |
| Scripps College | .6511 | Liberal Arts College | 33 | Y/N |
| St. Lawrence University | .0587 | Liberal Arts College | 58 | Yes |
| University of Maryland | .4438 | National University | 64 | Y/N |
| University of Michigan | .0277 | National University | 25 | Y/N |
| University of Pennsylvania | .3665 | National University | 6 | Y/N |
| Vanderbilt University | .7576 | National University | 15 | Y/N |

Figure 8: The regression uses smoothing splines with the tuning by GCV.
B Proofs
B.1 Proof of Theorem 1
Proof.
We introduce additional notation. Let $\mathcal{V}_{i,k}(s^*_{i,k}, \mathcal{B}_{i,k})$ be the expected utility of arms from $\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}$ for agent $P_i$ at stage $k \in [K]$. That is,

$$\mathcal{V}_{i,k}(s^*_{i,k}, \mathcal{B}_{i,k}) \equiv \sum_{j \in \mathcal{B}_{i,k}} (v_j + e_{ij})\, \pi_{i,k}(s^*_{i,k}, v_j).$$

Let $\mathcal{N}_{i,k}(s^*_{i,k}, \mathcal{B}_{i,k})$ be the expected number of arms in $\mathcal{B}_{i,k}$ accepting $P_i$. That is,

$$\mathcal{N}_{i,k}(s^*_{i,k}, \mathcal{B}_{i,k}) \equiv \sum_{j \in \mathcal{B}_{i,k}} \pi_{i,k}(s^*_{i,k}, v_j).$$

By Lagrangian duality, the optimization of $L_{i,k}[\mathcal{B}_{i,k}]$ in Eq. (5) can be reformulated in the constrained form:

$$\max_{\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}} \underbrace{\Big\{ \mathcal{V}_{i,k}(s^*_{i,k}, \mathcal{B}_{i,k}) - \gamma_i \max\big\{\mathcal{N}_{i,k}(s^*_{i,k}, \mathcal{B}_{i,k}) + \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) - q_i,\ 0\big\} \Big\}}_{I_1}, \quad \text{s.t.} \quad \underbrace{\mathrm{UE}(\mathcal{B}_{i,k}) \ge \eta'_{i,k}}_{I_2},$$

where $\eta'_{i,k} > 0$ for $k \le K - 1$, and $\eta'_{i,K} = 0$.

The constraint $I_2$ can be written as

$$\mathcal{V}_{i,k}(s^*_{i,k}, \mathcal{B}_{i,k}) \le \mathcal{V}_{i,k}(s^*_{i,k}, \mathcal{B}^*_{i,k}) - \eta'_{i,k}, \quad \forall s^*_{i,k}, \tag{B.22}$$

where $\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}$. Since $\pi_{i,k}(\cdot, \cdot)$ is assumed to belong to an RKHS, $\pi_{i,k}(\cdot, \cdot)$ is bounded (Wahba, 1990). By Hoeffding's bound, with probability at least $1 - e^{-\epsilon}$, $\forall \epsilon > 0$,

$$\mathcal{V}_{i,k}(s^*_{i,k}, \mathcal{B}_{i,k}) < E_{s^*_{i,k}}[\mathcal{V}_{i,k}(s^*_{i,k}, \mathcal{B}_{i,k})] + \sqrt{\epsilon \sum_{j \in \mathcal{B}_{i,k}} \delta^2_{i,k}(v_j)(v_j + e_{ij})^2} < E_{s^*_{i,k}}[\mathcal{V}_{i,k}(s^*_{i,k}, \mathcal{B}_{i,k})] + \sqrt{\epsilon} \sum_{j \in \mathcal{B}_{i,k}} \delta_{i,k}(v_j)(v_j + e_{ij}).$$

Hence a sufficient condition for Eq. (B.22) is to control

$$\sum_{j \in \mathcal{B}_{i,k}} \delta_{i,k}(v_j)(v_j + e_{ij}) < \eta''_{i,k}, \quad \text{for } \mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}. \tag{B.23}$$

Here $\eta''_{i,k} > 0$ for $k \le K - 1$. Both $I_1$ and Eq. (B.23) are convex, and so by Lagrangian duality, they can be reformulated in the penalized form of finding $\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}$ to maximize

$$\sum_{j \in \mathcal{B}_{i,k}} (v_j + e_{ij})[\pi_{i,k}(s_{i,k}, v_j) - \eta_{i,k} \delta_{i,k}(v_j)] - \gamma_i \max\big\{\mathcal{N}_{i,k}(s^*_{i,k}, \mathcal{B}_{i,k}) + \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) - q_i,\ 0\big\},$$

where $\eta_{i,k} > 0$ for $k \le K - 1$ and $\eta_{i,K} = 0$. This completes the proof.

B.2 Proof of Theorem 2
Proof.
We divide the proof of this theorem into five steps.
Step 1. We show that the optimal strategy prefers an arm with higher fit given the same score. Suppose that arms $A_j, A_{j'} \in \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}$ have the same score $v_j = v_{j'}$, but $A_j$ has a worse fit than $A_{j'}$ to agent $P_i$. Now assume that $A_j$ was pulled by $P_i$ at stage $k$ but $A_{j'}$ was not, that is, $A_j \in \widehat{\mathcal{B}}_{i,k}(s_{i,k})$ and $A_{j'} \notin \widehat{\mathcal{B}}_{i,k}(s_{i,k})$. Then the expected number of arms accepting $P_i$ is unchanged if $P_i$ replaces $A_j$ with $A_{j'}$ in $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$. On the other hand, since the loss function in Eq. (8) is strictly decreasing in the fit $e_{ij}$, $P_i$ should pull $A_{j'}$ instead of $A_j$. This argument holds regardless of the strategies of other agents.

Step 2. We show that the cutoff curve $\widehat{e}_{i,k}(s_{i,k}, v)$ in Eq. (10) is well-defined. If the boundary $\{\mathcal{B}^+_{i,k}(s_{i,k}) \setminus \mathcal{B}^-_{i,k}(s_{i,k})\}$ is not empty, then $P_i$ pulling an arm $A_j$ on the boundary yields the loss $L^\dagger_{i,k}[A_j] \le 0$, which justifies the condition specified by Eq. (9). Since $\widehat{e}_{i,k}(s_{i,k}, v) \in [0, 1]$, the cutoff curve is well-defined.

Step 3. We show that $\widehat{e}_{i,k}(s_{i,k}, v)$ is the unique optimal cutoff curve. Let $\widetilde{e}_{i,k}(s_{i,k}, v) \in [0, 1]$ be another cutoff curve, and consider the mixed strategy

$$\sigma_{i,k}(s_{i,k}, v, e_i; t) \equiv t \cdot \mathbf{1}\{e_i \ge \widetilde{e}_{i,k}(s_{i,k}, v)\} + (1 - t) \cdot \mathbf{1}\{e_i \ge \widehat{e}_{i,k}(s_{i,k}, v)\}, \quad \text{for } t \in [0, 1].$$

The corresponding loss of the mixed strategy $\sigma_{i,k}$ is

$$\bar{L}_{i,k}(t) = \sum_{j \in \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}} (v_j + e_{ij})[\eta_{i,k} \delta_{i,k}(v_j) - \pi_{i,k}(s_{i,k}, v_j)]\, \sigma_{i,k}(s_{i,k}, v_j, e_{ij}; t) + \gamma_i \max\Big\{ \sum_{j \in \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}} \pi_{i,k}(s_{i,k}, v_j)\, \sigma_{i,k}(s_{i,k}, v_j, e_{ij}; t) + \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) - q_i,\ 0 \Big\}.$$

It is clear that $\bar{L}_{i,k}(t)$ is convex in $t$. We discuss the local change $d\bar{L}_{i,k}(0)/dt$ in three cases.

Case (I): Consider removing a single arm from $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$. If the arm is from the non-empty boundary $\{\mathcal{B}^+_{i,k}(s_{i,k}) \setminus \mathcal{B}^-_{i,k}(s_{i,k})\}$, the condition specified by Eq. (9) implies that the loss $\bar{L}_{i,k}(t)$ increases if the arm is not pulled. Moreover, by construction, any other arm $A_j$ in $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$ satisfies

$$(v_j + e_{ij})[\pi_{i,k}(s^*_{i,k}, v_j) - \eta_{i,k} \delta_{i,k}(v_j)] > \widehat{b}_{i,k}(s_{i,k})\, \pi_{i,k}(s^*_{i,k}, v_j) \ge \gamma_i \sum_{j' \in \widehat{\mathcal{B}}_{i,k}(s_{i,k})} \pi_{i,k}(s_{i,k}, v_{j'}) - \gamma_i [q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l})].$$

Hence, removing $A_j$ from $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$ results in a strict increase in $\bar{L}_{i,k}(t)$. We have $d\bar{L}_{i,k}(0)/dt > 0$. By the convexity of $\bar{L}_{i,k}(t)$ in $t$, we obtain

$$\bar{L}_{i,k}(1) \ge \bar{L}_{i,k}(0) + \frac{d\bar{L}_{i,k}(0)}{dt}(1 - 0) > \bar{L}_{i,k}(0).$$

Case (II): Consider adding a new arm with attributes $\{v_{j'}, e_{ij'}\}$ to $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$, where the new arm is not from the set $\mathcal{B}^+_{i,k}(s_{i,k})$. Denote by $\mathcal{B}'_{i,k}(s_{i,k})$ the new arm set with the added arm. Note that $P_i$ pulls a new arm only if the arm reduces the loss $\bar{L}_{i,k}(t)$, that is,

$$(v_{j'} + e_{ij'})[\pi_{i,k}(s_{i,k}, v_{j'}) - \eta_{i,k} \delta_{i,k}(v_{j'})] \ge \gamma_i \sum_{j \in \mathcal{B}'_{i,k}(s_{i,k})} \pi_{i,k}(s_{i,k}, v_j) - \gamma_i [q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l})]. \tag{B.24}$$

Since the added new arm is not in $\mathcal{B}^+_{i,k}(s_{i,k})$ and $\sum_{j \in \mathcal{B}^+_{i,k}(s_{i,k})} \pi_{i,k}(s_{i,k}, v_j) \ge q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l})$, we have

$$\sum_{j \in \mathcal{B}'_{i,k}(s_{i,k})} \pi_{i,k}(s_{i,k}, v_j) - [q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l})] \ge \sum_{j \in \mathcal{B}'_{i,k}(s_{i,k})} \pi_{i,k}(s_{i,k}, v_j) - \sum_{j \in \mathcal{B}^+_{i,k}(s_{i,k})} \pi_{i,k}(s_{i,k}, v_j) \ge \pi_{i,k}(s_{i,k}, v_{j'}) \ge \pi_{i,k}(s_{i,k}, v_{j'}) - \eta_{i,k} \delta_{i,k}(v_{j'}). \tag{B.25}$$

Because $\gamma_i > \sup_{j \in \mathcal{A}} \{v_j + e_{ij}\}$ and $\eta_{i,k} \ge 0$, the result in Eq. (B.25) contradicts Eq. (B.24). Hence, adding a new arm to $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$ results in an increase in the loss $\bar{L}_{i,k}(t)$, so that $d\bar{L}_{i,k}(0)/dt > 0$. By the convexity of $\bar{L}_{i,k}(t)$ in $t$, we obtain

$$\bar{L}_{i,k}(1) \ge \bar{L}_{i,k}(0) + \frac{d\bar{L}_{i,k}(0)}{dt}(1 - 0) > \bar{L}_{i,k}(0).$$

Case (III): Consider removing an arm with attributes $(v_j, e_{ij})$ from $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$ and simultaneously adding new arms to $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$. Suppose that the new arms have attributes $(v_{j''}, e_{ij''})$ and are from $\mathcal{B}''_{i,k}(s_{i,k})$. If $\widehat{\mathcal{B}}_{i,k}(s_{i,k}) = \mathcal{B}^-_{i,k}(s_{i,k})$, then the new arms are not in $\mathcal{B}^-_{i,k}(s_{i,k})$ and by definition,

$$(v_{j''} + e_{ij''})[\pi_{i,k}(s_{i,k}, v_{j''}) - \eta_{i,k} \delta_{i,k}(v_{j''})]\, \pi^{-1}_{i,k}(s_{i,k}, v_{j''}) \le \min_{j \in \mathcal{B}^-_{i,k}(s_{i,k})} \big\{ (v_j + e_{ij})[\pi_{i,k}(s_{i,k}, v_j) - \eta_{i,k} \delta_{i,k}(v_j)]\, \pi^{-1}_{i,k}(s_{i,k}, v_j) \big\}.$$

Hence,

$$L^\dagger_i[\mathcal{B}^-_{i,k}(s_{i,k})] - \bar{L}_{i,k}(1) \le \sum_{j'' \in \mathcal{B}''_{i,k}(s_{i,k})} (v_{j''} + e_{ij''})[\pi_{i,k}(s_{i,k}, v_{j''}) - \eta_{i,k} \delta_{i,k}(v_{j''})]\, \pi^{-1}_{i,k}(s_{i,k}, v_{j''}) \cdot \pi_{i,k}(s_{i,k}, v_{j''})$$
$$\le \Big[ \min_{j \in \mathcal{B}^-_{i,k}(s_{i,k})} (v_j + e_{ij})\big(1 - \eta_{i,k} \pi^{-1}_{i,k}(s_{i,k}, v_j) \delta_{i,k}(v_j)\big) \Big] \cdot \Big[ q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) - \sum_{j \in \mathcal{B}^-_{i,k}(s_{i,k})} \pi_{i,k}(s_{i,k}, v_j) \Big] = \mathrm{UE}^\dagger. \tag{B.26}$$

If $\widehat{\mathcal{B}}_{i,k}(s_{i,k}) = \mathcal{B}^+_{i,k}(s_{i,k})$, then by the definition of $\mathcal{B}^+_{i,k}(s_{i,k})$,

$$L^\dagger_i[\mathcal{B}^+_{i,k}(s_{i,k})] - \bar{L}_{i,k}(1) \le L^\dagger_i[\mathcal{B}^-_{i,k}(s_{i,k})] - \bar{L}_{i,k}(1) \le \mathrm{UE}^\dagger,$$

where the last inequality is by Eq. (B.26). Hence,

$$\bar{L}_{i,k}(0) - \bar{L}_{i,k}(1) \le \mathrm{UE}^\dagger.$$

Therefore, exchanging an arm in $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$ with arms not in $\widehat{\mathcal{B}}_{i,k}(s_{i,k})$ can at best result in a decrease in the loss $\bar{L}_{i,k}(t)$ by $\mathrm{UE}^\dagger$.

Combining the cases (I), (II), and (III), we obtain that

$$L^\dagger_i[\widehat{\mathcal{B}}_{i,k}(s_{i,k})] \le \min_{\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}} L^\dagger_i[\mathcal{B}_{i,k}] + \mathrm{UE}^\dagger.$$

Step 4. We prove the other direction of the inequality. Since $\widehat{\mathcal{B}}_{i,k}(s_{i,k}) \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}$,

$$L^\dagger_i[\widehat{\mathcal{B}}_{i,k}(s_{i,k})] \ge \min_{\mathcal{B}_{i,k} \subseteq \{\mathcal{A}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\}} L^\dagger_i[\mathcal{B}_{i,k}].$$

Step 5. If there is a continuum of arms and $\pi_{i,k}(\cdot, v)$ is continuous in $v$, then there exists $b_{i,k} \ge 0$ such that $\Pi_{i,k}(b_{i,k}) = q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l})$, where $\Pi_{i,k}(b_{i,k})$ is defined in Section 3.1.3:

$$\Pi_{i,k}(b_{i,k}) = \sum_{j \in \mathcal{A}} \mathbf{1}\Big( e_{ij} \ge \min\big\{ \max\big\{ b_{i,k}[1 - \eta_{i,k} \delta_{i,k}(v_j) \pi^{-1}_{i,k}(s_{i,k}, v_j)]^{-1} - v_j,\ 0 \big\},\ 1 \big\} \Big)\, \pi_{i,k}(s_{i,k}, v_j).$$

Therefore, by definition, $\widehat{\mathcal{B}}_{i,k}(s_{i,k}) = \mathcal{B}^+_{i,k}(s_{i,k}) = \mathcal{B}^-_{i,k}(s_{i,k})$, and

$$q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) - \sum_{j \in \mathcal{B}^-_{i,k}(s_{i,k})} \pi_{i,k}(s_{i,k}, v_j) = 0.$$

Hence $\mathrm{UE}^\dagger = 0$. This completes the proof.

B.3 Proof of Proposition 1
Proof.
We follow the proof arguments for Theorem 4 in Dai and Jordan (2020). The only difference is that here we define the following penalized expected utility and the expected number of arms:

$$\mathcal{V}_{i,k}(s^*_{i,k}, \widehat{\mathcal{B}}_{i,k}) \equiv \sum_{j \in \widehat{\mathcal{B}}_{i,k}} (v_j + e_{ij})[\pi_{i,k}(s^*_{i,k}, v_j) - \eta_{i,k} \delta_{i,k}(v_j)], \qquad \mathcal{N}_{i,k}(s^*_{i,k}, \widehat{\mathcal{B}}_{i,k}) \equiv \sum_{j \in \widehat{\mathcal{B}}_{i,k}} \pi_{i,k}(s^*_{i,k}, v_j).$$

We omit the details for simplicity.
B.4 Proof of Proposition 2
Proof.
Here the proof follows from Theorem 5 in Dai and Jordan (2020).
B.5 Proof of Proposition 3
Proof.
Recall the cutoff parameter $\widehat{b}_{i,k}(s_{i,k})$ defined in Eq. (10). Similarly, we define a cutoff parameter $b'_{i,k}(s_{i,k})$ for the linear cutoff $e'_{i,k}(s_{i,k}, v) = \min\{\max\{b'_{i,k}(s_{i,k}) - v, 0\}, 1\}$ in three steps. First, we define

$$\Pi'_{i,k}(b_{i,k}) \equiv \sum_{j \in \mathcal{A}} \mathbf{1}\big( e_{ij} \ge \min\{\max\{b_{i,k} - v_j, 0\}, 1\} \big)\, \pi_{i,k}(s_{i,k}, v_j).$$

If there exists $b_{i,k} \ge 0$ such that $\Pi'_{i,k}(b_{i,k}) = q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l})$, we let $b'_{i,k}(s_{i,k}) = b_{i,k}$. Second, if there is no solution to $\Pi'_{i,k}(b_{i,k}) = q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l})$, we let

$$b^+_{i,k}(s_{i,k}) = \arg\max_{b_{i,k} \ge 0} \big\{ \Pi'_{i,k}(b_{i,k}) > q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) \big\}, \qquad b^-_{i,k}(s_{i,k}) = \arg\min_{b_{i,k} \ge 0} \big\{ \Pi'_{i,k}(b_{i,k}) < q_i - \mathrm{card}(\cup_{l \le k-1} \mathcal{C}_{i,l}) \big\}.$$

Define $e^+_{i,k}(s_{i,k}, v) \equiv \min\{\max\{b^+_{i,k}(s_{i,k}) - v, 0\}, 1\}$ and $e^-_{i,k}(s_{i,k}, v) \equiv \min\{\max\{b^-_{i,k}(s_{i,k}) - v, 0\}, 1\}$, which correspond to the sets $\mathcal{B}^+_{i,k}(s_{i,k}) = \{j \mid e_{ij} \ge e^+_{i,k}(s_{i,k}, v_j)\}$ and $\mathcal{B}^-_{i,k}(s_{i,k}) = \{j \mid e_{ij} \ge e^-_{i,k}(s_{i,k}, v_j)\}$, respectively. Consider the following condition for the arms on the boundary $\{\mathcal{B}^+_{i,k}(s_{i,k}) \setminus \mathcal{B}^-_{i,k}(s_{i,k})\}$:

$$\sum_{j \in \mathcal{B}^+_{i,k}(s_{i,k}) \setminus \mathcal{B}^-_{i,k}(s_{i,k})} (v_j + e_{ij})\, \pi_{i,k}(s_{i,k}, v_j) \ge \gamma_i \sum_{j \in \mathcal{B}^+_{i,k}(s_{i,k})} \pi_{i,k}(s_{i,k}, v_j) - \gamma_i q_i.$$

If the above condition holds, let $b'_{i,k}(s_{i,k}) = b^+_{i,k}(s_{i,k})$; otherwise, let $b'_{i,k}(s_{i,k}) = b^-_{i,k}(s_{i,k})$. Third, we let the linear cutoff be $e'_{i,k}(s_{i,k}, v) = \min\{\max\{b'_{i,k}(s_{i,k}) - v, 0\}, 1\}$. Now for any $\eta_{i,k} \ge 0$ and $s_{i,k}$, the set of arms that have justified envy is

$$V(\eta_{i,k}, s_{i,k}) = \left\{ (v, e_i) \,\middle|\, \frac{\widehat{b}_{i,k}(s_{i,k})}{1 - \eta_{i,k} \delta_{i,k}(v) \pi^{-1}_{i,k}(s_{i,k}, v)} > v + e_i > b'_{i,k}(s_{i,k}) \right\}.$$

Hence the probability that an arm with attributes $(v, e)$ has justified envy is increasing in

$$\frac{\widehat{b}_{i,k}(s_{i,k})}{1 - \eta_{i,k} \delta_{i,k}(v) \pi^{-1}_{i,k}(s_{i,k}, v)}. \tag{B.27}$$

Since (B.27) is strictly increasing in the arm's uncertainty level $\delta_{i,k}(v) \pi^{-1}_{i,k}(s_{i,k}, v)$, the probability that an arm has justified envy is strictly increasing in the arm's uncertainty level.

B.6 Proof of Proposition 4
Proof.
The uncertainty level can be written as

$$\frac{\delta_{i,k}(v)}{\pi_{i,k}(s_{i,k}, v)} = \frac{\pi_{i,k}(s^{(2)}_{i,k}, v) - \pi_{i,k}(s^{(1)}_{i,k}, v)}{\pi_{i,k}(s_{i,k}, v)} = \begin{cases} \pi_{i,k}(s^{(2)}_{i,k}, v) / \pi_{i,k}(s^{(1)}_{i,k}, v) - 1 & \text{if } s_{i,k} = s^{(1)}_{i,k}, \\ 1 - \pi_{i,k}(s^{(1)}_{i,k}, v) / \pi_{i,k}(s^{(2)}_{i,k}, v) & \text{if } s_{i,k} = s^{(2)}_{i,k}. \end{cases}$$

Hence, the uncertainty level is strictly decreasing in the ratio $\pi_{i,k}(s^{(1)}_{i,k}, v) / \pi_{i,k}(s^{(2)}_{i,k}, v) < 1$. On the other hand, the condition (20) states that $\pi_{i,k}(s^{(1)}_{i,k}, v) / \pi_{i,k}(s^{(2)}_{i,k}, v)$ is strictly decreasing in $v \in [0, 1]$. Hence $\delta_{i,k}(v) \pi^{-1}_{i,k}(s_{i,k}, v)$ is strictly increasing in $v \in [0, 1]$.

B.7 Proof of Proposition 5
Proof.
Adopting the proof in Section B.5, we note that the number of arms having justified envy is

$$\sum_{j \in \{\mathcal{A}^{T+1}_k \setminus \cup_{l \le k-1} \mathcal{B}_{i,l}\} \cap V(\eta_i, s)} \left[ \frac{\widehat{b}_{i,k}(s_{i,k})}{1 - \eta_{i,k} \delta_{i,k}(v_j) \pi^{-1}_{i,k}(s_{i,k}, v_j)} - b'_{i,k}(s_{i,k}) \right]. \tag{B.28}$$

The term in the bracket of Eq. (B.28), i.e.,

$$\frac{\widehat{b}_{i,k}(s_{i,k})}{1 - \eta_{i,k} \delta_{i,k}(v_j) \pi^{-1}_{i,k}(s_{i,k}, v_j)} - b'_{i,k}(s_{i,k}),$$

is strictly increasing in $\eta_{i,k}$. Hence the number of arms having justified envy is strictly increasing in $\eta_{i,k} \ge 0$. This completes the proof.

B.8 Proof of Proposition 6
Proof.
We show the improved welfare for agents by construction. Consider the strategy of an agent, for example, $P_i$ with $i \in [m]$. Suppose that $P_i$ pulls arms at the first stage of multi-stage matching using the strategy that $P_i$ would have used in single-stage matching. All arms that would have accepted $P_i$ in single-stage matching accept $P_i$, because arms have incomplete information on what other offers will arrive in later stages. Hence, $P_i$ can do at least as well as its payoff from single-stage matching. Therefore, agents benefit from multi-stage matching.

B.9 Proof of Proposition 7
Proof.
Recall that the mean calibration chooses s i as the solution of P ( s ∗ i,k (cid:54) = s i,k ) (cid:88) j ∈ ∂ (cid:98) B i,k ( s i,k ) ( v j + e ij ) E s ∗ i,k (cid:2) π i,k ( s ∗ i,k , v j ) − η i,k δ i,k ( v j ) | s ∗ i,k (cid:54) = s i,k (cid:3) = γ i [1 − F s ∗ i,k ( s i,k )] (cid:88) j ∈ ∂ (cid:98) B i,k ( s i,k ) E s ∗ i,k [ π i,k ( s ∗ i,k , v j ) | s i,k < s ∗ i,k ≤ , where the left side of the equation is strictly decreasing in η i,k ≥ s i,k ∈ (0 , s i,k is strictly increasingin η i,k . Similar argument applies to the minimax calibration. B.10 Proof of Proposition 8
Proof.
The fairness and Pareto efficiency properties are well-known (Balinski and Sönmez, 1999; Abdulkadiroğlu and Sönmez, 2003; Roth, 2008). The dominant strategy property is also a well-known property of DA, in the sense that no arm can be better off by misrepresenting its preferences given that other agents and arms submit their preferences (Dubins and Freedman, 1981; Roth, 1982).
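For reference, the DA procedure whose properties are cited here can be sketched as follows. This is a minimal arm-proposing version with agent quotas, written under our own assumptions: the function name `deferred_acceptance` and the preference lists in the example are hypothetical and are not taken from the paper's tables.

```python
def deferred_acceptance(arm_prefs, agent_prefs, quotas):
    """Arm-proposing deferred acceptance with agent quotas.

    arm_prefs[a]: agents acceptable to arm a, most preferred first.
    agent_prefs[p]: arms acceptable to agent p, most preferred first.
    quotas[p]: maximum number of arms agent p can hold.
    """
    rank = {p: {a: r for r, a in enumerate(prefs)} for p, prefs in agent_prefs.items()}
    next_choice = {a: 0 for a in arm_prefs}
    held = {p: [] for p in agent_prefs}
    free = list(arm_prefs)
    while free:
        arm = free.pop()
        if next_choice[arm] >= len(arm_prefs[arm]):
            continue  # the arm has exhausted its list and stays unmatched
        agent = arm_prefs[arm][next_choice[arm]]
        next_choice[arm] += 1
        if arm not in rank[agent]:
            free.append(arm)  # the agent finds this arm unacceptable
            continue
        held[agent].append(arm)
        if len(held[agent]) > quotas[agent]:
            # Over quota: release the least preferred arm currently held.
            worst = max(held[agent], key=lambda a: rank[agent][a])
            held[agent].remove(worst)
            free.append(worst)
    return {p: sorted(arms) for p, arms in held.items()}


# Hypothetical market: four arms, three agents, quotas (2, 1, 1).
arm_prefs = {
    "A1": ["P1", "P2", "P3"],
    "A2": ["P1", "P2", "P3"],
    "A3": ["P1", "P3", "P2"],
    "A4": ["P2", "P1", "P3"],
}
agent_prefs = {p: ["A1", "A2", "A3", "A4"] for p in ("P1", "P2", "P3")}
quotas = {"P1": 2, "P2": 1, "P3": 1}
matching = deferred_acceptance(arm_prefs, agent_prefs, quotas)
```

In this hypothetical market the sketch matches $P_1$ with $A_1$ and $A_2$, $P_2$ with $A_4$, and $P_3$ with $A_3$; arm $A_3$ is displaced from $P_1$ once the quota is exceeded and then proposes to its next choice.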
B.11 Proof of Proposition A.9
Proof.
First, we consider the matching outcome of the straightforward strategy of pulling arms according to the latent utilities. Suppose that agents $P_1$ and $P_2$ use the CDM algorithm, which is a straightforward strategy and calibrates the uncertain state in the same way as LUB-CDM (Dai and Jordan, 2020). The mean calibration in Proposition 1 calibrates the state parameters as $s = s_a$ for $P_1$ and $s = s_b$ for $P_2$. We note that maximin calibration in Proposition 2 gives the same calibrations in this example. Thus, $P_1$ and $P_2$ pull the same set of arms at the first stage, where the arms' scores satisfy $v \ge \widetilde{v}$ and the cutoff $\widetilde{v}$ satisfies

$$\sum_{j \in \mathcal{A}} \mathbf{1}(v_j \ge \widetilde{v}) \cdot s_a \cdot (1 - p^*) = q, \quad \text{and} \quad \sum_{j \in \mathcal{A}} \mathbf{1}(v_j \ge \widetilde{v}) \cdot (1 - s_b) \cdot (1 - p^*) = q. \tag{B.29}$$

Here the boundary arm set is assumed to be empty in Eq. (B.29).

Figure 9: Cutoffs at the two stages.

Next, we consider $P_3$'s strategy. Arms with scores worse than $\widetilde{v}$ will accept $P_3$: if they accept $P_3$, they get $u_3$ for sure, but if they reject $P_3$, they will at best be pulled by $P_1$ or $P_2$ with probability $(1 - p^*)$ and get utility at most $u_1$, and $u_3 > (1 - p^*) u_1$. Suppose now that $P_3$ pulls arms with the score $v \ge \widehat{v}$, where $\widehat{v} < \widetilde{v}$. By Eq. (B.29), there are in total $(p^*)\, q\, [s_a (1 - p^*)]^{-1}$ arms with $v \ge \widetilde{v}$ that are not pulled by $P_1$ or $P_2$, and they will accept $P_3$. Thus, we can quantify $\widehat{v}$ by letting it satisfy

$$\sum_{j \in \mathcal{A}} \mathbf{1}(\widehat{v} \le v_j < \widetilde{v}) = q \left[ 1 - \frac{(p^*)}{s_a (1 - p^*)} \right].$$

See an illustration of the cutoffs in Figure 9. Then we analyze $P_1$'s expected payoff under the CDM. If the true state is $s_b$, $P_1$ does not fill its capacity during the first stage and needs to pull more arms at the second stage. Suppose that $P_1$ pulls arms with $v \in [\check{v}, \widehat{v})$ at the second stage, where $\check{v}$ satisfies

$$\sum_{j \in \mathcal{A}} \mathbf{1}(\check{v} \le v_j < \widehat{v}) \cdot (1 - p^*) = q - \sum_{j \in \mathcal{A}} \mathbf{1}(v_j \ge \widetilde{v}) \cdot s_b \cdot (1 - p^*).$$
(B.30)Hence, P ’s expected payoff by using CDM is U CDM1 = 12 (1 − p ∗ ) (cid:88) v j ≥ (cid:101) v v j + (cid:88) ˇ v ≤ v j < (cid:98) v v j . (B.31)46e then consider the matching outcome of the LUB-CDM algorithm. Suppose that P uses the LUB-CDM while P still uses the CDM. By Theorem 2, P pulls arms accordingto the ranking of the following quantity: v j (cid:20) − η , · δ , ( v j ) s (cid:21) = v j (cid:20) − η , · s a − s b s (cid:21) , (B.32)where δ , ( v ) = ( s a − s b ) in this example and η , ≥ s = s a for P . Then P pulls the arms with the score v ∈ [ (cid:101) v − κ (cid:48) , (cid:101) v ) ∪ { v ≥ (cid:101) v + κ } andrejects those with v ∈ [ (cid:101) v, (cid:101) v + κ ). Here the boundary arm set is assumed to be empty, and κ, κ (cid:48) satisfy (cid:88) j ∈A ( (cid:101) v − κ (cid:48) ≤ v j < (cid:101) v ) = (cid:88) j ∈A ( (cid:101) v ≤ v j < (cid:101) v + κ ) · s a . (B.33)By Eq. (B.32), κ and κ (cid:48) also need to satisfy that (cid:101) v − κ (cid:48) = ( (cid:101) v + κ ) (cid:20) − η , · s a − s b s a (cid:21) . (B.34)Then we analyze P ’s expected payoff by using the LUB-CDM. If the true state is s b , P needs to pull more arms at the second stage. Since the second stage is the last stage andby Theorem 1, it is optimal for P to choose η , = 0, where the LUB-CDM coincides withthe CDM. Suppose that P pulls arms with v ∈ [¯ v, (cid:98) v ) at the second stage, where ¯ v satisfies (cid:88) j ∈A (¯ v ≤ v j < (cid:98) v ) · (1 − p ∗ )= q − (cid:88) j ∈A ( v j ≥ (cid:101) v + κ ) · s b · (1 − p ∗ ) − (cid:88) j ∈A ( (cid:101) v − κ (cid:48) ≤ v j < (cid:101) v ) · (1 − p ∗ ) . (B.35)Subtracting Eq. (B.35) from Eq. (B.30), we obtain that (cid:88) j ∈A (ˇ v ≤ v j < ¯ v ) = (cid:88) j ∈A ( (cid:101) v − κ (cid:48) ≤ v j < (cid:101) v ) − (cid:88) j ∈A ( (cid:101) v ≤ v j < (cid:101) v + κ ) · s b = (cid:88) j ∈A ( (cid:101) v ≤ v j < (cid:101) v + κ ) · ( s a − s b ) > . (B.36)where the second equality is by Eq. (B.33). 
Thus, ¯ v > ˇ v , and the P ’s expected payoff byusing the LUB-CDM is U LUB-CDM1 = (1 − p ∗ ) (cid:88) (cid:101) v − κ (cid:48) ≤ v j < (cid:101) v v j + 12 (1 − p ∗ ) (cid:88) v j ≥ (cid:101) v + κ v j + (cid:88) ¯ v ≤ v j < (cid:98) v v j . (B.37)47e now comparing the two expected payoffs in Eqs. (B.37) and (B.31), respective. Bytaking the difference, we have U LUB-CDM1 − U
CDM1 = (1 − p ∗ ) (cid:88) (cid:101) v − κ (cid:48) ≤ v j < (cid:101) v v j −
12 (1 − p ∗ ) (cid:88) (cid:101) v ≤ v j < (cid:101) v + κ v j + (cid:88) ˇ v ≤ v j < ¯ v v j > (1 − p ∗ )( (cid:101) v − κ (cid:48) ) (cid:88) j ∈A ( (cid:101) v − κ (cid:48) ≤ v j < (cid:101) v ) − ( (cid:101) v + κ ) (cid:88) j ∈A ( (cid:101) v ≤ v < (cid:101) v + κ ) − ¯ v (cid:88) j ∈A (ˇ v ≤ v < ¯ v )= [( (cid:101) v − ¯ v )( s a − s b ) − (2 κ (cid:48) s a + κ )] (cid:88) j ∈A ( (cid:101) v ≤ v j < (cid:101) v + κ )= { ( s a − s b ) [(1 − η , ) (cid:101) v − ¯ v ] + [2 s a − η , ( s a − s b ) − κ } (cid:88) j ∈A ( (cid:101) v ≤ v j < (cid:101) v + κ ) , (B.38)where the second equality is due to Eqs. (B.33) and (B.36), and the last equality is byEq. (B.34). For sufficiently small κ and η , , we have U LUB-CDM1 > U CDM1 . Last, we quantify the improvement of the expected payoff. From Eq. (B.33), κ satisfiesthat (cid:88) j ∈A (cid:18) ( (cid:101) v + κ ) (cid:18) − η , s a − s b s a (cid:19) ≤ v j < (cid:101) v (cid:19) = (cid:88) j ∈A ( (cid:101) v ≤ v j < (cid:101) v + κ ) · s a . Suppose that v j is uniformly distributed, we have a first-order approximation of the aboveequations: (cid:101) v − ( (cid:101) v + κ ) (cid:18) − η , s a − s b s a (cid:19) = κs a , which implies that κ = (cid:101) v ( s a − s b ) η , s a (1 + s a ) − ( s a − s b ) η , = O ( η , )Plugging it to Eq. (B.38) suggests that a sufficient condition for U LUB-CDM1 > U CDM1 is η , < s a s a − s b · (1 + s a )( s a − s b )( (cid:101) v − ¯ v )( s a − s b )( (cid:101) v − ¯ v ) + (2 s a + 1) (cid:101) v . (B.39)By the condition that κ (cid:48) >
0, we have η , < s a s a − s b (1 + s a − (cid:101) v ) . (B.40)48nder Eqs. (B.39) and (B.40), and noting that (cid:80) j ∈A ( (cid:101) v ≤ v j < (cid:101) v + κ ) = O ( κ ) = O ( η , ),we have U LUB-CDM1 − U
CDM1 = O ( η , ) . This completes the proof.
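The first-order solve for $\kappa$ used in this proof can be checked numerically. The sketch below solves the linearized relation $\widetilde{v} - (\widetilde{v} + \kappa)(1 - \eta (s_a - s_b)/s_a) = \kappa s_a$ for $\kappa$; the function name `kappa_first_order` and the numerical values of $\widetilde{v}$, $s_a$, $s_b$, and $\eta$ are illustrative assumptions, not values from the paper.

```python
def kappa_first_order(v_tilde, s_a, s_b, eta):
    """Solve v_tilde - (v_tilde + kappa) * (1 - c) = kappa * s_a for kappa,
    where c = eta * (s_a - s_b) / s_a."""
    c = eta * (s_a - s_b) / s_a
    return v_tilde * c / (1.0 + s_a - c)


# Illustrative (assumed) values.
v_tilde, s_a, s_b, eta = 0.8, 0.6, 0.4, 0.1
kappa = kappa_first_order(v_tilde, s_a, s_b, eta)

# Residual of the linearized equation at the solution.
c = eta * (s_a - s_b) / s_a
residual = v_tilde - (v_tilde + kappa) * (1.0 - c) - kappa * s_a

# Closed form stated in the proof.
closed_form = v_tilde * (s_a - s_b) * eta / (s_a * (1.0 + s_a) - (s_a - s_b) * eta)

# Slope of kappa in eta near zero, which gives the O(eta) rate.
slope = kappa_first_order(v_tilde, s_a, s_b, 1e-6) / 1e-6
```

The solution agrees with the closed form $\kappa = \widetilde{v}(s_a - s_b)\eta / [s_a(1 + s_a) - (s_a - s_b)\eta]$, and for small $\eta$ the ratio $\kappa / \eta$ approaches $\widetilde{v}(s_a - s_b) / [s_a(1 + s_a)]$, which is the sense in which $\kappa = O(\eta)$.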
References
Abdulkadiroğlu, A., Che, Y.-K., and Yasuda, Y. (2015). Expanding "choice" in school choice. American Economic Journal: Microeconomics, 7(1):1–42.
Abdulkadiroğlu, A. and Sönmez, T. (2003). School choice: A mechanism design approach. American Economic Review, 93(3):729–747.
Adachi, H. (2003). A search model of two-sided matching under nontransferable utility. Journal of Economic Theory, 113(2):182–198.
Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404.
Ashlagi, I., Braverman, M., Kanoria, Y., and Shi, P. (2020). Clearing matching markets efficiently: Informative signals and match recommendations. Management Science, 66(5):2163–2193.
Avery, C., Fairbanks, A., and Zeckhauser, R. J. (2003). The Early Admissions Game: Joining the Elite. Harvard University Press, Cambridge, MA.
Avery, C. and Levin, J. (2010). Early admissions at selective colleges. American Economic Review, 100(5):2125–56.
Azevedo, E. M. and Leshno, J. D. (2016). A supply and demand framework for two-sided matching markets. Journal of Political Economy, 124(5):1235–1268.
Balinski, M. and Sönmez, T. (1999). A tale of two mechanisms: Student placement. Journal of Economic Theory, 84(1):73–94.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300.
Bubeck, S. and Cesa-Bianchi, N. (2012). Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122.
Chade, H., Lewis, G., and Smith, L. (2014). Student portfolios and the college admissions problem. Review of Economic Studies, 81(3):971–1002.
Chade, H. and Smith, L. (2006). Simultaneous search. Econometrica, 74(5):1293–1307.
Che, Y.-K. and Koh, Y. (2016). Decentralized college admissions. Journal of Political Economy, 124(5):1295–1338.
Chiappori, P.-A. and Salanié, B. (2016). The econometrics of matching models. Journal of Economic Literature, 54(3):832–61.
Choo, E. and Siow, A. (2006). Who marries whom and why. Journal of Political Economy, 114(1):175–201.
Coles, P., Cawley, J., Levine, P. B., Niederle, M., Roth, A. E., and Siegfried, J. J. (2010). The job market for new economists: A market design perspective. Journal of Economic Perspectives, 24(4):187–206.
Coles, P., Kushnir, A., and Niederle, M. (2013). Preference signaling in matching markets. American Economic Journal: Microeconomics, 5(2):99–134.
Crawford, V. P. and Sobel, J. (1982). Strategic information transmission. Econometrica, 50(6):1431–1451.
Dai, X. and Jordan, M. I. (2020). Learning the strategy in decentralized matching markets under uncertain preferences. arXiv preprint arXiv:2011.00159.
Das, S. and Kamenica, E. (2005). Two-sided bandits and the dating market. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, volume 5, page 19, New York. AAAI Press.
Diamantoudi, E., Miyagawa, E., and Xue, L. (2015). Decentralized matching: The role of commitment. Games and Economic Behavior, 92:1–17.
Dubins, L. E. and Freedman, D. A. (1981). Machiavelli and the Gale-Shapley algorithm. The American Mathematical Monthly, 88(7):485–494.
Epple, D., Romano, R., and Sieg, H. (2006). Admission, tuition, and financial aid policies in the market for higher education. Econometrica, 74(4):885–928.
Fu, C. (2014). Equilibrium tuition, applications, admissions, and enrollment in the college market. Journal of Political Economy, 122(2):225–281.
Gale, D. and Shapley, L. S. (1962). College admissions and the stability of marriage. The American Mathematical Monthly, 69(1):9–15.
Hafalir, I. E., Hakimov, R., Kübler, D., and Kurino, M. (2018). College admissions with entrance exams: Centralized versus decentralized. Journal of Economic Theory, 176:886–934.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media, New York.
Hitsch, G. J., Hortaçsu, A., and Ariely, D. (2010). Matching and sorting in online dating. American Economic Review, 100(1):130–63.
Hoppe, H. C., Moldovanu, B., and Sela, A. (2009). The theory of assortative matching based on costly signals. The Review of Economic Studies, 76(1):253–281.
Lee, R. S. and Schwarz, M. (2009). Interviewing in two-sided matching markets. Technical report, National Bureau of Economic Research.
Menzel, K. (2015). Large matching markets as two-sided demand systems. Econometrica, 83(3):897–941.
Montgomery, J. D. (1991). Equilibrium wage dispersion and interindustry wage differentials. The Quarterly Journal of Economics, 106(1):163–179.
NACAC (2019). 2019 state of college admission.
Niederle, M. and Yariv, L. (2009). Decentralized matching with aligned preferences. Technical report, National Bureau of Economic Research.
Peters, M. (1991). Ex ante price offers in matching games non-steady states. Econometrica, 59(5):1425–1454.
Roth, A. E. (1982). The economics of matching: Stability and incentives. Mathematics of Operations Research, 7(4):617–628.
Roth, A. E. (1984). The evolution of the labor market for medical interns and residents: A case study in game theory. Journal of Political Economy, 92(6):991–1016.
Roth, A. E. (2008). Deferred acceptance algorithms: History, theory, practice, and open questions. International Journal of Game Theory, 36(3-4):537–569.
Roth, A. E. and Sotomayor, M. (1990). Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis, volume 18. Econometric Society Monographs, Cambridge University Press, Cambridge.
Roth, A. E. and Xing, X. (1997). Turnaround time and bottlenecks in market clearing: Decentralized matching in the market for clinical psychologists. Journal of Political Economy, 105(2):284–329.
Savage, L. J. (1972). The Foundations of Statistics. Dover Publications, Inc., New York.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis, volume 26. Chapman and Hall, London.
Spence, M. (1973). Job market signaling. The Quarterly Journal of Economics, 87(3):355–374.
Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia, PA.
Wahba, G., Wang, Y., Gu, C., Klein, R., and Klein, B. (1995). Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy.