Bandits in Matching Markets: Ideas and Proposals for Peer Lending
Centralized borrower and lender matching under uncertainty for P2P lending
Soumajyoti Sarkar*
Arizona State University, Tempe, USA

ABSTRACT
Motivated by recent applications of sequential decision making in matching markets, in this paper we attempt to formulate and abstract market designs for P2P lending. We describe a paradigm that sets the stage for how peer-to-peer investments can be conceived from a matching market perspective, especially when both borrower and lender preferences are respected. We model these specialized markets as an optimization problem and consider different utilities for agents on both sides of the market, while also examining the impact of equitable allocations to borrowers. We devise a technique based on sequential decision making that allows the lenders to adjust their choices over time in response to the uncertainty arising from competition, which also affects the rewards they receive in return for their investments. Using simulated experiments, we show the dynamics of the regret relative to the optimal borrower-lender matching and find that the lender regret depends on the initial preferences set by the lenders, which could affect their learning over the decision making steps.
KEYWORDS economics, markets, multi-armed bandits, machine learning
ACM Reference Format:
Soumajyoti Sarkar. 2018. Centralized borrower and lender matching under uncertainty for P2P lending. In
Woodstock ’18: ACM Symposium on Neural Gaze Detection, June 03–05, 2018, Woodstock, NY.
ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/1122445.1122456
Sequential decision making in two-sided markets, such as those between consumers and producers, has long been part of bidding on e-commerce platforms like eBay and eBid. P2P platforms like Prosper also allowed lenders to bid on projects for peer microlending in the past, until they switched to a posted-price mechanism [3]. However, for most peer microlending platforms, such as Kiva and LendingClub, sequential decision making is either not available or limited in its functionality, while investor funding cycles generally have a monopoly on who they fund (see footnote 1). In this paper, we therefore attempt to abstract the concept of two-sided markets for peer lending, where we consider participants on either side of the market as agents. We consider a centralized platform for peer lending where both sides have a chance to log their own preferences prior to the start of transactions. This option of two-sided preferences is generally not available in peer lending, either because of the dynamic nature of the loan/project postings or because lenders/investors generally have a limited budget at any given point in time. We consider an agent on the borrowing side to be a single person or a company raising money, while an agent on the lending side could be an individual investor or a venture capital organization.

A centralized platform that tackles these issues could ensure that transactions between borrowers and lenders are based not only on the money that an investor is willing to put in and the investor's preferences, but also on the borrower's willingness to accept the investment (which could depend on the lender's terms or on the borrower's assessment of the investor profile). As an added benefit, it also allows for mitigating biases that can be implicit in such platforms [14]. Since we consider the matching for a group of projects to be done prior to the start of the projects and in the same time period, as is the situation for most crowdsourcing or for-profit lending platforms, the implicit nature of matching causes competition among agents on both sides: the borrowers try to raise money from the same set of investors, while the investors try to invest in the selected projects.

* Work done while the author was at Arizona State University.
To resolve the conflict in this competition, we introduce sequential decision making into the matching process, which allows agents on the lending side to revise their choices in order to get matched with the least regret and in line with their interests over time. This kind of competition in markets has been studied recently [11] in a setting where the agent preferences on one side are concealed from the other, so that sequential decisions come into play for revising preferences over time. As mentioned, such P2P platforms have two sides to the market: the borrowers, who want to borrow money from others for their projects or startups, and the lenders, who lend money to borrowers. Traditionally, these sides have engaged in two-sided trading following the Dutch auction mechanism [8, 16].

For the rest of the paper, we lay the foundations of our ongoing work, which demonstrates a way to address competition and fair play in such peer lending platforms with ideas from matching markets [12]. We discuss some choices that could be made in formulating the utilities on both sides, the mechanism for sequential decision making over rounds, and finally the tradeoff between the preference revisions and the utilities for the agents, which are also tied together in some ways. Throughout the paper, we assume that agents are not strategic and therefore that their preference submissions are honest.

Footnote 1: https://siliconhillslawyer.com/2019/03/03/standard-term-sheets-problem-yc/
Our model of lending from a market-matching perspective is very close to the Shapley-Shubik model of bilateral trade with indivisible goods [15], in which there is a set of buyers or bidders (the lenders in our case) and a set of sellers each selling one unit of a good (the borrowers in our case), and no lender wants more than one unit of the good. The monetary value that a buyer assigns to a seller's good corresponds, in our case, to the amount of money that a lender is willing to lend to a borrower's posting, regardless of the borrower's project funding requirement (which is generally more than an individual lender can contribute).

In terms of the market design for two-sided lending, three aspects control how transactions are performed in the centralized version of our matching design:

• Utilities: Traditionally, most lending platforms have focused on static objectives that maximize the returns of a lender's portfolio. What is often missing from these static objectives is the concept of utility-driven returns. That is, the influence of the lenders on the success of the borrower projects extends beyond the monetary transaction to the overlap with the lender's interests, the lender's contributions in terms of certain externalities, and the egalitarian aspect of market sharing in terms of fairness. Agent utilities (which are in addition to the agent preferences in our model) therefore also determine how the matching is made, as does the tradeoff between the preferences and the return-based utilities. This aspect has been studied recently in machine learning in terms of credit scores and advertiser returns [10].

• Matching Market Model: As one of the key components of our lending design, we abstract the resource allocation strategy between the borrowers and lenders in the form of the traditional matching market model, but with added constraints.

• Sequential decision making: This notion of interactive matching between two sides of a market has been studied within the framework in which the preferences of the sides are unknown to the algorithm [6].

Keeping the above in mind, we model the lending platform as a market with two sides: the lenders, denoted by the set of agents $\mathcal{L} = \{l_1, l_2, \ldots, l_N\}$, and the borrowers, denoted by the set of agents $\mathcal{B} = \{b_1, b_2, \ldots, b_K\}$, and we assume that $K \ll N$. The agents on the borrowing side each have their own funding request proposal and a corresponding requested amount, which we denote by $c_b$, where $b \in \mathcal{B}$. Similarly, the lenders each have an overall budget $q_l$, where $l \in \mathcal{L}$. In addition, the agents on each side of the market have the opportunity to submit to the platform their preferred rankings of the agents on the other side. These preferences can conflict: many lenders might prefer to lend to the same borrower, while multiple borrowers may prefer to tie up with the same lenders having a specific portfolio and interests.

Additionally, we lay out the following desiderata for our market model, which help us understand how to design mechanisms that find the best matches for both the borrowers and the lenders. Each lender $l$ can be matched to at most one borrower, while each borrower $b$ can be matched to multiple lenders based on the amount $c_b$ requested. That is, we consider the case of many-to-one matching markets [2]. We treat this design as a simplified version of the ideal many-to-many matching in order to simplify the demonstration of the dynamics of matching over time steps. Additionally, in the final assignment of a borrower to multiple lenders, a correct assignment requires that the sum of the amounts contributed by the lenders be at least as much as the borrower's $c_b$.
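The many-to-one rules above can be checked mechanically. The following is a minimal Python sketch, not the paper's implementation: the function names, the dictionary layout (lender mapped to a borrower or to `None`), and the toy numbers are all assumptions made for illustration. Representing a matching as a lender-to-borrower dictionary enforces "at most one borrower per lender" by construction.

```python
from typing import Dict, List, Optional


def preference_order(utilities: Dict[str, float]) -> List[str]:
    """Rank agents on the other side, most preferred first (no ties assumed)."""
    return sorted(utilities, key=lambda agent: utilities[agent], reverse=True)


def amount_raised(match: Dict[str, Optional[str]],
                  budgets: Dict[str, float],
                  requests: Dict[str, float]) -> Dict[str, float]:
    """Sum the budgets q_l of the lenders assigned to each borrower."""
    raised = {b: 0.0 for b in requests}
    for lender, borrower in match.items():
        if borrower is not None:
            raised[borrower] += budgets[lender]
    return raised


def is_fully_funded(match: Dict[str, Optional[str]],
                    budgets: Dict[str, float],
                    requests: Dict[str, float]) -> bool:
    """True when every borrower's matched budgets reach its request c_b."""
    raised = amount_raised(match, budgets, requests)
    return all(raised[b] >= requests[b] for b in requests)


# Hypothetical toy instance: three lenders, two borrowers.
budgets = {"l1": 4.0, "l2": 3.0, "l3": 5.0}    # q_l
requests = {"b1": 6.0, "b2": 5.0}              # c_b
u_b1 = {"l1": 0.2, "l2": 0.9, "l3": 0.5}       # borrower b1's utilities over lenders

print(preference_order(u_b1))                  # ['l2', 'l3', 'l1']
print(is_fully_funded({"l1": "b1", "l2": "b1", "l3": "b2"}, budgets, requests))  # True
print(is_fully_funded({"l1": "b1", "l2": "b2", "l3": "b2"}, budgets, requests))  # False
```

In the second assignment borrower b1 raises only 4 of the 6 requested, so the assignment is rejected.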
This funding rule is currently followed in platforms like GoFundMe or in the Prosper Full Coverage lending model, where the borrower only gets the project funded when the sum of the amounts lent matches or exceeds the requested amount. Also, we assume that a single agent cannot have ties in its preferences over the agents on the other side of the market.

To decide a matching between $\mathcal{B}$ and $\mathcal{L}$, we introduce the binary decision variable $\mathbf{x} = (x_{bl})_{(b,l) \in \mathcal{B} \times \mathcal{L}}$ such that $x_{bl} = 1$ if the loan from lender $l$ is assigned to and accepted by borrower $b$, and 0 otherwise. Next, the borrower $b$ and the lender $l$ submit their utilities $u_b(l)$ and $u_l(b)$ respectively ($\forall b \in \mathcal{B}$, $\forall l \in \mathcal{L}$), and these values represent the ordering choice amongst the agents (we discuss the utilities in the next section). The preference orders of the borrowers and the lenders can be captured in the following way: $b \succ_j b' \iff u_j(b) > u_j(b')$ and $l \succ_i l' \iff u_i(l) > u_i(l')$. Then, $u_{bl} = u_b(l) + u_l(b)$. The total utility of a matching $\mathbf{x} \in \{0, 1\}^{|\mathcal{B}| \times |\mathcal{L}|}$ is given by $\sum_{b \in \mathcal{B}} \sum_{l \in \mathcal{L}} u_{bl} x_{bl}$. In our work, we consider a many-to-one matching where each borrower is matched to multiple lenders and each lender is matched to a single borrower. Defining a binary decision variable $\mathbf{w} := (w_{b,l}) \in \{0, 1\}^{|\mathcal{B}| \times |\mathcal{L}|}$, the matching objective in the form of the Gale-Shapley constraints can be formulated as

MQ1:
\[
\begin{aligned}
\text{maximize} \quad & \lambda_1 \sum_{b \in \mathcal{B}} \sum_{l \in \mathcal{L}} u_l(b)\, x_{bl} - \lambda_2 \sum_{b \in \mathcal{B}} \sum_{l \in \mathcal{L}} w_{bl} && (1) \\
\text{subject to} \quad & \sum_{b \in \mathcal{B}} x_{bl} \le 1 && (\forall l \in \mathcal{L}) \\
& \sum_{l \in \mathcal{L}} x_{bl}\, q_l \ge c_b && (\forall b \in \mathcal{B}) \\
& c_b x_{bl} + c_b \sum_{l' \succ_b l} x_{bl'} + \sum_{b' \succ_l b} q_l\, x_{b'l} \ge c_b (1 - w_{bl}) && (\forall l \in \mathcal{L}, \forall b \in \mathcal{B}) \\
& x_{bl} \in \{0, 1\} && (\forall b \in \mathcal{B}, \forall l \in \mathcal{L}) \\
& w_{bl} \in \{0, 1\} && (\forall b \in \mathcal{B}, \forall l \in \mathcal{L})
\end{aligned}
\]

We refer to the constraints of MQ1 collectively as C1. The reader can refer to such formulations of matching [1] and the Gale-Shapley matching algorithm [13] for further reading.
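To make the formulation concrete, MQ1 can be solved by exhaustive search on a very small instance. The sketch below is an illustration under stated assumptions and not the integer programming solver a real experiment would use: it assumes two non-negative weights `lam1` and `lam2` for the two terms of the objective, encodes a matching as a lender-to-borrower dictionary (so each lender has at most one borrower by construction), and sets $w_{bl} = 1$ only when the stability inequality would otherwise fail, which is what minimizing the number of blocking pairs implies. All names and numbers are hypothetical.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Match = Dict[str, Optional[str]]  # lender -> borrower (None means unmatched)


def objective(match: Match, u_l, u_b, q, c,
              lam1: float = 1.0, lam2: float = 1.0) -> Optional[float]:
    """Objective (1) for one assignment, or None if a funding constraint fails."""
    lenders, borrowers = list(q), list(c)
    # Funding constraint: matched budgets must cover every request c_b.
    for b in borrowers:
        if sum(q[l] for l in lenders if match[l] == b) < c[b]:
            return None
    utility = sum(u_l[l][match[l]] for l in lenders if match[l] is not None)
    blocking = 0
    for b in borrowers:
        for l in lenders:
            lhs = c[b] * (match[l] == b)
            # Lenders matched to b that b strictly prefers to l.
            lhs += c[b] * sum(1 for l2 in lenders
                              if match[l2] == b and u_b[b][l2] > u_b[b][l])
            # l is matched to a borrower it strictly prefers to b.
            if match[l] is not None and u_l[l][match[l]] > u_l[l][b]:
                lhs += q[l]
            if lhs < c[b]:
                blocking += 1  # w_bl is forced to 1 for this pair
    return lam1 * utility - lam2 * blocking


def solve_mq1(u_l, u_b, q, c, lam1: float = 1.0,
              lam2: float = 1.0) -> Tuple[Optional[Match], Optional[float]]:
    """Enumerate all (K+1)^N assignments; only viable for toy sizes."""
    lenders, borrowers = list(q), list(c)
    best_match, best_value = None, None
    for choice in product([None] + borrowers, repeat=len(lenders)):
        match = dict(zip(lenders, choice))
        value = objective(match, u_l, u_b, q, c, lam1, lam2)
        if value is not None and (best_value is None or value > best_value):
            best_match, best_value = match, value
    return best_match, best_value


# Hypothetical toy instance; only one assignment funds both borrowers.
q = {"l1": 4.0, "l2": 3.0, "l3": 5.0}
c = {"b1": 6.0, "b2": 5.0}
u_l = {"l1": {"b1": 0.9, "b2": 0.3},
       "l2": {"b1": 0.6, "b2": 0.7},
       "l3": {"b1": 0.8, "b2": 0.4}}
u_b = {"b1": {"l1": 0.5, "l2": 0.2, "l3": 0.9},
       "b2": {"l1": 0.7, "l2": 0.1, "l3": 0.3}}

match, value = solve_mq1(u_l, u_b, q, c)
print(match)  # {'l1': 'b1', 'l2': 'b1', 'l3': 'b2'}
```

The search space grows as $(K+1)^N$, so this only serves to inspect the constraints on instances of a handful of agents.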
Briefly, the constraints of MQ1 ensure the following: (1) each lender can be matched to at most one borrower, (2) the number of blocking pairs (denoted by $w_{bl}$) is minimized in accordance with the original stable matching constraints [12], and (3) the sum of investments from a borrower's matched lenders must be at least the borrower's requested amount. We leave the proof that relates the matching constraints to the above formulation out of this paper, but one can check by contrapositive why failing to satisfy the constraints would fail to respect the stability constraints. Note that we optimize for the lender utility in Equation 1. We will come back to this setting when we evaluate our matching objective, and it is also the reason we need our IP formulation instead of the traditional Gale-Shapley agent-optimal algorithm.

The matching objective function with constraints, as defined in the previous section, depends on the preferences set by agents on both sides at the start of each phase. We consider the utilities $\mathbf{u}_b$ and $\mathbf{u}_l$, which denote the vector of values that each agent $b$ or $l$ assigns to the agents on the other side of the market. As stated before, the ordering depends on the values that the agents estimate prior to matching. Lender preferences over borrowers could arise from the return on investment (ROI), which could be calculated in a myriad of ways using many other factors (see footnote 2), but for the sake of abstraction, we sample $u_l(b)$ from a uniform distribution. For the borrower, the main reasons to prefer one lender over another are the past reputation of the lender (since network effects can significantly accelerate the funding [7]) and matching interests (especially in VC funding, where the investor's liquidation preferences can play a role in startup preferences).
As in the lender case, we sample $u_b(l)$ from a uniform distribution. Such utilities have been calculated before in the form of recommendation systems [4]. So in our case, for a lender $l$, not all borrowers are equal, and vice versa, which differs from previous assumptions [5].

One important point to note here is that the agents on each side are not aware of the preferences of the agents on the other side, which is why competition arises more prominently. In hindsight, if each lender were aware of its position in the borrowers' preferences, it would include that in its utility to obtain a better matching. So in order to arrive at a preferred matching faster, we operationalize the matching platform with sequential decision making in the form of multi-armed bandits with a centralized matching platform. This notion of centralized matching markets has been studied before in [11]. From the lender's point of view, the unknown probability of being matched, as well as the probability of the borrower listing getting fully funded, constitute the uncertain elements at the start of the matching. However, to simplify the demonstration, we assume the uncertainty comes from the lenders' lack of knowledge of the borrower preferences (or utilities). The matching procedure is as follows: the matching occurs repeatedly over multiple time steps. In each step, the lender has multiple borrowers to rank, but the rewards from the borrower listings constitute the uncertain component for the lenders. This is a bandit setting [5] where, at each round, the platform provides a pseudo-reward to the lender based on the borrower it is matched to and allows the lender to revise its preference rankings for the next round.

In what follows, we explain how the reward distributions for each lender are calculated, which lays the path for the exploration of the arms (here the borrowers) by the lenders at each round.
At each time step, the platform matches lender $l$ with borrower $m_l$, upon which $l$ is deemed to have pulled the arm successfully. The lender then observes the reward $x_l(t)$ and updates its empirical mean $\hat{\mu}_l(m_l)$ through the following equation:

\[
\hat{\mu}_l(b) = \frac{1}{1 + T_{b,l}(t)} \sum_{s=1}^{t} \mathcal{N}(0, \sigma) + \mathbb{1}\{m_l(s) = b\}\, \mathcal{N}(u_b(l), \sigma),
\]

where $T_{b,l}(t) = \sum_{s=1}^{t} \mathbb{1}\{m_l(s) = b\}$ is the number of times borrower $b$ was matched to lender $l$ up to time $t$.

So when faced with many borrowers to choose from, and without any certainty about the rewards it can receive at the end of the project, the lender can explore or exploit given its history. We utilize the Upper Confidence Bound (UCB) design [9], where at each time step $t$ the lenders compute the upper confidence bound for each borrower as follows:

\[
cu_l(b) =
\begin{cases}
\mathcal{N}(0, \sigma), & T_{b,l}(t) = 0 \\
\hat{\mu}_l(b) + \sqrt{\dfrac{\log t}{T_{b,l}(t-1)}}, & \text{otherwise}
\end{cases}
\qquad (2)
\]

Each lender $l$ ranks the arms $b$ according to $cu_l(b)$ and sends the new utilities to the platform, while the borrower preferences remain unchanged.

Algorithm 1: Matching between borrowers and lenders
Input: $\mathcal{B}$, $\mathcal{L}$, $u_l$, $u_b$, $T$.
Output: Matching $\mathcal{M}_{\mathcal{B},\mathcal{L}}$
    $cu_l(b) \leftarrow u_l(b) + \mathcal{N}(0, \sigma)$  ($\forall b \in \mathcal{B}$, $\forall l \in \mathcal{L}$)
    $T_{b,l}(0) \leftarrow 0$  ($\forall b \in \mathcal{B}$, $\forall l \in \mathcal{L}$)
    for $t = 1, 2, \ldots, T$ do
        $\mathcal{M}_b, \mathcal{M}_l \leftarrow$ Matching MQ1 using $cu_l$ and $u_b$  ($\forall b \in \mathcal{B}$, $\forall l \in \mathcal{L}$)
        for $l \in \mathcal{L}$ do
            $m_l(t) \leftarrow \mathcal{M}_l(l)$  /* matched borrower */
            if $m_l$ is not empty then
                $x_l(t) \leftarrow \mathcal{N}(u_{m_l(t)}(l), \sigma)$  /* lender reward */
            else
                $x_l(t) \leftarrow 0$
            Update $\hat{\mu}_l(m_l)$ using $x_l(t)$
            $T_{m_l,l}(t) \leftarrow T_{m_l,l}(t-1) + 1$
            for $b \in \mathcal{B}$ do
                $cu_l(b) \leftarrow \hat{\mu}_l(b) + \sqrt{\log t / T_{b,l}(t-1)}$
    return $\mathcal{M}_b, \mathcal{M}_l$

Footnote 2: http://blog.lendingrobot.com/research/calculating-financial-returns-in-peer-lending/

We end this paper with a brief example of simulated experiments that lay the foundation for further investigation into what kinds of constraints or better incentives can drive more efficient matchings in the future. Our code is fully open sourced and can be accessed online (see footnote 3). To simulate the matching, we run the matching algorithm for 10 steps and we consider 30 borrowers and 60 lenders.
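Before the remaining experimental details, the lender-side bookkeeping of Algorithm 1 can be sketched in a few lines of Python. This is a simplified illustration and not the released code: the class name `LenderUCB` and its methods are hypothetical, the empirical mean is a plain running average of observed rewards rather than the exact estimator given above, $\sigma$ is treated as a standard deviation, and the exploration bonus uses $\sqrt{\log t / T_{b,l}}$ without any additional constants.

```python
import math
import random
from typing import Dict, List


class LenderUCB:
    """Tracks one lender's reward estimates over borrowers (the arms)."""

    def __init__(self, borrowers: List[str], sigma: float = 0.3, seed: int = 0):
        self.sigma = sigma
        self.rng = random.Random(seed)
        self.pulls: Dict[str, int] = {b: 0 for b in borrowers}       # T_{b,l}
        self.mu_hat: Dict[str, float] = {b: 0.0 for b in borrowers}  # empirical means

    def update(self, borrower: str, reward: float) -> None:
        """Fold one observed reward x_l(t) into the running mean for this arm."""
        self.pulls[borrower] += 1
        n = self.pulls[borrower]
        self.mu_hat[borrower] += (reward - self.mu_hat[borrower]) / n

    def index(self, borrower: str, t: int) -> float:
        """Upper confidence value cu_l(b) at round t (t >= 1)."""
        n = self.pulls[borrower]
        if n == 0:
            # Never matched so far: fall back to a noisy draw, as in Equation 2.
            return self.rng.gauss(0.0, self.sigma)
        return self.mu_hat[borrower] + math.sqrt(math.log(t) / n)

    def ranking(self, t: int) -> List[str]:
        """Borrowers sorted by decreasing index; this is what the lender resubmits."""
        scores = {b: self.index(b, t) for b in self.pulls}
        return sorted(scores, key=lambda b: scores[b], reverse=True)


# Hypothetical usage: two rewards from b1, one from b2, then re-rank at round 3.
lender = LenderUCB(["b1", "b2"])
lender.update("b1", 0.8)
lender.update("b1", 0.8)
lender.update("b2", 0.1)
print(lender.ranking(3))  # ['b1', 'b2']
```

The platform would call `update` once per round with the matched borrower's reward and then feed `ranking` back into the matching step.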
We sample the borrower and lender utilities $u_b$ and $u_l$ randomly from a uniform distribution (values between 0 and 1). We sample borrower capacities $c_b$ randomly between 5 and 40 and lender budgets randomly between 1 and 10, with one constraint: the sum of all borrower capacities must be less than the sum of the lender budgets. We set the variance factor $\sigma$ to 0.3 and simulate the matching algorithm for 10 steps over 50 runs.

Figure 1: Lender regret over time simulated over 50 runs.

To evaluate the quality of the borrowers assigned to the lenders, we compute the following regret metric for each lender at each time step: $u_l(b^*) - u_l(\hat{b})$, where $\hat{b}$ is the matched borrower returned by Algorithm 1, while $b^*$ is computed using the following optimization: maximize $\lambda_1 \sum_{b \in \mathcal{B}} \sum_{l \in \mathcal{L}} u_{bl} x_{bl} - \lambda_2 \sum_{b \in \mathcal{B}} \sum_{l \in \mathcal{L}} w_{bl}$ subject to constraints C1. Note that here we optimize the sum of the borrower and lender utilities since, in hindsight, the lender would have adjusted its ranking based on the borrower preferences had it had access to that information. We show some plots of the lender regret over time in Figure 1. From Figure 1, we can see that for some of the lenders the regret tends to decrease, meaning that the lender is able to learn its optimal utility function over the steps through exploration using the UCB rule. But we also find that for lenders like Lender 55, the regret grows over time, meaning that the lender is not able to bound its regret given its initial preferences and is not able to correct its preferences over time. One reason why lenders are unable to learn might be their initial preference list, but another is the rewards they obtain over time.

Footnote 3: https://bit.ly/3iylv8X
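The regret metric can be computed per lender as in the short Python sketch below. It is an illustration under assumptions: regret is taken as the lender's utility for the hindsight-optimal borrower minus its utility for the borrower actually matched, an unmatched lender is assigned utility 0, and the function names and data layout are hypothetical.

```python
from typing import Dict, List, Optional


def lender_regret(u_l: Dict[str, Dict[str, float]],
                  optimal: Dict[str, Optional[str]],
                  matched: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Per-lender regret for one time step."""
    def value(lender: str, borrower: Optional[str]) -> float:
        # Assumption: an unmatched lender earns zero utility.
        return 0.0 if borrower is None else u_l[lender][borrower]

    return {l: value(l, optimal.get(l)) - value(l, matched.get(l)) for l in u_l}


def mean_regret_curve(runs: List[List[float]]) -> List[float]:
    """Average one lender's regret at each time step over several runs."""
    steps = len(runs[0])
    return [sum(run[t] for run in runs) / len(runs) for t in range(steps)]


# Hypothetical toy values.
u_l = {"l1": {"b1": 0.9, "b2": 0.4},
       "l2": {"b1": 0.2, "b2": 0.7}}
optimal = {"l1": "b1", "l2": "b2"}   # hindsight-optimal matching
matched = {"l1": "b2", "l2": None}   # matching returned at this step

print(lender_regret(u_l, optimal, matched))
print(mean_regret_curve([[0.5, 0.25], [0.25, 0.25]]))  # [0.375, 0.25]
```

Because the hindsight matching maximizes the combined borrower and lender utility rather than the lender utility alone, this quantity can be negative for an individual lender.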
Although we modeled the rewards to be proportionate to the borrower-lender utility, this reward function could also be modeled on the probability of the borrower's situation partway through the algorithm.

We consider the case of a centralized matching platform that asks borrowers for their preferences over the lenders in the form of personal rankings. The platform then decides the matching by allowing the lenders to interact with it over multiple time steps, either accepting or rejecting the assigned borrower at each time step. From a lender's perspective, this scheme allows them to get matched without knowing the actual returns, while giving them some flexibility to exploit their options. The goal of this paper has been to lay out some ideas on how centralized peer lending platforms can be abstracted from a matching market perspective and how bandits could play a role in mechanism design. These matching markets allow for more privacy while also ensuring equitable outcomes, which in our setting can be achieved by designing proper utility functions for the agents. Similarly, in the future one could design decision making in which lenders can elicit information about their peers' choices, since networks are known to aid funding [7].
REFERENCES
[1] Hernán Abeledo and Yosef Blum. 1996. Stable matchings and linear programming. Linear Algebra and its Applications 245 (1996), 321–333.
[2] Elizabeth Bodine-Baron, Christina Lee, Anthony Chong, Babak Hassibi, and Adam Wierman. 2011. Peer effects and stability in matching markets. In International Symposium on Algorithmic Game Theory. Springer, 117–129.
[3] Simla Ceyhan, Xiaolin Shi, and Jure Leskovec. 2011. Dynamics of bidding in a P2P lending service: effects of herding and predicting loan success. In Proceedings of the 20th International Conference on World Wide Web. 547–556.
[4] Jaegul Choo, Changhyun Lee, Daniel Lee, Hongyuan Zha, and Haesun Park. 2014. Understanding and promoting micro-finance activities in Kiva.org. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. 583–592.
[5] Sanmay Das and Emir Kamenica. 2005. Two-Sided Bandits and the Dating Market. In IJCAI, Vol. 5. 19.
[6] Ehsan Emamjomeh-Zadeh, Yannai A Gonczarowski, and David Kempe. 2020. The Complexity of Interactively Learning a Stable Matching by Trial and Error. arXiv preprint arXiv:2002.07363 (2020).
[7] Emőke-Ágnes Horvát, Jayaram Uparna, and Brian Uzzi. 2015. Network vs market relations: The effect of friends in crowdfunding. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015. 226–233.
[8] Manoj Kumar and Stuart I Feldman. 1998. Internet Auctions. In USENIX Workshop on Electronic Commerce, Vol. 3. 49–60.
[9] Tze Leung Lai and Herbert Robbins. 1985. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 1 (1985), 4–22.
[10] Lydia T Liu, Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. 2018. Delayed impact of fair machine learning. arXiv preprint arXiv:1803.04383 (2018).
[11] Lydia T Liu, Horia Mania, and Michael Jordan. 2020. Competing bandits in matching markets. In International Conference on Artificial Intelligence and Statistics. PMLR, 1618–1628.
[12] Alvin E Roth, Uriel G Rothblum, and John H Vande Vate. 1993. Stable matchings, optimal assignments, and linear programming. Mathematics of Operations Research 18, 4 (1993), 803–828.
[13] Alvin E Roth and Marilda Sotomayor. 1989. The college admissions problem revisited. Econometrica: Journal of the Econometric Society (1989), 559–570.
[14] Soumajyoti Sarkar and Hamidreza Alvari. 2020. Mitigating Bias in Online Microfinance Platforms: A Case Study on Kiva.org. ECML PKDD SoGood (2020).
[15] Lloyd S Shapley and Martin Shubik. 1971. The assignment game I: The core. International Journal of Game Theory 1, 1 (1971), 111–130.
[16] Zaiyan Wei and Mingfeng Lin. 2017. Market mechanisms in online peer-to-peer lending.