Arbitrage-Free Combinatorial Market Making via Integer Programming
CHRISTIAN KROER, Carnegie Mellon University
MIROSLAV DUDÍK, Microsoft Research
SÉBASTIEN LAHAIE, Microsoft Research
SIVARAMAN BALAKRISHNAN, Carnegie Mellon University

We present a new combinatorial market maker that operates arbitrage-free combinatorial prediction markets specified by integer programs. Although the problem of arbitrage-free pricing, while maintaining a bound on the subsidy provided by the market maker, is NP-hard in general, we show that it can be addressed in practice via integer programming. To our knowledge, this is the first implementation and empirical evaluation of an arbitrage-free combinatorial prediction market on this scale.
1. INTRODUCTION
Prediction markets have been successfully used to elicit and aggregate forecasts in a variety of domains, including business [Charette 2007; Spann and Skiera 2003], politics [Berg et al. 2008], and entertainment [Pennock et al. 2002]. In a prediction market, traders buy and sell securities with values that depend on some unknown future outcome. For instance, a play-money prediction market that Yahoo! ran for the 2010 NCAA Men's Division I Basketball Tournament included a security that paid out 1 point if the team from Duke were to win the championship and 0 points otherwise. Thus, when the price of the security was 0.15, traders who believed that Duke's probability of winning was larger than 0.15 were incentivized to buy shares of the security, and those that believed it was lower were incentivized to sell. The market price can be interpreted as an aggregate belief and used as a forecast.

We study prediction markets implemented by a centralized algorithm called a cost-based market maker [Abernethy et al. 2011; Chen and Pennock 2007]. All shares are bought from and sold to the market maker, rather than between traders, and the market maker uses a convex potential function to determine current security prices. Compared with an exchange, which matches buyers and sellers, a market-maker mechanism is particularly desirable in combinatorial markets, which offer securities on interrelated propositions. For instance, the NCAA 2010 market included securities on events "Duke wins more games than Cornell" and "a team from the Big East conference wins the championship" as well as many others. Because of the large number of securities in combinatorial markets, there may be no sellers interested in trading with a given buyer,
a problem known as low liquidity. In contrast, a market maker is always available to trade, thus providing liquidity and allowing incorporation of information in the market.

Author addresses: C. Kroer, Computer Science Dept, CMU; [email protected]; M. Dudík and S. Lahaie, Microsoft Research; {mdudik,slahaie}@microsoft.com; S. Balakrishnan, Dept of Statistics, CMU; [email protected]. This work was done while C. Kroer and S. Balakrishnan were at Microsoft Research. EC'16, July 24–28, 2016, Maastricht, The Netherlands. ACM 978-1-4503-3936-0/16/07. http://dx.doi.org/10.1145/2940716.2940767

Designers of cost-based markets aim to meet several desirable properties, including boundedness of the loss suffered by the market maker and absence of arbitrage, that is, risk-free profitable trades. Bounded loss is a necessity, otherwise the market operator risks bankruptcy. Lack of arbitrage is also highly desirable. First, we would like to attract traders that provide information rather than computation. Second, arbitrage-free markets produce more accurate forecasts. While Abernethy et al. [2011] provide a complete theoretical characterization of cost-based markets with bounded loss and no arbitrage, pricing in such markets is NP-hard in general. Our approach reduces optimal arbitrage removal to Bregman projection, which generalizes the Euclidean projection to arbitrary convex potentials.

There are two specific issues in applying the Frank-Wolfe (FW) algorithm within cost-based markets. First, while all iterates of the FW algorithm are within the convex hull of valid payoff vectors, and therefore arbitrage-free, we need to ensure that bounded loss is maintained. In Sec. 4.2, we show how to achieve this by a suitable modification of the stopping condition of FW. The second, seemingly more serious, concern is that the projection problems that arise for common cost functions, such as Hanson's [2003; 2007] logarithmic market scoring rule (LMSR), exhibit derivatives that go to infinity at the border of the set of arbitrage-free prices, which violates the assumptions of the FW algorithm. Fortunately, we can adapt a recently developed variant of FW [Krishnan et al. 2015], designed for the case when the derivative might grow to infinity but its growth is suitably controlled, which is the case for LMSR.

Our approach, which we call the
Frank-Wolfe market maker (FWMM), is related to Dudík et al.'s [2012] linearly-constrained market maker (LCMM), which also alternates trades and (partial) arbitrage removal. While FWMM uses linear constraints in the IP to define valid payoff vectors, the arbitrage removal in LCMM is driven by a set of linear constraints on the arbitrage-free prices (i.e., the convex hull of valid payoff vectors). The IP constraints of FWMM can be used directly in LCMM, as linear-programming relaxations, but they are usually too loose, so tighter constraints need to be derived ad hoc for each new security type, sometimes using involved combinatorial reasoning [Dudík et al. 2012, 2013]. Since LCMM updates are usually substantially faster than solving an IP, the arbitrage-removal steps of LCMM and FWMM can be interleaved, and the more expensive projection step of FWMM should be invoked only after LCMM cannot remove much arbitrage.

We evaluate the efficacy of FWMM on Yahoo!'s NCAA 2010 basketball tournament prediction market data, from which we extracted 88k trades on 5k securities in a combinatorial market with 2^63 outcomes. Once the projections become practically fast, FWMM achieves superior accuracy to LCMM. Our experiments also show that the initial phase of the projection algorithm, which involves calls to the IP solver to decide which securities can be logically settled given the games completed so far, is fast even for the largest problem sizes. The results from this initial phase can be propagated as a partial outcome into the cost function, which yields an improvement over LCMM even when the overall projection algorithm is too slow.

Tournaments have previously been considered by Chen et al. [2008] and Xia and Pennock [2011]. Both focus on restricted (but non-trivial) tournament betting languages that yield tractability, but cannot, for instance, handle comparisons. In contrast, our approach works for general outcome spaces that can be represented by an IP, rather than only tournaments. Our work is closely related to the applications of Frank-Wolfe and integer programming to inference in graphical models [Belanger et al. 2013; Krishnan et al. 2015], but needs to address several issues specific to incentives and information revelation in prediction markets.
2. PRELIMINARIES
We begin with an overview of cost-based market making [Abernethy et al. 2011; Chen and Pennock 2007] and then provide a high-level outline of our approach. As a running example we use the NCAA 2010 Tournament: a single-elimination tournament with 64 teams playing over 6 rounds, meaning that in each round, half of the remaining teams are eliminated.
Let Ω denote a finite set of outcomes, corresponding to mutually exclusive and exhaustive states of the world. We are interested in eliciting expectations of binary random variables φ_i : Ω → {0, 1}, indexed by i ∈ I, which model the occurrence of various events such as "Duke wins the NCAA championship." Each variable φ_i is associated with a security, which is a contract that pays out φ_i(ω) dollars when the outcome ω occurs. Therefore, the random variable φ_i is also called the payoff function. Binary securities pay out $1 if the specified event occurs and $0 otherwise. The vector (φ_i)_{i∈I} is denoted φ. Traders buy bundles δ ∈ R^I of security shares issued by a central market maker; negative entries in δ are permitted and correspond to short-selling. A trader holding a bundle δ receives a (possibly negative) payoff δ · φ(ω) when ω ∈ Ω occurs.

Following Chen and Pennock [2007] and Abernethy et al. [2011], we assume that the market maker determines security prices using a convex and differentiable potential function C : R^I → R called a cost function. The state of the market is specified by a vector θ ∈ R^I listing the number of shares of each security sold so far by the market maker. A trader wishing to buy a bundle δ in the market state θ must pay C(θ + δ) − C(θ) to the market maker, after which the new state becomes θ + δ. Thus, the vector of instantaneous prices in the state θ is p(θ) := ∇C(θ). Its entries can be interpreted as market estimates of E[φ_i]: a trader can make an expected profit by buying (at least a small amount of) the security i if she believes that E[φ_i] is larger than the instantaneous price p_i(θ) = ∂C(θ)/∂θ_i, and by selling if she believes that E[φ_i] is lower than p_i(θ); therefore, risk-neutral traders with sufficient budgets maximize their expected profits by moving the price vector to match their expectation of φ.

Example 2.1 (Logarithmic market-scoring rule, LMSR).
Hanson's [2003; 2007] logarithmic market scoring rule (LMSR) is a cost function for a complete market. In a complete market, I = Ω and securities are indicators of individual outcomes, φ_i(ω) = 1{ω = i}, where 1{·} denotes the binary indicator, equal to 1 if true and 0 if false. Thus, traders can express arbitrary probability distributions over Ω. For instance, to set up a complete market for the number of wins of Duke in the six-round NCAA tournament, we would set I = Ω = {0, 1, . . . , 6}. LMSR has the form C(θ) = log(Σ_{i∈I} e^{θ_i}) and prices p_i(θ) = e^{θ_i} / (Σ_{j∈I} e^{θ_j}).

Example 2.2 (Sum of independent markets).
Now consider a market with 7 securities for the number of wins of Duke and an additional 7 securities for the number of wins of Cornell. The outcome space consists of pairs of numbers between 0 and 6, but not all pairs are possible, because if Duke and Cornell win rounds 1–4, they meet in round 5 and only one advances. Thus, Ω = {(ω_1, ω_2) ∈ {0, . . . , 6}^2 : min{ω_1, ω_2} ≤ 4}. Securities are indexed by pairs I = {1, 2} × {0, . . . , 6}, with the first entry indicating the school and the second the number of wins, yielding the payoff functions φ_{j,x}(ω) = 1{ω_j = x}. A natural cost function is the sum of LMSRs, C(θ) = Σ_{j=1}^{2} log(Σ_{x=0}^{6} e^{θ_{j,x}}), which yields prices p_{j,x}(θ) = e^{θ_{j,x}} / (Σ_{y=0}^{6} e^{θ_{j,y}}). Thus, prices vary independently for each school, as if we ran two separate markets.

We consider two standard desiderata for cost-based markets. The first is the bounded loss property: there should be a constant which bounds the ultimate loss of the market maker once the outcome is determined, regardless of how many shares of each security are sold. The second is the no arbitrage property: there should be no trade that guarantees a positive profit, regardless of the outcome. Following Abernethy et al. [2011], we next relate bounded loss to properties of the convex conjugate of C, and review the equivalence between optimal arbitrage removal and Bregman projection.

Given a cost function C, let R denote its convex conjugate,

R(µ) := sup_{θ′∈R^I} [θ′ · µ − C(θ′)],   (1)

which is itself a convex function on R^I, allowed to take on the value ∞. If the market is in a state θ = 0 and a trader believes that E[φ] = µ, then her expected profit for the bundle θ′ is θ′ · µ − (C(θ′) − C(0)), which is maximized by Eq. (1), omitting the constant term C(0). More generally, the maximum expected profit of a trader with a belief µ in a market state θ can be shown to equal the mixed Bregman divergence, defined as

D(µ ∥ θ) := R(µ) + C(θ) − θ · µ.

Convex conjugacy implies that D(µ ∥ θ) ≥ 0, with equality if and only if µ = p(θ), which is equivalent to θ ∈ ∂R(µ), where ∂R is the subdifferential of R.

Example 2.3.
For the LMSR, R(µ) is equal to negative entropy whenever µ is a probability distribution and ∞ otherwise, i.e., R(µ) = I{µ ∈ ∆} + Σ_{i∈I} µ_i ln µ_i, where ∆ is the set of probability distributions on Ω and I{·} denotes the convex indicator, equal to 0 if true and ∞ if false. The Bregman divergence is the Kullback-Leibler (KL) divergence, D(µ ∥ θ) = I{µ ∈ ∆} + Σ_{i∈I} µ_i ln(µ_i / p_i(θ)), which is an information-theoretic measure of the difference between two probability distributions.

Let Z := {φ(ω) : ω ∈ Ω} denote the (finite) set of all valid payoff vectors, and M be its convex hull, called the marginal polytope. The marginal polytope is exactly the set of vectors µ that can be written as expectations E[φ] under some probability distribution over Ω, so we refer to elements of M as coherent beliefs or coherent prices. Abernethy et al. [2011] show that a cost-based market maker has the bounded loss property if and only if max_{z∈Z} R(z) < ∞. We assume that this is the case for the conjugate of our cost C. Note that this assumption is satisfied for LMSR, because negative entropy equals zero at the vertices of the simplex. It is also satisfied in Example 2.2, where R(µ) is the sum of negative entropies of the two markets.

Given a state θ, we define the Bregman projection of θ on M as the point

µ⋆ := argmin_{µ∈M} D(µ ∥ θ).

The Bregman projection is related to an optimal arbitraging trade by the following standard result (the proof is in Appendix A for completeness):

PROPOSITION 2.4. If the market is in a state θ, the guaranteed profit of any trader is at most D(µ⋆ ∥ θ), where µ⋆ is the Bregman projection of θ on M. Furthermore, this profit is achieved by any trade δ⋆ moving the market to a state θ⋆ with p(θ⋆) = µ⋆.

This means that an arbitrage opportunity exists whenever the prices are incoherent, since p(θ) ∉ M implies that D(µ⋆ ∥ θ) > 0. After the trade δ⋆, we have p(θ⋆) = µ⋆ ∈ M and thus there is no arbitrage opportunity in the market.

The mechanism proposed in this paper, called the Frank-Wolfe market maker, alternates between processing trades according to the cost C and removing arbitrage. In the arbitrage-removal step, our goal is to find the state θ⋆ from Proposition 2.4. We do this by solving the Bregman projection problem using the Frank-Wolfe (FW) algorithm, which reduces the Bregman projection problem to a sequence of linear programs of the form

min_{µ∈M} c · µ,

for suitably chosen vectors c. Since the optimum of a linear program occurs at a vertex, reducing the Bregman projection problem to a sequence of linear programs results in an important simplification. Instead of specifying the marginal polytope M, whose description can be exponentially large in the number of securities, it suffices to describe its vertices Z, which we show can be done via a compact set of linear inequalities together with integer constraints. More precisely, we assume that the set Z is described by a matrix A and a vector b such that

Z = {z ∈ {0, 1}^I : A⊤z ≥ b}.   (2)

Viewed in this way, the FW algorithm solves the Bregman projection problem by solving a sequence of integer programs. We refer to the linear constraints describing the set Z as IP constraints.

Example 2.5. We next derive IP constraints for the market for the number of wins of
Duke and
Cornell from Example 2.2. First, there are exclusivity and exhaustivity constraints of the form Σ_{x=0}^{6} z_{j,x} = 1 for j ∈ {1, 2}, corresponding to the fact that in any outcome ω, for each j, exactly one of the securities φ_{j,x}(ω) will equal 1 across x ∈ {0, . . . , 6}. However, these two constraints do not capture the fact that at most one of the teams can have exactly 5 or 6 wins. Specifically, in any outcome ω, we have φ_{1,5}(ω) + φ_{1,6}(ω) + φ_{2,5}(ω) + φ_{2,6}(ω) ≤ 1. Thus, we also include the third constraint: z_{1,5} + z_{1,6} + z_{2,5} + z_{2,6} ≤ 1. Our reasoning so far shows that any valid payoff vector satisfies the three mentioned constraints. It can be verified that any vector z satisfying these constraints is valid, i.e., it corresponds to φ(ω) for some ω ∈ Ω, so these three constraints correctly specify Z.

The FW algorithm relies on the ability to solve integer programs (IPs), which can take exponential time in the worst case. Therefore, our mechanism also incorporates fast (poly-time) partial arbitrage removal similar to Dudík et al.'s [2012] linearly-constrained market maker (LCMM).

In LCMM, arbitrage is partly removed by considering a set of linear constraints that must be satisfied by coherent prices. Namely, an LCMM takes as an input a relaxation M̃ ⊇ M described by linear constraints called
LCMM constraints: M̃ = {µ ∈ R^I : Ã⊤µ ≥ b̃}. When any LCMM constraint is violated, there is an arbitrage opportunity in the market, with an easy-to-compute arbitraging trade. LCMM acts as an arbitrager until none of the constraints are violated. Since M̃ is a relaxation of M, the resulting state is not necessarily arbitrage-free.

Assuming we have a description of Z using IP constraints specified by a matrix A and a vector b, one simple strategy is to construct M̃ as a linear-program (LP) relaxation of Z, i.e.,

M̃ = {µ ∈ R^I : µ_i ∈ [0, 1] for all i ∈ I and A⊤µ ≥ b}.   (3)

These constraints are satisfied by all z ∈ Z and hence also by their convex combinations µ ∈ M. Generally, this relaxation is only a loose superset of M, so various ad hoc strategies are required to obtain a tighter M̃ [Dudík et al. 2012, 2013]. We present one example of such a strategy in Sec. 3, for the class of comparison securities.
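To make the reduction concrete, the following toy sketch encodes the three IP constraints of the Duke/Cornell example (Example 2.5) and solves the Frank-Wolfe linear step min_{z∈Z} c · z by brute-force enumeration in place of an IP solver. The function names and the 0/1 team indexing are our own illustrative choices, not the paper's.

```python
from itertools import product

# Payoff vectors z are indexed by (team j, wins x), j in {0, 1}, x in 0..6.
# IP constraints from the example:
#   sum_x z[j, x] = 1 for each team j     (exclusivity/exhaustivity)
#   z[0,5] + z[0,6] + z[1,5] + z[1,6] <= 1  (both teams cannot reach 5+ wins)

def feasible(z):
    ok = all(sum(z[(j, x)] for x in range(7)) == 1 for j in (0, 1))
    return ok and z[(0, 5)] + z[(0, 6)] + z[(1, 5)] + z[(1, 6)] <= 1

def vertex_oracle(c):
    """Return argmin_{z in Z} c . z, the linear step inside Frank-Wolfe.
    A real implementation would hand the same constraints to an IP solver;
    here we simply scan all one-hot encodings of win-count pairs."""
    best, best_val = None, float("inf")
    for x0, x1 in product(range(7), repeat=2):
        z = {(j, x): 0 for j in (0, 1) for x in range(7)}
        z[(0, x0)] = z[(1, x1)] = 1
        if feasible(z):
            val = sum(c[k] * z[k] for k in z)
            if val < best_val:
                best, best_val = z, val
    return best

# |Z| check: 7*7 win-count pairs minus the 4 where both teams have >= 5 wins.
n_valid = sum(1 for x0, x1 in product(range(7), repeat=2) if min(x0, x1) <= 4)
```

Note how the third constraint removes exactly the four vertices where both teams would win 5 or more games, matching the outcome space Ω of Example 2.2.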
3. MARKET DESIGN
We next show how to instantiate the market design elements of Sec. 2 in real-world combinatorial markets, including the NCAA 2010 tournament evaluated in Sec. 5. Namely, we need to define: (i) the payoff function φ, (ii) the cost function C, (iii) the initial market state θ_0, (iv) the IP constraints describing Z, and (v) the LCMM constraints describing M̃. We also need to consider how the cost and market state should be updated as the true outcome is gradually revealed over time. For example, in the NCAA tournament, 63 games play out over the course of several weeks and we would like to fix prices of securities whose payoff has already been determined.

We use a compositional market design along the lines of Dudík et al. [2013], which is a generalization of the sum-of-LMSRs structure of Example 2.2. The market construction begins with a collection of random variables X_j : Ω → X_j, indexed by j ∈ J, whose marginal distributions we wish to elicit, such as the number of wins of Duke and
Cornell in Example 2.2. Securities are indexed by i = (j, x), with j ∈ J and x ∈ X_j, and correspond to indicators of the events X_j = x, i.e., φ_{j,x}(ω) = 1{X_j(ω) = x}. The cost function is the sum of LMSRs across the random variables X_j:

C(θ) = b Σ_{j∈J} ln(Σ_{x∈X_j} e^{θ_{j,x}/b}),   (4)

Fig. 1. An example of a tournament with four teams: in round 1, game G_{1,1} is between teams 1 and 2 and game G_{1,3} is between teams 3 and 4, and the winners meet in round 2. The domains of the game outcome variables G_{r,t} are G_{1,1} ∈ {1, 2}, G_{1,3} ∈ {3, 4}, and G_{2,1} ∈ {1, 2, 3, 4}. The shown variables are equivalent to additional game variables: G_{1,2} ≡ G_{1,1}, G_{1,4} ≡ G_{1,3}, and G_{2,1} ≡ G_{2,2} ≡ G_{2,3} ≡ G_{2,4}.

where b > 0 is the liquidity parameter controlling how fast the prices change in response to trading. A smaller value of b (lower liquidity) means prices rise faster as shares are purchased; a larger value of b (higher liquidity) yields slower changes. As in Example 2.2, Eq. (4) implies that we effectively run an independent LMSR market for each X_j. Thus, in the absence of arbitrage-removal steps, we say that C implements the independent-markets cost function.

Initially, our market contains no random variables and hence no securities. The market operator can create new random variables and specify their relationship to any existing variables. At the time of creation of a new variable X_j, the operator specifies (i) its domain X_j, (ii) the mapping X_j(ω), (iii) initial prices µ_{j,x} across x ∈ X_j (these prices determine the initial-state coordinates θ_{j,x}), (iv) IP constraints to restrict z_{j,x} across x ∈ X_j, and (v) LCMM constraints to restrict µ_{j,x} across x ∈ X_j. Due to the additive structure of the cost C, new variables X_j can be added at any time during the run of the market without affecting prices of existing securities.

Below we specify the items (i)–(v) for different types of random variables in our market.
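A minimal sketch of the sum-of-LMSRs cost of Eq. (4) and its prices, assuming a dictionary-of-lists encoding of the state θ; the helper names `cost`, `prices`, and `trade_cost` are ours, not the paper's.

```python
import math

def cost(theta, b=1.0):
    """Eq. (4): C(theta) = b * sum_j log(sum_x e^{theta_{j,x}/b}).
    `theta` maps each variable j to a list of share counts over its domain."""
    return b * sum(math.log(sum(math.exp(t / b) for t in row))
                   for row in theta.values())

def prices(theta, b=1.0):
    """Instantaneous prices p = grad C(theta): an independent softmax
    per random variable, as in Example 2.2."""
    p = {}
    for j, row in theta.items():
        zmax = max(t / b for t in row)            # stabilize the softmax
        w = [math.exp(t / b - zmax) for t in row]
        s = sum(w)
        p[j] = [wi / s for wi in w]
    return p

def trade_cost(theta, delta, b=1.0):
    """Price of buying bundle delta in state theta: C(theta+delta) - C(theta)."""
    new = {j: [t + d for t, d in zip(theta[j], delta[j])] for j in theta}
    return cost(new, b) - cost(theta, b)
```

With a uniform initial state the prices are uniform over each domain, and the same purchase costs less (moves prices less) at a higher liquidity b, illustrating the role of the liquidity parameter described above.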
When describing the IP constraints on z and LCMM constraints on µ, we use the notation z{X_j = x} and µ{X_j = x} for the entries z_{j,x} and µ_{j,x}, respectively. We also allow random variables with names other than X_j, e.g., X or G_{r,t}, and use notation such as z{X = x} and µ{X = x} for the corresponding entries of z and µ.

When adding a new random variable X, the initial prices µ{X = x} can be chosen based on the prices of the random variables present in the market. New IP constraints always include the exclusivity and exhaustivity constraint, Σ_{x∈X} z{X = x} = 1, but additional constraints may be needed to correctly describe the mapping X(ω). We add LCMM constraints using the simple strategy mentioned in Sec. 2.4, as an LP relaxation of IP constraints, with the exception of one variable type (comparison variables).

Our market contains random variables of the following types:

Atomic tournament variables.
These random variables model outcomes in a single-elimination tournament with k rounds and 2^k teams. Teams are numbered 1 through 2^k. In the first round, there are 2^{k−1} games, between teams 2i − 1 and 2i, and the resulting 2^{k−1} winners advance to the second round, where again teams are matched in the order of increasing indices and the winners advance to the next round, etc. The team t is associated with the random variable X_t whose outcome is the total number of wins of team t, i.e., with domain X_t = {0, . . . , k}.

We also have random variables corresponding to the games played, with the outcome of each variable being the winner of the corresponding game. For a team t and round r, let G_{r,t} denote the game that the team t will play in the r-th round if it advances to that point. We are slightly abusing notation, because G_{r,t} and G_{r,t′} can refer to the same game (and hence the same random variable) for distinct t and t′ (see Fig. 1). For instance, G_{k,t} ≡ G_{k,t′} for all t, t′, as there is only one game (the finals) in round k. With this notation in hand, we can introduce the IP constraints relating the entries of z representing game and team variables:

z{X_t = r} = z{G_{r,t} = t} − z{G_{r+1,t} = t} for all t and r < k,
z{X_t = k} = z{G_{k,t} = t} for all t.

LCMM constraints are just LP relaxations of the above, i.e., they are the same as the IP constraints, with z{·} replaced with µ{·}. The market operator needs to specify initial prices µ{X_t = r} and µ{G_{k,t} = t} explicitly, based for instance on the past performance of teams.

Sums.
Given a set of existing random variables X_1, . . . , X_n taking on integer values with the minimum and maximum values m_j := min X_j and M_j := max X_j, we define a new random variable X to represent their sum, X(ω) := X_1(ω) + · · · + X_n(ω), with the domain X = {m, m + 1, . . . , M}, where m = Σ_{j=1}^{n} m_j and M = Σ_{j=1}^{n} M_j. The initial prices are set proportional to a discretized Gaussian distribution with the mean and variance equal to the sum of means and variances of X_1 through X_n, under the distribution described by the current prices µ{X_j = x}.

We introduce the following IP constraint: Σ_{x∈X} x · z{X = x} = Σ_{j=1}^{n} Σ_{x_j∈X_j} x_j · z{X_j = x_j}. As before, the added LCMM constraint is an LP relaxation of the added IP constraint.
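The sum constraint above ties the one-hot encodings together through their expected values; the following sketch (with hypothetical helper names of our own) checks it on concrete payoff vectors.

```python
def one_hot(value, domain):
    """Indicator entries z{X = x} for x in `domain`."""
    return {x: int(x == value) for x in domain}

def expected_value(z):
    """sum_x x * z{X = x}; for a 0/1 payoff vector this is just the
    realized value of X, and for fractional mu it is the mean."""
    return sum(x * zx for x, zx in z.items())

def sum_constraint_holds(z_sum, z_parts):
    """IP constraint for X = X_1 + ... + X_n:
    sum_x x*z{X=x} = sum_j sum_xj xj*z{Xj=xj}."""
    return expected_value(z_sum) == sum(expected_value(zj) for zj in z_parts)
```

For instance, if team 1 wins 2 games and team 2 wins 4, the sum variable must settle to 6; any other one-hot setting violates the constraint.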
Comparisons.
Given two existing random variables X_1 and X_2 taking on integer values with the minimum and maximum values m_j := min X_j and M_j := max X_j, we define a new random variable X with the domain {lt, eq, gt} to represent the result of their comparison: X(ω) := lt if X_1(ω) < X_2(ω), eq if X_1(ω) = X_2(ω), and gt if X_1(ω) > X_2(ω). The initial prices are determined by first considering an integer-valued variable Y = X_1 − X_2, and initializing its distribution to the discretized Gaussian with the mean equal to the difference of means and the variance initialized to the sum of variances of X_1 and X_2 under current prices. The initial prices of X = lt, X = eq and X = gt are obtained as the probabilities that Y < 0, Y = 0 and Y > 0. The variable Y is discarded and is not part of the market.

The IP constraints for the new entries of z are based on the following four identities:

X_1 − X_2 ≥ (m_1 − M_2) · 1{X_1 < X_2},    X_1 − X_2 − 1 ≥ (m_1 − M_2 − 1) · 1{X_1 ≤ X_2},
X_1 − X_2 ≤ (M_1 − m_2) · 1{X_1 > X_2},    X_1 − X_2 + 1 ≤ (M_1 − m_2 + 1) · 1{X_1 ≥ X_2}.

To obtain IP constraints, we replace each X_j with Σ_{x∈X_j} x · z{X_j = x} on the left-hand side, and replace the comparison indicators on the right-hand side by z{X = lt} for 1{X_1 < X_2}, and z{X = lt} + z{X = eq} for 1{X_1 ≤ X_2}, and similarly for X_1 > X_2 and X_1 ≥ X_2.

LCMM constraints in this case are not simply an LP relaxation of the IP constraints; instead they yield a tighter set M̃. They are based on the following identities, which can be derived from the transitivity of the comparison and the union bound:

P{X_1 ≤ x} ≤ P{X_1 < X_2} + P{X_2 ≤ x} for all m ≤ x ≤ M,
P{X_1 ≤ x} ≤ P{X_1 ≤ X_2} + P{X_2 < x} for all m ≤ x ≤ M.

For instance, the first inequality follows because X_1 ≤ x implies that either X_1 < X_2 or X_2 ≤ x. Otherwise we would have a contradiction: X_1 ≥ X_2 > x.
The resulting LCMM constraints are

µ{X_1 ≤ x} ≤ µ{X = lt} + µ{X_2 ≤ x} for all m ≤ x ≤ M,
µ{X_1 ≤ x} ≤ µ{X ∈ {lt, eq}} + µ{X_2 < x} for all m ≤ x ≤ M,

with analogous constraints with X_1 and X_2 swapped (and gt swapped for lt). We use the shorthand µ{X ∈ E} for Σ_{x∈E} µ{X = x}.

In a typical combinatorial market, outcomes are gradually revealed over time. For example, in the NCAA tournament, 63 games play out over the course of several weeks. Thus, the market evolves through a sequence of partial outcomes defined as follows:
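The four big-M identities can be sanity-checked by evaluating them on concrete integer values. The helper below is our own illustrative check (not part of the paper's mechanism): it substitutes the one-hot comparison entries for the indicators and verifies all four inequalities at once.

```python
def comparison_constraints_hold(x1, x2, m1, M1, m2, M2, z):
    """Check the four big-M inequalities linking integer values x1, x2
    (with bounds m_j <= x_j <= M_j) to the comparison entries
    z['lt'], z['eq'], z['gt'] of the new variable X."""
    lt = z["lt"]
    le = z["lt"] + z["eq"]   # stands in for 1{X1 <= X2}
    gt = z["gt"]
    ge = z["gt"] + z["eq"]   # stands in for 1{X1 >= X2}
    return (x1 - x2 >= (m1 - M2) * lt
            and x1 - x2 - 1 >= (m1 - M2 - 1) * le
            and x1 - x2 <= (M1 - m2) * gt
            and x1 - x2 + 1 <= (M1 - m2 + 1) * ge)
```

Only the one-hot setting matching the true comparison result satisfies all four inequalities; any wrong setting trips at least one of them, which is exactly what forces the IP solver to pick the correct category.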
Definition 3.1. A subset σ ⊆ I × {0, 1} is called a partial outcome if there exists a valid payoff vector z ∈ Z such that z_i = b for all (i, b) ∈ σ.

We write I_σ := {i : (i, b) ∈ σ for some b} for the set of securities whose payoffs have been determined, or settled, by σ. As securities get settled, we would like to fix their prices to 0 or 1. This is not possible by simply updating the state; instead, we need to switch to a different cost function while maintaining the information state of the market. We adapt the construction of Dudík et al. [2014] to our setting.

First, we say that a vector u ∈ R^I is compatible with σ if u_i = b for all (i, b) ∈ σ. We write V_σ for the set of vectors compatible with σ; note that V_σ is an axis-aligned affine space of dimension |I \ I_σ|. Given a partial outcome σ, we define the set of associated valid payoffs Z_σ := Z ∩ V_σ, and the associated marginal polytope M_σ := conv(Z_σ). We assume that given a partial outcome σ, the market maker uses the cost function

C_σ(θ) = sup_{µ∈V_σ} [θ · µ − R(µ)],   (5)

whose conjugate is, by definition, R_σ(µ) = R(µ) + I{µ ∈ V_σ}, which coincides with R on M_σ. The corresponding price map and Bregman divergence are denoted p_σ and D_σ. The transformation of C to C_σ maintains the loss bound of the original market maker (see Appendix B) and also maintains the information state of the market analogously to conditioning, as our next example shows.

Example 3.2 (Partially settled LMSR).
Recall that in a complete market, I = Ω and payoff vectors φ(ω) have exactly one entry equal to 1: the entry corresponding to the realized outcome. Therefore, the partial outcome σ can have at most one security settled to 1. If there is such a security i⋆, then the market is fully settled and, by Eq. (5), we obtain C_σ(θ) = θ_{i⋆} and p_{σ,i}(θ) = 1{i = i⋆}. If σ only contains securities settled to zero, i.e., the corresponding outcomes have been excluded, the cost function obtained by Eq. (5) is an LMSR over the remaining outcomes, C_σ(θ) = log(Σ_{i∉I_σ} e^{θ_i}). The prices are p_{σ,i}(θ) = 0 for i ∈ I_σ and p_{σ,i}(θ) = e^{θ_i} / (Σ_{j∉I_σ} e^{θ_j}) for i ∉ I_σ, so the probability distribution over Ω described by p_σ(θ) corresponds to p(θ) conditioned on the event ω ∉ I_σ.
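For intuition, a small sketch of the partially settled LMSR prices (the function name is ours): it implements the conditioning formula above, zeroing out the excluded outcomes and renormalizing over the rest.

```python
import math

def lmsr_prices(theta, settled_zero=()):
    """Prices of a complete-market LMSR after settling the securities in
    `settled_zero` to 0, per the specialization of Eq. (5): an LMSR over
    the remaining outcomes, i.e., p(theta) conditioned on excluding them."""
    excluded = set(settled_zero)
    live = [i for i in range(len(theta)) if i not in excluded]
    zmax = max(theta[i] for i in live)            # stabilize the softmax
    w = {i: math.exp(theta[i] - zmax) for i in live}
    s = sum(w.values())
    return [w[i] / s if i in w else 0.0 for i in range(len(theta))]
```

The settled prices are exactly the original prices renormalized, p_{σ,i} = p_i / (1 − Σ_{j settled} p_j), matching the conditional-distribution interpretation.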
4. FRANK-WOLFE MARKET MAKER
In this section we fully describe and analyze the Frank-Wolfe market maker (FWMM) outlined in Sec. 2.3.

At a high level, FWMM interleaves rapid pricing according to C with arbitrage removal, while also updating the partial outcome; see Mechanism 1. There are two kinds of arbitrage removal: fast but only partial arbitrage removal via an LCMM step, and a complete removal of the remaining arbitrage via Bregman projection. For LCMM steps we use the fast algorithm of Dudík et al. [2012]. Bregman projection is implemented via a variant of the Frank-Wolfe (FW) algorithm, which we refer to as ProjectFW and describe later in this section.

MECHANISM 1: Frank-Wolfe Market Maker (FWMM)

Input: cost function C, initial state θ_0, initial partial outcome σ_0, LCMM constraints specified by Ã, b̃, IP constraints specified by A, b, FW algorithm parameters α ∈ (0, 1), ε ∈ (0, 1), ε_D > 0
Initialize the market state and partial outcome: θ ← θ_0, σ ← σ_0
For t = 1, . . . , T (where T is an a priori unknown number of trades):
  receive a request for a bundle δ_t
  sell the bundle δ_t for the cost C_σ(θ + δ_t) − C_σ(θ)
  θ ← θ + δ_t
  σ ← σ ∪ {newly settled securities, if any}
  perform an LCMM step: choose η ≥ 0 such that C_σ(θ + Ãη) − C_σ(θ) ≤ b̃ · η; θ ← θ + Ãη
  perform a projection step: (σ, θ) ← ProjectFW(θ; C, σ, A, b, α, ε, ε_D)
Observe ω, consistent with σ
Pay traders δ_1 · φ(ω), δ_2 · φ(ω), . . . , δ_T · φ(ω)
ProjectFW does not only return a new state θ such that p_σ(θ) is the Bregman projection of the previous state on M_σ. It also extends the partial outcome to securities that can be logically settled based on all other settled securities. This permanently removes the specific arbitrage opportunities associated with such securities, since their prices become fixed to 0 or 1.

Both arbitrage-removal steps correspond to trades that yield a non-negative profit regardless of the outcome, which means that the loss bound of the original cost C is only improved by the value of this profit. The non-negative profit of LCMM steps follows from Dudík et al. [2012]. For ProjectFW, which is an iterative algorithm, we guarantee non-negative profit by designing a suitable stopping condition.

As we mentioned earlier, while we hope that the IPs created during the run of the FW algorithm are easy to solve, they are NP-hard in general, and so the IP solver can get stuck in a brute-force search. Therefore, we need the ability to interrupt the projection step, for instance, when a new trade arrives. When our implementation, ProjectFW, is interrupted in early stages, it yields no update. In later stages, it returns an arbitrage-free market state corresponding to a trade with a non-negative but possibly suboptimal profit. Thus, the loss bound is always maintained, even when ProjectFW is interrupted.
Recall that the FW algorithm reduces the problem of Bregman projection, i.e., a convex minimization over the set M, into a sequence of linear optimization problems over the set Z. Our version, presented as Algorithm 2, is based on the fully-corrective variant of the Frank-Wolfe algorithm [Jaggi 2013], also known as the simplicial decomposition method [Bertsekas 2015], which we overview next.

The FW algorithm solves problems of the form

    min_{µ ∈ M} F(µ),    (6)

where M is a compact convex set (in our case a polytope) and F is a convex function. Over the course of iterations t = 1, 2, ..., the algorithm maintains an active set Z_t of the vertices of the polytope M that have been discovered so far, and repeatedly:

(1) solves the minimization over the convex hull of Z_{t−1} to obtain a new iterate
        µ_t := argmin_{µ ∈ conv(Z_{t−1})} F(µ),
(2) finds a new descent vertex z_t in the direction of the (negative) gradient of F,
        z_t := argmin_{z ∈ Z} [∇F(µ_t) · z],
(3) and adds z_t to the set of active vertices, so Z_t = Z_{t−1} ∪ {z_t}.

Note that while the set Z of valid payoffs can be exponentially large, the set of active vertices Z_t grows by only one vertex per iteration (and is initialized with only a small number of vertices). Therefore, Step (1), which is a convex optimization problem of dimension |Z_t|, can be solved efficiently by standard algorithms. We use accelerated projected gradient [Nesterov 2007].

Step (2), the linear optimization over the set Z, is the computationally expensive step. As discussed in Sec. 2.3, in our case it can be implemented by a call to an IP solver. In all of our experiments, the running time of Step (2) substantially dominated the running time of Step (1).

The convergence of the FW algorithm is analyzed via the FW gap, defined as

    g(µ) := max_{z ∈ Z} [∇F(µ) · (µ − z)],

which bounds the suboptimality of µ. Specifically, g(µ) ≥ F(µ) − F(µ*), where µ* is a solution to Eq. (6).
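The loop in Steps (1)-(3) can be sketched as follows. This is a stand-alone toy, not our implementation: a generic linear-minimization oracle plays the role of the IP solver, Step (1) is solved by exponentiated gradient on the convex-combination weights (a crude stand-in for accelerated projected gradient), and the objective is a squared distance rather than a Bregman divergence:

```python
import math

def fcfw(grad_f, lmo, z0, outer=50, inner=500, lr=0.1, tol=1e-5):
    """Fully-corrective Frank-Wolfe sketch.

    grad_f -- gradient of the convex objective F
    lmo    -- linear minimization oracle: lmo(g) = argmin_{z in Z} g . z
              (the role played by the IP solver in the paper)
    z0     -- an initial vertex of the feasible polytope
    """
    active = [z0]          # active vertex set Z_t
    n = len(z0)
    mu = list(z0)
    for _ in range(outer):
        # Step (1): minimize F over conv(active) via exponentiated
        # gradient on the combination weights.
        k = len(active)
        w = [1.0 / k] * k
        for _ in range(inner):
            mu = [sum(w[j] * active[j][i] for j in range(k)) for i in range(n)]
            g = grad_f(mu)
            scores = [sum(g[i] * v[i] for i in range(n)) for v in active]
            w = [wj * math.exp(-lr * s) for wj, s in zip(w, scores)]
            tot = sum(w)
            w = [wj / tot for wj in w]
        mu = [sum(w[j] * active[j][i] for j in range(k)) for i in range(n)]
        # Step (2): the oracle returns a descent vertex.
        g = grad_f(mu)
        z = lmo(g)
        # FW gap g(mu) = grad F(mu) . (mu - z) bounds the suboptimality.
        gap = sum(g[i] * (mu[i] - z[i]) for i in range(n))
        if gap <= tol:
            break
        # Step (3): grow the active set by the new vertex.
        if z not in active:
            active.append(z)
    return mu

# Toy instance: project y onto the probability simplex (vertices e_1..e_3)
# under the squared Euclidean distance.
y = (0.5, 0.3, 0.2)
grad = lambda mu: [2.0 * (m - yi) for m, yi in zip(mu, y)]
verts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
lmo = lambda g: min(verts, key=lambda v: sum(gi * vi for gi, vi in zip(g, v)))
mu_hat = fcfw(grad, lmo, verts[0])
```

Since y lies in the simplex, the active set grows to all three vertices and the iterate converges to y itself, at which point the gap vanishes.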
Thus, we can just monitor the gap g(µ_t) = ∇F(µ_t) · (µ_t − z_t), and return the iterate µ_t when the gap becomes sufficiently small. The gap converges to zero at the rate of O(L diam(M)² / t), where L is the Lipschitz constant of ∇F under an arbitrary norm and diam(M) is the diameter of M under the same norm [Jaggi 2013].

To apply the FW algorithm to the problem of Bregman projection, we set its objective to the Bregman divergence: F(µ) = D(µ ‖ θ) = R(µ) + C(θ) − θ · µ. One formal problem arises due to the fact that the function R is not necessarily differentiable, only subdifferentiable. To overcome this, we assume the existence of a differentiable extension R̄. For LMSR, this is R̄(µ) = I{µ ≥ 0} + Σ_{i ∈ I} µ_i ln µ_i, and similarly for the sum of LMSRs. The key point is that R̄ coincides with R over M, so we can optimize the (differentiable) function F(µ) = R̄(µ) + C(θ) − θ · µ. (More details are given in Appendix C.)

Apart from differentiability, there are two additional challenges in applying the FW algorithm within Mechanism 1. First, we need to choose a stopping condition for the FW algorithm that yields a state update with a guaranteed profit, since such updates maintain the worst-case loss bound of the market maker. Second, even though we have achieved the differentiability of F for our case of interest (the sum of LMSRs), the resulting derivative is unbounded, so the standard convergence analysis of FW does not apply. Fortunately, the growth of the derivative at the boundary is sufficiently controlled to obtain convergence of a modified version of FW, which is what we use in Algorithm 2. (The precise statement of the controlled growth condition is in Appendix C.)

The modified version of FW, due to Krishnan et al. [2015], performs FW iterations over a contracted version of the polytope M, or, more precisely, over a contracted version of M_σ̂, which reflects already settled securities.
The contracted polytope is defined as M′ := (1 − ε) M_σ̂ + ε u, where u ∈ M_σ̂ is a coherent price vector whose coordinates are neither 0 nor 1, except for those already settled by σ̂. In other words, M′ is a version of M_σ̂ shrunk towards the point u, which we call an interior point.

ALGORITHM 2: ProjectFW. Bregman Projection via Adaptive Fully-Corrective Frank-Wolfe.
Input: cost function C; state θ; partial outcome σ; IP constraints specified by A, b; approximation ratio α ∈ (0, 1); initial contraction ε_0 ∈ (0, 1); convergence threshold ε_D > 0
Output: extended partial outcome σ̂ ⊇ σ, and a state θ̂ whose price vector is an approximate Bregman projection of θ on M_σ̂, in the sense that one of the following holds:
    1. p_σ̂(θ̂) ∈ M_σ̂ and moving from θ to θ̂ guarantees the profit of αD_σ̂(µ* ‖ θ)
    2. θ̂ = θ and D_σ̂(µ* ‖ θ) ≤ ε_D
    3. the algorithm was interrupted; moving from θ to θ̂ guarantees a non-negative profit
where µ* = argmin_{µ ∈ M_σ̂} D_σ̂(µ ‖ θ)

Initialize the interior point and active vertex set, and extend the partial outcome:
    (u, Z_0, σ̂) ← InitFW(σ, A, b)
Define the objective function: F(µ) := R̄_σ̂(µ) − θ · µ + C_σ̂(θ)
For t = 1, 2, ...:
    perform a FW iteration on the contracted polytope:
        let Z′ = (1 − ε_{t−1}) Z_{t−1} + ε_{t−1} u denote the contracted active set
        µ_t ← argmin_{µ ∈ conv(Z′)} F(µ)
        θ_t ← ∇R̄_σ̂(µ_t)
    call the IP solver to find the descent vertex (note that ∇F(µ_t) = θ_t − θ):
        z_t ← argmin_{z ∈ Z_σ̂} (θ_t − θ) · z
        Z_t ← Z_{t−1} ∪ {z_t}
    compute the FW gap: g(µ_t) = (θ_t − θ) · (µ_t − z_t)
    update the best iterate so far: t* ← argmax_{τ ≤ t} [F(µ_τ) − g(µ_τ)]
    check the stopping conditions:
        if g(µ_t) ≤ (1 − α) F(µ_t), or F(µ_t) ≤ ε_D, or termination was requested:
            return σ̂ and θ̂ = θ_{t*} if g(µ_{t*}) ≤ F(µ_{t*}), and θ̂ = θ otherwise
    adapt the contraction if necessary:
        let g_u = (θ_t − θ) · (µ_t − u)
        if g_u < 0 and g(µ_t)/(−g_u) < ε_{t−1}: ε_t ← min{g(µ_t)/(−g_u), ε_{t−1}/2}
        else: ε_t ← ε_{t−1}

Since the coordinates of u are bounded away from 0 and 1, the vertices of the contracted polytope M′ have their coordinates also bounded away from 0 and 1 (except for I_σ̂). The controlled growth property then gives a bound on the Lipschitz constant of the gradient and guarantees convergence for any fixed ε, for the problem of projecting onto M′. To obtain convergence to the projection onto M_σ̂, we adaptively decrease ε according to the rule of Krishnan et al. [2015]. Their analysis shows that this adaptive version of FW drives the duality gap g(µ_t) to zero and thus indeed solves the non-contracted problem. Two missing pieces, which we describe in the remainder of this section, are the stopping condition and the construction of the interior point u.
The stopping condition needs to ensure that moving the market from a state θ to θ̂ constitutes a trade with a non-negative profit. We start with a lower bound on the guaranteed profit of any iterate of the FW algorithm, and then use it to derive the stopping condition. We omit the conditioning on σ̂ from the exposition here.

ALGORITHM 3: InitFW. Initialization for ProjectFW.
Input: partial outcome σ; IP constraints specified by A, b
Output: extended partial outcome σ̂ ⊇ σ; point u ∈ M_σ̂ such that u_i ∈ (0, 1) for i ∉ I_σ̂; non-empty set Z of vertices of M_σ̂

Initialize Z ← ∅, σ̂ ← σ, C ← ∅
For each i ∈ I \ I_σ and each b ∈ {0, 1}:
    if (i, b) ∉ C:
        call the IP solver to find ẑ = argmax_{z ∈ Z_σ} (2b − 1) z_i
        if ẑ_i = b:
            Z ← Z ∪ {ẑ}
            C ← C ∪ {(j, ẑ_j) : j ∈ I}
        else:
            σ̂ ← σ̂ ∪ {(i, 1 − b)}
If Z = ∅:
    Z ← {the unique point compatible with σ̂}
Return σ̂, Z, and u = (1/|Z|) Σ_{z ∈ Z} z

PROPOSITION 4.1.
Consider a purchase that moves the market from a state θ to a new state θ̂ = ∇R̄(µ̂). The resulting profit is guaranteed to be at least D(µ̂ ‖ θ) − g(µ̂).

Thus, it is "safe" to move the market to θ̂ whenever D(µ̂ ‖ θ) ≥ g(µ̂) (for the proof see Appendix D). To maximize the profit guarantee, we should return the iterate that maximizes the difference D(µ̂ ‖ θ) − g(µ̂), which is what we do in Algorithm 2.

Apart from a forced interruption (e.g., because of the arrival of a new trade or exceeding the time limit), the stopping conditions of Algorithm 2 concern two separate cases. First, recall that the algorithm is minimizing F(µ) = D(µ ‖ θ) via a sequence of iterates µ_t ∈ M that satisfy D(µ_t ‖ θ) → D(µ* ‖ θ) and g(µ_t) → 0 as t → ∞. Therefore, if the prices p(θ) are incoherent, i.e., D(µ* ‖ θ) > 0, eventually we will have g(µ_t) < D(µ_t ‖ θ). In fact, we can guarantee something stronger. Namely, given a fixed α ∈ (0, 1), we will reach an iteration when g(µ_t) ≤ (1 − α) D(µ_t ‖ θ). At this point, our profit guarantee is at least

    D(µ_t ‖ θ) − g(µ_t) ≥ αD(µ_t ‖ θ) ≥ αD(µ* ‖ θ),

thanks to the optimality of µ*. This means that we are extracting at least an α-fraction of the available arbitrager profits; this covers the first stopping condition and the first output case of Algorithm 2. On the other hand, if the prices p(θ) are coherent or close to coherent, then D(µ_t ‖ θ) will eventually drop below our convergence threshold ε_D, which we can set arbitrarily small. Since D(µ* ‖ θ) ≤ D(µ_t ‖ θ), this covers the second stopping condition and the second output case of Algorithm 2. The final case follows directly from Proposition 4.1.
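The bookkeeping implied by this analysis, tracking the pair (D(µ_t ‖ θ), g(µ_t)) for each iterate and returning the one with the best guaranteed profit, or no update when every guarantee is negative, can be sketched as follows (a simplified stand-in for the best-iterate rule of Algorithm 2; the function is our illustration):

```python
def best_safe_iterate(iterates):
    """Pick the FW iterate to return.

    iterates -- list of (divergence, gap) pairs, i.e. (D(mu_t || theta), g(mu_t)).
    The profit of moving to the state of iterate t is at least D - g
    (Proposition 4.1), so return the index maximizing that guarantee,
    or None when every guarantee is negative (leave the market unchanged).
    """
    best = max(range(len(iterates)),
               key=lambda t: iterates[t][0] - iterates[t][1])
    d, g = iterates[best]
    return best if d - g >= 0.0 else None
```

For example, between an iterate with (D, g) = (0.5, 0.3) and one with (0.4, 0.05), the second wins: its guaranteed profit 0.35 exceeds 0.2 even though its divergence is smaller.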
The goal here is to find a point u ∈ M where the coordinates corresponding to unsettled securities are strictly between 0 and 1. In the process, we also obtain the initial set of active vertices and an extended partial outcome σ̂. To construct u, Algorithm 3 iterates through the coordinates i that have not been settled in the provided partial outcome σ, and calls the IP solver to find a valid vector ẑ that is consistent with σ, but also has the i-th coordinate equal to b = 0 or b = 1. If the IP solver fails to find such ẑ for either value b, it means that the i-th coordinate can be settled to 1 − b. Otherwise, the found ẑ is added to the set of active vertices. This guarantees that each coordinate i is either present in σ̂, or the active set contains valid vertices with both the value 0 and the value 1 at the i-th coordinate. Therefore, the average of the active vertices satisfies the requirement for u. If the active set is empty, it means that all of the securities have been settled, and the unique valid vector consistent with σ̂ satisfies the requirement.
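A stand-alone sketch of this construction, with a hypothetical feasibility oracle standing in for the IP solver (`feasible_vertex` and the winner-takes-all toy below are our illustration, not the paper's implementation):

```python
def init_fw(n, settled, feasible_vertex):
    """Interior-point construction in the spirit of InitFW (Algorithm 3).

    n               -- number of securities
    settled         -- dict {i: 0 or 1} of already settled coordinates (sigma)
    feasible_vertex -- oracle(i, b, sigma) returning a valid payoff vector
                       consistent with sigma and with z_i == b, or None
                       (this is the role of the IP solver)
    """
    active, sigma, covered = [], dict(settled), set()
    for i in range(n):
        if i in sigma:
            continue
        for b in (0, 1):
            if (i, b) in covered:
                continue
            z = feasible_vertex(i, b, sigma)
            if z is not None:
                active.append(z)
                # record every coordinate value witnessed by this vertex
                covered.update((j, z[j]) for j in range(n))
            else:
                # no valid payoff with z_i == b: coordinate i settles to 1 - b
                sigma[i] = 1 - b
    if active:
        u = [sum(z[i] for z in active) / len(active) for i in range(n)]
    else:
        # everything is settled; the unique consistent point remains
        u = [float(sigma[i]) for i in range(n)]
    return sigma, active, u

# Toy outcome space: winner-takes-all over three teams, with security 2
# already settled to 0.
outcomes = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

def oracle(i, b, sigma):
    for z in outcomes:
        if z[i] == b and all(z[j] == v for j, v in sigma.items()):
            return z
    return None

sigma, active, u = init_fw(3, {2: 0}, oracle)
```

In the toy run, two witnesses (0, 1, 0) and (1, 0, 0) are found, so u = (0.5, 0.5, 0): the unsettled coordinates are strictly interior, as required.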
5. EXPERIMENTS

5.1. Data description
Our data consists of bets made in Predictalot, a combinatorial prediction market run by Yahoo! in 2010 for the NCAA Men's Division I Basketball Tournament, commonly known as March Madness. The tournament lasted from March 18th to April 5th, 2010. It consisted of 64 teams playing a single-elimination tournament over 6 rounds. In each round, half of the remaining teams were eliminated. Traders were allowed to buy securities at any point in time throughout the tournament; the first bets were placed four days prior to the tournament start and the last bets were placed towards the end of the final match. Many bets referred to groupings of teams, known as conferences, brackets, or seeds (e.g., there are sixteen seed levels and four teams to each seed).

There were 93 036 bets placed altogether on many different securities in Predictalot. Our experiments focus on a large subset of these, which we briefly describe here. The largest group of bets (56%) can be expressed as bundles over atomic tournament variables (winners of individual games, and the number of wins of individual teams). These include bets such as "Duke wins exactly 3 games", "Cornell exits in round 2 or later", and "a team from the Big Ten conference wins the championship". In addition to these bets, we also supported combinatorial bets for comparisons of the number of wins of single teams, e.g., "Duke wins more games than Cornell", and comparisons of the number of wins by teams from different conferences, e.g., "teams from the Big Ten win more games than teams from the Big East". These were implemented as comparison variables derived from pairs of atoms, and pairs of sums, respectively. The two comparison types encompass 12% of the original bets.

Our resulting dataset contains 63 689 bets, constituting 68% of all bets in the original market. Combinatorial bets (comparisons) make up 17% of our final dataset. The three largest groups of bets we did not include were: "team t_1 wins more games than t_2, and t_2 wins more games than t_3" (6%); "the number of upsets in round r will be less than/equal to/greater than c" (3%); and "the sum of seeds in round r will be less than/equal to/greater than c" (3%).

Price initialization.
Our dataset contains realized trades, but we have no other price data from the run of the market. In particular, the initial Predictalot prices were not available,¹ so we used the following scheme to initialize the atomic tournament variables X_t (the number of wins of team t) and G_{r,t} (the outcome of a game). We considered bets within the 6-hour time window starting at 27 hours and ending at 21 hours before the first match of the tournament. Let µ′ denote the price at which securities were sold in this window (we use the last such price if multiple exist). To initialize the game variables G_{r,t}, we use the prices of bets on the champion of the tournament (i.e., X_t = k):

    µ{G_{r,t} = t} = µ′{X_t = k} / Σ_{t′ ∈ T} µ′{X_{t′} = k},

where T is the set of all teams that can reach the game G_{r,t}; if the denominator equals zero, we initialize the prices µ{G_{r,t} = t′} across t′ ∈ T to a uniform distribution. To initialize the securities X_t = x, we proceeded as follows. If µ′{X_t = x} is present, we use that as the initialization price; otherwise we use the difference between µ′{G_{x,t} = t} and µ′{G_{x+1,t} = t}, where we replace one or both of these terms by our already calculated prices according to µ whenever the µ′ prices are not present. The resulting prices are then normalized to sum to one for each X_t. The team and game prices are then projected on the polytope described by the LCMM constraints to obtain the market initialization.

¹Securities in the Predictalot market were priced using the Monte Carlo method with importance sampling against a dynamic proposal distribution. One of the larger issues, which we do not expect with the optimization methods presented in this work, was substantial price volatility as the tournament progressed, due to an increasing mismatch between the market belief and the proposal distribution. In order to avoid trivial arbitrage, independent samples were drawn to form the prices quoted to the traders, and the actual prices imposed on trade executions. As a result, some trades transacted at prices significantly different than quoted. [D. Pennock, personal communication, Feb. 22, 2016]
Similar to the initialization prices, the times when the individual games were settled were not available, so we handcrafted a dataset consisting of all game start times (to the best of our knowledge, end times are not listed anywhere) and settled each game 100 minutes after the game start. The choice of 100 minutes is conservative, based on the anecdotal observation that the shortest NCAA games last about 120 minutes, including the time for commercials and timeouts.

We compare three market treatments: independent markets (IND), the linearly constrained market maker (LCMM), and a market maker with both linear constraints and Bregman projections for arbitrage removal (FWMM). Each market maker builds upon and extends the previous one. Recall that in IND, we use LMSR to price the securities associated with each random variable, but prices for separate variables vary independently, even if the underlying events are related. LCMM enforces price relationships across random variables using linear constraints, and FWMM adds projection steps onto the marginal polytope. The market makers were implemented in Java, using Gurobi Optimizer 5.5 to solve the integer programs in the FW algorithm. We refer to our implementation as the (market) engine.

We evaluate the three market makers by a counterfactual replay of the trades placed in Predictalot. All the market makers depend on the liquidity parameter b (see Eq. 4). Rather than optimizing b, we used a fixed liquidity of 150 and varied each trader's budget. (The effect is equivalent, as increasing the budget increases price responsiveness to the trade orders.) Each trade order is viewed as a new agent, so the budget is constant for each trade. We used budget levels 0.1, 1, 10, 100, and 1000.

For each trade, the Predictalot dataset contains the number of shares purchased and the total cost paid. By taking the average price per share p̄, we obtain a lower bound on the trader's probability estimate when the trade was placed.
From this we create a limit order for our market engine by drawing a limit price uniformly from [p̄, 1], and providing the constant budget level mentioned previously. A limit order states that the trader wishes to purchase shares until either the market price reaches the limit price, or the budget is exhausted, whichever occurs first. Any sell orders with average price p̄ were transformed into buy orders of the complementary bundle, at price 1 − p̄, and then converted into limit orders. By using three different seeds for the randomization, we generated three input files for the market engine. All market makers were run on all three input files. As the results were highly consistent across the randomization seeds, we found three replicates to be sufficient.

(Source for the game start times: espn.com, e.g., http://scores.espn.go.com/ncb/boxscore?gameId=300950150.)

To summarize, we ran the three different market makers (IND, LCMM, FWMM) at five budget levels (0.1, 1, 10, 100, 1000) over the three randomly generated input files. During a market run, the engine records summaries of security prices and prices of all purchased bundles. These summaries are generated at regular intervals, including every hour and every 100 trades. We use the log likelihood to assess the accuracy of the security prices, viewed as probability forecasts, at a given point in time. Let µ be the price vector. We consider log likelihoods associated with two different kinds of events. First is the log likelihood assigned to the final realized value x* of a variable X, which equals log µ{X = x*}. Second is the log likelihood corresponding to a bundle of the form X ∈ E, viewed as a binary variable (the event occurs or not), which is defined as

    1{x* ∈ E} log µ{X ∈ E} + 1{x* ∉ E} log µ{X ∉ E}.

A larger log likelihood indicates a better forecast. We report the average log likelihood over all variables, and the average log likelihood over all purchased bundles.
The former can be viewed as an average accuracy of the market; the latter is weighted towards the part of the market that sees more trading.
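The two scores can be written down directly; a minimal sketch (the dictionary-based price representation is our assumption, not the engine's format):

```python
import math

def variable_log_likelihood(price_dist, realized):
    """log mu{X = x*}: log likelihood of the realized value of a variable,
    given its price distribution {value: price}."""
    return math.log(price_dist[realized])

def bundle_log_likelihood(p_event, occurred):
    """Log likelihood of a bundle 'X in E' scored as a binary event:
    1{x* in E} log mu{X in E} + 1{x* not in E} log mu{X not in E}."""
    return math.log(p_event if occurred else 1.0 - p_event)
```

Averaging the first score over all variables gives the market-wide summary; averaging the second over purchased bundles gives the trade-weighted one.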
Effect of liquidity.
We first examine the effect of varying the budget level (equivalently, liquidity) on the overall performance of the three market makers. Fig. 2 provides the average prediction accuracy of the three market makers over variables and bundles, where the average is taken over all hourly summaries. The plots show the expected trends: when the budget is too low, traders cannot incorporate their information into the market, while when the budget is too high, prices are too sensitive to individual trades. The optimal budget setting is 10 for IND and LCMM, and 100 for FWMM. However, both LCMM and FWMM are far less sensitive to the budget level than IND, because information propagation (via constraints) can correct wrong bets.

The improvement of FWMM over LCMM for variables ranges from . to . , with a median of . over all budget levels and random seeds. For bundles, the improvement ranges from . to . , with a median of . . For the time period covering the first 16 games, LCMM and FWMM are very similar (see the next section), bringing their average performance closer together; excluding these games, the median improvement increases from . to . for securities, and from . to . for bundles. Because accuracy here is averaged over all hourly summaries, it is implicitly weighted by duration, which is hard to interpret. To obtain a more fine-grained view, we next consider the evolution of market accuracy over time.

Accuracy over time.
Fig. 3 plots the prediction accuracy of the three markets as time progresses. We set a time limit of 30 minutes for Bregman projection. The first time it successfully completes is only at time stamp '2010-03-21 13:58:50', after 45 games are already settled. We therefore begin the plot at time stamp '2010-03-19 00:00:00', corresponding to 16 settled games, as there is very little difference between LCMM and FWMM before that point. The reason FWMM still exceeds LCMM on occasion before the first projection is due to the extension of the partial outcome afforded by the IP, as explained in Sec. 4.

Each point of the time series represents an average over all variables or bundles defined at that time, including those whose outcomes have been settled. This explains the upward trend of the plots, culminating at accuracy 0 (a perfect score). The trend is not entirely monotonic, as we see from the bundle log likelihood in the stretch after March 22. The dotted vertical lines indicate the beginning of days on which games are played. On such days, we see that accuracy is initially stable, then sharply increases as the games take their course and their outcomes are settled.

Fig. 2. Market maker accuracy, varying the budget level. Left: average over variables. Right: average over bundles. Average taken over all hourly logs and all random seeds.

Fig. 3. Market maker accuracy over time, at budget level 10. Left: average over variables. Right: average over bundles. Average taken over all variables or bundles defined at that time, over all random seeds. The dotted vertical lines are at 00:00 on days when games are played. The solid vertical line indicates the start of projections in FWMM.
In Fig. 3, we see that once Bregman projections successfully complete, the improvement of FWMM over LCMM becomes sustained. The accuracy improvements from this point onwards range from to for variables, with a median of over all hourly summaries. The improvements range from to for bundles, with a median of .
6. DISCUSSION AND CONCLUSION
In our experiments, FWMM outperformed LCMM once the outcome space was sufficiently reduced, via settled securities, to allow computing Bregman projections within 30 minutes on a standard workstation. This time limit yielded a manageable experimental turnaround, with about 5 hours to execute the trades that originally spanned 22 days. In practice, a market designer can allow longer computation and use more powerful hardware, and expect improvements for larger problem sizes.

Several approaches could further speed up our framework. For instance, FW can be used to construct separating hyperplanes to tighten the outer LCMM approximation, and thereby contribute to arbitrage removal even when there is no time to compute the projection. Also, instead of solving IPs to optimality in each iteration, it may be possible to interleave IP with local search to obtain additional descent vertices. Since the IP is by far the most time-consuming part of FW, this could yield substantial speedups.
REFERENCES

Jacob Abernethy, Yiling Chen, and Jennifer Wortman Vaughan. 2011. An optimization-based framework for automated market-making. In EC-11.
David Belanger, Dan Sheldon, and Andrew McCallum. 2013. Marginal Inference in MRFs using Frank-Wolfe. In NIPS 2013 Workshop on Greedy Optimization, Frank-Wolfe and Friends.
Joyce Berg, Robert Forsythe, Forrest Nelson, and Thomas Rietz. 2008. Results from a Dozen Years of Election Futures Markets Research. In Handbook of Experimental Economics Results.
Dimitri P. Bertsekas. 2015. Convex Optimization Algorithms.
Robert Charette. 2007. An Internal Futures Market. Information Management (March 2007).
Yiling Chen, Lance Fortnow, Nicolas Lambert, David M. Pennock, and Jennifer Wortman. 2008. Complexity of Combinatorial Market Makers. In EC-08.
Yiling Chen, Lance Fortnow, Evdokia Nikolova, and David M. Pennock. 2007. Betting on Permutations. In EC-07.
Yiling Chen, Sharad Goel, and David M. Pennock. 2008. Pricing Combinatorial Markets for Tournaments. In STOC-08.
Yiling Chen and David M. Pennock. 2007. A Utility Framework for Bounded-Loss Market Makers. In UAI-07.
Miroslav Dudík, Rafael Frongillo, and Jennifer Wortman Vaughan. 2014. Market Making with Decreasing Utility for Information. In UAI-14.
Miroslav Dudík, Sébastien Lahaie, and David M. Pennock. 2012. A Tractable Combinatorial Market Maker Using Constraint Generation. In EC-12.
Miroslav Dudík, Sébastien Lahaie, David M. Pennock, and David Rothschild. 2013. A Combinatorial Prediction Market for the U.S. Elections. In EC-13.
Marguerite Frank and Philip Wolfe. 1956. An algorithm for quadratic programming. Naval Research Logistics Quarterly 3, 1-2 (1956).
Robin D. Hanson. 2003. Combinatorial information market design. Information Systems Frontiers 5, 1 (2003).
Robin D. Hanson. 2007. Logarithmic market scoring rules for modular combinatorial information aggregation. Journal of Prediction Markets 1, 1 (2007).
Martin Jaggi. 2013. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In ICML-13.
Rahul G. Krishnan, Simon Lacoste-Julien, and David Sontag. 2015. Barrier Frank-Wolfe for marginal inference. In NIPS-15.
Yurii Nesterov. 2007. Gradient Methods for Minimizing Composite Objective Function. Technical Report. UCL.
David M. Pennock, Steve Lawrence, C. Lee Giles, and Finn A. Nielsen. 2002. The real power of artificial markets. Science 291 (2002).
Martin Spann and Bernd Skiera. 2003. Internet-Based Virtual Stock Markets for Business Forecasting. Management Science 49, 10 (2003).
Lirong Xia and David M. Pennock. 2011. An Efficient Monte-Carlo Algorithm for Pricing Combinatorial Prediction Markets for Tournaments. In IJCAI-11.

APPENDIX

A. PROOF OF PROPOSITION 2.4
We first calculate the largest possible guaranteed profit:

    sup_{δ ∈ R^I} min_{z ∈ Z} [δ · z − C(θ + δ) + C(θ)]
        = sup_{θ′ ∈ R^I} min_{µ ∈ M} [(θ′ − θ) · µ − C(θ′) + C(θ)]
        = min_{µ ∈ M} sup_{θ′ ∈ R^I} [(θ′ − θ) · µ − C(θ′) + C(θ)]    (7)
        = min_{µ ∈ M} [R(µ) − θ · µ + C(θ)]    (8)
        = min_{µ ∈ M} D(µ ‖ θ) = D(µ* ‖ θ),    (9)

where Eq. (7) follows by Sion's minimax theorem, and Eqs. (8) and (9) follow from the definitions of the convex conjugate and the Bregman divergence, respectively. This shows that from the state θ the guaranteed profit is at most D(µ* ‖ θ).

Recall that δ* is any trade that moves the market to a state θ* such that p(θ*) = µ*. We next show that δ* is an optimal trade, i.e., that this trade gives a profit that is at least D(µ* ‖ θ). Let F(µ) := D(µ ‖ θ). Since µ* optimizes F on M, by the first-order optimality, we have for any u ∈ ∂F(µ*) and z ∈ Z that u · (z − µ*) ≥ 0. Since p(θ*) = µ*, the conjugacy implies that θ* ∈ ∂R(µ*) and thus (θ* − θ) ∈ ∂F(µ*), so the first-order optimality yields 0 ≤ (θ* − θ) · (z − µ*), which rearranges to

    (θ* − θ) · z ≥ (θ* − θ) · µ*.    (10)

The profit from the trade δ* given any outcome ω is therefore at least

    (θ* − θ) · φ(ω) − C(θ*) + C(θ) ≥ (θ* − θ) · µ* − C(θ*) + C(θ)
                                   = R(µ*) − θ · µ* + C(θ)
                                   = D(µ* ‖ θ),

where the first line follows by substituting φ(ω) for z in Eq. (10), the second line from the conjugacy of R and C, and the third line from the definition of D, completing the proof.

B. BOUNDED LOSS PROPERTY UNDER GRADUAL REVELATION OF OUTCOME
We show that the bound on the worst-case loss of the cost C is maintained if we updatethe cost function using a sequence of partial outcomes, gradually revealing the finaloutcome ω . We begin with the worst-case bound on the loss under cost C :P ROPOSITION
B.1.
If the initial market state is θ then the worst-case loss of amarket-maker using C is max ω ∈ Ω D ( φ ( ω ) (cid:107) θ ) . P ROOF . Let θ denote the final state before the outcome ω is revealed. Then themarket maker has collected C ( θ ) − C ( θ ) as the revenue for the sold shares, and needso pay out ( θ − θ ) · φ ( ω ) as a payoff to the traders. The worst-case loss is therefore max ω ∈ Ω sup θ (cid:104) ( θ − θ ) · φ ( ω ) − (cid:0) C ( θ ) − C ( θ ) (cid:1)(cid:105) = max ω ∈ Ω sup θ (cid:104)(cid:0) θ · φ ( ω ) − C ( θ ) (cid:1) − θ · φ ( ω ) + C ( θ ) (cid:105) = max ω ∈ Ω (cid:104) R (cid:0) φ ( ω ) (cid:1) − θ · φ ( ω ) + C ( θ ) (cid:105) = max ω ∈ Ω D ( φ ( ω ) (cid:107) θ ) . Now, we will analyze the case with partial outcomes. We assume that the initialpartial outcome σ = ∅ , and that the market goes through a sequence of partial outcomes σ ⊆ σ ⊆ · · · ⊆ σ T until finally an outcome ω is revealed, consistent with σ T . After therevelation of each σ t , the market-maker switches to the cost function C σ t . The initialmarket state is denoted θ and the market state in which the market switches to C σ t isdenoted θ t .P ROPOSITION
B.2.
If the initial market state is $\theta_0$, then, regardless of the sequence of partial outcomes $\sigma_1, \ldots, \sigma_T$, the worst-case loss of the market maker using the sequence of costs $C_{\sigma_t}$ is $\max_{\omega \in \Omega} D(\phi(\omega) \,\|\, \theta_0)$, i.e., the same as that of the market maker using $C$ without incorporating partial outcomes.

PROOF. Recall that the market state at the time of the switch from $C_{\sigma_{t-1}}$ to $C_{\sigma_t}$ is $\theta_t$. We first show that the value of the cost decreases at each switch:
$$
C_{\sigma_t}(\theta_t) = \sup_{\mu \in V_{\sigma_t}} \bigl[ \theta_t \cdot \mu - R(\mu) \bigr] \le \sup_{\mu \in V_{\sigma_{t-1}}} \bigl[ \theta_t \cdot \mu - R(\mu) \bigr] = C_{\sigma_{t-1}}(\theta_t), \tag{11}
$$
where the inequality follows because $V_{\sigma_t} \subseteq V_{\sigma_{t-1}}$. We are now ready to prove the bound on the worst-case loss. Let $\Omega(\sigma_T)$ denote the set of outcomes compatible with $\sigma_T$, and recall that $\sigma_0 = \emptyset$, so $C_{\sigma_0} \equiv C$. Recall that $\theta_t$ for $t = 1, \ldots, T$ are the states of the market when the cost becomes $C_{\sigma_t}$. Finally, let $\theta_{T+1}$ denote the final state. The worst-case loss of the market maker can then be bounded as follows:
$$
\begin{aligned}
&\max_{\sigma_1 \subseteq \cdots \subseteq \sigma_T}\; \max_{\omega \in \Omega(\sigma_T)}\; \sup_{\theta_1, \ldots, \theta_{T+1}} \Biggl[ (\theta_{T+1} - \theta_0) \cdot \phi(\omega) - \sum_{t=0}^{T} \bigl( C_{\sigma_t}(\theta_{t+1}) - C_{\sigma_t}(\theta_t) \bigr) \Biggr] \\
&\quad= \max_{\sigma_1 \subseteq \cdots \subseteq \sigma_T}\; \max_{\omega \in \Omega(\sigma_T)}\; \sup_{\theta_1, \ldots, \theta_{T+1}} \Biggl[ (\theta_{T+1} - \theta_0) \cdot \phi(\omega) - \bigl( C_{\sigma_T}(\theta_{T+1}) - C_{\sigma_0}(\theta_0) \bigr) - \sum_{t=1}^{T} \bigl( C_{\sigma_{t-1}}(\theta_t) - C_{\sigma_t}(\theta_t) \bigr) \Biggr] && (12) \\
&\quad\le \max_{\sigma_T}\; \max_{\omega \in \Omega(\sigma_T)}\; \sup_{\theta_{T+1}} \Bigl[ (\theta_{T+1} - \theta_0) \cdot \phi(\omega) - C_{\sigma_T}(\theta_{T+1}) + C_{\sigma_0}(\theta_0) \Bigr] && (13) \\
&\quad= \max_{\sigma_T}\; \max_{\omega \in \Omega(\sigma_T)} \Bigl[ R\bigl(\phi(\omega)\bigr) - \theta_0 \cdot \phi(\omega) + C(\theta_0) \Bigr] && (14) \\
&\quad= \max_{\omega \in \Omega} D(\phi(\omega) \,\|\, \theta_0). && (15)
\end{aligned}
$$
Eq. (12) follows by rearranging the terms: the sum telescopes, leaving the end-point terms plus the differences between the old and new cost at each switch. Eq. (13) follows by Eq. (11), since each term $C_{\sigma_{t-1}}(\theta_t) - C_{\sigma_t}(\theta_t)$ is non-negative. Eq. (14) follows because the convex conjugate of $C_{\sigma_T}$ is $R_{\sigma_T}(\mu) = \mathbf{I}\{\mu \in V_{\sigma_T}\} + R(\mu)$, and $R_{\sigma_T}\bigl(\phi(\omega)\bigr) = R\bigl(\phi(\omega)\bigr)$ thanks to the compatibility of $\omega$ with $\sigma_T$. Finally, Eq. (15) follows from the definition of the Bregman divergence, completing the proof.

C. DIFFERENTIABILITY AND CONTROLLED GROWTH OF R

The algorithm used by our market maker requires a differentiable objective whose gradient does not grow too fast as it approaches the boundary of $\mathcal{M}$. Note that for LMSR, the Bregman divergence is formally not even differentiable in its first argument (it is subdifferentiable). So, in addition to requiring controlled growth of the gradient, we also need to assume that $R$ can be extended into a differentiable function. Specifically, we say that $\bar{R} : \mathbb{R}^{\mathcal{I}} \to (-\infty, \infty]$ is a convex extension of $R$ if $\bar{R}$ is convex and coincides with $R$ wherever $R < \infty$. We require the existence of an extension with the controlled growth property in the following sense:

Definition
C.1. Let $S \subseteq [0,1]^n$ be a compact convex set. We say that a convex function $F$ exhibits controlled growth on $S$ if it is differentiable on $S \cap (0,1)^n$ and if there exist fixed $p \ge 0$ and $L \ge 0$ such that for any $\varepsilon > 0$, the gradient $\nabla F$ has a bounded Lipschitz constant $L_\varepsilon \le L \varepsilon^{-p}$ over $S \cap [\varepsilon, 1-\varepsilon]^n$.

Assumption
C.2. $R$ has a convex extension $\bar{R}$ such that for all partial outcomes $\sigma$, when $\bar{R}$ is viewed as a function on $V_\sigma$, it exhibits controlled growth on $\mathcal{M}_\sigma$.

We write $\bar{R}_\sigma$ for the restriction of $\bar{R}$ to $V_\sigma$. Note that this restriction is formally a function defined on a space of dimension $|\mathcal{I} \setminus \mathcal{I}_\sigma|$, and thus, formally, $\nabla \bar{R}_\sigma$ has dimension $|\mathcal{I} \setminus \mathcal{I}_\sigma|$. We extend $\nabla \bar{R}_\sigma$ into a vector in $\mathbb{R}^{\mathcal{I}}$ by inserting zeros at coordinates $i \in \mathcal{I}_\sigma$. A key consequence of this construction is that for any partial outcome $\sigma$ and all $\mu \in \mathcal{M}_\sigma$ such that $\mu_i \in (0,1)$ for $i \notin \mathcal{I}_\sigma$, the gradient $\nabla \bar{R}_\sigma(\mu)$ is defined, and $\nabla \bar{R}_\sigma(\mu) \in \partial R_\sigma(\mu)$. As a result, $\theta = \nabla \bar{R}_\sigma(\mu)$ implies that $\nabla C_\sigma(\theta) = \mu$ (but not vice versa). Assumption C.2 can be verified, for instance, by upper-bounding the operator norm of the Hessian, which directly upper-bounds the Lipschitz constant of the gradient.

Example
C.3 (Controlled growth for LMSR). We define the extension of negative entropy over the non-negative orthant, $\bar{R}(\mu) = \mathbf{I}\{\mu \ge 0\} + \sum_{i \in \mathcal{I}} \mu_i \log \mu_i$, which yields $\bar{R}_\sigma(\mu) = \mathbf{I}\{\mu_i \ge 0 \text{ for all } i \notin \mathcal{I}_\sigma\} + \sum_{i \notin \mathcal{I}_\sigma} \mu_i \log \mu_i$. The Hessian is a diagonal matrix with entries $1/\mu_i$, so its operator norm is $\max_{i \notin \mathcal{I}_\sigma} 1/\mu_i$, and thus $L_\varepsilon = O(1/\varepsilon)$, which satisfies the controlled growth condition with $p = 1$.

D. PROOF OF PROPOSITION 4.1
The guaranteed profit when moving from $\theta$ to $\hat{\theta}$ is
$$
\begin{aligned}
\min_{\omega \in \Omega} \Bigl[ (\hat{\theta} - \theta) \cdot \phi(\omega) - C(\hat{\theta}) + C(\theta) \Bigr]
&= \min_{\mu \in \mathcal{M}} \Bigl[ (\hat{\theta} - \theta) \cdot \mu - C(\hat{\theta}) + C(\theta) \Bigr] && (16) \\
&= \min_{\mu \in \mathcal{M}} \Bigl[ (\hat{\theta} - \theta) \cdot (\mu - \hat{\mu}) + \hat{\theta} \cdot \hat{\mu} - C(\hat{\theta}) - \theta \cdot \hat{\mu} + C(\theta) \Bigr] \\
&= \min_{\mu \in \mathcal{M}} \Bigl[ (\hat{\theta} - \theta) \cdot (\mu - \hat{\mu}) + R(\hat{\mu}) - \theta \cdot \hat{\mu} + C(\theta) \Bigr] && (17) \\
&= D(\hat{\mu} \,\|\, \theta) - g(\hat{\mu}). && (18)
\end{aligned}
$$
Eq. (16) follows because the minimized objective is linear in $\phi(\omega)$, so the minimum over $\Omega$ coincides with the minimum over the convex hull $\mathcal{M}$. Eq. (17) follows from the definition of $R$, which gives $\hat{\theta} \cdot \hat{\mu} - C(\hat{\theta}) = R(\hat{\mu})$. Finally, Eq. (18) follows because $\nabla F(\hat{\mu}) = \nabla \bar{R}(\hat{\mu}) - \theta = \hat{\theta} - \theta$, and hence $g(\hat{\mu}) = \max_{\mu \in \mathcal{M}} \bigl[ (\hat{\theta} - \theta) \cdot (\hat{\mu} - \mu) \bigr]$.
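The identities above can be spot-checked numerically for the complete-market LMSR, where $\phi(\omega) = e_\omega$, $R$ is negative entropy, and $C(\theta) = \log \sum_i e^{\theta_i}$. The following Python sketch is ours, not part of the paper (the names and the choice of $n = 5$ outcomes are illustrative); it verifies the worst-case-loss bound of Proposition B.1 over random trade sequences and the guaranteed-profit identity of Eq. (18):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # number of outcomes; complete market, so phi(omega) = e_omega

def C(theta):
    # LMSR cost function: convex conjugate of negative entropy over the simplex
    return np.log(np.exp(theta).sum())

def R(mu):
    # negative entropy, with the convention 0 * log 0 = 0 (so R = 0 at vertices)
    mu = np.asarray(mu)
    return np.sum(np.where(mu > 0, mu * np.log(np.clip(mu, 1e-300, None)), 0.0))

def D(mu, theta):
    # divergence D(mu || theta) = R(mu) - theta . mu + C(theta)
    return R(mu) - theta @ mu + C(theta)

E = np.eye(n)  # rows are the payoff vectors phi(omega) = e_omega

# Proposition B.1: the realized loss for any final state theta is at most
# max_omega D(phi(omega) || theta0), regardless of the trades in between.
theta0 = rng.normal(size=n)
bound = max(D(E[w], theta0) for w in range(n))
for _ in range(1000):
    theta = theta0 + rng.normal(scale=3.0, size=n)  # arbitrary final state
    loss = max((theta - theta0) @ E[w] - (C(theta) - C(theta0)) for w in range(n))
    assert loss <= bound + 1e-9

# Proposition 4.1: moving from theta0 to theta_hat = grad R(mu_hat) guarantees
# a profit of D(mu_hat || theta0) - g(mu_hat).
mu_hat = rng.dirichlet(np.ones(n))          # interior point of the simplex
theta_hat = np.log(mu_hat) + 1.0            # gradient of negative entropy
profit = min((theta_hat - theta0) @ E[w] for w in range(n)) - C(theta_hat) + C(theta0)
g = max((theta_hat - theta0) @ (mu_hat - E[w]) for w in range(n))
assert abs(profit - (D(mu_hat, theta0) - g)) < 1e-9
print("checks passed")
```

With $\theta_0 = 0$ the bound reduces to the familiar LMSR guarantee $\log n$; the linearity of the objectives in $\mu$ lets both extrema be taken over the $n$ vertices of $\mathcal{M}$ rather than the whole polytope.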