Incentive-Compatible Forecasting Competitions
Jens Witkowski, Rupert Freeman, Jennifer Wortman Vaughan, David M. Pennock, Andreas Krause
Jens Witkowski
Frankfurt School of Finance & Management, Frankfurt, Germany, [email protected]
Rupert Freeman
University of Virginia, Charlottesville, VA, USA, [email protected]
Jennifer Wortman Vaughan
Microsoft Research, New York, NY, USA, [email protected]
David M. Pennock
Rutgers University, New Brunswick, NJ, USA, [email protected]
Andreas Krause
ETH Zurich, Zurich, Switzerland, [email protected]
We initiate the study of incentive-compatible forecasting competitions in which multiple forecasters make predictions about one or more events and compete for a single prize. We have two objectives: (1) to incentivize forecasters to report truthfully, so that forecasts are informative and forecasters need not spend any cognitive effort strategizing about reports, and (2) to award the prize to the most accurate forecaster. Proper scoring rules incentivize truthful reporting if all forecasters are paid according to their scores. However, incentives become distorted if only the best-scoring forecaster wins a prize, since forecasters can often increase their probability of having the highest score by reporting more extreme beliefs. In this paper, we introduce two novel forecasting competition mechanisms. Our first mechanism is dominant strategy incentive compatible and guaranteed to select the most accurate forecaster with probability higher than any other forecaster. Moreover, we show that in the standard single-event, two-forecaster setting and under mild technical conditions, no other incentive-compatible mechanism selects the most accurate forecaster with higher probability. Our second mechanism is incentive compatible when forecasters' beliefs are such that information about one event does not lead to a belief update on the other events, and it selects the best forecaster with probability approaching 1 as the number of events grows. Our mechanisms are easy to implement and can be generalized to the related problems of outputting a ranking over forecasters and hiring a forecaster with high accuracy on future events.
Key words : forecasting, data science, incentives, mechanism design
1. Introduction
The study of probabilistic predictions dates back to at least the 1950s, when meteorologists developed proper scoring rules as a way to both incentivize truthful forecasts about future events and compare the relative accuracy of different forecasters (Brier 1950, Good 1952). Proper scoring rules are still widely used today to motivate and measure forecasting accuracy (e.g., Atanasov et al. 2017) and remain an active area of research in decision analysis (e.g., Jose 2017, Grushka-Cockayne et al. 2017).

† This paper is a significantly extended version of Witkowski et al. (2018).

When forecasters are paid their proper scores, they maximize expected payment by truthfully reporting their beliefs. However, it is rare to see proper scoring rule payments outside of experimental labs. Instead, the majority of real-world forecasting settings are competitions, where forecasters are ranked according to their score and where prizes are given only to the highest-ranked forecasters. Hence, forecasters do not care about maximizing their expected score, but about whether their forecasts are judged to be better than others'. For example, in the Good Judgment Project, a recent geopolitical forecasting tournament, the top 2% of forecasters were awarded so-called "superforecaster" status (Tetlock and Gardner 2015), which (on top of bragging rights) gave them full travel reimbursement to a superforecaster conference. In play-money prediction markets, forecasters often compete for a place at the top of a leaderboard (e.g., Servan-Schreiber et al. 2004). And the same phenomenon holds for algorithmic forecasters; Netflix offered a $1 million prize for improvements to its movie recommendation algorithm, and the machine learning competitions run by Kaggle rank submitted models based on how well they predict the labels of data points from an undisclosed test set. One of Kaggle's main uses today is for recruiters to hire the developers of the best-performing algorithms (Harris 2013, Chakraborty 2016).

There are good reasons for organizations to run forecasting competitions as opposed to directly paying each forecaster her proper score. First, from a marketing perspective, awarding a single, large prize to the winner is more enticing than offering small payments to everyone.
For example, it is unlikely that the Netflix Prize would have created the same media buzz without offering participants the prospect of winning $1 million. Lichtendahl and Winkler (2007) model forecasters who care both about their own score and about their score relative to that of another forecaster, with a parameter trading off these two components. They show that when forecasters optimize for their relative rank, they typically want to report more extreme probabilities than those corresponding to their true beliefs.

This kind of misreporting is not a purely academic possibility but is also observed in real-world forecasting competitions. An example is Kaggle's annual machine learning competition to predict the game outcomes of the NCAA March Madness college basketball tournament, where every participant submits up to two statistical models predicting the outcomes of each possible team pairing. At the end of the 2017 competition, Andrew Landgraf, the creator of that year's winning model, was interviewed by the Kaggle team about his approach, saying (Kaggle 2017): "My idea was to model not only the probability of each team winning each game, but also the competitors' submissions. Combining these models, I searched for the submission with the highest chance of finishing with a prize (top 5 on the leaderboard). [...] The three main processes are [...]: (1) A model of the probability of winning each game, (2) a model of what the competitors are likely to submit, and (3) an optimization of my submission based on these two models.
” While rational from a forecaster's point of view, this strategic behavior creates two problems for organizations that run forecasting competitions in order to obtain accurate forecasts: first, the reported forecasts are not truthful and hence not optimized for accuracy but for "winning the game." Second, each forecaster responding to the gaming incentives spends significant effort on strategizing and predicting other forecasters' behavior instead of investing full effort into acquiring the most accurate prediction for the event in question.

In this paper, we initiate the study of incentive-compatible forecasting competitions. After showing that the failure to provide strict truthfulness incentives is inherent to any deterministic forecasting competition mechanism, we present the Event Lotteries Forecasting Competition Mechanism (ELF). ELF borrows a trick from the competitive scoring rule of Kilgour and Gerchak (2004), which truthfully elicits probabilistic forecasts for single events. Under Kilgour and Gerchak's mechanism, a forecaster's payment depends on her relative performance (measured by a proper scoring rule) compared with other forecasters. Specifically, her total payment is the difference between her own score and the average score of all other forecasters. For a single event, ELF uses a similar idea to compute scores for all forecasters that are non-negative and sum up to 1. Treating these scores as a probability distribution over forecasters, ELF then runs a lottery to determine the winner of the prize. For the prominent single-event, two-forecaster setting, as also studied by Lichtendahl and Winkler (2007), we prove that, under mild technical conditions, there exists no other incentive-compatible mechanism that selects the more accurate forecaster with higher probability.
Our second mechanism is the Independent-Event Lotteries Forecasting Competition Mechanism (I-ELF), which is specifically designed for multiple, independent events, and strictly incentive compatible when forecasters' beliefs are such that information about one event does not lead to a belief update on the other events. I-ELF runs one ELF lottery for each individual event, eventually awarding the prize to the forecaster who has won the most event lotteries. As the number of events grows, I-ELF selects the most accurate forecaster with probability approaching 1. Moreover, both ELF and I-ELF are robust towards unknown risk preferences, and our techniques generalize to other natural settings, such as the incentive-compatible ranking of forecasters and hiring a forecaster with high accuracy on future events.

The question of how to aggregate forecasts has been studied extensively in the decision analysis community (e.g., Satopää et al. 2014, Palley and Soll 2019). We emphasize that using ELF or I-ELF as incentive schemes does not restrict the choice of whether and how to aggregate forecasts once they have been elicited. Indeed, a forecasting competition mechanism is not a substitute for a forecast aggregation algorithm, but a complement. Lichtendahl et al. (2013) show that under a commonly-known public-private signal model, a simple average of "gamed" forecasts is more accurate than a simple average of truthful forecasts. However, state-of-the-art aggregation algorithms, such as the extremized mean (Atanasov et al. 2017) and the logit aggregator (Satopää et al. 2014), consistently outperform simple averaging in practice and can take advantage of truthful reports.
2. Model
We consider a group of n ≥ 2 forecasters, indexed by i ∈ [n] = {1, . . . , n}, and m events, indexed by k ∈ [m] = {1, . . . , m}. We model these as m random variables X_k that take values in {0, 1}, and we say that "event k occurred" if X_k = 1 and that "event k did not occur" if X_k = 0. Independent of the event's outcome, we say that "event k materialized." Let X denote the vector-valued random variable of event outcomes and x = (x_1, . . . , x_k, . . . , x_m) its realization. Every forecaster i has a subjective belief p_{i,k} ∈ [0, 1] of the probability that event k will occur. We denote the vector of forecaster i's subjective beliefs over all m events as p_i = (p_{i,1}, . . . , p_{i,k}, . . . , p_{i,m}) ∈ [0, 1]^m. All forecasters report their probabilistic forecasts for all events at the same time, before the first event materializes. (In Section 6.5, we discuss how this assumption can be relaxed for practical purposes.) The reported forecast of forecaster i for event k is denoted by y_{i,k} ∈ [0, 1]. The report can be truthful (i.e., y_{i,k} = p_{i,k}) but does not have to be, and we denote the vector of i's reported forecasts as y_i = (y_{i,1}, . . . , y_{i,k}, . . . , y_{i,m}) ∈ [0, 1]^m. In settings with only a single event, i.e., m = 1, we drop the subscript k denoting the event from the event outcomes as well as from the forecasters' reports and beliefs. Once all m events have materialized, the mechanism selects one of the n forecasters as the "winner." The selection is based on the event outcomes and all forecasters' reports on all events. We allow this selection to be randomized.

Definition 1. A forecasting competition mechanism M takes all forecasters' reports on all events y_1, . . . , y_n ∈ [0, 1]^m × · · · × [0, 1]^m and the materialized outcomes of all events x ∈ {0, 1}^m, and selects a forecaster M(y_1, . . . , y_n, x) ∈ [n].

In contrast to standard proper scoring rules, forecasters only care about being selected. Every forecaster thus seeks to maximize the probability of being selected. The primary objective is to incentivize forecasters to report their true beliefs about the expectation of X. Incorporating forecaster i's subjective beliefs, the uncertainty about other forecasters' reports, and the mechanism's randomization (if any), we obtain the following definition for strict incentive compatibility of a mechanism.

Definition 2.
Forecasting competition mechanism M is strictly incentive compatible if and only if for all forecasters i ∈ [n], all belief vectors p_i, all joint distributions D over outcomes X and reports Y_{−i} such that E_{X∼D}[X] = p_i, and all alternative report vectors y′_i ≠ p_i,

Pr_{X,Y_{−i}∼D}( M(Y_1, . . . , p_i, . . . , Y_n, X) = i ) > Pr_{X,Y_{−i}∼D}( M(Y_1, . . . , y′_i, . . . , Y_n, X) = i ).

Observe that this definition of incentive compatibility is very general, allowing us to capture, for instance, that forecaster i believes that j ≠ i is a perfect forecaster while i knows that she herself is not. In particular, our definition of incentive compatibility applies to standard Bayesian models, where the participants' beliefs stem from noisy observations of some ground truth (e.g., Lichtendahl and Winkler 2007); this is in contrast to previous work (Kilgour and Gerchak 2004, Lambert et al. 2008), which only requires truthful reporting to be optimal when the reports of other forecasters are constant (i.e., with no dependence on each other or the event outcomes). Moreover, and crucially, we do not require p_i to come from any particular parametric belief model. In Section 5, we will assume that the events X are known to be independent and also assume that this independence of events is reflected in the uncertainty about others' reports. In contrast to previously studied competitive forecasting settings, most notably those by Lichtendahl and Winkler (2007) and Lichtendahl et al. (2013), we do not restrict our analysis to Bayes' Nash equilibrium play. Instead, and in line with the literature on (single-forecaster) proper scoring rules (e.g., Gneiting and Raftery 2007), the mechanisms we design obtain strict incentive compatibility in dominant strategies.
That is, our objective is to provide strict incentives for truthful reports independent of the reports of other forecasters.

Also observe that we do not require the typical assumption that forecasters are risk neutral: every forecaster strictly prefers being selected over not being selected, so that the higher the probability of being selected, the better. This idea is not new; previous work used lotteries to address unknown risk preferences of forecasters (Karni 2009, Lambert 2011). While we also reward forecasters probabilistically (and obtain robustness to unknown risk preferences as a bonus), the primary reason we use lotteries is because we have many forecasters but only a single prize to award. To the best of our knowledge, we are the first to study this competitive lottery setting in the context of forecasting.
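As a type signature, the interface of Definition 1 can be sketched in Python (the names below are our illustration, not from the paper):

```python
from typing import Callable, Sequence
import random

# A forecasting competition mechanism maps all forecasters' report
# vectors (one probability per event, per forecaster) and the realized
# outcomes (0/1 per event) to the index of the selected forecaster.
# Randomized mechanisms may use internal randomness when selecting.
Mechanism = Callable[[Sequence[Sequence[float]], Sequence[int]], int]

def uniform_lottery(reports, outcomes):
    """A trivial (uninformative) randomized mechanism: ignore all
    reports and outcomes and pick a forecaster uniformly at random."""
    return random.randrange(len(reports))
```

The uniform lottery is (weakly) incentive compatible but useless for identifying accuracy; the mechanisms below occupy the space between it and the fully score-driven selection of Section 3.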
3. Forecasting Competitions Using Standard Proper Scoring Rules
Consider a single forecaster and a single event X. A scoring rule R computes a payment that depends on the materialized event outcome x and the forecaster's report y ∈ [0, 1] regarding the probability that X = 1, paying the forecaster some amount R(y, x).

Definition 3 (Strictly Proper Scoring Rule). A scoring rule R is a mapping from reports y ∈ [0, 1] and outcomes x ∈ {0, 1} to scores R(y, x) ∈ ℝ ∪ {−∞}. A scoring rule R is strictly proper if, for all y, p ∈ [0, 1] with y ≠ p, it holds that E_{X∼p}[R(p, X)] > E_{X∼p}[R(y, X)]. R is bounded if there exist R̲, R̄ ∈ ℝ such that R(y, x) ∈ [R̲, R̄] for all y ∈ [0, 1], x ∈ {0, 1}. Proper scoring rule R is normalized if it is bounded between 0 and 1, and if R(0, 0) = R(1, 1) = 1 and R(y, x) = 0 for some y ∈ [0, 1] and x ∈ {0, 1}.

When clear from context, we will write R ∈ [0, 1] to refer to a scoring rule bounded between 0 and 1. There exist infinitely many proper scoring rules since any (strictly) convex function yields a (strictly) proper scoring rule (Gneiting and Raftery, 2007; Theorem 1). A widely used bounded scoring rule is the quadratic scoring rule (Brier 1950), which we will regularly refer to throughout the paper.
Proposition 1 (Brier 1950). The quadratic scoring rule R_q(y, x) = 1 − (y − x)² is strictly proper.

Bounded proper scoring rules used in practice are often already normalized. For example, both the quadratic scoring rule and the spherical rule (e.g., Jose 2009) already are. We note that any bounded proper scoring rule R can be transformed into a normalized proper scoring rule R̃, and refer the reader to Appendix A for details.

A natural way to extend a strictly proper scoring rule R to a forecasting competition mechanism is to output the forecaster with highest score according to R, summed across all m events. This mechanism is commonly used in practice to select top forecasters, including by the Good Judgment Project (Tetlock and Gardner 2015) and FiveThirtyEight's NFL Forecasting Game (https://projects.fivethirtyeight.com/2019-nfl-forecasting-game). Let M^PSR_R denote the mechanism derived in this way from proper scoring rule R. That is, M^PSR_R selects the forecaster with highest score,

M^PSR_R(y_1, . . . , y_n, x) ∈ argmax_{i ∈ [n]} Σ_{k=1}^m R(y_{i,k}, x_k),

with ties broken by forecaster index. (Other tie-breaking procedures are possible and our results do not rely on any particular one.)

It is well known that selecting a forecaster according to highest proper scoring rule score may produce perverse incentives. In general, forecasters are incentivized to make over-confident reports to increase their chance of being judged the best forecaster ex post for at least some outcomes. To see this, consider an event X and two forecasters who believe that X occurs with probability 0.8 and 0.9, respectively. If both report their beliefs truthfully, the forecaster who reports 0.8 achieves the highest score—and is therefore selected by the mechanism—whenever X = 0, which she believes to occur with probability 0.2. However, if she instead reports some y > 0.9, she is selected by the mechanism whenever X = 1, which she believes to occur with probability 0.8. We present a more general example illustrating the failure of incentive compatibility of proper scoring rule selection for any n ≥ 2 and m ≥ 1.

Theorem 1.
No deterministic forecasting competition mechanism is strictly incentive compatible.
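The two-forecaster example above can be checked numerically. In this minimal sketch (ours, using the quadratic rule), forecaster 1 believes Pr(X = 1) = 0.8, her opponent reports 0.9, and her subjective probability of winning under score-based selection jumps from 0.2 to 0.8 when she deviates to an extreme report:

```python
def quadratic(y, x):
    """Quadratic (Brier) scoring rule, bounded in [0, 1]."""
    return 1.0 - (y - x) ** 2

def win_prob(own_report, other_report, own_belief):
    """Probability (under forecaster 1's belief) that forecaster 1
    achieves the strictly highest quadratic score on a single event."""
    prob = 0.0
    for x, p_x in ((1, own_belief), (0, 1.0 - own_belief)):
        if quadratic(own_report, x) > quadratic(other_report, x):
            prob += p_x
    return prob

truthful = win_prob(0.8, 0.9, own_belief=0.8)   # wins only if X = 0, prob ~0.2
extreme  = win_prob(0.95, 0.9, own_belief=0.8)  # wins only if X = 1, prob ~0.8
```

The deviation to 0.95 is not a quirk of the quadratic rule: any score-based winner-take-all selection rewards overshooting the opponent's report on the outcome one considers more likely.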
4. Incentive-Compatible Forecasting Competitions
Theorem 1 motivates the study of randomized forecasting competition mechanisms. In Section 4.1, to build intuition, we begin by considering the single-event setting (m = 1) and introduce the Event Lotteries Forecasting Competition Mechanism (ELF), a strictly incentive-compatible mechanism. In Section 4.2, we then show how to extend ELF to handle multiple, arbitrarily-correlated events.

What needs to hold in order for a forecasting competition to be strictly incentive compatible? First note that strict incentive compatibility requires that, for any beliefs over outcomes X and reports Y_{−i}, the probability f_i of selecting forecaster i must behave like a strictly proper scoring rule for i. If this is not the case, then i could increase her probability of being selected by misreporting. Thus we need strictly proper scoring rules for each forecaster that are non-negative and always sum to 1 so that they form a valid probability distribution. A natural first attempt to achieve this would be to use any strictly proper scoring rule, such as the quadratic scoring rule R_q, and "normalize" by dividing by the sum of all forecasters' scores. However, such a multiplicative normalization violates incentive compatibility because the factor by which scores are normalized is 1/(sum of forecasters' scores), which may differ between outcomes, causing forecasters to bias their predictions towards less likely outcomes. For an example illustrating this phenomenon, see Appendix D.

To get around this, we borrow a trick from the competitive scoring rule mechanism of Kilgour and Gerchak (2004), which takes advantage of the fact that incentive compatibility is preserved when adding or subtracting a function of other reports and the outcome. Using their mechanism, each forecaster's payment is her score according to a proper scoring rule minus the average score of all other forecasters. Our Event Lotteries Forecasting Competition Mechanism (ELF) uses a similar idea to normalize all forecasters' scores additively, so that they are non-negative and sum up to 1. ELF then runs a lottery based on these scores to determine the winner of the prize.
For a single event, the Event Lotteries Forecasting Competition Mechanism (ELF) M^ELF_R(y_1, . . . , y_n, x) selects forecaster i ∈ [n] with probability

f_i(y_1, . . . , y_n, x) = 1/n + (1/n) ( R(y_i, x) − (1/(n−1)) Σ_{j≠i} R(y_j, x) ),   (1)

where R ∈ [0, 1] is a bounded strictly proper scoring rule.

One can think of ELF as giving each forecaster a 1/n probability to start with, adjusting this up or down depending on how their performance compares to that of other forecasters. It is easy to see that the vector (f_1, . . . , f_n) is a valid probability distribution: that each f_i is non-negative follows immediately from R being bounded in [0, 1], and Σ_{i=1}^n f_i = 1 since

Σ_{i=1}^n f_i = 1 + (1/n) Σ_{i=1}^n ( R(y_i, x) − (1/(n−1)) Σ_{j≠i} R(y_j, x) ) = 1 + (1/n) ( Σ_{i=1}^n R(y_i, x) − ((n−1)/(n−1)) Σ_{i=1}^n R(y_i, x) ) = 1.

Generalizing the argument of Kilgour and Gerchak (2004) to incorporate Bayesian reasoning about other forecasters, we can show that ELF is incentive compatible.
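A minimal sketch of the single-event ELF lottery (our illustration, instantiated with the quadratic rule; the selection probabilities follow Equation (1)):

```python
import random

def quadratic(y, x):
    """Quadratic (Brier) scoring rule, bounded in [0, 1]."""
    return 1.0 - (y - x) ** 2

def elf_probabilities(reports, x, rule=quadratic):
    """Selection probabilities of single-event ELF: each forecaster
    starts at 1/n, adjusted by how her score compares to the average
    score of the other forecasters."""
    n = len(reports)
    scores = [rule(y, x) for y in reports]
    total = sum(scores)
    return [
        1 / n + (1 / n) * (s - (total - s) / (n - 1))
        for s in scores
    ]

def elf_winner(reports, x, rule=quadratic, rng=random):
    """Run the ELF lottery and return the winning forecaster's index."""
    probs = elf_probabilities(reports, x, rule)
    return rng.choices(range(len(reports)), weights=probs)[0]
```

With reports (0.8, 0.9, 0.3) and outcome x = 1, the best-scoring forecaster receives the largest lottery share, but every forecaster retains positive probability of winning.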
Theorem 2.
The Event Lotteries Forecasting Competition Mechanism M^ELF_R is strictly incentive compatible for m = 1.

We now consider a natural generalization of single-event ELF to multiple events. For multiple events, ELF proceeds as follows after all events have materialized. M^ELF_R(y_1, . . . , y_n, x) selects forecaster i ∈ [n] with probability

g_i(y_1, . . . , y_n, x) = (1/m) Σ_{k=1}^m f_{i,k}, where f_{i,k} = 1/n + (1/n) ( R(y_{i,k}, x_k) − (1/(n−1)) Σ_{j≠i} R(y_{j,k}, x_k) ),   (2)

and where R ∈ [0, 1] is a bounded strictly proper scoring rule. (Although our definition allows for any bounded R, we will see in Section 5 that the optimal accuracy guarantees are achieved for normalized R. We drop the dependencies of each f_{i,k} for clarity.)

This corresponds to running single-event ELF for every event, and selecting each forecaster with probability equal to the average probability assigned to her across all events. Note that this procedure can equivalently be interpreted as sampling a single event uniformly at random, and awarding the prize to the forecaster selected by single-event ELF on that event. Strict incentive compatibility of ELF then follows directly from strict incentive compatibility of single-event ELF.
The Event Lotteries Forecasting Competition Mechanism M^ELF_R is strictly incentive compatible for m ≥ 1 events.
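Multi-event ELF (Equation (2)) simply averages the per-event lottery shares; equivalently, it draws one event uniformly at random and runs the single-event lottery there. A sketch under the same illustrative assumptions (quadratic rule, names ours):

```python
def quadratic(y, x):
    """Quadratic (Brier) scoring rule, bounded in [0, 1]."""
    return 1.0 - (y - x) ** 2

def single_event_probs(reports_k, x_k, rule=quadratic):
    """Single-event ELF shares f_{i,k} for one event k."""
    n = len(reports_k)
    scores = [rule(y, x_k) for y in reports_k]
    total = sum(scores)
    return [1 / n + (1 / n) * (s - (total - s) / (n - 1)) for s in scores]

def multi_event_elf_probs(reports, outcomes, rule=quadratic):
    """Multi-event ELF shares g_i: the average of f_{i,k} over all
    m events, where reports[i][k] is forecaster i's report on event k."""
    m = len(outcomes)
    g = [0.0] * len(reports)
    for k, x_k in enumerate(outcomes):
        f_k = single_event_probs([y[k] for y in reports], x_k, rule)
        for i, f in enumerate(f_k):
            g[i] += f / m
    return g
```

Because each per-event share vector is a probability distribution, their average is one as well, so no explicit renormalization step is needed.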
5. Incentive-Compatible and Accurate Forecasting Competitions
The ELF mechanism from Section 4.2 is strictly incentive compatible for arbitrarily-correlated events. If (strict) incentive compatibility is the only objective, ELF provides a definitive solution. In many settings, however, the system designer strives for an additional objective, namely that the prize is awarded to the most accurate forecaster. Hence, in addition to incentive compatibility, the objective in this work is to select the forecaster with the highest accuracy with as high a probability as possible, and ideally with probability tending to 1 as the number of events grows. Of course, one could imagine other objectives, such as maximizing the expected accuracy of the selected forecaster or minimizing the accuracy gap between the selected and the best forecaster. We briefly discuss alternatives in Section 6.

In judging accuracy, one needs to have a model for ground truth. Here, we borrow from statistical learning theory and assume that event outcomes are drawn from an unknown joint probability distribution θ over X_1, . . . , X_m. The marginal probability that event k will occur is denoted by θ_k ∈ [0, 1]. In Definition 3 on p. 6, proper scoring rules are defined in an incentive spirit, as a tool for the incentive-compatible elicitation of subjective beliefs. In particular, the expectation is taken with respect to a forecaster's subjective belief p. Proper scoring rules also have an accuracy interpretation. If the expectation is taken with respect to the true probability θ_k of event k occurring, then properness implies that reporting the true probability obtains a higher expected score than any other report. Reports that do not coincide with the true probability lead to lower expected scores, and different proper scoring rules correspond to different accuracy measures in that they punish reports diverging from the true probability differently.
For example, with true probability θ_k, the quadratic scoring rule (Proposition 1) punishes a report y by E_{X_k∼θ_k}[ R_q(θ_k, X_k) − R_q(y, X_k) ] = (y − θ_k)².

Importantly, the choice of proper scoring rule has implications for the relative rank of forecasters. For example, let θ_k = 0.73, and suppose forecasters 1 and 2 report y_{1,k} = 0.99 and y_{2,k} = 0.51, respectively. Then, under the quadratic scoring rule, forecaster 2 obtains a higher expected score than forecaster 1 (less punishment), whereas under the spherical scoring rule, forecaster 1 obtains a higher expected score than forecaster 2. (The spherical scoring rule (Jose 2009) is defined as R_s(y, x) := (yx + (1−y)(1−x)) / √(y² + (1−y)²). Under it, forecaster 1 obtains an expected score of 0.73 and forecaster 2 obtains an expected score of only 0.71.) That is, the system designer's choice of proper scoring rule in a forecasting competition determines the (relative) accuracy measure that forecasters are judged by.

For the incentive-compatible mechanisms in this paper, the proper scoring rules need to be bounded. In particular, the accuracy measure implied by the unbounded logarithmic scoring rule (Good 1952) cannot be used. Note that this restriction to bounded scoring rules (such as the quadratic or spherical scoring rule) is also present outside of competition settings when forecasters are simply paid their score, as one cannot ensure non-negative payments for unbounded scoring rules. Hence, for the remainder of the paper, the accuracy measure that is used will be given by a particular bounded proper scoring rule. The objective will be to select the forecaster with highest expected score according to that scoring rule while ensuring that the mechanism is strictly incentive compatible even in the competition setting. For this, it is helpful to overload notation of proper scoring rule R and define

R(y_i, θ) := E_{X∼θ}[ (1/m) Σ_{k=1}^m R(y_{i,k}, X_k) ]

as the expected score of report y_i using R and given joint probability θ. This allows us to make statements about the relative accuracy of forecasters with respect to R and θ. In particular, forecaster i is more accurate than forecaster j on the m ≥ 1 events if R(y_i, θ) > R(y_j, θ).

We first observe that ELF selects forecasters with higher accuracy more often than those with lower accuracy.
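To make the dependence of forecaster rankings on the scoring rule concrete, a small numerical check (the specific values θ = 0.73, 0.99, and 0.51 are our illustration) comparing expected quadratic and spherical scores:

```python
import math

def quadratic(y, x):
    """Quadratic (Brier) scoring rule."""
    return 1.0 - (y - x) ** 2

def spherical(y, x):
    """Spherical scoring rule for a binary event."""
    return (y * x + (1 - y) * (1 - x)) / math.sqrt(y**2 + (1 - y)**2)

def expected_score(rule, y, theta):
    """Expected score of report y when the event occurs w.p. theta."""
    return theta * rule(y, 1) + (1 - theta) * rule(y, 0)

theta, y1, y2 = 0.73, 0.99, 0.51
# The quadratic rule ranks forecaster 2 above forecaster 1 ...
assert expected_score(quadratic, y2, theta) > expected_score(quadratic, y1, theta)
# ... while the spherical rule ranks forecaster 1 above forecaster 2.
assert expected_score(spherical, y1, theta) > expected_score(spherical, y2, theta)
```

Both rules are strictly proper, so a truthful report θ maximizes either expected score; they disagree only on how severely different misreports are punished, which is exactly what can flip the relative ranking.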
Definition 4.
Forecasting competition mechanism M is rank accurate with respect to proper scoring rule R if and only if it holds that

R(y_i, θ) > R(y_j, θ) ⇔ Pr_{X∼θ}( M(y_1, . . . , y_n, X) = i ) > Pr_{X∼θ}( M(y_1, . . . , y_n, X) = j )

for all joint distributions θ over X_1, . . . , X_m, all y_1, . . . , y_n ∈ [0, 1]^m, and all i, j ∈ [n].

The next statement follows immediately from taking expectation over X in Equation 2.

Proposition 2.
The probability that M^ELF_R selects forecaster i given joint probability θ is

Pr_{X∼θ}( M^ELF_R(y_1, . . . , y_n, X) = i ) = 1/n + (1/n) ( R(y_i, θ) − (1/(n−1)) Σ_{j≠i} R(y_j, θ) ).

Corollary 1. M^ELF_R is rank accurate with respect to R. In particular, it selects the most accurate forecaster with higher probability than any other forecaster.

One may wonder if there exist incentive-compatible forecasting competition mechanisms that select the most accurate forecaster with higher probability than ELF. In Theorem 4 we rule out this possibility for the standard two-forecaster, single-event setting (e.g., Lichtendahl and Winkler 2007), subject to mild conditions on the form of the forecasting competition mechanism.

Definition 5.
Forecasting competition mechanism M is anonymous if the selected forecaster does not depend on the identities of the forecasters. That is, M is anonymous if for any permutation σ of [n], any forecaster i, any reports y_1, . . . , y_n, and any outcome vector x, it holds that

Pr( M(y_1, . . . , y_n, x) = i ) = Pr( M(y_{σ⁻¹(1)}, . . . , y_{σ⁻¹(n)}, x) = σ(i) ).

In order to exploit existing characterization theorems of competitive scoring rules (Lambert et al. 2008), we restrict attention to smooth forecasting competition mechanisms in Theorem 4.

Definition 6.
A forecasting competition mechanism M is smooth if the corresponding function that outputs a probability distribution over forecasters, Pr( M(y_1, . . . , y_n, x) ), is twice continuously differentiable with respect to each y_i.

Theorem 4 shows that if a strictly incentive-compatible mechanism M ever selects the more accurate forecaster from a single-event, two-forecaster competition with higher probability than ELF with normalized R, then M is not rank accurate with respect to R, i.e., there must exist another instance in which M selects the less accurate forecaster with higher probability than the more accurate one. Recall that we denote by R̃ the proper scoring rule that results from normalizing R as described in Appendix A.

Theorem 4.
Let M be a smooth and anonymous forecasting competition mechanism that is rank accurate with respect to R and for which there exist y_1, y_2 ∈ [0, 1] and distribution θ such that R(y_1, θ) > R(y_2, θ) and Pr_{X∼θ}( M(y_1, y_2, X) = 1 ) > Pr_{X∼θ}( M^ELF_R̃(y_1, y_2, X) = 1 ). Then M is not strictly incentive compatible.

Theorem 4 shows that we cannot do better than ELF for the standard single-event, two-forecaster setting in terms of maximizing the probability of selecting the most accurate forecaster. But what if there is more than just a single event? Let i* ∈ argmax_i R(y_i, θ) denote the most accurate forecaster and let Δ := min_{j≠i*} ( R(y_{i*}, θ) − R(y_j, θ) ) denote the difference between the expected scores of the most accurate forecaster and the second-most accurate forecaster. Ideally, one would like to guarantee that for any "accuracy gap" Δ and any probability π arbitrarily close to 1, there exists some minimal number of events after which it is guaranteed that the forecasting competition mechanism selects the most accurate forecaster with probability at least π. This intuition is formally captured in the definition of limit accuracy.

Definition 7.
Forecasting competition mechanism M is limit accurate with respect to proper scoring rule R and set of joint distributions Θ if and only if, for any n, any Δ₀ > 0, and any π ∈ [0, 1), there exists m₀ ∈ ℕ such that for all joint distributions θ ∈ Θ and all y_1, . . . , y_n ∈ [0, 1]^m with m ≥ m₀ and Δ > Δ₀, it holds that Pr_{X∼θ}( M(y_1, . . . , y_n, X) = i* ) > π, where i* ∈ argmax_i R(y_i, θ) denotes the most accurate forecaster.

Proposition 3 shows that some restriction on θ is necessary as limit accuracy cannot be achieved for all joint distributions.
Proposition 3.
No forecasting competition mechanism is limit accurate for all joint distributions θ over X_1, . . . , X_m.

In the remainder of this section, we design a forecasting competition mechanism that is limit accurate when the events are independent and strictly incentive compatible when this independence is also reflected in the uncertainty about others' reports. This restriction on forecasters' beliefs is referred to as belief independence.

Definition 8.
For joint distribution D over outcomes X and reports Y_{−i}, let D_k be the corresponding joint distribution over outcome X_k and reports Y_{−i,k}. D is belief independent if and only if all D_k for k ∈ [m] are independent.

Note that belief independence is a fairly mild assumption. For example, forecaster i can still believe that other forecasters are more accurate than herself and also that others' reports are more accurate on some events than on others.

Definition 9.
Forecasting competition mechanism M is strictly incentive compatible under belief independence if and only if for all forecasters i ∈ [n], all belief vectors p_i, all belief independent joint distributions D over outcomes X and reports Y_{−i} such that E_{X∼D}[X] = p_i, and all alternative report vectors y'_i ≠ p_i,

Pr_{X,Y_{−i}∼D}( M(Y_1, . . . , p_i, . . . , Y_n, X) = i ) > Pr_{X,Y_{−i}∼D}( M(Y_1, . . . , y'_i, . . . , Y_n, X) = i ).

The Independent-Event Lotteries Forecasting Competition Mechanism (I-ELF) M^I-ELF_R(y_1, . . . , y_n, x) is defined as follows:

1. For each event k, pick forecaster i to be the event winner w_k ∈ [n] with probability

f_{i,k}(y_{1,k}, . . . , y_{n,k}, x_k) = 1/n + (1/n) ( R(y_{i,k}, x_k) − (1/(n−1)) Σ_{j≠i} R(y_{j,k}, x_k) ),

where R ∈ [0, 1] if m = 1 and R ∈ [0, 1) if m ≥ 2.

2. Select the forecaster who won the most events, argmax_i Σ_{k=1}^m 1(w_k = i), breaking ties uniformly at random. Here 1(·) denotes the 0/1 indicator function.

In essence, I-ELF runs a single ELF lottery for each event and awards the prize to the forecaster who won the most lotteries.

Theorem 5. M^I-ELF_R is strictly incentive compatible under belief independence, for any number of events m.

(If used in conjunction with a normalized R for m ≥ 2, M^I-ELF_R may fail to be strictly incentive compatible (it is still weakly incentive compatible) when there exists an event for which a forecaster believes that she is a perfect forecaster reporting 100% for the eventually materialized outcome and every other forecaster is doing the opposite, i.e., reporting 0% for the eventually materialized outcome. We do not expect this to be an issue in practical application.)

Finally, we show that I-ELF is limit accurate when events are independent.
Theorem 6. M^I-ELF_R is limit accurate for all R and all θ such that event outcomes X_1, . . . , X_m are independent.
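To make the construction concrete, the following is a minimal Python sketch of I-ELF (our illustration, not code from the paper), assuming the normalized quadratic scoring rule; `event_win_prob` computes the lottery probability f_{i,k} and `i_elf` runs the m event lotteries and the final uniform tie-break:

```python
import random

def quadratic_score(y, x):
    """Quadratic scoring rule R_q(y, x) = 1 - (y - x)^2, bounded in [0, 1]."""
    return 1.0 - (y - x) ** 2

def event_win_prob(reports_k, i, x_k, score=quadratic_score):
    """Probability f_{i,k} that forecaster i wins the lottery for event k."""
    n = len(reports_k)
    others = sum(score(reports_k[j], x_k) for j in range(n) if j != i) / (n - 1)
    return 1.0 / n + (score(reports_k[i], x_k) - others) / n

def i_elf(reports, outcomes, score=quadratic_score, rng=random):
    """I-ELF: run one ELF lottery per event; the forecaster who wins the most
    event lotteries wins the prize (ties broken uniformly at random).
    reports[i][k] is forecaster i's report for event k; outcomes[k] in {0, 1}."""
    n, m = len(reports), len(outcomes)
    wins = [0] * n
    for k in range(m):
        probs = [event_win_prob([r[k] for r in reports], i, outcomes[k], score)
                 for i in range(n)]
        w_k = rng.choices(range(n), weights=probs)[0]
        wins[w_k] += 1
    best = max(wins)
    return rng.choice([i for i in range(n) if wins[i] == best])
```

Note that the per-event probabilities always sum to 1 and, with R bounded in [0, 1], lie in [0, 1], so each event lottery is well defined.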
6. Discussion
In this section, we describe extensions to our model and discuss the practical implementation of our methods.
So far, we have restricted our analysis to events with binary outcomes. In practice, we are also interested in events with non-binary (categorical) outcomes. Unsurprisingly, selecting the forecaster with the highest average proper score (e.g., using Brier's (1950) categorical generalization of the quadratic scoring rule) inherits the violation of incentive compatibility exhibited in Section 3. ELF readily extends to categorical outcomes. The competitive scoring rule of Kilgour and Gerchak (2004) is incentive compatible for categorical outcomes when used in conjunction with any proper multi-outcome scoring rule, and ELF inherits this incentive compatibility for all such rules that are bounded. Under belief independence, incentive compatibility of I-ELF follows from the same arguments used in the proof of Theorem 5. Moreover, it still holds that more accurate forecasters obtain higher scores in expectation, so the most accurate forecaster still wins the most events in expectation. Hence, we can prove limit accuracy by a qualitatively identical argument to the one in the proof of Theorem 6.
In many business contexts, we are interested in forecasting events that take real-valued outcomes instead of categorical values. For instance, events could be the monthly demand of particular items, the cost of infrastructure projects, or the annual inflation rate. Both ELF and I-ELF readily extend to handle these cases. In contrast to events with categorical outcomes, where one typically seeks to elicit the forecaster's entire subjective probability distribution over the outcomes, this is cumbersome with infinitely many outcomes on the real line. Instead, practitioners typically choose to elicit only properties of the underlying distribution, such as the mean or the median, which summarize the underlying distribution in ways meaningful for the application at hand. There exist many proper scoring rules for the elicitation of these properties. For example, it is well known that the quadratic scoring rule R_q(y, x) = 1 − (y − x)^2, which was introduced in Section 3, generalizes to real-valued outcomes x ∈ [0, 1]. If X denotes the real-valued outcome, the forecaster maximizes her expected score by reporting y = E[X], i.e., her subjective estimate of the mean of X. Meanwhile, the absolute scoring rule R_a(y, x) = 1 − |y − x| is strictly
proper when used to elicit subjective estimates of the median of X (e.g., Jose 2017). Note that these scoring rules can be scaled to accommodate any bounded interval [a, b] with b > a. Moreover, while it is easy to obtain upper and lower bounds on the variable of interest for almost any conceivable application, tighter bounds yield better discrimination in score between more and less accurate reports.

While the quadratic and absolute scoring rules are strictly proper when used as payments to elicit subjective estimates of the mean and median, respectively, misreporting remains an issue when they are naively applied to forecasting competitions. Consider a random variable X commonly known to be uniformly distributed on [0, 1]. If n = 3 forecasters all report a subjective estimate of the mean, i.e., y_i = 0.5 for all i, and the forecaster with the highest quadratic score is selected as the prize winner, then each forecaster wins with probability 1/3. If forecaster 1 instead reports y_1 = 0.5 − ε for some small ε > 0, then she achieves the highest score whenever X < 0.5 − ε/2, which occurs with probability 0.5 − ε/2 > 1/3. The same example continues to break incentive compatibility when the absolute scoring rule is used to elicit estimates of the median.

To overcome the issue of misreporting, we can define ELF and I-ELF as in Sections 4 and 5, just with an appropriately chosen scoring rule R that is strictly proper for the property being elicited. Strict incentive compatibility of ELF and I-ELF (under the belief independence restriction) follows by reasoning analogous to the binary case. The accuracy guarantee provided by I-ELF carries over as well, with the accuracy implied by the scoring rule R used to define the mechanism. As for the binary-outcome setting, both ELF and I-ELF work in conjunction with any bounded R. Observe that this is analogous to using proper scoring rules as payments, where R needs to be bounded to guarantee non-negative payments.
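The incentive failure in the uniform-mean example is easy to check numerically. The following Monte Carlo sketch (our illustration; the function names are ours) estimates forecaster 1's win frequency under winner-take-all selection with the quadratic score:

```python
import random

def quadratic_score(y, x):
    return 1.0 - (y - x) ** 2

def winner_take_all(reports, x, rng):
    """Select the forecaster with the highest quadratic score, ties at random."""
    scores = [quadratic_score(y, x) for y in reports]
    best = max(scores)
    return rng.choice([i for i, s in enumerate(scores) if s == best])

def win_frequency(reports, trials=200_000, seed=0):
    """Estimate how often forecaster 0 wins when X is uniform on [0, 1]."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        x = rng.random()  # X ~ Uniform[0, 1]
        if winner_take_all(reports, x, rng) == 0:
            wins += 1
    return wins / trials

# Truthful: everyone reports the mean 0.5, so forecaster 0 wins about 1/3 of
# the time. Deviating to 0.5 - eps wins whenever X < 0.5 - eps/2 > 1/3.
```

With eps = 0.05, for instance, the deviator's estimated win frequency rises to roughly 0.475.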
(Observe that the misreport in the example above is somewhat different from those in the categorical setting, where rational forecasters will generally "extremize" their reports towards an outcome. In contrast, in the example above, a forecaster who unilaterally deviates to reporting an extreme value of 0 or 1 would only be selected with probability 1/4.)

In some practical applications, it may be more appropriate to output a ranking rather than a single forecaster. For example, most play-money prediction markets maintain a ranking of contestants. Similarly, many Kaggle competitions award prizes to the highest-ranked forecasters, with prizes decreasing in value as the forecasters' ranks increase. Ranking forecasters in order of any proper score again inherits all of the problems described in Section 3. I-ELF can be adapted to produce a ranking by simply ordering forecasters according to the number of events that they win. As long as forecasters strictly prefer higher positions in the ranking (e.g., because higher rankings correspond to higher-valued prizes), I-ELF remains strictly incentive compatible, since forecasters maximize their probability of winning an event (and potentially moving up in the ranking) by reporting truthfully. Moreover, the same style of accuracy results from Section 5.3 holds, at least qualitatively, when the objective is to maximize the probability of outputting the correct ranking. In expectation, more accurate forecasters achieve higher proper scores, leading to higher expected values of f_{i,k}. Thus, more accurate forecasters win more events in the long run, and the true ranking is faithfully revealed.

Forecasting competitions are often used as a method of selecting a forecaster to hire when future predictions are needed. In this setting, the goal of the competition mechanism is to select the forecaster who will be (approximately) the most accurate on future events.
There is an implicit assumption here that good performance on the observed events translates into good performance in the future, a well-established fact in practice (e.g., Mellers et al. 2014).

Our methods and results can be extended to this setting. Instead of determining accuracy through the m events being predicted, we could instead assume a joint distribution D_θ over event probabilities θ and the beliefs p_i of each forecaster i. We could then define the accuracy of forecaster i in terms of the expected quadratic score of her forecasts with respect to D_θ.

Under this model, mechanism M^PSR_R discussed in Section 3 can be viewed as performing an analog of empirical risk minimization. Similar to how basic empirical risk minimization bounds are proved for PAC learning (Kearns and Vazirani 1994), we could then argue that, with high probability, the forecaster with the highest accuracy on any observed sample of events has expected accuracy close to that of the best forecaster in the set. Therefore, as the number of events grows large, the forecaster selected by M^PSR_R would be guaranteed to have accuracy arbitrarily close to that of the most accurate forecaster. However, the incentive issues remain. The advantage of I-ELF is that it obtains truthful reports for any m while achieving similar accuracy guarantees as m grows large. In this sense, I-ELF can be viewed as a mechanism for learning in the presence of strategic agents.

In Section 2, we require that all forecasters report their predictions for all events before the first event materializes. With an appropriate generalization of the definition of incentive compatibility, this requirement can be relaxed without sacrificing the properties of ELF. In particular, when reporting on event k, we can allow forecasters to update joint distribution D conditioned on the outcomes of past events and the reports on these events.
Our results continue to hold if incentive compatibility requires that forecasters truthfully report their updated beliefs.
For I-ELF, suppose that a forecaster reports on event k after some subset of the other events have materialized. Given belief independence, the reports of other forecasters on any other event, as well as the corresponding outcomes for any events already materialized, do not lead to a belief update. Therefore, the competition organizer does not need to protect or withhold any information from the forecasters as long as the randomness involved in selecting event winners w_k from probabilities f_{i,k} is not realized until all predictions have been reported.

Note that both ELF and I-ELF are easy to implement. Indeed, even for very large competitions, both mechanisms can be implemented in standard spreadsheet software. Each value f_{i,k} is computed by a simple formula, after which the only remaining step is to implement 1 or m lotteries for ELF and I-ELF, respectively.
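The single ELF lottery referenced here is equally simple. A spreadsheet-style Python sketch (ours, assuming the quadratic rule), in which each g_i averages the per-event values f_{i,k} and one weighted draw selects the overall winner:

```python
import random

def quadratic_score(y, x):
    """Quadratic scoring rule, bounded in [0, 1]."""
    return 1.0 - (y - x) ** 2

def elf_win_probs(reports, outcomes, score=quadratic_score):
    """ELF selection probabilities g_i: the average over events of the
    per-event lottery probabilities f_{i,k}."""
    n, m = len(reports), len(outcomes)
    probs = []
    for i in range(n):
        g = 0.0
        for k in range(m):
            others = sum(score(reports[j][k], outcomes[k])
                         for j in range(n) if j != i) / (n - 1)
            g += 1.0 / n + (score(reports[i][k], outcomes[k]) - others) / n
        probs.append(g / m)
    return probs

def elf(reports, outcomes, rng=random):
    """Run the single ELF lottery over the aggregated probabilities."""
    probs = elf_win_probs(reports, outcomes)
    return rng.choices(range(len(reports)), weights=probs)[0]
```

In contrast to I-ELF's m lotteries, ELF draws a single winner from the aggregated probabilities g_1, . . . , g_n.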
7. Conclusion
In real-world forecasting settings, forecasters typically compete for a single prize. Motivated by the prevalence of these forecasting competitions and their poor incentive properties, we initiate the study of incentive-compatible forecasting competitions. Despite a rich literature on incentive-compatible forecast elicitation in the non-competitive setting, the mechanisms in this work are the first to solve the incentive challenge in the competition setting. The forecasting competition mechanism most widely used in practice is to simply select the forecaster with the highest score according to some proper scoring rule. Not only does this particular mechanism fail to elicit truthful forecasts, but, as we show, any deterministic forecasting competition mechanism must violate incentive compatibility. We therefore turn to randomized forecasting competitions, which can be thought of as rewarding forecasters with a lottery ticket that has a higher chance of winning the more accurate the forecaster was relative to the other forecasters in the competition. This intuitive principle is behind both mechanisms we design.

We first define the Event Lotteries Forecasting Competition Mechanism (ELF), which incentivizes truthful reports for arbitrary beliefs on behalf of the forecasters. Due to its randomized nature, ELF may not always select the most accurate forecaster, but it does select more accurate forecasters with higher probability than less accurate ones. For the special case of one event and two forecasters, we show that, under mild technical conditions, no incentive-compatible mechanism can select the most accurate forecaster with higher probability than ELF does.

Our second mechanism, I-ELF, is strictly incentive compatible when forecasters' beliefs satisfy belief independence, which, intuitively, requires that information about one event does not inform forecasters' beliefs about other events.
I-ELF uses ELF as a building block, first selecting a winner for each event using ELF, and then selecting the competition winner as the forecaster who won the most individual events. In addition to being incentive compatible under belief independence,
I-ELF also selects the most accurate forecaster with a probability that tends to 1 as the number of events grows.

Our results have significant implications for organizations that employ groups of forecasters to inform managerial decision making under uncertainty. Previous studies on forecasters' competitive incentives encouraged the fostering of collaboration and cooperation to mitigate the distorted incentives at play (Lichtendahl and Winkler 2007). Our work yields a different perspective. By cleverly exploiting randomization, the decision maker can embrace competitive stakes when eliciting predictions without having to sacrifice the quality of the information received.
Appendix
A. Procedure to Normalize a Proper Scoring Rule
Let R be a bounded proper scoring rule with R_min = min_{y,x} R(y, x) and R_max = max_{y,x} R(y, x) for y ∈ [0, 1], x ∈ {0, 1}. Then R can be transformed into a normalized proper scoring rule R̃ as follows. As an intermediate step, define R′(y, x) = R(y, x) + β′(x) with β′(0) = −R(0, 0) and β′(1) = −R(1, 1). Since R is strictly proper, so is R′, and both the maximum and the minimum must be taken for y ∈ {0, 1}. In particular, it must hold that both 0 = R′(0, 0) > R′(1, 0) and 0 = R′(1, 1) > R′(0, 1). Let r_0 := R′(0, 0) − R′(1, 0) and r_1 := R′(1, 1) − R′(0, 1) denote the magnitudes of the minimum values of R′ for X = 0 and X = 1, respectively. Then

R̃(y, x) := (1 / max(r_0, r_1)) · R′(y, x) + 1

is a normalized scoring rule.

B. Proper Scoring Rule Selection Violates Incentive Compatibility
Let R be any strictly proper scoring rule. Consider an instance with m ≥ 1 and n ≥ 2. Suppose that p_i = (0.5, . . . , 0.5, 0.8), and consider joint distribution D over X and Y_{−i} defined as follows.
• With probability 0.4, X = (0, . . . , 0, 1) and Y_j = (0.5, . . . , 0.5, 0.8 + j/(10n)) for all j ≠ i.
• With probability 0.4, X = (1, . . . , 1, 1) and Y_j = (0.5, . . . , 0.5, 0.8 + j/(10n)) for all j ≠ i.
• With probability 0.1, X = (0, . . . , 0, 0) and Y_j = (0.5, . . . , 0.5, 0.8 + j/(10n)) for all j ≠ i.
• With probability 0.1, X = (1, . . . , 1, 0) and Y_j = (0.5, . . . , 0.5, 0.8 + j/(10n)) for all j ≠ i.
Note in particular that E_{X∼D}[X] = p_i, and that 0.8 < Y_{j,m} ≤ 0.9 for all j ≠ i.

If forecaster i reports p_i, then all forecasters receive the same score on all events except event m. Forecaster i receives the highest score, and is therefore selected by M^PSR_R, whenever X_m = 0, which occurs with probability 0.2. That is, Pr_{X,Y_{−i}∼D}( M^PSR_R(Y_1, . . . , p_i, . . . , Y_n, X) = i ) = 0.2. If forecaster i instead reports y′_i = (0.5, . . . , 0.5, 1), then she is selected by M^PSR_R whenever X_m = 1, which occurs with probability 0.8. That is, Pr_{X,Y_{−i}∼D}( M^PSR_R(Y_1, . . . , y′_i, . . . , Y_n, X) = i ) = 0.8 > 0.2 = Pr_{X,Y_{−i}∼D}( M^PSR_R(Y_1, . . . , p_i, . . . , Y_n, X) = i ), violating incentive compatibility.

(We instantiate a particular p_i, but the example is not sensitive to this choice.)
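The example above can be checked numerically. The sketch below (our illustration) instantiates n = 2 and m = 2 with the quadratic scoring rule, so the other forecaster reports (0.5, 0.85), and computes forecaster i's exact selection probabilities:

```python
def quadratic_score(y, x):
    return 1.0 - (y - x) ** 2

def avg_score(report, outcome):
    """Average quadratic score of a report vector against an outcome vector."""
    return sum(quadratic_score(y, x) for y, x in zip(report, outcome)) / len(report)

def psr_selection_prob(report_i, report_j, scenarios):
    """Probability that forecaster i has the strictly highest average score,
    summed over (probability, outcome-vector) scenarios."""
    return sum(p for p, x in scenarios if avg_score(report_i, x) > avg_score(report_j, x))

# Outcome scenarios from the example (n = 2, m = 2).
scenarios = [(0.4, (0, 1)), (0.4, (1, 1)), (0.1, (0, 0)), (0.1, (1, 0))]
p_truthful = psr_selection_prob((0.5, 0.8), (0.5, 0.85), scenarios)  # selected iff X_m = 0
p_deviate  = psr_selection_prob((0.5, 1.0), (0.5, 0.85), scenarios)  # selected iff X_m = 1
```

Truthful reporting yields a selection probability of 0.2; the deviation raises it to 0.8, matching the argument above.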
C. Proof of Theorem 1
Let M be a deterministic and strictly incentive compatible forecasting competition mechanism. Further, let m ≥ 1 and n ≥ 2, and observe that there are |P([m])| = 2^m possible values of the outcome vector x. Consider forecaster i, and suppose that every forecaster j ≠ i reports a probability y_{j,k} = 0.5 for every event k. We first use these fixed reports of agents j ≠ i to derive candidate misreports for agent i, and then again to define an appropriate joint distribution D that yields a violation of strict incentive compatibility.

For any report y_i, forecaster i is selected as the winner for some subset of possible event outcomes X ⊆ {0, 1}^m. Note that, since there are 2^m possible values of x, there are |P({0, 1}^m)| = 2^{2^m} possible subsets X. Consider then 2^{2^m} + 1 different possible reports of forecaster i, denoted y_i^1, y_i^2, . . . , y_i^{2^{2^m}+1}, and the corresponding subsets X^1, X^2, . . . , X^{2^{2^m}+1} of event outcomes for which she is selected given these reports. By the pigeonhole principle there must exist r, s ∈ {1, . . . , 2^{2^m}+1} with r ≠ s such that X^r = X^s. That is, forecaster i is selected for exactly the same set of possible event outcomes regardless of whether she reports y_i^r or y_i^s. We use this fact to illustrate a violation of strict incentive compatibility. Define D as follows: each event k occurs with probability equal to y_{i,k}^r independent of other events, and every forecaster j ≠ i reports a probability of 0.5 for every event. Note that p_i = y_i^r. Then we have that Pr_{X,Y_{−i}∼D}( M(Y_1, . . . , p_i, . . . , Y_n, X) = i ) = Pr_{X∼D}( X ∈ X^r ) = Pr_{X∼D}( X ∈ X^s ) = Pr_{X,Y_{−i}∼D}( M(Y_1, . . . , y_i^s, . . . , Y_n, X) = i ), violating strict incentive compatibility. □

D. Multiplicatively Normalizing Scores From Proper Scoring Rules Violates Truthfulness
Let n = 2, m = 1, and suppose p_1 = 0.5. Let distribution D over X and Y_2 be defined as follows. With probability 0.5, Y_2 = 1 and X = 0, and with probability 0.5, Y_2 = 1 and X = 1. Observe that E_{X∼D}[X] = p_1. If forecaster 1 reports p_1, then she is selected with probability R_q(0.5, 1) / ( R_q(0.5, 1) + R_q(1, 1) ) = 0.75 / 1.75 = 3/7 when X = 1, and R_q(0.5, 0) / ( R_q(0.5, 0) + R_q(1, 0) ) = 1 when X = 0. That is, Pr_{X,Y_2∼D}( M(p_1, Y_2, X) = 1 ) = 5/7 ≈ 0.71. If forecaster 1 instead reports y′_1 = 0.8, then she is selected with probability R_q(0.8, 1) / ( R_q(0.8, 1) + R_q(1, 1) ) = 0.96 / 1.96 = 24/49 when X = 1, and R_q(0.8, 0) / ( R_q(0.8, 0) + R_q(1, 0) ) = 1 when X = 0. Her probability of being selected has increased to Pr_{X,Y_2∼D}( M(y′_1, Y_2, X) = 1 ) = 73/98 ≈ 0.74, violating truthfulness.
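These selection probabilities can be reproduced exactly with rational arithmetic; a small sketch (ours) of the multiplicative normalization in this example:

```python
from fractions import Fraction

def quad(y, x):
    """Quadratic scoring rule with exact rational arithmetic."""
    return 1 - (Fraction(y) - x) ** 2

def mult_norm_selection_prob(y1, y2):
    """Probability that forecaster 1 is selected when scores are normalized
    multiplicatively, under the example's distribution: Y_2 = 1 always,
    and X = 0 or 1 each with probability 1/2."""
    half = Fraction(1, 2)
    p = 0
    for x in (0, 1):
        p += half * quad(y1, x) / (quad(y1, x) + quad(y2, x))
    return p

truthful = mult_norm_selection_prob(Fraction(1, 2), 1)  # 5/7
deviate  = mult_norm_selection_prob(Fraction(4, 5), 1)  # 73/98
```

The deviation to 0.8 strictly increases forecaster 1's selection probability, confirming the violation.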
E. Proof of Theorem 2
To show strict truthfulness of M^ELF_R for m = 1, we show that reporting y_i = p_i maximizes forecaster i's probability of being selected for any joint distribution D over outcomes X and reports Y_{−i}:

argmax_{y_i} Pr_{X,Y_{−i}∼D}( M^ELF_R(Y_1, . . . , y_i, . . . , Y_n, X) = i )
  = argmax_{y_i} E_{X,Y_{−i}∼D}[ f_i(Y_1, . . . , y_i, . . . , Y_n, X) ]
  = argmax_{y_i} E_{X,Y_{−i}∼D}[ 1/n + (1/n)( R(y_i, X) − (1/(n−1)) Σ_{j≠i} R(Y_j, X) ) ]
  = argmax_{y_i} E_{X,Y_{−i}∼D}[ R(y_i, X) ]
  = p_i.

The last line follows from linearity of expectation and from R being a strictly proper scoring rule. □

F. Proof of Theorem 3

argmax_{y_i} Pr_{X,Y_{−i}∼D}( M^ELF_R(Y_1, . . . , y_i, . . . , Y_n, X) = i )
  = argmax_{y_i} E_{X,Y_{−i}∼D}[ g_i(Y_1, . . . , y_i, . . . , Y_n, X) ]
  = argmax_{y_i} E_{X,Y_{−i}∼D}[ (1/m) Σ_{k=1}^m ( 1/n + (1/n)( R(y_{i,k}, X_k) − (1/(n−1)) Σ_{j≠i} R(Y_{j,k}, X_k) ) ) ]
  = argmax_{y_i} E_{X,Y_{−i}∼D}[ Σ_{k=1}^m ( R(y_{i,k}, X_k) − (1/(n−1)) Σ_{j≠i} R(Y_{j,k}, X_k) ) ]
  = argmax_{y_i} E_{X,Y_{−i}∼D}[ Σ_{k=1}^m R(y_{i,k}, X_k) ]
  = p_i. □

G. Proof of Proposition 2
The statement follows directly from the definition of M^ELF_R:

Pr_{X∼θ}( M^ELF_R(y_1, . . . , y_n, X) = i )
  = E_{X∼θ}[ (1/m) Σ_{k=1}^m ( 1/n + (1/n)( R(y_{i,k}, X_k) − (1/(n−1)) Σ_{j≠i} R(y_{j,k}, X_k) ) ) ]
  = 1/n + (1/n)( E_{X∼θ}[ (1/m) Σ_{k=1}^m R(y_{i,k}, X_k) ] − E_{X∼θ}[ (1/m) Σ_{k=1}^m (1/(n−1)) Σ_{j≠i} R(y_{j,k}, X_k) ] )
  = 1/n + (1/n)( R(y_i, θ) − (1/(n−1)) Σ_{j≠i} R(y_j, θ) ). □

H. Proof of Theorem 4
Our proof of Theorem 4 proceeds in two parts. In the first part, we exploit the connection between wagering mechanisms and forecasting competition mechanisms to narrow down the particular form that any smooth, anonymous, strictly truthful forecasting competition mechanism must take. This form is parameterized by the choice of proper scoring rule R. In the second part of the proof, we show that using any normalized proper scoring rule different from the one used to define accuracy must violate rank accuracy. Since we are considering only a single event X, for this proof we will slightly abuse notation and use θ to denote a single probability rather than a joint distribution.

Part 1.
We begin by formally introducing wagering mechanisms. A wagering mechanism Π = (Π_i)_{i∈[n]} is a set of functions Π_i, each of which takes as input the forecasters' reports y = (y_1, . . . , y_n) ∈ [0, 1]^n, a vector of wagers ω = (ω_1, . . . , ω_n) ∈ R^n_{≥0}, and the event outcome x ∈ {0, 1}, and outputs a payment to forecaster i, Π_i(y, ω, x) ≥ 0. For our analysis, it will be sufficient to restrict ourselves to wagering mechanisms that only accept the vector of wagers ω = (1/n, . . . , 1/n). We refer to the resulting mechanisms as equal-wager wagering mechanisms, and denote the payments Π_i(y_1, . . . , y_n, x), omitting the (non-)dependence on ω. The following definitions are standard in the wagering mechanism literature.

Definition 10. An equal-wager wagering mechanism Π is budget balanced if, for all reports y_1, . . . , y_n ∈ [0, 1] and outcomes x ∈ {0, 1}, it holds that Σ_{i=1}^n Π_i(y_1, . . . , y_n, x) = 1. That is, the sum of payments from the mechanism equals the sum of agents' wagers.

We note that equal-wager wagering mechanisms can be equivalently expressed as Competitive Scoring Rules (Kilgour and Gerchak 2004).
Definition 11.
An equal-wager wagering mechanism Π is strictly incentive compatible under immutable beliefs if, for all p_i, all reports y_i ≠ p_i, and all y_j ∈ [0, 1] for j ≠ i, it holds that E_{X∼p_i} Π_i(y_1, . . . , y_i, . . . , y_n, X) < E_{X∼p_i} Π_i(y_1, . . . , p_i, . . . , y_n, X). That is, truthfully reporting her subjective probability maximizes a forecaster's expected payment, given the reports of the other forecasters.

Definition 12.
An equal-wager wagering mechanism Π is normal if, for all probabilities θ ∈ [0, 1], all reports y_1, . . . , y_n ∈ [0, 1], and all y′_i ∈ [0, 1], if

E_{X∼θ} Π_i(y_1, . . . , y_i, . . . , y_n, X) < E_{X∼θ} Π_i(y_1, . . . , y′_i, . . . , y_n, X)

then E_{X∼θ} Π_j(y_1, . . . , y_i, . . . , y_n, X) ≥ E_{X∼θ} Π_j(y_1, . . . , y′_i, . . . , y_n, X) for all j ≠ i. That is, if a forecaster i changes her report, yielding a change ε_i in her expected payment, the change in expected payment ε_j of every other forecaster is null or has the opposite sign of ε_i.

Definition 13.
An equal-wager wagering mechanism Π is anonymous if for any permutation σ of [n], any forecaster i, and any outcome x, it holds that Π_i(y_1, . . . , y_n, x) = Π_{σ(i)}(y_{σ^{−1}(1)}, . . . , y_{σ^{−1}(n)}, x). That is, the payouts do not depend on the identities of the agents.

It will be useful to define smoothness for wagering mechanisms and proper scoring rules.

Definition 14. An equal-wager wagering mechanism is smooth if, for all i ∈ [n], Π_i is twice continuously differentiable with respect to each report y_j, j ∈ [n]. A proper scoring rule R is smooth if it is twice continuously differentiable with respect to the report y.

Our first lemma provides a formal relationship between budget-balanced equal-wager wagering mechanisms and forecasting competition mechanisms.

Definition 15.
Given a forecasting competition mechanism M, define the corresponding equal-wager wagering mechanism by Π^M_i(y_1, . . . , y_n, x) = Pr( M(y_1, . . . , y_n, x) = i ) ≥ 0 for all i ∈ [n].

Lemma 1.
If a forecasting competition mechanism M is strictly incentive compatible, anonymous, and smooth, then the corresponding equal-wager wagering mechanism Π^M is budget balanced, strictly incentive compatible for immutable beliefs, anonymous, and smooth.

Proof. Consider a strictly incentive compatible and anonymous forecasting competition mechanism M and the corresponding equal-wager wagering mechanism Π^M.

For budget balance, note that Σ_{i=1}^n Π^M_i(y_1, . . . , y_n, x) = Σ_{i=1}^n Pr( M(y_1, . . . , y_n, x) = i ) = 1, where the latter equality follows from the fact that M outputs a probability distribution over forecasters.

For anonymity, we have Π^M_i(y_1, . . . , y_n, x) = Pr( M(y_1, . . . , y_n, x) = i ) = Pr( M(y_{σ^{−1}(1)}, . . . , y_{σ^{−1}(n)}, x) = σ(i) ) = Π^M_{σ(i)}(y_{σ^{−1}(1)}, . . . , y_{σ^{−1}(n)}, x).

For strict incentive compatibility under immutable beliefs, for any p_i, reports y_i ≠ p_i, and any y_j ∈ [0, 1] for j ≠ i, we have

E_{X∼p_i} Π^M_i(y_1, . . . , y_i, . . . , y_n, X) = Pr_{X∼p_i}( M(y_1, . . . , y_i, . . . , y_n, X) = i ) < Pr_{X∼p_i}( M(y_1, . . . , p_i, . . . , y_n, X) = i ) = E_{X∼p_i} Π^M_i(y_1, . . . , p_i, . . . , y_n, X),

where the inequality follows from strict incentive compatibility of M, taking joint distribution D to be such that Y_j = y_j with probability 1 and E_{X∼D}[X] = p_i.

Finally, smoothness of Π^M follows directly from smoothness of M and Definition 15. □
Lemma 2 (Lemma 4, Lambert et al. (2008), restated). For any n ≥ 2, if a smooth equal-wager wagering mechanism Π is budget balanced, strictly incentive compatible for immutable beliefs, anonymous, and normal, then there exists a smooth strictly proper scoring rule R such that

Π_i(y_1, . . . , y_n, x) = 1/n + R(y_i, x) − (1/(n−1)) Σ_{j≠i} R(y_j, x).   (3)

The following lemma incorporates two observations about Lemma 2. First, R must be bounded to guarantee non-negative payouts as required by the definition of a wagering mechanism. Second, when restricted to n = 2, normality is implied by budget balance.

Lemma 3.
For n = 2, if an equal-wager wagering mechanism is budget balanced, strictly incentive compatible for immutable beliefs, anonymous, and smooth, then there exists a smooth strictly proper scoring rule R ∈ [0, 1] such that

Π_i(y_1, y_2, x) = 1/2 + (1/2)( R(y_i, x) − R(y_{−i}, x) ).   (4)

Proof. When n = 2, budget balance implies that Π_1(y_1, y_2, x) = 1 − Π_2(y_1, y_2, x) for all y_1, y_2 ∈ [0, 1] and all x ∈ {0, 1}. Taking the expectation over possible outcomes yields E_{X∼θ} Π_1(y_1, y_2, X) = 1 − E_{X∼θ} Π_2(y_1, y_2, X). In particular, any change in the expected payment to forecaster i is exactly offset by the change in expected payment to forecaster 3 − i. Therefore, normality is implied by budget balance.

Boundedness of R follows from Lemma 2 and the definition of a wagering mechanism. By the constraint that 0 ≤ Π_i(y_1, y_2, x) ≤ 1, where Π_i is defined as in Lemma 2, it must be the case that |R(y_i, x) − R(y_{−i}, x)| ≤ 1/2 for all y_1, y_2, x. We can therefore define R′(y, x) by R′(y, x) = 2( R(y, x) + β(x) ), where β(0) = −R(1, 0) and β(1) = −R(0, 1). R′ is derived from R by a positive affine transformation, and therefore inherits strict properness from R. Note that the minimum value of R′ is R′(0, 1) = R′(1, 0) = 0 and the maximum value is either R′(0, 0) = 2( R(0, 0) − R(1, 0) ) ≤ 1 or R′(1, 1) = 2( R(1, 1) − R(0, 1) ) ≤ 1, and so R′ is bounded in [0, 1]. Substituting R′ into Equation 4 yields exactly Equation 3. □

We can now characterize the form that any strictly incentive-compatible, anonymous, and smooth forecasting competition mechanism must have.
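The rescaling step in the proof of Lemma 3 can be sanity-checked numerically. In the sketch below (our illustration, with a hypothetical rule R whose score differences are bounded by 1/2), R′ = 2(R + β) lands in [0, 1] and the two payment formulas coincide:

```python
def R(y, x):
    """A bounded strictly proper rule with score differences at most 1/2."""
    return 0.5 * (1.0 - (y - x) ** 2) + 0.2

beta = {0: -R(1, 0), 1: -R(0, 1)}

def R_prime(y, x):
    """The affine transformation from the proof of Lemma 3."""
    return 2.0 * (R(y, x) + beta[x])

def payment_eq3(y1, y2, x):
    """Equation (3) for n = 2 with the original rule R."""
    return 0.5 + R(y1, x) - R(y2, x)

def payment_eq4(y1, y2, x):
    """Equation (4) with the transformed rule R'."""
    return 0.5 + 0.5 * (R_prime(y1, x) - R_prime(y2, x))
```

Because β(x) cancels and the factor 2 is undone by the 1/2 in Equation (4), the payments are identical while R′ stays in [0, 1].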
Lemma 4.
For n = 2, if a forecasting competition mechanism M is strictly incentive compatible, anonymous, and smooth, then there exists a smooth strictly proper scoring rule R(y, x) ∈ [0, 1] such that for all i ∈ {1, 2},

Pr( M(y_1, y_2, x) = i ) = 1/2 + (1/2)( R(y_i, x) − R(y_{−i}, x) ).

Proof. Let M be a strictly incentive compatible, anonymous, and smooth forecasting competition mechanism. Then, by Lemma 1, the corresponding equal-wager wagering mechanism Π^M is budget balanced, strictly incentive compatible for immutable beliefs, anonymous, and smooth. Therefore, by Lemma 3, there must exist a smooth strictly proper scoring rule R ∈ [0, 1] such that for all i ∈ {1, 2},

Π^M_i(y_1, y_2, x) = 1/2 + (1/2)( R(y_i, x) − R(y_{−i}, x) ).

By Definition 15, this implies that for all i ∈ {1, 2},

Pr( M(y_1, y_2, x) = i ) = 1/2 + (1/2)( R(y_i, x) − R(y_{−i}, x) ),

which is the desired result. □

(Lambert et al. restrict attention to smooth wagering mechanisms, so the smoothness condition does not explicitly appear in their lemma statement.)

We have now established the form that any strictly incentive compatible, smooth, and anonymous forecasting competition mechanism M must have for n = 2. In particular, M is equivalent to M^ELF_R for some smooth, bounded proper scoring rule R. Further, we show that R can always be represented by a differentiable convex function G.

Lemma 5.
Let R be a smooth strictly proper scoring rule. There exists a strictly convex, differentiable function G : [0, 1] → R with

R(y, θ) = G(y) + dG(y) · (θ − y),

where θ ∈ [0, 1] and dG(y) is the derivative of G at y. Furthermore, G(y) is the expected score for reporting y = θ. Every R defines a unique G and every G defines a unique R.

Proof. It is well known that every strictly proper scoring rule can be expressed as R(y, θ) = G(y) + dG(y) · (θ − y) for some strictly convex function G, where dG(y) is a subgradient of G at y (McCarthy 1956, Savage 1971, Schervish et al. 1989, Gneiting and Raftery 2007). Observe that setting y = θ yields expected score G(y), and it immediately follows that every R defines a unique G.

Let R be smooth (and, in particular, continuous). Suppose for the sake of contradiction that the convex function G associated with R is not differentiable at some y′ ∈ [0, 1], i.e., that the left and right derivatives of G at y′ (d−G(y′) and d+G(y′), respectively) are not equal. Note that convexity implies that d−G(y′) ≤ d+G(y′), so the fact that the left and right derivatives are not equal yields d−G(y′) < d+G(y′). We therefore have

lim_{ε→0+} R(y′ − ε, 1) = G(y′) + d−G(y′) · (1 − y′) < G(y′) + d+G(y′) · (1 − y′) = lim_{ε→0+} R(y′ + ε, 1),

i.e., a discontinuity of R at y′ for θ = 1, a contradiction to the smoothness of R. Further, note that differentiability of G implies a unique scoring rule R. □

Part 2:
The remainder of the proof is devoted to comparing the behavior of M_ELF^R for different choices of smooth proper scoring rule R. We will require the notion of equivalent scoring rules. A proper scoring rule R is equivalent to another proper scoring rule R′ if R can be obtained from R′ by a positive affine transformation.

Definition 16.
Proper scoring rules R and R′ are equivalent if and only if R′(y, x) = αR(y, x) + β(x) for some α > 0 and β(x) ∈ ℝ for x ∈ {0, 1}.

This definition partitions the space of proper scoring rules into equivalence classes. It will be useful to define the canonical form of a scoring rule R as a convenient representative of each class. In particular, the canonical form ensures that every perfect forecast of a sure event obtains a score of 1 and that the minimum expected score of a perfect forecast is 0.

Definition 17.
Let R and R′ be strictly proper scoring rules. We say that R′ is the canonical form of R if R′ and R are equivalent, R′(0, 0) = R′(1, 1) = 1, and min_θ R′(θ, θ) = 0, with the minimum attained at some θ ∈ (0, 1).

Lemma 6.
For any smooth proper scoring rule R, there exists a canonical form R′.

Proof. It is sufficient to show that any strictly proper scoring rule R can be brought into canonical form through one particular positive affine transformation. To transform any proper scoring rule R into its canonical form, we first define a function f(x) for x ∈ {0, 1} such that, when added to R(y, x), every perfect forecast of a sure event obtains a score of 0. That is, f(0) := −R(0, 0) and f(1) := −R(1, 1). We then scale R(y, x) + f(x) by α := −1 / min_θ E_{X∼θ}[R(θ, X) + f(X)] such that the minimum expected score of a perfect forecast is −1. Note that α > 0 because min_θ E_{X∼θ}[R(θ, X) + f(X)] < 0, which holds since R(0, 0) + f(0) = 0 and R(1, 1) + f(1) = 0 by design of f(x) and because of strict convexity of the expected score function. Finally, we add a constant 1, resulting in α(R(y, x) + f(x)) + 1. □

It immediately follows from Definition 17 and Lemma 6 that if two proper scoring rules have the same canonical form, then they are equivalent. In order to prove our key result, we require a technical lemma.
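As an aside, both the representation of Lemma 5 and the three-step canonical-form construction in the proof of Lemma 6 are easy to verify numerically. The following sketch is our own illustration (not part of the proofs); it uses the quadratic (Brier) score R(y, x) = 1 − (y − x)², a grid minimization in place of the exact minimizer, and a numerical derivative for dG:

```python
import numpy as np

def R(y, x):
    """Quadratic (Brier) score, bounded in [0, 1]."""
    return 1.0 - (y - x) ** 2

def expected(rule, y, theta):
    """Expected score E_{X~theta}[rule(y, X)] for a binary outcome X."""
    return theta * rule(y, 1) + (1.0 - theta) * rule(y, 0)

# Lemma 5: G(y) is the expected score of a truthful report, and
# R(y, theta) = G(y) + dG(y) * (theta - y), with dG the derivative of G.
G = lambda y: expected(R, y, y)          # for the Brier score: G(y) = y^2 - y + 1
dG = lambda y, h=1e-6: (G(y + h) - G(y - h)) / (2 * h)
for y, theta in [(0.2, 0.7), (0.5, 0.1), (0.9, 0.4)]:
    assert abs(expected(R, y, theta) - (G(y) + dG(y) * (theta - y))) < 1e-6

# Lemma 6: bring R into canonical form in three steps.
f = {0: -R(0, 0), 1: -R(1, 1)}           # step 1: sure events now score 0
shifted = lambda y, x: R(y, x) + f[x]
grid = np.linspace(0.0, 1.0, 10001)
alpha = -1.0 / min(expected(shifted, t, t) for t in grid)  # step 2: scale
canonical = lambda y, x: alpha * shifted(y, x) + 1.0       # step 3: add 1
```

For the Brier score this yields α = 4 and the canonical form 1 − 4(y − x)²: perfect forecasts of sure events score 1, and the minimum expected score of a perfect forecast is 0, attained at θ = 1/2.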
Lemma 7.
Let f, g : [0, 1] → ℝ be differentiable, strictly convex functions. Additionally, suppose that f is strictly decreasing, that f(0) = g(0) = 1, and that there exists a t̄ ∈ [0, 1] for which f(t̄) < g(t̄). Then there must exist a t′ ∈ [0, t̄] for which f(t′) < g(t′) and d(f(t′)) < d(g(t′)).

Proof. Let t* = sup{t ∈ [0, t̄] : f(t) ≥ g(t)}. We are guaranteed that t* is well-defined because f(0) = g(0), so we are taking a supremum over a non-empty set. Further, it is easy to see that f(t*) = g(t*) and that f(t) < g(t) for all t ∈ (t*, t̄]. Suppose for contradiction that d(f(t)) ≥ d(g(t)) for all t ∈ (t*, t̄]. This would imply that f(t̄) ≥ g(t̄), contradicting the assumption of the lemma. Therefore, there must exist a t′ ∈ (t*, t̄] with d(f(t′)) < d(g(t′)); since f(t′) < g(t′) for every such t′, the claim follows. □

Finally, we show that two smooth proper scoring rules R and R′ are equivalent if and only if they always agree on the relative accuracy of forecasters.

Lemma 8.
Smooth proper scoring rules R and R′ are equivalent if and only if R′(y_1, θ) > R′(y_2, θ) ⇔ R(y_1, θ) > R(y_2, θ) for all y_1, y_2, θ ∈ [0, 1].

Proof. We first prove the forward direction. Suppose that R and R′ are equivalent, i.e., R′(y, x) = αR(y, x) + β(x) for some α > 0 and β(x) ∈ ℝ. Then,

R′(y_1, θ) > R′(y_2, θ)
⇔ E_{X∼θ}[αR(y_1, X) + β(X)] > E_{X∼θ}[αR(y_2, X) + β(X)]
⇔ E_{X∼θ}[αR(y_1, X)] + E_{X∼θ}[β(X)] > E_{X∼θ}[αR(y_2, X)] + E_{X∼θ}[β(X)]
⇔ E_{X∼θ}[αR(y_1, X)] > E_{X∼θ}[αR(y_2, X)]
⇔ R(y_1, θ) > R(y_2, θ).

For the backward direction, suppose that R and R′ are not equivalent. Assume that R and R′ are in their respective canonical forms (if not, we can convert them to canonical form without changing the way they rank forecasters). Note that smoothness of R and R′ implies the existence of associated differentiable convex functions G and G′, as per Lemma 5. Since R and R′ are in canonical form, min_θ G(θ) = min_θ G′(θ) = 0, and G(0) = G(1) = G′(0) = G′(1) = 1. Further, since R and R′ are not equivalent, we know that G ≠ G′. We treat two cases.
Case 1: Suppose that arg min_θ G(θ) = arg min_θ G′(θ). Because G ≠ G′, there must exist a y_1 at which G(y_1) ≠ G′(y_1). Without loss of generality, suppose G(y_1) < G′(y_1). For mathematical convenience, suppose that y_1 < arg min_θ G(θ); the case in which y_1 > arg min_θ G(θ) follows similarly. By Lemma 7, taking f = G and g = G′, there must exist a point, which we again call y_1, for which 0 < G(y_1) < G′(y_1) and d(G(y_1)) < d(G′(y_1)) < 0. Set y_2 = arg min_θ G(θ), the point at which G(y_2) = G′(y_2) = 0. Since G and G′ are both differentiable, d(G(y_2)) = d(G′(y_2)) = 0. Finally, set θ so that R(y_1, θ) = 0. That is,

G(y_1) + d(G(y_1))·(θ − y_1) = 0.

Note that, since G(y_1) > 0 and d(G(y_1)) < 0, we have θ > y_1. Then,

R(y_1, θ) = G(y_1) + d(G(y_1))·(θ − y_1) = 0 = G(y_2) + d(G(y_2))·(θ − y_2) = R(y_2, θ).

But

R′(y_1, θ) = G′(y_1) + d(G′(y_1))·(θ − y_1) > G(y_1) + d(G(y_1))·(θ − y_1) = 0 = G′(y_2) + d(G′(y_2))·(θ − y_2) = R′(y_2, θ),

so that forecasters 1 and 2 obtain the same expected score according to R, but forecaster 1 obtains a higher expected score according to R′. In particular, R and R′ disagree on the relative accuracy.

Case 2: Suppose that, without loss of generality, θ_min := arg min_θ G(θ) < arg min_θ G′(θ) =: θ′_min. In particular, G(θ_min) = 0 < G′(θ_min), and G(θ′_min) > 0 = G′(θ′_min). By Lemma 7, there must exist a y_1 < θ_min for which G(y_1) < G′(y_1) and d(G(y_1)) < d(G′(y_1)) < 0. Similarly, there must exist a y_2 > θ′_min for which G(y_2) > G′(y_2) and 0 < d(G(y_2)) < d(G′(y_2)). Let θ be such that R gives the same expected score to both reports. That is,

R(y_1, θ) = G(y_1) + d(G(y_1))·(θ − y_1) = G(y_2) + d(G(y_2))·(θ − y_2) = R(y_2, θ).

Note that, by strict convexity of G, it must hold that θ ∈ (y_1, y_2). For R′ we have

R′(y_1, θ) = G′(y_1) + d(G′(y_1))·(θ − y_1) > G(y_1) + d(G(y_1))·(θ − y_1) = G(y_2) + d(G(y_2))·(θ − y_2) > G′(y_2) + d(G′(y_2))·(θ − y_2) = R′(y_2, θ),

where the first and third (in)equalities follow from the relations between G, G′ and their derivatives at y_1 and y_2 together with θ ∈ (y_1, y_2), and the second equality follows from the definition of θ. Again, forecasters 1 and 2 obtain the same expected score according to R, but forecaster 1 obtains a higher expected score according to R′. This completes the backward direction. □

We can now complete the proof of Theorem 4.
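The dichotomy of Lemma 8 can be observed concretely. In the following sketch (our illustration, not part of the proof; the reports 0.1 and 0.52 and the belief θ = 0.3 are hand-picked), a positive affine transformation of the quadratic score agrees with the quadratic score on every ranking, while the logarithmic score, which is proper but not equivalent to the quadratic score, reverses one:

```python
import math

# Expected scores E_{X~theta}[R(y, X)] under belief theta, binary outcome.
def brier(y, theta):
    """Quadratic score R(y, x) = 1 - (y - x)^2."""
    return theta * (1 - (y - 1) ** 2) + (1 - theta) * (1 - y ** 2)

def brier_affine(y, theta, alpha=0.5, beta=(0.1, -0.2)):
    """alpha * R(y, x) + beta(x): equivalent to brier (Definition 16)."""
    return (theta * (alpha * (1 - (y - 1) ** 2) + beta[1])
            + (1 - theta) * (alpha * (1 - y ** 2) + beta[0]))

def log_score(y, theta):
    """Logarithmic score: proper, but not equivalent to the quadratic score."""
    return theta * math.log(y) + (1 - theta) * math.log(1 - y)

# At theta = 0.3 the quadratic score ranks y = 0.1 above y = 0.52
# (since |0.1 - 0.3| < |0.52 - 0.3|), but the log score ranks them oppositely.
assert brier(0.1, 0.3) > brier(0.52, 0.3)
assert log_score(0.1, 0.3) < log_score(0.52, 0.3)
```

By contrast, brier and brier_affine agree on the ranking of every pair of reports under every belief, exactly as the forward direction of Lemma 8 guarantees.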
Proof of Theorem 4
By Lemma 4 and Lemma 5, any smooth, anonymous, strictly incentive-compatible forecasting competition mechanism M must take the form of M_ELF^{R′} for some smooth, bounded proper scoring rule R′ ∈ [0, 1] with associated differentiable convex function G′. We complete the proof by showing that every forecasting competition mechanism of this form either fails to be rank accurate with respect to R, or has

Pr_{X∼θ}(M(y_1, y_2, X) = 1) ≤ Pr_{X∼θ}(M_ELF^{R̃}(y_1, y_2, X) = 1)

for every y_1, y_2, θ ∈ [0, 1] for which R(y_1, θ) > R(y_2, θ).

If R′ is not equivalent to R, then M_ELF^{R′} is not rank accurate with respect to R by Corollary 1 and Lemma 8. If R′ is equivalent to R, then we have that R′(y, x) = αR(y, x) + β(x). We also know that R̃(y, x) = α̃R(y, x) + β̃(x), where α̃ ≥ α (if α̃ < α, then R′ is not bounded in [0, 1]). Fix y_1, y_2, θ ∈ [0, 1] such that R(y_1, θ) > R(y_2, θ). Then

Pr_{X∼θ}(M_ELF^{R̃}(y_1, y_2, X) = 1)
= 1/2 + 1/2 (R̃(y_1, θ) − R̃(y_2, θ))
= 1/2 + 1/2 (α̃R(y_1, θ) + E_{X∼θ}[β̃(X)] − α̃R(y_2, θ) − E_{X∼θ}[β̃(X)])
= 1/2 + 1/2 (α̃R(y_1, θ) − α̃R(y_2, θ))
≥ 1/2 + 1/2 (αR(y_1, θ) − αR(y_2, θ))
= 1/2 + 1/2 (αR(y_1, θ) + E_{X∼θ}[β(X)] − αR(y_2, θ) − E_{X∼θ}[β(X)])
= 1/2 + 1/2 (R′(y_1, θ) − R′(y_2, θ))
= Pr_{X∼θ}(M_ELF^{R′}(y_1, y_2, X) = 1),

where the inequality follows from α̃ ≥ α and R(y_1, θ) > R(y_2, θ). □

I. Proof of Proposition 3
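The impossibility at the heart of the proof below can also be checked mechanically. This sketch (ours, for illustration only) scans the unit square of conditional selection probabilities for a pair satisfying both requirements that limit accuracy imposes with π = 0.7, and finds none:

```python
import numpy as np

# q0 = Pr(M selects forecaster 1 | all m outcomes are 0)
# q1 = Pr(M selects forecaster 1 | all m outcomes are 1)
q0, q1 = np.meshgrid(np.linspace(0, 1, 1001), np.linspace(0, 1, 1001))

# Case theta_k = 0.4: Pr(all zeros) = 0.6, and forecaster 1 must be
# selected with probability at least 0.7.
case1 = 0.6 * q0 + 0.4 * q1 >= 0.7
# Case theta_k = 0.6: Pr(all zeros) = 0.4, and forecaster 2 must be
# selected with probability at least 0.7.
case2 = 0.4 * (1 - q0) + 0.6 * (1 - q1) >= 0.7

print(bool((case1 & case2).any()))  # False: no (q0, q1) satisfies both
```

Summing the two constraints shows why: together they force q0 − q1 ≥ 2, which is out of reach for probabilities in [0, 1].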
Proof.
Let n = 2 with y_1 = (0.4, 0.4, ..., 0.4) and y_2 = (0.6, 0.6, ..., 0.6), and let R be the strictly proper scoring rule that defines accuracy. Now suppose M is a limit accurate forecasting competition mechanism and consider the following two cases with two different "perfectly correlated" joint distributions θ for which all m outcomes are the same, i.e., either X_k = 0 for all k or X_k = 1 for all k:

1. θ_k = 0.4 for all k. Since y_{1,k} = θ_k and y_{2,k} ≠ θ_k for all k, strict properness of R implies that forecaster 1 is strictly more accurate. Hence, limit accuracy implies that there exists an m_1 such that for all m ≥ m_1, M selects forecaster 1 with probability at least π = 0.7.

2. θ_k = 0.6 for all k. Since y_{2,k} = θ_k and y_{1,k} ≠ θ_k for all k, strict properness of R implies that forecaster 2 is strictly more accurate. Hence, limit accuracy implies that there exists an m_2 such that for all m ≥ m_2, M selects forecaster 2 with probability at least π = 0.7.

Let m = max(m_1, m_2) be the number of events. Since both θ are "perfectly correlated," the outcome vector is either x = (0, ..., 0) or x = (1, ..., 1), and we can consider the probabilities with which M selects each forecaster given each of these. Let q_{1|0} and q_{1|1} be the probabilities that M selects forecaster 1 given x = (0, ..., 0) and x = (1, ..., 1), respectively. Case 1 then requires 0.6·q_{1|0} + 0.4·q_{1|1} ≥ 0.7, and case 2 requires 0.4·(1 − q_{1|0}) + 0.6·(1 − q_{1|1}) ≥ 0.7. But this is impossible: summing the two inequalities yields 0.2·q_{1|0} − 0.2·q_{1|1} ≥ 0.4, i.e., q_{1|0} − q_{1|1} ≥ 2, which no q_{1|0}, q_{1|1} ∈ [0, 1] satisfy, a contradiction to M being limit accurate. □

J. Proof of Theorem 5
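The monotonicity step in the proof below, namely that with independent event lotteries a forecaster's overall winning probability cannot decrease when her probability of winning a single event increases, can be verified by exact enumeration for n = 2. The sketch is ours, and the lottery probabilities are arbitrary illustrative values:

```python
from itertools import product

def overall_win_prob(p):
    """Pr(forecaster i wins the prize) with n = 2 forecasters and
    independent event lotteries: i wins event k with probability p[k],
    and the prize goes to whoever wins more events (ties broken uniformly)."""
    m, total = len(p), 0.0
    for w in product([0, 1], repeat=m):      # w[k] = 1 iff i wins event k
        prob = 1.0
        for k in range(m):
            prob *= p[k] if w[k] else 1.0 - p[k]
        wins = sum(w)
        if 2 * wins > m:
            total += prob                    # i wins outright
        elif 2 * wins == m:
            total += prob / 2.0              # tie, broken uniformly
    return total

# Raising the probability of winning any single event raises the
# overall winning probability.
assert overall_win_prob([0.55, 0.6, 0.4]) > overall_win_prob([0.5, 0.6, 0.4])
```

The increase is strict here because, with every p[k] strictly between 0 and 1, each event is pivotal with positive probability, which mirrors the final step of the proof.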
Proof.
WLOG, take the perspective of any forecaster i ∈ [n] seeking to maximize the probability of being selected. The proof proceeds by first showing that she can reason about each event independently because of belief independence. In a second step, we then show that increasing her probability of winning event k cannot decrease her probability of winning overall. Finally, we show that increasing her probability of winning event k in fact strictly increases her probability of winning overall.

In reasoning about forecaster i's probability of winning, she needs to reason about the joint probability of the event winners vector (w_1, ..., w_m), which is given by the vector of probability distributions (f_1, ..., f_m), where each f_k is the distribution over forecasters for event k. From forecaster i's perspective, each f_k is an instantiation of a random variable F_k, depending on her belief about Y_{−i} and X. Without any restrictions on Y_{−i} and X, these F_k can be dependent, even if, given instantiated (f_1, ..., f_m), the draws of the event winners themselves are independent by definition of the mechanism. For belief independent joint distributions D over outcomes X and reports Y_{−i}, however, all random vectors (Y_{1,k}, ..., Y_{i−1,k}, Y_{i+1,k}, ..., Y_{n,k}, X_k) indexed by k are independent, so that all F_k are independent as well. Consider now event k and let K′ ∈ P([m]) be any subset of event indices with k ∉ K′. By independence of the F_k, changing forecaster i's report on event k does not affect the (joint) distribution of F_{K′}.

It is easy to see that increasing forecaster i's expected (subjective) winning probability for event k, E[F_{i,k}], simultaneously decreases the expected winning probability E[F_{j,k}] of every j ≠ i. To see this, first observe that, if E[F_{i,k}] increases, the sum of all other forecasters' event winning probabilities needs to decrease by the same amount, since E[F_{i,k}] + Σ_{j≠i} E[F_{j,k}] = 1 for all k. Second, by definition of f_{i,k}, any increase of ε > 0 in E[F_{i,k}] leads to a uniform decrease of ε/(n − 1) in each E[F_{j,k}] with j ≠ i. This means that, since the F_k are independent, increasing E[F_{i,k}] on event k cannot decrease her probability of winning overall.

It remains to be shown that increasing E[F_{i,k}] strictly increases forecaster i's probability of winning overall. To show this, we need to show that there are situations where event k is pivotal for winning overall and that these situations occur with positive probability. First, there exist event win outcomes w_1, ..., w_{k−1}, w_{k+1}, ..., w_m on the other m − 1 events such that event k is pivotal, i.e., winning or losing event k changes the probability of winning the prize. This is the case if and only if, without event k, some forecaster j ≠ i won the most events with forecaster i winning one fewer; or forecaster i won the most events with at least one other forecaster j ≠ i having won exactly the same number, or one event less than forecaster i. For example, with m odd, m − 1 is even and forecasters i and j ≠ i can each win half of those events. Similarly, with m even, m − 1 is odd, and, for example, i wins ⌊(m − 1)/2⌋ of those events while j wins ⌈(m − 1)/2⌉. Second, these cases occur with positive probability because every E[F_{j,k}], for all j and all k, is strictly between 0 and 1 by definition of f_{i,k} and R ∈ [0, 1]. Hence, event k is pivotal for forecaster i with positive probability, and reporting truthfully on event k strictly increases the probability of winning the prize. □

K. Proof of Theorem 6
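The limit accuracy established below can also be seen in simulation. In this sketch (ours; the belief θ = 0.8, the reports, and the quadratic score are illustrative assumptions, and the event lottery f_{i,k} = 1/n + (R(y_{i,k}, x_k) − (1/n)Σ_j R(y_{j,k}, x_k))/(n − 1) is one form consistent with the identities used in the proof), the most accurate of three forecasters wins ELF with frequency approaching 1 as the number of events m grows:

```python
import numpy as np

def elf_winner(y, x, rng):
    """One draw of the ELF mechanism with the quadratic score
    R(y, x) = 1 - (y - x)^2: forecaster i wins event k with probability
    f_{i,k} = 1/n + (R_i - mean_j R_j) / (n - 1), and the prize goes to
    whoever wins the most events (ties broken uniformly at random)."""
    n, m = y.shape
    R = 1.0 - (y - x) ** 2                           # scores, shape (n, m)
    f = 1.0 / n + (R - R.mean(axis=0)) / (n - 1)     # event lotteries
    u = rng.random(m)[:, None]                       # one uniform draw per event
    winners = np.minimum((u > np.cumsum(f, axis=0).T).sum(axis=1), n - 1)
    wins = np.bincount(winners, minlength=n)
    return rng.choice(np.flatnonzero(wins == wins.max()))

def win_freq(m, trials=500, seed=0):
    """Empirical frequency with which the most accurate forecaster (index 0)
    is awarded the prize, over independent competitions with m events."""
    rng = np.random.default_rng(seed)
    theta = 0.8                                      # true Pr(X_k = 1), iid events
    y = np.tile(np.array([[0.8], [0.3], [0.1]]), (1, m))  # forecaster 0 is truthful
    count = 0
    for _ in range(trials):
        x = (rng.random(m) < theta).astype(float)
        count += elf_winner(y, x, rng) == 0
    return count / trials
```

With these reports the accuracy gap is ∆ = 0.25 per event; win_freq(10) is typically only modestly above chance, while win_freq(400) is close to 1, matching the qualitative behavior of the 1 − 2(n − 1)e^(−m∆²/(2(n−1)²)) lower bound derived below.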
The proof uses the one-sided version of Hoeffding’s inequality (Hoeffding 1963), which we state here forconvenience.
Theorem (Hoeffding’s inequality)
Let X_1, ..., X_m be independent random variables bounded by the interval [0, 1]. Define S_m = X_1 + ... + X_m. Then

Pr(S_m − E[S_m] ≥ t) ≤ e^(−2t²/m)  and  Pr(E[S_m] − S_m ≥ t) ≤ e^(−2t²/m).

Proof.
Let w_{i,k} := 𝟙(w_k = i) indicate whether forecaster i is the event winner for event k, and let W_{i,k} be the corresponding random variable. Note that the reports y_1, ..., y_n are fixed, so that the uncertainty is only about the event outcomes X. In particular, with X_1, ..., X_m independent, W_{i,1}, ..., W_{i,m} are independent conditional on y_1, ..., y_n.

Let z_i = Σ_{k=1}^m w_{i,k} be the number of events won by forecaster i. Furthermore, let Z_i be the corresponding random variable, so that

E[Z_i] = E_{X∼θ}[Σ_{k=1}^m f_{i,k}],

where the latter expectation is taken over the outcomes, and the former is taken over the outcomes and the randomness of the lotteries.

To show limit accuracy, let i be the most accurate forecaster, and let ∆ := min_{j≠i}(R(y_i, θ) − R(y_j, θ)) > 0 denote the accuracy gap between i and the second-most accurate forecaster. We first bound the difference between the expected number of events won by i and the expected number of events won by some other forecaster j ≠ i:

E[Z_i] − E[Z_j] = E_{X∼θ}[Σ_{k=1}^m (f_{i,k} − f_{j,k})]
= E_{X∼θ}[Σ_{k=1}^m (R(y_{i,k}, X_k) − R(y_{j,k}, X_k))] / (n − 1)
= m (R(y_i, θ) − R(y_j, θ)) / (n − 1)
≥ m∆/(n − 1).    (5)

The second equality follows from substituting the definition of f_{i,k} and simplifying, the third equality follows from rewriting in terms of expected average score, and the inequality follows from the definition of ∆.

We now upper bound the probability that forecaster j wins more events than forecaster i. From Equation 5, if z_j ≥ z_i, then it holds that E[Z_i] − z_i ≥ m∆/(2(n − 1)) or z_j − E[Z_j] ≥ m∆/(2(n − 1)) (both may apply simultaneously). By Hoeffding's inequality,

Pr(E[Z_i] − Z_i ≥ m∆/(2(n − 1))) ≤ e^(−m∆²/(2(n−1)²))

and

Pr(Z_j − E[Z_j] ≥ m∆/(2(n − 1))) ≤ e^(−m∆²/(2(n−1)²)).

Putting these together, we have

Pr(Z_j ≥ Z_i) ≤ Pr((E[Z_i] − Z_i ≥ m∆/(2(n − 1))) ∪ (Z_j − E[Z_j] ≥ m∆/(2(n − 1))))
≤ Pr(E[Z_i] − Z_i ≥ m∆/(2(n − 1))) + Pr(Z_j − E[Z_j] ≥ m∆/(2(n − 1)))
≤ 2e^(−m∆²/(2(n−1)²)).

Finally, we lower bound the probability that ELF selects forecaster i:

Pr_{X∼θ}(M_ELF^R(y_1, ..., y_n, X) = i) = 1 − Σ_{j≠i} Pr_{X∼θ}(M_ELF^R(y_1, ..., y_n, X) = j)
≥ 1 − Σ_{j≠i} Pr_{X∼θ}(Z_j ≥ Z_i)
≥ 1 − 2(n − 1)e^(−m∆²/(2(n−1)²)),

where the first transition holds because exactly one forecaster is selected and the second because z_j ≥ z_i is a necessary condition for forecaster j to be selected by ELF. The final transition holds by plugging in the earlier inequality. In particular, for fixed n and accuracy gap ∆, for any π ∈ [0, 1), forecaster i is selected with probability at least π if

m ≥ (2(n − 1)²/∆²) ln(2(n − 1)/(1 − π)),

which yields limit accuracy. □

References
Atanasov P, Rescober P, Stone E, Servan-Schreiber E, Tetlock PE, Ungar L, Mellers B (2017) Distilling the Wisdom of Crowds: Prediction Markets versus Prediction Polls. Management Science 63(3):691–706.
Brier GW (1950) Verification of Forecasts Expressed in Terms of Probability. Monthly Weather Review 78(1):1–3.
https://blog.udacity.com/2016/07/companies-kaggle-machine-learning-talent.html, [Online; accessed 24-December-2020].
Gneiting T, Raftery AE (2007) Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association 102(477):359–378.
Good IJ (1952) Rational Decisions. Journal of the Royal Statistical Society, Series B 14(1):107–114.
Operations Research.
https://gigaom.com/2013/08/30/facebook-is-hiring-a-data-scientist-but-youll-have-to-fight-for-the-job/, [Online; accessed 24-December-2020].
Hoeffding W (1963) Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association 58(301):13–30.
Theory and Decision.
Operations Research.
https://medium.com/kaggle-blog/march-machine-learning-mania-1st-place-winners-interview-andrew-landgraf-f18214efc659, [Online; accessed 24-December-2020].
Karni E (2009) A Mechanism for Eliciting Probabilities. Econometrica 77(2):603–606.
Kearns MJ, Vazirani UV (1994) An Introduction to Computational Learning Theory (MIT Press).
Kilgour DM, Gerchak Y (2004) Elicitation of Probabilities Using Competitive Scoring Rules. Decision Analysis 1(2):108–113.
Lambert NS, Pennock DM, Shoham Y (2008) Eliciting Properties of Probability Distributions. Proceedings of the 9th ACM Conference on Electronic Commerce (EC'08), 170–179 (ACM).
Lambert NS (2011) Probability Elicitation for Agents with Arbitrary Risk Preferences. Working paper.
Lichtendahl KC, Grushka-Cockayne Y, Pfeifer PE (2013) The Wisdom of Competitive Crowds. Operations Research 61(6):1383–1398.
Lichtendahl KC, Winkler RL (2007) Probability Elicitation, Scoring Rules, and Competition Among Forecasters. Management Science 53(11):1745–1755.
McCarthy J (1956) Measures of the Value of Information. Proceedings of the National Academy of Sciences 42(9):654–655.
Psychological Science.
Management Science.
International Journal of Forecasting.
Savage LJ (1971) Elicitation of Personal Probabilities and Expectations. Journal of the American Statistical Association 66(336):783–801.
Schervish MJ (1989) A General Method for Comparing Probability Assessors. The Annals of Statistics 17(4):1856–1879.
Servan-Schreiber E, Wolfers J, Pennock DM, Galebach B (2004) Prediction Markets: Does Money Matter? Electronic Markets 14(3):243–251.
Tetlock PE, Gardner D (2015) Superforecasting: The Art and Science of Prediction (New York, NY, USA: Crown Publishing Group).
Witkowski J, Freeman R, Wortman Vaughan J, Pennock DM, Krause A (2018) Incentive-Compatible Forecasting Competitions. Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI'18).