Non-probabilistic odds and forecasting with imperfect models
arXiv preprint [math.ST]

Kevin Judd

October 3, 2018
Abstract
Probability forecasts are intended to account for the uncertainties inherent in forecasting. It is suggested that from an end-user's point of view probability is not necessarily sufficient to reflect uncertainties that are not simply the result of complexity or randomness; for example, probability forecasts may not adequately account for uncertainties due to model error. It is suggested that an alternative forecast product is to issue non-probabilistic odds forecasts, which may be as useful to end-users, and give a less distorted account of the uncertainties of a forecast. Our analysis of odds forecasts derives from game theory, using the principle that if forecasters truly believe their forecasts, then they should take bets at the odds they offer and not expect to be bankrupted. Despite this game-theoretic approach, it is not a market or economic evaluation; it is intended to be a scientific evaluation. Illustrative examples are given of the calculation of odds forecasts and their application to investment, loss mitigation and ensemble weather forecasting.
1 Introduction

Our concern here is the quantification of uncertainty in forecasting. Suppose our task is to forecast the possibility of overnight temperatures at a particular location falling below freezing. This is a task of significant economic importance, for example, for the salting and gritting of roads, and for preparations to mitigate frost damage in horticulture. Forecasts of this type use atmospheric observations and computer models of the physical processes of the atmosphere. Uncertainty arises in these forecasts from the complexity of the weather, from the sparsity of observations, from the simplifications and inadequacies of the computer models, and so on. The suggestion we make here is that in some situations, such as using forecasts to mitigate losses, probability may not be the best means to quantify the uncertainty of the forecast. We suggest that non-probabilistic odds may provide a useful alternative to some forecast users.

A probability forecast quantifies uncertainty by assigning a probability to the occurrence of an event. If the process that determines the outcome of the event is intrinsically random, then there can be a correct or optimal assignment of the probability. The forecaster, however, may be uncertain about the correct probability value to assign, and this leads to consideration of the influence of other uncertainties, which we discuss shortly. A well established means of quantifying the uncertainty of probability forecasting systems is reliability or skill scores. Brier [1950], Murphy and Winkler [Murphy and Winkler, 1977, 1987, Murphy, 1993, Winkler, 1994], and many others since, have considered using scores to assess the skill of a probability forecast. Skill scores are closely related to issues of calibration of probability forecasts [Foster and Vohra, 1998, Palmer, 2000, Roulston and Smith, 2002, Smith, 1995]. An alternative assessment of skill is the economic value of the forecast [Granger and Pesaran, 2000].
Economic value is closely related to the idea of wagers. One can imagine a market of forecasters who take bets on the outcomes of events they forecast; the forecasters that profit the most, or at least avoid bankruptcy, are considered the better forecasters. In a competitive market only those forecasters who know the true probabilities of events will survive in the long term [Shafer and Vovk, 2001]. In the short term, however, merely lucky forecasters can survive, and even excel [Johnstone, 2007]. Consequently, for a forecaster to be a top performer they are forced to act like a bookmaker or marketeer rather than a scientist. Scientists aim to learn the true probabilities of events, whereas bookmakers and marketeers respond to the opportunities of a market [Levitt, 2004].

A probability forecast and a skill score together give a more complete picture of the uncertainty of a forecasted event. We will argue here, however, that probability forecasts can give end-users a misleading picture of uncertainty. We suggest that a method of avoiding this problem is to issue odds forecasts, which arise naturally as wagers. In the next subsections we state our distinction between odds and probability, and describe briefly how odds can quantify multiple aspects of uncertainty. Section 2 introduces a mathematical formalism for the process of issuing odds forecasts, including a brief review of the necessary concepts and techniques of game theory. Most importantly, we show how the computation of odds can be framed as a simple optimisation problem. In section 3 we compute odds in four basic forecasting situations and compare these computed odds with the corresponding probabilities. In section 4 we describe how odds can be employed in investment and loss mitigation. Finally, in section 5 we use operational ensemble weather forecasts and station data to present a concrete example of issuing odds forecasts.
Given a complete set of $m$ mutually exclusive events, that is, one and only one of the events will occur, we define odds to mean an assignment of real numbers $q_i \geq 0$, $i = 1, \ldots, m$, to the events. If $\sum_{i=1}^m q_i = 1$, then the odds are probabilistic odds, and the $q_i$ are probabilities. Casinos and bookmakers assign odds so that $\sum_{i=1}^m q_i > 1$, which has the consequence that, on average, they should profit from the bettors. The excess over one of the sum is sometimes termed the "juice", "take", or "vig". We argue that if a forecaster truly believes their forecasts, then they should take bets at the odds they offer and not expect to be bankrupted. This can always be achieved by a sufficiently large excess; however, the scientist's goal ought to be to avoid bankruptcy with the smallest excess.

Taking wagers while avoiding bankruptcy is a powerful principle. Extending early work of Ville, it has been used by Foster and Vovk [1999], Skouras and Dawid [1999], Shafer and Vovk [2001], Dawid [2004] and others. Shafer and Vovk [2001] provides a beautiful development that derives probability theory itself from betting principles. The ideas developed in this paper share a conceptual and structural formulation with Shafer and Vovk [2001], but there are significant differences. Briefly, the differences arise because in our opinion non-probabilistic odds are relevant to the immediate and short-term consequences of uncertainty, whereas probabilities are generally more relevant to long-term and asymptotic uncertainty.

Bid-ask spreads in financial markets can be interpreted as non-probabilistic odds [Levitt, 2004, Johnstone, 2007], but our development of odds differs because we want our forecaster to behave as a scientist and avoid market pressures influencing the odds.
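A small numeric illustration of this excess (a minimal sketch with hypothetical numbers; the convention, made precise in section 2, is that a unit bet at odds $q_i$ pays $1/q_i - 1$ net when the event occurs and $-1$ otherwise):

```python
def expected_return(bets, q, pi):
    """Average net pay-out to a bettor staking bets[i] on event i at odds
    q[i], when event i occurs with probability pi[i]; a unit bet pays
    1/q[i] - 1 net on a win and -1 on a loss."""
    return sum(b * (p / qi - 1.0) for b, qi, p in zip(bets, q, pi))

pi = [0.2, 0.8]            # hypothetical true probabilities of the two events
fair = [0.2, 0.8]          # probabilistic odds: they sum to exactly 1
juiced = [0.21, 0.84]      # bookmaker's odds: they sum to 1.05, a 5% "take"
```

With the probabilistic odds the bettor breaks even on average however they stake; with the juiced odds every stake loses the same 5% fraction on average, which is how the excess guarantees the bookmaker a profit.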
Uncertainty arises from many sources, not just randomness. To appreciate this, consider Jacob Bernoulli's foundational example of drawing balls, with replacement, from an urn containing a number of red and black balls. Suppose the fraction $\theta$ of red balls in the urn is known, and the process of extracting a ball is sufficiently complex to appear uniformly random, that is, on any selection each ball in the urn is equally likely to be drawn. In this situation the uncertainty is entirely due to the "random" process of selection, and the uncertainty is adequately described by the probability of drawing a red ball, which is $\theta$. We will refer to this situation as having first order uncertainty.

Knowing $\theta$ allows one to make a probability forecast, that is, the probability of the event "drawing a red ball" is $\theta$. This is a simple forecast model. Indeed, under our assumptions, this is a perfect model, because the model exactly represents the process, and there is no better model. In a perfect model there is only first order uncertainty.

If the number of red and black balls in the urn is unknown, then a value $\psi$ for the fraction of red balls could be inferred from the observed fraction of red balls seen in a number of draws. Using $\psi$ to forecast the probability of a red ball is an imperfect model; there is both first order uncertainty from the randomness of the process being forecast, and second order uncertainty due to the value of $\psi$ used.

It is possible to make a further distinction between a perfect model class and an imperfect model class. For the urn, a value of $0 \leq \psi \leq 1$ defines a model in the class, and $\psi = \theta$ provides a perfect model. If the random selection assumption does not hold, because perhaps the balls are not thoroughly stirred before each selection, then no value of $\psi$ provides a perfect model; a perfect model would have to take into account the conditional randomness, or even non-random effects, of mixing the urn after each replacement. Uncertainty about the correct model class is at least third order.
One goal of forecasters is to issue accountable, or reliable, or well calibrated, probability forecasts [Foster and Vohra, 1998, Smith, 1995, Palmer, 2000]; that is, if it is forecast that an event $E$ will occur with probability $p$, then the observed fraction of events $E$ out of all events asymptotically approaches $p$. It is vital to recognise that an imperfect model will almost certainly not provide an accountable probability forecast. At best forecasts are only accountable asymptotically in a perfect model class, but even this is a delicate problem [Oakes, 1985, Foster and Vohra, 1998].

Suppose that for an urn game a forecaster provides odds for outcomes of draws, and takes bets at these odds. If the forecaster knew $\theta$, and the uniform random selection assumption held, then probabilistic odds could be issued where a one unit bet on a red ball pays $(1-\theta)/\theta$ and a one unit bet on a black ball pays $\theta/(1-\theta)$. With these probabilistic odds the forecaster does not expect their wealth to increase or decrease on average, no matter how skillful the bettors. If the forecaster had only an estimate $\psi$, then providing probabilistic odds, where a bet on a red ball pays $(1-\psi)/\psi$ and a bet on a black ball pays $\psi/(1-\psi)$, would be unwise. Any bettor who knew $\theta$ would almost surely bankrupt the forecaster by always betting a fraction $\theta$ of their current wealth on the red ball, and betting the fraction $1-\theta$ of their current wealth on the black ball [Kelly, 1956]. A similar result is true when the bettor just has a better estimate of $\theta$. The fact that $\psi = \theta$ is the only value of $\psi$ that does not lead asymptotically to certain bankruptcy can be used to define the concept of probability [Shafer and Vovk, 2001].

The essential problem with using imperfect models to provide probability forecasts is that the higher order uncertainties, like model error, distort the odds. In particular, we will show that model error often results in underestimating the chance of events that have low probability, a so-called base-rate effect.
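The bankruptcy mechanism can be checked by simulation; a minimal sketch (our function name, assuming the convention that a stake $b$ at odds $q$ returns $b/q$ gross when the event occurs):

```python
import math, random

def kelly_growth(theta, psi, n=20000, seed=1):
    """Simulate a bettor who knows the true red fraction theta, betting
    against probabilistic odds (psi, 1-psi) issued from an estimate psi.
    Each round the bettor stakes fraction theta of wealth on red and
    1-theta on black, as in Kelly [1956]; returns the realised growth
    rate of log-wealth per bet."""
    rng = random.Random(seed)
    log_wealth = 0.0
    for _ in range(n):
        if rng.random() < theta:   # red drawn: red stake returns theta/psi of wealth
            log_wealth += math.log(theta / psi)
        else:                      # black drawn
            log_wealth += math.log((1 - theta) / (1 - psi))
    return log_wealth / n

rate = kelly_growth(theta=0.3, psi=0.4)
# expected growth rate equals the KL divergence between theta and psi,
# strictly positive whenever psi != theta, so the bettor's wealth grows
# exponentially and the forecaster is eventually bankrupted
kl = 0.3 * math.log(0.3 / 0.4) + 0.7 * math.log(0.7 / 0.6)
```

When $\psi = \theta$ the growth rate is exactly zero, recovering the statement that only the true probability avoids asymptotic bankruptcy.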
We argue that by using non-probabilistic odds forecasts, rather than probability forecasts, a forecaster can provide a less distorted forecast that takes into account model error. (Here we only show how to do this for particular second order uncertainties.)

We argue that if a forecaster aims to provide odds that are as close to probabilistic as possible, subject to known information and the model class used, then the amount by which the odds exceed one is a measure of how certain the forecaster is of their model. The excess is a measure of second order uncertainty. (Here we only demonstrate how to do this for a perfect model class with particular utility functions.)

2 Computation of odds forecasts
The theoretical framework in which we analyse odds forecasting is game theory [von Neumann and Morgenstern, 1944, Shafer and Vovk, 2001]. There are three "players" in our game: the Forecaster, the Client, and Nature. The Client determines the structure of the game. Imagine that the Client approaches the Forecaster and requests odds for a complete set of $m$ mutually exclusive events $E_i$, $i = 1, \ldots, m$. Which event actually occurs is determined by Nature, who is ignorant of and indifferent to the Client-Forecaster negotiations. For our purposes we may assume that Nature's selection of an event is random, or of such complexity that the selection is assumed random by the Client and Forecaster. (By making this assumption about Nature we avoid issues of third order uncertainties.)

When the Client approaches the Forecaster, the Client does not specify interest in any particular event within the set of mutually exclusive events. The Forecaster is required to supply odds for all events, based on past observations of Nature and the Forecaster's model. The Forecaster is not a bookmaker; they are a scientist. The objective of the scientist Forecaster is to provide odds as close to probabilistic as possible given the available information and their model class, because by doing so they are aiming to obtain the best forecast model.

It may be tempting to imagine many clients and forecasters competing in a market to determine the best forecast model, but this has the propensity for forecasters to adopt the strategies of marketeers and bookmakers. To avoid forecasters acting this way we will isolate the Forecaster from market information: there will be one Client, and the Forecaster is not allowed to know how the Client bets, only the choices of Nature.
If the Forecaster knew the pattern of the Client's bets, then this constitutes an additional information stream [Kelly, 1956], and consequently can be used to improve the Forecaster's performance against the Client, or against Nature if the Client is a more skillful forecaster than the Forecaster. Using this additional information stream allows marketeering and bookmaking, rather than scientific forecasting. Denying the Forecaster knowledge of the Client's bets forces the Forecaster to rely on available observations and modelling skill alone. Shortly we will see, however, that the Forecaster will need to know a little about the Client's betting.

We have a three person game. Nature plays indifferently with no aim. The Client aims to accumulate winnings. The Forecaster aims to set odds as close to probabilistic as possible, without the Client bankrupting the Forecaster. If there were just two events, $E$ and its complement $E'$, then the game can be represented by a game matrix $G$:
$$G = \begin{array}{c|cc} \text{client} \backslash \text{nature} & E & E' \\ \hline E & P & -1 \\ E' & -1 & P' \end{array} \qquad (1)$$
The game matrix represents the pay-out to the Client for a one unit bet on an event given Nature's outcome. The odds set by the Forecaster in this case are $P$ to 1 for event $E$, and $P'$ to 1 for event $E'$.

There are several variants of this game according to the rules that govern the Client's bets; the variants can influence how the Forecaster sets the odds. One rule that can be introduced is that the Client's bets are a fixed size and a negligible fraction of the Client's (and Forecaster's) total wealth, effectively, infinitesimal bets. Alternatively, the Client can bet a substantial fraction of their wealth. Other rules that might be introduced govern whether the Client is forced to make a bet, regardless of the odds, or whether the Client can split bets over several events, or whether there is a minimum size of a bet on any event. Rules governing the sizes of bets can influence the Client's utility function of wealth, and consequently influence the pattern of bets.
With variable sized bets the Client can choose to maximise the growth of wealth, a logarithmic utility function, which requires distributing bets over several events. Forced bets of a fixed (infinitesimal) size essentially imply a linear utility function of wealth, and result in betting only on the event that the Client believes gives the maximum pay-out.

Since the Forecaster does not know the Client's individual bets, the Forecaster must at least know the Client's utility function of wealth, or the rules governing how the Client can bet, which effectively force a utility function onto the Client.

The ultimate challenge for the scientist Forecaster is to compete against a Client who knows the true probabilities of Nature, and yet do so without being bankrupted. Much of the analysis in this section is textbook game theory [von Neumann and Morgenstern, 1944]. We first consider the zero-sum game between the Client and Nature and determine the Client's optimal strategy for fixed odds. We then derive an optimal odds assignment of the Forecaster for an arbitrary probability model under the assumptions of a perfect model class.
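The two client types can be contrasted directly; a minimal sketch (our function names; the pay-out expressions follow from the game matrix $G_{ij} = \delta_{ij}/q_i - 1$ derived below):

```python
import math

def linear_client_return(pi, q):
    """Expected one-shot net return of a linear-utility client who puts
    the whole (infinitesimal) stake on the event maximising pi_i/q_i."""
    return max(p / qi for p, qi in zip(pi, q)) - 1.0

def log_client_growth(pi, q):
    """Expected log-growth per play of a logarithmic-utility (Kelly)
    client forced to stake all wealth, betting proportions p_i = pi_i."""
    return sum(p * math.log(p / qi) for p, qi in zip(pi, q) if p > 0)

pi = [0.2, 0.8]       # Nature's true probabilities (hypothetical)
q_true = [0.2, 0.8]   # probabilistic odds equal to pi
q_off = [0.3, 0.7]    # probabilistic odds that misstate pi
```

Against odds equal to the true probabilities both clients break even; against the misstated odds the linear-utility client profits on its single best event, while the Kelly client grows wealth at the relative-entropy rate, so both eventually exploit the forecaster.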
For a complete set of $m$ mutually exclusive events $E_i$, $i = 1, \ldots, m$, the $m \times m$ game matrix $G$ of the zero-sum game between the Client and Nature (represented in the $2 \times 2$ case by equation (1)) has entries $G_{ij} = (\delta_{ij}/q_i) - 1$, where $\delta_{ij} = 1$ if $i = j$ and zero otherwise, and $q_i$ is the odds for event $E_i$; equivalently, the odds on event $E_i$ are "$P_i$ to 1", where $q_i = 1/(P_i + 1)$. Assume that Nature plays a strategy where event $E_i$ is chosen randomly with probability $\pi_i$. Supposing that the Client plays a strategy where event $E_i$ is chosen with probability $p_i$, then the average pay-out to the Client is
$$\left( \sum_{i=1}^m \frac{\pi_i p_i}{q_i} \right) - 1. \qquad (2)$$
Here $\sum_{i=1}^m \pi_i = 1$ and $\sum_{i=1}^m p_i = 1$, but $\sum_{i=1}^m q_i \geq 1$, with equality if and only if the odds are probabilistic.

The minimax strategy for the Client is to select event $E_i$ with probability $p_i^\star$, and has average pay-out $V^\star$, where
$$p_i^\star = \frac{q_i}{\sum_i q_i}, \qquad V^\star = \frac{1}{\sum_i q_i} - 1.$$
By using the minimax strategy the Client is guaranteed an average pay-out of at least $V^\star$ regardless of the $\pi_i$. When the odds are probabilistic, the minimax strategy for the Client has $p_i = q_i$ and $V^\star = 0$. If instead the Client knows Nature's probabilities, the optimal Client strategy is to choose $p_i = 1$, where $i$ is the index such that $\pi_i/q_i$ is maximal, and $p_i = 0$ otherwise. The average pay-out to the Client is then $\max_i \{\pi_i/q_i\} - 1$.

The forecaster's aim is to make the odds as probabilistic as possible without the client's winnings accumulating without bound. The forecaster has to allow for the possibility that the client is a better forecaster; in the worst case, the Client knows Nature's probabilities $\pi = (\pi_1, \ldots, \pi_m)$. There is no single optimal strategy to achieve the goals we have set for the forecaster, so some additional guidance is necessary.

Since the game matrix $G$ is symmetric, it can be seen from equation (2) that the ideal strategy for the forecaster is to set the odds so that $q_i / \sum_{j=1}^m q_j = \pi_i$, in which case the closest to probabilistic odds are the probabilistic odds $q_i = \pi_i$. This is, of course, a meaningless solution, because the $\pi_i$ are unknown. Simply taking an estimate $\hat{\pi}$ of Nature's probabilities $\pi$, and using these as odds, $q_i = \hat{\pi}_i$, is unwise, because at least one $\pi_i/\hat{\pi}_i > 1$, which will be exploited by the more informed Client, who can obtain a better estimate of $\pi_i$.

If the forecaster accepts that their forecast model is imperfect, then they will acknowledge there are many likely values of $\pi$. However, if the forecaster assumes their model class is perfect, then given data $D$ they can assert that under their model $\Pr(\pi \mid D) \propto \Pr(D \mid \pi) \Pr(\pi)$, where $\Pr(\pi)$ represents prior knowledge about possible values of $\pi$. From this the forecaster can compute the average loss (pay-out to the client) $V(\pi, q)$ for given $\pi$ and odds $q = (q_1, \ldots, q_m)$,
$$V(\pi, q) = \max_i \left\{ \frac{\pi_i}{q_i} \right\} - 1. \qquad (3)$$
The expected loss $E(V \mid D, q)$, given the data $D$ and fixed odds $q$, is
$$E(V \mid D, q) = \frac{\int_S V(\pi, q) \Pr(D \mid \pi) \Pr(\pi) \, d\pi}{\int_S \Pr(D \mid \pi) \Pr(\pi) \, d\pi}, \qquad (4)$$
where $S$ is the simplex $\sum_i \pi_i = 1$. A possible optimal strategy for the forecaster is
$$\min \sum_i q_i \quad \text{subject to} \quad \int_S V(\pi, q) \Pr(D \mid \pi) \Pr(\pi) \, d\pi = 0. \qquad (5)$$
The minimisation attempts to ensure the odds $q$ are as probabilistic as possible. The constraint $E(V \mid D, q) = 0$ attempts to ensure that, whatever the true $\pi$, the odds are such that the client's average pay-out is zero given the assumed distribution of $\pi$. The construction of equation (5) implies that the client's wealth is a martingale [Doob, 1953, Williams, 1991]; that is, if $W_n$ is the wealth of the client after $n$ plays of the game, then $E(W_n \mid D, q) = W_{n-1} + E(V \mid D, q) = W_{n-1}$.

Observe that the constraint $E(V \mid D, q) = 0$ is not, in principle, hard to compute. Define
$$S = \{ \pi : \textstyle\sum_i \pi_i = 1 \}, \qquad (6)$$
$$S_i(q) = \{ \pi \in S : i = \arg\max_j \{ \pi_j / q_j \} \}, \qquad (7)$$
$$A_i(q) = \int_{S_i(q)} \pi_i \Pr(D \mid \pi) \Pr(\pi) \, d\pi, \qquad (8)$$
$$C = \int_S \Pr(D \mid \pi) \Pr(\pi) \, d\pi. \qquad (9)$$
The strategy given by problem (5) is equivalent to
$$\min \sum_i q_i \quad \text{subject to} \quad \sum_i \frac{A_i(q)}{q_i} = C. \qquad (10)$$
Observe that even if $\Pr(D \mid \pi) \Pr(\pi)$ has a complex form, if it can be computed by Monte Carlo methods, then all the $A_i(q)$ and $C$ can be computed simultaneously.
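The Monte Carlo evaluation of the constraint can be sketched as follows (a minimal illustration, assuming a multinomial model with a uniform prior on the simplex; the function name and sampling scheme are ours):

```python
import random

def constraint_gap(q, counts, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E(V | D, q) = (sum_i A_i(q)/q_i)/C - 1,
    combining equations (6)-(10): pi is sampled uniformly on the simplex
    (normalised exponentials) and weighted by the multinomial likelihood
    w = Pr(D | pi); the max picks out which region S_i(q) the sample is in."""
    rng = random.Random(seed)
    m = len(counts)
    num = 0.0   # accumulates w * max_i pi_i/q_i, i.e. sum_i A_i(q)/q_i
    den = 0.0   # accumulates w, estimating C
    for _ in range(n_samples):
        e = [rng.expovariate(1.0) for _ in range(m)]
        t = sum(e)
        pi = [x / t for x in e]
        w = 1.0
        for p, c in zip(pi, counts):
            w *= p ** c
        num += w * max(p / qi for p, qi in zip(pi, q))
        den += w
    return num / den - 1.0   # zero exactly when the constraint in (10) holds

counts = [3, 7]                                            # hypothetical data D
post_mean = [(c + 1) / (sum(counts) + 2) for c in counts]  # posterior mean of pi
gap_prob = constraint_gap(post_mean, counts)               # probabilistic odds
gap_inflated = constraint_gap([1.3 * p for p in post_mean], counts)
```

Issuing the posterior mean as probabilistic odds leaves a positive gap (the client profits on average); inflating the odds shrinks the gap, and problem (10) seeks the smallest total inflation that drives it to zero.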
The main difficulty is that the constraints have to be recomputed for each $q$.

We now consider the situation where the game rules require the client to bet their entire wealth. This is a natural mathematical extension of infinitesimal forced bets. With probabilistic odds this game leads to the Kelly betting strategy of distributing the client's entire wealth as bets proportional to the probabilities of events. This strategy maximises the growth rate of wealth, or equivalently, a logarithmic utility function of wealth. However, as Kelly [1956] explains, when the odds are not probabilistic, then the client should not bet their entire wealth, only a fraction of it. We will insist that the client is forced to bet their entire wealth. The unfairness of this restriction is balanced by the information advantage of the client. If the forecaster tries to make the odds as probabilistic as possible, then the odds can still interest the client.

Once again, suppose there is a complete set of mutually exclusive events $E_i$, $i = 1, \ldots, m$, and an $m \times m$ game matrix $G$ with $G_{ij} = (\delta_{ij}/q_i) - 1$. The client plays Nature and is required to bet their entire wealth at each play. Suppose that after $n$ plays of this game the client has total wealth $W_n$, and at each play the client distributes their current wealth as bets in the proportion $p_i$ on event $E_i$. Assume Nature acts as though the events are chosen randomly, with $E_i$ chosen with probability $\pi_i$. Then the expected wealth of the client after $n$ plays, given an initial wealth $W_0$, is
$$E(W_n) = \left( \prod_{i=1}^m \left( \frac{p_i}{q_i} \right)^{\pi_i} \right)^n W_0.$$
Consequently, the average rate of growth of wealth is
$$\lim_{n \to \infty} \frac{1}{n} \log(W_n) = \sum_{i=1}^m \pi_i \log(p_i/q_i).$$
It is easily shown that the maximum growth rate occurs when $p_i = \pi_i$. This is the optimal Kelly betting strategy for a client that knows Nature's $\pi_i$. Hence, define $G(\pi, q) = \sum_{i=1}^m \pi_i \log(\pi_i/q_i)$.

An appropriate forecaster strategy is to set the closest to probabilistic odds so that the expected rate of growth of wealth of the client is zero, that is,
$$\min \sum_{i=1}^m q_i \quad \text{subject to} \quad \int_S G(\pi, q) \Pr(D \mid \pi) \Pr(\pi) \, d\pi = 0. \qquad (11)$$
Equation (11) implies that the logarithm of the client's wealth is a martingale, that is, $E(\log W_n \mid D, q) = \log W_{n-1} + E(G \mid D, q) = \log W_{n-1}$.

Remarkably, the optimisation of equation (11) has an explicit form for the solution. Define, using $C$ as in equation (9),
$$\bar{H} = \frac{1}{C} \int_S \sum_{i=1}^m \pi_i \log(\pi_i) \Pr(D \mid \pi) \Pr(\pi) \, d\pi, \qquad (12)$$
$$\bar{\pi}_i = \frac{1}{C} \int_S \pi_i \Pr(D \mid \pi) \Pr(\pi) \, d\pi, \qquad (13)$$
$$\alpha = \bar{H} - \sum_{i=1}^m \bar{\pi}_i \log(\bar{\pi}_i). \qquad (14)$$
The constraint $E(G \mid D, q) = 0$ in (11) is equivalent to $\sum_{i=1}^m \bar{\pi}_i \log(q_i) = \bar{H}$. By straightforward application of Lagrange multipliers it can be shown that $q_i = \bar{\pi}_i e^{\alpha}$ solves the required optimisation (11). The explicit form of the solution, in terms of constants that are obtained from integrals, means that in some instances closed form solutions are possible.
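Equations (12)-(14) are straightforward to evaluate by Monte Carlo; a minimal sketch (our function name, assuming a multinomial model with uniform prior, so that the posterior is Dirichlet with parameters counts + 1, sampled here via gamma variates):

```python
import math, random

def log_utility_odds(counts, n_samples=100_000, seed=0):
    """Monte Carlo evaluation of equations (12)-(14) for a multinomial
    model with uniform prior: sample pi ~ Dirichlet(counts + 1), average
    the entropy term H_bar and the posterior means pi_bar, then return
    the odds q_i = pi_bar_i * exp(alpha)."""
    rng = random.Random(seed)
    m = len(counts)
    H_bar = 0.0
    pi_bar = [0.0] * m
    for _ in range(n_samples):
        g = [rng.gammavariate(c + 1, 1.0) for c in counts]
        t = sum(g)
        pi = [x / t for x in g]
        H_bar += sum(p * math.log(p) for p in pi if p > 0)
        for i, p in enumerate(pi):
            pi_bar[i] += p
    H_bar /= n_samples
    pi_bar = [p / n_samples for p in pi_bar]
    alpha = H_bar - sum(p * math.log(p) for p in pi_bar)
    return [p * math.exp(alpha) for p in pi_bar]

q = log_utility_odds([3, 7])
total = sum(q)   # the excess of total over 1 measures second order uncertainty
```

Since $x \log x$ is convex, $\bar{H} \geq \sum_i \bar{\pi}_i \log \bar{\pi}_i$ by Jensen's inequality, so $\alpha \geq 0$ and the total odds $e^{\alpha}$ never fall below one.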
A valid interpretation of this optimisation result is that the inflation exponent $\alpha$ is the discrepancy between the expected entropy (or information) $-\bar{H}$ and the entropy implied by the expected probabilities $\bar{\pi}$.

3 Examples

To illustrate the computation of odds we consider two situations forecasting binary events. The first situation uses a frequency model, which requires only knowledge of the frequency of past events. The aim of the forecaster is to provide odds on a future event. (This is the urn game.) The second situation uses a Gaussian model, where the available data is a finite collection of scalar measurements assumed to be drawn from a Gaussian distribution. The aim of the forecaster is to provide odds on a future measurement being below some threshold. We will actually consider four situations, because we will compute the odds for both linear and logarithmic utility.

3.1 Frequency model with linear utility
Consider the situation where there are two events $E$ and its complement $E'$, where Nature selects $E$ with probability $\pi$ and $E'$ with probability $\pi' = 1 - \pi$. A frequency model assumes that all realizations of the events are independent, so only the frequency of the event provides any information about $\pi$. The forecaster is required to assign odds $q$ and $q'$ to events $E$ and $E'$.

Suppose the event $E$ has been observed to occur $x$ times in $n$ realizations. Under the frequency model $\Pr(\pi \mid x, n) \propto \Pr(x \mid \pi, n) \Pr(\pi) = C^n_x \pi^x (1-\pi)^{n-x} \Pr(\pi)$. Suppose the forecaster has no prior information on $\pi$, and so assumes a uniform prior $\Pr(\pi) = 1$, that is, assumes all values of $\pi$ are equally likely.

Following equation (10), the constraint on $q$ and $q'$ to obtain $E(V \mid x, n, q, q') = 0$ can be expressed in terms of beta and incomplete beta functions,
$$\frac{1}{q'} \beta_{\frac{q}{q+q'}}(x+1, n-x+2) + \frac{1}{q} \beta_{\frac{q'}{q+q'}}(n-x+1, x+2) = \beta(x+1, n-x+1), \qquad (15)$$
where $\beta_x(a, b) = \int_0^x t^{a-1} (1-t)^{b-1} \, dt$ and $\beta(a, b) = \beta_1(a, b)$. Defining $s = q + q'$ and $q = ps$, so $q' = (1-p)s$, then solving equation (15) for $s$ conveniently transforms problem (10) into a one-dimensional problem,
$$\min_{0 \leq p \leq 1} s = \frac{\frac{1}{1-p}\beta_p(x+1, n-x+2) + \frac{1}{p}\beta_{1-p}(n-x+1, x+2)}{\beta(x+1, n-x+1)}. \qquad (16)$$
This problem is easily solved numerically using Brent's method or similar [Press et al., 1988].

Figure 1 shows computed odds for various observed frequencies for a small number of observations in a table, and for progressively larger numbers of observations in the graph. Figure 2 shows the total odds $s = q + q'$. A number of interesting, but not unexpected, facts can be seen. For a small number of observations the odds are far from probabilistic, but they become more probabilistic as the number of observations increases. Furthermore, the odds deviate most from a probability for the event with a low frequency count, which is consistent with the base-rate effect.
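Problem (16) can be solved with a few lines of code; a minimal sketch (our function names, with a grid search standing in for Brent's method, and the incomplete beta integral evaluated exactly for integer arguments by binomial expansion):

```python
import math

def beta_inc(x, a, b):
    """Lower incomplete beta integral beta_x(a, b) for integer a, b >= 1,
    computed exactly by expanding (1-t)^(b-1) binomially."""
    return sum(math.comb(b - 1, k) * (-1) ** k * x ** (a + k) / (a + k)
               for k in range(b))

def linear_utility_odds(x, n, grid=2000):
    """Minimise s(p) of problem (16) over a grid of p, then return the
    odds q = p*s on E and q' = (1-p)*s on E'."""
    B = beta_inc(1.0, x + 1, n - x + 1)          # complete beta(x+1, n-x+1)
    best = None
    for i in range(1, grid):
        p = i / grid
        s = (beta_inc(p, x + 1, n - x + 2) / (1 - p)
             + beta_inc(1 - p, n - x + 1, x + 2) / p) / B
        if best is None or s < best[1]:
            best = (p, s)
    p, s = best
    return p * s, (1 - p) * s

q, qp = linear_utility_odds(x=2, n=10)   # odds on E and E' after 2 of 10 events
```

The total $q + q'$ always exceeds one (each term of the constraint dominates the corresponding probability mass), and the computed odds are symmetric under exchanging $x$ with $n - x$.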
Observe that meaningful odds are given even when $x = 0$ or $n$, and that $q \to x/n$ and $q' \to (n-x)/n$ as $n \to \infty$. Furthermore, we find that for a fixed ratio $x/n$, the excess $q + q' - 1 \sim c/\sqrt{n}$ for some constant $c$ as $n \to \infty$.

3.2 Frequency model with logarithmic utility

This situation allows an essentially closed form solution for the odds. Using equations (12)-(14) with $\Pr(x \mid \pi, n) = C^n_x \pi^x (1-\pi)^{n-x}$ and $\Pr(\pi) = 1$ obtains
$$q = \left( \frac{x+1}{n-x+1} \right)^{\frac{n-x+1}{n+2}} e^{\psi}, \qquad q' = \left( \frac{n-x+1}{x+1} \right)^{\frac{x+1}{n+2}} e^{\psi},$$
$$\psi = \left( \frac{x+1}{n+2} \right) H_{x+1} + \left( \frac{n-x+1}{n+2} \right) H_{n-x+1} - H_{n+2},$$
where we use the harmonic numbers $H_k = \sum_{i=1}^k \frac{1}{i}$.

Figure 3 provides a table of computed values of the odds for small values of $n$ and graphs the odds for larger values. These computed odds should be compared with fig. 1. Figure 4 shows how the total odds $q + q'$ varies with the number of observations, which should be compared with fig. 2. It is observed that the total odds have a smaller excess over unity, and vary less with the observed fraction. This is not surprising, because a client who aims to maximise their rate of growth of wealth is less likely to exploit forecast errors of low probability events. Observe also that the odds converge much faster in this situation, at a rate of $1/n$, as opposed to $1/\sqrt{n}$ in the infinitesimal bets case. (The author also has closed form expressions for the odds in this case for an arbitrary number of events, which will be discussed elsewhere.)

3.3 Gaussian model with linear utility

Now consider a situation where the forecaster is given scalar observations $D = \{x_1, \ldots, x_n\}$, with statistics
$$\hat{\mu} = \frac{1}{n} \sum_i x_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_i (x_i - \hat{\mu})^2 = \frac{1}{n} \sum_i x_i^2 - \hat{\mu}^2.$$
Furthermore, the forecaster has reason to model these as observations of a random variable $X$ with a Gaussian distribution $N(\mu, \sigma^2)$, for some unknown $\mu$ and $\sigma$; although the forecaster may have additional prior belief $\Pr(\mu, \sigma)$ in the values of $\mu$ and $\sigma$.
The client requires an odds forecast for the events $E = \{X \leq x\}$ and $E' = \{X > x\}$ for some fixed $x$.

Since we are assuming a perfect model class (to avoid third order uncertainty issues), the true probability $\pi$ of the event $E$ is
$$\pi = \Pr(X \leq x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}\sigma} e^{-(t-\mu)^2/2\sigma^2} \, dt = \Phi\left( \frac{x - \mu}{\sigma} \right).$$
Whereas the optimisation problem (5) that obtains the odds is formulated in terms of integrals over the probabilities of the events, in this situation we see these probabilities are determined entirely by $x$, $\mu$ and $\sigma$. Consequently, it is more appropriate to reformulate the optimisation (5) in terms of integrals over $\mu$ and $\sigma$. This requires reformulating equations (3) and (4).

When the forecaster assigns odds $q$ and $q'$ to $E$ and $E'$ respectively, then in the worst case where the client knows $\mu$ and $\sigma$ (and hence $\pi$) the optimal strategy of the client is to bet on
$$E \text{ if } \frac{q}{q+q'} < \Phi\left( \frac{x - \mu}{\sigma} \right), \qquad E' \text{ if } \frac{q'}{q+q'} < 1 - \Phi\left( \frac{x - \mu}{\sigma} \right),$$
and the average pay-out to the client is then
$$V(q, q' \mid x, \mu, \sigma) = \max\left\{ \frac{1}{q} \Phi\left( \frac{x - \mu}{\sigma} \right), \; \frac{1}{q'} \left( 1 - \Phi\left( \frac{x - \mu}{\sigma} \right) \right) \right\} - 1.$$
Given their model, the forecaster will assert that $\Pr(\mu, \sigma \mid D) \propto \Pr(D \mid \mu, \sigma) \Pr(\mu, \sigma)$, where
$$\Pr(D \mid \mu, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\sigma} e^{-(x_i - \mu)^2/2\sigma^2} = \left( \frac{1}{\sqrt{2\pi}\sigma} \right)^n e^{-n(\hat{\sigma}^2 + (\hat{\mu} - \mu)^2)/2\sigma^2}.$$
Just as in section 3.1, we once again have a situation involving binary events where there is advantage in defining $s = q + q'$, $q = ps$, $q' = (1-p)s$, so that problem (5) can be transformed to
$$\min_{0 \leq p \leq 1} s = \frac{1}{C} \left( \frac{A(p, x)}{p} + \frac{B(p, x)}{1-p} \right),$$
where
$$A(p, x) = \int_0^\infty \int_{-\infty}^{M(p,x,\sigma)} \Phi\left( \frac{x - \mu}{\sigma} \right) F(\mu, \sigma) \, d\mu \, d\sigma,$$
$$B(p, x) = \int_0^\infty \int_{M(p,x,\sigma)}^{\infty} \left( 1 - \Phi\left( \frac{x - \mu}{\sigma} \right) \right) F(\mu, \sigma) \, d\mu \, d\sigma,$$
$$C = \int_0^\infty \int_{-\infty}^{\infty} F(\mu, \sigma) \, d\mu \, d\sigma, \qquad M(p, x, \sigma) = x - \sigma \Phi^{-1}(p),$$
$$F(\mu, \sigma) = \Pr(D \mid \mu, \sigma) \Pr(\mu, \sigma).$$
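These double integrals are easily approximated on a grid; a minimal sketch (our function names, grid quadrature standing in for proper numerical integration, using the prior on $\sigma$ introduced below and writing the region $\mu < M(p,x,\sigma)$ in its equivalent form $\Phi((x-\mu)/\sigma) > p$):

```python
import math

def Phi(u):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def gaussian_linear_odds(z, n, n_mu=80, n_sigma=60):
    """Grid-quadrature sketch of the transformed problem (5) for
    standardised data (mu_hat = 0, sigma_hat = 1) of size n, with a
    uniform prior on mu and prior sigma*exp(-sigma^2/2) on sigma.
    Minimises s(p) = (A/p + B/(1-p))/C over a grid of p."""
    mus = [-4.0 + 8.0 * (i + 0.5) / n_mu for i in range(n_mu)]
    sigmas = [0.05 + 4.0 * (j + 0.5) / n_sigma for j in range(n_sigma)]
    cells, C = [], 0.0
    for sg in sigmas:
        prior = sg * math.exp(-sg * sg / 2.0)
        for mu in mus:
            # likelihood from the summary statistics, times the prior
            w = prior * sg ** (-n) * math.exp(-n * (1.0 + mu * mu) / (2.0 * sg * sg))
            cells.append((Phi((z - mu) / sg), w))
            C += w
    best_p, best_s = None, None
    for i in range(1, 100):
        p = i / 100.0
        A = sum(w * pi for pi, w in cells if pi > p)           # client bets E
        B = sum(w * (1.0 - pi) for pi, w in cells if pi <= p)  # client bets E'
        s = (A / p + B / (1.0 - p)) / C
        if best_s is None or s < best_s:
            best_p, best_s = p, s
    return best_p * best_s, (1.0 - best_p) * best_s

q, qp = gaussian_linear_odds(0.0, 10)   # odds at the threshold z = 0
```

By construction every quadrature cell contributes at least its own weight to the constraint sum, so the total odds never fall below one, and by symmetry the odds at $z = 0$ are (approximately) equal.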
The odds can be calculated once the prior $\Pr(\mu, \sigma)$ has been assigned; however, the assignment of the prior requires a little care. It is not unreasonable to assume no knowledge of $\mu$, and hence place a uniform prior on $\mu$. On the other hand, one cannot assume a uniform prior for $\sigma$, because the integrals will diverge. Taking a uniform prior for $\sigma$ on $(0, R)$, one finds that $s \to \infty$ as $R \to \infty$. Essentially, a model that freely allows arbitrarily large variances cannot provide useful forecasts. Consequently, one is required to specify a prior for $\sigma$ with a tail that thins sufficiently rapidly.

Figure 5 shows numerically computed odds for a situation where $\hat{\mu} = 0$ and $\hat{\sigma} = 1$. The prior used was $\Pr(\mu, \sigma) = \sigma e^{-\sigma^2/2}$, which is a $\chi$-distribution with two degrees of freedom. This prior was used because we want $\hat{\sigma} = 1$ to be a typical observed value. The computed odds were similar for half-normal, F-distributions and other similar distributions where $\hat{\sigma} = 1$ is a typical observed value.

By a shift and scale the computed odds shown in fig. 5 provide a general solution for arbitrary $\hat{\mu}$ and $\hat{\sigma}$. Let $x_i = \hat{\mu} + \hat{\sigma} z_i$, so that the $z_i$ have mean zero and variance one. These $z_i$ should be modelled as observations of a random variable $Z = (X - \hat{\mu})/\hat{\sigma}$. The events of interest are now expressed as $E = \{Z \leq z\}$ and $E' = \{Z > z\}$, where $x = \hat{\mu} + \hat{\sigma} z$. Hence, if $q(z)$ and $s(z)$ represent the odds and total odds for the generic case shown in fig. 5, then the odds in the general case are obtained using $z = (x - \hat{\mu})/\hat{\sigma}$.

Figure 5 shows that the odds on $E$ exceed one for $z$ larger than about 1. Our interpretation of this is that, given the information available to the forecaster, the event $E$ is so likely that offering odds favourable to the client would be loss making for the forecaster. The client is therefore offered odds such that a bet on $E$ makes a consistent small loss, with a rare loss of $-1$, and a bet on $E'$ makes a consistent loss of $-1$, with only a rare pay-out.

3.4 Gaussian model with logarithmic utility

Computation of odds in this situation follows in a similar fashion to the previous section, in that the integrals (12) and (13) are reformulated in terms of integrals over $\mu$ and $\sigma$. Thus, using the same notation as the previous sections, the odds are $q = \bar{\pi} e^{\alpha}$ and $q' = (1 - \bar{\pi}) e^{\alpha}$, where
$$\bar{H} = \frac{1}{C} \int_0^\infty \int_{-\infty}^{\infty} \Phi\left( \frac{z - \mu}{\sigma} \right) \log\left( \Phi\left( \frac{z - \mu}{\sigma} \right) \right) F(\mu, \sigma) \, d\mu \, d\sigma$$
$$\qquad + \frac{1}{C} \int_0^\infty \int_{-\infty}^{\infty} \left( 1 - \Phi\left( \frac{z - \mu}{\sigma} \right) \right) \log\left( 1 - \Phi\left( \frac{z - \mu}{\sigma} \right) \right) F(\mu, \sigma) \, d\mu \, d\sigma,$$
$$\bar{\pi} = \frac{1}{C} \int_0^\infty \int_{-\infty}^{\infty} \Phi\left( \frac{z - \mu}{\sigma} \right) F(\mu, \sigma) \, d\mu \, d\sigma,$$
$$\alpha = \bar{H} - \bar{\pi} \log(\bar{\pi}) - (1 - \bar{\pi}) \log(1 - \bar{\pi}).$$
Figure 6 shows the computed odds for a situation where $\hat{\mu} = 0$ and $\hat{\sigma} = 1$, and the prior $\Pr(\mu, \sigma) = \sigma e^{-\sigma^2/2}$. These odds should be compared with the linear utility situation shown in fig. 5. In comparison it is seen that the logarithmic utility function gives odds with smaller excess $s$ and less weight attached to the large $z$ values. This is similar to the frequency model, for the same reasons.

4 Investment and loss mitigation
The client who has featured thus far in our analysis is more accurately describedas a speculative client , whose primary goal is to profit from inadequacies of theforecaster’s predictions. We now introduce the invested client , whose wealth isinvested in some venture whose profit, costs, and losses are determined in partby the outcomes of the events. Think here of the road-gritter, or horticulturistwho is concerned with the possibility of freezing temperatures. The investedclient has no desire or ability to challenge the forecaster’s skill, rather theywish to use the forecasts to mitigate their losses. The invested client can dothis by betting against the forecaster; a bet is essentially an insurance policy.In this section we analyse how an invested client should bet and show thatsuch bets are beneficial to the invested client despite the forecaster’s odds beingnon-probabilistic. Furthermore, we will see that the client deals only with theforecaster’s odds, they do not try to normalise the odds to obtain probability“estimates”; the odds contain all the information the invested client needs.
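The sense in which a bet functions as insurance can be seen in a small sketch of the fixed-return hedge derived in the next subsection; the returns and odds used here are hypothetical.

```python
def hedge(R, R_prime, q, q_prime):
    """Choose stakes (lam on E, lam_prime on E') so that the client's
    return is the same whichever event occurs."""
    if R > R_prime:
        lam, lam_prime = 0.0, q_prime * (R - R_prime)
    else:
        lam, lam_prime = q * (R_prime - R), 0.0
    P, P_prime = 1.0 / q - 1.0, 1.0 / q_prime - 1.0  # pay-outs per unit stake
    return_if_E = R + lam * P - lam_prime
    return_if_Ep = R_prime - lam + lam_prime * P_prime
    return lam, lam_prime, return_if_E, return_if_Ep

# An investment returning 10 if E occurs but only 2 if E' occurs,
# hedged at (non-probabilistic) odds q = q' = 0.55.
lam, lam_prime, r1, r2 = hedge(10.0, 2.0, 0.55, 0.55)
# r1 == r2: the outcome risk has been transferred to the forecaster
```

The guaranteed return here is 5.6, between the two raw outcomes; the shortfall below the better outcome is the premium implied by the excess q + q′ > 1.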
Consider a situation where the return on a client's investment is influenced by whether an event E, or its complement E′, occurs. Let R be the return on the investment when the event E occurs and R′ the return on E′. Let π and π′, where π + π′ = 1, be the probabilities of the events E and E′ respectively, under the assumption that Nature selects the events at random. The expected return on the investment is Rπ + R′π′.

Now suppose the forecaster provides odds q and q′ for the events E and E′. The game matrix for betting on these events is (1), in terms of the pay-outs P = (1/q) − 1 and P′ = (1/q′) − 1. If the client places bets with the forecaster of λ ≥ 0 on E and λ′ ≥ 0 on E′, then the expected return to the client is
$$(R + \lambda P - \lambda')\pi + (R' - \lambda + \lambda' P')\pi'.$$
If the client chooses the bets λ and λ′ so that R + λP − λ′ = R′ − λ + λ′P′, then the client's return is fixed and independent of π and π′, indeed independent of which event occurs. Under this condition R − R′ = (P′ + 1)λ′ − (P + 1)λ = λ′/q′ − λ/q, and so, if R > R′, then the client should bet λ = 0 and λ′ = q′(R − R′) for a return of (1 − q′)R + q′R′, and if R < R′, then the client should bet λ = q(R′ − R) and λ′ = 0 for a return of qR + (1 − q)R′. Since q + q′ > 1, the fixed return is less than it would be were the odds probabilistic; the difference is the premium the client pays the forecaster, who now carries the risk.

Consider a situation where a client incurs a loss L if an event E occurs, but they can take an action at cost C which, if taken, results in mitigated losses M, inclusive of any rewards of the action. This situation is represented by the following game matrix.

    client \ nature      E       E′
    no action           −L        0
    action              −M       −C

In this situation it is usual that 0 < C < M < L, although it can happen that mitigating the losses includes a reward that more than covers the costs of the action, so that M < C. In either case the optimal strategy for the client is to take the action when π > C/(L + C − M).

As before, a forecaster provides odds on the events E and E′. Suppose the client places bets λ and λ′ on E and E′ respectively when no action is taken, and places bets µ and µ′ when the action is taken. The game matrix is then the following.

    client \ nature        E                 E′
    no action         −L + λP − λ′       λ′P′ − λ
    action            −M + µP − µ′       −C + µ′P′ − µ

Following the analysis of the previous subsection there are three possibilities:
(a) Take no action and place bets λ = qL and λ′ = 0, which results in a fixed loss −qL.
(b) Given C < M, take the action and place bets µ = q(M − C) and µ′ = 0, which results in a fixed loss −qM − (1 − q)C.
(c) Given M < C, take the action and place bets µ = 0 and µ′ = q′(C − M), which results in a fixed loss −(1 − q′)M − q′C.
It follows that when 0 < C < M < L the client always bets on the event occurring and takes the action when q > C/(L + C − M). In the situation where 0 < M < C < L the client bets on the event occurring when taking no action, bets on the event not occurring when taking the action, and takes the action when qL − (C − M)q′ > M. Once again, if q + q′ > 1, then the losses with bets are more than the expected losses without bets, but the losses are fixed and the forecaster takes the risk.

The important point to note about these results, and those of the previous subsection, is that the client uses the odds q and q′ directly, and does not normalise these to obtain probability "estimates" q/(q + q′) and q′/(q + q′) of the events. All the useful information to the client is contained in the odds, and whether the client refers to both q and q′, or just one of these values, depends on their circumstances. For example, in the 0 < C < M < L situation the decision to take the action is based on q alone, whereas in the 0 < M < C < L situation it depends on both q and q′, but not in a way that implies normalising them to obtain probability "estimates".

Finally, we consider an example of issuing odds forecasts on temperature variations using numerical ensemble weather predictions. A common event of interest is whether the temperature at a locality will fall below freezing. These events are fairly rare, so we consider a related and more general event: whether the temperature at a location, at some set lead-time, falls more than a certain amount below the current temperature at that station. These calculations are intended to provide an illustration of odds forecasting using the results we have obtained so far. We do not claim that the odds forecasts we compute are the best or most appropriate; indeed, the results suggest otherwise. Odds based on kernel density estimates, kernel dressing and the like could provide better odds forecasts.
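Before turning to the weather example, the loss-mitigation rule of the previous subsection can be illustrated with a short sketch; the losses, costs and odds below are hypothetical road-gritting style numbers.

```python
def mitigation_strategy(L, C, M, q, q_prime):
    """Fixed losses of the hedged strategies (a)-(c) above, and whether
    the mitigating action should be taken. Note the decision uses the
    odds q (and q') directly; they are never normalised to probabilities."""
    loss_no_action = -q * L                                 # (a): bet lam = q*L on E
    if C < M:
        loss_action = -(q * M + (1.0 - q) * C)              # (b): bet mu = q*(M - C) on E
    else:
        loss_action = -((1.0 - q_prime) * M + q_prime * C)  # (c): bet mu' = q'*(C - M) on E'
    take_action = loss_action > loss_no_action              # smaller fixed loss
    return take_action, loss_no_action, loss_action

# Frost loss 100, gritting cost 10, mitigated loss 30, odds q = 0.2:
# q = 0.2 exceeds C/(L + C - M) = 10/80, so the action is worthwhile.
act, l_no, l_act = mitigation_strategy(100.0, 10.0, 30.0, 0.2, 0.9)
```

With these numbers the hedged fixed losses are −qL = −20 without the action and −qM − (1 − q)C = −14 with it, so the action is taken, in agreement with the threshold q > C/(L + C − M).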
We use London Heathrow (Station 03772) as the locality and National Centers for Environmental Prediction GFS ensemble predictions [Troth, 2008] for constructing odds forecasts. The GFS model provides a temperature at the station by interpolation of the temperatures of the global circulation model's nearest grid-points to the location. Our first step is to prepare suitable time series data, including bias correction and temporal interpolation.

The GFS 2-metre-temperature deterministic-forecast initialisation states were used to calibrate GFS model temperatures against the station readings as follows. Firstly, we compute a cubic spline of the time series of deterministic-forecast initialisation states. Secondly, we use the spline to compute a model temperature time-series to pair with the station temperature time-series. Finally, we fit by least squares the model
$$y = p_1(x) + p_2(x)\cos(2\pi h/24) + p_3(x)\sin(2\pi h/24), \qquad (17)$$
where y is the station temperature, x the splined model temperature, h the hour of the day, and each p_j(x) is a cubic polynomial. This is a twelve-parameter model. We used time series from the calendar year 2005. The residuals of this model had a mean of 6.…×10^−… and a skewness of 0.…. The fitted model (17) provides the adjusted control and the adjusted ensemble forecasts. Figure 7 shows an example of the station data, adjusted control and adjusted ensemble time-series.

The goal is to provide odds forecasts on whether the minimum daily temperature at one to ten days lead-time will be less than 3 degrees centigrade below the adjusted control temperature at the time of issuing the forecasts. In order to use the results we have already derived we consider four forecasts. The first forecast applies the odds from a frequency model, which is based on counting the number of adjusted ensemble members below the target threshold. The second forecast applies the odds from a Gaussian model, which assumes the adjusted ensemble members have a Gaussian distribution. The third forecast is a probabilistic forecast obtained by assuming the adjusted ensemble members have a Gaussian distribution. The fourth forecast is intended to act as the ultimate-challenge client. It is not really a forecast at all, but rather the predicted probability of the station temperature being below the threshold, based on a Gaussian distribution centred on the adjusted control with standard deviation equal to 1.…. We also consider a forecaster (3′) that uses the probabilistic odds but sets minimum odds of 0.1, which effectively caps pay-outs at 10. These capped probabilities are a crude form of odds. Odds capping could be applied by our forecasters 1 and 2, although in the test considered it makes little difference. Table 1 shows the total pay-out to the three non-probabilistic odds forecasters 1, 2, and 3′, on bets taken from the ultimate-challenge client. If forecasters 1 and 2 are allowed odds caps of 0.1 the results change little. Forecaster 3′ is not competitive with forecasters 1 and 2 beyond a lead time of one or two days. Forecasters 1 and 2 have similar performance, although forecaster 2 is more successful. In all cases it appears that performance decreases with increasing lead time.

We have confronted forecasters with the challenge that if they offer a probability forecast, then they should be prepared to accept bets at the odds these imply. Unless a forecaster has a perfect forecast model, they would be unwise to accept this challenge, because any that do so will almost surely be bankrupted by more informed bettors. We argue that probability forecasts fail to account for higher order uncertainties, such as model error. Our alternative is to offer non-probabilistic odds forecasts obtained using an optimisation principle. The excess of the odds reveals the forecaster's uncertainty about the model. We have shown how to compute odds forecasts in several situations, and illustrated how odds forecasts could be used in investment, loss mitigation, and weather forecasting.

There are gaps in our development and demonstrations, especially in the application to ensemble weather forecasting. A gap occurs because we have only shown how to compute odds under the assumption of a perfect model class; this assumption eliminates third and higher order uncertainties. Figure 7 shows that an assumption of a perfect model class is not well supported in our application to ensemble weather forecasting, because during lead-times of 144 to 180 hours the entire forecast ensemble fails to represent what actually happened.
This implies that the assumption that the ensemble is a random selection from the distribution of possibilities is false. Nonetheless, table 1 shows that our odds calculation is sufficiently conservative at these lead times to cope fairly well with this level of uncertainty: pay-outs of around 50 units, compared to 200 units for capped probabilities. The goal, however, was that the pay-outs should be zero on average, and it would be optimistic to suggest this were the case for lead times of 5 days or more. Improvement of these odds forecasts is certainly possible, either by more careful consideration of the selection of the prior, or by abandoning the simple frequency and Gaussian models for a more sophisticated model that takes into account the conditional aspects of the weather. In analogy to the urn game, a more sophisticated model means looking more closely at the mixing process.

We have argued that odds forecasting has uses in investment and loss mitigation; we claim also that it can be used for model assessment. The results of table 1 suggest that the Gaussian model is quite successful out to lead times of 5 days. Kernel-density based models may well do better, extending to longer lead times, or having average pay-outs closer to zero. Comparison with an ultimate-challenge client, as we do, provides a diagnostic of a model's performance; furthermore, if a model achieves a zero average pay-out, then the excess of the odds provides an indication of the model's higher order uncertainty.
References

G.W. Brier. Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1):1–3, 1950.

A.P. Dawid. Probability, causality, and the empirical world: A Bayes–de Finetti–Popper–Borel synthesis. Statistical Science, 19(1):44–57, 2004.

J.L. Doob. Stochastic Processes. Wiley and Sons, 1953.

D.P. Foster and R.V. Vohra. Asymptotic calibration. Biometrika, 85(2):379–390, 1998.

D.P. Foster and R. Vovk. Regret in the on-line decision problem. Games and Economic Behavior, 29:7–35, 1999.

C.W.J. Granger and M.H. Pesaran. Economic and statistical measures of forecast accuracy. Journal of Forecasting, 19(7):537–560, 2000.

D. Johnstone. Economic Darwinism: who has the best probabilities? Theory and Decision, 62:47–96, 2007.

J. Kelly. A new interpretation of information rate. Bell Systems Technical Journal, 35:916–926, 1956.

S.D. Levitt. Why are gambling markets organised so differently from financial markets? The Economic Journal, 114:223–246, 2004.

A.H. Murphy. What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting, 8(2):281–293, 1993.

A.H. Murphy and R.L. Winkler. Reliability of subjective probability forecasts of precipitation and temperature. Applied Statistics, 26:41–47, 1977.

A.H. Murphy and R.L. Winkler. A general framework for forecast verification. Monthly Weather Review, 115:1330–1338, 1987.

D. Oakes. Self-calibrating priors do not exist. J. Am. Statist. Assoc., 80:339, 1985.

T. Palmer. Predicting uncertainty in forecasts of weather and climate. Rep. Prog. Physics, 63:71–116, 2000.

W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling. Numerical Recipes in C. Cambridge University Press, Cambridge, 1988.

M.S. Roulston and L.A. Smith. Evaluating probabilistic forecasts using information theory. Monthly Weather Review, 130(6):1653–1660, 2002.

G. Shafer and V.G. Vovk. Probability and Finance: It's Only a Game! Wiley, 2001.

K. Skouras and A.P. Dawid. On efficient probability forecasting systems. Biometrika, 86, 1999.

L.A. Smith. Accountability and error in ensemble forecasting. In Predictability, volume 1 of ECMWF Seminar Proceedings, 1995.

J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Reprinted Dover, 1944.

D. Williams. Probability with Martingales. Cambridge Univ. Press, 1991.

R.L. Winkler. Evaluating probabilities: Asymmetric scoring rules. Management Science, 40(11):1395–1405, 1994.

Figure 1: Odds for the frequency model with linear utility function. The table shows, for frequencies x/n of event E, the values of q and q + q′ as computed from (16). The value of q′ can be obtained, by symmetry, from the frequency (n − x)/n. Values for larger n (n = 2, 4, 8, …, 512) are plotted. Only marked points are realizable values of the frequency.

Figure 2: The total odds s = q + q′ for the frequency model with linear utility function. Only marked points are realizable values of the frequency.

Figure 3: Odds for the frequency model with logarithmic utility function. The table shows, for frequencies x/n of event E, the values of q and q + q′ as computed from the closed form formula. The value of q′ can be obtained, by symmetry, from the frequency (n − x)/n. Values for larger n (n = 2, 4, 8, …, 512) are plotted. Only marked points are realizable values of the frequency.

Figure 4: The total odds s = q + q′ for the frequency model with logarithmic utility function. Only marked points are realizable values of the frequency.

Figure 5: Odds q (dashed) and total odds s = q + q′ (solid) for the Gaussian model with linear utility function, here computed for the generic situation µ̂ = 0 and σ̂ = 1, prior Pr(µ, σ) = σe^{−σ²/2}, and events E = {Z ≤ z} and E′ = {Z > z}. The standard normal cumulative distribution function is also plotted for comparison (dotted).
Figure 6: Odds q (dashed) and total odds s = e^α (solid) for the Gaussian model with logarithmic utility function, here computed for the generic situation µ̂ = 0 and σ̂ = 1, prior Pr(µ, σ) = σe^{−σ²/2}, and events E = {Z ≤ z} and E′ = {Z > z}. The standard normal cumulative distribution function is also plotted for comparison (dotted).

Figure 7: Station temperature time series (+), the adjusted control (dashed) and adjusted ensemble forecast time-series (solid), against lead time in hours. The horizontal line (dashed) shows the threshold for which the odds forecasts are required. The forecasts shown were launched 00:00 UTC, day 40 of 2005.

Figure 8: Four computed odds forecasts (linear utility) for the data shown in figure 7, against lead time in hours. Frequency odds (dashed), Gaussian odds (solid), and probabilistic odds (dash-dotted). Challenger (dashed) refers to the odds used by the ultimate-challenge client against whom the forecasters must compete.

Figure 9: The pay-outs of the three forecasters when faced with the ultimate-challenge client, for linear utility, against lead time in hours. Frequency odds (dashed), Gaussian odds (solid), and probabilistic odds (dash-dotted). Same data as shown in figures 7 and 8. The pay-outs are shown on a linear scale for pay-outs less than one, then transition into a logarithm base 10 scale above.

Table 1: Total pay-outs to forecasters 1, 2 and 3′, for the linear and logarithmic utility functions, by lead time.