A Probabilistic Model for Predicting Shot Success in Football
Edward Wheatcroft and Ewelina Sienkiewicz
London School of Economics and Political Science, Houghton Street, London, United Kingdom, WC2A 2AE. Corresponding email: [email protected]
Abstract
Football forecasting models traditionally rate teams on past match results, that is based on the number of goals scored. Goals, however, involve a high element of chance and thus past results often do not reflect the performances of the teams. In recent years, it has become increasingly clear that accounting for other match events such as shots at goal can provide a better indication of the relative strengths of two teams than the number of goals scored. Forecast models based on this information have been shown to be successful in outperforming those based purely on match results. A notable weakness, however, is that this approach does not take into account differences in the probability of shot success among teams. A team that is more likely to score from a shot will need fewer shots to win a match, on average. In this paper, we propose a simple parametric model to predict the probability of a team scoring, given it has taken a shot at goal. We show that the resulting forecasts are able to outperform a model assuming an equal probability of shot success among all teams. We then show that the model can be combined with predictions of the number of shots achieved by each team, and can increase the skill of forecasts of both the match outcome and of whether the total number of goals in a match will exceed 2.5. We assess the performance of the forecasts alongside two betting strategies and find mixed evidence for improved performance.

Keywords: Probability forecasting, Sports forecasting, Football forecasting, Expected goals, Shot success
Association football (hereafter football) is by far the most popular sport globally with almost every country in the world having a national team and often a multitude of domestic leagues. In England alone, there are over 100 professional teams. The vast popularity of the sport has led to demand for predictive information regarding the outcomes of matches, competitions and leagues, often driven by the desire to gamble on them. In recent years, a vast number of betting markets have opened up, providing opportunities for those with useful predictive information and/or insight to make a profit. Betting strategies are usually underpinned with predictive models that attempt to predict the probability of different outcomes, thus informing which bets to take.

Whilst early attempts at building predictive models focused on the number of goals scored by each team in previous matches, in recent years it has become clear that there is predictive value in other match events such as the number and nature of shots and corners taken by each team (Wheatcroft (2019, 2020)). The key insight is that the number of goals scored by each team is subject to a higher level of chance than events such as shots and corners, which can be more reflective of the quality of the performances of the teams. Take, for example, a match in which the home team takes a large number of shots but is unable to score, whilst the away team takes few shots and happens to score from one of them and win the match. A predictive model that only takes goals into account would not reflect the fact that the home team dominated the match and may wrongly downgrade the forecast probability that they win future matches.

In this paper, we consider a large number of football matches in which match data such as the number of shots, shots on target and corners is provided. In order to build a set of match forecasts, we are then interested in (i) the number of shots taken by each team and (ii) the probability that any shot results in a goal.
Given these two ingredients, we can then predict the number of goals scored by each team in a match. For (i), we make use of a recently developed methodology which uses a rating system to predict the number of shots taken by each team. For (ii), we propose a simple model to predict the probability of scoring from a shot. The latter is tested on over one million shots from European football matches in 22 different leagues and, when calibrated, is shown to be capable of producing skillful probabilistic forecasts. Forecasts of the number of shots are then combined with forecasts for the probability of shot success to construct forecasts of both the match outcome and whether the total number of goals in a match will exceed 2.5.

The focus of this paper is on assessing the probability of a team scoring conditioned on them taking a shot at goal. In fact, the question of how to assess the probability of scoring from a shot is one that has received a lot of attention in the football forecasting literature. However, the focus has almost exclusively been on factors such as the location and nature of the shot, position of players etc. Here, we do not attempt to take this information into account and rather estimate the probability of shot success on past data, focusing on the strengths of the teams. This is not simply a limitation of our methodology but a property of the question we are trying to address. We look to estimate the probability of shot success before the match has started and therefore we cannot condition on the specific nature of each shot. Whilst we can attempt to predict the number of shots taken by each team, it is not realistic to be able to predict the nature of those shots. The output of our model is therefore a fixed forecast probability of shot success for each team in a match.

Typically, the nature of football prediction models is that each team involved in a league or cup competition is given a ‘rating’. These rating systems often take one of two different approaches.
In the first, each team’s rating is a variable which is updated as new information emerges. The nature of those updates is governed by a small number of parameters which determine aspects such as the effect of the result of the last match or the margin of victory/defeat. We refer to these as Variable Rating Systems. The other category assigns each team one or more parameters which determine their strength and these are usually estimated using maximum likelihood (Ley et al. (2019)). In that case, a large number of parameters are required to be estimated simultaneously and fairly sophisticated optimisation algorithms are often needed. We deviate from the terminology used by Ley et al. (2019), who refer to such models as ‘Maximum Likelihood models’ and, instead, use the more general term Parametric Rating Systems. In this paper, we make use of both approaches. Our shot probability model (the novel model in this research) is a Parametric Rating System which assigns attacking and defensive ratings to each team and these are estimated using maximum likelihood estimation. In addition, we make use of a Variable Rating System in the form of Generalised Attacking Performance (GAP) ratings which estimate the number of shots achieved by each team (Wheatcroft (2020)).

There is a large body of literature proposing approaches to building ratings systems for sports teams or players. By far the most well known approach is the Elo rating system which has a long history in sport and has inspired many other systems. Elo ratings were initially designed with the intention of providing rankings for chess players and the system was implemented by the United States Chess Federation in 1960 (Elo (1978)). The Elo system assigns ratings to each player or team, which are then used to estimate probabilities of the outcome of a game. The rating of each player is then updated to take the result of the game into account. Whilst the system was initially designed for cases in which the outcomes are binary (i.e. there are no ties), more recently, it has been extended to account for draws so that it is applicable to sports such as football, in which draws are common. After each match, the system takes the difference between the estimated probabilities and the outcome (assigned a one, a zero, or 0.5 for a draw) and adjusts the ratings accordingly. The system in its original form therefore does not account for the size of a win. Elo ratings have been demonstrated in the context of football and shown to perform favourably with respect to six other rating systems (Hvattum & Arntzen (2010)). FIFA switched to an Elo rating system in 2018 to produce its international football world rankings (Fifa (2018)). Elo ratings are also common in other sports such as Rugby League (Carbone et al. (2016)), American Football (FiveThirtyEight (2020b)) and Basketball (FiveThirtyEight (2020a)).

Whilst Elo ratings have been an important part of sports prediction for many years, they are limited in that they do not directly take home advantage into account. This is important because home advantage has a very big effect in football (Pollard (2008)). Adjustments have been made to the Elo rating system to account for this but this typically consists of a single parameter that doesn’t account for variation in the home advantage of different teams (FiveThirtyEight (2020a, b)). Rating systems such as the GAP rating system used in this paper distinguish between home and away performances by giving separate ratings for each. This is also true of the pi-rating system introduced by Constantinou & Fenton (2013), for example.

Variable Rating Systems such as the GAP rating system assign ratings to each team which are updated each time they are involved in a match. Similar approaches have been taken by a large number of authors. For example, Maher (1982) assigned fixed ratings (i.e. not time varying) to each team and used them in combination with a Poisson model in order to estimate the number of goals scored. A similar approach was used by Dixon & Coles (1997) to estimate match probabilities. It was shown that the forecasts were able to make a statistically significant profit for matches in which there was a large discrepancy between the estimated probabilities and the probabilities implied by the odds. The Dixon and Coles model was modified by Dixon & Pope (2004) who were able to demonstrate a profit using a wider range of published bookmaker odds. A Bayesian model which produced time-varying attacking and defensive ratings was defined by Rue & Salvesen (2000).
There are many other examples of systems that use attacking and defensive ratings and these can be found in, for example, Karlis & Ntzoufras (2003), Lee (1997) and Baker & McHale (2015).

A number of authors have taken a Parametric Rating System approach to modelling football matches. An overview can be found in Ley et al. (2019) in which a Bivariate Poisson model is shown to produce the most favourable results according to the Ranked Probability Score (RPS). A profitable betting strategy has also been demonstrated by Koopman & Lit (2015) using a Bivariate Poisson model. The approach taken by Ley et al. (2019), in which less recent matches are weighted lower than more recent matches, provides inspiration for our shot success model.

Related to the prediction of shot success is the concept of ‘expected goals’ which has been growing significantly in prominence in football analysis in recent years. The rationale is that the nature of a team’s attempts at goal can be used to estimate the number of goals they would be ‘expected to score’ in a match. For a particular shot, the ‘expected’ number of goals is simply the estimated probability of scoring given characteristics such as the location, angle to goal, position of defenders etc. As a result, a great deal of effort has been made to model the probability of scoring based on information of this kind. For example, Ruiz et al. (2015) attempt to evaluate the efficacy of football teams in terms of converting shots into goals by taking account of characteristics such as the location and type of shot (e.g. whether the shot was taken from open play). Gelade (2014) built a model to evaluate the performance of goalkeepers by taking factors such as the location, deflections and swerve of the ball into account.
Many other papers have been written on the subject and a good overview can be found in Eggels (2016) and Rathke (2017) who also present their own models.

The main aim of this paper is to define and demonstrate a model for the probability of a team scoring from a shot in a football match. To our knowledge, whilst significant effort has been made to estimate probabilities of scoring given the specific nature of a shot (such as location), none of these approaches attempts to provide predictions of shot success before the match, and they cannot be used for this purpose. In short, the aim of those models is to predict the probability of scoring from a particular shot given various characteristics, whilst the purpose of our model is to predict the probability of scoring given the strengths of the teams involved and the location of the match (i.e. which team is at home). The latter can easily be combined with predictions of the number of shots achieved to predict the overall number of goals for each team.

This paper is organised as follows. In section 2, we describe the data set used to demonstrate our model. In section 3, we describe our model of shot success and assess its performance in terms of forecast skill and reliability in 22 different football leagues. In section 4, we demonstrate the use of our shot success model in combination with the GAP rating system to provide forecasts of match outcomes and whether the total number of goals in a match will exceed 2.5. Section 5 is used for discussion.

In this paper, we make use of the football data repository available at , which supplies match-by-match data for 22 European Leagues. For each match, a variety of statistics are provided including the number of shots, shots on target and corners.
In addition, odds data from multiple bookmakers are provided for the match outcome market, the over/under 2.5 goal market and the Asian Handicap match outcome market. For some leagues, match statistics are available from the 2000/2001 season onwards whilst, in others, these are available for later seasons only. Since we require shot data, only matches from the 2000/2001 season onwards are considered. A summary of the data used in this paper is shown in table 1. Here, the total number of matches since 2000/2001, the number of matches in which shots and corner data are available and the number of these excluding a ‘burn-in’ period for each season are shown. The ‘burn-in’ period is simply the first six matches of the season for each team. This is excluded from forecast evaluation to allow the forecasts time to ‘learn’ sufficiently about the strengths and weaknesses of the teams in a given season. All leagues include data up to and including the end of the 2018/19 season.

League                      No. matches   Match data available   Excluding burn-in
Belgian Jupiler League             5090                    480                 384
English Premier League             9120                   7220                5759
English Championship              13248                  10484                8641
English League One                13223                  10460                8608
English League Two                13223                  10459                8613
English National League            7040                   5352                4642
French Ligue 1                     8718                   4907                4126
French Ligue 2                     7220                    760                 639
German Bundesliga                  7316                   5480                3502
German 2. Bundesliga               5670                   1057                 753
Greek Super League                 6470                    477                 381
Italian Serie A                    8424                   5275                4439
Italian Serie B                    8502                    803                 680
Netherlands Eredivisie             5814                    612                 504
Portuguese Primeira Liga           5286                    612                 504
Scottish Premier League            5208                   4305                3427
Scottish Championship              3334                    524                 297
Scottish League One                3335                    527                 298
Scottish League Two                3328                    525                 297
Spanish Primera Liga               8330                   5290                4449
Spanish Segunda Division           8757                    903                 771
Turkish Super lig                  5779                    612                 504
Total                            162435                  77124               62218

Table 1: Data used in this paper.
We propose a simple model for predicting the probability of a football team scoring from a shot at goal. We are primarily interested in estimating the probability pre-match and therefore we do not take into account any specific information about the location or nature of a shot. In short, in a match between two teams, we ask the question ‘If a particular team takes a shot, what is the probability that they score as a result?’

Consider a football league with $T$ teams that play each other over the course of a season. Let $a_1, \ldots, a_T$ and $d_1, \ldots, d_T$ be attacking and defensive ratings respectively for each team. In a match with the $i$-th team at home to the $j$-th team, the forecast probability of a home goal given a home shot is given by

$$p(G_h) = \frac{1}{1 + \exp\{-m_h\}} \tag{1}$$

where $m_h = c + h + (a_i + d_j)$. Here, $c$ is a constant parameter and $h$ is a parameter that allows for home advantage (if any).

The forecast probability of an away goal given an away shot is given by

$$p(G_a) = \frac{1}{1 + \exp\{-m_a\}} \tag{2}$$

where $m_a = c - h + (a_j + d_i)$.

Here, we have a total of $2T + 2$ parameters to be estimated. We take a maximum likelihood approach with a slight adjustment such that more recent matches are given a higher weight than those that were played longer ago. To do this, we make use of the ‘half life’ approach taken by Ley et al. (2019) in which the weighting placed on the $m$-th match is determined by

$$w_{\mathrm{time},m}(x_m) = \left(\frac{1}{2}\right)^{x_m / H}, \tag{3}$$

where $x_m$ is the number of days since the $m$-th match was played and $H$ is the ‘half life’, that is the number of days until the weighting halves.

The likelihood function, adjusted with the half life parameter, is given by

$$L = \prod_{m=1}^{M} \phi(p_m, O_m)^{w_{\mathrm{time},m}(x_m)} \tag{4}$$

where

$$\phi(a, b) = \begin{cases} a & \text{if } b = 1, \\ 1 - a & \text{if } b = 0. \end{cases} \tag{5}$$

The model requires the simultaneous optimisation of $2T + 2$ parameters. In the experiments performed in this paper, we use the ‘fmincon’ function in Matlab and select the ‘interior point’ algorithm which provides a compromise between speed and accuracy. We set the constraints $\sum_{i=1}^{T} a_i = 0$ and $\sum_{i=1}^{T} d_i = 0$ so that all of the ratings are distributed around zero. All parameters are initialised to zero in the optimisation algorithm.

If our forecast model of shot success described in section 3 is to be useful, it is important to show that the forecasts it produces are informative in terms of predicting the probability of scoring from a shot at goal. In this section, we evaluate the performance of the forecasts and examine the effect of the half life parameter.

To evaluate whether the forecasts are informative at all, we can investigate whether they outperform a very simple system in which forecasts consist of the historical shot success frequency over all past matches. If our forecasts are able to outperform this simple system, we have shown there is value in taking into account the strengths of the teams involved.

In weather forecasting, the simple forecasting system described above is often called the ‘climatology’ and we adopt this terminology. The climatology is commonly used as a benchmark for the skill of a set of forecasts and if the forecasts cannot outperform the climatology, the forecast system is of little value (Katz & Murphy (2005)). Formally, in our case, the climatological probability $p(G)$ of scoring given a shot at goal takes the form

$$p_c = \frac{\sum_{m=1}^{M} G_m}{\sum_{m=1}^{M} S_m} \tag{6}$$

where $G_m$ and $S_m$ are the total number of goals and shots respectively in the $m$-th match and $M$ is the number of past matches considered.

Probabilistic forecasts are best evaluated using scoring rules.
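A minimal Python sketch of equations (1) to (6) is given below. It is illustrative only and is not the Matlab implementation used in the experiments. The function names and the per-shot record layout `(home_index, away_index, shot_by_home, scored, days_ago)` are assumptions made for the example, as is the choice to weight each shot by the age of the match in which it was taken. The sketch evaluates the weighted log-likelihood for given parameters and leaves the constrained optimisation to a separate routine.

```python
import math


def shot_success_probs(c, h, a, d, i, j):
    """Equations (1) and (2): P(goal | shot) for home team i and away team j."""
    m_home = c + h + (a[i] + d[j])
    m_away = c - h + (a[j] + d[i])
    p_home = 1.0 / (1.0 + math.exp(-m_home))
    p_away = 1.0 / (1.0 + math.exp(-m_away))
    return p_home, p_away


def time_weight(days_ago, half_life):
    """Equation (3): the weight halves every `half_life` days."""
    return 0.5 ** (days_ago / half_life)


def weighted_log_likelihood(shots, c, h, a, d, half_life):
    """Logarithm of equation (4), summed over shot records.

    `shots` is a hypothetical list of tuples
    (home_index, away_index, shot_by_home, scored, days_ago).
    """
    total = 0.0
    for i, j, shot_by_home, scored, days_ago in shots:
        p_home, p_away = shot_success_probs(c, h, a, d, i, j)
        p = p_home if shot_by_home else p_away
        phi = p if scored else 1.0 - p  # equation (5)
        total += time_weight(days_ago, half_life) * math.log(phi)
    return total


def climatology(goals, shots_taken):
    """Equation (6): total goals divided by total shots over past matches."""
    return sum(goals) / sum(shots_taken)
```

With all parameters at zero (the initialisation described above), both probabilities equal 0.5, so any informative fit must move the ratings away from this starting point.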
The Ignorance and Brier scores, described in appendix B, are two examples of scoring rules that are suitable for evaluating binary probabilistic forecasts and we consider the skill according to both. For context, in each case, the score is given with that of the climatology subtracted such that, if the relative score is negative, the forecasts can be considered to be skillful.

The mean Ignorance and Brier scores of the forecasts relative to the climatology are shown as a function of the half life parameter in figure 1. Here, the relative score under both scoring rules is positive for all values of the half life parameter, implying that the forecasts do not outperform the climatology, on average.

Figure 1: Mean Ignorance (blue line, left axis) and Brier (red line, right axis) scores for forecasts of the probability of scoring from a shot at goal, given relative to the climatology as a function of the half life parameter.

To investigate why the forecasts are unable to outperform the climatology, we can make use of reliability diagrams to attempt to diagnose whether there are any systematic biases. Reliability diagrams are used to visualise the ‘reliability’ of a set of forecasts, that is whether the observed frequencies are consistent with the forecast probabilities (Bröcker & Smith (2007a)). The forecasts are divided into ‘bins’ and the mean forecast probability within each bin is plotted against the relative frequency of the outcomes. If the points are close to the diagonal, the forecasts are ‘reliable’. We make use of the approach taken by Bröcker & Smith (2007a) in which ‘consistency bars’ are added which provide a 95 percent interval for the relative frequency under the assumption that the forecasts are perfectly reliable (that is, the outcomes occur at the rate implied by the forecasts).

Reliability diagrams for different values of the half life parameter $H$ are shown in figure 2. Here, in all cases, it is clear that the forecasts are overdispersed.
The highest forecast probabilities tend to correspond to far lower relative frequencies than would be expected if they were reliable, whilst the lowest forecast probabilities tend to correspond to much higher relative frequencies than expected. To understand why we see the above pattern, it is useful to recall how the forecasts are formed. The model assigns attacking and defensive parameters to each team as well as constant and home advantage parameters. This means that a large number of parameters are required to be optimised simultaneously and this risks overfitting, in which the model does not generalise well out of sample. For example, suppose a team happens to score with a large proportion of its shots in recent matches. This will be reflected in their rating but may be unsustainable in the longer term, leading to an overestimate of the probability of scoring from a shot. Conversely, a team that happens to have scored from a low proportion of its shots may have its probability of scoring in future matches underestimated.

To attempt to deal with overfitting, we adjust the forecasts using two different approaches. In the first, we attempt to calibrate the forecasts using Platt Scaling, a simple approach in which the original forecast is used as an input to a logistic regression with a ‘calibrated’ forecast as the output (Platt et al. (1999)). The adjusted forecast $\tilde{p}$ is therefore given by

$$\tilde{p} = \frac{1}{1 + \exp(A + bp)}, \tag{7}$$

where $p$ is the original forecast and $A$ and $b$ are parameters to be optimised over past forecasts and outcomes. We use Maximum Likelihood to optimise the parameters over all available past forecasts.

Our second approach is ‘Blending’ (Bröcker & Smith (2008)).
Under this approach, the adjusted forecasts are a weighted average of the original forecast and the climatology (that is the historical average, see equation 6). Formally, the blended forecast is given by

$$\tilde{p} = \alpha p + (1 - \alpha) p_c \tag{8}$$

where $p$ is the original forecast, $p_c$ is the climatology and $\alpha$ is a parameter to be estimated. Parameter estimation is done by minimising the mean ignorance score over all past forecasts (note this is equivalent to the Maximum Likelihood approach used in Platt Scaling).

Figure 2: Reliability diagrams for forecasts of shot success for different values of the half life parameter. The consistency bars show the region in which there is a 95 percent probability of the relative frequencies falling if the forecasts are perfectly reliable.

The mean Ignorance and Brier scores (both shown relative to that of the climatology) of the Platt scaled and blended forecasts are shown in figure 3 (note the change in scale on the y axis from figure 1). Here, unlike the original forecasts, both the Platt scaled and blended forecasts produce negative mean Ignorance and Brier scores and are therefore able to outperform the climatology, demonstrating forecast skill.

It is clear that the choice of the half life parameter is crucial in determining the skill of the forecasts. If it is too high, matches that were played a long time ago and have low relevance to the current time are given too much weight. If it is too low, only the most recent matches carry meaningful weight and the ratings assigned to each team are not robust. Here, under both scores and both approaches, the optimal half life parameter (out of those considered) is 60 days, indicating that relatively recent matches play the biggest role in determining the probability of scoring. It is also clear that the blending approach consistently outperforms Platt Scaling. Reliability diagrams for the forecasts produced under Blending and Platt Scaling with a half life parameter of 60 days are shown in figure 4.
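The two adjustments in equations (7) and (8) can be sketched in Python as follows. This is a hedged illustration rather than the procedure used to produce the figures: the function names are hypothetical, the Ignorance score is assumed to take its standard form (minus the base-2 logarithm of the probability placed on the outcome), and a simple grid search stands in for whatever optimiser is used in practice to estimate $\alpha$.

```python
import math


def platt_scale(p, A, b):
    """Equation (7), using the sign convention given in the text."""
    return 1.0 / (1.0 + math.exp(A + b * p))


def blend(p, p_c, alpha):
    """Equation (8): weighted average of the forecast and the climatology."""
    return alpha * p + (1.0 - alpha) * p_c


def mean_ignorance(probs, outcomes):
    """Mean of -log2 of the probability placed on the observed outcome.

    Assumes every probability lies strictly between 0 and 1.
    """
    total = 0.0
    for p, o in zip(probs, outcomes):
        total += -math.log2(p if o == 1 else 1.0 - p)
    return total / len(probs)


def fit_alpha(probs, outcomes, p_c, steps=100):
    """Grid search over alpha in [0, 1] minimising the mean Ignorance score."""
    best_alpha, best_score = 0.0, float("inf")
    for k in range(steps + 1):
        alpha = k / steps
        blended = [blend(p, p_c, alpha) for p in probs]
        score = mean_ignorance(blended, outcomes)
        if score < best_score:
            best_alpha, best_score = alpha, score
    return best_alpha
```

An estimate of $\alpha$ close to zero indicates that the original forecasts add little beyond the climatology, whilst a value close to one indicates that they need little moderation.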
Under both approaches, it is clear that the effect is to moderate the forecasts by moving them closer to the climatology, creating improved reliability and skill.

Figure 3: Mean Ignorance (blue line, left axis) and Brier (red line, right axis) scores for Blended (solid lines) and Platt Scaled (dashed lines) forecasts of the probability of scoring from any shot, given relative to the climatology, as a function of the half life parameter.

In summary, the results here show that, when combined with Platt Scaling or Blending, our model is able to make skillful predictions of the probability of scoring from a given shot. Having shown that we are able to construct skillful shot success forecasts, we now investigate whether they are effective in improving the skill of forecasts of match outcomes and whether the total number of goals in a match will exceed 2.5.

Figure 4: Reliability diagrams for forecasts of shot success adjusted using blending (left) and Platt Scaling (right) with a half life parameter of 60 days. The consistency bars show the region in which there is a 95 percent probability of the relative frequencies falling if the forecasts are perfectly reliable.

In this section, we investigate whether our shot success model can be used alongside predictions of the number of shots to make informative probabilistic forecasts for (i) the outcomes of football matches (i.e. whether the match will end as a home win, draw or away win) and (ii) whether the total number of goals will exceed 2.5 (henceforth ‘over/under 2.5 goal forecasts’). In each case, we assess both the forecast skill and the profitability when using the resulting forecasts alongside the two betting strategies defined in appendix C.

Given a point prediction of the number of shots and the forecast probability of each of those shots being successful, we can obtain a point estimate for the number of goals scored by each team in a match by simply multiplying them together.
To predict the number of shots achieved by each team, we make use of the Generalised Attacking Performance (GAP) rating system proposed by Wheatcroft (2020) which has been shown to be a useful predictor variable for producing over/under 2.5 goal forecasts and forecasts of the match outcome (Wheatcroft (2019)). The system is described in detail in appendix A. Define a point prediction of the number of goals for the home team in a match to be

$$E_h = \hat{S}_h P(G_h) \tag{9}$$

and, for the away team,

$$E_a = \hat{S}_a P(G_a), \tag{10}$$

where $\hat{S}_h$ and $\hat{S}_a$ are the predicted number of shots for the home and away teams respectively, and $P(G_h)$ and $P(G_a)$ are the predicted probabilities that the home or away team will score given they have taken a shot at goal. Note that $E_h$ and $E_a$ will usually not be integer values and represent a prediction of the ‘expected’ number of goals achieved by each team.

For comparison, we can define a point prediction for the number of goals adjusted with the climatological probability such that

$$C_h = \hat{S}_h p_c \tag{11}$$

and

$$C_a = \hat{S}_a p_c \tag{12}$$

for the home and away teams respectively where $p_c$ is the climatological probability of shot success (i.e. the probability of a team scoring from a shot regardless of ability).

We make use of ordered logistic regression to map predictor variables into forecast probabilities for the match outcome. The ordered logistic regression model is chosen because the outcomes of football matches can be considered ‘ordered’. In a sense, a home win and a draw are ‘closer together’ than a home win and an away win and this is reflected in the parametrisation of the model. The ordered regression model allows $K$ predictor variables to be mapped into forecast probabilities. A sensible choice of predictor variable for the match outcome is the difference in the predicted number of goals scored by each team defined by

$$V = E_h - E_a. \tag{13}$$

We use logistic regression to build probabilistic forecasts of whether the total number of goals in a match will exceed 2.5.
Since this is a binary event, logistic regression is a suitable model for mapping predictor variables to probabilities. Since we are interested in the total number of goals scored in a match, we use as a predictor variable the sum of the predicted number of goals scored by the home and away teams. The predictor variable is therefore

$$V = E_h + E_a. \tag{14}$$

For our model of shot success to be effective in terms of predicting the match outcome and whether the number of goals will exceed 2.5, our predictor variables should be more informative than when $E_h$ and $E_a$ are replaced with $C_h$ and $C_a$, that is the case in which the probability of shot success is taken to be that of the climatology. This comparison is the main focus of our experiment.

In addition to the predictor variables specified above, we consider the use of odds-implied probabilities as additional predictor variables. The rationale of this is that we may be able to ‘augment’ the substantial information in the odds with additional information to provide more skillful forecasts.

We make use of the data described in section 2 to produce probabilistic forecasts both for the match outcome and for whether the total number of goals in a match will exceed 2.5. We do this for each match in which both shot data and the relevant odds are available. This means we have a total of 62218 forecasts of the match outcome and 53447 over/under 2.5 goal forecasts. We produce two sets of forecasts in each case. First we include in the model only our chosen predictor variable based on the predicted number of goals. Second, we include an odds-implied probability as an additional variable.
In the match outcome case, this is the odds-implied probability of a home win and, in the total goals case, the odds-implied probability that the total number of goals will exceed 2.5.

In all cases, the forecasts for each match are constructed using regression parameters fitted with least squares estimation on all available matches in all leagues up to the day before the match is played. In order to allow the forecasts to have sufficiently learned about the quality of the teams, we follow the approach of Wheatcroft (2020) and allow a ‘burn-in’ period, thus excluding from calculations of forecast skill and profit the first six matches of the season for each team.

Since we are primarily interested in the potential value added by our shot success model, our comparison of interest is between the forecasts produced using as predictor variables the predicted number of goals calculated using our shot success model (that is formed using equations (9) and (10)), and those produced using the climatological shot success probability defined in equation (6). The latter case includes no information about the strength of the teams and therefore the extent to which it is outperformed by our shot success model demonstrates the value the model adds to the forecasts. We therefore present the skill of the forecasts formed using our shot success forecasts ‘relative’ to those formed using the climatological probability of shot success. This is done by subtracting the skill of the latter from the former such that negative values imply better relative skill.

We also compare the betting performance under the Level Stakes and Kelly betting strategies described in appendix C. To calculate the overall profit, we use the maximum odds available from the BetBrain odds-comparison website, which are included in the ‘football-data’ data set.
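The predictor variables compared in this experiment follow directly from equations (9) to (14) and can be sketched in Python as below. The function names, dictionary keys and numerical inputs are hypothetical and chosen purely for illustration; in the experiments the shot predictions come from the GAP rating system and the probabilities from the calibrated shot success model.

```python
def expected_goals(shots_home, shots_away, p_home, p_away):
    """Equations (9) and (10): predicted shots times P(goal | shot)."""
    return shots_home * p_home, shots_away * p_away


def predictors(shots_home, shots_away, p_home, p_away, p_c):
    """Predictor variables under the shot success model and the climatology."""
    e_h, e_a = expected_goals(shots_home, shots_away, p_home, p_away)
    # Equations (11) and (12): the same shot predictions with p_c for both teams.
    c_h, c_a = expected_goals(shots_home, shots_away, p_c, p_c)
    return {
        "outcome_model": e_h - e_a,        # equation (13)
        "total_model": e_h + e_a,          # equation (14)
        "outcome_climatology": c_h - c_a,  # equation (13) with C_h, C_a
        "total_climatology": c_h + c_a,    # equation (14) with C_h, C_a
    }


# Hypothetical match: 12 predicted home shots, 8 predicted away shots.
example = predictors(12, 8, p_home=0.25, p_away=0.125, p_c=0.125)
```

In this hypothetical example both teams have the same climatological shot success rate, so the two sets of predictors differ only because the model assigns the home team a higher probability of scoring from each shot.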
We begin by considering forecasts of the match outcome. The mean relative Ignorance and Ranked Probability Scores for the case in which the odds-implied probability is not included as an additional predictor variable are shown as a function of the half life parameter H in the top panel of figure 5. As described above, in both cases, the skill is given relative to that of forecasts formed using the predicted number of goals adjusted using the climatological probability of shot success (i.e. the latter is subtracted from the former). Since both the mean relative Ignorance and RPS are negative, the shot success forecasts are shown to add skill to the match outcome forecasts for all considered values of the half life.

The overall profit under the Level Stakes (magenta) and Kelly (green) betting strategies is shown in the lower panel. The dashed lines show the overall profit for the case in which the predicted number of goals is calculated using the climatological probability of shot success. Interestingly, despite the fact that our shot success model improves forecast skill, the overall profit is slightly decreased and there is therefore no evidence of improved gambling performance under either strategy. Both forecast skill and the overall profit are optimised by setting the half life parameter to 30 days, implying that shot success in relatively recent matches is the most informative in terms of the match outcome.

Figure 6 is the same as figure 5 but for the case in which the odds-implied probability of a home win is included as an additional predictor variable. Here, both the relative Ignorance and RPS are positive, implying that our model of shot success is counterproductive. Similarly, there is a reduction in profit under both betting strategies. We can provide a speculative answer as to why this is the case. Betting odds are complex and reflect a great deal of information brought together by participants in the market.
We suggest that differences in the probability of shot success are efficiently reflected in the odds (punters may account for efficient goal scorers, goalkeepers, etc.) and therefore, by including this information, there is an element of double counting which negatively impacts the forecasts. It is worth noting that finding information that can ‘augment’ the information in the betting odds is a much more difficult task than finding information to produce forecasts from scratch. We discuss this further in section 4.4.

We now turn to the over/under 2.5 goal forecasts. The results for the case in which the odds-implied probability is not included as an additional predictor variable are shown in figure 7. Similarly to the forecasts of the match outcome, the top panel shows the mean Ignorance and Brier scores given relative to the case in which the forecasts are formed using the predicted number of goals produced using the climatological probability of shot success. Since both relative scores are negative, our shot success model is able to increase the skill of the forecasts.

The overall profit achieved using the Kelly and Level Stakes betting strategies is shown in the lower panel of figure 7. Here, as before, the solid lines show the overall profit for the case in which the predicted number of goals is produced using our shot success model and the dashed lines the case in which the climatological rate of shot success is used. Here, there is a major improvement in the gambling return from using our model of shot success, although the profit is still slightly negative for all values of the half life parameter.
The optimal half life parameter of 90 days is slightly longer than for the match outcome forecasts but this still suggests that relatively recent matches are most relevant.

Figure 8 shows the same results as figure 7 but for the case in which the odds-implied probability is included as an additional predictor variable. Here, the relative skill under both the Ignorance and Brier scores is negative, implying that our shot success model increases the skill of the over/under 2.5 goal forecasts. For most values of the half life parameter, there is also an increase in profit under both strategies. This is a very different result to the match outcome case in which we were unable to improve the forecasts using our model of shot success. Interestingly, the most effective choice of half life parameter is 300 days, suggesting that shot success over a longer period of time is relevant here.

Figure 5: Top panel: Mean ignorance (blue line, left axis) and RPS (red line, left axis) for forecasts of the match outcome as a function of the half life parameter when the odds-implied probability is not included as an additional predictor variable. Both scores are given relative to that of the case in which the predicted number of goals is calculated using the climatological probability of scoring. Lower panel: Overall profit from the Level Stakes (magenta) and Kelly (green) strategies as a function of half life. The dashed horizontal lines show the overall profit when the predicted number of goals is calculated using the climatological probability of scoring.
The results above demonstrate that our model for predicting shot success can improve the skill of shot-based forecasts for both match outcomes and for whether the total number of goals will exceed 2.5. For the case in which the odds-implied probability is not included in the forecasts, gains in forecast skill are demonstrated for both sets of forecasts. For the case in which the odds-implied probability is included, the results are more mixed with an improvement in the skill of over/under 2.5 goal forecasts and a reduction in the skill of forecasts of the match outcome.

Figure 6: Top panel: Mean ignorance (blue line, left axis) and RPS (red line, left axis) for forecasts of the match outcome as a function of the half life parameter when the odds-implied probability is included as an additional predictor variable. Both scores are given relative to that of the case in which the predicted number of goals is calculated using the climatological probability of scoring. Lower panel: Overall profit from the Level Stakes (magenta) and Kelly (green) strategies as a function of half life. The dashed horizontal lines show the overall profit when the predicted number of goals is calculated using the climatological probability of scoring.

Figure 7: Top panel: Mean ignorance (blue line, left axis) and Brier score (red line, left axis) for the over/under 2.5 goal forecasts as a function of the half life parameter when the odds-implied probability is not included as a predictor variable. Both scores are given relative to that of the case in which the predicted number of goals is calculated using the climatological probability of scoring. Lower panel: Overall profit from the Level Stakes (magenta) and Kelly (green) strategies as a function of the half life parameter. The dashed horizontal lines show the overall profit when the predicted number of goals is calculated using the climatological probability of shot success.

Figure 8: Top panel: Mean ignorance (blue line, left axis) and Brier score (red line, left axis) for the over/under 2.5 goal forecasts as a function of the half life parameter when the odds-implied probability is included as a predictor variable. Both scores are given relative to that of the case in which the predicted number of goals is calculated using the climatological probability of scoring. Lower panel: Overall profit from the Level Stakes (magenta) and Kelly (green) strategies as a function of the half life parameter. The dashed horizontal lines show the overall profit when the predicted number of goals is calculated using the climatological probability of shot success.

It is worth noting the philosophical difference between forecasts formed with and without the odds-implied probability included as an additional predictor variable. In the latter case, we are building forecasts effectively from scratch and therefore it should be relatively straightforward to find information that adds to the skill. We know that we are able to build skillful forecasts using predicted match statistics and, logically, if we can incorporate skillful forecasts of the rate of shot success, we should be able to improve the forecasts and this has proven to be the case. In the former case, we have a very different situation. Betting odds are generally considered to be highly informative reflections of the underlying probability of an outcome (though there are a number of known biases), taking into account a wide range of factors. Finding information that can ‘augment’ this information is therefore a much more difficult task. Further, there is likely to be a complicated relationship between our forecasts of the rate of shot success and the extent to which this information is reflected in the odds.
The fact that we are able to improve the over/under 2.5 goal forecasts but not the match outcome forecasts is testament to this complex relationship.

It is less clear whether the general improvement in skill achieved from the forecasts of the rate of shot success leads to increased gambling profit. This probably reflects the complex relationship between forecast probabilities and gambling returns. For both the Level Stakes and Kelly strategies, gambling success is dependent on finding bets that offer a positive expected return. Success at doing this, however, will not necessarily increase with forecast skill. Consider the Level Stakes case. Here, a bet is taken if the forecast probability is higher than the odds-implied probability and the forecast therefore implies that there is value. The success of the strategy is dependent on the forecasts successfully identifying bets in which there is genuine value. If improvements in skill are largely seen in forecasts in which the decision as to whether to bet or not is unchanged and reductions in skill are in ‘borderline’ cases, it is easy to see how the profit may fall with increased average forecast skill. This is not a criticism of the approach of using scoring rules to evaluate forecasts but rather a demonstration of the difference between forecast skill and the utility of using the forecasts for a particular decision process.
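The betting decision rules discussed above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code behind our results: the function names, forecast probabilities, odds and match results are all hypothetical, and the Kelly fraction is taken in its standard form for decimal odds, $(o\hat{p} - 1)/(o - 1)$, floored at zero. The stakes are rescaled so that the average stake over the bets placed is 1, as described in appendix C.

```python
def implied_probability(decimal_odds):
    """Odds-implied probability: the multiplicative inverse of the decimal odds."""
    return 1.0 / decimal_odds


def level_stakes_profit(forecast_prob, decimal_odds, won):
    """Profit from a unit bet that is placed only when the forecast implies value."""
    if forecast_prob <= implied_probability(decimal_odds):
        return 0.0  # the forecast implies no value, so no bet is placed
    return decimal_odds - 1.0 if won else -1.0


def kelly_fraction(forecast_prob, decimal_odds):
    """Standard Kelly fraction for decimal odds > 1, floored at zero (assumed form)."""
    return max((decimal_odds * forecast_prob - 1.0) / (decimal_odds - 1.0), 0.0)


def normalised_kelly_stakes(forecast_probs, odds):
    """Rescale Kelly fractions so the mean stake over the bets placed is 1."""
    fractions = [kelly_fraction(p, o) for p, o in zip(forecast_probs, odds)]
    placed = [f for f in fractions if f > 0.0]
    if not placed:
        return [0.0] * len(fractions)
    k = len(placed) / sum(placed)  # normalising constant
    return [k * f for f in fractions]


# Hypothetical forecasts, decimal odds and results for three potential bets.
forecast_probs = [0.55, 0.60, 0.40]
odds = [2.0, 2.0, 2.0]
results = [True, False, True]

level_profit = sum(
    level_stakes_profit(p, o, w) for p, o, w in zip(forecast_probs, odds, results)
)
kelly_stakes = normalised_kelly_stakes(forecast_probs, odds)
kelly_profit = sum(
    s * ((o - 1.0) if w else -1.0) for s, o, w in zip(kelly_stakes, odds, results)
)
print(level_profit, kelly_stakes, kelly_profit)
```

In this toy example, the two strategies bet on the same two outcomes but the Kelly strategy stakes more on the bet with the larger perceived edge, so the two profits can differ even though the underlying forecasts are identical.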
In this paper, we have presented a model for predicting the probability of a football team scoring from a shot at goal. Whilst the model suffers from overfitting, we are able to calibrate the forecasts to produce good forecast skill. We have also demonstrated that the model of shot success can be used alongside predictions of the number of shots achieved by each team to provide improved skill for both match outcome and over/under 2.5 goal forecasts.

Whilst our shot success model has been shown to be able to produce improved forecast skill, there is also an economic interpretation of the results. The experiments we have conducted were partly inspired by the results of Wheatcroft (2020) and Wheatcroft (2019), which showed that predicted match statistics, formed using GAP ratings, can provide forecast skill beyond that reflected in the odds. We have built on this and shown that, for over/under 2.5 goal forecasts, we can provide further improvement using our forecasts of the rate of shot success. As described in the aforementioned papers, the fact that predicted match statistics can improve a set of forecasts has implications for the efficiency of the betting markets, implying that the market does not efficiently account for this information. The results in this paper build on that and suggest that the over/under 2.5 goal market does not adequately account for the probability of scoring from a shot. We do not have evidence that this is the case for the match outcome market, however.

In our opinion, there is potential value in the model beyond those applications demonstrated here. It is, of course, desirable for a team to score with a relatively high proportion of shots, since doing so would result in more goals and better match results. Similarly, it is desirable to concede from a relatively small proportion of shots.
A manager looking to improve their team’s results may be interested both in the quality of their players’ shot conversion and the ability of their defence to prevent the opposition from converting their shots. However, simply looking at observed rates of shot conversion in recent matches would likely not give a robust estimate of their skill in converting shots to goals. Our shot success forecasts are a potentially useful alternative to looking at observed numbers because they provide a more robust measure of the skill of each team, since the half life and blending parameters have been chosen with respect to objective forecast skill. This objectivity allows some of the inevitable biases of the manager to be removed when assessing the performance of their team.

Another interesting question regards the value of combining the model presented here with expected goals methodologies. The idea behind expected goals is that the location and nature of each shot is used to provide an estimate of the probability of a shot ending with a goal. The sum of the probabilities assigned to the shots in a match can then be interpreted as a measure of the number of goals a team would be ‘expected’ to score, given the shots it has taken. Importantly, expected goals typically do not take into account the relative abilities of the teams or players. Conversely, it is important to note that, under our model, the nature of a shot is not taken into account. This is potentially important because the location from which shots are taken has a big impact on the probability of scoring and some teams may be more likely to take shots from locations in which it is difficult to score, reducing their shot conversion rate.
In order to determine whether a low rate of shot conversion is due to the nature of the shots or poor shooting ability, one could compare the probability of scoring from each shot under the expected goals methodology with the forecast probability of scoring under our model (which, unlike expected goals, takes into account the ability of the teams). If the latter is typically higher than the former, one might conclude that a team’s shooting ability is high.

In conclusion, it is becoming increasingly clear that forecasts based on the number of shots at goal have great value in predicting the outcomes of football matches. An obvious weakness of this approach is that the ability of the two teams involved is not taken into account. This paper provides a potential solution to that weakness.
A GAP Rating System
The Generalised Attacking Performance (GAP) rating system was introduced by Wheatcroft (2020) and is designed to assess the attacking and defensive strength of sports teams in relation to some defined measure of attacking performance. In this paper, we are interested in the number of shots taken by each team in football. For the chosen measure of attacking performance (in our case, shots), each team is given a separate attacking and defensive rating for its home and away matches such that it has four ratings in total. An attacking GAP rating is interpreted as an estimate of the number of defined attacking plays the team can be expected to achieve against an average team in the league. Its defensive rating can be interpreted as an estimate of the number of attacking plays it can be expected to concede against an average team. A team’s ratings are updated each time it plays a match. The GAP ratings of the $i$-th team for its $k$-th match in a league are denoted as follows:

• $H^a_{i,k}$ - Home attacking GAP rating of the $i$-th team in a league after $k$ matches.
• $H^d_{i,k}$ - Home defensive GAP rating of the $i$-th team in a league after $k$ matches.
• $A^a_{i,k}$ - Away attacking GAP rating of the $i$-th team in a league after $k$ matches.
• $A^d_{i,k}$ - Away defensive GAP rating of the $i$-th team in a league after $k$ matches.

For a match involving the $i$-th team at home to the $j$-th team, the predicted numbers of shots achieved by the home and away teams respectively are given by

$$\hat{S}_h = \frac{H^a_i + A^d_j}{2}, \qquad \hat{S}_a = \frac{A^a_j + H^d_i}{2}.$$

Consider a match in which the $i$-th team in the league is at home to the $j$-th team and in which the $i$-th team has played $k_1$ previous matches and the $j$-th team $k_2$. Let $S_{i,k_1}$ and $S_{j,k_2}$ be the number of defined attacking plays by teams $i$ and $j$ in the match. The GAP ratings for the $i$-th team (the home team) are updated as follows:

$$
\begin{aligned}
H^a_{i,k_1+1} &= \max\left(H^a_{i,k_1} + \lambda\phi_1\left(S_{i,k_1} - (H^a_{i,k_1} + A^d_{j,k_2})/2\right),\ 0\right),\\
A^a_{i,k_1+1} &= \max\left(A^a_{i,k_1} + \lambda(1-\phi_1)\left(S_{i,k_1} - (H^a_{i,k_1} + A^d_{j,k_2})/2\right),\ 0\right),\\
H^d_{i,k_1+1} &= \max\left(H^d_{i,k_1} + \lambda\phi_1\left(S_{j,k_2} - (A^a_{j,k_2} + H^d_{i,k_1})/2\right),\ 0\right),\\
A^d_{i,k_1+1} &= \max\left(A^d_{i,k_1} + \lambda(1-\phi_1)\left(S_{j,k_2} - (A^a_{j,k_2} + H^d_{i,k_1})/2\right),\ 0\right).
\end{aligned}
$$

The GAP ratings for the $j$-th team (the away team) are updated according to:

$$
\begin{aligned}
A^a_{j,k_2+1} &= \max\left(A^a_{j,k_2} + \lambda\phi_2\left(S_{j,k_2} - (A^a_{j,k_2} + H^d_{i,k_1})/2\right),\ 0\right),\\
H^a_{j,k_2+1} &= \max\left(H^a_{j,k_2} + \lambda(1-\phi_2)\left(S_{j,k_2} - (A^a_{j,k_2} + H^d_{i,k_1})/2\right),\ 0\right),\\
A^d_{j,k_2+1} &= \max\left(A^d_{j,k_2} + \lambda\phi_2\left(S_{i,k_1} - (H^a_{i,k_1} + A^d_{j,k_2})/2\right),\ 0\right),\\
H^d_{j,k_2+1} &= \max\left(H^d_{j,k_2} + \lambda(1-\phi_2)\left(S_{i,k_1} - (H^a_{i,k_1} + A^d_{j,k_2})/2\right),\ 0\right),
\end{aligned}
$$

where $\lambda > 0$, $0 < \phi_1 < 1$ and $0 < \phi_2 < 1$. The role of $\lambda$ is to determine the overall influence of a match on the ratings of each team. The parameters $\phi_1$ and $\phi_2$ determine the impact of a home match on a team’s away ratings and of an away match on a team’s home ratings respectively. After a given match, a home team is said to have outperformed expectations in an attacking sense if its attacking performance is higher than its predicted performance. In this case, its attacking ratings are increased, whilst its ratings are decreased if it underperforms expectations.

GAP ratings are determined by three parameters which, as in Wheatcroft (2020), are optimised using least-squares minimisation, with the aim of minimising the mean absolute error between the estimated and observed number of attacking plays. The function to be minimised is therefore

$$f = \frac{1}{N}\sum_{m=1}^{N}\left(|S_{h,m} - \hat{S}_{h,m}| + |S_{a,m} - \hat{S}_{a,m}|\right) \tag{18}$$

where, for the $m$-th match, $S_{h,m}$ and $S_{a,m}$ are the observed numbers of attacking plays for the home and away team respectively and $\hat{S}_{h,m}$ and $\hat{S}_{a,m}$ are the corresponding predicted numbers.

B Scoring rules
In this paper, we construct probabilistic forecasts for (i) the probability of scoring from a shot, (ii) the outcomes of football matches and (iii) whether the total number of goals in a match will exceed 2.5. We evaluate the forecasts using scoring rules. A scoring rule is a function of a probabilistic forecast and corresponding outcome aimed at evaluating forecast performance. We make use of three different scoring rules and these are defined below.

Let an event have $r$ possible outcomes and let $p_j$ be the forecast probability at position $j$, where the ordering of the positions is preserved and $\sum_{i=1}^{r} p_i = 1$. Let $y \in \{1, ..., r\}$ be the outcome and define $o_1, ..., o_r$ such that

$$o_j = \begin{cases} 1 & \text{if } j = y, \\ 0 & \text{otherwise.} \end{cases}$$

The Brier score (Brier (1950)) is defined as

$$\mathrm{Brier} = \sum_{i=1}^{r} (p_i - o_i)^2. \tag{20}$$

The Ranked Probability Score (RPS) is defined (Epstein (1969)) as

$$\mathrm{RPS} = \sum_{i=1}^{r-1} \left(\sum_{j=1}^{i} (p_j - o_j)\right)^2. \tag{21}$$

The ignorance score (Good (1952), Roulston & Smith (2002)) is defined as

$$\mathrm{IGN} = -\log_2(p_y). \tag{22}$$

There is much debate surrounding choices of scoring rules and this usually centres on whether they have certain desirable properties. It is widely agreed that scores should be proper which means that, in expectation, no imperfect forecast will outperform a forecast coinciding with the ‘true’ probability distribution (Bröcker & Smith (2007b)). All three of the above scores are proper and therefore forecasters are incentivised to give a forecast reflecting their true belief. Note that the RPS is only suitable for ‘ordered’ outcomes. We make use of the Ignorance and Brier scores to evaluate the binary shot forecasts and forecasts of whether the total number of goals in a match will exceed 2.5. We use the Ignorance and Ranked Probability Scores to evaluate the match outcome forecasts. For a discussion on the relative merits of the three scoring rules in terms of evaluating forecasts of football matches, see Wheatcroft (2019).

C Betting Strategies
In section 4, we assess the performance of match forecasts in terms of betting performance. To do this, we make use of two betting strategies: a simple level stakes value betting strategy and a strategy based on the Kelly Criterion. Both strategies are described below and follow the terminology described in Wheatcroft (2020).

In this paper, we use decimal betting odds in which the odds offered on an event are simply the number by which the gambler’s stake is multiplied in the event of success. Therefore, if the decimal odds are 2, a £10 bet on said event would result in a return of 2 × £10 = £20. Let $o_i$ be the odds offered on the $i$-th potential outcome. The odds-implied probability $r_i = 1/o_i$ is simply the multiplicative inverse of the odds.

The Level Stakes strategy is a simple value betting strategy in which a unit bet is placed on the $i$-th possible outcome of an event if $\hat{p}_i > r_i$, where $\hat{p}_i$ and $r_i$ are the predicted and odds-implied probabilities, respectively. The rationale here is that, if the forecast implies that the true probability is higher than the odds-implied probability, the bet offers ‘value’, that is, a positive expected profit.

The Kelly strategy is based on the Kelly Criterion (Kelly Jr (1956)) and, like the Level Stakes strategy, is based on the concept of ‘value’. However, under this strategy, the stake is dependent on the difference between the forecast probability and the odds-implied probability. When there is a large discrepancy, a higher stake is made. Under the Kelly Criterion, the amount staked is proportional to one’s wealth. For a particular outcome, the proportion of wealth staked is

$$f_i = \max\left(\frac{o_i \hat{p}_i - 1}{o_i - 1},\ 0\right) \tag{23}$$

where $\hat{p}_i$ is the estimated probability of the outcome and $o_i$ represents the decimal odds on offer. Here, we do not bet proportionally to wealth but, rather, ensure that the average stake is 1 such that both betting strategies are directly comparable. We therefore set the stake for the $i$-th bet to $s_i = k f_i$, where $k$ is a normalising constant set such that $\frac{1}{m}\sum_{i=1}^{m} k f_i = 1$, $f_i$ is calculated from equation (23) and $m$ is the total number of bets placed.

References

Baker, R. D. & McHale, I. G. (2015), ‘Time varying ratings in association football: the all-time greatest team is..’,
Journal of the Royal Statistical Society: Series A (Statistics in Society) (2), 481–492.
Brier, G. W. (1950), ‘Verification of forecasts expressed in terms of probability’, Monthly weather review (1), 1–3.
Bröcker, J. & Smith, L. A. (2007a), ‘Increasing the reliability of reliability diagrams’, Weather and forecasting (3), 651–661.
Bröcker, J. & Smith, L. A. (2007b), ‘Scoring probabilistic forecasts: The importance of being proper’, Weather and Forecasting (2), 382–388.
Bröcker, J. & Smith, L. A. (2008), ‘From ensemble forecasts to predictive distribution functions’, Tellus A: Dynamic Meteorology and Oceanography (4), 663–678.
Carbone, J., Corke, T. & Moisiadis, F. (2016), ‘The Rugby League Prediction Model: Using an Elo-based approach to predict the outcome of National Rugby League (nrl) matches’, International Educational Scientific Research Journal (5), 26–30.
Constantinou, A. C. & Fenton, N. E. (2013), ‘Determining the level of ability of football teams by dynamic ratings based on the relative discrepancies in scores between adversaries’, Journal of Quantitative Analysis in Sports (1), 37–50.
Dixon, M. J. & Coles, S. G. (1997), ‘Modelling association football scores and inefficiencies in the football betting market’, Journal of the Royal Statistical Society: Series C (Applied Statistics) (2), 265–280.
Dixon, M. J. & Pope, P. F. (2004), ‘The value of statistical forecasts in the UK association football betting market’, International journal of forecasting (4), 697–711.
Eggels, H. (2016), Expected goals in soccer: Explaining match results using predictive analytics, in ‘The Machine Learning and Data Mining for Sports Analytics workshop’, p. 16.
Elo, A. E. (1978), The rating of chessplayers, past and present, Arco Pub.
Epstein, E. S. (1969), ‘A scoring system for probability forecasts of ranked categories’, Journal of Applied Meteorology (6), 985–987.
Fifa (2018), ‘Revision of the FIFA / Coca-Cola World Ranking’, https://resources.fifa.com/image/upload/fifa-world-ranking-technical-explanation-revision.pdf?cloudid=edbm045h0udbwkqew35a. Accessed: 27/04/2019.
FiveThirtyEight (2020a), ‘NBA Elo ratings’, https://fivethirtyeight.com/tag/nba-elo-ratings/. Accessed: 16/01/2020.
FiveThirtyEight (2020b), ‘The complete history of the NFL’, https://projects.fivethirtyeight.com/complete-history-of-the-nfl/. Accessed: 16/01/2020.
Gelade, G. (2014), ‘Evaluating the ability of goalkeepers in English Premier League football’, Journal of quantitative analysis in sports (2), 279–286.
Good, I. J. (1952), ‘Rational decisions’, Journal of the Royal Statistical Society. Series B (Methodological) (1), 107–114.
Hvattum, L. M. & Arntzen, H. (2010), ‘Using ELO ratings for match result prediction in association football’, International Journal of forecasting (3), 460–470.
Karlis, D. & Ntzoufras, I. (2003), ‘Analysis of sports data by using Bivariate Poisson models’, Journal of the Royal Statistical Society: Series D (The Statistician) (3), 381–393.
Katz, R. W. & Murphy, A. H. (2005), Economic value of weather and climate forecasts, Cambridge University Press.
Kelly Jr, J. (1956), ‘A new interpretation of the information rate’, Bell System Technical Journal, 917–926.
Koopman, S. J. & Lit, R. (2015), ‘A dynamic bivariate Poisson model for analysing and forecasting match results in the English Premier League’, Journal of the Royal Statistical Society. Series A (Statistics in Society) pp. 167–186.
Lee, A. J. (1997), ‘Modeling scores in the Premier League: is Manchester United really the best?’, Chance (1), 15–19.
Ley, C., Wiele, T. V. d. & Eetvelde, H. V. (2019), ‘Ranking soccer teams on the basis of their current strength: A comparison of maximum likelihood approaches’, Statistical Modelling (1), 55–73.
Maher, M. J. (1982), ‘Modelling association football scores’, Statistica Neerlandica (3), 109–118.
Platt, J. et al. (1999), ‘Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods’, Advances in large margin classifiers (3), 61–74.
Pollard, R. (2008), ‘Home advantage in football: A current review of an unsolved puzzle’, The open sports sciences journal (1).
Rathke, A. (2017), ‘An examination of expected goals and shot efficiency in soccer’.
Roulston, M. S. & Smith, L. A. (2002), ‘Evaluating probabilistic forecasts using information theory’, Monthly Weather Review (6), 1653–1660.
Rue, H. & Salvesen, O. (2000), ‘Prediction and retrospective analysis of soccer matches in a league’,