A mathematical take on the competitive balance of a football league

Abstract

Competitive balance in a football league is extremely important from the perspective of economic growth of the industry. Many researchers have earlier proposed different measures of competitive balance, which are primarily adapted from the standard economic theory. However, these measures fail to capture the finer nuances of the game. In this work, we discuss a new framework which is more suitable for a football league. First, we present a mathematical proof of an ideal situation where a football league becomes perfectly balanced. Next, a goal based index for competitive balance is developed. We present relevant theoretical results and show how the proposed index can be used to formally test for the presence of imbalance. The methods are implemented on the data from top five European leagues, and it shows that the new approach can better explain the changes in the seasonal competitive balance of the leagues. Further, using appropriate panel data models, we show that the proposed index is more suitable to analyze the variability in total revenues of the football leagues.

arXiv preprint [stat.AP]

Soudeep Deb*
Indian Institute of Management, Bangalore
Bannerghatta Main Rd, Bilekahalli, Bengaluru, Karnataka 560076, India.
February 19, 2021

Keywords: Concentration ratio, Herfindahl index, Panel data, Skellam distribution, Soccer.

* Corresponding author. Email: [email protected]. ORCiD: 0000-0003-0567-7339

Introduction

Football (or association football, or soccer) is arguably the most popular sport in the world. Especially in Europe, it is not only popular but also a big industry that is growing every year. In the 2018/19 season, the total revenue in Europe's top five leagues was approximately 17 billion Euros, which is more than double what it was a decade ago. Buraimo and Simmons [2009], Dima [2015] and Rohde and Breuer [2017] are some interesting reads on the football market in Europe. Now, with respect to the growth of the market, a common hypothesis is that the competitive balance in a sports league, which primarily relies on the strength and talent of the teams, is a crucial factor affecting public interest and thereby the financial health of the industry of that sport. Rottenberg [1956] is a seminal work in this context. This study brought forth the Uncertainty of Outcome Hypothesis (UOH) in the sports economics literature. One can argue that if there is a high degree of uncertainty about the outcome of a match, it attracts more viewers and thereby more revenue for the teams as well as the league. There have been a few interesting studies over the last couple of decades where the authors have analyzed the effect of competitive balance on attendance, broadcast rights or revenue sharing of the football industry in Europe. Késenne [2000], Szymanski [2001], Peeters [2011] and Plumley et al. [2018] are some notable examples.

Naturally, it is of primary importance to have a good measure of competitive balance. Szymanski [2003] distinguished among three different ways of measuring competitive balance: match-level, championship-level and season-level. A season-level measure is the most appropriate in capturing the competitiveness of a league as a whole. One of the earliest and standard techniques in this regard is to use the ratio of the standard deviation of actual win percentages in the league to the same in an ideal league where each match can end in equally likely outcomes.
While this measure and its variants have been used extensively in many studies (see Quirk and Fort [1997] and Humphreys [2002] for example), Michie and Oughton [2004] argued in detail why it is not adequate in the context of football. One of the main limitations of this measure is that it was primarily developed for major sports in the United States of America (such as baseball, basketball etc.) where draws are almost non-existent, but that is not the case for football. For instance, in the 2018-19 season, approximately 23.9%, 18.7%, 29%, 29% and 28.4% of matches were drawn in Bundesliga, English Premier League (EPL), La Liga, Ligue 1 and Serie A, respectively. Another significant drawback of the above measure is that it fails to capture the presence of dominance in the league. In order to circumvent these issues, two other measures have been developed from the standard economic theory. The first one is inspired by the concept of concentration ratio (see Hall and Tideman [1967]), which quantifies the fraction of industry size held by the top $L$ firms. In the case of a football league, one can consider the top $L$ clubs and compute the total proportion of their point share to define the concentration ratio. Note that the first four to six teams (depending on the league) of every league qualify for the European competitions. In view of that, we use $L = 6$ and define the six club concentration index or C6 index as follows.

Definition 1 (C6 index). For a league with $n$ teams, let $S_i$ be the total points gathered by the $i$-th team at the end of the season. The point share of the $i$-th team is defined by $P_i = S_i / \sum_{i=1}^{n} S_i$. Let $P_{(j)}$ denote the $j$-th largest value in the set $\{P_1, P_2, \ldots, P_n\}$. Then, the C6 index is defined as

$$\mathrm{C6} = \frac{n}{6} \sum_{j=1}^{6} P_{(j)}. \tag{1.1}$$

The C6 index quantifies the imbalance between the top six clubs in a league and the rest, but it fails to capture the change in imbalance within these two groups. The second measure we shall discuss is more suitable in that aspect.
It is derived from the Herfindahl-Hirschman index (HHI) and it captures the inequalities between all the clubs in a league.

Definition 2 (Herfindahl index of competitive balance). Consider a league with $n$ teams, and let $P_i$ be the point share of the $i$-th team as defined in Definition 1. Then, the Herfindahl index of competitive balance (HICB) is defined as

$$\mathrm{HICB} = n \sum_{i=1}^{n} P_i^2. \tag{1.2}$$

In essence, HICB is a weighted average index for the league where the weights are proportional to each club's point share. Evidently, it reflects the extent of competitive balance among all the teams. As the competitive balance declines, there is a greater inequality among the teams and hence the value of either of the above two measures increases. The reader is referred to Zimbalist [2002], Brandes and Franck [2007] and the references therein for more relevant discussions.

The above two measures have been widely used and can be considered to be benchmarks for measuring concentration of firms within an industry. While the adaptation to football leagues is natural and understandable, there is an interesting limitation that most authors have failed to address. In football leagues, it is not rare to have more than one team finishing with the same number of points, in which case the standing of those teams is usually determined by the overall goal differences, followed by other criteria such as goals scored, head-to-head record etc. Naturally, ultimate competitive balance should not only mean equally probable outcomes of a match, but it should also mean identically distributed goal differences (and goals scored) for the teams. Motivated by this, we propose a new measure in this paper which uses the actual scorelines of the games instead of just the outcomes. We also derive attractive theoretical properties of the proposed measure for a perfectly balanced league in the sense of the definition provided below.

Definition 3 (Perfectly balanced league). A league is 'perfectly balanced' if any permutation of the teams is equally likely to be the final standing.

The paper is structured as follows. Section 2 lays out the theoretical background, along with relevant results and methods to analyze the data. A short simulation study is presented in Section 3. Next, we analyze a decade-long data set from the top five European leagues using the proposed method, and provide a comparative study of the three measures in terms of explaining the changes in overall revenues. These results are provided in Section 4. Following that, we summarize the advantages of the proposed approach in Section 5. Some important concluding remarks are also presented in that section.
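The two point-share measures in Definitions 1 and 2 can be illustrated with a minimal Python sketch. It assumes the normalised form of the C6 index (top-six point share scaled by $n/6$) and uses hypothetical point totals, not real league data; the function names are illustrative only.

```python
def point_shares(points):
    """Return P_i = S_i / sum(S) for a list of season point totals."""
    total = sum(points)
    return [s / total for s in points]


def c6_index(points, top=6):
    """Assumed normalised form: (n / top) * sum of the `top` largest point shares."""
    shares = sorted(point_shares(points), reverse=True)
    return (len(points) / top) * sum(shares[:top])


def hicb(points):
    """Herfindahl index of competitive balance: n * sum of squared point shares."""
    shares = point_shares(points)
    return len(points) * sum(p * p for p in shares)


# Hypothetical 20-team tables (illustrative numbers only).
equal_table = [52] * 20                                # every club on the same points
top_heavy_table = [95, 90, 85, 80, 75, 70] + [35] * 14 # six dominant clubs

for name, table in (("equal", equal_table), ("top-heavy", top_heavy_table)):
    print(name, round(c6_index(table), 3), round(hicb(table), 3))
```

Under these assumptions both indices equal 1 when every club has the same point share and rise above 1 as points concentrate among a few clubs, which matches the statement that either measure increases as competitive balance declines.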

In the discussions below, $n$ denotes the number of teams in the league. All of them play against each other in a home and away system, implying that the total number of matches is $N = n(n-1)$. $Z_{ij0}$ (correspondingly, $Z_{ij1}$) is the outcome of the match between the $i$-th and the $j$-th team, played in the home ground of the $i$-th team (correspondingly, the $j$-th team). For instance, $Z_{ij0} = 1, 0, -1$ denote a win, a draw and a loss, respectively, for the $i$-th team against the $j$-th team. Further, let $S_{ij0}$ (and $S_{ij1}$) denote the points gathered by the $i$-th team against the $j$-th team at home (and away); and let $X_{ij0}$ and $X_{ij1}$ be the corresponding number of goals scored. We shall use $S_i$, $GS_i$, $GC_i$ and $GD_i$ to denote the overall points, total goals scored, total goals conceded and the overall goal difference of the $i$-th team. Thus,

$$S_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{k=0}^{1} S_{ijk}, \quad GS_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{k=0}^{1} X_{ijk}, \quad GC_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{k=0}^{1} X_{jik}, \quad GD_i = GS_i - GC_i. \tag{2.1}$$

Note that, for $i = 1, 2, \ldots, n$, the final standing of the league is determined first by $S_i$, then by $GD_i$ and then by $GS_i$. This is the standard norm for most domestic leagues and we shall consider this rule throughout the study.

Assumption 1. $X_{ijk}$, for all $i, j \in \{1, 2, \ldots, n\}$, $k \in \{0, 1\}$, are independent and $X_{ijk}$ follows a Poisson distribution with mean $\lambda_{ijk}$.

Our main theory revolves around the above assumption. Note that different values of $\lambda_{ijk}$ can be attributed to various factors that affect the number of goals scored by a team in a match. See Clarke and Norman [1995] and Everson and Goldsmith-Pinkham [2008] for related discussions. In what follows, we first present an ideal setup in which a league becomes perfectly balanced and then we develop an index of competitive balance based on how much a league deviates from that ideal setup.

Theorem 1. If Assumption 1 is true with $\lambda_{ijk} = \lambda > 0$ for all $i, j \in \{1, 2, \ldots, n\}$, $k \in \{0, 1\}$, then there exists a value $\lambda = \lambda_0$ such that every match is equally likely to end in a win, draw or loss for any team.

Detailed proof of the above theorem is relegated to the Appendix for brevity. It leverages the concept of the modified Bessel function of the first kind. From the point of view of this paper, Theorem 1 provides an attractive ideal framework where a football league ensures ultimate competitive balance in the sense that every outcome of every match is equally likely. An immediate consequence is the following.

Corollary 1. If $X_{ijk}$, for all $i, j \in \{1, 2, \ldots, n\}$, $k \in \{0, 1\}$, are iid Poisson($\lambda_0$), then a football league is perfectly balanced.

The proof is trivial in view of the fact that every match in the league is iid multinomial with the probability of every outcome being $1/3$. [...] $\lambda_0(n - $ [...]

Remark 1.

The value of $\lambda_0$ is computed to be approximately 0.88. Figure 1 further shows how the probability of draw changes with different values of $\lambda$. Note that, under the assumptions of Theorem 1, if the probability of draw is $d_\lambda$, then any team can win or lose with equal probability $(1 - d_\lambda)/2$. From the figure, it can be observed that as $\lambda \to 0$, a draw is expected whereas for large $\lambda$, the chance of having a definitive result increases.

Figure 1: Probability of draw for different values of $\lambda$. Ideal situation of $\lambda_0 \approx 0.88$ is displayed with a dotted line.

Interestingly, $\lambda_0$ is not the only value that makes a league perfectly balanced. It ensures that every outcome of a match is equally likely, a sufficient condition for a perfectly balanced league. However, this is not a necessary condition. In fact, the following is true.

Theorem 2. If every win, draw and loss are awarded 3, 1 and 0 points respectively, and if Assumption 1 is true with $\lambda_{ijk} = \lambda > 0$ for all $i, j \in \{1, 2, \ldots, n\}$, $k \in \{0, 1\}$, then a football league is perfectly balanced.

The above-mentioned point structure has been the practice in football leagues over the last few decades. A detailed proof is presented in the Appendix. We use properties of the Skellam distribution and some combinatorial arguments in the proof. We also point out that even if the point structure is different, a similar idea can be used to get similar results.

Note that under the assumption that all $X_{ijk}$ are iid Poisson($\lambda$), the maximum likelihood estimate (MLE) of $\lambda$ is given by

$$\hat{\lambda} = \frac{1}{2N} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{k=0}^{1} X_{ijk}, \tag{2.2}$$

which is essentially the average number of goals scored in the league. Based on $\hat{\lambda}$, one can also estimate the theoretical probability of draws under Assumption 1. Hereafter, it is denoted by $d_{\hat{\lambda}}$. In Section 4, the values of $\hat{\lambda}$ and $d_{\hat{\lambda}}$ for every league and every season in this study are presented.

We next move on to developing a new measure of competitive balance. Recall that the mean equals the variance for a Poisson distribution. Under the general structure of Assumption 1, the $X_{ijk}$'s are not identically distributed and are dependent on individual team's skill, opponent, form or the ground. Therefore, the observed variance of the number of goals should be considerably more than the mean. In that light, if we assume that the $X_{ijk}$'s are independently distributed Poisson random variables, it is the deviations of the $X_{ijk}$'s from $\hat{\lambda}$ which would speak about how much a league deviates from the hypothetical situation of being perfectly balanced. That motivates us to develop the following.

Definition 4 (Goal-based index). Let $X_{ij0}$ (and $X_{ij1}$) denote the number of goals scored by the $i$-th team against the $j$-th team at home (and away), and let $\hat{\lambda}$ be the overall mean number of goals in the entire league. Then, the following goal-based index (GBI) is an appropriate measure of competitive balance of the league.

$$\mathrm{GBI} = \frac{1}{(2N-1)\hat{\lambda}} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{k=0}^{1} (X_{ijk} - \hat{\lambda})^2. \tag{2.3}$$

Theorem 3. Assume that $X_{ijk}$, for all $i, j \in \{1, 2, \ldots, n\}$, $k \in \{0, 1\}$, are independently distributed Poisson random variables. Then, for a league perfectly balanced in the sense of Theorem 2, $(2N-1)\,\mathrm{GBI}$ approximately follows a chi-squared distribution with $2N-1$ degrees of freedom. Consequently, the hypothesis of perfect balance is rejected at level $\alpha$ if $(2N-1)\,\mathrm{GBI}$ is greater than the critical value $\chi^2_{2N-1,\alpha}$, the upper $\alpha$-quantile of the corresponding chi-squared distribution.

From a more applied point of view, it is worth checking whether GBI is more useful than C6 or HICB in explaining the variation of revenue in a league. To that end, we collect the data on season-wide total commercial revenue of the five leagues (source: Deloitte [2020]). Let $R_{lt}$ be the revenue (in billion Euros) of the $l$-th league in the $t$-th season. Let $CB_{lt}$ be a measure of competitive balance for the same (we shall apply the following method for all of the above three measures). Our objective is to understand if $R_{lt}$ is significantly dependent on $CB_{lt}$. The framework here is suitable for a panel data analysis, and hence, we use the following model.

$$R_{lt} = \alpha_l + \beta t + \gamma\, CB_{lt} + \epsilon_{lt}, \quad 1 \le l \le 5, \; 1 \le t \le T, \tag{2.4}$$

where $\epsilon_{lt}$ are iid errors, $\alpha_l$ is the effect of the $l$-th league, $\beta$ captures the effect of a linear trend and $\gamma$ is the main parameter of interest as it describes how the measure of competitive balance affects the revenue. For $\alpha_l$, we consider both fixed-effect and random-effect structures and compare their results. In case of random-effect models, $\alpha_l$ is taken as $\alpha + \alpha'_l$ where $\alpha$ is the overall mean, and $\alpha'_l$ is the zero-mean random effect for the $l$-th league. After fitting both types of models, we perform the Hausman test (Hausman [1978]) to find out which one works better.

All of the computations in this paper are done in RStudio version 1.2.5033, coupled with R version 3.6.2. The panel data analysis is carried out using the 'plm' package by Millo [2017].

Simulation study

We illustrate the theory from Theorem 1 and Theorem 2 by simulating toy examples. For different values of $\lambda$, a large number of leagues are generated under Assumption 1. For each league, we calculate the proportion of draws and plot them against the values of $\lambda$. Next, we look at the final standing of the league table and calculate the proportion of times each possible permutation occurs. Under the assumptions we have, Theorem 2 suggests that the league is perfectly balanced and hence every permutation of the teams is equally likely to happen. That indicates a multinomial distribution with equal cell probabilities. On that note, using the observed frequencies of every permutation from the simulated data, we conduct a multinomial goodness-of-fit test and find the corresponding $p$-values. Note that the number of possible permutations for a league with $n$ teams is $n!$, which is quite large even for small $n$. So, in order to conduct an appropriate goodness-of-fit test, one needs to replicate the experiments millions of times. For computational ease and since the objective is to illustrate the results which are already proved, we limit ourselves to $n = 5$ in this case and take 10000 replications in all experiments.

Figure 2 shows that the proportion of draws decreases steadily as $\lambda$ increases. The iid assumption on the teams and the games also implies that the proportions of home wins and away wins are expected to be equal, and that is reflected in the left panel of the plot. The right panel of the figure shows the $p$-values for the multinomial goodness-of-fit tests based on the 10000 replications. It is evident that in all cases, the tests fail to reject the null hypothesis of every permutation being equally likely.
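The numbers behind Theorem 1 and the outcome proportions described above can be checked with a short Python sketch. It assumes the standard closed form $e^{-2\lambda} I_0(2\lambda)$ for the probability that two independent Poisson($\lambda$) scores are equal, evaluates the Bessel function by its power series, and uses a simple bisection and a basic Poisson sampler. The function names, the seed and the number of simulated matches are illustrative choices, not the author's implementation (which was done in R).

```python
import math
import random


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    return sum((x / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))


def draw_probability(lam):
    """P(X = Y) for independent X, Y ~ Poisson(lam): exp(-2*lam) * I_0(2*lam)."""
    return math.exp(-2 * lam) * bessel_i0(2 * lam)


def solve_lambda0(lo=0.1, hi=3.0, tol=1e-10):
    """Bisection for the lam whose draw probability is 1/3 (it decreases in lam)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if draw_probability(mid) > 1 / 3:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_outcomes(lam, matches=20000, seed=1):
    """Proportions of home wins, draws and away wins under iid Poisson(lam) scores."""
    rng = random.Random(seed)
    counts = {"home win": 0, "draw": 0, "away win": 0}
    for _ in range(matches):
        home, away = poisson(lam, rng), poisson(lam, rng)
        if home == away:
            counts["draw"] += 1
        elif home > away:
            counts["home win"] += 1
        else:
            counts["away win"] += 1
    return {key: value / matches for key, value in counts.items()}


lam0 = solve_lambda0()
print("lambda_0 is roughly", round(lam0, 3))
print(simulate_outcomes(lam0))
```

At the solved rate, the three simulated proportions should all be close to one third, and rerunning `simulate_outcomes` with larger rates should show the draw proportion falling, in line with Figure 1 and the left panel of Figure 2.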
Figure 2: Results from 10000 simulated leagues under Assumption 1 for different $\lambda$: (Left) Proportions of the three types of results (away win, draw, home win); (Right) $p$-value of the multinomial goodness-of-fit tests.

Next, we wish to evaluate the effectiveness of GBI in testing whether the goal-scoring patterns of all the teams in the league are iid. We consider leagues of $n = 20$ teams in this case. Once again, for different values of $\lambda$, we assume that all $X_{ijk}$ are iid Poisson($\lambda$) and all matches are simulated accordingly. Then, the chi-squared test using GBI, as described in the previous section, is conducted. We replicate the experiment 10000 times to compute the empirical type I error of the test. These values are displayed in the left panel of Figure 3. We see that for all values of $\lambda$, the type I error remains at around 5%. We also compute the power of the test by simulating leagues where the $X_{ijk}$ are not all iid. Here, we assume that the $X_{ijk}$ are independent and $X_{ijk} \sim$ Poisson($1 + \lambda_i$), where $\lambda_i$ is a team-specific parameter. 20 different scenarios are considered in this regard. In the $k$-th such scenario, $\lambda_i$ is assumed to be 0 for all but $k$ teams, randomly selected from the 20. Then, the GBI is used to detect deviation from the ideal situation. Similar to before, each experiment is repeated 10000 times and we use them to calculate the power, which is presented in the right panel of Figure 3. It is evident that the testing procedure performs very well, recording more than 99% power even when only one team has a different goal-scoring pattern.

Figure 3: Results on the GBI based test: (Left) Type I error from 10000 simulated leagues under Assumption 1 for different $\lambda$; (Right) Power from 10000 simulated leagues with some number of teams having a different scoring pattern.
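The GBI-based test evaluated above can be sketched in Python as follows. This is a minimal illustration rather than the author's R code: the chi-squared critical value is obtained from the Wilson-Hilferty approximation (an assumption made here to stay within the standard library), the Poisson sampler is a basic one, and the team-specific rate used for the imbalanced league is a hypothetical value, since the text does not state the non-zero $\lambda_i$.

```python
import math
import random
from statistics import NormalDist


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_league(n, rate_of_team, rng):
    """Flat list of the 2N scores X_ijk (team i vs team j, k = 0 home, k = 1 away)."""
    return [poisson(rate_of_team(i), rng)
            for i in range(n) for j in range(n) if i != j for _k in (0, 1)]


def gbi(goals):
    """Eq. (2.3): squared deviations from lambda-hat, scaled by (2N - 1) * lambda-hat."""
    m = len(goals)                # m = 2N scores in the season
    lam_hat = sum(goals) / m      # MLE of lambda, eq. (2.2)
    return sum((x - lam_hat) ** 2 for x in goals) / ((m - 1) * lam_hat)


def chi2_upper_quantile(df, alpha):
    """Wilson-Hilferty approximation to the upper-alpha chi-squared quantile."""
    z = NormalDist().inv_cdf(1 - alpha)
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3


def is_imbalanced(goals, alpha=0.05):
    """Reject perfect balance when (2N - 1) * GBI exceeds the critical value."""
    df = len(goals) - 1
    return df * gbi(goals) > chi2_upper_quantile(df, alpha)


rng = random.Random(2021)
balanced = simulate_league(20, lambda i: 1.0, rng)
# Hypothetical departure: team 0 scores at rate 1 + 3, every other team at rate 1.
skewed = simulate_league(20, lambda i: 4.0 if i == 0 else 1.0, rng)
for name, goals in (("balanced", balanced), ("skewed", skewed)):
    print(name, round(gbi(goals), 3), is_imbalanced(goals))
```

Repeating the balanced simulation many times and averaging `is_imbalanced` would give an empirical type I error, and doing the same for the skewed league would give an empirical power, mirroring the two panels of Figure 3 under the assumptions stated above.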
As mentioned above, we consider ten seasons' data from the top five leagues of Europe: Bundesliga, EPL, La Liga, Ligue 1 and Serie A. These data are collected from the public repository Datahub (link: https://datahub.io/). In view of Theorem 1 and the discussions following that, we begin our analysis by computing $\hat{\lambda}$ (see eq. (2.2)) for every season and every league. Corresponding theoretical probabilities of draw under the assumptions of this study and the observed proportions of drawn matches are also computed. These quantities are denoted as $d_{\hat{\lambda}}$ and $\hat{D}$, respectively. Overall, the hypothesized probabilities are close to the observed proportions of draws. There are a few cases, for example the 2013-14 and 2018-19 seasons of EPL, the 2010-11 season of La Liga and Ligue 1, and the 2014-15 season of Serie A, when the difference between $d_{\hat{\lambda}}$ and $\hat{D}$ is more than or equal to 0.05. These results are shown in Table 1.

Table 1: Values of $\hat{\lambda}$, $d_{\hat{\lambda}}$ and $\hat{D}$ for the five leagues in the seasons 2009-10 to 2018-19. $\hat{\lambda}$ is the average number of goals, $d_{\hat{\lambda}}$ is the corresponding theoretical probability, and $\hat{D}$ is the true proportion of draws.

Next, the GBI values are computed for every league, across the ten seasons, and are compared against the C6 indexes and the HICB values. These quantities are displayed in Figure 4. For the GBI values, we also test whether the league significantly deviates from the hypothetical assumption of perfect balance. In the graph, a black circle indicates that the league was not perfectly balanced in that season whereas a black triangle says the opposite.

There are a couple of interesting aspects that arise out of this plot. Bundesliga, EPL and La Liga, barring a single season for each, have never been perfectly balanced. For Bundesliga, the only season that did not deviate significantly from the assumption of perfect balance was 2017/18. We find that it was a season when three teams (Schalke, Hoffenheim and Borussia Dortmund) finished on 55 points each, and the qualification for the UEFA Champions League had to be determined on goal difference. Meanwhile, in the 2010-11 season of EPL, Chelsea and Manchester City finished second and third with the same number of points, and there were as many as 12 teams within a range of 10 points (39 to 49). Our approach picks up that phenomenon nicely and determines that this season was balanced. On the other hand, La Liga has usually observed a higher GBI value, thereby pointing to less competitiveness within the league.
The only exception was the 2018-19 season, when three pairs of teams (Getafe and Sevilla, Espanyol and Athletic Bilbao, Valladolid and Celta Vigo) finished on equal points while three more teams (Real Sociedad, Real Betis, Alaves) finished on exactly the same points as well.

So far as Ligue 1 is concerned, it has not observed competitive balance after the 2013-14 season. It might be interesting that this happened shortly after the huge investment in Paris Saint-Germain (PSG) by Qatar Sports Investments, the new majority shareholders of the club. Since 2013-14, PSG has topped the league in all but the 2016-17 season whereas in the six Ligue 1 seasons before 2013-14, there were six different winners. Not only PSG, AS Monaco FC also spent a huge amount after Russian billionaire Dmitry Rybolovlev bought two-thirds of the shares and the club secured promotion back to the top division in 2013. The disparity in spending by these few clubs might have caused the earlier competitive balance to go away for Ligue 1. Serie A, on the contrary, has been the most balanced league of the lot. In only four out of the ten seasons, its GBI is significantly higher and there is no discernible pattern in the season-by-season values.

Figure 4: Three measures of competitive balance (C6, GBI, HICB) for the five leagues in ten seasons. GBI values are marked with a circle or a triangle, depending on whether the values significantly deviate from the assumption of perfect balance or not.

We further notice that the other two indexes also behave generally in the same directions. The overall correlation coefficient between GBI and C6 is 0.52 and the same for GBI and HICB is 0.46. C6 and HICB are more correlated with a coefficient of 0.9.
The league-wise correlation coefficients for the three measures are presented in Table 2.

League        C6 and HICB   C6 and GBI   HICB and GBI
Bundesliga    0.90          0.31         0.19
EPL           0.94          0.59         0.59
La Liga       0.93          0.48         0.52
Ligue 1       0.91          0.47         0.61
Serie A       0.95          0.57         0.65

Table 2: Correlation coefficient of the three measures for different leagues.

The final piece of the analysis is to examine the relationship between revenue and the three types of measures. Figure 5 shows the revenues for every season for the five leagues. The growths of the overall revenues are similar for all the leagues, and that justifies the use of the linear trend term in our panel data model. It is clear that EPL has been the biggest league in terms of revenues generated over the years. Ligue 1 sits at the bottom of the list.

Figure 5: Season-wise revenue (in billion Euros) of the top five European leagues.

As mentioned before, panel data models (see eq. (2.4)) are used for the statistical analysis. For each of the three measures, a fixed-effect model (denoted as FE) and a random-effect model (denoted as RE) are implemented on the data. For the RE models, we use the method of Swamy and Arora [1972]. All of the results are displayed in Table 3. There, $\sigma$ indicates the estimate of the idiosyncratic standard deviation of the model, that is the standard deviation of the error process for a random-effect model; and $\sigma_\alpha$ is the estimated standard deviation for the random effect.

It is quite evident that the models using GBI perform better than those using the other two measures. In fact, only in the models with GBI (last two columns in the table) are the coefficients corresponding to the competitive balance variable significant. The linear trend term is always significant and those estimates are quite similar across different models. In case of the FE models, we see that the coefficients are usually not significant when we use C6 or HICB. On the contrary, they are highly significant when we use GBI, which seems to better explain the variability in the revenues.

The $p$-values from the Hausman tests between the FE and RE models are always non-significant, implying that the RE models are providing better fits for the data. Further, note that the standard deviation for the random effect is much higher than the idiosyncratic standard deviation for all RE models. All in all, the RE model with GBI has proved to be the best model here.

Table 3: Results from the panel data models FE(C6), RE(C6), FE(HICB), RE(HICB), FE(GBI) and RE(GBI). Standard errors of the coefficient estimates are given in parentheses. FE denotes a fixed effect model and RE denotes a random effect model. $\sigma$ and $\sigma_\alpha$ are the idiosyncratic standard deviation and the random effect standard deviation for the RE models.

In this study, we have developed a new measure of competitive balance for a football league and have shown that it is more suitable than the existing approaches. Earlier measures of competitive balance, such as the concentration ratio or the Herfindahl index, were primarily developed to analyze the balance between firms in an industry; but a football league is more interesting because of some specific sets of rules to determine the ordering of the clubs. In this work, we have discussed the ideal assumptions under which a league becomes perfectly balanced when we take into account the actual rules of the leagues. To the best of our knowledge, it is the first paper to provide such a mathematical understanding. As the goal-based index we propose relies on the scorelines of the matches instead of only the outcomes, it has a natural advantage over the HICB or the C6 index. Moreover, the solid theoretical framework behind the GBI makes it possible for us to formally test if a league is perfectly balanced or not. The simulation study confirms that the proposed test has great power.

Using the data from the top five European leagues, we also show that GBI can provide some valuable insights about the fluctuations in the competitive balance for a particular league. An interesting observation was the case of Ligue 1, where we see that the league starts to be imbalanced after PSG and Monaco got huge funding. Potentially, one can leverage the GBI to further analyze how investments or player transfers affect the competitive balance of the league. One can also hypothesize that the imbalance in a league results in significant changes in revenue sharing or broadcasting rights.
A formal analysis with a bigger dataset in this regard would make a good follow-up study to the current work.

On the other hand, we have quantified the seasonal imbalance and provided relevant results. A natural extension is to develop match-level and gameweek-level measures of competitive balance, and to explore how they can be used to predict the final standing of the season. It may also be possible to analyze the short-term impact of the gameweek-level competitive balance, for instance on the attendance or television viewership of every match. One can also look at the competitive balance from the championship or relegation perspective and see if that can act as a significant factor in sports economics studies.

Funding source

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declarations of interest

The author declares no conflict of interest.

References

Leif Brandes and Egon Franck. Who made who? An empirical analysis of competitive balance in European soccer leagues. Eastern Economic Journal, 33(3):379–403, 2007.

Babatunde Buraimo and Rob Simmons. Market size and attendance in English Premier League football. International Journal of Sport Management and Marketing, 6(2):200–214, 2009.

Stephen R Clarke and John M Norman. Home ground advantage of individual clubs in English soccer. Journal of the Royal Statistical Society: Series D (The Statistician), 44(4):509–521, 1995.

Deloitte. Commercial revenue of "big five" football leagues in Europe from 2004 to 2019, by league, 2020. URL http://bit.ly/3pqYpmA. Last accessed on 5th January, 2021.

Teodor Dima. The business model of European football club competitions. Procedia Economics and Finance, 23:1245–1252, 2015.

Philip J Everson and Paul Goldsmith-Pinkham. Composite Poisson models for goal scoring. Journal of Quantitative Analysis in Sports, 4(2), 2008.

Marshall Hall and Nicolaus Tideman. Measures of concentration. Journal of the American Statistical Association, 62(317):162–168, 1967.

Jerry A Hausman. Specification tests in econometrics. Econometrica: Journal of the Econometric Society, pages 1251–1271, 1978.

Brad R Humphreys. Alternative measures of competitive balance in sports leagues. Journal of Sports Economics, 3(2):133–148, 2002.

Stefan Késenne. Revenue sharing and competitive balance in professional team sports. Journal of Sports Economics, 1(1):56–65, 2000.

Jonathan Michie and Christine Oughton. Competitive balance in football: Trends and effects. The sportsnexus London, 2004.

Giovanni Millo. Robust standard error estimators for panel models: A unifying approach. Journal of Statistical Software, 82(3):1–27, 2017. doi: 10.18637/jss.v082.i03.

Thomas Peeters. Broadcast rights and competitive balance in European soccer. International Journal of Sport Finance, 6(1):23, 2011.

Daniel Plumley, Girish Ramchandani, and Rob Wilson. Mind the gap: an analysis of competitive balance in the English football league system. International Journal of Sport Management and Marketing, 18(5):357–375, 2018.

James Quirk and Rodney D Fort. Pay dirt: The business of professional team sports. Princeton University Press, 1997.

Marc Rohde and Christoph Breuer. The market for football club investors: a review of theory and empirical evidence from professional European football. European Sport Management Quarterly, 17(3):265–289, 2017.

Simon Rottenberg. The baseball players' labor market. Journal of Political Economy, 64(3):242–258, 1956.

John G Skellam. The frequency distribution of the difference between two Poisson variates belonging to different populations. Journal of the Royal Statistical Society: Series A, 109:296, 1946.

PAVB Swamy and Swarnjit S Arora. The exact finite sample properties of the estimators of coefficients in the error components regression models. Econometrica: Journal of the Econometric Society, pages 261–275, 1972.

Stefan Szymanski. Income inequality, competitive balance and the attractiveness of team sports: Some evidence and a natural experiment from English soccer. The Economic Journal, 111(469):F69–F84, 2001.

Stefan Szymanski. The economic design of sporting contests. Journal of Economic Literature, 41(4):1137–1187, 2003.

Andrew S Zimbalist. Competitive balance in sports leagues: An introduction, 2002.

Appendix

Proof of Theorem 1.

Let us consider a game where the $i$th team is playing at home against the $j$th team. For convenience, let us use $Y_1$ and $Y_2$ to denote the goals scored in this game by the $i$th and the $j$th team, respectively (the two corresponding $X_{ijk}$ variables). First, we consider the probability of a draw, i.e. $P(Y_1 = Y_2 \mid \lambda)$. Using the independence assumption and the Poisson probability mass function,

$$P(Y_1 = Y_2 \mid \lambda) = \sum_{k=0}^{\infty} \frac{e^{-2\lambda} \lambda^{2k}}{(k!)^2}. \tag{5.1}$$

Note that the modified Bessel function of the first kind of order $\alpha$ is defined by

$$I_\alpha(x) = \sum_{k=0}^{\infty} \frac{1}{k!\,\Gamma(k + \alpha + 1)} \left(\frac{x}{2}\right)^{2k + \alpha}. \tag{5.2}$$

Clearly, eq. (5.1) can be rewritten as $P(Y_1 = Y_2 \mid \lambda) = e^{-2\lambda} I_0(2\lambda)$. As $\lambda \to 0$, this probability goes to 1. For large $z$, $I_0(z)$ is approximately equal to $z^{-1/2} e^{z} / \sqrt{2\pi}$ and therefore $P(Y_1 = Y_2 \mid \lambda) \to 0$ as $\lambda \to \infty$. Using the continuity of both terms, it is easy to argue that there exists a $\lambda > 0$ such that $P(Y_1 = Y_2 \mid \lambda) = 1/3$. Finally, using the iid assumption, $P(Y_1 > Y_2 \mid \lambda) = P(Y_1 < Y_2 \mid \lambda)$, and that completes the proof.

Proof of Theorem 2.

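As a numerical aside, the draw probability $e^{-2\lambda} I_0(2\lambda)$ from the proof of Theorem 1, which is the quantity $d_\lambda$ in the argument that follows, can be evaluated directly. The sketch below is not part of the paper: it sums the series in eq. (5.1) term by term and uses bisection to locate the rate at which a win, a draw and a loss are equally likely. The function names, the search interval [0.01, 5], the truncation tolerance and the iteration count are all illustrative assumptions.

```python
import math


def draw_probability(lam, tol=1e-16, max_terms=10_000):
    """P(Y1 = Y2) for two iid Poisson(lam) scores, via the series in eq. (5.1)."""
    if lam == 0.0:
        return 1.0  # both teams score 0 with probability 1
    total = 0.0
    for k in range(max_terms):
        # log of e^{-2 lam} * lam^{2k} / (k!)^2, kept in log space to avoid overflow
        log_term = -2.0 * lam + 2.0 * k * math.log(lam) - 2.0 * math.lgamma(k + 1)
        term = math.exp(log_term)
        total += term
        # terms shrink once k exceeds lam, so stop when they become negligible
        if k > lam and term < tol:
            break
    return total


def balanced_rate(lo=0.01, hi=5.0, iters=200):
    """Bisection for the lam with draw probability 1/3 (assumed bracket [lo, hi])."""
    target = 1.0 / 3.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if draw_probability(mid) > target:
            lo = mid  # draws still too likely: the root lies at a larger rate
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_star = balanced_rate()
d_star = draw_probability(lam_star)   # draw probability d_lambda
w_star = (1.0 - d_star) / 2.0         # win probability w_lambda = (1 - d_lambda) / 2
print(lam_star, d_star, w_star)
```

Bisection is enough here because the draw probability decreases from 1 towards 0 as the scoring rate grows, which is the same monotone behaviour the existence argument relies on. Under these assumptions the root comes out a little under 0.9 goals per team per match.
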
Let $Y_{ij}$ denote the goal difference in a game between the $i$th and the $j$th team, i.e. the goals scored by the $i$th team minus those scored by the $j$th team (a difference of the two corresponding $X_{ijk}$ variables). Clearly, $Y_{ij} > 0$ corresponds to a win for the $i$th team, $Y_{ij} < 0$ to a win for the $j$th team and $Y_{ij} = 0$ corresponds to a draw.

Since all $X_{ijk}$'s are iid Poisson($\lambda$), $Y_{ij}$'s are iid and follow Skellam distributions (Skellam [1946]) with parameters $(\lambda, \lambda)$. Using the modified Bessel function of the first kind (see eq. (5.2)), the probability mass function (pmf) of this distribution can be written as

$$P(Y_{ij} = k) = e^{-2\lambda} I_{|k|}(2\lambda), \tag{5.3}$$

and the corresponding moment generating function (MGF) is

$$M(t) = \exp\left(\lambda \left(e^{t} + e^{-t} - 2\right)\right). \tag{5.4}$$

It is evident that the distribution of $Y_{ij}$ is symmetric. Let $P(Y_{ij} = 0) = e^{-2\lambda} I_0(2\lambda)$ be denoted as $d_\lambda$. Then, the probability of winning for any team in any match is $(1 - d_\lambda)/2 = w_\lambda$, say. Thus, the pmf of $S_{ijk}$ is

$$P(S_{ijk} = 0) = w_\lambda, \quad P(S_{ijk} = 1) = d_\lambda, \quad P(S_{ijk} = 3) = w_\lambda, \tag{5.5}$$

and $P(S_{ijk} = r) = 0$ otherwise.

Without loss of generality, consider any two teams and let $S_1$, $S_2$ be their total points in the entire league. One can write $S_1 = S'_1 + S_{12}$, where $S_{12} = S_{120} + S_{121}$, and $S'_1$ is the total points gathered against all but the 2nd team. Similarly, write $S_2 = S'_2 + S_{21}$. Under the given assumptions, it is easy to note that

$$P(S_{12} = a, S_{21} = b) = \begin{cases} w_\lambda^2 & \text{for } a = 6, b = 0, \text{ or } a = 0, b = 6, \\ 2 w_\lambda d_\lambda & \text{for } a = 4, b = 1, \text{ or } a = 1, b = 4, \\ 2 w_\lambda^2 & \text{for } a = 3, b = 3, \\ d_\lambda^2 & \text{for } a = 2, b = 2. \end{cases} \tag{5.6}$$

Denote the support of $(S_{12}, S_{21})$ by $B \subset A \times A$, with $A = \{0, 1, 2, 3, 4, 6\}$. Then,

$$P(S_1 > S_2) = P(S'_1 - S'_2 > S_{21} - S_{12}) = \sum_{(a,b) \in B} P(S'_1 - S'_2 > b - a)\, P(S_{12} = a, S_{21} = b). \tag{5.7}$$

In view of the iid nature of every game, $S'_1$ and $S'_2$ are iid random variables. Therefore, $P(S'_1 - S'_2 > -r) = P(S'_2 - S'_1 > -r) = 1 - P(S'_1 - S'_2 \geq r)$, and thereby $P(S'_1 - S'_2 > -r) + P(S'_1 - S'_2 > r) = 1 - P(S'_1 - S'_2 = r)$. It further implies the following.

$$P(S_1 > S_2) = w_\lambda^2 \left(1 - P(S'_1 - S'_2 = 6)\right) + 2 w_\lambda d_\lambda \left(1 - P(S'_1 - S'_2 = 3)\right) + \left(2 w_\lambda^2 + d_\lambda^2\right) P(S'_1 > S'_2). \tag{5.8}$$

A straightforward implication of the above, considering the iid nature of $S'_1$ and $S'_2$, is that $P(S_1 > S_2) = P(S_1 < S_2)$. From eq. (5.6), one should also note that $P(S_{12} > S_{21}) = P(S_{21} > S_{12})$.

Next, recall the definitions of $GS_1$, $GS_2$, $GD_1$, $GD_2$ from eq. (2.1). Let $GS_1 = GS'_1 + GS_{12}$, $GS_2 = GS'_2 + GS_{21}$, $GD_1 = GD'_1 + GD_{12}$ and $GD_2 = GD'_2 + GD_{21}$, defined in an identical fashion as above. It is easy to see that $GS'_1$, $GS'_2$ are iid Poisson($m\lambda$) for $m = 2n - 4$, $GS_{12}$ and $GS_{21}$ are iid Poisson($2\lambda$), $GD_{12} = GS_{12} - GS_{21}$ follows a Skellam distribution with parameters $(2\lambda, 2\lambda)$, and $GD_{21} = -GD_{12}$. Thus, $P(GD_1 > GD_2) = P(GD'_1 - GD'_2 > GD_{21} - GD_{12}) = P(V + 2\,GD_{12} > 0)$, where $V = GD'_1 - GD'_2$ follows a Skellam distribution with parameters $(2m\lambda, 2m\lambda)$. Clearly, both $V$ and $GD_{12}$ are symmetric around 0 and hence, simple calculations can prove that $P(GD_1 > GD_2) = P(GD_2 > GD_1)$. In a similar line, one can also show that $P(GS_1 > GS_2) = P(GS_2 > GS_1)$.

Combining the above results, we can easily argue that the final standing of the two teams is equally likely to be $(1, 2)$ or $(2, 1)$ under each of the criteria $(S_1, S_2)$, $(GD_1, GD_2)$, $(GS_1, GS_2)$ and $(S_{12}, S_{21})$. In fact, the same can be proved for any other criterion which relies on the iid assumption.

As a last piece of the proof, note that if there are two or more permutations of the teams which are not equally probable to be the final standing, then there are two teams which are not equally likely to be above each other in the league, a contradiction to what we have proved above.

Proof of Theorem 3.

$X_{ijk}$, for all $i, j \in \{1, 2, \ldots, n\}$, $k \in \{0, 1\}$, are independent. Let $X_{ijk}$ be Poisson with mean $\lambda_{ijk}$. Now, consider the null hypothesis $H_0: \lambda_{ijk} = \lambda$ for all $i, j \in \{1, 2, \ldots, n\}$, $k \in \{0, 1\}$, against the alternate hypothesis $H_1$: $\lambda_{ijk}$'s are not all equal. The joint likelihood of $X_{ijk}$'s can be written as

$$L(\lambda_{ijk};\ i, j \in \{1, 2, \ldots, n\},\ k \in \{0, 1\}) = \prod_{i=1}^{n} \prod_{\substack{j=1 \\ j \neq i}}^{n} \prod_{k=0}^{1} \frac{e^{-\lambda_{ijk}} \lambda_{ijk}^{X_{ijk}}}{X_{ijk}!}. \tag{5.9}$$

Let $\Lambda$ be the likelihood ratio. It is easy to see that the MLE of $\lambda_{ijk}$ under $H_0$ is $\hat{\lambda}$ from eq. (2.2), and the MLE of $\lambda_{ijk}$ under $H_0 \cup H_1$ is $X_{ijk}$. Then, the standard likelihood ratio test statistic is

$$-2 \log \Lambda = -2 \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{k=0}^{1} \left[ \left(X_{ijk} - \hat{\lambda}\right) + X_{ijk} \log\left(\frac{\hat{\lambda}}{X_{ijk}}\right) \right] = 2 \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{k=0}^{1} X_{ijk} \log\left(\frac{X_{ijk}}{\hat{\lambda}}\right). \tag{5.10}$$

Take $f(x) = x \log(x/x')$ for a fixed $x'$. The Taylor series expansion of $f(x)$ about $x'$ is

$$f(x) = (x - x') + \frac{1}{2x'} (x - x')^2 + \delta,$$

where $\delta$ is negligible under $H_0$ and for a large sample, which is ensured considering $N = O(n^2)$. Thus, under that scenario, the likelihood ratio test statistic can be approximated as

$$-2 \log \Lambda \approx 2 \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{k=0}^{1} \left[ \left(X_{ijk} - \hat{\lambda}\right) + \frac{\left(X_{ijk} - \hat{\lambda}\right)^2}{2\hat{\lambda}} \right]. \tag{5.11}$$

Since the first term above adds up to 0, it is easy to see that − N − − ∼ χ with degrees of freedom 2 N −−