Predicting sports scoring dynamics with restoration and anti-persistence

Abstract

Professional team sports provide an excellent domain for studying the dynamics of social competitions. These games are constructed with simple, well-defined rules and payoffs that admit a high-dimensional set of possible actions and nontrivial scoring dynamics. The resulting gameplay and efforts to predict its evolution are the object of great interest to both sports professionals and enthusiasts. In this paper, we consider two online prediction problems for team sports: given a partially observed game, Who will score next? and ultimately Who will win? We present novel interpretable generative models of within-game scoring that allow for dependence on lead size (restoration) and on the last team to score (anti-persistence). We then apply these models to comprehensive within-game scoring data for four sports leagues over a ten-year period. By assessing these models' relative goodness-of-fit we shed new light on the underlying mechanisms driving the observed scoring dynamics of each sport. Furthermore, in both predictive tasks, our models consistently outperform baseline models, and they provide quantitative assessments of latent team skill over time.


Predicting sports scoring dynamics with restoration and anti-persistence

Leto Peel ∗ and Aaron Clauset 1, 2, 3, †

1 Department of Computer Science, University of Colorado, Boulder, CO 80309
2 BioFrontiers Institute, University of Colorado, Boulder, CO 80303
3 Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501


I. INTRODUCTION

Competition in social systems is a natural and pervasive mechanism for improving performance and distributing limited resources. The quantitative study of such competitions can improve our ability to predict the outcomes associated with specific strategies and the strategic choices that competitors may make. However, most real competitions take place in complex and evolving environments [15, 26], which makes them difficult to study. Professional team sports, with their well-defined and consistently enforced rules, provide a controlled setting for the study of competition dynamics [14, 27, 29] and have previously been used as model systems for studying business decision making and human behavioral biases [28, 34].

The recent trend toward recording comprehensive and detailed data on the events within particular games provides us with new opportunities to study, model, and predict the dynamics of these games. The results of these studies promise to shed new light on a wide variety of existing competitive social systems, and enhance work on designing new ones, both offline and online.

Here, we examine the time series of scoring events in all league games across four different team sports over a period of ten years. We construct and test probabilistic models for two online predictive tasks: given a partially observed game, Who will score next? and ultimately Who will win? We then use these models to investigate the predictiveness of the dynamical phenomena of restoration and anti-persistence, which are defined below.

The events within a particular game can be effectively modeled as the interaction of skill and chance. Inferring skill from a series of competitions has a long history of study, both for individuals [11, 17] and for teams [20, 35]. However, this past work has typically only considered the final outcome of games, in terms of either a win or loss, or the final point difference. Here, we focus on modeling the specific pattern of scoring events within an individual game.

The role of chance also has a long history of study, typically focusing on the question of whether one success increases the likelihood of subsequent success. This idea can be formalized at different levels, e.g., success by individual players within a game [2, 16, 39], or a team's success across multiple games [1, 32, 37]. Here, for the first time, we focus on a different level: success by a whole team within a game.

A simple starting point for such models is the basic idea of many skill ranking systems [6, 11], which model game outcomes as random variables dependent on the competing teams' skills. We extend this idea to consider the point-scoring events within a game to be a sequence of independent contests. Past work supports this approach, as some studies have found a lack of dependence between an individual scoring and their ability to score subsequent points [16, 39], or between a team winning and their chance to win future games [32, 37]. On the other hand, there is also evidence of non-independence, e.g., the probability of scoring itself can vary with the clock time within a game or with the size of the lead [14, 26, 27].
To investigate the degree to which non-independence governs scoring probabilities, we construct a sequence of more complex models that allow specific aspects of a game's current state to influence scoring rates, e.g., the team that scored last and the lead size.

In many sports, including American football and basketball, a simple source of non-independence is a forced change in ball possession after each scoring event, putting the scoring team at a disadvantage. This can result in a phenomenon called anti-persistence, in which a score by one team is more likely to be followed by a score by their opponent [14].

Another potential source of non-independence is the size of the lead itself. Past work has shown that the observed probability of scoring next can vary with lead size [14, 27]. A negative dependence may be the result of strategy, e.g., a team using its best players when it falls behind and substituting them out when it is ahead. Such strategies have a restorative effect on the lead size, serving to pull the size of the lead back toward zero. Conversely, anti-restoration or momentum occurs when the leading team has a higher chance of scoring again, perhaps by improving their control over the playing field or by learning from gameplay how to better exploit the weaknesses of the opposing team.

In this paper, we develop probabilistic generative models around these ideas to explore and predict the evolution of point scoring over the course of a game. We use these models to deduce the impact of chance, strategy, and the rules of the game itself, and to test two simple hypotheses:

1. the probability of scoring does not depend on the current state of the game (team skill alone matters).
2. the probability of scoring does depend on the current game state (as well as team skill).

Our probabilistic models encode specific instances of these assumptions and we assess their accuracy under two online predictive tasks.
We present novel predictive models that can not only predict the outcome of a game, but also provide better predictions than baseline models about the sequence of scoring events.

II. RELATED WORK

Our work addresses two novel prediction problems, Who will score next? and Who will win?, using only the sequence of scoring events that have already occurred during the game. In the following, we outline work related to each of these questions in turn.

Essential to answering the question Who will score next? is understanding the underlying mechanisms of scoring dynamics. The study of competitive team sports has a rich history spanning a broad selection of features including the timing of scoring events [7, 12, 14, 21, 27, 36, 39], long-range correlations in scoring [30], the role of timeouts [31], the identification of safe leads [8], and the impact of spatial positioning and playing field design [5, 26, 40]. The most relevant of these studies focus on the analysis of individual player "momentum" or "hot hands" [2, 16, 39] and on team winning streaks [1, 16, 32, 37, 39]. Here, we bring together these two ideas by considering the notion of momentum, or its reverse, "restoration", at the team level. Although some analysis has previously been undertaken in this direction [14], we go further to provide the first predictive models that answer the question: Who will score next?

The foundations of our approach lie in the field of skill modeling and team ranking [6, 11], which originated in the mid-20th century. Work in this area includes the ranking of individuals [9, 11, 17], teams [18, 32, 35], or both [20, 22]. These models have been applied to a wide range of competitive events, including baseball [32], chess [9, 11, 17], American football [18, 35], association football (soccer) [23], and tennis [17]. More recently, they have been adapted to matchmaking problems in online games [20] and to calibrating reviewer scores in computer science conferences [13].

Our work is the first to use skill ranking models to predict Who will win? by predicting the sequence of scoring events within a game. Skill ranking models have previously been applied to predicting game outcomes but only based on the final outcome of the game, either in terms of the win/loss result or the final point difference. These past approaches thus cannot update their prediction as the game unfolds, while our models can. We train on a history of scoring event sequences so that we may predict Who will win? in an online fashion. Some commercial online sports betting systems exist that make similar online predictions, but these systems are proprietary and closed, which precludes a scientific evaluation or comparison with our models. They are not considered hereafter.

III. SPORTS DATASETS

We use scoring event data from four team sports: college-level American football (CFB, 10 seasons; 2000-2009), professional American football (NFL, 10 seasons; 2000-2009), hockey (NHL, 9 seasons; 2000-2003, 2005-2009) and basketball (NBA, 9 seasons; 2002-2010). Each dataset consists of the set of scoring events for each game played in the season. It includes the time the event was scored, the team and player that scored, and its point value. Table I gives a summary of these data including the number of teams, games, and individual scoring events. In our analysis and modeling, we discard the timestamps of the events and instead consider only the order in which events appear within a game.

A. Preprocessing

We extract from the raw event data two sequences to represent each game: a point sequence $\phi$, where $\phi_i$ is the point value of scoring event $i$ in the game, and a team sequence $\psi$, where $\psi_i \in \{r, b\}$ is the identity of the team that won those points. If there are $N_t$ events in game $t$, then the corresponding $\phi$ and $\psi$ each contain $N_t$ elements, and the lead size at event $i$ is

$$ L_i = \sum_{j=1}^{i} \left[ \phi_j \, \delta(\psi_j, r) - \phi_j \, \delta(\psi_j, b) \right], \tag{1} $$

for team labels $r$ and $b$ (arbitrarily chosen), where $\delta(\cdot, \cdot)$ is the Kronecker delta function and by convention we compute $L$ from team $r$'s perspective.

Footnotes: Data provided by STATS LLC, copyright 2015. The entire 2004 NHL season was canceled due to an extensive lockout over a dispute about player salary caps [33].

TABLE I. Summary of our sports data for multiple seasons across four competitive team sports.

| sport | abbrv. | seasons | teams | games (total) | games (preprocessed) | scoring events (total) | scoring events (preprocessed) | mean events per game (preprocessed) |
|---|---|---|---|---|---|---|---|---|
| Football (college) | CFB | 10, 2000–2009 | 461 | 14,588 | 13,689 | 190,337 | 117,752 | 8.60 |
| Football (pro) | NFL | 10, 2000–2009 | 32 | 2,645 | 2,561 | 32,800 | 20,115 | 7.85 |
| Basketball (pro) | NBA | 9, 2002–2010 | 30 | 11,744 | 11,744 | 1,301,408 | 1,096,179 | 93.34 |
| Hockey (pro) | NHL | 9, 2000–2009 | 30 | 11,813 | 10,259 | 65,085 | 59,227 | 5.77 |

We begin by removing some games and scoring events. We remove any events that occur during regulation overtime (0.88% of all events), because these events follow different scoring processes than events in regular game time [27]. Additionally, any games in which only one team scored are removed (6.24% of all games), as the raw data do not indicate the identity of the non-scoring team.

Under certain game conditions, multiple scoring events, potentially by different teams, can occur at the same game clock time. For example, in American football, the clock is stopped after a touchdown is scored but the scoring team gets a chance to score a conversion. If the conversion is unsuccessful, occasionally the opposing team gains control and scores points before the clock is restarted. Similarly, in basketball, the clock is stopped during free throws after a foul, after which the ball is inbounded (thrown in). If the ball is inbounded close to the other basket, it is possible to score before a second has elapsed on the clock. In these cases, the ordering of these events is ambiguous.

Removing these events would alter the running lead size, which is one of the game states of interest.
Instead, we merge simultaneous events into a single scoring play that removes the ordering ambiguity while preserving the correct score. If one team scores two simultaneous events $i$ and $i+1$, we merge their values, setting $\phi_i = \phi_i + \phi_{i+1}$, and removing event $i+1$ from both sequences. If two teams score simultaneously, we merge their values with that of the immediately preceding event in a way that preserves the running lead. Specifically, we set $\phi_{i-1} = \phi_{i-1} \pm |\phi_i - \phi_{i+1}|$, where the sign is consistent with the previous assignment of $r$ and $b$ labels to teams, and then remove events $i$ and $i+1$ from both sequences.

B. Scoring and lead size
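The running lead of Eq. (1) is the quantity on which this subsection conditions. As a minimal sketch of how it can be computed from a point sequence and a team sequence, the Python below accumulates signed point values. The function name `running_lead` and the four-event toy game are illustrative assumptions, not part of the paper or its data.

```python
from itertools import accumulate


def running_lead(points, teams, ref="r"):
    """Return [L_1, ..., L_N] of Eq. (1) from the perspective of team `ref`.

    `points` is the point sequence (phi) and `teams` the team sequence (psi),
    with labels "r" and "b". Points won by `ref` count positively and points
    won by the other team count negatively.
    """
    signed = [p if t == ref else -p for p, t in zip(points, teams)]
    return list(accumulate(signed))


# Hypothetical game: r scores 7, b scores 3, r scores 3, b scores 7.
phi = [7, 3, 3, 7]
psi = ["r", "b", "r", "b"]
leads_for_r = running_lead(phi, psi)           # lead seen by team r
leads_for_b = running_lead(phi, psi, ref="b")  # same game seen by team b
```

Because the lead is antisymmetric in the team labels, the two perspectives differ only in sign, which is why the choice of $r$ and $b$ can be arbitrary.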

We use these point and team sequences to make an initial investigation of our hypotheses. If the scoring dynamics are truly independent of the game's state, these dynamics will be indistinguishable from an independent Bernoulli process, in which each Bernoulli trial represents a scoring event. We evaluate this model by calculating the empirical probability that a team will score the next event as a function of the current lead size $L$. Recall that we compute $L$ from the perspective of team $r$; thus, if $r$ is leading, then $L$ is positive, while if $r$ is trailing, then $L$ is negative (and vice versa for $b$). This function is thus rotationally symmetric about a lead of $L = 0$, where neither team leads, and has the mathematical form of $P(\psi_i = r \mid L_{i-1}) = 1 - P(\psi_i = b \mid -L_{i-1})$.

FIG. 1. Probability that a team scores next as a function of its lead size, for the observed (yellow) and simulated (black) patterns, each with a linear least squares fit line. The simulated scoring sequence assumes that the probability of scoring is independent of the game's state. [Figure: panels CFB, NBA, NFL, NHL; x-axis lead size; y-axis p(score | lead).]

We compare the empirical scoring function to one calculated from synthetic team sequences generated according to an independent Bernoulli process, in which we flip a biased coin to determine which team wins each scoring event. The coin's bias is determined by the proportion of scoring events each team wins in that particular game, $\frac{1}{N} \sum_{i=1}^{N} \delta(\psi_i, r)$ (or for $b$). In this simulation, events are thus independent of the game state (hypothesis 1). We also compute a least-squares regression line for the empirical and for the synthetic data, in which each point is given weight proportional to the number of times the corresponding lead size was observed.

All of the resulting gradients relating scoring probability to lead size are nonzero (Fig. 1), and each Bernoulli process produces a positive gradient. This pattern simply reflects the empirical distribution of biases used to simulate the ensemble of games, with a more positive slope reflecting broader variance in these biases.
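The null simulation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the toy games are synthetic (300 games of 40 one-point events with per-game biases drawn uniformly from [0.2, 0.8]), and each event is counted once from each team's perspective, which is one way to obtain the symmetry about $L = 0$ noted in the text.

```python
import random
from collections import defaultdict


def scoring_function(games):
    """Empirical P(team scores next | its current lead), pooled over both teams.

    `games` is a list of (points, teams) pairs with team labels "r" and "b".
    Returns {lead: (probability, number_of_observations)}.
    """
    wins = defaultdict(int)
    seen = defaultdict(int)
    for points, teams in games:
        lead = 0  # team r's lead just before the next event
        for p, t in zip(points, teams):
            # Each event is observed once from each team's perspective.
            seen[lead] += 1
            seen[-lead] += 1
            wins[lead if t == "r" else -lead] += 1
            lead += p if t == "r" else -p
    return {l: (wins[l] / seen[l], seen[l]) for l in seen}


def simulate_null(points, teams, rng):
    """Redraw one game's team sequence from a coin with that game's own bias."""
    bias = sum(t == "r" for t in teams) / len(teams)
    return points, ["r" if rng.random() < bias else "b" for _ in teams]


def weighted_slope(func):
    """Slope of the least-squares line, weighting each lead by its count."""
    total = sum(n for _, n in func.values())
    mean_x = sum(l * n for l, (_, n) in func.items()) / total
    mean_y = sum(p * n for p, n in func.values()) / total
    sxy = sum(n * (l - mean_x) * (p - mean_y) for l, (p, n) in func.items())
    sxx = sum(n * (l - mean_x) ** 2 for l, (_, n) in func.items())
    return sxy / sxx


# Toy data (not the paper's): every game has its own scoring bias.
rng = random.Random(0)
toy_games = []
for _ in range(300):
    bias = rng.uniform(0.2, 0.8)
    toy_games.append(([1] * 40, ["r" if rng.random() < bias else "b" for _ in range(40)]))

null_games = [simulate_null(p, t, rng) for p, t in toy_games]
null_slope = weighted_slope(scoring_function(null_games))
```

In this toy setting the slope of the null scoring function comes out positive even though no event depends on the game state, because stronger teams are both more likely to be ahead and more likely to score next.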
The variance in the estimated scoring probability increases with lead size simply because progressively fewer games produce leads of that magnitude.

Comparing the observed and simulated scoring functions (Fig. 1), we observe a clear contradiction. The gradient and, in particular for NBA, the range of lead sizes generated by the Bernoulli process disagree strongly with those properties observed in the empirical data. These results suggest that the probability of scoring does indeed depend, somehow, on the game state (hypothesis 2). In subsequent sections, we investigate this dependence using sophisticated probabilistic models to determine how the probability of scoring depends on game state.

IV. WHERE STANDARD TESTS FAIL

To determine whether scoring events are independent, we now apply a suite of statistical randomization tests, which compare observed sequences to random sequences with similar properties. Specifically, we employ the

• serial test (non-uniformity),
• Wald-Wolfowitz runs test (anti-restoration), and
• autocorrelation test (persistence/anti-persistence),

where for each the null hypothesis is that the team sequence $\psi$ is simply a random sequence.

The serial test [25] examines bigram frequencies in a sequence and compares them to their expected frequencies under a uniformly random sequence. For a team sequence with $N$ elements, the observed fractions of bigrams $\{rr, rb, br, bb\}$ are compared to their expectations of $N/4$. This test can identify the existence of a bias within each game, i.e., if one team is systematically more likely to score than another.

The Wald-Wolfowitz runs test [38] examines the observed number of runs in a sequence, i.e., substrings of $\psi$ for which each element is the same (either $r$ or $b$), which allows us to identify either positive momentum or anti-restorative effects in within-game scoring. We reject the null hypothesis that $\psi$ is random if the observed number of runs is significantly below its expected value. Previously, this test has been used to detect winning streaks in sequences of games [37].

The autocorrelation test measures the correlation of a sequence with itself, shifted by one element, which allows us to identify periodic dynamics that occur as a result of anti-persistence. Here, we reject the null hypothesis that $\psi$ contains no dependence between values if the autocorrelation is significantly higher or lower than zero.

We apply each of these three tests to each of our four data sets, and compare the results against a false positive rate of $\alpha = 0.05$ (Fig. 2). We also consider each season separately so as to reveal non-stationarities. Basketball, unlike the other sports, produces a large proportion of rejections for the serial and autocorrelation tests, which reflects the known anti-persistence pattern in basketball scoring [14].

FIG. 2. (top) Probability distributions for the number of scoring events in a game, and (bottom) the randomization test results for each sport, by season, versus a false positive rate of $\alpha = 0.05$ (dashed line). The team sequences of each game are tested independently and we plot the proportion of games that reject the null hypothesis that the sequences are random. Because CFB, NFL and NHL typically have a small number of events per game (upper panel), the null hypothesis is difficult to reject. [Figure: top panel, probability vs. number of scoring events for CFB, NBA, NFL, NHL; bottom panels, proportion of games vs. season for each sport, with series Serial Test, Wald-Wolfowitz, Auto-correlation.]

On the other hand, for all sports except basketball, each of these tests rejects the null hypothesis at close to or below the chosen false positive rate, a finding consistent with each of these sequences being random. However, this interpretation is problematic. The serial test makes the very strict assumption that each sequence is drawn from a uniform random distribution, i.e., each is generated by flipping a fair coin several times. A face-value interpretation thus implies that all teams have an equal chance of winning each game (a highly unlikely situation) and it predicts that the scoring function from Section III B should be independent of lead size, which contradicts the observed pattern (Fig. 1).

In fact, however, there is no contradiction: the $\psi$ sequences are simply too short (Fig. 2, top) for these tests to reliably distinguish random from non-random sequences when we assume they are generated independently, i.e., the tests have low statistical power. The one exception is basketball, whose sequences typically contain 90 or so events, while those for American football or hockey typically contain fewer than 10.

In the following sections, we show how to circumvent the low statistical power of these tests by exploiting the fact that team sequences are not, in fact, independent of each other. Instead, each season's sequences are generated by repeatedly selecting pairs from a finite and fixed population of teams. This process induces substantial correlations across games that we can capture by modeling the latent skills of each team within a given season.

V. SKILL-BASED SCORING DYNAMICS

Toward this end, we develop a series of models of increasing complexity based on specific underlying mechanisms for sports scoring dynamics, including independence, restoration, and anti-persistence. Each of these models represents team skill as a latent variable. We assume that team skill is fixed over the course of any particular season [18], which reflects the relatively stable team rosters and coaching staffs, and low injury rates in these sports. Furthermore, modeling each season separately allows us to run multiple tests for each sport (one for each season) and allows our models to capture real changes in team skill across seasons [18].

Each of our models generates a team sequence $\psi$ by extending the popular Bradley-Terry (BT) model [6] to generate individual scoring events within a game. Traditionally, the BT model is used to estimate unobserved (latent) team skills from the observed outcomes of many games among pairs of teams. The probability that team $r$ wins in a match against team $b$ is given by the skill of $r$ relative to $b$:

$$ P(r \text{ wins against } b) = d_{rb} = \frac{\pi_r}{\pi_r + \pi_b}, \tag{2} $$

where $\pi_r, \pi_b \in [0, 1]$ are the latent skills of teams $r$ and $b$.

A. Independent model
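As a concrete reference for this subsection, the Bradley-Terry probability of Eq. (2) can be evaluated directly from two skill values and then reused for every scoring event in a game. The Python below is a minimal sketch of that idea. The function names and the skill values 0.75 and 0.25 are illustrative assumptions, not fitted values from the paper.

```python
import random


def bt_probability(skill_r, skill_b):
    """Eq. (2): probability that r beats b, given positive latent skills."""
    return skill_r / (skill_r + skill_b)


def sample_team_sequence(skill_r, skill_b, n_events, rng):
    """Draw n_events independent scoring events, each won by r with probability d_rb."""
    d_rb = bt_probability(skill_r, skill_b)
    return ["r" if rng.random() < d_rb else "b" for _ in range(n_events)]


# Hypothetical skills: r is three times as strong as b.
d_rb = bt_probability(0.75, 0.25)
d_br = bt_probability(0.25, 0.75)
sequence = sample_team_sequence(0.75, 0.25, 10, random.Random(42))
```

Only the ratio of the two skills matters in Eq. (2), so rescaling both skills by the same positive factor leaves the scoring probability unchanged.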

When scoring events within a game are independent, their generation is equivalent to a simple Bernoulli process with a game-specific bias. This is equivalent to an "independent model" that applies the game-level BT model of Eq. (2) to each of the individual scoring events within a game, yielding

$$ P(\psi_i = r) = d_{rb}. \tag{3} $$

This represents our first model, which can capture variability in a team sequence caused by differences in team skill parameters, but not other sources of variability.

B. Restorative models

Real scoring functions (Fig. 1) produce a range of gradients. However, the independent model can only produce positive slopes. To capture a wider variety of scoring function shapes, and in particular a negative slope or "restorative" pattern, we extend the independent model by allowing each team's skill to explicitly covary with its lead. Such a relationship could arise for psychological reasons, e.g., a winning team "loses steam" or a losing team gains motivation [4], or for strategic reasons, e.g., substituting out or in the more skilled players while in the lead in order to conserve their energy, avoid injury, or create momentum [27].

Our restorative model augments the independent model with an explicit per-team "restorative force" parameter $\gamma_r$, which modifies team $r$'s strength in response to the current lead size from its perspective, $\ell_r$, and captures the fact that different teams may have different behaviors in response to how far ahead or behind they are. When $\gamma_r < 0$, team $r$ exhibits a restorative pattern, with skill being proportional to $-\ell_r$. When $\gamma_r > 0$, team $r$ exhibits an anti-restorative or momentum pattern, with skill being proportional to $\ell_r$.

The probability that team $r$ scores against $b$ is given by

$$ P(\psi_i = r) = d_{rb} + \ell_{ir} \, c_{rb}, \tag{4} $$

where $\ell_{ir}$ is $r$'s lead size just before event $i$ and

$$ c_{rb} = \gamma_r + \gamma_b = c_{br}. \tag{5} $$

A game as a whole exhibits a restorative pattern whenever $c_{rb} < 0$. This occurs either when both teams exhibit a restorative pattern themselves ($\gamma_r < 0$ and $\gamma_b < 0$) or when one team's restorative force is stronger than the other team's anti-restorative force ($\gamma_r < 0$, $\gamma_b > 0$, and $|\gamma_r| > |\gamma_b|$).

The additional term in Eq. (4) relative to the independent model means this model's scoring function is no longer bounded on the $[0, 1]$ interval. We correct this behavior by using a sigmoid function of the form $\sigma(x) = (1 + e^{-x})^{-1}$ to provide a smooth and continuous approximation of the misspecified linear function.

To make this approximation, we change variables so that a logistic curve most closely approximates the linear equation, which occurs when we match the gradients at the point of symmetry at $P(\psi_i = r) = 1/2$. Setting the derivative $\sigma'$ equal to $c_{rb}$, we find

$$ \sigma'(m_{rb} \ell_{ir} + v_{rb}) = \frac{m_{rb} \, e^{m_{rb} \ell_{ir} + v_{rb}}}{\left( 1 + e^{m_{rb} \ell_{ir} + v_{rb}} \right)^{2}} = c_{rb}. \tag{6} $$

We then solve for when the logistic function equals $1/2$:

$$ \sigma(m_{rb} \ell_{ir} + v_{rb}) = \frac{1}{1 + e^{-(m_{rb} \ell_{ir} + v_{rb})}} = \frac{1}{2}. \tag{7} $$

Finally, in solving Eqs. (6) and (7) we obtain the following transformation of variables:

$$ v_{rb} = -4 \left( \tfrac{1}{2} - d_{rb} \right), \tag{8} $$

$$ m_{rb} = 4 \, c_{rb}, \tag{9} $$

where $m_{rb}$ and $v_{rb}$ are the variables used in the logistic function such that $c_{rb}$ and $d_{rb}$ retain their linear interpretation and are thus comparable to the skill variables in the independent scoring model. Figure 3 shows examples of two linear functions and their corresponding logistic approximations.

FIG. 3. Two examples of linear functions matched to logistic functions using the change of variables in Eqs. (8) and (9). [Figure: x-axis lead size; y-axis p(score | lead); two linear functions and their logistic counterparts.]
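The change of variables in Eqs. (8) and (9) can be checked numerically. The linear function $d_{rb} + \ell c_{rb}$ crosses $1/2$ at $\ell^{*} = (1/2 - d_{rb}) / c_{rb}$; at that lead the logistic argument $m_{rb} \ell^{*} + v_{rb}$ is zero, so the logistic curve also equals $1/2$ there and its slope is $m_{rb}/4 = c_{rb}$. The Python below is a minimal sketch of this check. The function names and the values $d_{rb} = 0.6$, $c_{rb} = -0.01$ are illustrative assumptions.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logistic_params(d_rb, c_rb):
    """Eqs. (8)-(9): map the linear intercept and slope to (m_rb, v_rb)."""
    v_rb = -4.0 * (0.5 - d_rb)
    m_rb = 4.0 * c_rb
    return m_rb, v_rb


def restorative_probability(d_rb, c_rb, lead_r):
    """P(psi_i = r) under the restorative model, bounded in (0, 1)."""
    m_rb, v_rb = logistic_params(d_rb, c_rb)
    return sigmoid(m_rb * lead_r + v_rb)


# Hypothetical game: r is slightly stronger and the game is restorative.
d_rb, c_rb = 0.6, -0.01
lead_at_half = (0.5 - d_rb) / c_rb  # lead at which the linear model gives 1/2
p_at_half = restorative_probability(d_rb, c_rb, lead_at_half)

# Central finite difference of the logistic curve at the point of symmetry.
h = 1e-4
slope_at_half = (
    restorative_probability(d_rb, c_rb, lead_at_half + h)
    - restorative_probability(d_rb, c_rb, lead_at_half - h)
) / (2 * h)
```

Away from the point of symmetry the two curves separate: the linear function of Eq. (4) eventually leaves the unit interval, while the logistic version stays strictly between 0 and 1 for any lead.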

C. Anti-persistence models

In many sports, we observe an anti-persistent pattern in the team sequences, in which the probability that $r$ scores next depends on which team scored last, i.e., $P(\psi_{i+1} = r \mid \psi_i)$. For example, for NBA team sequences, the rate of $rr$ and $bb$ bigrams is only 0.35, indicating strong anti-persistence. (The rates for CFB, NFL, and NHL are 0.45, 0.44, and 0.49, respectively.) Such an anti-persistence pattern can occur when teams have different degrees of skill at defensive and offensive play, e.g., when both teams have offenses that are relatively stronger than the opposing team's defense.

To capture these effects, we extend the independent model so that each team has an offensive skill parameter $\pi^{\mathrm{off}}$ and a defensive parameter $\pi^{\mathrm{def}}$. For sports like American football and basketball, ball possession (offensive play) typically alternates after a scoring event. We model this game rule by applying a team's defensive skill immediately after it scores and its offensive skill after the other team scores. Under this independent anti-persistent model, the probability that team $r$ wins scoring event $i$ is

$$ P(\psi_i = r \mid \psi_{i-1}) = \begin{cases} \pi^{\mathrm{def}}_r \big/ \left( \pi^{\mathrm{def}}_r + \pi^{\mathrm{off}}_b \right) & \text{if } \psi_{i-1} = r, \\ \pi^{\mathrm{off}}_r \big/ \left( \pi^{\mathrm{off}}_r + \pi^{\mathrm{def}}_b \right) & \text{if } \psi_{i-1} = b. \end{cases} \tag{10} $$

Finally, we obtain a fourth model by combining the restorative model with the anti-persistent model.

VI. MODELING SCORING DYNAMICS
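Before turning to the fits, it may help to see how Eq. (10) produces anti-persistence in a simulated sequence. The Python below is a minimal sketch under stated assumptions: the function names are hypothetical, the skill values (offense 0.65, defense 0.35 for both teams) are illustrative rather than fitted, and the first scorer is drawn with a fair coin because Eq. (10) does not specify the first event.

```python
import random


def anti_persistent_probability(last_scorer, off_r, def_r, off_b, def_b):
    """Eq. (10): P(psi_i = r | psi_{i-1}) from offensive and defensive skills."""
    if last_scorer == "r":
        # r just scored, so r now defends against b's offense.
        return def_r / (def_r + off_b)
    # b just scored, so r now attacks b's defense.
    return off_r / (off_r + def_b)


def simulate_anti_persistent(n_events, off_r, def_r, off_b, def_b, rng):
    """Draw a team sequence; the first scorer is a fair coin flip (assumption)."""
    seq = ["r" if rng.random() < 0.5 else "b"]
    while len(seq) < n_events:
        p_r = anti_persistent_probability(seq[-1], off_r, def_r, off_b, def_b)
        seq.append("r" if rng.random() < p_r else "b")
    return seq


def same_team_bigram_rate(seq):
    """Fraction of adjacent pairs won by the same team (rr or bb)."""
    pairs = list(zip(seq, seq[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical skills: both offenses are stronger than the opposing defenses.
rng = random.Random(1)
seq = simulate_anti_persistent(20000, 0.65, 0.35, 0.65, 0.35, rng)
rate = same_team_bigram_rate(seq)
```

With these values a team that has just scored wins the next event with probability 0.35, so the simulated rate of same-team bigrams falls well below the 0.5 expected for a fair independent coin.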

We fit the (i) independent, (ii) restorative, (iii) independent anti-persistent, and (iv) restorative anti-persistent models to the team sequences within a given season of each sport, using Markov chain Monte Carlo to estimate each model's parameters. For each, we assess model goodness-of-fit by calculating the held-out likelihood for each model under a 10-fold cross validation. Furthermore, we follow this procedure for each season of each sport separately, the results of which are given in Tables II–V. By treating seasons independently, we obtain multiple model assessments within each sport while controlling for within-season variability. For each season, we highlight the two highest scores in blue and the highest score in bold.

In basketball (NBA), we find that the restorative anti-persistent model consistently provides the best fit across all seasons (Table II), with the second best model being the independent anti-persistent model. These results indicate a strong role for both restoration and anti-persistence in driving basketball scoring dynamics. Previous analysis of basketball scoring using random walk theory came to similar conclusions [14].

American football (NFL and CFB) shows a different result, with both types of independent model being heavily favored over both types of restorative model (Tables III and IV). The poor fit here of the restorative models indicates that the competitive processes that produce a restorative force in basketball are largely absent in American football. This difference may be related to the much greater scoring rate in basketball relative to American football (Fig. 2): an increased scoring rate lowers the marginal value of each scoring event relative to the game outcome (who wins), and low value interactions in other systems are associated with restorative forces [10, 19].

Furthermore, the anti-persistent model for NFL is favored in 8 of 10 seasons over the independent model, while in CFB, it is favored in only 2 of 10 seasons.
That is, anti-persistence appears to play a stronger role in NFL games than in CFB games. In fact, CFB is the only sport to strongly favor the independent model, a result that agrees with our previous simulation results (Fig. 1), which showed that the trivial independent model produced the smallest disagreement for CFB between real and simulated scoring function gradients among the four sports.

The results for hockey (NHL) are less clear cut (Table V). In 8 out of 9 seasons, the independent anti-persistent model is either the best or second best model, and the independent model is best or second best in 7 out of 9. On the other hand, the simple restorative model wins for 2 seasons, and is second best for one. (The restorative anti-persistent model is a poor fit for all hockey seasons.) We note, however, that the log-likelihoods among these three models are all very close, indicating that each performs about as well as the others for these data. Given that NHL is also the one sport among the four that is not anti-persistent by design (possession is determined by a "faceoff" after each goal) and that its scoring function has a negative gradient, we tentatively conclude that the restorative model is better.

Across seasons, the best overall models appear to be CFB: independent; NFL: independent anti-persistent; NBA: restorative anti-persistent; and NHL: restorative.
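The comparison above rests on held-out log-likelihood under 10-fold cross validation. The Python below is a minimal sketch of that scoring step for the independent model only. It assumes the team skills have already been estimated on the training folds (the paper does this with Markov chain Monte Carlo, which is not shown), and the function names, the interleaved fold assignment, and the toy skills are illustrative assumptions.

```python
import math


def held_out_log_likelihood(test_games, skills):
    """Sum of log P(psi_i) over held-out games under the independent model.

    `test_games` is a list of (team_r, team_b, team_sequence) triples and
    `skills` maps each team name to a positive latent skill, assumed to have
    been fitted on the training games beforehand.
    """
    total = 0.0
    for team_r, team_b, sequence in test_games:
        d_rb = skills[team_r] / (skills[team_r] + skills[team_b])
        for winner in sequence:
            total += math.log(d_rb if winner == "r" else 1.0 - d_rb)
    return total


def k_folds(n_games, k=10):
    """Yield (train_indices, test_indices) for an interleaved k-fold split."""
    for fold in range(k):
        test = [i for i in range(n_games) if i % k == fold]
        train = [i for i in range(n_games) if i % k != fold]
        yield train, test


# Hypothetical held-out game between teams "A" and "B" with assumed skills.
toy_skills = {"A": 0.75, "B": 0.25}
toy_test = [("A", "B", ["r", "r", "b"])]
toy_score = held_out_log_likelihood(toy_test, toy_skills)
```

Higher (less negative) values indicate a better fit, which is how the entries of Tables II–V are read. The other three models would replace the constant `d_rb` with their own state-dependent scoring probability.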

TABLE II. Log-likelihoods on held-out data for NBA games.

Model                          2002    2003    2004    2005    2006    2007    2008    2009    2010
Independent                  -80849  -78814  -84698  -84744  -84795  -86070  -85727  -86314  -85114
Restorative                  -80573  -78506  -84361  -84404  -84469  -85777  -85444  -86005  -84704
Independent anti-persistent  -75655  -73823  -79151  -78841  -79088  -80174  -79841  -80513  -79386
Restorative anti-persistent  -75627  -73777  -79097  -78796  -79040  -80141  -79812  -80465  -79297

TABLE III. Log-likelihoods on held-out data for NFL games.

Model                         2000   2001   2002   2003   2004   2005   2006   2007   2008   2009
Independent                  -1286  -1307  -1408  -1372  -1403  -1373  -1369  -1433  -1484  -1395
Restorative                  -1324  -1347  -1450  -1402  -1451  -1422  -1424  -1466  -1530  -1432
Independent anti-persistent  -1278  -1290  -1401  -1361  -1392  -1378  -1372  -1425  -1473  -1387
Restorative anti-persistent  -1322  -1337  -1450  -1496  -1448  -1427  -1434  -1470  -1520  -1426

TABLE IV. Log-likelihoods on held-out data for CFB games.

Model                         2000   2001   2002   2003   2004   2005   2006   2007   2008   2009
Independent                  -7487  -7575  -8098  -8105  -7675  -7708  -7265  -8673  -8435  -8097
Restorative                  -8114  -8182  -8689  -8656  -8268  -8176  -7884  -9334  -9065  -8777
Independent anti-persistent  -7486  -7643  -8142  -8201  -7741  -7759  -7328  -8678  -8458  -8078
Restorative anti-persistent  -8011  -8113  -8625  -8586  -8198  -8110  -7781  -9214  -8880  -8630

TABLE V. Log-likelihoods on held-out data for NHL games.

Model                         2000   2001   2002   2003   2005   2006   2007   2008   2009
Independent                  -4432  -4238  -4300  -4078  -5026  -4712  -4504  -4755  -4655
Restorative                  -4432  -4238  -4313  -4056  -5031  -4695  -4511  -4761  -4663
Independent anti-persistent  -4420  -4237  -4287  -4068  -5020  -4706  -4497  -4761  -4668
Restorative anti-persistent  -4449  -4254  -4318  -4090  -5045  -4721  -4521  -4787  -4687

[Figure 4: four panels (CFB, NBA, NFL, NHL) plotting p(score | lead) against lead size.]

FIG. 4. Probability that a team scores next as a function of its lead size, for the observed (yellow) and simulated (black) patterns, each with a linear least squares fit line. Each simulation uses the best overall skill model for that sport to generate synthetic point and team sequences.
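As a rough illustration of the quantity plotted in Fig. 4, the sketch below estimates p(score | lead) from a set of games and fits a straight line by ordinary least squares. It is a minimal sketch under stated assumptions: games are given as lists of (team, points) events with team labels "r" and "b", every event is counted once from each team's perspective, and the line is fit without weighting lead sizes by how often they occur. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def scoring_function(games):
    """Estimate p(score | lead) from games given as lists of (team, points)."""
    counts = defaultdict(lambda: [0, 0])  # lead -> [times scored, times observed]
    for events in games:
        lead = 0  # lead of team 'r' over team 'b' before the next event
        for team, points in events:
            # Assumption: record each event from both teams' perspectives.
            for side, side_lead in (("r", lead), ("b", -lead)):
                counts[side_lead][0] += int(team == side)
                counts[side_lead][1] += 1
            lead += points if team == "r" else -points
    return {l: scored / seen for l, (scored, seen) in sorted(counts.items())}


def least_squares_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data (made up for illustration): the trailing team always scores next.
games = [
    [("r", 2), ("b", 2), ("r", 3), ("b", 2)],
    [("b", 2), ("r", 2), ("b", 3), ("r", 2)],
]
p = scoring_function(games)
slope, intercept = least_squares_line(list(p), list(p.values()))
```

In this toy example the fitted gradient is negative, which is the sign the text associates with a restorative pattern.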

We check these models by performing a semi-parametric bootstrap, generating synthetic $\phi$ and $\psi$ sequences of the same number and lengths as observed empirically in each season, and comparing the simulated and empirical scoring functions. That is, we repeat the assessment of Figure 1, but now using models that can capture dependence across sequences. The results show that our skill-based models are a dramatic improvement over simulating each game independently (Fig. 4), agreeing closely with the empirical scoring patterns in both the gradient and range of lead sizes.

VII. PREDICTING OUTCOMES

We now apply our models to two online prediction tasks in each of the sports: Who will score next? and Who will win? For both tasks, we let our models observe the point and team sequences of the first $T$ games in a particular season. We then use these models to predict for each unobserved game in that season (i) the team sequence values $\psi_i$ for $1 \le i \le N$, and (ii) the identity of the winning team, when each model is allowed to observe the game states $(\psi_j, \phi_j)$ for $1 \le j < i$. In the second task, all models predict point values $\phi_i$ as the mean value $\langle \phi \rangle$ averaged over all events in the season. We compare our predictions to those of three baseline models.

The first baseline is a naïve leading model, which assumes that the team currently in the lead is the stronger team and thus more likely to both score next and win the game. Specifically, it predicts that the team holding the lead at event $i$ will win the next event, i.e., it predicts $\psi_{i+1} = r$ if $L > 0$ and $\psi_{i+1} = b$ if $L < 0$, and will also win the game. If $L = 0$, the model flips a fair coin for $r$ and $b$.

The second baseline is the standard Bradley-Terry model in which we infer latent team skills $\pi$ from the win-loss records among teams in the observed games. This model is simpler than our independent model, which infers team skills using team sequences $\{\psi\}$ of the observed games.

The third baseline is a simple first order Markov model. It predicts that the next team to score will either be the same as or different from the team that scored last according to the empirical bigram frequencies $\{rr, bb, rb, br\}$ observed in the first $T$ games of the season. Formally, it predicts that a team will score again given it scored last time as

$$P(\psi_{i+1} = \psi_i) = \left( \sum_{t=1}^{T} \sum_{i=1}^{N_t - 1} \delta(\psi_{i+1}, \psi_i) \right) \Bigg/ \left( \sum_{t=1}^{T} (N_t - 1) \right). \qquad (11)$$

For both prediction tasks, we assess prediction accuracy via the AUC statistic, which gives the probability that a randomly selected true positive is ranked above a randomly selected false positive. The AUC is a statistically principled measure for binary classification tasks like ours where the cost of an error is the same in either direction (since team labels, $r$ or $b$, are arbitrary).

[Figure 5: four panels (CFB, NBA, NFL, NHL) plotting AUC against the proportion of season observed, for the independent, restorative, independent anti-persistent, restorative anti-persistent, Bradley-Terry, first order Markov, and leading models.]

FIG. 5. Probability of accurately predicting which team will score next (AUC), when models observe different fractions of a season. Based on 95% confidence intervals, our best model performs significantly better than the baseline models for CFB and NBA, and after observing at least half of the season for NFL and NHL.

A. Who will score next?
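Two of the baselines for this task, and the AUC used to score them, are simple enough to sketch directly. The Python below is an illustrative reading of the definitions in the text, not the authors' code: team sequences are assumed to be strings over the labels "r" and "b", the lead $L$ is taken from team r's perspective, and ties in the AUC are counted as one half (the text does not say how ties are handled). All function names are hypothetical.

```python
import random


def markov_repeat_probability(team_sequences):
    """First order Markov baseline, Eq. (11): the fraction of consecutive
    event pairs, pooled over games, in which the same team scores twice."""
    repeats = sum(
        seq[i + 1] == seq[i]
        for seq in team_sequences
        for i in range(len(seq) - 1)
    )
    # A game with N_t events contributes N_t - 1 consecutive pairs.
    pairs = sum(max(len(seq) - 1, 0) for seq in team_sequences)
    return repeats / pairs


def leading_prediction(lead, rng=random):
    """Leading baseline: predict 'r' if L > 0, 'b' if L < 0,
    and flip a fair coin if the game is tied."""
    if lead > 0:
        return "r"
    if lead < 0:
        return "b"
    return rng.choice(["r", "b"])


def auc(scores, labels):
    """Probability that a randomly chosen positive example is scored above
    a randomly chosen negative one (assumption: ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up sequences and scores.
p_repeat = markov_repeat_probability(["rbrb", "rrb"])  # 1 repeat in 5 pairs
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])      # 3 of 4 pairs ordered
```

The pairwise AUC shown here is quadratic in the number of predictions; it is written that way only because it mirrors the verbal definition most closely.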

In the first task, we aim to predict which team will score event $i$, for each $1 \le i \le N$, given the sequence of preceding game states $(\phi_j, \psi_j)$ for $1 \le j < i$. For this online prediction task, we learn each model’s parameters from the first $T$ games in a season, make predictions across all unobserved games within that season, and calculate the AUC for all predictions across all seasons to obtain a single score. Each model observes at least 10% of a season, which ensures that every team has played at least a few times.

The results show that the overall best models identified in the previous section also tend to be the best predictors of who will score next (Fig. 5), although some alternative models also perform well. For instance, the best model for NFL games early in the season is the first order Markov model; however, the best NFL model beats this baseline after about 30% of a season is observed. Similarly, the first order Markov model performs almost as well as the best skill model in predicting who will score next in NBA games, by capturing the known anti-persistence pattern in that sport. One of the worst models across all four sports is the leading baseline, which often performs only slightly better than chance.

B. Who will win?

Predicting who will win a game requires extrapolating the point and team sequences to determine the game’s final outcome. We simplify this task slightly by assuming that the number of scoring events $N$ in each game is known. We then allow the models to learn their parameters from the first 30% of each season (other choices lead to qualitatively similar results as those reported here). For each game in the remainder of a season, the models predict the identity of the winning team when they are allowed to observe a progressively greater fraction of game states $(\phi_i, \psi_i)$ for 0 . ≤ i/N ≤ .

FIG. 6. AUC scores for predicting which team will win given the current state of the game.
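To make the extrapolation step concrete, the following Python sketch computes a win probability for the simplest possible case. It assumes, purely for illustration, that team r wins each remaining event with a constant probability `p_score` (as in an independent model with fixed skills), that every remaining event is worth the mean point value $\langle \phi \rangle$, and that a tied final score counts as half a win. The models in the text may also condition on lead size and on the last team to score, which this sketch does not attempt; the function name and arguments are hypothetical.

```python
from math import comb


def win_probability(lead, remaining, p_score, mean_points):
    """P(team r wins) given its current lead, the known number of remaining
    scoring events, a constant per-event scoring probability, and a fixed
    point value per event. Ties are split evenly (an assumption)."""
    total = 0.0
    for k in range(remaining + 1):  # k = remaining events won by team r
        prob_k = comb(remaining, k) * p_score**k * (1 - p_score) ** (remaining - k)
        final_lead = lead + mean_points * (2 * k - remaining)
        if final_lead > 0:
            total += prob_k
        elif final_lead == 0:
            total += 0.5 * prob_k
    return total


# Toy usage: a 1-point lead with two events left, each worth 2 points.
example = win_probability(lead=1, remaining=2, p_score=0.6, mean_points=2.0)
```

Because the number of remaining events is known, the final lead depends only on how many of them team r wins, so the sum over a binomial distribution is exact under these assumptions and no simulation is needed.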

The greatest difference occurs at the start of the game. In particular, the first order Markov model performs much worse than the skill-based models at the beginning because it has no information about the heterogeneity of team scoring abilities. As the game progresses the predictions tend to converge. This occurs because these models all make predictions based on random walks on a binary sequence $\{r, b\}$, the difference being in how they model the transition probabilities. Later in the game we extrapolate less and so the differences between models become less pronounced.

VIII. TEAM SKILL EVOLVES OVER TIME

A useful feature of our probabilistic models is the interpretability of their parameters, which are meaningful measures of team skill here. By learning these parameters independently for each season in each sport, we can investigate how team skills have evolved over time.

Using the best overall model for each sport, we learn its parameters using all data in each particular season and calculate the Spearman rank correlation across team skills for each pair of seasons (Fig. 7). We find that the relative ordering of teams by their inferred skills exhibits strong serial correlation over time, which appears as a strong diagonal component in the pairwise correlation matrices. The low or inverse correlation in the far off-diagonal elements, as well as the block-like patterns observed in CFB and NFL, implies an underlying non-stationarity in team skills for each of the leagues over the roughly 10-year span of data.

The manner in which team rosters change over time is a likely source of such long-term dynamics in relative team skill. At short time scales, team rosters are fairly stable, with only a few players changing from season to season. However, over longer time scales, these changes accumulate, and rosters separated in time by more than a few years are likely to be very different, with concomitant differences in team skill.

[Figure 7: pairwise season-by-season correlation matrices for CFB, NBA, NFL, and NHL; axes: seasons '00 to '09 (NHL omits '04); the color scale shown for NBA, NFL, and NHL runs from −0.30 to 0.90.]

FIG. 7. Correlation of inferred skills over years for each sport. We see that the highest correlations occur along the block diagonal indicating that adjacent years are more similar. Note that the scale is different for CFB due to a much higher correlation across all years.

The exception to this pattern is CFB, which shows a larger long-term correlation, i.e., a slower rate of change in relative team skills, than in professional sports. We speculate that this difference is caused by the difference in player mobility between college and professional-level sports: professional teams operate in a national player market, and players can move relatively freely among teams, while colleges operate as rough regional monopolies over the sources of their players.

The inferred season-by-season skill orderings themselves are also of interest, as they reveal the particular trajectories of individual teams over time. We show visualizations of these trajectories for NBA and NFL in Figures 8 and 9. We omit CFB because there are too many teams (461) to meaningfully visualize and NHL for space reasons.

For each plot we highlight the two teams that won the league championship (NFL Super Bowl and NBA Finals) more than once during the period covered by the dataset. It is notable that these teams are not necessarily the most skilled teams under our model. This is unsurprising, as tournaments by bracket are the highest variance method of identifying the most skilled team [3]. Interestingly, in both NFL and NBA games, the highlighted teams tend to have strong offensive skills, while their defensive skills are more variable. This pattern suggests that offensive skills are more important for winning games, which seems reasonable given that a strong defense alone cannot win a game.

Looking at individual teams, we can see how their skills change with respect to the total ordering. For instance, the Cleveland Cavaliers drafted LeBron James in 2003 and went from being ranked the third worst (of-

[Figure 8: NBA team skill rankings by season, defensive (top) and offensive (bottom); vertical axis: increasing skill.]

FIG. 8. NBA defensive (top) and offensive (bottom) skill rankings. Teams that won the NBA Finals more than once in the data are highlighted, i.e., Lakers (orange) 2002, 2009 and 2010, Spurs (black) 2003, 2005 and 2007.

[Figure 9: NFL team skill rankings by season, defensive (top) and offensive (bottom); vertical axis: increasing skill.]

FIG. 9. NFL defensive (top) and offensive (bottom) skill rankings. Teams that won the Super Bowl more than once in the data are highlighted, i.e., Patriots (black) 2002, 2004 and 2005, Steelers (orange) 2006 and 2009.
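The season-by-season comparison summarized in Fig. 7 can be sketched as a matrix of Spearman rank correlations between inferred skills. The Python below is a minimal illustration under stated assumptions: skills are supplied as a hypothetical mapping from season to {team: skill}, only teams present in both seasons of a pair are compared, and tied skills receive average ranks. The toy values are made up and do not come from the paper.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def season_correlations(skills_by_season):
    """Spearman correlation of team skills for every pair of seasons."""
    seasons = sorted(skills_by_season)
    matrix = {}
    for a in seasons:
        for b in seasons:
            teams = sorted(set(skills_by_season[a]) & set(skills_by_season[b]))
            matrix[a, b] = spearman(
                [skills_by_season[a][t] for t in teams],
                [skills_by_season[b][t] for t in teams],
            )
    return matrix


# Toy skills (made up): the ordering persists for one season, then reverses.
skills = {
    2000: {"A": 1.0, "B": 0.5, "C": -0.2},
    2001: {"A": 0.9, "B": 0.6, "C": 0.1},
    2002: {"A": -0.3, "B": 0.2, "C": 0.8},
}
corr = season_correlations(skills)
```

In the toy data, adjacent seasons 2000 and 2001 are perfectly rank-correlated while 2000 and 2002 are perfectly anti-correlated, a caricature of the strong diagonal and weak far off-diagonal structure described in the text.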

IX. CONCLUSION

In this work we considered the online prediction tasks of Who will score next? and Who will win? based on the sequence of scoring events in the game so far. Our probabilistic models based on latent team skills perform well at both predictive tasks and can predict with a high degree of certainty ( >

X. ACKNOWLEDGEMENTS

We thank Ruben Coen Cagli, Ramsey Faragher, Theofanis Karaletsos, Marina Kogan, David Edward Lloyd-Jones and Sam Way for helpful conversations, and acknowledge support from Grant

[1] J. Arkes and J. Martinez. Finally, evidence for a momentum effect in the NBA. J. of Quantitative Analysis in Sports, 7(3):article 13, 2011.
[2] M. Bar-Eli, S. Avugos, and M. Raab. Twenty years of “hot hand” research: Review and critique. Psychology of Sport and Exercise, 7(6):525–553, 2006.
[3] E. Ben-Naim and N. W. Hengartner. How to choose a champion. Phys. Rev. E, 76:026106, 2007.
[4] J. Berger and D. Pope. Can losing lead to winning? Management Sci., 57(5):817–827, 2011.
[5] J. Bourbousson, C. Sève, and T. McGarry. Space-time coordination dynamics in basketball: Part 2. The interaction between the two teams. J. of Sports Sci., 28(3):349–358, 2012.
[6] R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3-4):324–345, 1952.
[7] S. E. Buttrey, A. R. Washburn, and W. L. Price. Estimating NHL scoring rates. J. of Quantitative Analysis in Sports, 7(3):article 24, 2011.
[8] A. Clauset, M. Kogan, and S. Redner. Safe leads and lead changes in competitive team sports. Preprint, arXiv:1503.03509, 2015.
[9] P. Dangauthier, R. Herbrich, T. Minka, and T. Graepel. TrueSkill through time: Revisiting the history of chess. In Neural Information Processing Systems 20, pages 337–
[10] American Economic Rev., 88(4):970–983, 1998.
[11] A. E. Elo. The rating of chessplayers, past and present, volume 3. Batsford, 1978.
[12] P. Everson and P. S. Goldsmith-Pinkham. Composite Poisson models for goal scoring. J. of Quantitative Analysis in Sports, 4(2):article 13, 2008.
[13] P. A. Flach, S. Spiegler, B. Golénia, S. Price, J. Guiver, R. Herbrich, T. Graepel, and M. J. Zaki. Novel tools to streamline the conference review process: experiences from SIGKDD’09. ACM SIGKDD Explorations Newsletter, 11(2):63–67, 2010.
[14] A. Gabel and S. Redner. Random walk picture of basketball scoring. J. of Quantitative Analysis in Sports, 8(1):manuscript 1416, 2012.
[15] T. Galla and J. D. Farmer. Complex dynamics in learning complicated games. Proc. Natl. Acad. Sci., 110(4):1232–1236, 2013.
[16] T. Gilovich, R. Vallone, and A. Tversky. The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17(3):295–314, 1985.
[17] M. E. Glickman. Parameter estimation in large dynamic paired comparison experiments. J. of the Royal Statistical Society: Series C (Applied Statistics), 48(3):377–394, 1999.
[18] M. E. Glickman and H. S. Stern. A state-space model for National Football League scores. J. of the American Statistical Association, 93(441):25–35, 1998.
[19] K. Hartley and T. Sandler. Handbook of defense economics, volume 2. Elsevier, 2007.
[20] R. Herbrich, T. Minka, and T. Graepel. Trueskill(TM): A Bayesian skill rating system. In Neural Information Processing Systems 20, pages 569–576, 2007.
[21] A. Heuer, C. Müller, and O. Rubner. Soccer: Is scoring goals a predictable Poissonian process? Eur. Phys. Lett., 89(3):38007, 2010.
[22] T.-K. Huang, R. C. Weng, and C.-J. Lin. Generalized Bradley-Terry models and multi-class probability estimates. J. Machine Learning Research, 7:85–115, 2006.
[23] L. M. Hvattum and H. Arntzen. Using ELO ratings for match result prediction in association football. Int. J. of Forecasting, 26(3):460–470, 2010.
[24] P. Jackson and M. Arkush. The last season: a team in search of its soul. Penguin Press, 2004.
[25] D. E. Knuth. The art of computer programming 2: seminumerical algorithms. Addison Wesley, 1998.
[26] S. Merritt and A. Clauset. Environmental structure and competitive scoring advantages in team competitions. Sci. Reports, 3:3067, 2013.
[27] S. Merritt and A. Clauset. Scoring dynamics across professional team sports: tempo, balance and predictability. Eur. Phys. J. Data Sci., 3(1):article 4, 2014.
[28] M. Rabin and D. Vayanos. The gambler’s and hot-hand fallacies: Theory and applications. The Rev. of Economic Studies, 77(2):730–778, 2010.
[29] D. Reed and M. Hughes. An exploration of team sport as a dynamical system. Int. J. of Performance Analysis in Sport, 6(2):114–125, 2006.
[30] H. V. Ribeiro, S. Mukherjee, and X. H. T. Zeng. Anomalous diffusion and long-range correlations in the score evolution of the game of cricket. Phys. Rev. E, 86(2):022102, 2012.
[31] S. Saavedra, S. Mukherjee, and J. P. Bagrow. Is coaching experience associated with effective use of timeouts in basketball? Sci. Reports, 2:676, 2012.
[32] C. Sire and S. Redner. Understanding baseball team standings and streaks. Eur. Phys. J. B, 67(3):473–481, 2009.
[33] P. D. Staudohar. The hockey lockout of 2004-05. Monthly Lab. Rev., 128:23–29, 2005.
[34] T. Stöckl, J. Huber, M. Kirchler, and F. Lindner. Hot hand belief and gambler’s fallacy in teams: Evidence from investment experiments. Technical report, University of Innsbruck, 2013.
[35] D. Tarlow, T. Graepel, and T. Minka. Knowing what we don’t know in NCAA football ratings: Understanding and using structured uncertainty. MIT Sloan Sports Analytics Conf., 2014.
[36] A. Thomas. Inter-arrival times of goals in ice hockey. J. of Quantitative Analysis in Sports, 3(3):article 5, 2007.
[37] R. C. Vergin. Winning streaks in sports and the misperception of momentum. J. of Sport Behavior, 23(2):181–197, 2000.
[38] A. Wald and J. Wolfowitz. On a test whether two samples are from the same population. The Annals of Mathematical Statistics, 11(2):147–162, 1940.
[39] G. Yaari and S. Eisenmann. The hot (invisible?) hand: can time sequence patterns of success/failure in sports be modeled as repeated random independent trials? PLOS ONE, 6(10):e24532, 2011.
[40] Y. Yue, P. Lucey, P. Carr, A. Bialkowski, and I. Matthews. Learning fine-grained spatial models for dynamic sports play prediction. In