Continuous-time state-space modelling of the hot hand in basketball
Sina Mews∗ and Marius Ötting
Bielefeld University, Germany
Abstract
We investigate the hot hand phenomenon using data on 110,513 free throws taken in the National Basketball Association (NBA). As free throws occur at unevenly spaced time points within a game, we consider a state-space model formulated in continuous time to investigate serial dependence in players’ success probabilities. In particular, the underlying state process can be interpreted as a player’s (latent) varying form and is modelled using the Ornstein-Uhlenbeck process. Our results support the existence of the hot hand, but the magnitude of the estimated effect is rather small.
Keywords: free throws, hot hand, irregularly sampled data, Ornstein-Uhlenbeck process, sports analytics, state-space model
1 Introduction

In several areas of society, it remains an open question whether a “hot hand” effect exists, according to which humans may temporarily enter a state during which they perform better than on average. While this concept may occur in different fields, such as among hedge fund managers and artists (Jagannathan et al., 2010; Liu et al., 2018), it is most prominent in sports. Sports commentators and fans, especially in basketball, often refer to players as having a “hot hand”, and being “on fire” or “in the zone” when they show a (successful) streak in performance. In the academic literature, the hot hand has gained great interest since the seminal paper by Gilovich et al. (1985), who investigated a potential hot hand effect in basketball. They found no evidence for its existence and attributed the hot hand to a cognitive illusion, much to the disapproval of many athletes and fans. Still, the results provided by Gilovich et al. (1985) have often been used as a primary example for showing that humans over-interpret patterns of success and failure in random sequences (see, e.g., Thaler and Sunstein, 2009; Kahneman, 2011).

∗ Corresponding author; email: [email protected]

During the last decades, many studies have attempted to replicate or refute the results of Gilovich et al. (1985), analysing sports such as volleyball, baseball, golf, and especially basketball. Bar-Eli et al. (2006) provide an overview of 24 studies on the hot hand, 11 of which were in favour of the hot hand phenomenon. Several more recent studies, often based on large data sets, also provide evidence for the existence of the hot hand (see, e.g., Raab et al., 2012; Green and Zwiebel, 2017; Miller and Sanjurjo, 2018; Chang, 2019). Notably, Miller and Sanjurjo (2018) show that the original study by Gilovich et al. (1985) suffers from a selection bias. Using the same data as in the original study, Miller and Sanjurjo (2018) account for that bias, and their results do reveal a hot hand effect. However, there are also recent studies which provide mixed results (see, e.g., Wetzels et al., 2016) or which do not find evidence for the hot hand, such as Morgulev et al. (2020). Thus, more than 30 years after the study of Gilovich et al. (1985), the existence of the hot hand remains highly disputed.

Moreover, the literature does not provide a universally accepted definition of the hot hand effect. While some studies regard it as serial correlation in outcomes (see, e.g., Gilovich et al., 1985; Dorsey-Palmateer and Smith, 2004; Miller and Sanjurjo, 2018), others consider it as serial correlation in success probabilities (see, e.g., Albert, 1993; Wetzels et al., 2016; Ötting et al., 2020). The latter definition translates into a latent (state) process underlying the observed performance (intuitively speaking, a measure for a player’s form), which can be elevated without the player necessarily being successful in every attempt. In our analysis, we follow this approach and hence consider state-space models (SSMs) to investigate the hot hand effect in basketball. Specifically, we analyse free throws from more than 9,000 games played in the National Basketball Association (NBA), totalling 110,513 observations. In contrast, Gilovich et al. (1985) use data on 2,600 attempts in their controlled shooting experiment.

Free throws in basketball, or similar events in sports with game clocks, occur at unevenly spaced time points. These varying time lengths between consecutive attempts may affect inference on the hot hand effect if the model formulation does not account for the temporal irregularity of the observations. As an illustrative example, consider an irregular sequence of throws with intervals ranging from, say, two seconds to 15 minutes.
For intervals between attempts that are fairly short (such as a few seconds), a player will most likely be able to retain his form from the last shot. On the other hand, if several minutes elapse before a player takes his next shot, it becomes less likely that he is able to retain his form from the last attempt. However, we found that existing studies on the hot hand do not account for different interval lengths between attempts. In particular, studies investigating serial correlation in success probabilities usually consider discrete-time models that require the data to follow a regular sampling scheme and thus cannot (directly) be applied to irregularly sampled data. In our contribution, we overcome this limitation by formulating our model in continuous time to explicitly account for irregular time intervals between free throws in basketball. Specifically, we consider a stochastic differential equation (SDE) as latent state process, namely the Ornstein-Uhlenbeck (OU) process, which represents the underlying form of a player fluctuating continuously around his average performance.

In the following, Section 2 presents our data set and covers some descriptive statistics. Subsequently, in Section 3, the continuous-time state-space model (SSM) formulation for the analysis of the hot hand effect is introduced, while its results are presented in Section 4. We conclude our paper with a discussion in Section 5.

2 Data

We extracted data on all basketball games played in the NBA between October 2012 and June 2019 from […], covering both regular seasons and playoff games. For our analysis, we consider data only on free throw attempts, as these constitute a highly standardised setting without any interaction between players, which is usually hard to account for when modelling field goals in basketball. We further included all players who took at least 2,000 free throws in the period considered, totalling 110,513 free throws from 44 players.
For each player, we included only those games in which he attempted at least four free throws, to ensure that a sequence does not consist solely of throws taken in immediate succession (as players receive up to three free throws if they are fouled). A single sequence of free throw attempts consists of all throws taken by one player in a given game, totalling 15,075 throwing sequences with a median number of 6 free throws per game (min: 4; max: 39).

As free throws occur irregularly within a basketball game, the information on whether an attempt was successful needs to be supplemented by its time point $t_k$, $k = 1, \ldots, T$, where $0 \leq t_1 \leq t_2 \leq \ldots \leq t_T$, corresponding to the time already played (in minutes) as indicated by the game clock. For each player $p$ in his $n$-th game, we thus consider an irregular sequence of binary variables $Y^{p,n}_{t_1}, Y^{p,n}_{t_2}, \ldots, Y^{p,n}_{t_{T_{p,n}}}$, with

$$Y^{p,n}_{t_k} = \begin{cases} 1 & \text{if the free throw attempt at time } t_k \text{ is successful;} \\ 0 & \text{otherwise.} \end{cases}$$
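To make the irregular sampling concrete, the following minimal sketch computes the time differences between consecutive attempts for one throwing sequence. It uses the game-clock times reported for LeBron James in Table 2 (game of November 14, 2012); the function name `time_gaps` is hypothetical and not part of the original analysis.

```python
# Game-clock times (in minutes) of the free throws in one game, taken from Table 2
# (Miami Heat @ Los Angeles Clippers, November 14, 2012).
times = [9.47, 23.08, 23.08, 24.73, 36.95, 36.95, 41.0, 41.0, 42.77]


def time_gaps(times):
    """Return the differences t_k - t_{k-1} between consecutive attempts."""
    if any(later < earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("time points must be non-decreasing")
    # Round to the two-decimal resolution at which the times are reported.
    return [round(later - earlier, 2) for earlier, later in zip(times, times[1:])]


gaps = time_gaps(times)
print(gaps)
```

Gaps of zero correspond to throws awarded for the same foul, whereas the longer gaps (here up to more than 13 minutes) are exactly the intervals over which a player's form may change.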
In our sample, the proportion of successful free throw attempts is 0.784. However, there is considerable heterogeneity in the players’ throwing success, as the corresponding empirical proportions range from 0.451 (Andre Drummond) to 0.906 (Stephen Curry). Players can receive up to three free throws (depending on the foul) in the NBA, which are then taken in quick succession, and the proportion of successful free throws differs substantially between the three attempts, with 0.769, 0.800, and 0.883 obtained for the first, second, and third free throw, respectively. To account for the position of the throw in a player’s set of (at most) three free throws, we hence include the dummy variables ft2 and ft3 in our analysis. In our sample, 54.5% of all free throws correspond to the first, 43.7% to the second, and only 1.8% to the third attempt in a set (cf. Table 1).

Furthermore, as the outcome of a free throw is likely affected by intermediate information on the game (such as a close game leading to pressure situations), we consider several further covariates, which were also used in previous studies (see, e.g., Toma, 2017; Morgulev et al., 2020). Specifically, we consider the current score difference (scorediff), a home dummy (home), and a dummy indicating whether the free throw occurred in the last 30 seconds of the quarter (last30). Corresponding summary statistics are shown in Table 1.

Table 1: Descriptive statistics of the covariates.

| covariate | mean | st. dev. | min. | max. |
|---|---|---|---|---|
| scorediff | … | … | −45 | 49 |
| home | … | … | … | … |
| last30 | … | … | … | … |
| ft2 | … | … | … | … |
| ft3 | … | … | … | … |

In Table 2, example throwing sequences used in our analysis are shown for free throws taken by LeBron James in five NBA games. These throwing sequences illustrate that free throw attempts often appear in clusters of two or three attempts at the same time (depending on the foul), followed by a time period without any free throws. Therefore, it is important to take into account the different lengths of the time intervals between consecutive attempts, as the time elapsed between attempts affects a player’s underlying form.

Table 2: Throwing sequences of LeBron James.

| game | $y^{\text{James},n}_{t_k}$ | $t_k$ (in min.) |
|---|---|---|
| Miami Heat @ Houston Rockets, November 12, 2012 | … | 11.01, 11.01, 23.48, 23.48, 46.68, 46.68, 47.09, 47.09 |
| Miami Heat @ Los Angeles Clippers, November 14, 2012 | … | 9.47, 23.08, 23.08, 24.73, 36.95, 36.95, 41, 41, 42.77 |
| Milwaukee Bucks @ Miami Heat, November 21, 2012 | … | 7.5, 7.5, 10.02, 10.02, 35.62, 35.62, 42.68, 42.68 |
| Cleveland Cavaliers @ Miami Heat, November 24, 2012 | … | 11.62, 21.07, 21.07, 21.38, 21.38, 31.95, 31.95, 32.68, 32.68 |
| Los Angeles Lakers @ Miami Heat, January 23, 2014 | … | 9.28, 9.28, 20.53, 25.97, 25.97, 31.57, 31.57, 33.43, 33.43, 42.1, 42.1, 47.62, 47.62 |

3 Continuous-time modelling of the hot hand

3.1 State-space model specification

Following the idea that the throwing success depends on a player’s current (latent) form (see, e.g., Albert, 1993; Wetzels et al., 2016; Ötting et al., 2020), we model the observed free throw attempts using a state-space model formulation as represented in Figure 1. The observation process corresponds to the binary sequence of a player’s throwing success, while the state process can be interpreted as a player’s underlying form (or “hotness”). We further include the covariates introduced in Section 2 in the model, which possibly affect a player’s throwing success. In particular, we model the binary response of throwing success $Y^{p,n}_{t_k}$ using a Bernoulli distribution with the associated success probability $\pi^{p,n}_{t_k}$ being a function of the player’s current state $S^{p,n}_{t_k}$ and the covariates.
Dropping the superscripts $p$ and $n$ for notational simplicity from now on, we thus have

$$Y_{t_k} \sim \text{Bern}(\pi_{t_k}), \quad \text{logit}(\pi_{t_k}) = S_{t_k} + \beta_{0,p} + \beta_1 \texttt{home} + \beta_2 \texttt{scorediff} + \beta_3 \texttt{last30} + \beta_4 \texttt{ft2} + \beta_5 \texttt{ft3}, \tag{1}$$

where $\beta_{0,p}$ is a player-specific intercept to account for differences between players’ average throwing success. To address the temporal irregularity of the free throw attempts, we formulate the stochastic process $\{S_t\}_{t \geq 0}$ in continuous time. Furthermore, we require the state process to be continuous-valued to allow for gradual changes in a player’s form, rather than assuming a finite number of discrete states (e.g. three states interpreted as cold vs. normal vs. hot; cf. Wetzels et al., 2016; Green and Zwiebel, 2017). In addition, the state process ought to be stationary such that in the long run a player returns to his average form. A natural candidate for a corresponding stationary, continuous-valued and continuous-time process is the OU process, which is described by the following SDE:

$$dS_t = \theta(\mu - S_t)\,dt + \sigma\,dB_t, \quad S_0 = s_0, \tag{2}$$

where $\theta > 0$ is the drift parameter governing the strength of reversion to the long-term mean $\mu \in \mathbb{R}$, while $\sigma > 0$ is the diffusion parameter controlling the strength of fluctuations, and $B_t$ denotes Brownian motion. We further specify $\mu = 0$ such that the state process fluctuates around a player’s average form, given the current covariate values. Specifically, positive values of the state process indicate higher success probabilities, whereas negative values indicate decreased throwing success given the player’s average ability and the current game characteristics.

As shown in Figure 1, we model the hot hand effect as serial correlation in success probabilities as induced by the state process: while the observed free throw attempts are conditionally independent, given the underlying states, the unobserved state process induces correlation in the observation process. Regarding the hot hand effect, the drift parameter $\theta$ of the OU process is thus of main interest as it governs the speed of reversion (to the average form). The smaller $\theta$, the longer it takes for the OU process to return to its mean and hence the higher the serial correlation. To assess whether a model including serial dependence (i.e. an SSM) is actually needed to describe the structure in the data, we additionally fit a benchmark model without the underlying state variable $S_{t_k}$ in Equation (1). Consequently, the benchmark model corresponds to the absence of any hot hand effect, i.e. a standard logistic regression model.

[Diagram: hidden states $S_{t_1}, S_{t_2}, S_{t_3}, \ldots$ (player’s form, hidden), each driving the corresponding observation $Y_{t_1}, Y_{t_2}, Y_{t_3}, \ldots$ (throwing success, observed).]
Figure 1: Dependence structure of our SSM: the throwing success $Y_{t_k}$ is assumed to be driven by the underlying (latent) form of a player. To explicitly account for the irregular time intervals between observations, we formulate our model in continuous time.

3.2 Statistical inference

The likelihood of the continuous-time SSM given by Equations (1) and (2) involves integration over all possible realisations of the continuous-valued state $S_{t_k}$ at each observation time $t_1, t_2, \ldots, t_T$. For simplicity of notation, let the integer $\tau = 1, 2, \ldots, T$ denote the number of the observation in the time series, such that $Y_{t_\tau}$ shortens to $Y_\tau$ and $S_{t_\tau}$ shortens to $S_\tau$. Further, $t_\tau$ represents the time at which observation $\tau$ was collected. Then the likelihood of a single throwing sequence $y_1, \ldots, y_T$ is given by

$$L_T = \int \ldots \int \Pr(y_1, \ldots, y_T, s_1, \ldots, s_T)\, ds_T \ldots ds_1 = \int \ldots \int \Pr(s_1) \Pr(y_1 \mid s_1) \prod_{\tau=2}^{T} \Pr(s_\tau \mid s_{\tau-1}) \Pr(y_\tau \mid s_\tau)\, ds_T \ldots ds_1, \tag{3}$$

where we assume that each player starts a game in his stationary distribution, $S_1 \sim \mathcal{N}\left(0, \frac{\sigma^2}{2\theta}\right)$, i.e. the stationary distribution of the OU process. Further, we assume $Y_\tau$ to be Bernoulli distributed with corresponding state-dependent probabilities $\Pr(y_\tau \mid s_\tau)$ additionally depending on the current covariate values (cf.
Equation (1)), while the transition densities $\Pr(s_\tau \mid s_{\tau-1})$ are Gaussian, as determined by the conditional distribution of the OU process:

$$S_\tau \mid S_{\tau-1} = s \sim \mathcal{N}\left(e^{-\theta \Delta_\tau} s,\; \frac{\sigma^2}{2\theta}\left(1 - e^{-2\theta \Delta_\tau}\right)\right), \tag{4}$$

where $\Delta_\tau = t_\tau - t_{\tau-1}$ denotes the time difference between consecutive observations.

Due to the $T$ integrals in Equation (3), the likelihood calculation is intractable. To render its evaluation feasible, we approximate the multiple integral by finely discretising the continuous-valued state space as first suggested by Kitagawa (1987). The discretisation of the state space can effectively be seen as a reframing of the model as a continuous-time hidden Markov model (HMM) with a large but finite number of states, enabling us to apply the corresponding efficient machinery. In particular, we use the forward algorithm to calculate the likelihood, defining the possible range of state values as [−…, …] […]

4 Results

According to the information criteria AIC and BIC, the continuous-time model formulation including a potential hot hand effect is clearly favoured over the benchmark model without any underlying state process (ΔAIC = 61.20, ΔBIC = 41.97). The parameter estimates of the OU process, which represents the underlying form of a player, as well as the estimated regression coefficients are shown in Table 3. In particular, the estimate for the drift parameter $\theta$ of the OU process is fairly small, thus indicating serial correlation in the state process over time. However, when assessing the magnitude of the hot hand effect, we also observe a fairly small estimate for the diffusion coefficient $\sigma$. The corresponding stationary distribution is thus estimated as $\mathcal{N}(0, \hat{\sigma}^2 / (2\hat{\theta}))$, with $\hat{\sigma}^2 / (2\hat{\theta}) \approx 0.121$ (i.e. a standard deviation of about 0.348) based on the estimates in Table 3, indicating a rather small range of the state process, which becomes apparent also when simulating state trajectories based on the parameter estimates of the OU process (cf. Figure 2). Still, the associated success probabilities, given that all covariate values are fixed to zero, vary considerably during the time of an NBA game (cf. right y-axis of Figure 2). While the state process and hence the resulting success probabilities slowly fluctuate around the average throwing success (given the covariates), the simulated state trajectories reflect the temporal persistence of the players’ underlying form. Thus, our results suggest that players can temporarily enter a state in which their success probability is considerably higher than their average performance, which provides evidence for a hot hand effect.

Table 3: Parameter estimates with 95% confidence intervals.

| parameter | estimate | 95% CI |
|---|---|---|
| $\theta$ (drift) | 0.042 | [0.016; 0.109] |
| $\sigma$ (diffusion) | 0.101 | [0.055; 0.185] |
| $\beta_1$ (home) | 0.023 | [-0.009; 0.055] |
| $\beta_2$ (scorediff) | 0.030 | [0.011; 0.048] |
| $\beta_3$ (last30) | 0.003 | [-0.051; 0.058] |
| $\beta_4$ (ft2) | 0.223 | [0.192; 0.254] |
| $\beta_5$ (ft3) | 0.421 | [0.279; 0.563] |

[Plot: simulated state trajectories; x-axis: time; left y-axis: state; right y-axis: success prob.]
Figure 2: Simulation of possible state trajectories for the length of an NBA game based on the estimated parameters of the OU process. The red dashed line indicates the intercept (here: the median throwing success over all players), around which the processes fluctuate. The right y-axis shows the success probabilities resulting from the current state (left y-axis), given that the explanatory variables equal 0. The graphs were obtained by application of the Euler-Maruyama scheme with initial value 0 and step length 0.01.

Regarding the estimated regression coefficients, the player-specific intercepts $\hat{\beta}_{0,p}$ range from -0.311 to 2.192 (on the logit scale), reflecting the heterogeneity in players’ throwing success. The estimates for $\beta_1$ to $\beta_5$ are displayed in Table 3 together with their 95% confidence intervals. The chance of making a free throw is slightly increased if the game is played at home ($\beta_1$) or if a free throw occurs in the last 30 seconds of a quarter ($\beta_3$), but both corresponding confidence intervals include zero.
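Returning to the simulated trajectories of Figure 2, the following minimal sketch shows how such a path could be generated with the Euler-Maruyama scheme, using the point estimates from Table 3, initial value 0 and step length 0.01 as stated in the figure caption. The game length of 48 minutes, the random seed, and in particular the intercept (set here to the logit of the overall hit rate 0.784 rather than the median over players used in the figure, which is not reported) are assumptions made for illustration only.

```python
import math
import random

THETA, SIGMA = 0.042, 0.101  # point estimates reported in Table 3
# ASSUMPTION: intercept set to the logit of the overall hit rate (0.784).
INTERCEPT = math.log(0.784 / (1.0 - 0.784))


def simulate_ou(theta, sigma, t_end=48.0, dt=0.01, s0=0.0, seed=1):
    """Euler-Maruyama path of dS_t = -theta * S_t dt + sigma dB_t (mu = 0)."""
    rng = random.Random(seed)
    n_steps = int(round(t_end / dt))
    path = [s0]
    for _ in range(n_steps):
        s = path[-1]
        # Brownian increment over a step of length dt has variance dt.
        increment = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        path.append(s - theta * s * dt + sigma * increment)
    return path


def success_prob(state, intercept=INTERCEPT):
    """Inverse logit of intercept + state, i.e. all covariates set to 0."""
    return 1.0 / (1.0 + math.exp(-(intercept + state)))


path = simulate_ou(THETA, SIGMA)
probs = [success_prob(s) for s in path]
stationary_sd = SIGMA / math.sqrt(2.0 * THETA)

print(f"stationary standard deviation: {stationary_sd:.3f}")
print(f"simulated success probabilities: {min(probs):.3f} to {max(probs):.3f}")
```

Changing the seed yields different trajectories; a smaller `theta` makes excursions away from zero last longer, which is the persistence interpreted as a hot hand above.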
In contrast to these two covariates, the confidence interval for the score difference ($\beta_2$) does not include zero and its effect is positive but small, indicating that the larger the lead, the higher is, on average, the chance of making a free throw. The position of the throw, i.e. whether it is the first, second ($\beta_4$), or third ($\beta_5$) attempt in a row, has the largest effect of all covariates considered: compared to the first free throw, the chance of a hit is considerably increased if it is the second or, in particular, the third attempt, which was already indicated by the descriptive analysis presented in Section 2. However, this strong effect on the success probabilities is probably caused by the fact that three free throws in a row are only awarded if a player is fouled while shooting a three-point field goal, which, in turn, is more often attempted by players who regularly perform well at free throws.

To further investigate how the hot hand may evolve during a game, we compute the most likely state sequences, corresponding to the underlying form of a player. Specifically, we seek

$$(s^*_1, \ldots, s^*_T) = \underset{s_1, \ldots, s_T}{\operatorname{argmax}} \Pr(s_1, \ldots, s_T \mid y_1, \ldots, y_T),$$

where $s^*_1, \ldots, s^*_T$ denotes the most likely state sequence given the observations. As we transferred our continuous-time SSM to an HMM framework by finely discretising the state space (cf. Section 3.2), we can use the Viterbi algorithm to calculate such sequences at low computational cost (Zucchini et al., 2016). Figure 3 shows the most likely states underlying the throwing sequences presented in Table 2. While the decoded state processes fluctuate around zero (i.e. a player’s average throwing success), the state values vary slightly over the time of an NBA game.
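The HMM approximation referred to above can be sketched as follows: the state range is split into intervals, Equation (4) is integrated over each interval to obtain transition probabilities for a given time gap, and the forward and Viterbi recursions are then run on the resulting finite state space. This is only an illustrative sketch, not the authors' code. The state range [-2, 2], the grid size of 50, the treatment of zero time gaps (state carried over unchanged), the linear predictor value 1.29 and the outcome sequences are all assumptions; the times are the first four throws of the second game in Table 2.

```python
import math

THETA, SIGMA = 0.042, 0.101       # point estimates reported in Table 3
LOWER, UPPER, M = -2.0, 2.0, 50   # ASSUMED state range and number of intervals
H = (UPPER - LOWER) / M
EDGES = [LOWER + i * H for i in range(M + 1)]
MIDS = [LOWER + (i + 0.5) * H for i in range(M)]  # representative state values


def interval_probs(mean, sd):
    """Mass that N(mean, sd^2) assigns to each grid interval."""
    cdf = [0.5 * (1.0 + math.erf((e - mean) / (sd * math.sqrt(2.0)))) for e in EDGES]
    return [cdf[j + 1] - cdf[j] for j in range(M)]


def transition_matrix(delta):
    """Discretised version of Equation (4) for a time gap delta (in minutes)."""
    if delta == 0.0:  # ASSUMPTION: throws at the same clock time share one state
        return [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]
    decay = math.exp(-THETA * delta)
    sd = math.sqrt(SIGMA ** 2 / (2.0 * THETA) * (1.0 - decay ** 2))
    return [interval_probs(decay * s, sd) for s in MIDS]


def emission(y, eta):
    """Bernoulli probability of outcome y in each state; eta = covariate part of Eq. (1)."""
    probs = [1.0 / (1.0 + math.exp(-(s + eta))) for s in MIDS]
    return [p if y == 1 else 1.0 - p for p in probs]


def initial_probs():
    """Discretised stationary distribution N(0, sigma^2 / (2 theta))."""
    return interval_probs(0.0, SIGMA / math.sqrt(2.0 * THETA))


def log_likelihood(ys, times, etas):
    """Scaled forward algorithm approximating the log of Equation (3)."""
    alpha = [d * e for d, e in zip(initial_probs(), emission(ys[0], etas[0]))]
    loglik = 0.0
    for k in range(len(ys)):
        if k > 0:
            gamma = transition_matrix(times[k] - times[k - 1])
            emis = emission(ys[k], etas[k])
            alpha = [sum(alpha[i] * gamma[i][j] for i in range(M)) * emis[j]
                     for j in range(M)]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return loglik


def safe_log(x):
    return math.log(x) if x > 0.0 else float("-inf")


def viterbi(ys, times, etas):
    """Most likely sequence of grid states given the observations."""
    xi = [safe_log(d) + safe_log(e)
          for d, e in zip(initial_probs(), emission(ys[0], etas[0]))]
    back = []
    for k in range(1, len(ys)):
        gamma = transition_matrix(times[k] - times[k - 1])
        emis = emission(ys[k], etas[k])
        new_xi, pointers = [], []
        for j in range(M):
            best = max(range(M), key=lambda i: xi[i] + safe_log(gamma[i][j]))
            pointers.append(best)
            new_xi.append(xi[best] + safe_log(gamma[best][j]) + safe_log(emis[j]))
        xi = new_xi
        back.append(pointers)
    state = max(range(M), key=lambda i: xi[i])
    path = [state]
    for pointers in reversed(back):  # trace the best path backwards
        state = pointers[state]
        path.append(state)
    return [MIDS[i] for i in reversed(path)]


times = [9.47, 23.08, 23.08, 24.73]  # first four throws, 14 Nov 2012 (Table 2)
etas = [1.29] * 4                    # HYPOTHETICAL linear predictor values
ll_hits = log_likelihood([1, 1, 1, 1], times, etas)   # hypothetical outcomes
ll_mixed = log_likelihood([1, 0, 0, 1], times, etas)  # hypothetical outcomes
decoded = viterbi([1, 1, 1, 1], times, etas)
print(ll_hits, ll_mixed, decoded)
```

In an actual fit, `log_likelihood` would be summed over all throwing sequences and maximised numerically over the model parameters; the grid would also need to be fine and wide enough that mass falling outside the chosen range is negligible.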
Over all players, the decoded states range from -0.42 to 0.46, again indicating that the hot hand effect as modelled by the state process is rather small.

[Plot: five panels, one per match; x-axis: time (in minutes); y-axis: state.]
Figure 3: Decoded states underlying the throwing sequences of LeBron James shown in Table 2. Successful free throws are shown in yellow, missed shots in black.

The decoded state sequences in Figure 3 further illustrate the advantages and the main idea of our continuous-time modelling approach. For example, consider the throwing sequence in the second match shown, where LeBron James only made a single free throw of his first four attempts. The decoded state at throw number 3 is -0.092 (cf. Figure 3) and the time passed between throw number 3 and 4 is 1.65 minutes (cf. Table 2). Thus, the value of the state process at throw number 4 is drawn from a normal distribution, given the decoded state of the previous attempt, with mean $e^{-0.042 \cdot 1.65} \cdot (-0.092) = -0.086$ and variance $\frac{0.101^2}{2 \cdot 0.042}\left(1 - e^{-2 \cdot 0.042 \cdot 1.65}\right) = 0.016$ (cf. Equation (4)). Accordingly, the value of the state process for throw number 5, taken 12.22 minutes after throw number 4 (cf. Table 2), is drawn from a normal distribution with mean -0.050 and variance 0.078.

5 Discussion

In our analysis of the hot hand, we used SSMs formulated in continuous time to model throwing success in basketball. Focusing on free throws taken in the NBA, our results provide evidence for a hot hand effect as the underlying state process exhibits some persistence over time. In particular, the model including a hot hand effect is preferred over the benchmark model without any underlying state process by information criteria. Although we provide evidence for the existence of a hot hand, the magnitude of the hot hand effect is rather small as the underlying success probabilities are only elevated by a few percentage points (cf. Figures 2 and 3).

A minor drawback of the analysis arises from the fact that there is no universally accepted definition of the hot hand. In our setting, we use the OU process to model players’ continuously varying form and it is thus not clear which values of the drift parameter $\theta$ correspond to the existence of the hot hand. While lower values of $\theta$ correspond to a slower reversion of a player’s form to his average performance, a further quantification of the magnitude of the hot hand effect is not possible. In particular, a comparison of our results to other studies on the hot hand effect proves difficult as these studies apply different methods to investigate the hot hand.

In general, the modelling framework considered provides great flexibility with regard to distributional assumptions. In particular, the response variable is not restricted to be Bernoulli distributed (or Gaussian, as is often the case when making inference on continuous-time SSMs), such that other types of response variables used in hot hand analyses (e.g. Poisson) can be implemented by changing just a few lines of code. Our continuous-time SSM can thus easily be applied to other sports, and the measure for success does not have to be binary as considered here.
For readers interested in adapting our code to fit their own hot hand model, the authors can provide the data and code used for the analysis.

Acknowledgements
We thank Roland Langrock, Christian Deutscher, and Houda Yaqine for stimulating discussions and helpful comments.

References
Albert, J. (1993). A statistical analysis of hitting streaks in baseball: Comment. Journal of the American Statistical Association, 88(424):1184–1188.
Bar-Eli, M., Avugos, S., and Raab, M. (2006). Twenty years of “hot hand” research: review and critique. Psychology of Sport and Exercise, 7(6):525–553.
Chang, J. C. (2019). Predictive Bayesian selection of multistep Markov chains, applied to the detection of the hot hand and other statistical dependencies in free throws. Royal Society Open Science, 6(3):182174.
Dorsey-Palmateer, R. and Smith, G. (2004). Bowlers’ hot hands. The American Statistician, 58(1):38–45.
Gilovich, T., Vallone, R., and Tversky, A. (1985). The hot hand in basketball: on the misperception of random sequences. Cognitive Psychology, 17(3):295–314.
Green, B. and Zwiebel, J. (2017). The hot-hand fallacy: cognitive mistakes or equilibrium adjustments? Evidence from Major League Baseball. Management Science, 64(11):4967–5460.
Jagannathan, R., Malakhov, A., and Novikov, D. (2010). Do hot hands exist among hedge fund managers? An empirical evaluation. The Journal of Finance, 65(1):217–255.
Kahneman, D. (2011). Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.
Kitagawa, G. (1987). Non-Gaussian state-space modeling of nonstationary time series. Journal of the American Statistical Association, 82(400):1032–1041.
Liu, L., Wang, Y., Sinatra, R., Giles, C. L., Song, C., and Wang, D. (2018). Hot streaks in artistic, cultural, and scientific careers. Nature, 559(7714):396–399.
Mews, S., Langrock, R., Ötting, M., Yaqine, H., and Reinecke, J. (2020). Maximum approximate likelihood estimation of general continuous-time state-space models. arXiv preprint arXiv:2010.14883.
Miller, J. B. and Sanjurjo, A. (2018). Surprised by the hot hand fallacy? A truth in the law of small numbers. Econometrica, 86(6):2019–2047.
Morgulev, E., Azar, O. H., and Bar-Eli, M. (2020). Searching for momentum in NBA triplets of free throws. Journal of Sports Sciences, 38(4):390–398.
Ötting, M., Langrock, R., Deutscher, C., and Leos-Barajas, V. (2020). The hot hand in professional darts. Journal of the Royal Statistical Society (Series A), 183(2):565–580.
Raab, M., Gula, B., and Gigerenzer, G. (2012). The hot hand exists in volleyball and is used for allocation decisions. Journal of Experimental Psychology: Applied, 18(1):81–94.
Thaler, R. H. and Sunstein, C. R. (2009). Nudge: Improving Decisions about Health, Wealth, and Happiness. London: Penguin.
Toma, M. (2017). Missed shots at the free-throw line: analyzing the determinants of choking under pressure. Journal of Sports Economics, 18(6):539–559.
Wetzels, R., Tutschkow, D., Dolan, C., van der Sluis, S., Dutilh, G., and Wagenmakers, E.-J. (2016). A Bayesian test for the hot hand phenomenon. Journal of Mathematical Psychology, 72:200–209.
Zucchini, W., MacDonald, I. L., and Langrock, R. (2016).