Stop the Clock: Are Timeout Effects Real?
Niander Assis, Renato Assunção, and Pedro O.S. Vaz-de-Melo
Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
{niander,assuncao,olmo}@dcc.ufmg.br

Abstract.
A timeout is a short interruption during games used to communicate a change in strategy, to give the players a rest, or to stop a negative flow in the game. Whatever the reason, coaches expect an improvement in their team’s performance after a timeout. But how effective are these timeouts in doing so? The simple average of the differences between the scores before and after the timeouts has been used as evidence that there is an effect and that it is substantial. We claim that these statistical averages are not proper evidence and that a more sound approach is needed. We applied a formal causal framework to a large dataset of official NBA play-by-play tables and drew our assumptions about the data generation process in a causal graph. Using different matching techniques to estimate the causal effect of timeouts, we concluded that timeouts have no effect on teams’ performance. Actually, since most timeouts are called when the opposing team is scoring more frequently, the moments that follow resemble an improvement in the team’s performance, but they are just the natural tendency of the game to return to its average state. This is another example of what statisticians call the regression to the mean phenomenon.
Keywords: causal inference · sports analytics · timeout effect · momentum · bayesian networks

In sports, a timeout is a short interruption in play, commonly used to stop a negative flow in the game, to discuss a strategy change, or to rest the players [3]. As this is the most direct way coaches can intervene during a game, their influence and strategic ability are best expressed during these events. A timeout is usually called when a team has a rather long streak of score losses [7, 24]. Popular belief and previous research [5, 11, 16, 18, 21] point to a positive effect on teams’ performance after the timeout. That is, on average, the team asking for the timeout recovers from the losses by scoring positively immediately after. This observed difference has been wrongly used as evidence that the timeout has a real and positive effect on teams’ performance. To answer such a causal question, a formal counterfactual analysis should be used, and that is what we propose in this work.

There is intense interest in causal models for analyzing non-experimental data, since causal reasoning can answer questions that machine learning by itself cannot [15]. Our approach builds on these causal inference methods, which are briefly introduced in Section 3. For each timeout event at time $t_r$ in the database, we found a paired moment $t_c$ in the same game when no timeout was called, which serves as a control moment for $t_r$, reflecting what would have happened had the timeout not been called. This control moment is chosen based on other variables describing the current game instant. We drew these variables in a causal graph that depicts our assumptions about how the data were generated and that, as we will discuss, lets us assess whether the causal effect can be estimated without bias.
In order to quantify how the game changes just after a given moment $t$, we propose the Short-term Momentum Change (STMC), which is discussed in Section 4.1. After using a matching approach to construct our matched data (pairs $(t_r, t_c)$), we found virtually no difference between the distributions of the STMC for real timeouts and for control instants, i.e., the estimated timeout effect is very close to zero or non-existent. Hence, we conclude that the apparent positive effect of timeouts is another example of the well-known regression to the mean fallacy [1]. The dynamic match score fluctuates naturally and, after an intense increase, commonly returns towards a mild variation. Thus, because timeouts are usually called near these extreme moments, as we will show, the game only seems to benefit the team that is losing. In summary, the main contributions of this paper are the following:

– We propose a metric called Short-term Momentum Change (STMC) to quantify how much the game momentum changes after a moment $t_r$ associated with an event, such as a regular ball possession or an interruption of the game;
– We collected and organized a large dataset covering all the play-by-play information for all National Basketball Association (NBA) games of the four regular seasons from 2015 to 2018. A single season has over 280 thousand game instants, as we define in Section 4, and over 17 thousand timeout events;
– After a detailed causal inference analysis of the timeout effect on the Short-term Momentum Change, we did not find evidence that the timeout effect exists or that its effect size is meaningful. Inspired by others, we also considered two other settings in which the timeout effect could be different: (i) only the last five minutes of the games and (ii) everything but the last five minutes.

The next section describes previous work on the effect of timeouts and discusses how our work differs from it. Section 3 gives a background on causal inference and the statistical models adopted. We start Section 4 by summarizing the timeout rules of the NBA and describing our dataset, the play-by-play tables. In Section 4.1 we introduce our outcome variable of interest, the Short-term Momentum Change, and in Section 4.2 our causal model. In Section 4.3, we describe our treatment and control groups and, in Section 4.4, we explain our matching approaches. All the results are presented in Section 5. We close the paper in Section 6 with our conclusions.
Timeouts are used in team sports for several reasons, such as to rest or change players, to inspire morale, to discuss plays, or to change the game strategy [20]. However, timeouts are mostly used to stop a negative flow in the game [7, 24], which is popularly referred to as “the game momentum.” In basketball, momentum arises when one team is scoring significantly more than the other [11, 22].

Several earlier studies analyzed the effect of timeouts on decreasing the opponent’s momentum in the game [5, 11, 18, 21]. These studies analyze the teams’ performance just before and just after a timeout is called. For instance, using a small sample of seven televised games from the 1989 National Collegiate Athletic Association (NCAA) tournament, Mace et al. [11] recorded specific events of interest, classified as either Reinforcers (e.g., successful shots) or Adversities (e.g., turnovers), and verified that the rates of these events changed significantly for both teams between the 3 minutes before and the 3 minutes after each timeout. They found that while the team that called the timeout improved its performance, the opposing team’s performance decreased. Other works reached the same conclusions using similar methodologies and different datasets [5, 18, 21].

To the best of our knowledge, Permutt [16] was the first to acknowledge the regression to the mean phenomenon in such an analysis. Permutt considered specific game moments: timeouts called after a team suffered a loss of six consecutive points. As in other studies, the short-term scoring ratio was observed to be higher after timeouts. However, in contrast to the others, the paper compares real timeouts with similar game moments without a timeout. With such an analysis, Permutt found that timeouts can be effective at enhancing performance, but only by a small magnitude. The most significant result shows that the home team with a “first-half restriction” presents a 0.21 increase in average ratio for the next ten points: calling a timeout predicts that the home team will score 5.47 out of the next ten points, as opposed to 5.26 points when a timeout is not called. Thus, the conclusion is that timeouts do not have any significant effect in changing the momentum of a game, i.e., using 6-0 runs as an indicator of instances where momentum would be a factor, teams were successful at “reversing” momentum even without the timeout as a mediator.

Although the work of Permutt [16] innovates by considering counterfactuals, the analysis still leaves room for reasonable doubts about the reality of the timeout effect. It fails to take into account other important factors that could also influence the momentum change and confound the true timeout effect. As a result, spurious correlations could have caused the lack of effect observed in the data. In our work, we take into account other factors, such as coaches’ and teams’ abilities, stadium and match conditions, clock time, quarter, and the relative score between the teams. More importantly, unlike all the studies described in this section, we adopt a formally defined causal model approach [14], with a counterfactual analysis based on a constructed control group. We show compellingly that timeouts do not have an effect on teams’ performance.
For illustration, consider $Y$ our outcome variable and $A \in \{0, 1\}$ the treatment variable. Regardless of the actual value of $A$, we define $Y^{A=1}$ to be the value of $Y$ had $A$ been set to $A = 1$, and $Y^{A=0}$ to be the value of $Y$ had $A$ been set to $A = 0$. We say there is a causal effect of $A$ on $Y$ if $Y^{A=0} \neq Y^{A=1}$ and, conversely, that there is no causal effect, or the effect is null, if $Y^{A=0} = Y^{A=1}$. These values are called potential outcomes [12] because only one of them is factual, i.e., truly observed, while the other is counterfactual. Therefore, we generally cannot identify the causal effect for a single individual. This problem is known as the Fundamental Problem of Causal Inference (FPCI) [8].

Nevertheless, in most causal inference settings, the real interest is in the population-level effect, or the average causal effect, defined by $E(Y^{A=1} - Y^{A=0})$. The difference $E(Y \mid A = 1) - E(Y \mid A = 0)$ gives a reliable estimate of it in randomized experimental studies, where the treatment $A$ is assigned randomly to each unit. In observational studies, however, we need to collect additional covariates to control for and to make assumptions. One important assumption is conditional ignorability [19]. This assumption is satisfied if, given a vector of covariates $X$, the treatment variable $A$ is conditionally independent of the potential outcomes ($Y^{A=0} \perp\!\!\!\perp A \mid X$ and $Y^{A=1} \perp\!\!\!\perp A \mid X$), and there is a positive probability of receiving and of not receiving the treatment for all values of $X$ ($0 < P(A = 1 \mid X = x) < 1$ for all $x$). The conditional ignorability assumption allows us to state that $E(Y^{A=1} - Y^{A=0} \mid X) = E(Y \mid A = 1, X) - E(Y \mid A = 0, X)$ for every value of $X$. This is the basic rationale behind the matching technique.

The simplest matching technique is exact matching. For each possible value $X = x$, we form two subgroups: one composed of individuals that received the treatment and have $X = x$, and the other of individuals that did not receive the treatment and have $X = x$. The difference between the mean outcomes of these two subgroups estimates the causal effect at $X = x$, and the stratum-level differences can then be averaged to estimate the average causal effect.
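To make the contrast between the naive difference $E(Y \mid A = 1) - E(Y \mid A = 0)$ and exact matching concrete, here is a minimal Python sketch on simulated data. The data-generating process is entirely hypothetical: a single discrete covariate $X$ raises both the chance of treatment and the outcome, while the true effect of $A$ is set to zero. Weighting the stratum differences by the number of treated units is one choice among several (it targets the effect on the treated); with a null effect, all such targets coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical confounded data: X in {0, 1, 2} drives both A and Y.
x = rng.integers(0, 3, size=n)
p_treat = np.array([0.1, 0.5, 0.9])[x]   # P(A = 1 | X) grows with X
a = rng.random(n) < p_treat
true_effect = 0.0                         # A has no causal effect on Y here
y = 2.0 * x + true_effect * a + rng.normal(size=n)

# Naive contrast E(Y | A = 1) - E(Y | A = 0): biased by the confounder X.
naive = y[a].mean() - y[~a].mean()

# Exact matching: compare treated and control units sharing the same X = x,
# then average the stratum differences weighted by the number of treated units.
diffs, weights = [], []
for level in np.unique(x):
    treated = (x == level) & a
    control = (x == level) & ~a
    if treated.any() and control.any():   # positivity holds in this stratum
        diffs.append(y[treated].mean() - y[control].mean())
        weights.append(treated.sum())
exact = np.average(diffs, weights=weights)

print(f"naive difference: {naive:.3f}, exact matching: {exact:.3f}")
```

The naive difference lands around 2.1 even though the true effect is zero, while the exact-matching estimate stays close to zero, because within each stratum treatment is no longer related to $X$.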
Unfortunately, exact matching is not feasible when the number of covariates is large or some of them are continuous. As an alternative, examples are usually matched according to a distance metric $d_{ij} = d(x_i, x_j)$ between the covariate configurations of pairs $(i, j)$ of observations. The Mahalanobis distance [23] is a common choice for such a metric, as it takes into account the correlation between the different features in the vector $X$. Another option is to use the propensity score [19] to estimate causal effects; it is defined as the probability of receiving treatment given the covariates, i.e., $s(X) = P(A = 1 \mid X)$. Rosenbaum and Rubin proved that it is enough to match on a distance calculated using the scalar scores $s(x)$, rather than on the entire vector $x$ [19].

In general, each matching approach can be implemented using algorithms that are mainly classified as either greedy or optimal. The greedy ones, also known as nearest-neighbor matching, match the $i$-th treated example with the available control example $j$ that has the smallest distance $d_{ij}$. Optimal matching, however, takes into account the whole reservoir of examples, since the goal is to generate a matched sample that minimizes the total sum of distances between the pairs. Such an optimal approach can be preferable when there is great competition for controls. For a good review of matching methods for causal inference, see [23].

Whatever matching technique is used, its success can be partially judged by how well balanced the covariates are in the treatment and control groups. By pruning unmatched examples from the dataset, the control and treated groups of the remaining matched sample should have similar covariate distributions, in which case we say that matching achieves balance of the covariate distribution.
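The difference between greedy and optimal matching is easy to see on a tiny distance matrix. The Python sketch below computes Mahalanobis distances $d(x_i, x_j) = \sqrt{(x_i - x_j)^\top S^{-1} (x_i - x_j)}$ with SciPy’s `cdist` (metric `"mahalanobis"`, inverse covariance passed as `VI`) and solves the optimal problem as an assignment problem with `scipy.optimize.linear_sum_assignment`. It is a generic illustration, not the rcbalance-based procedure used later in the paper; estimating $S$ from the pooled sample and the helper names are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def mahalanobis_matrix(treated, control):
    """Pairwise Mahalanobis distances d_ij between treated row i and control row j."""
    pooled = np.vstack([treated, control])
    vi = np.linalg.inv(np.cov(pooled, rowvar=False))  # inverse covariance of X
    return cdist(treated, control, metric="mahalanobis", VI=vi)


def greedy_match(d):
    """Nearest-neighbour matching without replacement (needs #controls >= #treated)."""
    available = set(range(d.shape[1]))
    pairs = []
    for i in range(d.shape[0]):
        j = min(available, key=lambda k: d[i, k])  # closest control still free
        pairs.append((i, j))
        available.remove(j)
    return pairs


def optimal_match(d):
    """Pairs that minimize the total sum of distances (assignment problem)."""
    rows, cols = linear_sum_assignment(d)
    return list(zip(rows.tolist(), cols.tolist()))


def total_distance(d, pairs):
    return float(sum(d[i, j] for i, j in pairs))


# Two treated units competing for the same control (column 0).
d = np.array([[1.0, 2.0],
              [1.0, 10.0]])
print(total_distance(d, greedy_match(d)))   # 11.0: unit 0 grabs control 0 first
print(total_distance(d, optimal_match(d)))  # 3.0: unit 0 takes control 1 instead

# Mahalanobis distances on random covariates (3 treated, 5 controls, 2 features).
rng = np.random.default_rng(1)
dm = mahalanobis_matrix(rng.normal(size=(3, 2)), rng.normal(size=(5, 2)))
print(dm.shape)  # (3, 5)
```

The toy matrix shows the “competition for controls” case: the greedy pass lets the first treated unit take the control that the second unit needs far more.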
According to the official rules of the NBA 2016-2017 season, in a professional NBA regular game, each team is entitled to six full-length timeouts and one 20-second timeout for each half. A full-length timeout can last 60 or 100 seconds, depending on when it is requested. Also, every game has four regular periods plus as many overtime periods as are needed to break ties. A specific number of timeouts is expected in each period for commercial purposes. If neither team calls a timeout before a specific time, thus not fulfilling the next expected timeout, the official scorer stops the game and calls a timeout. This timeout is charged to the team that has not yet been charged with one, starting with the home team. These timeouts are called mandatory or official timeouts.

In basketball games, possessions are new opportunities to score. Each possession lasts from the moment a team gets hold of the ball until one of its players scores or commits a foul, or the team loses the ball through a defensive rebound or a turnover. The total number of possessions is guaranteed to be approximately the same for both teams at the end of a match, so it provides a good standardization for the points scored by each team [9]. Indeed, most basketball statistics are already given on a per-possession basis.

Play-by-play tables capture the main play events, such as shot attempts, rebounds, turnovers, fouls, substitutions, timeouts, and ends of quarters (periods). For each play, we have the time at which it happened, the players and/or team involved, and any other relevant information, e.g., the score just after the play. Each play event is recorded as a new line in the table.
While ball possessions are not explicitly recorded in play-by-play tables, one can identify every change of possession by observing the game events.

In this work, we use play-by-play tables to identify the ball possessions and use them to observe how the teams’ performance changes when timeouts are called. We identified every change of possession alongside the main interruptions (timeouts and ends of quarters) in each game. Every change of possession and every main interruption is considered a new game instant. We model each basketball game as a series of discrete game instants. More formally, a game instant is either (1) a regular ball possession; or (2) a major game interruption, which can be a regular timeout, an official scorer timeout, or the end of a quarter. Player substitutions and fouls were not considered main interruptions. In fact, substitutions can happen whenever the ball stops, not only during timeouts.
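The segmentation into game instants can be sketched in Python as below. The event vocabulary, the tuple layout `(kind, home_score, away_score)` and the names `to_game_instants` and `GameInstant` are hypothetical simplifications, not the Basketball-Reference table format or the authors’ code; in particular, free throws and other edge cases of possession inference are ignored here.

```python
from dataclasses import dataclass

# Hypothetical, simplified event vocabulary (not the Basketball-Reference schema).
POSSESSION_ENDING = {"made_shot", "def_rebound", "turnover"}
INTERRUPTIONS = {"timeout", "official_timeout", "end_of_quarter"}


@dataclass
class GameInstant:
    kind: str    # "possession" or one of INTERRUPTIONS
    margin: int  # home score minus away score at the end of the instant


def to_game_instants(events):
    """Collapse (kind, home_score, away_score) events into game instants."""
    instants = []
    for kind, home, away in events:
        if kind in POSSESSION_ENDING:
            instants.append(GameInstant("possession", home - away))
        elif kind in INTERRUPTIONS:
            instants.append(GameInstant(kind, home - away))
        # Missed shots, offensive rebounds, fouls and substitutions do not
        # close a game instant in this sketch.
    return instants


events = [
    ("made_shot", 2, 0),     # home scores: possession ends
    ("missed_shot", 2, 0),   # away misses ...
    ("def_rebound", 2, 0),   # ... home rebounds: possession ends without a score
    ("substitution", 2, 0),  # ignored
    ("made_shot", 4, 0),
    ("timeout", 4, 0),       # major interruption
    ("turnover", 4, 0),
]
instants = to_game_instants(events)
print([(g.kind, g.margin) for g in instants])
# [('possession', 2), ('possession', 2), ('possession', 4), ('timeout', 4), ('possession', 4)]
```

Margins are stored from the home team’s point of view; for a timeout called by the away team, the target team’s margin is simply the negated value.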
Here we describe our outcome variable associated with teams’ performance, for which we aim to estimate how it is affected by timeouts. Let $\{P_t\}$ be a univariate stochastic process associated with a single match and indexed by the discrete game instants $t$. At any timeout moment, the team calling the timeout is defined as the target team. The random variable $P_t$ at the end of the $t$-th game instant is the score of the target team minus the opposing team’s score, and it is called the scoring margin. Hence, $P_t$ is positive when the target team is winning the match at game instant $t$ and negative otherwise. At the end of the $t$-th game instant, $P_t = P_{t-1}$ in two situations. First, if $t$ is a regular ball possession instant whose attacking team did not score, i.e., the possession ended with a turnover or a defensive rebound. Second, if $t$ is a main interruption instant and, consequently, neither team had the opportunity to score.

In order to evaluate how “momentum” changes after a game instant, we use the Short-term Momentum Change (STMC), which is the amount by which the rate of scoring margin per possession changes right after a game instant. For any game instant $t$ and a positive integer $\lambda > 0$, we define the STMC, $Y_t^\lambda$, as the average rate of change from $P_t$ to $P_{t+\lambda}$ ($\Delta P_t^{t+\lambda}$) minus the average rate of change from $P_{t-\lambda-1}$ to $P_{t-1}$ ($\Delta P_{t-\lambda}^{t}$). Note that we do not take into account the possible change in scoring margin caused in game instant $t$ itself (the change from $P_{t-1}$ to $P_t$):

$$Y_t^\lambda = \frac{P_{t+\lambda} - P_t}{\lambda} - \frac{P_{t-1} - P_{t-\lambda-1}}{\lambda} = \Delta P_t^{t+\lambda} - \Delta P_{t-\lambda}^{t} \qquad (1)$$

for $t - \lambda \geq 1$ and $t + \lambda \leq n$, where $n$ is the total number of game instants in a given game.

The hyper-parameter $\lambda$ controls the time window used to evaluate how the game scoring dynamics change around $t$. To balance out the offensive and defensive ball possessions, $\lambda$ must be an even integer. Also, $Y_t^\lambda$ should only be evaluated if the interval $[t - \lambda, t + \lambda]$ contains no game interruptions, with the possible exception of $t$. From a causal perspective, $\lambda$ represents our assumption about how many game instants the intervention at game instant $t$ (calling a timeout or not) can influence, and be influenced by, in the short term.

Let $A_t$ be the binary indicator that a timeout has been called at time $t$. We denote $A_t = 1$ if a team calls a timeout right before game instant $t$, and $A_t = 0$ if $t$ is a regular ball possession. If we find a set of covariates $X$ that satisfies the conditional ignorability assumption, we can apply a matching technique, and our average causal effect of interest, $E(Y^{\lambda}_{A_t=1} - Y^{\lambda}_{A_t=0})$, can be estimated by taking the difference in means between the matched treatment and control groups. The estimated timeout effect TE is defined as:

$$\mathrm{TE} = E(Y_t^\lambda \mid A_t = 1) - E(Y_t^\lambda \mid A_t = 0). \qquad (2)$$

Every game involves two teams, the home team and the away team. Because we want to estimate the causal effect of timeouts on the performance of the team that actually asked for them, we decided to estimate the average causal effect of timeouts called by the home teams ($\mathrm{TE}_h$) and by the away teams ($\mathrm{TE}_a$) separately. We now present our causal model, which encodes our assumptions.

Fig. 1. A causal graph to model the timeout effect (nodes: $A_t$, $\Delta P_t^{t+\lambda}$, $\Delta P_{t-\lambda}^{t}$, $X_t$, $U$, $Y_t^\lambda$).

Pearl [14] suggests the use of directed acyclic graphs (DAGs) as a way of encoding causal model assumptions, with nodes representing random variables and directed edges representing direct causal relationships. In such a graph, one can identify a set of variables (or nodes) that satisfies the so-called back-door criterion [14]. These are variables that block all back-door paths from $A$ (the treatment variable) to $Y$ (the outcome variable) and do not include any descendants of $A$. Given that the graphical model includes all important confounding variables, it can be shown that conditioning on them suffices to remove all non-causal dependencies between $A$ and $Y$. In other words, it leaves only the causal dependence that corresponds to the causal effect.

There are many factors that can potentially influence the short-term performance (STMC) of the team that called a timeout after a given game instant. These can be intra-game factors, which vary along the game, such as the scoring margin, the quarter, and the time since the start of the quarter, or inter-game factors, which vary from game to game, such as the venue conditions, the attendance at the venue, the specific adversary team, the players available, and the teams’ momentum in the season.

It is intuitive why intra-game factors, which are specific to a game instant, are considered a cause of both the treatment and the outcome, and are thus confounders. Less obviously, some inter-game factors are also very likely to affect both the treatment and the outcome. For example, a team playing against a stronger or a weaker adversary would request the available timeouts differently, and its subsequent performance may be differently affected.

Figure 1 shows our causal model graph.
Each game instant $t$ can either receive the treatment assignment $A_t = 1$ or $A_t = 0$, meaning that the game instant is a timeout or a regular ball possession, respectively. The variables $X_t$ represent the observed covariates, which are intra-game factors specific to the game instant $t$: (i) the current quarter (period) ($Q_t$), (ii) the current scoring margin ($P_t$), and (iii) the current time in seconds since the start of the period ($S_t$). The variables represented by the node $U$ are the inter-game factors, i.e., the covariates related to a specific game that influence both the treatment assignment and the game outcome, as exemplified above. Most of these variables are not directly observed or are very difficult to measure: players’ and coaches’ strategies, the teams’ relative skill difference, and venue conditions. Hence, we include them in our graph as a dashed circle. The average rates of scoring margin change before ($\Delta P_{t-\lambda}^{t}$) and after ($\Delta P_t^{t+\lambda}$) the game instant $t$ are also in the graph, as is the outcome $Y_t^\lambda$, which is connected by dashed edges since it is a deterministic node, a logical function of the other two stochastic nodes.

We are interested in the causal effect of $A_t$ on $Y_t^\lambda$. Since $\Delta P_{t-\lambda}^{t}$ is a direct cause of $A_t$ and not the reverse (for obvious chronological reasons), we actually want to measure the causal effect of $A_t$ on $\Delta P_t^{t+\lambda}$. According to the back-door criterion [14], if we adjust for $U$, $X_t$, and $\Delta P_{t-\lambda}^{t}$, we block any non-causal influence of $A_t$ on $\Delta P_t^{t+\lambda}$.

Because we want to estimate TE as defined in Equation (2), our treated and control groups are formed by the game instants’ STMC values, $Y_t^\lambda$. As discussed in Section 4.1, depending on the value chosen for $\lambda$, $Y_t^\lambda$ may not be valid: this happens when the short-term window induced by $\lambda$ includes another major interruption besides the possible one at $t$, or extends beyond the start or the end of the game. Therefore, our inclusion criterion for both groups is that the STMC exists and can be calculated.
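The validity rule and Equation (1) can be sketched together in Python. The function name `stmc`, the list-based inputs, and the convention that index 0 holds the margin before the first game instant are assumptions for illustration; the margins must already be expressed from the target team’s point of view.

```python
def stmc(margins, kinds, t, lam):
    """Short-term Momentum Change Y_t^lambda from Equation (1), or None if invalid.

    margins[k] is the scoring margin P_k of the target team after instant k,
    with margins[0] assumed to be the margin before the first instant, and
    kinds[k] labels instant k ("possession" or an interruption type).
    """
    if lam <= 0 or lam % 2 != 0:
        raise ValueError("lambda must be a positive even integer")
    n = len(margins) - 1                      # game instants are 1..n
    if t - lam < 1 or t + lam > n:
        return None                           # window leaves the game
    window = [k for k in range(t - lam, t + lam + 1) if k != t]
    if any(kinds[k] != "possession" for k in window):
        return None                           # another interruption in the window
    after = (margins[t + lam] - margins[t]) / lam           # Delta P_t^{t+lambda}
    before = (margins[t - 1] - margins[t - lam - 1]) / lam  # Delta P_{t-lambda}^t
    return after - before


# Target-team margins around a timeout at instant 5 (hypothetical values).
margins = [0, 0, -2, -5, -7, -7, -5, -3, -3, -1]
kinds = ["start"] + ["possession"] * 4 + ["timeout"] + ["possession"] * 4
print(stmc(margins, kinds, t=5, lam=2))  # 4.5
print(stmc(margins, kinds, t=5, lam=4))  # 3.25
print(stmc(margins, kinds, t=3, lam=2))  # None: the timeout at 5 is inside the window
```

In the example, the target team’s margin was falling by 2.5 points per possession before the timeout and rising by 2 points per possession after it, giving $Y_5^2 = 2 - (-2.5) = 4.5$.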
For a given $\lambda$, the treated group, $\{Y_t^\lambda \mid A_t = 1\}$, is formed by the STMC values of the valid real timeouts. The control group, $\{Y_t^\lambda \mid A_t = 0\}$, is formed by the STMC of any valid game instant $t$ that is not a timeout or any other kind of major interruption.

Since we want to estimate $\mathrm{TE}_h$ and $\mathrm{TE}_a$, we have two treatment groups, one for timeouts called by home teams and one for timeouts called by away teams. On the other hand, it does not make sense to classify the control group as either home or away, so we have just one control group. From now on, we refer to these treatment groups simply as the home treatment group and the away treatment group.

Our data consist of play-by-play information for every game of the 2014-2015, 2015-2016, 2016-2017, and 2017-2018 NBA regular seasons. We crawled the data from the Basketball-Reference website. Most of our analysis considers only games from the NBA 2016-2017 season, because using more than a single season would lead to very large samples for which our matching approaches are impractical. We did perform the same analysis using each of the other seasons and achieved very similar results; the choice of the 2016-2017 season is arbitrary, mainly because it was the first season for which we collected the data.

The 2016-2017 season had a total of 30 teams and 1,309 games (1,230 in the regular season and 79 in the playoffs). Considering all games, we computed 281,373 game instants, including 17,765 identified timeouts: 7,754 called by home teams, 8,011 called by away teams, and 2,000 mandatory timeouts. Our datasets, code, and further instructions on how to reproduce our results can be found at our GitHub repository (https://github.com/pkdd-paper/paper667).

The variables $U$, $X_t$, and $\Delta P_{t-\lambda}^{t}$ should be controlled for. In other words, they should be considered possible confounders. Consequently, all of these variables are included in our matching for a valid causal inference.
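As a reference point for what adjusting for these variables achieves, the standard back-door adjustment formula [14] can be written as follows, where $Z$ is any set satisfying the back-door criterion relative to $(A, Y)$ (the sum becomes an integral for continuous components). Reading $Z = (U, X_t, \Delta P_{t-\lambda}^{t})$, $A = A_t$ and $Y = \Delta P_t^{t+\lambda}$ is our interpretation of the causal graph described above, not a formula stated by the authors:

$$
P\big(Y = y \mid \mathrm{do}(A = a)\big) = \sum_{z} P(Y = y \mid A = a, Z = z)\, P(Z = z),
$$

$$
E\big(Y^{A=1} - Y^{A=0}\big) = \sum_{z} \big[ E(Y \mid A = 1, Z = z) - E(Y \mid A = 0, Z = z) \big]\, P(Z = z).
$$

Matching does not evaluate these sums directly; it approximates the within-stratum contrasts by comparing treated and control instants with similar $Z$. Because $U$ is unobserved, its part of the adjustment cannot be computed from the data and has to be handled through the design of the matching.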
While we consider $U$, the inter-game factors, to be unobserved covariates, we can still control for them by pairing timeout examples with non-timeout examples taken from the same game. Furthermore, the variable $\Delta P_{t-\lambda}^{t}$ is likely the most important confounder in our model. Indeed, coaches tend to call a timeout when their teams are suffering from a bad “momentum”, which gives this variable great influence on the treatment assignment $A_t$. Also, $\Delta P_t^{t+\lambda}$, the average rate of scoring margin change after a game instant $t$, should be highly causally dependent on $\Delta P_{t-\lambda}^{t}$. Therefore, in whatever matching approach we use, only timeouts and control examples taken from the same game and with equal $\Delta P_{t-\lambda}^{t}$ are matched, hopefully also achieving balance for $X_t$. We also restrict our matches to be constructed with non-overlapping ball possessions. This restriction arises from our assumption, discussed in Section 4.1, that $\lambda$ defines a range of game instants that depend on and influence $A_t$.

We applied three matching procedures: (1) no-balance matching; (2) Mahalanobis matching; and (3) propensity score matching. In the no-balance matching, each treatment example is paired with a valid control example that has the same $\Delta P_{t-\lambda}^{t}$ and is taken from the same game; $X_t$ is not considered in this matching. For the Mahalanobis matching, we applied the Mahalanobis distance using all covariates in $X_t$, i.e., the current quarter ($Q_t$), the current scoring margin ($P_t$), and the current clock time in seconds since the start of the quarter ($S_t$). Finally, for the propensity score matching, we applied a simple Euclidean distance match on the estimated scalar propensity score.

The true propensity score $s(X) = P(A = 1 \mid X)$ is unknown and must be estimated. Since estimating $P(A = 1 \mid X)$ can be seen as a classification task, any supervised classification model could be used. While logistic regression is the most common estimation procedure for the propensity score, Lee et al.
[10] showed that, in a non-linear dependence scenario, using machine learning models such as boosted regression trees to estimate the propensity score achieves better covariate balance in the matched sample. Indeed, our treatment assignment presents a strong non-linear dependence on its covariates. Take the clock time $S_t$, for example. As explained in Section 4, the NBA timeout rules encourage coaches to call a timeout just before a mandatory timeout would have been called by the official scorer. We use the boosted regression tree algorithm implemented in the gbm R package [6] to estimate the propensity score using $X_t$.

Fig. 2. The STMC ($Y_t^\lambda$) distribution for home and away timeouts, considering three different ball possession windows $\lambda = 2, 4, 6$ (x-axis: Short-term Momentum Change $Y_t^\lambda$; y-axis: density).

Because we restrict timeout and non-timeout pairs to be taken from the same game, we have a very sparse matching problem. The rcbalance R package [17] implementation of optimal matching exploits such sparsity of treatment-control links to reduce the computational time for larger problems. We use the optimal algorithm implemented in this package for all the aforementioned matching approaches. In addition, before applying any matching technique, we retained in the control subpopulation only those non-timeout game instants $t'$ ($A_{t'} = 0$) for which the value $\Delta P_{t'-\lambda}^{t'}$ is exactly equal to at least one $\Delta P_{t-\lambda}^{t}$ calculated for a real timeout instant $t$ ($A_t = 1$) in the same game. This improved the running time even further.

In order to find out whether our data show the generally accepted positive correlation between timeouts and improvements in “momentum”, we calculated $Y_t^\lambda$ for every game instant associated with a timeout $t$ in every game, using $\lambda = 2, 4, 6$. Figure 2 shows the estimated density of the STMC $Y_t^\lambda$ for all timeouts, including those called by both home and away teams but excluding the official timeouts. The sample means and numbers of valid timeout examples are 0.629 and 14,031 for $\lambda = 2$, 0.421 and 12,225 for $\lambda = 4$, and 0.[…] and […],296 for $\lambda = 6$, respectively. These results show that, when a timeout is called by a team, its momentum improves by a small positive amount afterwards. For instance, with $\lambda = 4$, the average value of STMC is 0.421.

Table 1. Summary statistics and SMD for balance assessment for matching using the home treatment group. The control ($A_t = 0$) and timeout ($A_t = 1$) groups are presented before matching (BM) and after all three matching approaches: No-Balance (NB), Mahalanobis distance (M), and Propensity score (P). Cells show mean (sd).

| $\lambda$ | Method | $S_t$, $A_t = 0$ | $S_t$, $A_t = 1$ | $S_t$ SMD | $Q_t$, $A_t = 0$ | $Q_t$, $A_t = 1$ | $Q_t$ SMD | $P_t$, $A_t = 0$ | $P_t$, $A_t = 1$ | $P_t$ SMD |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | BM | 363.42 (198.77) | 410.03 (168.47) | 0.253 | 2.42 (1.12) | 2.68 (1.15) | 0.222 | 1.73 (10.81) | -0.00 (10.78) | 0.161 |
| 2 | NB | 363.11 (201.26) | 410.20 (168.52) | 0.254 | 2.47 (1.12) | 2.68 (1.15) | 0.178 | 0.56 (10.63) | 0.00 (10.78) | 0.052 |
| 2 | M | 397.73 (168.83) | 410.27 (168.50) | 0.074 | 2.61 (1.11) | 2.68 (1.15) | 0.062 | 0.53 (10.86) | 0.00 (10.78) | 0.049 |
| 2 | P | 403.85 (163.08) | 410.14 (168.50) | 0.038 | 2.67 (1.13) | 2.68 (1.15) | 0.009 | 0.23 (11.01) | 0.00 (10.78) | 0.021 |
| 4 | BM | 351.23 (187.75) | 388.45 (153.17) | 0.217 | 2.33 (1.11) | 2.58 (1.13) | 0.222 | 1.75 (10.69) | 0.22 (11.04) | 0.141 |
| 4 | NB | 351.59 (191.15) | 394.10 (151.23) | 0.247 | 2.33 (1.11) | 2.58 (1.14) | 0.219 | 0.44 (10.44) | 0.33 (11.14) | 0.011 |
| 4 | M | 381.35 (167.92) | 393.79 (151.12) | 0.078 | 2.43 (1.04) | 2.57 (1.14) | 0.135 | 0.50 (11.06) | 0.32 (11.12) | 0.016 |
| 4 | P | 385.03 (162.81) | 393.70 (151.37) | 0.055 | 2.56 (1.11) | 2.58 (1.14) | 0.017 | 0.38 (11.33) | 0.32 (11.12) | 0.005 |
| 6 | BM | 334.96 (170.51) | 380.80 (143.08) | 0.291 | 2.19 (1.08) | 2.49 (1.12) | 0.270 | 1.67 (10.34) | 0.33 (11.09) | 0.124 |
| 6 | NB | 332.33 (173.87) | 389.79 (139.28) | 0.365 | 2.20 (1.08) | 2.48 (1.12) | 0.253 | 0.71 (10.40) | 0.79 (11.13) | 0.007 |
| 6 | M | 351.13 (162.90) | 389.67 (138.95) | 0.255 | 2.27 (1.02) | 2.48 (1.12) | 0.200 | 0.77 (10.80) | 0.80 (11.17) | 0.002 |
| 6 | P | 356.99 (160.61) | 389.72 (139.63) | 0.218 | 2.35 (1.08) | 2.48 (1.12) | 0.122 | 0.72 (10.91) | 0.81 (11.16) | 0.008 |

[…] and a bootstrap-based test for the mean, with the null hypothesis that the mean is equal to zero.
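The section does not give the exact bootstrap configuration, so the following Python function is only a generic sketch of a bootstrap test of $H_0\colon E(Y_t^\lambda) = 0$; the name `bootstrap_mean_test`, the number of resamples, the seed, and the centring approach are assumptions. The sample is shifted to have mean zero (imposing the null hypothesis) and resampled with replacement.

```python
import numpy as np


def bootstrap_mean_test(y, n_boot=2_000, seed=0):
    """Two-sided bootstrap test of H0: E[Y] = 0 for a sample of STMC values.

    The sample is centred (imposing H0) and resampled with replacement; the
    p-value is the share of resampled means at least as far from zero as the
    observed mean.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = y.mean()
    centred = y - observed
    boot_means = np.array([
        rng.choice(centred, size=centred.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    p_value = float(np.mean(np.abs(boot_means) >= abs(observed)))
    return observed, p_value


# Synthetic STMC-like values with a small positive mean (hypothetical data).
rng = np.random.default_rng(42)
sample = rng.normal(loc=0.4, scale=3.0, size=12_000)
mean, p = bootstrap_mean_test(sample)
print(f"mean = {mean:.3f}, bootstrap p-value = {p}")
```

With a finite number of resamples, the smallest nonzero p-value this procedure can report is `1 / n_boot`; when no resampled mean is as extreme as the observed one, the estimate is exactly zero, which is one way p-values end up numerically equal to zero.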
Both tests yielded p-values numerically equal to zero for all three values of λ. These results may explain why the belief that timeouts improve a team's performance, or break the opponent's momentum, is so common and widespread. They are not, however, evidence of a causal effect of timeouts. We now move on to the analysis under the causal framework discussed in Section 4.

Each of the three matching methods was applied twice. The first run used the treatment group of away timeouts together with the control group. The second used the treatment group of home timeouts together with the control group. Some timeouts did not find a valid match and were therefore left out of the matched samples. All matches were performed without replacement.

A common numerical measure of covariate balance between the treatment groups is the standardized mean difference (SMD) [4]. It is the difference in means of each covariate divided by its pooled standard deviation. Unlike t-tests, the SMD is not influenced by sample sizes, and it allows comparison between variables measured in different units. There is no general consensus on which SMD value marks an acceptable level of imbalance, although some researchers have proposed a threshold of 0.1 [13]. Table 1 summarizes the covariate distributions, with their means and standard deviations, and the SMDs for all matching approaches. For simplicity, we only include the results for the home timeouts matched samples. The away samples showed very similar results, which are also available in our GitHub repository¹. We also do not show the balance of $\Delta P_{t}^{t-\lambda}$, as it is perfectly balanced: we matched exactly on this covariate. The unmatched sample sizes for the control groups are 172,[…], […]048 and 5,127 with λ = 2, […] λ = 2, […] 6, they were 6,[…], […]477 and 3,[…] […] the $\Delta P_{t}^{t-\lambda}$ covariate is equally distributed.

In Table 1, comparing no-balance matching with the balance before matching shows the following. This simple procedure substantially reduces the SMD of $P_t$ and slightly reduces that of $Q_t$. It does not reduce the SMD of $S_t$, which actually increases for every λ. This is not enough to make the SMDs negligible, so any conclusion based on the no-balance matched samples would rightly be subject to doubt. Mahalanobis matching achieved better covariate balance than both before matching and no-balance matching for nearly all covariates and λ values. For λ = 2 and λ = 4, all of its SMDs are below 0.1 in both analyses, with the exception of the $Q_t$ covariate with λ = 4. The λ = 6 configuration, on the other hand, presented the worst SMDs: $P_t$ is the only covariate with an SMD below the 0.1 threshold. Mahalanobis matching was not as good as the
Propensity score matching. With propensity score matching, all covariates have SMDs smaller than 0.1 for λ = 2 and λ = 4 in both analyses. For λ = 6, while $Q_t$ and $S_t$ still had SMDs above the 0.1 threshold, […] Figure 3 shows the distributions of $S_t$, $Q_t$ and $P_t$ after all three matchings with λ = 4, in the analysis using timeouts called by the home team. No-balance and Mahalanobis matching present some imbalance for $S_t$ and $Q_t$. Propensity score matching, on the other hand, yields rather similar distributions, suggesting much better balance. Timeouts and control moments are matched only if they come from the same game, which makes it harder to find matches with well-balanced $X_t$. Mahalanobis matching, which tries to pair samples that are as close as possible, runs into great difficulties under this constraint. The propensity score, however, summarizes all interactions and non-linearities present in the joint distribution of the covariates in a single score. This is why propensity score matching achieved better balance.
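The SMD balance check described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code, and the function names are hypothetical. It assumes the common convention that the pooled standard deviation is the square root of the average of the two group variances; the text only says "pooled standard deviation". Applied to the rounded summaries in Table 1, this convention reproduces the before-matching SMDs of $S_t$.

```python
import math
from statistics import fmean, stdev


def smd_from_summary(mean_c, sd_c, mean_t, sd_t):
    """Absolute standardized mean difference from group means and SDs.

    Assumes the pooled SD is sqrt((sd_c**2 + sd_t**2) / 2).
    """
    pooled_sd = math.sqrt((sd_c ** 2 + sd_t ** 2) / 2)
    return abs(mean_t - mean_c) / pooled_sd


def smd(control, treated):
    """Absolute SMD of one covariate computed from raw control/treated samples."""
    return smd_from_summary(fmean(control), stdev(control),
                            fmean(treated), stdev(treated))


def is_balanced(value, threshold=0.1):
    """Rule-of-thumb threshold proposed in [13]; there is no general consensus on it."""
    return value < threshold


# S_t before matching, lambda = 2, home timeouts (rounded values from Table 1).
print(round(smd_from_summary(363.42, 198.77, 410.03, 168.47), 3))  # 0.253
```

On the raw data, `smd` would be called once per covariate ($S_t$, $Q_t$, $P_t$) on the control and timeout groups of each matched sample. Taking the absolute value matches the non-negative SMDs reported in Table 1.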
Fig. 3. The control (magenta) and timeout (turquoise) distributions of the seconds ($S_t$), quarter ($Q_t$) and scoring margin ($P_t$) covariates ($X_t$) after all matchings (No-balance, Mahalanobis, Propensity), using the home timeouts treatment group and λ = 4.

¹ https://github.com/pkdd-paper/paper667

We analyzed our matched data using a Monte Carlo permutation test [2, Vol. II, Chapter 10], with the difference of means between the two groups as the test statistic. Under the null hypothesis, the timeout effect $TE$ defined in (2) is equal to zero. The treatment labels were permuted ten thousand times. Note that our samples are large: the number of treatment-control pairs in each test varies from 3,766 to 7,[…]. Table 2 shows the estimated timeout effects for the home ($TE_h$) and the away ($TE_a$) teams. It also includes 99% confidence intervals generated from the p-values obtained with the Monte Carlo permutation test.

There are two fundamental remarks here. First, the estimated timeout effect $TE$ is very small in all cases, practically irrelevant during a game. Recall that $TE$, defined in Equation (2), is the amount by which a team should expect its scoring margin per possession to change if it were to call a timeout. It would be rather challenging to establish the minimum effect size large enough to change a team's performance. A reasonable choice, however, would be an effect of at least one point of scoring margin, especially because we assume that timeouts act within the short-term window represented by λ. Yet the largest absolute timeout effect in our results, from propensity score matching with λ = 4, is only −0.059. That is a change of 0.059 points per possession over the next 4 possessions, which can be considered negligible in a basketball game. Second, the confidence intervals show that very few tests are significant, despite the very large number of examples in each of them. From the 18
Table 2. The estimated timeout effect for away ($TE_a$) and home ($TE_h$) timeouts under the different matching procedures and λ values, with the respective 99% confidence intervals. While some values are statistically significant, all timeout effects are negligible due to their small effect sizes.

| λ | Method | $TE_a$ | 99% CI | $TE_h$ | 99% CI |
|---|---|---|---|---|---|
| 2 | No-balance | -0.028 | (-0.075, 0.020) | -0.022 | (-0.072, 0.028) |
| | Mahalanobis | -0.032 | (-0.079, 0.015) | -0.017 | (-0.067, 0.032) |
| | Propensity | -0.044 | (-0.092, 0.004) | -0.021 | (-0.071, 0.028) |
| 4 | No-balance | -0.043 | (-0.082, -0.004) | -0.023 | (-0.063, 0.017) |
| | Mahalanobis | -0.053 | (-0.092, -0.013) | -0.013 | (-0.053, 0.028) |
| | Propensity | -0.059 | (-0.098, -0.020) | -0.032 | (-0.072, 0.008) |
| 6 | No-balance | -0.031 | (-0.068, 0.005) | -0.036 | (-0.072, 0.000) |
| | Mahalanobis | -0.046 | (-0.083, -0.010) | -0.036 | (-0.072, 0.000) |
| | Propensity | -0.044 | (-0.081, -0.008) | -0.046 | (-0.082, -0.009) |
Table 3. Additional analyses excluding, or considering only, the last 5 minutes of each game. For each analysis, the same matching approaches were applied to the new subsets of data. The results are very similar to those of the original analysis considering the whole game.

| λ | Method | $TE_a$ (minus last 5 min) | 99% CI | $TE_h$ (minus last 5 min) | 99% CI | $TE_a$ (only last 5 min) | 99% CI | $TE_h$ (only last 5 min) | 99% CI |
|---|---|---|---|---|---|---|---|---|---|
| 2 | No-balance | -0.029 | (-0.082, 0.024) | -0.010 | (-0.065, 0.045) | -0.083 | (-0.115, -0.052) | -0.033 | (-0.067, 0.001) |
| | Mahalanobis | -0.021 | (-0.074, 0.032) | -0.010 | (-0.065, 0.045) | -0.091 | (-0.122, -0.060) | -0.040 | (-0.074, -0.006) |
| | Propensity | -0.044 | (-0.098, 0.009) | -0.012 | (-0.067, 0.043) | -0.094 | (-0.125, -0.063) | -0.038 | (-0.072, -0.004) |
| 4 | No-balance | -0.036 | (-0.078, 0.006) | -0.030 | (-0.073, 0.013) | -0.006 | (-0.068, 0.055) | 0.038 | (-0.029, 0.105) |
| | Mahalanobis | -0.055 | (-0.097, -0.013) | -0.022 | (-0.065, 0.020) | -0.019 | (-0.081, 0.044) | 0.026 | (-0.041, 0.093) |
| | Propensity | -0.057 | (-0.099, -0.015) | -0.038 | (-0.081, 0.004) | -0.019 | (-0.082, 0.043) | 0.007 | (-0.060, 0.074) |
| 6 | No-balance | -0.044 | (-0.082, -0.005) | -0.041 | (-0.079, -0.003) | -0.046 | (-0.315, 0.223) | -0.032 | (-0.250, 0.186) |
| | Mahalanobis | -0.051 | (-0.089, -0.012) | -0.040 | (-0.078, -0.002) | -0.028 | (-0.299, 0.244) | -0.048 | (-0.267, 0.172) |
| | Propensity | -0.046 | (-0.085, -0.008) | -0.055 | (-0.093, -0.017) | -0.046 | (-0.317, 0.224) | -0.048 | (-0.265, 0.170) |

tests, only 6 were statistically significant at the α = 0.01 level, that is, their 99% confidence intervals exclude zero. In fact, these confidence intervals barely exclude the zero effect. Even these statistically significant effects are negligible in magnitude. We did not perform any adjustment for multiple testing, and such an adjustment could only reduce the number of statistically significant tests.

We went further and investigated the timeout effect $TE$ in two particular cases: (i) when the last five minutes are excluded, and (ii) when only the final five minutes of the games are taken into account. For both cases, we reran the matching approaches on the new subsets of the data. Before matching, for (i) we removed from the treatment and control groups the examples that happened within the last five minutes of the last (4th) quarter of each game. Similarly, for (ii) we removed the examples that happened outside the last five minutes of each game. This left fewer sample units available for matching in case (ii), so there we also included examples from the other NBA seasons, i.e., the 2014-2015, 2015-2016 and 2017-2018 seasons.

The results of both analyses are in Table 3. When the last five minutes are excluded, there are slightly more statistically significant tests, 8 out of 18. Still, all of them have small effect sizes, so they are not practically significant. The largest estimated effect is −0.057, for $TE_a$ with propensity score matching and λ = 4. When only the last five minutes are considered, all 5 statistically significant tests were found with λ = 2, again with very small effect sizes. The largest absolute effect is −0.094, for propensity score matching.
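The Monte Carlo permutation test behind the p-values and intervals in Tables 2 and 3 can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' implementation, and it rests on three assumptions. First, treatment labels are shuffled over the pooled matched sample; the text does not say whether shuffling is restricted to within pairs. Second, the absolute difference of means serves as a two-sided statistic. Third, an add-one correction is applied to the Monte Carlo p-value. `treated` and `control` stand for the outcome values entering $TE$ in Equation (2) for matched timeout and control moments. The confidence-interval construction is not reproduced here.

```python
import random
from statistics import fmean


def permutation_test(treated, control, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns (observed difference of means, p-value).
    """
    rng = random.Random(seed)
    observed = fmean(treated) - fmean(control)
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(n_perm):
        # Reassign the treatment labels at random over the pooled sample.
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_t]) - fmean(pooled[n_t:])
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction: the observed labelling counts as one permutation.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical outcomes for a handful of matched timeout / control moments.
treated = [0.5, -1.0, 0.0, 2.0, -0.5, 1.0]
control = [0.0, 1.5, -0.5, 0.5, -1.0, 1.0]
te, p = permutation_test(treated, control, n_perm=2_000)
print(f"TE = {te:.3f}, p = {p:.3f}")
```

A within-pair alternative would randomly flip the sign of each pair's outcome difference instead of shuffling labels globally. Both are valid permutation schemes, and the choice here is only an assumption.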
In this work we proposed a causal framework to quantify the effect of timeouts in basketball games. To the best of our knowledge, we are the first to use the theory of causality to address this problem. All previous studies pointed to a positive timeout effect. By applying our causal model to a large dataset of official NBA play-by-play data, however, we concluded that timeouts have no effect on teams' performance. This is another example of what statisticians call the regression to the mean phenomenon. Most timeouts are called when the opposing team is scoring more frequently. The moments that follow therefore resemble an improvement in the team's performance, but they are just the game's natural tendency to return to its average state. We have also stratified our analysis by including either only the last five minutes or everything but the last five minutes of all games. The results pointed to the same conclusion: timeouts have virtually no effect on teams' performance.
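The regression-to-the-mean explanation can be illustrated with a toy simulation. This is a hypothetical sketch, not the authors' model, and it makes three assumptions. Per-possession scoring margins are drawn i.i.d. from a standard normal distribution, so there is no momentum and no timeout effect by construction. A "timeout" is called whenever the margin summed over the last λ = 4 possessions drops below an arbitrary threshold of −3. The naive before/after comparison is then mimicked by averaging the margin over the λ possessions before and after each call.

```python
import random
from statistics import fmean


def simulate_regression_to_mean(n_games=200, n_poss=200, lam=4,
                                threshold=-3.0, seed=7):
    """Toy model: i.i.d. margins, so any post-'timeout' recovery is pure selection."""
    rng = random.Random(seed)
    pre, post = [], []
    for _ in range(n_games):
        # Scoring margin of each possession: no momentum, no timeout effect.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_poss)]
        t = lam
        while t + lam <= n_poss:
            if sum(x[t - lam:t]) < threshold:
                # Bad streak over the last lam possessions: hypothetical timeout at t.
                pre.append(fmean(x[t - lam:t]))
                post.append(fmean(x[t:t + lam]))
                # Skip ahead so the next pre-window starts after this post-window.
                t += 2 * lam
            else:
                t += 1
    return fmean(pre), fmean(post), len(pre)


pre_mean, post_mean, n_timeouts = simulate_regression_to_mean()
print(f"{n_timeouts} timeouts: before {pre_mean:+.3f}, after {post_mean:+.3f}")
```

The pre-timeout mean comes out clearly negative because the calls are triggered by bad streaks. The post-timeout mean stays close to zero, because the following possessions are independent of the trigger. The naive before/after difference is therefore positive even though the simulated timeouts do nothing.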
Acknowledgments
This work is supported by the authors' individual grants from FAPEMIG, CAPES and CNPq.
References
1. Barnett, A.G., Van Der Pols, J.C., Dobson, A.J.: Regression to the mean: what it is and how to deal with it. International Journal of Epidemiology (1), 215–220 (2004)
2. Bickel, P.J., Doksum, K.A.: Mathematical Statistics: Basic Ideas and Selected Topics, Volumes I-II. Chapman and Hall/CRC (2015)
3. Coffino, M.J.: Odds-On Basketball Coaching: Crafting High-Percentage Strategies for Game Situations. Rowman & Littlefield Publishers (2017)
4. Flury, B.K., Riedwyl, H.: Standard distance in univariate and multivariate analysis. The American Statistician (3), 249–251 (1986)
5. Gómez, M.A., Jiménez, S., Navarro, R., Lago-Peñas, C., Sampaio, J.: Effects of coaches' timeouts on basketball teams' offensive and defensive performances according to momentary differences in score and game period. European Journal of Sport Science (5), 303–308 (2011)
6. Greenwell, B., Boehmke, B., Cunningham, J., GBM Developers: gbm: Generalized Boosted Regression Models (2019), https://CRAN.R-project.org/package=gbm, R package version 2.1.5
7. Halldorsson, V.: Coaches' Use of Team Timeouts in Handball: A Mixed Method Analysis. The Open Sports Sciences Journal (1), 143–152 (Oct 2016)
8. Holland, P.W.: Statistics and causal inference. Journal of the American Statistical Association (396), 945–960 (1986)
9. Kubatko, J., Oliver, D., Pelton, K., Rosenbaum, D.T.: A starting point for analyzing basketball statistics. Journal of Quantitative Analysis in Sports (3) (2007)
10. Lee, B.K., Lessler, J., Stuart, E.A.: Improving propensity score weighting using machine learning. Statistics in Medicine (3), 337–346 (2010)
11. Mace, F.C., Lalli, J.S., Shea, M.C., Nevin, J.A.: Behavioral momentum in college basketball. Journal of Applied Behavior Analysis (3), 657–663 (1992)
12. Neyman, J.: On the application of probability theory to agricultural experiments. Essay on principles. Section 9 (1923). Edited and translated by D.M. Dabrowska and T.P. Speed (1990). Statistical Science (4), 465–472
13. Normand, S.L.T., Landrum, M.B., Guadagnoli, E., Ayanian, J.Z., Ryan, T.J., Cleary, P.D., McNeil, B.J.: Validating recommendations for coronary angiography following acute myocardial infarction in the elderly: a matched analysis using propensity scores. Journal of Clinical Epidemiology (4), 387–398 (2001)
14. Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press (2009)
15. Pearl, J., Mackenzie, D.: The Book of Why: The New Science of Cause and Effect. Basic Books (2018)
16. Permutt, S.: The Efficacy of Momentum-Stopping Timeouts on Short-Term Performance in the National Basketball Association. Ph.D. thesis, Haverford College, Department of Economics (2011)
17. Pimentel, S.D.: rcbalance: Large, Sparse Optimal Matching with Refined Covariate Balance (2017), https://CRAN.R-project.org/package=rcbalance, R package version 1.8.5
18. Roane, H.S., Kelley, M.E., Trosclair, N.M., Hauer, L.S.: Behavioral momentum in sports: a partial replication with women's basketball. Journal of Applied Behavior Analysis (3), 385–390 (2004)
19. Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika (1), 41–55 (1983)
20. Saavedra, S., Mukherjee, S., Bagrow, J.P.: Is coaching experience associated with effective use of timeouts in basketball? Scientific Reports, 676 (2012)
21. Sampaio, J., Lago-Peñas, C., Gómez, M.A.: Brief exploration of short and mid-term timeout effects on basketball scoring according to situational variables. European Journal of Sport Science (1), 25–30 (2013)
22. Silva, J.M., Cornelius, A.E., Finch, L.M.: Psychological Momentum and Skill Performance: A Laboratory Study. Journal of Sport and Exercise Psychology (2), 119–133 (Mar 1992)
23. Stuart, E.A.: Matching methods for causal inference: A review and a look forward. Statistical Science (1), 1 (2010)
24. Zetou, E., Kourtesis, T., Giazitzi, K., Michalopoulou, M.: Management and Content Analysis of Timeout during Volleyball Games. International Journal of Performance Analysis in Sport 8