Stop the Clock: Are Timeout Effects Real?

Abstract

Timeout is a short interruption during games used to communicate a change in strategy, to give the players a rest or to stop a negative flow in the game. Whatever the reason, coaches expect an improvement in their team's performance after a timeout. But how effective are these timeouts in doing so? The simple average of the differences between the scores before and after the timeouts has been used as evidence that there is an effect and that it is substantial. We claim that these statistical averages are not proper evidence and a more sound approach is needed. We applied a formal causal framework using a large dataset of official NBA play-by-play tables and drew our assumptions about the data generation process in a causal graph. Using different matching techniques to estimate the causal effect of timeouts, we concluded that timeouts have no effect on teams' performances. Actually, since most timeouts are called when the opposing team is scoring more frequently, the moments that follow resemble an improvement in the team's performance but are just the natural game tendency to return to its average state. This is another example of what statisticians call the regression to the mean phenomenon.


Niander Assis, Renato Assunção, and Pedro O.S. Vaz-de-Melo

Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
{niander,assuncao,olmo}@dcc.ufmg.br

Keywords: causal inference · sports analytics · timeout effect · momentum · Bayesian networks

In sports, a timeout is a short interruption in play commonly used to stop a negative flow in the game, to discuss a strategy change, or to rest the players [3]. As this is the most direct way coaches can intervene during a game, their influence and strategic ability are best expressed during these events. A timeout is usually called when a team has a rather long streak of score losses [7, 24]. Popular belief and research [5, 11, 16, 18, 21] have found a positive effect on teams' performance after the timeout. That is, on average, the team asking for the timeout recovers from the losses by scoring positively immediately after. This observed difference has been wrongly used as evidence that the timeout has a real and positive effect on teams' performance. To answer such a causal question, a formal counterfactual analysis should be used, and that is what we propose in this work.

There is intense interest in causal models for analyzing non-experimental data, since causal reasoning can answer questions that machine learning by itself cannot [15]. Our approach is built on top of these causal inference approaches, which are briefly introduced in Section 3. For each timeout event at time $t_r$ in the database, we found a paired moment $t_c$ in the same game when no timeout was called, which serves as a control moment for $t_r$, reflecting what would have happened had the timeout not been called. This control moment is chosen based on other variables describing the current game instant, which were drawn in a causal graph that depicts our assumptions about the generation of the data and, as we will further discuss, lets us assess whether the causal effect can be estimated without bias. To quantify how the game changed just after a given moment $t$, we propose the Short-term Momentum Change (STMC), which is discussed in Section 4.1.

After using a matching approach to construct our matched data (pairs $(t_r, t_c)$), we found virtually no difference between the distribution of the STMC for real timeouts and control instants, i.e., the estimated timeout effect is very close to zero or non-existent. Hence, we conclude that the apparent positive effect of timeouts is another example of the well-known regression to the mean fallacy [1]. The dynamic match score fluctuates naturally and, after an intense increase, commonly returns towards a mild variation. Thus, because timeouts are usually called near the extreme moments, as we will show, the game seems to benefit those losing. In summary, the main contributions of this paper are the following:

– We proposed a metric called Short-term Momentum Change (STMC) to quantify how much the game momentum changes after a time moment $t_r$ associated with an event, such as a regular ball possession or an interruption of the game;
– We collected and organized a large dataset covering all the play-by-play information for all National Basketball Association (NBA) games of the four regular seasons from 2015 to 2018. A single season has over 280 thousand game instants, as we define in Section 4, and over 17 thousand timeout events;
– After a detailed causal inference analysis to evaluate the timeout effect on the Short-term Momentum Change, we did not find evidence that the timeout effect exists or that its effect size is meaningful. Inspired by others, we also considered two other settings in which the timeout effect could be different: (i) only the last five minutes of the games and (ii) everything but the last five minutes.

The next section describes the previous work carried out on the effect of timeouts and discusses how our work differs from it. Section 3 gives a background on causal inference and the statistical models adopted. We start Section 4 by summarizing timeout rules in the NBA and describing our dataset, the play-by-play tables. In Section 4.1 we introduce our outcome variable of interest, the Short-term Momentum Change, and in Section 4.2 our causal model. In Section 4.3, we describe our treatment and control groups and, in Section 4.4, we explain our matching approaches. All the results are presented in Section 5. We close the paper in Section 6 with our conclusions.
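The regression-to-the-mean argument made in the introduction can be illustrated with a small simulation. The sketch below is not the analysis of this paper: it assumes a toy game whose per-instant margin changes are independent draws (so there is neither momentum nor any timeout effect), and a hypothetical selection rule that marks as "timeout-like" every instant preceded by a bad run. The function name, the step distribution and the threshold are all illustrative assumptions.

```python
import random


def selected_vs_overall_change(n_games=200, n_instants=200, lam=4,
                               threshold=-0.5, seed=0):
    """Return (mean change after bad runs, mean change over all instants)."""
    rng = random.Random(seed)
    selected, overall = [], []
    for _ in range(n_games):
        # Hypothetical i.i.d. margin changes per game instant: by construction
        # there is no momentum and no timeout effect in this toy game.
        steps = [rng.choice([-2, 0, 0, 2]) for _ in range(n_instants)]
        for t in range(lam, n_instants - lam + 1):
            before = sum(steps[t - lam:t]) / lam   # average rate before t
            after = sum(steps[t:t + lam]) / lam    # average rate after t
            overall.append(after - before)
            if before <= threshold:                # "timeout-like" moments
                selected.append(after - before)
    return sum(selected) / len(selected), sum(overall) / len(overall)


if __name__ == "__main__":
    after_bad_run, everywhere = selected_vs_overall_change()
    print(f"mean change after a bad run: {after_bad_run:.3f}")
    print(f"mean change over all instants: {everywhere:.3f}")
```

Under these assumptions the average change over all instants stays near zero, while the average change at the selected instants is clearly positive even though nothing happens at them. Selecting on an extreme "before" window is enough to produce an apparent recovery.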

Timeouts are used and implemented in team sports for several reasons, such as to rest or change players, to inspire morale, to discuss plays, or to change the game strategy [20]. However, timeouts are mostly used to stop a negative flow in the game [7, 24], which is popularly referred to as "the game momentum." In basketball, momentum arises when one team is scoring significantly more than the other [11, 22].

Several earlier studies analyzed the effect timeouts have on decreasing the opponent's momentum in the game [5, 11, 18, 21]. These studies analyze teams' performance just before and after a timeout was called. For instance, by using a small sample of seven televised games from the 1989 National Collegiate Athletic Association (NCAA) tournament, Mace et al. [11] recorded specific events of interest, which were classified as either Reinforcers (e.g. successful shots) or Adversities (e.g. turnovers), and verified that the rate of these events changes significantly among teams in the 3 minutes before and after each timeout. They found that while the team that called the timeout improved its performance, the opposing team's performance decreased. Other works reached the same conclusions using similar methodologies and different data sets [5, 18, 21].

To the best of our knowledge, Permutt [16] was the first to acknowledge the regression to the mean phenomenon in such analyses. Permutt considered specific game moments: timeouts called after a team suffered a loss of six consecutive points. Similar to others, the short-term scoring ratio was observed to be higher after timeouts. However, in contrast to others, the paper compares real timeouts with other similar game moments without a timeout. With such analysis, Permutt found that timeouts can be effective at enhancing performance, but at a small magnitude. The most significant result shows that the home team with a "first-half restriction" presents a 0.21 increase in average ratio for the next ten points. Calling a timeout predicts that the home team will score 5.47 out of the next ten points, as opposed to 5.26 points when a timeout is not called. Thus, the conclusion is that timeouts do not have any significant effect in changing the momentum of a game, i.e., using 6-0 runs as an indicator of instances where momentum would be a factor, teams were successful at "reversing" momentum even without the timeout as a mediator.

Although the work of Permutt [16] innovates by considering counterfactuals, the analyses still leave room for reasonable doubts about the reality of the timeout effect. It fails to take into account the existence of other important factors that could also influence the momentum change and confound the true timeout effect. As a result, spurious correlations could have caused the lack of effect observed in the data. In our work, we take into account other factors such as coaches' and teams' abilities, stadium and match conditions, clock time, quarter and relative score between the teams. More importantly, unlike all the studies described in this section, we adopt a formally defined causal model approach [14] with a counterfactual analysis based on a constructed control group. We show in a compelling way that timeouts do not have an effect on teams' performance.


For illustration, consider $Y$ our outcome variable and $A \in \{0, 1\}$ the treatment variable. Regardless of the actual value of $A$, we define $Y_{A=1}$ to be the value of $Y$ had $A$ been set to $A = 1$ and $Y_{A=0}$ to be the value of $Y$ had $A$ been set to $A = 0$. We say there is a causal effect of $A$ on $Y$ if $Y_{A=0} \neq Y_{A=1}$ and, conversely, there is no causal effect or the effect is null if $Y_{A=0} = Y_{A=1}$. These values are called potential outcomes [12] because just one potential outcome is factual, truly observed, while the others are counterfactuals. Therefore, we cannot generally identify the causal effects of a single individual. This problem is known as the Fundamental Problem of Causal Inference (FPCI) [8].

Nevertheless, in most causal inference settings, the real interest is in the population-level effect, or the average causal effect, defined by $E(Y_{A=1} - Y_{A=0})$. The difference $E(Y \mid A = 1) - E(Y \mid A = 0)$ gives a reliable estimate in randomized experimental studies, where treatment $A$ is assigned randomly to each unit. However, in observational studies, we need to collect more information to control for and to make assumptions. One important assumption is conditional ignorability [19]. This assumption is satisfied if, given a vector of covariates $X$, the treatment variable $A$ is conditionally independent of the potential outcomes ($Y_{A=0} \perp\!\!\!\perp A \mid X$ and $Y_{A=1} \perp\!\!\!\perp A \mid X$), and there is a positive probability of receiving each treatment level for all values of $X$ ($0 < P(A = 1 \mid X = x) < 1$ for all $x$). The conditional ignorability assumption allows us to state that $E(Y_{A=1} - Y_{A=0} \mid X) = E(Y \mid A = 1, X) - E(Y \mid A = 0, X)$ for every value of $X$. This represents the basic rationale behind the matching technique.

The simplest matching technique is exact matching. For each possible $X = x$, we form two subgroups: one composed of individuals that received the treatment and have $X = x$, and the other of individuals that did not receive the treatment and have $X = x$. Unfortunately, exact matching is not feasible when the number of covariates is large or some are continuous. As an alternative, examples are usually matched according to a distance metric $d_{ij} = d(x_i, x_j)$ between the covariate configurations of pairs $(i, j)$ of observations. The Mahalanobis distance [23] is a common choice for such a distance metric as it takes into account the correlation between the different features in the vector $X$. Another option is to use the propensity score [19] for estimation of causal effects, which is defined as the probability of receiving treatment given the covariates, i.e., $s(X) = P(A = 1 \mid X)$. Rosenbaum and Rubin proved that it is enough to match on a distance calculated using the scalar scores $s(x)$, rather than the entire vector $x$ [19].

In general, each matching approach can be implemented using algorithms that are mainly classified as either greedy or optimal. The greedy ones, also known as nearest-neighbor matching, match the $i$-th treated example with the available control example $j$ that has the smallest distance $d_{ij}$. Optimal matching, however, takes into account the whole reservoir of examples, since the goal is to generate a matched sample that minimizes the total sum of distances between the pairs. Such an optimal approach can be preferred in situations where there is great competition for controls. For a good review of different matching methods for causal inference, see [23].

Whatever matching technique is used, its success can be partially judged by how well balanced the covariates are in the treatment and control groups. After pruning unmatched examples from the dataset, the control and treated groups of the remaining matched sample should have similar covariate distributions, in which case we say that matching achieves balance of the covariate distributions.
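The difference between greedy and optimal matching described above can be seen on a toy example. The sketch below is only an illustration under stated assumptions: units are represented by a single scalar score (for instance an estimated propensity score), the distance is the absolute difference, and the optimal matching is found by brute force, which is only feasible for a handful of units. The function names are hypothetical and do not come from any matching package.

```python
from itertools import permutations


def greedy_match(treated, controls, dist):
    """Pair each treated unit, in order, with the closest unused control."""
    available = set(range(len(controls)))
    pairs = []
    for i, x in enumerate(treated):
        if not available:
            break
        j = min(available, key=lambda k: dist(x, controls[k]))
        available.remove(j)
        pairs.append((i, j))
    return pairs


def optimal_match(treated, controls, dist):
    """Brute-force the pairing with the smallest total distance.

    Assumes at least as many controls as treated units and tiny inputs.
    """
    best_pairs, best_cost = None, float("inf")
    for perm in permutations(range(len(controls)), len(treated)):
        cost = sum(dist(treated[i], controls[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_pairs, best_cost = list(enumerate(perm)), cost
    return best_pairs


def total_distance(pairs, treated, controls, dist):
    """Sum of distances over the matched (treated, control) pairs."""
    return sum(dist(treated[i], controls[j]) for i, j in pairs)


if __name__ == "__main__":
    scores_treated = [0.50, 0.60]    # illustrative scalar scores
    scores_controls = [0.56, 0.10]

    def absdiff(a, b):
        return abs(a - b)

    for name, matcher in [("greedy", greedy_match), ("optimal", optimal_match)]:
        pairs = matcher(scores_treated, scores_controls, absdiff)
        cost = total_distance(pairs, scores_treated, scores_controls, absdiff)
        print(name, pairs, round(cost, 2))
```

In this example the first treated unit greedily takes the control that the second treated unit needs more, so the greedy total distance is larger than the optimal one. This is the "competition for controls" situation in which optimal matching can be preferred.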

According to the NBA 2016-2017 season official rules, in a professional NBA regular game, each team is entitled to six full-length timeouts and one 20-second timeout for each half. A full-length timeout can be of 60 seconds or 100 seconds, depending on when the timeout was requested. Also, every game has four regular periods plus as many overtime periods as necessary to break ties. There is a specific number of timeouts expected in each period for commercial purposes. If neither team calls a timeout before a specific time, thus not fulfilling the next expected timeout, the official scorer stops the game and calls a timeout. The timeout is charged to the team that has not been charged before, starting with the home team. These timeouts are called mandatory or official timeouts.

In basketball games, possessions are new opportunities to score in the game. Each possession starts from the moment a team gets hold of the ball until one of its players scores, commits a foul, or loses the ball in defensive rebounds or turnovers. The total number of possessions is guaranteed to be approximately the same for both teams at the end of a match, so it provides a good standardization for the points scored by each team [9]. Indeed, most basketball statistics are already given on a per-possession basis.

Play-by-play tables capture the main play events such as goal attempts, rebounds, turnovers, fouls, substitutions, timeouts and ends of quarters (periods). For each play, we have the time at which it happened, the players and/or the team involved and any other relevant information, e.g., the score just after the play is recorded. Each play event is recorded as a new line in the table. While ball possessions are not clearly recorded in play-by-play tables, one can identify every change of possession by observing the game events.

In this work, we use play-by-play tables to identify the ball possessions and use them to observe how the teams' performance changes when timeouts are called. We have identified every change of possession alongside the main interruptions (timeouts and ends of quarters) in each game. Every change of possession and every main interruption is considered a new game instant. We model each basketball game as a series of discrete game instants. More formally, a game instant is either (1) a regular ball possession; or (2) a major game interruption, which can be a regular timeout, an official scorer timeout, or the end of a quarter. Player substitutions and fouls were not considered main interruptions. In fact, substitutions can happen whenever the ball stops and not only during timeouts.

Here we describe our outcome variable associated with teams' performance, for which we aim to estimate how it is affected by timeouts. Let $\{P_t\}$ be a univariate stochastic process associated with a single match and indexed by the discrete game instants $t$. At any timeout moment, the team calling the timeout is defined as the target team. The random variable $P_t$ at the end of the $t$-th game instant is the score of the target team minus the opposing team's score, and it is called the scoring margin. Hence, $P_t$ is a positive quantity when the target team is winning the match at game instant $t$ and negative otherwise. At the end of the $t$-th game instant, $P_t = P_{t-1}$ in two situations. First, if $t$ is a regular ball possession instant whose attacking team did not score, i.e., the possession ended with a turnover or defensive rebound. Second, if $t$ is a main interruption instant and, consequently, none of the teams had the opportunity to score.

In order to evaluate how "momentum" changes after a game instant, we use the Short-term Momentum Change (STMC), which is the amount by which the scoring margin per possession rate changes right after a game instant. For any game instant $t$ and a positive integer $\lambda > 0$, we define the STMC, $Y_t^{\lambda}$, as the average rate of change from $P_t$ to $P_{t+\lambda}$ ($\Delta P_t^{t+\lambda}$) minus the average rate of change from $P_{t-\lambda-1}$ to $P_{t-1}$ ($\Delta P_{t-\lambda}^{t}$). Note that we do not take into account the possible change in scoring margin caused in game instant $t$ (the change from $P_{t-1}$ to $P_t$):

$$Y_t^{\lambda} = \frac{P_{t+\lambda} - P_t}{\lambda} - \frac{P_{t-1} - P_{t-\lambda-1}}{\lambda} = \Delta P_t^{t+\lambda} - \Delta P_{t-\lambda}^{t} \qquad (1)$$

for $t - \lambda \geq 1$ and $t + \lambda \leq n$, where $n$ is the total number of game instants in a given game.

The hyper-parameter $\lambda$ controls the time window used to evaluate how the game scoring dynamics change around $t$. To balance out the offensive and defensive ball possessions, $\lambda$ must be an even integer. Also, the variable $Y_t^{\lambda}$ should only be evaluated if the interval $[t - \lambda, t + \lambda]$ contains no game interruptions, with the possible exception of $t$. From a causal perspective, $\lambda$ represents our assumption about how many game instants the intervention (calling a timeout or not) at game instant $t$ can influence and be influenced by, in the short term.

Let $A_t$ be the binary indicator that a timeout has been called at time $t$. We will denote $A_t = 1$ if a team calls a timeout right before the game instant $t$ and $A_t = 0$ if $t$ is a regular ball possession. If we find the set of covariates $X$ that satisfies the conditional ignorability assumption, we can apply a matching technique, and our average causal effect of interest, $E(Y^{\lambda}_{A_t=1} - Y^{\lambda}_{A_t=0})$, can be estimated by taking the difference in means between the matched treatment and control groups. The estimated timeout effect TE is defined as:

$$TE = E(Y_t^{\lambda} \mid A_t = 1) - E(Y_t^{\lambda} \mid A_t = 0). \qquad (2)$$

Every game is played by two teams, the home and the away team. Because we want to estimate the causal effect of timeouts on the performance of the team that actually asked for it, we decided to estimate the average causal effect of timeouts called by the home teams ($TE_h$) and the away teams ($TE_a$) separately. We proceed now to present our causal model, which encodes our assumptions.

Fig. 1. A causal graph to model the timeout effect. Nodes: $A_t$, $\Delta P_t^{t+\lambda}$, $\Delta P_{t-\lambda}^{t}$, $X_t$, $U$, $Y_t^{\lambda}$.

Pearl [14] suggests the use of directed acyclic graphs (DAGs) as a way of encoding causal model assumptions, with nodes representing the random variables and directed edges representing direct causal relationships. One can identify in such a graph a set of variables (or nodes) that satisfies the so-called back-door criterion [14]. This is a set of variables that blocks all back-door paths from $A$ (the treatment variable) to $Y$ (the outcome variable) and does not include any descendants of $A$. Given that the graphical model includes all important confounding variables, it can be shown that conditioning on them suffices to remove all non-causal dependencies between $A$ and $Y$. In other words, it leaves only the causal dependence that corresponds to the causal effect.

There are many factors that can potentially influence the short-term performance (STMC) of the team that called a timeout after a given game instant. These can be intra-game factors, which vary along the game, such as the scoring margin, the quarter and the time since the start of the quarter, or inter-game factors, which vary from game to game, such as the venue conditions, the attendance at the venue, the specific adversary team, the players available and the teams' momentum in the season.

It is intuitive why intra-game factors, which are specific to a game instant, are considered a cause of both the treatment and the outcome, and thus confounders. In a less straightforward way, some inter-game factors are also very likely to affect both the treatment and the outcome. For example, a team playing against a stronger or a weaker adversary would request the available timeouts differently, and the subsequent performance may be affected differently.

Figure 1 shows our causal model graph.
Each game instant $t$ can receive either the treatment assignment $A_t = 1$ or $A_t = 0$, meaning that the game instant is a timeout or a regular ball possession, respectively. The variables $X_t$ represent the observed covariates, which are intra-game factors specific to the game instant $t$: (i) the current quarter (period) ($Q_t$), (ii) the current scoring margin ($P_t$) and (iii) the current time in seconds since the start of the period ($S_t$). The variables represented by the node $U$ are the inter-game factors, or the covariates related to a specific game that influence both the treatment assignment and the game outcome, as exemplified in the previous paragraph. Most of these variables are not directly observed or are very difficult to measure: players' and coaches' strategies, teams' relative skill difference and venue conditions. Hence, we include them in our graph as a dashed circle. The average rates of scoring margin change before ($\Delta P_{t-\lambda}^{t}$) and after ($\Delta P_t^{t+\lambda}$) the game instant $t$ are also in the graph, as well as the outcome $Y_t^{\lambda}$, which is connected by dashed edges since it is a deterministic node, a logical function of the other two stochastic nodes.

We are interested in the causal effect of $A_t$ on $Y_t^{\lambda}$. Since $\Delta P_{t-\lambda}^{t}$ is a direct cause of $A_t$ and not the reverse, for obvious chronological reasons, we actually want to measure the causal effect of $A_t$ on $\Delta P_t^{t+\lambda}$. According to the back-door criterion [14], if we adjust for $U$, $X_t$ and $\Delta P_{t-\lambda}^{t}$, we block any non-causal influence of $A_t$ on $\Delta P_t^{t+\lambda}$.

Because we want to estimate TE as defined in Equation (2), our treated and control groups are formed by the STMC, $Y_t^{\lambda}$, of game instants. As discussed in Section 4.1, depending on which value we choose for $\lambda$, $Y_t^{\lambda}$ may not be valid: this happens if the short-term window induced by $\lambda$ includes another major interruption besides possibly $t$ itself, or extends beyond the start or end of the game. Therefore, our inclusion criterion for both groups is that the STMC exists and can be calculated.
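Equation (1) and the validity rule for the STMC can be sketched as a small function. This is a minimal illustration and not the code from the authors' repository: it assumes that the scoring margins of one game are stored in a list where position `k` holds $P_k$ and position 0 holds the margin before the first game instant, and that major interruptions are given as a set of instant indices. The function name `stmc` and its arguments are hypothetical.

```python
def stmc(margins, t, lam, interruptions=frozenset()):
    """Short-term Momentum Change Y_t^lambda following Equation (1).

    margins[k] is the scoring margin P_k at the end of game instant k, with
    margins[0] the margin before the first instant (assumed indexing).
    interruptions holds the indices of major interruptions (timeouts, ends
    of quarters). Returns None when the STMC is not valid.
    """
    if lam <= 0 or lam % 2 != 0:
        raise ValueError("lam must be a positive even integer")
    n = len(margins) - 1                      # number of game instants
    if t - lam < 1 or t + lam > n:
        return None                           # window leaves the game
    window = range(t - lam, t + lam + 1)
    if any(k != t and k in interruptions for k in window):
        return None                           # another interruption nearby
    after = (margins[t + lam] - margins[t]) / lam
    before = (margins[t - 1] - margins[t - lam - 1]) / lam
    return after - before


if __name__ == "__main__":
    # Illustrative margins P_0..P_8 from the target team's point of view;
    # instant 4 is a timeout, so P_4 equals P_3.
    margins = [0, 0, -2, -4, -4, -2, -2, -2, 0]
    print(stmc(margins, t=4, lam=2, interruptions={4}))
```

In the usage example the target team loses four points over the two instants before the timeout (rate -2) and gains two points over the two instants after it (rate +1), so the STMC is 1 - (-2) = 3. Any window that contains a second interruption, or that extends past the first or last instant, yields `None`, which corresponds to excluding the instant from both groups.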
For a given $\lambda$, the treated group, $\{Y_t^{\lambda} \mid A_t = 1\}$, is formed by the STMC of the valid real timeouts. The control group, $\{Y_t^{\lambda} \mid A_t = 0\}$, is formed by the STMC of any valid game instant $t$ that is not a timeout or any other kind of major interruption.

Since we want to estimate $TE_h$ and $TE_a$, we have two treatment groups, one for timeouts called by home teams and one for timeouts called by away teams. On the other hand, it does not make sense to classify the control group as either home or away, thus we have just one control group. From now on, we refer to these treatment groups simply as the home treatment group and the away treatment group.

Our data consist of play-by-play information for every game from the 2014-2015, 2015-2016, 2016-2017 and 2017-2018 National Basketball Association (NBA) regular seasons. We crawled the data from the Basketball-Reference website. Most of our analysis consists only of games from the NBA 2016-2017 season, because using more than a single season would lead to samples too large for our matching approaches to be practical. Also, while we did perform the same analysis using only other seasons, achieving very similar results, the choice of the 2016-2017 season is arbitrary, mainly because it was the first season for which we collected the data.

The 2016-2017 season had a total of 30 teams and 1,309 games (1,230 in the regular season and 79 in the playoffs). Considering all games, we computed 281,373 game instants, including the 17,765 identified timeouts (7,754 called by home teams and 8,011 by away teams) and the 2,000 mandatory timeouts. Our datasets, code and further instructions on how to reproduce our results can be found at our GitHub repository (https://github.com/pkdd-paper/paper667).

The variables $U$, $X_t$ and $\Delta P_{t-\lambda}^{t}$ should be controlled for. In other words, they should be considered as possible confounders. Consequently, all of these variables are included in our matching for a valid causal inference.
While we consider $U$, the inter-game factors, to be unobserved covariates, we can still control for them by pairing timeout examples with non-timeout examples taken from the same game. Furthermore, the variable $\Delta P_{t-\lambda}^{t}$ is likely the most important confounding covariate in our model. Indeed, coaches tend to call a timeout when their teams are suffering from a bad "momentum", which shows its great influence on the treatment assignment $A_t$. Also, $\Delta P_t^{t+\lambda}$, the average rate of scoring margin change after a game instant $t$, should be highly causally dependent on $\Delta P_{t-\lambda}^{t}$. Therefore, in whatever matching approach we use, timeouts and control examples taken from the same game and with equal $\Delta P_{t-\lambda}^{t}$ are matched, hopefully achieving balance for $X_t$. We also restrict our matches to be constructed with non-overlapping ball possessions. This restriction arises from our assumption that $\lambda$ defines a range of game instants that are dependent on and influence $A_t$, as discussed in Section 4.1.

We applied three matching procedures: (1) no-balance matching; (2) Mahalanobis matching; and (3) propensity score matching. In the no-balance matching, each treatment example is paired with a valid control example that has the same $\Delta P_{t-\lambda}^{t}$ and is taken from the same game. We did not consider $X_t$ in this matching. For the Mahalanobis matching, we applied the Mahalanobis distance using all covariates in $X_t$, i.e., current quarter ($Q_t$), current scoring margin ($P_t$), and current clock time in seconds ($S_t$) since the start of the quarter. Finally, for the propensity score matching technique, we applied a simple Euclidean distance match on the estimated scalar propensity score.

The true propensity score $s(X) = P(A = 1 \mid X)$ is unknown and must be estimated. Since estimating $P(A = 1 \mid X)$ can be seen as a classification task, any supervised classification model could be used. While logistic regression is the most common estimation procedure for the propensity score, Lee et al. [10] showed that, in a non-linear dependence scenario, the use of machine learning models such as boosted regression trees to estimate the propensity score achieves better covariate balance in the matched sample. Indeed, our treatment assignment presents a strong non-linear dependence on its covariates. Take the clock time $S_t$, for example. As explained in Section 4.3, the timeout rules of the NBA stimulate coaches to call a timeout just before a mandatory timeout would have been called by the official scorer. We use the boosted regression tree algorithm implemented in the gbm R package [6] to estimate the propensity score using $X_t$.

Because we restrict timeout and non-timeout pairs to be taken from the same game, we have a very sparse matching problem. The rcbalance R package [17] implementation of optimal matching exploits such sparsity of treatment-control links to reduce computational time for larger problems. We use the optimal algorithm implemented in this package for all the aforementioned matching approaches. In addition, before applying any matching technique, we retained in the control subpopulation only those non-timeout game instants $t'$ ($A_{t'} = 0$) for which the value $\Delta P_{t'-\lambda}^{t'}$ is exactly equal to at least one $\Delta P_{t-\lambda}^{t}$ calculated for a real timeout instant $t$ ($A_t = 1$) in the same game. This improved the running time even more.

In order to find out whether our data show the generally accepted positive correlation between timeouts and improvements in the "momentum", we calculated $Y_t^{\lambda}$ for every game instant associated with a timeout $t$ in every single game using $\lambda = 2, 4, 6$. Figure 2 shows the estimated density of the STMC $Y_t^{\lambda}$ for all timeouts, including those called by both home and away teams, but removing the official timeouts. The sample means and numbers of valid timeout examples in each sample are 0.629 and 14,031 for $\lambda = 2$, 0.421 and 12,225 for $\lambda = 4$, and 0.[...] and [...],296 for $\lambda = 6$, respectively.

Fig. 2. The STMC ($Y_t^{\lambda}$) distribution for home and away timeouts, considering three different ball possession windows $\lambda = 2, 4, 6$. Horizontal axis: Short-term Momentum Change ($Y_t^{\lambda}$); vertical axis: density.

Table 1. Summary statistics and SMD for balance assessment for matching using the home treatment group. The control ($A_t = 0$) and timeout ($A_t = 1$) groups are presented before matching (BM) and after all three matching approaches: No-Balance (NB), Mahalanobis distance (M), and Propensity score (P). Entries for $S_t$, $Q_t$ and $P_t$ are mean (sd).

| $\lambda$ | Method | $S_t$, $A_t = 0$ | $S_t$, $A_t = 1$ | $S_t$ SMD | $Q_t$, $A_t = 0$ | $Q_t$, $A_t = 1$ | $Q_t$ SMD | $P_t$, $A_t = 0$ | $P_t$, $A_t = 1$ | $P_t$ SMD |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | BM | 363.42 (198.77) | 410.03 (168.47) | 0.253 | 2.42 (1.12) | 2.68 (1.15) | 0.222 | 1.73 (10.81) | -0.00 (10.78) | 0.161 |
| 2 | NB | 363.11 (201.26) | 410.20 (168.52) | 0.254 | 2.47 (1.12) | 2.68 (1.15) | 0.178 | 0.56 (10.63) | 0.00 (10.78) | 0.052 |
| 2 | M | 397.73 (168.83) | 410.27 (168.50) | 0.074 | 2.61 (1.11) | 2.68 (1.15) | 0.062 | 0.53 (10.86) | 0.00 (10.78) | 0.049 |
| 2 | P | 403.85 (163.08) | 410.14 (168.50) | 0.038 | 2.67 (1.13) | 2.68 (1.15) | 0.009 | 0.23 (11.01) | 0.00 (10.78) | 0.021 |
| 4 | BM | 351.23 (187.75) | 388.45 (153.17) | 0.217 | 2.33 (1.11) | 2.58 (1.13) | 0.222 | 1.75 (10.69) | 0.22 (11.04) | 0.141 |
| 4 | NB | 351.59 (191.15) | 394.10 (151.23) | 0.247 | 2.33 (1.11) | 2.58 (1.14) | 0.219 | 0.44 (10.44) | 0.33 (11.14) | 0.011 |
| 4 | M | 381.35 (167.92) | 393.79 (151.12) | 0.078 | 2.43 (1.04) | 2.57 (1.14) | 0.135 | 0.50 (11.06) | 0.32 (11.12) | 0.016 |
| 4 | P | 385.03 (162.81) | 393.70 (151.37) | 0.055 | 2.56 (1.11) | 2.58 (1.14) | 0.017 | 0.38 (11.33) | 0.32 (11.12) | 0.005 |
| 6 | BM | 334.96 (170.51) | 380.80 (143.08) | 0.291 | 2.19 (1.08) | 2.49 (1.12) | 0.270 | 1.67 (10.34) | 0.33 (11.09) | 0.124 |
| 6 | NB | 332.33 (173.87) | 389.79 (139.28) | 0.365 | 2.20 (1.08) | 2.48 (1.12) | 0.253 | 0.71 (10.40) | 0.79 (11.13) | 0.007 |
| 6 | M | 351.13 (162.90) | 389.67 (138.95) | 0.255 | 2.27 (1.02) | 2.48 (1.12) | 0.200 | 0.77 (10.80) | 0.80 (11.17) | 0.002 |
| 6 | P | 356.99 (160.61) | 389.72 (139.63) | 0.218 | 2.35 (1.08) | 2.48 (1.12) | 0.122 | 0.72 (10.91) | 0.81 (11.16) | 0.008 |

These results show that, when a timeout is called by a team, its momentum improves by a small positive amount afterwards. For instance, with $\lambda = 4$, the average value of STMC is 0.421. [...] and a bootstrap-based test for the mean, with the null hypothesis being that the mean is equal to zero.
Both tests for the three diﬀerent values of λ yieldedp-values numerically equal to zero.While these results could be used as evidence on why there is such commonand widespread belief that timeouts improves teams’ performance, or breaks themomentum, it is not an evidence of the causal eﬀect of timeouts. We move on toconsider the analysis under our causal framework discussed in Section 4. Each of the three matching methods was applied twice: one time using thetreatment group with away timeouts and the control group, and the other usingthe treatment group with home timeouts and the control group. Some examplesdid not ﬁnd a valid match and, therefore, were not included in the matchedsamples. Also, it should be noted that all matches were performed withoutreplacement.To evaluate for proper covariate balance between the treatment groups, acommon numerical discrepancy measurement is the diﬀerence in means dividedby the pooled standard deviation of each covariate, known as the standardizedmean diﬀerence (SMD) [4]. Unlike t-tests, SMD is not inﬂuenced by sample sizesand allows comparison between variables of diﬀerent measured units. There is nogeneral consensus on which value of SMD should denote an accepted imbalancelevel. Some researches, although, have proposed a threshold of 0.1 [13]. Table 1summarizes the covariate distribution with its mean and standard deviationvalues and the SMD of our matched samples considering all diﬀerent approaches.For simplicity, we are only including here the results from the home timeoutsmatched samples. Indeed, the away samples showed very similar results and it can also be accessible from our GitHub repository . Also, we do not showbalance for ∆P tt − λ as it is perfectly balanced due to our perfect match on thiscovariate. The unmatched sample sizes for control groups are 172 , , , , ,

048 and 5 ,

127 with λ = 2 , , λ = 2 , ,

6, they were 6 , ,

477 and 3 , ΔP_{t−λ}^{t} covariate is equally distributed. Comparing the no-balance matching with the balance before matching, it is clear that this simple matching procedure substantially reduces the SMD for all covariates. However, this is not enough to make the SMD negligible in all cases, and hence any conclusion based on the comparison of these no-balance matched samples would rightly be subject to doubt.

Mahalanobis matching achieved better covariate balance for all covariates and λ values in comparison with before matching and no-balance matching. For λ = 2 and λ = 4, all SMDs are below 0.1, with the exception of the Q_t covariate for λ = 4, in both analyses. The λ = 6 configuration, on the other hand, presented the worst SMDs: P_t is the only covariate with an SMD below the 0.1 threshold. Mahalanobis matching was not as good as propensity score matching.

With propensity score matching, all covariates have SMDs smaller than 0.1 for λ = 2 and λ = 4 in both analyses. For λ = 6, Q_t and S_t still had SMDs above the 0.1 threshold.

Figure 3 shows the distributions of S_t, Q_t and P_t after all three matchings with λ = 4 in the analysis using timeouts called by the home team. We can see that no-balance and Mahalanobis matching presented some imbalance for S_t and Q_t. On the other hand, propensity score matching shows rather similar distributions, suggesting a much better balance. The fact that timeout and control examples are matched only if taken from the same game makes it more complicated to find matches with a more balanced X_t. Mahalanobis matching, while trying to match samples as closely as possible, encounters great difficulties. The propensity score, however, captures all interactions and non-linearities present in the joint distribution of the covariates in a single score. This is why its matching was better.
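As a minimal sketch of the balance check discussed above, the SMD can be computed from group means and standard deviations. The pooled standard deviation is taken here as the square root of the average of the two group variances, and the absolute value of the difference is reported; both choices are assumptions about the exact convention, and the function names are hypothetical. Under these assumptions the summary values for S_t before matching with λ = 2 in Table 1 give the reported SMD.

```python
import math
from statistics import mean, stdev


def smd_from_summary(mean_treated, sd_treated, mean_control, sd_control):
    """Standardized mean difference from group means and standard deviations.

    Assumption: the pooled SD is sqrt((sd_treated^2 + sd_control^2) / 2).
    """
    pooled_sd = math.sqrt((sd_treated ** 2 + sd_control ** 2) / 2.0)
    return abs(mean_treated - mean_control) / pooled_sd


def smd(treated, control):
    """Standardized mean difference of one covariate from raw samples."""
    return smd_from_summary(mean(treated), stdev(treated),
                            mean(control), stdev(control))


# S_t before matching, lambda = 2 (Table 1): timeout group vs. control group.
smd_seconds = smd_from_summary(410.03, 168.47, 363.42, 198.77)
print(round(smd_seconds, 3))  # 0.253, above the 0.1 threshold
```

A covariate would be flagged as imbalanced whenever this value exceeds the chosen threshold, for example 0.1.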

GitHub repository: https://github.com/pkdd-paper/paper667

Fig. 3. The control (magenta) and timeout (turquoise) seconds, quarter and scoring margin covariate (X_t) distributions after all matchings using the home timeouts treatment group and λ = 4. [Panels: No-balance, Mahalanobis and Propensity matching, each showing seconds (S_t), quarter (Q_t) and scoring margin (P_t); legend: treatment group (Control, Timeouts).]

We analyzed our matched data using a Monte Carlo permutation test [2, Vol. II, Chapter 10], with the difference of means between the two groups as our test statistic. When the null hypothesis is true, the timeout effect TE defined in (2) is equal to zero. The treatment label was permuted ten thousand times. It should be noted that we have large samples: the number of treatment-control pairs in each test varies from 3,766 to 7,[...]. Table 2 reports the estimated timeout effect for the home (TE_h) and the away (TE_a) teams. In addition, we have also included in the table a 99% confidence interval generated from the p-values obtained with the Monte Carlo permutation test.

There are two fundamental remarks here. First, the estimated timeout effect TE is very small in all cases, practically irrelevant during a game. Remember that the timeout effect TE defined in Equation (2) is the amount by which a team should expect its scoring margin per possession to change if it were to call a timeout. It would be rather challenging to find the minimum effect size that would be deemed enough to change a team's performance. It would not be a bad idea, however, to consider it to be an effect of at least one marginal point, especially because our assumption is that timeouts have an effect in the short-term window represented by λ. Yet, from our results, take the propensity matching with λ = 4, which yielded the largest absolute timeout effect, equal to -0.059 points per possession in the next 4 possessions, which can be considered negligible in a basketball game. Second, by analyzing the confidence intervals, the number of significant tests is very small given the very large number of examples in each of them. From the 18 tests, only 6 were statistically significant at the α = 0.001 level. In fact, these confidence intervals barely exclude the zero effect value. Moreover, while these tests were statistically significant, the effects are still negligible. Also, we want to point out that we did not perform any adjustment for multiplicity of tests. Indeed, such an adjustment would yield a smaller number of statistically significant tests.

Table 2. The estimated timeout effect for both away (TE_a) and home (TE_h) timeouts under different matching procedures and λ values. The respective 99% confidence interval for each timeout effect is also shown. While some values are statistically significant, all timeout effects are negligible due to the small effect sizes.

λ  Method       TE_a    99% CI            TE_h    99% CI
2  No-balance   -0.028  (-0.075, 0.020)   -0.022  (-0.072, 0.028)
2  Mahalanobis  -0.032  (-0.079, 0.015)   -0.017  (-0.067, 0.032)
2  Propensity   -0.044  (-0.092, 0.004)   -0.021  (-0.071, 0.028)
4  No-balance   -0.043  (-0.082, -0.004)  -0.023  (-0.063, 0.017)
4  Mahalanobis  -0.053  (-0.092, -0.013)  -0.013  (-0.053, 0.028)
4  Propensity   -0.059  (-0.098, -0.020)  -0.032  (-0.072, 0.008)
6  No-balance   -0.031  (-0.068, 0.005)   -0.036  (-0.072, 0.000)
6  Mahalanobis  -0.046  (-0.083, -0.010)  -0.036  (-0.072, 0.000)
6  Propensity   -0.044  (-0.081, -0.008)  -0.046  (-0.082, -0.009)

We went further and investigated the timeout effect TE for two particular cases: (i) when the last five minutes are excluded and (ii) when only the final five minutes of the games are taken into account. For both cases, we reran the matching approaches on the new subsets of the data. Before executing the matching approaches, for (i) we filtered out from the treatment and control groups the examples that happened within the last five minutes of the last quarter (the 4th) of each game. In a similar fashion, for (ii) we filtered out from the treatment and control groups the examples that happened outside of the last five minutes of each game. However, because we ended up with fewer sample units available for matching, for case (ii) we included examples extracted from other NBA seasons, i.e., the 2014-2015, 2015-2016 and 2017-2018 seasons.

Table 3. Additional analysis for excluding or considering only the last 5 minutes. For each analysis, the same matching approaches were applied using the new subsets of data. The results are very similar to the original analysis considering the whole game.

Minus Last 5 Min

λ  Method       TE_a    99% CI            TE_h    99% CI
2  No-balance   -0.029  (-0.082, 0.024)   -0.010  (-0.065, 0.045)
2  Mahalanobis  -0.021  (-0.074, 0.032)   -0.010  (-0.065, 0.045)
2  Propensity   -0.044  (-0.098, 0.009)   -0.012  (-0.067, 0.043)
4  No-balance   -0.036  (-0.078, 0.006)   -0.030  (-0.073, 0.013)
4  Mahalanobis  -0.055  (-0.097, -0.013)  -0.022  (-0.065, 0.020)
4  Propensity   -0.057  (-0.099, -0.015)  -0.038  (-0.081, 0.004)
6  No-balance   -0.044  (-0.082, -0.005)  -0.041  (-0.079, -0.003)
6  Mahalanobis  -0.051  (-0.089, -0.012)  -0.040  (-0.078, -0.002)
6  Propensity   -0.046  (-0.085, -0.008)  -0.055  (-0.093, -0.017)

Only Last 5 Min

λ  Method       TE_a    99% CI            TE_h    99% CI
2  No-balance   -0.083  (-0.115, -0.052)  -0.033  (-0.067, 0.001)
2  Mahalanobis  -0.091  (-0.122, -0.060)  -0.040  (-0.074, -0.006)
2  Propensity   -0.094  (-0.125, -0.063)  -0.038  (-0.072, -0.004)
4  No-balance   -0.006  (-0.068, 0.055)    0.038  (-0.029, 0.105)
4  Mahalanobis  -0.019  (-0.081, 0.044)    0.026  (-0.041, 0.093)
4  Propensity   -0.019  (-0.082, 0.043)    0.007  (-0.060, 0.074)
6  No-balance   -0.046  (-0.315, 0.223)   -0.032  (-0.250, 0.186)
6  Mahalanobis  -0.028  (-0.299, 0.244)   -0.048  (-0.267, 0.172)
6  Propensity   -0.046  (-0.317, 0.224)   -0.048  (-0.265, 0.170)

The results from both of these new analyses are in Table 3. For the case of excluding the last five minutes, we have slightly more statistically significant tests, 8 out of 18. Still, all of them have small effect sizes, making them not practically significant. The largest estimated effect is -0.057, for TE_a with propensity score matching and λ = 4. For the case of considering only the last five minutes, the only 5 statistically significant tests were all found with λ = 2, but again with very small effect sizes. The largest absolute effect is -0.094 for propensity score matching.
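The Monte Carlo permutation test used in this section can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: treatment labels are shuffled over the pooled outcomes (a paired variant would instead swap labels within each matched pair), the p-value is two-sided, and the function name and the synthetic data are hypothetical.

```python
import random


def permutation_test(treated, control, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference of means.

    Returns the observed difference (treated minus control) and the p-value.
    """
    rng = random.Random(seed)
    n_treated = len(treated)
    n_control = len(control)
    observed = sum(treated) / n_treated - sum(control) / n_control

    pooled = list(treated) + list(control)
    extreme = 0
    for _ in range(n_permutations):
        # Permuting the labels is equivalent to shuffling the pooled outcomes
        # and reading off the first n_treated values as the "treated" group.
        rng.shuffle(pooled)
        diff = (sum(pooled[:n_treated]) / n_treated
                - sum(pooled[n_treated:]) / n_control)
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic outcomes drawn from the same distribution (the null is true here).
data_rng = random.Random(42)
timeout_outcomes = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
control_outcomes = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
effect, p = permutation_test(timeout_outcomes, control_outcomes)
print(f"estimated effect: {effect:.3f}, p-value: {p:.3f}")
```

With ten thousand permutations the smallest p-value this sketch can report is 1/10001, so the number of permutations bounds the resolution of the test.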

In this work we proposed a causality framework to quantify the effect of timeouts on basketball games. To the best of our knowledge, we were the first to resort to the theory of causality to solve this problem. While all previous studies pointed to a positive timeout effect, by applying our causality model to a large dataset of official NBA play-by-play data, we concluded that timeouts have no effect on teams’ performance. This is another example of what statisticians call the regression to the mean phenomenon. Since most timeouts are called when the opposing team is scoring more frequently, the moments that follow resemble an improvement in the team’s performance, but are just the natural tendency of the game to return to its average state. We have also stratified our analysis by either including only the last five minutes or everything but the last five minutes of all games, and the results pointed to the same conclusion: timeouts have virtually no effect on teams’ performance.
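The regression to the mean argument can be illustrated with a toy simulation. It is not based on the NBA data: it assumes that the scoring margin gained on each possession is independent with mean zero, so that nothing, a timeout included, has any effect by construction. A hypothetical coach "calls a timeout" whenever the margin over the previous λ possessions is at or below a threshold; all names and parameter values are illustrative.

```python
import random


def naive_before_after(n_possessions=100_000, lam=4, threshold=-4, seed=1):
    """Mean scoring margin in the lam possessions before and after a 'timeout'."""
    rng = random.Random(seed)
    # Assumption: i.i.d. per-possession margins with mean zero (no momentum,
    # no timeout effect). The values are arbitrary but symmetric around zero.
    margin = [rng.choice([-3, -2, 0, 0, 2, 3]) for _ in range(n_possessions)]

    before, after = [], []
    for t in range(lam, n_possessions - lam + 1):
        past = sum(margin[t - lam:t])
        if past <= threshold:  # timeouts are only called after a bad run
            before.append(past)
            after.append(sum(margin[t:t + lam]))
    return sum(before) / len(before), sum(after) / len(after)


mean_before, mean_after = naive_before_after()
print(f"margin before: {mean_before:.2f}, margin after: {mean_after:.2f}")
```

The margin before the simulated timeouts is strongly negative because of how those moments were selected, while the margin afterwards is close to zero. A simple before-and-after average would read this as an improvement, even though the timeout does nothing in this model.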

Acknowledgments

This work is supported by the authors’ individual grants from FAPEMIG, CAPES and CNPq.

References

1. Barnett, A.G., Van Der Pols, J.C., Dobson, A.J.: Regression to the mean: what it is and how to deal with it. International journal of epidemiology (1), 215–220 (2004)
2. Bickel, P.J., Doksum, K.A.: Mathematical Statistics: Basic Ideas and Selected Topics, Volumes I-II. Chapman and Hall/CRC (2015)
3. Coffino, M.J.: Odds-On Basketball Coaching: Crafting High-Percentage Strategies for Game Situations. Rowman & Littlefield Publishers (2017)
4. Flury, B.K., Riedwyl, H.: Standard distance in univariate and multivariate analysis. The American Statistician (3), 249–251 (1986)
5. Gómez, M.A., Jiménez, S., Navarro, R., Lago-Penas, C., Sampaio, J.: Effects of coaches’ timeouts on basketball teams’ offensive and defensive performances according to momentary differences in score and game period. European Journal of Sport Science (5), 303–308 (2011)
6. Greenwell, B., Boehmke, B., Cunningham, J., Developers, G.: gbm: Generalized Boosted Regression Models (2019), https://CRAN.R-project.org/package=gbm, R package version 2.1.5
7. Halldorsson, V.: Coaches Use of Team Timeouts in Handball: A Mixed Method Analysis. The Open Sports Sciences Journal (1), 143–152 (Oct 2016)
8. Holland, P.W.: Statistics and causal inference. Journal of the American Statistical Association (396), 945–960 (1986)
9. Kubatko, J., Oliver, D., Pelton, K., Rosenbaum, D.T.: A starting point for analyzing basketball statistics. Journal of Quantitative Analysis in Sports (3) (2007)
10. Lee, B.K., Lessler, J., Stuart, E.A.: Improving propensity score weighting using machine learning. Statistics in medicine (3), 337–346 (2010)
11. Mace, F.C., Lalli, J.S., Shea, M.C., Nevin, J.A.: Behavioral momentum in college basketball. Journal of Applied Behavior Analysis (3), 657–663 (1992)
12. Neyman, J.: Edited and translated by Dorota M. Dabrowska and Terrence P. Speed (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science (4), 465–472 (1923)
13. Normand, S.L.T., Landrum, M.B., Guadagnoli, E., Ayanian, J.Z., Ryan, T.J., Cleary, P.D., McNeil, B.J.: Validating recommendations for coronary angiography following acute myocardial infarction in the elderly: a matched analysis using propensity scores. Journal of clinical epidemiology (4), 387–398 (2001)
14. Pearl, J.: Causality. Causality: Models, Reasoning, and Inference, Cambridge University Press (2009)
15. Pearl, J., Mackenzie, D.: The Book of Why: The New Science of Cause and Effect. Basic Books (2018)
16. Permutt, S.: The Efficacy of Momentum-Stopping Timeouts on Short-Term Performance in the National Basketball Association. Ph.D. thesis, Haverford College. Department of Economics (2011)
17. Pimentel, S.D.: rcbalance: Large, Sparse Optimal Matching with Refined Covariate Balance (2017), https://CRAN.R-project.org/package=rcbalance, R package version 1.8.5
18. Roane, H.S., Kelley, M.E., Trosclair, N.M., Hauer, L.S.: Behavioral momentum in sports: a partial replication with women’s basketball. Journal of Applied Behavior Analysis (3), 385–390 (2004)
19. Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika (1), 41–55 (1983)
20. Saavedra, S., Mukherjee, S., Bagrow, J.P.: Is coaching experience associated with effective use of timeouts in basketball? Scientific reports, 676 (2012)
21. Sampaio, J., Lago-Peñas, C., Gómez, M.A.: Brief exploration of short and mid-term timeout effects on basketball scoring according to situational variables. European Journal of Sport Science (1), 25–30 (2013)
22. Siva, J.M., Cornelius, A.E., Finch, L.M.: Psychological Momentum and Skill Performance: A Laboratory Study. Journal of Sport and Exercise Psychology (2), 119–133 (Mar 1992)
23. Stuart, E.A.: Matching methods for causal inference: A review and a look forward. Statistical science (1), 1 (2010)
24. Zetou, E., Kourtesis, T., Giazitzi, K., Michalopoulou, M.: Management and Content Analysis of Timeout during Volleyball Games. International Journal of Performance Analysis in Sport 8