Simulation-Based Decision Making in the NFL using NFLSimulatoR
Benjamin Williams · Will Palmquist · Ryan Elmore
Abstract
In this paper, we introduce an R software package for simulating plays and drives using play-by-play data from the National Football League. The simulations are generated by sampling play-by-play data from previous football seasons. The sampling procedure adds statistical rigor to any decisions or inferences arising from examining the simulations. We highlight that the package is particularly useful as a data-driven tool for evaluating potential in-game strategies or rule changes within the league. We demonstrate its utility by evaluating the oft-debated strategy of “going for it” on fourth down and investigating whether or not teams should pass more than the current standard.
Keywords sports statistics · analytics · National Football League
Correspondence should be addressed to Benjamin Williams.
Benjamin Williams · Will Palmquist · Ryan Elmore
Department of Business Information and Analytics, Daniels College of Business, University of Denver
arXiv [stat.AP]
Data-driven decision making is a ubiquitous strategy in today’s marketplace and is becoming increasingly common amongst professional sports organizations. From Major League Baseball’s Moneyball movement (Lewis, 2003) to the Moreyball strategies employed by the National Basketball Association’s Houston Rockets (Walsh, 2019), analytics are no longer the sole purview of the academy as teams try to improve their performance by investigating the data. The general consensus, however, is that the National Football League (NFL) lags behind other professional sports leagues in its use of analytics (Clark, 2018). This does seem to be changing, as evidenced by a recent trend of hiring data analysts by NFL teams (Loque, 2019), as well as within the league office in New York.

In the NFL, perhaps the most widely debated research question regards the decision of going for it on fourth down. This has also been the subject of several academic articles (Romer, 2006; Yam and Lopez, 2019), the New York Times “4th down bot” (Burke and Quealy, 2013; Causey et al., 2015), and a new calculator from sports analyst Ben Baldwin (Baldwin, 2020a). The consensus among researchers and analysts is that NFL coaches tend to be too conservative in their fourth-down calls, often preferring to kick the football (punt or field goal attempt) when the data suggest they should pass or run the ball.

While the decision to go for it on fourth down is much discussed, there are a plethora of other strategies a team may wish to investigate. Potential strategies for deeper investigation include the frequency and type of plays run, the use of a team’s (limited) timeouts during a game, defensive alignment, and so on. The seemingly infinite possibilities for NFL strategy evaluation made us wonder how one could determine which strategies offer the best chance of winning. Some work has been done on strategies such as passing versus running the football. In the sole peer-reviewed article that we are aware of, Kovash and Levitt (2009) found NFL teams did not pass as much as they should. Hermsmeyer (2018) similarly noted, in an article for the data journalism website fivethirtyeight.com, that even though the NFL has transitioned to become a more passing-heavy league, teams should still pass more. Apart from the passing versus rushing and fourth down decision making, there is a lack of research regarding NFL strategies in the literature.

In this paper we present an R software package, NFLSimulatoR, and an analytically rigorous method for analyzing NFL strategies. Our method consists of simulating strategies via the sampling of NFL play-by-play datasets that were realized in previous seasons. This simulation method is flexible and allows for the investigation of many possible strategies and offers a tool for informed decision making with respect to sport performance. We have embedded the simulation framework into an open source software package to share the method with the broader sports analytics community.

The rest of the paper is outlined as follows. In the next section we present the R software package we wrote for simulating NFL strategies. Section 3 describes the use of the software package for the two strategies we have discussed thus far: fourth down decision making and passing versus rushing. Finally, we offer some concluding thoughts about using (and contributing to) the package moving forward and other discussions in the final section.
The ideas presented in this paper are, in part, inspired by a blog post by Mike Lopez, currently the Director of Data and Analytics for the NFL, in which he used a simulation-based approach to investigate a potential overtime rule change in the NFL (Lopez, 2019). In contrast to the one-off solution presented on his blog, we provide a robust software platform for assessing NFL strategies in the NFLSimulatoR R package. Our desire is for the wider analytics community to use this package, extend our work, and study other strategies in an analytically sound manner.

The ideas embedded in
NFLSimulatoR are simple, yet extremely powerful. The key feature is that we rely on simulations of actual NFL play-by-play data to evaluate potential strategies. We define a strategy broadly as any set of principled decisions consistently made by an NFL team during a game. An example, albeit possibly extreme, is for a team to employ only passing plays rather than a mixture of passes and runs while on offense. This is a simple strategy, but one we can nonetheless examine using our package. To examine a particular strategy, we sample plays satisfying the criteria of the strategy at hand. Going back to our simplistic example, we would sample only passing plays if we wanted to see what happened when a team only passes the football.

Sampling data to make estimates, inferences, or decisions about a larger population is at the core of statistics and lends important rigor to our method. In our package, we select probability samples according to a simple random sample with replacement from our population of interest (NFL play-by-play data) to produce unbiased and representative results. An excellent resource for more on statistical sampling is Lohr (2010).

The package relies on NFL play-by-play data available via the NFL’s Application Programming Interface. These datasets are accessible within R using either the nflscrapR or the nflfastR R packages (Horowitz et al., 2020; Carl et al., 2020), or by downloading them directly from the nflscrapR-data or nflfastR-data websites (Yurko, 2020; Baldwin, 2020b). The NFLSimulatoR package includes two functions, download_nflscrapr_data() and download_nflfastr_data(), for directly downloading regular-, pre-, or post-season NFL play-by-play data from either source for several years, currently from 2009 to 2019. Each year contains approximately 48,000 plays of data. In addition, we include a function called prep_pbp_data() to eliminate extraneous information and prepare the NFL data for use in NFLSimulatoR functions.

Our package is built primarily on the function sample_play(). This function samples from NFL data according to a given strategy for a particular down and distance. The strategy is passed to the function via the strategy parameter. Down and distance information refers to what down it is (1 to 4), how many yards are required for a first down, and the yardline at which the play occurs (1 to 99). The down is passed to the function via the what_down parameter, the distance to go is passed via the yards_to_go parameter, and the yardline is passed via the yards_from_own_goal parameter. Our sampling is done randomly and so we are confident in the outcomes from the simulations. However, some combinations of sampling parameters (strategy, down, distance, yardline) rarely occur in an NFL game. For example, there may be few or no plays where a team had the ball on 3rd down, on the 47th yardline, with 15 yards to go for a first down, and chose to run the ball. In such cases we widen our sampling range to include plays from yardlines close to the yardline of interest or with one less yard to go for a first down (the user can also choose a window to expand the yardline selection via the window_yards_from_own_goal parameter). We have built flexibility into the sample_play() function so the user can seamlessly implement it in their unique settings.

The other main function of interest in the package is called sample_drives(). This allows the user to simulate a series of plays by one team (a drive) following some specific strategy versus another team employing a “normal” strategy. By “normal” we mean the plays of the opposing team are simply sampled at random from all plays without a specific strategy in mind. The sample_drives() function shows how a specific strategy is expected to perform if implemented during an NFL game when the opposing team is employing the status quo. The function can either sample drives until one team scores, or it can sample a single drive and return the outcome of the drive (i.e., touchdown, field goal, punt, or turnover). By simulating many drives one can identify statistics such as expected points per drive and proportion of drives resulting in a score for a variety of strategies. The sample_drives() function takes parameters for the number of simulations to be run (n_sims), the starting yardline of the simulations (from_yard_line), the strategy (strategy), and whether the simulation is of a single drive (single_drive). Within sample_drives(), the function down_distance_updater() updates the down, distance, and yards to go and then samples the next play from all plays satisfying the updated criteria.

To demonstrate the use of this software and to offer an idea of how to extend our work, we provide two strategies in the package. The first is a strategy related to fourth-down decision making and the second is associated with how often a team should pass (or run) the football. Within the fourth down strategy we include several sub-strategies to make a decision about going for it or not on fourth down. As mentioned above, the fourth down strategy has been studied in the academic domain; see, e.g., Yam and Lopez (2019) and Romer (2006). We include it in this manuscript due to its popularity and to give our own perspective on this well-known problem. In the next section, we discuss these two strategies in more detail.

The
NFLSimulatoR package is available on CRAN (Comprehensive R Archive Network) and the latest development version is available on GitHub. Adding this package to CRAN was an important step to make sure our package passed rigorous software checks and to make installation simpler. At the time of this writing, NFLSimulatoR has been available on CRAN for less than two months and has over 1000 downloads, showing the enthusiasm for the package by the sport analytics community. Additional package details related to issues, recent changes, etc. can be found at the NFLSimulatoR website. The package can be installed within R using either option given below.

    install.packages("NFLSimulatoR")

    install.packages("remotes")
    remotes::install_github("rtelmore/NFLSimulatoR")

The first sub-strategy is the empirical sub-strategy. Here, our functions simply select the fourth down play at random from among all similar plays (i.e., similar with respect to down, distance, and yardline). The majority of the time this will be a punt or field goal attempt, but there are occasions where a team may go for it on fourth down (perhaps if there is very little yardage needed for a first down and the yardline is close to the opposing endzone). The second sub-strategy is always go for it and samples non-kicking plays from the given down and distance. In this sub-strategy we do not require the sampling to be exclusively from fourth down plays. In fact, we expand the pool of potential plays to sample from on each of downs two through four. That is, we sample from downs d and d − 1, for d = 2, 3, 4. We assume that the effect of fourth down on the players, including any added anxiety, is negligible because the defensive team would have similar anxieties, because professional players should be more immune to such inhibitions, and because previous literature followed this procedure. The third sub-strategy is never go for it, and in it the team always punts or kicks a field goal. This offers us a conservative strategy to study, and we simply sample kicks (and their outcome) from the given location.

The fourth sub-strategy is go for it if yardage is smaller than Y. Here the user sets the parameter Y, a threshold on the yards required for a first down. If the distance for a first down is less than Y the strategy says to go for it, and to kick if the distance is greater than or equal to Y. This allows the examination of a stricter sub-strategy, but one offering a trade-off between always go for it and never go for it. This sub-strategy is likely more palatable for NFL teams, since a rule to always go for it on fourth down if there is less than, say, 1 yard to go for a first down might be more acceptable than always going for it. The final sub-strategy is expected points. Here we use the expected points estimated from the nflscrapR R package to find the expected points at each yardline on the field. We further empirically estimate the probability of gaining a first down and of making a field goal. Then we solve for the expected value of going for it, punting, and kicking a field goal. The decision is made by selecting the choice which maximizes this expected points value. This last sub-strategy is the most analytically reliant and best mirrors current literature. Because we offer these sub-strategies within a free software package, they can be re-run each season as more data become available, allowing analysts to make recommendations which include the most recent NFL data.

We compare these sub-strategies by plotting the percent of drives resulting in no score, a field goal, or a touchdown for the five sub-strategies. For the go for it if yardage is smaller than Y option we let Y = 5. For this and subsequent fourth down analyses, we only keep plays occurring before the final 2 minutes of each half of the game and only plays where one team is within 28 points of the other. This allows us to remove any plays that result from extreme decision making because the outcome of the game is all but determined. We use play-by-play data from both 2018 and 2019.

For the simulations, we generate 10000 drives for each sub-strategy starting at the 25 yard line, using all plays from these two regular seasons. This corresponds to the usual starting position to begin a half or after an opposing team scores (assuming the kickoff is a touchback). For each drive we use the sample_drives() function and set the single_drive argument equal to TRUE. Thus, we only care about simulating one drive and storing its outcome for each simulated drive. In other words, we start each drive with first down and ten yards to go from the 25 and sample plays accordingly. The summarized results are displayed in Figure 1.

From this figure we see the never go for it strategy offers the largest probability of scoring on a single drive, with the majority of the scores coming from field goals. The expected points strategy has the second largest percentage of simulated drives resulting in a score, followed by yardage smaller than 5 yards, and then the empirical sub-strategy. For further investigation, in Table 1 we examine the percent of drives resulting in a field goal (FG) or touchdown (TD), the average score per drive (assuming a touchdown always results in 7 points), and a 95% confidence interval for the average score, for the 5 sub-strategies.

Fig. 1 The percentage of simulated drives that resulted in no score (green), a field goal (orange), or a touchdown (purple) in 2018 and 2019, for the fourth-down sub-strategies. [Figure omitted; vertical axis: Percent of Drives.]
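As a rough illustration of the single-drive simulation described above, the following sketch plays out drives from a team's own 25 yard line and summarizes the outcomes. It is written in Python and does not use the NFLSimulatoR API: `toy_sample_play`, its gain values, the assumed field-goal range, and the 80% make rate are hypothetical stand-ins for sampling real play-by-play data. The normal-approximation confidence interval is also an assumption, since the text does not state how the intervals in Table 1 were computed.

```python
import math
import random

POINTS = {"TD": 7, "FG": 3, "none": 0}  # a touchdown is counted as 7 points, as in the text


def toy_sample_play(down, yards_to_go, yardline, rng):
    """Hypothetical stand-in for sampling a real play at this down and distance."""
    if down == 4:  # conservative rule: always kick on fourth down
        if yardline >= 65:  # assumed field-goal range (illustrative only)
            return {"type": "field_goal", "made": rng.random() < 0.8}
        return {"type": "punt"}
    return {"type": "run_or_pass", "gain": rng.choice([-2, 0, 2, 4, 6, 9, 15])}


def simulate_drive(sample_play, rng, start_yardline=25):
    """Play one drive and return "TD", "FG", or "none"."""
    down, yards_to_go, yardline = 1, 10, start_yardline
    while True:
        play = sample_play(down, yards_to_go, yardline, rng)
        if play["type"] == "punt":
            return "none"
        if play["type"] == "field_goal":
            return "FG" if play["made"] else "none"
        yardline += play["gain"]
        if yardline >= 100:
            return "TD"
        if play["gain"] >= yards_to_go:  # first down: reset down and distance
            down, yards_to_go = 1, min(10, 100 - yardline)
        else:
            down, yards_to_go = down + 1, yards_to_go - play["gain"]
        if down > 4:  # failed fourth-down attempt: turnover on downs
            return "none"


def summarize(outcomes, z=1.96):
    """Outcome percentages, mean score, and a normal-approximation 95% CI."""
    n = len(outcomes)
    scores = [POINTS[o] for o in outcomes]
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    half_width = z * math.sqrt(variance / n)
    percent = {k: 100 * outcomes.count(k) / n for k in POINTS}
    return {"percent": percent, "mean": mean, "ci": (mean - half_width, mean + half_width)}


rng = random.Random(2019)  # fixed seed so the sketch is reproducible
outcomes = [simulate_drive(toy_sample_play, rng) for _ in range(10000)]
summary = summarize(outcomes)
print(summary)
```

Because the toy sampler is not built from NFL data, the printed percentages are not comparable to the paper's results; the point is only the structure of the loop (sample a play, update down and distance, stop on a score or a change of possession) and the per-drive summary.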
Table 1: Description of 10000 simulations (2018 and 2019 data) for each fourth down sub-strategy

Sub-strategy        % of Drives    % of Drives    Mean    Lower 95%      Upper 95%
                    Ending in FG   Ending in TD   Score   CI for Score   CI for Score
Always go for it    0%             33%            2.28    2.22           2.35
Empirical           15%            22%            1.96    1.91           2.02
Expected points     28%            19%            2.19    2.14           2.25
Never go for it     33%            15%            2.03    1.98           2.08
Yards less than 5   11%            27%            2.24    2.18           2.30

From Table 1, the fourth down sub-strategy with the largest average points per simulated drive is always go for it (average of 2.28 points), followed by yardage smaller than 5 yards (average of 2.24 points) and expected points (average of 2.19 points). We also see the always go for it sub-strategy is boom or bust, resulting in only touchdowns or no scores. Interestingly, the yardage smaller than 5 yards sub-strategy has an average score similar to always go for it, yet it does recommend field goals to be taken. The confidence interval for the yardage smaller than 5 yards mean score is also narrower than that for always go for it. Taking this into account along with the fact that the averages of these two sub-strategies are so close, a recommendation for a team nervous about always going for it on fourth down might be to always go for it if there are fewer than five yards to go for a first down, regardless of field position. Figure 1 shows this strategy will produce scoring drives more often than always go for it, and Table 1 shows it has nearly the highest average score per drive.

If a team wishes to pursue this sub-strategy (going for it on fourth down if the yards to go is less than five), a logical next question is: what about other yards to go values? That is, what if the team went for it if the yards required for a first down are 4, or 6, or something else entirely? Figure 2.1 shows the percent of drives resulting in a score for a range of Y values. Figure 2.2 displays the average (and 95% confidence interval) score per drive for the various Y values, and Figure 2.3 gives the average (and 95% confidence interval) yardline at which the ball is turned over when the drive does not result in a score.
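The two rule-based fourth-down sub-strategies discussed in this section, the yardage threshold Y and the expected points comparison, can be written as small decision functions. The sketch below is in Python and is not the package's implementation: the function names and every numeric input are hypothetical, the yardage rule assumes a strict "less than Y" comparison (matching the sub-strategy's name), and the expected-points formulas are one simple way to compare the three options, treating the opponent's expected points after a change of possession as a cost.

```python
def choose_by_yardage(yards_to_go, y):
    """Go for it on fourth down only when fewer than `y` yards are needed."""
    return "go for it" if yards_to_go < y else "kick"


def choose_by_expected_points(p_convert, ep_if_converted, opp_ep_if_failed,
                              p_field_goal, opp_ep_if_missed, opp_ep_after_punt):
    """Pick the fourth-down option with the largest expected points.

    Each opp_* argument is the opponent's expected points after taking
    over the ball, so it enters with a negative sign.
    """
    expected = {
        "go for it": p_convert * ep_if_converted - (1 - p_convert) * opp_ep_if_failed,
        "field goal": p_field_goal * 3 - (1 - p_field_goal) * opp_ep_if_missed,
        "punt": -opp_ep_after_punt,
    }
    return max(expected, key=expected.get), expected


# Fourth down with 4 yards to go: which thresholds Y in 1..10 say "go for it"?
go_thresholds = [y for y in range(1, 11) if choose_by_yardage(4, y) == "go for it"]

# Hypothetical inputs chosen for illustration, not estimates from NFL data.
choice, expected = choose_by_expected_points(
    p_convert=0.6, ep_if_converted=2.5, opp_ep_if_failed=1.5,
    p_field_goal=0.4, opp_ep_if_missed=1.8, opp_ep_after_punt=0.5)
print(go_thresholds, choice)
```

Sweeping Y as in `go_thresholds` mirrors the comparison across Y values: each value of Y changes only which fourth-down situations are treated as "go for it", while the rest of the drive simulation stays the same.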
Fig. 2 [Figure omitted; three panels with horizontal axis Yards less than (1 to 10) and vertical axes Percent Score (by FG and TD), Average Score, and Average Turnover Yardline. The captions for panels 2.1 and 2.2 are only partly recoverable.] [...] yardage less than Y yards sub-strategy as a function of Y; 2.3: Average turnover yardline resulting from the yardage less than Y yards sub-strategy as a function of Y

In Figure 2.1 the largest percent score value (of about 38%) is very nearly achieved by Y values of 3, 4, and 5. Figure 2.2 shows that Y = 8 has the highest average score per drive, and this average decreases as Y decreases. Figure 2.3 shows the average turnover yardline gets further away from the offensive team’s goal for larger values of Y. Taking all this together, a value of 5 yards may be the best option for the fourth down sub-strategy go for it if yardage less than Y because it has nearly the highest percent score value, a higher average score than all smaller Y values, a more advantageous average turnover yardline than all larger Y values, and (speculatively) may be more acceptable to NFL coaching staffs than a value of, say, Y = 8.

Here, we caution the reader that this is by no means a causal investigation of fourth down strategies. Indeed, we could further analyze the data by evaluating the performance of a specific sub-strategy amongst better or worse teams, but do not do so as our primary purpose is to demonstrate the usefulness of the NFLSimulatoR package and its core functionality.

3.2 Run/Pass Percentage
Kovash and Levitt (2009) and Hermsmeyer (2018) argue NFL teams should pass more often. In this section we investigate this thesis using the simulation-based approach of NFLSimulatoR. Though perhaps simple on its surface, examining a strategy having to do with the proportion of plays that are a pass instead of a run proves interesting. Even if the NFL is not as analytically forward as other professional sports leagues, the league seems to be trending towards passing more. The NFLSimulatoR package includes a strategy allowing the user to study the effect of passing the ball more or less often.

When employing this strategy in the sample_play() or sample_drives() functions, the argument p must be included as a parameter. p is the probability a given offensive play on first, second, or third down is a pass. To keep the strategy straightforward, we follow an empirical procedure when the play to be sampled is a fourth down. That is, when a fourth down situation arises in the sample, the play is simply sampled from all fourth down plays at the given yardline (or within a neighborhood of the yardline) and distance to go until a first down. Fourth down plays sampled at their regular rates usually result in a punt or a field goal attempt. By varying p we can study how pass proportion affects statistics such as the expected points per drive and the proportion of drives resulting in a score, among a host of other metrics.

Figure 3 shows the proportion of simulated drives resulting in a score for the offensive team (field goal or touchdown) in 2018 and 2019. Note that we include a vertical dashed line showing the league-wide proportion of passing plays on first through third downs. This proportion of passing (running) plays on first through third downs was roughly 59% (41%) in both 2018 and 2019. At first inspection this figure suggests passing more often results in scoring less on average. This initial glance requires more scrutiny, and indeed subsetting by the type of score reveals additional insight. Specifically, Figure 4 shows the same data subsetted by the type of score: either a touchdown or field goal. There is a clear trend showing more touchdowns are scored as the proportion of plays that are passes increases.

Fig. 3 The percentage of simulated drives that resulted in a score (touchdown or field goal) in 2018 and 2019. The dashed line represents the actual proportion of passing plays on first, second, and third downs in both years. [Figure omitted; horizontal axis: Proportion Pass; vertical axis: Percent Score.]

Fig. 4 The percentage of simulated drives that resulted in either a touchdown (orange) or a field goal (green) in 2018 and 2019. The dashed line represents the actual proportion of passing plays on first, second, and third downs in both years. [Figure omitted; horizontal axis: Proportion Pass; vertical axis: Percent Score.]

Next, we look at the percentage of drives resulting in a score broken down by the quality of the team. In this case we subset by whether or not a team made the playoffs, and use playoff appearance as a proxy for quality. To do this we simulate one set of drives using plays from teams that made the playoffs and another set of drives for teams that did not. Figure 5 shows drives using plays from the better teams (i.e., playoff teams) tend to result in a score more often when employing a heavier passing-based strategy than the drives from non-playoff teams.

Again, we stress that our approach is not causal by any stretch of the imagination. That is, we are not saying that passing more will necessarily lead to more scores, particularly if the team has a sub-standard quarterback. This result, of course, is likely confounded by playoff teams (traditionally) having better quarterbacks. Thus, we next subset the pool of plays by each team’s overall passer rating (RTG) and sample plays from three distinct pools: High, Medium, and Low passer rating teams. A team in the pool of High passer rating teams had an overall rating falling into the uppermost tercile of teams. The pools for Medium and Low are similarly defined. The results of the simulated drives using these groups are displayed in Figure 6. Here, we see the teams in the upper tercile of passer ratings score more touchdowns as the proportion of passing plays increases than teams in the other two groups. However, the percent of field goals scored as a function of the proportion of passing plays is similar for all three team groupings.
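The tercile grouping used to build the High, Medium, and Low pools can be sketched as follows. This is a Python illustration under stated assumptions, not the code used for the paper: the team names and ratings are hypothetical, and ties at a cut point are assigned to the lower group, a detail the text does not specify.

```python
import statistics


def classify_by_tercile(ratings):
    """Map each team to "Low", "Medium", or "High" by passer-rating tercile."""
    lower_cut, upper_cut = statistics.quantiles(ratings.values(), n=3)
    groups = {}
    for team, rating in ratings.items():
        if rating <= lower_cut:
            groups[team] = "Low"
        elif rating <= upper_cut:
            groups[team] = "Medium"
        else:
            groups[team] = "High"
    return groups


def plays_for_group(plays, groups, label):
    """Keep only the plays run by teams in the requested rating group."""
    return [play for play in plays if groups[play["team"]] == label]


# Hypothetical season-long passer ratings, not real NFL values.
ratings = {
    "Team A": 70.0, "Team B": 75.0, "Team C": 80.0,
    "Team D": 85.0, "Team E": 90.0, "Team F": 95.0,
    "Team G": 100.0, "Team H": 105.0, "Team I": 110.0,
}
groups = classify_by_tercile(ratings)

# Toy play records; a real pool would hold one row per play from the data.
plays = [{"team": "Team A", "gain": 3}, {"team": "Team H", "gain": 12}]
high_pool = plays_for_group(plays, groups, "High")
print(groups, high_pool)
```

Drives for each group would then be simulated by sampling only from that group's pool, so any difference between groups reflects the plays those teams actually ran.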
Our overall conclusion, based on these simulations, is that passing more should lead to a higher percentage of touchdowns scored. This conclusion is not uniformly true across all types of teams, however. That is, the better teams, or those teams with a higher-quality quarterback relative to the rest of the teams in the league, will benefit much more than the others.
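The pass-proportion strategy examined in this section can be sketched in a few lines. The Python code below is an illustration only and does not reproduce the package's sample_play() function: the function signatures, the toy play pool, and the window-widening loop are assumptions, and the sketch widens only the yardline window (the package text also mentions relaxing the yards to go).

```python
import random


def choose_play_type(down, p, rng):
    """Pass with probability p on downs 1 to 3; None means no play-type filter."""
    if down == 4:
        return None  # fourth downs are sampled at their empirical mix
    return "pass" if rng.random() < p else "run"


def sample_play(pool, down, yards_to_go, yardline, play_type, rng,
                window=1, max_window=10):
    """Draw one matching play at random, widening the yardline window if needed."""
    while window <= max_window:
        matches = [play for play in pool
                   if play["down"] == down
                   and play["yards_to_go"] == yards_to_go
                   and abs(play["yardline"] - yardline) <= window
                   and (play_type is None or play["play_type"] == play_type)]
        if matches:
            return rng.choice(matches)  # independent draws, i.e. with replacement
        window += 1
    return None  # nothing similar enough in the pool


# Toy pool of plays; a real pool would come from play-by-play data.
pool = [
    {"down": 1, "yards_to_go": 10, "yardline": 25, "play_type": "pass", "gain": 7},
    {"down": 1, "yards_to_go": 10, "yardline": 26, "play_type": "run", "gain": 3},
    {"down": 1, "yards_to_go": 10, "yardline": 30, "play_type": "pass", "gain": 0},
    {"down": 4, "yards_to_go": 2, "yardline": 40, "play_type": "punt", "gain": 0},
]

rng = random.Random(42)
play_type = choose_play_type(down=1, p=0.59, rng=rng)
play = sample_play(pool, 1, 10, 25, play_type, rng)
print(play_type, play)
```

Setting p to the league-wide rate of roughly 0.59 corresponds to the dashed line in the figures, and varying p from 0 to 1 traces out the horizontal axis.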
Fig. 5 The percentage of simulated drives that resulted in a score by type (touchdown or field goal) in 2018 and 2019, colored by playoff teams (orange) versus non-playoff teams (purple). [Figure omitted; panels: FG and TD; horizontal axis: Proportion Pass; vertical axis: Percent Score.]

Fig. 6 The percentage of simulated drives that resulted in a score by type (touchdown or field goal) in 2018 and 2019, colored by overall team passer rating classification: High (green), Medium (purple), and Low (orange). [Figure omitted; panels: FG and TD; horizontal axis: Proportion Pass; vertical axis: Percent Score.]
Even though the NFL has existed since 1920, teams are still seeking inefficiencies in the game to exploit. There always seems to be a brilliant new coach ready to introduce a new strategy to push teams to more success. The purpose of creating NFLSimulatoR is to give the wider community a tool to examine a multitude of NFL strategies. The package contains a set of robust and statistically sound tools to simulate plays and drives to examine NFL game plans. This package will also age well, as it can be continually updated with data from the most recent NFL season. Another use for the package is to examine the ramifications of rule changes by the league. This would allow the league to take a data-driven approach to such changes. One example of a rule change that has been debated is eliminating the kickoff after a score.

In the package we include two strategies of interest: passing versus rushing the ball, and going for it or not on fourth down. We have examined each strategy in this paper as examples of possibilities for the package. We imagine many extensions of our work, including strategies regarding whether to run or throw on first down, what play works best after a penalty or timeout, and what plays to run in the first or last few minutes of a game or quarter.

For reproducibility we have included code to generate the simulation data used in this paper as an article on the package website. This allows others to replicate our work and see firsthand how to implement the various functions we have discussed in this paper.

We welcome collaboration from the sports analytics community and hope for contributions to our package, which are easy to make given its open-source nature. The initial reception of our package in the analytics community (over 1000 downloads within the first two months of widespread availability) is incredibly humbling and encouraging. The fact that there are so many possible analysis options for these strategies makes us more excited about the existence of the NFLSimulatoR package, because now the wider sports analytics community can take our initial work and extend it. As an example, very recently a new model-based fourth down decision maker was introduced by Ben Baldwin, an author of the previously mentioned nflfastR package (Baldwin, 2020a). This is exactly the sort of contribution we hope will be added to the NFLSimulatoR package. Such a strategy could be integrated and tested within the simulation-based framework we created and shared with the community at large. We look forward to what new strategies will be devised and tested and hope to see even more analytics used in the NFL and other sports leagues.
References
Baldwin, B. (2020a). NFL fourth-down decisions: The math behind the league’s new aggressiveness. https://theathletic.com/2144214/2020/10/28/nfl-fourth-down-decisions-the-math-behind-the-leagues-new-aggressiveness/
Baldwin, B. (2020b). nflfastR-data repository. https://github.com/guga31bb/nflfastR-data
Burke, B., Quealy, K. (2013). New York Times Fourth Down Bot.
Carl, S., Baldwin, B., Sharpe, L. (2020). nflfastR: Functions to Efficiently Access NFL Play by Play Data. https://github.com/mrcaseb/nflfastR, R package version 3.2.0.
Causey, T., Katz, J., Quealy, K. (2015). A better 4th down bot: Giving analysis before the play.
Clark, K. (2018). The NFL’s analytics revolution has arrived.
Hermsmeyer, J. (2018). For a passing league, the NFL still doesn’t pass enough. https://fivethirtyeight.com/features/for-a-passing-league-the-nfl-still-doesnt-pass-enough/
Horowitz, M., Yurko, R., Ventura, S. (2020). nflscrapR: Compiling the NFL Play-by-Play API for easy use in R. https://github.com/maksimhorowitz/nflscrapR, R package version 1.8.3.
Kovash, K., Levitt, S. D. (2009). Professionals do not play minimax: Evidence from Major League Baseball and the National Football League. NBER Working Papers 15347.
Lewis, M. (2003). Moneyball: The Art of Winning an Unfair Game. W. W. Norton and Company, New York City.
Lohr, S. L. (2010). Sampling: Design and Analysis. Brooks/Cole.
Lopez, M. (2019). Estimating NFL drive outcomes under rules that don’t exist. https://statsbylopez.netlify.app/post/resampling-nfl-drives/
Loque, J. (2019). Ravens bolster their analytics department with new front office additions.
Romer, D. (2006). Do firms maximize? Evidence from professional football. Journal of Political Economy.
Walsh (2019). https://digitalmarketing.temple.edu/rwalsh/2019/03/15/moreyball-and-its-effect-on-the-art-of-basketball/
Yam, D. R., Lopez, M. J. (2019). What was lost? A causal estimate of fourth down behavior in the National Football League. Journal of Sports Analytics.
Yurko (2020). https://ryurko.github.io/nflscrapR-data/