Rugby-Bot: Utilizing Multi-Task Learning & Fine-Grained Features for Rugby League Analysis
Matthew Holbrook
STATS, LLC, Chicago, IL
[email protected]

Jennifer Hobbs
STATS, LLC, Chicago, IL
[email protected]

Patrick Lucey
STATS, LLC, Chicago, IL
[email protected]
October 17, 2019

Abstract
Sporting events are extremely complex and require a multitude of metrics to accurately describe the event. When making multiple predictions, one should make them from a single source to keep consistency across the predictions. We present a multi-task learning method of generating multiple predictions for analysis via a single prediction source. To enable this approach, we utilize a fine-grained representation built on fine-grained spatial data and a wide-and-deep learning approach. Additionally, our approach can predict distributions rather than single point values. We highlight the utility of our approach on the sport of Rugby League and call our prediction engine "Rugby-Bot".

Keywords: Mixture Density Network · Multi-Task Learning · Embeddings · Rugby League · DVOA
For the past decade in sports analytics, the holy grail has been to find "the one best metric" which can best capture the performance of players and teams through the lens of winning. For example, expected metrics such as wins-above-replacement (WAR) in baseball [1], expected point value (EPV) and efficiency metrics in basketball [2, 3], and expected goal value in soccer [4] are used as the gold standard in team and player analysis. In American Football, Defense-Adjusted Value Over Average (DVOA) [5] is perhaps the most respected and utilized advanced metric in the NFL; it combines an expected value and an efficiency metric, analyzes the value of every play compared to expectation, and normalizes for team, drive, and game context. Win probability has also been widely used as the go-to metric for analyzing plays and performance across sports [6, 7, 8].

Although the works mentioned above are all impressive and have improved analysis and decision making across sports, we believe the overall hypothesis of having a single metric to explain all performance is limiting. By its very nature, sport is extremely complex and lends itself to many queries at different contexts and temporal resolutions. Instead of having a single predictor (or metric), we believe we need many predictors to enable these analyses. For example, if a team has won the title, we want predictors that highlight their dominant attributes across the season (i.e., offensively and/or defensively strong, and if so at which locations and times). Or if a team has won or lost a single match, we want predictors which highlight which plays or play was decisive or of note.

While we believe that a multitude of predictors are required to enable multi-resolution analyses, we believe there should be a single source or brain that generates these predictors.
For example, if we are predicting the expected point value of a drive in American Football and that value jumps because of a big play, the jump should be correlated/co-occurring with a jump in the win probability; otherwise the two would appear contradictory and cause the user not to trust the model. Additionally, we believe the predictors should go beyond "expected metrics" which compare solely to the league average, and instead produce distributions to enable deeper analysis (i.e., instead of comparing to the league average, why can't we compare against the top 20% or 10% of teams). To effectively do this we need four ingredients: i) a fine-grained representation of performance at the play level which can be aggregated up to different resolutions, ii) fine-grained spatial
data to enable the representation, iii) a multi-task learning approach where we train the predictors simultaneously, and iv) a method which can generate distributions instead of just point values.

In this paper, we show how we can do this. We focus on the sport of Rugby League, which has an abundance of spatial information available across many seasons. Specifically, we focus our analysis on the 2018 NRL season, which was the closest regular season in the 110-year history of Rugby League in Australia (and, we believe, in the history of all sport). Our single-source prediction engine, which we call Rugby-Bot, showcases how we can uncover how the league was won. In Figure 1 we showcase Rugby-Bot, and in the next section we describe it.
Rugby League is a continuous and dynamic game played between 2 teams of 13 players (7 forwards and 6 backs) across 2x40 minute halves, with the goal of "scoring a try" and obtaining points by carrying the ball across the opponent's goal-line. Rugby League has numerous similarities to American Football: it is played on a similar-size pitch (100m x 70m) with a "set of 6 tackles" per possession (in place of "4 downs") in which a team has the opportunity to score a try. The scoring convention of Rugby League is: 4 points for a try and 2 points for the subsequent successful conversion; 2 points for a successful penalty kick; and 1 point for a successful field-goal.

The most popular Rugby League competition in the world is the National Rugby League (NRL), which consists of 16 teams across Australia and New Zealand. The NRL is the most viewed and attended Rugby League competition in the world. The 2018 NRL season was particularly noteworthy as it was the most competitive regular season in the history of the league, and we believe in the history of sport. After 25 rounds of the regular season competition (where each team plays 24 games), the top 8 teams were separated by one win. Specifically, the top four teams (Roosters, Storm, Rabbitohs and Sharks) all finished with 16 wins, and the next four teams (Panthers, Broncos, Warriors and Tigers) finished with 15 wins. The minor premiership, which goes to the team that finishes first during the regular season, was determined by point differential and saw the Sydney Roosters win it for the fourth time in the past six years. To achieve this feat, the Roosters had to beat the Parramatta Eels by a margin of 27 points on the last day of the regular season, which they did with a 44-10 victory.
The result meant they pushed the Melbourne Storm out of top spot with a point differential of just +8. Given the closeness of the competition, it would be useful to have many measurement tools to dissect how the league was won, which is exactly what Rugby-Bot enables.

An example of the Rugby-Bot tool is depicted in Figure 1. In the example we show a complete "set" (i.e., Rugby League's equivalent of a "drive" in American Football) from Round 25 of the 2018 NRL season, with the Roosters in possession against the Eels. In the "Set Summary" (second row, left), Rugby-Bot tracks the likelihood the Roosters will score a try (exTrySet) during that set at the start of each tackle. We see that they initially field the ball in terrible field position, with only a 3.5% likelihood of scoring. However, a great return boosts their exTrySet to 7.4% at the start of their first tackle. A huge run during the second tackle raises their exTrySet further to 11.5%, and good central field position on the fourth tackle sees it grow to 15.2%. Unfortunately for the Roosters, after a bad fifth tackle they elect to kick in an attempt to recover the ball and turn the ball over. Rugby-Bot also allows us to analyze that decision making, which we will explore later. We can dig further into the spatial context of the big run by exploring the expected meters gained as a function of field location in the "Contextual Spatial Insights" panel (second row, right). This panel shows the expected meters as a function of not only the field position, but the full context: tackle number, team, scoreline, and opponent. In the plot we see the advantage gained by starting a tackle near the middle of the pitch.

An important concept in Rugby League is that of "momentum" during a set of six tackles. This is captured in Rugby-Bot by comparing the current exTrySet to the average for that context (field position, time, scoreline, team identities).
In the example shown in Figure 1 (third row), we see a boost in momentum both from the runback after the initial possession as well as from the big run on tackle 2. Rugby-Bot enables us to identify big runs not just in terms of the meters gained, but in how exceptional that run was given its context. Next to the video player we see a "Big Play Highlight" indicator showing the distribution of meters gained expected under the conditions of tackle 2 and where the actual run fell in this distribution, indicated by the red arrow.

Finally, in the bottom row, we see the Scoreline Prediction tracker. In this plot, we not only track the current score differential (red dashed line), we predict the end-of-game score differential. Note that because of our approach, which we will explore in the next section, our predictor is not just a final value, but a distribution, displayed here as the 90-10% confidence interval (black lines) around the mean (green line). We will dive into analyzing this plot more in Section 5.2.
[Figure 1 graphic: video panel with game clock (H1 26:30, SYD v PAR; Round 25, September 1, 2018, Venue: ANZ Stadium) alongside Set Summary, Contextual Spatial Insight (Expected Meters), Set Momentum, Big Play Highlight, and Scoreline Prediction panels]

Figure 1: An example of our Rugby-Bot tool used to analyze a Round 25 match from the 2018 NRL season, which consists of the following components: (Top) the video and game information, as well as an indicator of a "highlight" using our anomaly detector. (Second Row) the spatial information of each tackle in addition to the Expected TrySet value for that set at each tackle; additionally, "Contextual Spatial Insights" can be gleaned by looking at the expected meters gained for each location on the field under a given context. (Third Row) The "Set Momentum" is monitored by tracking the current Expected Try value at each tackle relative to the average value. (Bottom Row) The final scoreline predictor tracks the current scoreline (red) and the final prediction distribution (mean in green, 90-10% confidence interval in black).
Rugby-Bot is designed to analyze the state of the game across multiple time resolutions: play, set, and game. That is, in any given situation, we would like to know how likely various outcomes are. In particular, we predict:

• Play Selection [y_p]: likelihood a team will perform an offensive kick, a defensive kick, or run a normal offensive play on the next tackle
• Expected Meters (Tackle) [y_m]: predicted meters gained/lost on the next tackle
• Expected Try (Tackle) [y_tt]: likelihood of scoring a try on the next tackle
• Expected Try (Set) [y_ts]: likelihood of scoring a try at some point during that set
• Win Probability [y_w]: likelihood of winning the game
• Scoreline Prediction [y_s]: predicted final score of each team

Simply constructing a model (or "specialist") for each task leaves much to be desired: it fails to share information and structure that is common across tasks, and is an (unnecessary) hassle to maintain in a production environment. Instead we strategically frame this problem as a model whose output can be thought of as a "state-vector" of the match.

Rugby-Bot is built on data from the 2015-2018 National Rugby League seasons, which includes over 750 games and more than 250,000 distinct tackles (i.e., plays). For our analysis, the continuous event data was transformed into a segmented dataset with each data point representing a distinct tackle in the match. At each tackle, we have information about the position of the ball, the subsequent play-by-play event sequence, the players and teams involved, the team in possession of the ball, as well as game context (e.g. tackle number, score, time remaining). Data is split 80-train/20-test for modeling.
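Concretely, the six outputs above form one per-tackle "state-vector" of the match. A minimal sketch of that record (our own illustrative Python structure, not the authors' code):

```python
from dataclasses import dataclass

@dataclass
class GameStateVector:
    """One Rugby-Bot prediction per tackle (field names are illustrative)."""
    y_p: dict    # play-selection likelihoods: run / offensive kick / defensive kick
    y_m: float   # expected meters gained or lost on the next tackle
    y_tt: float  # likelihood of a try on the next tackle
    y_ts: float  # likelihood of a try at some point during the set
    y_w: float   # win probability
    y_s: float   # predicted final score differential

state = GameStateVector(
    y_p={"run": 0.7, "offensive_kick": 0.2, "defensive_kick": 0.1},
    y_m=8.5, y_tt=0.03, y_ts=0.12, y_w=0.61, y_s=6.0)
```

Producing all of these fields from one model, rather than six specialists, is what keeps the state-vector internally consistent.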
Our approach draws on three key approaches used throughout the machine learning and deep learning communities. First, it draws on the "wide and deep" models commonly used for recommendation tasks [9]. Second, we cast this as a multi-task learning problem, predicting all outcomes from the same model and thereby sharing common features throughout. Finally, we use a mixture density network (MDN), which enables us to produce a distribution of outcomes, thereby capturing the uncertainty in the network. The advantage of this is three-fold. First, the Bayesian nature of MDNs affords us the ability to treat all losses in terms of a probabilistic likelihood, and therefore enables us to do regression and classification tasks simultaneously. Next, it is intrinsically multi-task, which forces the model to learn globally relevant features about the game. Finally, by modeling the outputs as a multi-dimensional distribution, it captures the relationships between the outputs as well as providing uncertainties about the predictions. The model architecture is shown in Figure 2, and we will explore how each component contributes to the prediction task and the advantages that this architecture offers.
Our goal is to leverage both spatial and contextual information to make predictions about the outcome at the play, possession, and game level. Some of the challenges in modeling this data include:

• Raw spatial data is low-dimensional and densely represented
• Spatial features are inherently non-linear
• Contextual features (e.g. teams, tackle) are discrete and may be sparse
• The relationship between the spatial and contextual features is non-linear

To address these points, we draw inspiration from the wide and deep models of [9]. Following the approaches there, we pass each categorical feature through its own deep portion of the network to construct higher-level, dense features from the originally sparse inputs, creating an embedding vector. These embeddings are then concatenated with the remaining dense features (such as time and position) before passing through 2 final dense layers to capture their non-linear relationships.

[Figure 2 graphic: Team ID + Season ID, Opponent ID + Season ID, Tackle, Back-To-Back, and Position (x, y) inputs pass through embedding layers; these are concatenated with Position (x, y), Time Remaining, and Score Difference, then fed through two fully connected layers into a mixture density layer producing y_m, y_tt, y_ts, y_w, y_s]

Figure 2: Our base network architecture is a Mixture Density Network (MDN). Categorical features as well as field position (x, y) are passed through embedding layers (green) to create a dense representation. These are then concatenated with the continuous features (blue) and passed through two fully connected layers (black). The mixture density layer consists of 5 mixtures, each with a prediction for µ and σ. Expected meters (y_m), expected try tackle (y_tt), expected try set (y_ts), win probability (y_w), and final scoreline (y_s) are all predicted simultaneously.

One unique aspect of our approach is how we represent the interactions of certain categorical (i.e. contextual) features such as season and team/opponent identity. As is common practice, we represent each as a one-hot vector. As opposed to creating an embedding for season, team, and opponent independently, however, we instead concatenate the one-hot vector of season with the one-hot vector of team (and similarly for opponent). This enables us to more directly share identity information across seasons, capturing the notion that teams maintain a level of similarity across years while also picking up on league-wide trends over different seasons. We use a similar approach to capture the tackle number (1-6) and back-to-back sets (represented as a binary flag).

The other key innovation in our approach is the manner in which high-level features are extracted from the field position; this addresses the first point above. The field position is a dense, two-dimensional (x, y) input. In a traditional wide and deep model, this value would simply be concatenated at the embedding layer. However, the shallowness of the wide portion of these models prevents extraction of higher-level spatial features. To address this, we treat field position as both a sparse and a dense feature.

For the sparse portion, we pass the two-dimensional position through several layers just like the categorical features. Note that we initially boost from the original two dimensions up to 50 at the first hidden layer. This gives the network sufficient dimensionality to "mix" the inputs and create the desired higher-level features.
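To make the feature construction concrete, here is a minimal NumPy sketch of the two ideas just described: concatenating the team and season one-hot vectors before embedding, and treating the raw (x, y) position as both a sparse input (boosted to 50 dimensions) and a dense input. All widths, weights, and the choice of dense context features are illustrative stand-ins, not the trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
N_TEAMS, N_SEASONS = 16, 4  # 16 NRL teams, 2015-2018 seasons

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

# Joint (team, season) input: concatenated one-hots rather than
# independent embeddings, so identity is shared across seasons.
team_season = np.concatenate([one_hot(3, N_TEAMS), one_hot(2, N_SEASONS)])  # 20-d

# Its "deep" branch: a learned linear map + ReLU down to a dense embedding.
W_emb = rng.normal(size=(16, team_season.size)) * 0.1
team_emb = np.maximum(W_emb @ team_season, 0.0)          # 16-d dense embedding

# The raw 2-d position is boosted to 50-d in its first hidden layer...
pos = np.array([35.0, 20.0])                             # (x, y) on the pitch
W_pos = rng.normal(size=(50, 2)) * 0.1
pos_emb = np.maximum(W_pos @ pos, 0.0)                   # 50-d spatial features

# ...and is ALSO passed through unchanged ("sparse and dense" usage),
# concatenated with the other dense context features.
dense = np.array([0.4, 12.0])                            # e.g. time left, score diff
trunk_input = np.concatenate([team_emb, pos_emb, pos, dense])
print(trunk_input.shape)  # (70,)
```

In the real model this concatenated vector would feed the two fully connected layers ahead of the mixture density layer.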
This is a key insight. Traditionally, the low dimensionality of the spatial input has been addressed by fully discretizing the field position, that is, by treating the field as a grid and representing the position as a 1-hot vector indicating the occupancy of each bin [10, 11]. This has several limitations. First, the discrete representation is inherently lossy, as positions are effectively rounded during the binning process. Second, the resulting dimensionality may be massive (e.g. a 70m x 100m field broken into 1/3m x 1/3m bins results in a 63,000-dimensional input). While we would like to increase our dimensionality above 2, expanding to the thousands or tens-of-thousands is unnecessarily high and results in an extremely sparse representation, requiring more neurons and more data. Our approach avoids these pitfalls by working directly on the raw spatial input, moderately increasing the dimensionality in the first hidden layer, and then enabling the network to extract important spatial relationships directly from the data.

Finally, because of the importance of the field position in all of these prediction tasks, we also include the position again as a dense feature. This enables us to capture both low-level (i.e. raw) as well as high-level (i.e. embedded) spatial information.

Multi-task learning has been commonplace within the machine learning community because of its ability to exploit common structures and characteristics across tasks [12]. A difficulty with multi-task learning often arises when one would like to perform both regression (which would use a loss such as RMSE) and classification (which would use a loss such as cross entropy) simultaneously. For example, in predicting both win probability and final scoreline, how much should a 1% change in the likelihood of winning be weighted relative to a 1pt change in the score?
Even when all tasks are of the same type (classification or regression), how each loss is scaled and/or weighted can be somewhat arbitrary: how should a 1m error in the predicted meters gained be weighted against a 1pt error in the final scoreline?

Mixture density networks (MDNs) are a class of models which combine neural networks and mixture density models [13]. As such, the approach is naturally Bayesian and allows us to model the conditional probability distribution p(t | x), where t are the parameters which describe the distributions generating our game-state vector [y_p, y_m, y_tt, y_ts, y_w, y_s] and x is our input feature vector. Therefore, we have a single loss for our prediction: the negative log-likelihood of the observed game-state given the current inputs. This dramatically simplifies our loss function. Additionally, the loss over the full distribution constrains the model to learn the relationships between the outputs, for example, that it would be unlikely to observe a high value for meters gained but a very low value for an expected try.

The "mixture" part of the MDN allows us to capture the multi-modality of the underlying distribution. This produces various "modes" of the game, such as lopsided contests or very close scenarios. MDNs also give us insight into the uncertainty of our models "for free". This enables us to generate a distribution of outcomes, as opposed to just a mean, for any given prediction.

Cross entropy is used as the loss for training. Our test loss is .051. Note that this is a single loss across the entire state-vector: even if several components in a prediction are very good, the loss may be quite high if the state is inconsistent (e.g. a positive final scoreline prediction but a low win probability are inconsistent and would therefore have a high loss).

Similar to the decision-making opportunities on 4th down in American Football (punt, kick, go-for-it), in Rugby League teams must decide whether to kick defensively (equivalent to a punt), kick offensively (in an attempt to regain the ball for back-to-back sets), or "go-for-it" on the last tackle. For this task we use a simple logistic regressor. We elect to model this decision separately because it only pertains to the last tackle and is therefore "silent" on most tackles, unlike the other tasks, which are active on every single tackle. The test loss for this model is .64.
Football Outsiders' DVOA (Defense-adjusted Value Over Average) has become arguably the most popular advanced statistic in the NFL. It aims to correctly value each play and its impact on scoring and winning. DVOA factors in context such as opponent, time, and score to compare performance to a baseline for a situation [5]. It can be broken down and applied in several ways, but the most popular application is to measure team strength.

Rugby-Bot can be used to create a DVOA for Rugby League. Like American Football, Rugby League has segmented plays where the fight for field position is key, and not all yards/meters are equal even though they appear so in the box score. The Expected Try (set) prediction generates the predicted chance of scoring during the set for every play. The prediction, like football's DVOA, considers the score, time, opponent, and location. Taking the difference between this expected value and the actual value, we can see which teams outperform their expectation. To create a more interpretable 'DVOA', the values are scaled to have a mean of 0. A positive offensive DVOA and a negative defensive DVOA correspond to a strong offense and defense, respectively.

We can use our Rugby League DVOA to analyze the 2018 NRL season, as seen in Table 1. 2018 was an extraordinarily close season in which the top 4 teams all had the same number of points and wins. The top two teams had similar meters for and against; however, the context under which those meters were gained and conceded strongly favored the Roosters, as indicated by their DVOA differential. The champion Roosters had the top DVOA per play mostly due to their strong defense, surpassing the next strongest defensive team (the Storm) by more than 1.5 defensive DVOA pts per play.

Next we examine the Roosters' form over the course of the season. Figures 3A and 3B break down their offensive and defensive strength over the season.
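The DVOA construction just described reduces to a few lines: per play, take the actual outcome minus the contextual expectation, average by the team on offense and on defense, and center each column on the league mean. The column names and pandas layout below are our own illustrative choices, not the authors' pipeline:

```python
import pandas as pd

def rugby_dvoa(plays: pd.DataFrame) -> pd.DataFrame:
    """Per-team offensive/defensive DVOA from play-level data. Expects
    columns 'offense', 'defense', 'scored_try' (set outcome, 0/1) and
    'ex_try_set' (the model's contextual expectation); names assumed."""
    plays = plays.assign(delta=plays["scored_try"] - plays["ex_try_set"])
    off = plays.groupby("offense")["delta"].mean()
    dfn = plays.groupby("defense")["delta"].mean()
    out = pd.DataFrame({
        "dvoa_off": off - off.mean(),   # center on a league mean of 0
        "dvoa_def": dfn - dfn.mean(),   # negative = strong defense
    })
    out["dvoa_diff"] = out["dvoa_off"] - out["dvoa_def"]
    return out
```

The `dvoa_diff` column matches the sign convention of Table 1 (e.g. a strong offense and a strongly negative defense combine into a large positive differential).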
In Figure 3A we see that the Roosters initially got off to a slow start, largely on par with the league average (they started the season with a 4-4 record). However, by mid-season they began to find their form. The offensive DVOA allows us to track how the Roosters went from average to great in the 2nd half of the season. Defensively, the Roosters were dominant all season long, as seen in Figure 3B. They maintained their strong form even as the league worsened in defensive DVOA.

With our individual play valuation we can dive deeper into the Roosters' success and reveal where their defense outperformed their opponents. Figure 3C shows that the Roosters' defense was average during plays occurring in the first 75m of the field. (Note that both the Roosters and the league average have a negative DVOA in these situations, as they rarely result in scoring.) However, in critical moments, in the final 25m of the pitch, they were incredibly strong, with their defensive DVOA holding at 0.12 while the league average shot to over 0.15 in key situations.
Table 1: The points, wins, losses, meters for and against, meter differential (for minus against), offensive and defensive DVOA, and differential DVOA (offensive minus defensive) are shown for each team during the 2018 NRL season. A large positive number for offensive DVOA and a large negative number for defensive DVOA indicate a strong offense and defense, respectively.
| Rank | Team | Pts | Wins | Losses | Meters For | Meters Against | Meters Diff | DVOA Offense | DVOA Defense | DVOA Diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Roosters | 34 | 16 | 8 | 542 | 361 | 181 | 1.62 | -5.75 | 7.37 |
| 2 | Storm | 34 | 16 | 8 | 536 | 363 | 173 | 0.57 | -4.17 | 4.74 |
| 3 | Rabbitohs | 34 | 16 | 8 | 582 | 437 | 145 | 2.37 | -0.97 | 3.35 |
| 4 | Sharks | 34 | 16 | 8 | 519 | 423 | 96 | 1.72 | -1.84 | 3.56 |
| 5 | Panthers | 32 | 15 | 9 | 517 | 461 | 56 | 0.39 | -1.67 | 2.05 |
| 6 | Broncos | 32 | 15 | 9 | 556 | 500 | 56 | 0.99 | 0.29 | 0.7 |
| 7 | Dragons | 32 | 15 | 9 | 519 | 472 | 47 | 3.62 | 1.92 | 1.7 |
| 8 | Warriors | 32 | 15 | 9 | 472 | 447 | 25 | 1.96 | -0.91 | 2.87 |
| 9 | West Tigers | 26 | 12 | 12 | 377 | 460 | -83 | -2.77 | 1.41 | -1.35 |
| 10 | Raiders | 22 | 10 | 14 | 563 | 540 | 23 | 3.62 | 1.92 | 1.7 |
| 11 | Knights | 20 | 9 | 15 | 414 | 607 | -193 | -0.28 | 3.6 | -3.88 |
| 12 | Bulldogs | 18 | 8 | 16 | 428 | 474 | -46 | -3.57 | 0.29 | -3.86 |
| 13 | Cowboys | 18 | 8 | 16 | 449 | 521 | -72 | -2.58 | 0.95 | -3.54 |
| 14 | Titans | 18 | 8 | 16 | 472 | 582 | -110 | -0.85 | 3.74 | -4.59 |
| 15 | Sea Eagles | 16 | 7 | 17 | 500 | 622 | -122 | 1.5 | 4.42 | -2.88 |
| 16 | Eels | 14 | 6 | 18 | 374 | 550 | -176 | -3.96 | 2.15 | -6.11 |
Figure 3: (A) The cumulative offensive DVOA for the Roosters (red) and the league average (green); the Roosters got off to a slow start offensively, then began to pull away from the field. (B) The cumulative defensive DVOA for the Roosters (red) and the league average (green); the Roosters dominated defensively (indicated by a negative value for defense) throughout the season. (C) Comparison of the Roosters to the league when their opponents were on offense, in normal field position versus near the goal (within the 75m line).
We return now to the critical Round 25 matchup between the Roosters and the Eels which was shown earlier. Recall that to clinch the title, the Roosters needed to win this match by 27 points. In Figure 1, we viewed the final scoreline prediction only a few minutes into the match, when the Roosters led by only 8 pts. Is the title within reach? To answer this question we need to be able to predict not only the final score differential, but the uncertainty around this measurement. Rugby-Bot enables us to do this. Similar to the approaches used in [5], our MDN allows us to predict a range of likely final scorelines at each point in the match.

Now we plot the full scoreline predictor over the course of the match in Figure 4. The actual score is shown in red and the predicted mean score in green, with the 90-10% confidence interval shown in black around it. Note that although we are showing the mean here, the mean need not be the most likely value; as our model is a mixture of Gaussians, it is inherently multimodal and the most likely outcome could occur at a mode away from the mean. At any point in the game we can view the full shape of the predicted distribution, but here the mean and 90-10 bounds are presented for visual simplicity. Also note that because it is a mixture model, the distribution is complex and not necessarily symmetric: the values of the mean, 90-bound, and 10-bound trend together, but are not identical.
[Figure 4 graphic: score differential versus game clock for SYD v PAR, plotting the actual score and the predicted mean]

Figure 4: The predictor of the final scoreline for the Round 25 matchup between the Roosters and the Eels. The actual score is shown in red, the mean of the predicted distribution is shown in green, and the 90-10% confidence interval is shown in black.

There are several interesting observations we can draw from this analysis. First, with 10 minutes remaining in the first half, the game appears to be largely over: the lower 10% bound has moved above a score differential of 0, making a comeback unlikely. It dips back down ever so slightly after the start of the second half and stays there for the next few minutes, but from H2 33:32 onwards the 90-10 interval never includes 0 for the remainder of the game. At this point we can say the game is effectively over. This is interesting in recreating the story of the game, but is also useful for analyses such as quantifying "garbage time" or understanding the risk impact of substitutions to rest key players late in a game.

Although the game is "over" shortly into the second half, the fate of the season is still uncertain. The Roosters must win by 27, and at H2 33:32 they are up only by 14. Furthermore, even the 90-10 confidence interval does not include a 27-point victory for the Roosters. This does not occur until H2 18:13, when the Roosters go up by 22. Now the title is within reach.

Note that as the game progresses, the width of the confidence interval shrinks. Early in the game the outcome is very uncertain, but as the game progresses, the range of possible outcomes shrinks until it is identically known once the game is over. Even in the remaining few minutes, there is still some uncertainty as to the final score, as there is always a possibility for a last-minute score or turnover. However, the model has learned that at most the score will fluctuate by the value of a single try and conversion.
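The 10% and 90% bounds plotted here are quantiles of the predicted Gaussian mixture, which has no closed-form inverse CDF. One standard way to obtain them (a sketch, not necessarily the authors' implementation) is to bisect on the mixture CDF:

```python
import math

def mixture_cdf(x, pis, mus, sigmas):
    """CDF of a 1-d Gaussian mixture evaluated at x."""
    return sum(p * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for p, m, s in zip(pis, mus, sigmas))

def mixture_quantile(q, pis, mus, sigmas, lo=-200.0, hi=200.0):
    """Invert the mixture CDF by bisection; [lo, hi] just needs to bracket
    every plausible score differential."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, pis, mus, sigmas) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

At each game-clock tick, the plotted band is then `(mixture_quantile(0.1, ...), mixture_quantile(0.9, ...))`; because the mixture can be multimodal and asymmetric, these bounds need not sit symmetrically around the mean.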
Rugby-Bot allows us to determine the value of each play in a game. To demonstrate this, we can compare two plays from the Round 25 Eels v Roosters game. It is obvious that a big run will increase a team's chance of scoring, but not all meters are created equal. Figure 5 shows two different sets in the match; on the left is the same set as in Figure 1, and on the right is another set occurring near the end of the first half.

We can evaluate tackle 3 from the example shown on the left, and tackle 4 from the example shown on the right in Figure 5. Both runs were over 22 meters and above the 95th percentile of predicted expected meters. In the example on the left, the run is made closer to the goal line and earlier in the set. This causes the exTrySet value to go up 3.8%, while the run in the example on the right is late in the set and far from the goal line, so it only increases the exTrySet by 1.2%. This quantifies the conclusion an observer would draw: the meters gained in the example on the right were not as valuable, as the chances of scoring were still minuscule. Taking these deltas between plays allows us to pinpoint the plays in the match that are impactful beyond just yards or scoring.

To further highlight how good the play on the left of Figure 5 was, we can use a distribution of predictions to easily identify big plays and analyze individual play performance. Figure 6 visualizes the 22-meter run through a different lens. The blue dot corresponds to the starting position of the play and the red dot is where the play ended up. We can see, based on the context ((x, y) starting position, tackle number, and game context), that this play was in the top 5% of plays (or in the 95th percentile). Effectively, we can value each play and determine whether it was within the "top plays" in any situation.
Figure 5: The likelihood of scoring a try at each tackle (exTrySet) is shown for two different sets. Plays have been normalized so the offensive team is always going left to right.

Table 2: Within the opponent's 20 meter line, teams consistently make conservative decisions, electing to kick when running produces more than twice the expected points.

| Play Decision | Frequency | Expected Points |
|---|---|---|
| Run | 32% | 1.15 |
| Kick | 68% | 0.53 |
The first five tackles in a set in Rugby League are largely uniform in their objective: score a try or gain the most meters possible towards the tryline. On the last tackle, however, teams may elect to kick defensively (similar to a punt in American Football), kick offensively (in an attempt to score or regain the ball for back-to-back sets), or run a standard play in an attempt to score, as they have to turn possession of the ball over to the opposition at the end of the tackle. Rugby-Bot enables us to predict what a given team is likely to do, analyze the outcome of that decision, and assess the impact of that decision on the game relative to another choice having been made.

Given that the last tackle is often the most exciting and variable play, it enables us to investigate some interesting behavioral questions. One such question is: "do teams make the best decision on the last tackle, or are they naturally conservative in their play calling?" To answer this requires us to analyze both the position as well as the context surrounding the sixth tackle. Using our predictions of play call (run, defensive kick, offensive kick) and our play values, we can value each decision on each part of the field. In Table 2, we see that in the "Red Zone" attempting to run the ball for a try is more valuable than kicking. Despite this fact, teams tend to be conservative and attempt to run it on only 32% of plays.

Table 3 displays how the most aggressive team in the 2018 season, the Canberra Raiders, also had the most points over expected. Two of the best teams, the Roosters and the Storm, do not run the ball often on the final tackle. This could be because, as more talented teams, they can afford to be risk averse.

Returning to our example set from Round 25 (Figure 1), we saw the Roosters in outstanding field position before the final tackle, with an exTrySet value well above average.
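Valuing a last-tackle decision of this kind reduces to averaging the play value by decision type within a field zone; a toy sketch of the aggregation behind a table like Table 2, with field names of our own invention:

```python
from collections import defaultdict

def value_decisions(last_tackle_plays):
    """Average points produced by each last-tackle decision. Each play is a
    dict with 'decision' and 'points' keys (illustrative field names);
    callers would pre-filter to the zone of interest, e.g. the Red Zone."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for play in last_tackle_plays:
        totals[play["decision"]] += play["points"]
        counts[play["decision"]] += 1
    return {d: totals[d] / counts[d] for d in totals}

red_zone = [
    {"decision": "run", "points": 4},
    {"decision": "run", "points": 0},
    {"decision": "kick", "points": 2},
    {"decision": "kick", "points": 0},
]
print(value_decisions(red_zone))  # {'run': 2.0, 'kick': 1.0}
```

Comparing these per-decision averages against each team's actual decision frequencies is what surfaces the conservatism seen in Table 2.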
However, instead of running the ball, which they had done with great success during that set, they elected to kick to the far touch-line, consistent with their style of running far less than any other team in the league on the final tackle. The ball was mishandled, lost out of bounds, and possession was turned over to the Eels. The Eels mounted an extended drive to the shadow of the Roosters’ goal, only to fail to convert; the Roosters’ defense again came to their rescue. Overall, the analysis shown in Table 3 is effectively a “Red Zone efficiency”, which teams can use to see how effective they are in the most important area of the field.
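The decision valuation behind Table 2 amounts to grouping last-tackle plays by play call and averaging their point values. A minimal sketch of that aggregation, using made-up play values (the values in the paper come from Rugby-Bot’s play-value model, not raw scores):

```python
from collections import defaultdict

# Hypothetical last-tackle plays: (decision, play value in expected points).
# These numbers are illustrative only, not model output.
plays = [
    ("run", 4.0), ("run", 0.0), ("run", 0.0), ("run", 6.0),
    ("kick", 0.0), ("kick", 2.0), ("kick", 0.0),
    ("kick", 0.0), ("kick", 1.0), ("kick", 0.0),
]

def decision_summary(plays):
    """Frequency and mean point value of each last-tackle play decision."""
    grouped = defaultdict(list)
    for decision, value in plays:
        grouped[decision].append(value)
    total = len(plays)
    return {
        decision: {
            "frequency": len(values) / total,
            "expected_points": sum(values) / len(values),
        }
        for decision, values in grouped.items()
    }

summary = decision_summary(plays)
# e.g. summary["run"] -> {'frequency': 0.4, 'expected_points': 2.5}
```

Binning the same aggregation by field position yields the per-region decision values used to judge whether a team’s last-tackle play call was the high-value choice.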
In this paper, we presented a multi-task learning method of generating multiple predictions for analysis from a single prediction source. To enable this approach, we utilized a fine-grained representation of spatial data within a wide-and-deep learning framework. Additionally, our approach can predict distributions rather than single point values. We highlighted the utility of our approach on the sport of Rugby League and call our prediction engine “Rugby-Bot”. Utilizing our slew of prediction tools, we showed how Rugby-Bot can be used to discover how the Sydney Roosters won the 2018 NRL minor premiership (the closest in the history of the sport). Additionally, we showed how it can be used to value each individual play, accurately predict the outcome of a match, and investigate play behavior in Rugby League, specifically behavior on the last tackle.

Table 3: 2018 Season performance on last tackle. Each team has roughly the same Expected Points on their final tackle, but the actual points obtained varies dramatically.

Team                         | Expected Points | Actual Points | Over Expected | Run (%) | Run (%) Rank | Over Expected Rank
Canberra Raiders             | 0.56 | 1.04 |  0.47 | 41.67% |  1 |  1
Brisbane Broncos             | 0.54 | 0.97 |  0.43 | 37.18% |  6 |  2
Penrith Panthers             | 0.56 | 0.78 |  0.22 | 38.39% |  5 |  3
Newcastle Knights            | 0.56 | 0.73 |  0.18 | 39.80% |  2 |  4
Cronulla-Sutherland Sharks   | 0.52 | 0.68 |  0.16 | 30.58% |  9 |  5
Manly-Warringah Sea Eagles   | 0.55 | 0.66 |  0.11 | 25.89% | 10 |  6
North Queensland Cowboys     | 0.56 | 0.65 |  0.09 | 19.58% | 15 |  7
South Sydney Rabbitohs       | 0.55 | 0.60 |  0.04 | 24.47% | 13 |  8
Sydney Roosters              | 0.58 | 0.57 | -0.01 | 16.16% | 16 |  9
Gold Coast Titans            | 0.54 | 0.50 | -0.04 | 38.46% |  4 | 10
Canterbury Bulldogs          | 0.55 | 0.51 | -0.04 | 33.79% |  8 | 11
St. George Illawarra Dragons | 0.55 | 0.51 | -0.05 | 39.39% |  3 | 12
Melbourne Storm              | 0.53 | 0.48 | -0.06 | 25.24% | 11 | 13
New Zealand Warriors         | 0.53 | 0.46 | -0.07 | 20.56% | 14 | 14
Wests Tigers                 | 0.56 | 0.42 | -0.13 | 37.17% |  7 | 15
Parramatta Eels              | 0.55 | 0.38 | -0.17 | 25.00% | 12 | 16

Figure 6: The blue dot corresponds to the starting position of the play and the red dot is where the play ended up. Using our distribution analysis, we can see that this play was in the top 5% of plays (i.e., in the 95th percentile).
References

[1] Steve Slowinski. What is WAR? https://library.fangraphs.com/misc/war/, February 2010.
[2] Dan Cervone, Alexander D’Amour, Luke Bornn, and Kirk Goldsberry. POINTWISE: Predicting points and valuing decisions in real time with NBA optical tracking data. 2014.
[3] Dean Oliver. Basketball on Paper: Rules and Tools for Performance Analysis. Brassey’s, Inc., Washington, D.C., 1st edition, 2004.
[4] Patrick Lucey, Alina Bialkowski, Mathew Monfort, Peter Carr, and Iain Matthews. Quality vs quantity: Improved shot prediction in soccer using strategic features from spatiotemporal data. In Proc. 8th Annual MIT Sloan Sports Analytics Conference, pages 1–9, 2014.
[5] Football Outsiders. Methods to our madness.
[6] Michael Beuoy. Updated NBA win probability calculator.
[7] Konstantinos Pelechrinis. iWinRNFL: A simple, interpretable & well-calibrated in-game win probability model for NFL. arXiv e-prints, arXiv:1704.00197, April 2017.
[8] Sujoy Ganguly and Nathan Frank. The problem with win probability. In MIT Sloan Sports Analytics Conference, 2018.
[9] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pages 7–10. ACM, 2016.
[10] Yisong Yue, Patrick Lucey, Peter Carr, Alina Bialkowski, and Iain A. Matthews. Learning fine-grained spatial models for dynamic sports play prediction. In IEEE International Conference on Data Mining (ICDM), pages 670–679. IEEE Computer Society, 2014.
[11] Andrew Miller, Luke Bornn, Ryan Adams, and Kirk Goldsberry. Factorized point process intensities: A spatial analysis of professional basketball. In International Conference on Machine Learning, pages 235–243, 2014.
[12] Rich Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.