On Estimating the Ability of NBA Players
Paul Fearnhead and Benjamin M. Taylor
May 31, 2018
Abstract
This paper introduces a new model and methodology for estimating the ability of NBA players. The main idea is to measure directly how good a player is by comparing how their team performs when they are on the court as opposed to when they are off it. This is achieved in such a way as to control for the changing abilities of the other players on court at different times during a match. The new method uses multiple seasons' data in a structured way to estimate player ability in an isolated season, measuring defensive and offensive merit separately as well as combining these to give an overall rating. The use of game statistics in predicting player ability is also considered. Results using data from the 2008/9 season suggest that LeBron James, who won the NBA MVP award, was the best overall player. The best defensive player was Lamar Odom and the best rookie was Russell Westbrook, neither of whom won an NBA award that season. The results further indicate that whilst the frequently reported game statistics provide some information on offensive ability, they do not perform well in the prediction of defensive ability.
Keywords: Defensive Ratings, Game Statistics, Offensive Ratings, Rating NBA players
The most basic rating systems for professional basketball players are simple (or not so simple) functions of 'positive' statistics, such as free throw percentage and the number of steals, as well as 'negative' statistics, such as the number of turnovers and personal fouls. One example is the computer ranking procedure used to assign the 1998 IBM player award (the function used is detailed in equation 1 of Berri (1999)). Although such rating systems yield some information on player ability, they usually offer no justification for the functional form, for the choice of player statistics used or for the 'value' ascribed to each statistic in calculating the rating, nor any estimate of precision.

One alternative is to model an outcome variable (such as whether or not the team won) as a function of game or player statistics, for example by using a regression model. Provided the data are well described by the chosen model, standard statistical techniques can then be applied to address the issues mentioned above. One of the earliest published attempts to model professional basketball team performance was by Zak et al. (1979), in which a Cobb–Douglas production function was used to model the ratio of final scores against game-level statistics, including the four mentioned above. Although that paper did not seek to rate individual players, the authors mention that such an approach is feasible using their method.

Berri (1999) models team wins as a linear combination of player- and game-level statistics, the idea being to learn which statistics are valuable in predicting team wins.
Unfortunately, ascribing team wins to individual performances over entire games ignores some important information, namely that at any one time there are only 10 active players. Although this issue is addressed in Berri's paper by adjusting for the time spent by each player on court, there is still some loss of information, as only certain combinations of players meet each other during the game. Moreover, Berri's method, though ingenious, is somewhat intricate and difficult to justify at a methodological level: it is still not clear why the range of statistics considered should be preferred over another candidate set. The reliance on these statistics means that methods such as Berri's tend to be overly complicated; it is the opinion of the present authors that it does not matter how a net point difference is achieved, simply that it is achieved.

Thanks to modern game charting techniques and records, much more information is now available to researchers (Kubatko et al., 2007). Not only are player-level summary statistics available, but some organisations provide timelines of important events during each game. With this information, it is possible to infer which players were on court at any time in the game, thus allowing an alternative modelling perspective. A basketball game consists of a sequence of time intervals in which no substitutions occur. Each of these intervals can be thought of as a 'small game' in which the players on court remain constant, the outcome of the game as a whole being the sum of the outcomes of the small games. By comparing how the results of these small games depend on the players on the court, the relative ability of the different players can be measured.
For example, if player A is substituted by player B, then the relative ability of these two players is measured by looking at how well their team performs when player A is on the court as compared to player B.

Formally, the new model introduced in this article measures each player's offensive and defensive ability. The expected number of points per possession for a given team during a 'small game' is modelled as a linear function of the offensive ability of the 5 players of that team who are on the court and in possession and the defensive ability of the 5 players on the opposing team, adjusting for home advantage. This approach is similar to that of Rosenbaum (2004) and Ilardi and Barzilai (2008). The main difference compared with Rosenbaum (2004) is that Rosenbaum estimates only a combined ability for each player. The model presented here further uses a structured approach to combining information from multiple seasons. In the model of Ilardi and Barzilai (2008) (which has been used by Macdonald (2010) in the context of estimating the abilities of NHL players), the home advantage parameter only acts when the home players are in possession, whereas in the present article it is also modelled as an effect when they are defending. This article further extends the results of Ilardi and Barzilai (2008) by providing a combined measure of player ability and a method for analysing results from a single season informed by data from previous seasons. Furthermore, the utility of game statistics in predicting offensive and defensive ability is considered here.

A summary of the paper is as follows. In Section 2, the data collection process is reviewed and the new models for within- and between-year analyses are presented. In Section 3, the abilities of players at the end of the 2008–2009 season are estimated. In Section 4, the use of game statistics in inferring player ability is addressed. The article ends with a discussion.
The information required to fit the proposed model is available from the ESPN website in the form of individual game 'play–by–play' and 'box score' records (ESPN, 2010). For each regular season match, the box score pages provide summary statistics by player and the play–by–play pages give a detailed record of events over the course of the match. The play–by–play pages consist of a list of important events on court together with the time at which each event occurred. Using the available information on substitutions, together with the list of 'starters' (players on court at the start of the match) from the box score page, it is possible to infer exactly which 10 players are on court at any time during the match and, furthermore, exactly which team is in possession at any time. The latter requires the mild assumption that, between time records, possession remains with one team and that all changes of possession are recorded.

Any quarter or overtime period of a game may therefore be split into small intervals of time in which the players on court remain constant (this will be taken as the definition of the word 'interval' in this paper). The duration of these intervals, the number of possessions for each team and the number of points gained (or conceded) in that space of time can be inferred from the play–by–play data. In the course of one season there are of the order of 30000 such intervals, so some data error is to be expected.
In the analysis to follow, any interval in which it was not possible to infer the exact 10 players on court was excluded; this restriction meant that approximately 10% of all available data was excluded. Using the game summary information from the box score page, it was possible to cross-check inferences from the play–by–play pages for games with complete information (i.e. those games in which there were no apparent errors). These checks covered total time on court by player and total inferred game time, and the corroboration between the two sources was excellent.
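The interval construction described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: the `Event` record, its field names and the three event kinds (`"sub"`, `"possession"`, `"score"`) are a hypothetical, simplified stand-in for the ESPN play–by–play format, and changes of possession are assumed to be logged explicitly. Intervals whose lineups do not resolve to exactly 5 + 5 players are dropped, mirroring the exclusion rule above.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Event:
    """One row of a play-by-play log (hypothetical, simplified format)."""
    time: float            # seconds elapsed in the period
    kind: str              # "sub", "possession" or "score"
    team: str              # "H" (home) or "A" (away)
    player_in: str = ""    # only used for substitutions
    player_out: str = ""   # only used for substitutions
    points: int = 0        # only used for scoring events


@dataclass
class Interval:
    """A stretch of play with the same 10 players on court."""
    start: float
    end: float
    home: FrozenSet[str]
    away: FrozenSet[str]
    possessions: Dict[str, int] = field(default_factory=lambda: {"H": 0, "A": 0})
    points: Dict[str, int] = field(default_factory=lambda: {"H": 0, "A": 0})

    def is_complete(self) -> bool:
        # Keep only intervals where exactly 5 + 5 players are known.
        return len(self.home) == 5 and len(self.away) == 5


def split_into_intervals(events: List[Event], home_starters, away_starters,
                         period_end: float) -> List[Interval]:
    """Split one period into constant-lineup intervals (hypothetical helper)."""
    on_court = {"H": set(home_starters), "A": set(away_starters)}

    def open_interval(t: float) -> Interval:
        # Snapshot the current lineups so later substitutions do not alter them.
        return Interval(t, t, frozenset(on_court["H"]), frozenset(on_court["A"]))

    current = open_interval(0.0)
    intervals: List[Interval] = []
    for ev in sorted(events, key=lambda e: e.time):  # stable sort keeps log order on ties
        if ev.kind == "sub":
            current.end = ev.time
            # Simplification: zero-length intervals between simultaneous substitutions
            # are not kept.
            if current.end > current.start:
                intervals.append(current)
            on_court[ev.team].discard(ev.player_out)
            on_court[ev.team].add(ev.player_in)
            current = open_interval(ev.time)
        elif ev.kind == "possession":
            current.possessions[ev.team] += 1
        elif ev.kind == "score":
            current.points[ev.team] += ev.points
    current.end = period_end
    intervals.append(current)
    return [iv for iv in intervals if iv.is_complete()]
```

A substitution that names a player not currently on court leaves a lineup with six players, so the affected interval is discarded rather than silently mis-attributed.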
Suppose there are currently $N$ players in the NBA; let $\{\alpha_i\}_{i=1}^N$ and $\{\beta_i\}_{i=1}^N$ denote respectively the attacking and defensive ability of each player. For each game let $H$ be the home team and $A$ the away team. Suppose the game is split into intervals indexed by $k$, and that in interval $k$ the number of possessions for team $H$ is $n_{Hk}$ and for team $A$ is $n_{Ak}$, so that the total number of possessions in the game is $n = \sum_k (n_{Hk} + n_{Ak})$. If team $H$ (resp. $A$) comes into possession in interval $k$, let $y_{Hk}$ (resp. $y_{Ak}$) be the total number of points scored by team $H$ (resp. $A$) in the interval. This gives the model

$$100\, y_{Hk} = n_{Hk}\left(\sum_{i:\ \text{home player } i \text{ on court}} \alpha_i - \sum_{j:\ \text{away player } j \text{ on court}} \beta_j + \gamma + \frac{\sigma}{\sqrt{n_{Hk}}}\,\epsilon_{Hk}\right), \qquad (1)$$

$$100\, y_{Ak} = n_{Ak}\left(\sum_{i:\ \text{away player } i \text{ on court}} \alpha_i - \sum_{j:\ \text{home player } j \text{ on court}} \beta_j - \gamma + \frac{\sigma}{\sqrt{n_{Ak}}}\,\epsilon_{Ak}\right), \qquad (2)$$

where $\epsilon_{Hk}$ and $\epsilon_{Ak}$ are independent standard normal random variables, $\sigma$ is a positive scaling parameter for the noise and $\gamma$ is a constant representing the home advantage over approximately half of the course of a match (see below). A typical basketball game has of the order of 100 possessions, so the factor of 100 on the left-hand side of (1) and (2) scales the estimates of the $\alpha$s and $\beta$s so that they can be interpreted at the more meaningful game level; a similar idea was used by Rosenbaum (2004) and Ilardi and Barzilai (2008).

The interpretation of the parameters is that, for a set of 5 home players $\mathcal{K}$ and 5 different away players $\mathcal{L}$, where $\mathcal{K}, \mathcal{L} \subset \{1, \dots, N\}$, and assuming $n_{Hk} = n_{Ak}$ for all $k$, the quantity

$$\sum_{k \in \mathcal{K}} (\alpha_k + \beta_k) - \sum_{l \in \mathcal{L}} (\alpha_l + \beta_l) + 2\gamma$$

would be the expected score difference between the home and away players over a period of time approximately equal to that of a typical NBA game without overtime. A possible criticism of the above model is the use of a Gaussian error term, since the chosen outcome variable is always non–negative.
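Dividing (1) and (2) by the number of possessions turns each interval into at most two rows of a weighted linear regression, with response $100\,y/n$ and weight $n$ (because the noise variance after division is $\sigma^2/n$). The hypothetical helper below builds those rows for one interval; the parameter ordering $[\alpha_1,\dots,\alpha_N,\beta_1,\dots,\beta_N,\gamma]$ and the `index` mapping from player identifiers to positions are assumptions of this sketch, not part of the original method description.

```python
import numpy as np


def interval_rows(home, away, n_home, y_home, n_away, y_away, index):
    """Regression rows contributed by one interval under (1)-(2) (hypothetical helper).

    Dividing (1) by n_Hk gives
        100 * y_Hk / n_Hk = sum(alpha_home) - sum(beta_away) + gamma + (sigma / sqrt(n_Hk)) * eps,
    a linear model with noise variance sigma^2 / n_Hk, i.e. weight n_Hk.
    Equation (2) is analogous with the teams swapped and -gamma.

    `index` maps each player id to 0..N-1; columns are [alphas, betas, gamma].
    """
    N = len(index)
    rows, z, w = [], [], []
    for offence, defence, n, y, home_sign in (
        (home, away, n_home, y_home, 1.0),   # home team in possession: +gamma
        (away, home, n_away, y_away, -1.0),  # away team in possession: -gamma
    ):
        if n == 0:
            continue                         # no possessions for this team: no observation
        x = np.zeros(2 * N + 1)
        for player in offence:
            x[index[player]] = 1.0           # +alpha for the five attackers
        for player in defence:
            x[N + index[player]] = -1.0      # -beta for the five defenders
        x[2 * N] = home_sign
        rows.append(x)
        z.append(100.0 * y / n)              # points per 100 possessions
        w.append(float(n))                   # weight = number of possessions
    return np.array(rows).reshape(-1, 2 * N + 1), np.array(z), np.array(w)
```

Stacking these rows over all intervals of a season gives a design matrix for which the likelihood of (1)–(2) is that of a weighted least squares problem.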
The use of a Gaussian error term is nonetheless justifiable on the basis of a central limit theorem argument: a game consists of a series of around 100 alternating possessions, the outcome of the game being the difference in the sum of points scored by each team in their share of possessions (Harville, 2003). Although the parameters $\alpha_i$ and $\beta_j$ are not identified by this model (the likelihood of the observed data is unchanged if the same constant is added to each $\alpha_i$ and $\beta_j$), their relative differences are estimable, which allows ratings and standard errors to be constructed directly from the posterior density of the unknown parameters.

The parameters $\gamma$ and $\sigma$ were treated as fixed, which is justifiable on the basis that the data contain a lot of information on these quantities. The other parameters in the model were assumed to be drawn from Gaussian distributions, which represent the variability in offensive and defensive ability of NBA players,

$$\alpha_i \sim N(\mu_\alpha, \sigma_\alpha^2), \qquad \beta_i \sim N(\mu_\beta, \sigma_\beta^2); \qquad (3)$$

since the likelihood implied by (1)–(2) is also Gaussian in these parameters, the posterior will therefore be Gaussian too. The fixed parameters and prior hyperparameters were estimated by maximum likelihood (ML) and are given in Table 1.

Parameter    $\mu_\alpha$    $\sigma_\alpha$    $\mu_\beta$    $\sigma_\beta$    $\gamma$    $\sigma$
Estimate     9.82            2.55               -9.12          1.82              1.43        106.8
Table 1: Prior hyperparameter estimates.

The interpretation of these parameters is that $(5\mu_\alpha - 5\mu_\beta + \gamma)/100 \approx 0.96$ is the expected number of points scored by a team on its home court in one possession (see (1)); the corresponding standard deviation, $\sigma/100$, is approximately 1.07. Since the estimated $\sigma_\beta$ is smaller than $\sigma_\alpha$, players are, under the proposed model, more similar in terms of defensive ability than in terms of offensive ability. The ML estimate of $\gamma$ suggests a home–court advantage of around three points per game ($2\gamma \approx 2.9$), which is similar to the figure of 3.6 found by Entine and Small (2008) using data from two seasons.

A further extension is to consider the strength of players over a number of years. This is most simply achieved by using the model described in equations (1) and (2) for each year. For the first year the priors given by (3) are used; for subsequent years, a model is introduced to describe how a player's ability may change from year to year. Let $\alpha_i^{(t)}$ and $\alpha_i^{(t-1)}$ be respectively the offensive ability of player $i$ at the start of season $t$ and at the end of season $t-1$, and define similar terms for the defensive abilities, $\beta_i$. A simple way to allow for a change in ability between seasons is to assume

$$\alpha_i^{(t)} = p\,\alpha_i^{(t-1)} + (1-p)\,\mu_\alpha + s_\alpha \epsilon_i, \qquad \beta_i^{(t)} = p\,\beta_i^{(t-1)} + (1-p)\,\mu_\beta + s_\beta \epsilon_i, \qquad (4)$$

where $s_\alpha, s_\beta > 0$ and $\epsilon_i \overset{\text{iid}}{\sim} N(0, 1)$. If player $i$ is new to the league in season $t$, then he is assigned a normal prior, as per (3). This end–of–year transition has two effects. First, $p$ shrinks each parameter estimate towards the prior mean, $\mu_\alpha$ or $\mu_\beta$. Second, adding $N(0, s_\alpha^2)$ or $N(0, s_\beta^2)$ noise increases the uncertainty in each of the estimates whilst preserving some of the correlation structure learned to date (the off-diagonal correlations being shrunk towards zero). Increasing the uncertainty in the estimates at the end of the year is important because it allows each player's ability to vary over time: one might expect a player to (at least initially) improve with experience, but they may also have been injured during the previous season or changed teams at the end of it. Using data from the 2006/2007–2008/2009 seasons, the respective maximum likelihood estimates of $p$, $s_\alpha$ and $s_\beta$ were 0.83, 1.23 and 0.…

It is often of interest to estimate how well each player has performed in just the most recent season, one example being the decision as to who should receive the annual NBA awards. Whilst it is possible to obtain such estimates using multiple years' data and the methodology discussed thus far, the estimates of ability obtained from these models can be heavily influenced by performance in previous seasons (consider the example of the combined ability of Kevin Garnett in the results section). For rookie players, the methodology discussed does not present any problems, as there is no information from previous years on the ability of these players. For non–rookie players, on the other hand, data from earlier seasons are still useful because they enable better estimates of how good the other players are, but the manner in which this information should be included requires care. In this section, a method for handling information from previous seasons is presented; it may be used to estimate the abilities of non–rookie players in a particular season.

Suppose interest is in estimating how well player $i$ performed in the most recent season. The approach suggested here is to use data from the most recent season to estimate the parameters associated with player $i$, but data from all seasons to estimate the parameters associated with the other players. The simplest way of implementing this is to analyse all seasons' data, as in Section 2.3, but to change the prior distribution for $\alpha_i$ and $\beta_i$ for the most recent season so that it is independent of the data in previous seasons. If $t$ denotes the most recent season, this is achieved by imposing the priors $\alpha_i^{(t)} \sim N(\mu_\alpha, \sigma_\alpha^2)$ and $\beta_i^{(t)} \sim N(\mu_\beta, \sigma_\beta^2)$ rather than using (4). The posterior distribution for $\alpha_i^{(t)}$ and $\beta_i^{(t)}$ under this model is a measure of how well player $i$ performed in that season.
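Because the likelihood from (1)–(2) and the priors (3) are both Gaussian, the posterior can be computed in closed form, and the between-season step (4) is a linear map of that posterior. The sketch below, under stated assumptions, shows the three operations involved: a conjugate update given a design matrix `X`, responses `z` (points per 100 possessions) and weights `w` (possession counts) as implied by dividing (1)–(2) by the possession counts, with $\gamma$ and $\sigma$ held fixed; the transition (4); and the fresh-prior substitution that defines model (2.4). The function names are hypothetical, and the noise in (4) is taken to be independent across all entries, which the text does not state explicitly.

```python
import numpy as np


def gaussian_posterior(X, z, w, prior_mean, prior_cov, sigma):
    """Conjugate update for z = X @ theta + noise, noise_r ~ N(0, sigma^2 / w_r).

    theta stacks the abilities [alpha_1..alpha_N, beta_1..beta_N]. gamma and sigma
    are treated as known, so z is assumed to have gamma's contribution already
    subtracted (an assumption of this sketch).
    """
    prior_prec = np.linalg.inv(prior_cov)
    XtW = X.T * w                          # X^T W: column r of X^T scaled by w_r
    post_prec = prior_prec + XtW @ X / sigma**2
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + XtW @ z / sigma**2)
    return post_mean, post_cov


def season_transition(mean, cov, p, mu, s):
    """End-of-season step (4): theta_new = p * theta + (1 - p) * mu + s * eps.

    mu and s are vectors aligned with theta (mu_alpha, s_alpha for alpha entries;
    mu_beta, s_beta for beta entries). Noise independent across entries (assumption).
    """
    new_mean = p * mean + (1.0 - p) * mu
    new_cov = p**2 * cov + np.diag(s**2)   # inflates variances; correlations shrink
    return new_mean, new_cov


def reset_to_prior(mean, cov, idx, mu, sd):
    """Model (2.4) idea: give the entries in idx the fresh prior (3), so this
    player's current-season estimate is independent of earlier seasons."""
    mean, cov = mean.copy(), cov.copy()
    cov[idx, :] = 0.0
    cov[:, idx] = 0.0
    cov[idx, idx] = sd[idx] ** 2           # fancy indexing sets the diagonal entries
    mean[idx] = mu[idx]
    return mean, cov
```

A season-by-season fit would alternate `gaussian_posterior` and `season_transition`; for model (2.4), `reset_to_prior` is applied to player $i$'s two entries just before the final season's update.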
Compared with fitting model (1)–(2) to a single season's data, the approach of this section reduces the standard errors of the parameter estimates. The model in this section will henceforth be referred to as model (2.4).

Results
In this section the 2009 season results are presented. Tables 2, 3 and 4 give respectively offensive, defensive and combined player ratings, with standard errors in parentheses, and rankings for the players who are in the top ten under at least one of the three scenarios described. The ratings and rankings in the second column of these tables correspond to model (1)–(2) using the 2007–2009 data; the third column gives results from the same model, but using only the 2009 data; and the fourth column gives the results from model (2.4). The parameter estimates in these tables are centred.

Under model (1)–(2), the 2007–2009 data give more precise estimates of player ability than the 2009 data alone (compare the standard errors in the second and third columns); however, it may not always be appropriate to use information from previous years. The estimates in the third column are therefore the best estimates of player abilities using only 2009 season information, and those in the fourth are the overall best estimates for this season using all available information.

For the 2009 season only (columns three and four), there was much similarity between the results from model (1)–(2) and model (2.4); both models identify the same top three players in the same order for the combined ratings. One of the interesting points from these tables is that, using the 2007–2009 data, Kevin Garnett (the most highly paid player in 2009) is identified as the number 3 player, but when his ability is estimated from the 2009 data only, he ranks 13th (model (1)–(2)) or 12th (model (2.4)). The likely cause is that, during a game against the Utah Jazz, Garnett strained his right knee: he was forced to miss the next 14 games and played in four further games before missing the final 25 games due to a right knee sprain (Associated Press, 2009; Spears, 2009). Thus considering data from one season in isolation does not give a complete picture of a player's ability.

In the Bayesian framework advocated in this article, it is of interest to compute the posterior probability that one player is stronger than another.
For players A and B, the marginal joint posterior density of their respective combined abilities is multivariate Gaussian. It is therefore straightforward to compute the posterior probability that the combined ability of player A is larger than that of player B. These probabilities were computed for a subset of players using the 2007–2009 data; the results are in Table 5. It is also straightforward to compute the posterior probability that a particular player is the best; this is most easily achieved by a Monte Carlo estimate, simulating directly from the Gaussian posterior. With 1000 draws from this density based on the 2007–2009 data, LeBron James had the highest rating on 466 occasions, followed by Dwyane Wade (77) and Kevin Garnett (51); that is, the respective posterior probabilities that these players were the best at the end of the 2009 season were 0.47, 0.08 and 0.05. Using only the 2009 data, the posterior probability that the top player was LeBron James was 0.21; Lamar Odom, 0.11; and Dwyane Wade, 0.06.

                     2007–2009 data     2009 data          2007–2009 data
Player Name          Model (1)–(2)      Model (1)–(2)      Model (2.4)
Steve Nash           7.62 (1.53), 1     5.05 (1.77), 2     4.79 (1.68), 5
LeBron James         7.08 (1.5), 2      4.96 (1.78), 3     6.47 (1.66), 1
Chris Paul           6.87 (1.7), 3      4.51 (1.87), 5     5.02 (1.75), 4
Dwyane Wade          6.69 (1.56), 4     5.28 (1.78), 1     5.59 (1.73), 2
Kobe Bryant          5.91 (1.55), 5     3.51 (1.8), 11     4.15 (1.66), 8
Carmelo Anthony      5.33 (1.52), 6     3.95 (1.75), 7     4.09 (1.72), 9
Dirk Nowitzki        5.1 (1.52), 7      2.73 (1.77), 31    3.53 (1.64), 12
Pau Gasol            4.89 (1.36), 8     4.28 (1.72), 6     4.15 (1.56), 7
Kevin Martin         4.67 (1.42), 9     3.61 (1.66), 10    3.87 (1.63), 10
Michael Redd         4.34 (1.68), 10    3.16 (1.91), 17    3.23 (1.88), 19
Danny Granger        4.14 (1.4), 14     4.58 (1.64), 4     5.05 (1.59), 3
Brandon Roy          4.08 (1.51), 15    3.91 (1.73), 8     4.18 (1.67), 6
Lamar Odom           2.9 (1.39), 33     3.72 (1.62), 9     3.49 (1.54), 13
Table 2: Centred offensive ratings showing mean (standard error), rank under model.

                     2007–2009 data     2009 data          2007–2009 data
Player Name          Model (1)–(2)      Model (1)–(2)      Model (2.4)
Kevin Garnett        4.07 (1.31), 1     2.47 (1.52), 9     2.51 (1.47), 8
Bruce Bowen          4.04 (1.28), 2     2.62 (1.45), 3     2.26 (1.43), 16
Kurt Thomas          3.47 (1.25), 3     2.29 (1.49), 13    1.92 (1.48), 28
Lamar Odom           3.38 (1.21), 4     3.41 (1.39), 1     3.2 (1.34), 1
Chuck Hayes          3.15 (1.37), 5     0.93 (1.58), 78    0.69 (1.57), 127
LeBron James         3.09 (1.29), 6     3.18 (1.48), 2     2.78 (1.41), 4
Nene Hilario         2.99 (1.38), 7     1.89 (1.52), 27    2.51 (1.45), 7
Amir Johnson         2.92 (1.58), 8     2.49 (1.6), 8      2.26 (1.59), 15
Anderson Varejao     2.75 (1.27), 9     1.15 (1.44), 58    1.27 (1.39), 59
Ron Artest           2.64 (1.18), 10    2.27 (1.41), 14    1.91 (1.36), 29
Ime Udoka            2.29 (1.29), 16    2.5 (1.53), 7      2.24 (1.52), 18
Andrew Bogut         2.12 (1.47), 22    2.44 (1.63), 10    2.84 (1.62), 2
Jeff Foster          2.09 (1.25), 24    2.23 (1.44), 16    2.48 (1.42), 9
Kirk Hinrich         2.08 (1.26), 25    2.58 (1.45), 5     2.23 (1.43), 19
Rashard Lewis        1.81 (1.2), 36     1.99 (1.41), 23    2.53 (1.34), 6
Marko Jaric          1.45 (1.36), 54    2.62 (1.66), 4     2.81 (1.66), 3
Ronald Murray        1.25 (1.22), 68    1.95 (1.38), 26    2.42 (1.35), 10
Quinton Ross         1.22 (1.31), 74    2.25 (1.52), 15    2.54 (1.51), 5
Jarvis Hayes         1.17 (1.25), 78    2.56 (1.45), 6     2.18 (1.43), 21
Table 3: Centred defensive ratings showing mean (standard error), rank under model.

                     2007–2009 data     2009 data          2007–2009 data
Player Name          Model (1)–(2)      Model (1)–(2)      Model (2.4)
LeBron James         10.17 (1.98), 1    8.14 (2.32), 1     9.25 (2.18), 1
Dwyane Wade          7.6 (2.05), 2      6.41 (2.31), 3     6.61 (2.25), 3
Kevin Garnett        7.18 (2.02), 3     4.19 (2.36), 13    4.16 (2.28), 12
Chris Paul           6.7 (2.23), 4      5.46 (2.42), 4     5.41 (2.28), 5
Steve Nash           6.46 (2.03), 5     4.48 (2.31), 9     4.06 (2.21), 14
Kobe Bryant          6.45 (2.05), 6     4.31 (2.34), 11    5.02 (2.16), 6
Lamar Odom           6.28 (1.85), 7     7.14 (2.13), 2     6.69 (2.04), 2
Tim Duncan           5.72 (2.08), 8     2.62 (2.3), 42     2.7 (2.2), 41
Dirk Nowitzki        5.57 (2.01), 9     3.19 (2.31), 28    3.45 (2.15), 25
Rashard Lewis        5.41 (1.85), 10    5.27 (2.18), 5     5.57 (2.06), 4
LaMarcus Aldridge    4.78 (2.02), 11    3.39 (2.28), 25    4.32 (2.19), 10
Yao Ming             4.07 (1.98), 21    4.4 (2.2), 10      3.64 (2.12), 22
Matt Bonner          3.8 (2.04), 25     4.85 (2.24), 6     4.48 (2.18), 8
Ray Allen            3.71 (1.94), 27    4.64 (2.3), 8      3.65 (2.13), 21
Danny Granger        3.31 (1.87), 36    4.16 (2.16), 14    4.88 (2.11), 7
Nene Hilario         3.14 (2.14), 40    4.78 (2.39), 7     3.97 (2.26), 16
Brandon Roy          3.07 (2), 42       4.15 (2.27), 15    4.36 (2.19), 9
Table 4: Centred combined ratings showing mean (standard error), rank under model.

                 LeBron  Dwyane  Kevin    Chris  Steve  Kobe    Lamar  Tim     Dirk
                 James   Wade    Garnett  Paul   Nash   Bryant  Odom   Duncan  Nowitzki
Dwyane Wade      0.82    ·       ·        ·      ·      ·       ·      ·       ·
Kevin Garnett    0.86    0.56    ·        ·      ·      ·       ·      ·       ·
Chris Paul       0.88    0.62    0.56     ·      ·      ·       ·      ·       ·
Steve Nash       0.91    0.66    0.6      0.53   ·      ·       ·      ·       ·
Kobe Bryant      0.91    0.66    0.6      0.53   0.5    ·       ·      ·       ·
Lamar Odom       0.92    0.68    0.63     0.56   0.53   0.52    ·      ·       ·
Tim Duncan       0.94    0.74    0.69     0.63   0.6    0.6     0.58   ·       ·
Dirk Nowitzki    0.95    0.76    0.71     0.65   0.62   0.62    0.6    0.52    ·
Rashard Lewis    0.96    0.79    0.74     0.67   0.65   0.65    0.63   0.55    0.52
Table 5: Posterior probability that the player at the top of the table is stronger than the player at the side of the table. Computed using the 2007–2009 data.
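Both posterior summaries used in this section follow directly from a Gaussian posterior for the combined abilities $\alpha_i + \beta_i$. The sketch below assumes that the posterior mean vector and covariance matrix of the combined abilities are already available; the helper names `prob_stronger` and `prob_best` are hypothetical. The pairwise probability uses the closed-form normal CDF of the difference, while the probability of being the best is estimated by Monte Carlo, as in the text.

```python
import math

import numpy as np


def prob_stronger(mean, cov, a, b):
    """P(theta_a > theta_b) when (theta_a, theta_b) is jointly Gaussian."""
    diff_mean = mean[a] - mean[b]
    diff_sd = math.sqrt(cov[a, a] + cov[b, b] - 2.0 * cov[a, b])
    # Standard normal CDF at diff_mean / diff_sd, written via the error function.
    return 0.5 * (1.0 + math.erf(diff_mean / (diff_sd * math.sqrt(2.0))))


def prob_best(mean, cov, n_draws=1000, seed=0):
    """Monte Carlo estimate of P(player k has the highest combined ability)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    winners = draws.argmax(axis=1)            # index of the top player in each draw
    return np.bincount(winners, minlength=len(mean)) / n_draws
```

As a rough check, using the 2007–2009 means and standard errors of LeBron James and Dwyane Wade from Table 4 and assuming zero posterior correlation (only marginal standard errors are reported), `prob_stronger` gives about 0.82, in line with Table 5.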
Player Statistics
As mentioned in the introduction, one common, but ad hoc, means of rating NBA players is by using their individual game statistics. In this section it will be considered whether the 2009 game statistics can predict player ability and, if so, which of the game statistics are important.

Consider estimating the offensive ability of player $i$, $\alpha_i$. Given covariate information (the game statistics) for this player, $X_{ij}$, a simple linear model would take the form

$$\alpha_i = \sum_j a_j X_{ij} + \sigma_a \epsilon_i, \qquad (5)$$

where $\epsilon_i \overset{\text{iid}}{\sim} N(0, 1)$ and $\sigma_a > 0$. In practice, $\alpha_i$ is unknown; however, it can be estimated as described earlier, using model (2.4) for example.

The approach advocated here is therefore to replace $\alpha_i$ in (5) by the estimate $\hat\alpha_i$, and to allow the variance of each observation to vary to account for the relative precision of the estimate $\hat\alpha_i$. This leads to a linear model of the form

$$\hat\alpha_i = \sum_j a_j X_{ij} + \mathrm{S.E.}(\hat\alpha_i)\, \nu_i, \qquad (6)$$

where $\nu_i \overset{\text{iid}}{\sim} N(0, 1)$ and $X_{ij}$ is the value of the explanatory variable $X_j$ in Table 6 for player $i$. The quantities $\hat\alpha_i$ and $\mathrm{S.E.}(\hat\alpha_i)$ can be read directly from the estimated vector of means and from the square roots of the leading diagonal of the estimated covariance matrix. A similar model was also set up for $\hat\beta_i$ as a linear combination of the covariates, but with coefficients $b_j$.

Under model (6), the variance of $\hat\alpha_i$ is $\mathrm{S.E.}(\hat\alpha_i)^2$. Since this differs between players, the parameters $a_j$ are estimated by weighted least squares. Covariates were chosen via backwards selection, starting from the saturated model and excluding independent variables clearly not linked to the dependent variable (for example, mean points scored was excluded from the defensive model). An intercept term was included in the model, though its coefficient is not reported, as it does not affect rankings derived from the fitted values.

Table 6 presents the results from these two linear models as well as listing the explanatory variables.

                          Standardised                                   Not standardised
Statistic                 Off. (95% C.I.)           Def. (95% C.I.)      Off.     Def.
field goal %              0.11 (0, 0.22)            ·                    …        ·
free throw %              0.12 (0.02, 0.22)         ·                    …        ·
turnovers/40 mins         -0.22 (-0.32, -0.12)      0.12 (0.01, 0.22)    -0.45    0.05
total rebounds/40 mins    0.14 (0.03, 0.25)         ·                    …        ·
assists/40 mins           0.35 (0.25, 0.46)         ·                    …        ·
points scored/40 mins     0.49 (0.4, 0.59)          ·                    …        ·
steals/40 mins            ·                         ·                    …        …
…

… $\alpha$s) also have much better game statistics than the rest; consider the points in the top right corner of the plot. Of the ten game statistics considered, only steals, blocks and rebounds are potentially connected with defensive ability. There is no recorded information on, for example, whether a fast–breaking team is contained by the forwards or the guards.
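Because the observation variances in (6) are known, namely $\mathrm{S.E.}(\hat\alpha_i)^2$, the weighted least squares fit has a closed form with weights $1/\mathrm{S.E.}(\hat\alpha_i)^2$. The sketch below fits (6) with an intercept and returns the coefficients, their standard errors and the fitted values $E[\hat\alpha_i]$. The function name is hypothetical, standardising the covariates to mean 0 and standard deviation 1 is an assumption about how the 'standardised' column of Table 6 was produced, and the backwards selection step is not shown.

```python
import numpy as np


def fit_weighted_ls(X, est, se, standardise=True):
    """Weighted least squares fit of (6): est_i = a_0 + sum_j a_j X_ij + se_i * nu_i.

    X: (players x statistics) covariates; est, se: estimated abilities and their
    standard errors (e.g. from model (2.4)). Returns the coefficients (intercept
    first), their standard errors and the fitted values.
    """
    X = np.asarray(X, dtype=float)
    est = np.asarray(est, dtype=float)
    if standardise:
        # Assumption: "standardised" means covariates scaled to mean 0, sd 1.
        X = (X - X.mean(axis=0)) / X.std(axis=0)
    design = np.column_stack([np.ones(len(est)), X])
    w = 1.0 / np.asarray(se, dtype=float) ** 2         # precision weights
    sw = np.sqrt(w)
    # Rescaling each row by sqrt(w) turns WLS into ordinary least squares.
    coef, *_ = np.linalg.lstsq(design * sw[:, None], est * sw, rcond=None)
    # Observation variances are known under (6), so Cov(coef) = (X^T W X)^{-1}.
    cov = np.linalg.inv(design.T @ (design * w[:, None]))
    fitted = design @ coef
    return coef, np.sqrt(np.diag(cov)), fitted
```

An approximate 95% interval for each coefficient is then `coef ± 1.96 * se`; the same function applies unchanged to $\hat\beta_i$ and its standard errors.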
Since so few of the recorded statistics relate to defensive play, the poor fit of the defensive model may stem in part from a lack of relevant covariate information.

Assuming that a linear model is appropriate for these data, the results in Table 6 give the coefficients, $a_j$ and $b_j$, that can be used to predict player ability. For example, in the case of the offensive model, the predicted strength of player $i$ is given by the expected value of $\hat\alpha_i$ under the model,

$$E[\hat\alpha_i] = \sum_j a_j X_{ij}.$$

[Figure 1: scatter plots of $\hat\alpha$ against fitted values (offensive model) and $\hat\beta$ against fitted values (defensive model).]
Figure 1: Plots of estimated ability against fitted values from the regression model (6). The markers are scaled according to the variance from (1)–(2), with larger markers having less uncertainty from the original fit. The respective $R^2$ values for the offensive/defensive fits were 0.41 and 0.…

The three main types of position on court have different functions in their offensive and defensive modes. For example, one aspect of the defensive ability of a guard is expertise in manoeuvres that prevent effective progress of the offense up the court: guards must be able to contend with players dribbling the ball. The functions of the guards and forwards on the offense also differ: the former are more concerned with governing the general form of an attack and obtaining field goals, and the latter with shooting from the side of the court and obtaining rebounds (Ambler, 1979).

The linear model (6) was fitted to subsets of the players (all guards and all forwards separately) with the aim of finding any differences in covariate choice for predicting offensive ability. Using backward selection as above, the standardised covariate effects, with standard errors in parentheses, are as follows. For the forwards, the most important covariate was points scored, 0.47 (0.08); then assists, 0.44 (0.08); turnovers, -0.29 (0.09); and lastly rebounds, 0.17 (0.08). For the guards, the significant effects were points scored, 0.42 (0.07); personal fouls, -0.23 (0.07); assists, 0.20 (0.06); and lastly field goal percentage, 0.19 (0.07). The $R^2$ values were respectively 0.40 and 0.47 for the forwards and guards. It is interesting to note that the different functions of the forwards and guards are reflected in the choice of covariates in these models: for the forwards, the ability to rebound is identified as important, whereas for the guards, shooting accuracy is relevant.

This article introduces a new method for estimating both the offensive and defensive ability of NBA players and a justifiable way of merging this information into an overall estimate of player utility. To the knowledge of the authors, the model presented here is unique in providing a structured means of updating player abilities between years. One of the most important findings is that whilst using player game statistics and a simple linear model to infer offensive ability may be adequate, the very poor fit of the defensive ratings model suggests that defensive ability depends on some trait not measured by the current range of player game statistics.

At the end of each season, the NBA presents players with awards including 'Most Valuable Player' (MVP), 'Defensive Player of the Year', 'Rookie of the Year' and 'Most Improved Player' (MIP); though there is no material prize for the winning player, the awards are highly prestigious. The NBA employs a panel of sportswriters and broadcasters to rank their top five (in the case of the MVP award) or top three players. The individual rankings are combined by weighting each according to the voted position, e.g. 10 points for a first-place vote and 7, 5, 3 and 1 points for second to fifth place respectively (NBA, 2010), and summing these over voters. Although the sportswriters and broadcasters are undoubtedly experts in their field, the rhetoric accompanying the announcement of the awards repeatedly refers to the much-publicised player game statistics (NBA, 2009).
It is of interest to compare the results presented here with those of the NBA committee.

The 2009 MVP according to model (2.4) was LeBron James (Cleveland Cavaliers), since he has the highest estimated combined ability in Table 4; this is in agreement with the NBA decision. The best defensive player in 2009 under model (2.4) was Lamar Odom (Los Angeles Lakers), whereas the NBA presented the award to Dwight Howard, who was ranked 34th by the new method. In order to rank rookie players, the approach advocated here is to use the combined ratings from the 2007–2009 data, since in this setting the ability of non–rookie players is more accurately estimated by including historical data. The best rookie here is Russell Westbrook of the Oklahoma City Thunder (the award was given to Derrick Rose of the Chicago Bulls, ranked 27th by the new model). Computing the most improved player is a simple case of comparing the combined estimates of ability obtained from successive historical rankings, for example by using all available data up to 2008 and then all available data up to 2009: excluding rookies, the player with the largest difference between the combined estimates thus obtained is the most improved player. Using the new method, the 2009 MIP was Hakim Warrick of the Memphis Grizzlies (the NBA presented this award to Danny Granger of the Indiana Pacers, who was ranked 6th by the method described here).
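The points-based ballot tally and the most-improved-player comparison described above are both simple to express. In the sketch below, `tally_votes` and `most_improved` are hypothetical helpers; the inputs to `most_improved` are assumed to be dictionaries of combined ability estimates from fits using all data up to the previous season and up to the current season respectively, so that rookies, who appear only in the second, are excluded automatically.

```python
from collections import Counter

# Points per ballot position for the MVP vote (1st to 5th), as described above.
MVP_POINTS = (10, 7, 5, 3, 1)


def tally_votes(ballots, points=MVP_POINTS):
    """Sum position-weighted points over voters; each ballot lists players best first."""
    totals = Counter()
    for ballot in ballots:
        for player, pts in zip(ballot, points):
            totals[player] += pts
    return totals


def most_improved(combined_prev, combined_curr):
    """Player with the largest increase in combined ability (alpha + beta).

    Both arguments map player -> combined estimate; rookies appear only in
    combined_curr and are therefore excluded by the key intersection.
    """
    veterans = combined_prev.keys() & combined_curr.keys()
    return max(veterans, key=lambda p: combined_curr[p] - combined_prev[p])
```

Taking differences of posterior means ignores their uncertainty; a fuller comparison could use the posterior distribution of each difference, in the same way as Table 5.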
References
Ambler, V. (1979). Basketball: The Basics for Coach and Player. Faber and Faber.
Associated Press (2009). http://espn.go.com/nba/recap?gameId=290219026.
Berri, D. J. (1999). Who is 'most valuable'? Measuring the player's production of wins in the National Basketball Association. Managerial and Decision Economics 20(8), 411–427.
Entine, O. A. and D. S. Small (2008). The role of rest in the NBA home-court advantage. Journal of Quantitative Analysis in Sports 4(2).
ESPN (2010). http://espn.go.com/nba/.
Harville, D. A. (2003, March). The selection and seeding of college basketball or football teams for postseason competition. Journal of the American Statistical Association 98(461), 17–27.
Ilardi, S. and A. Barzilai (2008).
Kubatko, J., D. Oliver, K. Pelton, and D. T. Rosenbaum (2007). A starting point for analyzing basketball statistics. Journal of Quantitative Analysis in Sports 3(3), 1.
Macdonald, B. (2010). A regression-based plus-minus statistic for NHL players. http://arxiv.org/abs/1006.4310.
NBA (2009).
NBA (2010).
Rosenbaum, D. T. (2004). Measuring how NBA players help their teams win.
Spears, M. (2009).
Zak, T. A., C. J. Huang, and J. J. Siegfried (1979). Production efficiency: The case of professional basketball.