Hockey Player Performance via Regularized Logistic Regression
Robert B. Gramacy, Matt Taddy, and Sen Tian
A hockey player's plus-minus measures the difference between goals scored by and against that player's team while the player was on the ice. This measures only a marginal effect, failing to account for the influence of the others he is playing with and against. A better approach would be to jointly model the effects of all players, and any other confounding information, in order to infer a partial effect for this individual: his influence on the box score regardless of who else is on the ice.

This chapter describes and illustrates a simple algorithm for recovering such partial effects. There are two main ingredients. First, we provide a logistic regression model that can predict which team has scored a given goal as a function of who was on the ice, what teams were playing, and details of the game situation (e.g. full-strength or power-play). Since the resulting model is so high dimensional that standard maximum likelihood estimation techniques fail, our second ingredient is a scheme for regularized estimation. This adds a penalty to the objective that favors parsimonious models and stabilizes estimation. Such techniques have proven useful in fields from genetics to finance over the past two decades, and have demonstrated an impressive ability to gracefully handle large and highly imbalanced data sets. The latest software packages accompanying this new methodology – which exploit parallel computing environments, sparse matrices, and other features of modern data structures – are widely available and make it straightforward for interested analysts to explore their own models of player contribution.

This framework allows us to quickly obtain high-quality estimates for the full system of competing player contributions. After introducing the measurement problem in Section 1, we detail our regression model in Section 2 and the regularization scheme in Section 3. The remainder of the chapter analyzes more than a decade of data from the NHL. We fit and interpret our main model, based on prediction of goal scoring, in Section 4. This is compared to shot-based analysis, and metrics analogous to Corsi or Fenwick scores, in Section 5. Finally, Section 6 considers the relationship between our estimated performance scores and player salaries. Overall, we are able to estimate a partial plus-minus metric that is on the same scale as plus-minus but controls for other players and confounding variables. This metric is shown to be more highly correlated with salary than the standard (marginal) plus-minus. Moreover, we find that the goals-based metric is more correlated with salary than those based upon shots and blocked shots. We conclude in Section 7 with thoughts on further extensions, in particular by breaking out of the linear framework to use classification models popular in the Machine Learning literature.

The code for all empirical work in this chapter is provided to the public via a GitHub repository (https://github.com/TaddyLab/hockey) and utilizes open source libraries for R [25], particularly the gamlr [28] package from [30].

1 Introduction

Hockey is played on ice, but that's not all that sets it apart from seemingly related sports like soccer, basketball, or even field hockey. At least not from an analytics perspective. The unique thing about hockey is the rapid substitutions transpiring continuously during play, as well as at stoppages in play.
In the data sets we have compiled, which we discuss in more detail shortly, the median amount of time observed for a particular on-ice player configuration (determined by unique players on the ice for both teams) is a mere eight seconds. Although many "shifts" are much longer than that, a trickle of piecemeal substitutions on both sides, transpiring as play develops, makes it difficult to attribute credit or blame to players for significant events, such as goals or shots.

Plus-minus (PM) is a traditional metric for evaluating player contributions in hockey. It is calculated as the difference, for a given player, between the number of goals scored by the player's team and the number scored against it while that player was on the ice. For example, during the 2012-2013 season Stanley Cup Finals, between Boston and Chicago, Duncan Keith of the Chicago Blackhawks was on the ice for 8 goals by Chicago and 4 by Boston, giving him a +4 PM for the series.

The PM score represents what statisticians call a marginal effect: the average change in some response (goals for-vs-against) with change in some covariate (a player being on the ice) without accounting for whatever else changes at the same time. It is an aggregate measure that averages over the contributions of other factors, such as teammates and opponents. For example, suppose that the three authors of this chapter are added to the Blackhawks roster and that Joel Quenneville (the coach of the Blackhawks) makes sure that Duncan Keith is with us on the ice whenever we are playing. Since none of us are anything close to as good at hockey as Keith is, and surely our poor play would allow the other team to score, this will cause Duncan Keith's PM to drop. At the same time, our PMs will be much higher than they would be if we didn't get to play next to Duncan Keith.

Due to its simplicity and minimal data requirements, plus-minus has been a preferred metric for the last fifty-odd years. But since it measures a marginal effect, the plus-minus is impacted by many factors beyond player ability, which is the actual quantity of interest. The ability of a player's teammates, or the quality of opponents, is not taken into account. The amount of playing time is also not factored in, meaning plus-minus stats are much noisier for some players than others. Finally, goalies, teams, coaches, salaries, situations, special teams, etc. – which all clearly contribute to the nature of play, and thus to goals – are neither accounted for when determining player ability by plus-minus, nor are they used to explain variation in the goals scored against a particular player or team.

Instead of marginal effects, statisticians are more often interested in partial effects: change in the expected response that can be accounted for by change in your variable of interest after removing the change due to other influential variables.
In the example above, a partial effect for Duncan Keith would be unchanged whether he plays with the authors of this article or with the current members of the Blackhawks. In each case, the partial effect will attempt to measure how Duncan Keith can influence the box score regardless of with whom he skates. Because such partial effects help us predict how Keith would perform on a different team or with a different combination of line-mates, this information is more useful than knowing a marginal effect.

One way that statisticians can isolate partial effects is by running experiments. Suppose that now, instead of playing for the Blackhawks, we are coaching them. In order to figure out the value of Keith, we could randomly select different players to join him whenever he is on the ice and send completely random sets of players onto the ice whenever he is not playing. Then, due to the setup of this randomized experiment, Keith's resulting PM score will represent a partial effect – his influence regardless of who he plays with. Of course, no real hockey coach would ever manage their team in this way. Instead, we hockey analysts must make sense of observational data that is collected as the games are naturally played, with consistent line-mates and offensive-defensive pairings, and where Duncan Keith tends to play both with and against the best players available.

Partial effects are measured from observational data through regression: you model the response (e.g., goals) as a function of many influential variables (covariates; e.g., all of the players on the ice). With rich enough data, we can simultaneously estimate the full set of competing partial effects corresponding to all of our influential variables. This is straightforward when there are only a small number of covariates. However, standard regression algorithms will fail when the number of covariates is large. This 'high dimensional regression' setting occurs in hockey analysis, where we would like to regress 'goals' onto the set of variables corresponding to whether each NHL player is on the ice (a set of 2500 players in our dataset) while also including effects of team, season, playoffs, and special teams scenarios (e.g. power plays). Moreover, the covariate design is highly imbalanced: over the span of several seasons there may be tens of thousands of goals, but players play with and against only a small fraction of other players and the number of unique player configurations is relatively small. Due to the use of player lines, and consistent line match-ups with opponents, where groups of two or three players are consistently on the ice together at the same time, the data contain many clusters of individuals who are seldom observed apart. Standard regression algorithms, such as maximum likelihood inference via Fisher scoring, will either massively over-fit (e.g. assign large effects to players who rarely play) or simply fail to converge.

However, there has been a tremendous improvement over the past two decades in the techniques available for high dimensional regression analysis. These advancements are driven by the demands of researchers in genetics and finance, for example, for whom resolving partial effects amongst large sets of variables is the key to their science. The most successful approaches introduce some amount of regularization to the estimation problem – an additional penalty term that rewards simplicity (e.g., [13]).
In our context, regularization shrinks towards a model where individual players don't make a huge difference while still allowing for large estimated player effects when the data warrant it. This conforms to what most analysts already believe: many players have a neutral, or "zero", effect (relative to the NHL average), whereas some are stars and others are liabilities. The amount of regularization is chosen to make the model perform as well as possible in out-of-sample prediction and, again, contemporary statistical learning tools are designed to do exactly this – reliably predict the future. To take advantage of these tools, we need only to phrase partial player effect estimation as a regression problem.
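To make the marginal-versus-partial distinction concrete before we formalize it, here is a toy simulation in R. Everything in it is made up for illustration – the player names, the pairing probabilities, and the effect sizes – but it captures the story above: a strong player who is almost always paired with a weak teammate has a negative marginal plus-minus, while a joint logistic regression of the kind developed below recovers the correct partial effects.

## toy data: keith (strong) is nearly always shadowed by a weak teammate
set.seed(1)
n <- 10000
keith  <- rbinom(n, 1, 0.5)                           # keith on ice for goal i?
author <- rbinom(n, 1, ifelse(keith == 1, 0.9, 0.05)) # weak player tags along
eta <- 1.5 * keith - 2 * author                       # true partial effects
y   <- rbinom(n, 1, 1 / (1 + exp(-eta)))              # 1 = goal for keith's team

## marginal view: keith's plus-minus is negative despite his +1.5 effect
sum(y[keith == 1]) - sum(1 - y[keith == 1])

## partial view: a joint logistic regression recovers both effects
coef(glm(y ~ keith + author, family = "binomial"))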
2 Regression Model

The goal of our regression analysis will be to estimate a model that relates individual presence on the ice to observable outcomes of interest. We describe the model here for a goals-based analysis, but extend it to shots and other metrics in the analysis sections.

Previous attempts at partial player effect estimation range from standard linear regression (usually on aggregate data) – the adjusted plus-minus scores of [1], [24], and [19] – to the complex hazard model of [32], which proposes a proportional hazards process for game events, allowing partial player effects to be backed out from high resolution game data. Adjusted plus-minus is built from similar ideas for basketball analysis (see [23] and [18]). Its linear model analysis implies an underlying normality assumption for the error structure; this may be a good approximation for basketball, where scoring is frequent and variability in player configurations is small, but it is inappropriate for disaggregated data with a binary response (e.g., whether an individual goal is for-vs-against the home team). Such misspecification becomes especially problematic when combined with the modern regularization techniques necessary for reliable estimation of high dimensional models. On the other hand, more complex stochastic process modeling requires many additional assumptions on the data generating process and can be difficult to validate in practice; moreover, models such as that of [32] take far longer to run than we wish for our analysis. Some other important contributions to estimating player ability and attributing that to team success include [27, 22, 20].

The goal of our modeling is to provide a correct treatment of the binary 'goals' data without introducing significant additional modeling complexity. In particular, we advocate the simple logistic regression framework suggested by [12]. In logistic regression, the average log odds of a goal being scored "for" a particular team is modeled as a linear function of predictor variables – here, indicators for the player configuration and other quantities – in a setup that is otherwise identical to the familiar ordinary (least squares) regression. We provide a detailed description of the model here, but refer the reader to [26] or similar texts covering
Generalized Linear Models (GLMs), of which logistic regression is a special case. The setup is rather straightforward, easy to extend, and highly interpretable. Estimated coefficients describe contributions to the log odds of goals, and we show that these can be converted back onto the scale of goals, resulting in an adjusted plus-minus statistic, but this time one which is a true partial effect.

Given n goals throughout the National Hockey League (NHL) over some specified time period, say y_i is +1 for a goal by the home team and -1 for a goal by the away team. Say that q_i = p(y_i = 1) = p(home team scored goal i). [Footnote: 'Home' and 'away' are merely organizational devices, creating a consistent binary bifurcation for goals that can be applied across games, seasons, etc. Due to the symmetry in the logit transformation, player effects are unchanged when framing away-team probabilities as q_i rather than 1 - q_i, so we lose no generality by privileging home team goals in this way.] The logistic regression model
of player contribution is, for goal i in season s with away team a and home team h,

log[q_i / (1 - q_i)] = α + u_i'γ + v_i'φ + x_i'β + (x_i ∘ s_i)'(β_s + p_i β_p),   (1)

where

- Vector u_i holds indicators for each team-season (e.g., the Blackhawks in 2012-2013 would correspond to a coordinate of u_i): u_it = +1 if team-season t was the home team for goal i, u_it = -1 if it was the away team, and u_it = 0 if team-season t was not on the ice for goal i. This information is included to control for factors beyond the players' control, such as quality of coaching and fan support.

- Vector v_i holds indicators for various special-teams scenarios (e.g., being short-handed on a penalty kill): v_ik = +1 if the home team is in special-teams scenario k when goal i was scored, v_ik = -1 if the away team is in scenario k, and v_ik = 0 if neither team was in scenario k when goal i was scored. We consider 6 non-six-on-six settings (6v5, 6v4, 6v3, 5v4, 5v3, 4v3) and an additional 'pulled goalie' indicator; note that more than 35% of the goals occur in some type of special teams scenario.

- Vector x_i contains player-presence indicators: x_ij = +1 if player j was on the home team and on the ice for goal i, x_ij = -1 for away team players on the ice for goal i, and x_ij = 0 for everyone not on the ice. With ∘ denoting the Hadamard (element-wise) product, this player vector is also interacted with

  - the season vector s_i (e.g., 2013-2014), with s_it = 1 if goal i was scored in season t, and

  - the post-season indicator p_i for whether or not the goal was scored in the playoffs, with p_i = 1 for the playoffs and zero for the regular season.

By interacting players with seasons and with playoffs in this way, we have the potential to differentiate player ability over time both within (regular v. post-) season(s) and across seasons. There is potential for confounding with team-season effects u_i, however, as very few players change teams during a season.

In this full specification the number of parameters, K = |α| + |γ| + |φ| + |β| + |β_s| + |β_p|, is on the order of the number of players, p, in the league spanning the seasons/games of interest. The exact number depends on modeling aspects, like the number of special teams scenarios (i.e., constant in p), and on quantities like the number of team-seasons, which grows more slowly than p.

To explain the coefficients and their interpretation, β_j + β_sj is the regular-season-s effect of player j on the log odds that, given a goal has been scored, the goal was scored by their team. Coefficient β_j + β_sj + β_pj is the corresponding effect for post-season s (note that, under the regularization scheme in the next section, β_pj will be fixed at zero unless player j reaches the playoffs). These effects are 'partial' in that they control for who else was on the ice, special teams scenarios, and team-season effects – a player's β_j or β_sj need only be nonzero if that player affects play above or below the team average for a given season. A test of understanding: what does the intercept α represent in (2)?

For intuition, consider a simple "player-only" version of our model that has only who-is-on-the-ice as a time-invariant influence on goal scoring. This is the version of the model that was applied in [12]. Then there are no team-season-specific intercepts (γ = 0), no special teams effects (φ = 0), and no season-specific player-effect changes (β_s = 0 and β_p = 0), so that β_j is the constant effect of player j.
The log odds that the home team has scored a given goal become

log[q_i / (1 - q_i)] = α + β_{h_i1} + ... + β_{h_i6} - β_{a_i1} - ... - β_{a_i6},   (2)

where the subscripts on the coefficients β are as follows: h_i1, ..., h_i6 are the six players on the ice for the home team and a_i1, ..., a_i6 indicate the players for the away team. This is just a re-writing of x_i'β from (1), where the vector x_i (of length equal to the number of players) contains the "+1" and "-1" indicators depending on whether that player was on the home or away team, and where all other x_ij are zero so that Σ_j |x_ij| = 12 for full-strength play. See Figure 1 for an illustration.
[Figure 1: Illustration of the response and design: y_i ∈ {-1, +1} encodes the scoring team, and the n × p player matrix X has sparse entries x_ij ∈ {-1, 0, +1} marking the away and home players on the ice for each goal.]

[Footnote, answering the test of understanding above: α is the home ice advantage. If you do not know anything about who is playing or on the ice, the odds are e^α higher that the home team has scored any given goal. In this setup the goalies are included in the calculations, unlike with plus-minus.]

The model in (1) is simple and transparent; if you wish to control for new variables or situations you just need to add covariates to the logistic regression. In theory, one could fit this model in R by typing

R> fit <- glm(y~X, family="binomial")
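Before seeing why this fails, it may help to make the inputs concrete. The sketch below builds y and a sparse X from a hypothetical per-goal table; the column names are ours for illustration (a real design would include all six skaters per side), and it uses the Matrix library that our software relies upon.

library(Matrix)

## hypothetical per-goal records: who scored and one skater per side
goals <- data.frame(
  scored_by_home = c(TRUE, FALSE, TRUE),
  home_player    = c("CROSBY", "CROSBY", "TOEWS"),
  away_player    = c("THORNTON", "DATSYUK", "DATSYUK")
)
players <- sort(unique(c(goals$home_player, goals$away_player)))

y <- ifelse(goals$scored_by_home, 1, -1)

## sparse design: +1 for home skaters, -1 for away skaters, 0 otherwise
X <- sparseMatrix(
  i = rep(seq_len(nrow(goals)), 2),
  j = c(match(goals$home_player, players),
        match(goals$away_player, players)),
  x = rep(c(1, -1), each = nrow(goals)),
  dims = c(nrow(goals), length(players)),
  dimnames = list(NULL, players)
)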
But unfortunately, that straightforward call just doesn't work. The problem is that hockey data are too high dimensional (too many covariates) for ordinary logistic regression software, and the design matrices are too highly imbalanced to obtain meaningful (low variance) estimates of player effects. In almost all regressions one is susceptible to the temptations of rich modeling when the data set is large, and our hockey setup is no different. One must be careful not to over-fit, wherein parameters are optimized to statistical noise rather than fit to the relationship of interest. And one must be aware of multicollinearity, where groups of covariates are correlated with each other, making it difficult to identify individual effects, as happens when players are grouped into lines.

A first approach to finding a remedy might be to entertain stepwise regression, e.g., via step in R with a stopping rule based upon an information criterion like AIC or BIC. But that also doesn't work on this data: the calculations take days, and turn up very few non-zero predictors (i.e., players whose presence has any effect on goals). The trouble here is that players can't be judged on their own, since they almost always play with and against eleven others. Therefore the one-at-a-time judgments made by step fail to discover any relevant players, despite making a combinatorially huge number of such comparisons. Moreover, stepwise regression results are well known to be highly variable: tiny jitter to the data can lead to massive changes in the estimated model. The combined effect is an unstable algorithm that yields overly simple results and takes a very long time to run.

Instead, a crucial contribution of [12] is to suggest the use of modern penalized and Bayesian logistic regression models, which bias the estimates of player effects towards zero. In the next section we consider one fast and successful version of these methods: L1 regularization.

3 Regularization

Our solution is to take a modern regularized approach to regression. If η_i = log[q_i / (1 - q_i)] is our linear equation for the log odds from (1), then the usual maximum likelihood estimation routine (e.g., via glm in R) minimizes the negative log likelihood objective for n goal events,

l(η; y) = Σ_{i=1}^n log(1 + exp[-y_i η_i]).   (3)

Instead, a regularized regression algorithm will minimize a penalized objective, say for example l(η; y) + nλ Σ_{j=1}^p [c(|β_j|) + c(|β_sj|)], where λ > 0 and c(·) is a coefficient cost function, while n is the total number of goals and p is the number of players. A few common cost functions are shown in Figure 2. Those that have a non-differentiable spike at zero (all but ridge) lead to sparse estimators, meaning that many coefficients are set to exactly zero. The curvature of the penalty away from zero dictates the weight of shrinkage imposed on the nonzero coefficients: L2 costs increase with coefficient size, lasso's L1 penalty has zero curvature and imposes constant shrinkage, and as curvature goes towards -∞ one approaches the L0 penalty of subset selection.

[Figure 2: From left to right, the L2 'ridge' cost [14], the L1 'lasso' [33], the 'elastic net' mixture of L1 and L2 [35], and the log penalty [3].]

In this article we focus on the L1 penalty for its balance between shrinkage of large signals (players tend not to have huge effects) and a preference for sparsity (we can only measure the nonzero effects of a subset of players). For these and many appealing theoretical reasons (and for computational tractability), the L1 penalty is by far the most commonly used in contemporary regularized regression; see [13] for a broad-audience overview and [30] for details on the algorithms used in this chapter.
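For readers who want to see these shapes for themselves, the following few lines sketch the four cost functions of Figure 2; the 0.1 elastic net mixing weight is our arbitrary choice for illustration.

beta <- seq(-3, 3, length.out = 200)
costs <- cbind(
  ridge   = beta^2,                    # L2: shrinkage grows with size
  lasso   = abs(beta),                 # L1: constant shrinkage, sparsity
  elastic = abs(beta) + 0.1 * beta^2,  # elastic net mixture
  logpen  = log(1 + abs(beta))         # log penalty: diminishing bias
)
matplot(beta, costs, type = "l", lty = 1, col = 1:4,
        xlab = "coefficient", ylab = "penalty cost")
legend("top", colnames(costs), lty = 1, col = 1:4, bty = "n")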
Under lasso L1 penalization, estimation for the unknown parameters in our particular hockey model (1) proceeds through optimization of

l(η; y) + nλ Σ_{j=1}^p ( |β_j| + |β_sj| + |β_pj| ).   (4)

It is important to note here that we are penalizing only the player effects. The team-season effects (γ) are unpenalized. This strategy of combining penalized and unpenalized estimation is advocated in, e.g., [29] and [7]. It works nicely whenever you have a subset of covariates for which there is strong data signal (many repeated observations, which we have for team-season and special teams effects) and whose effect you'd like to completely remove from estimation for other coefficients. In this way, we ensure that the player effect estimates are not polluted by confounding effects in u and v.

Moreover, consider the covariates on β, β_s, and β_p in (1): for the latter two, x_i interacts with additional binary indicators, such that β_j acts on more nonzero terms than β_sj, which itself acts on more nonzero terms than β_pj. Thus there is less signal associated with the season and playoff player-effect innovations than with the player baseline effects, so that these will tend to be estimated at zero unless there is significant evidence that a player has become better or worse across seasons or in a given post-season.

Penalty size, λ, acts as a squelch: canceling noise to focus on the true input signal. Large λ lead to very simple model estimates, while as λ → 0 we approach unpenalized maximum likelihood. Since the optimal amount of penalization is unknown, practical application of penalized estimation requires a regularization path: a p × T field of β-hat estimates obtained while moving from high to low penalization along λ_1 > λ_2 > ... > λ_T. These paths begin at λ_1 set to the infimum λ such that (4) is minimized at β-hat = 0, and proceed down to some pre-set λ_T (e.g., λ_T = 0.01 λ_1).

A common tool for choosing the optimal λ – that for which we report estimated player effects – is cross validation (CV). In CV, the path of coefficients is repeatedly fit to data subsamples and used to predict the response on the left-out data. The λ leading to minimum error is then selected as optimal. In this chapter, we instead use an analytic alternative to CV that yields models that perform as well or better out-of-sample. The corrected Akaike Information Criterion (AICc), proposed in [17], is defined as

AICc = 2 l(η-hat_λ; y) + 2kn / (n - k - 1),

where η-hat_λ are the estimated log odds under penalty λ and k ≤ K is the number of non-zero estimated coefficients at this penalty (likely far fewer than the total number of parameters K, as many players cannot be distinguished from the league average). See [30] and [6] for details on AICc selection in this context; we find AICc preferable to CV because it is computationally efficient (you only need to optimize once) and because there is no random Monte Carlo variation – it always gives the same answer on the same data. However, all of our ideas here apply if you wish to use CV selection instead.

The gamlr package [28] can be called via R to implement this procedure:

R> fit <- gamlr(X, Y, standardize=FALSE, family="binomial")
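To make the selection rule concrete, the AICc curve can also be traced by hand along the fitted path. The sketch below assumes, as in the versions of gamlr we have used, that the fitted object stores the per-λ deviance and degrees of freedom:

n <- nrow(X)
aicc <- fit$deviance + 2 * fit$df * n / (n - fit$df - 1)
best <- which.min(aicc)   # the AICc-optimal segment of the path
plot(fit)                 # coefficient paths, with the selected lambda marked
B <- coef(fit)            # coef() applies this same AICc rule by default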
For CV, just replace gamlr with cv.gamlr. The standardize=FALSE flag tells gamlr not to weight the coefficient penalties by the standard deviation of the corresponding covariate (i.e., to use penalty λ|β_j| instead of λ sd(x_j)|β_j|); this is appropriate here because such standardization would up-weight the influence of players who rarely play (and have low sd(x_j)) relative to those who have a lot of ice time (and thus high sd(x_j)). The software exploits sparsity in our player effects (X) via the Matrix library for R, and is extremely fast to run: no examples in this article require more than a few seconds of computation. Estimated coefficients at the optimal λ are available as coef(fit).

One natural way to understand regularized regression is through the lens of Bayesian posterior inference. Judiciously chosen prior distributions lend stability to the fitted model, which is crucial in contexts where the number of quantities being estimated is large. In our setting, where larger β-values indicate large positive or negative contributions to player ability, it makes sense to choose a prior that encourages coefficients to center around zero, a so-called shrinkage prior. Our a priori belief is that most players are members of the rank-and-file: their contribution to goals is neutral (e.g., zero on the log-odds scale), and that only a handful of stars (and liabilities) have a strong contribution to the chances of scoring (or letting in) goals. From the perspective of point estimation, adding a prior on β_j centered at zero is equivalent to adding a penalty term for β_j ≠ 0 in our objective function. Different choices of priors correspond to different penalty functions on β_j ≠ 0; a Laplace prior distribution on each β_j corresponds to our L1 penalty in (4). The posterior density is the product of likelihood and prior, and on the log scale that product becomes a sum. So maximizing the posterior to obtain posterior modes is equivalent to maximizing the log likelihood plus a penalty term which is the log of the prior. Conversely, minimizing (4) may be interpreted as Bayesian posterior maximization.

4 Goal-Based Analysis

This section attempts to quantify the performance of hockey players using data from the NHL. It extends the analysis in [12], which used a smaller dataset, assumed constant player effects, and did not control for team-season effects. The data comprise play-by-play NHL game records for regular and playoff games during the 11 seasons of 2002-2003 through 2013-2014. [Footnote: Season 2004-2005 was a lockout that resulted in the cancellation of the season.] The data capture all significant events in every game – player changes, goals, shots, blocked shots, missed shots, and penalties – and cover p = 2439 players involved in n = 69449 goals.

The analysis proceeds through estimation of the model from (1),

log[q_i / (1 - q_i)] = α + u_i'γ + v_i'φ + x_i'β + (x_i ∘ s_i)'(β_s + p_i β_p),

by minimization of the implied penalized deviance in (4). The estimated player coefficients for each season s are then available as β_j + β_sj, the combination of a baseline effect plus a season-specific innovation. These effects represent the estimated change in the log odds that, given a goal was scored, the goal was scored by player j's team.

The estimated effects – our β_j and β_sj – might be tough for non-statisticians to interpret. One option is to translate from the scale of log-odds to that of probabilities.
In particular, we define the 'partial for-%' functional as

PFP_sj = (1 + exp[-β_j - β_sj])^(-1).   (5)

This feeds the player effects through a logit link to obtain the probability that, given a goal was scored in season s, it was scored by player j's team, if we know nothing else other than that player j was on the ice. It lives on the same scale as the commonly used for-% (FP) statistic: the total number of events by a given player's team divided by the total number of events by either team, while that player was on the ice. Like PM, FP is a marginal effect that does not account for who else was on the ice and other confounding factors. Hence, PFP is the partial effect version of FP.

An important feature of the standard PM statistic that differs from both for-% and β or PFP is that it – in a limited sense – accounts not only for player ability but also the amount that they play. For example, a player with a very high PM must both perform well and maintain this level of performance over an extended period of time (assuming that you need to be on the ice a long time to be on the ice for many goals). Conversely, similarly estimated β values for two players might hide the fact that one of the two logs much more ice time and is thus more valuable to the team. It is therefore also important to be able to translate our estimates onto the scale of plus-minus, leading to a partial plus-minus (PPM). [Footnote: Note that, due to the role of the penalty in our regularized estimation scheme, players with little ice time tend to have their effect estimated at zero; thus, the difference between β or PFP and the PPM statistics should be less dramatic than the difference between FP and PM statistics.] Suppose player j was on the ice for g_sj total goals (for or against) during season s; then the PPM is defined as

PPM_sj = g_sj PFP_sj - g_sj (1 - PFP_sj) = g_sj (2 PFP_sj - 1).   (6)

Just as PFP is the partial effect version of FP, PPM is the partial effect analogue of PM's marginal effect.

Table 1 provides a list of the top and bottom players ranked by their PPM, along with the corresponding β effects and their standard PM. Since these PPMs and effects are calculated for each season, players can occur repeatedly in the table; for example, Sidney Crosby has 4 of the top 10 best player-seasons since 2002; he has been consistently the best, or near best, player in the league. The number one player-season since 2002 by PPM is Peter Forsberg in 2002-2003, with a PPM of 55.5. This is around 25% better than the 2nd best PPM: Crosby's 43.5 in 2009-2010. These tabulated effects are all calculated based upon regular season performance alone. However, for all 10,000 player-seasons, using goals data, we never see enough signal to conclude that a given player was significantly better or worse in the post-season than in the regular season. That is, the estimated β_pj are all zero and we have no evidence of 'clutch' players who improve their play in the post-season. At the same time, many of the β_sj are estimated at nonzero values: there is measurable signal indicating that player performance changes across seasons.

The ranking in Table 1 differs dramatically from those in [12]. This occurs because we are now controlling for additional non-player confounding factors (e.g., coaching, through team-season effects) and we allow player performance to change over time rather than be fixed at a single 'career' value. To help intuition on why such control and model flexibility is useful, we note that Sidney Crosby's β effects drop significantly (he falls out of the top 5 for any season) if you do not control for the special teams effects.
This occurs because he spends a lot of time on the penalty kill (short-handed), which makes it easier for him to get scored upon through no fault of his own. As another example, many of the goalies have large PPM if you do not control for team-season effects; since the goalie is almost always on the ice, they act as a surrogate for aggregate team performance unless you explicitly control for it (unfortunately for Patrick Lalime, there is still enough variation at goal to measure the effect and PPM for some goalies). [Footnote: In a fully Bayesian analysis, such as the reglogit approach discussed in our conclusion, one would be able to separate posterior uncertainty about β from the issue of the number of goals for which a player is on ice; for example, the reglogit approach yields posterior uncertainty over a player's PPM.]

Goal-based performance analysis
Rank   Player             Season     Team  PFP   FP    PPM     PM
1      PETER FORSBERG     2002-2003  COL   0.68  0.77  55.52   85
2      SIDNEY CROSBY      2009-2010  PIT   0.60  0.64  43.47   60
3      DOMINIK HASEK      2005-2006  OTT   0.59  0.67  42.45   80
4      SIDNEY CROSBY      2008-2009  PIT   0.60  0.61  42.26   48
5      SIDNEY CROSBY      2005-2006  PIT   0.60  0.62  41.86   52
6      PETER FORSBERG     2005-2006  PHI   0.68  0.77  40.67   61
7      PAVEL DATSYUK      2007-2008  DET   0.60  0.72  39.49   87
8      PAVEL DATSYUK      2008-2009  DET   0.60  0.67  39.49   69
9      SIDNEY CROSBY      2006-2007  PIT   0.60  0.72  35.62   79
10     MARK STREIT        2008-2009  NYI   0.59  0.56  35.08   24
11     MATT MOULSON       2011-2012  NYI   0.60  0.61  34.92   37
12     LUBOMIR VISNOVSKY  2010-2011  ANA   0.58  0.66  34.52   70
13     ALEX OVECHKIN      2008-2009  WAS   0.57  0.66  34.46   80
14     JOE THORNTON       2009-2010  SJS   0.60  0.65  33.91   52
15     JOE THORNTON       2010-2011  SJS   0.60  0.64  33.91   48
16     ONDREJ PALAT       2013-2014  TAM   0.64  0.66  32.75   37
17     PAVEL DATSYUK      2006-2007  DET   0.60  0.71  32.61   70
18     JOE THORNTON       2002-2003  BOS   0.60  0.64  32.17   47
19     JOE THORNTON       2007-2008  SJS   0.60  0.71  32.17   69
20     ANDREI MARKOV      2007-2008  MON   0.57  0.60  31.9    47
21     PETER FORSBERG     2003-2004  COL   0.68  0.72  31.47   39
22     JOE THORNTON       2008-2009  SJS   0.60  0.67  31.21   56
23     PETER FORSBERG     2006-2007  PHI   0.68  0.68  31.12   32
24     PAVEL DATSYUK      2005-2006  DET   0.60  0.74  30.85   75
25     ROBERT LANG        2003-2004  WAS   0.60  0.66  30.8    50
10184  PATRICK LALIME     2008-2009  BUF   0.43  0.44  -15.79  -15
10185  JACK JOHNSON       2007-2008  LOS   0.45  0.39  -15.82  -34
10186  BRETT CLARK        2011-2012  TAM   0.44  0.35  -16.93  -47
10187  NICLAS HAVELID     2008-2009  ATL   0.45  0.39  -16.97  -40
10188  JACK JOHNSON       2010-2011  LOS   0.45  0.53  -17.21  9
10189  JACK JOHNSON       2011-2012  LOS   0.45  0.5   -17.21  -1
10190  P. J. AXELSSON     2008-2009  BOS   0.41  0.49  -17.35  -1
10191  BRYAN ALLEN        2006-2007  FLA   0.45  0.45  -17.9   -17
10192  JACK JOHNSON       2009-2010  LOS   0.45  0.49  -19.46  -4
10193  PATRICK LALIME     2005-2006  STL   0.43  0.40  -19.77  -29
10194  ALEXANDER EDLER    2013-2014  VAN   0.37  0.27  -20.49  -35
10195  PATRICK LALIME     2007-2008  CHI   0.43  0.49  -22.29  -4
10196  TIM THOMAS         2009-2010  BOS   0.43  0.46  -24.22  -16
10197  ANDREJ MESZAROS    2006-2007  OTT   0.42  0.48  -27.32  -6
10198  BRYCE SALVADOR     2008-2009  NJD   0.35  0.37  -34.4   -31
10199  PATRICK LALIME     2002-2003  OTT   0.43  0.58  -37.81  47
10200  PATRICK LALIME     2003-2004  OTT   0.43  0.56  -37.81  37
10201  NICLAS HAVELID     2006-2007  ATL   0.34  0.44  -62.64  -22
10202  NICLAS HAVELID     2005-2006  ATL   0.33  0.40  -65.94  -41
10203  JAY BOUWMEESTER    2005-2006  FLA   0.33  0.42  -69.62  -32

Table 1: Top-25 and bottom-20 player-seasons when ranked by their regular-season PPM.
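In code, the two metrics of Table 1 amount to a pair of one-liners. The inputs here are hypothetical stand-ins: a vector beta holding each player's estimated β_j + β_sj for a given season, and a vector g of the goals for which each player was on the ice that season.

pfp <- 1 / (1 + exp(-beta))   # eq. (5): P(goal is for player's team)
ppm <- g * (2 * pfp - 1)      # eq. (6): partial plus-minus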
Another change from [12] is that we are ranking players here by PPM rather than by β; as described above, this rewards those with more ice time. For comparison, Table 2 ranks players by their PFP (which is equivalent to ranking by β); the table includes both the goal-based metrics from this section and the shot-based metrics from our next section. While PFP and PPM are clearly related quantities, we do see some major differences. For example, Tyler Toffoli (ranks 9 and 10 by goal-based PFP) was a breakout star in 2013-2014 with the Los Angeles Kings; this was his first full season, after playing only a portion of 2012-2013 in the NHL. As a rookie, his ice time was relatively limited; however, he clearly has talent and this is reflected in his β and PFP but less so in his PPM. On the other hand, players ranked at the bottom by PPM in Table 1 are those who have a negative β and get a large amount of ice time. There are many players who have lower PFPs than Jack Johnson's 0.45 (e.g., John McCarthy at 0.38 and Thomas Pock at 0.40), but they do not get to play as much and thus don't show up in our bottom 20.

5 Shot-Based Analysis

The analysis above is built around the event of a 'goal'; this is the most reasonable baseline analysis, as it removes any subjectivity about whether or not the statistics are related to team performance – score more and you win. However, it has recently become popular in hockey analysis to consider alternative metrics that are built from shots and other events; see [34] for a review. The most popular of such statistics is Corsi, which counts the number of events that are goals, shots on goal, missed shots, or blocked shots. Fenwick is another such statistic; it is Corsi but without counting blocked shots. Although we have seen no evidence that Corsi or Fenwick events are more useful in predicting team performance than goal-based metrics, they do offer a big advantage to the statistician: they lead to a larger sample size, so that you can hopefully better identify the competing influences of different players and confounding factors.
FP player rankings
       Goal-based                                    Corsi-based
Rank   Player              Season     Team  PFP     Player               Season     Team  PFP
1      PETER FORSBERG      2002-2003  COL   0.68    DAVID VAN DER GULIK  2010-2011  COL   0.64
2      PETER FORSBERG      2005-2006  PHI   0.68    DAVID BOOTH          2012-2013  VAN   0.63
3      PETER FORSBERG      2003-2004  COL   0.68    DANIEL SEDIN         2012-2013  VAN   0.62
4      PETER FORSBERG      2006-2007  PHI   0.68    ALEXANDER SEMIN      2003-2004  WAS   0.61
5      PETER FORSBERG      2007-2008  COL   0.68    DANIEL SEDIN         2010-2011  VAN   0.60
6      PETER FORSBERG      2010-2011  COL   0.68    MIKHAIL GRABOVSKI    2010-2011  TOR   0.60
7      ONDREJ PALAT        2013-2014  TAM   0.64    DANIEL SEDIN         2007-2008  VAN   0.60
8      ONDREJ PALAT        2012-2013  TAM   0.64    DANIEL SEDIN         2008-2009  VAN   0.60
9      TYLER TOFFOLI       2013-2014  LOS   0.63    DANIEL SEDIN         2011-2012  VAN   0.60
10     TYLER TOFFOLI       2012-2013  LOS   0.63    PATRIK ELIAS         2010-2011  NJD   0.60
11     VINCENT LECAVALIER  2006-2007  TAM   0.61    SIDNEY CROSBY        2013-2014  PIT   0.60
12     VINCENT LECAVALIER  2003-2004  TAM   0.61    DANIEL SEDIN         2009-2010  VAN   0.60
13     SIDNEY CROSBY       2009-2010  PIT   0.60    JUSTIN WILLIAMS      2010-2011  LOS   0.60
14     SIDNEY CROSBY       2008-2009  PIT   0.60    DANIEL SEDIN         2013-2014  VAN   0.60
15     SIDNEY CROSBY       2005-2006  PIT   0.60    PATRIC HORNQVIST     2013-2014  NSH   0.60
16     PAVEL DATSYUK       2007-2008  DET   0.60    PAVEL DATSYUK        2012-2013  DET   0.60
17     PAVEL DATSYUK       2008-2009  DET   0.60    ALEX STEEN           2011-2012  STL   0.60
18     SIDNEY CROSBY       2006-2007  PIT   0.60    BRAD RICHARDSON      2011-2012  LOS   0.60
19     MATT MOULSON        2011-2012  NYI   0.60    ERIC FEHR            2008-2009  WAS   0.60
20     JOE THORNTON        2009-2010  SJS   0.60    TYLER TOFFOLI        2013-2014  LOS   0.60
Table 2: Top 20 player-seasons by goal-based and Corsi-based PFP.

Our data contain over a million Corsi events (n_c) and over a million Fenwick events (n_f), compared to n_g = 69449 goals.
The standard way to report Corsi and Fenwick for a given player is as the for-% (FP) described above. Again, since the FP score does not reward players for the amount of time that they spend on the ice, we also consider both Corsi and Fenwick versions of the plus-minus statistic. Of course, all of these statistics – Corsi-FP, Corsi-PM, etc. – measure marginal effects. They are thus subject to the same criticisms as the original PM: they fail to control for the influence of other players and confounding factors, and are thus less useful than a partial effect for predicting and measuring player performance. However, we can apply the exact same regression analysis that we've used above for goal events to derive partial versions of the Corsi and Fenwick statistics: simply replace y_i with a response calculated from Corsi or Fenwick events. For example, a Corsi regression applies the model as in (1) but for response y_i = +1 if the event was a Corsi event (goal, shot, missed shot, or blocked shot) by the home team and y_i = -1 if it was by the away team. The for-% and plus-minus formulas of (5)-(6) can similarly be applied to obtain Corsi-PFP and Corsi-PPM values.

Corsi-based performance analysis
Rank   Player             Season     Team  PFP   FP    PPM      PM
1      DANIEL SEDIN       2010-2011  VAN   0.60  0.65  615.14   876
2      ERIC STAAL         2008-2009  CAR   0.58  0.59  605.41   619
3      MIKHAIL GRABOVSKI  2010-2011  TOR   0.60  0.57  597.05   465
4      JOE THORNTON       2011-2012  SJS   0.59  0.61  596.37   742
5      ALEX OVECHKIN      2009-2010  WAS   0.59  0.66  575.72   1047
6      DANIEL SEDIN       2007-2008  VAN   0.60  0.63  562.11   685
7      DANIEL SEDIN       2008-2009  VAN   0.60  0.62  547.83   680
8      RYAN KESLER        2010-2011  VAN   0.58  0.59  530.05   649
9      SIDNEY CROSBY      2009-2010  PIT   0.57  0.62  517.86   815
10     DANIEL SEDIN       2011-2012  VAN   0.60  0.67  510.16   880
11     HENRIK ZETTERBERG  2011-2012  DET   0.58  0.60  497.04   596
12     CLAUDE GIROUX      2010-2011  PHI   0.58  0.56  487.53   347
13     ZACH PARISE        2008-2009  NJD   0.58  0.64  486.45   843
14     JOE THORNTON       2010-2011  SJS   0.58  0.60  482.72   647
15     ALEX STEEN         2010-2011  STL   0.59  0.61  475.5    561
16     LUBOMIR VISNOVSKY  2010-2011  ANA   0.56  0.56  474.91   446
17     ERIC STAAL         2010-2011  CAR   0.56  0.56  473.92   415
18     JUSTIN WILLIAMS    2011-2012  LOS   0.59  0.63  471.53   717
19     ALEX OVECHKIN      2007-2008  WAS   0.56  0.65  463.14   1094
20     PATRIK ELIAS       2010-2011  NJD   0.60  0.60  461.75   461
21     SIDNEY CROSBY      2013-2014  PIT   0.60  0.61  459.92   480
22     DUSTIN BYFUGLIEN   2010-2011  ATL   0.56  0.60  456.04   705
23     JAROMIR JAGR       2007-2008  NYR   0.58  0.65  455.78   911
24     ALEX OVECHKIN      2008-2009  WAS   0.56  0.64  455.46   1065
25     JASON BLAKE        2008-2009  TOR   0.58  0.55  454.7    278
10605  MIKE COMMODORE     2008-2009  CBS   0.43  0.42  -447.91  -537
10606  SCOTT HANNAN       2011-2012  CGY   0.42  0.40  -451.04  -591
10607  CHRIS PHILLIPS     2007-2008  OTT   0.43  0.40  -454.09  -644
10608  JAY BOUWMEESTER    2005-2006  FLA   0.44  0.46  -457.01  -305
10609  KARLIS SKRASTINS   2008-2009  COL   0.43  0.38  -457.54  -754
10610  KARLIS SKRASTINS   2009-2010  DAL   0.42  0.39  -464.49  -655
10611  MATTIAS OHLUND     2008-2009  VAN   0.42  0.47  -465.36  -212
10612  MATTIAS OHLUND     2006-2007  VAN   0.43  0.48  -470.03  -147
10613  SCOTT HANNAN       2008-2009  COL   0.43  0.38  -478.83  -788
10614  DOUGLAS MURRAY     2009-2010  SJS   0.42  0.47  -486.16  -184
10615  SCOTT HANNAN       2007-2008  COL   0.42  0.42  -507.7   -504
10616  FILIP KUBA         2011-2012  OTT   0.42  0.49  -509.74  -77
10617  NICLAS HAVELID     2007-2008  ATL   0.41  0.35  -516.86  -883
10618  JOHNNY ODUYA       2008-2009  NJD   0.42  0.51  -522.4   51
10619  DOUGLAS MURRAY     2010-2011  SJS   0.40  0.48  -540.83  -117
10620  DION PHANEUF       2006-2007  CGY   0.42  0.49  -552.69  -42
10621  NICLAS HAVELID     2008-2009  ATL   0.40  0.40  -562.65  -604
10622  SERGEI GONCHAR     2006-2007  PIT   0.42  0.52  -586.55  174
10623  PAUL MARTIN        2008-2009  NJD   0.39  0.55  -695.83  283
10624  BRYCE SALVADOR     2008-2009  NJD   0.32  0.42  -912.17  -407

Table 3: Top 25 and bottom 20 players by Corsi-based PPM.
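In code, the swap from goals to Corsi is minimal. The sketch below uses hypothetical stand-ins – an events table with an event type and a by_home flag, plus a matching on-ice design X_events built with one row per event rather than per goal:

corsi <- events$type %in% c("GOAL", "SHOT", "MISS", "BLOCK")
y_corsi <- as.numeric(events$by_home[corsi])  # gamlr takes a 0/1 response
fit_corsi <- gamlr(X_events[corsi, ], y_corsi,
                   standardize = FALSE, family = "binomial")
## Fenwick: drop "BLOCK" from the event types above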
The results for the regular season Corsi-based performance analysis are in Table 3 and on the right side of Table 2. Comparison to the goal-based rankings shows that a distinctly different set of players sits at both the top and bottom. For example, Daniel Sedin is a prominent player who ranks highly in multiple seasons under Corsi-PPM but does not appear in the top 20 for goals-PPM (his best goals-PPM is a still respectable 19.45 in 2010-2011, which ranks 152nd across all player-seasons). At this point we are not looking to argue for either the goal or Corsi based metrics as 'best'; however, the fact that they do differ so dramatically should be a bit troubling for those who wish to focus exclusively on Corsi statistics (since only goal differentials dictate who wins the game). In the next section, we consider a comparison between all of our metrics and an outside measure of player quality: salary.

6 Performance and Salary

In our final analysis section, we consider how our partial effect statistics relate to the market value of a player's worth, as represented by his annual salary. The salary numbers are obtained through a combination of the databases maintained at blackhawkzone.com and hockeyzoneplus.com; we are able to obtain annual salaries for 80% of the player-seasons in our dataset, including almost all player-seasons where the player was on ice for more than a couple of goals. The histograms in Figure 3 show, for the salary distribution over all 11 seasons, how the estimated player effects are distributed between negative, neutral (zero), and positive. Results are shown for both goal and Corsi based regressions. In both cases, the ratio of positive to negative effects increases with salary. The main difference between the two plots is that fewer players have zero estimated effects under Corsi – this is a result of its much larger event sample.

[Figure 3: Distribution of estimated player effects (negative, zero, and positive) against salary (USD, millions) over all 11 seasons; panel (a) is goals-based and panel (b) is Corsi-based.]

Figure 4 takes a deeper look at the relationship between salary and performance for two of our goal-based metrics: the goal plus-minuses (PM and PPM) and for-percentages (FP and PFP). For each metric, we use nonparametric Bayesian regression to fit expected log salary as a function of that metric. In particular, we apply the tgp package [11, 8] to obtain posterior means for the Bayesian regression trees of [4]. The trees are fit to salary and performance data that have first been aggregated by player, such that these surfaces represent the expected log average salary (per player) conditional upon their average performance across the years that they played. This aggregation is done to minimize dependence between observations, and because we assume that a player's salary is not determined by a single season.

The surfaces in Figure 4 expose some interesting differences between the partial and marginal performance statistics. In the plus-minus case (PM and PPM), the two curves are similar – they both show little relationship between salary and performance for negative plus-minus, while salaries rise with positive performance. However, there is a larger jump up from zero for PPM than for PM.
This occurs because the regularized estimation only assigns a positive nonzero effect for statistically significant performances, so that players with small but positive PM values have their PPM shrunk to zero. The difference between FP and PFP is more dramatic. In the case of marginal FP, salaries are highest for players with FP in the middle of the range (near and above 0.5). This occurs because FPs outside of that range occur only for players with little ice time. In contrast, the PFP relationship with salary is very simple: salaries are low for PFP below 0.5, and high above that number. If you make it more likely than not that a goal scored is a goal for, you can expect to make more money.

Finally, we close with a look at the 'highest value' players in the 2013-2014 season: those
for whom their salary was low relative to their goal-based PPM.

[Figure 4: Nonparametric regression for log average salary (per player, in millions of USD) onto their average performance metric: the plus-minus pair (PPM and PM) in the left panel and the for-probability pair (PFP and FP) in the right panel, with partial and sample versions of each metric overlaid.]

Table 4 lists the top 20 players by goal-PPM/salary – that is, the top players measured by their goals-added-per-dollar. The cheapest player contribution, by a massive margin, is from Ondrej Palat of Tampa Bay. Palat was an inexpensive 7th round draft pick in 2011, making $500 thousand per year. After spending two seasons in the minors, he moved up to the NHL in 2013-2014 and had a season good enough to be nominated for rookie-of-the-year. The Lightning re-signed Palat to a new contract in 2014; he now averages around $3.5 million per year.

Rank  Player               Team  Goals per million
1     ONDREJ PALAT         TAM   58.27
2     RYAN NUGENT-HOPKINS  EDM   19.81
3     GABRIEL LANDESKOG    COL   16.74
4     TYLER TOFFOLI        LOS   16.72
5     GUSTAV NYQUIST       DET   9.08
6     JADEN SCHWARTZ       STL   8.43
7     ERIC FEHR            WAS   7.51
8     ANDREW MACDONALD     NYI   7.48
9     BENOIT POULIOT       NYR   6.43
10    BRAD BOYES           FLA   6.01
11    TOMAS TATAR          DET   5.83
12    AL MONTOYA           WPG   5.79
13    BRANDON SAAD         CHI   5.5
14    FRANS NIELSEN        NYI   5.5
15    JAROMIR JAGR         NJD   4.73
16    LOGAN COUTURE        SJS   4.7
17    RADIM VRBATA         PHO   4.4
18    DAVID PERRON         EDM   4.1
19    HENRIK LUNDQVIST     NYR   3.76
20    ANDREI MARKOV        MON   3.5

Table 4: Top 20 value players as ranked by PPM/salary.
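A surface like those in Figure 4 could be fit along the following lines, using Bayesian CART from the tgp package; here ppm and salary are hypothetical per-player aggregates (average PPM and average salary in millions of USD), and bcart is one of several tree-based fitters the package provides.

library(tgp)
fit <- bcart(X = data.frame(ppm = ppm), Z = log(salary))
plot(fit)   # posterior mean of expected log salary as a function of PPM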
7 Conclusion

We have provided a sketch of how modern techniques in regularized logistic regression, developed originally to address challenging large-scale problems in genetics, finance, and text mining, can be used to calculate partial player effects in hockey. We have argued that such partial effects are a better measure of player ability than the classic plus-minus statistic, and have the benefit of being interpretable on the same scale as plus-minus. We have shown how the framework is flexible, allowing one to control for many aspects of situational play (special teams, overtime, playoffs) and personnel/season (coaches, salaries, season-years). A comparison was provided to the popular Corsi and Fenwick alternatives to plus-minus, and we argued that the recent emphasis in the literature on shots (and blocked shots) does not in general compare favorably to the traditional focus on goals within this framework.
Our development has focused on point estimation via the gamlr package, which infers parameters under L1 penalization. Another software package offering similar features is glmnet [35]. The difference between gamlr and glmnet is in what options they provide on top of the standard L1 penalty. As detailed in [30], including an example analysis of this same hockey data, gamlr provides a 'gamma-lasso' algorithm for diminishing-bias penalization: the penalty on coefficients automatically diminishes for strong signals. If you believe that the current analysis has over-shrunk the influence of, say, stars like Sidney Crosby or Pavel Datsyuk, then gamlr and [30] will offer a preferable analysis framework. On the other hand, if you think that all players should be shrunk closer to zero – perhaps you believe that the current results over-state the effect of a few stars – then the elastic net penalization scheme of [35] and glmnet will be preferable. In either case, simple L1 penalization provides a useful reference baseline analysis.

Beyond changes to the penalty specification, we think that there can be considerable value in moving from point estimation to a fully Bayesian analysis. [12] included exploration of the player effect posterior in their earlier analysis of a related model. They apply the reglogit package [9] for R, which implements the Gibbs sampling strategy from [10]. The software takes advantage of sparse matrix libraries (slam [16]), and is multi-threaded via OpenMP to engage multiple processors simultaneously. It combines two scale-mixture-of-normals data-augmentation schemes, one for the logit [15] and one for the Laplace prior [21]. Obtaining T samples from the full posterior is straightforward using the following R code:

R> bfit <- reglogit(T=T, y=Y, X=X, normalize=FALSE)

The full posterior sample for β, residing in bfit, is available for calculation of posterior means and covariances of player effects and other posterior functionals relevant to player performance. [Footnote: See, e.g., https://github.com/TaddyLab/hockey/blob/master/results/blog/logistic_pranks_betas.csv] For example, [12] use the posterior probability that one player is better than another as a basis for ranking players, and can even provide posterior credible intervals around these rankings. This information can be used to construct teams of players under budget constraints and subsequently describe the probability that those teams will score more goals than their opponents.

Another option is to depart from the restrictions of linear modeling. Anecdotally, some in the sports analytics community (not just in hockey) have embraced a framework built around random forests [2]. An advantage of decision trees, on which random forests are based, is that they naturally explore interactions between predictors – e.g., between players and other effects in the hockey analysis. The bagging procedure – averaging across many trees – provides a mechanism for avoiding over-fit: structure that is not persistent across trees is eliminated by the averaging. Such work also fits with our above advocacy of fully Bayesian analysis: [31] describes random forests as approximating a Bayesian nonparametric posterior over trees, while Bayesian additive regression trees (BART [5]) provide an alternative tree-based scheme that can be extended to logistic regression via the latent variable techniques in [10].
Finally, the proportional hazards model of [32], mentioned above in Section 2, attempts to reproduce more completely the stochastic processes behind scoring in a hockey game. Their fully Bayesian analysis accounts for a wide set of game information, including the time-on-ice information that we are only roughly accounting for in our PPM statistic.

Still, L1 penalized logistic regression has much to recommend it. The model is very simple to interpret and relies upon minimal restrictive assumptions about the process of a hockey game. Our measures are also much faster to compute than any of the alternatives. These qualities make sophisticated real-time analysis of player effects possible as games and seasons progress.

References
[1] Tom Awad. Numbers On Ice: Fixing Plus/Minus. Hockey Prospectus, April 03, 2009.
[2] L. Breiman. Random forests. Machine Learning, 45:5-32, 2001.
[3] Emmanuel J. Candes, Michael B. Wakin, and Stephen P. Boyd. Enhancing sparsity by reweighted l1 minimization. Journal of Fourier Analysis and Applications, 14:877-905, 2008.
[4] H.A. Chipman, E.I. George, and R.E. McCulloch. Bayesian treed models. Machine Learning, 48:303-324, 2002.
[5] Hugh A. Chipman, Edward I. George, and Robert E. McCulloch. BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4:266-298, 2010.
[6] Cheryl Flynn, Clifford Hurvich, and Jeffrey Simonoff. Efficiency for regularization parameter selection in penalized likelihood estimation of misspecified models. Journal of the American Statistical Association, 108:1031-1043, 2013.
[7] Matthew Gentzkow, Jesse Shapiro, and Matt Taddy. Measuring polarization in high dimensional data. Chicago Booth working paper, 2015.
[8] R. B. Gramacy and Matt Taddy. Categorical inputs, sensitivity analysis, optimization and importance tempering with tgp version 2. Journal of Statistical Software, 33, 2010.
[9] R.B. Gramacy. reglogit: Simulation-based Regularized Logistic Regression, 2012. R package version 1.1.
[10] R.B. Gramacy and N.G. Polson. Simulation-based regularized logistic regression. Bayesian Analysis, 7:1-24, 2012.
[11] Robert Gramacy. tgp: An R package for Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian process models. Journal of Statistical Software, 19, 2007.
[12] Robert B. Gramacy, Shane Jensen, and Matt Taddy. Estimating player contribution in hockey with regularized logistic regression. Journal of Quantitative Analysis in Sports, 9:97-111, 2013.
[13] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
[14] Arthur Hoerl and Robert Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12:55-67, 1970.
[15] C. Holmes and K. Held. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1):145-168, 2006.
[16] Kurt Hornik, David Meyer, and Christian Buchta. slam: Sparse Lightweight Arrays and Matrices, 2011. R package version 0.1-23.
[17] Clifford M. Hurvich and Chih-Ling Tsai. Regression and time series model selection in small samples. Biometrika, 76(2):297-307, 1989.
[18] Steve Ilardi and Aaron Barzilai. Adjusted Plus-Minus Ratings: New and Improved for 2007-2008. 2004.
[19] Brian Macdonald. A regression-based adjusted plus-minus statistic for NHL players. Technical report, arXiv:1006.4310, 2010.
[20] Daniel Mason and William Foster. Putting moneyball on ice? International Journal of Sport Finance, 2(4):206-213, 2007.
[21] Trevor Park and George Casella. The Bayesian lasso. Journal of the American Statistical Association, 103(482):681-686, 2008.
[22] Stephen Pettigrew. Assessing the offensive productivity of NHL players using in-game win probabilities. In MIT Sloan Sports Analytics Conference, 2015.
[23] Dan T. Rosenbaum. Measuring How NBA Players Help Their Teams Win. April 30, 2004.
[24] Michael E. Schuckers, Dennis F. Lock, Chris Wells, C. J. Knickerbocker, and Robin H. Lock. National Hockey League skater ratings based upon all on-ice events: An adjusted minus/plus probability (AMPP) approach. Technical report, St. Lawrence University, 2010.
[25] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2010. ISBN 3-900051-07-0.
[26] Simon Sheather. A Modern Approach to Regression with R. Springer, 2009.
[27] Anthony Stair, John Neral, Logan Thomas, and Daniel Mizak. Team performance characteristics which influence wins in the National Hockey League. Journal of International Business Disciplines, 6(2), 2011.
[28] Matt Taddy. gamlr: Gamma Lasso Regression, 2013. R package version 1.11-2.
[29] Matt Taddy. Distributed multinomial regression. The Annals of Applied Statistics, 9:1394-1414, 2015.
[30] Matt Taddy. One-step estimator paths for concave regularization. arXiv:1308.5623, 2015.
[31] Matt Taddy, Chun-Sheng Chen, Jun Yu, and Mitch Wyle. Bayesian and empirical Bayesian forests. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 967-976. JMLR Workshop and Conference Proceedings, 2015.
[32] A. C. Thomas, Samuel L. Ventura, Shane Jensen, and Stephen Ma. Competing process hazard function models for player ratings in ice hockey. Technical report, arXiv:1208.0799, 2012.
[33] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267-288, 1996.
[34] Robert Vollman. Howe and Why: Ten Ways to Measure Defensive Contributions. Hockey Prospectus, March 04, 2010.
[35] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(2):301-320, 2005.