Publication


Featured research published by Robert A. Stine.


Annals of Statistics | 2004

Least angle regression

Bradley Efron; Trevor Hastie; Iain M. Johnstone; Robert Tibshirani; Hemant Ishwaran; Keith Knight; Jean-Michel Loubes; Pascal Massart; David Madigan; Greg Ridgeway; Saharon Rosset; J. Zhu; Robert A. Stine; Berwin A. Turlach; Sanford Weisberg

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. (The published article also includes invited discussions, among them contributions by Jean-Michel Loubes and Pascal Massart of Université Paris-Sud and by Berwin A. Turlach of the University of Western Australia.)
Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
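
As an illustration only (not the authors' code), the sketch below uses scikit-learn's implementation of the algorithm, lars_path, on synthetic data to trace both the plain LARS path and the Lasso path produced by the modification described above; the data, coefficients, and seed are invented for the example.

```python
# Illustrative sketch only: trace the LARS and Lasso coefficient paths on
# synthetic data with scikit-learn's lars_path.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]            # only a few covariates matter
y = X @ beta + rng.standard_normal(n)

# method="lar" gives plain LARS; method="lasso" applies the modification
# that recovers the entire Lasso solution path.
alphas_lar, active_lar, coefs_lar = lars_path(X, y, method="lar")
alphas_lasso, active_lasso, coefs_lasso = lars_path(X, y, method="lasso")

print("order in which variables entered (LARS):", active_lar)
print("coefficients at the final LARS step:", coefs_lar[:, -1].round(2))
```

The order in which variables become active mirrors the stepwise, equiangular selection described in the abstract, while the lasso path differs only where a coefficient would cross zero.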


Sociological Methods & Research | 1992

Bootstrapping Goodness-of-Fit Measures in Structural Equation Models

Kenneth A. Bollen; Robert A. Stine

Assessing overall fit is a topic of keen interest to structural equation modelers, yet measuring goodness of fit has been hampered by several factors. First, the assumptions that underlie the chi-square tests of model fit often are violated. Second, many fit measures (e.g., Bentler and Bonett's [1980] normed fit index) have unknown statistical distributions so that hypothesis testing, confidence intervals, or comparisons of significant differences in these fit indices are not possible. Finally, modelers have little knowledge about the distribution and behavior of the fit measures for misspecified models or for nonnested models. Given this situation, bootstrapping techniques would appear to be an ideal means to tackle these problems. Indeed, Bentler's (1989) EQS 3.0 and Jöreskog and Sörbom's (forthcoming) LISREL 8 have bootstrap resampling options to bootstrap fit indices. In this article the authors (a) demonstrate that the usual bootstrapping methods will fail when applied to the original data, (b) explain why this occurs, and (c) propose a modified bootstrap method for the chi-square test statistic for model fit. They include simulated and empirical examples to illustrate their results.
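
The following is a minimal sketch, under stated assumptions, of the kind of modified bootstrap the authors motivate: the data are rotated so that their sample covariance matches the model-implied covariance before resampling, so each resample comes from a population in which the fitted model holds. The names sigma_hat and fit_statistic are placeholders for the output of whatever SEM software is used; this is not the authors' code or any particular package's API.

```python
# Hedged sketch of a model-based bootstrap for a chi-square fit statistic.
import numpy as np
from scipy.linalg import sqrtm

def transform_to_null(Y, sigma_hat):
    """Rotate data matrix Y (n x p) so its covariance matches sigma_hat."""
    Yc = Y - Y.mean(axis=0)
    S = np.cov(Yc, rowvar=False)
    A = np.real(sqrtm(np.linalg.inv(S)) @ sqrtm(sigma_hat))
    return Yc @ A

def bootstrap_fit_statistic(Y, sigma_hat, fit_statistic, B=500, seed=0):
    """Bootstrap distribution of the fit statistic under the fitted model."""
    rng = np.random.default_rng(seed)
    Z = transform_to_null(Y, sigma_hat)
    n = Z.shape[0]
    stats_out = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        stats_out[b] = fit_statistic(Z[idx])   # refit the model to each resample
    return stats_out
```

Resampling the untransformed data instead would generate resamples from a population in which the model does not hold, which is why the naive bootstrap of the chi-square statistic fails.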


Sociological Methods & Research | 1989

An Introduction to Bootstrap Methods

Robert A. Stine

Bootstrap methods are a collection of sample re-use techniques designed to estimate standard errors and confidence intervals. Making use of numerous samples drawn from the initial observations, these techniques require fewer assumptions and offer greater accuracy and insight than do standard methods in many problems. After presenting the underlying concepts, this introduction focuses on applications in regression analysis. These applications contrast two forms of bootstrap resampling in regression, illustrating their differences in a series of examples that include outliers and heteroscedasticity. Other regression examples use the bootstrap to estimate standard errors of robust estimators in regression and indirect effects in path models. Numerous variations of bootstrap confidence intervals exist, and examples stress the concepts that are common to the various approaches. Suggestions for computing bootstrap estimates appear throughout the discussion, and a section on computing suggests several broad guidelines.
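
As a hedged illustration of the two forms of resampling contrasted in the article, the sketch below implements residual (fixed-X) and case (random-X) bootstrap standard errors for ordinary least squares; the heteroscedastic example data and all names are invented for this sketch.

```python
# Minimal sketch: residual resampling versus case (pairs) resampling in regression.
import numpy as np

def fit_ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def bootstrap_se(X, y, scheme="residuals", B=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    beta_hat = fit_ols(X, y)
    resid = y - X @ beta_hat
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        if scheme == "residuals":        # fixed-X: resample residuals only
            y_star = X @ beta_hat + rng.choice(resid, size=n, replace=True)
            draws[b] = fit_ols(X, y_star)
        else:                            # random-X: resample whole cases
            idx = rng.integers(0, n, size=n)
            draws[b] = fit_ols(X[idx], y[idx])
    return draws.std(axis=0, ddof=1)     # bootstrap standard errors

# Example with heteroscedastic errors, where the two schemes can disagree.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 0.5 * x + rng.standard_normal(200) * (0.2 + 0.3 * x)
print("residual scheme SEs:", bootstrap_se(X, y, "residuals").round(3))
print("case scheme SEs:    ", bootstrap_se(X, y, "cases").round(3))
```

Residual resampling assumes the error distribution is the same at every design point, so it understates the uncertainty here; case resampling makes no such assumption.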


Clinical Orthopaedics and Related Research | 1989

Natural history of the posterior cruciate ligament-deficient knee.

Joseph S. Torg; Thomas M. Barton; Helene Pavlov; Robert A. Stine

This paper documents the clinical course of the posterior cruciate ligament-deficient knee. By obtaining an understanding of the natural history of this lesion, the indications for surgical repair, reconstruction, and conservative treatment will be more clearly defined, and the clinician will be able to more critically evaluate the results of both acute repair and reconstruction of this ligament. Forty-three patients with an average interval of 6.3 years (range, one to 37 years) between injury and evaluation were included in this study. Fourteen patients had a straight unidirectional posterior instability and 29 had a combined multidirectional instability. The follow-up evaluation included functional assessment, physical and roentgenographic evaluation, arthrometric laxity measurement, and isokinetic dynamometric testing of quadriceps function. Statistical treatment of the data, utilizing both nonparametric methods and logistic modeling, clearly delineated the natural history of the injury to the posterior cruciate ligament (PCL). It was established that the functional outcome can be predicted on the basis of the instability type. Specifically, those knees with PCL disruption without associated ligamentous laxity will probably remain symptom-free. However, when PCL disruption is associated with combined instabilities, a less than desirable functional result will probably occur. Application of logistic modeling to the data demonstrated that the functional result was not due to the type of instability per se, but rather to associated factors, i.e., chondromalacia of the patella, meniscal derangement, quadriceps atrophy, or degenerative changes. A direct correlation has been established between combined multidirectional instability and the occurrence of those associated secondary problems resulting in the patients' complaints and functional disability.(ABSTRACT TRUNCATED AT 250 WORDS)


Journal of the American Statistical Association | 1988

The Bias of Autoregressive Coefficient Estimators

Paul Shaman; Robert A. Stine

Abstract This article presents simple expressions for the bias of estimators of the coefficients of an autoregressive model of arbitrary, but known, finite order. The results include models both with and without a constant term. The effects of overspecification of the model order on the bias are described. The emphasis is on least-squares and Yule-Walker estimators, but the methods extend to other estimators of similar design. Although only the order T⁻¹ component of the bias is captured, where T is the series length, this asymptotic approximation is shown to be very accurate for least-squares estimators through some numerical simulations. The simulations examine fourth-order autoregressions chosen to resemble some data series from the literature. The order T⁻¹ bias approximations for Yule-Walker estimators need not be accurate, especially if the zeros of the associated polynomial have moduli near 1. Examples are given where the approximation is accurate and where it is useless. The bias expressions are...
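
A small simulation sketch, with illustrative parameter values not taken from the article, showing how the bias of least-squares and Yule-Walker AR(1) coefficient estimates shrinks at roughly the T⁻¹ rate:

```python
# Illustrative simulation: bias of two AR(1) coefficient estimators versus T.
import numpy as np

def simulate_ar1(phi, T, rng):
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def ls_estimate(x):
    """Least-squares AR(1) estimate (mean-corrected)."""
    z = x - x.mean()
    return np.sum(z[1:] * z[:-1]) / np.sum(z[:-1] ** 2)

def yw_estimate(x):
    """Yule-Walker AR(1) estimate: the lag-1 sample autocorrelation."""
    z = x - x.mean()
    return np.sum(z[1:] * z[:-1]) / np.sum(z ** 2)

phi, reps = 0.9, 2000
rng = np.random.default_rng(0)
for T in (50, 100, 200):
    ls = np.mean([ls_estimate(simulate_ar1(phi, T, rng)) for _ in range(reps)])
    yw = np.mean([yw_estimate(simulate_ar1(phi, T, rng)) for _ in range(reps)])
    print(f"T={T}: LS bias {ls - phi:+.3f}, Yule-Walker bias {yw - phi:+.3f}")
```

Doubling T roughly halves the bias, and the Yule-Walker estimate is noticeably more biased when the coefficient is close to one, consistent with the caution in the abstract.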


Journal of Bone and Joint Surgery, American Volume | 1996

The Relationship of Developmental Narrowing of the Cervical Spinal Canal to Reversible and Irreversible Injury of the Cervical Spinal Cord in Football Players. An Epidemiological Study

Joseph S. Torg; R. John Naranja; Helene Pavlov; Brian J. Galinat; Russell F. Warren; Robert A. Stine

An evaluation of forty-five athletes who had had an episode of transient neurapraxia of the cervical spinal cord revealed a consistent finding of developmental narrowing of the cervical spinal canal. The purpose of the present epidemiological study was to determine the relationship, if any, between a developmentally narrowed cervical canal and reversible and irreversible injury of the cervical cord with use of various cohorts of football players as well as a large control group. Cohort I comprised college football players who were asymptomatic and had no known history of transient neurapraxia of the cervical cord. Cohort II consisted of professional football players who also were asymptomatic and had no known history of transient neurapraxia of the cervical cord. Cohort III was a group of high-school, college, and professional football players who had had at least one episode of transient neurapraxia of the cervical cord. Cohort IV comprised individuals who were permanently quadriplegic as a result of an injury while playing high-school or college football. Cohort V consisted of a control group of male subjects who were non-athletes and had no history of a major injury of the cervical spine, an episode of transient neurapraxia, or neurological symptoms. The mean and standard deviation of the diameter of the spinal canal, the diameter of the vertebral body, and the ratio of the diameter of the spinal canal to that of the vertebral body were determined for the third through sixth cervical levels on the radiographs for each cohort. In addition, the sensitivity, specificity, and positive predictive value of a ratio of the diameter of the spinal canal to that of the vertebral body of 0.80 or less were evaluated. The findings of the present study demonstrated that a ratio of 0.80 or less had a high sensitivity (93 per cent) for transient neurapraxia. The findings also support the concept that symptoms may result from a transient reversible deformation of the spinal cord in a developmentally narrowed osseous canal. The low positive predictive value of the ratio (0.2 per cent), however, precludes its use as a screening mechanism for determining the suitability of an athlete for participation in contact sports. Developmental narrowing of the cervical canal in a stable spine does not appear to predispose an individual to permanent catastrophic neurological injury and therefore should not preclude an athlete from participation in contact sports.
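
For intuition only: the arithmetic below shows how a rare outcome drives the positive predictive value down even when sensitivity is high. The sensitivity of 0.93 is the reported figure; the specificity and prevalence used here are hypothetical, chosen purely to illustrate the mechanism, not taken from the study.

```python
# Illustrative arithmetic only: why a sensitive test for a rare outcome still
# has a tiny positive predictive value (specificity and prevalence are hypothetical).
def ppv(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

print(f"PPV = {ppv(0.93, 0.80, 0.0005):.4f}")   # about 0.002, i.e. roughly 0.2 per cent
```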


The American Statistician | 1995

Graphical Interpretation of Variance Inflation Factors

Robert A. Stine

Abstract A dynamic graphical display is proposed for uniting partial regression and partial residual plots. This animated display helps students understand multicollinearity and interpret the variance inflation factor. The variance inflation factor is presented as the square of the ratio of t-statistics associated with the partial regression and partial residual plots. Examples using two small data sets illustrate this approach.
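
For reference, the sketch below computes the textbook variance inflation factor, VIF_j = 1/(1 - R_j²), from the auxiliary regression of X_j on the remaining predictors; it is not the animated display described in the article, and the example data are invented.

```python
# Minimal sketch of the standard VIF computation via auxiliary regressions.
import numpy as np

def vif(X):
    """Return the VIF of each column of the design matrix X (no intercept column)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        resid = X[:, j] - others @ beta
        r2 = 1 - resid.var() / X[:, j].var()
        out[j] = 1 / (1 - r2)
    return out

# Two strongly correlated predictors and one independent predictor.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(100)
x2 = x1 + 0.1 * rng.standard_normal(100)
x3 = rng.standard_normal(100)
print(vif(np.column_stack([x1, x2, x3])).round(1))
```

The first two columns share almost all of their variation, so their VIFs are large, while the independent third column has a VIF near one.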


Journal of the American Statistical Association | 1987

Estimating Properties of Autoregressive Forecasts

Robert A. Stine

Abstract Forecasting requires estimates of the error of prediction; however, such estimates for autoregressive forecasts depend nonlinearly on unknown parameters and distributions. Substitution estimators of mean squared error (MSE) possess bias that varies with the underlying model, and Gaussian-based prediction intervals fail if the data are not normally distributed. This article proposes methods that avoid these problems. A second-order Taylor expansion produces an estimator of MSE that is unbiased and leads to accurate prediction intervals for Gaussian data. Bootstrapping also suggests an estimator of MSE, but it is approximately the problematic substitution estimator. Bootstrapping also yields prediction intervals, however, whose coverages are invariant of the sampling distribution and asymptotically approach the nominal content. Parameter estimation increases the error in autoregressive forecasts. This additional error inflates one-step prediction mean squared error (PMSE) by a factor of 1 + p/T, wh...
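
A small worked example of the inflation factor quoted above, 1 + p/T, using illustrative model orders and series lengths that are not taken from the article:

```python
# Worked arithmetic: parameter estimation inflates one-step prediction mean
# squared error by roughly 1 + p/T (illustrative p and T values).
for p in (1, 4):
    for T in (50, 200):
        print(f"AR({p}), T={T}: one-step PMSE inflated by about {1 + p / T:.3f}")
```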


Journal of the American Statistical Association | 2004

Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy

Dean P. Foster; Robert A. Stine

We predict the onset of personal bankruptcy using least squares regression. Although personal bankruptcy is well publicized, it is rare: only 2,244 bankruptcies occur in our dataset of 2.9 million months of credit-card activity. We use stepwise selection to find predictors of these from a mix of payment history, debt load, demographics, and their interactions. This combination of rare responses and over 67,000 possible predictors leads to a challenging modeling question: How does one separate coincidental from useful predictors? We show that three modifications turn stepwise regression into an effective methodology for predicting bankruptcy. Our version of stepwise regression (1) organizes calculations to accommodate interactions, (2) exploits modern decision theoretic criteria to choose predictors, and (3) conservatively estimates p-values to handle sparse data and a binary response. Omitting any one of these leads to poor performance. A final step in our procedure calibrates regression predictions. With these modifications, stepwise regression predicts bankruptcy as well as, if not better than, recently developed data-mining tools. When sorted, the largest 14,000 resulting predictions hold 1,000 of the 1,800 bankruptcies hidden in a validation sample of 2.3 million observations. If the cost of missing a bankruptcy is 200 times that of a false positive, our predictions incur less than 2/3 of the costs of classification errors produced by the tree-based classifier C4.5.
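
The sketch below is a hedged illustration, not the authors' code: a forward stepwise selector that only admits a predictor when its p-value clears a Bonferroni-style threshold of alpha/m, one conservative way to guard against coincidental predictors when the candidate pool (main effects plus interactions) is very large. The alpha value and the candidate pool are assumptions made for the example.

```python
# Hedged sketch: forward stepwise selection with a conservative p-value threshold.
import numpy as np
from scipy import stats

def forward_stepwise(X, y, alpha=0.05):
    n, m = X.shape
    selected, remaining = [], list(range(m))
    while remaining:
        best_j, best_p = None, 1.0
        for j in remaining:
            cols = selected + [j]
            Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
            beta, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
            resid = y - Z @ beta
            df = n - Z.shape[1]
            sigma2 = resid @ resid / df
            cov = sigma2 * np.linalg.inv(Z.T @ Z)
            t = beta[-1] / np.sqrt(cov[-1, -1])          # t-stat of the candidate
            p_val = 2 * stats.t.sf(abs(t), df)
            if p_val < best_p:
                best_j, best_p = j, p_val
        if best_p < alpha / m:        # conservative, Bonferroni-style cut-off
            selected.append(best_j)
            remaining.remove(best_j)
        else:
            break
    return selected
```

The alpha/m cut-off becomes very strict as the number of candidate predictors grows, which is the basic idea behind conservatively estimated p-values in a search over tens of thousands of terms.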


Journal of the American Statistical Association | 1985

Bootstrap Prediction Intervals for Regression

Robert A. Stine

Abstract Bootstrap prediction intervals provide a nonparametric measure of the probable error of forecasts from a standard linear regression model. These intervals approximate the nominal probability content in small samples without requiring specific assumptions about the sampling distribution. Empirical measures of the prediction error rate motivate the choice of these intervals, which are calculated by an application of the bootstrap. The intervals are contrasted to other nonparametric procedures in several Monte Carlo experiments. Asymptotic invariance properties are also investigated.
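
One common construction of such an interval is sketched below: refit the model to residual-resampled data and add a resampled error to each bootstrap prediction. It is an illustration under those assumptions rather than the article's exact procedure, and the example data are synthetic.

```python
# Minimal sketch: a percentile-style bootstrap prediction interval for regression.
import numpy as np

def bootstrap_prediction_interval(X, y, x0, level=0.90, B=2000, seed=0):
    rng = np.random.default_rng(seed)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    resid = resid - resid.mean()      # centre residuals (harmless with an intercept)
    n = len(y)
    preds = np.empty(B)
    for b in range(B):
        y_star = X @ beta_hat + rng.choice(resid, size=n, replace=True)
        beta_star = np.linalg.lstsq(X, y_star, rcond=None)[0]
        preds[b] = x0 @ beta_star + rng.choice(resid)   # add a future error draw
    lo, hi = np.quantile(preds, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

# Example: interval for a new observation at x = 2.5 in a simple linear model.
rng = np.random.default_rng(2)
x = rng.uniform(0, 5, 100)
X = np.column_stack([np.ones(100), x])
y = 2.0 + 1.5 * x + rng.standard_normal(100)
print(bootstrap_prediction_interval(X, y, np.array([1.0, 2.5])))
```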

Collaboration


Dive into Robert A. Stine's collaborations.

Top Co-Authors

Dean P. Foster, University of Pennsylvania
Kory D. Johnson, University of Pennsylvania
Andreas Buja, University of Pennsylvania
Lawrence D. Brown, University of Pennsylvania
Lyle H. Ungar, University of Pennsylvania
Paul Shaman, University of Pennsylvania
Helene Pavlov, Hospital for Special Surgery