David J. Olive
Southern Illinois University Carbondale
Publications
Featured research published by David J. Olive.
Journal of the American Statistical Association | 2002
Douglas M. Hawkins; David J. Olive
Because high-breakdown estimators (HBEs) are impractical to compute exactly in large samples, approximate algorithms are used. The algorithm generally produces an estimator with a lower consistency rate and breakdown value than the exact theoretical estimator. This discrepancy grows with the sample size, with the implication that huge computations are needed for good approximations in large high-dimension samples. The workhorse for HBEs has been the “elemental set,” or “basic resampling,” algorithm. This turns out to be completely ineffective in high dimensions with high levels of contamination. However, enriching it with a “concentration” step turns it into a method that can handle even high levels of contamination, provided that the regression outliers are located on random cases. It remains ineffective if the regression outliers are concentrated on high-leverage cases. We focus on the multiple regression problem, but several of the broad conclusions—notably, those of the inadequacy of fixed numbers of elemental starts—are also relevant to multivariate location and dispersion estimation. We introduce a new algorithm—the “X-cluster” method—for large high-dimensional multiple regression datasets that are beyond the reach of standard resampling methods. This algorithm departs sharply from current HBE algorithms in that, even at a constant percentage of contamination, it is more effective the larger the sample, making a compelling case for using it in the large-sample situations that current methods serve poorly. A multipronged analysis using both traditional ordinary least squares and L1 methods along with newer resistant techniques will often detect departures from the multiple regression model that cannot be detected by any single estimator.
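The elemental-set-plus-concentration idea can be sketched in a few lines of numpy: draw a small "elemental" subset, fit it exactly, then repeatedly refit OLS on the half of the cases with the smallest squared residuals (an LTS-style criterion refined by concentration steps). The start counts, step counts, coverage, and toy data below are illustrative choices, not the paper's actual algorithm or tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

def concentration_fit(X, y, n_starts=20, n_csteps=10, coverage=None):
    """Elemental starts refined by concentration (C-) steps for an
    LTS-type criterion: repeatedly refit OLS on the `coverage` cases
    with the smallest squared residuals, keeping the best fit found."""
    n, p = X.shape
    h = coverage or (n + p + 1) // 2              # default coverage ~ n/2
    best_b, best_crit = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)   # elemental set
        b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        for _ in range(n_csteps):                    # concentration steps
            r2 = (y - X @ b) ** 2
            keep = np.argsort(r2)[:h]
            b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        crit = np.sort((y - X @ b) ** 2)[:h].sum()   # LTS criterion
        if crit < best_crit:
            best_crit, best_b = crit, b
    return best_b

# toy data: clean linear trend plus a cluster of shifted outliers
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 2 + 3 * X[:, 1] + rng.normal(scale=0.5, size=n)
y[:10] += 25                                   # contaminate 10% of cases
b = concentration_fit(X, y)
```

The concentration steps are what rescue a bad elemental start: even a start that fits some outliers tends to drift toward the clean majority after a few refits on the lowest-residual half.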
Computational Statistics & Data Analysis | 2004
David J. Olive
This paper presents a simple resistant estimator of multivariate location and dispersion. The DD plot is a plot of Mahalanobis distances from the classical estimator versus the distances from a resistant estimator and can be used to detect outliers and as a diagnostic for multivariate normality. The new estimator can be used in the DD plot, is easy to compute and provides insights about several useful robust algorithm techniques.
Technometrics | 2002
David J. Olive
The DD plot is a plot of classical vs. robust Mahalanobis distances: MDi vs. RDi. The DD plot can be used as a diagnostic for multivariate normality and elliptical symmetry, and to assess the success of numerical transformations towards elliptical symmetry. In the regression context, many procedures can be adversely affected if strong nonlinearities are present in the predictors. Even if strong nonlinearities are present, the robust distances can be used to help visualize important regression models such as generalized linear models.
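A minimal numpy sketch of the two distance sets behind a DD plot: MDi from the classical mean and covariance, and RDi from a crude concentration-type resistant estimator that stands in here for the paper's estimator (the start, coverage, and iteration count are illustrative assumptions). One would then scatter md against rd; the actual plotting call is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X."""
    d = X - center
    return np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)

def classical_dist(X):
    """MDi: distances from the classical mean and covariance."""
    return np.sqrt(mahalanobis_sq(X, X.mean(0), np.cov(X, rowvar=False)))

def resistant_dist(X):
    """RDi from a crude resistant estimator: start at the coordinatewise
    median, then concentrate on the half with the smallest distances."""
    n, p = X.shape
    center, cov = np.median(X, axis=0), np.cov(X, rowvar=False)
    for _ in range(5):
        d2 = mahalanobis_sq(X, center, cov)
        keep = np.argsort(d2)[: (n + p + 1) // 2]
        center, cov = X[keep].mean(0), np.cov(X[keep], rowvar=False)
    return np.sqrt(mahalanobis_sq(X, center, cov))

# roughly multivariate normal data with a few gross outliers
X = rng.normal(size=(200, 3))
X[:8] += 8.0
md, rd = classical_dist(X), resistant_dist(X)
# in a DD plot one scatters md (x-axis) against rd (y-axis); for clean
# normal data the points hug the identity line, while outliers show
# large rd but only moderate md
```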
Technometrics | 2005
David J. Olive; Douglas M. Hawkins
Variable selection, the search for j relevant predictor variables from a group of p candidates, is a standard problem in regression analysis. The class of 1D regression models is a broad class that includes generalized linear models. We show that existing variable selection algorithms, originally meant for multiple linear regression and based on ordinary least squares and Mallows's Cp, can also be used for 1D models. Graphical aids for variable selection are also provided.
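For a small candidate pool, all-subsets selection with Mallows's Cp can be written directly; this numpy sketch uses the usual Cp = SSE_S/sigma2_full + 2k - n form on toy multiple linear regression data (the data and the brute-force enumeration are illustrative, not the paper's procedure for 1D models).

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

def cp_all_subsets(X, y):
    """Mallows's Cp = SSE_S / sigma2_full + 2k - n for every subset S of
    predictor columns (intercept always included); smaller is better."""
    n, p = X.shape
    Xf = np.column_stack([np.ones(n), X])
    resid = y - Xf @ np.linalg.lstsq(Xf, y, rcond=None)[0]
    sigma2 = resid @ resid / (n - p - 1)     # full-model error variance
    results = {}
    for r in range(p + 1):
        for S in combinations(range(p), r):
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in S])
            rs = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
            k = r + 1                        # parameters incl. intercept
            results[S] = rs @ rs / sigma2 + 2 * k - n
    return results

# y depends only on the first two of four candidate predictors
X = rng.normal(size=(80, 4))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.7, size=80)
cp = cp_all_subsets(X, y)
best = min(cp, key=cp.get)                   # subset with smallest Cp
```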
Technometrics | 2001
R. Dennis Cook; David J. Olive
A new graphical method for assessing parametric transformations of the response in linear regression is given. We simply regress the response variable Y on the predictors, find the fitted values, and then dynamically plot the transformed response Y^λ against those fitted values by varying the transformation parameter λ until the plot is linear. The method can also be used to assess the success of numerical response transformation methods and to discover influential observations. Modifications using robust estimators can be used as well.
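The dynamic plot can be mimicked numerically: fix the fitted values from regressing the untransformed Y on the predictors, then for each candidate λ on a Box-Cox-style grid score how far the plot of Y^λ versus those fitted values departs from a straight line. The curvature score below (R² gain from adding a quadratic term) is an assumed stand-in for eyeballing linearity; the data and grid are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# data whose response is linear on the log scale (so lambda = 0 is best)
n = 200
x = rng.normal(size=n)
y = np.exp(1 + 0.5 * x + rng.normal(scale=0.1, size=n))

# fitted values from regressing the *untransformed* response on x
Xd = np.column_stack([np.ones(n), x])
w = Xd @ np.linalg.lstsq(Xd, y, rcond=None)[0]

def curvature(t, w):
    """R^2 gain from adding w^2 when regressing t on w: a numerical
    stand-in for judging whether the transformation plot is linear."""
    def sse(M):
        return np.sum((t - M @ np.linalg.lstsq(M, t, rcond=None)[0]) ** 2)
    lin = np.column_stack([np.ones(len(w)), w])
    quad = np.column_stack([lin, w ** 2])
    return (sse(lin) - sse(quad)) / sse(lin)

def boxcox(y, lam):
    """Power transform with lambda = 0 meaning the log transform."""
    return np.log(y) if lam == 0 else y ** lam

lambdas = [-1.0, -0.5, 0.0, 0.5, 1.0]
scores = {lam: curvature(boxcox(y, lam), w) for lam in lambdas}
best_lam = min(scores, key=scores.get)       # flattest (most linear) plot
```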
Statistics & Probability Letters | 2003
David J. Olive; Douglas M. Hawkins
An important parameter for several high breakdown regression algorithm estimators is the number of cases given weight one, called the coverage of the estimator. Increasing the coverage is believed to result in a more stable estimator, but the price paid for this stability is greatly decreased resistance to outliers. A simple modification of the algorithm can greatly increase the coverage and hence its statistical performance while maintaining high outlier resistance.
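One standard way to raise coverage without losing resistance, sketched here as an assumed illustration rather than the paper's specific modification, is a reweighting step: keep (give weight one to) every case whose residual from an initial resistant fit is within a few robust standard deviations, then refit OLS on those cases. For illustration the true coefficients stand in for the initial resistant fit.

```python
import numpy as np

rng = np.random.default_rng(4)

def reweight(X, y, b_init, c=2.5):
    """One reweighting step: give weight one to every case whose absolute
    residual from the initial resistant fit b_init is within c robust
    standard deviations (MAD-based scale), then refit OLS on those cases.
    Returns the refit and the coverage (number of weight-one cases)."""
    r = y - X @ b_init
    s = 1.4826 * np.median(np.abs(r))   # MAD-based residual scale
    w = np.abs(r) <= c * s              # weight-one cases
    b, *_ = np.linalg.lstsq(X[w], y[w], rcond=None)
    return b, int(w.sum())

# toy data with 10% shifted outliers; the true coefficients stand in
# for an initial low-coverage resistant fit (an illustrative assumption)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1 + 2 * X[:, 1] + rng.normal(scale=0.3, size=n)
y[:10] += 20
b0 = np.array([1.0, 2.0])
b, coverage = reweight(X, y, b0)
```

The coverage ends up near n minus the number of outliers: nearly all clean cases get weight one, while the shifted cases stay excluded.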
Archive | 2004
David J. Olive
Regression is the study of the conditional distribution of the response y given the predictors x. In a 1D regression, y is independent of x given a single linear combination β^T x of the predictors. Special cases of 1D regression include multiple linear regression, binary regression and generalized linear models. If a good estimate b of some non-zero multiple cβ of β can be constructed, then the 1D regression can be visualized with a scatterplot of b^T x versus y. A resistant method for estimating cβ is presented along with applications.
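The visualization idea can be checked on simulated single index data: with roughly normal predictors, even plain OLS estimates cβ up to the unknown constant c, so the scatterplot of b^T x versus y recovers the 1D structure (OLS here is a simple stand-in, not the paper's resistant estimator; the cubic link and dimensions are illustrative).

```python
import numpy as np

rng = np.random.default_rng(5)

# single index data: y depends on x only through beta^T x
n, p = 500, 4
beta = np.array([1.0, -2.0, 0.5, 0.0])
X = rng.normal(size=(n, p))
sp = X @ beta                              # true sufficient predictor
y = sp ** 3 + rng.normal(scale=1.0, size=n)

# OLS estimates c * beta up to the unknown constant c
Xd = np.column_stack([np.ones(n), X])
bhat = np.linalg.lstsq(Xd, y, rcond=None)[0][1:]

# the 1D regression is visualized by scattering bhat^T x against y;
# check that the estimated index essentially matches the true one
esp = X @ bhat
r = np.corrcoef(esp, sp)[0, 1]
```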
Communications in Statistics-theory and Methods | 2010
Jing Chang; David J. Olive
In a 1D regression, the response variable is independent of the predictors given a single linear combination of the predictors. Theory for ordinary least squares (OLS) is reviewed, and it is shown that much of the OLS output originally meant for multiple linear regression is still relevant for a much wider class of regression models including single index models. Ellipsoidal trimming can be combined with OLS to create outlier resistant methods.
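Ellipsoidal trimming combined with OLS can be sketched as: score each case by a robust distance in predictor space, drop the cases with the largest distances, then run OLS on the rest. The coordinatewise median/MAD distance below is a crude assumed stand-in for a proper robust Mahalanobis distance, and the trimming fraction and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

def trimmed_ols(X, y, trim=0.1):
    """Ellipsoidal trimming sketch: score each case by a crude robust
    distance in predictor space (coordinatewise median/MAD here), drop
    the `trim` fraction with the largest distances, then run OLS."""
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    d2 = (((X - med) / mad) ** 2).sum(axis=1)
    keep = np.argsort(d2)[: int(len(X) * (1 - trim))]
    Xd = np.column_stack([np.ones(len(keep)), X[keep]])
    b, *_ = np.linalg.lstsq(Xd, y[keep], rcond=None)
    return b

# clean linear data plus 5% high-leverage cases with corrupted responses
n = 200
X = rng.normal(size=(n, 2))
y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=n)
X[:10] += 10.0                             # push 10 cases to high leverage
y[:10] = 0.0                               # ... with wrong responses
b = trimmed_ols(X, y, trim=0.1)            # [intercept, b1, b2]
```

Because the trimming looks only at the predictors, it removes the high-leverage cases before they can tilt the OLS fit.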
Communications in Statistics-theory and Methods | 2013
David J. Olive
Several useful plots for generalized linear models (GLMs) can be applied to generalized additive models (GAMs) with little modification. A plot for a GLM using the estimated sufficient predictor (ESP) can be extended to a GAM by replacing the ESP with the estimated additive predictor (EAP). The residual plot, response plot, and transformation plots are examples. Since a GLM is a special case of a GAM, a plot of EAP versus ESP is useful for checking goodness of fit of the GLM.
Statistics & Probability Letters | 2001
David J. Olive
Two high breakdown estimators that are asymptotically equivalent to a sequence of trimmed means are introduced. They are easy to compute and their asymptotic variance is easier to estimate than the asymptotic variance of standard high breakdown estimators.
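The paper's two estimators are not reproduced here, but the flavor of an easy-to-compute high breakdown location estimator that behaves like a trimmed mean can be sketched as a metrically trimmed mean: average the observations within a fixed number of MADs of the median. The tuning constant below is an illustrative assumption, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(7)

def metrically_trimmed_mean(x, k=5.45):
    """Mean of the observations within k MADs of the median: a high
    breakdown location estimator that behaves like a trimmed mean.
    (k = 5.45 is an illustrative choice, not the paper's constant.)"""
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))
    keep = np.abs(x - med) <= k * mad
    return x[keep].mean()

# standard normal sample with gross outliers in one tail
x = rng.normal(size=300)
x[:20] += 50.0
m = metrically_trimmed_mean(x)             # close to 0 despite the outliers
```

Because both the center (median) and the scale (MAD) of the trimming rule have breakdown value 1/2, the estimator tolerates large contaminated fractions while remaining a one-pass computation.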
Collaboration
Hasthika S. Rupasinghe Arachchige Don
Southern Illinois University Carbondale