Kenneth N. Berk
Illinois State University
Publication
Featured research published by Kenneth N. Berk.
Technometrics | 1978
Kenneth N. Berk
Although it is generally felt that the all-subsets, or “best” subset, approach is better than forward selection and backward elimination, the sequential procedures are still widely used. To see what advantage there is in doing all-subsets, this paper gives both theoretical and empirical comparisons. It is shown that the difference in favor of all-subsets can be arbitrarily large in examples where there are predictors which do poorly alone but do very well together. Also, empirical comparisons on nine data sets show big differences favoring all-subsets when the differences are measured on the data (sample values). However, fairer comparisons based on known population values show very small differences favoring all-subsets. The only exception is the one data set which has predictors which do well together but poorly alone.
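The “do poorly alone but do very well together” case can be sketched numerically. The data and numbers below are invented for illustration, not taken from the paper: the response depends on the difference of two strongly correlated predictors, so each predictor alone explains little.

```python
import numpy as np

# Hypothetical illustration (invented data, not from the article): y depends
# on x1 - x2, while x1 and x2 share a dominant common component, so each
# predictor is nearly useless by itself but the pair is excellent.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                  # shared component
e1 = 0.3 * rng.normal(size=n)
e2 = 0.3 * rng.normal(size=n)
x1, x2 = z + e1, z + e2                 # corr(x1, x2) is high
y = (x1 - x2) + 0.1 * rng.normal(size=n)

def r_squared(X, y):
    """R^2 of a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

r2_x1 = r_squared(x1[:, None], y)                   # weak alone
r2_x2 = r_squared(x2[:, None], y)                   # weak alone
r2_both = r_squared(np.column_stack([x1, x2]), y)   # strong together
```

One-at-a-time scoring sees only the weak marginal fits, while evaluating the pair directly, as all-subsets does, reveals the strong joint fit.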
Journal of the American Statistical Association | 1977
Kenneth N. Berk
Abstract Many regression programs include a tolerance test that does not allow a variable to enter the regression if its correlation with the previously entered variables exceeds a specified level. This is done to achieve computational stability by assuring that the correlation matrix C of the independent variables is not nearly singular. However, for any specified tolerance level, there is an example in which the entering variables pass the tolerance test but the computation is extremely unstable. A bound for the condition of C is p times the trace of C⁻¹, which can be monitored instead of tolerance to assure stability.
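The kind of near-singularity at issue, and the trace-based bound, can be sketched as follows. The construction and numbers are illustrative assumptions, not the paper's example: all pairwise correlations stay modest, yet C is almost singular, and p times the trace of C⁻¹ flags it.

```python
import numpy as np

# Illustrative sketch (not the article's construction): three predictors
# whose pairwise correlations are modest, but whose correlation matrix C
# is nearly singular because x3 is almost a linear combination of x1, x2.
rng = np.random.default_rng(1)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = (x1 + x2) / np.sqrt(2) + 0.01 * rng.normal(size=n)

X = np.column_stack([x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize
C = (X.T @ X) / n                          # correlation matrix
p = C.shape[0]

eig = np.linalg.eigvalsh(C)
condition = eig[-1] / eig[0]               # ratio of extreme eigenvalues
# For a correlation matrix, lambda_max <= trace(C) = p and
# trace(C^-1) >= 1/lambda_min, so this always bounds the condition:
bound = p * np.trace(np.linalg.inv(C))

max_pairwise_r = max(abs(C[0, 1]), abs(C[0, 2]), abs(C[1, 2]))
```

The pairwise correlations here are near 0.7, far from any pairwise danger level, while the condition number is in the tens of thousands; the bound is cheap to monitor and never understates the condition.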
Journal of Quality Technology | 1991
Kenneth N. Berk; Richard R. Picard
Experimental designs used in industry often allow no degrees of freedom for the estimation of error. Nevertheless, analysis of variance results can (if used properly) be used to determine which factors are significant. We give a back-of-the-envelope cal..
Technometrics | 1995
Kenneth N. Berk; David E. Booth
Start with a multiple regression in which each predictor enters linearly. How can we tell if there is a curve so that the model is not valid? Possibly for one of the predictors an additional square or square-root term is needed. We focus on the case in which an additional term is needed rather than the monotonic case in which a power transformation or logarithm might be sufficient. Among the plots that have been used for diagnostic purposes, nine methods are applied here. All nine methods work fine when the predictors are not related to each other, but two of them are designed to work even when the predictors are arbitrary noisy functions of each other. These two are recent methods, Cook's CERES plot and the plot for an additive model with nonparametric smoothing applied to one predictor. Even these plots, however, can miss a curve in some cases and show a false curve in others. To give a measure of curve detection, the curve can be fitted nonparametrically, and this fit can be used in place of the predic...
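The missing-curve situation the diagnostics target can be mimicked numerically. This is a hypothetical sketch, not one of the nine plotting methods: when the true relation is curved, adding a squared term sharply reduces the residual sum of squares of the linear fit.

```python
import numpy as np

# Hypothetical sketch (invented data): the true relation has a quadratic
# component, so the purely linear fit leaves large residual error that
# the additional squared term removes.
rng = np.random.default_rng(4)
n = 200
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.5 * x + 0.8 * x**2 + 0.3 * rng.normal(size=n)

def fit_rss(cols, y):
    """Residual sum of squares of a least-squares fit on the given columns."""
    A = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r

ones = np.ones(n)
rss_linear = fit_rss([ones, x], y)        # linear term only
rss_quad = fit_rss([ones, x, x**2], y)    # with the squared term
# a large drop in RSS signals the curve the linear model missed
```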
Technometrics | 1984
Kenneth N. Berk
The best way to validate the predictive ability of a statistical model is to apply it to new data. This article compares eight ways of forming regression models from old data and then validating them with fresh data. One goal is to study which methods work well as a function of the type of data; to some extent one can tell which methods will work well by looking at the data. Another goal is to study the quality of prediction when the regression is applied to new data. Prediction quality is determined in large part by the distance of the new data from the old.
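The validation principle, fit on old data and judge on new data, can be sketched in a few lines. The setup and numbers are invented, not the article's eight methods:

```python
import numpy as np

# Illustrative sketch (invented data): fit a regression on "old" data and
# measure prediction error on fresh "new" data, which gives an honest
# estimate, unlike the optimistic in-sample error.
rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 3))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)   # noise variance 1

X_old, y_old = X[:150], y[:150]          # old (model-forming) data
X_new, y_new = X[150:], y[150:]          # new (validation) data

A = np.column_stack([np.ones(150), X_old])
beta, *_ = np.linalg.lstsq(A, y_old, rcond=None)

pred = np.column_stack([np.ones(150), X_new]) @ beta
mse_new = np.mean((y_new - pred) ** 2)      # honest prediction error
mse_old = np.mean((y_old - A @ beta) ** 2)  # optimistic in-sample error
```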
The American Statistician | 1987
Kenneth N. Berk
Abstract This article focuses on important aspects of microcomputer statistical software. These include documentation, control language, data entry, data listing and editing, data manipulation, graphics, statistical procedures, output, customizing, system environment, and support. The primary concern is that a package encourage good statistical practice.
Journal of Statistical Computation and Simulation | 1980
Kenneth N. Berk
For stepwise regression and discriminant analysis, the parameters F_in and F_out govern the inclusion and deletion of variables: the candidate variable with the largest F-ratio is included if it exceeds F_in, and the included variable with the smallest F-ratio is deleted if it falls below F_out. If F_in ≥ F_out, then a return to a previous subset size implies improvement in the criterion measure. This result also holds for a generalization, stepwise multivariate analysis, which includes stepwise regression and discriminant analysis as special cases. Eliminations do not occur if forward selection and backward elimination yield the same sequence of subsets; conversely, there is a more liberal stepping rule which always eliminates if the two sequences differ.
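The entry step of such a rule can be sketched as follows. This is an illustrative partial-F computation assuming ordinary least squares, not the paper's algorithm:

```python
import numpy as np

# Illustrative sketch (not the paper's code): the F-to-enter statistic for
# adding one candidate column to an already-included set of predictors.
def rss(Xsub, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    if Xsub.shape[1]:
        A = np.column_stack([np.ones(len(y)), Xsub])
    else:
        A = np.ones((len(y), 1))        # intercept-only model
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r

def f_to_enter(included, j, X, y):
    """Partial F statistic for adding column j to the included set."""
    n = len(y)
    rss0 = rss(X[:, included], y)
    rss1 = rss(X[:, included + [j]], y)
    df = n - len(included) - 2          # intercept plus the new term
    return (rss0 - rss1) / (rss1 / df)
```

A forward step then enters the candidate with the largest F-to-enter when that maximum exceeds F_in; a deletion step removes the included variable with the smallest analogous F-to-remove when it falls below F_out.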
Journal of the American Statistical Association | 1978
Kenneth N. Berk; Ivor Francis
Abstract SPSS and BMDP have much in common, but they have contrasting emphases. The SPSS manual is intended for an unsophisticated audience. It has low-level statistical explanations and carefully written directions for running the programs, but not much about computational procedures. In contrast, the BMDP manual is more sophisticated, with not much statistical explanation, brief explanation of the control language, and substantial discussion of algorithms. We summarize our review with a listing of qualities which we consider important in a manual and our ratings for SPSS and BMDP.
Archive | 2011
Jay Devore; Kenneth N. Berk
Chapters 8 and 9 presented confidence intervals (CIs) and hypothesis testing procedures for a single mean μ, a single proportion p, and a single variance σ². Here we extend these methods to situations involving the means, proportions, and variances of two different population distributions.
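A minimal sketch of the two-sample setting, with hypothetical data and the usual large-sample normal approximation: a confidence interval for mu1 - mu2 from two independent samples.

```python
import math
import statistics

# Sketch of the two-sample interval (hypothetical data below; z = 1.96
# gives the usual large-sample 95% interval).
def two_sample_ci(x, y, z=1.96):
    """Approximate CI for mu_x - mu_y from independent samples x and y."""
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return diff - z * se, diff + z * se

# Invented measurements from two processes:
lo, hi = two_sample_ci([10, 12, 14, 11, 13], [9, 11, 13, 10, 12])
# the interval straddles zero here, so the data do not establish a difference
```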
Archive | 2011
Jay Devore; Kenneth N. Berk
The general objective of a regression analysis is to determine the relationship between two (or more) variables so that we can gain information about one of them through knowing values of the other(s). Much of mathematics is devoted to studying variables that are deterministically related. Saying that x and y are related in this manner means that once we are told the value of x, the value of y is completely specified.