Shelby J. Haberman
University of Chicago
Publications
Featured research published by Shelby J. Haberman.
Journal of the American Statistical Association | 1982
Shelby J. Haberman
Abstract In analogy to analysis of variance for linear models, analyses of dispersion for loglinear models for multinomial responses are constructed. The analyses, which are based on the entropy and concentration measures, are used to construct tests of independence and measures of association.
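The sketch below illustrates the two dispersion measures named in the abstract, entropy and concentration (the Gini measure), and the associated measures of association for a two-way table: the uncertainty coefficient for entropy and the Goodman–Kruskal tau for concentration. The contingency table is hypothetical, and the code illustrates the measures rather than the paper's full analysis-of-dispersion machinery.

```python
# Minimal sketch: entropy and concentration (Gini) dispersion for a
# multinomial response, plus the proportional-reduction-in-dispersion
# measures of association they induce for a two-way table.
import numpy as np

def entropy(p):
    """Shannon entropy -sum p log p, ignoring zero cells."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def concentration(p):
    """Gini concentration 1 - sum p^2."""
    return 1.0 - np.sum(p ** 2)

def association(table, dispersion):
    """(D(Y) - sum_x p(x) D(Y|x)) / D(Y): reduction in dispersion of
    the column variable given the row variable."""
    p = table / table.sum()
    p_y = p.sum(axis=0)          # marginal distribution of Y
    p_x = p.sum(axis=1)          # marginal distribution of X
    cond = p / p_x[:, None]      # conditional distributions of Y given X=x
    within = np.sum(p_x * np.array([dispersion(row) for row in cond]))
    total = dispersion(p_y)
    return (total - within) / total

# Hypothetical 3x3 cross-classification (counts).
table = np.array([[30.0, 10, 5],
                  [10, 40, 10],
                  [5, 10, 30]])
print("uncertainty coefficient:", association(table, entropy))
print("Goodman-Kruskal tau:    ", association(table, concentration))
```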
Archive | 1977
E. H. Uhlenhuth; Shelby J. Haberman; Michael D. Balter; Ronald S. Lipman
Most modern quantitative investigations of life stress depend upon cataloging the life experiences of individual subjects retrospectively over periods ranging from months to years (Dohrenwend & Dohrenwend, 1974; Gunderson & Rahe, 1974). Although the unreliability of memory for important events in other contexts is well known (Anderson & Anderson, 1967; Haggard, Brekstad & Skard, 1960; Mechanic & Newton, 1965; Parry, Balter & Cisin, 1970–71; US National Health Survey, 1961, 1963; Wenar & Coulter, 1962), little attention has been paid to the issue in relation to studies of life stress (Casey, Masuda, & Holmes, 1967). This paper reports decrements in recall over an 18-month period for stressful events in general, 41 individual events, and various subgroups of events.
Multivariate Behavioral Research | 2010
Sandip Sinharay; Gautam Puhan; Shelby J. Haberman
Diagnostic scores are of increasing interest in educational testing due to their potential remedial and instructional benefit. Naturally, the number of educational tests that report diagnostic scores is on the rise, as is the number of research publications on such scores. This article provides a critical evaluation of diagnostic score reporting in educational testing. The existing methods for diagnostic score reporting are discussed. A recent method (Haberman, 2008a) that examines whether diagnostic scores are worth reporting is reviewed. It is demonstrated, using results from operational and simulated data, that diagnostic scores have to be based on a sufficient number of items and have to be sufficiently distinct from each other to be worth reporting, and that several operationally reported subscores are actually not worth reporting. Several recommendations are made for those interested in reporting diagnostic scores for educational tests.
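A minimal sketch of the added-value check of the kind reviewed above: under classical test theory, a subscore is worth reporting if it predicts its own true score better than the total score does, i.e. if its proportional reduction in mean squared error (PRMSE) exceeds that of the total. The input quantities here are illustrative, and independent measurement errors are assumed.

```python
# Haberman-style added-value check (sketch, hypothetical inputs).

def prmse_subscore(rel_s):
    """PRMSE of the observed subscore as a predictor of the true
    subscore equals the subscore reliability."""
    return rel_s

def prmse_total(rel_s, var_s, var_x, cov_sx):
    """PRMSE of the observed total score as a predictor of the true
    subscore, assuming the subscore's error is part of the total and
    errors are independent."""
    var_true = rel_s * var_s                    # variance of true subscore
    cov_true = cov_sx - (1.0 - rel_s) * var_s   # cov(true subscore, total)
    return cov_true ** 2 / (var_true * var_x)

# Hypothetical values: a short, not very reliable subscore.
rel_s, var_s, var_x, cov_sx = 0.70, 16.0, 100.0, 30.0
ps = prmse_subscore(rel_s)
px = prmse_total(rel_s, var_s, var_x, cov_sx)
print(f"PRMSE(subscore) = {ps:.3f}, PRMSE(total) = {px:.3f}")
print("subscore adds value" if ps > px else "report the total score only")
```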
Psychometrika | 2013
Yi-Hsuan Lee; Shelby J. Haberman
Monitoring a very frequently administered educational test with a relatively short history of stable operation poses a number of challenges. Test scores usually vary by season, and the frequency of administration of such educational tests is also seasonal. Although it is important to react to unreasonable changes in the distributions of test scores in a timely fashion, it is not a simple matter to ascertain what sort of distribution is really unusual. Many commonly used approaches for seasonal adjustment are designed for time series with evenly spaced observations that span many years and, therefore, are inappropriate for data from such educational tests. Harmonic regression, a seasonal-adjustment method, can be useful in monitoring scale stability when the number of years available is limited and when the observations are unevenly spaced. Additional forms of adjustments can be included to account for variability in test scores due to different sources of population variations. To illustrate, real data are considered from an international language assessment.
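A minimal sketch of harmonic regression for unevenly spaced administrations, in the spirit of the approach described above: administration-level mean scores are regressed on a linear trend plus annual sine and cosine terms, and the fitted seasonal component is subtracted out. The data are simulated, and a single annual harmonic is assumed.

```python
# Harmonic regression on unevenly spaced test-score means (sketch).
import numpy as np

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 3.0, size=60))   # admin dates in years, uneven
mean_score = (500 + 2.0 * t + 6.0 * np.sin(2 * np.pi * t)
              + rng.normal(0, 2, t.size))

# Design matrix: intercept, trend, annual sine and cosine terms.
X = np.column_stack([np.ones_like(t), t,
                     np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
beta, *_ = np.linalg.lstsq(X, mean_score, rcond=None)

seasonal = X[:, 2:] @ beta[2:]        # fitted seasonal component
adjusted = mean_score - seasonal      # seasonally adjusted means
print("estimated coefficients:", np.round(beta, 2))
print(f"sd before: {mean_score.std():.2f}, after: {adjusted.std():.2f}")
```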
Journal of Educational and Behavioral Statistics | 2010
Shelby J. Haberman; Sandip Sinharay
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. The performances of the two models were compared on a wide variety of data sets. The cumulative logit model appears to have performed somewhat better than the linear regression model.
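A minimal sketch of fitting a cumulative logit (proportional-odds) model to ordinal essay scores, as in the comparison above. The features and scores are simulated, and statsmodels' OrderedModel (available in statsmodels >= 0.12) is assumed; the paper's actual features and fitting procedure may differ.

```python
# Cumulative logit model for ordinal essay scores (sketch).
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 500
features = rng.normal(size=(n, 3))   # e.g. length, vocabulary, grammar
latent = features @ np.array([1.0, 0.8, 0.5]) + rng.logistic(size=n)
scores = np.digitize(latent, bins=[-2, -0.5, 0.5, 2])   # ordinal scores 0..4

model = OrderedModel(scores, features, distr="logit")
result = model.fit(method="bfgs", disp=False)

# Predicted category probabilities; the score is the most likely category.
probs = np.asarray(result.predict(features))
predicted = probs.argmax(axis=1)
print("agreement with observed scores:", np.mean(predicted == scores))
```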
Psychometrika | 2013
Shelby J. Haberman; Sandip Sinharay; Kyong Hee Chon
Residual analysis (e.g., Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method for assessing the fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large-sample distribution of the residual is proved to be standard normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing item fit for unidimensional IRT models.
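A minimal sketch of an item-fit residual of the general kind discussed above: examinees are grouped by ability, the observed proportion correct in each group is compared with the model-implied 2PL item characteristic curve, and the difference is standardized. Data, item parameters, and abilities are simulated, and this illustrates the idea rather than the paper's exact estimator.

```python
# Standardized item-fit residuals against a 2PL ICC (sketch).
import numpy as np

def icc_2pl(theta, a, b):
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(2)
a, b = 1.2, 0.3                                 # true item parameters
theta = rng.normal(size=5000)                   # abilities
y = rng.uniform(size=theta.size) < icc_2pl(theta, a, b)   # item responses

# Group examinees into ten ability bins via quantiles.
edges = np.quantile(theta, np.linspace(0, 1, 11))
groups = np.digitize(theta, edges[1:-1])

for g in range(10):
    mask = groups == g
    n = mask.sum()
    observed = y[mask].mean()
    expected = icc_2pl(theta[mask], a, b).mean()
    # Standardized residual; approximately N(0, 1) if the model fits.
    z = (observed - expected) / np.sqrt(expected * (1 - expected) / n)
    print(f"group {g}: n={n:4d}  z={z:+.2f}")
```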
Psychometrika | 2014
Matthias von Davier; Shelby J. Haberman
This commentary addresses the modeling and final analytical path taken, as well as the terminology used, in the paper “Hierarchical diagnostic classification models: a family of models for estimating and testing attribute hierarchies” by Templin and Bradshaw (Psychometrika, doi:10.1007/s11336-013-9362-0, 2013). It raises several issues concerning use of cognitive diagnostic models that either assume attribute hierarchies or assume a certain form of attribute interactions. The issues raised are illustrated with examples, and references are provided for further examination.
Journal of Educational and Behavioral Statistics | 2015
Shelby J. Haberman
Adjustment by minimum discriminant information provides an approach to linking test forms in the case of a nonequivalent groups design with no satisfactory common items. This approach employs background information on individual examinees in each administration so that weighted samples of examinees form pseudo-equivalent groups in the sense that they resemble samples from equivalent groups. Linking methods for equivalent groups are then applied to the weighted samples. To illustrate the approach, 29 administrations from a testing program are linked via the method of pseudo-equivalent groups. Because the forms used are currently linked by use of kernel equating, it is possible to compare the reasonableness of results from pseudo-equivalent groups to results from kernel equating.
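A minimal sketch of the pseudo-equivalent-groups idea described above: examinees in one administration are reweighted by minimum discriminant information adjustment (exponential tilting) so that the weighted means of their background variables match a target profile. The covariates and target are made up, and a plain Newton iteration is assumed to converge; the operational procedure involves more elaborate background information.

```python
# Minimum discriminant information weighting (sketch).
import numpy as np

def mdi_weights(x, target, steps=50):
    """Weights proportional to exp(x_i @ lam) whose weighted covariate
    means equal `target`; lam is found by Newton's method."""
    lam = np.zeros(x.shape[1])
    for _ in range(steps):
        w = np.exp(x @ lam)
        w /= w.sum()
        mean = w @ x                                   # weighted means
        cov = (x * w[:, None]).T @ x - np.outer(mean, mean)
        lam += np.linalg.solve(cov, target - mean)     # Newton step
    return w

rng = np.random.default_rng(3)
x = rng.normal(size=(2000, 2))                 # background variables
target = np.array([0.25, -0.10])               # reference-group profile
w = mdi_weights(x, target)
print("weighted means:", np.round(w @ x, 4))   # should match the target
```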
International Journal of Testing | 2014
Sandip Sinharay; Shelby J. Haberman
Recently, there has been increasing interest in subtest scores, or subscores, for their potential diagnostic value. Haberman (2008) suggested a method to determine whether a subscore has added value over the total score. Researchers have often been interested in the performance of subgroups—for example, those based on gender or ethnicity—on subtests. Several researchers found that the difference in performance between gender-based subgroups varied over the different subtests. In this article, we examine whether the added values of the subscores vary between subgroups using data from several operational tests, including an international English proficiency test. For these data sets, the added values of the subscores occasionally vary over the subgroups, but the added values of the augmented subscores are invariant over the subgroups.
Analysis of Qualitative Data: New Developments | 1979
Shelby J. Haberman
Survey data can be used to estimate the joint distribution of several polytomous variables in a population. If the survey data are supplemented by information from population censuses concerning the distributions of these variables, then the joint distribution can be estimated with increased precision. The method of adjustment developed by Deming and Stephan applies if a table providing the joint distribution of two or more polytomous variables has been obtained from a population sample and if some tables of marginal distributions of these variables have been obtained from a population census. The estimated joint population distribution of the variables is obtained through an iterative proportional fitting algorithm. The Deming–Stephan algorithm produces consistent estimates of joint population probabilities if the sample is a simple random sample. In this case, relatively simple formulas are also available for estimating the asymptotic standard deviations of the probability estimates. For example, the Census provides a cross-classification of the educational levels of husbands and wives. The adjustment methods described in the chapter have been used extensively to describe relationships between discrete variables. Either the Newton–Raphson algorithm or the Deming–Stephan algorithm can be used to compute the adjusted estimates and to test the hypotheses underlying the adjustment.
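A minimal sketch of the Deming–Stephan iterative proportional fitting algorithm described above: a sample cross-classification is rescaled, alternating between rows and columns, until its margins match census margins. The sample table and census margins are made up for illustration.

```python
# Deming-Stephan iterative proportional fitting (sketch).
import numpy as np

def ipf(table, row_margin, col_margin, iters=100):
    """Scale rows and columns alternately until both margins match."""
    p = table / table.sum()
    for _ in range(iters):
        p *= (row_margin / p.sum(axis=1))[:, None]   # match row margin
        p *= col_margin / p.sum(axis=0)              # match column margin
    return p

sample = np.array([[50.0, 30, 20],
                   [20, 60, 20],
                   [10, 30, 60]])
row_margin = np.array([0.40, 0.35, 0.25])   # census margin, variable 1
col_margin = np.array([0.30, 0.45, 0.25])   # census margin, variable 2
fitted = ipf(sample, row_margin, col_margin)
print(np.round(fitted, 4))
print("row sums:", np.round(fitted.sum(axis=1), 4))
print("col sums:", np.round(fitted.sum(axis=0), 4))
```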