Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Allan S. Cohen is active.

Publication


Featured research published by Allan S. Cohen.


Assessment in Education: Principles, Policy & Practice | 1996

Threats to the Valid Use of Assessments

Terry J. Crooks; Michael T. Kane; Allan S. Cohen

Validity is the most important quality of an assessment, but its evaluation is often neglected. The step‐by‐step approach suggested here provides structured guidance to validators of educational assessments. Assessment is depicted as a chain of eight linked stages: administration, scoring, aggregation, generalization, extrapolation, evaluation, decision and impact. Evaluating validity requires careful consideration of threats to validity associated with each link. Several threats are described and exemplified for each link. These sets of threats are intended to be illustrative rather than comprehensive. The chain model suggests that validity is limited by the weakest link, and that efforts to make other links particularly strong may be wasteful or even harmful. The chain model and list of threats are also shown to be valuable when planning assessments.


Applied Psychological Measurement | 1998

A Comparison of Linking and Concurrent Calibration under Item Response Theory

Seock-Ho Kim; Allan S. Cohen

Applications of item response theory (IRT) to practical testing problems, including equating, differential item functioning, and computerized adaptive testing, require a common metric for item parameter estimates. This study compared three methods for developing a common metric under IRT: (1) linking separate calibration runs using equating coefficients from the characteristic curve method, (2) concurrent calibration based on marginal maximum a posteriori estimation, and (3) concurrent calibration based on marginal maximum likelihood estimation. For smaller numbers of common items, linking using the characteristic curve method yielded smaller root mean square differences for both item discrimination and difficulty parameters. For larger numbers of common items, the three methods yielded similar results.
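
In this literature the "characteristic curve method" for computing equating coefficients is typically the Stocking-Lord procedure: choose a slope A and intercept B so that the two calibrations' test characteristic curves agree on the common items. Below is a minimal sketch under that assumption, using hypothetical 2PL common-item estimates; the study itself compared the methods on real calibrations with more general models.

```python
# Sketch of characteristic-curve (Stocking-Lord style) linking for the 2PL
# model. All item parameter values here are hypothetical.
import numpy as np
from scipy.optimize import minimize

def p2pl(theta, a, b):
    """2PL item response function (D = 1 for simplicity)."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))

# Common-item estimates from two separate calibration runs (made up)
a_base, b_base = np.array([1.2, 0.8, 1.5]), np.array([-0.5, 0.0, 0.7])
a_new,  b_new  = np.array([1.0, 0.7, 1.3]), np.array([-0.2, 0.3, 1.1])

theta = np.linspace(-4, 4, 41)  # quadrature points on the base metric

def stocking_lord_loss(coef):
    A, B = coef
    # Rescale the new form's parameters onto the base metric:
    # a* = a / A, b* = A * b + B
    a_t, b_t = a_new / A, A * b_new + B
    tcc_base = p2pl(theta, a_base, b_base).sum(axis=1)
    tcc_new  = p2pl(theta, a_t, b_t).sum(axis=1)
    return np.sum((tcc_base - tcc_new) ** 2)

A, B = minimize(stocking_lord_loss, x0=[1.0, 0.0]).x
print(f"slope A = {A:.3f}, intercept B = {B:.3f}")
```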


Applied Psychological Measurement | 2009

Model Selection Methods for Mixture Dichotomous IRT Models

Feiming Li; Allan S. Cohen; Seock-Ho Kim; Sun-Joo Cho

This study examines model selection indices for use with dichotomous mixture item response theory (IRT) models. Five indices are considered: Akaike's information criterion (AIC), Bayesian information criterion (BIC), deviance information criterion (DIC), pseudo-Bayes factor (PsBF), and posterior predictive model checks (PPMC). The five indices provide somewhat different recommendations for a set of real data. Results from a simulation study indicate that BIC selects the correct (i.e., the generating) model well under most conditions simulated and for all three of the dichotomous mixture IRT models considered. PsBF is almost as effective. AIC and PPMC tend to select the more complex model under some conditions. DIC is least effective for this use.
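
For reference, the two information criteria are simple functions of the maximized log-likelihood plus a parameter-count penalty. A minimal sketch with hypothetical numbers (for a mixture IRT model the parameter count depends on the number of latent classes and items):

```python
# Two of the indices compared in the study, computed from a hypothetical
# maximized log-likelihood. Smaller values are better for both.
import math

logL = -10542.3     # maximized log-likelihood (made up)
n_params = 61       # free parameters of the fitted mixture model (made up)
n_examinees = 1000

aic = -2 * logL + 2 * n_params
bic = -2 * logL + n_params * math.log(n_examinees)
# BIC's log(n) penalty favors simpler models, which is consistent with its
# tendency here to recover the generating model.
print(f"AIC = {aic:.1f}, BIC = {bic:.1f}")
```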


Applied Psychological Measurement | 1993

Detection of differential item functioning in the graded response model

Allan S. Cohen; Seock-Ho Kim; Frank B. Baker

Methods for detecting differential item functioning (DIF) have been proposed primarily for the item response theory dichotomous response model. Three measures of DIF for the dichotomous response model are extended to include Samejima's graded response model: two measures based on area differences between item true score functions, and a χ2 statistic for comparing differences in item parameters. An illustrative example is presented.
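
In outline, the usual forms of these measures look like the following (notation mine, not necessarily the paper's): the unsigned area integrates the absolute difference between the two groups' item true score functions, and the χ2 statistic compares the groups' item parameter estimates directly.

```latex
% Unsigned area between reference (R) and focal (F) item true score
% functions for an item with categories k = 1, ..., m under the graded
% response model (notation mine):
\mathrm{UA} = \int_{-\infty}^{\infty}
  \left| \sum_{k=1}^{m} k\,P_{Rk}(\theta)
       - \sum_{k=1}^{m} k\,P_{Fk}(\theta) \right| d\theta

% Lord-type chi-square on the item parameter vectors \hat{\xi}, with
% \hat{\Sigma}_R, \hat{\Sigma}_F their estimated covariance matrices:
\chi^2 = (\hat{\xi}_R - \hat{\xi}_F)'\,
         (\hat{\Sigma}_R + \hat{\Sigma}_F)^{-1}\,
         (\hat{\xi}_R - \hat{\xi}_F)
```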


Journal of Educational and Behavioral Statistics | 2001

A Mixture Item Response Model for Multiple-Choice Data

Daniel M. Bolt; Allan S. Cohen; James A. Wollack

A mixture item response model is proposed for investigating individual differences in the selection of response categories in multiple-choice items. The model accounts for local dependence among response categories by assuming that examinees belong to discrete latent classes that have different propensities towards those responses. Varying response category propensities are captured by allowing the category intercept parameters in a nominal response model (Bock, 1972) to assume different values across classes. A Markov Chain Monte Carlo algorithm for the estimation of model parameters and classification of examinees is described. A real-data example illustrates how the model can be used to distinguish examinees that are disproportionately attracted to different types of distractors in a test of English usage. A simulation study evaluates item parameter recovery and classification accuracy in a hypothetical multiple-choice test designed to be diagnostic. Implications for test construction and the use of multiple-choice tests to perform cognitive diagnoses of item response patterns are discussed.
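
As the abstract describes it, the model is a nominal response model whose category intercepts vary by latent class while the slopes are shared. In symbols (my notation, not necessarily the paper's), for item i with m_i response categories and an examinee j in class g:

```latex
% Class-specific nominal response model: category intercepts \zeta vary
% across latent classes g; category slopes \lambda do not.
P(X_{ij} = k \mid \theta_j, g) =
  \frac{\exp(\zeta_{ikg} + \lambda_{ik}\,\theta_j)}
       {\sum_{h=1}^{m_i} \exp(\zeta_{ihg} + \lambda_{ih}\,\theta_j)}
```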


Applied Psychological Measurement | 1998

Detection of Differential Item Functioning Under the Graded Response Model With the Likelihood Ratio Test

Seock-Ho Kim; Allan S. Cohen

Type I error rates of the likelihood ratio test for the detection of differential item functioning (DIF) were investigated using Monte Carlo simulations. The graded response model with five ordered categories was used to generate datasets of a 30-item test for samples of 300 and 1,000 simulated examinees. All DIF comparisons were simulated by randomly pairing two groups of examinees. Three different sample size combinations of reference and focal groups were simulated under two ability matching conditions. For each of the six combinations of sample sizes by ability matching conditions, 100 replications of DIF detection comparisons were simulated. Item parameter estimates and likelihood values were obtained by marginal maximum likelihood estimation using the computer program MULTILOG. Type I error rates of the likelihood ratio test statistics for all six combinations of sample sizes and ability matching conditions were within theoretically expected values at each of the nominal alpha levels considered.
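
The test itself is simple once the two calibrations are in hand: minus twice the difference between the compact log-likelihood (studied item's parameters constrained equal across groups) and the augmented log-likelihood (studied item free) is referred to a χ2 distribution. A minimal sketch with hypothetical numbers standing in for the MULTILOG output:

```python
# Likelihood ratio DIF test; the log-likelihood values here are made up.
from scipy.stats import chi2

logL_compact = -20421.7    # studied item's parameters constrained equal
logL_augmented = -20415.2  # studied item's parameters free across groups
df = 5                     # parameters freed: 1 slope + 4 boundary
                           # parameters for a 5-category graded item

G2 = -2.0 * (logL_compact - logL_augmented)  # = 13.0 here
p_value = chi2.sf(G2, df)
print(f"G2 = {G2:.1f}, df = {df}, p = {p_value:.4f}")
```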


Applied Psychological Measurement | 1996

An Investigation of the Likelihood Ratio Test For Detection of Differential Item Functioning

Allan S. Cohen; Seock-Ho Kim; James A. Wollack

Type I error rates for the likelihood ratio test for detecting differential item functioning (DIF) were investigated using Monte Carlo simulations. Two- and three-parameter item response theory (IRT) models were used to generate 100 datasets of a 50-item test for samples of 250 and 1,000 simulated examinees for each IRT model. Item parameters were estimated by marginal maximum likelihood for three IRT models: the three-parameter model, the three-parameter model with a fixed guessing parameter, and the two-parameter model. All DIF comparisons were simulated by randomly pairing two samples from each sample size and IRT model condition so that, for each sample size and IRT model condition, there were 50 pairs of reference and focal groups. Type I error rates for the two-parameter model were within theoretically expected values at each of the α levels considered. Type I error rates for the three-parameter model and the three-parameter model with a fixed guessing parameter, however, differed from the theoretically expected values at the α levels considered.
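
For reference, the three-parameter item characteristic curve, whose lower asymptote c_i is the guessing parameter that is either estimated or fixed in the conditions above:

```latex
% Three-parameter logistic ICC. Fixing c_i at a constant gives the
% "fixed guessing parameter" condition; c_i = 0 reduces it to the 2PL.
P_i(\theta) = c_i + (1 - c_i)\,
  \frac{1}{1 + \exp[-D a_i(\theta - b_i)]}
```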


Applied Psychological Measurement | 2007

IRT Model Selection Methods for Dichotomous Items

Taehoon Kang; Allan S. Cohen

Fit of the model to the data is important if the benefits of item response theory (IRT) are to be obtained. In this study, the authors compared model selection results using the likelihood ratio test, two information-based criteria, and two Bayesian methods. An example illustrated the potential for inconsistency in model selection depending on which of the indices was used. Results from a simulation study indicated that the inconsistencies among the indices were common but that model selection was relatively accurate for longer tests administered to larger samples of examinees. The cross-validation log-likelihood (CVLL) appeared to work the best of the five indices for the conditions simulated in this study.
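
The CVLL idea is to fit each candidate model on a calibration sample and then score a held-out sample under the fitted parameters; the candidate with the higher held-out log-likelihood is preferred. Here is a runnable toy with two simple Bernoulli models standing in for the IRT candidates; the study itself compared dichotomous IRT models, which would require a full IRT fitter in place of the closed-form fits below.

```python
# Toy illustration of cross-validation log-likelihood (CVLL) mechanics.
import numpy as np

rng = np.random.default_rng(0)
# Fake 0/1 response data with item-specific correct-response rates, so the
# per-item model is the "generating" model in this toy.
rates = rng.uniform(0.3, 0.9, size=20)
responses = (rng.random((1000, 20)) < rates).astype(int)

half = len(responses) // 2
calib, holdout = responses[:half], responses[half:]

def bernoulli_cvll(p_hat):
    """Log-likelihood of the holdout sample under fitted rates p_hat."""
    p = np.clip(p_hat, 1e-6, 1 - 1e-6)
    return float(np.sum(holdout * np.log(p) + (1 - holdout) * np.log(1 - p)))

cvll_per_item = bernoulli_cvll(calib.mean(axis=0))  # one rate per item
cvll_pooled = bernoulli_cvll(calib.mean())          # single pooled rate
print(cvll_per_item > cvll_pooled)  # True: the generating model wins
```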


Applied Psychological Measurement | 2002

Recovery of Item Parameters in the Nominal Response Model: A Comparison of Marginal Maximum Likelihood Estimation and Markov Chain Monte Carlo Estimation

James A. Wollack; Daniel M. Bolt; Allan S. Cohen; Young-Sun Lee

Markov chain Monte Carlo (MCMC) methods, such as Gibbs sampling, present an alternative to marginal maximum likelihood (MML) estimation, which offers some promise for parameter estimation particularly with complex models, in small sample situations, and for other applications where MML algorithms have not been established. MCMC circumvents the problems associated with implementing an estimation algorithm for complex, multidimensional probability distributions by sampling the parameters from each of the one-dimensional conditional posterior distributions at each stage of the Markov chain. In this article, the authors compared the quality of item parameter estimates for MML and MCMC with one type of complex item response theory model, the nominal response model. The quality of item parameter recovery was nearly identical for both MML and MCMC. Both methods tended to produce good estimates, even for short tests and relatively small sample sizes. Parameter recovery was best for items of moderate difficulty (i.e., items matched to the latent trait distribution); recovery was worst for items that were extremely easy or difficult. The quality of item parameter recovery improved as test length increased from 10 to 20 items but did not change further as test length increased from 20 to 30 items. MCMC estimation takes substantially longer but appears to be a good surrogate for MML for those situations for which an MML algorithm has not been developed.
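
A toy illustration of the sampling scheme the abstract describes, updating one parameter at a time from its conditional posterior: random-walk Metropolis for a single 2PL item with abilities treated as known. This is a deliberate simplification; the study sampled all parameters of the nominal response model.

```python
# Metropolis-within-Gibbs sketch for one 2PL item, abilities known,
# lognormal(0,1) prior on a and normal(0,1) prior on b. Toy example only.
import numpy as np

rng = np.random.default_rng(1)
theta = rng.standard_normal(500)               # "known" abilities
a_true, b_true = 1.3, -0.4
y = (rng.random(500) < 1 / (1 + np.exp(-a_true * (theta - b_true)))).astype(int)

def log_post(a, b):
    if a <= 0:
        return -np.inf
    p = 1 / (1 + np.exp(-a * (theta - b)))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    logprior = -0.5 * (np.log(a) ** 2 + b ** 2)
    return loglik + logprior

a, b, draws = 1.0, 0.0, []
for it in range(5000):
    # One-dimensional updates, one parameter at a time
    for par in ("a", "b"):
        prop_a = a + 0.1 * rng.standard_normal() if par == "a" else a
        prop_b = b + 0.1 * rng.standard_normal() if par == "b" else b
        if np.log(rng.random()) < log_post(prop_a, prop_b) - log_post(a, b):
            a, b = prop_a, prop_b
    if it >= 1000:                             # discard burn-in
        draws.append((a, b))
print(np.mean(draws, axis=0))                  # posterior means, close to
                                               # (a_true, b_true)
```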


Journal of Educational and Behavioral Statistics | 1998

On the Behrens-Fisher Problem: A Review

Seock-Ho Kim; Allan S. Cohen

The Behrens-Fisher problem arises when one seeks to make inferences about the means of two normal populations without assuming the variances are equal. This paper presents a review of fundamental concepts and applications used to address the Behrens-Fisher problem under fiducial, Bayesian, and frequentist approaches. Methods of approximation to the Behrens-Fisher distribution and a simple Bayesian framework for hypothesis testing are also discussed. Finally, the use of generalized p values for significance testing of hypotheses in the presence of nuisance parameters is considered. It is shown that the generalized p values based on a frequentist probability for the Behrens-Fisher problem are numerically the same as those from the fiducial and Bayesian solutions. A table for tests of significance is also included.
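
The best-known frequentist approximation in this literature is Welch's statistic with Satterthwaite degrees of freedom (notation mine, not necessarily the paper's):

```latex
% Welch's approximate t statistic and its approximate degrees of freedom
% for two samples with means \bar{x}_i, variances s_i^2, sizes n_i:
t' = \frac{\bar{x}_1 - \bar{x}_2}
          {\sqrt{s_1^2/n_1 + s_2^2/n_2}},
\qquad
\nu \approx \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}
                 {\dfrac{(s_1^2/n_1)^2}{n_1 - 1}
                + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```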

Collaboration


Dive into Allan S. Cohen's collaboration.

Top Co-Authors

Seock-Ho Kim
University of Wisconsin-Madison

James A. Wollack
University of Wisconsin-Madison

Daniel M. Bolt
University of Wisconsin-Madison

Craig S. Wells
University of Wisconsin-Madison