Publication


Featured research published by Neil J. Dorans.


Psychometrika | 1982

The polyserial correlation coefficient

Ulf Olsson; Fritz Drasgow; Neil J. Dorans

The polyserial and point polyserial correlations are discussed as generalizations of the biserial and point biserial correlations. The relationship between the polyserial and point polyserial correlation is derived. The maximum likelihood estimator of the polyserial correlation is compared with a two-step estimator and with a computationally convenient ad hoc estimator. All three estimators perform reasonably well in a Monte Carlo simulation. Some practical applications of the polyserial correlation are described.
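
A short computational illustration of the relationship above: the following minimal Python sketch (not the authors' code) implements what the abstract calls the computationally convenient ad hoc estimator, assuming the standard rescaling of the Pearson (point polyserial) correlation by the ordinal variable's standard deviation over the sum of normal densities at thresholds estimated from the category proportions. Function and variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def polyserial_adhoc(x, y):
    """Ad hoc polyserial correlation between a continuous x and an
    ordinal y (coded 1..k), obtained by rescaling the Pearson correlation.
    Illustrative sketch only."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n = len(y)
    # Point polyserial (ordinary Pearson) correlation with y scored 1..k
    r_pps = np.corrcoef(x, y.astype(float))[0, 1]
    # Thresholds from cumulative category proportions (drop the final 1.0)
    cats, counts = np.unique(y, return_counts=True)
    taus = norm.ppf(np.cumsum(counts)[:-1] / n)
    # Rescale: rho ~ r_pps * s_y / sum of normal densities at the thresholds
    s_y = y.astype(float).std(ddof=0)
    return r_pps * s_y / norm.pdf(taus).sum()

# Example: latent bivariate normal data, y categorized into 4 levels
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0, 0], [[1, .6], [.6, 1]], size=5000)
x, y = z[:, 0], np.digitize(z[:, 1], [-1.0, 0.0, 1.0]) + 1
print(polyserial_adhoc(x, y))   # should be near 0.6
```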


Applied Psychological Measurement | 1995

DIF assessment for polytomously scored items: A framework for classification and evaluation

Maria T. Potenza; Neil J. Dorans

Increased use of alternatives to the traditional dichotomously scored multiple-choice item yields complex responses that require complex scoring rules. Some of these new item types can be polytomously scored. DIF methodology is well defined for traditional dichotomously scored multiple-choice items. This paper provides a classification scheme of DIF procedures for dichotomously scored items that is applicable to new DIF procedures for polytomously scored items. In the process, a formal development of a polytomous version of a dichotomous DIF technique is presented. Several polytomous DIF techniques are evaluated in terms of statistical and practical criteria. Index terms: DIF methodology, differential item functioning, item bias, polytomous scoring, statistical criteria for differential item functioning.
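
One member of the polytomous DIF family evaluated in frameworks like this is the standardized mean difference (SMD): conditional on each level of the matching variable, the focal and reference groups' mean item scores are compared, and the conditional differences are averaged with focal-group weights. The Python sketch below is a minimal illustration of that index under the definition just stated; the function and variable names are mine, not the paper's.

```python
import numpy as np

def standardized_mean_difference(item, match, group, focal="F", ref="R"):
    """Polytomous DIF via the standardized mean difference.
    item: polytomous item scores; match: matching variable (e.g., total score);
    group: group labels. Weights come from the focal group's distribution
    over the matching variable. Illustrative sketch only."""
    item, match, group = map(np.asarray, (item, match, group))
    smd, n_focal = 0.0, (group == focal).sum()
    for m in np.unique(match[group == focal]):
        f = (group == focal) & (match == m)
        r = (group == ref) & (match == m)
        if r.sum() == 0:
            continue  # no comparable reference examinees at this score level
        smd += (f.sum() / n_focal) * (item[f].mean() - item[r].mean())
    return smd
```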


Linking and Aligning Scores and Scales conference, June 2005, Princeton University, Princeton, NJ, US; the conference provided the raw material for this volume. | 2007

Linking and aligning scores and scales

Neil J. Dorans; Mary Pommerich; Paul W. Holland

Contents: Overview. Foundations: A Framework and History for Score Linking; Data Collection Designs and Linking Procedures. Equating: Equating: Best Practices and Challenges to Best Practices; Practical Problems in Equating Test Scores: A Practitioner's Perspective; Potential Solutions to Practical Equating Issues. Tests in Transition: Score Linking Issues Related to Test Content Changes; Linking Scores Derived Under Different Modes of Test Administration; Tests in Transition: Discussion and Synthesis. Concordance: Sizing Up Linkages; Concordance: The Good, the Bad, and the Ugly; Some Further Thoughts on Concordance. Vertical Scaling: Practical Issues in Vertical Scaling; Methods and Models for Vertical Scaling; Vertical Scaling and No Child Left Behind. Linking Group Assessments to Individual Assessments: Linking Assessments Based on Aggregate Reporting: Background and Issues; An Enhanced Method for Mapping State Standards onto the NAEP Scale; Using Aggregate-Level Linkages for Estimation and Validation: Comments on Thissen and Braun & Qian. Postscript.


Review of Educational Research | 1985

Implications for Altering the Context in Which Test Items Appear: A Historical Perspective on an Immediate Concern

Linda F. Leary; Neil J. Dorans

Research into the effects of item arrangement has been motivated by the need to know the potential effects on item statistics of different item arrangement schemes. This review of the literature confirms that many of the salient and common features of the research can be identified as a function of the practical psychometric concerns of the time. The studies are separated into three periods. The earliest studies investigated the simple main effect of item order on test performance; the late 1960s reflected a change in emphasis to a design that included interactions between item order and factors of examinees’ psychological and biological characteristics; current concern with test disclosure and development of individual adaptive testing instruments has shifted the emphasis to the effects of item order on the stability of item parameters. The literature has produced evidence of context effects, but has not demonstrated that the effects are so strong as to invalidate test theory or practice that is dependent on an assumption of item parameter invariance.


Applied Psychological Measurement | 2004

Equating, Concordance, and Expectation.

Neil J. Dorans

How do scores from different tests relate to each other? Three types of score linkage are discussed: equating, concordance, and prediction of expected scores. Statistical indices, in conjunction with rational considerations, are needed to determine whether the highest level of linkage attainable between scores from two “tests” is the conceptual and statistical interchangeability sought by equating, the distributional similarity of concordance links, or the minimum squared loss attained by prediction of expected scores. Relationships among the different scales of the ACT and SAT I are described in the context of the conceptual framework developed herein. These relationships are used to evaluate the appropriateness of concordances and predictions of expected scores among various scores on these two prominent tests. Sums of scores, composite scores, and individual scores are examined. The importance of score reliability in assessing linkage possibilities is discussed.
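
To make the contrast between the weaker linkage types concrete, the sketch below builds an equipercentile concordance (matching percentile ranks of the two score distributions) and a least squares prediction of expected scores from the same data. It is a generic illustration of these two kinds of linkage, assuming simple continuous score distributions, and not a reproduction of the ACT/SAT analyses.

```python
import numpy as np

def equipercentile_concordance(x_scores, y_scores, x_value):
    """Map an X score to the Y score with the same percentile rank
    (a concordance-type link). Illustrative sketch only."""
    x_scores, y_scores = np.sort(x_scores), np.sort(y_scores)
    p = np.searchsorted(x_scores, x_value, side="right") / len(x_scores)
    return np.quantile(y_scores, min(p, 1.0))

def regression_prediction(x_scores, y_scores, x_value):
    """Predict the expected Y score given X by least squares
    (minimum squared loss), the weakest of the three linkage types."""
    slope, intercept = np.polyfit(x_scores, y_scores, 1)
    return slope * x_value + intercept
```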


Medical Care | 2006

Differential item functioning on the Mini-Mental State Examination. An application of the Mantel-Haenszel and standardization procedures.

Neil J. Dorans; Edward Kulick

Differential item functioning (DIF) attempts to identify items for which subpopulations of examinees exhibit performance differentials that are not consistent with the performance differentials seen among those subpopulations on a reliable measure of the construct of interest. DIF assessment requires a rule for scoring items and a matching variable on which different subpopulations can be viewed as comparable for purposes of assessing their performance on items. Typically, DIF is operationally defined as a difference in item performance between subpopulations, e.g., Spanish speakers and English speakers, that exists after members of the different subpopulations have been matched on some one-dimensional matching variable such as total score. This work defines DIF, describes 2 standard procedures for measuring DIF, applies these DIF procedures to the Mini-Mental State Examination, and contrasts DIF with score equity analysis (SEA). The description of DIF assessment presented in this paper is applicable to any examination question that has responses that can be ordered, e.g., with respect to correctness or severity.
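
For dichotomously scored responses the two procedures have compact textbook forms: the Mantel-Haenszel procedure pools a common odds ratio across levels of the matching variable and reports it on the ETS delta scale as MH D-DIF = -2.35 ln(alpha), while the standardization index STD P-DIF is the dichotomous special case of the standardized mean difference sketched earlier. The Python sketch below follows the Mantel-Haenszel formula under that definition; the data layout and names are illustrative, not taken from the paper.

```python
import numpy as np

def mantel_haenszel_dif(right, group, match, focal="F", ref="R"):
    """Mantel-Haenszel common odds ratio pooled across matching-score
    strata, reported on the ETS delta scale (MH D-DIF = -2.35 * ln(alpha)).
    Illustrative sketch only."""
    right, group, match = map(np.asarray, (right, group, match))
    num = den = 0.0
    for m in np.unique(match):
        s = match == m
        A = ((group == ref) & s & (right == 1)).sum()    # reference right
        B = ((group == ref) & s & (right == 0)).sum()    # reference wrong
        C = ((group == focal) & s & (right == 1)).sum()  # focal right
        D = ((group == focal) & s & (right == 0)).sum()  # focal wrong
        N = A + B + C + D
        if N == 0:
            continue
        num += A * D / N
        den += B * C / N
    alpha = num / den
    return -2.35 * np.log(alpha)
```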


Journal of Educational and Behavioral Statistics | 1988

An Assessment of the Dimensionality of Three SAT-Verbal Test Editions

Linda L. Cook; Neil J. Dorans; Daniel R. Eignor

A strong assumption made by most commonly used item response theory (IRT) models is that the data are unidimensional, that is, statistical dependence among item scores can be explained by a single ability dimension. First-order and second-order factor analyses were conducted on correlation matrices among item parcels of SAT-Verbal items. The item parcels were constructed to yield correlation matrices that were amenable to linear factor analyses. The first-order analyses were employed to assess the effective dimensionality of the item parcel data. Second-order analyses were employed to test meaningful hypotheses about the structure of the data. Parcels were constructed for three SAT-Verbal editions. The dimensionality analyses revealed that one SAT-Verbal test edition was less parallel to the other two editions than these other editions were to each other. Refinements in the dimensionality methodology and a more systematic dimensionality assessment are logical extensions of the present research.
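
The first-order step amounts to inspecting the eigenstructure of the parcel correlation matrix to judge how many dimensions are needed. The sketch below is a minimal Python illustration of that idea, assuming a simple first-to-second eigenvalue ratio rule; it is not the study's actual procedure.

```python
import numpy as np

def effective_dimensionality(parcel_scores, ratio_cutoff=3.0):
    """Crude first-order check: eigenvalues of the parcel correlation
    matrix. A large first-to-second eigenvalue ratio is taken as
    evidence of one dominant dimension. Illustrative only."""
    R = np.corrcoef(np.asarray(parcel_scores, dtype=float), rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending order
    ratio = eigvals[0] / eigvals[1]
    return eigvals, ratio, ratio >= ratio_cutoff
```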


Applied Psychological Measurement | 2008

Anchor Test Type and Population Invariance: An Exploration across Subpopulations and Test Administrations.

Neil J. Dorans; Jinghua Liu; Shelby Hammond

This exploratory study was built on research spanning three decades. Petersen, Marco, and Stewart (1982) conducted a major empirical investigation of the efficacy of different equating methods. The studies reported in Dorans (1990) examined how different equating methods performed across samples selected in different ways. Recent population sensitivity studies have examined whether equating methods yield comparable results across subpopulations. The current study confirms earlier research and clarifies the role of population invariance studies in assessing equating results. A content-appropriate anchor produced solid equating results under small ability differences and divergence of equating results for different methods under large ability differences. Results showed a content-inappropriate anchor did not produce sound score equatings but did yield a strong degree of invariance. Lack of population invariance of equating results can be taken as evidence that a linking is not an equating. The existence of invariance does not mean, however, that equating has been achieved.
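
Population sensitivity of an equating is commonly quantified with the root mean square difference (RMSD) statistic of Dorans and Holland (2000): subgroup linking functions are compared with the total-group function at each score point, and the weighted root mean square of the differences is scaled by the standard deviation of scores on the target form. The Python sketch below follows that definition; the inputs and names are illustrative assumptions.

```python
import numpy as np

def rmsd_invariance(subgroup_links, weights, total_link, sigma_y):
    """Root mean square difference between subgroup and total-group
    linking functions at each X score, scaled by the SD of Y scores.
    subgroup_links: array of shape (n_groups, n_scores);
    weights: subgroup proportions; total_link: shape (n_scores,).
    Illustrative sketch only."""
    subgroup_links = np.asarray(subgroup_links, dtype=float)
    weights = np.asarray(weights, dtype=float)
    diffs2 = (subgroup_links - total_link) ** 2   # (n_groups, n_scores)
    return np.sqrt(weights @ diffs2) / sigma_y    # per-score RMSD values
```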


Journal of Educational and Behavioral Statistics | 2009

Using Past Data to Enhance Small Sample DIF Estimation: A Bayesian Approach.

Sandip Sinharay; Neil J. Dorans; Mary C. Grant; Edwin O. Blew

Test administrators often face the challenge of detecting differential item functioning (DIF) with samples of size smaller than that recommended by experts. A Bayesian approach can incorporate, in the form of a prior distribution, existing information on the inference problem at hand, which yields more stable estimation, especially for small samples. A large volume of past data is available for many operational tests and such data could be used to establish prior distributions for a Bayesian DIF analysis. This article discusses how to perform such an analysis. The suggested approach is found to be more conservative and preferable with respect to several overall criteria than the existing DIF detection methods in a realistic simulation study.
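
Although the paper develops a full Bayesian analysis, the core idea of borrowing strength from past administrations can be illustrated with a simple normal-normal update in which the posterior mean is a precision-weighted average of a prior DIF estimate and the small-sample observed estimate. The sketch below is only that illustration; the conjugate-normal assumption and all names are mine, not the authors' procedure.

```python
def shrunken_dif(observed_dif, observed_se, prior_mean, prior_sd):
    """Precision-weighted (normal-normal) combination of a prior DIF
    estimate built from past data with a small-sample observed estimate.
    Illustrative sketch of the shrinkage idea only."""
    w_obs = 1.0 / observed_se ** 2
    w_prior = 1.0 / prior_sd ** 2
    post_var = 1.0 / (w_obs + w_prior)
    post_mean = post_var * (w_obs * observed_dif + w_prior * prior_mean)
    return post_mean, post_var ** 0.5

# Example: a noisy small-sample estimate is pulled toward the prior mean
print(shrunken_dif(observed_dif=-1.8, observed_se=1.2,
                   prior_mean=0.0, prior_sd=0.5))
```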


Applied Psychological Measurement | 1982

Robustness of Estimators of the Squared Multiple Correlation and Squared Cross-Validity Coefficient to Violations of Multivariate Normality

Fritz Drasgow; Neil J. Dorans

A Monte Carlo experiment was conducted to evaluate the robustness of two estimators of the population squared multiple correlation (R^2_p) and one estimator of the population squared cross-validity coefficient (R^2_cv) to a common violation of multivariate normality. Previous research has shown that these estimators are approximately unbiased when independent and dependent variables follow a joint multivariate normal distribution. The particular violation of multivariate normality studied here consisted of a dependent variable that may assume only a few discrete values. The discrete dependent variable was simulated by categorizing an underlying continuous variable that did satisfy the multivariate normality condition. Results illustrate the attenuating effects of categorization upon R^2_p and R^2_cv. In addition, the distributions of sample squared multiple correlations and sample squared cross-validity coefficients are affected by categorization mainly through the attenuation of R^2_p and R^2_cv. Consequently, the formula estimators of R^2_p and R^2_cv were found to be as accurate and unbiased with discrete dependent variables as they were with continuous dependent variables. Substantive researchers who use categorical dependent variables, perhaps obtained by rating scale judgments, can justifiably employ any of the three estimators examined here.
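
The formula estimators referred to here are typically of the Wherry/Olkin-Pratt type for the population squared multiple correlation and of the Browne type for the squared cross-validity coefficient. The sketch below implements one common pair of such formulas as an illustration; these specific formulas are standard regression results assumed for the example, not quoted from the article.

```python
def wherry_r2(r2_sample, n, p):
    """Wherry-type estimate of the population squared multiple
    correlation from the sample R^2 (n cases, p predictors)."""
    return 1.0 - (1.0 - r2_sample) * (n - 1) / (n - p - 1)

def browne_r2_cv(r2_sample, n, p):
    """Browne-type formula estimate of the squared cross-validity
    coefficient, using the Wherry estimate as the population value.
    Illustrative of the class of formula estimators discussed above."""
    rho2 = max(wherry_r2(r2_sample, n, p), 0.0)
    return ((n - p - 3) * rho2 ** 2 + rho2) / ((n - 2 * p - 2) * rho2 + p)

# Example: modest sample, several predictors
print(wherry_r2(0.40, n=60, p=5), browne_r2_cv(0.40, n=60, p=5))
```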
