Publication


Featured research published by John R. Donoghue.


Applied Psychological Measurement | 2000

A General Item Response Theory Model for Unfolding Unidimensional Polytomous Responses

James S. Roberts; John R. Donoghue; James E. Laughlin

The generalized graded unfolding model (GGUM) is developed. This model allows for either binary or graded responses and generalizes previous item response models for unfolding in two useful ways. First, it implements a discrimination parameter that varies across items, which allows items to discriminate among respondents in different ways. Second, the GGUM permits response category threshold parameters to vary across items. A marginal maximum likelihood algorithm is implemented to estimate GGUM item parameters, whereas person parameters are derived from an expected a posteriori technique. The applicability of the GGUM to common attitude testing situations is illustrated with real data on student attitudes toward abortion.
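To make the two item-level generalizations concrete (an item-specific discrimination alpha and item-specific thresholds tau), here is a minimal sketch of the GGUM category response function as it is commonly written: each observable category z = 0..C pools two subjective responses, z and M - z with M = 2C + 1, and tau_0 = 0. The function name and argument layout are illustrative choices, not part of the article.

```python
import math

def ggum_probs(theta, alpha, delta, tau):
    """Category probabilities P(Z = z | theta), z = 0..C, for one GGUM item.

    alpha: item discrimination; delta: item location;
    tau: thresholds tau_1..tau_C (tau_0 = 0 by convention, so it is omitted).
    """
    C = len(tau)            # highest observable category
    M = 2 * C + 1           # highest subjective category
    d = theta - delta       # signed person-item distance
    terms = []
    cum_tau = 0.0           # running sum tau_0 + ... + tau_z
    for z in range(C + 1):
        if z > 0:
            cum_tau += tau[z - 1]
        # Agreement "from below" (z) and "from above" (M - z) both map to category z.
        terms.append(math.exp(alpha * (z * d - cum_tau))
                     + math.exp(alpha * ((M - z) * d - cum_tau)))
    total = sum(terms)
    return [t / total for t in terms]
```

Because the two exponentials combine into a hyperbolic cosine of the distance, the probabilities are symmetric around delta, which is the single-peaked (unfolding) behavior the model is built for: the top category is most likely when theta equals delta.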


Journal of Educational and Behavioral Statistics | 1993

Thin Versus Thick Matching in the Mantel-Haenszel Procedure for Detecting DIF

John R. Donoghue; Nancy L. Allen

This Monte Carlo study examined strategies for forming the matching variable for the Mantel-Haenszel (MH) differential item functioning (DIF) procedure; thin matching on total test score was compared to forms of thick matching, which pool levels of the matching variable. Data were generated using a three-parameter logistic (3PL) item response theory (IRT) model with a common guessing parameter. The number of subjects and test length were manipulated, as were the difficulty, discrimination, and presence or absence of DIF in the studied item. Outcome measures were the transformed log-odds Δ̂MH, its standard error, and the MH chi-square statistic. For short tests (5 or 10 items), thin matching yielded very poor results, with a tendency to falsely identify items as possessing DIF against the reference group. The best methods of thick matching yielded outcome measure values closer to the expected value for non-DIF items, as well as larger values than thin matching when the studied item possessed DIF. Intermediate-length tests yielded similar results for thin matching and the best methods of thick matching. The method of thick matching that performed best depended on the measure used to detect DIF. Both the difficulty and the discrimination of the studied item were found to have a strong effect on the value of Δ̂MH.
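As a rough illustration of the quantities compared in the study, the sketch below computes the MH common odds ratio, its standard ETS delta-metric transform Δ̂MH = -2.35 ln(α̂MH), and the continuity-corrected MH chi-square from one 2x2 table per level of the matching variable. The `pool_adjacent` helper is a hypothetical, equal-width version of thick matching chosen for illustration; it is not one of the specific pooling schemes evaluated in the article.

```python
import math

def mantel_haenszel(tables):
    """tables: one (A, B, C, D) tuple per matching-score level, where
    A/B = reference group right/wrong and C/D = focal group right/wrong."""
    num = den = 0.0
    sum_a = sum_ea = sum_var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:  # levels with fewer than 2 examinees contribute nothing
            continue
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d
        m_right, m_wrong = a + c, b + d
        sum_a += a
        sum_ea += n_ref * m_right / n                      # E(A_k) under no DIF
        sum_var += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
    alpha_mh = num / den
    delta_mh = -2.35 * math.log(alpha_mh)                  # ETS delta metric
    chi_sq = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var    # continuity-corrected
    return alpha_mh, delta_mh, chi_sq

def pool_adjacent(tables, width):
    """Hypothetical thick matching: sum consecutive blocks of `width` score levels."""
    pooled = []
    for start in range(0, len(tables), width):
        block = tables[start:start + width]
        pooled.append(tuple(sum(cells) for cells in zip(*block)))
    return pooled
```

Thin matching corresponds to calling `mantel_haenszel(tables)` on one table per total score; thick matching corresponds to `mantel_haenszel(pool_adjacent(tables, width))`. With very short tests, thin matching leaves few, coarse score levels, which is one way to see why the matching variable matters so much there.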


Applied Psychological Measurement | 1998

A Comparison of Procedures to Detect Item Parameter Drift

John R. Donoghue; Steven Isham

Monte Carlo methods were used to compare several measures of item parameter drift. The number of examinees, the number of items, and the number of drift items in the test were manipulated. Overall, Lord's χ² measure was the most effective in identifying items that exhibited drift. However, the measure was accurate only when the studied item's c parameter was constrained to be equal across the two assessment years. Of the remaining measures, the best methods (a z test based on Raju's exact unsigned integral, the χ² by subgroup computed by the NAEP BILOG/PARSCALE computer program, and Kim and Cohen's closed-interval signed-area measure) required empirical estimates of critical values for the test statistics in order to function well. This requirement detracts from their usefulness.
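Lord's χ² compares an item's parameter estimates from two calibrations through the quadratic form χ² = (v̂1 - v̂2)ᵀ(Σ̂1 + Σ̂2)⁻¹(v̂1 - v̂2), where v̂ holds the estimates and Σ̂ their estimated covariance matrix; under no drift it is approximately chi-square with as many degrees of freedom as parameters compared. The sketch below assumes both years' estimates are already on a common scale and, following the finding above, compares only (a, b) with c held equal across years. The numbers in the usage lines are made up.

```python
import math
import numpy as np

def lords_chi_square(est_1, cov_1, est_2, cov_2):
    """Quadratic-form drift statistic for one item across two calibrations."""
    diff = np.asarray(est_1, dtype=float) - np.asarray(est_2, dtype=float)
    pooled_cov = np.asarray(cov_1, dtype=float) + np.asarray(cov_2, dtype=float)
    # Solve instead of inverting explicitly: diff' * pooled_cov^-1 * diff
    return float(diff @ np.linalg.solve(pooled_cov, diff))

def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square with 2 df (exact: exp(-x / 2))."""
    return math.exp(-x / 2.0)

# Hypothetical (a, b) estimates and covariance matrices for two assessment years.
stat = lords_chi_square([1.10, 0.20], [[0.010, 0.002], [0.002, 0.020]],
                        [1.05, 0.45], [[0.012, 0.001], [0.001, 0.025]])
p_value = chi2_sf_2df(stat)
```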


Applied Psychological Measurement | 2002

Characteristics of MML/EAP Parameter Estimates in the Generalized Graded Unfolding Model

James S. Roberts; John R. Donoghue; James E. Laughlin

The generalized graded unfolding model (GGUM) is a very general parametric, unidimensional item response theory model for unfolding either binary or polytomous responses to test items. Roberts, Donoghue, and Laughlin have described a marginal maximum likelihood (MML) approach to estimate item parameters in the GGUM along with an expected a posteriori (EAP) method to estimate person parameters. This article examines the data demands required to produce accurate parameter estimates using these techniques under ideal conditions. It also examines the robustness of parameter estimates under nonideal conditions, in which there are inconsistencies between the prior distribution of person parameters that must be specified when using either the MML or EAP approaches and the true distribution of person parameters. Results from two simulation studies show that accurate item parameter estimates can generally be obtained with approximately 750 to 1,000 respondents. Similarly, responses to approximately 15 to 20 equally spaced, six-category items can yield accurate EAP estimates of person parameters under static testing conditions. The results also suggest that MML item parameter estimates are quite robust to discrepancies between the prior and true distributions of person parameters. EAP parameter estimates are also fairly robust as long as the item response patterns in question are not too extreme. Finally, 20 quadrature points are generally sufficient to integrate over the prior distribution in both the MML and EAP methods when test and sample characteristics are like those simulated. Thus, the MML/EAP approach to parameter estimation in the generalized graded unfolding model can produce accurate estimates in an efficient manner even when there is uncertainty about the true distribution of person parameters.
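The EAP step described above replaces an integral over the prior with a weighted sum over quadrature points: the estimate is the posterior mean Σ θ_q L(θ_q) w_q / Σ L(θ_q) w_q. The sketch below uses 20 points, matching the number reported as sufficient, but the equally spaced grid on [-4, 4] with standard normal prior weights is an assumption made here for illustration, not necessarily the article's setup. It repeats the GGUM response function so the block stands alone, and it treats item parameters as known.

```python
import math

def ggum_probs(theta, alpha, delta, tau):
    """GGUM category probabilities for z = 0..C (tau_0 = 0 omitted from tau)."""
    C, d = len(tau), theta - delta
    M = 2 * C + 1
    terms, cum_tau = [], 0.0
    for z in range(C + 1):
        if z > 0:
            cum_tau += tau[z - 1]
        terms.append(math.exp(alpha * (z * d - cum_tau))
                     + math.exp(alpha * ((M - z) * d - cum_tau)))
    total = sum(terms)
    return [t / total for t in terms]

def eap_theta(responses, items, n_quad=20, lo=-4.0, hi=4.0):
    """EAP person estimate and posterior SD by quadrature.

    responses: observed category for each item;
    items: (alpha, delta, tau) tuples with known (already calibrated) parameters.
    Assumed prior: N(0, 1) evaluated on an equally spaced grid.
    """
    step = (hi - lo) / (n_quad - 1)
    nodes = [lo + q * step for q in range(n_quad)]
    weights = [math.exp(-0.5 * t * t) for t in nodes]  # unnormalized N(0, 1)
    posterior = []
    for t, w in zip(nodes, weights):
        likelihood = 1.0
        for z, (alpha, delta, tau) in zip(responses, items):
            likelihood *= ggum_probs(t, alpha, delta, tau)[z]
        posterior.append(likelihood * w)
    total = sum(posterior)
    mean = sum(t * p for t, p in zip(nodes, posterior)) / total
    var = sum((t - mean) ** 2 * p for t, p in zip(nodes, posterior)) / total
    return mean, math.sqrt(var)
```

Because the estimate is a posterior mean, a misspecified prior pulls it toward the prior center, and the pull is strongest for extreme response patterns whose likelihood sits near the edge of the grid.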


Educational Evaluation and Policy Analysis | 1993

Notes: The Validity of the SAT at the University of Hawaii: A Riddle Wrapped in an Enigma

Howard Wainer; Thomas Saka; John R. Donoghue

Hawaii is unique in a variety of ways. One of these is the unusual ethnic mixture that makes up its population; under traditional definitions 76% of its population is “minority” and 24% is “White.” The performance of those of its high school students who go on to the University of Hawaii-Manoa on the SAT-Verbal is higher than the national mean, and on the SAT-Mathematical it is much higher. However, the correlation of SAT scores with first year grades has decreased to almost zero since 1982 among Hawaiian students (although among mainland students at UH it is the same as the national average). In this article we provide the facts for a mystery regarding the low and decreasing validity of the SAT at the University of Hawaii among students from Hawaiian secondary schools. Moreover, while we are unable to provide a complete solution, we do eliminate one onerous suspect and provide an evocative hint.


Applied Psychological Measurement | 1991

An investigation of ordinal true score test theory

John R. Donoghue; Norman Cliff

The validity of the assumptions underlying Cliff's (1989) ordinal true score theory (OTST) was investigated in a three-stage study. OTST makes only ordinal assumptions about the data and provides a means of converting ordinal item information into summary ordinal information about examinees. Stage 1 was a simulation based on a classical (weak true score) test theory model. Stage 2 used a long empirical test to approximate the true order. Stage 3 was an extensive simulation based on the three-parameter logistic model. The results of all three studies were consistent: the assumption of local ordinal uncorrelatedness was violated in that partial item-item gamma (γ) correlations were positive instead of 0. The assumption of proportional distribution of ties was violated: pairs tied on one item were not distributed on the other as prescribed. The item-to-true-order tau (τ) correlation was consistently overestimated, although the estimated τ correlated highly with the true τ. The τ correlation between total score and true order was also consistently overestimated. Stage 3 showed that these effects occurred under all conditions, although they were smaller under some conditions.
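The ordinal statistics in this abstract are built from counts of concordant and discordant pairs. The sketch below shows Goodman-Kruskal gamma, which ignores tied pairs, next to Kendall's tau-a, which keeps them in the denominator; it illustrates why ties on dichotomous items matter so much in OTST. Which tau variant OTST uses is not stated here, so tau-a is an assumption, and the partial (conditional) gamma examined in the study is not shown.

```python
from itertools import combinations

def pair_counts(x, y):
    """Count concordant and discordant pairs; pairs tied on x or y count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return concordant, discordant

def gamma(x, y):
    """Goodman-Kruskal gamma: (C - D) / (C + D), tied pairs excluded."""
    c, d = pair_counts(x, y)
    return (c - d) / (c + d)

def tau_a(x, y):
    """Kendall's tau-a: (C - D) over all n(n - 1)/2 pairs, ties included."""
    c, d = pair_counts(x, y)
    n = len(x)
    return (c - d) / (n * (n - 1) / 2)

# A dichotomous item against a strict ordering of four examinees.
item, order = [0, 0, 1, 1], [1, 2, 3, 4]
```

For this item, every untied pair is concordant, so gamma is 1, while tau-a is only 4/6 because two of the six pairs are tied on the item.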


ETS Research Report Series | 1998

Implementing Shaffer's Multiple Comparison Procedure for Large Numbers of Groups

John R. Donoghue

Shaffer (1986) presented a multiple comparison procedure for paired comparisons that had superior power to many available alternatives. Unfortunately, use of the procedure has been severely limited by the complexity of its implementation. This paper presents a method of allowing Shaffer's procedure to be used in a much larger class of problems. Theoretical results concerning an aspect of Shaffer's procedure are derived. The use of these results in implementing the procedure is then described. Next, two heuristics are described which greatly enhance the efficiency of the method. Finally, an efficient algorithmic method of implementing Shaffer's procedure is outlined. The present work allows Shaffer's procedure for multiple comparisons to be applied to large numbers of groups. Unlike earlier work, no assumptions about the testing situation need be made. These methods are illustrated by application to two data sets: comparisons of the order of 11 clustering methods, and comparison of the means of 44 states from the 1994 National Assessment of Educational Progress Trial State Assessment of Reading at Grade 4.
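The core idea behind Shaffer's procedure is that, for all pairwise comparisons among k groups, only certain numbers of null hypotheses can be true at once: each true configuration partitions the groups into blocks of equal means, giving Σ C(k_i, 2) true hypotheses. The sketch below is the simpler step-down variant from Shaffer (1986), which at step i divides alpha by the largest possible count not exceeding m - i + 1. It is not the implementation presented in this report, which addresses the harder bookkeeping needed for large numbers of groups; function names are illustrative.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def possible_true_counts(k):
    """Possible numbers of simultaneously true H0: mu_i = mu_j among k groups."""
    if k == 0:
        return frozenset({0})
    counts = set()
    for j in range(1, k + 1):  # size of the block containing the first group
        for rest in possible_true_counts(k - j):
            counts.add(comb(j, 2) + rest)
    return frozenset(counts)

def shaffer_step_down(p_values, k, alpha=0.05):
    """p_values: all k(k-1)/2 pairwise p-values. Returns reject flags in input order."""
    m = len(p_values)
    allowed = sorted(possible_true_counts(k))
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order, start=1):
        # Largest number of hypotheses that could still be true at this step.
        t = max(c for c in allowed if c <= m - step + 1)
        if p_values[idx] > alpha / t:
            break  # step-down: stop at the first non-rejection
        reject[idx] = True
    return reject
```

With three groups the possible counts are 0, 1, and 3, so after the first rejection the next comparison is tested at the full alpha, where Holm's procedure would use alpha / 2. For example, p-values of 0.01, 0.04, and 0.2 lead to two rejections here but only one under Holm.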


Applied Psychological Measurement | 2018

On the Performance of the Marginal Homogeneity Test to Detect Rater Drift

Adrienne Sgammato; John R. Donoghue

When constructed response items are administered repeatedly, “trend scoring” can be used to test for rater drift. In trend scoring, raters rescore responses from the previous administration. Two simulation studies evaluated the utility of Stuart’s Q measure of marginal homogeneity as a way of evaluating rater drift when monitoring trend scoring. In the first study, data were generated based on trend scoring tables obtained from an operational assessment. The second study tightly controlled table margins to disentangle certain features present in the empirical data. In addition to Q, the paired t test was included as a comparison, because of its widespread use in monitoring trend scoring. Sample size, number of score categories, interrater agreement, and symmetry/asymmetry of the margins were manipulated. For identical margins, both statistics had good Type I error control. For a unidirectional shift in margins, both statistics had good power. As expected, when shifts in the margins were balanced across categories, the t test had little power. Q demonstrated good power for all conditions and identified almost all items identified by the t test. Q shows substantial promise for monitoring of trend scoring.
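Stuart's Q tests whether the two margins of a square table agree. For a K x K table of first scores (rows) by trend rescores (columns), it uses the first K - 1 margin differences d_i = n_i. - n_.i and the covariance matrix with diagonal n_i. + n_.i - 2n_ii and off-diagonal -(n_ij + n_ji), giving Q = dᵀV⁻¹d on K - 1 df. The sketch assumes V is nonsingular (for example, no empty score categories); the tables in the tests are made up.

```python
import numpy as np

def stuart_q(table):
    """Stuart's marginal homogeneity statistic for a K x K table (df = K - 1)."""
    n = np.asarray(table, dtype=float)
    k = n.shape[0]
    row, col = n.sum(axis=1), n.sum(axis=0)
    # One category is dropped because both margins share the same total.
    d = (row - col)[: k - 1]
    v = -(n + n.T)[: k - 1, : k - 1]          # off-diagonal: -(n_ij + n_ji)
    idx = np.arange(k - 1)
    v[idx, idx] = (row + col - 2 * np.diag(n))[: k - 1]
    return float(d @ np.linalg.solve(v, d))

# Balanced shift: rescoring pulls responses toward the middle category.
balanced = [[10, 5, 0],
            [0, 20, 0],
            [0, 5, 10]]
```

In the `balanced` table both margins have a mean score of 1, so a paired t test on score differences has nothing to detect, yet Q = 10 on 2 df. This is the kind of balanced shift in which, as reported above, the t test had little power. For a 2 x 2 table, Q reduces to McNemar's statistic.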


Journal of Educational Measurement | 1993

Assessment of Differential Item Functioning for Performance Tasks

Rebecca Zwick; John R. Donoghue; Angela Grima


Education Statistics Quarterly | 1999

State Aid for Undergraduates in Postsecondary Education.

Nancy L. Allen; John R. Donoghue; Terry L. Schoeps

Collaboration


Top Co-Authors

James E. Laughlin

University of South Carolina

James S. Roberts

Medical University of South Carolina

Howard Wainer

National Board of Medical Examiners

Thomas Saka

United States Department of State