
Publication


Featured research published by Ronald J. Nungester.


Journal of Educational and Behavioral Statistics | 2002

Analysis of Differential Item Functioning (DIF) Using Hierarchical Logistic Regression Models

David B. Swanson; Brian E. Clauser; Susan M. Case; Ronald J. Nungester; Carol Morrison Featherman

Over the past 25 years a range of parametric and nonparametric methods have been developed for analyzing Differential Item Functioning (DIF). These procedures are typically performed for each item individually or for small numbers of related items. Because the analytic procedures focus on individual items, it has been difficult to pool information across items to identify potential sources of DIF analytically. In this article, we outline an approach to DIF analysis using hierarchical logistic regression that makes it possible to combine results of logistic regression analyses across items to identify consistent sources of DIF, to quantify the proportion of explained variation in DIF coefficients, and to compare the predictive accuracy of alternate explanations for DIF. The approach can also be used to improve the accuracy of DIF estimates for individual items by applying empirical Bayes techniques, with DIF-related item characteristics serving as collateral information. To illustrate the hierarchical logistic regression procedure, we use a large data set derived from recent computer-based administrations of Step 2, the clinical science component of the United States Medical Licensing Examination (USMLE®). Results of a small Monte Carlo study of the accuracy of the DIF estimates are also reported.
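
The two-level structure described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: item-level logistic regressions estimate a uniform-DIF coefficient for each item, and a second-level regression of those coefficients on a hypothetical item characteristic (here called vignette_item) quantifies how much of the between-item variation in DIF it explains. All data are simulated.

```python
# Illustrative sketch (not the authors' code) of two-level logistic-regression DIF.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_examinees, n_items = 2000, 10

theta = rng.normal(size=n_examinees)              # matching/ability variable
group = rng.integers(0, 2, size=n_examinees)      # 0 = reference, 1 = focal group
vignette_item = rng.integers(0, 2, size=n_items)  # hypothetical item characteristic
true_dif = 0.3 * vignette_item + rng.normal(0, 0.05, size=n_items)

# Level 1: for each item, a logistic regression of the response on ability and
# group membership; the group coefficient is that item's (uniform) DIF estimate.
dif_est = []
for j in range(n_items):
    p = 1 / (1 + np.exp(-(theta + true_dif[j] * group - 0.2 * j)))
    y = rng.binomial(1, p)
    X = sm.add_constant(np.column_stack([theta, group]))
    dif_est.append(sm.Logit(y, X).fit(disp=0).params[2])

# Level 2: regress the item-level DIF coefficients on the item characteristic to
# identify a systematic source of DIF and the share of variation it explains.
X2 = sm.add_constant(vignette_item.astype(float))
level2 = sm.OLS(np.asarray(dif_est), X2).fit()
print("level-2 slope:", round(level2.params[1], 3), "R^2:", round(level2.rsquared, 3))
```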


Archive | 2000

Computer-Adaptive Sequential Testing

Richard M. Luecht; Ronald J. Nungester

This chapter describes a framework for the large-scale production and administration of computerized tests called computer-adaptive sequential testing, or CAST (Luecht, Nungester & Hadadi, 1996; Luecht, 1997; Luecht & Nungester, 1998). CAST integrates test design, test assembly, test administration, and data management components in a comprehensive manner intended to support the mass production of secure, high-quality, parallel test forms over time. The framework is a modular approach to testing that makes use of modern psychometric and computer technologies. CAST was originally conceived as a test design methodology for developing computer-adaptive versions of the United States Medical Licensing Examination™ (USMLE™) Steps (Federation of State Medical Boards and National Board of Medical Examiners, 1999). The USMLE Steps are high-stakes examinations used to evaluate the medical knowledge of candidate physicians as part of the medical licensure process in the U.S. Although the Steps are primarily mastery tests, used for making pass/fail licensure decisions, total test scores and discipline-based subscores are also reported. Therefore, an adaptive-testing component is also attractive to economize on test length while maximizing score precision. The CAST framework has been successfully used in two empirical computerized field trials for the USMLE Step 1 and 2 examinations (NBME, 1996, 1997; Case, Luecht and Swanson, 1998; Luecht, Nungester, Swanson and Hadadi, 1998; Swanson, Luecht, Gessaroli, and Nungester, 1998). In addition, CAST has undergone rather extensive simulation research with positive outcomes (Luecht, Nungester and Hadadi, 1996; Luecht and Nungester, 1998). In terms of USMLE, what CAST offers is a comprehensive means to develop mastery tests with adaptive capabilities. CAST also helps
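
As a rough illustration of the "adaptive sequential" idea (module-level routing between pre-assembled test modules rather than item-level adaptation), here is a minimal Python sketch of a two-stage design. The module names, routing thresholds, and scoring below are hypothetical assumptions for illustration, not details drawn from the CAST papers.

```python
# Minimal sketch of multistage routing; design, thresholds, and modules are hypothetical.
from dataclasses import dataclass

@dataclass
class Module:
    name: str
    n_items: int
    mean_difficulty: float   # on a logit scale, purely illustrative

STAGE2 = [Module("easy", 30, -1.0), Module("medium", 30, 0.0), Module("hard", 30, 1.0)]

def route(number_correct: int, n_items: int) -> Module:
    """Route to a pre-assembled second-stage module by proportion correct on the
    routing module. A real CAST design would route on IRT-based provisional
    scores and apply content and exposure controls."""
    prop = number_correct / n_items
    if prop < 0.45:
        return STAGE2[0]
    if prop < 0.75:
        return STAGE2[1]
    return STAGE2[2]

print(route(22, 30).name)   # an examinee with 22/30 on the routing module
```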


Academic Medicine | 1991

Phase-in of the NBME comprehensive Part I examination.

David B. Swanson; S. M. Case; P. R. Kelley; J. L. Lawley; Ronald J. Nungester; R. D. Powell; R. L. Volle

No abstract available.


Academic Medicine | 2001

Classification accuracy for tests that allow retakes.

Brian E. Clauser; Ronald J. Nungester

When tests are used to make classification decisions, estimates of decision consistency and false-positive and false-negative error rates may be more appropriate than reliability as a means of characterizing precision. The decision-theoretic framework that supports this approach is not new, and numerous authors have recommended indices for use in this context. However, one aspect of this framework that has particular significance for licensure and certification testing and yet has received relatively little attention is the impact of retakes. When an examinee receives a passing score on a licensure test, a final classification is typically made. This may be a correct classification or it may be incorrect. (In this paper, when a non-proficient examinee is classified as proficient, it will be referred to as a false-positive error.) When an examinee receives a failing score, the examinee will usually be given the opportunity to repeat the test. An ultimate failing classification will occur only when the examinee has given up or exhausted the allowable retake opportunities. Millman described this effect and noted that this process provides necessary protection for the examinee. The examinee’s proficiency is not measured without error. When a proficient examinee fails a licensure examination due to measurement error, it is appropriate that the examinee should have an additional opportunity to demonstrate proficiency. However, providing additional opportunities for testing not only corrects errors that would penalize proficient examinees but also creates errors that favor non-proficient examinees. These errors may put the public at risk by allowing unqualified candidates to become licensed or certified. This paper presents a theoretical framework describing the factors that influence classification error rates over multiple administrations of a test. It then considers some of the strategies that are available for controlling the false-positive error rate. The main emphasis of the paper is to provide the reader with a realistic sense of the magnitude of the inflation of false-positive errors that may result when retakes are allowed and to provide a framework for considering what, if anything, should be done to control this inflation. A starting point for this discussion is to ask whether allowing retakes has a significant impact on the classification error rate that occurs under typical practice conditions. Consider the example of a test with a reliability of .92, applied to a normally distributed examinee group where 10% of examinees are non-proficient. If the cut-score is established to fail 10% of examinees (in the population), with a single administration approximately 2% of all examinees might be expected to be misclassified as proficient. (Since 10% of the examinees are non-proficient, this is a false-positive rate of 20%). After two retakes this rate is doubled. A false-positive rate this high based on a single administration would be expected from a test with a reliability of approximately .69. If the primary purpose of a licensure examination is protection of the public from non-proficient practitioners, it is clear that, in the circumstances described in this example, retakes do significantly impact the classification error rate, and thus the effectiveness of the test.
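
The numerical example in the abstract (reliability .92, 10% of examinees non-proficient, a cut-score failing 10% per administration, and up to two retakes) can be checked with a short simulation under a classical true-score model. The sketch below is written under those stated assumptions; it is not the paper's own computation, and the simulated rates are approximate.

```python
# Simulation sketch of false-positive inflation with retakes, under a classical
# true-score model with the abstract's assumptions; results are approximate.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, reliability, retakes = 1_000_000, 0.92, 2

true = rng.normal(size=n)
nonproficient = true < norm.ppf(0.10)   # bottom 10% are truly non-proficient
cut = norm.ppf(0.10)                    # observed-score cut fails 10% per sitting

def administer():
    # Observed score with the stated reliability (squared true/observed correlation).
    return np.sqrt(reliability) * true + np.sqrt(1 - reliability) * rng.normal(size=n)

passed = administer() >= cut            # first administration
fp_single = np.mean(passed & nonproficient)
for _ in range(retakes):                # only examinees who have not yet passed can change status
    passed |= administer() >= cut
fp_after_retakes = np.mean(passed & nonproficient)

print(f"false positives, single administration: {fp_single:.3%} of all examinees")
print(f"false positives, after two retakes:     {fp_after_retakes:.3%} of all examinees")
```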


Advances in Health Sciences Education | 1998

Maintaining Content Validity in Computerized Adaptive Testing

R. M. Luecht; A.F. De Champlain; Ronald J. Nungester

A major advantage of using computerized adaptive testing (CAT) is improved measurement efficiency; better score reliability or more accurate mastery decisions can result from targeting item selections to the abilities of examinees. However, this type of engineering solution can result in differential content for different examinees at various levels of ability. This paper empirically demonstrates some of the trade-offs that can occur when content balancing is imposed in CAT forms or, conversely, when it is ignored. That is, the content validity of a CAT form can actually change across a score scale when content balancing is ignored. On the other hand, efficiency and score precision can be severely reduced by over-specifying content restrictions in a CAT form. The results from two simulation studies are presented as a means of highlighting some of the trade-offs that could occur between content and statistical considerations in CAT form assembly.
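
The trade-off can be illustrated with a small sketch of maximum-information item selection under the 2PL model, with and without a simple content constraint. The item pool, content areas, and quotas below are simulated assumptions, not the design studied in the paper.

```python
# Sketch of content-constrained vs. unconstrained adaptive item selection (2PL).
import numpy as np

rng = np.random.default_rng(2)
n_pool = 300
a = rng.uniform(0.5, 2.0, n_pool)        # discriminations
b = rng.normal(0.0, 1.0, n_pool)         # difficulties
content = rng.integers(0, 4, n_pool)     # four hypothetical content areas
quota = {0: 10, 1: 10, 2: 10, 3: 10}     # per-area targets for a 40-item CAT

def information(theta, a, b):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def next_item(theta, used, counts, balance_content):
    """Most informative unused item; with balancing, only areas below quota are eligible."""
    eligible = ~used
    if balance_content:
        open_areas = [c for c, q in quota.items() if counts.get(c, 0) < q]
        eligible &= np.isin(content, open_areas)
    idx = np.flatnonzero(eligible)
    return idx[np.argmax(information(theta, a[idx], b[idx]))]

# Example: select one item for an examinee at theta = 1.2 when area 3 is already full.
used = np.zeros(n_pool, dtype=bool)
print(next_item(1.2, used, {3: 10}, balance_content=True))
```

Without the constraint, selection chases statistical information and item content can drift with ability level; with the constraint, every examinee sees the same content mix at some cost in precision.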


Academic Medicine | 1997

An evaluation of the Rasch model for equating multiple forms of a performance assessment of physicians' patient-management skills.

Brian E. Clauser; Linette P. Ross; Ronald J. Nungester; Stephen G. Clyman

No abstract available.


Academic Medicine | 1992

Results of the initial administrations of the NBME comprehensive Part I and Part II examinations.

D F Becker; David B. Swanson; Susan M. Case; Ronald J. Nungester

No abstract available.


Archive | 1997

Regression-Based Weighting of Items on Standardized Patient Checklists

Brian E. Clauser; Melissa J. Margolis; Linette P. Ross; Ronald J. Nungester; Daniel J. Klass

The use of checklists for scoring standardized patient evaluations reduces the rating task to recording whether a defined behaviour was or was not displayed. Although this may enhance objectivity, checklist-based scoring has the potential limitation that it may fail to account for the complexity of the judgment process used by experts. Minimally, it is clear that experts may consider the behaviours represented by some checklist items to be more important than others. The research described in this paper examines the potential to increase the correspondence between checklist scores and clinician ratings by weighting checklist items using regression-derived item weights. Results show that the expected increase in correspondence between checklist scores and clinician ratings occurred for all cases in which the correlation between the unweighted checklist scores and the ratings was less than .92. Cross-validation of the results with an independent set of ratings is provided, as are generalizability analyses of the ratings and of the weighted and unweighted scores.
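
A minimal sketch of the weighting idea, using simulated data rather than the standardized-patient data analyzed in the paper: ordinary least squares regresses clinician global ratings on the 0/1 checklist items in a calibration sample, and the fitted weights are then used to score a held-out cross-validation sample.

```python
# Sketch of regression-derived checklist weights; all data are simulated.
import numpy as np

rng = np.random.default_rng(3)
n_encounters, n_items = 400, 15
checklist = rng.integers(0, 2, size=(n_encounters, n_items)).astype(float)
true_w = rng.uniform(0.0, 3.0, n_items)                           # items differ in importance
rating = checklist @ true_w + rng.normal(0.0, 2.0, n_encounters)  # clinician global rating

half = n_encounters // 2                                          # calibration / cross-validation split
X_cal = np.column_stack([np.ones(half), checklist[:half]])
weights = np.linalg.lstsq(X_cal, rating[:half], rcond=None)[0]

unweighted = checklist[half:].sum(axis=1)
weighted = weights[0] + checklist[half:] @ weights[1:]
print("r(unweighted score, rating):", round(np.corrcoef(unweighted, rating[half:])[0, 1], 3))
print("r(weighted score,   rating):", round(np.corrcoef(weighted, rating[half:])[0, 1], 3))
```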


Archive | 1997

Using the Rasch Model to Equate Alternate Forms for Performance Assessments of Physician’s Clinical Skills

Brian E. Clauser; Linette P. Ross; R. M. Luecht; Ronald J. Nungester; Stephen G. Clyman

In circumstances where performance assessments of physicians’ clinical skills are used to make important promotional or curricular decisions, it may be necessary to produce multiple equivalent forms of the assessment. Relatively little has been reported regarding appropriate methods for establishing equivalence across such forms. The purpose of this paper is to examine the potential usefulness of the Rasch model for equating forms of a computer-based simulation of physicians’ clinical skills. In addition to assessing the fit of the model to the test data, the paper provides a comparison of the Rasch model to other approaches that have been used to equate clinical skills assessments (e.g., standardized-patient-based examinations). The potential advantages of the Rasch model are discussed, as are considerations for its application.
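
Because the Rasch model identifies item difficulties only up to a common translation, alternate forms can be linked through items they share. The sketch below shows a mean/mean common-item link with hypothetical difficulty estimates; it illustrates the general technique, not the equating procedure used in the paper.

```python
# Sketch of common-item (anchor) equating under the Rasch model; values are hypothetical.
import numpy as np

# Calibrated difficulties of the anchor items on each form (illustrative numbers).
b_anchor_formA = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_anchor_formB = np.array([-0.9, -0.1, 0.4, 1.1, 1.8])   # Form B calibration is shifted

shift = np.mean(b_anchor_formA - b_anchor_formB)          # mean/mean link constant

def to_formA_scale(b_formB):
    """Place a Form B difficulty (or ability) estimate on Form A's scale."""
    return b_formB + shift

print("estimated link constant:", round(shift, 3))
print("Form B item of difficulty 0.6 on Form A's scale:", round(to_formA_scale(0.6), 3))
```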


JAMA | 1995

Performance on the NBME Part I Examination-Reply

Beth Dawson; Carrolyn K. Iwamoto; Linette Postell Ross; Ronald J. Nungester; David B. Swanson; Robert L. Volle

In Reply.—The statement referenced by Dr Nickens follows our description of various programs under way at some medical schools to assist students with relatively poor academic backgrounds and our subsequent suggestion that research investigating the effectiveness of these programs is needed. Some programs focus on helping students improve skills so they become more successful candidates for medical school. For instance, over the past 20 years, more than 600 students have participated in the Medical/Dental Education Preparatory Program at Southern Illinois University School of Medicine, and 60% of them have been admitted to health professions schools. Other programs, such as the one at the University of Hawaii John A. Burns School of Medicine, work with students to improve study skills after they have matriculated. We recognize that Project 3000 by 2000 aims to increase the number of minority students actually enrolled in medical schools and did not intend to

Collaboration


Ronald J. Nungester's top co-authors and their affiliations.

Top Co-Authors

Brian E. Clauser (National Board of Medical Examiners)
David B. Swanson (National Board of Medical Examiners)
Linette P. Ross (National Board of Medical Examiners)
Stephen G. Clyman (National Board of Medical Examiners)
Melissa J. Margolis (National Board of Medical Examiners)
Beth Dawson (Southern Illinois University Carbondale)
Daniel J. Klass (National Board of Medical Examiners)
Douglas R. Ripkey (National Board of Medical Examiners)
R. M. Luecht (National Board of Medical Examiners)
Richard M. Luecht (University of North Carolina at Greensboro)