Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Brian E. Clauser is active.

Publication


Featured research published by Brian E. Clauser.


Health Services Research | 2002

A demonstration of the impact of response bias on the results of patient satisfaction surveys

Kathleen M. Mazor; Brian E. Clauser; Terry S. Field; Robert A. Yood; Jerry H. Gurwitz

OBJECTIVES The purposes of the present study were to examine patient satisfaction survey data for evidence of response bias, and to demonstrate, using simulated data, how response bias may impact interpretation of results. DATA SOURCES Patient satisfaction ratings of primary care providers (family practitioners and general internists) practicing in the context of a group-model health maintenance organization and simulated data generated to be comparable to the actual data. STUDY DESIGN Correlational analysis of actual patient satisfaction data, followed by a simulation study where response bias was modeled, with comparison of results from biased and unbiased samples. PRINCIPAL FINDINGS A positive correlation was found between mean patient satisfaction rating and response rate in the actual patient satisfaction data. Simulation results suggest response bias could lead to overestimation of patient satisfaction overall, with this effect greatest for physicians with the lowest satisfaction scores. CONCLUSIONS Findings suggest that response bias may significantly impact the results of patient satisfaction surveys, leading to overestimation of the level of satisfaction in the patient population overall. Estimates of satisfaction may be most inflated for providers with the least satisfied patients, thereby threatening the validity of provider-level comparisons.
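
To illustrate the mechanism this study describes, the following is a minimal simulation sketch, not the authors' simulation: the ratings, response-probability model, and all parameter values are assumptions chosen only to show how a response rate that rises with satisfaction inflates the mean computed from responders.

```python
# Minimal sketch (not the authors' simulation): patients with lower
# satisfaction are assumed to be less likely to return the survey, so the
# mean computed from responders overestimates true satisfaction.
import numpy as np

rng = np.random.default_rng(0)

n_patients = 10_000
# Hypothetical true satisfaction ratings on a 1-5 scale (assumed distribution).
true_ratings = rng.choice([1, 2, 3, 4, 5], size=n_patients,
                          p=[0.05, 0.10, 0.20, 0.35, 0.30])

# Assumed response-bias model: response probability rises with satisfaction.
response_prob = 0.2 + 0.15 * (true_ratings - 1)   # 0.20 ... 0.80
responded = rng.random(n_patients) < response_prob

print(f"True mean satisfaction:      {true_ratings.mean():.2f}")
print(f"Mean among responders only:  {true_ratings[responded].mean():.2f}")
print(f"Response rate:               {responded.mean():.1%}")
```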


Educational and Psychological Measurement | 1992

The effect of sample size on the functioning of the Mantel-Haenszel statistic

Kathleen M. Mazor; Brian E. Clauser; Ronald K. Hambleton

The Mantel-Haenszel (MH) procedure has become one of the most popular procedures for detecting differential item functioning. Valid results with relatively small numbers of examinees are one of the advantages typically attributed to this procedure. In this study, examinee item responses were simulated to contain differentially functioning items and then were analyzed at five sample sizes to compare detection rates. Results showed the MH procedure missed 25 to 30% of the differentially functioning items when groups of 2,000 were used. When 500 or fewer examinees were retained in each group, more than 50% of the differentially functioning items were missed. The items most likely to go undetected were those that were most difficult, those with a small difference in item difficulty between the two groups, and those that discriminated poorly.
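
For readers unfamiliar with the statistic, here is a minimal sketch of the standard Mantel-Haenszel DIF chi-square computed from score-level 2x2 tables. It is not code from the study; the example tables and counts are made up purely for illustration.

```python
# Minimal sketch of the Mantel-Haenszel DIF chi-square (standard formula,
# not code from the study). Examinees are stratified by total score; at each
# score level the 2x2 table counts correct/incorrect responses for the
# reference and focal groups.
import numpy as np

def mh_chi_square(tables):
    """tables: iterable of 2x2 arrays [[A, B], [C, D]] per score level,
    where A/B = reference correct/incorrect, C/D = focal correct/incorrect."""
    A = E = V = 0.0
    for t in tables:
        (a, b), (c, d) = np.asarray(t, dtype=float)
        n = a + b + c + d
        if n < 2:
            continue
        A += a
        E += (a + b) * (a + c) / n                       # expected A under no DIF
        V += (a + b) * (c + d) * (a + c) * (b + d) / (n**2 * (n - 1))
    return (abs(A - E) - 0.5) ** 2 / V                   # continuity-corrected

# Illustrative tables for one item at three score levels (made-up counts).
item_tables = [
    [[40, 20], [25, 35]],
    [[60, 15], [45, 30]],
    [[80, 10], [70, 20]],
]
print(f"MH chi-square: {mh_chi_square(item_tables):.2f}")  # compare to chi2(1) cutoff 3.84
```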


Educational and Psychological Measurement | 1994

IDENTIFICATION OF NONUNIFORM DIFFERENTIAL ITEM FUNCTIONING USING A VARIATION OF THE MANTEL-HAENSZEL PROCEDURE

Kathleen M. Mazor; Brian E. Clauser; Ronald K. Hambleton

The Mantel-Haenszel (MH) procedure has become one of the most popular procedures for detecting differential item functioning (DIF). One of the most troublesome criticisms of this procedure is that whereas detection rates for uniform DIF are very good, the procedure is not sensitive to nonuniform DIF. In this study, examinee responses were generated to simulate both uniform and nonuniform DIF. A standard MH procedure was used first. Then, examinees were split into two samples by breaking the full sample at approximately the middle of the test score distribution. The tests were then reanalyzed, first with the low-performing sample and then with the high-performing sample. This variation improved detection rates of nonuniform DIF considerably over the total sample procedure without increasing the Type I error rate. Items with the largest differences in discrimination and difficulty parameters were most likely to be identified.
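
The split-sample variation can be sketched as follows. This is an illustrative outline rather than the study's code: the helper names are hypothetical, the data are assumed to be a 0/1 response matrix, and the mh_chi_square function is the one sketched after the previous abstract, passed in here as a parameter.

```python
# Minimal sketch of the split-sample variation (illustrative, not study code):
# split examinees near the middle of the total-score distribution and run the
# Mantel-Haenszel analysis separately within the low- and high-scoring halves,
# so DIF that reverses direction across ability (nonuniform DIF) is not
# cancelled out in the pooled analysis.
import numpy as np

def score_level_tables(responses, item, group, total):
    """Build one 2x2 table per total-score level for a single item.
    responses: (n_examinees, n_items) 0/1 matrix; group: 0 = reference, 1 = focal."""
    tables = []
    for s in np.unique(total):
        at_s = total == s
        r, f = at_s & (group == 0), at_s & (group == 1)
        tables.append([[responses[r, item].sum(), (1 - responses[r, item]).sum()],
                       [responses[f, item].sum(), (1 - responses[f, item]).sum()]])
    return tables

def split_sample_mh(responses, item, group, mh_chi_square):
    """Run MH separately in the low- and high-scoring halves of the sample."""
    total = responses.sum(axis=1)
    low = total <= np.median(total)          # split near the middle of the distribution
    return {
        "low":  mh_chi_square(score_level_tables(responses[low], item, group[low], total[low])),
        "high": mh_chi_square(score_level_tables(responses[~low], item, group[~low], total[~low])),
    }
```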


Applied Psychological Measurement | 2000

Recurrent Issues and Recent Advances in Scoring Performance Assessments

Brian E. Clauser

A conceptual framework is provided to guide the development of scoring procedures for performance assessments. Within this framework are four questions: (1) what aspects of the performance are to be scored, (2) what criteria are to be applied to evaluate the identified aspects or components of the performance, (3) how should the scoring criteria be developed, and (4) how should these criteria be applied? The historical limitations of performance assessments are reviewed and recent efforts to avoid those limitations are discussed in the context of these questions.


Academic Medicine | 2006

Use of the mini-clinical evaluation exercise to rate examinee performance on a multiple-station clinical skills examination: a validity study.

Melissa J. Margolis; Brian E. Clauser; Monica M. Cuddy; Andrea Ciccone; Janet Mee; Polina Harik; Richard E. Hawkins

Background Multivariate generalizability analysis was used to investigate the performance of a commonly used clinical evaluation tool. Method Practicing physicians were trained to use the mini-Clinical Evaluation Exercise (mini-CEX) rating form to rate performances from the United States Medical Licensing Examination Step 2 Clinical Skills examination. Results Differences in rater stringency made the greatest contribution to measurement error; more raters rating each examinee, even on fewer occasions, could enhance score stability. Substantial correlated error across the competencies suggests that decisions about one scale unduly influence those on others. Conclusions Given the appearance of a halo effect across competencies, score interpretations that assume assessment of distinct dimensions of clinical performance should be made with caution. If the intention is to produce a single composite score by combining results across competencies, the presence of these effects may be less critical.
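
The logic behind "more raters rating each examinee could enhance score stability" can be shown with a small decision-study calculation. The variance components below are assumed values, not estimates from this study; the sketch only illustrates how error attributable to rater stringency and the residual shrinks as ratings are averaged over more raters in a persons-by-raters design.

```python
# Minimal decision-study sketch (assumed variance components, not estimates
# from the study): the dependability (Phi) coefficient for absolute decisions
# in a persons x raters design rises as the number of raters increases.
var_examinee = 0.50   # universe-score variance -- assumed
var_rater    = 0.30   # rater stringency variance -- assumed
var_residual = 0.40   # examinee-by-rater interaction + error -- assumed

def phi_coefficient(n_raters):
    """Dependability for absolute decisions in a persons x raters design."""
    abs_error = (var_rater + var_residual) / n_raters
    return var_examinee / (var_examinee + abs_error)

for n in (1, 2, 4, 8):
    print(f"{n} rater(s): Phi = {phi_coefficient(n):.2f}")
```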


Journal of Educational and Behavioral Statistics | 2002

Analysis of Differential Item Functioning (DIF) Using Hierarchical Logistic Regression Models

David B. Swanson; Brian E. Clauser; Susan M. Case; Ronald J. Nungester; Carol Morrison Featherman

Over the past 25 years a range of parametric and nonparametric methods have been developed for analyzing Differential Item Functioning (DIF). These procedures are typically performed for each item individually or for small numbers of related items. Because the analytic procedures focus on individual items, it has been difficult to pool information across items to identify potential sources of DIF analytically. In this article, we outline an approach to DIF analysis using hierarchical logistic regression that makes it possible to combine results of logistic regression analyses across items to identify consistent sources of DIF, to quantify the proportion of explained variation in DIF coefficients, and to compare the predictive accuracy of alternate explanations for DIF. The approach can also be used to improve the accuracy of DIF estimates for individual items by applying empirical Bayes techniques, with DIF-related item characteristics serving as collateral information. To illustrate the hierarchical logistic regression procedure, we use a large data set derived from recent computer-based administrations of Step 2, the clinical science component of the United States Medical Licensing Examination (USMLE®). Results of a small Monte Carlo study of the accuracy of the DIF estimates are also reported.
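
A two-stage sketch of the general idea follows. It is illustrative rather than the authors' exact model, and it assumes statsmodels is available: stage 1 fits a per-item logistic regression of the response on ability (total score) and group membership, with the group coefficient serving as the item's DIF estimate; stage 2 regresses those estimates on DIF-related item characteristics and applies empirical Bayes shrinkage toward the second-level prediction. The function names and the method-of-moments variance estimate are assumptions for the sketch.

```python
# Minimal two-stage sketch of a hierarchical logistic-regression DIF analysis
# (illustrative, not the authors' exact model).
import numpy as np
import statsmodels.api as sm

def dif_estimates(responses, group):
    """responses: (n_examinees, n_items) 0/1 matrix; group: 0/1 vector."""
    total = responses.sum(axis=1)
    betas, ses = [], []
    for j in range(responses.shape[1]):
        X = sm.add_constant(np.column_stack([total, group]))
        fit = sm.Logit(responses[:, j], X).fit(disp=0)
        betas.append(fit.params[2])   # group coefficient = item's DIF estimate
        ses.append(fit.bse[2])
    return np.array(betas), np.array(ses)

def empirical_bayes(betas, ses, item_features):
    """item_features: (n_items, k) matrix of DIF-related item characteristics."""
    X = sm.add_constant(item_features)
    second_level = sm.OLS(betas, X).fit()            # pooled model across items
    predicted = second_level.fittedvalues
    # Rough between-item variance estimate (method of moments) -- assumed approach.
    tau2 = max(second_level.mse_resid - np.mean(ses**2), 0.0)
    weight = tau2 / (tau2 + ses**2)                  # reliability of each item's estimate
    return weight * betas + (1 - weight) * predicted # shrink toward predicted DIF
```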


Advances in Health Sciences Education | 1999

CLINICAL SKILLS ASSESSMENT WITH STANDARDIZED PATIENTS IN HIGH-STAKES TESTS: A FRAMEWORK FOR THINKING ABOUT SCORE PRECISION, EQUATING, AND SECURITY

David B. Swanson; Brian E. Clauser; Susan M. Case

Over the past decade, there has been a dramatic increase in the use of standardized patients (SPs) for assessment of clinical skills in high-stakes testing situations. This paper provides a framework for thinking about three inter-related issues that remain problematic in high-stakes use of SP-based tests: methods for estimating the precision of scores; procedures for placing (equating) scores from different test forms onto the same scale; and threats to the security of SP-based exams. While generalizability theory is now commonly employed to analyze factors influencing the precision of test scores, it is very common for investigators to use designs that do not appropriately represent the complexity of SP-based test administration. Development of equating procedures for SP-based tests is in its infancy, largely utilizing methods adapted from multiple-choice testing. Despite the obvious importance of adjusting scores on alternate test forms to reduce measurement error and ensure equitable treatment of examinees, equating procedures are not typically employed. Research on security to date has been plagued by serious methodological problems, and procedures that seem likely to aid in maintaining security tend to increase the complexity of test construction and administration, as well as the analytic methods required to examine precision and equate scores across test forms. Recommendations are offered for improving research and use of SP-based assessment in high-stakes tests.

Over the past decade, the use of standardized patients (SPs) for assessment of clinical skills has increased dramatically. In North America, it is now common for SPs to be used in high-stakes tests. Dozens of medical schools have now instituted "Clinical Practice Exams" that students take during their senior year (Association of American Medical Colleges, 1998). At many of these schools students must pass these exams to graduate; those who fail are typically assigned to remedial work before retesting.
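
As a point of reference for the equating issue the paper raises, here is a minimal sketch of linear equating, one of the standard methods adapted from multiple-choice testing. It is shown only to illustrate what placing scores from different test forms onto the same scale involves; the scores are made up, and the paper does not prescribe this particular method for SP-based tests.

```python
# Minimal sketch of linear equating (standard multiple-choice-testing method,
# shown only for illustration; made-up scores).
import numpy as np

def linear_equate(x_scores, y_scores):
    """Map scores on form X onto the scale of form Y by matching mean and SD."""
    x, y = np.asarray(x_scores, float), np.asarray(y_scores, float)
    slope = y.std(ddof=1) / x.std(ddof=1)
    return lambda score: y.mean() + slope * (score - x.mean())

form_x = [62, 70, 75, 81, 90]       # hypothetical scores on form X
form_y = [58, 66, 72, 79, 85]       # hypothetical scores on form Y
to_y_scale = linear_equate(form_x, form_y)
print(f"A form-X score of 75 equates to {to_y_scale(75):.1f} on form Y's scale")
```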


Applied Measurement in Education | 2002

Validity Issues for Performance-Based Tests Scored With Computer-Automated Scoring Systems

Brian E. Clauser; Michael T. Kane; David B. Swanson

With the increasing use of automated scoring systems in high-stakes testing, it has become essential that test developers assess the validity of the inferences based on scores produced by these systems. In this article, we attempt to place the issues associated with computer-automated scoring within the context of current validity theory. Although it is assumed that the criteria appropriate for evaluating the validity of score interpretations are the same for tests using automated scoring procedures as for other assessments, different aspects of the validity argument may require emphasis as a function of the scoring procedure. We begin the article with a taxonomy of automated scoring procedures. The presentation of this taxonomy provides a framework for discussing threats to validity that may take on increased importance for specific approaches to automated scoring. We then present a general discussion of the process by which test-based inferences are validated, followed by a discussion of the special issues that must be considered when scoring is done by computer.


Academic Medicine | 2002

The Introduction of Computer-based Case Simulations into the United States Medical Licensing Examination

Gerard F. Dillon; Stephen G. Clyman; Brian E. Clauser; Melissa J. Margolis

In the early to mid-1990s, the National Board of Medical Examiners (NBME) examinations were replaced by the United States Medical Licensing Examination (USMLE). The USMLE, which was designed to have three components or Steps, was administered as a paper-and-pencil test until the late 1990s, when it moved to a computer-based testing (CBT) format. The CBT format provided the opportunity to realize the results of simulation research and development that had occurred during the prior two decades. A milestone in this effort occurred in November 1999 when, with the implementation of the computer-delivered USMLE Step 3 examination, the Primum Computer-based Case Simulations (CCSs) were introduced. In the year preceding this introduction and the more than two years of operational use since the introduction, numerous challenges have been addressed. Preliminary results of this initial experience have been promising. This paper introduces the relevant issues, describes some pertinent research findings, and identifies next steps for research.


Medical Teacher | 2009

Assessment of medical professionalism: Who, what, when, where, how, and ... why?

Richard E. Hawkins; Peter J. Katsufrakis; Matthew C. Holtman; Brian E. Clauser

Medical professionalism is increasingly recognized as a core competence of medical trainees and practitioners. Although the general and specific domains of professionalism are thoroughly characterized, procedures for assessing them are not well-developed. This article outlines an approach to designing and implementing an assessment program for medical professionalism that begins and ends with asking and answering a series of critical questions about the purpose and nature of the program. The process of exposing an assessment program to a series of interrogatives that comprise an integrated and iterative framework for thinking about the assessment process should lead to continued improvement in the quality and defensibility of that program.

Collaboration


Dive into Brian E. Clauser's collaborations.

Top Co-Authors

Melissa J. Margolis, National Board of Medical Examiners
David B. Swanson, National Board of Medical Examiners
Polina Harik, National Board of Medical Examiners
Stephen G. Clyman, National Board of Medical Examiners
Ronald J. Nungester, National Board of Medical Examiners
Kathleen M. Mazor, University of Massachusetts Medical School
Gerard F. Dillon, National Board of Medical Examiners
Linette P. Ross, National Board of Medical Examiners
Monica M. Cuddy, National Board of Medical Examiners
Janet Mee, National Board of Medical Examiners