Publication


Featured research published by Mark R. Raymond.


Applied Measurement in Education | 2001

Job Analysis and the Specification of Content for Licensure and Certification Examinations.

Mark R. Raymond

Although practice analysis (i.e., job analysis) serves as the primary source of evidence supporting the validity of scores from licensure and certification exams, there is surprisingly little consensus in the measurement community regarding suitable methods for conducting job analyses and translating the results into test plans. In the first half of this article, I review general approaches to job analysis and consider methodological issues related to sampling and the development of rating scales used to measure and describe a profession or occupation. In the second half of the article, I evaluate the utility of different types of test plans and describe both judgmental and empirical methods for using practice analysis data to help determine the content, structure, and category weights for test plans. I conclude the article with recommendations for research and practice.


Evaluation & the Health Professions | 1991

Correcting Performance-Rating Errors in Oral Examinations

Mark R. Raymond; Lynn C. Webb; Walter M. Houston

Although oral examinations are widely used for making decisions regarding an individual's level of competence, they are frequently of limited reliability. A significant part of the error in oral performance ratings is due to the tendency for some evaluators to be lenient and others to be stringent in their assignment of ratings. This article describes and evaluates a simple method to identify and correct for errors of leniency and stringency. The method, which is based on a regression model recommended by Wilson (1988), extends and simplifies the procedures recommended by Cason and Cason (1984, 1985). The method provides an estimate of each individual's performance that has been corrected for errors of leniency and stringency. In addition, it produces for each rater an index of leniency or stringency and several other statistics useful in evaluating the properties of rating data. The regression method is applied to performance ratings from three separate administrations of an oral examination in a medical specialty. The results indicate modest but significant levels of leniency and stringency error; correcting for such errors would change the pass/fail decisions for about 6% of the examinees. Limitations of the procedure, as well as the need for additional research, are discussed.
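
The regression approach can be sketched in a few lines of code. The version below is a generic two-way ordinary least squares model with dummy-coded examinee and rater effects; it is not necessarily the exact Wilson (1988) specification used in the article, and the function name and data layout are illustrative assumptions.

```python
# Minimal sketch of a regression-based correction for rater leniency and
# stringency: a generic two-way OLS model with dummy-coded examinee and
# rater effects, not necessarily the exact Wilson (1988) specification.
import numpy as np

def adjust_for_rater_effects(examinee_ids, rater_ids, ratings):
    """Return a leniency index per rater and rater-adjusted ratings."""
    examinees = sorted(set(examinee_ids))
    raters = sorted(set(rater_ids))
    e_index = {e: i for i, e in enumerate(examinees)}
    r_index = {r: i for i, r in enumerate(raters)}
    n_e, n_r = len(examinees), len(raters)

    # Design matrix: intercept, examinee dummies (first dropped),
    # rater dummies (first dropped) for identifiability.
    X = np.zeros((len(ratings), 1 + (n_e - 1) + (n_r - 1)))
    X[:, 0] = 1.0
    for row, (e, r) in enumerate(zip(examinee_ids, rater_ids)):
        if e_index[e] > 0:
            X[row, e_index[e]] = 1.0
        if r_index[r] > 0:
            X[row, (n_e - 1) + r_index[r]] = 1.0
    y = np.asarray(ratings, dtype=float)

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Rater coefficients, centered so positive = lenient and negative =
    # stringent relative to the average rater.
    effects = np.concatenate(([0.0], beta[n_e:]))
    effects -= effects.mean()
    leniency = dict(zip(raters, effects))

    # Adjusted rating: remove the estimated effect of the rater who assigned it.
    adjusted = y - np.array([leniency[r] for r in rater_ids])
    return leniency, adjusted

# Example with three raters of differing severity (hypothetical data):
# leniency, adjusted = adjust_for_rater_effects(
#     ["a", "a", "b", "b", "c", "c"], ["R1", "R2", "R1", "R3", "R2", "R3"],
#     [7, 5, 8, 6, 6, 5])
```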


Academic Medicine | 2011

What new residents do during their initial months of training.

Mark R. Raymond; Janet Mee; Ann King; Steven A. Haist; Marcia L. Winward

Background Studies completed over the past decade suggest the presence of a gap between what students learn during medical school and their clinical responsibilities as first-year residents. The purpose of this survey was to verify on a large scale the responsibilities of residents during their initial months of training. Method Practice analysis surveys were mailed in September 2009 to 1,104 residency programs for distribution to an estimated 8,793 first-year residents. Surveys were returned by 3,003 residents from 672 programs; 2,523 surveys met inclusion criteria and were analyzed. Results New residents performed a wide range of activities, from routine but important communications (obtain informed consent) to complex procedures (thoracentesis), often without the attending physician present or otherwise involved. Conclusions Medical school curricula and the content of competence assessments prior to residency should consider more thorough coverage of the complex knowledge and skills required early in residency.


Educational and Psychological Measurement | 2012

Nominal Weights Mean Equating: A Method for Very Small Samples

Ben Babcock; Anthony D. Albano; Mark R. Raymond

The authors introduced nominal weights mean equating, a simplified version of Tucker equating, as an alternative for dealing with very small samples. The authors then conducted three simulation studies to compare nominal weights mean equating to six other equating methods under the nonequivalent groups anchor test design with sample sizes of 20, 50, and 80 examinees. Results showed that nominal weights mean equating was generally the most effective. Nominal weights mean equating was, furthermore, never among the least effective methods in any condition, indicating its utility across a wide variety of contexts. Circle-arc equating, another recently developed method, also showed a great deal of promise. The identity function (i.e., no equating) was adequate only when test forms were nearly equivalent in difficulty.
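
As a rough illustration, the sketch below implements mean equating under the nonequivalent groups anchor test design with the Tucker regression slope replaced by a nominal weight (the ratio of the number of items on the full form to the number on the anchor), which is the simplification the abstract describes. The function name and the default synthetic-population weights are assumptions; verify the formulas against the published article before any operational use.

```python
# Minimal sketch of nominal weights mean equating under the nonequivalent
# groups anchor test (NEAT) design. The usual Tucker mean-equating structure
# is kept, but the regression-based Tucker slope is replaced by a "nominal"
# weight: the ratio of the number of items on the full form to the number on
# the anchor. Verify against the published formulas before operational use.
import numpy as np

def nominal_weights_mean_equating(x_scores, v1_scores, y_scores, v2_scores,
                                  n_items_x, n_items_y, n_items_v, w1=None):
    """Return a function that maps form X scores onto the form Y scale.

    x_scores, v1_scores: total and anchor scores for the group taking form X.
    y_scores, v2_scores: total and anchor scores for the group taking form Y.
    """
    x, v1 = np.asarray(x_scores, float), np.asarray(v1_scores, float)
    y, v2 = np.asarray(y_scores, float), np.asarray(v2_scores, float)

    # Synthetic-population weights (default: proportional to sample sizes).
    if w1 is None:
        w1 = len(x) / (len(x) + len(y))
    w2 = 1.0 - w1

    # Nominal weights stand in for the Tucker regression slopes.
    gamma_x = n_items_x / n_items_v
    gamma_y = n_items_y / n_items_v

    # Synthetic-population means of X and Y.
    mu_x = x.mean() - w2 * gamma_x * (v1.mean() - v2.mean())
    mu_y = y.mean() + w1 * gamma_y * (v1.mean() - v2.mean())

    # Mean equating: shift X scores by the difference in synthetic means.
    return lambda score: score - mu_x + mu_y

# Example (hypothetical 40-item forms with a 10-item anchor):
# to_y_scale = nominal_weights_mean_equating(x, v1, y, v2, 40, 40, 10)
# print(to_y_scale(31))
```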


Applied Psychological Measurement | 2011

The Impact of Statistically Adjusting for Rater Effects on Conditional Standard Errors of Performance Ratings.

Mark R. Raymond; Polina Harik; Brian E. Clauser

Prior research indicates that the overall reliability of performance ratings can be improved by using ordinary least squares (OLS) regression to adjust for rater effects. The present investigation extends previous work by evaluating the impact of OLS adjustment on standard errors of measurement (SEM) at specific score levels. In addition, a cross-validation (i.e., resampling) design was used to determine the extent to which any improvements in measurement precision would be realized for new samples of examinees. Conditional SEMs were largest for scores toward the low end of the score distribution and smallest for scores at the high end. Conditional SEMs for adjusted scores were consistently less than conditional SEMs for observed scores, although the reduction in error was not uniform throughout the distribution. The improvements in measurement precision held up for new samples of examinees at all score levels.
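
A rough sketch of the conditional-SEM comparison: estimate each examinee's error variance from the spread of their replicate ratings, pool those variances within score bands, and run the same computation once on observed and once on rater-adjusted ratings. The binning scheme and data layout below are illustrative assumptions, not the procedure reported in the article.

```python
# Rough sketch of conditional SEMs by score level: estimate each examinee's
# error variance from the spread of their replicate ratings, pool those
# variances within score bands, and take the square root. Running this on
# observed ratings and again on rater-adjusted ratings shows where on the
# scale the adjustment improves precision. The binning scheme is illustrative.
import numpy as np

def conditional_sem_by_level(ratings, n_bins=10):
    """ratings: 2-D array, rows = examinees, columns = replicate ratings."""
    ratings = np.asarray(ratings, float)
    n_rep = ratings.shape[1]
    totals = ratings.mean(axis=1)                      # examinee score
    err_var = ratings.var(axis=1, ddof=1) / n_rep      # error variance of the mean
    edges = np.quantile(totals, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.digitize(totals, edges[1:-1]), 0, n_bins - 1)
    levels = 0.5 * (edges[:-1] + edges[1:])            # midpoint of each band
    sems = np.array([np.sqrt(err_var[bins == b].mean()) for b in range(n_bins)])
    return levels, sems
```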


Academic Medicine | 2011

Evaluating Construct Equivalence and Criterion-related Validity for Repeat Examinees on a Standardized Patient Examination

Mark R. Raymond; Nilufer Kahraman; Kimberly A. Swygert; Kevin P. Balog

Purpose Prior studies report large score gains for examinees who fail and later repeat standardized patient (SP) assessments. Although research indicates that score gains on SP exams cannot be attributed to memorizing previous cases, no studies have investigated the empirical validity of scores for repeat examinees. This report compares single-take and repeat examinees in terms of both internal (construct) validity and external (criterion-related) validity. Method Data consisted of test scores for examinees who took the United States Medical Licensing Examination Step 2 Clinical Skills (CS) exam between July 16, 2007, and September 12, 2009. The sample included 12,090 examinees who completed Step 2 CS on one occasion and another 4,030 examinees who completed the exam on two occasions. The internal measures included four separately scored performance domains of the Step 2 CS examination, whereas the external measures consisted of scores on three written assessments of medical knowledge (Step 1, Step 2 clinical knowledge, and Step 3). The authors subjected the four Step 2 CS domains to confirmatory factor analysis and evaluated correlations between Step 2 CS scores and the three written assessments for single-take and repeat examinees. Results The factor structure for repeat examinees on their first attempt was markedly different from the factor structure for single-take examinees, but it became more similar to that for single-take examinees by their second attempt. Scores on the second attempt correlated more highly with all three external measures. Conclusions The findings support the validity of scores for repeat examinees on their second attempt.


Evaluation & the Health Professions | 2010

The Second Time Around: Accounting for Retest Effects on Oral Examinations

Mark R. Raymond; Ulana A. Luciw-Dubas

Years of research with high-stakes written tests indicate that although repeat examinees typically experience score gains between their first and subsequent attempts, their pass rates remain considerably lower than pass rates for first-time examinees. This outcome is consistent with expectations. Comparable studies of the performance of repeat examinees on oral examinations are lacking. The current research evaluated pass rates for more than 50,000 examinees on written and oral exams administered by six medical specialty boards over several recent years. Pass rates for first-time examinees were similar for both written and oral exams, averaging about 84% across all boards. Pass rates for repeat examinees on written exams were expectedly lower, ranging from 22% to 51%, with an average of 36%. However, pass rates for repeat examinees on oral exams were markedly higher than for written exams, ranging from 53% to 77%, with an average of 65%. Four explanations for the elevated repeat pass rates on oral exams are proposed, including an increase in examinee proficiency, construct-irrelevant variance, measurement error (score unreliability), and memorization of test content. Simulated data are used to demonstrate that roughly one third of the score increase can be explained by measurement error alone. The authors suggest that a substantial portion of the score increase can also likely be attributed to construct-irrelevant variance. Results are discussed in terms of their implications for making pass/fail decisions when retesting is allowed. The article concludes by identifying areas for future research.
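
The measurement-error explanation can be illustrated with a small classical-test-theory simulation: examinees who fail a somewhat unreliable exam are disproportionately those with negative error on the first attempt, so a fresh error draw on the retest pushes many of them above the cut without any change in proficiency. The reliability, cut score, and sample size below are illustrative values, not figures from the article.

```python
# Small classical-test-theory simulation of the measurement-error explanation:
# examinees who fail a somewhat unreliable exam tend to have negative error on
# attempt 1, so a fresh error draw on attempt 2 lifts many above the cut with
# no change in proficiency. Reliability, cut, and n are illustrative values.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
reliability = 0.75                      # lower reliability -> larger retest effect
true_score = rng.normal(0.0, 1.0, n)    # standardized true proficiency
err_sd = np.sqrt((1 - reliability) / reliability)

attempt1 = true_score + rng.normal(0.0, err_sd, n)
cut = np.quantile(attempt1, 0.16)       # cut placed so roughly 84% pass first time

failed = attempt1 < cut
attempt2 = true_score[failed] + rng.normal(0.0, err_sd, failed.sum())

print("first-attempt pass rate:", round((attempt1 >= cut).mean(), 3))
print("repeat pass rate from error alone:", round((attempt2 >= cut).mean(), 3))
print("mean score gain for repeaters:", round((attempt2 - attempt1[failed]).mean(), 3))
```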


Advances in Health Sciences Education | 2012

Measurement Precision for Repeat Examinees on a Standardized Patient Examination.

Mark R. Raymond; Kimberly A. Swygert; Nilufer Kahraman

Examinees who initially fail and later repeat an SP-based clinical skills exam typically exhibit large score gains on their second attempt, suggesting the possibility that examinees were not well measured on one of those attempts. This study evaluates score precision for examinees who repeated an SP-based clinical skills test administered as part of the US Medical Licensing Examination sequence. Generalizability theory was used as the basis for computing conditional standard errors of measurement (SEM) for individual examinees. Conditional SEMs were computed for approximately 60,000 single-take examinees and 5,000 repeat examinees who completed the Step 2 Clinical Skills Examination® between 2007 and 2009. The study focused exclusively on ratings of communication and interpersonal skills. Conditional SEMs for single-take and repeat examinees were nearly indistinguishable across most of the score scale. US graduates and international medical graduates (IMGs) were measured with equal levels of precision at all score levels, as were examinees with differing levels of skill speaking English. There was no evidence that examinees with the largest score changes were measured poorly on either their first or second attempt. The large score increases for repeat examinees on this SP-based exam probably cannot be attributed to unexpectedly large errors of measurement.
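
For a persons-by-cases generalizability design with one rating per encounter, a per-examinee conditional SEM can be sketched as the standard deviation of that examinee's case scores divided by the square root of the number of cases (Brennan's conditional absolute SEM). The data below are illustrative, not the Step 2 CS data.

```python
# Minimal sketch of a per-examinee conditional SEM in a persons-by-cases
# (p x c) generalizability design with one rating per encounter: the standard
# deviation of an examinee's case scores divided by sqrt(number of cases).
import numpy as np

def conditional_sem(case_scores):
    """case_scores: 2-D array, rows = examinees, columns = cases/encounters."""
    scores = np.asarray(case_scores, float)
    n_cases = scores.shape[1]
    return scores.std(axis=1, ddof=1) / np.sqrt(n_cases)

# Example: three examinees rated on four encounters each (hypothetical data).
ratings = [[6, 7, 6, 5],
           [4, 8, 3, 7],
           [7, 7, 7, 7]]
print(conditional_sem(ratings))   # flatter profiles -> smaller conditional SEM
```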


Advances in Health Sciences Education | 2010

The impact of statistical adjustment on conditional standard errors of measurement in the assessment of physician communication skills

Mark R. Raymond; Brian E. Clauser; Gail E. Furman

The use of standardized patients to assess communication skills is now an essential part of assessing a physician’s readiness for practice. To improve the reliability of communication scores, it has become increasingly common in recent years to use statistical models to adjust ratings provided by standardized patients. This study employed ordinary least squares regression to adjust ratings, and then used generalizability theory to evaluate the impact of these adjustments on score reliability and the overall standard error of measurement. In addition, conditional standard errors of measurement were computed for both observed and adjusted scores to determine whether the improvements in measurement precision were uniform across the score distribution. Results indicated that measurement was generally less precise for communication ratings toward the lower end of the score distribution; and the improvement in measurement precision afforded by statistical modeling varied slightly across the score distribution such that the most improvement occurred in the upper-middle range of the score scale. Possible reasons for these patterns in measurement precision are discussed, as are the limitations of the statistical models used for adjusting performance ratings.


Academic Medicine | 2009

Measurement precision of spoken English proficiency scores on the USMLE Step 2 Clinical Skills examination.

Mark R. Raymond; Brian E. Clauser; Kimberly A. Swygert; Marta van Zanten

Background Previous research has shown that ratings of English proficiency on the United States Medical Licensing Examination Clinical Skills Examination are highly reliable. However, the score distributions for native and nonnative speakers of English are sufficiently different to suggest that reliability should be investigated separately for each group. Method Generalizability theory was used to obtain reliability indices separately for native and nonnative speakers of English (N = 29,084). Conditional standard errors of measurement were also obtained for both groups to evaluate measurement precision for each group at specific score levels. Results Overall indices of reliability (phi) exceeded 0.90 for both native and nonnative speakers, and both groups were measured with nearly equal precision throughout the score distribution. However, measurement precision decreased at lower levels of proficiency for all examinees. Conclusions The results of this and future studies may be helpful in understanding and minimizing sources of measurement error at particular regions of the score distribution.

Collaboration


Top co-authors of Mark R. Raymond and their affiliations:

Steven A. Haist, National Board of Medical Examiners
Kimberly A. Swygert, National Board of Medical Examiners
Nilufer Kahraman, National Board of Medical Examiners
Brian E. Clauser, National Board of Medical Examiners
Janet Mee, National Board of Medical Examiners
S. Deniz Bucak, National Board of Medical Examiners
Amy Morales, National Board of Medical Examiners
Andre F. De Champlain, National Board of Medical Examiners
Anthony D. Albano, University of Nebraska–Lincoln
Gail E. Furman, National Board of Medical Examiners