David B. Swanson
American Board of Internal Medicine
Publications
Featured research published by David B. Swanson.
Medical Education | 1988
D. I. Newble; David B. Swanson
Summary. The objective structured clinical examination (OSCE) is increasingly being used as a method of clinical assessment yet its measurement characteristics have not been well documented. Evidence is accumulating that many OSCEs may be too short to achieve reliable results. This paper reports detailed psychometric analyses of OSCEs which were administered as part of a well‐established final‐ year examination. Generalizability theory guided investigation of test reliability.
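As a hedged illustration of how generalizability theory is typically used to judge whether an OSCE is long enough, the sketch below runs a simple decision (D) study for a person-by-station design. The variance components are invented for illustration and are not figures from the paper.

```python
# Hypothetical decision (D) study for a person x station OSCE design.
# Variance components are illustrative only, not values from the study.
sigma2_person = 0.30      # true differences among examinees
sigma2_residual = 0.55    # person-by-station interaction + error ("case specificity")

def g_coefficient(n_stations: int) -> float:
    """Relative generalizability coefficient for a test of n_stations stations."""
    error = sigma2_residual / n_stations
    return sigma2_person / (sigma2_person + error)

for n in (5, 10, 20, 40):
    print(f"{n:>2} stations: G = {g_coefficient(n):.2f}")
```

With components like these, a short circuit of five or ten stations falls well below conventional reliability targets, which is the pattern the paper describes for OSCEs that are "too short".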
Assessment & Evaluation in Higher Education | 1987
David B. Swanson; John J. Norcini; Louis J. Grosso
ABSTRACT Written and computer‐based clinical simulations have been used in the health professions to assess aspects of clinical competence for many years. However, this review of the dozens of studies of their psychometric characteristics finds little evidence to justify their continued use. While studies of the fidelity of simulations have demonstrated that examinees feel they are realistic and have good face validity, reliability studies have repeatedly shown that scores are too imprecise for meaningful interpretation, unless impractically large numbers of simulations are included in a test. Validity studies have demonstrated that simulations have the expected relationships with a host of criterion measures, but it appears that similar assessment information can be obtained using clinically‐oriented multiple choice questions in much less testing time. Some common methodological weaknesses in study design and analysis are identified, and some research directions are suggested to improve the psychometric c...
Medical Education | 1985
John J. Norcini; David B. Swanson; Louis J. Grosso; George D. Webster
Summary. Despite a lack of face validity, there continues to be heavy reliance on objective paper‐and‐pencil measures of clinical competence. Among these measures, the most common item formats are patient management problems (PMPs) and three types of multiple choice questions (MCQs): one‐best‐answer (A‐types); matching questions (M‐types); and multiple true/false questions (X‐types). The purpose of this study is to compare the reliability, validity and efficiency of these item formats with particular focus on whether MCQs and PMPs measure different aspects of clinical competence. Analyses revealed reliabilities of 0.72 or better for all item formats; the MCQ formats were most reliable. Similarly, efficiency analyses (reliability per unit of testing time) demonstrated the superiority of MCQs. Evidence for validity obtained through correlations of both programme directors’ ratings and criterion group membership with item format scores also favoured MCQs. More important, however, is whether MCQs and PMPs measure the same or different aspects of clinical competence. Regression analyses of the scores on the validity measures (programme directors’ ratings and criterion group membership) indicated that MCQs and PMPs seem to be measuring predominantly the same thing. MCQs contribute a small unique variance component over and above PMPs, while PMPs make the smallest unique contribution. As a whole, these results indicate that MCQs are more efficient, reliable and valid than PMPs.
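The conclusion that MCQs and PMPs "measure predominantly the same thing" rests on comparing regression models with and without each format and looking at the unique variance each adds. A minimal sketch of that incremental-variance logic, using simulated data rather than the examination data analysed in the study:

```python
# Sketch of the incremental-variance comparison with simulated data.
# All numbers are illustrative; they do not reproduce the study's results.
import numpy as np

rng = np.random.default_rng(0)
n = 500
ability = rng.normal(size=n)                      # latent clinical competence
mcq = ability + rng.normal(scale=0.5, size=n)     # more reliable MCQ score
pmp = ability + rng.normal(scale=0.9, size=n)     # noisier PMP score
rating = ability + rng.normal(scale=0.8, size=n)  # programme director rating

def r_squared(y, predictors):
    """R-squared from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_mcq = r_squared(rating, [mcq])
r2_pmp = r_squared(rating, [pmp])
r2_both = r_squared(rating, [mcq, pmp])
print(f"unique variance of MCQ over PMP: {r2_both - r2_pmp:.3f}")
print(f"unique variance of PMP over MCQ: {r2_both - r2_mcq:.3f}")
```

When both formats reflect mostly the same underlying ability, each format's increment over the other is small, which is the pattern the abstract reports.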
Teaching and Learning in Medicine | 1989
David B. Swanson; John J. Norcini
Psychometric studies of assessment with standardized patients (SPs) have established that substantial testing time is required to obtain reproducible scores. Using data drawn from two recent large‐scale studies, this article explores the reasons behind this result and suggests several strategies for reducing testing‐time requirements. Case‐specificity—inconsistency of an examinee's performance over cases—appears to be the major source of measurement error in SP‐based tests. Use of multiple SPs to play the same case role for different examinees does not affect reproducibility of scores at typical test lengths. Similarly, low‐to‐moderate levels of interrater agreement do not markedly affect score reproducibility, as long as a reasonably large number of cases are included in an assessment. Often, SP‐based assessment can be viewed within a mastery‐testing framework, where reproducibility of pass‐fail decisions, rather than scores, is of primary importance. Testing‐time requirements within a mastery‐testing fr...
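One way to see why case specificity drives testing time: if the reliability of a single SP case is low, the Spearman-Brown formula gives the number of cases needed to reach a target level of reproducibility. The single-case value below is an assumed figure for illustration, not one reported in the article.

```python
import math

def cases_needed(single_case_reliability: float, target: float) -> int:
    """Spearman-Brown: smallest number of cases giving at least the target reliability."""
    r1, rt = single_case_reliability, target
    n = (rt * (1 - r1)) / (r1 * (1 - rt))
    return math.ceil(n)

# Assumed single-case reliability of 0.10, typical of strong case specificity.
for target in (0.70, 0.80, 0.90):
    print(f"target {target:.2f}: about {cases_needed(0.10, target)} cases")
```

With a single-case reliability of 0.10, roughly 36 cases are needed to reach 0.80, which illustrates why score-level reproducibility demands so much testing time and why a mastery-testing framing of pass-fail decisions can be less demanding.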
Journal of General Internal Medicine | 1990
Susan C. Day; Louis J. Grosso; John J. Norcini; Linda L. Blank; David B. Swanson; Muriel H. Horne
Objective: To determine the methods of evaluation used routinely by training programs and to obtain information concerning the frequencies with which various evaluation methods were used. Design: Survey of residents who had recently completed internal medicine training. Participants: 5,693 respondents who completed residencies in 1987 and 1988 and were registered as first-time takers for the 1988 Certifying Examination in Internal Medicine. This constituted a 76% response rate. Main results: Virtually all residents were aware that routine evaluations were submitted on inpatient rotations, but were more uncertain about the evaluation process in the outpatient setting and the methods used to assess their humanistic qualities. Most residents had undergone a Clinical Evaluation Exercise (CEX); residents’ clinical skills were less likely to be evaluated by direct observation of history or physical examination skills. Resident responses were aggregated within training programs to determine the pattern of evaluation across programs. The majority of programs used Advanced Cardiac Life Support (ACLS) certification, medical record audit, and the national In-Training Examination to assess most of their residents. Performance-based tests were used selectively by a third or more of the programs. Breast and pelvic examination skills and ability to perform sigmoidoscopy were thought not to be adequately assessed by the majority of residents in almost half of the programs. Conclusions: While most residents are receiving routine evaluation, including a CEX, increased efforts to educate residents about their evaluation system, to strengthen evaluation in the outpatient setting, and to evaluate certain procedural skills are recommended.
Medical Education | 1989
C.P.M. van der Vleuten; S. J. Van Luyk; A. M. J. Van Ballegooijen; David B. Swanson
Summary. Variation in the accuracy of examiner judgements is a source of measurement error in performance‐based tests. In previous studies using doctor subjects, examiner training yielded marginal or no improvement in the accuracy of examiner judgements. This study reports an experiment on accuracy of scoring in which provision of training and background of examiners are systematically varied. Experienced teaching staff, medical students and lay subjects were randomly assigned to either training or no‐training groups. Using detailed behavioural check‐lists, they subsequently scored videotaped performance on two clinical cases, and accuracy of their judgements was appraised.
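Scoring accuracy in studies like this is commonly summarized as agreement between an examiner's checklist marks and a criterion scoring of the same videotaped performance. The sketch below computes one such chance-corrected index (Cohen's kappa) on invented checklist data; it illustrates the general idea, not the specific index used in the study.

```python
from collections import Counter

def cohens_kappa(rater: list, criterion: list) -> float:
    """Chance-corrected agreement between a rater's marks and a criterion key."""
    n = len(rater)
    observed = sum(a == b for a, b in zip(rater, criterion)) / n
    pr = Counter(rater)
    pc = Counter(criterion)
    expected = sum(pr[k] * pc[k] for k in set(pr) | set(pc)) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented checklist marks (1 = item performed, 0 = not performed).
criterion = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
examiner  = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(f"kappa = {cohens_kappa(examiner, criterion):.2f}")
```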
Evaluation & the Health Professions | 1984
John J. Norcini; David B. Swanson; Louis J. Grosso; Judy A. Shea; George D. Webster
This study compares the reliability, validity, and efficiency of three multiple-choice question (MCQ) ability scales with patient management problems (PMPs). Data are from the 1980, 1981, and 1982 American Board of Internal Medicine Certifying Examinations. The MCQ ability scales were constructed by classifying the one-best-answer and multiple-true/false questions in each examination as measuring predominantly clinical judgment, synthesis, or knowledge. Clinical judgment items require prioritizing or weighing management decisions; synthesis items require the integration of findings into a diagnostic decision; and knowledge items stress recall of factual information. Analyses indicate that the MCQ ability scales are more reliable and valid per unit of testing time than are PMPs and that clinical judgment and synthesis scales are slightly more correlated with PMPs than is the knowledge scale. Additionally, all MCQ ability scales seem to be measuring the same aspects of competence as PMPs.
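"Reliability per unit of testing time" is usually compared by projecting each format to a common amount of testing time with the Spearman-Brown formula. A sketch of that comparison with assumed reliabilities and testing times (illustrative only, not the Board's actual figures):

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Project reliability when a test is lengthened or shortened by length_factor."""
    r = reliability
    return (length_factor * r) / (1 + (length_factor - 1) * r)

# Assumed observed reliabilities and testing times (illustrative only).
formats = {
    "MCQ ability scale": {"reliability": 0.85, "minutes": 120},
    "PMP section":       {"reliability": 0.75, "minutes": 180},
}

common_time = 60  # compare both formats at one hour of testing
for name, f in formats.items():
    factor = common_time / f["minutes"]
    projected = spearman_brown(f["reliability"], factor)
    print(f"{name}: projected reliability at {common_time} min = {projected:.2f}")
```

A format that reaches a given reliability in less testing time keeps more of that reliability when both are projected to the same hour, which is the sense in which the MCQ scales are described as more efficient.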
Journal of Instructional Development | 1985
Geoffrey R. Norman; Linda J. Muzzin; Reed G. Williams; David B. Swanson
Conclusions: Content specificity seems to be a fundamental problem in assessing clinical problem solving ability with simulations. Even with very high fidelity computer simulations, it can be anticipated that correlations between performance on different cases will be low. This seems to be a characteristic of problem solving in real clinical life, so it can be expected on simulations as well. It is necessary to use large numbers of cases to adequately assess problem solving ability. Measuring performance on a single case with more fidelity and accuracy can well result in a less valid test because more testing time is usually required per case. Simulations may be better suited to teaching and evaluation of specific technical or procedural skills, although further research is necessary to determine those characteristics of the simulation which result in effective transfer to the clinical setting. Finally, computer simulations have had an uncertain role in the past, but the evolution of new technology holds great promise for the future.
Teaching and Learning in Medicine | 1989
John J. Norcini; David B. Swanson
Review of the literature indicates that a major impediment to using written simulations is the large number of cases required to achieve an acceptable level of reproducibility or reliability. This article describes some of the factors affecting the reproducibility of simulation scores (and thus test length requirements) and identifies their impact. It concentrates on four factors affecting the reproducibility of simulations that assess a single skill: (a) score interpretation, (b) skill characteristics, (c) examinee characteristics, and (d) the scaling of scores. With few exceptions, score interpretation, the characteristics of the skill, and the characteristics of the examinees are not under the test developer's control. Once the purpose of measurement is fixed, so are most of these factors. On the other hand, it is often possible to focus cases without trivializing them or hurting the representativeness of the examination. It is also possible to apply item response theory to simulations and take advanta...
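The article notes that item response theory can be applied to the scaling of simulation scores. As a hedged illustration of that general idea (a generic two-parameter logistic model, not the specific scaling discussed there), the probability of a correct or keyed response is modelled as a function of examinee proficiency and item parameters:

```python
import math

def two_pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic IRT model: P(keyed response | proficiency theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Illustrative item: discrimination a = 1.2, difficulty b = 0.5 (assumed values).
for theta in (-2, -1, 0, 1, 2):
    print(f"theta = {theta:+d}: P = {two_pl(theta, a=1.2, b=0.5):.2f}")
```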
Annual Symposium on Computer Application in Medical Care | 1983
David B. Swanson; George D. Webster
We will demonstrate a computer simulation of the clinical encounter developed in conjunction with the Computer-Based Examination (CBX) Project at the American Board of Internal Medicine (ABIM). Over the past ten years, this research and development project has investigated the potential of computer-based testing for use in the ABIM Certifying Examination in internal medicine.