Publication


Featured research published by Kathleen Z. Holtzman.


Academic Medicine | 2009

Relationship between performance on Part I of the American Board of Orthopaedic Surgery Certifying Examination and scores on USMLE Steps 1 and 2.

David B. Swanson; Amy Sawhill; Kathleen Z. Holtzman; S. Deniz Bucak; Carol Morrison; Shepard R. Hurwitz; G. Paul DeRosa

Background This study investigated the strength of the relationship between performance on Part I of the American Board of Orthopaedic Surgery (ABOS) Certifying Examination and scores on United States Medical Licensing Examination (USMLE) Steps 1 and 2. Method USMLE Step 1 and Step 2 scores on first attempt were matched with ABOS Part I results for U.S./Canadian graduates taking Part I for the first time between 2002 and 2006. Linear and logistic regression analyses investigated the relationship between ABOS Part I performance and scores on USMLE Steps 1 and 2. Results Step 1 and Step 2 scores each explained 29% of the variation in Part I scores; using both scores increased this percentage to 34%. Results of logistic regression analyses showed a similar, moderately strong relationship with Part I pass/fail outcomes: examinees with low scores on Steps 1 and 2 were at substantially greater risk for failing Part I. Conclusions There is continuing empirical support for the use of Step 1 and Step 2 scores in the selection of residents to interview for orthopaedic residency positions.
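As an illustration of the kind of analysis described above, a minimal sketch in Python (statsmodels) is shown below; the simulated scores and column names are hypothetical stand-ins for the matched ABOS/USMLE score file, not the study's actual data or code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical matched file: one row per first-time ABOS Part I examinee,
# with USMLE Step 1/Step 2 scores, a Part I score, and a pass/fail flag.
rng = np.random.default_rng(0)
n = 500
step1 = rng.normal(220, 20, n)
step2 = rng.normal(225, 20, n)
part1 = 0.4 * step1 + 0.3 * step2 + rng.normal(0, 25, n)
df = pd.DataFrame({
    "step1": step1,
    "step2": step2,
    "part1_score": part1,
    "part1_pass": (part1 > np.percentile(part1, 15)).astype(int),
})

# Linear regression: variation in Part I scores explained by Step scores.
lin = smf.ols("part1_score ~ step1 + step2", data=df).fit()
print(f"R^2 with both Step scores: {lin.rsquared:.2f}")

# Logistic regression: risk of failing Part I as a function of Step scores.
logit = smf.logit("part1_pass ~ step1 + step2", data=df).fit(disp=False)
print(logit.summary())
```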


Academic Medicine | 2006

Psychometric characteristics and response times for content-parallel extended-matching and one-best-answer items in relation to number of options.

David B. Swanson; Kathleen Z. Holtzman; Krista Allbee; Brian E. Clauser

Background This study investigated the impact of item format and number of options on the psychometric characteristics (p values and biserials) and response times for multiple-choice questions (MCQs) appearing on Step 2 of the United States Medical Licensing Examination. Method In all, 192 MCQ items were used in the study. Each item was presented in two formats: in a two-item extended-matching set and as an independent item. For the extended-matching format, there were two versions: a base version that included all options (10 to 26) and an 8-option version. For the independent-item format, there were three versions: a base version that included all options, and 8-option and 5-option versions created by a group of physicians who selected options without information about examinee performance. All items were embedded in unscored sections of the 2005–06 Step 2 test forms. Results Versions of items with more options were harder and required more testing time; no differences in item discrimination were observed. Mean response times for items presented in the extended-matching format were lower than for those presented as independent items, primarily because of shorter response times for the second item presented in a set. Conclusion Use of the extended-matching format and smaller numbers of options per item (and more items) should result in more efficient use of testing time and greater score precision per unit of testing time.
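The p values and biserials referred to above are standard classical test theory item statistics. The sketch below shows one common way to compute item difficulty (proportion correct) and a point-biserial discrimination index from a 0/1 response matrix; the data are simulated, and the use of the corrected point-biserial is an assumption, since the abstract does not specify the exact index.

```python
import numpy as np

# Simulated 0/1 response matrix: rows = examinees, columns = items.
# (A stand-in for the unscored-section item responses described above.)
rng = np.random.default_rng(1)
responses = (rng.random((1000, 20)) < rng.uniform(0.4, 0.9, 20)).astype(float)

# Item difficulty: the classical "p value" is just the proportion correct.
p_values = responses.mean(axis=0)

# Item discrimination: corrected point-biserial, i.e. the correlation between
# each item and the total score computed from the remaining items.
total = responses.sum(axis=1)
discriminations = np.array([
    np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
    for j in range(responses.shape[1])
])

for j, (p, r) in enumerate(zip(p_values, discriminations)):
    print(f"item {j:2d}: p = {p:.2f}, point-biserial = {r:.2f}")
```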


Academic Medicine | 2005

Psychometric characteristics and response times for one-best-answer questions in relation to number and source of options.

David B. Swanson; Kathleen Z. Holtzman; Brian E. Clauser; Amy Sawhill

Background The research reported here investigated the impact of number and source of response options on the psychometric characteristics and response times for one-best-answer MCQs. Method Ninety sets of MCQs were used in two studies; numbers of options in base versions of items ranged from 11 to 25. For each set, a United States Medical Licensing Examination Step 2 item-writing committee selected the five options viewed as most appropriate. For 40 of the sets, two NBME staff constructed five- and eight-option versions to maximize item discrimination. All versions of the items were embedded, unscored, in 2003–04 Step 2 test forms. Results Versions of items with more options were harder and required more testing time; no differences in item discrimination were observed in either study, but previous versions of the items in extended-matching format were more discriminating than those used in the study. Conclusion Use of smaller numbers of options (and more items) results in more efficient use of testing time.


Academic Medicine | 2008

Measurement characteristics of content-parallel single-best-answer and extended-matching questions in relation to number and source of options.

David B. Swanson; Kathleen Z. Holtzman; Krista Allbee

Background Previous research showed that extended-matching questions (EMQs) with eight options per set resulted in better score precision than EMQs with larger numbers of options or independent single-best-answer items (A-type) with five options. This study extends that work to smaller numbers of options. Method Ninety-six questions were presented in two formats on the United States Medical Licensing Examination Step 2: as two-item EMQ sets and as independent A-types. Four versions of EMQs were used: five- and eight-option versions with options selected using statistics, and five- and eight-option versions with options selected by physicians. Seven A-type versions were used: three-, four-, five-, and eight-option versions with options selected using statistics, and three-, four-, and five-option versions with options selected by physicians. Results Items with more options were harder, required more time to complete, and had similar item discrimination. Option sets selected by physicians were easier and slightly more discriminating, and they required less testing time. Conclusions A-types with four or five options and EMQs with eight options make more efficient use of testing time. Provision of response statistics to content experts does not seem necessary to guide option selection.


Journal of Bone and Joint Surgery, American Volume | 2013

Utility of AAOS OITE scores in predicting ABOS Part I outcomes: AAOS exhibit selection.

David B. Swanson; J. Lawrence Marsh; Shepard R. Hurwitz; G. Paul DeRosa; Kathleen Z. Holtzman; S. Deniz Bucak; Amy Baker; Carol Morrison

BACKGROUND Residency programs commonly use performance on the Orthopaedic In-Training Examination (OITE) developed by the American Academy of Orthopaedic Surgeons (AAOS) to identify residents who are lagging behind their peers and at risk for failing Part I of the American Board of Orthopaedic Surgery (ABOS) Certifying Examination. This study was designed to investigate the utility of the OITE score as a predictor of ABOS Part I performance. METHOD Results for 3132 examinees who took Part I of the ABOS examination for the first time from 2002 to 2006 were matched with records from the 1997 to 2006 OITE tests; at least one OITE score was located for 2852 (91%) of the ABOS Part I examinees. After OITE performance was rescaled to place scores from different test years on comparable scales, descriptive statistics and correlations between ABOS and OITE scores were computed, and regression analyses were conducted to predict ABOS results from OITE performance. RESULTS Substantial increases in the mean OITE score were observed as residents progressed through training. Stronger correlations were observed between OITE and ABOS performance during later years in training, reaching a maximum of 0.53 in years 3 and 4. Logistic regression results indicated that residents with an OITE score below the 10th percentile were much more likely to fail Part I compared with those with an OITE score above the 50th percentile. CONCLUSIONS OITE performance was a good predictor of the ABOS score and pass-fail outcome; the OITE can be used effectively for early identification of residents at risk for failing the ABOS Part I examination.
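The abstract notes that OITE performance was rescaled so that scores from different test years sit on comparable scales. The sketch below illustrates one simple approach (within-year z-score standardization), which may differ from the rescaling actually used; the data frame and values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format OITE results: one row per resident per test year.
oite = pd.DataFrame({
    "resident_id": [1, 1, 2, 2, 3, 3],
    "test_year":   [1999, 2000, 1999, 2000, 1999, 2000],
    "raw_score":   [145, 162, 170, 181, 130, 150],
})

# Place scores from different test years on a comparable scale by
# standardizing within year; treat this purely as an illustration of the
# idea of rescaling, not the published study's method.
oite["z_score"] = (
    oite.groupby("test_year")["raw_score"]
        .transform(lambda s: (s - s.mean()) / s.std(ddof=0))
)
print(oite)
```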


Academic Medicine | 2014

International variation in performance by clinical discipline and task on the United States Medical Licensing Examination Step 2 Clinical Knowledge component.

Kathleen Z. Holtzman; David B. Swanson; Wenli Ouyang; Gerard F. Dillon; John R. Boulet

Purpose To investigate country-to-country variation in performance across clinical science disciplines and tasks for examinees taking the Step 2 Clinical Knowledge (CK) component of the United States Medical Licensing Examination. Method In 2012, the authors analyzed demographic characteristics, total scores, and percent-correct clinical science discipline and task scores for more than 88,500 examinees taking Step 2 CK for the first time during the 2008–2010 academic years. For each examinee and score, differences between the score and the mean performance of examinees at U.S. MD-granting medical schools were calculated, and mean differences by country of medical school were tabulated for analysis of country-to-country variation in performance by clinical discipline and task. Results After controlling for overall performance relative to U.S. examinees, international medical graduates (IMGs) performed best in Surgery and worst in Psychiatry among the clinical discipline scores; among the clinical task scores, IMGs performed best in Understanding Mechanisms of Disease and worst in Promoting Preventive Medicine and Health Maintenance. The pattern of results was strongest for IMGs attending schools in the Middle East and Australasia, present to a lesser degree for IMGs attending schools in Europe, and absent for IMGs attending Caribbean medical schools. Conclusions Country-to-country differences in relative performance were present for both clinical discipline and task scores. Possible explanations include differences in learning outcomes, curriculum emphasis and clinical experience, standards of care, and culture, as well as the effects of English as a second language and the relative emphasis on preparing students to take the Step 2 CK exam.
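The difference-score tabulation described in the Method section can be sketched as follows; the country labels and subscores are hypothetical, and the control for overall performance is omitted here for simplicity.

```python
import pandas as pd

# Hypothetical examinee-level Step 2 CK discipline subscores (percent correct).
scores = pd.DataFrame({
    "country":    ["USA", "USA", "UK", "UK", "India", "India"],
    "surgery":    [72, 78, 74, 70, 77, 79],
    "psychiatry": [75, 80, 68, 66, 65, 63],
})

# Reference means: performance of examinees at U.S. MD-granting schools.
us_means = scores.loc[scores["country"] == "USA", ["surgery", "psychiatry"]].mean()

# Difference between each examinee's subscore and the U.S. reference mean,
# then the mean difference tabulated by country of medical school.
diffs = scores[["surgery", "psychiatry"]] - us_means
diffs["country"] = scores["country"]
print(diffs.groupby("country").mean().round(1))
```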


Medical Teacher | 2010

Collaboration across the pond: The multi-school progress testing project

Dave Swanson; Kathleen Z. Holtzman; Aggie Butler; Michelle M. Langer; M. V. Nelson; J. W. M. Chow; Richard Fuller; John Patterson; Margaret Boohan

This collaborative project between the National Board of Medical Examiners and four schools in the UK is investigating the feasibility and utility of a cross-school progress testing program drawing on test material recently retired from the United States Medical Licensing Examination (USMLE) Step 2 Clinical Knowledge (CK) examination. This article describes the design of the progress test; the process used to build, translate (localize), review, and finalize test forms; the approach taken to (web-based) test administration; and the procedure used to calculate and report scores. Results to date have demonstrated that it is feasible to use test items written for the US licensing examination as a basis for developing progress test forms for use in the UK. Some content areas can be localized more readily than others, and care is clearly needed in review and revision of test materials to ensure that they are clinically appropriate and suitably phrased for use in the UK. Involvement of content experts in review and vetting of the test material is essential, and it is clearly desirable to supplement expert review with the use of quality control procedures based on the item statistics as a final check on the appropriateness of individual test items.
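The final quality-control check based on item statistics might, for example, take the form of a simple screening rule like the sketch below; the thresholds and field names are illustrative assumptions rather than the project's actual criteria.

```python
import pandas as pd

# Hypothetical item-analysis output for a localized progress-test form.
items = pd.DataFrame({
    "item_id":        ["A1", "A2", "A3", "A4"],
    "p_value":        [0.92, 0.45, 0.12, 0.67],    # proportion correct
    "point_biserial": [0.05, 0.31, 0.22, -0.08],   # item-total correlation
})

# Flag items for content-expert re-review: extreme difficulty or weak/negative
# discrimination often signals a localization or keying problem.
flagged = items[
    (items["p_value"] < 0.20) | (items["p_value"] > 0.95)
    | (items["point_biserial"] < 0.10)
]
print(flagged)
```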


Medical Teacher | 2010

Progress testing in clinical science education: results of a pilot project between the National Board of Medical Examiners and a US Medical School.

André F. De Champlain; Monica M. Cuddy; Peter V. Scoles; Marie Brown; David B. Swanson; Kathleen Z. Holtzman; Aggie Butler

Background: Though progress tests have been used for several decades in various medical education settings, few studies have offered analytic frameworks that could be used by practitioners to model growth of knowledge as a function of curricular and other variables of interest. Aim: To explore the use of one form of progress testing in clinical education by modeling growth of knowledge in various disciplines as well as by assessing the impact of recent training (core rotation order) on performance using hierarchical linear modeling (HLM) and analysis of variance (ANOVA) frameworks. Methods: This study included performances across four test administrations occurring between July 2006 and July 2007 for 130 students from a US medical school who graduated in 2008. Measures-nested-in-examinees HLM growth curve analyses were run to estimate clinical science knowledge growth over time, and repeated measures ANOVAs were run to assess the effect of recent training on performance. Results: Core rotation order was related to growth rates for total and pediatrics scores only. Additionally, scores were higher in a given discipline if training had occurred immediately prior to the test administration. Conclusions: This study provides a useful progress testing framework for assessing medical students’ growth of knowledge across their clinical science education and the related impact of training.
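A measures-nested-in-examinees growth model of the kind described can be approximated with a linear mixed-effects model; the sketch below (statsmodels MixedLM with simulated data) is only an analogue of the HLM analysis, not the authors' actual specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated long-format progress-test data: four administrations per student.
rng = np.random.default_rng(2)
n_students, n_admins = 130, 4
student = np.repeat(np.arange(n_students), n_admins)
admin = np.tile(np.arange(n_admins), n_students)
intercepts = rng.normal(60, 5, n_students)      # student starting levels
slopes = rng.normal(3.0, 1.0, n_students)       # student growth rates
score = intercepts[student] + slopes[student] * admin + rng.normal(0, 4, student.size)
df = pd.DataFrame({"student": student, "admin": admin, "score": score})

# Growth-curve model with measures nested in examinees: random intercept and
# slope for each student, fixed linear growth across administrations.
model = smf.mixedlm("score ~ admin", data=df,
                    groups=df["student"], re_formula="~admin")
result = model.fit()
print(result.summary())
```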


Academic Medicine | 2004

Using the NBME self-assessments to project performance on USMLE Step 1 and Step 2: impact of test administration conditions.

Amy Sawhill; Aggie Butler; Douglas R. Ripkey; David B. Swanson; Raja Subhiyah; John Thelman; William Walsh; Kathleen Z. Holtzman; Kathy Angelucci

Problem Statement and Background. This study examined the extent to which performance on the NBME® Comprehensive Basic Science Self-Assessment (CBSSA) and NBME Comprehensive Clinical Science Self-Assessment (CCSSA) can be used to project performance on the USMLE Step 1 and Step 2 examinations, respectively. Method. Subjects were 1,156 U.S./Canadian medical students who took either (1) the CBSSA and Step 1, or (2) the CCSSA and Step 2, between April 2003 and January 2004. Regression analyses examined the relationship between each self-assessment and the corresponding USMLE Step as a function of test administration conditions. Results. The CBSSA explained 62% of the variation in Step 1 scores, while the CCSSA explained 56% of Step 2 score variation. In both samples, Standard-Paced conditions produced better estimates of future Step performance than Self-Paced ones. Conclusions. Results indicate that self-assessment examinations provide an accurate basis for predicting performance on the associated Step, with some variation in predictive accuracy across test administration conditions.
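A rough sketch of examining predictive accuracy separately by administration condition is shown below; the data are simulated and the condition labels are hypothetical stand-ins for the Standard-Paced and Self-Paced groups.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated pairings of self-assessment and Step scores, with the pacing
# condition under which the self-assessment was taken.
rng = np.random.default_rng(3)
n = 600
condition = rng.choice(["standard_paced", "self_paced"], n)
self_assessment = rng.normal(500, 100, n)
noise_sd = np.where(condition == "standard_paced", 12.0, 18.0)
step_score = 180 + 0.08 * self_assessment + rng.normal(0, noise_sd)
df = pd.DataFrame({"condition": condition,
                   "self_assessment": self_assessment,
                   "step_score": step_score})

# Variation in Step scores explained by the self-assessment, computed
# separately for each administration condition.
for cond, grp in df.groupby("condition"):
    r2 = smf.ols("step_score ~ self_assessment", data=grp).fit().rsquared
    print(f"{cond}: R^2 = {r2:.2f}")
```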


Academic Medicine | 2009

Use of multimedia on the Step 1 and Step 2 Clinical Knowledge components of USMLE: a controlled trial of the impact on item characteristics.

Kathleen Z. Holtzman; David B. Swanson; Wenli Ouyang; Kieran Hussie; Krista Allbee

Background During 2007, multimedia-based presentations of selected clinical findings were introduced into the United States Medical Licensing Examination. This study investigated the impact of presenting cardiac auscultation findings in multimedia versus text format on item characteristics. Method Content-matched versions of 43 Step 1 and 51 Step 2 Clinical Knowledge (CK) multiple-choice questions describing common pediatric and adult clinical presentations were administered in unscored sections of Step 1 and Step 2 CK. For multimedia versions, examinees used headphones to listen to the heart on a simulated chest while watching video showing associated chest and neck vein movements. Text versions described auscultation findings using standard medical terminology. Results Analyses of item responses for first-time examinees from U.S./Canadian and international medical schools indicated that multimedia items were significantly more difficult than matched text versions, were less discriminating, and required more testing time. Conclusions Examinees can more readily interpret auscultation findings described in text using standard terminology than those same findings presented in a more authentic multimedia format. The impact on examinee performance and item characteristics is substantial.
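Comparing content-matched multimedia and text versions of the same items amounts to a paired, item-level comparison; the sketch below illustrates this with hypothetical item difficulties and a paired t-test, which is an assumed analysis rather than the one reported in the study.

```python
import numpy as np
from scipy import stats

# Hypothetical item-level statistics for content-matched pairs: each row is
# one clinical finding presented both as a multimedia item and a text item.
rng = np.random.default_rng(4)
n_items = 40
text_p = rng.uniform(0.55, 0.90, n_items)                        # text version difficulty
multimedia_p = np.clip(text_p - rng.normal(0.10, 0.05, n_items), 0, 1)

# Paired comparison of difficulty across matched versions of the same item.
res = stats.ttest_rel(multimedia_p, text_p)
print(f"mean difficulty shift = {np.mean(multimedia_p - text_p):+.3f}, "
      f"t = {res.statistic:.2f}, p = {res.pvalue:.3g}")
```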

Collaboration


Dive into Kathleen Z. Holtzman's collaborations.

Top Co-Authors

David B. Swanson (National Board of Medical Examiners)
Aggie Butler (National Board of Medical Examiners)
Krista Allbee (National Board of Medical Examiners)
Carol Morrison (National Board of Medical Examiners)
S. Deniz Bucak (National Board of Medical Examiners)
Shepard R. Hurwitz (George Washington University)
Amy Baker (National Board of Medical Examiners)
Brian E. Clauser (National Board of Medical Examiners)
Dave Swanson (National Board of Medical Examiners)