
Resident and Faculty Concordance in Screening Mammography: Impact of Experience and Opportunities for Focused Instruction


Abstract


Purpose: To evaluate the frequency of and reasons for patient callback from offline screening mammography, comparing residents and breast imaging faculty.

Methods: Residents and MQSA-approved, fellowship-trained breast imaging faculty independently recorded prospective interpretations of a subset of bilateral clinical screening mammograms performed over a 1-year period at our NCI-designated cancer site utilizing computer-aided detection (CAD). BI-RADS categories 1, 2, or 0 were allowed at screening interpretation. An IRB-approved retrospective review compared callback performance in the two groups. Descriptive statistics and multivariate logistic regression were performed.

Results: 1,317 consecutive bilateral screening mammograms were reviewed. Residents recommended callback for 123/1,317 (9.3%) and faculty for 110/1,317 (8.4%) women (p<.0001). Overall agreement was moderate (k=0.50), with lower agreement between faculty and novices (experience <4 weeks) (k=0.39) than between faculty and senior residents (experience >8 weeks) (k=0.63). Agreement varied with finding type: calcifications (k=0.66), mass (k=0.52), focal asymmetry (k=0.45), asymmetry (k=0.33). In multivariate regression, all four finding types were predictors of discordance: calcifications (OR 10.4, 95% CI 3.4–33.1, p<.0001); mass (OR 19.2, 95% CI 7.7–48.0, p<.0001); focal asymmetry (OR 21.3, 95% CI 9.9–45.7, p<.0001); asymmetry (OR 40.1, 95% CI 21.4–75.2, p<.0001). The odds of discordance declined by 6% with each week of resident experience (OR 0.94, 95% CI 0.89–0.99, p=.02). Breast density was not a significant predictor.

Conclusions: Resident and faculty callback agreement was moderate but improved with resident experience. Novices often detected calcifications and masses but missed focal asymmetries and asymmetries, suggesting educational efforts should focus on the perception of asymmetry.
Introduction

Breast cancer has the second highest mortality rate of all cancers in women, and mammography is the only known screening method shown to decrease disease-related mortality [1]. Robust diagnostic performance of screening mammography is essential to this public health impact, requiring a delicate balance between detecting clinically significant cancers and avoiding excessive callback rates. This level of accuracy is the intended result of specialty training and years of experience in breast imaging, but the first phase is residency training [2, 3]. To meet the requirements of the Mammography Quality Standards Act (MQSA) for training in breast imaging, radiology residents spend at least 12 weeks of their 4-year training in breast imaging clinical rotations [4]. Resident evaluations are based on faculty observation of interpretive skills and procedures, patient interactions, and dictated reports. This style of individualized instruction has the potential to provide residents with personalized training. However, given the time constraints often present at busy academic centers, there is a further need for objective metrics and data that can be used to assess the performance and tailor the education of trainees in breast imaging.

As residency training is integral to mammography expertise, many previous efforts have focused on improving the training process. Prior work has addressed the need for cases of varied difficulty, based on self- and expert assessments, to maximize the effect of training on resident performance [5]. Mathematical models have been developed to address the need for objective assessment metrics [6, 7], and some efforts have been made to identify image features predictive of error to improve the clinical utility of such models [8]. Outside of breast imaging, concordance of resident and faculty interpretation is high [9, 10]; that is not the case in breast imaging.
The goal of the current study is to evaluate the frequency of and morphologic reasons for trainee callbacks from screening mammography and to compare them to callbacks by faculty breast imagers. We hypothesize that the callback rates of radiology residents will be within the national benchmarks of 8–12% but higher than those of experienced breast imaging faculty.

Materials and Methods

All cases interpreted were 2D digital four-view screening mammograms obtained on GE Senographe Essential mammography equipment (Buc, France) at one of six screening locations within one academic health system. Residents and faculty had individual workstations to view the digital studies, with hard-copy images available for review as desired. Anonymized screening mammography data sheets, including resident and faculty interpretations, were routinely recorded for Quality Assessment (QA) and educational purposes from July 1, 2014, to June 30, 2015. All radiology residents who rotated in breast imaging took part in this process. It has been shown that trainee interpretation of screening mammography influences faculty interpretation [2]. Thus, we asked residents and faculty to fill out an initial written assessment form independently, stating whether they would recall the patient for additional imaging or interpret the mammogram as negative. Faculty interpretation was the reference standard for purposes of this study. Subsequent Institutional Review Board (IRBMED) approval for retrospective review of the data waived the need for patient consent.
Data included resident weeks of training, resident observations (calcifications, mass, focal asymmetry, asymmetry), finding location, and recommendations for callback for additional diagnostic imaging, as well as faculty observations, location, recommendation, and assessment of breast density. All eleven faculty members in the breast imaging section, with nine to thirty years of post-fellowship experience, were included. The hard-copy data were subsequently entered into an electronic spreadsheet (Microsoft Excel, Redmond, WA) by a medical student blinded to clinical outcomes. Resident interpretation was considered concordant with faculty interpretation when the decision and reason for callback matched those of the faculty, for one breast in the per-breast analysis or both breasts in the per-patient analysis.

Descriptive statistics were performed to identify data trends and distribution. Continuous variables were summarized as means and compared using t-tests or non-parametric tests where appropriate, while categorical variables were expressed as counts or percentages and compared using chi-square tests and measures of agreement. Kappa agreement was considered slight if ≤0.20, fair if 0.21–0.40, moderate if 0.41–0.60, substantial if 0.61–0.80, and almost perfect if 0.81–1.00. Logistic regression analysis was performed to evaluate predictors of resident-faculty discordance. A stepwise forward selection algorithm was used to select covariates for multivariate logistic regression. All statistical procedures considered p<.05 the standard for statistical significance and were performed using SAS 9.4 (SAS Institute, Cary, NC).

Results

Data sheets were reviewed for 1,345 consecutive bilateral screening mammograms; 28 of these were excluded from further analysis because the data sheets were incomplete (n=27) or the patient had clinical symptoms that would warrant a diagnostic exam regardless of screening mammographic findings (n=1), leaving 1,317 cases.
Residents recommended that 123/1,317 (9.34%) women be called back for additional imaging, while faculty recommended callbacks for 110/1,317 (8.35%) women (p<.0001). Resident and faculty callback recommendations at the per-patient level were concordant in 1,208/1,317 (91.72%) cases. Residents and faculty agreed on 62 callbacks; residents would have called back 61 women who were not called back by faculty, and faculty called back 48 women who would not have been called back by residents. Among the 62 apparently concordant callbacks, the sidedness of the resident's and faculty's reasons for callback differed in 5/62 (8.06%) cases. Therefore, the true proportion of concordant interpretations at the per-patient level was 91.34%, and the remaining analysis was performed on a per-breast basis with a total sample size of 2,634 breasts.

Regarding each breast as an individual observation, residents recommended callback in 139/2,634 (5.28%) cases and faculty in 123/2,634 (4.67%) cases (p<.0001). Overall agreement between residents and faculty was moderate (k=0.50, p<.0001). Recommendations were negative concordant (no callback) in 2,441/2,634 (92.67%) cases, positive concordant (both callback) in 69/2,634 (2.63%), resident positive/faculty negative in 70/2,634 (2.66%), and resident negative/faculty positive in 54/2,634 (2.05%). Types and locations of findings prompting callbacks are illustrated in Figures 1 and 2. Resident and faculty agreement was highest for calcifications (k=0.66) and lowest for asymmetry (k=0.33), as presented in Table 1. Agreement on location was moderate (k=0.45).

Figure 1. Mammographic findings prompting recommendation for callback among residents and faculty, on a per-breast basis.

Figure 2. Location of findings prompting recommendation for callback among residents and faculty, on a per-breast basis.
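As a check on the overall per-breast agreement reported above, Cohen's kappa can be recomputed from the four concordance counts (69 positive concordant, 70 resident-only, 54 faculty-only, 2,441 negative concordant). The sketch below is for illustration only; the function names are ours, and the agreement bands follow the thresholds given in the Methods.

```python
def cohens_kappa(both_pos, res_only, fac_only, both_neg):
    """Cohen's kappa for two readers from the four cells of a 2x2 table."""
    n = both_pos + res_only + fac_only + both_neg
    p_obs = (both_pos + both_neg) / n          # observed agreement
    res_pos = (both_pos + res_only) / n        # resident callback rate
    fac_pos = (both_pos + fac_only) / n        # faculty callback rate
    # Chance agreement from the two readers' marginal callback rates.
    p_chance = res_pos * fac_pos + (1 - res_pos) * (1 - fac_pos)
    return (p_obs - p_chance) / (1 - p_chance)

def band(k):
    """Agreement bands used in the Methods."""
    if k <= 0.20: return "slight"
    if k <= 0.40: return "fair"
    if k <= 0.60: return "moderate"
    if k <= 0.80: return "substantial"
    return "almost perfect"

k = cohens_kappa(both_pos=69, res_only=70, fac_only=54, both_neg=2441)
print(round(k, 2), band(k))  # → 0.5 moderate
```

The result reproduces the reported overall kappa of 0.50 ("moderate") from the published counts.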
Table 1. Agreement between residents and faculty on the type and location of findings prompting recommendation for callback from screening mammography, on a per-breast basis. P-values < .05 indicate a non-zero correlation between faculty and trainee interpretations of each feature.

Finding          Cohen's kappa   p value
Calcifications   0.66            <.0001
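The per-week odds ratio from the regression results in the Abstract (OR 0.94 per week of resident experience) compounds multiplicatively over a rotation. As a hedged arithmetic illustration (not study data), the cumulative change in the odds of resident-faculty discordance over a rotation can be read off as follows:

```python
# OR 0.94 per week of experience, from the Abstract's multivariate regression.
or_per_week = 0.94
for weeks in (4, 8, 12):
    cumulative = or_per_week ** weeks
    print(f"{weeks} weeks: odds of discordance x{cumulative:.2f}")
# → 4 weeks: x0.78, 8 weeks: x0.61, 12 weeks: x0.48
```

By the end of the MQSA-minimum 12-week rotation, the model implies the odds of discordance are roughly halved relative to week zero.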

Volume 1(1)
DOI 10.31038/irci.2018113
Language English
Journal Interv Med Clin Imaging
