Radiology | 2021

Added Value of Deep Learning-based Detection System for Multiple Major Findings on Chest Radiographs: A Randomized Crossover Study.


Abstract


Background: Previous studies assessing the effects of computer-aided detection on observer performance in the reading of chest radiographs used a sequential reading design that may have biased the results because of reading order or recall bias.

Purpose: To compare observer performance in detecting and localizing major abnormal findings including nodules, consolidation, interstitial opacity, pleural effusion, and pneumothorax on chest radiographs without versus with deep learning-based detection (DLD) system assistance in a randomized crossover design.

Materials and Methods: This study included retrospectively collected normal and abnormal chest radiographs between January 2016 and December 2017 (https://cris.nih.go.kr/; registration no. KCT0004147). The radiographs were randomized into two groups, and six observers, including thoracic radiologists, interpreted each radiograph without and with use of a commercially available DLD system by using a crossover design with a washout period. Jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit (FOM), area under the receiver operating characteristic curve (AUC), sensitivity, specificity, false-positive findings per image, and reading times of observers with and without the DLD system were compared by using McNemar and paired t tests.

Results: A total of 114 normal (mean patient age ± standard deviation, 51 years ± 11; 58 men) and 114 abnormal (mean patient age, 60 years ± 15; 75 men) chest radiographs were evaluated. The radiographs were randomized to two groups: group A (n = 114) and group B (n = 114). Use of the DLD system improved the observers' JAFROC FOM (from 0.90 to 0.95, P = .002), AUC (from 0.93 to 0.98, P = .002), per-lesion sensitivity (from 83% [822 of 990 lesions] to 89.1% [882 of 990 lesions], P = .009), per-image sensitivity (from 80% [548 of 684 radiographs] to 89% [608 of 684 radiographs], P = .009), and specificity (from 89.3% [611 of 684 radiographs] to 96.6% [661 of 684 radiographs], P = .01) and reduced the reading time (from 10-65 seconds to 6-27 seconds, P < .001). The DLD system alone outperformed the pooled observers (JAFROC FOM: 0.96 vs 0.90, respectively, P = .007; AUC: 0.98 vs 0.93, P = .003).

Conclusion: Observers including thoracic radiologists showed improved performance in the detection and localization of major abnormal findings on chest radiographs and reduced reading time with use of a deep learning-based detection system.

© RSNA, 2021

Online supplemental material is available for this article.
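The pooled percentages in the Results can be rechecked from the bracketed counts. The sketch below is only an arithmetic illustration and is not the authors' analysis code; the names `REPORTED`, `percent`, and `summarize` are hypothetical. The denominator of 684 is consistent with 114 radiographs read by six observers, and the McNemar and paired t tests are not reproduced here because they require per-reading data that the abstract does not give.

```python
# Counts copied from the abstract's Results (pooled over the six observers).
# The dictionary layout and all names here are illustrative assumptions.
REPORTED = {
    "per-lesion sensitivity": {"without": (822, 990), "with": (882, 990)},
    "per-image sensitivity": {"without": (548, 684), "with": (608, 684)},
    "specificity": {"without": (611, 684), "with": (661, 684)},
}


def percent(numerator, denominator):
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


def summarize(reported=REPORTED):
    """Map each metric to (without DLD %, with DLD %, difference in points)."""
    rows = {}
    for metric, arms in reported.items():
        without = percent(*arms["without"])
        with_dld = percent(*arms["with"])
        rows[metric] = (
            round(without, 1),
            round(with_dld, 1),
            round(with_dld - without, 1),
        )
    return rows


if __name__ == "__main__":
    for metric, (without, with_dld, diff) in summarize().items():
        print(f"{metric}: {without:.1f}% -> {with_dld:.1f}% "
              f"(+{diff:.1f} percentage points)")
```

Computed to one decimal place, the values agree with the rounded figures quoted in the abstract.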

Pages 202818
DOI 10.1148/radiol.2021202818
Language English
Journal Radiology
