International Journal of Radiation Oncology, Biology, Physics | 2021

Clinical Validation of Deep Learning Algorithms for Lung Cancer Radiotherapy Targeting.

Abstract


PURPOSE/OBJECTIVE(S)
Automated target segmentation for non-small cell lung cancer (NSCLC) patients has the potential to support radiation treatment planning. Artificial intelligence (AI) has demonstrated great promise in medical image segmentation tasks. However, most studies have been confined to in silico validation in small internal cohorts, lacking data on real-world clinical utility. In this study, we developed primary tumor and involved lymph node segmentation algorithms for computed tomography (CT) images. Validation was performed in multiple large multi-institutional cohorts to assess model generalizability.

MATERIALS/METHODS
Simulation CTs and ground truth annotations were collected from multiple public and private sources (total n = 2584). We employed the following benchmarks: inter-observer variability (6 radiation oncologists, n = 20, median volumetric Dice 0.83, 95% CI 0.82-0.84) and intra-observer variability (1 radiation oncologist, 3 reads, n = 21, median volumetric Dice 0.88, 95% CI 0.84-0.90). We developed two segmentation algorithms: seed-point assisted and fully automated. Model training data (n = 787) comprised NSCLC-Radiomics (stages I-IIIB, n = 422) and LungRT-1 (stages IA-IV, n = 365). Validation was first performed in an internal dataset annotated by a single thoracic radiation oncologist (LungRT-1, n = 136). Additional validation included: (1) an internal dataset annotated by other radiation oncologists, including generalists, in our center (LungRT-2, n = 1075); (2) an external clinical trial dataset from 185 different institutions (RTOG-0617, n = 403); and (3) a dataset of early-stage surgical patients annotated for diagnostic purposes by radiologists (NSCLC-Radiogenomics, n = 142). Volumetric Dice, using expert manual segmentations as ground truth, was used as the evaluation metric.

RESULTS
Model performance was comparable to the benchmarks when validated on internal data, with performance degrading in cohorts annotated by other radiation oncologists.

CONCLUSION
The results highlight the importance of assessing segmentation style among annotators and of understanding model generalizability in external cohorts, while cautioning that performance degrades as validation data become increasingly external. Differences between radiologists and radiation oncologists performing the same segmentation task underscore the importance of clinical context in AI model deployment. Further validation will include studying the dosimetric impact of AI-generated segmentations and conducting human-subject experiments to assess acceptance of AI output and time savings.
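For reference, the volumetric Dice coefficient used throughout the abstract measures voxel-wise overlap between two segmentations, Dice = 2|A∩B| / (|A| + |B|). Below is a minimal sketch of how this metric could be computed for binary CT segmentation masks; the function name, mask shapes, and example values are illustrative assumptions and are not taken from the study.

import numpy as np

def volumetric_dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Return 2*|A intersect B| / (|A| + |B|) for boolean voxel masks of equal shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / total

# Hypothetical example: two cuboid masks on the same CT grid, offset by 2 voxels
ai_mask = np.zeros((64, 64, 64), dtype=bool); ai_mask[20:40, 20:40, 20:40] = True
gt_mask = np.zeros((64, 64, 64), dtype=bool); gt_mask[22:42, 20:40, 20:40] = True
print(volumetric_dice(ai_mask, gt_mask))  # 0.9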

Volume 111, Issue 3S
Pages S67
DOI 10.1016/j.ijrobp.2021.07.167
Language English
Journal International Journal of Radiation Oncology, Biology, Physics
