Archive | 2021

Novel deep learning algorithm predicts the status of molecular pathways and key mutations in colorectal cancer from routine histology images

Abstract


Background: Determining the molecular pathways involved in the development of colorectal cancer (CRC) and knowing the status of key mutations are crucial for deciding on optimal targeted therapy. The goal of this study is to explore machine learning to predict the status of the three main CRC molecular pathways, microsatellite instability (MSI), chromosomal instability (CIN), and CpG island methylator phenotype (CIMP), to detect BRAF and TP53 mutations, and to predict hypermutated (HM) CRC tumors from whole-slide images (WSIs) of CRC slides stained with Hematoxylin and Eosin (H&E).

Methods: We propose a novel iterative draw-and-rank sampling (IDaRS) algorithm to select representative sub-images, or tiles, from a WSI given a single WSI-level label, without needing any detailed annotations at the cell or region level. IDaRS is used to train a deep convolutional network to predict key molecular parameters in CRC (in particular, HM tumors, the status of the three main CRC molecular pathways, MSI, CIN, and CIMP, and the two key mutations, BRAF and TP53) from digitized images of routine H&E-stained tissue slides of CRC patients (n=497 for the TCGA cohort and n=47 for the Pathology AI Platform, or PAIP, cohort). Visual fields most predictive of each pathway and of HM tumors, as identified by IDaRS, are analyzed to verify known histological features and, for the first time, to reveal novel histological features. This is achieved by systematic, data-driven analysis of the cellular composition of strongly predictive tiles.

Findings: IDaRS yields high accuracy in predicting the three main CRC genetic pathways and key mutations through deep learning based analysis of WSIs of H&E-stained slides. It achieves state-of-the-art AUROC values of 0.90 and 0.83 for predicting the status of MSI and CIN, respectively, on the TCGA cohort, which are significantly higher than those of any other currently published method on that cohort. To the best of our knowledge, this study is the first to report deep learning based automated prediction of HM tumors and of the status of the CIMP pathway (CIMP-High and CIMP-Low) from H&E slides, with AUROC values of 0.81 and 0.79, respectively. We analyzed key discriminative histological features associated with HM tumors and each molecular pathway through an automated quantitative analysis of the cellular composition of tiles strongly predictive of the corresponding molecular status. A key feature of the proposed method is that it makes this analysis systematic and data-driven across the various molecular parameters. We found that relatively high proportions of tumor-infiltrating lymphocytes and necrosis are strongly associated with HM and MSI, and moderately associated with CIMP-H and genome-stable (GS) cases, whereas relatively high proportions of neoplastic epithelial type 2 (NEP2), mesenchymal, and neoplastic epithelial type 1 (NEP1) cells are associated with CIN cases.

Interpretation: Automated, highly accurate prediction of genetic pathways and key mutations from image analysis of simple H&E-stained sections can provide time- and cost-effective decision support. This work shows that a deep learning algorithm can mine both visually recognizable and sub-visual histological patterns associated with molecular pathways and key mutations in CRC in a data-driven manner.

Funding: This study was funded by the UK Medical Research Council (award MR/P015476/1).
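The abstract names the iterative draw-and-rank idea but does not spell out the procedure, so the Python sketch below shows only one plausible reading of it, not the authors' implementation. It assumes that each round draws a few random tiles from a slide, pools them with the tiles kept from the previous round, ranks the pool by a per-tile score, and keeps the best ones. The names `draw_and_rank`, `iterative_draw_and_rank`, `slide_score`, and the parameters `n_random` and `n_top` are hypothetical, and `score_tile` is a stand-in for the output of the deep convolutional network; the retraining of that network between rounds is omitted.

```python
import random
from typing import Callable, List, Sequence


def draw_and_rank(
    tile_ids: Sequence[int],
    score_tile: Callable[[int], float],
    prev_top: Sequence[int],
    n_random: int,
    n_top: int,
    rng: random.Random,
) -> List[int]:
    """One draw-and-rank round for a single slide (illustrative assumption)."""
    kept = list(dict.fromkeys(prev_top))  # previous top tiles, duplicates dropped
    kept_set = set(kept)
    # Draw: sample new random tiles that are not already kept.
    pool = [t for t in tile_ids if t not in kept_set]
    drawn = rng.sample(pool, min(n_random, len(pool)))
    candidates = kept + drawn
    # A real pipeline would train the network on `candidates` here, with every
    # tile inheriting the single slide-level label (assumption, omitted).
    # Rank: order candidates by score and keep the highest-scoring tiles.
    ranked = sorted(candidates, key=score_tile, reverse=True)
    return ranked[:n_top]


def iterative_draw_and_rank(
    tile_ids: Sequence[int],
    score_tile: Callable[[int], float],
    n_iters: int = 5,
    n_random: int = 10,
    n_top: int = 5,
    seed: int = 0,
) -> List[int]:
    """Repeat the draw-and-rank round, carrying the top tiles forward."""
    rng = random.Random(seed)
    top: List[int] = []
    for _ in range(n_iters):
        top = draw_and_rank(tile_ids, score_tile, top, n_random, n_top, rng)
    return top


def slide_score(top_tiles: Sequence[int], score_tile: Callable[[int], float]) -> float:
    """Aggregate tile scores into a slide-level score (mean is an assumption)."""
    if not top_tiles:
        return 0.0
    return sum(score_tile(t) for t in top_tiles) / len(top_tiles)


if __name__ == "__main__":
    tiles = list(range(100))          # toy slide with 100 tile indices
    toy_score = lambda t: t / 100.0   # hypothetical stand-in for a CNN score
    top = iterative_draw_and_rank(tiles, toy_score, n_iters=8, seed=1)
    print(top, slide_score(top, toy_score))
```

Because the tiles kept from the previous round always re-enter the ranking, the lowest score among the retained tiles cannot decrease from one round to the next under a fixed scoring function. In this reading, that is what lets the sampler converge on strongly predictive tiles without scoring every tile of a slide in every round.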

DOI 10.1101/2021.01.19.21250122
Language English
