EBioMedicine | 2021

Deep learning predicts gene expression as an intermediate data modality to identify susceptibility patterns in Mycobacterium tuberculosis infected Diversity Outbred mice

 
 
 
 
 
 

Abstract


Background Machine learning sustains successful application to many diagnostic and prognostic problems in computational histopathology. Yet, few efforts have been made to model gene expression from histopathology. This study proposes a methodology which predicts selected gene expression values (microarray) from haematoxylin and eosin whole-slide images as an intermediate data modality to identify fulminant-like pulmonary tuberculosis ( supersusceptible ) in an experimentally infected cohort of Diversity Outbred mice (n=77). Methods Gradient-boosted trees were utilized as a novel feature selector to identify gene transcripts predictive of fulminant-like pulmonary tuberculosis. A novel attention-based multiple instance learning model for regression was used to predict selected genes expression from whole-slide images. Gene expression predictions were shown to be sufficiently replicated to identify supersusceptible mice using gradient-boosted trees trained on ground truth gene expression data. Findings The model was accurate, showing high positive correlations with ground truth gene expression on both cross-validation (n = 77, 0.63 ≤ ρ ≤ 0.84) and external testing sets (n = 33, 0.65 ≤ ρ ≤ 0.84). The sensitivity and specificity for gene expression predictions to identify supersusceptible mice (n=77) were 0.88 and 0.95, respectively, and for an external set of mice (n=33) 0.88 and 0.93, respectively. Implications Our methodology maps histopathology to gene expression with sufficient accuracy to predict a clinical outcome. The proposed methodology exemplifies a computational template for gene expression panels, in which relatively inexpensive and widely available tissue histopathology may be mapped to specific genes expression to serve as a diagnostic or prognostic tool. Funding National Institutes of Health and American Lung Association.

Volume 67
Pages None
DOI 10.1016/j.ebiom.2021.103388
Language English
Journal EBioMedicine

Full Text