bioRxiv | 2019

Linnaeus: Interpretable Deep Learning Classification of Single Cell Transcript Data

 
 

Abstract


High throughput data is commonplace in biomedical research as seen with technologies such as single-cell RNA sequencing (scRNA-seq) and other Next Generation Sequencing technologies. As these techniques continue to be increasingly utilized it is critical to have analysis tools that can identify meaningful complex relationships between variables (i.e., in the case of scRNA-seq: genes) in a way such that human bias is absent. Moreover, it is equally paramount that both linear and non-linear (i.e., one-to-many) variable relationships be considered when contrasting datasets. Linnaeus is a deep learning-based framework that generates an optimal interpretable classifier a given high-throughput dataset using a simple genetic algorithm as well as an autoencoder to classifier transfer learning approach. Using four unique publicly available scRNA-seq datasets with published ground truth, we demonstrate the robustness of Linnaeus and the ability to identify ontologically accurate gene lists for a given data subset. Linnaeus serves as a bioinformatic tool to allow novice and advanced analysts to gain complex insight into their respective datasets enabling novel hypotheses development.

Volume None
Pages None
DOI 10.1101/822759
Language English
Journal bioRxiv

Full Text