Nature Machine Intelligence | 2021

Integration of millions of transcriptomes using batch-aware triplet neural networks

 
 
 

Abstract


Efficient integration of heterogeneous and increasingly large single-cell RNA sequencing data poses a major challenge for analysis and, in particular, comprehensive atlasing efforts. Here we developed a novel deep learning algorithm called INSCT (Insight) to overcome batch effects using batch-aware triplet neural networks. We use simulated and real data to demonstrate that INSCT generates an embedding space that accurately integrates cells across experiments, platforms and species. Our benchmark comparisons with current state-of-the-art single-cell RNA sequencing integration methods revealed that INSCT outperforms competing methods in scalability while achieving comparable accuracies. Moreover, using INSCT in semisupervised mode enables users to classify unlabelled cells by projecting them into a reference collection of annotated cells. To demonstrate scalability, we applied INSCT to integrate more than 2.6 million transcriptomes from four independent studies of mouse brains in less than 1.5 h using less than 25 GB of memory. This feature empowers researchers to perform atlasing-scale data integration in a typical desktop computer environment. INSCT is freely available at https://github.com/lkmklsmn/insct.

Single-cell RNA sequencing efforts have made large amounts of data available for transcriptomics research. Simon and colleagues develop a neural network embedding approach that avoids batch effects, such that it can rapidly and efficiently integrate large datasets from different studies.

DOI 10.1038/s42256-021-00361-8
Language English
Journal Nature Machine Intelligence

Full Text