bioRxiv | 2019

Approaches for integrating heterogeneous RNA-seq data reveals cross-talk between microbes and genes in asthmatic patients


Abstract


Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNAseq) can be used on sputum, but it can be challenging to interpret because sputum contains a complex and heterogeneous mixture of human cells and exogenous (microbial) material. In this study, we developed a methodology that integrates dimensionality reduction and statistical modeling to grapple with the heterogeneity. We used it to relate bulk RNAseq data from 115 asthmatic patients to clinical information, microscope images, and single-cell profiles. First, we mapped sputum RNAseq reads to human and exogenous sources. We then decomposed the human reads into cell-expression signatures and the fractions of these signatures in each sample; we validated the decomposition using targeted single-cell RNAseq and microscopy. We observed enrichment of immune-system cells (neutrophils, eosinophils, and mast cells) in severe asthmatics. Second, we inferred microbial abundances from the exogenous reads and then associated these with clinical variables; for example, Haemophilus was associated with increased white blood cell count, and Candida with worse lung function. Third, we applied a generative model, latent Dirichlet allocation (LDA), to identify patterns of gene expression and microbial abundances and relate them to clinical data. Based on this, we developed a method called LDA-link that connects microbes to genes using reduced-dimensionality LDA topics. We found a number of known connections, e.g., between Haemophilus and the gene IL1B, which is highly expressed by mast cells. In addition, we identified novel connections, including one between Candida and the calcium-signaling gene CACNA1E, which is highly expressed by eosinophils. These results speak to the mechanism by which gene-microbe interactions contribute to asthma and define a strategy for making inferences in heterogeneous and noisy RNAseq datasets.

DOI 10.1101/765297
Language English
Journal bioRxiv

Full Text