Qike Li | Researchain

Publication

Featured research published by Qike Li.

Bioinformatics | 2015

Dynamic changes of RNA-sequencing expression for precision medicine: N-of-1-pathways Mahalanobis distance within pathways of single subjects predicts breast cancer survival.

A. Grant Schissler; Vincent Gardeux; Qike Li; Ikbel Achour; Haiquan Li; Walter W. Piegorsch; Yves A. Lussier

Motivation: The conventional approach to personalized medicine relies on molecular data analytics across multiple patients. The path to precision medicine lies with molecular data analytics that can discover interpretable single-subject signals (N-of-1). We developed a global framework, N-of-1-pathways, for a mechanistic-anchored approach to single-subject gene expression data analysis. We previously employed a metric that could prioritize the statistical significance of a deregulated pathway in single subjects; however, it lacked quantitative interpretability (e.g. the equivalent of a gene expression fold-change). Results: In this study, we extend our previous approach by applying the statistical Mahalanobis distance (MD) to quantify personal pathway-level deregulation. We demonstrate that this approach, N-of-1-pathways Paired Samples MD (N-OF-1-PATHWAYS-MD), detects deregulated pathways (empirical simulations) without inflating the false-positive rate, using a study with biological replicates. Finally, we establish that N-OF-1-PATHWAYS-MD scores are biologically significant, clinically relevant and predictive of breast cancer survival (P < 0.05, n = 80 invasive carcinoma; TCGA RNA-sequences). Conclusion: N-of-1-pathways MD provides a practical approach towards precision medicine. The method generates the magnitude and the biological significance of personal deregulated pathway results derived solely from the patient’s transcriptome. These pathways offer opportunities for deriving clinically actionable decisions that have the potential to complement the clinical interpretability of personal polymorphisms obtained from DNA acquired or inherited polymorphisms and mutations. In addition, it offers an opportunity for applicability to diseases in which DNA changes may not be relevant, and thus expands the ‘interpretable omics’ of single subjects (e.g. personalome).
Availability and implementation: http://www.lussierlab.net/publications/N-of-1-pathways. Supplementary information: Supplementary data are available at Bioinformatics online.
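To make the distance used above concrete, here is a minimal sketch of a generic two-dimensional Mahalanobis distance, assuming each gene in a pathway is represented as a (baseline, case) expression pair. The function name `mahalanobis_2d` and the toy points are hypothetical, and this is the textbook distance to a centroid, not the N-OF-1-PATHWAYS-MD scoring procedure, whose details the abstract does not give.

```python
import math

def mahalanobis_2d(point, points):
    """Mahalanobis distance from `point` to the centroid of `points` (2-D)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance entries (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = v^T S^-1 v, with S^-1 = (1/det) * [[syy, -sxy], [-sxy, sxx]].
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Hypothetical (baseline, case) expression pairs for four genes.
genes = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(mahalanobis_2d((3.0, 1.0), genes))
```

Unlike a Euclidean distance, the result is scaled by the spread and correlation of the reference points, so it is expressed in units comparable to standard deviations.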

Bioinformatics | 2015

A two-stage statistical procedure for feature selection and comparison in functional analysis of metagenomes

Naruekamol Pookhao; Michael B. Sohn; Qike Li; Isaac Jenkins; Ruofei Du; Hongmei Jiang; Lingling An

MOTIVATION With the advance of new sequencing technologies producing massive short-read data, metagenomics is rapidly growing, especially in the fields of environmental biology and medical science. Metagenomic data are not only high dimensional, with a large number of features and a limited number of samples, but also complex, with a large number of zeros and a skewed distribution. Efficient computational and statistical tools are needed to deal with these unique characteristics of metagenomic sequencing data. In metagenomic studies, one main objective is to assess whether and how multiple microbial communities differ under various environmental conditions. RESULTS We propose a two-stage statistical procedure for selecting informative features and identifying differentially abundant features between two or more groups of microbial communities. In the functional analysis of metagenomes, the features may refer to pathways, subsystems, functional roles and so on. In the first stage of the proposed procedure, the informative features are selected using the elastic net, which reduces the dimension of the metagenomic data. In the second stage, the differentially abundant features are detected using generalized linear models with a negative binomial distribution. Compared with other available methods, the proposed approach demonstrates better performance in most of the comprehensive simulation studies. The new method is also applied to two real metagenomic datasets related to human health. Our findings are consistent with those in previous reports. AVAILABILITY R code and two example datasets are available at http://cals.arizona.edu/~anling/software.htm. SUPPLEMENTARY INFORMATION Supplementary file is available at Bioinformatics online.

BMC Bioinformatics | 2014

Accurate genome relative abundance estimation for closely related species in a metagenomic sample

Michael B. Sohn; Lingling An; Naruekamol Pookhao; Qike Li

Background: Metagenomics has great potential to discover previously unattainable information about microbial communities. An important prerequisite for such discoveries is to accurately estimate the composition of microbial communities. Most prevalent homology-based approaches utilize solely the results of an alignment tool such as BLAST, limiting their estimation accuracy to high ranks of the taxonomy tree. Results: We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which utilizes the similarity in the genomic sequence in addition to the result of an alignment tool. The proposed method is comprehensively tested on various simulated benchmark datasets of diverse complexity of microbial structure. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in quantification of genomes in a given microbial sample. We also applied TAEC to two real metagenomic datasets, an oral cavity dataset and a Crohn’s disease dataset. Our results, while agreeing with previous findings at higher ranks of the taxonomy tree, provide accurate estimation of taxonomic compositions at the species/strain level, narrowing down which species/strains need more attention in the study of the oral cavity and Crohn’s disease. Conclusions: By taking account of the similarity in the genomic sequence, TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains exist in a metagenomic sample.

Journal of Biomedical Informatics | 2017

kMEn: Analyzing noisy and bidirectional transcriptional pathway responses in single subjects

Qike Li; A. Grant Schissler; Vincent Gardeux; Joanne Berghout; Ikbel Achour; Colleen Kenost; Haiquan Li; Hao Helen Zhang; Yves A. Lussier

MOTIVATION Understanding dynamic, patient-level transcriptomic response to therapy is an important step forward for precision medicine. However, conventional transcriptome analysis aims to discover cohort-level change, lacking the capacity to unveil patient-specific response to therapy. To address this gap, we previously developed two N-of-1-pathways methods, Wilcoxon and Mahalanobis distance, to detect unidirectionally responsive transcripts within a pathway using a pair of samples from a single subject. Yet, these methods cannot recognize bidirectionally (up and down) responsive pathways. Further, our previous approaches have not been assessed in presence of background noise and are not designed to identify differentially expressed mRNAs between two samples of a patient taken in different contexts (e.g. cancer vs non cancer), which we termed responsive transcripts (RTs). METHODS We propose a new N-of-1-pathways method, k-Means Enrichment (kMEn), that detects bidirectionally responsive pathways, despite background noise, using a pair of transcriptomes from a single patient. kMEn identifies transcripts responsive to the stimulus through k-means clustering and then tests for an over-representation of the responsive genes within each pathway. The pathways identified by kMEn are mechanistically interpretable pathways significantly responding to a stimulus. RESULTS In ∼9000 simulations varying six parameters, superior performance of kMEn over previous single-subject methods is evident by: (i) improved precision-recall at various levels of bidirectional response and (ii) lower rates of false positives (1-specificity) when more than 10% of genes in the genome are differentially expressed (background noise). In a clinical proof-of-concept, personal treatment-specific pathways identified by kMEn correlate with therapeutic response (p-value<0.01). 
CONCLUSION Through improved single-subject transcriptome dynamics of bidirectionally-regulated signals, kMEn provides a novel approach to identify mechanism-level biomarkers.
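The two steps named in the abstract above (k-means clustering to flag responsive transcripts, then an over-representation test per pathway) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published kMEn implementation: clustering on the absolute log fold change with k = 2, the one-sided hypergeometric test, and all function and gene names are assumptions made for the example.

```python
from math import comb

def kmeans_1d(values, k=2, iters=100):
    """Lloyd's algorithm in one dimension (k >= 2); returns (labels, centers)."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly across the data range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers

def hypergeom_sf(x, N, K, n):
    """P(X >= x) for X ~ Hypergeometric(population N, successes K, draws n)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(x, upper + 1)) / comb(N, n)

def responsive_pathway_pvalue(log_fc, pathway):
    """Over-representation p-value of responsive genes in `pathway`.

    `log_fc` maps gene -> log fold change between the paired samples;
    `pathway` must be a subset of the genes in `log_fc`.
    """
    genes = list(log_fc)
    # Absolute values let up- and down-regulated genes both count as responsive.
    labels, centers = kmeans_1d([abs(log_fc[g]) for g in genes])
    top = max(range(len(centers)), key=lambda c: centers[c])
    responsive = {g for g, lab in zip(genes, labels) if lab == top}
    hits = len(responsive & set(pathway))
    return hypergeom_sf(hits, len(genes), len(responsive), len(pathway))

# Hypothetical paired-sample log fold changes for six genes.
log_fc = {"g1": 3.0, "g2": -2.8, "g3": 0.1, "g4": -0.2, "g5": 0.0, "g6": 0.15}
print(responsive_pathway_pvalue(log_fc, ["g1", "g2"]))
```

In this toy input the pathway holds one strongly up-regulated and one strongly down-regulated gene, and both land in the responsive cluster, which is the bidirectional case the abstract describes.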

BMC Medical Genomics | 2017

N-of-1- pathways MixEnrich: advancing precision medicine via single-subject analysis in discovering dynamic changes of transcriptomes

Qike Li; A. Grant Schissler; Vincent Gardeux; Ikbel Achour; Colleen Kenost; Joanne Berghout; Haiquan Li; Hao Helen Zhang; Yves A. Lussier

Background: Transcriptome analytic tools are commonly used across patient cohorts to develop drugs and predict clinical outcomes. However, as precision medicine pursues more accurate and individualized treatment decisions, these methods are not designed to address single-patient transcriptome analyses. We previously developed and validated the N-of-1-pathways framework using two methods, Wilcoxon and Mahalanobis Distance (MD), for personal transcriptome analysis derived from a pair of samples of a single patient. Although both methods uncover concordantly dysregulated pathways, they are not designed to detect dysregulated pathways with up- and down-regulated genes (bidirectional dysregulation) that are ubiquitous in biological systems. Results: We developed N-of-1-pathways MixEnrich, a mixture model followed by a gene set enrichment test, to uncover bidirectional and concordantly dysregulated pathways one patient at a time. We assess its accuracy in a comprehensive simulation study and in an RNA-Seq data analysis of head and neck squamous cell carcinomas (HNSCCs). In the presence of bidirectionally dysregulated genes in the pathway or of high background noise, MixEnrich substantially outperforms previous single-subject transcriptome analysis methods, both in the simulation study and the HNSCCs data analysis (ROC curves; higher true positive rates; lower false positive rates). Bidirectional and concordant dysregulated pathways uncovered by MixEnrich in each patient largely overlapped with the quasi-gold standard compared to other single-subject and cohort-based transcriptome analyses. Conclusion: The greater performance of MixEnrich presents an advantage over previous methods to meet the promise of providing accurate personal transcriptome analysis to support precision medicine at point of care.

Bioinformatics | 2016

Analysis of aggregated cell-cell statistical distances within pathways unveils therapeutic-resistance mechanisms in circulating tumor cells

A. Grant Schissler; Qike Li; James L. Chen; Colleen Kenost; Ikbel Achour; Dean Billheimer; Haiquan Li; Walter W. Piegorsch; Yves A. Lussier

Motivation: As ‘omics’ biotechnologies accelerate the capability to contrast a myriad of molecular measurements from a single cell, they also exacerbate current analytical limitations for detecting meaningful single-cell dysregulations. Moreover, mRNA expression alone lacks functional interpretation, limiting opportunities for translation of single-cell transcriptomic insights to precision medicine. Lastly, most single-cell RNA-sequencing analytic approaches are not designed to investigate small populations of cells such as circulating tumor cells shed from solid tumors and isolated from patient blood samples. Results: In response to these characteristics and limitations in current single-cell RNA-sequencing methodology, we introduce an analytic framework that models transcriptome dynamics through the analysis of aggregated cell–cell statistical distances within biomolecular pathways. Cell–cell statistical distances are calculated from pathway mRNA fold changes between two cells. Within an elaborate case study of circulating tumor cells derived from prostate cancer patients, we develop analytic methods of aggregated distances to identify five differentially expressed pathways associated to therapeutic resistance. Our aggregation analyses perform comparably with Gene Set Enrichment Analysis and better than differentially expressed genes followed by gene set enrichment. However, these methods were not designed to inform on differential pathway expression for a single cell. As such, our framework culminates with the novel aggregation method, cell-centric statistics (CCS). CCS quantifies the effect size and significance of differentially expressed pathways for a single cell of interest. Improved rose plots of differentially expressed pathways in each cell highlight the utility of CCS for therapeutic decision-making. 
Availability and implementation: http://www.lussierlab.org/publications/CCS/ Supplementary information: Supplementary data are available at Bioinformatics online.

Journal of Biomedical Informatics | 2015

eQTL networks unveil enriched mRNA master integrators downstream of complex disease-associated SNPs

Haiquan Li; Nima Pouladi; Ikbel Achour; Vincent Gardeux; Jianrong Li; Qike Li; Hao Helen Zhang; Fernando D. Martinez; Joe G. N. Garcia; Yves A. Lussier

The causal and interplay mechanisms of Single Nucleotide Polymorphisms (SNPs) associated with complex diseases (complex disease SNPs) investigated in genome-wide association studies (GWAS) at the transcriptional level (mRNA) are poorly understood despite recent advancements such as discoveries reported in the Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx). Protein interaction network analyses have successfully improved our understanding of both single gene diseases (Mendelian diseases) and complex diseases. Whether the mRNAs downstream of complex disease genes are central or peripheral in the genetic information flow relating DNA to mRNA remains unclear and may be disease-specific. Using expression Quantitative Trait Loci (eQTL) that provide DNA to mRNA associations and network centrality metrics, we hypothesize that we can unveil the systems properties of information flow between SNPs and the transcriptomes of complex diseases. We compare different conditions such as naïve SNP assignments and stringent linkage disequilibrium (LD) free assignments for transcripts to remove confounders from LD. Additionally, we compare the results from eQTL networks between lymphoblastoid cell lines and liver tissue. Empirical permutation resampling (p<0.001) and theoretic Mann-Whitney U test (p<10^-30) statistics indicate that mRNAs corresponding to complex disease SNPs via eQTL associations are likely to be regulated by a larger number of SNPs than expected. We name this novel property mRNA hubness in eQTL networks, and further term mRNAs with high hubness as master integrators. mRNA master integrators receive and coordinate the perturbation signals from large numbers of polymorphisms and respond to the personal genetic architecture integratively.
This genetic signal integration contrasts with the mechanism underlying some Mendelian diseases, where a genetic polymorphism affecting a single protein hub produces a divergent signal that affects a large number of downstream proteins. Indeed, we verify that this property is independent of the hubness in protein networks for which these mRNAs are transcribed. Our findings provide novel insights into the pleiotropy of mRNAs targeted by complex disease polymorphisms and the architecture of the information flow between the genetic polymorphisms and transcriptomes of complex diseases.
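One simple reading of the hubness property described above is the in-degree of each mRNA in a bipartite SNP-to-mRNA eQTL network, that is, the number of distinct SNPs associated with it. The sketch below assumes that reading; the function name `mrna_hubness` and the identifiers in the toy data are hypothetical, and the paper's exact centrality metric and LD handling are not reproduced here.

```python
from collections import defaultdict

def mrna_hubness(eqtl_pairs):
    """Count distinct SNPs associated with each mRNA.

    `eqtl_pairs` is an iterable of (snp_id, mrna_id) associations.
    """
    snps_by_mrna = defaultdict(set)
    for snp, mrna in eqtl_pairs:
        snps_by_mrna[mrna].add(snp)  # a set ignores repeated associations
    return {mrna: len(snps) for mrna, snps in snps_by_mrna.items()}

# Hypothetical eQTL associations, including one repeated pair.
pairs = [("snp1", "mrnaA"), ("snp2", "mrnaA"), ("snp2", "mrnaB"), ("snp1", "mrnaA")]
print(mrna_hubness(pairs))
```

Under this reading, an mRNA with a high count would be a candidate master integrator; comparing the counts for disease-associated mRNAs against a permuted background would be the natural next step.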

Journal of the American Medical Informatics Association | 2017

A genome-by-environment interaction classifier for precision medicine: Personal transcriptome response to rhinovirus identifies children prone to asthma exacerbations

Vincent Gardeux; Joanne Berghout; Ikbel Achour; A. Grant Schissler; Qike Li; Colleen Kenost; Jianrong Li; Yuan Shang; Anthony Bosco; Donald Saner; Marilyn Halonen; Daniel J. Jackson; Haiquan Li; Fernando D. Martinez; Yves A. Lussier

Abstract Objective To introduce a disease prognosis framework enabled by a robust classification scheme derived from patient-specific transcriptomic response to stimulation. Materials and Methods Within an illustrative case study to predict asthma exacerbation, we designed a stimulation assay that reveals individualized transcriptomic response to human rhinovirus. Gene expression from peripheral blood mononuclear cells was quantified from 23 pediatric asthmatic patients and stimulated in vitro with human rhinovirus. Responses were obtained via the single-subject gene set testing methodology “N-of-1-pathways.” The classifier was trained on a related independent training dataset (n = 19). Novel visualizations of personal transcriptomic responses are provided. Results Of the 23 pediatric asthmatic patients, 12 experienced recurrent exacerbations. Our classifier, using individualized responses and trained on an independent dataset, obtained 74% accuracy (area under the receiver operating curve of 71%; 2-sided P = .039). Conventional classifiers using messenger RNA (mRNA) expression within the viral-exposed samples were unsuccessful (all patients predicted to have recurrent exacerbations; accuracy of 52%). Discussion Prognosis based on single time point, static mRNA expression alone neglects the importance of dynamic genome-by-environment interplay in phenotypic presentation. Individualized transcriptomic response quantified at the pathway (gene sets) level reveals interpretable signals related to clinical outcomes. Conclusion The proposed framework provides an innovative approach to precision medicine. We show that quantifying personal pathway–level transcriptomic response to a disease-relevant environmental challenge predicts disease progression. 
This genome-by-environment interaction assay offers a noninvasive opportunity to translate omics data to clinical practice by improving the ability to predict disease exacerbation and increasing the potential to produce more effective treatment decisions.
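The accuracy figures in the abstract above can be checked with a short calculation. The sketch below uses only the counts stated there (23 patients, 12 with recurrent exacerbations); the `accuracy` helper is a hypothetical name, and the number of correct calls behind the 74% figure is inferred from rounding, not reported in the abstract.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Cohort from the abstract: 23 patients, 12 with recurrent exacerbations (label 1).
y_true = [1] * 12 + [0] * 11

# A classifier that predicts "recurrent" for everyone is right 12 times out of 23.
all_recurrent = [1] * len(y_true)
baseline = accuracy(y_true, all_recurrent)
print(f"{baseline:.3f}")  # prints 0.522

# Which counts of correct calls out of 23 round to the reported 74%?
consistent = [k for k in range(len(y_true) + 1) if round(100 * k / len(y_true)) == 74]
print(consistent)  # prints [17]
```

The 52% baseline is therefore just the prevalence of recurrent exacerbations in the cohort, which is why a classifier with 74% accuracy is a meaningful improvement over it.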

bioRxiv | 2018

iDEG: A single-subject method for assessing gene differential expression from two transcriptomes of an individual

Qike Li; Samir Rachid Zaim; Dillon Aberasturi; Joanne Berghout; Haiquan Li; Francesca Vitali; Colleen Kenost; Helen Hao Zhang; Yves A. Lussier

Abstract Background Accurate profiling of gene expression in a single subject has the potential to be a powerful precision medicine tool, useful for unveiling individual disease mechanisms and responses. However, expression analysis tools for RNA-sequencing (RNA-Seq) data require replicate samples to estimate gene-wise data variability and make inferences, which is costly and not easily obtainable in clinical practice. Strategies to implement DEGSeq, DESeq, and edgeR for comparing two conditions without replicates (TCWR) have been proposed without evaluation, while NOISeq-sim was validated in a restricted way using qPCR on 400 transcripts. These methods impose restrictive assumptions in TCWR limiting inferential opportunities. Methods We propose a new method that borrows information across different genes from the same individual using a partitioned window to strategically bypass the requirement of replicates per condition. We termed this method “iDEG”, which identifies individualized Differentially Expressed Genes in a single subject sampled under two conditions without replicates, i.e., a baseline sample (unaffected tissue) vs. a case sample (tumor). iDEG transforms RNA-Seq data such that, under the null hypothesis, differences of transformed expression counts follow a distribution and variance calculated across a local partition of related transcripts at baseline expression. This transformation enables modeling genes with a two-group mixture model from which the probability of differential expression for each gene is then estimated by an empirical Bayes approach with a local false discovery rate control. To compare the performance of iDEG to other methods applied to TCWR, we conducted simulations assuming a Negative Binomial distribution with varying dispersion parameters and percentages of differentially expressed genes (DEGs). 
Results Our extensive simulation studies demonstrate that iDEG’s F1 accuracy scores are better than those of the other methods at 5%<DEGs<20% (precision>90%, recall>75% and low false positive rate (<1%)). Conclusion The partitioned window strategy provides a novel and accurate way to borrow information across genes locally and would probably increase the accuracy of all relevant methods.
Accurate profiling of gene expression in a single subject has the potential to be a powerful precision medicine tool, useful for unveiling individual disease mechanisms and responses. However, most expression analysis tools for RNA-sequencing (RNA-Seq) data require replicate samples to estimate gene-wise data variability and make inferences, which is costly and not easily obtainable in clinical practice. We propose the iDEG method to identify individualized Differentially Expressed Genes in a single subject sampled under two conditions without replicates, i.e. a baseline sample (unaffected tissue) vs. a case sample (tumor). iDEG borrows information across different genes from the same individual using a partitioned window to strategically bypass the requirement of replicates per condition. It then transforms RNA-Seq data such that, under the null hypothesis, differences of transformed expression counts follow a distribution and variance calculated across a local partition of related transcripts at baseline expression. This transformation enables modeling genes with a two-group mixture model from which the probability of differential expression for each gene is then estimated by an empirical Bayes approach with a local false discovery rate control. Our extensive simulation studies demonstrate that iDEG was the only technique that maintained high precision (>90%), high recall (>75%) and a low false positive rate (<1%) across thousands of scenarios when compared to DESeq, edgeR, and DEGseq.
Software available at: http://www.lussiergroup.org/publications/iDEG
Calculating Differentially Expressed Genes (DEGs) from RNA-sequencing requires replicates to estimate gene-wise variability, which is infeasible in clinics. By imposing restrictive transcriptome-wide assumptions limiting inferential opportunities of conventional methods (edgeR, NOISeq-sim, DESeq, DEGseq), comparing two conditions without replicates (TCWR) has been proposed, but not evaluated. Under TCWR conditions (e.g., unaffected tissue vs. tumor), differences of transformed expression of the proposed individualized DEG (iDEG) method follow a distribution calculated across a local partition of related transcripts at baseline expression; thereafter the probability of each DEG is estimated by empirical Bayes with local false discovery rate control using a two-group mixture model. In extensive simulation studies of TCWR methods, iDEG and NOISeq are more accurate at 5%<DEGs<20% (precision>90%, recall>75%, false_positive_rate<1%) and 30%<DEGs<40% (precision=recall~90%), respectively. The proposed iDEG method borrows localized distribution information from the same individual, a strategy that improves accuracy when comparing transcriptomes in the absence of replicates at low DEG conditions.

Proceedings of the Pacific Symposium | 2018

Single subject transcriptome analysis to identify functionally signed gene set or pathway activity

Joanne Berghout; Qike Li; Nima Pouladi; Jianrong Li; Yves A. Lussier

Analysis of single-subject transcriptome response data is an unmet need of precision medicine, made challenging by the high dimension, dynamic nature and difficulty in extracting meaningful signals from biological or stochastic noise. We have proposed a method for single-subject analysis that uses a mixture model for transcript fold-change clustering from isogenically paired samples, followed by integration of these distributions with Gene Ontology Biological Processes (GO-BP) to reduce dimension and identify functional attributes. We then extended these methods to develop functional signing metrics for gene set process regulation by incorporating biological repressor relationships encoded in GO-BP as negatively_regulates edges. Results revealed reproducible and biologically meaningful signals from analysis of a single subject’s response, opening the door to future transcriptomic studies where subject and resource availability are currently limiting. We used inbred mouse strains fed different diets to provide isogenic biological replicates, permitting rigorous validation of our method. We compared significant genotype-specific GO-BP term results for overlap and rank order across three replicate pairs per genotype, and across methods against reference standards (limma+FET, SAM+FET, and GSEA). All single-subject analytics findings were robust and highly reproducible (median area under the ROC curve=0.96, n=24 genotypes × 3 replicates), providing confidence and validation of this approach for analyses in single subjects. R code is available online at http://www.lussiergroup.org/publications/PathwayActivity
