Travers Ching
University of Hawaii
Publication
Featured research published by Travers Ching.
RNA | 2014
Travers Ching; Sijia Huang; Lana X. Garmire
It is crucial for researchers to optimize RNA-Seq experimental designs for differential expression detection. Currently, the field lacks general methods to estimate power and sample size for RNA-Seq in complex experimental designs under the assumption of the negative binomial distribution. We simulate RNA-Seq count data based on parameters estimated from six widely different public data sets (including cell line comparison, tissue comparison, and cancer data sets) and calculate the statistical power in paired and unpaired sample experiments. We comprehensively compare five differential expression analysis packages (DESeq, edgeR, DESeq2, sSeq, and EBSeq) and evaluate their performance by power, receiver operating characteristic (ROC) curves, and other metrics including areas under the curve (AUC), Matthews correlation coefficient (MCC), and F-measures. DESeq2 and edgeR tend to give the best performance in general. Increasing sample size or sequencing depth increases power; however, increasing sample size is more effective than increasing sequencing depth, especially once the sequencing depth reaches 20 million reads. Long intergenic noncoding RNAs (lincRNAs) yield lower power than protein-coding mRNAs, given their lower expression level in the same RNA-Seq experiment. On the other hand, paired-sample RNA-Seq significantly enhances the statistical power, confirming the importance of considering the multifactor experimental design. Finally, a local optimal power is achievable for a given budget constraint, and the dominant contributing factor is sample size rather than sequencing depth. In conclusion, we provide a power analysis tool (http://www2.hawaii.edu/~lgarmire/RNASeqPowerCalculator.htm) that captures the dispersion in the data and can serve as a practical reference under the budget constraint of RNA-Seq experiments.
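The simulation idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' power calculator: it assumes a single gene, draws negative binomial counts as a gamma-Poisson mixture, and swaps in a simple Welch-style z-test on log2 counts where the paper uses packages such as DESeq2 and edgeR. The function names and all parameter values (`mu`, `dispersion`, `fold_change`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def sample_nb(rng, mu, dispersion):
    """Draw one negative binomial count (mean mu, variance mu + dispersion * mu**2)."""
    # Gamma-Poisson mixture: the Poisson rate is gamma distributed with mean mu.
    rate = rng.gammavariate(1.0 / dispersion, mu * dispersion)
    # Knuth's Poisson sampler; adequate for the modest rates used in this sketch.
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def estimate_power(n_per_group, mu, fold_change, dispersion,
                   alpha=0.05, n_sim=500, seed=0):
    """Fraction of simulated two-group experiments in which the gene is called DE."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sim):
        a = [math.log2(sample_nb(rng, mu, dispersion) + 1)
             for _ in range(n_per_group)]
        b = [math.log2(sample_nb(rng, mu * fold_change, dispersion) + 1)
             for _ in range(n_per_group)]
        # Assumption: normal approximation to Welch's test (needs n_per_group >= 2).
        se = math.sqrt(stdev(a) ** 2 / n_per_group + stdev(b) ** 2 / n_per_group)
        if se > 0 and abs(mean(a) - mean(b)) / se > z_crit:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, estimate_power(n, mu=50, fold_change=1.5, dispersion=0.1))
```

Running the loop at the bottom shows how the estimated power changes with the number of replicates per group for one illustrative gene; a realistic analysis would estimate `mu` and `dispersion` from pilot data and use a dedicated differential expression method.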
Frontiers in Genetics | 2016
Olivier B. Poirion; Xun Zhu; Travers Ching; Lana X. Garmire
The emerging single-cell RNA-Seq (scRNA-Seq) technology promises to revolutionize our understanding of diseases and associated biological processes at an unprecedented resolution. It opens the door to revealing intercellular heterogeneity and has been used in a variety of applications, ranging from characterizing cancer cell subpopulations to elucidating tumor resistance mechanisms. Parallel to improving experimental protocols to deal with technological issues, deriving new analytical methods to interpret the complexity in scRNA-Seq data is just as challenging. Here, we review current state-of-the-art bioinformatics tools and methods for scRNA-Seq analysis and address some critical analytical challenges that the field faces.
PLOS Computational Biology | 2014
Sijia Huang; Cameron Yee; Travers Ching; Herbert Yu; Lana X. Garmire
Breast cancer is the most common malignancy in women worldwide. With the increasing awareness of heterogeneity in breast cancers, better prediction of breast cancer prognosis is much needed for more personalized treatment and disease management. Towards this goal, we have developed a novel computational model for breast cancer prognosis by combining the Pathway Deregulation Score (PDS) based pathifier algorithm, Cox regression and L1-LASSO penalization method. We trained the model on a set of 236 patients with gene expression data and clinical information, and validated the performance on three diversified testing data sets of 606 patients. To evaluate the performance of the model, we conducted survival analysis of the dichotomized groups, and compared the areas under the curve based on the binary classification. The resulting prognosis genomic model is composed of fifteen pathways (e.g. P53 pathway) that had previously reported cancer relevance, and it successfully differentiated relapse in the training set (log rank p-value = 6.25e-12) and three testing data sets (log rank p-value<0.0005). Moreover, the pathway-based genomic models consistently performed better than gene-based models on all four data sets. We also find strong evidence that combining genomic information with clinical information improved the p-values of prognosis prediction by at least three orders of magnitude in comparison to using either genomic or clinical information alone. In summary, we propose a novel prognosis model that harnesses the pathway-based dysregulation as well as valuable clinical information. The selected pathways in our prognosis model are promising targets for therapeutic intervention.
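The log-rank comparison of dichotomized groups mentioned in the abstract above can be sketched as follows. This is a generic two-group log-rank test written for illustration, not the authors' pipeline; the function name `logrank_test` and the toy follow-up times are hypothetical, and the groups are assumed to have been split already (for example, by a risk score cutoff).

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events_* are 1 for an observed event (e.g. relapse) and 0 for censoring.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [g for (u, _, g) in samples if u >= t]
        n = len(at_risk)
        n_a = at_risk.count(0)
        events_now = [g for (u, e, g) in samples if u == t and e]
        d = len(events_now)
        observed_a += events_now.count(0)
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    high_risk = [1, 2, 3, 4, 5]      # toy relapse times, all events observed
    low_risk = [6, 7, 8, 9, 10]
    print(logrank_test(high_risk, [1] * 5, low_risk, [1] * 5))
```

A small p-value indicates that the two groups have different survival curves, which is how the separation of the predicted risk groups is judged in the abstract.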
Molecular Human Reproduction | 2014
Travers Ching; Min-Ae Song; Maarit Tiirikainen; Janos Molnar; Marla J. Berry; Dena Towner; Lana X. Garmire
Pre-eclampsia is the leading cause of fetal and maternal morbidity and mortality. Early onset pre-eclampsia (EOPE) is a disorder that has severe maternal and fetal outcomes, yet its etiology is poorly understood. We hypothesized that epigenetics plays an important role in mediating the development of EOPE and conducted a case-control study to compare the genome-wide methylome difference between chorioamniotic membranes from 30 EOPE and 17 full-term pregnancies using the Infinium Human Methylation 450 BeadChip arrays. Bioinformatics analysis tested differential methylation (DM) at the CpG site level, the gene level, and the pathway and network level. A striking genome-wide hypermethylation pattern coupled with hypomethylation in promoters was observed. Out of 385,184 CpG sites, 9995 showed DM (2.6%). Of those DM sites, 91.9% showed hypermethylation (9186 of 9995). Over 900 genes had DM associated with promoters. Promoter-based DM analysis revealed that genes in canonical cancer-related pathways such as Rac, Ras, PI3K/Akt, NFκB and ErBB4 were enriched, and represented biological functional alterations that involve cell cycle, apoptosis, cancer signaling and inflammation. A group of genes previously found to be up-regulated in pre-eclampsia, including GRB2, ATF3, NFKB2, as well as genes in proteasome subunits (PSMA1, PSME1, PSMD1 and PSMD8), harbored hypomethylated promoters. In contrast, a cluster of microRNAs, including mir-519a1, mir-301a, mir-487a, mir-185, mir-329, mir-194, mir-376a1, mir-486 and mir-744, were all hypermethylated in their promoters in the EOPE samples. These findings collectively reveal new avenues of research regarding the vast epigenetic modifications in EOPE.
Clinical Epigenetics | 2015
Travers Ching; James Ha; Min-Ae Song; Maarit Tiirikainen; Janos Molnar; Marla J. Berry; Dena Towner; Lana X. Garmire
Background: Preeclampsia is one of the leading causes of fetal and maternal morbidity and mortality worldwide. Preterm babies of mothers with early onset preeclampsia (EOPE) are at higher risks for various diseases later on in life, including cardiovascular diseases. We hypothesized that genome-wide epigenetic alterations occur in cord blood DNAs in association with EOPE and conducted a case-control study to compare the genome-scale methylome differences in cord blood DNAs between 12 EOPE-associated and 8 normal births. Results: Bioinformatics analysis of methylation data from the Infinium HumanMethylation450 BeadChip shows a genome-scale hypomethylation pattern in EOPE, with 51,486 hypomethylated CpG sites and 12,563 hypermethylated sites (adjusted P < 0.05). A similar trend also exists in the proximal promoters (TSS200) associated with protein-coding genes. Using summary statistics on the CpG sites in TSS200 regions, promoters of 643 and 389 genes are hypomethylated and hypermethylated, respectively. Promoter-based differential methylation (DM) analysis reveals that genes in the farnesoid X receptor and liver X receptor (FXR/LXR) pathway are enriched, indicating dysfunction of lipid metabolism in cord blood cells. Additional biological functional alterations involve inflammation, cell growth, and hematological system development. A two-way ANOVA analysis among coupled cord blood and amniotic membrane samples shows that a group of genes involved in inflammation, lipid metabolism, and proliferation are persistently differentially methylated in both tissues, including IL12B, FAS, PIK31, and IGF1. Conclusions: These findings provide, for the first time, evidence of prominent genome-scale DNA methylation modifications in cord blood DNAs associated with EOPE. They may suggest a connection between inflammation and lipid dysregulation in EOPE-associated newborns and a higher risk of cardiovascular diseases later in adulthood.
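The abstract above reports CpG sites at an adjusted P below 0.05 but does not say which multiple-testing correction was applied. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure, a common choice for array-scale methylation data; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # toy per-site p-values
    adjusted = benjamini_hochberg(raw)
    significant = [i for i, q in enumerate(adjusted) if q < 0.05]
    print(adjusted, significant)
```

Sites whose adjusted value falls below the chosen threshold would then be counted as hypo- or hypermethylated according to the sign of their methylation difference.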
BMC Bioinformatics | 2015
Jeffery Li; Travers Ching; Sijia Huang; Lana X. Garmire
Background: Epigenetic alterations are known to correlate with changes in gene expression among various diseases including cancers. However, quantitative models that accurately predict the up or down regulation of gene expression are currently lacking. Methods: A new machine learning-based method of gene expression prediction is developed in the context of lung cancer. This method uses the Illumina Infinium HumanMethylation450K Beadchip CpG methylation array data from paired lung cancer and adjacent normal tissues in The Cancer Genome Atlas (TCGA) and histone modification marker ChIP-Seq data from the ENCODE project to predict the differential expression of RNA-Seq data in TCGA lung cancers. It considers a comprehensive list of 1424 features spanning the four categories of CpG methylation, histone H3 methylation modification, nucleotide composition, and conservation. Various feature selection and classification methods are compared to select the best model over 10-fold cross-validation in the training data set. Results: A best model comprising 67 features is chosen by ReliefF-based feature selection and the random forest classification method, with AUC = 0.864 from the 10-fold cross-validation of the training set and AUC = 0.836 from the testing set. The selected features cover all four data types, with histone H3 methylation modification (32 features) and CpG methylation (15 features) being most abundant. Among the dropping-off tests of individual data-type based features, removal of the CpG methylation features leads to the largest reduction in model performance. In the best model, 19 selected features are from the promoter regions (TSS200 and TSS1500), the highest among all locations relative to transcripts. Sequential dropping-off of CpG methylation features relative to different regions on the protein-coding transcripts shows that promoter regions contribute most significantly to the accurate prediction of gene expression. Conclusions: By considering a comprehensive list of epigenomic and genomic features, we have constructed an accurate model to predict transcriptomic differential expression, exemplified in lung cancer.
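The AUC values quoted in the abstract above summarize how well a classifier ranks positives above negatives. A minimal sketch of that metric follows, using the pairwise (Mann-Whitney) definition rather than any particular library; the function name and the toy scores are hypothetical and unrelated to the paper's data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve: P(score of a positive > score of a negative).

    Ties between a positive and a negative count as half a correct ranking.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy predicted probabilities of up-regulation for four genes.
    print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # prints 0.75
```

The quadratic loop is fine for illustration; for thousands of genes a rank-based formulation or an established library routine would be the more practical choice.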
Genome Biology | 2014
Mark Menor; Travers Ching; Xun Zhu; David Garmire; Lana X. Garmire
MiRNAs play important roles in many diseases including cancers. However, computational prediction of miRNA target genes is challenging, and the accuracies of existing methods remain poor. We report mirMark, a new machine learning-based method of miRNA target prediction at the site and UTR levels. This method uses experimentally verified miRNA targets from miRecords and mirTarBase as training sets and considers over 700 features. By combining Correlation-based Feature Selection with a variety of statistical or machine learning methods for the site- and UTR-level classifiers, mirMark significantly improves the overall predictive performance compared to existing publicly available methods. MirMark is available from https://github.com/lanagarmire/MirMark.
PeerJ | 2017
Xun Zhu; Travers Ching; Xinghua Pan; Sherman M. Weissman; Lana X. Garmire
Single-cell RNA-Sequencing (scRNA-Seq) is a fast-evolving technology that enables the understanding of biological processes at an unprecedentedly high resolution. However, well-suited bioinformatics tools to analyze the data generated from this new technology are still lacking. Here we investigate the performance of the non-negative matrix factorization (NMF) method in analyzing a wide variety of scRNA-Seq datasets, ranging from mouse hematopoietic stem cells to human glioblastoma data. In comparison to other unsupervised clustering methods including K-means and hierarchical clustering, NMF has higher accuracy in separating similar groups in various datasets. We ranked genes by their importance scores (D-scores) in separating these groups, and discovered that NMF uniquely identifies genes expressed at intermediate levels as top-ranked genes. Finally, we show that in conjunction with the modularity detection method FEM, NMF reveals meaningful protein-protein interaction modules. In summary, we propose that NMF is a desirable method to analyze heterogeneous single-cell RNA-Seq data. The NMF-based subpopulation detection package is available at: https://github.com/lanagarmire/NMFEM.
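To make the NMF clustering idea in the abstract above concrete, here is a minimal pure-Python sketch. It is not the NMFEM package: it assumes the classic Lee-Seung multiplicative updates for the squared-error objective, a genes-by-cells input matrix, and cluster assignment by the largest coefficient in each cell's column of H. All function names and the toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x cells) as W (genes x rank) @ H (rank x cells)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_cols)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n_rows)]
    return w, h


def assign_clusters(h):
    """Label each cell (column of H) by the component with the largest coefficient."""
    return [max(range(len(h)), key=lambda a: h[a][j]) for j in range(len(h[0]))]


if __name__ == "__main__":
    # Toy expression matrix: 4 genes x 6 cells with two obvious cell groups.
    v = [[5, 5, 5, 0, 0, 0],
         [4, 4, 4, 0, 0, 0],
         [0, 0, 0, 3, 3, 3],
         [0, 0, 0, 6, 6, 6]]
    w, h = nmf(v, rank=2)
    print(assign_clusters(h), frobenius_error(v, w, h))
```

In this toy setting the columns of W act as metagenes and H gives each cell's weight on them; real scRNA-Seq matrices would need normalization, a principled choice of rank, and a numerical library for speed.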
Biodata Mining | 2015
Travers Ching; Jayson Masaki; Jason L. Weirather; Lana X. Garmire
Long intergenic non-coding RNAs (lincRNAs) represent one of the most mysterious RNA species encoded by the human genome. Thanks to next generation sequencing (NGS) technology and its applications, we have recently witnessed a surge in non-coding RNA research, including lincRNA research. Here, we summarize the recent advancement in genomics studies of lincRNAs. We review the emerging characteristics of lincRNAs, the experimental and computational approaches to identify lincRNAs, their known mechanisms of regulation, the computational methods and resources for lincRNA functional predictions, and discuss the challenges to understanding lincRNA comprehensively.
EBioMedicine | 2016
Travers Ching; Karolina Peplowska; Sijia Huang; Xun Zhu; Yi Shen; Janos Molnar; Herbert Yu; Maarit Tiirikainen; Ben Fogelgren; Rong Fan; Lana X. Garmire
Long intergenic noncoding RNAs (lincRNAs) are a relatively new class of non-coding RNAs that have potential as cancer biomarkers. To seek a panel of lincRNAs as pan-cancer biomarkers, we have analyzed transcriptomes from over 3300 cancer samples with clinical information. Compared to mRNAs, lincRNAs exhibit significantly higher tissue specificities that are then diminished in cancer tissues. Moreover, lincRNA clustering results accurately classify tumor subtypes. Using RNA-Seq data from thousands of paired tumor and adjacent normal samples in The Cancer Genome Atlas (TCGA), we identify six lincRNAs as potential pan-cancer diagnostic biomarkers (PCAN-1 to PCAN-6). These lincRNAs are robustly validated using cancer samples from four independent RNA-Seq data sets, and are verified by qPCR in both primary breast cancers and the MCF-7 cell line. Interestingly, the expression levels of these six lincRNAs are also associated with prognosis in various cancers. We further experimentally explored the growth and migration dependence of breast and colon cancer cell lines on two of the identified lincRNAs. In summary, our study highlights the emerging role of lincRNAs as potentially powerful and biologically functional pan-cancer biomarkers and represents a significant leap forward in understanding the biological and clinical functions of lincRNAs in cancers.