Sijia Huang
University of Hawaii
Publication
Featured research published by Sijia Huang.
RNA | 2014
Travers Ching; Sijia Huang; Lana X. Garmire
It is crucial for researchers to optimize RNA-seq experimental designs for differential expression detection. Currently, the field lacks general methods to estimate power and sample size for RNA-Seq in complex experimental designs, under the assumption of the negative binomial distribution. We simulate RNA-Seq count data based on parameters estimated from six widely different public data sets (including cell line comparison, tissue comparison, and cancer data sets) and calculate the statistical power in paired and unpaired sample experiments. We comprehensively compare five differential expression analysis packages (DESeq, edgeR, DESeq2, sSeq, and EBSeq) and evaluate their performance by power, receiver operating characteristic (ROC) curves, and other metrics including areas under the curve (AUC), Matthews correlation coefficient (MCC), and F-measures. DESeq2 and edgeR tend to give the best performance in general. Increasing sample size or sequencing depth increases power; however, increasing sample size is more potent than sequencing depth to increase power, especially when the sequencing depth reaches 20 million reads. Long intergenic noncoding RNAs (lincRNAs) yield lower power relative to the protein coding mRNAs, given their lower expression level in the same RNA-Seq experiment. On the other hand, paired-sample RNA-Seq significantly enhances the statistical power, confirming the importance of considering the multifactor experimental design. Finally, a local optimal power is achievable for a given budget constraint, and the dominant contributing factor is sample size rather than the sequencing depth. In conclusion, we provide a power analysis tool (http://www2.hawaii.edu/~lgarmire/RNASeqPowerCalculator.htm) that captures the dispersion in the data and can serve as a practical reference under the budget constraint of RNA-Seq experiments.
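As an illustration of two of the metrics named in this abstract, the following is a minimal sketch of how the Matthews correlation coefficient and the F-measure can be computed from confusion counts (genes correctly or incorrectly called differentially expressed). It is not taken from the paper or its tool; the function names and the choice to return 0.0 when a denominator is zero are assumptions made for this example.

```python
import math


def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from confusion counts; beta=1 gives the usual F1.

    Returns 0.0 when the denominator is zero (an assumed convention).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion counts.

    Returns 0.0 when any marginal total is zero (an assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC uses all four cells of the confusion matrix, which makes it less sensitive than the F-measure to the imbalance between the few truly differentially expressed genes and the many unchanged ones.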
Frontiers in Genetics | 2017
Sijia Huang; Kumardeep Chaudhary; Lana X. Garmire
Multi-omics data integration is one of the major challenges in the era of precision medicine. Considerable work has been done with the advent of high-throughput studies, which have enabled data access for downstream analyses. To improve clinical outcome prediction, a gamut of software tools has been developed. This review outlines the progress made in the field of multi-omics integration and the comprehensive tools developed so far in this field. Further, we discuss the integration methods to predict patient survival at the end of the review.
PLOS Computational Biology | 2014
Sijia Huang; Cameron Yee; Travers Ching; Herbert Yu; Lana X. Garmire
Breast cancer is the most common malignancy in women worldwide. With the increasing awareness of heterogeneity in breast cancers, better prediction of breast cancer prognosis is much needed for more personalized treatment and disease management. Towards this goal, we have developed a novel computational model for breast cancer prognosis by combining the Pathway Deregulation Score (PDS) based pathifier algorithm, Cox regression and L1-LASSO penalization method. We trained the model on a set of 236 patients with gene expression data and clinical information, and validated the performance on three diversified testing data sets of 606 patients. To evaluate the performance of the model, we conducted survival analysis of the dichotomized groups, and compared the areas under the curve based on the binary classification. The resulting prognosis genomic model is composed of fifteen pathways (e.g. P53 pathway) that had previously reported cancer relevance, and it successfully differentiated relapse in the training set (log rank p-value = 6.25e-12) and three testing data sets (log rank p-value<0.0005). Moreover, the pathway-based genomic models consistently performed better than gene-based models on all four data sets. We also find strong evidence that combining genomic information with clinical information improved the p-values of prognosis prediction by at least three orders of magnitude in comparison to using either genomic or clinical information alone. In summary, we propose a novel prognosis model that harnesses the pathway-based dysregulation as well as valuable clinical information. The selected pathways in our prognosis model are promising targets for therapeutic intervention.
BMC Bioinformatics | 2015
Jeffery Li; Travers Ching; Sijia Huang; Lana X. Garmire
Background: Epigenetic alterations are known to correlate with changes in gene expression among various diseases including cancers. However, quantitative models that accurately predict the up or down regulation of gene expression are currently lacking. Methods: A new machine learning-based method of gene expression prediction is developed in the context of lung cancer. This method uses the Illumina Infinium HumanMethylation450K BeadChip CpG methylation array data from paired lung cancer and adjacent normal tissues in The Cancer Genome Atlas (TCGA) and histone modification marker ChIP-Seq data from the ENCODE project, to predict the differential expression of RNA-Seq data in TCGA lung cancers. It considers a comprehensive list of 1424 features spanning the four categories of CpG methylation, histone H3 methylation modification, nucleotide composition, and conservation. Various feature selection and classification methods are compared to select the best model over 10-fold cross-validation in the training data set. Results: A best model comprising 67 features is chosen by ReliefF based feature selection and random forest classification method, with AUC = 0.864 from the 10-fold cross-validation of the training set and AUC = 0.836 from the testing set. The selected features cover all four data types, with histone H3 methylation modification (32 features) and CpG methylation (15 features) being most abundant. Among the dropping-off tests of individual data-type based features, removal of CpG methylation features leads to the largest reduction in model performance. In the best model, 19 selected features are from the promoter regions (TSS200 and TSS1500), highest among all locations relative to transcripts. Sequential dropping-off of CpG methylation features relative to different regions on the protein coding transcripts shows that promoter regions contribute most significantly to the accurate prediction of gene expression. Conclusions: By considering a comprehensive list of epigenomic and genomic features, we have constructed an accurate model to predict transcriptomic differential expression, exemplified in lung cancer.
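The 10-fold cross-validation mentioned in this abstract can be sketched as follows. This is only an illustrative index splitter, not the authors' pipeline; the function name `k_fold_indices`, the fixed seed, and the round-robin assignment of shuffled samples to folds are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are shuffled once with a fixed seed, then dealt round-robin
    into k folds; each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

In a setting like the one described, feature selection and classifier fitting would be run on each `train` split and scored on the matching `test` split, and the per-fold scores averaged to compare candidate models.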
EBioMedicine | 2016
Travers Ching; Karolina Peplowska; Sijia Huang; Xun Zhu; Yi Shen; Janos Molnar; Herbert Yu; Maarit Tiirikainen; Ben Fogelgren; Rong Fan; Lana X. Garmire
Long intergenic noncoding RNAs (lincRNAs) are a relatively new class of non-coding RNAs that have potential as cancer biomarkers. To seek a panel of lincRNAs as pan-cancer biomarkers, we have analyzed transcriptomes from over 3300 cancer samples with clinical information. Compared to mRNA, lincRNAs exhibit significantly higher tissue specificities that are then diminished in cancer tissues. Moreover, lincRNA clustering results accurately classify tumor subtypes. Using RNA-Seq data from thousands of paired tumor and adjacent normal samples in The Cancer Genome Atlas (TCGA), we identify six lincRNAs as potential pan-cancer diagnostic biomarkers (PCAN-1 to PCAN-6). These lincRNAs are robustly validated using cancer samples from four independent RNA-Seq data sets, and are verified by qPCR in both primary breast cancers and the MCF-7 cell line. Interestingly, the expression levels of these six lincRNAs are also associated with prognosis in various cancers. We further experimentally explored the growth and migration dependence of breast and colon cancer cell lines on two of the identified lncRNAs. In summary, our study highlights the emerging role of lincRNAs as potentially powerful and biologically functional pan-cancer biomarkers and represents a significant leap forward in understanding the biological and clinical functions of lincRNAs in cancers.
Clinical and translational medicine | 2017
Michael A. Ortega; Olivier B. Poirion; Xun Zhu; Sijia Huang; Thomas K. Wolfgruber; Robert Sebra; Lana X. Garmire
It has become increasingly clear that both normal and cancer tissues are composed of heterogeneous populations. Genetic variation can be attributed to the downstream effects of inherited mutations, environmental factors, or inaccurately resolved errors in transcription and replication. When lesions occur in regions that confer a proliferative advantage, it can support clonal expansion, subclonal variation, and neoplastic progression. In this manner, the complex heterogeneous microenvironment of a tumour promotes the likelihood of angiogenesis and metastasis. Recent advances in next-generation sequencing and computational biology have utilized single-cell applications to build deep profiles of individual cells that are otherwise masked in bulk profiling. In addition, the development of new techniques for combining single-cell multi-omic strategies is providing a more precise understanding of factors contributing to cellular identity, function, and growth. Continuing advancements in single-cell technology and computational deconvolution of data will be critical for reconstructing patient specific intra-tumour features and developing more personalized cancer treatments.
Scientific Reports | 2016
Liangqun Lu; Sara McCurdy; Sijia Huang; Xun Zhu; Karolina Peplowska; Maarit Tiirikainen; William A. Boisvert; Lana X. Garmire
Polarization of macrophages is regulated through complex signaling networks. Correlating miRNA and mRNA expression over time after macrophage polarization has not yet been investigated. We used paired RNA-Seq and miRNA-Seq experiments to measure the mRNA and miRNA expression in bone marrow-derived macrophages over a time-series of 8 hours. Bioinformatics analysis identified 31 differentially expressed miRNAs between M1 and M2 polarized macrophages. The top 4 M1 miRNAs (miR-155-3p, miR-155-5p, miR-147-3p and miR-9-5p) and top 4 M2 miRNAs (miR-27a-5p, let-7c-1-3p, miR-23a-5p and miR-23b-5p) were validated by qPCR. Interestingly, M1 specific miRNAs could be categorized to early- and late-response groups, in which three new miRNAs miR-1931, miR-3473e and miR-5128 were validated as early-response miRNAs. M1 polarization led to the enrichment of genes involved in immune responses and signal transduction, whereas M2 polarization enriched genes involved in cell cycle and metabolic processes. C2H2 zinc-finger family members are key targets of DE miRNAs. The integrative analysis between miRNAs and mRNAs demonstrates the regulations of miRNAs on nearly four thousand differentially expressed genes and most of the biological pathways enriched in macrophage polarization. In summary, this study elucidates the expression profiles of miRNAs and their potential targetomes during macrophage polarization.
Oncotarget | 2016
Runmin Wei; Immaculata De Vivo; Sijia Huang; Xun Zhu; Harvey A. Risch; Jason H. Moore; Herbert Yu; Lana X. Garmire
Endometrial Cancer (EC) is one of the most common female cancers. Genome-wide association studies (GWAS) have been conducted to identify genetic polymorphisms that are predictive of EC risks. Here we utilized a meta-dimensional integrative approach to seek genetically susceptible pathways that may be associated with tumorigenesis and progression of EC. We analyzed GWAS data obtained from the Connecticut Endometrial Cancer Study (CECS) and identified the top 20 EC susceptible pathways. To further verify the significance of the top 20 EC susceptible pathways, we conducted pathway-level multi-omics analyses using EC exome-Seq, RNA-Seq and survival data, all based on The Cancer Genome Atlas (TCGA) samples. We measured the overall consistent rankings of these pathways in all four data types. Some well-studied pathways, such as p53 signaling and cell cycle pathways, show consistently high rankings across different analyses. Additionally, other cell signaling pathways (e.g. IGF-1/mTOR, rac-1 and IL-5 pathway), genetic information processing pathway (e.g. homologous recombination) and metabolism pathway (e.g. sphingolipid metabolism) are also highly associated with EC risks, diagnosis and prognosis. In conclusion, the meta-dimensional integration of EC cohorts has suggested some common pathways that may be involved from predisposition through tumorigenesis to progression.
Cancer Epidemiology, Biomarkers & Prevention | 2016
Sijia Huang; Lei Kou; Hideki Furuya; Changhong Yu; Steve Goodison; Michael W. Kattan; Lana X. Garmire; Charles J. Rosser
Background: Improvements in the noninvasive clinical evaluation of patients at risk for bladder cancer would be of benefit both to individuals and to health care systems. We investigated the potential utility of a hybrid nomogram that combined key demographic features with the results of a multiplex urinary biomarker assay in hopes of identifying patients at risk of harboring bladder cancer. Methods: Logistic regression analysis was used to model the probability of bladder cancer burden in a cohort of 686 subjects (394 with bladder cancer) using key demographic features alone, biomarker data alone, and the combination of demographic features and key biomarker data. We examined discrimination, calibration, and decision curve analysis techniques to evaluate prediction model performance. Results: Area under the receiver operating characteristic curve (AUC) analyses revealed that demographic features alone predicted tumor burden with an accuracy of 0.806 [95% confidence interval (CI), 0.76–0.85], while biomarker data had an accuracy of 0.835 (95% CI, 0.80–0.87). The addition of molecular data into the nomogram improved the predictive performance to 0.891 (95% CI, 0.86–0.92). Decision curve analyses showed that the hybrid nomogram performed better than demographic or biomarker data alone. Conclusion: A nomogram construction strategy that combines key demographic features with biomarker data may facilitate the accurate, noninvasive evaluation of patients at risk of harboring bladder cancer. Further research is needed to evaluate the bladder cancer risk nomogram for potential clinical utility. Impact: The application of such a nomogram may better inform the decision to perform invasive diagnostic procedures. Cancer Epidemiol Biomarkers Prev; 25(9); 1361–6. ©2016 AACR.
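The AUC values compared in this abstract can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. A minimal sketch of that pairwise calculation follows; it is not the authors' code, and the function name `auc` and the example scores in the usage note are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case has
    the higher score, with ties counted as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, `auc([0.8, 0.3], [0.5, 0.1])` considers four case-control pairs, three of which rank the case higher. The nested loop is quadratic in cohort size, which is acceptable for a few hundred subjects; a rank-based formulation would be the usual choice for larger data.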
bioRxiv | 2018
Fadhl Alakwaa; Sijia Huang; Lana X. Garmire
Lilikoi (the Hawaiian word for passion fruit) is a new and comprehensive R package for personalized pathway-based classification modelling using metabolomics data. Four basic modules are presented as the backbone of the package: 1) Feature mapping module, which standardizes the metabolite names provided by users and maps them to pathways; 2) Dimension transformation module, which transforms the metabolomic profiles to personalized pathway-based profiles using pathway deregulation scores (PDS); 3) Feature selection module, which helps to select the significant pathway features related to the disease phenotypes; and 4) Classification and prediction module, which offers various machine-learning classification algorithms. The package is freely available under the GPLv3 license through the GitHub repository at: https://github.com/lanagarmire/lilikoi