Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Ruowang Li is active.

Publication


Featured researches published by Ruowang Li.


Biodata Mining | 2013

ATHENA: Identifying interactions between different levels of genomic data associated with cancer clinical outcomes using grammatical evolution neural network.

Dokyoon Kim; Ruowang Li; Scott M. Dudek; Marylyn D. Ritchie

BackgroundGene expression profiles have been broadly used in cancer research as a diagnostic or prognostic signature for the clinical outcome prediction such as stage, grade, metastatic status, recurrence, and patient survival, as well as to potentially improve patient management. However, emerging evidence shows that gene expression-based prediction varies between independent data sets. One possible explanation of this effect is that previous studies were focused on identifying genes with large main effects associated with clinical outcomes. Thus, non-linear interactions without large individual main effects would be missed. The other possible explanation is that gene expression as a single level of genomic data is insufficient to explain the clinical outcomes of interest since cancer can be dysregulated by multiple alterations through genome, epigenome, transcriptome, and proteome levels. In order to overcome the variability of diagnostic or prognostic predictors from gene expression alone and to increase its predictive power, we need to integrate multi-levels of genomic data and identify interactions between them associated with clinical outcomes.ResultsHere, we proposed an integrative framework for identifying interactions within/between multi-levels of genomic data associated with cancer clinical outcomes using the Grammatical Evolution Neural Networks (GENN). In order to demonstrate the validity of the proposed framework, ovarian cancer data from TCGA was used as a pilot task. We found not only interactions within a single genomic level but also interactions between multi-levels of genomic data associated with survival in ovarian cancer. Notably, the integration model from different levels of genomic data achieved 72.89% balanced accuracy and outperformed the top models with any single level of genomic data.ConclusionsUnderstanding the underlying tumorigenesis and progression in ovarian cancer through the global view of interactions within/between different levels of genomic data is expected to provide guidance for improved prognostic biomarkers and individualized therapies.


Biodata Mining | 2014

Knowledge-driven genomic interactions: an application in ovarian cancer

Do Kyoon Kim; Ruowang Li; Scott M. Dudek; Alex T. Frase; Sarah A. Pendergrass; Marylyn D. Ritchie

BackgroundEffective cancer clinical outcome prediction for understanding of the mechanism of various types of cancer has been pursued using molecular-based data such as gene expression profiles, an approach that has promise for providing better diagnostics and supporting further therapies. However, clinical outcome prediction based on gene expression profiles varies between independent data sets. Further, single-gene expression outcome prediction is limited for cancer evaluation since genes do not act in isolation, but rather interact with other genes in complex signaling or regulatory networks. In addition, since pathways are more likely to co-operate together, it would be desirable to incorporate expert knowledge to combine pathways in a useful and informative manner.MethodsThus, we propose a novel approach for identifying knowledge-driven genomic interactions and applying it to discover models associated with cancer clinical phenotypes using grammatical evolution neural networks (GENN). In order to demonstrate the utility of the proposed approach, an ovarian cancer data from the Cancer Genome Atlas (TCGA) was used for predicting clinical stage as a pilot project.ResultsWe identified knowledge-driven genomic interactions associated with cancer stage from single knowledge bases such as sources of pathway-pathway interaction, but also knowledge-driven genomic interactions across different sets of knowledge bases such as pathway-protein family interactions by integrating different types of information. Notably, an integration model from different sources of biological knowledge achieved 78.82% balanced accuracy and outperformed the top models with gene expression or single knowledge-based data types alone. Furthermore, the results from the models are more interpretable because they are framed in the context of specific biological pathways or other expert knowledge.ConclusionsThe success of the pilot study we have presented herein will allow us to pursue further identification of models predictive of clinical cancer survival and recurrence. Understanding the underlying tumorigenesis and progression in ovarian cancer through the global view of interactions within/between different biological knowledge sources has the potential for providing more effective screening strategies and therapeutic targets for many types of cancer.


Journal of Biomedical Informatics | 2015

Predicting censored survival data based on the interactions between meta-dimensional omics data in breast cancer

Dokyoon Kim; Ruowang Li; Scott M. Dudek; Marylyn D. Ritchie

Evaluation of survival models to predict cancer patient prognosis is one of the most important areas of emphasis in cancer research. A binary classification approach has difficulty directly predicting survival due to the characteristics of censored observations and the fact that the predictive power depends on the threshold used to set two classes. In contrast, the traditional Cox regression approach has some drawbacks in the sense that it does not allow for the identification of interactions between genomic features, which could have key roles associated with cancer prognosis. In addition, data integration is regarded as one of the important issues in improving the predictive power of survival models since cancer could be caused by multiple alterations through meta-dimensional genomic data including genome, epigenome, transcriptome, and proteome. Here we have proposed a new integrative framework designed to perform these three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; (3) identifying interactions within/between meta-dimensional genomic features associated with survival. In order to predict censored survival time, martingale residuals were calculated as a new continuous outcome and a new fitness function used by the grammatical evolution neural network (GENN) based on mean absolute difference of martingale residuals was implemented. To test the utility of the proposed framework, a simulation study was conducted, followed by an analysis of meta-dimensional omics data including copy number, gene expression, DNA methylation, and protein expression data in breast cancer retrieved from The Cancer Genome Atlas (TCGA). On the basis of the results from breast cancer dataset, we were able to identify interactions not only within a single dimension of genomic data but also between meta-dimensional omics data that are associated with survival. Notably, the predictive power of our best meta-dimensional model was 73% which outperformed all of the other models conducted based on a single dimension of genomic data. Breast cancer is an extremely heterogeneous disease and the high levels of genomic diversity within/between breast tumors could affect the risk of therapeutic responses and disease progression. Thus, identifying interactions within/between meta-dimensional omics data associated with survival in breast cancer is expected to deliver direction for improved meta-dimensional prognostic biomarkers and therapeutic targets.


pacific symposium on biocomputing | 2014

Binning somatic mutations based on biological knowledge for predicting survival: an application in renal cell carcinoma.

Dokyoon Kim; Ruowang Li; Scott M. Dudek; John R. Wallace; Marylyn D. Ritchie

Enormous efforts of whole exome and genome sequencing from hundreds to thousands of patients have provided the landscape of somatic genomic alterations in many cancer types to distinguish between driver mutations and passenger mutations. Driver mutations show strong associations with cancer clinical outcomes such as survival. However, due to the heterogeneity of tumors, somatic mutation profiles are exceptionally sparse whereas other types of genomic data such as miRNA or gene expression contain much more complete data for all genomic features with quantitative values measured in each patient. To overcome the extreme sparseness of somatic mutation profiles and allow for the discovery of combinations of somatic mutations that may predict cancer clinical outcomes, here we propose a new approach for binning somatic mutations based on existing biological knowledge. Through the analysis using renal cell carcinoma dataset from The Cancer Genome Atlas (TCGA), we identified combinations of somatic mutation burden based on pathways, protein families, evolutionary conversed regions, and regulatory regions associated with survival. Due to the nature of heterogeneity in cancer, using a binning strategy for somatic mutation profiles based on biological knowledge will be valuable for improved prognostic biomarkers and potentially for tailoring therapeutic strategies by identifying combinations of driver mutations.


Journal of the American Medical Informatics Association | 2016

Using knowledge-driven genomic interactions for multi-omics data analysis: metadimensional models for predicting clinical outcomes in ovarian carcinoma

Dokyoon Kim; Ruowang Li; Anastasia Lucas; Shefali S. Verma; Scott M. Dudek; Marylyn D. Ritchie

It is common that cancer patients have different molecular signatures even though they have similar clinical features, such as histology, due to the heterogeneity of tumors. To overcome this variability, we previously developed a new approach incorporating prior biological knowledge that identifies knowledge-driven genomic interactions associated with outcomes of interest. However, no systematic approach has been proposed to identify interaction models between pathways based on multi-omics data. Here we have proposed such a novel methodological framework, called metadimensional knowledge-driven genomic interactions (MKGIs). To test the utility of the proposed framework, we applied it to an ovarian cancer dataset including multi-omics profiles from The Cancer Genome Atlas to predict grade, stage, and survival outcome. We found that each knowledge-driven genomic interaction model, based on different genomic datasets, contains different sets of pathway features, which suggests that each genomic data type may contribute to outcomes in ovarian cancer via a different pathway. In addition, MKGI models significantly outperformed the single knowledge-driven genomic interaction model. From the MKGI models, many interactions between pathways associated with outcomes were found, including the mitogen-activated protein kinase (MAPK) signaling pathway and the gonadotropin-releasing hormone (GnRH) signaling pathway, which are known to play important roles in cancer pathogenesis. The beauty of incorporating biological knowledge into the model based on multi-omics data is the ability to improve diagnosis and prognosis and provide better interpretability. Thus, determining variability in molecular signatures based on these interactions between pathways may lead to better diagnostic/treatment strategies for better precision medicine.


pacific symposium on biocomputing | 2016

BIOFILTER AS A FUNCTIONAL ANNOTATION PIPELINE FOR COMMON AND RARE COPY NUMBER BURDEN.

Dokyoon Kim; Anastasia Lucas; Joseph T. Glessner; Shefali S. Verma; Yukiko Bradford; Ruowang Li; Alex T. Frase; Hakon Hakonarson; Peggy L. Peissig; Murray H. Brilliant; Marylyn D. Ritchie

Recent studies on copy number variation (CNV) have suggested that an increasing burden of CNVs is associated with susceptibility or resistance to disease. A large number of genes or genomic loci contribute to complex diseases such as autism. Thus, total genomic copy number burden, as an accumulation of copy number change, is a meaningful measure of genomic instability to identify the association between global genetic effects and phenotypes of interest. However, no systematic annotation pipeline has been developed to interpret biological meaning based on the accumulation of copy number change across the genome associated with a phenotype of interest. In this study, we develop a comprehensive and systematic pipeline for annotating copy number variants into genes/genomic regions and subsequently pathways and other gene groups using Biofilter - a bioinformatics tool that aggregates over a dozen publicly available databases of prior biological knowledge. Next we conduct enrichment tests of biologically defined groupings of CNVs including genes, pathways, Gene Ontology, or protein families. We applied the proposed pipeline to a CNV dataset from the Marshfield Clinic Personalized Medicine Research Project (PMRP) in a quantitative trait phenotype derived from the electronic health record - total cholesterol. We identified several significant pathways such as toll-like receptor signaling pathway and hepatitis C pathway, gene ontologies (GOs) of nucleoside triphosphatase activity (NTPase) and response to virus, and protein families such as cell morphogenesis that are associated with the total cholesterol phenotype based on CNV profiles (permutation p-value < 0.01). Based on the copy number burden analysis, it follows that the more and larger the copy number changes, the more likely that one or more target genes that influence disease risk and phenotypic severity will be affected. Thus, our study suggests the proposed enrichment pipeline could improve the interpretability of copy number burden analysis where hundreds of loci or genes contribute toward disease susceptibility via biological knowledge groups such as pathways. This CNV annotation pipeline with Biofilter can be used for CNV data from any genotyping or sequencing platform and to explore CNV enrichment for any traits or phenotypes. Biofilter continues to be a powerful bioinformatics tool for annotating, filtering, and constructing biologically informed models for association analysis - now including copy number variants.


european conference on applications of evolutionary computation | 2014

An integrated analysis of genome-wide DNA methylation and genetic variants underlying etoposide-induced cytotoxicity in European and African populations

Ruowang Li; Do Kyoon Kim; Scott M. Dudek; Marylyn D. Ritchie

Genetic variations among individuals account for a large portion of variability in drug response. The underlying mechanism of the variability is still not known, but it is expected to comprise of a wide range of genetic factors that interact and communicate with each other. Here, we present an integrated genome-wide approach to uncover the interactions among genetic factors that can explain some of the inter-individual variation in drug response. The International HapMap consortium generated genotyping data on human lymphoblastoid cell lines of (Center d’Etude du Polymorphisme Humain population - CEU) European descent and (Yoruba population - YRI) African descent. Using genome-wide analysis, Huang et al. identified SNPs that are associated with etoposide, a chemotherapeutic drug, response on the cell lines. Using the same lymphoblastoid cell lines, Fraser et al. generated genome-wide methylation profiles for gene promoter regions. We evaluated associations between candidate SNPs generated by Huang et al and genome-wide methylation sites. The analysis identified a set of methylation sites that are associated with etoposide related SNPs. Using the set of methylation sites and the candidate SNPs, we built an integrated model to explain etoposide response observed in CEU and YRI cell lines. This integrated method can be extended to combine any number of genomics data types to explain many phenotypes of interest.


GPTP | 2014

Evaluation of Parameter Contribution to Neural Network Size and Fitness in ATHENA for Genetic Analysis

Ruowang Li; Emily Rose Holzinger; Scott M. Dudek; Marylyn D. Ritchie

The vast amount of available genomics data provides us an unprecedented ability to survey the entire genome and search for the genetic determinants of complex diseases. Until now, Genome-wide association studies have been the predominant method to associate DNA variations to disease traits. GWAS have successfully uncovered many genetic variants associated with complex diseases when the effect loci are strongly associated with the trait. However, methods for studying interaction effects among multiple loci are still lacking. Established machine learning methods such as the grammatical evolution neural networks (GENN) can be adapted to help us uncover the missing interaction effects that are not captured by GWAS studies. We used an implementation of GENN distributed in the software package ATHENA (Analysis Tool for Heritable and Environmental Network Associations) to investigate the effects of multiple GENN parameters and data noise levels on model detection and network structure. We concluded that the models produced by GENN were greatly affected by algorithm parameters and data noise levels. We also produced complex, multi-layer networks that were not produced in the previous study. In summary, GENN can produce complex, multi-layered networks when the data require it for higher fitness and when the parameter settings allow for a wide search of the complex model space.


Pharmacogenomics Journal | 2018

Integration of genetic and functional genomics data to uncover chemotherapeutic induced cytotoxicity

Ruowang Li; Dokyoon Kim; Heather E. Wheeler; Scott M. Dudek; M. Eileen Dolan; Marylyn D. Ritchie

Identifying genetic variants associated with chemotherapeutic induced toxicity is an important step towards personalized treatment of cancer patients. However, annotating and interpreting the associated genetic variants remains challenging because each associated variant is a surrogate for many other variants in the same region. The issue is further complicated when investigating patterns of associated variants with multiple drugs. In this study, we used biological knowledge to annotate and compare genetic variants associated with cellular sensitivity to mechanistically distinct chemotherapeutic drugs, including platinating agents (cisplatin, carboplatin), capecitabine, cytarabine, and paclitaxel. The most significantly associated SNPs from genome wide association studies of cellular sensitivity to each drug in lymphoblastoid cell lines derived from populations of European (CEU) and African (YRI) descent were analyzed for their enrichment in biological pathways and processes. We annotated genetic variants using higher-level biological annotations in efforts to group variants into more interpretable biological modules. Using the higher-level annotations, we observed distinct biological modules associated with cell line populations as well as classes of chemotherapeutic drugs. We also integrated genetic variants and gene expression variables to build predictive models for chemotherapeutic drug cytotoxicity and prioritized the network models based on the enrichment of DNA regulatory data. Several biological annotations, often encompassing different SNPs, were replicated in independent datasets. By using biological knowledge and DNA regulatory information, we propose a novel approach for jointly analyzing genetic variants associated with multiple chemotherapeutic drugs.


Biodata Mining | 2016

Identification of genetic interaction networks via an evolutionary algorithm evolved Bayesian network

Ruowang Li; Scott M. Dudek; Dokyoon Kim; Molly A. Hall; Yukiko Bradford; Peggy L. Peissig; Murray H. Brilliant; James G. Linneman; Catherine A. McCarty; Le Bao; Marylyn D. Ritchie

Collaboration


Dive into the Ruowang Li's collaboration.

Top Co-Authors

Avatar

Marylyn D. Ritchie

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar

Scott M. Dudek

University of Pennsylvania

View shared research outputs
Top Co-Authors

Avatar

Dokyoon Kim

Geisinger Health System

View shared research outputs
Top Co-Authors

Avatar

Alex T. Frase

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar

Anastasia Lucas

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar

Do Kyoon Kim

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Shefali S. Verma

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar

Yukiko Bradford

Pennsylvania State University

View shared research outputs
Researchain Logo
Decentralizing Knowledge