Sunmo Yang
Yonsei University
                                 Network
                            
                            Latest external collaboration on country level. Dive into details by clicking on the dots.
                                 Publication
                            
                            Featured researches published by Sunmo Yang.
Scientific Reports | 2015
Heonjong Han; Hongseok Shim; Donghyun Shin; Jung Eun Shim; Yunhee Ko; Junha Shin; Hanhae Kim; Ara Cho; Eiru Kim; Tak Lee; Hyojin Kim; Kyung Soo Kim; Sunmo Yang; Dasom Bae; Ayoung Yun; Sunphil Kim; Chan Yeong Kim; Hyeon Jin Cho; Byunghee Kang; Susie Shin; Insuk Lee
The reconstruction of transcriptional regulatory networks (TRNs) is a long-standing challenge in human genetics. Numerous computational methods have been developed to infer regulatory interactions between human transcriptional factors (TFs) and target genes from high-throughput data, and their performance evaluation requires gold-standard interactions. Here we present a database of literature-curated human TF-target interactions, TRRUST (transcriptional regulatory relationships unravelled by sentence-based text-mining, http://www.grnpedia.org/trrust), which currently contains 8,015 interactions between 748 TF genes and 1,975 non-TF genes. A sentence-based text-mining approach was employed for efficient manual curation of regulatory interactions from approximately 20 million Medline abstracts. To the best of our knowledge, TRRUST is the largest publicly available database of literature-curated human TF-target interactions to date. TRRUST also has several useful features: i) information about the mode-of-regulation; ii) tests for target modularity of a query TF; iii) tests for TF cooperativity of a query target; iv) inferences about cooperating TFs of a query TF; and v) prioritizing associated pathways and diseases with a query TF. We observed high enrichment of TF-target pairs in TRRUST for top-scored interactions inferred from high-throughput data, which suggests that TRRUST provides a reliable benchmark for the computational reconstruction of human TRNs.
Nucleic Acids Research | 2015
Tak Lee; Sunmo Yang; Eiru Kim; Younhee Ko; Sohyun Hwang; Junha Shin; Jung Eun Shim; Hongseok Shim; Hyojin Kim; Chanyoung Kim; Insuk Lee
Arabidopsis thaliana is a reference plant that has been studied intensively for several decades. Recent advances in high-throughput experimental technology have enabled the generation of an unprecedented amount of data from A. thaliana, which has facilitated data-driven approaches to unravel the genetic organization of plant phenotypes. We previously published a description of a genome-scale functional gene network for A. thaliana, AraNet, which was constructed by integrating multiple co-functional gene networks inferred from diverse data types, and we demonstrated the predictive power of this network for complex phenotypes. More recently, we have observed significant growth in the availability of omics data for A. thaliana as well as improvements in data analysis methods that we anticipate will further enhance the integrated database of co-functional networks. Here, we present an updated co-functional gene network for A. thaliana, AraNet v2 (available at http://www.inetbio.org/aranet), which covers approximately 84% of the coding genome. We demonstrate significant improvements in both genome coverage and accuracy. To enhance the usability of the network, we implemented an AraNet v2 web server, which generates functional predictions for A. thaliana and 27 nonmodel plant species using an orthology-based projection of nonmodel plant genes on the A. thaliana gene network.
Nucleic Acids Research | 2014
Sohyun Hwang; Eiru Kim; Sunmo Yang; Edward M. Marcotte; Insuk Lee
Despite recent advances in human genetics, model organisms are indispensable for human disease research. Most human disease pathways are evolutionally conserved among other species, where they may phenocopy the human condition or be associated with seemingly unrelated phenotypes. Much of the known gene-to-phenotype association information is distributed across diverse databases, growing rapidly due to new experimental techniques. Accessible bioinformatics tools will therefore facilitate translation of discoveries from model organisms into human disease biology. Here, we present a web-based discovery tool for human disease studies, MORPHIN (model organisms projected on a human integrated gene network), which prioritizes the most relevant human diseases for a given set of model organism genes, potentially highlighting new model systems for human diseases and providing context to model organism studies. Conceptually, MORPHIN investigates human diseases by an orthology-based projection of a set of model organism genes onto a genome-scale human gene network. MORPHIN then prioritizes human diseases by relevance to the projected model organism genes using two distinct methods: a conventional overlap-based gene set enrichment analysis and a network-based measure of closeness between the query and disease gene sets capable of detecting associations undetectable by the conventional overlap-based methods. MORPHIN is freely accessible at http://www.inetbio.org/morphin.
Nucleic Acids Research | 2018
Heonjong Han; Jae Won Cho; Sangyoung Lee; Ayoung Yun; Hyojin Kim; Dasom Bae; Sunmo Yang; Chan Yeong Kim; Muyoung Lee; Eunbeen Kim; Sungho Lee; Byunghee Kang; Dabin Jeong; Yaeji Kim; Hyeon Nae Jeon; Haein Jung; Sunhwee Nam; Michael Chung; Jong Hoon Kim; Insuk Lee
Abstract Transcription factors (TFs) are major trans-acting factors in transcriptional regulation. Therefore, elucidating TF–target interactions is a key step toward understanding the regulatory circuitry underlying complex traits such as human diseases. We previously published a reference TF–target interaction database for humans—TRRUST (Transcriptional Regulatory Relationships Unraveled by Sentence-based Text mining)—which was constructed using sentence-based text mining, followed by manual curation. Here, we present TRRUST v2 (www.grnpedia.org/trrust) with a significant improvement from the previous version, including a significantly increased size of the database consisting of 8444 regulatory interactions for 800 TFs in humans. More importantly, TRRUST v2 also contains a database for TF–target interactions in mice, including 6552 TF–target interactions for 828 mouse TFs. TRRUST v2 is also substantially more comprehensive and less biased than other TF–target interaction databases. We also improved the web interface, which now enables prioritization of key TFs for a physiological condition depicted by a set of user-input transcriptional responsive genes. With the significant expansion in the database size and inclusion of the new web tool for TF prioritization, we believe that TRRUST v2 will be a versatile database for the study of the transcriptional regulation involved in human diseases.
Nucleic Acids Research | 2017
Jung Eun Shim; Changbae Bang; Sunmo Yang; Tak Lee; Sohyun Hwang; Chan Yeong Kim; U. Martin Singh-Blom; Edward M. Marcotte; Insuk Lee
Abstract During the last decade, genome-wide association studies (GWAS) have represented a major approach to dissect complex human genetic diseases. Due in part to limited statistical power, most studies identify only small numbers of candidate genes that pass the conventional significance thresholds (e.g. P ≤ 5 × 10−8). This limitation can be partly overcome by increasing the sample size, but this comes at a higher cost. Alternatively, weak association signals can be boosted by incorporating independent data. Previously, we demonstrated the feasibility of boosting GWAS disease associations using gene networks. Here, we present a web server, GWAB (www.inetbio.org/gwab), for the network-based boosting of human GWAS data. Using GWAS summary statistics (P-values) for SNPs along with reference genes for a disease of interest, GWAB reprioritizes candidate disease genes by integrating the GWAS and network data. We found that GWAB could more effectively retrieve disease-associated reference genes than GWAS could alone. As an example, we describe GWAB-boosted candidate genes for coronary artery disease and supporting data in the literature. These results highlight the inherent value in sub-threshold GWAS associations, which are often not publicly released. GWAB offers a feasible general approach to boost such associations for human disease genetics.
Scientific Reports | 2016
Sohyun Hwang; Chan Yeong Kim; Sun-Gou Ji; Junhyeok Go; Hanhae Kim; Sunmo Yang; Hye Jin Kim; Ara Cho; Sang Sun Yoon; Insuk Lee
Pseudomonas aeruginosa is a Gram-negative bacterium of clinical significance. Although the genome of PAO1, a prototype strain of P. aeruginosa, has been extensively studied, approximately one-third of the functional genome remains unknown. With the emergence of antibiotic-resistant strains of P. aeruginosa, there is an urgent need to develop novel antibiotic and anti-virulence strategies, which may be facilitated by an approach that explores P. aeruginosa gene function in systems-level models. Here, we present a genome-wide functional network of P. aeruginosa genes, PseudomonasNet, which covers 98% of the coding genome, and a companion web server to generate functional hypotheses using various network-search algorithms. We demonstrate that PseudomonasNet-assisted predictions can effectively identify novel genes involved in virulence and antibiotic resistance. Moreover, an antibiotic-resistance network based on PseudomonasNet reveals that P. aeruginosa has common modular genetic organisations that confer increased or decreased resistance to diverse antibiotics, which accounts for the pervasiveness of cross-resistance across multiple drugs. The same network also suggests that P. aeruginosa has developed mechanism of trade-off in resistance across drugs by altering genetic interactions. Taken together, these results clearly demonstrate the usefulness of a genome-scale functional network to investigate pathogenic systems in P. aeruginosa.
Nucleic Acids Research | 2016
Eiru Kim; Sohyun Hwang; Hyojin Kim; Hongseok Shim; Byunghee Kang; Sunmo Yang; Jae Ho Shim; Seung Yeon Shin; Edward M. Marcotte; Insuk Lee
Laboratory mouse, Mus musculus, is one of the most important animal tools in biomedical research. Functional characterization of the mouse genes, hence, has been a long-standing goal in mammalian and human genetics. Although large-scale knockout phenotyping is under progress by international collaborative efforts, a large portion of mouse genome is still poorly characterized for cellular functions and associations with disease phenotypes. A genome-scale functional network of mouse genes, MouseNet, was previously developed in context of MouseFunc competition, which allowed only limited input data for network inferences. Here, we present an improved mouse co-functional network, MouseNet v2 (available at http://www.inetbio.org/mousenet), which covers 17 714 genes (>88% of coding genome) with 788 080 links, along with a companion web server for network-assisted functional hypothesis generation. The network database has been substantially improved by large expansion of genomics data. For example, MouseNet v2 database contains 183 co-expression networks inferred from 8154 public microarray samples. We demonstrated that MouseNet v2 is predictive for mammalian phenotypes as well as human diseases, which suggests its usefulness in discovery of novel disease genes and dissection of disease pathways. Furthermore, MouseNet v2 database provides functional networks for eight other vertebrate models used in various research fields.
Nucleic Acids Research | 2015
Junha Shin; Sunmo Yang; Eiru Kim; Chan Yeong Kim; Hongseok Shim; Ara Cho; Hyojin Kim; Sohyun Hwang; Jung Eun Shim; Insuk Lee
Drosophila melanogaster (fruit fly) has been a popular model organism in animal genetics due to the high accessibility of reverse-genetics tools. In addition, the close relationship between the Drosophila and human genomes rationalizes the use of Drosophila as an invertebrate model for human neurobiology and disease research. A platform technology for predicting candidate genes or functions would further enhance the usefulness of this long-established model organism for gene-to-phenotype mapping. Recently, the power of network prioritization for gene-to-phenotype mapping has been demonstrated in many organisms. Here we present a network prioritization server dedicated to Drosophila that covers ∼95% of the coding genome. This server, dubbed FlyNet, has several distinctive features, including (i) prioritization for both genes and functions; (ii) two complementary network algorithms: direct neighborhood and network diffusion; (iii) spatiotemporal-specific networks as an additional prioritization strategy for traits associated with a specific developmental stage or tissue and (iv) prioritization for human disease genes. FlyNet is expected to serve as a versatile hypothesis-generation platform for genes and functions in the study of basic animal genetics, developmental biology and human disease. FlyNet is available for free at http://www.inetbio.org/flynet.
Nucleic Acids Research | 2016
Hongseok Shim; Ji Hyun Kim; Chan Yeong Kim; Sohyun Hwang; Hyojin Kim; Sunmo Yang; Ji Eun Lee; Insuk Lee
Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false-discovery. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish (Danio rerio), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet (www.inetbio.org/danionet), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of the function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies yet not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results showed that an integrated genomics big data for a model animal of diseases can expand our opportunity for harnessing WES data in disease gene discovery.
Nucleic Acids Research | 2017
Sunmo Yang; Chan Yeong Kim; Sohyun Hwang; Eiru Kim; Hyojin Kim; Hongseok Shim; Insuk Lee
The use of high-throughput array and sequencing technologies has produced unprecedented amounts of gene expression data in central public depositories, including the Gene Expression Omnibus (GEO). The immense amount of expression data in GEO provides both vast research opportunities and data analysis challenges. Co-expression analysis of high-dimensional expression data has proven effective for the study of gene functions, and several co-expression databases have been developed. Here, we present a new co-expression database, COEXPEDIA (www.coexpedia.org), which is distinctive from other co-expression databases in three aspects: (i) it contains only co-functional co-expressions that passed a rigorous statistical assessment for functional association, (ii) the co-expressions were inferred from individual studies, each of which was designed to investigate gene functions with respect to a particular biomedical context such as a disease and (iii) the co-expressions are associated with medical subject headings (MeSH) that provide biomedical information for anatomical, disease, and chemical relevance. COEXPEDIA currently contains approximately eight million co-expressions inferred from 384 and 248 GEO series for humans and mice, respectively. We describe how these MeSH-associated co-expressions enable the identification of diseases and drugs previously unknown to be related to a gene or a gene group of interest.
