Yunku Yeu
Yonsei University
Publications
Featured research published by Yunku Yeu.
Computers in Biology and Medicine | 2013
Jaegyoon Ahn; Youngmi Yoon; Yunku Yeu; Hookuen Lee; Sanghyun Park
There has been much active research in bioinformatics aimed at understanding oncogenesis and tumor progression. Most of this research relies on mRNA gene expression data to identify marker genes or cancer-specific gene networks. However, because proteins are the functional molecules that carry out the biological tasks of genes, they can serve as direct markers of biological function. Genome-scale protein abundance data have not been investigated in depth because high-throughput protein assays are of limited availability, chiefly owing to the lack of robust protein-level techniques comparable to RT-PCR (real-time polymerase chain reaction) for mRNA. In this study, we quantified the phosphoproteomes of breast cancer cell lines treated with TGF-beta (transforming growth factor beta). To discover biomarkers and observe changes in signaling pathways related to breast cancer, we applied a protein network-based approach to build a classifier of sub-network markers. This classifier achieved higher accuracy than other network-based classification algorithms and than current feature selection and classification algorithms. Moreover, many cancer-related proteins were identified in these sub-networks. Each sub-network provides functional insight and can serve as a potential marker for TGF-beta treatment. By interpreting the roles of the proteins in these sub-networks in the context of various signaling pathways, we identified strong candidate proteins and related interactions that are expected to affect breast cancer outcomes. These results demonstrate the high quality of the quantified phosphoproteome data and show that our network construction and classification method is appropriate for analyzing this type of data.
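The abstract does not specify how sub-network markers are found. A minimal sketch of one widely used approach, the greedy search of Chuang et al. (2007), is shown below as a stand-in, not as the authors' method: a sub-network's per-sample activity is the mean z-score of its members, it is scored by mutual information (MI) with the class labels, and it grows from a seed protein by adding the neighbour that most increases MI. The function names, the three-bin discretization, and `min_gain` are illustrative assumptions.

```python
import math
from collections import Counter

def zscore_rows(data):
    """Z-normalize each protein's abundance profile across samples."""
    z = {}
    for prot, values in data.items():
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
        z[prot] = [(v - mean) / sd for v in values]
    return z

def activity(subnet, z):
    """Per-sample activity: mean z-score of the sub-network's members."""
    n_samples = len(next(iter(z.values())))
    return [sum(z[p][i] for p in subnet) / len(subnet) for i in range(n_samples)]

def mutual_information(act, labels, bins=3):
    """MI (in nats) between equal-width binned activity and class labels."""
    lo, hi = min(act), max(act)
    width = (hi - lo) / bins or 1.0
    disc = [min(int((a - lo) / width), bins - 1) for a in act]
    n = len(labels)
    joint = Counter(zip(disc, labels))
    p_act, p_lab = Counter(disc), Counter(labels)
    return sum(c / n * math.log(c * n / (p_act[a] * p_lab[lab]))
               for (a, lab), c in joint.items())

def grow_subnetwork(seed, graph, z, labels, min_gain=1e-3):
    """Greedily add the neighbour that most increases MI; stop when gain < min_gain."""
    subnet = {seed}
    best = mutual_information(activity(subnet, z), labels)
    while True:
        frontier = sorted({nb for p in subnet for nb in graph.get(p, ())
                           if nb not in subnet and nb in z})
        if not frontier:
            break
        cand, score = max(((c, mutual_information(activity(subnet | {c}, z), labels))
                           for c in frontier), key=lambda t: t[1])
        if score - best < min_gain:
            break
        subnet.add(cand)
        best = score
    return subnet, best
```

The selected sub-networks' activities would then serve as features for a downstream classifier.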
BMC Medical Informatics and Decision Making | 2013
Jaegyoon Ahn; Dae Hyun Lee; Youngmi Yoon; Yunku Yeu; Sanghyun Park
Background: Detecting protein complexes is one of the essential and fundamental tasks in understanding various biological functions or processes. Therefore, accurate identification of protein complexes is indispensable.
Methods: For more accurate detection of protein complexes, we propose an algorithm that detects dense protein sub-networks whose proteins share closely located bottleneck proteins. The proposed algorithm can find protein complexes that overlap with one another.
Results: We applied our algorithm to several PPI (protein-protein interaction) networks of Saccharomyces cerevisiae and Homo sapiens and validated our results using public databases of protein complexes. Prediction accuracy improved further over our previous work, which also used bottleneck information from the PPI network but showed limitations in detecting small protein complexes.
Conclusions: Our algorithm produced overlapping protein complexes with a significantly improved F1 score over existing algorithms. This result comes from high recall due to an effective network search, as well as high precision due to proper use of bottleneck information during the search.
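The abstract does not define "bottleneck protein". A common definition is a node with high betweenness centrality, i.e. one that lies on many shortest paths. A minimal sketch under that assumption, using Brandes' algorithm for an unweighted, undirected PPI graph given as an adjacency dict (every node must appear as a key); `bottlenecks` and its `top` parameter are hypothetical helpers.

```python
from collections import deque

def betweenness(graph):
    """Brandes' betweenness centrality for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest s->v paths
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                      # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                      # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}

def bottlenecks(graph, top=1):
    """Hypothetical helper: the `top` nodes by betweenness (ties broken by name)."""
    bc = betweenness(graph)
    return sorted(graph, key=lambda v: (-bc[v], v))[:top]
```

For example, in two triangles joined through a shared protein X, every path between the triangles passes through X, so X is reported as the bottleneck.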
Conference on Information and Knowledge Management | 2012
Jaegyoon Ahn; Dae Hyun Lee; Youngmi Yoon; Yunku Yeu; Sanghyun Park
Detecting protein complexes is one of the essential and fundamental tasks in understanding various biological functions or processes. Therefore, precise identification of protein complexes is indispensable. For more precise detection, we propose a novel data structure that employs bottleneck proteins as partitioning points for detecting protein complexes. The partitioning process allows the resulting protein complexes to overlap. We applied our algorithm to several PPI (protein-protein interaction) networks of Saccharomyces cerevisiae and Homo sapiens and validated our results using public databases of protein complexes. Our algorithm produced overlapping protein complexes with a significantly improved F1 score, which comes from higher precision.
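The abstract reports F1 against reference complexes without giving the matching rule. A minimal sketch, assuming the overlap score |A∩B|²/(|A|·|B|) with a 0.2 match threshold that is common in complex-detection evaluations: precision is the fraction of predicted complexes matching some reference complex, recall the fraction of reference complexes matched by some prediction. The threshold and function names are assumptions.

```python
def overlap_score(a, b):
    """|A ∩ B|^2 / (|A| * |B|); 0 for empty sets."""
    if not a or not b:
        return 0.0
    inter = len(a & b)
    return inter * inter / (len(a) * len(b))

def f1_against_reference(predicted, reference, threshold=0.2):
    """Return (precision, recall, F1) of predicted complexes vs reference complexes."""
    preds = [set(p) for p in predicted]
    refs = [set(r) for r in reference]
    matched_pred = sum(any(overlap_score(p, r) >= threshold for r in refs) for p in preds)
    matched_ref = sum(any(overlap_score(p, r) >= threshold for p in preds) for r in refs)
    precision = matched_pred / len(preds) if preds else 0.0
    recall = matched_ref / len(refs) if refs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Because overlapping predictions can each match the same reference complex, this style of precision/recall tolerates the overlapping complexes the algorithm produces.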
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics | 2011
Yunku Yeu; Jaegyoon Ahn; Youngmi Yoon; Sanghyun Park
Finding protein complexes and their functions is essential for understanding biological processes. However, one difficulty in inferring protein complexes from a protein-protein interaction (PPI) network is that protein interaction data suffer from a high false-positive rate. We propose a complex-finding algorithm that does not depend strongly on the topological traits of the protein interaction network. Our method exploits a new measure, GECSS (Gene Expression Condition Set Similarity), which considers mRNA expression data for a set of PPIs. The complexes we found match reference complexes more closely than those found by existing methods. We also found several novel protein complexes that are significantly enriched in Gene Ontology annotations.
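The exact GECSS formula is not given in the abstract. The sketch below is a hypothetical stand-in that only illustrates the idea of a condition-set similarity: each protein is mapped to the set of conditions in which its gene is expressed at or above a threshold, an interacting pair is scored by the Jaccard index of those sets, and low-scoring interactions are filtered out as likely false positives. All names, the threshold, and `min_sim` are assumptions.

```python
def expressed_conditions(profile, threshold):
    """Indices of conditions where expression is at or above `threshold` (assumed rule)."""
    return {i for i, v in enumerate(profile) if v >= threshold}

def condition_set_similarity(p, q, expression, threshold=1.0):
    """Hypothetical GECSS-like score: Jaccard index of the two condition sets."""
    a = expressed_conditions(expression[p], threshold)
    b = expressed_conditions(expression[q], threshold)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def filter_interactions(ppis, expression, min_sim=0.5, threshold=1.0):
    """Keep PPIs whose partners are co-expressed across similar condition sets."""
    return [(p, q) for p, q in ppis
            if p in expression and q in expression
            and condition_set_similarity(p, q, expression, threshold) >= min_sim]
```

A complex-finding step would then run on the filtered network instead of relying only on topology.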
ACM Symposium on Applied Computing | 2016
Junbum Cha; Jeongwoo Kim; Yunku Yeu; Sanghyun Park
As text mining advances rapidly in the biomedical field, the importance of text data is increasing. Most text data is obtained through a Medical Subject Headings (MeSH) term search; in this process, a large amount of valuable data is missed because it has not yet been indexed with MeSH terms. In this paper, we propose a method for obtaining text data beyond that retrieved by a conventional MeSH term search. To obtain this additional data, we used a Support Vector Machine (SVM) to classify documents as related or unrelated. We evaluated the results with a frequency-based text mining approach that measures data quality in a lung cancer study. We confirmed that the data extracted by our method provided as much valuable information as a MeSH term search. Further, the amount of information found increased by 40% when the additionally extracted data was used.
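The abstract names an SVM but not its implementation or features. A dependency-free sketch, assuming bag-of-words term counts and a linear SVM trained with Pegasos-style stochastic subgradient descent (a swapped-in training method, not necessarily what the authors used); labels are +1 for related and -1 for unrelated documents, and `lam`, `epochs`, and all function names are illustrative.

```python
import random
from collections import Counter

def featurize(text):
    """Bag-of-words term counts (assumed feature representation)."""
    return Counter(text.lower().split())

def train_linear_svm(docs, labels, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss, L2 penalty) trained with Pegasos-style SGD."""
    rng = random.Random(seed)
    data = [(featurize(d), y) for d, y in zip(docs, labels)]
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            for f in w:                  # L2 shrinkage: w *= (1 - eta * lam) = (1 - 1/t)
                w[f] *= 1.0 - 1.0 / t
            if margin < 1:               # hinge-loss subgradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def predict(w, text):
    """+1 (related) if the linear score is positive, else -1 (unrelated)."""
    score = sum(w.get(f, 0.0) * v for f, v in featurize(text).items())
    return 1 if score > 0 else -1
```

Usage: train on documents already labeled via MeSH terms, then apply `predict` to documents that lack MeSH indexing to recover additional related text.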
Information Sciences | 2015
Junsu Lee; Yunku Yeu; Hongchan Roh; Youngmi Yoon; Sanghyun Park
Sequence alignment is a widely used tool in genomics. With the development of next-generation sequencing (NGS) technology, the production of sequence read data has recently increased. A number of read alignment algorithms for handling NGS data have been developed. However, these algorithms suffer from a trade-off between throughput and alignment quality due to the large computational cost of processing repeat reads. In contrast, alignment algorithms running on distributed systems such as Hadoop and Trinity can achieve better throughput than existing single-machine algorithms without compromising alignment quality. In this paper, we present BulkAligner, a novel sequence alignment algorithm on the graph-based in-memory distributed system Trinity. We convert the original reference sequence into graph form and perform sequence alignment by finding the longest paths in the graph. Our experimental results show that BulkAligner achieves at least 1.8× and up to 57× better throughput than existing Hadoop-based algorithms, with the same or higher alignment quality. We also analyze scalability and show that throughput can be improved simply by adding machines.
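The abstract gives only the core idea (reference as a graph, alignment as a longest path). A single-machine, exact-match sketch of that idea, assuming nodes are k-mer occurrences in the reference and each node at position i links to the node at i + 1; a read is aligned by finding the longest chain of its consecutive k-mers that walks consecutive reference nodes. This is not BulkAligner and omits Trinity, distribution, and mismatch handling; all names are hypothetical.

```python
from collections import defaultdict

def build_kmer_graph(reference, k):
    """Nodes are k-mer occurrences, keyed by sequence -> start positions.
    The node at reference position i implicitly links to the node at i + 1."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def align_read(read, index, k):
    """Longest path of consecutive read k-mers through consecutive reference nodes.
    Returns (reference start of the read, path length in k-mers), or (None, 0)."""
    best_start, best_len = None, 0
    chain = {}  # reference position -> length of the path ending at that node
    for j in range(len(read) - k + 1):
        next_chain = {}
        for pos in index.get(read[j:j + k], ()):
            length = chain.get(pos - 1, 0) + 1   # extend along the edge pos-1 -> pos
            next_chain[pos] = length
            if length > best_len:
                best_start, best_len = pos - j, length
        chain = next_chain
    return best_start, best_len
```

For example, aligning the read `GATTACA` against `TTTTGATTACAGGG` with k = 3 yields a five-node path starting at reference position 4.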
Systems, Man and Cybernetics | 2016
Jungrim Kim; Yunku Yeu; Jeongwoo Kim; Youngmi Yoon; Sanghyun Park
The biclustering method is a useful co-clustering technique for identifying biologically relevant gene modules. In this paper, we propose a novel method that applies a genetic algorithm to gene expression data to find not only functionally related gene modules but also state-specific gene modules. To identify these modules, the proposed method finds biclusters in which genes are statistically overexpressed or underexpressed and are differentially expressed in the samples inside the bicluster compared with the samples outside it. In addition, we improve the genetic algorithm by adding a selection pool that preserves the diversity of the population. The resulting gene modules perform better than those of comparative methods in a GO (Gene Ontology) term enrichment test and in an analysis of the connection between gene modules and disease. In particular, the highest-scoring gene modules in the breast cancer dataset are closely linked to the ribosome pathway, and recent studies show that dysregulation of ribosome biogenesis is associated with breast tumor progression.
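A minimal sketch of genetic-algorithm biclustering with a diversity-preserving selection pool, under stated assumptions: an individual is a pair of bit masks (genes, samples); fitness is a simplified stand-in, the absolute difference between the selected genes' mean expression inside versus outside the bicluster's samples, rather than the paper's statistical criterion; and the "selection pool" is interpreted as the best *distinct* individuals, so duplicates cannot take over the parent pool. All names and parameters are hypothetical.

```python
import random

def fitness(ind, data):
    """|mean inside bicluster samples - mean outside| over the bicluster's genes."""
    genes, samples = ind
    g = [i for i, on in enumerate(genes) if on]
    s_in = [j for j, on in enumerate(samples) if on]
    s_out = [j for j, on in enumerate(samples) if not on]
    if len(g) < 2 or not s_in or not s_out:
        return 0.0
    mean_in = sum(data[i][j] for i in g for j in s_in) / (len(g) * len(s_in))
    mean_out = sum(data[i][j] for i in g for j in s_out) / (len(g) * len(s_out))
    return abs(mean_in - mean_out)

def crossover(a, b, rng):
    """Uniform crossover on both the gene mask and the sample mask."""
    return tuple(tuple(x if rng.random() < 0.5 else y for x, y in zip(pa, pb))
                 for pa, pb in zip(a, b))

def mutate(ind, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return tuple(tuple(bit ^ (rng.random() < rate) for bit in part) for part in ind)

def selection_pool(population, scores, size):
    """Best distinct individuals, so duplicates cannot crowd out diversity."""
    pool, seen = [], set()
    for ind in sorted(population, key=lambda i: -scores[i]):
        if ind not in seen:
            seen.add(ind)
            pool.append(ind)
            if len(pool) == size:
                break
    return pool

def run_ga(data, pop_size=30, generations=40, pool_size=10, rate=0.05, seed=0):
    rng = random.Random(seed)
    n_genes, n_samples = len(data), len(data[0])

    def random_individual():
        return (tuple(rng.random() < 0.3 for _ in range(n_genes)),
                tuple(rng.random() < 0.5 for _ in range(n_samples)))

    population = [random_individual() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = {ind: fitness(ind, data) for ind in population}
        pool = selection_pool(population, scores, pool_size)
        history.append(scores[pool[0]])
        children = [pool[0]]                      # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(pool, 2) if len(pool) > 1 else (pool[0], pool[0])
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = children
    best = max(population, key=lambda ind: fitness(ind, data))
    return best, fitness(best, data), history
```

Because the best individual is always carried over, the best fitness never decreases across generations. This toy fitness favors small, extreme biclusters; a fuller version would add size terms and a significance test.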
ACM Symposium on Applied Computing | 2015
Daye Jeong; Yunku Yeu; Jaegyoon Ahn; Youngmi Yoon; Sanghyun Park
An important goal of systems biology is to understand and identify the mechanisms of the human body system. Genes play functional roles in the context of complex pathways, so analyzing genes as networks is important for understanding whole-system mechanisms. Biological activities are governed by various signaling networks. The advent of high-throughput technologies has made it possible to obtain biological information on a genome-wide scale, and genetic interactions have been identified from high-throughput data such as microarray data using Bayesian networks. In this paper, we infer a disease-specific gene interaction network using a Bayesian network, which is robust to noise in the data, and apply a genetic algorithm to learn the Bayesian network. We use heterogeneous data, including microarray, protein-protein interaction (PPI), and HumanNET data, to learn and score the network. We also exploit single nucleotide polymorphism (SNP) data to infer a disease-specific genetic interaction network, because these data may help detect weak signals related to genetic variation. Using our method, we reconstruct interactions between pathway genes and confirm, by applying it to Type II diabetes data, that it has statistically significant reconstruction power. Importantly, using Alzheimer's disease data, we infer a previously unreported interaction between a SNP and a disease-related gene.
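A genetic algorithm for Bayesian network learning needs a structure score and an acyclicity check for each candidate. The sketch below uses the standard BIC score for discrete variables (for example, discretized expression levels or SNP genotypes) as a stand-in; the paper's score additionally draws on PPI and HumanNET data, which is not modeled here. Structures are dicts mapping each node to a tuple of parents, and every node must appear as a key.

```python
import math
from collections import Counter

def is_acyclic(parents):
    """Kahn-style topological check on a {node: (parents...)} structure."""
    indeg = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    queue = [v for v, d in indeg.items() if d == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return seen == len(parents)

def bic_score(parents, data):
    """BIC = log-likelihood - 0.5 * log(N) * #free parameters (discrete data rows)."""
    n = len(data)
    score = 0.0
    for v, ps in parents.items():
        states = {row[v] for row in data}
        joint = Counter((tuple(row[p] for p in ps), row[v]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in ps) for row in data)
        loglik = sum(c * math.log(c / parent_counts[cfg]) for (cfg, _), c in joint.items())
        q = 1                                   # number of parent configurations
        for p in ps:
            q *= len({row[p] for row in data})
        score += loglik - 0.5 * math.log(n) * q * (len(states) - 1)
    return score
```

Within the GA, each individual would encode a set of edges; offspring that fail `is_acyclic` would be repaired or discarded, and survivors ranked by the score.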
Data and Text Mining in Bioinformatics | 2014
Jeongwoo Kim; Hyunjin Kim; Yunku Yeu; Mincheol Shin; Sanghyun Park
Since the genome projects of the 1990s, gene-related research has progressed considerably. These studies revealed that genes can cause disease and that the relationships between genes and diseases are important. For this reason, we propose a strategy called TILD that identifies cancer-related genes using title information in literature data. To implement our method, we selected cancer-specific literature data from an online database and extracted genes using text mining. Next, we used title information to classify the extracted genes into two kinds: genes located in the title are classified as hub genes, whereas genes located in the body are classified as sub genes connected to the hub genes. We repeated this process for each paper to construct cancer-specific local gene networks. Finally, we constructed a global cancer-specific gene network by integrating all local gene networks and calculated a score for each gene based on an analysis of the global network. We assumed that genes in the title have meaningful relations with cancer and that the other genes in the body are related to the title genes. For validation, we compared the top 20 genes inferred by our approach with those inferred by other methods; our approach found more cancer-related genes than comparable methods.
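The network-construction steps above can be sketched as follows, under stated assumptions: genes are recognized by simple dictionary lookup against a gene-symbol lexicon (real gene mention extraction is harder), each paper contributes a local network linking its title (hub) genes to its body-only (sub) genes, local networks are merged by union, and, because the abstract does not give the scoring formula, genes are ranked by node degree as a hypothetical score.

```python
import re

def genes_in(text, gene_lexicon):
    """Gene symbols mentioned in text (simple dictionary lookup, an assumption)."""
    return {tok for tok in re.findall(r"[A-Za-z0-9-]+", text.upper()) if tok in gene_lexicon}

def local_network(title, body, gene_lexicon):
    """Edges hub -> sub for one paper: title genes are hubs, body-only genes are subs."""
    hubs = genes_in(title, gene_lexicon)
    subs = genes_in(body, gene_lexicon) - hubs
    return hubs, {(h, s) for h in hubs for s in subs}

def global_network(papers, gene_lexicon):
    """Union of all local networks as an undirected adjacency dict."""
    graph = {}
    for title, body in papers:
        hubs, edges = local_network(title, body, gene_lexicon)
        for h in hubs:
            graph.setdefault(h, set())
        for h, s in edges:
            graph[h].add(s)
            graph.setdefault(s, set()).add(h)
    return graph

def rank_genes(graph, top=20):
    """Hypothetical score: node degree in the global network (ties broken by name)."""
    return sorted(graph, key=lambda g: (-len(graph[g]), g))[:top]
```

The default `top=20` mirrors the top-20 comparison used for validation.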
Bioinformatics and Biomedicine | 2014
Jungrim Kim; Youngmi Yoon; Sanghyun Park; Jaegyoon Ahn; Yunku Yeu
Gene clustering is a method for finding gene sets that are related to the same biological processes or molecular functions. To find these gene sets, previous studies have clustered genes that show similar mRNA expression, or a specific expression pattern within a subset of samples. However, for two contrasting groups of samples, current gene clustering methods cannot easily identify gene sets that show a significant expression pattern in only one group. Existing biclustering methods use only one group of samples (e.g., disease samples); although they can find biclusters with specific expression patterns, they have difficulty identifying disease-specific biclusters that are differentially expressed in the disease. Here, we propose a novel method that applies a genetic algorithm to gene expression data to find gene sets that can represent a specific subtype of cancer. The proposed method finds gene sets whose mRNA expression differs significantly between two contrasting sample groups within a fraction of the cancer samples. The resulting gene modules share more GO (Gene Ontology) terms related to a specific disease than gene modules identified by current algorithms. We also show that, when protein-protein interaction data are integrated with gene expression data from colorectal cancer samples, the proposed method finds more functionally related gene sets.
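The abstract does not name the statistical test used to measure differential expression between the two contrasting groups. A minimal sketch, assuming Welch's t-statistic per gene and a candidate gene set scored by the mean absolute t between a chosen subset of cancer samples and the normal samples; a genetic algorithm could use this as the fitness of a (gene set, cancer-sample subset) individual. Function names are hypothetical.

```python
import math

def welch_t(x, y):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    se = math.sqrt(vx / len(x) + vy / len(y))
    return (mx - my) / se if se > 0 else 0.0

def gene_set_score(expr, genes, cancer_subset, normal):
    """Mean |t| of the genes between a subset of cancer samples and the normal samples."""
    ts = [abs(welch_t([expr[g][j] for j in cancer_subset],
                      [expr[g][j] for j in normal])) for g in genes]
    return sum(ts) / len(ts)
```

Scoring against only a subset of cancer samples lets a gene set that marks one subtype score highly even when it is not differentially expressed across all cancer samples.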