Publication


Featured research published by Donghyeon Yu.


Genomics & Informatics | 2013

Review of biological network data and its applications.

Donghyeon Yu; Min-Soo Kim; Guanghua Xiao; Tae Hyun Hwang

Studying biological networks, such as protein-protein interactions, is key to understanding complex biological activities. Various types of large-scale biological datasets have been collected and analyzed with high-throughput technologies, including DNA microarray, next-generation sequencing, and the two-hybrid screening system, for this purpose. In this review, we focus on network-based approaches that help in understanding biological systems and identifying biological functions. Accordingly, this paper covers two major topics in network biology: reconstruction of gene regulatory networks and network-based applications, including protein function prediction, disease gene prioritization, and network-based genome-wide association study.


Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring | 2016

Predicting progression from mild cognitive impairment to Alzheimer's disease using longitudinal callosal atrophy

Sang Han Lee; Alvin H. Bachman; Donghyeon Yu; Johan Lim; Babak A. Ardekani

We investigate whether longitudinal callosal atrophy could predict conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD).


Journal of Computational and Graphical Statistics | 2015

High-Dimensional Fused Lasso Regression Using Majorization–Minimization and Parallel Processing

Donghyeon Yu; Joong-Ho Won; Tae Hoon Lee; Johan Lim; Sungroh Yoon

In this article, we propose a majorization–minimization (MM) algorithm for high-dimensional fused lasso regression (FLR) suitable for parallelization using graphics processing units (GPUs). The MM algorithm is stable and flexible as it can solve the FLR problems with various types of design matrices and penalty structures within a few tens of iterations. We also show that the convergence of the proposed algorithm is guaranteed. We conduct numerical studies to compare our algorithm with other existing algorithms, demonstrating that the proposed MM algorithm is competitive in many settings including the two-dimensional FLR with arbitrary design matrices. The merit of GPU parallelization is also exhibited. Supplementary materials are available online.
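As a rough illustration of the MM idea described in this abstract, the sketch below majorizes each absolute-value term by a quadratic at the current iterate, so that every iteration reduces to one linear solve. It is a minimal CPU sketch under stated assumptions, not the authors' GPU implementation: the function name `fused_lasso_mm`, the perturbation constant `eps`, the ridge starting point, and the toy denoising data are all hypothetical choices made for this example.

```python
import numpy as np

def fused_lasso_mm(X, y, lam1, lam2, eps=1e-6, n_iter=200, tol=1e-8):
    """MM sketch for 0.5*||y - X b||^2 + lam1*sum|b_j| + lam2*sum|b_j - b_{j-1}|.

    Each |u| is majorized at the current value u0 by u^2 / (2*(|u0| + eps)) + const;
    eps is an assumed small perturbation that keeps the weights finite.
    """
    n, p = X.shape
    D = np.diff(np.eye(p), axis=0)              # (p-1) x p first-difference matrix
    XtX, Xty = X.T @ X, X.T @ y
    b = np.linalg.solve(XtX + np.eye(p), Xty)   # ridge start (an assumption)
    for _ in range(n_iter):
        w1 = lam1 / (np.abs(b) + eps)           # weights for the sparsity penalty
        w2 = lam2 / (np.abs(D @ b) + eps)       # weights for the fusion penalty
        A = XtX + np.diag(w1) + D.T @ (w2[:, None] * D)
        b_new = np.linalg.solve(A, Xty)         # minimizer of the quadratic surrogate
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b

# Toy denoising problem (identity design): a noisy two-level step signal.
rng = np.random.default_rng(0)
truth = np.concatenate([np.zeros(25), 2.0 * np.ones(25)])
y = truth + 0.1 * rng.standard_normal(50)
b_hat = fused_lasso_mm(np.eye(50), y, lam1=0.0, lam2=1.0)
```

The dense solve here is only for readability; the banded structure of the difference penalty is what makes this kind of update amenable to parallel solvers.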


Journal of Neuroscience Methods | 2014

Application of fused lasso logistic regression to the study of corpus callosum thickness in early Alzheimer's disease.

Sang H. Lee; Donghyeon Yu; Alvin H. Bachman; Johan Lim; Babak A. Ardekani

We propose a fused lasso logistic regression to analyze callosal thickness profiles. The fused lasso regression imposes penalties on both the l1-norm of the model coefficients and their successive differences, and finds only a small number of non-zero coefficients which are locally constant. An iterative method of solving logistic regression with fused lasso regularization is proposed to make this a practical procedure. In this study we analyzed callosal thickness profiles sampled at 100 equal intervals between the rostrum and the splenium. The method was applied to corpora callosa of elderly normal controls (NCs) and patients with very mild or mild Alzheimer's disease (AD) from the Open Access Series of Imaging Studies (OASIS) database. We found specific locations in the genu and splenium of AD patients that are proportionally thinner than those of NCs. Callosal thickness in these regions combined with the Mini Mental State Examination scores differentiated AD from NC with 84% accuracy.
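For reference, the penalty structure described above can be written out as follows. The notation is an assumption made for this note rather than taken from the paper: y_i is the binary diagnosis label, x_i the vector of p thickness measurements (p = 100 here), and lambda_1, lambda_2 the two tuning parameters.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  -\sum_{i=1}^{n}\Bigl[\, y_i \eta_i - \log\bigl(1 + e^{\eta_i}\bigr) \Bigr]
  + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  + \lambda_2 \sum_{j=2}^{p} \lvert \beta_j - \beta_{j-1} \rvert,
\qquad \eta_i = \beta_0 + x_i^{\top}\beta .
```

The first penalty drives individual coefficients to zero, while the second encourages neighbouring coefficients along the callosal profile to be equal, which is what produces the locally constant solutions mentioned in the abstract.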


Bioinformatics | 2016

An integrative somatic mutation analysis to identify pathways linked with survival outcomes across 19 cancer types

Sunho Park; Seung Jun Kim; Donghyeon Yu; Samuel Peña-Llopis; Jianjiong Gao; Jin Suk Park; Beibei Chen; Jessie Norris; Xinlei Wang; Min Chen; Min-Soo Kim; Jeongsik Yong; Zabi Wardak; Kevin S. Choe; Michael D. Story; Timothy K. Starr; Jae Ho Cheong; Tae Hyun Hwang

MOTIVATION: Identification of altered pathways that are clinically relevant across human cancers is a key challenge in cancer genomics. Precise identification and understanding of these altered pathways may provide novel insights into patient stratification, therapeutic strategies and the development of new drugs. However, a challenge remains in accurately identifying pathways altered by somatic mutations across human cancers, due to the diverse mutation spectrum. We developed an innovative approach to integrate somatic mutation data with gene networks and pathways, in order to identify pathways altered by somatic mutations across cancers.

RESULTS: We applied our approach to The Cancer Genome Atlas (TCGA) dataset of somatic mutations in 4790 cancer patients with 19 different types of tumors. Our analysis identified cancer-type-specific altered pathways enriched with known cancer-relevant genes and targets of currently available drugs. To investigate the clinical significance of these altered pathways, we performed consensus clustering for patient stratification using member genes in the altered pathways coupled with gene expression datasets from 4870 patients from TCGA, and multiple independent cohorts confirmed that the altered pathways could be used to stratify patients into subgroups with significantly different clinical outcomes. Of particular significance, certain patient subpopulations with poor prognosis were identified because they had specific altered pathways for which there are available targeted therapies. These findings could be used to tailor and intensify therapy in these patients, for whom current therapy is suboptimal.

AVAILABILITY AND IMPLEMENTATION: The code is available at: http://www.taehyunlab.org

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Computational Statistics & Data Analysis | 2012

Permutation test for incomplete paired data with application to cDNA microarray data

Donghyeon Yu; Johan Lim; Feng Liang; Kyunga Kim; Byung Soo Kim; Woncheol Jang

A paired data set is common in microarray experiments, where the data are often incompletely observed for some pairs due to various technical reasons. In microarray paired data sets, it is of main interest to detect differentially expressed genes, which are usually identified by testing the equality of means of expressions within a pair. While much attention has been paid to testing mean equality with incomplete paired data in previous literature, the existing methods commonly assume the normality of data or rely on the large sample theory. In this paper, we propose a new test based on permutations, which is free from the normality assumption and large sample theory. We consider permutation statistics with linear mixtures of paired and unpaired samples as test statistics, and propose a procedure to find the optimal mixture that minimizes the conditional variances of the test statistics, given the observations. Simulations are conducted for numerical power comparisons between the proposed permutation tests and other existing methods. We apply the proposed method to find differentially expressed genes for a colorectal cancer study.
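To make the idea of a permutation statistic that mixes paired and unpaired samples more concrete, here is a simplified sketch. It assumes a fixed mixing weight `w`, whereas the paper describes choosing the mixture that minimizes the conditional variance; the function name `mixed_permutation_test`, its parameters, and the simulated data are hypothetical. Paired differences are permuted by random sign flips, and unpaired observations by shuffling group labels.

```python
import numpy as np

def mixed_permutation_test(d, x_only, y_only, w=0.5, n_perm=5000, seed=0):
    """Two-sided permutation test combining paired and unpaired information.

    d       : differences (x - y) from complete pairs
    x_only  : observations with only the first condition measured
    y_only  : observations with only the second condition measured
    w       : fixed mixing weight in [0, 1] (an assumption of this sketch)
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(d, dtype=float)
    x_only = np.asarray(x_only, dtype=float)
    y_only = np.asarray(y_only, dtype=float)

    def stat(d_, x_, y_):
        # Linear mixture of the paired and unpaired mean differences.
        return w * d_.mean() + (1.0 - w) * (x_.mean() - y_.mean())

    observed = stat(d, x_only, y_only)
    pooled = np.concatenate([x_only, y_only])
    n_x = len(x_only)
    count = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=len(d))   # swap labels within pairs
        shuffled = rng.permutation(pooled)              # swap labels across unpaired samples
        t = stat(signs * d, shuffled[:n_x], shuffled[n_x:])
        count += int(abs(t) >= abs(observed))
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)

# Simulated example with a clear mean shift between the two conditions.
rng = np.random.default_rng(1)
d = 3.0 + rng.standard_normal(15)
x_only = 3.0 + rng.standard_normal(8)
y_only = rng.standard_normal(8)
observed, p_value = mixed_permutation_test(d, x_only, y_only)
```

No normality assumption enters the calculation: the reference distribution comes entirely from relabelling the observed data.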


Biostatistics | 2015

Statistical completion of a partially identified graph with applications for the estimation of gene regulatory networks.

Donghyeon Yu; Won Son; Johan Lim; Guanghua Xiao

We study the estimation of a Gaussian graphical model whose dependent structures are partially identified. In a Gaussian graphical model, an off-diagonal zero entry in the concentration matrix (the inverse covariance matrix) implies the conditional independence of two corresponding variables, given all other variables. A number of methods have been proposed to estimate a sparse large-scale Gaussian graphical model or, equivalently, a sparse large-scale concentration matrix. In practice, the graph structure to be estimated is often partially identified by other sources or a pre-screening. In this paper, we propose a simple modification of existing methods to take into account this information in the estimation. We show that the partially identified dependent structure reduces the error in estimating the dependent structure. We apply the proposed method to estimating the gene regulatory network from lung cancer data, where protein-protein interactions are partially identified from the human protein reference database. The application shows that the proposed method identified many important cancer genes as hub genes in the constructed lung cancer network. In addition, we validated the prognostic importance of a newly identified cancer gene, PTPN13, in four independent lung cancer datasets. The results indicate that the proposed method could facilitate studying underlying lung cancer mechanisms and identifying reliable biomarkers for lung cancer prognosis.
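The link between zeros in the concentration matrix and conditional independence can be checked numerically on a small example. The 3-variable matrix below is hypothetical and chosen only for illustration: variables 0 and 2 are connected solely through variable 1, so they are marginally correlated but have zero partial correlation.

```python
import numpy as np

# Hypothetical concentration (precision) matrix with a zero in position (0, 2).
omega = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])
sigma = np.linalg.inv(omega)   # implied covariance matrix

def partial_correlations(cov):
    """Partial correlation of each pair given all other variables."""
    prec = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

pcor = partial_correlations(sigma)
print("covariance(0, 2):         ", round(sigma[0, 2], 4))
print("partial correlation(0, 2):", round(abs(pcor[0, 2]), 4))
print("partial correlation(0, 1):", round(pcor[0, 1], 4))
```

In a graph estimation setting, prior knowledge that an edge is present or absent corresponds to knowing in advance which entries of this matrix are nonzero or zero.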


Applied Intelligence | 2018

Comparative study of computational algorithms for the Lasso with high-dimensional, highly correlated data

Baekjin Kim; Donghyeon Yu; Joong-Ho Won

Variable selection is important in high-dimensional data analysis. The Lasso regression is useful since it possesses sparsity, a soft-decision rule, and computational efficiency. However, since the Lasso penalized likelihood contains a nondifferentiable term, standard optimization tools cannot be applied. Many computational algorithms have been proposed to optimize this Lasso penalized likelihood function in high-dimensional settings, including the coordinate descent (CD) algorithm, majorization-minimization using local quadratic approximation, the fast iterative shrinkage thresholding algorithm (FISTA), and the alternating direction method of multipliers (ADMM). In this paper, we undertake a comparative study that analyzes the relative merits of these algorithms. We are especially concerned with numerical sensitivity to the correlation between the covariates. We conduct a simulation study considering factors that affect the condition number of the covariance matrix of the covariates, as well as the level of penalization. We apply the algorithms to cancer biomarker discovery, and compare convergence speed and stability.
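Of the algorithms listed, coordinate descent is the simplest to write down. The sketch below is a textbook cyclic version for the squared-error Lasso, not the implementation benchmarked in the paper; the function names, the 1/(2n) scaling of the loss, and the simulated correlated design are assumptions made for this example.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n     # x_j' x_j / n for each column
    r = y - X @ b                         # current residual
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                  # skip all-zero columns
            # Correlation of column j with the partial residual (b_j removed).
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                r -= X[:, j] * delta      # keep the residual in sync
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

# Simulated design with strongly correlated covariates (shared latent factor).
rng = np.random.default_rng(0)
n, p = 100, 5
z = rng.standard_normal((n, 1))
X = 0.95 * z + 0.3 * rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.0, 0.0, -1.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)
lam = 0.05
b_hat = lasso_cd(X, y, lam)
```

With highly correlated columns, successive coordinate updates partly undo one another, so the number of sweeps needed grows with the condition number; this is the kind of sensitivity the comparative study examines.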


Statistical Applications in Genetics and Molecular Biology | 2012

Detection of differentially expressed gene sets in a partially paired microarray data set.

Johan Lim; Jayoun Kim; Sang-Cheol Kim; Donghyeon Yu; Kyunga Kim; Byung Soo Kim

Partially paired data sets often occur in microarray experiments (Kim et al., 2005; Liu, Liang and Jang, 2006). Discussions of testing with partially paired data are found in the literature (Lin and Stivers 1974; Ekbohm, 1976; Bhoj, 1978). Bhoj (1978) initially proposed a test statistic that uses a convex combination of paired and unpaired t statistics. Kim et al. (2005) later proposed the t3 statistic, which is a linear combination of paired and unpaired t statistics, and then used it to detect differentially expressed (DE) genes in colorectal cancer (CRC) cDNA microarray data. In this paper, we extend Kim et al.’s t3 statistic to the Hotelling’s T2 type statistic Tp for detecting DE gene sets of size p. We employ Efron’s empirical null principle to incorporate inter-gene correlation in the estimation of the false discovery rate. Then, the proposed Tp statistic is applied to Kim et al.’s CRC data to detect the DE gene sets of sizes p=2 and p=3. Our results show that for small p, particularly for p=2 and marginally for p=3, the proposed Tp statistic complements the univariate procedure by detecting additional DE genes that were undetected in the univariate test procedure. We also conduct a simulation study to demonstrate that Efron’s empirical null principle is robust to the departure from the normal assumption.


BMC Bioinformatics | 2017

Enhanced construction of gene regulatory networks using hub gene information

Donghyeon Yu; Johan Lim; Xinlei Wang; Faming Liang; Guanghua Xiao

Background: Gene regulatory networks reveal how genes work together to carry out their biological functions. Reconstructions of gene networks from gene expression data greatly facilitate our understanding of underlying biological mechanisms and provide new opportunities for biomarker and drug discoveries. In gene networks, a gene that has many interactions with other genes is called a hub gene, which usually plays an essential role in gene regulation and biological processes. In this study, we developed a method for reconstructing gene networks using a partial correlation-based approach that incorporates prior information about hub genes. Through simulation studies and two real-data examples, we compare the performance in estimating the network structures between the existing methods and the proposed method.

Results: In simulation studies, we show that the proposed strategy reduces errors in estimating network structures compared to the existing methods. When applied to Escherichia coli, the regulation network constructed by our proposed ESPACE method is more consistent with current biological knowledge than the SPACE method. Furthermore, application of the proposed method in lung cancer has identified hub genes whose mRNA expression predicts cancer progress and patient response to treatment.

Conclusions: We have demonstrated that incorporating hub gene information in estimating network structures can improve the performance of the existing methods.

Collaboration


Top Co-Authors

Johan Lim
Seoul National University

Joong-Ho Won
Seoul National University

Guanghua Xiao
University of Texas Southwestern Medical Center

Kyunga Kim
Sookmyung Women's University

Woncheol Jang
Seoul National University

Alvin H. Bachman
Nathan Kline Institute for Psychiatric Research

Sang Han Lee
Nathan Kline Institute for Psychiatric Research

Min-Soo Kim
Daegu Gyeongbuk Institute of Science and Technology