Publication


Featured research published by Donghwan Lee.


BMC Bioinformatics | 2010

Super-sparse principal component analyses for high-throughput genomic data

Donghwan Lee; Woojoo Lee; Youngjo Lee; Yudi Pawitan

Background: Principal component analysis (PCA) has gained popularity as a method for the analysis of high-dimensional genomic data. However, it is often difficult to interpret the results because the principal components are linear combinations of all variables, and the coefficients (loadings) are typically nonzero. These nonzero values also reflect poor estimation of the true vector loadings; for example, for gene expression data, biologically we expect only a portion of the genes to be expressed in any tissue, and an even smaller fraction to be involved in a particular process. Sparse PCA methods have recently been introduced for reducing the number of nonzero coefficients, but these existing methods are not satisfactory for high-dimensional data applications because they still give too many nonzero coefficients.

Results: Here we propose a new PCA method that uses two innovations to produce an extremely sparse loading vector: (i) a random-effect model on the loadings that leads to an unbounded penalty at the origin and (ii) shrinkage of the singular values obtained from the singular value decomposition of the data matrix. We develop a stable computing algorithm by modifying the nonlinear iterative partial least squares (NIPALS) algorithm, and illustrate the method with an analysis of the NCI cancer dataset that contains 21,225 genes.

Conclusions: The new method has better performance than several existing methods, particularly in the estimation of the loading vectors.
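To make the idea of a sparse loading vector concrete, the following is a minimal sketch of a NIPALS-style alternating update for one component. It is not the method of the paper: it swaps in plain soft-thresholding (an L1-type penalty) for the paper's unbounded random-effect penalty, and it omits the singular-value shrinkage step. The function names, the toy matrix `X`, and the threshold `lam` are all hypothetical.

```python
import math


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparse_loading(X, lam, n_iter=100):
    """One sparse loading vector via alternating NIPALS-style updates.

    X is a list of rows with columns already centered. Soft-thresholding
    is an illustrative stand-in, not the penalty used in the paper.
    """
    n, p = len(X), len(X[0])
    v = [1.0 / math.sqrt(p)] * p  # start from a dense unit-length loading
    for _ in range(n_iter):
        # Score step: project the rows onto the current loading, then normalize.
        t = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]
        norm_t = math.sqrt(sum(x * x for x in t))
        if norm_t == 0.0:
            break
        u = [x / norm_t for x in t]
        # Loading step: regress the columns on the scores, then threshold.
        w = [soft_threshold(sum(X[i][j] * u[i] for i in range(n)), lam)
             for j in range(p)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return [0.0] * p  # lam was large enough to remove every variable
        v = [x / norm_w for x in w]
    return v


# Hypothetical centered data: columns 0 and 1 carry signal, 2 and 3 are noise.
X = [
    [2.0, 2.0, 0.1, -0.1],
    [-2.0, -2.0, -0.1, 0.1],
    [1.0, 1.0, -0.1, 0.1],
    [-1.0, -1.0, 0.1, -0.1],
]
loading = sparse_loading(X, lam=0.5)
print(loading)
```

With this toy matrix, the two noise columns receive loadings of exactly zero, while the two signal columns share the weight equally. An ordinary PCA loading would instead assign small nonzero weights to the noise columns.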


Bioinformatics | 2015

Integration of somatic mutation, expression and functional data reveals potential driver genes predictive of breast cancer survival

Chen Suo; Olga Hrydziuszko; Donghwan Lee; Setia Pramana; Dhany Saputra; Himanshu Joshi; Stefano Calza; Yudi Pawitan

Motivation: Genome and transcriptome analyses can be used to explore cancers comprehensively, and it is increasingly common to have multiple omics data measured from each individual. Furthermore, there are rich functional data such as predicted impact of mutations on protein coding and gene/protein networks. However, integration of the complex information across the different omics and functional data is still challenging. Clinical validation, particularly based on patient outcomes such as survival, is important for assessing the relevance of the integrated information and for comparing different procedures.

Results: An analysis pipeline is built for integrating genomic and transcriptomic alterations from whole-exome and RNA sequence data and functional data from protein function prediction and gene interaction networks. The method accumulates evidence for the functional implications of mutated potential driver genes found within and across patients. A driver-gene score (DGscore) is developed to capture the cumulative effect of such genes. To contribute to the score, a gene has to be frequently mutated, have high or moderate mutational impact at the protein level, exhibit extreme expression, and be functionally linked to many differentially expressed neighbors in the functional gene network. The pipeline is applied to 60 matched tumor and normal samples of the same patient from The Cancer Genome Atlas breast-cancer project. In clinical validation, patients with high DGscores have worse survival than those with low scores (P = 0.001). Furthermore, the DGscore outperforms the established expression-based signatures MammaPrint and PAM50 in predicting patient survival. In conclusion, integration of mutation, expression and functional data allows identification of clinically relevant potential driver genes in cancer.

Availability and implementation: The documented pipeline, including annotated sample scripts, can be found at http://fafner.meb.ki.se/biostatwiki/driver-genes/.

Supplementary information: Supplementary data are available at Bioinformatics online.


Statistics in Medicine | 2013

Sparse partial least‐squares regression for high‐throughput survival data analysis

Donghwan Lee; Youngjo Lee; Yudi Pawitan; Woojoo Lee

The partial least squares (PLS) method has been adapted to the Cox proportional hazards model for analyzing high-dimensional survival data. However, because the latent components constructed in PLS employ all predictors regardless of their relevance, it is often difficult to interpret the results. In this paper, we propose a new formulation of the sparse PLS (SPLS) procedure for survival data to allow simultaneous sparse variable selection and dimension reduction. We develop a computing algorithm for SPLS by modifying an iteratively reweighted PLS algorithm and illustrate the method with the Swedish and the Netherlands Cancer Institute breast cancer datasets. Through numerical studies, we find that our SPLS method generally performs better than the standard PLS and sparse Cox regression methods in variable selection and prediction.


Team Performance Management | 2011

Collective intelligence ratio

Paul Kim; Donghwan Lee; Youngjo Lee; Chuan Huang; Tamas Makany

Purpose: With a team interaction analysis model, the authors sought to identify a varying range of individual and collective intellectual behaviors in a series of communicative intents particularly expressed with multimodal interaction methods. In this paper, the authors aim to present a new construct (i.e. collective intelligence ratio (CIR)) which refers to a numeric indicator representing the degree of intelligence of a team in which each team member demonstrates an individual intelligence ratio (IR) specific to a team goal.

Design/methodology/approach: The authors analyzed multimodal team interaction data linked to communicative intents with a Poisson‐hierarchical generalized linear model (HGLM).

Findings: The study found evidence of a distinctive IR for each team member in selecting a communicative method for a certain task, ultimately leading to varying degrees of team CIR.

Research limitations/implications: The authors limited the type and nature of human intelligence observed with a very short li...


Oncotarget | 2016

Comprehensive landscape of subtype-specific coding and non-coding RNA transcripts in breast cancer

Trung Nghia Vu; Setia Pramana; Stefano Calza; Chen Suo; Donghwan Lee; Yudi Pawitan

Molecular classification of breast cancer into clinically relevant subtypes helps improve prognosis and adjuvant-treatment decisions. The aim of this study is to provide a better characterization of the molecular subtypes by providing a comprehensive landscape of subtype-specific isoforms including coding, long non-coding RNA and microRNA transcripts. Isoform-level expression of all coding and non-coding RNAs is estimated from RNA-sequence data of 1168 breast samples obtained from The Cancer Genome Atlas (TCGA) project. We then search the whole transcriptome systematically for subtype-specific isoforms using a novel algorithm based on a robust quasi-Poisson model. We discover 5451 isoforms specific to single subtypes. A total of 27% of the subtype-specific isoforms have better accuracy in classifying the intrinsic subtypes than that of their corresponding genes. We find three subtype-specific miRNA and 707 subtype-specific long non-coding RNAs. The isoforms from long non-coding RNAs also show high performance for separation between Luminal A and Luminal B subtypes with an AUC of 0.97 in the discovery set and 0.90 in the validation set. In addition, we discover 1500 isoforms preferentially co-expressed in two subtypes, including 369 isoforms co-expressed in both Normal-like and Basal subtypes, which are commonly considered to have distinct ER-receptor status. Finally, analyses at protein level reveal four subtype-specific proteins and two subtype co-expression proteins that successfully validate results from the isoform level.


Journal of Multivariate Analysis | 2016

Extended likelihood approach to multiple testing with directional error control under a hidden Markov random field model

Donghwan Lee; Youngjo Lee

Current multiple testing procedures are often based on assumptions of independence of observations. However, the observations in genomics and neuroimaging are correlated and ignoring such a correlation can severely distort the conclusions of a test. Moreover, most tests investigate two-sided alternatives only as a two-action problem and do not worry about directional errors. Misspecifications in signs of effects should not be regarded as power. In this study, we derive an optimal multiple testing procedure to incorporate dependence among tests, controlling directional false discovery rates. Real data examples for gene expression and neuroimaging using hidden Markov random field models show that an appropriate model is crucial for the efficiency of tests. Proper modeling of the correlation structure and model selection tools in the likelihood approach enhance the performance of a test. Reporting the estimates of various error rates is useful for the test’s validity.


BMC Medical Research Methodology | 2015

Optimal likelihood-ratio multiple testing with application to Alzheimer’s disease and questionable dementia

Donghwan Lee; Hyejin Kang; Eun-Kyung Kim; Hyekyoung Lee; Heejung Kim; Yu Kyeong Kim; Youngjo Lee; Dong Soo Lee

Background: Controlling the false discovery rate is important when testing multiple hypotheses. To enhance the detection capability of a false discovery rate control test, we applied the likelihood ratio-based multiple testing method to neuroimaging data and compared its performance with that of existing methods.

Methods: We analysed the performance of the likelihood ratio-based false discovery rate method using simulation data generated under an independence assumption, and positron emission tomography data of Alzheimer’s disease and questionable dementia. We investigated how well the method detects extensive hypometabolic regions and compared the results to those of the conventional Benjamini-Hochberg false discovery rate method.

Results: Our findings show that the likelihood ratio-based false discovery rate method can control the false discovery rate, giving the smallest false non-discovery rate (for a one-sided test) or the smallest expected number of false assignments (for a two-sided test). Even though we assumed independence among voxels, the likelihood ratio-based false discovery rate method detected more extensive hypometabolic regions in 22 patients with Alzheimer’s disease, as compared to the 44 normal controls, than did the Benjamini-Hochberg false discovery rate method. The contingency and distribution patterns were consistent with those of previous studies. In 24 questionable dementia patients, the proposed likelihood ratio-based false discovery rate method was able to detect hypometabolism in the medial temporal region.

Conclusions: This study showed that the proposed likelihood ratio-based false discovery rate method efficiently identifies extensive hypometabolic regions owing to its increased detection capability and ability to control the false discovery rate.
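For readers unfamiliar with the conventional baseline named in this abstract, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. It illustrates only the comparison method, not the likelihood ratio-based method the paper proposes. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: find the largest rank k with p_(k) <= k * alpha / m,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for example one per voxel or per gene.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

Because the rule is step-up, a p-value that fails its own rank threshold is still rejected if a larger p-value passes a later threshold. This is why the loop records the largest passing rank instead of stopping at the first failure.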


International Journal of Environmental Research and Public Health | 2018

Effects of Internet and Smartphone Addictions on Depression and Anxiety Based on Propensity Score Matching Analysis

Yeon-Jin Kim; Hye Jang; Youngjo Lee; Donghwan Lee; Dai-Jin Kim

The associations of Internet addiction (IA) and smartphone addiction (SA) with mental health problems have been widely studied. We investigated the effects of IA and SA on depression and anxiety while adjusting for sociodemographic variables. In this study, 4854 participants completed a cross-sectional web-based survey including sociodemographic items, the Korean Scale for Internet Addiction, the Smartphone Addiction Proneness Scale, and the subscales of the Symptom Checklist 90 Items-Revised. The participants were classified into IA, SA, and normal use (NU) groups. To reduce sampling bias, we applied the propensity score matching method based on genetic matching. The IA group showed an increased risk of depression (relative risk 1.207; p < 0.001) and anxiety (relative risk 1.264; p < 0.001) compared to the NU group. The SA group also showed an increased risk of depression (relative risk 1.337; p < 0.001) and anxiety (relative risk 1.402; p < 0.001) compared to the NU group. These findings show that both IA and SA exerted significant effects on depression and anxiety. Moreover, our findings showed that SA has a stronger relationship with depression and anxiety than IA does, and emphasized the need for prevention and management policies for excessive smartphone use.
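The relative risks quoted above are ratios of outcome proportions between an addiction group and the normal use group. The sketch below shows that arithmetic only. It assumes the groups have already been matched and does not implement propensity score matching. The function name and the counts are hypothetical and are not taken from the study.

```python
def relative_risk(exposed_cases, exposed_total, control_cases, control_total):
    """Ratio of the outcome risk in the exposed group to that in the control group."""
    risk_exposed = exposed_cases / exposed_total
    risk_control = control_cases / control_total
    return risk_exposed / risk_control


# Hypothetical matched sample: 30 of 100 exposed participants and
# 20 of 100 control participants report the outcome.
rr = relative_risk(exposed_cases=30, exposed_total=100,
                   control_cases=20, control_total=100)
print(round(rr, 3))
```

A value above 1 means the outcome is more common in the exposed group. A relative risk of 1.337, as reported for SA and depression, corresponds to a risk roughly one third higher than in the comparison group.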


Twin Research and Human Genetics | 2016

One CNV Discordance in NRXN1 Observed Upon Genome-wide Screening in 38 Pairs of Adult Healthy Monozygotic Twins

Patrik K. E. Magnusson; Donghwan Lee; Xu Chen; Jin P. Szatkiewicz; Setia Pramana; Shu Mei Teo; Patrick F. Sullivan; Lars Feuk; Yudi Pawitan

Monozygotic (MZ) twins stem from the same single fertilized egg and therefore share all their inherited genetic variation. This is one of the unequivocal facts on which genetic epidemiology and twin studies are based. To what extent this also implies that MZ twins share genotypes in adult tissues is not precisely established, but a common pragmatic assumption is that MZ twins are 100% genetically identical also in adult tissues. During the past decade, this view has been challenged by several reports, with observations of differences in post-zygotic copy number variations (CNVs) between members of the same MZ pair. In this study, we performed a systematic search for differences of CNVs within 38 adult MZ pairs who had been misclassified as dizygotic (DZ) twins by questionnaire-based assessment. Initial scoring by PennCNV suggested a total of 967 CNV discordances. The within-pair correlation in number of CNVs detected was strongly dependent on confidence score filtering and reached a plateau of r = 0.8 when restricting to CNVs detected with confidence score larger than 50. The top-ranked discordances were subsequently selected for validation by quantitative polymerase chain reaction (qPCR), from which one single ~120kb deletion in NRXN1 on chromosome 2 (bp 51017111-51136802) was validated. Despite involving an exon, no sign of cognitive/mental consequences was apparent in the affected twin pair, potentially reflecting limited or lack of expression of the transcripts containing this exon in nerve/brain.


Statistics in Medicine | 2016

Nonparametric estimation of the rediscovery rate.

Donghwan Lee; Andrea Ganna; Yudi Pawitan; Woojoo Lee

Validation studies have been used to increase the reliability of the statistical conclusions for scientific discoveries; such studies improve the reproducibility of the findings and reduce the possibility of false positives. Here, one of the important roles of statistics is to quantify reproducibility rigorously. Two concepts were recently defined for this purpose: (i) rediscovery rate (RDR), which is the expected proportion of statistically significant findings in a study that can be replicated in the validation study and (ii) false discovery rate in the validation study (vFDR). In this paper, we aim to develop a nonparametric approach to estimate the RDR and vFDR and show an explicit link between the RDR and the FDR. Among other things, the link explains why reproducing statistically significant results even with low FDR level may be difficult. Two metabolomics datasets are considered to illustrate the application of the RDR and vFDR concepts in high-throughput data analysis.
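The definition of the RDR given above can be illustrated with a simple observed proportion when p-values from both studies are available. This is a naive plug-in sketch, not the nonparametric estimator developed in the paper. The function name, the significance thresholds, and the p-values are hypothetical.

```python
def empirical_rediscovery_rate(p_discovery, p_validation,
                               alpha_discovery=0.05, alpha_validation=0.05):
    """Share of discovery-study findings that are also significant on validation.

    Returns None when the discovery study has no significant findings.
    """
    # Features declared significant in the discovery study.
    significant = [i for i, p in enumerate(p_discovery) if p <= alpha_discovery]
    if not significant:
        return None
    # Of those, count the ones that are significant again in the validation study.
    replicated = sum(1 for i in significant
                     if p_validation[i] <= alpha_validation)
    return replicated / len(significant)


# Hypothetical p-values for five features measured in both studies.
p_discovery = [0.001, 0.02, 0.04, 0.30, 0.75]
p_validation = [0.003, 0.20, 0.01, 0.04, 0.60]
rdr = empirical_rediscovery_rate(p_discovery, p_validation)
print(rdr)
```

In this toy example, three features are significant in the discovery study and two of them replicate. The fourth feature is significant only in the validation study and therefore does not enter the proportion.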

Collaboration


Dive into Donghwan Lee's collaborations.

Top Co-Authors

Youngjo Lee
Seoul National University

Hee-Seok Oh
Seoul National University

Jaeyong Lee
Seoul National University

Chen Suo
National University of Singapore

Bohyeon Kim
Pukyong National University