Publication


Featured research published by Woojoo Lee.


BMC Bioinformatics | 2012

Network enrichment analysis: extension of gene-set enrichment analysis to gene networks

Andrey Alexeyenko; Woojoo Lee; Maria Pernemalm; Justin Guegan; Philippe Dessen; Vladimir Lazar; Janne Lehtiö; Yudi Pawitan

Background: Gene-set enrichment analyses (GEA or GSEA) are commonly used for biological characterization of an experimental gene-set. This is done by finding known functional categories, such as pathways or Gene Ontology terms, that are over-represented in the experimental set; the assessment is based on an overlap statistic. Rich biological information in terms of gene interaction network is now widely available, but this topological information is not used by GEA, so there is a need for methods that exploit this type of information in high-throughput data analysis. Results: We developed a method of network enrichment analysis (NEA) that extends the overlap statistic in GEA to network links between genes in the experimental set and those in the functional categories. For the crucial step in statistical inference, we developed a fast network randomization algorithm in order to obtain the distribution of any network statistic under the null hypothesis of no association between an experimental gene-set and a functional category. We illustrate the NEA method using gene and protein expression data from a lung cancer study. Conclusions: The results indicate that the NEA method is more powerful than the traditional GEA, primarily because the relationships between gene sets were more strongly captured by network connectivity rather than by simple overlaps.
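The idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the link-count statistic, the double-edge-swap randomization (a generic degree-preserving scheme standing in for the paper's fast algorithm), and the names `count_links`, `rewire` and `nea_test` are all assumptions made for illustration. The input network is assumed to be a simple undirected graph with no self-loops or duplicate edges.

```python
import random


def count_links(edges, set_a, set_b):
    """Count network edges that join a gene in set_a to a gene in set_b."""
    return sum(
        1
        for u, v in edges
        if (u in set_a and v in set_b) or (u in set_b and v in set_a)
    )


def rewire(edges, n_swaps, rng):
    """Randomize a network with double-edge swaps, keeping every node degree."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Proposed new edges are a-d and c-b; reject self-loops and duplicates.
        if a == d or c == b:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def nea_test(edges, gene_set, category, n_random=200, seed=0):
    """Observed link count, its null distribution, and an empirical p-value."""
    rng = random.Random(seed)
    observed = count_links(edges, gene_set, category)
    null = [
        count_links(rewire(edges, 10 * len(edges), rng), gene_set, category)
        for _ in range(n_random)
    ]
    # One-sided empirical p-value with the usual +1 correction.
    p_value = (1 + sum(x >= observed for x in null)) / (1 + n_random)
    return observed, null, p_value


# Hypothetical toy network: g* are experimental genes, p* are pathway genes.
toy_edges = [("g1", "p1"), ("g1", "p2"), ("g2", "p1"),
             ("x", "y"), ("y", "z"), ("g2", "x")]
observed, null, p_value = nea_test(toy_edges, {"g1", "g2"}, {"p1", "p2"})
```

In this toy network the two gene sets share no genes, so an overlap statistic would be zero, while the link count still registers their connectivity.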


BMC Bioinformatics | 2010

Super-sparse principal component analyses for high-throughput genomic data

Donghwan Lee; Woojoo Lee; Youngjo Lee; Yudi Pawitan

Background: Principal component analysis (PCA) has gained popularity as a method for the analysis of high-dimensional genomic data. However, it is often difficult to interpret the results because the principal components are linear combinations of all variables, and the coefficients (loadings) are typically nonzero. These nonzero values also reflect poor estimation of the true vector loadings; for example, for gene expression data, biologically we expect only a portion of the genes to be expressed in any tissue, and an even smaller fraction to be involved in a particular process. Sparse PCA methods have recently been introduced for reducing the number of nonzero coefficients, but these existing methods are not satisfactory for high-dimensional data applications because they still give too many nonzero coefficients. Results: Here we propose a new PCA method that uses two innovations to produce an extremely sparse loading vector: (i) a random-effect model on the loadings that leads to an unbounded penalty at the origin and (ii) shrinkage of the singular values obtained from the singular value decomposition of the data matrix. We develop a stable computing algorithm by modifying the nonlinear iterative partial least squares (NIPALS) algorithm, and illustrate the method with an analysis of the NCI cancer dataset that contains 21,225 genes. Conclusions: The new method has better performance than several existing methods, particularly in the estimation of the loading vectors.


Statistical Methods in Medical Research | 2017

Does McNemar’s test compare the sensitivities and specificities of two diagnostic tests?

Soeun Kim; Woojoo Lee

McNemar’s test is often used in practice to compare the sensitivities and specificities for the evaluation of two diagnostic tests. For correct evaluation of accuracy, an intuitive recommendation is to test the diseased and the non-diseased groups separately so that the sensitivities can be compared among the diseased, and specificities can be compared among the healthy group of people. This paper provides a rigorous theoretical framework for this argument and studies the validity of McNemar’s test regardless of the conditional independence assumption. We derive McNemar’s test statistic under the null hypothesis considering both assumptions of conditional independence and conditional dependence. We then perform power analyses to show how the result is affected by the amount of conditional dependence under the alternative hypothesis.
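The recommendation to test the diseased and non-diseased groups separately can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the paper: it uses the standard McNemar chi-square statistic without continuity correction, results are coded as 1 (positive) and 0 (negative), and the function names and toy data are hypothetical.

```python
import math


def mcnemar(b, c):
    """McNemar chi-square statistic (no continuity correction) and p-value.

    b and c are the two discordant counts of a paired 2x2 table.
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of a difference
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def discordant_counts(test1, test2):
    """Count pairs in which exactly one of the two tests is positive."""
    b = sum(1 for x, y in zip(test1, test2) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(test1, test2) if x == 0 and y == 1)
    return b, c


def compare_by_disease_status(disease, test1, test2):
    """Apply McNemar's test separately to diseased and non-diseased subjects."""
    results = {}
    for label, status in (("sensitivity", 1), ("specificity", 0)):
        t1 = [r for d, r in zip(disease, test1) if d == status]
        t2 = [r for d, r in zip(disease, test2) if d == status]
        results[label] = mcnemar(*discordant_counts(t1, t2))
    return results


# Hypothetical toy data: 4 diseased subjects followed by 4 non-diseased ones.
disease = [1, 1, 1, 1, 0, 0, 0, 0]
test1 = [1, 1, 1, 0, 0, 0, 1, 0]
test2 = [1, 0, 0, 0, 0, 0, 0, 1]
results = compare_by_disease_status(disease, test1, test2)
```

Splitting by disease status means the discordant pairs among the diseased bear only on sensitivity, and those among the non-diseased only on specificity.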


Statistics and Computing | 2012

Modifications of REML algorithm for HGLMs

Woojoo Lee; Youngjo Lee

Hierarchical generalized linear models (HGLMs) have become popular in data analysis. However, their maximum likelihood (ML) and restricted maximum likelihood (REML) estimators are often difficult to compute, especially when the random effects are correlated; this is because obtaining the likelihood function involves high-dimensional integration. Recently, an h-likelihood method that does not involve numerical integration has been proposed. In this study, we show how an h-likelihood method can be implemented by modifying the existing ML and REML procedures. A small simulation study is carried out to investigate the performances of the proposed methods for HGLMs with correlated random effects.


Metabolomics | 2016

Large-scale non-targeted metabolomic profiling in three human population-based studies

Andrea Ganna; Tove Fall; Samira Salihovic; Woojoo Lee; Corey D. Broeckling; Jitender Kumar; Sara Hägg; Markus Stenemo; Patrik K. E. Magnusson; Jessica E. Prenni; Lars Lind; Yudi Pawitan; Erik Ingelsson

Non-targeted metabolomic profiling is used to simultaneously assess a large part of the metabolome in a biological sample. Here, we describe both the analytical and computational methods used to analyze a large UPLC–Q-TOF MS-based metabolomic profiling effort using plasma and serum samples from participants in three Swedish population-based studies of middle-aged and older human subjects: TwinGene, ULSAM and PIVUS. At present, more than 200 metabolites have been manually annotated in more than 3600 participants using an in-house library of standards and publicly available spectral databases. Data available at the MetaboLights repository include individual raw unprocessed data, processed data, basic demographic variables and spectra of annotated metabolites. Additional phenotypical and genetic data are available upon request to cohort steering committees. These studies represent a unique resource to explore and evaluate how metabolic variability across individuals affects human diseases.


Statistics in Medicine | 2013

Sparse partial least‐squares regression for high‐throughput survival data analysis

Donghwan Lee; Youngjo Lee; Yudi Pawitan; Woojoo Lee

The partial least-squares (PLS) method has been adapted to Cox's proportional hazards model for analyzing high-dimensional survival data. But because the latent components constructed in PLS employ all predictors regardless of their relevance, it is often difficult to interpret the results. In this paper, we propose a new formulation of the sparse PLS (SPLS) procedure for survival data to allow simultaneous sparse variable selection and dimension reduction. We develop a computing algorithm for SPLS by modifying an iteratively reweighted PLS algorithm and illustrate the method with the Swedish and the Netherlands Cancer Institute breast cancer datasets. Through the numerical studies, we find that our SPLS method generally performs better than the standard PLS and sparse Cox regression methods in variable selection and prediction.


Computational Statistics & Data Analysis | 2011

The hierarchical-likelihood approach to autoregressive stochastic volatility models

Woojoo Lee; Johan Lim; Youngjo Lee; Joan del Castillo

Many volatility models used in financial research belong to a class of hierarchical generalized linear models with random effects in the dispersion. Therefore, the hierarchical-likelihood (h-likelihood) approach can be used. However, the dimension of the Hessian matrix is often large, so techniques of sparse matrix computation are useful to speed up the procedure of computing the inverse matrix. Using numerical studies we show that the h-likelihood approach gives better long-term prediction for volatility than the existing MCMC method, while the MCMC method gives better short-term prediction. We show that the h-likelihood approach gives comparable estimations of fixed parameters to those of existing methods.


European Journal of Epidemiology | 2014

Bounds on sufficient-cause interaction

Arvid Sjölander; Woojoo Lee; Henrik Källberg; Yudi Pawitan

A common goal of epidemiologic research is to study how two exposures interact in causing a binary outcome. Sufficient-cause interaction is a special type of mechanistic interaction, which requires that two events (e.g. specific exposure levels from two risk factors) are necessary in order for the outcome to occur. Recently, tests have been derived to establish the presence of sufficient-cause interactions, for categorical exposures with at most three levels. In this paper we derive prevalence bounds, i.e. lower and upper bounds on the prevalence of subjects for which sufficient-cause interaction is present. The derived bounds hold for categorical exposures with arbitrarily many levels. We apply the bounds to data from a study of gene–gene interaction in the development of rheumatoid arthritis. We provide an R program to estimate the bounds from real data.


Genetic Epidemiology | 2016

A Critical Look at Entropy‐Based Gene‐Gene Interaction Measures

Woojoo Lee; Arvid Sjölander; Yudi Pawitan

Several entropy‐based measures for detecting gene‐gene interaction have been proposed recently. It has been argued that the entropy‐based measures are preferred because entropy can better capture the nonlinear relationships between genotypes and traits, so they can be useful to detect gene‐gene interactions for complex diseases. These suggested measures look reasonable at an intuitive level, but so far there has been no detailed characterization of the interactions captured by them. Here we study analytically, and in detail, the properties of some entropy‐based measures for detecting gene‐gene interactions. The relationship between interactions captured by the entropy‐based measures and those of logistic regression models is clarified. In general we find that the entropy‐based measures can suffer from a lack of specificity in terms of target parameters, i.e., they can detect uninteresting signals as interactions. Numerical studies are carried out to confirm the theoretical findings.


bioRxiv | 2014

A workflow for UPLC-MS non-targeted metabolomic profiling in large human population-based studies

Andrea Ganna; Tove Fall; Woojoo Lee; Corey D. Broeckling; Jitender Kumar; Sara Hägg; Patrik K. E. Magnusson; Jessica E. Prenni; Lars Lind; Yudi Pawitan; Erik Ingelsson

Non-targeted metabolomic profiling is used to simultaneously assess a large part of the metabolome in a biological sample. Here, we describe both the analytical and computational methods used to analyze a large UPLC-Q-TOF MS-based metabolomic profiling effort using plasma and serum samples from participants in three Swedish population-based studies of middle-aged and older human subjects: TwinGene, ULSAM and PIVUS. At present, more than 200 metabolites have been manually annotated in more than 3,600 participants using an in-house library of standards and publicly available spectral databases. Data available at the MetaboLights repository include individual raw unprocessed data, processed data, basic demographic variables and spectra of annotated metabolites. Additional phenotypical and genetic data are available upon request to cohort steering committees. These studies represent a unique resource to explore and evaluate how metabolic variability across individuals affects human diseases.

Collaboration


Dive into Woojoo Lee's collaborations.

Top Co-Authors

Youngjo Lee

Seoul National University

Seung-Sik Hwang

Seoul National University

Sungho Won

Seoul National University
