Sunghoon Kwon | Researchain

Publication

Featured research published by Sunghoon Kwon.

Computational Statistics & Data Analysis | 2016

The use of random-effect models for high-dimensional variable selection problems

Sunghoon Kwon; Seung-Young Oh; Youngjo Lee

We study the use of random-effect models for variable selection in high-dimensional generalized linear models where the number of covariates exceeds the sample size. Certain distributional assumptions on the random effects produce a penalty that is non-convex and unbounded at the origin. We introduce a unified algorithm that can be applied to various statistical models including generalized linear models. Simulation studies and data analysis are provided.

Computational Statistics & Data Analysis | 2016

A modified local quadratic approximation algorithm for penalized optimization problems

Sangin Lee; Sunghoon Kwon; Yongdai Kim

In this paper, we propose an optimization algorithm called the modified local quadratic approximation algorithm for minimizing various ℓ1-penalized convex loss functions. The proposed algorithm iteratively solves ℓ1-penalized local quadratic approximations of the loss function, and then modifies the solution whenever it fails to decrease the original ℓ1-penalized loss function. As an extension, we construct an algorithm for minimizing various nonconvex penalized convex loss functions by combining the proposed algorithm and the convex-concave procedure, which can be applied to most nonconvex penalty functions such as the smoothly clipped absolute deviation and minimax concave penalty functions. Numerical studies show that the algorithm is stable and fast for solving high-dimensional penalized optimization problems.
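The iteration described in this abstract can be illustrated with a minimal sketch for ℓ1-penalized logistic regression. This is not the authors' implementation: the function names, the coordinate-descent inner solver, the small ridge added to the Hessian, and in particular the step-halving used as the "modification" step are all assumptions made for illustration, since the abstract does not specify the modification rule.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by l1-penalized coordinate updates."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(X, y, beta, lam):
    """Mean logistic negative log-likelihood (y in {0, 1}) plus l1 penalty."""
    eta = X @ beta
    return np.mean(np.logaddexp(0.0, eta) - y * eta) + lam * np.sum(np.abs(beta))

def lasso_quadratic(H, g, beta0, lam, n_sweeps=50):
    """Coordinate descent on the l1-penalized quadratic model
    g'(b - beta0) + 0.5 (b - beta0)' H (b - beta0) + lam * ||b||_1."""
    b = beta0.copy()
    for _ in range(n_sweeps):
        for j in range(len(b)):
            # Linear coefficient of b[j] in the smooth part, other coordinates fixed.
            r = g[j] + H[j] @ (b - beta0) - H[j, j] * b[j]
            b[j] = soft_threshold(-r, lam) / H[j, j]
    return b

def modified_lqa_sketch(X, y, lam, n_iter=50, tol=1e-8):
    """Hypothetical sketch: solve local quadratic approximations and fall back
    to step-halving (an assumed rule) when the true objective does not decrease."""
    n, p = X.shape
    beta = np.zeros(p)
    history = [objective(X, y, beta, lam)]
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        g = X.T @ (prob - y) / n                      # gradient of the loss
        w = prob * (1.0 - prob)
        H = (X.T * w) @ X / n + 1e-8 * np.eye(p)      # Hessian plus tiny ridge
        cand = lasso_quadratic(H, g, beta, lam)
        # Modification step (assumption): shrink the move toward the current
        # iterate until the original penalized loss no longer increases.
        step, new = 1.0, cand
        while objective(X, y, new, lam) > history[-1] and step > 1e-10:
            step *= 0.5
            new = beta + step * (cand - beta)
        if objective(X, y, new, lam) > history[-1]:
            new = beta                                # give up on this move
        history.append(objective(X, y, new, lam))
        beta = new
        if history[-2] - history[-1] < tol:
            break
    return beta, history
```

Calling `modified_lqa_sketch(X, y, lam)` on a design matrix `X` and a 0/1 response `y` returns the coefficient estimate together with the sequence of penalized loss values, which is non-increasing by construction.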

Computational Statistics & Data Analysis | 2015

Moderately clipped LASSO

Sunghoon Kwon; Sangin Lee; Yongdai Kim

The least absolute shrinkage and selection operator (LASSO) has been widely used in high-dimensional linear regression models. However, it is known that the LASSO selects too many noisy variables. In this paper, we propose a new estimator, the moderately clipped LASSO (MCL), that deletes noisy variables successively without sacrificing much prediction accuracy. Various numerical studies illustrate the superiority of the MCL over its competitors.

Genetics | 2017

Improving Disease Prediction by Incorporating Family Disease History in Risk Prediction Models with Large-Scale Genetic Data

Jungsoo Gim; Wonji Kim; Soo Heon Kwak; Hosik Choi; Changyi Park; Kyong Soo Park; Sunghoon Kwon; Taesung Park; Sungho Won

Despite the many successes of genome-wide association studies (GWAS), the known susceptibility variants identified by GWAS have modest effect sizes, leading to notable skepticism about the effectiveness of building a risk prediction model from large-scale genetic data. However, in contrast to genetic variants, the family history of diseases has been largely accepted as an important risk factor in clinical diagnosis and risk prediction. Nevertheless, the complicated structures of the family history of diseases have limited their application in clinical practice. Here, we develop a new method that enables incorporation of the general family history of diseases with a liability threshold model, and propose a new analysis strategy for risk prediction with penalized regression analysis that incorporates both large numbers of genetic variants and clinical risk factors. Application of our model to type 2 diabetes in the Korean population (1846 cases and 1846 controls) demonstrated that single-nucleotide polymorphisms accounted for 32.5% of the variation explained by the predicted risk scores in the test data set, and incorporation of family history led to an additional 6.3% improvement in prediction. Our results illustrate that family medical history provides valuable information on the variation of complex diseases and improves prediction performance.
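For readers unfamiliar with the liability threshold model mentioned in this abstract, the basic idea can be sketched as follows: an individual is affected when a latent, standard-normal liability exceeds a threshold fixed by the disease prevalence. The function names are hypothetical, and the sketch covers only this textbook building block, not the paper's method for incorporating family history.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # latent liability assumed to be N(0, 1)

def liability_threshold(prevalence):
    """Threshold t such that P(liability > t) equals the prevalence (0 < prevalence < 1)."""
    return STANDARD_NORMAL.inv_cdf(1.0 - prevalence)

def mean_liability_of_affected(prevalence):
    """Expected liability given that it exceeds the threshold: pdf(t) / prevalence."""
    t = liability_threshold(prevalence)
    return STANDARD_NORMAL.pdf(t) / prevalence
```

A rarer disease gives a higher threshold, so affected individuals (and, by extension, their relatives) carry more information about elevated liability.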

Computational Statistics & Data Analysis | 2017

Homogeneity detection for the high-dimensional generalized linear model

Jong-June Jeon; Sunghoon Kwon; Hosik Choi

We propose to use a penalized estimator for detecting homogeneity of the high-dimensional generalized linear model. Here, homogeneity is a specific model structure in which the regression coefficients are grouped so that those in each group have exactly the same value. The proposed estimator achieves the weak oracle property under mild regularity conditions and is invariant to the choice of reference levels when there are categorical covariates in the model. An efficient algorithm is also provided. Various numerical studies confirm that the proposed penalized estimator gives better performance than other conventional variable selection estimators when the model has homogeneity.

Communications in Statistics - Simulation and Computation | 2017

A robust support vector machine for labeling errors

Hosik Choi; Yongdai Kim; Sunghoon Kwon; Changyi Park

The support vector machine (SVM) is sparse in that its classifier is expressed as a linear combination of only a few support vectors (SVs). Whenever an outlier is included as an SV in the classifier, the outlier may have a serious impact on the estimated decision function. In this article, we propose a robust loss function that is convex. Our learning algorithm is more robust to outliers than the SVM. Moreover, the convexity of our loss function permits an efficient solution path algorithm. Through simulated and real data analysis, we illustrate that our method can be useful in the presence of labeling errors.

BMC Genetics | 2017

Network analysis for count data with excess zeros

Hosik Choi; Jungsoo Gim; Sungho Won; You Jin Kim; Sunghoon Kwon; Changyi Park

Background: Undirected graphical models or Markov random fields have been a popular class of models for representing conditional dependence relationships between nodes. In particular, Markov networks help us to understand complex interactions between genes in biological processes of a cell. Local Poisson models seem to be promising in modeling positive as well as negative dependencies for count data. Furthermore, when zero counts are more frequent than expected, excess zeros should be considered in the model. Methods: We present a penalized Poisson graphical model for zero-inflated count data and derive an expectation-maximization (EM) algorithm built on coordinate descent. Our method is shown to be effective through simulated and real data analysis. Results: Results from the simulated data indicate that our method outperforms the local Poisson graphical model in the presence of excess zeros. In an application to RNA sequencing data, we also investigate the gender effect by comparing the estimated networks according to different genders. Our method may help us in identifying biological pathways linked to sex hormone regulation and thus understanding underlying mechanisms of the gender differences. Conclusions: We have presented a penalized version of zero-inflated spatial Poisson regression and derived an efficient EM algorithm built on coordinate descent. We discuss possible improvements of our method as well as potential research directions associated with our findings from the RNA sequencing data.
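To make the role of EM with excess zeros concrete, here is a minimal sketch for a single zero-inflated Poisson variable. It is a deliberately simplified illustration, not the paper's algorithm: there is no graph, no penalty, and no coordinate descent, and the function name, starting values, and stopping rule are assumptions.

```python
import numpy as np

def zip_em(counts, n_iter=200, tol=1e-10):
    """EM for a univariate zero-inflated Poisson model.

    Returns (pi, lam): the estimated probability of a structural zero and the
    Poisson mean of the count component.
    """
    y = np.asarray(counts, dtype=float)
    is_zero = (y == 0)
    pi, lam = 0.5, max(y.mean(), 1e-8)   # assumed starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each observed zero is structural;
        # positive counts can only come from the Poisson component.
        z = np.where(is_zero, pi / (pi + (1.0 - pi) * np.exp(-lam)), 0.0)
        # M-step: closed-form updates given the expected memberships.
        new_pi = z.mean()
        new_lam = y.sum() / (1.0 - z).sum()
        done = abs(new_pi - pi) + abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if done:
            break
    return pi, lam
```

In a penalized graphical setting, the M-step would presumably be replaced by a penalized regression of each node on the others, which is where a coordinate-descent solver would come in; the E-step logic for separating structural zeros from Poisson zeros stays conceptually the same.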

Korean Journal of Applied Statistics | 2016

Residual-based copula parameter estimation

Okyoung Na; Sunghoon Kwon

We consider the estimation of copula parameters based on residuals in stochastic regression models. We prove that a semiparametric estimator using residual empirical distributions is consistent under some conditions and apply the results to the copula-ARMA model. We provide simulation results for illustration.

Journal of Machine Learning Research | 2012