Publication


Featured research published by Jinseog Kim.


International Conference on Machine Learning | 2004

Gradient LASSO for feature selection

Yongdai Kim; Jinseog Kim

LASSO (Least Absolute Shrinkage and Selection Operator) is a useful tool for achieving shrinkage and variable selection simultaneously. Because LASSO uses the L1 penalty, its optimization has typically relied on quadratic programming (QP) or general nonlinear programming, which is computationally intensive. In this paper, we propose a gradient descent algorithm for LASSO. Although the final result is slightly less accurate, the proposed algorithm is computationally simpler than QP or nonlinear programming and can therefore be applied to large-scale problems. We provide the convergence rate of the algorithm and illustrate it on simulated models as well as real data sets.
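
The paper's own gradient LASSO algorithm is not reproduced here; as a minimal sketch of the general idea of solving LASSO with first-order gradient steps instead of QP, the following Python snippet implements a related method, proximal gradient descent (ISTA) with soft thresholding. The function names and toy data are illustrative only.

```python
import numpy as np

def soft_threshold(z, t):
    # Elementwise soft thresholding: the proximal operator of the L1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """First-order solver for min_b 0.5*||y - X b||^2 + lam*||b||_1."""
    beta = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1/L, L = largest eigenvalue of X'X
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)          # gradient of the smooth part
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
print(lasso_ista(X, y, lam=5.0).round(2))    # sparse estimate; noise coefficients go to 0
```

No matrix inversion appears anywhere, which is the property such gradient methods exploit to scale to large problems.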


Bioinformatics | 2009

Gradient lasso for Cox proportional hazards model

Insuk Sohn; Jinseog Kim; Sin-Ho Jung; Changyi Park

MOTIVATION: There has been increasing interest in expressing a survival phenotype (e.g. time to cancer recurrence or death) or its distribution in terms of the expression data of a subset of genes. Due to the high dimensionality of gene expression data, however, there is a serious collinearity problem in fitting a prediction model such as Cox's proportional hazards model. To avoid the collinearity problem, several methods based on penalized Cox proportional hazards models have been proposed. However, those methods suffer from severe computational problems, such as slow or even failed convergence, because of the high-dimensional matrix inversions required for model fitting. We propose to implement penalized Cox regression with a lasso penalty via the gradient lasso algorithm, which converges to the global optimum faster than other algorithms do. Moreover, the gradient lasso algorithm is guaranteed to converge to the optimum under mild regularity conditions. Hence, our gradient lasso algorithm can be a useful tool for developing prediction models based on high-dimensional covariates, including gene expression data.

RESULTS: Results from simulation studies show that the prediction model fitted by gradient lasso recovers the prognostic genes. Results from diffuse large B-cell lymphoma datasets and the Norway/Stanford breast cancer dataset also indicate that our method is very competitive with the popular existing methods of Park and Hastie and of Goeman in computational time, prediction, and selectivity.

AVAILABILITY: The R package glcoxph is available at http://datamining.dongguk.ac.kr/R/glcoxph.
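
glcoxph itself is an R package; as a rough Python analogue (not the authors' algorithm), the snippet below fits an L1-penalized Cox model with the lifelines library on one of its built-in datasets. The penalizer value is arbitrary.

```python
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()  # built-in recidivism data: duration "week", event "arrest"
# l1_ratio=1.0 makes the penalty a pure lasso; penalizer controls its strength.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()  # lasso-shrunken coefficients
```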


Journal of Computational and Graphical Statistics | 2008

A Gradient-Based Optimization Algorithm for LASSO

Jinseog Kim; Yuwon Kim; Yongdai Kim

LASSO is a useful method for achieving both shrinkage and variable selection simultaneously. The main idea of LASSO is to use an L1 constraint in the regularization step, and this constraint has been applied to various models such as wavelets, kernel machines, smoothing splines, and multiclass logistic models. We call such models with the L1 constraint generalized LASSO models. In this article, we propose a new algorithm, called the gradient LASSO algorithm, for generalized LASSO. The gradient LASSO algorithm is computationally more stable than QP-based algorithms because it does not require matrix inversions, and so it can be applied more easily to high-dimensional data. Simulation results show that the proposed algorithm is fast enough for practical purposes and provides reliable results. To illustrate its computing power with high-dimensional data, we analyze multiclass microarray data using the proposed algorithm.
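
The abstract names multiclass logistic models as one instance of generalized LASSO. For comparison (again not the authors' algorithm), scikit-learn's saga solver, another first-order method that avoids matrix inversions, fits an L1-penalized multiclass logistic model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)  # first-order solvers converge faster on scaled data
clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
clf.fit(X, y)
print(clf.coef_)  # one row per class; L1 zeroes out uninformative features
```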


BMC Bioinformatics | 2013

Prediction of a time-to-event trait using genome wide SNP data

Jinseog Kim; Insuk Sohn; Dae-Soon Son; Dong Hwan Kim; Taejin Ahn; Sin-Ho Jung

Background: A popular objective of many high-throughput genome projects is to discover genomic markers associated with traits and to develop statistical models that predict the traits of future patients from marker values.

Results: In this paper, we present a prediction method for time-to-event traits using genome-wide single-nucleotide polymorphisms (SNPs). We also propose a MaxTest for the association between a time-to-event trait and a SNP that accounts for the SNP's possible genetic models. The proposed MaxTest can help screen out non-prognostic SNPs and identify the genetic models of prognostic SNPs. The performance of the proposed method is evaluated through simulations.

Conclusions: In conjunction with the MaxTest, the proposed method provides more parsimonious prediction models that nevertheless include more prognostic SNPs than some naive prediction methods. The proposed method is demonstrated with real GWAS data.
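
The paper defines the MaxTest precisely; the following hypothetical sketch conveys only its general shape: score a SNP under the additive, dominant, and recessive codings with a univariate Cox model and keep the largest |z| statistic. Calibrating the maximum of three correlated statistics (e.g. by permutation) is omitted, and a coding with no variation in the sample would need to be skipped.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Genotypes are coded 0/1/2 copies of the minor allele.
CODINGS = {
    "additive":  lambda g: g.astype(float),
    "dominant":  lambda g: (g >= 1).astype(float),
    "recessive": lambda g: (g == 2).astype(float),
}

def max_test_z(time, event, genotype):
    """Return the best-fitting genetic model and its Cox z-statistic."""
    zs = {}
    for name, code in CODINGS.items():
        df = pd.DataFrame({"T": time, "E": event, "g": code(genotype)})
        cph = CoxPHFitter().fit(df, duration_col="T", event_col="E")
        zs[name] = float(cph.summary.loc["g", "z"])
    best = max(zs, key=lambda k: abs(zs[k]))
    return best, zs[best]
```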


Communications in Statistics - Simulation and Computation | 2012

Analysis of Survival Data with Group Lasso

Jinseog Kim; Insuk Sohn; Sin-Ho Jung; Sujong Kim; Changyi Park

Identifying influential genes and clinical covariates for patient survival is crucial because it can lead to a better understanding of the underlying mechanisms of disease and to better prediction models. Most variable selection methods for penalized Cox models cannot deal properly with categorical variables such as gender and family history. The group lasso penalty can combine clinical and genomic covariates effectively. In this article, we introduce an optimization algorithm for Cox regression with the group lasso penalty. We compare our method with other methods on simulated and real microarray data sets.
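
The computational core of group-lasso methods is the group soft-thresholding step, which keeps or kills a whole block of coefficients at once; this is what lets a dummy-coded categorical covariate enter or leave the model as a unit. A minimal numpy sketch (not the paper's full Cox algorithm):

```python
import numpy as np

def group_soft_threshold(beta, groups, t):
    """Proximal operator of t * sum_g ||beta_g||_2: shrink each group as a block."""
    out = beta.copy()
    for g in groups:                          # g: array of column indices for one group
        norm = np.linalg.norm(beta[g])
        out[g] = 0.0 if norm <= t else (1.0 - t / norm) * beta[g]
    return out

# A 4-level categorical covariate dummy-coded into columns 3-5 is kept or
# dropped as a unit:
beta = np.array([0.8, -0.2, 1.1, 0.05, -0.03, 0.02])
groups = [np.array([0]), np.array([1]), np.array([2]), np.array([3, 4, 5])]
print(group_soft_threshold(beta, groups, t=0.15))  # the small block is zeroed together
```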


Knowledge and Information Systems | 2004

Convex Hull Ensemble Machine for Regression and Classification

Yongdai Kim; Jinseog Kim

We propose a new ensemble algorithm called the Convex Hull Ensemble Machine (CHEM). CHEM is first developed in Hilbert space and then modified for regression and classification problems. We prove that the ensemble model converges to the optimal model in Hilbert space under regularity conditions. Empirical studies reveal that, for classification problems, CHEM has prediction accuracy similar to that of boosting, but CHEM is much more robust with respect to output noise and never overfits a dataset even when boosting does. For regression problems, CHEM is competitive with other ensemble methods such as gradient boosting and bagging.
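
The exact CHEM updates are given in the paper; the sketch below illustrates only the underlying idea of staying inside the convex hull of base learners: fit a handful of trees, then choose combination weights that are non-negative and sum to one. The simplex projection here is a simple clip-and-renormalize heuristic, and all names and data are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_convex_weights(P, y, n_iter=2000):
    """Find near-optimal convex weights (non-negative, summing to 1) for
    combining base-learner predictions P (n_samples x n_models)."""
    w = np.full(P.shape[1], 1.0 / P.shape[1])
    step = 1.0 / np.linalg.norm(P, 2) ** 2   # Lipschitz-safe step size
    for _ in range(n_iter):
        g = P.T @ (P @ w - y)                # gradient of 0.5*||y - P w||^2
        w = np.maximum(w - step * g, 0.0)    # gradient step, clip to >= 0
        w /= w.sum()                         # renormalize onto the simplex
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(300)
trees = []
for _ in range(10):
    idx = rng.integers(0, len(y), len(y))    # bootstrap resample
    trees.append(DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx]))
P = np.column_stack([t.predict(X) for t in trees])
w = fit_convex_weights(P, y)
print(w.round(3), "sum =", w.sum().round(3))  # ensemble stays in the convex hull
```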


Reliability Engineering & System Safety | 2015

Robust D-optimal designs under correlated error, applicable invariantly for some lifetime distributions

Rabindra Nath Das; Jinseog Kim; Jeong-Soo Park

In quality engineering, the most commonly used lifetime distributions are log-normal, exponential, gamma and Weibull. Experimental designs are useful for predicting the optimal operating conditions of the process in lifetime improvement experiments. In the present article, invariant robust first-order D-optimal designs are derived for correlated lifetime responses having the above four distributions. Robust designs are developed for some correlated error structures. It is shown that robust first-order D-optimal designs for these lifetime distributions are always robust rotatable but the converse is not true. Moreover, it is observed that these designs depend on the respective error covariance structure but are invariant to the above four lifetime distributions. This article generalizes the results of Das and Lin [7] for the above four lifetime distributions with general (intra-class, inter-class, compound symmetry, and tri-diagonal) correlated error structures.
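
As a small numerical illustration of the criterion involved (not the paper's analytical derivations), the D-criterion det(X' V^{-1} X) of a first-order design under an intra-class (equicorrelated) error structure can be evaluated as follows; the design and correlation value are arbitrary.

```python
import numpy as np

def d_criterion(X, rho):
    """det(X' V^{-1} X) for a first-order model under intra-class correlation rho."""
    n = X.shape[0]
    V = (1 - rho) * np.eye(n) + rho * np.ones((n, n))  # compound-symmetry covariance
    Xd = np.column_stack([np.ones(n), X])              # intercept + first-order terms
    M = Xd.T @ np.linalg.solve(V, Xd)                  # information matrix
    return np.linalg.det(M)

# 2^2 factorial design in coded units:
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
print(d_criterion(X, rho=0.3))
```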


Computational and Mathematical Methods in Medicine | 2013

SNP Selection in Genome-Wide Association Studies via Penalized Support Vector Machine with MAX Test

Jinseog Kim; Insuk Sohn; Dennis Dong Hwan Kim; Sin-Ho Jung

One of the main objectives of a genome-wide association study (GWAS) is to develop a prediction model for a binary clinical outcome using single-nucleotide polymorphisms (SNPs); such a model can be used for diagnostic and prognostic purposes and for better understanding of the relationship between the disease and the SNPs. Penalized support vector machine (SVM) methods have been widely used toward this end. However, since investigators often ignore the genetic models of SNPs, the final model loses efficiency in predicting the clinical outcome. To overcome this problem, we propose a two-stage method in which the genetic model of each SNP is first identified using the MAX test and then a prediction model is fitted using a penalized SVM. We apply the proposed method to various penalized SVMs and compare the performance of SVMs with various penalty functions. Results from simulations and real GWAS data analysis show that the proposed method outperforms prediction methods that ignore the genetic models in terms of prediction power and selectivity.
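
A hypothetical end-to-end sketch of the two-stage idea on synthetic data: stage 1 recodes each SNP under a genetic model (here a fixed dominant coding stands in for the MAX test), and stage 2 fits an L1-penalized linear SVM so that uninformative SNPs receive exactly zero weight.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(200, 50))   # genotypes coded 0/1/2
y = (G[:, 0] >= 1).astype(int)           # outcome driven by SNP 0 under a dominant model

X = (G >= 1).astype(float)               # stage 1: recode (stand-in for the MAX test)
svm = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10000).fit(X, y)  # stage 2
print(np.flatnonzero(np.abs(svm.coef_) > 1e-8))  # indices of SNPs with nonzero weight
```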


International Journal of Hydrology Science and Technology | 2012

GLM and joint GLM techniques in hydrogeology: an illustration

Rabindra Nath Das; Jinseog Kim

In regression models with positive observations, estimation is often based on either the log-normal or the gamma model. Generalised linear models and joint generalised linear models are appropriate for analysing positive data with constant and non-constant variance, respectively. This article focuses on the use of these two techniques in hydrogeology. As an illustration, groundwater quality factors are analysed. Softness, non-alkalinity, dissolved oxygen content, chemical oxygen demand, chloride content, and electrical conductivity are the basic characteristics (all positive-valued) of good drinking water. This article identifies the causal factors of these basic quality characteristics of groundwater at Muzaffarpur Town, Bihar, India, using the above techniques. Many statistically significant factors for these six basic quality characteristics of groundwater are detected, and in the process a probabilistic model for each characteristic is developed. Effects of different factors on e...
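
As a minimal illustration of the mean-model half of this approach (the dispersion submodel of a joint GLM is omitted), a gamma GLM with log link can be fitted to a strictly positive response with statsmodels; the data here are synthetic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=200)
mu = np.exp(0.5 + 1.2 * x)                   # log-linear mean
y = rng.gamma(shape=5.0, scale=mu / 5.0)     # positive response with mean mu

X = sm.add_constant(x)
model = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log()))
print(model.fit().summary())
```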


Computational Statistics & Data Analysis | 2006

Maximum a posteriori pruning on decision trees and its application to bootstrap BUMPing

Jinseog Kim; Yongdai Kim

Cost-complexity pruning generates nested subtrees and selects the best one, but its computational cost is large because it uses a holdout sample or cross-validation. On the other hand, pruning algorithms based on posterior calculations, such as BIC (MDL) and MEP, are faster, but they sometimes produce trees that are too big or too small and hence yield poor generalization error. In this paper, we propose an alternative pruning procedure that combines the ideas of cost-complexity pruning and posterior calculation. The proposed algorithm uses only training samples, so its computational cost is almost the same as that of the other posterior-based algorithms, while it yields accuracies similar to those of cost-complexity pruning. Moreover, it can be used to compare non-nested trees, which is necessary for the BUMPing procedure. Empirical results show that the proposed algorithm performs similarly to cost-complexity pruning in standard situations and works better for BUMPing.
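
The MAP pruning procedure itself is not available off the shelf; for contrast, the cost-complexity pruning it is compared against can be run with scikit-learn, which exposes the full pruning path via ccp_alpha:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each ccp_alpha on the path corresponds to one nested subtree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
for alpha in path.ccp_alphas[::5]:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}  "
          f"test acc={tree.score(X_te, y_te):.3f}")
```

The holdout evaluation in this loop is exactly the extra cost the paper's posterior-based alternative is designed to avoid.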

Collaboration


Dive into Jinseog Kim's collaborations.

Top Co-Authors


Yongdai Kim

Seoul National University


Insuk Sohn

Samsung Medical Center


Changyi Park

Seoul National University


Han-Joon Kim

Seoul National University Hospital


Jaeyong Lee

Seoul National University


Jiyun Kim

Seoul National University
