Publication


Featured research published by Yongdai Kim.


Journal of the American Statistical Association | 2008

Smoothly Clipped Absolute Deviation on High Dimensions

Yongdai Kim; Hosik Choi; Hee-Seok Oh

The smoothly clipped absolute deviation (SCAD) estimator, proposed by Fan and Li, has many desirable properties, including continuity, sparsity, and unbiasedness. The SCAD estimator also has the (asymptotically) oracle property when the dimension of covariates is fixed or diverges more slowly than the sample size. In this article we study the SCAD estimator in high-dimensional settings where the dimension of covariates can be much larger than the sample size. First, we develop an efficient optimization algorithm that is fast and always converges to a local minimum. Second, we prove that the SCAD estimator still has the oracle property on high-dimensional problems. We perform numerical studies to compare the SCAD estimator with the LASSO and SIS–SCAD estimators in terms of prediction accuracy and variable selectivity when the true model is sparse. Through the simulation, we show that the variance estimator of Fan and Li still works well for some limited high-dimensional cases where the true nonzero coefficients are not too small and the sample size is moderately large. We apply the proposed algorithm to analyze a high-dimensional microarray data set.
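
For reference, the SCAD penalty of Fan and Li can be written as a piecewise function of a single coefficient. The sketch below is only an illustration of that penalty, not the optimization algorithm developed in the article; the function name `scad_penalty` is hypothetical, and `a = 3.7` is the default value suggested by Fan and Li.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for a single coefficient (requires lam > 0 and a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Near zero the penalty equals the LASSO penalty lam * |theta|.
        return lam * t
    if t <= a * lam:
        # Quadratic piece that smoothly flattens the penalty.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant beyond a * lam, so large coefficients are not shrunk further.
    return lam * lam * (a + 1) / 2


for theta in (0.5, 1.0, 2.0, 3.7, 10.0):
    print(theta, scad_penalty(theta, lam=1.0))
```

The three pieces agree at |theta| = lam and |theta| = a * lam, which gives the continuity mentioned in the abstract, and the flat third piece is what removes the bias for large coefficients.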


Spine | 2008

Quantitative Analysis of Back Muscle Degeneration in the Patients With the Degenerative Lumbar Flat Back Using a Digital Image Analysis: Comparison With the Normal Controls

Jae Chul Lee; Jang-Gyu Cha; Yongdai Kim; Yon-Il Kim; Byung-Joon Shin

Study Design. The degree of back muscle degeneration was quantified in patients with a degenerative lumbar flat back and a normal control group by magnetic resonance (MR) imaging and a digital image analysis technique. Objective. To compare the degree of degeneration of the paravertebral back muscles of patients with that of a normal control group. Summary of Background Data. Extensive degeneration of the back muscle is well noted in patients with a degenerative lumbar flat back. Even though the problems of back muscles in the spinal column have been suggested to be one of the causes of spinal deformity, no study has quantified the extent of back muscle degeneration in spinal deformities such as a degenerative lumbar flat back. This study applied a method using MR imaging, which was originally developed to quantify the degree of the leg muscle degeneration in muscular dystrophy patients, to assess the degree of muscle degeneration of the spine in patients with a degenerative lumbar flat back. Methods. The digital images of the paravertebral back muscles in 21 female patients (11 patients with degenerative lumbar flat back deformity and 10 normal volunteers) using T2-weighted axial images and Picture Archiving and Communication System viewing software were analyzed and compared. The signal intensity, relative cross-sectional area (CSA) (the CSA of the paravertebral back muscle divided by that of the vertebral body in the same image), and the degree of the fat infiltration in the patients with degenerative flat back deformity and normal volunteers were measured and compared. Results. There was significantly higher signal intensity (295.9 ± 57.1) and a larger area of fat infiltration (41.3% ± 8.2%) of the back muscle in the patient group than in the control group (179.1 ± 26.5 and 15.9% ± 3.2%, respectively). The relative CSA of the back muscle compartment was constantly smaller in the patient group but significant differences were only found at the L4–L5 level. These changes in the patient group were more significant in the lower lumbar levels than in the upper lumbar levels. Conclusion. T2-weighted MR image analysis of the paravertebral back muscles in patients with degenerative lumbar flat back showed significant fat infiltration compared with those in the normal control using digital image analysis. Digital image analysis of the paravertebral back muscles is a useful tool for measuring the degree of paravertebral back muscle degeneration.


International Conference on Machine Learning | 2004

Gradient LASSO for feature selection

Yongdai Kim; Jinseog Kim

LASSO (Least Absolute Shrinkage and Selection Operator) is a useful tool for achieving shrinkage and variable selection simultaneously. Since LASSO uses the L1 penalty, the optimization has to rely on quadratic programming (QP) or general non-linear programming, which is known to be computationally intensive. In this paper, we propose a gradient descent algorithm for LASSO. Even though the final result is slightly less accurate, the proposed algorithm is computationally simpler than QP or non-linear programming, and so can be applied to large-scale problems. We provide the convergence rate of the algorithm, and illustrate it with simulated models as well as real data sets.
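
To give a feel for how a gradient-only method can handle the L1 constraint without a QP solver, here is a minimal sketch. It is a plain Frank-Wolfe iteration for L1-constrained least squares, swapped in for illustration; it is not the algorithm from the paper, whose details the abstract does not give. The function name and the pure-Python list representation are assumptions.

```python
def l1_constrained_least_squares(X, y, t, n_iter=200):
    """Frank-Wolfe sketch: minimize 0.5 * ||y - X b||^2 subject to ||b||_1 <= t."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        # Residual r = y - X b and gradient g = -X^T r.
        r = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        g = [-sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        # The linear subproblem over the L1 ball is solved at a vertex:
        # the coordinate with the largest absolute gradient.
        j = max(range(p), key=lambda k: abs(g[k]))
        s = [0.0] * p
        s[j] = -t if g[j] > 0 else t
        d = [s[k] - b[k] for k in range(p)]
        # Exact line search on the quadratic loss, clipped to [0, 1].
        Xd = [sum(X[i][k] * d[k] for k in range(p)) for i in range(n)]
        denom = sum(v * v for v in Xd)
        slope = -sum(g[k] * d[k] for k in range(p))
        if denom == 0.0 or slope <= 0.0:
            break  # no descent direction left inside the ball
        step = min(1.0, slope / denom)
        b = [b[k] + step * d[k] for k in range(p)]
    return b


# Toy problem: identity design, so the solution is easy to check by hand.
print(l1_constrained_least_squares([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.0], t=1.0))
```

Each iteration needs only a gradient and one coordinate update, with no matrix inversion, which is the kind of computational simplicity the abstract refers to.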


Spine | 2014

Risk factors of adjacent segment disease requiring surgery after lumbar spinal fusion: comparison of posterior lumbar interbody fusion and posterolateral fusion.

Jae Chul Lee; Yongdai Kim; Jae-Wan Soh; Byung-Joon Shin

Study Design. A retrospective study. Objective. To determine the incidence and risk factors of adjacent segment disease (ASD) requiring surgery among patients previously treated with spinal fusion for degenerative lumbar disease and to compare the survivorship of adjacent segment according to various risk factors including comparison of fusion methods: posterior lumbar interbody fusion (PLIF) versus posterolateral fusion (PLF). Summary of Background Data. One of the major issues after lumbar spinal fusion is the development of adjacent segment disease. Biomechanically, PLIF has been reported to be more rigid than PLF, and therefore, patients who undergo PLIF are suspected to experience a higher incidence of ASD than those who undergo PLF. There have been many studies analyzing the risk factors of ASD, but we are not aware of any study comparing PLIF with PLF in incidence of ASD requiring surgery. Methods. A consecutive series of 490 patients who had undergone lumbar spinal fusion of 3 or fewer segments to treat degenerative lumbar disease was identified. The mean age at index operation was 53 years, and the mean follow-up period was 51 months (12–236 mo). The numbers of patients treated by PLF and PLIF were 103 and 387, respectively. The incidence and prevalence of revision surgery for ASD were calculated by the Kaplan-Meier method. For risk factor analysis, we used the log-rank test and Cox regression analysis with fusion methods, sex, age, number of fused segments, and presence of laminectomy adjacent to index fusion. Results. After index spinal fusion, 23 patients (4.7%) had undergone additional surgery for ASD. Kaplan-Meier analysis predicted a disease-free survival rate of adjacent segments in 94.2% of patients at 5 years and 89.6% at 10 years after the index operation. In the analysis of risk factors, PLIF was associated with 3.4 times higher incidence of ASD requiring surgery than PLF (P = 0.037). Patients older than 60 years at the time of index operation were 2.5 times more likely to undergo revision operation than those younger than 60 years (P = 0.038). There were no significant differences in survival rates of the adjacent segment according to sex, preoperative diagnosis, number of fused segments, and concomitant laminectomy to adjacent segment. Conclusion. It was predicted that 10% of patients would undergo additional surgery for treating ASD within 10 years after index lumbar fusion. In this study, PLIF showed higher incidence of ASD than did PLF. Patient age greater than 60 years was another independent risk factor. Surgeons should carefully consider these factors at the time of surgical planning of lumbar fusion. Level of Evidence: 3
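
The survival rates quoted above come from the Kaplan-Meier method. As a rough illustration of how that estimator is computed, the sketch below implements the product-limit formula in plain Python. The function name and the toy data are hypothetical and have no connection to the patients in the study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time of each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        # Events observed exactly at time t (censored subjects excluded).
        observed = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve


# Toy data for illustration only, not taken from the study.
toy_times = [1, 2, 2, 3]
toy_events = [1, 1, 0, 1]
print(kaplan_meier(toy_times, toy_events))
```

Censored subjects (event flag 0) still count in the at-risk set until their follow-up ends, which is how the method uses patients with short follow-up without treating them as disease-free forever.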


Annals of Statistics | 2013

Calibrating nonconvex penalized regression in ultra-high dimension

Lan Wang; Yongdai Kim; Runze Li

We investigate high-dimensional non-convex penalized regression, where the number of covariates may grow at an exponential rate. Although recent asymptotic theory established that there exists a local minimum possessing the oracle property under general conditions, it is still largely an open problem how to identify the oracle estimator among potentially multiple local minima. There are two main obstacles: (1) due to the presence of multiple minima, the solution path is nonunique and is not guaranteed to contain the oracle estimator; (2) even if a solution path is known to contain the oracle estimator, the optimal tuning parameter depends on many unknown factors and is hard to estimate. To address these two challenging issues, we first prove that an easy-to-calculate calibrated CCCP algorithm produces a consistent solution path which contains the oracle estimator with probability approaching one. Furthermore, we propose a high-dimensional BIC criterion and show that it can be applied to the solution path to select the optimal tuning parameter which asymptotically identifies the oracle estimator. The theory for a general class of non-convex penalties in the ultra-high dimensional setup is established when the random errors follow the sub-Gaussian distribution. Monte Carlo studies confirm that the calibrated CCCP algorithm combined with the proposed high-dimensional BIC has desirable performance in identifying the underlying sparsity pattern for high-dimensional data analysis.
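
The tuning-parameter step can be pictured as scoring every model on the solution path with a BIC-type criterion and keeping the minimizer. The sketch below assumes a criterion of the general form log(RSS / n) + |M| * C_n * log(p) / n; the exact form, the choice C_n = log(log n), and the names `hbic` and `select_by_hbic` are assumptions of this illustration, not statements taken from the abstract. The candidate values are made up.

```python
import math


def hbic(rss, n, p, model_size):
    """BIC-type score: log(RSS / n) + model_size * C_n * log(p) / n.

    C_n = log(log(n)) is an assumption of this sketch; it requires n >= 3.
    """
    c_n = math.log(math.log(n))
    return math.log(rss / n) + model_size * c_n * math.log(p) / n


def select_by_hbic(candidates, n, p):
    """candidates: (tuning_parameter, rss, model_size) tuples along a solution path."""
    return min(candidates, key=lambda c: hbic(c[1], n, p, c[2]))


# Hypothetical path: a sparse underfit, a moderate model, and a dense overfit.
path = [(0.5, 120.0, 1), (0.2, 40.0, 3), (0.05, 39.5, 12)]
print(select_by_hbic(path, n=100, p=1000))
```

The log(p) factor makes each additional variable more expensive as the dimension grows, so a small drop in residual error does not justify a much larger model.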


Computational Statistics & Data Analysis | 2004

A new algorithm to generate beta processes

Jaeyong Lee; Yongdai Kim

Based on the fact that any nondecreasing positive Lévy process can be approximated by a sequence of compound Poisson processes, an approximate sampling algorithm to generate a sample path of a beta process is developed. The proposed algorithm is compared with other similar algorithms. As an example, a Markov chain Monte Carlo algorithm for the proportional hazards model based on the proposed algorithm is also illustrated.
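
A compound Poisson approximation of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions, not the algorithm of the paper: it takes a constant concentration `c`, a baseline A0(t) = t, and replaces the improper jump density of the beta process by a proper Beta(eps, c) density, so that jumps arrive at the finite rate c * B(eps, c). The function name is hypothetical.

```python
import math
import random


def approx_beta_process_path(c, horizon, eps, rng):
    """Sample jump times and cumulative values of a compound Poisson approximation.

    Assumptions of this sketch: constant concentration c > 0, baseline
    A0(t) = t on [0, horizon], and jump sizes drawn from Beta(eps, c).
    """
    # Jump rate per unit time: c * B(eps, c), with B the beta function.
    log_beta = math.lgamma(eps) + math.lgamma(c) - math.lgamma(eps + c)
    rate = c * math.exp(log_beta)
    times, values = [], []
    t, level = 0.0, 0.0
    while True:
        t += rng.expovariate(rate)        # exponential gaps give Poisson arrivals
        if t > horizon:
            break
        level += rng.betavariate(eps, c)  # jump size in [0, 1]
        times.append(t)
        values.append(level)
    return times, values


rng = random.Random(0)
times, values = approx_beta_process_path(c=1.0, horizon=1.0, eps=0.05, rng=rng)
print(len(times), values[-1] if values else 0.0)
```

Under these assumptions the expected increment per unit time is c * B(eps, c) * eps / (eps + c), which tends to 1 as eps shrinks, so the mean of the approximate path approaches the baseline A0(t) = t.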


Journal of Computational and Graphical Statistics | 2008

A Gradient-Based Optimization Algorithm for LASSO

Jinseog Kim; Yuwon Kim; Yongdai Kim

LASSO is a useful method for achieving both shrinkage and variable selection simultaneously. The main idea of LASSO is to use the L1 constraint in the regularization step which has been applied to various models such as wavelets, kernel machines, smoothing splines, and multiclass logistic models. We call such models with the L1 constraint generalized LASSO models. In this article, we propose a new algorithm called the gradient LASSO algorithm for generalized LASSO. The gradient LASSO algorithm is computationally more stable than QP-based algorithms because it does not require matrix inversions, and thus it can be more easily applied to high-dimensional data. Simulation results show that the proposed algorithm is fast enough for practical purposes and provides reliable results. To illustrate its computing power with high-dimensional data, we analyze multiclass microarray data using the proposed algorithm.


Computational Statistics & Data Analysis | 2006

Multiclass sparse logistic regression for classification of multiple cancer types using gene expression data

Yongdai Kim; Sunghoon Kwon; Seuck Heun Song

Monitoring gene expression profiles is a novel approach to cancer diagnosis. Several studies have shown that sparse logistic regression is a useful classification method for gene expression data. Not only does it give a sparse solution with high accuracy, it also provides the user with explicit probabilities of classification apart from the class information. However, its optimal extension to more than two classes is not obvious. In this paper, we propose a multiclass extension of sparse logistic regression. Analysis of five publicly available gene expression data sets shows that the proposed method outperforms the standard multinomial logistic model in prediction accuracy as well as gene selectivity.
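
One generic way to combine multiclass logistic regression with sparsity is to add an L1 penalty to the multinomial negative log-likelihood. The sketch below shows that objective and the class probabilities it produces. It is a textbook formulation given for orientation; the specific multiclass extension proposed in the paper is not described in the abstract and may differ. The function names are hypothetical.

```python
import math


def softmax_probs(x, B):
    """Class probabilities for one sample x; B[k] is the coefficient vector of class k."""
    scores = [sum(bj * xj for bj, xj in zip(bk, x)) for bk in B]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l1_penalized_nll(X, y, B, lam):
    """Multinomial negative log-likelihood plus lam * sum of |coefficients|."""
    nll = -sum(math.log(softmax_probs(x, B)[label]) for x, label in zip(X, y))
    penalty = lam * sum(abs(bj) for bk in B for bj in bk)
    return nll + penalty


# With all coefficients at zero, every class gets probability 1/3.
X = [[1.0, 2.0], [0.5, -1.0]]
y = [0, 2]
B = [[0.0, 0.0] for _ in range(3)]
print(softmax_probs(X[0], B), l1_penalized_nll(X, y, B, lam=0.1))
```

Minimizing this objective drives many coefficients exactly to zero, which is what yields gene selection, while `softmax_probs` supplies the explicit class probabilities mentioned in the abstract.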


Information Processing Letters | 2007

An empirical study on classification methods for alarms from a bug-finding static C analyzer

Kwangkeun Yi; Hosik Choi; Jaehwang Kim; Yongdai Kim

A key application for static analysis is automatic bug-finding. Given the program source, a static analyzer computes an approximation of dynamic program states occurring at each program point, and reports possible bugs by examining the approximate states. From such static bug-finding analysis, false alarms are inevitable. Because static analysis is done at compile-time, exact computation of the program’s run-time states is impossible. Hence some approximation must be involved, so that the detected bugs can contain some false positives. Methodologies such as the abstract interpretation framework [6–8] counsel us to design a correct (conservative) static analyzer. The correctness criterion exacerbates the false alarm problem, because whenever in doubt the analysis must err on the pessimistic side.


Journal of Computational and Graphical Statistics | 2014

Functional Data Analysis of Tree Data Objects

Dan Shen; Haipeng Shen; Shankar Bhamidi; Yolanda Muñoz Maldonado; Yongdai Kim; J. S. Marron

Data analysis on non-Euclidean spaces, such as tree spaces, can be challenging. The main contribution of this article is establishment of a connection between tree-data spaces and the well-developed area of functional data analysis (FDA), where the data objects are curves. This connection comes through two tree representation approaches, the Dyck path representation and the branch length representation. These representations of trees in the Euclidean spaces enable us to exploit the power of FDA to explore statistical properties of tree data objects. A major challenge in the analysis is the sparsity of tree branches in a sample of trees. We overcome this issue by using a tree-pruning technique that focuses the analysis on important underlying population structures. This method parallels scale-space analysis in the sense that it reveals statistical properties of tree-structured data over a range of scales. The effectiveness of these new approaches is demonstrated by some novel results obtained in the analysis of brain-artery trees. The scale-space analysis reveals a deeper relationship between structure and age. These methods are the first to find a statistically significant gender difference. Supplementary materials for this article are available online.
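
The Dyck path idea can be illustrated on a small unweighted tree: walk the tree depth first and record the height after every step, so that the tree becomes a curve. The sketch below assumes a tree encoded as a nested list of child subtrees and ignores branch lengths; the encoding and the function name are assumptions of this illustration, and the representation used in the article may differ in detail.

```python
def dyck_path(tree):
    """Heights visited by a depth-first walk; a tree is a list of child subtrees."""
    heights = [0]

    def walk(node, depth):
        for child in node:
            heights.append(depth + 1)  # step down an edge
            walk(child, depth + 1)
            heights.append(depth)      # step back up the same edge

    walk(tree, 0)
    return heights


# Root with two children; the first child has one child of its own.
example = [[[]], []]
print(dyck_path(example))  # [0, 1, 2, 1, 0, 1, 0]
```

Every edge contributes one step up and one step down, so a tree with m edges maps to a sequence of 2m + 1 heights that starts and ends at zero, and such sequences can be compared like discretized curves.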

Collaboration


Dive into Yongdai Kim's collaborations.

Top Co-Authors

Gwangsu Kim, Seoul National University

Jaeyong Lee, Seoul National University

Woncheol Jang, Seoul National University

Joungyoun Kim, Chungbuk National University

Hee-Seok Oh, Seoul National University