Publication


Featured research published by Changyi Park.


Bioinformatics | 2009

Gradient lasso for Cox proportional hazards model

Insuk Sohn; Jinseog Kim; Sin-Ho Jung; Changyi Park

MOTIVATION: There has been an increasing interest in expressing a survival phenotype (e.g. time to cancer recurrence or death) or its distribution in terms of the expression data of a subset of genes. Due to the high dimensionality of gene expression data, however, there is a serious problem of collinearity in fitting a prediction model, e.g. Cox's proportional hazards model. To avoid the collinearity problem, several methods based on penalized Cox proportional hazards models have been proposed. However, those methods suffer from severe computational problems, such as slow or even failed convergence, because of the high-dimensional matrix inversions required for model fitting. We propose to implement the penalized Cox regression with a lasso penalty via the gradient lasso algorithm, which yields faster convergence to the global optimum than other algorithms do. Moreover, the gradient lasso algorithm is guaranteed to converge to the optimum under mild regularity conditions. Hence, our gradient lasso algorithm can be a useful tool in developing a prediction model based on high-dimensional covariates including gene expression data. RESULTS: Results from simulation studies showed that the prediction model by gradient lasso recovers the prognostic genes. Also, results from diffuse large B-cell lymphoma datasets and the Norway/Stanford breast cancer dataset indicate that our method is very competitive compared with popular existing methods by Park and Hastie and by Goeman in its computational time, prediction and selectivity. AVAILABILITY: R package glcoxph is available at http://datamining.dongguk.ac.kr/R/glcoxph.
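
To make the idea of a lasso-penalized Cox model concrete, here is a minimal sketch in plain Python. It is not the gradient lasso algorithm of the paper and not the glcoxph package: it swaps in ordinary proximal gradient descent (gradient step followed by soft-thresholding), which needs no matrix inversion either. The function names, the step size, the toy data, and the lack of tie handling are all assumptions made for illustration.

```python
import math

def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def cox_gradient(beta, times, events, X):
    """Gradient of the averaged negative Cox log partial likelihood (no tie correction)."""
    n, p = len(X), len(beta)
    w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    grad = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        risk = [j for j in range(n) if times[j] >= times[i]]
        denom = sum(w[j] for j in risk)
        for k in range(p):
            weighted_mean = sum(w[j] * X[j][k] for j in risk) / denom
            grad[k] -= X[i][k] - weighted_mean
    return [g / n for g in grad]

def lasso_cox(times, events, X, lam, step=0.1, n_iter=300):
    """Proximal gradient descent for the Cox model with an L1 penalty lam * ||beta||_1."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = cox_gradient(beta, times, events, X)
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta

# Toy data (made up): feature 0 tends to be high for subjects who fail early.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
X = [[1.0, 0.2], [2.0, -0.1], [0.5, 0.4], [1.5, 0.0], [-1.0, 0.3], [0.0, -0.2]]
beta_small = lasso_cox(times, events, X, lam=0.05)   # light penalty
beta_large = lasso_cox(times, events, X, lam=100.0)  # heavy penalty removes every feature
```

With a heavy penalty every coefficient is thresholded to zero, while a light penalty lets informative features enter the model; this is the selection behaviour the abstract relies on.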


BMC Bioinformatics | 2008

A copula method for modeling directional dependence of genes

Jong-Min Kim; Yoon-Sung Jung; Engin A. Sungur; Kap-Hoon Han; Changyi Park; Insuk Sohn

Background: Genes interact with each other as basic building blocks of life, forming a complicated network. The relationship between groups of genes with different functions can be represented as gene networks. With the deposition of huge microarray data sets in public domains, the study of gene networks is now possible. In recent years, there has been an increasing interest in the reconstruction of gene networks from gene expression data. Recent work includes linear models, Boolean network models, and Bayesian networks. Among them, Bayesian networks seem to be the most effective in constructing gene networks. A major problem with the Bayesian network approach is the excessive computational time. This problem is due to the interactive feature of the method that requires a large search space. Since fitting a model by using copulas does not require iterations, elicitation of the priors, or complicated calculations of posterior distributions, the need for reference to extensive search spaces can be eliminated, leading to a manageable computational effort. The Bayesian network approach produces a discrete expression of conditional probabilities. Discreteness of the characteristics is not required in the copula approach, which uses a uniform representation of the continuous random variables. Our method is able to overcome the limitation of the Bayesian network method for gene-gene interaction, i.e. information loss due to binary transformation.
Results: We analyzed the gene interactions for two gene data sets (one group is eight histone genes and the other group is 19 genes which include DNA polymerases, DNA helicase, type B cyclin genes, DNA primases, radiation sensitive genes, repair-related genes, a replication protein A encoding gene, a DNA replication initiation factor, a securin gene, a nucleosome assembly factor, and a subunit of the cohesin complex) by adopting a measure of directional dependence based on a copula function. We have compared our results with those from other methods in the literature. Although microarray results show a transcriptional co-regulation pattern and do not imply that the gene products are physically interactive, this tight genetic connection may suggest that each gene product has either direct or indirect connections with the other gene products. Indeed, a recent comprehensive analysis of a protein interaction map revealed that those histone genes are physically connected with each other, supporting the results obtained by our method.
Conclusion: The results illustrate that our method can be an alternative to Bayesian networks in modeling gene interactions. One advantage of our approach is that dependence between genes is not assumed to be linear. Another advantage is that our approach can detect directional dependence. We expect that our study may help to design artificial drug candidates, which can block or activate biologically meaningful pathways. Moreover, our copula approach can be extended to investigate the effects of local environments on protein-protein interactions. The copula mutual information approach will help to propose a new variant of ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks), an algorithm for the reconstruction of gene regulatory networks.
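
The two advantages named in the conclusion (no linearity assumption, sensitivity to direction) can be illustrated with a small sketch. It rank-transforms each variable to the uniform scale and then estimates the share of the variance of V explained by E[V | U] using binned means. This binned estimator, the function names, and the simulated data are assumptions for illustration; the paper's own directional dependence measure may be defined and estimated differently.

```python
import random

def pseudo_observations(values):
    """Rank-transform values to (0, 1) via rank / (n + 1); assumes no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def directional_dependence(x, y, n_bins=10):
    """Estimate Var(E[V | U]) / Var(V) for the direction x -> y on the uniform scale."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for ui, vi in zip(u, v):
        b = min(int(ui * n_bins), n_bins - 1)  # bin index of U
        sums[b] += vi
        counts[b] += 1
    n = len(v)
    v_bar = sum(v) / n
    between = sum(c * (s / c - v_bar) ** 2 for s, c in zip(sums, counts) if c) / n
    total = sum((vi - v_bar) ** 2 for vi in v) / n
    return between / total

# Simulated example: y is a non-linear, non-monotone function of x.
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
y = [xi * xi for xi in x]
dd_xy = directional_dependence(x, y)  # x -> y: y is well predicted from x
dd_yx = directional_dependence(y, x)  # y -> x: the sign of x cannot be recovered from y
```

Here x determines y but y says almost nothing about the rank of x, so the two directions give very different values even though the linear correlation between x and y is close to zero.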


Computational Statistics & Data Analysis | 2008

Classification of gene functions using support vector machine for time-course gene expression data

Changyi Park; Ja-Yong Koo; Sujong Kim; Insuk Sohn; Jae Won Lee

Since most biological systems are developmental and dynamic, time-course gene expression profiles provide an important characterization of gene functions. Assigning functions for genes with unknown functions based on time-course gene expressions is an important task in functional genomics. Recently, various methods have been proposed for the classification of gene functions based on time-course gene expression data. In this paper, we consider the classification of gene functions from functional data analysis viewpoint, where a functional support vector machine is adopted. The functional support vector machine can model temporal effects of time-course gene expression data by incorporating the coefficients as well as the basis matrix obtained from a finite expansion of gene expressions on a set of basis functions. We apply the functional support vector machine to both real microarray and simulated data. Our results indicate that the functional support vector machine is effective in discriminating gene functions of time-course gene expressions with predefined functions. The method also provides valuable functional information about interactions between genes and allows the assignment of new functions to genes with unknown functions.


Computational Statistics & Data Analysis | 2008

Stepwise feature selection using generalized logistic loss

Changyi Park; Ja-Yong Koo; Peter T. Kim; Jae Won Lee

Microarray experiments have raised challenging questions such as how to accurately identify a set of marker genes responsible for various cancers. In statistics, this specific task can be posed as the feature selection problem. Since a support vector machine can deal with a vast number of features, it has gained widespread use in microarray data analysis. We propose a stepwise feature selection using the generalized logistic loss, which is a smooth approximation of the usual hinge loss. We compare the proposed method with the support vector machine with recursive feature elimination on both real and simulated datasets. It is illustrated that the proposed method can improve the quality of feature selection through standardization while retaining predictive performance similar to that of recursive feature elimination.
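
The phrase "smooth approximation of the usual hinge loss" can be checked numerically. The sketch below assumes the common form (1/gamma) * log(1 + exp(gamma * (1 - z))) for the generalized logistic loss, where z is the margin y * f(x); the exact parameterization used in the paper is not given in the abstract, so treat this form and the parameter name gamma as assumptions.

```python
import math

def hinge_loss(z):
    """Hinge loss of the margin z = y * f(x)."""
    return max(0.0, 1.0 - z)

def generalized_logistic_loss(z, gamma=10.0):
    """Assumed form: (1 / gamma) * log(1 + exp(gamma * (1 - z))), computed stably."""
    a = gamma * (1.0 - z)
    if a > 0:
        # log(1 + e^a) = a + log(1 + e^-a) avoids overflow for large a
        return (a + math.log1p(math.exp(-a))) / gamma
    return math.log1p(math.exp(a)) / gamma

# The gap to the hinge loss is at most log(2) / gamma, so it vanishes as gamma grows.
for z in (-2.0, 0.0, 1.0, 3.0):
    gap = generalized_logistic_loss(z, gamma=50.0) - hinge_loss(z)
    print(f"margin={z:+.1f}  gap to hinge={gap:.5f}")
```

Unlike the hinge loss, this function is differentiable everywhere, which is what makes gradient-based scores for stepwise selection available.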


Computational Statistics & Data Analysis | 2011

Feature selection in the Laplacian support vector machine

Sangjun Lee; Changyi Park; Ja-Yong Koo

Traditional classifiers, including support vector machines, use only labeled data in training. However, labeled instances are often difficult, costly, or time-consuming to obtain, while unlabeled instances are relatively easy to collect. The goal of semi-supervised learning is to improve the classification accuracy by using unlabeled data together with a few labeled data in training classifiers. Recently, the Laplacian support vector machine has been proposed as an extension of the support vector machine to semi-supervised learning. Like the support vector machine, the Laplacian support vector machine is difficult to interpret. It also performs poorly when there are many non-informative features in the training data because the final classifier is expressed as a linear combination of informative as well as non-informative features. We introduce a variant of the Laplacian support vector machine that is capable of feature selection based on functional analysis of variance decomposition. Through synthetic and benchmark data analysis, we illustrate that our method can be a useful tool in semi-supervised learning.


Transportation Research Record | 2005

Exploiting Correlations Between Link Flows to Improve Estimation of Average Annual Daily Traffic on Coverage Count Segments: Methodology and Numerical Study

Prem K. Goel; Mark R. McCord; Changyi Park

A method is developed for exploiting correlations among segment flows that result from common origin-destination (O-D) path flows when average annual daily traffic (AADT) is being estimated on highway segments sampled with coverage counts. The method, which can be used with only two daily traffic volumes on the coverage count segment, is based on generalized least squares estimation of AADT, rather than on ordinary least squares estimation, which is traditionally used. The focus is on the correlation between the volumes on a single coverage count segment and a single segment equipped with a continuous automatic traffic recorder (ATR). The performance of this correlation-based method is compared with that of the traditional method through use of thousands of simulated O-D flow replications that are assigned on a small network to determine segment flows. The correlation-based method markedly outperforms the traditional method when the volume on the coverage count segment is highly correlated with that on the ATR segment. When the correlation between the volumes on the coverage count and ATR segments is low, the performance of the two methods is similar. It is expected that future developments for exploiting correlations between volumes on coverage count segments and multiple ATR segments will improve performance further.


Korean Journal of Applied Statistics | 2012

Cutpoint Selection via Penalization in Credit Scoring

Seul-Ki Jin; Kwang-Rae Kim; Changyi Park

In constructing a credit scorecard, each characteristic variable is divided into a few attributes; subsequently, weights are assigned to those attributes in a process called coarse classification. While partitioning a characteristic variable into attributes, one should determine appropriate cutpoints for the partition. In this paper, we propose a cutpoint selection method via penalization. In addition, we compare the performance of the proposed method with that of the classification spline machine (Koo et al., 2009) on both simulated and real credit data.
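
As a concrete picture of coarse classification, the sketch below applies a finished scorecard: cutpoints split each characteristic into attributes, and the score is the sum of the attribute weights. The characteristics, cutpoints, and weights are invented for illustration and are not taken from the paper; choosing the cutpoints is the part the proposed method addresses and is not shown here.

```python
from bisect import bisect_right

# Hypothetical scorecard: characteristic -> (cutpoints, weight per attribute).
# k cutpoints define k + 1 attributes (intervals), so each weight list has k + 1 entries.
SCORECARD = {
    "age": ([25, 40, 60], [10, 25, 40, 30]),
    "income": ([20000, 50000], [5, 20, 35]),
}

def attribute_index(value, cutpoints):
    """Index of the interval containing value; a value equal to a cutpoint goes to the right."""
    return bisect_right(cutpoints, value)

def score(applicant, scorecard=SCORECARD):
    """Sum the weights of the attributes the applicant falls into."""
    total = 0
    for name, (cutpoints, weights) in scorecard.items():
        total += weights[attribute_index(applicant[name], cutpoints)]
    return total

# age 33 falls in [25, 40) -> 25; income 52000 falls in [50000, inf) -> 35
print(score({"age": 33, "income": 52000}))  # prints 60
```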


Journal of Statistical Computation and Simulation | 2009

A classification spline machine for building a credit scorecard

Ja-Yong Koo; Changyi Park; Myoungshic Jhun

In constructing a scorecard, we partition each characteristic variable into a few attributes and assign weights to those attributes. For the task, a simulated annealing algorithm has been proposed. A drawback of simulated annealing is that the number of cutpoints separating each characteristic variable into attributes is required as an input. We introduce a scoring method, called a classification spline machine (CSM), which determines cutpoints automatically via a stepwise basis selection. In this paper, we compare performances of CSM and simulated annealing on simulated datasets. The results indicate that the CSM can be useful in the construction of scorecards.


BioMed Research International | 2015

Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data

Sungho Won; Hosik Choi; Su Yeon Park; Juyoung Lee; Changyi Park; Sunghoon Kwon

Owing to recent improvements in genotyping technology, large-scale genetic data can be utilized to identify disease susceptibility loci, and these findings have substantially improved our understanding of complex diseases. However, in spite of these successes, most of the genetic effects for many complex diseases were found to be very small, which has been a major hurdle to building disease prediction models. Recently, many statistical methods based on penalized regression have been proposed to tackle the so-called “large P and small N” problem. Penalized regressions, including the least absolute shrinkage and selection operator (LASSO) and ridge regression, limit the parameter space, and this constraint enables the estimation of effects for a very large number of SNPs. Various extensions have been suggested, and, in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods, at least for the diseases under consideration.


Communications in Statistics - Simulation and Computation | 2012

Analysis of Survival Data with Group Lasso

Jinseog Kim; Insuk Sohn; Sin-Ho Jung; Sujong Kim; Changyi Park

Identification of genes and clinical covariates that influence patient survival is crucial because it can lead to a better understanding of the underlying mechanisms of diseases and to better prediction models. Most variable selection methods in penalized Cox models cannot deal properly with categorical variables such as gender and family history. The group lasso penalty can combine clinical and genomic covariates effectively. In this article, we introduce an optimization algorithm for Cox regression with a group lasso penalty. We compare our method with other methods on simulated and real microarray data sets.
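
Why a group penalty suits categorical variables can be seen from its proximal operator, sketched below: all coefficients of a group (for example the dummy variables of one categorical covariate) are shrunk by a common factor and are set to zero together. This is a generic building block, not the optimization algorithm introduced in the article; the function name, the sqrt(group size) weighting, and the numbers are assumptions for illustration.

```python
import math

def group_soft_threshold(beta, groups, threshold):
    """Block soft-thresholding: shrink each group's coefficient vector toward zero as a unit."""
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        weight = threshold * math.sqrt(len(idx))  # larger groups get a larger threshold
        scale = max(0.0, 1.0 - weight / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * beta[i]
    return out

# Hypothetical coefficients: two dummy-coded categorical covariates and one gene.
beta = [0.3, -0.4, 3.0, 4.0, 0.05]
groups = [[0, 1], [2, 3], [4]]
shrunk = group_soft_threshold(beta, groups, threshold=0.5)
# The weak first group and the weak single gene are removed entirely;
# the strong second group is kept, with both coefficients scaled by the same factor.
print(shrunk)
```

An ordinary lasso could drop one dummy variable of a categorical covariate while keeping another, which makes the selected model depend on the coding of the variable; group-wise shrinkage avoids that.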

Collaboration


Top Co-Authors

Insuk Sohn

Samsung Medical Center

Sungho Won

Seoul National University

Jungsoo Gim

Seoul National University
