Publication


Featured research published by Heewon Park.


Journal of Statistical Computation and Simulation | 2014

Robust sparse regression and tuning parameter selection via the efficient bootstrap information criteria

Heewon Park; Fumitake Sakaori; Sadanori Konishi

There is currently much discussion about lasso-type regularized regression, which is a useful tool for simultaneous estimation and variable selection. Although lasso-type regularization has several advantages in regression modelling owing to its sparsity, it is sensitive to outliers because it relies on penalized least-squares methods. To overcome this issue, we propose a robust lasso-type estimation procedure that uses robust criteria as the loss function and imposes an L1-type penalty called the elastic net. We also use the efficient bootstrap information criteria to choose optimal regularization parameters and a constant in outlier detection. Simulation studies and real data analysis are given to examine the efficiency of the proposed robust sparse regression modelling. We observe that our modelling strategy performs well in the presence of outliers.


Communications in Statistics - Simulation and Computation | 2016

Robust Coordinate Descent Algorithm Robust Solution Path for High-dimensional Sparse Regression Modeling

Heewon Park; Sadanori Konishi

L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling. Various algorithms have been proposed to solve optimization problems for L1-type regularization. In particular, the coordinate descent algorithm has been shown to be effective in sparse regression modeling. Although the algorithm performs remarkably well in solving optimization problems for L1-type regularization, it is sensitive to outliers, since the procedure is based on the inner product of predictor variables and partial residuals obtained in a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, focusing especially on high-dimensional regression modeling based on the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed robust algorithm. We observe that our robust coordinate descent algorithm performs effectively in high-dimensional regression modeling even in the presence of outliers.
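To make the "inner product of predictor variables and partial residuals" concrete, here is a minimal sketch of the standard, non-robust coordinate descent update for the lasso. It is not the robust algorithm proposed in the paper; the function names, the objective scaling by 1/(2n), and the absence of an intercept are assumptions made for illustration.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Standard (non-robust) coordinate descent for the objective
    (1 / (2n)) * sum_i (y_i - x_i . beta)^2 + lam * sum_j |beta_j|.

    X is a list of n rows with p entries each; no intercept is fitted.
    This is an illustrative sketch, not the paper's robust variant.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # inner product of predictor j with partial residuals
            norm = 0.0  # squared norm of predictor j
            for i in range(n):
                # Prediction that leaves out predictor j's contribution.
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            # A single outlying y_i can dominate rho, which is the
            # sensitivity the abstract refers to.
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return beta
```

Because each update depends on a plain sum over observations, one extreme residual can shift every coefficient, which motivates replacing this inner product with a robust counterpart.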


PLOS ONE | 2015

Recursive Random Lasso (RRLasso) for Identifying Anti-Cancer Drug Targets

Heewon Park; Seiya Imoto; Satoru Miyano

Uncovering driver genes is crucial for understanding heterogeneity in cancer. L1-type regularization approaches have been widely used for uncovering cancer driver genes based on genome-scale data. Although the existing methods have been widely applied in the field of bioinformatics, they possess several drawbacks: subset size limitations, erroneous estimation results, multicollinearity, and heavy time consumption. We introduce a novel statistical strategy, called a Recursive Random Lasso (RRLasso), for high dimensional genomic data analysis and investigation of driver genes. For time-effective analysis, we consider a recursive bootstrap procedure in line with the random lasso. Furthermore, we introduce a parametric statistical test for driver gene selection based on bootstrap regression modeling results. The proposed RRLasso is not only rapid but also performs well for high dimensional genomic data analysis. Monte Carlo simulations and analysis of the “Sanger Genomics of Drug Sensitivity in Cancer dataset from the Cancer Genome Project” show that the proposed RRLasso is an effective tool for high dimensional genomic data analysis. The proposed methods provide reliable and biologically relevant results for cancer driver gene selection.


Journal of Statistical Computation and Simulation | 2016

Robust logistic regression modelling via the elastic net-type regularization and tuning parameter selection

Heewon Park; Sadanori Konishi

Penalized logistic regression is a useful tool for classifying samples and selecting features. Although the methodology has been widely used in various fields of research, its performance takes a sudden turn for the worse in the presence of outliers, since logistic regression is based on the maximum log-likelihood method, which is sensitive to outliers. This implies that we cannot accurately classify samples or find the important factors that carry crucial information for classification. To overcome the problem, we propose a robust penalized logistic regression based on a weighted likelihood methodology. We also derive an information criterion, in line with generalized information criteria, for choosing the tuning parameters, which is a vital matter in robust penalized logistic regression modelling. We demonstrate through Monte Carlo simulations and a real-world example that the proposed robust modelling strategies perform well for sparse logistic regression modelling even in the presence of outliers.
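As a rough illustration of what a weighted likelihood with an elastic net-type penalty can look like, the sketch below evaluates a weighted negative log-likelihood plus penalty for a given coefficient vector. The function name, the parameterization of the penalty by lam and alpha, and the way the observation weights are supplied are all assumptions; the abstract does not specify how the paper constructs its weights.

```python
import math


def weighted_penalized_nll(beta, X, y, weights, lam, alpha):
    """Weighted negative log-likelihood of logistic regression plus an
    elastic net penalty lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).

    weights[i] in [0, 1] down-weights observation i (hypothetical: a small
    weight would be given to a suspected outlier). No intercept is used.
    """
    nll = 0.0
    for x_i, y_i, w_i in zip(X, y, weights):
        eta = sum(b * x for b, x in zip(beta, x_i))
        # Numerically stable log(1 + exp(eta)).
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        nll += w_i * (log1pexp - y_i * eta)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return nll + lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)
```

Setting every weight to 1 recovers the usual penalized logistic objective, so the weights are the only place where robustness enters in this sketch.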


PLOS ONE | 2014

Robust Prediction of Anti-Cancer Drug Sensitivity and Sensitivity-Specific Biomarker

Heewon Park; Teppei Shimamura; Satoru Miyano; Seiya Imoto

The personal genomics era has drawn a large amount of attention to patient-specific analysis for anti-cancer therapy. Patient-specific analysis enables discovery of individual genomic characteristics for each patient, and thus we can effectively predict individual genetic risk of disease and perform personalized anti-cancer therapy. Although the existing methods for patient-specific analysis have successfully uncovered crucial biomarkers, their performance takes a sudden turn for the worse in the presence of outliers, since the methods are based on non-robust procedures. In practice, clinical and genomic alteration datasets usually contain outliers from various sources (e.g., experiment error, coding error, etc.), and the outliers may significantly affect the result of patient-specific analysis. We propose a robust methodology for patient-specific analysis in line with the NetworkProfiler. In the proposed method, outliers in high dimensional gene expression levels and drug response datasets are simultaneously controlled by the robust Mahalanobis distance in robust principal component space. Thus, we can effectively predict anti-cancer drug sensitivity and identify sensitivity-specific biomarkers for individual patients. We observe through Monte Carlo simulations that the proposed robust method produces outstanding performance in predicting the response variable in the presence of outliers. We also apply the proposed methodology to the Sanger dataset in order to uncover cancer biomarkers and predict anti-cancer drug sensitivity, and show the effectiveness of our method.


Journal of Computational Biology | 2017

Interaction-Based Feature Selection for Uncovering Cancer Driver Genes Through Copy Number-Driven Expression Level

Heewon Park; Atsushi Niida; Seiya Imoto; Satoru Miyano

Driver gene selection is crucial to understanding the heterogeneous system of cancer. To identify cancer driver genes, various statistical strategies have been proposed, and the L1-type regularization methods in particular have drawn a large amount of attention. However, the statistical approaches have been developed purely from algorithmic and statistical points of view, and the existing studies have applied them to genomic data analysis without consideration of biological knowledge. We consider a statistical strategy that incorporates biological knowledge to identify cancer driver genes. Copy number alterations have been considered to drive cancer pathogenesis processes, and regions of strong interaction between copy number alterations and expression levels are known as a tumor-related symptom. We incorporate the influence of copy number alterations on expression levels into cancer driver gene-selection processes. To quantify the dependence of copy number alterations on expression levels, we consider [Formula: see text] and [Formula: see text] effects of copy number alterations on expression levels of genes, and incorporate the symptom of tumor pathogenesis into gene-selection procedures. We then propose an interaction-based feature-selection strategy based on the adaptive L1-type regularization and random lasso procedures. The proposed method imposes a large penalty on genes corresponding to a low dependency between the two features, so the coefficients of those genes are estimated to be small or exactly 0. This implies that the proposed method can provide biologically relevant results in cancer driver gene selection. Monte Carlo simulations and analysis of the Cancer Genome Atlas (TCGA) data show that the proposed strategy is effective for high-dimensional genomic data analysis. Furthermore, the proposed method provides reliable and biologically relevant results for cancer driver gene selection in TCGA data analysis.


Journal of Statistical Computation and Simulation | 2017

Outlier-resistant high-dimensional regression modelling based on distribution-free outlier detection and tuning parameter selection

Heewon Park

L1-type regularization is a useful tool for high-dimensional regression modelling. Although the L1-type approaches perform well in regression modelling, the methods are sensitive to outliers, since the L1-type approaches are based on non-robust methods (e.g. the least squares loss function). To resolve this drawback, we propose a robust L1-type regularization method based on a distribution-free outlier detection measure. We consider outlier detection in principal component spaces (PCSs) to overcome the dimensionality problem of high-dimensional data, and propose a novel cut-off value based on a non-parametric test. By using the distribution-free outlier detection measure, we can effectively detect outliers in PCS without a distributional assumption on the Mahalanobis distance. We then propose a robust L1-type regularization method via a weighted elastic net. Tuning parameter selection is a vital matter in L1-type regularized regression modelling, since choosing the tuning parameters can be seen as variable selection and model estimation. We derive an information criterion to select the tuning parameters of the proposed robust L1-type regularization method. Monte Carlo simulations and NCI60 data analysis show that the proposed robust regression modelling strategies perform effectively in high-dimensional regression modelling, even in the presence of outliers.


Communications for Statistical Applications and Methods | 2014

Forecasting Symbolic Candle Chart-Valued Time Series

Heewon Park; Fumitake Sakaori

This study introduces a new type of symbolic data, a candle chart-valued time series. We aggregate four stock indices (i.e., open, close, highest and lowest) into one data point to summarize a huge amount of data. In other words, we consider a candle chart, which is constructed from the open, close, highest and lowest stock indices, as a type of symbolic data for a long period. The proposed candle chart-valued time series effectively summarizes and visualizes a huge data set of stock indices, making changes in stock indices easy to understand. We also propose novel approaches for candle chart-valued time series modeling based on a combination of two midpoints and two half ranges: between the highest and the lowest indices, and between the open and the close indices. Furthermore, we propose three types of sums of squares for estimation of the candle chart-valued time series model. The proposed methods take into account information not only from ordinary data but also from the intervals of objects, and thus can perform effectively in time series modeling (e.g., forecasting future stock indices). To evaluate the proposed methods, we describe a real data analysis of the stock market indices of five major Asian countries. We can see through the results that the proposed approaches outperform classical data analysis in forecasting future stock indices.
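The midpoint and half-range representation mentioned above can be sketched as follows. The Candle type, the function name, and the dictionary keys are hypothetical, and the open-close half range is kept signed here (positive when the close is above the open) as an assumption; the abstract does not state which convention the paper uses.

```python
from typing import NamedTuple


class Candle(NamedTuple):
    """One candle: open, close, highest and lowest index for a period."""
    open: float
    close: float
    high: float
    low: float


def midpoints_and_half_ranges(c: Candle) -> dict:
    """Re-express a candle as two midpoints and two half ranges."""
    return {
        "mid_high_low": (c.high + c.low) / 2.0,
        "half_range_high_low": (c.high - c.low) / 2.0,
        "mid_open_close": (c.open + c.close) / 2.0,
        # Signed by assumption, so the direction of the candle is preserved.
        "half_range_open_close": (c.close - c.open) / 2.0,
    }
```

For a candle with open 100, close 104, high 106 and low 98, both midpoints are 102, the high-low half range is 4 and the open-close half range is 2; the original four values can be recovered from these four numbers.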


Computational Statistics | 2013

Lag weighted lasso for time series model

Heewon Park; Fumitake Sakaori


Journal of Computational Biology | 2015

Sparse Overlapping Group Lasso for Integrative Multi-Omics Analysis

Heewon Park; Atsushi Niida; Satoru Miyano; Seiya Imoto
