Sooyoung Cheon
Korea University
Publication
Featured research published by Sooyoung Cheon.
Computational Statistics & Data Analysis | 2010
Sooyoung Cheon; Jae-Hee Kim
Bayesian multiple change-point models are proposed for multivariate means. The models require that the data be from a multivariate normal distribution with a truncated Poisson prior for the number of change-points and conjugate priors for the distributional parameters. We apply the stochastic approximation Monte Carlo (SAMC) algorithm to the multiple change-point detection problems. Numerical results show that SAMC makes a significant improvement over RJMCMC for complex Bayesian model selection problems in change-point estimation.
Molecular Phylogenetics and Evolution | 2009
Sooyoung Cheon; Faming Liang
Monte Carlo methods have received much attention in the recent literature of phylogeny analysis. However, the conventional Markov chain Monte Carlo algorithms, such as the Metropolis-Hastings algorithm, tend to get trapped in a local mode in simulating from the posterior distribution of phylogenetic trees, rendering the inference ineffective. In this paper, we apply an advanced Monte Carlo algorithm, the stochastic approximation Monte Carlo algorithm, to Bayesian phylogeny analysis. Our method is compared with two popular Bayesian phylogeny software packages, BAMBE and MrBayes, on simulated and real datasets. The numerical results indicate that our method outperforms BAMBE and MrBayes. Among the three methods, SAMC produces the consensus trees with the highest similarity to the true trees and the model parameter estimates with the smallest mean square errors, while costing the least CPU time.
BioSystems | 2011
Sooyoung Cheon; Faming Liang
Recently, the stochastic approximation Monte Carlo algorithm has been proposed by Liang et al. (2007) as a general-purpose stochastic optimization and simulation algorithm. An annealing version of this algorithm was developed for real small protein folding problems. The numerical results indicate that it outperforms simulated annealing and conventional Monte Carlo algorithms as a stochastic optimization algorithm. We also propose one method for the use of secondary structures in protein folding. The predicted protein structures are rather close to the true structures.
Journal of Time Series Analysis | 2010
Jaehee Kim; Sooyoung Cheon
This article provides a new Bayesian approach for AR(2) time-series models with multiple regime-switching points. Our formulation of the regime-switching model involves a binary discrete variable that indicates the regime change. This variable is specified to be detected by data in each regime. The model is estimated using the stochastic approximation Monte Carlo method proposed by Liang et al. [JASA (2007)]. This methodology is quite useful since it allows for fitting of more complex regime-switching models without transition constraints. The proposed model is illustrated using simulated and real data such as GNP and US interest rate data.
Statistics and Computing | 2014
Sooyoung Cheon; Faming Liang; Yuguo Chen; Kai Yu
Importance sampling and Markov chain Monte Carlo methods have been used in exact inference for contingency tables for a long time; however, their performance is not always satisfactory. In this paper, we propose a stochastic approximation Monte Carlo importance sampling (SAMCIS) method for tackling this problem. SAMCIS is a combination of adaptive Markov chain Monte Carlo and importance sampling, which employs the stochastic approximation Monte Carlo algorithm (Liang et al., J. Am. Stat. Assoc., 102(477):305–320, 2007) to draw samples from an enlarged reference set with a known Markov basis. Compared to the existing importance sampling and Markov chain Monte Carlo methods, SAMCIS has a few advantages, such as fast convergence, ergodicity, and the ability to achieve a desired proportion of valid tables. The numerical results indicate that SAMCIS can outperform the existing importance sampling and Markov chain Monte Carlo methods: it can produce much more accurate estimates in much shorter CPU time than the existing methods, especially for tables with high degrees of freedom.
Korean Journal of Applied Statistics | 2006
Seuck-Heun Song; Sooyoung Cheon
This paper considers a panel regression model with ill-posed data and proposes the generalized maximum entropy (GME) estimator of the unknown parameters. These are natural extensions from the biometrics, statistics and econometrics literature. The performance of this estimator is investigated using Monte Carlo experiments. The results indicate that the GME method performs the best in estimating the unknown parameters.
Journal of Statistical Computation and Simulation | 2017
Hwa Kyung Lim; Naveen N. Narisetty; Sooyoung Cheon
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.
Journal of Physics: Conference Series | 2009
Faming Liang; Sooyoung Cheon
The problem of simulating from distributions with intractable normalizing constants has received much attention in the recent literature. In this paper, we propose a new MCMC algorithm, the so-called Monte Carlo dynamically weighted importance sampler, for tackling this problem. The new algorithm is illustrated with the spatial autologistic models. The novelty of our algorithm is that it allows for the use of Monte Carlo estimates in MCMC simulations, while still leaving the target distribution invariant under the criterion of dynamically weighted importance sampling. Unlike the auxiliary variable MCMC algorithms, the new algorithm removes the need for perfect sampling, and thus can be applied to a wide range of problems for which perfect sampling is not available or very expensive. The new algorithm can also be used for simulating from the incomplete posterior distribution for the missing data problem.
Communications in Statistics - Simulation and Computation | 2018
Byoung Cheol Jung; Sooyoung Cheon; Hwa Kyung Lim
The estimation of the mixtures of regression models is usually based on the normal assumption of components, and maximum likelihood estimation of the normal components is sensitive to noise, outliers, or high-leverage points. Missing values are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this article, we propose the mixtures of regression models for contaminated incomplete heterogeneous data. The proposed models provide robust estimates of regression coefficients varying across latent subgroups even under the presence of missing values. The methodology is illustrated through simulation studies and a real data analysis.
Journal of Statistical Computation and Simulation | 2017
Hwa Kyung Lim; Jaejun Lee; Sooyoung Cheon
In the expectation–maximization (EM) algorithm for maximum likelihood estimation from incomplete data, Markov chain Monte Carlo (MCMC) methods have been used in change-point inference for a long time when the expectation step is intractable. However, the conventional MCMC algorithms tend to get trapped in a local mode in simulating from the posterior distribution of change points. To overcome this problem, in this paper we propose a stochastic approximation Monte Carlo version of EM (SAMCEM), which is a combination of adaptive Markov chain Monte Carlo and EM utilizing a maximum likelihood method. SAMCEM is compared with the stochastic approximation version of EM and reversible jump Markov chain Monte Carlo version of EM on simulated and real datasets. The numerical results indicate that SAMCEM can outperform the other two methods by producing much more accurate parameter estimates and by obtaining change-point positions and estimates simultaneously.