Cheolwoo Park
University of Georgia
Publications
Featured research published by Cheolwoo Park.
Computer Networks | 2005
Stilian Stoev; Murad S. Taqqu; Cheolwoo Park; J. S. Marron
The fluctuations of Internet traffic possess an intricate structure which cannot be simply explained by long-range dependence and self-similarity. In this work, we explore the use of the wavelet spectrum, whose slope is commonly used to estimate the Hurst parameter of long-range dependence. We show that much more than simple slope estimates are needed for detecting important traffic features. In particular, the multi-scale nature of the traffic does not admit simple description of the type attempted by the Hurst parameter. By using simulated examples, we demonstrate the causes of a number of interesting effects in the wavelet spectrum of the data. This analysis leads us to a better understanding of several challenging phenomena observed in real network traffic. Although the wavelet analysis is robust to many smooth trends, high-frequency oscillations and non-stationarities such as abrupt changes in the mean have an important effect. In particular, the breaks and level-shifts in the local mean of the traffic rate can lead one to overestimate the Hurst parameter of the time series. Novel statistical techniques are required to address such issues in practice.
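The effect of a level shift on a slope-based Hurst estimate can be illustrated with a minimal sketch. It is not the authors' code: it assumes a plain orthonormal Haar transform, an unweighted least-squares fit over levels 3 and above, and the usual relation H = (slope + 1) / 2 for a stationary long-range dependent series. The function names, the seed, and the shift size are all illustrative choices.

```python
import math
import random

def haar_log_spectrum(x, min_coeffs=8):
    """Return [(level, log2 of the mean squared Haar detail coefficient), ...]."""
    approx = [float(v) for v in x]
    spectrum = []
    level = 1
    # Stop once a level would have fewer than min_coeffs detail coefficients.
    while len(approx) // 2 >= min_coeffs:
        pairs = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(pairs)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(pairs)]
        energy = sum(d * d for d in detail) / pairs
        spectrum.append((level, math.log2(energy)))
        level += 1
    return spectrum

def hurst_from_slope(x, first_level=3):
    """Least-squares slope of the log spectrum, mapped to H = (slope + 1) / 2."""
    points = [(j, e) for j, e in haar_log_spectrum(x) if j >= first_level]
    mean_j = sum(j for j, _ in points) / len(points)
    mean_e = sum(e for _, e in points) / len(points)
    slope = (sum((j - mean_j) * (e - mean_e) for j, e in points)
             / sum((j - mean_j) ** 2 for j, _ in points))
    return (slope + 1) / 2

rng = random.Random(0)
n = 2 ** 14
# Independent Gaussian noise: no long-range dependence, so H should be near 0.5.
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Same noise with a single shift in the mean (assumed size: 2 standard deviations).
shifted = [v + (2.0 if t >= n // 3 else 0.0) for t, v in enumerate(noise)]

h_noise = hurst_from_slope(noise)
h_shifted = hurst_from_slope(shifted)
print(f"white noise:      H ~ {h_noise:.2f}")
print(f"with level shift: H ~ {h_shifted:.2f}")
```

The shift is placed at a non-dyadic position on purpose: a break that coincides with a Haar block boundary would be invisible to the detail coefficients at the levels whose blocks it aligns with. The shifted series contains no long-range dependence, yet the coarse-scale energy added by the break steepens the fitted line and raises the estimate.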
Computational Statistics & Data Analysis | 2006
Stilian Stoev; Murad S. Taqqu; Cheolwoo Park; George Michailidis; J. S. Marron
The Hurst parameter H characterizes the degree of long-range dependence (and asymptotic self-similarity) in stationary time series. Many methods have been developed for the estimation of H from data. In practice, however, the classical estimation techniques can be severely affected by non-stationary artifacts in the time series. In fact, the assumption that the data can be modeled by a stationary process with a single Hurst exponent H may be unrealistic. This work focuses on practical issues associated with the detection of long-range dependence in Internet traffic data and proposes two tools that can be used to address some of these issues. The first is an animation tool which is used to visualize the local dependence structure. The second is a statistical tool for the local analysis of self-similarity (LASS). The LASS tool is designed to handle time series that have long-range dependence and are long enough that some parts are essentially stationary, while others exhibit non-stationarity, which is either deterministic or stochastic in nature. The tool exploits wavelets to analyze the local dependence structure in the data over a set of windows. It can be used to visualize local deviations from self-similar, long-range dependence scaling and to provide reliable local estimates of the Hurst exponents. The tool, which is illustrated by using a trace of Internet traffic measurements, can also be applied to economic time series. In addition, a median-based wavelet spectrum is introduced. It yields robust local or global estimates of the Hurst parameter that are less susceptible to local non-stationarity. The software tools are freely available and their use is described in an appendix.
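One way to picture the median-based spectrum is sketched below. The abstract does not give the definition, so this is only a plausible reading: the mean of the squared wavelet coefficients at each level is replaced by their median. The Haar transform, the function names, the fitting range, and the simulated series are assumptions made for illustration and are not the LASS software.

```python
import math
import random
import statistics

def haar_details(x, min_coeffs=8):
    """Yield (level, detail coefficients) from an orthonormal Haar transform."""
    approx = [float(v) for v in x]
    level = 1
    while len(approx) // 2 >= min_coeffs:
        pairs = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(pairs)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(pairs)]
        yield level, detail
        level += 1

def hurst(x, summary, first_level=3):
    """Regress log2(summary of squared details) on level; return (slope + 1) / 2."""
    pts = [(j, math.log2(summary([d * d for d in det])))
           for j, det in haar_details(x) if j >= first_level]
    mean_j = sum(j for j, _ in pts) / len(pts)
    mean_e = sum(e for _, e in pts) / len(pts)
    slope = (sum((j - mean_j) * (e - mean_e) for j, e in pts)
             / sum((j - mean_j) ** 2 for j, _ in pts))
    return (slope + 1) / 2

rng = random.Random(1)
n = 2 ** 14
# Independent noise (H = 0.5) contaminated by one shift in the mean.
series = [rng.gauss(0.0, 1.0) + (2.0 if t >= n // 3 else 0.0) for t in range(n)]

h_mean = hurst(series, statistics.fmean)     # classical, mean-based spectrum
h_median = hurst(series, statistics.median)  # median-based variant (assumed form)
print(f"mean-based:   H ~ {h_mean:.2f}")
print(f"median-based: H ~ {h_median:.2f}")
```

A single break disturbs only one coefficient per level, so the median largely ignores it while the mean does not. Only the slope enters the estimate, so the constant offset between the median and the mean of squared Gaussian coefficients does not matter here.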
Journal of Applied Statistics | 2011
Cheolwoo Park; Félix Hernández-Campos; Long Le; J. S. Marron; Juhyun Park; Vladas Pipiras; F.D. Smith; Richard L. Smith; Michele Trovero; Zhengyuan Zhu
Long-range-dependent time series are endemic in the statistical analysis of Internet traffic. The Hurst parameter provides a good summary of important self-similar scaling properties. We compare a number of different Hurst parameter estimation methods and some important variations. This is done in the context of a wide range of simulated, laboratory-generated, and real data sets. Important differences between the methods are highlighted. Deep insights are revealed on how well the laboratory data mimic the real data. Non-stationarities, which are local in time, are seen to be central issues and lead to both conceptual and practical recommendations.
Journal of Applied Statistics | 2004
Cheolwoo Park; J. S. Marron; Vitaliana Rondonotti
In this paper, we extend SiZer (SIgnificant ZERo crossing of the derivatives) to dependent data for the purpose of goodness-of-fit tests for time series models. Dependent SiZer compares the observed data with a specific null model being tested by adjusting the statistical inference using an assumed autocovariance function. This new approach uses a SiZer-type visualization to flag statistically significant differences between the data and a given null model. The power of this approach is demonstrated through some examples of time series of Internet traffic data. It is seen that such time series can have even more burstiness than is predicted by the popular, long-range dependent, Fractional Gaussian Noise model.
Computational Statistics & Data Analysis | 2008
Cheolwoo Park; Kee-Hoon Kang
In this article we introduce a graphical method for testing the equality of two regression curves. Our method is based on SiZer (SIgnificant ZERo crossing of the differences) analysis, which is a scale-space visualization tool for statistical inference. The proposed method does not require any specification of smoothing parameters; instead, it offers a device for comparing the curves over a wide range of resolutions. This enables us to find the differences between two curves that are present at each resolution level. The proposed method is also extended to the comparison of more than two regression curves using residual analysis. A broad simulation study is conducted to demonstrate the sample performance of the proposed tool. Applications to two real examples are also included.
Stochastic Models | 2005
Félix Hernández-Campos; Cheolwoo Park; J. S. Marron; Sidney I. Resnick
For bivariate heavy tailed data, the extremes may carry distinctive dependence information not seen from moderate values. For example, a large value in one component may help cause a large value in the other. This is the idea behind the notion of extremal dependence. We discuss ways to detect and measure extremal dependence. We apply the techniques discussed to Internet data and conclude that for files transferred, file size and throughput (the inferred rate at which the file is transferred) exhibit extremal independence.
Computational Statistics & Data Analysis | 2007
Cheolwoo Park; Fred Godtliebsen; Murad S. Taqqu; Stilian Stoev; J. S. Marron
SiZer (SIgnificant ZERo crossing of the derivatives) and SiNos (SIgnificant NOn-Stationarities) are scale-space based visualization tools for statistical inference. They are used to discover meaningful structure in data through exploratory analysis involving statistical smoothing techniques. Wavelet methods have been successfully used to analyze various types of time series. In this paper, we propose a new time series analysis approach, which combines wavelet analysis with the visualization tools SiZer and SiNos. We use certain functions of wavelet coefficients at different scales as inputs, and then apply SiZer or SiNos to highlight potential non-stationarities. We show that this new methodology can reveal hidden local non-stationary behavior of time series that is otherwise difficult to detect.
Journal of Statistical Computation and Simulation | 2013
Young Joo Yoon; Cheolwoo Park; Taewook Lee
Penalized regression methods have recently gained enormous attention in statistics and the field of machine learning due to their ability to reduce the prediction error and identify important variables at the same time. Numerous studies have been conducted for penalized regression, but most of them are limited to the case when the data are independently observed. In this paper, we study a variable selection problem in penalized regression models with autoregressive (AR) error terms. We consider three estimators, adaptive least absolute shrinkage and selection operator, bridge, and smoothly clipped absolute deviation, and propose a computational algorithm that enables us to select a relevant set of variables and also the order of AR error terms simultaneously. In addition, we provide their asymptotic properties such as consistency, selection consistency, and asymptotic normality. The performances of the three estimators are compared with one another using simulated and real examples.
Journal of Computational and Graphical Statistics | 2010
Cheolwoo Park; Thomas C. M. Lee; Jan Hannig
The SiZer methodology proposed by Chaudhuri and Marron (1999) is a valuable tool for conducting exploratory data analysis. Since its inception, different versions of SiZer have been proposed in the literature. Most of these SiZer variants target the mean structure of the data and cannot provide any information about its quantile composition. To fill this need, this article proposes a quantile version of SiZer for the regression setting. By inspecting the SiZer maps produced by this new SiZer, real quantile structures hidden in a dataset can be more effectively revealed, while at the same time spurious features can be filtered out. The utility of this quantile SiZer is illustrated via applications to both real data and simulated examples. This article has supplementary material online.
Statistical Analysis and Data Mining | 2012
Jeongyoun Ahn; Muliang Peng; Cheolwoo Park; Yongho Jeon
We consider interval-valued data that frequently appear with advanced technologies in current data collection processes. Interval-valued data refer to the data that are observed as ranges instead of single values. In the last decade, several approaches to the regression analysis of interval-valued data have been introduced, but little work has been done on relevant statistical inferences concerning the regression model. In this paper, we propose a new approach to fit a linear regression model to interval-valued data using a resampling idea. A key advantage is that it enables one to make inferences on the model such as the overall model significance test and individual coefficient test. We demonstrate the proposed approach using simulated and real data examples, and also compare its performance with those of existing methods.
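The resampling idea can be sketched roughly as follows. The abstract does not spell out the procedure, so the scheme here is an assumption made for illustration: draw one point uniformly from each observed interval, fit ordinary least squares to the drawn points, and repeat. The simulated data, the function names, and the number of draws are all hypothetical, and the paper's own inference procedure may differ.

```python
import random
import statistics

def ols(xs, ys):
    """Simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def resampled_fits(x_intervals, y_intervals, draws=200, seed=0):
    """Fit OLS to `draws` point data sets sampled uniformly inside the intervals."""
    rng = random.Random(seed)
    fits = []
    for _ in range(draws):
        xs = [rng.uniform(lo, hi) for lo, hi in x_intervals]
        ys = [rng.uniform(lo, hi) for lo, hi in y_intervals]
        fits.append(ols(xs, ys))
    return fits

# Hypothetical interval data whose centers follow y = 1 + 2x plus Gaussian noise.
data_rng = random.Random(1)
x_intervals, y_intervals = [], []
for _ in range(200):
    xc = data_rng.uniform(0.0, 10.0)
    yc = 1.0 + 2.0 * xc + data_rng.gauss(0.0, 1.0)
    x_intervals.append((xc - 0.5, xc + 0.5))
    y_intervals.append((yc - 1.0, yc + 1.0))

fits = resampled_fits(x_intervals, y_intervals)
intercept = statistics.fmean(a for a, _ in fits)
slope = statistics.fmean(b for _, b in fits)
slope_spread = statistics.stdev(b for _, b in fits)
print(f"intercept ~ {intercept:.2f}, slope ~ {slope:.2f} (spread over draws {slope_spread:.3f})")
```

Averaging over the draws gives point estimates. The spread of the slope across draws reflects only the uncertainty from the interval widths, not the sampling variability of the observations.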
