Moon Yul Huh | Researchain

Publication

Featured research published by Moon Yul Huh.

Computational Statistics & Data Analysis | 2003

A measure of association for complex data

Seung-Chun Lee; Moon Yul Huh

A measure of association for complex data types is proposed based on the measure of departure from independence using the p-value of a statistical independence test. The measure is numerically shown to be comparable to Silvey's general measure of association. It is demonstrated with real data sets of complex data types that the measure works efficiently for decision trees and logistic regression at the initial stage of variable selection.
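As a rough illustration of the idea of scoring association through the p-value of an independence test, the sketch below handles only the simplest case, a 2x2 contingency table with Pearson's chi-square test. The abstract does not give the paper's actual definition, so the score `1 - p` and the function names `chi2_pvalue_2x2` and `association_score` are assumptions made for this example, not the authors' measure.

```python
from math import erfc, sqrt


def chi2_pvalue_2x2(table):
    """p-value of Pearson's chi-square independence test for a 2x2 table.

    `table` is ((a, b), (c, d)) of non-negative counts.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a positive total")
    # Pearson statistic for a 2x2 table (no continuity correction).
    stat = n * (a * d - b * c) ** 2 / (
        row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
    )
    # Chi-square survival function with 1 degree of freedom.
    return erfc(sqrt(stat / 2))


def association_score(table):
    """Assumed illustrative score: 1 - p, larger means further from independence."""
    return 1.0 - chi2_pvalue_2x2(table)


balanced = ((10, 10), (10, 10))   # counts consistent with independence
diagonal = ((20, 0), (0, 20))     # perfectly associated counts
print(association_score(balanced), association_score(diagonal))
```

A p-value-based score of this kind depends on the sample size as well as on the strength of the dependence, which is one reason it suits a first-pass screening of variables rather than a final effect-size estimate.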

IEEE Transactions on Knowledge and Data Engineering | 2016

Booster in High Dimensional Data Classification

Hyun-Ji Kim; Byong Su Choi; Moon Yul Huh

Classification problems in high dimensional data with a small number of observations are becoming more common, especially in microarray data. During the last two decades, many efficient classification models and feature selection (FS) algorithms have been proposed for higher prediction accuracies. However, the result of an FS algorithm based on the prediction accuracy will be unstable over the variations in the training set, especially in high dimensional data. This paper proposes a new evaluation measure, the Q-statistic, that incorporates the stability of the selected feature subset in addition to the prediction accuracy. Then, we propose the Booster of an FS algorithm that boosts the value of the Q-statistic of the algorithm applied. Empirical studies based on synthetic data and 14 microarray data sets show that Booster boosts not only the value of the Q-statistic but also the prediction accuracy of the algorithm applied unless the data set is intrinsically difficult to predict with the given algorithm.

Computational Statistics & Data Analysis | 2007

Editorial: Special issue on statistical algorithms and software in R

Cristian Gatu; James E. Gentle; John Hinde; Moon Yul Huh

The journal of Computational Statistics and Data Analysis (CSDA) regularly publishes papers with a strong algorithmic and software component. Some recent CSDA related articles can be found in Bustos and Frery (2006), Hammill and Preisser (2006), Keeling and Pavur (2007), Novikov and Oberman (2007), Rosenthal (2007) and Tomasi and Bro (2006). This is the first CSDA special issue on Statistical Algorithms and Software. It brings together a number of contributions that relate to statistical software, algorithms, and methodology. The first five papers are concerned with statistical software and packages. Krätzig (2007) introduces an open-source Java software framework, JStatCom, that aims to support the development of rich desktop clients for data analysis. In the second article, Aluja-Banet et al. (2007) present a system for multipurpose data fusion based on the k-nearest neighbor hot-deck imputation method. Fujiwara et al. (2007) have implemented a general statistical language using Java and MathML technologies. A package for co-breaking analysis, COBRA, is presented by Massmann (2007). Finally, Höhle and Feldmann (2007) describe an R package, RLadyBug, for the simulation, visualization and estimation of stochastic epidemic models. The second part is concerned with statistical algorithms. Chavent et al. (2007) present a divisive hierarchical clustering algorithm, DIVCLUS-T, based on a monothetic bipartitional approach, allowing the dendrogram of the hierarchy to be read as a decision tree. Fernando and Kulatunga (2007) describe a Fortran program for the fitting of multivariate isotonic regression. Adaptive population-based search algorithms for the estimation of nonlinear regression parameters are proposed and implemented by Tvrdik et al. (2007). Gramm et al. (2007) provide a comparison and evaluation of algorithms for compact letter displays. Beninel and Grelaud (2007) devise algorithms for computing exact distribution values of statistics-linear combination of 3-nomial variables. The third and last part of the special issue gathers contributions related to methodological algorithms. Park et al. (2007) present an algorithm for sampling streaming data with replacement. Bernholt et al. (2007) describe algorithms for computing the least quartile difference estimator in the plane. Several applications of random recursive partitioning are discussed by Iacus and Porro (2007). The article by Consonni and Marin (2007) illustrates the behavior of mean-field variational Bayesian inference in the setting of the probit model. Finally, Gatu et al. (2007) introduce a graph approach to the combinatorial problem of subset regression model selection.

Archive | 2004

Line Mosaic Plot: Algorithm and Implementation

Moon Yul Huh

The conventional mosaic plot graphically represents contingency tables by tiles whose sizes are proportional to the cell counts. The plot is informative when the reader is well trained in reading it. This paper introduces a new approach to the mosaic plot, called the line mosaic plot, which uses lines instead of tiles to represent the size of the cells in contingency tables. We also give a general, straightforward algorithm to construct the plot directly from the data set, whereas the conventional approach constructs the plot from the cross tabulation. We demonstrate the effectiveness of this tool for visual inference using a real data set.

Computational Statistics | 2004

Adding visualization functions of DAVIS to Jasp: Mixing two Java-based statistical systems

Junji Nakano; Moon Yul Huh; Yoshikazu Yamamoto; Takeshi Fujiwara; Ikunori Kobayashi

Summary: Jasp is an experimental general purpose Java-based statistical system which adopts several new computing technologies. It has a function-based and object-oriented language, an advanced user interface, flexible extensibility and a server/client architecture with distributed computing abilities. DAVIS is, on the other hand, a stand-alone Java-based system, and is designed for providing advanced data visualization functions with easy operations by a GUI. In this paper, it is made possible to use tools of DAVIS from within Jasp, in order that the new integrated system can handle not only data filtering and statistical analysis but also data visualization. We develop a mechanism for extending the server/client system of Jasp to realize an efficient collaboration with DAVIS in the client-side. It is shown that the mechanism is straightforward and simple.

Archive | 2006

Subset selection algorithm based on mutual information

Moon Yul Huh

The best subset selection problem is one of the classical problems in statistics and in data mining. When the variables of concern are continuous, it is the classical problem of regression analysis. Most data mining techniques, including decision trees, are designed to handle discrete variables only. With complex data, most data mining techniques first transform continuous variables into discrete variables before they are applied. Hence the result depends on the discretization method used. This paper proposes an algorithm to select a best subset using the original data set. The algorithm is based on mutual information (MI) introduced by Shannon [Shan48]. It computes MIs of up to two-dimensional variables: both continuous, both discrete, or one continuous and one discrete. It has an automatic stopping criterion that applies once an appropriate subset has been selected.
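For readers unfamiliar with mutual information, the sketch below computes the standard plug-in estimate of I(X;Y) for the discrete-discrete case only, using the sample frequencies as probabilities. It is not the paper's algorithm: the continuous and mixed cases, the subset search, and the stopping criterion are not covered, and the function name `mutual_information` is chosen for this example.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    count_x = Counter(xs)          # marginal counts of x
    count_y = Counter(ys)          # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of counts.
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi


# A variable shares log(2) nats with a copy of itself when it is a fair coin,
# and 0 nats with a variable whose sample is exactly balanced against it.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

In a variable-selection setting, a score like this would be computed between each candidate variable and the class variable, with higher values indicating more informative candidates.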

Communications for Statistical Applications and Methods | 2009

Variable Selection Based on Mutual Information

Moon Yul Huh; Byong Su Choi

A best subset selection procedure based on mutual information (MI) between a set of explanatory variables and a dependent class variable is suggested. The derivation of multivariate MI is based on normal mixtures, and several types of normal mixtures are proposed. A best subset selection algorithm is also proposed. Four real data sets are employed to demonstrate the efficiency of the proposals.

Communications for Statistical Applications and Methods | 2004

Principles of Multivariate Data Visualization

Moon Yul Huh; Woon Ock Cha

Data visualization applies automation and discovery processes to data sets in an effort to uncover the information underlying the data. It provides rich visual depictions of the data. It has distinct advantages over traditional data analysis techniques, such as the ability to explore the structure of data sets that are large in both the number of observations and the number of variables, by allowing rich interaction between the data and the end user. We discuss the principles of data visualization and evaluate the characteristics of various visualization tools according to these principles.

Communications for Statistical Applications and Methods | 2003

Evaluation of Attribute Selection Methods and Prior Discretization in Supervised Learning

Woon Ock Cha; Moon Yul Huh

We evaluated the efficiencies of applying attribute selection methods and prior discretization to supervised learning, modelled by C4.5 and Naive Bayes. Three databases were obtained from the UCI data archive, each consisting of continuous attributes except for one decision attribute. Four methods were used for attribute selection: MDI, ReliefF, Gain Ratio and the Consistency-based method. MDI and ReliefF can be used for both continuous and discrete attributes, but the other two methods can be used only for discrete attributes. Discretization was performed using the Fayyad and Irani method. To investigate the effect of noise in the database, noise was introduced into the data sets at levels of 10% or 20%, and then both the noisy and the noise-free data were processed through the steps of attribute selection, discretization and classification. The results of this study indicate that classification of the data based on selected attributes yields higher accuracy than classification of the full data set, and that prior discretization does not lower the accuracy.

Communications for Statistical Applications and Methods | 2003

Contour Plot to Explore the Structure of Categorical Data

Hyun-Chul Kim; Moon Yul Huh; Hee Suk Chung

In this paper, the contour plot is considered as a method to explore the structure of categorical data. For this purpose, the paper suggests a method to sort a two-way contingency table with respect to the expected marginals. It is found that the suggested plot provides valuable information about the underlying data structure. First, we can investigate independence between the categories by examining the differences between the expected frequency contours and the observed frequency contours. With the plot, we can also visually investigate the existence of outliers inherent in the data. These properties of the suggested contour plot are demonstrated with several sets of real data.
