Publications


Featured research published by Norman Kim.


Expert Systems With Applications | 2011

Pattern selection approaches for the logical analysis of data considering the outliers and the coverage of a pattern

Jeong Han; Norman Kim; Bong-Jin Yum; Myong K. Jeong

The logical analysis of data (LAD) is one of the most promising data mining methods developed to date for extracting knowledge from data. The key feature of the LAD is its capability of detecting hidden patterns in the data. Because patterns are basically combinations of certain attributes, they can be used to build a decision boundary for classification in the LAD by providing important information to distinguish observations in one class from those in the other class. Because patterns are robust to measurement errors, their use can also yield more stable performance in classifying both the positive and negative classes. The LAD technique, however, tends to choose too many patterns when solving the set covering problem used to build a classifier, especially when outliers exist in the data set. In the set covering problem of the LAD, each observation must be covered by at least one pattern, even when the observation is an outlier. Existing approaches therefore select too many patterns just to cover these outliers, which leads to overfitting. Here, we propose new pattern selection approaches for LAD that take both outliers and the coverage of a pattern into account. The proposed approaches avoid overfitting by building a sparse classifier. The performance of the proposed pattern selection approaches is compared with that of existing LAD approaches using several public data sets. The computational results show that the sparse classifiers built on the patterns selected by the proposed approaches yield improved classification performance compared with the existing approaches, especially when outliers exist in the data set.
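
To illustrate the idea of covering most, but not necessarily all, observations, the sketch below performs a simple greedy pattern selection over a binary coverage matrix. It is not the authors' formulation (the paper solves modified set covering problems); the function name, the 5% uncovered-observation allowance, and the toy data are assumptions made for illustration.

```python
import numpy as np

def select_patterns(cover, max_uncovered_frac=0.05):
    """Greedy pattern selection for LAD-style set covering.

    cover[i, j] = 1 if pattern j covers observation i.
    Instead of forcing every observation (possibly an outlier) to be
    covered, selection stops once all but a small fraction of the
    observations are covered, which keeps the classifier sparse.
    """
    n_obs, n_pat = cover.shape
    uncovered = np.ones(n_obs, dtype=bool)
    selected = []
    allowed_uncovered = int(max_uncovered_frac * n_obs)

    while uncovered.sum() > allowed_uncovered:
        # Pick the pattern that covers the most still-uncovered observations.
        gains = cover[uncovered].sum(axis=0)
        best = int(np.argmax(gains))
        if gains[best] == 0:          # nothing left to gain
            break
        selected.append(best)
        uncovered &= ~cover[:, best].astype(bool)
    return selected

# Example with a toy coverage matrix (rows: observations, columns: patterns).
rng = np.random.default_rng(0)
cover = (rng.random((40, 10)) < 0.3).astype(int)
print(select_patterns(cover))
```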


IEEE Transactions on Systems, Man, and Cybernetics | 2012

Kernel Ridge Regression with Lagged-Dependent Variable: Applications to Prediction of Internal Bond Strength in a Medium Density Fiberboard Process

Norman Kim; Young-Seon Jeong; Myong K. Jeong; Timothy M. Young

Medium density fiberboard (MDF) is one of the most popular products in the wood composites industry. Kernel-based regression approaches such as support vector regression have been used to predict the final product quality characteristics of MDF. However, existing approaches do not consider the autocorrelation of observations while exploring the nonlinearity of the data. To address this problem, this paper proposes a kernel-based regression model with lagged-dependent variables (LDVs) that accounts for both the autocorrelation of the response variable and the nonlinearity of the data. We explore the nonlinear relationship between the response and both the independent variables and past response values using various kernel functions. In this setting, the existing kernel trick is difficult to apply because of the LDVs. We derive the kernel ridge estimators with LDVs using a new mapping idea so that the nonlinear mapping does not have to be computed explicitly. In addition, a centering technique for the individually mapped data in the feature space is derived to account for an intercept term in kernel ridge regression (KRR) with LDVs. The performance of the proposed approaches is compared with that of popular approaches such as KRR and ordinary least squares (OLS) with LDVs using simulated and real-life data sets. Experimental results show that the proposed approaches perform better than KRR or ridge regression and yield consistently better results than OLS with LDVs, implying that the proposed model is a promising alternative when the response variable is autocorrelated.
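
The paper derives a dedicated kernel ridge estimator and a feature-space centering step for the lagged terms. The sketch below only conveys the basic idea by appending lagged responses as extra input columns and fitting ordinary kernel ridge regression; the function name, lag count, and toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def fit_krr_with_ldv(X, y, n_lags=1, alpha=1.0, gamma=0.1):
    """Kernel ridge regression with lagged responses appended as inputs.

    A simplified stand-in for the paper's KRR-LDV estimator: the lagged
    y values are simply treated as extra input columns and passed
    through a standard RBF kernel.
    """
    # Column k holds y[t - k - 1] for the row corresponding to time t.
    lagged = np.column_stack([y[n_lags - 1 - k : len(y) - 1 - k]
                              for k in range(n_lags)])
    X_aug = np.hstack([X[n_lags:], lagged])
    model = KernelRidge(alpha=alpha, kernel="rbf", gamma=gamma)
    model.fit(X_aug, y[n_lags:])
    return model

# Toy autocorrelated data: y depends nonlinearly on x and on y[t-1].
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.6 * y[t - 1] + np.sin(x[t, 0]) + 0.1 * rng.normal()
model = fit_krr_with_ldv(x, y, n_lags=1)
```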


Annals of Operations Research | 2014

The sparse signomial classification and regression model

Kyungsik Lee; Norman Kim; Myong K. Jeong

Kernel-based methods (KBMs) such as support vector machines (SVMs) are popular data mining tools for solving classification and regression problems. Due to their high prediction accuracy, KBMs have been successfully used in various fields. However, KBMs have three major drawbacks. First, it is not easy to obtain an explicit description of the discrimination (or regression) function in the original input space, or to make variable selection decisions in that space. Second, depending on the magnitude and numeric range of the given data points, the resulting kernel matrices may be ill-conditioned, so the learning algorithms may suffer from numerical instability; data scaling can generally be applied to mitigate this, but it is not always effective. Third, selecting an appropriate kernel type and its parameters can be a complex undertaking, and the choice greatly affects the performance of the resulting functions. To overcome these drawbacks, we present the sparse signomial classification and regression (SSCR) model. SSCR seeks a sparse signomial function by solving a linear program that minimizes the weighted sum of the ℓ1-norm of the coefficient vector of the function and the ℓ1-norm of the violation (or loss) caused by the function. SSCR employs a signomial function in the original variables and can therefore capture nonlinearity in the data. SSCR is also less sensitive to the numeric values or ranges of the given data and gives a sparse, explicit description of the resulting function in the original input space, which is useful for interpretation: it indicates which original input variables and/or interaction terms are more meaningful than others. We also present column generation techniques to select important signomial terms in the classification and regression processes and explore a number of theoretical properties of the proposed formulation. Computational studies demonstrate that SSCR is at least competitive with, and can outperform, other widely used learning methods for classification and regression.
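
To show the kind of linear program involved, the sketch below builds a small hand-picked dictionary of signomial terms and solves an ℓ1-penalized, ℓ1-loss classification LP with scipy. The paper's SSCR formulation and its column-generation procedure are more elaborate; the term dictionary, the variable split, and all names and parameters here are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def signomial_terms(X):
    """Small hand-picked dictionary of signomial terms in the original
    variables: each column, its square, the square root of its absolute
    value, and pairwise products.  (Illustrative only; the paper selects
    terms by column generation.)"""
    n, p = X.shape
    cols = [X, X ** 2, np.sqrt(np.abs(X))]
    cols += [(X[:, i] * X[:, j])[:, None]
             for i in range(p) for j in range(i + 1, p)]
    return np.hstack(cols)

def fit_sscr_classifier(X, y, lam=0.1):
    """L1-regularised, L1-loss classifier solved as a single LP,
    mimicking the structure of the SSCR objective (weighted sum of the
    l1-norm of the coefficients and the l1-norm of the violations)."""
    Phi = signomial_terms(X)
    n, p = Phi.shape
    # Variables: w_plus (p), w_minus (p), b_plus, b_minus, xi (n), all >= 0.
    c = np.concatenate([lam * np.ones(2 * p), [0.0, 0.0], np.ones(n) / n])
    # Margin constraints y_i (Phi_i w + b) >= 1 - xi_i, rewritten as A x <= -1.
    Yd = y[:, None]
    A = np.hstack([-Yd * Phi, Yd * Phi, -Yd, Yd, -np.eye(n)])
    res = linprog(c, A_ub=A, b_ub=-np.ones(n),
                  bounds=[(0, None)] * (2 * p + 2 + n), method="highs")
    w = res.x[:p] - res.x[p:2 * p]
    b = res.x[2 * p] - res.x[2 * p + 1]
    return w, b

# Toy example with labels in {-1, +1}.
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)
w, b = fit_sscr_classifier(X, y, lam=0.05)
```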


Quality and Reliability Engineering International | 2012

A Genetic‐Based Iterative Quantile Regression Algorithm for Analyzing Fatigue Curves

Jong In Park; Norman Kim; Suk Joo Bae

Accurate prediction of fatigue failure times of materials, such as the times to fracture and plastic deformation at various stress ranges, has a strong bearing on practical fatigue design. In this study, we propose a novel genetic-based iterative quantile regression (GA-IQR) algorithm for analyzing fatigue curves, which represent a nonlinear relationship between a given stress amplitude and fatigue life. We reduce the problem to a linear framework and develop an iterative algorithm for determining the model coefficients, including the unknown fatigue limits. The procedure repeatedly updates the estimates in a direction that reduces the resulting error. Our approach also benefits from the population-based stochastic search of genetic algorithms, which makes it less sensitive to initialization. Compared with conventional approaches, the proposed GA-IQR requires fewer assumptions to develop a fatigue model and can explore the data structure in a relatively flexible manner. All procedures and calculations are straightforward, so the proposed quantile regression model has high potential value in a wide range of applications exploring nonlinear relationships with lifetime data. Computational results for real data sets from the literature provide good evidence to support this argument.
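
As a rough sketch of the idea, the code below fits a median (tau = 0.5) S-N relationship of the form log N = b0 + b1 log(S - S0) by minimizing the pinball loss with a tiny genetic algorithm, treating the fatigue limit S0 as an unknown parameter. This is not the paper's GA-IQR algorithm: the chromosome layout, mutation scheme, and the specific S-N form are assumptions made for illustration.

```python
import numpy as np

def pinball_loss(y, y_hat, tau=0.5):
    """Quantile (pinball) loss at quantile tau."""
    r = y - y_hat
    return np.mean(np.maximum(tau * r, (tau - 1) * r))

def fit_fatigue_quantile_ga(S, N, tau=0.5, pop_size=60, n_gen=300, seed=0):
    """Small GA fitting log N = b0 + b1*log(S - S0) by minimising the
    pinball loss, with the fatigue limit S0 as an unknown parameter
    (a rough stand-in for the GA-IQR idea, not the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    logN = np.log(N)
    s_min = S.min()
    # Chromosome: (b0, b1, S0), with S0 constrained below the smallest stress.
    pop = np.column_stack([rng.normal(0, 5, pop_size),
                           rng.normal(0, 5, pop_size),
                           rng.uniform(0, 0.95 * s_min, pop_size)])

    def fitness(theta):
        b0, b1, S0 = theta
        if S0 >= s_min:
            return np.inf
        return pinball_loss(logN, b0 + b1 * np.log(S - S0), tau)

    for _ in range(n_gen):
        scores = np.array([fitness(t) for t in pop])
        order = np.argsort(scores)
        parents = pop[order[: pop_size // 2]]                    # truncation selection
        children = parents + rng.normal(0, 0.3, parents.shape)   # Gaussian mutation
        children[:, 2] = np.clip(children[:, 2], 0, 0.95 * s_min)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(t) for t in pop])
    return pop[np.argmin(scores)]

# Toy S-N data: stress amplitudes and fatigue lives with scatter.
rng = np.random.default_rng(4)
S = rng.uniform(120, 300, 50)
N = np.exp(12.0 - 2.0 * np.log(S - 100.0) + 0.2 * rng.normal(size=50))
b0, b1, S0 = fit_fatigue_quantile_ga(S, N, tau=0.5)
```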


Expert Systems With Applications | 2011

Comparisons of classification methods in the original and pattern spaces

Jeong Han; Norman Kim; Myong K. Jeong; Bong-Jin Yum

The logical analysis of data (LAD) is one of the most promising data mining and machine learning techniques developed to date for extracting knowledge from data. The LAD is based on concepts from combinatorics, optimization, and Boolean functions. The key feature of the LAD is its capability of detecting hidden patterns in the data. Since patterns are basically combinations of certain attributes, they can be used to build a decision boundary for classification in the LAD by providing important information to distinguish observations in one class from those in the other class. Because patterns are robust to measurement errors, their use can yield more stable performance in classifying both the positive and negative classes. The patterns are also interpretable and can serve as an essential tool for understanding the problem. These desirable properties motivate the use of LAD patterns as input variables to other classification techniques to achieve more stable and accurate performance. In this paper, the patterns generated by the LAD are used as input variables to the decision tree and k-nearest neighbor classification methods. The applicability and usefulness of the LAD patterns for classification are investigated experimentally. The classification accuracy and sensitivity of different classifiers in the original and pattern spaces are compared using several public data sets. The experimental results show that classification in the pattern space can yield better and more stable performance than classification in the original space in terms of accuracy when the classification accuracy of the LAD is relatively good (i.e., the LAD patterns are of good quality), the ratio of the number of patterns to the total number of attributes is small, or the data set is balanced between the two classes.
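
The sketch below illustrates the pattern-space idea: binarized observations are mapped to indicator features, one per LAD pattern, and standard scikit-learn classifiers are then trained on that representation. The pattern set here is hypothetical; generating the patterns is the LAD step itself, which is assumed to have been done already.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

def to_pattern_space(X_binary, patterns):
    """Map binarized observations into pattern space.

    Each pattern is a dict {attribute_index: required_value}; the
    pattern-space feature is 1 when the observation satisfies every
    literal of the pattern, 0 otherwise.
    """
    Z = np.zeros((X_binary.shape[0], len(patterns)), dtype=int)
    for j, pat in enumerate(patterns):
        hit = np.ones(X_binary.shape[0], dtype=bool)
        for attr, val in pat.items():
            hit &= (X_binary[:, attr] == val)
        Z[:, j] = hit
    return Z

# Hypothetical patterns over 4 binary attributes and toy labels.
patterns = [{0: 1, 2: 0}, {1: 1, 3: 1}, {2: 1}]
X_bin = np.random.default_rng(2).integers(0, 2, size=(100, 4))
y = ((X_bin[:, 0] == 1) & (X_bin[:, 2] == 0)).astype(int)

Z = to_pattern_space(X_bin, patterns)
tree = DecisionTreeClassifier(max_depth=3).fit(Z, y)
knn = KNeighborsClassifier(n_neighbors=5).fit(Z, y)
```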


Journal of the Operational Research Society | 2013

Multiphase Support Vector Regression for Function Approximation with Break-Points

Jong In Park; Norman Kim; Myong K. Jeong; Kyoung Seok Shin

In this paper, we propose a novel multiphase support vector regression (mp-SVR) technique to approximate a true relationship in cases where the effect of the input on the output changes abruptly at some break-points. A new formulation for mp-SVR is presented to allow such structural changes in the regression function. We then present a new hybrid-encoding scheme for genetic algorithms to select the best combination of kernel functions and to determine both the break-points and the hyperparameters of mp-SVR. The proposed method has a major advantage over conventional methods in that different kernel functions can be adapted to different regions of the data domain. Computational results on two examples, including a real-life data set, demonstrate its capability in capturing the local characteristics of the data more effectively. Consequently, mp-SVR has high potential value in a wide range of applications for function approximation.
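
A minimal sketch of the piecewise idea: one SVR per region between consecutive break-points of a one-dimensional input, with a possibly different kernel per region. Unlike the paper, the break-points and kernels here are fixed by hand rather than selected by the hybrid-encoded genetic algorithm; the class name, hyperparameters, and toy data are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

class MultiphaseSVR:
    """Piecewise SVR: a separate SVR (possibly with a different kernel)
    is fitted on each region between consecutive break-points of a
    one-dimensional input."""

    def __init__(self, break_points, kernels):
        self.break_points = np.asarray(break_points)
        self.kernels = kernels

    def _segment(self, x):
        # Segment index of each input point relative to the break-points.
        return np.searchsorted(self.break_points, x.ravel())

    def fit(self, x, y):
        seg = self._segment(x)
        self.models_ = []
        for s, kernel in enumerate(self.kernels):
            mask = seg == s
            self.models_.append(SVR(kernel=kernel, C=10.0).fit(x[mask], y[mask]))
        return self

    def predict(self, x):
        seg = self._segment(x)
        y_hat = np.empty(len(x))
        for s, model in enumerate(self.models_):
            mask = seg == s
            if mask.any():
                y_hat[mask] = model.predict(x[mask])
        return y_hat

# Toy data with an abrupt change in behaviour at x = 0.
x = np.linspace(-3, 3, 300).reshape(-1, 1)
y = np.where(x.ravel() < 0, np.sin(3 * x.ravel()), 0.5 * x.ravel() ** 2)
model = MultiphaseSVR(break_points=[0.0], kernels=["rbf", "poly"]).fit(x, y)
```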


Journal of Korean Institute of Industrial Engineers | 2010

Development of a Recommender System for E-Commerce Sites Using a Dimensionality Reduction Technique

Yong Soo Kim; Bong-Jin Yum; Norman Kim


Proceedings of the Korean Institute of Industrial Engineers Spring Joint Conference | 2013

A Novel Mathematical Programming Approach for Link Prediction Problem of Social Networks

Chungmok Lee; Minh Pham; Norman Kim; Myong K. Jeong; Dennis K. J. Lin; Wanpracha Art Chaovalitwongse


Industrial Engineering (IE Interfaces) | 2010

Remote Health Monitoring of Parkinson’s Disease Severity Using Signomial Regression Model

Young-Seon Jeong; Chungmok Lee; Norman Kim; Kyungsik Lee


INFORMS Journal on Computing | 2010

Robust Kernel Based Regression with Bounded Influence for Outliers (2nd Place, INFORMS Data Mining Student Paper Award 2010)

Sangheum Hwang; Norman Kim; Bong-Jin Yum; Myong K. Jeong

Collaboration


Dive into Norman Kim's collaboration.

Top Co-Authors

Chungmok Lee | Hankuk University of Foreign Studies

Kyungsik Lee | Seoul National University

Young-Seon Jeong | Chonnam National University

Kyoung Seok Shin | Chonnam National University