Nam Hun Park | Researchain

Publication

Featured research published by Nam Hun Park.

International Conference on Management of Data | 2004

Statistical grid-based clustering over data streams

Nam Hun Park; Won Suk Lee

A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. For this reason, most algorithms for data streams sacrifice the correctness of their results for fast processing time. The processing time is greatly influenced by the amount of information that must be maintained. This paper proposes a statistical grid-based approach to clustering the data elements of a data stream. Initially, the multidimensional data space of a data stream is partitioned into a set of mutually exclusive equal-size initial cells. When the support of a cell becomes high enough, the cell is dynamically divided into two mutually exclusive intermediate cells based on its distribution statistics. Three different ways of partitioning a dense cell are introduced. Eventually, a dense region of each initial cell is recursively partitioned until it becomes the smallest cell, called a unit cell. A cluster of a data stream is a group of adjacent dense unit cells. In order to minimize the number of cells, a sparse intermediate or unit cell is pruned if its support becomes much less than a minimum support. Furthermore, in order to confine the usage of memory space, the size of a unit cell is dynamically minimized such that the result of clustering becomes as accurate as possible. The proposed algorithm is analyzed by a series of experiments to identify its various characteristics.
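The notion of a cluster as a group of adjacent dense unit cells can be illustrated with a minimal Python sketch. This is not the algorithm from the paper: it assumes a fixed grid of equal-size cells instead of the recursive statistical partitioning, treats two cells as adjacent when they differ by one step in exactly one dimension, and uses hypothetical names (`cell_of`, `dense_cells`, `clusters`) chosen only for this example.

```python
from collections import Counter, deque


def cell_of(point, cell_size):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // cell_size) for x in point)


def dense_cells(points, cell_size, min_support):
    """Return the cells whose share of all points is at least min_support."""
    counts = Counter(cell_of(p, cell_size) for p in points)
    total = len(points)
    return {cell for cell, n in counts.items() if n / total >= min_support}


def clusters(dense):
    """Group dense cells that are adjacent (assumed: one step in one dimension)."""
    remaining = set(dense)
    groups = []
    while remaining:
        seed = remaining.pop()
        group, queue = {seed}, deque([seed])
        while queue:
            cell = queue.popleft()
            for d in range(len(cell)):
                for step in (-1, 1):
                    neighbour = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        group.add(neighbour)
                        queue.append(neighbour)
        groups.append(group)
    return groups


# Hypothetical sample data: three dense regions and one sparse outlier.
points = [(0.1, 0.1)] * 3 + [(1.2, 0.3)] * 3 + [(5.5, 5.5)] * 3 + [(9.0, 0.5)]
dense = dense_cells(points, cell_size=1.0, min_support=0.2)
found = clusters(dense)
```

In this sample, the cells around (0.1, 0.1) and (1.2, 0.3) are neighbours and end up in one cluster, while the sparse cell holding the single outlier is dropped before grouping.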

Information Sciences | 2010

Anomaly intrusion detection by clustering transactional audit streams in a host computer

Nam Hun Park; Sang Hyun Oh; Won Suk Lee

In anomaly intrusion detection, modeling the normal behavior of activities performed by a user is an important issue. To extract normal behavior from the activities of a user, conventional data mining techniques are widely applied to a finite audit data set. However, these approaches model only the static behavior of a user in the audit data set. This drawback can be overcome by viewing a user's continuous activities as an audit data stream. This paper proposes an anomaly intrusion detection method that continuously models the normal behavior of a user over the audit data stream. A set of features is used to represent the characteristics of an activity. For each feature, clusters of feature values corresponding to activities observed thus far in an audit data stream are identified by a statistical grid-based clustering algorithm for a data stream. Each cluster represents the frequency range of the activities with respect to the feature. As a result, without the physical maintenance of any historical activity of the user, the user's new activities can be continuously reflected in the ongoing results. At the same time, various statistics of activities related to the identified clusters are also modeled to improve the performance of anomaly detection. The proposed algorithm is illustrated by a series of experiments to identify various characteristics.

Data and Knowledge Engineering | 2007

Cell trees: An adaptive synopsis structure for clustering multi-dimensional on-line data streams

Nam Hun Park; Won Suk Lee

To effectively trace the clusters of recently generated data elements in an on-line data stream, a sibling list and a cell tree are proposed in this paper. Initially, the multi-dimensional data space of a data stream is partitioned into mutually exclusive equal-sized grid-cells. Each grid-cell monitors the recent distribution statistics of data elements within its range. The old distribution statistics of each grid-cell are diminished by a predefined decay rate as time goes by, so that the effect of the obsolete information on the current result of clustering can be eliminated without maintaining any data element physically. Given a partitioning factor h, a dense grid-cell is partitioned into h equal-size smaller grid-cells. Such partitioning is continued until a grid-cell becomes the smallest one called a unit grid-cell. Conversely, a set of consecutive sparse grid-cells can be merged into a single grid-cell. A sibling list is a structure to manage the set of all grid-cells in a one-dimensional data space and it acts as an index for locating a specific grid-cell. Upon creating a dense unit grid-cell on a one-dimensional data space, a new sibling list for another dimension is created as a child of the grid-cell. In such a way, a cell tree is created. By repeating this process, a multi-dimensional dense unit grid-cell is identified by a path of a cell tree. Furthermore, in order to confine the usage of memory space, the size of a unit grid-cell is adaptively minimized such that the result of clustering becomes as accurate as possible at all times. The proposed method is comparatively analyzed by a series of experiments to identify its various characteristics.
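The decay of old distribution statistics described above can be sketched as follows. The abstract does not give the exact decay formula, so this example assumes a common exponential form in which a cell's count is multiplied by `decay` once per elapsed time unit; the class name `DecayedCell` and its methods are hypothetical and not taken from the paper.

```python
class DecayedCell:
    """Grid-cell counter whose old contributions fade over time.

    Assumption: the count is multiplied by `decay` for every elapsed
    time unit, so no individual data element has to be stored.
    """

    def __init__(self, decay):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.count = 0.0
        self.last_update = 0

    def _age(self, now):
        # Apply the decay for the time elapsed since the last update.
        self.count *= self.decay ** (now - self.last_update)
        self.last_update = now

    def add(self, now):
        """Record one new data element arriving at time `now`."""
        self._age(now)
        self.count += 1.0

    def value(self, now):
        """Decayed count at time `now`, without modifying the cell."""
        return self.count * self.decay ** (now - self.last_update)


cell = DecayedCell(decay=0.5)
cell.add(now=0)   # count becomes 1.0
cell.add(now=1)   # old count halves to 0.5, then the new element adds 1.0
```

With these assumptions, a cell that stops receiving elements drifts toward zero on its own, which is what lets sparse cells be detected and merged without replaying the stream.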

Conference on Information and Knowledge Management | 2007

Grid-based subspace clustering over data streams

Nam Hun Park; Won Suk Lee

A real-life data stream usually contains many dimensions and some dimensional values of its data elements may be missing. In order to effectively extract the on-going change of a data stream with respect to all the subsets of the dimensions of the data stream, a grid-based subspace clustering algorithm is proposed in this paper. Given an n-dimensional data stream, the on-going distribution statistics of data elements in each one-dimensional data space are first monitored by a list of grid-cells called a sibling list. Once a dense grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes in order to trace any cluster in all possible two-dimensional rectangular subspaces. In such a way, a sibling tree grows up to the nth level at most and a k-dimensional subcluster can be found in the kth level of the sibling tree. The proposed method is comparatively analyzed by a series of experiments to identify its various characteristics.

Data and Knowledge Engineering | 2009

Efficiently tracing clusters over high-dimensional on-line data streams

Jaewoo Lee; Nam Hun Park; Won Suk Lee

A good clustering method should scale flexibly with the number of dimensions as well as the size of a data set. This paper proposes a method of efficiently tracing the clusters of a high-dimensional on-line data stream. While tracing the one-dimensional clusters of each dimension independently, a technique similar to frequent itemset mining is employed to find the set of multi-dimensional clusters. By finding a frequently co-occurring set of one-dimensional clusters, it is possible to trace a multi-dimensional rectangular space whose range is defined by the one-dimensional clusters collectively. In order to trace such candidates over a multi-dimensional online data stream, a cluster-statistics tree (CS-Tree) is proposed in this paper. A k-depth node(k=

Expert Systems With Applications | 2012

Comparative analysis of sequence weighting approaches for mining time-interval weighted sequential patterns

Joong Hyuk Chang; Nam Hun Park

Unlike the general sequential pattern mining that considers only the generation order of data elements, mining weighted sequential patterns aims to get more interesting sequential patterns by considering the weights of data elements in a target sequence database in addition to their generation order. In general, for a sequence or a sequential pattern, not only the generation order of data elements but also their generation times and time-intervals are important because they can be helpful in finding more interesting sequential patterns. Applying the mining method of time-interval weighted sequential (TiWS) patterns that has been proposed in our previous work, this paper proposes several sequence weighting approaches to get the time-interval weight of a sequence in mining TiWS patterns for a sequence database, and the effectiveness of each approach in mining TiWS patterns is analyzed through a set of experiments. The proposed sequence weighting approaches may be helpful in obtaining more interesting sequential patterns in mining sequential patterns for a sequence database.

2008 IEEE International Workshop on Semantic Computing and Applications | 2008

Anomaly Detection over Clustering Multi-dimensional Transactional Audit Streams

Nam Hun Park; Won Suk Lee

In anomaly detection, one important issue is how to model the normal behavior of activities performed by a user. To extract the normal behavior from the activities of a user, conventional data mining techniques are widely applied to a finite audit data set. However, these approaches can only model the static behavior of a user in the audit data set. This drawback can be overcome by viewing the continuous activities of a user as an audit data stream. This paper proposes an anomaly detection method that continuously models the normal behavior of a user over the multi-dimensional audit data stream. Each cluster represents the frequent range of the activities with respect to a set of features. As a result, without physically maintaining any historical activity of a user, the new activities of the user can be continuously reflected onto the on-going result. At the same time, various statistics of the activities related to the identified clusters are additionally modeled to improve the performance of anomaly detection. The proposed algorithm is analyzed by a series of experiments to identify various characteristics.

Archive | 2014

An Adaptive Teaching and Learning System for Efficient Ubiquitous Learning

Kil Hong Joo; Nam Hun Park; Jin Tak Choi

In this paper we present our pedagogical and technological approach for supporting the design of novel situated teaching and learning activities that can be conducted both outside the school and in the classroom. Education has undergone major changes in recent years, with the development of digital information transfer, storage and communication methods having a significant effect. This development has allowed for access to global communications and increased the number of resources available to today’s students at all levels of schooling. Therefore, ubiquitous learning is a new educational paradigm made possible in part by the affordances of digital information. Ubiquitous learning is characterized by providing intuitive ways of identifying the right learning collaborators, the right learning contents and the right learning services in the right place at the right time. This paper first creates a ubiquitous environment for ubiquitous learning according to various properties, providing functions that enable learning to take place anytime and anywhere with any available learning device. Also, in order to improve the proposed ubiquitous system, this paper proposes a scaffolding and mentoring system. If scaffolding and mentoring are provided in ubiquitous learning, studying efficiency would be maximized. Furthermore, the adaptive teaching and learning system with ubiquitous computing may offer great innovation in the delivery of education, allowing for personalization and customization to student needs. Experiments on the study achievements and attitudes of students are performed and show the applicability of the ubiquitous teaching and learning model.

Database and Expert Systems Applications | 2010

Supporting multi-criteria decision support queries over time-interval data streams

Nam Hun Park; Venkatesh Raghavan; Elke A. Rundensteiner

Multi-criteria result extraction is crucial in many real-time stream processing applications, such as habitat and disaster monitoring. The ease in expressing user preferences makes skyline queries a popular class of queries. Skyline evaluation is computationally intensive especially over continuous time-interval streams where each object has its own individual expiration time. In this work, we propose TI-Sky - a continuous skyline evaluation framework. TI-Sky strikes a perfect balance between the costs of continuously maintaining the result space upon the arrival of new objects or the expiration of old objects, and the costs of computing the final skyline result from this space whenever a pull-based user query is received. This is achieved by incrementally maintaining a precomputed skyline result space at a higher level of abstraction and digging into the more expensive object-level processing only upon demand. Our experimental study demonstrates the superiority of TI-Sky over existing techniques.
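To make the problem setting concrete, here is a naive baseline for a skyline query over objects that each carry their own expiration time. It is not TI-Sky: it recomputes the skyline from scratch at query time with a quadratic scan, which is the kind of object-level work the abstract says TI-Sky tries to defer. The function names and the convention that smaller values are preferred in every dimension are assumptions made for this example.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and strictly
    better somewhere (assumption: smaller values are better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline_at(objects, now):
    """Skyline of the objects still alive at time `now`.

    `objects` is a list of (values, expires_at) pairs; an object is
    considered alive while expires_at > now.
    """
    live = [values for values, expires_at in objects if expires_at > now]
    return [v for v in live if not any(dominates(o, v) for o in live)]


# Hypothetical stream contents: (values, expiration time).
objects = [((1, 5), 10), ((2, 2), 4), ((3, 3), 10), ((5, 1), 10)]
early = skyline_at(objects, now=0)
late = skyline_at(objects, now=5)
```

In this sample, (3, 3) is dominated by (2, 2) at time 0 but joins the skyline at time 5, after (2, 2) has expired. Individual expiration times therefore force the result to be maintained continuously rather than computed once.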

International Database Engineering and Applications Symposium | 2008

Memory efficient subspace clustering for online data streams

Nam Hun Park; Won Suk Lee

Subspace clustering over an online multi-dimensional data stream requires examining all the subsets of its dimensions, so a huge amount of memory space may be required. To trace the ongoing changes of cluster patterns over an online data stream within a confined memory space, this paper proposes a grid-based subspace clustering algorithm that can utilize the confined memory space effectively. Given an n-dimensional data stream, the on-going distribution statistics of data elements in each one-dimensional data space are first monitored by a list of grid-cells called a sibling list. Once a grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes in order to trace any cluster in all possible two-dimensional rectangular subspaces. In such a way, a sibling tree grows up to the nth level at most and a k-dimensional subcluster can be found at the kth level of the sibling tree. To utilize the confined space of main memory effectively, only the upper part of a sibling tree is expanded at all times and the subtrees in the lower part are expanded in turns by various scheduling policies such as round-robin and priority-based. Furthermore, in order to confine the usage of memory space, the size of a unit grid-cell is adaptively minimized such that the result of clustering becomes as accurate as possible at all times. The performance of the proposed method is comparatively analyzed by a number of experiments to identify its various characteristics.
