Jong Soo Park | Researchain

Publication

Featured research published by Jong Soo Park.

international conference on management of data | 1995

An effective hash-based algorithm for mining association rules

Jong Soo Park; Ming-Syan Chen; Philip S. Yu

In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally, this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate large itemsets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we propose an effective hash-based algorithm for the candidate set generation. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby reducing the computational cost for later iterations significantly. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
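The hash-based pruning of candidate 2-itemsets described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy hash function `(10 * x + y) % num_buckets`, the bucket count, and the sample transactions are all assumptions chosen for the example. While counting single items, every pair in each transaction is also hashed into a bucket; a pair is kept as a candidate only if both items are frequent and its bucket count reaches the support threshold.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    """Toy hash for a pair of integer item ids (an assumption for illustration)."""
    x, y = pair
    return (10 * x + y) % num_buckets


def candidate_pairs(transactions, min_count, num_buckets=7):
    """Return candidate 2-itemsets that survive the hash-bucket filter."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets

    # Single pass: count items and hash every pair into a bucket.
    for transaction in transactions:
        items = sorted(transaction)
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_count)

    # A pair can only be frequent if its bucket count reaches the threshold.
    return [
        pair
        for pair in combinations(frequent_items, 2)
        if bucket_counts[bucket_of(pair, num_buckets)] >= min_count
    ]


transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(candidate_pairs(transactions, min_count=2))
# prints [(1, 3), (2, 3), (2, 5), (3, 5)]
```

In this toy input, pairing the four frequent items without the filter would give six candidates; the bucket counts rule out (1, 2) and (1, 5), leaving four. Bucket collisions can let infrequent pairs through, so the survivors still have to be counted exactly in the next pass.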

international conference on management of data | 1999

Fast algorithms for projected clustering

Charu C. Aggarwal; Joel L. Wolf; Philip S. Yu; Cecilia M. Procopiuc; Jong Soo Park

The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification and trend analysis. Unfortunately, all known algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the points. In such high dimensional spaces not all dimensions may be relevant to a given cluster. One way of handling this is to pick the closely correlated dimensions and find clusters in the corresponding subspace. Traditional feature selection algorithms attempt to achieve this. The weakness of this approach is that in typical high dimensional data mining applications different sets of points may cluster better for different subsets of dimensions. The number of dimensions in each such cluster-specific subspace may also vary. Hence, it may be impossible to find a single small subset of dimensions for all the clusters. We therefore discuss a generalization of the clustering problem, referred to as the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves. We develop an algorithmic framework for solving the projected clustering problem, and test its performance on synthetic data.

IEEE Transactions on Knowledge and Data Engineering | 1998

Efficient data mining for path traversal patterns

Ming-Syan Chen; Jong Soo Park; Philip S. Yu

The authors explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. The solution procedure consists of two steps. First, they derive an algorithm to convert the original sequence of log data into a set of maximal forward references. By doing so, one can filter out the effect of some backward references, which are mainly made for ease of traveling, and concentrate on mining meaningful user access sequences. Second, they derive algorithms to determine the frequent traversal patterns, i.e., large reference sequences, from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences: one is based on some hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. Performance of these two methods is comparatively analyzed. It is shown that the option of selective scan is very advantageous and can lead to prominent performance improvement. Sensitivity analysis on various parameters is conducted.
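The first step, converting a traversal log into maximal forward references, can be sketched as below. This is one plausible reading of the abstract rather than the paper's exact procedure: the function name and the click sequence are hypothetical, and a "backward reference" is assumed to mean revisiting a page already on the current path. Each time a forward run is interrupted by a backward move, the path built so far is emitted and then truncated back to the revisited page.

```python
def maximal_forward_references(pages):
    """Split one user's page sequence into maximal forward references."""
    path = []          # current forward path from the starting page
    references = []    # emitted maximal forward references
    moving_forward = False

    for page in pages:
        if page in path:
            # Backward reference: emit the path only if we had been extending it.
            if moving_forward:
                references.append(list(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True

    # The log may end in the middle of a forward run.
    if moving_forward and path:
        references.append(list(path))
    return references


clicks = list("ABCDCBEGHGWAOUOV")  # hypothetical click sequence
for reference in maximal_forward_references(clicks):
    print("".join(reference))
# prints ABCD, ABEGH, ABEGW, AOU, AOV (one per line)
```

The resulting sequences are what the second step would then mine for frequently occurring consecutive subsequences.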

IEEE Transactions on Knowledge and Data Engineering | 1997

Using a hash-based method with transaction trimming for mining association rules

Jong Soo Park; Ming-Syan Chen; Philip S. Yu

We examine the issue of mining association rules among items in a large database of sales transactions. Mining association rules means that, given a database of sales transactions, the goal is to discover all associations among items such that the presence of some items in a transaction will imply the presence of other items in the same transaction. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally, this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate sets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we develop an effective algorithm for the candidate set generation. It is a hash-based algorithm and is especially effective for the generation of a candidate set for large 2-itemsets. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby reducing the computational cost for later iterations significantly. The advantage of the proposed algorithm also provides us the opportunity of reducing the amount of disk I/O required. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.

international conference on distributed computing systems | 1996

Data mining for path traversal patterns in a web environment

Ming-Syan Chen; Jong Soo Park; Philip S. Yu

In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment such as the World Wide Web. First, we convert the original sequence of log data into a set of maximal forward references and filter out the effect of some backward references, which are mainly made for ease of traveling. Second, we derive algorithms to determine the frequent traversal patterns, i.e., large reference sequences, from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences: one is based on some hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. Performance of these two methods is comparatively analyzed.

conference on information and knowledge management | 1995

Efficient parallel data mining for association rules

Jong Soo Park; Ming-Syan Chen; Philip S. Yu

In this paper, we develop an algorithm, called PDM, to conduct parallel data mining for association rules. Consider a transaction as a collection of items; a large itemset is a set of items such that the number of transactions containing it exceeds a pre-specified threshold. PDM is designed so that the global set of large itemsets can be identified efficiently and the amount of inter-node data exchange required is minimized. Specifically, with a given database partition, each processing node will collect (count) information on each itemset from its local database efficiently via a hashing method. The information discovered by each node is next shared with other nodes via some communication schemes. Then, PDM employs a technique, called clue-and-poll, to address the uncertainty due to the partial knowledge collected at each node by judiciously selecting a small fraction of the itemsets for the exchange of count information among nodes, thus reducing the communication cost. The global set of large itemsets can hence be determined based on the aggregate count of itemsets. It is experimentally shown that PDM not only attains very good parallelization efficiencies, but also provides robust performance for various input patterns.

Physica A-statistical Mechanics and Its Applications | 2008

Statistical Analysis of the Metropolitan Seoul Subway System: Network Structure and Passenger Flows

Keumsook Lee; Woo-Sung Jung; Jong Soo Park; M. Y. Choi

The Metropolitan Seoul Subway system, consisting of 380 stations, provides the major transportation mode in the metropolitan Seoul area. Focusing on the network structure, we analyze statistical properties and topological consequences of the subway system. We further study the passenger flows on the system, and find that the flow weight distribution exhibits a power-law behavior. In addition, the degree distribution of the spanning tree of the flows also follows a power law.

conference on information and knowledge management | 1997

Mining association rules with adjustable accuracy

Jong Soo Park; Philip S. Yu; Ming-Syan Chen

In this paper, we devise efficient algorithms for mining association rules with adjustable accuracy. It is noted that several applications require mining the transaction data frequently to capture customer behavior. In those applications, the efficiency of data mining could be a more important factor than the requirement for complete accuracy of the mining results. Allowing imprecise results can significantly improve the data mining efficiency. In this paper, two methods for mining association rules with adjustable accuracy are developed. By dealing with the concept of sampling, both methods obtain some essential knowledge from a sampled subset first, and in light of that knowledge, perform efficient association rule mining on the entire database. A technique of relaxing the support factor based on the sampling size is devised to achieve the desired level of accuracy. These two methods differ from each other in their ways of utilizing the sampled data. Performance of these two methods is comparatively analyzed. As shown by our experimental results, the relaxation factor, as well as the sample size, can be properly adjusted so as to improve the result accuracy while minimizing the corresponding execution time, thereby allowing us to effectively achieve a design trade-off between accuracy and efficiency with two control parameters. It is shown that with the advantage of controlled sampling, the proposed methods are very flexible and efficient, and can in general lead to results of a very high degree of accuracy.
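The general idea of sampling with a relaxed support threshold can be sketched as follows. This is an illustrative reading of the abstract, not either of the two methods from the paper: the function names, the restriction to 2-itemsets, the parameters `sample_fraction` and `relaxation`, and the way the relaxed threshold is computed are all assumptions. Candidates are found on a random sample using a lowered threshold, and only those candidates are then counted on the full database.

```python
import random
from collections import Counter
from itertools import combinations


def count_pairs(transactions):
    """Count how many transactions contain each pair of items."""
    counts = Counter()
    for transaction in transactions:
        counts.update(combinations(sorted(transaction), 2))
    return counts


def frequent_pairs_via_sampling(transactions, min_support,
                                sample_fraction=0.5, relaxation=0.8, seed=0):
    """Find frequent pairs using a sample plus a relaxed support threshold.

    min_support is a fraction of transactions; relaxation < 1 lowers the
    threshold on the sample so fewer truly frequent pairs are missed.
    """
    rng = random.Random(seed)
    sample_size = max(1, round(len(transactions) * sample_fraction))
    sample = rng.sample(transactions, sample_size)

    # Step 1: candidates from the sample, using the relaxed threshold.
    relaxed_threshold = relaxation * min_support * len(sample)
    candidates = {pair for pair, count in count_pairs(sample).items()
                  if count >= relaxed_threshold}

    # Step 2: count only those candidates on the entire database.
    full_counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for pair in candidates:
            if pair[0] in items and pair[1] in items:
                full_counts[pair] += 1

    threshold = min_support * len(transactions)
    return {pair: count for pair, count in full_counts.items()
            if count >= threshold}


transactions = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 4}]
print(frequent_pairs_via_sampling(transactions, min_support=0.5))
```

In this sketch, every pair returned is verified against the full database, so the imprecision is limited to frequent pairs that the sample fails to surface. Lowering `relaxation` or raising `sample_fraction` reduces such misses at the cost of more counting work, which mirrors the accuracy and efficiency trade-off the abstract describes.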

Scientific Reports | 2016

Reducing-Agent-Free Instant Synthesis of Carbon-Supported Pd Catalysts in a Green Leidenfrost Droplet Reactor and Catalytic Activity in Formic Acid Dehydrogenation.

Dongwook Lee; Min-Ho Jin; Young-Joo Lee; Ju-Hyoung Park; Chun-Boo Lee; Jong Soo Park

The development of green synthesis methods for supported noble metal catalysts remains an important challenge for improving their sustainability. Here we first synthesized carbon-supported Pd catalysts in a green Leidenfrost droplet reactor without reducing agents, high-temperature calcination, or reduction procedures. When an aqueous solution containing Pd nitrate precursor, carbon support, and water is dripped onto a hot plate, a vapor layer forms between the solution droplet and the hot surface, which allows the droplet to levitate on the hot surface (the Leidenfrost phenomenon). Subsequently, Pd nanoparticles can be prepared without reducing agents in a weakly basic droplet reactor created by the Leidenfrost phenomenon, and the as-prepared Pd nanoparticles are then loaded onto the carbon supports as the droplet boils down on the hot surface. Compared to conventional incipient wetness and chemical synthetic methods, the Leidenfrost droplet reactor does not need energy-consuming, time-consuming, and environmentally unfriendly procedures, which leads to a much shorter synthesis time, lower carbon dioxide emission, and a more eco-friendly process. Moreover, the catalysts synthesized in the Leidenfrost droplet reactor provided much better catalytic activity for room-temperature formic acid decomposition than those prepared by the incipient wetness method.

Journal of Physics A | 2011

Master equation approach to the intra-urban passenger flow and application to the Metropolitan Seoul Subway system

Keumsook Lee; Segun Goh; Jong Soo Park; Woo-Sung Jung; M. Y. Choi

The master equation approach is proposed to describe the evolution of passengers in a subway system. With the transition rate constructed from simple geographical consideration, the evolution equation for the distribution of subway passengers is found to bear skew distributions including log-normal, Weibull, and power-law distributions. This approach is then applied to the Metropolitan Seoul Subway system: analysis of the trip data of all passengers in a day reveals that the data in most cases fit well to the log-normal distributions. Implications of the results are also discussed.
