Publication


Featured research published by Dongseung Kim.


International Journal of Parallel Programming | 2003

Parallel Merge Sort with Load Balancing

Minsoo Jeon; Dongseung Kim

Parallel merge sort is useful for progressively sorting large quantities of data. The merge sort must be parallelized carefully, since the conventional algorithm performs poorly due to the successive halving of the number of participating processors, down to one in the last merging stage. The proposed load-balanced merge sort utilizes all processors throughout the computation: it distributes data evenly to all processors in each stage, so every processor works in every phase. Significant performance enhancement has been achieved, with a speedup bound of (P−1)/log P, where P is the number of processors. Experimental results demonstrate a speedup of 9.6 (upper bound of 10.7) on a 32-processor Cray T3E when sorting 4M 32-bit integers, and a speedup of 2.3 (upper bound of 2.8) on an 8-node PC cluster.
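
To make the load-balancing idea concrete, here is a minimal single-process sketch in Python. Lists stand in for processors, and the function name and redistribution details are illustrative rather than the paper's implementation: at each merge stage a group of processors merges its runs and every member keeps an equal share of the result, so no processor goes idle.

# Sketch: simulate load-balanced parallel merge sort on p "processors".
# Each processor is a Python list; in a real distributed run these would be
# per-node arrays exchanged over the network (e.g., with MPI).
from heapq import merge

def load_balanced_merge_sort(keys, p):
    """Sort `keys` using p simulated processors; p is assumed a power of two."""
    n = len(keys)
    chunk = (n + p - 1) // p
    # Local phase: every processor sorts its own block.
    procs = [sorted(keys[i * chunk:(i + 1) * chunk]) for i in range(p)]

    group = 2
    while group <= p:
        for lo in range(0, p, group):
            half = group // 2
            run_a = [x for q in procs[lo:lo + half] for x in q]
            run_b = [x for q in procs[lo + half:lo + group] for x in q]
            merged = list(merge(run_a, run_b))
            # Redistribute the merged run evenly over the whole group, so
            # every processor keeps ~n/p keys and stays busy in the next stage.
            size = (len(merged) + group - 1) // group
            for j in range(group):
                procs[lo + j] = merged[j * size:(j + 1) * size]
        group *= 2

    return [x for q in procs for x in q]

data = [5, 3, 9, 1, 7, 2, 8, 6, 4, 0]
assert load_balanced_merge_sort(data, 4) == sorted(data)

In a distributed implementation each processor would compute only its own slice of the group merge; the simulation above keeps the data flow identical while running on one machine.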


international conference on parallel and distributed systems | 2001

Communication-efficient bitonic sort on a distributed memory parallel computer

Yong Cheol Kim; Minsoo Jeon; Dongseung Kim; Andrew Sohn

Sorting can be sped up on parallel computers by dividing the data and processing the parts in parallel. Bitonic sort can be parallelized; however, a great portion of the execution time is spent in the O(log² P) rounds of exchanging N/P keys, where P and N are the numbers of processors and keys, respectively. This paper presents an efficient way of handling data communication in bitonic sort to minimize interprocessor communication and comparison time. Before any actual data movement, each processor exchanges the minimum and maximum of its list of keys with its partner to determine which keys need to be sent. Very often no keys need to be exchanged at all, or only a fraction of them. Execution time was reduced by at least 20% in our experiments on the T3E. We believe the scheme is a good way to shorten communication time in similar applications.
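
The pre-exchange test can be sketched as follows; this is my own illustration of the min/max idea (function name and arguments are hypothetical), not the paper's code. Given a processor's sorted local keys and its partner's minimum and maximum, only the keys that could possibly migrate to the partner are shipped.

import bisect

def keys_to_send(local_sorted, partner_min, partner_max, keep_low):
    """Decide which local keys must be shipped to the bitonic partner.

    keep_low=True means this processor keeps the smaller half after the
    compare-exchange step. Only keys that can possibly migrate to the
    partner are sent; very often this slice is empty."""
    if keep_low:
        # Keys larger than the partner's minimum might belong to the partner.
        i = bisect.bisect_right(local_sorted, partner_min)
        return local_sorted[i:]
    else:
        # Keys smaller than the partner's maximum might belong to the partner.
        i = bisect.bisect_left(local_sorted, partner_max)
        return local_sorted[:i]

# The low processor's list lies entirely below the partner's minimum,
# so nothing needs to be exchanged; otherwise only the overlap is sent.
print(keys_to_send([1, 2, 3, 4], partner_min=10, partner_max=20, keep_low=True))   # []
print(keys_to_send([1, 2, 3, 12], partner_min=10, partner_max=20, keep_low=True))  # [12]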


computer and information technology | 2006

Improved Association Rule Mining by Modified Trimming

Wontae Hwang; Dongseung Kim

This paper presents a new association mining algorithm that uses two-phase sampling to shorten execution time at the cost of some precision in the mining result. The earlier FAST (Finding Association by Sampling Technique) algorithm has the weakness that it considers only frequent 1-itemsets in trimming/growing, and thus has no way of taking multi-itemsets, including 2-itemsets, into account. The new algorithm reflects multi-itemsets when sampling transactions, and it improves the mining results by adjusting the counts of both missing itemsets and false itemsets. In experiments on a representative synthetic database, the accuracy for 2-itemsets reaches 0.68 compared to 0.46, while otherwise maintaining the same quality.
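
As a rough illustration of trimming that accounts for 2-itemsets as well as 1-itemsets, here is a small Python sketch; the greedy loop and the distance function are my own simplification under that assumption, not the paper's algorithm.

from itertools import combinations
from collections import Counter

def support_counts(transactions, k):
    """Count k-itemset occurrences over a list of transactions (sets of items)."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(t), k):
            counts[itemset] += 1
    return counts

def deviation(sample, reference_freq, k, n_ref):
    """Sum of |estimated support - reference support| over k-itemsets."""
    counts = support_counts(sample, k)
    n = len(sample) or 1
    keys = set(counts) | set(reference_freq)
    return sum(abs(counts[i] / n - reference_freq.get(i, 0) / n_ref) for i in keys)

def trim(sample, db_freq1, db_freq2, n_db, target_size):
    """Greedily discard the transaction whose removal best matches the
    database's 1-itemset AND 2-itemset supports (illustrative distance)."""
    sample = list(sample)
    while len(sample) > target_size:
        best_i, best_d = None, None
        for i in range(len(sample)):
            reduced = sample[:i] + sample[i + 1:]
            d = (deviation(reduced, db_freq1, 1, n_db)
                 + deviation(reduced, db_freq2, 2, n_db))
            if best_d is None or d < best_d:
                best_i, best_d = i, d
        sample.pop(best_i)
    return sample

db = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 3}, {2}, {1, 2, 3}]
f1, f2 = support_counts(db, 1), support_counts(db, 2)
print(trim(db[:4], f1, f2, n_db=len(db), target_size=3))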


foundations of computer science | 2001

Hierarchical cluster for scalable web servers

Jonghyuck Hong; Dongseung Kim

Cluster architectures are widely used these days for the availability and scalability of web servers. To increase performance, we have developed a hierarchical architecture for web servers that can accommodate a wide range of connection requests by gradually expanding the nodes at the proper levels of the hierarchy. In particular, multiple dispatchers can distribute incoming client requests to the various backend servers in the cluster. Experimental results with a two-level cluster with dynamic load balancing are included to verify the idea.
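
A minimal sketch of a two-level dispatching hierarchy with least-loaded routing, in Python; the class names and load metric are illustrative assumptions, not the paper's implementation.

class Backend:
    def __init__(self, name):
        self.name = name
        self.active = 0          # number of in-flight requests (the "load")

    def handle(self, request):
        self.active += 1         # a real server would actually serve the request
        return f"{self.name} <- {request}"

class Dispatcher:
    """One dispatcher in the hierarchy: its children are either backend
    servers (leaf level) or further dispatchers (upper levels)."""
    def __init__(self, children):
        self.children = children

    def load(self):
        return sum(c.load() if isinstance(c, Dispatcher) else c.active
                   for c in self.children)

    def route(self, request):
        # Dynamic load balancing: forward to the least-loaded child.
        target = min(self.children,
                     key=lambda c: c.load() if isinstance(c, Dispatcher) else c.active)
        return (target.route(request) if isinstance(target, Dispatcher)
                else target.handle(request))

# Two-level cluster: a root dispatcher in front of two sub-cluster dispatchers.
root = Dispatcher([Dispatcher([Backend("a1"), Backend("a2")]),
                   Dispatcher([Backend("b1"), Backend("b2")])])
for i in range(5):
    print(root.route(f"req{i}"))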


ieee international conference on high performance computing data and analytics | 2000

Partitioned Parallel Radix Sort

Shin Jae Lee; Minsoo Jeon; Andrew Sohn; Dongseung Kim

Load-balanced parallel radix sort solved the load-imbalance problem present in parallel radix sort: by redistributing the keys in each round, every processor holds exactly the same number of keys, thereby reducing the overall sorting time. Load-balanced radix sort is currently the fastest known internal sorting method for distributed-memory multiprocessors. However, once the computation time is balanced, the communication time of key redistribution emerges as the bottleneck of overall sorting performance. We present in this report a new parallel radix sorter, called partitioned parallel radix sort, that solves the communication problem of balanced radix sort. The new method reduces communication time by eliminating the redistribution steps. The keys are first sorted in a top-down fashion (left-to-right, as opposed to right-to-left) using some of the most significant bits. Once the keys are localized to each processor, the rest of the sorting is confined within each processor, eliminating the need for global redistribution of keys and enabling well-balanced communication and computation across processors. The proposed method has been implemented on three different distributed-memory platforms: IBM SP2, Cray T3E, and a PC cluster. Experimental results with various key distributions indicate that partitioned parallel radix sort indeed shows significant improvements over balanced radix sort: 13% to 30% improvement in execution time on the IBM SP2, 20% to 100% on the Cray/SGI T3E, and over a 2.5-fold improvement on the PC cluster.
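
A single-process sketch of the partitioning idea in Python (assuming roughly uniform non-negative keys; the function names are mine, not the paper's): keys are first localized to processors by their top log2(P) bits, after which each processor finishes with a purely local LSD radix sort and no further redistribution is needed.

def lsd_radix_sort(keys, key_bits=32, radix_bits=8):
    """Plain least-significant-digit radix sort on non-negative integers."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, key_bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for b in buckets for k in b]
    return keys

def partitioned_parallel_radix_sort(keys, p, key_bits=32):
    """Simulate partitioned parallel radix sort on p processors (p a power
    of two). Keys are localized by their top log2(p) bits; the rest of the
    sort is purely local to each processor."""
    shift = key_bits - p.bit_length() + 1   # drop all but the top log2(p) bits
    parts = [[] for _ in range(p)]
    for k in keys:
        parts[k >> shift].append(k)         # the only "communication" step
    # Each processor sorts its own partition; no global redistribution needed.
    return [k for part in parts for k in lsd_radix_sort(part, key_bits)]

data = [0xDEAD, 0xBEEF, 7, 42, 0xFFFF_FFFF, 12345]
assert partitioned_parallel_radix_sort(data, 4) == sorted(data)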


international conference on information networking | 2008

A Method for Optimal Bandwidth Utilization in IEEE 802.11 WLAN Networks

Hu-Keun Kwak; Cheong Ghil Kim; Young-Hyo Yoon; Myung-Won Kim; Dongseung Kim; Kyu-Sik Chung

This paper proposes a load-sharing scheme that maximizes network bandwidth utilization in IEEE 802.11 WLAN networks using SSID (Service Set Identifier) hiding. The proposed scheme keeps checking the available bandwidth of a group of wireless routers, selects the one with the most available bandwidth, and makes only that router visible to clients; whenever a client connects to a wireless router, only the selected one is visible while the others are hidden. We implemented the proposed scheme by modifying the firmware of an ASUS WL-500G wireless router and performed experiments. Experimental results show a 35.6% increase in bandwidth utilization compared to the conventional scheme.
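
The selection logic can be sketched as below; the Router class and its methods are hypothetical stand-ins (a real deployment, as in the paper, reconfigures access-point firmware rather than Python objects), so this only shows the control loop, not an actual WLAN API.

import random

class Router:
    """Hypothetical stand-in for one access point."""
    def __init__(self, name):
        self.name = name
        self.ssid_hidden = True

    def available_bandwidth(self):
        # Placeholder: would be measured from the live access point.
        return random.uniform(0, 54.0)   # Mbps, 802.11g ballpark

    def set_ssid_hidden(self, hidden):
        self.ssid_hidden = hidden

def select_visible_router(routers):
    """Hide every SSID except the router with the most spare bandwidth,
    so new clients can only associate with the best choice."""
    best = max(routers, key=lambda r: r.available_bandwidth())
    for r in routers:
        r.set_ssid_hidden(r is not best)
    return best

chosen = select_visible_router([Router("ap1"), Router("ap2"), Router("ap3")])
print("broadcasting SSID on", chosen.name)
# A real controller would repeat this selection periodically.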


computer and information technology | 2007

Load Balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers

Soonwook Hwang; Kyu-Sik Chung; Dongseung Kim

Algorithms for finding the prime numbers up to some integer N with the Sieve of Eratosthenes are simple and fast. However, even with a time complexity no greater than O(N ln ln N), they may take weeks or even years of CPU time when N is large, say over 15 decimal digits, and no shortcut has been reported yet. We develop efficient parallel algorithms that balance the workload across computers and extend the memory limit with disk storage to accommodate gigabytes of data. Our experiments show that a complete set of primes up to 14 digits can be found in about a week of CPU time using eight 1.6 GHz Pentium 4 PCs with Linux and the MPI library. We also believe it is very unlikely that all primes up to 20 digits can be computed by the Sieve of Eratosthenes, even using the fastest computers in the world.
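
The standard building block behind such a parallel, memory-bounded sieve is the segmented Sieve of Eratosthenes; here is a single-node Python sketch of it (the segment size and function names are illustrative; in the paper the intervals are distributed over cluster nodes via MPI and spilled to disk).

from math import isqrt

def base_primes(limit):
    """Simple sieve up to `limit` (used for the base primes up to sqrt(N))."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = bytearray(len(is_prime[p * p::p]))
    return [i for i, f in enumerate(is_prime) if f]

def sieve_segment(lo, hi, primes):
    """Return the primes in [lo, hi) using the precomputed base primes."""
    flags = bytearray([1]) * (hi - lo)
    for p in primes:
        start = max(p * p, (lo + p - 1) // p * p)
        for m in range(start, hi, p):
            flags[m - lo] = 0
    return [lo + i for i, f in enumerate(flags) if f and lo + i >= 2]

def primes_up_to(n, segment_size=1_000_000):
    primes = base_primes(isqrt(n))
    out = []
    # Each segment needs only the base primes, so segments can be handed to
    # different cluster nodes and the flag arrays kept at a fixed memory size
    # (or written to disk), which is the load-balancing/out-of-core idea here.
    for lo in range(2, n + 1, segment_size):
        out.extend(sieve_segment(lo, min(lo + segment_size, n + 1), primes))
    return out

assert primes_up_to(100)[:10] == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]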


international parallel and distributed processing symposium | 2002

Load-balanced parallel merge sort on distributed memory parallel computers

Minsoo Jeon; Dongseung Kim

Sorting can be sped up on parallel computers by dividing the data and processing the parts in parallel. Merge sort can be parallelized; however, the conventional algorithm implemented on distributed-memory computers performs poorly due to the successive halving of the number of active (non-idling) processors, down to one in the last merging stage. This paper presents a load-balanced parallel merge sort algorithm in which all processors participate in merging throughout the computation. Data are distributed evenly to all processors, and every processor is kept busy during the merging phase, yielding a significant performance enhancement. Our analysis shows that the upper bound of the speedup of the merge time is (P−1)/log P. We obtained a speedup of 9.6 (upper bound 10.5) on a 32-processor Cray T3E when sorting 4M 32-bit integers. The same idea can be applied to parallelize other sorting algorithms.
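
A rough accounting of my own (ignoring constants, communication, and the local sort phase) that is consistent with the stated merge-time bound: in the conventional scheme, stage i (i = 1, ..., log P) has P/2^i active processors, each merging 2^i * n/P keys, so the merge phase costs on the order of

  sum_{i=1..log P} 2^i * n/P = (2P - 2) * n/P = 2n(P - 1)/P.

In the load-balanced scheme every processor produces only n/P output keys per stage, examining at most 2n/P keys to do so, so the merge phase costs at most 2n log P / P. The ratio of the two totals is (P - 1)/log P.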


ieee international conference on high performance computing data and analytics | 2004

High-speed parallel external sorting of data with arbitrary distribution

Minsoo Jeon; Dongseung Kim

Many parallel algorithms for sorting external (disk-resident) data have been reported, such as NOW-Sort, SPsort, hill sort, and so on. They all reduce execution time compared with known sequential sorts, but they differ in speed, throughput, and cost-effectiveness, and they mostly deal with data distributed uniformly over the value range. If data are divided and redistributed to processors by a fixed, equal division of the key range, all processors receive about the same number of keys to sort and store; but if irregularly distributed data are given, performance suffers severely because such partitioning no longer produces balanced loads among processors. Few results have been reported for parallel external sorting of data with arbitrary distributions. In this paper, we develop two distribution-insensitive, scalable parallel external sorting algorithms that use a sampling technique and histogram counts to achieve an even distribution of keys, which eventually leads to good performance. Experimental results on a cluster of 16 Linux workstations show up to a threefold performance improvement over NOW-Sort when sorting 16 GB of integer keys.
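
The sampling idea can be illustrated with a short Python sketch (my own illustration of sample-based splitter selection, not the paper's exact algorithm): splitters are drawn from quantiles of a random sample rather than from a fixed division of the key range, so each processor's partition stays balanced even for heavily skewed data.

import bisect
import random

def choose_splitters(sample, p):
    """Pick p-1 splitter keys from a sample so that each of the p value
    ranges holds roughly the same number of sampled keys."""
    s = sorted(sample)
    return [s[(i * len(s)) // p] for i in range(1, p)]

def partition_counts(keys, splitters):
    """Histogram of how many keys fall into each splitter-defined range."""
    counts = [0] * (len(splitters) + 1)
    for k in keys:
        counts[bisect.bisect_right(splitters, k)] += 1
    return counts

# Heavily skewed data: a fixed, equal division of the key range would dump
# almost everything on one processor, but sample-based splitters keep the
# partitions roughly equal in size.
random.seed(0)
data = [int(random.expovariate(1.0) * 1000) for _ in range(100_000)]
splitters = choose_splitters(random.sample(data, 1000), p=8)
print(partition_counts(data, splitters))   # roughly 12,500 keys per partition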


international conference on parallel and distributed systems | 1997

Performance analysis and experiments of sorts on a parallel computer with parallel computation models

Dongseung Kim; Ilhong Yoon

This paper investigates the execution behavior of parallel sorting algorithms on an experimental multiprocessor (KuPP) and compares it with the performance predicted by the LogP and BSP (Bulk Synchronous Parallel) models. Since communication overhead is the primary candidate for improvement, a few schemes are devised and tested on KuPP to reduce the time spent in communication and thus enhance overall performance. The authors believe these ideas can be adopted in other high-performance parallel computers.
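
For reference, the two cost models mentioned above can be summarized in their textbook form (this is the standard formulation, not the paper's specific calibration for KuPP): under BSP, a superstep in which each processor performs at most w local operations and sends or receives at most h messages costs about

  T_superstep = w + g*h + l,

where g is the communication cost per message (or per word) and l is the barrier synchronization cost. Under LogP, delivering one small message takes about L + 2o (network latency plus the send and receive overheads), and a processor must leave at least the gap g between successive message submissions, which bounds how many messages it can have in flight.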

Collaboration


Dive into Dongseung Kim's collaborations.

Top Co-Authors

Sungroh Yoon

Seoul National University


Andrew Sohn

New Jersey Institute of Technology
