Network


Latest external collaborations at the country level.

Hotspot


Research topics where Soonwook Hwang is active.

Publication


Featured research published by Soonwook Hwang.


Bioorganic & Medicinal Chemistry Letters | 2011

Virtual screening identification of novel severe acute respiratory syndrome 3C-like protease inhibitors and in vitro confirmation

Thi Thanh Hanh Nguyen; Hwa-Ja Ryu; Sehoon Lee; Soonwook Hwang; Vincent Breton; Joon Haeng Rhee; Doman Kim

The 3C-like protease (3CLpro) of severe acute respiratory syndrome-associated coronavirus (SARS-CoV) is vital for SARS-CoV replication and is a promising drug target. Structure-based virtual screening of 308,307 chemical compounds was performed using the docking tool AutoDock 3.0.5 on the WISDOM Production Environment. The top 1468 ranked compounds, with free binding energies ranging from −14.0 to −17.09 kcal/mol, were selected to check hydrogen bond interactions with amino acid residues in the active site of 3CLpro. Fifty-three compounds from 35 main groups were tested in an in vitro assay for inhibition of 3CLpro expressed in Escherichia coli. Seven of the 53 compounds were selected; their IC50 values ranged from 38.57 ± 2.41 to 101.38 ± 3.27 μM. Two strong 3CLpro inhibitors were further identified as competitive inhibitors of 3CLpro with Ki values of 9.11 ± 1.6 and 9.93 ± 0.44 μM. Hydrophobic and hydrogen bond interactions of the compounds with amino acid residues in the active site of 3CLpro were also identified.
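
The filtering step described above (rank docked compounds by free binding energy and keep those within a chosen energy window) can be illustrated with a minimal Python sketch. The energy bounds mirror the values reported in the abstract; the compound IDs and the select_candidates helper are invented for illustration only.

# Minimal sketch of the ranking/filtering step in a docking-based virtual
# screen, assuming docking results are already available as
# (compound_id, binding_energy_kcal_per_mol) pairs. The energy window
# mirrors the one reported in the abstract; everything else is illustrative.

def select_candidates(docking_results, e_min=-17.09, e_max=-14.0):
    """Keep compounds whose free binding energy falls in [e_min, e_max]."""
    hits = [(cid, e) for cid, e in docking_results if e_min <= e <= e_max]
    # Rank best (most negative) energies first, as a docking screen would.
    hits.sort(key=lambda pair: pair[1])
    return hits

# Illustrative data, not taken from the actual 308,307-compound library.
results = [("ZINC0001", -15.2), ("ZINC0002", -9.8), ("ZINC0003", -16.7)]
for cid, energy in select_candidates(results):
    print(f"{cid}: {energy} kcal/mol")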


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

Abstract: HTCaaS: A Large-Scale High-Throughput Computing by Leveraging Grids, Supercomputers and Cloud

Seungwoo Rho; Seoyoung Kim; Sangwan Kim; Seokkyoo Kim; Jik-Soo Kim; Soonwook Hwang

We present HTCaaS (High-Throughput Computing as a Service), a system that aims to let researchers easily explore large-scale and complex HTC problems by leveraging supercomputers, grids, and clouds. HTCaaS hides the heterogeneity and complexity of harnessing different types of computing infrastructures from users, and efficiently submits a large number of jobs at once by effectively managing and exploiting all available computing resources. Our system has been integrated with national supercomputers in Korea, international computational grids, and Amazon EC2, combining a vast amount of computing resources to support the most challenging scientific problems.
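
As a rough illustration of the abstraction HTCaaS provides, the hypothetical sketch below fans a bag of tasks out across several backends through a single submit() call. The Backend class, the slot counts, and the round-robin placement are all invented for illustration; they are not the actual HTCaaS API or scheduling logic.

# Hypothetical sketch of the kind of abstraction HTCaaS provides: one
# submit() call that spreads a bag of tasks over whatever backends
# (supercomputer, grid, cloud) are available. All names are invented.

class Backend:
    def __init__(self, name, slots):
        self.name, self.slots = name, slots

def submit(tasks, backends):
    """Round-robin tasks over free slots, hiding backend heterogeneity."""
    placement = {b.name: [] for b in backends}
    pool = [b for b in backends for _ in range(b.slots)]
    for i, task in enumerate(tasks):
        placement[pool[i % len(pool)].name].append(task)
    return placement

backends = [Backend("supercomputer", 4), Backend("grid", 2), Backend("ec2", 2)]
print(submit([f"task-{i}" for i in range(16)], backends))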


IEEE Transactions on Parallel and Distributed Systems | 2016

Resource Allocation Policies for Loosely Coupled Applications in Heterogeneous Computing Systems

Eunji Hwang; Suntae Kim; Tae-kyung Yoo; Jik-Soo Kim; Soonwook Hwang; Young-ri Choi

High-Throughput Computing (HTC) and Many-Task Computing (MTC) paradigms employ loosely coupled applications which consist of a large number, from tens of thousands to even billions, of independent tasks. To support such large-scale applications, a heterogeneous computing system composed of multiple types of computing platforms, such as supercomputers, grids, and clouds, can be used. When allocating the heterogeneous resources of the system to multiple users, there are three important aspects to consider: fairness among users, efficiency for maximizing system throughput, and user satisfaction for reducing the average user response time. In this paper, we present three resource allocation policies for multi-user and multi-application workloads in a heterogeneous computing system: a fairness policy, a greedy efficiency policy, and a fair efficiency policy. We evaluate and compare the performance of the three policies over various configurations of a heterogeneous computing system and loosely coupled applications, using simulations based on traces from real experiments. Our simulation results show that the fair efficiency policy can provide competitive efficiency, with a balanced level of fairness and user satisfaction, compared to the other two policies.
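
As a toy illustration of the fairness dimension discussed above, the sketch below splits a pool of identical nodes evenly across users. The paper's policies additionally weigh platform efficiency and user satisfaction; the fair_share helper and its inputs are invented for illustration only.

# Minimal sketch of a fair-share baseline: divide homogeneous nodes
# evenly among users and hand out the remainder one node at a time.

def fair_share(total_nodes, users):
    base, extra = divmod(total_nodes, len(users))
    alloc = {u: base for u in users}
    for u in users[:extra]:      # distribute the remainder deterministically
        alloc[u] += 1
    return alloc

print(fair_share(10, ["alice", "bob", "carol"]))  # {'alice': 4, 'bob': 3, 'carol': 3}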


Cluster Computing | 2014

Towards effective science cloud provisioning for a large-scale high-throughput computing

Seoyoung Kim; Jik-Soo Kim; Soonwook Hwang; Yoonhee Kim

The science cloud paradigm has been actively developed and investigated, but it still requires a suitable system model to support increasing scientific computation needs with high performance. This paper presents an effective provisioning model for science clouds, particularly for large-scale high-throughput computing applications. In this model, we utilize job traces, to which a statistical method is applied to pick the most influential features for improving application performance. With these features, the system determines where a VM is deployed (allocation) and which instance type is appropriate (provisioning). An adaptive evaluation step following each job execution enables our model to adapt to dynamic computing environments. We show the performance achieved by comparing the proposed model with other policies through experiments, and we expect noticeable improvements in performance as well as reduced resource-consumption costs from our model.
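
To make the allocation/provisioning decision concrete, here is a hypothetical sketch that maps averaged trace features to the smallest adequate instance type. The two features, the instance catalog, and the pick_instance helper are invented stand-ins for the statistically selected features the paper actually uses.

# Illustrative sketch of trace-driven provisioning: estimate a job class's
# resource demand from past traces and map it to an instance type.

from statistics import mean

INSTANCE_TYPES = [  # (name, cpu_capacity, mem_gb) -- hypothetical catalog
    ("small", 100, 2), ("medium", 400, 8), ("large", 1600, 32),
]

def pick_instance(trace):
    cpu_need = mean(j["cpu_s"] for j in trace)
    mem_need = mean(j["mem_gb"] for j in trace)
    for name, cpu_cap, mem_cap in INSTANCE_TYPES:  # smallest adequate type
        if cpu_cap >= cpu_need and mem_cap >= mem_need:
            return name
    return INSTANCE_TYPES[-1][0]

trace = [{"cpu_s": 350, "mem_gb": 4}, {"cpu_s": 420, "mem_gb": 6}]
print(pick_instance(trace))  # medium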


Networked Computing and Advanced Information Management | 2008

Improvement of Task Retrieval Performance Using AMGA in a Large-Scale Virtual Screening

Sunil Ahn; N. G. Kim; Seehoon Lee; Soonwook Hwang; Dukyun Nam; Birger Koblitz; Vincent Breton; Sang-Yong Han

In this paper, we address performance and scalability issues that arise when AMGA (ARDA Metadata Grid Application) is used as a metadata service for task retrieval in the WISDOM (Wide In Silico Docking On Malaria) environment, and propose optimization techniques to deal with them. First, to reduce the communication overhead incurred when jobs must call a series of AMGA operations to retrieve a task from the AMGA server, we propose a new AMGA operation that allows jobs deployed on the Grid to retrieve a task in a single call instead of a series of existing operations. According to our performance study, the throughput of task retrieval using the new AMGA operation can be as much as 70 times higher than with the existing operations. Second, to address the scalability problem when thousands of running jobs concurrently access a single AMGA server in an attempt to grab available tasks, we propose the use of multiple AMGA servers for task retrieval. Our test results demonstrate that throughput improves linearly with the number of AMGA servers set up for load balancing.
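
The essence of the proposed single-call retrieval can be sketched as follows: rather than a job performing several round trips (find a free task, lock it, read it), one server-side operation claims and returns a task atomically. This is a conceptual model only; TaskServer and claim_task are invented names, not the AMGA API.

# Sketch of the round-trip-reduction idea: one atomic server-side
# operation replaces a series of client calls.

import threading

class TaskServer:
    def __init__(self, tasks):
        self._tasks = list(tasks)
        self._lock = threading.Lock()

    def claim_task(self):
        """Single operation: atomically pop and return the next task."""
        with self._lock:
            return self._tasks.pop() if self._tasks else None

server = TaskServer([f"ligand-{i}" for i in range(5)])
while (task := server.claim_task()) is not None:
    print("docking", task)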


2016 IEEE 1st International Workshops on Foundations and Applications of Self* Systems (FAS*W) | 2016

KOHA: Building a Kafka-Based Distributed Queue System on the Fly in a Hadoop Cluster

Cao Ngoc Nguyen; Jik-Soo Kim; Soonwook Hwang

Message queues play a crucial role in distributed and scalable systems by interconnecting loosely coupled and autonomic computational units. Among state-of-the-art distributed message queue systems, Apache Kafka has been able to achieve high throughput, low latency, and good load balancing. Recently, we have been developing a new data processing framework that can efficiently handle a very large number of tasks on top of a Hadoop cluster by leveraging Kafka as a job queue, which motivated us to explore more opportunities for utilizing Kafka in the Hadoop platform. Apache Hadoop has already become the de facto big data processing infrastructure, and with the help of YARN it is now evolving into a multi-use data platform that can harness various types of data processing workflows. Therefore, effectively utilizing Kafka for various purposes, including message distribution, task processing, and metadata management in a Hadoop cluster, can potentially contribute to the expansion of the current Hadoop ecosystem. In this paper, we design and implement a framework called KOHA (Kafka On HAdoop) that provides users with a simple, convenient, and powerful way to develop large-scale distributed Kafka-based applications running on top of a Hadoop cluster. The framework automatically builds and starts Kafka brokers on the fly and allocates resources to launch producers and consumers. Users can adopt Apache Kafka through the framework without any understanding of the YARN programming model and without the effort of deploying a Kafka cluster. In addition, we present a use case of the framework to evaluate Kafka's performance under various test cases and working scenarios. The experimental results allow Kafka's potential users to perceive the influence of different settings on queuing performance.
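
For readers unfamiliar with the producer/consumer pattern KOHA builds on, here is a minimal sketch using the kafka-python client. The broker address, topic name, and task payloads are placeholders; KOHA itself would bring the brokers up inside the Hadoop cluster and launch the producers and consumers as YARN containers.

# Minimal sketch of using Kafka as a task queue, assuming the
# kafka-python client and a broker reachable at localhost:9092.

from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(100):
    producer.send("koha-tasks", f"task-{i}".encode())  # enqueue a task
producer.flush()

consumer = KafkaConsumer(
    "koha-tasks",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop polling once the queue drains
)
for record in consumer:
    print("processing", record.value.decode())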


IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2015

Platform and Co-Runner Affinities for Many-Task Applications in Distributed Computing Platforms

Seontae Kim; Eunji Hwang; Tae-kyung Yoo; Jik-Soo Kim; Soonwook Hwang; Young-ri Choi

Recent emerging applications from a wide range of scientific domains often require a very large number of loosely coupled tasks to be processed efficiently. To support such applications effectively, all the available resources from different types of computing platforms, such as supercomputers, grids, and clouds, need to be utilized. However, exploiting heterogeneous resources from these platforms for multiple loosely coupled many-task applications is challenging, since the performance of an application can vary significantly depending on which platform runs it and which applications co-run on the same node. In this paper, we analyze the platform and co-runner affinities of many-task applications in distributed computing platforms. We perform a comprehensive experimental study using four different platforms and five many-task applications. We then present a two-level scheduling algorithm, which distributes the resources of different platforms to each application based on platform affinity in the first level, and maps tasks of the applications to computing nodes based on co-runner affinity for each platform in the second level. Finally, we evaluate the performance of our scheduling algorithm using a trace-based simulator. Our simulation results demonstrate that our scheduling algorithm can improve performance by up to 30.0% compared to a baseline scheduling algorithm.
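
The two-level structure can be sketched compactly: level 1 sends each application to the platform it runs best on, and level 2 avoids placing interfering applications on the same node. The affinity table, the conflict set, and both helper functions below are invented toy stand-ins for the measured affinities the paper uses.

# Toy two-level scheduler: platform affinity first, co-runner affinity second.

PLATFORM_AFFINITY = {"appA": {"cloud": 0.9, "grid": 0.6},
                     "appB": {"cloud": 0.5, "grid": 0.8}}
BAD_CO_RUNNERS = {("appA", "appB")}  # pairs known to slow each other down

def level1(apps):
    """Assign each application to its highest-affinity platform."""
    return {a: max(PLATFORM_AFFINITY[a], key=PLATFORM_AFFINITY[a].get)
            for a in apps}

def level2(app, nodes):
    """Prefer a node whose current residents don't conflict with app."""
    for node, residents in nodes.items():
        if all((app, r) not in BAD_CO_RUNNERS and
               (r, app) not in BAD_CO_RUNNERS for r in residents):
            return node
    return next(iter(nodes))  # fall back to any node

print(level1(["appA", "appB"]))                    # {'appA': 'cloud', 'appB': 'grid'}
print(level2("appA", {"n1": ["appB"], "n2": []}))  # n2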


International Conference on e-Science | 2016

MOHA: Many-task computing meets the big data platform

Jik-Soo Kim; Cao Ngoc Nguyen; Soonwook Hwang

Many-Task Computing (MTC) is a computing paradigm that aims to bridge the gap between traditional High-Throughput Computing (HTC) and High-Performance Computing (HPC). MTC applications from various scientific domains, such as pharmaceuticals, astronomy, and physics, often consist of a very large number (from thousands to even billions) of data-intensive tasks (tens of MB of I/O per second) with relatively short per-task execution times (from seconds to minutes). Each task in an MTC application may require a relatively small amount of data processing, especially compared to existing big data applications typically based on larger data block sizes (e.g., the default block size in Hadoop is 64 MB). However, MTC applications can consist of much larger numbers of tasks, where each task communicates through files instead of message passing interfaces such as MPI in HPC applications. Therefore, MTC can be seen as another type of data-intensive workload in which a large number of data processing tasks must be efficiently processed within a relatively short period of time. In this paper, we present the design and implementation of MOHA (Many-task computing On HAdoop), which makes an effective convergence of MTC technologies and the existing big data platform Hadoop. MOHA is developed as a Hadoop YARN application so that it can transparently co-host existing MTC applications with other big data processing frameworks, such as MapReduce, in a single Hadoop cluster. Our evaluation results based on microbenchmarks show that MOHA can substantially reduce the overall execution time of many-task processing with a minimal amount of resources compared to an existing Hadoop YARN application. In addition, MOHA can efficiently dispatch a large number of tasks, which is crucial for supporting challenging MTC applications. MOHA raises many interesting research issues, related to data grouping and declustering on the Hadoop Distributed File System (HDFS), scalable job/metadata management, and dynamic task load balancing, which can ultimately contribute to a new data processing framework in the YARN-based Hadoop 2.0 ecosystem.
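
The workload shape MOHA targets, many short independent tasks drained by a pool of workers, can be sketched in a few lines. The process pool below merely stands in for YARN containers; run_task and the task count are invented, and MOHA's actual dispatching goes through a distributed queue rather than an in-process pool.

# Minimal sketch of the many-task pattern: a pool of workers draining
# a set of short, independent tasks.

from multiprocessing import Pool

def run_task(task_id):
    # A real MTC task would read/write small files; we just simulate work.
    return f"task {task_id} done"

if __name__ == "__main__":
    with Pool(processes=4) as pool:           # 4 stand-in "containers"
        for result in pool.imap_unordered(run_task, range(20)):
            print(result)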


IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2015

A Comparative Analysis of Scheduling Mechanisms for Virtual Screening Workflow in a Shared Resource Environment

Jik-Soo Kim; Seungwoo Rho; Seoyoung Kim; Sangwan Kim; Soonwook Hwang; Emmanuel Medernach; Vincent Breton

Traditional High-Throughput Computing (HTC) consists of running many loosely coupled, independent tasks that together require a large amount of computing power over a significant period of time. However, recent emerging applications requiring millions or even billions of tasks to be processed within a relatively short period of time have expanded traditional HTC into Many-Task Computing (MTC). In silico drug discovery offers an efficient alternative for reducing the cost of the drug development and discovery process. For this purpose, virtual screening is used to select the most promising candidate drugs for in vitro testing from millions of chemical compounds. This process requires a substantial amount of computing resources and high-performance processing of docking simulations, which shows the typical characteristics of MTC applications. As the number of users performing this virtual screening process increases while available computing resources remain limited, it becomes crucial to devise an effective scheduling policy that can ensure a certain degree of fairness and user satisfaction. In this paper, we present a comparative analysis of scheduling mechanisms for the virtual screening workflow where multiple users share a common service infrastructure. To effectively support these multiple users, the underlying system should consider fairness, user response time, and overall system throughput. We have implemented two different scheduling algorithms, addressing fairness and user response time respectively, in a common middleware stack called HTCaaS, a pilot-job-based multi-level scheduling system running on top of a dedicated production-level cluster. Through a comparative analysis of the two scheduling mechanisms, targeting different metrics on top of a single hardware and software system, we give the research community insight into the design and implementation of a scheduling mechanism that can trade off user fairness against overall system performance, which is crucial to support challenging MTC applications.
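
The core of such a comparison, one dispatch loop with interchangeable selection policies, can be sketched as follows. The two pickers below (least-served user for fairness, arrival order for response time) and the toy workload are invented illustrations, not the algorithms implemented in HTCaaS.

# One dispatch loop, two pluggable task-selection policies.

def pick_fair(queue, served):
    """Fairness flavor: serve the user with the fewest completed tasks."""
    return min(queue, key=lambda t: served[t["user"]])

def pick_fifo(queue, served):
    """Response-time flavor: serve the oldest waiting task first."""
    return min(queue, key=lambda t: t["arrival"])

def dispatch(tasks, policy):
    queue = list(tasks)
    served = {t["user"]: 0 for t in queue}
    order = []
    while queue:
        task = policy(queue, served)
        queue.remove(task)
        served[task["user"]] += 1
        order.append(task["id"])
    return order

jobs = [{"id": "u1-a", "user": "u1", "arrival": 0},
        {"id": "u1-b", "user": "u1", "arrival": 1},
        {"id": "u2-a", "user": "u2", "arrival": 2}]
print(dispatch(jobs, pick_fair))  # interleaves users
print(dispatch(jobs, pick_fifo))  # strict arrival order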


Cluster Computing | 2015

High performance parallelization of Boyer–Moore algorithm on many-core accelerators

Yosang Jeong; Myungho Lee; Dukyun Nam; Jik-Soo Kim; Soonwook Hwang

The Boyer–Moore (BM) algorithm is a single-pattern string matching algorithm. It is considered the most efficient string matching algorithm and is used in many applications. The algorithm first calculates two string shift rules based on the given pattern string in the preprocessing phase. Using the two shift rules, pattern matching operations are performed against the target input string in the second phase. The shift rules calculated in the first phase let the second phase skip parts of the target input string where no matches can be found. The second phase is time-consuming and needs to be parallelized to realize high-performance string matching. In this paper, we parallelize the BM algorithm on the latest many-core accelerators, such as the Intel Xeon Phi and the Nvidia Tesla K20 GPU, along with general-purpose multi-core microprocessors. For parallel string matching, the target input data is partitioned among multiple threads. Data lying on the threads' boundaries is searched redundantly so that a pattern string straddling the boundary between two neighboring threads cannot be missed. This redundant-search overhead increases significantly for a large number of threads. For a fixed target input length, the number of possible matches decreases as the pattern length increases; furthermore, the positions of the pattern string are spread randomly over the target data, which leads to an unbalanced workload distribution among threads. We employ dynamic scheduling and multithreading techniques to deal with this load balancing issue. We also use an algorithmic cascading technique to maximize the benefit of multithreading and to reduce the overhead of redundant data search between neighboring threads. Our parallel implementation leads to
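
The partitioning scheme described above can be sketched in Python: the target string is split into per-worker spans, and matches that straddle a span boundary are still found because each worker may read past its span end while matching. Python's built-in str.find stands in for the Boyer-Moore search each accelerator thread would actually run; all names here are illustrative.

# Sketch of boundary-safe partitioned string search across worker threads.

from concurrent.futures import ThreadPoolExecutor

def find_all(text, pattern, start, end):
    """Collect every match whose start index lies in [start, end)."""
    hits, i = [], text.find(pattern, start)
    while i != -1 and i < end:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits

def parallel_search(text, pattern, workers=4):
    chunk = (len(text) + workers - 1) // workers
    spans = [(w * chunk, min((w + 1) * chunk, len(text)))
             for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker scans only its own span of start indices, but a
        # match is allowed to extend past the span end, so boundary
        # matches are never missed.
        results = pool.map(lambda s: find_all(text, pattern, *s), spans)
    return sorted(i for part in results for i in part)

print(parallel_search("abracadabra" * 3, "abra"))  # [0, 7, 11, 18, 22, 29]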

Collaboration


An overview of Soonwook Hwang's collaborations.

Top Co-Authors

Jik-Soo Kim (Korea Institute of Science and Technology Information)
Sangwan Kim (Korea Institute of Science and Technology Information)
Seoyoung Kim (Sookmyung Women's University)
Seungwoo Rho (Korea Institute of Science and Technology Information)
Sunil Ahn (Seoul National University)
Dukyun Nam (Korea Institute of Science and Technology Information)
Doman Kim (Seoul National University)
Vincent Breton (Centre national de la recherche scientifique)
Sehoon Lee (Korea Institute of Science and Technology Information)