
Publication


Featured research published by Seungryoul Maeng.


IEEE International Conference on Cloud Computing Technology and Science | 2010

HAMA: An Efficient Matrix Computation with the MapReduce Framework

Sangwon Seo; Edward J. Yoon; Jaehong Kim; Seongwook Jin; Jin-Soo Kim; Seungryoul Maeng

Scientific computations have become so complex that computation tools play an increasingly important role. In this paper, we explore a state-of-the-art framework providing high-level matrix computation primitives with MapReduce through a case-study approach, and evaluate these primitives on different computation engines to show their performance and scalability. We believe the opportunity for using MapReduce in scientific computation is even more promising than its success to date in the parallel systems literature.
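
The paper's source code is not reproduced here; as a rough illustration of expressing a matrix primitive as map and reduce steps, the Python sketch below multiplies a sparse matrix by a vector in memory. The function names (map_phase, reduce_phase) and the triple-based matrix format are assumptions for illustration, not HAMA's actual API.

```python
from collections import defaultdict

# A minimal, in-memory illustration of expressing matrix-vector
# multiplication y = A * x as a MapReduce-style job. The names used here
# are hypothetical and do not correspond to HAMA's actual API.

def map_phase(matrix_entries, x):
    """Emit (row, partial product) for every nonzero matrix entry."""
    for (i, j, a_ij) in matrix_entries:
        yield (i, a_ij * x[j])

def reduce_phase(pairs):
    """Sum the partial products for each row key."""
    sums = defaultdict(float)
    for row, value in pairs:
        sums[row] += value
    return dict(sums)

if __name__ == "__main__":
    # A is given as (row, col, value) triples, x as a dense vector.
    A = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
    x = [1.0, 1.0]
    y = reduce_phase(map_phase(A, x))
    print(y)  # {0: 3.0, 1: 7.0}
```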


Future Generation Computer Systems | 2011

Cost optimized provisioning of elastic resources for application workflows

Eun-Kyu Byun; Yang-Suk Kee; Jin-Soo Kim; Seungryoul Maeng

Workflow technologies have become a major vehicle for the easy and efficient development of scientific applications. In the meantime, state-of-the-art resource provisioning technologies such as cloud computing enable users to acquire computing resources dynamically and elastically. A critical challenge in integrating workflow technologies with resource provisioning technologies is to determine the right amount of resources required for the execution of workflows, in order to minimize the financial cost from the perspective of users and to maximize resource utilization from the perspective of resource providers. This paper proposes an architecture for the automatic execution of large-scale workflow-based applications on dynamically and elastically provisioned computing resources. In particular, we focus on its core algorithm, named PBTS (Partitioned Balanced Time Scheduling), which estimates the minimum number of computing hosts required to execute a workflow within a user-specified finish time. The PBTS algorithm is designed to fit both elastic resource provisioning models such as Amazon EC2 and malleable parallel application models such as MapReduce. Experimental results with a number of synthetic workflows and several real science workflows demonstrate that PBTS estimates a resource capacity close to the theoretical lower bound.
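
The abstract does not spell out the PBTS algorithm itself. The sketch below only illustrates the simpler, related notion of a work-based lower bound on host count for a given deadline, which is the kind of estimate PBTS refines per time partition; the task-list format and the function name lower_bound_hosts are hypothetical.

```python
import math

# Illustrative only: a naive lower bound on resource capacity for a
# deadline-constrained workflow. PBTS refines this kind of estimate by
# partitioning the schedule into time intervals; that logic is not
# reproduced here.

def lower_bound_hosts(task_runtimes, deadline):
    """Smallest host count that could possibly fit the total work into the
    deadline, ignoring dependencies (hence a true lower bound)."""
    total_work = sum(task_runtimes)
    longest_task = max(task_runtimes)
    if longest_task > deadline:
        raise ValueError("deadline shorter than the longest task")
    return max(1, math.ceil(total_work / deadline))

if __name__ == "__main__":
    runtimes = [30, 30, 60, 90, 45]                  # minutes per task
    print(lower_bound_hosts(runtimes, deadline=120)) # 3
```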


International Conference on Cluster Computing | 2009

HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment

Sangwon Seo; Ingook Jang; Kyungchang Woo; In-Kyo Kim; Jin-Soo Kim; Seungryoul Maeng

MapReduce is a programming model that supports distributed and parallel processing for large-scale data-intensive applications such as machine learning, data mining, and scientific simulation. Hadoop is an open-source implementation of the MapReduce programming model. Hadoop is used by many companies, including Yahoo!, Amazon, and Facebook, to perform various data mining tasks on large-scale data sets such as user search logs and visit logs. In these cases, it is very common for multiple users to share the same computing resources due to practical considerations of cost, system utilization, and manageability. However, Hadoop assumes that all cluster nodes are dedicated to a single user, and thus fails to guarantee high performance in a shared MapReduce computation environment. In this paper, we propose two optimization schemes, prefetching and pre-shuffling, which improve overall performance in the shared environment while retaining compatibility with native Hadoop. The proposed schemes are implemented in native Hadoop-0.18.3 as a plug-in component called HPMR (High Performance MapReduce Engine). Our evaluation on the Yahoo! Grid platform, with three different workloads and seven types of test sets from Yahoo!, shows that HPMR reduces execution time by up to 73%.
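
As a toy illustration of the prefetching idea (overlapping the fetch of the next input split with processing of the current one), the sketch below uses a background thread and a bounded queue. It is not HPMR code; the function names and the simulated I/O delays are invented for illustration.

```python
import threading
import queue
import time

# Toy illustration of prefetching: while the current input split is being
# processed, a background thread fetches the next split so that map-side
# work rarely waits on I/O. Not HPMR code; names and delays are illustrative.

def fetch_split(split_id):
    time.sleep(0.1)                    # stand-in for remote/disk I/O
    return f"data-of-split-{split_id}"

def prefetching_reader(split_ids, lookahead=1):
    buffer = queue.Queue(maxsize=lookahead)

    def producer():
        for sid in split_ids:
            buffer.put(fetch_split(sid))   # runs ahead of the consumer
        buffer.put(None)                   # end-of-stream marker

    threading.Thread(target=producer, daemon=True).start()
    while True:
        split = buffer.get()
        if split is None:
            break
        yield split

if __name__ == "__main__":
    for data in prefetching_reader(range(3)):
        time.sleep(0.1)                # stand-in for map-side processing
        print("processed", data)
```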


International Conference on Supercomputing | 2009

FTL design exploration in reconfigurable high-performance SSD for server applications

Ji-Yong Shin; Zenglin Xia; Ningyi Xu; Rui Gao; Xiong-Fei Cai; Seungryoul Maeng; Feng-Hsiung Hsu

Solid-state disks (SSDs) are becoming widely used in personal computers and are expected to replace a large portion of the magnetic disks in servers and supercomputers. Although many high-speed SSDs are on the market, neither the design of their hardware architecture nor the details of their flash translation layer (FTL) are well known. Meanwhile, in systems requiring high-end storage, specially tuned SSDs can perform better than generic ones because the applications in such environments are usually fixed. Based on the architectural design of our reconfigurable high-performance SSD prototype, and by using a trace-driven simulator, we explore the key factors and trade-offs that must be considered when designing a customized FTL. FTL-related issues such as data allocation, cleaning, and wear leveling are analyzed in detail, and suitable design decisions are presented for different workload characteristics. The experimental results show that the performance metrics can vary from a few percent to more than tens of times depending on the decision made for each FTL functionality.
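
For readers unfamiliar with FTL internals, the sketch below shows a bare-bones page-mapping FTL with greedy victim selection for cleaning, the kind of policy choice the paper's design exploration covers. It is a generic textbook-style illustration, not the prototype SSD's actual FTL; the class name, geometry, and policies are assumptions.

```python
# A bare-bones page-mapping FTL with greedy cleaning, shown only to
# illustrate the mapping/cleaning/wear decisions the paper explores.
# Block geometry, names, and policies are illustrative.

PAGES_PER_BLOCK = 4

class ToyPageMappingFTL:
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.mapping = {}                      # logical page -> (block, page)
        self.valid = [[False] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.next_free = [(b, p) for b in range(num_blocks)
                                 for p in range(PAGES_PER_BLOCK)]
        self.erase_counts = [0] * num_blocks   # tracked for wear leveling (unused in this toy)

    def write(self, lpn):
        if lpn in self.mapping:                # invalidate the old copy
            b, p = self.mapping[lpn]
            self.valid[b][p] = False
        if not self.next_free:
            self._garbage_collect()
        b, p = self.next_free.pop(0)
        self.valid[b][p] = True
        self.mapping[lpn] = (b, p)

    def _garbage_collect(self):
        # Greedy cleaning: erase the block with the fewest valid pages.
        victim = min(range(self.num_blocks), key=lambda b: sum(self.valid[b]))
        movers = [lpn for lpn, (b, _) in self.mapping.items() if b == victim]
        self.valid[victim] = [False] * PAGES_PER_BLOCK
        self.erase_counts[victim] += 1
        self.next_free.extend((victim, p) for p in range(PAGES_PER_BLOCK))
        for lpn in movers:                     # relocate still-valid data
            del self.mapping[lpn]
            self.write(lpn)

if __name__ == "__main__":
    ftl = ToyPageMappingFTL(num_blocks=2)
    for lpn in [0, 1, 0, 1, 0, 1, 0, 1, 0]:
        ftl.write(lpn)
    print(ftl.mapping, ftl.erase_counts)
```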


IEEE Computer Architecture Letters | 2010

Exploiting Internal Parallelism of Flash-based SSDs

Seonyeong Park; Euiseong Seo; Ji-Yong Shin; Seungryoul Maeng; Joonwon Lee

For the last few years, the major driving force behind the rapid performance improvement of SSDs has been the increase in the number of parallel bus channels between the flash controller and the flash memory packages inside solid-state drives (SSDs). However, there are other forms of internal parallelism inside SSDs yet to be explored. In order to further improve performance by exploiting this parallelism, this paper proposes request rescheduling and dynamic write request mapping. Simulation results with real workloads show that the suggested schemes improve the performance of SSDs by up to 15% without any additional hardware support.
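
The sketch below illustrates only the general flavor of dynamic write request mapping: each incoming write is dispatched to the flash channel that becomes idle soonest rather than to a statically assigned channel. The channel model, timing units, and function name are invented for illustration and are not the paper's simulator.

```python
import heapq

# Dynamic write-request mapping, illustrated: instead of striping writes
# across channels with a fixed address function, each write goes to the
# channel that becomes idle soonest. Channel model and units are invented.

def dynamic_write_mapping(write_sizes, num_channels, bytes_per_us=1024):
    channels = [(0.0, c) for c in range(num_channels)]  # (time channel frees up, id)
    heapq.heapify(channels)
    placement = []
    for size in write_sizes:
        free_at, cid = heapq.heappop(channels)
        finish = free_at + size / bytes_per_us
        placement.append((cid, finish))
        heapq.heappush(channels, (finish, cid))
    return placement

if __name__ == "__main__":
    sizes = [4096, 512, 512, 8192, 4096]                # bytes per write request
    for cid, done in dynamic_write_mapping(sizes, num_channels=2):
        print(f"channel {cid}, finishes at {done:.2f} us")
```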


High Performance Distributed Computing | 2012

Locality-aware dynamic VM reconfiguration on MapReduce clouds

Jongse Park; Daewoo Lee; Bo-Kyeong Kim; Jaehyuk Huh; Seungryoul Maeng

Cloud computing, based on system virtualization, has been expanding its services to distributed data-intensive platforms such as MapReduce and Hadoop. Such a distributed platform on clouds runs in a virtual cluster consisting of a number of virtual machines. In the virtual cluster, the demand on computing resources at each node may fluctuate due to data locality and task behavior. However, current cloud services use a static cluster configuration, fixing or manually adjusting the computing capability of each virtual machine (VM). The fixed, homogeneous VM configuration may not adapt to changing resource demands at individual nodes. In this paper, we propose a dynamic VM reconfiguration technique for data-intensive computing on clouds, called Dynamic Resource Reconfiguration (DRR). DRR can adjust the computing capability of individual VMs to maximize the utilization of resources. Among the several factors causing resource imbalance on Hadoop platforms, this paper focuses on data locality. Although assigning tasks to the nodes containing their input data can significantly improve the overall performance of a job, the fixed computing capability of each node may not allow such locality-aware scheduling. DRR dynamically increases or decreases the computing capability of each node to enhance locality-aware task scheduling. We evaluate the potential performance improvement of DRR on a 100-node cluster, and its detailed behavior on a small-scale cluster with constrained network bandwidth. DRR improves the throughput of Hadoop jobs by 15% on average on the 100-node cluster, and by 41% on the private cluster with the constrained network connection.
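
As a rough illustration of the locality-driven reallocation idea (not the DRR implementation), the sketch below re-divides a fixed vCPU budget among co-located VMs in proportion to how many node-local tasks each one has queued; the proportional policy and all names are assumptions.

```python
# Locality-driven capacity shifting, illustrated: a fixed budget of vCPUs on
# one physical host is re-divided among co-located VMs in proportion to how
# many node-local map tasks each VM currently has queued. Not the DRR
# implementation; policy and names are assumptions.

def reallocate_vcpus(local_task_counts, total_vcpus, min_vcpus=1):
    total_local = sum(local_task_counts.values())
    if total_local == 0:                          # no locality information: split evenly
        even = max(min_vcpus, total_vcpus // len(local_task_counts))
        return {vm: even for vm in local_task_counts}
    # Proportional shares; rounding may make the sum differ slightly from the budget.
    return {vm: max(min_vcpus, round(total_vcpus * count / total_local))
            for vm, count in local_task_counts.items()}

if __name__ == "__main__":
    queued_local_tasks = {"vm-a": 6, "vm-b": 1, "vm-c": 1}
    print(reallocate_vcpus(queued_local_tasks, total_vcpus=8))
    # {'vm-a': 6, 'vm-b': 1, 'vm-c': 1}: vm-a holds most of the host while its local tasks last
```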


IEEE Transactions on Computers | 2011

Energy Reduction in Consolidated Servers through Memory-Aware Virtual Machine Scheduling

Jae-Wan Jang; Myeongjae Jeon; Hyo-Sil Kim; Heeseung Jo; Jin-Soo Kim; Seungryoul Maeng

Increasing energy consumption in server consolidation environments leads to high maintenance costs for data centers. Main memory, no less than the processor, is a major energy consumer in this environment. This paper proposes a technique for reducing memory energy consumption through virtual machine scheduling on multicore systems. We devise several heuristic scheduling algorithms using a memory power simulator that we designed and implemented, and we also implement the biggest cover set first (BCSF) scheduling algorithm in a working server system. Through extensive simulation and implementation experiments, we observe the effectiveness of memory-aware virtual machine scheduling in saving memory energy. In addition, we find that power-aware memory management is essential to reducing memory energy consumption.
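
The abstract names the biggest cover set first (BCSF) heuristic but does not define it. The sketch below only conveys the general intuition of memory-aware VM scheduling: prefer the VM whose working set fits on memory ranks that are already powered up, so idle ranks can stay in low-power states. The rank-set model and the greedy rule are assumptions, not the paper's algorithm as specified.

```python
# Intuition only: choose the next VM to run so that the memory ranks it
# touches overlap as much as possible with ranks that are already powered
# up, letting the remaining ranks stay in low-power states. The rank-set
# model and greedy rule are assumptions, not the paper's BCSF algorithm.

def pick_next_vm(candidate_vms, active_ranks):
    """candidate_vms maps each VM name to the set of memory ranks it uses."""
    def extra_ranks(vm):
        # Ranks this VM would force out of low-power mode.
        return len(candidate_vms[vm] - active_ranks)
    return min(candidate_vms, key=extra_ranks)

if __name__ == "__main__":
    vms = {
        "vm1": {0, 1},   # needs ranks 0 and 1
        "vm2": {2, 3},   # needs ranks 2 and 3
        "vm3": {1},      # needs only rank 1
    }
    powered_up = {1}
    print(pick_next_vm(vms, powered_up))   # vm3: fits entirely on the active rank
```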


Journal of Parallel and Distributed Computing | 2011

BTS: Resource capacity estimate for time-targeted science workflows

Eun-Kyu Byun; Yang-Suk Kee; Jin-Soo Kim; Ewa Deelman; Seungryoul Maeng

Workflow technologies have become a major vehicle for the easy and efficient development of scientific applications. A critical challenge in integrating workflow technologies with state-of-the-art resource provisioning technologies is to determine the right amount of resources required for the execution of workflows. This paper introduces an approximation algorithm named BTS (Balanced Time Scheduling), which estimates the minimum number of computing hosts required to execute a workflow within a user-specified finish time. Experimental results based on a number of synthetic workflows and several real science workflows demonstrate that the BTS estimate of resource capacity approaches the theoretical lower bound. The BTS algorithm is scalable, and its turnaround time is only tens of seconds even for huge workflows with thousands of tasks and edges. Moreover, BTS performs well on workflows with MPI-like parallel tasks. Finally, because its resource estimate is abstract, BTS can be easily integrated with any resource description language or resource provisioning system.
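
Since the abstract does not give BTS's internals, the sketch below only shows the shape of the problem it solves: finding a small host count for which a deadline check still passes, here using naive list scheduling of a task DAG as the feasibility test. The task-graph format and helper names are assumptions for illustration.

```python
# Not BTS itself: a naive illustration of searching for a small host count
# that meets a deadline, with greedy list scheduling as the feasibility check.

def finishes_by_deadline(tasks, deps, m, deadline):
    """Greedy list-schedule of a task DAG on m hosts; True if it meets the deadline."""
    hosts = [0.0] * m                                  # time each host becomes free
    ready_time = {t: 0.0 for t in tasks}
    done = {}
    remaining = {t: set(p) for t, p in deps.items()}   # unfinished parents per task
    while len(done) < len(tasks):
        ready = [t for t in tasks if t not in done and not remaining[t]]
        if not ready:
            return False                               # cycle in the graph; infeasible
        t = min(ready, key=lambda u: ready_time[u])
        h = min(range(m), key=lambda i: hosts[i])
        start = max(hosts[h], ready_time[t])
        finish = start + tasks[t]
        hosts[h] = finish
        done[t] = finish
        for child, parents in remaining.items():
            if t in parents:
                parents.discard(t)
                ready_time[child] = max(ready_time[child], finish)
    return max(done.values()) <= deadline

def min_hosts(tasks, deps, deadline, max_hosts=64):
    """Smallest m for which the greedy schedule meets the deadline, else None."""
    for m in range(1, max_hosts + 1):
        if finishes_by_deadline(tasks, deps, m, deadline):
            return m
    return None

if __name__ == "__main__":
    tasks = {"a": 2, "b": 2, "c": 2, "d": 1}                       # task -> runtime
    deps = {"a": set(), "b": set(), "c": set(), "d": {"a", "b", "c"}}
    print(min_hosts(tasks, deps, deadline=3))                      # 3: a, b, c must run in parallel
```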


Architectural Support for Programming Languages and Operating Systems | 2013

Demand-based coordinated scheduling for SMP VMs

Hwanju Kim; Sang-Wook Kim; Jinkyu Jeong; Joonwon Lee; Seungryoul Maeng

As processor architectures have been increasing their computing capacity through higher core counts, independent workloads can be consolidated on a single node for the sake of high resource efficiency in data centers. With the prevalence of virtualization technology, each individual workload can be hosted in a virtual machine for strong isolation between co-located workloads. Along with this trend, hosted applications have increasingly been multithreaded to take advantage of improved hardware parallelism. Although the performance of many multithreaded applications depends heavily on communication (or synchronization) latency, existing virtual machine scheduling schemes do not explicitly coordinate virtual CPUs based on their communication behavior. This paper presents a demand-based coordinated scheduling scheme for consolidated virtual machines that host multithreaded workloads. To this end, we propose communication-driven scheduling that controls time-sharing in response to inter-processor interrupts (IPIs) between virtual CPUs. On the basis of an in-depth analysis of the relationship between IPI communication and coordination demands, we devise IPI-driven coscheduling and delayed preemption schemes, which effectively reduce synchronization latency and unnecessary CPU consumption. In addition, we introduce a load-conscious CPU allocation policy to address load imbalance in heterogeneously consolidated environments. The proposed schemes are evaluated across various mixed-workload scenarios using the PARSEC multithreaded applications. In the evaluation, our scheme improves the overall performance of consolidated workloads, especially communication-intensive applications, by reducing inefficient synchronization latency.
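
The sketch below caricatures the IPI-driven coscheduling idea: when a running virtual CPU sends an IPI to a descheduled sibling of the same VM, the sibling is briefly boosted so both sides of the synchronization run together. The data model and boost policy are invented for illustration and are not the paper's hypervisor modifications.

```python
from dataclasses import dataclass

# Caricature of IPI-driven coscheduling: when a running vCPU sends an IPI to
# a currently descheduled sibling vCPU of the same VM, briefly boost the
# sibling so both sides of the synchronization run together. Data model and
# boost policy are invented; this is not the paper's scheduler modification.

@dataclass
class VCpu:
    name: str
    running: bool = False
    boost: int = 0                  # > 0 means "run me ahead of the normal order"

def on_ipi(sender: VCpu, receiver: VCpu, boost_ticks: int = 2) -> None:
    """Hook a (hypothetical) hypervisor would call when sender IPIs receiver."""
    if not receiver.running:
        receiver.boost = boost_ticks        # coschedule the waiting sibling soon

def pick_next(run_queue):
    """Boosted vCPUs preempt the plain FIFO order of the run queue."""
    boosted = [v for v in run_queue if v.boost > 0]
    chosen = boosted[0] if boosted else run_queue[0]
    chosen.boost = max(0, chosen.boost - 1)
    return chosen

if __name__ == "__main__":
    v0 = VCpu("vm1-vcpu0", running=True)
    v1 = VCpu("vm1-vcpu1")
    v2 = VCpu("vm2-vcpu0")
    run_queue = [v2, v1]              # v1 would normally wait behind v2
    on_ipi(v0, v1)                    # v0 signals v1 (e.g., an unlock or reschedule IPI)
    print(pick_next(run_queue).name)  # vm1-vcpu1 is scheduled ahead of vm2-vcpu0
```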


Microprocessing and Microprogramming | 1990

Parallel simulation of multilayered neural networks on distributed-memory multiprocessors

Hyunsoon Yoon; Jong H. Nang; Seungryoul Maeng

In this paper, we present a parallel simulation of a fully connected multilayered neural network using the backpropagation learning algorithm on a distributed-memory multiprocessor system. In our system, the neurons on each layer are partitioned into p disjoint sets, and each set is mapped onto a processor of a p-processor system. A fully distributed backpropagation algorithm, the necessary communication patterns among the processors, and their time/space complexities are investigated. The p-processor speed-up of the backpropagation algorithm over a single processor is also analyzed theoretically, which can be used as a basis for determining the most cost-effective or optimal number of processors. Experimental results with a network of Transputers are also presented to demonstrate the usefulness of our system.
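
To make the partitioned layout concrete, the sketch below splits one layer's neurons into p disjoint slices, computes each slice's forward pass locally, and concatenates the results (the in-process concatenation stands in for the inter-processor exchange on the Transputer network). The sizes, names, and single-process loop are illustrative assumptions.

```python
import math

# Partitioned layer evaluation, illustrated: each of p workers owns a
# disjoint slice of a layer's neurons, computes its slice of the forward
# pass locally, and the slices are then combined. Sizes and names are
# illustrative only; this is not the paper's Transputer implementation.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_slice(weights_slice, biases_slice, inputs):
    """Compute the outputs of one worker's slice of neurons."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights_slice, biases_slice)]

def partitioned_forward(weights, biases, inputs, p):
    """Split the layer's neurons into p contiguous slices and combine results."""
    n = len(weights)
    chunk = math.ceil(n / p)
    outputs = []
    for worker in range(p):
        lo, hi = worker * chunk, min((worker + 1) * chunk, n)
        outputs.extend(forward_slice(weights[lo:hi], biases[lo:hi], inputs))
    return outputs                  # stands in for an all-gather of partial outputs

if __name__ == "__main__":
    weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8], [0.7, 0.0]]  # 4 neurons, 2 inputs
    biases = [0.0, 0.1, -0.1, 0.2]
    print(partitioned_forward(weights, biases, inputs=[1.0, 2.0], p=2))
```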

Collaboration


Dive into Seungryoul Maeng's collaborations.

Top Co-Authors

Jin-Soo Kim (Sungkyunkwan University)

Joonwon Lee (Sungkyunkwan University)