Publication


Featured research published by Young-Kyoon Suh.


international conference on management of data | 2013

DBMS metrology: measuring query time

Sabah Currim; Richard T. Snodgrass; Young-Kyoon Suh; Rui Zhang; Matthew Wong Johnson; Cheng Yi

It is surprisingly hard to obtain accurate and precise measurements of the time spent executing a query. We review relevant process and overall measures obtainable from the Linux kernel and introduce a structural causal model relating these measures. A thorough correlational analysis provides strong support for this model. Using this model, we developed a timing protocol, which (1) performs sanity checks to ensure validity of the data, (2) drops some query executions via clearly motivated predicates, (3) drops some entire queries at a cardinality, again via clearly motivated predicates, (4) for each query that remains, computes a single measured time via a carefully justified formula over the underlying measures of the remaining query executions, and (5) performs post-analysis sanity checks. The resulting query time measurement procedure, termed the Tucson Protocol, applies to proprietary and open-source DBMSes.
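
The five-step structure above is essentially a filtering pipeline over raw timing samples. Below is a minimal Python sketch of such a pipeline; the drop predicates, thresholds, and the minimum-of-survivors formula are illustrative assumptions, not the published protocol's exact rules.

```python
import statistics

def tucson_style_timing(executions):
    """Hypothetical sketch of a Tucson-Protocol-style filtering pipeline.

    `executions` maps a query id to a list of per-execution times in
    seconds. Returns one measured time per surviving query. All
    thresholds and predicates here are illustrative assumptions.
    """
    results = {}
    for query, times in executions.items():
        # (1) sanity check: every measurement must be positive
        if any(t <= 0 for t in times):
            continue
        # (2) drop outlier executions (assumed predicate: > 2x median)
        med = statistics.median(times)
        kept = [t for t in times if t <= 2 * med]
        # (3) drop the whole query if too few executions survive
        if len(kept) < max(3, len(times) // 2):
            continue
        # (4) compute a single measured time (assumed formula: minimum
        # of the surviving executions)
        results[query] = min(kept)
    # (5) post-analysis sanity check on the reported times
    assert all(t > 0 for t in results.values())
    return results

print(tucson_style_timing({"q1": [1.02, 1.01, 1.00, 3.9],
                           "q2": [0.50, 2.00, 2.10]}))
```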


very large data bases | 2014

AZDBLab: a laboratory information system for large-scale empirical DBMS studies

Young-Kyoon Suh; Richard T. Snodgrass; Rui Zhang

In the database field, while very strong mathematical and engineering work has been done, the scientific approach has been much less prominent. The deep understanding of query optimizers obtained through the scientific approach can lead to better engineered designs. Unlike in other domains, there have been few DBMS-dedicated laboratories focusing on such scientific investigation. In this demonstration, we present a novel DBMS-oriented research infrastructure, called Arizona Database Laboratory (AZDBLab), to assist database researchers in conducting large-scale empirical studies across multiple DBMSes. For researchers to test their hypotheses about the behavior of query optimizers, AZDBLab can run and monitor a large-scale experiment with thousands (or millions) of queries on different DBMSes. Furthermore, AZDBLab can help users automatically analyze these queries. In the demo, the audience will interact with AZDBLab through the stand-alone application and the mobile app to conduct such a large-scale experiment for a study. The audience will then run a Tucson Timing Protocol analysis on the finished experiment and see the analysis (data sanity check and timing) results.
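
The kind of experiment AZDBLab manages can be pictured as a driver that runs every query on every DBMS repeatedly and logs each execution. The sketch below uses a fake connection factory and hypothetical DSN strings to show that shape; it is not AZDBLab's actual API.

```python
import time

def connect(dsn):
    """Stand-in for a DB-API connection factory (assumption, not AZDBLab)."""
    class FakeCursor:
        def execute(self, sql):   # pretend the query runs
            time.sleep(0.001)
    class FakeConn:
        def cursor(self):
            return FakeCursor()
    return FakeConn()

def run_experiment(dsns, queries, repetitions=5):
    """Run every query on every DBMS several times; log raw measurements."""
    log = []  # one row per execution: (dbms, query, run, seconds)
    for dbms, dsn in dsns.items():
        cur = connect(dsn).cursor()
        for qid, sql in queries.items():
            for run in range(repetitions):
                start = time.perf_counter()
                cur.execute(sql)
                log.append((dbms, qid, run, time.perf_counter() - start))
    return log

log = run_experiment({"dbms_a": "dsn://a", "dbms_b": "dsn://b"},
                     {"q1": "SELECT 1", "q2": "SELECT 2"})
print(len(log), "executions recorded")
```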


modeling, analysis, and simulation on computer and telecommunication systems | 2012

Extent Mapping Scheme for Flash Memory Devices

Young-Kyoon Suh; Bongki Moon; Alon Efrat; Jin-Soo Kim; Sang-Won Lee

Flash memory devices commonly rely on traditional address mapping schemes such as page mapping, block mapping or a hybrid of the two. Page mapping is more flexible than block mapping or hybrid mapping without being restricted by block boundaries. However, its mapping table tends to grow large quickly as the capacity of flash memory devices does. To overcome this limitation, we propose a novel mapping scheme that is fundamentally different from the existing mapping strategies. We call this new scheme Virtual Extent Trie (VET), as it manages mapping information by treating each I/O request as an extent and by using extents as basic mapping units rather than pages or blocks. By storing extents instead of individual addresses, VET consumes much less memory to store mapping information and still remains as flexible as page mapping. We observed in our experiments that VET reduced memory consumption by up to an order of magnitude in comparison with the traditional mapping schemes for several real world workloads. The VET scheme also scaled well with increasing address spaces in synthetic workloads. With a binary search mechanism, VET limits the mapping time to O(log log |U|), where U denotes the set of all possible logical addresses. Though the asymptotic mapping cost of VET is higher than the O(1) time of a page mapping scheme, the amount of increased overhead was almost negligible or low enough to be hidden by an accompanying I/O operation.
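
To make the extent idea concrete, here is a toy Python sketch that keeps extents in a sorted list and resolves a logical address by binary search. The `bisect`-based lookup is O(log n) rather than the paper's O(log log |U|) trie bound, all names are illustrative, and overlap handling and garbage collection are omitted.

```python
import bisect

class ExtentMap:
    """Toy extent-based address map in the spirit of VET.

    Stores one entry per I/O extent instead of one per page, so the
    table size tracks the number of requests, not device capacity.
    """
    def __init__(self):
        self.starts = []    # logical start addresses, kept sorted
        self.extents = []   # parallel list of (length, physical_start)

    def insert(self, lstart, length, pstart):
        i = bisect.bisect_left(self.starts, lstart)
        self.starts.insert(i, lstart)
        self.extents.insert(i, (length, pstart))

    def lookup(self, laddr):
        # Find the rightmost extent starting at or before laddr.
        i = bisect.bisect_right(self.starts, laddr) - 1
        if i >= 0:
            length, pstart = self.extents[i]
            if laddr < self.starts[i] + length:
                return pstart + (laddr - self.starts[i])
        return None  # unmapped address

m = ExtentMap()
m.insert(100, 8, 5000)   # logical 100..107 -> physical 5000..5007
print(m.lookup(103))     # 5003
print(m.lookup(200))     # None
```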


Software: Practice and Experience | 2017

EMP: execution time measurement protocol for compute‐bound programs

Young-Kyoon Suh; Richard T. Snodgrass; John D. Kececioglu; Peter J. Downey; Robert S. Maier; Cheng Yi

Measuring execution time is one of the most widely used performance evaluation techniques in computer science research. Inaccurate measurements cannot be used for a fair performance comparison between programs. Despite the prevalence of its use, the intrinsic variability in time measurement makes it hard to obtain repeatable and accurate timing results for a program running on an operating system. We propose a novel execution time measurement protocol (termed EMP) for measuring the execution time of a compute-bound program on Linux, while minimizing that measurement's variability. During the development of EMP, we identified several factors that disturb execution time measurement. We introduce successive refinements to the protocol by addressing each of these factors in concert, reducing variability by more than an order of magnitude. We also introduce a new visualization technique, which we term the 'dual-execution scatter plot', that highlights infrequent, long-running daemons, differentiating them from frequent and/or short-running daemons. Our empirical results show that the proposed protocol successfully achieves three major aspects (precision, accuracy, and scalability) in execution time measurement, and that it works for open-source and proprietary software.
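
The core of such a protocol, repeating a measurement while screening out runs disturbed by background activity, can be sketched as follows. The warm-up count, the wall-versus-CPU tolerance, and the use of the minimum are assumed details for illustration, not EMP's exact refinements.

```python
import time

def measure(fn, runs=30, warmup=3):
    """Sketch of an EMP-style repeated measurement (assumed details).

    Wall-clock and CPU time are recorded together: on a compute-bound
    program, a large gap between them suggests daemon or scheduler
    interference, the kind of disturbance EMP's refinements target.
    """
    for _ in range(warmup):          # warm caches and page tables
        fn()
    samples = []
    for _ in range(runs):
        w0, c0 = time.perf_counter(), time.process_time()
        fn()
        w1, c1 = time.perf_counter(), time.process_time()
        samples.append((w1 - w0, c1 - c0))
    # Keep runs where wall time tracks CPU time (assumed 5% tolerance);
    # the rest were likely perturbed by background activity.
    clean = [w for w, c in samples if w <= c * 1.05]
    return min(clean) if clean else min(w for w, _ in samples)

print(measure(lambda: sum(i * i for i in range(200_000))))
```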


Journal of Systems Architecture | 2014

Memory efficient and scalable address mapping for flash storage devices

Young-Kyoon Suh; Bongki Moon; Alon Efrat; Jin-Soo Kim; Sang-Won Lee

Flash memory devices commonly rely upon traditional address mapping schemes such as page mapping, block mapping or a hybrid of the two. Page mapping is more flexible than block or hybrid mapping without being restricted by block boundaries. However, its mapping table tends to grow large quickly as the capacity of flash memory devices does. To overcome this limitation, we propose novel mapping schemes that are fundamentally different from the existing mapping strategies. We call these new schemes Virtual Extent Trie (VET) and Extent Mapping Tree (EMT), as they manage mapping information by treating each I/O request as an extent and by using extents as basic mapping units rather than pages or blocks. By storing extents instead of individual addresses, our extent mapping schemes consume much less memory to store mapping information and still remain as flexible as page mapping. We observed in our experiments that our schemes reduced memory consumption by up to an order of magnitude in comparison with the traditional mapping schemes for several real world workloads. Our extent mapping schemes also scaled well with increasing address spaces in synthetic workloads. Even though the asymptotic mapping cost of VET and EMT is higher than the O(1) time of a page mapping scheme, the amount of increased overhead was almost negligible or low enough to be hidden by an accompanying I/O operation.
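
A back-of-the-envelope calculation shows why extent mapping can save an order of magnitude of memory: a page table grows with device capacity, while an extent table grows with the workload's footprint. All figures below are assumed for illustration only.

```python
# Assumed illustration: page mapping needs one entry per page of the
# device, while extent mapping needs one entry per distinct I/O extent.
device_bytes = 256 * 2**30      # 256 GiB device (assumed)
page_bytes   = 4 * 2**10        # 4 KiB pages
entry_bytes  = 8                # assumed bytes per page-map entry
live_extents = 2_000_000        # assumed extents touched by workload
extent_bytes = 16               # extent entries are larger (assumed)

page_table_mb   = device_bytes // page_bytes * entry_bytes / 2**20
extent_table_mb = live_extents * extent_bytes / 2**20
print(f"page mapping:   {page_table_mb:.0f} MiB")    # 512 MiB
print(f"extent mapping: {extent_table_mb:.0f} MiB")  # 31 MiB
```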


2017 IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W) | 2017

SuperMan: A Novel System for Storing and Retrieving Scientific-Simulation Provenance for Efficient Job Executions on Computing Clusters

Young-Kyoon Suh; Jin Ma

Compute-intensive simulations typically impose substantial workloads on an online simulation platform backed by limited computing clusters and storage resources. Some (or most) of the simulations initiated by users may carry input parameters/files that have already been provided by other (or the same) users in the past. Unfortunately, these duplicate simulations may aggravate the performance of the platform by drastically consuming the limited resources shared by a number of users on the platform. To minimize or avoid conducting repeated simulations, we present a novel system, called SUPERMAN (SimUlation ProvEnance Recycling MANager), that can record simulation provenance and recycle the results of past simulations. This system presents a great opportunity to not only reutilize existing results but also perform various analytics helpful for those who are not familiar with the platform. The system also offers interoperability across other systems by collecting the provenance in a standardized format. In our simulated experiments we found that over half of past computing jobs could be answered by our system without actual executions.
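
Stripped to its core, recycling past simulations is memoization keyed by a content hash of a job's parameters and input files. The sketch below illustrates that idea with a made-up `run_simulation` stand-in; it is not SUPERMAN's actual provenance schema.

```python
import hashlib
import json

provenance_store = {}  # content hash -> recorded result

def job_key(params, input_files):
    """Hash the parameters and input file contents into a stable key."""
    h = hashlib.sha256(json.dumps(params, sort_keys=True).encode())
    for blob in input_files:
        h.update(blob)
    return h.hexdigest()

def submit(params, input_files, run_simulation):
    """Recycle a recorded result if this exact job ran before."""
    key = job_key(params, input_files)
    if key in provenance_store:        # duplicate job: recycle
        return provenance_store[key], True
    result = run_simulation(params)    # cache miss: actually run
    provenance_store[key] = result
    return result, False

fake_sim = lambda p: {"lift": p["aoa"] * 0.1}   # stand-in solver
print(submit({"aoa": 5}, [b"mesh-v1"], fake_sim))  # executed
print(submit({"aoa": 5}, [b"mesh-v1"], fake_sim))  # recycled
```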


international conference on e science | 2007

Application Parameter Description Scheme for Multiple Job Generation in Problem Solving Environment

Byungsang Kim; Dukyun Nam; Young-Kyoon Suh; June Hawk Lee; Kumwon Cho; Soonwook Hwang

In e-science environments, scientists need to execute a scientific application with various parameters many times to simulate and experiment with complicated problems on the grid. For this, they must write a separate job description for every distinct parameter set, which is a troublesome task. To provide flexibility and adaptability for parameter studies, we propose an application parameter description language (APDL) and a service-oriented parameter study scheme, called the parametric study service (PSS), for parameterized simulations on the grid. APDL extends the Job Submission Description Language (JSDL) to generate parameters for multiple jobs. The proposed PSS provides a unified interface for submitting jobs to various middleware platforms such as gLite and Globus. A problem solving environment (PSE) assists parameter studies for applications, and each research field tends to construct its own PSE; the proposed PSS can be easily adapted to a specific PSE because it is implemented as Web services. In practice, we apply APDL and the PSS to an aerospace research PSE, which carries out three-dimensional turbulence analysis of compressible flow.
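
The essence of multiple-job generation is expanding one parameter description into the Cartesian product of concrete jobs. The sketch below shows that expansion with an invented dictionary format; APDL itself extends JSDL's XML syntax.

```python
from itertools import product

# Invented parameter-sweep description (APDL is XML-based; this is not it).
description = {
    "executable": "turbulence_solver",
    "parameters": {
        "mach": [0.8, 1.2, 2.0],
        "mesh": ["coarse", "fine"],
    },
}

def generate_jobs(desc):
    """Expand one description into one job per parameter combination."""
    names = list(desc["parameters"])
    for combo in product(*(desc["parameters"][n] for n in names)):
        yield {"executable": desc["executable"], **dict(zip(names, combo))}

jobs = list(generate_jobs(description))
print(len(jobs), "jobs, e.g.", jobs[0])   # 6 jobs from 3 x 2 values
```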


International Journal of Data Mining and Bioinformatics | 2017

Development of a simulation result management and prediction system using machine learning techniques

Ki-Yong Lee; Young-Kyoon Suh; Kum Won Cho

Simulations are widely used in various fields of computational science and engineering. As IT technology advances, the complexity and accuracy requirements of simulations keep rising, escalating their execution cost as well. Nevertheless, the community has so far paid little attention to reusing previously obtained simulation results to improve the performance of later requested simulations. In this regard, we propose a novel simulation service system that can utilise the results of previously executed simulations and thus improve the performance of later simulations. The proposed system can not only convert completed simulation results into a standard form and store them in a NoSQL database for efficient retrieval, but also predict the result of a requested simulation using machine learning techniques without actually running it. We demonstrate that the proposed system achieved low prediction error rates, ranging from 0.9% to at most 7.4%.
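
The prediction side can be sketched as a regressor over past (parameters, result) pairs. The distance-weighted k-nearest-neighbours model below is an assumed choice for illustration, not necessarily the technique the paper evaluated.

```python
def knn_predict(history, query, k=3):
    """Estimate a new simulation's output as the distance-weighted
    average of its k nearest past results.

    history: list of (parameter_vector, result); query: vector.
    """
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(history, key=lambda h: dist(h[0], query))[:k]
    weights = [1.0 / (dist(p, query) + 1e-9) for p, _ in nearest]
    return sum(w * r for w, (_, r) in zip(weights, nearest)) / sum(weights)

# Assumed past runs: (mach, mesh_level) -> drag coefficient
past = [((0.8, 1.0), 0.35), ((1.2, 1.0), 0.52),
        ((2.0, 2.0), 0.91), ((0.9, 2.0), 0.40)]
print(round(knn_predict(past, (1.0, 1.0)), 3))  # interpolated estimate
```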


Proceedings of the Sixth International Conference on Emerging Databases | 2016

Design and implementation of a data-driven simulation service system

Ki-Yong Lee; YoonJae Shin; YeonJeong Choe; SeonJeong Kim; Young-Kyoon Suh; Jeong Hwan Sa; Kum Won Cho

Computer simulations are widely used in various fields of science and engineering, including computational fluid dynamics. As the demand for the accuracy and quality of simulations grows, the cost of executing simulations is also rapidly increasing. However, the reuse of previously obtained simulation results to improve the execution of later simulations has received little attention so far. In this paper, we design and implement a simulation service system, which executes requested simulations and returns the results to the user. More importantly, our simulation service system is data-driven in the sense that it utilizes previously obtained simulation results to improve the execution of later simulations. Furthermore, our system can predict the result of a requested simulation by applying statistical machine learning techniques to previous simulation results. This allows the user to roughly estimate the result of a requested simulation without actually executing it. Using our system, scientists and engineers can share simulation results and obtain results very quickly without executing simulations redundantly, lowering the overhead of the simulation service system.
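
Putting the pieces together, the service's dispatch logic is: return a stored result on an exact match, offer a prediction when the user accepts a rough estimate, and otherwise execute and cache. The sketch below uses invented helper names for that flow, complementing the predictor sketched above.

```python
# Assumed control flow of a data-driven simulation service.
# `store`, `predict`, and `execute` are illustrative stand-ins.

def serve(request, store, predict, execute, allow_estimate=False):
    key = tuple(sorted(request.items()))
    if key in store:                      # exact reuse: no execution
        return store[key], "reused"
    if allow_estimate:                    # rough estimate on demand
        return predict(request), "predicted"
    result = store[key] = execute(request)  # fall back to execution
    return result, "executed"

store = {(("mach", 0.8), ("mesh", "fine")): 0.35}
print(serve({"mach": 0.8, "mesh": "fine"}, store, None, None))
print(serve({"mach": 1.0, "mesh": "fine"}, store,
            lambda r: 0.42, lambda r: 0.44, allow_estimate=True))
```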


conference on information and knowledge management | 2012

A new tool for multi-level partitioning in Teradata

Young-Kyoon Suh; Ahmad Ghazal; Alain Crolotte; Pekka Kostamaa

This paper introduces a new tool that recommends an optimized partitioning solution, called Multi-Level Partitioned Primary Index (MLPPI), for a fact table based on the queries in the workload. The tool implements a new technique using a greedy algorithm for search space enumeration, where the space is driven by predicates in the queries. This technique fits the Teradata MLPPI scheme very well, as it is based on a general framework using general expressions, ranges, and case expressions for partition definitions. The cost model implemented in the tool is based on the Teradata optimizer and is used to prune the search space when reaching a final solution. The tool resides completely on the client and interfaces with the database through APIs, as opposed to previous work that requires extending the optimizer code. The APIs are used to simplify the workload queries and to capture fact table predicates and costs necessary to make the recommendation. The predicate-driven method implemented by the tool is general, and it can be applied to any clustering or partitioning scheme based on simple field expressions or complex SQL predicates. Experimental results for a particular workload show that the tool's recommendation outperforms a human expert's. The experiments also show that the solution scales with both workload complexity and fact table size.
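
A greedy, cost-model-pruned enumeration of this kind can be sketched generically: starting from no partitioning, repeatedly add the candidate partitioning level that most reduces estimated workload cost, and stop when no candidate helps. The candidate levels and the toy cost model below are invented placeholders, not Teradata's optimizer model.

```python
import math

def greedy_partitioning(candidates, workload_cost, max_levels=15):
    """Greedily pick partitioning levels that lower estimated cost."""
    chosen, best = [], workload_cost(())
    while len(chosen) < max_levels:
        gains = [(workload_cost(tuple(chosen + [c])), c)
                 for c in candidates if c not in chosen]
        if not gains:
            break
        cost, cand = min(gains)
        if cost >= best:          # cost-model pruning: no improvement
            break
        chosen.append(cand)
        best = cost
    return chosen, best

# Toy cost model: each chosen level scans a fixed fraction of the table
# (invented numbers standing in for optimizer cost estimates).
fractions = {"date_range": 0.2, "region_case": 0.5, "status_case": 0.9}

def cost(levels):
    f = math.prod(fractions[l] for l in levels) if levels else 1.0
    return 100.0 * f

print(greedy_partitioning(list(fractions), cost))
```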

Collaboration


Dive into Young-Kyoon Suh's collaborations.

Top Co-Authors

Kum Won Cho
Korea Institute of Science and Technology Information

Rui Zhang
University of Arizona

Bongki Moon
Seoul National University

Byungsang Kim
Korea Institute of Science and Technology Information

Dukyun Nam
Korea Institute of Science and Technology Information