Heechul Yun
University of Kansas
Publication
Featured research published by Heechul Yun.
euromicro conference on real-time systems | 2012
Heechul Yun; Gang Yao; Rodolfo Pellizzoni; Marco Caccamo; Lui Sha
Interference from accesses to shared resources, particularly memory and the system bus, is a major challenge in designing predictable real-time systems because its worst-case behavior can differ significantly. In this paper, we propose a software-based memory throttling mechanism to explicitly control memory interference. We developed analytic solutions to compute proper throttling parameters that satisfy the schedulability of critical tasks while minimizing the performance impact caused by throttling. We implemented the mechanism in the Linux kernel and evaluated its isolation guarantee and overall performance impact using a set of synthetic and real applications.
real time technology and applications symposium | 2013
Heechul Yun; Gang Yao; Rodolfo Pellizzoni; Marco Caccamo; Lui Sha
Memory bandwidth in modern multi-core platforms is highly variable for many reasons and is a major challenge in designing real-time systems as applications become increasingly memory intensive. In this work, we proposed, designed, and implemented an efficient memory bandwidth reservation system that we call MemGuard. MemGuard divides memory bandwidth into two parts: guaranteed and best effort. It provides bandwidth reservation for the guaranteed bandwidth for temporal isolation, with efficient reclaiming to maximally utilize the reserved bandwidth. It further improves performance by exploiting the best-effort bandwidth after satisfying each core's reserved bandwidth. MemGuard is evaluated with SPEC2006 benchmarks on a real hardware platform, and the results demonstrate that it is able to provide memory performance isolation with minimal impact on overall throughput.
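The reservation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Core` class, the `access` and `new_period` helpers, and the use of "memory accesses per period" as the budget unit are all assumptions made for illustration. The sketch covers only the guaranteed part (a per-core budget that is replenished each period); reclaiming and best-effort sharing are left out.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical per-core state for a budget-based bandwidth reservation."""
    budget: int              # reserved memory accesses per period (assumed unit)
    used: int = 0            # accesses charged so far in the current period
    throttled: bool = False  # True once the budget is exhausted


def access(core: Core, n: int) -> int:
    """Charge up to n accesses against the budget; return how many were granted."""
    if core.throttled:
        return 0
    granted = min(n, core.budget - core.used)
    core.used += granted
    if core.used >= core.budget:
        # Budget exhausted: the core stays throttled until the next period.
        core.throttled = True
    return granted


def new_period(cores) -> None:
    """Replenish every core's budget at the start of a new period."""
    for core in cores:
        core.used = 0
        core.throttled = False
```

In this sketch a core that exhausts its budget receives no further accesses until `new_period` is called, which is the temporal-isolation property the abstract describes for the guaranteed bandwidth.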
real time technology and applications symposium | 2014
Heechul Yun; Renato Mancuso; Zheng Pei Wu; Rodolfo Pellizzoni
DRAM consists of multiple resources called banks that can be accessed in parallel and independently maintain state information. In Commercial Off-The-Shelf (COTS) multicore platforms, banks are typically shared among all cores, even though programs running on the cores do not share memory space. In this situation, memory performance is highly unpredictable due to contention in the shared banks. In this paper, we propose PALLOC, a DRAM bank-aware memory allocator which exploits the page-based virtual memory system to allocate memory pages of each application to specific banks. With PALLOC, we can dynamically partition banks to avoid bank sharing among cores, thereby improving isolation on COTS multicore platforms without requiring any special hardware support. We performed an extensive set of experiments to investigate the performance impact of DRAM bank partitioning on two COTS multicore platforms with a set of synthetic and SPEC2006 benchmarks. Our evaluation results demonstrate that DRAM bank partitioning significantly improves isolation and real-time performance.
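A bank-aware page allocator of the kind described here can be sketched as follows. This is an illustration under stated assumptions, not PALLOC itself: the bank-selecting address bits in `BANK_BITS`, the 4 KiB page size, and the helper names `bank_of` and `alloc_page` are hypothetical. Real platforms map physical address bits to banks in platform-specific ways.

```python
PAGE_SHIFT = 12            # assumed 4 KiB pages
BANK_BITS = (13, 14, 15)   # hypothetical physical address bits that select the bank


def bank_of(phys_addr: int) -> int:
    """Assemble the bank index from the assumed bank-selecting address bits."""
    bank = 0
    for i, bit in enumerate(BANK_BITS):
        bank |= ((phys_addr >> bit) & 1) << i
    return bank


def alloc_page(free_pages, allowed_banks):
    """Remove and return the first free page frame number whose bank is allowed.

    Returns None when no free page falls in the allowed banks.
    """
    for idx, pfn in enumerate(free_pages):
        if bank_of(pfn << PAGE_SHIFT) in allowed_banks:
            return free_pages.pop(idx)
    return None
```

Giving each core a disjoint `allowed_banks` set means that pages allocated for different cores never land in the same bank, which is the partitioning effect the abstract relies on.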
international conference on cyber-physical systems | 2010
Cheolgi Kim; Mu Sun; Sibin Mohan; Heechul Yun; Lui Sha; Tarek F. Abdelzaher
There is a growing need for automated interoperability among medical devices in modern healthcare systems. This requirement is not just for convenience, but to prevent the possibility of errors due to the complexity of interactions between the devices and human operators. Hence, a system supporting such interoperability is expected to provide the means to interconnect distributed medical devices in an open space, and so must be designed to account for network failures. In this paper, we introduce a generic framework, the Network-Aware Supervisory System (NASS), to integrate medical devices into such a clinical interoperability system that uses real networks. It provides a development environment in which medical-device supervisory logic can be developed based on the assumptions of an ideal, robust network. A case study shows that the NASS framework provides the same procedural effectiveness as the original logic based on the ideal network model but with protection against real-world network failures.
euromicro conference on real-time systems | 2015
Heechul Yun; Rodolfo Pellizzoni; Prathap Kumar Valsan
In modern Commercial Off-The-Shelf (COTS) multicore systems, each core can generate many parallel memory requests at a time. The processing of these parallel requests in the DRAM controller greatly affects the memory interference delay experienced by tasks running on the platform. In this paper, we present a new parallelism-aware worst-case memory interference delay analysis for COTS multicore systems. The analysis considers a COTS processor that can generate multiple outstanding requests and a COTS DRAM controller that has separate read and write request buffers, prioritizes reads over writes, and supports out-of-order request processing. Focusing on LLC and DRAM bank partitioned systems, our analysis computes worst-case upper bounds on memory interference delays caused by competing memory requests. We validate our analysis on a Gem5 full-system simulator modeling a realistic COTS multicore platform, with a set of carefully designed synthetic benchmarks as well as SPEC2006 benchmarks. The evaluation results show that our analysis produces safe upper bounds in all tested benchmarks, while the current state-of-the-art analysis significantly underestimates the delays.
real time technology and applications symposium | 2016
Prathap Kumar Valsan; Heechul Yun; Farzad Farshchi
In this paper, we show that cache partitioning does not necessarily ensure predictable cache performance in modern COTS multicore platforms that use non-blocking caches to exploit memory-level parallelism (MLP). Through carefully designed experiments using three real COTS multicore platforms (four distinct CPU architectures) and a cycle-accurate full-system simulator, we show that special hardware registers in non-blocking caches, known as Miss Status Holding Registers (MSHRs), which track the status of outstanding cache misses, can be a significant source of contention; we observe up to a 21X WCET increase in a real COTS multicore platform due to MSHR contention. We propose a hardware and system software (OS) collaborative approach to efficiently eliminate MSHR contention for multicore real-time systems. Our approach includes a low-cost hardware extension that enables dynamic control of per-core MLP by the OS. Using the hardware extension, the OS scheduler then globally controls each core's MLP in such a way that it eliminates MSHR contention and maximizes overall throughput of the system. We implement the hardware extension in a cycle-accurate full-system simulator and the scheduler modification in the Linux 3.14 kernel. We evaluate the effectiveness of our approach using a set of synthetic and macro benchmarks. In a case study, we achieve up to 19% WCET reduction (average: 13%) for a set of EEMBC benchmarks compared to a baseline cache partitioning setup.
Real-time Systems | 2011
Heechul Yun; Po-Liang Wu; Anshu Arya; Cheolgi Kim; Tarek F. Abdelzaher; Lui Sha
Most dynamic voltage and frequency scaling (DVS) techniques adjust only CPU parameters; however, recent embedded systems provide multiple adjustable clocks which can be independently tuned. When considering multiple components, energy optimal frequencies depend on task set characteristics such as the number of CPU and memory access cycles. In this work, we propose a realistic energy model considering multiple components with individually adjustable frequencies such as CPUs, system bus and memory, and related task set characteristics. The model is validated on a real platform and shows less than 2% relative error compared to measured values. Based on the proposed energy model, we present an optimal static frequency assignment scheme for multiple DVS components to schedule a set of periodic real-time tasks. We simulate the energy gain of the proposed scheme compared to other DVS schemes for various task and system configurations, showing up to a 20% energy reduction. We also experimentally verify energy savings of the proposed scheme on a real hardware platform.
euromicro conference on real-time systems | 2015
Renato Mancuso; Rodolfo Pellizzoni; Marco Caccamo; Lui Sha; Heechul Yun
Multi-core platforms represent the answer of the industry to the increasing demand for computational capabilities. From a real-time perspective, however, the inherent sharing of resources, such as memory subsystem and I/O channels, creates inter-core timing interference among critical tasks and applications deployed on different cores. As a result, modular per-core certification cannot be performed, meaning that: (1) current industrial engineering processes cannot be reused, (2) software developed and certified for single-core chips cannot be deployed on multi-core platforms as is. In this work, we propose the Single Core Equivalence (SCE) technology: a framework of OS-level techniques designed for commercial (COTS) architectures that exports a set of equivalent single-core virtual machines from a multi-core platform. This allows per-core schedulability results to be calculated in isolation and to hold when multiple cores of the system run in parallel. Thus, SCE allows each core of a multi-core chip to be considered as a conventional single-core chip, ultimately enabling industry to reuse existing software, schedulability analysis methodologies and engineering processes.
cluster computing and the grid | 2001
Heechul Yun; Sang-Kwon Lee; Joonwon Lee; Seungryoul Maeng
Home-based lazy release consistency (HLRC) shows poor performance on lock-based applications for two reasons: a whole page is fetched on a page fault while the actual modification is much smaller, and a home is at a fixed location while the access pattern is migratory. We present an efficient lock protocol for HLRC. In this protocol, the pages that are expected to be used by the acquirer are selectively updated using diffs. The diff accumulation problem is minimized by limiting the size of diffs to be sent for each page. Our protocol reduces the number of page faults inside critical sections because pages can be updated by applying locally stored diffs. This reduction in turn reduces the average lock waiting time and the amount of message traffic. The experiment with five applications shows that our protocol achieves 2%-40% speedup against base HLRC for four applications.
international conference on cyber physical systems | 2015
Prathap Kumar Valsan; Heechul Yun
Commercial-Off-The-Shelf (COTS) DRAM controllers are optimized for high memory throughput, but they do not provide predictable timing among memory requests from different cores in multicore systems. Therefore, memory requests from a critical real-time task on one core can be substantially delayed by memory requests from non-real-time tasks on the other cores. In this work, we propose a DRAM controller design, called MEDUSA, to provide predictable memory performance in multicore-based real-time systems. MEDUSA can provide high time predictability when needed for real-time tasks but also strives to provide high average performance for non-real-time tasks through a close collaboration between the OS and the DRAM controller. In our approach, the OS partially partitions DRAM banks into two groups: reserved banks and shared banks. The reserved banks are exclusive to each core to provide predictable timing while the shared banks are shared by all cores to efficiently utilize the resources. MEDUSA has two separate queues for read and write requests, and it prioritizes reads over writes. In processing read requests, MEDUSA employs a two-level scheduling algorithm that prioritizes the memory requests to the reserved banks in a round-robin fashion to provide strong timing predictability. In processing write requests, MEDUSA largely relies on FR-FCFS for high throughput but makes an immediate switch to read upon arrival of read requests to the reserved banks. We implemented MEDUSA in a Gem5 full-system simulator and a Linux kernel and performed experiments using a set of synthetic and SPEC2006 benchmarks to analyze the performance impact of MEDUSA on both real-time and non-real-time tasks. The results show that MEDUSA achieves up to 95% better worst-case performance for real-time tasks while achieving up to 31% throughput improvement for non-real-time tasks.
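The two-level read scheduling described in this abstract can be sketched roughly as below. This is a simplified illustration, not the MEDUSA design: the `ReadScheduler` class and its methods are hypothetical, write handling is omitted, and shared-bank requests are served in plain FIFO order as a stand-in for whatever policy the real controller applies to them.

```python
from collections import deque


class ReadScheduler:
    """Hypothetical two-level read scheduler: reserved banks first, then shared."""

    def __init__(self, num_reserved_banks: int):
        self.reserved = [deque() for _ in range(num_reserved_banks)]
        self.shared = deque()
        self.next_bank = 0  # round-robin pointer over the reserved banks

    def enqueue(self, request, reserved_bank=None):
        """Queue a read; reserved_bank=None means it targets a shared bank."""
        if reserved_bank is None:
            self.shared.append(request)
        else:
            self.reserved[reserved_bank].append(request)

    def pick(self):
        """Return the next read to serve, or None if nothing is pending."""
        n = len(self.reserved)
        # Level 1: scan reserved banks round-robin, starting at the pointer.
        for offset in range(n):
            bank = (self.next_bank + offset) % n
            if self.reserved[bank]:
                self.next_bank = (bank + 1) % n
                return self.reserved[bank].popleft()
        # Level 2: serve shared-bank reads only when no reserved read is pending.
        if self.shared:
            return self.shared.popleft()
        return None
```

With two reserved banks holding reads `a0`, `a1` (bank 0) and `b0` (bank 1), plus one shared read `s0`, this sketch serves them in the order `a0`, `b0`, `a1`, `s0`: reserved banks alternate, and the shared read waits until the reserved queues are empty.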