Sung-Soo Lim
Kookmin University
Publication
Featured research published by Sung-Soo Lim.
international conference on hardware/software codesign and system synthesis | 2011
Luis Angel D. Bathen; Nikil D. Dutt; Dongyoun Shin; Sung-Soo Lim
Emerging multicore platforms are increasingly deploying distributed scratchpad memories (SPMs) to achieve lower energy and area together with higher predictability, but this requires transparent and efficient software management of these critical resources. In this paper, we introduce SPMVisor, a hardware/software layer that virtualizes the scratchpad memory space in order to facilitate the use of distributed SPMs in an efficient, transparent and secure manner. We introduce the notion of virtual scratchpad memories (vSPMs), which can be dynamically created and managed as regular SPMs. To protect the on-chip memory space, SPMVisor supports vSPM-level and block-level access control lists. In order to efficiently manage the on-chip real estate, SPMVisor supports policy-driven allocation strategies based on privilege levels. Our experimental results on Mediabench/CHStone benchmarks running on various chip-multiprocessor configurations and software stacks (RTOS, virtualization, secure execution) show that SPMVisor enhances performance by 71% on average and reduces power consumption by 79% on average.
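The privilege-driven allocation idea can be illustrated with a minimal sketch: when on-chip blocks run out, a request from a higher-privilege vSPM may evict a block owned by a lower-privilege vSPM, which then spills to off-chip memory. The class names, the single-block granularity, and the eviction rule below are illustrative assumptions, not the paper's implementation.

```python
class SPMAllocator:
    """Toy policy-driven SPM block allocator keyed on privilege level."""

    def __init__(self, num_blocks):
        self.free = num_blocks
        self.owners = []  # one (privilege, vspm_id) entry per used block

    def allocate(self, vspm_id, privilege):
        """Grant one SPM block, evicting a lower-privilege block if full."""
        if self.free == 0:
            victim = min(self.owners)   # lowest-privilege occupied block
            if victim[0] >= privilege:
                return False            # nothing less privileged to evict
            self.owners.remove(victim)  # evicted block spills off-chip
            self.free += 1
        self.owners.append((privilege, vspm_id))
        self.free -= 1
        return True

alloc = SPMAllocator(num_blocks=2)
alloc.allocate("rtos", privilege=1)
alloc.allocate("app", privilege=0)
print(alloc.allocate("secure", privilege=2))  # True: evicts the app's block
print(alloc.allocate("app2", privilege=0))    # False: cannot evict anyone
```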
embedded systems for real time multimedia | 2014
Jurn-Gyu Park; Chen-Ying Hsieh; Nikil D. Dutt; Sung-Soo Lim
Contemporary mobile platforms use mobile GPUs for graphics-intensive applications, and deploy proprietary Dynamic Voltage Frequency Scaling (DVFS) policies in an attempt to save energy without sacrificing quality. However, there have been no previous systematic studies correlating the performance, power, and energy efficiency of mobile GPUs across diverse graphics workloads to enable more efficient mobile platform DVFS policies for energy savings. For the first time, we present a study of mobile GPU graphics workload characterization for DVFS design considering user experience and energy efficiency on a real smartphone. We develop micro-benchmarks that stress specific stages of the graphics pipeline separately, and study the relationship between varying graphics workloads and the resulting energy and performance of different mobile graphics pipeline stages. We use these results to outline opportunities for more efficient, integrated DVFS policies across the mobile GPU, memory, and CPU hardware components for saving energy without sacrificing user experience. Our experimental results on the Nexus 4 smartphone show that it is important to characterize GPU hardware and graphics workloads accurately in order to achieve increased energy efficiency without degradation in graphics performance for better user experience. We believe that our observations and results will enable more energy-efficient DVFS algorithms for mobile graphics rendering in the face of rapidly changing mobile GPU architectures.
embedded software | 2007
Changhee Jung; Duk-Kyun Woo; Kanghee Kim; Sung-Soo Lim
Application launching times in embedded systems are more crucial than in general-purpose systems since the response times of embedded applications are significantly affected by the launching times. As general-purpose operating systems are increasingly used in embedded systems, reducing application launching times is one of the most influential factors for performance improvement. In order to reduce application launching times, three factors should be considered at the same time: relocation time, symbol resolution time, and binary loading time. In this paper, we propose a new application execution model using a combination of prelinking and preloading to reduce the relocation, symbol resolution, and binary load overheads simultaneously. Such an application execution model is realized using a fork-and-dlopen execution model instead of the traditional fork-and-exec model. We evaluate the performance effect of the proposed fork-and-dlopen application execution model on a Linux-based embedded system using an XScale processor. By applying the proposed application execution model using both prelinking and preloading, application launching times are reduced by up to 71% and relocation counts are reduced by up to 91% in the benchmark programs we used.
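The paper's mechanism lives at the C level: after fork(), the child dlopen()s the prelinked, preloaded application image instead of exec()ing a fresh binary, skipping a new load and relocation pass. As a minimal cross-language illustration of the dlopen half, Python's ctypes exposes the same loader calls on Linux: CDLL(None) corresponds to dlopen(NULL), a handle to the already-mapped process image. Using libc's getpid as a stand-in for an application entry point is an assumption for this sketch.

```python
import ctypes
import os

# CDLL(None) == dlopen(NULL): a handle to the process image that is
# already mapped, so no new binary load or relocation pass occurs --
# analogous to launching from a preloaded, prelinked image.
self_image = ctypes.CDLL(None)

# Resolve an entry point through the dynamic symbol table (dlsym).
# libc's getpid stands in for a hypothetical app_main entry symbol.
entry = self_image.getpid
entry.restype = ctypes.c_int

# The resolved symbol runs inside the current process, just as a
# dlopen'd application entry would after fork().
print(entry() == os.getpid())  # True
```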
acm symposium on applied computing | 2016
Jurn-Gyu Park; Chen-Ying Hsieh; Nikil D. Dutt; Sung-Soo Lim
Mobile platforms are increasingly using Heterogeneous Multi-Processor Systems-on-Chip (HMPSoCs) with differentiated processing cores and GPUs to achieve high performance for graphics-intensive applications such as mobile games. Traditionally, separate CPU and GPU governors are deployed in order to achieve energy efficiency through Dynamic Voltage Frequency Scaling (DVFS), but they miss opportunities for further energy savings through coordinated system-level application of DVFS. We present Co-Cap, a cooperative CPU-GPU DVFS strategy that orchestrates energy-efficient CPU and GPU DVFS through coordinated CPU and GPU frequency capping to avoid frequency over-provisioning while maintaining desired performance. Unlike traditional approaches that target a narrow set of mobile games, our Co-Cap approach is applicable across a wide range of mobile games. Our methodology deploys a training phase followed by a deployment phase, allowing deployment not only across a wide range of mobile games with varying graphics workloads, but also across new mobile architectural platforms. Our experimental results across a large set of over 70 mobile games show that Co-Cap improves energy per frame by 10.6% and 10.0% (23.1% and 19.1% in CPU-dominant applications) on average and limits frames-per-second (FPS) loss to 0.5% and 0.7% (1.3% and 1.7% in CPU-dominant applications) on average in the training and deployment sets, respectively, compared to the default CPU and GPU governors, with negligible overhead in execution time and power consumption on the ODROID-XU3 platform.
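The capping idea can be sketched in a few lines: classify a window of frames as CPU- or GPU-dominant, then clamp whatever frequency each governor requests to a per-class cap, throttling the non-dominant device harder. The frequency tables, caps, and classifier below are invented for illustration and are not Co-Cap's tuned values.

```python
CPU_FREQS = [600, 800, 1100, 1400, 1700, 2000]  # MHz, assumed table
GPU_FREQS = [177, 266, 350, 480, 543]           # MHz, assumed table

def classify(cpu_util, gpu_util):
    """Crude workload classifier over one frame window."""
    return "cpu_dominant" if cpu_util > gpu_util else "gpu_dominant"

# Per-class caps: limit the non-dominant device to avoid frequency
# over-provisioning without hurting frames per second.
CAPS = {
    "cpu_dominant": {"cpu": 2000, "gpu": 350},
    "gpu_dominant": {"cpu": 1100, "gpu": 543},
}

def capped(requested, table, cap):
    """Highest available frequency <= min(requested, cap)."""
    return max(f for f in table if f <= min(requested, cap))

cls = classify(cpu_util=0.8, gpu_util=0.4)
print(capped(1700, CPU_FREQS, CAPS[cls]["cpu"]))  # 1700: under the cap
print(capped(543, GPU_FREQS, CAPS[cls]["gpu"]))   # 350: GPU capped down
```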
embedded systems for real time multimedia | 2015
Chen-Ying Hsieh; Jurn-Gyu Park; Nikil D. Dutt; Sung-Soo Lim
Modern mobile heterogeneous platforms have GPUs integrated with multicore processors to enable execution of high-end graphics-intensive games. However, these gaming applications consume significant power due to heavy utilization of CPU-GPU resources, which drains battery resources that are critical for mobile devices. While Dynamic Voltage and Frequency Scaling (DVFS) techniques have been exploited previously for dynamic power management, contemporary techniques do not fully exploit the memory access footprint of graphics-intensive gaming applications, missing opportunities for energy efficiency. In this paper, we propose, for the first time, a memory-aware cooperative CPU-GPU DVFS governor that considers both the memory access footprint and the CPU/GPU frequency to improve the energy efficiency of high-end mobile game workloads. Our experimental results show that our proposed game governor achieves on average 13% and 5% improvement in energy efficiency, with minor degradation in performance, compared to default governors and state-of-the-art game governors, respectively.
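A minimal sketch of the memory-aware decision rule, with invented thresholds and step logic: when the memory access footprint dominates, the workload is memory-bound, so stepping frequency down saves energy with little frame-rate impact; when frames are missed and the workload is compute-bound, step up.

```python
def next_freq_index(freq_idx, mem_stall_ratio, fps, target_fps,
                    memory_bound_threshold=0.4, num_levels=6):
    """Pick the next frequency level for one governor epoch.

    mem_stall_ratio approximates the memory access footprint: the
    fraction of cycles spent waiting on memory. Threshold and level
    count are assumptions for this sketch.
    """
    if mem_stall_ratio > memory_bound_threshold and freq_idx > 0:
        return freq_idx - 1  # memory-bound: higher clocks mostly wasted
    if fps < target_fps and freq_idx < num_levels - 1:
        return freq_idx + 1  # compute-bound and missing frames: scale up
    return freq_idx          # hold the current level

print(next_freq_index(5, mem_stall_ratio=0.6, fps=60, target_fps=60))  # 4
print(next_freq_index(3, mem_stall_ratio=0.1, fps=45, target_fps=60))  # 4
```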
Proceedings of the Workshop on Embedded Systems Security | 2011
Daeyoung Hong; Luis Angel D. Bathen; Sung-Soo Lim; Nikil D. Dutt
Today's embedded systems are often used to access, store, manipulate, and communicate sensitive data. Embedded system security risks are exacerbated by emerging trends (e.g., network connectivity, application download services, migration to multiprocessors). To preserve data confidentiality, various memory encryption schemes have been proposed; however, the overhead of the encryption and decryption operations that precede memory accesses is very high and can lead to significant performance degradation, particularly for embedded systems. In this paper, we propose DynaPoMP, a novel dynamic policy-driven scratchpad memory allocation methodology that ensures data confidentiality while minimizing the memory access latency overhead. We define three allocation policies to ensure confidentiality of sensitive data. The first policy, called SensitivityFirst, retains sensitive data in trusted on-chip SPM as long as possible, thereby minimizing the number of encryption/decryption operations due to off-chip memory accesses. The second policy, called AccessFirst, protects data mapped to off-chip memory via selective encryption/decryption, while mapping the data sets with the highest utilization to on-chip memory space and reducing the number of off-chip memory accesses. Finally, the third policy, referred to as Hybrid, trades off the space given to sensitive and non-sensitive data, with the goal of reducing the execution time of the given application. Our results on a set of security-enhanced embedded benchmarks from Mediabench II show that DynaPoMP reduces the total latency by up to 42.82% when compared to conventional dynamic scratchpad allocation schemes that do not consider encryption latency.
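The SensitivityFirst policy can be sketched directly: sensitive blocks are placed in trusted on-chip SPM first, where they need no per-access crypto; whatever spills to off-chip memory pays DRAM latency, plus an encryption/decryption cost if it is sensitive. The latency constants and data layout below are illustrative assumptions, not the paper's model parameters.

```python
SPM_LAT, DRAM_LAT, CRYPTO_LAT = 1, 10, 50  # cycles per access, assumed

def sensitivity_first(blocks, spm_blocks):
    """blocks: list of (name, sensitive, accesses); returns total latency.

    Sensitive blocks are sorted to the front so they win the on-chip
    SPM slots; spilled sensitive blocks pay DRAM + crypto per access.
    """
    placed = sorted(blocks, key=lambda b: not b[1])  # sensitive first
    total = 0
    for i, (name, sensitive, accesses) in enumerate(placed):
        if i < spm_blocks:
            total += accesses * SPM_LAT                # on-chip, no crypto
        else:
            lat = DRAM_LAT + (CRYPTO_LAT if sensitive else 0)
            total += accesses * lat                    # off-chip (+crypto)
    return total

blocks = [("key", True, 100), ("frame", False, 1000), ("iv", True, 50)]
print(sensitivity_first(blocks, spm_blocks=2))  # 100*1 + 50*1 + 1000*10
```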
international symposium on low power electronics and design | 2016
Jurn-Gyu Park; Nikil D. Dutt; Hoyeonjiki Kim; Sung-Soo Lim
Contemporary mobile platforms use software governors to achieve high performance with energy efficiency for heterogeneous CPU-GPU based architectures that execute mobile games and other graphics-intensive applications. Mobile games typically exhibit inherent behavioral dynamism, which existing governor policies are unable to exploit effectively to manage CPU/GPU DVFS policies. To overcome this problem, we present HiCAP: a Hierarchical Finite State Machine (HFSM) based CPU-GPU governor that models the dynamic behavior of mobile gaming workloads and applies a cooperative, dynamic CPU-GPU frequency-capping policy to yield energy efficiency by adapting to the games' inherent dynamism. Our experiments on a large set of 37 mobile games exhibiting dynamic behavior show that our HiCAP dynamic governor policy achieved substantial energy efficiency gains of up to 18% improvement in energy-per-frame over existing governor policies, with minimal degradation in quality.
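A hierarchical FSM of this flavor can be sketched in two levels: a top-level state tracks the game phase, and each phase owns sub-states with their own CPU/GPU frequency caps. The phases, transition rule, and cap values below are invented for this sketch and are not HiCAP's actual state machine.

```python
# Two-level cap table: phase -> sub-state -> frequency caps (MHz, assumed).
HFSM = {
    "loading":  {"default": {"cpu_cap": 1100, "gpu_cap": 266}},
    "gameplay": {
        "steady": {"cpu_cap": 1400, "gpu_cap": 480},
        "burst":  {"cpu_cap": 2000, "gpu_cap": 543},
    },
}

def step(phase, frame_time_ms, target_ms=16.7):
    """Pick the sub-state (and its caps) for the current phase.

    Within gameplay, a missed frame budget transitions to the
    high-cap "burst" sub-state; otherwise "steady" caps apply.
    """
    if phase == "loading":
        sub = "default"
    else:
        sub = "burst" if frame_time_ms > target_ms else "steady"
    return sub, HFSM[phase][sub]

sub, caps = step("gameplay", frame_time_ms=22.0)
print(sub, caps["cpu_cap"])  # burst 2000
```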
The Kips Transactions:partc | 2005
Ho-Young Hwang; Sung-Soo Lim
This paper studies network recovery methods for WDM optical mesh networks, concentrating on improving spare resource utilization. Resource efficiency can be obtained by sharing the spare resources needed for network recovery. To improve the sharability of spare resources in WDM networks, methods to share backup paths as well as spare capacity should be studied. The method proposed in this paper uses multiple ring-covers; it provides fast and simple recovery operation by exploiting the characteristics of the logical ring topology, and also provides efficient resource utilization by using multiple distributed backup paths to improve the sharability of the overall spare resources in the network. This method can provide layered reliability for network services by enabling hierarchical robustness against multiple failures. The performance results show that the proposed method provides improved resource efficiency for single failures and enhanced robustness for multiple failures.
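The appeal of ring-based recovery can be shown with a toy sketch: on a logical ring, the backup path for a failed link is simply the rest of the ring traversed the other way, which makes recovery fast and lets the ring's spare capacity be shared across all of its links. The node names and cycle representation are illustrative assumptions.

```python
def ring_backup_path(ring, failed_link):
    """ring: node list forming a cycle; returns the backup path u -> v
    that detours around the failed ring edge (u, v) the long way."""
    u, v = failed_link
    i, j = ring.index(u), ring.index(v)
    n = len(ring)
    assert (j - i) % n == 1, "failed_link must be a ring edge u -> v"
    # Walk backwards around the ring from u until v is reached.
    return [ring[(i - k) % n] for k in range(n)]

print(ring_backup_path(["A", "B", "C", "D"], ("A", "B")))
# ['A', 'D', 'C', 'B']
```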
embedded systems for real time multimedia | 2011
Namseung Lee; Sung-Soo Lim
As products based on the Android platform have spread widely in the consumer electronics market, the need for systematic performance analysis has significantly increased. Conventional approaches rely on publicly available performance analysis tools from the Android SDK or the Linux community, such as DDMS (Dalvik Debug Monitor Server), LTTng, OProfile, and Ftrace. Though these approaches provide analysis or measurement results for certain aspects and specific software layers, none of them gives a whole-software-layer view of performance. For example, once a method in an Android application turns out to be a performance bottleneck, it is very hard to locate the code fragments that actually caused the bottleneck across the whole software stack: the application code does not reveal the direct cause of the bottleneck, while the underlying native layers, including kernel events, are often responsible.
international conference software and computer applications | 2017
Joohyun Kyong; Jinwoo Jeon; Sung-Soo Lim
We propose an Apache Spark-based scale-up server architecture that uses a Docker container-based partitioning method to improve performance scalability. The performance scalability problem of Apache Spark-based scale-up servers is due to garbage collection (GC) and remote memory access overheads when the servers are equipped with a significant number of cores and Non-Uniform Memory Access (NUMA). The proposed method minimizes these problems by using a Docker container-based architecture that effectively partitions the original scale-up server into small logical servers. Our evaluation study based on benchmark programs revealed that the partitioning method yields performance improvements ranging from 1.1x to 1.7x on a 120-core scale-up system. Our proof-of-concept scale-up server architecture provides the basis for a complete and practical design of partitioning-based scale-up servers with performance scalability.
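The partitioning idea can be sketched as pinning each container to one NUMA node's cores and memory, so each Spark executor's GC and memory traffic stay node-local. The flag layout follows Docker's real --cpuset-cpus/--cpuset-mems options; the core counts, even split, and image name are assumptions for this sketch.

```python
def numa_partitions(total_cores, numa_nodes):
    """Yield (cpuset_cpus, cpuset_mems) strings, one logical server
    per NUMA node, assuming cores are numbered contiguously by node."""
    per_node = total_cores // numa_nodes
    for node in range(numa_nodes):
        lo = node * per_node
        hi = lo + per_node - 1
        yield (f"{lo}-{hi}", str(node))

# Emit one container invocation per logical server; "spark-worker" is a
# hypothetical image name standing in for a real Spark executor image.
for cpus, mems in numa_partitions(total_cores=120, numa_nodes=4):
    print(f"docker run --cpuset-cpus={cpus} --cpuset-mems={mems} spark-worker")
```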