Yeseong Kim
University of California, San Diego
Publications
Featured research published by Yeseong Kim.
international conference on computer aided design | 2015
Yeseong Kim; Mohsen Imani; Shruti Patil; Tajana Simunic Rosing
Mobile devices are severely limited in memory, which affects critical user-experience metrics such as application service time. Emerging non-volatile memory (NVM) technologies such as STT-RAM and PCM are ideal candidates to provide higher memory capacity with negligible energy overhead. However, existing memory management systems overlook mobile users' application usage, which provides crucial cues for improving user experience. In this paper, we propose CAUSE, a novel memory system based on a DRAM-NVM hybrid memory architecture. CAUSE takes explicit account of application usage patterns to distinguish data criticality and identify suitable swap candidates. We also devise an NVM hardware design optimized for the access characteristics of the swapped pages. We evaluate CAUSE on a real Android smartphone and the NVSim simulator using user application usage logs. Our experimental results show that the proposed technique achieves 32% faster launch times for mobile applications while reducing energy cost by 90% and 44% on average over non-optimized STT-RAM and PCM, respectively.
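The swap-candidate selection the abstract describes can be pictured as ranking pages by how likely the user is to relaunch the owning app. The following is a minimal illustrative sketch only; the function name, page layout, and scoring rule are assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of usage-aware swap-candidate selection in the spirit
# of CAUSE: pages owned by apps the user is unlikely to relaunch soon make
# better candidates for swapping out to slower NVM.

def rank_swap_candidates(pages, launch_prob):
    """Order pages for swap-out: pages of apps with the lowest predicted
    relaunch probability come first."""
    return sorted(pages, key=lambda p: launch_prob.get(p["app"], 0.0))

pages = [
    {"app": "maps",  "addr": 0x1000},
    {"app": "email", "addr": 0x2000},
    {"app": "game",  "addr": 0x3000},
]
# Assumed per-app relaunch probabilities from a usage model.
launch_prob = {"maps": 0.7, "email": 0.2, "game": 0.05}

order = rank_swap_candidates(pages, launch_prob)
```

A real system would derive the probabilities from logged launch history rather than a fixed table.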
international symposium on low power electronics and design | 2016
Mohsen Imani; Yeseong Kim; Abbas Rahimi; Tajana Simunic Rosing
The Internet of Things (IoT) dramatically increases the amount of data to be processed for many applications, including multimedia. Unlike traditional computing environments, the workload of IoT varies significantly over time. Thus, efficient runtime profiling is required to extract highly frequent computations and pre-store them for memory-based computing. In this paper, we propose an approximate computing technique using a low-cost adaptive associative memory, named ACAM, which utilizes runtime learning and profiling. To recognize the temporal locality of data in real-world applications, our design exploits a reinforcement learning algorithm with a least recently used (LRU) strategy to select images to be profiled; the profiler is implemented using an approximate concurrent state machine. The profiling results are then stored in ACAM for computation reuse. Since the selected images represent the observed input dataset, we can avoid redundant computations thanks to the high hit rates of the associative memory. We evaluate ACAM on the recent AMD Southern Islands GPU architecture, and the experimental results show that the proposed design achieves 34.7% energy savings for image processing applications with an acceptable quality of service (i.e., PSNR > 30 dB).
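The core reuse idea, an LRU-managed associative memory that returns a cached result for inputs close enough to a profiled one, can be sketched in a few lines. This is a toy software model under assumed names and parameters (capacity, tolerance), not the hardware design from the paper.

```python
from collections import OrderedDict

class ApproxAssociativeMemory:
    """Toy model of computation reuse with an LRU-managed associative
    memory: profiled inputs are cached, and a lookup within a small
    tolerance returns the stored result instead of recomputing."""

    def __init__(self, capacity=4, tolerance=2):
        self.capacity = capacity
        self.tolerance = tolerance
        self.table = OrderedDict()  # input value -> cached result
        self.hits = 0

    def compute(self, x, fn):
        for key in list(self.table):          # approximate match search
            if abs(key - x) <= self.tolerance:
                self.table.move_to_end(key)   # refresh LRU position
                self.hits += 1
                return self.table[key]        # reuse cached result
        result = fn(x)                        # miss: compute and profile
        self.table[x] = result
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)    # evict least recently used
        return result

mem = ApproxAssociativeMemory()
square = lambda x: x * x
results = [mem.compute(v, square) for v in [10, 11, 30, 9, 10]]
```

Here 11 and 9 hit the entry profiled for 10, returning the approximate result 100; the quality-of-service bound in the paper (PSNR > 30 dB) plays the role of the tolerance here.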
ACM Transactions on Embedded Computing Systems | 2014
Wook Song; Yeseong Kim; Hakbong Kim; Jehun Lim; Jihong Kim
As a highly personalized computing device, smartphones present a unique new opportunity for system optimization. For example, it is widely observed that a smartphone user exhibits very regular application usage patterns (although different users differ considerably in their usage patterns). User-specific high-level app usage information, when properly managed, can provide valuable hints for optimizing various system design requirements. In this article, we describe the design and implementation of a personalized optimization framework for the Android platform that takes advantage of users' application usage patterns in optimizing the performance of the Android platform. Our optimization framework consists of two main components, the application usage modeling module and the usage model-based optimization module. We have developed two novel application usage models that correctly capture typical smartphone users' application usage patterns. Based on the application usage models, we have implemented an app-launching experience optimization technique which tries to minimize user-perceived delays, extra energy consumption, and state loss when a user launches apps. Our experimental results on the Nexus S Android reference phones show that our proposed optimization technique can avoid unnecessary application restarts by up to 78.4% over the default LRU-based policy of the Android platform.
2016 5th Non-Volatile Memory Systems and Applications Symposium (NVMSA) | 2016
Mohsen Imani; Abbas Rahimi; Yeseong Kim; Tajana Simunic Rosing
Modern microprocessors have increased the word width to 64 bits to support larger main memory sizes. It has been observed that data can often be represented by relatively few bits, so-called narrow-width values. To leverage narrow-width data, we propose a hybrid cache architecture composed of magnetic RAM (MRAM) and SRAM that saves the upper and lower 32 bits of each word in MRAM and SRAM, respectively. To address the write performance issue of MRAM, we propose an optimal dynamic write buffer (DWB) allocation mechanism. To enhance the efficacy of our hybrid cache in the absence of narrow-width values, we propose a double row write (DRW) technique that adaptively partitions non-narrow data into two 32-bit pieces for consecutive row writes in the SRAM part. DWB and DRW jointly guarantee the performance of the proposed hybrid cache and balance the tradeoff between the buffer size and the number of double row writes. Our evaluation on the SPEC CPU2000, SPEC CPU2006, and MiBench benchmarks shows that our hybrid cache can achieve up to 46% power and 24% area savings at the same performance as a conventional SRAM cache.
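The narrow-width test the design relies on, checking whether a 64-bit word's upper half is merely the sign extension of its lower half, can be sketched as follows. This is a bit-level illustration under the common definition of narrow-width values, not code from the paper.

```python
def is_narrow(value, width=64, narrow=32):
    """A 64-bit word (given as an unsigned integer) is 'narrow' if its
    upper 32 bits are just the sign extension of the lower 32 bits,
    i.e. the value fits in a signed 32-bit integer."""
    lo = value & ((1 << narrow) - 1)
    # Sign-extend the lower half, and interpret the full word as signed.
    sign_extended = lo - (1 << narrow) if lo & (1 << (narrow - 1)) else lo
    full = value - (1 << width) if value & (1 << (width - 1)) else value
    return sign_extended == full

# 42 and -1 (as a 64-bit pattern) are narrow; 2**40 is not.
small, minus_one, big = 42, (1 << 64) - 1, 1 << 40
```

For narrow words, only the lower half needs the fast SRAM array; the MRAM half stores a value reproducible by sign extension.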
design, automation, and test in europe | 2017
Mohsen Imani; Daniel Peroni; Yeseong Kim; Abbas Rahimi; Tajana Simunic Rosing
Recently, neural networks have been demonstrated to be effective models for image processing, video segmentation, speech recognition, computer vision, and gaming. However, high-energy computation and low performance are the primary bottlenecks of running neural networks. In this paper, we propose an energy- and performance-efficient network acceleration technique on the General Purpose GPU (GPGPU) architecture which utilizes specialized resistive nearest content addressable memory blocks, called NNCAM, by exploiting the computation locality of the learning algorithms. NNCAM stores highly frequent patterns corresponding to neural network operations and searches for the most similar patterns to reuse the computation results. To improve NNCAM computation efficiency and accuracy, we propose layer-based associative update and selective approximation techniques. The layer-based update improves the data locality of NNCAM blocks by filling NNCAM values based on the frequent computation patterns of each neural network layer. To guarantee an appropriate level of computation accuracy while providing maximum energy savings, our design adaptively allocates the neural network operations to either NNCAM or GPGPU floating point units (FPUs). The selective approximation relaxes computation on neural network layers by considering the impact on accuracy. In our evaluation, we integrate NNCAM blocks with the modern AMD Southern Islands GPU architecture. Our experimental evaluation shows that the enhanced GPGPU achieves 68% energy savings and a 40% speedup on four popular convolutional neural networks (CNNs), while ensuring an acceptable quality loss of less than 2%.
ieee annual computing and communication workshop and conference | 2017
Wanlin Cui; Yeseong Kim; Tajana Simunic Rosing
With the emergence of the Internet of Things (IoT) and the Big Data era, many applications are expected to assimilate a large amount of data collected from the environment to extract useful information. However, how the heterogeneous computing devices of IoT ecosystems can execute these data processing procedures has not been clearly explored. In this paper, we propose a framework which characterizes the energy and performance requirements of data processing applications across heterogeneous devices, from a server in the cloud to a resource-constrained gateway at the edge. We focus on diverse machine learning algorithms, which are key procedures for handling the large amount of IoT data. We build analytic models which automatically identify the relationship between requirements and data in a statistical way. The proposed framework also considers network communication cost and increasing processing demand. We evaluate the proposed framework on two heterogeneous devices, a Raspberry Pi and a commercial Intel server. We show that the identified models can accurately estimate performance and energy requirements with less than 4.8% error on both platforms. Based on the models, we also evaluate whether the resource-constrained gateway can process the data more efficiently than the server in the cloud. The results show that the less-powerful device achieves better energy and performance efficiency for more than 50% of the machine learning algorithms.
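Fitting a per-device analytic model that relates input data size to a requirement such as runtime can be sketched with ordinary least squares. The linear model form and the sample numbers below are assumptions for illustration only, not measurements from the paper.

```python
# Minimal sketch of a statistical per-device model in the spirit of the
# framework: fit runtime as a function of input size, then extrapolate
# to predict the cost of a larger workload on that device.

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y ~ a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Hypothetical measurements: (input size in MB, runtime in seconds).
sizes = [10, 20, 40, 80]
runtimes = [1.2, 2.1, 4.0, 7.9]
a, b = fit_linear(sizes, runtimes)
predicted = a * 160 + b   # extrapolated runtime for a 160 MB input
```

With one such model per device (plus a network-cost term), comparing the gateway's predicted cost against the server's tells which side should run the algorithm.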
IEEE Transactions on Mobile Computing | 2016
Yeseong Kim; Boyeong Jeon; Jihong Kim
Radio energy consumption accounts for a large portion of the total energy consumption of smartphones. However, a significant portion of radio energy is wasted in a special waiting interval, known as the tail time, which occurs after a transmission completes while waiting for a subsequent transmission. In order to reduce the energy wasted in the tail time, the fast dormancy feature allows a quick release of a radio connection during the tail time. To support fast dormancy efficiently, it is important to accurately predict whether a subsequent transmission will occur in the tail time. In this paper, we show that there are strong personal characteristics in how a user interacts with a radio network within the tail time. Based on these observations, we propose a novel personalized network activity-aware predictive dormancy technique, called Personalized Diapause (pD). By automatically identifying user-specific tail-time transmission characteristics for various network activities, our proposed technique takes advantage of personalized high-level network usage patterns in deciding when to release radio connections. Our experimental results using real network usage logs from 25 users show that pD can reduce the amount of wasted tail-time energy by 51 percent on average, thus reducing the total radio energy consumption by 23 percent with less than a 10 percent increase in reconnections.
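The per-activity decision the abstract describes, release the radio early when a follow-up transmission within the tail time is unlikely for this user and activity, can be sketched as a simple frequency-based predictor. The class name, threshold, and counts below are hypothetical illustrations, not the pD algorithm itself.

```python
# Illustrative sketch of a personalized tail-time decision: per-activity
# statistics of whether a follow-up transmission occurred within the tail
# time drive the choice to release the radio connection early.

class TailTimePredictor:
    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.stats = {}   # activity -> [follow-up count, observation count]

    def observe(self, activity, followed_up):
        """Record one completed transmission and whether another one
        arrived within its tail time."""
        fu, n = self.stats.get(activity, [0, 0])
        self.stats[activity] = [fu + int(followed_up), n + 1]

    def release_early(self, activity):
        """Release the connection early if, for this user and activity,
        follow-up transmissions are historically rare."""
        fu, n = self.stats.get(activity, [0, 0])
        if n == 0:
            return False   # no history: keep the default tail time
        return fu / n < self.threshold

pred = TailTimePredictor()
for _ in range(8):
    pred.observe("push_sync", False)   # this user's syncs rarely follow up
for _ in range(2):
    pred.observe("push_sync", True)
```

The threshold trades wasted tail-time energy against the reconnection overhead the paper reports (under 10 percent).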
mobile computing, applications, and services | 2015
Shruti Patil; Yeseong Kim; Kunal Korgaonkar; Ibrahim Awwal; Tajana Simunic Rosing
Mobile systems leverage heterogeneous cores to deliver a desired user experience. However, how these cores cooperate in executing interactive mobile applications in the hands of real users is unclear, preventing more realistic studies of mobile platforms. In this paper, we study how 33 users run applications on modern smartphones over a period of a month. We analyze the usage of CPUs, GPUs, and associated memory operations in real user interactions, and develop an automated methodology that produces microbenchmarks describing realistic and replayable test runs that statistically mimic user variations. Based on the generated test runs, we further empirically characterize the memory bandwidth and power consumption of CPUs and GPUs to show the impact of user variations on the system, and identify user variation-aware optimization opportunities in actual mobile application use.
international conference on computer aided design | 2015
Yeseong Kim; Francesco Paterna; Sameer Tilak; Tajana Simunic Rosing
international conference on computer aided design | 2017
Yeseong Kim; Mohsen Imani; Tajana Simunic Rosing