Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Junghee Lee is active.

Publication


Featured research published by Junghee Lee.


International Symposium on Performance Analysis of Systems and Software | 2011

A semi-preemptive garbage collector for solid state drives

Junghee Lee; Young-Jae Kim; Galen M. Shipman; H Sarp Oral; Feiyi Wang; Jongman Kim

NAND flash memory is a preferred storage medium for various platforms ranging from embedded systems to enterprise-scale systems. Flash devices have no mechanical moving parts and provide low-latency access. They also require less power compared to rotating media. Unlike hard disks, flash devices use out-of-place update operations and require a garbage collection (GC) process to reclaim invalid pages and create free blocks. This GC process is a major cause of performance degradation when running concurrently with other I/O operations, as internal bandwidth is consumed to reclaim these invalid pages. The invocation of the GC process is generally governed by a low watermark on free blocks and other internal device metrics that different workloads meet at different intervals. This results in I/O performance that is highly dependent on workload characteristics. In this paper, we examine the GC process and propose a semi-preemptive GC scheme that can preempt on-going GC processing and service pending I/O requests in the queue. Moreover, we further enhance flash performance by pipelining internal GC operations and merging them with pending I/O requests whenever possible. Our experimental evaluation of this semi-preemptive GC scheme with realistic workloads demonstrates both improved performance and reduced performance variability. Write-dominant workloads show up to a 66.56% improvement in average response time with an 83.30% reduced variance in response time compared to the non-preemptive GC scheme.
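The core of the semi-preemption idea can be illustrated with a toy Python sketch (all names here are hypothetical; the actual scheme runs inside SSD firmware and additionally pipelines and merges operations): GC copies valid pages one at a time and checks the I/O queue at each page-copy boundary.

```python
from collections import deque

class SemiPreemptiveGC:
    """Toy model: GC reclaims a block page by page, but yields to
    pending I/O requests at page-copy boundaries (preemption points)."""

    def __init__(self):
        self.io_queue = deque()
        self.serviced = []   # order in which operations complete

    def submit_io(self, req):
        self.io_queue.append(req)

    def collect(self, valid_pages):
        for page in valid_pages:
            # Preemption point: service all queued I/O before the next copy,
            # so requests are not stalled behind the whole GC cycle.
            while self.io_queue:
                self.serviced.append(("io", self.io_queue.popleft()))
            self.serviced.append(("gc_copy", page))
        self.serviced.append(("gc_erase", "block"))

gc = SemiPreemptiveGC()
gc.submit_io("read A")
gc.collect(valid_pages=["p0", "p1"])
```

In a non-preemptive scheme, "read A" would wait until after the erase; here it is serviced before the first page copy.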


Asia and South Pacific Design Automation Conference | 2003

Memory access pattern analysis and stream cache design for multimedia applications

Junghee Lee; Chanik Park; Soonhoi Ha

The memory system is a major performance and power bottleneck in embedded systems, especially for multimedia applications. Most multimedia applications access stream-type data structures with regular access patterns. Conventional caches are observed to behave poorly for stream-type data structures. Therefore, prediction-based prefetching techniques have been extensively researched to exploit these regular access patterns. Prefetching, however, may pollute the cache if the prediction is not accurate, and it requires extra hardware prediction logic. To overcome these problems, we propose a novel hardware prefetching technique assisted by static analysis of data access patterns with stream caches. With the proposed stream cache architecture, we achieve significant performance improvement compared with a conventional cache architecture.
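A minimal sketch of the idea, assuming a single stream whose stride is known from static analysis (names and structure are illustrative, not the paper's design): because the stride is given rather than predicted at runtime, the prefetch needs no prediction logic and cannot pollute the cache with wrong guesses.

```python
class StreamCache:
    """Toy stream cache: addresses belonging to a statically identified
    stream are prefetched one element ahead using the known stride."""

    def __init__(self, stride):
        self.stride = stride
        self.lines = set()
        self.hits = self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self.lines.add(addr)
        # Stride comes from compile-time analysis, so this prefetch
        # is always on the stream and never pollutes the cache.
        self.lines.add(addr + self.stride)

sc = StreamCache(stride=4)
for i in range(8):                # sequential stream: 0x1000, 0x1004, ...
    sc.access(0x1000 + 4 * i)
```

Only the very first stream access misses; every subsequent one hits on a prefetched line.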


High-Performance Computer Architecture | 2013

ECM: Effective Capacity Maximizer for high-performance compressed caching

Seungcheol Baek; Hyung Gyu Lee; Chrysostomos Nicopoulos; Junghee Lee; Jongman Kim

Compressed Last-Level Cache (LLC) architectures have been proposed to enhance system performance by efficiently increasing the effective capacity of the cache, without physically increasing the cache size. In a compressed cache, the cacheline size varies depending on the achieved compression ratio. We observe that this size information gives a useful hint when selecting a victim, which can lead to increased cache performance. However, no replacement policy tailored to compressed LLCs has been investigated so far. This paper introduces the notion of size-aware compressed cache management as a way to maximize the performance of compressed caches. Toward this end, the Effective Capacity Maximizer (ECM) scheme is introduced, which targets compressed LLCs. The proposed mechanism revolves around three fundamental principles: Size-Aware Insertion (SAI), a Dynamically Adjustable Threshold Scheme (DATS), and Size-Aware Replacement (SAR). By adjusting the eviction criteria, based on the compressed data size, one may increase the effective cache capacity and minimize the miss penalty. Extensive simulations with memory traces from real applications running on a full-system simulator demonstrate significant improvements compared to compressed cache schemes employing the conventional Least-Recently Used (LRU) and Dynamic Re-Reference Interval Prediction (DRRIP) [11] replacement policies. Specifically, ECM shows an average effective capacity increase of 15% over LRU and 18.8% over DRRIP, an average cache miss reduction of 9.4% over LRU and 3.9% over DRRIP, and an average system performance improvement of 6.2% over LRU and 3.3% over DRRIP.
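The size-aware eviction principle might be sketched as follows (a simplified illustration; the actual SAI/DATS/SAR mechanisms are more involved and the threshold is adjusted dynamically): among the least-recently-used candidates, prefer evicting a large, poorly compressed line, since freeing it makes room for more compressed lines.

```python
def pick_victim(cachelines, lru_order, size_threshold):
    """Toy size-aware replacement.
    cachelines: {tag: compressed_size_in_bytes}
    lru_order: tags ordered least-recently-used first.
    Prefer a large line among the two LRU candidates (simplification)."""
    candidates = lru_order[:2]
    big = [t for t in candidates if cachelines[t] >= size_threshold]
    return big[0] if big else candidates[0]

lines = {"a": 64, "b": 16, "c": 32}
# "b" is LRU, but "a" is large, so "a" is evicted to free more space.
victim = pick_victim(lines, lru_order=["b", "a", "c"], size_threshold=48)
```

With a higher threshold no candidate counts as large, and the policy degenerates to plain LRU.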


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2013

Preemptible I/O Scheduling of Garbage Collection for Solid State Drives

Junghee Lee; Young-Jae Kim; Galen M. Shipman; H Sarp Oral; Jongman Kim

Unlike hard disks, flash devices use out-of-place update operations and require a garbage collection (GC) process to reclaim invalid pages and create free blocks. This GC process is a major cause of performance degradation when running concurrently with other I/O operations, as internal bandwidth is consumed to reclaim these invalid pages. The invocation of the GC process is generally governed by a low watermark on free blocks and other internal device metrics that different workloads meet at different intervals. This results in I/O performance that is highly dependent on workload characteristics. In this paper, we examine the GC process and propose a semi-preemptible GC (PGC) scheme that allows GC processing to be preempted while pending I/O requests in the queue are serviced. Moreover, we further enhance flash performance by pipelining internal GC operations and merging them with pending I/O requests whenever possible. Our experimental evaluation of this semi-PGC scheme with realistic workloads demonstrates both improved performance and reduced performance variability. Write-dominant workloads show up to a 66.56% improvement in average response time with an 83.30% reduced variance in response time compared to the non-PGC scheme. In addition, we explore opportunities for a new NAND flash device that supports suspend/resume commands for read, write, and erase operations to enable fully preemptible GC (F-PGC). Our experiments with an F-PGC-enabled flash device show that request response time can be improved by up to 14.57% compared to semi-PGC.


IEEE Computer Society Annual Symposium on VLSI | 2013

Do we need wide flits in Networks-on-Chip?

Junghee Lee; Chrysostomos Nicopoulos; Sung Joo Park; Madhavan Swaminathan; Jongman Kim

Packet-based Networks-on-Chip (NoC) have emerged as the most viable candidates for the interconnect backbone of future Chip Multi-Processors (CMP). The flit size (or width) is one of the fundamental design parameters within a NoC router, which affects both the performance and the cost of the network. Most studies pertaining to the NoC of general-purpose microprocessors adopt a certain flit width without any reasoning or explanation. In fact, it is not easy to pinpoint an optimal flit size, because the flit size is intricately intertwined with various aspects of the system. This paper aims to provide a guideline on how to choose an appropriate flit width. It will be demonstrated that arbitrarily choosing a flit width without proper investigation may have serious repercussions on the overall behavior of the system.
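The first-order trade-off the paper investigates can be captured in a back-of-the-envelope model (illustrative only; real NoC behavior also depends on routing, buffering, and congestion): widening the flit shortens packet serialization but grows the link, buffer, and crossbar cost roughly in proportion.

```python
import math

def serialization_cycles(packet_bits, flit_width):
    """Cycles to stream one packet head-to-tail through a link
    of the given flit width (serialization latency only)."""
    return math.ceil(packet_bits / flit_width)

def link_wires(flit_width, channels=1):
    """First-order wiring cost: wider flits mean proportionally
    more wires (and wider buffers/crossbars) per channel."""
    return flit_width * channels

# A 512-bit cache-line packet: doubling the flit width halves
# serialization latency but doubles the wiring cost.
lat_128 = serialization_cycles(512, 128)
lat_256 = serialization_cycles(512, 256)
```

The point of the paper is precisely that this simple proportionality breaks down once system-level effects are included, which is why the flit width needs explicit investigation.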


IEEE Transactions on Computers | 2014

Coordinating Garbage Collection for Arrays of Solid-State Drives

Young-Jae Kim; Junghee Lee; H Sarp Oral; David A Dillow; Feiyi Wang; Galen M. Shipman

Although solid-state drives (SSDs) offer significant performance improvements over hard disk drives (HDDs) for a number of workloads, they can exhibit substantial variance in request latency and throughput as a result of garbage collection (GC). When GC conflicts with an I/O stream, the stream can make no forward progress until the GC cycle completes. GC cycles are scheduled by logic internal to the SSD based on several factors such as the pattern, frequency, and volume of write requests. When SSDs are used in a RAID with currently available technology, the lack of coordination of the SSD-local GC cycles amplifies this performance variance. We propose a global garbage collection (GGC) mechanism to improve response times and reduce performance variability for a RAID of SSDs. We include a high-level design of an SSD-aware RAID controller and GGC-capable SSD devices, along with algorithms to coordinate the GGC cycles. We develop reactive and proactive GC coordination algorithms and evaluate their I/O performance and block erase counts for various workloads. Our simulations show that GC coordination by a reactive scheme improves average response time and reduces performance variability for a wide variety of enterprise workloads. For bursty, write-dominated workloads, response time was improved by 69 percent and performance variability was reduced by 71 percent. We show that a proactive GC coordination algorithm can further improve I/O response times by up to 9 percent and performance variability by up to 15 percent. We also observe that it can increase the lifetimes of SSDs for some workloads (e.g., Financial) by reducing block erase counts by up to 79 percent relative to a reactive algorithm for write-dominant enterprise workloads.
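The reactive coordination idea can be sketched as a toy model (hypothetical names and numbers; real GGC decisions use device-internal metrics): when any one drive in the array hits its low watermark and must collect, the controller triggers GC on every drive, so the GC stalls overlap instead of striking the array one drive at a time.

```python
class GGCController:
    """Toy reactive Global GC: an array-wide GC cycle fires as soon
    as any SSD falls to its free-block low watermark."""

    def __init__(self, ssds, low_watermark=2):
        self.ssds = ssds            # each ssd: {"free": int, "gc_runs": int}
        self.low_watermark = low_watermark

    def tick(self):
        if any(s["free"] <= self.low_watermark for s in self.ssds):
            for s in self.ssds:     # coordinated, array-wide GC cycle
                s["gc_runs"] += 1
                s["free"] += 8      # blocks reclaimed (arbitrary constant)

array = [{"free": 10, "gc_runs": 0}, {"free": 1, "gc_runs": 0}]
ctrl = GGCController(array)
ctrl.tick()                         # second SSD is low, so both collect
```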


IEEE Conference on Mass Storage Systems and Technologies | 2011

Harmonia: A globally coordinated garbage collector for arrays of Solid-State Drives

Young-Jae Kim; H Sarp Oral; Galen M. Shipman; Junghee Lee; David A Dillow; Feiyi Wang

Solid-State Drives (SSDs) offer significant performance improvements over hard disk drives (HDDs) on a number of workloads. The frequency of garbage collection (GC) activity is directly correlated with the pattern, frequency, and volume of write requests, and scheduling of GC is controlled by logic internal to the SSD. SSDs can exhibit significant performance degradation when garbage collection conflicts with an ongoing I/O request stream. When using SSDs in a RAID array, the lack of coordination of the local GC processes amplifies these performance degradations. No RAID controller or SSD available today has the technology to overcome this limitation. This paper presents Harmonia, a Global Garbage Collection (GGC) mechanism to improve response times and reduce performance variability for a RAID array of SSDs. Our proposal includes a high-level design of an SSD-aware RAID controller and GGC-capable SSD devices, as well as algorithms to coordinate the global GC cycles. Our simulations show that this design improves response time and reduces performance variability for a wide variety of enterprise workloads. For bursty, write-dominant workloads, response time was improved by 69% while performance variability was reduced by 71%.


Great Lakes Symposium on VLSI | 2016

A Low-Power Network-on-Chip Architecture for Tile-based Chip Multi-Processors

Anastasios Psarras; Junghee Lee; Pavlos M. Mattheakis; Chrysostomos Nicopoulos; Giorgos Dimitrakopoulos

Technology scaling of tile-based CMPs reduces the physical size of each tile and increases the number of tiles per die. This trend directly impacts the on-chip interconnect; even though the tile population increases, the inter-tile link distances scale down proportionally to the tile dimensions. The decreasing inter-tile wire lengths can be exploited to enable swift link traversal between neighboring tiles, after appropriate wire engineering. Building on this premise, we propose a technique to rapidly transfer flits between adjacent routers in half a clock cycle, by utilizing both edges of the clock during the sending and receiving operations. Half-cycle link traversal enables, for the first time, substantial reductions in (a) link power, irrespective of the data switching profile, and (b) buffer power (through buffer-size reduction), without incurring any latency/throughput loss. In fact, the proposed architecture also yields some latency improvements over a baseline NoC. Detailed hardware analysis using placed-and-routed designs and cycle-accurate full-system simulations corroborate the significant power and latency improvements.


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2013

IsoNet: Hardware-Based Job Queue Management for Many-Core Architectures

Junghee Lee; Chrysostomos Nicopoulos; Hyung Gyu Lee; Shreepad Panth; Sung Kyu Lim; Jongman Kim

Imbalanced distribution of workloads across a chip multiprocessor (CMP) constitutes wasteful use of resources. Most existing load distribution and balancing techniques employ very limited hardware support and rely predominantly on software for their operation. This paper introduces IsoNet, a hardware-based, conflict-free dynamic load distribution and balancing engine. IsoNet is a lightweight job queue manager responsible for administering the list of jobs to be executed and maintaining load balance among all CMP cores. By exploiting a micro-network of load-balancing modules, the proposed mechanism is shown to effectively reinforce concurrent computation in many-core environments. Detailed evaluation using a full-system simulation framework indicates that IsoNet significantly outperforms existing techniques and scales efficiently to as many as 1024 cores. Furthermore, to assess its feasibility, the IsoNet design is synthesized, placed, and routed in 45-nm VLSI technology. Analysis of the resulting low-level implementation shows that IsoNet's area and power overheads are almost negligible.
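One balancing decision of a hardware job-queue manager might look like the following toy model (illustrative only; IsoNet makes such decisions in dedicated logic over a micro-network of modules, not in software): move a job from the most loaded core to the least loaded one when the imbalance exceeds one job.

```python
def balance_step(queues):
    """One toy load-balancing step over per-core job queues:
    shift a job from the longest queue to the shortest if the
    imbalance is more than one job."""
    hi = max(range(len(queues)), key=lambda i: len(queues[i]))
    lo = min(range(len(queues)), key=lambda i: len(queues[i]))
    if len(queues[hi]) - len(queues[lo]) > 1:
        queues[lo].append(queues[hi].pop())
    return queues

qs = balance_step([["j1", "j2", "j3"], []])
```

Repeating the step converges the queues toward equal length; doing this in hardware is what keeps the balancing conflict-free and off the cores' critical path.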


IEEE Computer Architecture Letters | 2015

Synchronous I/O Scheduling of Independent Write Caches for an Array of SSDs

Junghee Lee; Young-Jae Kim; Jongman Kim; Galen M. Shipman

Solid-state drives (SSDs) offer a significant performance improvement over hard disk drives (HDDs); however, they can exhibit significant variance in latency and throughput due to the internal garbage collection (GC) process on the SSD. When SSDs are configured in a RAID, the performance variance of individual SSDs can significantly degrade the overall performance of the RAID of SSDs. The internal cache on the RAID controller can help mitigate the performance variability of SSDs in the array; however, state-of-the-art cache algorithms for RAID controllers do not consider the characteristics of SSDs. In this paper, we examine the most recent write cache algorithm for an array of disks and propose a synchronous independent write cache (SIW) algorithm. We also present a pre-parity-computation technique for a RAID of SSDs with parity computations, which calculates the parities of blocks in advance, before they are stored in the write cache. With this new technique, we propose a complete paradigm shift in the design of the write cache. In our evaluation study, workloads dominated by large write requests show up to about 50 and 20 percent improvements in average response times on RAID-0 and RAID-5, respectively, compared to the state-of-the-art write cache algorithm.
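The pre-parity-computation idea, RAID-5-style XOR parity computed before a stripe enters the write cache, can be sketched as follows (a hypothetical simplification; the actual SIW algorithm and cache organization are more involved):

```python
def precompute_parity(stripe_blocks):
    """Toy pre-parity computation: XOR the data blocks of a stripe
    *before* they enter the write cache, so the cached entry already
    carries its RAID-5 parity block."""
    parity = bytes(len(stripe_blocks[0]))
    for blk in stripe_blocks:
        parity = bytes(a ^ b for a, b in zip(parity, blk))
    return parity

# The cache entry stores data and its precomputed parity together,
# so destaging needs no read-modify-write parity step.
cache_entry = {
    "data": [b"\x01\x02", b"\x03\x04"],
    "parity": precompute_parity([b"\x01\x02", b"\x03\x04"]),
}
```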

Collaboration


Dive into Junghee Lee's collaborations.

Top Co-Authors

Jongman Kim (Georgia Institute of Technology)
Galen M. Shipman (Oak Ridge National Laboratory)
Feiyi Wang (Oak Ridge National Laboratory)
H Sarp Oral (Oak Ridge National Laboratory)
David A Dillow (Oak Ridge National Laboratory)
Seungcheol Baek (Georgia Institute of Technology)