Is this you? Create Your Porfile

Jongman Kim

Georgia Institute of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jongman Kim is active.

Explore More

Publication

Featured researches published by Jongman Kim.

international symposium on computer architecture | 2007

A novel dimensionally-decomposed router for on-chip communication in 3D architectures

Jongman Kim; Chrysostomos Nicopoulos; Dongkook Park; Reetuparna Das; Yuan Xie; Vijaykrishnan Narayanan; Mazin S. Yousif; Chita R. Das

Much like multi-storey buildings in densely packed metropolises, three-dimensional (3D) chip structures are envisioned as a viable solution to skyrocketing transistor densities and burgeoning die sizes in multi-core architectures. Partitioning a larger die into smaller segments and then stacking them in a 3D fashion can significantly reduce latency and energy consumption. Such benefits emanate from the notion that inter-wafer distances are negligible compared to intra-wafer distances. This attribute substantially reduces global wiring length in 3D chips. The work in this paper integrates the increasingly popular idea of packet-based Networks-on-Chip (NoC) into a 3D setting. While NoCs have been studied extensively in the 2D realm, the microarchitectural ramifications of moving into the third dimension have yet to be fully explored. This paper presents a detailed exploration of inter-strata communication architectures in 3D NoCs. Three design options are investigated; a simple bus-based inter-wafer connection, a hop-by-hop standard 3D design, and a full 3D crossbar implementation. In this context, we propose a novel partially-connected 3D crossbar structure, called the 3D Dimensionally-Decomposed (DimDe) Router, which provides a good tradeoff between circuit complexity and performance benefits. Simulation results using (a) a stand-alone cycle-accurate 3D NoC simulator running synthetic workloads, and (b) a hybrid 3D NoC/cache simulation environment running real commercial and scientific benchmarks, indicate that the proposed DimDe design provides latency and throughput improvements of over 20% on average over the other 3D architectures, while remaining within 5% of the full 3D crossbar performance. Furthermore, based on synthesized hardware implementations in 90 nm technology, the DimDe architecture outperforms all other designs -- including the full 3D crossbar -- by an average of 26% in terms of the Energy-Delay Product (EDP).

design automation conference | 2005

A low latency router supporting adaptivity for on-chip interconnects

Jongman Kim; Dongkook Park; Theo Theocharides; Narayanan Vijaykrishnan; Chita R. Das

The increased deployment of system-on-chip designs has drawn attention to the limitations of on-chip interconnects. As a potential solution to these limitations, networks-on-chip (NoC) have been proposed. The NoC routing algorithm significantly influences the performance and energy consumption of the chip. We propose a router architecture which utilizes adaptive routing while maintaining low latency. The two-stage pipelined architecture uses look ahead routing, speculative allocation, and optimal output path selection concurrently. The routing algorithm benefits from congestion-aware flow control, making better routing decisions. We simulate and evaluate the proposed architecture in terms of network latency and energy consumption. Our results indicate that the architecture is effective in balancing the performance and energy of NoC designs.

dependable systems and networks | 2006

Exploring Fault-Tolerant Network-on-Chip Architectures

Dongkook Park; Chrysostomos Nicopoulos; Jongman Kim; Narayanan Vijaykrishnan; Chita R. Das

The advent of deep sub-micron technology has exacerbated reliability issues in on-chip interconnects. In particular, single event upsets, such as soft errors, and hard faults are rapidly becoming a force to be reckoned with. This spiraling trend highlights the importance of detailed analysis of these reliability hazards and the incorporation of comprehensive protection measures into all network-on-chip (NoC) designs. In this paper, we examine the impact of transient failures on the reliability of on-chip interconnects and develop comprehensive counter-measures to either prevent or recover from them. In this regard, we propose several novel schemes to remedy various kinds of soft error symptoms, while keeping area and power overhead at a minimum. Our proposed solutions are architected to fully exploit the available infrastructures in an NoC and enable versatile reuse of valuable resources. The effectiveness of the proposed techniques has been validated using a cycle-accurate simulator

international symposium on performance analysis of systems and software | 2011

A semi-preemptive garbage collector for solid state drives

Junghee Lee; Young-Jae Kim; Galen M. Shipman; H Sarp Oral; Feiyi Wang; Jongman Kim

NAND flash memory is a preferred storage media for various platforms ranging from embedded systems to enterprise-scale systems. Flash devices do not have any mechanical moving parts and provide low-latency access. They also require less power compared to rotating media. Unlike hard disks, flash devices use out-of-update operations and they require a garbage collection (GC) process to reclaim invalid pages to create free blocks. This GC process is a major cause of performance degradation when running concurrently with other I/O operations as internal bandwidth is consumed to reclaim these invalid pages. The invocation of the GC process is generally governed by a low watermark on free blocks and other internal device metrics that different workloads meet at different intervals. This results in I/O performance that is highly dependent on workload characteristics. In this paper, we examine the GC process and propose a semi-preemptive GC scheme that can preempt on-going GC processing and service pending I/O requests in the queue. Moreover, we further enhance flash performance by pipelining internal GC operations and merge them with pending I/O requests whenever possible. Our experimental evaluation of this semi-preemptive GC sheme with realistic workloads demonstrate both improved performance and reduced performance variability. Write-dominant workloads show up to a 66.56% improvement in average response time with a 83.30% reduced variance in response time compared to the non-preemptive GC scheme.

architectures for networking and communications systems | 2005

Design and analysis of an NoC architecture from performance, reliability and energy perspective

Jongman Kim; Dongkook Park; Chrysostomos Nicopoulos; Narayanan Vijaykrishnan; Chita R. Das

Network-on-chip (NoC) architectures employing packet-based communication are being increasingly adopted in system-on-chip (SoC) designs. In addition to providing high performance, the fault-tolerance and reliability of these networks is becoming a critical issue due to several artifacts of deep sub-micron technologies. Consequently, it is important for a designer to have access to fast methods for evaluating the performance, reliability, and energy-efficiency of an on-chip network. Towards this end, first, we propose a novel path-sensitive router architecture for low-latency applications. Next, we present a queuing-theory-based model for evaluating the performance and energy behavior of on-chip networks. Then the model is used to demonstrate the effectiveness of our proposed router. The performance (average latency) and energy consumption results from the analytical model are validated with those obtained from a cycle-accurate simulator. Finally, we explore error detection and correction mechanisms that provide different energy-reliability-performance tradeoffs and extend our model to evaluate the on-chip network in the presence of these error protection schemes. Our reliability exploration culminates with the introduction of an array of transient fault protection techniques, both architectural and algorithmic, to tackle reliability issues within the routers individual hardware components. We propose a complete solution safeguarding against both the traditional link faults and internal router upsets, without incurring any significant latency, area and power overhead.

high performance interconnects | 2007

Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects

Dongkook Park; Reetuparna Das; Chrysostomos Nicopoulos; Jongman Kim; Narayanan Vijaykrishnan; Ravishankar R. Iyer; Chita R. Das

In modern multi-core system-on-chip (SoC) architectures, the design of innovative interconnection fabrics is indispensable. The concept of the network-on-chip (NoC) architecture has been proposed recently to better suit this requirement. Especially, the router architecture has a significant effect on the overall performance and energy consumption of the chip. We propose a dynamic path management scheme that exploits network traffic information during switch arbitration. Consequently, flits transferred across frequently used paths are expedited by traversing a reduced router pipeline. This technique, based on pipeline bypassing, is simulated and evaluated in terms of network latency and average power consumption. Simulation results with real-world application traces show that the architecture improves the performance up to 30% while incurring only minimal area/power overhead.

ieee computer society annual symposium on vlsi | 2012

A Compression-Based Hybrid MLC/SLC Management Technique for Phase-Change Memory Systems

Hyung Gyu Lee; Seungcheol Baek; Jongman Kim; Chrysostomos Nicopoulos

The storage density of PCM has been demonstrated to double through the employment of Multi-Level Cell (MLC) PCM arrays. However, this increase in capacity comes at the expense of increased latency (both read and write) and decreased long-term endurance, as compared to the more conventional Single-Level Cell (SLC) PCM. These negative traits of MLCs detract from the potentially invaluable storage benefits. This paper introduces a compression-based hybrid MLC/SLC PCM management technique that aims to combine the performance edge of SLCs with the higher capacity of MLCs in a hybrid environment. Our trace-driven simulations with real application workloads demonstrate that the proposed technique achieves 3.6X performance enhancement and 72% energy reduction, on average, as compared with MLC-only configurations, while always providing the same effective capacity as the MLC-only mode.

high-performance computer architecture | 2013

ECM: Effective Capacity Maximizer for high-performance compressed caching

Seungcheol Baek; Hyung Gyu Lee; Chrysostomos Nicopoulos; Junghee Lee; Jongman Kim

Compressed Last-Level Cache (LLC) architectures have been proposed to enhance system performance by efficiently increasing the effective capacity of the cache, without physically increasing the cache size. In a compressed cache, the cacheline size varies depending on the achieved compression ratio. We observe that this size information gives a useful hint when selecting a victim, which can lead to increased cache performance. However, no replacement policy tailored to compressed LLCs has been investigated so far. This paper introduces the notion of size-aware compressed cache management as a way to maximize the performance of compressed caches. Toward this end, the Effective Capacity Maximizer (ECM) scheme is introduced, which targets compressed LLCs. The proposed mechanism revolves around three fundamental principles: Size-Aware Insertion (SAI), a Dynamically Adjustable Threshold Scheme (DATS), and Size-Aware Replacement (SAR). By adjusting the eviction criteria, based on the compressed data size, one may increase the effective cache capacity and minimize the miss penalty. Extensive simulations with memory traces from real applications running on a full-system simulator demonstrate significant improvements compared to compressed cache schemes employing the conventional Least-Recently Used (LRU) and Dynamic Re-Reference Interval Prediction (DRRIP) [11] replacement policies. Specifically, ECM shows an average effective capacity increase of 15% over LRU and 18.8% over DRRIP, an average cache miss reduction of 9.4% over LRU and 3.9% over DRRIP, and an average system performance improvement of 6.2% over LRU and 3.3% over DRRIP.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2013

Preemptible I/O Scheduling of Garbage Collection for Solid State Drives

Junghee Lee; Young-Jae Kim; Galen M. Shipman; H Sarp Oral; Jongman Kim

Unlike hard disks, flash devices use out-of-place updates operations and require a garbage collection (GC) process to reclaim invalid pages to create free blocks. This GC process is a major cause of performance degradation when running concurrently with other I/O operations as internal bandwidth is consumed to reclaim these invalid pages. The invocation of the GC process is generally governed by a low watermark on free blocks and other internal device metrics that different workloads meet at different intervals. This results in an I/O performance that is highly dependent on workload characteristics. In this paper, we examine the GC process and propose a semipreemptible GC (PGC) scheme that allows GC processing to be preempted while pending I/O requests in the queue are serviced. Moreover, we further enhance flash performance by pipelining internal GC operations and merge them with pending I/O requests whenever possible. Our experimental evaluation of this semi-PGC scheme with realistic workloads demonstrates both improved performance and reduced performance variability. Write-dominant workloads show up to a 66.56% improvement in average response time with a 83.30% reduced variance in response time compared to the non-PGC scheme. In addition, we explore opportunities of a new NAND flash device that supports suspend/resume commands for read, write, and erase operations for fully PGC (F-PGC). Our experiments with an F-PGC enabled flash device show that request response time can be improved by up to 14.57% compared to semi-PGC.

ieee computer society annual symposium on vlsi | 2013

Do we need wide flits in Networks-on-Chip?

Junghee Lee; Chrysostomos Nicopoulos; Sung Joo Park; Madhavan Swaminathan; Jongman Kim

Packet-based Networks-on-Chip (NoC) have emerged as the most viable candidates for the interconnect backbone of future Chip Multi-Processors (CMP). The flit size (or width) is one of the fundamental design parameters within a NoC router, which affects both the performance and the cost of the network. Most studies pertaining to the NoC of general-purpose microprocessors adopt a certain flit width without any reasoning or explanation. In fact, it is not easy to pinpoint an optimal flit size, because the flit size is intricately intertwined with various aspects of the system. This paper aims to provide a guideline on how to choose an appropriate flit width. It will be demonstrated that arbitrarily choosing a flit width without proper investigation may have serious repercussions on the overall behavior of the system.

Explore More