Ki Hwan Yum | Researchain

Publication

Featured research published by Ki Hwan Yum.

International Symposium on Low Power Electronics and Design | 2003

Energy optimization techniques in cluster interconnects

Eun Jung Kim; Ki Hwan Yum; Greg M. Link; Narayanan Vijaykrishnan; Mahmut T. Kandemir; Mary Jane Irwin; Mazin S. Yousif; Chita R. Das

Designing energy-efficient clusters has recently become an important concern to make these systems economically attractive for many applications. Since the links and switch buffers consume the major portion of the power budget of the cluster, the focus of this paper is to optimize the energy consumption in these two components. To minimize power in the links, we propose a novel dynamic link shutdown (DLS) technique. The DLS technique makes use of an appropriate adaptive routing algorithm to shut down the links intelligently. We also present an optimized buffer design for reducing leakage energy. Our analysis on different networks using a complete system simulator reveals that the proposed DLS technique can provide optimized performance-energy behavior (up to 40% energy savings with less than 5% performance degradation in the best case) for the cluster interconnects.

High-Performance Computer Architecture | 2007

A Domain-Specific On-Chip Network Design for Large Scale Cache Systems

Yuho Jin; Eun Jung Kim; Ki Hwan Yum

As circuit integration technology advances, the design of efficient interconnects has become critical. On-chip networks have been adopted to overcome the scalability and poor resource-sharing problems of shared buses or dedicated wires. However, using a general on-chip network for a specific domain may cause underutilization of the network resources and huge network delays because the interconnects are not optimized for the domain. Addressing these two issues is challenging because in-depth knowledge of both the interconnects and the specific domain is required. Non-uniform cache architectures (NUCAs) use wormhole-routed 2D mesh networks to improve the performance of on-chip L2 caches. We observe that network resources in NUCAs are underutilized and occupy considerable chip area (52% of cache area). Also, the network delay is significant (63% of cache access time). Motivated by our observations, we investigate how to optimize cache operations and design the network in large scale cache systems. We propose a single-cycle router architecture that can efficiently support multicasting in on-chip caches. Next, we present fast-LRU replacement, where cache replacement overlaps with data request delivery. Finally, we propose a deadlock-free XYX routing algorithm and a new halo network topology to minimize the number of links in the network. Simulation results show that our networked cache system improves the average IPC by 38% over the mesh network design with multicast promotion replacement while using only 23% of the interconnection area. Specifically, multicast fast-LRU replacement improves the average IPC by 20% compared with multicast promotion replacement. A halo topology design additionally improves the average IPC by 18% over a mesh topology.

International Conference on Multimedia and Expo | 2006

Bandwidth Estimation in Wireless LANs for Multimedia Streaming Services

Heung Ki Lee; Varrian Hall; Ki Hwan Yum; Kyoung Ill Kim; Eun Jung Kim

The popularity of multimedia streaming services via wireless networks presents major challenges in the management of network bandwidth. One challenge is to quickly and precisely estimate the available bandwidth in order to decide the streaming rates of layered and scalable multimedia services. Previous methods designed for wired networks are too burdensome to be applied to multimedia applications in wireless networks. In this paper, a new method, IdleGap, is suggested to estimate the available bandwidth of a wireless LAN based on the information from a low layer in the protocol stack. We use a network simulation tool, NS-2, to evaluate our new method with various ranges of cross traffic and observation times. Our simulation results show that IdleGap accurately estimates the available bandwidth for all ranges of cross traffic (100 Kbps to 1 Mbps) with a very short observation time of 10 seconds.
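The abstract does not give IdleGap's formula, so the following is only a minimal sketch of the general idea of estimating available bandwidth from low-layer channel occupancy. It assumes the estimate is the link capacity scaled by the fraction of an observation window during which the medium was idle. The function name, its parameters, and the idle-fraction formula are illustrative assumptions, not the paper's method.

```python
def estimate_available_bandwidth(capacity_bps, busy_intervals, window_s):
    """Hypothetical idle-fraction estimator (assumption, not the IdleGap formula).

    capacity_bps   -- nominal link capacity in bits per second
    busy_intervals -- list of (start, end) times in seconds when the medium was busy
    window_s       -- length of the observation window in seconds
    """
    if window_s <= 0:
        raise ValueError("observation window must be positive")
    # Total time the channel was observed busy, capped at the window length.
    busy_s = min(sum(end - start for start, end in busy_intervals), window_s)
    idle_fraction = 1.0 - busy_s / window_s
    return capacity_bps * idle_fraction


# Example: a 1 Mbps link that was busy for 2.5 s of a 10 s window.
estimate = estimate_available_bandwidth(1_000_000, [(0.0, 2.0), (5.0, 5.5)], 10.0)
```

A streaming server could compare such an estimate against the bit rates of its encoded layers to pick the highest layer that fits.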

International Symposium on Microarchitecture | 2008

Adaptive data compression for high-performance low-power on-chip networks

Yuho Jin; Ki Hwan Yum; Eun Jung Kim

With the recent design shift towards increasing the number of processing elements in a chip, high-bandwidth support in on-chip interconnect is essential for low-latency communication. Much of the previous work has focused on router architectures and network topologies using wide/long channels. However, such solutions may result in a complicated router design and a high interconnect cost. In this paper, we exploit a table-based data compression technique, relying on value patterns in cache traffic. Compressing a large packet into a small one can increase the effective bandwidth of routers and links, while saving power due to reduced operations. The main challenges are providing a scalable implementation of tables and minimizing the overhead of compression latency. First, we propose a shared table scheme that needs one encoding table and one decoding table for each processing element, and a management protocol that does not require in-order delivery. Next, we present streamlined encoding that combines flit injection and encoding in a pipeline. Furthermore, data compression can be selectively applied to communication on congested paths only if compression improves performance. Simulation results in a 16-core CMP show that our compression method improves the packet latency by up to 44% with an average of 36% and reduces the network power consumption by 36% on average.
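To make the idea of table-based value compression concrete, here is a minimal sketch in which a sender replaces recently seen data words with short table indices and a receiver mirrors the table to recover them. It is a simplification: unlike the paper's management protocol, it assumes in-order delivery, and the `ValueTable` class, the round-robin replacement policy, and the symbol format are all illustrative assumptions.

```python
class ValueTable:
    """Hypothetical fixed-size table of recently seen words (round-robin replacement)."""

    def __init__(self, size=4):
        self.slots = [None] * size
        self.next_slot = 0

    def find(self, value):
        # Return the slot index holding value, or None if it is absent.
        return self.slots.index(value) if value in self.slots else None

    def insert(self, value):
        self.slots[self.next_slot] = value
        self.next_slot = (self.next_slot + 1) % len(self.slots)


def encode(words, table):
    """Replace words found in the table with ("idx", slot); send others raw."""
    symbols = []
    for word in words:
        slot = table.find(word)
        if slot is not None:
            symbols.append(("idx", slot))
        else:
            symbols.append(("raw", word))
            table.insert(word)  # the decoder performs the same insert
    return symbols


def decode(symbols, table):
    """Rebuild the words, mirroring the encoder's table updates (in-order assumed)."""
    words = []
    for kind, payload in symbols:
        if kind == "idx":
            words.append(table.slots[payload])
        else:
            words.append(payload)
            table.insert(payload)
    return words


data = [0, 0, 0xFFFF, 0, 7, 0xFFFF]
encoded = encode(data, ValueTable())
decoded = decode(encoded, ValueTable())
```

In this example three of the six words travel as table indices, which is where the bandwidth saving would come from when an index is narrower than a data word.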

International Symposium on Computer Architecture | 2001

QoS provisioning in clusters: an investigation of Router and NIC design

Ki Hwan Yum; Eun Jung Kim; Chita R. Das

Design of high performance cluster networks (routers) with Quality-of-Service (QoS) guarantees is becoming increasingly important to support a variety of multimedia applications, many of which have real-time constraints. Most commercial routers, which are based on the wormhole-switching paradigm, can deliver high performance, but lack QoS provisioning. In this paper, we present a pipelined wormhole router architecture that can provide high and predictable performance for integrated traffic in clusters. We consider two different implementations—a non-preemptive model and a more aggressive preemptive model. We also present the design of a network interface card (NIC) based on the Virtual Interface Architecture (VIA) design paradigm to support QoS in the NIC. The QoS capable router and NIC designs are evaluated with a mixed workload consisting of best-effort traffic, multimedia streams, and control traffic. Simulation results of an 8-port router and a (2 × 2) mesh network indicate that the preemptive router can provide better performance than the non-preemptive router for dynamically changing workloads. Co-evaluation of the QoS-aware NIC with the proposed router models shows significant performance improvement compared to that with a traditional NIC without any QoS support.

IEEE Transactions on Computers | 2005

A holistic approach to designing energy-efficient cluster interconnects

Eun Jung Kim; Greg M. Link; Ki Hwan Yum; Narayanan Vijaykrishnan; Mahmut T. Kandemir; Mary Jane Irwin; Chita R. Das

Designing energy-efficient clusters has recently become an important concern to make these systems economically attractive for many applications. Since the cluster interconnect is a major part of the system, the focus of this paper is to characterize and optimize the energy consumption in the entire interconnect. Using a cycle-accurate simulator of an InfiniBand Architecture (IBA) compliant interconnect fabric and actual designs of its components, we investigate the energy behavior on regular and irregular interconnects. The energy profile of the three major components (switches, network interface cards (NICs), and links) reveals that the links and switch buffers consume the major portion of the power budget. Hence, we focus on energy optimization of these two components. To minimize power in the links, first we investigate the dynamic voltage scaling (DVS) algorithm and then propose a novel dynamic link shutdown (DLS) technique. The DLS technique makes use of an appropriate adaptive routing algorithm to shut down the links intelligently. We also present an optimized buffer design for reducing leakage energy in 70nm technology. Our analysis on different networks reveals that, while DVS is an effective energy conservation technique, it incurs significant performance penalty at low to medium workload. Moreover, energy saving with DVS reduces as the buffer leakage current becomes significant with 70nm design. On the other hand, the proposed DLS technique can provide optimized performance-energy behavior (up to 40 percent energy savings with less than 5 percent performance degradation in the best case) for the cluster interconnects.

Design Automation Conference | 2015

Bandwidth-efficient on-chip interconnect designs for GPGPUs

Hyunjun Jang; Jinchun Kim; Paul V. Gratz; Ki Hwan Yum; Eun Jung Kim

Modern computational workloads require abundant thread level parallelism (TLP), necessitating highly-parallel, many-core accelerators such as General Purpose Graphics Processing Units (GPGPUs). GPGPUs place a heavy demand on the on-chip interconnect between the many cores and a few memory controllers (MCs). Thus, traffic is highly asymmetric, impacting on-chip resource utilization and system performance. Here, we analyze the communication demands of typical GPGPU applications, and propose efficient Network-on-Chip (NoC) designs to meet those demands. We show that the proposed schemes improve performance by up to 64.7%. Compared to the best of class prior work, our VC monopolizing and partitioning schemes improve performance by 25%.

Advances in Multimedia | 2007

Bandwidth estimation in wireless LANs for multimedia streaming services

Heung Ki Lee; Varrian Hall; Ki Hwan Yum; Kyoung Ill Kim; Eun Jung Kim

The popularity of multimedia streaming services via wireless networks presents major challenges in the management of network bandwidth. One challenge is to quickly and precisely estimate the available bandwidth in order to decide the streaming rates of layered and scalable multimedia services. Previous methods designed for wired networks are too burdensome to be applied to multimedia applications in wireless networks. In this paper, a new method, IdleGap, is suggested to estimate the available bandwidth of a wireless LAN based on the information from a low layer in the protocol stack. We use a network simulation tool, NS-2, to evaluate our new method with various ranges of cross traffic and observation times. Our simulation results show that IdleGap accurately estimates the available bandwidth for all ranges of cross traffic (100 Kbps to 1 Mbps) with a very short observation time of 10 seconds.

Networks on Chips | 2012

A Hybrid Buffer Design with STT-MRAM for On-Chip Interconnects

Hyunjun Jang; Baik Song An; Nikhil Kulkarni; Ki Hwan Yum; Eun Jung Kim

As the chip multiprocessor (CMP) design moves toward many-core architectures, communication delay in Network-on-Chip (NoC) has been a major bottleneck in CMP systems. Using high-density memories in input buffers helps to reduce the bottleneck through increasing throughput. Spin-Torque Transfer Magnetic RAM (STT-MRAM) can be a suitable solution due to its nature of high density and near-zero leakage power. But its long latency and high power consumption in write operations still need to be addressed. We explore the design issues in using STT-MRAM for NoC input buffers. Motivated by short intra-router latency, we use the previously proposed write latency reduction technique sacrificing retention time. Then we propose a hybrid design of input buffers using both SRAM and STT-MRAM to hide the long write latency efficiently. Considering that simple data migration in the hybrid buffer consumes more dynamic power compared to SRAM, we provide a lazy migration scheme that reduces the dynamic power consumption of the hybrid buffer. Simulation results show that the proposed scheme enhances the throughput by 21% on average.

IEEE Transactions on Parallel and Distributed Systems | 2002

MediaWorm: a QoS capable router architecture for clusters

Ki Hwan Yum; Eun Jung Kim; Chita R. Das; Aniruddha S. Vaidya

With the increasing use of clusters in real-time applications, it has become essential to design high-performance networks with quality-of-service (QoS) guarantees. We explore the feasibility of providing QoS in wormhole switched routers, which are widely used in designing scalable, high-performance cluster interconnects. In particular, we are interested in supporting multimedia video streams with CBR and VBR traffic, in addition to the conventional best-effort traffic. The proposed MediaWorm router uses a rate-based bandwidth allocation mechanism, called Fine-Grained VirtualClock (FGVC), to schedule network resources for different traffic classes. Our simulation results on an 8-port router indicate that it is possible to provide jitter-free delivery to VBR/CBR traffic up to an input load of 70-80 percent of link bandwidth and the presence of best-effort traffic has no adverse effect on real-time traffic. Although the MediaWorm router shows a slightly lower performance than a pipelined circuit switched (PCS) router, commercial success of wormhole switching, coupled with simpler and cheaper design, makes it an attractive alternative. Simulation of a (2 × 2) fat-mesh using this router shows performance comparable to that of a single switch and suggests that clusters designed with appropriate bandwidth balance between links can provide required performance for different types of traffic.
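The abstract does not describe FGVC in detail, so the sketch below shows only a simplified form of the classic VirtualClock scheme on which rate-based schedulers of this kind build, not the MediaWorm implementation. Each flow reserves a rate, every arriving packet is stamped with a virtual time that advances by 1/rate per packet, and packets are served in stamp order. The class name, method names, and the example flows and rates are illustrative assumptions.

```python
import heapq


class VirtualClockScheduler:
    """Simplified VirtualClock scheduler (illustrative, not the FGVC design)."""

    def __init__(self):
        self.vtick = {}    # flow -> virtual time advanced per packet (1 / rate)
        self.aux_vc = {}   # flow -> last virtual timestamp issued
        self.queue = []    # heap of (stamp, arrival order, flow, packet)
        self.arrivals = 0

    def add_flow(self, flow, rate_pkts_per_s):
        self.vtick[flow] = 1.0 / rate_pkts_per_s
        self.aux_vc[flow] = 0.0

    def enqueue(self, flow, packet, now):
        # A flow that has been idle restarts from real time, so it cannot
        # bank unused bandwidth; an active flow advances by its own tick.
        stamp = max(now, self.aux_vc[flow]) + self.vtick[flow]
        self.aux_vc[flow] = stamp
        heapq.heappush(self.queue, (stamp, self.arrivals, flow, packet))
        self.arrivals += 1

    def dequeue(self):
        # Serve the packet with the smallest virtual timestamp.
        _, _, flow, packet = heapq.heappop(self.queue)
        return flow, packet


sched = VirtualClockScheduler()
sched.add_flow("video", 4)        # reserved 4 packets per second
sched.add_flow("best_effort", 1)  # reserved 1 packet per second
for name in ("b0", "b1"):
    sched.enqueue("best_effort", name, now=0.0)
for name in ("v0", "v1", "v2"):
    sched.enqueue("video", name, now=0.0)
order = [sched.dequeue()[1] for _ in range(5)]
```

Although the best-effort packets arrive first, the video packets receive earlier stamps because of their higher reserved rate, which is how a rate-based scheduler can shield real-time streams from best-effort load.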
