Michihiro Koibuchi | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Michihiro Koibuchi is active.

Explore More

Publication

Featured researches published by Michihiro Koibuchi.

networks on chips | 2008

A Lightweight Fault-Tolerant Mechanism for Network-on-Chip

Michihiro Koibuchi; Hiroki Matsutani; Hideharu Amano; T. Mark Pinkston

Survival capability is becoming a crucial factor in designing multicore processors built with on-chip packet networks, or networks on chip (NoCs). In this paper, we propose a lightweight fault-tolerant mechanism for NoCs based on default backup paths (DBPs) designed to maintain, in the presence of failures, network connectivity of both non-faulty routers as well as healthy processor cores which may be connected to faulty routers. The mechanism provides default paths as backup between certain router ports which serve as alternative datapaths to circumvent failed components within a faulty router. Along with a minimal subset of normal network channels, the set of default backup paths internal to faulty routers form - in the worst case - a unidirectional ring topology that provides network-wide connectivity to all processor cores. Routing using the DBP mechanism is proved to be deadlock-free with only two virtual channels even for fault scenarios in which regular networks degrade to irregular (arbitrary) topologies. Evaluation results show that, for a 2-D mesh wormhole NoC, only 12.6% additional hardware resources are needed to implement the proposed DBP mechanism in order to provide graceful performance degradation without chip-wide failure as the number of faults increases to the maximum needed to form ring.

IEEE Transactions on Parallel and Distributed Systems | 2012

A Survey and Evaluation of Topology-Agnostic Deterministic Routing Algorithms

Jose Flich; Tor Skeie; Andres Mejia; Olav Lysne; Pedro López; Antonio Robles; José Duato; Michihiro Koibuchi; Tomas Rokicki; José Carlos Sancho

Most standard cluster interconnect technologies are flexible with respect to network topology. This has spawned a substantial amount of research on topology-agnostic routing algorithms, which make no assumption about the network structure, thus providing the flexibility needed to route on irregular networks. Actually, such an irregularity should be often interpreted as minor modifications of some regular interconnection pattern, such as those induced by faults. In fact, topology-agnostic routing algorithms are also becoming increasingly useful for networks on chip (NoCs), where faults may make the preferred 2D mesh topology irregular. Existing topology-agnostic routing algorithms were developed for varying purposes, giving them different and not always comparable properties. Details are scattered among many papers, each with distinct conditions, making comparison difficult. This paper presents a comprehensive overview of the known topology-agnostic routing algorithms. We classify these algorithms by their most important properties, and evaluate them consistently. This provides significant insight into the algorithms and their appropriateness for different on- and off-chip environments.

high-performance computer architecture | 2009

Prediction router: Yet another low latency on-chip router architecture

Hiroki Matsutani; Michihiro Koibuchi; Hideharu Amano; Tsutomu Yoshinaga

Network-on-Chips (NoCs) are quite latency sensitive, since their communication latency strongly affects the application performance on recent many-core architectures. To reduce the communication latency, we propose a low-latency router architecture that predicts an output channel being used by the next packet transfer and speculatively completes the switch arbitration. In the prediction routers, incoming packets are transferred without waiting the routing computation and switch arbitration if the prediction hits. Thus, the primary concern for reducing the communication latency is the hit rates of prediction algorithms, which vary from the network environments, such as the network topology, routing algorithm, and traffic pattern. Although typical low-latency routers that speculatively skip one or more pipeline stages use a bypass datapath for specific packet transfers (e.g., packets moving on the same dimension), our prediction router predictively forwards packets based on a prediction algorithm selected from several candidates in response to the network environments. In this paper, we analyze the prediction hit rates of six prediction algorithms on meshes, tori, and fat trees. Then we provide three case studies, each of which assumes different many-core architecture. We have implemented a prediction router for each case study by using a 65nm CMOS process, and evaluated them in terms of the prediction hit rate, zero load latency, hardware amount, and energy consumption. The results show that although the area and energy are increased by 6.4–15.9% and 8.0–9.5% respectively, up to 89.8% of the prediction hit rate is achieved in real applications, which provide favorable trade-offs between the modest hardware/energy overheads and the latency saving.

international conference on parallel processing | 2001

L-turn routing: an adaptive routing in irregular networks

Michihiro Koibuchi; Akira Funahashi; Akiya Jouraku; Hideharu Amano

Network-based parallel processing using commodity personal computers has been widely developed. Since such systems require high degree of flexibility and scalability of wiring, a high-speed network with an irregular topology is often needed. In traditional routing algorithms for irregular networks, available paths are considerably restricted in order to avoid deadlocks. In this paper we propose a novel routing algorithm called left-up-first turn routing (L-turn routing), which makes a better traffic balancing in irregular networks by building a specific spanning tree. Result of simulations shows that L-turn routing achieves better performance than traditional ones with each topology.

asia and south pacific design automation conference | 2008

Run-time power gating of on-chip routers using look-ahead routing

Hiroki Matsutani; Michihiro Koibuchi; Daihan Wang; Hideharu Amano

Since on-chip routers in network-on-chips play a key role in on-chip communication between cores, they should be always preparing for packet injections even if a part of cores are in standby mode, resulting in a larger standby power of routers compared with cores. The run-time power gating of individual channels in a router is one of attractive solutions to reduce the standby power of chip without affecting the on-chip communication. However, a state transition between sleep and active mode incurs the performance penalty, and turning a power switch on or off dissipates the overhead energy, which means a short-term sleep adversely increases the power consumption. In this paper, we propose a sleep control method based on look-ahead routing that detects the arrival of packets two hops ahead, so as to hide the wake-up delay and reduce the short-term sleeps of channels. Simulation results using real application traces show that the proposed method conceals the wake-up delay of less than five cycles, and more leakage power can be saved compared with the original naive method.

international conference on parallel processing | 2007

Tightly-Coupled Multi-Layer Topologies for 3-D NoCs

Hiroki Matsutani; Michihiro Koibuchi; Hideharu Amano

Three-dimensional network-on-chip (3-D NoC) is an emerging research topic exploring the network architecture of 3-D ICs that stack several smaller wafers for reducing wire length and wire delay. Although the network topology of 3-D NoC has been explored for a couple of years, there is still only a narrow range of choices. In this paper, we propose a class of 3-D topologies called Xbar-connected network-on-tiers (XNoTs), which consist of multiple network layers tightly connected via crossbar switches. To make the best use of the short delay and high density of inter-wafer links, XNoTs topologies have crossbar switches that connect different layers and their cores. The planar topology on every layer can be independently customized so as to meet the cost-performance requirements, as far as network connectivity is at least guaranteed with the bottom layer. We also propose their routing algorithm, which guarantees deadlock-freedom by restricting the inter-layer packet transfer from a lower-numbered layer to a higher-numbered layer. Path sets at the bottom layer close to the heat sink of the chip can be selectively employed in order to mitigate the heat-dissipation problem of 3-D ICs. Several forms of XNoTs topologies including meshes, tori, and/or trees are created, and they are evaluated in terms of performance, cost, and energy consumption. As a result, we show that even with the flexibilities mentioned above, XNoTs achieve at least as high throughput as existing 3-D topologies for equivalent chip sizes.

international symposium on computer architecture | 2012

A case for random shortcut topologies for HPC interconnects

Michihiro Koibuchi; Hiroki Matsutani; Hideharu Amano; D. Frank Hsu; Henri Casanova

As the scales of parallel applications and platforms increase the negative impact of communication latencies on performance becomes large. Fortunately, modern High Performance Computing (HPC) systems can exploit low-latency topologies of high-radix switches. In this context, we propose the use of random shortcut topologies, which are generated by augmenting classical topologies with random links. Using graph analysis we find that these topologies, when compared to non-random topologies of the same degree, lead to drastically reduced diameter and average shortest path length. The best results are obtained when adding random links to a ring topology, meaning that good random shortcut topologies can easily be generated for arbitrary numbers of switches. Using flit-level discrete event simulation we find that random shortcut topologies achieve throughput comparable to and latency lower than that of existing non-random topologies such as hypercubes and tori. Finally, we discuss and quantify practical challenges for random shortcut topologies, including routing scalability and larger physical cable lengths.

networks on chips | 2010

Ultra Fine-Grained Run-Time Power Gating of On-chip Routers for CMPs

Hiroki Matsutani; Michihiro Koibuchi; Daisuke Ikebuchi; Kimiyoshi Usami; Hiroshi Nakamura; Hideharu Amano

This paper proposes an ultra fine-grained run-time power gating of on-chip router, in which power supply to each router component (e.g., VC queue, crossbar MUX, and output latch) can be individually controlled in response to the applied workload.As only the router components which are just transferring a packet are activated, the leakage power of the on-chip network can be reduced to the near-optimal level.However, a certain amount of wakeup latency is required to activate the sleeping components, and the application performance will be degraded.In this paper, we estimate the wakeup latency for each component based on circuit simulations using a 65nm process.Then we propose four early wakeup methods to overcome the wakeup latency.The proposed router with the early wakeup methods is evaluated in terms of the application performance, area, and leakage power.As a result, it reduces the leakage power by 78.9%, at the expense of the 4.3% area and 4.0% performance when we assume a 1GHz operation.

networks on chips | 2008

Adding Slow-Silent Virtual Channels for Low-Power On-Chip Networks

Hiroki Matsutani; Michihiro Koibuchi; Daihan Wang; Hideharu Amano

In this paper, we introduce the use of slow-silent virtual channels to reduce the switching power of on-chip networks while keeping the leakage power small. Adding virtual channels to a network improves the throughput until each link bandwidth is saturated. This enables us to reduce the switching power of on-chip networks by decreasing their operating frequency and supply voltage. However, adding virtual channels increases the leakage power of routers as well as the area due to their large buffers; so the runtime power gating is applied to individual virtual channels to eliminate this problem. We evaluate the performance of slow-silent virtual channels by using real application traces, and their power consumption (switching and leakage) is evaluated based on the detailed design of a virtual-channel router placed and routed with a 90 nm technology. These evaluation results show that a network with three or four virtual channels achieves the best energy efficiency in a uniform traffic. In the cases of neighboring communications, a network with two virtual channels is better than the other networks with more virtual channels, because the performance improvement from no virtual channel to two virtual channels is the largest and their frequency and supply voltage can also be reduced well in these cases.

asia and south pacific design automation conference | 2013

A case for wireless 3D NoCs for CMPs

Hiroki Matsutani; Paul Bogdan; Radu Marculescu; Yasuhiro Take; Daisuke Sasaki; Hao Zhang; Michihiro Koibuchi; Tadahiro Kuroda; Hideharu Amano

Inductive-coupling is yet another 3D integration technique that can be used to stack more than three known-good-dies in a SiP without wire connections. We present a topology-agnostic 3D CMP architecture using inductive-coupling that offers great flexibility in customizing the number of processor chips, SRAM chips, and DRAM chips in a SiP after chips have been fabricated. In this paper, first, we propose a routing protocol that exchanges the network information between all chips in a given SiP to establish efficient deadlock-free routing paths. Second, we propose its optimization technique that analyzes the application traffic patterns and selects different spanning tree roots so as to minimize the average hop counts and improve the application performance.

Explore More