Morihiro Kuga
Kumamoto University
Publication
Featured research published by Morihiro Kuga.
field-programmable custom computing machines | 2010
Yoshihiro Ichinomiya; Shiro Tanoue; Motoki Amagasaki; Masahiro Iida; Morihiro Kuga; Toshinori Sueyoshi
SRAM-based field programmable gate arrays (FPGAs) are vulnerable to single event upsets (SEUs), which are induced by radiation. This paper presents a technique for ensuring reliable softcore processor implementation on SRAM-based FPGAs. Although an FPGA is susceptible to SEUs, these faults can be corrected thanks to its reconfigurability. We propose techniques for SEU mitigation and recovery of a softcore processor using triple modular redundancy (TMR) and partial reconfiguration (PR) with state synchronization. Our experiment confirms that a faulty softcore processor can be recovered and synchronized with the other softcore processors. The proposed technique requires 4.315 times the resource usage and runs at 62.491% of the operating frequency of the base processor. However, the proposed recovery process takes only 6 μs under TMR and PR. Reliability estimation shows that the proposed system achieves an MTBF about 2.713 times longer than that of the previous system.
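The combination of TMR voting and state resynchronization described in this abstract can be illustrated with a minimal software sketch. This is not the authors' hardware design: the function names `majority_vote` and `resynchronize` are hypothetical, and copying a dictionary only stands in for partial reconfiguration followed by state synchronization.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (voted_value, faulty_indices) for redundant module outputs.

    Hypothetical helper: a software stand-in for a TMR voter.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        # With three modules, a majority needs at least two matching outputs.
        raise ValueError("no majority: more than one module disagrees")
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty


def resynchronize(states, faulty):
    """Copy the state of a healthy module into each faulty module.

    Assumption: this models the recovery step only abstractly; the real
    system reconfigures the faulty region before synchronizing state.
    """
    healthy = next(i for i in range(len(states)) if i not in faulty)
    for i in faulty:
        states[i] = dict(states[healthy])
    return states
```

For example, `majority_vote([5, 5, 7])` masks the disagreeing third module and reports its index, which can then be passed to `resynchronize`.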
field-programmable logic and applications | 2009
Shiro Tanoue; Tomoyuki Ishida; Yoshihiro Ichinomiya; Motoki Amagasaki; Morihiro Kuga; Toshinori Sueyoshi
The present paper describes a technique for ensuring reliable softcore processor implementation on SRAM-based Field Programmable Gate Arrays (FPGAs), which can handle the effects of Single Event Upsets (SEUs). We propose the Triple Modular Redundancy (TMR) scheme coupled with dynamic partial reconfiguration to remove SEUs from the configuration memory of the FPGA. Although the FPGA is subject to SEUs, these errors can be corrected as a result of its reconfigurability. Furthermore, we consider the synchronization after a partial reconfiguration using an interrupt process of an RTOS. Experimental results reveal that one faulty softcore processor is recovered and synchronized with the other softcore processors. The present study demonstrates that a softcore processor can recover from an SEU using the proposed dynamic partial reconfiguration and the synchronization process.
ACM Sigarch Computer Architecture News | 1991
Morihiro Kuga; Kazuaki Murakami; Shinji Tomita
A new superscalar processor architecture, called DSNS (Dynamically-hazard-resolved, Statically-code-scheduled, Nonuniform Superscalar), is proposed and discussed. DSNS has the following major architectural features.
1. Dynamically-hazard-resolved superscalar: DSNS is object-code compatible with respect to the degree of superscalar. Pipeline interlock hardware should be provided for detecting and resolving hazards at run time.
2. Statically-code-scheduled superscalar: The performance of DSNS could not be scalable with respect to the degree of superscalar. Compilers must be responsible for scheduling instructions to reduce the pipeline stalls for a particular degree of superscalar.
3. Nonuniform superscalar: Although nonuniform superscalar potentially suffers instruction-class conflicts, it can be more cost-effective than uniform superscalar. Again, compilers must take care that the class conflicts do not increase structural hazards.
4. Static memory disambiguation: The DSNS architecture provides three types of LOAD/STORE instructions: strongly ordered, weakly ordered, and unordered. Memory disambiguation at compile time is responsible for marking each LOAD/STORE instruction. At run time, processors need not detect or resolve data hazards for every type; they just perform memory accesses in order for strongly or weakly ordered instructions, and arbitrarily for unordered ones.
5. Static branch prediction with branch-target buffer: Branch instructions predicted as taken by compilers are stored in the branch-target buffer. Hardware never guesses the outcomes of branch instructions.
6. Early branch resolution with advanced conditioning: Advanced conditioning allows branch decisions to precede the corresponding branches by a greater distance. It reduces the branch delay and results in resolving branches early in the pipeline.
7. Conditional mode execution with dual register files: Dual register files facilitate maintaining the precise machine state that otherwise might be violated by speculative execution such as conditional mode.
8. Weakly precise interrupts: The DSNS architecture defines interrupts as being somewhat imprecise but restartable with the help of interrupt handlers. The definition alleviates the hardware constraints of ensuring strongly precise interrupts.
This paper also presents an implementation of the DSNS architecture. The DSNS processor prototype under development is a four-stage pipelined processor of superscalar-degree four. The instruction pipelines, especially the branch pipeline, are discussed in detail.
field-programmable technology | 2004
H. Shibamura; M. Fukuyama; D. Uchida; S. Ikeda; Morihiro Kuga; Toshinori Sueyoshi
This work presents a dynamically reconfigurable platform called EXPRESS-1, which uses the commercially available embedded processor FPGA. The system makes the most of a fully reconfigurable logic part to explore the area of fine-grained reconfigurable computing. A dynamic reconfiguration mechanism is implemented utilizing a real-time operating system, so device reconfiguration in response to application demand works without suspending other services. EXPRESS-1 features a transparent execution mechanism. Whether a function is executed by the hardware or software, the mechanism frees users from awareness of its execution manner. Furthermore, there is no need to explicitly specify reconfiguration commands into a program because the system determines if reconfiguration is needed based on current conditions. The development of EXPRESS-1 and the runtime reconfiguration mechanism of the fully reconfigurable logic are described. System capabilities are also reported through fundamental evaluations with some practical applications such as JavaVM, encryption processing, and image processing.
field programmable logic and applications | 2014
Qian Zhao; Kyosei Yanagida; Motoki Amagasaki; Masahiro Iida; Morihiro Kuga; Toshinori Sueyoshi
Most modern field-programmable gate arrays (FPGAs) employ a look-up table (LUT) as their basic logic cell. Although a k-input LUT can implement any k-input logic, its functionality relies on a large amount of configuration memory. As FPGA scales improve, the increased quantity of configuration memory cells required for FPGAs will require a larger area and consume more power. Moreover, the soft-error rate per device will also increase as more configuration memory cells are embedded. We propose scalable logic modules (SLMs), logic cells requiring less configuration memory, reducing configuration memory by making use of partial functions of Shannon expansion for frequently appearing logics. Experimental results show that SLM-based FPGAs use much less configuration memory and have smaller area than conventional LUT-based FPGAs.
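As background for the Shannon expansion mentioned in this abstract, the following minimal sketch shows how a k-input truth table (the 2^k configuration bits of a LUT) splits into two cofactors on one input. It illustrates the generic expansion only, not the SLM cell structure, which the abstract does not detail; the function names and the bit-indexing convention (bit i of the table index is the value of input i) are assumptions made for the example.

```python
def cofactors(truth_table, var):
    """Split a truth table of 2**k bits into the cofactors for input `var`.

    Assumed convention: bit i of the table index holds the value of input i.
    Returns (f0, f1), the tables for var = 0 and var = 1.
    """
    f0 = [bit for idx, bit in enumerate(truth_table) if not (idx >> var) & 1]
    f1 = [bit for idx, bit in enumerate(truth_table) if (idx >> var) & 1]
    return f0, f1


def evaluate(truth_table, inputs):
    """Evaluate a function by recursive Shannon expansion on the highest input."""
    if len(truth_table) == 1:
        return truth_table[0]
    var = len(inputs) - 1
    f0, f1 = cofactors(truth_table, var)
    # f = (not x_var) * f0 + x_var * f1: pick the cofactor selected by x_var.
    return evaluate(f1 if inputs[var] else f0, inputs[:var])
```

For a 2-input AND with table `[0, 0, 0, 1]`, expanding on input 1 gives the cofactor `[0, 0]` (a constant) and `[0, 1]` (input 0 itself). Cofactors this simple are the kind of partial function that would not need a full block of configuration memory.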
ieee region 10 conference | 1994
Hidetomo Shibamura; Morihiro Kuga; Toshinori Sueyoshi
Presents an interconnection network simulator, called INSIGHT, to evaluate the performance of various interconnection networks toward the realization of massively parallel computers. The capability of INSIGHT to modify the simulation parameters supports the development of large-scale interconnection networks. In this study, we consider the requirements of a simulator to evaluate the performance of large-scale interconnection networks, and describe the framework of INSIGHT. Moreover, we examined how communication performance and execution time vary with processor performance. As a result, we found that with e-cube routing, the wormhole flow control method, which achieves low communication latency, in some cases performs worse than the store-and-forward method.
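For readers unfamiliar with e-cube routing, here is a minimal sketch of the standard dimension-ordered scheme on a binary hypercube. It is not taken from INSIGHT; the function name `ecube_route` and the choice to correct address bits from the lowest dimension upward are assumptions made for illustration.

```python
def ecube_route(src, dst, dims):
    """Return the list of nodes visited from src to dst in a hypercube.

    Nodes are integers whose bits are hypercube coordinates. The route
    fixes each differing address bit in order, lowest dimension first.
    """
    path = [src]
    node = src
    for d in range(dims):
        if (node ^ dst) >> d & 1:
            node ^= 1 << d  # traverse the link along dimension d
            path.append(node)
    return path
```

Because every message resolves dimensions in the same fixed order, the route between two nodes is deterministic, and its length equals the number of address bits in which the source and destination differ.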
field-programmable logic and applications | 2013
Qian Zhao; Motoki Amagasaki; Masahiro Iida; Morihiro Kuga; Toshinori Sueyoshi
Conventional FPGA design and implementation processes involve two separate flows. The FPGA architecture is determined by an academic FPGA design flow, whereas the implementation phase uses a commercial VLSI design flow. In this research, we propose an FPGA design framework to improve the efficiency of synthesizable FPGA IP design. A novel FPGA routing tool, EasyRouter, is developed in this framework to bridge the two flows efficiently. With this design flow, accurate physical information can be reported when a new FPGA IP architecture is evaluated with reliable commercial VLSI CAD tools.
Systems and Computers in Japan | 2002
Toshinori Sueyoshi; Morihiro Kuga; Hidetomo Shibamura
Studies in the field of computer science cover aspects of science that attach importance to analysis and aspects of engineering that attach importance to synthesis. Therefore, the ability to manage synthesis, represented by system development and/or its design, is required in addition to analysis in the field. On the other hand, essential course subjects in computer science education such as logic circuits, computer architecture, operating system, and compilers have to be closely related with one another; nevertheless, they tend to be isolated from one another because of their great sophistication and complication. Thus, consistent education in computer science becomes difficult. Moreover, experiments, not virtual experiences, which involve the pleasure and emotion of creation are necessary in hardware education. Accordingly, since the beginning of the 1990s, we have developed an educational microprocessor called KITE using reconfigurable LSI, together with teaching materials that allow students to learn actively, considering the essence of synthesis as related to comprehensive understanding of lectures and practical use. These teaching materials have been made available to the public, and have been introduced into more than 30 educational facilities such as universities and companies. They are well designed for various stages of computer education, varying from introductory education to design education, and system software education, providing an effective education, with experiments in addition to lectures, and a consistent grounding in computer science.
ifip ieee international conference on very large scale integration | 2015
Motoki Amagasaki; Yuto Takeuchi; Qian Zhao; Masahiro Iida; Morihiro Kuga; Toshinori Sueyoshi
A three-dimensional (3D) integration based on wafer-to-wafer bonding using through-silicon vias (TSVs) has been developed for the fabrication of new 3D large-scale integrated chips. To balance between cost and performance, and to explore 3D field-programmable gate array (FPGA) with realistic 3D integration processes, we propose spatially distributed and functionally distributed types of 3D FPGA architectures. The functionally distributed architecture consists of two wafers, a logic layer and a routing layer, and is stacked by a face-down process technology. Since vertical wires pass through microbumps, no TSVs are needed. In contrast, the spatially distributed architecture is divided into multiple layers with the same structure, unlike in the functionally distributed type. This architecture can be expanded to more than two layers by stacking multiples of the same die. The goal of this paper is to elucidate the advantages and disadvantages of these two types of 3D FPGAs. According to our evaluation, when only two layers are used, the functionally distributed architecture is more effective. When higher performance is achieved by using more than two layers, the spatially distributed architecture achieves better performance.
reconfigurable computing and fpgas | 2012
Yuki Nishitani; Kazuki Inoue; Motoki Amagasaki; Masahiro Iida; Morihiro Kuga; Toshinori Sueyoshi
FPGA fault detection consumes a great deal of test time compared with ASICs because FPGAs have complex structures. Re-placement and re-routing must be performed to avoid fault points, which increases recovery time and degrades performance. Therefore, we propose a fault detection method and develop placement and routing tools that avoid fault sources at the tile level and the multiplexer level, respectively. In the evaluation, the detection method diagnosed faulty MUXes with six test configurations. We found that, with multiplexer-level avoidance, the performance of a faulty FPGA decreased by only 2% compared with a normal FPGA.