Yasuhiro Inagami
Hitachi
Publication
Featured research published by Yasuhiro Inagami.
international conference on supercomputing | 1993
Hiroshi Nakamura; Taisuke Boku; Hideo Wada; Hiromitsu Imori; Ikuo Nakata; Yasuhiro Inagami; Kisaburo Nakazawa; Yoshiyuki Yamashita
In this paper, we present a new scalar architecture for high-speed vector processing. Without using cache memory, the proposed architecture tolerates main memory access latency by introducing slide-windowed floating-point registers with a data preloading feature and pipelined memory. The architecture maintains upward compatibility with existing scalar architectures. In the new architecture, software can control the window structure, which is the advantage over our previous register-window design. Because of this advantage, registers are utilized more flexibly and computational efficiency is greatly enhanced. Furthermore, this flexibility helps the compiler generate efficient object code easily. We have evaluated its performance on the Livermore Fortran Kernels. The evaluation results show that the proposed architecture reduces the penalty of main memory access better than an ordinary scalar processor and a processor with cache prefetching. The proposed architecture with 64 registers tolerates a memory access latency of 30 CPU cycles. Compared with our previous work, the proposed architecture hides longer memory access latency with fewer registers.
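As an illustration of the slide-window idea described above, the following Python sketch models a register file in which a window base pointer maps architectural register numbers onto a larger physical file. The class name, sizes, and methods are hypothetical, chosen only to show how a software-controlled sliding window lets preloaded data overlap with computation; it is not the paper's design.

class SlideWindowRegisterFile:
    def __init__(self, num_physical=64, window_size=16):
        self.regs = [0.0] * num_physical   # physical floating-point registers
        self.window_size = window_size     # architectural registers visible at once
        self.base = 0                      # window base pointer, slid under software control

    def _phys(self, arch_reg, ahead=0):
        # Map an architectural register of the current (or a future) window
        # onto a physical register, wrapping around the physical file.
        return (self.base + ahead * self.window_size + arch_reg) % len(self.regs)

    def read(self, arch_reg):
        return self.regs[self._phys(arch_reg)]

    def preload(self, arch_reg, value):
        # Write into the *next* window; the corresponding load can be issued
        # early so its memory latency overlaps with computation in the current window.
        self.regs[self._phys(arch_reg, ahead=1)] = value

    def slide(self):
        # Advance the window: data preloaded for the next iteration becomes
        # directly addressable without any copy.
        self.base = (self.base + self.window_size) % len(self.regs)

In this toy model, a loop body would preload operands for iteration i+1 while computing iteration i, then call slide() at the iteration boundary.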
international conference on supercomputing | 1993
Tadayuki Sakakibara; Katsuyoshi Kitai; Tadaaki Isobe; Shigeko Yazawa; Teruo Tanaka; Yasuhiro Inagami; Yoshiko Tamaki
We present a scalable parallel memory architecture with a skew scheme that provides the same conflict-free vector access strides regardless of the number of memory modules. With previous skew schemes, the conflict-free strides depended on the number of memory modules; the skew scheme should therefore be made independent of it. We analyze two causes of conflicts: permanent concentration in a limited set of memory modules, and transient concentration in a single memory module. The conflict-free strides are proved to be independent of the number of memory modules by resolving the two kinds of concentration separately. The strategy is to increase the interval of the shifting address assignment of the memory modules in order to reduce the permanent concentration, and to provide buffers for each memory module in accordance with this interval in order to absorb the transient concentration. The skew scheme uses the same period for memory systems with different numbers of memory modules. Consequently, scalability for conflict-free strides can be realized, independent of the number of memory modules.
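To make the effect of skewing concrete, here is a small Python sketch that compares how evenly a strided access stream spreads over the memory modules with plain interleaving versus a simple skewed assignment. The skew function and its parameters are illustrative assumptions, not the scheme proposed in the paper.

def module_plain(addr, num_modules):
    # Conventional interleaving: address modulo number of modules.
    return addr % num_modules

def module_skewed(addr, num_modules, interval=8):
    # Toy skew: shift the module assignment by one every `interval` rows,
    # so power-of-two strides are spread across modules.
    return (addr + addr // (num_modules * interval)) % num_modules

def max_concentration(module_fn, num_modules, stride, accesses=256):
    # For a strided access stream, report the worst-case load on a single
    # module relative to the average (1.0 means perfectly spread).
    counts = [0] * num_modules
    for i in range(accesses):
        counts[module_fn(i * stride, num_modules)] += 1
    return max(counts) / (accesses / num_modules)

for stride in (1, 8, 16, 32):
    print(stride,
          max_concentration(module_plain, 16, stride),
          max_concentration(module_skewed, 16, stride))

For power-of-two strides such as 8, 16, and 32, plain interleaving concentrates all requests on one or two modules, while the skewed assignment spreads them evenly.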
international conference on supercomputing | 1989
Yasuhiro Inagami; John F. Foley
The present Manchester Dataflow Machine is constructed using standard TTL technology and was not designed with raw processing power in mind. This paper discusses the issues raised in redesigning the machine using supercomputer technology. The resulting machine structure is presented in some detail, together with initial simulation results, which indicate that a performance of hundreds of megaflops appears readily achievable.
international parallel processing symposium | 1997
Yoshiko Yasuda; Hiroaki Fujii; Hideya Akashi; Yasuhiro Inagami; Teruo Tanaka; Junji Nakagoshi; Hideo Wada; Tsutomu Sumimoto
We have developed a hardware detour path selection facility for the Hitachi SR2201 parallel computer, which uses a multi-dimensional crossbar as its inter-processor network, to ensure operating efficiency and high reliability when part of the network is faulty. When this hardware facility is used, packets are transmitted to their destination along alternative paths that avoid the fault. However, changing the routing may cause deadlock. This paper describes a deadlock-free fault-tolerant routing scheme that can be used by the detour path selection facility to avoid deadlock, and its implementation for the SR2201.
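The following Python sketch gives a highly simplified picture of detour routing on a 2-D crossbar: the topology size, fault model, and routing order are assumptions made purely for illustration, and it does not reproduce the SR2201's actual path selection or the deadlock-avoidance rules described in the paper.

def route(src, dst, faulty_x_rows, size=4):
    # Normal routing aligns X with one crossbar hop, then Y. If the X crossbar
    # of the source row is faulty, detour in Y to a healthy row first.
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    if y in faulty_x_rows and x != dx:
        y = next(r for r in range(size) if r not in faulty_x_rows)
        path.append((x, y))           # detour hop to a row with a working X crossbar
    if x != dx:
        x = dx                        # one hop on the X crossbar of row y
        path.append((x, y))
    if y != dy:
        y = dy                        # one hop on the Y crossbar of column x
        path.append((x, y))
    return path

print(route((0, 1), (3, 2), faulty_x_rows={1}))   # detours around the faulty row 1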
international conference on supercomputing | 1993
Katsuyoshi Kitai; Tadaaki Isobe; Yoshikazu Tanaka; Yoshiko Tamaki; Masakazu Fukagawa; Teruo Tanaka; Yasuhiro Inagami
This paper discusses the architecture of the new Hitachi supercomputer series, which is capable of achieving 8 GFLOPS in each of up to four processors. This architecture provides high-performance processing for fine-grain parallelism, and it allows efficient parallel processing even in an undedicated environment. It also features the newly developed time-limited spin-loop synchronization, which combines spin-loop synchronization with operating system primitives, and a communication buffer (CB), which caches shared variables used for synchronization so that they can be accessed faster. Three new instructions take advantage of the CB in order to reduce the parallel overhead. The results of performance measurements confirm the effectiveness of the CB and the new instructions.
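The time-limited spin-loop idea, combining busy-waiting with an operating-system primitive, can be sketched in software as follows. This is a minimal illustrative model using Python threads; the class name, the spin limit, and the use of threading.Event are assumptions, not the hardware/OS interface described in the paper.

import threading, time

class TimeLimitedSpinFlag:
    def __init__(self, spin_limit=1e-4):
        self.flag = False
        self.spin_limit = spin_limit       # maximum busy-wait time before yielding to the OS
        self.event = threading.Event()     # OS-level blocking primitive used as the fallback

    def set(self):
        self.flag = True                   # fast path observed by spinning waiters
        self.event.set()                   # wake any waiter that has already blocked

    def wait(self):
        deadline = time.perf_counter() + self.spin_limit
        while not self.flag:               # spin loop: cheap when the wait is short
            if time.perf_counter() >= deadline:
                self.event.wait()          # time limit exceeded: block in the OS instead
                return

# Example: the waiter spins briefly, then blocks until set() fires 10 ms later.
flag = TimeLimitedSpinFlag()
threading.Timer(0.01, flag.set).start()
flag.wait()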
ieee region 10 conference | 1994
Hiroshi Nakamura; T. Wakabayashi; Kisaburo Nakazawa; Taisuke Boku; Hideo Wada; Yasuhiro Inagami
We present two scalar processors, called PVP-SWPC and PVP-SWSW, for high-speed list vector processing. Tolerating memory access latency is essential for this purpose. PVP-SWPC tolerates the latency by introducing slide-windowed floating-point registers and a prefetch-to-cache instruction. PVP-SWSW tolerates the latency by introducing slide-windowed general-purpose and floating-point registers. Owing to the slide-window structure, both processors can utilize more registers while keeping upward compatibility with existing scalar architectures. The evaluation shows that these processors successfully hide memory latency and realize fast list vector processing.
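The benefit of prefetching for list vector (indirect, or gather) access can be illustrated with the following Python sketch. The cache model, prefetch distance, and index vector are hypothetical and serve only to show how issuing prefetches a fixed distance ahead in the index list turns most would-be misses into hits; PVP-SWPC does this with a dedicated hardware instruction rather than software.

def gather_miss_count(data, index, distance):
    # Count the accesses that would stall on main memory while gathering
    # data[index[i]], with prefetches issued `distance` elements ahead.
    cache, misses, result = set(), 0, []
    for i, idx in enumerate(index):
        if distance and i + distance < len(index):
            cache.add(index[i + distance])   # prefetch-to-cache ahead of use
        if idx not in cache:
            misses += 1                       # this access would wait for main memory
            cache.add(idx)
        result.append(data[idx])
    return misses

data = list(range(100))
index = [3, 97, 12, 55, 8, 41, 77, 20, 63, 5]
print(gather_miss_count(data, index, distance=0))   # no prefetching: every access misses
print(gather_miss_count(data, index, distance=2))   # prefetching ahead: only the first two miss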
international symposium on parallel architectures algorithms and networks | 1994
Tadayuki Sakakibara; Katsuyoshi Kitai; Tadaaki Isobe; Shigeko Yazawa; Teruo Tanaka; Yoshiko Tamaki; Yasuhiro Inagami
Reports an instruction-based variable priority scheme which achieves high sustained memory throughput on a tightly coupled multiprocessor (TCMP) vector supercomputer. We analyze two types of priority control for arbitrating inter-processor memory access conflicts. In the case of request-level priority control, performance degradation is caused by mutual obstruction, while in the case of fixed priority control, it is caused by memory bank occupation. Mutual obstruction is caused by requests of different instructions interfering with each other, and memory bank occupation is caused by continuous accessing of the same memory bank by higher-priority instructions. The instruction-based variable priority scheme works as follows: (1) the priority of each pipeline is normally changed at the end of an instruction; (2) the priority is changed more than once in the middle of an instruction, such as a stride multiple-of-8 or indirect access instruction, which may occupy the same memory bank by itself. This strategy reduces mutual obstruction because the priority of each pipeline is stable in the middle of an instruction. It also reduces memory bank occupation because the opportunity for memory access is equalized among different instructions by changing the priority at the end of an instruction. Moreover, it prevents memory bank occupation by stride multiple-of-8 or indirect access instructions by changing the priority more frequently. Consequently, high sustained memory throughput can be achieved on TCMP vector supercomputers. We implemented this scheme in Hitachi's S-3800 supercomputer.
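A simplified software model of the scheme might look like the following Python sketch: pipeline priorities rotate at instruction boundaries and, additionally, part-way through instructions that tend to occupy a single memory bank. The class name, rotation interval, and grant interface are illustrative assumptions; the real mechanism is implemented in the S-3800's memory access hardware.

class InstructionBasedArbiter:
    def __init__(self, num_pipelines, rotate_interval=16):
        self.order = list(range(num_pipelines))     # order[0] holds the highest priority
        self.rotate_interval = rotate_interval      # mid-instruction rotation period
        self.granted = [0] * num_pipelines          # grants since the pipeline's last rotation

    def grant(self, requests):
        # Grant the requesting pipeline with the highest current priority.
        for p in self.order:
            if p in requests:
                self.granted[p] += 1
                return p
        return None

    def _demote(self, p):
        self.order.remove(p)
        self.order.append(p)                        # p drops to the lowest priority
        self.granted[p] = 0

    def end_of_instruction(self, p):
        # Normal case: the priority changes once, at the instruction boundary,
        # so it stays stable (and mutual obstruction stays low) mid-instruction.
        self._demote(p)

    def mid_instruction_check(self, p, bank_hogging):
        # Extra case: stride multiple-of-8 or indirect access instructions may
        # occupy one memory bank by themselves, so rotate more frequently.
        if bank_hogging and self.granted[p] >= self.rotate_interval:
            self._demote(p)

arb = InstructionBasedArbiter(num_pipelines=4)
print(arb.grant({1, 3}))      # pipeline 1 wins while it holds the higher priority
arb.end_of_instruction(1)
print(arb.grant({1, 3}))      # after rotation, pipeline 3 wins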
Archive | 1987
Yasuhiro Inagami; Takayuki Nakagawa; Yoshiko Tamaki; Shigeo Nagashima
Archive | 1982
Shigeo Nagashima; Shunichi Torii; Koichiro Omoda; Yasuhiro Inagami
Archive | 2001
Tatsuo Higuchi; Toshiaki Tarui; Katsuyoshi Kitai; Shigeo Takeuchi; Tatsuru Toba; Machiko Asaie; Yasuhiro Inagami