Hideo Wada | Researchain

Publication

Featured research published by Hideo Wada.

international conference on supercomputing | 1993

A scalar architecture for pseudo vector processing based on slide-windowed registers

Hiroshi Nakamura; Taisuke Boku; Hideo Wada; Hiromitsu Imori; Ikuo Nakata; Yasuhiro Inagami; Kisaburo Nakazawa; Yoshiyuki Yamashita

In this paper, we present a new scalar architecture for high-speed vector processing. Without using cache memory, the proposed architecture tolerates main memory access latency by introducing slide-windowed floating-point registers with a data preloading feature and pipelined memory. The architecture maintains upward compatibility with existing scalar architectures. In the new architecture, software can control the window structure. This is its advantage over our previous work on register windows: registers are used more flexibly, and computational efficiency is greatly improved. This flexibility also makes it easier for the compiler to generate efficient object code. We have evaluated its performance on the Livermore Fortran Kernels. The evaluation results show that the proposed architecture reduces the penalty of main memory access better than an ordinary scalar processor and a processor with cache prefetching. The proposed architecture with 64 registers tolerates a memory access latency of 30 CPU cycles. Compared with our previous work, the proposed architecture hides longer memory access latency with fewer registers.
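
The latency-hiding idea in this abstract can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the architecture from the paper: the function name `total_cycles`, the one-cycle compute step, and the rule that one load can be issued per cycle into a fully pipelined memory are all hypothetical. It shows that issuing each load `preload_distance` iterations before its value is used removes the stalls once the distance covers the memory latency.

```python
def total_cycles(n_iters, latency, preload_distance):
    """Cycles to run n_iters iterations, each consuming one loaded value.

    Hypothetical model: memory is fully pipelined, one load is issued per
    cycle, compute takes one cycle per iteration, and the load for
    iteration i is issued preload_distance iterations before its use.
    """
    ready = {}  # iteration index -> cycle at which its data arrives
    cycle = 0

    # Prologue: issue the first preload_distance loads back to back.
    for i in range(min(preload_distance, n_iters)):
        ready[i] = cycle + latency
        cycle += 1

    for i in range(n_iters):
        ahead = i + preload_distance
        if ahead < n_iters:
            # Preload for a later iteration (or for this one if distance is 0).
            ready[ahead] = cycle + latency
        # Stall until the data for iteration i has arrived, then compute.
        cycle = max(cycle, ready[i]) + 1
    return cycle


if __name__ == "__main__":
    for distance in (0, 10, 30):
        print(distance, total_cycles(100, latency=30, preload_distance=distance))
```

In this model every value in flight needs a register to land in, so a larger preload distance needs more registers. That is the resource a windowed register file is meant to supply.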

international parallel processing symposium | 1997

Deadlock-free fault-tolerant routing in the multi-dimensional crossbar network and its implementation for the Hitachi SR2201

Yoshiko Yasuda; Hiroaki Fujii; Hideya Akashi; Yasuhiro Inagami; Teruo Tanaka; Junji Nakagoshi; Hideo Wada; Tsutomu Sumimoto

We have developed a hardware detour path selection facility for the Hitachi SR2201 parallel computer, which uses a multi-dimensional crossbar as an inter-processor network to ensure operating efficiency and high reliability when a part of the network is faulty. When this hardware facility is used, packets are transmitted to their destination along alternative paths to avoid the fault. However, changing the routing may cause deadlock. This paper describes a deadlock-free fault-tolerant routing scheme that can be used by the detour path selection facility to avoid deadlock, and its implementation for the SR2201.

international conference on supercomputing | 1988

High-speed processing schemes for summation type and iteration type vector instructions on Hitachi supercomputer S-820 system

Hideo Wada; K. Ishil; Masakazu Fukagawa; Hiroshi Murayama; Shun Kawabe

The HITACHI supercomputer S-820 system has been developed as Hitachi's top-end supercomputer. It is also rated as one of the most powerful supercomputers in the world. Among the vector instructions that supercomputers support, summation type and iteration type vector instructions are not suitable for parallel processing, since the elements they process are not independent. The S-820 employs high-speed processing schemes for both: the performance of summation type instructions is enhanced by a high-speed post-processing scheme, and the performance of iteration type instructions is enhanced by a high-speed parallelizing scheme for iteration arithmetic. Thanks to these schemes, the execution speeds for Kernel 3 and Kernel 4 of the Lawrence Livermore Laboratory's 24 Kernels become 838.7 MFLOPS and 258.5 MFLOPS, respectively, and those for Kernel 5, Kernel 6 and Kernel 11 of the Lawrence Livermore Laboratory's 14 Kernels become 114.6 MFLOPS, 111.8 MFLOPS and 98.4 MFLOPS, respectively.
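
The abstract does not describe the S-820's schemes in detail. As a generic illustration of why a summation needs a post-processing step when it is spread over parallel pipelines, the sketch below accumulates interleaved partial sums and then combines them. The function name `pipelined_sum` and the `lanes` parameter are hypothetical and stand in for independent pipelines; this is not the S-820 implementation.

```python
def pipelined_sum(values, lanes=4):
    """Sum values using `lanes` independent partial sums, then combine them.

    Each partial sum depends only on its own earlier value, so the `lanes`
    accumulation chains can proceed in parallel. The final combination of
    the partial sums is the post-processing step.
    """
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v  # element i goes to lane i mod lanes

    # Post-processing: pairwise (tree) reduction of the partial sums.
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0.0)  # pad to an even length
        partial = [partial[j] + partial[j + 1] for j in range(0, len(partial), 2)]
    return partial[0]


if __name__ == "__main__":
    print(pipelined_sum(range(1, 101)))
```

The main loop does the bulk of the additions in parallel chains, so the post-processing step is the serial remainder.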

international conference on supercomputing | 1989

High-speed storage control schemes of HITACHI supercomputer S-820 system

Hideo Wada; Tadaaka Isobe; Masao Furukawa; Shun Kawabe

The HITACHI supercomputer S-820 has been developed as Hitachi's top-end supercomputer. It is also rated as one of the most powerful supercomputers in the world. To match the high performance of its arithmetic units, the S-820 employs advanced storage control schemes. Of these schemes, this paper introduces the parallel structure of the storage control, section number assigning, bank group number modifying, and the vector indirect store instruction. With the parallel structure of the storage control and section number assigning, a peak memory throughput of 16 Gbytes/sec is achieved. With bank group number modifying, critical memory access conflicts are avoided and excellent memory throughput is achieved, especially for vectors with short strides. With the vector indirect store instruction, the performance of store-type list vector operations without duplicated list vector elements is improved 2.3-fold over that obtained with conventional store-type list vector instructions.
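
The kind of conflict that motivates remapping bank numbers can be shown with a small textbook-style model. The sketch below is an assumption-laden illustration, not the S-820's bank group number modifying scheme: `interleaved` is plain low-order interleaving, `skewed` is a hypothetical remapping chosen only for demonstration, and `banks_touched` counts how many distinct banks a strided vector access uses.

```python
def interleaved(addr, n_banks):
    """Plain low-order interleaving: consecutive addresses rotate over banks."""
    return addr % n_banks


def skewed(addr, n_banks):
    """Hypothetical skewed mapping: shift the bank number by the row index."""
    return (addr + addr // n_banks) % n_banks


def banks_touched(stride, n_banks, n_elems, mapping):
    """Number of distinct banks hit by n_elems accesses with a fixed stride."""
    return len({mapping(i * stride, n_banks) for i in range(n_elems)})


if __name__ == "__main__":
    for stride in (1, 2, 8):
        print(stride,
              banks_touched(stride, 8, 8, interleaved),
              banks_touched(stride, 8, 8, skewed))
```

With 8 banks and plain interleaving, a stride of 8 sends every access to the same bank, so the accesses serialize. The skewed mapping spreads the same access pattern over all 8 banks.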

ieee region 10 conference | 1994

Pseudo vector processor for high-speed list vector computation with hiding memory access latency

Hiroshi Nakamura; T. Wakabayashi; Kisaburo Nakazawa; Taisuke Boku; Hideo Wada; Yasuhiro Inagami

We present two scalar processors, called PVP-SWPC and PVP-SWSW, for high-speed list vector processing. Achieving this requires tolerating memory access latency. PVP-SWPC tolerates the latency by introducing slide-windowed floating-point registers and a prefetch-to-cache instruction. PVP-SWSW tolerates the latency by introducing slide-windowed general and floating-point registers. Owing to the slide-window structure, both processors can use more registers while keeping upward compatibility with existing scalar architectures. The evaluation shows that these processors successfully hide memory latency and achieve fast list vector processing.

Supercomputer '89: Anwendungen, Architekturen, Trends, Seminar, Mannheim | 1989

An Overview of the HITACHI S-820 Supercomputer System

Michihiro Hirai; Shun Kawabe; Hideo Wada

The HITACHI S-820 has made a debut as one of the most powerful supercomputers in the world, with a peak arithmetic performance of 3 GFLOPS. Like its predecessor S-810, the first Japanese-made supercomputer, it consists of a scalar processor and a vector processor. As the scalar processor has the same architecture as a general-purpose mainframe, the whole complex fits well in conventional operating environments. Central to the high computation speed is the vector processor with its multiple-pipeline structure and large vector register memory. The use of the semiconductor Extended Storage dramatically reduces I/O time, contributing to faster job turnaround and balanced system performance. A high degree of parallelism is incorporated inside the vector processor as well as between the vector and the scalar processors. The hardware technology employed in the system, which is the key to high performance and supreme reliability, includes the field-proven state-of-the-art high-speed logic LSIs originally developed for Hitachi’s top-of-the-line mainframes, a 256K bit CMOS RAM with an access time of 45 nsec, and a vector register LSI which combines logic and RAMs on a monolithic chip. A variety of software products have also been developed to fully exploit the hardware capabilities. They include a vectorizing compiler FORT77/HAP with enhanced vectorization capability, an easy-to-code differential equation solver DEQSOL E2, and a mathematical subroutine library MATRIX/HAP. In a benchmark with Lawrence Livermore Laboratory’s 14 Kernels, the S-820 scores 355 (417 with a biCMOS version of the Main Storage) MFLOPS (algebraic average), among the highest in the industry.

international parallel processing symposium | 1997