Katsuyoshi Kitai | Researchain

Publication

Featured research published by Katsuyoshi Kitai.

international conference on supercomputing | 1994

Distributed storage control unit for the Hitachi S-3800 multivector supercomputer

Katsuyoshi Kitai; Tadaaki Isobe; Tadayuki Sakakibara; Shigeko Yazawa; Yoshiko Tamaki; Teruo Tanaka; Kouichi Ishii

This paper discusses the storage control unit of the Hitachi S-3800 supercomputer series, which is capable of achieving 8 GFLOPS in each of up to four shared-memory multiprocessors. This storage control unit is distributed to the V-SCs (vector-processor-side storage control units) and the M-SCs (main-storage-side storage control units), and achieves 128 gigabytes per second of total memory throughput. This distributed storage control unit supports scalability with increases in the number of processors and segmented parallel pipelines, simply by reconnecting the flat cables between the V-SCs and M-SCs. The distributed storage control unit also facilitates high sustained memory throughput for all types of vector-load and -store instructions. It features three new storage control schemes. (1) A hierarchical request-identification-number assignment scheme, which allows independent parallel memory access control in the V-SCs and M-SCs. This also enhances the indirect memory access performance. (2) A multistage address modification scheme, which achieves conflict-free constant-stride parallel memory access in both the V-SCs and M-SCs. (3) An instruction-based variable priority scheme, which achieves stable high memory throughput independent of other programs executed on the other processors. Results of performance measurements show the benefit of these schemes in the scalable distributed storage control unit for the S-3800 series.

international conference on supercomputing | 1993

Scalable parallel memory architecture with a skew scheme

Tadayuki Sakakibara; Katsuyoshi Kitai; Tadaaki Isobe; Shigeko Yazawa; Teruo Tanaka; Yasuhiro Inagami; Yoshiko Tamaki

We present a scalable parallel memory architecture with a skew scheme. It achieves equal conflict-free vector access strides among different numbers of memory modules. With previous skew schemes, conflict-free strides depended on the number of memory modules. Therefore the skew scheme should be independent of the number of memory modules. We analyze two causes of conflict: permanent concentration in a limited set of memory modules and transient concentration in one memory module. The conflict-free strides are proved to be independent of the number of memory modules by resolving the two concentrations separately. The strategy is to increase the interval of the shifting address assignment of the memory modules in order to reduce the permanent concentrations, and to provide buffers for each memory module in accordance with this interval in order to absorb the transient concentrations. The skew scheme uses the same period for memory systems with different numbers of memory modules. Consequently, scalability for conflict-free strides can be realized, independent of the number of memory modules.
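As a rough illustration of why skewing helps, the sketch below compares plain interleaving with a generic textbook-style skew. It is not the scheme from the paper: the function names, the `interval` parameter, and the mapping `(addr + addr // interval) % num_banks` are assumptions chosen only to show the idea of shifting the module assignment at a fixed address interval.

```python
def interleaved_bank(addr, num_banks):
    """Plain interleaving: consecutive addresses go to consecutive banks."""
    return addr % num_banks


def skewed_bank(addr, num_banks, interval=None):
    """Generic skew (assumed mapping, not the paper's): shift the bank
    assignment by one for every `interval` addresses."""
    if interval is None:
        interval = num_banks
    return (addr + addr // interval) % num_banks


def banks_touched(bank_fn, stride, num_banks, count):
    """Banks hit by `count` accesses starting at address 0 with a constant stride."""
    return [bank_fn(i * stride, num_banks) for i in range(count)]


if __name__ == "__main__":
    # Stride equal to the bank count: worst case for plain interleaving.
    print(banks_touched(interleaved_bank, stride=8, num_banks=8, count=8))
    print(banks_touched(skewed_bank, stride=8, num_banks=8, count=8))
```

With plain interleaving, a stride of 8 on 8 banks sends every access to bank 0, which corresponds to a permanent concentration. With the assumed skew, the same stride visits each of the 8 banks once.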

international conference on supercomputing | 1993

Parallel processing architecture for the Hitachi S-3800 shared-memory vector multiprocessor

Katsuyoshi Kitai; Tadaaki Isobe; Yoshikazu Tanaka; Yoshiko Tamaki; Masakazu Fukagawa; Teruo Tanaka; Yasuhiro Inagami

This paper discusses the architecture of the new Hitachi supercomputer series, which is capable of achieving 8 GFLOPS in each of up to four processors. This architecture provides high-performance processing for fine-grain parallelism, and it allows efficient parallel processing even in an undedicated environment. It also features the newly-developed time-limited spin-loop synchronization, which combines spin-loop synchronization with operating system primitives, and a communication buffer (CB) which caches shared variables for synchronization, thus allowing them to be accessed faster. Three new instructions take advantage of the CB in order to reduce the parallel overhead. The results of performance measurements confirm the effectiveness of the CB and the new instructions.
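The idea of combining a spin loop with an operating system primitive can be sketched in software as follows. This is only an analogy under stated assumptions: the S-3800 mechanism is implemented in hardware and the operating system, while the function name `time_limited_wait`, the `spin_limit_s` parameter, and the use of Python's `threading.Event` are illustrative choices that do not come from the paper.

```python
import threading
import time


def time_limited_wait(event, spin_limit_s=0.001):
    """Spin on `event` for at most `spin_limit_s` seconds, then fall back
    to a blocking wait. Returns which path observed the event."""
    deadline = time.perf_counter() + spin_limit_s
    while True:
        if event.is_set():
            return "spin"          # cheap path: partner arrived quickly
        if time.perf_counter() >= deadline:
            break                  # spin budget used up
    event.wait()                   # hand the wait over to the OS primitive
    return "blocked"


if __name__ == "__main__":
    ready = threading.Event()
    threading.Timer(0.2, ready.set).start()   # partner arrives late
    print(time_limited_wait(ready))
```

The spin path keeps latency low when the partner arrives quickly, and the time limit bounds the processor time wasted when it does not, for example when the partner has been descheduled in an undedicated environment.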

international symposium on parallel architectures algorithms and networks | 1994

An interprocessor memory access arbitrating scheme for the S-3800 vector supercomputer

Tadayuki Sakakibara; Katsuyoshi Kitai; Tadaaki Isobe; Shigeko Yazawa; Teruo Tanaka; Yoshiko Tamaki; Yasuhiro Inagami

This paper reports an instruction-based variable priority scheme which achieves high sustained memory throughput on a tightly coupled multiprocessor (TCMP) vector supercomputer. We analyze the two types of priority control for arbitrating interprocessor memory access conflict. In the case of request-level priority control, mutual obstruction causes performance degradation, while in the case of fixed priority control, it is caused by memory bank occupation. Mutual obstruction is caused by requests of different instructions that interfere with each other, and memory bank occupation is caused by continuous accessing of the same memory bank by higher priority instructions. The instruction-based variable priority scheme works as follows: (1) The priority of each pipeline is usually changed at the end of an instruction. (2) The priority is changed more than once in the middle of an instruction, such as a stride multiple-of-8 or indirect access instruction, which may occupy the same memory bank by itself. This strategy reduces mutual obstruction because the priority of each pipeline is stable in the middle of an instruction. It also reduces memory bank occupation because the opportunity for memory access among different instructions is made equal by changing the priority at the end of an instruction. Moreover, it prevents memory bank occupation by a stride multiple-of-8 or indirect access instruction, by changing the priority more frequently. Consequently, high sustained memory throughput can be achieved on TCMP vector supercomputers. We implemented this scheme in Hitachi's S-3800 supercomputer.
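Rule (1) can be pictured with a toy arbiter. The abstract does not say how the priority is changed, so the rotate-to-lowest policy below is an assumption, as are the class name `InstructionPriorityArbiter` and its methods; rule (2), the more frequent change for stride multiple-of-8 and indirect accesses, is not modeled.

```python
class InstructionPriorityArbiter:
    """Toy arbiter: pipeline priorities stay fixed while an instruction
    runs and change only when a pipeline finishes an instruction."""

    def __init__(self, num_pipelines):
        # Index 0 holds the highest-priority pipeline.
        self.order = list(range(num_pipelines))

    def grant(self, requesting):
        """Return the highest-priority pipeline among `requesting`, or None."""
        for pipeline in self.order:
            if pipeline in requesting:
                return pipeline
        return None

    def end_of_instruction(self, pipeline):
        """Assumed policy: the pipeline that just finished an instruction
        drops to the lowest priority."""
        self.order.remove(pipeline)
        self.order.append(pipeline)


if __name__ == "__main__":
    arbiter = InstructionPriorityArbiter(3)
    print(arbiter.grant({0, 1, 2}))   # pipeline 0 wins while its instruction runs
    arbiter.end_of_instruction(0)
    print(arbiter.grant({0, 1, 2}))   # pipeline 1 wins after the rotation
```

Because the order is stable between calls to `end_of_instruction`, requests from one instruction are not interleaved with a competitor's at the request level, and the rotation gives each pipeline a turn at the top.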

Archive | 1996