Hideo Hirono | Researchain

Publication

Featured research published by Hideo Hirono.

international conference on supercomputing | 1993

Super-threading: architectural and software mechanisms for optimizing parallel computation

Shuichi Sakai; Kazuaki Okamoto; Hiroshi Matsuoka; Hideo Hirono; Yuetsu Kodama; Mitsuhisa Sato

This paper presents super-threading, which generically means the architectural and software mechanisms for optimizing parallel computation. Super-threading includes architectural optimization of a processing element (PE), mechanism for supporting fast communication and computation, techniques of a compiler and a run time system for optimizing thread creation, thread allocation, tuning of granularity and data allocation to physically distributed storage. This paper states what super-threading is and examines some of the technologies belonging to it. The processor architecture based on super-threading is proposed and its implementation on a highly parallel computer EM-4 is shown with performance data. Software issues about super-threading are also examined mainly from the viewpoint of granularity optimization. Dynamic granularity optimization methods are proposed here, and evaluated on EM-4. The performance data indicate that super-threading is a key technology for realizing an efficient massively parallel computer.

international conference on computer design | 1995

A prototype router for the massively parallel computer RWC-1

Takashi Yokota; Hiroshi Matsuoka; Kazuaki Okamoto; Hideo Hirono; Atsushi Hori; Shuichi Sakai

The RWC-1 is a massively parallel computer based on a multi-threaded architecture. This architecture requires extremely high communication performance with reasonable hardware cost. In this paper, we first introduce a new class of direct interconnection networks called MDCE (Multidimensional Directed Cycles Ensemble extension). MDCE has many desirable features for RWC-1 including small degree, low latency, and high throughput. MDCE is thus adopted for a RWC-1 network. We have designed an MDCE router and fabricated an experimental VLSI chip. We explain the design details in this paper. The chip employs operating system support features as well as communication functions, and enables advanced resource management. A prototype chip with about 125,000 gates has been fabricated using 0.6-μm CMOS gate array technology. Its clock runs at 50 MHz and a transmission rate of 300 Mbytes per second per communication port is achieved.

job scheduling strategies for parallel processing | 1995

Time Space Sharing Scheduling and Architectural Support

Atsushi Hori; Takashi Yokota; Yutaka Ishikawa; Shuichi Sakai; Hiroki Konaka; Munenori Maeda; Takashi Tomokiyo; Jörg Nolte; Hiroshi Matsuoka; Kazuaki Okamoto; Hideo Hirono

In this paper, we describe a new job scheduling class, called “Time Space Sharing Scheduling” (TSSS) for parallel machines with variable partition. As an instance of TSSS, we explain the “Distributed Queue Tree” (DQT) that we have proposed already. We also propose some architectural support to implement TSSS on a parallel machine as adequately as TSS on sequential machines. The most important architectural support is “network preemption.” The proposed architectural support will be implemented on our RWC-1, a message-driven parallel machine, and the DQT will also be implemented in the operating system on the RWC-1, called SCore, under development in our RWC project.

parallel computing | 1995

Reduced interprocessor-communication architecture and its implementation on EM-4

Shuichi Sakai; Yuetsu Kodama; Mitsuhisa Sato; Andrew Shaw; Hiroshi Matsuoka; Hideo Hirono; Kazuaki Okamoto; Takashi Yokota

One of the most significant issues in building general purpose massively parallel computers is the integration of computation and communication in an efficient and cost-effective manner. This paper presents a way of integrating computation and communication from the viewpoint of the processor architecture. Two statements will be presented and examined: (1) computation and communication should be tightly coupled and their operation should be highly overlapped; and (2) communication structure should be efficient and simple, i.e. turnaround from data input to execution should be as short as possible using a simplified message handling mechanism. Briefly we can say ‘Fuse communication and computation, then reduce the fused structure as simple and efficient as possible’. The fused structure is called RICA, Reduced Interprocessor-Communication Architecture, in this paper. The word ‘Reduced’ here means the simplified structure of message handling, invocation of a new thread, computation and message generation. Based on the RICA design principles, the authors have developed the EM-series parallel computers, EM-4, EM-X and EM-5. This paper concentrates on the architecture and implementation of EM-4, whose first prototype has been fully operational since April 1990. The communication performance of EM-4 is comparable with its computation performance. These two primitives are efficiently and simply fused within the EM-4 architecture, i.e. a simple fused pipeline which performs message handling, instruction execution and packet output: message-handling time is two RISC clocks which is independent of executing processor instructions. This pipeline naturally includes a sequential RISC pipeline for executing local operations. Secondly this paper evaluates RICA by implementing on EM-4 several programming models generally considered ‘effective’ or ‘promising’. The multi-threaded model, message passing model and data-parallel model have been implemented and a shared-memory model is being implemented on EM-4. Performance of communication primitives for each model is measured on EM-4 prototype and reported here.

international parallel and distributed processing symposium | 1993

RICA: Reduced Interprocessor-Communication Architecture - concept and mechanisms

Shuichi Sakai; Hiroshi Matsuoka; Yuetsu Kodama; Mitsuhisa Sato; Andrew Shaw; Hideo Hirono; Kazuaki Okamoto; Takashi Yokota

One of the most significant issues in building general purpose massively parallel computers is the integration of computation and communication in an efficient and cost-effective manner. This paper presents a way of integrating computation and communication from the viewpoint of the processor architecture. It firstly states the concept of RICA, Reduced Interprocessor-Communication Architecture, which means the simplified and fused structure of communication and computation. Hardwired simple direct invocation of threads, and fusion of execution pipelines and message handling pipelines are the mechanisms for RICA.

ieee international conference on high performance computing, data, and analytics | 1997

Virtual control channel and its application to the massively parallel computer RWC-1

Takashi Yokota; Hiroshi Matsuoka; Kazuaki Okamoto; Hideo Hirono; Shuichi Sakai

Global operation and system control are important issues in massively parallel systems. The paper discusses virtual control networks (VCNs), which are substitutes for the current dedicated control networks. First we introduce a new mechanism called Virtual Control Channel (VCC) used to conduct control information over data network links. The network nodes have control finite state machines (CFSMs) and a VCN is composed of CFSMs. The VCC performs the role of a connection wire between CFSMs. The mechanisms are applied to the RWC-1 machine. The simulation results reveal that reduction and broadcasting operations are efficiently executed on VCNs by exploiting the tree structure.

Archive | 2004