Shinji Tomita | Researchain

Publication

Featured research published by Shinji Tomita.

ACM Sigarch Computer Architecture News | 1991

DSNS (dynamically-hazard-resolved statically-code-scheduled, nonuniform superscalar): yet another superscalar processor architecture

Morihiro Kuga; Kazuaki Murakami; Shinji Tomita

A new superscalar processor architecture, called DSNS (Dynamically-hazard-resolved, Statically-code-scheduled, Nonuniform Superscalar), is proposed and discussed. DSNS has the following major architectural features.

1. Dynamically-hazard-resolved superscalar: DSNS is object-code compatible with respect to the degree of superscalar. Pipeline interlock hardware should be provided for detecting and resolving hazards at run time.
2. Statically-code-scheduled superscalar: The performance of DSNS could not be scalable with respect to the degree of superscalar. Compilers must be responsible for scheduling instructions to reduce the pipeline stalls for a particular degree of superscalar.
3. Nonuniform superscalar: Although nonuniform superscalar potentially suffers instruction-class conflicts, it can be more cost-effective than uniform superscalar. Again, compilers must take care that the class conflicts do not increase structural hazards.
4. Static memory disambiguation: The DSNS architecture provides three types of LOAD/STORE instructions: strongly ordered, weakly ordered, and unordered. Memory disambiguation at compile time is responsible for marking each LOAD/STORE instruction. At run time, processors need neither detect nor resolve data hazards for any type; they just perform memory accesses in order for strongly or weakly ordered instructions, and arbitrarily for unordered ones.
5. Static branch prediction with branch-target buffer: Branch instructions predicted as taken by compilers are stored in the branch-target buffer. Hardware never guesses the outcomes of branch instructions.
6. Early branch resolution with advanced conditioning: Advanced conditioning allows branch decisions to be made further ahead of the corresponding branches. It reduces the branch delay and results in resolving branches early in the pipeline.
7. Conditional mode execution with dual register files: Dual register files facilitate maintaining the precise machine state that otherwise might be violated by speculative execution such as conditional mode.
8. Weakly precise interrupts: The DSNS architecture defines interrupts as being somewhat imprecise but restartable with the help of interrupt handlers. The definition alleviates the hardware constraints of ensuring strongly precise interrupts.

This paper also presents an implementation of the DSNS architecture. The DSNS processor prototype under development is a four-stage pipelined processor of superscalar-degree four. The instruction pipelines, especially the branch pipeline, are discussed in detail.
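Feature 5 above (static branch prediction with a branch-target buffer) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the DSNS design: the `Branch` record, the `hint_taken` field, the FIFO replacement policy, and the buffer capacity are all hypothetical, since the abstract gives no encoding or buffer organization. The one property taken from the abstract is that the buffer holds only branches the compiler predicted as taken, so the hardware itself never guesses.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Branch:
    pc: int           # address of the branch instruction
    target: int       # branch target address
    hint_taken: bool  # compiler-supplied prediction (hypothetical encoding)


class StaticBTB:
    """Toy branch-target buffer holding only compiler-predicted-taken branches."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self.entries: Dict[int, int] = {}  # pc -> target, in insertion order

    def record(self, branch: Branch) -> None:
        # Branches hinted not-taken are never stored.
        if not branch.hint_taken:
            return
        # Assumed FIFO replacement when the buffer is full.
        if branch.pc not in self.entries and len(self.entries) >= self.capacity:
            oldest_pc = next(iter(self.entries))
            del self.entries[oldest_pc]
        self.entries[branch.pc] = branch.target

    def predict_next_pc(self, pc: int, fallthrough: int) -> int:
        # A hit means "compiler said taken"; a miss falls through.
        return self.entries.get(pc, fallthrough)
```

In this model the fetch stage redirects to the stored target only on a buffer hit, so the prediction is entirely determined by the compiler's hints.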

international conference on supercomputing | 1989

The Kyushu University reconfigurable parallel processor: design of memory and intercommunication architectures

Kazuaki Murakami; Shin-ichiro Mori; Akira Fukuda; Toshinori Sueyoshi; Shinji Tomita

The reconfigurable parallel processor system under development at Kyushu University is an MIMD-type multiprocessor which consists of N processing-elements (currently N is 128) fully connected by S N × N crossbar networks (currently S is 1). Each PE (Processing Element) employs a Fujitsu SPARC MB86900/10 chip-set, a Weitek WTL1164/65 chip-set, an MMU (Memory Management Unit) with 64K bytes of cache, 4M bytes of memory, and an MCU (Message Communication Unit). The modular 128 × 128 crossbar network is implemented by arranging 256 identical 8 × 8 crossbar LSI-modules in a 16 × 16 matrix form. The full 128-PE configuration achieves supercomputer levels of performance by providing 1.28 GIPS and 205 MFLOPS of computing power, 512M bytes of memory, and 2.56G bytes/s of inter-PE communication bandwidth. At the same time, it exploits unique reconfigurability in the memory and intercommunication architectures. By utilizing these two types of reconfigurability, we believe that the system can be effectively tailored to a wide spectrum of applications such as numerical computation, image processing, computer graphics, artificial intelligence, neurocomputing, and so on.
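The aggregate figures in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the per-PE values are derived here by assuming the totals divide evenly across the 128 PEs, and decimal units are assumed (1.28 GIPS = 1280 MIPS, 2.56 Gbytes/s = 2560 Mbytes/s). None of the per-PE figures are stated in the abstract.

```python
def crossbar_modules_needed(ports: int, module_ports: int) -> int:
    """Number of k x k modules tiled in a square grid to form a ports x ports crossbar."""
    if ports % module_ports != 0:
        raise ValueError("ports must be a multiple of module_ports")
    side = ports // module_ports  # modules along each edge of the grid
    return side * side


def total_memory_mbytes(pes: int, mbytes_per_pe: int) -> int:
    """Aggregate memory when every PE carries the same amount."""
    return pes * mbytes_per_pe


def per_pe_share(total: float, pes: int) -> float:
    """Even split of an aggregate figure across all PEs (an assumption)."""
    return total / pes


print(crossbar_modules_needed(128, 8))  # 256 modules in a 16 x 16 grid
print(total_memory_mbytes(128, 4))      # 512 Mbytes
print(per_pe_share(1280, 128))          # 10.0 MIPS per PE (derived)
print(per_pe_share(2560, 128))          # 20.0 Mbytes/s per PE (derived)
```

The module count and total memory match the abstract's 256 modules and 512M bytes.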

ACM Sigarch Computer Architecture News | 1988

An overview of the Kyushu University reconfigurable parallel processor

Kazuaki Murakami; Akira Fukuda; Toshinori Sueyoshi; Shinji Tomita

As a testbed to investigate systemwide aspects of highly parallel processing, a reconfigurable parallel processor system is currently under development at Kyushu University in Japan. The system is a MIMD-type multiprocessor which consists of N processing-elements (currently N is 128) fully connected by S N×N crossbar networks (currently S is 1). Each processing-element (PE) employs a Fujitsu SPARC MB86900/10 chip-set, a Weitek WTL1164/65 chip-set and 4 Mbytes of memory. Although memory is distributed among all PEs, the system can be reconfigured at run time as a memory-shared tightly coupled multiprocessor, a message-passing loosely coupled multiprocessor, or a hybrid of the two. The crossbar network allows users to set up arbitrary topologies for inter-PE (i.e. processor-memory and/or processor-processor) paths under software control of an operating system. A parallel/distributed operating system is also under development to exploit parallelism by making the best of reconfigurability. The full 128-PE configuration will provide up to 1.28 GIPS, 205 MFLOPS (single precision LINPACK), 141 MFLOPS (double precision LINPACK), 512 Mbytes of memory and 1.28 Gbytes/s inter-PE communication. This paper outlines the reconfigurable network and memory architectures among several unique architectural features of the reconfigurable parallel processor.
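The idea of setting up arbitrary inter-PE topologies on a crossbar under software control can be sketched as follows. This is a toy model, not the Kyushu University design: the `Crossbar` class, its methods, and the rule that each output port is driven by at most one input port are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


class Crossbar:
    """Toy N x N crossbar: each output port is driven by at most one input port."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.source_of: Dict[int, int] = {}  # output port -> input port

    def connect(self, src: int, dst: int) -> None:
        if not (0 <= src < self.n and 0 <= dst < self.n):
            raise ValueError("port out of range")
        if dst in self.source_of:
            raise ValueError(f"output {dst} is already driven by input {self.source_of[dst]}")
        self.source_of[dst] = src

    def reset(self) -> None:
        self.source_of.clear()


def ring(n: int) -> List[Tuple[int, int]]:
    """Links for a unidirectional ring: PE i sends to PE (i + 1) mod n."""
    return [(i, (i + 1) % n) for i in range(n)]


def configure(xbar: Crossbar, links: Iterable[Tuple[int, int]]) -> None:
    """Tear down the current topology and install a new set of links."""
    xbar.reset()
    for src, dst in links:
        xbar.connect(src, dst)


xbar = Crossbar(8)
configure(xbar, ring(8))  # the same crossbar could later be reconfigured to another topology
```

Because the topology is just a table of port connections, switching from a ring to any other pattern is a matter of calling `configure` with a different link list.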

international symposium on microarchitecture | 1991

Toward advanced parallel processing: exploiting parallelism at task and instruction levels

Akira Fukuda; Kazuaki Murakami; Shinji Tomita

The status of two projects that entail the development of a reconfigurable parallel processor system with 128 Sparc microprocessors and a superscalar processor with four operations proceeding in parallel is discussed. The design principles, system configuration, processing element, network architecture, and memory architecture of the reconfigurable processors (called KRPP) are described. The operating system for KRPP is discussed. The architecture for the superscalar (called a dynamically hazard-resolved, statically code-scheduled, nonuniform superscalar) is presented.

Archive | 1991

Computer Architecture in the 1990s

Shinji Tomita

In this panel discussion, I would like to discuss what types of computer architecture are expected to achieve more than one TFLOPS performance in the near future and clarify some key hardware technologies by which high-performance TFLOPS computers can be successfully built.

Systems and Computers in Japan | 1989

The structure of EXPERTS: A parallel processor system for 3-dimensional graphics

Haruo Niimi; Hiroshi Hagiwara; Shinji Tomita

A parallel processor system, called EXPERTS, is developed to generate realistic images of three-dimensional (3-D) scenes at a high speed. EXPERTS inputs a list of 3-D objects defined by a polyhedron model, executes hidden surface elimination by utilizing a scan-line algorithm, and calculates a frame of pixel values. To perform these processes efficiently, EXPERTS is constructed as a two-level hierarchical bus-connected multiprocessor system. This system architecture is derived from the processing scheme employed in which the scan-line algorithm is divided into two succeeding stages and parallelism is introduced to each stage, respectively. Two types of special-purpose processor elements were designed to speed up these two processes: that for the former stage is called Scan-Line Processor (SLP), and that for the latter stage is called PiXel processor (PXP). This paper describes the parallel processing method of scan-line algorithm, the interconnection scheme of the processor elements, and the details of their hardware architecture. Performance estimation of the system is also presented.

Archive | 1989