Yoshiharu Shigei | Researchain

Publication

Featured research published by Yoshiharu Shigei.

The Visual Computer | 1988

Load balancing strategies for a parallel ray-tracing system based on constant subdivision

Hiroaki Kobayashi; Satoshi Nishimura; Hideyuki Kubota; Tadao Nakamura; Yoshiharu Shigei

Static and dynamic load balancing strategies are presented for a multiprocessor system that runs a ray-tracing algorithm based on constant subdivision. An object space is divided into regular cubes (subspaces), whose boundary planes are perpendicular to the coordinate axes, and these are allocated to the processors in the system. Here, load balancing among the processors is the most important problem. First, in the category of static load balancing, strategies for mapping the subspaces onto the processors are evaluated by simulation. We then propose a hierarchical multiprocessor system that combines dynamic load balancing with the static one. Its architecture can overcome the limitations of static load balancing in a large-scale multiprocessor system.
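
The abstract does not say which static mappings were evaluated. As a rough illustration only, the sketch below compares two common candidates, a block mapping and an interleaved (cyclic) mapping, on a hypothetical workload in which all objects sit in one corner of the object space. The function names, the 8x8x8 grid, and the cost values are assumptions made for this example, not taken from the paper.

```python
from collections import Counter
from itertools import product


def block_mapping(cell, grid, procs_per_axis):
    """Give each processor one contiguous block of cubes (assumed scheme)."""
    block = grid // procs_per_axis
    px, py, pz = (c // block for c in cell)
    return (px * procs_per_axis + py) * procs_per_axis + pz


def interleaved_mapping(cell, grid, procs_per_axis):
    """Deal cubes out to processors cyclically along each axis (assumed scheme)."""
    px, py, pz = (c % procs_per_axis for c in cell)
    return (px * procs_per_axis + py) * procs_per_axis + pz


def load_per_processor(mapping, grid, procs_per_axis, cost):
    """Sum a per-cube cost estimate over the cubes owned by each processor."""
    load = Counter()
    for cell in product(range(grid), repeat=3):
        load[mapping(cell, grid, procs_per_axis)] += cost(cell)
    return load


def hotspot_cost(cell):
    """Hypothetical workload: cubes in one corner octant cost 10, all others 1."""
    return 10 if all(c < 4 for c in cell) else 1


GRID, P = 8, 2  # 8x8x8 cubes shared by 2x2x2 = 8 processors
block_load = load_per_processor(block_mapping, GRID, P, hotspot_cost)
inter_load = load_per_processor(interleaved_mapping, GRID, P, hotspot_cost)
print("block:      ", sorted(block_load.values()))
print("interleaved:", sorted(inter_load.values()))
```

With this workload the block mapping leaves one processor holding the whole hotspot, while the interleaved mapping spreads it evenly. The price is that neighbouring cubes then live on different processors, so rays cross processor boundaries more often.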

The Visual Computer | 1987

Parallel processing of an object space for image synthesis using ray tracing

Hiroaki Kobayashi; Tadao Nakamura; Yoshiharu Shigei

This paper presents a novel parallel processing system for image synthesis using ray tracing. An object space is divided into parts (subspaces), each of which is allocated to a processor. Each processor simultaneously detects the intersections of the surfaces of each object with a fixed number of rays over the whole space, and calculates the local intensity on an object in its subspace. The global intensities of pixels on the screen are calculated simultaneously by a separate kind of processor. We also present the optimal data structure, based on an adaptive division algorithm, for parallel processing of the object space.

Archive | 1988

A Strategy for Mapping Parallel Ray-Tracing into a Hypercube Multiprocessor System

Hiroaki Kobayashi; Tadao Nakamura; Yoshiharu Shigei

We present a systematic and efficient strategy for mapping an adaptively or regularly subdivided object space (a set of subspaces) onto the nodes of a hypercube. The key property of this mapping is that the distance on the hypercube between neighbouring subspaces is proportional to the difference between the sizes of these subspaces. In particular, if neighbouring subspaces are of equal size, they are allocated to neighbouring processors. As a result, we can realize a communication-efficient implementation of parallel ray tracing on the hypercube multiprocessor system. The mapping is derived as a byproduct of the octree encoding of the object space.
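
The paper's octree-based mapping is not reproduced in the abstract. For the regular (equal-size) case only, a standard way to obtain the stated neighbour property is to Gray-code each grid coordinate and concatenate the bits, since consecutive Gray codes differ in exactly one bit. The sketch below uses that substitute technique; the helper names are hypothetical, and the adaptive case is not covered.

```python
def gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def hypercube_node(cell, bits):
    """Map cube (x, y, z) on a grid with 2**bits cubes per axis to a node label.

    The label has 3 * bits bits, so the hypercube has 2**(3 * bits) nodes.
    """
    x, y, z = cell
    return (gray(x) << (2 * bits)) | (gray(y) << bits) | gray(z)


def hamming(a, b):
    """Number of hypercube links between nodes a and b."""
    return bin(a ^ b).count("1")


# Two face-adjacent cubes on a 4x4x4 grid land on directly linked nodes.
a = hypercube_node((1, 2, 3), bits=2)
b = hypercube_node((2, 2, 3), bits=2)
print(format(a, "06b"), format(b, "06b"), hamming(a, b))
```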

Systems and Computers in Japan | 1988

A Clustered Multiprocessor System

Takeo Nakada; Susumu Horiguchi; Yasushi Takaki; Yoshiyuki Kawazoe; Yoshiharu Shigei

With the recent remarkable development of VLSI technology, the study and experimental development of multiprocessors composed of a large number of processors have been actively pursued. A serious problem in such systems is contention (conflict) in communication. A clustered system, in which processors are grouped into several units, is proposed as a means to ameliorate the problem. This paper derives a theoretical expression for the processing efficiency of the clustered multiprocessor system. An experimental system was constructed from 32 processors grouped into 4 clusters of 8 processors each. The design of the software and the results of the implementation are described. As practical examples, the summation of a series and the solution of a system of linear equations are considered. The processing efficiencies of the clustered and nonclustered systems were measured and compared. The results show that the clustered system can improve the system efficiency considerably by adding a small amount of hardware.

Systems and Computers in Japan | 1990

Performance evaluation of the clustered multiprocessor system

Yasushi Takaki; Susumu Horiguchi; Yoshiyuki Kawazoe; Yoshiharu Shigei

The throughput of a multiprocessor system with bus connections may be reduced significantly by bus conflicts. One approach to easing this conflict is to group processors into a few clusters. This paper presents a clustered multiprocessor prototype called “MUGEN” and the systems software implemented for it. Its performance is also discussed. As for the systems software, the ext-C and para-C languages, developed to express parallelism explicitly, are introduced. Also, the par-C preprocessor (P3C), which translates a standard C program into a parallel program written in para-C, is introduced. Finally, Livermore loop results are presented to measure the performance.

Computer Software and Applications Conference | 1988

Optimal number of processors for finding the maximum value on multiprocessor systems

Susumu Horiguchi; Yoshiharu Shigei

Models for synchronized parallel computation are described, in which processors are interconnected by networks. These models are used to solve the problem of finding the maximum value, the minimum value, or both values in parallel. The proposed algorithm is based on the binary tree routing scheme. The time complexities are investigated by taking account of the communication overhead. The optimal number of processors for a fixed number of data is analytically obtained. This number depends significantly on the types of interconnection networks. Results are presented for linear, mesh, three-dimensional, and cube-connected arrays. Execution times are investigated and are measured using a cluster multiprocessor system.
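
To see why an optimal processor count exists at all, a toy model is enough. The sketch below simulates a binary-tree maximum search and charges a fixed, assumed communication cost per tree level. The paper's network-specific cost models (linear, mesh, and so on) are not given in the abstract, so the constant `comm_cost` and all function names here are assumptions for illustration.

```python
import math


def tree_max(data, p):
    """Simulate p processors: local scan, then pairwise merging up a binary tree.

    Returns (maximum, local comparison steps, number of tree rounds).
    """
    chunk = math.ceil(len(data) / p)
    partial = [max(data[i:i + chunk]) for i in range(0, len(data), chunk)]
    rounds = 0
    while len(partial) > 1:
        # Each round halves the number of candidates; pairs merge in parallel.
        partial = [max(partial[i:i + 2]) for i in range(0, len(partial), 2)]
        rounds += 1
    return partial[0], chunk - 1, rounds


def modelled_time(n, p, comm_cost):
    """Assumed cost model: one unit per comparison plus comm_cost per tree round."""
    _, local_steps, rounds = tree_max(list(range(n)), p)
    return local_steps + rounds * (1 + comm_cost)


def best_processor_count(n, comm_cost):
    """Search power-of-two processor counts for the smallest modelled time."""
    candidates = [2 ** k for k in range(int(math.log2(n)) + 1)]
    return min(candidates, key=lambda p: modelled_time(n, p, comm_cost))


print(tree_max([3, 9, 2, 7, 5, 1, 8, 4], 4))
print(best_processor_count(1024, comm_cost=9))
```

In this model, adding processors shortens the local scan but lengthens the tree, so a more expensive network pushes the optimum toward fewer processors.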

Systems and Computers in Japan | 1987

An adaptive routing method for computer networks by electric‐circuit modeling

Nobuyuki Oba; Tadao Nakamura; Yoshiharu Shigei

Computer networks and multiprocessor systems have been widely developed and studied. They are based on a network composed of nodes containing processors, aiming at the improvement of performance by distributed processing as well as the improvement of reliability by resource distribution. To realize high system performance, adequate routing and flow controls are required in the communication of information among nodes. This paper proposes a new routing control scheme to be used in packet communication in a computer network or multiprocessor system. The scheme, called potential routing, models the computer network by an electric circuit, and the packet routing from the source node to the destination node is performed according to the potential difference between adjacent nodes. The node potential is determined first by Kirchhoff's law and is modified dynamically according to the traffic situation during the routing procedure, providing an adequate criterion for the routing. A feature of the proposed scheme is that ping-pong and loop phenomena, which cause traffic congestion, are in principle not produced. It was verified by simulation that the transmission delay is reduced when the traffic is high or unbalanced.
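
The abstract does not give the circuit model in detail. The sketch below is one minimal reading of the static part of the idea: every link is a unit resistor, the destination is grounded, each other node injects one unit of current, and a packet always moves to the neighbour with the lowest potential. These modelling choices, the example topology, and the function names are assumptions, and the traffic-dependent update of the potentials described in the paper is not modelled.

```python
def node_potentials(graph, dest, sweeps=2000):
    """Solve Kirchhoff's current law by Gauss-Seidel relaxation.

    Assumed circuit: unit resistors on links, dest held at 0, and one unit of
    current injected at every other node, so deg(i) * v[i] - sum(v[j]) = 1.
    """
    v = {node: 0.0 for node in graph}
    for _ in range(sweeps):
        for node, nbrs in graph.items():
            if node != dest:
                v[node] = (1.0 + sum(v[n] for n in nbrs)) / len(nbrs)
    return v


def route(graph, v, src, dest):
    """Forward the packet to the lowest-potential neighbour until it arrives."""
    path = [src]
    while path[-1] != dest:
        path.append(min(graph[path[-1]], key=v.get))
    return path


# Hypothetical six-node network given as an adjacency list.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}
potentials = node_potentials(GRAPH, dest="F")
path = route(GRAPH, potentials, "A", "F")
print(path)
```

In this model each non-destination node sits above the average potential of its neighbours, so the potential drops strictly at every hop and a packet cannot revisit a node. That matches the loop-free behaviour the abstract claims.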

Systems and Computers in Japan | 1987

Characteristics of a programmable logic unit

Takahiko Murayama; Hidekazu Yamada; Tadao Nakamura; Yoshiharu Shigei; Yoshio Yoshioka

This paper considers a programmable logic unit (PLU), a computing-in-memory device with stepwise programmable computing logics and data transmission lines/buses, and discusses its operational characteristics. The PLU is combined with the memory device in a von Neumann-type computer with a great number of computing logics. From the viewpoint of Flynn's classification in the von Neumann-type computer, we consider the level-1 PLU and the level-2 PLU as structures for the PLU, and discuss their configurations in detail. First, we discuss the mapping method and the number of steps for a program that evaluates a mathematical expression. It is shown that the PLU can directly map a program written in reverse Polish notation and that the connection lines/buses can easily be assigned. Second, we compare the computing time of the computer with the PLU and the processing time of the von Neumann-type computer. It is shown that the computer with the PLU is several times faster than the von Neumann-type computer in processing the program. On the other hand, it is seen that the level-2 PLU with the distributed buses is suited to the general-purpose computer, although the processing speed is somewhat low.

Systems and Computers in Japan | 1987

Realization of computers using programmable logic units

Hidekazu Yamada; Tadao Nakamura; Yoshiharu Shigei; Takahiko Murayama; Yoshio Yoshioka

A key to speeding up processing is how to realize programmable hardware that uses both a pipelined processing scheme and a multiprocessor processing scheme, for such special hardware as the fast Fourier transform (FFT) unit and the digital filter. This paper proposes a structure for the programmable logic unit (PLU) based on this idea, where the computing program is mapped onto the hardware and processing is performed by writing and reading the operand data. For the computer using the proposed PLU, three kinds of conceivable processing algorithms are presented. Moreover, considering the number of pins, the regularity of the circuits, and the recent progress in three-dimensional VLSI technology, it is highly conceivable that the proposed PLU can be realized in VLSI. From the viewpoint of program execution in the computer using the PLU, the program consists of the instructions computed by the PLU and the control instructions executed by the main CPU, and the processing mechanism is described using a program example. The processing time is discussed, and it is shown that the computer using the PLU can exploit the parallelism of computation in the PLU to reduce the processing time compared with a computer without the PLU. Thus, the effectiveness of the computer using the PLU is demonstrated.

International Symposium on Computer Architecture | 1986

$AT^2 = O(N \log^4 N)$, $T = O(\log N)$ fast Fourier transform in a light connected 3-dimensional VLSI

Makoto Hasegawa; Yoshiharu Shigei

We can perform an N-point FFT with time performance $T = O(\log N)$ and area-time performance $AT^2 = O(N \log^4 N)$ by using a 3-dimensional VLSI system that is optically interconnected. This performance exceeds the theoretical lower bound of the area-time performance ($N^2 \log^2 N$) of the conventional VLSI.
