Takeo Nakada | Researchain

Publication

Featured research published by Takeo Nakada.

The Visual Computer | 1993

Parallel processing of incremental ray tracing on a shared-memory multiprocessor

Susumu Horiguchi; Masayuki Katahira; Takeo Nakada

This paper presents a novel parallel-processing method for image synthesis using incremental ray tracing on a shared-memory multiprocessor workstation. The most efficient technique for image synthesis is ray tracing, proposed by Whitted in 1980. Ray-tracing algorithms are simple and can generate realistic images. However, they are time-consuming, since calculations of the intersections between objects and rays increase exponentially as the complexity of scenes increases. Fast image synthesis for animation is one of the most important topics in computer graphics. As the area of computer applications has broadened, the complexity of images to be synthesized has increased. Parallel processing of computer graphics is one way of achieving fast image synthesis. This paper describes a parallel processing technique for incremental ray tracing, which recalculates only the rays changed by moving objects in successive scenes of continuous image synthesis. The performance of parallel ray tracing was evaluated on the multiprocessor workstation TOP-1. Strategies for allocating pixels to processes under a multiprocess operating system on this workstation are discussed.
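
The abstract mentions recomputing only the rays affected by moving objects and allocating pixels to processes. The Python sketch below is a hypothetical illustration, not the paper's method. It assumes the affected region is simply the union of a moving object's screen-space bounding boxes in two successive frames (this ignores secondary rays such as shadows and reflections, which can change outside that region). It then compares two generic allocation strategies, scanline interleaving and contiguous bands. All function names are illustrative.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]          # (x, y)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open

def box_pixels(box: Box) -> Set[Pixel]:
    """All pixels inside a half-open screen-space box."""
    x0, y0, x1, y1 = box
    return {(x, y) for y in range(y0, y1) for x in range(x0, x1)}

def changed_pixels(old_box: Box, new_box: Box) -> Set[Pixel]:
    """Hypothetical simplification: recompute only primary rays through the
    object's old or new footprint; secondary-ray effects are ignored."""
    return box_pixels(old_box) | box_pixels(new_box)

def allocate_interleaved(pixels: Set[Pixel], nproc: int) -> List[List[Pixel]]:
    """Scanline interleaving: row y goes to process y % nproc."""
    work: List[List[Pixel]] = [[] for _ in range(nproc)]
    for x, y in sorted(pixels, key=lambda p: (p[1], p[0])):
        work[y % nproc].append((x, y))
    return work

def allocate_bands(pixels: Set[Pixel], height: int, nproc: int) -> List[List[Pixel]]:
    """Contiguous bands: process k owns rows [k*band, (k+1)*band)."""
    band = -(-height // nproc)  # ceiling division
    work: List[List[Pixel]] = [[] for _ in range(nproc)]
    for x, y in sorted(pixels, key=lambda p: (p[1], p[0])):
        work[y // band].append((x, y))
    return work

if __name__ == "__main__":
    dirty = changed_pixels((10, 10, 20, 20), (12, 10, 22, 20))
    print(len(dirty))                                                   # 120
    print([len(w) for w in allocate_interleaved(dirty, 4)])             # [24, 24, 36, 36]
    print([len(w) for w in allocate_bands(dirty, height=64, nproc=4)])  # [72, 48, 0, 0]
```

With a 64-row image and 4 processes, contiguous bands leave two processes idle when the changed region is small, while interleaving spreads the same 120 pixels more evenly.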

Journal of Parallel and Distributed Computing | 1991

Performance evaluation of parallel fast fourier transform on a multiprocessor workstation

Susumu Horiguchi; Takeo Nakada

The fast Fourier transform (FFT) is very frequently used in various fields such as computed tomography, speech recognition, and image processing. As the area of computer applications has broadened, the quantity of data to be transformed has greatly increased. A parallel FFT is one way of achieving a fast transformation. Until now, the experimental performance of parallel FFTs has not been sufficiently investigated on real multiprocessor systems. This paper describes an implementation of a parallel FFT on a multiprocessor workstation to investigate its real performance. The multiprocessor workstation provides parallel environments under both a multithread operating system and a multiprocess operating system. The performance of the parallel FFT is discussed with respect to cache protocols, floating-point coprocessors, and operating systems.
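
To make the decomposition behind a parallel FFT concrete, here is a minimal Python sketch; it is not the implementation evaluated in the paper. It assumes the input length N is a power of two and that the worker count P divides N. The input is split into P interleaved subsequences x[r::P] of length M = N/P, whose FFTs are independent and could run on separate processors; the results are then combined as X[k] = Σ_r W_N^(rk) F_r[k mod M], with W_N = e^(-2πi/N). Because of CPython's global interpreter lock, the thread pool here shows the structure rather than delivering a speedup.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor W_n^k
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def parallel_fft(x: Sequence[complex], nworkers: int = 4) -> List[complex]:
    """Split into nworkers interleaved sub-FFTs, run them concurrently, then combine."""
    n = len(x)
    if n % nworkers:
        raise ValueError("nworkers must divide len(x)")
    m = n // nworkers
    parts = [list(x[r::nworkers]) for r in range(nworkers)]
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        subs = list(pool.map(fft, parts))  # independent sub-transforms
    # Combine: X[k] = sum_r W_n^(r*k) * F_r[k mod m]
    return [
        sum(cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k % m] for r in range(nworkers))
        for k in range(n)
    ]

if __name__ == "__main__":
    # The transform of a unit impulse is all ones.
    print([round(v.real, 6) for v in parallel_fft([1, 0, 0, 0, 0, 0, 0, 0], 2)])
    # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

The combine step here costs O(N·P); a production implementation would instead continue the butterfly stages across processors, which is where cache behavior of the kind the abstract discusses comes into play.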

Systems and Computers in Japan | 1988

A Clustered Multiprocessor System

Takeo Nakada; Susumu Horiguchi; Yasushi Takaki; Yoshiyuki Kawazoe; Yoshiharu Shigei

With the recent remarkable development of VLSI technology, multiprocessors composed of a large number of processors have been actively studied and experimentally developed. A serious problem in such systems is contention (conflict) in communication. A clustered system, in which processors are grouped into several units, is proposed as a means to ameliorate this problem. This paper derives a theoretical expression for the processing efficiency of the clustered multiprocessor system. An experimental system was constructed with 32 processors grouped into 4 clusters of 8 processors each. The design of the software and the results of the implementation are described. As practical examples, the summation of a series and the solution of a system of linear equations are considered. The processing efficiencies of the clustered and nonclustered systems were measured and compared. The results show that the clustered system can improve system efficiency considerably with only a small amount of additional hardware.
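
The abstract uses summation of a series as a workload. As a hypothetical illustration only (the paper's efficiency expression is not reproduced here), the Python sketch below mirrors the 4 × 8 organization. Each processor sums a strided share of the terms, partial sums are reduced inside each cluster, and only one value per cluster is then combined globally. The count of values combined at the global level is an assumed stand-in for inter-cluster traffic, and the function names are illustrative.

```python
from typing import Sequence, Tuple

def clustered_sum(terms: Sequence[int], nclusters: int = 4,
                  per_cluster: int = 8) -> Tuple[int, int]:
    """Two-level reduction; returns (total, number of values combined globally)."""
    nproc = nclusters * per_cluster
    # Each processor p sums terms p, p + nproc, p + 2*nproc, ...
    partials = [sum(terms[p::nproc]) for p in range(nproc)]
    # Level 1: reduction within each cluster (local communication only).
    cluster_sums = [
        sum(partials[c * per_cluster:(c + 1) * per_cluster]) for c in range(nclusters)
    ]
    # Level 2: one value per cluster is combined across clusters.
    return sum(cluster_sums), len(cluster_sums)

def flat_sum(terms: Sequence[int], nproc: int = 32) -> Tuple[int, int]:
    """Non-clustered baseline: every processor's partial sum is combined globally."""
    partials = [sum(terms[p::nproc]) for p in range(nproc)]
    return sum(partials), len(partials)

if __name__ == "__main__":
    series = list(range(1, 1001))
    print(clustered_sum(series))  # (500500, 4)
    print(flat_sum(series))       # (500500, 32)
```

Both versions compute the same total, but the clustered one sends 4 values through the shared global level instead of 32, which is the kind of contention reduction the clustered design targets.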

Asia and South Pacific Design Automation Conference | 2011

Coarse-grained simulation method for performance evaluation of a shared memory system

Ryo Kawahara; Kenta Nakamura; Kouichi Ono; Takeo Nakada; Yoshifumi Sakamoto

We propose a coarse-grained simulation method that takes the effect of memory access contention into account. The method can be used to evaluate the execution time of an application program during system architecture design in an early phase of development. In this phase, information about memory access timings is usually not available. Our method uses a statistical approximation of the memory access timings to estimate their influence on the execution time. We report a preliminary verification of our simulation method by comparing it with experimental results from an image processing application on a dual-core PC. We find an error on the order of 3 percent in the execution time.
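
The idea of replacing unknown memory-access timings with statistical estimates can be sketched as a small fixed-point calculation. This is a hypothetical model, not the authors' method. It assumes each core i has C_i cycles of computation and n_i memory accesses, a single shared memory port with constant service time s, and that an access waits on average s/2 for each other core in proportion to that core's port utilization U_j = n_j·s / T_j. That gives T_i = C_i + n_i·(s + (s/2)·Σ_{j≠i} U_j), which is iterated a fixed number of times from the contention-free times. It is meant for a small number of cores, like the dual-core case in the abstract.

```python
from typing import List, Sequence

def estimate_times(compute: Sequence[float], accesses: Sequence[int],
                   service: float, iters: int = 100) -> List[float]:
    """Fixed-point estimate of per-core execution time under memory contention.

    Hypothetical model: one shared memory port with constant service time; each
    access waits, on average, half a service time weighted by the other cores'
    port utilizations.
    """
    k = len(compute)
    # Start from the contention-free execution times.
    times = [c + n * service for c, n in zip(compute, accesses)]
    for _ in range(iters):
        # Fraction of its execution time each core keeps the memory port busy.
        util = [n * service / t for n, t in zip(accesses, times)]
        times = [
            compute[i] + accesses[i] * (service + 0.5 * service *
                                        sum(util[j] for j in range(k) if j != i))
            for i in range(k)
        ]
    return times

if __name__ == "__main__":
    print(estimate_times([1000.0], [100], 2.0))  # [1200.0]: no contention with one core
    print([round(t, 2) for t in estimate_times([1000.0, 1000.0], [100, 100], 2.0)])
    # [1216.44, 1216.44]
```

For two identical cores with 1000 compute cycles, 100 accesses, and a 2-cycle service time, the sketch predicts about 1216 cycles each instead of the contention-free 1200.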

Design, Automation, and Test in Europe | 2010

A modeling method by eliminating execution traces for performance evaluation

Kouichi Ono; Manabu Toyota; Ryo Kawahara; Yoshifumi Sakamoto; Takeo Nakada; Naoaki Fukuoka

This paper describes a system-level modeling method in UML for performance evaluation of embedded systems. The core technology of this modeling method is reverse modeling based on dynamic analysis. A case study of real MFPs (multifunction peripherals/printers) is presented in this paper to evaluate the modeling method.

European Conference on Modelling Foundations and Applications | 2010

A model-based method for evaluating embedded system performance by abstraction of execution traces

Kouichi Ono; Manabu Toyota; Ryo Kawahara; Yoshifumi Sakamoto; Takeo Nakada; Naoaki Fukuoka

This paper describes a model-based method to evaluate performance of embedded systems. The core technology of this modeling method is reverse modeling based on dynamic analysis of the existing systems. A case study of real MFPs (multifunction peripherals/printers) is presented in this paper to evaluate the modeling method.

IBM Journal of Research and Development | 1991

Design choices for the TOP-1 multiprocessor workstation

Shigenori Shimizu; Nobuyuki Oba; Takeo Nakada; Moriyoshi Ohara; Atsushi Moriwaki

A snoopy-cache-based multiprocessor workstation called TOP-1 (TOkyo research Parallel processor-1) was developed to evaluate multiprocessor architecture design choices as well as to conduct research on operating systems, compilers, and applications for multiprocessor workstations. TOP-1 is a ten-way multiprocessor using the Intel 80386™ microprocessor chip and the Weitek WTL 1167™ floating-point coprocessor chip. It is currently running under a multiprocessor version of AIX®, which was also developed at the IBM Tokyo Research Laboratory. Our research interest was focused on the design of an effective snoopy cache (all caches monitor all memory-cache traffic) system and the quantitative evaluation of its performance. One of the unique aspects of the TOP-1 design is that the cache supports four different, original snoopy protocols, which may coexist in the system. To evaluate the performance, we implemented a hardware statistics monitor that gathers statistical data. This paper focuses mainly on the TOP-1 cache design: its protocol and its evaluation by means of the statistics monitor. Besides its cache design, TOP-1 has three other unique architectural features: two independently arbitrated 64-bit buses supported by two snoopy-cache controllers per processor, a communication and interruption mechanism for notifying other processors of asynchronous events, and an efficient arbitration mechanism to allow prioritized quasi-round-robin service with distributed control. These features are also described in detail.
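
For readers unfamiliar with snoopy caches, the Python sketch below models a single memory block under a generic three-state MSI write-invalidate protocol. It is a textbook protocol chosen for illustration, not any of TOP-1's four original protocols. The class and method names are hypothetical, and the transaction counter only loosely mimics what a bus statistics monitor might record.

```python
from enum import Enum
from typing import List

class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"

class SnoopyBus:
    """One memory block cached by several processors, kept coherent by MSI snooping."""

    def __init__(self, ncaches: int) -> None:
        self.states: List[State] = [State.INVALID] * ncaches
        self.bus_transactions = 0  # counts bus reads and read-exclusives

    def read(self, cpu: int) -> None:
        if self.states[cpu] is not State.INVALID:
            return  # read hit: no bus traffic
        self.bus_transactions += 1  # BusRd
        for other, st in enumerate(self.states):
            if other != cpu and st is State.MODIFIED:
                # The owner snoops the read, supplies the dirty data, and downgrades.
                self.states[other] = State.SHARED
        self.states[cpu] = State.SHARED

    def write(self, cpu: int) -> None:
        if self.states[cpu] is State.MODIFIED:
            return  # write hit in M: no bus traffic
        self.bus_transactions += 1  # BusRdX (or an upgrade from S)
        for other in range(len(self.states)):
            if other != cpu:
                self.states[other] = State.INVALID  # invalidate every other copy
        self.states[cpu] = State.MODIFIED

if __name__ == "__main__":
    bus = SnoopyBus(3)
    bus.read(0)
    bus.read(1)
    bus.write(2)
    bus.read(0)
    print([s.value for s in bus.states], bus.bus_transactions)  # ['S', 'I', 'S'] 4
```

Each bus transaction is visible to every cache, which is why protocol choice directly shapes the bus traffic a statistics monitor would observe.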

Multicore Software Engineering, Performance, and Tools | 2013

MVA-Based Probabilistic Model of Shared Memory with a Round Robin Arbiter for Predicting Performance with Heterogeneous Workload

Ryo Kawahara; Kouichi Ono; Takeo Nakada

Memory access contention can be a cause of performance problems and should be assessed at early stages of development. We devised a probabilistic model of shared memory for performance estimation. The calculation time is polynomial in the number of processors. The model is applicable in the region of high and heterogeneous bandwidth utilization. A round-robin arbiter is modeled using Mean Value Analysis (MVA)-based approximations that incorporate a non-linear dependence on the bandwidth utilization. To evaluate our model, the estimated execution time is compared with the measured execution time of benchmark programs with memory access contention. We find a maximum error of 4.2% for round-robin arbitration when we compensate for the burstiness of accesses.
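
As background for the MVA-based approach, here is the standard exact Mean Value Analysis recursion for a closed product-form model. In it, N identical processors alternate between computing (a delay Z between accesses) and using a single shared memory with service demand D per access. This textbook single-class version is not the authors' model: it captures neither round-robin arbitration, burstiness compensation, nor heterogeneous workloads. Starting from Q(0) = 0, the recursion is R(n) = D·(1 + Q(n-1)), X(n) = n / (Z + R(n)), and Q(n) = X(n)·R(n), and its cost is linear in the number of processors.

```python
from typing import Tuple

def mva(nprocs: int, demand: float, think: float) -> Tuple[float, float, float]:
    """Exact single-class MVA: one queueing memory station plus a compute delay.

    Returns (memory response time per access, throughput in accesses per unit time,
    mean number of requests at memory) for nprocs processors.
    """
    queue = 0.0  # Q(0): no requests at memory with zero processors
    resp = thru = 0.0
    for n in range(1, nprocs + 1):
        resp = demand * (1.0 + queue)  # arrival theorem: an arrival sees Q(n-1)
        thru = n / (think + resp)      # Little's law over the whole cycle
        queue = thru * resp            # Little's law at the memory station
    return resp, thru, queue

if __name__ == "__main__":
    print([round(v, 4) for v in mva(2, 1.0, 4.0)])  # [1.2, 0.3846, 0.4615]
```

With D = 1, Z = 4, and two processors, each access spends 1.2 time units at memory instead of 1, which is the contention-induced slowdown that an MVA-style model estimates.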

Archive | 1990