Akiyoshi Wakatani | Researchain

Publication

Featured research published by Akiyoshi Wakatani.

International Conference on Parallel Architectures and Languages Europe | 1994

A New Approach to Array Redistribution: Strip Mining Redistribution

Akiyoshi Wakatani; Michael Wolfe

Languages such as High Performance Fortran are used to implement parallel algorithms by distributing large data structures across a multicomputer system. To reduce the communication time of array redistribution, we propose a new scheme, strip mining redistribution.
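
The abstract does not describe how the strips are formed. As a minimal sketch, assuming "strip mining" here means the usual loop transformation (splitting an index range into fixed-size chunks) applied to the data one processor sends during a redistribution, the code below computes the send sets for a 1-D BLOCK-to-CYCLIC redistribution and splits each message into strips. The helper names and the `strip` size parameter are hypothetical; this illustrates the general idea, not the paper's algorithm.

```python
def block_owner(i, n, p):
    """Owner of global index i under a BLOCK distribution of n elements on p processors."""
    b = -(-n // p)  # ceil(n / p): block size
    return i // b


def cyclic_owner(i, p):
    """Owner of global index i under a CYCLIC distribution on p processors."""
    return i % p


def send_sets(src, n, p):
    """Global indices that processor `src` must send to each destination when
    redistributing from BLOCK to CYCLIC (the entry for src itself is a local copy)."""
    sets = {dst: [] for dst in range(p)}
    for i in range(n):
        if block_owner(i, n, p) == src:
            sets[cyclic_owner(i, p)].append(i)
    return sets


def strip_mine(indices, strip):
    """Split one message's index list into strips of at most `strip` elements
    (hypothetical: each strip would be sent as a separate, smaller message)."""
    return [indices[k:k + strip] for k in range(0, len(indices), strip)]
```

For example, with 16 elements on 2 processors, processor 0 owns indices 0 to 7 and must send [1, 3, 5, 7] to processor 1; with a strip size of 2 that message becomes [[1, 3], [5, 7]].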

Parallel Computing | 1995

Optimization of array redistribution for distributed memory multicomputers

Akiyoshi Wakatani; Michael Wolfe

Languages such as High Performance Fortran implement parallel algorithms by distributing large data structures across a multicomputer system. To enhance parallelism and reduce communication, it is sometimes beneficial for a programmer to change the distribution between phases of the algorithm. We introduce a new mapping strategy, called the spiral mapping, that reduces the communication overhead of array redistribution. Redistribution using the spiral mapping exploits communication locality and reduces global communication conflicts. We implemented redistribution of two-dimensional arrays using both the standard linear mapping and the spiral mapping; for 1024 × 1024 arrays, redistribution using the spiral mapping is 36% faster than using the linear mapping on a 16-node Intel iPSC/860.
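
The abstract does not define the spiral mapping, so the sketch below does not reproduce it. It only illustrates the communication a 2-D redistribution generates, assuming a change from a row-block (BLOCK, *) to a column-block (*, BLOCK) distribution on p processors with p dividing n: every processor sends one (n/p) × (n/p) sub-block to every processor. `shifted_schedule` is a common contention-avoiding ordering (each step is a permutation, so no receiver gets two messages in the same step), shown as a stand-in, not as the paper's method. All names are hypothetical.

```python
def row_to_column_messages(n, p):
    """For an n x n array redistributed from (BLOCK, *) to (*, BLOCK) on p processors,
    map (src, dst) -> (rows, cols) of the sub-block src sends to dst.
    Assumes p divides n."""
    b = n // p
    msgs = {}
    for src in range(p):          # src owns rows [src*b, (src+1)*b)
        for dst in range(p):      # dst will own cols [dst*b, (dst+1)*b)
            msgs[(src, dst)] = (range(src * b, (src + 1) * b),
                                range(dst * b, (dst + 1) * b))
    return msgs


def shifted_schedule(p):
    """Step k: every processor src sends to (src + k) % p.
    Each step is a permutation of receivers, so no receiver is targeted twice per step."""
    return [[(src, (src + k) % p) for src in range(p)] for k in range(p)]
```

For the 1024 × 1024 array on 16 nodes mentioned above, each sub-block holds 64 × 64 = 4096 elements.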

International Conference on Supercomputing | 1991

Parallel computer ADENART—its architecture and application

Hiroshi Kadota; Katsuyuki Kaneko; Ichiro Okabayashi; Tadashi Okamoto; T. Mimura; Yasuhiro Nakakura; Akiyoshi Wakatani; Masaitsu Nakajima; Junji Nishikawa; Koji Zaiki; Tatsuo Nogi

A new parallel computer for numerical applications, ADENART (previously called ADENA), has been developed. It is composed of 256 processing elements (PEs) and an interconnection network (HXnet). Each PE consists of a dedicated floating-point processor VLSI with a sustained performance of 10 MFLOPS, a communication controller VLSI, and locally distributed memories. The peak performance of the system is, therefore, 2.56 GFLOPS. HXnet supports two efficient data-transfer modes, FAST mode and SLOW mode, both of which are useful for various applications. The practical performance of the ADENART system has been evaluated with several application programs; for a partial differential equation solver, the measured system performance was 475 MFLOPS.
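
The peak figure follows from the numbers quoted above; the short calculation below also shows what fraction of that peak the PDE solver reached. The efficiency figure is derived here and is not stated in the original.

```python
PES = 256              # processing elements (from the text)
MFLOPS_PER_PE = 10     # per-PE figure quoted in the text
MEASURED_MFLOPS = 475  # measured PDE-solver performance (from the text)

peak_mflops = PES * MFLOPS_PER_PE           # 2560 MFLOPS = 2.56 GFLOPS
efficiency = MEASURED_MFLOPS / peak_mflops  # fraction of peak actually achieved
print(f"peak = {peak_mflops / 1000:.2f} GFLOPS, efficiency = {efficiency:.1%}")
# prints: peak = 2.56 GFLOPS, efficiency = 18.6%
```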

Field-Programmable Technology | 2004

A new reconfigurable architecture with smart data-transfer subsystems for the intelligent image processing

Hiroshi Kadota; Yoshiaki Hori; Akiyoshi Wakatani

A new reconfigurable accelerator architecture suited to intelligent image processing is proposed. It implements not only reconfigurable processing-unit blocks but also smart data-transfer subsystems consisting of multistage interconnection networks and special buffers. The subsystem can simultaneously supply any combination of 8 × 8 local image data to arbitrary processing units. Each processing-unit block consists of arrays of arithmetic units that can be reconfigured as parallel adders/subtractors or as multipliers of various precisions. The peak performance of this accelerator is 204 BOPS, which is sufficient for wavelet transforms in real-time intelligent image-processing applications.
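
The abstract does not say which wavelet the accelerator targets. As an illustration only, assuming an unnormalized Haar transform (chosen because it needs nothing but additions and subtractions, which fits the adder/subtractor configuration described above), the sketch below applies one level of a separable 2-D transform to a square block such as an 8 × 8 tile. Function names are hypothetical.

```python
def haar_1d(v):
    """One level of an unnormalized Haar transform on an even-length list:
    pairwise sums followed by pairwise differences (additions/subtractions only)."""
    sums = [v[i] + v[i + 1] for i in range(0, len(v), 2)]
    diffs = [v[i] - v[i + 1] for i in range(0, len(v), 2)]
    return sums + diffs


def haar_2d(block):
    """One level of a separable 2-D Haar transform on a square block (e.g. 8 x 8)."""
    rows = [haar_1d(r) for r in block]             # transform every row
    cols = [haar_1d(list(c)) for c in zip(*rows)]  # then every column
    return [list(r) for r in zip(*cols)]           # transpose back to row order
```

For an 8 × 8 block whose entries equal their column index, every row maps to [1, 5, 9, 13, -1, -1, -1, -1]; the column pass then turns the first four rows into [2, 10, 18, 26, -2, -2, -2, -2] and the last four into zeros.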

Systems and Computers in Japan | 1996

An evaluation of the subframe parallel approach for image clustering

Akiyoshi Wakatani; Yoshiteru Mino; Hiroshi Kadota

This paper describes a high-speed parallel clustering method suitable for use as a preprocessor for motion-picture compression and coding. In the parallel clustering process, multiple pixels are sampled from a single image frame. Pixels are clustered according to their characteristics, such as color and motion vectors, and the clusters are organized and their parameters modified in parallel. This operation is repeated until the frame is optimally segmented into a set of clusters. We propose the subframe parallel approach to increase the parallelism of the clustering method and to avoid load imbalance among parallel tasks. We also evaluate the effectiveness of this approach and show that real-time clustering of motion-picture images is achievable in most cases.
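
The subframe parallel approach can be sketched as follows, assuming each subframe is a horizontal strip of the frame, each pixel is a feature tuple (for example color plus a motion vector), and plain k-means stands in for the paper's clustering rule, which the abstract does not specify. Each strip is clustered independently, so the strips can be processed concurrently. Function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def kmeans(points, centers, iters=10):
    """Plain k-means (stand-in clustering rule): assign each point to its nearest
    center, then move each center to the mean of its points."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            groups[k].append(p)
        # An empty group keeps its previous center.
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers


def cluster_subframes(frame, n_sub, k):
    """Split a frame (list of rows of feature tuples) into n_sub horizontal strips
    and cluster each strip independently, one task per strip."""
    h = len(frame)
    strips = [frame[i * h // n_sub:(i + 1) * h // n_sub] for i in range(n_sub)]

    def work(strip):
        pts = [px for row in strip for px in row]
        return kmeans(pts, pts[:k])  # naive initialization: first k pixels

    with ThreadPoolExecutor() as ex:
        return list(ex.map(work, strips))
```

In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so the thread pool here only shows the task structure. Equal-height strips give each task roughly the same number of pixels, which is one simple way to keep the load balanced.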

Memoirs of the Faculty of Engineering, Kyoto University | 1995

Parallel Programming Language Adetran

Koji Zaiki; Akiyoshi Wakatani; Tadashi Okamoto; Katsuyuki Kaneko; Tatsuo Nogi

The ADENA (Alternating Direction Edition Nexus Array) computer was conceived by Nogi in the late 1970s [1, 2, 3, 4, 5]. At that time, demand for supercomputing was beginning to grow in science and technology; only vector processors were widely accepted as supercomputers, and the ILLIAC-IV, an early parallel computer, was already headed for retirement. Nogi foresaw that in the next generation the vector processor might be replaced or augmented by parallel processing facilities, and he thought it very important to build a new, more sophisticated parallel machine based on new ideas and dedicated specifically to number-crunching scientific computation. Observing the decline of the ILLIAC-IV, he concluded that a parallel computer should not be designed solely around the computational style of explicit schemes for partial differential equations (PDEs), as first imagined by the famous meteorologist L. F. Richardson and later by adherents of his idea, including the developers of the ILLIAC-IV. Nogi thought it might be better to avoid any grid arrangement of processors. We already had some excellent implicit schemes that were clearly more efficient than explicit ones on conventional machines and were therefore gaining acceptance for many complex applications, even though at first glance they appeared inefficient for parallel processing.
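
Why implicit schemes look ill-suited to parallel machines at first glance can be seen in a small sketch. Assuming an ADI-style (alternating direction implicit) scheme, a classic family of implicit methods that this excerpt does not explicitly name, each sweep reduces to tridiagonal solves along grid lines. The Thomas algorithm below solves one such system with a sequential forward and backward recurrence, but separate lines are independent of each other, so they can be distributed across processors. Function names are hypothetical.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system: sub-diagonal a, diagonal b, super-diagonal c,
    right-hand side d (a[0] and c[-1] are ignored). O(n) but inherently sequential."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                 # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_lines(rhs_rows, a, b, c):
    """One implicit sweep: every grid line is an independent tridiagonal solve,
    so the lines (not the steps within a line) are the unit of parallelism."""
    return [thomas(a, b, c, row) for row in rhs_rows]
```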

Archive | 1991