Akira Naruse
Fujitsu
Publications
Featured research published by Akira Naruse.
IEEE International Conference on High Performance Computing, Data and Analytics | 2004
Shinji Sumimoto; Akira Naruse; Kouichi Kumon; Kouji Hosoe; Toshiyuki Shimizu
This work describes the design of a high-performance communication facility, called PM/InfiniBand-FJ, that uses the InfiniBand interconnect for large-scale PC clusters. PM/InfiniBand-FJ was developed to achieve higher application performance than commercial supercomputers while providing comparable availability. Because the InfiniBand specification is designed for communication among servers and I/O devices, several issues arise when using InfiniBand for high-performance computation on PC clusters of more than 1000 nodes. PM/InfiniBand-FJ resolves these issues by extending the original InfiniBand specification. We implemented PM/InfiniBand-FJ on the SCore cluster system software and evaluated its communication and application performance. The results show that a bandwidth of 913.2 MB/s and a round-trip time of 15.6 μs were achieved on a Xeon 2.8 GHz PC with the ServerWorks GC LE chipset. On the NAS Parallel Benchmarks, the 128-node result for IS Class B on PM/InfiniBand-FJ is 1.52 times faster than that of PM/MyrinetXP on a Fujitsu PRIMERGY RX200 PC cluster (Xeon 3.06 GHz).
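The 15.6 μs figure above is a ping-pong round-trip time. As a minimal sketch of how such a measurement works (not the paper's implementation: a Unix socketpair stands in for the InfiniBand transport, and all names here are hypothetical), two endpoints bounce a small message back and forth and the mean wall-clock time per iteration is reported as the RTT:

```c
/* Hypothetical ping-pong RTT sketch; a socketpair replaces InfiniBand. */
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <pthread.h>

#define ITERS 1000

static void *echo_peer(void *arg) {
    int fd = *(int *)arg;
    char buf[8];
    for (int i = 0; i < ITERS; i++) {
        read(fd, buf, sizeof buf);   /* receive ping */
        write(fd, buf, sizeof buf);  /* send pong back */
    }
    return NULL;
}

/* Returns the mean round-trip time in microseconds over ITERS iterations. */
double ping_pong_rtt_us(void) {
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    pthread_t peer;
    pthread_create(&peer, NULL, echo_peer, &sv[1]);

    char buf[8] = {0};
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < ITERS; i++) {
        write(sv[0], buf, sizeof buf);  /* ping */
        read(sv[0], buf, sizeof buf);   /* wait for pong */
    }
    gettimeofday(&t1, NULL);
    pthread_join(peer, NULL);
    close(sv[0]);
    close(sv[1]);

    double elapsed_us = (t1.tv_sec - t0.tv_sec) * 1e6
                      + (t1.tv_usec - t0.tv_usec);
    return elapsed_us / ITERS;
}
```

A local socketpair RTT will of course differ from InfiniBand numbers; the point is only the measurement structure, which matches how MPI-level latency benchmarks are typically run.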
European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 2009
Shinji Sumimoto; Kohta Nakashima; Akira Naruse; Kouichi Kumon; Takashi Yasui; Yoshikazu Kamoshida; Hiroya Matsuba; Atsushi Hori; Yutaka Ishikawa
This paper describes the design and implementation of a seamless MPI runtime environment, called MPI-Adapter, that provides MPI program binary portability across different MPI runtime environments. MPI-Adapter enables an MPI binary program to run on different MPI implementations. It is implemented as a dynamically loadable module that captures all MPI function calls at run time and invokes the corresponding functions of a different MPI implementation, using data-type translation techniques. A prototype system was implemented for Linux PC clusters to evaluate the effectiveness of MPI-Adapter. Evaluation on a Xeon (3.8 GHz) cluster shows that the translation overhead of an MPI send (or receive) is around 0.028 μs, and that the performance degradation of MPI-Adapter on the NAS Parallel Benchmark IS is negligibly small.
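The data-type translation mentioned above is needed because MPI implementations use incompatible ABIs for opaque handles (e.g., small integers in one implementation, pointers in another). A minimal sketch of the idea, with hypothetical names (the real MPI-Adapter intercepts calls via a dynamically loaded module; this only illustrates the handle-mapping step):

```c
/* Sketch: translating datatype handles between two MPI ABIs.
   All types and names are illustrative, not MPI-Adapter's actual API. */
#include <stddef.h>

/* Application binary's ABI: small integer handles (MPICH-style). */
typedef int app_datatype_t;
#define APP_INT    1
#define APP_DOUBLE 2

/* Host MPI's ABI: opaque pointer handles (Open MPI-style). */
typedef struct host_datatype { int size_bytes; } host_datatype_t;

static host_datatype_t host_int    = { 4 };
static host_datatype_t host_double = { 8 };

/* Translation table indexed by the application-side handle. */
static host_datatype_t *dtype_table[] = { NULL, &host_int, &host_double };

/* Map an app-side handle to the host implementation's handle,
   or NULL for an unknown handle. An intercepted MPI_Send would
   translate its datatype argument this way before calling the
   host library's send function. */
host_datatype_t *translate_datatype(app_datatype_t d) {
    if (d < APP_INT || d > APP_DOUBLE)
        return NULL;
    return dtype_table[d];
}
```

In the full scheme, the interposing module would perform a lookup like this for every handle-typed argument (datatypes, communicators, requests) on the way into the host MPI library, and the reverse mapping on the way out.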
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2013
Masahiro Miwa; Kohta Nakashima; Akira Naruse
To overlap computation and communication with non-blocking collective communication, a sequence of communications must progress asynchronously. A naive implementation uses a separate communication thread running in the background of the computation thread. However, if the total number of threads exceeds the number of physical cores, context switches degrade the performance of the computation thread. Simultaneous multithreading (SMT) can be used to avoid this problem, but the busy polling commonly used for incoming-message detection also degrades the computation thread's performance. In this paper, we propose an incoming-message detection method that uses the MONITOR/MWAIT instructions to reduce this degradation. Experimental results show that the performance of the computation thread improves substantially compared with busy polling, at the cost of only a small increase in latency.
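The contrast above between busy polling and MONITOR/MWAIT-style waiting can be sketched as follows. MONITOR/MWAIT is x86-specific and normally requires kernel assistance, so this portable sketch (with hypothetical names) parks the waiter on a condition variable instead; like MWAIT, a blocked wait frees the SMT sibling's shared execution resources for the computation thread, whereas the spin loop consumes them:

```c
/* Sketch: two strategies for detecting an incoming message.
   The condvar wait is a portable stand-in for MONITOR/MWAIT. */
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_int      arrived;   /* set when a "message" arrives */
    pthread_mutex_t mu;
    pthread_cond_t  cv;
} msg_slot_t;

/* Busy polling: spins until the flag flips, hurting the
   computation thread co-scheduled on the same physical core. */
void wait_busy(msg_slot_t *s) {
    while (!atomic_load_explicit(&s->arrived, memory_order_acquire))
        ;  /* spin */
}

/* Blocking wait: sleeps until notified (MWAIT-like behavior),
   leaving the core's resources to the computation thread. */
void wait_blocking(msg_slot_t *s) {
    pthread_mutex_lock(&s->mu);
    while (!atomic_load_explicit(&s->arrived, memory_order_acquire))
        pthread_cond_wait(&s->cv, &s->mu);
    pthread_mutex_unlock(&s->mu);
}

/* Called on message arrival; plays the role of the store that
   wakes a MONITOR'd waiter. */
void notify_arrival(msg_slot_t *s) {
    pthread_mutex_lock(&s->mu);
    atomic_store_explicit(&s->arrived, 1, memory_order_release);
    pthread_cond_signal(&s->cv);
    pthread_mutex_unlock(&s->mu);
}
```

The trade-off the paper measures is visible here: the blocking wait adds a wakeup step to the message-detection path (a small latency increase), but stops stealing cycles from the computation thread sharing the core.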
Archive | 2004
Akira Naruse; Kouichi Kumon; Mitsuru Sato
Archive | 2009
Akira Naruse
USENIX Annual Technical Conference | 2002
Shuji Yamamura; Akira Hirai; Mitsuru Sato; Masao Yamamoto; Akira Naruse; Kouichi Kumon
Archive | 1998
Akira Naruse; Kouichi Kumon; Mitsuru Sato
Archive | 2012
Kohta Nakashima; Akira Naruse
Archive | 2011
Kohta Nakashima; Akira Naruse
Archive | 2007
Akira Naruse