Akira Naruse
Fujitsu
Publications
Featured research published by Akira Naruse.
IEEE International Conference on High Performance Computing, Data and Analytics | 2004
Shinji Sumimoto; Akira Naruse; Kouichi Kumon; Kouji Hosoe; Toshiyuki Shimizu
This work describes the design of a high-performance communication facility, called PM/InfiniBand-FJ, that uses the InfiniBand interconnect for large-scale PC clusters. PM/InfiniBand-FJ was developed to achieve higher application performance than commercial supercomputers while providing comparable availability. Because the InfiniBand specification is designed for communication among servers and I/O devices, several issues arise when using InfiniBand for high-performance computation on PC clusters of more than 1000 nodes. PM/InfiniBand-FJ resolves these issues by extending the original InfiniBand specification. We implemented PM/InfiniBand-FJ on the SCore cluster system software and evaluated its communication and application performance. The results show that a bandwidth of 913.2 MB/s and a round-trip time of 15.6 μs were achieved on a Xeon 2.8 GHz PC with the ServerWorks GC LE chipset. On the NAS Parallel Benchmarks, the 128-node result for IS Class B on PM/InfiniBand-FJ is 1.52 times faster than that of PM/MyrinetXP on a Fujitsu PRIMERGY RX200 PC cluster (Xeon 3.06 GHz).
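The 15.6 μs figure above is a ping-pong round-trip time. As a minimal sketch of how such a measurement works (not the paper's implementation: a Unix socketpair stands in for the InfiniBand transport, and all names here are hypothetical), two endpoints bounce a small message back and forth and the mean wall-clock time per iteration is reported as the RTT:

```c
/* Hypothetical ping-pong RTT sketch; a socketpair replaces InfiniBand. */
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <pthread.h>

#define ITERS 1000

static void *echo_peer(void *arg) {
    int fd = *(int *)arg;
    char buf[8];
    for (int i = 0; i < ITERS; i++) {
        read(fd, buf, sizeof buf);   /* receive ping */
        write(fd, buf, sizeof buf);  /* send pong back */
    }
    return NULL;
}

/* Returns the mean round-trip time in microseconds over ITERS iterations. */
double ping_pong_rtt_us(void) {
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    pthread_t peer;
    pthread_create(&peer, NULL, echo_peer, &sv[1]);

    char buf[8] = {0};
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < ITERS; i++) {
        write(sv[0], buf, sizeof buf);  /* ping */
        read(sv[0], buf, sizeof buf);   /* wait for pong */
    }
    gettimeofday(&t1, NULL);
    pthread_join(peer, NULL);
    close(sv[0]);
    close(sv[1]);

    double elapsed_us = (t1.tv_sec - t0.tv_sec) * 1e6
                      + (t1.tv_usec - t0.tv_usec);
    return elapsed_us / ITERS;
}
```

A local socketpair RTT will of course differ from InfiniBand numbers; the point is only the measurement structure, which matches how MPI-level latency benchmarks are typically run.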
European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 2009
Shinji Sumimoto; Kohta Nakashima; Akira Naruse; Kouichi Kumon; Takashi Yasui; Yoshikazu Kamoshida; Hiroya Matsuba; Atsushi Hori; Yutaka Ishikawa
This paper describes the design and implementation of a seamless MPI runtime environment, called MPI-Adapter, that provides MPI program binary portability across different MPI runtime environments. MPI-Adapter enables an MPI binary program to run on different MPI implementations. It is implemented as a dynamically loadable module that captures all MPI function calls at run time and invokes the corresponding functions of a different MPI implementation, using data-type translation techniques. A prototype system was implemented for Linux PC clusters to evaluate the effectiveness of MPI-Adapter. Evaluation on a Xeon (3.8 GHz) cluster shows that the translation overhead of an MPI send (or receive) is around 0.028 μs, and that the performance degradation of MPI-Adapter on the NAS Parallel Benchmark IS is negligibly small.
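The data-type translation mentioned above is needed because MPI implementations use incompatible ABIs for opaque handles (e.g., small integers in one implementation, pointers in another). A minimal sketch of the idea, with hypothetical names (the real MPI-Adapter intercepts calls via a dynamically loaded module; this only illustrates the handle-mapping step):

```c
/* Sketch: translating datatype handles between two MPI ABIs.
   All types and names are illustrative, not MPI-Adapter's actual API. */
#include <stddef.h>

/* Application binary's ABI: small integer handles (MPICH-style). */
typedef int app_datatype_t;
#define APP_INT    1
#define APP_DOUBLE 2

/* Host MPI's ABI: opaque pointer handles (Open MPI-style). */
typedef struct host_datatype { int size_bytes; } host_datatype_t;

static host_datatype_t host_int    = { 4 };
static host_datatype_t host_double = { 8 };

/* Translation table indexed by the application-side handle. */
static host_datatype_t *dtype_table[] = { NULL, &host_int, &host_double };

/* Map an app-side handle to the host implementation's handle,
   or NULL for an unknown handle. An intercepted MPI_Send would
   translate its datatype argument this way before calling the
   host library's send function. */
host_datatype_t *translate_datatype(app_datatype_t d) {
    if (d < APP_INT || d > APP_DOUBLE)
        return NULL;
    return dtype_table[d];
}
```

In the full scheme, the interposing module would perform a lookup like this for every handle-typed argument (datatypes, communicators, requests) on the way into the host MPI library, and the reverse mapping on the way out.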
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2013
Masahiro Miwa; Kohta Nakashima; Akira Naruse
To overlap computation and communication with non-blocking collective communication, a sequence of communications must progress asynchronously. A naive implementation uses a separate communication thread running in the background of the computation thread. However, if the total number of threads exceeds the number of physical cores, context switches degrade the performance of the computation thread. Simultaneous multithreading (SMT) can be used to avoid this problem, but the busy polling commonly used for incoming-message detection also degrades the computation thread's performance. In this paper, we propose an incoming-message detection method that uses the MONITOR/MWAIT instructions to reduce this degradation. Experimental results show that the performance of the computation thread improves substantially compared with busy polling, at the cost of only a small increase in latency.
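The contrast above between busy polling and MONITOR/MWAIT-style waiting can be sketched as follows. MONITOR/MWAIT is x86-specific and normally requires kernel assistance, so this portable sketch (with hypothetical names) parks the waiter on a condition variable instead; like MWAIT, a blocked wait frees the SMT sibling's shared execution resources for the computation thread, whereas the spin loop consumes them:

```c
/* Sketch: two strategies for detecting an incoming message.
   The condvar wait is a portable stand-in for MONITOR/MWAIT. */
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_int      arrived;   /* set when a "message" arrives */
    pthread_mutex_t mu;
    pthread_cond_t  cv;
} msg_slot_t;

/* Busy polling: spins until the flag flips, hurting the
   computation thread co-scheduled on the same physical core. */
void wait_busy(msg_slot_t *s) {
    while (!atomic_load_explicit(&s->arrived, memory_order_acquire))
        ;  /* spin */
}

/* Blocking wait: sleeps until notified (MWAIT-like behavior),
   leaving the core's resources to the computation thread. */
void wait_blocking(msg_slot_t *s) {
    pthread_mutex_lock(&s->mu);
    while (!atomic_load_explicit(&s->arrived, memory_order_acquire))
        pthread_cond_wait(&s->cv, &s->mu);
    pthread_mutex_unlock(&s->mu);
}

/* Called on message arrival; plays the role of the store that
   wakes a MONITOR'd waiter. */
void notify_arrival(msg_slot_t *s) {
    pthread_mutex_lock(&s->mu);
    atomic_store_explicit(&s->arrived, 1, memory_order_release);
    pthread_cond_signal(&s->cv);
    pthread_mutex_unlock(&s->mu);
}
```

The trade-off the paper measures is visible here: the blocking wait adds a wakeup step to the message-detection path (a small latency increase), but stops stealing cycles from the computation thread sharing the core.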
Archive | 2004
Akira Naruse; Kouichi Kumon; Mitsuru Sato
Archive | 2009
Akira Naruse
USENIX Annual Technical Conference | 2002
Shuji Yamamura; Akira Hirai; Mitsuru Sato; Masao Yamamoto; Akira Naruse; Kouichi Kumon
Archive | 1998
Akira Naruse; Kouichi Kumon; Mitsuru Sato
Archive | 2012
Kohta Nakashima; Akira Naruse
Archive | 2011
Kohta Nakashima; Akira Naruse
Archive | 2007
Akira Naruse