Kazumi Yoshinaga | Researchain

Publication

Featured research published by Kazumi Yoshinaga.

parallel, distributed and network-based processing | 2014

Multithreaded Two-Phase I/O: Improving Collective MPI-IO Performance on a Lustre File System

Yuichi Tsujita; Kazumi Yoshinaga; Atsushi Hori; Mikiko Sato; Mitaro Namiki; Yutaka Ishikawa

ROMIO, a representative MPI-IO implementation, has been widely used in recent large-scale parallel computations. The two-phase I/O optimization scheme of ROMIO improves I/O performance for non-contiguous access patterns; however, this scheme still has room for improvement before it is suitable for recent data-intensive computing. We propose overlapping data exchange operations with file I/O operations by using a multithreaded scheme to achieve further I/O throughput improvement. In a performance evaluation of collective write operations on a Lustre file system of a Linux PC cluster, the multithreaded two-phase I/O shows up to 60% improvement relative to the original two-phase I/O.
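The overlap described in this abstract can be illustrated with a minimal sketch. This is not ROMIO code and not the authors' implementation: the function `two_phase_write`, the `depth` parameter, and the in-memory sink are hypothetical names chosen for illustration, and the data exchange phase is simulated by copying byte chunks rather than by MPI communication.

```python
import io
import queue
import threading


def two_phase_write(chunks, sink, depth=2):
    """Overlap a simulated data-exchange phase with a file-write phase.

    chunks: iterable of bytes objects standing in for exchanged data.
    sink:   writable binary file-like object.
    depth:  number of buffers allowed in flight (hypothetical knob).
    """
    buffers = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def writer():
        # I/O thread: drains buffers while the main thread keeps exchanging.
        while True:
            buf = buffers.get()
            if buf is done:
                break
            sink.write(buf)

    io_thread = threading.Thread(target=writer)
    io_thread.start()

    total = 0
    for chunk in chunks:
        # Stand-in for the exchange phase (real MPI-IO would communicate here).
        buffers.put(bytes(chunk))
        total += len(chunk)

    buffers.put(done)
    io_thread.join()
    return total


# Usage: three small chunks written through the overlapped pipeline.
sink = io.BytesIO()
written = two_phase_write([b"ab", b"cd", b"ef"], sink)
```

The bounded queue is the design point: the exchange side can run at most `depth` buffers ahead of the writer, so the two phases proceed concurrently without unbounded memory growth.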

EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface | 2012

Delegation-Based MPI communications for a hybrid parallel computer with many-core architecture

Kazumi Yoshinaga; Yuichi Tsujita; Atsushi Hori; Mikiko Sato; Mitaro Namiki; Yutaka Ishikawa

Many-core architecture is drawing much attention in the HPC community towards the Exascale era. Many ongoing research activities using GPUs or the Many Integrated Core (MIC) architecture from Intel exist worldwide. Many-core CPUs have a great impact on computing performance; however, they are not well suited to the heavy communication and I/O that MPI operations generally require. We have been focusing on the MIC architecture as many-core CPUs to realize a hybrid parallel computer in conjunction with multi-core CPUs. We propose a delegation mechanism for scalable MPI communications issued on many-core CPUs, so that the delegated operations are carried out on multi-core CPUs. This architecture also minimizes memory utilization of not only many-core CPUs but also multi-core ones by deploying multi-layered MPI communicator information. We evaluated the delegation mechanism on an emulated hybrid computing environment. In this paper, we present our design and its performance evaluation on the emulated environment.

international conference on parallel processing | 2008

Utilizing Multi-Networks Task Scheduler for Streaming Applications

Kazumi Yoshinaga; Yoshiyuki Uratani; Hiroshi Koide

This paper proposes and evaluates a new task scheduling method for parallel and distributed applications in an environment consisting of multiple networks having different characteristics. The proposed method can schedule both streaming applications and non-streaming applications effectively at the same time, since it selects the most suitable networks for the communications of tasks and considers the changing loads of the networks. The experimental results show the proposed method reduced the total execution time of a practical streaming application. The dispersion of the execution time was also suppressed even if the network bandwidth was dynamically changing. This characteristic is very important when this method is applied to more complicated task scheduling methods.

international workshop on runtime and operating systems for supercomputers | 2012

A design of hybrid operating system for a parallel computer with multi-core and many-core processors

Mikiko Sato; Go Fukazawa; Kiyohiko Nagamine; Ryuichi Sakamoto; Mitaro Namiki; Kazumi Yoshinaga; Yuichi Tsujita; Atsushi Hori; Yutaka Ishikawa

This paper describes the design of an operating system to manage a hybrid computer system architecture with multi-core and many-core processors for Exa-scale computing. In this study, a host operating system (Host OS) on a multi-core processor performs some functions of a lightweight operating system (LWOS) on a many-core processor, so that the many-core processor can be dedicated to executing the application program. In particular, to ensure that LWOS execution does not disturb the application program executed on the many-core processor, functions such as process management, memory management, and I/O management are delegated to the Host OS. To demonstrate this design, we built a prototype system of a computer equipped with a multi-core processor and a many-core processor using an Intel Xeon dual-core processor system. Linux and the original LWOS were loaded onto the respective processors, and the overhead for executing the program for LWOS from Linux was evaluated. Using this prototype system, the LWOS process can be started with at least 110 μsec overhead for the many-core program.

intelligent networking and collaborative systems | 2011

MVA Modeling of Multi-core Server Distributed Systems

Yuki Nakamizo; Hiroshi Koide; Kazumi Yoshinaga; Dirceu Cavendish; Yuji Oie

In this paper, we propose an extension, for multi-core servers, to our previous MVA-based methodology for estimating the performance of transactions executed on multi-server systems. The extension is based on the characterization of message processing service times for each server core under zero-load conditions, and on building an MVA model that accounts for each available core. Core utilization is characterized, as well as message routing probabilities within the multi-core machine. We illustrate the extended methodology on a prototype multi-server system.
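For readers unfamiliar with MVA (Mean Value Analysis), the following is a minimal sketch of the textbook exact single-class MVA recursion for a closed queueing network. It is not the model from the paper: treating each core as one queueing station whose service demand is its routing probability times its zero-load service time is an assumption made here for illustration, and the function name `mva` and the numbers in the usage lines are hypothetical.

```python
def mva(service_demands, n_customers, think_time=0.0):
    """Exact single-class MVA for a closed network of queueing stations.

    service_demands: per-station demand in seconds
                     (visit ratio times zero-load service time).
    Returns (throughput, per-station residence times, per-station queue lengths).
    """
    queue_len = [0.0] * len(service_demands)
    residence = [0.0] * len(service_demands)
    throughput = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving job sees the queues of the (n-1)-job network.
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue_len)]
        # Throughput of the whole network with n jobs in circulation.
        throughput = n / (think_time + sum(residence))
        # Little's law applied to each station.
        queue_len = [throughput * r for r in residence]
    return throughput, residence, queue_len


# Usage: two cores, each receiving half of the messages (routing probability 0.5)
# with a 1.0 s zero-load service time, hence a demand of 0.5 s per core.
x, r, q = mva([0.5, 0.5], n_customers=2)
```

With two jobs in this symmetric example the recursion gives a throughput of 4/3 jobs per second and one job queued at each core on average.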

intelligent networking and collaborative systems | 2010

Characterizing Transactions with Data Transfer on Multi-server Systems

Kazumi Yoshinaga; Washizu Shohei; Yoshiyuki Uratani; Hiroshi Koide; Dirceu Cavendish; Yuji Oie

In this paper, we propose an extension to our previous MVA-based methodology for estimating the performance of transactions executed on multi-server systems, covering transactions that involve variable data transfers. The extension is based on the characterization of data transfers between servers under zero-load conditions, and on a curve fitting step that captures the dependency of server message processing time on the size of the data transferred. We illustrate the extended methodology on two prototype multi-server systems.
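The curve fitting step can be sketched as follows. The abstract does not state which functional form the authors fit, so a straight line (fixed overhead plus per-byte cost) is assumed here purely for illustration. The function `fit_linear` is a hypothetical helper, and the measurements below are synthetic values generated from a known line, not results from the paper.

```python
def fit_linear(sizes, times):
    """Ordinary least-squares fit of times ~ intercept + slope * sizes."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic zero-load measurements: 2 ms fixed cost plus 1 microsecond per byte.
sizes = [0, 1000, 2000, 4000]            # bytes transferred
times = [0.002, 0.003, 0.004, 0.006]     # seconds of processing time
intercept, slope = fit_linear(sizes, times)
```

The fitted intercept and slope could then supply a size-dependent service time to an MVA model in place of a single constant.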

Proceedings of the 22nd European MPI Users' Group Meeting | 2015

Sliding Substitution of Failed Nodes

Atsushi Hori; Kazumi Yoshinaga; Thomas Herault; Aurelien Bouteiller; George Bosilca; Yutaka Ishikawa

This paper considers the questions of how spare nodes should be allocated, how to substitute them for faulty nodes, and how much the communication performance is affected by such a substitution. The third question stems from the modification of the rank mapping by node substitutions, which can incur additional message collisions. In a stencil computation, rank mapping is done in a straightforward way on a Cartesian network without incurring any message collisions. However, once a substitution has occurred, the node-rank mapping may be destroyed. Therefore, these questions must be answered in a way that minimizes the degradation of communication performance. In this paper, several spare-node allocation and node-substitution methods will be proposed, analyzed, and compared in terms of communication performance following the substitution. It will be shown that when a failure occurs, the peer-to-peer (P2P) communication performance on the K computer can be slowed by a factor of three and collective performance can be cut in half. On BG/Q, P2P performance can be slowed by a factor of five and collective performance can be slowed by a factor of ten. However, those numbers can be reduced by using an appropriate substitution method.

parallel, distributed and network-based processing | 2013

A Delegation Mechanism on Many-Core Oriented Hybrid Parallel Computers for Scalability of Communicators and Communications in MPI

Kazumi Yoshinaga; Yuichi Tsujita; Atsushi Hori; Mikiko Sato; Mitaro Namiki; Yutaka Ishikawa

This paper describes a delegation-based, high-throughput MPI communication mechanism under tough memory utilization constraints on a many-core oriented hybrid parallel computer. Towards the Exascale era, attention is focused on hybrid parallel computers that combine many-core and multi-core architectures on the same node. Although many-core architectures such as GPUs or Intel MIC have high potential computing power owing to their large number of computing cores, per-core computing power is lower than that of multi-core CPUs. Furthermore, the memory resources available to many-core CPUs are considerably smaller than those available to multi-core CPUs. Thus we may incur a penalty in memory utilization in MPI communications when we use a normal MPI library. Here we deploy a delegatee process on each node to merge MPI communications and minimize memory utilization for an MPI communicator. Another advantage of the delegatee process scheme is the minimization of memory utilization on many-core CPUs, achieved by delegating MPI requests to the associated delegatee process on multi-core CPUs. In this paper, we show the performance advantages and effective resource utilization of our proposed scheme compared with the original MPI implementation.

ieee international conference on high performance computing data and analytics | 2012