Yoshiyuki Morie | Researchain

Publication

Featured researches published by Yoshiyuki Morie.

parallel, distributed and network-based processing | 2015

Channel Interface: A Primitive Model for Memory Efficient Communication

Takeshi Nanri; Takeshi Soga; Yuichiro Ajima; Yoshiyuki Morie; Hiroaki Honda; Taizo Kobayashi; Toshiya Takami; Shinji Sumimoto

Although systems are growing larger on the way to exascale computation, the amount of memory available on each computing node is expected to remain the same or to decrease. Memory efficiency is therefore becoming an important issue for achieving scalability. This paper points out the memory inefficiency of the de facto standard parallel programming model, the Message Passing Interface (MPI). To solve this problem, the paper introduces the channel interface. This interface lets programmers allocate and deallocate channels as needed, so that the program consumes just enough memory for communication. In addition, by keeping the message transfer supported by a channel as simple as possible, the interface minimizes the memory consumption and overhead of message handling. The paper presents a sample implementation of the interface and examines its memory efficiency using models of memory consumption and performance.

international conference on simulation and modeling methodologies technologies and applications | 2016

NSIM-ACE: An interconnection network simulator for evaluating remote direct memory access

Ryutaro Susukita; Yoshiyuki Morie; Takeshi Nanri; Hidetomo Shibamura

Network simulation is an important technique for designing interconnection networks and communication libraries. It is also useful for analyzing the internal communication behavior of parallel applications. This paper introduces a new interconnection network simulator, NSIM-ACE. The simulator makes it possible to evaluate RDMA directly, a capability that existing simulators lack. For ease of use, NSIM-ACE also provides a user interface similar to that of RDMA-based parallel programs. The experimental evaluation indicates that the simulation accuracy is sufficient to compare the performance of some RDMA-based algorithms and that the simulator can predict performance scalability for non-existent networks.

international conference on distributed computing systems | 2016

Memory Efficient One-Sided Communucation Library "ACP" in Globary Memory on Raspberry Pi 2

Yoshiyuki Morie; Hiroaki Honda; Takeshi Nanri; Taizo Kobayashi; Hidetomo Shibamura; Ryutaro Susukita; Yuichiro Ajima

Until now, communications in parallel programs for High Performance Computing (HPC) and Distributed Computing (DC) have mostly been written with two-sided communication interfaces based on a pair of operations, Send and Receive. Since such an interface requires explicit synchronization between both sides of the communication, optimization techniques such as overlapping often cannot be expressed efficiently. One-sided communication interfaces, on the other hand, are becoming important as a way to describe asynchronous communications that overlap heavily with computation. This demonstration introduces one such interface, Advanced Communication Primitives (ACP). ACP is a portable interface that supports UDP, IBverbs of InfiniBand, and the Tofu library of the K Computer. It is also designed to be memory efficient. For example, with 10 thousand processes, the memory consumption of ACP over UDP is estimated to be less than 1MB. Since the number of computational elements is increasing more rapidly than the amount of available memory, this memory efficiency is becoming one of the keys for parallel programs in HPC and DC. To show this characteristic, we run the ACP library on Raspberry Pi 2 and examine its performance and memory consumption.

ieee international conference on high performance computing data and analytics | 2012

Task Allocation Optimization for Neighboring Communication on Fat Tree

Yoshiyuki Morie; Takeshi Nanri

This paper proposes a task allocation optimization for neighboring communications on a fat tree. The proposed method finds an appropriate task allocation that reduces contentions to achieve better communication performance. Since neighboring communications assume that the logical topology of tasks is a mesh or torus, optimization of the task allocation on a tree-based physical topology is not straightforward. This paper describes the proposed task allocation optimization method, which considers the contentions on links for each given allocation to determine the bottleneck link. The method then finds the allocation that achieves as wide a bandwidth as possible at the bottleneck link. For comparison, three other allocation methods, TAHB, random, and default, are also examined. The experimental results show that the proposed method achieves the same or better performance than these three methods. For example, task allocation with our method was up to 45% faster than that with the TAHB method. The advantage of our method over the TAHB method depends on the number of uplinks on each leaf switch. Unlike TAHB, our method can appropriately consider multiple links.
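
The idea of scoring an allocation by its most contended link can be illustrated with a minimal sketch. It is not the authors' method: it assumes a two-level fat tree with a single uplink per leaf switch, a logical 1-D ring of tasks, node numbering where `node // nodes_per_leaf` gives the leaf switch, and a brute-force search that is only feasible for a handful of tasks. All function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def ring_neighbors(num_tasks):
    """Logical 1-D torus: each task sends to its right-hand neighbor."""
    return [(t, (t + 1) % num_tasks) for t in range(num_tasks)]


def link_loads(allocation, nodes_per_leaf, messages):
    """Count messages crossing each leaf switch's up and down link.

    allocation[t] is the physical node hosting task t. The leaf switch of a
    node is assumed to be node // nodes_per_leaf.
    """
    loads = Counter()
    for src_task, dst_task in messages:
        src_leaf = allocation[src_task] // nodes_per_leaf
        dst_leaf = allocation[dst_task] // nodes_per_leaf
        if src_leaf != dst_leaf:  # traffic inside one leaf never uses an uplink
            loads[("up", src_leaf)] += 1
            loads[("down", dst_leaf)] += 1
    return loads


def bottleneck(allocation, nodes_per_leaf, messages):
    """Load on the most contended link (0 if nothing leaves a leaf switch)."""
    loads = link_loads(allocation, nodes_per_leaf, messages)
    return max(loads.values(), default=0)


def best_allocation(num_tasks, nodes_per_leaf):
    """Exhaustively pick the allocation with the lightest bottleneck link."""
    messages = ring_neighbors(num_tasks)
    return min(permutations(range(num_tasks)),
               key=lambda alloc: bottleneck(alloc, nodes_per_leaf, messages))
```

With four tasks and two nodes per leaf, the contiguous allocation `(0, 1, 2, 3)` puts one message on each uplink, while the interleaved allocation `(0, 2, 1, 3)` puts two on each, so the search prefers the former. Handling several uplinks per leaf switch, which the abstract names as the advantage over TAHB, is outside this sketch.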

computational sciences and optimization | 2011

A Method for Predicting a Penalty of Contentions by Considering Priorities of Routing among Packets on Direct Interconnection Network

Yoshiyuki Morie; Takeshi Nanri; Ryutaro Susukita

Contentions can significantly degrade communication performance on network topologies such as 3D mesh/torus. On these topologies, it is important to establish a method for predicting the penalty of contentions to enable precise and fast tuning of communications. For example, tuning techniques such as dynamically choosing the faster algorithm for collective communications or finding an appropriate task allocation depend on the precision of the performance prediction. This research proposes a method for predicting the penalty of contentions through detailed analysis of packet priority on a direct interconnection network. Usually, the penalty of contentions is predicted by counting the number of communications issued on the same link at the same time. However, the actual delay caused by contentions depends on how messages are arbitrated. A message is divided into packets and transmitted, and contentions between those packets are arbitrated by applying a priority to each packet. Therefore, when contentions occur, the message with the lower priority is blocked. By considering these priorities, the proposed method predicts the penalty of contentions more accurately and faster. In the evaluation experiment, the penalty predicted without considering packet priority was twice or more than that predicted by simulation. In contrast, the proposed method predicted the penalty to within 20% of the simulation result. Moreover, the prediction by the proposed method executes in a sufficiently short time.
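
The conventional estimate that the abstract describes, counting the communications that share a link, can be sketched as follows. This shows only the baseline, not the priority-aware method proposed in the paper. The 2-D mesh, the dimension-order (X then Y) routing, and all function names are assumptions made for illustration.

```python
from collections import Counter


def xy_route(src, dst):
    """Directed links visited by dimension-order routing (X first, then Y)."""
    x, y = src
    links = []
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def link_sharing(messages):
    """Number of simultaneous messages that traverse each directed link."""
    usage = Counter()
    for src, dst in messages:
        usage.update(xy_route(src, dst))
    return usage


def naive_penalty(messages):
    """Baseline slowdown per message: the largest sharing count on its route."""
    usage = link_sharing(messages)
    return [max((usage[link] for link in xy_route(src, dst)), default=1)
            for src, dst in messages]
```

For the messages `(0,0)->(2,0)`, `(1,0)->(2,0)` and `(0,1)->(0,2)`, the first two share the link from `(1,0)` to `(2,0)`, so this baseline assigns them a factor of 2 and the third a factor of 1. According to the abstract, counts of this kind overestimated the penalty by a factor of two or more relative to simulation, which is the gap the priority analysis addresses.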

COMPUTATION IN MODERN SCIENCE AND ENGINEERING: Proceedings of the International Conference on Computational Methods in Science and Engineering 2007 (ICCMSE 2007): VOLUME 2, PARTS A and B | 2008

SMMH—A Parallel Heuristic for Combinatorial Optimization Problems

Guilherme de Melo Baptista Domingues; Yoshiyuki Morie; Feng Long Gu; Takeshi Nanri; Kazuaki Murakami

Finding one or more optimal solutions to a combinatorial optimization problem relies on algorithm instances that usually have to explore a very large search space. Heuristic search based on High-Order Hopfield neural networks is a widely deployed technique for such search spaces. A powerful analogy can be drawn between the dynamic evolution of a physical spin-glass system minimizing its own energy and the minimization of the network's energy function. This paper presents a new approach for solving combinatorial optimization problems through parallel simulations, based on a High-Order Hopfield neural network using the MPI specification.
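
The energy-minimization analogy can be made concrete with a small sketch. It is deliberately simpler than the paper's setting: it uses an ordinary second-order Hopfield network rather than a high-order one, runs serially rather than over MPI, and is not the SMMH heuristic itself. It assumes a symmetric weight matrix with a zero diagonal and spins in {-1, +1}; the function names are hypothetical.

```python
import random


def energy(weights, state):
    """E = -1/2 * sum_ij w_ij * s_i * s_j for a second-order Hopfield network."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def relax(weights, state, sweeps=10, seed=0):
    """Asynchronous updates: align each spin with the sign of its local field.

    With symmetric weights and a zero diagonal, no single update can raise
    the energy, so the state settles into a local minimum.
    """
    rng = random.Random(seed)
    state = list(state)
    order = list(range(len(state)))
    for _ in range(sweeps):
        rng.shuffle(order)
        changed = False
        for i in order:
            field = sum(weights[i][j] * state[j] for j in range(len(state)))
            new_spin = 1 if field >= 0 else -1
            if new_spin != state[i]:
                state[i] = new_spin
                changed = True
        if not changed:  # every spin already agrees with its field
            break
    return state
```

Encoding an optimization problem then amounts to choosing weights whose low-energy states correspond to good solutions; a high-order network extends the energy with products of more than two spins.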

Applied Categorical Structures | 2007

Optimization of MPI rank allocation considering communication timing for reducing contention

Yoshiyuki Morie (森江 善之); Naoki Sueyasu (末安 直樹); Toru Matsumoto (松本 透); Takeshi Nanri (南里 豪志); Hiroaki Ishihata (石畑 宏明); Koji Inoue (井上 弘士); Kazuaki Murakami (村上 和彰)

international conference on parallel and distributed computing and networks | 2011