Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ryousei Takano is active.

Publication


Featured research published by Ryousei Takano.


international conference on cluster computing | 2006

Efficient MPI Collective Operations for Clusters in Long-and-Fast Networks

Motohiko Matsuda; Tomohiro Kudoh; Yuetsu Kodama; Ryousei Takano; Yutaka Ishikawa

Several MPI systems for grid environments, in which clusters are connected by wide-area networks, have been proposed. However, the collective communication algorithms in such MPI systems assume relatively low-bandwidth wide-area networks, and they are not designed for the fast wide-area networks that are becoming available. On the other hand, for cluster MPI systems, a bcast algorithm by van de Geijn et al. and an allreduce algorithm by Rabenseifner have been proposed, which are efficient in a high-bisection-bandwidth environment. We modify those algorithms so as to effectively utilize fast wide-area inter-cluster networks and to control the number of nodes that can transfer data simultaneously through the wide-area network, in order to avoid congestion. We confirmed the effectiveness of the modified algorithms by experiments using a 10 Gbps emulated WAN environment. The environment consists of two clusters, each consisting of nodes with 1 Gbps Ethernet links and a switch with a 10 Gbps uplink. The two clusters are connected through a 10 Gbps WAN emulator which can insert latency. In a 10 millisecond latency environment, when the message size is 32 MB, the proposed bcast and allreduce are 1.6 and 3.2 times faster, respectively, than the algorithms used in existing MPI systems for grid environments.
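
As a rough illustration of the congestion-avoidance idea, the sketch below (a minimal mpi4py example, not the GridMPI implementation; the cluster_id argument and the one-gateway-per-cluster choice are assumptions) restricts wide-area transfers to a single gateway rank per cluster and pipelines the data in chunks before fanning it out over the local network.

```python
# Minimal sketch: a two-level broadcast where only designated "gateway" ranks
# transfer data across the WAN, limiting simultaneous wide-area senders.
from mpi4py import MPI

def wan_aware_bcast(buf, comm, cluster_id, chunk=1 << 20):
    # buf: a contiguous 1-D NumPy array, identical in size on all ranks.
    # cluster_id: assumed to be known (e.g. from a hostfile or env variable).
    local = comm.Split(color=cluster_id, key=comm.rank)
    # One gateway per cluster; non-gateways get MPI.COMM_NULL.
    gateways = comm.Split(color=0 if local.rank == 0 else MPI.UNDEFINED,
                          key=comm.rank)
    if gateways != MPI.COMM_NULL:
        # Only gateways exchange data over the WAN, in pipelined chunks so a
        # single flow can fill the long-and-fast pipe without congestion.
        # Assumes global rank 0 is the root and the gateway of its cluster.
        for off in range(0, buf.size, chunk):
            gateways.Bcast(buf[off:off + chunk], root=0)
    # Fan out inside each cluster over the high-bisection-bandwidth LAN.
    local.Bcast(buf, root=0)
    return buf
```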


international conference on cluster computing | 2004

GNET-1: gigabit Ethernet network testbed

Yuetsu Kodama; Tomohiro Kudoh; Ryousei Takano; H. Sato; Osamu Tatebe; Satoshi Sekiguchi

GNET-1 is a fully programmable network testbed. It provides functions such as wide area network emulation, network instrumentation, traffic shaping, and traffic generation at gigabit Ethernet wire speeds by programming the core FPGA. GNET-1 is a powerful tool for developing network-aware grid software. It is also a network monitoring and traffic-shaping tool that provides high-performance communication over wide area networks. This work describes several sample uses of GNET-1 and presents its architecture.
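
GNET-1 itself is an FPGA-based hardware device; purely as a hedged software analogue, the sketch below shows how a comparable latency-insertion and rate-limiting setup can be approximated on a Linux host with tc/netem (the device name eth0 and the parameter values are placeholders).

```python
# Illustrative software stand-in for GNET-1's WAN emulation: insert delay
# with netem and cap the rate with tbf (requires root privileges).
import subprocess

def emulate_wan(dev="eth0", delay_ms=10, rate_mbit=1000):
    # netem adds one-way delay on egress traffic of the given device.
    subprocess.run(["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:",
                    "netem", "delay", f"{delay_ms}ms"], check=True)
    # tbf attached below netem caps the rate to mimic a narrower WAN link.
    subprocess.run(["tc", "qdisc", "add", "dev", dev, "parent", "1:1",
                    "handle", "10:", "tbf", "rate", f"{rate_mbit}mbit",
                    "burst", "32kb", "latency", "400ms"], check=True)

def clear_emulation(dev="eth0"):
    subprocess.run(["tc", "qdisc", "del", "dev", dev, "root"], check=True)
```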


international conference on cloud computing | 2012

MiyakoDori: A Memory Reusing Mechanism for Dynamic VM Consolidation

Soramichi Akiyama; Takahiro Hirofuchi; Ryousei Takano; Shinichi Honiden

In Infrastructure-as-a-Service datacenters, the placement of Virtual Machines (VMs) on physical hosts is dynamically optimized in response to the resource utilization of the hosts. However, existing live migration techniques, used to move VMs between hosts, involve large data transfers and prevent dynamic consolidation systems from optimizing VM placement efficiently. In this paper, we propose a technique called “memory reusing” that reduces the amount of memory transferred by live migration. When a VM migrates to another host, the memory image of the VM is kept on the source host. When the VM later migrates back to the original host, the kept memory image is “reused”, i.e., memory pages that are identical to the kept pages are not transferred. We implemented a system named MiyakoDori that uses memory reusing in live migrations. Evaluations show that MiyakoDori significantly reduced the amount of memory transferred by live migrations and eliminated 87% of unnecessary energy consumption when integrated with our dynamic VM consolidation system.
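
A minimal sketch of the memory reusing idea follows; it is not the MiyakoDori implementation (which tracks updated pages inside the VMM) and uses per-page hashing only as a stand-in for deciding which kept pages are still identical.

```python
# Sketch: keep the last memory image per VM on the source host, and on a
# later migration back, skip pages whose hashes still match the kept copy.
import hashlib

PAGE = 4096
kept_images = {}   # vm_id -> list of per-page hashes retained at the source

def keep_image(vm_id, memory: bytes):
    kept_images[vm_id] = [hashlib.sha1(memory[o:o + PAGE]).digest()
                          for o in range(0, len(memory), PAGE)]

def pages_to_transfer(vm_id, memory: bytes):
    """Indices of pages that must be sent when the VM migrates back here;
    pages identical to the kept image are 'reused' instead of transferred."""
    kept = kept_images.get(vm_id, [])
    send = []
    for i, o in enumerate(range(0, len(memory), PAGE)):
        h = hashlib.sha1(memory[o:o + PAGE]).digest()
        if i >= len(kept) or kept[i] != h:
            send.append(i)
    return send
```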


international conference on cluster computing | 2005

TCP Adaptation for MPI on Long-and-Fat Networks

Motohiko Matsuda; Tomohiro Kudoh; Yuetsu Kodama; Ryousei Takano; Yutaka Ishikawa

Typical MPI applications work in phases of computation and communication, and messages are exchanged in relatively small chunks. This behavior is not optimal for TCP, because TCP is designed to handle a contiguous flow of messages efficiently. This behavioral anomaly is well known, but fixes have not been integrated into today's TCP implementations, even though performance is seriously degraded, especially for MPI applications. This paper proposes three improvements to the Linux TCP stack: pacing at start-up, reducing the retransmission timeout, and switching TCP parameters at the transition between computation phases in an MPI application. Evaluation of these improvements using the NAS Parallel Benchmarks shows that the BT, CG, IS, and SP benchmarks achieved 10 to 30 percent improvements. On the other hand, the FT and MG benchmarks showed no improvement because they exhibit the steady communication that TCP assumes, and the LU benchmark became slightly worse because it has very little communication.
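
As an illustration of the pacing idea only (the paper's changes live in the Linux kernel TCP stack, not at application level), the sketch below spreads a large send over time at a target rate instead of bursting it out right after an idle compute phase.

```python
# Application-level pacing sketch: send a message in fixed chunks at a
# target rate so the network is not hit with a burst after an idle phase.
import socket
import time

def paced_send(sock: socket.socket, data: bytes,
               rate_bytes_per_s: float, chunk: int = 64 * 1024):
    interval = chunk / rate_bytes_per_s   # time budget per chunk
    for off in range(0, len(data), chunk):
        start = time.monotonic()
        sock.sendall(data[off:off + chunk])
        # Sleep off the remainder of this chunk's time budget, if any.
        slack = interval - (time.monotonic() - start)
        if slack > 0:
            time.sleep(slack)
```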


ieee international conference on cloud computing technology and science | 2011

GridARS: A Grid Advanced Resource Management System Framework for Intercloud

Atsuko Takefusa; Hidemoto Nakada; Ryousei Takano; Tomohiro Kudoh; Yoshio Tanaka

Intercloud is a promising technology for data-intensive applications. However, an important issue for Intercloud applications is the orchestration of various virtualized and performance-assured resources, not only computers but also networks and storage, provided by multiple domains. We have been developing an advance reservation-based resource management framework, called GridARS, which can integrate heterogeneous resources and construct a performance-assured virtual infrastructure over an Intercloud environment. GridARS provides four services that address resource management, resource allocation planning, provisioning, and monitoring of the constructed virtual infrastructure. GridARS has been developed using common Web services technologies and standards. In this paper, we present an overview of GridARS and its service components and describe our GridARS demonstration challenges: demonstrations at GLIF 2010 and SC10, and OGF NSI interoperation in 2011.
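
A toy sketch of the advance-reservation check that such a planner must perform is shown below; the Reservation class and the simple capacity model are illustrative assumptions, not the GridARS interfaces.

```python
# Sketch: a resource can be booked for [start, end) only if the requested
# amount fits alongside existing reservations in every overlapping interval.
from dataclasses import dataclass

@dataclass
class Reservation:
    start: float   # epoch seconds
    end: float
    amount: int    # e.g. nodes, Gbps, or GB, depending on the resource

def can_reserve(existing, capacity, req: Reservation) -> bool:
    # Consider every sub-interval bounded by reservation start/end times.
    events = sorted({r.start for r in existing} | {r.end for r in existing}
                    | {req.start, req.end})
    for t0, t1 in zip(events, events[1:]):
        if t1 <= req.start or t0 >= req.end:
            continue  # outside the requested window
        used = sum(r.amount for r in existing if r.start < t1 and r.end > t0)
        if used + req.amount > capacity:
            return False
    return True
```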


ieee/acm international symposium cluster, cloud and grid computing | 2013

Fast Wide Area Live Migration with a Low Overhead through Page Cache Teleportation

Soramichi Akiyama; Takahiro Hirofuchi; Ryousei Takano; Shinichi Honiden

Live migration of virtual machines over a wide area network has many use cases, such as cross-data center load balancing, low-carbon virtual private clouds, and disaster recovery of IT systems. An efficient wide area live migration method is required because cross-data center connections have narrow bandwidth. The page cache occupies a large portion of the memory of a Virtual Machine (VM) when it executes data-intensive workloads. We propose a new live migration technique, page cache teleportation, which reduces the total migration time of wide area live migration with low overhead. It detects restorable page cache in the guest memory, that is, pages whose contents are identical to the corresponding disk blocks. The restorable page cache is not transferred via the WAN but is restored from the disk image before the VM resumes, which also reduces I/O performance degradation after the migration. Evaluations show that page cache teleportation reduces the total migration time of wide area live migration with a lower performance overhead than existing approaches.
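
A simplified sketch of the underlying test follows (not the authors' implementation); it assumes a page-to-disk-block mapping obtained from guest introspection and marks a page as restorable when its contents match the corresponding block of the disk image.

```python
# Sketch: a guest memory page is "restorable" if it matches the corresponding
# block in the disk image, so it can be rebuilt at the destination instead of
# being sent over the WAN.
import hashlib

BLOCK = 4096

def restorable_pages(guest_pages, disk_image_path, page_to_block):
    """guest_pages: {page_no: bytes}; page_to_block: {page_no: block_no},
    a mapping assumed to come from introspection of the guest page cache."""
    restorable = set()
    with open(disk_image_path, "rb") as img:
        for page_no, block_no in page_to_block.items():
            img.seek(block_no * BLOCK)
            on_disk = hashlib.sha1(img.read(BLOCK)).digest()
            in_memory = hashlib.sha1(guest_pages[page_no]).digest()
            if on_disk == in_memory:
                restorable.add(page_no)
    return restorable
```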


international conference on e-science | 2012

Cooperative VM migration for a virtualized HPC cluster with VMM-bypass I/O devices

Ryousei Takano; Hidemoto Nakada; Takahiro Hirofuchi; Yoshio Tanaka; Tomohiro Kudoh

An HPC cloud, a flexible and robust cloud computing service specially dedicated to high performance computing, is a promising future e-Science platform. In cloud computing, virtualization is widely used to achieve flexibility and security. Virtualization makes migration or checkpoint/restart of computing elements (virtual machines) easy, and such features are useful for realizing fault tolerance and server consolidation. However, in widely used virtualization schemes, I/O devices are also virtualized, and thus I/O performance is severely degraded. To cope with this problem, VMM-bypass I/O technologies, including PCI passthrough and SR-IOV, in which the I/O overhead can be significantly reduced, have been introduced. However, such VMM-bypass I/O technologies make it impossible to migrate or checkpoint/restart virtual machines, since the virtual machines are directly attached to hardware devices. This paper proposes a novel and practical mechanism, called Symbiotic Virtualization (SymVirt), for enabling migration and checkpoint/restart on a virtualized cluster with VMM-bypass I/O devices, without virtualization overhead during normal operation. SymVirt allows a VMM to cooperate with a message passing layer on the guest OS, and realizes VM-level migration and checkpoint/restart by combining PCI hotplug with coordination of the distributed VMMs. We have implemented the proposed mechanism on top of QEMU/KVM and the Open MPI system. All PCI devices, including InfiniBand and Myrinet, are supported without implementing specific para-virtualized drivers, and it is not necessary to modify either the MPI runtime or the applications. Using the proposed mechanism, we demonstrate reactive and proactive fault tolerance mechanisms on a virtualized InfiniBand cluster. We have confirmed the effectiveness using both a memory-intensive micro benchmark and the NAS Parallel Benchmarks. Moreover, we also show that postcopy live migration enables us to reduce the downtime of an application as the memory footprint increases.
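
The coordination sequence can be summarized in the hedged sketch below; all helper names (quiesce, pci_hot_unplug, live_migrate, and so on) are hypothetical and stand in for the real QEMU/KVM and Open MPI interactions.

```python
# High-level sketch of a SymVirt-style migration with a VMM-bypass device.
# It only illustrates the ordering of steps, not the actual interfaces.
def migrate_with_vmm_bypass(vm, mpi_layer, vmm, dest_host):
    # 1. The message passing layer drains in-flight traffic and pauses
    #    communication so the guest reaches a globally consistent point.
    mpi_layer.quiesce()
    # 2. The VMM hot-unplugs the pass-through device (e.g. an InfiniBand HCA),
    #    returning the guest to a migratable, device-free state.
    vmm.pci_hot_unplug(vm, device="hca0")
    # 3. Ordinary VM-level migration (or checkpoint/restart) is now possible.
    vmm.live_migrate(vm, dest_host)
    # 4. The device is re-attached at the destination and the MPI layer
    #    re-establishes its connections before computation resumes.
    vmm.pci_hot_plug(vm, device="hca0")
    mpi_layer.resume()
```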


ieee international conference on cloud computing technology and science | 2014

Exploring the Performance Impact of Virtualization on an HPC Cloud

Nuttapong Chakthranont; Phonlawat Khunphet; Ryousei Takano; Tsutomu Ikegami

The feasibility of the cloud computing paradigm is examined from the High Performance Computing (HPC) viewpoint. The impact of virtualization is evaluated on our latest private cloud, the AIST Super Green Cloud, which provides elastic virtual clusters interconnected by InfiniBand. Performance is measured using typical HPC benchmark programs on both physical and virtual clusters. The results of the micro benchmarks indicate that the virtual clusters suffer from scalability issues in almost all MPI collective functions; the relative performance gradually becomes worse as the number of nodes increases. On the other hand, the benchmarks based on actual applications, including LINPACK, OpenMX, and Graph 500, show that the virtualization overhead is about 5% even when the number of nodes increases to 128. This observation leads to our optimistic conclusion on the feasibility of the HPC cloud.


optical fiber communication conference | 2011

Joint storage-network resource management for super high-definition video delivery service

Kazuhisa Yamada; Yukio Tsukishima; Kazuhiro Matsuda; Masahiko Jinno; Yusuke Tanimura; Tomohiro Kudoh; Atsuko Takefusa; Ryousei Takano; Takashi Shimizu

This paper proposes a joint storage-network resource management scheme for a super high-definition video delivery service. The method for allocating storage and optical path resources is discussed, and the feasibility of the proposed system is shown.


cluster computing and the grid | 2008

High Performance Relay Mechanism for MPI Communication Libraries Run on Multiple Private IP Address Clusters

Ryousei Takano; Motohiko Matsuda; Tomohiro Kudoh; Yuetsu Kodama; Fumihiro Okazaki; Yutaka Ishikawa; Yasufumi Yoshizawa

We have been developing a Grid-enabled MPI communication library called GridMPI, which is designed to run on multiple clusters connected to a wide-area network. Some of these clusters may use private IP addresses, so a mechanism to enable communication between private IP address clusters is required. Such a mechanism should be widely adoptable and should provide high communication performance. In this paper, we propose a message relay mechanism to support private IP address clusters in the manner of the Interoperable MPI (IMPI) standard; therefore, any MPI implementation that follows the IMPI standard can communicate with the relay. Furthermore, we also propose a trunking method in which multiple pairs of relay nodes communicate simultaneously between clusters to improve the available communication bandwidth. While the relay mechanism introduces a one-way latency of about 25 microseconds, the extra overhead is negligible, since the communication latency through a wide area network is a few hundred times as large as this. By using trunking, the inter-cluster communication bandwidth improves as the number of trunks increases. We confirmed the effectiveness of the proposed method by experiments using a 10 Gbps emulated WAN environment. When relay nodes with 1 Gbps NICs are used, the performance of most of the NAS Parallel Benchmarks improves in proportion to the number of trunks. In particular, with 8 trunks, FT and IS are 4.4 and 3.4 times faster, respectively, than with a single trunk. The results show that the proposed method is effective for running MPI programs over high bandwidth-delay product networks.
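
The trunking idea can be illustrated with the toy sketch below (hypothetical names, not the GridMPI relay code): inter-cluster traffic is spread round-robin over several relay-node pairs so the aggregate wide-area bandwidth scales with the number of trunks.

```python
# Sketch of trunked relaying: each message (or connection) is pinned to one
# of several relay sockets in round-robin order.
import itertools

class TrunkedRelay:
    def __init__(self, relay_sockets):
        # One established socket per relay-node pair (trunk).
        self.trunks = relay_sockets
        self.next_trunk = itertools.cycle(range(len(relay_sockets)))

    def send(self, message: bytes):
        # Ordering of messages between a given rank pair must be restored
        # at the receiving side, since trunks deliver independently.
        self.trunks[next(self.next_trunk)].sendall(message)
```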

Collaboration


Dive into Ryousei Takano's collaborations.

Top Co-Authors

Tomohiro Kudoh (National Institute of Advanced Industrial Science and Technology)

Takahiro Hirofuchi (National Institute of Advanced Industrial Science and Technology)

Hidemoto Nakada (National Institute of Advanced Industrial Science and Technology)

Yoshio Tanaka (National Institute of Advanced Industrial Science and Technology)

Atsuko Takefusa (National Institute of Advanced Industrial Science and Technology)

Fumihiro Okazaki (National Institute of Advanced Industrial Science and Technology)

Soramichi Akiyama (National Institute of Advanced Industrial Science and Technology)