Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Tomohiro Kudoh is active.

Publication


Featured research published by Tomohiro Kudoh.


Future Generation Computer Systems | 2006

G-lambda: coordination of a grid scheduler and lambda path service over GMPLS

Atsuko Takefusa; Michiaki Hayashi; Naohide Nagatsu; Hidemoto Nakada; Tomohiro Kudoh; Takahiro Miyamoto; Tomohiro Otani; Hideaki Tanaka; Masatoshi Suzuki; Yasunori Sameshima; Wataru Imajuku; Masahiko Jinno; Yoshihiro Takigawa; Shuichi Okamoto; Yoshio Tanaka; Satoshi Sekiguchi

Vertical coordination between a computing resource scheduler and a network resource scheduler for Grid-based applications is described. The network resource management system virtualizes and schedules network resources to inter-work with the Grid resource scheduler through a Web-services interface.


international conference on cluster computing | 2006

Efficient MPI Collective Operations for Clusters in Long-and-Fast Networks

Motohiko Matsuda; Tomohiro Kudoh; Yuetsu Kodama; Ryousei Takano; Yutaka Ishikawa

Several MPI systems for grid environments, in which clusters are connected by wide-area networks, have been proposed. However, the collective communication algorithms in such MPI systems assume relatively low-bandwidth wide-area networks, and they are not designed for the fast wide-area networks that are becoming available. On the other hand, for cluster MPI systems, a bcast (broadcast) algorithm by van de Geijn et al. and an allreduce algorithm by Rabenseifner have been proposed, which are efficient in a high-bisection-bandwidth environment. We modify those algorithms to utilize fast wide-area inter-cluster networks effectively and to control the number of nodes that can transfer data simultaneously through wide-area networks to avoid congestion. We confirmed the effectiveness of the modified algorithms by experiments using a 10 Gbps emulated WAN environment. The environment consists of two clusters, where each cluster consists of nodes with 1 Gbps Ethernet links and a switch with a 10 Gbps uplink. The two clusters are connected through a 10 Gbps WAN emulator which can insert latency. In a 10 millisecond latency environment, when the message size is 32 MB, the proposed bcast and allreduce are 1.6 and 3.2 times faster, respectively, than the algorithms used in existing MPI systems for grid environments.
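
The van de Geijn broadcast mentioned above splits the message and pipelines it: a scatter followed by a ring allgather. The following is a minimal in-memory simulation of that idea, not the authors' MPI implementation; node layout and function names are illustrative.

```python
# Sketch of a van de Geijn-style broadcast (scatter + ring allgather),
# simulated over in-memory "nodes". Illustrative only, not the paper's
# actual WAN-aware variant.

def bcast_scatter_allgather(message, n_nodes):
    """Root splits the message into n_nodes chunks (scatter), then every
    node circulates chunks around a ring until all hold the full message
    (allgather). Keeping all links busy at once is what makes this
    efficient when bisection bandwidth is high."""
    # Scatter: node i initially holds only chunk i.
    chunk = (len(message) + n_nodes - 1) // n_nodes
    chunks = [message[i * chunk:(i + 1) * chunk] for i in range(n_nodes)]
    held = [{i: chunks[i]} for i in range(n_nodes)]

    # Ring allgather: in step s, node i forwards to node (i+1) the chunk
    # it received s steps earlier, i.e. chunk (i - s) mod n_nodes.
    for step in range(n_nodes - 1):
        for i in range(n_nodes):
            c = (i - step) % n_nodes
            held[(i + 1) % n_nodes][c] = chunks[c]

    # Reassemble the full message on every node.
    return [b"".join(h[c] for c in range(n_nodes)) for h in held]

full = bcast_scatter_allgather(b"0123456789AB", 4)
assert all(m == b"0123456789AB" for m in full)
```

The paper's contribution is on top of this: capping how many nodes push data through the inter-cluster WAN link at the same time, which the simulation above does not model.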


job scheduling strategies for parallel processing | 2010

An advance reservation-based co-allocation algorithm for distributed computers and network bandwidth on QoS-guaranteed grids

Atsuko Takefusa; Hidemoto Nakada; Tomohiro Kudoh; Yoshio Tanaka

Co-allocation of performance-guaranteed computing and network resources provided by several administrative domains is one of the key issues for constructing a QoS-guaranteed Grid. We propose an advance reservation-based co-allocation algorithm for both computing and network resources on a QoS-guaranteed Grid, modeled as an integer programming (IP) problem. The goal of our algorithm is to create reservation plans satisfying user resource requirements as an on-line service. The algorithm also takes into consideration co-allocation options reflecting user and resource administrator concerns. We evaluate the proposed algorithm with extensive simulation, in terms of both functionality and practicality. The results show that the algorithm enables efficient co-allocation of both computing and network resources provided by multiple domains, and can reflect reservation options for resource administrator concerns as a first step. The calculation times needed for selecting resources using an IP solver are acceptable for an on-line service.
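
To make the co-allocation decision concrete, here is a toy version of the problem the paper formulates as an integer program: pick sites whose free CPUs cover a request and whose interconnecting link can reserve the requested bandwidth. The site data, objective, and function names are invented for illustration; the actual algorithm runs inside an IP solver.

```python
# Toy co-allocation of computing sites plus network bandwidth, brute-forced
# over site pairs. Illustrative stand-in for the paper's IP formulation.
from itertools import combinations

sites = {"A": 8, "B": 16, "C": 4}                        # free CPUs per site
link_bw = {("A", "B"): 10, ("A", "C"): 5, ("B", "C"): 2}  # free Gbps per link

def co_allocate(need_cpus, need_bw):
    """Return a pair of sites whose combined free CPUs cover the request
    and whose connecting link has enough reservable bandwidth; prefer the
    plan with the fewest surplus CPUs (a stand-in for the IP objective)."""
    best = None
    for a, b in combinations(sorted(sites), 2):
        cpus = sites[a] + sites[b]
        bw = link_bw.get((a, b)) or link_bw.get((b, a), 0)
        if cpus >= need_cpus and bw >= need_bw:
            surplus = cpus - need_cpus
            if best is None or surplus < best[0]:
                best = (surplus, (a, b))
    return best[1] if best else None

print(co_allocate(10, 4))  # ('A', 'C'): 12 CPUs with a 5 Gbps link
```

A real instance must also handle time (advance reservation windows) and many more candidate combinations, which is why the paper relies on an IP solver rather than enumeration.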


international conference on cluster computing | 2004

GNET-1: gigabit Ethernet network testbed

Yuetsu Kodama; Tomohiro Kudoh; Ryousei Takano; H. Sato; Osamu Tatebe; Satoshi Sekiguchi

GNET-1 is a fully programmable network testbed. It provides functions such as wide area network emulation, network instrumentation, traffic shaping, and traffic generation at gigabit Ethernet wire speeds by programming the core FPGA. GNET-1 is a powerful tool for developing network-aware grid software. It is also a network monitoring and traffic-shaping tool that provides high-performance communication over wide area networks. This work describes several sample uses of GNET-1 and presents its architecture.


cluster computing and the grid | 2003

Evaluation of MPI implementations on grid-connected clusters using an emulated WAN environment

Motohiko Matsuda; Tomohiro Kudoh; Yutaka Ishikawa

The MPICH-SCore high-performance communication library for cluster computing is integrated into the MPICH-G2 library in order to adapt PC clusters to a Grid environment. The integrated library is called MPICH-G2/SCore. In addition, for the purpose of comparison with other approaches, MPICH-SCore itself is extended to encapsulate its network packets into UDP packets so that packets are delivered via L3 switches. This extension is called UDP-encapsulated MPICH-SCore. In this paper, three implementations of the MPI library, UDP-encapsulated MPICH-SCore, MPICH-G2/SCore, and MPICH-P4, are evaluated using an emulated WAN environment where two clusters, each consisting of sixteen hosts, are connected by a router PC. The router PC controls the latency of message delivery between clusters, and the added latency is varied from 1 millisecond to 4 milliseconds in round-trip time. Experiments are performed using the NAS Parallel Benchmarks, which show that UDP-encapsulated MPICH-SCore most often performs better than the other implementations, although the differences are not critical for the benchmarks. The preliminary results show that the performance of the LU benchmark scales up linearly under 4 milliseconds of round-trip latency. The CG and MG benchmarks scale by factors of 1.13 and 1.24, respectively, with 4 milliseconds of round-trip latency.


job scheduling strategies for parallel processing | 2007

GridARS: an advance reservation-based grid co-allocation framework for distributed computing and network resources

Atsuko Takefusa; Hidemoto Nakada; Tomohiro Kudoh; Yoshio Tanaka; Satoshi Sekiguchi

For high-performance parallel computing on actual Grids, one of the important issues is to co-allocate distributed resources that are managed by various local schedulers with advance reservation. To address this issue, we proposed and developed the GridARS resource co-allocation framework and a general advance reservation protocol that uses WSRF/GSI and a two-phase commit (2PC) protocol to enable a generic and secure advance reservation process based on distributed transactions; the framework provides an interface module for various existing resource schedulers. To confirm the effectiveness of GridARS, we describe the performance of a simultaneous reservation process and a case study of GridARS grid co-allocation over transpacific computing and network resources. Our experiments showed that: 1) the GridARS simultaneous 2PC reservation process is scalable and practical, and 2) GridARS can co-allocate distributed resources managed by various local schedulers stably.
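
The two-phase-commit reservation flow described above can be sketched in a few lines: a coordinator asks every resource manager to prepare (tentatively hold the resource), commits only if all succeed, and otherwise aborts every tentative hold. Class and method names here are illustrative, not the actual GridARS/WSRF interface.

```python
# Minimal 2PC-style co-reservation sketch, assuming in-process resource
# managers instead of GridARS's Web-service interfaces.

class ResourceManager:
    def __init__(self, name, free):
        self.name, self.free, self.held = name, free, 0

    def prepare(self, amount):      # phase 1: tentative hold
        if self.free >= amount:
            self.free -= amount
            self.held = amount
            return True
        return False

    def commit(self):               # phase 2a: make the hold permanent
        self.held = 0

    def abort(self):                # phase 2b: release the tentative hold
        self.free += self.held
        self.held = 0

def co_reserve(managers, amounts):
    """Reserve all resources atomically: any prepare failure rolls back
    every manager that had already prepared."""
    prepared = []
    for rm, amt in zip(managers, amounts):
        if rm.prepare(amt):
            prepared.append(rm)
        else:
            for p in prepared:
                p.abort()
            return False
    for rm in prepared:
        rm.commit()
    return True

cluster = ResourceManager("cluster", 16)
network = ResourceManager("net", 10)
assert co_reserve([cluster, network], [8, 5])       # both prepare -> commit
assert not co_reserve([cluster, network], [8, 50])  # net fails -> rollback
assert cluster.free == 8                            # first reservation kept
```

The all-or-nothing property is the point: a computing slot is never left reserved without its matching network bandwidth, which matters when the managers belong to different administrative domains.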


international conference on cluster computing | 2000

MEMOnet: network interface plugged into a memory slot

Noboru Tanabe; Junji Yamamoto; Hiroaki Nishi; Tomohiro Kudoh; Yoshihiro Hamada; Hironori Nakajo; Hideharu Amano

The communication architecture of the DIMMnet-1 network interface, based on MEMOnet, is described. MEMOnet is an architecture consisting of a network interface plugged into a memory slot. The DIMMnet-1 prototype will have two banks of PC133-based SO-DIMM slots and an 8 Gbps full-duplex optical link or two 448 MB/s full-duplex LVDS channel links. The software overhead incurred to generate a message is only 1 CPU cycle, and the estimated hardware delay is less than 100 ns using atomic on-the-fly sending with a header TLB. The estimated achievable communication bandwidth with block on-the-fly sending with protection-stampable window memory is 440 MB/s, which was observed in our experiments writing to the DIMM area with a write-combining attribute. This is 3.3 times higher than the maximum bandwidth of PCI. This high-performance distributed computing environment is available using economical personal computers with DIMM slots.


international conference on cluster computing | 2005

TCP Adaptation for MPI on Long-and-Fat Networks

Motohiko Matsuda; Tomohiro Kudoh; Yuetsu Kodama; Ryousei Takano; Yutaka Ishikawa

Typical MPI applications work in phases of computation and communication, and messages are exchanged in relatively small chunks. This behavior is not optimal for TCP, because TCP is designed to handle a contiguous flow of messages efficiently. This behavior anomaly is well known, but fixes are not integrated into today's TCP implementations, even though performance is seriously degraded, especially for MPI applications. This paper proposes three improvements to the Linux TCP stack: pacing at start-up, reducing the Retransmit-Timeout time, and TCP parameter switching at the transition of computation phases in an MPI application. Evaluation of these improvements using the NAS Parallel Benchmarks shows that the BT, CG, IS, and SP benchmarks achieved 10 to 30 percent improvements. On the other hand, the FT and MG benchmarks showed no improvement because they have the steady communication that TCP assumes, and the LU benchmark became slightly worse because it has very little communication.
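
A back-of-the-envelope calculation shows why start-up pacing matters on a long-fat network: without pacing, a whole congestion window is emitted as one burst, while pacing spreads the sends evenly across one round-trip time. The numbers below are illustrative, not taken from the paper's Linux implementation.

```python
# Pacing arithmetic for a long-fat network: spread one congestion window
# of segments evenly over one RTT instead of sending it as a burst.

def pacing_interval_us(rtt_ms, cwnd_segments):
    """Inter-segment gap (microseconds) that spaces one cwnd over one RTT."""
    return rtt_ms * 1000.0 / cwnd_segments

# Example: 10 ms RTT on a 1 Gbps path with 1500-byte segments.
# The bandwidth-delay product is the cwnd needed to fill the pipe.
bdp_segments = int(1e9 * 0.010 / (1500 * 8))   # ~833 segments
gap = pacing_interval_us(10, bdp_segments)
print(f"BDP = {bdp_segments} segments, paced gap = {gap:.1f} us")
```

Sending 833 segments back-to-back at 1 Gbps would otherwise land on the bottleneck as a ~10 MB burst, which is exactly the overflow scenario that triggers the retransmit timeouts the paper's second fix addresses.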


international symposium on parallel architectures algorithms and networks | 1994

Overview of the JUMP-1, an MPP prototype for general-purpose parallel computations

Kei Hiraki; Hideharu Amano; Morihiro Kuga; Toshinori Sueyoshi; Tomohiro Kudoh; Hiroshi Nakashima; Hironori Nakajo; Hideo Matsuda; Takashi Matsumoto; Shin ichiro Mori

We describe the basic architecture of JUMP-1, an MPP prototype developed in collaboration between 7 universities. The proposed architecture can exploit the high performance of coarse-grained RISC processors in combination with flexible fine-grained operations such as distributed shared memory, versatile synchronization, and message communications.


international conference on parallel processing | 2006

Switch-tagged VLAN Routing Methodology for PC Clusters with Ethernet

Tomohiro Otsuka; Michihiro Koibuchi; Tomohiro Kudoh; Hideharu Amano

Ethernet has been used for connecting hosts in the area of high performance-per-cost PC clusters. Although an L2 Ethernet topology is limited to a tree structure, various routing algorithms on topologies suitable for parallel processing can be employed by applying IEEE 802.1Q VLAN technology. However, the communication libraries used in PC clusters do not always support VLANs, so VLAN-based routing methods cannot be applied to such PC clusters. In this paper, we propose a switch-tagged VLAN methodology to flexibly set the route of frames on such PC clusters. Since each host does not need to process VLAN tags, the proposed method has advantages in both simple host configuration and high portability. Evaluation results using the NAS Parallel Benchmarks showed that the performance of topologies supported by the proposed method was comparable with that of an ideal 1-switch (full crossbar) network in the case of a 16-host PC cluster.

Collaboration


Dive into Tomohiro Kudoh's collaboration.

Top Co-Authors

Ryousei Takano

National Institute of Advanced Industrial Science and Technology


Atsuko Takefusa

National Institute of Advanced Industrial Science and Technology


Hidemoto Nakada

National Institute of Advanced Industrial Science and Technology


Yoshio Tanaka

National Institute of Advanced Industrial Science and Technology
