Toshihiro Hanawa | Researchain

Publication

Featured research published by Toshihiro Hanawa.

grid computing | 2010

D-Cloud: Design of a Software Testing Environment for Reliable Distributed Systems Using Cloud Computing Technology

Takayuki Banzai; Hitoshi Koizumi; Ryo Kanbayashi; Takayuki Imada; Toshihiro Hanawa; Mitsuhisa Sato

In this paper, we propose a software testing environment, called D-Cloud, that uses cloud computing technology and virtual machines with a fault injection facility. The importance of high dependability in software systems has recently increased, yet exhaustive testing of software systems is becoming expensive and time-consuming, and in many cases sufficient software testing is not possible. In particular, it is often difficult to test parallel and distributed systems in the real world after deployment, although reliable systems, such as high-availability servers, are parallel and distributed systems. D-Cloud is a cloud system that manages virtual machines with a fault injection facility. D-Cloud sets up a test environment on the cloud resources using a given system configuration file and executes several tests automatically according to a given scenario. In this scenario, D-Cloud enables fault tolerance testing by injecting device faults through the virtual machines. We have designed the D-Cloud system using Eucalyptus software and an XML-based description language for the system configuration and the fault injection scenario. We found that the D-Cloud system allows a user to easily set up and test a distributed system on the cloud and effectively reduces the cost and time of testing.
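
As a rough illustration of what an XML-described fault injection scenario could look like, the sketch below parses a small scenario and returns the faults in injection order. The abstract does not give the D-Cloud description language, so every element name, attribute name, and value here (`scenario`, `machine`, `fault`, `target`, `at`, and so on) is a hypothetical stand-in, as is the function `load_fault_schedule`.

```python
import xml.etree.ElementTree as ET

# Hypothetical scenario document: every element and attribute name below is
# invented for illustration and is NOT the actual D-Cloud schema.
SCENARIO_XML = """
<scenario name="failover-test">
  <machine id="node0" role="primary"/>
  <machine id="node1" role="backup"/>
  <fault target="node0" device="disk" kind="io-error" at="30"/>
  <fault target="node1" device="nic" kind="packet-drop" at="10"/>
</scenario>
"""


def load_fault_schedule(xml_text):
    """Return (time, machine, device, kind) tuples sorted by injection time."""
    root = ET.fromstring(xml_text)
    # Collect the machines declared in the system configuration part.
    machines = {m.get("id") for m in root.findall("machine")}
    schedule = []
    for fault in root.findall("fault"):
        target = fault.get("target")
        # Reject faults aimed at machines the scenario never declared.
        if target not in machines:
            raise ValueError(f"fault targets unknown machine: {target}")
        schedule.append(
            (int(fault.get("at")), target, fault.get("device"), fault.get("kind"))
        )
    return sorted(schedule)


if __name__ == "__main__":
    for event in load_fault_schedule(SCENARIO_XML):
        print(event)
```

Keeping the machine list and the fault list in one document lets a front end validate a scenario before any virtual machine is started, which is one plausible reason to describe both in the same language.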

international conference on software testing, verification and validation workshops | 2010

Large-Scale Software Testing Environment Using Cloud Computing Technology for Dependable Parallel and Distributed Systems

Toshihiro Hanawa; Takayuki Banzai; Hitoshi Koizumi; Ryo Kanbayashi; Takayuki Imada; Mitsuhisa Sato

Various information systems are widely used in the information society era, and the demand for highly dependable systems is increasing year after year. However, software testing for such systems becomes more difficult as the systems grow larger and more complex. In particular, it is very difficult to test parallel and distributed systems sufficiently, although dependable systems such as high-availability servers usually form parallel and distributed systems. To solve these problems, we have proposed a software testing environment for dependable parallel and distributed systems using cloud computing technology, named D-Cloud. D-Cloud includes Eucalyptus as the cloud management software, FaultVM based on QEMU as the virtualization software, and the D-Cloud frontend for interpreting test scenarios. D-Cloud not only automates the system configuration and the test procedure but also performs a number of test cases simultaneously and emulates hardware faults flexibly. In this paper, we present the concept and design of D-Cloud, and describe how to specify the system configuration and the test scenario. Furthermore, we present a preliminary example of software testing using D-Cloud. The result shows that D-Cloud makes it easy to set up the environment and to test software for distributed systems.

Proceedings of the First Workshop on Accelerator Programming using Directives | 2014

XcalableACC: extension of XcalableMP PGAS language using OpenACC for accelerator clusters

Masahiro Nakao; Hitoshi Murai; Takenori Shimosaka; Akihiro Tabuchi; Toshihiro Hanawa; Yuetsu Kodama; Taisuke Boku; Mitsuhisa Sato

The present paper introduces the XcalableACC (XACC) programming model, which is a hybrid model of the XcalableMP (XMP) Partitioned Global Address Space (PGAS) language and OpenACC. XACC defines directives that enable programmers to mix XMP and OpenACC directives in order to develop applications that can use accelerator clusters with ease. Moreover, in order to improve the performance of stencil applications, the Omni XACC compiler provides functions that can transfer a halo region on accelerator memory via Tightly Coupled Accelerators (TCA), which is a proprietary network for transferring data directly among accelerators. In the present paper, we evaluate the productivity and the performance of XACC through implementations of the HIMENO Benchmark. The results show that, in terms of productivity, XACC requires less than half the source lines of code compared to a combination of Message Passing Interface (MPI) and OpenACC, which are commonly used together as a typical programming model. In terms of performance, XACC using TCA achieved up to 2.7 times faster performance than could be obtained via the combination of OpenACC and MPI using GPUDirect RDMA over InfiniBand.
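
To make the term "halo region" concrete, the following is a minimal sketch in plain Python, not XACC, XMP, or OpenACC code, and it does not model TCA. It block-distributes a 1-D array, gives each block one halo cell per side, copies neighbouring edge values into the halos, and then applies a 3-point averaging stencil. The function names and the choice of stencil are assumptions made for illustration only.

```python
def stencil_serial(u):
    """One 3-point averaging sweep; the two boundary values are held fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def split_with_halo(u, parts):
    """Block-distribute u; each chunk gets one halo cell on each side."""
    n = len(u) // parts  # assumes len(u) is divisible by parts
    return [[0.0] + u[p * n:(p + 1) * n] + [0.0] for p in range(parts)]


def exchange_halo(chunks):
    """Copy each chunk's edge cells into its neighbours' halo cells."""
    for p in range(len(chunks) - 1):
        left, right = chunks[p], chunks[p + 1]
        left[-1] = right[1]   # right neighbour's first owned cell
        right[0] = left[-2]   # left neighbour's last owned cell


def stencil_distributed(u, parts):
    """Same sweep as stencil_serial, computed chunk by chunk after a halo exchange."""
    chunks = split_with_halo(u, parts)
    exchange_halo(chunks)
    result = []
    for p, c in enumerate(chunks):
        new = list(c)
        for i in range(1, len(c) - 1):
            # Skip the two cells that are boundaries of the global array.
            at_global_edge = (p == 0 and i == 1) or (
                p == parts - 1 and i == len(c) - 2
            )
            if not at_global_edge:
                new[i] = (c[i - 1] + c[i] + c[i + 1]) / 3.0
        result.extend(new[1:-1])  # drop halo cells before gathering
    return result


if __name__ == "__main__":
    data = [float(i * i) for i in range(8)]
    print(stencil_distributed(data, 4) == stencil_serial(data))
```

Only the edge cells cross chunk boundaries in each sweep, which is why the cost of moving the halo between accelerators matters so much for stencil codes.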

high performance interconnects | 2013

Interconnection Network for Tightly Coupled Accelerators Architecture

Toshihiro Hanawa; Yuetsu Kodama; Taisuke Boku; Mitsuhisa Sato

In recent years, heterogeneous clusters using accelerators have entered widespread use in high-performance computing systems. In such clusters, inter-node communication between accelerators normally requires several memory copies via CPU memory, which results in communication latency that causes severe performance degradation. To address this problem, we propose the Tightly Coupled Accelerators (TCA) architecture, which is capable of reducing the communication latency between accelerators over different nodes. In the TCA architecture, PCI Express (PCIe) packets are used for direct inter-node communication between accelerators. In addition, we designed a communication chip that we have named PCI Express Adaptive Communication Hub Version 2 (PEACH2) to realize our proposed TCA architecture. In this paper, we introduce the design and implementation of the PEACH2 chip using a field programmable gate array (FPGA), and present a PEACH2 board designed for use as a PCIe extension board. The results of evaluations using ping-pong programs on an eight-node TCA cluster demonstrate that the PEACH2 chip achieves 95% of the theoretical peak performance and a latency of 0.96 μsec.

dependable systems and networks | 2012

DS-Bench Toolset: Tools for dependability benchmarking with simulation and assurance

Hajime Fujita; Yutaka Matsuno; Toshihiro Hanawa; Mitsuhisa Sato; Shinpei Kato; Yutaka Ishikawa

Today's information systems have become large and complex because they must interact with each other via networks. This makes testing and assuring the dependability of systems much more difficult than ever before. DS-Bench Toolset has been developed to address this issue, and it includes D-Case Editor, DS-Bench, and D-Cloud. D-Case Editor is an assurance case editor. It forms a tool chain with DS-Bench and D-Cloud, and uses the test results as evidence of the dependability of the system. DS-Bench manages dependability benchmarking tools and anomaly loads according to benchmarking scenarios. D-Cloud is a test environment for performing rapid system tests controlled by DS-Bench. It combines a cluster of real machines for performance-accurate benchmarks with a cloud computing environment, as a group of virtual machines, for exhaustive function testing with a fault-injection facility. DS-Bench Toolset enables us to test systems satisfactorily and to explain the dependability of the systems to the stakeholders.

international workshop on openmp | 2009

Evaluation of Multicore Processors for Embedded Systems by Parallel Benchmark Program Using OpenMP

Toshihiro Hanawa; Mitsuhisa Sato; Jinpil Lee; Takayuki Imada; Hideaki Kimura; Taisuke Boku

Recently, multicore technology has been introduced to embedded systems in order to improve performance and reduce power consumption. In the present study, three SMP multicore processors for embedded systems and a multicore processor for a desktop PC are evaluated by the parallel benchmark using OpenMP. The results indicate that, even if the memory performance is low, applications that are not memory-intensive exhibit large speedups by parallelization. The results also indicate a large performance improvement due to parallelization using OpenMP, despite its low cost.

ieee international symposium on parallel & distributed processing, workshops and phd forum | 2013

Tightly Coupled Accelerators Architecture for Minimizing Communication Latency among Accelerators

Toshihiro Hanawa; Yuetsu Kodama; Taisuke Boku; Mitsuhisa Sato

In recent years, heterogeneous clusters using accelerators have been widely used in high-performance computing systems. In such clusters, inter-node communication among accelerators requires several memory copies via CPU memory, and the communication latency causes severe performance degradation. In order to address this problem, we propose the Tightly Coupled Accelerators (TCA) architecture to reduce the communication latency between accelerators over different nodes. In addition, we promote the HA-PACS project at the Center for Computational Sciences, University of Tsukuba, in order to build up the HA-PACS base cluster system, as a commodity GPU cluster, and to develop an experimental system based on the TCA architecture as a proprietary interconnection network connecting accelerators beyond the nodes. In the present paper, we describe the TCA architecture and the design and implementation of PEACH2 for realizing the TCA architecture. We also evaluate the functionality and the basic performance of the PEACH2 chip, and the results demonstrate that the PEACH2 chip has sufficient maximum performance with 93% of the theoretical peak performance and a latency between adjacent nodes of approximately 0.8 μsec.

pacific rim international symposium on dependable computing | 2010

Customizing Virtual Machine with Fault Injector by Integrating with SpecC Device Model for a Software Testing Environment D-Cloud

Toshihiro Hanawa; Hitoshi Koizumi; Takayuki Banzai; Mitsuhisa Sato; Shin’ichi Miura; Tadatoshi Ishii; Hidehisa Takamizawa

D-Cloud is a software testing environment for dependable parallel and distributed systems using cloud computing technology. We use Eucalyptus as cloud management software to manage virtual machines based on QEMU, called FaultVM, which have a fault injection mechanism. The D-Cloud front end interprets the system configuration and the test scenario written in XML, which allows test procedures to be automated using a large amount of computing resources in the cloud, and FaultVM flexibly emulates hardware faults so that tests can include them. In the present paper, we describe the customization facility of FaultVM used to add new device models. We use SpecC, which is a system description language, to describe the behavior of devices, and a simulator generated from the SpecC description is linked and integrated into FaultVM. This also makes the definition and injection of faults flexible without modifying the original QEMU source code. This facility allows D-Cloud to be used to test distributed systems with customized devices.

international parallel and distributed processing symposium | 2009

RI2N/DRV: Multi-link ethernet for high-bandwidth and fault-tolerant network on PC clusters

Shin'ichi Miura; Toshihiro Hanawa; Taiga Yonemoto; Taisuke Boku; Mitsuhisa Sato

Although recent high-end interconnection network devices and switches provide a high performance-to-cost ratio, most small to medium-sized PC clusters are still built on the commodity network, Ethernet. To enhance performance on commonly used Gigabit Ethernet networks, link aggregation or bonding technology is used. Currently, Linux kernels are equipped with software named Linux Channel Bonding (LCB), which is based on IEEE 802.3ad Link Aggregation technology. However, standard LCB has the disadvantage of a mismatch with the TCP protocol; consequently, both large latency and bandwidth instability can occur. LCB supports a fault-tolerance feature, but its usability is insufficient. We developed a new implementation similar to LCB, named Redundant Interconnection with Inexpensive Network with Driver (RI2N/DRV), for use on Gigabit Ethernet. RI2N/DRV has a complete software stack that is well suited to TCP as an upper-layer protocol. Our algorithm suppresses unnecessary ACK packets and retransmission of packets, even under imbalanced network traffic and link failures on multiple links. It provides both high-bandwidth and fault-tolerant communication on multi-link Gigabit Ethernet. We confirmed that this system improves the performance and reliability of the network, and that it can be applied to ordinary UNIX services such as network file system (NFS) without any modification of other modules.
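
The abstract does not spell out the RI2N/DRV algorithm, so the sketch below is not that algorithm. It only illustrates one generic way a multi-link layer can hide reordering from an upper-layer protocol: packets are striped across links with sequence numbers, and the receiver buffers early arrivals until the gap is filled. The names `stripe` and `ReorderBuffer` are hypothetical.

```python
def stripe(packets, links):
    """Assign sequence-numbered packets to links in round-robin order."""
    queues = [[] for _ in range(links)]
    for seq, payload in enumerate(packets):
        queues[seq % links].append((seq, payload))
    return queues


class ReorderBuffer:
    """Hand packets to the upper layer strictly in sequence order."""

    def __init__(self):
        self.next_seq = 0   # next sequence number the upper layer expects
        self.pending = {}   # early arrivals, keyed by sequence number

    def receive(self, seq, payload):
        """Accept one packet; return the payloads that are now deliverable."""
        if seq < self.next_seq:
            return []       # already delivered: drop the duplicate
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


if __name__ == "__main__":
    queues = stripe(["a", "b", "c", "d"], links=2)
    buf = ReorderBuffer()
    delivered = []
    # Simulate imbalanced links: everything on link 1 arrives before link 0.
    for seq, payload in queues[1] + queues[0]:
        delivered.extend(buf.receive(seq, payload))
    print(delivered)
```

Because the upper layer never sees a gap, a slow link shows up as delay rather than as apparent loss, which is the kind of situation in which out-of-order arrival would otherwise provoke duplicate ACKs and retransmissions in TCP.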

ACM Sigarch Computer Architecture News | 2014

PEACH2: An FPGA-based PCIe network device for Tightly Coupled Accelerators

Yuetsu Kodama; Toshihiro Hanawa; Taisuke Boku; Mitsuhisa Sato

In recent years, heterogeneous clusters using accelerators have often been used for high-performance computing systems. In such clusters, inter-node communication between accelerators requires several memory copies via CPU memory, and the communication latency incurred severely reduces performance. To solve this problem, we have been proposing a Tightly Coupled Accelerators (TCA) architecture intended to reduce the communication latency between accelerators over different nodes. In the TCA architecture, PCI Express packets are used for communication among GPUs over nodes. We developed a communication chip, which we call PEACH2, to help implement the TCA architecture. In this paper, we describe the details of the design and implementation of the PEACH2 chip using an FPGA, with respect to its routing mechanism and its DMA controller. We evaluated the PEACH2 on a new platform that uses the latest Xeon CPU, IvyBridge, and achieved 2.3 GBytes/sec between GPUs over nodes, while the performance was only 880 MBytes/sec on the previous platform with SandyBridge.
