Keiichi Aoki | Researchain

Publication

Featured research published by Keiichi Aoki.

international symposium on parallel and distributed computing | 2005

Active Zero-copy: A performance study of non-deterministic messaging

Shinichi Yamagiwa; Keiichi Aoki; Koichi Wada

Zero-copy communication exchanges messages among buffers that are allocated and locked before the communication itself. This communication style fits applications in which the communication timings and the message sizes are known in the initialization phase. However, applications with non-deterministic messaging, such as Web or parallel database applications, do not fit this style because the sizes and timings of their messages change at every communication. This paper proposes a new zero-copy communication style for these kinds of applications, called active zero-copy, that receives messages without pre-allocated buffers. A performance evaluation comparing active zero-copy with conventional zero-copy, applied to applications with non-deterministic messaging, shows its efficiency.

international performance, computing, and communications conference | 2004

On the performance of Maestro2 high performance network equipment, using new improvement techniques

Shinichi Yamagiwa; Kevin Ferreira; Luis M. Campos; Keiichi Aoki; Masaaki Ono; Koichi Wada; Munehiro Fukuda; Leonel Sousa

Cluster computers have become the vehicle of choice to build high performance computing environments. To fully exploit the computing power of these environments, one must utilize high performance network and protocol technologies, since the communication patterns of parallel applications running on clusters require low latency and high throughput, not achievable by using off-the-shelf network technologies. We have developed a technology to build high performance network equipment, called Maestro2. This paper describes the novel techniques used by Maestro2 to extract maximum performance from the physical medium and studies the impact of software-level parameters. The results obtained clearly show that Maestro2 is a promising technology, presenting very good results both in terms of latency and throughput. The results also show the large impact of software overhead in the overall performance of the system and validate the need for optimized communication libraries for high performance computing.

international conference on communications | 2004

Maestro2: high speed network technology for high performance computing

Keiichi Aoki; Shinichi Yamagiwa; Kevin Ferreira; Luís Miguel Campos

Cluster computers have become the de facto mechanism to build high performance computing environments. To fully exploit the computing power of these environments, one must utilize high performance network and protocol technologies, since the communication patterns of parallel applications running on clusters require low latency and high throughput, not achievable by using off-the-shelf network technologies. We have developed a technology to build high performance network equipment, called Maestro2. This paper describes the novel techniques used by Maestro2 to extract maximum performance from the physical medium and studies the impact of overhead in communication software. The results obtained clearly show that Maestro2 is a promising technology, presenting very good results both in terms of latency and throughput. The results also show the large impact of software overhead in the overall performance of the system and validate the need for optimized communication libraries for high performance computing.

Journal of Interconnection Networks | 2006

Maestro2: Experimental evaluation of communication performance improvement techniques in the link layer

Shinichi Yamagiwa; Leonel Sousa; Kevin Ferreira; Keiichi Aoki; Masaaki Ono; Koichi Wada

Cluster computers have become the vehicle of choice to build high performance computing environments. To fully exploit the computing power of these environments, technologies for high performance networks and protocols have to be applied, since the communication patterns of parallel applications running on clusters demand low latency and high throughput. Our previous work identified the main drawbacks of conventional network technologies and proposed some techniques to solve them within a new network solution called Maestro. This paper describes not only the architecture and the evaluation platform of Maestro2, but also the technologies introduced for enhancing the performance of the original Maestro framework. Novel techniques are proposed at the link layer and at the switching level, namely continuous network burst and out-of-order switching, and their design and implementation in Maestro2 are discussed. Moreover, a specialized communication library has been developed to take full advantage of these new techniques for implementing high speed cluster networks based on the Maestro2 environment. Experimental results clearly show that the newly proposed techniques provide significant improvements, both in terms of latency and throughput, which are essential to efficient cluster computing.

pacific rim conference on communications, computers and signal processing | 2007

An Architecture and Performance of Maestro3 Cluster Network

Hajime Kuribayashi; Koji Yasuda; Keiichi Aoki; Koichi Wada; Masaaki Ono

In cluster computing, communication performance is an important factor that determines the overall performance of a cluster. The typical conventional approaches to improving communication performance are to use a dedicated cluster network and/or to offload protocols to network devices. However, in these approaches, the host processor still consumes many computing resources to process communication procedures. To improve efficiency in parallel processing, host processors have to be able to concentrate on primary computation by introducing a more aggressive and flexible offloading mechanism. We are currently developing a new cluster network, called Maestro3, which is capable of offloading user-defined software modules. Both the network interface and the switch of Maestro3 include a general purpose processor, which is tightly coupled with the network hardware, and a high-capacity memory. This paper presents the architecture and a preliminary evaluation of the Maestro3 cluster network.

international symposium on parallel and distributed processing and applications | 2006

Architecture and performance of dynamic offloader for cluster network

Keiichi Aoki; Hiroki Maruoka; Koichi Wada; Masaaki Ono

This paper presents the architecture of a dynamic offloading mechanism, called Maestro Dynamic Offloading Mechanism (MDO), for the intelligent cluster network Maestro2. By using MDO, programmers can offload software modules to the network interface and the switch dynamically. MDO provides functional APIs with which programmers can develop offload modules efficiently. MDO also includes a wrapper library that enables the offload modules to be executed on the host processors as well as on the network devices. The results of the performance evaluation show that the performance of collective communications can be improved by offloading communication procedures to the network devices using MDO. The overhead of MDO and the traffic on the PCI bus are also discussed.

international symposium on parallel and distributed computing | 2004

Distributed shared memory system based on the Maestro2 high performance cluster network

Kevin Ferreira; Shinichi Yamagiwa; Leonel Sousa; Keiichi Aoki; Koichi Wada; Luis M. Campos

Cluster computing has become a valid alternative for high performance computing. To fully exploit the computing power of these environments, one must utilize high performance network and protocol technologies, since parallel applications running on clusters require low latency and high throughput. To address this issue, the Maestro2 high performance network technology has been developed. Parallel applications running on clusters usually use one of two known alternatives to share information: message passing or distributed shared memory. Maestro2 already supports a high performance message passing communication system, MMP. This paper describes a first version of a distributed shared memory system based on Maestro2 and presents a first set of experimental results.

parallel and distributed computing systems (isca) | 2005