Network


Latest external collaboration at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Shinichi Yamagiwa is active.

Publication


Featured research published by Shinichi Yamagiwa.


IEEE Computer | 2007

Caravela: A Novel Stream-Based Distributed Computing Environment

Shinichi Yamagiwa; Leonel Sousa

Distributed computing implies sharing computation, data, and network resources around the world. The Caravela environment applies a proposed flow model for stream computing on graphics processing units that encapsulates a program to be executed on local or remote computers and collects the data directly through memory or the network.
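
The flow-model described above can be pictured as a self-contained unit that bundles kernel code with named input and output streams, so the same unit can run on a local GPU or be shipped to a remote node. The C++ sketch below is a minimal illustration under that assumption; FlowModel, Stream and execute_locally are invented names, not the actual Caravela API.

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Invented sketch of a flow-model: a kernel program bundled with its I/O
// streams so it can be executed on a local GPU or shipped to a remote node.
struct Stream {
    std::string name;            // logical name of the data stream
    std::vector<float> data;     // payload gathered from memory or the network
};

struct FlowModel {
    std::string kernel_source;   // program to run (e.g. a GPU shader/kernel)
    std::vector<Stream> inputs;  // input streams the kernel reads
    std::vector<Stream> outputs; // output streams the kernel produces
};

// Stand-in for local execution: copies inputs to outputs just to show the
// data flow; a real runtime would compile and launch the kernel on a GPU.
void execute_locally(FlowModel& fm) {
    for (std::size_t i = 0; i < fm.outputs.size() && i < fm.inputs.size(); ++i)
        fm.outputs[i].data = fm.inputs[i].data;
}

int main() {
    FlowModel fm{"/* kernel source */",
                 {{"in0", {1.0f, 2.0f, 3.0f}}},
                 {{"out0", {}}}};
    execute_locally(fm);
    std::cout << "out0 holds " << fm.outputs[0].data.size() << " values\n";
}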


computing frontiers | 2007

Design and implementation of a stream-based distributed computing platform using graphics processing units

Shinichi Yamagiwa; Leonel Sousa

Anonymous use of computing resources spread over the world has become one of the main goals of GRID environments. In GRID-based computing, the security of users and of contributors of computing resources is crucial to execute processes in a safe way. This paper proposes a new method for stream-based processing in a distributed environment and also a novel method to solve the security issues that arise under this kind of processing. It also presents the design of the distributed computing platform developed for stream-based processing, including a description of the local and remote execution methods, which are collectively designated the Caravela platform. The proposed flow-model is mapped onto the distributed processing resources, connected through a network, by using the Caravela platform. This platform has been developed by the authors of this paper specifically to make use of the Graphics Processing Units available in recent personal computers. The paper also illustrates the application of the Caravela platform to different types of processing, namely scientific computing and image/video processing. The presented experimental results show that significant improvements can be achieved with the use of GPUs compared to general-purpose processors.
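
One way to picture how a flow-model is mapped onto local or remote processing resources, and why this helps with the security issue the abstract mentions: the runtime only ever exchanges the model's declared input and output streams, so a remote host exposes nothing beyond those buffers. The sketch below is a hypothetical abstraction (Executor, LocalExecutor and RemoteExecutor are invented names), not the real Caravela interface.

#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Invented abstraction: only the declared input/output streams of a
// flow-model cross the trust boundary, never the host's files or memory.
struct FlowModel {
    std::string kernel_source;   // program carried with the unit
    std::vector<float> input;    // declared input stream
    std::vector<float> output;   // declared output stream
};

class Executor {                 // a processing resource (a GPU on some host)
public:
    virtual ~Executor() = default;
    virtual void run(FlowModel& fm) = 0;
};

class LocalExecutor : public Executor {
public:
    void run(FlowModel& fm) override {
        fm.output = fm.input;    // placeholder for a local GPU launch
    }
};

class RemoteExecutor : public Executor {
public:
    void run(FlowModel& fm) override {
        // A real implementation would serialize kernel_source plus the input
        // stream, send them over the network, and receive only the output
        // stream back; nothing else on the remote host is touched.
        fm.output = fm.input;    // placeholder for the network round trip
    }
};

int main() {
    FlowModel fm{"/* kernel */", {1.0f, 2.0f, 3.0f}, {}};
    std::unique_ptr<Executor> exec = std::make_unique<RemoteExecutor>();
    exec->run(fm);               // same call whether local or remote
    std::cout << "received " << fm.output.size() << " output values\n";
}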


computing frontiers | 2007

Data buffering optimization methods toward a uniform programming interface for GPU-based applications

Shinichi Yamagiwa; Leonel Sousa; Diogo Antão

The massive computational power available in off-the-shelf Graphics Processing Units (GPUs) can pave the way for their usage in general-purpose applications. Current interfaces for programming GPU operation are still oriented towards graphics processing. This paper focuses on the disparities among those programming interfaces and proposes an extension to the recently developed Caravela library that supports stream-based computation. This extension implements effective methods to counterbalance the disparities and differences in graphics runtime environments. Experimental results show that these methods improve the performance of GPU-based applications by more than 50% and demonstrate that the proposed extended interface can be an effective solution for general-purpose programming on GPUs.
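
The kind of buffering optimization the abstract refers to can be illustrated as keeping data resident and swapping the roles of input and output buffers between iterations instead of copying results back and forth; the actual methods in the paper are not detailed here. BufferPair and swap_io below are invented names, not part of the Caravela library.

#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// Invented illustration of a data-buffering optimization for iterative GPU
// computations: instead of reading results back and re-uploading them, keep
// both buffers resident and swap their input/output roles between passes.
struct BufferPair {
    std::vector<float> in;   // buffer currently acting as kernel input
    std::vector<float> out;  // buffer currently acting as kernel output
    void swap_io() { std::swap(in, out); }  // O(1) role exchange, no copy
};

// Stand-in for one GPU pass: writes f(x) = x + 1 into the output buffer.
void run_pass(BufferPair& buf) {
    buf.out.resize(buf.in.size());
    for (std::size_t i = 0; i < buf.in.size(); ++i)
        buf.out[i] = buf.in[i] + 1.0f;
}

int main() {
    BufferPair buf{{0.0f, 0.0f, 0.0f}, {}};
    for (int pass = 0; pass < 3; ++pass) {
        run_pass(buf);
        buf.swap_io();           // this pass's output feeds the next pass
    }
    std::cout << "after 3 passes: " << buf.in[0] << "\n";  // prints 3
}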


international symposium on parallel and distributed computing | 2009

CaravelaMPI: Message Passing Interface for Parallel GPU-Based Applications

Shinichi Yamagiwa; Leonel Sousa

With the ever-increasing demand for high-quality 3D image processing in markets such as cinema and gaming, the capabilities of graphics processing units (GPUs) have advanced tremendously. Although GPU-based cluster computing, which uses GPUs as the processing units, is one of the most promising high-performance parallel computing platforms, there is currently no programming environment, interface or library designed to use these multiple computing resources to compute tasks in parallel. This paper proposes CaravelaMPI, a new message passing interface targeted at GPU cluster computing, providing a unified and transparent interface to manage both communication and GPU execution. Experimental results show that the transparent interface of CaravelaMPI allows GPU-based clusters to be programmed efficiently, not only decreasing the required programming effort but also increasing the performance of GPU-based cluster computing platforms.
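
For context, a conventional GPU-cluster program has to interleave message-passing calls with GPU launches by hand; the abstract's point is that CaravelaMPI hides that plumbing behind one interface. The sketch below shows only the conventional pattern, using standard MPI calls and a placeholder for the GPU step; it does not use the real CaravelaMPI API.

#include <mpi.h>
#include <cstdio>
#include <vector>

// Placeholder for launching a GPU kernel on the locally attached device.
static void gpu_process(std::vector<float>& data) {
    for (float& x : data) x *= 2.0f;   // pretend the GPU doubled each value
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<float> buf(4, 1.0f);
    if (rank == 0) {
        // Rank 0 ships the input data to rank 1, which owns the GPU, then
        // waits for the processed result: explicit communication management.
        MPI_Send(buf.data(), (int)buf.size(), MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf.data(), (int)buf.size(), MPI_FLOAT, 1, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        std::printf("rank 0 received %.1f\n", buf[0]);
    } else if (rank == 1) {
        // Rank 1 manages the GPU execution step by hand.
        MPI_Recv(buf.data(), (int)buf.size(), MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        gpu_process(buf);
        MPI_Send(buf.data(), (int)buf.size(), MPI_FLOAT, 0, 1, MPI_COMM_WORLD);
    }
    MPI_Finalize();
}

Launched with two ranks (e.g. mpirun -np 2), rank 1 plays the role of a GPU-equipped node; a unified interface would fold the transfer and the launch into a single call.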


international conference on cluster computing | 2000

Design and performance of Maestro cluster network

Shinichi Yamagiwa; Munehiro Fukuda; Koichi Wada

Most clusters so far have used WAN- or LAN-based network products for communication due to their market availability. However, they do not always match the communication patterns in clusters, thus incurring extra overhead. Based on our investigation of such overhead, we have optimized cluster communication at the link layer. Partitioning each message into 16-byte packets, our optimization uses two techniques: (1) transferring in a burst as many packets as the receiving buffer accepts at once, and (2) having each hardware component pass one packet to another in a pipelined manner. We have realized these two techniques in a link control hardware chip, referred to as the MLC (Maestro Link Controller), and have constructed the Maestro cluster network using MLCs. This paper describes the features of the Maestro cluster network and demonstrates the efficiency of our optimization techniques through performance experiments over this network.
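
The two link-layer techniques can be modeled in software as credit-limited bursting of fixed-size packets. In the sketch below, the 16-byte packet size and the burst bound follow the abstract, but all names are invented and the pipelined forwarding inside the MLC hardware is only hinted at in a comment.

#include <algorithm>
#include <cstddef>
#include <iostream>

// Software model of the two link-layer techniques: fixed 16-byte packets and
// credit-limited bursts. Pipelined forwarding through the MLC hardware would
// overlap these bursts stage by stage, which is not modeled here.
constexpr std::size_t kPacketBytes = 16;

// Number of 16-byte packets needed for a message of `len` bytes (round up).
std::size_t packet_count(std::size_t len) {
    return (len + kPacketBytes - 1) / kPacketBytes;
}

// Send `total` packets in bursts bounded by the receiver's free buffer slots;
// returns how many bursts were needed.
std::size_t send_in_bursts(std::size_t total, std::size_t receiver_credits) {
    std::size_t bursts = 0;
    while (total > 0) {
        std::size_t burst = std::min(total, receiver_credits);
        total -= burst;      // hand one burst of packets to the link
        ++bursts;            // receiver drains its buffer before the next burst
    }
    return bursts;
}

int main() {
    std::size_t packets = packet_count(1500);   // e.g. a 1500-byte message
    std::cout << packets << " packets, "
              << send_in_bursts(packets, 32) << " bursts\n";
}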


international parallel and distributed processing symposium | 2009

Performance study of interference on GPU and CPU resources with multiple applications

Shinichi Yamagiwa; Koichi Wada

In recent years, the performance and capabilities of Graphics Processing Units (GPUs) have improved drastically, mostly due to the demands of the entertainment market, with consumers and companies alike pushing for improvements in the level of visual fidelity, which is only achieved with high-performing GPU solutions. Besides the entertainment market, there is an ongoing global research effort to use such immense computing power for applications beyond graphics, such as the domain of general-purpose computing. Efficiently combining these GPU resources with existing CPU resources is also an important and open research task. This paper is a contribution to that effort, focusing on the analysis of the performance factors involved in combining both resource types, while also introducing a novel job scheduler that manages these two resources. Through experimental performance evaluation, this paper reports the most important factors and design considerations that must be taken into account when designing such a job scheduler.
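
A minimal way to picture a scheduler that manages both resource types: each job carries a preferred device and is dispatched to the CPU or the GPU depending on which one is idle, falling back to the other when the preferred one is busy. The sketch below is a toy model with invented names, not the scheduler evaluated in the paper.

#include <iostream>
#include <string>

// Toy model of a scheduler that manages a CPU and a GPU as two resources,
// assigning each job to its preferred device when idle and falling back to
// the other device otherwise. Invented for illustration only.
enum class Device { CPU, GPU };

struct Job {
    std::string name;
    Device preferred;
};

struct Scheduler {
    bool cpu_busy = false;
    bool gpu_busy = false;

    Device dispatch(const Job& job) {
        const bool want_gpu = (job.preferred == Device::GPU);
        bool& first  = want_gpu ? gpu_busy : cpu_busy;  // preferred device
        bool& second = want_gpu ? cpu_busy : gpu_busy;  // fallback device
        if (!first)  { first = true;  return job.preferred; }
        if (!second) { second = true; return want_gpu ? Device::CPU : Device::GPU; }
        return job.preferred;  // both busy: a real scheduler would queue the job
    }
};

int main() {
    Scheduler s;
    Job a{"fft", Device::GPU}, b{"filter", Device::GPU};
    std::cout << "a -> " << (s.dispatch(a) == Device::GPU ? "GPU" : "CPU") << "\n";
    std::cout << "b -> " << (s.dispatch(b) == Device::GPU ? "GPU" : "CPU") << "\n";
}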


international symposium on parallel and distributed computing | 2005

Active Zero-copy: A performance study of non-deterministic messaging

Shinichi Yamagiwa; Keiichi Aoki; Koichi Wada

Zero-copy communication exchanges messages among buffers that are allocated and locked before the communication itself. This communication style fits applications in which the communication timings and message sizes are known in the initialization phase. However, applications with non-deterministic messaging, such as Web servers or parallel databases, cannot fit this style because the sizes and timings of their messages change with every communication. This paper proposes a new zero-copy communication style for these kinds of applications, called active zero-copy, which receives messages without pre-allocated buffers. A performance evaluation comparing active zero-copy with conventional zero-copy on applications with non-deterministic messaging shows its efficiency.
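
The contrast drawn in the abstract can be sketched as follows: conventional zero-copy delivers only into buffers that were posted and locked in advance, while the active variant determines the destination at arrival time from the size carried by the incoming message. The code below is an invented illustration of that difference, not the paper's implementation; the copies merely stand in for where a NIC would deposit the data.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <optional>
#include <vector>

// Invented illustration: conventional zero-copy delivers only into a buffer
// posted before communication, while the active variant sizes the destination
// from the arriving message. The std::copy calls below only stand in for
// where a NIC would deposit the data directly.
struct Message {
    std::size_t size;
    std::vector<char> payload;
};

// Conventional style: fails if no sufficiently large buffer was posted.
std::optional<std::vector<char>*>
recv_prepinned(std::vector<char>* posted, const Message& msg) {
    if (posted == nullptr || posted->size() < msg.size) return std::nullopt;
    std::copy(msg.payload.begin(), msg.payload.end(), posted->begin());
    return posted;
}

// Active variant: the size is unknown beforehand, so allocate at arrival time.
std::vector<char> recv_active(const Message& msg) {
    std::vector<char> buf(msg.size);              // allocated on demand
    std::copy(msg.payload.begin(), msg.payload.end(), buf.begin());
    return buf;
}

int main() {
    Message msg{5, {'h', 'e', 'l', 'l', 'o'}};
    std::cout << "pre-posted buffer available: "
              << recv_prepinned(nullptr, msg).has_value()
              << ", active receive size: " << recv_active(msg).size() << "\n";
}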


international symposium on parallel and distributed computing | 2007

Meta-Pipeline: A New Execution Mechanism for Distributed Pipeline Processing

Shinichi Yamagiwa; Leonel Sousa; Tomás Brandão

The Caravela platform has been proposed by the authors of this paper to perform distributed stream-based computing for general-purpose computation. This platform uses a secured execution unit called the flow-model, which prevents remote users from accessing local information on a computer. The flow-model is assigned to local or remote processing units that execute its program. This paper focuses on a new execution mechanism, called the meta-pipeline, that defines a pipeline composed of flow-models and is designed as a set of additional functions of the Caravela platform. The pipeline is executed automatically by the meta-pipeline runtime environment. This paper describes the execution mechanism and also presents an application example.
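
The meta-pipeline can be pictured as a chain of flow-models in which the runtime automatically feeds each stage's output stream into the next stage's input. The sketch below is a hypothetical model of that wiring; Stage and run_pipeline are invented names rather than the Caravela meta-pipeline API.

#include <functional>
#include <iostream>
#include <vector>

// Hypothetical model of a meta-pipeline: a chain of flow-model-like stages in
// which the runtime passes each stage's output stream to the next stage.
using Stream = std::vector<float>;
using Stage  = std::function<Stream(const Stream&)>;  // one flow-model kernel

// Execute the stages in order, wiring the output of stage i to stage i + 1.
Stream run_pipeline(const std::vector<Stage>& stages, Stream input) {
    for (const Stage& stage : stages)
        input = stage(input);
    return input;
}

int main() {
    std::vector<Stage> stages = {
        [](const Stream& in) { Stream o(in); for (float& x : o) x += 1; return o; },
        [](const Stream& in) { Stream o(in); for (float& x : o) x *= 2; return o; },
    };
    Stream out = run_pipeline(stages, {1.0f, 2.0f, 3.0f});
    std::cout << out[0] << " " << out[1] << " " << out[2] << "\n";  // 4 6 8
}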


international performance, computing, and communications conference | 2004

On the performance of Maestro2 high performance network equipment, using new improvement techniques

Shinichi Yamagiwa; Kevin Ferreira; Luis M. Campos; Keiichi Aoki; Masaaki Ono; Koichi Wada; Munehiro Fukuda; Leonel Sousa

Cluster computers have become the vehicle of choice for building high-performance computing environments. To fully exploit the computing power of these environments, one must utilize high-performance network and protocol technologies, since the communication patterns of parallel applications running on clusters require low latency and high throughput, not achievable by using off-the-shelf network technologies. We have developed a technology to build high-performance network equipment, called Maestro2. This paper describes the novel techniques used by Maestro2 to extract maximum performance from the physical medium and studies the impact of software-level parameters. The results obtained clearly show that Maestro2 is a promising technology, presenting very good results in terms of both latency and throughput. The results also show the large impact of software overhead on the overall performance of the system and validate the need for optimized communication libraries for high-performance computing.


international conference on communications | 2004

Maestro2: high speed network technology for high performance computing

Keiichi Aoki; Shinichi Yamagiwa; Kevin Ferreira; Luís Miguel Campos

Cluster computers have become the de facto mechanism for building high-performance computing environments. To fully exploit the computing power of these environments, one must utilize high-performance network and protocol technologies, since the communication patterns of parallel applications running on clusters require low latency and high throughput, not achievable by using off-the-shelf network technologies. We have developed a technology to build high-performance network equipment, called Maestro2. This paper describes the novel techniques used by Maestro2 to extract maximum performance from the physical medium and studies the impact of overhead in communication software. The results obtained clearly show that Maestro2 is a promising technology, presenting very good results in terms of both latency and throughput. The results also show the large impact of software overhead on the overall performance of the system and validate the need for optimized communication libraries for high-performance computing.

Collaboration


Dive into Shinichi Yamagiwa's collaboration.

Top Co-Authors

Leonel Sousa

Instituto Superior Técnico

Masahiko Okumura

Japan Atomic Energy Agency

Toshio Hisamitsu

Japan Agency for Marine-Earth Science and Technology
