Shoichi Hirasawa
Tohoku University
Publications
Featured research published by Shoichi Hirasawa.
asia and south pacific design automation conference | 2009
Reiji Suda; Takayuki Aoki; Shoichi Hirasawa; Akira Nukada; Hiroki Honda; Satoshi Matsuoka
We discuss hardware and software aspects of GPGPU, specifically focusing on NVIDIA cards and CUDA, from the viewpoint of parallel computing. The major weak points of GPUs compared with the newest supercomputers are identified and summarized as four points: large SIMD vector length, small memory, absence of a fast L2 cache, and high register spill penalty. On the software side, we derive an optimal scheduling algorithm for latency hiding of host-device data transfer, and discuss SPMD parallelism on GPUs.
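The latency-hiding idea can be illustrated with a small CUDA sketch (an illustration of the general technique, not the scheduling algorithm derived in the paper): the input is split into chunks whose host-device copies and kernel launches are issued on separate streams, so the transfer of one chunk overlaps with computation on another. Host buffers are assumed to be pinned (e.g., allocated with cudaHostAlloc) for the asynchronous copies to be effective; the chunk count, kernel, and sizes are made up for the example.

    // Minimal sketch: overlap host-device transfers with kernel execution
    // using CUDA streams; assumes n is divisible by kChunks.
    #include <cuda_runtime.h>

    __global__ void scale(float *d, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;
    }

    void process(const float *h_in, float *h_out, int n) {
        const int kChunks = 4;
        int chunk = n / kChunks;
        cudaStream_t streams[kChunks];
        float *d_buf;
        cudaMalloc(&d_buf, n * sizeof(float));
        for (int c = 0; c < kChunks; ++c) {
            cudaStreamCreate(&streams[c]);
            int off = c * chunk;
            // Copy-in, compute, and copy-out of each chunk go to its own stream.
            cudaMemcpyAsync(d_buf + off, h_in + off, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, streams[c]);
            scale<<<(chunk + 255) / 256, 256, 0, streams[c]>>>(d_buf + off, chunk);
            cudaMemcpyAsync(h_out + off, d_buf + off, chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, streams[c]);
        }
        cudaDeviceSynchronize();
        for (int c = 0; c < kChunks; ++c) cudaStreamDestroy(streams[c]);
        cudaFree(d_buf);
    }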
ieee international conference on high performance computing, data, and analytics | 2014
Hiroyuki Takizawa; Shoichi Hirasawa; Yasuharu Hayashi; Ryusuke Egawa; Hiroaki Kobayashi
This paper proposes an extensible programming framework that separates platform-specific optimizations from application codes. The framework allows programmers to define their own code translation rules for the special demands of individual systems, compilers, libraries, and applications. Code translation rules associated with user-defined compiler directives are defined in an external file, and the application code is simply annotated with the directives. For code transformations based on the rules, the framework exposes the abstract syntax tree (AST) of an application code as an XML document to expert programmers. Hence, the XML document of an AST can be transformed using any XML-based technology. Our case studies using real applications demonstrate that the framework is effective in separating platform-specific optimizations from application codes, and in incrementally improving the performance of an existing application without cluttering the code.
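As a rough illustration of the separation the framework aims at (the directive name and the transformation below are hypothetical, not taken from the paper), the application code carries only a lightweight annotation, while the platform-specific rewrite is described by an external rule associated with that annotation:

    /* Application code: annotated with a user-defined directive only.
       "xev unroll(4)" is a made-up directive name for illustration. */
    #pragma xev unroll(4)
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];

    /* Result of applying an externally defined rule tied to that directive
       (the rule itself would be an XSLT transformation over the XML AST);
       assumes n is a multiple of 4. */
    for (int i = 0; i < n; i += 4) {
        y[i]     = a * x[i]     + y[i];
        y[i + 1] = a * x[i + 1] + y[i + 1];
        y[i + 2] = a * x[i + 2] + y[i + 2];
        y[i + 3] = a * x[i + 3] + y[i + 3];
    }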
international workshop on openmp | 2010
Satoshi Ohshima; Shoichi Hirasawa; Hiroki Honda
The arithmetic performance available with GPGPU has attracted attention; however, the difficulty of programming GPUs poses a problem. We have proposed a GPGPU programming approach that uses existing parallel programming techniques, and as a concrete realization of this proposal we are now developing an OpenMP framework for GPUs, named "OMPCUDA" and based on the Omni OpenMP Compiler. In this paper we describe the design and implementation of OMPCUDA. We evaluated it using test programs and confirmed that parallel speedup can be obtained easily from the same code as existing OpenMP.
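The kind of translation OMPCUDA targets can be sketched as follows (a hand-written illustration, not the framework's actual output): a loop parallelized with an OpenMP directive is mapped onto a CUDA kernel whose thread grid covers the iteration space, plus the surrounding device memory management.

    /* Input: an ordinary OpenMP parallel loop. */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];

    /* Sketch of a corresponding CUDA version: the loop body becomes a kernel
       and the iteration index is derived from the thread coordinates. */
    __global__ void add(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }
    /* ...allocate device buffers, copy a and b to the device, launch
       add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n), then copy c back... */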
international symposium on computing and networking | 2015
Takeshi Yamada; Shoichi Hirasawa; Hiroyuki Takizawa; Hiroaki Kobayashi
This paper reports a case study of using the Xevolver code transformation framework for data layout optimizations of high-performance computing (HPC) applications. Due to the variety of data structures used in individual applications, a code transformation rule for data layout optimization is generally specific to a particular application. Since the Xevolver framework enables users to define their own code transformations, a custom code transformation can be defined so that a specific data representation in an existing code can be mechanically and consistently translated to another one. Our evaluation results clearly demonstrate that such a code transformation is effective in improving memory access efficiency, and hence the performance of an HPC application, without overcomplicating the code.
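A typical transformation of this kind is the array-of-structures to structure-of-arrays rewrite sketched below (an illustration of the general technique, not the specific rule evaluated in the paper); applied mechanically and consistently across a code, it makes field-wise loops access memory contiguously.

    /* Original layout: array of structures (AoS). A loop touching only x
       strides over whole Particle records. N, dt, and v are assumed given. */
    struct Particle { double x, y, z, mass; };
    struct Particle p[N];
    for (int i = 0; i < N; ++i)
        p[i].x += dt * v[i];

    /* Transformed layout: structure of arrays (SoA). Each field is stored
       contiguously, improving cache and vectorization efficiency. */
    struct Particles { double x[N], y[N], z[N], mass[N]; };
    struct Particles ps;
    for (int i = 0; i < N; ++i)
        ps.x[i] += dt * v[i];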
international symposium on computing and networking | 2015
Reiji Suda; Hiroyuki Takizawa; Shoichi Hirasawa
HPC scientific codes are often less readable and less maintainable because of complex hand optimizations, which are frequently platform-dependent. We are developing a toolset intended to mitigate this maintainability problem through user-defined, easy-to-use code transformations: the code is written in a simpler form, and coding techniques for high performance are introduced by code transformations. In this paper, we present xevtgen, the code transformation generator of our toolset. Transformation rules are defined using dummy Fortran codes with some directives, and we expect that this design makes the tool easier for Fortran programmers to learn. Some examples of code transformations are shown to discuss the practicality of the proposed approach.
international symposium on computing and networking | 2015
Kazuhiko Komatsu; Ryusuke Egawa; Shoichi Hirasawa; Hiroyuki Takizawa; Ken’ichi Itakura; Hiroaki Kobayashi
As the diversity of HPC systems increases, even legacy HPC applications often need to use accelerators for higher performance. To migrate large-scale legacy HPC applications to modern HPC systems that include accelerators, OpenACC is a promising approach because its directive-based style avoids drastic code modifications. This paper presents a case study of migrating a large-scale atmospheric simulation code to an OpenACC platform while keeping the maintainability of the original code. Although OpenACC enables an application to use accelerators by adding a small number of directives, achieving high performance in most cases requires modifying the original code, which tends to degrade maintainability and/or portability. To avoid such code modifications, this paper adopts a code transformation framework, Xevolver. Instead of directly modifying the code, custom code transformation rules and custom directives are defined using the Xevolver framework. This paper first shows that merely inserting OpenACC directives does not lead to high performance, and that non-trivial code modifications are required in practice. It then shows that direct code modification can be avoided by using externally defined transformation rules and directives, keeping the original code unchanged as much as possible. Finally, the performance evaluation shows that the code modifications can improve the performance of the OpenACC code.
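For reference, the kind of directive insertion OpenACC allows looks like the following (a generic example, not an excerpt from the atmospheric code); the paper's point is that such insertion alone did not yield high performance, and the additional restructuring was expressed as external Xevolver rules rather than as edits to the original source.

    /* A loop nest offloaded to an accelerator with OpenACC directives only. */
    #pragma acc data copyin(a[0:n*n], b[0:n*n]) copyout(c[0:n*n])
    {
        #pragma acc parallel loop collapse(2)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                c[i * n + j] = a[i * n + j] + b[i * n + j];
    }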
international parallel and distributed processing symposium | 2014
Chunyan Wang; Shoichi Hirasawa; Hiroyuki Takizawa; Hiroaki Kobayashi
A code smell is any part of an application code that might indicate a code or design problem, making the application code hard to evolve and maintain. Automatic detection of code smells has been studied to help programmers find which parts of their application codes should be refactored. However, code smells have not been defined in a formal manner. Moreover, existing detection tools are designed for object-oriented applications and are rarely provided for high performance computing (HPC) applications. HPC applications are usually optimized for a particular platform to achieve high performance, and hence have special code smells called platform-specific code smells (PSCSs). The purpose of this work is to develop a code smell alert system that helps programmers find PSCSs in HPC applications, in order to improve performance portability across different platforms. This paper presents a PSCS alert system based on an abstract syntax tree (AST) and XML. Code patterns of PSCSs are defined in a formal way using the AST information represented in XML, and XML Path Language (XPath) is used to describe those patterns. Evaluation results obtained using real applications show that the proposed system can alert programmers to potential PSCSs.
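The detection step can be sketched as follows (the file name, element names, and XPath pattern are illustrative assumptions, not the system's actual definitions): an XPath query is evaluated against the XML-encoded AST, and each match is reported as a candidate smell.

    /* Sketch: locate candidate PSCSs by evaluating an XPath pattern over an
       AST exported as XML, using libxml2. */
    #include <stdio.h>
    #include <libxml/parser.h>
    #include <libxml/xpath.h>

    int main(void) {
        xmlDocPtr doc = xmlReadFile("ast.xml", NULL, 0);
        if (!doc) return 1;
        xmlXPathContextPtr ctx = xmlXPathNewContext(doc);
        /* Hypothetical pattern: for-loops that contain a compiler pragma. */
        xmlXPathObjectPtr res = xmlXPathEvalExpression(
            BAD_CAST "//ForStatement[.//PragmaDeclaration]", ctx);
        if (res && res->nodesetval)
            printf("matched %d candidate smell(s)\n", res->nodesetval->nodeNr);
        xmlXPathFreeObject(res);
        xmlXPathFreeContext(ctx);
        xmlFreeDoc(doc);
        return 0;
    }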
ieee region 10 conference | 2012
Alfian Amrizal; Shoichi Hirasawa; Kazuhiko Komatsu; Hiroyuki Takizawa; Hiroaki Kobayashi
As the number of nodes in a GPU computing system increases, checkpointing to a global file system becomes more time-consuming due to I/O bottlenecks and network congestion. To solve this problem, in this paper we propose a transparent and scalable checkpoint/restart mechanism for OpenCL applications, named Two-level CheCL. As its name implies, Two-level CheCL consists of two different checkpoint implementations, Local CheCL and Global CheCL. Local CheCL avoids checkpointing to the global file system by utilizing node-local storage. Our experimental results show that Local CheCL can make the checkpointing process up to four times faster than a conventional checkpointing mechanism. We also implement Global CheCL, which utilizes a global file system, to ensure that a global checkpoint file is always available even in the case of a catastrophic failure. We discuss the performance of the proposed mechanism through an analysis with a two-level checkpoint model.
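The two-level policy can be sketched as follows (a simplified illustration, not the CheCL implementation; the paths and interval are made up): every checkpoint is written to fast node-local storage, and only every Kth checkpoint is additionally pushed to the global file system so that a global copy survives catastrophic failures.

    #include <stdio.h>

    /* Write one checkpoint image to the given path. */
    static void write_ckpt(const char *path, const void *state, size_t bytes) {
        FILE *f = fopen(path, "wb");
        if (!f) return;
        fwrite(state, 1, bytes, f);
        fclose(f);
    }

    /* Local level on every step; global level on every Kth step. */
    void checkpoint(int step, const void *state, size_t bytes) {
        const int K = 10;
        write_ckpt("/local_scratch/ckpt.bin", state, bytes);
        if (step % K == 0)
            write_ckpt("/global_fs/ckpt.bin", state, bytes);
    }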
Archive | 2016
Hiroyuki Takizawa; Takeshi Yamada; Shoichi Hirasawa; Reiji Suda
Xevolver is a code transformation framework that lets users define their own code transformation rules. In the framework, the abstract syntax tree (AST) of an application code is written in an XML format, and its transformation rules are expressed in the XSLT format, a standard XML format for describing XML data conversion; an AST and its transformation rules are thus both written in XML. Since manually writing XSLT rules is too low-level for ordinary users, Xevtgen is being developed as a tool that generates such rules from a simple code description. In Xevtgen, users basically write just two code patterns, the original and the transformed code pattern. Xevtgen then automatically generates a transformation rule that rewrites the original code pattern into the transformed one. The generated rule is written in XSLT and is hence usable by the other tools of the Xevolver framework. This article shows a use case of Xevtgen for data layout optimization and discusses the benefits of using the tool.
2009 Software Technologies for Future Dependable Distributed Systems | 2009
Shoichi Hirasawa; Hiroki Honda
Accelerators that consume little power per unit of computational performance, such as GPUs, Cell-architecture processors, and FPGA accelerators, are beginning to spread widely in high-performance computing in place of power-hungry general-purpose CPUs. While these processors offer much higher computational performance than general-purpose CPUs, each requires its own specific programming environment when used as a distributed-memory accelerator. In this paper we discuss a portable programming environment that can be used in common across distributed-memory accelerators.