Hideya Iwasaki

University of Electro-Communications

Publication

Featured research published by Hideya Iwasaki.

Scalable Information Systems | 2006

A library of constructive skeletons for sequential style of parallel programming

Kiminori Matsuzaki; Hideya Iwasaki; Kento Emoto; Zhenjiang Hu

With the increasing popularity of parallel programming environments such as PC clusters, more and more sequential programmers, with little knowledge of parallel architectures and parallel programming, are hoping to write parallel programs. Numerous attempts have been made to develop high-level parallel programming libraries that use abstraction to hide low-level concerns and reduce the difficulty of parallel programming. Among them, libraries of parallel skeletons have emerged as a promising step in this direction. Unfortunately, these libraries are not well accepted by sequential programmers, because of incomplete elimination of lower-level details, ad hoc selection of library functions, unsatisfactory performance, or a lack of convincing application examples. This paper addresses the principles of designing skeleton libraries for parallel programming and reports the implementation details and practical applications of a skeleton library, SkeTo. The SkeTo library is unique in that it has a solid theoretical foundation based on the theory of Constructive Algorithmics, and it is practical for describing various parallel computations in a sequential manner.

International Conference on Functional Programming | 1997

Tupling calculation eliminates multiple data traversals

Zhenjiang Hu; Hideya Iwasaki; Masato Takeichi; Akihiko Takano

Tupling is a well-known transformation tactic for obtaining new, efficient recursive functions by grouping several recursive functions into a tuple. It may be applied to eliminate multiple traversals over a common data structure. The major difficulty in tupling transformation is finding which functions are to be tupled and how to transform the tupled function into an efficient one. Previous approaches to tupling transformation are essentially based on fold/unfold transformation. Though general, they suffer from the high cost of keeping track of function calls to avoid infinite unfolding, which prevents them from being used in a compiler. To remedy this situation, we propose a new method to expose recursive structures in recursive definitions and show how this structural information can be explored for calculating efficient programs by means of tupling. Our new tupling calculation algorithm can eliminate most multiple data traversals and is easy to implement.
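
As a small illustration of the idea described in this abstract (a hand-written sketch, not the paper's calculation algorithm), consider computing an average. The naive version traverses the list twice, once for the sum and once for the length. Tupling the two functions into one that returns a pair removes the second traversal. The function names are hypothetical.

```python
from functools import reduce


def average_two_passes(xs):
    # Two separate traversals of the same list: one for sum, one for len.
    return sum(xs) / len(xs)


def sum_and_length(xs):
    # Tupled function: a single traversal that returns (sum, length).
    return reduce(lambda acc, x: (acc[0] + x, acc[1] + 1), xs, (0, 0))


def average_one_pass(xs):
    # Uses the tupled result, so the list is traversed only once.
    total, count = sum_and_length(xs)
    return total / count
```

Both versions assume a non-empty list; the tupled version simply makes the shared traversal explicit.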

International Conference on Functional Programming | 1996

Deriving structural hylomorphisms from recursive definitions

Zhenjiang Hu; Hideya Iwasaki; Masato Takeichi

In functional programming, small programs are often glued together to construct a complex program. Program fusion is an optimizing process whereby these small programs are fused into a single one and intermediate data structures are removed. Recent work has made it clear that this process is especially successful if the recursive definitions are expressed in terms of hylomorphisms. In this paper, we propose an algorithm that can automatically turn all practical recursive definitions into structural hylomorphisms, so that program fusion can be applied easily.
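
For readers unfamiliar with the term, a hylomorphism composes an unfold (which generates a structure from a seed) with a fold (which consumes it). The sketch below is a list-only illustration under that general definition, not the paper's derivation algorithm; `hylo` and its parameter names are hypothetical.

```python
def hylo(combine, base, step, done, seed):
    """Unfold `seed` with `step` and fold the produced values with `combine`.

    The intermediate list is never built: each value is consumed as soon
    as it is produced.
    """
    if done(seed):
        return base
    value, next_seed = step(seed)
    return combine(value, hylo(combine, base, step, done, next_seed))


def factorial(n):
    # Unfold n into n, n-1, ..., 1 and fold the values with multiplication.
    return hylo(
        combine=lambda x, acc: x * acc,
        base=1,
        step=lambda k: (k, k - 1),
        done=lambda k: k == 0,
        seed=n,
    )
```

Written this way, `factorial` never materializes the list of factors, which is the property that makes definitions in this form convenient targets for fusion.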

Proceedings of the IFIP TC 2 WG 2.1 International Workshop on Algorithmic Languages and Calculi | 1997

A calculational fusion system HYLO

Y. Onoue; Zhenjiang Hu; Masato Takeichi; Hideya Iwasaki

Fusion, one of the most useful transformation tactics for deriving efficient programs, is the process whereby separate pieces of programs are fused into a single one, leading to an efficient program that produces no intermediate data structures. In this paper, we report our ongoing investigation into the design and implementation of an automatic transformation system, HYLO, which performs fusion transformation in a more systematic and more general way than other systems. The distinguishing feature of our system is its calculational nature, based on simple application of transformation laws rather than traditional search-based transformations.
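
To make the notion of fusion concrete, the sketch below fuses a producer and a consumer by hand. It only illustrates the effect of the transformation; HYLO derives such results automatically by applying transformation laws, and the function names here are hypothetical.

```python
def sum_of_squares_unfused(n):
    # Producer builds an intermediate list; consumer then traverses it.
    squares = [i * i for i in range(1, n + 1)]
    return sum(squares)


def sum_of_squares_fused(n):
    # Producer and consumer fused into one loop: no intermediate list.
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total
```

Both functions compute the same value; the fused version allocates nothing proportional to `n`.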

ACM Transactions on Programming Languages and Systems | 1997

Formal derivation of efficient parallel programs by construction of list homomorphisms

Zhenjiang Hu; Hideya Iwasaki; Masato Takeichi

The use of list homomorphisms in parallel programming has attracted much attention because they ideally suit the divide-and-conquer parallel paradigm. However, they have usually been treated rather informally and in an ad hoc manner in the development of efficient parallel programs. Worse, some interesting functions, e.g., the maximum segment sum problem, are basically not list homomorphisms. In this article, we propose a systematic and formal way to construct a list homomorphism for a given problem so that an efficient parallel program is derived. We show, with several well-known but nontrivial problems, how a straightforward, “obviously” correct, but quite inefficient solution to the problem can be successfully turned into a semantically equivalent “almost list homomorphism.” The derivation is based on two transformations, namely tupling and fusion, which are defined according to the specific recursive structures of list homomorphisms.
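
The maximum segment sum example can be sketched as follows. The function itself is not a list homomorphism, but tupling it with three auxiliary values (best prefix sum, best suffix sum, total sum) gives a 4-tuple that combines associatively, so the two halves of a list can be processed independently. This is the well-known textbook formulation written by hand, assuming non-empty segments and a non-empty input; it is not the paper's derivation, and the names are hypothetical.

```python
def single(x):
    # (max segment sum, max prefix sum, max suffix sum, total) of [x].
    return (x, x, x, x)


def combine(left, right):
    # Associative combination of the tuples of two adjacent sublists.
    m1, p1, s1, t1 = left
    m2, p2, s2, t2 = right
    return (
        max(m1, m2, s1 + p2),  # best segment may straddle the split
        max(p1, t1 + p2),      # best prefix may extend into the right part
        max(s2, s1 + t2),      # best suffix may extend into the left part
        t1 + t2,
    )


def mss(xs):
    # Divide and conquer; the two recursive calls are independent,
    # which is what makes the scheme suitable for parallel evaluation.
    def go(lo, hi):
        if hi - lo == 1:
            return single(xs[lo])
        mid = (lo + hi) // 2
        return combine(go(lo, mid), go(mid, hi))

    return go(0, len(xs))[0]
```

The final projection onto the first component is what makes the result an "almost" homomorphism: the homomorphism computes the tuple, and a cheap post-processing step extracts the answer.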

International Conference on the Practice and Theory of Automated Timetabling | 2002

Characterizing Feasible Pattern Sets with a Minimum Number of Breaks

Ryuhei Miyashiro; Hideya Iwasaki; Tomomi Matsui

In sports timetabling, creating an appropriate timetable for a round-robin tournament with home–away assignment is a significant problem. To solve this problem, we need to construct a home–away assignment that can be completed into a timetable; such an assignment is called a feasible pattern set. Although finding feasible pattern sets is at the heart of many timetabling algorithms, no good characterization of feasible pattern sets is known yet. In this paper, we consider the feasibility of pattern sets and propose a new necessary condition for feasible pattern sets. In the case of a pattern set with a minimum number of breaks, we prove a theorem leading to a polynomial-time algorithm to check whether a given pattern set satisfies the necessary condition. Computational experiments show that, when the number of teams is less than or equal to 26, the proposed condition characterizes feasible pattern sets with a minimum number of breaks.
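
The terminology can be illustrated with a short sketch. A pattern is a team's sequence of home (H) and away (A) games, and a break is two consecutive games at the same venue. The checks below are only two elementary necessary conditions (distinct patterns and an equal number of H and A in every slot), assuming an even number of teams that all play in every slot; they are not the condition proposed in the paper, and the representation as strings is an assumption made for this example.

```python
def count_breaks(pattern):
    # A break is a pair of consecutive slots with the same venue (HH or AA).
    return sum(1 for prev, cur in zip(pattern, pattern[1:]) if prev == cur)


def total_breaks(pattern_set):
    return sum(count_breaks(p) for p in pattern_set)


def passes_basic_checks(pattern_set):
    # Elementary necessary conditions only (not the paper's new condition):
    # all patterns are distinct, and every slot has as many H as A.
    distinct = len(set(pattern_set)) == len(pattern_set)
    balanced = all(
        2 * slot.count("H") == len(pattern_set) for slot in zip(*pattern_set)
    )
    return distinct and balanced


# Hypothetical pattern set for four teams over three slots.
example = ["HAH", "AHA", "HHA", "AAH"]
```

For this example, `total_breaks(example)` is 2 and `passes_basic_checks(example)` holds; passing these checks alone does not establish feasibility.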

Asian Symposium on Programming Languages and Systems | 2009

A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming

Shigeyuki Sato; Hideya Iwasaki

Although today's graphics processing units (GPUs) have high performance and general-purpose computing on GPUs (GPGPU) is actively studied, developing GPGPU applications remains difficult for two reasons. First, both parallelization and optimization of GPGPU applications are necessary to achieve high performance. Second, the suitability of the target application for GPGPU must be determined, because whether an application performs well with GPGPU heavily depends on its inherent properties, which are not obvious from the source code. To overcome these difficulties, we developed a skeletal parallel programming framework for rapid GPGPU application development. It enables programmers to easily write GPGPU applications and rapidly test them because it generates programs for both GPUs and CPUs from the same source code. It also provides an optimization mechanism based on fusion transformation. Its effectiveness was confirmed experimentally.

European Symposium on Programming | 2002

An Accumulative Parallel Skeleton for All

Zhenjiang Hu; Hideya Iwasaki; Masato Takeichi

Parallel skeletons are intended to encourage programmers to build a parallel program from ready-made components for which efficient implementations are known to exist, making the parallelization process simpler. However, it is easy neither to develop efficient parallel programs using skeletons nor to use skeletons to manipulate irregular data, and moreover a systematic way to optimize skeletal parallel programs is lacking. To remedy this situation, we propose a novel parallel skeleton, called accumulate, which not only efficiently describes data dependency in computation but also exhibits nice algebraic properties for manipulation. We show that this skeleton significantly eases skeletal parallel programming in practice, efficiently manipulating both regular and irregular data, and systematically optimizing skeletal parallel programs.

European Conference on Parallel Processing | 2004

A fusion-embedded skeleton library

Kiminori Matsuzaki; Kazuhiko Kakehi; Hideya Iwasaki; Zhenjiang Hu; Yoshiki Akashi

This paper presents a new framework for designing and implementing skeleton libraries, in which each skeleton should not only be efficiently implemented, as is usually done, but also be equipped with a structured interface to combine it efficiently with other skeletons. We illustrate our idea with a new skeleton library for parallel programming in C++. It is as simple and efficient to use as other C++ libraries. A distinctive feature of the library is its modularity: our optimization framework treats newly defined skeletons equally to existing ones if the interface is given. Our current experiments are encouraging, indicating that this approach is promising both theoretically and in practice.

Programming Language Design and Implementation | 2011

Automatic parallelization via matrix multiplication

Shigeyuki Sato; Hideya Iwasaki

Existing work on the parallelization of complicated reductions and scans focuses only on formalism and hardly deals with implementation. To bridge the gap between formalism and implementation, we have integrated parallelization via matrix multiplication into compiler construction. Our framework can deal with complicated loops that existing techniques in compilers cannot parallelize. Moreover, we have refined our framework by developing two sets of techniques. One enhances its capability for parallelization by extracting max-operators automatically, and the other improves the performance of parallelized programs by eliminating redundancy. We have also implemented our framework and techniques as a parallelizer in a compiler. Experiments on examples that existing compilers cannot parallelize have demonstrated the scalability of programs parallelized by our implementation.
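
The general idea behind parallelization via matrix multiplication can be sketched with a simple loop-carried recurrence. Each iteration of `x = a * x + b` is an affine update that can be written as a 2x2 matrix acting on the vector (x, 1). Because matrix multiplication is associative, the per-iteration matrices can be combined in any grouping, for example by a parallel tree reduction. This is a hand-written illustration over ordinary addition and multiplication, not the paper's compiler framework; the function names are hypothetical, and the reduction below is evaluated sequentially.

```python
from functools import reduce


def loop(coeffs, x0):
    # Sequential loop with a loop-carried dependence on x.
    x = x0
    for a, b in coeffs:
        x = a * x + b
    return x


def as_matrix(a, b):
    # The update x -> a*x + b as a matrix acting on the column vector (x, 1).
    return ((a, b), (0, 1))


def matmul(p, q):
    # Product of two 2x2 matrices stored as nested tuples.
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def loop_via_matrices(coeffs, x0):
    # Later iterations multiply on the left: M_n * ... * M_2 * M_1.
    identity = ((1, 0), (0, 1))
    matrices = (as_matrix(a, b) for a, b in coeffs)
    product = reduce(lambda acc, m: matmul(m, acc), matrices, identity)
    # Apply the combined matrix to (x0, 1) and take the first component.
    return product[0][0] * x0 + product[0][1]
```

The associativity of `matmul` is what would allow the `reduce` step to be replaced by a parallel reduction without changing the result.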
