Kiminori Matsuzaki | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Kiminori Matsuzaki is active.

Explore More

Publication

Featured researches published by Kiminori Matsuzaki.

scalable information systems | 2006

A library of constructive skeletons for sequential style of parallel programming

Kiminori Matsuzaki; Hideya Iwasaki; Kento Emoto; Zhenjiang Hu

With the increasing popularity of parallel programming environments such as PC clusters, more and more sequential programmers, with little knowledge about parallel architectures and parallel programming, are hoping to write parallel programs. Numerous attempts have been made to develop high-level parallel programming libraries that use abstraction to hide low-level concerns and reduce difficulties in parallel programming. Among them, libraries of parallel skeletons have emerged as a promising way towards this direction. Unfortunately, these libraries are not well accepted by sequential programmers, because of incomplete elimination of lower-level details, ad-hoc selection of library functions, unsatisfactory performance, or lack of convincing application examples. This paper addresses principle of designing skeleton libraries of parallel programming and reports implementation details and practical applications of a skeleton library SkeTo. The SkeTo library is unique in its feature that it has a solid theoretical foundation based on the theory of Constructive Algorithmics, and is practical to be used to describe various parallel computations in a sequential manner.

programming language design and implementation | 2007

Automatic inversion generates divide-and-conquer parallel programs

Kazutaka Morita; Akimasa Morihata; Kiminori Matsuzaki; Zhenjiang Hu; Masato Takeichi

Divide-and-conquer algorithms are suitable for modern parallel machines, tending to have large amounts of inherent parallelism and working well with caches and deep memory hierarchies. Among others, list homomorphisms are a class of recursive functions on lists, which match very well with the divide-and-conquer paradigm. However, direct programming with list homomorphisms is a challenge for many programmers. In this paper, we propose and implement a novel systemthat can automatically derive cost-optimal list homomorphisms from a pair of sequential programs, based on the third homomorphism theorem. Our idea is to reduce extraction of list homomorphisms to derivation of weak right inverses. We show that a weak right inverse always exists and can be automatically generated from a wide class of sequential programs. We demonstrate our system with several nontrivial examples, including the maximum prefix sum problem, the prefix sum computation, the maximum segment sum problem, and the line-of-sight problem. The experimental results show practical efficiency of our automatic parallelization algorithm and good speedups of the generated parallel programs.

symposium on principles of programming languages | 2009

The third homomorphism theorem on trees: downward & upward lead to divide-and-conquer

Akimasa Morihata; Kiminori Matsuzaki; Zhenjiang Hu; Masato Takeichi

Parallel programs on lists have been intensively studied. It is well known that associativity provides a good characterization for divide-and-conquer parallel programs. In particular, the third homomorphism theorem is not only useful for systematic development of parallel programs on lists, but it is also suitable for automatic parallelization. The theorem states that if two sequential programs iterate the same list leftward and rightward, respectively, and compute the same value, then there exists a divide-and-conquer parallel program that computes the same value as the sequential programs. While there have been many studies on lists, few have been done for characterizing and developing of parallel programs on trees. Naive divide-and-conquer programs, which divide a tree at the root and compute independent subtrees in parallel, take time that is proportional to the height of the input tree and have poor scalability with respect to the number of processors when the input tree is ill-balanced. In this paper, we develop a method for systematically constructing scalable divide-and-conquer parallel programs on trees, in which two sequential programs lead to a scalable divide-andconquer parallel program. We focus on paths instead of trees so as to utilize rich results on lists and demonstrate that associativity provides good characterization for scalable divide-and-conquer parallel programs on trees. Moreover, we generalize the third homomorphism theorem from lists to trees.We demonstrate the effectiveness of our method with various examples. Our results, being generalizations of known results for lists, are generic in the sense that they work well for all polynomial data structures.

european conference on parallel processing | 2004

A fusion-embedded skeleton library

Kiminori Matsuzaki; Kazuhiko Kakehi; Hideya Iwasaki; Zhenjiang Hu; Yoshiki Akashi

This paper addresses a new framework for designing and implementing skeleton libraries, in which each skeleton should not only be efficiently implemented as is usually done, but also be equipped with a structured interface to combine it efficiently with other skeletons. We illustrate our idea with a new skeleton library for parallel programming in C++. It is simple and efficient to use just like other C++ libraries. A distinctive feature of the library is its modularity: Our optimization framework treats newly defined skeletons equally to existing ones if the interface is given. Our current experiments are encouraging, indicating that this approach is promising both theoretically and in practice.

international conference on parallel processing | 2011

Towards systematic parallel programming over mapreduce

Yu Liu; Zhenjiang Hu; Kiminori Matsuzaki

MapReduce is a useful and popular programming model for data-intensive distributed parallel computing. But it is still a challenge to develop parallel programs with MapReduce systematically, since it is usually not easy to derive a proper divide-and-conquer algorithm that matches MapReduce. In this paper, we propose a homomorphism-based framework named Screwdriver for systematic parallel programming with MapReduce, making use of the program calculation theory of list homomorphisms. Screwdriver is implemented as a Java library on top of Hadoop. For any problem which can be resolved by two sequential functions that satisfy the requirements of the third homomorphism theorem, Screwdriver can automatically derive a parallel algorithm as a list homomorphism and transform the initial sequential programs to an efficient MapReduce program. Users need neither to care about parallelism nor to have deep knowledge of MapReduce. In addition to the simplicity of the programming model of our framework, such a calculational approach enables us to resolve many problems that it would be nontrivial to resolve directly with MapReduce.

acm symposium on parallel algorithms and architectures | 2006

Towards automatic parallelization of tree reductions in dynamic programming

Kiminori Matsuzaki; Zhenjiang Hu; Masato Takeichi

Tree contraction algorithms, whose idea was first proposed by Miller and Reif, are important parallel algorithms to implement efficient parallel programs manipulating trees. Despite their efficiency, the tree contraction algorithms have not been widely used due to the difficulties in deriving the tree contracting operations. In particular, the derivation of the tree contracting operations is much difficult when multiple values are referred and updated in each step of the contractions. Such computations often appear in dynamic programming problems on trees. In this paper, we propose an algebraic approach to deriving tree contraction programs from recursive tree programs, by focusing on the properties of commutative semirings. We formalize a new condition for implementing tree reductions with the tree contraction algorithms, and give a systematic derivation of the tree contracting operations. Based on it, we implemented a code generator for tree reductions, which has an optimization mechanism that can remove unnecessary computations in the derived parallel programs. As far as we are aware, this is the first step towards an automatic parallelization system for the development of efficient tree programs.

parallel computing | 2006

Parallel skeletons for manipulating general trees

Kiminori Matsuzaki; Zhenjiang Hu; Masato Takeichi

Trees are important datatypes that are often used in representing structured data such as XML. Though trees are widely used in sequential programming, it is hard to write efficient parallel programs manipulating trees, because of their irregular and ill-balanced structures. In this paper, we propose a solution based on the skeletal approach. We formalize a set of skeletons (abstracted computational patterns) for rose trees (general trees of arbitrary shapes) based on the theory of Constructive Algorithmics. Our skeletons for rose trees are extensions of those proposed for lists and binary trees. We show that we can implement the skeletons efficiently in parallel, by combining the parallel binary-tree skeletons for which efficient parallel implementations are already known. As far as we are aware, we are the first who have formalized and implemented a set of simple but expressive parallel skeletons for rose trees.

Parallel Processing Letters | 2005

SYSTEMATIC DERIVATION OF TREE CONTRACTION ALGORITHMS

Kiminori Matsuzaki; Zhenjiang Hu; Kazuhiko Kakehi; Masato Takeichi

While tree contraction algorithms play an important role in efficient tree computation in parallel, it is difficult to develop such algorithms due to the strict conditions imposed on contracting op...

european conference on parallel processing | 2003

Parallelization with tree skeletons

Kiminori Matsuzaki; Zhenjiang Hu; Masato Takeichi

Trees are useful data structures, but to design efficient parallel programs over trees is known to be more difficult than to do over lists. Although several important tree skeletons have been proposed to simplify parallel programming on trees, few studies have been reported on how to systematically use them in solving practical problems; it is neither clear how to make a good combination of skeletons to solve a given problem, nor obvious even how to find suitable operators used in a single skeleton. In this paper, we report our first attempt to resolve these problems, proposing two important transformations, the tree diffusion transformation and the tree context preservation transformation. The tree diffusion transformation allows one to use familiar recursive definitions to develop his parallel programs, while the tree context preservation transformation shows how to derive associative operators that are required when using tree skeletons. We illustrate our approach by deriving an efficient parallel program for solving a nontrivial problem called the party planning problem, the tree version of the famous maximum-weight-sum problem.

parallel and distributed computing: applications and technologies | 2010

Systematic Development of Correct Bulk Synchronous Parallel Programs

Louis Gesbert; Zhenjiang Hu; Frédéric Loulergue; Kiminori Matsuzaki; Julien Tesson

With the current generalisation of parallel architectures arises the concern of applying formal methods to parallelism. The complexity of parallel, compared to sequential, programs makes them more error-prone and difficult to verify. Bulk Synchronous Parallelism (BSP) is a model of computation which offers a high degree of abstraction like PRAM models but yet a realistic cost model based on a structured parallelism. We propose a framework for refining a sequential specification toward a functional BSP program, the whole process being done with the help of the Coq proof assistant. To do so we define BH, a new homomorphic skeleton, which captures the essence of BSP computation in an algorithmic level, and also serves as a bridge in mapping from high level specification to low level BSP parallel programs.

Explore More