Kento Emoto
University of Tokyo
Publication
Featured research published by Kento Emoto.
scalable information systems | 2006
Kiminori Matsuzaki; Hideya Iwasaki; Kento Emoto; Zhenjiang Hu
With the increasing popularity of parallel programming environments such as PC clusters, more and more sequential programmers, with little knowledge about parallel architectures and parallel programming, are hoping to write parallel programs. Numerous attempts have been made to develop high-level parallel programming libraries that use abstraction to hide low-level concerns and reduce difficulties in parallel programming. Among them, libraries of parallel skeletons have emerged as a promising step in this direction. Unfortunately, these libraries are not well accepted by sequential programmers, because of incomplete elimination of lower-level details, ad-hoc selection of library functions, unsatisfactory performance, or lack of convincing application examples. This paper addresses the principles of designing skeleton libraries for parallel programming and reports the implementation details and practical applications of the skeleton library SkeTo. The SkeTo library is unique in that it has a solid theoretical foundation based on the theory of Constructive Algorithmics and is practical enough to describe various parallel computations in a sequential manner.
implementation and application of functional languages | 2009
Kiminori Matsuzaki; Kento Emoto
Developing efficient parallel programs is more difficult and complicated than developing sequential ones. Skeletal parallelism is a promising methodology for easy parallel programming in which users develop parallel programs by composing ready-made components called parallel skeletons. We developed a parallel skeleton library SkeTo that provides parallel skeletons implemented in C++ and MPI for distributed-memory environments. In the new version of the library, the implementation of the parallel skeletons for lists is improved so that the skeletons are equipped with fusion optimization. The optimization mechanism is implemented based on the programming technique called expression templates. In this paper, we illustrate the improved design and implementation of parallel skeletons for lists in the SkeTo library.
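The fusion idea in this abstract can be illustrated with a minimal sequential sketch. It is a Python analogue based on deferred evaluation, not SkeTo's C++ expression-template implementation and not its API; the class `LazyList` and its methods are hypothetical names chosen for the illustration. Successive `map` calls only record a composed function, and the data is traversed once when `reduce` runs, so no intermediate list is built.

```python
class LazyList:
    """Hypothetical illustration: a list whose element-wise maps are deferred."""

    def __init__(self, data, fn=None):
        self._data = data   # underlying elements, never copied by map()
        self._fn = fn       # composition of all pending maps, or None

    def map(self, f):
        # Record the map instead of executing it: compose f with pending work.
        g = self._fn
        composed = f if g is None else (lambda x, f=f, g=g: f(g(x)))
        return LazyList(self._data, composed)

    def reduce(self, op, init):
        # Single traversal: apply the composed function and fold on the fly.
        acc = init
        for x in self._data:
            acc = op(acc, x if self._fn is None else self._fn(x))
        return acc


# map (+1), then map (square), then reduce (+), all in one pass over the data
fused_result = (
    LazyList([1, 2, 3])
    .map(lambda x: x + 1)
    .map(lambda x: x * x)
    .reduce(lambda a, b: a + b, 0)
)
```

In C++, expression templates achieve a similar effect at compile time by encoding the pending operations in types; the sketch above only mirrors the observable behavior at run time.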
european symposium on programming | 2012
Kento Emoto; Sebastian Fischer; Zhenjiang Hu
MapReduce, being inspired by the map and reduce primitives available in many functional languages, is the de facto standard for large scale data-intensive parallel programming. Although it has succeeded in popularizing the use of the two primitives for hiding the details of parallel computation, little effort has been made to emphasize the programming methodology behind it, which has been intensively studied in the functional programming and program calculation fields. We show that MapReduce can be equipped with a programming theory in calculational form. By integrating the generate-and-test programming paradigm and semirings for aggregation of results, we propose a novel parallel programming framework for MapReduce. The framework consists of two important calculation theorems: the shortcut fusion theorem of semiring homomorphisms bridges the gap between specifications and efficient implementations, and the filter-embedding theorem helps to develop parallel programs in a systematic and incremental way. We give nontrivial examples that demonstrate how to apply our framework.
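The role of semirings in bridging a specification and an efficient implementation can be sketched on a toy problem: the maximum sum over all sublists of a list. This is a sequential illustration in the spirit of the shortcut fusion theorem, not the framework from the paper; the names `Semiring`, `over_sublists`, `BAGS`, and `MAX_PLUS` are hypothetical. The generator is written once against an abstract semiring. Instantiated with bags of lists it enumerates every candidate (the specification); instantiated with the max-plus semiring it computes the answer directly in linear time.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any                          # identity of plus
    one: Any                           # identity of times
    plus: Callable[[Any, Any], Any]    # combines alternatives
    times: Callable[[Any, Any], Any]   # combines consecutive parts


def over_sublists(xs, sr, embed):
    """Aggregate over all sublists of xs: each element is skipped (one) or kept."""
    acc = sr.one
    for x in xs:
        acc = sr.times(acc, sr.plus(sr.one, embed(x)))
    return acc


# Bags of lists: plus is bag union, times is pairwise concatenation.
BAGS = Semiring(
    zero=[],
    one=[[]],
    plus=lambda a, b: a + b,
    times=lambda a, b: [p + q for p in a for q in b],
)

# Max-plus: plus is max, times is numeric addition.
MAX_PLUS = Semiring(zero=float("-inf"), one=0, plus=max, times=lambda a, b: a + b)

items = [3, -1, 4]
all_sublists = over_sublists(items, BAGS, lambda x: [[x]])   # 2^n candidates
spec = max(sum(s) for s in all_sublists)                     # generate, then aggregate
fused = over_sublists(items, MAX_PLUS, lambda x: x)          # no enumeration
```

Both `spec` and `fused` give the same result because taking the maximum of sums maps bag union to `max` and pairwise concatenation to `+`, that is, it is a semiring homomorphism from `BAGS` to `MAX_PLUS`.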
international conference on computational science | 2007
Kazuhiko Kakehi; Kiminori Matsuzaki; Kento Emoto
A new approach for fast parallel reductions on trees over distributed memory environments is proposed. By employing serialized trees as the data representation, our algorithm has a communication-efficient BSP implementation regardless of the shapes of inputs. The prototype implementation supports its real efficacy.
european conference on parallel processing | 2007
Kento Emoto; Kiminori Matsuzaki; Zhenjiang Hu; Masato Takeichi
Skeletal parallel programming enables us to develop parallel programs easily by composing ready-made components called skeletons. However, a simply-composed skeleton program often lacks efficiency due to overheads of intermediate data structures and communications. Many studies have focused on optimizations by fusing successive skeletons to eliminate the overheads. Existing fusion transformations, however, are too general to achieve adequate efficiency for some classes of problems. Thus, a specific fusion optimization is needed for a specific class. In this paper, we propose a strategy for domain-specific optimization of skeleton programs. In this strategy, one starts with a normal form that abstracts the programs of interest, then develops fusion rules that transform a skeleton program into the normal form, and finally builds an efficient parallel implementation of the normal form. We illustrate the strategy with a case study: optimization of skeleton programs involving neighbor elements, which are often seen in scientific computations.
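The overhead that such fusion targets can be seen in a small sequential sketch of a three-point average over neighbor elements. This is an illustration only, not the normal form or the fusion rules of the paper; the helper names (`shift_right`, `shift_left`, `zip_with3`, `smooth_composed`, `smooth_fused`) and the zero boundary value are assumptions. The composed version builds two shifted intermediate lists before combining them, while the fused version reads neighbors directly in a single loop.

```python
def shift_right(xs, fill):
    # Element i of the result is the left neighbor of xs[i].
    return [fill] + xs[:-1]


def shift_left(xs, fill):
    # Element i of the result is the right neighbor of xs[i].
    return xs[1:] + [fill]


def zip_with3(f, a, b, c):
    return [f(x, y, z) for x, y, z in zip(a, b, c)]


def smooth_composed(xs, boundary=0.0):
    """Skeleton-style composition: two intermediate lists are materialized."""
    left = shift_right(xs, boundary)
    right = shift_left(xs, boundary)
    return zip_with3(lambda l, x, r: (l + x + r) / 3.0, left, xs, right)


def smooth_fused(xs, boundary=0.0):
    """Hand-fused equivalent: one pass, no intermediate lists."""
    n = len(xs)
    out = []
    for i in range(n):
        l = xs[i - 1] if i > 0 else boundary
        r = xs[i + 1] if i + 1 < n else boundary
        out.append((l + xs[i] + r) / 3.0)
    return out


sample = [3.0, 6.0, 9.0]
composed_out = smooth_composed(sample)
fused_out = smooth_fused(sample)
```

In a distributed-memory setting each shift would also imply communication between processes, which is part of what a fused implementation can consolidate.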
european conference on parallel processing | 2010
Kento Emoto; Zhenjiang Hu; Kazuhiko Kakehi; Kiminori Matsuzaki; Masato Takeichi
A large number of studies have been conducted on parallel skeletons and optimization theorems over skeleton programs to resolve difficulties with parallel programming. However, two nontrivial tasks still remain unresolved when we need nested data structures: The first is composing skeletons to generate and consume them; and the second is applying optimization theorems to obtain efficient parallel programs. In this paper, we propose a novel library called Generators of Generators (GoG) library. It provides a set of primitives, GoGs, to produce nested data structures. A program developed with these GoGs is automatically optimized by the optimization mechanism in the library, so that its asymptotic complexity can be improved. We demonstrate its implementation on the Fortress language and report some experimental results.
International Journal of Parallel Programming | 2014
Kento Emoto; Kiminori Matsuzaki
Skeletal parallel programming is a promising approach to easy parallel programming in which users can build parallel programs simply by combining parts of a given set of ready-made parallel computation patterns called skeletons. There is a trade-off for this easiness in the form of an efficiency problem caused by the compositional style of the programming. One solution to this problem is fusion transformation that optimizes naively composed skeleton programs by eliminating redundant intermediate data structures. Several parallel skeleton libraries have automatic fusion mechanisms. However, there have been no automatic fusion mechanisms proposed for variable-length list (VLL) skeletons, even though such skeletons are useful for practical problems. The main difficulty is that previous fusion mechanisms are not applicable to VLL skeletons, and so the fusion cannot be completed. In this paper, we propose a novel fusion mechanism for VLL skeletons that can achieve both an easy programming interface and complete fusion. The proposed mechanism has been implemented in our skeleton library, SkeTo, by using the expression templates technique, and experimental results have shown that it is very effective.
international conference on conceptual structures | 2012
Kento Emoto; Hiroto Imachi
MapReduce, the de facto standard for large scale data-intensive applications, is a remarkable parallel programming model, allowing for easy parallelization of data intensive computations over many machines in a cloud. As huge tree data such as XML has achieved the status of the de facto standard for representing structured information, the situation calls for efficient MapReduce programs treating such a tree data structure in parallel. However, development of such MapReduce programs has remained a challenge. In this paper, restructuring our previous BSP algorithm for tree reduction computations, we propose a new MapReduce algorithm that can be used to implement various tree computations such as XPath queries. Our algorithm is designed to achieve linear speedup even for extreme inputs, and our experimental results show that our prototype implementation actually achieves linear speedup even for monadic trees.
Formal Aspects of Computing | 2012
Kento Emoto; Sebastian Fischer; Zhenjiang Hu
We show that MapReduce, the de facto standard for large scale data-intensive parallel programming, can be equipped with a programming theory in calculational form. By integrating the generate-and-test programming paradigm and semirings for aggregation of results, we propose a novel parallel programming framework for MapReduce. The framework consists of two important calculation theorems: the shortcut fusion theorem of semiring homomorphisms bridges the gap between specifications and efficient implementations, and the filter-embedding theorem helps to develop parallel programs in a systematic and incremental way.
Proceedings of the fourth international workshop on High-level parallel programming and applications | 2010
Kiminori Matsuzaki; Kento Emoto
Recent computing environments achieving high performance with parallelism call for a methodology for easy parallel programming, and skeletal parallel programming is such a methodology. There have been many studies on the development of parallel skeleton libraries and optimization for skeletal programs, but few studies have applied skeletal parallel programming to real applications. We implemented the BiCGStab method, which is widely used for solving systems of linear equations, with parallel skeletons provided in the parallel skeleton library SkeTo. First we implemented two skeletal programs, then applied optimization techniques, and finally obtained skeletal programs that are efficient compared with the original sequential program. Through the implementation, optimization, and experiments of the skeletal programs, we obtained several lessons for realizing efficient and easy-to-use skeleton libraries. In this paper, we report the development of skeletal programs for the BiCGStab method and summarize the lessons obtained through the process.