
Shigeyuki Sato

University of Electro-Communications

Publication

Featured research published by Shigeyuki Sato.

Asian Symposium on Programming Languages and Systems | 2009

A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming

Shigeyuki Sato; Hideya Iwasaki

Although today's graphics processing units (GPUs) have high performance and general-purpose computing on GPUs (GPGPU) is actively studied, developing GPGPU applications remains difficult for two reasons. First, both parallelization and optimization of GPGPU applications are necessary to achieve high performance. Second, the suitability of the target application for GPGPU must be determined, because whether an application performs well with GPGPU depends heavily on its inherent properties, which are not obvious from the source code. To overcome these difficulties, we developed a skeletal parallel programming framework for rapid GPGPU application development. It enables programmers to easily write GPGPU applications and rapidly test them, because it generates programs for both GPUs and CPUs from the same source code. It also provides an optimization mechanism based on fusion transformation. Its effectiveness was confirmed experimentally.
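
The fusion idea can be illustrated with a minimal Python sketch. It assumes a much simpler model than the framework above:

- A program is a list of hypothetical `Map` and `Reduce` stages.
- `fuse` merges adjacent `Map` stages into one, so `map(g)` after `map(f)` becomes a single `map` of their composition and the data is traversed once.
- `run_cpu` is a sequential stand-in for a CPU backend.

GPU code generation is not modeled.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, List


@dataclass
class Map:
    f: Callable[[Any], Any]


@dataclass
class Reduce:
    op: Callable[[Any, Any], Any]
    init: Any


def fuse(stages: List[Any]) -> List[Any]:
    """Fuse adjacent Map stages: map(g) after map(f) becomes map(lambda x: g(f(x)))."""
    fused: List[Any] = []
    for stage in stages:
        if isinstance(stage, Map) and fused and isinstance(fused[-1], Map):
            f, g = fused[-1].f, stage.f
            fused[-1] = Map(lambda x, f=f, g=g: g(f(x)))
        else:
            fused.append(stage)
    return fused


def run_cpu(stages: List[Any], data: Any) -> Any:
    """Sequential backend: one pass over the data per remaining stage."""
    for stage in stages:
        if isinstance(stage, Map):
            data = [stage.f(x) for x in data]
        else:
            data = reduce(stage.op, data, stage.init)
    return data


pipeline = [Map(lambda x: x + 1), Map(lambda x: x * x), Reduce(lambda a, b: a + b, 0)]
fused = fuse(pipeline)
print(len(pipeline), len(fused))                                 # 3 2
print(run_cpu(pipeline, [1, 2, 3]), run_cpu(fused, [1, 2, 3]))   # 29 29
```

Both pipelines compute the same result. The fused one makes one fewer pass over the data, which is the kind of saving a fusion optimizer targets.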

Programming Language Design and Implementation | 2011

Automatic parallelization via matrix multiplication

Shigeyuki Sato; Hideya Iwasaki

Existing work on the parallelization of complicated reductions and scans focuses only on formalism and hardly deals with implementation. To bridge the gap between formalism and implementation, we have integrated parallelization via matrix multiplication into compiler construction. Our framework can deal with complicated loops that existing techniques in compilers cannot parallelize. Moreover, we have refined our framework by developing two sets of techniques. One enhances its capability for parallelization by extracting max-operators automatically, and the other improves the performance of parallelized programs by eliminating redundancy. We have also implemented our framework and techniques as a parallelizer in a compiler. Experiments on examples that existing compilers cannot parallelize have demonstrated the scalability of programs parallelized by our implementation.
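
The underlying idea can be sketched in Python. Each loop iteration is encoded as a small matrix over a semiring. Because matrix multiplication is associative, the per-iteration matrices can be combined by divide and conquer instead of strictly in loop order.

The two loops and the helpers `matmul`, `tree_product`, and `apply_vec` are hypothetical illustrations, not the authors' compiler. The second loop shows how a `max` in the loop body becomes the addition operation of the (max, +) semiring.

```python
import math
from functools import reduce


def matmul(A, B, add, mul):
    """Matrix product over the semiring given by (add, mul)."""
    return [[reduce(add, (mul(A[i][k], B[k][j]) for k in range(len(B))))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def tree_product(ms, add, mul):
    """Compose per-iteration matrices by divide and conquer.
    The two recursive calls are independent, so they could run in parallel."""
    if len(ms) == 1:
        return ms[0]
    mid = len(ms) // 2
    left = tree_product(ms[:mid], add, mul)    # earlier iterations
    right = tree_product(ms[mid:], add, mul)   # later iterations
    return matmul(right, left, add, mul)       # later iterations are applied last


def apply_vec(M, v, add, mul):
    """Apply matrix M to the column vector v."""
    return [row[0] for row in matmul(M, [[x] for x in v], add, mul)]


plus = lambda p, q: p + q
times = lambda p, q: p * q

# Loop 1: x = a[i] * x + b[i], one matrix [[a, b], [0, 1]] per iteration over (+, *).
a, b = [2, 3, 1, 4], [1, 0, 5, 2]
x0 = 1
x = x0
for ai, bi in zip(a, b):
    x = ai * x + bi
M = tree_product([[[ai, bi], [0, 1]] for ai, bi in zip(a, b)], plus, times)
print(x, apply_vec(M, [x0, 1], plus, times)[0])  # 58 58

# Loop 2: m = max(m + c[i], 0), one matrix [[c, 0], [-inf, 0]] per iteration over (max, +).
c = [3, -5, 2, 4]
m0 = 0
m = m0
for ci in c:
    m = max(m + ci, 0)
NEG_INF = -math.inf
N = tree_product([[[ci, 0], [NEG_INF, 0]] for ci in c], max, plus)
print(m, apply_vec(N, [m0, 0], max, plus)[0])  # 6 6
```
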

International Journal of Parallel Programming | 2016

A Generic Implementation of Tree Skeletons

Shigeyuki Sato; Kiminori Matsuzaki

In data-parallel skeleton libraries, the implementation of skeletons is usually tightly coupled with that of data structures. However, loose coupling between them, as in the C++ STL, would improve the modularity and flexibility of skeletons and data structures. This flexibility is particularly valuable for tree skeletons. To achieve such loose coupling, we present an iterator-based interface of trees for tree skeletons. We have implemented tree skeletons on the basis of our interface, and we present their design and implementation. This paper also reports the results of preliminary experiments.
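
Here is a minimal Python sketch of the decoupling. It assumes a hypothetical two-method interface (`value` and `children`) in place of the C++ iterator-based interface described above. Because `tree_reduce` is written only against that interface, the same skeleton runs unchanged on a nested-tuple tree and on a flat parent-array tree.

```python
from typing import Any, Callable, Iterable, List, Protocol, Sequence


class TreeView(Protocol):
    """Hypothetical interface: the only way a skeleton may inspect a tree."""
    def value(self, node: Any) -> Any: ...
    def children(self, node: Any) -> Iterable[Any]: ...


def tree_reduce(view: TreeView, root: Any, k: Callable[[Any, List[Any]], Any]) -> Any:
    """Bottom-up reduction written only against TreeView.
    Sibling subtrees are independent, so a parallel version could evaluate them concurrently."""
    return k(view.value(root), [tree_reduce(view, c, k) for c in view.children(root)])


class NestedTuples:
    """Tree stored as (value, [subtrees])."""
    def value(self, node): return node[0]
    def children(self, node): return node[1]


class ParentArray:
    """Tree stored as flat arrays: values[i] and parent[i], with -1 for the root."""
    def __init__(self, values: Sequence[Any], parent: Sequence[int]):
        self.values = values
        self.kids: List[List[int]] = [[] for _ in values]
        for i, p in enumerate(parent):
            if p >= 0:
                self.kids[p].append(i)

    def value(self, node): return self.values[node]
    def children(self, node): return self.kids[node]


tree_sum = lambda v, rs: v + sum(rs)
nested = (1, [(2, [(4, [])]), (3, [])])
flat = ParentArray([1, 2, 3, 4], [-1, 0, 0, 1])
print(tree_reduce(NestedTuples(), nested, tree_sum), tree_reduce(flat, 0, tree_sum))  # 10 10
```
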

arXiv: Databases | 2018

Parallelization of XPath Queries Using Modern XQuery Processors

Shigeyuki Sato; Wei Hao; Kiminori Matsuzaki

A practical and promising approach to parallelizing XPath queries was proposed by Bordawekar et al. in 2009; it enables parallelization on top of existing XML database engines. Although they experimentally demonstrated the speedup of their approach, their practice is now out of date, because the software environment has changed substantially with the capability of XQuery processing. In this work, we implement their approach in two ways on top of a state-of-the-art XML database engine and experimentally demonstrate that our implementations can bring significant speedup on a commodity server.
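
One prefix/suffix data-partitioning scheme can be sketched as follows. Python's `xml.etree.ElementTree` stands in for an XML database engine; this is an assumption, since the work above runs on an XQuery-capable engine.

The query `./people/person/name` is split into two parts:

- a prefix, `./people/person`, evaluated once;
- a suffix, `./name`, evaluated independently on each partition of the prefix results.

Concatenating the partition results in order preserves document order. Because of CPython's global interpreter lock, this shows the structure of the approach rather than a real speedup.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import List

DOC = """<site><people>
  <person><name>Ann</name></person>
  <person><name>Bob</name></person>
  <person><name>Cy</name></person>
  <person><name>Di</name></person>
</people></site>"""


def parallel_query(root: ET.Element, prefix: str, suffix: str, workers: int = 2) -> List[str]:
    """Evaluate prefix once, then evaluate suffix on each partition of the context nodes."""
    contexts = root.findall(prefix)
    size = max(1, -(-len(contexts) // workers))          # ceiling division
    parts = [contexts[i:i + size] for i in range(0, len(contexts), size)]

    def eval_part(part: List[ET.Element]) -> List[str]:
        return [hit.text for ctx in part for hit in ctx.findall(suffix)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(eval_part, parts))        # map keeps partition order
    return [text for part in results for text in part]


root = ET.fromstring(DOC)
serial = [e.text for e in root.findall("./people/person/name")]
parallel = parallel_query(root, "./people/person", "./name")
print(parallel)            # ['Ann', 'Bob', 'Cy', 'Di']
print(parallel == serial)  # True
```
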

New Generation Computing | 2018

On Implementing the Push-Relabel Algorithm on Top of Pregel

Shigeyuki Sato

Although the vertex-centric model originating from Pregel has been well studied, its applications are limited and biased. While PageRank, which can be formalized as a network flow problem, is one of the most common (or overused) applications, the maximum flow problem, which is the most fundamental problem on flow networks, has never been dealt with. In this work, to analyze the applicability of the vertex-centric model, we have implemented the push-relabel algorithm, which efficiently solves the maximum flow problem, on top of Pregel. This paper presents our implementation, including important heuristics, analyzes the applicability of the vertex-centric model, and reports experimental results on our implementation.
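
For reference, the sequential push-relabel method can be sketched in Python. This is a plain FIFO variant on a dense residual-capacity matrix. It does not model Pregel's supersteps or message passing, and it omits the heuristics the paper discusses. The function name `max_flow` and the `(u, v, capacity)` edge format are assumptions for illustration.

```python
from collections import deque
from typing import List, Tuple


def max_flow(n: int, edges: List[Tuple[int, int, int]], s: int, t: int) -> int:
    """FIFO push-relabel on vertices 0..n-1 with (u, v, capacity) edges."""
    cap = [[0] * n for _ in range(n)]  # residual capacities (dense, for brevity)
    for u, v, c in edges:
        cap[u][v] += c
    height = [0] * n
    excess = [0] * n
    height[s] = n
    active = deque()

    # Preflow: saturate every edge leaving the source.
    for v in range(n):
        f = cap[s][v]
        if v != s and f > 0:
            cap[s][v] -= f
            cap[v][s] += f
            excess[v] += f
            excess[s] -= f
            if v != t:
                active.append(v)

    while active:
        u = active.popleft()
        # Discharge u: push along admissible edges; relabel when none remain.
        while excess[u] > 0:
            pushed = False
            for v in range(n):
                if cap[u][v] > 0 and height[u] == height[v] + 1:
                    f = min(excess[u], cap[u][v])
                    cap[u][v] -= f
                    cap[v][u] += f
                    if excess[v] == 0 and v not in (s, t):
                        active.append(v)  # v just became active
                    excess[u] -= f
                    excess[v] += f
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest residual neighbour.
                height[u] = 1 + min(height[v] for v in range(n) if cap[u][v] > 0)
    return excess[t]


print(max_flow(4, [(0, 1, 3), (0, 2, 2), (1, 2, 1), (1, 3, 2), (2, 3, 3)], 0, 3))  # 5
```
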

Journal of Information Processing | 2017

A Generator of Hadoop MapReduce Programs that Manipulate One-dimensional Arrays

Reina Miyazaki; Kiminori Matsuzaki; Shigeyuki Sato

MapReduce is a framework for large-scale data processing proposed by Google, and its open-source implementation, Hadoop MapReduce, is now widely used. Several language systems, for instance, Sawzall, FlumeJava, Pig, Hive, and Crunch, have been proposed to make developing MapReduce programs easier. These language systems mainly target applications that can be naturally solved by using a MapReduce-like programming model. In this study, we propose a new MapReduce program generator that accepts programs manipulating one-dimensional arrays. With the proposed generator, users only need to write sequential programs, from which Hadoop MapReduce programs are generated automatically. We applied several program optimization techniques when generating Hadoop MapReduce programs. In this paper, we also report experimental results comparing programs generated by the proposed generator with hand-written MapReduce programs.
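
To show how an index-based array loop can map onto MapReduce's key-value model, here is a small Python simulation. It is not Hadoop code and not the generator's actual output.

The example is a hypothetical 3-point stencil `b[i] = a[i-1] + a[i] + a[i+1]`, with zero outside the array. It is computed twice:

- once sequentially;
- once as map, shuffle, and reduce phases, where each element is emitted under the indices of the outputs it contributes to.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sequential(a: List[int]) -> List[int]:
    """The kind of loop a user might write: a 3-point stencil with zero padding."""
    n = len(a)
    return [(a[i - 1] if i > 0 else 0) + a[i] + (a[i + 1] if i < n - 1 else 0)
            for i in range(n)]


def mapper(i: int, x: int, n: int) -> Iterable[Tuple[int, int]]:
    """Each (index, value) record contributes to its own and its neighbours' outputs."""
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            yield j, x


def reducer(j: int, xs: List[int]) -> Tuple[int, int]:
    return j, sum(xs)


def run_mapreduce(a: List[int]) -> List[int]:
    n = len(a)
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, x in enumerate(a):                       # map phase
        for key, value in mapper(i, x, n):
            groups[key].append(value)               # shuffle: group values by key
    out = dict(reducer(j, xs) for j, xs in groups.items())  # reduce phase
    return [out[j] for j in range(n)]


a = [3, 1, 4, 1, 5]
print(sequential(a))     # [4, 8, 6, 10, 6]
print(run_mapreduce(a))  # [4, 8, 6, 10, 6]
```
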

Asian Symposium on Programming Languages and Systems | 2016

A Debugger-Cooperative Higher-Order Contract System in Python

Ryoya Arai; Shigeyuki Sato; Hideya Iwasaki

Contract programming is one of the most promising ways of enhancing the reliability of Python programs, which is increasingly desired. Higher-order contract systems that support fully specifying the behaviors of iterators and functions are desirable for Python but have not been presented yet. Moreover, even with them, debugging with contracts in Python would still be burdensome because of delayed contract checking. To resolve this problem, we present PyBlame, a higher-order contract system in Python, and ccdb, a source-level debugger equipped with features dedicated to debugging with delayed contract checking. PyBlame and ccdb are designed on the basis of standard Python and are thus friendly to many Python programmers. We have experimentally confirmed the advantages and efficacy of PyBlame and ccdb through the web framework Bottle.
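
Delayed checking can be seen in a small Python sketch of higher-order contracts with blame. The decorator `func_contract`, the iterator contract `each`, and `ContractViolation` are hypothetical and are not PyBlame's API. The two kinds of failure behave differently:

- A bad argument blames the caller immediately.
- A bad element of a returned iterator blames the function, but only when that element is eventually consumed.

```python
import functools
from typing import Any, Callable, Iterator


class ContractViolation(Exception):
    """Raised when a contract fails; `blamed` names the party at fault."""
    def __init__(self, blamed: str, detail: str):
        super().__init__(f"blame {blamed}: {detail}")
        self.blamed = blamed


def each(elem_ok: Callable[[Any], bool]):
    """Iterator contract: wraps a result so each element is checked lazily, when consumed."""
    def wrap_result(it: Iterator[Any], callee: str) -> Iterator[Any]:
        def checked() -> Iterator[Any]:
            for i, x in enumerate(it):
                if not elem_ok(x):
                    raise ContractViolation(callee, f"element {i} = {x!r}")
                yield x
        return checked()
    return wrap_result


def func_contract(pre: Callable[..., bool], wrap_result):
    """Function contract: bad arguments blame the caller; results go through wrap_result."""
    def decorate(f):
        @functools.wraps(f)
        def checked(*args):
            if not pre(*args):
                raise ContractViolation("caller", f"bad arguments to {f.__name__}: {args!r}")
            return wrap_result(f(*args), f.__name__)
        return checked
    return decorate


@func_contract(pre=lambda n: n >= 0, wrap_result=each(lambda x: x >= 0))
def countdown(n):
    return iter(range(n, -2, -1))    # deliberate bug: the last element is -1


it = countdown(2)                    # no violation yet: checking is delayed
print(next(it), next(it), next(it))  # 2 1 0
try:
    next(it)
except ContractViolation as e:
    print(e)                         # blame countdown: element 3 = -1
try:
    countdown(-1)
except ContractViolation as e:
    print(e.blamed)                  # caller
```

The violation surfaces at the fourth `next` call, far from the call to `countdown` that produced the faulty iterator. That gap between where a violation is detected and where the bad value originated is what delayed contract checking creates.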

International Conference on Parallel Processing | 2015

Efficient Use of Hardware Transactional Memory for Parallel Mesh Generation

Tetsu Kobayashi; Shigeyuki Sato; Hideya Iwasaki

Efficient transactional execution is desirable for parallel implementations of algorithms with graph refinements. Hardware transactional memory (HTM) is promising for easy yet efficient transactional execution. Long HTM transactions, however, abort with high probability because of hardware limitations. Unfortunately, Delaunay mesh refinement (DMR), an algorithm with graph refinements for mesh generation, causes long transactions. A parallel implementation of DMR naively based on HTM therefore performs poorly. To utilize HTM efficiently for parallel implementation of DMR, we present an approach to shortening transactions. Our HTM-based implementations of DMR achieved significantly higher throughput and better scalability than a naive HTM-based one and lock-based ones. On a quad-core Haswell processor, the absolute speedup of one of our implementations was up to 2.64 with 16 threads.

Asian Symposium on Programming Languages and Systems | 2014

Syntax-Directed Divide-and-Conquer Data-Flow Analysis

Shigeyuki Sato; Akimasa Morihata

Link-time optimization, with which GCC and LLVM are equipped, generally deals with large-scale procedures because of aggressive procedure inlining. Data-flow analysis (DFA), an essential computation for compiler optimization, therefore needs to handle large-scale procedures. One promising approach to the DFA of large-scale procedures is divide-and-conquer parallelization. However, DFA on control-flow graphs is difficult to divide and conquer. If we perform DFA on abstract syntax trees (ASTs) in a syntax-directed manner, dividing and conquering DFA becomes straightforward, owing to the recursive structure of ASTs, but then nonstructural control flow such as goto/label becomes a problem. To resolve this, we have developed a novel syntax-directed method of DFA on ASTs that can deal with goto/label and is ready for divide-and-conquer parallelization. We tested the feasibility of our method experimentally through prototype implementations and observed that our prototype achieved a significant speedup.
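
Why syntax-directed DFA divides and conquers easily can be sketched in Python for reaching definitions on a toy AST. Each statement is summarized by a (gen, kill) pair. The summary of a sequence is the composition of its children's summaries. Because that composition is associative, a long sequence can be split anywhere and the halves summarized independently.

This is a simplification under stated assumptions:

- The AST classes are hypothetical.
- Loop conditions define no variables.
- goto/label, the hard case the paper addresses, is not handled.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, FrozenSet, List, Set, Tuple, Union

Summary = Tuple[FrozenSet[str], FrozenSet[str]]  # (gen, kill) sets of definition labels


@dataclass
class Assign:
    label: str  # definition label, e.g. "d1"
    var: str


@dataclass
class Seq:
    body: List["Stmt"]


@dataclass
class If:
    then: "Stmt"
    orelse: "Stmt"


@dataclass
class While:
    body: "Stmt"


Stmt = Union[Assign, Seq, If, While]
EMPTY: Summary = (frozenset(), frozenset())


def collect(s: Stmt, defs: Dict[str, Set[str]]) -> None:
    """Record every definition label of every variable."""
    if isinstance(s, Assign):
        defs.setdefault(s.var, set()).add(s.label)
    elif isinstance(s, Seq):
        for child in s.body:
            collect(child, defs)
    elif isinstance(s, If):
        collect(s.then, defs)
        collect(s.orelse, defs)
    else:
        collect(s.body, defs)


def compose(first: Summary, second: Summary) -> Summary:
    """Summary of `first; second`. Associative, so a sequence can be split anywhere."""
    g1, k1 = first
    g2, k2 = second
    return g2 | (g1 - k2), k1 | k2


def summarize(s: Stmt, defs: Dict[str, Set[str]]) -> Summary:
    if isinstance(s, Assign):
        return frozenset({s.label}), frozenset(defs[s.var] - {s.label})
    if isinstance(s, Seq):
        # Children are summarized independently; they could be processed in parallel.
        return reduce(compose, [summarize(c, defs) for c in s.body], EMPTY)
    if isinstance(s, If):
        (g1, k1), (g2, k2) = summarize(s.then, defs), summarize(s.orelse, defs)
        return g1 | g2, k1 & k2        # killed only if killed on both branches
    gen, _ = summarize(s.body, defs)   # While: the body runs zero or more times
    return gen, frozenset()


prog = Seq([Assign("d1", "x"), Assign("d2", "y"),
            If(Assign("d3", "x"), Assign("d4", "y")),
            While(Assign("d5", "x")), Assign("d6", "x")])
defs: Dict[str, Set[str]] = {}
collect(prog, defs)
gen, kill = summarize(prog, defs)
print(sorted(gen))  # ['d2', 'd4', 'd6']
```

Applying a summary to the set of definitions reaching the entry, `gen | (entry - kill)`, gives the definitions reaching the exit. With an empty entry set, that is just `gen`.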

SIGPLAN Notices | 2015