Masahiro Yasugi
Kyoto University
Publications
Featured research published by Masahiro Yasugi.
international conference on supercomputing | 1992
Masahiro Yasugi; Satoshi Matsuoka; Akinori Yonezawa
The trend towards object-oriented software construction is becoming more and more prevalent, and parallel programming cannot be an exception. In the context of parallel computation, it is often natural to model the computation as message passing between autonomous, concurrently active objects. The problem was, as some previous studies had indicated, that the overhead from message reception to dynamic method dispatch consumes a significant amount of execution time (e.g., as much as 4,000 machine cycles, or 500 μseconds at an 8 MHz clock, for some language/hardware combinations). Our ABCL/onEM-4, a software/hardware implementation architecture for a concurrent object-oriented language, overcomes this problem with techniques such as an address-specifiable reactive packet-driven architecture, zero-overhead context switching, and packet-driven allocation of message boxes. Preliminary performance measurements on real EM-4 hardware confirm our claim, achieving a total of nearly 10 μseconds (130 clocks) for a remote object creation followed by a request message send to the created object and a reply reception from the object, at a 12.5 MHz clock speed. Our results indicate that the concurrent object-oriented computational model and languages are highly viable with proper implementational software/hardware architectures.
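The computational model the paper targets can be caricatured in a few lines of Python (a sketch only; names like `ConcurrentObject` and `msgbox` are ours, and the dispatch loop shown here is exactly the software step whose cost ABCL/onEM-4 moves into hardware):

```python
from collections import deque

class ConcurrentObject:
    """An autonomous object that receives messages via a message box."""
    def __init__(self):
        self.msgbox = deque()            # packet-driven message box

    def send(self, method, *args):
        self.msgbox.append((method, args))

    def run(self):
        # From message reception to dynamic method dispatch: the
        # overhead path the paper's hardware support eliminates.
        while self.msgbox:
            method, args = self.msgbox.popleft()
            getattr(self, method)(*args)

class Counter(ConcurrentObject):
    def __init__(self):
        super().__init__()
        self.n = 0
    def add(self, k):
        self.n += k

c = Counter()
c.send("add", 2)
c.send("add", 3)
c.run()                                  # drain the message box
```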
compiler construction | 2006
Masahiro Yasugi; Tasuku Hiraishi; Taiichi Yuasa
We propose a new language concept called “L-closures” for a running program to legitimately inspect/modify the contents of its execution stack. L-closures are lightweight lexical closures created by evaluating nested function definitions. A lexical closure can access the lexically-scoped variables in the creation-time environment, and indirect calls to it provide legitimate stack access. By using an intermediate language extended with L-closures in high-level compilers, high-level services such as garbage collection, checkpointing, multithreading and load balancing can be implemented elegantly and efficiently. Each variable accessed by an L-closure uses private and shared locations, so that the private location has a chance to be allocated to a register. Operations to keep coherency with shared locations, as well as operations to initialize L-closures, are delayed until an L-closure is actually invoked. Because most high-level services create L-closures very frequently but call them infrequently (e.g., to scan roots in garbage collection), the total overhead can be reduced significantly. Since the GNU C compiler provides nested functions, we enhanced GCC at relatively low implementation costs. The results of performance measurements exhibit quite low costs of creating and maintaining L-closures.
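A rough Python analogue of the idea (illustrative only; `scanners` and `sum_list` are our names, and the real mechanism is a C intermediate-language extension): each active frame registers a cheap nested closure that, only when actually invoked, exposes that frame's lexically scoped variables, e.g., so a collector can scan roots:

```python
scanners = []  # one lightweight "L-closure" per active frame

def sum_list(node, acc):
    cell = {"node": node, "acc": acc}   # boxed so the closure sees updates
    def scan():                         # cheap to create, rarely called
        return list(cell.values())      # legitimate access to this frame
    scanners.append(scan)
    try:
        while cell["node"] is not None:
            cell["acc"] += cell["node"][0]
            cell["node"] = cell["node"][1]
        return cell["acc"]
    finally:
        scanners.pop()                  # frame exits: unregister
```

Creation is just a closure allocation; the cost of exposing the variables is paid only if `scan` is ever called, mirroring the paper's point that services create such closures frequently but invoke them rarely.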
ieee international conference on high performance computing data and analytics | 2003
Seiji Umatani; Masahiro Yasugi; Tsuneyasu Komiya; Taiichi Yuasa
Modern multithreaded languages are expected to support advanced features such as thread identification for Java-style locks and dynamically-scoped synchronization coupled with exception handling. However, supporting these features has been considered to degrade the effectiveness of existing efficient implementation techniques for fine-grained fork/join multithreaded languages, e.g., lazy task creation. This paper proposes efficient implementation techniques for an extended Java language, OPA, with the above advanced features. Our portable implementation in C achieves good performance by pursuing ‘laziness’ not only for task creation but also for stealable continuation creation, thread ID allocation, and synchronizer creation.
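Lazy task creation, the technique the paper extends, can be sketched in Python (a deliberately minimal model with a single worker and no thief; the real OPA runtime manipulates C-level continuations): a fork merely pushes a stealable record onto a deque and keeps executing sequentially, so an unstolen fork costs only a push and a pop.

```python
from collections import deque

deq = deque()            # stealable tasks; a thief would pop the other end

def spawn(thunk):
    deq.append(thunk)    # lazily record a potential task

def sync():
    # No thief in this sketch: reclaim our own task and run it inline.
    thunk = deq.pop()
    return thunk()

def fib(n):
    if n < 2:
        return n
    spawn(lambda: fib(n - 1))   # potentially parallel call
    b = fib(n - 2)              # continue sequentially meanwhile
    return sync() + b
```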
international symposium on object component service oriented real time distributed computing | 2005
Hideaki Saiki; Yoshiharu Konaka; Tsuneyasu Komiya; Masahiro Yasugi; Taiichi Yuasa
Runtime systems for applications written in certain languages such as Java™ and Lisp usually have a garbage collection (GC) feature to make efficient use of memory. With this feature, the system, before running out of memory space, automatically reclaims unnecessary fragments of memory in one burst and recycles them for use by applications. In conventional implementations, however, the system has to suspend the execution of applications while carrying out GC, which renders GC generally unsuitable for real-time processing systems. As a solution to this problem, Yuasa et al. of Kyoto University developed the return-barrier method based on snapshot GC. The return barrier enhances real-time GC processing over the original snapshot GC. Some reports have been published on the evaluation of the return-barrier method in Lisp environments such as Kyoto Common Lisp (KCL). In this paper, we report the implementation of the return-barrier method for JeRTy™VM, a real-time environment for embedded Java applications developed by OMRON Corporation, and its performance improvement over the conventional GC in JeRTyVM.
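The snapshot GC underlying the return-barrier method rests on a write barrier, sketched here in Python (an illustrative toy, not the JeRTyVM implementation; `Obj` and `marked` are our names): once a collection cycle starts, any pointer the mutator overwrites is first recorded, so every object live at snapshot time survives that cycle.

```python
gc_running = False
marked = set()           # snapshot-time referents saved by the barrier

class Obj:
    def __init__(self, name, ref=None):
        self.name = name
        self._ref = ref

    @property
    def ref(self):
        return self._ref

    @ref.setter
    def ref(self, new):
        # Write barrier: before overwriting, remember the old referent.
        if gc_running and self._ref is not None:
            marked.add(self._ref.name)
        self._ref = new

b = Obj("b")
a = Obj("a", b)
gc_running = True        # a collection cycle begins (the snapshot)
a.ref = None             # barrier records "b", live at snapshot time
```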
Journal of Information Processing | 2012
Masahiro Yasugi; Tasuku Hiraishi; Seiji Umatani; Taiichi Yuasa
Parallel programming/execution frameworks for many/multi-core platforms should support as many applications as possible. In general, work-stealing frameworks provide efficient load balancing even for irregular parallel applications. Unfortunately, naive parallel programs which traverse graph-based data structures (e.g., for constructing spanning trees) cause stack overflow or unacceptable load imbalance. In this study, we develop parallel programs to perform probabilistically balanced divide-and-conquer graph traversals. We propose a programming technique for accumulating overflowed calls for the next iteration of repeated parallel stages. In an emerging backtracking-based work-stealing framework called “Tascell,” which features on-demand concurrency, we propose a programming technique for long-term exclusive use of workspaces, leading to a similar technique in the Cilk framework as well.
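The overflow-accumulation idea can be sketched sequentially in Python (our simplification, with an artificially tiny recursion budget; the paper's version runs the stages in parallel under work stealing): calls that would exceed the stack budget are queued and seed the next iteration of the stage.

```python
LIMIT = 3  # artificially small recursion budget for illustration

def traverse(graph, roots):
    visited, pending = set(), list(roots)

    def visit(v, depth):
        if v in visited:
            return
        if depth >= LIMIT:
            pending.append(v)        # would overflow: defer to next stage
            return
        visited.add(v)
        for w in graph.get(v, ()):
            visit(w, depth + 1)

    while pending:                   # repeated parallel stages (sequential here)
        batch, pending = pending, []
        for v in batch:
            visit(v, 0)
    return visited
```

On a 10-node chain with budget 3, the traversal completes in four stages instead of overflowing a depth-10 recursion.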
international conference on parallel architectures and compilation techniques | 1998
Masahiro Yasugi; Shigeyuki Eguchi; Kazuo Taki
Dynamic method replacement is a new technique for eliminating bottlenecks (e.g., around the root of a tree structure) using adaptive objects for concurrent accesses. The technique reduces the frequency of mutual exclusion and remote message passing by dynamically increasing the number of read-only methods and the immutable part of objects. The results of performance measurements on both shared-memory and distributed-memory parallel architectures indicate the effectiveness of our approach to bottleneck elimination.
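A minimal Python sketch of the adaptive-object idea (names such as `AdaptiveCell` and `freeze` are ours, not the paper's): an object starts with a locked read method and, once its state becomes immutable, replaces it at run time with a lock-free one, removing mutual exclusion from the hot path.

```python
import threading

class AdaptiveCell:
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()
        self.read = self._locked_read        # current method binding

    def _locked_read(self):
        with self._lock:                     # mutual exclusion on reads
            return self._value

    def _lockfree_read(self):
        return self._value                   # safe once the cell is frozen

    def freeze(self):
        with self._lock:
            # Dynamic method replacement: reads are now read-only.
            self.read = self._lockfree_read
```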
international symposium on object component service oriented real time distributed computing | 2008
Tomoharu Ugawa; Masahiro Yasugi; Taiichi Yuasa
We propose an incremental compaction algorithm. Our compactor selects a continuous area of the heap and evacuates it by incrementally copying all objects in the area to the rest of the heap. After all objects have been copied, our compactor incrementally updates pointers pointing into the evacuated area. During these processes, each original object and its copy are kept consistent. We implemented the compactor together with a snapshot garbage collector in the KVM. Our measurements show that (1) the largest free chunk is almost always more than 20% as large as the entire heap when the heap is twice as large as the maximum amount of live objects, (2) the runtime overhead is less than 20%, and (3) the maximum pause time caused by the compactor is comparable to that caused by the snapshot collector.
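A toy model of the compactor's two phases in Python (our simplification; the real compactor works incrementally inside the KVM heap): evacuate an area by copying its objects out, then update every pointer into the area; here the "original" and the "copy" stay trivially consistent because they alias the same object.

```python
heap = {}        # address -> object (a dict of fields)
forward = {}     # evacuated address -> new address

def evacuate(area, next_free):
    # Phase 1: copy all objects out of the evacuation area.
    for addr in area:
        heap[next_free] = heap[addr]         # alias keeps both consistent
        forward[addr] = next_free
        next_free += 1
    # Phase 2: update pointers that point into the evacuated area.
    for addr, obj in heap.items():
        if addr in forward:
            continue                         # the stale original
        for field, val in obj.items():
            if val in forward:
                obj[field] = forward[val]

heap[0] = {"p": 1}       # object at 0 points into the area
heap[1] = {"p": None}    # object in the evacuation area
evacuate([1], 10)
```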
asian symposium on programming languages and systems | 2003
Tomoharu Ugawa; Nobuhisa Minagawa; Tsuneyasu Komiya; Masahiro Yasugi; Taiichi Yuasa
In order to capture first-class continuations, most stack-based implementations copy the contents of the stack to the heap. While various implementation strategies for copying have been proposed, many implementations employ the stack strategy. With this strategy, the entire stack contents are copied to the heap whenever a continuation is captured. This simple strategy is easy to implement and can be used for implementations with a foreign language interface. However, it requires a lot of time and memory for the creation and invocation of continuations. We propose a lazy stack copying technique. The contents of the stack to be copied are preserved on the stack until the function that captured the continuation returns, so we delay stack copying until that return. We can avoid stack copying entirely if the continuation is detected to have become garbage before the function returns. In addition, we propose stack copy sharing, which is realized by using lazy stack copying. We present three models for stack copy sharing. We applied these techniques to Scheme systems and found that the proposed techniques improve the runtime and memory efficiency of programs that use first-class continuations.
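The deferral can be modeled in a few lines of Python (a toy, with our names `Cont` and `materialize`; the real technique operates on a Scheme system's native stack): capture records only the stack height, and the copy happens, if at all, when the capturing frame returns.

```python
stack = []               # the model execution stack
copies_made = [0]        # counts actual (deferred) copies

class Cont:
    def __init__(self):
        self.height = len(stack)   # cheap capture: no copy yet
        self.saved = None
        self.live = True           # does any reference still exist?

    def materialize(self):
        # Called when the capturing function returns: copy only if the
        # continuation is still reachable and not yet copied.
        if self.live and self.saved is None:
            self.saved = stack[: self.height]
            copies_made[0] += 1

stack.extend(["frame0", "frame1"])
k1 = Cont()
k1.live = False          # becomes garbage before the frame returns
k1.materialize()         # copy avoided entirely
k2 = Cont()
k2.materialize()         # copy actually performed
```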
workshop on program analysis for software tools and engineering | 2013
Masahiro Yasugi; Yuki Matsuda; Tomoharu Ugawa
The growing complexity of underlying systems such as memory hierarchies and speculation mechanisms is making it difficult to perform proper performance evaluations. This is a serious problem especially when we want to know the overhead of adding new functionality to existing languages (or systems/applications), or to measure small changes in performance caused by small changes to programs. The problem is that equivalent executable programs, which differ only in their instruction addresses (code placement), often exhibit significantly different performance. This difference can be explained by the fact that code placement affects the underlying branch predictors and instruction cache subsystems. By taking such code placement effects into account, this paper proposes a proper evaluation scheme that cancels accidental factors in code placement by statistically summarizing the performance of a sufficient number of artificial programs that differ from the evaluation target program (almost) only in their code placement. We developed a system, called Code Shaker, that supports performance evaluations based on the proposed scheme.
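The statistical core of the scheme, sketched in Python (our simplification; Code Shaker itself also generates the placement variants): given per-variant timings of programs differing only in code placement, summarize them so placement-induced noise cancels, then compare targets by summarized means rather than single runs.

```python
import statistics

def summarize(timings):
    """Mean and spread over placement variants of one target program."""
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return mean, stdev

def faster(timings_a, timings_b):
    # Crude decision rule for the sketch: compare placement-averaged
    # means instead of two accidental single-placement measurements.
    return summarize(timings_a)[0] < summarize(timings_b)[0]
```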
international parallel and distributed processing symposium | 2016
Shingo Okuno; Tasuku Hiraishi; Hiroshi Nakashima; Masahiro Yasugi; Jun Sese
This paper proposes an implementation of a parallel graph mining algorithm using a task-parallel language with exception handling features. A straightforward task-parallel implementation performs poorly for many practical backtrack search algorithms because of the pruning they employ: a worker may prune a useless subtree (one that the sequential search algorithm would have pruned before traversal to reduce the search space) after another worker has already started traversing it, resulting in a large amount of redundant search. Such redundancy can be significantly reduced by letting a worker know that the subtree it is traversing has been pruned, so that it aborts the traversal. This abort can be implemented elegantly and efficiently using a task-parallel language with an exception-handling mechanism by which all parallel tasks running in a try block are automatically aborted when an exception is raised. We applied this abort mechanism to the graph mining algorithm COPINE, which is used in practice for drug discovery, using the task-parallel language Tascell. As a result, we reduced the search space by 31.9% and the execution time by 27.4% in a 28-worker execution.
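The abort mechanism can be sketched with Python threads (an illustrative stand-in; Tascell's version aborts tasks via exception handling rather than an explicit flag): a worker that detects pruning signals a shared abort, and sibling workers cut off their now-redundant search.

```python
import threading

def parallel_search(trees, pruned):
    abort = threading.Event()
    results = []

    def worker(tree):
        for node in tree:
            if abort.is_set():
                return               # a sibling pruned this region: stop
            if pruned(node):
                abort.set()          # exception-like signal to all workers
                return
            results.append(node)

    threads = [threading.Thread(target=worker, args=(t,)) for t in trees]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Depending on timing, siblings stop earlier or later, but a pruned node is never explored past the point of detection.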