Masahiro Yasugi
Kyoto University
Publications
Featured research published by Masahiro Yasugi.
international conference on supercomputing | 1992
Masahiro Yasugi; Satoshi Matsuoka; Akinori Yonezawa
The trend towards object-oriented software construction is becoming more and more prevalent, and parallel programming cannot be an exception. In the context of parallel computation, it is often natural to model the computation as message passing between autonomous, concurrently active objects. The problem was, as some previous studies had indicated, that the overhead from message reception to dynamic method dispatch consumes a significant amount of execution time (e.g., as much as 4,000 machine cycles, or 500 μseconds at an 8 MHz clock, for some language/hardware combinations). Our ABCL/onEM-4, a software/hardware implementation architecture for a concurrent object-oriented language, overcomes this problem with techniques such as an address-specifiable reactive packet-driven architecture, zero-overhead context switching, and packet-driven allocation of message boxes. Preliminary performance measurements on real EM-4 hardware confirm our claim, achieving a total of nearly 10 μseconds (130 clocks) for a remote object creation followed by a request message send to the created object and a reply reception from the object, at a 12.5 MHz clock speed. Our results indicate that the concurrent object-oriented computational model and languages are highly viable with proper implementational software/hardware architectures.
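The computational model the paper targets can be caricatured in a few lines of Python (a sketch only; names like `ConcurrentObject` and `msgbox` are ours, and the dispatch loop shown here is exactly the software step whose cost ABCL/onEM-4 moves into hardware):

```python
from collections import deque

class ConcurrentObject:
    """An autonomous object that receives messages via a message box."""
    def __init__(self):
        self.msgbox = deque()            # packet-driven message box

    def send(self, method, *args):
        self.msgbox.append((method, args))

    def run(self):
        # From message reception to dynamic method dispatch: the
        # overhead path the paper's hardware support eliminates.
        while self.msgbox:
            method, args = self.msgbox.popleft()
            getattr(self, method)(*args)

class Counter(ConcurrentObject):
    def __init__(self):
        super().__init__()
        self.n = 0
    def add(self, k):
        self.n += k

c = Counter()
c.send("add", 2)
c.send("add", 3)
c.run()                                  # drain the message box
```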
compiler construction | 2006
Masahiro Yasugi; Tasuku Hiraishi; Taiichi Yuasa
We propose a new language concept called “L-closures” for a running program to legitimately inspect/modify the contents of its execution stack. L-closures are lightweight lexical closures created by evaluating nested function definitions. A lexical closure can access the lexically-scoped variables in the creation-time environment, and indirect calls to it provide legitimate stack access. By using an intermediate language extended with L-closures in high-level compilers, high-level services such as garbage collection, checkpointing, multithreading and load balancing can be implemented elegantly and efficiently. Each variable accessed by an L-closure uses private and shared locations, so that the private location has a chance to be allocated to a register. Operations to keep coherency with shared locations, as well as operations to initialize L-closures, are delayed until an L-closure is actually invoked. Because most high-level services create L-closures very frequently but call them infrequently (e.g., to scan roots in garbage collection), the total overhead can be reduced significantly. Since the GNU C compiler provides nested functions, we enhanced GCC at relatively low implementation costs. The results of performance measurements exhibit quite low costs of creating and maintaining L-closures.
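A rough Python analogue of the idea (illustrative only; `scanners` and `sum_list` are our names, and the real mechanism is a C intermediate-language extension): each active frame registers a cheap nested closure that, only when actually invoked, exposes that frame's lexically scoped variables, e.g., so a collector can scan roots:

```python
scanners = []  # one lightweight "L-closure" per active frame

def sum_list(node, acc):
    cell = {"node": node, "acc": acc}   # boxed so the closure sees updates
    def scan():                         # cheap to create, rarely called
        return list(cell.values())      # legitimate access to this frame
    scanners.append(scan)
    try:
        while cell["node"] is not None:
            cell["acc"] += cell["node"][0]
            cell["node"] = cell["node"][1]
        return cell["acc"]
    finally:
        scanners.pop()                  # frame exits: unregister
```

Creation is just a closure allocation; the cost of exposing the variables is paid only if `scan` is ever called, mirroring the paper's point that services create such closures frequently but invoke them rarely.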
ieee international conference on high performance computing data and analytics | 2003
Seiji Umatani; Masahiro Yasugi; Tsuneyasu Komiya; Taiichi Yuasa
Modern multithreaded languages are expected to support advanced features such as thread identification for Java-style locks and dynamically-scoped synchronization coupled with exception handling. However, supporting these features has been considered to degrade the effectiveness of existing efficient implementation techniques for fine-grained fork/join multithreaded languages, e.g., lazy task creation. This paper proposes efficient implementation techniques for an extended Java language, OPA, with the above advanced features. Our portable implementation in C achieves good performance by pursuing ‘laziness’ not only for task creation but also for stealable continuation creation, thread ID allocation, and synchronizer creation.
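Lazy task creation, the technique the paper extends, can be sketched in Python (a deliberately minimal model with a single worker and no thief; the real OPA runtime manipulates C-level continuations): a fork merely pushes a stealable record onto a deque and keeps executing sequentially, so an unstolen fork costs only a push and a pop.

```python
from collections import deque

deq = deque()            # stealable tasks; a thief would pop the other end

def spawn(thunk):
    deq.append(thunk)    # lazily record a potential task

def sync():
    # No thief in this sketch: reclaim our own task and run it inline.
    thunk = deq.pop()
    return thunk()

def fib(n):
    if n < 2:
        return n
    spawn(lambda: fib(n - 1))   # potentially parallel call
    b = fib(n - 2)              # continue sequentially meanwhile
    return sync() + b
```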
international symposium on object component service oriented real time distributed computing | 2005
Hideaki Saiki; Yoshiharu Konaka; Tsuneyasu Komiya; Masahiro Yasugi; Taiichi Yuasa
Runtime systems for applications written in certain languages such as Java™ and Lisp usually have a garbage collection (GC) feature to make efficient use of memory. With this feature, the system, before running out of memory space, automatically reclaims unnecessary fragments of memory in one burst and recycles them for use by applications. In conventional implementations, however, the system has to suspend the execution of applications while carrying out GC, which renders GC generally unsuitable for real-time processing systems. As a solution to this problem, Yuasa et al. of Kyoto University developed the return-barrier method based on snapshot GC. The return barrier enhances real-time GC processing over the original snapshot GC. Some reports have been published on the evaluation of the return-barrier method in Lisp environments such as Kyoto Common Lisp (KCL). In this paper, we report the implementation of the return-barrier method for JeRTy™VM, a real-time environment for embedded Java applications developed by OMRON Corporation, and its performance improvement over the conventional GC in JeRTyVM.
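The snapshot GC underlying the return-barrier method rests on a write barrier, sketched here in Python (an illustrative toy, not the JeRTyVM implementation; `Obj` and `marked` are our names): once a collection cycle starts, any pointer the mutator overwrites is first recorded, so every object live at snapshot time survives that cycle.

```python
gc_running = False
marked = set()           # snapshot-time referents saved by the barrier

class Obj:
    def __init__(self, name, ref=None):
        self.name = name
        self._ref = ref

    @property
    def ref(self):
        return self._ref

    @ref.setter
    def ref(self, new):
        # Write barrier: before overwriting, remember the old referent.
        if gc_running and self._ref is not None:
            marked.add(self._ref.name)
        self._ref = new

b = Obj("b")
a = Obj("a", b)
gc_running = True        # a collection cycle begins (the snapshot)
a.ref = None             # barrier records "b", live at snapshot time
```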
Journal of Information Processing | 2012
Masahiro Yasugi; Tasuku Hiraishi; Seiji Umatani; Taiichi Yuasa
Parallel programming/execution frameworks for many/multi-core platforms should support as many applications as possible. In general, work-stealing frameworks provide efficient load balancing even for irregular parallel applications. Unfortunately, naive parallel programs which traverse graph-based data structures (e.g., for constructing spanning trees) cause stack overflow or unacceptable load imbalance. In this study, we develop parallel programs to perform probabilistically balanced divide-and-conquer graph traversals. We propose a programming technique for accumulating overflowed calls for the next iteration of repeated parallel stages. In an emerging backtracking-based work-stealing framework called “Tascell,” which features on-demand concurrency, we propose a programming technique for long-term exclusive use of workspaces, leading to a similar technique in the Cilk framework as well.
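The overflow-accumulation idea can be sketched sequentially in Python (our simplification, with an artificially tiny recursion budget; the paper's version runs the stages in parallel under work stealing): calls that would exceed the stack budget are queued and seed the next iteration of the stage.

```python
LIMIT = 3  # artificially small recursion budget for illustration

def traverse(graph, roots):
    visited, pending = set(), list(roots)

    def visit(v, depth):
        if v in visited:
            return
        if depth >= LIMIT:
            pending.append(v)        # would overflow: defer to next stage
            return
        visited.add(v)
        for w in graph.get(v, ()):
            visit(w, depth + 1)

    while pending:                   # repeated parallel stages (sequential here)
        batch, pending = pending, []
        for v in batch:
            visit(v, 0)
    return visited
```

On a 10-node chain with budget 3, the traversal completes in four stages instead of overflowing a depth-10 recursion.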
international conference on parallel architectures and compilation techniques | 1998
Masahiro Yasugi; Shigeyuki Eguchi; Kazuo Taki
Dynamic method replacement is a new technique for eliminating bottlenecks (e.g., around the root of a tree structure) using adaptive objects for concurrent accesses. The technique reduces the frequency of mutual exclusion and remote message passing by dynamically increasing the number of read-only methods and the immutable part of objects. The results of performance measurements on both shared-memory and distributed-memory parallel architectures indicate the effectiveness of our approach to bottleneck elimination.
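A minimal Python sketch of the adaptive-object idea (names such as `AdaptiveCell` and `freeze` are ours, not the paper's): an object starts with a locked read method and, once its state becomes immutable, replaces it at run time with a lock-free one, removing mutual exclusion from the hot path.

```python
import threading

class AdaptiveCell:
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()
        self.read = self._locked_read        # current method binding

    def _locked_read(self):
        with self._lock:                     # mutual exclusion on reads
            return self._value

    def _lockfree_read(self):
        return self._value                   # safe once the cell is frozen

    def freeze(self):
        with self._lock:
            # Dynamic method replacement: reads are now read-only.
            self.read = self._lockfree_read
```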
international symposium on object component service oriented real time distributed computing | 2008
Tomoharu Ugawa; Masahiro Yasugi; Taiichi Yuasa
We propose an incremental compaction algorithm. Our compactor selects a continuous area of the heap and evacuates it by incrementally copying all objects in the area to the rest of the heap. After all objects have been copied, our compactor incrementally updates pointers pointing into the evacuated area. During these processes, each original object and its copy are kept consistent. We implemented the compactor together with a snapshot garbage collector in the KVM. Our measurements show that (1) the largest free chunk is almost always more than 20% as large as the entire heap when the heap is twice as large as the maximum amount of live objects, (2) the runtime overhead is less than 20%, and (3) the maximum pause time caused by the compactor is comparable to that caused by the snapshot collector.
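A toy model of the compactor's two phases in Python (our simplification; the real compactor works incrementally inside the KVM heap): evacuate an area by copying its objects out, then update every pointer into the area; here the "original" and the "copy" stay trivially consistent because they alias the same object.

```python
heap = {}        # address -> object (a dict of fields)
forward = {}     # evacuated address -> new address

def evacuate(area, next_free):
    # Phase 1: copy all objects out of the evacuation area.
    for addr in area:
        heap[next_free] = heap[addr]         # alias keeps both consistent
        forward[addr] = next_free
        next_free += 1
    # Phase 2: update pointers that point into the evacuated area.
    for addr, obj in heap.items():
        if addr in forward:
            continue                         # the stale original
        for field, val in obj.items():
            if val in forward:
                obj[field] = forward[val]

heap[0] = {"p": 1}       # object at 0 points into the area
heap[1] = {"p": None}    # object in the evacuation area
evacuate([1], 10)
```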
asian symposium on programming languages and systems | 2003
Tomoharu Ugawa; Nobuhisa Minagawa; Tsuneyasu Komiya; Masahiro Yasugi; Taiichi Yuasa
In order to capture first-class continuations, most stack-based implementations copy the contents of the stack to the heap. While various implementation strategies for copying have been proposed, many implementations employ the stack strategy. With this strategy, the entire stack contents are copied to the heap whenever a continuation is captured. This simple strategy is easy to implement and can be used for implementations with a foreign language interface. However, it requires a lot of time and memory for the creation and invocation of continuations. We propose a lazy stack copying technique. The contents of the stack to be copied are preserved on the stack until the function that captured the continuation returns, so we delay stack copying until that return. We can avoid stack copying entirely if the continuation is detected to have become garbage before the function returns. In addition, we propose stack copy sharing, which is realized by using lazy stack copying. We present three models for stack copy sharing. We applied these techniques to Scheme systems and found that the proposed techniques improve the runtime and memory efficiency of programs that use first-class continuations.
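The deferral can be modeled in a few lines of Python (a toy, with our names `Cont` and `materialize`; the real technique operates on a Scheme system's native stack): capture records only the stack height, and the copy happens, if at all, when the capturing frame returns.

```python
stack = []               # the model execution stack
copies_made = [0]        # counts actual (deferred) copies

class Cont:
    def __init__(self):
        self.height = len(stack)   # cheap capture: no copy yet
        self.saved = None
        self.live = True           # does any reference still exist?

    def materialize(self):
        # Called when the capturing function returns: copy only if the
        # continuation is still reachable and not yet copied.
        if self.live and self.saved is None:
            self.saved = stack[: self.height]
            copies_made[0] += 1

stack.extend(["frame0", "frame1"])
k1 = Cont()
k1.live = False          # becomes garbage before the frame returns
k1.materialize()         # copy avoided entirely
k2 = Cont()
k2.materialize()         # copy actually performed
```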
workshop on program analysis for software tools and engineering | 2013
Masahiro Yasugi; Yuki Matsuda; Tomoharu Ugawa
The growing complexity of underlying systems such as memory hierarchies and speculation mechanisms is making it difficult to perform proper performance evaluations. This is a serious problem especially when we want to know the overhead of adding new functionality to existing languages (or systems/applications), or to measure small changes in performance caused by small changes to programs. The problem is that equivalent executable programs, which differ only in their instruction addresses (code placement), often exhibit significantly different performance. This difference can be explained by the fact that code placement affects the underlying branch predictors and instruction cache subsystems. By taking such code placement effects into account, this paper proposes a proper evaluation scheme that cancels accidental factors in code placement by statistically summarizing the performance of a sufficient number of artificial programs that differ from the evaluation target program (almost) only in their code placement. We developed a system, called Code Shaker, that supports performance evaluations based on the proposed scheme.
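The statistical core of the scheme, sketched in Python (our simplification; Code Shaker itself also generates the placement variants): given per-variant timings of programs differing only in code placement, summarize them so placement-induced noise cancels, then compare targets by summarized means rather than single runs.

```python
import statistics

def summarize(timings):
    """Mean and spread over placement variants of one target program."""
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return mean, stdev

def faster(timings_a, timings_b):
    # Crude decision rule for the sketch: compare placement-averaged
    # means instead of two accidental single-placement measurements.
    return summarize(timings_a)[0] < summarize(timings_b)[0]
```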
international parallel and distributed processing symposium | 2016
Shingo Okuno; Tasuku Hiraishi; Hiroshi Nakashima; Masahiro Yasugi; Jun Sese
This paper proposes an implementation of a parallel graph mining algorithm using a task-parallel language with exception handling features. A straightforward task-parallel implementation performs poorly for many practical backtrack search algorithms because of the pruning they employ: a worker may prune a useless subtree (one that the sequential search algorithm would have pruned before traversal to reduce the search space) after another worker has already started traversing it, resulting in a large amount of redundant search. Such redundancy can be significantly reduced by letting a worker know that the subtree it is traversing has been pruned, so that it aborts the traversal. This abort can be implemented elegantly and efficiently using a task-parallel language with an exception-handling mechanism by which all parallel tasks running in a try block are automatically aborted when an exception is raised. We applied this abort mechanism to the graph mining algorithm COPINE, which is used in practice for drug discovery, using the task-parallel language Tascell. As a result, we reduced the search space by 31.9% and the execution time by 27.4% in a 28-worker execution.
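The abort mechanism can be sketched with Python threads (an illustrative stand-in; Tascell's version aborts tasks via exception handling rather than an explicit flag): a worker that detects pruning signals a shared abort, and sibling workers cut off their now-redundant search.

```python
import threading

def parallel_search(trees, pruned):
    abort = threading.Event()
    results = []

    def worker(tree):
        for node in tree:
            if abort.is_set():
                return               # a sibling pruned this region: stop
            if pruned(node):
                abort.set()          # exception-like signal to all workers
                return
            results.append(node)

    threads = [threading.Thread(target=worker, args=(t,)) for t in trees]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Depending on timing, siblings stop earlier or later, but a pruned node is never explored past the point of detection.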