Soo-Mook Moon
IBM
Publications
Featured research published by Soo-Mook Moon.
international conference on parallel architectures and compilation techniques | 1999
Byung-Sun Yang; Soo-Mook Moon; Seong-Bae Park; Junpyo Lee; Seungil Lee; Jinpyo Park; Yoo C. Chung; Suhyun Kim; Kemal Ebcioglu; Erik R. Altman
For network computing on desktop machines, fast execution of Java bytecode programs is essential because these machines are expected to run substantial application programs written in Java. Higher Java performance can be achieved by just-in-time (JIT) compilers, which translate the stack-based bytecode into register-based machine code on demand. One crucial problem in Java JIT compilation is how to map and allocate stack entries and local variables to registers efficiently and quickly, so as to improve Java performance. This paper introduces LaTTe, a Java JIT compiler that performs fast and efficient register mapping and allocation for RISC machines. LaTTe first translates the bytecode into pseudo-RISC code with symbolic registers, which is then register allocated while coalescing those copies corresponding to pushes and pops between local variables and the stack. The LaTTe JVM also includes an enhanced object model, a lightweight monitor, a fast mark-and-sweep garbage collector, and an on-demand exception handling mechanism, all of which are closely coordinated with LaTTe's JIT compilation.
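The following is a minimal sketch, not LaTTe's actual intermediate format, of the mapping the abstract describes: the bytecode for c = a + b first becomes pseudo-RISC copies over symbolic registers (one per local variable and per stack slot; the register names are illustrative), and coalescing those copies collapses the sequence to a single instruction.

```java
// Sketch of LaTTe-style translation and coalescing for "c = a + b"
// (locals a=1, b=2, c=3; register names l*/s* are illustrative).
public class RegisterMappingSketch {
    public static void main(String[] args) {
        // Bytecode: iload_1; iload_2; iadd; istore_3
        // Naive pseudo-RISC: one symbolic register per local (l1..l3)
        // and per stack slot (s0, s1); each push/pop becomes a copy.
        String[] naive = {
            "mov s0, l1      ; iload_1  (push a)",
            "mov s1, l2      ; iload_2  (push b)",
            "add s0, s0, s1  ; iadd     (pop, pop, push result)",
            "mov l3, s0      ; istore_3 (pop into c)",
        };
        // Coalescing the stack/local copies folds everything into one
        // instruction that reads and writes the locals' registers directly.
        String[] coalesced = { "add l3, l1, l2" };

        for (String i : naive)     System.out.println("naive:     " + i);
        for (String i : coalesced) System.out.println("coalesced: " + i);
    }
}
```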
ACM Sigarch Computer Architecture News | 1999
Byung-Sun Yang; Junpyo Lee; Jinpyo Park; Soo-Mook Moon; Kemal Ebcioglu; Erik R. Altman
This paper introduces the lightweight monitor in a Java VM that is fast on single-threaded programs as well as on multi-threaded programs with little lock contention. A 32-bit lock is embedded in each object for efficient access, while the lock queue and the wait set are managed through a hash table. The lock manipulation code is highly optimized and inlined by our Java VM JIT compiler, called LaTTe, wherever the lock is accessed. In most cases, only 9 SPARC instructions are spent for lock acquisition and 5 instructions for lock release. Our experimental results indicate that the lightweight monitor is faster than the monitor in the latest Sun JDK 1.2 Release Candidate 1 by up to 21 times in the absence of lock contention and by up to 7 times in the presence of lock contention.
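A hedged sketch of the idea in Java (the paper's monitor is hand-optimized SPARC code; this is only an illustration): a per-object 32-bit lock word is claimed with a single compare-and-set on the uncontended fast path, while a real VM would serve contended cases from a hash-table-managed lock queue, stubbed here as a spin loop.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a lightweight (thin) lock: the 32-bit lock word holds 0 when
// free and the owner's (nonzero) thread id when held. The uncontended
// fast path is a single compare-and-set; a real VM would park contending
// threads in a hash-table-managed lock queue (stubbed here as a spin).
final class ThinLock {
    private final AtomicInteger word = new AtomicInteger(0);

    void lock(int threadId) {                     // threadId must be nonzero
        if (word.compareAndSet(0, threadId))      // fast path: one CAS
            return;
        while (!word.compareAndSet(0, threadId))  // slow path: contention
            Thread.onSpinWait();
    }

    void unlock(int threadId) {
        word.compareAndSet(threadId, 0);          // release: one CAS back to 0
    }
}

public class ThinLockDemo {
    public static void main(String[] args) {
        ThinLock l = new ThinLock();
        l.lock(1);    // thread 1 acquires: single successful CAS
        l.unlock(1);  // thread 1 releases
        System.out.println("lock word cycled 0 -> 1 -> 0");
    }
}
```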
international conference on parallel processing | 1993
Soo-Mook Moon; Kemal Ebcioglu
Instruction-level parallelism in non-numerical code has been characterized as yielding only small speedups because of its irregularity. Recently, we have developed a new static scheduling algorithm called selective scheduling, which can be used as a component of VLIW and superscalar compilers to exploit this irregular parallelism.
international conference on parallel processing | 1993
Soo-Mook Moon
Sequential execution of conditional branches in non-numerical code limits the exploitation of instruction-level parallelism (ILP). In order to cope with this limitation, the exploitation of parallelism must be extended to the concurrent execution of data instructions and branches in a single cycle.
IEEE Transactions on Parallel and Distributed Systems | 2007
Byung-Sun Yang; Junpyo Lee; Seungil Lee; Seongbae Park; Yoo C. Chung; Suhyun Kim; Kemal Ebcioglu; Erik R. Altman; Soo-Mook Moon
Java just-in-time (JIT) compilers improve the performance of a Java virtual machine (JVM) by translating Java bytecode into native machine code on demand. One important problem in Java JIT compilation is how to map stack entries and local variables to registers efficiently and quickly, since register-based computations are much faster than memory-based ones, while JIT compilation overhead is part of the whole running time. This paper introduces LaTTe, an open-source Java JIT compiler that performs fast generation of efficiently register-mapped RISC code. LaTTe first maps all local variables and stack entries into pseudoregisters, followed by real register allocation, which also aggressively coalesces copies corresponding to pushes and pops between local variables and stack entries. Our experimental results indicate that LaTTe's sophisticated register mapping and allocation really pay off, achieving twice the performance of a naive JIT compiler that maps all local variables and stack entries to memory. It is also shown that LaTTe makes a reasonable trade-off between the quality and the speed of register mapping and allocation for the bytecode. We expect these results will also be beneficial to parallel and distributed Java computing: 1) by enhancing single-thread Java performance; and 2) by significantly reducing the number of memory accesses which the rest of the system must properly order to maintain coherence and keep threads synchronized.
The Computer Journal | 1998
Soo-Mook Moon; Kemal Ebcioglu
Modern single-CPU microprocessors exploit instruction-level parallelism (ILP) by deriving their performance advantage mainly from parallel execution of ALU and memory instructions within a single clock cycle. This performance advantage obtained by exploiting data ILP is severely offset by sequential execution of conditional branches, especially in branch-intensive non-numerical code. Consequently, branch ILP must also be exploited by executing branches and data instructions in parallel. This requires compilation support for scheduling branches as well as architectural support for executing branches and data instructions in the same cycle. This paper performs a comprehensive empirical study aimed at evaluating the performance impact of exploiting branch ILP using a representation of ILP code called tree representation, which has been proposed by Nicolau [A. Nicolau (1985), Technical Report TR-85-678, Cornell University, Ithaca, NY] and Ebcioglu to exploit branch ILP in the most generalized form. Our results indicate that exploiting branch ILP can enhance performance substantially (e.g., a geometric-mean speedup of as much as 4.5 on the 16-ALU machine, compared to a base speedup of 3.0) and that the performance benefit comes not only from the intended parallel execution but also from a decrease in useless speculative execution due to earlier scheduling of branches.
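As an illustration of the tree representation (a sketch of its semantics, not the paper's notation): a single tree instruction is a decision tree whose internal nodes are branch conditions and whose root-to-leaf paths carry data operations; all conditions resolve in one cycle, and only the operations on the taken path commit.

```java
// One "tree instruction": two branch conditions and three leaves, all
// resolved in a single step. Values and labels are illustrative.
public class TreeInstructionSketch {
    static int treeStep(int x, int y) {
        boolean c1 = (x == 0);   // branch condition at the root
        boolean c2 = (y > 0);    // branch condition at the inner node
        // Conceptually both conditions are evaluated in the same cycle;
        // only the operations on the selected root-to-leaf path commit.
        if (c1)      return y + 1;   // leaf L1: op scheduled on path 1
        else if (c2) return x - y;   // leaf L2: op scheduled on path 2
        else         return x + y;   // leaf L3: op scheduled on path 3
    }

    public static void main(String[] args) {
        System.out.println(treeStep(0, 5));   // takes the path to L1
        System.out.println(treeStep(3, 5));   // takes the path to L2
        System.out.println(treeStep(3, -5));  // takes the path to L3
    }
}
```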
International Journal of High Speed Computing | 1997
Sam H. Noh; Soo-Mook Moon
This paper presents an algorithm for the Gaussian elimination problem that reduces the length of the critical path compared to the algorithm of Lord et al. This is done by redefining the notion of a task. For all practical purposes, the issues of communication overhead and pivoting cannot be overlooked. We consider these issues for the new algorithm as well. Timing results of this algorithm as executed on the CM-2 model of the Connection Machine are presented. Another contribution of this paper is the use of logical pivoting for stable computation of the Gaussian elimination algorithm. Pivoting is essential in producing stable results. When pivoting occurs, an interchange of two rows is required. A physical interchange of the values can be avoided by providing a permutation vector in a globally accessible location. We show experimental results that substantiate the use of logical pivoting.
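A minimal Java sketch of logical pivoting as described above (illustrative, not the paper's CM-2 implementation): row interchanges update only a permutation vector p, and every row access during elimination goes through p, so no row data ever moves.

```java
import java.util.Arrays;

// Gaussian elimination with partial pivoting done "logically": pivoting
// swaps entries of the permutation vector p instead of physically
// interchanging rows, and all row accesses go through p.
public class LogicalPivotGE {
    static int[] eliminate(double[][] a) {
        int n = a.length;
        int[] p = new int[n];
        for (int i = 0; i < n; i++) p[i] = i;         // identity permutation

        for (int k = 0; k < n - 1; k++) {
            // Choose the pivot row (through p) with the largest |entry|
            // in column k, for numerical stability.
            int best = k;
            for (int i = k + 1; i < n; i++)
                if (Math.abs(a[p[i]][k]) > Math.abs(a[p[best]][k])) best = i;
            int t = p[k]; p[k] = p[best]; p[best] = t; // swap indices only

            // Eliminate column k below the (logical) pivot row.
            for (int i = k + 1; i < n; i++) {
                double m = a[p[i]][k] / a[p[k]][k];
                for (int j = k; j < n; j++)
                    a[p[i]][j] -= m * a[p[k]][j];
            }
        }
        return p;   // back-substitution later reads rows in p order
    }

    public static void main(String[] args) {
        double[][] a = { {0, 2, 1}, {1, 1, 1}, {2, 1, 3} };
        int[] p = eliminate(a);
        for (int i = 0; i < a.length; i++)    // rows, read through p, form
            System.out.println(Arrays.toString(a[p[i]])); // an upper triangle
    }
}
```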
PACT '93 Proceedings of the IFIP WG10.3 Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism | 1993
Soo-Mook Moon; Kemal Ebcioglu; Ashok K. Agrawala
international conference on human-computer interaction | 1998
Byung-Sun Yang; Junpyo Lee; Kemal Ebcioglu; Jinpyo Park; Soo-Mook Moon
international conference on parallel processing | 1992
Sam H. Noh; Soo-Mook Moon; Ashok K. Agrawala