Toshihiko Koju
IBM
Publications
Featured research published by Toshihiko Koju.
ACM International Conference on Systems and Storage | 2012
Toshihiko Koju; Xin Tong; Ali I. Sheikh; Moriyoshi Ohara; Toshio Nakatani
A dynamic binary translator (DBT) is a runtime system that translates binary code on the fly, for example to emulate the execution of the binary code on a processor with a different instruction set. One of the major sources of overhead in a DBT is the resolution of branch target addresses for indirect branch instructions. Previous work has addressed this problem for a single virtual address space, but none has addressed it for multiple virtual address spaces in the context of a system-level DBT. This is challenging for compiler optimizations because the compiler cannot compute the virtual addresses of indirect branch targets at compile time, since they depend on the runtime state of the emulated TLB. In this paper, we propose a new compiler optimization technique to address the problem for a system-level DBT. Our key idea is to use an offset from the virtual address of each page that contains a branch instruction, since this offset is not affected by the emulated TLB. We found that the compiler can often compute the offset using compile-time constants and that this approach significantly simplifies the guard code necessary for an indirect branch. We implemented this technique in a compiler of a system-level DBT for the z/Architecture. Our experimental results showed that our technique can reduce the execution times of the CBW2 benchmarks, part of the standard LSPR benchmark suite, by up to 5.9% and by 2.5% on average. Our analysis indicated that our technique was able to optimize 3.8% of the total dynamic instructions in the original binary code, while completely removing the guard code for 98.9% of these indirect branches.
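The key idea lends itself to a short illustration. The C sketch below (all names are hypothetical and not taken from the paper's implementation) shows guard code that compares only the page offset of the computed branch target against a compile-time constant; because the offset is unaffected by the emulated TLB, the fast path never consults it, and a mismatch falls back to a full target lookup. A real system-level DBT would additionally validate the page mapping elsewhere.

```c
#include <stdint.h>

#define PAGE_OFFSET_MASK 0xFFFu   /* 4 KiB pages: low 12 bits are the page offset */

/* Hypothetical hooks into the DBT runtime. */
extern void *translated_target;              /* host code for the predicted target */
extern void *dbt_slow_lookup(uint64_t va);   /* full resolution via the emulated TLB */

/*
 * Guard for one translated indirect branch.  The predicted target's page
 * offset is a compile-time constant, so the guard compares only the low
 * bits of the computed guest address and never touches the emulated TLB.
 */
static inline void *resolve_indirect(uint64_t guest_target_va,
                                     uint64_t predicted_page_offset)
{
    if ((guest_target_va & PAGE_OFFSET_MASK) == predicted_page_offset)
        return translated_target;             /* fast path: offsets match */
    return dbt_slow_lookup(guest_target_va);  /* slow path: full lookup */
}
```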
ACM Transactions on Architecture and Code Optimization | 2015
Xin Tong; Toshihiko Koju; Motohiro Kawahito; Andreas Moshovos
The emulation speed of a full system emulator (FSE) determines its usefulness. This work quantitatively measures where time is spent in QEMU [Bellard 2005], an industrial-strength FSE. The analysis finds that memory emulation is one of the most heavily exercised emulator components. For the workloads studied, 38.1% of the emulation time is spent in memory emulation on average, even though QEMU implements a software translation lookaside buffer (STLB) to accelerate dynamic address translation. Despite the amount of time spent in memory emulation, there has been no study of how to further improve its speed. This work analyzes where time is spent in memory emulation and studies the performance impact of a number of STLB optimizations. Although there are several performance optimization techniques for hardware TLBs, this work finds that the trade-offs with an STLB are quite different from those with hardware TLBs. As a result, not all hardware TLB performance optimization techniques are applicable to STLBs, and vice versa. The evaluated STLB optimizations target STLB lookups as well as refills, and result in an average emulator performance improvement of 24.4% over the baseline.
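To make the STLB fast path concrete, here is a minimal C sketch of a direct-mapped software TLB of the kind an FSE consults on every emulated memory access. It is an illustration only, not QEMU's actual data structures, and the names are hypothetical; the lookup and the refill slow path are the two components the paper's optimizations target.

```c
#include <stdint.h>

#define STLB_SIZE  256                 /* direct-mapped, power-of-two entries */
#define PAGE_BITS  12
#define PAGE_MASK  ((1u << PAGE_BITS) - 1)

typedef struct {
    uint64_t  guest_vpn;   /* tag: guest virtual page number */
    uintptr_t host_base;   /* host address of the mapped page */
} STLBEntry;

static STLBEntry stlb[STLB_SIZE];

/* Slow path: walk the emulated page tables and refill the entry. */
extern uintptr_t mmu_translate_and_refill(uint64_t guest_va, STLBEntry *e);

/* Fast path executed on every emulated memory access. */
static inline uintptr_t stlb_lookup(uint64_t guest_va)
{
    uint64_t   vpn = guest_va >> PAGE_BITS;
    STLBEntry *e   = &stlb[vpn & (STLB_SIZE - 1)];

    if (e->guest_vpn == vpn)                          /* STLB hit */
        return e->host_base + (guest_va & PAGE_MASK);
    return mmu_translate_and_refill(guest_va, e);     /* STLB miss: refill */
}
```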
Symposium on Code Generation and Optimization | 2016
Toshihiko Koju; Reid T. Copeland; Motohiro Kawahito; Moriyoshi Ohara
In this paper, we show that a binary optimizer can achieve competitive performance relative to a state-of-the-art source code compiler by reconstructing high-level information (HLI) from binaries. Recent advances in compiler technologies have resulted in a large performance gap between binaries compiled with old compilers and those compiled with the latest ones. This motivated us to develop a binary optimizer for old binaries that uses the compiler engine of a state-of-the-art source code compiler. However, a traditional approach that naively converts machine instructions into the compiler engine's intermediate representation (IR) does not allow us to take full advantage of the optimization techniques available in the compiler, because HLI such as information about variables and their data types is not available in such an IR. To address this issue, we have devised a technique to reconstruct the HLI from binaries by using contextual information. This contextual information is a body of knowledge about specific compilation technologies, such as data structure conventions, instruction sequence patterns, and the semantics of runtime routines. With this technique, our binary optimizer improved the CPU-time performance of binaries generated by an older compiler by 40.1% on average across a set of benchmarks, which is close to the 55.2% average improvement obtained by recompiling the source code with the same compiler engine.
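As a rough illustration of what "contextual information" can look like, the C sketch below (entirely hypothetical, not the paper's implementation) applies one pattern-matching rule to decoded instructions: under an assumed stack-frame convention, a 4-byte load off the frame register that feeds a call to a known runtime routine is annotated in the IR as a 32-bit integer local variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded-instruction and IR-symbol records. */
typedef enum { OP_LOAD4, OP_CALL, OP_OTHER } Opcode;

typedef struct {
    Opcode  op;
    int     base_reg;       /* base register of a memory operand */
    int32_t displacement;   /* displacement from the base register */
} Insn;

typedef struct {
    int32_t     frame_offset;    /* location of the variable in the stack frame */
    const char *recovered_type;  /* high-level type attached to the IR symbol */
} IRSymbol;

#define FRAME_REG 13   /* assumed frame register under the target convention */

/*
 * Contextual rule: a 4-byte load relative to the frame register that feeds
 * a call to a known runtime routine implies a 32-bit integer local.
 */
static bool recover_int_local(const Insn *load, const Insn *next, IRSymbol *sym)
{
    if (load->op == OP_LOAD4 && load->base_reg == FRAME_REG &&
        next->op == OP_CALL) {
        sym->frame_offset   = load->displacement;
        sym->recovered_type = "int32";
        return true;
    }
    return false;
}
```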
Archive | 2015
Toshihiko Koju; Takuya Nakaike
Archive | 2013
Toshihiko Koju; Ali I. Sheikh; Xin Tong
Archive | 2010
Toshihiko Koju; Takuya Nakaike
Archive | 2011
Toshihiko Koju; Takuya Nakaike; Ali I. Sheikh; Harold W. Cain; Maged M. Michael
Archive | 2016
Toshihiko Koju; Ali I. Sheikh
Archive | 2015
Steven Cooper; Reid T. Copeland; Toshihiko Koju; Roger H. E. Pett; Trong Truong
Archive | 2015
Reid T. Copeland; Toshihiko Koju