Publication


Featured research published by Toshihiko Koju.


ACM International Conference on Systems and Storage | 2012

Optimizing indirect branches in a system-level dynamic binary translator

Toshihiko Koju; Xin Tong; Ali I. Sheikh; Moriyoshi Ohara; Toshio Nakatani

A dynamic binary translator (DBT) is a runtime system that translates binary code on the fly, for example to emulate the execution of the binary code on a processor with a different instruction set. One of the major sources of overhead in a DBT is the resolution of the branch target addresses for indirect branch instructions. Previous work has addressed this problem for a single virtual address space, but none has addressed it for multiple virtual address spaces in the context of a system-level DBT. This is challenging for compiler optimizations because the compiler cannot compute the virtual addresses of the branch targets for indirect branches at compile time, since they are affected by the runtime state of the emulated TLB. In this paper, we propose a new compiler optimization technique to address the problem for a system-level DBT. Our key idea is to use an offset from the virtual address of each page that contains a branch instruction, since this offset is not affected by the emulated TLB. We found that the compiler can often compute the offset using compile-time constants and that this approach significantly simplifies the guard code necessary for an indirect branch. We implemented this technique in a compiler of a system-level DBT for the z/Architecture. Our experimental results showed that our technique can reduce the execution times of the CBW2 benchmarks, part of the standard LSPR benchmark, by up to 5.9%, and by 2.5% on average. Our analysis indicated that our technique was able to optimize 3.8% of the total dynamic instructions in the original binary code, while completely removing the guard code for 98.9% of these indirect branches.
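The invariant behind the key idea can be illustrated with a small sketch. This is not the authors' implementation: the 4 KiB page size and the helper names `page_base` and `offset_from_branch_page` are assumptions made for illustration. It only shows that, when a branch and its target lie in the same page, the offset measured from that page's base stays the same no matter which virtual address the emulated TLB maps the page to.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def page_base(vaddr: int) -> int:
    """Return the virtual address of the start of the page containing vaddr."""
    return vaddr & ~(PAGE_SIZE - 1)


def offset_from_branch_page(branch_vaddr: int, target_vaddr: int) -> int:
    """Offset of the target, measured from the base of the branch's page."""
    return target_vaddr - page_base(branch_vaddr)


# The same code page mapped at two different virtual bases (hypothetical values).
BRANCH_IN_PAGE = 0x120  # position of the branch instruction inside the page
TARGET_IN_PAGE = 0x340  # position of the branch target inside the page

offsets = [
    offset_from_branch_page(base + BRANCH_IN_PAGE, base + TARGET_IN_PAGE)
    for base in (0x10000, 0x7F000)
]
# Both mappings yield the same offset, although the absolute targets differ.
```

A check on such an offset can therefore be expressed with constants known when the code is compiled, whereas a check on the absolute virtual address depends on the mapping in effect at run time.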


ACM Transactions on Architecture and Code Optimization | 2015

Optimizing Memory Translation Emulation in Full System Emulators

Xin Tong; Toshihiko Koju; Motohiro Kawahito; Andreas Moshovos

The emulation speed of a full system emulator (FSE) determines its usefulness. This work quantitatively measures where time is spent in QEMU [Bellard 2005], an industrial-strength FSE. The analysis finds that memory emulation is one of the most heavily exercised emulator components. For the workloads studied, 38.1% of the emulation time is spent in memory emulation on average, even though QEMU implements a software translation lookaside buffer (STLB) to accelerate dynamic address translation. Despite the amount of time spent in memory emulation, there has been no study on how to further improve its speed. This work analyzes where time is spent in memory emulation and studies the performance impact of a number of STLB optimizations. Although there are several performance optimization techniques for hardware TLBs, this work finds that the trade-offs with an STLB are quite different from those with hardware TLBs. As a result, not all hardware TLB performance optimization techniques are applicable to STLBs, and vice versa. The evaluated STLB optimizations target both STLB lookups and refills, and result in an average emulator performance improvement of 24.4% over the baseline.
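To make the terms "STLB lookup" and "refill" concrete, here is a minimal sketch of a generic direct-mapped software TLB. It is not QEMU's data structure and does not implement any of the optimizations evaluated in the paper; the class name `SoftTLB`, the 4 KiB page size, the 256-entry table, and the dictionary standing in for the guest page table are all assumptions made for illustration.

```python
PAGE_BITS = 12  # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_BITS) - 1


class SoftTLB:
    """Direct-mapped software TLB: one slot per index, tagged by virtual page number."""

    def __init__(self, page_table: dict, entries: int = 256):
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.entries = entries
        self.page_table = page_table    # hypothetical guest page table: vpn -> pfn
        self.tags = [None] * entries    # virtual page number cached in each slot
        self.frames = [0] * entries     # physical frame number cached in each slot
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn = vaddr >> PAGE_BITS
        idx = vpn & (self.entries - 1)  # lookup: index by the low bits of the vpn
        if self.tags[idx] == vpn:
            self.hits += 1              # fast path: tag matches
        else:
            self.misses += 1            # slow path: refill the slot from the page table
            self.frames[idx] = self.page_table[vpn]  # KeyError stands in for a page fault
            self.tags[idx] = vpn
        return (self.frames[idx] << PAGE_BITS) | (vaddr & PAGE_MASK)


# Two pages whose virtual page numbers collide in a 256-entry table.
tlb = SoftTLB({0x10: 0x99, 0x110: 0x55})
first = tlb.translate(0x10123)    # miss, then refill
second = tlb.translate(0x10456)   # hit on the same page
third = tlb.translate(0x110000)   # conflict miss, evicts page 0x10
fourth = tlb.translate(0x10123)   # miss again after the eviction
```

Every emulated memory access pays for the lookup, and every miss pays for the refill, which is why both paths are natural targets for optimization in software.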


Symposium on Code Generation and Optimization | 2016

Re-constructing high-level information for language-specific binary re-optimization

Toshihiko Koju; Reid T. Copeland; Motohiro Kawahito; Moriyoshi Ohara

In this paper, we show that a binary optimizer can achieve competitive performance relative to a state-of-the-art source code compiler by re-constructing high-level information (HLI) from binaries. Recent advances in compiler technologies have resulted in a large performance gap between binaries compiled with old compilers and those compiled with the latest ones. This motivated us to develop a binary optimizer for old binaries using the compiler engine of a recent source code compiler. However, the traditional approach of naively converting machine instructions into an intermediate representation (IR) of the compiler engine does not allow us to take full advantage of the optimization techniques available in the compiler. This is because the HLI, such as information about variables and their data types, is not available in such an IR. To address this issue, we have devised a technique to re-construct the HLI from binaries by using contextual information. This contextual information is a body of knowledge about specific compilation technologies, such as the conventions of data structures, the patterns of instruction sequences, and the semantics of runtime routines. With this technique, our binary optimizer has improved the performance of binaries generated by an older compiler by 40.1% on average in CPU time for a set of benchmarks, which is close to the 55.2% average improvement obtained by recompiling the source code with the same compiler engine.


Archive | 2015

Testing optimized binary modules

Toshihiko Koju; Takuya Nakaike


Archive | 2013

Compiling method, program, and information processing apparatus

Toshihiko Koju; Ali I. Sheikh; Xin Tong


Archive | 2010

Recovering from an Error in a Fault Tolerant Computer System

Toshihiko Koju; Takuya Nakaike


Archive | 2011

Code optimization by memory barrier removal and enclosure within transaction

Toshihiko Koju; Takuya Nakaike; Ali I. Sheikh; Harold W. Cain; Maged M. Michael


Archive | 2016

Method for optimizing binary code in language having access to binary coded decimal variable, and computer and computer program

Toshihiko Koju; Ali I. Sheikh


Archive | 2015

Performance neutral isolation of runtime discrepancies in binary code

Steven Cooper; Reid T. Copeland; Toshihiko Koju; Roger H. E. Pett; Trong Truong


Archive | 2015

Control flow graph analysis

Reid T. Copeland; Toshihiko Koju
