

Publications


Featured research published by Kazuaki Ishizaki.


Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) | 2000

A study of devirtualization techniques for a Java Just-In-Time compiler

Kazuaki Ishizaki; Motohiro Kawahito; Toshiaki Yasue; Hideaki Komatsu; Toshio Nakatani

Many devirtualization techniques have been proposed to reduce the runtime overhead of dynamic method calls for various object-oriented languages; however, most of them are less effective for Java or cannot be applied to it in a straightforward manner. This is partly because Java is a statically typed language, so transforming a dynamic call into a static one does not yield a tangible performance gain (owing to the low overhead of accessing the method table) unless the call is inlined, and partly because Java's dynamic class loading prevents whole-program analysis and optimization from being applied. We propose a new technique called direct devirtualization with a code patching mechanism. For a given dynamic call site, our compiler first determines whether the call can be devirtualized by analyzing the current class hierarchy. When the call is devirtualizable and the target method is suitably sized, the compiler generates the inlined code of the method, together with backup code that makes the dynamic call. Only the inlined code is executed until our devirtualization assumption is invalidated, at which time the compiler performs code patching so that the backup code is executed from then on. Because this technique prevents some code motion across the merge point between the inlined code and the backup code, we also implemented recently proposed analysis techniques, such as type analysis and preexistence analysis, which allow the backup code to be eliminated completely. We ran experiments with 16 real programs to understand the effectiveness and characteristics of the devirtualization techniques in our Java Just-In-Time (JIT) compiler. In summary, we reduced the number of dynamic calls by 8.9% to 97.3% (40.2% on average) and improved execution performance by -1% to 133% (a geometric mean of 16%).
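
To make the mechanism concrete, here is a minimal Java sketch (hypothetical classes Shape, Circle, and Renderer, not taken from the paper) of what direct devirtualization amounts to at the source level. The explicit instanceof guard only stands in for the class-hierarchy assumption; the paper's technique removes even that check and instead relies on code patching to re-activate the backup code when a new subclass is loaded.

```java
// Hypothetical sketch: the effect of devirtualizing and inlining a dynamic call.
class Shape {
    double area() { return 0.0; }
}

class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    @Override
    double area() { return Math.PI * r * r; }
}

class Renderer {
    // Original source: every iteration makes a dynamic (virtual) call.
    double sumAreas(Shape[] shapes) {
        double total = 0.0;
        for (Shape s : shapes) {
            total += s.area();               // virtual dispatch
        }
        return total;
    }

    // Source-level analogue of the compiled code after direct devirtualization,
    // assuming the call site is found to be devirtualizable to Circle.area().
    double sumAreasDevirtualized(Shape[] shapes) {
        double total = 0.0;
        for (Shape s : shapes) {
            if (s instanceof Circle) {        // stands in for the devirtualization assumption
                Circle c = (Circle) s;
                total += Math.PI * c.r * c.r; // inlined body of Circle.area()
            } else {
                total += s.area();            // backup code: the original dynamic call
            }
        }
        return total;
    }
}
```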


IBM Journal of Research and Development | 2010

Workload and network-optimized computing systems

David P. LaPotin; Shahrokh Daijavad; Charles L. Johnson; Steven W. Hunter; Kazuaki Ishizaki; Hubertus Franke; Heather D. Achilles; Dan Peter Dumarot; Nancy Anne Greco; Bijan Davari

This paper describes a recent system-level trend toward the use of massive on-chip parallelism combined with efficient hardware accelerators and integrated networking to enable new classes of applications and computing-systems functionality. This system transition is driven by semiconductor physics and emerging network-application requirements. In contrast to general-purpose approaches, workload and network-optimized computing provides significant cost, performance, and power advantages relative to historical frequency-scaling approaches in a serial computational model. We highlight the advantages of on-chip network optimization that enables efficient computation and new services at the network edge of the data center. Software and application development challenges are presented, and a service-oriented architecture application example is shown that characterizes the power and performance advantages for these systems. We also discuss a roadmap for next-generation systems that proportionally scale with future networking bandwidth growth rates and employ 3-D chip integration methods for design flexibility and modularity.


Proceedings of the ACM 1999 Conference on Java Grande | 1999

Design, implementation, and evaluation of optimizations in a just-in-time compiler

Kazuaki Ishizaki; Motohiro Kawahito; Toshiaki Yasue; Mikio Takeuchi; Takeshi Ogasawara; Toshio Suganuma; Tamiya Onodera; Hideaki Komatsu; Toshio Nakatani

The Java language incurs a runtime overhead for exception checks and for object accesses without an interior pointer, in order to ensure safety. It also requires type inclusion tests, dynamic class loading, and dynamic method calls in order to ensure flexibility. A “Just-In-Time” (JIT) compiler generates native code from Java bytecode at runtime. It must improve runtime performance without compromising the safety and flexibility of the Java language. We designed and implemented effective optimizations for the JIT compiler, such as exception check elimination, common subexpression elimination, a simple type inclusion test, method inlining, and resolution of dynamic method calls. We evaluate the performance benefits of these optimizations based on various statistics collected using SPECjvm98 and two JavaSoft applications with bytecode sizes ranging from 20,000 to 280,000 bytes. Each optimization contributes to an improvement in the performance of the programs.
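
As an illustration of one of these optimizations (not an excerpt from the paper), the redundancy that exception check elimination removes can be seen at the source level: once a reference is known to be non-null and an index is known to be in bounds, later accesses in the same region need no further checks.

```java
// Hypothetical sketch of exception check elimination for array accesses.
class ExceptionCheckExample {
    // As written, the JVM must conceptually null-check 'a' and bounds-check
    // each index before every one of the three loads below.
    static int sumThree(int[] a, int i) {
        return a[i] + a[i + 1] + a[i + 2];
    }

    // Shape of the optimized code: a single null check and a single bounds
    // check covering the widest index make all later checks redundant.
    // (The JIT performs this on its intermediate representation, not in source.)
    static int sumThreeCheckedOnce(int[] a, int i) {
        if (a == null) throw new NullPointerException();
        if (i < 0 || i + 2 >= a.length) throw new ArrayIndexOutOfBoundsException(i);
        return a[i] + a[i + 1] + a[i + 2];   // no further checks needed
    }
}
```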


Concurrency and Computation: Practice and Experience | 2000

Design, implementation, and evaluation of optimizations in a Java™ Just-In-Time compiler

Kazuaki Ishizaki; Motohiro Kawahito; Toshiaki Yasue; Mikio Takeuchi; Takeshi Ogasawara; Toshio Suganuma; Tamiya Onodera; Hideaki Komatsu; Toshio Nakatani

The Java language incurs a runtime overhead for exception checks and for object accesses, which are executed without an interior pointer, in order to ensure safety. It also requires type inclusion tests, dynamic class loading, and dynamic method calls in order to ensure flexibility. A ‘Just-In-Time’ (JIT) compiler generates native code from Java bytecode at runtime. It must improve runtime performance without compromising the safety and flexibility of the Java language. We designed and implemented effective optimizations for a JIT compiler, such as exception check elimination, common subexpression elimination, a simple type inclusion test, method inlining, and devirtualization of dynamic method calls. We evaluate the performance benefits of these optimizations based on various statistics collected using SPECjvm98, its candidates, and two JavaSoft applications with bytecode sizes ranging from 23,000 to 280,000 bytes. Each optimization contributes to an improvement in the performance of the programs.
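
The "simple type inclusion test" mentioned in both versions of this work speeds up instanceof and checkcast operations. A rough source-level analogue (a hypothetical class, not the compiler's actual data structures) is a one-entry cache that remembers the last class for which the subtype test succeeded, so the common monomorphic case skips the hierarchy walk.

```java
// Hypothetical one-entry cache for a type inclusion (subtype) test.
final class TypeCheckCache {
    private final Class<?> target;   // the type tested against, e.g. the RHS of instanceof
    private Class<?> lastSubtype;    // last dynamic class for which the test succeeded

    TypeCheckCache(Class<?> target) {
        this.target = target;
    }

    boolean isInstance(Object obj) {
        if (obj == null) {
            return false;                       // instanceof is false for null
        }
        Class<?> cls = obj.getClass();
        if (cls == lastSubtype) {
            return true;                        // fast path: cache hit
        }
        if (target.isAssignableFrom(cls)) {     // slow path: full hierarchy walk
            lastSubtype = cls;                  // remember for subsequent checks
            return true;
        }
        return false;
    }
}
```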


Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) | 2003

Effectiveness of cross-platform optimizations for a Java just-in-time compiler

Kazuaki Ishizaki; Mikio Takeuchi; Kiyokuni Kawachiya; Toshio Suganuma; Osamu Gohda; Tatsushi Inagaki; Akira Koseki; Kazunori Ogata; Motohiro Kawahito; Toshiaki Yasue; Takeshi Ogasawara; Tamiya Onodera; Hideaki Komatsu; Toshio Nakatani

This paper describes the system overview of our Java Just-In-Time (JIT) compiler, which is the basis for the latest production version of the IBM Java JIT compiler supporting a diversity of processor architectures, including both 32-bit and 64-bit modes and CISC, RISC, and VLIW architectures. In particular, we focus on the design and evaluation of the cross-platform optimizations that are common across different architectures. We studied the effectiveness of each optimization by selectively disabling it in our JIT compiler on three different platforms: IA-32, IA-64, and PowerPC. Our detailed measurements allowed us to rank the optimizations in terms of the greatest performance improvements with the smallest compilation times. The identified set includes method inlining only for tiny methods, exception check elimination using forward dataflow analysis and partial redundancy elimination, scalar replacement for instance and class fields using dataflow analysis, optimizations for type inclusion checks, and the elimination of merge points in the control flow graphs. These optimizations achieve 90% of the peak performance for two industry-standard benchmark programs on these platforms with only 34% of the compilation time compared to using all of the optimizations.
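
To make "scalar replacement for instance and class fields" concrete, here is a hypothetical source-level analogue (the compiler performs this on its IR, and the class below is invented for this sketch): a field that is re-read on every loop iteration is replaced by a local variable that can live in a register, assuming dataflow analysis proves nothing else writes the field during the loop.

```java
// Hypothetical sketch of scalar replacement of an instance field.
class Accumulator {
    private int scale;   // instance field read on every iteration below

    Accumulator(int scale) { this.scale = scale; }

    // Before: each iteration conceptually re-loads this.scale from memory.
    int weightedSumBefore(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v * this.scale;
        }
        return sum;
    }

    // After: the field is promoted to a scalar local (one load, hoisted out of
    // the loop), valid only because the field cannot change during the loop.
    int weightedSumAfter(int[] values) {
        int scaleLocal = this.scale;
        int sum = 0;
        for (int v : values) {
            sum += v * scaleLocal;
        }
        return sum;
    }
}
```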


Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) | 2012

On the benefits and pitfalls of extending a statically typed language JIT compiler for dynamic scripting languages

José G. Castaños; David Joel Edelsohn; Kazuaki Ishizaki; Priya Nagpurkar; Toshio Nakatani; Takeshi Ogasawara; Peng Wu

Whenever the need to compile a new dynamically typed language arises, an appealing option is to repurpose an existing statically typed language Just-In-Time (JIT) compiler. Existing repurposed JIT (RJIT) compilers, however, have not yet delivered the hoped-for performance boosts. The performance of JVM languages, for instance, often lags behind standard interpreter implementations. Even more customized solutions that extend the internals of a JIT compiler for the target language compete poorly with those designed specifically for dynamically typed languages. Our own Fiorano JIT compiler is an example of this problem. As a state-of-the-art RJIT compiler for Python, the Fiorano JIT compiler outperforms two other RJIT compilers (Unladen Swallow and Jython), but still shows a noticeable performance gap compared to PyPy, today's best-performing Python JIT compiler. In this paper, we discuss techniques that have proved effective in the Fiorano JIT compiler as well as limitations of our current implementation. More importantly, this work offers the first in-depth look at the benefits and limitations of the RJIT approach. We believe the most common pitfall of existing RJIT compilers is not focusing sufficiently on specialization, an abundant optimization opportunity unique to dynamically typed languages. Unfortunately, the lack of specialization cannot be overcome by applying traditional optimizations.
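
Specialization, the opportunity the paper argues RJIT compilers under-exploit, can be sketched in Java. The value representation below (PyValue, PyInt) is invented for this illustration and is not the Fiorano JIT's actual design: a fully dynamic operation is replaced by a type-specialized fast path guarded by a cheap check, with a fallback to the generic path when the guard fails.

```java
// Hypothetical sketch of type specialization for a dynamic language's '+' operator.
abstract class PyValue {
    abstract PyValue add(PyValue other);    // generic, fully dynamic operation
}

final class PyInt extends PyValue {
    final long value;
    PyInt(long value) { this.value = value; }

    @Override
    PyValue add(PyValue other) {
        if (other instanceof PyInt) {
            return new PyInt(this.value + ((PyInt) other).value);
        }
        throw new UnsupportedOperationException("unsupported operand types for +");
    }
}

final class AddSite {
    // Generic path: every '+' goes through dynamic dispatch and boxed values.
    static PyValue addGeneric(PyValue a, PyValue b) {
        return a.add(b);
    }

    // Specialized path a JIT can emit after observing that both operands were
    // PyInt at this call site: a cheap guard, unboxed arithmetic, and a
    // fallback to the generic path when the speculation fails.
    static PyValue addSpecializedForInts(PyValue a, PyValue b) {
        if (a instanceof PyInt && b instanceof PyInt) {
            return new PyInt(((PyInt) a).value + ((PyInt) b).value);   // fast path
        }
        return addGeneric(a, b);                                       // fallback
    }
}
```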


International Conference on Parallel Architectures and Compilation Techniques | 2015

Compiling and Optimizing Java 8 Programs for GPU Execution

Kazuaki Ishizaki; Akihiro Hayashi; Gita Koblents; Vivek Sarkar

GPUs can enable significant performance improvements for certain classes of data parallel applications and are widely used in recent computer systems. However, GPU execution currently requires explicit low-level operations such as 1) managing memory allocations and transfers between the host system and the GPU, 2) writing GPU kernels in a low-level programming model such as CUDA or OpenCL, and 3) optimizing the kernels by utilizing appropriate memory types on the GPU. Because of this complexity, in many cases, only expert programmers can exploit the computational capabilities of GPUs through the CUDA/OpenCL languages. This is unfortunate since a large number of programmers use high-level languages, such as Java, due to their advantages of productivity, safety, and platform portability, but would still like to exploit the performance benefits of GPUs. Thus, one challenging problem is how to utilize GPUs while allowing programmers to continue to benefit from the productivity advantages of languages like Java. This paper presents a just-in-time (JIT) compiler that can generate and optimize GPU code from a pure Java program written using lambda expressions with the new parallel streams APIs in Java 8. These APIs allow Java programmers to express data parallelism at a higher level than threads and tasks. Our approach translates lambda expressions with parallel streams APIs in Java 8 into GPU code and automatically generates runtime calls that handle the low-level operations mentioned above. Additionally, our optimization techniques 1) allocate and align the starting address of the Java array body in the GPUs with the memory transaction boundary to increase memory bandwidth, 2) utilize read-only cache for array accesses to increase memory efficiency in GPUs, and 3) eliminate redundant data transfer between the host and the GPU. The compiler also performs loop versioning for eliminating redundant exception checks and for supporting virtual method invocations within GPU kernels. These features and optimizations are supported and automatically performed by a JIT compiler that is built on top of a production version of the IBM Java 8 runtime environment. Our experimental results on an NVIDIA Tesla GPU show significant performance improvements over sequential execution (127.9 × geometric mean) and parallel execution (3.3 × geometric mean) for eight Java 8 benchmark programs running on a 160-thread POWER8 machine. This paper also includes an in-depth analysis of GPU execution to show the impact of our optimization techniques by selectively disabling each optimization. Our experimental results show a geometric-mean speed-up of 1.15 × in the GPU kernel over state-of-the-art approaches. Overall, our JIT compiler can improve the performance of Java 8 programs by automatically leveraging the computational capability of GPUs.
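
A minimal example of the programming model this compiler targets: a data-parallel loop expressed with Java 8 lambda expressions and the parallel streams API. A loop of this shape (the SAXPY computation and names below are illustrative, not from the paper) is the kind of code the described JIT compiler can translate into a GPU kernel while generating the host-to-device transfers automatically.

```java
import java.util.stream.IntStream;

// Illustrative Java 8 parallel-stream loop of the shape targeted for GPU offload.
public class Saxpy {
    static void saxpy(float alpha, float[] x, float[] y, float[] out) {
        // Each index is an independent work item, expressed with a lambda and
        // the parallel streams API rather than explicit threads or CUDA/OpenCL.
        IntStream.range(0, x.length).parallel().forEach(i -> {
            out[i] = alpha * x[i] + y[i];
        });
    }

    public static void main(String[] args) {
        int n = 1 << 20;
        float[] x = new float[n], y = new float[n], out = new float[n];
        for (int i = 0; i < n; i++) {
            x[i] = i;
            y[i] = 2.0f * i;
        }
        saxpy(3.0f, x, y, out);
        System.out.println(out[42]);   // prints 210.0 (3*42 + 2*42)
    }
}
```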


International Symposium on Performance Analysis of Systems and Software | 2009

The data-centricity of Web 2.0 workloads and its impact on server performance

Moriyoshi Ohara; Priya Nagpurkar; Yohei Ueda; Kazuaki Ishizaki

Advances in network performance and browser technologies, coupled with the ubiquity of internet access and the proliferation of users, have led to the emergence of a new class of web applications, called Web 2.0. Web 2.0 technologies enable easy collaboration and sharing by allowing users to contribute, modify, and aggregate content using applications like wikis, blogs, social networking communities, and mashups. Web 2.0 applications also make heavy use of Ajax, which allows asynchronous communication between client and server, to provide a richer user experience. In this paper, we analyze the effect of these new features on the infrastructure that hosts these workloads. In particular, we focus on the data-centricity inherent in many Web 2.0 applications and study its impact on the persistence layer in an application server context. Our experimental results reveal some important performance characteristics; we show that frequent Ajax requests, and other requests arising from the participatory nature of Web 2.0, often retrieve and update persistent data. This can lead to frequent database accesses, lock contention, and reduced performance. We also show that problems in the persistence layer, arising from the data-intensive nature of Web 2.0 applications, can lead to poor scalability that can inhibit us from exploiting current and future multicore architectures.


International Conference on Virtual Execution Environments (VEE) | 2012

Adding dynamically-typed language support to a statically-typed language compiler: performance evaluation, analysis, and tradeoffs

Kazuaki Ishizaki; Takeshi Ogasawara; José G. Castaños; Priya Nagpurkar; David Joel Edelsohn; Toshio Nakatani

Applications written in dynamically typed scripting languages are increasingly popular for Web software development. Even on the server side, programmers are using dynamically typed scripting languages such as Ruby and Python to build complex applications quickly. As the number and complexity of dynamically typed scripting language applications grow, optimizing their performance is becoming important. Some of the best performing compilers and optimizers for dynamically typed scripting languages are developed entirely from scratch and target a specific language. This approach is not scalable, given the variety of dynamically typed scripting languages and the effort involved in developing and maintaining separate infrastructures for each. In this paper, we evaluate the feasibility of adapting and extending an existing production-quality, method-based Just-In-Time (JIT) compiler for a language with dynamic types. Our goal is to identify the challenges and shortcomings of the current infrastructure, and to propose and evaluate runtime techniques and optimizations that can be incorporated into a common optimization infrastructure for static and dynamic languages. We discuss three extensions to the compiler to support dynamically typed languages: (1) simplification of control flow graphs, (2) mapping of memory locations to stack-allocated variables, and (3) reduction of runtime overhead using language semantics. We also propose four new optimizations for Python as part of (2) and (3). These extensions are effective in reducing the compiler's working memory and improving runtime performance. We present a detailed performance evaluation of our approach for Python, finding an overall improvement of 1.69x on average (up to 2.74x) over our JIT compiler without any of the optimizations for dynamically typed languages and Python.
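
To illustrate extension (2), mapping of memory locations to stack-allocated variables, here is a hypothetical Java sketch (names and frame layout are invented for this example): local variables of a dynamic language, which an interpreter typically keeps in a heap-allocated frame, are promoted to ordinary stack-allocated locals once the compiler can prove the frame is not observed elsewhere.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of promoting interpreter frame slots to stack-allocated locals.
public class FrameMappingSketch {
    // Interpreter-style execution: locals live in a heap-allocated map, so every
    // read and write is a memory access with boxing.
    static Object runWithHeapFrame(int a, int b) {
        Map<String, Object> frame = new HashMap<>();
        frame.put("x", a);
        frame.put("y", b);
        frame.put("sum", (Integer) frame.get("x") + (Integer) frame.get("y"));
        return frame.get("sum");
    }

    // Compiled-style execution after the mapping: the same program with the frame
    // slots promoted to plain Java locals that can be kept in registers.
    static int runWithStackLocals(int a, int b) {
        int x = a;
        int y = b;
        return x + y;
    }

    public static void main(String[] args) {
        System.out.println(runWithHeapFrame(2, 3));    // 5
        System.out.println(runWithStackLocals(2, 3));  // 5
    }
}
```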


IBM Journal of Research and Development | 2004

Evolution of a Java just-in-time compiler for IA-32 platforms

Toshio Suganuma; Takeshi Ogasawara; Kiyokuni Kawachiya; Mikio Takeuchi; Kazuaki Ishizaki; Akira Koseki; Tatsushi Inagaki; Toshiaki Yasue; Motohiro Kawahito; Tamiya Onodera; Hideaki Komatsu; Toshio Nakatani

Java™ has gained widespread popularity in industry, and an efficient Java virtual machine (JVM™) and just-in-time (JIT) compiler are crucial in providing high performance for Java applications. This paper describes the design and implementation of our JIT compiler for IA-32 platforms, focusing on the recent advances achieved over the past several years. We first present the dynamic optimization framework, which focuses the expensive optimization efforts only on performance-critical methods, thus helping to manage the total compilation overhead. We then describe the platform-independent features, which include the conversion from the stack-semantic Java bytecode into our register-based intermediate representation (IR) and a variety of aggressive optimizations applied to the IR. We also present some techniques specific to IA-32 that are used to improve code quality, especially for the efficient use of the small number of registers on that platform. Experimental results using several industry-standard benchmark programs show that our approach offers high performance with low compilation overhead. Most of the techniques presented here are included in the IBM JIT compiler product, integrated into the IBM Development Kit for Microsoft Windows®, Java Technology Edition Version 1.4.0.
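
The conversion from stack-semantic Java bytecode to a register-based IR mentioned above can be shown on a one-line method. The bytecode listed in the comments is what javap reports for this method; the three-address form is a generic textbook rendering, not the compiler's actual IR.

```java
// Illustration of stack-semantic bytecode versus a register-based intermediate form.
class IrExample {
    static int madd(int a, int b, int c) {
        return a * b + c;
    }
    // Stack-semantic bytecode (javap-style) for madd:
    //   iload_0      // push a
    //   iload_1      // push b
    //   imul         // pop a, b; push a*b
    //   iload_2      // push c
    //   iadd         // pop a*b, c; push a*b + c
    //   ireturn
    //
    // Register-based, three-address form (virtual register names illustrative):
    //   r1 = mul r_a, r_b
    //   r2 = add r1, r_c
    //   ret r2
}
```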
