Vijay Sundaresan | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Vijay Sundaresan is active.

Explore More

Publication

Featured researches published by Vijay Sundaresan.

symposium on code generation and optimization | 2006

Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler

Vijay Sundaresan; Daryl James Maier; Pramod Ramarao; Mark G. Stoodley

In this paper, we describe the techniques that have been implemented in the IBM TestaRossa (TR) just-in-time (JIT) compiler to safely perform aggressive code patching and collect accurate profiles in the context of a Java application employing multiple threads and dynamic class loading and unloading. Previous work in these areas either did not account for the synchronization cost of safety or dynamic class loading/unloading effects in a heavily multithreaded program or did not consider how different patching techniques may be required for different platforms where instruction cache coherence guarantees vary. We evaluate the space and time overhead to make our profiling framework correct, showing that privatizing the profiling variables to achieve correctness impacts execution time only minimally but it can grow the stack frames for profiled methods by less than 15% on average for the SPECjvm98 and SPECjbb2000 benchmarks. Since methods are profiled for only a brief time and the stack frames themselves are not large, we do not consider this growth to be prohibitive. The techniques reported in this paper are implemented in the 1.5.0 release of the IBM Developer Kit for Java targeting 12 different processor-operating system platforms.

symposium on code generation and optimization | 2005

Automatically Reducing Repetitive Synchronization with a Just-in-Time Compiler for Java

Mark G. Stoodley; Vijay Sundaresan

We describe an automatic technique to remove repetitive synchronization in Java/spl trade/ programs by removing selected MONITORENTER/EXIT operations. Once these operations are removed, parts of a method that were not originally locked become protected by a lock. If it is unsafe to synchronize the code between the original locked regions, however, the code is not transformed. Scalability is also protected by not allowing a lock to be held for a significantly longer time than it would be held in the original program code. Our base algorithm improved the throughput of the industry-standard SPECjbb2000 benchmark by 2% to 5% for three different platforms. We also describe an extension to our algorithm to better handle virtual calls, which are prevalent in Java code, and this extension provides up to a further 5% improvement. Our computationally efficient algorithm was implemented and evaluated in a production Just-In-Time (JIT) compiler.

symposium on code generation and optimization | 2008

Removing redundancy via exception check motion

Vijay Sundaresan; Mark G. Stoodley; Pramod Ramarao

Partial redundancy elimination aims to reduce the number of times an expression is computed more than once. The traditional Lazy Code Motion (LCM) algorithm formulated by Knoop, Ruthing and Steffen, through its reliance on unordered bit vectors, is severely limited in its ability to remove redundancy when precise exception semantics are required because bit vectors cannot express the order of exception checks. This paper describes our new PRE algorithm Exception Check Motion that uses the LCM algorithm to treat and optimize exception checks in a similar way to any other expression. Unlike earlier techniques that can remove only the compare instruction of a partially redundant exception check, our solution can eliminate both the compare and trap instructions without any run time code patching or expensive recovery operations. Since it is the trap instructions that restrict subsequent code motions, our technique gives downstream optimizations more flexibility to improve the performance of the resulting code once the partially redundant checks are eliminated. Our analysis has been implemented in the IBM® Testarossa (TR) just-in-time (JIT) compiler in the IBM Developer Kit for Java Release 5.0 as part of the J9 Virtual Machine. We measure performance improvements up to 7.6% and averaging 2.5% across 22 SPEC and DaCapo benchmarks on 4-way IBM pSeries (PowerPC) hardware.

symposium on code generation and optimization | 2013

Experiences in designing a robust and scalable interpreter profiling framework

Ian Gartley; Marius Pirvu; Vijay Sundaresan; Nikola Grcevski

Profile directed feedback (PDF) is a well known technique used to drive many compiler optimizations like basic block ordering and guarded devirtualization. These optimizations are particularly crucial in order to achieve good throughput performance in JEE applications that have a large code footprint. To effectively apply optimizations that rely on profiling information, a just-in-time (JIT) compiler must have access to profiling information that is accurate. One common source of profiling information in a Java virtual machine (JVM) is the interpreter. Typically methods are interpreted as a program ramps up, during which profiling information can be collected. However, obtaining useful and accurate information for large enterprise-class applications can be a challenge because of the memory and performance overhead associated with collecting and processing the large volumes of profiling data that is generated. This paper describes the challenges in maintaining the balance between throughput performance and profiling overhead in a production JIT compiler that is used by the IBM JDK. The scope of the performance overhead in terms of throughput, memory footprint and startup speed for large JEE class applications is introduced and various engineering solutions that were tried are detailed and compared in terms of experimental results. We found that the throughput improvement due to interpreter profiling (IP) can be as high as 58%, whereas the overhead measured in terms of application startup time could cost up to 57%. Our solutions to reducing profiling overhead managed to reduce the startup cost to only a few percent while maintaining the full throughput benefit. By discussing these approaches, this paper offers a balanced and practical overview on how to make PDF work well for enterprise-class applications in a production JIT compiler.

VM'04 Proceedings of the 3rd conference on Virtual Machine Research And Technology Symposium - Volume 3 | 2004