
Publications


Featured research published by Prasad A. Kulkarni.


Programming Language Design and Implementation | 2004

Fast searches for effective optimization phase sequences

Prasad A. Kulkarni; Stephen Hines; Jason D. Hiser; David B. Whalley; Jack W. Davidson; Douglas L. Jones

It has long been known that a fixed ordering of optimization phases will not produce the best code for every application. One approach for addressing this phase ordering problem is to use an evolutionary algorithm to search for a specific sequence of phases for each module or function. While such searches have been shown to produce more efficient code, the approach can be extremely slow because the application is compiled and executed to evaluate each sequence's effectiveness. Consequently, evolutionary or iterative compilation schemes have been promoted for compilation systems targeting embedded applications, where longer compilation times may be tolerated in the final stage of development. In this paper we describe two complementary general approaches for achieving faster searches for effective optimization sequences when using a genetic algorithm. The first approach reduces the search time by avoiding unnecessary executions of the application when possible. Results indicate search time reductions of 65% on average, often reducing searches from hours to minutes. The second approach modifies the search so fewer generations are required to achieve the same results. Measurements show that the average number of required generations decreased by 68%. These improvements have the potential for making evolutionary compilation a viable choice for tuning embedded applications.
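To make the first approach concrete, here is a minimal, self-contained sketch (not the paper's implementation): a genetic algorithm over phase sequences whose fitness function hashes the generated code and caches measurements, so identical code reached through different sequences is executed only once. The phase names and the compile/run stand-ins below are hypothetical toys.

```python
import hashlib
import random

# Hypothetical phase names; a real compiler exposes its own set.
PHASES = ["cse", "dead_code_elim", "loop_unroll", "register_alloc"]

def compile_with(sequence):
    """Toy stand-in for the compiler: returns 'generated code' as bytes.
    Collapsing back-to-back repeats mimics how different orderings can
    yield identical code."""
    canon = [p for i, p in enumerate(sequence) if i == 0 or p != sequence[i - 1]]
    return " ".join(canon).encode()

def run_benchmark(code):
    """Toy stand-in for executing the compiled application; returns a
    deterministic pseudo cycle count."""
    return int.from_bytes(hashlib.sha256(code).digest()[:4], "big")

def fitness(sequence, seen):
    """The execution-avoidance idea: hash the generated code and skip the
    expensive execution whenever identical code was measured before."""
    code = compile_with(sequence)
    key = hashlib.sha256(code).hexdigest()
    if key not in seen:
        seen[key] = run_benchmark(code)
    return seen[key]

def genetic_search(pop_size=16, seq_len=6, generations=30, seed=0):
    rng = random.Random(seed)
    seen = {}                                # code hash -> measured fitness
    population = [[rng.choice(PHASES) for _ in range(seq_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, seen))
        survivors = population[: pop_size // 2]          # keep the best half
        while len(survivors) < pop_size:
            a, b = rng.sample(survivors[: pop_size // 2], 2)
            cut = rng.randrange(1, seq_len)
            child = a[:cut] + b[cut:]                    # one-point crossover
            if rng.random() < 0.1:                       # occasional mutation
                child[rng.randrange(seq_len)] = rng.choice(PHASES)
            survivors.append(child)
        population = survivors
    best = min(population, key=lambda s: fitness(s, seen))
    return best, len(seen)

best, executed = genetic_search()
print("best sequence:", best, "| distinct code versions executed:", executed)
```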


Languages, Compilers, and Tools for Embedded Systems | 2003

Finding effective optimization phase sequences

Prasad A. Kulkarni; Wankang Zhao; Hwashin Moon; Kyunghwan Cho; David B. Whalley; Jack W. Davidson; Mark W. Bailey; Yunheung Paek; Kyle A. Gallivan

It has long been known that a single ordering of optimization phases will not produce the best code for every application. This phase ordering problem can be more severe when generating code for embedded systems due to the need to meet conflicting constraints on time, code size, and power consumption. Given that many embedded application developers are willing to spend time tuning an application, we believe a viable approach is to allow the developer to steer the process of optimizing a function. In this paper, we describe support in VISTA, an interactive compilation system, for finding effective sequences of optimization phases. VISTA provides the user with dynamic and static performance information that can be used during an interactive compilation session to gauge the progress of improving the code. In addition, VISTA provides support for automatically using performance information to select the best optimization sequence among several attempted. One such feature is the use of a genetic algorithm to search for the most efficient sequence based on specified fitness criteria. We have included a number of experimental results that evaluate the effectiveness of using a genetic algorithm in VISTA to find effective optimization phase sequences.
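As a rough illustration of the kind of fitness criteria a VISTA user might specify, the sketch below combines dynamic performance and code size with a user-chosen weight. The exact form VISTA uses is not given in the abstract, so treat the formula as an assumption.

```python
def fitness(cycles, code_size, base_cycles, base_size, speed_weight=0.5):
    """Hypothetical fitness criterion of the kind a VISTA user might
    specify: a weighted combination of dynamic performance and code size,
    each normalized to an unoptimized baseline. Lower is better."""
    return (speed_weight * (cycles / base_cycles)
            + (1.0 - speed_weight) * (code_size / base_size))

# Example: weigh speed and size equally, as an embedded developer might.
print(fitness(cycles=8_000, code_size=900, base_cycles=10_000, base_size=1_000))
# -> 0.85: 15% better than baseline under this criterion
```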


ACM Transactions on Architecture and Code Optimization | 2009

Practical exhaustive optimization phase order exploration and evaluation

Prasad A. Kulkarni; David B. Whalley; Gary S. Tyson; Jack W. Davidson

Choosing the most appropriate optimization phase ordering has been a long-standing problem in compiler optimizations. Exhaustive evaluation of all possible orderings of optimization phases for each function is generally dismissed as infeasible for production-quality compilers targeting accepted benchmarks. In this article, we show that it is possible to exhaustively evaluate the optimization phase order space in a reasonable amount of time for most of the functions in our benchmark suite. To achieve this goal, we used various techniques to significantly prune the optimization phase order search space, so that it can be inexpensively enumerated in most cases, and to reduce the number of program simulations required to evaluate program performance for each distinct phase ordering. The techniques described are applicable to other compilers in which it is desirable to find the best phase ordering for most functions in a reasonable amount of time. We also describe some interesting properties of the optimization phase order space, which will prove useful for further studies of related problems in compilers.
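The central pruning idea can be sketched compactly: enumerate sequences breadth-first, but merge any sequence that reproduces an already-seen function instance, so the search explores a DAG of distinct instances rather than an exponential tree. The toy phases and IR below are illustrative stand-ins, not real compiler transformations.

```python
from collections import deque

# Toy "phases": pure functions on a hashable toy IR (a tuple of strings).
def cse(code):    return tuple(sorted(set(code)))
def dce(code):    return tuple(i for i in code if not i.startswith("dead"))
def unroll(code): return code + tuple(i for i in code if i.startswith("loop"))[:1]

PHASES = {"cse": cse, "dead_code_elim": dce, "loop_unroll": unroll}

def enumerate_space(initial_code, max_len):
    """Breadth-first enumeration of the phase order space, with the key
    pruning step from the article: whenever a sequence produces a function
    instance that was already seen, its whole subtree merges with the
    earlier one, collapsing an exponential tree into a small DAG of
    distinct instances. Dormant phases (no change) are skipped outright."""
    seen = {initial_code: []}        # distinct instance -> one producing sequence
    frontier = deque([initial_code])
    depth = 0
    while frontier and depth < max_len:
        depth += 1
        for _ in range(len(frontier)):
            code = frontier.popleft()
            for name, phase in PHASES.items():
                nxt = phase(code)
                if nxt == code or nxt in seen:   # dormant or already explored
                    continue
                seen[nxt] = seen[code] + [name]
                frontier.append(nxt)
    return seen

start = ("loop body", "dead store", "x = a + b", "x = a + b")
print(len(enumerate_space(start, max_len=6)), "distinct function instances")
```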


Symposium on Code Generation and Optimization | 2006

Exhaustive Optimization Phase Order Space Exploration

Prasad A. Kulkarni; David B. Whalley; Gary S. Tyson; Jack W. Davidson

The phase-ordering problem is a long-standing issue for compiler writers. Most optimizing compilers have numerous code-improving phases, many of which can be applied in any order. These phases interact by enabling or disabling opportunities for other optimization phases to be applied. As a result, varying the order of applying optimization phases to a program can produce different code, with potentially significant performance variation amongst them. Complicating this problem further is the fact that there is no universal optimization phase order that will produce the best code, since the best phase order depends on the function being compiled, the compiler, and the target architecture characteristics. Moreover, finding the optimal optimization sequence for even a single function is hard, as the space of attempted optimization phase sequences is huge and the interactions between different optimizations are poorly understood. Most previous studies performed to search for the most effective optimization phase sequence assume the optimization phase order search space to be extremely large, and hence consider exhaustive exploration of this space infeasible. In this paper we show that even though the attempted search space is extremely large, with careful and aggressive pruning it is possible to limit the actual search space with no loss of information so that it can be completely evaluated in a matter of minutes or a few hours for most functions. We were able to exhaustively enumerate all the possible function instances that can be produced by different phase orderings performed by our compiler for more than 98% of the functions in our benchmark suite. In this paper we describe the algorithm we used to make exhaustive search of the optimization phase order space possible. We then analyze this space to automatically calculate relationships between different phases. Finally, we show that the results of this analysis can be used to reduce the compilation time for a conventional batch compiler.
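One way to picture the phase-relationship analysis mentioned above is as frequency counting over records from the exhaustive enumeration: how often is a phase active when attempted right after another phase? A batch compiler could skip attempts with very low probabilities. The record format here is an assumption made for illustration.

```python
from collections import defaultdict

def phase_relationships(records):
    """Estimate, from (previous_phase, phase, was_active) records logged
    during an exhaustive enumeration, how often each phase is active when
    attempted immediately after another phase. A batch compiler can use
    very low probabilities to skip attempts that are unlikely to change
    the code."""
    counts = defaultdict(lambda: [0, 0])      # (prev, phase) -> [active, attempts]
    for prev, phase, active in records:
        counts[(prev, phase)][0] += int(active)
        counts[(prev, phase)][1] += 1
    return {pair: act / att for pair, (act, att) in counts.items()}

# Hypothetical usage with a few logged records:
log = [("cse", "dead_code_elim", True),
       ("cse", "dead_code_elim", True),
       ("cse", "loop_unroll", False)]
print(phase_relationships(log))
# {('cse', 'dead_code_elim'): 1.0, ('cse', 'loop_unroll'): 0.0}
```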


Symposium on Code Generation and Optimization | 2007

Evaluating Heuristic Optimization Phase Order Search Algorithms

Prasad A. Kulkarni; David B. Whalley; Gary S. Tyson; Jack W. Davidson

Program-specific or function-specific optimization phase sequences are universally accepted to achieve better overall performance than any fixed optimization phase ordering. A number of heuristic phase order space search algorithms have been devised to find customized phase orderings achieving high performance for each function. However, to make this approach of iterative compilation more widely accepted and deployed in mainstream compilers, it is essential to modify existing algorithms, or develop new ones, that find near-optimal solutions quickly. As a step in this direction, in this paper we attempt to identify and understand the important properties of some commonly employed heuristic search methods by using information collected during an exhaustive exploration of the phase order search space. We compare the performance obtained by each algorithm with all others, as well as with the optimal phase ordering performance. Finally, we show how we can use the features of the phase order space to improve existing algorithms as well as devise new and better-performing search algorithms.
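For context, here is a sketch of one commonly employed heuristic of the kind such studies evaluate: hill climbing over phase sequences, where each step moves to the best single-phase change. The evaluation function stands in for compile-and-measure; this is illustrative, not the paper's code.

```python
import random

def hill_climb(eval_seq, phases, seq_len=8, seed=0):
    """Hill climbing over phase sequences. eval_seq stands in for
    compile-and-measure (lower is better); phases is the set of available
    optimization phases."""
    rng = random.Random(seed)
    current = [rng.choice(phases) for _ in range(seq_len)]
    best = eval_seq(current)
    while True:
        improved = False
        # Neighbors: every sequence differing from the current one in
        # exactly one position.
        for i in range(seq_len):
            for p in phases:
                if p == current[i]:
                    continue
                candidate = current[:i] + [p] + current[i + 1:]
                score = eval_seq(candidate)
                if score < best:
                    current, best, improved = candidate, score, True
        if not improved:             # local minimum: no neighbor is better
            return current, best

# Toy usage: reward sequences whose first half uses many distinct phases.
phases = ["cse", "dead_code_elim", "loop_unroll", "register_alloc"]
print(hill_climb(lambda s: -len(set(s[:4])), phases))
```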


Real-Time Technology and Applications Symposium | 2004

Tuning the WCET of embedded applications

Wankang Zhao; Prasad A. Kulkarni; David B. Whalley; Christopher A. Healy; Frank Mueller; Gang-Ryung Uh

It is advantageous not only to calculate the worst-case execution time (WCET) of an application, but also to perform transformations to reduce the WCET, since an application with a lower WCET is less likely to violate its timing constraints. In this paper we describe an environment consisting of an interactive compilation system and a timing analyzer, where a user can interactively tune the WCET of an application. After each optimization phase is applied, the timing analyzer is automatically invoked to calculate the WCET of the function being tuned. Thus, a user can easily gauge the progress of reducing the WCET. In addition, the user can apply a genetic algorithm to search for an effective optimization sequence that best reduces the WCET. Using the genetic algorithm, we show that the WCET for a number of applications can be reduced by 7% on average as compared to the default batch optimization sequence.
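The key difference from the execution-driven searches above is the fitness function: the score comes from the static timing analyzer rather than from running the program. A minimal sketch, assuming hypothetical compiler and analyzer interfaces:

```python
def wcet_fitness(sequence, compile_with, timing_analyzer):
    """Hypothetical fitness function for the WCET-tuning search: rather
    than executing the program, hand the generated code to the static
    timing analyzer and minimize its WCET bound. compile_with and
    timing_analyzer stand in for the compiler and analyzer interfaces."""
    return timing_analyzer(compile_with(sequence))   # lower bound is better
```

Because the score comes from static analysis rather than execution, each candidate sequence can be evaluated without running the application.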


ACM Transactions on Architecture and Code Optimization | 2005

Fast and efficient searches for effective optimization-phase sequences

Prasad A. Kulkarni; Stephen Hines; David B. Whalley; Jason D. Hiser; Jack W. Davidson; Douglas L. Jones

It has long been known that a fixed ordering of optimization phases will not produce the best code for every application. One approach for addressing this phase-ordering problem is to use an evolutionary algorithm to search for a specific sequence of phases for each module or function. While such searches have been shown to produce more efficient code, the approach can be extremely slow because the application is compiled and possibly executed to evaluate each sequence's effectiveness. Consequently, evolutionary or iterative compilation schemes have been promoted for compilation systems targeting embedded applications, where meeting strict constraints on execution time, code size, and power consumption is paramount and longer compilation times may be tolerated in the final stage of development, when an application is compiled one last time and embedded in a product. Unfortunately, even for small embedded applications, the search process can take many hours or even days, making the approach less attractive to developers. In this paper, we describe two complementary general approaches for achieving faster searches for effective optimization sequences when using a genetic algorithm. The first approach reduces the search time by avoiding unnecessary executions of the application when possible. Results indicate search time reductions of 62%, on average, often reducing searches from hours to minutes. The second approach modifies the search so fewer generations are required to achieve the same results. Measurements show this approach decreases the average number of required generations by 59%. These improvements have the potential for making evolutionary compilation a viable choice for tuning embedded applications.
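Beyond hashing identical code (sketched earlier), another execution-avoidance idea in this line of work is dormant-phase detection: a phase that changes nothing cannot influence later phases, so an attempted sequence reduces to its active subsequence, and all sequences that reduce to the same active subsequence can share a single measurement. A loose sketch, with apply_phase and compile_and_run as hypothetical interfaces:

```python
def active_subsequence(code, sequence, apply_phase):
    """Reduce an attempted phase sequence to the phases that actually
    changed the code. apply_phase(code, phase) -> transformed code is a
    hypothetical compiler hook."""
    active = []
    for phase in sequence:
        nxt = apply_phase(code, phase)
        if nxt != code:              # the phase was active
            active.append(phase)
            code = nxt
    return tuple(active)

measurements = {}                    # active subsequence -> fitness

def cached_fitness(code0, seq, apply_phase, compile_and_run):
    key = active_subsequence(code0, seq, apply_phase)
    if key not in measurements:      # compile+execute once per active sequence
        measurements[key] = compile_and_run(key)
    return measurements[key]
```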


Virtual Execution Environments | 2007

Dynamic compilation: the benefits of early investing

Prasad A. Kulkarni; Matthew Arnold; Michael Hind

Dynamic compilation is typically performed in a separate thread, asynchronously with the remaining application threads. This compilation thread is often scheduled for execution in a simple round-robin fashion either by the operating system or by the virtual machine itself. Despite the popularity of this approach in production virtual machines, it has a number of shortcomings that can lead to suboptimal performance. This paper explores a number of issues surrounding asynchronous dynamic compilation in a virtual machine. We begin by describing the shortcomings of current approaches and demonstrate their potential to perform poorly under certain conditions. We describe the importance of enforcing a minimum level of utilization for the compilation thread, and evaluate the performance implications of varying the utilization that is enforced. We observed surprisingly large speedups by increasing the priority of the compilation thread, averaging 18.2% improvement over a large benchmark suite. Finally, we discuss options for implementing these techniques in a VM and address relevant issues when moving from a single-processor to a multiprocessor machine.
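The utilization idea can be pictured with a background worker that guarantees itself a slice of every scheduling window. This is a portable approximation written for illustration; a real VM would use thread priorities or scheduler hooks, and all names here are hypothetical.

```python
import queue
import threading
import time

compile_queue = queue.Queue()       # methods awaiting optimizing compilation

def optimize(method):
    time.sleep(0.001)               # stand-in for the JIT's optimizing compile

def compiler_thread(min_utilization=0.2, window=0.05):
    """Background compilation worker enforcing a minimum utilization for
    the compilation thread instead of leaving scheduling entirely to the
    OS: in every scheduling window the thread works for at least
    min_utilization of the window, then yields the rest."""
    while True:
        start = time.monotonic()
        budget = window * min_utilization
        while time.monotonic() - start < budget:
            try:
                optimize(compile_queue.get_nowait())
            except queue.Empty:
                break
        # Yield whatever remains of the window to the application threads.
        time.sleep(max(0.0, window - (time.monotonic() - start)))

threading.Thread(target=compiler_thread, daemon=True).start()
```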


Virtual Execution Environments | 2010

Novel online profiling for virtual machines

Manjiri A. Namjoshi; Prasad A. Kulkarni

Application profiling is a popular technique to improve program performance based on its behavior. Offline profiling, although beneficial for several applications, fails in cases where prior program runs may not be feasible, or if changes in input cause the profile to not match the behavior of the actual program run. Managed languages, like Java and C#, provide a unique opportunity to overcome the drawbacks of offline profiling by generating the profile information online, during the current program run. Indeed, online profiling is extensively used in current VMs, especially during selective compilation to improve program startup performance, as well as during other feedback-directed optimizations. In this paper we illustrate the drawbacks of the current reactive mechanism of online profiling during selective compilation. Current VM profiling mechanisms are slow, delaying the associated transformations, and they estimate future behavior from the program's immediate past, leading to potential misspeculation that limits the benefits of compilation. We show that these drawbacks produce an average performance loss of over 14.5% on our set of benchmark programs, compared to an ideal offline approach that accurately compiles the hot methods early. We then propose and evaluate the potential of a novel strategy to achieve similar performance benefits with an online profiling approach. Our new online profiling strategy uses early determination of loop iteration bounds to predict future method hotness. We explore and present promising results on the potential, feasibility, and other issues involved in the successful implementation of this approach.
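A minimal sketch of the proposed predictive trigger, under stated assumptions: all names and the threshold are hypothetical, and determining the loop bound itself is the hard part the paper investigates.

```python
import queue

HOT_THRESHOLD = 10_000           # hypothetical hotness threshold (iterations)
compile_queue = queue.Queue()    # methods awaiting optimizing compilation

def on_loop_bound_determined(method, trip_count):
    """When a loop's iteration bound is determined early in the run (say,
    the bound is a constant or a value computed before the loop), future
    hotness can be predicted up front and the enclosing method queued for
    compilation before the loop runs hot, instead of reacting after an
    invocation counter crosses a threshold."""
    if trip_count >= HOT_THRESHOLD:
        compile_queue.put(method)    # compile early; the loop has not run yet
```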


Languages, Compilers, and Tools for Embedded Systems | 2006

In search of near-optimal optimization phase orderings

Prasad A. Kulkarni; David B. Whalley; Gary S. Tyson; Jack W. Davidson

Phase ordering is a long-standing challenge for traditional optimizing compilers. Varying the order of applying optimization phases to a program can produce different code, with potentially significant performance variation amongst them. A key insight to addressing the phase ordering problem is that many different optimization sequences produce the same code. In an earlier study, we used this observation to restate the phase ordering problem to concentrate on finding all distinct function instances that can be produced due to different phase orderings, instead of attempting to generate code for all possible optimization sequences. Using a novel search algorithm we were able to show that it is possible to exhaustively enumerate the set of all possible function instances that can be produced by different phase orderings in our compiler for most of the functions in our benchmark suite [1]. Finding the optimal function instance within this set for almost any dynamic measure of performance still appears impractical, since that would involve execution/simulation of all generated function instances. To find the dynamically optimal function instance we exploit the observation that the enumeration space for a function typically contains a very small number of distinct control flow paths. We simulate only one function instance from each group of function instances having identical control flow, and use that information to estimate the dynamic performance of the remaining functions in that group. We further show that the estimated dynamic frequency counts obtained by using our method correlate extremely well with simulated processor cycle counts. Thus, by using our measure of dynamic frequencies to identify a small number of the best performing function instances, we can often find the optimal phase ordering for a function within a reasonable amount of time. Finally, we perform a case study to evaluate how adept our genetic algorithm is at finding optimal phase orderings within our compiler, and demonstrate how the algorithm can be improved.
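The estimation step can be sketched as follows: instances sharing a control-flow graph also share basic-block execution frequencies, so only one representative per CFG is simulated and every other member is scored from those frequencies. The instance format and the simulator interface below are assumptions made for illustration.

```python
from collections import defaultdict

def estimate_dynamic_counts(instances, simulate_block_freqs):
    """Score enumerated function instances while simulating only one
    representative per control-flow graph. Each instance is assumed to be
    {'cfg': <hashable CFG signature>, 'blocks': {block_id: instruction
    count}}, and simulate_block_freqs is a hypothetical simulator
    returning {block_id: execution frequency}."""
    by_cfg = defaultdict(list)
    for idx, inst in enumerate(instances):
        by_cfg[inst["cfg"]].append(idx)
    estimates = [0] * len(instances)
    for idxs in by_cfg.values():
        freqs = simulate_block_freqs(instances[idxs[0]])   # one run per CFG
        for i in idxs:
            estimates[i] = sum(freqs[b] * n
                               for b, n in instances[i]["blocks"].items())
    return estimates
```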

Collaboration


Dive into Prasad A. Kulkarni's collaborations.

Top Co-Authors

Stephen Hines

Florida State University

Gary S. Tyson

Florida State University

Wankang Zhao

Florida State University
