Dries Buytaert
Ghent University
Publication
Featured research published by Dries Buytaert.
conference on object oriented programming systems languages and applications | 2007
Andy Georges; Dries Buytaert; Lieven Eeckhout
Java performance is far from being trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, non-determinism at run-time causes the execution time of a Java program to differ from run to run. There are a number of sources of non-determinism such as Just-In-Time (JIT) compilation and optimization in the virtual machine (VM) driven by timer-based method sampling, thread scheduling, garbage collection, and various system effects. There exists a wide variety of Java performance evaluation methodologies used by researchers and benchmarkers. These methodologies differ from each other in a number of ways. Some report average performance over a number of runs of the same experiment; others report the best or second best performance observed; yet others report the worst. Some iterate the benchmark multiple times within a single VM invocation; others consider multiple VM invocations and iterate a single benchmark execution; yet others consider multiple VM invocations and iterate the benchmark multiple times. This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.
conference on object-oriented programming systems, languages, and applications | 2008
Andy Georges; Lieven Eeckhout; Dries Buytaert
A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, non-determinism due to timer-based sampling for JIT optimization, thread scheduling, and various system effects further complicates the Java performance benchmarking process. Replay compilation is a recently introduced Java performance analysis methodology that aims at controlling non-determinism to improve experimental repeatability. The key idea of replay compilation is to control the compilation load during experimentation by inducing a pre-recorded compilation plan at replay time. Replay compilation also enables teasing apart performance effects of the application versus the virtual machine. This paper argues that, in contrast to current practice, which uses a single compilation plan at replay time, multiple compilation plans add statistical rigor to the replay compilation methodology. By doing so, replay compilation better accounts for the variability observed in compilation load across compilation plans. In addition, we propose matched-pair comparison for statistical data analysis. Matched-pair comparison considers the performance measurements per compilation plan before and after an innovation of interest as a pair, which enables limiting the number of compilation plans needed for accurate performance analysis compared to statistical analysis assuming unpaired measurements.
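As an illustration of the matched-pair idea described in this abstract, the following Python sketch pairs one "before" and one "after" measurement per compilation plan and summarizes the per-plan differences. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `matched_pair_summary` and the sample timings are hypothetical, and the resulting t statistic would still have to be compared against a Student t distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def matched_pair_summary(before, after):
    """Summarize paired measurements (one pair per compilation plan).

    Returns (mean difference, standard error of the mean difference, t statistic).
    """
    if len(before) != len(after):
        raise ValueError("each compilation plan needs a before and an after measurement")
    if len(before) < 2:
        raise ValueError("need at least two pairs")
    # Work on per-plan differences so that variation between plans cancels out.
    diffs = [b - a for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0:
        raise ValueError("all differences are identical; t statistic is undefined")
    return mean_diff, std_err, mean_diff / std_err


# Hypothetical execution times in seconds, one entry per compilation plan.
before = [10.0, 12.0, 11.0, 13.0]
after = [9.0, 11.0, 10.5, 12.5]
mean_diff, std_err, t_stat = matched_pair_summary(before, after)
print(f"mean difference = {mean_diff:.3f} s, t = {t_stat:.3f}")
```

Because the analysis uses differences within each pair, plan-to-plan variation does not inflate the standard error, which is the intuition behind needing fewer compilation plans than an unpaired analysis.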
conference on object oriented programming systems languages and applications | 2007
Dries Buytaert; Andy Georges; Michael Hind; Matthew Arnold; Lieven Eeckhout; Koen De Bosschere
All high-performance production JVMs employ an adaptive strategy for program execution. Methods are first executed unoptimized and then an online profiling mechanism is used to find a subset of methods that should be optimized during the same execution. This paper empirically evaluates the design space of several profilers for initiating dynamic compilation and shows that existing online profiling schemes suffer from several limitations. They provide an insufficient number of samples, are untimely, and have limited accuracy at determining the frequently executed methods. We describe and comprehensively evaluate HPM-sampling, a simple but effective profiling scheme for finding optimization candidates using hardware performance monitors (HPMs) that addresses the aforementioned limitations. We show that HPM-sampling is more accurate; has low overhead; and improves performance by 5.7% on average and up to 18.3% when compared to the default system in Jikes RVM, without changing the compiler.
conference on object-oriented programming systems, languages, and applications | 2006
Jonas Maebe; Dries Buytaert; Lieven Eeckhout; Koen De Bosschere
Understanding the behavior of applications running on high-level language virtual machines, as is the case in Java, is non-trivial because of the tight entanglement at the lowest execution level between the application and the virtual machine. This paper proposes Javana, a system for building Java program analysis tools. Javana provides an easy-to-use instrumentation infrastructure that allows for building customized profiling tools very quickly. Javana runs a dynamic binary instrumentation tool underneath the virtual machine. The virtual machine communicates with the instrumentation layer through an event handling mechanism for building a vertical map that links low-level native instruction pointers and memory addresses to high-level language concepts such as objects, methods, threads, lines of code, etc. The dynamic binary instrumentation tool then intercepts all memory accesses and instructions executed and provides the Javana end user with high-level language information for all memory accesses and natively executed instructions. We demonstrate the power of Javana through a number of applications: memory address tracing, vertical cache simulation and object lifetime computation. For each of these applications, the instrumentation specification requires only a small number of lines of code. Developing similarly powerful profiling tools within a virtual machine (as done in current practice) is both time-consuming and error-prone; in addition, the accuracy of the obtained profiling results might be questionable as we show in this paper.
field-programmable logic and applications | 2005
Philippe Faes; Mark Christiaens; Dries Buytaert; D. Stroobandt
During codesign of a system, one still runs into the impedance mismatch between the software and hardware worlds. This paper identifies the different levels of abstraction of hardware and software as a major culprit of this mismatch. For example, when programming in high-level object-oriented languages like Java, one has objects, methods, and memory management at one's disposal. These facilitate development, but they have to be largely abandoned when moving the same functionality into hardware. As a solution, this paper presents a virtual machine, based on the Jikes Research Virtual Machine, that is able to bridge the gap by providing the same capabilities to hardware components as to software components. This seamless integration is achieved by introducing an architecture and protocol that allow reconfigurable hardware and software to communicate with each other in a transparent manner, i.e., no component of the design needs to be aware whether other components are implemented in hardware or in software. Further, in this paper we present a novel technique that allows reconfigurable hardware to manage dynamically allocated memory. This is achieved by allowing the hardware to hold references to objects and by modifying the garbage collector of the virtual machine to be aware of these references in hardware. We present benchmark results that show, for four different, well-known garbage collectors and for a wide range of applications, that a hardware-aware garbage collector results in a marginal overhead and is therefore a worthwhile addition to the developer's toolbox.
high performance embedded architectures and compilers | 2005
Dries Buytaert; Kris Venstermans; Lieven Eeckhout; Koen De Bosschere
This paper shows that Appel-style garbage collectors often make suboptimal decisions both in terms of when and how to collect. We argue that garbage collection should be done when the amount of live bytes is low (in order to minimize the collection cost) and when the amount of dead objects is high (in order to maximize the available heap size after collection). In addition, we observe that Appel-style collectors sometimes trigger a nursery collection in cases where a full-heap collection would have been better. Based on these observations, we propose garbage collection hints (GCH), a profile-directed method for guiding garbage collection. Offline profiling is used to identify favorable collection points in the program code. At those favorable collection points, the VM dynamically chooses between nursery and full-heap collections based on an analytical garbage collector cost-benefit model. By doing so, GCH guides the collector in terms of when and how to collect. Experimental results using the SPECjvm98 benchmarks and two generational garbage collectors show that substantial reductions can be obtained in garbage collection time (up to 30X) and that the overall execution time can be reduced by more than 10%.
high performance embedded architectures and compilers | 2007
Dries Buytaert; Kris Venstermans; Lieven Eeckhout; Koen De Bosschere
This paper shows that Appel-style garbage collectors often make suboptimal decisions both in terms of when and how to collect. We argue that garbage collection should be done when the amount of live bytes is low (in order to minimize the collection cost) and when the amount of dead objects is high (in order to maximize the available heap size after collection). In addition, we observe that Appel-style collectors sometimes trigger a nursery collection in cases where a full-heap collection would have been better. Based on these observations, we propose garbage collection hints (GCH), a profile-directed method for guiding garbage collection. Off-line profiling is used to identify favorable collection points in the program code. At those favorable collection points, the garbage collector dynamically chooses between nursery and full-heap collections based on an analytical garbage collector cost-benefit model. By doing so, GCH guides the collector in terms of when and how to collect. Experimental results using the SPECjvm98 benchmarks and two generational garbage collectors show that substantial reductions can be obtained in garbage collection time (up to 29X) and that the overall execution time can be reduced by more than 10%. In addition, we show that GCH reduces the maximum pause times and outperforms user-inserted forced garbage collections.
conference on object-oriented programming systems, languages, and applications | 2006
Dries Buytaert; Jonas Maebe; Lieven Eeckhout; Koen De Bosschere
Javana is a tool for creating customized Java program analysis tools. It comes with an easy-to-use instrumentation framework that enables programmers to develop profiling tools that crosscut the Java application, the Java Virtual Machine (JVM) and the native execution layers. The goal of this poster is to demonstrate the power of Javana, using object lifetime computation as an example. Object lifetime has proven to be useful for analyzing and optimizing the behavior of Java applications. Computing object lifetime is conceptually simple; in practice, however, it is often challenging. The JVM needs to be adjusted in numerous ways in order to track all possible accesses to all objects, including accesses that occur through the Java Native Interface (JNI), the standard class libraries, and the JVM implementation itself. Capturing all object accesses through manual instrumentation requires an in-depth understanding of the JVM and its libraries. We show that using Javana is both easier and more accurate than manual instrumentation.
conference on object oriented programming systems languages and applications | 2007
Andy Georges; Dries Buytaert; Lieven Eeckhout
Java performance is far from trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, non-determinism due to Just-in-Time compilation/optimization, thread scheduling, etc., causes the execution time of a Java program to differ from run to run. This poster advocates statistically rigorous data analysis when reporting Java performance. We advise modeling non-determinism by computing confidence intervals. In addition, we show that prevalent data analysis approaches may lead to misleading or even incorrect conclusions. Although we focus on Java performance, the techniques can be readily applied to any managed runtime system.
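To make the confidence-interval advice in this abstract concrete, here is a minimal Python sketch that computes a confidence interval for the mean execution time of repeated runs and checks whether two intervals overlap. It is an illustration under stated assumptions rather than the authors' tooling: the function names and sample timings are hypothetical, and the normal approximation used here is only reasonable when the number of runs is large; for a handful of runs a Student t quantile would be needed instead.

```python
import math
import statistics


def mean_confidence_interval(times, confidence=0.95):
    """Return (low, high) bounds of a confidence interval for the mean.

    Uses a normal approximation, which assumes a reasonably large sample.
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(times)
    std_err = statistics.stdev(times) / math.sqrt(n)
    # Two-sided quantile of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical execution times in seconds for two alternatives, 30 runs each.
runs_a = [9.0, 10.0, 11.0] * 10
runs_b = [11.5, 12.0, 12.5] * 10
ci_a = mean_confidence_interval(runs_a)
ci_b = mean_confidence_interval(runs_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

If the two intervals overlap, the measurements alone do not support the claim that one alternative is faster at the chosen confidence level; reporting only a best or average run would hide that uncertainty.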
conference on object-oriented programming systems, languages, and applications | 2004
Dries Buytaert; Andy Georges; Lieven Eeckhout; Koen De Bosschere
This poster presents MonitorMethod, which helps Java programmers gain insight into the behavior of their applications. MonitorMethod instruments the Java application and relates hardware performance monitors (HPMs) to the methods in the Java application's source code. We present a detailed case study showing that linking microprocessor-level performance characteristics to the source code is helpful for identifying performance bottlenecks and their causes. In addition, we relate our work to a previously proposed time-based HPM profiling framework.