Urs Hölzle
University of California, Santa Barbara
Publications
Featured research published by Urs Hölzle.
european conference on object oriented programming | 1991
Urs Hölzle; Craig Chambers; David M. Ungar
Polymorphic inline caches (PICs) provide a new way to reduce the overhead of polymorphic message sends by extending inline caches to include more than one cached lookup result per call site. For a set of typical object-oriented SELF programs, PICs achieve a median speedup of 11%.
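The idea of caching more than one lookup result per call site can be sketched in a few lines of Python. This is only an illustrative model, not the SELF implementation: the class name `PolymorphicInlineCache`, the `max_entries` cap, and the use of `getattr` as a stand-in for the slow full lookup are all assumptions made for the example.

```python
class PolymorphicInlineCache:
    """Hypothetical model of one call site's cache of (receiver type, method) pairs."""

    def __init__(self, selector, max_entries=4):
        self.selector = selector          # message name sent at this call site
        self.max_entries = max_entries    # assumed cap on cached entries
        self.entries = []                 # list of (type, function) pairs
        self.hits = 0
        self.misses = 0

    def send(self, receiver, *args):
        rtype = type(receiver)
        # Fast path: compare the receiver type against each cached entry.
        for cached_type, method in self.entries:
            if cached_type is rtype:
                self.hits += 1
                return method(receiver, *args)
        # Slow path: full lookup (modelled by getattr), then extend the cache.
        self.misses += 1
        method = getattr(rtype, self.selector)
        if len(self.entries) < self.max_entries:
            self.entries.append((rtype, method))
        return method(receiver, *args)


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rect:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


site = PolymorphicInlineCache("area")
areas = [site.send(s) for s in [Square(2), Rect(2, 3), Square(3), Rect(1, 5)]]
```

A monomorphic inline cache would hold a single entry and miss on every send in this alternating sequence; here each receiver type misses once and hits afterwards.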
programming language design and implementation | 1992
Urs Hölzle; Craig Chambers; David M. Ungar
SELF's debugging system provides complete source-level debugging (expected behavior) with globally optimized code. It shields the debugger from optimizations performed by the compiler by dynamically deoptimizing code on demand. Deoptimization only affects the procedure activations that are actively being debugged; all other code runs at full speed. Deoptimization requires the compiler to supply debugging information at discrete interrupt points; the compiler can still perform extensive optimizations between interrupt points without affecting debuggability. At the same time, the inability to interrupt between interrupt points is invisible to the user. Our debugging system also handles programming changes during debugging. Again, the system provides expected behavior: it is possible to change a running program and immediately observe the effects of the change. Dynamic deoptimization transforms old compiled code (which may contain inlined copies of the old version of the changed procedure) into new versions reflecting the current source-level state. To the best of our knowledge, SELF is the first practical system providing full expected behavior with globally optimized code.
programming language design and implementation | 1994
Urs Hölzle; David M. Ungar
Object-oriented programs are difficult to optimize because they execute many dynamically-dispatched calls. These calls cannot easily be eliminated because the compiler does not know which callee will be invoked at runtime. We have developed a simple technique that feeds back type information from the runtime system to the compiler. With this type feedback, the compiler can inline any dynamically-dispatched call. Our compiler drastically reduces the call frequency of a suite of large SELF applications (by a factor of 3.6) and improves performance by a factor of 1.7. We believe that type feedback could significantly reduce call frequencies and improve performance for most other object-oriented languages (statically-typed or not) as well as for languages with type-dependent operations such as generic arithmetic.
conference on object-oriented programming systems, languages, and applications | 1999
Jeff Bogda; Urs Hölzle
Java programs perform many synchronization operations on data structures. Some of these synchronizations are unnecessary; in particular, if an object is reachable only by a single thread, concurrent access is impossible and no synchronization is needed. We describe an interprocedural, flow- and context-insensitive dataflow analysis that finds such situations. A global optimizing transformation then eliminates synchronizations on these objects. For every program in our suite of ten Java benchmarks consisting of SPECjvm98 and others, our system optimizes over 90% of the alias sets containing at least one synchronized object. As a result, the dynamic frequency of synchronizations is reduced by up to 99%. For two benchmarks that perform synchronizations very frequently, this optimization leads to speedups of 36% and 20%.
european conference on object oriented programming | 1997
Ralph Keller; Urs Hölzle
Binary component adaptation (BCA) allows components to be adapted and evolved in binary form and on-the-fly (during program loading). BCA rewrites component binaries before (or while) they are loaded, requires no source code access, and guarantees release-to-release compatibility. That is, an adaptation is guaranteed to be compatible with a new binary release of the component as long as the new release itself is compatible with clients compiled using the earlier release. We describe our implementation of BCA for Java and demonstrate its usefulness by showing how it can solve a number of important integration and evolution problems. Even though our current implementation was designed for easy integration with Sun's JDK 1.1 VM rather than for ultimate speed, the load-time overhead introduced by BCA is small, in the range of one or two seconds. With its flexibility, relatively simple implementation, and low overhead, binary component adaptation could significantly improve the reusability of Java components.
european conference on object oriented programming | 1999
Sylvia Dieckmann; Urs Hölzle
We present an analysis of the memory usage for six of the Java programs in the SPECjvm98 benchmark suite. Most of the programs are real-world applications with high demands on the memory system. For each program, we measured as much low level data as possible, including age and size distribution, type distribution, and the overhead of object alignment. Among other things, we found that non-pointer data usually represents more than 50% of the allocated space for instance objects, that Java objects tend to live longer than objects in Smalltalk or ML, and that they are fairly small.
conference on object-oriented programming systems, languages, and applications | 1996
Karel Driesen; Urs Hölzle
We study the direct cost of virtual function calls in C++ programs, assuming the standard implementation using virtual function tables. We measure this overhead experimentally for a number of large benchmark programs, using a combination of executable inspection and processor simulation. Our results show that the C++ programs measured spend a median of 5.2% of their time and 3.7% of their instructions in dispatch code. For "all virtuals" versions of the programs, the median overhead rises to 13.7% (13% of the instructions). The "thunk" variant of the virtual function table implementation reduces the overhead by a median of 21% relative to the standard implementation. On future processors, these overheads are likely to increase moderately.
european conference on object-oriented programming | 1996
Gerald Aigner; Urs Hölzle
We have designed and implemented an optimizing source-to-source C++ compiler that reduces the frequency of virtual function calls. This technical report describes our preliminary experience with this system. The prototype implementation demonstrates the value of OO-specific optimization of C++. Despite some limitations of our system, and despite the low frequency of virtual function calls in some of the programs, optimization improves the performance of a suite of two small and six large C++ applications totalling over 90,000 lines of code by a median of 20% over the original programs and reduces the number of virtual function calls by a median factor of 5. For more call-intensive versions of the same programs, performance improved by a median of 40% and the number of virtual calls dropped by a factor of 21. Our measurements indicate that inlining does not necessarily lead to large increases in code size, and that for most programs, the instruction cache miss ratio does not increase significantly.
Lecture Notes in Computer Science | 1999
Murat Karaorman; Urs Hölzle; John Bruno
jContractor is a purely library-based approach to support Design By Contract specifications such as preconditions, postconditions, class invariants, and recovery and exception handling in Java. jContractor uses an intuitive naming convention and standard Java syntax to instrument Java classes and enforce Design By Contract constructs. The designer of a class specifies a contract by providing contract methods following jContractor naming conventions. jContractor uses Java Reflection to synthesize an instrumented version of a Java class by incorporating code that enforces the present jContractor contract specifications. Programmers enable the run-time enforcement of contracts by either engaging the jContractor class loader or by explicitly instantiating objects using the jContractor object factory. Programmers can use exactly the same syntax for invoking methods and passing object references regardless of whether contracts are present or not. Since jContractor is purely library-based, it requires no special tools such as modified compilers, modified JVMs, or pre-processors.
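The general pattern of finding contract methods by name through reflection and synthesizing an instrumented class can be illustrated with a small Python analogue. This is not jContractor and does not use its API: the suffixes `_precondition` and `_postcondition`, the `instrument` function, and `ContractError` are hypothetical names chosen for this sketch, and Python introspection stands in for Java Reflection.

```python
import functools


class ContractError(AssertionError):
    """Raised when a contract check fails (hypothetical name for this sketch)."""


def _wrap(method, pre, post, name):
    @functools.wraps(method)
    def checked(self, *args, **kwargs):
        # Check the precondition, if the class supplies one by naming convention.
        if pre is not None and not pre(self, *args, **kwargs):
            raise ContractError(f"precondition of {name} violated")
        result = method(self, *args, **kwargs)
        # Check the postcondition against the method's result.
        if post is not None and not post(self, result):
            raise ContractError(f"postcondition of {name} violated")
        return result
    return checked


def instrument(cls):
    """Return a subclass of cls whose methods enforce any contracts found by name."""
    suffixes = ("_precondition", "_postcondition")
    namespace = {}
    for name, attr in vars(cls).items():
        if not callable(attr) or name.startswith("_") or name.endswith(suffixes):
            continue
        pre = getattr(cls, name + "_precondition", None)
        post = getattr(cls, name + "_postcondition", None)
        if pre is not None or post is not None:
            namespace[name] = _wrap(attr, pre, post, name)
    return type(cls.__name__ + "Instrumented", (cls,), namespace)


class Stack:
    def __init__(self):
        self.items = []

    def push(self, x):
        self.items.append(x)

    def pop(self):
        return self.items.pop()

    # Contract methods, discovered purely by their names.
    def pop_precondition(self):
        return len(self.items) > 0

    def push_postcondition(self, result):
        return len(self.items) > 0


CheckedStack = instrument(Stack)
```

Callers use `CheckedStack` exactly like `Stack`; only the way the object is created differs, which loosely mirrors the role the abstract assigns to the object factory.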
international symposium on computer architecture | 1998
Karel Driesen; Urs Hölzle
Indirect branch prediction is likely to become increasingly important in the future because indirect branches occur more frequently in object-oriented programs. With misprediction rates of around 25% on current processors, indirect branches can incur a significant fraction of branch misprediction overhead even though they remain less frequent than the more predictable conditional branches. We investigate a wide range of two-level predictors dedicated exclusively to indirect branches. Starting with predictors that use full-precision addresses and unlimited tables, we progressively introduce hardware constraints and minimize the loss of predictor performance at each step. For programs from the SPECint95 suite as well as a suite of large C++ applications, a two-level predictor achieves a misprediction rate of 9.8% with a 1K-entry table and 7.3% with an 8K-entry table, representing more than a threefold improvement over an ideal BTB. A hybrid predictor further reduces the misprediction rates to 8.98% (1K) and 5.95% (8K).
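The difference between a BTB and a two-level predictor with full-precision addresses and unlimited tables can be shown with a toy simulation. The sketch below is an assumption-laden model, not the predictors evaluated in the paper: the BTB simply remembers the last target of each branch, the two-level predictor keys its table on the branch address plus a global history of recent targets, and the class names, `path_length` parameter, and synthetic trace are invented for illustration.

```python
from collections import deque


class BTBPredictor:
    """Simplified BTB: predicts the most recent target seen for each branch."""

    def __init__(self):
        self.table = {}

    def predict(self, branch):
        return self.table.get(branch)

    def update(self, branch, target):
        self.table[branch] = target


class TwoLevelPredictor:
    """First level: history of recent targets. Second level: table keyed on
    (branch address, history). Tables are unbounded dictionaries."""

    def __init__(self, path_length=2):
        self.history = deque(maxlen=path_length)
        self.table = {}

    def _key(self, branch):
        return (branch, tuple(self.history))

    def predict(self, branch):
        return self.table.get(self._key(branch))

    def update(self, branch, target):
        self.table[self._key(branch)] = target
        self.history.append(target)


def misprediction_rate(predictor, trace):
    """Fraction of (branch, target) pairs in the trace that were mispredicted."""
    misses = 0
    for branch, target in trace:
        if predictor.predict(branch) != target:
            misses += 1
        predictor.update(branch, target)
    return misses / len(trace)


# Synthetic trace: one indirect branch whose target alternates between two addresses.
trace = [(0x40, 0x100 if i % 2 == 0 else 0x200) for i in range(100)]
btb_rate = misprediction_rate(BTBPredictor(), trace)
two_level_rate = misprediction_rate(TwoLevelPredictor(path_length=2), trace)
```

On this alternating trace the last-target BTB is always wrong, while the two-level predictor mispredicts only during its four warm-up branches. Real traces are far less regular, so these numbers say nothing about the rates reported above.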