Publication


Featured research published by Erik Ruf.


Programming Language Design and Implementation | 2000

Effective synchronization removal for Java

Erik Ruf

We present a new technique for removing unnecessary synchronization operations from statically compiled Java programs. Our approach improves upon current efforts based on escape analysis, as it can eliminate synchronization operations even on objects that escape their allocating threads. It makes use of a compact, equivalence-class-based representation that eliminates the need for fixed-point operations during the analysis. We describe and evaluate the performance of an implementation in the Marmot native Java compiler. For the benchmark programs examined, the optimization removes 100% of the dynamic synchronization operations in single-threaded programs, and 0-99% in multi-threaded programs, at a low cost in additional compilation time and code growth.
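The paper's analysis goes beyond escape analysis, but the equivalence-class representation it mentions can be illustrated with the simpler escape-analysis baseline: aliasing merges objects into classes via union-find, escape status is a property of the whole class, and synchronization on non-escaping classes is removable. This is a minimal sketch with hypothetical object names, not the paper's algorithm.

```python
class UnionFind:
    """Equivalence classes of possibly-aliased objects, with a per-class escape bit."""

    def __init__(self):
        self.parent = {}
        self.escaped = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        self.escaped.setdefault(x, False)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb
            # Escape status belongs to the merged class as a whole.
            self.escaped[rb] = self.escaped[ra] or self.escaped[rb]

    def mark_escaped(self, x):
        self.escaped[self.find(x)] = True

    def is_escaped(self, x):
        return self.escaped[self.find(x)]

uf = UnionFind()
uf.union("local_buf", "alias_of_buf")  # an assignment merges two classes
uf.mark_escaped("shared_queue")        # e.g. stored into a static field
# Synchronization on objects whose class never escapes can be removed.
removable = [o for o in ["local_buf", "shared_queue"] if not uf.is_escaped(o)]
```

Because merging two classes is near constant time and no fixed-point iteration is needed, this style of representation keeps analysis cost low, which matches the abstract's emphasis on compact, fixpoint-free analysis.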


Programming Language Design and Implementation | 1995

Context-insensitive alias analysis reconsidered

Erik Ruf

Recent work on alias analysis in the presence of pointers has concentrated on context-sensitive interprocedural analyses, which treat multiple calls to a single procedure independently rather than constructing a single approximation to a procedure's effect on all of its callers. While context-sensitive modeling offers the potential for greater precision by considering only realizable call-return paths, its empirical benefits have yet to be measured. This paper compares the precision of a simple, efficient, context-insensitive points-to analysis for the C programming language with that of a maximally context-sensitive version of the same analysis. We demonstrate that, for a number of pointer-intensive benchmark programs, context-insensitivity exerts little to no precision penalty. We also describe techniques for using the output of context-insensitive analysis to improve the efficiency of context-sensitive analysis without affecting precision.
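The precision trade-off at issue can be seen in a toy context-insensitive points-to sketch: every variable gets one points-to set shared across all call sites, so distinct callers of the same helper are merged into a single summary. The variable names and single-pass propagation below are illustrative assumptions, not the paper's analysis.

```python
from collections import defaultdict

points_to = defaultdict(set)  # variable -> set of objects it may point to

def address_of(p, x):
    """Model the statement p = &x."""
    points_to[p].add(x)

def copy(q, p):
    """Model the statement q = p."""
    points_to[q] |= points_to[p]

# Two call sites pass different addresses to the same helper parameter "p".
address_of("p", "x")   # call site 1: helper(&x)
address_of("p", "y")   # call site 2: helper(&y)
copy("q", "p")         # inside the helper: q = p
# Context-insensitively, both calls share one summary, so q may point to
# both x and y, even though each realizable call-return path yields one.
```

A context-sensitive analysis would keep the two call sites separate and conclude that `q` points to exactly one of `x` or `y` per call; the paper's finding is that, empirically, this extra precision rarely matters for the benchmarks examined.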


Software: Practice and Experience | 2000

Marmot: an optimizing compiler for Java

Robert P. Fitzgerald; Todd B. Knoblock; Erik Ruf; Bjarne Steensgaard; David Tarditi

The Marmot system is a research platform for studying the implementation of high-level programming languages. It currently comprises an optimizing native-code compiler, runtime system, and libraries for a large subset of Java. Marmot integrates well-known representation, optimization, code generation, and runtime techniques with a few Java-specific features to achieve competitive performance. This paper contains a description of the Marmot system design, along with highlights of our experience applying and adapting traditional implementation techniques to Java. A detailed performance evaluation assesses both Marmot's overall performance relative to other Java and C++ implementations, and the relative costs of various Java language features in Marmot-compiled code. Our experience with Marmot has demonstrated that well-known compilation techniques can produce very good performance for static Java applications, comparable or superior to other Java systems, and approaching that of C++ in some cases.


International Conference on Computer Graphics and Interactive Techniques | 1995

Specializing shaders

Brian K. Guenter; Todd B. Knoblock; Erik Ruf

We have developed a system for interactive manipulation of shading parameters for three-dimensional rendering. The system takes as input user-defined shaders, written in a subset of C, which are then specialized for interactive use. Since users typically experiment with different values of a single shader parameter while leaving the others constant, we can benefit by automatically generating a specialized shader that performs only those computations depending on the parameter being varied; all other values needed by the shader can be precomputed and cached. The specialized shaders are as much as 95 times faster than the original user-defined shader. This dramatic improvement in speed makes it possible to interactively view parameter changes for relatively complex shading models, such as procedural solid texturing.
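The core idea, specializing a shader on its held-constant parameters so only parameter-dependent work runs per invocation, can be sketched with a closure: the fixed parameters are processed once, and the returned function does only the varying work. The toy shading model and names (`light_dir`, `ambient`, `shininess`) are hypothetical, not from the paper's system.

```python
import math

def make_specialized_shader(light_dir, ambient):
    """Specialize a toy shader on fixed parameters (illustrative sketch)."""
    # Precomputed once: everything depending only on the fixed parameters.
    norm = math.sqrt(sum(c * c for c in light_dir))
    unit_light = tuple(c / norm for c in light_dir)

    def shader(normal, shininess):
        # Per-invocation: only computations depending on varying inputs.
        diffuse = max(0.0, sum(n * l for n, l in zip(normal, unit_light)))
        return ambient + diffuse ** shininess

    return shader

shader = make_specialized_shader(light_dir=(0.0, 0.0, 2.0), ambient=0.1)
value = shader(normal=(0.0, 0.0, 1.0), shininess=2.0)
```

In the paper's setting the specialization is done automatically by partial evaluation of C shader source rather than by hand-written closures, but the division of work is the same: cache what depends only on the constant parameters, recompute only what depends on the one being varied.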


International Conference on Functional Programming | 1991

Automatic online partial evaluation

Daniel Weise; Roland Conybeare; Erik Ruf; Scott Seligman

We have solved the problem of constructing a fully automatic online program specializer for an untyped functional language (specifically, the functional subset of Scheme). We designed our specializer, called Fuse, as an interpreter that returns a trace of suspended computations. The trace is represented as a graph, rather than as program text, and each suspended computation indicates the type of its result. A separate process translates the graph into a particular programming language. Producing graphs rather than program text solves problems with code duplication and premature reduce/residualize decisions. Fuse's termination strategy, which employs online generalization, specializes conditional recursive function calls, and unfolds all other calls. This strategy is shown to be both powerful and safe.
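Online partial evaluation means the specializer decides while interpreting: computations over known (static) inputs run immediately, while those over unknown (dynamic) inputs are suspended and residualized. A drastically simplified sketch, not Fuse itself (Fuse builds graphs, not strings, and handles a full Scheme subset): specializing `power(x, n)` when `n` is known unfolds the recursion and residualizes only the multiplications by the unknown `x`.

```python
def specialize_power(n, x="x"):
    """Unfold power(x, n) for a known n, residualizing the dynamic part.

    n is static: the recursion runs at specialization time.
    x is dynamic: multiplications are emitted as residual text.
    """
    if n == 0:
        return "1"
    inner = specialize_power(n - 1, x)
    return x if inner == "1" else f"({x} * {inner})"

residual = specialize_power(3)
# The residual program computes x**3 with no loop or recursion left.
```

Termination is the hard part in a real online specializer: blindly unfolding recursive calls on static data diverges when the data grows, which is why Fuse's strategy generalizes online rather than unfolding unconditionally.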


Programming Language Design and Implementation | 1996

Data specialization

Todd B. Knoblock; Erik Ruf

Given a repeated computation, part of whose input context remains invariant across all repetitions, program staging improves performance by separating the computation into two phases. An early phase executes only once, performing computations depending only on invariant inputs, while a late phase repeatedly performs the remainder of the work given the varying inputs and the results of the early computations. Common staging techniques based on dynamic compilation statically construct an early phase that dynamically generates object code customized for a particular input context. In effect, the results of the invariant computations are encoded as the compiled code for the late phase. This paper describes an alternative approach in which the results of early computations are encoded as a data structure, allowing both the early and late phases to be generated statically. By avoiding dynamic code manipulation, we give up some optimization opportunities in exchange for significantly lower dynamic space/time overhead and reduced implementation complexity.
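The distinguishing point, encoding early-phase results as data rather than as generated code, can be sketched as follows. The early phase runs once over the invariant inputs and fills a table; the statically written late phase just reads it. The sample computation (a square root per point, scaled by a varying factor) is a made-up stand-in for the invariant-heavy work.

```python
def stage(invariant_points):
    """Early phase: encode invariant results in a data structure (a dict)."""
    cache = {p: p ** 0.5 for p in invariant_points}  # expensive part, run once

    def late(varying_scale):
        # Late phase, run repeatedly: only work depending on varying input.
        return [varying_scale * cache[p] for p in invariant_points]

    return late

late = stage([1, 4, 9])   # early phase executes here, once
out_a = late(2.0)         # each call does only the cheap, varying work
out_b = late(3.0)
```

Unlike dynamic compilation, both phases here exist as ordinary static code; the only thing customized per input context is the cached data, which is exactly the trade-off the abstract describes.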


Cluster Computing | 2014

Direct GPU/FPGA communication Via PCI express

Ray A. Bittner; Erik Ruf; Alessandro Forin

We describe a mechanism for connecting GPU and FPGA devices directly via the PCI Express bus, enabling the transfer of data between these heterogeneous computing units without the intermediate use of system memory. We evaluate the performance benefits of this approach over a range of transfer sizes, and demonstrate its utility in a computer vision application. We find that bypassing system memory yields improvements as high as 2.2× in data transfer speed, and 1.9× in application performance.


International Conference on Parallel Processing | 2012

Direct GPU/FPGA Communication via PCI Express

Ray A. Bittner; Erik Ruf

Parallel processing has hit mainstream computing in the form of CPUs, GPUs and FPGAs. While explorations proceed with all three platforms individually and with the CPU-GPU pair, little exploration has been performed with the synergy of GPU-FPGA. This is due in part to the cumbersome nature of communication between the two. This paper presents a mechanism for direct GPU-FPGA communication and characterizes its performance in a full hardware implementation.


SIGPLAN Notices | 1995

Optimizing sparse representations for dataflow analysis

Erik Ruf

Sparse program representations allow inter-statement dependences to be represented explicitly, enabling dataflow analyzers to restrict the propagation of information to paths where it could potentially affect the dataflow solution. This paper describes the use of a single sparse program representation, the value dependence graph, in both general and analysis-specific contexts, and demonstrates its utility in reducing the cost of dataflow analysis. We find that several semantics-preserving transformations are beneficial in both contexts.
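The benefit of a sparse representation is that dataflow facts travel along explicit def-use dependence edges, so the analyzer visits only statements whose solution can actually change, rather than propagating over every control-flow node. A minimal worklist sketch over a hypothetical four-statement program (not the value dependence graph itself, which also represents control dependence):

```python
from collections import deque

# Explicit dependence edges: statement -> statements that use its result.
def_use = {
    "a = 1":     ["b = a + 1"],
    "b = a + 1": ["c = b * 2"],
    "c = b * 2": [],
    "d = 99":    [],   # irrelevant to c; never revisited after its own def
}

values = {}
worklist = deque(["a = 1", "d = 99"])  # roots: statements with no inputs
visits = 0
while worklist:
    stmt = worklist.popleft()
    visits += 1
    var, expr = (s.strip() for s in stmt.split("=", 1))
    values[var] = eval(expr, {}, dict(values))  # evaluate with known facts
    worklist.extend(def_use[stmt])              # push only dependent users
```

Each statement is visited exactly once here because information flows only where a dependence edge exists; a dense analysis over a control-flow graph would instead repropagate the full fact set through every node until a fixed point.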


High Performance Graphics | 2009

Embedded function composition

Turner Whitted; James T. Kajiya; Erik Ruf; Ray A. Bittner

A low-level graphics processor is assembled from a collection of hardwired functions of screen coordinates embedded directly in the display. Configuration of these functions is controlled by a buffer containing parameters delivered to the processor on-the-fly during display scan. The processor is modular and scalable in keeping with the demands of large, high resolution displays.
