Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Robert Muth is active.

Publication


Featured researches published by Robert Muth.


programming language design and implementation | 2005

Pin: building customized program analysis tools with dynamic instrumentation

Chi-Keung Luk; Robert Cohn; Robert Muth; Harish Patil; Artur Klauser; P. Geoffrey Lowney; Steven Wallace; Vijay Janapa Reddi; Kim M. Hazelwood

Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools (called Pintools) are written in C/C++ using Pins rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pintools source compatible across different architectures. However, a Pintool can access architecture-specific details when necessary. Instrumentation with Pin is mostly transparent as the application and Pintool observe the applications original, uninstrumented behavior. Pin uses dynamic compilation to instrument executables while they are running. For efficiency, Pin uses several techniques, including inlining, register re-allocation, liveness analysis, and instruction scheduling to optimize instrumentation. This fully automated approach delivers significantly better instrumentation performance than similar tools. For example, Pin is 3.3x faster than Valgrind and 2x faster than DynamoRIO for basic-block counting. To illustrate Pins versatility, we describe two Pintools in daily use to analyze production software. Pin is publicly available for Linux platforms on four architectures: IA32 (32-bit x86), EM64T (64-bit x86), Itanium®, and ARM. In the ten months since Pin 2 was released in July 2004, there have been over 3000 downloads from its website.


ACM Transactions on Programming Languages and Systems | 2000

Compiler techniques for code compaction

Saumya K. Debray; William S. Evans; Robert Muth; Bjorn De Sutter

In recent years there has been an increasing trend toward the incorpor ation of computers into a variety of devices where the amount of memory available is limited. This makes it desirable to try to reduce the size of applications where possible. This article explores the use of compiler techniques to accomplish code compaction to yield smaller executables. The main contribution of this article is to show that careful, aggressive, interprocedural optimization, together with procedural abstraction of repeated code fragments, can yield significantly better reductions in code size than previous approaches, which have generally focused on abstraction of repeated instruction sequences. We also show how “equivalent” code fragments can be detected and factored out using conventional compiler techniques, and without having to resort to purely linear treatments of code sequences as in suffix-tree-based approaches, thereby setting up a framework for code compaction that can be more flexible in its treatment of what code fragments are considered equivalent. Our ideas have been implemented in the form of a binary-rewriting tool that reduces the size of executables by about 30% on the average.


symposium on principles of programming languages | 1998

Alias analysis of executable code

Saumya K. Debray; Robert Muth; Matthew Weippert

Recent years have seen increasing interest in systems that reason about and manipulate executable code. Such systems can generally benefit from information about aliasing. Unfortunately, most existing alias analyses are formulated in terms of high-level language features, and are unable to cope with features, such as pointer arithmetic, that pervade executable programs. This paper describes a simple algorithm that can be used to obtain aliasing information for executabie code. In order to be practical, the algorithm is carefut to keep its memory requirements low, sacrificing precision where necessary to achieve this goal. Experimental results indicate that it is nevertheless able to provide a reasonable amount of information about memory references across a variety of benchmark programs.


combinatorial pattern matching | 1996

Approximate Multiple Strings Search

Robert Muth; Udi Manber

This paper presents a fast algorithm for searching a large text for multiple strings allowing one error. On a fast workstation, the algorithm can process a megabyte of text searching for 1000 patterns (with one error) in less than a second. Although we combine several interesting techniques, overall the algorithm is not deep theoretically. The emphasis of this paper is on the experimental side of algorithm design. We show the importance of careful design, experimentation, and utilization of current architectures. In particular, we discuss the issues of locality and cache performance, fast hash functions, and incremental hashing techniques. We introduce the notion of two-level hashing, which utilizes cache behavior to speed up hashing, especially in cases where unsuccessful searches are not uncommon. Two-level hashing may be useful for many other applications. The end result is also interesting by itself. We show that multiple search with one error is fast enough for most text applications.


static analysis symposium | 2000

Code Specialization Based on Value Profiles

Robert Muth; Scott A. Watterson; Saumya K. Debray

It is often the case at runtime that variables and registers in programs are “quasi-invariant,” i.e., the distribution of the values they take on is very skewed, with a small number of values occurring most of the time. Knowledge of such frequently occurring values can be exploited by a compiler to generate code that optimizes for the common cases without sacrificing the ability to handle the general case. The idea can be generalized to the notion of expression profiles, which profile the runtime values of arbitrary expressions and can permit optimizations that may not be possible using simple value profiles. Since this involves the introduction of runtime tests, a careful cost-benefit analysis is necessary to make sure that the benefits from executing the code specialized for the common values outweigh the cost of testing for these values. This paper describes a static cost-benefit analysis that allows us to discover when such specialization is profitable. Experimental results, using such an analysis and an implementation of low-level code specialization based on value and expression profiles within a link-time code optimizer, are given to validate our approach.


symposium on code generation and optimization | 2004

Ispike: A Post-link Optimizer for the Intel®Itanium®Architecture

Chi-Keung Luk; Robert Muth; Harish Patil; Robert Cohn; P. Geoffrey Lowney

Ispike is a post-link optimizer developed for the Intel/spl reg/ Itanium Processor Family (IPF) processors. The IPF architecture poses both opportunities and challenges to post-link optimizations. IPF offers a rich set of performance counters to collect detailed profile information at a low cost, which is essential to post-link optimization being practical. At the same time, the predication and bundling features on IPF make post-link code transformation more challenging than on other architectures. In Ispike, we have implemented optimizations like code layout, instruction prefetching, data layout, and data prefetching that exploit the IPF advantages, and strategies that cope with the IPF-specific challenges. Using SPEC CINT2000 as benchmarks, we show that Ispike improves performance by as much as 40% on the ltanium/spl reg/2 processor, with average improvement of 8.5% and 9.9% over executables generated by the Intel/spl reg/ Electron compiler and by the Gcc compiler, respectively. We also demonstrate that statistical profiles collected via IPF performance counters and complete profiles collected via instrumentation produce equal performance benefit, but the profiling overhead is significantly lower for performance counters.


international conference on supercomputing | 2002

Profile-guided post-link stride prefetching

Chi-Keung Luk; Robert Muth; Harish Patil; Richard Weiss; P. Geoffrey Lowney; Robert Cohn

Data prefetching is an effective approach to addressing the memory latency problem. While a few processors have implemented hardware-based data prefetching, the majority of modern processors support data-prefetch instructions and rely on compilers to automatically insert prefetches. However, most prefetching schemes in commercial compilers suffer from two limitations: (1) the source code must be available before prefetching can be applied, and (2) these prefetching schemes target only loops with statically-known strided accesses. In this study, we broaden the scope of software-controlled prefetching by addressing the above two limitations. We use profiling to discover strided accesses that frequently occur during program execution but are not determinable by the compiler. We then use the strides discovered to insert prefetches into the executable directly, without the need for re-compilation. Performance evaluation was done on an Alpha 21264-based system with a 64KB data cache and an 8MB secondary cache. We find that even with such large caches, our technique offers speedups ranging from 3% to 56% in 11 out of the 26 SPEC2000 benchmarks. Our technique has been incorporated into Pixie and Spike, two products in Compaqs Tru64 Unix.


symposium on principles of programming languages | 2000

On the complexity of flow-sensitive dataflow analyses

Robert Muth; Saumya K. Debray

This paper attempts to address the question of why certain dataflow analysis problems can be solved efficiently, but not others. We focus on flow-sensitive analyses, and give a simple and general result that shows that analyses that require the use of relational attributes for precision must be PSPACE-hard in general. We then show that if the language constructs are slightly strengthened to allow a computation to maintain a very limited summary of what happens along an execution path, inter-procedural analyses become EXPTIME-hard. We discuss applications of our results to a variety of analyses discussed in the literature. Our work elucidates the reasons behind the complexity results given by a number of authors, improves on a number of such complexity results, and exposes conceptual commonalities underlying such results that are not readily apparent otherwise.


colloquium on trees in algebra and programming | 1997

On the Complexity of Function Pointer May-Alias Analysis

Robert Muth; Saumya K. Debray

This paper considers the complexity of interprocedural function pointer may-alias analysis, i.e., determining the set of functions that a function pointer (in a language such as C) can point to at a point in a program. This information is necessary, for example, in order to construct the control flow graphs of programs that use function pointers, which in turn is fundamental for most dataflow analyses and optimizations. We show that the general problem is complete for deterministic exponential time. We then consider two natural simplifications to the basic (precise) analysis and examine their complexity. The approach described can be used to readily obtain similar complexity results for related analyses such as reachability and recursiveness.


compiler construction | 1999

Link-Time Improvement of Scheme Programs

Saumya K. Debray; Robert Muth; Scott A. Watterson

Optimizing compilers typically limit the scope of their analyses and optimizations to individual modules. This has two drawbacks: first, library code cannot be optimized together with their callers, which implies that reusing code through libraries incurs a penalty; and second, the results of analysis and optimization cannot be propagated from an application module written in one language to a module written in another. A possible solution is to carry out (additional) program optimization at link time. This paper describes our experiences with such optimization using two different optimizing Scheme compilers, and several benchmark programs, via alto, a link-time optimizer we have developed for the DEC Alpha architecture. Experiments indicate that significant performance improvements are possible via link-time optimization even when the input programs have already been subjected to high levels of compile-time optimization.

Collaboration


Dive into the Robert Muth's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge