Publication


Featured research published by Robert J. Fowler.


Information Processing Letters | 1981

Optimal packing and covering in the plane are NP-complete

Robert J. Fowler; Michael S. Paterson; Steven L. Tanimoto

This paper was motivated by a practical problem related to databases for image processing: given a set of points in the plane, find an efficient covering of that set using identical fixed-size rectangles with sides parallel to the coordinate axes [11]. A further motivation was the problem of packing as many square modules as possible into an irregularly shaped region on a silicon chip. In one dimension, many packing and covering problems for sets of arbitrary objects are NP-complete [1], but when restricted to identical objects they become trivial [5]. We prove that even severely restricted instances of packing and covering problems remain NP-hard in two or more dimensions. We recast these as combinatorial problems through the device of the intersection graph. In one dimension, if the objects are intervals, then their intersection graphs are interval graphs [2,7,10]. Since any graph is the intersection graph of convex objects in three or more dimensions [13], the computational results for arbitrary graphs apply to intersection graph problems in those dimensions. There are comparatively few computational results for intersection graphs in two dimensions (see [3,6]), although such graphs have been studied [5]. Our results help to fill the gap by showing that some very constrained versions of these problems remain NP-hard even in two dimensions.
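
For concreteness, the covering variant can be phrased as: given a set of points and an integer k, decide whether k identical axis-aligned squares cover all the points. Below is a minimal brute-force sketch of that decision problem (Python, illustrative only; the paper's contribution is the hardness proof, not an algorithm). It relies on the standard observation that candidate squares can be restricted to those whose lower-left corner combines an x-coordinate and a y-coordinate of input points.

    from itertools import combinations

    def can_cover(points, k, side=1.0):
        """Return True if k axis-aligned side-by-side squares can cover all points."""
        xs = sorted({x for x, _ in points})
        ys = sorted({y for _, y in points})
        # Candidate squares: lower-left corner at (x, y) drawn from input coordinates.
        candidates = [(x, y) for x in xs for y in ys]

        def covered_by(square, p):
            (cx, cy), (px, py) = square, p
            return cx <= px <= cx + side and cy <= py <= cy + side

        for squares in combinations(candidates, min(k, len(candidates))):
            if all(any(covered_by(s, p) for s in squares) for p in points):
                return True
        return False

    # Example: three mutually distant points need three unit squares, not two.
    pts = [(0, 0), (5, 0), (0, 5)]
    print(can_cover(pts, 2), can_cover(pts, 3))   # False True

The exponential search over candidate placements is consistent with the NP-completeness result proved in the paper.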


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 1994

MINT: a front end for efficient simulation of shared-memory multiprocessors

Jack E. Veenstra; Robert J. Fowler

MINT is a software package designed to ease the process of constructing event-driven memory hierarchy simulators for multiprocessors. It provides a set of simulated processors that run standard Unix executable files compiled for a MIPS R3000-based multiprocessor. These generate multiple streams of memory reference events that drive a user-provided memory system simulator. MINT uses a novel hybrid technique that exploits the best aspects of native execution and software interpretation to minimize the overhead of processor simulation. Combined with related techniques to improve performance, this approach makes simulation on uniprocessor hosts extremely efficient.
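
The front-end/back-end split described above can be sketched as follows (Python, with hypothetical class and function names; MINT itself executes MIPS binaries and is not structured like this toy). Simulated processors produce streams of memory reference events, and a user-provided memory system model consumes them.

    import random

    class DirectMappedCache:
        """A stand-in for the user-provided 'back end': a tiny direct-mapped cache."""
        def __init__(self, lines=64, line_bytes=32):
            self.lines, self.line_bytes = lines, line_bytes
            self.tags = [None] * lines
            self.hits = self.misses = 0

        def reference(self, pid, addr, is_write):
            block = addr // self.line_bytes
            index = block % self.lines
            if self.tags[index] == block:
                self.hits += 1
            else:
                self.misses += 1
                self.tags[index] = block

    def processor(pid, n_refs):
        """A stand-in for a simulated processor: yields (pid, addr, is_write) events."""
        base = pid * 4096
        for _ in range(n_refs):
            yield (pid, base + random.randrange(0, 8192, 4), random.random() < 0.3)

    # Interleave the event streams from several processors and drive the back end.
    cache = DirectMappedCache()
    alive = [processor(p, 1000) for p in range(4)]
    while alive:
        for s in list(alive):
            try:
                cache.reference(*next(s))
            except StopIteration:
                alive.remove(s)
    print(cache.hits, cache.misses)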


International Symposium on Computer Architecture | 1993

Adaptive cache coherency for detecting migratory shared data

Alan L. Cox; Robert J. Fowler

Parallel programs exhibit a small number of distinct data-sharing patterns. A common data-sharing pattern, migratory access, is characterized by exclusive read and write access by one processor at a time to a shared datum. We describe a family of adaptive cache coherency protocols that dynamically identify migratory shared data in order to reduce the cost of moving them. The protocols use a standard memory model and processor-cache interface. They do not require any compile-time or run-time software support. We describe implementations for bus-based multiprocessors and for shared-memory multiprocessors that use directory-based caches. These implementations are simple and would not significantly increase hardware cost. We use trace- and execution-driven simulation to compare the performance of the adaptive protocols to standard write-invalidate protocols. These simulations indicate that, compared to conventional protocols, the use of the adaptive protocol can almost halve the number of inter-node messages on some applications. Since cache coherency traffic represents a larger part of the total communication as cache size increases, the relative benefit of using the adaptive protocol also increases.
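
A simplified sketch of migratory detection in a directory follows (Python; this is a generic illustration of the idea, not the paper's exact protocol family). A write that finds exactly one other cached copy, held by the previous writer, flags the block as migratory; later reads then take the block exclusively, so the expected follow-on write needs no separate invalidation.

    class Directory:
        """Toy per-block directory entry that flags migratory sharing."""
        def __init__(self):
            self.sharers = set()
            self.last_writer = None
            self.migratory = False

        def read(self, p):
            if self.migratory:
                # Hand the block over exclusively: drop the old copy now so the
                # expected write by p needs no extra invalidation round.
                self.sharers = {p}
            else:
                self.sharers.add(p)

        def write(self, p):
            others = self.sharers - {p}
            # Detection heuristic: one other copy, and it belongs to the last writer.
            if not self.migratory and others == {self.last_writer} and self.last_writer != p:
                self.migratory = True
            self.sharers = {p}          # invalidate the other copies
            self.last_writer = p

    d = Directory()
    for p in [0, 1, 2, 3]:              # each processor reads then writes the datum
        d.read(p)
        d.write(p)
        print(f"P{p} wrote: migratory={d.migratory}")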


Symposium on Operating Systems Principles | 1989

The implementation of a coherent memory abstraction on a NUMA multiprocessor: experiences with PLATINUM

Alan L. Cox; Robert J. Fowler

PLATINUM is an operating system kernel with a novel memory management system for Non-Uniform Memory Access (NUMA) multiprocessor architectures. This memory management system implements a coherent memory abstraction. Coherent memory is uniformly accessible from all processors in the system. When used by applications coded with appropriate programming styles, it appears to be nearly as fast as local physical memory, and it reduces memory contention. Coherent memory makes programming NUMA multiprocessors easier for the user while attaining a level of performance comparable with hand-tuned programs. This paper describes the design and implementation of the PLATINUM memory management system, emphasizing the coherent memory. We measure the cost of basic operations implementing the coherent memory. We also measure the performance of a set of application programs running on PLATINUM. Finally, we comment on the interaction between architecture and the coherent memory system. PLATINUM currently runs on the BBN Butterfly Plus Multiprocessor.
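
The abstract does not spell out the placement policy, so as a generic illustration only (not PLATINUM's mechanism), a page-level coherent memory of this kind typically tracks where references to a page come from and replicates or migrates read-mostly pages toward their users:

    class Page:
        """Generic NUMA page-placement illustration (not PLATINUM's policy)."""
        REPLICATE_THRESHOLD = 8         # remote reads before a node gets a replica

        def __init__(self, home):
            self.home = home
            self.remote_refs = {}       # node -> remote read count
            self.replicas = {home}

        def reference(self, node, is_write):
            if is_write:
                # Writes collapse the page back to a single copy to keep it coherent.
                self.replicas = {self.home}
                self.remote_refs.clear()
                return
            if node in self.replicas:
                return                  # local read, nothing to do
            self.remote_refs[node] = self.remote_refs.get(node, 0) + 1
            if self.remote_refs[node] >= self.REPLICATE_THRESHOLD:
                self.replicas.add(node) # replicate a read-mostly page

    page = Page(home=0)
    for _ in range(10):
        page.reference(node=1, is_write=False)
    print(page.replicas)                # {0, 1} once enough remote reads accumulate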


Journal of Parallel and Distributed Computing | 1990

Analyzing parallel program executions using multiple views

Thomas J. LeBlanc; John M. Mellor-Crummey; Robert J. Fowler

To understand a parallel program's execution, we must be able to analyze a large amount of information describing complex relationships among many processes. Various techniques have been used, from program replay to program animation, but each has limited applicability, and the lack of a common foundation precludes an integrated solution. Our approach to parallel program analysis is based on a multiplicity of views of an execution. We use a synchronization trace captured during execution to construct a graph representation of the program's behavior. A user manipulates this representation to create and fine-tune visualizations using an integrated, programmable toolkit. Additional execution details can be recovered as needed by using program replay to reconstruct an execution from an existing synchronization trace. We present a framework for describing views of a parallel program's execution and an analysis methodology that relates a sequence of views to the program development cycle. We then describe our toolkit implementation and explain how users construct visualizations using the toolkit. Finally, we present an extended example to illustrate both our methodology and the power of our programmable toolkit.
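
The trace-to-graph step can be illustrated with a toy sketch (Python, with an invented event format): per-process program-order edges plus edges from each lock release to the next acquire yield a DAG of the execution, from which different views can then be projected.

    from collections import defaultdict

    # A toy synchronization trace: (process, op, lock) in global observation order.
    trace = [
        (0, "acquire", "L"), (0, "release", "L"),
        (1, "acquire", "L"), (1, "release", "L"),
        (0, "acquire", "L"), (0, "release", "L"),
    ]

    edges = []                       # (earlier event index, later event index)
    last_event_of = {}               # process -> index of its previous event
    last_release_of = {}             # lock -> index of its most recent release

    for i, (proc, op, lock) in enumerate(trace):
        if proc in last_event_of:
            edges.append((last_event_of[proc], i))      # program-order edge
        if op == "acquire" and lock in last_release_of:
            edges.append((last_release_of[lock], i))    # synchronization edge
        if op == "release":
            last_release_of[lock] = i
        last_event_of[proc] = i

    # One possible "view": events grouped per process (a time-line view).
    by_process = defaultdict(list)
    for i, (proc, op, lock) in enumerate(trace):
        by_process[proc].append((i, op, lock))
    print(edges)
    print(dict(by_process))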


Symposium on Operating Systems Principles | 1981

The architecture of the Eden system

Edward D. Lazowska; Henry M. Levy; Guy T. Almes; Michael J. Fischer; Robert J. Fowler; Stephen C. Vestal

The University of Washington's Eden project is a five-year research effort to design, build and use an “integrated distributed” computing environment. The underlying philosophy of Eden involves a fresh approach to the tension between these two adjectives. In briefest form, Eden attempts to support both good personal computing and good multi-user integration by combining a node machine / local network hardware base with a software environment that encourages a high degree of sharing and cooperation among its users. The hardware architecture of Eden involves an Ethernet local area network interconnecting a number of node machines with bit-map displays, based upon the Intel iAPX 432 processor. The software architecture is object-based, allowing each user access to the information and resources of the entire system through a simple interface. This paper states the philosophy and goals of Eden, describes the programming methodology that we have chosen to support, and discusses the hardware and kernel architecture of the system.


The Journal of Supercomputing | 2002

HPCVIEW: A Tool for Top-down Analysis of Node Performance

John M. Mellor-Crummey; Robert J. Fowler; Gabriel Marin; Nathan R. Tallent

It is increasingly difficult for complex scientific programs to attain a significant fraction of peak performance on systems that are based on microprocessors with substantial instruction-level parallelism and deep memory hierarchies. Despite this trend, performance analysis and tuning tools are still not used regularly by algorithm and application designers. To a large extent, existing performance tools fail to meet many user needs and are cumbersome to use. To address these issues, we developed HPCVIEW—a toolkit for combining multiple sets of program profile data, correlating the data with source code, and generating a database that can be analyzed anywhere with a commodity Web browser. We argue that HPCVIEW addresses many of the issues that have limited the usability and the utility of most existing tools. We originally built HPCVIEW to facilitate our own work on data layout and optimizing compilers. Now, in addition to daily use within our group, HPCVIEW is being used by several code development teams in DoD and DoE laboratories as well as at NCSA.
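
The core data manipulation, merging several profiles keyed by source location and computing derived metrics over the merged columns, can be sketched as follows (Python, with invented metric names; HPCVIEW itself correlates measurements with source code and emits a database browsable in a Web browser):

    from collections import defaultdict

    # Two hypothetical flat profiles, keyed by (source file, line): value per metric.
    cycles  = {("solver.c", 120): 9.1e8, ("solver.c", 188): 2.3e8}
    l2_miss = {("solver.c", 120): 4.0e6, ("io.c",      40): 1.1e6}

    merged = defaultdict(dict)
    for name, profile in [("cycles", cycles), ("L2_misses", l2_miss)]:
        for location, value in profile.items():
            merged[location][name] = value

    # A derived metric computed from the merged columns (illustrative formula).
    for location, metrics in merged.items():
        if "cycles" in metrics and "L2_misses" in metrics:
            metrics["cycles_per_miss"] = metrics["cycles"] / metrics["L2_misses"]

    # Present source locations top-down, ordered by the dominant cost.
    for location, metrics in sorted(merged.items(),
                                    key=lambda kv: -kv[1].get("cycles", 0)):
        print(location, metrics)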


Information & Computation | 1982

An efficient algorithm for Byzantine agreement without authentication

Danny Dolev; Michael J. Fischer; Robert J. Fowler; Nancy A. Lynch; H. Raymond Strong

Byzantine Agreement involves a system of n processes, of which some t may be faulty. The problem is for the correct processes to agree on a binary value sent by a transmitter that may itself be one of the n processes. If the transmitter sends the same value to each process, then all correct processes must agree on that value, but in any case, they must agree on some value. An explicit solution not using authentication for n = 3t + 1 processes is given, using 2t + 3 rounds and O(t³ log t) message bits. This solution is easily extended to the general case of n ⩾ 3t + 1 to give a solution using 2t + 3 rounds and O(nt + t³ log t) message bits.
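
A worked instance of the stated bounds, using only the figures given in the abstract:

    \[
      n \ge 3t + 1, \qquad \text{rounds} = 2t + 3, \qquad \text{message bits} = O(nt + t^{3}\log t).
    \]
    \[
      t = 2:\quad n \ge 7, \qquad 2\cdot 2 + 3 = 7 \text{ rounds}, \qquad
      O(nt + t^{3}\log t) = O(t^{3}\log t) \text{ when } n = 3t + 1.
    \]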


International Conference on Supercomputing | 2005

Low-overhead call path profiling of unmodified, optimized code

Nathan Froyd; John M. Mellor-Crummey; Robert J. Fowler

Call path profiling associates resource consumption with the calling context in which resources were consumed. We describe the design and implementation of a low-overhead call path profiler based on stack sampling. The profiler uses a novel sample-driven strategy for collecting frequency counts for call graph edges without instrumenting every procedure's code to count them. The data structures and algorithms used are efficient enough to construct the complete calling context tree exposed during sampling. The profiler leverages information recorded by compilers for debugging or exception handling to record call path profiles even for highly optimized code. We describe an implementation for the Tru64/Alpha platform. Experiments profiling the SPEC CPU2000 benchmark suite demonstrate the low (2%-7%) overhead of this profiler. A comparison with instrumentation-based profilers, such as gprof, shows that for call-intensive programs, our sampling-based strategy for call path profiling has over an order of magnitude lower overhead.
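
The sample-driven construction of a calling context tree can be sketched in Python, using the interpreter's own stack as a stand-in for the native stack unwinding the real profiler performs on optimized binaries (all names here are illustrative):

    import sys
    from collections import defaultdict

    class CCTNode:
        """One node of a calling context tree: a function name plus its children."""
        def __init__(self, name):
            self.name = name
            self.samples = 0
            self.children = {}

    root = CCTNode("<root>")
    edge_counts = defaultdict(int)        # (caller, callee) -> sampled frequency

    def take_sample():
        """Walk the current call stack, outermost frame first, and attribute one sample."""
        frames = []
        f = sys._getframe(1)              # skip take_sample itself (CPython-specific)
        while f is not None:
            frames.append(f.f_code.co_name)
            f = f.f_back
        frames.reverse()
        node, caller = root, "<root>"
        for name in frames:
            node = node.children.setdefault(name, CCTNode(name))
            edge_counts[(caller, name)] += 1
            caller = name
        node.samples += 1

    def leaf():
        take_sample()                     # pretend a sampling interrupt fired here

    def helper():
        leaf()

    def main():
        for _ in range(3):
            helper()
        leaf()

    main()
    print(dict(edge_counts))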


Journal of Parallel and Distributed Computing | 2001

Telescoping Languages

Ken Kennedy; Bradley Broom; Keith D. Cooper; Jack J. Dongarra; Robert J. Fowler; Dennis Gannon; S. Lennart Johnsson; John M. Mellor-Crummey; Linda Torczon

As machines and programs have become more complex, the process of programming applications that can exploit the power of high-performance systems has become more difficult and correspondingly more labor-intensive. This has substantially widened the software gap: the discrepancy between the need for new software and the aggregate capacity of the workforce to produce it. This problem has been compounded by the slow growth of programming productivity, especially for high-performance programs, over the past two decades. One way to bridge this gap is to make it possible for end users to develop programs in high-level domain-specific programming systems. In the past, a major impediment to the acceptance of such systems has been the poor performance of the resulting applications. To address this problem, we are developing a new compiler-based infrastructure, called TeleGen, that will make it practical to construct efficient domain-specific high-level languages from annotated component libraries. We call these languages telescoping languages, because they can be nested within one another. For programs written in telescoping languages, high performance and reasonable compilation times can be achieved by exhaustively analyzing the component libraries in advance to produce a language processor that recognizes and optimizes library operations as primitives in the language. The key to making this strategy practical is to keep compile times low by generating a custom compiler with extensive built-in knowledge of the underlying libraries. The goal is to achieve compile times that are linearly proportional to the size of the program presented by the user, rather than to the aggregate size of that program plus the base libraries.
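
The strategy can be caricatured in a few lines (Python, with invented names; the actual TeleGen infrastructure operates on annotated scientific libraries and generates an optimizing compiler): a library ships rewrite annotations prepared in advance, and the generated "compiler" replaces a recognized library operation with a specialized primitive whenever an argument's properties are known.

    # A library routine plus an annotation its author supplies ahead of time:
    # how calls to it may be rewritten when an argument has a known property.
    def matvec(A, x):
        return [sum(a * b for a, b in zip(row, x)) for row in A]

    REWRITE_RULES = {
        # (library operation, known property of argument) -> specialized implementation
        ("matvec", "identity_matrix"): lambda A, x: list(x),
    }

    def specialize(op_name, arg_properties, fallback):
        """Mimic the pre-generated compiler: choose a specialized primitive if the
        library annotations cover this call site, otherwise keep the generic call."""
        for prop in arg_properties:
            rule = REWRITE_RULES.get((op_name, prop))
            if rule is not None:
                return rule
        return fallback

    ident = [[1, 0], [0, 1]]
    call = specialize("matvec", {"identity_matrix"}, matvec)
    print(call(ident, [3, 4]))            # specialized path: the O(n^2) loop is skipped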

Collaboration


Dive into Robert J. Fowler's collaboration.

Top Co-Authors

Allan Porterfield (Renaissance Computing Institute)
Nancy A. Lynch (Massachusetts Institute of Technology)