
Publications


Featured research published by Thomas R. Gross.


IEEE Transactions on Computers | 1987

The Warp Computer: Architecture, Implementation, and Performance

Marco Annaratone; E. Arnould; Thomas R. Gross; H. T. Kung; Monica S. Lam; Onat Menzilcioglu; Jon A. Webb

The Warp machine is a systolic array computer of linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A typical Warp array includes ten cells, thus having a peak computation rate of 100 MFLOPS. The Warp array can be extended to include more cells to accommodate applications capable of using the increased computational bandwidth. Warp is integrated as an attached processor into a Unix host system. Programs for Warp are written in a high-level language supported by an optimizing compiler. The first ten-cell prototype was completed in February 1986; delivery of production machines started in April 1987. Extensive experimentation with both the prototype and production machines has demonstrated that the Warp architecture is effective in the application domain of robot navigation as well as in other fields such as signal processing, scientific computation, and computer vision research. For these applications, Warp is typically several hundred times faster than a VAX 11/780 class computer. This paper describes the architecture, implementation, and performance of the Warp machine. Each major architectural decision is discussed and evaluated with system, software, and application considerations. The programming model and tools developed for the machine are also described. The paper concludes with performance data for a large number of applications.
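
The linear-cell structure described above lends itself to a toy illustration. Below is a minimal Python sketch, assuming nothing about Warp's actual programming model (which used a high-level language and an optimizing compiler): a linear pipeline in which each cell applies one stage of work to the data streaming through it, plus the peak-rate arithmetic from the abstract.

# Toy model of a linear systolic array: data streams through the cells,
# and every cell applies its stage function to each item it receives.
# Illustration only; real Warp cells ran compiled high-level code.

CELL_MFLOPS = 10   # per-cell rate quoted in the abstract
NUM_CELLS = 10     # typical array size
print(f"peak = {CELL_MFLOPS * NUM_CELLS} MFLOPS")   # -> 100 MFLOPS

def systolic_pipeline(stages, stream):
    """Push each input through every cell (stage) in order."""
    for x in stream:
        for stage in stages:
            x = stage(x)
        yield x

# Hypothetical workload: ten multiply-accumulate stages.
stages = [lambda x, c=i: x * 1.01 + c for i in range(NUM_CELLS)]
print(list(systolic_pipeline(stages, [1.0, 2.0, 3.0])))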


Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) | 1998

Role model based framework design and integration

Dirk Riehle; Thomas R. Gross

Today, any large object-oriented software system is built using frameworks. Yet designing frameworks and defining their interaction with clients remains a difficult task. A primary reason is that today's dominant modeling concept, the class, is not well suited to describe the complexity of object collaborations as it emerges in framework design and integration. We use role modeling to overcome the problems and limitations of class-based modeling. Using role models, the design of a framework and its use by clients can be described succinctly and with much better separation of concerns than with classes. Using role objects, frameworks can be integrated into use-contexts that were not foreseen by their original designers.
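
As a rough sketch of the role-object idea the abstract alludes to (the class names below are invented for illustration, not taken from the paper), a core object can acquire behavior for use-contexts its class never anticipated by attaching role objects:

# Hypothetical sketch of the role-object idea: a core object gains
# context-specific behavior through attached role objects, instead of
# encoding every collaboration in its class.

class Person:
    def __init__(self, name):
        self.name = name
        self._roles = {}

    def add_role(self, key, role):
        self._roles[key] = role

    def as_role(self, key):
        return self._roles[key]

class Customer:               # role defined by a billing framework
    def __init__(self, person):
        self.person = person
        self.orders = []

class Employee:               # role defined by a payroll framework
    def __init__(self, person, salary):
        self.person = person
        self.salary = salary

p = Person("Ada")
# Attach roles from two frameworks whose use-contexts Person's
# designer never had to foresee.
p.add_role("customer", Customer(p))
p.add_role("employee", Employee(p, 100_000))
print(p.as_role("employee").salary)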


International Symposium on Computer Architecture (ISCA) | 1990

Supporting systolic and memory communication in iWarp

Shekhar Borkar; Robert Cohn; George W. Cox; Thomas R. Gross; H. T. Kung; Monica S. Lam; Margie Levine; Brian E. Moore; Wire Moore; Craig B. Peterson; Jim Susman; Jim Sutton; John Urbanski; Jon A. Webb

iWarp is a parallel architecture developed jointly by Carnegie Mellon University and Intel Corporation. The iWarp communication system supports two widely used interprocessor communication styles: memory communication and systolic communication. This paper describes the rationale, architecture, and implementation for the iWarp communication system.

The sending or receiving processor of a message can perform either memory or systolic communication. In memory communication, the entire message is buffered in the local memory of the processor before it is transmitted or after it is received; communication therefore begins or terminates at the local memory. Conventional message passing methods use memory communication on both the sending and receiving processors. In systolic communication, individual data items are transferred as they are produced, or are used as they are received, by the program running at the processor. Memory communication is flexible and well suited for general computing, whereas systolic communication is efficient and well suited for speed-critical applications.

A major achievement of the iWarp effort is the derivation of a common design that satisfies the requirements of both communication styles. This is made possible by two important innovations: (1) program access to communication and (2) logical channels. The former allows programs to access data as they are transmitted and to redirect portions of messages to different destinations efficiently. The latter increases the connectivity between the processors and guarantees communication bandwidth for classes of messages. These innovations have provided a focus for the iWarp architecture. The result is a communication system that provides a total bandwidth of 320 MBytes/sec and that is integrated on a single VLSI component with a 20 MFLOPS plus 20 MIPS long-instruction-word computation engine.
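
The contrast between the two styles can be sketched in a few lines of Python; this is a conceptual model using a thread-safe queue as the channel, not the iWarp hardware interface:

import queue, threading

link = queue.Queue()   # stands in for a physical channel

def sender(items):
    for x in items:
        link.put(x)
    link.put(None)     # end-of-message marker

def receive_memory_style():
    """Buffer the entire message in local memory, then process it."""
    buf = []
    while (x := link.get()) is not None:
        buf.append(x)
    return sum(buf)    # computation starts only after full receipt

def receive_systolic_style():
    """Consume each item as it arrives; no full-message buffer."""
    total = 0
    while (x := link.get()) is not None:
        total += x     # computation overlaps communication
    return total

threading.Thread(target=sender, args=([1, 2, 3, 4],)).start()
print(receive_memory_style())    # -> 10, after the whole message lands
threading.Thread(target=sender, args=([5, 6, 7, 8],)).start()
print(receive_systolic_style())  # -> 26, item by item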


IEEE Transactions on Software Engineering | 1998

A framework-based approach to the development of network-aware applications

Jürg Bolliger; Thomas R. Gross

Modern networks provide a QoS (quality of service) model to go beyond best-effort services, but current QoS models are oriented towards low-level network parameters (e.g., bandwidth, latency, jitter). Application developers, on the other hand, are interested in quality models that are meaningful to the end-user and therefore struggle to bridge the gap between network and application QoS models. Examples of application quality models are response time, predictability, or a budget (for transmission costs). Applications that can deal with changes in the network environment are called network-aware. A network-aware application attempts to adjust its resource demands in response to network performance variations. This paper presents a framework-based approach to the construction of network-aware programs. At the core of the framework is a feedback loop that controls the adjustment of the application to network properties. The framework provides the skeleton to address two fundamental challenges in the construction of network-aware applications: how to find out about dynamic changes in network service quality, and how to map application-centric quality measures (e.g., predictability) to network-centric quality measures (e.g., QoS models that focus on bandwidth or latency). Our preliminary experience with a prototype network-aware image retrieval system demonstrates the feasibility of our approach. The prototype illustrates that there is more to network-awareness than just taking network resources and protocols into account, and it raises questions that must be addressed (from a software engineering point of view) to make a general approach to network-aware applications useful.
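
The feedback loop at the core of the framework can be sketched as follows. The adaptation policy, the size/quality model, and all numbers below are invented for illustration, loosely inspired by the image retrieval prototype:

# Hedged sketch of the feedback-loop idea: the application adjusts its
# demands (image quality) to meet an application-centric goal
# (response time) under observed network conditions.

TARGET_RESPONSE_S = 2.0        # user-visible quality goal
quality = 1.0                  # 1.0 = full-fidelity images

def observed_bandwidth_bps():
    """Placeholder for the framework's network monitor."""
    return 400_000             # e.g., 400 kbit/s measured

def image_size_bits(q):
    return int(3_000_000 * q)  # hypothetical size/quality model

for _ in range(10):            # the feedback loop
    eta = image_size_bits(quality) / observed_bandwidth_bps()
    if eta > TARGET_RESPONSE_S:
        quality *= 0.8         # degrade fidelity to stay responsive
    elif eta < 0.5 * TARGET_RESPONSE_S:
        quality = min(1.0, quality * 1.1)
print(f"settled on quality {quality:.2f}")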


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) | 1993

Exploiting task and data parallelism on a multicomputer

Jaspal Subhlok; James M. Stichnoth; David R. O'Hallaron; Thomas R. Gross

For many applications, achieving good performance on a private memory parallel computer requires exploiting data parallelism as well as task parallelism. Depending on the size of the input data set and the number of nodes (i.e., processors), different tradeoffs between task and data parallelism are appropriate for a parallel system. Most existing compilers exploit only one of the two kinds of parallelism, so the programmer must program the data and task parallelism separately. We have taken a unified approach to exploiting both kinds of parallelism in a single framework with an existing language. This approach eases the task of programming and exposes the tradeoffs between data and task parallelism to the compiler. We have implemented a parallelizing Fortran compiler for the iWarp system based on this approach. We discuss the design of our compiler and present performance results to validate our approach.
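
The tradeoff can be made concrete with a toy cost model (the model and numbers are illustrative, not from the paper): data-parallel efficiency degrades as nodes are added, so running several independent tasks side by side on fewer nodes each can beat devoting all nodes to one task at a time:

# Toy cost model for the task/data-parallelism tradeoff: data-parallel
# efficiency falls off with node count, so running k independent tasks
# on P/k nodes each can beat one task at a time on all P nodes.

def task_time(work, nodes, overhead=0.05):
    """Time for one data-parallel task: ideal speedup plus a
    per-node communication cost (illustrative model)."""
    return work / nodes + overhead * nodes

P, ntasks, work = 64, 4, 100.0

pure_data = ntasks * task_time(work, P)   # tasks run one after another
mixed = task_time(work, P // ntasks)      # tasks run side by side
print(f"data-parallel only:   {pure_data:.2f}")   # -> 19.05
print(f"task + data parallel: {mixed:.2f}")       # ->  7.05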


International Symposium on High Performance Distributed Computing (HPDC) | 1998

A resource query interface for network-aware applications

Bruce Lowekamp; Nancy Miller; Dean Sutherland; Thomas R. Gross; Peter Steenkiste; Jaspal Subhlok

Networked systems provide a cost-effective platform for parallel computing, but the applications have to deal with the changing availability of computation and communication resources. Network-awareness is a recent attempt to bridge the gap between the realities of networks and the demands of applications. Network-aware applications obtain information about their execution environment and dynamically adapt to enhance their performance. Adaptation is especially important for synchronous parallel applications because a single busy communication link can become the bottleneck and degrade overall performance dramatically. This paper presents Remos, a uniform API that allows applications to obtain relevant network information, and reports on the development of parallel applications in this environment. The challenges in defining a uniform interface include network heterogeneity, diversity and variability in network traffic, and resource sharing in the network and even inside an application. The first implementation of the Remos interface uses SNMP to monitor IP-based networks. This paper reports on our methodology for developing adaptive parallel applications for high-speed networks with Remos and presents experimental results using applications generated by the Fx parallelizing compiler. The results highlight the importance and effectiveness of adaptive parallel computing.
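
The paper defines the actual Remos API; the sketch below is a hypothetical stand-in (all names invented) showing the kind of question such a resource query interface answers, namely where the bandwidth currently is, so an application can place communicating processes accordingly:

# Hypothetical resource-query interface in the spirit of Remos
# (the real API is defined in the paper; these names are invented).

class ResourceQuery:
    def __init__(self, measurements):
        # measurements: {(node_a, node_b): available bandwidth, Mbit/s}
        self._bw = measurements

    def bandwidth(self, a, b):
        return self._bw.get((a, b)) or self._bw.get((b, a), 0.0)

    def best_partner(self, node, candidates):
        """Pick the candidate with the most available bandwidth."""
        return max(candidates, key=lambda c: self.bandwidth(node, c))

rq = ResourceQuery({("n0", "n1"): 90.0, ("n0", "n2"): 35.0})
print(rq.best_partner("n0", ["n1", "n2"]))   # -> n1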


Journal of Parallel and Distributed Computing | 1994

Generating communication for array statements: design, implementation, and evaluation

James M. Stichnoth; David R. O'Hallaron; Thomas R. Gross

Array statements as included in Fortran 90 or High Performance Fortran (HPF) are a well-accepted way to specify data parallelism in programs. When generating code for such a data parallel program for a private memory parallel system, the compiler must determine when array elements must be moved from one processor to another. This paper describes a practical method to compute the set of array elements that are to be moved; it covers all the distributions that are included in HPF: block, cyclic, and block-cyclic. This method is the foundation for an efficient protocol for modern private memory parallel systems: for each block of data to be sent, the sender processor computes the local address in the receiver's address space, and the address is then transmitted together with the data. This strategy increases the communication load but reduces the overhead on the receiving processor. We implemented this optimization in an experimental Fortran compiler, and this paper reports an empirical evaluation on a 64-node private memory iWarp system, using a number of different distributions.
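
The core computation the compiler needs is the owner and local address of each global array element. A minimal sketch of the standard block-cyclic mapping (not necessarily the paper's exact formulation) covers all three HPF distributions, since block is cyclic(ceil(n/P)) and cyclic is cyclic(1):

def owner_and_local(i, b, P):
    """Map global index i under a cyclic(b) distribution over P
    processors to (owning processor, local index)."""
    block = i // b                    # which size-b block i falls in
    proc = block % P                  # blocks dealt out round-robin
    local = (block // P) * b + i % b  # position in the owner's memory
    return proc, local

# BLOCK over P processors is cyclic(ceil(n/P)); CYCLIC is cyclic(1).
n, P = 16, 4
for name, b in [("block", -(-n // P)), ("cyclic", 1), ("cyclic(2)", 2)]:
    print(name, [owner_and_local(i, b, P) for i in range(n)])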


IEEE Parallel & Distributed Technology: Systems & Applications | 1994

Task Parallelism in a High Performance Fortran Framework

Thomas R. Gross; David R. O'Hallaron; Jaspal Subhlok

Exploiting both data and task parallelism in a single framework is the key to achieving good performance for a variety of applications.


International Symposium on Computer Architecture (ISCA) | 1986

Warp architecture and implementation

Marco Annaratone; E. Arnould; Thomas R. Gross; H. T. Kung; Monica S. Lam; Onat Menzilcioglu; Ken Sarocky; Jon A. Webb

This paper presents the architecture and implementation of the Warp machine, a programmable systolic array developed at Carnegie Mellon University. The array consists of linearly connected cells, each a programmable processor capable of 10 MFLOPS; a typical ten-cell array delivers a peak rate of 100 MFLOPS. The paper discusses the major design decisions behind the cell and array architecture, the integration of Warp as an attached processor into a host system, and early experience with the prototype on applications such as computer vision, signal processing, and scientific computation.


Software - Practice and Experience | 1990

Structured dataflow analysis for arrays and its use in an optimizing compiler

Thomas R. Gross; Peter Steenkiste

We extend the well-known interval analysis method so that it can be used to gather global flow information for individual array elements. Data dependences between all array accesses in different basic blocks, different iterations of the same loop, and across different loops are computed and represented as labelled arcs in a program flow graph. This approach results in a uniform treatment of scalars and arrays in the compiler and builds a systematic basis from which the compiler can perform numerous global optimizations.
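
As a tiny illustration of dependences represented as labelled arcs (a contrived example, not the paper's interval-analysis algorithm): for a loop body where statement S1 writes a[i] and statement S2 reads a[i-1], there is a loop-carried flow dependence from S1 to S2 with distance 1, which the sketch below discovers by matching index expressions:

# Tiny illustration (not the paper's algorithm): record array defs and
# uses per statement and emit labelled dependence arcs between them.
#
# Loop body, index expressions written relative to iteration i:
#   S1: a[i]   = ...
#   S2: ...    = a[i-1]

defs = {"S1": lambda i: i}        # S1 writes a[i]
uses = {"S2": lambda i: i - 1}    # S2 reads a[i-1]

arcs = []
for d_stmt, d_idx in defs.items():
    for u_stmt, u_idx in uses.items():
        # Solve d_idx(i) == u_idx(i + k) for small distances k;
        # assumes linear index expressions with unit coefficient.
        for k in range(0, 4):
            if d_idx(0) == u_idx(0 + k):
                arcs.append((d_stmt, u_stmt, f"flow, distance {k}"))
print(arcs)   # -> [('S1', 'S2', 'flow, distance 1')]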

Collaboration


Dive into Thomas R. Gross's collaborations.

Top Co-Authors

Jaspal Subhlok (Carnegie Mellon University)

Peter Steenkiste (Carnegie Mellon University)

Jon A. Webb (Carnegie Mellon University)

Guei-Yuan Lueh (Carnegie Mellon University)