Marco Annaratone
Carnegie Mellon University
Publications
Featured research published by Marco Annaratone.
IEEE Transactions on Computers | 1987
Marco Annaratone; E. Arnould; Thomas R. Gross; H. T. Kung; Monica S. Lam; Onat Menzilcioglu; Jon A. Webb
The Warp machine is a systolic array computer of linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A typical Warp array includes ten cells, thus having a peak computation rate of 100 MFLOPS. The Warp array can be extended to include more cells to accommodate applications capable of using the increased computational bandwidth. Warp is integrated as an attached processor into a Unix host system. Programs for Warp are written in a high-level language supported by an optimizing compiler. The first ten-cell prototype was completed in February 1986; delivery of production machines started in April 1987. Extensive experimentation with both the prototype and production machines has demonstrated that the Warp architecture is effective in the application domain of robot navigation as well as in other fields such as signal processing, scientific computation, and computer vision research. For these applications, Warp is typically several hundred times faster than a VAX 11/780 class computer. This paper describes the architecture, implementation, and performance of the Warp machine. Each major architectural decision is discussed and evaluated with system, software, and application considerations. The programming model and tools developed for the machine are also described. The paper concludes with performance data for a large number of applications.
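The linear-array organization described above can be illustrated in software. The sketch below is not Warp code but a hypothetical Python emulation of a linear systolic FIR array: each cell holds one tap weight, partial sums move one cell per clock tick, and inputs move at half that speed (two registers per cell), the classic arrangement that lets every output collect exactly one product per cell.

```python
def systolic_fir(x, taps):
    """Emulate an FIR filter on a linear systolic array.

    Weights stay resident in the cells; inputs move right through two
    registers per cell (half speed) while partial sums move right one
    register per cell (full speed). Purely illustrative, not Warp's
    actual cell program.
    """
    n = len(taps)
    x1 = [0.0] * n   # first input register in each cell
    x2 = [0.0] * n   # second input register (the half-speed path)
    y = [0.0] * n    # partial-sum register in each cell
    out = []
    for t in range(len(x) + 2 * n):          # extra ticks flush the pipeline
        xin = x[t] if t < len(x) else 0.0
        new_x1, new_x2, new_y = [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):
            x_in = xin if i == 0 else x2[i - 1]   # from left neighbor
            y_in = 0.0 if i == 0 else y[i - 1]    # partial sum from left
            new_y[i] = y_in + taps[i] * x_in      # one MAC per cell per tick
            new_x1[i] = x_in
            new_x2[i] = x1[i]
        out.append(new_y[-1])                     # rightmost cell's output
        x1, x2, y = new_x1, new_x2, new_y
    # discard pipeline latency: the full convolution appears after n-1 ticks
    return out[n - 1 : n - 1 + len(x) + n - 1]
```

Running `systolic_fir([1, 2, 3], [1, 1])` produces the full convolution `[1, 3, 5, 3]`, matching a direct FIR computation while exercising only nearest-neighbor communication.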
international symposium on computer architecture | 1986
Marco Annaratone; E. Arnould; Thomas R. Gross; H. T. Kung; Monica S. Lam; Onat Menzilcioglu; Ken Sarocky; Jon A. Webb
This paper describes the scan line array processor (SLAP), a new architecture designed for high-performance yet low-cost image computation. A SLAP is a SIMD linear array of processors, and hence is easy to build and scales well with VLSI technology; yet appropriate special features and programming techniques make it efficient for a surprisingly wide variety of low and medium level computer vision tasks. We describe the basic SLAP concept and some of its variants, discuss a particular planned implementation, and indicate its performance on computer vision and other applications.
afips | 1987
Marco Annaratone; E. Arnould; Robert Cohn; Thomas R. Gross; H. T. Kung; Monica S. Lam; Onat Menzilcioglu; K. Sarocky; J. Senko; Jon A. Webb
The Warp machine* is a high-performance systolic array computer with a linear array of 10 or more cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A 10-cell machine has a peak performance of 100 MFLOPS. Warp is integrated into a UNIX™ host system, and program development is supported by a compiler. Two copies of a 10-cell prototype of the Warp machine became operational in 1986 and are in use at Carnegie Mellon for a wide range of applications, including low-level vision processing for robot vehicle navigation and signal processing. The success of the prototypes led to the development of a production version of the Warp machine that is implemented with printed circuit boards. At least eight copies of this machine are being built by General Electric in 1987. The first copy was delivered to Carnegie Mellon in April 1987. This paper describes the architecture of the production Warp machine and explains the changes that turned the prototype system into a mature high-performance computing engine. * Warp is a service mark of Carnegie Mellon University.
afips | 1987
Marco Annaratone; Francois J. Bitz; Jeff Deutch; H. T. Kung; Leonard Harney; P.C. Maulik; P. S. Tseng; Jon A. Webb
The prototype Warp* machine at Carnegie Mellon is being used to develop new applications in magnetic resonance image processing, as a research tool in image texture analysis, and for scientific computing. In these areas, orders of magnitude speedup over conventional computers are being observed. These new applications build on our use of Warp for low-level vision, which is the area for which the machine was originally designed. Experience with the prototype Warp machine has led to rules that programmers should follow to achieve best performance in their applications. These rules concern all levels of the Warp system, from input and output ordering to programming each individual Warp cell to memory use in Warp's host. The new printed circuit board version of Warp incorporates several architectural improvements, which lead to better support of a wider class of applications. An ambitious design for implementation of Warp in custom VLSI is underway, which promises an increase of at least a factor of ten in cost-performance over the current version of Warp, together with the opportunity to build much more powerful systolic arrays delivering GigaFLOPS performance. * Warp is a service mark of Carnegie Mellon University.
international conference on acoustics, speech, and signal processing | 1986
Marco Annaratone; E. Arnould; H. T. Kung; Onat Menzilcioglu
Warp is a programmable systolic array machine designed by CMU and built together with its industrial partners, GE and Honeywell. The first large-scale version of the machine, with an array of 10 linearly connected cells, will become operational in January 1986. Each cell in the array is capable of performing 10 million 32-bit floating-point operations per second (10 MFLOPS). The 10-cell array can achieve a performance of 50 to 100 MFLOPS for a large variety of signal processing operations such as digital filtering, image compression, and spectral decomposition. The machine, augmented by a Boundary Processor, is particularly effective for computationally expensive matrix algorithms such as solution of linear systems, QR-decomposition, and singular value decomposition, which are crucial to many real-time signal processing tasks. This paper outlines the Warp implementation of the 2-dimensional Discrete Cosine Transform and singular value decomposition.
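The 2-D DCT mentioned above is separable, which is why it maps well onto a linear array: it reduces to a 1-D DCT on every row followed by a 1-D DCT on every column (C · X · Cᵀ), i.e., two passes through the array. The sketch below is an illustrative pure-Python rendering of that row-column decomposition, not the Warp implementation itself.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix: entry [k][i] is basis k sampled at i."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    """Plain triple-loop matrix product on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct2(x):
    """2-D DCT of a square block via row-column decomposition: C @ X @ C^T.

    Each matrix product corresponds to one pass of 1-D transforms,
    the structure a linear processor array exploits.
    """
    c = dct_matrix(len(x))
    return matmul(matmul(c, x), transpose(c))
```

Because the basis is orthonormal, a constant 4x4 block of ones transforms to a single DC coefficient of 4.0, and the total signal energy is preserved, two easy sanity checks on the decomposition.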
international symposium on computer architecture | 1989
Marco Annaratone; Roland Rühl
The multiprocessor Sequent Symmetry was first delivered to customers with write-through caches. Later on, each machine was upgraded with copy-back caches. Because all other architectural parameters (main memory, bus, cache organization and size, and so on) were unchanged, it was possible to measure the performance of a multiprocessor with no caches, write-through caches, and copy-back caches. We also study the impact of the programming language (FORTRAN and C) on the performance of the machine.
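The bus-traffic difference between the two write policies can be made concrete with a toy model. The sketch below is a hypothetical, highly simplified direct-mapped cache simulator (write-allocate in both cases, one bus transaction per line fill, per dirty eviction, or per write-through store); it is not a model of the Symmetry's actual coherence protocol, but it shows why copy-back caches relieve a shared bus.

```python
def bus_traffic(addresses, writes, line_size=4, n_lines=64, policy="copy-back"):
    """Count bus transactions for a direct-mapped cache under a given
    write policy. Illustrative model only: write-allocate, no
    coherence traffic, unit-cost transactions."""
    tags = [None] * n_lines
    dirty = [False] * n_lines
    bus = 0
    for addr, is_write in zip(addresses, writes):
        line = addr // line_size
        idx = line % n_lines
        if tags[idx] != line:                      # miss
            if policy == "copy-back" and dirty[idx]:
                bus += 1                           # write back dirty victim
            bus += 1                               # fill line from memory
            tags[idx] = line
            dirty[idx] = False
        if is_write:
            if policy == "write-through":
                bus += 1                           # every store hits the bus
            else:
                dirty[idx] = True                  # defer until eviction
    return bus
```

Ten consecutive stores to one word cost eleven bus transactions write-through (one fill plus ten stores) but only one copy-back (the fill; the line stays dirty in the cache), which is the kind of contention effect the measurements above quantify.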
Real-Time Signal Processing VIII | 1986
Marco Annaratone; E. Arnould; P. K. Hsiung; H. T. Kung
A high-performance systolic array computer called Warp has been designed by CMU and is currently under construction. The full-scale machine has a systolic array of 10 or more linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). By the end of 1985 the first full-scale machine will be operational. Low-level vision processing for robots and autonomous vehicles is among the first applications of the machine. This paper describes a new boundary processor to be attached to one end of the linear systolic array in Warp. Extending Warp with this boundary processor will substantially enhance the performance and applicability of the machine. The extended machine will be efficient for new application areas such as solution of linear systems of equations and adaptive signal processing.
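One common systolic scheme for the matrix problems cited above (not necessarily the algorithm of the Warp boundary processor itself) is Givens-rotation triangularization in the Gentleman-Kung style: a boundary cell computes each rotation, and internal cells apply it as a row streams past. The Python sketch below emulates that division of labor.

```python
import math

def givens(a, b):
    """Boundary-cell operation: cosine/sine of the rotation zeroing b against a."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

def qr_givens(rows):
    """Triangularize streamed rows as a systolic triangular array would.

    Each incoming row is rotated against the stored triangle R, one
    pivot at a time: the boundary cell computes (c, s), the internal
    cells update R and the remainder of the row. Illustrative sketch.
    """
    n = len(rows[0])
    R = [[0.0] * n for _ in range(n)]
    for row in rows:
        row = list(row)
        for k in range(n):
            c, s = givens(R[k][k], row[k])        # boundary cell
            for j in range(k, n):                 # internal cells
                rkj, xj = R[k][j], row[j]
                R[k][j] = c * rkj + s * xj
                row[j] = -s * rkj + c * xj
        # after n rotations the row is fully absorbed into R
    return R
```

For the rows [3, 4] and [4, 3] this yields R = [[5, 4.8], [0, 1.4]], whose columns reproduce the norms and inner products of the input (RᵀR = AᵀA), exactly what a back-substitution step then consumes when solving linear systems.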
Archive | 1986
Marco Annaratone
The literature on MOS transistor characteristics is extensive. The purpose of this chapter is to review the fundamentals of MOS technology through the use of simplified models. A more accurate model to compute the voltage transfer function of an inverter will be introduced in Section 2.6. Most of the equations presented in this chapter will not be justified. The reader interested in a more comprehensive treatment of MOS physics should refer to the references at the end of the chapter [34,22,35,25,21].
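The kind of simplified model the chapter refers to is the first-order square-law model, which gives the NMOS drain current in three regions (cutoff, triode, saturation). The sketch below encodes it directly; the threshold voltage and transconductance values are illustrative, not taken from the book, and channel-length modulation is ignored.

```python
def nmos_id(vgs, vds, vt=0.7, k=100e-6):
    """First-order (square-law) NMOS drain current in amperes.

    vt  : threshold voltage (V) -- illustrative value
    k   : mu_n * Cox * W/L (A/V^2) -- illustrative value
    Ignores channel-length modulation and body effect.
    """
    vov = vgs - vt                          # overdrive voltage
    if vov <= 0.0:
        return 0.0                          # cutoff: no channel
    if vds < vov:
        return k * (vov - vds / 2.0) * vds  # triode (linear) region
    return 0.5 * k * vov * vov              # saturation: pinched-off channel
```

Note that the two expressions agree at the boundary vds = vgs - vt, so the modeled I-V curve is continuous, one reason this simplified model is adequate for the hand analysis the chapter performs.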
Archive | 1986
Marco Annaratone
The aim of this chapter is not to present a complete, in-depth analysis of every possible building block that can be found in today’s VLSI circuits. Rather, it deals with a limited number of circuits of increasing complexity: it starts with flip-flops and latches and ends with multipliers. More complex circuits — such as floating-point or memory management units — have not been considered, because an in-depth treatment — including architectural issues, logic design methodologies, and actual implementation — would have eventually occupied a large portion of the book.
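At the upper end of that range of building blocks, the simplest hardware multipliers implement the shift-and-add scheme: inspect one multiplier bit per cycle and conditionally add the shifted multiplicand to a running partial product. The hypothetical Python sketch below mirrors that datapath at the bit level (it is a behavioral model, not a gate-level one).

```python
def shift_add_multiply(a, b, width=8):
    """Unsigned shift-and-add multiplication, the scheme behind the
    simplest sequential hardware multipliers. Behavioral sketch:
    one multiplier bit examined per iteration ("cycle")."""
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    product = 0
    for bit in range(width):
        if (b >> bit) & 1:            # multiplier bit set this cycle?
            product += a << bit       # add multiplicand, shifted into place
    return product
```

A width-n multiplier of this kind costs n add cycles; the array and tree multipliers a chapter like this builds up to trade that latency for area by unrolling the same partial-product sum in hardware.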
Archive | 1986
Marco Annaratone
CMOS I/O buffers and bus drivers — or any circuit which is to drive a significant load both on-chip and off-chip — have not received much attention in the literature, and a comprehensive treatment of them — delay minimization, power dissipation, second-order effects, and layout techniques to minimize noise and maximize speed — is lacking. This chapter aims to fill this gap: both input and output buffers will be dealt with, from the standpoint of speed, power dissipation, noise robustness, degree of protection, etc. The design of on-chip drivers — such as bus drivers — can differ from the design of output buffers, because both transmitting and receiving stages are under the designer’s control, and, therefore, a global optimization can be effectively carried out. Nonetheless, simple and reliable approaches are still implemented — for instance, a scaled-up inverter chain. In this respect the design of on-chip drivers can be treated like the design of output buffers. Finally, an on-chip driver design that optimizes both transmitter and receiver is presented in Section 7.8.
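The scaled-up inverter chain mentioned above follows a classic sizing result: total delay is minimized when every stage is roughly e (about 2.7) times larger than the previous one, so the number of stages is about ln(C_load/C_in). The sketch below computes a chain under that first-order model (it ignores the intrinsic self-loading that in practice pushes the optimum ratio closer to 3-4).

```python
import math

def buffer_chain(c_load, c_in, stage_ratio=math.e):
    """Size a tapered inverter chain driving c_load from a gate with
    input capacitance c_in. First-order model: equal per-stage ratio,
    optimum near e; intrinsic parasitics ignored."""
    total_ratio = c_load / c_in
    # number of stages that makes each stage's ratio closest to stage_ratio
    n = max(1, round(math.log(total_ratio) / math.log(stage_ratio)))
    per_stage = total_ratio ** (1.0 / n)      # actual ratio after rounding
    sizes = [per_stage ** i for i in range(n + 1)]  # relative stage widths
    return n, per_stage, sizes
```

For example, driving a load 16x the input capacitance with a target ratio of 2 yields a 4-stage chain with relative widths 1, 2, 4, 8, 16; a single unscaled inverter driving the same load would be far slower, which is the delay-minimization trade-off the chapter analyzes in detail.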