
Publications


Featured research published by Alvin M. Despain.


international symposium on computer architecture | 1978

X-Tree: A tree structured multi-processor computer architecture

Alvin M. Despain; David A. Patterson

The problem of organizing multiple, monolithic microprocessors into an effective general purpose computer structure is examined. A tree structure with extra interconnections was found to be especially attractive. It provides a structured hierarchy for control, addressing and message routing. More important, it appears to provide a mechanism to automatically migrate data abstractions and processes over the network of processors. The network can be expanded to any desired size and no global control or routing mechanisms are needed. The potential advantages and disadvantages of the X-Tree structure are discussed and the results of some static simulations are presented.
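
The abstract does not give the routing rule itself, but the flavor of hierarchical addressing and routing in a tree can be sketched. The toy below assumes plain binary-tree (heap-order) node addresses and ignores X-Tree's extra interconnections; it is illustrative only, not the paper's algorithm.

```python
def route(src: int, dst: int) -> list[int]:
    """Hop-by-hop path from src to dst in a binary tree whose nodes are
    numbered in heap order (root = 1, children of n are 2n and 2n + 1)."""
    up, down = [src], [dst]
    a, b = src, dst
    # Walk both endpoints toward the root until they meet at the lowest
    # common ancestor; the two walks concatenated form the route.
    while a != b:
        if a > b:
            a //= 2
            up.append(a)
        else:
            b //= 2
            down.append(b)
    return up + down[::-1][1:]   # drop the duplicated ancestor

print(route(11, 6))   # [11, 5, 2, 1, 3, 6]
```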


international symposium on computer architecture | 1986

Multiprocessor cache synchronization: issues, innovations, evolution

Philip Bitar; Alvin M. Despain

Many options are possible in a cache synchronization (or consistency) scheme for a broadcast system. We clarify basic concepts, analyze the handling of shared data, and then describe a protocol that we are currently exploring. Finally, we analyze the evolution of options that have been proposed under write-in (or write-back) policy. We show how our protocol extends this evolution with new methods for efficient busy-wait locking, waiting, and unlocking. The lock scheme allows locking and unlocking to occur in zero time, eliminating the need for test-and-set. The scheme also integrates processor atomic read-modify-write instructions and programmer/compiler busy-wait-synchronized operations under the same mechanism. The wait scheme eliminates all unsuccessful retries from the bus, and allows a process to work while waiting.
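
The protocol itself is not reproduced in the abstract. As a rough software-level illustration of the property that waiting generates no unsuccessful retries on the bus, the sketch below shows a generic list-based queue lock in which each waiter spins on a flag in its own node and the lock is handed off directly on release. It is not Bitar and Despain's scheme, and the threading.Lock merely stands in for an atomic swap.

```python
import threading

class QNode:
    __slots__ = ("locked", "next")
    def __init__(self):
        self.locked = False
        self.next = None

class QueueLock:
    """Waiters form an explicit queue; each spins only on a flag in its own
    node, so waiting causes no repeated traffic on a shared location."""
    def __init__(self):
        self.tail = None
        self._swap = threading.Lock()    # stands in for an atomic swap/CAS

    def acquire(self, me: QNode):
        me.locked, me.next = True, None
        with self._swap:                 # atomically swap ourselves into tail
            pred, self.tail = self.tail, me
        if pred is not None:
            pred.next = me               # link in behind our predecessor
            while me.locked:             # spin on our own flag only
                pass

    def release(self, me: QNode):
        with self._swap:
            if self.tail is me:          # nobody queued behind us
                self.tail = None
                return
        while me.next is None:           # successor is still linking itself in
            pass
        me.next.locked = False           # hand the lock off directly
```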


international symposium on computer architecture | 1990

Fast Prolog with an extended general purpose architecture

Bruce K. Holmer; Barton Sano; Michael J. Carlton; Peter Van Roy; Ralph Clarke Haygood; William R. Bush; Alvin M. Despain; Joan M. Pendleton; Tep P. Dobry

Most Prolog machines have been based on specialized architectures. Our goal is to start with a general purpose architecture and determine a minimal set of extensions for high performance Prolog execution. We have developed both the architecture and optimizing compiler simultaneously, drawing on results of previous implementations. We find that most Prolog specific operations can be done satisfactorily in software; however, there is a crucial set of features that the architecture must support to achieve the best Prolog performance. The emphasis of this paper is on our architecture and instruction set. The costs and benefits of the special architectural features and instructions are analyzed. Simulated performance results are presented and indicate a peak compiled Prolog performance of 3.68 million logical inferences per second.
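
One example of a "Prolog-specific operation done in software" is dereferencing tagged heap cells (following chains of bound variables); hardware tag support mainly speeds up loops of this shape. The sketch below uses a simplified two-tag encoding invented for illustration, not the actual data formats of the machine described in the paper.

```python
# Heap cells are (tag, value) pairs; a REF cell holds a heap address, and an
# unbound variable is conventionally a REF cell that points to itself.
REF, CON = "REF", "CON"          # simplified tag set, for illustration only

def deref(heap, addr):
    """Follow a chain of bound REF cells to the representative cell.
    In software this is a loop of tag tests and loads; architectural tag
    support effectively folds the tag test into the memory access."""
    tag, val = heap[addr]
    while tag == REF and val != addr:    # bound variable: keep following
        addr = val
        tag, val = heap[addr]
    return addr, tag, val

heap = [
    (REF, 1),      # cell 0: variable bound to cell 1
    (REF, 2),      # cell 1: variable bound to cell 2
    (CON, "foo"),  # cell 2: the constant foo
    (REF, 3),      # cell 3: an unbound variable (self-reference)
]
print(deref(heap, 0))   # (2, 'CON', 'foo')
print(deref(heap, 3))   # (3, 'REF', 3)
```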


international symposium on microarchitecture | 1984

Design decisions influencing the microarchitecture for a Prolog machine

Tep P. Dobry; Yale N. Patt; Alvin M. Despain

The PLM-1 is the first step in the hardware implementation of a heterogeneous MIMD processor for logic programming. This paper describes its ISP architecture, and discusses in detail some of the design decisions relative to its microarchitecture.


design automation conference | 1986

Delay Reduction Using Simulated Annealing

Jonathan D. Pincus; Alvin M. Despain

The MOST program chooses appropriate sizes for transistors in a VLSI schematic to meet specified delay criteria. A simulated annealing algorithm is used in conjunction with a timing analyzer, both written in Prolog. A screening function takes advantage of the symbolic equations provided by the timing analyzer to reject clearly inappropriate choices, so full timing analysis is performed less frequently. Despite running in an interpreted Prolog, performance gains of over 50% versus an unsized circuit can be attained in less than 10 CPU minutes.
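
As a rough sketch of the approach (simulated annealing on transistor widths, with a cheap screening step that rejects clearly bad moves before the expensive timing analysis is run), the toy below substitutes an invented RC-chain delay model for MOST's symbolic timing analyzer; names and constants are illustrative only.

```python
import math, random

def full_delay(widths):
    """Stand-in for the full timing analysis (the expensive call).
    Toy RC chain: each stage drives the next stage's input capacitance."""
    loads = widths[1:] + [4.0]                  # last stage drives a fixed load
    return sum((1.0 / w) * (l + 0.5 * w) for w, l in zip(widths, loads))

def screen(widths, i, new_w):
    """Cheap local estimate: re-evaluate only the stages the change touches,
    to reject clearly inappropriate moves without a full analysis."""
    trial = widths[:]
    trial[i] = new_w
    lo, hi = max(i - 1, 0), min(i + 1, len(widths) - 1)
    return full_delay(trial[lo:hi + 1]) - full_delay(widths[lo:hi + 1])

def anneal(widths, temp=1.0, cooling=0.98, steps=5000):
    best, best_d = widths[:], full_delay(widths)
    cur, cur_d = widths[:], best_d
    for _ in range(steps):
        i = random.randrange(len(cur))
        new_w = max(1.0, cur[i] * random.uniform(0.7, 1.3))
        if screen(cur, i, new_w) > temp:        # clearly bad: skip full analysis
            temp *= cooling
            continue
        trial = cur[:]
        trial[i] = new_w
        d = full_delay(trial)                   # the expensive evaluation
        if d < cur_d or random.random() < math.exp((cur_d - d) / temp):
            cur, cur_d = trial, d
            if d < best_d:
                best, best_d = trial[:], d
        temp *= cooling
    return best, best_d

sizes, delay = anneal([1.0] * 6)
print([round(w, 2) for w in sizes], round(delay, 3))
```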


international symposium on computer architecture | 1987

Performance studies of a parallel Prolog architecture

Barry S. Fagin; Alvin M. Despain

This paper presents a new multiprocessor architecture for the parallel execution of logic programs, developed as part of the Aquarius Project. This architecture is designed to support AND-parallelism, OR-parallelism, and intelligent backtracking. We present the most comprehensive experimental results available to date on combined AND-parallelism, OR-parallelism, and intelligent backtracking in Prolog programs. Simulation results indicate that most Prolog programs in use today cannot effectively make use of multiprocessing.
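
For readers unfamiliar with the terminology, the sketch below illustrates OR-parallelism only: the alternatives at a single choice point are explored by independent workers and the solutions merged. The N-queens example and the split at the first row are invented for illustration and say nothing about the Aquarius architecture itself.

```python
from concurrent.futures import ProcessPoolExecutor

def queens(n, placed):
    """Sequential backtracking below the split point."""
    row = len(placed)
    if row == n:
        return [placed]
    out = []
    for col in range(n):
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(placed)):
            out.extend(queens(n, placed + (col,)))
    return out

def or_parallel_queens(n):
    """OR-parallelism: each alternative at the first choice point (the column
    of the first queen) is explored by a separate worker."""
    with ProcessPoolExecutor() as pool:
        branches = pool.map(queens, [n] * n, [(c,) for c in range(n)])
    return [s for branch in branches for s in branch]

if __name__ == "__main__":
    print(len(or_parallel_queens(8)))   # 92 solutions
```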


international symposium on microarchitecture | 1985

Compiling Prolog into microcode: a case study using the NCR/32-000

Barry S. Fagin; Yale N. Patt; Vason P. Srini; Alvin M. Despain

A proven method of obtaining high performance for Prolog programs is to first translate them into the instruction set of Warren's Abstract Machine, or W-code [1]. From that point, there are several models of execution available. This paper describes one of them: the compilation of W-code directly into the vertical microcode of a general purpose host processor, the NCR/32-000. The result is the fastest functioning Prolog system known to the authors. We describe the implementation, provide benchmark measurements, and analyze our results.
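
Compiling W-code into microcode, as opposed to interpreting it, amounts to macro-expanding each abstract instruction into a fixed sequence of host micro-operations. The sketch below uses invented micro-operation names and a drastically simplified instruction set; it only illustrates the expansion step, not the NCR/32-000 implementation.

```python
# Hypothetical micro-operations, for illustration only; the real W-code (WAM)
# instruction set and the NCR/32-000 microcode are far richer.
MICROCODE = {
    "put_constant": ["load_imm", "store_arg"],
    "get_constant": ["load_arg", "deref", "compare_tag", "trap_if_fail"],
    "proceed":      ["jump_continuation"],
}

def expand(wcode):
    """Macro-expand a W-code sequence into one flat microcode stream,
    instead of dispatching on each opcode at run time."""
    stream = []
    for op, *args in wcode:
        for micro in MICROCODE[op]:
            stream.append((micro, *args))
    return stream

clause = [("put_constant", "a", 1), ("get_constant", "b", 2), ("proceed",)]
for micro_op in expand(clause):
    print(micro_op)
```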


international conference on supercomputing | 1988

A two-tier memory architecture for high-performance multiprocessor systems

Tam M. Nguyen; Vason P. Srini; Alvin M. Despain

Performance of high-speed multiprocessor systems is limited by the available bandwidth to memory and the need to synchronize write-sharable data. This paper presents a new memory system that separates synchronization-related data from others. The memory system has two tiers: synchronization memory and high bandwidth (HB) memory. The synchronization memory consists of snooping caches connected to a bus and is used to store synchronization variables such as locks and semaphores. The HB memory is used to store the bulk of the application program code and data. It contains caches and a high bandwidth interconnection network to memory, such as the crossbar, but does not have full snooping among caches.

The two-tier memory system has been evaluated by analyzing the memory behavior of the simulated parallel execution of Prolog programs. Initial results indicate that the two-tier memory system potentially reduces memory interference and speeds up synchronization. Three different schemes have been studied for the caches on the HB memory and the results are presented. The two-tier memory system has potential applications in areas where synchronization is light to medium and local data is often accessed.
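
The placement policy the abstract implies can be caricatured as a routing decision made at allocation time: synchronization variables go to the small snooped tier, everything else to the high-bandwidth tier. The object kinds and tier labels below are invented for the sketch.

```python
# Illustrative placement policy only; the classification of object kinds is
# invented and much coarser than a real system would use.
SYNC_TIER = "snooping-cache bus"      # small, coherent: locks and semaphores
HB_TIER   = "crossbar to HB memory"   # high bandwidth, no full snooping

def place(obj_kind):
    """Route an allocation to the tier suited to its sharing behavior."""
    if obj_kind in {"lock", "semaphore", "barrier", "flag"}:
        return SYNC_TIER
    return HB_TIER            # code, stacks, and bulk program data

for kind in ["lock", "heap_array", "semaphore", "code_segment"]:
    print(f"{kind:12s} -> {place(kind)}")
```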


Journal of Logic Programming | 1996

Design and analysis of hardware for high-performance Prolog

Bruce K. Holmer; Barton Sano; Michael J. Carlton; Peter Van Roy; Alvin M. Despain

Most Prolog machines have been based on specialized architectures. Our goal is to start with a general-purpose architecture and determine a minimal set of extensions for high-performance Prolog execution. We have developed both the architecture and optimizing compiler simultaneously, drawing on results of previous implementations. We find that most Prolog-specific operations can be done satisfactorily in software; however, there is a crucial set of features that the architecture must support to achieve the best Prolog performance. In this paper, the costs and benefits of special architectural features and instructions are analyzed. In addition, we study the relationship between the strength of compiler optimization and the benefit of specialized hardware. We demonstrate that our base architecture can be extended to include explicit support for Prolog with a modest increase in chip area (13%), and yet attain a significant performance benefit (60–70%). Experiments using optimized code that approximates the output of future optimizing compilers indicate that special hardware support can still provide a performance benefit of 30–35%. The microprocessor described here, the VLSI-BAM, has been fabricated and incorporated into a working test system.


Journal of Parallel and Distributed Computing | 1985

Fast Fourier transform processors using Gaussian residue arithmetic

Alvin M. Despain; Allen M. Peterson; Oscar S. Rothaus; Erling H. Wold

Residue arithmetic using moduli which are Gaussian primes is suggested as a method for computing the fast Fourier transform (FFT). For complex operations, a substantial savings in hardware and time over residue arithmetic using real moduli can be obtained using complex moduli. Gaussian residue arithmetic is discussed and methods for conversion into and out of the complex residue representation are developed. This representation lends itself to table-driven computation, allowing very low latency designs to be developed. A 64-point Cooley-Tukey style processor and a 60-point Rader-Winograd style processor using this technique are described and compared. Hardware savings are realized by approximating the rotations necessary to perform the FFT by small complex integers and by scaling the results of the computation at an intermediate point. It is shown that further hardware reductions can be made by developing custom integrated circuits at the expense of latency.
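
The underlying arithmetic trick can be sketched with small numbers: for a rational prime p ≡ 1 (mod 4), p splits into a pair of conjugate Gaussian primes, and arithmetic on complex integers modulo such a prime reduces to ordinary integer arithmetic mod p by sending i to a square root of -1 mod p. The values below are invented for illustration and are unrelated to the 64- and 60-point designs in the paper.

```python
# Sketch of the key identity: complex multiplication collapses to a single
# real multiplication modulo p once i is replaced by j with j*j ≡ -1 (mod p).
p = 13
j = next(x for x in range(2, p) if (x * x + 1) % p == 0)   # j = 5

def to_residue(z):
    """Map the complex integer z = x + yi to its residue (x + y*j) mod p."""
    return (int(z.real) + int(z.imag) * j) % p

z1, z2 = complex(2, 3), complex(1, 4)
lhs = to_residue(z1 * z2)                    # residue of the complex product
rhs = (to_residue(z1) * to_residue(z2)) % p  # one real multiply mod p
print(lhs, rhs)                              # both 6: the map is a ring hom
```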

Collaboration


Dive into Alvin M. Despain's collaborations.

Top Co-Authors

Yale N. Patt (University of Texas at Austin)
Patrick C. McGeer (Lawrence Berkeley National Laboratory)
Tep P. Dobry (University of California)
Vason P. Srini (University of California)
Tam M. Nguyen (University of California)
Peter Van Roy (Université catholique de Louvain)
Gino Cheng (University of California)