Publication


Featured research published by John Wawrzynek.


Field-Programmable Custom Computing Machines | 1997

Garp: a MIPS processor with a reconfigurable coprocessor

J. Hauser; John Wawrzynek

Typical reconfigurable machines exhibit shortcomings that make them less than ideal for general-purpose computing. The Garp Architecture combines reconfigurable hardware with a standard MIPS processor on the same die to retain the better features of both. Novel aspects of the architecture are presented, as well as a prototype software environment and preliminary performance results. Compared to an UltraSPARC, a Garp of similar technology could achieve speedups ranging from a factor of 2 to as high as a factor of 24 for some useful applications.


Communications of the ACM | 2009

A view of the parallel computing landscape

Krste Asanovic; Rastislav Bodik; James Demmel; Tony M. Keaveny; Kurt Keutzer; John Kubiatowicz; Nelson Morgan; David A. Patterson; Koushik Sen; John Wawrzynek; David Wessel; Katherine A. Yelick

Writing programs that scale with increasing numbers of cores should be as easy as writing programs for sequential computers.


IEEE Computer | 2000

The Garp architecture and C compiler

Timothy J. Callahan; J. Hauser; John Wawrzynek

Various projects and products have been built using off-the-shelf field-programmable gate arrays (FPGAs) as computation accelerators for specific tasks. Such systems typically connect one or more FPGAs to the host computer via an I/O bus. Some have shown remarkable speedups, albeit limited to specific application domains. Many factors limit the general usefulness of such systems. Long reconfiguration times prevent the acceleration of applications that spread their time over many different tasks. Low-bandwidth paths for data transfer limit the usefulness of such systems to tasks that have a high computation-to-memory-bandwidth ratio. In addition, standard FPGA tools require hardware design expertise that is beyond the knowledge of most programmers. To help investigate the viability of connected FPGA systems, the authors designed their own architecture called Garp and experimented with running applications on it. They are also investigating whether Garp's design enables automatic, fast, effective compilation across a broad range of applications. They present their results in this article.


IEEE Design & Test of Computers | 2005

BEE2: a high-end reconfigurable computing system

Chen Chang; John Wawrzynek; Robert W. Brodersen

The Berkeley Emulation Engine 2 (BEE2) project is developing a reusable, modular, and scalable framework for designing high-end reconfigurable computers, including a processing-module building block and several programming models. Using these elements, BEE2 can provide over 10 times more computing throughput than a DSP-based system with similar power consumption and cost and over 100 times that of a microprocessor-based system.


IEEE Transactions on Neural Networks | 1993

Silicon auditory processors as computer peripherals

John Lazzaro; John Wawrzynek; Misha Mahowald; Massimo A. Sivilotti; Dave Gillespie

Several research groups are implementing analog integrated circuit models of biological auditory processing. The outputs of these circuit models have taken several forms, including video format for monitor display, simple scanned output for oscilloscope display, and parallel analog outputs suitable for data-acquisition systems. Here, an alternative output method for silicon auditory models, suitable for direct interface to digital computers, is described. As a prototype of this method, an integrated circuit model of temporal adaptation in the auditory nerve that functions as a peripheral to a workstation running Unix is described. Data from a working hybrid system that includes the auditory model, a digital interface, and asynchronous software are given. This system produces a real-time X-window display of the response of the auditory nerve model.


Field-Programmable Gate Arrays | 1999

HSRA: high-speed, hierarchical synchronous reconfigurable array

William Tsu; Kip Macy; Atul Joshi; Randy Huang; Norman Walker; Tony Tung; Omid Rowhani; Varghese George; John Wawrzynek; André DeHon

There is no inherent characteristic forcing Field-Programmable Gate Array (FPGA) or Reconfigurable Computing (RC) Array cycle times to be greater than those of processors in the same process. Modern FPGAs seldom achieve application clock rates close to their processor cousins because (1) resources in the FPGAs are not balanced appropriately for high-speed operation, (2) FPGA CAD does not automatically provide the requisite transforms to support this operation, and (3) interconnect delays can be large and vary almost continuously, complicating high-frequency mapping. We introduce a novel reconfigurable computing array, the High-Speed, Hierarchical Synchronous Reconfigurable Array (HSRA), and its supporting tools. This package demonstrates that computing arrays can achieve efficient, high-speed operation. We have designed and implemented a prototype component in a 0.4 μm logic design on a DRAM process which will support 250 MHz operation for CAD-mapped designs.


IEEE Micro | 2007

RAMP: Research Accelerator for Multiple Processors

John Wawrzynek; David A. Patterson; Mark Oskin; Shih-Lien Lu; Christoforos E. Kozyrakis; James C. Hoe; Derek Chiou; Krste Asanovic

The RAMP project's goal is to enable the intensive, multidisciplinary innovation that the computing industry will need to tackle the problems of parallel processing. RAMP itself is an open-source, community-developed, FPGA-based emulator of parallel architectures. Its design framework lets a large, collaborative community develop and contribute reusable, composable design modules. Three complete designs, for transactional memory, distributed systems, and distributed shared memory, demonstrate the platform's potential.


Design Automation Conference | 2012

Chisel: constructing hardware in a Scala embedded language

Jonathan Bachrach; Huy Vo; Brian C. Richards; Yunsup Lee; Andrew Waterman; Rimas Avizienis; John Wawrzynek; Krste Asanovic

In this paper we introduce Chisel, a new hardware construction language that supports advanced hardware design using highly parameterized generators and layered domain-specific hardware languages. By embedding Chisel in the Scala programming language, we raise the level of hardware design abstraction by providing concepts including object orientation, functional programming, parameterized types, and type inference. Chisel can generate a high-speed C++-based cycle-accurate software simulator, or low-level Verilog designed to map to either FPGAs or to a standard ASIC flow for synthesis. This paper presents Chisel, its embedding in Scala, hardware examples, and results for C++ simulation, Verilog emulation and ASIC synthesis.
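The key idea above, that embedding a hardware language in a general-purpose host language turns ordinary functions and recursion into parameterized circuit generators, can be sketched roughly as follows. This is a toy illustration in Python (Chisel itself is embedded in Scala), and every name here (Wire, Add, adder_tree, emit) is invented for this sketch, not the real Chisel API:

```python
# Toy embedded hardware-construction DSL: host-language recursion acts as a
# parameterized generator, building a balanced adder tree for any number of
# inputs. Invented names; NOT the real Chisel/chisel3 API.
from dataclasses import dataclass


@dataclass
class Wire:
    name: str
    width: int


@dataclass
class Add:
    a: object
    b: object

    @property
    def width(self):
        # One extra result bit for the carry out.
        return max(self.a.width, self.b.width) + 1


def adder_tree(sigs):
    """Generator: recursively build a balanced tree of adders."""
    if len(sigs) == 1:
        return sigs[0]
    mid = len(sigs) // 2
    return Add(adder_tree(sigs[:mid]), adder_tree(sigs[mid:]))


def emit(sig):
    """'Elaboration': walk the expression tree, emit Verilog-flavoured text."""
    if isinstance(sig, Wire):
        return sig.name
    return f"({emit(sig.a)} + {emit(sig.b)})"


def depth(sig):
    """Logic depth of the generated tree (0 for a bare input)."""
    if isinstance(sig, Wire):
        return 0
    return 1 + max(depth(sig.a), depth(sig.b))
```

An 8-input call elaborates to a tree of depth 3 (log2 of 8), with the result width growing by one bit per level; the same generator works unchanged for any input count, which is the kind of parameterization the paper attributes to host-language embedding.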


Compilers, Architecture, and Synthesis for Embedded Systems | 2000

Adapting software pipelining for reconfigurable computing

Timothy J. Callahan; John Wawrzynek

The Garp compiler and architecture have been developed in parallel, in part to help investigate whether features of the architecture help facilitate rapid, automatic compilation utilizing Garp's rapidly reconfigurable coprocessor. Previously reported work on compiling for Garp has drawn heavily on techniques from software compilation rather than high-level synthesis. That trend continues in this paper, which describes the extension of those techniques to support pipelined execution of loops on the coprocessor. Even though it targets hardware, our approach resembles VLIW software pipelining much more than it resembles hardware-synthesis retiming algorithms. This paper presents a simple, uniform schema for pipelining the hardware execution of a broad class of loops. The loops can have multiple control paths, multiple exits (including exits resulting from hyperblock path exclusion), data-dependent exits, and arbitrary memory accesses. The Garp compiler is fully implemented, and results are presented. A sample benchmark, wavelet image encoding, saw its overall speedup on accelerated loops grow from about 2 without pipelined execution to about 4 with pipelined execution.
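The core arithmetic behind software pipelining can be sketched in a few lines. The initiation interval (II) is the number of cycles between starts of successive loop iterations, and a standard resource-constrained lower bound on it (often called ResMII in the VLIW literature) is the ceiling of resource uses per iteration over available units. This is a simplified model with invented names, not the Garp compiler's actual scheduler:

```python
# Sketch of the arithmetic behind software pipelining: overlapping loop
# iterations so a new one starts every II cycles instead of waiting for
# the previous iteration to finish.
from math import ceil


def res_mii(uses_per_iter, num_units):
    """Resource-constrained lower bound on the initiation interval."""
    return ceil(uses_per_iter / num_units)


def total_cycles(n_iters, ii, latency):
    """Steady-state cost of n overlapped iterations: the first iteration
    takes `latency` cycles, and each later one starts `ii` cycles after
    its predecessor."""
    if n_iters == 0:
        return 0
    return (n_iters - 1) * ii + latency
```

For example, a loop body with an 8-cycle latency that can accept a new iteration every 2 cycles runs 100 iterations in 206 cycles rather than 800, the same order of overlap behind the roughly 2x-to-4x improvement reported for the wavelet benchmark.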


Field-Programmable Gate Arrays | 1998

Fast module mapping and placement for datapaths in FPGAs

Timothy J. Callahan; Philip Chong; André DeHon; John Wawrzynek

By tailoring a compiler tree-parsing tool for datapath module mapping, we produce good quality results for datapath synthesis in very fast run time. Rather than flattening the design to gates, we preserve the datapath structure; this allows exploitation of specialized datapath features in FPGAs, retains regularity, and also results in a smaller problem size. To further achieve high mapping speed, we formulate the problem as tree covering and solve it efficiently with a linear-time dynamic programming algorithm. In a novel extension to the tree-covering algorithm, we perform module placement simultaneously with the mapping, still in linear time. Integrating placement has the potential to increase the quality of the result since we can optimize total delay including routing delays. To our knowledge this is the first effort to leverage a grammar-based tree covering tool for datapath module mapping. Further, it is the first work to integrate simultaneous placement with module mapping in a way that preserves linear time complexity.
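Tree covering by dynamic programming, the formulation named above, can be sketched compactly: the cheapest cover of a subtree is the cheapest matching pattern plus the best covers of the subtrees left at that pattern's leaves. The node types and "module library" costs below are invented for illustration, not the paper's actual library:

```python
# Tree covering by dynamic programming. Each node is examined a constant
# number of times, so the pass is linear in the size of the expression tree.
from dataclasses import dataclass


@dataclass
class Leaf:
    name: str


@dataclass
class Add:
    l: object
    r: object


@dataclass
class Mul:
    l: object
    r: object


# Invented module costs: plain adder, plain multiplier, and a fused
# multiply-add module that covers Add(Mul(a, b), c) in one shot.
ADD_COST, MUL_COST, MAC_COST = 2, 5, 6


def min_cost(node):
    """Minimum cost to cover the subtree rooted at `node` with modules."""
    if isinstance(node, Leaf):
        return 0
    if isinstance(node, Add) and isinstance(node.l, Mul):
        # Either the fused multiply-add pattern or two separate modules.
        fused = MAC_COST + min_cost(node.l.l) + min_cost(node.l.r) + min_cost(node.r)
        split = ADD_COST + min_cost(node.l) + min_cost(node.r)
        return min(fused, split)
    if isinstance(node, Add):
        return ADD_COST + min_cost(node.l) + min_cost(node.r)
    return MUL_COST + min_cost(node.l) + min_cost(node.r)
```

With these costs, a * b + c is covered by the fused module (cost 6) rather than a separate multiplier and adder (cost 7); the paper's contribution is doing this kind of covering, plus simultaneous placement, over datapath modules rather than gates.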

Collaboration


Dive into John Wawrzynek's collaboration.

Top Co-Authors

Krste Asanovic, University of California
André DeHon, University of Pennsylvania
John Lazzaro, University of California
Nelson Morgan, University of California
Mingjie Lin, University of Central Florida
Randy Huang, University of California
James Beck, International Computer Science Institute
Jan M. Rabaey, University of California
Shaoyi Cheng, University of California