Vaughn Betz | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Vaughn Betz is active.

Explore More

Publication

Featured researches published by Vaughn Betz.

Archive | 1999

Architecture and CAD for Deep-Submicron FPGAs

Vaughn Betz; Jonathan Rose; Alexander Marquardt

From the Publisher: Architecture and CAD for Deep-Submicron FPGAs addresses several key issues in the design of high-performance FPGA architectures and CAD tools, with particular emphasis on issues that are important for FPGAs implemented in deep-submicron processes. Three factors combine to determine the performance of an FPGA: the quality of the CAD tools used to map circuits into the FPGA, the quality of the FPGA architecture, and the electrical (i.e. transistor-level) design of the FPGA. Architecture and CAD for Deep-Submicron FPGAs examines all three of these issues in concert.

field programmable logic and applications | 1997

VPR: A new packing, placement and routing tool for FPGA research

Vaughn Betz; Jonathan Rose

We describe the capabilities of and algorithms used in a ne w FPGA CAD tool, Versatile Place and Route (VPR). In terms of minimizing routing area, VPR outperforms all published FPGA place and route tools to which we can compare. Although the algorithms used are based on pre viously known approaches, we present several enhancements that improve run-time and quality. We present placement and routing results on a new set of lar ge circuits to allo w future benchmark comparisons of FPGA place and route tools on circuit sizes more typical of today’s industrial designs. VPR is capable of targeting a broad range of FPGA architectures, and the source code is publicly available. It and the associated netlist translation / clustering tool VPACK ha ve already been used in a number of research projects w orldwide, and should be useful in many areas of FPGA architecture research.

field programmable gate arrays | 1999

Using cluster-based logic blocks and timing-driven packing to improve FPGA speed and density

Alexander Marquardt; Vaughn Betz; Jonathan Rose

In 1999, most commercial FPGAs, like the Altera Flex and Xilinx Virtex FPGAs already had cluster-based logic blocks. However, the modeling and evaluation of these sorts of architectures was still in its infancy. In the previous year, Betz had shown that cluster-based logic blocks led to improved density. The real advantage of clustered-based logic blocks, though, was speed, as this paper demonstrates. In doing so, this paper opened up an entirely new research area, setting the framework for numerous packing algorithms that have become a fundamental part of any FPGA CAD flow.

field programmable gate arrays | 2000

Timing-driven placement for FPGAs

Alexander Marquardt; Vaughn Betz; Jonathan Rose

In this paper we introduce a new Simulated Annealing-based timing-driven placement algorithm for FPGAs. This paper has three main contributions. First, our algorithm employs a novel method of determining source-sink connection delays during placement. Second, we introduce a new cost function that trades off between wire-use and critical path delay, resulting in significant reductions in critical path delay without significant increases in wire-use. Finally, we combine connection-based and path-based timing-analysis to obtain an algorithm that has the low time-complexity of connection-based timing-driven placement, while obtaining the quality of path-based timing-driven placement. A comparison of our new algorithm to a well known non-timing-driven placement algorithm demonstrates that our algorithm is able to increase the post-place-and-route speed (using a full path-based timing-driven router and a realistic routing architecture) of 20 MCNC benchmark circuits by an average of 42%, while only increasing the minimum wiring requirements by an average of 5%.

field programmable gate arrays | 2005

The Stratix II logic and routing architecture

David Lewis; Elias Ahmed; Gregg William Baeckler; Vaughn Betz; Mark Bourgeault; David Cashman; David Galloway; Michael D. Hutton; Christopher F. Lane; Andy L. Lee; Paul Leventis; Sandy Marquardt; Cameron McClintock; Ketan Padalia; Bruce B. Pedersen; Giles Powell; Boris Ratchev; Srinivas T. Reddy; Jay Schleicher; Kevin Stevens; Richard Yuan; Richard G. Cliff; Jonathan Rose

This paper describes the Altera Stratix II™ logic and routing architecture. This architecture features a novel adaptive logic module (ALM) that is based on a 6-LUT, but can be partitioned into two smaller LUTs to efficiently implement circuits containing a range of LUT sizes that arises in conventional synthesis flows. This provides a performance increase of 15% in the Stratix II architecture while reducing area by 2%. The ALM also includes a more powerful arithmetic structure that can perform two bits of arithmetic per ALM, and perform a sum of up to three inputs. The routing fabric adds a new set of fast inputs to the routing multiplexers for another 3% improvement in performance, while other improvements in routing efficiency cause another 6% reduction in area. These changes in combination with other circuit and architecture changes in Stratix II contribute 27% of an overall 51% performance improvement (including architecture and process improvement). The architecture changes reduce area by 10% in the same process, and by 50% after including process migration.

field programmable gate arrays | 1999

FPGA routing architecture: segmentation and buffering to optimize speed and density

Vaughn Betz; Jonathan Rose

In this work we investigate the routing architecture of FPGAs, focusing primarily on determining the best distribution of routing segment lengths and the best mix of pass transistor and tri-state buffer routing switches. While most commercial FPGAs contain many length 1 wires (wires that span only one logic block) we find that wires this short lead to FPGAs that are inferior in terms of both delay and routing area. Our results show instead that it is best for FPGA routing segments to have lengths of 4 to 8 logic blocks. We also show that 50% to 80% of the routing switches in an FPGA should be pass transistors, with the remainder being tri-state buffers. Architectures that employ the best segmentation distributions and the best mixes of pass transistor and tri-state buffer switches found in this paper are not only 11% to 18% faster than a routing architecture very similar to that of the Xilinx XC4000X but also considerably simpler. These results are obtained using an architecture investigation infrastructure that contains a fully timing-driven router and detailed area and delay models.

field programmable gate arrays | 2003

The stratixπ routing and logic architecture

David Lewis; Vaughn Betz; David Jefferson; Andy L. Lee; Christopher F. Lane; Paul Leventis; Sandy Marquardt; Cameron McClintock; Bruce B. Pedersen; Giles Powell; Srinivas T. Reddy; Chris Wysocki; Richard G. Cliff; Jonathan Rose

This paper describes the Altera Stratix logic and routing architecture. The primary goals of the architecture were to achieve high performance and logic density. We give an overview of the entire device, and then focus on the logic and routing architecture. The Stratix logic architecture is based on a cluster of ten 4-input LUTs and its routing consists of staggered routing lines. We describe the development of the routing architecture, including its directional bias, its direct-drive routing which reduces both area and delay. The logic array block and logic cell design is also described, and new routing structures with in the logic array block, and logic element features are described.

ACM Transactions on Reconfigurable Technology and Systems | 2014

VTR 7.0: Next Generation Architecture and CAD System for FPGAs

Jason Luu; Jeffrey B. Goeders; Michael Wainberg; Andrew Somerville; Thien Yu; Konstantin Nasartschuk; Miad Nasr; Sen Wang; Tim X. Liu; Nooruddin Ahmed; Kenneth B. Kent; Jason Helge Anderson; Jonathan Rose; Vaughn Betz

Exploring architectures for large, modern FPGAs requires sophisticated software that can model and target hypothetical devices. Furthermore, research into new CAD algorithms often requires a complete and open source baseline CAD flow. This article describes recent advances in the open source Verilog-to-Routing (VTR) CAD flow that enable further research in these areas. VTR now supports designs with multiple clocks in both timing analysis and optimization. Hard adder/carry logic can be included in an architecture in various ways and significantly improves the performance of arithmetic circuits. The flow now models energy consumption, an increasingly important concern. The speed and quality of the packing algorithms have been significantly improved. VTR can now generate a netlist of the final post-routed circuit which enables detailed simulation of a design for a variety of purposes. We also release new FPGA architecture files and models that are much closer to modern commercial architectures, enabling more realistic experiments. Finally, we show that while this version of VTR supports new and complex features, it has a 1.5× compile time speed-up for simple architectures and a 6× speed-up for complex architectures compared to the previous release, with no degradation to timing or wire-length quality.

field programmable gate arrays | 1998

A fast routability-driven router for FPGAs

Jordan S. Swartz; Vaughn Betz; Jonathan Rose

Three factors are driving the demand for rapid FPGA compilation. First, as FPGAs have grown in logic capacity, the compile computation has grown more quickly than the compute power of the available computers. Second, there exists a subset of users who are willing to pay for very high speed compile with a decrease in quality of result, and accordingly being required to use a larger FPGA or use more real-estate on a given FPGA than is otherwise necessary. Third, very high speed compile has been a long-standing desire of those using FPGA-based custom computing machines, as they want compile times at least closer to those of regular computers. This paper focuses on the routing phase of the compile process, and in particular on routability-driven routing (as opposed to timing-driven routing). We present a routing algorithm and routing tool that has three unique capabilities relating to very high-speed compile:For a “low stress” routing problem (which we define as the case where the track supply is at least 10% greater than the minimun number of tracks per channel actually needed to route a circuit) the routing time is very fast. For example, the routing phase (after the netlist is parsed and the routing graph is constructed) for a 20,000 LUT/FF pair circuit with 30% extra tracks is only 23 seconds on a 300 MHz Sparcstation. For low-stress routing problems the routing time is near-linear in the size of the circuit, and the linearity constant is very small: 1.1 ms per LUT/FF pair, or roughly 55,000 LUT/FF pairs per minute. For more difficult routing problems (where the track supply is close to the minimum needed) we provide a method that quickly identifies and subdivides this class into two sub-classes: (i) those circuits which are difficult (but possible) to route and will take significantly more time than low-stress problems, and (ii) those circuits which are impossible to route. In the first case the user can choose to continue or reduce the amount of logic; in the second case the user is forced to reduce the amount of logic or obtain a larger FPGA.

custom integrated circuits conference | 1997

Cluster-based logic blocks for FPGAs: area-efficiency vs. input sharing and size

Vaughn Betz; Jonathan Rose

While modern FPGAs often contain clusters of 4-input lookup tables and flip flops, little is known about good choices for two key architectural parameters: the number of these basic logic elements (BLEs) in each cluster, and the total number of distinct inputs that the programmable routing can provide to each cluster. In this paper we explore the effect of these parameters on FPGA area-efficiency. We show that a cluster containing N BLEs needs only 2N+2 distinct inputs (vs. the 4N maximum) to achieve complete logic utilization. Secondly, we find that a cluster size of 4 is most area-efficient, and leads to an FPGA that is 5-10% more area-efficient than an FPGA based on a single BLE logic block.

Explore More