Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Brent E. Nelson is active.

Publication


Featured research published by Brent E. Nelson.


field-programmable custom computing machines | 1999

A CAD suite for high-performance FPGA design

Brad L. Hutchings; Peter Bellows; Joseph Hawkins; K. Scott Hemmert; Brent E. Nelson; Mike Rytting

This paper describes the current status of a suite of CAD tools designed specifically for designers developing high-performance configurable-computing applications. The basis of this tool suite is JHDL, a design tool originally conceived as a way to experiment with Run-Time Reconfigured (RTR) designs. What began as a limited experiment to model RTR designs with Java, however, has evolved into a comprehensive suite of design tools and verification aids, which have been used successfully to implement high-performance applications in Automated Target Recognition (ATR), sonar beamforming, and general image processing on configurable-computing systems.


field-programmable logic and applications | 2011

RapidSmith: Do-It-Yourself CAD Tools for Xilinx FPGAs

Christopher Lavin; Marc Padilla; Jaren Lamprecht; Philip Lundrigan; Brent E. Nelson; Brad L. Hutchings

Creating CAD tools for commercial FPGAs is a difficult task. Closed proprietary device databases and unsupported interfaces are largely to blame for the lack of CAD research on commercial architectures versus hypothetical architectures. This paper formally introduces RapidSmith, a new set of tools and APIs that enable CAD tool creation for Xilinx FPGAs. Based on the Xilinx Design Language (XDL), RapidSmith provides a compact yet fast device database with hundreds of APIs that enable the creation of placers, routers, and other tools for Xilinx devices. RapidSmith alleviates several of the difficulties of using XDL, and this work demonstrates the kinds of research facilitated by removing such challenges.


field-programmable custom computing machines | 2001

Instrumenting Bitstreams for Debugging FPGA Circuits

Paul S. Graham; Brent E. Nelson; Brad L. Hutchings

Since FPGAs are frequently used to improve the time to market for products, shortening the time needed to validate and debug FPGA designs is important. This paper discusses how directly instrumenting FPGA programming data, or bitstreams, with debugging hardware can improve debugging productivity for designers and thus reduce a design's time to market. We also provide some background on the current state of the art in debugging FPGA designs and describe how bitstream instrumentation can be automated using JHDL, JBits, and JRoute. When instrumenting designs with embedded logic analyzers at the bitstream level, we have observed design-modification speed-ups ranging from about 6 to 19 times over more conventional techniques. We also briefly mention other applications of bitstream modification in debugging FPGA designs.


field-programmable custom computing machines | 2003

Reconfigurable computing application frameworks

Anthony Lynn Slade; Brent E. Nelson; Brad L. Hutchings

FPGA-based (field programmable gate array) configurable computing machines (CCMs) offer powerful and flexible general-purpose computing platforms. However, development for FPGA-based designs using modern CAD (computer aided design) tools is geared mainly toward an ASIC-like process. This is inadequate for the needs of CCM application development. This paper discusses an application framework for developing CCM-based applications beyond just the hardware configuration. This framework leverages the advantages of CCMs (availability, programmability, visibility, and controllability) to help create CCM-based applications throughout the entire development process (i.e. design, debug, and deploy). The framework itself is deployed with the final application, thus permitting dynamic circuit configurations that include data folding optimizations based on user input. The resulting system aids in creating applications that are potentially more intuitive, easier to develop, and better performing. An example application demonstrates the use of the application framework and the potential benefits.


field-programmable custom computing machines | 2011

HMFlow: Accelerating FPGA Compilation with Hard Macros for Rapid Prototyping

Christopher Lavin; Marc Padilla; Jaren Lamprecht; Philip Lundrigan; Brent E. Nelson; Brad L. Hutchings

The FPGA compilation process (synthesis, map, place, and route) is a time-consuming task that severely limits designer productivity. Compilation time can be reduced by saving implementation data in the form of hard macros. Hard macros consist of previously synthesized, placed, and routed circuits that enable rapid design assembly because of the native FPGA circuitry (primitives and nets) which they encapsulate. This work presents results from creating a new FPGA design flow based on hard macros called HMFlow. HMFlow has shown speedups of 10-50X over the fastest configuration of the Xilinx tools. Designed for rapid prototyping, HMFlow achieves these speedups by utilizing no more than 50 percent of the resources on an FPGA and produces implementations that run 2-4X slower than those produced by Xilinx. These speedups are obtained on a wide range of benchmark designs, with some exceeding 18,000 slices on a Virtex 4 LX200.


field-programmable logic and applications | 2001

Using Design-Level Scan to Improve FPGA Design Observability and Controllability for Functional Verification

Timothy Wheeler; Paul S. Graham; Brent E. Nelson; Brad L. Hutchings

This paper describes a structured technique for providing full observability and controllability when functionally debugging FPGA designs in hardware, capabilities that are not otherwise available. Similar in concept to flip-flop scan chains for VLSI, our design-level scan technique includes all FPGA flip-flops and RAMs in a serial scan chain built from FPGA logic rather than transistor logic. We describe the general procedure for modifying designs with design-level scan chains and provide the results of adding scan to several designs, both large and small. We observed an average FPGA resource overhead of 84% for full scan and only 60% when we augmented existing FPGA capabilities with scan to provide complete observability and controllability in hardware.
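As a rough sketch of the concept (a toy software model, not the authors' FPGA implementation), a design-level scan chain threads every state bit into one long shift register: shifting the chain out observes the design's state, and shifting a new pattern in controls it.

```python
class ScanChain:
    """Toy model of a design-level scan chain: all flip-flop bits
    are threaded into a single serial shift register."""

    def __init__(self, num_bits):
        self.bits = [0] * num_bits  # current design state

    def capture(self, state):
        """Normal-mode clock: latch the design's current state."""
        self.bits = list(state)

    def shift(self, scan_in):
        """Scan-mode clock: shift the chain one position.
        Returns the bit that falls off the end (scan_out)."""
        scan_out = self.bits[-1]
        self.bits = [scan_in] + self.bits[:-1]
        return scan_out

    def dump(self):
        """Observability: shift the whole state out, then shift the
        same bits back in so the observation is non-destructive."""
        out = [self.shift(0) for _ in range(len(self.bits))]
        for b in out:  # re-inserting in scan-out order restores the state
            self.shift(b)
        return out     # bits in scan-out order (last chain bit first)
```

Controllability in this model is just shifting a desired pattern in, one bit per scan clock.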


field-programmable logic and applications | 2002

Novel Optimizations for Hardware Floating-Point Units in a Modern FPGA Architecture

Eric Roesler; Brent E. Nelson

As FPGA densities have increased, the feasibility of using floating-point computations on FPGAs has improved. Moreover, recent innovations in FPGA architecture have changed the design tradeoff space by providing new fixed circuit functions, including high-density multiplier blocks and shift registers, which may be employed in floating-point computations. This paper evaluates the use of such blocks in the design of a family of floating-point units including an adder/subtractor, a multiplier, and a divider. The portions of the units that would benefit most from multipliers and shift registers are identified. It is shown that using these blocks yields significant area savings compared to similar floating-point units based solely on conventional LUT/FF logic. Finally, a complete floating-point application circuit that solves a classic heat transfer problem is presented.
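To illustrate where shift logic appears inside a floating-point adder (a simplified, positive-only toy format, not the paper's actual units), the smaller operand's mantissa is shifted right to align exponents before the add, and the sum may then need a normalization shift:

```python
MANT_BITS = 8  # toy precision; the paper's units are wider

def fp_add(a, b):
    """Add two toy positive floating-point numbers, each given as
    (exponent, mantissa) with value mantissa * 2**exponent and a
    normalized mantissa in [2**(MANT_BITS-1), 2**MANT_BITS)."""
    (ea, ma), (eb, mb) = a, b
    if ea < eb:                       # make a the larger-exponent operand
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)
    mb >>= ea - eb                    # alignment: shift smaller mantissa right
    e, m = ea, ma + mb                # add aligned mantissas
    while m >= (1 << MANT_BITS):      # normalization: shift right, bump exponent
        m >>= 1
        e += 1
    return (e, m)
```

These alignment and normalization shifts are exactly the kind of operation that can map onto a device's fixed shift-register (or multiplier-based shifter) resources instead of LUT/FF logic.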


field-programmable custom computing machines | 2003

Tradeoffs of designing floating-point division and square root on Virtex FPGAs

Xiaojun Wang; Brent E. Nelson

Low latency, high throughput, and small area are three major considerations in FPGA (field programmable gate array) design. In this paper, we present a high-radix SRT division algorithm and a binary restoring square root algorithm. We describe three implementations of floating-point division with variable width and precision on Virtex-2 FPGAs: a low-cost iterative implementation, a low-latency array implementation, and a high-throughput pipelined implementation. Implementations of floating-point square root operations are presented as well. In addition to the design of the modules, we analyze the tradeoffs among cost, latency, and throughput, with strategies for reducing cost or improving performance.
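The restoring square root idea can be sketched in software as a bit-serial digit recurrence: try setting each root bit from high to low, and keep ("no restore needed") or discard ("restore") it depending on a trial subtraction. This sketch uses a full comparison per step for clarity; the hardware formulation instead keeps a running remainder.

```python
def restoring_isqrt(x, bits=16):
    """Bit-serial restoring-style integer square root.
    Builds floor(sqrt(x)) one bit per iteration, for 0 <= x < 2**(2*bits)."""
    root = 0
    for i in reversed(range(bits)):
        trial = root | (1 << i)   # tentatively set bit i of the root
        if trial * trial <= x:    # trial subtraction succeeds: keep the bit
            root = trial
        # else: "restore" by leaving the root (and remainder) unchanged
    return root
```

One root bit per iteration is what makes the iterative hardware version small: the same trial/restore datapath is reused every cycle, trading latency for area.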


Journal of Multimedia | 2007

FPGA-based Real-time Optical Flow Algorithm Design and Implementation

Zhaoyi Wei; Dah-Jye Lee; Brent E. Nelson

Optical flow algorithms are difficult to apply to robotic vision applications in practice because of their extremely high computational and frame rate requirements. In most cases, traditional general purpose processors and sequentially executed software cannot compute optical flow in real time. In this paper, a tensor-based optical flow algorithm is developed and implemented using field programmable gate array (FPGA) technology. The resulting algorithm is significantly more accurate than previously published FPGA results and was specifically developed to be implemented using a pipelined hardware structure. The design can process 640 × 480 images at 64 fps, which is fast enough for most real-time robot navigation applications. This design has low resource requirements, making it easier to fit into small embedded systems. Error analysis on a synthetic image sequence is given to show its effectiveness. The algorithm is also tested on a real image sequence to show its robustness and limitations. The resulting limitations are analyzed and an improved scheme is then proposed. It is then shown that the performance of the design could be substantially improved with sufficient hardware resources.
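At the core of tensor-based flow estimation is a small per-neighborhood least-squares solve built from sums of image-gradient products (the structure tensor). The sketch below shows that 2x2 solve in plain Python; it is a simplification for illustration, and the paper's hardware pipeline and exact tensor formulation differ.

```python
def flow_from_gradients(grads):
    """grads: list of (Ix, Iy, It) gradient triples over a pixel
    neighborhood. Solves the least-squares system built from the
    gradient (structure) tensor for the flow vector (u, v)."""
    sxx = sum(ix * ix for ix, _, _ in grads)   # tensor entries:
    sxy = sum(ix * iy for ix, iy, _ in grads)  # sums of gradient products
    syy = sum(iy * iy for _, iy, _ in grads)
    sxt = sum(ix * it for ix, _, it in grads)
    syt = sum(iy * it for _, iy, it in grads)
    det = sxx * syy - sxy * sxy
    if det == 0:
        return None  # aperture problem: flow not determined here
    # solve [sxx sxy; sxy syy] [u; v] = [-sxt; -syt]
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return (u, v)
```

Because each pixel's solve uses only fixed-size sums and a 2x2 inversion, the computation is regular enough to stream through a pipelined hardware structure, which is what makes real-time FPGA implementation feasible.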


IEEE Transactions on Knowledge and Data Engineering | 1993

Multiple prefetch adaptive disk caching

Knuth Stener Grimsrud; James K. Archibald; Brent E. Nelson

A new disk caching algorithm is presented that uses an adaptive prefetching scheme to reduce the average service time for disk references. Unlike schemes that simply prefetch the next sector or group of sectors, this method maintains information about the order of past disk accesses and uses it to accurately predict future access sequences. The range of parameters of this scheme is explored, and its performance is evaluated through trace-driven simulation using traces obtained from three different UNIX minicomputers. Unlike disk trace data previously described in the literature, these traces include time stamps for each reference. With this timing information, which is essential for evaluating any prefetching scheme, it is shown that a cache with the adaptive prefetching mechanism can reduce the average time to service a disk request by a factor of up to three, relative to an identical disk cache without prefetching.
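The key idea can be sketched as a minimal successor predictor (a hypothetical simplification; the paper's adaptive scheme is considerably more sophisticated): learn which sector followed each sector in the past, and prefetch that predicted successor rather than blindly fetching the next sequential sector.

```python
class AdaptivePrefetcher:
    """Toy successor-based prefetcher: remembers the sector that most
    recently followed each sector and predicts it will do so again."""

    def __init__(self):
        self.successor = {}  # sector -> last observed following sector
        self.last = None     # previously accessed sector

    def access(self, sector):
        """Record an access and return the sector to prefetch
        (None when there is no history for this sector yet)."""
        if self.last is not None:
            self.successor[self.last] = sector  # learn the observed order
        self.last = sector
        return self.successor.get(sector)       # predicted next access
```

Even this trivial table captures non-sequential patterns (e.g. alternating accesses) that next-sector prefetching misses entirely.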

Collaboration


Dive into Brent E. Nelson's collaborations.

Top Co-Authors

Dah-Jye Lee

Brigham Young University


Zhaoyi Wei

Brigham Young University


Paul S. Graham

Brigham Young University


Michael Rice

Brigham Young University


Marc Padilla

Brigham Young University
