Jeffrey B. Goeders | Researchain

Publication

Featured research published by Jeffrey B. Goeders.

ACM Transactions on Reconfigurable Technology and Systems | 2014

VTR 7.0: Next Generation Architecture and CAD System for FPGAs

Jason Luu; Jeffrey B. Goeders; Michael Wainberg; Andrew Somerville; Thien Yu; Konstantin Nasartschuk; Miad Nasr; Sen Wang; Tim X. Liu; Nooruddin Ahmed; Kenneth B. Kent; Jason Helge Anderson; Jonathan Rose; Vaughn Betz

Exploring architectures for large, modern FPGAs requires sophisticated software that can model and target hypothetical devices. Furthermore, research into new CAD algorithms often requires a complete and open source baseline CAD flow. This article describes recent advances in the open source Verilog-to-Routing (VTR) CAD flow that enable further research in these areas. VTR now supports designs with multiple clocks in both timing analysis and optimization. Hard adder/carry logic can be included in an architecture in various ways and significantly improves the performance of arithmetic circuits. The flow now models energy consumption, an increasingly important concern. The speed and quality of the packing algorithms have been significantly improved. VTR can now generate a netlist of the final post-routed circuit which enables detailed simulation of a design for a variety of purposes. We also release new FPGA architecture files and models that are much closer to modern commercial architectures, enabling more realistic experiments. Finally, we show that while this version of VTR supports new and complex features, it has a 1.5× compile time speed-up for simple architectures and a 6× speed-up for complex architectures compared to the previous release, with no degradation to timing or wire-length quality.

field-programmable technology | 2012

VersaPower: Power estimation for diverse FPGA architectures

Jeffrey B. Goeders; Steven J. E. Wilton

This paper presents VersaPower, a tool capable of modelling the power usage of many different field programmable gate array (FPGA) architectures. The latest release of the academic FPGA CAD tool, Versatile Place and Route 6.0 (VPR), supports new architecture features such as fracturable look-up tables and complex logic blocks. Past FPGA power models do not support these new features. VersaPower is designed to work closely with VPR to provide power estimation for any architecture supported by this new CAD flow. This allows researchers to investigate the effects on power usage of both new FPGA architectures and new CAD algorithms. VersaPower is designed to operate with modern CMOS technologies, and is validated against SPICE using 22 nm, 45 nm and 130 nm technologies. Results show that for common architectures, roughly 60% of power consumption is due to the routing fabric, 30% from logic blocks and 10% from the clock network. Architectures supporting fracturable LUTs require 5-10% more power, as each CLB has additional I/O pins, increasing the sizes of local interconnect crossbars and connection boxes.

field-programmable custom computing machines | 2015

Using Dynamic Signal-Tracing to Debug Compiler-Optimized HLS Circuits on FPGAs

Jeffrey B. Goeders; Steven J. E. Wilton

High-level synthesis (HLS) for FPGA designs has received considerable attention in recent years. To make this design methodology mainstream, improved debugging technologies are essential. Ideally, a user should be able to debug their design using the original source code, without detailed knowledge of the underlying hardware, while the circuit executes in-situ. Although recent work has made progress toward this goal, existing solutions are unable to provide visibility into circuits that have been heavily optimized by the compiler. HLS compilers typically perform many optimizations, including moving variable values out of memories and into registers distributed throughout the design. Debugging such circuits typically requires either understanding the hardware and probing the appropriate RTL level registers, or ignoring these variables while debugging the design, neither of which is desirable. In this work we present a new signal-tracing technique, specifically designed for circuits that have been optimized by an HLS tool. Information is extracted from the HLS process to determine which signals are relevant to record each cycle. We automatically embed circuitry which dynamically selects the relevant signals, cycle-by-cycle, and records them into on-chip memories. In addition, we explore techniques to balance tracing between cycles to further improve memory efficiency. For each 100Kb of memory allocated to trace buffers, our technique can, on average, record and replay 4322 lines of source code, versus 141 lines using traditional tracing methods.
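As a rough illustration of the figures quoted in this abstract, the short Python sketch below computes the implied improvement factor and the average trace-buffer cost per recorded source line. It assumes that 100Kb means 100,000 bits, which the abstract does not specify; the function names are hypothetical and are not taken from the paper.

```python
# Figures quoted in the abstract: source lines recorded per 100Kb of trace buffer.
DYNAMIC_LINES_PER_100KB = 4322
TRADITIONAL_LINES_PER_100KB = 141

# Assumption: 100Kb is read as 100,000 bits (the abstract does not define the unit).
BUFFER_BITS = 100_000


def improvement_ratio(new_lines, baseline_lines):
    """How many times more source lines the new method records."""
    return new_lines / baseline_lines


def bits_per_line(lines_recorded, buffer_bits=BUFFER_BITS):
    """Average trace-buffer bits consumed per recorded source line."""
    return buffer_bits / lines_recorded


if __name__ == "__main__":
    ratio = improvement_ratio(DYNAMIC_LINES_PER_100KB, TRADITIONAL_LINES_PER_100KB)
    print(f"improvement: {ratio:.1f}x")
    print(f"dynamic tracing: {bits_per_line(DYNAMIC_LINES_PER_100KB):.1f} bits/line")
    print(f"traditional tracing: {bits_per_line(TRADITIONAL_LINES_PER_100KB):.1f} bits/line")
```

Under that assumption, the quoted figures correspond to about a 30.7-fold increase in recorded source lines per unit of trace memory.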

field programmable logic and applications | 2014

Effective FPGA debug for high-level synthesis generated circuits

Jeffrey B. Goeders; Steven J. E. Wilton

High-level synthesis (HLS) promises to increase designer productivity in the face of steadily increasing FPGA sizes, and broaden the market of use, allowing software designers to reap the benefits of hardware implementation. One roadblock to HLS adoption is the lack of a debugging infrastructure. To debug, designers can run their source code on a processor; however, this does not capture interactions with other system components. The alternative is to debug using the RTL, which is beyond the expertise of software designers, and impractical for hardware designers as the RTL may not resemble the original source code.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2017

Signal-Tracing Techniques for In-System FPGA Debugging of High-Level Synthesis Circuits

Jeffrey B. Goeders; Steven J. E. Wilton

High-level synthesis (HLS) promises to increase designer productivity in the face of increasing field-programmable gate array sizes, and broaden the market of use, allowing software designers to reap the benefits of hardware implementation. One roadblock to HLS adoption is the lack of an in-system debugging infrastructure. Although designers can run their software code on a workstation, or simulate the register-transfer level, neither can reliably capture the behaviors, and therefore bugs, that may be present in the final system. Debugging hardware circuits in-system requires using signal-tracing to record circuit behavior for later offline analysis. In this paper, we present a debugging architecture, which automatically records key hardware signals, and relates them back to the original software source code. This architecture allows designers to debug HLS circuits in-system, in the context of the original source code. We present several signal-tracing techniques, tailored to HLS circuits, which allow a much longer execution trace to be captured. These techniques include signal compression, dynamically changing which signals are recorded cycle-by-cycle, and offline signal restoration. Compared to using an embedded logic analyzer to perform signal-tracing, our architecture increases the length of execution trace that can be recorded by 127X. For each 100 Kb of trace buffer memory, our architecture can record 15 369 executed lines of C code.

field-programmable technology | 2015

Using Round-Robin Tracepoints to debug multithreaded HLS circuits on FPGAs

Jeffrey B. Goeders; Steven J. E. Wilton

High-level synthesis (HLS) for FPGA designs has gained significant traction in recent years. A key component in its adoption is allowing users to debug their hardware systems in the context of the original source code. This is becoming even more challenging as modern HLS tools enable the user to provide multithreaded source code for synthesis to hardware. Although recent work has begun to tackle source-level debugging of HLS circuits, none have addressed doing this in multithreaded circuits. In such systems it may be necessary to observe the behaviour of multiple threads for long run times in order to locate obscure or non-deterministic bugs and performance issues. In this paper we present a trace-based debugging architecture which records values from user-selected tracepoints into on-chip memories during circuit execution. The recorded values can be provided to the user as a cycle-accurate timeline of events to aid them in debugging multithreaded HLS circuits. We present a novel technique to allow multiple hardware threads to share trace buffers, effectively increasing the execution trace that can be recorded. This is accomplished by analyzing the control and data flow graph to determine the maximum rates at which each thread can encounter tracepoints, using this information to select which threads can share trace buffers, and automatically generating round-robin circuitry to arbitrate access to the buffers. Using this technique we are able to obtain an average of 4X improvement in trace length for an 8 thread system. This provides users with a longer timeline of execution and greater visibility into the execution of multithreaded HLS circuits.
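The round-robin arbitration mentioned in this abstract can be pictured with a small software model. The Python sketch below only illustrates generic round-robin arbitration between threads that share one trace buffer; the class and function names are hypothetical, and the paper's actual circuitry, including how the control and data flow graph is analyzed to decide which threads may share a buffer, is not reproduced here.

```python
class RoundRobinArbiter:
    """Grants at most one requesting thread per cycle, rotating priority."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.next_priority = 0  # thread checked first on the next cycle

    def grant(self, requests):
        """requests[i] is True if thread i wants to write this cycle.

        Returns the index of the granted thread, or None if nobody requested.
        """
        for offset in range(self.num_threads):
            idx = (self.next_priority + offset) % self.num_threads
            if requests[idx]:
                # The winner gets lowest priority next time.
                self.next_priority = (idx + 1) % self.num_threads
                return idx
        return None


def record_trace(requests_per_cycle, num_threads):
    """Model a shared trace buffer that accepts one write per cycle.

    Returns a list of (cycle, thread) pairs in the order they were written.
    A thread that is not granted must keep requesting in later cycles;
    this simple model does not queue requests on its behalf.
    """
    arbiter = RoundRobinArbiter(num_threads)
    trace = []
    for cycle, requests in enumerate(requests_per_cycle):
        winner = arbiter.grant(requests)
        if winner is not None:
            trace.append((cycle, winner))
    return trace


if __name__ == "__main__":
    cycles = [[True, True], [True, True], [False, False], [False, True]]
    print(record_trace(cycles, num_threads=2))
```

In this model two threads that both request on every cycle alternate writes, which is why sharing is only safe when the combined tracepoint rate of the threads stays within what one buffer can accept.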

applied reconfigurable computing | 2014

Faster FPGA Debug: Efficiently Coupling Trace Instruments with User Circuitry

Eddie Hung; Jeffrey B. Goeders; Steven J. E. Wilton

Prior to fabricating an integrated circuit, designers will often construct FPGA-based prototypes that can test and verify their circuits far more thoroughly than is possible within software simulations, such as by booting an operating system. A key limitation of physical prototypes, however, is the lack of on-chip observability necessary during debug. This paper describes a trace-buffer based platform that can be used to enhance FPGA observability. We use this platform to investigate how best to couple debug instruments with user circuitry, and how the subsequent debug loop — the process of changing the signals or trigger observed when converging on the root-cause of a bug — can be shortened. We demonstrate a working implementation of this platform on Xilinx technology, finding that runtime speedups for each debug loop of 1.2–3.0X (and potentially 5.7–11.2X) can be achieved on industrial benchmarks, when compared to re-instrumenting with vendor tools.

field programmable logic and applications | 2016

Quantifying observability for in-system debug of high-level synthesis circuits

Jeffrey B. Goeders; Steven J. E. Wilton

In recent years high-level synthesis (HLS) has seen considerable attention as it promises to increase designer productivity and make custom hardware implementation accessible to software developers. A challenge facing those developing HLS technologies is how to allow users to understand, debug and optimize their final hardware systems. Recently, several techniques have been developed to provide in-system debugging capabilities for HLS circuits. These techniques instrument the user's design with some debugging circuitry to provide observability into the circuit during execution. Due to resource constraints, it is usually infeasible to view all variable values for the entire circuit execution. Rather, instrumentation usually captures only some variable values and for only a portion of the circuit execution. In this paper we present a metric for measuring the observability into an executing HLS circuit. This metric reflects the portion of variable accesses that are available to the user, the duration of execution for which these values are available, as well as accommodating variations in importance between source code variables. This metric can be used to understand how different circuit observation networks can provide the user with different levels of observability into the HLS circuit execution. As a demonstration of the applicability of the metric, we first study differences between recent debugging approaches for HLS circuits, and quantify the level of observability provided by such architectures. We then explore different schemes to select which variables are accessible in the observation network, and measure the impact on variable availability and length of captured execution trace.

FPGAs for Software Programmers | 2016

LegUp High-Level Synthesis

Andrew Canis; Jongsok Choi; Blair Fort; Bain Syrowik; Ruolong Lian; Yu Ting Chen; Hsuan Hsiao; Jeffrey B. Goeders; Stephen Dean Brown; Jason Helge Anderson

LegUp is a High-level Synthesis tool under active development at the University of Toronto since 2011. The tool is on its fourth public release, is open source and freely downloadable. LegUp has been the subject of over 15 publications and has been downloaded by over 1500 groups from around the world. In this section, we overview LegUp, its programming model, unique aspects of the tool versus other HLS offerings, and conclude with a case study.

field programmable custom computing machines | 2017

Enabling Long Debug Traces of HLS Circuits Using Bandwidth-Limited Off-Chip Storage Devices

Jeffrey B. Goeders

High-level synthesis (HLS) has gained considerable traction in recent years. Despite considerable strides in the development of quality HLS compilers, one area that is often cited as a barrier to HLS adoption is the difficulty in debugging HLS-produced circuits. Recent academic work has presented techniques that use on-chip memories to efficiently record execution of HLS circuits, and map the captured data back to the original source code to provide the user with a software-like debug experience. However, limited on-chip memory results in very short debug traces, which may force a designer to spend multiple debug iterations to resolve complicated bugs. In this work we present techniques to enable off-chip capture of HLS debug information. While off-chip storage does not suffer from the capacity limitations of on-chip memory, its usage introduces a new challenge: limited bandwidth. In this work we show how information from within the HLS flow can be leveraged to generate a streamed debug trace within given bandwidth constraints. For a bandwidth-limited interface, we show that our techniques allow the user to observe 19x more source code variables than using a basic approach.
