David Zaretsky | Researchain

Publication

Featured research published by David Zaretsky.

field programmable custom computing machines | 2000

A MATLAB compiler for distributed, heterogeneous, reconfigurable computing systems

Prithviraj Banerjee; Nagaraj Shenoy; Alok N. Choudhary; Scott Hauck; C. Bachmann; Malay Haldar; Pramod G. Joisha; A. Kanhare; Anshuman Nayak; S. Periyacheri; M. Walkden; David Zaretsky

Recently, high-level languages such as MATLAB have become popular for prototyping algorithms in domains such as signal and image processing. Many of these applications, whose subtasks have diverse execution requirements, often employ distributed, heterogeneous, reconfigurable systems. These systems consist of an interconnected set of heterogeneous processing resources that provide a variety of architectural capabilities. The objective of the MATCH (MATLAB Compiler for Heterogeneous Computing Systems) compiler project at Northwestern University is to make it easier for users to develop efficient code for distributed, heterogeneous, reconfigurable computing systems. Toward this end, we are implementing and evaluating an experimental prototype of a software system that takes MATLAB descriptions of various applications and automatically maps them onto a distributed computing environment consisting of embedded processors, digital signal processors, and field-programmable gate arrays built from commercial off-the-shelf components. We provide an overview of the MATCH compiler and discuss the testbed being used to demonstrate our ideas. We present preliminary experimental results on some benchmark MATLAB programs compiled with the MATCH compiler.

IEEE Transactions on Very Large Scale Integration Systems | 2004

Overview of a compiler for synthesizing MATLAB programs onto FPGAs

Prithviraj Banerjee; Malay Haldar; Anshuman Nayak; Victor Kim; Vikram Saxena; Steven Parkes; Debabrata Bagchi; Satrajit Pal; Nikhil Tripathi; David Zaretsky; Robert Anderson; Juan Ramon Uribe

This paper describes a behavioral synthesis tool called AccelFPGA, which reads in high-level descriptions of digital signal processing (DSP) applications written in MATLAB and automatically generates synthesizable register transfer level (RTL) models and simulation testbenches in VHDL or Verilog. The RTL models can then be mapped onto field-programmable gate arrays (FPGAs) using commercial logic synthesis and place-and-route tools. This paper describes how powerful directives are used to provide high-level architectural tradeoffs for the DSP designer. Experimental results are reported on a set of eight MATLAB benchmarks that are mapped onto the Xilinx Virtex II and Altera Stratix FPGAs.

design automation conference | 2004

Automatic translation of software binaries onto FPGAs

Gaurav Mittal; David Zaretsky; Xiaoyong Tang; Prithviraj Banerjee

The introduction of advanced FPGA architectures with built-in DSP support has given DSP designers a new hardware alternative. By exploiting their inherent parallelism, FPGAs are expected to outperform DSP processors. This paper describes the process and considerations for automatically translating binaries targeted for general DSP processors into register transfer level (RTL) VHDL or Verilog code to be mapped onto commercial FPGAs. The Texas Instruments C6000 DSP processor architecture is chosen as the DSP processor platform, and the Xilinx Virtex II as the target FPGA. Various optimizations are discussed, including data dependency analysis, procedure extraction, induction variable analysis, memory optimizations, and scheduling. Experimental results on resource usage and performance are shown for ten software binary benchmarks. Results show performance gains of 3-20x for the FPGA designs over the DSP processors, measured as reductions in execution cycles.
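The induction variable analysis mentioned above can be illustrated with a small sketch. It assumes a hypothetical three-address form of one loop iteration (tuples of destination, operation, and two sources; real C6000 assembly syntax differs, and the register names are only illustrative) and flags a register as a basic induction variable when every definition of it inside the loop adds or subtracts a constant to itself. This is a textbook formulation, not necessarily the analysis the authors implemented.

```python
def basic_induction_variables(body):
    """Return {register: net step per iteration} for basic induction variables.

    body: list of (dest, op, src1, src2) tuples for one loop iteration, in a
    hypothetical three-address form (not actual C6000 assembly syntax).
    A register qualifies only if every definition of it in the loop is
    `dest = dest + const` or `dest = dest - const`.
    """
    defs = {}
    for dest, op, src1, src2 in body:
        defs.setdefault(dest, []).append((op, src1, src2))

    ivs = {}
    for reg, reg_defs in defs.items():
        step = 0
        for op, src1, src2 in reg_defs:
            if op in ("add", "sub") and src1 == reg and isinstance(src2, int):
                step += src2 if op == "add" else -src2
            else:
                break  # some other kind of definition: not a basic IV
        else:
            ivs[reg] = step
    return ivs


loop_body = [
    ("A4", "add", "A4", 4),     # pointer advanced by 4 bytes per iteration
    ("B0", "sub", "B0", 1),     # loop counter
    ("A5", "mul", "A4", "A6"),  # ordinary computation
    ("A6", "add", "A5", "A7"),  # an add, but not of the form A6 = A6 + const
]
print(basic_induction_variables(loop_body))  # {'A4': 4, 'B0': -1}
```

Once a register such as the pointer above is known to advance by a fixed step, a hardware generator can, for example, replace its repeated adds with a dedicated counter.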

field-programmable custom computing machines | 2004

Overview of the FREEDOM compiler for mapping DSP software to FPGAs

David Zaretsky; M. Mittal; Xiaoyong Tang; Prithviraj Banerjee

Applications that require digital signal processing (DSP) functions are typically mapped onto general-purpose DSP processors. With the introduction of advanced FPGA architectures with built-in DSP support, a new hardware alternative is available for DSP designers. By exploiting their inherent parallelism, FPGAs are expected to outperform DSP processors. However, migrating assembly code to hardware is typically a very arduous process. This paper describes the process and considerations for automatically translating software assembly and binary codes targeted for general DSP processors into register transfer level (RTL) VHDL or Verilog code to be mapped onto commercial FPGAs. The Texas Instruments C6000 DSP processor architecture has been used as the DSP processor platform, and the Xilinx Virtex II as the target FPGA. Various optimizations are discussed, including loop unrolling, induction variable analysis, memory and register optimizations, scheduling, and resource binding. Experimental results on resource usage and performance are shown for ten software binary benchmarks in the signal processing and image processing domains. Results show performance gains for the FPGA designs over the DSP processors of 3-20x in terms of reductions in execution cycles and 1.3-5x in terms of reductions in execution time.

international conference on vlsi design | 2006

Dynamic template generation for resource sharing in control and data flow graphs

David Zaretsky; Gaurav Mittal; Robert P. Dick; Prith Banerjee

High-level synthesis compilers often produce recurring patterns in intermediate control and data flow graphs (CDFGs) during translation. By identifying large recurring patterns, one may reduce area and communication overhead by efficiently reusing hardware for multiple operations. This paper presents an algorithm for dynamically generating templates of recurring patterns for resource sharing in CDFGs. Results show 40-80% resource reduction using small, incremental template growth, with results varying within a 5% margin across different look-ahead depths.
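The abstract names the idea (find recurring patterns, then reuse hardware for them) without giving the algorithm. The sketch below shows only a plausible seed step under stated assumptions: it counts two-node (producer, consumer) operation-type patterns in a data flow graph, then greedily selects non-overlapping occurrences of the most frequent one, since occurrences that share a node cannot each be mapped onto a separate template instance. The graph encoding, node names, and function names are hypothetical, and the paper's incremental template growth and look-ahead are not reproduced.

```python
from collections import Counter


def edge_pattern_counts(ops, edges):
    """Count (producer op type, consumer op type) pairs over DFG edges.

    ops:   {node id: operation type}
    edges: [(src node, dst node), ...]
    """
    return Counter((ops[src], ops[dst]) for src, dst in edges)


def non_overlapping_matches(ops, edges, pattern):
    """Greedily pick occurrences of `pattern` that share no node, so each
    one could be bound to its own instance of a shared hardware template."""
    used, matches = set(), []
    for src, dst in edges:
        if (ops[src], ops[dst]) == pattern and src not in used and dst not in used:
            matches.append((src, dst))
            used.update((src, dst))
    return matches


ops = {"m1": "mul", "a1": "add", "m2": "mul", "a2": "add", "m3": "mul", "s1": "sub"}
edges = [("m1", "a1"), ("m2", "a2"), ("m3", "s1"), ("a1", "a2")]

pattern, count = edge_pattern_counts(ops, edges).most_common(1)[0]
print(pattern, count)                                # ('mul', 'add') 2
print(non_overlapping_matches(ops, edges, pattern))  # [('m1', 'a1'), ('m2', 'a2')]
```

A template-growth approach would then try to extend the chosen pattern by one more node and keep the extension only if enough non-overlapping occurrences remain.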

languages and compilers for parallel computing | 2005

Generation of control and data flow graphs from scheduled and pipelined assembly code

David Zaretsky; Gaurav Mittal; Robert P. Dick; Prith Banerjee

High-level synthesis tools generally convert abstract designs described in a high-level language into a control and data flow graph (CDFG), which is then optimized and mapped to hardware. However, there has been little work on generating CDFGs from highly pipelined software binaries, which complicate the problem of determining data flow propagation and dependencies. This paper presents a methodology for generating CDFGs from highly pipelined and scheduled assembly code that correctly represents the data dependencies and propagation of data through the program control flow. This process consists of three stages: generating a control flow graph, linearizing the assembly code, and generating the data flow graph. The proposed methodology was implemented in the FREEDOM compiler and tested on 8 highly pipelined software binaries. Results indicate that data dependencies were correctly identified in the designs, allowing the compiler to perform complex optimizations to reduce clock cycles.
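The first of the three stages, generating a control flow graph, follows the standard basic-block construction: the first instruction, every branch target, and every instruction following a branch start a new block. The sketch below is a minimal version under simplifying assumptions. The `Instr` record and mnemonics are illustrative (loosely modeled on C6000 mnemonics, operands omitted), and branch delay slots are ignored entirely; on a pipelined target like the C6000 a branch takes effect only after several delay slots, which is exactly what the linearization stage described above has to handle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    op: str
    label: Optional[str] = None    # label defined at this instruction
    target: Optional[str] = None   # branch target label, if this is a branch
    conditional: bool = False      # conditional branches may also fall through


def build_cfg(instrs):
    """Split a linear instruction list into basic blocks and CFG edges.

    Leaders: the first instruction, every branch target, and every
    instruction after a branch. Branch delay slots are NOT modeled.
    """
    labels = {ins.label: i for i, ins in enumerate(instrs) if ins.label}
    leaders = {0} if instrs else set()
    for i, ins in enumerate(instrs):
        if ins.target is not None:
            leaders.add(labels[ins.target])
            if i + 1 < len(instrs):
                leaders.add(i + 1)
    starts = sorted(leaders)
    bounds = starts + [len(instrs)]
    blocks = [list(range(bounds[k], bounds[k + 1])) for k in range(len(starts))]
    block_of = {i: k for k, blk in enumerate(blocks) for i in blk}

    edges = set()
    for k, blk in enumerate(blocks):
        last = instrs[blk[-1]]
        if last.target is not None:
            edges.add((k, block_of[labels[last.target]]))  # taken branch
        if (last.target is None or last.conditional) and k + 1 < len(blocks):
            edges.add((k, k + 1))  # fall-through edge
    return blocks, edges


program = [
    Instr("MVK"),
    Instr("ADD", label="LOOP"),
    Instr("SUB"),
    Instr("B", target="LOOP", conditional=True),
    Instr("STW"),
]
blocks, edges = build_cfg(program)
print(blocks)         # [[0], [1, 2, 3], [4]]
print(sorted(edges))  # [(0, 1), (1, 1), (1, 2)]
```

The self-edge `(1, 1)` is the loop back edge; the data flow graph stage would then be built per block on top of this structure.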

asia and south pacific design automation conference | 2005

Automatic extraction of function bodies from software binaries

Gaurav Mittal; David Zaretsky; Gokhan Memik; Prith Banerjee

This paper describes a method for automatically extracting function bodies from linked software binaries. It uses procedure-calling conventions along with limited control and data flow information. It has been tested on the TI C6000 DSP processor platform. Results are reported on eight benchmarks, for which our algorithm successfully identifies all functions. It identifies 198% more functions than the use of procedure-calling conventions alone.
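For contrast with the method above, here is a deliberately naive baseline, assuming a hypothetical pre-decoded instruction list of `(address, is_call, call_target)` tuples: every recognized call target (plus the program entry) starts a function, and each body runs up to the next start address. It is not the authors' algorithm; functions reached only through indirect jumps or ordinary branches are missed by this split, which is the kind of gap the abstract says extra control and data flow information closes.

```python
def split_functions(instrs, entry):
    """Naively split a linked binary into function bodies.

    instrs: (address, is_call, call_target) tuples sorted by address
            (a hypothetical pre-decoded form, not a real disassembler API).
    Every call target, plus the program entry, starts a function; each
    body runs up to the next start address.
    """
    addrs = [addr for addr, _, _ in instrs]
    valid = set(addrs)
    starts = {entry} | {tgt for _, is_call, tgt in instrs if is_call}
    starts = sorted(s for s in starts if s in valid)  # drop bogus targets
    bodies = {}
    for k, start in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else None
        bodies[start] = [a for a in addrs if a >= start and (end is None or a < end)]
    return bodies


instrs = [
    (0, False, None), (4, True, 16), (8, False, None), (12, False, None),
    (16, False, None), (20, True, 32), (24, False, None), (28, False, None),
    (32, False, None), (36, False, None),
]
print(split_functions(instrs, entry=0))
# {0: [0, 4, 8, 12], 16: [16, 20, 24, 28], 32: [32, 36]}
```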

great lakes symposium on vlsi | 2004

Evaluation of scheduling and allocation algorithms while mapping assembly code onto FPGAs

David Zaretsky; Gaurav Mittal; Xiaoyong Tang; Prithviraj Banerjee

Migration of software from older general-purpose embedded processors onto newer mixed hardware/software Systems-on-Chip (SoC) platforms is becoming an increasingly important topic. Automatic translation of general-purpose software binaries and assembly code onto hardware implementations using FPGAs requires sophisticated scheduling and allocation algorithms to maximize the resource utilization of such hardware devices. This paper describes the effects of scheduling and chaining of node operations in a CDFG mapped onto an FPGA. The effects of register allocation on scheduled nodes are also discussed. The Texas Instruments C6000 DSP processor architecture was chosen as the DSP processor platform and source of assembly code, and the Xilinx Virtex II XC2V250 was chosen as the target FPGA. Results are reported on ten benchmarks and show that scheduling with operation chaining produces the best results on FPGAs, while adding register allocation in fact generates poorer designs in terms of area and frequency.
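Operation chaining, discussed above, means letting a dependent operation start in the same clock cycle as its producer when their combined combinational delay still fits in the clock period. A minimal as-soon-as-possible (ASAP) scheduler makes the effect concrete. The operation names, delays, and 10 ns period are hypothetical values chosen for illustration, resource limits are not modeled, and register allocation (the other factor the paper evaluates) is left out.

```python
def asap_schedule(order, preds, delay, period, chaining=True):
    """ASAP schedule; returns {op: (cycle, finish time within that cycle)}.

    order:  operations in topological order
    preds:  {op: [predecessor ops]}
    delay:  {op: combinational delay in ns} (hypothetical values)
    period: clock period in ns
    """
    sched = {}
    for op in order:
        if delay[op] > period:
            raise ValueError(f"{op} does not fit in one clock period")
        cycle, start = 0, 0.0
        for p in preds.get(op, ()):
            p_cycle, p_finish = sched[p]
            # With chaining, op may start in the same cycle as soon as p
            # finishes; without it, op waits for p's registered result.
            ready = (p_cycle, p_finish) if chaining else (p_cycle + 1, 0.0)
            cycle, start = max((cycle, start), ready)
        if start + delay[op] > period:  # not enough slack left in this cycle
            cycle, start = cycle + 1, 0.0
        sched[op] = (cycle, start + delay[op])
    return sched


def num_cycles(sched):
    return max(c for c, _ in sched.values()) + 1


order = ["add1", "add2", "mul"]
preds = {"add2": ["add1"], "mul": ["add2"]}
delay = {"add1": 3.0, "add2": 3.0, "mul": 6.0}
print(num_cycles(asap_schedule(order, preds, delay, period=10.0)))                  # 2
print(num_cycles(asap_schedule(order, preds, delay, period=10.0, chaining=False)))  # 3
```

With chaining, the two adds share cycle 0 (6 ns of the 10 ns budget) and the multiply moves to cycle 1; without chaining, every operation in the dependence chain costs a full cycle.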

international symposium on quality electronic design | 2009

A software pipelining algorithm in high-level synthesis for FPGA architectures

Lei Gao; David Zaretsky; Gaurav Mittal; Dan Schonfeld; Prith Banerjee

In this paper, we present a variation of the Modulo Scheduling algorithm to exploit software pipelining in high-level synthesis for FPGA architectures. We demonstrate the difficulties of implementing software pipelining for FPGA architectures and propose a modified version of Modulo Scheduling that utilizes memory lifetime holes and addresses circular dependencies. Experimental results demonstrate an average improvement of 35% over the non-pipelined implementation and 15% over the traditional Modulo Scheduling algorithm.
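Traditional Modulo Scheduling starts from a lower bound on the initiation interval (II), the minimum II (MII), which is the larger of a resource bound (ResMII) and a recurrence bound (RecMII) imposed by circular dependencies. The sketch below computes only that bound, not the schedule itself and not the paper's modifications. ResMII is the ceiling of uses over available units per resource type; RecMII is found as the smallest II for which no dependence cycle has total latency greater than II times its total iteration distance, checked with Bellman-Ford-style longest-path relaxation. The operations, latencies, and unit counts are hypothetical.

```python
import math
from collections import Counter


def res_mii(op_unit, units):
    """Resource bound: ceil(uses / available units) for each resource type."""
    uses = Counter(op_unit.values())
    return max(math.ceil(n / units[u]) for u, n in uses.items())


def has_positive_cycle(nodes, edges, ii):
    """Longest-path relaxation with edge weight latency - ii * distance.
    If values still change after len(nodes) rounds, a positive cycle exists,
    i.e. some recurrence does not fit in the initiation interval `ii`."""
    dist = {n: 0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, latency, distance in edges:
            w = latency - ii * distance
            if dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False
    return True


def rec_mii(nodes, edges):
    """Smallest II such that no cycle has latency > II * distance."""
    bound = max(1, sum(lat for _, _, lat, _ in edges))  # always enough if distances >= 1
    for ii in range(1, bound + 1):
        if not has_positive_cycle(nodes, edges, ii):
            return ii
    raise ValueError("a dependence cycle has zero iteration distance")


op_unit = {"ld": "mem", "mul": "alu", "add": "alu", "st": "mem"}
units = {"mem": 1, "alu": 2}
# (src, dst, latency, iteration distance); add -> mul is loop-carried
edges = [("ld", "mul", 2, 0), ("mul", "add", 2, 0), ("add", "st", 1, 0), ("add", "mul", 1, 1)]
r, c = res_mii(op_unit, units), rec_mii(list(op_unit), edges)
print(r, c, max(r, c))  # 2 3 3
```

Here the two memory operations sharing one port give ResMII = 2, but the mul/add recurrence (latency 3 over a distance of 1 iteration) forces MII = 3.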

adaptive hardware and systems | 2011

Resource optimization and deadlock prevention while generating streaming architectures from ordinary programs

Lei Gao; Gaurav Mittal; David Zaretsky; Prith Banerjee

This paper presents a methodology for generating streaming architectures from ordinary programs. It automatically identifies streaming relationships and translates them into parallel computational kernels connected with customized stream buffers. New optimizations are introduced that reduce resource utilization by automatically generating lower bounds on stream buffer sizes. The approach also statically analyzes the design for deadlock and determines appropriate strategies to guarantee prevention. The experimental results show 19–325% improvement in performance and 15–62% reduction in area over non-streaming designs of several software-defined radio applications. This framework allows system-level designers to develop optimized reconfigurable streaming architectures for FPGAs at compile-time.
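The abstract does not say how the buffer lower bounds or deadlock checks are derived. As a hedged stand-in, the sketch below uses a swapped-in, well-known result from synchronous dataflow (SDF) rather than the paper's analysis: for a single channel with no initial tokens, where the producer writes p tokens per firing and the consumer reads c tokens per firing, p + c - gcd(p, c) is the smallest buffer capacity that avoids deadlock. A small simulator checks this by running one balanced iteration, firing the consumer whenever it can. The rates and function names are hypothetical.

```python
from math import gcd


def sdf_min_buffer(p, c):
    """Smallest deadlock-free capacity for one SDF edge with no initial tokens."""
    return p + c - gcd(p, c)


def runs_without_deadlock(p, c, capacity):
    """Simulate one balanced iteration (producer fires c/g times, consumer
    p/g times), firing the consumer whenever possible, else the producer."""
    g = gcd(p, c)
    prod_left, cons_left = c // g, p // g
    tokens = 0
    while prod_left or cons_left:
        if cons_left and tokens >= c:
            tokens -= c
            cons_left -= 1
        elif prod_left and tokens + p <= capacity:
            tokens += p
            prod_left -= 1
        else:
            return False  # neither side can fire: deadlock
    return True


for p, c in [(2, 3), (6, 4), (3, 5)]:
    b = sdf_min_buffer(p, c)
    print(p, c, b, runs_without_deadlock(p, c, b), runs_without_deadlock(p, c, b - 1))
# 2 3 4 True False
# 6 4 8 True False
# 3 5 7 True False
```

Sizing each buffer at such a bound rather than generously is one way a compile-time tool can trade area against the risk of deadlock, which is the tradeoff the abstract describes.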