Prith Banerjee | Researchain

Publication

Featured research published by Prith Banerjee.

Asia and South Pacific Design Automation Conference | 2003

An overview of a compiler for mapping MATLAB programs onto FPGAs

Prith Banerjee

This paper describes a behavioral synthesis tool called the MATCH compiler developed as part of the DARPA Adaptive Computing Systems program. The MATCH compiler reads in high-level descriptions of DSP applications written in MATLAB, and automatically generates synthesizable RTL models in VHDL. The RTL models can be synthesized using commercial logic synthesis tools and place and route tools onto FPGAs. By linking the two design domains of DSP and FPGA hardware design, the MATCH compiler provides DSP design teams a significant reduction in design labor and time, elimination of misinterpretations and costly design rework, automatic verification of the hardware implementation, and the ability of systems engineers and algorithm developers to perform architectural exploration in the early phases of their development cycle. The paper describes how powerful directives are used to provide high-level architectural tradeoffs for the DSP designer. The MATCH compiler has been transferred to a startup company called AccelChip which has developed a commercial version of the compiler called AccelFPGA. Experimental results are reported using AccelFPGA on a set of nine MATLAB benchmarks that are mapped onto the recent Xilinx Virtex II and Altera Stratix FPGAs. The benchmark programs range in complexity from 20 lines to 170 lines of MATLAB code and produce VHDL code ranging from 1500 to 4500 lines of code. The compilation times range from 3 seconds to 40 seconds.

Field-Programmable Custom Computing Machines | 2001

Parallelization of MATLAB Applications for a Multi-FPGA System

Anshuman Nayak; Malay Haldar; Alok N. Choudhary; Prith Banerjee

We present a compiler that takes high-level signal and image processing algorithms described in MATLAB and generates optimized hardware for the WildChild™ board, which has nine FPGAs and external memory. We propose a Single Program Multiple Data (SPMD) style parallelization framework to automatically generate hardware for all nine FPGAs. We propose a data alignment and data distribution scheme for minimizing communication across the different FPGAs and present a communication framework based on the WildChild interconnection network for sending and receiving data. Our results show that we get a speedup of around 6 to 7 on eight FPGAs. Further, we propose a prediction mechanism to extract parallelism within a single FPGA. We show that this results in much improved speedups of around 28 on eight FPGAs for the Image Thresholding benchmark. We show that such a framework generates hardware that is three times slower than the most optimized manual designs, but that can be generated in seconds, compared to the days taken by a manual designer.
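As a rough illustration of SPMD-style data distribution, the sketch below splits the rows of an image into contiguous blocks, one per worker, and runs the same thresholding routine on every block. It is a software analogy only: the function names, the block-by-rows distribution, and the threshold level are assumptions made for this example, not the scheme the compiler actually generates.

```python
def block_ranges(n_rows, n_workers):
    """Split n_rows into n_workers contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_rows, n_workers)
    ranges = []
    start = 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def threshold_rows(rows, level):
    """The single program every worker runs: binarize its own block of rows."""
    return [[1 if px > level else 0 for px in row] for row in rows]


def spmd_threshold(image, level, n_workers=8):
    """Apply threshold_rows to each block and stitch the blocks back together.

    Thresholding is pointwise, so no worker needs data owned by another
    worker and the blocks can be processed independently.
    """
    out = []
    for lo, hi in block_ranges(len(image), n_workers):
        out.extend(threshold_rows(image[lo:hi], level))
    return out


# Hypothetical 10x4 test image with pixel values in 0..255.
image = [[(r * 7 + c * 3) % 256 for c in range(4)] for r in range(10)]
result = spmd_threshold(image, 128)
```

Because each output pixel depends only on the matching input pixel, this particular benchmark needs no inter-block communication; kernels with neighborhood access would need boundary rows exchanged between workers.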

Asia and South Pacific Design Automation Conference | 2005

Automatic extraction of function bodies from software binaries

Gaurav Mittal; David Zaretsky; Gokhan Memik; Prith Banerjee

This paper describes a method for automatically extracting function bodies from linked software binaries. It utilizes procedure-calling conventions along with limited control and data flow information. It has been tested with the TI C6000 DSP processor platform. Results are reported on eight benchmarks for which our algorithm successfully identifies all functions. It identifies 198% more functions than the use of procedure-calling conventions alone.

International Symposium on Quality Electronic Design | 2009

A software pipelining algorithm in high-level synthesis for FPGA architectures

Lei Gao; David Zaretsky; Gaurav Mittal; Dan Schonfeld; Prith Banerjee

In this paper, we present a variation of the Modulo Scheduling algorithm to exploit software pipelining in high-level synthesis for FPGA architectures. We demonstrate the difficulties of implementing software pipelining for FPGA architectures, and propose a modified version of Modulo Scheduling that utilizes memory lifetime holes and addresses circular dependencies. Experimental results demonstrate a 35% improvement on average over the non-pipelined implementation, and a 15% improvement on average over the traditional Modulo Scheduling algorithm.
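For readers unfamiliar with Modulo Scheduling, the sketch below computes the textbook lower bound on the initiation interval: the larger of the resource-constrained bound and the recurrence-constrained bound. It illustrates the traditional formulation only, not the modified algorithm in the paper; the function names and the example loop are assumptions made for this illustration.

```python
from math import ceil


def res_mii(op_counts, unit_counts):
    """Resource bound: for each resource class, operations per iteration
    divided by the number of available units, rounded up."""
    return max(ceil(op_counts[k] / unit_counts[k]) for k in op_counts)


def rec_mii(cycles):
    """Recurrence bound: for each dependence cycle, total latency around the
    cycle divided by its total iteration distance, rounded up.

    `cycles` is a list of (total_latency, total_distance) pairs; a loop with
    no circular dependencies imposes only the trivial bound of 1.
    """
    return max((ceil(lat / dist) for lat, dist in cycles), default=1)


def min_initiation_interval(op_counts, unit_counts, cycles):
    """No modulo schedule can start iterations more often than this."""
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))


# Hypothetical loop body: 4 memory accesses on 2 ports, 2 multiplies on 1 multiplier,
# and one dependence cycle with latency 3 carried across 1 iteration.
ops = {"mem": 4, "mul": 2}
units = {"mem": 2, "mul": 1}
mii_with_recurrence = min_initiation_interval(ops, units, [(3, 1)])
mii_without_recurrence = min_initiation_interval(ops, units, [])
```

In this example the resource bound is 2 and the recurrence bound is 3, so the circular dependency is what limits throughput.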

Adaptive Hardware and Systems | 2011

Resource optimization and deadlock prevention while generating streaming architectures from ordinary programs

Lei Gao; Gaurav Mittal; David Zaretsky; Prith Banerjee

This paper presents a methodology for generating streaming architectures from ordinary programs. It automatically identifies streaming relationships and translates them into parallel computational kernels connected with customized stream buffers. New optimizations are introduced that reduce resource utilization by automatically generating lower bounds on stream buffer sizes. The approach also statically analyzes the design for deadlock and determines appropriate strategies to guarantee prevention. The experimental results show 19–325% improvement in performance and 15–62% reduction in area over non-streaming designs of several software-defined radio applications. This framework allows system-level designers to develop optimized reconfigurable streaming architectures for FPGAs at compile-time.

International Conference on Parallel Architectures and Compilation Techniques | 1999

On reducing false sharing while improving locality on shared memory multiprocessors

Mahmut T. Kandemir; Alok N. Choudhary; Prith Banerjee; J. Ramanujam

The performance of applications on large shared-memory multiprocessors with coherent caches depends on the interaction between the granularity of data sharing, the size of the coherence unit and the spatial locality exhibited by the applications, in addition to the amount of parallelism in the applications. Large coherence units are helpful in exploiting spatial locality, but worsen the effects of false sharing. We present a mathematical framework that allows a clean description of the relationship between spatial locality and false sharing. We first show how to identify a severe form of multiple-writer false sharing and then demonstrate the importance of the interaction between optimization techniques aimed at enhancing locality and the techniques oriented toward reducing false sharing. Given the conflicting requirements, a compiler based approach to this problem holds promise. We investigate the use of data transformations in addressing spatial locality and false sharing, and derive an approach that balances the impact of the two. Experimental results demonstrate that such a balanced approach outperforms those approaches that consider only one of these two issues. On an eight-processor SGI Origin 2000 system, our approach brings an additional 9% improvement over a powerful locality optimization technique that uses both loop and data transformations. Also, our approach obtains an additional 19% improvement over an optimization technique that is oriented specifically toward reducing false sharing.
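The tension between coherence-unit size and false sharing can be seen in a toy model. The sketch below counts how many cache lines are written by more than one processor under two layouts of the same array: the original order, and a regrouped order in which each processor's elements are contiguous. The 64-byte line size, the cyclic ownership pattern, the line-aligned base address, and all names are assumptions chosen for illustration; this is not the mathematical framework of the paper.

```python
from collections import defaultdict


def multi_writer_lines(owner_of, position_of, n_elems, elem_size, line_size):
    """Count cache lines that receive writes from more than one processor.

    owner_of(i)    -> processor that writes logical element i
    position_of(i) -> index of element i in the (possibly transformed) layout
    The array is assumed to start on a cache-line boundary.
    """
    writers = defaultdict(set)
    for i in range(n_elems):
        line = (position_of(i) * elem_size) // line_size
        writers[line].add(owner_of(i))
    return sum(1 for procs in writers.values() if len(procs) > 1)


N, P = 64, 8          # 64 elements written cyclically by 8 processors
ELEM, LINE = 8, 64    # 8-byte elements, assumed 64-byte coherence unit


def owner(i):
    return i % P


def identity(i):
    """Original layout: element i stays at index i."""
    return i


def regrouped(i):
    """Data transformation: place each processor's elements contiguously."""
    return (i % P) * (N // P) + i // P


before = multi_writer_lines(owner, identity, N, ELEM, LINE)
after = multi_writer_lines(owner, regrouped, N, ELEM, LINE)
```

In the original layout every line holds elements owned by all eight processors, so all 8 lines are shared by multiple writers; after regrouping, each line belongs to one processor and none are. The same transformation changes which elements are adjacent in memory, which is why locality and false sharing have to be balanced together.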

International Conference on Parallel Processing | 2010

Automatic Generation of Stream Descriptors for Streaming Architectures

Lei Gao; David Zaretsky; Gaurav Mittal; Dan Schonfeld; Prith Banerjee

We describe a novel approach for automatically generating streaming architectures from software programs. While existing systems require user-defined stream models, our method automatically identifies producer-consumer streaming relationships and translates them into streaming architectures. Data streams between producer-consumer kernels are represented using a combination of stream descriptors and CFGs, which are categorized into four stream types. A bridge module is generated based on the stream type in the streaming architecture to facilitate data streaming between each producer-consumer pair. Several optimizations are also developed to improve throughput and parallelism. We demonstrate our results on an FPGA-based platform. The automatically generated streaming architectures show 1.5-3x speedups over the non-streaming designs by employing spatial and temporal data independence to increase parallelism.

Asia and South Pacific Design Automation Conference | 2003

Adaptive computing: what can it do, where can it go?

Robert Reuss; Jose L. Muñoz; Toshiaki Miyazaki; Nader Bagherzadeh; Prith Banerjee; Brad L. Hutchings; Brian Schott

The Adaptive Computing Systems (ACS) program was initiated by the Defense Advanced Research Projects Agency (DARPA) of the United States in 1996. With the advent of FPGAs, a new class of computing systems that contain configurable hardware has emerged. This session begins with a presentation by the first ACS program manager on its motivation, original goals, and objectives. It is followed by presentations of four specific projects under the ACS program. Future activities surrounding the ACS community will be discussed at the end.

VLSI Design | 2001

Power Optimization of Delay Constrained Circuits

Anshuman Nayak; Malay Haldar; Prith Banerjee; Chunhong Chen; Majid Sarrafzadeh

We present a framework for combining Voltage Scaling (VS) and Gate Sizing (GS) techniques for power optimization. We introduce a fast heuristic for choosing gates for sizing and voltage scaling such that the total power is minimized under delay constraints. We also use a more accurate estimate of the circuit's power dissipation by taking into account the short-circuit power along with the dynamic power. A better model of the short-circuit power is used, which takes into account the load capacitance of the gates. Our results show that the combination of VS and GS performs better than either technique applied in isolation. An average power reduction of 73% is obtained when decisions are made assuming dynamic power only. In contrast, the average power reduction is 77% when decisions include the short-circuit power dissipation.

Archive | 2004