Publication


Featured research published by Malay Haldar.


Field-Programmable Custom Computing Machines | 2000

A MATLAB compiler for distributed, heterogeneous, reconfigurable computing systems

Prithviraj Banerjee; Nagaraj Shenoy; Alok N. Choudhary; Scott Hauck; C. Bachmann; Malay Haldar; Pramod G. Joisha; A. Kanhare; Anshuman Nayak; S. Periyacheri; M. Walkden; David Zaretsky

Recently, high-level languages such as MATLAB have become popular for prototyping algorithms in domains such as signal and image processing. Many of these applications, whose subtasks have diverse execution requirements, often employ distributed, heterogeneous, reconfigurable systems. These systems consist of an interconnected set of heterogeneous processing resources that provide a variety of architectural capabilities. The objective of the MATCH (MATLAB Compiler for Heterogeneous Computing Systems) compiler project at Northwestern University is to make it easier for users to develop efficient code for distributed, heterogeneous, reconfigurable computing systems. Toward this end, we are implementing and evaluating an experimental prototype of a software system that takes MATLAB descriptions of various applications and automatically maps them onto a distributed computing environment consisting of embedded processors, digital signal processors, and field-programmable gate arrays built from commercial off-the-shelf components. We provide an overview of the MATCH compiler and discuss the testbed used to demonstrate our ideas. We present preliminary experimental results on some benchmark MATLAB programs compiled with the MATCH compiler.


Design, Automation, and Test in Europe | 2001

Precision and error analysis of MATLAB applications during automated hardware synthesis for FPGAs

Anshuman Nayak; Malay Haldar; Alok N. Choudhary; Prithviraj Banerjee

We present a compiler that takes high-level signal and image processing algorithms described in MATLAB and generates optimized hardware for an FPGA with external memory. We propose a precision analysis algorithm to determine the minimum number of bits required by an integer variable, and a combined precision and error analysis algorithm to infer the minimum number of bits required by a floating-point variable. Our results show that, on average, our algorithms generate hardware requiring a factor of 5 fewer FPGA resources, in terms of configurable logic blocks (CLBs) consumed, than hardware generated without these optimizations. We show that our analysis reduces the size of lookup tables for functions such as sin, cos, sqrt, and exp. Our precision analysis also enables us to pack multiple array elements into a single memory location to reduce the number of external memory accesses. We show that this technique improves the performance of the generated hardware by an average of 35%.
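The abstract does not spell out the precision analysis algorithm, so the following is only a minimal sketch of the general idea, assuming each integer variable's value range is known as a closed interval and propagated through addition, subtraction, and multiplication. The names `Interval`, `bits_needed`, and `elements_per_word`, and the 32-bit memory word, are hypothetical illustrations, not the paper's implementation. The last function shows why narrower variables enable packing: more array elements fit in one external memory word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed integer range [lo, hi] that a variable can take (hypothetical helper)."""
    lo: int
    hi: int

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Interval") -> "Interval":
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other: "Interval") -> "Interval":
        # The extremes of a product lie at the corners of the two ranges.
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))


def bits_needed(iv: Interval) -> int:
    """Minimum width: unsigned if the range is non-negative, else two's complement."""
    if iv.lo >= 0:
        return max(1, iv.hi.bit_length())
    n = 1
    while not (-(1 << (n - 1)) <= iv.lo and iv.hi <= (1 << (n - 1)) - 1):
        n += 1
    return n


def elements_per_word(bits: int, word_bits: int = 32) -> int:
    """How many values of the given width fit in one memory word (assumed 32 bits)."""
    return word_bits // bits


pixel = Interval(0, 255)              # e.g. an 8-bit image sample
print(bits_needed(pixel))             # 8
print(bits_needed(pixel + pixel))     # 9
print(bits_needed(pixel - pixel))     # 9 (signed range [-255, 255])
print(bits_needed(pixel * pixel))     # 16
print(elements_per_word(8))           # 4
```

A real analysis would also handle loops, division, and MATLAB built-ins; interval arithmetic is used here only because it is the simplest sound way to bound a range.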


Design, Automation, and Test in Europe | 2002

Accurate Area and Delay Estimators for FPGAs

Anshuman Nayak; Malay Haldar; Alok N. Choudhary; Prithviraj Banerjee

We present an area and delay estimator in the context of a compiler that takes high-level signal and image processing applications described in MATLAB and performs automatic design space exploration to synthesize hardware for a field-programmable gate array (FPGA) that meets the user's area and frequency specifications. The area estimator estimates the maximum number of configurable logic blocks (CLBs) consumed by the hardware synthesized from the input MATLAB algorithm for the Xilinx XC4010. The delay estimator determines the delay of the logic elements on the critical path and the delay of the interconnects. Our predicted CLB count is within 16% of the actual CLB consumption, and our estimated frequency is within 13% of the actual frequency obtained after logic synthesis with the Synplify tools and placement and routing with the Xilinx XACT tools. Because the estimators are fast and sufficiently accurate, they can be used in a high-level synthesis framework such as ours to perform rapid design space exploration.
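As a rough illustration of a delay estimate built from logic-element delays plus interconnect delays along the critical path, here is a generic longest-path timing sketch over a dataflow DAG. The graph, the delay values, and the function names are hypothetical; the paper's actual XC4010 delay model is not described in the abstract. The percent-error helper shows how figures such as "within 16%" are computed.

```python
from graphlib import TopologicalSorter


def critical_path_delay(logic_delay, edges, wire_delay):
    """Longest path delay through a DAG of logic elements.

    logic_delay: node -> delay of that logic element (ns)
    edges:       node -> list of successor nodes
    wire_delay:  (u, v) -> interconnect delay on edge u -> v (ns)
    """
    preds = {n: set() for n in logic_delay}
    for u, succs in edges.items():
        for v in succs:
            preds[v].add(u)
    arrival = {}
    # static_order() yields every node after all of its predecessors.
    for n in TopologicalSorter(preds).static_order():
        latest_input = max((arrival[u] + wire_delay[(u, n)] for u in preds[n]),
                           default=0.0)
        arrival[n] = latest_input + logic_delay[n]
    return max(arrival.values())


def percent_error(estimate, actual):
    return 100 * abs(estimate - actual) / actual


# Hypothetical three-node datapath with made-up delays in nanoseconds.
logic = {"a": 2.0, "b": 3.0, "c": 1.5}
edges = {"a": ["b", "c"], "b": ["c"]}
wires = {("a", "b"): 0.5, ("a", "c"): 1.0, ("b", "c"): 0.75}
delay_ns = critical_path_delay(logic, edges, wires)
print(delay_ns)                   # 7.75  (path a -> b -> c)
print(round(1000 / delay_ns, 1))  # estimated clock frequency in MHz: 129.0
print(percent_error(116, 100))    # 16.0
```

The design choice here is a single topological pass, which computes each node's latest arrival time in time linear in the graph size; that speed is what makes such estimators usable inside a design space exploration loop.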


International Conference on Computer-Aided Design | 2001

A system for synthesizing optimized FPGA hardware from Matlab®

Malay Haldar; Anshuman Nayak; Alok N. Choudhary; Prithviraj Banerjee

Efficient high-level design tools that map behavioral descriptions to FPGA architectures are a key requirement for fully leveraging FPGAs for high-throughput computation and for meeting time-to-market pressures. We present a compiler that takes algorithms described in MATLAB as input and generates RTL VHDL, which can then be mapped to FPGAs using existing commercial tools. The input application is mapped to multiple FPGAs by parallelizing it and automatically embedding communication and synchronization primitives. Our compiler infers the minimum number of bits required to represent each variable through a precision analysis framework. The compiler can leverage optimized IP cores to enhance the generated hardware. It also exploits parallelism in the input algorithm by pipelining in the presence of resource constraints. We demonstrate the utility of the compiler by synthesizing hardware for a couple of signal/image processing algorithms and comparing it with manually designed hardware.


IEEE Transactions on Very Large Scale Integration Systems | 2004

Overview of a compiler for synthesizing MATLAB programs onto FPGAs

Prithviraj Banerjee; Malay Haldar; Anshuman Nayak; Victor Kim; Vikram Saxena; Steven Parkes; Debabrata Bagchi; Satrajit Pal; Nikhil Tripathi; David Zaretsky; Robert Anderson; Juan Ramon Uribe

This paper describes a behavioral synthesis tool called AccelFPGA, which reads in high-level descriptions of digital signal processing (DSP) applications written in MATLAB and automatically generates synthesizable register transfer level (RTL) models and simulation testbenches in VHDL or Verilog. The RTL models can be synthesized onto field-programmable gate arrays (FPGAs) using commercial logic synthesis and place and route tools. The paper also describes how powerful directives let the DSP designer make high-level architectural tradeoffs. Experimental results are reported on a set of eight MATLAB benchmarks mapped onto Xilinx Virtex II and Altera Stratix FPGAs.


Field-Programmable Custom Computing Machines | 2003

Automatic conversion of floating point MATLAB programs into fixed point FPGA based hardware design

Prithviraj Banerjee; Debabrata Bagchi; Malay Haldar; Anshuman Nayak; Victor Kim; R. Uribe

This paper describes how the floating-point computations in MATLAB can be automatically converted to a fixed-point MATLAB version of specific precision for hardware design. The techniques have been incorporated in the AccelFPGA behavioral synthesis tool (Banerjee et al., 2003), which reads in high-level descriptions of DSP applications written in MATLAB and automatically generates synthesizable RTL models in VHDL or Verilog. Experimental results are reported with the AccelFPGA version 1.5 compiler on a set of five MATLAB benchmarks mapped onto Xilinx Virtex II FPGAs (field-programmable gate arrays).
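The abstract does not say how AccelFPGA chooses formats or rounding modes, so the following is only a generic sketch of what "converting floating point to fixed point of specific precision" means at the value level: scale by 2 to the power of the fractional bit count, round to nearest, and saturate to the signed word range. The 12-bit word with 8 fractional bits and the function names are hypothetical.

```python
def to_fixed(x: float, frac_bits: int, total_bits: int) -> int:
    """Quantize x to a signed two's-complement integer with frac_bits fractional bits.

    Rounds to nearest (Python's round, ties to even) and saturates to the
    range representable in total_bits.
    """
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def from_fixed(q: int, frac_bits: int) -> float:
    """Interpret a fixed-point integer as the real value it represents."""
    return q / (1 << frac_bits)


# Hypothetical format: 12-bit word with 8 fractional bits (sign + 3 integer bits).
q = to_fixed(0.7071, frac_bits=8, total_bits=12)
print(q)                            # 181
print(from_fixed(q, 8))             # 0.70703125
print(to_fixed(100.0, 8, 12))       # 2047 (saturated)
```

With round-to-nearest, the quantization error of an in-range value is at most half a least significant bit, here 2**-9; a precision-selection pass would pick the fractional bit count so that this error stays within the designer's tolerance.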


International Conference on VLSI Design | 2001

FPGA hardware synthesis from MATLAB

Malay Haldar; Anshuman Nayak; Nagaraj Shenoy; Alok N. Choudhary; Prithviraj Banerjee

Field-programmable gate arrays (FPGAs) have recently been used as an effective platform for implementing many image/signal processing applications. MATLAB is one of the most popular languages for modeling image/signal processing applications. We present the MATCH compiler, which takes MATLAB as input and produces hardware in RTL VHDL that can be mapped to an FPGA using commercial CAD tools. This dramatically reduces the time needed to implement an application on an FPGA. We present results on some image and signal processing algorithms for which hardware was synthesized using our compiler for the Xilinx XC4028 FPGA with an external memory. We also present comparisons with manually designed hardware for these applications. Our results indicate that FPGA hardware can be generated automatically, reducing design time from days to minutes, with the tradeoff that the automatically generated hardware is 5 times slower than the manually designed hardware.


Great Lakes Symposium on VLSI | 2000

Parallel algorithms for FPGA placement

Malay Haldar; Anshuman Nayak; Alok N. Choudhary; Prithviraj Banerjee

Fast FPGA CAD tools that produce high-quality results have been one of the most important research issues in the FPGA domain. Simulated annealing has been the method of choice for placement; however, it is very compute-intensive. In this work we investigate a range of parallelization strategies to speed up simulated annealing for FPGA placement. We present experimental results obtained by applying the different parallelization strategies to the Versatile Place and Route (VPR) tool, implemented on an SGI Origin shared-memory multiprocessor and an IBM SP2 distributed-memory multiprocessor. The results show the tradeoff between execution time and quality of results for the different parallelization strategies.
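The parallelization strategies themselves are not described in the abstract, so this is only a serial sketch of the simulated-annealing placement loop that such strategies would split across processors. It assumes a half-perimeter bounding-box wirelength cost, pairwise cell swaps, the Metropolis acceptance rule, and a geometric cooling schedule; these choices, the parameter values, and the function names are illustrative assumptions, not VPR's actual implementation.

```python
import math
import random


def hpwl(placement, nets):
    """Half-perimeter bounding-box wirelength summed over all nets."""
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(placement, nets, t0=10.0, alpha=0.9, moves_per_t=200,
           t_min=0.01, seed=0):
    """Serial simulated-annealing placement using pairwise cell swaps."""
    rng = random.Random(seed)
    place = dict(placement)
    cells = list(place)
    cost = hpwl(place, nets)
    best, best_cost = dict(place), cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            a, b = rng.sample(cells, 2)
            place[a], place[b] = place[b], place[a]
            delta = hpwl(place, nets) - cost
            # Metropolis rule: always accept improvements, sometimes accept uphill moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best, best_cost = dict(place), cost
            else:
                place[a], place[b] = place[b], place[a]  # undo the swap
        t *= alpha  # geometric cooling schedule
    return best, best_cost


# Hypothetical four-cell chain A-B-C-D on a 2x2 grid, starting from a poor placement.
initial = {"A": (0, 0), "B": (1, 1), "C": (1, 0), "D": (0, 1)}
nets = [["A", "B"], ["B", "C"], ["C", "D"]]
print(hpwl(initial, nets))          # 5
final, cost = anneal(initial, nets)
```

Recomputing the full cost after every swap keeps the sketch short; production placers update only the nets touched by the moved cells, and that per-move evaluation is the work a parallel annealer distributes.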


Asia and South Pacific Design Automation Conference | 2001

Automated synthesis of pipelined designs on FPGAs for signal and image processing applications described in MATLAB®

Malay Haldar; Anshuman Nayak; Alok N. Choudhary; Prithviraj Banerjee

We present a compiler that takes high-level algorithms described in MATLAB and generates optimized hardware for an FPGA with external memory. A framework is described to detect and exploit opportunities to pipeline loops optimally. The effectiveness of the framework is demonstrated by synthesizing some image and signal processing applications. Starting from the MATLAB description of each application, hardware is synthesized that runs on a Xilinx XC4028. The synthesized designs match manually optimized designs in performance.
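The abstract does not detail the pipelining framework, so here is a hedged sketch of one standard quantity any resource-constrained loop pipeliner must respect: the resource-limited minimum initiation interval (the number of cycles between starting consecutive iterations), taken from textbook modulo scheduling rather than from the paper. The resource names, operation counts, and pipeline depth are hypothetical, and the unpipelined comparison assumes each iteration takes the full depth in cycles.

```python
import math


def res_mii(op_counts, units):
    """Resource-constrained minimum initiation interval (cycles between iterations).

    op_counts: resource type -> uses per loop iteration
    units:     resource type -> number of available hardware units
    """
    return max(math.ceil(op_counts[r] / units[r]) for r in op_counts)


def total_cycles(n_iters, ii, depth):
    """Cycles for n_iters iterations of a pipeline with interval ii and depth stages."""
    return (n_iters - 1) * ii + depth


# Hypothetical loop body and resource budget.
ops = {"mem_port": 3, "multiplier": 2, "adder": 4}
available = {"mem_port": 1, "multiplier": 1, "adder": 2}
ii = res_mii(ops, available)
print(ii)                           # 3 (bound set by the single memory port)
print(total_cycles(100, ii, 8))     # 305
print(100 * 8)                      # 800 cycles if iterations do not overlap
```

Loop-carried dependences impose a second lower bound on the interval, which this sketch omits; the achievable interval is the larger of the two bounds.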


Field-Programmable Custom Computing Machines | 2001

Parallelization of MATLAB Applications for a Multi-FPGA System

Anshuman Nayak; Malay Haldar; Alok N. Choudhary; Prith Banerjee

We present a compiler that takes high-level signal and image processing algorithms described in MATLAB and generates optimized hardware for the WildChild™ board, which has nine FPGAs and external memory. We propose a Single Program Multiple Data (SPMD) style parallelization framework to automatically generate hardware for all nine FPGAs. We propose a data alignment and data distribution scheme for minimizing communication across the different FPGAs, and present a communication framework based on the WildChild interconnection network for sending and receiving data. Our results show a speedup of around 6 to 7 on eight FPGAs. Further, we propose a prediction mechanism to extract parallelism within a single FPGA. We show that this yields much improved speedups of around 28 on eight FPGAs for the Image Thresholding benchmark. Such a framework generates hardware that is three times slower than the most optimized manual designs, but that can be generated in seconds rather than the days taken by a manual designer.
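The paper's data alignment and distribution scheme is not specified in the abstract; the following hedged sketch shows only the basic SPMD pattern, using a simple block distribution of image rows over P processing elements, each of which runs the same thresholding code on its own block. The function names, the threshold rule, and the sequential simulation of the P elements are hypothetical illustrations.

```python
def block_range(n, p, rank):
    """Half-open range [start, stop) of the n elements owned by `rank` out of p.

    The first n % p ranks receive one extra element so the blocks differ by at most one.
    """
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def threshold_block(rows, level):
    """The single program every processing element runs on its local rows."""
    return [[255 if px > level else 0 for px in row] for row in rows]


def threshold_spmd(image_rows, p, level):
    """Simulate p elements sequentially; each thresholds only its own block of rows."""
    out = []
    for rank in range(p):
        start, stop = block_range(len(image_rows), p, rank)
        out.extend(threshold_block(image_rows[start:stop], level))
    return out


print([block_range(10, 4, r) for r in range(4)])  # [(0, 3), (3, 6), (6, 8), (8, 10)]
image = [[10, 200], [130, 90], [255, 0]]
print(threshold_spmd(image, 2, 128))              # [[0, 255], [255, 0], [255, 0]]
```

Thresholding needs no data from neighboring rows, so this distribution requires no inter-FPGA communication; kernels with neighborhood access would additionally need boundary rows exchanged between elements.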

Collaboration


Top Co-Authors

Victor Kim

Northwestern University

Satrajit Pal

Northwestern University
