Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Deshanand P. Singh is active.

Publication


Featured researches published by Deshanand P. Singh.


field programmable logic and applications | 2012

From opencl to high-performance hardware on FPGAS

Tomasz S. Czajkowski; Utku Aydonat; Dmitry Denisenko; John Freeman; Michael Kinsner; David Neto; Jason Wong; Peter Yiannacouras; Deshanand P. Singh

We present an OpenCL compilation framework to generate high-performance hardware for FPGAs. For an OpenCL application comprising a host program and a set of kernels, it compiles the host program, generates Verilog HDL for each kernel, compiles the circuit using Altera Complete Design Suite 12.0, and downloads the compiled design onto an FPGA.We can then run the application by executing the host program on a Windows(tm)-based machine, which communicates with kernels on an FPGA using a PCIe interface. We implement four applications on an Altera Stratix IV and present the throughput and area results for each application. We show that we can achieve a clock frequency in excess of 160MHz on our benchmarks, and that OpenCL computing paradigm is a viable design entry method for high-performance computing applications on FPGAs.


design automation conference | 2005

FPGA technology mapping: a study of optimality

Andrew C. Ling; Deshanand P. Singh; Stephen Dean Brown

This paper attempts to quantify the optimality of FPGA technology mapping algorithms. The authors developed an algorithm, based on Boolean satisfiability (SAT), that is able to map a small subcircuit into the smallest possible number of lookup tables (LUTs) needed to realize its functionality. This technique was applied iteratively to small portions of circuits that have already been technology mapped by the best available mapping algorithms for FPGAs. In many cases, the optimal mapping of the subcircuit uses fewer LUTs than is obtained by the technology mapping algorithm. It is shown that for some circuits the total area improvement could be up to 67%.


field programmable gate arrays | 2001

The case for registered routing switches in field programmable gate arrays

Deshanand P. Singh; Stephen Dean Brown

FPGAs are characterized by a programmable interconnect that contains highly resistive and capacitive elements. While the configurable structure of the interconnect allows for the implementation of arbitrary circuits, it has also become a significant bottleneck for high-speed circuits. Even if there are only a few signal paths that run along long stretches of interconnect, it is these paths that may determine the maximum operating frequency of the circuit. In this paper we investigate architectural features that could allow us to automatically pipeline the delay associated with long routes without an excessive area penalty. The goal is to reschedule circuit operations in such a way that a signal may use multiple clock cycles to traverse a long route, rather than requiring a single long clock period. This rescheduling would not effect the timing of the visible outputs~(~no latency is added to the overall system~). Specifically, we analyze the effects of adding a small number of registered routing switches to an FPGA architecture with segmented routing resources. A parameterized FPGA architecture is studied where the percentage of registered routing switches is varied and the speed improvement and area penalty is evaluated. Novel algorithms are presented that allow a circuit to best utilize an architecture with a given percentage of registered switches. We believe that this is the first study that attempts to evauate the tradeoffs associated with switches required in FPGA architectures. Our experiments indicate that the architectural features introduced can produce significant speedup for high speed circuits without excessive area costs. We believe that these techniques will become increasingly important in the future as deep sub-micron process technologies shrink, and wire delays become even more significant.


field programmable logic and applications | 2012

Invited paper: Using OpenCL to evaluate the efficiency of CPUS, GPUS and FPGAS for information filtering

Doris Chen; Deshanand P. Singh

The FPGA can be a tremendously efficient computational fabric for many applications. In particular, the performance to power ratios of FPGA make them attractive solutions to solve the problem of data centers that are constrained largely by power and cooling costs. However, the complexity of the FPGA design flow requires the programmer to understand cycle-accurate details of how data is moved and transformed through the fabric. In this paper, we explore techniques that allow programmers to efficiently use FPGAs at a level of abstraction that is closer to traditional software-centric approaches by using the emerging parallel language, OpenCL. Although the field of high level synthesis has evolved greatly in the last few decades, several fundamental parts were missing from the complete software abstraction of the FPGA. These include standard and portable methods of describing HW/SW codesign, memory hierarchy, data movement and control of parallelism. We believe that OpenCL addresses all of these issues and allows for highly efficient description of FPGA designs with a higher level of abstraction. We demonstrate this premise by examining the performance of a document filtering algorithm, implemented in OpenCL and automatically compiled to a Stratix IV 530 FPGA. We show that our implementation achieves 5.5× and 5.25× better performance per watt ratios than GPU and CPU implementations, respectively.


international conference on computer aided design | 2002

Incremental placement for layout-driven optimizations on FPGAs

Deshanand P. Singh; Stephen Dean Brown

This paper presents an algorithm to update the placement of logic elements when given an incremental netlist change. Specifically, these algorithms are targeted to incrementally place logic elements created by layout-driven circuit restructuring techniques. The incremental placement engine assumes that the restructuring algorithms provide a list of new logic elements along with preferred locations for each of these new elements. It then tries to shift non-critical logic elements in the original placement out of the way to satisfy the preferred location requests. Our algorithm considers modern FPGA architectures with clustered logic blocksthat have numerous architectural constraints. Experiments indicate that our technique produces results of extremely highquality.


asia and south pacific design automation conference | 2013

Fractal video compression in OpenCL: An evaluation of CPUs, GPUs, and FPGAs as acceleration platforms

Doris Chen; Deshanand P. Singh

Fractal compression is an efficient technique for image and video encoding that uses the concept of self-referential codes. Although offering compression quality that matches or exceeds traditional techniques with a simpler and faster decoding process, fractal techniques have not gained widespread acceptance due to the computationally intensive nature of its encoding algorithm. In this paper, we present a real-time implementation of a fractal compression algorithm in OpenCL [1]. We show how the algorithm can be efficiently implemented in OpenCL and optimized for multi-CPUs, GPUs, and FPGAs. We demonstrate that the core computation implemented on the FPGA through OpenCL is 3× faster than a high-end GPU and 114× faster than a multi-core CPU, with significant power advantages. We also compare to a hand coded FPGA implementation to showcase the effectiveness of an OpenCL-to-FPGA compilation tool.


field programmable gate arrays | 2002

Constrained clock shifting for field programmable gate arrays

Deshanand P. Singh; Stephen Dean Brown

Circuits implemented in FPGAs have delays that are dominated by its programmable interconnect. This interconnect provides the ability to implement arbitrary connections. However, it contains both highly capacitive and resistive elements. The delay encountered by any connection depends strongly on the number of interconnect elements used to route the connection. These delays are only completely known after the place and route phase of the CAD flow. We propose the use of Clock Shifting optimization techniques to improve the clock frequency as a post place and route step.Clock Shifting Optimization is a technique first formalized in [4]. It is a cycle-stealing algorithm that allows one to reduce the critical path delay of a synchronous circuit by shifting the clock signals at each register. This technique allows late arriving signals to be sampled at a later point in time by intentionally introducing a skew on the clock input of the sampling register. Typical FPGAs contain a number of special purpose global clock networks that distribute clock signals to every register in the chip. Unused global clock lines in FPGAs can be used to distribute a finite set of clock skews to the entire circuit. We propose an efficient integer programming method to find the optimal circuit improvement for a finite set of clock skews. This technique is modified to consider inherent uncertainties present in the timing models. The uncertainty controls the aggressiveness of the optimizations as we must take great care in ensuring functionality for any range of possible timing characteristics.Our results confirm intuition that more aggressive speed optimizations can be performed as timing models become more accurate. We also show that providing 4 skewed versions of the nominal clock signal results in the best delay--area tradeoff. This result is evocative as it may suggest future FPGA architectures that contain greater numbers of global clock lines, as we tradeoff gains in speed for greater power requirements from increased clock network flexibility.


design automation conference | 2005

Incremental retiming for FPGA physical synthesis

Deshanand P. Singh; Valavan Manohararajah; Stephen Dean Brown

In this paper, the authors presented a new linear-time retiming algorithm that produces near-optimal results. The implementation is specifically targeted at Alteras Stratix FPGA-based designs, although the techniques described are general enough for any implementation medium. The algorithm is able to handle the architectural constraints of the target device, multiple timing constraints assigned by the user and implicit legality constraints. It ensures that register moves do not create asynchronous problems such as creating a glitch on a clock/reset signal.


field programmable gate arrays | 2010

A comprehensive approach to modeling, characterizing and optimizing for metastability in FPGAs

Doris Chen; Deshanand P. Singh; Jeffrey Christopher Chromczak; David Lewis; Ryan Fung; David Neto; Vaughn Betz

Metastability is a phenomenon that can cause system failures in digital circuits. It may occur whenever signals are being transmitted across asynchronous or unrelated clock domains. The impact of metastability is increasing as process geometries shrink and supply voltages drop faster than transistor Vts. FPGA technologies are significantly affected since leading edge FPGAs are amongst the first devices to adopt the most recent process nodes. In this paper, we present a comprehensive suite of techniques for modeling, characterizing and optimizing metastability effects in FPGAs. We first discuss a theoretical model of metastability, and verify the predictions using both circuit level simulations and board measurements. Next we show how designers have traditionally dealt with metastability problems and contrast that with the automatic CAD algorithms described in this paper that both analyze and optimize metastability-related issues. Through our detailed experimental results, we show that we can improve the metastability characteristics of a large suite of industrial benchmarks by an average of 268,000 times with our optimization techniques.


theory and applications of satisfiability testing | 2005

FPGA logic synthesis using quantified boolean satisfiability

Andrew C. Ling; Deshanand P. Singh; Stephen Dean Brown

This paper describes a novel Field Programmable Gate Array (FPGA) logic synthesis technique which determines if a logic function can be implemented in a given programmable circuit and describes how this problem can be formalized and solved using Quantified Boolean Satisfiability. This technique is general enough to be applied to any type of logic function and programmable circuit; thus, it has many applications to FPGAs. The applications demonstrated in this paper include FPGA technology mapping and resynthesis where their results show significant FPGA performance improvements.

Collaboration


Dive into the Deshanand P. Singh's collaboration.

Researchain Logo
Decentralizing Knowledge