Chetana Nagendra
Pennsylvania State University
Publication
Featured research published by Chetana Nagendra.
IEEE Transactions on Very Large Scale Integration Systems | 1994
Chetana Nagendra; Robert Michael Owens; Mary Jane Irwin
An approach to designing CMOS adders for both high speed and low power is presented by analyzing the performance of three types of adders - linear time adders, log N time adders and constant time adders. The representative adders used are a ripple carry adder, a blocked carry lookahead adder and several signed-digit adders, respectively. Some of the tradeoffs that are possible during the logic design of an adder to improve its power-delay product are identified. An effective way of improving the speed of a circuit is by transistor sizing which unfortunately increases power dissipation to a large extent. It is shown that by sizing transistors judiciously it is possible to gain significant speed improvements at the cost of only a slight increase in power and hence a better power-delay product. Perflex, an in-house performance driven layout generator, is used to systematically generate sized layouts.
1993 Computer Architectures for Machine Perception | 1993
Robert Michael Owens; Mary Jane Irwin; Chetana Nagendra; Raminder Singh Bajwa
The authors show that the MGAP can be used to solve a variety of problems in computer vision. It provides performance similar to the MPP, the CLIP, the DAP and the GAPP at a fraction of their cost. At the same time, the MGAP is general purpose enough to be used in a variety of other fields also. As might be expected, the MGAP is very well suited for low-level vision tasks, but is not ideal for tasks requiring global information, such as histogramming. A shift-register network is proposed as an addition to the array architecture to improve global communications. This results in a factor of 20 performance improvement for histogram computation. The MGAP prototype is currently being tested. The custom micro-grain PGAs and the board have been fabricated. The authors have a simulator for the processor array which operates at the level of the assembly code. They have developed both high and low level programming tools. A new language called *C++ is used to program the MGAP. It extends C++ to handle parallel data and specify data movement in a concise and natural manner. The compiler generates code for the processor array, the controller as well as the scalar processor.
International Conference on Acoustics, Speech, and Signal Processing | 1993
Chetana Nagendra; Manjit Borah; Mohan Vishwanath; Robert Michael Owens; Mary Jane Irwin
The authors demonstrate an optimal time algorithm and architecture for edge detection in real time using fine grained parallelism. Given an image in the form of a two-dimensional array of pixels, this algorithm computes the Sobel and Laplacian operators for skimming lines in the image and then generates the Hough array using thresholding. Hough transforms for M different angles of projection are obtained in a fully systolic manner without using any multiplication or division. An implementation of the algorithm on the MGAP, a fine-grained processor array architecture developed at the Pennsylvania State University, is shown. It computes at the rate of approximately 75000 Hough transforms per second on a 256×256 image using a 25-MHz clock. It is also shown that the algorithm can be easily extended to the general case of Radon transforms.
IEEE Workshop on VLSI Signal Processing | 1994
Chetana Nagendra; Robert Michael Owens; Mary Jane Irwin
In this paper, we present extensive simulation results for different types of parallel adders, which are the most frequently used primitives in digital signal processing. The adders studied include the linear time ripple carry and Manchester carry chain adders, the logarithmic time carry lookahead adder and its variations, the carry skip adder, and constant time signed-digit adders.
International Conference on Application Specific Array Processors | 1993
Chetana Nagendra; Robert Michael Owens; Mary Jane Irwin
In this paper, the authors present a novel scheme for performing arithmetic efficiently on fine-grain programmable architectures and FPGA-based systems. They achieve an O(n) speedup over the bit-serial methods of existing fine-grain systems such as the DAP, the MPP and the CM2, within the constraints of regular, near neighbor communication and only a small amount of on-chip memory. This is possible by means of digit systolic algorithms which avoid broadcast and operate in a fully systolic manner at the digit level. They use digit online techniques coupled with a base 4, signed-digit number system to limit carry propagation. Although the algorithms are bit-serial, the authors are able to match the performance of the bit-parallel methods, while retaining low communication complexity. Efficient O(n) time algorithms for multiplication and division of fixed-point, variable precision numbers are given. By using the organization of logic blocks suggested in this paper, problems of placement and routing that exist in systems built using FPGAs can be avoided. Since the algorithms are amenable to pipelining, very high throughput can be obtained.
International Conference on Acoustics, Speech, and Signal Processing | 1994
Chetana Nagendra; Mary Jane Irwin; Robert Michael Owens
The paper describes a digit pipelined architecture for the 1D discrete wavelet transform, assuming a digit-serial model of computation. The use of simple operations and data movement makes it suitable for VLSI implementation and it can be easily mapped onto fine-grain custom VLSI and FPGA-based architectures. It achieves a factor of two speedup over a previous implementation of the same algorithm by virtue of digit pipelining made possible by the use of signed-digit arithmetic. In addition, the system can be clocked faster since it uses only nearest neighbor connections on a mesh, thus avoiding the signal propagation delays associated with long routing paths. An N-point DWT takes O(Nk) time and requires O(LJk) area, where L is the filter size, J is the number of octaves and k is the precision.
Signal Processing Systems | 1995
Chetana Nagendra; Robert Michael Owens; Mary Jane Irwin
In this paper, we present a novel scheme for performing fixed-point arithmetic efficiently on fine-grain, massively parallel, programmable architectures including both custom and FPGA-based systems. We achieve an O(n) speedup, where n is the operand precision, over the bit-serial methods of existing fine-grain systems such as the DAP, the MPP and the CM2, within the constraints of regular, near neighbor communication and only a small amount of on-chip memory. This is possible by means of digit pipelined algorithms which avoid broadcast and which operate in a fully systolic manner by pipelining at the digit level. A base 4, signed-digit, fully redundant number system and on-line techniques are used to limit carry propagation and minimize communication costs. Although our algorithms are digit-serial, we are able to match the performance of the bit-parallel methods, while retaining low communication complexity. Reconfigurable hardware systems built using field programmable gate arrays (FPGAs) can share in the speed benefits of these algorithms. By using the organization of logic blocks suggested in this paper, problems of placement and routing that exist in such systems can be avoided. Since the algorithms are amenable to pipelining, very high throughput can be obtained.
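To make the idea of limited carry propagation concrete, the following is a minimal sketch of textbook carry-free addition in a radix-4 signed-digit system with digits in -3..3. It is not taken from the paper: the function names `sd_add` and `sd_value`, the little-endian digit-list representation, and the transfer thresholds are all assumptions chosen for illustration. The point it shows is that each result digit depends only on the operand digits at its own position and the position immediately below, so no carry ripples further than one digit.

```python
# Illustrative sketch only. Names and representation are hypothetical,
# not the scheme described in the paper.
RADIX = 4


def sd_add(x, y):
    """Add two radix-4 signed-digit numbers.

    x and y are little-endian lists of digits, each in -3..3.
    Returns a little-endian digit list one position longer than the
    longer input, with every digit again in -3..3.
    """
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))

    transfer = [0] * (n + 1)  # transfer[i] is the transfer INTO position i
    interim = [0] * n         # interim sum digit at position i, in -2..2

    for i in range(n):
        s = x[i] + y[i]       # position sum, in -6..6
        if s >= 3:
            t = 1
        elif s <= -3:
            t = -1
        else:
            t = 0
        transfer[i + 1] = t
        interim[i] = s - RADIX * t

    # Each final digit uses only the interim digit at its own position and
    # the transfer from the position below, so propagation stops after one step.
    return [interim[i] + transfer[i] for i in range(n)] + [transfer[n]]


def sd_value(digits):
    """Integer value of a little-endian radix-4 signed-digit list."""
    return sum(d * RADIX ** i for i, d in enumerate(digits))
```

For example, `sd_add([3, 3, 3], [3, 3, 3])` returns `[2, 3, 3, 1]`, which `sd_value` maps back to 126, the sum of 63 and 63. In hardware the two steps of the loop and the final list comprehension would run for all positions at once, which is why the addition time does not grow with operand length.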
International Conference on VLSI Design | 1996
Chetana Nagendra; Robert Michael Owens; Mary Jane Irwin
In this paper, we study the effects of modified Booth recoding, pipeline granularity and clocking on the speed, power dissipation and transistor count of different types of multipliers and FIR filters. Detailed simulations show that recoding may not always result in an improvement in delay. We propose a way of reducing the activity factor of a Booth multiplier by guarded evaluation. As systems become faster, the trend is shifting from pipelining at the level of blocks of bits to bit-level, half-bit-level and even gate-level pipelining. We run detailed experiments to determine how fine-grained the pipelining in high-throughput multipliers and filters can be made before the increase in power consumption outweighs the speed gain. It is our observation that gate-level pipelining increases power dissipation without improving the speed significantly when compared to half-bit-level pipelining.
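For readers unfamiliar with modified Booth recoding, the sketch below shows the standard radix-4 form at the arithmetic level. It is an illustration under stated assumptions, not the circuits studied in the paper: the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, the multiplier is assumed to be a two's-complement value of even bit width, and nothing here models delay, power or guarded evaluation. Recoding halves the number of partial products by replacing each pair of multiplier bits with one digit in -2..2.

```python
# Illustrative sketch only. Function names are hypothetical.


def booth_radix4_digits(multiplier, bits):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    bits must be even and multiplier must fit in `bits` bits.
    Returns bits // 2 digits in -2..2, least significant digit first.
    """
    if bits <= 0 or bits % 2:
        raise ValueError("bits must be a positive even number")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")

    u = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern

    def bit(i):
        # The bit below the least significant bit is defined as 0.
        return 0 if i < 0 else (u >> i) & 1

    # Each digit examines three overlapping bits: b[2i+1], b[2i], b[2i-1].
    return [bit(2 * i) + bit(2 * i - 1) - 2 * bit(2 * i + 1)
            for i in range(bits // 2)]


def booth_multiply(multiplicand, multiplier, bits):
    """Multiply by summing one partial product per Booth digit."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(digits))
```

For example, `booth_radix4_digits(7, 4)` returns `[-1, 2]`, since 7 = -1 + 2·4, so a 4-bit multiplier needs two partial products instead of four. The partial products for digits ±2 and negative digits require a shift and a negation, which is extra logic in front of the adder array; this sketch does not quantify that cost.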
International Symposium on Low Power Electronics and Design | 1995
Chetana Nagendra; Robert Michael Owens; Mary Jane Irwin
Historically, signed-digit adders have been thought of as being logically very complicated to implement, while carry-save adders have been considered to be fast, low power and easy to implement. While the latter is true, the former is a misconception. We show that for every integer k ≥ 1, it is possible to build a network of radix-2 signed-digit adders having the same logical complexity and hence the same area, delay and power consumption as a network of k-bit carry-sum adders (where a 1-bit carry-sum adder is a carry-save adder). We also study the power and delay tradeoffs involved in using a network of carry-save and signed-digit adders for adding multiple operands when compared to a fast two's complement adder and show that it always consumes less power. However, when the number of operands is large (> 26), a tree of fast carry-lookahead adders was found to be faster.
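As background for the carry-save side of this comparison, here is a minimal sketch of carry-save addition used for multi-operand summation. It is a generic textbook illustration, not the networks evaluated in the paper: the names `carry_save_add` and `multi_operand_sum` are hypothetical, operands are assumed to be non-negative Python integers, and the reduction is a simple chain rather than a balanced tree. It shows that three operands can be reduced to two with purely bitwise logic, so only the last step needs a carry-propagating adder.

```python
# Illustrative sketch only. Names are hypothetical; operands are assumed
# to be non-negative integers.


def carry_save_add(a, b, c):
    """Reduce three operands to a (sum, carry) pair with a + b + c == sum + carry.

    Every bit position is handled independently, like a row of full adders
    with no carry chain between them.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def multi_operand_sum(operands):
    """Add many operands using carry-save steps and one final ordinary addition."""
    pending = list(operands)
    while len(pending) > 2:
        a, b, c = pending.pop(), pending.pop(), pending.pop()
        pending.extend(carry_save_add(a, b, c))
    # The only carry-propagating addition happens here.
    return sum(pending)
```

For example, `carry_save_add(5, 6, 7)` returns `(4, 14)`, and 4 + 14 equals 18. A hardware design would arrange the carry-save stages as a tree to shorten the critical path; the chain used here keeps the example short and gives the same result.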
Hawaii International Conference on System Sciences | 1994
Manjit Borah; Chetana Nagendra; Mary Jane Irwin; Robert Michael Owens
The MGAP is a user programmable, multifunctional architecture. Its potential as a general purpose, high performance problem solver has been demonstrated in earlier papers. We describe its salient features and show that it solves DSP problems efficiently. With the current growing trend towards mobile communications and multimedia systems, low cost, small size and high performance are primary concerns. At the same time there is a need to address a diverse set of applications. The MGAP caters to all these requirements. It is a complete computing environment that is small enough to be incorporated into a personal desktop machine and affordable by everyday users while powerful and versatile enough to meet the performance goals in real time situations.