Sohan Purohit
University of Massachusetts Lowell
Publications
Featured research published by Sohan Purohit.
Adaptive Hardware and Systems | 2009
Sai Rahul Chalamalasetti; Sohan Purohit; Martin Margala; Wim Vanderbauwhede
This paper presents the architecture and implementation details of MORA, a novel coarse-grained reconfigurable processor for accelerating media processing applications. The MORA architecture consists of a 2-D array of several such processors and delivers low-cost, high-throughput performance in media processing applications. A distinguishing feature of the MORA architecture is the co-design of the hardware architecture and a low-level programming language throughout the design cycle. Implementation details of a single MORA processor and a benchmark evaluation using a cycle-accurate simulator are presented.
IEEE Transactions on Very Large Scale Integration Systems | 2012
Sohan Purohit; Martin Margala
This paper presents the design and characterization of 12 full-adder circuits in the IBM 90-nm process, including three new full-adder circuits based on the recently proposed split-path data driven dynamic logic. Based on the logic function they realize, the adders were characterized for performance and power consumption under various supply voltages and fan-out loads. The adders were then deployed in a 32-bit ripple-carry adder and an 8×4 multiplier to evaluate the impact of sum and carry propagation delays on the performance and power of these systems. The adder circuits were also characterized in the presence of process and voltage variations through Monte Carlo simulations. Besides analyzing and comparing circuit performance, the study highlights the possible impact of the choice of logic function.
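To make the role of carry propagation concrete, here is a minimal Python sketch: a purely functional, bit-level model, not a circuit or timing simulation. It chains a Boolean full adder into a 32-bit ripple-carry adder and builds an unsigned 8×4 product from repeated ripple-carry additions. The function names are hypothetical, and the shift-and-add multiplier structure is an assumption, since the abstract does not describe the multiplier's organization.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Boolean model of a 1-bit full adder; returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, width: int = 32, cin: int = 0) -> tuple[int, int, int]:
    """Chain `width` full adders from the LSB upward.

    Returns (sum mod 2**width, carry_out, longest_chain), where longest_chain is
    the largest number of consecutive stages that produced a carry-out of 1:
    a data-dependent hint of how far a carry rippled.
    """
    carry, total, run, longest = cin, 0, 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return total, carry, longest


def shift_add_multiply(m: int, q: int, m_bits: int = 8, q_bits: int = 4) -> int:
    """Unsigned m_bits x q_bits product as repeated ripple-carry additions."""
    product, width = 0, m_bits + q_bits
    for j in range(q_bits):
        if (q >> j) & 1:  # add the shifted multiplicand for each set multiplier bit
            product, _, _ = ripple_carry_add(product, m << j, width=width)
    return product


print(ripple_carry_add(0xFFFFFFFF, 1))  # (0, 1, 32): the carry ripples through all 32 stages
print(shift_add_multiply(0xFF, 0xF))    # 3825
```

The all-ones-plus-one case is the classic worst case for a ripple-carry adder, which is why the per-cell carry delay matters so much once the cell is placed in a long chain.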
Great Lakes Symposium on VLSI | 2009
Sohan Purohit; Martin Margala; Marco Lanuzza; Pasquale Corsonello
Arithmetic circuits have always played one of the most important roles in the design of processors, FPGAs, and the rapidly evolving domain of media processing architectures. The full adder cell forms the basic building block of the majority of these arithmetic circuits. In this paper, we describe a hybrid pseudo-static full adder cell designed using Data Driven Dynamic Logic (D3L). Simulation results show that the adder outperforms its competitors, both static and dynamic topologies, in terms of performance, while maintaining comparable area and power characteristics. This paper presents a complete characterization of popular adder cells in terms of delay, area, power, noise margin, and reliability for both the super-threshold and subthreshold operating regimes.
IEEE Transactions on Nanotechnology | 2011
I. Iniguez-de-la-Torre; Sohan Purohit; Vikas Kaushal; Martin Margala; Mufei Gong; Roman Sobolewski; David Wolpert; Paul Ampadu; T. González; J. Mateos
We present exploratory studies of digital circuit design using the recently proposed ballistic deflection transistor (BDT). We demonstrate a variety of logic functions obtained through simple reconfiguration of two drain-connected BDTs. We further propose a three-BDT logic cell that yields differential versions of each logic function, improving the overall flexibility of BDT circuit design. Each proposed gate configuration has been verified through extensive numerical calculations using an in-house Monte Carlo simulator. Simulation results show that the proposed gate arrangements can achieve 400-GHz operating frequencies at room temperature. A compact, fit-based analytical model to aid circuit design using BDTs is also introduced.
Application Specific Systems, Architectures and Processors | 2010
Wim Vanderbauwhede; Martin Margala; Sai Rahul Chalamalasetti; Sohan Purohit
MORA is a novel platform for high-level FPGA programming of streaming vector and matrix operations, aimed at multimedia applications. It consists of a soft array of pipelined, low-complexity SIMD processors-in-memory (PIM). We present a Domain-Specific Language (DSL) for high-level programming of the MORA soft processor array. The DSL is embedded in C++, providing designers with a familiar language framework and the ability to compile designs with a standard compiler for functional testing before generating the FPGA bitstream with the MORA toolchain. The paper discusses the MORA-C++ DSL and the compilation route to MORA assembly, and provides examples illustrating the programming model and performance.
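The general idea of an embedded DSL whose programs run as ordinary host-language code for functional testing, while also producing an instruction stream for another backend, can be illustrated with operator overloading. The Python sketch below is only an illustration of that technique: it is not the MORA-C++ API (the real DSL is embedded in C++), and `Program`, `Vec`, the opcodes `ADD`/`MUL`, and the 8-bit wraparound are all hypothetical choices made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Program:
    """Hypothetical instruction trace: (opcode, dest, src1, src2) tuples."""
    instrs: list[tuple[str, str, str, str]] = field(default_factory=list)
    counter: int = 0

    def fresh(self) -> str:
        """Return a new temporary register name (v0, v1, ...)."""
        name = f"v{self.counter}"
        self.counter += 1
        return name


class Vec:
    """Hypothetical 8-bit vector type: computes results eagerly AND records ops."""

    def __init__(self, prog: Program, values: list[int], name: str | None = None):
        self.prog = prog
        self.values = [v & 0xFF for v in values]  # model 8-bit wraparound
        self.name = name if name is not None else prog.fresh()

    def _binop(self, other: Vec, opcode: str, fn: Callable[[int, int], int]) -> Vec:
        # Functional result, checkable with an ordinary interpreter/compiler...
        out = Vec(self.prog, [fn(a, b) for a, b in zip(self.values, other.values)])
        # ...plus a recorded instruction that a (hypothetical) backend could lower.
        self.prog.instrs.append((opcode, out.name, self.name, other.name))
        return out

    def __add__(self, other: Vec) -> Vec:
        return self._binop(other, "ADD", lambda a, b: a + b)

    def __mul__(self, other: Vec) -> Vec:
        return self._binop(other, "MUL", lambda a, b: a * b)


prog = Program()
a = Vec(prog, [1, 2, 3, 200], name="a")
b = Vec(prog, [4, 5, 6, 100], name="b")
c = a * b + a
print(c.values)     # [5, 12, 21, 232]
print(prog.instrs)  # [('MUL', 'v0', 'a', 'b'), ('ADD', 'v1', 'v0', 'a')]
```

The same source expression `a * b + a` yields both a testable result and a program trace, which mirrors the workflow the abstract describes: functional testing with a standard compiler first, code generation for the target afterwards.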
Reconfigurable Computing and FPGAs | 2008
Sohan Purohit; Sai Rahul Chalamalasetti; Martin Margala; Pasquale Corsonello
This paper presents new power-efficient, high-throughput data paths for portable multimedia devices. The data paths support dense arithmetic operations. This work evaluates the performance of a previously proposed library of reconfigurable data path elements (processing elements) and presents two new processing element architectures for power-efficient portable multimedia processing systems. The performance results show that the proposed designs provide higher power and area efficiency than the previously proposed and commercial solutions, and could prove highly beneficial for the target domain of multimedia operations on portable systems.
IEEE Transactions on Very Large Scale Integration Systems | 2013
Sohan Purohit; Sai Rahul Chalamalasetti; Martin Margala; Wim Vanderbauwhede
In this paper, we present the design and evaluation of two new processing elements for reconfigurable computing. We also present circuit-level implementations of the data paths in static and dynamic design styles to explore the performance-power tradeoffs involved. Implemented in the IBM 90-nm CMOS process, the 8-b data paths achieve operating frequencies above 1 GHz in both static and dynamic implementations, with each data path supporting single-cycle computation. A novel single-precision floating point processing element (FPPE) using a 24-b variant of the proposed data paths is also presented. The fully dynamic implementation of the FPPE operates at 1 GHz with 6.5-mW average power consumption. Comparison with competing architectures shows that the FPPE provides two orders of magnitude higher throughput. Furthermore, to evaluate its feasibility as a soft-processing solution, we also map the floating point unit onto Virtex-4 and Virtex-5 devices and observe that the unit requires less than 1% of the total logic slices while using only around 4% of the available DSP blocks. Compared with popular FPGA-based floating point units, our design on Virtex-5 showed significantly lower resource utilization while achieving a comparable peak operating frequency.
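A 24-bit data path matches the width of an IEEE 754 single-precision significand: 23 stored fraction bits plus the implicit leading 1. The abstract does not spell out this link, so treat it as a likely reading rather than the authors' stated rationale. The Python sketch below (function names hypothetical) unpacks a value into sign, biased exponent, and 24-bit significand using the standard `struct` module, and rebuilds it for finite values.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to IEEE 754 single precision) into (sign, biased exponent, significand).

    The returned significand is 24 bits wide: the 23 stored fraction bits plus
    the implicit leading bit (1 for normal numbers, 0 for zero/subnormals).
    Exponent 255 (infinity/NaN) is returned as-is without special handling.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    hidden = 1 if exponent != 0 else 0
    return sign, exponent, (hidden << 23) | fraction


def float32_value(sign: int, exponent: int, significand: int) -> float:
    """Inverse of float32_fields for finite values."""
    if exponent == 0xFF:
        raise ValueError("infinity/NaN not handled in this sketch")
    e = exponent if exponent != 0 else 1  # subnormals use exponent 1 - 127
    # 150 = exponent bias (127) + number of fraction bits (23)
    return (-1.0) ** sign * significand * 2.0 ** (e - 150)


print(float32_fields(1.0))   # (0, 127, 8388608), i.e. significand 0x800000
print(float32_fields(-2.5))  # (1, 128, 10485760), i.e. significand 0xA00000
```

In hardware terms, the 24-bit significands are what a floating point adder aligns and adds and what a multiplier multiplies, which is why a 24-bit integer data path is the natural building block for a single-precision unit.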
Field-Programmable Logic and Applications | 2009
Sai Rahul Chalamalasetti; Wim Vanderbauwhede; Sohan Purohit; Martin Margala
This paper presents an FPGA implementation of a low-cost 8-bit reconfigurable processor core for media processing applications. The core is optimized to provide all the basic arithmetic and logic functions required by media processing and other domains, and to be easily integrated into a 2-D array. This paper investigates the feasibility of the core as a soft processing architecture for FPGA platforms. The core was synthesized for the entire Virtex FPGA family to evaluate its overall performance, scalability, and portability. A special feature of the proposed architecture is its simple programming model, which allows low-level programming. Throughput results for popular benchmarks, coded using the programming model and run on a cycle-accurate simulator, are presented.
Journal of Low Power Electronics | 2010
Sohan Purohit; Marco Lanuzza; Martin Margala
This paper presents the design, analysis, and complete characterization of a novel split-path Data Driven Dynamic Logic (sp-D3L) full adder cell in IBM's 65-nm CMOS process. The split-path D3L design style, derived from standard D3L, allows the design of high-speed dynamic circuits without the power overhead of a clock tree, while providing significantly higher performance than D3L due to reduced capacitance at the pre-charge node. To demonstrate the performance benefits of the new split-path dynamic approach, we compare the proposed adder with conventional static and dynamic adder cells. All the adder circuits were characterized for speed, power, area, noise margins, supply voltage scaling, and fan-out capability. To evaluate the combined impact of the load driven by the adder and the load the adder presents to its driving circuit, a combined fan-in/fan-out analysis with varying loads was also performed. Monte Carlo simulations were performed to evaluate the reliability of the adder design against random process, voltage, and temperature variations. To compare with the state of the art, we also compared the proposed adder with several recently published low-power and high-performance adders. Furthermore, to simulate the behavior of the adder in data path elements, we built ripple-carry adders of varying lengths using the proposed adder. The new design achieves 16% to 27% performance advantages over its static and dynamic counterparts at nominal supply voltage. With the supply voltage scaled from 1 V to 0.8 V, the adder shows 12%, 34%, and 39% power-delay product (PDP) advantages over the domino, static, and conventional D3L designs, respectively. Fan-out analysis showed the adder to achieve 11% to 41% better PDP than the others under worst-case FO32 loading.
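For reference, the power-delay product used in comparisons like these is conventionally the product of average power and delay. The percentage advantage below is taken relative to the competing design B; the abstract does not state its exact normalization, so that second expression is an assumption. The last line is the usual first-order delay estimate for an N-bit ripple-carry chain, where t_carry and t_sum are the carry-in-to-carry-out and carry-in-to-sum delays of one adder cell, which is why a faster carry path pays off more as the chain gets longer.

```latex
\begin{align*}
\mathrm{PDP} &= P_{\mathrm{avg}} \cdot t_{d} \\
\Delta_{\mathrm{PDP}}(A\ \text{over}\ B) &= \frac{\mathrm{PDP}_{B} - \mathrm{PDP}_{A}}{\mathrm{PDP}_{B}} \times 100\,\% \\
t_{\mathrm{RCA}}(N) &\approx (N-1)\, t_{\mathrm{carry}} + t_{\mathrm{sum}}
\end{align*}
```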
Journal of Low Power Electronics | 2009
Sohan Purohit; Marco Lanuzza; Stefania Perri; Pasquale Corsonello; Martin Margala
This paper presents the architecture and complete VLSI implementation of a high-throughput, energy- and area-efficient data path targeted at DSP and multimedia applications. The architecture is extremely flexible and can be easily extended to process 8-, 16-, and 32-bit operands. For the initial analysis and design, the reconfigurable data path was implemented in three logic styles (static, dynamic domino, and D3L) to evaluate the proposed architecture in both the static and dynamic design domains. This paper presents a complete evaluation of the architecture, along with an analysis of every design decision down to the lowest level. The proposed architecture can be easily extended to generalized N-bit operations, as demonstrated through custom implementations of 8-, 16-, and 32-bit versions.
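The abstract does not detail how one data path serves 8-, 16-, and 32-bit operands. A common way to let a single adder handle several operand widths is to block the carry at lane boundaries (sub-word, or "SIMD within a register", addition). The Python sketch below models that idea in software with a well-known bit trick; the function names are hypothetical, and this is not claimed to be the authors' circuit.

```python
def lane_msb_mask(lane_bits: int, total_bits: int) -> int:
    """Mask with only the most significant bit of each lane set."""
    mask = 0
    for shift in range(lane_bits - 1, total_bits, lane_bits):
        mask |= 1 << shift
    return mask


def partitioned_add(x: int, y: int, lane_bits: int, total_bits: int = 32) -> int:
    """Add packed operands lane by lane, with no carry crossing a lane boundary.

    lane_bits=32 gives one 32-bit add; 16 gives two 16-bit adds; 8 gives four
    8-bit adds. Each lane wraps modulo 2**lane_bits.
    """
    if total_bits % lane_bits:
        raise ValueError("total_bits must be a multiple of lane_bits")
    full = (1 << total_bits) - 1
    h = lane_msb_mask(lane_bits, total_bits)
    # Clearing each lane's MSB means the low-bit sums cannot carry out of a lane...
    low_sum = ((x & ~h) & full) + ((y & ~h) & full)
    # ...then each lane's MSB is restored as a ^ b ^ (carry into the MSB).
    return (low_sum ^ ((x ^ y) & h)) & full


print(hex(partitioned_add(0x01FF7F80, 0x01010180, lane_bits=8)))  # 0x2008000
```

In hardware, the equivalent move is a mode signal that forces the carry into each lane's first bit to zero, so the same adder cells serve every operand width.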