Bharat Sukhwani | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Bharat Sukhwani is active.

Explore More

Publication

Featured researches published by Bharat Sukhwani.

IEEE Computer | 2007

Achieving High Performance with FPGA-Based Computing

Martin C. Herbordt; Tom VanCourt; Yongfeng Gu; Bharat Sukhwani; Al Conti; Josh Model; Doug Disabello

Numerous application areas, including bioinformatics and computational biology, demand increasing amounts of processing capability. In many cases, the computation cores and data types are suited to field-programmable gate arrays. The challenge is identifying the design techniques that can extract high performance potential from the FPGA fabric

field-programmable custom computing machines | 2006

Single Pass, BLAST-Like, Approximate String Matching on FPGAs

Martin C. Herbordt; Josh Model; Yongfeng Gu; Bharat Sukhwani; Tom VanCourt

Approximate string matching is fundamental to bioinformatics, and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic programming- (DP) based methods. Our primary contributions are two new algorithms for emulating the seeding and extension phases of BLAST. These operate in a single pass through a database at streaming rate (110 Maa/sec on a VP70 for query sizes up to 600 and 170 Maa/sec on a Virtex4 for query sizes up to 1024), and with no preprocessing other than loading the query string. Further, they use very high sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations

international conference on parallel architectures and compilation techniques | 2012

Database analytics acceleration using FPGAs

Bharat Sukhwani; Hong Min; Mathew S. Thoennes; Parijat Dube; Balakrishna R. Iyer; Bernard Brezzo; Donna N. Dillenberger; Sameh W. Asaad

Business growth and technology advancements have resulted in growing amounts of enterprise data. To gain valuable business insight and competitive advantage, businesses demand the capability of performing real-time analytics on such data. This, however, involves expensive query operations that are very time consuming on traditional CPUs. Additionally, in traditional database management systems (DBMS), the CPU resources are dedicated to mission-critical transactional workloads. Offloading expensive analytics query operations to a co-processor can allow efficient execution of analytics workloads in parallel with transactional workloads. In this paper, we present a Field Programmable Gate Array (FPGA) based acceleration engine for database operations in analytics queries. The proposed solution provides a mechanism for a DBMS to seamlessly harness the FPGA compute power without requiring any changes in the application or the existing data layout. Using a software-programmed query control block, the accelerator can be tailored to execute different queries without reconfiguration. Our prototype is implemented in a PCIe-attached FPGA system and is integrated into a commercial DBMS platform. The results demonstrate up to 94% CPU savings on real customer data compared to the baseline software cost with up to an order of magnitude speedup in the offloaded computations and up to 6.2× improvement in end-to-end performance.

parallel computing | 2007

Single pass streaming BLAST on FPGAs

Martin C. Herbordt; Josh Model; Bharat Sukhwani; Yongfeng Gu; Tom VanCourt

Approximate string matching is fundamental to bioinformatics and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic-programming- (DP) based methods. Our primary contribution is a new algorithm for emulating the seeding and extension phases of BLAST. This operates in a single pass through a database at streaming rate, and with no preprocessing other than loading the query string. Moreover, it emulates parameters turned to maximum possible sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations showing order of magnitude acceleration over serial reference code. A simple extension assures compatibility with NCBI BLAST.

field-programmable custom computing machines | 2013

Accelerating Join Operation for Relational Databases with FPGAs

Robert J. Halstead; Bharat Sukhwani; Hong Min; Mathew S. Thoennes; Parijat Dube; Sameh W. Asaad; Balakrishna R. Iyer

In this paper, we investigate the use of field programmable gate arrays (FPGAs) to accelerate relational joins. Relational join is one of the most CPU-intensive, yet commonly used, database operations. Hashing can be used to reduce the time complexity from quadratic (naïve) to linear time. However, doing so can introduce false positives to the results which must be resolved. We present a hash-join engine on FPGA that performs hashing, conflict resolution, and joining on a PCIe-attached system, achieving greater than 11x speedup over software.

Computing in Science and Engineering | 2008

Computing Models for FPGA-Based Accelerators

Martin C. Herbordt; Yongfeng Gu; Tom VanCourt; Josh Model; Bharat Sukhwani; Matt Chiu

Field-programmable gate arrays are widely considered accelerators for compute-intensive applications. A critical phase of FPGA application development is finding and mapping to the appropriate computing model. These models differ from models generally used in programming. For example, whereas parallel computing models are often based on thread execution and interaction, FPGA computing can exploit more degrees of freedom than are available in software. This enables models with highly flexible fine-grained parallelism and associative operations such as broadcast and collective response. Several case studies demonstrate the effectiveness of using FPGA-based accelerators in molecular modeling.

International Journal of Parallel Programming | 2015

A Hardware/Software Approach for Database Query Acceleration with FPGAs

Bharat Sukhwani; Mathew S. Thoennes; Hong Min; Parijat Dube; Bernard Brezzo; Sameh W. Asaad; Donna N. Dillenberger

Complex analytics queries often involve expensive operations that may require large computational runtimes leading to slow query responsiveness and hampering real-time performance. Moreover, running these expensive analytics queries inside traditional online transaction processing (OLTP) systems for real-time analytics can affect the performance of mission-critical OLTP queries. On the other hand, support for real-time analytics is considered vital for important business insights and improved market responsiveness. In this paper, we try to address the needs of real-time analytics by enabling hardware acceleration of complex database query operations such as predicate evaluation, sort and projection. While projection helps reduce the amount of data being processed by subsequent query operations, sort is central to most database queries, even those not involving an explicit sort operation. Our system involves FPGA-based composable accelerator for offloading the analytics queries from the host CPU running the OLTP workload. The FPGA-accelerated database system contains accelerator kernels for various database operations and automatic transformation of query operations into calls to these hardware kernels for seamless integration of the accelerator into the database system. Based on the query semantics, each accelerator kernel can be tailored by software to execute specific database operations and different kernels can be fused together to compose a query accelerator. Our query transformation algorithm creates a query-specific control block to customize the accelerator without requiring FPGA-reconfiguration.

field-programmable custom computing machines | 2011

High-Throughput, Lossless Data Compresion on FPGAs

Bharat Sukhwani; Bulent Abali; Bernard Brezzo; Sameh W. Asaad

Loss less compression is often used before writing data to a storage medium or transmitting across a transmission medium. Compression aids by saving storage space or transmission bandwidth, a decompression operation is performed when the data is subsequently read. Though this scheme has clear benefits, the execution time of compression and decompression is critical to its application in real-time systems. Software compression utilities are often slow, leading to degraded system performance. Hardware-based solutions, on the other hand, often drive large resource requirements and are not amenable to supporting future algorithmic changes. In the current article, we present a high-throughput, streaming, loss less compression algorithm and its efficient implementation on FPGAs. The proposed solution provides a peak throughput of 1GB/sec per engine, with a sustained overall measured throughput of 2.66GB/sec on a PCIe-based FPGA board with two compression and two decompression engines. This result represents an overall speedup of 13.6x over reference software implementation. The proposed design is very lean, and, with multiple engines running in parallel, provides a path to potential speedups of up to two orders of magnitude. In the current implementation, the achievable overall throughput is limited only by the available PCIe bus bandwidth.

field-programmable logic and applications | 2008

Acceleration of a production rigid molecule docking code

Bharat Sukhwani; Martin C. Herbordt

Modeling the interactions of biological molecules, or docking is critical to both understanding basic life processes and to designing new drugs. Here we describe the FPGA-based acceleration of a recently developed, complex, production docking code. We find that it is necessary to extend our previous 3D correlation structure in several ways, most significantly to support simultaneous computation of several correlation functions. The result is a hundred-fold speed-up of a section of the code that represents over 92% of the original run-time. An additional 4% is accelerated through a previously described method, yielding a total acceleration of almost 25times for typical protein-ligand combinations.

IEEE Transactions on Circuits and Systems | 2007

Simulation and Design of Nanocircuits With Resonant Tunneling Devices

Janet Meiling Wang; Bharat Sukhwani; Uday Padmanabhan; Dongsheng Ma; Kartik Sinha

New nanotechnology-based devices are being researched to replace CMOS devices in order to overcome CMOS technologys scaling limitations. However, many such devices exhibit nonmonotonic I-V characteristics and uncertain properties which lead to the negative differential resistance (NDR) problem and the chaotic performance. This paper proposes two new circuit simulation approaches that can effectively simulate nanotechnology devices with uncertain input sources and negative differential resistance problem. A new tool called NanoSim-RTD is developed based on the proposed new simulation techniques. The experimental results show a speedup of 1.48-37.1 times when compared with existing simulators. Further, this paper demonstrates a new way to design delay-insensitive nanocircuits, and the designs can be verified by using NanoSim-RTD.

Explore More