Publication
Featured research published by Mathew S. Thoennes.
International Conference on Parallel Architectures and Compilation Techniques | 2012
Bharat Sukhwani; Hong Min; Mathew S. Thoennes; Parijat Dube; Balakrishna R. Iyer; Bernard Brezzo; Donna N. Dillenberger; Sameh W. Asaad
Business growth and technology advancements have resulted in growing amounts of enterprise data. To gain valuable business insight and competitive advantage, businesses demand the capability of performing real-time analytics on such data. This, however, involves expensive query operations that are very time consuming on traditional CPUs. Additionally, in traditional database management systems (DBMS), the CPU resources are dedicated to mission-critical transactional workloads. Offloading expensive analytics query operations to a co-processor can allow efficient execution of analytics workloads in parallel with transactional workloads. In this paper, we present a Field Programmable Gate Array (FPGA) based acceleration engine for database operations in analytics queries. The proposed solution provides a mechanism for a DBMS to seamlessly harness the FPGA compute power without requiring any changes in the application or the existing data layout. Using a software-programmed query control block, the accelerator can be tailored to execute different queries without reconfiguration. Our prototype is implemented in a PCIe-attached FPGA system and is integrated into a commercial DBMS platform. The results demonstrate up to 94% CPU savings on real customer data compared to the baseline software cost with up to an order of magnitude speedup in the offloaded computations and up to 6.2× improvement in end-to-end performance.
Field-Programmable Custom Computing Machines | 2013
Robert J. Halstead; Bharat Sukhwani; Hong Min; Mathew S. Thoennes; Parijat Dube; Sameh W. Asaad; Balakrishna R. Iyer
In this paper, we investigate the use of field programmable gate arrays (FPGAs) to accelerate relational joins. Relational join is one of the most CPU-intensive, yet commonly used, database operations. Hashing can be used to reduce the time complexity from quadratic (naïve) to linear time. However, doing so can introduce false positives to the results which must be resolved. We present a hash-join engine on FPGA that performs hashing, conflict resolution, and joining on a PCIe-attached system, achieving greater than 11x speedup over software.
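The complexity argument in the abstract above can be illustrated in software. The following is a minimal sketch only, not the paper's FPGA engine: the function names, the tuple row layout, and the deliberately small `num_buckets` are assumptions chosen for illustration. It contrasts a quadratic nested-loop join with a hash join whose probe phase re-checks the actual keys, which is how bucket collisions (the "false positives") are resolved.

```python
from collections import defaultdict


def nested_loop_join(left, right, left_key, right_key):
    """Naive join: compares every pair of rows, O(len(left) * len(right))."""
    return [(l, r) for l in left for r in right if l[left_key] == r[right_key]]


def hash_join(build, probe, build_key, probe_key, num_buckets=8):
    """Hash join: roughly linear when buckets stay small.

    num_buckets is kept tiny here (an illustrative assumption) so that
    distinct keys collide and the resolution step is actually exercised.
    """
    # Build phase: place each build-side row in a bucket chosen by its key hash.
    buckets = defaultdict(list)
    for row in build:
        buckets[hash(row[build_key]) % num_buckets].append(row)

    # Probe phase: look only inside the matching bucket.
    matches = []
    for row in probe:
        bucket = buckets.get(hash(row[probe_key]) % num_buckets, [])
        for candidate in bucket:
            # Rows sharing a bucket may have different keys (false positives),
            # so compare the real key values before emitting a match.
            if candidate[build_key] == row[probe_key]:
                matches.append((candidate, row))
    return matches


# Hypothetical sample data: column 0 is the join key in both tables.
customers = [(1, "Ada"), (2, "Bob"), (17, "Eve")]
orders = [(1, "pen"), (2, "ink"), (9, "cap"), (1, "pad")]

joined = hash_join(customers, orders, 0, 0)
```

With the sample data, the integer keys 1, 9, and 17 all map to the same bucket when `num_buckets` is 8 in CPython, yet only rows whose keys are truly equal are joined, so the result matches the nested-loop join.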
International Journal of Parallel Programming | 2015
Bharat Sukhwani; Mathew S. Thoennes; Hong Min; Parijat Dube; Bernard Brezzo; Sameh W. Asaad; Donna N. Dillenberger
Complex analytics queries often involve expensive operations that may require large computational runtimes leading to slow query responsiveness and hampering real-time performance. Moreover, running these expensive analytics queries inside traditional online transaction processing (OLTP) systems for real-time analytics can affect the performance of mission-critical OLTP queries. On the other hand, support for real-time analytics is considered vital for important business insights and improved market responsiveness. In this paper, we try to address the needs of real-time analytics by enabling hardware acceleration of complex database query operations such as predicate evaluation, sort and projection. While projection helps reduce the amount of data being processed by subsequent query operations, sort is central to most database queries, even those not involving an explicit sort operation. Our system involves an FPGA-based composable accelerator for offloading the analytics queries from the host CPU running the OLTP workload. The FPGA-accelerated database system contains accelerator kernels for various database operations and automatic transformation of query operations into calls to these hardware kernels for seamless integration of the accelerator into the database system. Based on the query semantics, each accelerator kernel can be tailored by software to execute specific database operations and different kernels can be fused together to compose a query accelerator. Our query transformation algorithm creates a query-specific control block to customize the accelerator without requiring FPGA reconfiguration.
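The idea of tailoring fixed kernels through a software-written control block can be sketched in plain Python. This is a software analogy under stated assumptions, not the paper's hardware design or its control block format: the `ControlBlock` fields, the `run_query` function, and the fixed stage order (predicate, then projection, then sort) are all hypothetical. The point it shows is that the processing stages stay unchanged while a per-query data structure selects and parameterizes them.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Row = Tuple


@dataclass(frozen=True)
class ControlBlock:
    """Hypothetical per-query settings; None means the stage is bypassed."""
    predicate: Optional[Callable[[Row], bool]] = None  # row filter
    project: Optional[Sequence[int]] = None            # column indexes to keep
    sort_key: Optional[int] = None                     # index into the projected row


def run_query(rows, cb):
    """Apply the fixed stages in order, each configured only by the control block."""
    out = list(rows)
    if cb.predicate is not None:
        # Predicate evaluation: drop rows that fail the filter.
        out = [r for r in out if cb.predicate(r)]
    if cb.project is not None:
        # Projection: keep only the requested columns, shrinking later work.
        out = [tuple(r[i] for i in cb.project) for r in out]
    if cb.sort_key is not None:
        # Sort on a column of the (possibly projected) row.
        out = sorted(out, key=lambda r: r[cb.sort_key])
    return out


# Hypothetical table: (id, name, amount).
table = [(3, "c", 30), (1, "a", 10), (2, "b", 5)]

# Two different queries reuse the same stages with different control blocks.
q1 = ControlBlock(predicate=lambda r: r[2] >= 10, project=(0, 1), sort_key=0)
q2 = ControlBlock(sort_key=2)

result1 = run_query(table, q1)
result2 = run_query(table, q2)
```

Switching from `q1` to `q2` changes the query without touching `run_query`, which loosely mirrors customizing an accelerator without reconfiguring it.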
The Journal of Supercomputing | 2003
Mathew S. Thoennes; Charles C. Weems
The performance of software on modern architectures has grown more and more difficult to predict and analyze, as modern microprocessors have grown more complex. The execution of a program now entails the complex interaction of code, compiler and processor architecture. The current generation of microprocessors is optimized to an existing set of commercial and scientific benchmarks but new applications such as data mining are becoming a significant part of the workload. In this paper we explore the use of performance monitoring hardware to analyze the execution of C4.5, a data mining application, on the IBM Power2 architecture. We see how the data gathered by the hardware can be used to identify potential changes that can be made to the program and the processor micro-architecture to improve performance. We then go on to evaluate changes to C4.5 and to the micro-architecture. Based on our experience, we identify issues that limit the use of performance monitoring hardware in user level tuning and in extending its use to high performance computing environments.
Symposium on Computer Architecture and High Performance Computing | 2013
Bharat Sukhwani; Mathew S. Thoennes; Hong Min; Parijat Dube; Bernard Brezzo; Sameh W. Asaad; Donna N. Dillenberger
In recent years, real-time analytics has seen widespread adoption in the business world. While it provides useful business insights and improved market responsiveness, it also adds a computational burden to traditional online transaction processing (OLTP) systems. Analytics queries involve complex database operations such as sort, aggregation, and join that consume significant computational resources, and, when executed on the same system, may affect the performance of OLTP queries. In this paper, we try to address this issue by accelerating two such database operations, namely, projection and sort, using a field programmable gate array (FPGA). Our prototype is implemented on an Altera Stratix V FPGA and achieves an order of magnitude speedup in the sort operation compared to baseline software. Furthermore, our prototype implements projection in parallel with other query operations on FPGA, thus completely eliminating the cost of projection without consuming any extra cycles on the FPGA. FPGA-accelerated sort and projection have been integrated with our previous work on accelerating other query operations [1], making our analytics acceleration prototype on FPGA applicable to a wider variety of queries.
Archive | 2006
Jeffrey D. Aman; Yuk L. Chan; Yuksel Gunal; Hiren R. Shah; Mathew S. Thoennes; Peter B. Yocom
IEEE Micro | 2014
Bharat Sukhwani; Hong Min; Mathew S. Thoennes; Parijat Dube; Bernard Brezzo; Sameh W. Asaad; Donna Eng Dillenberger
Archive | 2006
Mathew S. Thoennes; Peter B. Yocom
Archive | 2008
John A. Bivens; David M. Chess; Donna N. Dillenberger; Steven E. Froehlich; James E. Hanson; Mark Francis Hulber; Jeffrey O. Kephart; Giovanni Pacifici; Michael J. Spreitzer; Asser N. Tantawi; Mathew S. Thoennes; Ian Whalley; Peter B. Yocom
Archive | 2016
Sameh W. Asaad; Parijat Dube; Balakrishna R. Iyer; Hong Min; Bharat Sukhwani; Mathew S. Thoennes