Bishwaranjan Bhattacharjee

Publication

Featured research published by Bishwaranjan Bhattacharjee.

very large data bases | 2010

SSD bufferpool extensions for database systems

Mustafa Canim; George A. Mihaila; Bishwaranjan Bhattacharjee; Kenneth A. Ross; Christian A. Lang

High-end solid state disks (SSDs) provide much faster access to data compared to conventional hard disk drives. We present a technique for using solid-state storage as a caching layer between RAM and hard disks in database management systems. By caching data that is accessed frequently, disk I/O is reduced. For random I/O, the potential performance gains are particularly significant. Our system continuously monitors the disk access patterns to identify hot regions of the disk. Temperature statistics are maintained at the granularity of an extent, i.e., 32 pages, and are kept current through an aging mechanism. Unlike prior caching methods, once the SSD is populated with pages from warm regions, cold pages are not admitted into the cache, leading to low levels of cache pollution. Simulations based on DB2 I/O traces and a prototype implementation within DB2 both show substantial performance improvements.
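The temperature-based admission idea in this abstract can be illustrated with a minimal Python sketch. Only the extent size of 32 pages comes from the abstract; the class name `TemperatureCache`, the decay factor, the unit read cost, and the "replace the coldest cached page" rule are assumptions made for illustration, not the policy implemented in DB2.

```python
from collections import defaultdict

PAGES_PER_EXTENT = 32  # extent granularity stated in the abstract


class TemperatureCache:
    """Hypothetical SSD cache that admits pages only from warm extents."""

    def __init__(self, capacity_pages, decay=0.5):
        self.capacity = capacity_pages
        self.decay = decay                     # assumed aging factor
        self.temperature = defaultdict(float)  # extent id -> temperature
        self.cached = set()                    # page ids currently on the SSD

    def extent_of(self, page_id):
        return page_id // PAGES_PER_EXTENT

    def record_disk_read(self, page_id, cost=1.0):
        # Every disk read heats up the extent that contains the page.
        self.temperature[self.extent_of(page_id)] += cost

    def age(self):
        # Aging keeps the statistics current: old accesses count for less.
        for extent in self.temperature:
            self.temperature[extent] *= self.decay

    def try_admit(self, page_id):
        if page_id in self.cached:
            return True
        if len(self.cached) < self.capacity:
            self.cached.add(page_id)
            return True
        # Cache is full: admit only if the page is warmer than the coldest
        # cached page, so cold pages cannot pollute the cache.
        coldest = min(self.cached,
                      key=lambda p: self.temperature[self.extent_of(p)])
        if (self.temperature[self.extent_of(page_id)]
                > self.temperature[self.extent_of(coldest)]):
            self.cached.remove(coldest)
            self.cached.add(page_id)
            return True
        return False
```

In this sketch a page from a rarely read extent is rejected once the cache is full, while the same page is admitted after its extent has been read often enough to become warmer than the coldest cached page.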

international conference on management of data | 2013

Building an efficient RDF store over a relational database

Mihaela A. Bornea; Julian Dolby; Anastasios Kementsietsidis; Kavitha Srinivas; Patrick Dantressangle; Octavian Udrea; Bishwaranjan Bhattacharjee

Efficient storage and querying of RDF data is of increasing importance, due to the increased popularity and widespread acceptance of RDF on the web and in the enterprise. In this paper, we describe a novel storage and query mechanism for RDF which works on top of existing relational representations. Reliance on relational representations of RDF means that one can take advantage of 35+ years of research on efficient storage and querying, industrial-strength transaction support, locking, security, etc. However, there are significant challenges in storing RDF in relational, which include data sparsity and schema variability. We describe novel mechanisms to shred RDF into relational, and novel query translation techniques to maximize the advantages of this shredded representation. We show that these mechanisms result in consistently good performance across multiple RDF benchmarks, even when compared with current state-of-the-art stores. This work provides the basis for RDF support in DB2 v.10.1.

very large data bases | 2009

Efficient index compression in DB2 LUW

Bishwaranjan Bhattacharjee; Lipyeow Lim; Timothy R. Malkemus; George A. Mihaila; Kenneth A. Ross; Sherman Lau; Cathy Mcarthur; Zoltan Toth; Reza Sherkat

In database systems, the costs of data storage and retrieval are important components of the total cost and response time of the system. A popular way to reduce the storage footprint is to compress the data residing in tables and indexes. Compressing indexes efficiently, while maintaining response time requirements, is known to be challenging. This is especially true when designing for a workload spectrum covering both data warehousing and transaction processing environments. DB2 Linux, UNIX, Windows (LUW) recently introduced index compression for use in both environments. This uses techniques that are able to compress index data efficiently while incurring virtually no performance penalty for query processing. On the contrary, for certain operations, the performance is actually better. In this paper, we detail the design of index compression in DB2 LUW and discuss the challenges that were encountered in meeting the design goals. We also demonstrate its effectiveness by showing performance results on typical customer scenarios.

international conference on management of data | 2003

Multi-dimensional clustering: a new data layout scheme in DB2

Sriram Padmanabhan; Bishwaranjan Bhattacharjee; Timothy R. Malkemus; Leslie A. Cranston; Matthew A. Huras

We describe the design and implementation of a new data layout scheme, called multi-dimensional clustering, in DB2 Universal Database Version 8. Many applications, e.g., OLAP and data warehousing, process a table or tables in a database using a multi-dimensional access paradigm. Currently, most database systems can only support organization of a table using a primary clustering index. Secondary indexes are created to access the tables when the primary key index is not applicable. Unfortunately, secondary indexes perform many random I/O accesses against the table for a simple operation such as a range query. Our work in multi-dimensional clustering addresses this important deficiency in database systems. Multi-Dimensional Clustering is based on the definition of one or more orthogonal clustering attributes (or expressions) of a table. The table is organized physically by associating records with similar values for the dimension attributes in a cluster. We describe novel techniques for maintaining this physical layout efficiently and methods of processing database operations that provide significant performance improvements. We show results from experiments using a star-schema database to validate our claims of performance with minimal overhead.
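The physical organization described in this abstract, where records with the same dimension values are stored together, can be sketched roughly as follows. The class `MDCTable`, the block capacity of four records, and the method names are hypothetical and chosen for illustration only; the abstract does not give DB2's block sizes or maintenance algorithms.

```python
from collections import defaultdict

BLOCK_CAPACITY = 4  # hypothetical number of records per block


class MDCTable:
    """Toy table clustered on several dimensions at once."""

    def __init__(self, dimensions):
        self.dimensions = list(dimensions)
        self.blocks = []                      # block id -> list of records
        self.cell_blocks = defaultdict(list)  # cell key -> block ids

    def insert(self, record):
        # A cell is one combination of values for the clustering dimensions.
        cell = tuple(record[d] for d in self.dimensions)
        block_ids = self.cell_blocks[cell]
        # Allocate a new block when the cell has none or its last one is full.
        if not block_ids or len(self.blocks[block_ids[-1]]) >= BLOCK_CAPACITY:
            self.blocks.append([])
            block_ids.append(len(self.blocks) - 1)
        self.blocks[block_ids[-1]].append(record)

    def blocks_for(self, **predicates):
        # Return the ids of all blocks whose cell matches every predicate.
        return sorted(
            b
            for cell, ids in self.cell_blocks.items()
            if all(cell[self.dimensions.index(d)] == v
                   for d, v in predicates.items())
            for b in ids
        )
```

Because every block holds records from exactly one cell, a query on any dimension reads whole blocks of matching records instead of following one random I/O per record through a secondary index.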

international conference on data engineering | 2007

Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling

Christian A. Lang; Bishwaranjan Bhattacharjee; Timothy R. Malkemus; Sriram Padmanabhan; Kwai Wong

Decision support (DSS) workloads generally contain multiple large concurrent scan operations. These are often executed as relational table scans which can take up a lot of I/O bandwidth. This is especially true for ad-hoc queries where the workload is not known in advance. Common database management systems have only limited ability to reuse memory buffer content across multiple running queries due to their treatment of queries in isolation. Previous attempts to coordinate scans for better buffer reuse were less than satisfactory due to drifting between scans and the required radical DBMS architecture changes. In this paper, we describe a new mechanism to keep similar table scans closer together during scanning. This is achieved via dynamic grouping and regrouping of scans based on their runtime behavior and via adaptive throttling of scan speeds based on scan group characteristics. The required memory footprint is very small and the effort required to extend existing database management systems is minimal, as shown in our DB2 UDB prototype. Our experiments show significant gains in end-to-end response times as well as average response times for TPC-H workloads.
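A simplified Python sketch of the grouping and throttling idea follows. It assumes that each scan is described by its current page position and its speed, and it ignores wrap-around at the end of the table. The functions `group_scans` and `throttled_speeds` and the thresholds `max_gap` and `drift_limit` are hypothetical names, not taken from the paper or from DB2.

```python
def group_scans(positions, max_gap):
    """Group scan ids whose positions lie within max_gap of a neighbour."""
    ordered = sorted(positions, key=positions.get)
    groups = []
    for scan_id in ordered:
        # Join the current group if this scan is close to its last member.
        if groups and positions[scan_id] - positions[groups[-1][-1]] <= max_gap:
            groups[-1].append(scan_id)
        else:
            groups.append([scan_id])
    return groups


def throttled_speeds(positions, speeds, max_gap, drift_limit):
    """Slow the leader of a group that has drifted too far from its trailer."""
    result = dict(speeds)
    for group in group_scans(positions, max_gap):
        trailer, leader = group[0], group[-1]
        if positions[leader] - positions[trailer] > drift_limit:
            # Cap the leader at the trailer's speed so the gap stops growing
            # and both scans keep hitting the same buffered pages.
            result[leader] = min(speeds[leader], speeds[trailer])
    return result
```

Calling `group_scans` again as positions change gives the regrouping behaviour: a scan that has moved away from its group simply ends up in a different group on the next call.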

very large data bases | 2003

Efficient query processing for multi-dimensionally clustered tables in DB2

Bishwaranjan Bhattacharjee; Sriram Padmanabhan; Timothy R. Malkemus; Tony Wen Hsun Lai; Leslie A. Cranston; Matthew A. Huras

We have introduced a Multi-Dimensional Clustering (MDC) physical layout scheme in DB2 version 8.0 for relational tables. Multi-Dimensional Clustering is based on the definition of one or more orthogonal clustering attributes (or expressions) of a table. The table is organized physically by associating records with similar values for the dimension attributes in a cluster. Each clustering key is allocated one or more blocks of physical storage with the aim of storing the multiple records belonging to the cluster in almost contiguous fashion. Block oriented indexes are created to access these blocks. In this paper, we describe novel techniques for query processing operations that provide significant performance improvements for MDC tables. Current database systems employ a repertoire of access methods including table scans, index scans, index ANDing, and index ORing. We have extended these access methods for efficiently processing the block based MDC tables. One important concept at the core of processing MDC tables is the block oriented access technique. In addition, since MDC tables can include regular record oriented indexes, we employ novel techniques to combine block and record indexes. Block oriented processing is extended to nested loop joins and star joins as well. We show results from experiments using a star-schema database to validate our claims of performance with minimal overhead.
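Block index ANDing, and the combination of a block index with a record index, can be illustrated with a short Python sketch. It assumes that a block index returns a list of block ids and that a record index returns record ids in the form `(block_id, slot)`. Both function names and this RID layout are assumptions made for illustration.

```python
def block_index_and(*block_lists):
    """Intersect the block id lists returned by several block indexes."""
    result = set(block_lists[0])
    for ids in block_lists[1:]:
        result &= set(ids)
    return sorted(result)


def and_block_with_record_index(block_ids, rids):
    """Keep only the RIDs, given as (block_id, slot), in qualifying blocks."""
    qualifying = set(block_ids)
    return [rid for rid in rids if rid[0] in qualifying]
```

For example, if the block index on one dimension returns blocks `[0, 1, 4]` and the index on another returns `[1, 2, 4]`, only blocks 1 and 4 need to be scanned. Block lists are much shorter than record lists, which is where the saving over record-level index ANDing comes from.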

data management on new hardware | 2006

Using secure coprocessors for privacy preserving collaborative data mining and analysis

Bishwaranjan Bhattacharjee; Naoki Abe; Kenneth Alan Goldman; Bianca Zadrozny; Vamsavardhana R. Chillakuru; Marysabel del Carpio; Chidanand Apte

Secure coprocessors have traditionally been used as a keystone of a security subsystem, eliminating the need to protect the rest of the subsystem with physical security measures. With technological advances and hardware miniaturization they have become increasingly powerful. This opens up the possibility of using them for nontraditional purposes. This paper describes a solution for privacy-preserving data sharing and mining using cryptographically secure but resource-limited coprocessors. It uses memory-light data mining methodologies along with a lightweight database engine with federation capability, running on a coprocessor. The data to be shared resides with the enterprises that want to collaborate. This system will allow multiple enterprises, which are generally not allowed to share data, to do so solely for the purpose of detecting particular types of anomalies and for generating alerts. We also present results from experiments which demonstrate the value of such collaborations.

very large data bases | 2013

Making updates disk-I/O friendly using SSDs

Mohammad Sadoghi; Kenneth A. Ross; Mustafa Canim; Bishwaranjan Bhattacharjee

Multiversion databases store both current and historical data. Rows are typically annotated with timestamps representing the period when the row is/was valid. We develop novel techniques for reducing index maintenance in multiversion databases, so that indexes can be used effectively for analytical queries over current data without being a heavy burden on transaction throughput. To achieve this end, we re-design persistent index data structures in the storage hierarchy to employ an extra level of indirection. The indirection level is stored on solid state disks that can support very fast random I/Os, so that traversing the extra level of indirection incurs a relatively small overhead. The extra level of indirection dramatically reduces the number of magnetic disk I/Os that are needed for index updates, and localizes maintenance to indexes on updated attributes. Further, we batch insertions within the indirection layer in order to reduce physical disk I/Os for indexing new records. By reducing the index maintenance overhead on transactions, we enable operational data stores to create more indexes to support queries. We have developed a prototype of our indirection proposal by extending the widely used Generalized Search Tree (GiST) open-source project, which is also employed in PostgreSQL. Our working implementation demonstrates that we can significantly reduce index maintenance and/or query processing cost, by a factor of 3. For insertions of new records, our novel batching technique can save up to 90% of the insertion time.
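The extra level of indirection can be sketched in Python under a few assumptions: indexes store a stable logical id (LID) instead of a physical record id (RID), a separate LID-to-RID table plays the role of the SSD-resident indirection layer, and `index_writes` counts index maintenance operations. All names are hypothetical, and the batching of insertions is not modelled.

```python
class IndirectTable:
    """Toy multiversion table whose indexes point to LIDs, not RIDs."""

    def __init__(self, indexed_columns):
        self.versions = []    # RID -> row; append-only version store
        self.lid_to_rid = {}  # indirection table (SSD-resident in the paper)
        self.indexes = {c: {} for c in indexed_columns}  # value -> set of LIDs
        self.index_writes = 0  # counts index maintenance operations

    def insert(self, lid, row):
        self.versions.append(dict(row))
        self.lid_to_rid[lid] = len(self.versions) - 1
        for col, index in self.indexes.items():
            index.setdefault(row[col], set()).add(lid)
            self.index_writes += 1

    def update(self, lid, changes):
        old = self.versions[self.lid_to_rid[lid]]
        new = {**old, **changes}
        # A new version gets a new RID, but only the indirection entry moves.
        self.versions.append(new)
        self.lid_to_rid[lid] = len(self.versions) - 1
        # Only indexes on attributes whose value changed are touched.
        for col in changes:
            if col in self.indexes and old[col] != new[col]:
                self.indexes[col][old[col]].discard(lid)
                self.indexes[col].setdefault(new[col], set()).add(lid)
                self.index_writes += 1

    def lookup(self, col, value):
        # Index -> LID -> current RID -> current row version.
        lids = sorted(self.indexes[col].get(value, ()))
        return [self.versions[self.lid_to_rid[lid]] for lid in lids]
```

In this sketch, updating a non-indexed attribute writes a new version and changes one indirection entry without touching any index, which mirrors the claim that maintenance is localized to indexes on updated attributes.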

data management on new hardware | 2011

Enhancing recovery using an SSD buffer pool extension

Bishwaranjan Bhattacharjee; Kenneth A. Ross; Christian A. Lang; George A. Mihaila; Mohammad Banikazemi

Recent advances in solid state technology have led to the introduction of solid state drives (SSDs). Today's SSDs store data persistently using NAND flash memory and support good random I/O performance. Current work in exploiting flash in database systems has primarily focused on using its random I/O capability for second-level bufferpools below main memory. There has not been much emphasis on exploiting its persistence. In this paper, we describe a mechanism that extends our previous work on an SSD bufferpool in a DB2 LUW prototype to exploit the SSD's persistence for recovery and normal restart. We demonstrate significantly shorter recovery times, and improved performance immediately after recovery completes. We quantify the overhead of supporting recovery and show that the overhead is minimal.

very large data bases | 2014

Reducing database locking contention through multi-version concurrency

Mohammad Sadoghi; Mustafa Canim; Bishwaranjan Bhattacharjee; Fabian Nagel; Kenneth A. Ross

In multi-version databases, updates and deletions of records by transactions require appending a new record to tables rather than performing in-place updates. This mechanism incurs non-negligible performance overhead in the presence of multiple indexes on a table, where changes need to be propagated to all indexes. Additionally, an uncommitted record update will block other active transactions from using the index to fetch the most recently committed values for the updated record. In general, in order to support snapshot isolation and/or multi-version concurrency, either each active transaction is forced to search a database temporary area (e.g., roll-back segments) to fetch old values of desired records, or each transaction is forced to scan the entire table to find the older versions of the record in a multi-version database (in the absence of specialized temporal indexes). In this work, we describe a novel kV-Indirection structure to enable efficient (parallelizable) optimistic and pessimistic multi-version concurrency control by utilizing the old versions of records (at most two versions of each record) to provide direct access to the recent changes of records without the need for temporal indexes. As a result, our technique achieves a higher degree of concurrency by reducing the clashes between readers and writers of data and avoiding extended lock delays. We have a working prototype of our concurrency model and kV-Indirection structure in a commercial database, and we conducted an extensive evaluation to demonstrate the benefits of our multi-version concurrency control, obtaining orders-of-magnitude speedup over single-version concurrency control.
