Publication


Featured research published by Deepavali Bhagwat.


Very Large Data Bases | 2004

An annotation management system for relational databases

Deepavali Bhagwat; Laura Chiticariu; Wang Chiew Tan; Gaurav Vijayvargiya

We present an annotation management system for relational databases. In this system, every piece of data in a relation is assumed to have zero or more annotations associated with it, and annotations are propagated along, from the source to the output, as data is transformed through a query. Such an annotation management system could be used to understand the provenance (aka lineage) of data, to track who has seen or edited a piece of data, or to assess the quality of data, all of which are useful for applications that integrate scientific and biological data. We present pSQL, an extension of a fragment of SQL that has three different types of annotation propagation schemes, each useful for different purposes. The default scheme propagates annotations according to where data is copied from. The default-all scheme propagates annotations according to where data is copied from among all equivalent formulations of a given query. The custom scheme allows a user to specify how annotations should propagate. We present a storage scheme for the annotations and describe algorithms for translating a pSQL query under each propagation scheme into one or more SQL queries that correctly retrieve the relevant annotations according to the specified propagation scheme. For the default-all scheme, we also show how to generate finitely many queries that simulate the annotation propagation behavior of the set of all equivalent queries, which is possibly infinite. The algorithms are implemented, and the feasibility of the system is demonstrated by a set of experiments that we have conducted.
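The default scheme can be illustrated with a small sketch: relations are modeled as rows of cells that carry annotation sets, and a projection copies each output cell's annotations from the source cell it was copied from. This is a minimal illustration of the propagation idea, not the paper's pSQL implementation; the relation, column names, and annotation strings below are invented.

```python
# Minimal sketch of the "default" propagation scheme: every cell carries a set
# of annotations, and a projection copies the annotations of the source cell
# along with its value. Relation contents and annotation strings are invented.

def project(relation, columns):
    """Project the given columns; annotations travel with the copied cells."""
    return [{col: row[col] for col in columns} for row in relation]

# Each cell is a (value, annotation-set) pair.
genes = [
    {"name": ("BRCA1", {"entered-by:curator1"}), "organism": ("human", set())},
    {"name": ("TP53",  {"needs-review"}),        "organism": ("human", set())},
]

for row in project(genes, ["name"]):
    value, annotations = row["name"]
    print(value, annotations)
# BRCA1 {'entered-by:curator1'}
# TP53 {'needs-review'}
```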


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2009

Extreme Binning: Scalable, parallel deduplication for chunk-based file backup

Deepavali Bhagwat; Kave Eshghi; Darrell D. E. Long; Mark David Lillibridge

Data deduplication is an essential and critical component of backup systems. Essential, because it reduces storage space requirements, and critical, because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, which existing deduplication techniques require to provide reasonable throughput. We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads that are made up of individual files with no locality among consecutive files in a given window of time. Due to lack of locality, existing techniques perform poorly on these workloads. Extreme Binning exploits file similarity instead of locality, and makes only one disk access for chunk lookup per file, which gives reasonable throughput. Multi-node backup systems built with Extreme Binning scale gracefully with the amount of input data; more backup nodes can be added to boost throughput. Each file is allocated using a stateless routing algorithm to only one node, allowing for maximum parallelization, and each backup node is autonomous with no dependency across nodes, making data management tasks robust with low overhead.
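A minimal sketch of the binning idea follows, assuming that a file's representative chunk ID is the minimum of its chunk fingerprints and that this ID alone selects the bin and backup node; the fixed-size chunking, hash choices, and node count below are simplified placeholders rather than the system's actual parameters.

```python
# Sketch of similarity-based binning in the spirit of Extreme Binning: a file's
# chunks are fingerprinted, the minimum fingerprint serves as the file's
# representative ID, and that ID alone decides which bin (and which backup
# node) holds the file's chunk index.
import hashlib

def chunk_fingerprints(data: bytes, chunk_size: int = 4096):
    """Fixed-size chunking as a stand-in for content-defined chunking."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [hashlib.sha1(c).hexdigest() for c in chunks]

def representative_id(fingerprints):
    """The minimum fingerprint is the file's representative chunk ID."""
    return min(fingerprints)

def route_to_node(rep_id: str, num_nodes: int) -> int:
    """Stateless routing: the representative ID alone picks the backup node."""
    return int(rep_id, 16) % num_nodes

bins = {}  # representative ID -> set of chunk fingerprints stored in that bin

def backup_file(data: bytes):
    fps = chunk_fingerprints(data)
    rep = representative_id(fps)
    bin_ = bins.setdefault(rep, set())            # one bin lookup per file
    new_chunks = [fp for fp in fps if fp not in bin_]
    bin_.update(new_chunks)                       # store only unseen chunks
    return rep, new_chunks

rep, new = backup_file(b"example payload " * 1000)
print(route_to_node(rep, num_nodes=4), len(new))
```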


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2006

Providing High Reliability in a Minimum Redundancy Archival Storage System

Deepavali Bhagwat; Kristal T. Pollack; Darrell D. E. Long; Thomas J. E. Schwarz; Ethan L. Miller; Jehan-Francois Paris

Inter-file compression techniques store files as sets of references to data objects or chunks that can be shared among many files. While these techniques can achieve much better compression ratios than conventional intra-file compression methods such as Lempel-Ziv compression, they also reduce the reliability of the storage system because the loss of a few critical chunks can lead to the loss of many files. We show how to eliminate this problem by choosing for each chunk a replication level that is a function of the amount of data that would be lost if that chunk were lost. Experiments using actual archival data show that our technique can achieve significantly higher robustness than a conventional approach combining data mirroring and intra-file compression while requiring about half the storage space.
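One way to picture the replication policy is as a function that maps the amount of data depending on a chunk to a replica count; the logarithmic form and the constants below are illustrative assumptions, not the exact heuristic evaluated in the paper.

```python
# Hedged sketch: give each shared chunk a replication level that grows with the
# amount of file data that would be lost if the chunk were lost. The log-based
# policy and the constants are illustrative, not the paper's heuristic.
import math

def replication_level(bytes_depending_on_chunk: int,
                      min_replicas: int = 2, max_replicas: int = 6) -> int:
    """More heavily shared chunks get more replicas, within fixed bounds."""
    if bytes_depending_on_chunk <= 0:
        return min_replicas
    level = min_replicas + int(math.log10(bytes_depending_on_chunk)) // 2
    return min(max_replicas, level)

# A chunk referenced by a single 4 KB file vs. one shared by gigabytes of files.
print(replication_level(4 * 1024))        # lightly shared chunk -> 3 replicas
print(replication_level(8 * 1024**3))     # heavily shared chunk -> 6 replicas
```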


ACM Conference on Hypertext | 2005

Searching a file system using inferred semantic links

Deepavali Bhagwat; Neoklis Polyzotis

We describe Eureka, a file system search engine that takes into account the inherent relationships among files in order to improve the rankings of search results. The key idea is to automatically infer semantic links within the file system, and use the structure of the links to determine the importance of different files and essentially bias the result rankings. We discuss the inference of semantic links and describe the design of the Eureka search engine.
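The ranking idea can be sketched as a PageRank-style importance computation over an inferred file link graph, with keyword matches reordered by that importance; the link-inference rule, file names, and ranking blend below are illustrative assumptions rather than Eureka's actual algorithm.

```python
# Illustrative sketch (not Eureka's actual algorithm): compute a PageRank-style
# score over an inferred link graph between files and use it to bias rankings.
def pagerank(links, damping=0.85, iterations=30):
    """links: {file: [files it points to]} -> {file: importance score}"""
    nodes = set(links) | {t for ts in links.values() for t in ts}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling file: spread its rank evenly
                for n in nodes:
                    new[n] += damping * rank[src] / len(nodes)
        rank = new
    return rank

# Hypothetical inferred links, e.g. "report.tex includes figure1.pdf".
inferred_links = {
    "report.tex": ["figure1.pdf", "results.csv"],
    "slides.ppt": ["figure1.pdf"],
    "results.csv": [],
    "figure1.pdf": [],
}
importance = pagerank(inferred_links)

def rank_results(keyword_hits):
    """Order keyword matches by inferred link-based importance."""
    return sorted(keyword_hits, key=lambda f: importance.get(f, 0.0), reverse=True)

# figure1.pdf outranks results.csv because it has more inferred in-links.
print(rank_results(["results.csv", "figure1.pdf"]))
```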


Knowledge Discovery and Data Mining | 2007

Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus

Deepavali Bhagwat; Kave Eshghi; Pankaj Mehra

We present a document routing and index partitioning scheme for scalable similarity-based search of documents in a large corpus. We consider the case where similarity-based search is performed by finding documents that have features in common with the query document. While it is possible to store all the features of all the documents in one index, this suffers from obvious scalability problems. Our approach is to partition the feature index into multiple smaller partitions that can be hosted on separate servers, enabling scalable and parallel search execution. When a document is ingested into the repository, a small number of partitions are chosen to store the features of the document. Likewise, to perform a similarity-based search, only a small number of partitions are queried. Our approach is stateless and incremental: the decision as to which partitions a document's features should be routed to (for storage at ingestion time, and for similarity-based search at query time) is based solely on the features of the document. Our approach scales very well. We show that executing similarity-based searches over such a partitioned search space has minimal impact on the precision and recall of search results, even though every search consults less than 3% of the total number of partitions.
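A rough sketch of such content-based routing follows, under the assumption that a document's features are hashed and the smallest hashes deterministically pick a fixed, small number of partitions, applied identically at ingestion and at query time; the partition counts, hash function, and scoring are illustrative, not the paper's exact scheme.

```python
# Hedged sketch of content-based routing: features are hashed, a small number
# of partitions is chosen from those hashes, and the same rule is applied both
# when indexing and when querying. Parameters are illustrative assumptions.
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 64
PARTITIONS_PER_DOC = 2   # each document's features go to only a few partitions

def feature_hash(feature: str) -> int:
    return int(hashlib.md5(feature.encode()).hexdigest(), 16)

def choose_partitions(features, k=PARTITIONS_PER_DOC):
    """Pick k partitions deterministically from the document's own features."""
    hashes = sorted(feature_hash(f) for f in features)
    return {h % NUM_PARTITIONS for h in hashes[:k]}

index = defaultdict(lambda: defaultdict(set))   # partition -> feature -> doc ids

def ingest(doc_id, features):
    for p in choose_partitions(features):
        for f in features:
            index[p][f].add(doc_id)

def similar(features):
    """Query only the partitions the same routing rule selects."""
    candidates = defaultdict(int)
    for p in choose_partitions(features):
        for f in features:
            for doc_id in index[p].get(f, ()):
                candidates[doc_id] += 1
    return sorted(candidates, key=candidates.get, reverse=True)

ingest("doc1", ["storage", "deduplication", "chunking"])
ingest("doc2", ["databases", "annotations", "provenance"])
print(similar(["deduplication", "storage", "backup"]))   # -> ['doc1']
```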


International Performance Computing and Communications Conference | 2012

Improved deduplication through parallel Binning

Zhike Zhang; Deepavali Bhagwat; Witold Litwin; Darrell D. E. Long; Thomas J. E. Schwarz

Many modern storage systems use deduplication to compress data by avoiding storing the same data twice. Deduplication needs to consult data stored in the past, but accessing information about all previously stored data can cause a severe bottleneck. Similarity-based deduplication accesses only information on past data that is likely to be similar and thus more likely to yield good deduplication. We present an adaptive deduplication strategy that extends Extreme Binning and investigate, theoretically and experimentally, the effects of the additional bin accesses.


ACM International Conference on Systems and Storage | 2015

Efficient replication for distributed fault tolerant memory

Deepavali Bhagwat; Chethan Kumar; Satyam Vaghani

In-memory computing aims to bring application data forward from storage into the server's main memory. By locating the working set closer to its application, in memory, great performance gains can be achieved. Most approaches in this arena have built solutions that either require traditional applications to be re-written or require the acquisition of new software. We built FVP, a solution that uses RAM to seamlessly accelerate any application running in VMs across servers in a virtualized datacenter. Applications therefore need not be re-written and still reap the acceleration benefits of in-memory computing. VM I/O is absorbed by the server's memory, so the VM experiences memory latencies. However, due to the volatility of RAM, fault tolerance is essential; without it, our solution would be inconsequential. We discuss how we solved the practical challenges we faced when implementing a Distributed Fault Tolerant Memory (DFTM) acceleration layer while preserving VM performance. We also discuss other new challenges that in-memory computing brings to our field.


File and Storage Technologies | 2009

Sparse indexing: large scale, inline deduplication using sampling and locality

Mark David Lillibridge; Kave Eshghi; Deepavali Bhagwat; Vinay Deolalikar; Greg Trezise; Peter Thomas Camble


File and Storage Technologies | 2013

Improving restore speed for backup systems that use inline chunk-based deduplication

Mark David Lillibridge; Kave Eshghi; Deepavali Bhagwat


Archive | 2010

Storing chunks in containers

Mark David Lillibridge; Deepavali Bhagwat; Peter Thomas Camble; Gregory Trezise
