
Publication


Featured research published by Raghav Kaushik.


International Conference on Management of Data | 2002

Covering indexes for branching path queries

Raghav Kaushik; Philip Bohannon; Jeffrey F. Naughton; Henry F. Korth

In this paper, we ask if the traditional relational query acceleration techniques of summary tables and covering indexes have analogs for branching path expression queries over tree- or graph-structured XML data. Our answer is yes --- the forward-and-backward index already proposed in the literature can be viewed as a structure analogous to a summary table or covering index. We also show that it is the smallest such index that covers all branching path expression queries. While this index is very general, our experiments show that it can be so large in practice as to offer little performance improvement over evaluating queries directly on the data. Likening the forward-and-backward index to a covering index on all the attributes of several tables, we devise an index definition scheme to restrict the class of branching path expressions being indexed. The resulting index structures are dramatically smaller and perform better than the full forward-and-backward index for these classes of branching path expressions. This is roughly analogous to the situation in multidimensional or OLAP workloads, in which more highly aggregated summary tables can service a smaller subset of queries but can do so at increased performance. We evaluate the performance of our indexes on both relational decompositions of XML and a native storage technique. As expected, the performance benefit of an index is maximized when the query matches the index definition.
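
As a rough illustration of the covering-index analogy (not the forward-and-backward index itself), the sketch below materializes the answer set of a single branching path pattern so the query can be answered without touching the underlying data; the toy document, the pattern string, and the `covering_index` dictionary are all assumptions made for illustration.

```python
# Toy illustration of answering a branching path query from a precomputed
# structure instead of the data. The document model, the pattern string, and
# the index layout are illustrative assumptions, not the paper's structures.

# A tiny XML tree: node id -> (tag, parent id)
nodes = {
    0: ("bib", None),
    1: ("book", 0), 2: ("title", 1), 3: ("author", 1),
    4: ("book", 0), 5: ("title", 4),          # a book with no author child
}

def children(nid):
    return [c for c, (_, p) in nodes.items() if p == nid]

def eval_direct():
    """Evaluate //book[author]/title by walking the data."""
    result = []
    for nid, (tag, _) in nodes.items():
        if tag == "book":
            kids = children(nid)
            if any(nodes[k][0] == "author" for k in kids):     # branching predicate
                result += [k for k in kids if nodes[k][0] == "title"]
    return sorted(result)

# "Covering" structure: the pattern's answer set, materialized ahead of time.
covering_index = {"//book[author]/title": eval_direct()}

def eval_with_index(pattern):
    return covering_index[pattern]        # never touches `nodes`

assert eval_with_index("//book[author]/title") == [2]
```

The paper's index definition scheme generalizes this idea: rather than a single pattern, a restricted class of branching path expressions is covered, trading generality for a much smaller structure.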


International Conference on Data Engineering | 2002

Exploiting local similarity for indexing paths in graph-structured data

Raghav Kaushik; Pradeep Shenoy; Philip Bohannon; Ehud Gudes

XML and other semi-structured data may have partially specified or missing schema information, motivating the use of a structural summary which can be automatically computed from the data. These summaries also serve as indices for evaluating the complex path expressions common to XML and semi-structured query languages. However, to answer all path queries accurately, summaries must encode information about long, seldom-queried paths, leading to increased size and complexity with little added value. We introduce the A(k)-indices, a family of approximate structural summaries. They are based on the concept of k-bisimilarity, in which nodes are grouped based on local structure, i.e., the incoming paths of length up to k. The parameter k thus smoothly varies the level of detail (and accuracy) of the A(k)-index. For small values of k, the size of the index is substantially reduced. While smaller, the A(k)-index is approximate, and we describe techniques for efficiently extracting exact answers to regular path queries. Our experiments show that, for moderate values of k, path evaluation using the A(k)-index ranges from being very efficient for simple queries to competitive for most complex queries, while using significantly less space than comparable structures.
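
To make the k-bisimilarity idea concrete, here is a minimal sketch, over an assumed toy graph, of the usual iterative refinement: start from the label partition (A(0)) and refine k times using each node's set of parent blocks, so two nodes stay together exactly when their incoming label paths of length up to k look the same.

```python
# Minimal sketch of k-bisimilarity partitioning on a toy labeled graph; the
# graph, labels, and block numbering are illustrative assumptions. The
# A(k)-index itself additionally stores the extent of each block.

labels = {1: "a", 2: "x", 3: "b", 4: "b", 5: "c", 6: "c"}
edges = [(1, 3), (2, 4), (3, 5), (4, 6)]        # parent -> child

def parents(n):
    return [p for p, c in edges if c == n]

def k_bisimulation(k):
    """Return node -> block id after refining the label partition k times."""
    block = {n: labels[n] for n in labels}      # A(0): group purely by label
    for _ in range(k):
        # Refine: same previous block AND same set of parents' previous blocks,
        # i.e. the same incoming label paths of length up to the current level.
        block = {
            n: (block[n], frozenset(block[p] for p in parents(n)))
            for n in labels
        }
    # Renumber blocks with small integers for readability.
    ids = {}
    return {n: ids.setdefault(b, len(ids)) for n, b in sorted(block.items())}

# Nodes 5 and 6 share a block for k <= 1 (label "c", parent labeled "b"),
# but split at k = 2 because their incoming paths a/b/c vs. x/b/c differ.
print(k_bisimulation(1))   # {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 4}
print(k_bisimulation(2))   # {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}
```

A small k therefore keeps the summary compact at the cost of answering long path queries only approximately, which is where the paper's techniques for extracting exact answers come in.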


International Conference on Data Engineering | 2006

Robust Cardinality and Cost Estimation for Skyline Operator

Surajit Chaudhuri; Nilesh N. Dalvi; Raghav Kaushik

Incorporating the skyline operator inside the relational engine requires solving the cardinality estimation and the cost estimation problem, hitherto unaddressed. We propose robust techniques to estimate the cardinality and the computational cost of Skyline, and through an empirical comparison, show that our technique is substantially more effective than traditional approaches. Finally, we show through an implementation in Microsoft SQL Server that skyline queries can substantially benefit from our techniques.
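
For readers unfamiliar with the operator itself, here is a minimal sketch of what Skyline computes (dominance-based filtering); it does not show the paper's estimation techniques, and the hotel data is an assumed example.

```python
# Minimal sketch of the skyline operator (the paper's estimation techniques
# are not shown). A point is in the skyline if no other point dominates it,
# i.e. is at least as good in every dimension and strictly better in at least
# one. Here "better" means smaller.

def dominates(p, q):
    """True if p dominates q (p <= q everywhere, p < q somewhere)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hotels as (price, distance_to_beach): cheaper and closer is better.
hotels = [(50, 8), (60, 5), (80, 2), (90, 3), (55, 9)]
print(skyline(hotels))   # [(50, 8), (60, 5), (80, 2)]
```

The estimation problem the paper addresses is predicting, before execution, how many input tuples survive this filter and at what computational cost.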


International XML Database Symposium | 2003

XML-to-SQL Query Translation Literature: The State of the Art and Open Problems

Rajasekar Krishnamurthy; Raghav Kaushik; Jeffrey F. Naughton

Recently, the database research literature has seen an explosion of publications with the goal of using an RDBMS to store and/or query XML data. The problems addressed and solved in this area are diverse. This diversity renders it difficult to know how the various results presented fit together, and even makes it hard to know what open problems remain. As a first step to rectifying this situation, we present a classification of the problem space and discuss how almost 40 papers fit into this classification. As a result of this study, we find that some basic questions are still open. In particular, for the XML publishing of relational data and for “schema-based” shredding of XML documents into relations, there is no published algorithm for translating even simple path expression queries (with the “//” axis) into SQL when the XML schema is recursive.
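
To see why the recursive-schema case singled out above is hard, consider a sketch (with an assumed shredding, not one from the surveyed papers) of evaluating the descendant ("//") axis over a part hierarchy of unbounded depth: the answer is a transitive closure, which a single non-recursive SELECT over the shredded relation cannot express.

```python
# Illustrative sketch of why the "//" (descendant) axis is awkward over a
# recursive schema: a <part> element may nest <part> elements to arbitrary
# depth, so the shredded relation must be traversed transitively. The table
# layout below is an assumption, not one taken from the surveyed papers.

# Shredded relation part(id, parent_id): a recursive parts hierarchy.
part = [(1, None), (2, 1), (3, 2), (4, 3), (5, 1)]

def descendants(root_id):
    """All parts reachable under root_id, i.e. the answer to root//part."""
    frontier, found = {root_id}, set()
    while frontier:                      # fixpoint over the parent/child edges
        step = {i for (i, p) in part if p in frontier}
        frontier = step - found
        found |= step
    return sorted(found)

print(descendants(1))   # [2, 3, 4, 5]

# A single non-recursive SELECT over `part` cannot express this unbounded
# traversal; in SQL it needs a recursive construct (e.g. a recursive CTE),
# which is what makes translation over recursive schemas hard.
```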


International Conference on Management of Data | 2009

Extending autocompletion to tolerate errors

Surajit Chaudhuri; Raghav Kaushik

Autocompletion is a useful feature when a user is doing a lookup from a table of records. With every letter being typed, autocompletion displays the strings in the table that contain the search string typed so far as a prefix. Just as there is a need for making the lookup operation tolerant to typing errors, we argue that autocompletion also needs to be error-tolerant. In this paper, we take a first step towards addressing this problem. We capture input typing errors via edit distance. We show that a naive approach of invoking an offline edit distance matching algorithm at each step performs poorly and present more efficient algorithms. Our empirical evaluation demonstrates the effectiveness of our algorithms.
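
As a concrete (and deliberately naive) rendering of the matching primitive, the sketch below checks whether some prefix of a table string is within edit distance tau of the typed query; re-running this check over every string on every keystroke is the offline baseline the paper improves on. The sample names and threshold are assumptions.

```python
# Naive error-tolerant autocompletion check: string s completes query q
# within threshold tau if some prefix of s is within edit distance tau of q.
# The table contents and tau are illustrative assumptions.

def prefix_edit_distance(q, s):
    """Minimum edit distance between q and any prefix of s."""
    dp = list(range(len(s) + 1))                     # ED("", s[:j]) = j
    for i, qc in enumerate(q, 1):
        new = [i]                                    # ED(q[:i], "") = i
        for j, sc in enumerate(s, 1):
            new.append(min(dp[j] + 1,                # delete qc
                           new[j - 1] + 1,           # insert sc
                           dp[j - 1] + (qc != sc)))  # substitute / match
        dp = new
    return min(dp)                                   # best over all prefixes of s

def autocomplete(query, table, tau=1):
    return [s for s in table if prefix_edit_distance(query, s) <= tau]

names = ["katherine", "catherine", "kathleen", "jonathan"]
print(autocomplete("kat", names, tau=1))   # ['katherine', 'catherine', 'kathleen']
```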


International Conference on Management of Data | 2010

On active learning of record matching packages

Arvind Arasu; Michaela Götz; Raghav Kaushik

We consider the problem of learning a record matching package (classifier) in an active learning setting. In active learning, the learning algorithm picks the set of examples to be labeled, unlike the more traditional passive learning setting, where the user selects the labeled examples. Active learning is important for record matching since manually identifying a suitable set of labeled examples is difficult. Previous algorithms that use active learning for record matching have serious limitations: the packages they learn lack quality guarantees and the algorithms do not scale to large input sizes. We present new algorithms for this problem that overcome these limitations. Our algorithms are fundamentally different from traditional active learning approaches, and are designed from the ground up to exploit problem characteristics specific to record matching. We include a detailed experimental evaluation on real-world data demonstrating the effectiveness of our algorithms.
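
For orientation, the sketch below shows the traditional active-learning loop that the paper contrasts its algorithms with (it is explicitly not the paper's method); the similarity scores, the labeling oracle, and the single-threshold "package" are illustrative assumptions.

```python
# Traditional active-learning loop for record matching, shown only for
# contrast with the paper's approach. Similarity scores, the oracle, and the
# one-threshold matching "package" are illustrative assumptions.
import random

random.seed(0)
# Candidate record pairs, each reduced to a similarity score in [0, 1];
# hidden ground truth: pairs with similarity >= 0.6 are true matches.
pairs = [round(random.random(), 2) for _ in range(200)]
def oracle(sim):                      # stands in for a human labeler
    return sim >= 0.6

labeled = {}
lo, hi = 0.0, 1.0                     # highest known non-match, lowest known match
for _ in range(10):
    boundary = (lo + hi) / 2
    unlabeled = [s for s in pairs if s not in labeled]
    # Query the label of the most uncertain pair: the one nearest the boundary.
    pick = min(unlabeled, key=lambda s: abs(s - boundary))
    labeled[pick] = oracle(pick)
    if labeled[pick]:
        hi = min(hi, pick)
    else:
        lo = max(lo, pick)

print(f"learned matching threshold ~ {(lo + hi) / 2:.2f} after {len(labeled)} labels")
```

The point of the loop is that a handful of carefully chosen labels pins down the decision boundary far faster than labeling randomly chosen pairs would.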


International Conference on Management of Data | 2007

Leveraging aggregate constraints for deduplication

Surajit Chaudhuri; Anish Das Sarma; Venkatesh Ganti; Raghav Kaushik

We show that aggregate constraints (as opposed to pairwise constraints), which often arise when integrating multiple sources of data, can be leveraged to enhance the quality of deduplication. However, despite its appeal, we show that the problem is challenging, both semantically and computationally. We define a restricted search space for deduplication that is intuitive in our context and we solve the problem optimally for the restricted space. Our experiments on real data show that incorporating aggregate constraints significantly enhances the accuracy of deduplication.
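
To illustrate how an aggregate constraint prunes deduplication decisions, here is a minimal sketch under assumed data: pairwise name similarity alone cannot decide which records co-refer, but requiring a merged group's amounts to sum to a total reported by a second source rules out some groupings. The records, the reported total, and the brute-force enumeration are illustrative; the paper instead works over a restricted search space.

```python
# Aggregate constraints pruning deduplication candidates (illustrative data;
# not the paper's algorithm, which searches a restricted space efficiently).
from itertools import combinations

# Payment records from one source: (record id, payer name, amount).
records = [(1, "J. Smith", 40), (2, "John Smith", 60), (3, "Jon Smyth", 60)]

# Aggregate constraint from another source: this payer's payments total 100.
REPORTED_TOTAL = 100

# Enumerate candidate duplicate groups containing record 1 and keep only the
# groups whose amounts satisfy the aggregate constraint.
ids = [rid for rid, _, _ in records]
candidates = [
    group
    for k in range(1, len(ids) + 1)
    for group in combinations(ids, k)
    if 1 in group
    and sum(a for rid, _, a in records if rid in group) == REPORTED_TOTAL
]

print(candidates)   # [(1, 2), (1, 3)] -- keeping record 1 alone or merging all three is ruled out
```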


International Conference on Management of Data | 2011

Data generation using declarative constraints

Arvind Arasu; Raghav Kaushik; Jian Li

We study the problem of generating synthetic databases having declaratively specified characteristics. This problem is motivated by database system and application testing, data masking, and benchmarking. While the data generation problem has been studied before, prior approaches are either non-declarative or have fundamental limitations relating to the data characteristics that they can capture and efficiently support. We argue that a natural, expressive, and declarative mechanism for specifying data characteristics is through cardinality constraints; a cardinality constraint specifies that the output of a query over the generated database have a certain cardinality. While the data generation problem is intractable in general, we present efficient algorithms that can handle a large and useful class of constraints. We include a thorough empirical evaluation illustrating that our algorithms handle complex constraints, scale well as the number of constraints increases, and outperform applicable prior techniques.
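
As a concrete reading of "cardinality constraint", the sketch below generates a toy table so that three such constraints hold and then verifies them by evaluating each constraint's query over the generated data; the schema, the constraints, and the build-by-construction strategy are assumptions, and the paper's algorithms handle far more general, interacting constraints.

```python
# Generating data under cardinality constraints (toy schema and constraints;
# the construction below is illustrative, not the paper's algorithms).
import random

random.seed(1)

# Constraints on a table orders(region, amount):
#   |orders|                         = 1000
#   |sigma_{region = 'EU'}(orders)|  = 300
#   |sigma_{amount > 500}(orders)|   = 200
N, N_EU, N_BIG = 1000, 300, 200

rows = []
for i in range(N):
    region = "EU" if i < N_EU else random.choice(["US", "ASIA"])
    amount = random.randint(501, 1000) if i < N_BIG else random.randint(1, 500)
    rows.append((region, amount))
random.shuffle(rows)

# Verify each constraint by running its query over the generated table.
assert len(rows) == N
assert sum(1 for region, _ in rows if region == "EU") == N_EU
assert sum(1 for _, amount in rows if amount > 500) == N_BIG
print("all three cardinality constraints hold")
```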


Very Large Data Bases | 2002

Updates for structure indexes

Raghav Kaushik; Philip Bohannon; Jeffrey F. Naughton; Pradeep Shenoy

The problem of indexing path queries in semistructured/XML databases has received considerable attention recently, and several proposals have advocated the use of structure indexes as supporting data structures for this problem. In this paper, we investigate efficient update algorithms for structure indexes. We study two kinds of updates -- the addition of a subgraph, intended to represent the addition of a new file to the database, and the addition of an edge, to represent a small incremental change. We focus on three instances of structure indexes that are based on the notion of graph bisimilarity. We propose algorithms to update the bisimulation partition for both kinds of updates and show how they extend to these indexes. Our experiments on two real-world data sets show that our update algorithms are an order of magnitude faster than dropping and rebuilding the index. To the best of our knowledge, no previous work has addressed updates for structure indexes based on graph bisimilarity.
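
To see why a single edge addition already threatens the index, here is a minimal sketch (a toy graph and a one-step, label-plus-parent-labels grouping; not the paper's update algorithms): after the new edge, two nodes that shared a partition block must be split, so the index cannot simply be left untouched.

```python
# Minimal sketch of why an edge addition can force index maintenance: the toy
# graph, labels, and the one-step signature below are illustrative
# assumptions. After adding one edge, two nodes that previously shared a
# partition block (and hence an index entry) must be split apart.

labels = {1: "a", 2: "a", 3: "b", 4: "b"}
edges = {(1, 3), (2, 4)}

def blocks(edges):
    """1-bisimulation-style grouping: label plus the set of parent labels."""
    sig = {}
    for n, lab in labels.items():
        parent_labels = frozenset(labels[p] for (p, c) in edges if c == n)
        sig.setdefault((lab, parent_labels), []).append(n)
    return sorted(sig.values())

print(blocks(edges))                 # [[1, 2], [3, 4]] -- nodes 3 and 4 share a block
print(blocks(edges | {(3, 4)}))      # adding the edge 3 -> 4 splits {3, 4}
```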


Very Large Data Bases | 2004

Efficient XML-to-SQL query translation: where to add the intelligence?

Rajasekar Krishnamurthy; Raghav Kaushik; Jeffrey F. Naughton

We consider the efficiency of queries generated by XML-to-SQL translation. We first show that published XML-to-SQL query translation algorithms are suboptimal in that they often translate simple path expressions into complex SQL queries even when much simpler equivalent SQL queries exist. There are two logical ways to deal with this problem. One could generate suboptimal SQL queries using a fairly naive translation algorithm, and then attempt to optimize the resulting SQL; or one could use a more intelligent translation algorithm in the hope of generating efficient SQL directly. We show that optimizing the SQL after it is generated is problematic, becoming intractable even in simple scenarios; by contrast, designing a translation algorithm that exploits information readily available at translation time is a promising alternative. To support this claim, we present a translation algorithm that exploits translation-time information to generate efficient SQL for path expression queries over tree schemas.
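
The kind of gap the paper measures can be illustrated with an assumed shredding of a small tree schema: a naive step-by-step translation of a path expression emits one join per step, while a translation that exploits schema knowledge available at translation time (here, a redundantly stored dept_id) produces a join-free equivalent. Both SQL strings and the schema are illustrative assumptions, not examples from the paper.

```python
# Illustrative sketch of the gap the paper targets: for the same path
# expression, a naive schema-walking translation emits a chain of joins,
# while translation-time knowledge of the shredding (here, that every name
# row carries its dept id redundantly) yields much simpler SQL.

path_query = "/dept[@id='d1']/emp/name"

# Naive translation: walk the path step by step, joining one relation per step.
naive_sql = """
SELECT n.value
FROM dept d JOIN emp e  ON e.dept_id = d.id
            JOIN name n ON n.emp_id  = e.id
WHERE d.id = 'd1'
"""

# Smarter translation: the assumed shredding already stores dept_id on the
# name relation, so the same answer needs no joins at all.
direct_sql = """
SELECT n.value
FROM name n
WHERE n.dept_id = 'd1'
"""

print(naive_sql, direct_sql)
```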
