Shantanu Joshi | Researchain

Publication

Featured research published by Shantanu Joshi.

ACM Transactions on Database Systems | 2006

The Sort-Merge-Shrink join

Christopher Jermaine; Alin Dobra; Subramanian Arumugam; Shantanu Joshi; Abhijit Pol

One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm called the Sort-Merge-Shrink (SMS) Join for computing the answer to such a query over large, disk-based input tables. The key innovation of the SMS join is that if the input data are clustered in a statistically random fashion on disk, then at all times, the join provides an online, statistical estimator for the eventual answer to the query as well as probabilistic confidence bounds. Thus, a user can monitor the progress of the join throughout its execution and stop the join when satisfied with the estimate's accuracy or run the algorithm to completion with a total time requirement that is not much longer than that of other common join algorithms. This contrasts with other online join algorithms, which either do not offer such statistical guarantees or can only offer guarantees so long as the input data can fit into main memory.
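
The kind of online estimator described above can be illustrated with a small sketch. This is not the SMS join itself (there are no sort, merge, or shrink phases and no confidence bounds); it assumes hypothetical in-memory tables of (key, value) pairs, simulates "clustered in a statistically random fashion on disk" by shuffling, and scales SUM(r.value * s.value) over matching pairs in the two prefixes read so far by (|R|/n_R)(|S|/n_S). Because each (r, s) pair appears in both random prefixes with probability (n_R/|R|)(n_S/|S|), the scaled sum is unbiased, and it equals the exact answer once both tables are fully read. The function names are illustrative only.

```python
import math
import random


def join_sum_estimate(r_prefix, s_prefix, r_total, s_total):
    """Scale up SUM(r.value * s.value) over the equijoin of two random prefixes."""
    # Total s.value per join key within the S prefix.
    s_by_key = {}
    for key, val in s_prefix:
        s_by_key[key] = s_by_key.get(key, 0) + val
    # Each r tuple contributes r.value times the matching s.values seen so far.
    sample_sum = sum(val * s_by_key.get(key, 0) for key, val in r_prefix)
    # Inverse of the probability that a given (r, s) pair is in both prefixes.
    scale = (r_total / len(r_prefix)) * (s_total / len(s_prefix))
    return sample_sum * scale


def online_join_sum(r, s, batch, seed=0):
    """Yield (n_r, n_s, estimate) as ever-larger random prefixes of R and S are read."""
    rng = random.Random(seed)
    r, s = r[:], s[:]
    rng.shuffle(r)  # stand-in for random clustering of R on disk
    rng.shuffle(s)  # stand-in for random clustering of S on disk
    steps = max(math.ceil(len(r) / batch), math.ceil(len(s) / batch))
    for i in range(1, steps + 1):
        n_r, n_s = min(i * batch, len(r)), min(i * batch, len(s))
        # Recomputed from scratch each step for clarity, not efficiency.
        yield n_r, n_s, join_sum_estimate(r[:n_r], s[:n_s], len(r), len(s))


if __name__ == "__main__":
    R = [(k % 5, k) for k in range(50)]
    S = [(k % 7, 2) for k in range(30)]
    for n_r, n_s, est in online_join_sum(R, S, batch=10):
        print(n_r, n_s, round(est, 1))
```

A user watching this loop could stop at any step; the real algorithm additionally attaches confidence bounds to each estimate, which this sketch omits.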

international conference on management of data | 2005

A disk-based join with probabilistic guarantees

Christopher Jermaine; Alin Dobra; Subramanian Arumugam; Shantanu Joshi; Abhijit Pol

One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm for computing the answer to such a query over large, disk-based input tables. The key innovation of our algorithm is that at all times, it provides an online, statistical estimator for the eventual answer to the query, as well as probabilistic confidence bounds. Thus, a user can monitor the progress of the join throughout its execution and stop the join when satisfied with the estimate's accuracy, or run the algorithm to completion with a total time requirement that is not much longer than that of other common join algorithms. This contrasts with other online join algorithms, which either do not offer such statistical guarantees or can only offer guarantees so long as the input data can fit into core memory.

very large data bases | 2009

Sampling-based estimators for subset-based queries

Shantanu Joshi; Christopher Jermaine

We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query, where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS or IN clause. We design an estimator for such queries and prove that it is unbiased. We then provide a second, biased estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting estimate. Both estimators are evaluated in an extensive set of experiments.
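
The difficulty these estimators address can be seen in a small sketch for a NOT EXISTS query. It does not reproduce either estimator from the paper. It assumes hypothetical tables (outer rows as (key, value) pairs, the inner table as a list of keys) and nonnegative values, and contrasts two simple approaches: sampling only the outer table while checking NOT EXISTS against the full inner table, which is unbiased but requires a full probe of the inner relation, and naively checking NOT EXISTS against a sample of the inner table, which overcounts because keys missing from the inner sample look unmatched.

```python
import random

# Query shape (hypothetical tables):
#   SELECT SUM(o.val) FROM outer o
#   WHERE NOT EXISTS (SELECT * FROM inner i WHERE i.key = o.key)


def exact_not_exists_sum(outer, inner_keys):
    """Exact answer: sum o.val over outer rows whose key never appears in inner."""
    keys = set(inner_keys)
    return sum(val for key, val in outer if key not in keys)


def outer_sample_estimate(outer, inner_keys, n, rng):
    """Unbiased: random sample of outer, NOT EXISTS checked against the full inner table."""
    keys = set(inner_keys)
    sample = rng.sample(outer, n)
    return len(outer) / n * sum(val for key, val in sample if key not in keys)


def naive_two_sample_estimate(outer, inner_keys, n_outer, n_inner, rng):
    """Biased upward (for nonnegative values): NOT EXISTS checked only against a sample of inner."""
    inner_sample = set(rng.sample(inner_keys, n_inner))
    sample = rng.sample(outer, n_outer)
    # Keys absent from the inner sample look unmatched, so too many rows qualify.
    return len(outer) / n_outer * sum(val for key, val in sample if key not in inner_sample)
```

The naive version shows why sampling the inner relation cannot simply be plugged in: a correlated subquery evaluated on a sample changes which outer rows qualify, not just how many are seen.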

international conference on data engineering | 2008

Robust Stratified Sampling Plans for Low Selectivity Queries

Shantanu Joshi; Chris Jermaine

We consider the problem of estimating the result of an aggregate query with a very low selectivity. Traditional sampling techniques can be ineffective for such a problem since a small random sample is likely to miss most or even all of the records satisfying the restrictive selection predicate. Stratified sampling is useful in this situation, but a key problem in applying stratified sampling effectively is identifying which strata are important and developing a sampling plan that favors those strata in a robust fashion. We develop a solution to this problem that combines any prior knowledge or expectation about the stratification with information obtained from pilot sampling in a principled Bayesian framework.
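
For orientation, a plain stratified estimator with a pilot-driven sampling plan can be sketched as follows. This uses textbook Neyman allocation (sample sizes proportional to N_h times the pilot standard deviation S_h), not the Bayesian framework of the paper; the data layout (each stratum a non-empty list of numeric values) and the function name are assumptions. It also exhibits the weakness the paper targets: a stratum whose pilot sample contains no qualifying rows gets weight zero and only the minimum of one sampled row.

```python
import random
import statistics


def stratified_sum(strata, pred, budget, pilot=10, seed=0):
    """Estimate SUM(v) over values v with pred(v) true, via stratified sampling.

    strata: list of non-empty lists of numeric values (hypothetical layout).
    budget: approximate total number of rows to sample after the pilot.
    """
    rng = random.Random(seed)

    def y(v):
        # Contribution of one row to the aggregate: the value if it qualifies, else 0.
        return v if pred(v) else 0.0

    # 1. Pilot sample: weight each stratum by N_h * S_h (Neyman allocation).
    weights = []
    for rows in strata:
        pilot_rows = rng.sample(rows, min(pilot, len(rows)))
        weights.append(len(rows) * statistics.pstdev([y(v) for v in pilot_rows]))
    if sum(weights) == 0:
        # The pilot saw no variation anywhere: fall back to proportional allocation.
        weights = [float(len(rows)) for rows in strata]
    total_w = sum(weights)

    # 2. Fresh sample per stratum; scale each stratum mean by its size N_h.
    estimate = 0.0
    for rows, w in zip(strata, weights):
        n_h = min(len(rows), max(1, round(budget * w / total_w)))
        sample = rng.sample(rows, n_h)
        estimate += len(rows) * statistics.fmean([y(v) for v in sample])
    return estimate
```

With a very selective predicate, the allocation is only as good as the pilot, which is why combining pilot information with prior expectations about the strata, as the abstract describes, matters for robustness.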

IEEE Data(base) Engineering Bulletin | 2008

Oracle's SQL Performance Analyzer

Khaled Yagoub; Peter Belknap; Benoit Dageville; Karl Dias; Shantanu Joshi; Hailing Yu

very large data bases | 2005

Online estimation for subset-based SQL queries

Christopher Jermaine; Alin Dobra; Abhijit Pol; Shantanu Joshi

international conference on data engineering | 2006

Materialized Sample Views for Database Approximation

Shantanu Joshi; Christopher Jermaine

international conference on management of data | 2008

Oracle real application testing

Peter Belknap; Supiti Buranawatanachoke; Romain Colle; Benoit Dageville; Karl Dias; Leonidas Galanis; Shantanu Joshi; Jonathan D. Klein; Stratos Papadomanolakis; Uri Shaft; Leng Seow Tan; Venkateshwaran Venkataramani; Yujun Wang; Graham Wood; Khaled Yagoub; Hailing Yu

Archive | 2007