Shantanu Joshi
University of Florida
Publications
Featured research published by Shantanu Joshi.
ACM Transactions on Database Systems | 2006
Christopher Jermaine; Alin Dobra; Subramanian Arumugam; Shantanu Joshi; Abhijit Pol
One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm called the Sort-Merge-Shrink (SMS) Join for computing the answer to such a query over large, disk-based input tables. The key innovation of the SMS join is that if the input data are clustered in a statistically random fashion on disk, then at all times, the join provides an online, statistical estimator for the eventual answer to the query as well as probabilistic confidence bounds. Thus, a user can monitor the progress of the join throughout its execution and stop the join when satisfied with the estimate's accuracy or run the algorithm to completion with a total time requirement that is not much longer than that of other common join algorithms. This contrasts with other online join algorithms, which either do not offer such statistical guarantees or can only offer guarantees so long as the input data can fit into main memory.
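The idea of an online estimator with confidence bounds can be illustrated with a much simpler case than the one in the abstract. The sketch below is not the SMS join: it assumes a single table whose rows arrive in random order and estimates a SUM, using a normal approximation for the bounds. The function name `online_sum_estimates` and its parameters are hypothetical.

```python
import math
import random


def online_sum_estimates(values, population_size, z=1.96, report_every=1000):
    """Yield (rows_seen, estimated_sum, half_width) during a random-order scan.

    Assumption: `values` is a uniformly random permutation of the table,
    so every prefix is a simple random sample without replacement.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for v in values:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
        if n % report_every == 0 or n == population_size:
            variance = m2 / (n - 1) if n > 1 else 0.0
            # Finite population correction: the bound shrinks to zero
            # once the whole table has been read.
            fpc = max(0.0, 1.0 - n / population_size)
            half_width = z * population_size * math.sqrt(variance / n * fpc)
            yield n, population_size * mean, half_width


if __name__ == "__main__":
    random.seed(0)
    table = [random.randint(0, 100) for _ in range(5000)]
    random.shuffle(table)
    for seen, estimate, bound in online_sum_estimates(table, len(table)):
        print(f"{seen:5d} rows: SUM ~ {estimate:.0f} +/- {bound:.0f}")
```

A user watching this output could stop the scan as soon as the interval is narrow enough, which is the interaction pattern the abstract describes for joins.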
International Conference on Management of Data | 2005
Christopher Jermaine; Alin Dobra; Subramanian Arumugam; Shantanu Joshi; Abhijit Pol
One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm for computing the answer to such a query over large, disk-based input tables. The key innovation of our algorithm is that at all times, it provides an online, statistical estimator for the eventual answer to the query, as well as probabilistic confidence bounds. Thus, a user can monitor the progress of the join throughout its execution and stop the join when satisfied with the estimate's accuracy, or run the algorithm to completion with a total time requirement that is not much longer than that of other common join algorithms. This contrasts with other online join algorithms, which either do not offer such statistical guarantees or can only offer guarantees so long as the input data can fit into core memory.
Very Large Data Bases | 2009
Shantanu Joshi; Christopher Jermaine
We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query, where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS or IN clause. We design an unbiased estimator for our query and prove that it is indeed unbiased. We then provide a second, biased estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting estimate. The two estimators are tested over an extensive set of experiments.
International Conference on Data Engineering | 2008
Shantanu Joshi; Chris Jermaine
We consider the problem of estimating the result of an aggregate query with a very low selectivity. Traditional sampling techniques can be ineffective for such a problem since a small random sample is likely to miss most or even all of the records satisfying the restrictive selection predicate. Stratified sampling is useful in this situation, but a key problem in applying stratified sampling effectively is identifying which strata are important and developing a sampling plan that favors those strata in a robust fashion. We develop a solution to this problem that combines any prior knowledge or expectation about the stratification with information obtained from pilot sampling in a principled Bayesian framework.
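For context, a textbook stratified sampling plan can be sketched as follows. This is not the Bayesian method the abstract describes: it assumes plain Neyman allocation driven only by pilot-sample standard deviations, and the function names `neyman_allocation` and `stratified_sum` are hypothetical.

```python
import statistics


def neyman_allocation(strata_sizes, pilot_samples, budget):
    """Split `budget` samples across strata in proportion to N_h * s_h.

    N_h is the stratum size and s_h the pilot-sample standard deviation.
    Rounding means the counts may not add up exactly to `budget`.
    """
    weights = []
    for size, pilot in zip(strata_sizes, pilot_samples):
        s = statistics.stdev(pilot) if len(pilot) > 1 else 0.0
        weights.append(size * s)
    total = sum(weights)
    if total == 0:
        # No variation seen in any pilot: fall back to proportional allocation.
        weights = list(strata_sizes)
        total = sum(weights)
    return [round(budget * w / total) for w in weights]


def stratified_sum(strata_sizes, samples):
    """Estimate a SUM as the sum over strata of N_h times the sample mean."""
    return sum(
        size * statistics.fmean(sample)
        for size, sample in zip(strata_sizes, samples)
        if sample
    )


if __name__ == "__main__":
    sizes = [1000, 1000]
    # 1 marks a row that satisfies the selective predicate, 0 one that does not.
    pilots = [[0, 0, 0, 0], [0, 1, 0, 1]]
    print(neyman_allocation(sizes, pilots, budget=100))
    print(stratified_sum(sizes, [[0, 0], [1, 0, 1, 1]]))
```

In this example the first stratum receives no samples at all because its pilot happened to contain no matching rows. That shows how a plan driven purely by pilot data can be fragile when selectivity is very low.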
IEEE Data(base) Engineering Bulletin | 2008
Khaled Yagoub; Peter Belknap; Benoit Dageville; Karl Dias; Shantanu Joshi; Hailing Yu
Very Large Data Bases | 2005
Christopher Jermaine; Alin Dobra; Abhijit Pol; Shantanu Joshi
International Conference on Data Engineering | 2006
Shantanu Joshi; Christopher Jermaine
International Conference on Management of Data | 2008
Peter Belknap; Supiti Buranawatanachoke; Romain Colle; Benoit Dageville; Karl Dias; Leonidas Galanis; Shantanu Joshi; Jonathan D. Klein; Stratos Papadomanolakis; Uri Shaft; Leng Seow Tan; Venkateshwaran Venkataramani; Yujun Wang; Graham Wood; Khaled Yagoub; Hailing Yu
Archive | 2007
Chris Jermaine; Shantanu Joshi
Archive | 2005
Shantanu Joshi; Subramanian Arumugam