Prasad M. Deshpande | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Prasad M. Deshpande is active.

Explore More

Publication

Featured researches published by Prasad M. Deshpande.

international conference on management of data | 1997

An array-based algorithm for simultaneous multidimensional aggregates

Yihong Zhao; Prasad M. Deshpande; Jeffrey F. Naughton

Computing multiple related group-bys and aggregates is one of the core operations of On-Line Analytical Processing (OLAP) applications. Recently, Gray et al. [GBLP95] proposed the “Cube” operator, which computes group-by aggregations over all possible subsets of the specified dimensions. The rapid acceptance of the importance of this operator has led to a variant of the Cube being proposed for the SQL standard. Several efficient algorithms for Relational OLAP (ROLAP) have been developed to compute the Cube. However, to our knowledge there is nothing in the literature on how to compute the Cube for Multidimensional OLAP (MOLAP) systems, which store their data in sparse arrays rather than in tables. In this paper, we present a MOLAP algorithm to compute the Cube, and compare it to a leading ROLAP algorithm. The comparison between the two is interesting, since although they are computing the same function, one is value-based (the ROLAP algorithm) whereas the other is position-based (the MOLAP algorithm). Our tests show that, given appropriate compression techniques, the MOLAP algorithm is significantly faster than the ROLAP algorithm. In fact, the difference is so pronounced that this MOLAP algorithm may be useful for ROLAP systems as well as MOLAP systems, since in many cases, instead of cubing a table directly, it is faster to first convert the table to an array, cube the array, then convert the result back to a table.

international conference on management of data | 1998

Caching multidimensional queries using chunks

Prasad M. Deshpande; Karthikeyan Ramasamy; Amit Shukla; Jeffrey F. Naughton

Caching has been proposed (and implemented) by OLAP systems in order to reduce response times for multidimensional queries. Previous work on such caching has considered table level caching and query level caching. Table level caching is more suitable for static schemes. On the other hand, query level caching can be used in dynamic schemes, but is too coarse for “large” query results. Query level caching has the further drawback for small query results in that it is only effective when a new query is subsumed by a previously cached query. In this paper, we propose caching small regions of the multidimensional space called “chunks”. Chunk-based caching allows fine granularity caching, and allows queries to partially reuse the results of previous queries with which they overlap. To facilitate the computation of chunks required by a query but missing from the cache, we propose a new organization for relational tables, which we call a “chunked file.” Our experiments show that for workloads that exhibit query locality, chunked caching combined with the chunked file organization performs better than query level caching. An unexpected benefit of the chunked file organization is that, due to its multidimensional clustering properties, it can significantly improve the performance of queries that “miss” the cache entirely as compared to traditional file organizations.

very large data bases | 2007

OLAP over uncertain and imprecise data

Douglas Burdick; Prasad M. Deshpande; T. S. Jayram; Raghu Ramakrishnan; Shivakumar Vaithyanathan

We extend the OLAP data model to represent data ambiguity, specifically imprecision and uncertainty, and introduce an allocation-based approach to the semantics of aggregation queries over such data. We identify three natural query properties and use them to shed light on alternative query semantics. While there is much work on representing and querying ambiguous data, to our knowledge this is the first paper to handle both imprecision and uncertainty in an OLAP setting.

international conference on management of data | 1998

Simultaneous optimization and evaluation of multiple dimensional queries

Yihong Zhao; Prasad M. Deshpande; Jeffrey F. Naughton; Amit Shukla

Database researchers have made significant progress on several research issues related to multidimensional data analysis, including the development of fast cubing algorithms, efficient schemes for creating and maintaining precomputed group-bys, and the design of efficient storage structures for multidimensional data. However, to date there has been little or no work on multidimensional query optimization. Recently, Microsoft has proposed “OLE DB for OLAP” as a standard multidimensional interface for databases. OLE DB for OLAP defines Multi-Dimensional Expressions (MDX), which have the interesting and challenging feature of allowing clients to ask several related dimensional queries in a single MDX expression. In this paper, we present three algorithms to optimize multiple related dimensional queries. Two of the algorithms focus on how to generate a global plan from several related local plans. The third algorithm focuses on generating a good global plan without first generating local plans. We also present three new query evaluation primitives that allow related query plans to share portions of their evaluation. Our initial performance results suggest that the exploitation of common subtask evaluation and global optimization can yield substantial performance improvements when relational database systems are used as data sources for multidimensional analysis.

knowledge discovery and data mining | 1999

Using a knowledge cache for interactive discovery of association rules

Biswadeep Nag; Prasad M. Deshpande; David J. DeWitt

Association rule mining is a valuable decision support technique that can be used to analyze customer preferences, buying patterns, and product correlations. Current systems are however handicapped by the long processing times required by mining algorithms that make them unsuitable for interactive use. In this paper, we propose the use of a knowledge cache that can reduce the response time by several orders of magnitude. Most of the performance gain comes from the idea of guaranteed support that allows us to completely eliminate database accesses in a large number of cases. Using this cache, the time taken to answer a query is proportional to just the size of the result, rather than to the size of the database. Cache replacement is best done by a benefit-metric based strategy that can easily adapt to changing query patterns. We show that our caching scheme is quite robust, providing good performance on a wide variety of data distributions even for small cache sizes. We also compare algorithms that use precomputation to those that use caching and show that the best performance is obtained by combining both these techniques. Finally, we illustrate how the idea of caching can be readily extended to a broader class of problems such as the mining of generalized association rules.

conference on information and knowledge management | 2004

Grammar-based task analysis of web logs

Savitha Srinivasan; Arnon Amir; Prasad M. Deshpande; Vladimir Zbarsky

The daily use of Internet-based services is involved with hundreds of different tasks being performed by multiple users. A single task is typically involved with a sequence of Web URLs invocation. We study the problem of pattern detection in Web logs to identify tasks performed by users, and analyze task trends over time using a grammar-based framework. Our results are demonstrated on a corporate Intranet portal application with 7000 users over a 6 week period and demonstrate compelling business value from this high-level task analysis.

extending database technology | 2000

Materialized View Selection for Multi-Cube Data Models

Amit Shukla; Prasad M. Deshpande; Jeffrey F. Naughton

OLAP applications use precomputation of aggregate data to improve query response time. While this problem has been well-studied in the recent database literature, to our knowledge all previous work has focussed on the special case in which all aggregates are computed from a single cube (in a star schema, this corresponds to there being a single fact table). This is unfortunate, because many real world applications require aggregates over multiple fact tables. In this paper, we attempt to fill this lack of discussion about the issues arising in multi-cube data models by analyzing these issues. Then we examine performance issues by studying the precomputation problem for multi-cube systems. We show that this problem is significantly more complex than the single cube precomputation problem, and that algorithms and cost models developed for single cube precomputation must be extended to deal well with the multi-cube case. Our results from a prototype implementation show that for multicube workloads substantial performance improvements can be realized by using the multi-cube algorithms.

analytics for noisy unstructured text data | 2008

Rule based synonyms for entity extraction from noisy text

Rema Ananthanarayanan; Vijil Chenthamarakshan; Prasad M. Deshpande; Raghuram Krishnapuram

Identification of named entities such as person, organization and product names from text is an important task in information extraction. In many domains, the same entity could be referred to in multiple ways due to variations introduced by different user groups, variations of spellings across regions or cultures, usage of abbreviations, typographical errors and other reasons associated with conventional usage. Identifying a piece of text as a mention of an entity in such noisy data is difficult, even if we have a dictionary of possible entities. Previous approaches treat the synonym problem as part entity disambiguation and use learning-based methods that use the context of the words to identify synonyms. In this paper, we show that existing domain knowledge, encoded as rules, can be used effectively to address the synonym problem to a considerable extent. This makes the disambiguation task simpler, without the need for much training data. We look at a subset of application scenarios in named entity extraction, categorize the possible variations in entity names, and define rules for each category. Using these rules, we generate synonyms for the canonical list and match these synonyms to the actual occurrence in the data sets. In particular, we describe the rule categories that we developed for several named entities and report the results of applying our technique of extracting named entities by generating synonyms for two different domains.

extending database technology | 2011

Efficient reverse skyline retrieval with arbitrary non-metric similarity measures

Prasad M. Deshpande; Deepak P

A Reverse Skyline query returns all objects whose skyline contains the query object. In this paper, we consider Reverse Skyline query processing where the distance between attribute values are not necessarily metric. We outline real world cases that motivate Reverse Skyline processing in such scenarios. We consider various optimizations to develop efficient algorithms for Reverse Skyline processing. Firstly, we consider block-based processing of objects to optimize on IO costs. We then explore pre-processing to re-arrange objects on disk to speed-up computational and IO costs. We then present our main contribution, which is a method of using group-level reasoning and early pruning to micro-optimize processing by reducing attribute level comparisons. An extensive empirical evaluation with real-world datasets and synthetic data of varying characteristics shows that our optimization techniques are indeed very effective in dramatically speeding Reverse Skyline processing, both in terms of computational costs and IO costs.

extending database technology | 2000

Aggregate Aware Caching for Multi-Dimensional Queries

Prasad M. Deshpande; Jeffrey F. Naughton

To date, work on caching for OLAP workloads has focussed on using cached results from a previous query as the answer to another query. This strategy is effective when the query stream exhibits a high degree of locality. It unfortunately misses the dramatic performance improvements obtainable when the answer to a query, while not immediately available in the cache, can be computed from data in the cache. In this paper, we consider the common subcase of answering queries by aggregating data in the cache. In order to use aggregation in the cache, one must solve two subproblems: (1) determining when it is possible to answer a query by aggregating data in the cache, and (2) determining the fastest path for this aggregation, since there can be many.We present two strategies - a naive one and a Virtual Count based strategy. The virtual count based method finds if a query is computable from the cache almost instantaneously, with a small overhead of maintaining the summary state of the cache. The algorithm also maintains cost-based information that can be used to figure out the best possible option for computing a query result from the cache. Experiments with our implementation show that aggregation in the cache leads to substantial performance improvement. The virtual count based methods further improve the performance compared to the naive approaches, in terms of cache lookup and aggregation times.

Explore More