Is this you? Create Your Porfile

S. Sudarshan

Indian Institute of Technology Bombay

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where S. Sudarshan is active.

Explore More

Publication

Featured researches published by S. Sudarshan.

international conference on data engineering | 2002

Keyword searching and browsing in databases using BANKS

Gaurav Bhalotia; Arvind Hulgeri; Charuta Nakhe; Soumen Chakrabarti; S. Sudarshan

With the growth of the Web, there has been a rapid increase in the number of users who need to access online databases without having a detailed knowledge of the schema or of query languages; even relatively simple query languages designed for non-experts are too complicated for them. We describe BANKS, a system which enables keyword-based search on relational databases, together with data and schema browsing. BANKS enables users to extract information in a simple manner without any knowledge of the schema or any need for writing complex queries. A user can get information by typing a few keywords, following hyperlinks, and interacting with controls on the displayed results. BANKS models tuples as nodes in a graph, connected by links induced by foreign key and other relationships. Answers to a query are modeled as rooted trees connecting tuples that match individual keywords in the query. Answers are ranked using a notion of proximity coupled with a notion of prestige of nodes based on inlinks, similar to techniques developed for Web search. We present an efficient heuristic algorithm for finding and ranking query results.

international conference on management of data | 2000

Efficient and extensible algorithms for multi query optimization

Prasan Roy; S. Seshadri; S. Sudarshan; Siddhesh Bhobe

Complex queries are becoming commonplace, with the growing use of decision support systems. These complex queries often have a lot of common sub-expressions, either within a single query, or across multiple such queries run as a batch. Multiquery optimization aims at exploiting common sub-expressions to reduce evaluation cost. Multi-query optimization has hither-to been viewed as impractical, since earlier algorithms were exhaustive, and explore a doubly exponential search space. In this paper we demonstrate that multi-query optimization using heuristics is practical, and provides significant benefits. We propose three cost-based heuristic algorithms: Volcano-SH and Volcano-RU, which are based on simple modifications to the Volcano search strategy, and a greedy heuristic. Our greedy heuristic incorporates novel optimizations that improve efficiency greatly. Our algorithms are designed to be easily added to existing optimizers. We present a performance study comparing the algorithms, using workloads consisting of queries from the TPC-D benchmark. The study shows that our algorithms provide significant benefits over traditional optimization, at a very acceptable overhead in optimization time.

international conference on management of data | 2000

Turbo-charging vertical mining of large databases

Pradeep Shenoy; Jayant R. Haritsa; S. Sudarshan; Gaurav Bhalotia; Mayank Bawa; Devavrat Shah

In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular characteristics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, making no special requirements of the underlying database. VIPER stores data in compressed bit-vectors called “snakes” and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.

international conference on management of data | 1996

Materialized view maintenance and integrity constraint checking: trading space for time

Kenneth A. Ross; Divesh Srivastava; S. Sudarshan

We investigate the problem of incremental maintenance of an SQL view in the face of database updates, and show that it is possible to reduce the total time cost of view maintenance by materializing (and maintaining) additional views. We formulate the problem of determining the optimal set of additional views to materialize as an optimization problem over the space of possible view sets (which includes the empty set). The optimization problem is harder than query optimization since it has to deal with multiple view sets, updates of multiple relations, and multiple ways of maintaining each view set for each updated relation.We develop a memoing solution for the problem; the solution can be implemented using the expression DAG representation used in rule-based optimizers such as Volcano. We demonstrate that global optimization cannot, in general, be achieved by locally optimizing each materialized subview, because common subexpressions between different materialized subviews can allow nonoptimal local plans to be combined into an optimal global plan. We identify conditions on materialized subviews in the expression DAG when local optimization is possible. Finally, we suggest heuristics that can be used to efficiently determine a useful set of additional views to materialize.Our results are particularly important for the efficient checking of assertions (complex integrity constraints) in the SQL-92 standard, since the incremental checking of such integrity constraints is known to be essentially equivalent to the view maintenance problem.

international conference on management of data | 2001

Materialized view selection and maintenance using multi-query optimization

Hoshi Mistry; Prasan Roy; S. Sudarshan; Krithi Ramamritham

Materialized views have been found to be very effective at speeding up queries, and are increasingly being supported by commercial databases and data warehouse systems. However, whereas the amount of data entering a warehouse and the number of materialized views are rapidly increasing, the time window available for maintaining materialized views is shrinking. These trends necessitate efficient techniques for the maintenance of materialized views. In this paper, we show how to find an efficient plan for the maintenance of a set of materialized views, by exploiting common subexpressions between different view maintenance expressions. In particular, we show how to efficiently select (a) expressions and indices that can be effectively shared, by transient materialization; (b) additional expressions and indices for permanent materialization; and (c) the best maintenance plan — incremental or recomputation — for each view. These three decisions are highly interdependent, and the choice of one affects the choice of the others. We develop a framework that cleanly integrates the various choices in a systematic and efficient manner. Our evaluations show that many-fold improvement in view maintenance time can be achieved using our techniques. Our algorithms can also be used to efficiently select materialized views to speed up workloads containing queries and updates.

very large data bases | 2002

BANKS: browsing and keyword searching in relational databases

B. Aditya; Gaurav Bhalotia; Soumen Chakrabarti; Arvind Hulgeri; Charuta Nakhe; Parag Parag; S. Sudarshan

Publisher Summary Browsing ANd Keyword Searching (BANKS) enables almost effortless Web publishing of relational and eXtensible Markup Language (XML) data that would otherwise remain (at least partially) invisible to the Web. Relational databases store large amounts of data that are queried using structured query languages. A user needs to know the underlying schema and the query language in order to make meaningful ad hoc queries on the data. This is a substantial barrier for casual users, such as users of Web-based information systems. HTML forms can be provided for predefined queries. A university Website may provide a form interface to search for faculty and students. Searching for departments would require yet another form, as would search for courses offered. However, creating an interface for each such task is laborious, and is also confusing to users since they must first expend effort finding which form to use. Furthermore, they are not suitable for ad hoc querying or exploratory browsing. Search engines on the Web have popularized an alternative unstructured querying and browsing paradigm that is simple and user-friendly. Users type in keywords and then follow hyperlinks to navigate from one document to the other. No knowledge of schema is needed. Keyword search can provide a very simple and easy-to-use mechanism for casual users to get information from databases.

international conference on management of data | 1996

Cost-based optimization for magic: algebra and implementation

Praveen Seshadri; Joseph M. Hellerstein; Hamid Pirahesh; T. Y. Cliff Leung; Raghu Ramakrishnan; Divesh Srivastava; Peter J. Stuckey; S. Sudarshan

Magic sets rewriting is a well-known optimization heuristic for complex decision-support queries. There can be many variants of this rewriting even for a single query, which differ greatly in execution performance. We propose cost-based techniques for selecting an efficient variant from the many choices.Our first contribution is a practical scheme that models magic sets rewriting as a special join method that can be added to any cost-based query optimizer. We derive cost formulas that allow an optimizer to choose the best variant of the rewriting and to decide whether it is beneficial. The order of complexity of the optimization process is preserved by limiting the search space in a reasonable manner. We have implemented this technique in IBMs DB2 C/S V2 database system. Our performance measurements demonstrate that the cost-based magic optimization technique performs well, and that without it, several poor decisions could be made.Our second contribution is a formal algebraic model of magic sets rewriting, based on an extension of the multiset relational algebra, which cleanly defines the search space and can be used in a rule-based optimizer. We introduce the multiset θ-semijoin operator, and derive equivalence rules involving this operator. We demonstrate that magic sets rewriting for non-recursive SQL queries can be modeled as a sequential composition of these equivalence rules.

very large data bases | 1994

The CORAL deductive system

Raghu Ramakrishnan; Divesh Srivastava; S. Sudarshan; Praveen Seshadri

CORAL is a deductive system that supports a rich declarative language, and an interface to C++, which allows for a combination of declarative and imperative programming. A CORAL declarative program can be organized as a collection of interacting modules. CORAL supports a wide range of evaluation strategies, and automatically chooses an efficient strategy for each module in the program. Users can guide query optimization by selecting from a wide range of control choices. The CORAL system provides imperative constructs to update, insert, and delete facts. Users can program in a combination of declarative CORAL and C++ extended with CORAL primitives. A high degree of extensibility is provided by allowing C++ programmers to use the class structure of C++ to enhance the CORAL implementation. CORAL provides support for main-memory data and, using the EXODUS storage manager, disk-resident data. We present a comprehensive view of the system from broad design goals, the language, and the architecture, to language interfaces and implementation details.

symposium on principles of database systems | 2001

Pipelining in multi-query optimization

Nilesh N. Dalvi; Sumit Sanghai; Prasan Roy; S. Sudarshan

Database systems frequently have to execute a set of related queries, which share several common subexpressions. Multi-query optimization exploits this, by finding evaluation plans that share common results. Current approaches to multi-query optimization assume that common subexpressions are materialized. Significant performance benefits can be had if common subexpressions are pipelined to their uses, without being materialized. However, plans with pipelining may not always be realizable with limited buffer space, as we show. We present a general model for schedules with pipelining, and present a necessary and sufficient condition for determining validity of a schedule under our model. We show that finding a valid schedule with minimum cost is NP-hard. We present a greedy heuristic for finding good schedules. Finally, we present a performance study that shows the benefit of our algorithms on batches of queries from the TPCD benchmark.

very large data bases | 1998

Garbage Collection in Object Oriented Databases Using Transactional Cyclic Reference Counting

Prasan Roy; S. Seshadri; Abraham Silberschatz; S. Sudarshan; Srinivas Ashwin

Abstract. Garbage collection is important in object-oriented databases to free the programmer from explicitly deallocating memory. In this paper, we present a garbage collection algorithm, called Transactional Cyclic Reference Counting (TCRC), for object-oriented databases. The algorithm is based on a variant of a reference-counting algorithm proposed for functional programming languages The algorithm keeps track of auxiliary reference count information to detect and collect cyclic garbage. The algorithm works correctly in the presence of concurrently running transactions, and system failures. It does not obtain any long-term locks, thereby minimizing interference with transaction processing. It uses recovery subsystem logs to detect pointer updates; thus, existing code need not be rewritten. Finally, it exploits schema information, if available, to reduce costs. We have implemented the TCRC algorithm and present results of a performance study of the implementation.

Explore More