
Publications


Featured research published by Eugene J. Shekita.


International Conference on Management of Data | 1996

Improved histograms for selectivity estimation of range predicates

Viswanath Poosala; Peter J. Haas; Yannis E. Ioannidis; Eugene J. Shekita

Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never been a systematic study of all histogram aspects, the available choices for each aspect, and the impact of such choices on histogram effectiveness. In this paper, we provide a taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities. We introduce novel choices for several of the taxonomy dimensions, and derive new histogram types by combining choices in effective ways. We also show how sampling techniques can be used to reduce the cost of histogram construction. Finally, we present results from an empirical study of the proposed histogram types used in selectivity estimation of range predicates and identify the histogram types that have the best overall performance.
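
To make the estimation problem concrete, here is a minimal Python sketch of one classic type from the taxonomy, an equi-depth histogram, used to estimate the selectivity of a range predicate under the usual assumption that values are spread uniformly within each bucket. The data and names are invented for illustration; the paper's taxonomy covers many more histogram types and construction choices.

```python
# Minimal sketch: an equi-depth histogram estimating range-predicate selectivity.

def build_equi_depth(values, num_buckets):
    """Each bucket holds ~n/num_buckets values; keep each bucket's upper bound."""
    values = sorted(values)
    n = len(values)
    bounds = [values[(i + 1) * n // num_buckets - 1] for i in range(num_buckets)]
    return bounds, n / num_buckets          # boundaries, rows per bucket

def estimate_range(bounds, depth, lo, hi, domain_min):
    """Estimate |{v : lo <= v <= hi}|, assuming uniformity inside each bucket."""
    est, prev = 0.0, domain_min
    for upper in bounds:
        width = upper - prev
        if width > 0:
            overlap = max(0.0, min(hi, upper) - max(lo, prev))
            est += depth * (overlap / width)
        prev = upper
    return est

# A skewed column: 50 duplicates of 1, then the values 2..51.
vals = [1] * 50 + list(range(2, 52))
bounds, depth = build_equi_depth(vals, num_buckets=5)
print(estimate_range(bounds, depth, lo=10, hi=30, domain_min=0))  # ~21 rows
```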


Very Large Data Bases | 2001

Efficiently publishing relational data as XML documents

Jayavel Shanmugasundaram; Eugene J. Shekita; Rimon Barr; Michael J. Carey; Bruce G. Lindsay; Hamid Pirahesh; Berthold Reinwald

XML is rapidly emerging as a standard for exchanging business data on the World Wide Web. For the foreseeable future, however, most business data will continue to be stored in relational database systems. Consequently, if XML is to fulfill its potential, some mechanism is needed to publish relational data as XML documents. Towards that goal, one of the major challenges is finding a way to efficiently structure and tag data from one or more tables as a hierarchical XML document. Different alternatives are possible depending on when this processing takes place and how much of it is done inside the relational engine. In this paper, we characterize and study the performance of these alternatives. Among other things, we explore the use of new scalar and aggregate functions in SQL for constructing complex XML documents directly in the relational engine. We also explore different execution plans for generating the content of an XML document. The results of an experimental study show that constructing XML documents inside the relational engine can have a significant performance benefit. Our results also show the superiority of having the relational engine use what we call an “outer union plan” to generate the content of an XML document.
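
As a rough illustration of the tagging step, the sketch below (plain Python rather than SQL, with invented table and element names) turns flat, customer-ordered join rows, roughly the shape an outer-join-style plan would deliver, into a nested XML document:

```python
# Minimal sketch of the "tagging" step: flat, customer-ordered join rows in,
# hierarchical XML out. In the paper this work can happen inside or outside
# the relational engine; the names here are invented for illustration.
import xml.etree.ElementTree as ET
from itertools import groupby

# (cust_id, cust_name, order_id, order_total), already sorted by customer.
rows = [
    (1, "Acme", 10, 99.0),
    (1, "Acme", 11, 25.0),
    (2, "Burns", None, None),   # customer with no orders (outer-join padding)
]

root = ET.Element("customers")
for cust_id, group in groupby(rows, key=lambda r: r[0]):
    group = list(group)
    cust = ET.SubElement(root, "customer", id=str(cust_id))
    ET.SubElement(cust, "name").text = group[0][1]
    for _, _, oid, total in group:
        if oid is not None:
            order = ET.SubElement(cust, "order", id=str(oid))
            ET.SubElement(order, "total").text = str(total)

print(ET.tostring(root, encoding="unicode"))
```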


International Conference on Management of Data | 2010

A comparison of join algorithms for log processing in MapReduce

Spyros Blanas; Jignesh M. Patel; Vuk Ercegovac; Jun Rao; Eugene J. Shekita; Yuanyuan Tian

The MapReduce framework is increasingly being used to analyze large volumes of data. One important type of data analysis done with MapReduce is log processing, in which a click-stream or an event log is filtered, aggregated, or mined for patterns. As part of this analysis, the log often needs to be joined with reference data such as information about users. Although there have been many studies examining join algorithms in parallel and distributed DBMSs, the MapReduce framework is cumbersome for joins. MapReduce programmers often use simple but inefficient algorithms to perform joins. In this paper, we describe crucial implementation details of a number of well-known join strategies in MapReduce, and present a comprehensive experimental comparison of these join techniques on a 100-node Hadoop cluster. Our results provide insights that are unique to the MapReduce platform and offer guidance on when to use a particular join algorithm on this platform.
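
The baseline strategy in this space, the repartition join, can be sketched in a few lines of plain Python that simulate the map, shuffle, and reduce phases; the data and tags below are invented for illustration.

```python
# Minimal sketch of a repartition join, simulated in plain Python: both inputs
# are mapped to (join_key, tagged_record) pairs, grouped by key ("shuffled"),
# and each reduce group cross-products the two tagged sides.
from collections import defaultdict

log = [("u1", "click"), ("u2", "view"), ("u1", "view")]   # big log side
users = [("u1", "alice"), ("u2", "bob")]                  # reference side

def map_phase():
    for uid, event in log:
        yield uid, ("L", event)          # tag each record with its source
    for uid, name in users:
        yield uid, ("R", name)

# Shuffle: group all tagged records by join key.
groups = defaultdict(list)
for key, tagged in map_phase():
    groups[key].append(tagged)

# Reduce: within each key, join the L records with the R records.
joined = [(key, event, name)
          for key, tagged in groups.items()
          for tag_l, event in tagged if tag_l == "L"
          for tag_r, name in tagged if tag_r == "R"]
print(sorted(joined))
```

When the reference table is small enough to fit in memory on every node, a broadcast (map-side) join avoids the shuffle entirely; choosing between such strategies is exactly the kind of tradeoff the paper measures.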


IEEE Transactions on Knowledge and Data Engineering | 1990

Starburst mid-flight: as the dust clears (database project)

Laura M. Haas; Walter Chang; Guy M. Lohman; John McPherson; Paul F. Wilms; George Lapis; Bruce G. Lindsay; Hamid Pirahesh; Michael J. Carey; Eugene J. Shekita

The purpose of the Starburst project is to improve the design of relational database management systems and enhance their performance, while building an extensible system to better support nontraditional applications and to serve as a testbed for future improvements in database technology. The design and implementation of the Starburst system to date are considered. Some key design decisions and how they affect the goal of improved structure and performance are examined. How well the goal of extensibility has been met is examined: what aspects of the system are extensible, how extensions can be done, and how easy it is to add extensions. Some actual extensions to the system, including the experiences of the first real customizers, are discussed.


OODS '86: Proceedings of the 1986 International Workshop on Object-Oriented Database Systems | 1986

The architecture of the EXODUS extensible DBMS

Michael J. Carey; David J. DeWitt; Daniel Frank; M. Muralikrishna; Goetz Graefe; Joel E. Richardson; Eugene J. Shekita

With non-traditional application areas such as engineering design, image/voice data management, scientific/statistical applications, and artificial intelligence systems all clamoring for ways to store and efficiently process larger and larger volumes of data, it is clear that traditional database technology has been pushed to its limits. It also seems clear that no single database system will be capable of simultaneously meeting the functionality and performance requirements of such a diverse set of applications. In this paper we describe the preliminary design of EXODUS, an extensible database system that will facilitate the fast development of high-performance, application-specific database systems. EXODUS provides certain kernel facilities, including a versatile storage manager and a type manager. In addition, it provides an architectural framework for building application-specific database systems, tools to partially automate the generation of such systems, and libraries of software components (e.g., access methods) that are likely to be useful for many application domains.
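
The architectural idea can be sketched minimally in Python with invented interface names: the kernel codes against a narrow access-method contract, and an application-specific system plugs in its own implementation or reuses one from a component library. This is an illustration of the extensibility pattern, not EXODUS's actual interfaces.

```python
# Minimal sketch of a pluggable access-method interface; names are invented.
from abc import ABC, abstractmethod

class AccessMethod(ABC):
    """Contract the kernel expects from any pluggable access method."""
    @abstractmethod
    def insert(self, key, record): ...
    @abstractmethod
    def search(self, key): ...

class InMemoryBTree(AccessMethod):
    """A stand-in 'library component'; a real one would be disk-based."""
    def __init__(self):
        self.data = {}
    def insert(self, key, record):
        self.data[key] = record
    def search(self, key):
        return self.data.get(key)

# The kernel works against the interface, never a concrete index:
index: AccessMethod = InMemoryBTree()
index.insert(42, "a record")
print(index.search(42))
```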


International Conference on Management of Data | 1991

Data caching tradeoffs in client-server DBMS architectures

Michael J. Carey; Michael J. Franklin; Miron Livny; Eugene J. Shekita

In this paper, we examine the performance tradeoffs that are raised by caching data in the client workstations of a client-server DBMS. We begin by presenting a range of lock-based cache consistency algorithms that arise by viewing cache consistency as a variant of the well-understood problem of replicated data management. We then use a detailed simulation model to study the performance of these algorithms over a wide range of workloads and system resource configurations. The results illustrate the key performance tradeoffs related to client-server cache consistency, and should be of use to designers of next-generation DBMS prototypes and products.
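
One lock-based approach in this general space can be sketched as invalidation on update: before a client installs an update to a page, the server tells every other client caching that page to drop its copy. The Python below is a minimal single-threaded illustration with invented names, not any of the paper's exact algorithms.

```python
# Minimal sketch of invalidation-based cache consistency in a client-server DBMS.
class Server:
    def __init__(self):
        self.cached_at = {}                 # page -> set of client ids

    def register_read(self, page, client):
        self.cached_at.setdefault(page, set()).add(client)

    def request_write(self, page, client, clients):
        # Invalidate every other cached copy before granting the write.
        for other in self.cached_at.get(page, set()) - {client}:
            clients[other].invalidate(page)
        self.cached_at[page] = {client}     # writer now holds the only copy

class Client:
    def __init__(self):
        self.cache = {}
    def invalidate(self, page):
        self.cache.pop(page, None)

server, clients = Server(), {"A": Client(), "B": Client()}
clients["A"].cache["p1"] = "v0"; server.register_read("p1", "A")
clients["B"].cache["p1"] = "v0"; server.register_read("p1", "B")
server.request_write("p1", "A", clients)    # B's stale copy is dropped
print("p1" in clients["B"].cache)           # False
```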


International Conference on Management of Data | 2001

A general technique for querying XML documents using a relational database system

Jayavel Shanmugasundaram; Eugene J. Shekita; Jerry Kiernan; Rajasekar Krishnamurthy; Efstratios Viglas; Jeffrey F. Naughton; Igor Tatarinov

There has been recent interest in using relational database systems to store and query XML documents. Each of the techniques proposed in this context works by (a) creating tables for the purpose of storing XML documents (also called relational schema generation), (b) storing XML documents by shredding them into rows in the created tables, and (c) converting queries over XML documents into SQL queries over the created tables. Since relational schema generation is a physical database design issue -- dependent on factors such as the nature of the data, the query workload and availability of schemas -- there have been many techniques proposed for this purpose. Currently, each relational schema generation technique requires its own query processor to efficiently convert queries over XML documents into SQL queries over the created tables. In this paper, we present an efficient technique whereby the same query-processor can be used for all such relational schema generation techniques. This greatly simplifies the task of relational schema generation by eliminating the need to write a special-purpose query processor for each new solution to the problem. In addition, our proposed technique enables users to query seamlessly across relational data and XML documents. This provides users with unified access to both relational and XML data without them having to deal with separate databases.
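
As a concrete example of step (b), the sketch below shreds a small XML document into rows of a generic "edge" table (node id, parent id, tag, text). This is one well-known relational schema generation scheme, not necessarily any of the paper's; a query over the XML then becomes a SQL self-join over this table.

```python
# Minimal sketch of shredding XML into rows of a generic edge table.
import xml.etree.ElementTree as ET
from itertools import count

doc = ET.fromstring("<book><title>Starburst</title><year>1990</year></book>")

rows, ids = [], count()
def shred(node, parent_id=None):
    node_id = next(ids)
    rows.append((node_id, parent_id, node.tag, (node.text or "").strip()))
    for child in node:
        shred(child, node_id)

shred(doc)
for row in rows:
    print(row)   # (0, None, 'book', ''), (1, 0, 'title', 'Starburst'), ...
```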


Web Search and Data Mining | 2008

Beyond basic faceted search

Ori Ben-Yitzhak; Nadav Golbandi; Nadav Har'El; Ronny Lempel; Andreas Neumann; Shila Ofek-Koifman; Dafna Sheinwald; Eugene J. Shekita; Benjamin Sznajder; Sivan Yogev

This paper extends traditional faceted search to support richer information discovery tasks over more complex data models. Our first extension adds flexible, dynamic business intelligence aggregations to the faceted application, enabling users to gain insight into their data that is far richer than just knowing the quantities of documents belonging to each facet. We see this capability as a step toward bringing OLAP capabilities, traditionally supported by databases over relational data, to the domain of free-text queries over metadata-rich content. Our second extension shows how one can efficiently extend a faceted search engine to support correlated facets - a more complex information model in which the values associated with a document across multiple facets are not independent. We show that by reducing the problem to a recently solved tree-indexing scenario, data with correlated facets can be efficiently indexed and retrieved.
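
The first extension can be sketched minimally with invented field names: alongside the classic per-facet document counts, each facet value also accumulates a business-intelligence aggregation (here, a price total) over the query's result set.

```python
# Minimal sketch: facet counts plus a BI aggregation per facet value.
from collections import defaultdict

results = [  # documents matching some free-text query
    {"brand": "Acme", "color": "red",  "price": 10.0},
    {"brand": "Acme", "color": "blue", "price": 12.0},
    {"brand": "Zeta", "color": "red",  "price": 30.0},
]

counts = defaultdict(lambda: defaultdict(int))
totals = defaultdict(lambda: defaultdict(float))
for doc in results:
    for facet in ("brand", "color"):
        value = doc[facet]
        counts[facet][value] += 1             # classic faceted search
        totals[facet][value] += doc["price"]  # the richer BI aggregation

print(dict(counts["brand"]))  # {'Acme': 2, 'Zeta': 1}
print(dict(totals["brand"]))  # {'Acme': 22.0, 'Zeta': 30.0}
```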


Very Large Data Bases | 2011

Using Paxos to build a scalable, consistent, and highly available datastore

Jun Rao; Eugene J. Shekita; Sandeep Tata

Spinnaker is an experimental datastore that is designed to run on a large cluster of commodity servers in a single datacenter. It features key-based range partitioning, 3-way replication, and a transactional get-put API with the option to choose either strong or timeline consistency on reads. This paper describes Spinnaker's Paxos-based replication protocol. The use of Paxos ensures that a data partition in Spinnaker will be available for reads and writes as long as a majority of its replicas are alive. Unlike traditional master-slave replication, this is true regardless of the failure sequence that occurs. We show that Paxos replication can be competitive with alternatives that provide weaker consistency guarantees. Compared to an eventually consistent datastore, we show that Spinnaker can be as fast or even faster on reads and only 5% to 10% slower on writes.
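
The availability property rests on a simple quorum rule: a replicated write commits once a majority of the partition's replicas acknowledge it. The Python below illustrates only that majority-acknowledgment rule and deliberately elides what real Paxos also needs (leader election, ballots, log catch-up); the names are invented.

```python
# Minimal sketch of majority-quorum commit: a write succeeds only if a
# majority of replicas are alive to append it to their logs.
def quorum_write(replicas_alive, value, logs):
    acks = 0
    for replica, alive in replicas_alive.items():
        if alive:
            logs[replica].append(value)   # replica logs the write and acks
            acks += 1
    majority = len(replicas_alive) // 2 + 1
    return acks >= majority               # committed only with a majority

logs = {"r1": [], "r2": [], "r3": []}
print(quorum_write({"r1": True, "r2": True, "r3": False}, "put(k,v)", logs))   # True
print(quorum_write({"r1": True, "r2": False, "r3": False}, "put(k,v)", logs))  # False
```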


International Conference on Management of Data | 1996

Fundamental techniques for order optimization

David E. Simmen; Eugene J. Shekita; Timothy R. Malkemus

Decision support applications are growing in popularity as more business data is kept on-line. Such applications typically include complex SQL queries that can test a query optimizer's ability to produce an efficient access plan. Many access plan strategies exploit the physical ordering of data provided by indexes or sorting. Sorting is an expensive operation, however. Therefore, it is imperative that sorting is optimized in some way or avoided altogether. Toward that goal, this paper describes novel optimization techniques for pushing down sorts in joins, minimizing the number of sorting columns, and detecting when sorting can be avoided because of predicates, keys, or indexes. A set of fundamental operations is described that provide the foundation for implementing such techniques. The operations exploit data properties that arise from predicate application, uniqueness, and functional dependencies. These operations and techniques have been implemented in IBM's DB2/CS.
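
Two of these rules can be sketched directly: a column bound by an equality predicate is constant and can be dropped from a required ordering, and once a key is contained in a prefix of the ordering, the remaining columns are functionally determined and the order can be truncated. The Python below is a minimal illustration with invented names, not the paper's full framework.

```python
# Minimal sketch of sort-column minimization via equality predicates and keys.
def minimize_sort_columns(order, equality_preds, keys):
    # Rule 1: drop columns made constant by equality predicates.
    order = [c for c in order if c not in equality_preds]
    # Rule 2: truncate after any prefix that contains a whole key,
    # since the key functionally determines the remaining columns.
    for i in range(1, len(order) + 1):
        prefix = set(order[:i])
        if any(key <= prefix for key in keys):
            return order[:i]
    return order

# ORDER BY b, a, c with predicate b = 5 and key {a}:
print(minimize_sort_columns(["b", "a", "c"],
                            equality_preds={"b"},
                            keys=[{"a"}]))   # ['a'] -- sorting on a suffices
```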
