Kevin S. Beyer | Researchain

Publication

Featured research published by Kevin S. Beyer.

international conference on management of data | 1999

Bottom-up computation of sparse and Iceberg CUBE

Kevin S. Beyer; Raghu Ramakrishnan

We introduce the Iceberg-CUBE problem as a reformulation of the datacube (CUBE) problem. The Iceberg-CUBE problem is to compute only those group-by partitions with an aggregate value (e.g., count) above some minimum support threshold. The result of Iceberg-CUBE can be used (1) to answer group-by queries with a clause such as HAVING COUNT(*) >= X, where X is greater than the threshold, (2) for mining multidimensional association rules, and (3) to complement existing strategies for identifying interesting subsets of the CUBE for precomputation. We present a new algorithm (BUC) for Iceberg-CUBE computation. BUC builds the CUBE bottom-up; i.e., it builds the CUBE by starting from a group-by on a single attribute, then a group-by on a pair of attributes, then a group-by on three attributes, and so on. This is the opposite of all techniques proposed earlier for computing the CUBE, and has an important practical advantage: BUC avoids computing the larger group-bys that do not meet minimum support. The pruning in BUC is similar to the pruning in the Apriori algorithm for association rules, except that BUC trades some pruning for locality of reference and reduced memory requirements. BUC uses the same pruning strategy when computing sparse, complete CUBEs. We present a thorough performance evaluation over a broad range of workloads. Our evaluation demonstrates that (in contrast to earlier assumptions) minimizing the aggregations or the number of sorts is not the most important aspect of the sparse CUBE problem. The pruning in BUC, combined with an efficient sort method, enables BUC to outperform all previous algorithms for sparse CUBEs, even for computing entire CUBEs, and to dramatically improve Iceberg-CUBE computation.
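The bottom-up recursion with minimum-support pruning can be illustrated with a minimal Python sketch that handles only COUNT(*). It assumes rows are dicts keyed by dimension name. It partitions with a hash map rather than the sort-based partitioning the abstract credits for BUC's speed. The names `iceberg_cube` and `_buc` are illustrative and do not come from the paper.

```python
from collections import defaultdict


def iceberg_cube(rows, dims, minsup):
    """Return {cell: count} for every group-by cell with count >= minsup.

    A cell is a tuple of (dimension, value) pairs; () is the grand total.
    """
    out = {}
    if len(rows) >= minsup:
        _buc(rows, dims, minsup, 0, (), out)
    return out


def _buc(rows, dims, minsup, start, cell, out):
    # `rows` is the partition for `cell`; the caller already checked minsup.
    out[cell] = len(rows)
    # Extend the cell one dimension at a time, left to right, so each
    # group-by cell is produced exactly once.
    for d in range(start, len(dims)):
        partitions = defaultdict(list)  # hash partitioning (a swap-in for sorting)
        for row in rows:
            partitions[row[dims[d]]].append(row)
        for value, part in partitions.items():
            # Apriori-style pruning: a partition below minsup cannot contain a
            # more specific cell that reaches minsup, so do not recurse into it.
            if len(part) >= minsup:
                _buc(part, dims, minsup, d + 1, cell + ((dims[d], value),), out)


rows = [
    {"A": "a1", "B": "b1"},
    {"A": "a1", "B": "b2"},
    {"A": "a1", "B": "b1"},
    {"A": "a2", "B": "b1"},
]
cube = iceberg_cube(rows, ("A", "B"), minsup=2)
```

With `minsup=2`, the cell `A=a2` (count 1) is pruned before any group-by on `(A, B)` inside it is computed. This is the "avoid the larger group-bys" effect the abstract describes.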

very large data bases | 2004

A framework for using materialized XPath views in XML query processing

Andrey Balmin; Fatma Ozcan; Kevin S. Beyer; Roberta Jo Cochrane; Hamid Pirahesh

XML languages, such as XQuery, XSLT and SQL/XML, employ XPath as the search and extraction language. XPath expressions often define complicated navigation, resulting in expensive query processing, especially when executed over large collections of documents. In this paper, we propose a framework for exploiting materialized XPath views to expedite processing of XML queries. We explore a class of materialized XPath views, which may contain XML fragments, typed data values, full paths, node references or any combination thereof. We develop an XPath matching algorithm to determine when such views can be used to answer a user query containing XPath expressions. We use the match information to identify the portion of an XPath expression in the user query which is not covered by the XPath view. Finally, we construct one or more compensation expressions that must be applied to the view to produce the query result. Experimental evaluation, using our prototype implementation, shows that the matching algorithm is very efficient and usually accounts for a small fraction of the total query compilation time.
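The match-then-compensate idea can be shown with a deliberately tiny Python sketch. It assumes both view and query are linear paths with child-axis name steps only. The view can answer the query if its steps are a prefix of the query's, and the leftover steps act as the compensation expression applied to each view node. The paper's matcher covers far richer views (descendant axes, predicates, typed values, node references). The functions `parse`, `evaluate` and `compensation` are hypothetical helpers, and only `xml.etree.ElementTree` from the standard library is used.

```python
import xml.etree.ElementTree as ET


def parse(path):
    """Split '/a/b/c' into ['a', 'b', 'c']; anything richer is rejected."""
    if not path.startswith("/") or any(t in path for t in ("//", "[", "*", "@")):
        raise ValueError("this sketch only handles child-axis name steps")
    return path.strip("/").split("/")


def evaluate(context, steps):
    """Follow child-axis name steps from a list of context elements."""
    for step in steps:
        context = [child for node in context for child in node if child.tag == step]
    return context


def compensation(view, query):
    """Steps to apply to each view node to answer the query, or None if unusable."""
    v, q = parse(view), parse(query)
    if len(v) > len(q) or q[: len(v)] != v:
        return None  # the view does not cover the query's leading steps
    return q[len(v):]  # the part of the query the view does not cover


# A wrapper element stands in for the document node so absolute paths start at it.
doc = ET.Element("document")
doc.append(ET.fromstring("<a><b><c>1</c></b><b><c>2</c><e/></b></a>"))

view_nodes = evaluate([doc], parse("/a/b"))  # the "materialized" view
steps = compensation("/a/b", "/a/b/c")       # leftover steps: ['c']
from_view = [n.text for n in evaluate(view_nodes, steps)]
direct = [n.text for n in evaluate([doc], parse("/a/b/c"))]
```

Answering through the view and evaluating the query directly give the same nodes here, which is the property a real matcher must guarantee before rewriting a query.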

international conference on management of data | 2010

Ricardo: integrating R and Hadoop

Sudipto Das; Yannis Sismanis; Kevin S. Beyer; Rainer Gemulla; Peter J. Haas; John McPherson

Many modern enterprises are collecting data at the most detailed level possible, creating data repositories ranging from terabytes to petabytes in size. The ability to apply sophisticated statistical analysis methods to this data is becoming essential for marketplace competitiveness. This need to perform deep analysis over huge data repositories poses a significant challenge to existing statistical software and data management systems. On the one hand, statistical software provides rich functionality for data analysis and modeling, but can handle only limited amounts of data; e.g., popular packages like R and SPSS operate entirely in main memory. On the other hand, data management systems, such as MapReduce-based systems, can scale to petabytes of data, but provide insufficient analytical functionality. We report our experiences in building Ricardo, a scalable platform for deep analytics. Ricardo is part of the eXtreme Analytics Platform (XAP) project at the IBM Almaden Research Center, and rests on a decomposition of data-analysis algorithms into parts executed by the R statistical analysis system and parts handled by the Hadoop data management system. This decomposition attempts to minimize the transfer of data across system boundaries. Ricardo contrasts with previous approaches, which try to get along with only one type of system, and allows analysts to work on huge datasets from within a popular, well-supported, and powerful analysis environment. Because our approach avoids the need to re-implement either statistical or data-management functionality, it can be used to solve complex problems right now.
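One way to picture such a decomposition is simple linear regression. The large-data side scans each partition and emits five sufficient statistics, and the statistics side solves the least-squares problem from those few numbers, so only a handful of values cross the system boundary. The sketch below simulates both sides in plain Python. It calls no R or Hadoop APIs, and the function names are assumptions made for illustration rather than Ricardo's interfaces.

```python
from functools import reduce


def partition_stats(rows):
    """Data-management side: one scan of a partition, emitting five numbers."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def merge_stats(a, b):
    """Combine step: sufficient statistics add component-wise across partitions."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(stats):
    """Statistics side: closed-form least squares for y = intercept + slope * x."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Two "partitions" of points lying exactly on y = 2 + 3x.
partitions = [[(x, 2 + 3 * x) for x in range(0, 5)],
              [(x, 2 + 3 * x) for x in range(5, 10)]]
stats = reduce(merge_stats, map(partition_stats, partitions))
intercept, slope = fit_line(stats)
```

The transferred data is five numbers regardless of how many rows the partitions hold, which is the kind of boundary-crossing the abstract says the decomposition tries to minimize.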

international conference on management of data | 2005

System RX: one part relational, one part XML

Kevin S. Beyer; Roberta Jo Cochrane; Vanja Josifovski; Jim Kleewein; George Lapis; Guy M. Lohman; Bob Lyle; Fatma Ozcan; Hamid Pirahesh; Normen Seemann; Tuong Chanh Truong; Bert Van der Linden; Brian S. Vickery; Chun Zhang

This paper describes the overall architecture and design aspects of a hybrid relational and XML database system called System RX. We believe that such a system is fundamental in the evolution of enterprise data management solutions: XML and relational data will co-exist and complement each other in enterprise solutions. Furthermore, a successful XML repository requires much of the same infrastructure that already exists in a relational database management system. Finally, XML query languages have considerable conceptual and functional overlap with relational dataflow engines. System RX is the first truly hybrid system that commingles XML and relational data, giving them equal footing. The new support for XML includes native support for storage and indexing as well as query compilation and evaluation support for the latest industry-standard query languages, SQL/XML and XQuery. By building a hybrid system, we leverage more than 20 years of data management research to advance XML technology to the same standards expected from mature relational systems.

international conference on management of data | 2000

How to roll a join: asynchronous incremental view maintenance

Kenneth Salem; Kevin S. Beyer; Bruce G. Lindsay; Roberta Cochrane

Incremental refresh of a materialized join view is often less expensive than a full, non-incremental refresh. However, it is still a potentially costly atomic operation. This paper presents an algorithm that performs incremental view maintenance as a series of small, asynchronous steps. The size of each step can be controlled to limit contention between the refresh process and concurrent operations that access the materialized view or the underlying relations. The algorithm supports point-in-time refresh, which allows a materialized view to be refreshed to any time between the last refresh and the present.
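For insert-only changes, the delta rule behind incremental join maintenance can be sketched in a few lines, with the pending deltas applied in small chunks instead of one atomic refresh. This is not the paper's asynchronous algorithm. It has no concurrency, timestamps, or point-in-time refresh, and it assumes rows are `(key, payload)` pairs with a hypothetical `step_size` parameter. It shows only why applying changes in steps can still reach the same result as a full recomputation.

```python
from collections import defaultdict


def join(left, right):
    """Hash join on the first field: (k, a) with (k, b) gives (k, a, b)."""
    index = defaultdict(list)
    for k, b in right:
        index[k].append(b)
    return [(k, a, b) for k, a in left for b in index[k]]


def refresh_in_steps(view, r, s, delta_r, delta_s, step_size):
    """Apply pending inserts to R and S, and to the view, a chunk at a time."""
    for i in range(0, len(delta_r), step_size):
        chunk = delta_r[i:i + step_size]
        view.extend(join(chunk, s))  # new R rows against the current S
        r.extend(chunk)
    for i in range(0, len(delta_s), step_size):
        chunk = delta_s[i:i + step_size]
        view.extend(join(r, chunk))  # current R (including applied R deltas) against new S rows
        s.extend(chunk)


r = [(1, "r1"), (2, "r2")]
s = [(1, "s1")]
view = join(r, s)  # initial materialized join view
refresh_in_steps(view, r, s,
                 delta_r=[(1, "r3"), (3, "r4")],
                 delta_s=[(2, "s2"), (3, "s3"), (1, "s4")],
                 step_size=1)
```

Processing the R deltas first against the old S, then the S deltas against the already-updated R, counts each new result tuple exactly once. After the last step the view equals the join of the final relations.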

statistical and scientific database management | 1998

SRQL: Sorted Relational Query Language

R. Ramakrishnan; Donko Donjerkovic; A. Ranganathan; Kevin S. Beyer; M. Krishnaprasad

A relation is an unordered collection of records. Often, however, there is an underlying order (e.g., a sequence of stock prices), and users want to pose queries that reflect this order (e.g., find a weekly moving average). SQL provides no support for posing such queries. We show how a rich class of queries reflecting sort order can be naturally expressed and efficiently executed with simple extensions to SQL.
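The moving-average example shows what an order-aware engine gains. Once the input is sorted, a trailing window can be maintained in a single pass with a running sum, instead of the self-join that unordered SQL would need. The Python sketch below is not SRQL syntax, which is not reproduced here. The five-value window, standing in for a trading week, is an assumption.

```python
from collections import deque


def moving_average(values, window):
    """Trailing moving average over values already sorted by time."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque()
    total = 0.0
    out = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the value that slid out of the window
        if len(buf) == window:
            out.append(total / window)  # emit only once the window is full
    return out


prices = [10, 11, 12, 13, 14, 15, 16]  # e.g. daily closing prices, in date order
weekly = moving_average(prices, window=5)
```

Each input value is added once and removed once, so the cost is linear in the sequence length. That efficiency depends on the data arriving in sort order.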

international conference on management of data | 2005

Extending XQuery for analytics

Kevin S. Beyer; Donald D. Chamberlin; Latha S. Colby; Fatma Ozcan; Hamid Pirahesh; Yu Xu

XQuery is a query language under development by the W3C XML Query Working Group. The language contains constructs for navigating, searching, and restructuring XML data. With XML gaining importance as the standard for representing business data, XQuery must support the types of queries that are common in business analytics. One such class of queries is OLAP-style aggregation queries. Although these queries are expressible in XQuery Version 1, the lack of explicit grouping constructs makes the construction of these queries non-intuitive and places a burden on the XQuery engine to recognize and optimize the implicit grouping constructs. Furthermore, although the flexibility of the XML data model provides an opportunity for advanced forms of grouping that are not easily represented in relational systems, these queries are difficult to express using the current XQuery syntax. In this paper, we provide a proposal for extending the XQuery FLWOR expression with explicit syntax for grouping and for numbering of results. We show that these new XQuery constructs not only simplify the construction and evaluation of queries requiring grouping and ranking but also enable complex analytic queries such as moving-window aggregation and rollups along dynamic hierarchies to be expressed without additional language extensions.
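The semantics of a rollup along a dynamic hierarchy, followed by numbering of the grouped results, can be sketched in Python. This is not the proposed XQuery syntax. It assumes each item carries a category path of varying depth, such as one read from nested XML elements. Every amount contributes to each prefix of its path, and the empty prefix is the grand total. The function names are hypothetical.

```python
from collections import defaultdict


def rollup(items):
    """Sum each amount into every prefix of its category path, including () for the total."""
    totals = defaultdict(float)
    for path, amount in items:
        for depth in range(len(path) + 1):
            totals[tuple(path[:depth])] += amount
    return dict(totals)


def number_groups(totals):
    """Attach a 1-based position to each group, ordered by descending total."""
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(pos, group, total) for pos, (group, total) in enumerate(ordered, start=1)]


# Paths of different depths: the hierarchy is not fixed in advance.
sales = [
    (("electronics", "phones"), 100.0),
    (("electronics", "tv", "oled"), 50.0),
    (("books",), 30.0),
]
totals = rollup(sales)
ranked = number_groups(totals)
```

Because grouping keys are whole path prefixes, no fixed set of hierarchy levels has to be declared. This mirrors the flexibility the abstract attributes to the XML data model.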

conference on information and knowledge management | 2004

Virtual cursors for XML joins

Beverly Yang; Marcus Fontoura; Eugene J. Shekita; Sridhar Rajagopalan; Kevin S. Beyer

Structural joins are a fundamental operation in XML query processing and a large body of work has focused on index-based algorithms for executing them. In this paper, we describe how two well-known index features -- path indices and ancestor information -- can be combined in a novel way to replace one or more of the physical index cursors in a structural join with virtual cursors. The position of a virtual cursor is derived from the path and ancestor information of a physical cursor. Implementation results are provided to show that, by eliminating index I/O, virtual cursors can improve the performance of structural joins by an order of magnitude or more.
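The idea of deriving one cursor's positions from another's path and ancestor information can be illustrated under an assumed encoding. Each index entry for a descendant node stores a Dewey-style ID (a tuple whose prefixes identify its ancestors) together with its root-to-node tag path. An `a//d` join can then compute the matching `a` ancestors from the `d` entries alone, with no scan of an `a` index. The encoding and function names are illustrative assumptions, not the paper's index format.

```python
def virtual_ancestor_join(d_entries, ancestor_tag):
    """Join ancestor_tag//d using only the descendant index entries.

    Each entry is (dewey_id, tag_path), where tag_path[i] is the tag of the
    node whose ID is dewey_id[:i + 1]. Ancestor IDs are derived, not looked up.
    """
    pairs = []
    for dewey, tags in d_entries:
        for i, tag in enumerate(tags[:-1]):  # proper ancestors only
            if tag == ancestor_tag:
                pairs.append((dewey[: i + 1], dewey))
    return pairs


def physical_join(a_ids, d_entries):
    """Baseline: read ancestor IDs from a separate list and test containment."""
    return [(a, d) for d, _ in d_entries for a in a_ids
            if len(a) < len(d) and d[: len(a)] == a]


# Document <r><a><b><d/></b></a><d/><a><d/></a></r> with Dewey IDs.
d_entries = [
    ((1, 1, 1, 1), ("r", "a", "b", "d")),
    ((1, 2), ("r", "d")),
    ((1, 3, 1), ("r", "a", "d")),
]
a_ids = [(1, 1), (1, 3)]
virtual = virtual_ancestor_join(d_entries, "a")
physical = physical_join(a_ids, d_entries)
```

Both joins return the same pairs, but the virtual version never touches `a_ids`. That skipped index access is the I/O the abstract reports eliminating.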

international conference on management of data | 2005

DB2/XML: designing for evolution

Kevin S. Beyer; Fatma Ozcan; Sundar Saiprasad; Bert Van der Linden

DB2 provides native XML storage, indexing, navigation and query processing through both SQL/XML and XQuery using the XML data type introduced by SQL/XML. In this tutorial we focus on DB2's XML support for schema evolution, especially DB2's schema repository and document-level validation.

international conference on management of data | 2011

Emerging trends in the enterprise data analytics: connecting Hadoop and DB2 warehouse

Fatma Ozcan; David Hoa; Kevin S. Beyer; Andrey Balmin; Chuan Jie Liu; Yu Li

Enterprises are dealing with ever-increasing volumes of data, reaching into the petabyte scale. With many of our customer engagements, we are observing an emerging trend: they are using Hadoop-based solutions in conjunction with their data warehouses. They are using Hadoop to deal with the data volume, as well as the lack of strict structure in their data, to conduct various analyses, including but not limited to Web log analysis, sophisticated data mining, machine learning and model building. This first stage of the analysis is off-line and suitable for Hadoop. But once their data is summarized or cleansed enough, and their models are built, they are loading the results into a warehouse for interactive querying and report generation. At this later stage, they leverage the wealth of business intelligence tools, which they are accustomed to, that exist for warehouses. In this paper, we outline this use case and discuss the bidirectional connectors we developed between IBM DB2 and IBM InfoSphere BigInsights.
