Todd Eavis | Researchain

Publication

Featured research published by Todd Eavis.

Distributed and Parallel Databases | 2006

The cgmCUBE project: Optimizing parallel data cube generation for ROLAP

Frank K. H. A. Dehne; Todd Eavis; Andrew Rau-Chaplin

On-line Analytical Processing (OLAP) has become one of the most powerful and prominent technologies for knowledge discovery in VLDB (Very Large Database) environments. Central to the OLAP paradigm is the data cube, a multi-dimensional hierarchy of aggregate values that provides a rich analytical model for decision support. Various sequential algorithms for the efficient generation of the data cube have appeared in the literature. However, given the size of contemporary data warehousing repositories, multi-processor solutions are crucial for the massive computational demands of current and future OLAP systems. In this paper we discuss the cgmCUBE Project, a multi-year effort to design and implement a multi-processor platform for data cube generation that targets the relational database model (ROLAP). More specifically, we discuss new algorithmic and system optimizations relating to (1) a thorough optimization of the underlying sequential cube construction method and (2) a detailed and carefully engineered cost model for improved parallel load balancing and faster sequential cube construction. These optimizations were key in allowing us to build a prototype that is able to produce data cube output at a rate of over one TeraByte per hour.
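
To make the term "data cube" concrete, the following is a minimal sketch of a full cube: one aggregate table per subset of the dimensions, so d dimensions yield 2^d group-bys. It is a naive illustration only and is not the cgmCUBE method; the function name `full_cube`, the sample rows, and the choice of SUM as the aggregate are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def full_cube(rows, dims, measure):
    """Naively compute every group-by of `dims`, summing `measure`.

    Returns {group_by_dims: {key_tuple: total}}. Hypothetical helper for
    illustration; it rescans the input once per group-by.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube


# Made-up sample data.
rows = [
    {"product": "pen", "region": "east", "sales": 3},
    {"product": "pen", "region": "west", "sales": 5},
    {"product": "ink", "region": "east", "sales": 2},
]
cube = full_cube(rows, ("product", "region"), "sales")
```

Each of the 2^d group-bys here is computed from the raw rows. Avoiding that repeated work, and spreading it across processors, is the kind of cost the abstract refers to.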

Conference on Information and Knowledge Management | 2007

Rk-hist: an r-tree based histogram for multi-dimensional selectivity estimation

Todd Eavis; Alex Lopez

Database query engines typically rely upon query size estimators in order to evaluate the potential cost of alternate query plans. In multi-dimensional database systems, such as those typically found in large data warehousing environments, these selectivity estimators often take the form of multi-dimensional histograms. But while single dimensional histograms have proven to be quite accurate, even in the presence of data skew, the multi-dimensional variations have generally been far less reliable. In this paper, we present a new histogram model that is based upon an r-tree space partitioning. The localization of the r-tree boxes is in turn controlled by a Hilbert space filling curve, while a series of efficient area equalization heuristics restructures the initial boxes to provide improved bucket representation. Experimental results demonstrate significantly improved estimation accuracy relative to state of the art alternatives, as well as superior consistency across a variety of record distributions.
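
As a rough illustration of how a multi-dimensional histogram answers a selectivity query, the sketch below estimates the number of records in a 2-D range from a list of rectangular buckets, assuming records are spread uniformly inside each bucket. The uniformity assumption, the bucket layout, and the names `overlap` and `estimate_count` are all assumptions for this example; it does not reproduce the r-tree partitioning or the equalization heuristics of the paper.

```python
def overlap(lo1, hi1, lo2, hi2):
    """Length of the overlap between intervals [lo1, hi1] and [lo2, hi2]."""
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))


def estimate_count(buckets, query):
    """Estimate how many records fall inside the rectangle `query`.

    buckets: list of ((x0, y0, x1, y1), record_count)
    query:   (x0, y0, x1, y1)
    Each bucket contributes its count scaled by the fraction of its area
    that the query covers (uniform-spread assumption).
    """
    qx0, qy0, qx1, qy1 = query
    total = 0.0
    for (x0, y0, x1, y1), count in buckets:
        area = (x1 - x0) * (y1 - y0)
        if area == 0:
            continue  # degenerate bucket; skipped in this sketch
        covered = overlap(x0, x1, qx0, qx1) * overlap(y0, y1, qy0, qy1)
        total += count * covered / area
    return total


# Made-up buckets: two adjacent 10 x 10 boxes with different densities.
buckets = [((0, 0, 10, 10), 100), ((10, 0, 20, 10), 40)]
estimate = estimate_count(buckets, (5, 0, 15, 10))
```

The estimate is only as good as the uniformity assumption inside each box, which is why the shape and placement of the buckets matter for accuracy.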

Distributed and Parallel Databases | 2008

PnP: sequential, external memory, and parallel iceberg cube computation

Ying Chen; Frank K. H. A. Dehne; Todd Eavis; Andrew Rau-Chaplin

We present “Pipe ’n Prune” (PnP), a new hybrid method for iceberg-cube query computation. The novelty of our method is that it achieves a tight integration of top-down piping for data aggregation with bottom-up a priori data pruning. A particular strength of PnP is that it is efficient for all of the following scenarios: (1) Sequential iceberg-cube queries, (2) External memory iceberg-cube queries, and (3) Parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks.

We performed an extensive performance analysis of PnP for the above scenarios with the following main results: In the first scenario PnP performs very well for both dense and sparse data sets, providing an interesting alternative to BUC and Star-Cubing. In the second scenario PnP shows a surprisingly efficient handling of disk I/O, with an external memory running time that is less than twice the running time for full in-memory computation of the same iceberg-cube query. In the third scenario PnP scales very well, providing near linear speedup for a larger number of processors and thereby solving the scalability problem observed for the parallel iceberg-cubes proposed by Ng et al.
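
For readers unfamiliar with iceberg cubes, the sketch below shows the bottom-up a priori pruning idea in isolation: a cell is kept only if at least `min_support` rows match it, and a cell that fails the threshold is never refined further, because no more specific cell can have more rows. This is a simplified BUC-style recursion written for illustration, not the PnP algorithm; it omits the top-down piping that PnP combines with pruning, and the name `iceberg_cube` and the sample rows are assumptions.

```python
def iceberg_cube(rows, dims, min_support):
    """Return {cell: row_count} for every cell with at least min_support rows.

    A cell is a tuple of (dimension, value) pairs; the empty tuple is the
    grand total. Hypothetical helper, in-memory only.
    """
    result = {}

    def expand(partition, start, cell):
        # `partition` holds exactly the rows that match `cell`.
        result[cell] = len(partition)
        for i in range(start, len(dims)):
            groups = {}
            for row in partition:
                groups.setdefault(row[dims[i]], []).append(row)
            for value, group in groups.items():
                # A priori pruning: skip partitions below the threshold,
                # since none of their refinements can reach it either.
                if len(group) >= min_support:
                    expand(group, i + 1, cell + ((dims[i], value),))

    if len(rows) >= min_support:
        expand(rows, 0, ())
    return result


# Made-up sample data.
rows = [
    {"a": 1, "b": "x"},
    {"a": 1, "b": "x"},
    {"a": 1, "b": "y"},
    {"a": 2, "b": "x"},
]
cells = iceberg_cube(rows, ("a", "b"), min_support=2)
```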

Conference on Information and Knowledge Management | 2007

Mapgraph: efficient methods for complex OLAP hierarchies

Todd Eavis; Ahmad Taleb

Online Analytical Processing is a database paradigm that provides for the rich analysis of multi-dimensional data. OLAP is often supported by a logical structure known as the Cube. However, supporting efficient OLAP query resolution in enterprise scale environments is an issue of considerable complexity. In practice, the difficulty of the problem is exacerbated by the existence of dimension hierarchies that sub-divide core dimensions into aggregation layers of varying granularity. Common hierarchy-sensitive query operations such as Rollup and Drilldown can be very costly on large cubes. Moreover, facilities for the representation of more complex hierarchical relationships are not well supported by conventional techniques. This paper presents a robust hierarchy infrastructure called mapGraph that supports the efficient and transparent manipulation of attribute hierarchies within OLAP environments. Experimental results verify that, when compared to the alternatives, very little additional overhead is introduced, even when advanced functionality is exploited.

Database Systems for Advanced Applications | 2009

Multi-level Frequent Pattern Mining

Todd Eavis; Xi Zheng

Frequent pattern mining (FPM) has become one of the most popular data mining approaches for the analysis of purchasing patterns. Methods such as Apriori and FP-growth have been shown to work efficiently in this setting. However, these techniques are typically restricted to a single concept level. Since typical business databases support hierarchies that represent the relationships amongst many different concept levels, it is important that we extend our focus to discover frequent patterns in multi-level environments. Unfortunately, little attention has been paid to this research area. In this paper, we present two novel algorithms that efficiently discover multi-level frequent patterns. Adopting either a top-down or bottom-up approach, our algorithms exploit existing fp-tree structures, rather than excessively scanning the raw data set multiple times, as might be done with a naive implementation. In addition, we also introduce an algorithm to mine cross-level frequent patterns. Experimental results have shown that our new algorithms maintain their performance advantage across a broad spectrum of test environments.
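
The motivation for multi-level mining can be shown with a small, deliberately naive sketch: items that are infrequent at the leaf level may become frequent once they are generalized to their parent category. The code below rescans the transactions at each level, which is exactly the cost the abstract says the proposed algorithms avoid by reusing fp-tree structures, so it should be read as a baseline illustration rather than the authors' method. The hierarchy, the transactions, and the helper names are made up for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset up to max_size and keep those meeting min_support."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def generalize(transactions, parent):
    """Replace each item with its parent category (one level up)."""
    return [{parent[item] for item in transaction} for transaction in transactions]


# Made-up two-level hierarchy and transactions.
parent = {
    "skim milk": "milk",
    "whole milk": "milk",
    "rye bread": "bread",
    "white bread": "bread",
}
transactions = [
    {"skim milk", "rye bread"},
    {"whole milk", "white bread"},
    {"skim milk", "white bread"},
]
leaf_level = frequent_itemsets(transactions, min_support=3)
category_level = frequent_itemsets(generalize(transactions, parent), min_support=3)
```

With a support threshold of 3, no leaf-level itemset qualifies, while the category-level pattern (bread, milk) appears in every transaction.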

Data Warehousing and Knowledge Discovery | 2007

A Hilbert space compression architecture for data warehouse environments

Todd Eavis; David Cueva

Multi-dimensional data sets are very common in areas such as data warehousing and statistical databases. In these environments, core tables often grow to enormous sizes. In order to reduce storage requirements, and therefore to permit the retention of even larger data sets, compression methods are an attractive option. In this paper we discuss an efficient compression framework that is specifically designed for very large relational database implementations. The primary methods exploit a Hilbert space filling curve to dramatically reduce the storage footprint for the underlying tables. Tuples are individually compressed into page sized units so that only blocks relevant to the user's multidimensional query need be accessed. Compression is available not only for the relational tables themselves, but also for the associated r-tree indexes. Experimental results demonstrate compression rates of more than 90% for multi-dimensional data, and up to 98% for the indexes.
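
One way a Hilbert space filling curve can help compression is sketched below: each 2-D point is mapped to its position along the curve, the positions are sorted, and only the gaps between consecutive positions are stored. Because the curve keeps nearby points close together in the ordering, the gaps tend to be small numbers. The index computation is the standard iterative conversion for a square grid whose side is a power of two; the delta-encoding step and the function names are assumptions for illustration, since the abstract does not describe the actual encoding used.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def delta_encode(points, n):
    """Sort points by Hilbert position; return the first position and the gaps."""
    positions = sorted(hilbert_index(n, x, y) for x, y in points)
    return [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]


# Made-up example: the four cells of a 2 x 2 grid.
encoded = delta_encode([(1, 0), (0, 0), (1, 1), (0, 1)], n=2)
```

A real system would then pack the small gaps into as few bits as possible; that step is left out here.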

International Conference on Data Engineering | 2006

cgmOLAP: Efficient Parallel Generation and Querying of Terabyte Size ROLAP Data Cubes

Ying Chen; Andrew Rau-Chaplin; Frank K. H. A. Dehne; Todd Eavis; D. Green; E. Sithirasenan

We present the cgmOLAP server, the first fully functional parallel OLAP system able to build data cubes at a rate of more than 1 Terabyte per hour. cgmOLAP incorporates a variety of novel approaches for the parallel computation of full cubes, partial cubes, and iceberg cubes as well as new parallel cube indexing schemes. The cgmOLAP system consists of an application interface, a parallel query engine, a parallel cube materialization engine, meta data and cost model repositories, and shared server components that provide uniform management of I/O, memory, communications, and disk resources.

Data Warehousing and OLAP | 2005

Parallel querying of ROLAP cubes in the presence of hierarchies

Frank K. H. A. Dehne; Todd Eavis; Andrew Rau-Chaplin

Online Analytical Processing is a powerful framework for the analysis of organizational data. OLAP is often supported by a logical structure known as a data cube, a multidimensional data model that offers an intuitive array-based perspective of the underlying data. Supporting efficient indexing facilities for multi-dimensional cube queries is an issue of some complexity. In practice, the difficulty of the indexing problem is exacerbated by the existence of attribute hierarchies that sub-divide attributes into aggregation layers of varying granularity. In this paper, we present a hierarchy and caching framework that supports the efficient and transparent manipulation of attribute hierarchies within a parallel ROLAP environment. Experimental results verify that, when compared to the non-hierarchical case, very little overhead is required to handle streams of arbitrary hierarchical queries.

Future Generation Computer Systems | 2010

Parallel OLAP with the Sidera server

Todd Eavis; George Dimitrov; Ivan Dimitrov; David Cueva; Alex Lopez; Ahmad Taleb

Online Analytical Processing (OLAP) has become a primary component of today's pervasive Decision Support systems. As the underlying databases grow into the multi-terabyte range, however, single CPU OLAP servers are being stretched beyond their limits. In this paper, we present a comprehensive model for a fully parallelized OLAP server. Our multi-node platform actually consists of a series of largely independent sibling servers that are glued together with a lightweight MPI-based Parallel Service Interface (PSI). Physically, we target the commodity-oriented, shared-nothing Linux cluster, an architecture that provides an extremely cost-effective alternative to the shared-everything commercial platforms often used in high-end database environments. Experimental results demonstrate both the viability and robustness of the design.

International Journal of Data Warehousing and Mining | 2006

Improved Data Partitioning for Building Large ROLAP Data Cubes in Parallel

Ying Chen; Frank K. H. A. Dehne; Todd Eavis; Andrew Rau-Chaplin

This paper presents an improved parallel method for generating ROLAP data cubes on a shared-nothing multiprocessor based on a novel optimized data partitioning technique. Since no shared disk is required, our method can be used for highly scalable processor clusters consisting of standard PCs with local disks only, connected via a data switch. Experiments show that our improved parallel method provides optimal, linear speedup for at least 32 processors. The approach taken, which uses a ROLAP representation of the data cube, is well suited for large data warehouses and high dimensional data, and supports the generation of both fully materialized and partially materialized data cubes.
