Dengfeng Gao
University of Arizona
Publication
Featured research published by Dengfeng Gao.
very large data bases | 2003
Dengfeng Gao; Richard T. Snodgrass
As with relational data, XML data changes over time with the creation, modification, and deletion of XML documents. Expressing queries on time-varying (relational or XML) data is more difficult than writing queries on nontemporal data. In this paper, we present a temporal XML query language, τXQuery, in which we add valid time support to XQuery by minimally extending the syntax and semantics of XQuery. We adopt a stratum approach which maps a τXQuery query to a conventional XQuery. The paper focuses on how to perform this mapping, in particular, on mapping sequenced queries, which are by far the most challenging. The critical issue of supporting sequenced queries (in any query language) is time-slicing the input data while retaining period timestamping. Timestamps are distributed throughout an XML document, rather than uniformly in tuples, complicating the temporal slicing while also providing opportunities for optimization. We propose four optimizations of our initial maximally-fragmented time-slicing approach: selected node slicing, copy-based per-expression slicing, in-place per-expression slicing, and idiomatic slicing, each of which reduces the number of constant periods over which the query is evaluated. While performance tradeoffs clearly depend on the underlying XQuery engine, we argue that there are queries that favor each of the five approaches.
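As a rough illustration of the "constant periods" mentioned in this abstract, the sketch below splits the timeline at every period boundary and keeps only the slices covered by at least one input period. It is a minimal Python sketch of the general idea, not the paper's algorithm: the function name `constant_periods` and the half-open integer `(begin, end)` representation are assumptions made for the example, and real τXQuery timestamps are attached to XML nodes rather than held in a flat list.

```python
from typing import List, Tuple

# Assumption: a period is a half-open interval [begin, end) over integer time.
Period = Tuple[int, int]


def constant_periods(periods: List[Period]) -> List[Period]:
    """Return the maximal slices during which the set of valid periods is constant.

    Hypothetical helper: every begin/end point is treated as a boundary, so
    no input period starts or stops inside a returned slice.
    """
    # Collect every distinct boundary point, in time order.
    points = sorted({t for begin, end in periods for t in (begin, end)})
    # Adjacent boundary points delimit candidate slices.
    candidates = zip(points, points[1:])
    # Drop gaps in which no input period is valid.
    return [
        (b, e)
        for b, e in candidates
        if any(pb <= b and e <= pe for pb, pe in periods)
    ]


if __name__ == "__main__":
    print(constant_periods([(1, 5), (3, 8), (10, 12)]))
```

A sequenced query would then be evaluated once per returned slice; the optimizations listed in the abstract aim to reduce how many such slices are needed.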
very large data bases | 2005
Dengfeng Gao; Søren Kejser Jensen; Richard T. Snodgrass; D. Soo
Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the evaluation of joins with equality predicates rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally varying data dramatically increases the size of a database. These factors indicate that specialized techniques are needed to efficiently evaluate temporal joins. We address this need for efficient join evaluation in temporal databases. Our purpose is twofold. We first survey all previously proposed temporal join operators. While many temporal join operators have been defined in previous work, this work has been done largely in isolation from competing proposals, with little, if any, comparison of the various operators. We then address evaluation algorithms, comparing the applicability of various algorithms to the temporal join operators and describing a performance study involving algorithms for one important operator, the temporal equijoin. Our focus, with respect to implementation, is on non-index-based join algorithms. Such algorithms do not rely on auxiliary access paths but may exploit sort orderings to achieve efficiency.
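To make the temporal equijoin concrete, the sketch below pairs tuples whose join keys are equal and whose valid-time periods overlap, timestamping each result with the intersection of the two periods. It is a naive nested-loop sketch in Python that only illustrates the operator's semantics; it is not one of the algorithms evaluated in the paper. The function name `temporal_equijoin` and the `(key, value, begin, end)` tuple layout with half-open periods are assumptions made for the example.

```python
from typing import List, Tuple

# Assumption: each row is (key, value, begin, end) with a half-open
# valid-time period [begin, end) over integer time.
Row = Tuple[str, str, int, int]
Joined = Tuple[str, str, str, int, int]


def temporal_equijoin(left: List[Row], right: List[Row]) -> List[Joined]:
    """Join rows with equal keys whose valid-time periods overlap.

    Hypothetical helper: the result period is the intersection of the
    two input periods.
    """
    result: List[Joined] = []
    for l_key, l_val, l_begin, l_end in left:
        for r_key, r_val, r_begin, r_end in right:
            if l_key != r_key:
                continue  # equality predicate on the join key
            begin = max(l_begin, r_begin)
            end = min(l_end, r_end)
            if begin < end:  # inequality predicates: the periods overlap
                result.append((l_key, l_val, r_val, begin, end))
    return result


if __name__ == "__main__":
    employees = [("a", "x", 1, 5)]
    departments = [("a", "y", 3, 8), ("a", "z", 5, 9), ("b", "w", 1, 5)]
    print(temporal_equijoin(employees, departments))
```

The overlap test is what introduces the inequality predicates that the abstract identifies as a poor fit for conventional equality-oriented join techniques.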
international conference on management of data | 2002
Wei Li; Dengfeng Gao; Richard T. Snodgrass
Joins are among the most frequently executed operations. Several fast join algorithms have been developed and extensively studied; these can be categorized as sort-merge, hash-based, and index-based algorithms. While all three types of algorithms exhibit excellent performance over most data, ameliorating the performance degradation in the presence of skew has been investigated only for hash-based algorithms. However, for sort-merge join, even a small amount of skew present in realistic data can result in a significant performance hit on a commercial DBMS. This paper examines the negative ramifications of skew in sort-merge join and proposes several refinements that deal effectively with data skew. Experiments show that some of these algorithms also impose virtually no penalty in the absence of data skew and are thus suitable for replacing existing sort-merge implementations. We also show how sort-merge band join performance is significantly enhanced with these refinements.
international conference on data engineering | 2007
Haifeng Jiang; Dengfeng Gao; Wen-Syan Li
Many large enterprises require access to distributed data warehouses for business intelligence (BI) applications. Typically distributed data warehouses are integrated into a centralized data warehouse for the benefit of easy maintenance. However, this approach needs to overcome the complexity of data loading and job scheduling as well as scalability issues. On the other hand, the approach of a fully federated system may not be feasible for data intensive BI applications. The hybrid approach via intelligent data placement is more flexible and applicable than the centralized or full-federation configuration. The current implementation of the hybrid approach to integrating distributed data warehouses is to aggregate selected data from various remote sources as materialized views and cache them at the federation server to improve the performance of complex BI query workloads. In this paper, we propose an improvement that recommends materialized query tables (MQTs) for backend servers for the benefits of load distribution and easy maintenance of aggregated data in conjunction with the current hybrid approach of data placement. Our approach considers the correlation between backend servers and recommends MQTs that are well coordinated among the backend servers and optimized for a given workload. We also exploit the parallelism property among the backend servers to make our approach run almost linearly (in contrast to exponentially) with respect to the number of backend servers, without sacrificing its recommendation quality. Experimental evaluations validate the effectiveness and efficiency of our approach.
Distributed and Parallel Databases | 2004
Dengfeng Gao; Jose Alvin G. Gendrano; Bongki Moon; Richard T. Snodgrass; Minseok Park; Bruce C. Huang; Jim M. Rodrigue
The ability to model the temporal dimension is essential to many applications. Furthermore, the rate of increase in database size and stringency of response time requirements has out-paced advancements in processor and mass storage technology, leading to the need for parallel temporal database management systems. In this paper, we introduce a variety of parallel temporal aggregation algorithms for the shared-nothing architecture; these algorithms are based on the sequential Aggregation Tree algorithm. We are particularly interested in developing parallel algorithms that can maximally exploit available memory to quickly compute large-scale temporal aggregates without intermediate disk writes and reads. Via an empirical study, we found that the number of processing nodes, the partitioning of the data, the placement of results, and the degree of data reduction effected by the aggregation impacted the performance of the algorithms. For distributed result placement, we discovered that Greedy Time Division Merge was the obvious choice. For centralized results and high data reduction, Pairwise Merge was preferred for a large number of processing nodes; for low data reduction, it only performed well up to 32 nodes. This led us to a centralized variant of Greedy Time Division Merge which was best for the remaining cases. We present a cost model that closely predicts the running time of Greedy Time Division Merge.
data and knowledge engineering | 2008
Haifeng Jiang; Dengfeng Gao; Wen-Syan Li
Many large enterprises require access to distributed databases for business intelligence (BI) applications. Typically, distributed databases are integrated into a centralized data warehouse for the benefit of easy maintenance. However, this approach needs to overcome the complexity of data loading and job scheduling as well as scalability issues. On the other hand, the approach of a fully federated system may not be feasible for data-intensive BI applications. The hybrid approach via intelligent data placement is more flexible and applicable than centralized or full-federation configurations. The current implementation of the hybrid approach to integrating distributed databases is to aggregate selected data from various remote sources as materialized views and cache them at the federation server to improve the performance of complex BI query workloads. In this paper, we propose an improvement that recommends Materialized Query Tables (MQTs) for backend servers for the benefits of load distribution and easy maintenance of aggregated data in conjunction with the current hybrid approach of data placement. Our approach considers the correlation between backend servers and recommends MQTs that are well coordinated among the backend servers and optimized for the workload. We also exploit the parallelism property among the backend servers to make our approach run almost linearly (in contrast to exponentially) with respect to the number of backend servers, without sacrificing its recommendation quality. Experimental evaluations validate the effectiveness and efficiency of our approach.
international conference on data engineering | 2012
Richard T. Snodgrass; Dengfeng Gao; Rui Zhang; Stephen W. Thomas
We show how to extend temporal support of SQL to the Turing-complete portion of SQL, that of persistent stored modules (PSM). Our approach requires minor new syntax beyond that already in SQL/Temporal to define and to invoke PSM routines, thereby extending the current, sequenced, and non-sequenced semantics of queries to PSM routines. Temporal upward compatibility (existing applications work as before when one or more tables are rendered temporal) is ensured. We provide a transformation that converts Temporal SQL/PSM to conventional SQL/PSM. To support sequenced evaluation of PSM routines, we define two different slicing approaches, maximal slicing and per-statement slicing. We compare these approaches empirically using a comprehensive benchmark and provide a heuristic for choosing between them.
international conference on management of data | 2003
Ihab F. Ilyas; Jun Rao; Guy M. Lohman; Dengfeng Gao; Eileen Tien Lin
Archive | 2008
Dengfeng Gao; Haifeng Jiang; Wen-Syan Li
Archive | 2003
Dengfeng Gao; Richard T. Snodgrass