Donghui Zhang

University of California, Riverside

Publication

Featured research published by Donghui Zhang.

very large data bases | 2002

Efficient structural joins on indexed XML documents

Shu-Yao Chien; Zografoula Vagena; Donghui Zhang; Vassilis J. Tsotras; Carlo Zaniolo

Queries on XML documents typically combine selections on element contents, and, via path expressions, the structural relationships between tagged elements. Structural joins are used to find all pairs of elements satisfying the primitive structural relationships specified in the query, namely, parent-child and ancestor-descendant relationships. Efficient support for structural joins is thus the key to efficient implementations of XML queries. Recently proposed node numbering schemes enable the capturing of the XML document structure using traditional indices (such as B+-trees or R-trees). This paper proposes efficient structural join algorithms in the presence of tag indices. We first concentrate on using B+-trees and show how to expedite a structural join by avoiding collections of elements that do not participate in the join. We then introduce an enhancement (based on sibling pointers) that further improves performance. Such sibling pointers are easily implemented and dynamically maintainable. We also present a structural join algorithm that utilizes R-trees. An extensive experimental comparison shows that the B+-tree structural joins are more robust. Furthermore, they provide drastic improvements over the current state of the art.
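
To make the idea of skipping non-participating elements concrete, here is a minimal sketch of an ancestor-descendant join. It is not the paper's algorithm: it assumes a (start, end) region numbering in which a descendant's range lies strictly inside its ancestor's range, and it uses a sorted Python list with binary search as a stand-in for a B+-tree. The function name `structural_join` is hypothetical.

```python
from bisect import bisect_right

def structural_join(ancestors, descendants):
    """Return all (ancestor, descendant) pairs.

    Both inputs are lists of (start, end) tuples sorted by start.
    Assumes properly nested regions, as produced by numbering the
    nodes of a single XML tree.
    """
    starts = [d[0] for d in descendants]
    pairs = []
    for a_start, a_end in ancestors:
        # Jump straight to the first descendant that starts after the
        # ancestor opens, skipping everything before it.
        i = bisect_right(starts, a_start)
        # With proper nesting, any element starting before a_end lies
        # entirely inside the ancestor.
        while i < len(descendants) and descendants[i][0] < a_end:
            pairs.append(((a_start, a_end), descendants[i]))
            i += 1
    return pairs

ancestors = [(1, 10), (11, 20)]
descendants = [(2, 3), (4, 5), (12, 13), (21, 22)]
result = structural_join(ancestors, descendants)
```

The binary search plays the role of an index probe: descendants outside an ancestor's range are never visited for that ancestor.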

web information systems engineering | 2001

Storing and querying multiversion XML documents using durable node numbers

Shu-Yao Chien; Vassilis J. Tsotras; Carlo Zaniolo; Donghui Zhang

Managing multiple versions of XML documents represents an important problem for many traditional applications, such as software configuration control, as well as new ones, such as link permanence of web documents. Research on managing multiversion XML documents seeks to provide efficient and robust techniques for storing, retrieving and querying such documents. In this paper we present a novel approach to version management that achieves these objectives by a scheme based on Durable Node Numbers and timestamps for the elements of XML documents. We first present efficient storage and retrieval techniques for multiversion documents. Then, we explore the indexing and clustering strategies needed to assure efficient support for complex queries on content and on document evolution.

symposium on principles of database systems | 2001

Efficient computation of temporal aggregates with range predicates

Donghui Zhang; Alexander Markowetz; Vassilis J. Tsotras; Dimitrios Gunopulos; Bernhard Seeger

A temporal aggregation query is an important but costly operation for applications that maintain time-evolving data (data warehouses, temporal databases, etc.). Due to the large volume of such data, performance improvements for temporal aggregation queries are critical. In this paper we examine techniques to compute temporal aggregates that include key-range predicates (range temporal aggregates). In particular we concentrate on SUM, COUNT and AVG aggregates. This problem is novel; to handle arbitrary key ranges, previous methods would need to keep a separate index for every possible key range. We propose an approach based on a new index structure called the Multiversion SB-Tree, which incorporates features from both the SB-Tree and the Multiversion B-Tree, to handle arbitrary key-range temporal SUM, COUNT and AVG queries. We analyze the performance of our approach and present experimental results that show its efficiency.

symposium on principles of database systems | 2002

Efficient aggregation over objects with extent

Donghui Zhang; Vassilis J. Tsotras; Dimitrios Gunopulos

We examine the problem of efficiently computing sum/count/avg aggregates over objects with non-zero extent. Recent work on computing multi-dimensional aggregates has concentrated on objects with zero extent (points) on a multi-dimensional grid, or one-dimensional intervals. However, in many spatial and/or spatio-temporal applications objects have extent in various dimensions, while they can be located anywhere in the application space. The aggregation predicate is typically described by a multi-dimensional box (box-sum aggregation). We examine two variations of the problem. In the simple case an object's value contributes to the aggregation result as a whole as long as the object intersects the query box. More complex is the functional box-sum aggregation introduced in this paper, where objects participate in the aggregation proportionally to the size of their intersection with the query box. We first show that both problems can be reduced to dominance-sum queries. Traditionally dominance-sum queries are addressed in main memory by a static structure, the ECDF-tree. We then propose two extensions, namely the ECDF-B-trees, that make this structure disk-based and dynamic. Finally, we introduce the BA-tree that combines the advantages from each ECDF-B-tree. We run experiments comparing the performance of the ECDF-B-trees, the BA-tree and a traditional R*-tree (which has been augmented to include aggregation information on its index nodes) over spatial datasets. Our evaluation reaffirms that the BA-tree has more robust performance. Compared against the augmented R*-tree, the BA-tree offers drastic improvement in query performance at the expense of some limited extra space.
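
The reduction to dominance-sums is easiest to see in one dimension. The sketch below is an illustrative simplification, not the paper's construction: for closed intervals, the sum over objects intersecting a query interval equals the total minus the objects lying entirely to the left and entirely to the right, and each of those two terms is a one-dimensional dominance-sum answered from a sorted array with prefix sums. The class name `IntervalSum1D` is hypothetical, and the structure is static.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

class IntervalSum1D:
    """Static 1-D 'simple box-sum' over closed intervals (lo, hi, value)."""

    def __init__(self, objects):
        objects = list(objects)
        self.total = sum(v for _, _, v in objects)
        by_hi = sorted((hi, v) for _, hi, v in objects)
        by_lo = sorted((lo, v) for lo, _, v in objects)
        self.his = [h for h, _ in by_hi]
        self.hi_prefix = [0] + list(accumulate(v for _, v in by_hi))
        self.los = [l for l, _ in by_lo]
        self.lo_prefix = [0] + list(accumulate(v for _, v in by_lo))

    def query(self, q_lo, q_hi):
        # Dominance-sum 1: objects ending strictly before the query starts.
        left = self.hi_prefix[bisect_left(self.his, q_lo)]
        # Dominance-sum 2: objects starting strictly after the query ends.
        k = bisect_right(self.los, q_hi)
        right = self.lo_prefix[-1] - self.lo_prefix[k]
        return self.total - left - right

index = IntervalSum1D([(0, 2, 5), (1, 4, 3), (5, 7, 10), (8, 9, 1)])
answer = index.query(3, 6)  # intersects (1, 4) and (5, 7)
```

Each query costs two binary searches regardless of how many objects intersect the query interval, which is the property the dominance-sum formulation is meant to deliver.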

international conference on data engineering | 2002

Efficient temporal join processing using indices

Donghui Zhang; Vassilis J. Tsotras; Bernhard Seeger

We examine the problem of processing temporal joins in the presence of indexing schemes. Previous work on temporal joins has concentrated on non-indexed relations which were fully scanned. Given the large data volumes created by the ever increasing time dimension, sequential scanning is prohibitive. This is especially true when the temporal join involves only parts of the joining relations (e.g., a given time interval instead of the whole timeline). Utilizing an index then becomes beneficial as it directs the join to the data of interest. We consider temporal join algorithms for three representative indexing schemes, namely a B+-tree, an R*-tree and a temporal index, the Multiversion B+-tree (MVBT). Both the B+-tree and R*-tree result in simple but not efficient join algorithms because neither index achieves good temporal data clustering. Better clustering is maintained by the MVBT through record copying. Nevertheless, copies can greatly affect the correctness and effectiveness of the join algorithms. We identify these problems and propose efficient solutions and optimizations. An extensive comparison of all index-based temporal joins, using a variety of datasets and query characteristics, shows that the MVBT-based join algorithms are consistently faster. In particular the link-based algorithm has the most robust behavior. In our experiments it showed a tenfold improvement over the R*-tree joins while it was between six and thirty times faster than the B+-tree joins.

Information Systems | 2003

Temporal and spatio-temporal aggregations over data streams using multiple time granularities

Donghui Zhang; Dimitrios Gunopulos; Vassilis J. Tsotras; Bernhard Seeger

Temporal and spatio-temporal aggregations are important but costly operations for applications that maintain time-evolving data (data warehouses, temporal databases, etc.). In this paper, we examine the problem of computing such aggregates over data streams. The aggregates are maintained using multiple levels of temporal granularities: older data is aggregated using coarser granularities while more recent data is aggregated with finer detail. We present specialized indexing schemes for dynamically and progressively maintaining temporal and spatio-temporal aggregates. Moreover, these schemes can be parameterized. The levels of granularity as well as their corresponding index sizes (or validity lengths) can be dynamically adjusted. This provides a useful trade-off between aggregation detail and storage space. Analytical and experimental results show the efficiency of the proposed structures. We first address the temporal aggregation problem. A general framework of aggregating at multiple time granularities is then proposed. Finally, we show how to utilize this framework to solve the range-temporal and spatio-temporal aggregation problems.

extending database technology | 2002

Temporal Aggregation over Data Streams Using Multiple Granularities

Donghui Zhang; Dimitrios Gunopulos; Vassilis J. Tsotras; Bernhard Seeger

Temporal aggregation is an important but costly operation for applications that maintain time-evolving data (data warehouses, temporal databases, etc.). In this paper we examine the problem of computing temporal aggregates over data streams. Such aggregates are maintained using multiple levels of temporal granularities: older data is aggregated using coarser granularities while more recent data is aggregated with finer detail. We present specialized indexing schemes for dynamically and progressively maintaining temporal aggregates. Moreover, these schemes can be parameterized. The levels of granularity as well as their corresponding index sizes (or validity lengths) can be dynamically adjusted. This provides a useful trade-off between aggregation detail and storage space. Analytical and experimental results show the efficiency of the proposed structures. Moreover, we discuss how the indexing schemes can be extended to solve the more general range temporal and spatio-temporal aggregation problems.

ACM Transactions on Database Systems | 2008

On computing temporal aggregates with range predicates

Donghui Zhang; Alexander Markowetz; Vassilis J. Tsotras; Dimitrios Gunopulos; Bernhard Seeger

Computing temporal aggregates is an important but costly operation for applications that maintain time-evolving data (data warehouses, temporal databases, etc.). Due to the large volume of such data, performance improvements for temporal aggregate queries are critical. Previous approaches have aggregate predicates that involve only the time dimension. In this article we examine techniques to compute temporal aggregates that include key-range predicates as well (range-temporal aggregates). In particular we concentrate on the SUM aggregate, while COUNT is a special case. To handle arbitrary key ranges, previous methods would need to keep a separate index for every possible key range. We propose an approach based on a new index structure called the Multiversion SB-Tree, which incorporates features from both the SB-Tree and the Multiversion B+-tree, to handle arbitrary key-range temporal aggregate queries. We analyze the performance of our approach and present experimental results that show its efficiency. Furthermore, we address a novel and practical variation called functional range-temporal aggregates. Here, the value of any record is a function over time. The meaning of aggregates is altered such that the contribution of a record to the aggregate result is proportional to the size of the intersection between the record's time interval and the query time interval. Both analytical and experimental results show the efficiency of our result.
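
The difference between the plain and the functional aggregate can be spelled out with a brute-force reference implementation. This is a sketch of the query semantics only, not of the Multiversion SB-Tree. It assumes records of the form (key, start, end, value) with half-open time intervals, a plain query that sums every record whose interval intersects the query interval, and, for the functional case, a value that is a constant rate per unit of time. Both function names are hypothetical.

```python
def range_temporal_sum(records, key_lo, key_hi, t_lo, t_hi):
    """SUM of values of records with key in [key_lo, key_hi] whose
    interval [start, end) intersects the query interval [t_lo, t_hi)."""
    return sum(
        value
        for key, start, end, value in records
        if key_lo <= key <= key_hi and start < t_hi and t_lo < end
    )

def functional_range_temporal_sum(records, key_lo, key_hi, t_lo, t_hi):
    """Like range_temporal_sum, but each record contributes its value
    multiplied by the length of its overlap with the query interval."""
    total = 0
    for key, start, end, value in records:
        if key_lo <= key <= key_hi:
            overlap = min(end, t_hi) - max(start, t_lo)
            if overlap > 0:
                total += value * overlap
    return total

records = [(10, 0, 5, 2), (20, 3, 8, 4), (30, 0, 10, 1)]
plain = range_temporal_sum(records, 10, 20, 4, 6)
functional = functional_range_temporal_sum(records, 10, 20, 4, 6)
```

Both functions scan every record, so they serve only as a correctness baseline against which an index-based method could be checked.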

advances in geographic information systems | 2001

Improving min/max aggregation over spatial objects

Donghui Zhang; Vassilis J. Tsotras

We examine the problem of computing MIN/MAX aggregates over a collection of spatial objects. Each spatial object is associated with a weight (value), for example, the average temperature or rainfall over the area covered by the object. Given a query rectangle, the MIN/MAX problem computes the minimum/maximum weight among all objects intersecting the query rectangle. Traditionally such queries have been performed as range search queries. Assuming that the objects are indexed by a spatial access method, the MIN/MAX is computed as objects are retrieved. This requires effort proportional to the number of objects intersecting the query interval, which may be large. A better approach is to maintain aggregate information among the index nodes of the spatial access method; then various index paths can be eliminated during the range search. In this paper we propose four optimizations that further improve the performance of MIN/MAX queries. Our experiments show that the proposed optimizations offer drastic performance improvement over previous approaches. Moreover, as a by-product of this work we present an optimized version of the MSB-tree, an index that has been proposed for the MIN/MAX computation over 1-dimensional interval objects.
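
The baseline idea of keeping aggregate information in index nodes can be sketched as follows. This is an illustration under simplifying assumptions, not one of the paper's four optimizations and not a full spatial access method: the tree is built by hand, each node stores a bounding rectangle and the minimum weight beneath it, and a node with no children stands for a single object. The names `Node`, `intersects` and `range_min` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Node:
    mbr: Rect                 # bounding rectangle of everything below
    min_weight: float         # minimum weight of everything below
    children: List["Node"] = field(default_factory=list)  # empty = object

def range_min(node: Node, query: Rect, best: float = float("inf")) -> float:
    # Prune: the subtree cannot improve on the best weight found so far,
    # or it does not touch the query rectangle at all.
    if node.min_weight >= best or not intersects(node.mbr, query):
        return best
    if not node.children:     # an object that intersects the query
        return node.min_weight
    for child in node.children:
        best = range_min(child, query, best)
    return best

a = Node((0, 0, 1, 1), 5)
b = Node((2, 2, 3, 3), 2)
c = Node((10, 10, 11, 11), 1)
left = Node((0, 0, 3, 3), 2, [a, b])
right = Node((10, 10, 11, 11), 1, [c])
root = Node((0, 0, 11, 11), 1, [left, right])
result = range_min(root, (0, 0, 4, 4))
```

The stored minimum lets the search abandon whole subtrees once a smaller weight is already known; MAX works symmetrically with a stored maximum.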

ACM Transactions on Internet Technology | 2006

Supporting complex queries on multiversion XML documents

Shu-Yao Chien; Vassilis J. Tsotras; Carlo Zaniolo; Donghui Zhang

Managing multiple versions of XML documents represents a critical requirement for many applications. Recently, there has been much work on supporting complex queries on XML data (e.g., regular path expressions, structural projections, etc.). In this article, we examine the problem of efficiently implementing such complex queries on multiversion XML documents. Our approach relies on a numbering scheme, whereby durable node numbers (DNNs) are used to preserve the order among the nodes of the XML tree while remaining invariant with respect to updates. Using the document's DNNs, we show that many complex queries are reduced to combinations of range version retrieval queries. We thus examine three alternative storage organizations/indexing schemes to efficiently evaluate range version retrieval queries in this environment. A thorough performance analysis is then presented to reveal the advantages of each scheme.