Yue Zhuge | Researchain

Publication

Featured research published by Yue Zhuge.

international conference on parallel and distributed information systems | 1996

The Strobe algorithms for multi-source warehouse consistency

Yue Zhuge; Hector Garcia-Molina; Janet L. Wiener

A warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. The authors identify and discuss three fundamental transaction processing scenarios for data warehousing. They define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Their implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios.
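To make "incremental" maintenance of a multi-source view concrete, here is a minimal Python sketch of a generic delta join. It is not the Strobe algorithms themselves; the relations `r` and `s`, the `Warehouse` class, and the `apply_insert_r` method are hypothetical names chosen for illustration.

```python
from collections import Counter


def join(r, s):
    """Join r(a, b) with s(b, c) on b; both are multisets (Counters) of tuples."""
    out = Counter()
    for (a, b1), n in r.items():
        for (b2, c), m in s.items():
            if b1 == b2:
                out[(a, b1, c)] += n * m
    return out


class Warehouse:
    """Holds a materialized view V = R join S (hypothetical example)."""

    def __init__(self, r, s):
        self.view = join(r, s)

    def apply_insert_r(self, delta_r, s_snapshot):
        # Incremental step: V += delta_R join S, instead of recomputing R join S.
        # Assumption: s_snapshot is S exactly as it was when delta_r occurred.
        self.view += join(delta_r, s_snapshot)


r = Counter({(1, 2): 1})
s = Counter({(2, 3): 1})
warehouse = Warehouse(r, s)

delta_r = Counter({(4, 2): 1})
warehouse.apply_insert_r(delta_r, s)

# The incrementally maintained view matches a full recomputation.
assert warehouse.view == join(r + delta_r, s)
```

The sketch assumes the warehouse sees S exactly as it was when the update to R occurred. With autonomous sources that assumption can fail, because S may change before the warehouse queries it, which is the consistency problem the abstract describes.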

international conference on data engineering | 1998

Graph structured views and their incremental maintenance

Yue Zhuge; Hector Garcia-Molina

We study the problem of maintaining materialized views of graph structured data. The base data consists of records containing identifiers of other records. The data could represent traditional objects (with methods, attributes and a class hierarchy), but it could also represent a lower-level data structure. We define simple views and materialized views for such graph structured data, analyzing options for representing record identity and references in the view. We develop incremental maintenance algorithms for these views.

Distributed and Parallel Databases | 1998

Consistency Algorithms for Multi-Source Warehouse View Maintenance

Yue Zhuge; Hector Garcia-Molina; Janet L. Wiener

A warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. In this paper we identify and discuss three fundamental transaction processing scenarios for data warehousing. We define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Our implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios.

international conference on data engineering | 1997

Multiple view consistency for data warehousing

Yue Zhuge; Janet L. Wiener; Hector Garcia-Molina

A data warehouse stores integrated information from multiple distributed data sources. In effect, the warehouse stores materialized views over the source data. The problem of ensuring data consistency at the warehouse can be divided into two components: ensuring that each view reflects a consistent state of the base data, and ensuring that multiple views are mutually consistent. In this paper we study the latter problem, that of guaranteeing multiple view consistency (MVC). We identify and define formally three layers of consistency for materialized views in a distributed environment. We present a scalable architecture for consistently handling multiple views in a data warehouse, which we have implemented in the WHIPS (WareHousing Information Project at Stanford) prototype. Finally, we develop simple, scalable algorithms for achieving MVC at a warehouse.
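One way to picture mutual consistency across views is to hold back each view's changes until every view has produced its changes for the same source update, then apply them together. The Python sketch below illustrates that idea only; it is not the WHIPS implementation or the paper's algorithms, and the `ViewMerger` class, its `report` method, and the update identifiers are hypothetical.

```python
class ViewMerger:
    """Buffers per-view deltas and commits them together (hypothetical sketch)."""

    def __init__(self, view_names):
        self.views = {name: [] for name in view_names}  # committed rows per view
        self.pending = {}  # update_id -> {view_name: rows to add}

    def report(self, update_id, view_name, rows):
        """Record one view's delta; commit once every view has reported.

        Returns True if this call caused the update to be committed.
        """
        deltas = self.pending.setdefault(update_id, {})
        deltas[view_name] = list(rows)
        if set(deltas) == set(self.views):
            # Every view has a delta for this update: apply them as one step,
            # so readers never see one view updated and another one stale.
            for name, new_rows in deltas.items():
                self.views[name].extend(new_rows)
            del self.pending[update_id]
            return True
        return False


merger = ViewMerger(["sales_by_store", "sales_by_item"])
committed = merger.report(1, "sales_by_store", [("store_7", 10)])
assert committed is False
assert merger.views["sales_by_store"] == []  # held back until all views report

committed = merger.report(1, "sales_by_item", [("apples", 10)])
assert committed is True
```

This sketch deliberately ignores ordering between different source updates and assumes every update affects every view; a real system would need to handle both.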

international conference on management of data | 1997

The WHIPS prototype for data warehouse creation and maintenance

Wilburt Juan Labio; Yue Zhuge; Janet L. Wiener; Himanshu Gupta; Hector Garcia-Molina; Jennifer Widom

A data warehouse is a repository of integrated information from distributed, autonomous, and possibly heterogeneous, sources. In effect, the warehouse stores one or more materialized views of the source data. The data is then readily available to user applications for querying and analysis. Figure 1 shows the basic architecture of a warehouse: data is collected from each source, integrated with data from other sources, and stored at the warehouse. Users then access the data directly from the warehouse. As suggested by Figure 1, there are two major components in a warehouse system: the integration component, responsible for collecting and maintaining the materialized views, and the query and analysis component, responsible for fulfilling the information needs of specific end users. Note that the two components are not independent. For example, which views the integration component materializes depends on the expected needs of end users. Most current commercial warehousing systems (e.g., Redbrick, Sybase, Arbor) focus on the query and analysis component, providing specialized index structures at the warehouse and extensive querying facilities for the end user. In the WHIPS (WareHousing Information Project at Stanford) project, on the other hand, we focus on the integration component. In particular, we have developed an architecture and implemented a prototype for identifying data changes at heterogeneous sources, transforming them and summarizing them in accordance with warehouse specifications, and incrementally integrating them into the warehouse. We propose to demonstrate our prototype at SIGMOD, illustrating the main features of our architecture.
Our architecture is modular and we designed it specifically to fulfill several important and interrelated goals: data sources and warehouse views can be added and removed dynamically; it is scalable by adding more internal modules; changes at the sources are detected automatically; the warehouse may be updated continuously as the sources change, without requiring “down time;” and the warehouse is always kept consistent with the source data by the integration algorithms. More details on these goals and how we achieve them are provided in [WGL+96].

acm symposium on parallel algorithms and architectures | 1998

Distributed and parallel computing issues in data warehousing (abstract)

Hector Garcia-Molina; Wilburt Juan Labio; Janet L. Wiener; Yue Zhuge

A data warehouse is a repository of data that has been extracted and integrated from heterogeneous and autonomous distributed sources. For example, a grocery store chain might integrate data from its inventory database, sales databases from different stores, and its marketing department’s promotions records. Warehouse applications differ from traditional database applications in several key features. First, the quantity of data is often much larger, between 100 GB and multiple TB, since warehouses combine and archive data from multiple data stores. Second, the warehouse must solve new distributed consistency problems, since the sources are autonomous and previous consistency algorithms rely on cooperation between sources. Third, the integration software is distinct from both the sources and the warehouse. It can be both distributed and parallelized to improve performance. In addition, it requires new resumption-from-failure algorithms, since integration may take hours and traditional algorithms would start over. Fourth, portions of the warehouse are often replicated as local data marts; data mart maintenance also requires distributed algorithms. In this talk we overview our work on warehouse creation and maintenance, highlighting the distributed and parallel aspects of the problem and of our solutions.

IEEE Data(base) Engineering Bulletin | 1995