Publications

Featured research published by Wilburt Juan Labio.


international conference on data engineering | 1997

Physical database design for data warehouses

Wilburt Juan Labio; Dallan Quass; Brad Adelberg

Data warehouses collect copies of information from remote sources into a single database. Since the remote data is cached at the warehouse, it appears as local relations to the users of the warehouse. To improve query response time, the warehouse administrator will often materialize views defined on the local relations to support common or complicated queries. Unfortunately, the requirement to keep the views consistent with the local relations creates additional overhead when the remote sources change. The warehouse is often kept only loosely consistent with the sources: it is periodically refreshed with changes sent from the source. When this happens, the warehouse is taken off-line until the local relations and materialized views can be updated. Clearly, the users would prefer as little down time as possible. Often the down time can be reduced by adding carefully selected materialized views or indexes to the physical schema. This paper studies how to select the sets of supporting views and of indexes to materialize to minimize the down time. We call this the view index selection (VIS) problem. We present an A* search based solution to the problem as well as rules of thumb. We also perform additional experiments to understand the space-time tradeoff as it applies to data warehouses.
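
The selection problem can be illustrated with a small, hypothetical cost model: each candidate view or index occupies space, shortens the refresh of other structures, but also adds its own maintenance cost, and the goal is to pick the subset that minimizes down time within a space budget. The sketch below is a brute-force enumeration under invented names and numbers, not the paper's A*-based algorithm or its cost model.

```python
# Minimal sketch of the view/index selection (VIS) idea: choose supporting
# structures that minimize estimated warehouse down time within a space
# budget.  Brute-force enumeration with a toy cost model; NOT the paper's
# A*-based algorithm.  All names and numbers are hypothetical.
from itertools import combinations

# candidate structure -> (size in MB, seconds saved per refresh,
#                         seconds of extra maintenance it adds per refresh)
CANDIDATES = {
    "idx_sales_date":      (50,  30,  5),
    "view_daily_totals":   (200, 60, 15),
    "idx_customer_region": (80,  20, 10),
    "view_region_rollup":  (300, 45, 20),
}
BASE_DOWNTIME = 120   # refresh time (s) with no auxiliary structures
SPACE_BUDGET = 400    # MB available for auxiliary structures

def downtime(selection):
    saved = sum(CANDIDATES[s][1] for s in selection)
    upkeep = sum(CANDIDATES[s][2] for s in selection)
    return max(0, BASE_DOWNTIME - saved + upkeep)

best_time, best_set = BASE_DOWNTIME, ()
for r in range(len(CANDIDATES) + 1):
    for combo in combinations(CANDIDATES, r):
        if sum(CANDIDATES[s][0] for s in combo) <= SPACE_BUDGET:
            if downtime(combo) < best_time:
                best_time, best_set = downtime(combo), combo

print("estimated down time:", best_time, "using:", list(best_set))
```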


international conference on management of data | 1999

Shrinking the warehouse update window

Wilburt Juan Labio; Ramana Yerneni; Hector Garcia-Molina

Warehouse views need to be updated when source data changes. Due to the constantly increasing size of warehouses and the rapid rates of change, there is increasing pressure to reduce the time taken for updating the warehouse views. In this paper we focus on reducing this “update window” by minimizing the work required to compute and install a batch of updates. Various strategies have been proposed in the literature for updating a single warehouse view. These algorithms typically cannot be extended to come up with good strategies for updating an entire set of views. We develop an efficient algorithm that selects an optimal update strategy for any single warehouse view. Based on this algorithm, we develop an algorithm for selecting strategies to update a set of views. The performance of these algorithms is studied with experiments involving warehouse views based on TPC-D queries.
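
As a rough illustration of the per-view choice the paper starts from, the sketch below picks, for each view, the cheaper of full recomputation versus incremental propagation of the change batch under an invented cost model. The paper's algorithms go further and exploit work shared across the whole set of views, which this sketch omits; all view names and cost figures are hypothetical.

```python
# Toy sketch: choose an update strategy per materialized view for a batch of
# source changes.  Costs are hypothetical; shared work between views (the
# paper's main contribution) is deliberately omitted.

def pick_strategy(view, batch_size):
    """Return the cheaper of full recomputation vs incremental maintenance."""
    recompute_cost = view["base_rows"]                    # rescan base relations
    incremental_cost = batch_size * view["delta_factor"]  # propagate the deltas
    if incremental_cost <= recompute_cost:
        return "incremental", incremental_cost
    return "recompute", recompute_cost

views = [
    {"name": "daily_sales",   "base_rows": 5_000_000, "delta_factor": 10},
    {"name": "region_totals", "base_rows":   200_000, "delta_factor": 50},
]

batch_size = 20_000   # changed source tuples in this batch
total = 0
for v in views:
    strategy, cost = pick_strategy(v, batch_size)
    total += cost
    print(f"{v['name']}: {strategy} (estimated cost {cost})")
print("estimated update window:", total)
```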


international conference on management of data | 1997

The WHIPS prototype for data warehouse creation and maintenance

Wilburt Juan Labio; Yue Zhuge; Janet L. Wiener; Himanshu Gupta; Hector Garcia-Molina; Jennifer Widom

A data warehouse is a repository of integrated information from distributed, autonomous, and possibly heterogeneous sources. In effect, the warehouse stores one or more materialized views of the source data. The data is then readily available to user applications for querying and analysis. Figure 1 shows the basic architecture of a warehouse: data is collected from each source, integrated with data from other sources, and stored at the warehouse. Users then access the data directly from the warehouse. As suggested by Figure 1, there are two major components in a warehouse system: the integration component, responsible for collecting and maintaining the materialized views, and the query and analysis component, responsible for fulfilling the information needs of specific end users. Note that the two components are not independent. For example, which views the integration component materializes depends on the expected needs of end users. Most current commercial warehousing systems (e.g., Redbrick, Sybase, Arbor) focus on the query and analysis component, providing specialized index structures at the warehouse and extensive querying facilities for the end user. In the WHIPS (WareHousing Information Project at Stanford) project, on the other hand, we focus on the integration component. In particular, we have developed an architecture and implemented a prototype for identifying data changes at heterogeneous sources, transforming and summarizing them in accordance with warehouse specifications, and incrementally integrating them into the warehouse. We propose to demonstrate our prototype at SIGMOD, illustrating the main features of our architecture. Our architecture is modular and we designed it specifically to fulfill several important and interrelated goals: data sources and warehouse views can be added and removed dynamically; the system can be scaled by adding more internal modules; changes at the sources are detected automatically; the warehouse may be updated continuously as the sources change, without requiring “down time;” and the warehouse is always kept consistent with the source data by the integration algorithms. More details on these goals and how we achieve them are provided in [WGL+96].
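
As a rough illustration of the integration flow described above (detect source changes, then fold them into a materialized view instead of rebuilding it), here is a toy sketch; it is not the WHIPS prototype, and every function, relation, and value in it is hypothetical.

```python
# Toy sketch of the integration flow: detect changes at a source, then apply
# them incrementally to a materialized aggregate view instead of rebuilding
# it.  NOT the WHIPS prototype; all names and data are hypothetical.

def detect_changes(old_snapshot, new_snapshot):
    """Emit (key, old_row, new_row) for every inserted, deleted, or updated row."""
    keys = set(old_snapshot) | set(new_snapshot)
    return [(k, old_snapshot.get(k), new_snapshot.get(k))
            for k in keys if old_snapshot.get(k) != new_snapshot.get(k)]

def apply_incrementally(view, changes):
    """Fold each change into a per-product sales total."""
    for _, old_row, new_row in changes:
        product = (new_row or old_row)["product"]
        view[product] = (view.get(product, 0)
                         - (old_row["amount"] if old_row else 0)
                         + (new_row["amount"] if new_row else 0))

view = {"tea": 5}                                   # total sales per product
old = {1: {"product": "tea", "amount": 5}}
new = {1: {"product": "tea", "amount": 7},
       2: {"product": "coffee", "amount": 3}}
apply_incrementally(view, detect_changes(old, new))
print(view)                                         # {'tea': 7, 'coffee': 3}
```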


international conference on management of data | 2000

Efficient resumption of interrupted warehouse loads

Wilburt Juan Labio; Janet L. Wiener; Hector Garcia-Molina; Vlad Gorelik

Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes GBs of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., find duplicates, resolve data inconsistencies, and add unique keys). If the load fails, a possible approach is to “redo” the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation, or rely on the specifics of transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on the high-level properties of the transformations. We show that DR can lead to a ten-fold reduction in resumption time by performing experiments using commercial software.
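
The flavor of property-based resumption can be conveyed with a toy example: if a transformation is known to consume its input in order and the loader records the last key it committed, a resumed load can simply skip input that already produced committed output. This only illustrates the general idea; it is not the DR algorithm, and all names below are hypothetical.

```python
# Toy sketch of resumption by filtering already-loaded input, assuming the
# transformation processes input in order and the last committed key is known.
# NOT the DR algorithm; all names are hypothetical.

def load(input_rows, transform, sink, last_committed_key=None):
    skipping = last_committed_key is not None
    for row in input_rows:
        if skipping:
            if row["key"] == last_committed_key:
                skipping = False      # everything up to here is already loaded
            continue
        sink.append(transform(row))

def clean(row):
    """A stand-in for a user-defined transformation."""
    return {**row, "tag": "clean"}

rows = [{"key": k, "value": k * 10} for k in range(1, 6)]

sink = []
load(rows[:3], clean, sink)                       # load "fails" after key 3
load(rows, clean, sink, last_committed_key=3)     # resume: processes keys 4, 5 only
print([r["key"] for r in sink])                   # [1, 2, 3, 4, 5]
```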


international conference on data engineering | 1999

Capability-sensitive query processing on Internet sources

Hector Garcia-Molina; Wilburt Juan Labio; Ramana Yerneni

On the Internet, the limited query processing capabilities of sources make answering even the simplest queries challenging. We present a scheme called GenCompact for generating capability-sensitive plans for queries on Internet sources. The query plans generated by GenCompact have the following advantages over those generated by existing query processing systems: the sources are guaranteed to support the query plans; the plans take advantage of the source capabilities; and the plans are more efficient since a larger space of plans is examined.
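
The core constraint can be illustrated with a toy planner that keeps only those source orderings in which every source receives the input bindings its capabilities require before it is queried. This is not GenCompact itself; the sources and capability descriptions below are invented.

```python
# Toy sketch of capability-sensitive planning: keep only join orders in which
# every source gets the bound inputs it requires.  NOT the GenCompact
# algorithm; sources and capabilities are hypothetical.
from itertools import permutations

# source -> (attributes it requires as bound inputs, attributes it returns)
SOURCES = {
    "flights": (set(),        {"flight_no", "dest"}),
    "hotels":  ({"dest"},     {"hotel_id", "dest"}),
    "reviews": ({"hotel_id"}, {"rating", "hotel_id"}),
}

def feasible(order):
    bound = set()
    for src in order:
        required, produced = SOURCES[src]
        if not required <= bound:     # source cannot answer: missing bindings
            return False
        bound |= produced
    return True

plans = [order for order in permutations(SOURCES) if feasible(order)]
print(plans)   # only the orders every source can actually support
```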


acm symposium on parallel algorithms and architectures | 1998

Distributed and parallel computing issues in data warehousing (abstract)

Hector Garcia-Molina; Wilburt Juan Labio; Janet L. Wiener; Yue Zhuge

A data warehouse is a repository of data that has been extracted and integrated from heterogeneous and autonomous distributed sources. For example, a grocery store chain might integrate data from its inventory database, sales databases from different stores, and its marketing department’s promotions records. Warehouse applications differ from traditional database applications in several key features. First, the quantity of data is often much larger, between 100 GB and multiple TB, since warehouses combine and archive data from multiple data stores. Second, the warehouse must solve new distributed consistency problems, since the sources are autonomous and previous consistency algorithms rely on cooperation between sources. Third, the integration software is distinct from both the sources and the warehouse. It can be both distributed and parallelized to improve performance. In addition, it requires new resumption-from-failure algorithms, since integration may take hours and traditional algorithms would start over. Fourth, portions of the warehouse are often replicated as local data marts; data mart maintenance also requires distributed algorithms. In this talk we give an overview of our work on warehouse creation and maintenance, highlighting the distributed and parallel aspects of the problem and of our solutions.


IEEE Data(base) Engineering Bulletin | 1995

The Stanford Data Warehousing Project

Joachim Hammer; Hector Garcia-Molina; Jennifer Widom; Wilburt Juan Labio; Yue Zhuge


very large data bases | 1996

Efficient Snapshot Differential Algorithms for Data Warehousing

Wilburt Juan Labio; Hector Garcia-Molina


VIEWS | 1996

A System Prototype for Warehouse View Maintenance

Janet L. Wiener; Himanshu Gupta; Wilburt Juan Labio; Yue Zhuge; Hector Garcia-Molina; Jennifer Widom


very large data bases | 2000

Performance Issues in Incremental Warehouse Maintenance

Wilburt Juan Labio; Jun Yang; Yingwei Cui; Hector Garcia-Molina; Jennifer Widom
