Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Janet L. Wiener is active.

Publication


Featured research published by Janet L. Wiener.


International Journal on Digital Libraries | 1997

The Lorel Query Language for Semistructured Data

Serge Abiteboul; Dallan Quass; Jason McHugh; Jennifer Widom; Janet L. Wiener

We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular: some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of the ODMG data model and the Lorel language as an extension of OQL. The main novelties of the Lorel language are: (i) the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and (ii) powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user. Lorel also includes a declarative update language. Lorel is implemented as the query language of the Lore prototype database management system at Stanford. Information about Lore can be found at http://www-db.stanford.edu/lore. In addition to presenting the Lorel language in full, this paper briefly describes the Lore system and query processor. We also briefly discuss a second implementation of Lorel on top of a conventional object-oriented database management system, the O2 system.
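The two novelties the abstract highlights, coercion and flexible path navigation over irregular data, can be illustrated with a toy evaluator. This is a minimal Python sketch, not Lore's implementation; the `guide` data and the `select`/`coerce_eq` helpers are all hypothetical, with nested dicts standing in for Lorel's object model.

```python
def iter_children(node, label):
    """Yield the children of `node` reachable via edge `label`.
    A missing label simply yields nothing -- no schema is assumed."""
    if isinstance(node, dict) and label in node:
        child = node[label]
        # Heterogeneous sets: a label may map to one object or a list of them.
        for item in child if isinstance(child, list) else [child]:
            yield item

def coerce_eq(a, b):
    """Lorel-style coercion: compare values even if their types differ."""
    if type(a) == type(b):
        return a == b
    try:
        return float(a) == float(b)
    except (TypeError, ValueError):
        return str(a) == str(b)

def select(db, path, field, value):
    """Roughly: select <path>.name where <path>.<field> = value."""
    frontier = [db]
    for label in path:
        frontier = [c for n in frontier for c in iter_children(n, label)]
    return [n.get("name") for n in frontier
            if any(coerce_eq(v, value) for v in iter_children(n, field))]

guide = {"restaurant": [
    {"name": "Chef Chu", "zipcode": 92310},   # zipcode stored as an int
    {"name": "Saigon", "zipcode": "92310"},   # ... or as a string
    {"name": "Bistro"},                       # zipcode missing entirely
]}

print(select(guide, ["restaurant"], "zipcode", "92310"))
# -> ['Chef Chu', 'Saigon']
```

Coercion matches both the int and string zipcodes, and the object with no zipcode edge is silently skipped rather than causing a type error, which is the behavior the abstract motivates.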


ACM Transactions on Database Systems | 2000

Tracing the lineage of view data in a warehousing environment

Yingwei Cui; Jennifer Widom; Janet L. Wiener

We consider the view data lineage problem in a warehousing environment: for a given data item in a materialized warehouse view, we want to identify the set of source data items that produced the view item. We formally define the lineage problem, develop lineage tracing algorithms for relational views with aggregation, and propose mechanisms for performing consistent lineage tracing in a multi-source data warehousing environment. Our results can form the basis of a tool that allows analysts to browse warehouse data, select view tuples of interest, and then “drill-through” to examine the exact source tuples that produced the view tuples of interest.
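For an aggregate view, the lineage of a view tuple is the set of source tuples in its group, which is what "drill-through" returns. A minimal sketch of that idea, with illustrative data and helper names (plain Python lists stand in for relations; this is not the paper's algorithm for general relational views):

```python
from collections import defaultdict

sales = [  # source relation: (store, item, amount)
    ("palo_alto", "tea", 5),
    ("palo_alto", "coffee", 12),
    ("menlo_park", "tea", 7),
]

def total_by_store(rows):
    """Materialized view: SELECT store, SUM(amount) ... GROUP BY store."""
    totals = defaultdict(int)
    for store, _item, amount in rows:
        totals[store] += amount
    return dict(totals)

def lineage(view_key, rows):
    """Drill-through for a GROUP BY view: the source tuples that
    produced the view tuple for `view_key` are exactly its group."""
    return [r for r in rows if r[0] == view_key]

view = total_by_store(sales)
print(view["palo_alto"])            # 17
print(lineage("palo_alto", sales))  # the two palo_alto source tuples
```

For views with joins and selections the tracing queries are more involved, but the contract is the same: map a view tuple back to the exact source tuples that derived it.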


international conference on data engineering | 1997

Representative objects: concise representations of semistructured, hierarchical data

Svetlozar Nestorov; Jeffrey D. Ullman; Janet L. Wiener; Sudarshan S. Chawathe

Introduces the concept of representative objects, which uncover the inherent schema(s) in semi-structured, hierarchical data sources and provide a concise description of the structure of the data. Semi-structured data, unlike data stored in typical relational or object-oriented databases, does not have a fixed schema that is known in advance and stored separately from the data. With the rapid growth of the World Wide Web, semi-structured hierarchical data sources are becoming widely available to the casual user. The lack of external schema information currently makes browsing and querying these data sources inefficient at best, and impossible at worst. We show how representative objects make schema discovery efficient and facilitate the generation of meaningful queries over the data.
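A full representative object can be thought of as the union of the structures of every object in the collection: which edge labels can appear, and under which parents. The following Python sketch of that idea is illustrative only (the `movies` data and function names are hypothetical, and real representative objects handle cycles and bounded degrees):

```python
def summarize(obj):
    """Structure of one object: atomic values become None,
    objects become a dict of label -> child structure."""
    if not isinstance(obj, dict):
        return None
    out = {}
    for label, child in obj.items():
        children = child if isinstance(child, list) else [child]
        out[label] = merge(summarize(c) for c in children)
    return out

def merge(summaries):
    """Union several structure summaries label by label."""
    merged = None
    for s in summaries:
        if s is None:
            continue               # atomic: contributes no edges
        if merged is None:
            merged = {}
        for label, sub in s.items():
            merged[label] = merge([merged.get(label), sub])
    return merged

movies = [
    {"title": "Alien", "year": 1979},
    {"title": "Heat", "cast": {"actor": "Pacino"}},
]
summary = merge(summarize(m) for m in movies)
print(summary)  # {'title': None, 'year': None, 'cast': {'actor': None}}
```

Even though neither object alone exhibits the full structure, the merged summary tells a user (or query generator) that `title`, `year`, and `cast.actor` are all valid paths, which is exactly what makes browsing and query formulation feasible without an external schema.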


international conference on parallel and distributed information systems | 1996

The Strobe algorithms for multi-source warehouse consistency

Yue Zhuge; Hector Garcia-Molina; Janet L. Wiener

A warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. The authors identify and discuss three fundamental transaction processing scenarios for data warehousing. They define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Their implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios.


Distributed and Parallel Databases | 1998

Consistency Algorithms for Multi-Source Warehouse View Maintenance

Yue Zhuge; Hector Garcia-Molina; Janet L. Wiener

A warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. In this paper we identify and discuss three fundamental transaction processing scenarios for data warehousing. We define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Our implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios.
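The core difficulty the Strobe family addresses is that a join answer computed by querying autonomous sources can be invalidated by updates that arrive before the answer is installed. The following is a deliberately simplified, single-process caricature of the compensation idea, not the paper's algorithm: the relations, handler names, and interleaving are all illustrative, and real Strobe handles multiple sources, transactions, and consistency levels.

```python
R = []                      # source relation R(a, b)
S = [(1, "x"), (2, "y")]    # source relation S(b, c)
view = []                   # materialized V = R join S on b
pending = []                # join answers computed but not yet installed

def on_insert_R(t):
    """Warehouse handling of an insert into R: query S for matches.
    The answers are queued rather than installed immediately."""
    R.append(t)
    pending.extend((t[0], t[1], s[1]) for s in S if s[0] == t[1])

def on_delete_S(s):
    """Compensation: a deletion at S must be applied to the pending
    answers as well as to the installed view, or a stale tuple survives."""
    S.remove(s)
    pending[:] = [p for p in pending if (p[1], p[2]) != s]
    view[:] = [v for v in view if (v[1], v[2]) != s]

def install():
    """Install the pending batch into the materialized view."""
    view.extend(pending)
    pending.clear()

on_insert_R(("a", 1)); install()
on_insert_R(("b", 2))       # join answer ("b", 2, "y") is now pending
on_delete_S((2, "y"))       # deletion arrives before the batch installs
install()
print(view)                 # [('a', 1, 'x')] -- no stale ("b", 2, "y")
```

Without the compensation in `on_delete_S`, the stale tuple `("b", 2, "y")` would be installed even though its source tuple no longer exists, which is exactly the kind of anomaly the consistency levels in the paper classify.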


international conference on data engineering | 1997

Multiple view consistency for data warehousing

Yue Zhuge; Janet L. Wiener; Hector Garcia-Molina

A data warehouse stores integrated information from multiple distributed data sources. In effect, the warehouse stores materialized views over the source data. The problem of ensuring data consistency at the warehouse can be divided into two components: ensuring that each view reflects a consistent state of the base data, and ensuring that multiple views are mutually consistent. In this paper we study the latter problem, that of guaranteeing multiple view consistency (MVC). We identify and define formally three layers of consistency for materialized views in a distributed environment. We present a scalable architecture for consistently handling multiple views in a data warehouse, which we have implemented in the WHIPS (WareHousing Information Project at Stanford) prototype. Finally, we develop simple, scalable algorithms for achieving MVC at a warehouse.


international conference on management of data | 1997

The WHIPS prototype for data warehouse creation and maintenance

Wilburt Juan Labio; Yue Zhuge; Janet L. Wiener; Himanshu Gupta; Hector Garcia-Molina; Jennifer Widom

A data warehouse is a repository of integrated information from distributed, autonomous, and possibly heterogeneous sources. In effect, the warehouse stores one or more materialized views of the source data. The data is then readily available to user applications for querying and analysis. Figure 1 shows the basic architecture of a warehouse: data is collected from each source, integrated with data from other sources, and stored at the warehouse. Users then access the data directly from the warehouse. As suggested by Figure 1, there are two major components in a warehouse system: the integration component, responsible for collecting and maintaining the materialized views, and the query and analysis component, responsible for fulfilling the information needs of specific end users. Note that the two components are not independent. For example, which views the integration component materializes depends on the expected needs of end users. Most current commercial warehousing systems (e.g., Redbrick, Sybase, Arbor) focus on the query and analysis component, providing specialized index structures at the warehouse and extensive querying facilities for the end user. In the WHIPS (WareHousing Information Project at Stanford) project, on the other hand, we focus on the integration component. In particular, we have developed an architecture and implemented a prototype for identifying data changes at heterogeneous sources, transforming and summarizing them in accordance with warehouse specifications, and incrementally integrating them into the warehouse. We propose to demonstrate our prototype at SIGMOD, illustrating the main features of our architecture.
Our architecture is modular and we designed it specifically to fulfill several important and interrelated goals: data sources and warehouse views can be added and removed dynamically; it is scalable by adding more internal modules; changes at the sources are detected automatically; the warehouse may be updated continuously as the sources change, without requiring “down time;” and the warehouse is always kept consistent with the source data by the integration algorithms. More details on these goals and how we achieve them are provided in [WGL+96].


international conference on management of data | 2000

Efficient resumption of interrupted warehouse loads

Wilburt Juan Labio; Janet L. Wiener; Hector Garcia-Molina; Vlad Gorelik

Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes GBs of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., find duplicates, resolve data inconsistencies, and add unique keys). If the load fails, a possible approach is to “redo” the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation, or rely on the specifics of transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on the high-level properties of the transformations. We show that DR can lead to a ten-fold reduction in resumption time by performing experiments using commercial software.
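One high-level property that enables such resumption is a deterministic transform whose output carries a key derived from its input: the resumed load can then simply skip inputs whose keys already reached the warehouse. The toy sketch below is in that spirit only, not the actual DR algorithm; the `transform`, the crash simulation, and the data are all hypothetical.

```python
def transform(row):
    """A deterministic, user-defined transform (toy: clean the name
    and attach the input's unique id as the output key)."""
    return (row["id"], row["name"].strip().title())

def load(inputs, warehouse, fail_after=None):
    """Load transformed rows into `warehouse`, skipping rows whose key
    is already present; optionally simulate a crash mid-load."""
    loaded_keys = {key for key, _ in warehouse}
    for i, row in enumerate(inputs):
        out = transform(row)
        if out[0] in loaded_keys:
            continue                    # committed before the crash
        if fail_after is not None and i >= fail_after:
            raise RuntimeError("simulated crash")
        warehouse.append(out)

inputs = [{"id": 1, "name": " ada "}, {"id": 2, "name": "grace"},
          {"id": 3, "name": "edsger"}]
warehouse = []
try:
    load(inputs, warehouse, fail_after=2)   # crash after 2 rows committed
except RuntimeError:
    pass
load(inputs, warehouse)     # resume: redo only what is missing
print(warehouse)            # [(1, 'Ada'), (2, 'Grace'), (3, 'Edsger')]
```

The point of the sketch is that resumption imposes no overhead during a normal (non-failing) run: nothing extra is logged, and only the high-level property of the transform is relied upon.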


database programming languages | 1993

A Moose and a Fox Can Aid Scientists with Data Management Problems

Janet L. Wiener; Yannis E. Ioannidis

Fox (Finding Objects of eXperiments) is the declarative query language for Moose (Modeling Objects Of Scientific Experiments), an object-oriented data model at the core of a scientific experiment management system (EMS) being developed at Wisconsin. The goal of the EMS is to support scientists in managing their experimental studies and the data that are generated from them.


acm symposium on parallel algorithms and architectures | 1998

Distributed and parallel computing issues in data warehousing (abstract)

Hector Garcia-Molina; Wilburt Juan Labio; Janet L. Wiener; Yue Zhuge

A data warehouse is a repository of data that has been extracted and integrated from heterogeneous and autonomous distributed sources. For example, a grocery store chain might integrate data from its inventory database, sales databases from different stores, and its marketing department’s promotions records. Warehouse applications differ from traditional database applications in several key features. First, the quantity of data is often much larger, between 100 GB and multiple TB, since warehouses combine and archive data from multiple data stores. Second, the warehouse must solve new distributed consistency problems, since the sources are autonomous and previous consistency algorithms rely on cooperation between sources. Third, the integration software is distinct from both the sources and the warehouse. It can be both distributed and parallelized to improve performance. In addition, it requires new resumption-from-failure algorithms, since integration may take hours and traditional algorithms would start over. Fourth, portions of the warehouse are often replicated as local data marts; data mart maintenance also requires distributed algorithms. In this talk we give an overview of our work on warehouse creation and maintenance, highlighting the distributed and parallel aspects of the problem and of our solutions.

Collaboration


Dive into Janet L. Wiener's collaborations.

Top Co-Authors


Jeffrey F. Naughton

University of Wisconsin-Madison


Serge Abiteboul

École normale supérieure de Cachan
