Catharine M. Wyss | Researchain

Publication

Featured research published by Catharine M. Wyss.

data warehousing and knowledge discovery | 2001

FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances - Extended Abstract

Catharine M. Wyss; Chris Giannella; Edward L. Robertson

The problem of discovering functional dependencies (FDs) from an existing relation instance has received considerable attention in the database research community. To date, even the most efficient solutions have exponential complexity in the number of attributes of the instance. We develop an algorithm, FastFDs, for solving this problem based on a depth-first, heuristic-driven (DFHD) search for finding minimal covers of hypergraphs. The technique of reducing the FD discovery problem to the problem of finding minimal covers of hypergraphs was applied previously by Lopes et al. in the algorithm Dep-Miner. Dep-Miner employs a levelwise search for minimal covers, whereas FastFDs uses DFHD search. We report several tests on distinct benchmark relation instances involving Dep-Miner, FastFDs, and TANE. Our experimental results indicate that DFHD search is more efficient than Dep-Miner's levelwise search or TANE's partitioning approach for many of these benchmark instances.
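The reduction mentioned in the abstract can be illustrated with a minimal sketch. It assumes a relation instance given as a list of Python dicts, computes the difference sets of all tuple pairs, and then finds the minimal covers for each attribute. The function names are hypothetical, and the cover search here is a plain brute-force enumeration by increasing size, not the heuristic-driven depth-first search of FastFDs or the levelwise search of Dep-Miner.

```python
from itertools import combinations

def difference_sets(rows, attrs):
    """Collect the attribute sets on which some pair of tuples disagrees."""
    diffs = set()
    for r, s in combinations(rows, 2):
        d = frozenset(a for a in attrs if r[a] != s[a])
        if d:
            diffs.add(d)
    return diffs

def minimal_fds(rows, attrs):
    """Return minimal FDs X -> A as (X, A) pairs, X being a tuple of attributes."""
    diffs = difference_sets(rows, attrs)
    fds = []
    for a in attrs:
        # Difference sets that contain A, with A removed. A left-hand side X
        # determines A exactly when X intersects every one of these sets.
        d_a = [d - {a} for d in diffs if a in d]
        if any(not d for d in d_a):
            continue  # two tuples differ only on A, so no X without A determines A
        others = [b for b in attrs if b != a]
        covers = []
        # Brute force (an assumption of this sketch): try candidates by size,
        # skipping supersets of covers that were already found.
        for k in range(len(others) + 1):
            for x in combinations(others, k):
                xs = set(x)
                if any(set(c) <= xs for c in covers):
                    continue
                if all(xs & d for d in d_a):
                    covers.append(x)
        fds.extend((x, a) for x in covers)
    return fds

rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "y", "C": 10},
    {"A": 2, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 20},
]
attrs = ["A", "B", "C"]
fds = minimal_fds(rows, attrs)
print(fds)  # [(('C',), 'A'), (('A',), 'C')]
```

On this toy instance A and C determine each other, while B is determined by nothing, because two tuples differ on B alone.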

ACM Transactions on Database Systems | 2005

Relational languages for metadata integration

Catharine M. Wyss; Edward L. Robertson

In this article, we develop a relational algebra for metadata integration, Federated Interoperable Relational Algebra (FIRA). FIRA has many desirable properties such as compositionality, closure, a deterministic semantics, a modest complexity, support for nested queries, a subalgebra equivalent to canonical Relational Algebra (RA), and robustness under certain classes of schema evolution. Beyond this, FIRA queries are capable of producing fully dynamic output schemas, where the number of relations and/or the number of columns in relations of the output varies dynamically with the input instance. Among existing query languages for relational metadata integration, only FIRA provides generalized dynamic output schemas, where the values in any (fixed) number of input columns can determine output schemas. Further contributions of this article include development of an extended relational model for metadata integration, the Federated Relational Data Model, which is strictly downward compatible with the relational model. Additionally, we define the notion of Transformational Completeness for relational query languages and postulate FIRA as a canonical transformationally complete language. We also give a declarative, SQL-like query language that is equivalent to FIRA, called Federated Interoperable Structured Query Language (FISQL). While our main contributions are conceptual, the federated model, FISQL/FIRA, and the notion of transformational completeness nevertheless have important applications to data integration and OLAP. In addition to summarizing these applications, we illustrate the use of FIRA to optimize FISQL queries using rule-based transformations that directly parallel their canonical relational counterparts. We conclude the article with an extended discussion of related work as well as an indication of current and future work on FISQL/FIRA.

conference on information and knowledge management | 2005

A formal characterization of PIVOT/UNPIVOT

Catharine M. Wyss; Edward L. Robertson

PIVOT is an important relational operation that allows data in rows to be exchanged for columns. Although most current relational database management systems support PIVOT-type operations, to date a purely formal, algebraic characterization of PIVOT has been lacking. In this paper, we present a characterization in terms of extended relational algebra operators τ (transpose), Π (drop projection), and μ (unique optimal tuple merge). This enables us to (1) draw parallels with PIVOT and existing operators employed in Dynamic Data Mapping Systems (DDMS), (2) formally characterize invertible PIVOT instances, and (3) provide complexity results for PIVOT-type operations. These contributions are an important part of ongoing work on formal models for relational OLAP.
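As an informal illustration of what a PIVOT-type operation does, the sketch below pivots a small "long" table into a "wide" one and back. It assumes rows represented as Python dicts, and the helper names `pivot` and `unpivot` are hypothetical. It does not implement the τ, Π, and μ operators of the paper; it only shows the row-to-column exchange on one instance.

```python
def pivot(rows, key, name_col, value_col):
    """Turn the values of name_col into column names, one output row per key."""
    out = {}
    for r in rows:
        wide = out.setdefault(r[key], {key: r[key]})
        if r[name_col] in wide:
            # A repeated (key, name) pair would need aggregation; this sketch
            # refuses rather than guessing how to merge the values.
            raise ValueError("ambiguous pivot: column already present for this key")
        wide[r[name_col]] = r[value_col]
    return list(out.values())

def unpivot(rows, key, name_col, value_col):
    """Turn every non-key column back into a (name, value) row."""
    return [
        {key: r[key], name_col: c, value_col: v}
        for r in rows
        for c, v in r.items()
        if c != key
    ]

long_rows = [
    {"region": "east", "quarter": "Q1", "sales": 10},
    {"region": "east", "quarter": "Q2", "sales": 12},
    {"region": "west", "quarter": "Q1", "sales": 7},
    {"region": "west", "quarter": "Q2", "sales": 9},
]
wide_rows = pivot(long_rows, "region", "quarter", "sales")
print(wide_rows)
# [{'region': 'east', 'Q1': 10, 'Q2': 12}, {'region': 'west', 'Q1': 7, 'Q2': 9}]
round_trip = unpivot(wide_rows, "region", "quarter", "sales")
```

Here the output schema (columns Q1 and Q2) depends on the data in the input, and unpivoting recovers the original rows for this particular instance.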

conference on information and knowledge management | 2001

A relational algebra for data/metadata integration in a federated database system

Catharine M. Wyss; Dirk Van Gucht

The need for interoperability among databases has increased dramatically with the proliferation of readily available DBMS and application software. Even within a single organization, data from disparate relational databases must be integrated. A framework for interoperability in a federated system of relational databases should be inherently relational, so that it can use existing techniques for query evaluation and optimization where possible and retain the key features of SQL, such as a modest complexity and ease of query formulation. Our contribution is a logspace relational algebra, the Meta-Algebra (MA), for data/metadata integration among relational databases containing semantically similar information in schematically disparate formats. The MA is a simple yet powerful extension of the classical relational algebra (RA). The MA has a natural declarative counterpart, the Meta-Query Language (MQL), which we briefly describe. We state a result showing MQL and the MA are computationally equivalent, which enables us to algebratize MQL queries in fundamentally the same way as ordinary SQL queries. This algebratization in turn enables us to use MA equivalences to facilitate the application of known query optimization techniques to MQL query evaluation.

extending database technology | 2006

Data mapping as search

George H. L. Fletcher; Catharine M. Wyss

In this paper, we describe and situate the tupelo system for data mapping in relational databases. Automating the discovery of mappings between structured data sources is a long-standing and important problem in data management. Starting from user-provided example instances of the source and target schemas, tupelo approaches mapping discovery as search within the transformation space of these instances based on a set of mapping operators. tupelo mapping expressions incorporate not only data-metadata transformations, but also simple and complex semantic transformations, resulting in significantly wider applicability than previous systems. Extensive empirical validation of tupelo, both on synthetic and real-world datasets, indicates that the approach is both viable and effective.
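The idea of treating mapping discovery as search over example instances can be sketched in a few lines. The sketch below is a toy under stated assumptions: instances are lists of dicts, the only operators are a hypothetical `drop` and `rename` on columns, and the search is an uninformed breadth-first search with a depth limit. It does not reproduce tupelo's operator set or its search strategy.

```python
from collections import deque

def freeze(rows):
    """Hashable form of an instance: a set of rows, each a set of (column, value) pairs."""
    return frozenset(frozenset(r.items()) for r in rows)

def columns(inst):
    return {c for row in inst for c, _ in row}

def successors(inst, target_cols):
    """Apply each toy operator once: drop a column, or rename it to an unused target column."""
    cols = columns(inst)
    for c in sorted(cols):
        yield f"drop({c})", frozenset(
            frozenset((k, v) for k, v in row if k != c) for row in inst
        )
        for t in sorted(target_cols - cols):
            yield f"rename({c}->{t})", frozenset(
                frozenset((t if k == c else k, v) for k, v in row) for row in inst
            )

def find_mapping(source, target, max_depth=4):
    """Breadth-first search for an operator sequence turning source into target."""
    start, goal = freeze(source), freeze(target)
    target_cols = columns(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        inst, path = queue.popleft()
        if inst == goal:
            return path
        if len(path) == max_depth:
            continue
        for name, nxt in successors(inst, target_cols):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None  # no mapping found within the depth limit

source = [{"nm": "Ann", "yr": 2001, "tmp": 0}, {"nm": "Bob", "yr": 2005, "tmp": 0}]
target = [{"name": "Ann", "year": 2001}, {"name": "Bob", "year": 2005}]
path = find_mapping(source, target)
print(path)
```

For this pair of example instances the search returns a three-step sequence that drops `tmp` and renames the two remaining columns; the order of the steps depends on the traversal.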

international conference on management of data | 2007

Extending relational query optimization to dynamic schemas for information integration in multidatabases

Catharine M. Wyss; Felix I. Wyss

This paper extends relational processing and optimization to the FISQL/FIRA languages for dynamic schema queries over multidatabases. Dynamic schema queries involve the creation and restructuring of metadata at runtime. We present a full implementation of a FISQL/FIRA engine, which includes subqueries and all transformational capabilities of FISQL/FIRA on distributed, multidatabase platforms. An important application of the system is to enhance traditional information architectures by enabling the creation and maintenance of dynamic wrappers and mapping queries at source databases within GAV, LAV, GLAV, peer-to-peer, or other integration frameworks. In addition to fully supporting FISQL/FIRA on multidatabases, our implementation introduces a bi-level optimization paradigm where purely relational sub-fragments of queries are pushed into source engines. This paradigm shares features of canonical distributed database processing, but has a new dimension through the extension of the relational model to dynamic schemas. We present empirical results showing the feasibility of optimization in this context, and discuss tradeoffs involved. Our system is the first to extend relational databases with these capabilities on this scale.

Journal on Data Semantics | 2009

Towards a General Framework for Effective Solutions to the Data Mapping Problem

George H. L. Fletcher; Catharine M. Wyss

Automating the discovery of mappings between structured data sources is a long-standing and important problem in data management. We discuss the rich history of the problem and the variety of technical solutions advanced in the database community over the previous four decades. Based on this discussion, we develop a basic statement of the data mapping problem and a general framework for reasoning about the design space of system solutions to the problem. We then concretely illustrate the framework with the Tupelo system for data mapping discovery, focusing on the important common case of relational data sources. Treating mapping discovery as example-driven search in a space of transformations, Tupelo generates queries encompassing the full range of structural and semantic heterogeneities encountered in relational data mapping. Hence, Tupelo is applicable in a wide range of data mapping scenarios. Finally, we present the results of extensive empirical validation, both on synthetic and real-world datasets, indicating that the system is both viable and effective.

International Workshop on Challenges in Web Information Retrieval and Integration | 2005

Mapping Between Data Sources on the Web

George H. L. Fletcher; Catharine M. Wyss

The data mapping problem is to discover effective mappings between structured representations of data. These mappings are the basic ‘glue’ for facilitating large-scale ad-hoc information sharing between autonomous peers in a dynamic environment. Automating their discovery is one of the fundamental unsolved challenges for information integration and sharing on the Web. We outline a general approach to automating the discovery of mappings between relational data sources which leverages new perspectives on the data mapping problem and report on a prototype implementation. Our approach utilizes heuristic search within a space delineated by basic relational transformation operators. A further novelty of our approach is that these operators include data to metadata transformations (and vice versa), allowing a generalization of previous solutions such as token-based schema matching.

Electronic Notes in Theoretical Computer Science | 2006

A Calculus for Data Mapping

George H. L. Fletcher; Catharine M. Wyss; Edward L. Robertson; Dirk Van Gucht

Technologies for overcoming heterogeneities between autonomous data sources are key in the emerging networked world. In this paper we discuss the initial results of a formal investigation into the underpinnings of technologies for alleviating structural heterogeneity. At the core of structural heterogeneity is the data mapping problem: discovering effective mappings between structured representations of data. Automating the discovery of these mappings is one of the fundamental unsolved challenges for data interoperability, integration, and sharing. We introduce a novel data model and calculus for expressing data mappings between relational data sources, laying the ground for a better understanding of the data mapping problem. This research uncovers several new safety issues in data mapping languages. We discuss ongoing investigations of syntactic and semantic restrictions on the calculus to deal with these issues.

ACM Sigsoft Software Engineering Notes | 2004

Report on the Engineering Federated Information Systems 2003 workshop (EFIS 2003)

Catharine M. Wyss; Anne E. James; Wilhelm Hasselbring; Stefan Conrad; Hagen Höpfner

This paper summarizes the EFIS 2003 workshop, held in Coventry, U.K. in July, as part of Coventry University's Data Horizons Week. Major research issues discussed include metadata/ontologies, integration frameworks, data quality and evolution, and mobile interfaces. Topics for future work include evolution, expressiveness, maintenance, and dissemination of FIS.
