Wang Chiew Tan

University of California, Santa Cruz

Publication

Featured research published by Wang Chiew Tan.

very large data bases | 2004

An annotation management system for relational databases

Deepavali Bhagwat; Laura Chiticariu; Wang Chiew Tan; Gaurav Vijayvargiya

We present an annotation management system for relational databases. In this system, every piece of data in a relation is assumed to have zero or more annotations associated with it and annotations are propagated along, from the source to the output, as data is being transformed through a query. Such an annotation management system could be used for understanding the provenance (aka lineage) of data, who has seen or edited a piece of data or the quality of data, which are useful functionalities for applications that deal with integration of scientific and biological data. We present an extension, pSQL, of a fragment of SQL that has three different types of annotation propagation schemes, each useful for different purposes. The default scheme propagates annotations according to where data is copied from. The default-all scheme propagates annotations according to where data is copied from among all equivalent formulations of a given query. The custom scheme allows a user to specify how annotations should propagate. We present a storage scheme for the annotations and describe algorithms for translating a pSQL query under each propagation scheme into one or more SQL queries that would correctly retrieve the relevant annotations according to the specified propagation scheme. For the default-all scheme, we also show how we generate finitely many queries that can simulate the annotation propagation behavior of the set of all equivalent queries, which is possibly infinite. The algorithms are implemented and the feasibility of the system is demonstrated by a set of experiments that we have conducted.

symposium on principles of database systems | 2005

Peer data exchange

Ariel Fuxman; Phokion G. Kolaitis; Renée J. Miller; Wang Chiew Tan

In this article, we introduce and study a framework, called peer data exchange, for sharing and exchanging data between peers. This framework is a special case of a full-fledged peer data management system and a generalization of data exchange between a source schema and a target schema. The motivation behind peer data exchange is to model authority relationships between peers, where a source peer may contribute data to a target peer, specified using source-to-target constraints, and a target peer may use target-to-source constraints to restrict the data it is willing to receive, but cannot modify the data of the source peer. A fundamental algorithmic problem in this framework is that of deciding the existence of a solution: given a source instance and a target instance for a fixed peer data exchange setting, can the target instance be augmented in such a way that the source instance and the augmented target instance satisfy all constraints of the setting? We investigate the computational complexity of the problem for peer data exchange settings in which the constraints are given by tuple generating dependencies. We show that this problem is always in NP, and that it can be NP-complete even for “acyclic” peer data exchange settings. We also show that the data complexity of the certain answers of target conjunctive queries is in coNP, and that it can be coNP-complete even for “acyclic” peer data exchange settings. After this, we explore the boundary between tractability and intractability for deciding the existence of a solution and for computing the certain answers of target conjunctive queries. To this effect, we identify broad syntactic conditions on the constraints between the peers under which the existence-of-solutions problem is solvable in polynomial time. We also identify syntactic conditions between peer data exchange settings and target conjunctive queries that yield polynomial-time algorithms for computing the certain answers. For both problems, these syntactic conditions turn out to be tight, in the sense that minimal relaxations of them lead to intractability. Finally, we introduce the concept of a universal basis of solutions in peer data exchange and explore its properties.

very large data bases | 2008

STBenchmark: towards a benchmark for mapping systems

Bogdan Alexe; Wang Chiew Tan; Yannis Velegrakis

A fundamental problem in information integration is to precisely specify the relationships, called mappings, between schemas. Designing mappings is a time-consuming process. To alleviate this problem, many mapping systems have been developed to assist the design of mappings. However, a benchmark for comparing and evaluating these systems has not yet been developed. We present STBenchmark, a solution towards a much needed benchmark for mapping systems. We first describe the challenges that are unique to the development of benchmarks for mapping systems. After this, we describe the three components of STBenchmark: (1) a basic suite of mapping scenarios that we believe represents a minimum set of transformations that should be readily supported by any mapping system, (2) a mapping scenario generator as well as an instance generator that can produce complex mapping scenarios and, respectively, instances of varying sizes of a given schema, (3) a simple usability model that can be used as a first-cut measure on the ease of use of a mapping system. We use STBenchmark to evaluate four mapping systems and report our results, as well as describe some interesting observations.

international conference on management of data | 2005

DBNotes: a post-it system for relational databases based on provenance

Laura Chiticariu; Wang Chiew Tan; Gaurav Vijayvargiya

We demonstrate DBNotes, a Post-It note system for relational databases where every piece of data may be associated with zero or more notes (or annotations). These annotations are transparently propagated along as data is being transformed. The method by which annotations are propagated is based on provenance (aka lineage): the annotations associated with a piece of data d in the result of a transformation consist of the annotations associated with each piece of data in the source where d is copied from. One immediate application of this system is to use annotations to systematically trace the provenance and flow of data. If every piece of source data is attached with an annotation that describes its address (i.e., origins), then the annotations of a piece of data in the result of a transformation describe its provenance. Hence, one can easily determine the provenance of data through a sequence of transformation steps simply by examining the annotations. Annotations can also be used to store additional information about data. Since a database schema is often proprietary, the ability to insert new information about data without having to change the underlying schema is a useful feature. For example, an error report could be attached to an erroneous piece of data, and this error report will be propagated to other databases along transformations, thus notifying other users of the error. Overall, the annotations on the result of a transformation can also provide an estimate on the quality of the resulting database.
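The propagation rule described above (an output value carries the annotations of every source value it was copied from) can be illustrated with a minimal Python sketch. This is not DBNotes or pSQL code: the `Cell` type, the `annotate_with_address` and `select_project` helpers, the `Gene` relation, and the choice to union the notes of duplicate output tuples are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A single value together with the annotations attached to it."""
    value: object
    notes: frozenset = frozenset()


def annotate_with_address(relation_name, columns, rows):
    """Attach to every cell one annotation naming its own location (hypothetical address format)."""
    return [
        {col: Cell(val, frozenset({f"{relation_name}[{i}].{col}"}))
         for col, val in zip(columns, row)}
        for i, row in enumerate(rows)
    ]


def select_project(rows, predicate, keep):
    """Select-project query with set semantics; copied cells keep their notes.

    Assumption: when several source rows yield the same output tuple,
    the notes of the corresponding cells are unioned.
    """
    result = {}
    for row in rows:
        if not predicate(row):
            continue
        key = tuple(row[c].value for c in keep)
        if key not in result:
            result[key] = {c: row[c] for c in keep}
        else:
            seen = result[key]
            result[key] = {
                c: Cell(row[c].value, seen[c].notes | row[c].notes) for c in keep
            }
    return list(result.values())


# Hypothetical source relation Gene(id, name, organism).
source = annotate_with_address(
    "Gene",
    ["id", "name", "organism"],
    [(1, "BRCA1", "human"), (2, "BRCA1", "mouse"), (3, "TP53", "human"), (4, "MYC", "rat")],
)

# Roughly: SELECT DISTINCT name FROM Gene WHERE organism <> 'rat'
result = select_project(source, lambda r: r["organism"].value != "rat", ["name"])

for row in result:
    print(row["name"].value, sorted(row["name"].notes))
```

Because every source cell starts with an annotation naming its own address, the notes on each output value list the source cells it was copied from, which is the provenance-tracing use the abstract mentions.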

symposium on principles of database systems | 2008

Curated databases

Peter Buneman; James Cheney; Wang Chiew Tan; Stijn Vansummeren

Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries -- dictionaries, encyclopedias, gazetteers etc. -- are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The value of curated databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area. Curated databases present a number of challenges for database research. The topics of annotation, provenance, and citation are central, because curated databases are heavily cross-referenced with, and include data from, other databases, and much of the work of a curator is annotating existing data. Evolution of structure is important because these databases often evolve from semistructured representations, and because they have to accommodate new scientific discoveries. Much of the work in these areas is in its infancy, but it is beginning to suggest new research for both theory and practice. We discuss some of this research and emphasize the need to find appropriate models of the processes associated with curated databases.

international conference on data engineering | 2008

Muse: Mapping Understanding and deSign by Example

Bogdan Alexe; Laura Chiticariu; Renée J. Miller; Wang Chiew Tan

A fundamental problem in information integration is that of designing the relationships, called schema mappings, between two schemas. The specification of a semantically correct schema mapping is typically a complex task. Automated tools can suggest potential mappings, but few tools are available for helping a designer understand mappings and design alternative mappings. We describe Muse, a mapping design wizard that uses data examples to assist designers in understanding and refining a schema mapping towards the desired specification. We present novel algorithms behind Muse and show how Muse systematically guides the designer on two important components of a mapping design: the specification of the desired grouping semantics for sets of data and the choice among alternative interpretations for semantically ambiguous mappings. In every component, Muse infers the desired semantics based on the designer's actions on a short sequence of small examples. Whenever possible, Muse draws examples from a familiar database, thus facilitating the design process even further. We report our experience with Muse on some publicly available schemas.

symposium on principles of database systems | 2006

The complexity of data exchange

Phokion G. Kolaitis; Jonathan Panttaja; Wang Chiew Tan

Data exchange is the problem of transforming data structured under a source schema into data structured under a target schema in such a way that all constraints of a schema mapping are satisfied. At the heart of data exchange lies a basic decision problem, called the existence-of-solutions problem: given a source instance, is there a target instance that satisfies the constraints of the schema mapping at hand? Earlier work showed that for schema mappings specified by embedded implicational dependencies, this problem is solvable in polynomial time, assuming that (1) the schema mapping is kept fixed and (2) the constraints of the schema mapping satisfy a certain structural condition, called weak acyclicity. We investigate the effect of these assumptions on the complexity of the existence-of-solutions problem, and show that each one is indispensable in deriving polynomial-time algorithms for this problem. Specifically, using machinery from universal algebra, we show that if the weak acyclicity assumption is relaxed even in a minimal way, then the existence-of-solutions problem becomes undecidable. We also show that if, in addition to the source instance, the schema mapping is part of the input, then the existence-of-solutions problem becomes EXPTIME-complete. Thus, there is a provable exponential gap between the data complexity and the combined complexity of data exchange. Finally, we study restricted classes of schema mappings and develop a comprehensive picture for the combined complexity of the existence-of-solutions problem for these restrictions. In particular, depending on the restriction considered, the combined complexity of this problem turns out to be either EXPTIME-complete or coNP-complete.
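To make the notion of a solution concrete, here is a minimal Python sketch of data exchange in the simplest setting: a schema mapping given only by source-to-target tuple-generating dependencies, with no target constraints. In that restricted case a solution always exists and can be built by a naive chase that invents a fresh labeled null for each existentially quantified variable; the harder cases studied in the paper involve target constraints and are not covered here. The function names, the `Emp`/`Works`/`Mgr` relations, and the `_N1`-style null naming are assumptions for illustration, and every atom argument is treated as a variable.

```python
from itertools import count


def match(atoms, instance, binding=None):
    """Yield every variable binding that maps all atoms to facts of the instance."""
    binding = binding or {}
    if not atoms:
        yield dict(binding)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact in instance.get(rel, ()):
        if len(fact) != len(args):
            continue
        extended = dict(binding)
        consistent = True
        for var, val in zip(args, fact):
            # setdefault binds a new variable or returns its earlier value
            if extended.setdefault(var, val) != val:
                consistent = False
                break
        if consistent:
            yield from match(rest, instance, extended)


def chase_st_tgds(source, tgds):
    """Naive chase: fire each tgd once per body match, using fresh labeled nulls."""
    fresh = count(1)
    target = {}
    for body, head in tgds:
        for binding in match(body, source):
            env = dict(binding)
            # Variables that occur only in the head are existentially quantified.
            for _, args in head:
                for var in args:
                    if var not in env:
                        env[var] = f"_N{next(fresh)}"
            for rel, args in head:
                target.setdefault(rel, set()).add(tuple(env[v] for v in args))
    return target


# Hypothetical mapping: Emp(n, d) -> exists m. Works(n, d) AND Mgr(d, m)
tgds = [
    ([("Emp", ("n", "d"))],
     [("Works", ("n", "d")), ("Mgr", ("d", "m"))]),
]
source = {"Emp": [("alice", "db"), ("bob", "db")]}

target = chase_st_tgds(source, tgds)
print(sorted(target["Works"]))
print(sorted(target["Mgr"]))
```

The sketch creates one null per body match, so the resulting target instance is a solution but not necessarily the smallest one.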

ACM Transactions on Database Systems | 2008

Quasi-inverses of schema mappings

Ronald Fagin; Phokion G. Kolaitis; Lucian Popa; Wang Chiew Tan

Schema mappings are high-level specifications that describe the relationship between two database schemas. Two operators on schema mappings, namely the composition operator and the inverse operator, are regarded as especially important. Progress on the study of the inverse operator was not made until very recently, as even finding the exact semantics of this operator turned out to be a fairly delicate task. Furthermore, this notion is rather restrictive, since it is rare that a schema mapping possesses an inverse. In this article, we introduce and study the notion of a quasi-inverse of a schema mapping. This notion is a principled relaxation of the notion of an inverse of a schema mapping; intuitively, it is obtained from the notion of an inverse by not differentiating between instances that are equivalent for data-exchange purposes. For schema mappings specified by source-to-target tuple-generating dependencies (s-t tgds), we give a necessary and sufficient combinatorial condition for the existence of a quasi-inverse, and then use this condition to obtain both positive and negative results about the existence of quasi-inverses. In particular, we show that every LAV (local-as-view) schema mapping has a quasi-inverse, but that there are schema mappings specified by full s-t tgds that have no quasi-inverse. After this, we study the language needed to express quasi-inverses of schema mappings specified by s-t tgds, and we obtain a complete characterization. We also characterize the language needed to express inverses of schema mappings, and thereby solve a problem left open in the earlier study of the inverse operator. Finally, we show that quasi-inverses can be used in many cases to recover the data that was exported by the original schema mapping when performing data exchange.

very large data bases | 2009

Artemis: a system for analyzing missing answers

Melanie Herschel; Mauricio A. Hernández; Wang Chiew Tan

A central feature of relational database management systems is the ability to define multiple different views over an underlying database schema. Views provide a method of defining access control to the underlying database, since a view exposes a part of the database and hides the rest. Views also provide logical data independence to application programs that access the database. In most cases, specifying the desired views in SQL is tedious and error-prone. While numerous tools exist to support developers in debugging program code, we are not aware of any tool that supports developers in verifying the correctness of their views defined in SQL.

Schema Matching and Mapping | 2011

Schema Mapping Evolution Through Composition and Inversion

Ronald Fagin; Phokion G. Kolaitis; Lucian Popa; Wang Chiew Tan

Mappings between different representations of data are the essential building blocks for many information integration tasks. A schema mapping is a high-level specification of the relationship between two schemas, and represents a useful abstraction that specifies how the data from a source format can be transformed into a target format. The development of schema mappings is laborious and time consuming, even in the presence of tools that facilitate this development. At the same time, schema evolution inevitably causes the invalidation of the existing schema mappings (since their schemas change). Providing tools and methods that can facilitate the adaptation and reuse of the existing schema mappings in the context of the new schemas is an important research problem. In this chapter, we show how two fundamental operators on schema mappings, namely composition and inversion, can be used to address the mapping adaptation problem in the context of schema evolution. We illustrate the applicability of the two operators in various concrete schema evolution scenarios, and we survey the most important developments on the semantics, algorithms, and implementation of composition and inversion. We also discuss the main research questions that still remain to be addressed.
