Publication


Featured research published by David Thau.


OTM '09 Proceedings of the Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009 on On the Move to Meaningful Internet Systems: Part II | 2009

Merging Sets of Taxonomically Organized Data Using Concept Mappings under Uncertainty

David Thau; Shawn Bowers; Bertram Ludäscher

We present a method for using aligned ontologies to merge taxonomically organized data sets that have apparently compatible schemas, but potentially different semantics for corresponding domains. We restrict the relationships involved in the alignment to basic set relations and disjunctions of these relations. A merged data set combines the domains of the source data set attributes, conforms to the observations reported in both data sets, and minimizes uncertainty introduced by ontology alignments. We find that even in very simple cases, merging data sets under this scenario is non-trivial. Reducing uncertainty introduced by the ontology alignments in combination with the data set observations often results in many possible merged data sets, which are managed using a possible worlds semantics. The primary contributions of this paper are a framework for representing aligned data sets and algorithms for merging data sets that report the presence and absence of taxonomically organized entities, including an efficient algorithm for a common data set merging scenario.


Proceedings of the 2008 EDBT Ph.D. workshop on | 2008

Reasoning about taxonomies and articulations

David Thau

Taxonomically organized data pervade science, business and everyday life. Unfortunately, taxonomies are often underspecified, limiting their utility in contexts such as data integration, information navigation and autonomous agent communication. This work formalizes taxonomies and relationships between them as formulas in logic. This formalization concretizes notions such as consistency and inconsistency of taxonomies and articulations (inter-taxonomic relations) between them, enables the derivation of new articulations based on a given set of taxonomies and articulations, and provides a framework for testing assumptions about underspecified taxonomies. Given the typical intractability of reasoning with taxonomies and articulations, this research investigates many optimizations: from those that reduce the search space, to those that leverage parallel processing, to those investigating logics more tractable than first-order logic (e.g., monadic first-order logic, propositional logic, description logics, and subsets of the RCC-5 spatial algebra). Finally, in addition to reasoning with taxonomies and articulations, this research investigates how to repair inconsistent taxonomies and articulations, how to explain inconsistencies and discovered relations, and how to merge taxonomies given articulations. Critical to this research is the development of a framework for testing logics and support for the development of taxonomies and articulations. This framework, CleanTax, is already well under way and has been used to study articulations between two large-scale biological taxonomies.


international semantic web conference | 2004

Data procurement for enabling scientific workflows: on exploring inter-ant parasitism

Shawn Bowers; David Thau; Rich Williams; Bertram Ludäscher

Similar to content on the web, scientific data is highly heterogeneous and can benefit from rich semantic descriptions. We are particularly interested in developing an infrastructure for expressing explicit semantic descriptions of ecological data (and life-sciences data in general), and exploiting these descriptions to provide support for automated data integration and transformation within scientific workflows [2]. Using semantic descriptions, our goal is to provide scientists with: (1) tools to easily search for and retrieve datasets relevant to their study (i.e., data procurement), (2) the ability to select a subset of returned datasets as input to a scientific workflow, and (3) automated integration and restructuring of the selected datasets for seamless workflow execution.


international conference on data engineering | 2010

Towards best-effort merge of taxonomically organized data

David Thau; Shawn Bowers; Bertram Ludäscher

We consider the task of merging datasets that have been organized using different, but aligned taxonomies. We assume such a merge is intended to create a single dataset that unambiguously describes the information in the source datasets using the alignment. We also assume that the merged result should reflect the observations of the datasets as specifically as possible. Typically, there will be no single merge result that is both unambiguous and maximally specific. In this case, a user may be provided with a set of possible merged datasets. If the user requires a single dataset, that dataset loses specificity. Here we examine whether the data exchange setting can provide a way to derive a “best-effort” merge. We find that the data exchange setting might be a good candidate for providing the merge, but further research is needed.


OTM '09 Proceedings of the Confederated International Workshops and Posters on On the Move to Meaningful Internet Systems: ADI, CAMS, EI2N, ISDE, IWSSA, MONET, OnToContent, ODIS, ORM, OTM Academy, SWWS, SEMELS, Beyond SAWSDL, and COMBEK 2009 | 2009

Contemporary Challenges in Ambient Data Integration for Biodiversity Informatics

David Thau; Robert A. Morris; Sean White

Biodiversity informatics (BDI) information is both highly localized and highly distributed. The temporal and spatial contexts of data collection events are generally of primary importance in BDI studies, and most studies are focused around specific localities. At the same time, data are collected by many groups working independently, but often at the same sites, leading to a distribution of data. BDI data are also distributed over time, due to protracted longitudinal studies, and the continuously evolving meanings of taxonomic names. Ambient data integration provides new opportunities for collecting, sharing, and analyzing BDI data, and the nature of BDI data poses interesting challenges for applications of ADI. This paper surveys recent work on utilization of BDI data in the context of ADI. Topics covered include applying ADI to species identification, data security, annotation and provenance sharing, and coping with multiple competing classification ontologies. We conclude with a summary of requirements for applying ADI to biodiversity informatics.


Journal of computing science and engineering | 2009

Merging Taxonomies under RCC-5 Algebraic Articulations

David Thau; Shawn Bowers; Bertram Ludaescher

Taxonomies are widely used to classify information, and multiple (possibly competing) taxonomies often exist for the same domain. Given a set of correspondences between two taxonomies, it is often necessary to “merge” the taxonomies, thereby creating a unified taxonomy (e.g., that can then be used by data integration and discovery applications). We present an algorithm for merging taxonomies that have been related using articulations given as RCC-5 constraints. Two taxa N and M can be related using (disjunctions of) the five base relations in RCC-5: N ≡ M; N ⊊ M; N ⊋ M; N ⊕ M (partial overlap of N and M); and N ! M (disjointness: N ∩ M = ∅). RCC-5 is increasingly being adopted by scientists to specify mappings between large biological taxonomies. We discuss the properties of the proposed merge algorithm and evaluate our approach using real-world taxonomies.
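
As an illustration of the five RCC-5 base relations named in this abstract, the following minimal Python sketch classifies the relation between two taxa when each taxon is modeled as a finite, non-empty set of member identifiers. This is not the merge algorithm from the paper. The function name `rcc5_relation`, the returned labels, and the set-based encoding of a taxon are assumptions made only for this example.

```python
def rcc5_relation(n: set, m: set) -> str:
    """Return the RCC-5 base relation that holds between two non-empty sets.

    The labels are hypothetical names chosen for this sketch; they map onto
    the five base relations as noted in the comments below.
    """
    if not n or not m:
        # RCC-5 regions are assumed to be non-empty in this sketch.
        raise ValueError("both taxa must be non-empty sets")
    if n == m:
        return "equals"               # N ≡ M
    if n < m:
        return "proper_part"          # N ⊊ M
    if n > m:
        return "inverse_proper_part"  # N ⊋ M
    if n & m:
        return "partial_overlap"      # N ⊕ M
    return "disjoint"                 # N ! M (N ∩ M = ∅)


if __name__ == "__main__":
    # Made-up member sets, used only to exercise each branch.
    print(rcc5_relation({"a", "b"}, {"a", "b"}))
    print(rcc5_relation({"a"}, {"a", "b"}))
    print(rcc5_relation({"a", "b"}, {"b", "c"}))
    print(rcc5_relation({"a"}, {"c"}))
```

For any two non-empty sets exactly one of the five relations holds, which is why an articulation that is uncertain can be written as a disjunction of these base relations.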


Ecological Informatics | 2007

Reasoning about taxonomies in first-order logic

David Thau; Bertram Ludäscher


ontologies and information systems for the semantic web | 2008

Merging taxonomies under RCC-5 algebraic articulations

David Thau; Shawn Bowers; Bertram Ludäscher


Lecture Notes in Computer Science | 2005

Data procurement for enabling scientific workflows: On exploring inter-ant parasitism

Shawn Bowers; David Thau; Rich Williams; Bertram Ludäscher


national conference on artificial intelligence | 2009

Cleantax: A framework for reasoning about taxonomies

David Thau; Shawn Bowers; Bertram Ludäscher

Collaboration


Top Co-Authors

Rich Williams

University of California

Robert A. Morris

University of Massachusetts Boston
