Publications


Featured research published by Susan B. Davidson.


ACM Computing Surveys | 1985

Consistency in a partitioned network: a survey

Susan B. Davidson; Hector Garcia-Molina; Dale Skeen

Recently, several strategies have been proposed for transaction processing in partitioned distributed database systems with replicated data. These strategies are surveyed in light of the competing goals of maintaining correctness and achieving high availability. Extensions and combinations are then discussed, and guidelines are presented for selecting strategies for particular applications.


International Conference on Database Theory (ICDT) | 1997

Adding Structure to Unstructured Data

Peter Buneman; Susan B. Davidson; Mary F. Fernández; Dan Suciu

We develop a new notion of schema for unstructured data. Traditional schemas resemble the type systems of programming languages; for unstructured data, however, the underlying type may be much less constrained, and an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema, and show that there is a natural and efficiently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query application. Finally, we discuss how they may be used in query decomposition and optimization.
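
As a rough illustration of the conformance test, it can be computed as a greatest simulation between the data graph and the schema graph. The sketch below is a minimal Python rendering under assumed encodings (graphs as dicts of labeled edges, schema edges carrying label predicates); it is not the paper's formalism.

```python
# Minimal sketch: conformance of an edge-labeled data graph to a graph
# schema, computed as a greatest simulation. The encodings are
# assumptions for illustration, not the paper's formalism.

def conforms(data, schema, d_root, s_root):
    """data: dict node -> list of (label, target) edges.
    schema: dict node -> list of (label_predicate, target) edges.
    Returns True if the data graph rooted at d_root conforms to the
    schema graph rooted at s_root."""
    # Start from all pairs and shrink to the greatest simulation:
    # drop (d, s) when some data edge from d has no matching schema
    # edge from s into a still-related pair.
    rel = {(d, s) for d in data for s in schema}
    changed = True
    while changed:
        changed = False
        for (d, s) in list(rel):
            for (lbl, d2) in data[d]:
                if not any(pred(lbl) and (d2, s2) in rel
                           for (pred, s2) in schema[s]):
                    rel.discard((d, s))
                    changed = True
                    break
    return (d_root, s_root) in rel

# Example: the schema allows any edge from the root into a node that
# only permits labels starting with "name".
data = {0: [("person", 1)], 1: [("name", 2)], 2: []}
schema = {"*": [(lambda l: True, "n")],
          "n": [(lambda l: l.startswith("name"), "n")]}
print(conforms(data, schema, 0, "*"))  # True
```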


International Conference on Management of Data (SIGMOD) | 2008

Provenance and scientific workflows: challenges and opportunities

Susan B. Davidson; Juliana Freire

Provenance in the context of workflows, both for the data they derive and for their specification, is an essential component to allow for result reproducibility, sharing, and knowledge reuse in the scientific community. Several workshops have been held on the topic, and it has been the focus of many research projects and prototype systems. This tutorial provides an overview of research issues in provenance for scientific workflows, with a focus on recent literature and technology in this area. It is aimed at a general database research audience and at people who work with scientific data and workflows. We will (1) provide a general overview of scientific workflows; (2) describe research on provenance for scientific workflows and show in detail how provenance is supported in existing systems; (3) discuss emerging applications that are enabled by provenance; and (4) outline open problems and new directions for database-related research.


Information Systems | 2003

Reasoning about keys for XML

Peter Buneman; Susan B. Davidson; Wenfei Fan; Carmem S. Hara; Wang Chiew Tan

We study absolute and relative keys for XML, and investigate their associated decision problems. We argue that these keys are important to many forms of hierarchically structured data, including XML documents. In contrast to other proposals of keys for XML, we show that these keys are always (finitely) satisfiable, and that their (finite) implication problem is finitely axiomatizable. Furthermore, we provide an algorithm for determining (finite) implication that runs in time polynomial in the size of the keys. Our results also demonstrate, among other things, that the analysis of XML keys is far more intricate than its relational counterpart.
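
For a concrete reading of satisfaction (as opposed to implication, the paper's main subject), the following minimal sketch checks whether a document satisfies an absolute key (Q, {P1, ..., Pk}); the use of simple ElementTree paths is an illustrative simplification of the paper's path language.

```python
# Sketch: checking that an XML document satisfies an absolute key
# (Q, {P1, ..., Pk}). Paths are plain ElementTree path strings, a
# simplification of the paper's path language.
import xml.etree.ElementTree as ET

def satisfies_key(root, target_path, key_paths):
    seen = {}
    for node in root.findall(target_path):
        # The key value is the tuple of text values reached by the
        # key paths from the target node.
        value = tuple(node.findtext(p) for p in key_paths)
        if None in value:
            continue  # a key path is missing; skip (or reject) here
        if value in seen:
            return False  # two distinct targets share a key value
        seen[value] = node
    return True

doc = ET.fromstring(
    "<db><book><isbn>1</isbn></book><book><isbn>2</isbn></book></db>")
print(satisfies_key(doc, "book", ["isbn"]))  # True
```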


Journal of Computational Biology | 1995

Challenges in Integrating Biological Data Sources

Susan B. Davidson; G. Christian Overton; Peter Buneman

Scientific data of importance to biologists reside in a number of different data sources, such as GenBank, GSDB, SWISS-PROT, EMBL, and OMIM, among many others. Some of these data sources are conventional databases implemented using database management systems (DBMSs) and others are structured files maintained in a number of different formats (e.g., ASN.1 and ACE). In addition, software packages such as sequence analysis packages (e.g., BLAST and FASTA) produce data and can therefore be viewed as data sources. To counter the increasing dispersion and heterogeneity of data, different approaches to integrating these data sources are appearing throughout the bioinformatics community. This paper surveys the technical challenges to integration, classifies the approaches, and critiques the available tools and methodologies.


ACM Transactions on Database Systems | 1984

Optimism and consistency in partitioned distributed database systems

Susan B. Davidson

A protocol for transaction processing during partition failures is presented which guarantees mutual consistency between copies of data-items after repair is completed. The protocol is “optimistic” in that transactions are processed without restrictions during failure; conflicts are then detected at repair time using a precedence graph, and are resolved by backing out transactions according to some backout strategy. The resulting database state then corresponds to a serial execution of some subset of transactions run during the failure. Results from simulation and probabilistic modeling show that the optimistic protocol is a reasonable alternative in many cases. Conditions under which the protocol performs well are noted, and suggestions are made as to how performance can be improved. In particular, a backout strategy is presented which takes into account individual transaction costs and attempts to minimize total backout cost. Although the problem of choosing transactions to minimize total backout cost is, in general, NP-complete, the backout strategy is efficient and produces very good results.
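
The repair-time step can be pictured as cycle detection over the precedence graph, backing out the cheapest transaction on each detected cycle. The sketch below is a greedy stand-in for the paper's cost-based backout strategy (exact minimization being NP-complete); the graph encoding and cost model are assumptions for illustration.

```python
# Sketch of repair-time conflict resolution: back out the cheapest
# transaction on each cycle of the precedence graph until it is
# acyclic. A greedy stand-in for the paper's cost-based strategy.

def find_cycle(graph):
    """graph: dict txn -> set of txns it precedes. Returns a list of
    transactions forming a cycle, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in graph}
    stack = []
    def visit(t):
        color[t] = GRAY
        stack.append(t)
        for u in graph[t]:
            if color[u] == GRAY:
                return stack[stack.index(u):]  # found a cycle
            if color[u] == WHITE:
                cyc = visit(u)
                if cyc:
                    return cyc
        color[t] = BLACK
        stack.pop()
        return None
    for t in graph:
        if color[t] == WHITE:
            cyc = visit(t)
            if cyc:
                return cyc
    return None

def resolve(graph, cost):
    """Back out the cheapest transaction on each detected cycle until
    the precedence graph is acyclic; returns the backed-out set."""
    backed_out = set()
    while (cyc := find_cycle(graph)) is not None:
        victim = min(cyc, key=lambda t: cost[t])
        backed_out.add(victim)
        graph.pop(victim)
        for succs in graph.values():
            succs.discard(victim)
    return backed_out

# T1 and T2 conflicted across partitions (mutual precedence): the
# cheaper transaction is backed out.
g = {"T1": {"T2"}, "T2": {"T1"}, "T3": set()}
print(resolve(g, {"T1": 5, "T2": 2, "T3": 1}))  # {'T2'}
```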


International Conference on Extending Database Technology (EDBT) | 1992

Theoretical Aspects of Schema Merging

Peter Buneman; Susan B. Davidson; Anthony Kosky

A general technique for merging database schemas is developed that has a number of advantages over existing techniques, the most important of which is that schemas are placed in a partial order that has bounded joins. This means that the merging operation, when it succeeds, is both associative and commutative, i.e., that the merge of schemas is independent of the order in which they are considered — a property not possessed by existing methods. The operation is appropriate for the design of interactive programs as it allows user assertions about relationships between nodes in the schemas to be considered as elementary schemas. These can be combined with existing schemas using precisely the same merging operation.
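
A deliberately simplified model shows why a bounded join makes merging order-independent: if a schema were just a set of labeled edges ordered by inclusion, the join would be set union, which is associative and commutative. The toy sketch below is not the paper's construction, but it captures the algebraic property, including the treatment of user assertions as elementary schemas.

```python
# Toy model: schemas as sets of labeled edges (src, label, dst),
# ordered by inclusion; the join (merge) is set union. A deliberate
# simplification of the paper's partial order, illustrating why a
# bounded join makes merging order-independent.
from functools import reduce

def merge(*schemas):
    return reduce(set.union, schemas, set())

s1 = {("Person", "name", "String")}
s2 = {("Person", "age", "Int")}
s3 = {("Dept", "head", "Person")}

# Associative and commutative: any grouping or order gives the same merge.
assert merge(merge(s1, s2), s3) == merge(s1, merge(s3, s2)) == merge(s3, s2, s1)

# A user assertion ("Employee is a Person") is itself an elementary
# schema, combined with precisely the same merging operation.
assertion = {("Employee", "isa", "Person")}
print(sorted(merge(s1, s2, s3, assertion)))
```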


International Conference on Data Engineering (ICDE) | 2006

An Efficient XPath Query Processor for XML Streams

Yi Chen; Susan B. Davidson; Yifeng Zheng

Streaming XPath evaluation algorithms must record a potentially exponential number of pattern matches when both predicates and descendant axes are present in queries, and the XML data is recursive. In this paper, we use a compact data structure to encode these pattern matches rather than storing them explicitly. We then propose a polynomial time streaming algorithm to evaluate XPath queries by probing the data structure in a lazy fashion. Extensive experiments show that our approach not only has a good theoretical complexity bound but is also efficient in practice.
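
To illustrate the streaming setting (though not the paper's compact match encoding, which matters when predicates and recursion interact), the following toy matcher evaluates //a//b over SAX-style events using an ancestor counter, reporting matches without buffering the document; the query and event handling are assumptions for illustration.

```python
# Tiny streaming matcher for //a//b over SAX-style events: a counter
# tracks how many <a> ancestors are open, so matches are reported
# without buffering the document. Illustrates the streaming setting
# only; the paper's algorithm additionally encodes the potentially
# exponential set of pattern matches compactly.
import xml.parsers.expat

def match_a_then_b(xml_text):
    matches, open_a, depth = [], 0, 0

    def start(name, attrs):
        nonlocal open_a, depth
        depth += 1
        if name == "b" and open_a > 0:
            matches.append(depth)  # a <b> with an <a> ancestor
        if name == "a":
            open_a += 1

    def end(name):
        nonlocal open_a, depth
        if name == "a":
            open_a -= 1
        depth -= 1

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start
    p.EndElementHandler = end
    p.Parse(xml_text, True)
    return matches

# Recursive data: both <b> elements have an <a> ancestor.
print(match_a_then_b("<a><x><b/></x><a><b/></a></a>"))  # [3, 3]
```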


Computer Networks | 2002

Keys for XML

Peter Buneman; Susan B. Davidson; Wenfei Fan; Carmem S. Hara; Wang Chiew Tan

We discuss the definition of keys for XML documents, paying particular attention to the concept of a relative key, which is commonly used in hierarchically structured documents and scientific databases.


International Provenance and Annotation Workshop (IPAW) | 2006

A model for user-oriented data provenance in pipelined scientific workflows

Shawn Bowers; Timothy M. McPhillips; Bertram Ludäscher; Shirley Cohen; Susan B. Davidson

Integrated provenance support promises to be a chief advantage of scientific workflow systems over script-based alternatives. While it is often recognized that information gathered during scientific workflow execution can be used automatically to increase fault tolerance (via checkpointing) and to optimize performance (by reusing intermediate data products in future runs), it is perhaps more significant that provenance information may also be used by scientists to reproduce results from earlier runs, to explain unexpected results, and to prepare results for publication. Current workflow systems offer little or no direct support for these “scientist-oriented” queries of provenance information. Indeed, the use of advanced execution models in scientific workflows (e.g., process networks, which exhibit pipeline parallelism over streaming data) and the failure to record certain fundamental events, such as state resets of processes, can render existing provenance schemas useless for scientific applications of provenance. We develop a simple provenance model that is capable of supporting a wide range of scientific use cases, even for complex models of computation such as process networks. Our approach reduces these use cases to database queries over event logs, and is capable of reconstructing complete data and invocation dependency graphs for a workflow run.
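
As a sketch of the reduction to queries over event logs, the following reconstructs a data dependency graph from read/write events and traces the lineage of a result; the event fields and log format are illustrative assumptions, not the paper's schema.

```python
# Sketch: reconstructing a data dependency graph from a workflow
# event log. Each event records an actor invocation reading or
# writing a data token; an output depends on every input read by the
# same invocation. Field names are illustrative, not the paper's schema.
from collections import defaultdict

log = [
    ("inv1", "read",  "d0"),
    ("inv1", "write", "d1"),
    ("inv2", "read",  "d1"),
    ("inv2", "write", "d2"),
]

def dependency_graph(log):
    reads = defaultdict(set)
    for inv, kind, token in log:
        if kind == "read":
            reads[inv].add(token)
    deps = defaultdict(set)  # token -> tokens it was derived from
    for inv, kind, token in log:
        if kind == "write":
            deps[token] |= reads[inv]
    return deps

def lineage(token, deps):
    """Transitively trace a result back to the tokens it derives from."""
    out, stack = set(), [token]
    while stack:
        for src in deps.get(stack.pop(), ()):
            if src not in out:
                out.add(src)
                stack.append(src)
    return out

deps = dependency_graph(log)
print(lineage("d2", deps))  # {'d0', 'd1'} (order may vary)
```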

Collaboration


An overview of Susan B. Davidson's top co-authors and their affiliations.

Top Co-Authors

Insup Lee (University of Pennsylvania)
Yi Chen (New Jersey Institute of Technology)
Yifeng Zheng (University of Pennsylvania)
Sanjeev Khanna (University of Pennsylvania)
Carmem S. Hara (Federal University of Paraná)
Sudeepa Roy (University of Washington)
Victor Fay Wolfe (University of Rhode Island)