Peter Buneman | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Peter Buneman is active.

Explore More

Publication

Featured researches published by Peter Buneman.

symposium on principles of database systems | 1997

Semistructured data

Peter Buneman

In semistructured data, the information that is normally associated with a schema is contained within the data, which is sometimes called “self-describing”. In some forms of semistructured data there is no separate schema, in others it exists but only places loose constraints on the data. Semistructured data has recently emerged as an important topic of study for a variety of reasons. First, there are data sources such as the Web, which we would like to treat as databases but which cannot be constrained by a schema. Second, it may be desirable to have an extremely flexible format for data exchange between disparate databases. Third, even when dealing with structured data, it may be helpful to view it. as semistructured for the purposes of browsing. This tutorial will cover a number of issues surrounding such data: finding a concise formulation, building a sufficiently expressive language for querying and transformation, and optimizat,ion problems.

international conference on database theory | 1997

Adding Structure to Unstructured Data

Peter Buneman; Susan B. Davidson; Mary F. Fernández; Dan Suciu

We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and efficiently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query applications. Finally, we discuss how they may be used in query decomposition and optimization.

international conference on management of data | 2006

Provenance management in curated databases

Peter Buneman; Adriane Chapman; James Cheney

Curated databases in bioinformatics and other disciplines are the result of a great deal of manual annotation, correction and transfer of data from other sources. Provenance information concerning the creation, attribution, or version history of such data is crucial for assessing its integrity and scientific value. General purpose database systems provide little support for tracking provenance, especially when data moves among databases. This paper investigates general-purpose techniques for recording provenance for data that is copied among databases. We describe an approach in which we track the users actions while browsing source databases and copying data into a curated database, in order to record the users actions in a convenient, queryable form. We present an implementation of this technique and use it to evaluate the feasibility of database support for provenance management. Our experiments show that although the overhead of a naive approach is fairly high, it can be decreased to an acceptable level using simple optimizations.

very large data bases | 2000

UnQL: a query language and algebra for semistructured data based on structural recursion

Peter Buneman; Mary F. Fernández; Dan Suciu

Abstract. This paper presents structural recursion as the basis of the syntax and semantics of query languages for semistructured data and XML. We describe a simple and powerful query language based on pattern matching and show that it can be expressed using structural recursion, which is introduced as a top-down, recursive function, similar to the way XSL is defined on XML trees. On cyclic data, structural recursion can be defined in two equivalent ways: as a recursive function which evaluates the data top-down and remembers all its calls to avoid infinite loops, or as a bulk evaluation which processes the entire data in parallel using only traditional relational algebra operators. The latter makes it possible for optimization techniques in relational queries to be applied to structural recursion. We show that the composition of two structural recursion queries can be expressed as a single such query, and this is used as the basis of an optimization method for mediator systems. Several other formal properties are established: structural recursion can be expressed in first-order logic extended with transitive closure; its data complexity is PTIME; and over relational data it is a conservative extension of the relational calculus. The underlying data model is based on value equality, formally defined with bisimulation. Structural recursion is shown to be invariant with respect to value equality.

Journal of Combinatorial Theory | 1974

A Note on the Metric Properties of Trees

Peter Buneman

Proof. Necessity follows immediately, and to prove sufficiency, assume that the graph, T, contains a circuit. Choose a circuit of minimum lengthp. Since T contains no triangles, p = 4q + r with q 3 1 and 0 < r < 3. By the minimality of the circuit, distances between points on it can be measured along paths in the circuit. Therefore, we can choose points x, y, z, t in the circuit such that distances d(x, y), d( y, z), d(z, t), and d(t, x) are all either q or q + 1. These points then violate the four-point condition and it follows that there can be no circuit in T. We can also attach a positive weight A, to each edge of a tree and define a new distance d between points x, y of T by

international conference on database theory | 1995

Principles of programming with complex objects and collection types

Peter Buneman; Shamim A. Naqvi; Val Tannen; Limsoon Wong

Abstract We present a new principle for the development of database query languages that the primitive operations should be organized around types. Viewing a relational database as consisting of sets of records, this principle dectates that we should investigate separately operations for records and sets. There are two immediate advantages of this approach, which is partly inspired by basic ideas from category theoryl. First, it provides a language for structures in which record and set types may be freely combined: nested relations or complex objects. Second, the fundamental operations for sets are closely related to those for other “collection types” such as bags or lists, and this suggests how database languages may be uniformly extended to these new types. the most general operation on sets, that of structural recursion , is one in which not all programs are well-defined. In looking for limited forms of this operation that always give rise to well-defined operations, we find a number of close connection with exiting database languages, notably those developed for complex objects. Moreover, even though the general paradigm of structural recursion is shown to be no more expressive than one of the existing languages for complex objects, it possesses certain properties of uniformity that make it a better candidate for an efficient, practical language. Thus rather than developing query languages by extending, for example, relational calculus, we advocate a very powerful paradigm in which a number of well-known languages are to be found as natural sublanguages.

Information Systems | 2003

Reasoning about keys for XML

Peter Buneman; Susan B. Davidson; Wenfei Fan; Carmem S. Hara; Wang Chiew Tan

We study absolute and relative keys for XML, and investigate their associated decision problems. We argue that these keys are important to many forms of hierarchically structured data including XML documents. In contrast to other proposals of keys for XML, we show that these keys are always (finitely) satisfiable, and their (finite) implication problem is finitely axiomatizable. Furthermore, we provide a polynomial time algorithm for determining (finite) implication in the size of keys. Our results also demonstrate, among other things, that the analysis of XML keys is far more intricate than its relational counterpart.

very large data bases | 2003

Path queries on compressed XML

Peter Buneman; Martin Grohe; Christoph Koch

Central to any XML query language is a path language such as XPath which operates on the tree structure of the XML document. We demonstrate in this paper that the tree structure can be effectively compressed and manipulated using techniques derived from symbolic model checking. Specifically, we show first that succinct representations of document tree structures based on sharing subtrees are highly effective. Second, we show that compressed structures can be queried directly and efficiently through a process of manipulating selections of nodes and partial decompression. We study both the theoretical and experimental properties of this technique and provide algorithms for querying our compressed instances using node-selecting path query languages such as XPath. We believe the ability to store and manipulate large portions of the structure of very large XML documents in main memory is crucial to the development of efficient, scalable native XML databases and query engines.

symposium on principles of database systems | 2002

On propagation of deletions and annotations through views

Peter Buneman; Sanjeev Khanna; Wang Chiew Tan

We study two classes of view update problems in relational databases. We are given a source database S, a monotone query Q, and the view Q(S) generated by the query. The first problem that we consider is the classical view deletion problem where we wish to identify a minimal set T of tuples in S whose deletion will eliminate a given tuple t from the view. We study the complexity of optimizing two natural objectives in this setting, namely, find T to minimize the side-effects on the view, and the source, respectively. For both objective functions, we show a dichotomy in the complexity. Interestingly, the problem is either in P or is NP-hard, for queries in the same class in either objective function.The second problem in our study is the annotation placement problem. Suppose we annotate an attribute of a tuple in S. The rules for carrying the annotation forward through a query are easily stated. On the other hand, suppose we annotate an attribute of a tuple in the view Q(S), what annotation(s) in S will cause this annotation to appear in the view, minimizing the propagation to other attributes in Q(S)? View annotation is becoming an increasingly useful method of communicating meta-data among users of shared scientific data sets, and to our knowledge, there has been no formal study of this problem.Our study of these problems gives us important insights into computational issues involved in data provenance or lineage --- the process by which data moves through databases. We show that the two problems correspond to two fundamentally distinct notions of provenance, why and where-provenance.

Journal of Computational Biology | 1995

Challenges in Integrating Biological Data Sources

Susan B. Davidson; G. Christian Overton; Peter Buneman

Scientific data of importance to biologists reside in a number of different data sources, such as GenBank, GSDB, SWISS-PROT, EMBL, and OMIM, among many others. Some of these data sources are conventional databases implemented using database management systems (DBMSs) and others are structured files maintained in a number of different formats (e.g., ASN.1 and ACE). In addition, software packages such as sequence analysis packages (e.g., BLAST and FASTA) produce data and can therefore be viewed as data sources. To counter the increasing dispersion and heterogeneity of data, different approaches to integrating these data sources are appearing throughout the bioinformatics community. This paper surveys the technical challenges to integration, classifies the approaches, and critiques the available tools and methodologies.

Explore More