James Cheney | Researchain

Publication

Featured research published by James Cheney.

Foundations and Trends in Databases | 2009

Provenance in Databases: Why, How, and Where

James Cheney; Laura Chiticariu; Wang-Chiew Tan

Different notions of provenance for database queries have been proposed and studied in the past few years. In this article, we detail three main notions of database provenance and some of their applications, and compare and contrast them. Specifically, we review why, how, and where provenance, describe the relationships among these notions of provenance, and describe some of their applications in confidence computation, view maintenance and update, debugging, and annotation propagation.
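
As a rough illustration of the "why" notion mentioned in this abstract, the sketch below computes, for each output tuple of a small join-and-project query, the sets of input tuples that witness it. The relation names, tuple identifiers, and the function `join_project_with_why` are hypothetical and chosen only for this example; it is a minimal sketch of witness-style why-provenance, not the formal definition from the article.

```python
from collections import defaultdict


def join_project_with_why(employees, departments):
    """Join employees with departments on department name and project
    to (employee name, city), recording witness sets for each output.

    Both inputs map a tuple identifier to a tuple. The result maps each
    output tuple to a set of witnesses, where a witness is a frozenset
    of input tuple identifiers that together produce that output.
    """
    why = defaultdict(set)
    for e_id, (name, dept) in employees.items():
        for d_id, (dept_name, city) in departments.items():
            if dept == dept_name:
                why[(name, city)].add(frozenset({e_id, d_id}))
    return dict(why)


# Hypothetical input data, identified by made-up tuple ids.
employees = {
    "e1": ("Ann", "Sales"),
    "e2": ("Ann", "Ops"),
    "e3": ("Bob", "Ops"),
}
departments = {
    "d1": ("Sales", "Leeds"),
    "d2": ("Ops", "Leeds"),
}

result = join_project_with_why(employees, departments)
for output_tuple, witnesses in sorted(result.items()):
    print(output_tuple, sorted(sorted(w) for w in witnesses))
```

Here the output tuple `("Ann", "Leeds")` has two independent witnesses, one through each department, which is the kind of information a plain query result discards.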

programming language design and implementation | 2002

Region-based memory management in Cyclone

Dan Grossman; J. Gregory Morrisett; Trevor Jim; Michael Hicks; Yanling Wang; James Cheney

Cyclone is a type-safe programming language derived from C. The primary design goal of Cyclone is to let programmers control data representation and memory management without sacrificing type-safety. In this paper, we focus on the region-based memory management of Cyclone and its static typing discipline. The design incorporates several advancements, including support for region subtyping and a coherent integration with stack allocation and a garbage collector. To support separate compilation, Cyclone requires programmers to write some explicit region annotations, but a combination of default annotations, local type inference, and a novel treatment of region effects reduces this burden. As a result, we integrate C idioms in a region-based framework. In our experience, porting legacy C to Cyclone has required altering about 8% of the code; of the changes, only 6% (of the 8%) were region annotations.

international conference on management of data | 2006

Provenance management in curated databases

Peter Buneman; Adriane Chapman; James Cheney

Curated databases in bioinformatics and other disciplines are the result of a great deal of manual annotation, correction and transfer of data from other sources. Provenance information concerning the creation, attribution, or version history of such data is crucial for assessing its integrity and scientific value. General purpose database systems provide little support for tracking provenance, especially when data moves among databases. This paper investigates general-purpose techniques for recording provenance for data that is copied among databases. We describe an approach in which we track the user's actions while browsing source databases and copying data into a curated database, in order to record the user's actions in a convenient, queryable form. We present an implementation of this technique and use it to evaluate the feasibility of database support for provenance management. Our experiments show that although the overhead of a naive approach is fairly high, it can be decreased to an acceptable level using simple optimizations.

data compression conference | 2001

Compressing XML with multiplexed hierarchical PPM models

James Cheney

We established a working Extensible Markup Language (XML) compression benchmark based on text compression, and found that bzip2 compresses XML best, albeit more slowly than gzip. Our experiments verified that XMILL speeds up and improves compression using gzip and bounded-context PPM by up to 15%, but found that it worsens the compression for bzip2 and PPM. We describe alternative approaches to XML compression that illustrate other tradeoffs between speed and effectiveness. We describe experiments using several text compressors and XMILL to compress a variety of XML documents. Using these as a benchmark, we describe our two main results: an online binary encoding for XML called Encoded SAX (ESAX) that compresses better and faster than existing methods; and an online, adaptive, XML-conscious encoding based on prediction by partial match (PPM) called multiplexed hierarchical modeling (MHM) that compresses up to 35% better than any existing method but is fairly slow.

mathematics of program construction | 2012

Notions of Bidirectional Computation and Entangled State Monads

Faris Abou-Saleh; James Cheney; Jeremy Gibbons; James McKinna; Perdita Stevens

Bidirectional transformations (bx) support principled consistency maintenance between data sources. Each data source corresponds to one perspective on a composite system, manifested by operations to ‘get’ and ‘set’ a view of the whole from that particular perspective. Bx are important in a wide range of settings, including databases, interactive applications, and model-driven development. We show that bx are naturally modelled in terms of mutable state; in particular, the ‘set’ operations are stateful functions. This leads naturally to considering bx that exploit other computational effects too, such as I/O, nondeterminism, and failure, all largely ignored in the bx literature to date. We present a semantic foundation for symmetric bidirectional transformations with effects. We build on the mature theory of monadic encapsulation of effects in functional programming, develop the equational theory and important combinators for effectful bx, and provide a prototype implementation in Haskell along with several illustrative examples.
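
The idea that 'get' and 'set' act on shared mutable state can be sketched with a toy example. The class below is a deliberately simple, effect-free illustration, not the paper's entangled state monads or its Haskell prototype: the class `MoneyBx`, its method names, and the two views (a total in cents, and a dollars-and-cents pair) are assumptions made up for this sketch.

```python
class MoneyBx:
    """A toy bidirectional transformation over one hidden state.

    View A is a total amount in cents (an int).
    View B is a (dollars, cents) pair with 0 <= cents < 100.
    Setting either view updates the hidden state, so the other
    view stays consistent with it.
    """

    def __init__(self, total_cents=0):
        self._total = total_cents

    def get_a(self):
        return self._total

    def set_a(self, total_cents):
        self._total = total_cents

    def get_b(self):
        # divmod keeps the cents component in the range 0..99.
        return divmod(self._total, 100)

    def set_b(self, pair):
        dollars, cents = pair
        if not 0 <= cents < 100:
            raise ValueError("cents must be in the range 0..99")
        self._total = dollars * 100 + cents


bx = MoneyBx()
bx.set_a(1234)
view_b_after_set_a = bx.get_b()   # view B reflects the update made through A

bx.set_b((5, 7))
view_a_after_set_b = bx.get_a()   # view A reflects the update made through B

# Round-trip checks: reading back what was just set returns the same
# value, and re-setting the current view leaves the state unchanged.
set_then_get_ok = bx.get_b() == (5, 7)
bx.set_a(bx.get_a())
get_then_set_ok = bx.get_b() == (5, 7)
```

The two round-trip checks at the end correspond informally to the kind of equational laws such transformations are expected to satisfy; a faithful treatment would express them over a state monad, as the abstract describes.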

ACM Transactions on Database Systems | 2008

On the expressiveness of implicit provenance in query and update languages

Peter Buneman; James Cheney; Stijn Vansummeren

Information describing the origin of data, generally referred to as provenance, is important in scientific and curated databases where it is the basis for the trust one puts in their contents. Since such databases are constructed using operations of both query and update languages, it is of paramount importance to describe the effect of these languages on provenance. In this article we study provenance for query and update languages that are closely related to SQL, and compare two ways in which they can manipulate provenance so that elements of the input are rearranged to elements of the output: implicit provenance, where a query or update only provides the rearranged output, and provenance is provided implicitly by a default provenance semantics; and explicit provenance, where a query or update provides both the output and the description of the provenance of each component of the output. Although explicit provenance is in general more expressive, we show that the classes of implicit provenance operations expressible by query and update languages correspond to natural semantic subclasses of the explicit provenance queries. One of the consequences of this study is that provenance separates the expressive power of query and update languages. The model is also relevant to annotation propagation schemes in which annotations on the input to a query or update have to be transferred to the output or vice versa.

symposium/workshop on haskell | 2002

A lightweight implementation of generics and dynamics

James Cheney; Ralf Hinze

The recent years have seen a number of proposals for extending statically typed languages by dynamics or generics. Most proposals --- if not all --- require significant extensions to the underlying language. In this paper we show that this need not be the case. We propose a particularly lightweight extension that supports both dynamics and generics. Furthermore, the two features are smoothly integrated: dynamic values, for instance, can be passed to generic functions. Our proposal makes do with a standard Hindley-Milner type system augmented by existential types. Building upon these ideas we have implemented a small library that is readily usable both with Hugs and with the Glasgow Haskell compiler.

symposium on principles of database systems | 2008

Curated databases

Peter Buneman; James Cheney; Wang-Chiew Tan; Stijn Vansummeren

Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries -- dictionaries, encyclopedias, gazetteers etc. -- are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The value of curated databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area. Curated databases present a number of challenges for database research. The topics of annotation, provenance, and citation are central, because curated databases are heavily cross-referenced with, and include data from, other databases, and much of the work of a curator is annotating existing data. Evolution of structure is important because these databases often evolve from semistructured representations, and because they have to accommodate new scientific discoveries. Much of the work in these areas is in its infancy, but it is beginning to suggest new research for both theory and practice. We discuss some of this research and emphasize the need to find appropriate models of the processes associated with curated databases.

conference on object-oriented programming systems, languages, and applications | 2009

Provenance: a future history

James Cheney; Stephen Chong; Nate Foster; Margo I. Seltzer; Stijn Vansummeren

Science, industry, and society are being revolutionized by radical new capabilities for information sharing, distributed computation, and collaboration offered by the World Wide Web. This revolution promises dramatic benefits but also poses serious risks due to the fluid nature of digital information. One important cross-cutting issue is managing and recording provenance, or metadata about the origin, context, or history of data. We posit that provenance will play a central role in emerging advanced digital infrastructures. In this paper, we outline the current state of provenance research and practice, identify hard open research problems involving provenance semantics, formal modeling, and security, and articulate a vision for the future of provenance.

Mathematical Structures in Computer Science | 2011

Provenance as dependency analysis

James Cheney; Amal Ahmed; Umut A. Acar

Provenance is information recording the source, derivation or history of some information. Provenance tracking has been studied in a variety of settings, particularly database management systems. However, although many candidate definitions of provenance have been proposed, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterisation of such dependency provenance for a core database query language, show that minimal dependency provenance is not computable, and provide dynamic and static approximation techniques. We also discuss preliminary implementation experience with using dependency provenance to compute data slices, or summaries of the parts of the input relevant to a given part of the output.