Publications


Featured research published by Zachary G. Ives.


International Semantic Web Conference | 2007

DBpedia: a nucleus for a web of open data

Sören Auer; Christian Bizer; Georgi Kobilarov; Jens Lehmann; Richard Cyganiak; Zachary G. Ives

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human and machine consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.
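
As a concrete illustration of the "sophisticated queries" the abstract mentions, here is a minimal Python sketch that poses a SPARQL query against DBpedia's public endpoint. The SPARQLWrapper library, the endpoint URL, and the example query are assumptions for illustration, not details from the paper.

    # Hedged sketch: query DBpedia's public SPARQL endpoint for large
    # German cities. Endpoint URL and query are illustrative assumptions.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX dbr: <http://dbpedia.org/resource/>
        SELECT ?city ?population WHERE {
            ?city a dbo:City ;
                  dbo:country dbr:Germany ;
                  dbo:populationTotal ?population .
            FILTER (?population > 500000)
        }
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["city"]["value"], row["population"]["value"])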


International Conference on Management of Data | 1999

An adaptive query execution system for data integration

Zachary G. Ives; Daniela Florescu; Marc Friedman; Alon Y. Levy; Daniel S. Weld

Query processing in data integration occurs over network-bound, autonomous data sources. This requires extensions to traditional optimization and execution techniques for three reasons: there is an absence of quality statistics about the data, data transfer rates are unpredictable and bursty, and slow or unavailable data sources can often be replaced by overlapping or mirrored sources. This paper presents the Tukwila data integration system, designed to support adaptivity at its core using a two-pronged approach. Interleaved planning and execution with partial optimization allows Tukwila to quickly recover from decisions based on inaccurate estimates. During execution, Tukwila uses adaptive query operators such as the double pipelined hash join, which produces answers quickly, and the dynamic collector, which robustly and efficiently computes unions across overlapping data sources. We demonstrate that the Tukwila architecture extends previous innovations in adaptive execution (such as query scrambling, mid-execution re-optimization, and choose nodes), and we present experimental evidence that our techniques result in behavior desirable for a data integration system.
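
The double pipelined hash join named in the abstract is easy to sketch: each arriving tuple is inserted into its own input's hash table and immediately probed against the other input's table, so join results stream out before either input is fully read. A minimal Python sketch follows; the strict alternation between inputs is an illustrative simplification of Tukwila's scheduling.

    # Hedged sketch of a symmetric (double pipelined) hash join.
    from collections import defaultdict
    from itertools import zip_longest

    def double_pipelined_hash_join(left, right, key_left, key_right):
        left_table, right_table = defaultdict(list), defaultdict(list)
        # Alternate between inputs; in practice the scheduler would be
        # driven by whichever source has data available.
        for l, r in zip_longest(left, right):
            if l is not None:
                k = key_left(l)
                left_table[k].append(l)       # build own table
                for match in right_table[k]:  # probe the other table
                    yield (l, match)
            if r is not None:
                k = key_right(r)
                right_table[k].append(r)
                for match in left_table[k]:
                    yield (match, r)

    lhs = [(1, "a"), (2, "b")]
    rhs = [(2, "x"), (1, "y")]
    print(list(double_pipelined_hash_join(iter(lhs), iter(rhs),
                                          lambda t: t[0], lambda t: t[0])))
    # [((2, 'b'), (2, 'x')), ((1, 'a'), (1, 'y'))]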


International Conference on Data Engineering | 2003

Schema mediation in peer data management systems

Alon Y. Halevy; Zachary G. Ives; Dan Suciu; Igor Tatarinov

Intuitively, data management and data integration tools should be well-suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backwards compatibility. As a result, many small-scale data sharing tasks are more easily facilitated by nondatabase-oriented tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers' schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas. We consider the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas, which extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Finally, we describe several methods for optimizing the reformulation algorithm, and an initial set of experiments studying its performance.
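
As a rough sketch in our own notation (the paper's language is richer), such a mediation language provides two kinds of assertions: storage descriptions, which relate a peer's stored relations to its peer schema local-as-view-style, and peer mappings, which relate conjunctive queries over two different peers' schemas, generalizing both global-as-view and local-as-view:

    % Notation ours: R is a stored relation, P, P_1, P_2 are peer
    % schemas, and Q, Q_1, Q_2 are conjunctive queries over them.
    % Inclusion (rather than equality) models incomplete sources.
    \[
      R \subseteq Q(P) \qquad \text{or} \qquad R = Q(P)
    \]
    \[
      Q_1(P_1) \subseteq Q_2(P_2) \qquad \text{or} \qquad Q_1(P_1) = Q_2(P_2)
    \]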


International World Wide Web Conference | 2003

Piazza: data management infrastructure for semantic web applications

Alon Y. Halevy; Zachary G. Ives; Peter Mork; Igor Tatarinov

The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to be able to accommodate the huge amounts of existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, which maps both the domain structure and document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it.
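
The chaining of local mappings can be pictured as traversal of a mapping graph: a query posed at one node can draw on data from every node reachable through composed mappings. The Python sketch below captures only this reachability aspect; the graph representation and names are assumptions, and Piazza's actual algorithm additionally rewrites the query through each mapping it follows.

    # Hedged sketch: which peers can contribute data to a query,
    # assuming mappings form a directed graph between nodes.
    from collections import deque

    def reachable_sources(mappings, start):
        """mappings: dict mapping a node to the set of nodes it maps to."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in mappings.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen - {start}

    # Hypothetical three-node network with two point-to-point mappings.
    mappings = {"bio-db": {"gene-db"}, "gene-db": {"protein-db"}}
    print(reachable_sources(mappings, "bio-db"))  # {'gene-db', 'protein-db'}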


International Conference on Management of Data | 2001

Updating XML

Igor Tatarinov; Zachary G. Ives; Alon Y. Halevy; Daniel S. Weld

As XML has developed over the past few years, its role has expanded beyond its original domain as a semantics-preserving markup language for online documents, and it is now also the de facto format for interchanging data between heterogeneous systems. Data sources export XML “views” over their data, and other systems can directly import or query these views. As a result, there has been great interest in languages and systems for expressing queries over XML data, whether the XML is stored in a repository or generated as a view over some other data storage format. Clearly, in order to fully evolve XML into a universal data representation and sharing format, we must allow users to specify updates to XML documents and must develop techniques to process them efficiently. Update capabilities are important not only for modifying XML documents, but also for propagating changes through XML views and for expressing and transmitting changes to documents. This paper begins by proposing a set of basic update operations for both ordered and unordered XML data. We next describe extensions to the proposed standard XML query language, XQuery, to incorporate the update operations. We then consider alternative methods for implementing update operations when the XML data is mapped into a relational database. Finally, we describe an experimental evaluation of the alternative techniques for implementing our extensions.
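
To make the flavor of "basic update operations" concrete, here is a hedged Python sketch of delete and ordered insert primitives over an XML tree, using only the standard library. The function names and semantics are our illustration; the paper defines its operations as extensions to XQuery, not as a Python API.

    # Hedged sketch of XML update primitives (delete, ordered insert).
    import xml.etree.ElementTree as ET

    def delete_children(parent, tag):
        """Delete operation: remove all matching children of parent."""
        for child in list(parent.findall(tag)):
            parent.remove(child)

    def insert_at(parent, position, element):
        """Ordered insert: position matters for ordered XML data,
        but would be irrelevant for unordered XML data."""
        parent.insert(position, element)

    doc = ET.fromstring("<book><title>Old</title><price>30</price></book>")
    delete_children(doc, "price")
    insert_at(doc, 0, ET.fromstring("<author>Tatarinov et al.</author>"))
    print(ET.tostring(doc, encoding="unicode"))
    # <book><author>Tatarinov et al.</author><title>Old</title></book>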


International Conference on Management of Data | 2003

The Piazza peer data management project

Igor Tatarinov; Zachary G. Ives; Jayant Madhavan; Alon Y. Halevy; Dan Suciu; Nilesh N. Dalvi; Xin Dong; Yana Kadiyska; Gerome Miklau; Peter Mork

A major problem in today's information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing heterogeneous data in a distributed and scalable way. Piazza assumes that participants are interested in sharing data and willing to define pairwise mappings between their schemas. Users then formulate queries over their preferred schema, and a query answering system recursively expands any mappings relevant to the query, retrieving data from other peers. In this paper, we provide a brief overview of the Piazza project, including our work on developing mapping languages and query reformulation algorithms, assisting users in defining mappings, indexing, and enforcing access control over shared data.


Very Large Data Bases | 2002

An XML query engine for network-bound data

Zachary G. Ives; Alon Y. Halevy; Daniel S. Weld

XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has removed one of the major obstacles to integrating data from widely disparate sources, namely, the heterogeneity of data formats. However, general-purpose integration of data across the wide area also requires a query processor that can query data sources on demand, receive streamed XML data from them, and combine and restructure the data into new XML output, while providing good performance for both batch-oriented and ad hoc, interactive queries. This is the goal of the Tukwila data integration system, the first system that focuses on network-bound, dynamic XML data sources. In contrast to previous approaches, which must read, parse, and often store entire XML objects before querying them, Tukwila can return query results even as the data is streaming into the system. Tukwila is built with a new system architecture that extends adaptive query processing and relational-engine techniques into the XML realm, as facilitated by a pair of operators that incrementally evaluate a query's input path expressions as data is read. In this paper, we describe the Tukwila architecture and its novel aspects, and we experimentally demonstrate that Tukwila provides better overall query performance and faster initial answers than existing systems, and has excellent scalability.
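
The incremental evaluation of path expressions over streaming input can be approximated with a streaming parse, as in the hedged Python sketch below: matches are emitted as elements complete, without first loading or storing the whole document. The iterparse approach, the path representation, and the example are our assumptions, standing in for Tukwila's operators.

    # Hedged sketch: evaluate a simple root-to-leaf path expression
    # over streaming XML, yielding results incrementally.
    import io
    import xml.etree.ElementTree as ET

    def stream_path(source, path=("catalog", "item", "name")):
        stack = []
        for event, elem in ET.iterparse(source, events=("start", "end")):
            if event == "start":
                stack.append(elem.tag)
            else:
                if tuple(stack) == path:
                    yield elem.text
                stack.pop()
                elem.clear()  # discard processed content to bound memory

    xml = ("<catalog><item><name>A</name></item>"
           "<item><name>B</name></item></catalog>")
    print(list(stream_path(io.BytesIO(xml.encode()))))  # ['A', 'B']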


International Conference on Database Theory | 2009

Reconcilable differences

Todd J. Green; Zachary G. Ives; Val Tannen

Exact query reformulation using views in positive relational languages is well understood, and has a variety of applications in query optimization and data sharing. Generalizations to larger fragments of the relational algebra (RA) --- specifically, support for the difference operator --- would increase the options available for query reformulation, and also apply to view adaptation (updating a materialized view in response to a modified view definition) and view maintenance. Unfortunately, most questions about queries become undecidable in the presence of difference/negation. We present a novel way of managing this difficulty via an excursion through a non-standard semantics, Z-relations, where tuples are annotated with positive or negative integers. We show that under Z-semantics RA queries have a normal form as a single difference of positive queries and this leads to the decidability of equivalence. In most real-world settings with difference, it is possible to convert the queries to this normal form. We give a sound and complete algorithm that explores all reformulations of an RA query (under Z-semantics) using a set of RA views, finitely bounding the search space with a simple and natural cost model. We investigate related complexity questions, and we also extend our results to queries with built-in predicates. Z-relations are interesting in their own right because they capture updates and data uniformly. However, our algorithm turns out to be sound and complete also for bag semantics, albeit necessarily only for a subclass of RA. This subclass turns out to be quite large and covers generously the applications of interest to us. We also show a subclass of RA where reformulation and evaluation under Z-semantics can be combined with duplicate elimination to obtain the answer under set semantics.
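
The Z-relations semantics is straightforward to prototype: annotate each tuple with an integer, add annotations under union, and subtract them under difference, allowing annotations to go negative (joins multiply annotations, though the sketch below shows only union and difference). The dictionary representation is our illustrative assumption based on the abstract's description.

    # Hedged sketch of Z-relations: tuples annotated with integers.
    from collections import defaultdict

    def z_union(r, s):
        out = defaultdict(int)
        for rel in (r, s):
            for t, n in rel.items():
                out[t] += n  # union adds annotations
        return dict(out)

    def z_difference(r, s):
        out = defaultdict(int, r)
        for t, n in s.items():
            out[t] -= n      # difference subtracts, possibly below zero
        return dict(out)

    r = {("a",): 2, ("b",): 1}
    s = {("a",): 1}
    print(z_difference(r, s))              # {('a',): 1, ('b',): 1}
    print(z_union(s, z_difference(r, s)))  # recovers r exactly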


International Conference on Management of Data | 2007

ORCHESTRA: facilitating collaborative data sharing

Todd J. Green; Grigoris Karvounarakis; Nicholas E. Taylor; Olivier Biton; Zachary G. Ives; Val Tannen

One of the most elusive goals of structured data management has been sharing among large, heterogeneous populations: while data integration [4, 10] and exchange [3] are gradually being adopted by corporations or small confederations, little progress has been made in integrating broader communities. Yet the need for large-scale sharing of heterogeneous data is increasing: most of the sciences, particularly biology and astronomy, have become data-driven as they have attempted to tackle larger questions. The field of bioinformatics, in particular, has seen a plethora of different databases emerge: each is focused on a related but subtly different collection of organisms (e.g., CryptoDB, TIGR, FlyNome), genes (GenBank, GeneDB), proteins (UniProt, RCSB Protein Databank), diseases (OMIM, GeneDis), and so on. Such communities have a pressing need to interlink their heterogeneous databases in order to facilitate scientific discovery. Schemes for data sharing at scale have generally failed in the past because database approaches tend to impose strict global constraints: a single global schema, a (perhaps virtual) globally consistent data instance, and central administration. Each of these requirements is a barrier to participation: global schema design across a community is arduous and often requires many revisions; global consistency restricts a participant from disagreeing with others (if enforced), or may result in inconsistent answers (if unenforced); central administration impedes responsiveness to evolving requirements. Even the new approach of peer data management [9, 7], which supports multiple mediated schemas and thus distributes some aspects of administration and eliminates the need for global schema design, still limits local autonomy because of strong data consistency requirements. To sidestep these limitations, data providers typically resort to custom, ad hoc tools: scientific data sharing often consists of large databases placed on FTP sites, which users download and convert into their local format using custom Perl scripts. Meanwhile the original data sources continue to be edited. In some cases the data providers publish weekly or monthly lists of updates to help others keep in sync; however, few sites, except direct replicas, actually exploit these update lists; instead, different copies of the data are simply allowed to diverge. Our research goal is to provide a more principled and general-purpose infrastructure for data sharing with significant gains in terms of freshness, flexibility, functionality, and extensibility. Largely guided by the needs of biologists and other scientific users, but with a goal of addressing large-scale data sharing in the broader context, we define a model for a declarative, yet extremely flexible, approach to data sharing, called the collaborative data sharing system, or CDSS.


International Conference on Management of Data | 2008

The ORCHESTRA Collaborative Data Sharing System

Zachary G. Ives; Todd J. Green; Grigoris Karvounarakis; Nicholas E. Taylor; Val Tannen; Partha Pratim Talukdar; Marie Jacob; Fernando Pereira

Sharing structured data today requires standardizing upon a single schema, then mapping and cleaning all of the data. This results in a single queriable mediated data instance. However, for settings in which structured data is being collaboratively authored by a large community, e.g., in the sciences, there is often a lack of consensus about how it should be represented, what is correct, and which sources are authoritative. Moreover, such data is seldom static: it is frequently updated, cleaned, and annotated. The ORCHESTRA collaborative data sharing system develops a new architecture and consistency model for such settings, based on the needs of data sharing in the life sciences. In this paper we describe the basic architecture and implementation of the ORCHESTRA system, and summarize some of the open challenges that arise in this setting.

Collaboration


Dive into Zachary G. Ives's collaborations.

Top Co-Authors

AnHai Doan, University of Wisconsin-Madison
Boon Thau Loo, University of Pennsylvania
Igor Tatarinov, University of Washington
Val Tannen, University of Pennsylvania
Todd J. Green, University of Pennsylvania
Daniel S. Weld, University of Washington
Andreas Haeberlen, University of Pennsylvania