Grigoris Karvounarakis

Publication

Featured research published by Grigoris Karvounarakis.

symposium on principles of database systems | 2007

Provenance semirings

Todd J. Green; Grigoris Karvounarakis; Val Tannen

We show that relational algebra calculations for incomplete databases, probabilistic databases, bag semantics and why-provenance are particular cases of the same general algorithms involving semirings. This further suggests a comprehensive provenance representation that uses semirings of polynomials. We extend these considerations to datalog and semirings of formal power series. We give algorithms for datalog provenance calculation as well as datalog evaluation for incomplete and probabilistic databases. Finally, we show that for some semirings containment of conjunctive queries is the same as for standard set semantics.
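As an illustration of the idea in this abstract, the following minimal sketch evaluates the same join and projection code over two different semirings: the Booleans (set semantics) and the natural numbers (bag semantics). The names `Semiring`, `join`, `project`, `BOOL`, and `BAG` are hypothetical and chosen for this example only; they are not taken from the paper, and the join is restricted to a single fixed join condition to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (K, add, mul, zero, one). Illustrative only."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# An annotated relation is modelled as a dict: tuple -> annotation in K.

def join(sr, r, s):
    """Join r and s where r's last column equals s's first column.

    Annotations of joined tuples are combined with the semiring product.
    """
    out = {}
    for t1, k1 in r.items():
        for t2, k2 in s.items():
            if t1[-1] == t2[0]:
                t = t1 + t2[1:]
                out[t] = sr.add(out.get(t, sr.zero), sr.mul(k1, k2))
    return out


def project(sr, r, cols):
    """Project onto the given column positions.

    Annotations of tuples that collapse together are combined with the
    semiring sum.
    """
    out = {}
    for t, k in r.items():
        key = tuple(t[c] for c in cols)
        out[key] = sr.add(out.get(key, sr.zero), k)
    return out


# Set semantics: annotations are truth values.
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
# Bag semantics: annotations are multiplicities.
BAG = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)

R = {("a", "b"): 2, ("c", "b"): 3}
S = {("b", "d"): 5}

bag_result = project(BAG, join(BAG, R, S), (2,))
set_result = project(
    BOOL,
    join(BOOL, {t: True for t in R}, {t: True for t in S}),
    (2,),
)
```

Under bag semantics the tuple `("d",)` ends up with multiplicity 2*5 + 3*5 = 25, while under set semantics it is simply annotated `True`; only the semiring passed to the operators changes.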

international semantic web conference | 2005

Benchmarking database representations of RDF/S stores

Yannis Theoharis; Vassilis Christophides; Grigoris Karvounarakis

In this paper we benchmark three popular database representations of RDF/S schemata and data: (a) a schema-aware (i.e., one table per RDF/S class or property) with explicit (ISA) or implicit (NOISA) storage of subsumption relationships, (b) a schema-oblivious (i.e., a single table with triples of the form 〈subject-predicate-object〉), using (ID) or not (URI) identifiers to represent resources and (c) a hybrid of the schema-aware and schema-oblivious representations (i.e., one table per RDF/S meta-class by distinguishing also the range type of properties). Furthermore, we benchmark two common approaches for evaluating taxonomic queries either on-the-fly (ISA, NOISA, Hybrid), or by precomputing the transitive closure of subsumption relationships (MatView, URI, ID). The main conclusion drawn from our experiments is that the evaluation of taxonomic queries is most efficient over RDF/S stores utilizing the Hybrid and MatView representations. Of the rest, schema-aware representations (ISA, NOISA) exhibit overall better performance than URI, which is superior to that of ID, which exhibits the overall worst performance.
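The second evaluation strategy mentioned in this abstract, precomputing the transitive closure of subsumption relationships, can be sketched as follows. This is a simplified in-memory illustration, not the benchmarked database layout: the function names and the toy class hierarchy are assumptions made for this example.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of (subclass, superclass) pairs.

    Uses a naive fixpoint iteration; adequate for small hierarchies.
    """
    closure = set(edges)
    while True:
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


def instances_of(cls, type_assertions, closure):
    """Answer a taxonomic query: all resources typed with cls or a subclass.

    type_assertions is a set of (resource, class) pairs; closure is the
    precomputed subsumption closure.
    """
    wanted = {cls} | {sub for (sub, sup) in closure if sup == cls}
    return {res for (res, c) in type_assertions if c in wanted}


# Hypothetical toy schema and data.
subclass_of = {("Painter", "Artist"), ("Artist", "Person")}
types = {("picasso", "Painter"), ("rodin", "Artist"), ("alice", "Person")}

closure = transitive_closure(subclass_of)
people = instances_of("Person", types, closure)
```

With the closure stored once, each taxonomic query becomes a lookup rather than a recursive traversal of the hierarchy, which is the trade-off the precomputed representations exploit.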

international conference on management of data | 2007

ORCHESTRA: facilitating collaborative data sharing

Todd J. Green; Grigoris Karvounarakis; Nicholas E. Taylor; Olivier Biton; Zachary G. Ives; Val Tannen

One of the most elusive goals of structured data management has been sharing among large, heterogeneous populations: while data integration [4, 10] and exchange [3] are gradually being adopted by corporations or small confederations, little progress has been made in integrating broader communities. Yet the need for large-scale sharing of heterogeneous data is increasing: most of the sciences, particularly biology and astronomy, have become data-driven as they have attempted to tackle larger questions. The field of bioinformatics, in particular, has seen a plethora of different databases emerge: each is focused on a related but subtly different collection of organisms (e.g., CryptoDB, TIGR, FlyNome), genes (GenBank, GeneDB), proteins (UniProt, RCSB Protein Databank), diseases (OMIM, GeneDis), and so on. Such communities have a pressing need to interlink their heterogeneous databases in order to facilitate scientific discovery. Schemes for data sharing at scale have generally failed in the past because database approaches tend to impose strict global constraints: a single global schema, a (perhaps virtual) globally consistent data instance, and central administration. Each of these requirements is a barrier to participation: global schema design across a community is arduous and often requires many revisions; global consistency restricts a participant from disagreeing with others (if enforced), or may result in inconsistent answers (if unenforced); central administration impedes responsiveness to evolving requirements. Even the new approach of peer data management [9, 7], which supports multiple mediated schemas and thus distributes some aspects of administration and eliminates the need for global schema design, still limits local autonomy because of strong data consistency requirements.
To sidestep these limitations, data providers typically resort to custom, ad hoc tools: scientific data sharing often consists of large databases placed on FTP sites, which users download and convert into their local format using custom Perl scripts. Meanwhile the original data sources continue to be edited. In some cases the data providers publish weekly or monthly lists of updates to help others keep in sync; however, few sites, except direct replicas, actually exploit these update lists; instead, different copies of the data are simply allowed to diverge. Our research goal is to provide a more principled and general-purpose infrastructure for data sharing with significant gains in terms of freshness, flexibility, functionality, and extensibility. Largely guided by the needs of biologists and other scientific users, but with a goal of addressing large-scale data sharing in the broader context, we define a model for a declarative, yet extremely flexible, approach to data sharing, called the collaborative data sharing system, or CDSS.
Copyright is held by the author/owner(s). SIGMOD’07, June 11–14, 2007, Beijing, China. ACM 978-1-59593-686-8/07/0006.

international conference on management of data | 2008

The ORCHESTRA Collaborative Data Sharing System

Zachary G. Ives; Todd J. Green; Grigoris Karvounarakis; Nicholas E. Taylor; Val Tannen; Partha Pratim Talukdar; Marie Jacob; Fernando Pereira

Sharing structured data today requires standardizing upon a single schema, then mapping and cleaning all of the data. This results in a single queriable mediated data instance. However, for settings in which structured data is being collaboratively authored by a large community, e.g., in the sciences, there is often a lack of consensus about how it should be represented, what is correct, and which sources are authoritative. Moreover, such data is seldom static: it is frequently updated, cleaned, and annotated. The ORCHESTRA collaborative data sharing system develops a new architecture and consistency model for such settings, based on the needs of data sharing in the life sciences. In this paper we describe the basic architecture and implementation of the ORCHESTRA system, and summarize some of the open challenges that arise in this setting.

Computer Networks | 2003

Querying the Semantic Web with RQL

Grigoris Karvounarakis; A. Magganaraki; Sofia Alexaki; Vassilis Christophides; Dimitris Plexousakis; Michel Scholl; Karsten Tolle

Real-scale Semantic Web applications, such as Knowledge Portals and E-Marketplaces, require the management of voluminous repositories of resource metadata. The Resource Description Framework (RDF) enables the creation and exchange of metadata as any other Web data. Although large volumes of RDF descriptions are already appearing, sufficiently expressive declarative query languages for RDF are still missing. We propose RQL, a new query language adapting the functionality of semistructured or XML query languages to the peculiarities of RDF but also extending this functionality in order to uniformly query both RDF descriptions and schemas. RQL is a typed language, following a functional approach a la OQL and relies on a formal graph model that permits the interpretation of superimposed resource descriptions created using one or more RDF schemas. We illustrate the syntax, semantics and type system of RQL and report on the performance of RSSDB, our persistent RDF Store, for storing and querying voluminous RDF metadata.

international conference on management of data | 2012

Semiring-annotated data: queries and provenance?

Grigoris Karvounarakis; Todd J. Green

We present an overview of the literature on querying semiring-annotated data, a notion we introduced five years ago in a paper with Val Tannen. First, we show that positive relational algebra calculations for various forms of annotated relations, as well as provenance models for such queries, are particular cases of the same general algorithm involving commutative semirings. For this reason, we present a formal framework for answering queries on data with annotations from commutative semirings, and propose a comprehensive provenance representation based on semirings of polynomials. We extend these considerations to XQuery views over annotated, unordered XML data, and show that the semiring framework suffices for a large positive fragment of XQuery applied to such data. Finally, we conclude with a brief overview of the large body of work that builds upon these results, including both extensions to the theoretical foundations and uses in practical applications.

international conference on datalog in academia and industry | 2012

LogicBlox, platform and language: a tutorial

Todd J. Green; Molham Aref; Grigoris Karvounarakis

The modern enterprise software stack--a collection of applications supporting bookkeeping, analytics, planning, and forecasting for enterprise data--is in danger of collapsing under its own weight. The task of building and maintaining enterprise software is tedious and laborious; applications are cumbersome for end-users; and adapting to new computing hardware and infrastructures is difficult. We believe that much of the complexity in today's architecture is accidental, rather than inherent. This tutorial provides an overview of the LogicBlox platform, an ambitious redesign of the enterprise software stack centered around a unified declarative programming model, based on an extended version of Datalog.

IEEE Internet Computing | 2011

On Provenance of Queries on Semantic Web Data

Yannis Theoharis; Irini Fundulaki; Grigoris Karvounarakis; Vassilis Christophides

Capturing trustworthiness, reputation, and reliability of Semantic Web data manipulated by SPARQL requires researchers to represent adequate provenance information, usually modeled as source data annotations and propagated to query results along with a query evaluation. Alternatively, abstract provenance models can capture the relationship between query results and source data by taking into account the employed query operators. The authors argue the benefits of the latter for settings in which query results are materialized in several repositories and analyzed by multiple users. They also investigate how relational provenance models can be leveraged for SPARQL queries, and advocate for new provenance models.

international conference on database theory | 2013

Algebraic structures for capturing the provenance of SPARQL queries

Floris Geerts; Grigoris Karvounarakis; Vassilis Christophides; Irini Fundulaki

We show that the evaluation of SPARQL algebra queries on various notions of annotated RDF graphs can be seen as particular cases of the evaluation of these queries on RDF graphs annotated with elements of so-called spm-semirings. Spm-semirings extend semirings, used for positive relational algebra queries on annotated relational data, with a new operator to capture the semantics of the non-monotone SPARQL operator OPTIONAL. Furthermore, spm-semiring-based annotations ensure that desired SPARQL query equivalences hold when querying annotated RDF. In addition to introducing spm-semirings, we study their properties and provide an alternative characterization of these structures in terms of semirings with an embedded boolean algebra (or seba-structure for short). This characterization allows us to construct spm-semirings and to identify a universal object in the class of spm-semirings. Finally, we show that this universal object provides a concise provenance representation and can be used to evaluate SPARQL queries on arbitrary spm-semiring-annotated RDF graphs.

ACM Transactions on Database Systems | 2013

Collaborative data sharing via update exchange and provenance

Grigoris Karvounarakis; Todd J. Green; Zachary G. Ives; Val Tannen

Recent work [Ives et al. 2005] proposed a new class of systems for supporting data sharing among scientific and other collaborations: this new collaborative data sharing system connects heterogeneous logical peers using a network of schema mappings. Each peer has a locally controlled and edited database instance, but wants to incorporate related data from other peers as well. To achieve this, every peer's data and updates propagate along the mappings to the other peers. However, this operation, termed update exchange, is filtered by trust conditions (expressing what data and sources a peer judges to be authoritative), which may cause a peer to reject another's updates. In order to support such filtering, updates carry provenance information. This article develops methods for realizing such systems: we build upon techniques from data integration, data exchange, incremental view maintenance, and view update to propagate updates along mappings, both to derived and optionally to source instances. We incorporate a novel model for tracking data provenance, such that curators may filter updates based on trust conditions over this provenance. We implement our techniques in a layer above an off-the-shelf RDBMS, and we experimentally demonstrate the viability of these techniques in the Orchestra prototype system.