George H. L. Fletcher

Publication

Featured research published by George H. L. Fletcher.

Conference on Information and Knowledge Management | 2009

Scalable indexing of RDF graphs for efficient join processing

George H. L. Fletcher; Peter W. Beck

Current approaches to RDF graph indexing suffer from weak data locality, i.e., information regarding a piece of data appears in multiple locations, spanning multiple data structures. Weak data locality negatively impacts storage and query processing costs. Towards stronger data locality, we propose a Three-way Triple Tree (TripleT) secondary memory indexing technique to facilitate flexible and efficient join evaluation on RDF data. The novelty of TripleT is that the index is built over the atoms occurring in the data set, rather than at a coarser granularity, such as whole triples occurring in the data set; and, the atoms are indexed regardless of the roles (i.e., subjects, predicates, or objects) they play in the triples of the data set. We show through extensive empirical evaluation that TripleT exhibits multiple orders of magnitude improvement over the state-of-the-art, in terms of both storage and query processing costs.
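The core idea of the abstract above, one index keyed on atoms regardless of their role, can be illustrated with a minimal in-memory sketch. This is not the TripleT implementation: the paper describes a secondary memory structure, whereas the function `build_atom_index`, the bucket names `"s"`, `"p"`, `"o"`, and the sample data below are all hypothetical, and whole triples are stored in each bucket purely for simplicity.

```python
from collections import defaultdict


def build_atom_index(triples):
    """Map each atom to the triples it occurs in, bucketed by its role.

    Assumption: a plain dict stands in for the on-disk tree of the paper.
    """
    index = defaultdict(lambda: {"s": [], "p": [], "o": []})
    for s, p, o in triples:
        triple = (s, p, o)
        index[s]["s"].append(triple)  # atom occurs as subject
        index[p]["p"].append(triple)  # atom occurs as predicate
        index[o]["o"].append(triple)  # atom occurs as object
    return index


# Hypothetical sample data.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
]
index = build_atom_index(triples)

# An object-subject join on the atom "bob" needs a single index lookup:
# everything known about "bob" sits under one key.
joined = [(t1, t2) for t1 in index["bob"]["o"] for t2 in index["bob"]["s"]]
```

Because all occurrences of an atom are reachable from one key, a join on that atom touches one entry rather than several role-specific indexes, which is the data locality argument made in the abstract.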

Information Systems | 2009

A methodology for coupling fragments of XPath with structural indexes for XML documents

George H. L. Fletcher; Dirk Van Gucht; Yuqing Wu; Marc Gyssens; Sofia Brenes; Jan Paredaens

We introduce a new methodology for coupling language-induced partitions and index-induced partitions on XML documents that is aimed for the benefit of efficient evaluation of XPath queries. In particular, we identify XPath fragments which are ideally coupled with the newly introduced P(k)-partition which has its definition grounded in the well-known A(k) structural index and its associated partition. We then utilize these couplings to investigate fundamental questions about the use of structural indexes in XPath query evaluation.

International Semantic Web Conference | 2012

A structural approach to indexing triples

François Picalausa; Yongming Luo; George H. L. Fletcher; Jan Hidders; Stijn Vansummeren

As an essential part of the W3C's semantic web stack and linked data initiative, RDF data management systems (also known as triplestores) have drawn a lot of research attention. The majority of these systems use value-based indexes (e.g., B+-trees) for physical storage, and ignore many of the structural aspects present in RDF graphs. Structural indexes, on the other hand, have been successfully applied in XML and semi-structured data management to exploit structural graph information in query processing. In those settings, a structural index groups nodes in a graph based on some equivalence criterion, for example, indistinguishability with respect to some query workload (usually XPath). Motivated by this body of work, we have started the SAINT-DB project to study and develop a native RDF management system based on structural indexes. In this paper we present a principled framework for designing and using RDF structural indexes for practical fragments of SPARQL, based on recent formal structural characterizations of these fragments. We then explain how structural indexes can be incorporated in a typical query processing workflow; and discuss the design, implementation, and initial empirical evaluation of our approach.

Data-Centric Systems and Applications | 2012

Storing and Indexing Massive RDF Datasets

Yongming Luo; François Picalausa; George H. L. Fletcher; Jan Hidders; Stijn Vansummeren

The Resource Description Framework (RDF for short) provides a flexible method for modeling information on the Web [34, 40]. All data items in RDF are uniformly represented as triples of the form (subject, predicate, object), sometimes also referred to as (subject, property, value) triples.

British National Conference on Databases | 2013

Bisimulation reduction of big graphs on MapReduce

Yongming Luo; Yannick de Lange; George H. L. Fletcher; Paul De Bra; Jan Hidders; Yuqing Wu

Computing the bisimulation partition of a graph is a fundamental problem which plays a key role in a wide range of basic applications. Intuitively, two nodes in a graph are bisimilar if they share basic structural properties such as labeling and neighborhood topology. In data management, reducing a graph under bisimulation equivalence is a crucial step, e.g., for indexing the graph for efficient query processing. Often, graphs of interest in the real world are massive; examples include social networks and linked open data. For analytics on such graphs, it is becoming increasingly infeasible to rely on in-memory or even I/O-efficient solutions. Hence, a trend in Big Data analytics is the use of distributed computing frameworks such as MapReduce. While there are both internal and external memory solutions for efficiently computing bisimulation, there is, to our knowledge, no effective MapReduce-based solution for bisimulation. Motivated by these observations we propose in this paper the first efficient MapReduce-based algorithm for computing the bisimulation partition of massive graphs. We also detail several optimizations for handling the data skew which often arises in real-world graphs. The results of an extensive empirical study are presented which demonstrate the effectiveness and scalability of our solution.
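The intuition that bisimilar nodes share labels and neighborhood structure can be made concrete with a small in-memory sketch of signature-based partition refinement, bounded to k rounds. This is not the MapReduce algorithm of the paper; the function name `k_bisimulation_partition`, the input encoding, and the sample graph are assumptions made for illustration only.

```python
def k_bisimulation_partition(labels, edges, k):
    """Return a dict mapping each node to a block id after k refinement rounds.

    labels: dict node -> node label
    edges:  iterable of (source, edge_label, target)
    Two nodes get the same block id iff they agree on their own label and,
    round by round, on the set of (edge label, target block) pairs they reach.
    """
    out = {n: [] for n in labels}
    for s, edge_label, t in edges:
        out[s].append((edge_label, t))

    block = dict(labels)  # round 0: nodes are grouped by label only
    for _ in range(k):
        # A node's signature: its previous block plus the blocks it points to.
        sig = {
            n: (block[n], frozenset((l, block[t]) for l, t in out[n]))
            for n in labels
        }
        ids = {}
        block = {n: ids.setdefault(sig[n], len(ids)) for n in labels}
    return block


# Hypothetical sample graph: a and b look alike one step out, but not two.
labels = {"a": "P", "b": "P", "c": "Q", "d": "Q"}
edges = [("a", "e", "c"), ("b", "e", "d"), ("d", "e", "a")]

p1 = k_bisimulation_partition(labels, edges, 1)
p2 = k_bisimulation_partition(labels, edges, 2)
```

With radius 1, `a` and `b` fall in the same block because each has one `e`-edge to a `Q`-labeled node; with radius 2 they separate, since `d` has an outgoing edge and `c` does not.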

Very Large Data Bases | 2016

Generating flexible workloads for graph databases

Guillaume Bagan; Angela Bonifati; Radu Ciucanu; George H. L. Fletcher; Aurélien Lemay; Nicky Advokaat

Graph data management tools are nowadays evolving at a great pace. Key drivers of progress in the design and study of data intensive systems are solutions for synthetic generation of data and workloads, for use in empirical studies. Current graph generators, however, provide limited or no support for workload generation or are limited to fixed use-cases. Towards addressing these limitations, we demonstrate gMark, the first domain- and query language-independent framework for synthetic graph and query workload generation. Its novel features are: (i) fine-grained control of graph instance and query workload generation via expressive user-defined schemas; (ii) the support of expressive graph query languages, including recursion among other features; and, (iii) selectivity estimation of the generated queries. During the demonstration, we will showcase the highly tunable generation of graphs and queries through various user-defined schemas and targeted selectivities, and the variety of supported practical graph query languages. We will also show a performance comparison of four state-of-the-art graph database engines, which helps us understand their current strengths and desirable future extensions.

Conference on Information and Knowledge Management | 2013

External memory k-bisimulation reduction of big graphs

Yongming Luo; George H. L. Fletcher; Jan Hidders; Yuqing Wu; Paul De Bra

In this paper, we present, to our knowledge, the first known I/O efficient solutions for computing the k-bisimulation partition of a massive directed graph, and performing maintenance of such a partition upon updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence which intuitively groups together nodes in a graph which share fundamental structural features. k-bisimulation is the standard variant of bisimulation where the topological features of nodes are only considered within a local neighborhood of radius k > 0. The I/O cost of our partition construction algorithm is bounded by O(k · sort(|E_t|) + k · scan(|N_t|) + sort(|N_t|)), while our maintenance algorithms are bounded by O(k · sort(|E_t|) + k · scan(|N_t|)). The space complexity bounds are O(|N_t| + |E_t|) and O(k · |N_t| + k · |E_t|), resp. Here, |E_t| and |N_t| are the number of disk pages occupied by the input graph's edge set and node set, resp., and sort(n) and scan(n) are the cost of sorting and scanning, resp., a file occupying n pages in external memory. Empirical analysis on a variety of massive real-world and synthetic graph datasets shows that our algorithms perform efficiently in practice, scaling gracefully as graphs grow in size.

Foundations of Information and Knowledge Systems | 2012

The impact of transitive closure on the boolean expressiveness of navigational query languages on graphs

George H. L. Fletcher; Marc Gyssens; Dirk Leinders; Jan Van den Bussche; Dirk Van Gucht; Stijn Vansummeren; Yuqing Wu

Several established and novel applications motivate us to study the expressive power of navigational query languages on graphs, which represent binary relations. Our basic language has only the operators union and composition, together with the identity relation. Richer languages can be obtained by adding other features such as other set operators, projection and coprojection, converse, and the diversity relation. In this paper, we show that, when evaluated at the level of boolean queries with an unlabeled input graph (i.e., a single relation), adding transitive closure to the languages with coprojection adds expressive power, while this is not the case for the basic language to which none, one, or both of projection and the diversity relation are added. In combination with earlier work [10], these results yield a complete understanding of the impact of transitive closure on the languages under consideration.

Journal on Data Semantics | 2009

Towards a General Framework for Effective Solutions to the Data Mapping Problem

George H. L. Fletcher; Catharine M. Wyss

Automating the discovery of mappings between structured data sources is a long standing and important problem in data management. We discuss the rich history of the problem and the variety of technical solutions advanced in the database community over the previous four decades. Based on this discussion, we develop a basic statement of the data mapping problem and a general framework for reasoning about the design space of system solutions to the problem. We then concretely illustrate the framework with the Tupelo system for data mapping discovery, focusing on the important common case of relational data sources. Treating mapping discovery as example-driven search in a space of transformations, Tupelo generates queries encompassing the full range of structural and semantic heterogeneities encountered in relational data mapping. Hence, Tupelo is applicable in a wide range of data mapping scenarios. Finally, we present the results of extensive empirical validation, both on synthetic and real world datasets, indicating that the system is both viable and effective.

Journal of Logic and Computation | 2015

Similarity and bisimilarity notions appropriate for characterizing indistinguishability in fragments of the calculus of relations

George H. L. Fletcher; Marc Gyssens; Dirk Leinders; Jan Van den Bussche; Dirk Van Gucht; Stijn Vansummeren
