Jesse Weaver

Pacific Northwest National Laboratory

Publication

Featured research published by Jesse Weaver.

International Semantic Web Conference | 2009

Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples

Jesse Weaver; James A. Hendler

In this paper, we consider the problem of materializing the complete finite RDFS closure in a scalable manner; this includes those parts of the RDFS closure that are often ignored such as literal generalization and container membership properties. We point out characteristics of RDFS that allow us to derive an embarrassingly parallel algorithm for producing said closure, and we evaluate our C/MPI implementation of the algorithm on a cluster with 128 cores using different-size subsets of the LUBM 10,000-university data set. We show that the time to produce inferences scales linearly with the number of processes, evaluating this behavior on up to hundreds of millions of triples. We also show the number of inferences produced for different subsets of LUBM10k. To the best of our knowledge, our work is the first to provide RDFS inferencing on such large data sets in such low times. Finally, we discuss future work in terms of promising applications of this approach including OWL2RL rules, MapReduce implementations, and massive scaling on supercomputers.
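The embarrassingly parallel structure mentioned in the abstract can be pictured with a small sketch. It assumes that the schema triples are small enough to hand to every worker and that each instance triple can then be processed without looking at any other instance triple. Only the subclass, domain, and range rules are covered here; literal generalization and container membership properties are omitted. The function names and the toy data are hypothetical, and this is not the authors' C/MPI implementation.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def transitive_closure(pairs):
    """Map each class to the set of all its direct and indirect superclasses."""
    parents = {}
    for child, parent in pairs:
        parents.setdefault(child, set()).add(parent)
    closure = {}
    for node in parents:
        seen, stack = set(), list(parents[node])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        closure[node] = seen
    return closure


def infer_partition(schema, partition):
    """Apply the rule subset to one partition using only the shared schema."""
    superclasses = transitive_closure(
        [(s, o) for s, p, o in schema if p == SUBCLASS]
    )
    domains, ranges = {}, {}
    for s, p, o in schema:
        if p == DOMAIN:
            domains.setdefault(s, set()).add(o)
        elif p == RANGE:
            ranges.setdefault(s, set()).add(o)

    out = set(partition)
    for s, p, o in partition:
        # Collect (node, class) facts implied by this single triple.
        typed = [(s, c) for c in domains.get(p, ())]
        typed += [(o, c) for c in ranges.get(p, ())]
        if p == TYPE:
            typed.append((s, o))
        for node, cls in typed:
            out.add((node, TYPE, cls))
            for sup in superclasses.get(cls, ()):
                out.add((node, TYPE, sup))
    return out


def closure_over_partitions(schema, partitions):
    """Union the per-partition results; each call is independent of the others."""
    result = set()
    for part in partitions:
        result |= infer_partition(schema, part)
    return result


schema = [
    ("Student", SUBCLASS, "Person"),
    ("Person", SUBCLASS, "Agent"),
    ("advisor", DOMAIN, "Student"),
    ("advisor", RANGE, "Professor"),
]
instances = [("alice", "advisor", "bob"), ("carol", TYPE, "Student")]

whole = closure_over_partitions(schema, [instances])
split = closure_over_partitions(schema, [instances[:1], instances[1:]])
```

In this sketch `whole` and `split` contain the same triples, which is the property that would let each partition be assigned to a separate process with no communication during inference.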

Semantic Web archive | 2013

Facebook Linked Data via the Graph API

Jesse Weaver; Paul Tarjan

Facebook's Graph API is an API for accessing objects and connections in Facebook's social graph. To give some idea of the enormity of the social graph underlying Facebook, it was recently announced that Facebook has 901 million users, and the social graph consists of many types beyond just users. Until recently, the Graph API provided data to applications in only a JSON format. In 2011, an effort was undertaken to provide the same data in a semantically enriched RDF format containing Linked Data URIs. This was achieved by implementing a flexible and robust translation of the JSON output to a Turtle output. This paper describes the associated design decisions, the resulting Linked Data for objects in the social graph, and known issues.

IEEE Micro | 2014

Scaling Semantic Graph Databases in Size and Performance

Alessandro Morari; Vito Giovanni Castellana; Oreste Villa; Antonino Tumeo; Jesse Weaver; David J. Haglin; Sutanay Choudhury; John Feo

GEMS is a full software system that implements a large-scale, semantic graph database on commodity clusters. Its framework comprises a SPARQL-to-C++ compiler, a library of distributed data structures, and a custom multithreaded runtime library. The authors evaluated their software stack on the Berlin SPARQL benchmark with datasets of up to 10 billion graph edges, demonstrating scaling in dataset size and performance as they added cluster nodes.

IEEE Computer | 2015

In-Memory Graph Databases for Web-Scale Data

Vito Giovanni Castellana; Alessandro Morari; Jesse Weaver; Antonino Tumeo; David J. Haglin; Oreste Villa; John Feo

A software stack relies primarily on graph-based methods to implement scalable resource description framework databases on top of commodity clusters, providing an inexpensive way to extract meaning from volumes of heterogeneous data.

International Semantic Web Conference | 2011

Enabling fine-grained HTTP caching of SPARQL query results

Gregory Todd Williams; Jesse Weaver

As SPARQL endpoints are increasingly used to serve linked data, their ability to scale becomes crucial. Although much work has been done to improve query evaluation, little has been done to take advantage of caching. Effective solutions for caching query results can improve scalability by reducing latency, network IO, and CPU overhead. We show that simple augmentation of the database indexes found in common SPARQL implementations can directly lead to effective caching at the HTTP protocol level. Using tests from the Berlin SPARQL benchmark, we evaluate the potential of such caching to improve overall efficiency of SPARQL query evaluation.

Handbook of Semantic Web Technologies | 2011

KR and Reasoning on the Semantic Web: Web-Scale Reasoning

Spyros Kotoulas; Frank van Harmelen; Jesse Weaver

Reasoning is a key element of the Semantic Web. For the Semantic Web to scale, it is required that reasoning also scales. This chapter focuses on two approaches to achieve this: The first deals with increasing the computational power available for a given task by harnessing distributed resources. These distributed resources refer to peer-to-peer networks, federated data stores, or cluster-based computing. The second deals with containing the set of axioms that need to be considered for a given task. This can be achieved by using intelligent selection strategies and limiting the scope of statements. The former is exemplified by methods substituting expensive web-scale reasoning with the cheaper application of heuristics while the latter by methods to control the quality of the provided axioms. Finally, future issues concerning information centralization and logics vs information retrieval-based methods, metrics, and benchmarking are considered.

International Conference on Big Data | 2013

Accelerating semantic graph databases on commodity clusters

Alessandro Morari; Vito Giovanni Castellana; David J. Haglin; John Feo; Jesse Weaver; Antonino Tumeo; Oreste Villa

We are developing a full software system for accelerating semantic graph databases on commodity clusters that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL-to-C++ compiler, a library of parallel graph methods, and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over commodity clusters. We present preliminary results for the compiler and for the runtime.

Annual ACIS International Conference on Computer and Information Science | 2015

Enhancing the impact of science data toward data discovery and reuse

Alan R. Chappell; Jesse Weaver; Sumit Purohit; William P. Smith; Karen L. Schuchardt; Patrick West; Benno Lee; Peter Fox

The amount of data produced in support of scientific research continues to grow rapidly. Despite the accumulation of and demand for scientific data, relatively little data are actually made available to the broader scientific community. We surmise that one root of this problem is the perceived difficulty of electronically publishing scientific data and associated metadata in a way that makes it discoverable. We propose exploiting Semantic Web technologies and best practices to make metadata both discoverable and easy to publish. We share experiences in curating metadata to illustrate the cumbersome nature of data reuse in the current research environment. We also make recommendations with a real-world example of how data publishers can provide their metadata by adding limited additional markup to HTML pages on the Web. With little additional effort from data publishers, the difficulty of data discovery, access, and sharing can be greatly reduced and the impact of research data greatly enhanced.

International Conference on Cluster Computing | 2015

High-Performance, Distributed Dictionary Encoding of RDF Datasets

Alessandro Morari; Jesse Weaver; Oreste Villa; David J. Haglin; Antonino Tumeo; Vito Giovanni Castellana; John Feo

In this work we propose a novel approach for RDF (Resource Description Framework) dictionary encoding that employs a parallel RDF parser and a distributed dictionary data structure, exploiting RDF-specific optimizations. In contrast with previous solutions, this approach exploits the Partitioned Global Address Space (PGAS) programming model combined with active messages. We evaluate the performance of our dictionary encoder in our RDF database, GEMS (Graph Engine for Multithreaded Systems), and provide an empirical comparison against previous approaches. Our comparison shows that our dictionary encoder scales significantly better and achieves higher performance than the current state of the art, providing a key element for the realization of a more efficient RDF database.
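As a rough illustration of what dictionary encoding means here, the following sketch maps RDF terms to compact integer IDs and back. It is a single-process stand-in: the shards only imitate the idea of a dictionary spread over several owners, and the class name, the CRC32-based ownership rule, and the ID layout are all assumptions made for the example rather than details of the GEMS encoder, its PGAS model, or its active messages.

```python
import zlib


class ShardedDictionary:
    """Toy term dictionary split across a fixed number of shards."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.term_to_local = [dict() for _ in range(num_shards)]
        self.local_to_term = [list() for _ in range(num_shards)]

    def _owner(self, term):
        # Deterministic hash so the same term always lands on the same shard.
        return zlib.crc32(term.encode("utf-8")) % self.num_shards

    def encode(self, term):
        shard = self._owner(term)
        table = self.term_to_local[shard]
        if term not in table:
            table[term] = len(self.local_to_term[shard])
            self.local_to_term[shard].append(term)
        # Interleave the shard number into the ID so IDs are globally unique.
        return table[term] * self.num_shards + shard

    def decode(self, term_id):
        local, shard = divmod(term_id, self.num_shards)
        return self.local_to_term[shard][local]


def encode_triples(dictionary, triples):
    """Replace every term of every triple with its integer ID."""
    return [tuple(dictionary.encode(term) for term in triple) for triple in triples]


dictionary = ShardedDictionary(num_shards=4)
triples = [
    ("<http://example.org/alice>", "<http://example.org/knows>", "<http://example.org/bob>"),
    ("<http://example.org/bob>", "<http://example.org/knows>", "<http://example.org/alice>"),
]
encoded = encode_triples(dictionary, triples)
decoded = [tuple(dictionary.decode(i) for i in triple) for triple in encoded]
```

Because each term is owned by exactly one shard, two workers encoding the same term would consult the same owner, which is one simple way to keep IDs consistent without a central table.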

Handbook of Statistics | 2015

Chapter 14 – Scaling RDF Triple Stores in Size and Performance: Modeling SPARQL Queries as Graph Homomorphism Routines

Vito Giovanni Castellana; Jesse Weaver; Alessandro Morari; Antonino Tumeo; David J. Haglin; John Feo; Oreste Villa

This chapter discusses the approaches integrated in GEMS (Graph database Engine for Multithreaded Systems) for managing and querying datasets of RDF (Resource Description Framework) triples. GEMS is a software stack that implements graph databases on top of commodity, high-performance clusters. GEMS is composed of a SPARQL-to-C++ compiler, a library of data structures and parallel graph methods, and a multithreaded runtime library. Unlike other RDF databases, which resort to more conventional relational database approaches and largely employ table-based methods for query processing, GEMS mostly employs graph methods. Query processing in GEMS is performed through the conversion of SPARQL queries into graph homomorphism routines, which are then directly executed on the graph database (resulting from the RDF triples ingestion) through its runtime library. The runtime library, which implements a Partitioned Global Address Space (PGAS), lightweight software multithreading, and network message aggregation, mitigates some of the typical issues of graph processing on modern commodity clusters, enabling scaling in performance and size as new cluster nodes are added. Although very powerful, such clusters are built for regular computation and easily partitionable workloads, while graph processing typically has an irregular behavior. The chapter explains how SPARQL queries can be naturally modeled as graph pattern-matching algorithms and details how GEMS performs the conversion to C++ routines. It briefly discusses the other components of GEMS and then shows the results of the full stack on the Berlin SPARQL Benchmark (BSBM) and the SPARQL Performance Benchmark (SP2B). We discuss effects of the automatic conversion and present a comparison with a full custom appliance for data analytics (YarcData Urika).
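To make the idea of treating a SPARQL query as graph pattern matching concrete, here is a minimal sketch of matching a basic graph pattern by backtracking over a list of triples. It is written in Python for readability and is not the C++ that GEMS generates; the function names, the `?`-prefix convention for variables, and the toy data are assumptions for illustration. Two variables are allowed to bind to the same node, which is what makes the match a homomorphism rather than an isomorphism.

```python
def is_var(term):
    """Treat any term starting with '?' as a query variable."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend a binding so that the pattern matches the triple."""
    extended = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if is_var(pattern_term):
            # Bind the variable, or check consistency with an earlier binding.
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return extended


def match_bgp(patterns, triples, binding=None):
    """Return every variable binding that satisfies all triple patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    results = []
    for triple in triples:
        extended = match_pattern(patterns[0], triple, binding)
        if extended is not None:
            results.extend(match_bgp(patterns[1:], triples, extended))
    return results


triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "knows", "carol"),
]
# Roughly: SELECT ?x ?y ?z WHERE { ?x knows ?y . ?y knows ?z }
query = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
solutions = match_bgp(query, triples)
```

A real engine would use indexes instead of scanning every triple for each pattern; the scan is kept here only to keep the example short.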
