Jacopo Urbani | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jacopo Urbani is active.

Explore More

Publication

Featured researches published by Jacopo Urbani.

international semantic web conference | 2010

OWL reasoning with WebPIE: calculating the closure of 100 billion triples

Jacopo Urbani; Spyros Kotoulas; Jason Maassen; Frank van Harmelen; Henri E. Bal

In previous work we have shown that the MapReduce framework for distributed computation can be deployed for highly scalable inference over RDF graphs under the RDF Schema semantics. Unfortunately, several key optimizations that enabled the scalable RDFS inference do not generalize to the richer OWL semantics. In this paper we analyze these problems, and we propose solutions to overcome them. Our solutions allow distributed computation of the closure of an RDF graph under the OWL Horst semantics. We demonstrate the WebPIE inference engine, built on top of the Hadoop platform and deployed on a compute cluster of 64 machines. We have evaluated our approach using some real-world datasets (UniProt and LDSR, about 0.9-1.5 billion triples) and a synthetic benchmark (LUBM, up to 100 billion triples). Results show that our implementation is scalable and vastly outperforms current systems when comparing supported language expressivity, maximum data size and inference speed.

Journal of Web Semantics | 2014

Streaming the Web

Alessandro Margara; Jacopo Urbani; Frank van Harmelen; Henri E. Bal

In the last few years a new research area, called stream reasoning, emerged to bridge the gap between reasoning and stream processing. While current reasoning approaches are designed to work on mainly static data, the Web is, on the other hand, extremely dynamic: information is frequently changed and updated, and new data is continuously generated from a huge number of sources, often at high rate. In other words, fresh information is constantly made available in the form of streams of new data and updates.Despite some promising investigations in the area, stream reasoning is still in its infancy, both from the perspective of models and theories development, and from the perspective of systems and tools design and implementation.The aim of this paper is threefold: (i)?we identify the requirements coming from different application scenarios, and we isolate the problems they pose; (ii)?we survey existing approaches and proposals in the area of stream reasoning, highlighting their strengths and limitations; (iii)?we draw a research agenda to guide the future research and development of stream reasoning. In doing so, we also analyze related research fields to extract algorithms, models, techniques, and solutions that could be useful in the area of stream reasoning.

international semantic web conference | 2011

QueryPIE: backward reasoning for OWL horst over very large knowledge bases

Jacopo Urbani; Frank van Harmelen; Stefan Schlobach; Henri E. Bal

Both materialization and backward-chaining as different modes of performing inference have complementary advantages and disadvantages. Materialization enables very efficient responses at query time, but at the cost of an expensive up front closure computation, which needs to be redone every time the knowledge base changes. Backward-chaining does not need such an expensive and change-sensitive precomputation, and is therefore suitable for more frequently changing knowledge bases, but has to perform more computation at query time. Materialization has been studied extensively in the recent semantic web literature, and is now available in industrial-strength systems. In this work, we focus instead on backward-chaining, and we present an hybrid algorithm to perform efficient backward-chaining reasoning on very large datasets expressed in the OWL Horst (pD*) fragment. As a proof of concept, we have implemented a prototype called QueryPIE (Query Parallel Inference Engine), and we have tested its performance on different datasets of up to 1 billion triples. Our parallel implementation greatly reduces the reasoning complexity of a naive backward-chaining approach and returns results for single query-patterns in the order of milliseconds when running on a modest 8 machine cluster. To the best of our knowledge, QueryPIE is the first reported backward-chaining reasoner for OWL Horst that efficiently scales to a billion triples.

international semantic web conference | 2013

DynamiTE: Parallel Materialization of Dynamic RDF Data

Jacopo Urbani; Alessandro Margara; Ceriel J. H. Jacobs; Frank van Harmelen; Henri E. Bal

One of the main advantages of using semantically annotated data is that machines can reason on it, deriving implicit knowledge from explicit information. In this context, materializing every possible implicit derivation from a given input can be computationally expensive, especially when considering large data volumes. Most of the solutions that address this problem rely on the assumption that the information is static, i.e., that it does not change, or changes very infrequently. However, the Web is extremely dynamic: online newspapers, blogs, social networks, etc., are frequently changed so that outdated information is removed and replaced with fresh data. This demands for a materialization that is not only scalable, but also reactive to changes. In this paper, we consider the problem of incremental materialization, that is, how to update the materialized derivations when new data is added or removed. To this purpose, we consider the i¾?df RDFS fragment [12], and present a parallel system that implements a number of algorithms to quickly recalculate the derivation. In case new data is added, our system uses a parallel version of the well-known semi-naive evaluation of Datalog. In case of removals, we have implemented two algorithms, one based on previous theoretical work, and another one that is more efficient since it does not require a complete scan of the input. We have evaluated the performance using a prototype system called DynamiTE, which organizes the knowledge bases with a number of indices to facilitate the query process and exploits parallelism to improve the performance. The results show that our methods are indeed capable to recalculate the derivation in a short time, opening the door to reasoning on much more dynamic data than is currently possible.

high performance distributed computing | 2010

Massive Semantic Web data compression with MapReduce

Jacopo Urbani; Jason Maassen; Henri E. Bal

The Semantic Web consists of many billions of statements made of terms that are either URIs or literals. Since these terms usually consist of long sequences of characters, an effective compression technique must be used to reduce the data size and increase the application performance. One of the best known techniques for data compression is dictionary encoding. In this paper we propose a MapReduce algorithm that efficiently compresses and decompresses a large amount of Semantic Web data. We have implemented a prototype using the Hadoop framework and we report an evaluation of the performance. The evaluation shows that our approach is able to efficiently compress a large amount of data and that it scales linearly regarding the input size and number of nodes.

Grids, Clouds and Virtualization | 2011

Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds

Frank J. Seinstra; Jason Maassen; Rob V. van Nieuwpoort; Niels Drost; Timo van Kessel; Ben van Werkhoven; Jacopo Urbani; Ceriel J. H. Jacobs; Thilo Kielmann; Henri E. Bal

In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly wide spread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures currently are undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle .

extended semantic web conference | 2013

Seven Commandments for Benchmarking Semantic Flow Processing Systems

Thomas Scharrenbach; Jacopo Urbani; Alessandro Margara; Emanuele Della Valle; Abraham Bernstein

Over the last few years, the processing of dynamic data has gained increasing attention in the Semantic Web community. This led to the development of several stream reasoning systems that enable on-the-fly processing of semantically annotated data that changes over time. Due to their streaming nature, analyzing such systems is extremely difficult. Currently, their evaluation is conducted under heterogeneous scenarios, hampering their comparison and an understanding of their benefits and limitations. In this paper, we strive for a better understanding of the key challenges that these systems must face and define a generic methodology to evaluate their performance. Specifically, we identify three Key Performance Indicators and seven commandments that specify how to design the stress tests for system evaluation.

Semantic Web | 2013

Hybrid Reasoning on OWL RL

Jacopo Urbani; Robert Piro; Frank van Harmelen; Henri E. Bal

Both materialization and backward-chaining as different modes of performing inference have complementary advan- tages and disadvantages. Materialization enables very efficient responses at query time, but at the cost of an expensive up front closure computation, which needs to be redone every time the knowledge base changes. Backward-chaining does not need such an expensive and change-sensitive pre-computation, and is therefore suitable for more frequently changing knowledge bases, but has to perform more computation at query time. Materialization has been studied extensively in the recent semantic web literature, and is now available in industrial-strength systems. In this work, we focus instead on backward-chaining, and we present a general hybrid algorithm to perform efficient backward-chaining reasoning on very large RDF data sets. To this end, we analyze the correctness of our algorithm by proving its completeness using the theory developed in deductive databases and we introduce a number of techniques that exploit the characteristics of our method to execute efficiently (most of) the OWL RL rules. These techniques reduce the computation and hence improve the response time by reducing the size of the generated proof tree and the number of duplicates produced in the derivation. We have implemented these techniques in an experimental prototype called QueryPIE and present an evaluation on both realistic and artificial data sets of a size that is between five and ten billion of triples. The evaluation was performed using one machine with commodity hardware and it shows that (i) with our approach the initial pre-computation takes only a few minutes against the hours (or even days) necessary for a full materialization and that (ii) the remaining overhead introduced by reasoning still allows atomic queries to be processed with an interactive response time. To the best of our knowledge our method is the first that demonstrates complex rule-based reasoning at query time over an input of several billion triples and it takes a step forward towards truly large-scale reasoning by showing that complex and large-scale OWL inference can be performed without an expensive distributed hardware architecture.

international world wide web conferences | 2014

Efficient RDF stream reasoning with graphics processingunits (GPUs)

Chang Liu; Jacopo Urbani; Guilin Qi

In this paper, we study the problem of stream reasoning and propose a reasoning approach over large amounts of RDF data, which uses graphics processing units (GPU) to improve the performance. First, we show how the problem of stream reasoning can be reduced to a temporal reasoning problem. Then, we describe a number of algorithms to perform stream reasoning with GPUs.

european semantic web conference | 2015

A Compact In-Memory Dictionary for RDF Data

Hamid R. Bazoobandi; Steven de Rooij; Jacopo Urbani; Annette ten Teije; Frank van Harmelen; Henri E. Bal

While almost all dictionary compression techniques focus on static RDF data, we present a compact in-memory RDF dictionary for dynamic and streaming data. To do so, we analysed the structure of terms in real-world datasets and observed a high degree of common prefixes. We studied the applicability of Trie data structures on RDF data to reduce the memory occupied by common prefixes and discovered that all existing Trie implementations lead to either poor performance, or an excessive memory wastage. In our approach, we address the existing limitations of Tries for RDF data, and propose a new variant of Trie which contains some optimizations explicitly designed to improve the performance on RDF data. Furthermore, we show how we use this Trie as an in-memory dictionary by using as numerical ID a memory address instead of an integer counter. This design removes the need for an additional decoding data structure, and further reduces the occupied memory. An empirical analysis on real-world datasets shows that with a reasonable overhead our technique uses 50---59% less memory than a conventional uncompressed dictionary.

Explore More