Marcin Wylot | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Marcin Wylot is active.

Explore More

Publication

Featured researches published by Marcin Wylot.

IEEE Transactions on Knowledge and Data Engineering | 2016

DiploCloud: Efficient and Scalable Management of RDF Data in the Cloud

Marcin Wylot; Philippe Cudré-Mauroux

Despite recent advances in distributed RDF data management, processing large-amounts of RDF data in the cloud is still very challenging. In spite of its seemingly simple data model, RDF actually encodes rich and complex graphs mixing both instance and schema-level data. Sharding such data using classical techniques or partitioning the graph using traditional min-cut algorithms leads to very inefficient distributed operations and to a high number of joins. In this paper, we describe DiploCloud, an efficient and scalable distributed RDF data management system for the cloud. Contrary to previous approaches, DiploCloud runs a physiological analysis of both instance and schema information prior to partitioning the data. In this paper, we describe the architecture of DiploCloud, its main data structures, as well as the new algorithms we use to partition and distribute data. We also present an extensive evaluation of DiploCloud showing that our system is often two orders of magnitude faster than state-of-the-art systems on standard workloads.

First International Symposium on Data-Driven Process Discovery and Analysis (SIMPDA 2011) | 2011

BowlognaBench—Benchmarking RDF Analytics

Gianluca Demartini; Iliya Enchev; Marcin Wylot; Joël Gapany; Philippe Cudré-Mauroux

The proliferation of semantic data on the Web requires RDF database systems to constantly improve their scalability and efficiency. At the same time, users are increasingly interested in investigating large collections of online data by performing complex analytic queries (e.g.,“how did university student performance evolve over the last 5 years?”). This paper introduces a novel benchmark for evaluating and comparing the efficiency of Semantic Web data management systems on analytic queries. Our benchmark models a real-world setting derived from the Bologna process and offers a broad set of queries reflecting a large panel of concrete, data-intensive user needs.

IEEE Transactions on Knowledge and Data Engineering | 2017

Storing, Tracking, and Querying Provenance in Linked Data

Marcin Wylot; Philippe Cudré-Mauroux; Manfred Hauswirth; Paul T. Groth

The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triplestores. We present methods extending a native RDF store to efficiently handle the storage, tracking, and querying of provenance in RDF data. We describe a reliable and understandable specification of the way results were derived from the data and how particular pieces of data were combined to answer a query. Subsequently, we present techniques to tailor queries with provenance data. We empirically evaluate the presented methods and show that the overhead of storing and tracking provenance is acceptable. Finally, we show that tailoring a query with provenance information can also significantly improve the performance of query execution.

Handbook of Big Data Technologies | 2017

Linked Data Management

Manfred Hauswirth; Marcin Wylot; Martin Grund; Paul T. Groth; Philippe Cudré-Mauroux

The size of Linked Data is growing exponentially, thus a Linked Data management system has to be able to deal with increasing amounts of data. Additionally, in the Linked Data context, variety is especially important. In spite of its seemingly simple data model, Linked Data actually encodes rich and complex graphs mixing both instance and schema-level data. Since Linked Data is schema-free (i.e., the schema is not strict), standard databases techniques cannot be directly adopted to manage it. Even though organizing Linked Data in a form of a table is possible, querying a giant triple table becomes very costly due to the multiple nested joins required typical queries. The heterogeneity of Linked Data poses also entirely new challenges to database systems, where managing provenance information is becoming a requirement. Linked Data queries usually include multiple sources and results can be produced in various ways for a specific scenario. Such heterogeneous data can incorporate knowledge on provenance, which can be further leveraged to provide users with a reliable and understandable description of the way the query result was derived, and improve the query execution performance due to high selectivity of provenance information. In this chapter, we provide a detailed overview of current approaches specifically designed for Linked Data management. We focus on storage models, indexing techniques, and query execution strategies. Finally, we provide an overview of provenance models, definitions, and serialization techniques for Linked Data. We also survey the database management systems implementing techniques to manage provenance information in the context of Linked Data.

the internet of things | 2018

RDF4Led: an RDF engine for lightweight edge devices

Anh Le-Tuan; Conor Hayes; Marcin Wylot; Danh Le-Phuoc

Semantic interoperability for the Internet of Things(IoT) is being enabled by standards and technologies from the Semantic Web. As recent research suggests a move towards decentralised IoT architectures, our focus is on how to enable scalable and robust RDF engines that can be embedded throughout the architecture, in particular at edge nodes. RDF processing at edge enables the creation of semantic integration gateways for locally connected low-level devices. We introduce a lightweight RDF engine, which comprises of RDF storage and SPARQL processor, for the lightweight edge devices, called RDF4Led. RDF4Led follows the RISCstyle (Reduce Instruction Set Computer) design philosophy. The design comprises a flash-aware storage structure, an indexing scheme and a low-memory-footprint join algorithm which improves scalability as well as robustness over competing solutions. With a significantly smaller memory footprint, we show that RDF4Led can handle 2 to 5 times more data than RDF engines such as Jena TDB and Virtuoso. On three types of ARM boards, RDF4Led requires 10--30% memory of its competitors to operate up to 30 million triples dataset; it can perform faster updates and can scale better than Jena TDB and Virtuoso. Furthermore, we demonstrate considerably faster query operations than Jena TDB.

Archive | 2018