Publication


Featured research published by Marcin Wylot.


IEEE Transactions on Knowledge and Data Engineering | 2016

DiploCloud: Efficient and Scalable Management of RDF Data in the Cloud

Marcin Wylot; Philippe Cudré-Mauroux

Despite recent advances in distributed RDF data management, processing large amounts of RDF data in the cloud is still very challenging. In spite of its seemingly simple data model, RDF actually encodes rich and complex graphs mixing both instance and schema-level data. Sharding such data using classical techniques or partitioning the graph using traditional min-cut algorithms leads to very inefficient distributed operations and to a high number of joins. In this paper, we describe DiploCloud, an efficient and scalable distributed RDF data management system for the cloud. Contrary to previous approaches, DiploCloud runs a physiological analysis of both instance and schema information prior to partitioning the data. In this paper, we describe the architecture of DiploCloud, its main data structures, as well as the new algorithms we use to partition and distribute data. We also present an extensive evaluation of DiploCloud showing that our system is often two orders of magnitude faster than state-of-the-art systems on standard workloads.


First International Symposium on Data-Driven Process Discovery and Analysis (SIMPDA 2011) | 2011

BowlognaBench—Benchmarking RDF Analytics

Gianluca Demartini; Iliya Enchev; Marcin Wylot; Joël Gapany; Philippe Cudré-Mauroux

The proliferation of semantic data on the Web requires RDF database systems to constantly improve their scalability and efficiency. At the same time, users are increasingly interested in investigating large collections of online data by performing complex analytic queries (e.g., “how did university student performance evolve over the last 5 years?”). This paper introduces a novel benchmark for evaluating and comparing the efficiency of Semantic Web data management systems on analytic queries. Our benchmark models a real-world setting derived from the Bologna process and offers a broad set of queries reflecting a large panel of concrete, data-intensive user needs.


IEEE Transactions on Knowledge and Data Engineering | 2017

Storing, Tracking, and Querying Provenance in Linked Data

Marcin Wylot; Philippe Cudré-Mauroux; Manfred Hauswirth; Paul T. Groth

The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triplestores. We present methods extending a native RDF store to efficiently handle the storage, tracking, and querying of provenance in RDF data. We describe a reliable and understandable specification of the way results were derived from the data and how particular pieces of data were combined to answer a query. Subsequently, we present techniques to tailor queries with provenance data. We empirically evaluate the presented methods and show that the overhead of storing and tracking provenance is acceptable. Finally, we show that tailoring a query with provenance information can also significantly improve the performance of query execution.


Handbook of Big Data Technologies | 2017

Linked Data Management

Manfred Hauswirth; Marcin Wylot; Martin Grund; Paul T. Groth; Philippe Cudré-Mauroux

The size of Linked Data is growing exponentially, thus a Linked Data management system has to be able to deal with increasing amounts of data. Additionally, in the Linked Data context, variety is especially important. In spite of its seemingly simple data model, Linked Data actually encodes rich and complex graphs mixing both instance and schema-level data. Since Linked Data is schema-free (i.e., the schema is not strict), standard database techniques cannot be directly adopted to manage it. Even though organizing Linked Data in the form of a table is possible, querying a giant triple table becomes very costly due to the multiple nested joins required by typical queries. The heterogeneity of Linked Data also poses entirely new challenges to database systems, where managing provenance information is becoming a requirement. Linked Data queries usually include multiple sources, and results can be produced in various ways for a specific scenario. Such heterogeneous data can incorporate knowledge on provenance, which can be further leveraged to provide users with a reliable and understandable description of the way the query result was derived, and to improve query execution performance due to the high selectivity of provenance information. In this chapter, we provide a detailed overview of current approaches specifically designed for Linked Data management. We focus on storage models, indexing techniques, and query execution strategies. Finally, we provide an overview of provenance models, definitions, and serialization techniques for Linked Data. We also survey the database management systems implementing techniques to manage provenance information in the context of Linked Data.


The Internet of Things | 2018

RDF4Led: an RDF engine for lightweight edge devices

Anh Le-Tuan; Conor Hayes; Marcin Wylot; Danh Le-Phuoc

Semantic interoperability for the Internet of Things (IoT) is being enabled by standards and technologies from the Semantic Web. As recent research suggests a move towards decentralised IoT architectures, our focus is on how to enable scalable and robust RDF engines that can be embedded throughout the architecture, in particular at edge nodes. RDF processing at the edge enables the creation of semantic integration gateways for locally connected low-level devices. We introduce RDF4Led, a lightweight RDF engine for lightweight edge devices that comprises RDF storage and a SPARQL processor. RDF4Led follows the RISC-style (Reduced Instruction Set Computer) design philosophy. The design comprises a flash-aware storage structure, an indexing scheme, and a low-memory-footprint join algorithm which improves scalability as well as robustness over competing solutions. With a significantly smaller memory footprint, we show that RDF4Led can handle 2 to 5 times more data than RDF engines such as Jena TDB and Virtuoso. On three types of ARM boards, RDF4Led requires 10-30% of the memory of its competitors to operate on datasets of up to 30 million triples; it can perform faster updates and can scale better than Jena TDB and Virtuoso. Furthermore, we demonstrate considerably faster query operations than Jena TDB.


Archive | 2018

Distributed RDF Query Processing

Sherif Sakr; Marcin Wylot; Raghava Mutharaju; Danh Le Phuoc; Irini Fundulaki

With increasing sizes of RDF datasets, executing complex queries on a single node has turned out to be impractical, especially when the node’s main memory is dwarfed by the volume of the dataset. Therefore, there is a crucial need for distributed systems with a high degree of parallelism that can satisfy the performance demands of complex SPARQL queries. In this chapter, we give an overview of various techniques and systems for efficiently querying large RDF datasets in distributed environments.


Archive | 2018

Processing of RDF Stream Data

Sherif Sakr; Marcin Wylot; Raghava Mutharaju; Danh Le Phuoc; Irini Fundulaki

We are witnessing a paradigm shift, where real-time, time-dependent data is becoming ubiquitous. As Linked Data facilitates the data integration process among heterogeneous data sources, RDF Stream Data has the same goal with respect to data streams. It bridges the gap between stream and more static data sources. To support the processing of RDF stream data, there is a need to investigate how to extend RDF to model and represent stream data. Then, from the RDF-based data representation, the query processing models need to be defined to build a stream processing engine that is tailored for streaming data. This chapter provides an overview of how such requirements are addressed in the current state of the art of RDF Stream Data processing.


Archive | 2018

Centralized RDF Query Processing

Sherif Sakr; Marcin Wylot; Raghava Mutharaju; Danh Le Phuoc; Irini Fundulaki

The wide adoption of the RDF data model has called for efficient and scalable RDF query processing schemes. As a response to this call, a number of centralized RDF query processing systems have been designed to tackle this challenge. In these systems, the storage and query processing of RDF datasets are managed on a single node. In this chapter, we give an overview of various techniques and systems for centrally querying RDF datasets.


Archive | 2018

Provenance Management for Linked Data

Sherif Sakr; Marcin Wylot; Raghava Mutharaju; Danh Le Phuoc; Irini Fundulaki

The term Provenance refers to the origin of information and is used to describe where and how the data was obtained. Provenance is versatile and could include various types of information, such as the source of the data, information on the processes that led to a certain result, date of creation or last modification, and authorship. Recording and managing the provenance of data is of paramount importance, as it allows supporting trust mechanisms, access control and privacy policies, digital rights management, quality management and assessment, in addition to reputability, reliability and accountability of data sources.


Archive | 2018

Benchmarking RDF Query Engines and Instance Matching Systems

Sherif Sakr; Marcin Wylot; Raghava Mutharaju; Danh Le Phuoc; Irini Fundulaki

Standards and benchmarking have traditionally been used as the main tools to formally define and provably illustrate the level of the adequacy of systems to address the new challenges. In this chapter, we discuss benchmarks for RDF query engines and instance matching systems. In practice, benchmarks are used to inform users of the strengths and weaknesses of competing tools and approaches, but more importantly, they encourage the advancement of technology by providing both academia and industry with clear targets for performance and functionality.

Collaboration


Dive into Marcin Wylot's collaborations.

Top Co-Authors

Sherif Sakr

King Saud bin Abdulaziz University for Health Sciences

Danh Le Phuoc

Technical University of Berlin

Manfred Hauswirth

Technical University of Berlin

Danh Le-Phuoc

Technical University of Berlin
