Alexander Schätzle | Researchain

Publication

Featured research published by Alexander Schätzle.

Proceedings of the International Workshop on Semantic Web Information Management | 2011

PigSPARQL: mapping SPARQL to Pig Latin

Alexander Schätzle; Martin Przyjaciel-Zablocki; Georg Lausen

In this paper we investigate the scalable processing of complex SPARQL queries on very large RDF datasets. As the underlying platform we use Apache Hadoop, an open source implementation of Google's MapReduce for massively parallelized computations on a computer cluster. We introduce PigSPARQL, a system that makes it possible to process complex SPARQL queries on a MapReduce cluster. To this end, SPARQL queries are translated into Pig Latin, a data analysis language developed by Yahoo! Research. Pig Latin programs are executed by a series of MapReduce jobs on a Hadoop cluster. We evaluate the processing of SPARQL queries by means of PigSPARQL using SP2Bench, a SPARQL-specific performance benchmark, and demonstrate that PigSPARQL enables a scalable execution of SPARQL queries based on Hadoop without any additional programming effort.

Very Large Data Bases | 2016

S2RDF: RDF querying with SPARQL on Spark

Alexander Schätzle; Martin Przyjaciel-Zablocki; Simon Skilevic; Georg Lausen

RDF has become very popular for semantic data publishing due to its flexible and universal graph-like data model. Thus, the ever-increasing size of RDF data collections raises the need for scalable distributed approaches. We endorse the usage of existing infrastructures for Big Data processing like Hadoop for this purpose. Yet, SPARQL query performance is a major challenge as Hadoop was not designed with RDF processing in mind. Existing approaches often favor certain query pattern shapes while performance drops significantly for other shapes. In this paper, we introduce a novel relational partitioning schema for RDF data called ExtVP that uses a semi-join based preprocessing, akin to the concept of Join Indices in relational databases, to efficiently minimize query input size regardless of its pattern shape and diameter. Our prototype system S2RDF is built on top of Spark and uses SQL to execute SPARQL queries over ExtVP. We demonstrate its superior performance in comparison to state-of-the-art SPARQL-on-Hadoop approaches.
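
The semi-join preprocessing mentioned in this abstract can be illustrated with a small in-memory sketch. This is not the S2RDF implementation (which runs on Spark); it assumes a plain vertical partitioning with one (subject, object) table per predicate and shows only one possible correlation, where the object of one predicate must occur as a subject of another. The function names and sample triples are hypothetical.

```python
from collections import defaultdict

def vertical_partition(triples):
    """Group (s, p, o) triples into one (s, o) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)

def semi_join_os(left, right):
    """Keep only rows of `left` whose object occurs as a subject in `right`.

    Rows dropped here could never contribute to a join of the form
    ?x left ?y . ?y right ?z, so a query can read the smaller table instead.
    """
    right_subjects = {s for s, _ in right}
    return [(s, o) for s, o in left if o in right_subjects]

# Hypothetical sample data, for illustration only.
triples = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "follows", "dave"),
    ("bob", "likes", "post1"),
    ("dave", "likes", "post2"),
]

vp = vertical_partition(triples)
# Precomputed reduction of "follows" with respect to "likes".
reduced_follows = semi_join_os(vp["follows"], vp["likes"])
print(reduced_follows)
```

In this toy example the reduced table holds two of the three `follows` rows, because only `bob` and `dave` appear as subjects of `likes`.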

International Semantic Web Conference | 2014

Sempala: Interactive SPARQL Query Processing on Hadoop

Alexander Schätzle; Martin Przyjaciel-Zablocki; Antony Neu; Georg Lausen

Driven by initiatives like Schema.org, the amount of semantically annotated data is expected to grow steadily towards massive scale, requiring cluster-based solutions to query it. At the same time, Hadoop has become dominant in the area of Big Data processing, with large infrastructures already deployed and used in manifold application fields. For Hadoop-based applications, a common data pool (HDFS) provides many synergy benefits, making it very attractive to use these infrastructures for semantic data processing as well. Indeed, existing SPARQL-on-Hadoop (MapReduce) approaches have already demonstrated very good scalability. However, query runtimes are rather slow due to the underlying batch processing framework. While this is acceptable for data-intensive queries, it is not satisfactory for the majority of SPARQL queries, which are typically much more selective and require only small subsets of the data. In this paper, we present Sempala, a SPARQL-over-SQL-on-Hadoop approach designed with selective queries in mind. Our evaluation shows performance improvements by an order of magnitude compared to existing approaches, paving the way for interactive-time SPARQL query processing on Hadoop.

International Semantic Web Conference | 2011

RDFPath: path query processing on large RDF graphs with MapReduce

Martin Przyjaciel-Zablocki; Alexander Schätzle; Thomas Hornung; Georg Lausen

The MapReduce programming model has gained traction in different application areas in recent years, ranging from the analysis of log files to the computation of the RDFS closure. Yet, for most users the MapReduce abstraction is too low-level since even simple computations have to be expressed as Map and Reduce phases. In this paper we propose RDFPath, an expressive RDF path query language geared towards casual users that benefits from the scaling properties of the MapReduce framework by automatically transforming declarative path queries into MapReduce jobs. Our evaluation on a real world data set shows the applicability of RDFPath for investigating typical graph properties like shortest paths.

Very Large Data Bases | 2015

S2X: Graph-Parallel Querying of RDF with GraphX

Alexander Schätzle; Martin Przyjaciel-Zablocki; Thorsten Berberich; Georg Lausen

RDF has constantly gained attention for data publishing due to its flexible data model, raising the need for distributed querying. However, existing approaches using general-purpose cluster frameworks employ a record-oriented perception of RDF, ignoring its inherent graph-like structure. Recently, GraphX was published as a graph abstraction on top of Spark, an in-memory cluster computing system. It allows graph-parallel and data-parallel computation to be seamlessly combined in a single system, a unique feature not available in other systems. In this paper we introduce S2X, a SPARQL query processor for Hadoop where we leverage this unified abstraction by implementing basic graph pattern matching of SPARQL as a graph-parallel task while other operators are implemented in a data-parallel manner. To the best of our knowledge, this is the first approach to combine graph-parallel and data-parallel computation for SPARQL querying of RDF data based on Hadoop.

International Conference on Management of Data | 2013

Large-scale bisimulation of RDF graphs

Alexander Schätzle; Antony Neu; Georg Lausen; Martin Przyjaciel-Zablocki

RDF datasets with billions of triples are no longer unusual and continue to grow constantly (e.g. LOD cloud), driven by the inherent flexibility of RDF, which allows very diverse datasets to be represented, ranging from highly structured to unstructured data. Because of their size, understanding and processing RDF graphs is often a difficult task, and methods that reduce the size while keeping as much of the structural information as possible become attractive. In this paper we study bisimulation as a means to reduce the size of RDF graphs according to structural equivalence. We study two bisimulation algorithms, one for sequential execution using SQL and one for distributed execution using MapReduce. We demonstrate that the MapReduce-based implementation scales linearly with the number of RDF triples, making it possible to compute the bisimulation of very large RDF graphs within a time that is far out of reach for the sequential version. Experiments based on synthetic benchmark data and real data (DBpedia) exhibit a reduction of more than 90% of the size of the RDF graph, comparing the number of nodes to the number of blocks in the resulting bisimulation partition.
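
The idea of reducing a graph by structural equivalence can be sketched with a simple signature-based partition refinement. This is neither the SQL nor the MapReduce algorithm from the paper; it is a minimal sequential illustration that assumes forward bisimulation over outgoing edges only, and all names and sample triples are hypothetical.

```python
def bisimulation_blocks(triples):
    """Assign each node a block id so that nodes in the same block have the
    same outgoing (predicate, target block) pairs. Refines until stable."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    out_edges = {n: [] for n in nodes}
    for s, p, o in triples:
        out_edges[s].append((p, o))

    block = {n: 0 for n in nodes}  # start with every node in one block
    num_blocks = 1
    while True:
        ids = {}
        new_block = {}
        for n in sorted(nodes):
            # Including the old block id ensures blocks are only ever split.
            signature = (
                block[n],
                frozenset((p, block[o]) for p, o in out_edges[n]),
            )
            new_block[n] = ids.setdefault(signature, len(ids))
        if len(ids) == num_blocks:  # no block was split: partition is stable
            return new_block
        block, num_blocks = new_block, len(ids)

# Hypothetical sample data, for illustration only.
triples = [
    ("a1", "type", "Author"),
    ("a2", "type", "Author"),
    ("p1", "creator", "a1"),
    ("p2", "creator", "a2"),
    ("p3", "creator", "a1"),
]
blocks = bisimulation_blocks(triples)
print(len(blocks), "nodes ->", len(set(blocks.values())), "blocks")
```

Here six nodes collapse into three blocks (the two authors, the three papers, and the class node), which mirrors the nodes-to-blocks ratio the abstract uses as its size measure.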

Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) | 2013

Evaluating Hybrid Music Recommender Systems

Thomas Hornung; Cai-Nicolas Ziegler; Simon Franz; Martin Przyjaciel-Zablocki; Alexander Schätzle; Georg Lausen

Taste in music is highly subjective, making the recommendation of music tracks a challenging research task. With TRecS, our live prototype system, we present a weighted hybrid recommender approach that amalgamates three diverse recommender techniques into one comprehensive score. Moreover, our system peppers the generated result list with recommendations based on a simple serendipity heuristic. This way, users can benefit from recommendations aligned with their current taste in music while gaining some exploratory diversification. An explanation feature helps users understand the rationale behind each of the tracks being recommended to them. Empirical evaluations of the live system, based on an online evaluation, assess the overall recommendation quality as well as the impact of each of the three sub-recommenders.
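
A weighted hybrid score of the kind described here can be sketched as a normalized weighted sum of sub-recommender scores. The abstract does not name the three techniques or their weights, so the names `collaborative`, `content`, and `popularity`, the weights, and the track scores below are all hypothetical placeholders rather than the TRecS configuration.

```python
def hybrid_score(scores, weights):
    """Combine per-recommender scores into one score via a weighted average.

    A recommender that has no score for an item contributes 0.0.
    """
    total_weight = sum(weights.values())
    weighted = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight

# Hypothetical sub-recommenders and weights, for illustration only.
weights = {"collaborative": 2.0, "content": 1.0, "popularity": 1.0}

# Hypothetical per-track scores from each sub-recommender.
candidates = {
    "track_a": {"collaborative": 1.0, "content": 0.0, "popularity": 0.0},
    "track_b": {"collaborative": 0.0, "content": 1.0, "popularity": 0.0},
    "track_c": {"collaborative": 1.0, "content": 1.0, "popularity": 0.0},
}

ranked = sorted(
    candidates,
    key=lambda track: hybrid_score(candidates[track], weights),
    reverse=True,
)
print(ranked)
```

Keeping the weights in one dictionary makes it straightforward to switch a sub-recommender off (weight 0) when assessing its individual impact, as the evaluation in the abstract does for each of the three components.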

Proceedings of Semantic Web Information Management | 2014

Extending SPARQL for Recommendations

Victor Anthony Arrascue Ayala; Martin Przyjaciel-Zablocki; Thomas Hornung; Alexander Schätzle; Georg Lausen

For processing data on the Web, recommender systems and SPARQL are two popular paradigms, which however have rather different characteristics. SPARQL is a declarative language on RDF graphs which allows a user to precisely specify the desired information. In contrast, a recommender system suggests certain items to a user, based on similarity to other users or items. As the data to be processed by a recommender may be an RDF graph as well, the question arises whether both processing paradigms can benefit from each other. RecSPARQL fills this gap by extending the syntax and semantics of SPARQL to enable a generic and flexible way for collaborative filtering and content-based recommendations over arbitrary RDF graphs. Our experiments on the MovieLens data set demonstrate the applicability of our approach.

International Workshop on the Web and Databases | 2015

TriAL-QL: Distributed Processing of Navigational Queries

Martin Przyjaciel-Zablocki; Alexander Schätzle; Georg Lausen

Navigational queries are among the most natural query patterns for RDF data, yet most existing RDF query languages, including SPARQL 1.1 and its derivatives, fail to cover all the varieties inherent to its triple-based model. As a consequence, the development of more expressive RDF languages is of general interest. With TriAL* [14], there exists an expressive algebra which subsumes many previous approaches, while adding novel features that are not expressible in most other RDF query languages based on the standard graph model. However, its algebraic notation is inappropriate for practical usage and it is not supported by any existing RDF triple store. In this paper, we propose TriAL-QL, an easy to write and grasp language for TriAL*, preserving its compositional algebraic structure. We present an implementation based on Impala, a massively parallel SQL query engine on Hadoop, using an optimized semi-naive evaluation for the recursive fragments of TriAL*. This way, we support both data-intensive ETL-like workloads and explorative ad-hoc style queries. To demonstrate the scalability and expressiveness of our approach, we conducted experiments on generated social networks with up to 1.8 billion triples and compared different execution strategies to a Hive-based solution.
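
Semi-naive evaluation, mentioned in this abstract for the recursive fragments, can be illustrated on the simplest recursive query, the transitive closure of an edge relation. This is a generic textbook sketch in plain Python, not the optimized Impala-based evaluation from the paper; the function name and sample edges are hypothetical.

```python
def transitive_closure_semi_naive(edges):
    """Compute the transitive closure, joining only newly found pairs
    (the delta) with the base relation in each round."""
    edges = set(edges)
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    closure = set(edges)
    delta = set(edges)  # pairs discovered in the previous round
    while delta:
        # Extend only the new pairs by one edge instead of re-joining
        # the whole closure, which is what naive evaluation would do.
        derived = {(a, c) for a, b in delta for c in successors.get(b, ())}
        delta = derived - closure
        closure |= delta
    return closure

# Hypothetical sample data, for illustration only.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
closure = transitive_closure_semi_naive(edges)
print(sorted(closure))
```

The loop stops as soon as a round derives no new pairs, so the work per round is proportional to the size of the delta rather than the full result.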

International Conference on Management of Data | 2017

Querying Semantic Knowledge Bases with SQL-on-Hadoop

Martin Przyjaciel-Zablocki; Alexander Schätzle; Georg Lausen

The constant growth of semantically annotated data and an increasing interest in cross-domain knowledge bases raise the need for expressive query languages for RDF and novel approaches that enable their evaluation for web-scale data sizes. However, SPARQL, the W3C standard query language for RDF, suffers from a rather limited capability to express navigational queries. More expressive languages have been studied theoretically but not implemented. In this paper, we continue our work on TRIAL-QL, an expressive (SQL-like) RDF query language based on the Triple Algebra with Recursion [31]. We present a new version of our TRIAL-QL processor, which takes advantage of the current momentum in in-memory SQL-on-Hadoop solutions and is built on top of Impala and Spark while using one unified data storage. We use our system to study the application of multiple evaluation algorithms, storage strategies and optimizations on Impala and Spark while highlighting their properties. Comprehensive experiments examine the performance of our system in comparison to other competitive RDF management systems. The obtained results demonstrate its suitability for querying semantic knowledge bases by providing interactive query response times for selective queries on datasets with more than one billion triples. More data-intensive use cases that produce, e.g., over 25 billion results finish in the order of minutes.