Publication


Featured research published by Stefan Hagedorn.


Datenbank-Spektrum | 2015

Complex Event Processing on Linked Stream Data

Omran Saleh; Stefan Hagedorn; Kai-Uwe Sattler

Social networks and Sensor Web technologies typically generate massive amounts of data published as streams. To give these streams meaning and enrich them with semantic descriptions, the concept of Linked Stream Data (LSD) has emerged. However, supporting a wide range of LSD scenarios and queries requires comprehensive solutions that provide not only classic data stream operators such as windows, but also the processing of complex events, the linking of (static) datasets, and scalable execution. In this paper, we present our approach to processing LSD and addressing these requirements. In contrast to existing LSD engines, which rely on streaming extensions to SPARQL, our PipeFlow system is a (relational) dataflow language and engine that supports complex event processing (CEP) and offers a few dedicated operators for RDF data. We describe this language, particularly its CEP model, as well as the system architecture for parallel CEP and LSD processing, which exploits partitioning techniques for cluster environments. Finally, we report results from experiments comparing our system against existing LSD engines.
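
PipeFlow's concrete syntax is not shown in this abstract, so the following Python sketch only illustrates the underlying CEP idea: matching a sequence pattern (one event followed by another within a time window) over a stream of timestamped, RDF-style triples. All names, the tuple layout, and the toy data are hypothetical.

```python
from collections import deque

def match_sequence(stream, first_pred, then_pred, window):
    """Emit (a, b) pairs where a triple with predicate `first_pred`
    is followed by one with `then_pred` within `window` time units.
    A toy stand-in for a CEP sequence operator; not PipeFlow syntax."""
    pending = deque()  # candidate 'first' events, oldest first
    for event in stream:  # event = (timestamp, subject, predicate, object)
        ts, _, pred, _ = event
        # evict candidates that fell out of the time window
        while pending and ts - pending[0][0] > window:
            pending.popleft()
        if pred == then_pred:
            for a in pending:
                yield (a, event)
        if pred == first_pred:
            pending.append(event)

# toy Linked-Stream-Data-like input: timestamped RDF-style triples
stream = [
    (1, ":sensor1", ":temperature", "20"),
    (3, ":sensor1", ":smoke", "true"),
    (9, ":sensor2", ":temperature", "25"),
]
for a, b in match_sequence(stream, ":temperature", ":smoke", window=5):
    print("pattern matched:", a, "->", b)
```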


EDBT/ICDT Workshops | 2013

Discovery querying in linked open data

Stefan Hagedorn; Kai-Uwe Sattler

The inability of machines to interpret and process information published on web pages led to the development of a web of data alongside the web of documents. This idea is known as the Semantic Web, in which links between pieces of information are established in a way that machines can understand and interpret. With its development, new applications were introduced to query and process this linked data. In addition, the open data initiative was launched with the goal of publishing governmental, scientific, and cultural data freely on the web. Often, this open data is offered in a semi-structured form, such as CSV files, but it can also be transformed into a linked data format. With this linked open data, programs can be created that efficiently process queries and find information. This work integrates support for discovery queries into an existing LOD cache engine. The goal is to develop a new approach that processes SPARQL queries and augments the results with information discovered from different (online) sources. The approach can thus help users explore new information and knowledge more easily, without having to worry about which particular data is stored locally and which identifiers are used. To this end, we plan to extend the rewriting process during the logical optimization of SPARQL queries.
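
The rewriting rules themselves are not given in the abstract. Purely as an illustration of augmenting a SPARQL query during logical optimization, the following sketch injects hypothetical discovery patterns as OPTIONAL blocks, so that existing bindings are preserved while extra information can be attached.

```python
def augment_with_discovery(query, discovery_patterns):
    """Naive illustration of a discovery rewrite: inject OPTIONAL
    patterns before the closing brace of the WHERE clause so the
    result set is enriched without dropping existing bindings."""
    optional = " ".join(f"OPTIONAL {{ {p} }}" for p in discovery_patterns)
    head, _brace, tail = query.rpartition("}")
    return head + " " + optional + " }" + tail

base = "SELECT * WHERE { ?city a :City ; :name ?name }"
rewritten = augment_with_discovery(
    base, ["?city owl:sameAs ?ext", "?ext :population ?pop"]
)
print(rewritten)
```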


OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" | 2013

Efficient Parallel Processing of Analytical Queries on Linked Data

Stefan Hagedorn; Kai-Uwe Sattler

Linked data has become one of the most successful movements of the Semantic Web community. RDF and SPARQL have been established as de facto standards for representing and querying linked data, and quite a number of RDF stores and SPARQL engines exist that can be used to work with the data. However, for many types of queries on linked data, these stores are not the best choice in terms of query execution time. For example, users are interested in analytical tasks such as profiling or finding correlated entities in their datasets.


International World Wide Web Conference (WWW) | 2016

Piglet: Interactive and Platform Transparent Analytics for RDF & Dynamic Data

Stefan Hagedorn; Kai-Uwe Sattler

Data analytics has gained more and more attention in recent years, and many data processing platforms have been developed. They all provide powerful but often complex APIs that users have to learn. Furthermore, results can often only be stored or printed, without any means of visualization. In this paper we present Piglet, a compiler for the high-level Pig Latin script language that generates code for various platforms such as Spark, Flink, Storm, and PipeFabric. Piglet lets users write elegant code, with extensions for SPARQL and RDF as well as support for streaming data. An integration into the notebook-based frontend Zeppelin provides a homogeneous, interactive user interface for exploring, analyzing, and visualizing data from different sources, and lets users share their scripts and results.
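
Piglet's actual code generator is not reproduced here. The toy sketch below only illustrates the general idea of compiling Pig Latin operators to a target platform API; the PySpark-flavoured output, the tiny operator subset, and the regex-based parsing are all simplifications assumed for the example.

```python
import re

def compile_pig(script):
    """Translate a tiny Pig Latin subset into Spark-flavoured Python
    source, one alias per line. Illustrative only."""
    out = []
    for line in script.strip().splitlines():
        m = re.match(r"(\w+) = LOAD '(.+)';", line)
        if m:
            out.append(f"{m[1]} = sc.textFile('{m[2]}')")
            continue
        m = re.match(r"(\w+) = FILTER (\w+) BY (.+);", line)
        if m:
            out.append(f"{m[1]} = {m[2]}.filter(lambda t: {m[3]})")
            continue
        m = re.match(r"STORE (\w+) INTO '(.+)';", line)
        if m:
            out.append(f"{m[1]}.saveAsTextFile('{m[2]}')")
    return "\n".join(out)

script = """
raw = LOAD 'events.txt';
hot = FILTER raw BY len(t) > 10;
STORE hot INTO 'out';
"""
print(compile_pig(script))
```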


International Conference on Data Engineering (ICDE) | 2014

LODHub — A platform for sharing and integrated processing of linked open data

Stefan Hagedorn; Kai-Uwe Sattler

In this paper we discuss the need for a new platform that combines existing solutions for publishing and sharing linked open data with the infrastructure of services for exploring, processing, and analyzing data across multiple data sets. We identify various requirements for such a platform, describe the architecture, and sketch initial results of our prototype.


Workshop on Geographic Information Retrieval (GIR) | 2016

Refining imprecise spatio-temporal events: a network-based approach

Andreas Spitz; Johanna Geiß; Michael Gertz; Stefan Hagedorn; Kai-Uwe Sattler

Events, as composites of temporal, spatial, and actor information, are a central object of interest in many information retrieval (IR) scenarios. Event-centric IR faces several challenges, ranging from the detection and extraction of geographic, temporal, and actor mentions in documents to the construction of event descriptions as triples of locations, dates, and actors that can support event query scenarios. For the latter challenge, existing approaches fall short when dealing with imprecise event components. For example, if the exact location or date is unknown, existing IR methods are often unaware of different granularity levels and the conceptual proximity of dates or locations. To address these problems, we present a framework that efficiently answers imprecise event queries, whose geographic or temporal component is given only at a coarse granularity level. Our approach utilizes a network-based event model that includes location, date, and actor components extracted from large document collections. Instances of entity and event mentions in the network are weighted based on both their frequency of occurrence and their textual distance, to reflect semantic relatedness. We demonstrate the utility and flexibility of our approach for evaluating imprecise event queries on a large collection of events extracted from the English Wikipedia, using a ground truth of news events.
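
As a minimal sketch of the network-based model described above (with hypothetical data and scoring), assume day-granularity date nodes connected to co-occurring entities by frequency-derived weights; a coarse, month-level query is then refined by ranking the matching day nodes by how strongly they are anchored in the network.

```python
from collections import defaultdict

# hypothetical co-occurrence network: {entity, entity} -> weight,
# where weights reflect mention frequency / textual proximity
edges = defaultdict(float)

def add_cooccurrence(a, b, weight):
    edges[frozenset((a, b))] += weight

# toy data: actor/location mentions near day-granularity dates
add_cooccurrence("1944-06-06", "Normandy", 9.0)
add_cooccurrence("1944-06-06", "Allied forces", 7.5)
add_cooccurrence("1944-06-13", "London", 2.0)

def refine(coarse_prefix, candidates):
    """Rank day-level dates matching a coarse (month-level) query
    by their total edge weight, i.e. how strongly they are anchored
    in the network."""
    scores = {}
    for day in candidates:
        if day.startswith(coarse_prefix):
            scores[day] = sum(w for pair, w in edges.items() if day in pair)
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(refine("1944-06", ["1944-06-06", "1944-06-13"]))
```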


British International Conference on Databases | 2015

A Framework for Scalable Correlation of Spatio-temporal Event Data

Stefan Hagedorn; Kai-Uwe Sattler; Michael Gertz

Spatio-temporal event data arise not only from sensor readings but also in information retrieval and text analysis. However, events extracted from a text corpus may be imprecise in both dimensions. In this paper we focus on the task of event correlation, i.e., finding events that are similar in terms of space and time. We present a framework for Apache Spark that provides correlation operators which can be configured to deal with such imprecise event data.
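
The framework's operators are not spelled out in the abstract; the sketch below shows one plausible configurable correlation predicate, assuming each event carries a point location with an uncertainty radius plus a time interval. It runs locally for illustration, whereas the paper's operators are implemented on Apache Spark.

```python
import math

def correlated(e1, e2, max_dist_km, max_gap):
    """Decide whether two imprecise events are spatio-temporally
    similar: their uncertainty-padded distance is below max_dist_km
    and their time intervals are within max_gap of overlapping.
    Events are (lat, lon, radius_km, t_start, t_end). Illustrative
    thresholds and event layout; not the paper's operator API."""
    lat1, lon1, r1, s1, end1 = e1
    lat2, lon2, r2, s2, end2 = e2
    # equirectangular approximation is accurate enough for a sketch
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    dist = 6371.0 * math.hypot(x, y)
    close_in_space = dist - r1 - r2 <= max_dist_km
    gap = max(s1, s2) - min(end1, end2)   # <= 0 means the intervals overlap
    close_in_time = gap <= max_gap
    return close_in_space and close_in_time

a = (50.68, 10.93, 5.0, 100, 110)   # Ilmenau-ish, interval [100, 110]
b = (50.98, 11.03, 2.0, 108, 120)   # Erfurt-ish, interval [108, 120]
print(correlated(a, b, max_dist_km=40.0, max_gap=0))
```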


Advances in Databases and Information Systems (ADBIS) | 2018

Cost-Based Sharing and Recycling of (Intermediate) Results in Dataflow Programs

Stefan Hagedorn; Kai-Uwe Sattler

In data analytics, researchers often work on the same datasets, investigating different aspects, and moreover develop their programs incrementally. This opens up opportunities to share and recycle results from previously executed jobs if those jobs contain identical operations, e.g., restructuring, filtering, and other kinds of data preparation.
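
The paper's cost model is not detailed in the abstract. The following sketch illustrates the basic decision with hypothetical cost values: fingerprint a canonical operator subplan and reuse a materialized result only if loading it is estimated to be cheaper than recomputing it.

```python
import hashlib

materialized = {}   # fingerprint -> (path, estimated load cost)

def fingerprint(subplan):
    """Canonical fingerprint of an operator subplan, e.g.
    ('LOAD data.csv', 'FILTER price > 10'). Order matters."""
    return hashlib.sha256("|".join(subplan).encode()).hexdigest()

def plan_with_reuse(subplan, recompute_cost):
    fp = fingerprint(subplan)
    if fp in materialized:
        path, load_cost = materialized[fp]
        if load_cost < recompute_cost:
            return f"reuse {path}"
    return "recompute (and consider materializing)"

# a first job materializes its filtered intermediate result
materialized[fingerprint(("LOAD data.csv", "FILTER price > 10"))] = ("cache/r1", 3.0)

# a later job sharing the same prefix can recycle it
print(plan_with_reuse(("LOAD data.csv", "FILTER price > 10"), recompute_cost=12.0))
```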


Information Technology | 2016

Stream processing platforms for analyzing big dynamic data

Stefan Hagedorn; Philipp Götze; Omran Saleh; Kai-Uwe Sattler

Nowadays, data is produced in every aspect of our lives, leading to a massive amount of information generated every second. However, this amount is often too large to be stored, and for many applications the information contained in these data streams is only useful while it is fresh. Batch processing platforms like Hadoop MapReduce do not fit these needs, as they require collecting data on disk and processing it repeatedly. Therefore, modern data processing engines combine the scalability of distributed architectures with the one-pass semantics of traditional stream engines. In this paper, we survey the current state of the art in scalable stream processing from a user perspective. We examine and describe the architecture, execution model, programming interface, and data analysis support of these engines, and discuss the challenges and limitations of their APIs. In this context, we introduce Piglet, an extended Pig Latin language and code generator that compiles (extended) Pig Latin code into programs for various data processing platforms. We also discuss the mapping to platform-specific concepts in order to provide a uniform view.
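
To make the contrast with batch processing concrete, here is a minimal sketch of one-pass stream semantics, assuming a simple tumbling window; the function name and window policy are illustrative, not any specific engine's API.

```python
import itertools

def tumbling_avg(stream, size):
    """One-pass tumbling-window aggregation: each element is consumed
    exactly once and only O(1) state is kept per window, in contrast
    to batch engines that first collect all data on disk."""
    total, n = 0.0, 0
    for x in stream:
        total += x
        n += 1
        if n == size:
            yield total / n
            total, n = 0.0, 0

# works on an unbounded source; take the first three windows only
readings = itertools.count(start=20)       # 20, 21, 22, ...
print(list(itertools.islice(tumbling_avg(readings, size=4), 3)))
```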


16. Fachtagung Datenbanksysteme für Business, Technologie und Web (BTW) | 2015

SPARQling Pig - Processing Linked Data with Pig Latin

Stefan Hagedorn; Katja Hose; Kai-Uwe Sattler

Collaboration


Dive into Stefan Hagedorn's collaborations.

Top Co-Authors

Kai-Uwe Sattler

Technische Universität Ilmenau

Heiko Betz

Technische Universität Ilmenau

Omran Saleh

Technische Universität Ilmenau

Jürgen Umbrich

Vienna University of Economics and Business

Daniel Klan

Technische Universität Ilmenau
