Stefan Pröll | Researchain

Publication

Featured research published by Stefan Pröll.

international conference on big data | 2013

Scalable data citation in dynamic, large databases: Model and reference implementation

Stefan Pröll; Andreas Rauber

Uniquely and precisely identifying and citing arbitrary subsets of data is essential in many settings, e.g. to facilitate experiment validation and data re-use in meta-studies. Current approaches relying on pointers to entire data collections or on explicit copies of data do not scale. We propose a novel approach relying on persistent, timestamped, adapted queries to versioned and timestamped data sources. Result set hashes are used to validate correctness on later re-execution. The proposed method works for static as well as dynamically growing or changing data. Alternative implementation styles for relational databases are presented and evaluated with regard to performance and impact on existing applications, while aiming at minimal to no additional effort for data users. The approach is validated in an infrastructure monitoring domain relying on sensor data networks.

international conference on data technologies and applications | 2014

A Scalable Framework for Dynamic Data Citation of Arbitrary Structured Data

Stefan Pröll; Andreas Rauber

Sharing research data is becoming increasingly important as it enables peers to validate and reproduce data-driven experiments. Without original raw data at hand, serious peer review is impossible. Exchanging data also allows scientists to reuse data in different contexts and gather new knowledge from available sources. But with increasing volume and iteratively enhanced data sets, researchers need to reference exact versions of data sets. Until now, access to research data has often been based on single archives of data files, where versioning and subsetting support is limited. In this paper we introduce a mechanism that allows researchers to create versioned subsets of research data which can be cited and shared in a lightweight and secure manner. We demonstrate a prototype that supports researchers in creating subsets based on filtering and sorting source data. These subsets can be cited for later reference and reuse. The system produces evidence that allows users to verify the correctness and completeness of a subset based on cryptographic hashing. We describe a replication scenario for enabling scalable data citation in dynamic contexts.
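As a rough illustration of the verification step mentioned in this abstract, the sketch below builds a subset by filtering and sorting and fingerprints it with SHA-256. The row layout, the column names and the functions `create_subset`, `subset_fingerprint` and `verify_subset` are hypothetical and do not come from the prototype described in the paper.

```python
import hashlib


def create_subset(rows, predicate, sort_key):
    # A subset is defined by a filter plus a sort order over the source rows.
    return sorted((r for r in rows if predicate(r)), key=sort_key)


def subset_fingerprint(rows, columns):
    # The hash covers the column list and every row in sequence, so a missing,
    # altered, or reordered row changes the fingerprint.
    h = hashlib.sha256()
    h.update("|".join(columns).encode("utf-8"))
    for row in rows:
        line = "|".join(str(row[c]) for c in columns)
        h.update(b"\n" + line.encode("utf-8"))
    return h.hexdigest()


def verify_subset(rows, columns, expected):
    return subset_fingerprint(rows, columns) == expected


source = [
    {"id": 3, "site": "B", "temp": 17.5},
    {"id": 1, "site": "A", "temp": 21.0},
    {"id": 2, "site": "A", "temp": 19.2},
]
columns = ["id", "site", "temp"]

subset = create_subset(source, lambda r: r["site"] == "A", lambda r: r["id"])
evidence = subset_fingerprint(subset, columns)

print(verify_subset(subset, columns, evidence))        # intact copy
print(verify_subset(subset[:1], columns, evidence))    # incomplete copy
print(verify_subset(subset[::-1], columns, evidence))  # reordered copy
```

Including the sort order in the fingerprint is a design choice made here; a scheme that should tolerate reordering would hash a canonically sorted copy instead.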

D-lib Magazine | 2017

Enabling Reproducibility for Small and Large Scale Research Data Sets

Stefan Pröll; Andreas Rauber

A large portion of scientific results is based on analysing and processing research data. In order for an eScience experiment to be reproducible, we need to be able to identify precisely the data set which was used in a study. Considering evolving data sources, this can be a challenge, as studies often use subsets which have been extracted from a potentially large parent data set. Exporting and storing subsets in multiple versions does not scale with large numbers of data sets. To tackle this challenge, the RDA Working Group on Data Citation has developed a framework and provides a set of recommendations which allow identifying precise subsets of evolving data sources based on versioned data and timestamped queries. In this work, we describe how this method can be applied in small scale research data scenarios and how it can be implemented in large scale data facilities having access to sophisticated data infrastructure. We describe how the RDA approach improves the reproducibility of eScience experiments and we provide an overview of existing pilots and use cases in small and large scale settings.

international conference theory and practice digital libraries | 2013

From Preserving Data to Preserving Research: Curation of Process and Context

Rudolf Mayer; Stefan Pröll; Andreas Rauber; Raúl Palma; Daniel Garijo

In the domain of eScience, investigations are increasingly collaborative. Most scientific and engineering domains benefit from building on top of the outputs of other research: by sharing information to reason over and data to incorporate in the modelling task at hand.

iPRES | 2013