Publication


Featured research published by Stefan Dietze.


European Semantic Web Conference | 2014

A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles

Besnik Fetahu; Stefan Dietze; Bernardo Pereira Nunes; Marco A. Casanova; Davide Taibi; Wolfgang Nejdl

The increasing adoption of Linked Data principles has led to an abundance of datasets on the Web. However, take-up and reuse are hindered by the lack of descriptive information about the nature of the data, such as their topic coverage, dynamics or evolution. To address this issue, we propose an approach for creating linked dataset profiles. A profile consists of structured dataset metadata describing topics and their relevance. Profiles are generated through the configuration of techniques for resource sampling from datasets, topic extraction from reference datasets and their ranking based on graphical models. To enable a good trade-off between scalability and accuracy of generated profiles, appropriate parameters are determined experimentally. Our evaluation considers topic profiles for all accessible datasets from the Linked Open Data cloud. The results show that our approach generates accurate profiles even with comparatively small sample sizes (10%) and outperforms established topic modelling approaches.


Archive | 2014

A Survey on Linked Data and the Social Web as Facilitators for TEL Recommender Systems

Stefan Dietze; Hendrik Drachsler; Daniela Giordano

Personalisation, adaptation and recommendation are central features of technology-enhanced learning (TEL) environments. In this context, information retrieval techniques are applied as part of TEL recommender systems to filter and recommend learning resources or peer learners according to user preferences and requirements. However, the suitability and scope of possible recommendations are fundamentally dependent on the quality and quantity of available data, for instance, metadata about TEL resources as well as users. On the other hand, in recent years, the Linked Data (LD) movement has succeeded in providing a vast body of well-interlinked and publicly accessible Web data. This in particular includes Linked Data of an explicit or implicit educational nature. The potential of LD to facilitate TEL recommender systems research and practice is discussed in this paper. In particular, an overview of the most relevant LD sources and techniques is provided, together with a discussion of their potential for the TEL domain in general and TEL recommender systems in particular. Results from highly related European projects are presented and discussed together with an analysis of prevailing challenges and preliminary solutions.


International Learning Analytics Knowledge Conference | 2017

Improving learning through achievement priming in crowdsourced information finding microtasks

Ujwal Gadiraju; Stefan Dietze

Crowdsourcing has become an increasingly popular means to acquire human input on demand. Microtask crowdsourcing marketplaces facilitate access to millions of people (called workers) who are willing to participate in tasks in return for monetary rewards or other forms of compensation. This paradigm presents a unique learning context where workers have to learn to complete tasks on-the-fly by applying their learning immediately through the course of tasks. However, most workers typically drop out early in large batches of tasks, depriving themselves of the opportunity to learn on-the-fly through the course of batch completion. By doing so, workers squander a potential chance at improving their performance and completing tasks effectively. In this paper, we propose a novel method to engage and retain workers, to improve their learning in crowdsourced information finding tasks by using achievement priming. Through rigorous experimental findings, we show that it is possible to retain workers in long batches of tasks by triggering their inherent motivation to achieve and excel. As a consequence of increased worker retention, we find that workers learn to perform more effectively, exhibiting relatively more stable accuracy and lower task completion times in comparison to workers who drop out early.


International Semantic Web Conference | 2015

Improving Entity Retrieval on Structured Data

Besnik Fetahu; Ujwal Gadiraju; Stefan Dietze

The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which makes entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities, can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities based on the x-means and spectral clustering algorithms. In the second step, we propose an optimized retrieval model which takes advantage of our precomputed clusters. For a given set of entities retrieved by the BM25F retrieval approach and a given user query, we further expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to the relevance to the query. We perform a thorough experimental evaluation on the Billion Triple Challenge BTC12 dataset. The proposed approach shows significant improvements compared to the baseline and state of the art approaches.
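
The expand-then-re-rank step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `expand_and_rerank`, the `entity_cluster` mapping and the pluggable `score_fn` are assumptions made for illustration, and the base result list stands in for whatever a retriever such as BM25F would return.

```python
from collections import defaultdict


def expand_and_rerank(initial_results, entity_cluster, score_fn, query, top_k=10):
    """Expand a base result set with cluster neighbours, then re-rank.

    initial_results: entity ids returned by a base retriever (assumed given).
    entity_cluster:  dict mapping entity id -> cluster id, computed offline.
    score_fn:        hypothetical callable (query, entity id) -> relevance score.
    """
    # Invert the entity -> cluster mapping so cluster members can be looked up.
    cluster_members = defaultdict(set)
    for entity, cluster in entity_cluster.items():
        cluster_members[cluster].add(entity)

    # Expansion: add every entity that shares a cluster with a retrieved entity.
    candidates = set(initial_results)
    for entity in initial_results:
        cluster = entity_cluster.get(entity)
        if cluster is not None:
            candidates |= cluster_members[cluster]

    # Re-ranking: highest score first, entity id as a deterministic tie-breaker.
    ranked = sorted(candidates, key=lambda e: (-score_fn(query, e), e))
    return ranked[:top_k]
```

In this sketch the scoring function is left open on purpose, since the abstract only states that features of the queries, entities and clusters are considered; a simple term-overlap score is enough to try the function on toy data.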


International Semantic Web Conference | 2016

Towards Entity Summarisation on Structured Web Markup

Ran Yu; Ujwal Gadiraju; Xiaofei Zhu; Besnik Fetahu; Stefan Dietze

Embedded markup based on Microdata, RDFa, and Microformats has become prevalent on the Web and constitutes an unprecedented source of data. However, statements extracted from markup are fundamentally different to traditional RDF graphs: entity descriptions are flat, facts are highly redundant and granular, and co-references are very frequent yet explicit links are missing. Therefore, typical entity-centric tasks such as retrieval and summarisation cannot be tackled sufficiently with state of the art methods. We present an entity summarisation approach that overcomes such issues through a combination of entity retrieval and summarisation techniques geared towards the specific challenges associated with embedded markup. We perform a preliminary evaluation on a subset of the Web Data Commons dataset and show improvements over existing entity retrieval baselines. In addition, an investigation into the coverage and complementarity of facts from the constructed entity summaries shows potential for aiding tasks such as knowledge base population.


ACM Transactions on Computer-Human Interaction | 2017

Using Worker Self-Assessments for Competence-Based Pre-Selection in Crowdsourcing Microtasks

Ujwal Gadiraju; Besnik Fetahu; Ricardo Kawase; Patrick Siehndel; Stefan Dietze

Paid crowdsourcing platforms have evolved into remarkable marketplaces where requesters can tap into human intelligence to serve a multitude of purposes, and the workforce can benefit through monetary returns for investing their efforts. In this work, we focus on individual crowd worker competencies. By drawing from self-assessment theories in psychology, we show that crowd workers often lack awareness about their true level of competence. Due to this, although workers intend to maintain a high reputation, they tend to participate in tasks that are beyond their competence. We reveal the diversity of individual worker competencies, and make a case for competence-based pre-selection in crowdsourcing marketplaces. We show the implications of flawed self-assessments on real-world microtasks, and propose a novel worker pre-selection method that considers the accuracy of worker self-assessments. We evaluated our method in a sentiment analysis task and observed an improvement in accuracy of over 15% when compared to traditional performance-based worker pre-selection. Similarly, our proposed method resulted in an improvement in accuracy of nearly 6% in an image validation task. Our results show that requesters in crowdsourcing platforms can benefit by considering worker self-assessments in addition to their performance for pre-selection.
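
As a rough illustration of pre-selection that combines performance with the accuracy of self-assessment, the Python sketch below filters workers on both criteria. The abstract does not specify the actual selection rule, so the `Worker` fields, the `preselect` function and both thresholds (`min_score`, `max_gap`) are hypothetical choices made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: str
    actual_score: float   # fraction answered correctly on a qualification test
    self_assessed: float  # the worker's own estimate of that fraction


def preselect(workers, min_score=0.6, max_gap=0.2):
    """Return ids of workers who perform well AND judge themselves accurately.

    min_score and max_gap are illustrative thresholds, not values from the paper.
    """
    selected = []
    for w in workers:
        # Gap between what the worker believed and what they achieved.
        gap = abs(w.self_assessed - w.actual_score)
        if w.actual_score >= min_score and gap <= max_gap:
            selected.append(w.worker_id)
    return selected
```

A purely performance-based baseline corresponds to dropping the `gap` condition, which makes it easy to compare the two selection rules on the same pool of workers.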


International Workshop on Semantic, Analytics, Visualization | 2016

Analysing Structured Scholarly Data Embedded in Web Pages

Pracheta Sahoo; Ujwal Gadiraju; Ran Yu; Sriparna Saha; Stefan Dietze

Web pages increasingly embed structured data in the form of microdata, microformats and RDFa. Through efforts such as schema.org, such embedded markup has become prevalent, with current studies estimating an adoption by about 26% of all web pages. Similar to the early adoption of Linked Data principles by publishers, libraries and other providers of bibliographic data, such organisations have been among the early adopters, providing an unprecedented source of structured data about scholarly works. Such data, however, is fundamentally different from traditional Linked Data, by being very sparsely linked and consisting of a large number of coreferences and redundant statements. So far, the scale and nature of embedded scholarly data on the Web have not been investigated. In this work, we provide a study on embedded scholarly data to answer research questions about the depth, syntactic and semantic characteristics and distribution of extracted data, thereby investigating challenges and opportunities for using embedded data as a structured knowledge graph of scholarly information.


International World Wide Web Conferences | 2018

Inferring Missing Categorical Information in Noisy and Sparse Web Markup

Nicolas Tempelmeier; Elena Demidova; Stefan Dietze

Embedded markup of Web pages has seen widespread adoption in recent years, driven by standards such as RDFa and Microdata and initiatives such as schema.org, with recent studies showing an adoption by 39% of all Web pages already in 2016. While this constitutes an important information source for tasks such as Web search, Web page classification or knowledge graph augmentation, individual markup nodes are usually sparsely described and often lack essential information. For instance, from 26 million nodes describing events within the Common Crawl in 2016, 59% of nodes provide fewer than six statements and only 257,000 nodes (0.96%) are typed with more specific event subtypes. Nevertheless, given the scale and diversity of Web markup data, nodes that provide missing information can be obtained from the Web in large quantities, in particular for categorical properties. Such data constitutes potential training data for inferring missing information to significantly augment sparsely described nodes. In this work, we introduce a supervised approach for inferring missing categorical properties in Web markup. Our experiments, conducted on properties of events and movies, show a performance of 79% and 83% F1 score respectively, significantly outperforming existing baselines.


Conference on Human Information Interaction and Retrieval | 2018

Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web

Ujwal Gadiraju; Ran Yu; Stefan Dietze; Peter Holtz

Web search is frequently used by people to acquire new knowledge and to satisfy learning-related objectives, but little is known about how a user's knowledge evolves through the course of a search session. We present a study addressing the knowledge gain of users in informational search sessions. Using crowdsourcing, we recruited 500 distinct users and orchestrated real-world search sessions spanning 10 different topics and information needs. By using scientifically formulated knowledge tests we calibrated the knowledge of users before and after their search sessions, quantifying their knowledge gain. We investigated the impact of information needs on the search behavior and knowledge gain of users, revealing a significant effect of information need on user queries and navigational patterns, but no direct effect on the knowledge gain. Users on average exhibited a higher knowledge gain through search sessions pertaining to topics they were less familiar with. Our findings in this paper contribute important groundwork towards advancing current research in understanding user knowledge gain through web search sessions.
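
The pre-test and post-test measurement described in this abstract can be sketched in a few lines of Python. The abstract does not give the exact formula, so this example assumes the simplest reading, namely that knowledge gain is the post-session test score minus the pre-session test score; the function names and the `(topic, pre, post)` tuple layout are illustrative assumptions.

```python
from statistics import mean


def knowledge_gain(pre_score, post_score):
    """Assumed definition: post-session score minus pre-session score."""
    return post_score - pre_score


def mean_gain_by_topic(sessions):
    """Average the per-session gain for each topic.

    sessions: iterable of (topic, pre_score, post_score) tuples (assumed layout).
    """
    gains = {}
    for topic, pre, post in sessions:
        gains.setdefault(topic, []).append(knowledge_gain(pre, post))
    return {topic: mean(values) for topic, values in gains.items()}
```

Grouping by topic mirrors the study design of several topics with many users each, and makes it straightforward to compare gains between topics that users knew more or less about beforehand.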


Lecture Notes in Computer Science | 2016

Retrieval, crawling and fusion of entity-centric data on the web

Stefan Dietze

While the Web of (entity-centric) data has seen tremendous growth over the past years, take-up and re-use are still limited. Data vary heavily with respect to their scale, quality, coverage or dynamics, which poses challenges for tasks such as entity retrieval or search. This chapter provides an overview of approaches to deal with the increasing heterogeneity of Web data. On the one hand, recommendation, linking, profiling and retrieval can provide efficient means to enable discovery and search of entity-centric data, specifically when dealing with traditional knowledge graphs and linked data. On the other hand, embedded markup such as Microdata and RDFa has emerged as a novel, Web-scale source of entity-centric knowledge. While markup has seen increasing adoption over the last few years, driven by initiatives such as schema.org, it constitutes an increasingly important source of entity-centric data on the Web, being in the same order of magnitude as the Web itself with regard to dynamics and scale. To this end, markup data lends itself as a data source for aiding tasks such as knowledge base augmentation, where data fusion techniques are required to address the inherent characteristics of markup data, such as its redundancy, heterogeneity and lack of links. Future directions are concerned with the exploitation of the complementary nature of markup data and traditional knowledge graphs.

Collaboration


Dive into Stefan Dietze's collaborations.

Top Co-Authors

Davide Taibi
National Research Council

Hendrik Drachsler
Goethe University Frankfurt

Bernardo Pereira Nunes
Universidade Federal do Estado do Rio de Janeiro

Marco A. Casanova
Pontifical Catholic University of Rio de Janeiro

Ricardo Kawase
Leibniz University of Hanover

John G. Breslin
National University of Ireland

Roope Jaakonmäki
University of Liechtenstein

Albrecht Fortenbacher
HTW Berlin - University of Applied Sciences