Stefan Zwicklbauer
University of Passau
Publications
Featured research published by Stefan Zwicklbauer.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2016
Stefan Zwicklbauer; Christin Seifert; Michael Granitzer
Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their corresponding entities in a knowledge base. It finds application in the extraction of structured data in RDF (Resource Description Framework) from textual documents, but equally in facilitating artificial intelligence applications such as Semantic Search, Reasoning and Question Answering. We propose a new collective, graph-based disambiguation algorithm that utilizes semantic entity and document embeddings for robust entity disambiguation. Robust thereby refers to the property of achieving better than state-of-the-art results over a wide range of very different data sets. Our approach is also able to abstain if no appropriate entity can be found for a specific surface form. Our evaluation shows that our approach achieves significantly (>5%) better results than all other publicly available disambiguation algorithms on 7 of 9 data sets without data-set-specific tuning. Moreover, we discuss the influence of the quality of the knowledge base on the disambiguation accuracy and indicate that our algorithm achieves better results than non-publicly available state-of-the-art algorithms.
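The core ranking step described here can be pictured as scoring each candidate entity against the mention's context in a shared embedding space and abstaining when no candidate scores well enough. The following is a minimal sketch of that idea, assuming pre-trained entity and context vectors; the function names, threshold, and toy vectors are illustrative and not the paper's actual implementation.

```python
# Minimal sketch: rank candidate entities by cosine similarity to the mention
# context and abstain when the best score stays below a threshold.
# Vectors are assumed to be pre-trained embeddings; everything here is a toy stand-in.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def disambiguate(context_vec, candidates, threshold=0.35):
    """Return the best-scoring candidate entity id, or None to abstain."""
    scored = [(entity_id, cosine(context_vec, vec)) for entity_id, vec in candidates.items()]
    entity_id, score = max(scored, key=lambda x: x[1])
    return entity_id if score >= threshold else None  # abstain on weak evidence

# toy example with random vectors standing in for learned embeddings
rng = np.random.default_rng(0)
context = rng.normal(size=100)
cands = {"dbpedia:Apple_Inc.": rng.normal(size=100),
         "dbpedia:Apple": rng.normal(size=100)}
print(disambiguate(context, cands))  # likely None here, i.e. the abstention case
```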
International Semantic Web Conference | 2016
Stefan Zwicklbauer; Christin Seifert; Michael Granitzer
Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their corresponding entities in a knowledge base. It finds application in the extraction of structured data in RDF (Resource Description Framework) from textual documents, but equally in facilitating artificial intelligence applications such as Semantic Search, Reasoning and Question Answering. In this work, we propose DoSeR (Disambiguation of Semantic Resources), a named entity disambiguation framework that is knowledge-base-agnostic in terms of RDF knowledge bases (e.g. DBpedia) and entity-annotated document knowledge bases (e.g. Wikipedia). Initially, our framework automatically generates semantic entity embeddings given one or multiple knowledge bases. DoSeR then accepts documents with a given set of surface forms as input and collectively links them to entities in a knowledge base with a graph-based approach. We evaluate DoSeR on seven different data sets against publicly available, state-of-the-art named entity disambiguation frameworks. Our approach outperforms the state-of-the-art approaches that make use of RDF knowledge bases and/or entity-annotated document knowledge bases by up to 10% F1 measure.
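The collective, graph-based step can be sketched as building a graph over the candidate entities of all surface forms, weighting edges by embedding similarity, and keeping, per mention, the candidate that is most central under a PageRank-style walk. The sketch below illustrates that general pattern only; the graph construction, weights, and data structures are assumptions for illustration and not the exact DoSeR algorithm.

```python
# Hedged sketch of collective linking: candidates of different mentions are
# connected with edges weighted by embedding similarity, and a PageRank score
# decides which candidate each mention keeps. All names and data are illustrative.
import itertools
import numpy as np
import networkx as nx

def collective_link(mentions, embeddings):
    """mentions: {surface_form: [candidate_entity_ids]}; embeddings: {entity_id: vector}."""
    g = nx.DiGraph()
    for (m1, c1s), (m2, c2s) in itertools.permutations(mentions.items(), 2):
        for e1, e2 in itertools.product(c1s, c2s):
            v1, v2 = embeddings[e1], embeddings[e2]
            sim = float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))
            g.add_edge(e1, e2, weight=max(sim, 0.0))  # only positive evidence contributes
    scores = nx.pagerank(g, weight="weight")
    # per mention, keep the candidate with the highest global score
    return {m: max(cands, key=lambda e: scores.get(e, 0.0)) for m, cands in mentions.items()}

# toy usage with random stand-in embeddings
rng = np.random.default_rng(1)
emb = {e: rng.normal(size=50) for e in ["e:Paris_France", "e:Paris_Hilton", "e:France"]}
print(collective_link({"Paris": ["e:Paris_France", "e:Paris_Hilton"], "France": ["e:France"]}, emb))
```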
International Conference on Knowledge Management and Knowledge Technologies | 2013
Stefan Zwicklbauer; Christin Seifert; Michael Granitzer
Entity disambiguation has been studied extensively in the last 10 years, with authors reporting increasingly well-performing systems. However, most studies focused on general purpose knowledge bases like Wikipedia or DBpedia and left open the question of how those results generalize to more specialized domains. This is especially important in the context of Linked Open Data, which forms an enormous resource for disambiguation. However, the influence of domain heterogeneity, size and quality of the knowledge base remains largely unanswered. In this paper we present an extensive set of experiments on special purpose knowledge bases from the biomedical domain, where we evaluate the disambiguation performance along four variables: (i) the representation of the knowledge base as being either entity-centric or document-centric, (ii) the size of the knowledge base in terms of entities covered, (iii) the semantic heterogeneity of a domain, and (iv) the quality and completeness of a knowledge base. Our results show that for special purpose knowledge bases (i) document-centric disambiguation significantly outperforms entity-centric disambiguation, (ii) document-centric disambiguation does not depend on the size of the knowledge base, while entity-centric approaches do, and (iii) disambiguation performance varies greatly across domains. These results suggest that domain heterogeneity, size and knowledge base quality have to be carefully considered for the design of entity disambiguation systems and that, for constructing knowledge bases, user-annotated texts are preferable to carefully constructed knowledge bases.
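The entity-centric versus document-centric distinction can be illustrated with two toy lookup strategies: matching a mention's context against one description per entity, versus retrieving annotated documents and letting their entity annotations vote. The sketch below uses plain token overlap in place of a real retrieval engine; all names and data structures are assumptions for illustration.

```python
# Illustrative contrast of the two knowledge-base representations.
# Entity-centric: one description per entity, matched directly against the context.
# Document-centric: retrieve annotated documents first, then aggregate their
# entity annotations as votes. Token overlap stands in for a real IR index.
from collections import Counter

def overlap(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

def entity_centric(context, entity_descriptions):
    # entity_descriptions: {entity_id: description text}
    return max(entity_descriptions, key=lambda e: overlap(context, entity_descriptions[e]))

def document_centric(context, annotated_docs, top_k=10):
    # annotated_docs: list of (document text, [entity_ids annotated in it])
    ranked = sorted(annotated_docs, key=lambda d: overlap(context, d[0]), reverse=True)[:top_k]
    votes = Counter(e for _, entities in ranked for e in entities)
    return votes.most_common(1)[0][0]
```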
Revised Selected Papers of the First Workshop on Specifying Big Data Benchmarks - Volume 8163 | 2012
Florian Stegmaier; Christin Seifert; Roman Kern; Patrick Höfler; Sebastian Bayerl; Michael Granitzer; Harald Kosch; Stefanie N. Lindstaedt; Belgin Mutlu; Vedran Sabol; Kai Schlegel; Stefan Zwicklbauer
Research depends to a large degree on the availability and quality of primary research data, i.e., data generated through experiments and evaluations. While the Web in general and Linked Data in particular provide a platform and the necessary technologies for sharing, managing and utilizing research data, an ecosystem supporting those tasks is still missing. The vision of the CODE project is the establishment of a sophisticated ecosystem for Linked Data. Here, the extraction of knowledge encapsulated in scientific research papers, along with its public release as Linked Data, serves as the major use case. Further, Visual Analytics approaches empower end users to analyse, integrate and organize data. During these tasks, specific Big Data issues arise.
Knowledge Discovery and Data Mining | 2013
Christin Seifert; Michael Granitzer; Patrick Höfler; Belgin Mutlu; Vedran Sabol; Kai Schlegel; Sebastian Bayerl; Florian Stegmaier; Stefan Zwicklbauer; Roman Kern
Scientific publications constitute an extremely valuable body of knowledge and can be seen as the roots of our civilisation. However, with the exponential growth of written publications, comparing facts and findings between different research groups and communities becomes nearly impossible. In this paper, we present a conceptual approach and a first implementation for creating an open knowledge base of scientific knowledge mined from research publications. This requires extracting facts - mostly empirical observations - from unstructured texts (mainly PDFs). Because facts must be extracted with high accuracy and automatic methods remain imprecise, human quality control is of utmost importance. In order to establish such quality control mechanisms, we rely on intelligent visual interfaces and on a toolset for crowdsourcing fact extraction, text mining and data integration tasks.
Database and Expert Systems Applications | 2015
Stefan Zwicklbauer; Christin Seifert; Michael Granitzer
Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their corresponding entities in a knowledge base. Most disambiguation systems focus on general purpose knowledge bases like DBpedia but leave open the question of how those results generalize to more specialized domains. This is very important in the context of Linked Open Data, which forms an enormous resource for disambiguation. We implement a ranking-based (Learning To Rank) disambiguation system and provide a systematic evaluation of biomedical entity disambiguation with respect to three crucial and well-known properties of specialized disambiguation systems. These are (i) entity context, i.e. the way entities are described, (ii) user data, i.e. quantity and quality of externally disambiguated entities, and (iii) quantity and heterogeneity of entities to disambiguate, i.e. the number and size of different domains in a knowledge base. Our results show that (i) the choice of entity context that attains the best disambiguation results strongly depends on the amount of available user data, (ii) disambiguation results with large-scale and heterogeneous knowledge bases strongly depend on the entity context, (iii) disambiguation results are robust against a moderate amount of noise in user data, and (iv) some results can be significantly improved with a federated disambiguation approach that uses different entity contexts. Our results indicate that disambiguation systems must be carefully adapted when expanding their knowledge bases with special domain entities.
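A ranking-based disambiguation step of this kind can be sketched as turning each (mention, candidate) pair into a feature vector, scoring candidates with a learned model, and keeping the top-ranked one. Below, a pointwise logistic-regression ranker and two hand-picked features (context similarity and a popularity prior from user annotations) stand in for the paper's Learning To Rank setup; features, training data, and names are illustrative assumptions.

```python
# Hedged sketch of a pointwise learning-to-rank disambiguation step.
# Each candidate is featurized, scored, and the highest-scoring candidate wins.
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(context_vec, cand_vec, cand_prior):
    sim = float(np.dot(context_vec, cand_vec) /
                (np.linalg.norm(context_vec) * np.linalg.norm(cand_vec) + 1e-12))
    return [sim, cand_prior]

# toy training data: rows are candidate features, y marks the gold candidate (1) vs. others (0)
X_train = np.array([[0.8, 0.6], [0.2, 0.1], [0.7, 0.3], [0.1, 0.5]])
y_train = np.array([1, 0, 1, 0])
ranker = LogisticRegression().fit(X_train, y_train)

def rank_candidates(context_vec, candidates):
    # candidates: {entity_id: (embedding, popularity prior)}
    feats = {e: features(context_vec, vec, prior) for e, (vec, prior) in candidates.items()}
    return max(feats, key=lambda e: ranker.predict_proba([feats[e]])[0, 1])
```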
Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises | 2017
Konstantin Ziegler; Olivier Caelen; Mathieu Garchery; Michael Granitzer; Liyun He-Guelton; Johannes Jurgovsky; Pierre-Edouard Portier; Stefan Zwicklbauer
The inferences of a machine learning algorithm are naturally limited by the available data. In many real-world applications, the provided internal data is domain-specific, and we use external background knowledge to derive or add new features. Semantic networks, such as Linked Open Data, provide a largely unused treasure trove of background knowledge. This drives a recent surge of interest in unsupervised methods to automatically extract such semantic background knowledge and inject it into machine learning algorithms. In this work, we describe the general process of extracting knowledge from semantic networks through vector space embeddings. The locations in the vector space then reflect relations in the original semantic network. We perform this extraction for geographic background knowledge and inject it into a neural network for the complicated real-world task of credit-card fraud detection. This improves the performance by 11.2%.
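The feature-injection idea can be sketched as looking up a pre-trained embedding for the geographic attribute of each transaction, concatenating it with the ordinary transaction features, and training a neural classifier on the result. The sketch below uses random stand-in embeddings and toy data; the feature choice, embedding size, and classifier are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: concatenate per-country embedding vectors with transaction
# features before training a neural classifier. Embeddings and labels are toy
# stand-ins for embeddings learned from a semantic network and real fraud labels.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
country_embedding = {c: rng.normal(size=16) for c in ["DE", "FR", "BE", "US"]}

def build_features(transactions):
    # transactions: list of (amount, hour_of_day, country_code)
    rows = [np.concatenate(([amount, hour], country_embedding[country]))
            for amount, hour, country in transactions]
    return np.vstack(rows)

transactions = [(120.0, 14, "DE"), (9.99, 3, "US"), (560.0, 23, "FR"), (42.5, 11, "BE")]
labels = np.array([0, 1, 0, 0])  # 1 = fraudulent, toy labels only

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(build_features(transactions), labels)
print(clf.predict(build_features([(300.0, 2, "US")])))
```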
ACM Journal on Computing and Cultural Heritage | 2017
Christin Seifert; Werner Bailer; Thomas Orgel; Louis Gantner; Roman Kern; Hermann Ziak; Albin Petit; Jörg Schlötterer; Stefan Zwicklbauer; Michael Granitzer
The digitization initiatives of the past decades have led to a tremendous increase in digitized objects in the cultural heritage domain. Although digitally available, these objects are often not easily accessible for interested users because of the distributed allocation of the content across different repositories and the variety of data structures and standards. When users search for cultural content, they first need to identify the specific repository and then need to know how to search within this platform (e.g., usage of specific vocabulary). The goal of the EEXCESS project is to design and implement an infrastructure that enables ubiquitous access to digital cultural heritage content. Cultural content should be made available in the channels that users habitually visit and be tailored to their current context, without the need to manually search multiple portals or content repositories. To realize this goal, open-source software components and services have been developed that can either be used as an integrated infrastructure or as modular components suitable to be integrated in other products and services. The EEXCESS modules and components comprise (i) Web-based context detection, (ii) information retrieval-based, federated content aggregation, (iii) metadata definition and mapping, and (iv) a component responsible for privacy preservation. Various applications have been realized based on these components that bring cultural content to the user in content consumption and content creation scenarios. For example, content consumption is realized by a browser extension that generates automatic search queries from the current page context and the focus paragraph and presents related results aggregated from different data providers. A Google Docs add-on allows retrieval of relevant content aggregated from multiple data providers while collaboratively writing a document. These relevant resources can then be included in the current document as a citation, an image, or a link (with preview), without having to disrupt the current writing task for an explicit search in the various content providers' portals.
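The federated content aggregation component can be pictured as fanning a context-derived query out to several provider search services and merging their results into one ranked list. The sketch below illustrates that pattern only; the provider callables, result fields, and merging rule are hypothetical placeholders, not the actual EEXCESS interfaces.

```python
# Hedged sketch of federated aggregation: query several providers in parallel
# and merge the result lists. Providers and scores here are toy stand-ins.
from concurrent.futures import ThreadPoolExecutor

def aggregate(query, providers, top_k=10):
    """providers: list of callables, each returning [(title, score), ...]."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda search: search(query), providers))
    merged = [item for results in result_lists for item in results]
    # naive merge: assumes provider scores are comparable; a real system would re-rank
    return sorted(merged, key=lambda item: item[1], reverse=True)[:top_k]

# toy providers standing in for cultural-heritage repositories
provider_a = lambda q: [("Painting: " + q, 0.9), ("Sketch: " + q, 0.4)]
provider_b = lambda q: [("Manuscript: " + q, 0.7)]
print(aggregate("baroque architecture", [provider_a, provider_b]))
```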
Smart Health | 2015
Stefan Zwicklbauer; Christin Seifert; Michael Granitzer
The application of Knowledge Discovery and Data Mining approaches forms the basis of realizing the vision of Smart Hospitals. For instance, the automated creation of high-quality knowledge bases from clinical reports is important to facilitate decision-making processes for clinical doctors. A subtask of creating such structured knowledge is entity disambiguation, which links a text fragment to the correct semantic meaning chosen from a set of candidate meanings. This paper provides a short, concise overview of entity disambiguation in the biomedical domain, with a focus on annotated corpora (e.g. CALBC), term disambiguation algorithms (e.g. abbreviation disambiguation), and gene and protein disambiguation algorithms (e.g. inter-species gene name disambiguation). Finally, we outline open problems and future challenges that we expect future research to address.
Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business | 2015
Stefan Zwicklbauer; Christin Seifert; Michael Granitzer
Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their corresponding entities in a knowledge base. One possibility to describe these entities within a knowledge base is via entity-annotated documents (a document-centric knowledge base). It has been shown that entity disambiguation with search-based algorithms that use document-centric knowledge bases performs well in the biomedical domain. In this context, the question remains how the quantity of annotated entities within documents and the document count used for entity classification influence disambiguation results. Another open question is whether disambiguation results hold true on more general knowledge data sets (e.g. Wikipedia). In our work, we implement a search-based, document-centric disambiguation system and explicitly evaluate the mentioned issues on the biomedical data set CALBC and the general knowledge data set Wikipedia, respectively. We show that the number of documents used for classification and the amount of annotations within these documents must be well matched to attain the best result. Additionally, we reveal that disambiguation accuracy is poor on Wikipedia. We show that disambiguation results significantly improve when using more but shorter documents (e.g. Wikipedia paragraphs). Our results indicate that search-based, document-centric disambiguation systems must be carefully adapted with reference to the underlying domain and the availability of user data.
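The trade-off the abstract describes can be illustrated by splitting long articles into paragraph-sized units before retrieval and treating the number of retrieved units as a tunable parameter of the document-centric vote. The sketch below uses token overlap in place of a real search index; the splitting rule, parameter names, and data layout are assumptions for illustration only.

```python
# Illustrative sketch of paragraph-level, search-based document-centric
# disambiguation: retrieve the top_k most similar paragraph units and let their
# entity annotations vote. Token overlap stands in for a real retrieval engine.
from collections import Counter

def split_into_paragraphs(article_text, annotations):
    # simplification: the article's entity annotations are attached to every paragraph
    return [(p, annotations) for p in article_text.split("\n\n") if p.strip()]

def disambiguate(context, units, top_k=20):
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    ranked = sorted(units, key=lambda u: overlap(context, u[0]), reverse=True)[:top_k]
    votes = Counter(e for _, entities in ranked for e in entities)
    return votes.most_common(1)[0][0] if votes else None
```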