Martin Brümmer
Leipzig University
Publications
Featured research published by Martin Brümmer.
International Semantic Web Conference | 2013
Sebastian Hellmann; Jens Lehmann; Sören Auer; Martin Brümmer
We are currently observing a plethora of Natural Language Processing tools and services being made available. Each of the tools and services has its particular strengths and weaknesses, but exploiting the strengths and synergistically combining different tools is currently an extremely cumbersome and time-consuming task. Also, once a particular set of tools is integrated, this integration is not reusable by others. We argue that simplifying the interoperability of different NLP tools performing similar but also complementary tasks will facilitate the comparability of results and the creation of sophisticated NLP applications. In this paper, we present the NLP Interchange Format (NIF). NIF is based on a Linked Data enabled URI scheme for identifying elements in (hyper-)texts and an ontology for describing common NLP terms and concepts. In contrast to more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform. We present several use cases of the second version of the NIF specification (NIF 2.0) and the results of a developer study.
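To make the URI scheme concrete, the following sketch (Python with rdflib) shows how a text span could be annotated in the NIF style. The document URI and the linked entity are hypothetical; class and property names follow the NIF 2.0 core ontology.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

# nif-core is the NIF 2.0 core ontology; itsrdf carries entity links.
NIF = Namespace("http://persistence.uni-leipzig.de/nlp2rdf/ontologies/nif-core#")
ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

text = "The Semantic Web is here."
doc = "http://example.org/doc1"  # hypothetical document URI

g = Graph()
g.bind("nif", NIF)
g.bind("itsrdf", ITSRDF)

# The whole text becomes a nif:Context, addressed by character offsets in the URI fragment.
ctx = URIRef(f"{doc}#char=0,{len(text)}")
g.add((ctx, RDF.type, NIF.Context))
g.add((ctx, NIF.isString, Literal(text)))

# An annotation of the substring "Semantic Web" (offsets 4-16), as a tool might produce it.
span = URIRef(f"{doc}#char=4,16")
g.add((span, RDF.type, NIF.String))
g.add((span, NIF.referenceContext, ctx))
g.add((span, NIF.beginIndex, Literal(4, datatype=XSD.nonNegativeInteger)))
g.add((span, NIF.endIndex, Literal(16, datatype=XSD.nonNegativeInteger)))
g.add((span, NIF.anchorOf, Literal("Semantic Web")))
g.add((span, ITSRDF.taIdentRef, URIRef("http://dbpedia.org/resource/Semantic_Web")))

print(g.serialize(format="turtle"))
```

Because every span is a dereferenceable URI, the output of one tool can be merged with the output of another simply by combining the two RDF graphs.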
International World Wide Web Conferences | 2015
Michael Röder; Axel-Cyrille Ngonga Ngomo; Ciro Baron; Andreas Both; Martin Brümmer; Diego Ceccarelli; Marco Cornolti; Didier Cherix; Bernd Eickmann; Paolo Ferragina; Christiane Lemke; Andrea Moro; Roberto Navigli; Francesco Piccinno; Giuseppe Rizzo; Harald Sack; René Speck; Raphaël Troncy; Jörg Waitelonis; Lars Wesemann
We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights pertaining to the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers so as to allow them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL allow deriving insights pertaining to the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results.
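The sketch below is not GERBIL's code or API, only a simplified illustration of the kind of micro/macro F1 aggregation that makes annotation results comparable across datasets; the sample documents and entity URIs are made up.

```python
from collections import Counter

# Simplified illustration of micro/macro F1 aggregation; not GERBIL's actual implementation.
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def evaluate(documents):
    """documents: dicts with sets of gold and predicted (begin, end, uri) annotations."""
    totals = Counter()
    doc_f1 = []
    for doc in documents:
        gold, pred = doc["gold"], doc["pred"]
        tp, fp, fn = len(gold & pred), len(pred - gold), len(gold - pred)
        totals.update(tp=tp, fp=fp, fn=fn)
        doc_f1.append(prf(tp, fp, fn)[2])
    micro = prf(totals["tp"], totals["fp"], totals["fn"])[2]  # pool counts over all documents
    macro = sum(doc_f1) / len(doc_f1) if doc_f1 else 0.0      # average per-document F1
    return {"micro_f1": micro, "macro_f1": macro}

docs = [
    {"gold": {(4, 16, "dbr:Semantic_Web")}, "pred": {(4, 16, "dbr:Semantic_Web")}},
    {"gold": {(0, 6, "dbr:Berlin")},        "pred": set()},
]
print(evaluate(docs))
```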
Sprachwissenschaft | 2015
Mohamed Ahmed Sherif; Axel-Cyrille Ngonga Ngomo; Sebastian Hellmann; Steven Moran; Martin Brümmer; John P. McCrae
In this paper we describe the Semantic Quran dataset, a multilingual RDF representation of translations of the Quran. The dataset was created by integrating data from two different semi-structured sources and aligned to an ontology designed to represent multilingual data from sources with a hierarchical structure. The resulting RDF data encompasses 43 different languages, which are among the most under-represented languages in the Linked Data Cloud, including Arabic, Amharic and Amazigh. We designed the dataset to be easily usable in natural-language processing applications with the goal of facilitating the development of knowledge extraction tools for these languages. In particular, the Semantic Quran dataset is compatible with the NLP Interchange Format (NIF) and contains explicit morpho-syntactic information on the utilized terms. We present the ontology devised for structuring the data. We also provide the transformation rules implemented in our extraction framework. Finally, we detail the link creation process as well as possible usage scenarios for the Semantic Quran dataset.
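As a rough illustration of the multilingual, hierarchical structure described above, here is a minimal rdflib sketch; the namespace, class and property names are hypothetical placeholders, not the actual Semantic Quran ontology.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

# Hypothetical namespace and structure; the published ontology differs in detail.
EX = Namespace("http://example.org/quran/")

g = Graph()
verse = EX["chapter/1/verse/1"]
g.add((verse, RDF.type, EX.Verse))
g.add((verse, EX.partOf, EX["chapter/1"]))
# One resource, several language-tagged literals: the core idea of a multilingual RDF dataset.
g.add((verse, EX.text, Literal("In the name of God, the Most Gracious, the Most Merciful.", lang="en")))
g.add((verse, EX.text, Literal("Im Namen Gottes, des Allerbarmers, des Barmherzigen.", lang="de")))

# Retrieve all available translations of the verse with a SPARQL query.
q = """
SELECT ?lang ?text WHERE {
  ?v a ex:Verse ;
     ex:text ?text .
  BIND(LANG(?text) AS ?lang)
}
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.lang, row.text)
```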
International Conference on Semantic Systems | 2014
Martin Brümmer; Ciro Baron; Ivan Ermilov; Markus Freudenberg; Dimitris Kontokostas; Sebastian Hellmann
The constantly growing amount of Linked Open Data (LOD) datasets creates the need for rich metadata descriptions, enabling users to discover, understand and process the available data. This metadata is often created, maintained and stored in diverse data repositories featuring disparate data models that are often unable to provide the metadata necessary to automatically process the datasets described. This paper proposes DataID, a best practice for LOD dataset descriptions that uses RDF files hosted together with the datasets, under the same domain. We describe the data model, which is based on the widely used DCAT and VoID vocabularies, as well as supporting tools to create and publish DataIDs, and use cases that show the benefits of providing semantically rich metadata for complex datasets. As a proof of concept, we present a DataID generated for the DBpedia dataset.
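A minimal sketch of a DCAT/VoID-based dataset description in the spirit of DataID follows. The dataset URI, license and triple count are invented, and the DataID namespace used here is an assumption; consult the published ontology before relying on it.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCAT, DCTERMS, VOID, XSD

# Assumed namespace for the DataID vocabulary (treat as a placeholder).
DATAID = Namespace("http://dataid.dbpedia.org/ns/core#")

g = Graph()
g.bind("dataid", DATAID); g.bind("dcat", DCAT); g.bind("dct", DCTERMS); g.bind("void", VOID)

ds = URIRef("http://example.org/mydataset/dataid#dataset")   # hypothetical dataset URI
dist = URIRef("http://example.org/mydataset/dataid#dump")

# A DataID description reuses DCAT and VoID terms and is hosted next to the data it describes.
g.add((ds, RDF.type, DATAID.Dataset))
g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("My Example Dataset", lang="en")))
g.add((ds, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
g.add((ds, VOID.triples, Literal(123456, datatype=XSD.integer)))
g.add((ds, DCAT.distribution, dist))
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.downloadURL, URIRef("http://example.org/mydataset/dump.ttl.bz2")))

print(g.serialize(format="turtle"))
```

Because the description is itself RDF and lives under the same domain as the data, agents can discover and query it with the same tooling they use for the dataset.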
European Semantic Web Conference | 2014
Dimitris Kontokostas; Martin Brümmer; Sebastian Hellmann; Jens Lehmann; Lazaros Ioannidis
Linked Data comprises an unprecedented volume of structured data on the Web and is being adopted by an increasing number of domains. However, the varying quality of published data forms a barrier for further adoption, especially for Linked Data consumers. In this paper, we extend a previously developed methodology of Linked Data quality assessment, which is inspired by test-driven software development. Specifically, we enrich it with ontological support and different levels of result reporting and describe how the method is applied in the Natural Language Processing (NLP) area. NLP is, compared to other domains such as biology, a late Linked Data adopter; however, it has seen a steep rise of activity in the creation of data and ontologies, and quality assessment has become an important need for NLP datasets. In our study, we analysed 11 datasets that use the lemon and NIF vocabularies against 277 test cases and point out common quality issues.
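To illustrate the test-driven idea, here is a sketch of a single SPARQL-based test case run with rdflib; it is not one of the authors' actual test definitions, and the constraint shown (beginIndex must not exceed endIndex) is just a plausible NIF check.

```python
from rdflib import Graph

# One SPARQL-based test case in the spirit of test-driven quality assessment
# (not taken from the paper): every nif:String must have beginIndex <= endIndex.
TEST_CASE = """
PREFIX nif: <http://persistence.uni-leipzig.de/nlp2rdf/ontologies/nif-core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?s ?begin ?end WHERE {
  ?s a nif:String ;
     nif:beginIndex ?begin ;
     nif:endIndex   ?end .
  FILTER (xsd:integer(?begin) > xsd:integer(?end))
}
"""

def run_test(dump_path: str) -> int:
    """Load a Turtle dump and report every resource violating the test case."""
    g = Graph()
    g.parse(dump_path, format="turtle")
    violations = list(g.query(TEST_CASE))
    for s, begin, end in violations:
        print(f"FAIL: {s} has beginIndex {begin} > endIndex {end}")
    return len(violations)

# Example: run_test("dataset.ttl") returns the number of failing resources.
```

A dataset passes the test when the query returns no results; a non-empty result set is reported as a list of violations, analogous to failing unit tests in software development.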
Psychological Science | 2018
Michael Dufner; Martin Brümmer; Joanne M. Chung; Pia M. Drewke; Christophe Blaison; Stefan C. Schmukle
Abel and Kruger (2010) found that the smile intensity of professional baseball players who were active in 1952, as coded from photographs, predicted these players’ longevity. In the current investigation, we sought to replicate this result and to extend the initial analyses. We analyzed (a) a sample that was almost identical to the one from Abel and Kruger’s study using the same database and inclusion criteria (N = 224), (b) a considerably larger nonoverlapping sample consisting of other players from the same cohort (N = 527), and (c) all players in the database (N = 13,530 valid cases). Like Abel and Kruger, we relied on categorical smile codings as indicators of positive affectivity, yet we supplemented these codings with subjective ratings of joy intensity and automatic codings of positive affectivity made by computer programs. In both samples and for all three indicators, we found that positive affectivity did not predict mortality once birth year was controlled as a covariate.
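The core methodological point is that the smile effect vanishes once birth year is included as a covariate. The sketch below illustrates that kind of adjusted survival model with the lifelines library on simulated data; it is not the authors' dataset or their exact model specification.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Simulated toy data, NOT the actual player data: lifespans depend on birth cohort only.
rng = np.random.default_rng(0)
n = 500
birth_year = rng.integers(1915, 1935, n)
smile = rng.integers(0, 3, n)                        # 0 = none, 1 = partial, 2 = full smile
lifespan = 60 + 0.5 * (birth_year - 1915) + rng.normal(0, 8, n)
df = pd.DataFrame({
    "smile_intensity": smile,
    "birth_year": birth_year,
    "lifespan": lifespan,
    "died": np.ones(n, dtype=int),                   # no censoring in this toy example
})

# Survival model with birth year controlled as a covariate, analogous to the adjusted analysis.
cph = CoxPHFitter()
cph.fit(df, duration_col="lifespan", event_col="died")
cph.print_summary()
```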
PLOS ONE | 2017
Julia M. Rohrer; Martin Brümmer; Stefan C. Schmukle; Jan Goebel; Gert G. Wagner
Open-ended questions have routinely been included in large-scale survey and panel studies, yet there is some uncertainty about how to actually incorporate the answers to such questions into quantitative social science research. Tools developed recently in the domain of natural language processing offer a wide range of options for the automated analysis of such textual data, but their implementation has lagged behind. In this study, we demonstrate straightforward procedures that can be applied to process and analyze textual data for the purposes of quantitative social science research. Using more than 35,000 textual answers to the question “What else are you worried about?” from participants of the German Socio-economic Panel Study (SOEP), we (1) analyzed characteristics of respondents that determined whether they answered the open-ended question, (2) used the textual data to detect relevant topics that were reported by the respondents, and (3) linked the features of the respondents to the worries they reported in their textual data. The potential uses as well as the limitations of the automated analysis of textual data are discussed.
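The topic-detection step (2) can be sketched with a standard bag-of-words topic model; the snippet below uses scikit-learn on a few invented English answers and is not the authors' exact pipeline, which processed German SOEP responses.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# A few hypothetical open-ended answers; the study analysed >35,000 German responses.
answers = [
    "I am worried about my pension and rising rents.",
    "Climate change and the environment concern me.",
    "I worry about my children's education and schools.",
    "Rents keep rising and my pension will not be enough.",
]

# Bag-of-words representation of the free-text answers.
vectorizer = CountVectorizer(stop_words="english", min_df=1)
X = vectorizer.fit_transform(answers)

# Fit a small topic model and print the top words per topic.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```

The per-respondent topic weights from such a model can then be joined back to the panel data, which is what allows step (3), linking respondent characteristics to reported worries.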
Metadata and Semantics Research | 2016
Markus Freudenberg; Martin Brümmer; Jessika Rücknagel; Robert Ulrich; Thomas Eckart; Dimitris Kontokostas; Sebastian Hellmann
The rapid increase of data produced in a data-centric economy emphasises the need for rich metadata descriptions of datasets, covering many domains and scenarios. While there are multiple metadata formats describing datasets for specific purposes, exchanging metadata between them is often a difficult endeavour. More general approaches for domain-independent descriptions often lack the precision needed in many domain-specific use cases. This paper introduces the multilayer ontology of DataID, providing semantically rich metadata for complex datasets. In particular, we focus on the extensibility of its core model and its interoperability with foreign ontologies and other metadata formats. As a proof of concept, we present a way to describe the Data Management Plans (DMPs) of research projects alongside the metadata of their datasets, repositories and involved agents.
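The multilayer idea can be pictured as a separate description layer that points back at the core dataset description. The sketch below uses entirely hypothetical namespaces and property names for the DMP layer; it only shows the pattern of extending the core model without modifying it.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

# All DMP-related names below are hypothetical placeholders, not the published vocabulary.
DATAID = Namespace("http://dataid.dbpedia.org/ns/core#")
DMP = Namespace("http://example.org/dataid-dmp#")

g = Graph()
ds = URIRef("http://example.org/mydataset/dataid#dataset")
plan = URIRef("http://example.org/mydataset/dataid#dmp")

g.add((ds, RDF.type, DATAID.Dataset))
g.add((plan, RDF.type, DMP.DataManagementPlan))
g.add((plan, DMP.describesDataset, ds))                 # link from the DMP layer to the core layer
g.add((plan, DCTERMS.creator, URIRef("http://example.org/agents/project-lead")))
g.add((plan, DMP.preservationPeriod, Literal("P10Y")))  # e.g. keep the data for ten years

print(g.serialize(format="turtle"))
```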
International Conference on Semantic Systems | 2015
Natanael Arndt; Markus Ackermann; Martin Brümmer; Thomas Riechert
Popular knowledge bases that provide SPARQL endpoints for the web typically experience a high number of requests, which often results in low availability of their interfaces. A common approach to counter the availability issue is to run a local mirror of the knowledge base. However, running a SPARQL endpoint is currently a complex task that requires a lot of effort and technical support from domain experts who just want to use the SPARQL interface. With our approach of containerised knowledge base shipping, we introduce a simple-to-set-up methodology for running a local mirror of an RDF knowledge base and SPARQL endpoint with interchangeable exploration components. The flexibility of the presented approach further helps maintain the publication infrastructure for dataset projects. We demonstrate and evaluate the presented methodology on the dataset projects DBpedia, Catalogus Professorum Lipsiensium and Sächsisches Pfarrerbuch.
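To give a feel for the workflow, the sketch below starts a containerised knowledge base with the Docker SDK for Python; the image name, port and paths are hypothetical placeholders, and the paper's tooling ships its own images and configuration.

```python
import docker  # the Docker SDK for Python

# Image name, port and mount paths are hypothetical placeholders for illustration only.
client = docker.from_env()
container = client.containers.run(
    "example/knowledge-base-dbpedia",        # hypothetical containerised knowledge base image
    detach=True,
    ports={"8890/tcp": 8890},                # expose the bundled SPARQL endpoint locally
    volumes={"/srv/dbpedia": {"bind": "/data", "mode": "rw"}},
)
print("Local SPARQL mirror available at http://localhost:8890/sparql")
```

The point of the approach is that the domain expert only pulls and runs an image; the triple store, loaded data and exploration components come preconfigured inside the container.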
International World Wide Web Conferences | 2016
Ciro Baron Neto; Kay Müller; Martin Brümmer; Dimitris Kontokostas; Sebastian Hellmann