Publication


Featured research published by Matthias Hartung.


International Journal of Human-Computer Studies / International Journal of Man-Machine Studies | 2008

Ontology-based information extraction and integration from heterogeneous data sources

Paul Buitelaar; Philipp Cimiano; Anette Frank; Matthias Hartung; Stefania Racioppa

In this paper we present the design, implementation and evaluation of SOBA, a system for ontology-based information extraction from heterogeneous data resources, including plain text, tables and image captions. SOBA is capable of processing structured information, text and image captions to extract information and integrate it into a coherent knowledge base. To establish coherence, SOBA interlinks the information extracted from different sources and detects duplicate information. The knowledge base produced by SOBA can then be used to query for information contained in the different sources in an integrated and seamless manner. Overall, this allows for advanced retrieval functionality by which questions can be answered precisely. A further distinguishing feature of the SOBA system is that it straightforwardly integrates deep and shallow natural language processing to increase robustness and accuracy. We discuss the implementation and application of the SOBA system within the SmartWeb multimodal dialog system. In addition, we present a thorough evaluation of the different components of the system. However, an end-to-end evaluation of the whole SmartWeb system is out of the scope of this paper and has been presented elsewhere by the SmartWeb consortium.


Semantics in Text Processing. STEP 2008 Conference Proceedings | 2008

A Resource-Poor Approach for Linking Ontology Classes to Wikipedia Articles

Nils Reiter; Matthias Hartung; Anette Frank

The applicability of ontologies for natural language processing depends on the ability to link ontological concepts and relations to their realisations in texts. We present a general, resource-poor account to create such a linking automatically by extracting Wikipedia articles corresponding to ontology classes. We evaluate our approach in an experiment with the Music Ontology. We consider linking as a promising starting point for subsequent steps of information extraction.


International Conference on Computational Linguistics | 2014

Ontology-based Extraction of Structured Information from Publications on Preclinical Experiments for Spinal Cord Injury Treatments

Benjamin Paassen; Andreas Stöckel; Raphael Dickfelder; Jan Philip Göpfert; Nicole Brazda; Tarek Kirchhoffer; Hans Werner Müller; Roman Klinger; Matthias Hartung; Philipp Cimiano

Preclinical research in the field of central nervous system trauma advances at a fast pace, currently yielding over 8,000 new publications per year, at an exponentially growing rate. This amount of published information by far exceeds the capacity of individual scientists to read and understand the relevant literature. So far, no clinical trial has led to therapeutic approaches which achieve functional recovery in human patients. In this paper, we describe a first prototype of an ontology-based information extraction system that automatically extracts relevant preclinical knowledge about spinal cord injury treatments from natural language text by recognizing participating entity classes and linking them to each other. The evaluation on an independent test corpus of manually annotated full text articles shows a macroaverage F1 measure of 0.74 with precision 0.68 and recall 0.81 on the task of identifying entities participating in relations.


Knowledge Acquisition, Modeling and Management | 2016

Combining Textual and Graph-Based Features for Named Entity Disambiguation Using Undirected Probabilistic Graphical Models

Sherzod Hakimov; Hendrik ter Horst; Soufian Jebbara; Matthias Hartung; Philipp Cimiano

Named Entity Disambiguation (NED) is the task of disambiguating already recognized named entities in a natural language text by linking them to their corresponding entities in a knowledge base such as DBpedia. It is an important step in transforming unstructured text into structured knowledge. Previous work on this task has shown a strong impact of graph-based methods such as PageRank on entity disambiguation. Other approaches rely on distributional similarity between an article and the textual description of a candidate entity. However, the combined impact of these different feature groups has not been explored to a sufficient extent. In this paper, we present a novel approach that exploits an undirected probabilistic model to combine different types of features for named entity disambiguation. Capitalizing on Markov Chain Monte Carlo sampling, our model is capable of exploiting complementary strengths between both graph-based and textual features. We analyze the impact of these features and their combination on named entity disambiguation. In an evaluation on the GERBIL benchmark, our model compares favourably to the current state of the art in 8 out of 14 data sets.


Meeting of the Association for Computational Linguistics | 2014

Towards Gene Recognition from Rare and Ambiguous Abbreviations using a Filtering Approach

Matthias Hartung; Roman Klinger; Matthias Zwick; Philipp Cimiano

Retrieving information about highly ambiguous gene/protein homonyms is a challenge, in particular where their non-protein meanings are more frequent than their protein meaning (e.g., SAH or HF). Due to their limited coverage in common benchmarking data sets, the performance of existing gene/protein recognition tools on these problematic cases is hard to assess. We uniformly sample a corpus of eight ambiguous gene/protein abbreviations from MEDLINE and provide manual annotations for each mention of these abbreviations. Based on this resource, we show that available gene recognition tools such as conditional random fields (CRF) trained on BioCreative 2 NER data or GNAT tend to underperform on this phenomenon. We propose to extend existing gene recognition approaches by combining a CRF and a support vector machine. In a cross-entity evaluation and without taking any entity-specific information into account, our model achieves a gain of 6 points in F1 measure over our best baseline, which checks for the occurrence of a long form of the abbreviation, and more than 9 points over all existing tools investigated.


Applications of Natural Language to Data Bases | 2018

Assessing the Impact of Single and Pairwise Slot Constraints in a Factor Graph Model for Template-based Information Extraction

Hendrik ter Horst; Matthias Hartung; Roman Klinger; Nicole Brazda; Hans Werner Müller; Philipp Cimiano

Template-based information extraction generalizes over standard token-level binary relation extraction in the sense that it attempts to fill a complex template comprising multiple slots on the basis of information given in a text. In the approach presented in this paper, templates and possible fillers are defined by a given ontology. The information extraction task consists in filling these slots within a template with previously recognized entities or literal values. We cast the task as a structure prediction problem and propose a joint probabilistic model based on factor graphs to account for the interdependence in slot assignments. Inference is implemented as a heuristic building on Markov chain Monte Carlo sampling. As our main contribution, we investigate the impact of soft constraints modeled as single slot factors which measure preferences of individual slots for ranges of fillers, as well as pairwise slot factors modeling the compatibility between fillers of two slots. Instead of relying on expert knowledge to acquire such soft constraints, in our approach they are directly captured in the model and learned from training data. We show that both types of factors are effective in improving information extraction on a real-world data set of full-text papers from the biomedical domain. Pairwise factors are shown to particularly improve the performance of our extraction model by up to +0.43 points in precision, leading to an F1 score of 0.90 for individual templates.


Language, Data and Knowledge | 2017

Joint Entity Recognition and Linking in Technical Domains Using Undirected Probabilistic Graphical Models

Hendrik ter Horst; Matthias Hartung; Philipp Cimiano

The problems of recognizing mentions of entities in texts and linking them to unique knowledge base identifiers have received considerable attention in recent years. In this paper we present a probabilistic system based on undirected graphical models that jointly addresses both the entity recognition and the linking task. Our framework considers the span of mentions of entities as well as the corresponding knowledge base identifier as random variables and models the joint assignment using a factorized distribution. We show that our approach can be easily applied to different technical domains by merely exchanging the underlying ontology. On the task of recognizing and linking disease names, we show that our approach outperforms the state-of-the-art systems DNorm and TaggerOne, as well as two strong lexicon-based baselines. On the task of recognizing and linking chemical names, our system achieves comparable performance to the state-of-the-art.


Reasoning Web International Summer School | 2018

Cold-Start Knowledge Base Population Using Ontology-Based Information Extraction with Conditional Random Fields

Hendrik ter Horst; Matthias Hartung; Philipp Cimiano

In this tutorial we discuss how Conditional Random Fields can be applied to knowledge base population tasks. We are in particular interested in the cold-start setting which assumes as given an ontology that models classes and properties relevant for the domain of interest, and an empty knowledge base that needs to be populated from unstructured text. More specifically, cold-start knowledge base population consists in predicting semantic structures from an input document that instantiate classes and properties as defined in the ontology. Considering knowledge base population as structure prediction, we frame the task as a statistical inference problem which aims at predicting the most likely assignment to a set of ontologically grounded output variables given an input document. In order to model the conditional distribution of these output variables given the input variables derived from the text, we follow the approach adopted in Conditional Random Fields. We decompose the cold-start knowledge base population task into the specific problems of entity recognition, entity linking and slot-filling, and show how they can be modeled using Conditional Random Fields.


Applications of Natural Language to Data Bases | 2017

Identifying Right-Wing Extremism in German Twitter Profiles: A Classification Approach

Matthias Hartung; Roman Klinger; Franziska Schmidtke; Lars Vogel

Social media platforms are used by an increasing number of extremist political actors for mobilization, recruiting or radicalization purposes. We propose a machine learning approach to support manual monitoring aiming at identifying right-wing extremist content in German Twitter profiles. We frame the task as profile classification, based on textual cues, traits of emotionality in language use, and linguistic patterns. A quantitative evaluation reveals a limited precision of 25% with a close-to-perfect recall of 95%. This leads to a considerable reduction of the workload of human analysts in detecting right-wing extremist users.


SAI Intelligent Systems Conference | 2016

Providing Intelligent Assistance for Product Configuration in Manufacturing: A Learning-to-Rank Approach

Carsten Poggemeier; Matthias Hartung; Philipp Cimiano

Configuring complex products can be a challenge due to the huge number of configuration possibilities. In this paper, our goal is to foster the development of intelligent configuration assistants that can support customers in configuring complex products. We formalize the task as a machine learning problem and in particular as a learning-to-rank problem. Given pairwise preferences elicited from experts, we show that we can train a model using support vector machines that ranks possible products according to their relevance to a given set of requirements specified by a user.

Collaboration


Dive into Matthias Hartung's collaboration.

Top Co-Authors


Nicole Brazda

University of Düsseldorf
