Leonhard Hennig | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Leonhard Hennig is active.

Explore More

Publication

Featured researches published by Leonhard Hennig.

international joint conference on natural language processing | 2015

Multi-Objective Optimization for the Joint Disambiguation of Nouns and Named Entities

Dirk Weissenborn; Leonhard Hennig; Feiyu Xu; Hans Uszkoreit

In this paper, we present a novel approach to joint word sense disambiguation (WSD) and entity linking (EL) that combines a set of complementary objectives in an extensible multi-objective formalism. During disambiguation the system performs continuous optimization to find optimal probability distributions over candidate senses. The performance of our system on nominal WSD as well as EL improves state-ofthe-art results on several corpora. These improvements demonstrate the importance of combining complementary objectives in a joint model for robust disambiguation.

Archive | 2009

Automatic Layouting of Personalized Newspaper Pages

Thomas Strecker; Leonhard Hennig

Layouting items in a 2D-constrained container for maximizing container value and minimizing wasted space is a 2D Cutting and Packing (C&P) problem. We consider this task in the context of layouting news articles on xed-size pages in a system for delivering personalized newspapers. We propose a grid-based page structure where articles can be laid out in different variants for increased exibility. In addition, we have developed a fitness function integrating aesthetic and relevance criteria for computing the value of a solution. We evaluate our approach using well-known layouting heuristics. Our results show that with the more complex fitness function only advanced C&P algorithms obtain nearly-optimal solutions, while the basic algorithms underperform

Journal of Web Semantics | 2016

Sar-graphs

Sebastian Krause; Leonhard Hennig; Andrea Moro; Dirk Weissenborn; Feiyu Xu; Hans Uszkoreit; Roberto Navigli

Recent years have seen a significant growth and increased usage of large-scale knowledge resources in both academic research and industry. We can distinguish two main types of knowledge resources: those that store factual information about entities in the form of semantic relations (e.g., Freebase), namely so-called knowledge graphs, and those that represent general linguistic knowledge (e.g., WordNet or UWN). In this article, we present a third type of knowledge resource which completes the picture by connecting the two first types. Instances of this resource are graphs of semantically-associated relations (sar-graphs), whose purpose is to link semantic relations from factual knowledge graphs with their linguistic representations in human language.We present a general method for constructing sar-graphs using a language- and relation-independent, distantly supervised approach which, apart from generic language processing tools, relies solely on the availability of a lexical semantic resource, providing sense information for words, as well as a knowledge base containing seed relation instances. Using these seeds, our method extracts, validates and merges relation-specific linguistic patterns from text to create sar-graphs. To cope with the noisily labeled data arising in a distantly supervised setting, we propose several automatic pattern confidence estimation strategies, and also show how manual supervision can be used to improve the quality of sar-graph instances. We demonstrate the applicability of our method by constructing sar-graphs for 25 semantic relations, of which we make a subset publicly available at http://sargraph.dfki.de.We believe sar-graphs will prove to be useful linguistic resources for a wide variety of natural language processing tasks, and in particular for information extraction and knowledge base population. We illustrate their usefulness with experiments in relation extraction and in computer assisted language learning.

Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications | 2015

Sar-graphs: A Linked Linguistic Knowledge Resource Connecting Facts with Language

Sebastian Krause; Leonhard Hennig; Aleksandra Gabryszak; Feiyu Xu; Hans Uszkoreit

We present sar-graphs, a knowledge resource that links semantic relations from factual knowledge graphs to the linguistic patterns with which a language can express instances of these relations. Sar-graphs expand upon existing lexicosemantic resources by modeling syntactic and semantic information at the level of relations, and are hence useful for tasks such as knowledge base population and relation extraction. We present a languageindependent method to automatically construct sar-graph instances that is based on distantly supervised relation extraction. We link sar-graphs at the lexical level to BabelNet, WordNet and UBY, and present our ongoing work on pattern- and relationlevel linking to FrameNet. An initial dataset of English sar-graphs for 25 relations is made publicly available, together with a Java-based API.

database and expert systems applications | 2010

Identifying Sentence-Level Semantic Content Units with Topic Models

Leonhard Hennig; Thomas Strecker; Sascha Narr; Ernesto William De Luca; Sahin Albayrak

Statistical approaches to document content modeling typically focus either on broad topics or on discourse-level subtopics of a text. We present an analysis of the performance of probabilistic topic models on the task of learning sentence-level topics that are similar to facts. The identification of sentential content with the same meaning is an important task in multi-document summarization and the evaluation of multi-document summaries. In our approach, each sentence is represented as a distribution over topics, and each topic is a distribution over words. We compare the topic-sentence assignments discovered by a topic model to gold-standard assignments that were manually annotated on a set of closely related pairs of news articles. We observe a clear correspondence between automatically identified and annotated topics. The high accuracy of automatically discovered topic-sentence assignments suggests that topic models can be utilized to identify (sub-) sentential semantic content units.

web intelligence | 2008

Tailoring Taxonomies for Efficient Text Categorization and Expert Finding

Robert Wetzker; Winfried Umbrath; Leonhard Hennig; Christian Bauckhage; Tansu Alpcan; Florian Metze

Automatic content categorization by means of taxonomies is a powerful tool for information retrieval and search technologies as it improves the accessibility of data both for humans and machines. While research on automatic categorization has mainly focused on the problem of classifier design, hardly any effort has been spent on the optimization of the taxonomy size itself. However, taxonomy tailoring may significantly improve computational efficiency and scalability of modern retrieval systems where taxonomies often consist of tens of thousands of non-uniformly distributed categories. In this paper we demonstrate empirically that small subtrees of a taxonomy already enable reliable categorization. We compare several measures for the optimal selection of sub-taxonomies and investigate to what extent a reduction affects the classification quality. We consider applications in classical document categorization and in the upcoming area of expert finding and report corresponding results obtained from experiments with standard benchmark data.

International Conference of the German Society for Computational Linguistics and Language Technology | 2017

In-Memory Distributed Training of Linear-Chain Conditional Random Fields with an Application to Fine-Grained Named Entity Recognition

Robert Schwarzenberg; Leonhard Hennig; Holmer Hemsen

Recognizing fine-grained named entities, i.e., street and city instead of just the coarse type location, has been shown to increase task performance in several contexts. Fine-grained types, however, amplify the problem of data sparsity during training, which is why larger amounts of training data are needed. In this contribution we address scalability issues caused by the larger training sets. We distribute and parallelize feature extraction and parameter estimation in linear-chain conditional random fields, which are a popular choice for sequence labeling tasks such as named entity recognition (NER) and part of speech (POS) tagging. To this end, we employ the parallel stream processing framework Apache Flink which supports in-memory distributed iterations. Due to this feature, contrary to prior approaches, our system becomes iteration-aware during gradient descent. We experimentally demonstrate the scalability of our approach and also validate the parameters learned during distributed training in a fine-grained NER task.

International Conference of the German Society for Computational Linguistics and Language Technology | 2017

Twitter Geolocation Prediction Using Neural Networks

Philippe Thomas; Leonhard Hennig

Knowing the location of a user is important for several use cases, such as location specific recommendations, demographic analysis, or monitoring of disaster outbreaks. We present a bottom up study on the impact of text- and metadata-derived contextual features for Twitter geolocation prediction. The final model incorporates individual types of tweet information and achieves state-of-the-art performance on a publicly available test set. The source code of our implementation, together with pretrained models, is freely available at https://github.com/Erechtheus/geolocation.

recent advances in natural language processing | 2017

Streaming Text Analytics for Real-Time Event Recognition.

Philippe Thomas; Johannes Kirschnick; Leonhard Hennig; Renlong Ai; Sven Schmeier; Holmer Hemsen; Feiyu Xu; Hans Uszkoreit

A huge body of continuously growing written knowledge is available on the web in the form of social media posts, RSS feeds, and news articles. Real-time information extraction from such high velocity, high volume text streams requires scalable, distributed natural language processing pipelines. We introduce such a system for fine-grained event recognition within the big data framework Flink, and demonstrate its capabilities for extracting and geo-locating mobility- and industry-related events from heterogeneous text sources. Performance analyses conducted on several large datasets show that our system achieves high throughput and maintains low latency, which is crucial when events need to be detected and acted upon in real-time. We also present promising experimental results for the event extraction component of our system, which recognizes a novel set of event types. The demo system is available at http://dfki.de/sd4m-sta-demo/.

meeting of the association for computational linguistics | 2016

Real-Time Discovery and Geospatial Visualization of Mobility and Industry Events from Large-Scale, Heterogeneous Data Streams

Leonhard Hennig; Philippe Thomas; Renlong Ai; Johannes Kirschnick; He Wang; Jakob Pannier; Nora Zimmermann; Sven Schmeier; Feiyu Xu; Jan Ostwald; Hans Uszkoreit

Monitoring mobility- and industryrelevant events is important in areas such as personal travel planning and supply chain management, but extracting events pertaining to specific companies, transit routes and locations from heterogeneous, high-volume text streams remains a significant challenge. We present Spree, a scalable system for real-time, automatic event extraction from social media, news and domain-specific RSS feeds. Our system is tailored to a range of mobilityand industry-related events, and processes German texts within a distributed linguistic analysis pipeline implemented in Apache Flink. The pipeline detects and disambiguates highly ambiguous domain-relevant entities, such as street names, and extracts various events with their geo-locations. Event streams are visualized on a dynamic, interactive map for monitoring and analysis.

Explore More