Publication


Featured research published by Jannik Strötgen.


Language Resources and Evaluation | 2013

Multilingual and cross-domain temporal tagging

Jannik Strötgen; Michael Gertz

Extraction and normalization of temporal expressions from documents are important steps towards deep text understanding and a prerequisite for many NLP tasks such as information extraction, question answering, and document summarization. There are different ways to express (the same) temporal information in documents. However, after identifying temporal expressions, they can be normalized according to some standard format. This allows the usage of temporal information in a term- and language-independent way. In this paper, we describe the challenges of temporal tagging in different domains, give an overview of existing annotated corpora, and survey existing approaches for temporal tagging. Finally, we present our publicly available temporal tagger HeidelTime, which is easily extensible to further languages due to its strict separation of source code and language resources like patterns and rules. We present a broad evaluation on multiple languages and domains on existing corpora as well as on a newly created corpus for a language/domain combination for which no annotated corpus has been available so far.
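To make the idea of normalization concrete, the following is a minimal sketch of how differently worded date expressions can be mapped to one standard value (here an ISO 8601 date). It is not HeidelTime's implementation: the `MONTHS` lexicon, the two patterns, and the `normalize` function are hypothetical and cover only a handful of fully specified dates.

```python
import re

# Hypothetical month lexicon, kept separate from the matching code
# so that further languages could be added without touching the logic.
MONTHS = {
    "march": 3, "märz": 3, "marzo": 3,
    "may": 5, "mai": 5, "mayo": 5,
}

# Hypothetical patterns for "5 March 2013", "5. März 2013", "5 de marzo de 2013"
# and for "March 5, 2013".
DAY_FIRST = re.compile(r"(\d{1,2})\.?\s+(?:de\s+)?(\w+)\s+(?:de\s+)?(\d{4})")
MONTH_FIRST = re.compile(r"(\w+)\s+(\d{1,2}),\s*(\d{4})")


def normalize(expression):
    """Return an ISO 8601 date string (YYYY-MM-DD), or None if no rule matches."""
    text = expression.strip().lower()
    match = DAY_FIRST.fullmatch(text)
    if match:
        day, month_name, year = match.groups()
    else:
        match = MONTH_FIRST.fullmatch(text)
        if not match:
            return None
        month_name, day, year = match.groups()
    month = MONTHS.get(month_name)
    if month is None:
        return None
    return f"{int(year):04d}-{month:02d}-{int(day):02d}"


for surface_form in ["March 5, 2013", "5. März 2013", "5 de marzo de 2013"]:
    print(surface_form, "->", normalize(surface_form))
```

All three surface forms yield the same normalized value, which is what makes downstream use independent of the original wording and language.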


Geographic Information Retrieval | 2010

Extraction and exploration of spatio-temporal information in documents

Jannik Strötgen; Michael Gertz; Pavel Popov

In the past couple of years, there have been significant advances in the areas of temporal information retrieval (TIR) and geographic information retrieval (GIR), each focusing on extracting and utilizing temporal and geographic information, respectively, from documents for search and exploration tasks. Interestingly, there is little work that combines models, techniques, and applications from these two areas to support scenarios in which temporal and geographic information together provide interesting, meaningful nuggets in document exploration tasks, such as visualizing a chronological sequence of events with their locations. In this paper, we present an approach that combines the two areas of TIR and GIR. Using temporal and geographic information extracted from documents and recorded in temporal and geographic document profiles, we show how co-occurrences of such information are determined and spatio-temporal document profiles are computed. Such profiles then provide the basis for a variety of document search and exploration tasks, such as visualizing the sequences of events on a map. We present a prototypical implementation of our system and demonstrate the effectiveness of combining GIR and TIR in the context of document exploration tasks.


Proceedings of the 2nd Temporal Web Analytics Workshop | 2012

Identification of top relevant temporal expressions in documents

Jannik Strötgen; Omar Alonso; Michael Gertz

Temporal information is very common in textual documents, and thus, identifying, normalizing, and organizing temporal expressions is an important task in IR. Although there are some tools for temporal tagging, there is a lack of research focusing on the relevance of temporal expressions. Apart from counting their frequency and verifying whether they satisfy a temporal search query, temporal expressions are usually considered only in isolation. There are no methods to calculate the relevance of temporal expressions, neither in general nor with respect to a query. In this paper, we present an approach to identify top relevant temporal expressions in documents using expression-, document-, corpus-, and query-based features. We present two relevance functions: one to calculate relevance scores for temporal expressions in general, and one with respect to a search query, which consists of a textual part, a temporal part, or both. Using two evaluation scenarios, we demonstrate the effectiveness of our approach.


Very Large Data Bases | 2010

TimeTrails: a system for exploring spatio-temporal information in documents

Jannik Strötgen; Michael Gertz

Spatial and temporal data have become ubiquitous in many application domains such as the geosciences or life sciences. Sophisticated database management systems are employed to manage such structured data. However, an important source of spatio-temporal information that has not been fully utilized is unstructured text documents. In documents, combinations of temporal and spatial expressions form events, which can be mapped to a database structure and organized into trajectories that can be explored. In this context, the coupling of information retrieval techniques with spatio-temporal database concepts leads to new ways for managing and exploring document collections. In this demonstration, we present TimeTrails, a system for the extraction, querying, storage, and exploration of spatio-temporal information embedded in text documents. The user can query a document collection, and TimeTrails visualizes the spatio-temporal information extracted from relevant documents as document trajectories, resulting in a map-based view of documents. This view helps the user to explore the temporal and spatial content of documents in a meaningful way and to further restrict search results using spatial and temporal predicates.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2011

An event-centric model for multilingual document similarity

Jannik Strötgen; Michael Gertz; Conny Junghans

Document similarity measures play an important role in many document retrieval and exploration tasks. Over the past decades, several models and techniques have been developed to determine a ranked list of documents similar to a given query document. Interestingly, the proposed approaches typically rely on extensions to the vector space model and are rarely suited for multilingual corpora. In this paper, we present a novel document similarity measure that is based on events extracted from documents. An event is solely described by nearby occurrences of temporal and geographic expressions in a document's text. Thus, a document is modeled as a set of events that can be compared and ranked using temporal and geographic hierarchies. A key feature of our model is that it is term- and language-independent, as temporal and geographic expressions mentioned in texts are normalized to a standard format. This also allows similar documents to be determined across languages, an important feature in the context of document exploration. Our approach proves to be quite effective, including the discovery of new similarities, as our experiments using different (multilingual) corpora demonstrate.
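As a rough illustration of comparing documents through events rather than terms, the sketch below models an event as a pair of a normalized date and a place, each written as a path through a hierarchy. The scoring functions are assumptions made for this example and are not the measure defined in the paper; `prefix_similarity`, `event_similarity`, and `document_similarity` are hypothetical names.

```python
def prefix_similarity(a, b):
    """Fraction of hierarchy levels shared from the top (e.g. year, month, day)."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def event_similarity(e1, e2):
    """Combine temporal and geographic agreement of two events (assumed: product)."""
    (t1, g1), (t2, g2) = e1, e2
    return prefix_similarity(t1, t2) * prefix_similarity(g1, g2)


def document_similarity(events_a, events_b):
    """Average, over the events of A, of the best-matching event in B.

    Note that this assumed aggregation is not symmetric in A and B.
    """
    if not events_a or not events_b:
        return 0.0
    total = sum(max(event_similarity(a, b) for b in events_b) for a in events_a)
    return total / len(events_a)


# Events are (date path, place path); the values are language-independent
# because they are normalized before comparison.
doc_en = [((2011, 7, 24), ("Europe", "Germany", "Heidelberg"))]
doc_de = [((2011, 7, 24), ("Europe", "Germany", "Heidelberg"))]
doc_other = [((2011, 7, 1), ("Europe", "Germany", "Berlin"))]

print(document_similarity(doc_en, doc_de))
print(document_similarity(doc_en, doc_other))
```

Because only normalized dates and places are compared, two documents written in different languages about the same event score as highly similar, while documents sharing only the year and country receive partial credit through the hierarchies.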


ACM/IEEE Joint Conference on Digital Libraries | 2012

Event-centric search and exploration in document collections

Jannik Strötgen; Michael Gertz

Textual data ranging from corpora of digitized historic documents to large collections of news feeds provide a rich source for temporal and geographic information. Such types of information have recently gained a lot of interest in support of different search and exploration tasks, e.g., by organizing news along a timeline or placing the origin of documents on a map. However, for this, temporal and geographic information embedded in documents is often considered in isolation. We claim that by combining such information into (chronologically ordered) event-like features, interesting and meaningful search and exploration tasks become possible. In this paper, we present a framework for the extraction, exploration, and visualization of event information in document collections. For this, one has to identify and combine temporal and geographic expressions from documents, thus enriching a document collection by a set of normalized events. Traditional search queries can then be enriched by conditions on the events relevant to the search subject. Most important for our event-centric approach is that a search result consists of a sequence of events relevant to the search terms and not just a document hit-list. Such events can originate from different documents and can be further explored; in particular, events relevant to a search query can be ordered chronologically. We demonstrate the utility of our framework by different (multilingual) search and exploration scenarios using a Wikipedia corpus.


ACM Transactions on Asian Language Information Processing | 2014

Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese

Jannik Strötgen; Ayser Armiti; Tran Van Canh; Julian Zell; Michael Gertz

Most of the research on temporal tagging has so far been done for processing English text documents. There are hardly any multilingual temporal taggers supporting more than two languages. Recently, the temporal tagger HeidelTime has been made publicly available, supporting the integration of new languages by developing language-dependent resources without modifying the source code. In this article, we describe our work on developing such resources for two Asian and two Romance languages: Arabic, Vietnamese, Spanish, and Italian. While temporal tagging of the two Romance languages has been addressed before, there has been almost no research on Arabic and Vietnamese temporal tagging so far. Furthermore, we analyze language-dependent challenges for temporal tagging and explain the strategies we followed to address them. Our evaluation results on publicly available and newly annotated corpora demonstrate the high quality of our new resources for the four languages, which we make publicly available to the research community.


International World Wide Web Conferences | 2017

Where the Truth Lies: Explaining the Credibility of Emerging Claims on the Web and Social Media

Kashyap Popat; Subhabrata Mukherjee; Jannik Strötgen; Gerhard Weikum

The web is a huge source of valuable information. However, in recent times, there is an increasing trend towards false claims in social media, other web-sources, and even in news. Thus, fact-checking websites have become increasingly popular to identify such misinformation based on manual analysis. Recent research proposed methods to assess the credibility of claims automatically. However, there are major limitations: most works assume claims to be in a structured form, and a few deal with textual claims but require that sources of evidence or counter-evidence are easily retrieved from the web. None of these works can cope with newly emerging claims, and no prior method can give user-interpretable explanations for its verdict on the claim's credibility. This paper overcomes these limitations by automatically assessing the credibility of emerging claims, with sparse presence in web-sources, and generating suitable explanations from judiciously selected sources. To this end, we retrieve diverse articles about the claim, and model the mutual interaction between: the stance (i.e., support or refute) of the sources, the language style of the articles, the reliability of the sources, and the claim's temporal footprint on the web. Extensive experiments demonstrate the viability of our method and its superiority over prior works. We show that our methods work well for early detection of emerging claims, as well as for claims with limited presence on the web and social media.


Conference on Information and Knowledge Management | 2016

Credibility Assessment of Textual Claims on the Web

Kashyap Popat; Subhabrata Mukherjee; Jannik Strötgen; Gerhard Weikum

There is an increasing number of false claims in news, social media, and other web sources. While prior work on truth discovery has focused on the case of checking factual statements, this paper addresses the novel task of assessing the credibility of arbitrary claims made in natural-language text, in an open-domain setting without any assumptions about the structure of the claim or the community where it is made. Our solution is based on automatically finding sources in news and social media, and feeding these into a distantly supervised classifier for assessing the credibility of a claim (i.e., true or fake). For inference, our method leverages the joint interaction between the language of articles about the claim and the reliability of the underlying web sources. Experiments with claims from the popular website snopes.com and from reported cases of Wikipedia hoaxes demonstrate the viability of our methods and their superior accuracy over various baselines.


Empirical Methods in Natural Language Processing | 2015

A Baseline Temporal Tagger for all Languages

Jannik Strötgen; Michael Gertz

Temporal taggers are usually developed for a certain language. Besides English, only a few languages have been addressed, and only the temporal tagger HeidelTime covers several languages. While this tool was manually extended to these languages, there have been earlier approaches for automatic extensions to a single target language. In this paper, we present an approach to extend HeidelTime to all languages in the world. Our evaluation shows promising results, in particular considering that our approach requires neither language skills nor training data, yet results in a baseline tagger for 200+ languages.
