Georg Rehm
German Research Centre for Artificial Intelligence
Publications
Featured research published by Georg Rehm.
European Semantic Web Conference | 2016
Peter Bourgonje; Julián Moreno-Schneider; Jan Nehring; Georg Rehm; Felix Sasaki; Ankit Srivastava
In an attempt to put a Semantic Web layer that provides linguistic analysis and discourse information on top of digital content, we develop a platform for digital curation technologies. The platform offers language-, knowledge- and data-aware services as a flexible set of workflows and pipelines for the efficient processing of various types of digital content. The platform is intended to enable human experts (knowledge workers) to grasp and understand the contents of large document collections efficiently so that they can curate, process and further analyse the collection according to their sector-specific needs.
Meeting of the Association for Computational Linguistics | 2017
Georg Rehm; Julian Moreno Schneider; Peter Bourgonje; Ankit Srivastava; Jan Nehring; Armin Berger; Luca König; Sören Räuchle; Jens Gerth
We present an approach to identifying a specific class of events, movement action events (MAEs), in a data set that consists of ca. 2,800 personal letters exchanged by the German architect Erich Mendelsohn and his wife, Luise. A backend system uses these and other semantic analysis results as input for an authoring environment that digital curators can use to produce new pieces of digital content. In our example case, the human expert will receive recommendations from the system with the goal of putting together a travelogue, i.e., a description of the trips and journeys undertaken by the couple. We describe the components and architecture and also apply the system to news data.
International Conference on Human Interface and Management of Information | 2017
Georg Rehm; Jing He; Julián Moreno-Schneider; Jan Nehring; Joachim Quantz
Digital content and online media have reached an unprecedented level of relevance and importance. In the context of a research and technology transfer project on Digital Curation Technologies for online content, we develop a platform that provides curation services that can be integrated into concrete curation or content management systems. In this project, the German Research Center for Artificial Intelligence (DFKI) collaborates with four Berlin-based SMEs that work with and on digital content in four different sectors. The curation services comprise several semantic text and document analytics processes as well as knowledge technologies that can be applied to document collections. The key objective of this set of curation services is to support knowledge workers and digital curators in their daily work, i.e., to automate or semi-automate processes that the human experts are normally required to do intellectually and without tool support. The goal is to help this group of information and knowledge workers to become more efficient and more effective as well as to enable them to produce high-quality content in their respective sectors. In this article we describe the technology platform and two different user interfaces. We concentrate on the current state of a domain-specific user interface that is under development at ART+COM, one of the SME partners in the project. A second, more generic user interface, i.e., one that is not domain-specific, is under development at DFKI. We also take a look at the different requirements for ART+COM’s domain-specific and DFKI’s generic user interface.
International Conference on Human Interface and Management of Information | 2017
Julián Moreno-Schneider; Peter Bourgonje; Georg Rehm
Digital content and online media have reached an unprecedented level of relevance and importance. In the context of a research and technology transfer project on Digital Curation Technologies for online content we develop a Semantic Storytelling prototype. The approach is based on the semantic analysis of document collections, in which, among others, individual analysis results are, if possible, mapped to external knowledge bases. We interlink key information contained in the documents of the collection, which can be essentially conceptualised as automatic hypertext generation. With this semantic layer on top of the set of documents in place, we attempt to identify interesting, surprising, eye-opening relationships between different concepts or entities mentioned in the document collection. In this article we concentrate on the current state of the user interfaces of our Semantic Storytelling prototype.
International Conference of the German Society for Computational Linguistics and Language Technology | 2017
Peter Bourgonje; Julián Moreno-Schneider; Ankit Srivastava; Georg Rehm
The sheer ease with which abusive and hateful utterances can be made online using today’s digital communication technologies (especially social media), typically from the comfort of one’s home and without any immediate negative repercussions, is responsible for their significant increase and global ubiquity. Natural Language Processing technologies can help in addressing the negative effects of this development. In this contribution we evaluate a set of classification algorithms on two types of user-generated online content (tweets and Wikipedia Talk comments) in two languages (English and German). The different sets of data we work on were labelled with respect to aspects such as racism, sexism, hate speech, aggression and personal attacks. While acknowledging issues with inter-annotator agreement for classification tasks using these labels, the focus of this paper is on classifying the data according to the annotated characteristics using several text classification algorithms. For some classification tasks we are able to reach F-scores of up to 81.58.
Archive | 2015
Georg Rehm; Felix Sasaki
In the context of multilingual semantic applications, this contribution examines the role of selected technologies and standards. Standardised semantic resources and standardised procedures for their use in language technology applications and workflows have the potential to improve the quality of applications decisively and to simplify the process of application development considerably. The focus is, on the one hand, on the META-SHARE infrastructure, which was developed as part of the META-NET initiative and includes an XML metadata schema for cataloguing language resources. On the other hand, the contribution addresses the use of Linked Data for representing metadata and language resources. Relevant standards here include DCAT, NIF and ITS. Following these practice-oriented considerations, the contribution concludes by placing them in a larger context: the multilingual European information society.
The Prague Bulletin of Mathematical Linguistics | 2017
Ankit Srivastava; Georg Rehm; Felix Sasaki
With the ever increasing availability of linked multilingual lexical resources, there is a renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. In the case of Machine Translation, MT systems can potentially benefit from such a resource. Unknown words and ambiguous translations are among the most common sources of error. In this paper, we attempt to minimise these types of errors by interfacing Statistical Machine Translation (SMT) models with Linked Open Data (LOD) resources such as DBpedia and BabelNet. We perform several experiments based on the SMT system Moses and evaluate multiple strategies for exploiting knowledge from multilingual linked data in automatically translating named entities. We conclude with an analysis of best practices for multilingual linked data sets in order to optimise their benefit to multilingual and cross-lingual applications.
International Conference of the German Society for Computational Linguistics and Language Technology | 2017
Ankit Srivastava; Sabine Weber; Peter Bourgonje; Georg Rehm
Coreference Resolution is the process of identifying all words and phrases in a text that refer to the same entity. It has proven to be a useful intermediary step for a number of natural language processing applications. In this paper, we describe three implementations for performing coreference resolution: rule-based, statistical, and projection-based (from English to German). After a comparative evaluation on benchmark datasets, we conclude with an application of these systems on German and English texts from different scenarios in digital curation such as an archive of personal letters, excerpts from a museum exhibition, and regional news articles.
International Conference of the German Society for Computational Linguistics and Language Technology | 2017
Georg Rehm; Julián Moreno-Schneider; Peter Bourgonje; Ankit Srivastava; Rolf Fricke; Jan Thomsen; Jing He; Joachim Quantz; Armin Berger; Luca König; Sören Räuchle; Jens Gerth; David Wabnitz
Many industries face an increasing need for smart systems that support the processing and generation of digital content. This is due both to an ever increasing amount of incoming content that needs to be processed faster and more efficiently and to an ever increasing pressure to publish new content in cycles that are getting shorter and shorter. In a research and technology transfer project we develop a platform that provides content curation services that can be integrated into Content Management Systems, among others. In the project we develop curation services, which comprise semantic text and document analytics processes as well as knowledge technologies that can be applied to document collections. The key objective is to support digital curators in their daily work, i.e., to (semi-)automate processes that the human experts are normally required to carry out intellectually and, typically, without tool support. The goal is to enable knowledge workers to become more efficient and more effective as well as to produce high-quality content. In this article we focus on the current state of development with regard to semantic storytelling in our four use cases.
Meeting of the Association for Computational Linguistics | 2017
Ankit Srivastava; Georg Rehm; Julian Moreno Schneider