Publication


Featured research published by Steffen Remus.


North American Chapter of the Association for Computational Linguistics | 2015

Do Supervised Distributional Methods Really Learn Lexical Inference Relations

Omer Levy; Steffen Remus; Chris Biemann; Ido Dagan

Distributional representations of words have been recently used in supervised settings for recognizing lexical inference relations between word pairs, such as hypernymy and entailment. We investigate a collection of these state-of-the-art methods, and show that they do not actually learn a relation between two words. Instead, they learn an independent property of a single word in the pair: whether that word is a “prototypical hypernym”.
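
One way to see how a supervised model can end up scoring a single word rather than a relation: a linear classifier over concatenated word vectors [x; y] splits into one term that depends only on x and one that depends only on y. The sketch below illustrates that arithmetic only. The vectors, weights, and word labels are made up for illustration, and it covers just the linear-on-concatenation case, not the full collection of methods the paper examines.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical weights of a linear classifier over [x; y], split into
# the part applied to the first word (x) and the second word (y).
W_X = [0.5, -0.25]
W_Y = [1.0, 0.5]

def pair_score(x, y):
    """Score for the pair (x, y): w . [x; y] = W_X . x + W_Y . y."""
    return dot(W_X + W_Y, x + y)

# Toy 2-dimensional "word vectors" (illustrative values only).
CANDIDATES = {
    "animal": [1.0, 1.0],
    "fruit": [0.5, 0.0],
    "poodle": [-1.0, 0.5],
}
DOG = [1.0, 2.0]
APPLE = [-2.0, 4.0]

def rank_hypernyms(x):
    """Rank candidate second words y for a fixed first word x."""
    return sorted(CANDIDATES, key=lambda name: pair_score(x, CANDIDATES[name]),
                  reverse=True)

# The x term only shifts every score by a constant, so the ranking of
# candidates is the same regardless of which first word is given.
print(rank_hypernyms(DOG))
print(rank_hypernyms(APPLE))
```

Because no term multiplies x with y, such a model prefers the same candidate for every first word, which matches the "prototypical hypernym" behaviour described in the abstract.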


North American Chapter of the Association for Computational Linguistics | 2016

TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling

Alexander Panchenko; Stefano Faralli; Eugen Ruppert; Steffen Remus; Hubert Naets; Cédrick Fairon; Simone Paolo Ponzetto; Chris Biemann

We present a system for taxonomy construction that reached the first place in all subtasks of the SemEval 2016 challenge on Taxonomy Extraction Evaluation. Our simple yet effective approach harvests hypernyms with substring inclusion and Hearst-style lexico-syntactic patterns from domain-specific texts obtained via language-model-based focused crawling. Extracted taxonomies are evaluated on English, Dutch, French and Italian for three domains each (Food, Environment and Science). Evaluations against a gold standard and by human judgment show that our method outperforms more complex and knowledge-rich approaches on most domains and languages. Furthermore, to adapt the method to a new domain or language, only a small amount of manual labour is needed.
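
The two harvesting ideas named in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the TAXI implementation: it uses a single "X such as Y" pattern, treats a multi-word term's token suffix as its hypernym when that suffix is itself a known term, and does no lemmatization. The function names and example inputs are hypothetical.

```python
import re

# One Hearst-style pattern: "<hypernym> such as <hyponym list>".
SUCH_AS = re.compile(r"(\w+) such as ([^.;]+)")
# Separators inside the hyponym list: commas and "and"/"or".
LIST_SEP = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")

def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs found via the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        for hyponym in LIST_SEP.split(match.group(2).strip()):
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs

def substring_hypernyms(terms):
    """Return (term, hypernym) pairs where the hypernym is a token suffix
    of the term and is itself in the term list (assumed heuristic)."""
    vocab = set(terms)
    pairs = []
    for term in terms:
        tokens = term.split()
        for i in range(1, len(tokens)):
            head = " ".join(tokens[i:])
            if head in vocab:
                pairs.append((term, head))
    return pairs

print(hearst_such_as("They sell fruits such as apples, pears and plums."))
print(substring_hypernyms(["juice", "apple juice", "fresh apple juice"]))
```

Both functions only propose candidate hypernym pairs; turning candidates into a taxonomy needs further steps that this sketch does not cover.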


Meeting of the Association for Computational Linguistics | 2016

EmpiriST: AIPHES - Robust Tokenization and POS-Tagging for Different Genres

Steffen Remus; Gerold Hintz; Chris Biemann; Christian M. Meyer; Darina Benikova; Judith Eckle-Kohler; Margot Mieskes; Thomas Arnold

We present our system used for the AIPHES team submission in the context of the EmpiriST shared task on “Automatic Linguistic Annotation of Computer-Mediated Communication / Social Media”. Our system is based on a rule-based tokenizer and a machine learning sequence labelling POS tagger using a variety of features. We show that the system is robust across the two tested genres: German computer-mediated communication (CMC) and general German web data (WEB). We achieve the second rank in three of four scenarios. Also, the presented systems are freely available as open source components.


Conference of the European Chapter of the Association for Computational Linguistics | 2014

Unsupervised Relation Extraction of In-Domain Data from Focused Crawls

Steffen Remus

This thesis proposal approaches unsupervised relation extraction from web data, which is collected by crawling only those parts of the web that are from the same domain as a relatively small reference corpus. The first part of this proposal is concerned with the efficient discovery of web documents for a particular domain and in a particular language. We create a combined, focused web crawling system that automatically collects relevant documents and minimizes the amount of irrelevant web content. The collected web data is semantically processed in order to acquire rich in-domain knowledge. Here, we focus on fully unsupervised relation extraction by employing the extended distributional hypothesis. We use distributional similarities between two pairs of nominals based on dependency paths as context and vice versa for identifying relational structure. We apply our system for the domain of educational sciences by focusing primarily on crawling scientific educational publications in the web. We are able to produce promising initial results on relation identification and we will discuss future directions.


Cognitive Approach to Natural Language Processing | 2017

Benchmarking n-grams, Topic Models and Recurrent Neural Networks by Cloze Completions, EEGs and Eye Movements

Markus J. Hofmann; Chris Biemann; Steffen Remus

In neurocognitive psychology, manually collected cloze completion probabilities (CCPs) are the standard approach to quantifying a word’s predictability from sentence context. Here, we test a series of language models in accounting for CCPs, as well as the data they typically account for, i.e. electroencephalographic (EEG) and eye movement (EM) data. With this, we hope to render time-consuming CCP procedures unnecessary. We test a statistical n-gram language model, a Latent Dirichlet Allocation (LDA) topic model, as well as a recurrent neural network (RNN) language model for correlation with the neurocognitive data.


RANLP 2017 - Biomedical NLP Workshop | 2017

Entity-Centric Information Access with the Human-in-the-Loop for the Biomedical Domains

Seid Muhie Yimam; Steffen Remus; Alexander Panchenko; Andreas Holzinger; Chris Biemann

In this paper, we describe the concept of entity-centric information access for the biomedical domain. With entity recognition technologies approaching acceptable levels of accuracy, we put forward a paradigm of document browsing and searching where the entities of the domain and their relations are explicitly modeled to provide users the possibility of collecting exhaustive information on relations of interest. We describe three working prototypes along these lines: NEW/S/LEAK, which was developed for investigative journalists who need a quick overview of large leaked document collections; STORYFINDER, which is a personalized organizer for information found in web pages that allows adding entities as well as relations, and is capable of personalized information management; and adaptive annotation capabilities of WEBANNO, which is a general-purpose linguistic annotation tool. We will discuss future steps towards the adaptation of these tools to biomedical data, which is subject to a recently started project on biomedical knowledge acquisition. A key difference to other approaches is the centering around the user in a Human-in-the-Loop machine learning approach, where users define and extend categories and enable the system to improve via feedback and interaction.


Language Resources and Evaluation | 2016

Domain-Specific Corpus Expansion with Focused Webcrawling

Steffen Remus; Chris Biemann


KONVENS | 2014

Knowledge Discovery in Scientific Literature

Jinseok Nam; Christian Kirschner; Zheng Ma; Nicolai Erbs; Susanne Neumann; Daniela Oelke; Steffen Remus; Chris Biemann; Judith Eckle-Kohler; Johannes Fürnkranz; Iryna Gurevych; Marc Rittberger; Karsten Weihe


Language Resources and Evaluation | 2018

Retrofitting Word Representations for Unsupervised Sense Aware Word Similarities

Steffen Remus; Chris Biemann

Collaboration


Dive into Steffen Remus's collaborations.

Top Co-Authors

Judith Eckle-Kohler

Technische Universität Darmstadt

Alexander Panchenko

Université catholique de Louvain

Christian M. Meyer

Technische Universität Darmstadt

Darina Benikova

Technische Universität Darmstadt

Eugen Ruppert

Technische Universität Darmstadt

Iryna Gurevych

Technische Universität Darmstadt

Jinseok Nam

Technische Universität Darmstadt

Johannes Fürnkranz

Technische Universität Darmstadt
