Network


Benjamin Roth's latest external collaborations at the country level.

Hotspot


Research topics in which Benjamin Roth is active.

Publication


Featured research published by Benjamin Roth.


International Joint Conference on Natural Language Processing | 2015

Compositional Vector Space Models for Knowledge Base Completion

Arvind Neelakantan; Benjamin Roth; Andrew McCallum

Knowledge base (KB) completion adds new facts to a KB by making inferences from existing facts, for example by inferring with high likelihood nationality(X,Y) from bornIn(X,Y). Most previous methods infer simple one-hop relational synonyms like this, or use as evidence a multi-hop relational path treated as an atomic feature, like bornIn(X,Z) → containedIn(Z,Y). This paper presents an approach that reasons about conjunctions of multi-hop relations non-atomically, composing the implications of a path using a recurrent neural network (RNN) that takes as inputs vector embeddings of the binary relations in the path. Not only does this allow us to generalize to paths unseen at training time, but also, with a single high-capacity RNN, to predict new relation types not seen when the compositional model was trained (zero-shot learning). We assemble a new dataset of over 52M relational triples, and show that our method improves over a traditional classifier by 11%, and over a method leveraging pre-trained embeddings by 7%.
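
To make the path-composition idea concrete, here is a minimal sketch (not the authors' code) of folding relation embeddings along a multi-hop path with a simple RNN and scoring a target relation against the composed vector; the dimensions, toy relation vocabulary, and random parameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): compose relation embeddings along a
# multi-hop path with a simple RNN, then score a target relation by the dot
# product with the composed path vector. Sizes and vocabulary are assumed.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding / hidden dimension (assumed)

# toy relation vocabulary with random embeddings
relations = ["bornIn", "containedIn", "nationality"]
rel_emb = {r: rng.normal(size=d) for r in relations}

# RNN parameters for h_t = tanh(W_h h_{t-1} + W_x x_t)
W_h = rng.normal(scale=0.1, size=(d, d))
W_x = rng.normal(scale=0.1, size=(d, d))

def compose_path(path):
    """Fold the embeddings of the relations on a path into one vector."""
    h = np.zeros(d)
    for rel in path:
        h = np.tanh(W_h @ h + W_x @ rel_emb[rel])
    return h

def score(path, target_relation):
    """Higher score = the path is taken as stronger evidence for the target."""
    return float(compose_path(path) @ rel_emb[target_relation])

# e.g. does the path bornIn -> containedIn support nationality(X, Y)?
print(score(["bornIn", "containedIn"], "nationality"))
```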


Conference on Information and Knowledge Management | 2013

A survey of noise reduction methods for distant supervision

Benjamin Roth; Tassilo Barth; Michael Wiegand; Dietrich Klakow

We survey recent approaches to noise reduction in distant supervision learning for relation extraction. We group them according to the principles they are based on: at-least-one constraints, topic-based models, or pattern correlations. Besides describing them, we illustrate the fundamental differences and attempt to give an outlook to potentially fruitful further research. In addition, we identify related work in sentiment analysis which could profit from approaches to noise reduction.


Conference of the European Chapter of the Association for Computational Linguistics | 2014

RelationFactory: A Fast, Modular and Effective System for Knowledge Base Population

Benjamin Roth; Tassilo Barth; Grzegorz Chrupała; Martin Gropp; Dietrich Klakow

We present RelationFactory, a highly effective open source relation extraction system based on shallow modeling techniques. RelationFactory emphasizes modularity, is easily configurable, and uses a transparent pipelined approach. The interactive demo allows the user to pose queries for which RelationFactory retrieves and analyses contexts that contain relational information about the query entity. Additionally, a recall error analysis component categorizes and illustrates cases in which the system missed a correct answer.
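
As an illustration of the modular, pipelined design described above, the following is a toy sketch of a transparent extraction pipeline (query expansion, retrieval, candidate matching); all stage implementations and names are placeholder assumptions, not RelationFactory's actual modules.

```python
# Toy sketch of a transparent, modular extraction pipeline: each stage is a
# function that reads and extends a shared state dict. The stages below are
# hypothetical placeholders, not RelationFactory's actual modules.
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def run_pipeline(query: Dict, stages: List[Stage]) -> Dict:
    """Apply the configured stages in order and return the final state."""
    state = dict(query)
    for stage in stages:
        state = stage(state)
    return state

def expand_query(state):
    # placeholder alias expansion for the query entity
    state["aliases"] = [state["entity"], state["entity"].split()[0]]
    return state

def retrieve(state):
    # placeholder retrieval; a real system would query a document index
    state["documents"] = [f"document mentioning {a}" for a in state["aliases"]]
    return state

def match_candidates(state):
    # placeholder matching; a real system would apply patterns or classifiers
    state["answers"] = [(state["relation"], doc) for doc in state["documents"]]
    return state

result = run_pipeline({"entity": "Benjamin Roth", "relation": "per:employee_of"},
                      [expand_query, retrieve, match_candidates])
print(result["answers"])
```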


Applications of Natural Language to Data Bases | 2012

Web-based relation extraction for the food domain

Michael Wiegand; Benjamin Roth; Dietrich Klakow

In this paper, we examine methods to extract different domain-specific relations from the food domain. We employ different extraction methods, ranging from surface patterns to co-occurrence measures applied to different parts of a document. We show that the effectiveness of a particular method depends very much on the relation type considered and that there is no single method that works equally well for every relation type. As we need to process a large amount of unlabeled data, our methods only require a low level of linguistic processing. This also has the advantage that these methods can provide responses in real time.
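
One of the co-occurrence-measure-style methods mentioned above could look like the following sketch, which scores candidate food pairs by pointwise mutual information over sentence-level co-occurrence counts; the toy corpus and the choice of PMI are assumptions for illustration, not necessarily the paper's exact measures.

```python
# Illustrative sketch: score candidate (food, food) pairs by pointwise mutual
# information over sentence-level co-occurrence counts. Toy corpus; the PMI
# choice is an assumption.
import math
from collections import Counter
from itertools import combinations

sentences = [
    {"pasta", "tomato", "basil"},
    {"pasta", "parmesan"},
    {"rice", "soy sauce"},
    {"tomato", "basil"},
]

item_counts = Counter()
pair_counts = Counter()
for sent in sentences:
    item_counts.update(sent)
    pair_counts.update(frozenset(p) for p in combinations(sorted(sent), 2))

n = len(sentences)

def pmi(a: str, b: str) -> float:
    """PMI of two items co-occurring in the same sentence."""
    joint = pair_counts[frozenset((a, b))] / n
    if joint == 0:
        return float("-inf")
    return math.log2(joint / ((item_counts[a] / n) * (item_counts[b] / n)))

print(pmi("tomato", "basil"))      # strongly associated in the toy corpus
print(pmi("pasta", "soy sauce"))   # never co-occur -> -inf
```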


Information Retrieval Facility Conference | 2010

Combining Wikipedia-based concept models for cross-language retrieval

Benjamin Roth; Dietrich Klakow

As a low-cost resource that is kept up to date, Wikipedia has recently gained attention as a means of providing cross-language bridging for information retrieval. Contrary to a previous study, we show that standard Latent Dirichlet Allocation (LDA) can extract cross-language information that is valuable for IR by simply normalizing the training data. Furthermore, we show that LDA and Explicit Semantic Analysis (ESA) complement each other, yielding significant improvements when combined. Such a combination can significantly contribute to retrieval based on machine translation, especially when query translations contain errors. The experiments were performed on the Multext JOC corpus and a CLEF dataset.
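
The combination of the two concept models could amount to something like the linear score interpolation sketched below; the toy per-document scores and the mixing weight are assumptions, and real LDA and ESA similarities would replace the placeholder dictionaries.

```python
# Sketch of combining two concept models by linear interpolation of their
# retrieval scores. The per-document scores and the mixing weight are toy
# assumptions; real LDA / ESA similarities would go here instead.
lda_scores = {"doc_a": 0.42, "doc_b": 0.35}   # e.g. cosine of topic distributions
esa_scores = {"doc_a": 0.10, "doc_b": 0.55}   # e.g. cosine of ESA concept vectors

def combined_score(doc: str, lam: float = 0.5) -> float:
    """Interpolate the two models; lam weights LDA against ESA."""
    return lam * lda_scores[doc] + (1.0 - lam) * esa_scores[doc]

# rank documents by the combined model
print(sorted(lda_scores, key=combined_score, reverse=True))
```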


North American Chapter of the Association for Computational Linguistics | 2016

Comparing Convolutional Neural Networks to Traditional Models for Slot Filling

Heike Adel; Benjamin Roth; Hinrich Schütze

We address relation classification in the context of slot filling, the task of finding and evaluating fillers like “Steve Jobs” for the slot X in “X founded Apple”. We propose a convolutional neural network which splits the input sentence into three parts according to the relation arguments and compare it to state-of-the-art and traditional approaches to relation classification. Finally, we combine different methods and show that the combination is better than individual approaches. We also analyze the effect of genre differences on performance.
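
The key preprocessing step described above, splitting the input sentence into three parts relative to the two relation arguments so that each part can be convolved and pooled separately, might look roughly like this; the tokenization, argument indices, and example sentence are assumptions.

```python
# Sketch of the input splitting described above: divide the sentence into
# left / middle / right parts around the two relation arguments, so each part
# can be fed to its own convolution and pooling. Example data is assumed.
def split_contexts(tokens, arg1_idx, arg2_idx):
    """Return (left, middle, right) token spans around the two arguments."""
    first, second = sorted((arg1_idx, arg2_idx))
    left = tokens[: first + 1]           # up to and including the first argument
    middle = tokens[first : second + 1]  # span between the arguments (inclusive)
    right = tokens[second:]              # from the second argument to the end
    return left, middle, right

tokens = "Steve Jobs founded Apple in 1976".split()
left, middle, right = split_contexts(tokens, arg1_idx=1, arg2_idx=3)
print(left, middle, right)
```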


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Cross-language retrieval using link-based language models

Benjamin Roth; Dietrich Klakow

We propose a cross-language retrieval model that is solely based on Wikipedia as a training corpus. The main contributions of our work are: 1. A translation model based on linked text in Wikipedia and a term weighting method associated with it. 2. A combination scheme to interpolate the link translation model with retrieval based on Latent Dirichlet Allocation. On the CLEF 2000 data we achieve a (non-significant) improvement over the best German-English system in the bilingual track and a significant improvement over a baseline based on machine translation.
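
A link-based translation table could, in a very reduced form, be estimated as sketched below from pairs of cross-language linked titles; the toy title pairs and the simple relative-frequency estimate are assumptions rather than the paper's actual model.

```python
# Very reduced sketch of a link-based translation table: estimate p(e | g)
# from word co-occurrences in cross-language linked titles. The toy title
# pairs and the relative-frequency estimate are assumptions.
from collections import Counter, defaultdict

linked_titles = [
    ("europäische union", "european union"),
    ("union", "union"),
    ("europäische kommission", "european commission"),
]

cooc = defaultdict(Counter)  # cooc[german_word][english_word] = count
for de_title, en_title in linked_titles:
    for g in de_title.split():
        cooc[g].update(en_title.split())

def p_translation(e: str, g: str) -> float:
    """Relative-frequency estimate of translating German word g as English e."""
    total = sum(cooc[g].values())
    return cooc[g][e] / total if total else 0.0

print(p_translation("european", "europäische"))  # 0.5 on the toy data
```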


Conference on Information and Knowledge Management | 2013

Feature-based models for improving the quality of noisy training data for relation extraction

Benjamin Roth; Dietrich Klakow

Supervised relation extraction from text relies on annotated data. Distant supervision is a scheme to obtain noisy training data by using a knowledge base of relational tuples as the ground truth and finding entity pair matches in a text corpus. We propose and evaluate two feature-based models for increasing the quality of distant supervision extraction patterns. The first model is an extension of a hierarchical topic model that induces background, relation-specific, and argument-pair-specific feature distributions. The second model is a perceptron, trained to match an objective function that enforces two constraints: 1) an at-least-one semantics, i.e., at least one training example per relational tuple is assumed to be correct; 2) high scores for a dedicated NIL label that accounts for the noise in the training data. For both algorithms, neither explicit negative data nor the ratio of negatives has to be provided. Both algorithms give improvements over a maximum likelihood baseline as well as over a previous topic model without features, evaluated on TAC KBP data.
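
The at-least-one constraint can be illustrated with a small perceptron-style sketch: for each relational tuple, the highest-scoring candidate mention is treated as expressing the relation and the remaining mentions are pushed towards NIL. The features, labels, update rule, and data below are toy assumptions, not the paper's objective function.

```python
# Toy perceptron-style sketch of the at-least-one idea: per relational tuple,
# the best-scoring candidate mention is treated as a positive example for the
# relation; all other mentions should prefer NIL. Features and data are assumed.
from collections import defaultdict

def score(weights, features, label):
    return sum(weights[(label, f)] for f in features)

def train_at_least_one(data, labels, epochs=5, lr=1.0):
    """data: list of (relation, [feature set per candidate mention])."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for relation, mentions in data:
            # at-least-one: the highest-scoring mention expresses the relation
            best = max(mentions, key=lambda m: score(weights, m, relation))
            for m in mentions:
                target = relation if m is best else "NIL"
                pred = max(labels, key=lambda l: score(weights, m, l))
                if pred != target:
                    for f in m:
                        weights[(target, f)] += lr
                        weights[(pred, f)] -= lr
    return weights

data = [("bornIn", [{"PER", "born in", "LOC"}, {"PER", "visited", "LOC"}])]
w = train_at_least_one(data, labels=["bornIn", "NIL"])
print(w[("bornIn", "born in")] > 0)  # the reliable pattern gets positive weight
```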


European Conference on Information Retrieval | 2016

Finding Relevant Relations in Relevant Documents

Michael Schuhmacher; Benjamin Roth; Simone Paolo Ponzetto; Laura Dietz

This work studies the combination of a document retrieval and a relation extraction system for the purpose of identifying query-relevant relational facts. On the TREC Web collection, we assess extracted facts separately for correctness and relevance. Despite some TREC topics not being covered by the relation schema, we find that this approach reveals relevant facts, and in particular those not yet known in the knowledge base DBpedia. The study confirms that mention frequency, document relevance, and entity relevance are useful indicators for fact relevance. Still, the task remains an open research problem.
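
The indicators found useful in the study, mention frequency, document relevance, and entity relevance, could be combined into a fact-relevance score along the lines of the following sketch; the multiplicative weighting and the toy facts are assumptions.

```python
# Sketch of ranking extracted facts by the indicators named above: mention
# frequency, relevance of the supporting documents, and entity relevance.
# The multiplicative combination and the toy facts are assumptions.
def fact_relevance(supporting_doc_scores, entity_relevance):
    """Combine mention frequency, average document relevance, and entity relevance."""
    frequency = len(supporting_doc_scores)
    avg_doc_relevance = sum(supporting_doc_scores) / frequency
    return frequency * avg_doc_relevance * entity_relevance

facts = {
    ("Apple", "foundedBy", "Steve Jobs"): fact_relevance([0.9, 0.8, 0.7], 0.95),
    ("Apple", "headquarteredIn", "Cupertino"): fact_relevance([0.4], 0.60),
}
for fact, relevance in sorted(facts.items(), key=lambda kv: kv[1], reverse=True):
    print(fact, round(relevance, 3))
```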


Conference of the European Chapter of the Association for Computational Linguistics | 2014

Automatic Food Categorization from Large Unlabeled Corpora and Its Impact on Relation Extraction

Michael Wiegand; Benjamin Roth; Dietrich Klakow

We present a weakly supervised induction method to assign semantic information to food items. We consider two categorization tasks: food-type classification and the distinction of whether or not a food item is composite. The categorizations are induced by a graph-based algorithm applied to a large unlabeled domain-specific corpus. We show that the use of a domain-specific corpus is vital: we not only outperform a manually designed open-domain ontology but also prove the usefulness of these categorizations in relation extraction, outperforming state-of-the-art features that include syntactic information and Brown clustering.
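
A graph-based induction of food categories could, in a stripped-down form, look like the label-propagation sketch below, where a few seed items carry category labels that spread to unlabeled items over a similarity graph; the graph, seeds, and propagation scheme are toy assumptions rather than the paper's algorithm.

```python
# Stripped-down label propagation over a food-item similarity graph: seed
# items carry category labels, unlabeled items take the weighted majority
# label of their neighbours. Graph, seeds, and weights are toy assumptions.
from collections import Counter

# (item, item, similarity) edges; in practice derived from corpus statistics
edges = [
    ("apple", "pear", 0.9),
    ("pear", "plum", 0.8),
    ("pasta", "rice", 0.7),
    ("plum", "rice", 0.1),
]
seeds = {"apple": "fruit", "pasta": "staple"}

neighbours = {}
for a, b, w in edges:
    neighbours.setdefault(a, []).append((b, w))
    neighbours.setdefault(b, []).append((a, w))

def propagate(seed_labels, iterations=3):
    """Spread seed labels to unlabeled nodes by weighted neighbour voting."""
    labels = dict(seed_labels)
    for _ in range(iterations):
        for node in neighbours:
            if node in seed_labels:
                continue  # seed labels stay fixed
            votes = Counter()
            for nb, w in neighbours[node]:
                if nb in labels:
                    votes[labels[nb]] += w
            if votes:
                labels[node] = votes.most_common(1)[0][0]
    return labels

print(propagate(seeds))
```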

Collaboration


An overview of Benjamin Roth's collaborations and top co-authors.

Top Co-Authors


Andrew McCallum

University of Massachusetts Amherst


Emma Strubell

University of Massachusetts Amherst
