Publications


Featured research published by Stephen Roller.


North American Chapter of the Association for Computational Linguistics | 2016

MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for Sentence Classification

Ye Zhang; Stephen Roller; Byron C. Wallace

We introduce a novel, simple convolutional neural network (CNN) architecture, multi-group norm constraint CNN (MGNC-CNN), that capitalizes on multiple sets of word embeddings for sentence classification. MGNC-CNN extracts features from each input embedding set independently and then joins these at the penultimate layer in the network to form a final feature vector. We then adopt a group regularization strategy that differentially penalizes weights associated with the subcomponents generated from the respective embedding sets. This model is much simpler than comparable alternative architectures and requires substantially less training time. Furthermore, it is flexible in that it does not require input word embeddings to be of the same dimensionality. We show that MGNC-CNN consistently outperforms baseline models.
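
As a rough illustration, here is a minimal PyTorch sketch of the architecture described above. The filter count, kernel size, and the simple per-group L2 penalty standing in for the paper's group norm constraint are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of an MGNC-CNN-style model, assuming PyTorch. The filter
# count, kernel size, and the per-group L2 penalty (standing in for the
# paper's group norm constraint) are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MGNCCNN(nn.Module):
    def __init__(self, embed_dims, n_filters=100, kernel_size=3, n_classes=2):
        super().__init__()
        # One independent convolutional extractor per embedding set; the
        # sets may have different dimensionalities.
        self.convs = nn.ModuleList(
            [nn.Conv1d(d, n_filters, kernel_size) for d in embed_dims]
        )
        self.out = nn.Linear(n_filters * len(embed_dims), n_classes)

    def forward(self, embedded_inputs):
        # embedded_inputs: one (batch, seq_len, dim_i) tensor per embedding set.
        groups = []
        for x, conv in zip(embedded_inputs, self.convs):
            h = F.relu(conv(x.transpose(1, 2)))        # (batch, filters, seq')
            h = F.max_pool1d(h, h.size(2)).squeeze(2)  # max-over-time pooling
            groups.append(h)
        # Join the per-set features at the penultimate layer.
        return self.out(torch.cat(groups, dim=1))

def group_penalty(model, lambdas):
    # Differentially penalize the output weights tied to each embedding set.
    n = model.out.in_features // len(model.convs)
    return sum(
        lam * model.out.weight[:, i * n:(i + 1) * n].norm(p=2)
        for i, lam in enumerate(lambdas)
    )
```

During training, this penalty would be added to the classification loss, letting weaker embedding sets be regularized more heavily.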


Computational Linguistics | 2016

Representing meaning with a combination of logical and distributional models

Islam Beltagy; Stephen Roller; Pengxiang Cheng; Katrin Erk; Raymond J. Mooney

NLP tasks differ in the semantic information they require, and at this time no single semantic representation fulfills all requirements. Logic-based representations characterize sentence structure, but do not capture the graded aspect of meaning. Distributional models give graded similarity ratings for words and phrases, but do not capture sentence structure in the same detail as logic-based approaches. It has therefore been argued that the two are complementary. We adopt a hybrid approach that combines logical and distributional semantics using probabilistic logic, specifically Markov Logic Networks. In this article, we focus on the three components of a practical system: (1) logical representation, which encodes the input problems in probabilistic logic; (2) knowledge base construction, which creates weighted inference rules by integrating distributional information with other sources; and (3) probabilistic inference, which solves the resulting MLN inference problems efficiently. To evaluate our approach, we use the task of textual entailment, which can utilize the strengths of both logic-based and distributional representations. In particular we focus on the SICK data set, where we achieve state-of-the-art results. We also release a lexical entailment data set of 10,213 rules extracted from the SICK data set, which is a valuable resource for evaluating lexical entailment systems.
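
As a toy illustration of the probabilistic-logic machinery this line of work builds on, the following sketch computes exact marginals for a two-atom Markov Logic Network by enumeration. The predicates, rules, and weights are invented for illustration; real systems use approximate inference at scale.

```python
# Toy illustration of Markov Logic Network semantics: P(world) is
# proportional to exp(sum_i w_i * n_i(world)), where n_i counts the
# groundings of rule i satisfied in the world. The rules and weights
# below are invented; a distributional resource might supply the
# weight on a soft rule such as dog(x) => animal(x).
import itertools
import math

# Ground atoms of a tiny domain.
atoms = ["dog(d)", "animal(d)"]

# Weighted rules as (weight, test) pairs.
rules = [
    (1.5, lambda w: (not w["dog(d)"]) or w["animal(d)"]),  # dog(d) => animal(d)
    (0.8, lambda w: w["dog(d)"]),                          # prior evidence
]

def score(world):
    return math.exp(sum(wt for wt, test in rules if test(world)))

worlds = [
    dict(zip(atoms, vals))
    for vals in itertools.product([False, True], repeat=len(atoms))
]
z = sum(score(w) for w in worlds)  # partition function

# Marginal probability that animal(d) holds.
p_animal = sum(score(w) for w in worlds if w["animal(d)"]) / z
print(f"P(animal(d)) = {p_animal:.3f}")
```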


International Conference on Computational Linguistics | 2014

UTexas: Natural Language Semantics using Distributional Semantics and Probabilistic Logic

Islam Beltagy; Stephen Roller; Gemma Boleda; Katrin Erk; Raymond J. Mooney

We represent natural language semantics by combining logical and distributional information in probabilistic logic. We use Markov Logic Networks (MLNs) for the recognizing textual entailment (RTE) task and Probabilistic Soft Logic (PSL) for the semantic textual similarity (STS) task. The system is evaluated on the SICK dataset. Our best system achieves 73% accuracy on the RTE task and a Pearson's correlation of 0.71 on the STS task.


Empirical Methods in Natural Language Processing | 2016

Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns in Distributional Vectors for Lexical Entailment

Stephen Roller; Katrin Erk

We consider the task of predicting lexical entailment using distributional vectors. We perform a novel qualitative analysis of one existing model which was previously shown to only measure the prototypicality of word pairs. We find that the model strongly learns to identify hypernyms using Hearst patterns, which are well known to be predictive of lexical relations. We present a novel model which exploits this behavior as a method of feature extraction in an iterative procedure similar to Principal Component Analysis. Our model combines the extracted features with the strengths of other proposed models in the literature, and matches or outperforms prior work on multiple data sets.
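
For intuition, here is a minimal sketch of Hearst-pattern extraction, the signal the analysis finds the distributional model exploiting. The pattern list and the naive plural stripping are simplifications, and the paper's iterative, PCA-like feature extraction procedure is not reproduced here.

```python
# Minimal Hearst-pattern extraction sketch. The patterns and the corpus
# snippet are illustrative; real pipelines use parsed text and lemmas.
import re

# Classic Hearst patterns: "Y such as X", "X and other Y", "Y including X".
PATTERNS = [
    (re.compile(r"\b(\w+?)s? such as (\w+?)s?\b"), "hyper-first"),
    (re.compile(r"\b(\w+?)s? and other (\w+?)s?\b"), "hypo-first"),
    (re.compile(r"\b(\w+?)s? including (\w+?)s?\b"), "hyper-first"),
]

def extract_hypernym_pairs(text):
    pairs = set()
    for pattern, order in PATTERNS:
        for a, b in pattern.findall(text.lower()):
            hypo, hyper = (b, a) if order == "hyper-first" else (a, b)
            pairs.add((hypo, hyper))
    return pairs

print(extract_hypernym_pairs(
    "They studied animals such as dogs, and cats and other pets."
))
# -> {('dog', 'animal'), ('cat', 'pet')}
```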


International Joint Conference on Natural Language Processing | 2017

Distributional modeling on a diet: One-shot word learning from text only

Su Wang; Stephen Roller; Katrin Erk

We test whether distributional models can do one-shot learning of definitional properties from text alone. Using Bayesian models, we find that first learning overarching structure in the known data (regularities in textual contexts and in properties) helps one-shot learning, and that individual context items can be highly informative.
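
The following schematic shows the general shape of such a Bayesian one-shot update, not the paper's actual model: a prior over definitional properties learned from known words is revised by a single observed context item. All probabilities below are invented.

```python
# Schematic Bayesian one-shot update (not the paper's model): a prior over
# properties, learned from known words, is updated by one context item.
# All numbers are invented for illustration.

# Prior P(property) estimated from known words' property norms.
prior = {"is-animal": 0.3, "is-tool": 0.2, "is-food": 0.5}

# Likelihood P(context word | property) estimated from known words' contexts.
likelihood = {
    ("barked", "is-animal"): 0.6,
    ("barked", "is-tool"): 0.05,
    ("barked", "is-food"): 0.05,
}

def posterior(context_word, prior, likelihood):
    # Bayes' rule with a small smoothing floor for unseen pairs.
    unnorm = {
        prop: prior[prop] * likelihood.get((context_word, prop), 0.01)
        for prop in prior
    }
    z = sum(unnorm.values())
    return {prop: p / z for prop, p in unnorm.items()}

# One informative context item ("the wug barked") shifts belief sharply
# toward is-animal, illustrating how a single item can be highly informative.
print(posterior("barked", prior, likelihood))
```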


North American Chapter of the Association for Computational Linguistics | 2016

PIC a Different Word: A Simple Model for Lexical Substitution in Context

Stephen Roller; Katrin Erk

The Lexical Substitution task involves selecting and ranking lexical paraphrases for a target word in a given sentential context. We present PIC, a simple measure for estimating the appropriateness of substitutes in a given context. PIC outperforms another simple, comparable model proposed in recent work, especially when selecting substitutes from the entire vocabulary. Analysis shows that PIC improves over baselines by incorporating frequency biases into predictions.
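
One plausible reading of this kind of measure is sketched below with toy vectors: combine the substitute's similarity to the target with a skip-gram-style softmax fit to the context. The formula and the random vectors are illustrative assumptions, not the paper's exact definition of PIC.

```python
# Hedged sketch of scoring lexical substitutes by combining target
# similarity with context fit, in the spirit of PIC. The scoring formula
# and the toy vectors are assumptions, not the paper's definition.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["bright", "smart", "shiny", "clever", "loud", "student"]
dim = 50
word_vecs = {w: rng.normal(size=dim) for w in vocab}  # word embeddings
ctx_vecs = {w: rng.normal(size=dim) for w in vocab}   # context embeddings

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def score(substitute, target, context_words):
    # Target fit: embedding similarity between substitute and target.
    target_fit = cos(word_vecs[substitute], word_vecs[target])
    # Context fit: softmax probability of the substitute given the context,
    # skip-gram style, normalized over the toy vocabulary.
    ctx = sum(ctx_vecs[c] for c in context_words)
    logits = {w: word_vecs[w] @ ctx for w in vocab}
    z = sum(np.exp(l) for l in logits.values())
    context_fit = np.exp(logits[substitute]) / z
    return target_fit * context_fit

# Rank substitutes for "bright" in "a bright student".
candidates = ["smart", "shiny", "clever", "loud"]
ranked = sorted(candidates, key=lambda s: -score(s, "bright", ["student"]))
print(ranked)
```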


Proceedings of the 10th Workshop on Multiword Expressions (MWE) | 2014

Feature Norms of German Noun Compounds

Stephen Roller; Sabine Schulte im Walde

This paper presents a new data collection of feature norms for 572 German noun-noun compounds. The feature norms complement existing data sets for the same targets, including compositionality ratings, association norms, and images. We demonstrate that the feature norms are potentially useful for research on noun-noun compounds and their semantic transparency: the feature overlap of the compounds and their constituents correlates with human ratings of the compound-constituent degrees of compositionality (ρ = 0.46).
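
A small sketch of the correlation analysis described here follows. The norms and ratings are invented, and Spearman's ρ is assumed as the correlation measure; only the method (feature overlap vs. human compositionality ratings) mirrors the paper.

```python
# Illustrative sketch: correlate compound-constituent feature overlap with
# human compositionality ratings. Norms and ratings are invented; Spearman's
# rho is an assumed choice of correlation measure.
from scipy.stats import spearmanr

# Feature norms: word -> set of elicited features.
norms = {
    "Obstkuchen": {"suess", "Obst", "backen"},  # 'fruit cake'
    "Kuchen": {"suess", "backen", "Mehl"},      # 'cake'
    "Feuerwerk": {"bunt", "laut", "Nacht"},     # 'fireworks'
    "Werk": {"Arbeit", "Fabrik"},               # 'work, factory'
    "Handschuh": {"warm", "Winter", "Hand"},    # 'glove'
    "Schuh": {"Fuss", "Leder"},                 # 'shoe'
}

def overlap(compound, constituent):
    a, b = norms[compound], norms[constituent]
    return len(a & b) / len(a | b)              # Jaccard overlap

pairs = [("Obstkuchen", "Kuchen"), ("Feuerwerk", "Werk"), ("Handschuh", "Schuh")]
overlaps = [overlap(c, h) for c, h in pairs]
ratings = [5.6, 2.1, 1.5]  # invented compositionality ratings (1-7 scale)

rho, p = spearmanr(overlaps, ratings)
print(rho)
```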


Empirical Methods in Natural Language Processing | 2012

Supervised Text-based Geolocation Using Language Models on an Adaptive Grid

Stephen Roller; Michael Speriosu; Sarat Rallapalli; Benjamin Wing; Jason Baldridge


Empirical Methods in Natural Language Processing | 2013

A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities

Stephen Roller; Sabine Schulte im Walde


International Conference on Computational Linguistics | 2014

Inclusive yet Selective: Supervised Distributional Hypernymy Detection

Stephen Roller; Katrin Erk; Gemma Boleda

Collaboration


Dive into Stephen Roller's collaborations.

Top Co-Authors

Katrin Erk (University of Texas at Austin)
Islam Beltagy (University of Texas at Austin)
Raymond J. Mooney (University of Texas at Austin)
Pengxiang Cheng (University of Texas at Austin)
R. Michael Young (North Carolina State University)
Benjamin Wing (University of Texas at Austin)
Gemma Boleda (Pompeu Fabra University)
Jason Baldridge (University of Texas at Austin)
Michael Speriosu (University of Texas at Austin)