Sander Wubben
Tilburg University
Publications
Featured research published by Sander Wubben.
natural language generation | 2009
Sander Wubben; Antal van den Bosch; Emiel Krahmer; Erwin Marsi
For developing a data-driven text rewriting algorithm for paraphrasing, it is essential to have a monolingual corpus of aligned paraphrased sentences. News article headlines are a rich source of paraphrases: they tend to describe the same event in various ways and can easily be obtained from the web. We compare two methods of aligning headlines to construct such an aligned corpus of paraphrases, one based on clustering and the other on pairwise similarity-based matching. We show that the latter performs best on the task of aligning paraphrastic headlines.
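For readers unfamiliar with the pairwise approach, the sketch below illustrates the general idea under simplifying assumptions (bag-of-words cosine similarity and a fixed threshold, neither taken from the paper): headlines are paired as paraphrase candidates whenever their lexical similarity is high enough.

```python
# Minimal sketch of pairwise similarity-based headline matching (not the
# authors' exact implementation): headlines are paired as paraphrase
# candidates when their cosine similarity exceeds a threshold.
import math
from collections import Counter
from itertools import combinations

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def pair_headlines(headlines, threshold=0.5):
    """Return headline pairs whose word overlap suggests they are paraphrases."""
    vectors = [Counter(h.lower().split()) for h in headlines]
    pairs = []
    for (i, vi), (j, vj) in combinations(enumerate(vectors), 2):
        if cosine(vi, vj) >= threshold:
            pairs.append((headlines[i], headlines[j]))
    return pairs

print(pair_headlines([
    "Obama wins presidential election",
    "Obama wins US election",
    "Stock markets rally on election news",
]))
```

A clustering-based alternative would instead group all headlines about one event and treat every within-cluster pair as a paraphrase, which is the other strategy compared in the paper.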
IWCS-8 '09 Proceedings of the Eighth International Conference on Computational Semantics | 2009
Sander Wubben; Antal van den Bosch
While shortest paths in WordNet are known to correlate well with semantic similarity, an is-a hierarchy is less suited for estimating semantic relatedness. We demonstrate this by comparing two scale-free networks (ConceptNet and Wikipedia) to WordNet. Using the Finkelstein-353 dataset, we show that a shortest-path metric run on Wikipedia attains a better correlation than WordNet-based metrics. ConceptNet attains a good correlation as well, but suffers from low concept coverage.
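A minimal sketch of how such a shortest-path relatedness metric can be computed over any concept network, assuming an adjacency-list graph and the common 1/(1+d) distance-to-similarity transform (the toy graph and the transform are illustrative, not the paper's setup):

```python
# Illustrative sketch (not the paper's code): semantic relatedness from graph
# distance, where a shorter path between two concept nodes means higher
# relatedness. The toy graph and the 1/(1+d) transform are assumptions.
from collections import deque

def shortest_path_length(graph, source, target):
    """Breadth-first search over an adjacency dict; returns hop count or None."""
    if source == target:
        return 0
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no path: concepts unrelated or missing from the network

def relatedness(graph, a, b):
    d = shortest_path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

toy_graph = {
    "car": ["vehicle", "driver"],
    "vehicle": ["car", "train"],
    "train": ["vehicle", "journey"],
    "driver": ["car"],
    "journey": ["train"],
}
print(relatedness(toy_graph, "car", "journey"))  # path length 3 -> 0.25
```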
meeting of the association for computational linguistics | 2016
Thiago Castro Ferreira; Emiel Krahmer; Sander Wubben
In this study, we introduce a nondeterministic method for referring expression generation. We describe two models that account for individual variation in the choice of referential form in automatically generated text: a Naive Bayes model and a Recurrent Neural Network. Both are evaluated using the VaREG corpus. Then we select the best performing model to generate referential forms in texts from the GREC-2.0 corpus and conduct an evaluation experiment in which humans judge the coherence and comprehensibility of the generated texts, comparing them both with the original references and those produced by a random baseline model.
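As a rough illustration of the Naive Bayes variant (not the authors' implementation), the sketch below picks a referential form from a handful of discrete features; the feature names and training examples are invented:

```python
# Hedged sketch of a Naive Bayes choice of referential form; the feature set
# (syntactic position, whether the referent was mentioned recently) and the
# toy training data are invented for illustration.
import math
from collections import Counter, defaultdict

class NaiveBayesReferentialForm:
    def fit(self, examples):
        """examples: list of (feature_dict, form_label) pairs."""
        self.label_counts = Counter(label for _, label in examples)
        self.feature_counts = defaultdict(Counter)  # (label, feature) -> value counts
        for feats, label in examples:
            for name, value in feats.items():
                self.feature_counts[(label, name)][value] += 1
        return self

    def predict(self, feats):
        best, best_score = None, float("-inf")
        total = sum(self.label_counts.values())
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            for name, value in feats.items():
                counts = self.feature_counts[(label, name)]
                # add-one smoothing; the +1 in the denominator covers an unseen value
                score += math.log((counts[value] + 1) / (sum(counts.values()) + len(counts) + 1))
            if score > best_score:
                best, best_score = label, score
        return best

train = [
    ({"position": "subject", "recent": True}, "pronoun"),
    ({"position": "subject", "recent": False}, "proper_name"),
    ({"position": "object", "recent": False}, "description"),
]
model = NaiveBayesReferentialForm().fit(train)
print(model.predict({"position": "subject", "recent": True}))  # -> "pronoun"
```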
north american chapter of the association for computational linguistics | 2016
Thiago Castro Ferreira; Emiel Krahmer; Sander Wubben
This study aims to measure the variation between writers in their choices of referential form by collecting and analysing a new and publicly available corpus of referring expressions. The corpus is composed of referring expressions produced by different participants in identical situations. Results, measured in terms of normalized entropy, reveal substantial individual variation. We discuss the problems and prospects of this finding for automatic text generation applications.
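Normalized entropy here is the entropy of the distribution of chosen forms in one situation divided by the maximum attainable entropy: 0 means all participants agree, 1 means maximal variation. A small sketch follows, with fabricated choices and normalization by the number of observed forms (the paper's exact normalization may differ):

```python
# Sketch of a normalized-entropy measure of variation: entropy of the
# distribution of referential forms chosen in one situation, divided by the
# maximum possible entropy for the observed forms. The example choices are
# fabricated.
import math
from collections import Counter

def normalized_entropy(choices):
    counts = Counter(choices)
    total = len(choices)
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return entropy / max_entropy  # 0 = everyone agrees, 1 = maximal variation

print(normalized_entropy(["pronoun", "pronoun", "proper_name", "description"]))
```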
conference of the european chapter of the association for computational linguistics | 2009
Marieke van Erp; Antal van den Bosch; Sander Wubben; Steve Hunt
An approach is presented to the automatic discovery of labels of relations between pairs of ontological classes. Using a hyperlinked encyclopaedic resource, we gather evidence for likely predicative labels by searching for sentences that describe relations between terms. The terms are instances of the pair of ontological classes under consideration, drawn from a populated knowledge base. Verbs or verb phrases are automatically extracted, yielding a ranked list of candidate relations. Human judges rate the extracted relations. The extracted relations provide a basis for automatic ontology discovery from a non-relational database. The approach is demonstrated on a database from the natural history domain.
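The sketch below gives a rough impression of the harvesting step under strong simplifications: instead of extracting verbs or verb phrases with linguistic processing, it simply counts the tokens occurring between mentions of the two terms, and the example sentences are fabricated:

```python
# Rough sketch of candidate-relation harvesting: for a pair of terms, collect
# the words occurring between their mentions in sentences and rank them by
# frequency. The paper extracts verbs/verb phrases; here plain intervening
# tokens stand in for that step, and the sentences are toy examples.
import re
from collections import Counter

def candidate_relations(sentences, term_a, term_b):
    pattern = re.compile(
        rf"\b{re.escape(term_a)}\b(.*?)\b{re.escape(term_b)}\b", re.IGNORECASE
    )
    counts = Counter()
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            counts.update(match.group(1).lower().split())
    return counts.most_common()

sentences = [
    "The butterfly was collected by the entomologist in 1897.",
    "This butterfly was described by the entomologist Snellen.",
]
print(candidate_relations(sentences, "butterfly", "entomologist"))
```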
conference on human information interaction and retrieval | 2017
Suzan Verberne; Antal van den Bosch; Sander Wubben; Emiel Krahmer
We create and analyze two sets of reference summaries for discussion threads on a patient support forum: expert summaries and crowdsourced, non-expert summaries. Ideally, reference summaries for discussion forum threads are created by expert members of the forum community. When there are few or no expert members available, crowdsourcing the reference summaries is an alternative. In this paper we investigate whether domain-specific forum data requires the hiring of domain experts for creating reference summaries. We analyze the inter-rater agreement for both datasets and we train summarization models using the two types of reference summaries. The inter-rater agreement in the crowdsourced reference summaries is low, close to random, while the domain experts achieve considerably higher, fair agreement. The trained models, however, are similar to each other. We conclude that it is possible to train an extractive summarization model on crowdsourced data that is similar to an expert model, even if the inter-rater agreement for the crowdsourced data is low.
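Inter-rater agreement on such a binary include/exclude task is typically quantified with a kappa statistic. Below is a minimal Cohen's kappa sketch for two raters with fabricated rating vectors; agreement among more than two raters would normally use Fleiss' kappa instead:

```python
# Minimal Cohen's kappa sketch for two raters making binary include/exclude
# judgements per post; the rating vectors are fabricated.
from collections import Counter

def cohens_kappa(rater1, rater2):
    assert len(rater1) == len(rater2)
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    expected = sum(
        (counts1[label] / n) * (counts2[label] / n)
        for label in set(rater1) | set(rater2)
    )
    return (observed - expected) / (1 - expected)

r1 = [1, 0, 1, 1, 0, 0, 1, 0]
r2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(round(cohens_kappa(r1, r2), 3))  # 0.5
```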
international conference on natural language generation | 2016
Thiago Castro Ferreira; Sander Wubben; Emiel Krahmer
We introduce a corpus for the study of proper name generation. The corpus consists of proper name references to people in webpages, extracted from the Wikilinks corpus. In our analyses, we aim to identify the different ways, in terms of length and form, in which proper names are produced throughout a text.
language resources and evaluation | 2018
Suzan Verberne; Emiel Krahmer; I.H.E. Hendrickx; Sander Wubben; Antal van den Bosch
In this paper we address extractive summarization of long threads in online discussion fora. We present an elaborate user evaluation study to determine human preferences in forum summarization and to create a reference data set. We showed long threads to ten different raters and asked them to create a summary by selecting the posts that they considered to be the most important for the thread. We study the agreement between human raters on the summarization task, and we show how multiple reference summaries can be combined to develop a successful model for automatic summarization. We found that although the inter-rater agreement for the summarization task was slight to fair, the automatic summarizer obtained reasonable results in terms of precision, recall, and ROUGE. Moreover, when human raters were asked to choose between the summary created by another human and the summary created by our model in a blind side-by-side comparison, they judged the model’s summary equal to or better than the human summary in over half of the cases. This shows that even for a summarization task with low inter-rater agreement, a model can be trained that generates sensible summaries. In addition, we investigated the potential for personalized summarization. However, the results for the three raters involved in this experiment were inconclusive. We release the reference summaries as a publicly available dataset.
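Two of the ingredients mentioned in the abstract can be illustrated compactly: combining several raters' post selections into one reference summary by majority vote, and scoring a system summary with a bare-bones ROUGE-1 overlap. Both snippets below use invented data and stand in for the full pipeline and the official ROUGE toolkit:

```python
# Sketches with invented data: (1) combining multiple reference selections by
# majority vote into a single reference summary, and (2) a bare-bones ROUGE-1
# recall/precision over unigrams.
from collections import Counter

def combine_references(selections, min_votes=2):
    """selections: per rater, the list of selected post ids; keep posts chosen
    by at least min_votes raters."""
    votes = Counter(post for rater in selections for post in set(rater))
    return {post for post, v in votes.items() if v >= min_votes}

def rouge_1(system_tokens, reference_tokens):
    sys_counts, ref_counts = Counter(system_tokens), Counter(reference_tokens)
    overlap = sum(min(sys_counts[t], ref_counts[t]) for t in ref_counts)
    precision = overlap / sum(sys_counts.values())
    recall = overlap / sum(ref_counts.values())
    return precision, recall

print(combine_references([[1, 2, 5], [2, 5, 7], [2, 3]]))   # {2, 5}
print(rouge_1("the forum thread is long".split(),
              "the thread is very long".split()))           # (0.8, 0.8)
```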
international conference on natural language generation | 2016
Sander Wubben; Emiel Krahmer; Antal van den Bosch; Suzan Verberne
INLG 2016 : The 9th International Natural Language Generation conference, Edinburgh, Scotland, September 5-8, 2016
meeting of the association for computational linguistics | 2012
Sander Wubben; Antal van den Bosch; Emiel Krahmer