Caroline Sporleder | Researchain

Publication

Featured research published by Caroline Sporleder.

Natural Language Engineering | 2008

Using automatically labelled examples to classify rhetorical relations: An assessment

Caroline Sporleder; Alex Lascarides

Being able to identify which rhetorical relations (e.g., contrast or explanation) hold between spans of text is important for many natural language processing applications. Using machine learning to obtain a classifier which can distinguish between different relations typically depends on the availability of manually labelled training data, which is very time-consuming to create. However, rhetorical relations are sometimes lexically marked, i.e., signalled by discourse markers (e.g., because, but, consequently etc.), and it has been suggested (Marcu and Echihabi, 2002) that the presence of these cues in some examples can be exploited to label them automatically with the corresponding relation. The discourse markers are then removed and the automatically labelled data are used to train a classifier to determine relations even when no discourse marker is present (based on other linguistic cues such as word co-occurrences). In this paper, we investigate empirically how feasible this approach is. In particular, we test whether automatically labelled, lexically marked examples are really suitable training material for classifiers that are then applied to unmarked examples. Our results suggest that training on this type of data may not be such a good strategy, as models trained in this way do not seem to generalise very well to unmarked data. Furthermore, we found some evidence that this behaviour is largely independent of the classifiers used and seems to lie in the data itself (e.g., marked and unmarked examples may be too dissimilar linguistically and removing unambiguous markers in the automatic labelling process may lead to a meaning shift in the examples).
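
The automatic labelling step described above can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: the marker table `MARKER_TO_RELATION` and the function `auto_label` are hypothetical, and the sketch assumes each marker signals exactly one relation, which is often not true of real discourse markers.

```python
import re

# Hypothetical marker-to-relation table (an assumption for illustration).
# Real markers such as "but" or "since" can signal more than one relation.
MARKER_TO_RELATION = {
    "because": "explanation",
    "but": "contrast",
    "consequently": "result",
}


def auto_label(sentence):
    """Split a sentence at the first listed marker found and drop the marker.

    Returns (left_span, right_span, relation), or None if no marker matches.
    """
    for marker, relation in MARKER_TO_RELATION.items():
        pattern = r",?\s+\b" + re.escape(marker) + r"\b\s+"
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if match:
            left = sentence[:match.start()].strip()
            right = sentence[match.end():].strip().rstrip(".")
            if left and right:
                # The marker itself is removed, so a classifier trained on
                # these pairs has to rely on other cues in the two spans.
                return left, right, relation
    return None


example = auto_label("The match was cancelled because it rained.")
```

Here `example` holds the two spans without the marker plus the label `explanation`. The abstract's finding is that pairs produced this way may differ too much from genuinely unmarked pairs to serve as good training data.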

Computational Linguistics | 2012

Modality and negation: An introduction to the special issue

Roser Morante; Caroline Sporleder

Traditionally, most research in NLP has focused on propositional aspects of meaning. To truly understand language, however, extra-propositional aspects are equally important. Modality and negation typically contribute significantly to these extra-propositional meaning aspects. Although modality and negation have often been neglected by mainstream computational linguistics, interest has grown in recent years, as evidenced by several annotation projects dedicated to these phenomena. Researchers have started to work on modeling factuality, belief and certainty, detecting speculative sentences and hedging, identifying contradictions, and determining the scope of expressions of modality and negation. In this article, we will provide an overview of how modality and negation have been modeled in computational linguistics.

Meeting of the Association for Computational Linguistics | 2009

Unsupervised Recognition of Literal and Non-Literal Use of Idiomatic Expressions

Caroline Sporleder; Linlin Li

We propose an unsupervised method for distinguishing literal and non-literal usages of idiomatic expressions. Our method determines how well a literal interpretation is linked to the overall cohesive structure of the discourse. If strong links can be found, the expression is classified as literal, otherwise as idiomatic. We show that this method can help to tell apart literal and non-literal usages, even for idioms which occur in canonical form.
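
A rough sketch of the decision rule described above, under simplifying assumptions: cohesion is reduced to the single strongest relatedness score between a component word of the expression and a context word. The `relatedness` table, the `threshold` value, and the function `classify_usage` are all hypothetical; the abstract does not specify how link strength is measured.

```python
def classify_usage(idiom_words, context_words, relatedness, threshold=0.5):
    """Label a usage 'literal' if the expression's words link strongly to the context.

    relatedness maps frozenset({word_a, word_b}) to a score in [0, 1].
    Both the table and the threshold are assumptions made for this sketch.
    """
    links = [
        relatedness.get(frozenset((idiom_word, context_word)), 0.0)
        for idiom_word in idiom_words
        for context_word in context_words
        if idiom_word != context_word
    ]
    strongest = max(links, default=0.0)
    return "literal" if strongest >= threshold else "idiomatic"


# Toy relatedness scores, invented for the example.
toy_relatedness = {
    frozenset(("ice", "frozen")): 0.8,
    frozenset(("ice", "lake")): 0.7,
}

literal_case = classify_usage(["break", "ice"], ["frozen", "lake", "skate"], toy_relatedness)
idiom_case = classify_usage(["break", "ice"], ["party", "guests", "joke"], toy_relatedness)
```

With the toy scores, the lake context is labelled literal and the party context idiomatic, because only the former contains words related to "ice".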

International Conference on Computational Linguistics | 2008

Semantic Role Assignment for Event Nominalisations by Leveraging Verbal Data

Sebastian Padó; Marco Pennacchiotti; Caroline Sporleder

This paper presents a novel approach to the task of semantic role labelling for event nominalisations, which make up a considerable fraction of predicates in running text, but are underrepresented in terms of training data and difficult to model. We propose to address this situation by data expansion. We construct a model for nominal role labelling solely from verbal training data. The best quality results from salvaging grammatical features where applicable, and generalising over lexical heads otherwise.

Empirical Methods in Natural Language Processing | 2009

Classifier Combination for Contextual Idiom Detection Without Labelled Data

Linlin Li; Caroline Sporleder

We propose a novel unsupervised approach for distinguishing literal and non-literal use of idiomatic expressions. Our model combines an unsupervised and a supervised classifier. The former bases its decision on the cohesive structure of the context and labels training data for the latter, which can then take a larger feature space into account. We show that a combination of both classifiers leads to significant improvements over using the unsupervised classifier alone.
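
The two-stage idea above, in which the unsupervised classifier labels training data for a second classifier, might be sketched like this. Everything here is a stand-in: `unsupervised_label` replaces the cohesion-based classifier with a simple cue-word check, and the word-count model is a deliberately crude substitute for whatever supervised learner and feature space the paper uses.

```python
from collections import Counter


def unsupervised_label(context_words, literal_cues):
    """Stand-in for the cohesion-based classifier (assumption): cue-word lookup."""
    return "literal" if set(context_words) & literal_cues else "idiomatic"


def train_word_counts(labelled):
    """Count how often each context word occurs with each automatically assigned label."""
    counts = {"literal": Counter(), "idiomatic": Counter()}
    for words, label in labelled:
        counts[label].update(words)
    return counts


def supervised_predict(counts, context_words):
    """Pick the label whose training contexts share more words with this context.

    Ties resolve to 'literal', the first key in the counts dictionary.
    """
    scores = {label: sum(c[w] for w in context_words) for label, c in counts.items()}
    return max(scores, key=scores.get)


unlabelled = [
    ["frozen", "lake", "skate"],
    ["party", "guests", "joke"],
    ["frozen", "pond", "winter"],
    ["meeting", "guests", "awkward"],
]
cues = {"frozen"}  # hypothetical cue set for the stand-in first stage
auto_labelled = [(words, unsupervised_label(words, cues)) for words in unlabelled]
model = train_word_counts(auto_labelled)

prediction = supervised_predict(model, ["pond", "skate"])
```

The context `["pond", "skate"]` contains no cue word, so the first-stage stand-in alone would call it idiomatic. The second stage labels it literal because it has learned additional context words from the automatically labelled data, which is the kind of gain from a larger feature space that the abstract describes.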

ACM Transactions on Speech and Language Processing | 2006

Broad coverage paragraph segmentation across languages and domains

Caroline Sporleder; Mirella Lapata

This article considers the problem of automatic paragraph segmentation. The task is relevant for speech-to-text applications whose output transcripts do not usually contain punctuation or paragraph indentation and are naturally difficult to read and process. Text-to-text generation applications (e.g., summarization) could also benefit from an automatic paragraph segmentation mechanism which indicates topic shifts and provides visual targets to the reader. We present a paragraph segmentation model which exploits a variety of knowledge sources (including textual cues, syntactic and discourse-related information) and evaluate its performance in different languages and domains. Our experiments demonstrate that the proposed approach significantly outperforms our baselines and in many cases comes to within a few percent of human performance. Finally, we integrate our method with a single document summarizer and show that it is useful for structuring the output of automatically generated text.

Language Resources and Evaluation | 2013

Beyond sentence-level semantic role labeling: linking argument structures in discourse

Josef Ruppenhofer; Russell Lee-Goldman; Caroline Sporleder; Roser Morante

Semantic role labeling is traditionally viewed as a sentence-level task concerned with identifying semantic arguments that are overtly realized in a fairly local context (i.e., a clause or sentence). However, this local view potentially misses important information that can only be recovered if local argument structures are linked across sentence boundaries. One important link concerns semantic arguments that remain locally unrealized (null instantiations) but can be inferred from the context. In this paper, we report on the SemEval 2010 Task-10 on “Linking Events and Their Participants in Discourse”, which addressed this problem. We discuss the corpus that was created for this task, which contains annotations on multiple levels: predicate argument structure (FrameNet and PropBank), null instantiations, and coreference. We also provide an analysis of the task and its difficulties.

Linguistic Annotation Workshop | 2009

Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation

Ines Rehbein; Josef Ruppenhofer; Caroline Sporleder

In this paper, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. While we found no conclusive evidence that it can speed up human annotation, automatic pre-annotation does increase its overall quality.

IEEE Intelligent Systems | 2009

Making a Clean Sweep of Cultural Heritage

A. van den Bosch; M.G.J. van Erp; Caroline Sporleder

Automatic error detection is a high priority for cultural heritage data managers and researchers. The authors describe a general approach to cleaning cultural heritage databases.

Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL) | 2014

Structure-based Clustering of Novels

Mariona Coll Ardanuy; Caroline Sporleder

To date, document clustering by genres or authors has been performed mostly by means of stylometric and content features. With the premise that novels are societies in miniature, we build social networks from novels as a strategy to quantify their plot and structure. From each social network, we extract a vector of features which characterizes the novel. We perform clustering over the vectors obtained, and the resulting groups are contrasted in terms of author and genre.
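
As an illustration of turning a novel into a network and then into a feature vector, the sketch below links characters that appear in the same scene and computes a few simple graph statistics. The scene-based co-occurrence rule, the function names, and the four chosen features are assumptions for this example; the abstract does not say how the networks are built or which features are extracted.

```python
from itertools import combinations


def build_network(scenes):
    """Build weighted co-occurrence edges from lists of characters per scene."""
    edges = {}
    for scene in scenes:
        for a, b in combinations(sorted(set(scene)), 2):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return edges


def network_features(edges):
    """Return [node count, edge count, density, highest degree as a share of n - 1]."""
    nodes = {name for pair in edges for name in pair}
    n = len(nodes)
    m = len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    degree = {name: 0 for name in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    max_degree_share = max(degree.values()) / (n - 1) if n > 1 else 0.0
    return [n, m, density, max_degree_share]


# Invented scenes for a toy "novel" with one central character.
toy_scenes = [["Anna", "Ben"], ["Anna", "Cara"], ["Anna", "Ben"]]
toy_edges = build_network(toy_scenes)
toy_vector = network_features(toy_edges)
```

Vectors like `toy_vector`, one per novel, could then be passed to any standard clustering algorithm and the resulting groups compared against author and genre, as the abstract describes.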
