Caroline Sporleder
Saarland University
Publications
Featured research published by Caroline Sporleder.
Natural Language Engineering | 2008
Caroline Sporleder; Alex Lascarides
Being able to identify which rhetorical relations (e.g., contrast or explanation) hold between spans of text is important for many natural language processing applications. Using machine learning to obtain a classifier which can distinguish between different relations typically depends on the availability of manually labelled training data, which is very time-consuming to create. However, rhetorical relations are sometimes lexically marked, i.e., signalled by discourse markers (e.g., because, but, consequently etc.), and it has been suggested (Marcu and Echihabi, 2002) that the presence of these cues in some examples can be exploited to label them automatically with the corresponding relation. The discourse markers are then removed and the automatically labelled data are used to train a classifier to determine relations even when no discourse marker is present (based on other linguistic cues such as word co-occurrences). In this paper, we investigate empirically how feasible this approach is. In particular, we test whether automatically labelled, lexically marked examples are really suitable training material for classifiers that are then applied to unmarked examples. Our results suggest that training on this type of data may not be such a good strategy, as models trained in this way do not seem to generalise very well to unmarked data. Furthermore, we found some evidence that this behaviour is largely independent of the classifiers used and seems to lie in the data itself (e.g., marked and unmarked examples may be too dissimilar linguistically and removing unambiguous markers in the automatic labelling process may lead to a meaning shift in the examples).
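The automatic labelling step described above (label examples by their discourse marker, then remove the marker) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The marker-to-relation map `MARKERS` and the functions `normalise` and `auto_label` are hypothetical. Sentences are split on whitespace, and only sentences containing exactly one known marker are kept, with the marker dropped from the resulting span pair.

```python
# Hypothetical mapping from unambiguous discourse markers to relations;
# illustrative only, not the marker set used in the paper.
MARKERS = {
    "because": "explanation",
    "but": "contrast",
    "consequently": "result",
}

def normalise(token: str) -> str:
    """Lower-case a token and strip surrounding punctuation for marker lookup."""
    return token.lower().strip(",.;:")

def auto_label(sentences):
    """Return (left span, right span, relation) triples for marked sentences.

    The marker itself is dropped, so a classifier trained on the output
    has to rely on the remaining words rather than on the cue.
    """
    examples = []
    for sent in sentences:
        tokens = sent.split()
        hits = [i for i, tok in enumerate(tokens) if normalise(tok) in MARKERS]
        if len(hits) != 1:
            # Skip unmarked sentences and sentences with several markers.
            continue
        i = hits[0]
        relation = MARKERS[normalise(tokens[i])]
        examples.append((" ".join(tokens[:i]), " ".join(tokens[i + 1:]), relation))
    return examples

print(auto_label(["He stayed home because it was raining.", "It rained."]))
# [('He stayed home', 'it was raining.', 'explanation')]
```

The paper's concern applies directly to output like this. Every pair comes from a sentence that was originally marked, and removing the marker may shift its meaning. The pairs may therefore be poor stand-ins for genuinely unmarked examples.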
Computational Linguistics | 2012
Roser Morante; Caroline Sporleder
Traditionally, most research in NLP has focused on propositional aspects of meaning. To truly understand language, however, extra-propositional aspects are equally important. Modality and negation typically contribute significantly to these extra-propositional meaning aspects. Although modality and negation have often been neglected by mainstream computational linguistics, interest has grown in recent years, as evidenced by several annotation projects dedicated to these phenomena. Researchers have started to work on modeling factuality, belief and certainty, detecting speculative sentences and hedging, identifying contradictions, and determining the scope of expressions of modality and negation. In this article, we will provide an overview of how modality and negation have been modeled in computational linguistics.
Meeting of the Association for Computational Linguistics | 2009
Caroline Sporleder; Linlin Li
We propose an unsupervised method for distinguishing literal and non-literal usages of idiomatic expressions. Our method determines how well a literal interpretation is linked to the overall cohesive structure of the discourse. If strong links can be found, the expression is classified as literal, otherwise as idiomatic. We show that this method can help to tell apart literal and non-literal usages, even for idioms which occur in canonical form.
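The decision rule described above (literal if strong links to the discourse are found, idiomatic otherwise) can be illustrated with a simplified stand-in. The sketch below is a toy under explicit assumptions, not the model from the paper. `toy_relatedness` is a hypothetical, hand-built relatedness measure; a real system would estimate relatedness from corpus or web data. Each component word of the idiom is scored by its strongest link to any context word, and the mean of those scores is compared against an arbitrary threshold of 0.5.

```python
# Hypothetical, hand-picked related word pairs used only for this toy example;
# a real system would estimate relatedness from corpus or web statistics.
RELATED_PAIRS = {
    frozenset(pair)
    for pair in [
        ("ice", "lake"), ("ice", "frozen"), ("ice", "skaters"),
        ("party", "guests"), ("guests", "conversation"),
    ]
}

def toy_relatedness(a: str, b: str) -> float:
    """Return 1.0 for a listed related pair, 0.0 otherwise."""
    return 1.0 if frozenset((a, b)) in RELATED_PAIRS else 0.0

def classify_usage(idiom_words, context_words, relatedness=toy_relatedness, threshold=0.5):
    """Label an idiom occurrence 'literal' if its words link strongly to the context."""
    context = [c for c in context_words if c not in idiom_words]
    if not idiom_words or not context:
        return "idiomatic"
    # Strongest link from each component word of the idiom to the context.
    links = [max(relatedness(w, c) for c in context) for w in idiom_words]
    cohesion = sum(links) / len(links)
    return "literal" if cohesion >= threshold else "idiomatic"

print(classify_usage(["break", "ice"], ["skaters", "lake", "frozen"]))        # literal
print(classify_usage(["break", "ice"], ["party", "guests", "conversation"]))  # idiomatic
```

The threshold and the relatedness measure are the two free choices in this sketch. The cohesion measure used in the paper may be defined differently.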
International Conference on Computational Linguistics | 2008
Sebastian Padó; Marco Pennacchiotti; Caroline Sporleder
This paper presents a novel approach to the task of semantic role labelling for event nominalisations, which make up a considerable fraction of predicates in running text, but are underrepresented in terms of training data and difficult to model. We propose to address this situation by data expansion. We construct a model for nominal role labelling solely from verbal training data. The best quality results from salvaging grammatical features where applicable, and generalising over lexical heads otherwise.
Empirical Methods in Natural Language Processing | 2009
Linlin Li; Caroline Sporleder
We propose a novel unsupervised approach for distinguishing literal and non-literal use of idiomatic expressions. Our model combines an unsupervised and a supervised classifier. The former bases its decision on the cohesive structure of the context and labels training data for the latter, which can then take a larger feature space into account. We show that a combination of both classifiers leads to significant improvements over using the unsupervised classifier alone.
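The two-stage combination described above can be sketched as follows: an unsupervised classifier labels the data, and a supervised classifier is then trained on those labels. This is an illustrative assumption rather than the authors' setup. The rule-based `unsupervised_label` function is a hypothetical stand-in for the cohesion-based classifier. A bag-of-words multinomial Naive Bayes with add-one smoothing stands in for the supervised learner. The point is that the second stage can use every context word, not only the cue the first stage relied on.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Collect class and per-class word counts for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
        vocab.update(doc)
    return class_counts, word_counts, vocab

def predict(model, doc):
    """Return the label with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for w in doc:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def unsupervised_label(doc):
    """Hypothetical first-stage classifier: literal if a cue word is present."""
    return "literal" if {"lake", "frozen"} & set(doc) else "idiomatic"

def combine(unlabelled_docs):
    """Stage 1 labels the data; stage 2 trains the supervised model on those labels."""
    labels = [unsupervised_label(doc) for doc in unlabelled_docs]
    return train_naive_bayes(unlabelled_docs, labels)

docs = [
    ["break", "ice", "lake", "skate"],
    ["break", "ice", "frozen", "skate"],
    ["break", "ice", "party", "guests"],
    ["break", "ice", "meeting", "guests"],
]
model = combine(docs)
print(predict(model, ["ice", "skate"]))  # literal, although no cue word is present
```

In this toy run the first-stage rule alone would call `["ice", "skate"]` idiomatic. The supervised model instead picks up `skate` as evidence for the literal reading from the automatically labelled data.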
ACM Transactions on Speech and Language Processing | 2006
Caroline Sporleder; Mirella Lapata
This article considers the problem of automatic paragraph segmentation. The task is relevant for speech-to-text applications, whose output transcripts do not usually contain punctuation or paragraph indentation and are naturally difficult to read and process. Text-to-text generation applications (e.g., summarization) could also benefit from an automatic paragraph segmentation mechanism, which indicates topic shifts and provides visual targets to the reader. We present a paragraph segmentation model which exploits a variety of knowledge sources (including textual cues, syntactic and discourse-related information) and evaluate its performance in different languages and domains. Our experiments demonstrate that the proposed approach significantly outperforms our baselines and in many cases comes to within a few percent of human performance. Finally, we integrate our method with a single-document summarizer and show that it is useful for structuring the output of automatically generated text.
Language Resources and Evaluation | 2013
Josef Ruppenhofer; Russell Lee-Goldman; Caroline Sporleder; Roser Morante
Semantic role labeling is traditionally viewed as a sentence-level task concerned with identifying semantic arguments that are overtly realized in a fairly local context (i.e., a clause or sentence). However, this local view potentially misses important information that can only be recovered if local argument structures are linked across sentence boundaries. One important link concerns semantic arguments that remain locally unrealized (null instantiations) but can be inferred from the context. In this paper, we report on the SemEval 2010 Task-10 on “Linking Events and Their Participants in Discourse”, that addressed this problem. We discuss the corpus that was created for this task, which contains annotations on multiple levels: predicate argument structure (FrameNet and PropBank), null instantiations, and coreference. We also provide an analysis of the task and its difficulties.
Linguistic Annotation Workshop | 2009
Ines Rehbein; Josef Ruppenhofer; Caroline Sporleder
In this paper, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. While we found no conclusive evidence that it can speed up human annotation, automatic pre-annotation does increase its overall quality.
IEEE Intelligent Systems | 2009
A. van den Bosch; M.G.J. van Erp; Caroline Sporleder
Automatic error detection is a high priority for cultural heritage data managers and researchers. The authors describe a general approach to cleaning cultural heritage databases.
Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL) | 2014
Mariona Coll Ardanuy; Caroline Sporleder
To date, document clustering by genres or authors has been performed mostly by means of stylometric and content features. With the premise that novels are societies in miniature, we build social networks from novels as a strategy to quantify their plot and structure. From each social network, we extract a vector of features which characterizes the novel. We perform clustering over the vectors obtained, and the resulting groups are contrasted in terms of author and genre.
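The first two steps, building a social network from a novel and turning it into a feature vector, might look like the sketch below. Everything here is an assumption for illustration. Characters are linked when their names co-occur in the same paragraph, and names are matched by plain substring search. The features (number of characters, number of links, density, average degree and maximum degree) are a hypothetical selection, not the feature set used in the paper.

```python
from collections import defaultdict
from itertools import combinations

def build_network(paragraphs, characters):
    """Weighted edges between characters named in the same paragraph."""
    edges = defaultdict(int)
    for para in paragraphs:
        # Substring matching is a simplification ("Ann" would also match "Anne").
        present = sorted(c for c in characters if c in para)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return dict(edges)

def network_features(edges, characters):
    """Return [nodes, edges, density, average degree, maximum degree]."""
    n, m = len(characters), len(edges)
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = 2 * m / n if n else 0.0
    max_degree = max(degree.values(), default=0)
    return [n, m, density, avg_degree, max_degree]

paragraphs = ["Emma met Harriet.", "Emma argued with Knightley.", "Harriet wrote a letter."]
characters = ["Emma", "Harriet", "Knightley"]
edges = build_network(paragraphs, characters)
print(network_features(edges, characters))
```

Vectors computed this way for a collection of novels could then be passed to a standard clustering algorithm such as k-means. The resulting groups could then be compared against author and genre labels.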