Publication


Featured research published by Roser Saurí.


language resources and evaluation | 2009

FactBank: a corpus annotated with event factuality

Roser Saurí; James Pustejovsky

Recent work in computational linguistics points out the need for systems to be sensitive to the veracity or factuality of events as mentioned in text; that is, to recognize whether events are presented as corresponding to actual situations in the world, situations that have not happened, or situations of uncertain interpretation. Event factuality is an important aspect of the representation of events in discourse, but the annotation of such information poses a representational challenge, largely because factuality is expressed through the interaction of numerous linguistic markers and constructions. Many of these markers are already encoded in existing corpora, albeit in a somewhat fragmented way. In this article, we present FactBank, a corpus annotated with information concerning the factuality of events. Its annotation has been carried out from a descriptive framework of factuality grounded on both theoretical findings and data analysis. FactBank is built on top of TimeBank, adding to it an additional level of semantic information.


meeting of the association for computational linguistics | 2005

Automating Temporal Annotation with TARSQI

Marc Verhagen; Inderjeet Mani; Roser Saurí; Jessica Littman; Robert Knippen; Seok Bae Jang; Anna Rumshisky; John Phillips; James Pustejovsky

We present an overview of TARSQI, a modular system for automatic temporal annotation that adds time expressions, events and temporal relations to news texts.


empirical methods in natural language processing | 2005

Evita: A Robust Event Recognizer For QA Systems

Roser Saurí; Robert Knippen; Marc Verhagen; James Pustejovsky

We present Evita, an application for recognizing events in natural language texts. Although developed as part of a suite of tools aimed at providing question answering systems with information about both temporal and intensional relations among events, it can be used independently as an event extraction tool. It is unique in that it is not limited to any pre-established list of relation types (events), nor is it restricted to a specific domain. Evita performs the identification and tagging of event expressions based on fairly simple strategies, informed by both linguistic- and statistically-based data. It achieves a performance ratio of 80.12% F-measure.


language resources and evaluation | 2005

Temporal and Event Information In Natural Language Text

James Pustejovsky; Robert Knippen; Jessica Littman; Roser Saurí

In this paper, we discuss the role that temporal information plays in natural language text, specifically in the context of question answering systems. We define a descriptive framework with which we can examine the temporally sensitive aspects of natural language queries. We then investigate broadly what properties a general specification language would need, in order to mark up temporal and event information in text. We present a language, TimeML, which attempts to capture the richness of temporal and event related information in language, while demonstrating how it can play an important part in the development of more robust question answering systems.


Computational Linguistics | 2012

Are you sure that this happened? Assessing the factuality degree of events in text

Roser Saurí; James Pustejovsky

Identifying the veracity, or factuality, of event mentions in text is fundamental for reasoning about eventualities in discourse. Inferences derived from events judged as not having happened, or as being only possible, are different from those derived from events evaluated as factual. Event factuality involves two separate levels of information. On the one hand, it deals with polarity, which distinguishes between positive and negative instantiations of events. On the other, it has to do with degrees of certainty (e.g., possible, probable), an information level generally subsumed under the category of epistemic modality. This article aims at contributing to a better understanding of how event factuality is articulated in natural language. For that purpose, we put forward a linguistic-oriented computational model which has at its core an algorithm articulating the effect of factuality relations across levels of syntactic embedding. As a proof of concept, this model has been implemented in De Facto, a factuality profiler for eventualities mentioned in text, and tested against a corpus built specifically for the task, yielding an F1 of 0.70 (macro-averaging) and 0.80 (micro-averaging). These two measures mutually compensate for an over-emphasis present in the other (either on the lesser or greater populated categories), and can therefore be interpreted as the lower and upper bounds of De Facto's performance.
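The difference between the two averages mentioned in the abstract can be illustrated with a minimal sketch. The category names and counts below are invented for illustration and are not De Facto or FactBank results; the function names are hypothetical as well. Macro-averaging gives every category the same weight, while micro-averaging pools the counts, so the most populated category dominates.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one category
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 expressed directly in terms of counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(per_category: Dict[str, Counts]) -> float:
    """Unweighted mean of the per-category F1 scores."""
    scores = [f1(*counts) for counts in per_category.values()]
    return sum(scores) / len(scores)


def micro_f1(per_category: Dict[str, Counts]) -> float:
    """F1 computed once over the counts pooled across all categories."""
    tp = sum(c[0] for c in per_category.values())
    fp = sum(c[1] for c in per_category.values())
    fn = sum(c[2] for c in per_category.values())
    return f1(tp, fp, fn)


# Invented toy counts: one heavily populated category, one sparse category.
toy_counts: Dict[str, Counts] = {
    "frequent_category": (90, 10, 5),
    "rare_category": (5, 5, 10),
}

print(f"macro F1 = {macro_f1(toy_counts):.3f}")
print(f"micro F1 = {micro_f1(toy_counts):.3f}")
```

With these toy counts the micro average is higher than the macro average, because the weak score on the sparse category is diluted once the counts are pooled.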


annual meeting of the special interest group on discourse and dialogue | 2009

Classification of discourse coherence relations: an exploratory study using multiple knowledge sources

Ben Wellner; James Pustejovsky; Catherine Havasi; Anna Rumshisky; Roser Saurí

In this paper we consider the problem of identifying and classifying discourse coherence relations. We report initial results over the recently released Discourse GraphBank (Wolf and Gibson, 2005). Our approach considers, and determines the contributions of, a variety of syntactic and lexico-semantic features. We achieve 81% accuracy on the task of discourse relation type classification and 70% accuracy on relation identification.


meeting of the association for computational linguistics | 2002

Medstract: creating large-scale information servers from biomedical texts

James Pustejovsky; José Castaño; Wei Luo; Jason Zhang; Roser Saurí

The automatic extraction of information from Medline articles and abstracts (commonly referred to now as the biobibliome) promises to play an increasingly critical role in aiding research while speeding up the discovery process. We have been developing robust natural language tools for the automated extraction of structured information from biomedical texts as part of a project we call MEDSTRACT. Here we will describe an architecture for developing databases for domain specific information servers for research and support in the biomedical community. These are currently comprised of the following: a Bio-Relation Server, and the Bio-Acronym server, Acromed, which will include also aliases. Each information server is derived automatically from an integration of diverse components which employ robust natural language processing of Medline text and IE techniques. The front-end consists of conventional search and navigation capabilities, as well as visualization tools that help to navigate the databases and explore the results of a search. It is hoped that this set of applications will allow for quick, structured access to relevant information on individual genes by biologists over the web.


international conference on semantic computing | 2007

Determining Modality and Factuality for Text Entailment

Roser Saurí; James Pustejovsky

Recognizing textual entailment (TE) is a complex task involving knowledge from many different sources. One major source of information in this task is event factuality, since the inferences derivable from factual eventualities are different from those judged as possible or as non-existent. Some TE systems already factor in factuality features at the local level, but determining the factuality of events more generally involves dealing with information that is nonlocal to a particular textual event. In this paper, we present a tool providing events with their factuality values, characterized as pairs of modality and polarity features. In previous work, we identified polarity and modality at the local context with a performance of 92% precision and 56% recall. The research presented here extends and enhances our algorithm to incorporate the influence of non-local context as well as the identification of sources.
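A factuality value described as a pair of modality and polarity features can be represented roughly as follows. This is a hypothetical sketch, not the tool's actual data model: the class names, the short codes, and the `label` method are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    """Degree of certainty; the short codes are illustrative assumptions."""
    CERTAIN = "CT"
    PROBABLE = "PR"
    POSSIBLE = "PS"
    UNDERSPECIFIED = "U"


class Polarity(Enum):
    """Positive or negative instantiation of the event (codes are assumptions)."""
    POSITIVE = "+"
    NEGATIVE = "-"
    UNDERSPECIFIED = "u"


@dataclass(frozen=True)
class FactualityValue:
    """A factuality value as a (modality, polarity) pair."""
    modality: Modality
    polarity: Polarity

    def label(self) -> str:
        # Compact label built from the two codes, e.g. modality code + polarity sign.
        return f"{self.modality.value}{self.polarity.value}"


# An event presented as possibly not having happened.
example = FactualityValue(Modality.POSSIBLE, Polarity.NEGATIVE)
print(example.label())
```

Keeping the two features as separate fields, instead of one flat label, makes it straightforward to change one of them independently, for instance when a non-local negation flips the polarity but leaves the modality untouched.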


Proceedings of the 2005 international conference on Annotating, extracting and reasoning about time and events | 2005

Arguments in TimeML: events and entities

James Pustejovsky; Jessica Littman; Roser Saurí

TimeML is a specification language for the annotation of events and temporal expressions in natural language text. In addition, the language introduces three relational tags linking temporal objects and events to one another. These links impose both aspectual and temporal ordering over time objects, as well as mark up subordination contexts introduced by modality, evidentiality, and factivity. Given the richness of this specification, the TimeML working group decided not to include the arguments of events within the language specification itself. Full reasoning and inference over natural language texts clearly requires knowledge of events along with their participants. In this paper, we define the appropriate role of argumenthood within event markup and propose that TimeML should make a basic distinction between arguments that are events and those that are entities. We first review how TimeML treats event arguments in subordinating and aspectual contexts, creating event-event relations between predicate and argument. As it turns out, these constructions cover a large number of the argument types selected for by event predicates. We suggest that TimeML be enriched slightly to include causal predicates, such as lead to, since these also involve event-event relations. As such, causal relationships will be a relation type for the new Discourse Link that will also encode other discourse relations such as elaboration. We propose that all other verbal arguments be ignored by the specification, and any predicate-argument binding of participants to an event should be performed by independent means. In fact, except for the event-denoting arguments handled by the extension to TimeML proposed here, almost full temporal ordering of the events in a text can be computed without argument identification.


asia information retrieval symposium | 2013

Generating New LIWC Dictionaries by Triangulation

Guillem Massó; Patrik Lambert; Carlos Rodríguez Penagos; Roser Saurí

This work aims at exploring a triangulation-based methodology for generating a sentiment dictionary in a language from equivalent dictionaries in other languages. Direct machine translation of dictionaries generally leads to incomplete or wrong results, but multilingual translation can help disambiguate and improve these data. More precisely, we want to translate the LIWC dictionary (Linguistic Inquiry and Word Count) into Catalan from the original English dictionary, complemented with other versions in Romance languages close to Catalan, that is, Spanish, French and Italian. Comparing translations from these dictionaries allows us to identify the most reliable solutions, namely, those common to the different languages. Since LIWC classifies words by categories, assigning the correct ones to the chosen translations is also an important issue, especially when the source categories are different. We present the results of a semi-automatic approach and the challenges that had to be addressed in the translation process.
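The idea of keeping the translations that are common to several source languages can be sketched as a simple vote. This is only an illustration under stated assumptions, not the semi-automatic procedure of the paper: the function name, the `min_votes` threshold, and the toy candidate lists are all invented here.

```python
from collections import Counter
from typing import Dict, List


def triangulate(candidates_by_language: Dict[str, List[str]],
                min_votes: int = 2) -> List[str]:
    """Keep target-language candidates proposed from at least `min_votes` source languages."""
    votes: Counter = Counter()
    for candidates in candidates_by_language.values():
        # Each source language votes at most once per candidate.
        votes.update(set(candidates))
    return sorted(word for word, count in votes.items() if count >= min_votes)


# Invented toy data: Catalan candidates obtained by translating one dictionary
# entry from each source-language version.
toy_candidates = {
    "en": ["feliç", "content"],
    "es": ["feliç", "content"],
    "fr": ["feliç", "afortunat"],
    "it": ["feliç"],
}

print(triangulate(toy_candidates))
print(triangulate(toy_candidates, min_votes=4))
```

Raising `min_votes` trades coverage for reliability: a stricter threshold keeps only candidates on which more source languages agree.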

Collaboration


Dive into Roser Saurí's collaborations.

Top Co-Authors

Anna Rumshisky

University of Massachusetts Lowell

Graham Katz

University of Osnabrück
