Stefan Bott
Pompeu Fabra University
Publication
Featured research published by Stefan Bott.
conference on web accessibility | 2013
Luz Rello; Ricardo A. Baeza-Yates; Stefan Bott; Horacio Saggion
We present a user study of two different automatic strategies that simplify text content for people with dyslexia. The strategies considered are the standard one (replacing a complex word with its simplest synonym) and a new one that presents several synonyms for a complex word if the user requests them. We compare texts transformed by both strategies with the original text and with a manually built gold standard. The study was undertaken by 96 participants: 47 with dyslexia and a control group of 49 people without dyslexia. To show device independence, we used three different reading devices for the new strategy. Overall, participants with dyslexia found texts presented with the new strategy significantly more readable and comprehensible. To the best of our knowledge, this is the largest user study of its kind.
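The two strategies described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy synonym dictionary, the frequency table, the threshold, and the use of word frequency as a proxy for simplicity are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy resources; a real system would use a thesaurus and corpus counts.
SYNONYMS: Dict[str, List[str]] = {
    "purchase": ["buy", "acquire", "procure"],
}
FREQUENCY: Dict[str, int] = {
    "purchase": 40, "buy": 900, "acquire": 60, "procure": 5,
}
COMPLEX_THRESHOLD = 100  # assumed cutoff: rarer words with known synonyms count as complex


def is_complex(word: str) -> bool:
    """Treat a word as complex if it is rare and has at least one known synonym."""
    key = word.lower()
    return key in SYNONYMS and FREQUENCY.get(key, 0) < COMPLEX_THRESHOLD


def ranked_synonyms(word: str) -> List[str]:
    """Return synonyms ordered from most to least frequent (assumed simplicity proxy)."""
    candidates = SYNONYMS.get(word.lower(), [])
    return sorted(candidates, key=lambda w: FREQUENCY.get(w, 0), reverse=True)


def substitute_simplest(tokens: List[str]) -> List[str]:
    """Standard strategy: replace each complex word with its top-ranked synonym."""
    return [ranked_synonyms(t)[0] if is_complex(t) else t for t in tokens]


def synonyms_on_demand(tokens: List[str], requested_index: int) -> List[str]:
    """New strategy: leave the text unchanged and list synonyms only when requested."""
    token = tokens[requested_index]
    return ranked_synonyms(token) if is_complex(token) else []


sentence = ["I", "want", "to", "purchase", "bread"]
print(substitute_simplest(sentence))
print(synonyms_on_demand(sentence, 3))
```

The first call rewrites the sentence, while the second leaves the text intact and only returns candidates for the word the reader asks about.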
ACM Transactions on Accessible Computing | 2015
Horacio Saggion; Sanja Štajner; Stefan Bott; Simon Mille; Luz Rello
The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts that are adapted to the specific needs of particular users. Most research in the area of automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification that are grounded in the analysis of a corpus manually simplified for people with special needs. We carried out an automatic evaluation of the system’s output, taking into account the interaction between three different modules dedicated to different simplification aspects. One evaluation is based on readability metrics for Spanish and shows that the system is able to reduce the lexical and syntactic complexity of the texts. We also show, by means of a human evaluation, that sentence meaning is preserved in most cases. Although our work represents the first automatic text simplification system for Spanish that addresses different linguistic aspects, our results are comparable to the state of the art in English automatic text simplification.
international conference on computational linguistics | 2013
Sanja Štajner; Stefan Bott; Susana Bautista; Horacio Saggion
In this paper we present two components of an automatic text simplification system for Spanish, aimed at making news articles more accessible to readers with cognitive disabilities. Our system in its current state consists of a rule-based lexical transformation component and a module for syntactic simplification. We evaluate the two components separately and as a whole, with a view to determining the level of simplification and the preservation of meaning and grammaticality. In order to test the readability level pre- and post-simplification, we apply seven readability measures for Spanish to three sets of randomly chosen news articles: the original texts, the output obtained after lexical transformations, the syntactic simplification output, and the output of both system components. To test whether the simplification output is grammatically correct and semantically adequate, we ask human annotators to grade pairs of original and simplified sentences according to these two criteria. Our results suggest that both components of our system produce simpler output when compared to the original, and that grammaticality and meaning preservation are positively rated by the annotators.
conference on computational natural language learning | 2009
Xavier Lluís; Stefan Bott; Lluís Màrquez
We present a system developed for the CoNLL-2009 Shared Task (Hajic et al., 2009). We extend the Carreras (2007) parser to jointly annotate syntactic and semantic dependencies. This state-of-the-art parser factorizes the built tree in second-order factors. We include semantic dependencies in the factors and extend their score function to combine syntactic and semantic scores. The parser is coupled with an on-line averaged perceptron (Collins, 2002) as the learning method. Our averaged results for all seven languages are 71.49 macro F1, 79.11 LAS and 63.06 semantic F1.
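As a rough illustration of the learning method named in this abstract, the following is a minimal sketch of a binary averaged perceptron. It is not the structured variant used in the parser, and the toy data, function names, and epoch count are hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label in {-1, +1})


def train_averaged_perceptron(data: List[Example], epochs: int = 10) -> Tuple[List[float], float]:
    """Train a binary perceptron and return the average of all intermediate weights."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0          # current weights and bias
    w_sum, b_sum = [0.0] * n, 0.0  # running sums used for averaging
    steps = 0
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # mistake (or zero margin): move toward the correct label
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
            # Accumulate after every example, whether or not an update happened.
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [s / steps for s in w_sum], b_sum / steps


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Classify a feature vector with the averaged weights."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, linearly separable toy data.
toy_data: List[Example] = [
    ([2.0, 2.0], 1), ([1.0, 3.0], 1),
    ([-1.0, -2.0], -1), ([-2.0, -1.0], -1),
]
weights, bias = train_averaged_perceptron(toy_data)
print([predict(weights, bias, x) for x, _ in toy_data])
```

Averaging the weights over all steps, instead of keeping only the final vector, tends to make the classifier less sensitive to the order of the last few updates.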
WAC '06 Proceedings of the 2nd International Workshop on Web as Corpus | 2006
Gemma Boleda; Stefan Bott; Rodrigo Meza; Carlos Castillo; Toni Badia; Vicente López
This paper presents CUCWeb, a 166 million word corpus for Catalan built by crawling the Web. The corpus has been annotated with NLP tools and made available to language users through a flexible web interface. The developed architecture is quite general, so that it can be used to create corpora for other languages.
conference on web accessibility | 2013
Luz Rello; Clara Bayarri; Azuki Gòrriz; Ricardo A. Baeza-Yates; Saurabh Gupta; Gaurang Kanvinde; Horacio Saggion; Stefan Bott; Roberto Carlini; Vasile Topac
Even if dyslexia is neurological in origin, certain text modifications could make texts more accessible for people with dyslexia. We introduce DysWebxia 2.0, a model that integrates our findings from research conducted with this target group. It alters content and presentation of the text to make it more readable. We also present the current integrations of DysWebxia in different reading software applications.
SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing | 2013
Horacio Saggion; Stefan Bott; Luz Rello
In this paper we study the effect of different lexical resources and strategies for selecting synonyms in a lexical simplification system for the Spanish language. The resources used for the experiments are the Spanish EuroWordNet, the Spanish Open Thesaurus and a combination of both. As for the synonym selection strategies, we have used both local and global contexts for word sense disambiguation. We present a novel evaluation framework in lexical simplification that takes into account the level of ambiguity of the word to be simplified. The evaluation compares various instances of the lexical simplification system, a gold standard, and a baseline. On the basis of our results we recommend different resources and word sense disambiguation methods depending on the ambiguity level of the target word to be simplified.
language resources and evaluation | 2014
Stefan Bott; Horacio Saggion
In this paper we present the development of a text simplification system for Spanish. Text simplification is the adaptation of a text for the special needs of certain groups of readers, such as language learners, people with cognitive difficulties, and elderly people, among others. There is a clear need for simplified texts, but manual production and adaptation of existing text is labour-intensive and costly. Automatic simplification is a field which attracts growing attention in Natural Language Processing, but, to the best of our knowledge, there are no existing simplification tools for Spanish. We present a corpus study which aims to identify the operations a text simplification system needs to carry out in order to produce an output similar to what human editors produce when they simplify news texts. We also present a first prototype for automatic simplification, which shows that the most important simplification operations can be successfully treated.
Computer Speech & Language | 2016
Horacio Saggion; Stefan Bott; Luz Rello
Highlights: We developed the first lexical simplification for Spanish. Human-informed evaluation of the system. Comparison of two WSD strategies. Comparison of two lexical resources. Software and dataset made available for testing and verification.
In this paper we study the effect of different lexical resources for selecting synonyms and strategies for word sense disambiguation in a lexical simplification system for the Spanish language. The resources used for the experiments are the Spanish EuroWordNet, the Spanish Open Thesaurus and a combination of both. As for the synonym selection strategies, we have used both local and global contexts for word sense disambiguation. We present a novel evaluation framework in lexical simplification that takes into account the level of ambiguity of the word to be simplified. The evaluation compares various instances of the lexical simplification system, a gold standard, and a baseline. The paper presents an in-depth qualitative error analysis of the results.
joint conference on lexical and computational semantics | 2014
Stefan Bott; Sabine Schulte im Walde
German particle verbs, like anblicken (to gaze at), combine a base verb (blicken) with a particle (an) to form a special kind of Multi Word Expression. Particle verbs may share the semantics of the base verb and the particle to a variable degree. However, while syntactic subcategorization frames tend to be good predictors of the semantics of verbs in general (verbs that are similar in meaning also tend to have similar subcategorization frames and selectional preferences), particle verbs show regular changes in subcategorization frames with regard to their corresponding base verbs. This paper demonstrates that modeling the syntactic behavior of particle verbs and base verbs together (that is, the regular changes in subcategorization frames between particle verbs and their corresponding base verbs) and applying clustering techniques allows us to distinguish particle verb meanings. It also shows the tight connection between transfer patterns and the semantic classes of particle verbs.