Rodrigo Agerri
University of the Basque Country
Publication
Featured research published by Rodrigo Agerri.
Artificial Intelligence | 2016
Rodrigo Agerri; German Rigau
We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with semi-supervised clustering features induced from large amounts of unlabeled text. Understanding, through empirical experimentation, how to effectively combine various types of clustering features allows us to seamlessly port our system to other datasets and languages. The result is a simple but highly competitive system which obtains state-of-the-art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate their use and guarantee the reproducibility of results.
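The abstract does not list the exact feature set. As a rough sketch only, assuming one hierarchical (Brown-style) clustering that maps a word to a bit-string path and one flat clustering (for example k-means over word vectors) that maps a word to a cluster id, combining shallow local features with cluster features for a single token might look like this. The prefix lengths, feature names and cluster dictionaries are hypothetical illustrations, not the published system.

```python
from typing import Dict, List

# Hypothetical prefix lengths for hierarchical cluster paths.
BROWN_PREFIXES = (4, 8, 12, 20)


def token_features(tokens: List[str], i: int,
                   brown: Dict[str, str],
                   flat: Dict[str, str]) -> List[str]:
    """Shallow local features plus (hypothetical) cluster features for token i."""
    word = tokens[i]
    low = word.lower()
    feats = [
        f"w={low}",
        f"cap={word[:1].isupper()}",
        f"prefix3={low[:3]}",
        f"suffix3={low[-3:]}",
        f"prev={tokens[i - 1].lower() if i > 0 else '<s>'}",
        f"next={tokens[i + 1].lower() if i + 1 < len(tokens) else '</s>'}",
    ]
    path = brown.get(low)
    if path is not None:
        # Prefixes of the hierarchical path give features at several granularities.
        feats += [f"brown{p}={path[:p]}" for p in BROWN_PREFIXES]
    cid = flat.get(low)
    if cid is not None:
        # A flat clustering contributes a single cluster-id feature.
        feats.append(f"flat={cid}")
    return feats


toks = ["John", "lives", "in", "Bilbao"]
print(token_features(toks, 3, {"bilbao": "1101001110"}, {"bilbao": "c17"}))
```

Because unseen words in the labeled data can still share cluster features with seen ones, features of this kind are one plausible reason why cluster information helps out-of-domain, as the abstract reports.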
conference of the european chapter of the association for computational linguistics | 2014
Iñaki San Vicente; Rodrigo Agerri; German Rigau
This paper presents a simple, robust and (almost) unsupervised dictionary-based method, qwn-ppv (Q-WordNet as Personalized PageRanking Vector), to automatically generate polarity lexicons. We show that qwn-ppv outperforms other automatically generated lexicons in the four extrinsic evaluations presented here. It also shows very competitive and robust results with respect to manually annotated lexicons. Results suggest that no single lexicon is best for every task and dataset, and that the intrinsic evaluation of polarity lexicons is not a good indicator of performance on a Sentiment Analysis task. The qwn-ppv method makes it easy to create quality polarity lexicons for a given language whenever no domain-based annotated corpora are available.
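The abstract describes qwn-ppv only at a high level. A minimal sketch of the general idea behind Personalized PageRank polarity propagation, assuming a toy synonym graph in place of WordNet and single-word seed sets: run PageRank once with the teleport restricted to positive seeds and once restricted to negative seeds, then score each node by the difference. The graph, seeds, damping factor and difference-based score are illustrative assumptions, not the paper's exact formulation.

```python
from typing import Dict, List, Set


def personalized_pagerank(graph: Dict[str, List[str]], seeds: Set[str],
                          damping: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Power iteration for PageRank whose random jumps land only on seed nodes."""
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    new[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for s in seeds:
                    new[s] += damping * rank[n] / len(seeds)
        rank = new
    return rank


def polarity(graph: Dict[str, List[str]], pos: Set[str], neg: Set[str]) -> Dict[str, float]:
    """Positive score means closer to the positive seeds than to the negative ones."""
    p = personalized_pagerank(graph, pos)
    q = personalized_pagerank(graph, neg)
    return {n: p[n] - q[n] for n in graph}


# Hypothetical toy graph standing in for WordNet relations (undirected edges).
GRAPH = {
    "good": ["great", "okay"], "great": ["good", "excellent"], "excellent": ["great"],
    "okay": ["good", "bad"],
    "bad": ["awful", "okay"], "awful": ["bad", "terrible"], "terrible": ["awful"],
}
print(polarity(GRAPH, {"good"}, {"bad"}))
```

On this symmetric toy graph, words near "good" get positive scores, words near "bad" get negative ones, and "okay" lands at (numerically) zero.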
north american chapter of the association for computational linguistics | 2015
Iñaki San Vicente; Xabier Saralegi; Rodrigo Agerri
This paper presents a supervised Aspect Based Sentiment Analysis (ABSA) system. Our aim is to develop a modular platform that makes it easy to conduct experiments by replacing modules or adding new features. We obtain the best result in the Opinion Target Extraction (OTE) task (slot 2) using an off-the-shelf sequence labeler. Target polarity classification (slot 3) is addressed by means of a multiclass SVM algorithm which includes lexicon-based features such as the polarity values obtained from domain-specific and open-domain polarity lexicons. The system obtains accuracies of 0.70 and 0.73 for the restaurant and laptop domains, respectively, and ranks second on the out-of-domain hotel data, with an accuracy of 0.80.
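To make the idea of lexicon-based features concrete, here is a minimal sketch assuming each lexicon is a plain word-to-score dictionary (positive scores for positive words, negative for negative ones). The feature names, the aggregation into sums and counts, and the example lexicons are hypothetical; the abstract does not specify how the polarity values are encoded.

```python
from typing import Dict, List


def lexicon_features(tokens: List[str],
                     lexicons: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Aggregate polarity scores from each (hypothetical) lexicon into numeric features."""
    feats: Dict[str, float] = {}
    for name, lex in lexicons.items():
        scores = [lex[t.lower()] for t in tokens if t.lower() in lex]
        feats[f"{name}_pos_sum"] = sum(s for s in scores if s > 0)
        feats[f"{name}_neg_sum"] = sum(s for s in scores if s < 0)
        feats[f"{name}_hits"] = float(len(scores))
    return feats


sentence = "The pasta was great but the service slow".split()
lexicons = {
    "domain": {"great": 1.0, "slow": -0.5},  # e.g. induced from restaurant reviews
    "general": {"great": 0.8},               # e.g. an open-domain lexicon
}
print(lexicon_features(sentence, lexicons))
```

Feature dictionaries like these could be combined with other features and fed to a multiclass linear SVM, for example via scikit-learn's `DictVectorizer` and `LinearSVC`; that pairing is one possible choice, not necessarily the one used in the paper.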
conference of the european chapter of the association for computational linguistics | 2014
Rodrigo Agerri; Josu Bermúdez; German Rigau
IXA pipeline is a modular set of Natural Language Processing tools (or pipes) that provide easy access to NLP technology. It aims to lower the barriers to using NLP technology, both for research purposes and for small industrial developers and SMEs, by offering robust and efficient linguistic annotation to researchers and non-NLP experts alike. IXA pipeline can be used “as is”, or its modularity can be exploited to pick and replace individual components. This paper describes the general data-centric architecture of IXA pipeline and presents competitive results on several NLP annotation tasks for English and Spanish.
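The data-centric, modular design can be illustrated with a small sketch: each pipe takes an annotated document and returns it with one more annotation layer, so pipes can be chained, reordered or swapped independently. A plain dictionary stands in for the real document format, and the two pipes below are hypothetical toys, not IXA pipeline components.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Pipe = Callable[[Doc], Doc]


def tokenize(doc: Doc) -> Doc:
    # Toy tokenizer: whitespace split, stored as a new "tokens" layer.
    return {**doc, "tokens": doc["raw"].split()}


def tag_capitalized(doc: Doc) -> Doc:
    # Toy stand-in for a real tagger: marks capitalized tokens as candidate names.
    tags = ["NAME" if t[:1].isupper() else "O" for t in doc["tokens"]]
    return {**doc, "tags": tags}


def run_pipeline(pipes: List[Pipe], doc: Doc) -> Doc:
    """Apply each pipe in order; every pipe only reads earlier layers and adds its own."""
    return reduce(lambda d, pipe: pipe(d), pipes, doc)


result = run_pipeline([tokenize, tag_capitalized], {"raw": "Agerri works in Donostia"})
print(result["tags"])
```

Since a pipe depends only on the layers it reads, replacing `tag_capitalized` with a different tagger leaves the rest of the chain untouched, which is the kind of modularity the abstract describes.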
Knowledge Based Systems | 2017
Egoitz Laparra; Rodrigo Agerri; Itziar Aldabe; German Rigau
In this paper we present an approach to extract ordered timelines of events, their participants, locations and times from a set of multilingual and cross-lingual data sources. Based on the assumption that event-related information can be recovered from different documents written in different languages, we extend the Cross-document Event Ordering task presented at SemEval 2015 by specifying two new tasks for multilingual and cross-lingual timeline extraction, respectively. We then develop three deterministic algorithms for timeline extraction based on two main ideas. First, we address implicit temporal relations at document level, since explicit time anchors are too scarce to build a wide-coverage timeline extraction system. Second, we leverage several multilingual resources to obtain a single, interoperable semantic representation of events across documents and across languages. The result is a highly competitive system that strongly outperforms the current state of the art. Nonetheless, further analysis of the results reveals that linking event mentions with their target entities and time anchors remains a difficult challenge. The systems, resources and scorers are freely available to facilitate their use and guarantee the reproducibility of results.
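As a much-simplified illustration of why implicit temporal relations matter, the sketch below assumes a hypothetical heuristic: an event mention with no explicit time anchor inherits the last explicit anchor seen earlier in the same document, and a timeline for a target entity is then the anchored events involving it, sorted by anchor. The `Event` structure, the propagation rule and the example data are assumptions for illustration, not the paper's algorithms.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    doc_id: str
    position: int                # order of the mention within its document
    text: str
    participants: Tuple[str, ...]
    anchor: Optional[str]        # ISO date (YYYY-MM-DD) if explicit, else None


def propagate_anchors(events: List[Event]) -> List[Event]:
    """Hypothetical heuristic: unanchored events inherit the previous anchor in their document."""
    last: Dict[str, str] = {}
    result = []
    for ev in sorted(events, key=lambda e: (e.doc_id, e.position)):
        if ev.anchor:
            last[ev.doc_id] = ev.anchor
        result.append(replace(ev, anchor=ev.anchor or last.get(ev.doc_id)))
    return result


def timeline(events: List[Event], entity: str) -> List[Event]:
    """Anchored events involving `entity`; ISO dates sort correctly as strings."""
    selected = [e for e in propagate_anchors(events)
                if entity in e.participants and e.anchor]
    return sorted(selected, key=lambda e: (e.anchor, e.doc_id, e.position))


events = [
    Event("d1", 0, "acquired", ("Apple",), "2010-04-01"),
    Event("d1", 1, "announced", ("Apple",), None),
    Event("d2", 0, "sued", ("Apple", "Samsung"), "2009-01-15"),
    Event("d2", 1, "merged", ("Samsung",), None),
]
print([e.text for e in timeline(events, "Apple")])
```

Without the propagation step, "announced" would be dropped for lack of an explicit anchor, which mirrors the coverage problem the abstract attributes to scarce time anchors.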
Procesamiento Del Lenguaje Natural | 2018
Rodrigo Agerri; Montse Maritxalar; Verena Lyding; Lionel Nicolas
We present enetCollect, a large European COST Action network set up to promote a research trend that combines the well-established domain of Language Learning with recent and successful crowdsourcing approaches. More specifically, the challenge of enetCollect is to foster the language skills of all citizens, regardless of their backgrounds, by enhancing the production of language learning material using crowdsourcing techniques. To do so, the Action will create a balanced interdisciplinary community of active stakeholders related to content creation, content usage, and Learning/Content Management Systems, and will build a theoretical framework for a shared understanding of Language Learning and Crowdsourcing. This will make it possible to unlock the crowdsourcing potential available for language learning and to facilitate the development of prototypical experiments for the production of language learning material, such as lesson or exercise content. These activities could benefit a wide range of users and languages.
Procesamiento Del Lenguaje Natural | 2018
Rodrigo Agerri; Núria Bel; German Rigau; Horacio Saggion
The TUNER coordinated project (2016-2018) has focused on the development of domain adaptation technologies that reduce the cost of creating linguistic resources for developing systems in different languages and for different domains and genres. In this article we present the demonstrators, prototypes and resources that are already available as project results.
international joint conference on artificial intelligence | 2017
Rodrigo Agerri; German Rigau
language resources and evaluation | 2010
Rodrigo Agerri; Ana García-Serrano
Knowledge Based Systems | 2015
Rodrigo Agerri; Xabier Artola; Zuhaitz Beloki; German Rigau; Aitor Soroa