Oier Lopez de Lacalle
University of the Basque Country
Publications
Featured research published by Oier Lopez de Lacalle.
Computational Linguistics | 2014
Eneko Agirre; Oier Lopez de Lacalle; Aitor Soroa
Word Sense Disambiguation (WSD) systems automatically choose the intended meaning of a word in context. In this article we present a WSD algorithm based on random walks over large Lexical Knowledge Bases (LKB). We show that our algorithm performs better than other graph-based methods when run on a graph built from WordNet and eXtended WordNet. Our algorithm and LKB combination compares favorably to other knowledge-based approaches in the literature that use similar knowledge, on a variety of English data sets and a Spanish data set. We include a detailed analysis of the factors that affect the algorithm. The algorithm and the LKBs used are publicly available, and the results are easily reproducible.
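A well-known instantiation of this family of algorithms is personalized PageRank, where the random walk repeatedly teleports to the senses of the words in the context. The sketch below illustrates the idea on a toy sense graph; the graph, words, damping factor, and iteration count are invented for illustration, not the paper's actual LKB or settings.

```python
# Toy sketch of random-walk WSD: a personalized PageRank over a tiny sense
# graph. Graph, words, and parameters are invented; the paper runs over a
# full WordNet-based LKB.

def personalized_pagerank(graph, teleport, damping=0.85, iters=30):
    """Power iteration with a personalized teleport distribution."""
    nodes = list(graph)
    rank = {n: teleport.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        rank = {
            n: (1 - damping) * teleport.get(n, 0.0)
               + damping * sum(rank[m] / len(graph[m])
                               for m in nodes if n in graph[m])
            for n in nodes
        }
    return rank

# Undirected sense graph: senses as nodes, lexical-semantic relations as edges.
graph = {
    "bank#1": {"money#1", "deposit#1"},   # financial institution
    "bank#2": {"river#1", "slope#1"},     # sloping land beside a river
    "money#1": {"bank#1", "deposit#1"},
    "deposit#1": {"bank#1", "money#1"},
    "river#1": {"bank#2", "slope#1"},
    "slope#1": {"bank#2", "river#1"},
}

# The walk teleports to the senses of the context words ("deposit", "money"),
# so rank mass accumulates near the senses the context supports.
context_senses = ["deposit#1", "money#1"]
teleport = {s: 1.0 / len(context_senses) for s in context_senses}

ranks = personalized_pagerank(graph, teleport)
print(max(("bank#1", "bank#2"), key=ranks.get))  # bank#1: the financial sense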
Empirical Methods in Natural Language Processing | 2006
Eneko Agirre; David Martinez; Oier Lopez de Lacalle; Aitor Soroa
This paper explores the use of two graph algorithms for unsupervised induction and tagging of nominal word senses based on corpora. Our main contribution is the optimization of the free parameters of those algorithms and their evaluation against publicly available gold standards. We present a thorough evaluation comprising supervised and unsupervised modes, and both lexical-sample and all-words tasks. The results show that, in spite of the information loss inherent in mapping the induced senses to the gold standard, the optimization of parameters based on a small sample of nouns carries over to all nouns, performing close to supervised systems in the lexical-sample task and yielding the second-best WSD system for the Senseval-3 all-words task.
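As a rough illustration of graph-based sense induction, the sketch below builds a co-occurrence graph around a target word and extracts non-adjacent, high-degree hubs as induced senses, HyperLex-style. The corpus and the hub criterion are simplified assumptions, not the paper's exact algorithms or parameter settings.

```python
# Toy HyperLex-style sketch: induce senses of "bank" as non-adjacent,
# high-degree hubs in a co-occurrence graph built from its contexts.
from collections import Counter
from itertools import combinations

contexts = [
    ["bank", "money", "loan"],
    ["bank", "money", "deposit"],
    ["bank", "river", "water"],
    ["bank", "river", "fishing"],
]

cooc = Counter()     # co-occurrence counts among the target's neighbours
degree = Counter()   # in how many contexts each neighbour appears
for ctx in contexts:
    words = sorted(set(w for w in ctx if w != "bank"))
    cooc.update(combinations(words, 2))
    degree.update(words)

# Greedy hub selection: accept a word only if it never co-occurs with an
# already chosen hub, so each hub anchors a distinct induced sense.
hubs = []
for w, _ in degree.most_common():
    if all(cooc[tuple(sorted((w, h)))] == 0 for h in hubs):
        hubs.append(w)

# Tag each context with the induced sense whose hub it contains.
for ctx in contexts:
    hub = next((h for h in hubs if h in ctx), None)
    print(ctx, "->", f"bank<{hub}>")
```

Evaluating against a gold standard then requires mapping each induced sense (hub) to the gold sense it most often co-occurs with, which is where the information loss mentioned above arises.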
North American Chapter of the Association for Computational Linguistics | 2009
Eneko Agirre; Oier Lopez de Lacalle; Christiane Fellbaum; Andrea Marchetti; Antonio Toral; Piek Vossen
Domain portability and adaptation of NLP components and Word Sense Disambiguation systems present new challenges. The difficulties supervised systems encounter when adapting may change the way we assess the strengths and weaknesses of supervised and knowledge-based WSD systems. Unfortunately, all existing evaluation datasets for specific domains are lexical-sample corpora. This task presented all-words datasets on the environment domain for WSD in four languages (Chinese, Dutch, English, Italian). 11 teams participated, with supervised and knowledge-based systems, mainly on the English dataset. The results show that in all languages the participants were able to beat the most frequent sense heuristic as estimated from general corpora. The most successful approaches used some sort of supervision in the form of hand-tagged examples from the domain.
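For context, the most frequent sense heuristic that participants had to beat simply predicts, for every occurrence of a word, the sense tagged most often in a general corpus, ignoring the context entirely. A minimal sketch with invented counts:

```python
# Minimal most-frequent-sense (MFS) baseline sketch; the tagged general
# corpus below is invented.
from collections import Counter

general_corpus = [("bank", "bank#1"), ("bank", "bank#1"), ("bank", "bank#2")]

counts = {}
for word, sense in general_corpus:
    counts.setdefault(word, Counter())[sense] += 1
mfs = {word: c.most_common(1)[0][0] for word, c in counts.items()}

# Every occurrence of "bank" in the domain text gets the same answer.
print(mfs["bank"])  # bank#1
```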
Conference on Information and Knowledge Management | 2011
Roberto Navigli; Stefano Faralli; Aitor Soroa; Oier Lopez de Lacalle; Eneko Agirre
In this paper we present a novel approach to learning semantic models for multiple domains, which we use to categorize Wikipedia pages and to perform domain Word Sense Disambiguation (WSD). In order to learn a semantic model for each domain we first extract relevant terms from the texts in the domain and then use these terms to initialize a random walk over the WordNet graph. Given an input text, we check the semantic models, choose the appropriate domain for that text and use the best-matching model to perform WSD. Our results show considerable improvements on text categorization and domain WSD tasks.
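A schematic sketch of the routing step: a lightweight model of salient terms per domain, and an input text sent to the best-matching model. The texts, term weighting, and similarity score below are toy assumptions; the paper instead uses the extracted terms to initialize a random walk over the WordNet graph.

```python
# Schematic sketch: per-domain term models and best-matching-model selection.
from collections import Counter

domain_texts = {
    "finance": "bank loan interest deposit money market bank credit",
    "nature":  "river bank water fish slope erosion bank stream",
}

# Model per domain: relative frequencies of its most salient terms.
models = {}
for dom, text in domain_texts.items():
    counts = Counter(text.split())
    total = sum(counts.values())
    models[dom] = {w: c / total for w, c in counts.most_common(5)}

def best_domain(text):
    """Score each domain model against the input text; return the winner."""
    words = Counter(text.split())
    scores = {dom: sum(model.get(w, 0.0) * n for w, n in words.items())
              for dom, model in models.items()}
    return max(scores, key=scores.get)

print(best_domain("the deposit earned interest at the bank"))  # finance
```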
Meeting of the Association for Computational Linguistics | 2007
Eneko Agirre; Oier Lopez de Lacalle
This work describes the University of the Basque Country system (UBC-ALM) for the lexical-sample and all-words WSD subtasks of SemEval-2007 task 17, where it ranked second and fifth, respectively. The system is based on a combination of k-Nearest Neighbor classifiers, with each classifier learning from a distinct set of features: local features (syntactic features, collocations), topical features (bag-of-words, domain information), and latent features learned from a reduced space obtained with Singular Value Decomposition.
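A toy sketch of the combination idea: one k-NN classifier per feature view, with a majority vote across views. The features, the Jaccard similarity, and k are invented, and the latent SVD view is omitted here for brevity.

```python
# Toy sketch: per-view k-NN classifiers combined by majority vote.
from collections import Counter

def knn(train, query, k=1):
    """train: list of (feature_set, sense) pairs; query: a feature set."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    top = sorted(train, key=lambda ex: jaccard(ex[0], query), reverse=True)[:k]
    return Counter(sense for _, sense in top).most_common(1)[0][0]

# Two views of the same labelled examples for the target word "bank".
local_view = [                                   # syntax / collocations
    ({"w-1=the", "w+1=loan"}, "bank#1"),
    ({"w-1=the", "w+1=account"}, "bank#1"),
    ({"w-1=river", "w+1=was"}, "bank#2"),
]
topical_view = [                                 # bag-of-words context
    ({"money", "loan"}, "bank#1"),
    ({"money", "account"}, "bank#1"),
    ({"river", "water"}, "bank#2"),
]

# Each view classifies the new instance independently; the votes are combined.
votes = [
    knn(local_view, {"w-1=the", "w+1=loan"}),
    knn(topical_view, {"money", "credit"}),
]
print(Counter(votes).most_common(1)[0][0])  # bank#1
```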
Journal of Artificial Intelligence Research | 2008
David Martinez; Oier Lopez de Lacalle; Eneko Agirre
This article focuses on Word Sense Disambiguation (WSD), a Natural Language Processing task thought to be important for many Language Technology applications, such as Information Retrieval, Information Extraction, or Machine Translation. One of the main issues preventing the deployment of WSD technology is the lack of training examples for Machine Learning systems, also known as the Knowledge Acquisition Bottleneck. A method which has been shown to work for small samples of words is the automatic acquisition of examples. We have previously shown that one of the most promising example acquisition methods scales up and produces a freely available database of 150 million examples from Web snippets for all polysemous nouns in WordNet. This paper focuses on the issues that arise when using those examples, alone or in addition to manually tagged examples, to train a supervised WSD system for all nouns. The extensive evaluation on both lexical-sample and all-words Senseval benchmarks shows that we are able to improve over commonly used baselines and to achieve top-rank performance. Careful use of the prior distributions of the senses proved to be a crucial factor.
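A minimal sketch of the role of sense priors: rescaling the raw scores of a classifier trained on automatically acquired web examples by a prior sense distribution, so that skewed real-world sense frequencies are respected. All numbers below are invented.

```python
# Minimal sketch of applying a sense prior to classifier scores.

raw_scores = {"bank#1": 0.48, "bank#2": 0.52}  # classifier output, near-uniform
prior = {"bank#1": 0.80, "bank#2": 0.20}       # sense counts from hand-tagged data

weighted = {s: raw_scores[s] * prior[s] for s in raw_scores}
total = sum(weighted.values())
posterior = {s: v / total for s, v in weighted.items()}

print(max(posterior, key=posterior.get), posterior)
# bank#1 wins once the skewed prior is taken into account
```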
International Conference on Computational Linguistics | 2008
Eneko Agirre; Oier Lopez de Lacalle
In this paper we explore robustness and domain adaptation issues for Word Sense Disambiguation (WSD) using Singular Value Decomposition (SVD) and unlabeled data. We focus on the semi-supervised domain adaptation scenario, where we train on the source corpus and test on the target corpus, and try to improve results using unlabeled data. Our method yields up to 16.3% error reduction compared to state-of-the-art systems, being the first to report successful semi-supervised domain adaptation. Surprisingly, the improvement comes from the use of unlabeled data from the source corpus, and not from the target corpus, meaning that we get robustness rather than domain adaptation. In addition, we study the behavior of our system on the target domain.
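A minimal sketch of the SVD step under this setup: the latent space is learned from labelled and unlabeled examples together, and the labelled examples are then re-expressed in it before training. The matrix and dimensionality below are toy assumptions.

```python
# Minimal sketch: SVD over labelled + unlabeled examples, then project the
# labelled examples into the reduced space. Requires numpy.
import numpy as np

# Rows = examples (first two labelled, rest unlabeled), columns = features.
X = np.array([
    [1, 1, 0, 0],   # labelled: "money loan"
    [0, 0, 1, 1],   # labelled: "river water"
    [1, 0, 1, 0],   # unlabeled examples enrich the co-occurrence evidence
    [0, 1, 0, 1],
    [1, 1, 1, 0],
], dtype=float)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                           # keep the top-k latent dimensions
latent = U[:, :k] * S[:k]       # every example in the reduced space

labelled = latent[:2]           # these rows feed the supervised classifier
print(labelled)
```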
Meeting of the Association for Computational Linguistics | 2009
Eneko Agirre; Oier Lopez de Lacalle
The lack of positive results on supervised domain adaptation for WSD has cast some doubt on the utility of hand-tagging general corpora and thus developing generic supervised WSD systems. In this paper we show for the first time that our WSD system, trained on a general source corpus (BNC) and the target corpus, obtains up to 22% error reduction when compared to a system trained on the target corpus alone. In addition, we show that as little as 40% of the target corpus (when supplemented with the source corpus) is sufficient to obtain the same results as training on the full target data. The key to success is the use of unlabeled data with SVD, a combination of kernels, and SVM.
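A sketch of the kernel-combination idea with scikit-learn (a tooling assumption; the paper names no library): a single SVM trained on the sum of a linear kernel over the original features and one over SVD-reduced features. Data and labels are invented.

```python
# Sketch: summed linear kernels (original + SVD view) fed to one SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.random((4, 6)) * 0.4,          # class 0: low feature values
               rng.random((4, 6)) * 0.4 + 0.6])   # class 1: high feature values
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z = U[:, :2] * S[:2]            # SVD-reduced view of the same examples

K = X @ X.T + Z @ Z.T           # summed linear kernels, one Gram matrix

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K))           # predictions for the training examples
```

Summing Gram matrices keeps the result a valid kernel, so the two feature views can be weighted and combined without concatenating their (very differently scaled) feature vectors.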
Cross-Language Evaluation Forum | 2007
Eneko Agirre; Bernardo Magnini; Oier Lopez de Lacalle; Arantxa Otegi; German Rigau; Piek Vossen
This paper presents a first attempt at an application-driven evaluation exercise for WSD. We used a CLIR testbed from the Cross-Language Evaluation Forum. The expansion, indexing, and retrieval strategies were fixed by the organizers. The participants had to return both the topics and documents tagged with WordNet 1.6 word senses. The organization provided training data in the form of a pre-processed SemCor which could be readily used by participants. The task had two participants, and the organizers also provided an in-house WSD system for comparison.
Virtual Systems and Multimedia | 2012
Kate Fernie; Paul D. Clough; Paula Goodale; Mark M. Hall; Eneko Agirre; Oier Lopez de Lacalle; Runar Bergheim
Digitisation of cultural heritage means that a significant amount of material is now available through online digital library portals. However, the vast quantity of cultural heritage material can also be overwhelming for many users, who lack knowledge of the collections, subject knowledge, and the specialist language used to describe this content. Search portals often provide little or no guidance on how to find and interpret this information. The situation is very different in museums and galleries, where collections are organized in exhibitions which offer themes and stories that visitors can explore. The PATHS project, which is funded under the European Commission's FP7 programme, is developing a system that explores the familiar metaphor of a trail (pathway) to enhance the discovery and use of the content made available in digital libraries. This paper reports on the findings of the user requirements analysis and the specifications for the first prototype of the PATHS system, which is based on content from Europeana and the Alinari Archives.