Publication


Featured research published by Arantxa Otegi.


Cross-Language Evaluation Forum | 2009

CLEF 2009 ad hoc track overview: robust-WSD task

Eneko Agirre; Giorgio Maria Di Nunzio; Thomas Mandl; Arantxa Otegi

The Robust-WSD task at CLEF 2009 aims at exploring the contribution of Word Sense Disambiguation to monolingual and multilingual Information Retrieval. The organizers of the task provide documents and topics which have been automatically tagged with Word Senses from WordNet using several state-of-the-art Word Sense Disambiguation systems. The Robust-WSD exercise follows the same design as in 2008. It uses two languages often used in previous CLEF campaigns (English, Spanish). Documents were in English, and topics in both English and Spanish. The document collections are based on the widely used LA94 and GH95 news collections. All instructions and datasets required to replicate the experiment are available from the organizers' website (http://ixa2.si.ehu.es/clirwsd/). The results show that some top-scoring systems improve their IR and CLIR results with the use of WSD tags, but the best scoring runs do not use WSD.


Knowledge and Information Systems | 2015

Using knowledge-based relatedness for information retrieval

Arantxa Otegi; Xabier Arregi; Olatz Ansa; Eneko Agirre

Traditional information retrieval (IR) systems use keywords to index and retrieve documents. The limitations of keywords have been recognized since the early days, especially when different but closely related words are used in the query and the relevant document. Query expansion techniques like pseudo-relevance feedback (PRF) and document clustering techniques rely on the target document set in order to bridge the gap between those words. This paper explores the use of knowledge-based semantic relatedness techniques to overcome the vocabulary mismatch between the query and documents, both in IR and in Passage Retrieval for question answering. We performed query expansion and document expansion using WordNet, with positive effects over a language modeling baseline on three datasets, and over PRF on two of those datasets. Our analysis shows that our models and PRF are complementary, in that PRF is better for easy queries and our models are stronger for difficult queries, and that our models generalize better to other collections, being more robust to parameter adjustments. In addition, we show that our method has a positive impact in an end-to-end question answering system for Basque and that it can be readily applied to other knowledge bases, as our good results using Wikipedia show, paving the way for the use of other knowledge structures such as medical ontologies and linked data repositories.


Journal of Biomedical Informatics | 2014

Improving search over Electronic Health Records using UMLS-based query expansion through random walks

David Martinez; Arantxa Otegi; Aitor Soroa; Eneko Agirre

OBJECTIVE: Most of the information in Electronic Health Records (EHRs) is represented in free textual form. Practitioners searching EHRs need to phrase their queries carefully, as the record might use synonyms or other related words. In this paper we show that an automatic query expansion method based on the Unified Medical Language System (UMLS) Metathesaurus improves the results of a robust baseline when searching EHRs. MATERIALS AND METHODS: The method uses a graph representation of the lexical units, concepts and relations in the UMLS Metathesaurus. It is based on random walks over the graph, which start on the query terms. Random walks are a well-studied discipline in both Web and Knowledge Base datasets. RESULTS: Our experiments over the TREC Medical Record track show improvements in both the 2011 and 2012 datasets over a strong baseline. DISCUSSION: Our analysis shows that the success of our method is due to the automatic expansion of the query with extra terms, even when they are not directly related in the UMLS Metathesaurus. The terms added in the expansion go beyond simple synonyms, and also add other kinds of topically related terms. CONCLUSIONS: Expansion of queries using related terms in the UMLS Metathesaurus beyond synonymy is an effective way to overcome the gap between query and document vocabularies when searching for patient cohorts.
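The random-walk expansion described in this abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes that the walk is a Personalized PageRank computed by power iteration, and the graph, the term names, and the functions `build_undirected`, `personalized_pagerank` and `expand_query` are all hypothetical stand-ins for the much larger UMLS Metathesaurus graph.

```python
from collections import defaultdict


def build_undirected(edges):
    """Build an adjacency map {node: set(neighbours)} from undirected edges."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration where the walker teleports back only to the seed nodes."""
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                # Dangling node: send its mass back to the seeds.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * teleport[n]
                continue
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                new_rank[neighbour] += share
        rank = new_rank
    return rank


def expand_query(graph, query_terms, top_k=3):
    """Return the top_k non-query nodes most strongly reached from the query terms."""
    seeds = {t for t in query_terms if t in graph}
    if not seeds:
        return []
    rank = personalized_pagerank(graph, seeds)
    candidates = [(n, score) for n, score in rank.items() if n not in seeds]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [n for n, _ in candidates[:top_k]]


# Hypothetical toy graph; the terms and links are illustrative, not UMLS content.
TOY_GRAPH = build_undirected([
    ("heart attack", "myocardial infarction"),
    ("myocardial infarction", "troponin"),
    ("myocardial infarction", "chest pain"),
    ("troponin", "blood test"),
    ("fracture", "cast"),
])

expansion = expand_query(TOY_GRAPH, ["heart attack"])
```

In this toy graph the expansion terms include nodes that are not direct neighbours of the query term, which mirrors the abstract's observation that useful expansion terms need not be directly related to the query in the knowledge base. Nodes in the disconnected component receive no rank and are never proposed.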


Cross-Language Evaluation Forum | 2007

SemEval-2007 Task 01: Evaluating WSD on Cross-Language Information Retrieval

Eneko Agirre; Bernardo Magnini; Oier Lopez de Lacalle; Arantxa Otegi; German Rigau; Piek Vossen

This paper presents a first attempt at an application-driven evaluation exercise of WSD. We used a CLIR testbed from the Cross-Language Evaluation Forum. The expansion, indexing and retrieval strategies were fixed by the organizers. The participants had to return both the topics and documents tagged with WordNet 1.6 word senses. The organization provided training data in the form of a pre-processed Semcor which could be readily used by participants. The task had two participants, and the organizers also provided an in-house WSD system for comparison.


Cross-Language Evaluation Forum | 2008

IXA at CLEF 2008 robust-WSD task: using word sense disambiguation for (cross lingual) information retrieval

Eneko Agirre; Arantxa Otegi; German Rigau

This paper describes experiments for the CLEF 2008 Robust-WSD task, both for the monolingual (English) and the bilingual (Spanish to English) subtasks. We tried several query and document expansion and translation strategies, with and without the use of the word sense disambiguation results provided by the organizers. All expansions and translations were done using the English and Spanish wordnets as provided by the organizers and no other resource was used. We used Indri as the search engine, which we tuned on the training data. Our main goal was to improve (Cross Lingual) Information Retrieval results using WSD information, and we attained improvements in both the monolingual and bilingual subtasks, with statistically significant differences in the latter. Our best systems ranked 4th overall and 3rd overall in the monolingual and bilingual subtasks, respectively.


Cross-Language Evaluation Forum | 2009

Elhuyar-IXA: semantic relatedness and cross-lingual passage retrieval

Eneko Agirre; Olatz Ansa; Xabier Arregi; Maddalen Lopez de Lacalle; Arantxa Otegi; Xabier Saralegi; Hugo Zaragoza

This article describes the participation of the joint Elhuyar-IXA group in the ResPubliQA exercise at QA@CLEF. In particular, we participated in the English-English monolingual task and in the Basque-English crosslingual one. Our focus has been threefold: (1) to check to what extent information retrieval (IR) can achieve good results in passage retrieval without question analysis and answer validation, (2) to check Machine Readable Dictionary (MRD) techniques for the Basque to English retrieval when faced with the lack of parallel corpora for Basque in this domain, and (3) to check the contribution of semantic relatedness based on WordNet to expand the passages to related words. Our results show that IR provides good results in the monolingual task, that our crosslingual system performs worse than the monolingual runs, and that semantic relatedness improves the results in both tasks (by 6 and 2 points, respectively).


International Conference on Theory and Practice of Digital Libraries | 2013

Implementing Recommendations in the PATHS System

Paul D. Clough; Arantxa Otegi; Eneko Agirre; Mark M. Hall

In this paper we describe the design and implementation of non-personalized recommendations in the PATHS system. This system allows users to explore items from Europeana in new ways. Recommendations of the type “people who viewed this item also viewed this item” are powered by pairs of viewed items mined from Europeana. However, due to limited usage data only 10.3 % of items in the PATHS dataset have recommendations (4.3 % of item pairs visited more than once). Therefore, “related items”, a form of content-based recommendation, are offered to users based on identifying similar items. We discuss some of the problems with implementing recommendations and highlight areas for future work in the PATHS project.
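As a rough illustration of the "people who viewed this item also viewed" idea, the sketch below counts pairs of items viewed together and measures how many items end up with at least one recommendation. It is a minimal sketch under stated assumptions, not the PATHS implementation: the abstract does not say how pairs were mined, so the session format, the item identifiers, and the functions `count_coviewed_pairs`, `build_recommendations` and `coverage` are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_coviewed_pairs(sessions):
    """Count how many sessions contain each unordered pair of items."""
    pair_counts = Counter()
    for viewed in sessions:
        # Deduplicate and sort so each pair has one canonical form.
        for a, b in combinations(sorted(set(viewed)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def build_recommendations(pair_counts, min_count=1):
    """Map each item to the items co-viewed with it, most frequent first."""
    related = defaultdict(list)
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            related[a].append((b, count))
            related[b].append((a, count))
    return {
        item: [other for other, _ in sorted(pairs, key=lambda p: (-p[1], p[0]))]
        for item, pairs in related.items()
    }


def coverage(recommendations, all_items):
    """Fraction of items in the collection that have at least one recommendation."""
    return sum(1 for item in all_items if recommendations.get(item)) / len(all_items)


# Hypothetical usage data: each inner list is one user's viewing session.
SESSIONS = [["a", "b", "c"], ["a", "b"], ["d"], ["e"]]
ALL_ITEMS = ["a", "b", "c", "d", "e"]

pairs = count_coviewed_pairs(SESSIONS)
recs = build_recommendations(pairs)
strict_recs = build_recommendations(pairs, min_count=2)
```

With sparse usage data, many items never appear in a pair, so coverage stays low, and raising `min_count` lowers it further. This is the kind of gap that the content-based "related items" fallback described above is meant to fill.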


Cross-Language Evaluation Forum | 2009

Using semantic relatedness and word sense disambiguation for (CL)IR

Eneko Agirre; Arantxa Otegi; Hugo Zaragoza

In this paper we report the experiments for the CLEF 2009 Robust-WSD task, both for the monolingual (English) and the bilingual (Spanish to English) subtasks. Our main experimentation strategy consisted of expanding and translating the documents, based on the related concepts of the documents. For that purpose we applied a state-of-the-art semantic relatedness method based on WordNet. The relatedness measure was used with and without WSD information. Even though we obtained positive results in our training and development datasets, we did not manage to improve over the baseline in the monolingual case. The improvement over the baseline in the bilingual case is marginal. We plan further work on this technique, which has attained positive results in the passage retrieval for question answering task at CLEF (ResPubliQA).


ACM/IEEE Joint Conference on Digital Libraries | 2014

Personalised PageRank for making recommendations in digital cultural heritage collections

Arantxa Otegi; Eneko Agirre; Paul D. Clough

In this paper we describe the use of Personalised PageRank (PPR) to generate recommendations from a large collection of cultural heritage items. Various methods for computing item-to-item similarities are investigated, together with representing the collection as a network over which random walks can be taken. The network can represent similarity between item metadata, item co-occurrences in search logs, and the similarity of items based on linking them to Wikipedia articles and categories. To evaluate the use of PPR, search logs from Europeana are used to simulate user interactions. PPR on each information source is compared to a standard retrieval-based baseline, resulting in higher performance.


International Conference on Theory and Practice of Digital Libraries | 2013

PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enrichment

Eneko Agirre; Ander Barrena; Kike Fernandez; Esther Miranda; Arantxa Otegi; Aitor Soroa

Large amounts of cultural heritage material are nowadays available through online digital library portals. Most of these cultural items have short descriptions and lack rich contextual information. The PATHS project has developed experimental enrichment services. As a proof of concept, this paper presents a web service prototype which allows independent content providers to enrich cultural heritage items with a subset of the full functionality: links to related items in the collection and links to related Wikipedia articles. In the future we plan to provide more advanced functionality, as available offline for PATHS.

Collaboration


Top Co-Authors

Eneko Agirre
University of the Basque Country

Xabier Arregi
University of the Basque Country

Olatz Ansa
University of the Basque Country

Aitor Soroa
University of the Basque Country

Ander Soraluze
University of the Basque Country

German Rigau
University of the Basque Country

Maddalen Lopez de Lacalle
University of the Basque Country