Daniel Ferrés
Polytechnic University of Catalonia
Publication
Featured research published by Daniel Ferrés.
meeting of the association for computational linguistics | 2007
Daniel Ferrés; Horacio Rodríguez
This paper describes our experiments on Textual Entailment in the context of the Third Pascal Recognising Textual Entailment (RTE-3) Evaluation Challenge. Our system uses a Machine Learning approach with Support Vector Machines and AdaBoost to deal with the RTE challenge. We perform a lexical, syntactic, and semantic analysis of the entailment pairs. From this information we compute a set of semantic-based distances between sentences. The results look promising, especially for the QA entailment task.
MLQA '06 Proceedings of the Workshop on Multilingual Question Answering | 2006
Daniel Ferrés; Horacio Rodríguez
This paper describes an approach to adapting an existing multilingual Open-Domain Question Answering (ODQA) system for factoid questions to a Restricted Domain, the Geographical Domain. The adaptation involved modifying several components of the ODQA system: Question Processing, Passage Retrieval, and Answer Extraction. The new system uses external resources such as the GNS Gazetteer for Named Entity (NE) Classification, and Wikipedia or Google to obtain relevant documents for this domain. The system focuses on a Geographical Scope: given a region or country and a language, we can semi-automatically obtain multilingual geographical resources (e.g. gazetteers, trigger words, groups of place names, etc.) for this scope. The system has been trained and evaluated for Spanish in the scope of Spanish Geography. The evaluation reveals that the use of scope-based Geographical resources is a good approach to multilingual Geographical Domain Question Answering.
cross-language evaluation forum | 2005
Daniel Ferrés; Samir Kanaan; Alicia Ageno; Edgar González; Horacio Rodríguez; Jordi Turmo
This paper describes the TALP-QA system in the context of the CLEF 2005 Spanish Monolingual Question Answering (QA) evaluation task. TALP-QA is a multilingual open-domain QA system that processes both factoid (normal and temporally restricted) and definition questions. The approach to factoid questions relies on in-depth NLP tools and resources to build a semantic representation of the information. Answers to definition questions are selected from the phrases that match a pattern from a manually constructed set of definitional patterns.
cross language evaluation forum | 2004
Daniel Ferrés; Samir Kanaan; Alicia Ageno; Edgar González; Horacio Rodríguez; Mihai Surdeanu; Jordi Turmo
This paper describes TALP-QA, a multilingual open-domain Question Answering (QA) system that processes both factoid and definition questions. The system is described and evaluated in the context of our participation in the CLEF 2004 Spanish Monolingual QA task. Our approach to factoid questions is to build a semantic representation of the questions and of the sentences in the passages retrieved for each question. A set of Semantic Constraints (SCs) is extracted for each question. An answer extraction algorithm extracts and ranks sentences that satisfy the SCs of the question. If no match is possible, the algorithm relaxes the SCs structurally (removing constraints) and/or hierarchically (abstracting the constraints using a taxonomy). Answers to definition questions are generated by selecting the text fragment with the highest density of the terms most frequently related to the question's target (the Named Entity (NE) that appears in the question) throughout the corpus.
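The structural relaxation described in this abstract can be illustrated with a minimal sketch. It is not the TALP-QA implementation: constraints are modelled here as hypothetical (predicate, argument) pairs, each sentence as a set of such pairs, and only structural relaxation (dropping constraints) is shown. Hierarchical relaxation through a taxonomy is left out.

```python
def extract_candidates(constraints, sentences):
    """Return (n_dropped, sentence_ids) for the least relaxed match.

    constraints: iterable of hashable constraints (hypothetical representation).
    sentences:   dict mapping a sentence id to the set of constraints it satisfies.
    """
    constraints = set(constraints)
    # Start with every constraint required, then allow one more to be dropped
    # on each pass until some sentence matches.
    for dropped in range(len(constraints)):
        required = len(constraints) - dropped
        hits = [
            sid
            for sid, facts in sentences.items()
            if len(constraints & facts) >= required
        ]
        if hits:
            return dropped, sorted(hits)
    # No sentence satisfies even a single constraint.
    return None, []


# Illustrative data only; the predicates below are invented for the example.
question_constraints = [
    ("capital_of", "France"),
    ("type", "city"),
    ("date", "1999"),
]
candidate_sentences = {
    "s1": {("capital_of", "France"), ("type", "city")},
    "s2": {("type", "city")},
}
result = extract_candidates(question_constraints, candidate_sentences)
```

In this toy case no sentence satisfies all three constraints, so one constraint is dropped before a match is found.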
cross language evaluation forum | 2005
Daniel Ferrés; Alicia Ageno; Horacio Rodríguez
This paper describes the GeoTALP-IR system, a Geographical Information Retrieval (GIR) system. The system is described and evaluated in the context of our participation in the CLEF 2005 GeoCLEF Monolingual English task. The GIR system is based on Lucene and uses a modified version of the Passage Retrieval module of the TALP Question Answering (QA) system presented at the CLEF 2004 and TREC 2004 QA evaluation tasks. We designed a Keyword Selection algorithm based on a Linguistic and Geographical Analysis of the topics. A Geographical Thesaurus (GT) has been built using a set of publicly available Geographical Gazetteers and a Geographical Ontology. Our experiments show that the use of a Geographical Thesaurus for Geographical Indexing and Retrieval has improved the performance of our GIR system.
cross language evaluation forum | 2008
Daniel Ferrés; Horacio Rodríguez
This paper describes and analyzes the results of our experiments in Geographical Information Retrieval (GIR) in the context of our participation in the CLEF 2007 GeoCLEF Monolingual English task. Our system uses Linguistic and Geographical Analysis to process topics and document collections. Geographical Document Retrieval is performed with Terrier and Geographical Knowledge Bases. Our experiments show that Geographical Knowledge Bases can be used to improve the retrieval results of the state-of-the-art Terrier IR system by filtering out documents that are not geographically relevant.
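The filtering idea in this abstract can be sketched as a post-processing step over a ranked list. The sketch below is only an illustration under stated assumptions: it does not use Terrier, and the function name, the `doc_toponyms` mapping, and the `scope_toponyms` set are hypothetical stand-ins for whatever the Geographical Knowledge Bases would provide.

```python
def filter_by_geography(ranked_docs, doc_toponyms, scope_toponyms):
    """Keep only documents that mention a place name within the topic's scope.

    ranked_docs:    list of (doc_id, score) pairs, best first.
    doc_toponyms:   dict mapping doc_id to the place names found in that document.
    scope_toponyms: place names considered relevant to the topic (assumed input).
    """
    scope = {name.lower() for name in scope_toponyms}
    kept = []
    for doc_id, score in ranked_docs:
        mentioned = {name.lower() for name in doc_toponyms.get(doc_id, ())}
        # A document survives if it shares at least one toponym with the scope.
        if scope & mentioned:
            kept.append((doc_id, score))
    return kept


# Illustrative data only.
ranking = [("d1", 3.2), ("d2", 2.9), ("d3", 1.4)]
toponyms = {"d1": ["Madrid"], "d2": ["Tokyo"], "d3": ["barcelona", "Paris"]}
filtered = filter_by_geography(ranking, toponyms, {"Madrid", "Barcelona"})
```

The original ranking order and scores are preserved; only documents without an in-scope toponym are removed.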
cross language evaluation forum | 2006
Daniel Ferrés; Horacio Rodríguez
This paper describes our experiments in Geographical Information Retrieval (GIR) in the context of our participation in the CLEF 2006 GeoCLEF Monolingual English task. Our system, named TALP-GeoIR, follows an architecture similar to that of the GeoTALP-IR system presented at GeoCLEF 2005, with some changes in the retrieval modes and the Geographical Knowledge Base (KB). The system has four phases performed sequentially: i) a Keyword Selection algorithm based on a linguistic and geographical analysis of the topics, ii) a geographical retrieval with Lucene, iii) a document retrieval task with the JIRS Passage Retrieval (PR) software, and iv) a Document Ranking phase. A Geographical KB has been built using a set of publicly available geographical gazetteers and the Alexandria Digital Library (ADL) Feature Type Thesaurus. In our experiments we have used JIRS, a state-of-the-art PR system for Question Answering, for the GIR task. We have also experimented with an approach using both JIRS and Lucene. In this approach JIRS was used only for textual document retrieval and Lucene was used to detect the geographically relevant documents. These experiments show that using JIRS alone yields better results than combining JIRS and Lucene.
string processing and information retrieval | 2015
Daniel Ferrés; Horacio Rodríguez
This paper describes and evaluates the use of Geographical Knowledge Re-Ranking, Linguistic Processing, and Query Expansion techniques to improve Geographical Information Retrieval effectiveness. Geographical Knowledge Re-Ranking is performed with Geographical Gazetteers and conservative Toponym Disambiguation techniques that boost the ranking of the geographically relevant documents retrieved by standard state-of-the-art Information Retrieval algorithms. Linguistic Processing is performed in two ways: (1) Part-of-Speech tagging and Named Entity Recognition and Classification are applied to analyze the text collections and topics to detect toponyms; (2) Stemming (Porter's algorithm) and Lemmatization are also applied in combination with default stopword filtering. The Query Expansion methods tested are the Bose-Einstein (Bo1) and Kullback-Leibler term weighting models. The experiments have been performed with the English Monolingual test collections of the GeoCLEF evaluations from years 2005, 2006, 2007, and 2008 using the TF-IDF, BM25, and InL2 Information Retrieval algorithms over unprocessed texts as baselines. The experiments have been performed with each GeoCLEF test collection (25 topics per evaluation) separately and with the fusion of all these collections (100 topics). The results of separately evaluating Geographical Knowledge Re-Ranking, Linguistic Processing (lemmatization, stemming, and the combination of both), and Query Expansion with the fusion of all the topics show that all these processes improve the Mean Average Precision (MAP) and R-Precision effectiveness measures in all the experiments and show statistical significance over the baselines in most of them. The best results in MAP and R-Precision are obtained with the InL2 algorithm using the following techniques: Geographical Knowledge Re-Ranking, Lemmatization with Stemming, and Kullback-Leibler Query Expansion. Some configurations with Geographical Knowledge Re-Ranking, Linguistic Processing, and Query Expansion have improved on the MAP of the best official results at the GeoCLEF evaluations of 2005, 2006, and 2007.
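For readers unfamiliar with the two effectiveness measures named in this abstract, the following sketch computes them using their standard definitions for binary relevance judgements. It is not the evaluation software used in the paper, and the document ids and judgements are invented for the example.

```python
def average_precision(ranking, relevant):
    """Average of precision@k over the ranks k at which a relevant document appears,
    divided by the total number of relevant documents for the topic."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc_id in ranking[:r] if doc_id in relevant) / r


def mean_average_precision(topics):
    """Mean of average precision over a list of (ranking, relevant_set) topics."""
    return sum(average_precision(rk, rel) for rk, rel in topics) / len(topics)


# Two toy topics (illustrative data only).
topic_a = (["d1", "d2", "d3", "d4"], {"d1", "d3"})
topic_b = (["d5", "d6"], {"d6"})
map_score = mean_average_precision([topic_a, topic_b])
```

A re-ranking step that moves relevant documents upward raises the precision at the ranks where they appear, which is why both measures respond to it.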
cross language evaluation forum | 2008
Daniel Ferrés; Horacio Rodríguez
This paper describes our experiments and analysis of the results of our participation in the Geographical Query Parsing pilot-task for English at GeoCLEF 2007. The system uses deep linguistic analysis and Geographical Knowledge to perform the task.
cross language evaluation forum | 2008
Davide Buscaldi; José Manuel Perea Ortega; Paolo Rosso; L. Alfonso Ureña López; Daniel Ferrés; Horacio Rodríguez
In this paper we discuss the integration of different GIR systems by means of a fuzzy Borda method for result fusion. Two of the systems, the one by the Universidad Politécnica de Valencia and the one by the Universidad de Jaén, participated in the GeoCLEF task under the name TextMess. The proposed result fusion method takes as input the document lists returned by the different systems and returns a document list in which the documents are ranked according to the fuzzy Borda voting scheme. The results show that the fusion method can improve on the results of the component systems, although the fusion is not optimal, because it is effective only if the components return similar sets of relevant documents.
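The fusion step can be sketched with one common formulation of the fuzzy Borda count. The abstract does not give the exact weighting, so the details here are assumptions: each system assigns a non-negative weight to every document (0 if it did not retrieve it), the preference of document i over j for one system is w_i / (w_i + w_j), and a system credits i with the sum of its preferences that exceed 0.5.

```python
def fuzzy_borda(runs):
    """Fuse several runs with a fuzzy Borda count.

    runs: list of dicts mapping doc_id to a non-negative weight (assumed input
          format, e.g. a normalised retrieval score).
    Returns (ranking, totals): doc ids sorted by fused score, and the scores.
    """
    docs = sorted({doc_id for run in runs for doc_id in run})
    totals = {doc_id: 0.0 for doc_id in docs}
    for run in runs:
        for i in docs:
            w_i = run.get(i, 0.0)
            for j in docs:
                if i == j:
                    continue
                w_j = run.get(j, 0.0)
                if w_i + w_j == 0:
                    # Neither document was retrieved by this system: no preference.
                    continue
                preference = w_i / (w_i + w_j)
                # Only strict preferences (above 0.5) count toward the score.
                if preference > 0.5:
                    totals[i] += preference
    # Sort by descending score; ties are broken by doc id for determinism.
    ranking = sorted(docs, key=lambda d: (-totals[d], d))
    return ranking, totals


# Two toy systems with invented scores.
system_a = {"d1": 0.8, "d2": 0.2}
system_b = {"d2": 0.6, "d3": 0.4}
fused_ranking, fused_scores = fuzzy_borda([system_a, system_b])
```

In this toy example the document retrieved by both systems ends up first, which matches the abstract's observation that fusion helps mainly when the component systems return overlapping sets of documents.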