Enrique Alfonseca
Publications
Featured research published by Enrique Alfonseca.
north american chapter of the association for computational linguistics | 2009
Eneko Agirre; Enrique Alfonseca; Keith B. Hall; Jana Kravalova; Marius Pasca; Aitor Soroa
This paper presents and compares WordNet-based and distributional similarity approaches. The strengths and weaknesses of each approach regarding similarity and relatedness tasks are discussed, and a combination is presented. Each of our methods independently provides the best results in its class on the RG and WordSim353 datasets, and a supervised combination of them yields the best published results on all datasets. Finally, we pioneer cross-lingual similarity, showing that our methods are easily adapted for a cross-lingual task with minor losses.
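A minimal sketch of the two signal types combined here: a WordNet path-based score and a cosine score over distributional context vectors. The toy context counts and the equal-weight combination are illustrative assumptions, not the paper's supervised model or corpus statistics.

```python
# Sketch: combining a WordNet-based score with a distributional (cosine) score.
# Requires the NLTK WordNet corpus (nltk.download('wordnet')).
from math import sqrt
from nltk.corpus import wordnet as wn

def wordnet_similarity(w1, w2):
    """Best path similarity over all synset pairs (0 if no path is found)."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def cosine(v1, v2):
    """Cosine similarity between sparse count vectors stored as dicts."""
    dot = sum(v1[k] * v2.get(k, 0) for k in v1)
    norm = sqrt(sum(x * x for x in v1.values())) * sqrt(sum(x * x for x in v2.values()))
    return dot / norm if norm else 0.0

# Toy distributional contexts (word -> co-occurrence counts), standing in for
# the large-scale corpus statistics the paper uses.
contexts = {
    "car":        {"drive": 12, "road": 8, "engine": 5},
    "automobile": {"drive": 9,  "road": 6, "wheel": 4},
}

w1, w2 = "car", "automobile"
wn_score = wordnet_similarity(w1, w2)
dist_score = cosine(contexts[w1], contexts[w2])
# Simple unsupervised combination with equal weights; the paper learns the combination.
print(f"wordnet={wn_score:.2f} distributional={dist_score:.2f} "
      f"combined={(wn_score + dist_score) / 2:.2f}")
```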
empirical methods in natural language processing | 2015
Katja Filippova; Enrique Alfonseca; Carlos A. Colmenares; Lukasz Kaiser; Oriol Vinyals
We present an LSTM approach to deletion-based sentence compression where the task is to translate a sentence into a sequence of zeros and ones, corresponding to token deletion decisions. We demonstrate that even the most basic version of the system, which is given no syntactic information (no PoS or NE tags, or dependencies) or desired compression length, performs surprisingly well: around 30% of the compressions from a large test set could be regenerated. We compare the LSTM system with a competitive baseline which is trained on the same amount of data but is additionally provided with all kinds of linguistic features. In an experiment with human raters the LSTM-based model outperforms the baseline, achieving 4.5 in readability and 3.8 in informativeness.
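A minimal sketch of deletion-based compression framed as per-token binary labeling with an LSTM, in PyTorch. The vocabulary size, dimensions, and single-layer architecture are assumptions for illustration; the paper's model, embeddings, and training setup differ.

```python
# Sketch: per-token keep/delete decisions from an LSTM over word ids.
import torch
import torch.nn as nn

class DeletionLSTM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)   # two classes per token: delete (0) / keep (1)

    def forward(self, token_ids):
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden)               # (batch, seq_len, 2) logits

model = DeletionLSTM()
tokens = torch.randint(0, 10000, (1, 12))     # one 12-token sentence as word ids
logits = model(tokens)
keep_mask = logits.argmax(dim=-1)             # 0/1 decision per token (untrained here)
print(keep_mask.tolist())
```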
international acm sigir conference on research and development in information retrieval | 2010
Enrique Alfonseca; Marius Pasca; Enrique Robledo-Arnuncio
This paper presents a method for increasing the quality of automatically extracted instance attributes by exploiting weakly-supervised and unsupervised instance relatedness data. This data consists of (a) class labels for instances and (b) distributional similarity scores. The method organizes the text-derived data into a graph, and automatically propagates attributes among related instances, through random walks over the graph. Experiments on various graph topologies illustrate the advantage of the method over both the original attribute lists and a per-class attribute extractor, both in terms of the number of attributes extracted per instance and the accuracy of the top-ranked attributes.
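A small sketch of the propagation idea: attributes spread among related instances over a weighted graph, label-propagation style. The graph, edge weights, seed attributes, and damping constant are toy assumptions; the paper derives them from class labels and distributional similarity scores.

```python
# Sketch: propagate attribute scores among related instances via an iterative random walk.
ALPHA = 0.85        # probability mass given to neighbors vs. the instance's own seed attributes
ITERATIONS = 20

# Instance relatedness graph: node -> {neighbor: edge weight}.
graph = {
    "paris":  {"london": 0.9, "france": 0.3},
    "london": {"paris": 0.9},
    "france": {"paris": 0.3},
}
# Seed attributes extracted per instance: node -> {attribute: score}.
seeds = {
    "paris":  {"mayor": 1.0, "population": 0.8},
    "london": {"population": 1.0},
    "france": {"president": 1.0},
}

scores = {n: dict(seeds[n]) for n in graph}
for _ in range(ITERATIONS):
    new_scores = {}
    for node, neighbors in graph.items():
        total = sum(neighbors.values()) or 1.0
        propagated = {}
        for nb, w in neighbors.items():
            for attr, s in scores[nb].items():
                propagated[attr] = propagated.get(attr, 0.0) + ALPHA * (w / total) * s
        for attr, s in seeds[node].items():   # always keep the original extraction evidence
            propagated[attr] = propagated.get(attr, 0.0) + (1 - ALPHA) * s
        new_scores[node] = propagated
    scores = new_scores

for node, attrs in scores.items():
    print(node, sorted(attrs.items(), key=lambda kv: -kv[1])[:3])
```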
meeting of the association for computational linguistics | 2008
Enrique Alfonseca; Slaven Bilac; Stefan H. Pharies
Splitting compound words has proved to be useful in areas such as Machine Translation, Speech Recognition or Information Retrieval (IR). Furthermore, real-time IR systems (such as search engines) need to cope with noisy data, as user queries are sometimes written quickly and submitted without review. In this paper we apply a state-of-the-art procedure for German decompounding to other compounding languages, and we show that it is possible to have a single decompounding model that is applicable across languages.
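A minimal sketch of the kind of frequency-based compound splitter such work builds on: try every split point and accept the split whose parts are most frequent. The tiny lexicon and the single linking element ("s") are toy assumptions; the actual models are trained on large monolingual data and cover several compounding languages.

```python
# Sketch: greedy two-part compound splitting scored by part frequencies.
LEXICON = {"staub": 1000, "sauger": 800, "vertreter": 500, "staubsauger": 50}

def split_compound(word, min_len=3):
    """Return the best (left, right) split, or None if keeping the word whole scores higher
    (score = geometric mean of the parts' lexicon frequencies)."""
    word = word.lower()
    best, best_score = None, LEXICON.get(word, 0)
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if right.startswith("s") and right[1:] in LEXICON:   # drop a linking "s" if that helps
            right = right[1:]
        if left in LEXICON and right in LEXICON:
            score = (LEXICON[left] * LEXICON[right]) ** 0.5
            if score > best_score:
                best, best_score = (left, right), score
    return best

print(split_compound("Staubsaugervertreter"))   # hypothetical compound: ('staubsauger', 'vertreter')
```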
international acm sigir conference on research and development in information retrieval | 2010
Amaç Herdağdelen; Massimiliano Ciaramita; Daniel Mahler; Maria Holmqvist; Keith B. Hall; Stefan Riezler; Enrique Alfonseca
We present a novel approach to query reformulation which combines syntactic and semantic information by means of generalized Levenshtein distance algorithms where the substitution operation costs are based on probabilistic term rewrite functions. We investigate unsupervised, compact and efficient models, and provide empirical evidence of their effectiveness. We further explore a generative model of query reformulation and supervised combination methods providing improved performance at variable computational costs. Among other desirable properties, our similarity measures incorporate information-theoretic interpretations of taxonomic relations such as specification and generalization.
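A small sketch of a generalized Levenshtein distance over query terms, where the substitution cost is discounted by a term rewrite probability. The rewrite table is a hypothetical stand-in for the probabilistic term rewrite functions described in the abstract.

```python
# Sketch: edit distance over term sequences with rewrite-aware substitution costs.
REWRITE_PROB = {("photos", "pictures"): 0.7, ("ny", "nyc"): 0.6}   # toy rewrite probabilities

def sub_cost(a, b):
    if a == b:
        return 0.0
    p = max(REWRITE_PROB.get((a, b), 0.0), REWRITE_PROB.get((b, a), 0.0))
    return 1.0 - p          # likely rewrites become cheap substitutions

def query_distance(q1, q2, ins_del_cost=1.0):
    """Standard dynamic-programming edit distance, applied to term sequences."""
    s, t = q1.split(), q2.split()
    d = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        d[i][0] = i * ins_del_cost
    for j in range(1, len(t) + 1):
        d[0][j] = j * ins_del_cost
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            d[i][j] = min(d[i - 1][j] + ins_del_cost,
                          d[i][j - 1] + ins_del_cost,
                          d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return d[len(s)][len(t)]

print(query_distance("cheap ny photos", "cheap ny pictures"))   # 0.3: one likely rewrite
```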
empirical methods in natural language processing | 2009
Enrique Alfonseca; Massimiliano Ciaramita; Keith B. Hall
In this paper we investigate temporal patterns of web search queries. We carry out several evaluations to analyze the properties of temporal profiles of queries, revealing promising semantic and pragmatic relationships between words. We focus on two applications: query suggestion and query categorization. The former shows a potential for time-series similarity measures to identify specific semantic relatedness between words, which results in state-of-the-art performance in query suggestion while providing complementary information to more traditional distributional similarity measures. The query categorization evaluation suggests that the temporal profile alone is not a strong indicator of broad topical categories.
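A minimal sketch of one simple time-series similarity that could be used for this kind of comparison: Pearson correlation between normalized query-frequency profiles. The weekly counts below are invented for illustration and do not come from the paper.

```python
# Sketch: rank candidate queries by the correlation of their temporal profiles with a target query.
from statistics import mean, pstdev

def pearson(x, y):
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

# Toy weekly query-frequency profiles (counts over 8 weeks).
profiles = {
    "easter eggs":   [1, 2, 5, 20, 40, 15, 3, 1],
    "easter brunch": [0, 1, 4, 18, 35, 12, 2, 1],
    "tax return":    [30, 28, 25, 10, 5, 3, 2, 2],
}

target = "easter eggs"
ranked = sorted((q for q in profiles if q != target),
                key=lambda q: pearson(profiles[target], profiles[q]),
                reverse=True)
for q in ranked:
    print(q, round(pearson(profiles[target], profiles[q]), 2))
```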
language resources and evaluation | 2013
Enrique Alfonseca; Guillermo Garrido; Jean-Yves Delort; Anselmo Peñas
This paper describes the generation of temporally anchored infobox attribute data from the Wikipedia history of revisions. By mining (attribute, value) pairs from the revision history of the English Wikipedia we are able to collect a comprehensive knowledge base that contains data on how attributes change over time. When dealing with the Wikipedia edit history, vandalistic and erroneous edits are a concern for data quality. We present a study of vandalism identification in Wikipedia edits that uses only features from the infoboxes, and show that we can obtain, on this dataset, an accuracy comparable to a state-of-the-art vandalism identification method that is based on the whole article. Finally, we discuss different characteristics of the extracted dataset, which we make available for further study.
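A small sketch of the underlying mining step: extract (attribute, value) pairs from infobox wikitext and diff two revisions to record a temporally anchored change. The regex and the toy revisions are assumptions; real infobox parsing and the paper's vandalism filtering are considerably more involved.

```python
# Sketch: extract infobox fields from wikitext and report attribute changes between revisions.
import re

INFOBOX_FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", re.MULTILINE)

def infobox_attributes(wikitext):
    return dict(INFOBOX_FIELD.findall(wikitext))

# Hypothetical revisions of the same article, years apart.
rev_2009 = "{{Infobox company\n| name = Acme Corp\n| employees = 1200\n}}"
rev_2012 = "{{Infobox company\n| name = Acme Corp\n| employees = 3400\n}}"

old, new = infobox_attributes(rev_2009), infobox_attributes(rev_2012)
for attr in old.keys() & new.keys():
    if old[attr] != new[attr]:
        print(f"{attr}: {old[attr]!r} -> {new[attr]!r}")   # employees: '1200' -> '3400'
```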
international conference on computational linguistics | 2008
Enrique Alfonseca; Slaven Bilac; Stefan H. Pharies
Splitting compound words has proved to be useful in areas such as Machine Translation, Speech Recognition or Information Retrieval (IR). IR systems in particular have to cope with noisy data, as user queries are often written quickly and submitted without review. This work attempts to improve current approaches to German decompounding when applied to query keywords. The results show an increase of more than 10% in accuracy compared to other state-of-the-art methods.
international acm sigir conference on research and development in information retrieval | 2009
Marius Pasca; Enrique Alfonseca
A weakly-supervised extraction method identifies concepts within conceptual hierarchies, at the appropriate level of specificity (e.g., Bank vs. Institution), to which attributes (e.g., routing number) extracted from unstructured text best apply. The extraction exploits labeled classes of instances acquired from a combination of Web documents and query logs, and inserted into existing conceptual hierarchies. The correct concept is identified within the top three positions on average over gold-standard attributes, which corresponds to higher accuracy than in alternative experiments.
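A toy sketch of the ranking intuition: score each concept in a small hierarchy by the fraction of its instances (inherited from descendant classes) observed with a given attribute, so the most specific well-supported concept comes out on top. The hierarchy, instance sets, and observations are invented for illustration and are not the paper's data or exact scoring function.

```python
# Sketch: rank concepts by how well the attribute "routing number" fits their instances.
hierarchy = {"Bank": "Institution", "University": "Institution", "Institution": None}
instances = {
    "Bank": ["chase", "wells fargo", "hsbc"],
    "University": ["mit", "stanford"],
}
# Instances observed in text together with the attribute.
observed_with_attribute = {"chase", "wells fargo", "hsbc"}

def concept_instances(concept):
    """All instances of a concept, including those inherited from its descendants."""
    own = list(instances.get(concept, []))
    for child, parent in hierarchy.items():
        if parent == concept:
            own += concept_instances(child)
    return own

scores = {}
for concept in hierarchy:
    insts = concept_instances(concept)
    support = sum(1 for i in insts if i in observed_with_attribute)
    scores[concept] = support / len(insts) if insts else 0.0

# The most specific fully supported concept wins: Bank (1.0) over Institution (0.6).
for concept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(concept, round(score, 2))
```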
north american chapter of the association for computational linguistics | 2009
Enrique Alfonseca; Keith B. Hall; Silvana Hartmann
We present a large-scale, data-driven approach to computing distributional similarity scores for queries. We contrast this with recent web-based techniques which either require the offline computation of complete phrase vectors, or an expensive on-line interaction with a search engine interface. Independent of the computational advantages of our approach, we show empirically that our technique is more effective at ranking query alternatives than the computationally more expensive technique of using the results from a web search engine.
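A minimal sketch of the basic operation: rank query alternatives by cosine similarity of context-count vectors computed offline, with no call to a search engine. The context counts below are hypothetical stand-ins for the large-scale statistics the approach aggregates.

```python
# Sketch: rank alternative queries by distributional (cosine) similarity to the original query.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical context-term counts aggregated per query.
query_contexts = {
    "cheap flights":    Counter({"airline": 40, "tickets": 35, "deals": 20}),
    "low cost flights": Counter({"airline": 30, "tickets": 28, "budget": 15}),
    "flight simulator": Counter({"game": 50, "cockpit": 22, "download": 18}),
}

original = "cheap flights"
ranked = sorted((q for q in query_contexts if q != original),
                key=lambda q: cosine(query_contexts[original], query_contexts[q]),
                reverse=True)
for q in ranked:
    print(q, round(cosine(query_contexts[original], query_contexts[q]), 3))
```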