José Luis Alonso Berrocal

Publication

Featured research published by José Luis Alonso Berrocal.

Information Processing and Management | 2005

Reformulation of queries using similarity thesauri

Ángel F. Zazo; Carlos G. Figuerola; José Luis Alonso Berrocal; Emilio Rodríguez

One of the major problems in information retrieval is the formulation of queries by the user, which entails specifying a set of words or terms that express their information need. However, it is well known that two people can use different terms to refer to the same concept. Techniques that attempt to reduce this problem as much as possible generally start from an initial search and then study how the initial query can be modified to obtain better results. In general, building the new query involves expanding the terms of the initial query and recalculating the weight of each term in the expanded query. Several strategies can be distinguished, depending on the technique used to formulate the new query. These strategies rest on the idea that if two terms are similar (with respect to some criterion), the documents in which both terms appear frequently will also be related. The technique used in this study is known as query expansion using similarity thesauri.
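
A minimal sketch of query expansion with a similarity thesaurus, written for illustration and not taken from the paper: each term is represented by its raw frequencies across the documents, candidate terms are scored by their summed cosine similarity to all query terms, and the k best are appended to the query. The function names, the raw-frequency weighting and the alphabetical tie-breaking are assumptions, and the term reweighting step mentioned in the abstract is omitted.

```python
# Illustrative sketch only; not the authors' implementation.
import math
from collections import defaultdict


def term_vectors(docs):
    """Represent each term by its raw frequency in each document: term -> {doc_id: tf}."""
    vectors = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            vectors[term][doc_id] = vectors[term].get(doc_id, 0) + 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def expand_query(query, docs, k=2):
    """Append the k terms most similar to the query as a whole (summed cosine similarity)."""
    vectors = term_vectors(docs)
    q_terms = [t for t in query.lower().split() if t in vectors]
    scores = {
        cand: sum(cosine(vectors[q], vec) for q in q_terms)
        for cand, vec in vectors.items()
        if cand not in q_terms
    }
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return q_terms + [t for t in ranked[:k] if scores[t] > 0]
```

For example, with the toy documents "car automobile sale", "car automobile dealer", "engine repair car" and "pasta recipe", expand_query("car", docs, k=1) returns ["car", "automobile"], because "automobile" occurs in two of the three documents that contain "car".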

Cross-Language Evaluation Forum | 2001

Spanish Monolingual Track: The Impact of Stemming on Retrieval

Carlos G. Figuerola; Raquel Gómez Díaz; Ángel Francisco Zazo Rodríguez; José Luis Alonso Berrocal

Most Information Retrieval techniques rely on identifying terms in queries and documents, both to perform calculations based on the frequencies of these terms and to compare documents with queries. Terms derived from the same stem, whether by inflection or by derivation, can be presumed to be semantically close, so conflating these words to a common form can improve retrieval. Stemming mechanisms depend directly on each language. This paper describes a stemmer for Spanish and the tests conducted by applying it to the CLEF Spanish document collection, and discusses the results.
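
To illustrate what suffix-stripping conflation does, here is a deliberately tiny sketch. The suffix list and the minimum stem length are hypothetical choices made for this example; this is not the Spanish stemmer described in the paper, which would need far richer rules for inflection and derivation.

```python
# Hypothetical suffix list for illustration; NOT the stemmer described in the paper.
# Sorted longest first so that, e.g., "aciones" is tried before "es" or "s".
SUFFIXES = sorted(
    ["aciones", "ación", "amente", "mente", "idades", "idad",
     "es", "as", "os", "a", "o", "s"],
    key=len,
    reverse=True,
)


def stem(word, min_stem=3):
    """Strip the longest matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word
```

With these rules, "información" and "informaciones" both reduce to "inform", and "rápido" and "rápidamente" both reduce to "rápid", so each pair would be matched as the same index term.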

Scientometrics | 2005

Quality assessment of Spanish universities' web sites focused on the European Research Area

María Pinto; José Luis Alonso Berrocal; José Antonio Cordón García; Viviana Fernández Marcial; Carlos G. Figuerola; Javier Marco; Carmen Gómez Camarero; Ángel F. Zazo Rodríguez

This work analyzes and evaluates the dissemination of research carried out at Spanish universities through the World Wide Web (WWW), in order to obtain a map of the visibility of the information available on this research and to propose measures for improving the quality of its dissemination, all within the social and institutional context of the European Higher Education Area. The study applied both qualitative and quantitative research methods to obtain quality indicators for the dissemination of university research. The object of study is a sample of 19 Spanish universities, chosen for their representativeness by Autonomous Community and for their administrative and scientific weight. The process of defining the qualitative and quantitative indicators, as well as the collection and analysis of data, is explained. The results give a detailed picture of the current visibility of research information on the web pages of the selected universities, which has allowed us to make proposals for improvement that can contribute to excellence in its dissemination.

Journal of Documentation | 2001

Automatic vs manual categorisation of documents in Spanish

Carlos G. Figuerola; Ángel Francisco Zazo Rodríguez; José Luis Alonso Berrocal

Automatic categorisation can be understood as a learning process during which a program recognises the characteristics that distinguish each category or class from the others, i.e. the characteristics that documents should have in order to belong to that category. So far, few experiments have been carried out with documents in Spanish. Here we show the possibilities of building pattern vectors that include the characteristics of different classes or categories of documents, using techniques based on those applied to query expansion by relevance feedback; the results of applying these techniques to a collection of documents in Spanish are also given. The same collection of documents was categorised manually, and the results of both procedures were compared.
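
The abstract builds class pattern vectors with techniques borrowed from relevance-feedback query expansion. One common technique of that family is the Rocchio formula, sketched here as an assumption rather than as the paper's method: each class prototype is a weighted mean of its training documents minus a smaller weighted mean of the other classes' documents, and a new document is assigned to the class with the most similar prototype. The weights beta = 1.0 and gamma = 0.25 and all function names are illustrative.

```python
# Rocchio-style categorisation sketch; weights and names are hypothetical.
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average of a list of sparse vectors (returns {} for an empty list)."""
    if not vectors:
        return {}
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def build_prototypes(training, beta=1.0, gamma=0.25):
    """Prototype per class: beta * mean(in-class docs) - gamma * mean(other classes' docs)."""
    prototypes = {}
    for label, docs in training.items():
        pos = centroid([vectorize(d) for d in docs])
        neg = centroid([vectorize(d) for other, ds in training.items() if other != label for d in ds])
        proto = {t: beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0) for t in set(pos) | set(neg)}
        prototypes[label] = {t: w for t, w in proto.items() if w > 0}  # drop non-positive weights
    return prototypes


def classify(text, prototypes):
    """Assign the class whose prototype is most similar to the document."""
    vec = vectorize(text)
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))
```

Subtracting the other classes' centroid pushes down terms that are common to every category, which is the same intuition as penalising non-relevant documents in relevance feedback.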

Cross-Language Evaluation Forum | 2002

Experiments in Term Expansion Using Thesauri in Spanish

Ángel F. Zazo; Carlos G. Figuerola; José Luis Alonso Berrocal; Emilio Rodríguez; Raquel Gómez

This paper presents some experiments carried out this year in the Spanish monolingual task at CLEF 2002. The objective is to continue our research on term expansion. Last year we presented results regarding stemming; now our effort is centred on term expansion using thesauri. Many words that derive from the same stem have close semantic content. However, other words with very different stems also have semantically close senses. In this case, the analysis of the relationships between words in a document collection can be used to construct a thesaurus of related terms, which can then be used to expand a term with its best related terms. This paper describes experiments carried out to study term expansion using association and similarity thesauri.

Practical Applications of Agents and Multi-Agent Systems | 2011

Web Document Duplicate Detection Using Fuzzy Hashing

Carlos G. Figuerola; Raquel Gómez Díaz; José Luis Alonso Berrocal; Ángel Francisco Zazo Rodríguez

The web is the largest repository of documents available, and retrieving them for various purposes requires crawlers that navigate autonomously, select documents and process them according to the objectives pursued. However, it is apparent, even intuitively, that a significant number of documents are obtained in more or less abundant replicas. Detecting these duplicates is important because it makes databases lighter and improves the efficiency of information retrieval engines, and it also improves the precision of cybermetric analyses, web mining studies, etc. Standard hashing techniques used to detect duplicates only detect exact duplicates, at the bit level. However, many of the duplicates found in the real world are not exactly alike. For example, we can find web pages with the same content but with different headers or meta tags, or displayed with different style sheets. A frequent case is that of the same document in different formats; these are completely different documents at the binary level. The obvious solution is to compare plain-text conversions of all these formats, but such conversions are never identical, because converters treat various formatting elements differently (textual characters, diacritics, spacing, paragraphs, ...). In this work we introduce the possibility of using what is known as fuzzy hashing. The idea is to produce fingerprints of files (or documents, etc.), so that comparing two fingerprints gives an estimate of the closeness or distance between two files, documents, etc. Based on the concept of a “rolling hash”, fuzzy hashing has been used successfully in computer security tasks such as malware identification, spam detection and virus scanning.
We have added fuzzy hashing capabilities to a lightweight crawler and run several tests on a heterogeneous network domain consisting of multiple servers with different software, static and dynamic pages, etc. These tests allowed us to measure similarity thresholds and to obtain useful data about the quantity and distribution of duplicate documents on web servers.
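
The following is a simplified sketch of context-triggered piecewise hashing, the idea behind fuzzy-hashing tools such as ssdeep; it is not ssdeep itself and its signatures are not compatible with it. A polynomial rolling hash over the last 7 bytes decides where chunks end, each chunk contributes one signature character, and two signatures are compared with difflib's SequenceMatcher ratio in place of ssdeep's weighted edit distance. The window size, base, block size and function names are assumptions.

```python
# Simplified fuzzy-hashing sketch; parameters and names are hypothetical.
import string
import zlib
from difflib import SequenceMatcher

ALPHABET = string.ascii_letters + string.digits + "+/"  # 64 signature symbols
WINDOW = 7       # bytes covered by the rolling hash (assumed value)
BASE = 31        # polynomial base of the rolling hash (assumed value)
MOD = 2 ** 32


def fuzzy_hash(data: bytes, block_size: int = 64) -> str:
    """Context-triggered piecewise hash: one symbol per content-defined chunk."""
    signature = []
    chunk_start = 0
    rolling = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            rolling = (rolling - data[i - WINDOW] * top) % MOD
        rolling = (rolling * BASE + byte) % MOD
        # The trigger depends only on the last WINDOW bytes, so a local edit
        # can only move the chunk boundaries close to it.
        if i + 1 >= WINDOW and rolling % block_size == block_size - 1:
            signature.append(ALPHABET[zlib.crc32(data[chunk_start:i + 1]) % 64])
            chunk_start = i + 1
    if chunk_start < len(data):  # hash the trailing, untriggered chunk
        signature.append(ALPHABET[zlib.crc32(data[chunk_start:]) % 64])
    return "".join(signature)


def similarity(sig_a: str, sig_b: str) -> int:
    """0-100 score from difflib's ratio (ssdeep uses a weighted edit distance instead)."""
    # autojunk=False: otherwise long signatures get every symbol flagged as "popular".
    return round(100 * SequenceMatcher(None, sig_a, sig_b, autojunk=False).ratio())
```

Because chunk boundaries depend only on local content, replacing a page header changes just the first one or two signature characters, so near-duplicates keep a high score while unrelated documents score low. A real crawler would also need a block size suited to each document's length (ssdeep adapts it automatically).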

Cross-Language Evaluation Forum | 2005

Use of free on-line machine translation for interactive cross-language question answering

Ángel F. Zazo; Carlos G. Figuerola; José Luis Alonso Berrocal; Viviana Fernández Marcial

Free on-line machine translation systems are increasingly used by Internet users. In this paper we explore the use of these systems for Cross-Language Question Answering in two respects: the formulation of queries and the presentation of information. Two topic-document language pairs were used, Spanish-English and Spanish-French. For each pair, two groups of users were formed according to their reading skills in the document language. When machine translation of the queries was used directly in the search, the number of correct answers was quite high; users corrected only 8% of the proposed translations. As for using machine translation to translate the text passages shown to the user into Spanish, we expected the searches of users with little knowledge of the target language to improve notably, but we found that this was of little help in finding the correct answers to the questions posed in the experiment.

Journal of Information Science | 2013

Web link-based relationships among top European universities

Carlos G. Figuerola; José Luis Alonso Berrocal

This paper analyzes the interlinking among 100 major European universities. Since websites contain links to the webpages of other organizations, these links may reveal the strongest relationships established between two organizations. This analysis of web links allowed us to identify different behaviours among the universities with regard to incoming and outgoing web links; some universities had significantly more incoming than outgoing links. In general, there was a low level of interaction between the universities studied. We also observed geographic–linguistic patterns in the establishment of links. Five primary nuclei or blocks of universities can be identified: a group composed almost exclusively of universities from the UK; a group composed largely of German universities, along with some from Switzerland and Austria; a cluster of universities from Mediterranean countries, including various French universities; a group of Belgian and Dutch universities, along with some from French-speaking Switzerland; and finally, a group made up of universities from the Nordic countries. Although some universities overlap with several groups or clusters, the overall structure is rather clear. Moreover, the whole picture seems to agree with the results of other studies based on bibliographic co-authorship.

Cross-Language Evaluation Forum | 2006

Local query expansion using terms windows for robust retrieval

Ángel F. Zazo; José Luis Alonso Berrocal; Carlos G. Figuerola

This paper describes our work at the CLEF 2006 Robust task, an ad hoc task that explores methods for stable retrieval by focusing on poorly performing topics. We participated in all subtasks: monolingual (English, French, Italian and Spanish), bilingual (Italian to Spanish) and multilingual (Spanish to [English, French, Italian and Spanish]). In monolingual retrieval we focused our effort on local query expansion, i.e. using only the information from retrieved documents, not from the complete document collection or external corpora such as the Web. Several local expansion techniques were applied to the training topics. Regarding robustness, the most effective one was the use of co-occurrence-based thesauri, constructed from co-occurrence relations in windows of terms rather than in complete documents. This is an effective technique that can be easily implemented by tuning only a few parameters. In the bilingual and multilingual experiments, several machine translation programs were used to translate the topics; for each target language, the translations were merged before performing a monolingual retrieval, and the same local expansion technique was applied. In multilingual retrieval, weighted max-min normalization was used to merge the result lists. In all the subtasks in which we participated, our mandatory runs (using the title and description fields of the topics) obtained very good rankings. Runs with short queries (title field only) also obtained high MAP and GMAP values using the same expansion technique.
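
A minimal sketch of local expansion from term windows, under stated assumptions: only the top-ranked documents of an initial search are used, and a candidate term is scored by how often it occurs within a fixed number of positions of a query term rather than anywhere in the same document. The window size, the raw-count scoring and the function names are hypothetical; the runs in the paper built co-occurrence thesauri with tuned parameters that are not given here.

```python
# Window-based local expansion sketch; scoring and names are hypothetical.
from collections import Counter


def window_cooccurrence(docs, query_terms, window=5):
    """Count terms appearing within `window` positions of any query term."""
    query_terms = set(query_terms)
    counts = Counter()
    for text in docs:
        tokens = text.lower().split()
        for i, tok in enumerate(tokens):
            if tok not in query_terms:
                continue
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if tokens[j] not in query_terms:  # also skips position i itself
                    counts[tokens[j]] += 1
    return counts


def local_expansion(query, top_docs, k=3, window=5):
    """Expand a query using only the top-ranked documents of an initial search."""
    q_terms = query.lower().split()
    counts = window_cooccurrence(top_docs, q_terms, window)
    ranked = sorted(counts, key=lambda t: (-counts[t], t))  # ties broken alphabetically
    return q_terms + ranked[:k]
```

With the toy documents "solar panel energy cost", "solar energy storage" and "cost of living report", a window of 2 gives "energy" a count of 2 and ignores "cost", which occurs only 3 positions away from "solar"; widening the window to 3 would count it.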

Cross-Language Evaluation Forum | 2005

Web page retrieval by combining evidence

Carlos G. Figuerola; José Luis Alonso Berrocal; Ángel F. Zazo; Emilio Rodríguez Vázquez de Aldana

The participation of the REINA Research Group in WebCLEF 2005 focused on the monolingual mixed task. Queries, or topics, are of two types: named pages and home pages. For both, we first perform a search by thematic content; for the same query, we also search several information elements of every page (title, some meta tags, anchor text) and then combine the results. For home-page queries, we try to detect home pages using a method based on certain keywords and their patterns of use. Afterwards, the results of the thematic content retrieval are re-ranked based on PageRank and centrality coefficients.
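
The abstract does not say how the result lists are combined, so the following is only one plausible sketch: each list is min-max normalised, the lists are merged with a weighted sum (CombSUM), PageRank is computed by power iteration over the link graph, and the fused scores are blended with the normalised PageRank. The weights, the blending parameter alpha and the function names are assumptions, and the centrality coefficients mentioned in the abstract are not modelled.

```python
# Evidence-combination sketch (CombSUM + PageRank blend); all parameters are hypothetical.


def min_max(scores):
    """Rescale scores to [0, 1]; all-equal scores map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def comb_sum(runs, weights):
    """Weighted sum of min-max normalised scores from several result lists."""
    fused = {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return fused


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling page: spread its rank over all pages
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


def rerank(fused, ranks, alpha=0.8):
    """Blend normalised fused scores with normalised PageRank; higher alpha favours content."""
    pr = min_max({d: ranks.get(d, 0.0) for d in fused})
    final = {d: alpha * s + (1 - alpha) * pr[d] for d, s in min_max(fused).items()}
    return sorted(final, key=lambda d: (-final[d], d))
```

Normalising each list first keeps a field with large raw scores (for example, full-text similarity) from swamping a field with small ones (for example, title matches) before they are summed.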
