Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Carlos G. Figuerola is active.

Publication


Featured research published by Carlos G. Figuerola.


Online Information Review | 2010

Open knowledge: challenges and facts

Francisco José García-Peñalvo; Carlos G. Figuerola; José A. Merlo

Purpose – The purpose of this paper is to open the special issue of Online Information Review on open knowledge management in higher education. Its aim is to review the concept and extension of the movement or philosophy of open knowledge in universities and higher education institutions. Design/methodology/approach – The approach follows the reference model used by the University of Salamanca (Spain) to promote open knowledge in the institution through its Open Knowledge Office. This model comprises four areas: free software, open educational content and cultural dissemination, open science, and open innovation. Findings – For each of the four areas mentioned above, milestones and the most significant projects are presented, showing how they are promoting publication and information transmission in an open environment, without restrictions and favouring knowledge dissemination in all fields. Originality/value – Open knowledge is an approach which, although somewhat controversial, is growing relentlessly as ...


Information Processing and Management | 2005

Reformulation of queries using similarity thesauri

Ángel F. Zazo; Carlos G. Figuerola; José Luis Alonso Berrocal; Emilio Rodríguez

One of the major problems in information retrieval is the formulation of queries on the part of the user. This entails specifying a set of words or terms that express their informational need. However, it is well-known that two people can assign different terms to refer to the same concepts. The techniques that attempt to reduce this problem as much as possible generally start from a first search, and then study how the initial query can be modified to obtain better results. In general, the construction of the new query involves expanding the terms of the initial query and recalculating the importance of each term in the expanded query. Depending on the technique used to formulate the new query several strategies are distinguished. These strategies are based on the idea that if two terms are similar (with respect to any criterion), the documents in which both terms appear frequently will also be related. The technique we used in this study is known as query expansion using similarity thesauri.
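As a rough illustration of the technique (not the authors' implementation), the sketch below builds a small similarity thesaurus by comparing term vectors, i.e. the columns of a tf-idf term-document matrix, and uses it to expand a query. The toy corpus, the tf-idf weighting and the cutoff k are illustrative assumptions.

    # Minimal sketch: query expansion with a similarity thesaurus.
    # Toy corpus, tf-idf weighting and k are illustrative assumptions.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["information retrieval with query expansion",
            "expansion of queries using a thesaurus of related terms",
            "term weighting schemes for document retrieval"]

    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)                 # documents x terms

    # Similarity thesaurus: cosine similarity between term vectors,
    # i.e. between columns of the term-document matrix.
    term_sim = cosine_similarity(X.T)
    vocab = vec.get_feature_names_out()
    index = {t: i for i, t in enumerate(vocab)}

    def expand(query_terms, k=2):
        """Add to the query the k terms most similar to each query term."""
        expanded = list(query_terms)
        for t in query_terms:
            if t not in index:
                continue
            ranked = np.argsort(term_sim[index[t]])[::-1]
            added = 0
            for j in ranked:
                cand = vocab[j]
                if cand != t and cand not in expanded:
                    expanded.append(cand)
                    added += 1
                if added == k:
                    break
        return expanded

    print(expand(["retrieval"]))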


Cross-Language Evaluation Forum | 2001

Spanish Monolingual Track: The Impact of Stemming on Retrieval

Carlos G. Figuerola; Raquel Gómez Díaz; Ángel Francisco Zazo Rodríguez; José Luis Alonso Berrocal

Most of the techniques used in Information Retrieval rely on the identification of terms from queries and documents, as much to carry out calculations based on the frequencies of these terms as to carry out comparisons between documents and queries. Terms coming from the same stem, either by morphological inflection or through derivation, can be presumed to have semantic proximity. The conflation of these words to a common form can produce improvements in retrieval. The stemming mechanisms used depend directly on each language. In this paper, a stemmer for Spanish and the tests conducted by applying it to the CLEF Spanish document collection are described, and the results are discussed.
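The stemmer described in the paper is the authors' own algorithm; as a stand-in, the sketch below conflates Spanish word forms with NLTK's Snowball stemmer for Spanish (an assumption, not the paper's stemmer), just to show how variants collapse onto a common form before frequency counting.

    # Minimal sketch of stem conflation for Spanish, using NLTK's
    # Snowball stemmer as a stand-in for the stemmer described in the
    # paper (an assumption, not the authors' algorithm).
    from collections import Counter
    from nltk.stem import SnowballStemmer

    stemmer = SnowballStemmer("spanish")

    words = ["biblioteca", "bibliotecas", "documento",
             "documentos", "documentación"]

    # After conflation, frequency counts accumulate on the common stem
    # instead of being dispersed over inflected and derived variants.
    print(Counter(stemmer.stem(w) for w in words))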


Journal of Information Science | 2000

Stemming and n-grams in Spanish: An evaluation of their impact on information retrieval

Carlos G. Figuerola; Raquel Gómez; Eva López de San Román

At some stage, most of the models and techniques implemented in information retrieval use frequency counts of the terms appearing in documents and in queries. However, many words, since they are derived from the same stem, have very close semantic content. This makes a grouping of such variants under a single term advisable. Otherwise, dispersal occurs in the calculation of frequency of these terms and it also becomes difficult to compare queries and documents. On the other hand, there are notable differences between different languages in the way of forming derivatives and inflected forms, so that the application of specific techniques can produce unequal results according to the language of the documents and queries. A description is given of tests carried out for documents in Spanish, which involved some stemming techniques widely used in English, as well as the application of n-grams, and the results are compared.
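As a sketch of the n-gram alternative evaluated here: instead of stemming, each term is replaced by its overlapping character n-grams, so inflectional variants share most of their features regardless of language. The choice n = 4 and the boundary markers are illustrative assumptions.

    # Minimal sketch: character n-grams as a language-independent
    # alternative to stemming (n = 4 is an illustrative choice).
    def char_ngrams(word, n=4):
        word = f"_{word}_"                     # mark word boundaries
        return [word[i:i + n] for i in range(len(word) - n + 1)]

    for w in ("gato", "gatos", "gatito"):
        print(w, char_ngrams(w))
    # 'gato' and 'gatos' share '_gat' and 'gato', so documents using
    # either form can still be matched.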


Scientometrics | 2005

Quality assessment of Spanish universities' web sites focused on the European Research Area

María Pinto; José Luis Alonso Berrocal; José Antonio Cordón García; Viviana Fernández Marcial; Carlos G. Figuerola; Javier Marco; Carmen Gómez Camarero; Ángel F. Zazo Rodríguez

This work analyzes and evaluates the dissemination of research done at Spanish universities through the World Wide Web (WWW), in order to obtain a map of the visibility of the information available on this research and to propose measures for improving the quality of this diffusion, all within the social and institutional context of the European Area for Higher Education. The methodology applied in the study combines qualitative and quantitative research methods to obtain quality indicators on the dissemination of university research. The object of study is a sample of 19 Spanish universities, chosen according to their representativeness by Autonomous Community and their administrative and scientific weight. The process of defining indicators, both qualitative and quantitative, as well as the collection and analysis of data, is explained. The results give a detailed panorama of the visibility of information on research in the web pages of the selected universities, and have allowed us to make proposals for improvement that can contribute to the excellence of its dissemination.


Journal of Documentation | 2001

Automatic vs manual categorisation of documents in Spanish

Carlos G. Figuerola; Ángel Francisco Zazo Rodríguez; José Luis Alonso Berrocal

Automatic categorisation can be understood as a learning process during which a program recognises the characteristics that distinguish each category or class from the others, i.e. those characteristics which documents should have in order to belong to that category. As yet, few experiments have been carried out with documents in Spanish. Here we show the possibilities of elaborating pattern vectors that include the characteristics of the different classes or categories of documents, using techniques based on those applied to query expansion by relevance feedback; likewise, the results of applying these techniques to a collection of documents in Spanish are given. The same collection was categorised manually and the results of the two procedures were compared.
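A minimal sketch of the pattern-vector idea: each category is represented by the centroid of the tf-idf vectors of its training documents (in the spirit of Rocchio relevance feedback), and a new document is assigned to the nearest centroid. The toy corpus and labels are assumptions, not the collection used in the paper.

    # Minimal sketch of pattern-vector (centroid) categorisation.
    # Toy Spanish corpus and labels are illustrative assumptions.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    train_docs = ["el banco sube los tipos de interés",
                  "la bolsa cierra con pérdidas",
                  "el equipo gana el partido de liga",
                  "el jugador marca dos goles"]
    labels = ["economía", "economía", "deportes", "deportes"]

    vec = TfidfVectorizer()
    X = vec.fit_transform(train_docs).toarray()

    # Pattern vector per category: mean of its documents' vectors.
    categories = sorted(set(labels))
    patterns = np.array(
        [X[[i for i, l in enumerate(labels) if l == c]].mean(axis=0)
         for c in categories])

    def categorize(text):
        v = vec.transform([text]).toarray()
        sims = cosine_similarity(v, patterns)[0]
        return categories[int(np.argmax(sims))]

    print(categorize("goles en el partido"))   # -> 'deportes'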


Cross-Language Evaluation Forum | 2002

Experiments in Term Expansion Using Thesauri in Spanish

Ángel F. Zazo; Carlos G. Figuerola; José Luis Alonso Berrocal; Emilio Rodríguez; Raquel Gómez

This paper presents experiments carried out this year in the Spanish monolingual task at CLEF 2002. The objective is to continue our research on term expansion. Last year we presented results regarding stemming; now our effort is centred on term expansion using thesauri. Many words that derive from the same stem have a close semantic content, but other words with very different stems can also have semantically close senses. In this case, the analysis of the relationships between words in a document collection can be used to construct a thesaurus of related terms. The thesaurus can then be used to expand a term with the best related terms. This paper describes experiments carried out to study term expansion using association and similarity thesauri.
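To complement the similarity-thesaurus sketch given earlier, here is a minimal sketch of an association thesaurus built from document-level co-occurrence counts; the toy corpus and the ranking by raw counts are illustrative simplifications.

    # Minimal sketch of an association thesaurus: terms that co-occur
    # in the same documents are considered related. Toy corpus only.
    from collections import defaultdict
    from itertools import combinations

    docs = [["recuperación", "información", "consulta"],
            ["expansión", "consulta", "términos"],
            ["información", "términos", "tesauro"]]

    cooc = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1

    def related(term, k=3):
        """Top-k terms most often co-occurring with `term`."""
        return sorted(cooc[term].items(), key=lambda kv: -kv[1])[:k]

    print(related("consulta"))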


Practical Applications of Agents and Multi-Agent Systems | 2011

Web Document Duplicate Detection Using Fuzzy Hashing

Carlos G. Figuerola; Raquel Gómez Díaz; José Luis Alonso Berrocal; Ángel Francisco Zazo Rodríguez

The web is the largest repository of documents available and, for retrieval for various purposes, we must use crawlers that navigate autonomously, selecting documents and processing them according to the objectives pursued. However, it is easy to see, even intuitively, that more or less abundant replications of a significant number of documents are obtained. Detecting these duplicates is important because it makes it possible to lighten databases and improve the efficiency of information retrieval engines, and it also improves the precision of cybermetric analyses, web mining studies, etc. Standard hash techniques used to detect these duplicates only find exact copies, at the bit level. However, many of the duplicates found in the real world are not exactly alike. For example, we can find web pages with the same content but different headers or meta tags, or rendered with different style sheets. A frequent case is the same document in different formats; in such cases the files are completely different at the binary level. The obvious solution is to compare plain-text conversions of all these formats, but these conversions are never identical, because converters treat various formatting elements differently (textual characters, diacritics, spacing, paragraphs ...). In this work we introduce the possibility of using what is known as fuzzy hashing. The idea is to produce fingerprints of files (or documents, etc.), so that a comparison between two fingerprints gives an estimate of the closeness or distance between two files or documents. Based on the concept of a "rolling hash", fuzzy hashing has been used successfully in computer security tasks such as identifying malware, spam and viruses. We have added fuzzy-hashing capabilities to a lightweight crawler and have run several tests in a heterogeneous network domain, consisting of multiple servers with different software, static and dynamic pages, etc. These tests allowed us to measure similarity thresholds and to obtain useful data about the quantity and distribution of duplicate documents on web servers.
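A toy sketch of the context-triggered piecewise hashing idea behind fuzzy-hashing tools such as ssdeep: a rolling hash decides where to cut a text into chunks, the chunk hashes form the fingerprint, and near-identical texts differ only in the chunks around an edit. The window size, modulus and Jaccard comparison below are illustrative assumptions, not the parameters of the crawler described here.

    # Toy sketch of context-triggered piecewise (fuzzy) hashing.
    # Window size and modulus are illustrative assumptions.
    def chunk_hashes(text, window=7, modulus=16):
        h, start, out = 0, 0, set()
        for i, ch in enumerate(text):
            h += ord(ch)
            if i >= window:
                h -= ord(text[i - window])     # slide the rolling window
            if h % modulus == modulus - 1:     # context-triggered boundary
                out.add(hash(text[start:i + 1]))
                start = i + 1
        if start < len(text):
            out.add(hash(text[start:]))
        return out

    def similarity(a, b):
        """Jaccard overlap of chunk hashes as a crude closeness estimate."""
        ca, cb = chunk_hashes(a), chunk_hashes(b)
        return len(ca & cb) / max(len(ca | cb), 1)

    page1 = "the quick brown fox jumps over the lazy dog " * 20
    page2 = page1.replace("lazy", "sleepy", 1)   # one local edit
    # Between 0 and 1: unchanged chunks match, the edited ones do not.
    print(similarity(page1, page2))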


Cross-Language Evaluation Forum | 2005

Use of free on-line machine translation for interactive cross-language question answering

Ángel F. Zazo; Carlos G. Figuerola; José Luis Alonso Berrocal; Viviana Fernández Marcial

Free on-line machine translation systems are used more and more by Internet users. In this paper we explore the use of these systems for cross-language question answering in two respects: the formulation of queries and the presentation of information. Two topic-document language pairs were used, Spanish-English and Spanish-French. For each pair, two groups of users were created, depending on their level of reading skill in the document language. When the machine translation of a query was used directly in the search, the number of correct answers was quite high; users corrected only 8% of the proposed translations. As for using machine translation to render into Spanish the text passages shown to the user, we expected the searches of users with little knowledge of the document language to improve notably, but we found that this possibility was of little help in finding the correct answers to the questions posed in the experiment.


Scientometrics | 2017

Mapping the evolution of library and information science (1978–2014) using topic modeling on LISA

Carlos G. Figuerola; Francisco Javier García Marco; María Pinto

This paper offers an overview of the bibliometric study of the domain of library and information science (LIS), with the aim of giving a multidisciplinary perspective of the topical boundaries and the main areas and research tendencies. Based on a retrospective and selective search, we have obtained the bibliographical references (title and abstract) of academic production on LIS in the database LISA in the period 1978–2014, which runs to 92,705 documents. In the context of the statistical technique of topic modeling, we apply latent Dirichlet allocation, in order to identify the main topics and categories in the corpus of documents analyzed. The quantitative results reveal the existence of 19 important topics, which can be grouped together into four main areas: processes, information technology, library and specific areas of information application.
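A minimal sketch of the latent Dirichlet allocation step, here with scikit-learn on a toy corpus; the study itself modeled 92,705 LISA records and identified 19 topics, so the corpus and number of components below are illustrative assumptions only.

    # Minimal sketch of topic modeling with latent Dirichlet allocation.
    # Toy corpus and n_components are illustrative assumptions.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    abstracts = [
        "library cataloguing and classification practices",
        "information retrieval models and query evaluation",
        "digital libraries and metadata standards",
        "user studies of online search behaviour",
    ]

    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(abstracts)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(X)

    # Show the highest-weighted terms per topic.
    terms = vec.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = weights.argsort()[::-1][:5]
        print(f"topic {k}:", ", ".join(terms[i] for i in top))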

Collaboration


Dive into Carlos G. Figuerola's collaboration.

Top Co-Authors

Tamar Groves

University of Extremadura
