

Publication


Featured research published by Guillermo Jorge-Botana.


Journal of Quantitative Linguistics | 2010

Latent Semantic Analysis Parameters for Essay Evaluation using Small-Scale Corpora

Guillermo Jorge-Botana; José A. León; Ricardo Olmos; Inmaculada Escudero

Abstract Some previous studies (e.g. that carried out by Van Bruggen et al. in 2004) have pointed to a need for additional research in order to firmly establish the usefulness of LSA (latent semantic analysis) parameters for the automatic evaluation of academic essays. The extreme variability in approaches to this technique makes it difficult to identify the most efficient parameters and their optimum combination. With this goal in mind, we conducted a broad-spectrum study to investigate the efficiency of some of the major LSA parameters in small-scale corpora. We used two specific-domain corpora that differed in the structure of the text (one containing only technical terms and the other with more tangential information). Using these corpora we tested different semantic spaces, formed by applying different parameters and different methods of comparing the texts. The parameters varied included weighting functions (Log-IDF or Log-Entropy), dimensionality reduction (truncating the matrices after SVD to a set percentage of dimensions), methods of forming pseudo-documents (vector sum and folding-in) and measures of similarity (cosine or Euclidean distance). We also included two groups of essays to be graded, one written by experts and the other by non-experts. Both groups were evaluated by three human graders and also by LSA. We extracted the correlations of each LSA condition with the human graders and conducted an ANOVA to analyse which parameter combination correlates best. Results suggest that distances are more efficient than cosines in academic essay evaluation. We found no clear evidence that the classical LSA protocol works systematically better than some simpler versions (the classical protocol achieves the best performance only for some combinations of parameters in a few cases), and found that the benefits of reducing dimensionality arise only when the essays are introduced into semantic spaces using the folding-in method.
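The parameter pipeline compared in this study (a weighting function, SVD truncation, and a similarity measure) can be sketched in a few lines. The following is a minimal illustration with a toy term-by-document matrix; all values are hypothetical and nothing here reflects the paper's actual corpora or settings:

```python
import numpy as np

# Toy term-by-document count matrix (terms x documents); values are
# hypothetical and only illustrate the pipeline.
counts = np.array([
    [4., 0., 1., 0.],
    [0., 3., 0., 2.],
    [2., 1., 0., 3.],
    [1., 0., 5., 0.],
])

def log_entropy_weight(counts):
    """Log-Entropy weighting: local log(1 + tf) scaled by a global entropy
    term that down-weights terms spread evenly across documents."""
    p = counts / counts.sum(axis=1, keepdims=True)
    n_docs = counts.shape[1]
    with np.errstate(divide="ignore", invalid="ignore"):
        ent = np.where(p > 0, p * np.log(p), 0.0).sum(axis=1)
    global_w = 1.0 + ent / np.log(n_docs)  # lies in [0, 1]
    return np.log1p(counts) * global_w[:, None]

# Dimensionality reduction: truncate the SVD to k dimensions.
weighted = log_entropy_weight(counts)
u, s, vt = np.linalg.svd(weighted, full_matrices=False)
k = 2
doc_vectors = (np.diag(s[:k]) @ vt[:k, :]).T  # documents in the reduced space

# The two similarity measures compared in the paper: cosine vs. Euclidean.
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean(a, b):
    return np.linalg.norm(a - b)

sim_cos = cosine(doc_vectors[0], doc_vectors[1])
dist_euc = euclidean(doc_vectors[0], doc_vectors[1])
```

Swapping `log_entropy_weight` for a Log-IDF function, varying `k`, and comparing `cosine` against `euclidean` reproduces, in miniature, the kind of parameter grid the study explores.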


Behavior Research Methods | 2009

New algorithms assessing short summaries in expository texts using latent semantic analysis.

Ricardo Olmos; José A. León; Guillermo Jorge-Botana; Inmaculada Escudero

In this study, we compared four expert graders with latent semantic analysis (LSA) to assess short summaries of an expository text. As is well known, there are technical difficulties for LSA to establish a good semantic representation when analyzing short texts. In order to improve the reliability of LSA relative to human graders, we analyzed three new algorithms by two holistic methods used in previous research (León, Olmos, Escudero, Cañas, & Salmerón, 2006). The three new algorithms were (1) the semantic common network algorithm, an adaptation of an algorithm proposed by W. Kintsch (2001, 2002) with respect to LSA as a dynamic model of semantic representation; (2) a best-dimension reduction measure of the latent semantic space, selecting those dimensions that best contribute to improving the LSA assessment of summaries (Hu, Cai, Wiemer-Hastings, Graesser, & McNamara, 2007); and (3) the Euclidean distance measure, used by Rehder et al. (1998), which incorporates at the same time vector length and the cosine measures. A total of 192 Spanish middle-grade students and 6 experts took part in this study. They read an expository text and produced a short summary. Results showed significantly higher reliability of LSA as a computerized assessment tool for expository text when it used a best-dimension algorithm rather than a standard LSA algorithm. The semantic common network algorithm also showed promising results.
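The best-dimension idea, selecting the latent dimensions that contribute most to matching human grades, can be sketched roughly as follows. This is an illustration of the general idea on synthetic data, not the authors' exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 summaries in a 10-dimensional latent space, plus a
# human grade per summary; here dimension 3 carries the graded information.
doc_vecs = rng.normal(size=(30, 10))
human_grades = 2.0 * doc_vecs[:, 3] + rng.normal(scale=0.1, size=30)

def best_dimensions(doc_vecs, grades, k):
    """Rank latent dimensions by |correlation| between their coordinate and
    the human grades, and keep the k best."""
    corrs = np.array([np.corrcoef(doc_vecs[:, d], grades)[0, 1]
                      for d in range(doc_vecs.shape[1])])
    return np.argsort(-np.abs(corrs))[:k]

keep = best_dimensions(doc_vecs, human_grades, k=3)
reduced = doc_vecs[:, keep]  # summaries represented in the kept dimensions
```

Similarity between a summary and the source text would then be computed in `reduced` rather than in the full latent space.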


Discourse Processes | 2014

Transforming Selected Concepts Into Dimensions in Latent Semantic Analysis

Ricardo Olmos; Guillermo Jorge-Botana; José A. León; Inmaculada Escudero

This study presents a new approach for transforming the latent representation derived from a Latent Semantic Analysis (LSA) space into one where dimensions have nonlatent meanings. These meanings are based on lexical descriptors, which are selected by the LSA user. The authors present three analyses that provide examples of the utility of this methodology. The first analysis demonstrates how document terms can be projected into meaningful new dimensions. The second demonstrates how to use the modified space to perform multidimensional document labeling to obtain a high and substantive reliability between LSA experts. Finally, the internal validity of the method is assessed by comparing an original semantic space with a modified space. The results show high consistency between the two spaces, supporting the conclusion that the nonlatent coordinates generated using this methodology preserve the semantic relationships within the original LSA space.
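The change of basis at the core of the method can be sketched as follows: the vectors of user-selected descriptor terms become the new axes, and any document or term vector is re-expressed as coordinates on those axes. The terms and vectors below are hypothetical, and this is not the authors' implementation:

```python
import numpy as np

# Hypothetical reduced LSA space: each row is a term vector (terms x k dims).
terms = ["memory", "attention", "neuron", "syntax"]
term_vecs = np.array([[0.9, 0.1, 0.0],
                      [0.7, 0.3, 0.1],
                      [0.2, 0.9, 0.0],
                      [0.1, 0.0, 0.95]])

# The user selects lexical descriptors; their normalized vectors become the
# new, non-latent axes of the space.
descriptor_idx = [0, 2, 3]            # "memory", "neuron", "syntax"
axes = term_vecs[descriptor_idx]
axes = axes / np.linalg.norm(axes, axis=1, keepdims=True)

def to_descriptor_space(vec, axes):
    """Express a vector as coordinates on the descriptor axes."""
    return axes @ vec

doc = term_vecs[1]                    # a vector close to the "memory" axis
coords = to_descriptor_space(doc, axes)
```

Each coordinate in `coords` now has a nonlatent reading ("how much of this document loads on the descriptor"), which is what makes multidimensional labeling interpretable.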


International Journal of Continuing Engineering Education and Life-Long Learning | 2011

Using latent semantic analysis to grade brief summaries: some proposals

Ricardo Olmos; José A. León; Inmaculada Escudero; Guillermo Jorge-Botana

In this paper, we present several proposals to improve LSA tools for evaluating brief summaries (fewer than 50 words) of narrative and expository texts. First, we analyse the quality of six different methods for assessing essays that have been widely employed before (Foltz et al., 2000). Second, we analyse whether new algorithms inspired by previous work (Denhiere et al., 2007), which try to emulate human behaviour, improve the reliability of LSA with respect to human graders when assessing short summaries, compared with standard LSA use in expository text. Finally, we present an assessment method that combines LSA, as a semantic computational linguistic model, with ROUGE-N, as a lexical model, to show how combining different automatic evaluation systems (LSA and ROUGE) can improve the quality of assessments at different academic levels.
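The lexical side of such a combination can be sketched with a toy ROUGE-N recall blended linearly with a given LSA cosine. The mixing weight `alpha` and the linear blend itself are assumptions for illustration, not the paper's method:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n=2):
    """ROUGE-N recall: fraction of the reference's n-grams that also appear
    in the candidate (clipped counts)."""
    ref = Counter(ngrams(reference.split(), n))
    cand = Counter(ngrams(candidate.split(), n))
    if not ref:
        return 0.0
    overlap = sum(min(c, ref[g]) for g, c in cand.items() if g in ref)
    return overlap / sum(ref.values())

def combined_score(lsa_cosine, candidate, reference, alpha=0.5):
    """Hypothetical linear blend of a semantic (LSA) and a lexical (ROUGE-N)
    score; `alpha` is an assumed mixing weight, not the paper's."""
    return alpha * lsa_cosine + (1.0 - alpha) * rouge_n(candidate, reference)

score = combined_score(0.8, "the cat sat on the mat", "the cat sat on a mat")
```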


Information Processing and Management | 2016

Transforming LSA space dimensions into a rubric for an automatic assessment and feedback system

Ricardo Olmos; Guillermo Jorge-Botana; José M. Luzón; Jesús I. Martín-Cordero; José A. León

Highlights: We model how to implement a rubric in latent semantic analysis. The proposed method turns abstract dimensions into meaningful dimensions. The method makes it easy to detect which contents have been written. The Inbuilt Rubric method has been used to give feedback to 924 university students.

The purpose of this article is to validate, through two empirical studies, a new method for the automatic evaluation of written texts, called Inbuilt Rubric, based on the Latent Semantic Analysis (LSA) technique, which constitutes an innovative and distinct turn with respect to LSA applications so far. In the first empirical study, evidence of the validity of the method to identify and evaluate the conceptual axes of a text is sought in a sample of 78 summaries by secondary school students. Results show that the proposed method has a significantly higher degree of reliability than classic LSA methods of text evaluation, and displays very high sensitivity in identifying which conceptual axes are included or not in each summary. A second study evaluates the method's capacity to interact and provide feedback about quality in a real online system, on a sample of 924 discursive texts written by university students. Results show that students improved the quality of their written texts using this system, and also rated the experience very highly. The final conclusion is that this new method opens a very interesting avenue regarding the role of automatic assessors in identifying the presence/absence and quality of elaboration of relevant conceptual information in texts written by students, with lower time costs than the usual LSA-based methods.


Journal of Educational Computing Research | 2015

Automated LSA Assessment of Summaries in Distance Education: Some Variables to Be Considered.

Guillermo Jorge-Botana; José M. Luzón; Isabel Gómez-Veiga; Jesús I. Martín-Cordero

A latent semantic analysis-based automated summary assessment system is described and applied to a real learning-from-text task in a distance education context. We comment on the use of automated content, plagiarism, and text coherence measures, as well as average word weights, and their impact on predicting human judges' summary scores. A first regression analysis showed the independence of interparagraph coherence with respect to superficial text variables, supporting its inclusion in a general regression model along with the content and plagiarism measures. The final regression model explains a considerable degree of variability in human judgments of summaries. Finally, we discuss several methodological implications and further applications of the automated summary scoring technique developed in this study.
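The regression step can be sketched with synthetic data. The predictors and coefficients below are hypothetical; only the modelling pattern (human scores regressed on content, plagiarism, and coherence measures) follows the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 40 summaries with three hypothetical predictors and a
# human score generated from them (coefficients chosen for illustration).
n = 40
content = rng.uniform(0, 1, n)       # LSA content similarity to the source
plagiarism = rng.uniform(0, 1, n)    # literal overlap with the source text
coherence = rng.uniform(0, 1, n)     # interparagraph coherence
human_score = (2.0 * content - 1.0 * plagiarism + 0.5 * coherence
               + rng.normal(scale=0.05, size=n))

# Ordinary least squares: human score regressed on the three measures.
X = np.column_stack([np.ones(n), content, plagiarism, coherence])
beta, *_ = np.linalg.lstsq(X, human_score, rcond=None)
predicted = X @ beta
```

The fitted `beta` recovers the signs of the generating coefficients, which is the kind of evidence the paper uses to argue each measure contributes independently.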


International Journal of Continuing Engineering Education and Life-Long Learning | 2011

The representation of polysemy through vectors: some building blocks for constructing models and applications with LSA

Guillermo Jorge-Botana; José A. León; Ricardo Olmos; Inmaculada Escudero

The problem of the multiplicity of word meanings has preoccupied many researchers in linguistics, psychology and computational linguistics. In this paper, we review how LSA represents polysemous words, explain some biases related to meaning generation, and review some constraint-satisfaction models that introduce dynamic mechanisms into the equation. The idea behind these models is to take the amalgamated word vector from LSA, embed it in its discourse and semantic context, and, by means of a dynamic mechanism, select its appropriate features. To illustrate our arguments, we present some networks, providing evidence that polysemous words have separate representations for each sense only in the presence of the linguistic context that involves them. We also present an example of how these mechanisms can contribute to supporting visual heuristic searches in visual information retrieval interfaces (VIRIs).


Wiley Interdisciplinary Reviews: Cognitive Science | 2018

Word maturity indices with latent semantic analysis: why, when, and where is Procrustes rotation applied?

Guillermo Jorge-Botana; Ricardo Olmos; José M. Luzón

The aim of this paper is to describe and explain a useful computational methodology for modelling the semantic development of word representations: word maturity. In particular, the methodology is based on the longitudinal word monitoring created by Kireyev and Landauer, using latent semantic analysis for the representation of lexical units. The paper is divided into two parts. First, the steps required to model the development of the meaning of words are explained in detail, and we describe the technical and theoretical aspects of each step. Second, we provide a simple example of the application of this methodology with some simple tools that can be used by applied researchers. This paper can serve as a user-friendly guide for researchers interested in modelling changes in the semantic representations of words. Some current aspects of the technique and future directions are also discussed. WIREs Cogn Sci 2018, 9:e1457. doi: 10.1002/wcs.1457
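The Procrustes rotation named in the title aligns the semantic space of one age with the adult space before maturity indices are computed. A minimal sketch with synthetic vectors, using Schönemann's orthogonal Procrustes solution (an illustration of the alignment idea, not the authors' exact pipeline):

```python
import numpy as np

def procrustes_rotation(source, target):
    """Orthogonal Procrustes: the rotation R minimizing ||source @ R - target||_F
    (Schönemann's SVD solution)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

rng = np.random.default_rng(2)
adult = rng.normal(size=(6, 3))   # word vectors in the adult space
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
child = adult @ q                 # same words in a rotated coordinate system

R = procrustes_rotation(child, adult)
aligned = child @ R               # child vectors expressed on the adult axes

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# A simple maturity index: similarity of each aligned vector to its adult form.
maturity = np.array([cosine(aligned[i], adult[i]) for i in range(len(adult))])
```

In this noiseless toy case the rotation recovers the adult space exactly, so every maturity index is 1; with real age-specific corpora the residual distances are the informative signal.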


Information Retrieval Journal | 2017

The Role of Domain Knowledge in Cognitive Modeling of Information Search

Saraschandra Karanam; Guillermo Jorge-Botana; Ricardo Olmos; Herre van Oostendorp

Computational cognitive models developed so far do not incorporate individual differences in domain knowledge in predicting user clicks on search result pages. We address this problem using a cognitive model of information search which enables us to use two semantic spaces having a low (non-expert semantic space) and a high (expert semantic space) amount of medical and health related information to represent respectively low and high knowledge of users in this domain. We also investigated two different processes along which one can gain a larger amount of knowledge in a domain: an evolutionary and a common core process. Simulations of model click behavior on difficult information search tasks and subsequent matching with actual behavioral data from users (divided into low and high domain knowledge groups based on a domain knowledge test) were conducted. Results showed that the efficacy of modeling for high domain knowledge participants (in terms of the number of matches between the model predictions and the actual user clicks on search result pages) was higher with the expert semantic space compared to the non-expert semantic space while for low domain knowledge participants it was the other way around. When the process of knowledge acquisition was taken into account, the effect of using a semantic space based on high domain knowledge was significant only for high domain knowledge participants, irrespective of the knowledge acquisition process. The implications of these outcomes for support tools that can be built based on these models are discussed.
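The core prediction step, choosing the link most semantically similar to the information goal in a given semantic space, can be sketched as follows. The vectors are toy values and the actual cognitive model is much richer than this:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def predict_click(goal_vec, link_vecs):
    """Predict the clicked link: the one whose text is most semantically
    similar to the information goal in the chosen semantic space."""
    sims = [cosine(goal_vec, v) for v in link_vecs]
    return int(np.argmax(sims))

# Hypothetical vectors: the same goal and links would be represented
# differently by an expert vs. a non-expert semantic space, possibly
# changing the predicted ranking.
goal = np.array([1.0, 0.2, 0.0])
links = [np.array([0.9, 0.1, 0.1]),   # link about the goal topic
         np.array([0.1, 0.9, 0.2])]   # link about something else
choice = predict_click(goal, links)
```

Matching `choice` against the links users actually clicked, separately for the expert and non-expert spaces, is the comparison the study reports.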


Discourse Processes | 2017

Predicting Word Maturity from Frequency and Semantic Diversity: A Computational Study

Guillermo Jorge-Botana; Ricardo Olmos; Vicente Sanjosé

Semantic word representations change over different ages of childhood until they reach their adult form. One method to formally model this change is the word maturity paradigm. This method uses a text sample for each age, including adulthood, and transforms the samples into a semantic space by means of Latent Semantic Analysis. The representation of a word at every age is then compared with its adult representation via computational maturity indices. The present study used this paradigm to explore the impact of word frequency and semantic diversity on maturity indices. To do this, word maturity indices were extracted from a Spanish incremental corpus and validated using correlations with Age of Acquisition and Word Difficulty indices from previous studies. The results show that both frequency and semantic diversity predict word maturity, but that the predictive capacity of frequency decreases as exposure to language increases. The latter result is discussed in terms of inductive processes suggested in previous studies.

Collaboration


Dive into Guillermo Jorge-Botana's collaborations.

Top Co-Authors


Ricardo Olmos

Autonomous University of Madrid

José A. León

Autonomous University of Madrid

Inmaculada Escudero

National University of Distance Education


José M. Luzón

National University of Distance Education


D. Perry

Polytechnic University of Valencia


Isabel Gómez-Veiga

National University of Distance Education


Mauro Hernández

National University of Distance Education


Miguel Santamaría Lancho

National University of Distance Education
