Héctor Jiménez-Salazar
Universidad Autónoma Metropolitana
Publication
Featured research published by Héctor Jiménez-Salazar.
The Computer Journal | 2011
David Pinto; Paolo Rosso; Héctor Jiménez-Salazar
Clustering narrow-domain short texts is considered a complex task because of the intrinsic features of the corpus to be clustered: (i) the low frequencies of vocabulary terms in short texts, and (ii) the high vocabulary overlap associated with narrow domains. The aim of this paper is to introduce a self-term expansion methodology for improving the performance of clustering methods when dealing with corpora of this kind. This methodology allows raw textual data to be enriched by adding co-related terms from an automatically constructed lexical knowledge resource obtained from the same target data set (and not from an external resource). We also propose a set of supervised and unsupervised text assessment measures for evaluating different corpus features, such as shortness, stylometry and domain broadness. With the help of these measures, we may determine beforehand whether or not to use the methodology proposed in this paper. Finally, we integrate all these assessment measures in a freely available web-based system named Watermarking Corpora On-line System, which may be used by computer scientists to evaluate the different features associated with a given textual corpus.
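The idea of enriching each text with co-related terms drawn from the same corpus can be illustrated with a minimal sketch. The abstract does not say how term relatedness is scored, so the document-level co-occurrence count and the `min_count` threshold below are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(docs, min_count=2):
    """Map each term to the terms it co-occurs with in at least
    `min_count` documents (an assumed relatedness criterion)."""
    pair_counts = Counter()
    for doc in docs:
        # Count each unordered pair once per document.
        pair_counts.update(combinations(sorted(set(doc)), 2))
    related = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            related[a].add(b)
            related[b].add(a)
    return related


def expand(doc, related):
    """Return the document with the related terms of each token appended."""
    expanded = list(doc)
    for term in doc:
        expanded.extend(sorted(related.get(term, ())))
    return expanded


corpus = [
    ["protein", "gene"],
    ["protein", "gene", "cell"],
    ["cell", "tissue"],
]
related = build_cooccurrence(corpus, min_count=2)
print(expand(["protein"], related))  # ['protein', 'gene']
```

The relatedness resource is built only from `corpus` itself, which mirrors the "same target data set" constraint stated in the abstract.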
text speech and dialogue | 2012
David Pinto; Darnes Vilariño; Yuridiana Alemán; Helena Gómez; Nahun Loya; Héctor Jiménez-Salazar
The growing use of information technologies such as mobile devices has had a major social and technological impact, including the spread of Short Message Services (SMS), a communication system broadly used by cellular phone users. In 2011, it was estimated that over 5.6 billion mobile phones were sending between 30 and 40 SMS messages per month. This makes it important to analyze representation and normalization techniques for this kind of text. In this paper we show an adaptation of the Soundex phonetic algorithm for representing SMS texts. We use the modified version of the Soundex algorithm for codifying SMS, and we evaluate the presented algorithm by measuring the degree of similarity between two codified texts: one originally written in natural language, and the other originally written in SMS “sub-language”. Our main contribution is an improvement of the Soundex algorithm that raises the level of similarity between SMS texts and their corresponding texts in English or Spanish.
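For orientation, here is a minimal sketch of the classic (American) Soundex code that the paper takes as its starting point. It is not the modified version proposed by the authors, whose details are not given in the abstract. The `encode_text` helper and the example words are assumptions added for illustration.

```python
import string

# Classic Soundex digit groups; vowels, y, h and w carry no digit.
_DIGITS = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in group:
        _DIGITS[letter] = digit


def soundex(word):
    """Return the four-character classic Soundex code of `word`."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal digits
        digit = _DIGITS.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit counts again
    return code.ljust(4, "0")


def encode_text(text):
    """Encode every whitespace-separated token, dropping empty codes."""
    return [c for c in (soundex(tok) for tok in text.split()) if c]


print(soundex("tomorrow"), soundex("tmrw"))  # T560 T560
print(soundex("message"), soundex("msg"))    # M220 M200
```

In this sketch "tmrw" and "tomorrow" receive the same code, while "msg" and "message" do not, which shows that the unmodified algorithm only sometimes maps an SMS spelling to the code of its standard form.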
applications of natural language to data bases | 2009
David Pinto; Paolo Rosso; Héctor Jiménez-Salazar
Classifier-independent measures are important for assessing the quality of corpora. In this paper we present supervised and unsupervised measures for analysing several data collections with respect to the following features: domain broadness, shortness, class imbalance, and stylometry. We found that the investigated assessment measures may make it possible to evaluate the quality of gold standards. Moreover, they could also help classification systems make strategic decisions when tackling specific text collections.
mexican international conference on artificial intelligence | 2014
Gabriela Ramírez-de-la-Rosa; Esaú Villatoro-Tello; Héctor Jiménez-Salazar; Christian Sánchez-Sánchez
Online communities are filled with comments from loyal readers and first-time viewers, who are constantly creating and sharing information at an unprecedented level, resulting in millions of messages containing the opinions, ideas, needs and beliefs of Internet users. Businesses are therefore very interested in finding influential users and encouraging them to create positive influence. Influential users are users with the ability to influence individuals' attitudes in a desired way with relative frequency. We present an empirical analysis of the problem of identifying influential users on Twitter. Our proposed approach assumes that the level of influence of users can be detected from their communication patterns, by means of particular writing style features as well as behavioral features. Experiments performed on more than 7000 user profiles indicate that it is possible to automatically identify influential users among the members of a social networking community, and that the approach obtains competitive results against several state-of-the-art methods.
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval | 2009
David Pinto; Mireya Tovar; Darnes Vilariño; Beatriz Beltrán; Héctor Jiménez-Salazar; Basilia Campos
The aim of this paper is to use unsupervised classification techniques in order to group the documents of a given huge collection into clusters. We approached this challenge by using a simple clustering algorithm (K-Star) in a recursive clustering process over subsets of the complete collection. The presented approach is a scalable algorithm which may automatically discover the number of clusters. The obtained results outperformed different baselines presented in the INEX 2009 clustering task.
cross language evaluation forum | 2006
Franco Rojas; Héctor Jiménez-Salazar; David Pinto
Nowadays, cross-lingual Information Retrieval (IR) is one of the greatest challenges to deal with. Moreover, one of the most important issues in IR is the reduction of the corpus vocabulary. In real situations, some IR methods, such as the well-known vector space model, require the term space to be reduced. In this work, we have considered a vocabulary reduction process based on the selection of mid-frequency terms. Our approach enhances precision, but in order to obtain better recall, we have conducted an enrichment process based on the addition of co-occurrence terms. By using this approach, we have obtained an improvement of 40%, using the BiEnEs topics of the WebCLEF 2005 task. The results obtained in the current mixed monolingual task of WebCLEF 2006 have shown that the text enrichment must be done before the vocabulary reduction process in order to get the best performance.
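Vocabulary reduction by mid-frequency term selection can be sketched as a simple frequency band filter. The abstract does not state how the band is chosen, so the explicit `low` and `high` bounds, the whitespace tokenization, and the function names here are assumptions for illustration only.

```python
from collections import Counter


def mid_frequency_terms(docs, low, high):
    """Return the terms whose corpus frequency lies in [low, high]."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    return {term for term, n in counts.items() if low <= n <= high}


def reduce_doc(doc, vocab):
    """Keep only the tokens of `doc` that belong to the reduced vocabulary."""
    return [tok for tok in doc.lower().split() if tok in vocab]


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat saw the dog",
]
vocab = mid_frequency_terms(docs, low=2, high=3)
print(sorted(vocab))               # ['cat', 'dog', 'on', 'sat']
print(reduce_doc(docs[0], vocab))  # ['cat', 'sat', 'on']
```

Very frequent terms ("the") and terms seen only once ("mat", "log", "saw") fall outside the band and are dropped from the term space.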
mexican international conference on artificial intelligence | 2015
Gabriela Ramírez-de-la-Rosa; Verónica Reyes-Meza; Esaú Villatoro-Tello; Héctor Jiménez-Salazar; Manuel Montes-y-Gómez; Luis Villaseñor-Pineda
Psychologists have long theorized about the effects of birth order on intellectual development and verbal abilities. Several studies within the field of psychology have tried to prove such theories; however, no concrete evidence has been found yet. Therefore, in this paper we present an empirical analysis of the pertinence of traditional Author Profiling techniques. To this end, we reformulate the problem of identifying developed language abilities in firstborns as a classification problem. In particular, we measure the importance of lexical and syntactic features extracted from a set of 129 speech transcriptions, which were gathered from videos of approximately three minutes in length each. The results obtained indicate that both bag of words n-grams and bag of part-of-speech n-grams provide useful information for accurately characterizing the language properties employed by firstborns and later-borns. Consequently, our experiments helped to validate the presence of distinct language abilities among firstborns and later-borns.
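The two feature families named above, word n-grams and part-of-speech n-grams, can be extracted with the same counting routine. The sketch below assumes the transcription is already tokenized and tagged; the tag names, the `w:`/`p:` key prefixes, and the function names are hypothetical and not taken from the paper.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the contiguous n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def features(words, tags, n):
    """Merge word and POS n-gram counts into one bag-of-features dict."""
    bag = {}
    for gram, count in ngram_counts(words, n).items():
        bag["w:" + " ".join(gram)] = count
    for gram, count in ngram_counts(tags, n).items():
        bag["p:" + " ".join(gram)] = count
    return bag


words = ["i", "really", "like", "cats"]
tags = ["PRON", "ADV", "VERB", "NOUN"]  # assumed tags for illustration
print(features(words, tags, n=2))
```

A dictionary of this kind can be passed to any standard classifier once it is converted to a fixed-length vector.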
meeting of the association for computational linguistics | 2007
David Pinto; Paolo Rosso; Héctor Jiménez-Salazar
CLEF (Working Notes) | 2013
Christian Sánchez-Sánchez; Héctor Jiménez-Salazar; Wulfrano Arturo Luna-Ramírez
fire workshops | 2015
Aarón Ramírez-de-la-Cruz; Gabriela Ramírez-de-la-Rosa; Christian Sánchez-Sánchez; Héctor Jiménez-Salazar; Carlos Rodríguez-Lucatero; Wulfrano Arturo Luna-Ramírez