Mireya Tovar
Benemérita Universidad Autónoma de Puebla
Publication
Featured research published by Mireya Tovar.
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval | 2009
David Pinto; Mireya Tovar; Darnes Vilariño; Beatriz Beltrán; Héctor Jiménez-Salazar; Basilia Campos
The aim of this paper is to use unsupervised classification techniques in order to group the documents of a given huge collection into clusters. We approached this challenge by using a simple clustering algorithm (K-Star) in a recursive clustering process over subsets of the complete collection. The presented approach is a scalable algorithm which may automatically discover the number of clusters. The obtained results outperformed different baselines presented in the INEX 2009 clustering task.
INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval | 2010
Mireya Tovar; Adrián Cruz; Blanca Vázquez; David Pinto; Darnes Vilariño; Azucena Montes
In this paper we propose two iterative clustering methods for grouping the Wikipedia documents of a very large collection into clusters. The recursive method iteratively clusters subsets of the complete collection. In each iteration, we select representative items for each group, which are then used in the next stage of clustering. The presented approaches are scalable algorithms that can be applied to huge collections which would otherwise (for instance, with classic clustering methods) be computationally expensive to cluster. The results obtained outperformed the random baseline presented in the INEX 2010 clustering task of the XML-Mining track.
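The staged process described in this abstract can be illustrated with a minimal two-stage sketch. It is not the authors' implementation: the greedy threshold clusterer, the Jaccard similarity over term sets, the choice of the first item of each group as its representative, and the names `greedy_cluster` and `iterative_cluster` are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(ids, docs, threshold):
    """Assumed stand-in clusterer: put each id in the first cluster whose
    seed (first member) is similar enough, otherwise start a new cluster."""
    clusters = []
    for i in ids:
        for cluster in clusters:
            if jaccard(docs[i], docs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def iterative_cluster(docs, batch_size, threshold):
    """Cluster small batches first, then cluster one representative per group."""
    ids = list(range(len(docs)))

    # Stage 1: cluster each batch of the collection independently.
    groups = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        groups.extend(greedy_cluster(batch, docs, threshold))

    # Stage 2: cluster the representatives (assumed: the seed of each group)
    # and merge the groups whose representatives fall in the same cluster.
    by_rep = {group[0]: group for group in groups}
    merged = []
    for rep_cluster in greedy_cluster(list(by_rep), docs, threshold):
        merged.append(sorted(i for rep in rep_cluster for i in by_rep[rep]))
    return merged


# Toy collection: each document is a set of terms (illustrative data only).
DOCS = [
    {"xml", "retrieval"},
    {"xml", "retrieval", "index"},
    {"soccer", "goal"},
    {"xml", "retrieval", "query"},
    {"soccer", "goal", "team"},
]

print(iterative_cluster(DOCS, batch_size=2, threshold=0.5))
```

Because only one representative per group takes part in the second stage, the number of pairwise comparisons grows with the number of groups rather than with the size of the collection. The number of clusters is not fixed in advance.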
mexican conference on pattern recognition | 2014
Mireya Tovar; David Pinto; Azucena Montes; Gabriel González; Darnes Vilariño; Beatriz Beltrán
In this paper we present an approach for the evaluation of taxonomic relations in restricted-domain ontologies. We use the evidence found in corpora associated with the ontology domain to determine the validity of the taxonomic relations. Our approach employs lexico-syntactic patterns for evaluating taxonomic relations in which the concepts are totally different, and it uses a particular technique based on subsumption for those relations in which one concept is completely included in the other. The integration of these two techniques has allowed us to automatically evaluate taxonomic relations for two restricted-domain ontologies. The performance obtained was about 70% for one ontology of the e-learning domain, and around 88% for the ontology associated with the artificial intelligence domain.
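As a rough illustration of how lexico-syntactic patterns can supply corpus evidence for a taxonomic relation, the sketch below counts sentences that match a few Hearst-style templates for a given (child, parent) pair. The template set, the function name `count_evidence`, and the example sentences are assumptions for illustration, not the patterns used in the paper.

```python
import re

# Assumed Hearst-style templates; "{child}" is the more specific concept,
# "{parent}" the more general one. "s?" tolerates a simple plural.
TEMPLATES = [
    r"{child} is an? {parent}",
    r"{parent}s? such as {child}",
    r"{child} and other {parent}s?",
]


def count_evidence(child, parent, sentences):
    """Count the sentences that match at least one template for the pair."""
    patterns = [
        re.compile(
            r"\b"
            + template.format(child=re.escape(child), parent=re.escape(parent))
            + r"\b",
            re.IGNORECASE,
        )
        for template in TEMPLATES
    ]
    return sum(1 for s in sentences if any(p.search(s) for p in patterns))


# Illustrative sentences standing in for a domain corpus.
SENTENCES = [
    "Clustering algorithms such as k-means group similar documents.",
    "A decision tree is a classifier that splits on attribute values.",
    "The corpus was collected from lecture notes.",
]

print(count_evidence("k-means", "clustering algorithm", SENTENCES))
print(count_evidence("classifier", "decision tree", SENTENCES))
```

A relation could then be accepted when its evidence count reaches some minimum, which is itself a tunable assumption.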
international conference on computational linguistics | 2014
Saul León; Darnes Vilariño; David Pinto; Mireya Tovar; Beatriz Beltrán
The results obtained by the BUAP team at Task 1 of SemEval 2014 are presented in this paper. The submitted run is a supervised system based on two classification models: 1) logistic regression for determining the semantic relatedness between a pair of sentences, and 2) support vector machines for identifying the degree of textual entailment between the two sentences. The system performed much better on the second subtask (textual entailment) than on the first (relatedness), ranking our approach 7th among the 18 teams that participated in the competition.
international conference on computational linguistics | 2014
Darnes Vilariño; David Pinto; Saul León; Mireya Tovar; Beatriz Beltrán
In this paper we present the evaluation of different features for multilingual and cross-level semantic textual similarity. Three different types of features were used: lexical, knowledge-based and corpus-based. The results obtained at the SemEval competition rank our approaches above the average of the rest of the teams, highlighting the usefulness of the features presented in this paper.
mexican conference on pattern recognition | 2012
Darnes Vilariño; David Pinto; Beatriz Beltrán; Saul León; Esteban Castillo; Mireya Tovar
Normalization of SMS is a very important task that must be addressed by the computational community because of the tremendous growth of services based on mobile devices, which make use of this kind of message. The particular writing style of SMS texts places many limitations on their automatic treatment. Beyond the problems that these texts already pose, we are also interested in tasks that require understanding the meaning of documents in different languages, which further increases their complexity. Our approach proposes to normalize SMS texts by employing machine translation techniques. For this purpose, we use a statistical bilingual dictionary calculated on the basis of the IBM-4 model to determine the best translation for a given SMS term. We have compared the presented approach with a traditional probabilistic method of information retrieval, observing that the normalization model proposed here substantially improves the performance of the probabilistic one.
mexican conference on pattern recognition | 2017
Mireya Tovar; David Pinto; Azucena Montes; Gabriel González
In this paper we present an approach for the automatic evaluation of relations in restricted-domain ontologies. We use the evidence found in a corpus associated with the same domain as the ontology to determine the validity of the ontological relations. Our approach employs Latent Semantic Analysis, a technique based on the principle that words occurring in the same context tend to have semantic relationships. The approach uses two variants for evaluating the semantic relations and concepts of the target ontologies. The performance obtained was about 70% for class-inclusion relations and 78% for non-taxonomic relations.
mexican conference on pattern recognition | 2015
Mireya Tovar; David Pinto; Azucena Montes; Gabriel Serna; Darnes Vilariño
In this paper we present an approach for the automatic identification of relations in restricted-domain ontologies. We use the evidence found in a corpus associated with the same domain as the ontology to determine the validity of the ontological relations. Our approach employs formal concept analysis, a method for the analysis of data that is used here for discovering relations in a restricted-domain corpus. The approach uses two variants for filling the incidence matrix that this method employs. The formal concepts are used for evaluating the ontological relations of two ontologies. For the first ontology, the performance obtained was about 96% for taxonomic relations and 100% for non-taxonomic relations. For the second, it was about 92% for taxonomic relations and 98% for non-taxonomic relations.
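To make the notions of incidence matrix and formal concept concrete, here is a minimal sketch that enumerates all formal concepts of a tiny context by brute force. The objects, attributes and the function name `formal_concepts` are hypothetical, and the way the paper fills the incidence matrix is not reproduced here.

```python
from itertools import combinations


def formal_concepts(context):
    """Return every formal concept (extent, intent) of a formal context.

    `context` maps each object to the set of attributes it has, which is
    the incidence matrix stored row by row. Brute force over all subsets
    of objects, so suitable for tiny examples only.
    """
    objects = list(context)
    all_attrs = set().union(*context.values())

    def intent(objs):
        # Attributes shared by every object in objs.
        shared = set(all_attrs)
        for o in objs:
            shared &= context[o]
        return frozenset(shared)

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            b = intent(subset)
            concepts.add((extent(b), b))
    return concepts


# Hypothetical context: terms as objects, verbs they occur with as attributes.
CONTEXT = {
    "neural network": {"learns", "classifies"},
    "decision tree": {"classifies"},
    "k-means": {"clusters"},
}

for ext, itt in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(itt))
```

In this toy context the concept with extent {"neural network"} lies below the concept with extent {"decision tree", "neural network"} in the concept lattice. That ordering is the kind of structure that can be compared against the relations of an ontology.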
mexican conference on pattern recognition | 2013
Mireya Tovar; David Pinto; Azucena Montes; Darnes Vilariño
Measuring the degree of semantic similarity for word pairs is a very challenging task that has been addressed by the computational linguistics community in recent years. In this paper, we propose a method for evaluating input word pairs in order to measure the degree of semantic similarity. This unsupervised method uses a prototype vector calculated on the basis of word pair representative vectors, which are constructed by using snippets automatically gathered from the World Wide Web.
mexican conference on pattern recognition | 2010
David Pinto; Darnes Vilariño; Carlos Balderas; Mireya Tovar; Beatriz Beltrán
Word Sense Disambiguation (WSD) is considered one of the most important problems in Natural Language Processing [1]. It is claimed that WSD is essential for applications that require language comprehension modules, such as search engines, machine translation systems, automatic answer machines, second life agents, etc. Moreover, with the huge amount of information on the Internet and the fact that this information is continuously growing in different languages, we are encouraged to deal with cross-lingual scenarios where WSD systems are also needed. On the other hand, Lexical Substitution (LS) refers to the process of finding a substitute word for a source word in a given sentence. The LS task needs to be approached by first disambiguating the source word; therefore, these two tasks (WSD and LS) are related. In this paper, we present a naive approach to tackle the problem of cross-lingual WSD and cross-lingual lexical substitution. We use a bilingual statistical dictionary, which is calculated with Giza++ by using the EUROPARL parallel corpus, in order to calculate the probability of a source word being translated to a target word (which is assumed to be the correct sense of the source word but in a different language). Two versions of the probabilistic model are tested: unweighted and weighted. The results were compared with those of an international competition, obtaining a good performance.
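The dictionary lookup at the core of this approach can be sketched as follows. This is a simplified illustration only: the table layout, the function name `best_translation` and the probability values are invented for the example, and neither the weighted nor the unweighted variant of the paper's model is reproduced.

```python
def best_translation(source_word, dictionary):
    """Return the target word with the highest translation probability.

    `dictionary` maps a source word to a {target_word: probability} table,
    as a statistical bilingual dictionary would. Returns None when the
    source word is not covered.
    """
    candidates = dictionary.get(source_word, {})
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy English-to-Spanish table with made-up probabilities.
DICTIONARY = {
    "bank": {"banco": 0.7, "orilla": 0.2, "ribera": 0.1},
}

print(best_translation("bank", DICTIONARY))
print(best_translation("river", DICTIONARY))
```

Choosing the single most probable target word ignores the sentence context, which is why such a lookup is only a starting point for disambiguation.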