Darnes Vilariño

Benemérita Universidad Autónoma de Puebla

Publication

Featured research published by Darnes Vilariño.

Pattern Recognition Letters | 2014

A graph-based multi-level linguistic representation for document understanding

David Pinto; Helena Gómez-Adorno; Darnes Vilariño; Vivek Singh

We proposed a graph-based representation that considers multiple linguistic levels. We introduced MinText, a technique useful for extracting features from the graph. We presented a case study for analyzing the performance of the proposed methods. Document understanding requires discovering meaningful patterns in text, which in turn requires analyzing documents and extracting information useful for a given purpose. The documents to be analyzed must be represented in some way, and different representations of the same piece of text may lead to different information extraction outcomes. Therefore, it is very important to propose a reliable text representation schema that incorporates as many features as possible while still permitting the use of efficient document understanding algorithms. In this paper, we propose a graph-based representation of textual documents that employs different levels of formal representation of natural language. This schema takes into account different linguistic levels, such as the lexical, morphological, syntactic and semantic levels. The proposed representation schema is accompanied by a technique for extracting useful text patterns based on the idea of minimum paths in the graph. The efficiency of the proposed representation schema has been tested in one case study (Question Answering for Machine Reading Evaluation - QA4MRE), and the results of the experiments carried out are described. The results obtained show that the proposed graph-based multi-level linguistic representation schema may be successfully used in the broader framework of document understanding.

Text, Speech and Dialogue | 2012

The Soundex Phonetic Algorithm Revisited for SMS Text Representation

David Pinto; Darnes Vilariño; Yuridiana Alemán; Helena Gómez; Nahun Loya; Héctor Jiménez-Salazar

The growing use of information technologies such as mobile devices has had a major social and technological impact, including the growing use of the Short Message Service (SMS), a communication system broadly used by cellular phone users. In 2011, it was estimated that over 5.6 billion mobile phones were sending between 30 and 40 SMS messages per month. Hence the great importance of analyzing representation and normalization techniques for this kind of text. In this paper we show an adaptation of the Soundex phonetic algorithm for representing SMS texts. We use the modified version of the Soundex algorithm for codifying SMS, and we evaluate the presented algorithm by measuring the degree of similarity between two codified texts: one originally written in natural language, and the other originally written in the SMS “sub-language”. Our main contribution is an improvement of the Soundex algorithm that raises the level of similarity between SMS texts and their corresponding texts in English or Spanish.
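
For readers unfamiliar with Soundex, the following is a minimal sketch of the classic (American) Soundex code, not the modified version proposed in the paper, whose details the abstract does not give. The names `CODES` and `soundex` are illustrative assumptions.

```python
# Classic American Soundex: first letter plus up to three digits.
# This is NOT the authors' SMS adaptation; it only shows the baseline idea.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of `word` ('' if it has no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        digit = CODES.get(ch, "")  # vowels (and uncoded letters) map to ""
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets `prev`, so a repeated code is kept
    return (code + "000")[:4]


print(soundex("txt"), soundex("text"))       # both T230
print(soundex("tmrw"), soundex("tomorrow"))  # both T560
```

Because vowels are dropped, some vowel-less SMS abbreviations already receive the same code as the full word. Forms that mix digits and letters, such as "2nite", are not handled by the classic algorithm.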

Sensors | 2016

Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs

Helena Gómez-Adorno; Grigori Sidorov; David Pinto; Darnes Vilariño; Alexander F. Gelbukh

We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation allows integrating different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest-path walks over integrated syntactic graphs and apply them to determine the authors of documents. On average, our method outperforms state-of-the-art approaches and, unlike existing methods, gives consistently high results across different corpora. Our results show that our textual patterns are useful for the task of authorship attribution.
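
The idea of walking shortest paths over a graph built from text can be illustrated with a toy example. The sketch below is only an assumption about the general shape of such a step: the graph, its node names, and the function `shortest_path` are hypothetical, and the abstract does not specify how the integrated syntactic graph is built or which features are read off each path.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search for a shortest path in an unweighted directed graph.

    `graph` maps each node to a list of successor nodes.
    Returns the list of nodes from `source` to `target`, or None if unreachable.
    """
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical dependency-like graph for "the cat sat on the mat".
toy_graph = {
    "ROOT": ["sat"],
    "sat": ["cat", "mat"],
    "cat": ["the"],
    "mat": ["on", "the"],
}

print(shortest_path(toy_graph, "ROOT", "on"))
```

In a feature-extraction setting, the sequence of nodes (or the labels attached to them) along each such path could then be counted as a textual pattern.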

INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval | 2009

BUAP: performance of K-Star at the INEX'09 clustering task

David Pinto; Mireya Tovar; Darnes Vilariño; Beatriz Beltrán; Héctor Jiménez-Salazar; Basilia Campos

The aim of this paper is to use unsupervised classification techniques in order to group the documents of a given huge collection into clusters. We approached this challenge by using a simple clustering algorithm (K-Star) in a recursive clustering process over subsets of the complete collection. The presented approach is a scalable algorithm which may automatically discover the number of clusters. The obtained results outperformed different baselines presented in the INEX 2009 clustering task.

Mexican Conference on Pattern Recognition | 2013

A Question Answering System for Reading Comprehension Tests

Helena Gómez-Adorno; David Pinto; Darnes Vilariño

This paper presents a methodology for tackling the problem of question answering for reading comprehension tests. The implemented system accepts a document as input and answers multiple-choice questions about it. It uses the Lucene information retrieval engine to carry out information extraction, employing additional automated linguistic processing such as stemming, anaphora resolution and part-of-speech tagging. The proposed approach validates the answers by comparing the text retrieved by Lucene for each question with its candidate answers. For this purpose, a validation based on textual entailment is executed. We evaluated the quality of the proposed methodology in experiments on two corpora widely used in international forums. The obtained results show that the proposed system selects the correct answer to a given question in 33-37% of cases, a result that exceeds the average of all the runs submitted to the QA4MRE task of CLEF 2011 and 2012.

INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval | 2010

An iterative clustering method for the XML-mining task of the INEX 2010

Mireya Tovar; Adrián Cruz; Blanca Vázquez; David Pinto; Darnes Vilariño; Azucena Montes

In this paper we propose two iterative clustering methods for grouping the Wikipedia documents of a given huge collection into clusters. The recursive method iteratively clusters subsets of the complete collection. In each iteration, we select representative items for each group, which are then used in the next stage of clustering. The presented approaches are scalable algorithms that may be used with huge collections that would otherwise (for instance, using classic clustering methods) be computationally expensive to cluster. The obtained results outperformed the random baseline presented in the INEX 2010 clustering task of the XML-Mining track.

North American Chapter of the Association for Computational Linguistics | 2015

UDLAP: Sentiment Analysis Using a Graph-Based Representation

Esteban Castillo; Ofelia Cervantes; Darnes Vilariño; David Báez; J. Alfredo Sánchez

We present an approach for tackling the Sentiment Analysis problem in SemEval 2015. The approach is based on the use of a co-occurrence graph to represent existing relationships among terms in a document, with the aim of using centrality measures to extract the most representative words that express the sentiment. These words are then used as features in a supervised learning algorithm to obtain the polarity of unknown documents. The best results obtained for the different datasets are: 77.76% for positive, 100% for negative and 68.04% for neutral, showing that the proposed graph-based representation could be a way of extracting terms that are relevant for detecting a sentiment.
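
The graph step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say which window size or which centrality measures were used, so the sliding window, the choice of degree centrality, and the names `cooccurrence_graph`, `degree_centrality` and `top_terms` are all hypothetical.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=1):
    """Link each token to the `window` tokens that follow it (undirected)."""
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1:i + 1 + window]:
            if other != term:  # ignore self-loops
                graph[term].add(other)
                graph[other].add(term)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {term: 0.0 for term in graph}
    return {term: len(neighbors) / (n - 1) for term, neighbors in graph.items()}


def top_terms(tokens, k=3, window=1):
    """Return the k most central terms; ties are broken alphabetically."""
    centrality = degree_centrality(cooccurrence_graph(tokens, window))
    ranked = sorted(centrality, key=lambda term: (-centrality[term], term))
    return ranked[:k]


tokens = "the movie was great and the cast was great".split()
print(top_terms(tokens, k=2))
```

In practice stop words such as "the" would normally be filtered out before building the graph, so that the highest-ranked terms are content words that can serve as classifier features.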

International Conference on Electronics, Communications, and Computers | 2015

Author attribution using a graph based representation

Esteban Castillo; Darnes Vilariño; Ofelia Cervantes; David Pinto

Authorship attribution is the task of determining the real author of a given anonymous document. Even though different approaches exist in the literature, this problem has rarely been addressed using document representations that employ graph structures. In fact, most research works in the literature handle this problem by employing simple sequences of n words (n-grams), such as bigrams and trigrams. In this paper, we explore the use of graphs for representing document sentences. These structures are used in experiments on the problem of authorship attribution. The experiments presented here attain approximately 79% accuracy, showing that the graph-based representation could be a way of encapsulating various levels of natural language description in a simple structure.

Mexican Conference on Pattern Recognition | 2014

Use of Lexico-Syntactic Patterns for the Evaluation of Taxonomic Relations

Mireya Tovar; David Pinto; Azucena Montes; Gabriel González; Darnes Vilariño; Beatriz Beltrán

In this paper we present an approach for the evaluation of taxonomic relations in restricted-domain ontologies. We use the evidence found in corpora associated with the ontology domain to determine the validity of the taxonomic relations. Our approach employs lexico-syntactic patterns for evaluating taxonomic relations in which the concepts are totally different, and it uses a particular technique based on subsumption for those relations in which one concept is completely included in the other. The integration of these two techniques has allowed us to automatically evaluate taxonomic relations for two restricted-domain ontologies. The performance obtained was about 70% for one ontology of the e-learning domain, whereas we obtained around 88% for the ontology associated with the artificial intelligence domain.
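
To make the notion of a lexico-syntactic pattern concrete, here is a small sketch that searches a corpus for evidence of "is-a" relations. The abstract does not list the patterns used, so the three Hearst-style patterns below, the single-word concept restriction, and the names `PATTERNS`, `find_evidence` and `supports` are assumptions for illustration only.

```python
import re

# Hypothetical Hearst-style patterns; `hypo` is the narrower concept,
# `hyper` the broader one. Concepts are limited to single words here.
PATTERNS = [
    re.compile(r"\b(?P<hyper>\w+) such as (?P<hypo>\w+)", re.IGNORECASE),
    re.compile(r"\b(?P<hypo>\w+) is an? (?P<hyper>\w+)", re.IGNORECASE),
    re.compile(r"\b(?P<hypo>\w+) and other (?P<hyper>\w+)", re.IGNORECASE),
]


def find_evidence(text):
    """Return the set of (hyponym, hypernym) pairs matched in `text`."""
    pairs = set()
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            pairs.add((match.group("hypo").lower(), match.group("hyper").lower()))
    return pairs


def supports(text, hypo, hyper):
    """True if the corpus text contains pattern evidence for `hypo` is-a `hyper`."""
    return (hypo.lower(), hyper.lower()) in find_evidence(text)


corpus = ("Algorithms such as backpropagation are widely used. "
          "A perceptron is a classifier.")
print(sorted(find_evidence(corpus)))
```

A taxonomic relation from the ontology would count as supported when at least one pattern instance for that concept pair occurs in the domain corpus.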

International Conference on Computational Linguistics | 2014

BUAP: Evaluating Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment

Saul León; Darnes Vilariño; David Pinto; Mireya Tovar; Beatriz Beltrán

This paper presents the results obtained by the BUAP team at Task 1 of SemEval 2014. The submitted run is a supervised approach based on two classification models: 1) logistic regression for determining the semantic relatedness between a pair of sentences, and 2) support vector machines for identifying the degree of textual entailment between the two sentences. Performance on the second subtask (textual entailment) was much better than on the first subtask (relatedness), ranking our approach 7th among the 18 teams that participated in the competition.
