Daniela Moctezuma
Consejo Nacional de Ciencia y Tecnología
Publication
Featured research published by Daniela Moctezuma.
Expert Systems With Applications | 2017
Eric Sadit Tellez; Sabino Miranda-Jiménez; Mario Graff; Daniela Moctezuma; Oscar S. Siordia; Elio A. Villaseñor
A review of popular techniques to model short texts written in an informal style. An analysis of configurations that produce the top-k sentiment classifiers. The analysis is oriented to the performance in both accuracy and computing time. A simple method to create fast and accurate sentiment analysis systems. Sentiment analysis is a text mining task that determines the polarity of a given text, i.e., its positiveness or negativeness. Recently, it has received a lot of attention given the interest in opinion mining in micro-blogging platforms. These new forms of textual expression present new challenges for text analysis because of the use of slang and orthographic and grammatical errors, among others. Along with these challenges, a practical sentiment classifier should be able to handle large workloads efficiently. The aim of this research is to identify, in a large set of combinations, which text transformations (lemmatization, stemming, entity removal, among others), tokenizers (e.g., word n-grams), and token-weighting schemes have the most impact on the accuracy of a classifier (Support Vector Machine) trained on two Spanish datasets. The methodology used is to exhaustively analyze all combinations of text transformations and their respective parameters to find out what common characteristics the best-performing classifiers have. Furthermore, we introduce a novel approach based on the combination of word-based n-grams and character-based q-grams. The results show that this novel combination of words and characters produces a classifier that outperforms the traditional word-based combination by 11.17% and 5.62% on the INEGI and TASS15 datasets, respectively.
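The combination of word-based n-grams and character-based q-grams mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `w:` and `q:` prefixes, and the default sizes (`ns=(1, 2)`, `qs=(3,)`) are assumptions chosen for illustration only.

```python
from typing import Iterable, List


def word_ngrams(text: str, n: int) -> List[str]:
    """Return word n-grams; the 'w:' prefix (an assumption) keeps them
    distinct from character q-grams in the combined vocabulary."""
    words = text.split()
    return ["w:" + "~".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_qgrams(text: str, q: int) -> List[str]:
    """Return character q-grams over the text with whitespace collapsed.
    Spaces are kept, so q-grams can span word boundaries."""
    normalized = " ".join(text.split())
    return ["q:" + normalized[i:i + q] for i in range(len(normalized) - q + 1)]


def tokenize(text: str, ns: Iterable[int] = (1, 2), qs: Iterable[int] = (3,)) -> List[str]:
    """Concatenate word n-grams and character q-grams into one token list.
    The sizes in ns and qs are illustrative, not values from the paper."""
    tokens: List[str] = []
    for n in ns:
        tokens.extend(word_ngrams(text, n))
    for q in qs:
        tokens.extend(char_qgrams(text, q))
    return tokens


if __name__ == "__main__":
    for token in tokenize("muy buen dia"):
        print(token)
```

The resulting token list would then be weighted and passed to a classifier; character q-grams tend to overlap between a word and its misspelled variants, which is one plausible reason such a combination helps on informal text.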
Pattern Recognition Letters | 2017
Eric Sadit Tellez; Sabino Miranda-Jiménez; Mario Graff; Daniela Moctezuma; Ranyart R. Suárez; Oscar S. Siordia
A framework to create sentiment classifiers in several languages. It produces classifiers robust to typical writing errors found in micro-blogging. It can serve as a baseline for contests. It can serve as a starting point to build new systems. Its performance excels in the majority of the tested datasets. Recently, sentiment analysis has received a lot of attention due to the interest in mining opinions of social media users. Sentiment analysis consists of determining the polarity of a given text, i.e., its degree of positiveness or negativeness. Traditionally, sentiment analysis algorithms have been tailored to a specific language, given the complexity of handling the lexical variations and errors introduced by the people generating content. In this contribution, our aim is to provide a simple-to-implement and easy-to-use multilingual framework that can serve as a baseline for sentiment analysis contests and as a starting point to build new sentiment analysis systems. We compare our approach in eight different languages, three of which correspond to important international contests, namely SemEval (English), TASS (Spanish), and SENTIPOLC (Italian). Within the competitions, our approach reaches medium to high positions in the rankings, whereas in the remaining languages our approach outperforms the reported results.
IEEE International Autumn Meeting on Power Electronics and Computing | 2016
Daniela Fernandez; Daniela Moctezuma; Oscar S. Siordia
Gender classification in social platforms and social media has become a relevant topic for industry because of its impact on the decision-making process. Gender recognition in Twitter is a business intelligence tool focused on Twitter data acquisition, analysis, and processing, and it can be used in many ways to transform that data into valuable business intelligence. In this paper, a method for gender recognition of Twitter users is proposed. This method employs several features related to the user's profile picture, screen name, and profile description. The method was evaluated on a dataset of 574 users acquired through the Twitter API; these users are located in Aguascalientes City, Mexico, and were manually labelled. The experimental results show an accuracy of 89.5%.
Knowledge-Based Systems | 2018
Eric Sadit Tellez; Daniela Moctezuma; Sabino Miranda-Jiménez; Mario Graff
A great variety of text tasks, such as topic or spam identification, user profiling, and sentiment analysis, can be posed as a supervised learning problem and tackled using a text classifier. A text classifier consists of several subprocesses; some are general enough to be applied to any supervised learning problem, whereas others are specifically designed to tackle a particular task using complex and computationally expensive processes such as lemmatization, syntactic analysis, etc. Contrary to traditional approaches, we propose a minimalistic and wide system able to tackle text classification tasks independent of domain and language, namely microTC. It is composed of some easy-to-implement text transformations, text representations, and a supervised learning algorithm. These pieces produce a competitive classifier even in the domain of informally written text. We provide a detailed description of microTC along with an extensive experimental comparison with relevant state-of-the-art methods. microTC was compared on 30 different datasets. Regarding accuracy, microTC obtained the best performance in 20 datasets while achieving competitive results in the remaining 10. The datasets cover several problems, such as topic and polarity classification, spam detection, user profiling, and authorship attribution. Furthermore, it is important to state that our approach allows the usage of the technology even without knowledge of machine learning and natural language processing.
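To give a sense of what "easy-to-implement text transformations" can look like, here is a minimal, language-independent normalization sketch. The abstract does not list microTC's actual transformations, so every step below (lowercasing, the `_url` and `_usr` placeholders, diacritic removal, collapsing repeated characters) is an assumption for illustration, not the system's documented behavior.

```python
import re
import unicodedata

# Hypothetical patterns and placeholder tokens, chosen for illustration.
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated 3+ times


def normalize(text: str) -> str:
    """Apply a few cheap transformations to informally written text."""
    text = text.lower()
    text = URL_RE.sub("_url", text)    # replace links with a placeholder
    text = USER_RE.sub("_usr", text)   # replace user mentions
    # Strip diacritics: decompose, then drop combining marks.
    # Note this also maps 'ñ' to 'n', which may not suit every task.
    decomposed = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    text = REPEAT_RE.sub(r"\1\1", text)  # 'holaaaa' becomes 'holaa'
    return " ".join(text.split())        # collapse whitespace


if __name__ == "__main__":
    print(normalize("¡Holaaaa @Ana! Mira https://example.com/x  qué BIEN"))
```

None of these steps requires a lexicon, stemmer, or parser, which is what makes transformations of this kind cheap and usable across languages.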
IEEE International Autumn Meeting on Power Electronics and Computing | 2016
Isaac David; Oscar S. Siordia; Daniela Moctezuma
Microblogging social networks are easily subverted by automated fake identities that amass disproportionately large influence. In this paper, we present an effort to profile and screen such accounts using existing and original ground truth obtained from the Twitter platform. Seventy-one explanatory properties, extracted solely from profile and timeline information, are evaluated and used to compare the efficacy of common supervised machine learning methods at this classification task. Results confirm that feasible and largely effective detection devices can be constructed for the problem at hand.
CLEF (Working Notes) | 2017
Eric Sadit Tellez; Sabino Miranda-Jiménez; Mario Graff; Daniela Moctezuma
Digital Government Research | 2015
Gabriel Puron-Cid; J. Jaime Sainz Santamaria; Oscar S. Siordia; Daniela Moctezuma
Meeting of the Association for Computational Linguistics | 2017
Sabino Miranda-Jiménez; Mario Graff; Eric Sadit Tellez; Daniela Moctezuma
TASS@SEPLN | 2015
Oscar S. Siordia; Daniela Moctezuma; Mario Graff; Sabino Miranda-Jiménez; Eric Sadit Tellez; Elio-Atenógenes Villaseñor
North American Chapter of the Association for Computational Linguistics | 2018
Mario Graff; Sabino Miranda-Jiménez; Eric Sadit Tellez; Daniela Moctezuma