Daniela Moctezuma

Consejo Nacional de Ciencia y Tecnología

Publication

Featured research published by Daniela Moctezuma.

Expert Systems with Applications | 2017

A case study of Spanish text transformations for twitter sentiment analysis

Eric Sadit Tellez; Sabino Miranda-Jiménez; Mario Graff; Daniela Moctezuma; Oscar S. Siordia; Elio A. Villaseñor

A review of popular techniques to model short texts written in an informal style. An analysis of configurations that produce the top-k sentiment classifiers. The analysis considers performance in terms of both accuracy and computing time. A simple method to create fast and accurate sentiment analysis systems.

Sentiment analysis is a text mining task that determines the polarity of a given text, i.e., its positiveness or negativeness. Recently, it has received a lot of attention given the interest in opinion mining on micro-blogging platforms. These new forms of textual expression present new challenges for text analysis because of the use of slang and orthographic and grammatical errors, among others. Along with these challenges, a practical sentiment classifier should be able to efficiently handle large workloads. The aim of this research is to identify, among a large set of combinations, which text transformations (lemmatization, stemming, entity removal, among others), tokenizers (e.g., word n-grams), and token-weighting schemes have the most impact on the accuracy of a classifier (Support Vector Machine) trained on two Spanish datasets. The methodology is to exhaustively analyze all combinations of text transformations and their respective parameters to find out which characteristics the best-performing classifiers have in common. Furthermore, we introduce a novel approach based on the combination of word-based n-grams and character-based q-grams. The results show that this novel combination of words and characters produces a classifier that outperforms the traditional word-based combination by 11.17% and 5.62% on the INEGI and TASS15 datasets, respectively.
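The combination of word-based n-grams and character-based q-grams mentioned above can be illustrated with a minimal Python sketch. The function names, the `w`/`c` prefixes used to keep the two token families apart, and the default values of n and q are illustrative assumptions, not the configuration reported in the paper; token weighting (e.g., TF-IDF) and the SVM classifier are left out.

```python
from collections import Counter


def word_ngrams(text: str, n: int) -> list[str]:
    """Contiguous sequences of n whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_qgrams(text: str, q: int) -> list[str]:
    """Contiguous sequences of q characters, spaces included."""
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def combined_tokens(text: str, word_ns=(1, 2), char_qs=(3, 4)) -> Counter:
    """Bag of word n-grams plus character q-grams (illustrative defaults).

    Prefixes such as "w1:" or "c3:" keep a word token from colliding
    with a character token that happens to have the same spelling.
    """
    text = text.lower()
    bag = Counter()
    for n in word_ns:
        bag.update(f"w{n}:{g}" for g in word_ngrams(text, n))
    for q in char_qs:
        bag.update(f"c{q}:{g}" for g in char_qgrams(text, q))
    return bag
```

For example, `combined_tokens("muy buena peli", word_ns=(1,), char_qs=(3,))` yields the three words plus twelve character trigrams. The character q-grams still overlap when a word is misspelled or elongated, which is one plausible reason such a mix helps on informal text.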

Pattern Recognition Letters | 2017

A simple approach to multilingual polarity classification in Twitter

Eric Sadit Tellez; Sabino Miranda-Jiménez; Mario Graff; Daniela Moctezuma; Ranyart R. Suárez; Oscar S. Siordia

A framework to create sentiment classifiers in several languages. It produces classifiers robust to typical writing errors found in micro-blogging. It can serve as a baseline for contests. It can serve as a starting point to build new systems. Its performance excels in the majority of the tested datasets.

Recently, sentiment analysis has received a lot of attention due to the interest in mining opinions of social media users. Sentiment analysis consists of determining the polarity of a given text, i.e., its degree of positiveness or negativeness. Traditionally, sentiment analysis algorithms have been tailored to a specific language, given the complexity of handling the many lexical variations and errors introduced by the people generating content. In this contribution, our aim is to provide a simple-to-implement and easy-to-use multilingual framework that can serve as a baseline for sentiment analysis contests and as a starting point to build new sentiment analysis systems. We compare our approach on eight different languages, three of which correspond to important international contests, namely SemEval (English), TASS (Spanish), and SENTIPOLC (Italian). Within the competitions, our approach reaches medium to high positions in the rankings, whereas in the remaining languages it outperforms the reported results.
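One common way to make a classifier tolerant of the micro-blogging writing errors described above is to normalize text before tokenizing it. The sketch below is an assumption about what such a language-independent step might look like, not the framework's documented pipeline: the `_url`/`_usr` placeholder tokens, the rule that collapses letters repeated three or more times, and the function names are all illustrative choices.

```python
import re
import unicodedata

# Illustrative patterns; a real system would likely tune these per platform.
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(\w)\1{2,}")  # a character followed by 2+ copies of itself


def strip_diacritics(text: str) -> str:
    """Decompose accented letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase, mask URLs/users, fold accents, shorten elongations."""
    text = text.lower()
    text = URL_RE.sub("_url", text)
    text = USER_RE.sub("_usr", text)
    text = strip_diacritics(text)
    text = REPEAT_RE.sub(r"\1\1", text)  # "buenoooo" -> "buenoo"
    return " ".join(text.split())        # collapse runs of whitespace
```

With these rules, `normalize("¡Qué BUENOOOO día @ana http://t.co/x")` becomes `"¡que buenoo dia _usr _url"`. None of the steps depend on a lexicon or a language-specific tool, which fits the goal of a framework that works across several languages.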

IEEE International Autumn Meeting on Power Electronics and Computing | 2016

Features combination for gender recognition on Twitter users

Daniela Fernandez; Daniela Moctezuma; Oscar S. Siordia

Gender classification on social platforms and social media has become a relevant topic for industry because of its impact on decision-making processes. Gender recognition on Twitter is a business intelligence tool focused on Twitter data acquisition, analysis, and processing, and it can be used in many ways to turn that data into valuable business intelligence. In this paper, a method for gender recognition of Twitter users is proposed. This method employs several features related to the user's profile picture, screen name, and profile description. The method was evaluated on a dataset of 574 users collected through the Twitter API; these users are located in Aguascalientes City, Mexico, and were manually labelled. The experimental results show an accuracy of 89.5%.
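The abstract combines features from three views of a user: profile picture, screen name, and profile description. A minimal sketch of that kind of combination, assuming simple early fusion by concatenation, is shown below. The specific screen-name cues, the gendered word lists, and the idea that picture features arrive as a precomputed vector from some external image model are hypothetical and not taken from the paper.

```python
def screen_name_features(screen_name: str) -> list[float]:
    """Hypothetical cues: length, share of digits, share of uppercase letters."""
    n = max(len(screen_name), 1)
    return [
        float(len(screen_name)),
        sum(c.isdigit() for c in screen_name) / n,
        sum(c.isupper() for c in screen_name) / n,
    ]


def description_features(description: str, female_terms: set[str], male_terms: set[str]) -> list[float]:
    """Counts of words found in two hypothetical gendered lexicons."""
    words = description.lower().split()
    return [
        float(sum(w in female_terms for w in words)),
        float(sum(w in male_terms for w in words)),
    ]


def combine_features(*views: list[float]) -> list[float]:
    """Early fusion: concatenate per-view vectors into a single feature vector."""
    return [x for view in views for x in view]
```

As an illustration, `combine_features(screen_name_features("Ana_92"), description_features("Mamá y maestra", {"mamá", "maestra"}, {"papá", "maestro"}), [0.8])` produces a six-element vector, where the trailing `0.8` stands in for an assumed picture-based score. Any standard classifier could then be trained on such vectors.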

Knowledge-Based Systems | 2018

An automated text categorization framework based on hyperparameter optimization

Eric Sadit Tellez; Daniela Moctezuma; Sabino Miranda-Jiménez; Mario Graff

A great variety of text tasks, such as topic or spam identification, user profiling, and sentiment analysis, can be posed as a supervised learning problem and tackled using a text classifier. A text classifier consists of several subprocesses; some of them are general enough to be applied to any supervised learning problem, whereas others are specifically designed to tackle a particular task using complex and computationally expensive processes such as lemmatization, syntactic analysis, etc. Contrary to traditional approaches, we propose a minimalistic, broadly applicable system able to tackle text classification tasks independently of domain and language, namely microTC. It is composed of some easy-to-implement text transformations, text representations, and a supervised learning algorithm. These pieces produce a competitive classifier even in the domain of informally written text. We provide a detailed description of microTC along with an extensive experimental comparison with relevant state-of-the-art methods. microTC was compared on 30 different datasets. Regarding accuracy, microTC obtained the best performance in 20 datasets while achieving competitive results in the remaining 10. The compared datasets include several problems such as topic and polarity classification, spam detection, user profiling, and authorship attribution. Furthermore, our approach allows the technology to be used even without knowledge of machine learning and natural language processing.
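The title frames microTC as hyperparameter optimization over a pipeline of text transformations, a text representation, and a supervised learner. As a rough illustration only, the sketch below runs a plain random search over a small configuration space. The keys and values in `SEARCH_SPACE` are hypothetical and are not microTC's actual option names, and the `score` callable stands in for something like the cross-validated accuracy of a classifier built with a given configuration. microTC's own optimizer may differ from this simple search.

```python
import random
from typing import Callable

# Hypothetical configuration space; names and values are illustrative only.
SEARCH_SPACE = {
    "lowercase": [True, False],
    "del_diacritics": [True, False],
    "word_ngrams": [(1,), (1, 2), (1, 2, 3)],
    "char_qgrams": [(), (3,), (2, 3, 4)],
    "weighting": ["tf", "tfidf"],
}


def sample_config(space: dict, rng: random.Random) -> dict:
    """Pick one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search(space: dict, score: Callable[[dict], float],
                  n_trials: int = 32, seed: int = 0) -> tuple[dict, float]:
    """Evaluate n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(space, rng)
        s = score(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score
```

In practice `score` would build the pipeline described by `cfg`, train it, and return a validation metric. Keeping the scorer as a parameter separates the search strategy from the classifier, so the same loop could be swapped for exhaustive or local search without touching the pipeline code.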

IEEE International Autumn Meeting on Power Electronics and Computing | 2016

Features combination for the detection of malicious Twitter accounts

Isaac David; Oscar S. Siordia; Daniela Moctezuma

Microblogging social networks are easily subverted by automated fake identities that amass disproportionately large influence. In this paper we present an effort to profile and screen such accounts using existing and original ground truth obtained from the Twitter platform. Seventy-one explanatory properties, extracted solely from profile and timeline information, are evaluated and used to compare the efficacy of common supervised machine learning methods at this classification task. Results confirm that feasible and largely effective detectors can be constructed for the problem at hand.
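The study extracts explanatory properties solely from profile and timeline information. The paper's 71 properties are not listed here, so the sketch below only illustrates the general idea with a handful of hypothetical ones. The `Account` fields and the feature names are assumptions made for the example, not the authors' feature set.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Account:
    """Hypothetical subset of profile fields plus recent tweet texts."""
    followers: int
    friends: int
    statuses: int
    created_at: datetime
    default_profile_image: bool
    timeline: list[str]


def account_features(acc: Account, now: datetime) -> dict[str, float]:
    """A few illustrative profile- and timeline-derived properties."""
    age_days = max((now - acc.created_at).days, 1)  # avoid division by zero
    n_tweets = max(len(acc.timeline), 1)
    distinct = len(set(acc.timeline))
    return {
        # Profile-derived cues
        "followers_per_friend": acc.followers / max(acc.friends, 1),
        "statuses_per_day": acc.statuses / age_days,
        "default_profile_image": float(acc.default_profile_image),
        # Timeline-derived cues
        "url_fraction": sum("http" in t for t in acc.timeline) / n_tweets,
        "mention_fraction": sum("@" in t for t in acc.timeline) / n_tweets,
        "duplicate_fraction": (len(acc.timeline) - distinct) / n_tweets,
    }
```

Each account becomes a fixed-length numeric record, which is the form that common supervised learners, such as those compared in the paper, expect. High posting rates and many repeated, link-heavy tweets are typical signals of automation, though which signals matter most is an empirical question the study addresses.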

CLEF (Working Notes) | 2017