Fernando Mourão

Universidade Federal de Minas Gerais

Publication

Featured research published by Fernando Mourão.

web search and data mining | 2008

Understanding temporal aspects in document classification

Fernando Mourão; Leonardo C. da Rocha; Renata Braga Araújo; Thierson Couto; Marcos André Gonçalves; Wagner Meira

Due to the increasing amount of information present on the Web, Automatic Document Classification (ADC) has become an important research topic. ADC usually follows a standard supervised learning strategy, where we first build a model using preclassified documents and then use it to classify new unseen documents. One major challenge for ADC in many scenarios is that the characteristics of the documents and the classes to which they belong may change over time. However, most of the current techniques for ADC are applied without taking into account the temporal evolution of the collection of documents. In this work, we perform a detailed study of temporal evolution in ADC, introducing an analysis methodology. We argue that temporal evolution may be explained by three factors: 1) class distribution; 2) term distribution; and 3) class similarity. We employ metrics and experimental strategies capable of isolating each of these factors in order to analyze them separately, using two very different document collections: the ACM Digital Library and the Medline medical collections. Moreover, we present some preliminary results of potential gains that could be obtained by varying the training set to find the ideal size that minimizes the time effects. We show that by using just 69% of the ACM database, we are able to achieve an accuracy of 89.76%, and with only 25% of Medline, an accuracy of 87.57%, which means gains of up to 20% in accuracy with much smaller training sets.

international acm sigir conference on research and development in information retrieval | 2010

Temporally-aware algorithms for document classification

Thiago Salles; Leonardo C. da Rocha; Gisele L. Pappa; Fernando Mourão; Wagner Meira; Marcos André Gonçalves

Automatic Document Classification (ADC) is still one of the major information retrieval problems. It usually employs a supervised learning strategy, where we first build a classification model using pre-classified documents and then use this model to classify unseen documents. The majority of supervised algorithms consider that all documents provide equally important information. However, in practice, a document may be considered more or less important to build the classification model according to several factors, such as its timeliness, the venue in which it was published, and its authors, among others. In this paper, we are particularly concerned with the impact that temporal effects may have on ADC and how to minimize such impact. In order to deal with these effects, we introduce a temporal weighting function (TWF) and propose a methodology to determine it for document collections. We applied the proposed methodology to ACM-DL and Medline and found that the TWF of both follows a lognormal distribution. We then extend three ADC algorithms (namely kNN, Rocchio and Naïve Bayes) to incorporate the TWF. Experiments showed that the temporally-aware classifiers achieved significant gains, outperforming (or at least matching) state-of-the-art algorithms.
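As a rough illustration of how a temporal weighting function could be folded into kNN, the sketch below scales each neighbor's cosine similarity by a lognormal-shaped weight on its temporal distance from the query document. This is not the authors' implementation: the function names, the `mu` and `sigma` values, the shift of the distance by one, and the toy data are all assumptions made for the example.

```python
import math
from collections import defaultdict


def lognormal_twf(delta, mu=0.0, sigma=1.0):
    """Lognormal-shaped weight for a temporal distance `delta` (e.g. in years).

    The distance is shifted by 1 so that delta = 0 is defined; mu and sigma
    are illustrative values, not parameters fitted to any collection.
    """
    x = abs(delta) + 1.0
    exponent = -((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2)
    return math.exp(exponent) / (x * sigma * math.sqrt(2.0 * math.pi))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def temporal_knn(train, query_vec, query_year, k=3):
    """Classify a query by summing time-weighted similarities of the top-k neighbors.

    `train` is a list of (term_vector, year, label) tuples.
    """
    scored = [
        (cosine(vec, query_vec) * lognormal_twf(year - query_year), label)
        for vec, year, label in train
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for score, label in scored[:k]:
        votes[label] += score
    return max(votes, key=votes.get)


# Toy collection: the same term pattern belongs to different classes over time.
train = [
    ({"neural": 1.0}, 1990, "physics"),
    ({"neural": 1.0}, 1991, "physics"),
    ({"neural": 1.0}, 2008, "cs"),
]
prediction = temporal_knn(train, {"neural": 1.0}, 2008, k=3)
```

In this toy case an unweighted kNN would side with the two older documents, while the time-weighted vote favors the single recent one.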

conference on information and knowledge management | 2008

Exploiting temporal contexts in text classification

Leonardo C. da Rocha; Fernando Mourão; Adriano C. M. Pereira; Marcos André Gonçalves; Wagner Meira

Due to the increasing amount of information being stored and accessible through the Web, Automatic Document Classification (ADC) has become an important research topic. ADC usually employs a supervised learning strategy, where we first build a classification model using pre-classified documents and then use it to classify unseen documents. One major challenge in building classifiers is dealing with the temporal evolution of the characteristics of the documents and the classes to which they belong. However, most of the current techniques for ADC do not consider this evolution while building and using the models. Previous results show that the performance of classifiers may be affected by three different temporal effects (class distribution, term distribution and class similarity). Further, it is shown that using just portions of the pre-classified documents, which we call contexts, for building the classifiers results in better performance, as a consequence of the minimization of the aforementioned effects. In this paper we define the concept of temporal contexts as being the portions of documents that minimize those effects. We then propose a general algorithm for determining such contexts, discuss its implementation-related issues, and propose a heuristic that is able to determine temporal contexts efficiently. In order to demonstrate the effectiveness of our strategy, we evaluated it using two distinct collections: ACM-DL and MedLine. We initially evaluated the reduction in terms of both the effort to build a classifier and the entropy associated with each context. Further, we evaluated whether these observed reductions translate into better classification performance by employing a very simple classifier, majority voting. The results show that we achieved precision gains of up to 30% compared to a version that is not temporally contextualized, and the same accuracy as a state-of-the-art classifier (SVM), while presenting an execution time up to hundreds of times faster.
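To make the idea of classifying from a temporal context concrete, here is a minimal sketch in which the context is simply a fixed time window around the test document and the classifier is a majority vote. The abstract does not describe how contexts are actually determined, so the fixed `width` window, the function names, and the toy data are assumptions for illustration only.

```python
from collections import Counter


def temporal_context(train, terms, year, width):
    """Select training documents that share a term with the test document
    and fall within `width` years of it (a hypothetical, fixed-size context)."""
    return [
        (doc_terms, doc_year, label)
        for doc_terms, doc_year, label in train
        if abs(doc_year - year) <= width and doc_terms & terms
    ]


def majority_vote(context):
    """Return the most frequent label in the context, or None if it is empty."""
    if not context:
        return None
    counts = Counter(label for _, _, label in context)
    return counts.most_common(1)[0][0]


def classify(train, terms, year, width):
    """Classify a document by majority voting over its temporal context."""
    return majority_vote(temporal_context(train, terms, year, width))


# Toy collection: the term "web" drifts from one class to another over time.
train = [
    ({"web"}, 1995, "networks"),
    ({"web"}, 1996, "networks"),
    ({"web"}, 1997, "networks"),
    ({"web"}, 2007, "ir"),
    ({"web"}, 2008, "ir"),
]
narrow = classify(train, {"web"}, 2008, width=2)   # only recent documents vote
wide = classify(train, {"web"}, 2008, width=20)    # the whole collection votes
```

With the narrow window only the two recent documents vote, whereas the wide window lets the three older documents outvote them.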

international conference on conceptual structures | 2015

A Framework for Migrating Relational Datasets to NoSQL

Leonardo Cristian Rocha; Fernando Vale; Elder Cirilo; Dárlinton Barbosa; Fernando Mourão

In software development, migrating from one Data Base Management System (DBMS) to another, especially one with distinct characteristics, is a challenge for programmers and database administrators. The changes to application code needed to comply with the new DBMS are usually vast, making migrations infeasible. In order to tackle this problem, we present NoSQLayer, a framework capable of conveniently supporting migration from relational (i.e., MySQL) to NoSQL DBMS (i.e., MongoDB). This framework is presented in two parts: (1) a migration module; and (2) a mapping module. The first is a set of methods enabling seamless migration between DBMSs (i.e., MySQL to MongoDB). The latter provides a persistence layer to process database requests, capable of translating and executing these requests in any DBMS and returning the data in a suitable format as well. Experiments show NoSQLayer to be a handy solution, suitable for handling large volumes of data (e.g., Web scale) for which traditional relational DBMSs might be inadequate.

Information Systems | 2013

Temporal contexts: Effective text classification in evolving document collections

Leonardo C. da Rocha; Fernando Mourão; Hilton de Oliveira Mota; Thiago Salles; Marcos André Gonçalves; Wagner Meira

The management of the huge and growing amount of information available nowadays makes Automatic Document Classification (ADC) not only crucial but also a very challenging task. Furthermore, the dynamics inherent to classification problems, mainly on the Web, make this task even more challenging. Despite this fact, the actual impact of such temporal evolution on ADC is still poorly understood in the literature. In this context, this work aims to evaluate, characterize and exploit the temporal evolution to improve ADC techniques. As a first contribution we highlight the proposal of a pragmatic methodology for evaluating the temporal evolution in ADC domains. Through this methodology, we can identify measurable factors associated with the degradation of ADC models over time. Going a step further, based on such analyses, we propose effective and efficient strategies to make current techniques more robust to natural shifts over time. We present a strategy, named temporal context selection, for selecting portions of the training set that minimize those factors. Our second contribution consists of proposing a general algorithm, called Chronos, for determining such contexts. By instantiating Chronos, we are able to reduce uncertainty and improve the overall classification accuracy. Empirical evaluations of heuristic instantiations of the algorithm, named WindowsChronos and FilterChronos, on two real document collections demonstrate the usefulness of our proposal. Comparing them against state-of-the-art ADC algorithms shows that selecting temporal contexts allows improvements in classification accuracy of up to 10%. Finally, we highlight the applicability and the generality of our proposal in practice, pointing out this study as a promising research direction.

Journal of Web Semantics | 2015

SACI: Sentiment Analysis by Collective Inspection on Social Media Content

Leonardo Cristian Rocha; Fernando Mourão; Thiago Silveira; Rodrigo Chaves; Giovanni Sá; Felipe Teixeira; Ramon Vieira; Renato Ferreira

Collective opinions observed in Social Media represent valuable information for a range of applications. In pursuit of such information, current methods require prior knowledge of each individual opinion to determine the collective one in a post collection. In contrast, we assume that collective analysis could be better performed by exploiting overlaps among distinct posts of the collection. Thus, we propose SACI (Sentiment Analysis by Collective Inspection), a lexicon-based unsupervised method that extracts collective sentiments without relying on individual classifications. SACI is based on a directed transition graph among terms of a post set and on a prior classification of these terms regarding their roles in consolidating opinions. Paths represent subsets of posts on this graph and the collective opinion is defined by traversing all paths. Besides demonstrating that collective analysis outperforms individual analysis w.r.t. approximating collection opinions, assessments on SACI show that good individual classifications do not guarantee good collective analysis and vice-versa. Further, SACI simultaneously fulfills the requirements of efficacy, efficiency and handling of dynamicity posed by highly demanding scenarios. Indeed, the consolidation of a SACI-based Web tool for real-time analysis of tweets evinces the usefulness of this work.

association for information science and technology | 2016

A quantitative analysis of the temporal effects on automatic text classification

Thiago Salles; Leonardo Cristian Rocha; Marcos André Gonçalves; Jussara M. Almeida; Fernando Mourão; Wagner Meira; Felipe Viegas

Automatic text classification (TC) continues to be a relevant research topic and several TC algorithms have been proposed. However, the majority of TC algorithms assume that the underlying data distribution does not change over time. In this work, we are concerned with the challenges imposed by the temporal dynamics observed in textual data sets. We provide evidence of the existence of temporal effects in three textual data sets, reflected by variations observed over time in the class distribution, in the pairwise class similarities, and in the relationships between terms and classes. We then quantify, using a series of full factorial design experiments, the impact of these effects on four well‐known TC algorithms. We show that these temporal effects affect each analyzed data set differently and that they restrict the performance of each considered TC algorithm to different extents. The reported quantitative analyses, which are the original contributions of this article, provide valuable new insights to better understand the behavior of TC algorithms when faced with nonstatic (temporal) data distributions and highlight important requirements for the proposal of more accurate classification models.

brazilian conference on intelligent systems | 2014

Generating Cohesive Semantic Topics from Latent Factors

Paulo Viana Bicalho; Tiago Oliveira Cunha; Fernando Mourão; Gisele L. Pappa; Wagner Meira

Extracting topics from posts in social networks is a challenging and relevant computational task. Traditionally, topics are extracted by analyzing syntactic properties in the messages, assuming a high correlation between syntax and semantics. This work proposes SToC, a new method for generating more cohesive and meaningful semantic topics within a context. SToC post-processes the output of a Non-Negative Matrix Factorization (NMF) method in order to determine which latent factors should be further merged to improve cohesion. Based on NMF's output, SToC defines a topic transition graph and uses Markovian theory to merge pairs of topics mutually reachable in this graph. Experiments on two real data samples from Twitter demonstrate that SToC is statistically better than fair baselines in supervised scenarios and able to determine cohesive and semantically valid topics in unsupervised scenarios.
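The merging step can be pictured with a small sketch that groups topics which can reach each other in a directed transition graph. How SToC derives the graph from the NMF output and applies Markovian theory is not detailed in the abstract, so the graph here is given as a plain adjacency dictionary and the grouping uses ordinary graph reachability; the function names and toy graph are hypothetical.

```python
def reachable(graph, start):
    """Return every node reachable from `start` by following one or more edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def merge_mutually_reachable(graph):
    """Group topics so that any two topics in a group can reach each other."""
    nodes = sorted(set(graph) | {n for targets in graph.values() for n in targets})
    reach = {n: reachable(graph, n) for n in nodes}
    groups, assigned = [], set()
    for n in nodes:
        if n in assigned:
            continue
        # m joins n's group only if n reaches m and m reaches n back.
        group = {n} | {m for m in reach[n] if n in reach[m]}
        groups.append(sorted(group))
        assigned |= group
    return groups


# Toy transition graph over four latent factors (topic ids are illustrative).
topic_graph = {0: [1], 1: [0, 2], 2: [3], 3: []}
merged = merge_mutually_reachable(topic_graph)
```

Here topics 0 and 1 end up in one group because each leads to the other, while topics 2 and 3 stay separate since the transition between them only goes one way.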

conference on recommender systems | 2013

Exploiting non-content preference attributes through hybrid recommendation method

Fernando Mourão; Leonardo C. da Rocha; Joseph A. Konstan; Wagner Meira

This paper explores a method for incorporating into a recommender system explicit representations of users' preferences over non-content attributes such as popularity, recency, and similarity of recommended items. We show how such attributes can be modeled as a preference vector that can be used in a vector-space content-based recommender, and how that content-based recommender can be integrated with various collaborative filtering techniques through re-weighting of Top-M recommendations. We evaluate this approach on several recommender systems datasets and collaborative filtering methods, and find that incorporating the three preference attributes can lead to a substantial increase in Top-50 precision while also enhancing diversity and novelty.

international conference on conceptual structures | 2015

Quantifying Complementarity among Strategies for Influencers’ Detection on Twitter

Alan Neves; Ramon Vieira; Fernando Mourão; Leonardo Cristian Rocha

The so-called influencer, a person with the ability to persuade people, plays an important role in information diffusion in social media environments. Indeed, influencers might dictate word-of-mouth and peer recommendation, impacting tasks such as recommendation, advertising, and brand evaluation, among others. Thus, a growing number of works aim to identify influencers by exploiting distinct information. Deciding on the best strategy for each domain, however, is a complex task due to the lack of consensus among these works. This paper presents a quantitative analysis of some of the main strategies for identifying influencers, aiming to help researchers with this decision. Besides determining semantic classes of strategies, based on the characteristics they exploit, we obtained through PCA an effective meta-learning process to linearly combine distinct strategies. As main implications, we highlight a better understanding of the selected strategies and a novel way to alleviate the difficulty of deciding which strategy researchers should adopt.