Adriano Veloso

Universidade Federal de Minas Gerais

Publication

Featured research published by Adriano Veloso.

web science | 2011

Dengue surveillance based on a computational model of spatio-temporal locality of Twitter

Janaína Gomide; Adriano Veloso; Wagner Meira; Virgílio A. F. Almeida; Fabrício Benevenuto; Fernanda Oliveira Ferraz; Mauro M. Teixeira

Twitter is a unique social media channel, in the sense that users discuss and talk about the most diverse topics, including their health conditions. In this paper we analyze how the Dengue epidemic is reflected on Twitter and to what extent that information can be used for the sake of surveillance. Dengue is a mosquito-borne infectious disease that is a leading cause of illness and death in tropical and subtropical regions, including Brazil. We propose an active surveillance methodology that is based on four dimensions: volume, location, time and public perception. First we explore the public perception dimension by performing sentiment analysis. This analysis enables us to filter out content that is not relevant for the sake of Dengue surveillance. Then, we verify the high correlation between the number of cases reported by official statistics and the number of tweets posted during the same time period (i.e., R² = 0.9578). A clustering approach was used in order to exploit the spatio-temporal dimension, and the quality of the clusters obtained becomes evident when they are compared to official data (i.e., RandIndex = 0.8914). As an application, we propose a Dengue surveillance system that shows the evolution of the Dengue situation reported in tweets, which is implemented in www.observatorio.inweb.org.br/dengue/.
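The two quality measures quoted in this abstract can be illustrated with a minimal Python sketch. It assumes that R² is the squared Pearson correlation between two count series and that the Rand index is the usual pairwise-agreement score between two clusterings; the abstract does not state the exact formulas used, and the function names and toy inputs below are hypothetical.

```python
from itertools import combinations


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


def rand_index(labels_a, labels_b):
    """Fraction of item pairs that both clusterings treat the same way.

    A pair counts as an agreement when it is grouped together in both
    clusterings or separated in both clusterings.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy inputs, invented for illustration only (not data from the paper).
weekly_tweets = [1, 2, 3, 4]
weekly_cases = [2, 4, 6, 8]
print(r_squared(weekly_tweets, weekly_cases))

tweet_clusters = [0, 0, 1, 1]
official_clusters = [1, 1, 0, 0]
print(rand_index(tweet_clusters, official_clusters))
```

Both scores lie between 0 and 1, and the Rand index ignores how cluster labels are named, so two clusterings with swapped labels still agree fully.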

international conference on data mining | 2006

Lazy Associative Classification

Adriano Veloso; Wagner Meira; Mohammed Javeed Zaki

Decision tree classifiers perform a greedy search for rules by heuristically selecting the most promising features. Such greedy (local) search may discard important rules. Associative classifiers, on the other hand, perform a global search for rules satisfying some quality constraints (i.e., minimum support). This global search, however, may generate a large number of rules. Further, many of these rules may be useless during classification, and worse, important rules may never be mined. Lazy (non-eager) associative classification overcomes this problem by focusing on the features of the given test instance, increasing the chance of generating more rules that are useful for classifying the test instance. In this paper we assess the performance of lazy associative classification. First we demonstrate that an associative classifier performs no worse than the corresponding decision tree classifier. Also we demonstrate that lazy classifiers outperform the corresponding eager ones. Our claims are empirically confirmed by an extensive set of experimental results. We show that our proposed lazy associative classifier is responsible for an error rate reduction of approximately 10% when compared against its eager counterpart, and for a reduction of 20% when compared against a decision tree classifier. A simple caching mechanism makes lazy associative classification fast, and thus improvements in the execution time are also observed.
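The lazy idea described above (mine rules only from the features of the test instance) might be sketched as follows. This is an illustrative simplification, not the authors' algorithm: the scoring scheme (summing rule confidences per class), the `max_rule_size` cap, and every name in the code are assumptions, and the caching mechanism mentioned in the abstract is omitted.

```python
from collections import Counter, defaultdict
from itertools import combinations


def lazy_classify(train, test_features, min_support=1, max_rule_size=2):
    """Classify one instance with rules mined on demand for that instance.

    train: list of (feature_set, label) pairs.
    test_features: iterable of features of the instance to classify.
    """
    test_features = frozenset(test_features)

    # Projection step: keep only the features each training example
    # shares with the test instance, so every mined rule is applicable.
    projected = [(frozenset(feats) & test_features, label) for feats, label in train]

    body_count = Counter()   # how often each rule body occurs
    rule_count = Counter()   # how often each (body, label) rule occurs
    for feats, label in projected:
        for size in range(1, max_rule_size + 1):
            for body in combinations(sorted(feats), size):
                body_count[body] += 1
                rule_count[(body, label)] += 1

    # Assumed scoring: each sufficiently supported rule votes for its
    # label with weight equal to its confidence.
    scores = defaultdict(float)
    for (body, label), count in rule_count.items():
        if count >= min_support:
            scores[label] += count / body_count[body]

    if not scores:
        return None  # no training example shares a feature with the instance
    return max(scores, key=scores.get)


# Toy training set, invented for illustration.
train = [
    ({"sunny", "hot"}, "no"),
    ({"sunny", "mild"}, "no"),
    ({"rainy", "mild"}, "yes"),
    ({"rainy", "cool"}, "yes"),
]
print(lazy_classify(train, {"rainy", "mild"}))
```

Because the training data is projected onto the test instance before mining, the number of candidate rules depends on the size of the instance rather than on the full feature space.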

knowledge discovery and data mining | 2011

From bias to opinion: a transfer-learning approach to real-time sentiment analysis

Pedro Henrique Calais Guerra; Adriano Veloso; Wagner Meira; Virgílio A. F. Almeida

Real-time interaction, which enables live discussions, has become a key feature of most Web applications. In such an environment, the ability to automatically analyze user opinions and sentiments as discussions develop is a powerful resource known as real-time sentiment analysis. However, this task comes with several challenges, including the need to deal with highly dynamic textual content that is characterized by changes in vocabulary and its subjective meaning and the lack of labeled data needed to support supervised classifiers. In this paper, we propose a transfer learning strategy to perform real-time sentiment analysis. We identify a task - opinion holder bias prediction - which is strongly related to the sentiment analysis task; however, in contrast to sentiment analysis, it builds accurate models since the underlying relational data follows a stationary distribution. Instead of learning textual models to predict content polarity (i.e., the traditional sentiment analysis approach), we first measure the bias of social media users toward a topic, by solving a relational learning task over a network of users connected by endorsements (e.g., retweets in Twitter). We then analyze sentiments by transferring user biases to textual features. This approach works because while new terms may arise and old terms may change their meaning, user bias tends to be more consistent over time as a basic property of human behavior. Thus, we adopted user bias as the basis for building accurate classification models. We applied our model to posts collected from Twitter on two topics: the 2010 Brazilian Presidential Elections and the 2010 season of Brazilian Soccer League. Our results show that knowing the bias of only 10% of users generates an F1 accuracy level ranging from 80% to 90% in predicting user sentiment in tweets.

international acm sigir conference on research and development in information retrieval | 2008

Learning to rank at query-time using association rules

Adriano Veloso; Humberto Mossri de Almeida; Marcos André Gonçalves; Wagner Meira

Some applications have to present their results in the form of ranked lists. This is the case of many information retrieval applications, in which documents must be sorted according to their relevance to a given query. This has driven the interest of the information retrieval community in methods that automatically learn effective ranking functions. In this paper we propose a novel method which uncovers patterns (or rules) in the training data associating features of the document with its relevance to the query, and then uses the discovered rules to rank documents. To address typical problems that are inherent to the utilization of association rules (such as missing rules and rule explosion), the proposed method generates rules on a demand-driven basis, at query-time. The result is an extremely fast and effective ranking method. We conducted a systematic evaluation of the proposed method using the LETOR benchmark collections. We show that generating rules on a demand-driven basis can boost ranking performance, providing gains ranging from 12% to 123%, outperforming the state-of-the-art methods that learn to rank, with no need for time-consuming and laborious pre-processing. As a highlight, we also show that additional information, such as query terms, can make the generated rules more discriminative, further improving ranking performance.

acm/ieee joint conference on digital libraries | 2010

Effective self-training author name disambiguation in scholarly digital libraries

Anderson A. Ferreira; Adriano Veloso; Marcos André Gonçalves; Alberto H. F. Laender

Name ambiguity in the context of bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. Supervised methods that exploit training examples in order to distinguish ambiguous author names are among the most effective solutions for the problem, but they require skilled human annotators in a laborious and continuous process of manually labeling citations in order to provide enough training examples. Thus, addressing the issues of (i) automatic acquisition of examples and (ii) highly effective disambiguation even when only a few examples are available is the need of the hour for such systems. In this paper, we propose a novel two-step disambiguation method, SAND (Self-training Associative Name Disambiguator), that deals with these two issues. The first step eliminates the need for any manual labeling effort by automatically acquiring examples using a clustering method that groups citation records based on the similarity among coauthor names. The second step uses a supervised disambiguation method that is able to detect unseen authors not included in any of the given training examples. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation (i.e., author names, work title and publication venue), demonstrated that our proposed method outperforms representative unsupervised disambiguation methods that exploit similarities between citation records and is as effective as, and in some cases superior to, supervised ones, without manually labeling any training example.

multimedia information retrieval | 2010

Learning to rank for content-based image retrieval

Fábio Augusto Faria; Adriano Veloso; Humberto Mossri de Almeida; Eduardo Valle; Ricardo da Silva Torres; Marcos André Gonçalves; Wagner Meira

In Content-based Image Retrieval (CBIR), accurately ranking the returned images is of paramount importance, since users consider mostly the topmost results. The typical ranking strategy used by many CBIR systems is to employ image content descriptors, so that returned images that are most similar to the query image are placed higher in the rank. While this strategy is well accepted and widely used, improved results may be obtained by combining multiple image descriptors. In this paper we explore this idea, and introduce algorithms that learn to combine information coming from different descriptors. The proposed learning to rank algorithms are based on three diverse learning techniques: Support Vector Machines (CBIR-SVM), Genetic Programming (CBIR-GP), and Association Rules (CBIR-AR). Eighteen image content descriptors (color, texture, and shape information) are used as input and provided as training to the learning algorithms. We performed a systematic evaluation involving two complex and heterogeneous image databases (Corel and Caltech) and two evaluation measures (Precision and MAP). The empirical results show that all learning algorithms provide significant gains when compared to the typical ranking strategy in which descriptors are used in isolation. We concluded that, in general, CBIR-AR and CBIR-GP outperform CBIR-SVM. A fine-grained analysis revealed the lack of correlation between the results provided by CBIR-AR and the results provided by the other two algorithms, which indicates the opportunity of an advantageous hybrid approach.

advances in social networks analysis and mining | 2015

Reverse Engineering Socialbot Infiltration Strategies in Twitter

Carlos Alessandro Sena de Freitas; Fabrício Benevenuto; Saptarshi Ghosh; Adriano Veloso

Online Social Networks (OSNs) such as Twitter and Facebook have become a significant testing ground for Artificial Intelligence developers who build programs, known as socialbots, that imitate actual users by automating their social-network activities such as forming social links and posting content. Particularly, Twitter users have shown difficulties in distinguishing these socialbots from the human users in their social graphs. Frequently, legitimate users engage in conversations with socialbots. More impressively, socialbots are effective in acquiring human users as followers and exercising influence among them. While the success of socialbots is certainly a remarkable achievement for AI practitioners, their proliferation in the Twitter-sphere opens many possibilities for cybercrime. The proliferation of socialbots in the Twitter-sphere motivates us to assess the characteristics or strategies that make socialbots most likely to succeed. In this direction, we created 120 socialbot accounts in Twitter, which have a profile, follow other users, and generate tweets either by reposting messages that others have posted or by creating their own synthetic tweets. Then, we employ a 2^k factorial design experiment in order to quantify the infiltration effectiveness of different socialbot strategies. Our analysis is the first of its kind, and reveals what strategies make socialbots successful in the Twitter-sphere.
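A factorial design with k two-level factors runs every combination of levels, which gives 2^k configurations, and then estimates how much each factor shifts the response. The Python sketch below illustrates this under stated assumptions: the "tweet_source" levels mirror the reposting versus synthetic distinction in the abstract, while the "activity" factor, the response values, and all function names are hypothetical.

```python
from itertools import product


def factorial_design(factors):
    """Enumerate every combination of levels: 2^k runs for k two-level factors."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, levels)) for levels in product(*level_lists)]


def main_effect(runs, responses, factor, high):
    """Mean response at the 'high' level minus mean response at the other level."""
    hi = [r for run, r in zip(runs, responses) if run[factor] == high]
    lo = [r for run, r in zip(runs, responses) if run[factor] != high]
    return sum(hi) / len(hi) - sum(lo) / len(lo)


# Hypothetical factors and invented response values (e.g. followers gained).
factors = {
    "activity": ("low", "high"),
    "tweet_source": ("repost", "synthetic"),
}
runs = factorial_design(factors)
responses = [10, 14, 20, 24]  # one invented measurement per run, in run order

for run, response in zip(runs, responses):
    print(run, response)
print(main_effect(runs, responses, "activity", "high"))
print(main_effect(runs, responses, "tweet_source", "synthetic"))
```

A full analysis would also estimate interaction effects and the share of variation each factor explains; this sketch covers main effects only.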

systems man and cybernetics | 2004

Parallel and distributed methods for incremental frequent itemset mining

Matthew Eric Otey; Srinivasan Parthasarathy; Chao Wang; Adriano Veloso; Wagner Meira

Traditional methods for data mining typically make the assumption that the data is centralized, memory-resident, and static. This assumption is no longer tenable. Such methods waste computational and input/output (I/O) resources when data is dynamic, and they impose excessive communication overhead when data is distributed. Efficient implementation of incremental data mining methods is, thus, becoming crucial for ensuring system scalability and facilitating knowledge discovery when data is dynamic and distributed. In this paper, we address this issue in the context of the important task of frequent itemset mining. We first present an efficient algorithm which dynamically maintains the required information even in the presence of data updates without examining the entire dataset. We then show how to parallelize this incremental algorithm. We also propose a distributed asynchronous algorithm, which imposes minimal communication overhead for mining distributed dynamic datasets. Our distributed approach is capable of generating local models (in which each site has a summary of its own database) as well as the global model of frequent itemsets (in which all sites have a summary of the entire database). This ability permits our approach not only to generate frequent itemsets, but also to generate high-contrast frequent itemsets, which allows one to examine how the data is skewed over different sites.
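The core idea of incremental and distributed maintenance (update summaries from new transactions only, then combine local summaries into a global one) can be sketched in Python. This is a deliberately naive illustration, not the algorithm from the paper: it counts every itemset up to a fixed size instead of pruning candidates, and the class name, `max_size` cap, and toy data are assumptions.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Support counts that are updated without rescanning old transactions."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.num_transactions = 0

    def add(self, transactions):
        """Process only the newly arrived transactions."""
        for transaction in transactions:
            items = sorted(set(transaction))
            self.num_transactions += 1
            for size in range(1, self.max_size + 1):
                for itemset in combinations(items, size):
                    self.counts[itemset] += 1

    def frequent(self, min_support):
        """Itemsets whose relative support is at least min_support."""
        threshold = min_support * self.num_transactions
        return {i: c for i, c in self.counts.items() if c >= threshold}

    def merge(self, other):
        """Combine two local summaries into a global one."""
        merged = IncrementalItemsetCounter(self.max_size)
        merged.counts = self.counts + other.counts
        merged.num_transactions = self.num_transactions + other.num_transactions
        return merged


# Two hypothetical sites, each with its own local model.
site_a = IncrementalItemsetCounter()
site_a.add([["a", "b"], ["a", "c"]])
site_b = IncrementalItemsetCounter()
site_b.add([["a", "b"], ["b", "c"]])

global_model = site_a.merge(site_b)
print(global_model.frequent(0.5))
```

Comparing `site_a.frequent(...)` with `global_model.frequent(...)` gives a rough sense of how itemset support differs between one site and the whole dataset.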

conference on recommender systems | 2012

Pareto-efficient hybridization for multi-objective recommender systems

Marco Túlio de Freitas Ribeiro; Anisio Lacerda; Adriano Veloso; Nivio Ziviani

Making accurate suggestions is an objective of paramount importance for effective recommender systems. Other important and increasingly evident objectives are novelty and diversity, which are achieved by recommender systems that are able to suggest diversified items not easily discovered by the users. Different recommendation algorithms have particular strengths and weaknesses when it comes to each of these objectives, motivating the construction of hybrid approaches. However, most of these approaches only focus on optimizing accuracy, with no regard for novelty and diversity. The problem of combining recommendation algorithms grows significantly harder when multiple objectives are considered simultaneously. For instance, devising multi-objective recommender systems that suggest items that are simultaneously accurate, novel and diversified may lead to a conflicting-objective problem, where the attempt to improve an objective further may result in worsening other competing objectives. In this paper we propose a hybrid recommendation approach that combines existing algorithms which differ in their level of accuracy, novelty and diversity. We employ an evolutionary search for hybrids following the Strength Pareto approach, which isolates hybrids that are not dominated by others (i.e., the so-called Pareto frontier). Experimental results on two recommendation scenarios show that: (i) we can combine recommendation algorithms in order to improve an objective without significantly hurting other objectives, and (ii) we allow for adjusting the compromise between accuracy, diversity and novelty, so that the recommendation emphasis can be adjusted dynamically according to the needs of different users.
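The notion of a Pareto frontier used here can be made concrete with a short Python sketch. It shows only the dominance filter, assuming every objective is to be maximized; it is not the Strength Pareto evolutionary search itself, and the hybrid names and scores are invented for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(candidates):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    }


# Hypothetical hybrids scored as (accuracy, novelty, diversity).
hybrids = {
    "A": (0.9, 0.2, 0.3),
    "B": (0.7, 0.6, 0.5),
    "C": (0.6, 0.5, 0.4),  # worse than B on every objective
    "D": (0.5, 0.8, 0.7),
}
print(sorted(pareto_frontier(hybrids)))
```

Every hybrid left on the frontier represents a different trade-off, which is what allows the emphasis among accuracy, novelty and diversity to be chosen afterwards.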

european conference on principles of data mining and knowledge discovery | 2007

Multi-label Lazy Associative Classification

Adriano Veloso; Wagner Meira; Marcos André Gonçalves; Mohammed Javeed Zaki

Most current work on classification has been focused on learning from a set of instances that are associated with a single label (i.e., single-label classification). However, many applications, such as gene functional prediction and text categorization, may allow the instances to be associated with multiple labels simultaneously. Multi-label classification is a generalization of single-label classification, and its generality makes it much more difficult to solve. Despite its importance, research on multi-label classification is still lacking. Common approaches simply learn independent binary classifiers for each label, and do not exploit dependencies among labels. Also, several small disjuncts may appear due to the possibly large number of label combinations, and neglecting these small disjuncts may degrade classification accuracy. In this paper we propose a multi-label lazy associative classifier, which progressively exploits dependencies among labels. Further, since in our lazy strategy the classification model is induced in an instance-based fashion, the proposed approach can provide better coverage of small disjuncts. Gains of up to 24% are observed when the proposed approach is compared against the state-of-the-art multi-label classifiers.