Gisele L. Pappa

Universidade Federal de Minas Gerais

Publication

Featured research published by Gisele L. Pappa.

Transactions in GIS | 2011

Inferring the Location of Twitter Messages Based on User Relationships

Clodoveu A. Davis; Gisele L. Pappa; Diogo Rennó Rocha de Oliveira; Filipe de Lima Arcanjo

User interaction in social networks, such as Twitter and Facebook, is increasingly becoming a source of useful information on daily events. The online monitoring of short messages posted in such networks often provides insight into the repercussions of events of several different natures, such as (in the recent past) the earthquake and tsunami in Japan, the royal wedding in Britain and the death of Osama bin Laden. Studying the origins and the propagation of messages regarding such topics helps social scientists in their quest for improving the current understanding of human relationships and interactions. However, the actual location associated with a tweet or a Facebook message can be rather uncertain. Some tweets are posted with an automatically determined location (from an IP address), or with a user-informed location, both in text form, usually the name of a city. We observe that most Twitter users opt not to publish their location, and many do so in a cryptic way, mentioning non-existing places or providing less specific place names (such as “Brazil”). In this article, we focus on the problem of enriching the location of tweets using alternative data, particularly the social relationships between Twitter users. Our strategy involves recursively expanding the network of locatable users using following-follower relationships. Verification is achieved using cross-validation techniques, in which the location of a fraction of the users with known locations is used to determine the location of the others, thus allowing us to compare the actual location to the inferred one and verify the quality of the estimation. With an estimate of the precision of the method, it can then be applied to locationless tweets. Our intention is to infer the location of as many users as possible, in order to increase the number of tweets that can be used in spatial analyses of social phenomena. The article demonstrates the feasibility of our approach using a dataset comprising tweets that mention keywords related to dengue fever, increasing by 45% the number of locatable tweets.
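
The recursive expansion described in this abstract could look roughly like the sketch below. It is not the authors' implementation: the majority-vote rule, the `max_rounds` cap, and the names `infer_locations`, `known` and `follows` are assumptions made for illustration.

```python
from collections import Counter


def infer_locations(known, follows, max_rounds=10):
    """Propagate locations through follower links by majority vote.

    known:   dict mapping user -> location (users with a usable location)
    follows: dict mapping user -> iterable of connected users
    Returns a dict containing the known and the inferred locations.
    """
    located = dict(known)
    for _ in range(max_rounds):
        newly_located = {}
        for user, neighbours in follows.items():
            if user in located:
                continue
            # Count the locations of neighbours that are already located.
            votes = Counter(located[n] for n in neighbours if n in located)
            if votes:
                newly_located[user] = votes.most_common(1)[0][0]
        if not newly_located:
            break  # no further expansion is possible
        located.update(newly_located)
    return located
```

Hiding part of `known` and comparing the inferred value with the hidden one gives the kind of cross-validation check the abstract mentions.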

Genetic Programming and Evolvable Machines | 2014

Contrasting meta-learning and hyper-heuristic research: the role of evolutionary algorithms

Gisele L. Pappa; Gabriela Ochoa; Matthew R. Hyde; Alex Alves Freitas; John R. Woodward; Jerry Swan

The fields of machine meta-learning and hyper-heuristic optimisation have developed mostly independently of each other, although evolutionary algorithms (particularly genetic programming) have recently played an important role in the development of both fields. Recent work in both fields shares a common goal, that of automating as much of the algorithm design process as possible. In this paper we first provide a historical perspective on automated algorithm design, and then we discuss similarities and differences between meta-learning in the field of supervised machine learning (classification) and hyper-heuristics in the field of optimisation. This discussion focuses on the dimensions of the problem space, the algorithm space and the performance measure, as well as clarifying important issues related to different levels of automation and generality in both fields. We also discuss important research directions, challenges and foundational issues in meta-learning and hyper-heuristic research. It is important to emphasize that this paper is not a survey, as several surveys on the areas of meta-learning and hyper-heuristics (separately) have been previously published. The main contribution of the paper is to contrast meta-learning and hyper-heuristics methods and concepts, in order to promote awareness and cross-fertilisation of ideas across the (by and large, non-overlapping) different communities of meta-learning and hyper-heuristic researchers. We hope that this cross-fertilisation of ideas can inspire interesting new research in both fields and in the new emerging research area which consists of integrating those fields.

Knowledge and Information Systems | 2009

Evolving rule induction algorithms with multi-objective grammar-based genetic programming

Gisele L. Pappa; Alex Alves Freitas

Multi-objective optimization has played a major role in solving problems where two or more conflicting objectives need to be simultaneously optimized. This paper presents a Multi-Objective grammar-based genetic programming (MOGGP) system that automatically evolves complete rule induction algorithms, which in turn produce both accurate and compact rule models. The system was compared with a single objective GGP and three other rule induction algorithms. In total, 20 UCI data sets were used to generate and test generic rule induction algorithms, which can be now applied to any classification data set. Experiments showed that, in general, the proposed MOGGP finds rule induction algorithms with competitive predictive accuracies and more compact models than the algorithms it was compared with.
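
As a rough illustration of the multi-objective idea only (not the MOGGP system itself), the sketch below filters candidates by Pareto dominance. It assumes each candidate is a tuple of objectives to be minimised, for example (error rate, model size); the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (error rate, number of rules) pairs for four evolved algorithms.
models = [(0.10, 20), (0.12, 8), (0.15, 30), (0.10, 25)]
front = pareto_front(models)
```

Here the front keeps the accurate-but-larger model and the compact-but-less-accurate one, which is the trade-off between accuracy and compactness that the abstract refers to.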

Genetic and Evolutionary Computation Conference | 2012

Roadside unit deployment for information dissemination in a VANET: an evolutionary approach

Evellyn Cavalcante; André L. L. de Aquino; Gisele L. Pappa; Antonio Alfredo Ferreira Loureiro

A VANET is a network where each node represents a vehicle equipped with wireless communication technology. This type of network supports road safety, traffic efficiency, Internet access and many other applications that minimize environmental impact and, in general, maximize the benefits for road users. This paper studies a relevant problem in VANETs, known as the deployment of roadside units (RSUs). An RSU is an access point, used together with the vehicles, to allow information dissemination on the roads. Knowing where to place these RSUs so that the maximum number of circulating vehicles is covered is a challenge. We model the problem as a Maximum Coverage with Time Threshold Problem (MCTTP), and use a genetic algorithm to solve it. The algorithm is tested on four real-world datasets, and compared to a greedy approach previously proposed in the literature. The results show that our approach finds better results than the greedy approach in all scenarios, with gains of up to 11 percentage points.
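
For orientation, the sketch below shows the classic greedy heuristic for plain maximum coverage, the general kind of baseline the abstract compares against. It is neither the paper's genetic algorithm nor the specific greedy method from the literature, and it ignores the time threshold. The `coverage` mapping from candidate sites to the vehicles they would reach is a hypothetical input.

```python
def greedy_rsu_placement(coverage, budget):
    """Pick up to `budget` sites, each time taking the largest marginal gain.

    coverage: dict mapping candidate site -> set of vehicle ids it covers
    Returns (chosen sites in order, set of covered vehicles).
    """
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(coverage,
                   key=lambda site: len(coverage[site] - covered),
                   default=None)
        # Stop when there are no sites or no site adds new vehicles.
        if best is None or not (coverage[best] - covered):
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered
```

A genetic algorithm would instead evolve whole sets of sites and score each set by the number of vehicles it covers.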

International Symposium on Neural Networks | 2009

From an artificial neural network to a stock market day-trading system: A case study on the BM&F BOVESPA

Leonardo Conegundes Martinez; Diego N. da Hora; João R. M. Palotti; Wagner Meira; Gisele L. Pappa

Predicting trends in the stock market is a subject of major interest for both scholars and financial analysts. The main difficulties of this problem are related to the dynamic, complex, evolving and chaotic nature of the markets. In order to tackle these problems, this work proposes a day-trading system that “translates” the outputs of an artificial neural network (ANN) into business decisions, pointing out to the investors the best times to trade and make profits. The ANN forecasts the lowest and highest stock prices of the current trading day. The system was tested with the two main stocks of the BM&FBOVESPA, an important and understudied market. A series of experiments were performed using different data input configurations, and compared with four benchmarks. The results were evaluated using both classical evaluation metrics, such as the ANN generalization error, and more general metrics, such as the annualized return. The ANN proved to be more accurate and to give more return to the investor than the four benchmarks. The best results obtained by the ANN had a mean absolute percentage error around 50% smaller than the best benchmark, and doubled the capital of the investor.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Temporally-aware algorithms for document classification

Thiago Salles; Leonardo C. da Rocha; Gisele L. Pappa; Fernando Mourão; Wagner Meira; Marcos André Gonçalves

Automatic Document Classification (ADC) is still one of the major information retrieval problems. It usually employs a supervised learning strategy, where we first build a classification model using pre-classified documents and then use this model to classify unseen documents. The majority of supervised algorithms consider that all documents provide equally important information. However, in practice, a document may be considered more or less important to build the classification model according to several factors, such as its timeliness, the venue in which it was published, and its authors, among others. In this paper, we are particularly concerned with the impact that temporal effects may have on ADC and how to minimize such impact. In order to deal with these effects, we introduce a temporal weighting function (TWF) and propose a methodology to determine it for document collections. We applied the proposed methodology to ACM-DL and Medline and found that the TWF of both follows a lognormal distribution. We then extend three ADC algorithms (namely kNN, Rocchio and Naïve Bayes) to incorporate the TWF. Experiments showed that the temporally-aware classifiers achieved significant gains, outperforming (or at least matching) state-of-the-art algorithms.
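
One way a TWF might be folded into kNN is sketched below. The details are assumptions rather than the paper's formulation: the weight is a lognormal density evaluated at the absolute year difference plus one, the parameters `mu` and `sigma` are placeholders that would be fitted per collection, and similarity is plain cosine.

```python
import math
from collections import defaultdict


def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal distribution at x > 0."""
    return (math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2))
            / (x * sigma * math.sqrt(2 * math.pi)))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def temporal_knn(query_vec, query_year, training, k=3, mu=0.0, sigma=1.0):
    """Vote among the k best neighbours, scoring each by similarity * TWF.

    training: list of (vector, year, label) tuples
    """
    scored = []
    for vec, year, label in training:
        # Assumed TWF: lognormal in the temporal distance, shifted to stay > 0.
        weight = lognormal_pdf(abs(query_year - year) + 1, mu, sigma)
        scored.append((cosine(query_vec, vec) * weight, label))
    scored.sort(key=lambda item: item[0], reverse=True)
    votes = defaultdict(float)
    for score, label in scored[:k]:
        votes[label] += score
    return max(votes, key=votes.get)
```

With this weighting, a single training document from the query's own year can outvote several equally similar documents from decades earlier.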

European Conference on Machine Learning | 2010

Demand-driven tag recommendation

Guilherme Vale Menezes; Jussara M. Almeida; Fabiano Muniz Belém; Marcos André Gonçalves; Anisio Lacerda; Edleno Silva de Moura; Gisele L. Pappa; Adriano Veloso; Nivio Ziviani

Collaborative tagging allows users to assign arbitrary keywords (or tags) describing the content of objects, which facilitates navigation and improves searching without dependence on pre-configured categories. In large-scale tag-based systems, tag recommendation services can assist a user in the assignment of tags to objects and help consolidate the vocabulary of tags across users. A promising approach for tag recommendation is to exploit the co-occurrence of tags. However, these methods are challenged by the huge size of the tag vocabulary, either because (1) the computational complexity may increase exponentially with the number of tags or (2) the score associated with each tag may become distorted since different tags may operate in different scales and the scores are not directly comparable. In this paper we propose a novel method that recommends tags on a demand-driven basis according to an initial set of tags applied to an object. It reduces the space of possible solutions, so that its complexity increases polynomially with the size of the tag vocabulary. Further, the score of each tag is calibrated using an entropy minimization approach which corrects possible distortions and provides more precise recommendations. We conducted a systematic evaluation of the proposed method using three types of media: audio, bookmarks and video. The experimental results show that the proposed method is fast and boosts recommendation quality on different experimental scenarios. For instance, in the case of a popular audio site it provides improvements in precision (p@5) ranging from 6.4% to 46.7% (depending on the number of tags given as input), outperforming a recently proposed co-occurrence based tag recommendation method.

Congress on Evolutionary Computation | 2010

Active Learning Genetic programming for record deduplication

Junio de Freitas; Gisele L. Pappa; Altigran Soares da Silva; Marcos André Gonçalves; Edleno Silva de Moura; Adriano Veloso; Alberto H. F. Laender; Moisés G. de Carvalho

The great majority of genetic programming (GP) algorithms that deal with the classification problem follow a supervised approach, i.e., they consider that all fitness cases available to evaluate their models are labeled. However, in certain application domains, a lot of human effort is required to label training data, and methods following a semi-supervised approach might be more appropriate. This is because they significantly reduce the time required for data labeling while maintaining acceptable accuracy rates. This paper presents the Active Learning GP (AGP), a semi-supervised GP, and instantiates it for the data deduplication problem. AGP uses an active learning approach in which a committee of multi-attribute functions votes for classifying record pairs as duplicates or not. When the committee majority voting is not enough to predict the class of the data pairs, a user is called to solve the conflict. The method was applied to three datasets and compared to two other deduplication methods. Results show that AGP guarantees the quality of the deduplication while reducing the number of labeled examples needed.
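
The committee-plus-user loop described in this abstract can be pictured with the sketch below. It is an illustration under assumptions, not the AGP algorithm: the committee members are arbitrary boolean functions rather than evolved ones, and the `min_agreement` threshold and the `ask_user` callback are hypothetical.

```python
def classify_pair(pair, committee, ask_user, min_agreement=0.75):
    """Label a record pair as duplicate or not, asking the user when unsure.

    committee: non-empty list of functions mapping a pair to True (duplicate) or False
    ask_user:  function called with the pair when the committee is split
    Returns (is_duplicate, user_was_asked).
    """
    votes = [bool(member(pair)) for member in committee]
    share = sum(votes) / len(votes)
    if share >= min_agreement:
        return True, False   # clear majority for "duplicate"
    if share <= 1 - min_agreement:
        return False, False  # clear majority for "not duplicate"
    # The vote is too close to call, so a human label resolves the conflict.
    return bool(ask_user(pair)), True
```

Only the pairs that reach the last branch cost a human label, which is where the saving in labelling effort would come from.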

European Conference on Machine Learning | 2006

Automatically evolving rule induction algorithms

Gisele L. Pappa; Alex Alves Freitas

Research in the rule induction algorithm field has produced many algorithms in the last 30 years. However, these algorithms are usually obtained from a few basic rule induction algorithms that have often been changed to produce better ones. Having these basic algorithms and their components in mind, this work proposes the use of Grammar-based Genetic Programming (GGP) to automatically evolve rule induction algorithms. The proposed GGP is evaluated in extensive computational experiments involving 11 data sets. Overall, the results show that effective rule induction algorithms can be automatically generated using GGP. The automatically evolved rule induction algorithms were shown to be competitive with well-known manually designed ones. The proposed approach of automatically evolving rule induction algorithms can be considered a pioneering one, opening a new kind of research area.

Workshop on Location-Based Social Networks | 2012

Traffic observatory: a system to detect and locate traffic events and conditions using Twitter

Sílvio S. Ribeiro; Clodoveu A. Davis; Diogo Rennó Rocha de Oliveira; Wagner Meira; Tatiana S. Gonçalves; Gisele L. Pappa

Twitter has become one of the most popular platforms for sharing user-generated content, which varies from ordinary conversations to information about recent events. Studies have already shown that the content of tweets has a high degree of correlation with what is going on in the real world. A type of event which is commonly talked about on Twitter is traffic. Aiming to help other drivers, many users tweet about current traffic conditions, and there are even user accounts specializing in the subject. With this in mind, this paper proposes a method to identify traffic events and conditions on Twitter, geocode them, and display them on the Web in real time. Preliminary results showed that the method is able to detect neighborhoods and thoroughfares with a precision that varies from 50 to 90%, depending on the number of places mentioned in the tweets.
