Giovanni Comarela
Universidade Federal de Minas Gerais
Publications
Featured research published by Giovanni Comarela.
Knowledge Discovery and Data Mining | 2012
Diego Sáez-Trumper; Giovanni Comarela; Virgílio A. F. Almeida; Ricardo A. Baeza-Yates; Fabrício Benevenuto
Influential people play an important role in the process of information diffusion. However, there are several ways to be influential: for example, being the most popular, or being the first to adopt a new idea. In this paper we present a methodology to find trendsetters in information networks according to a specific topic of interest. Trendsetters are people who adopt and spread new ideas, influencing others before those ideas become popular. At the same time, not all early adopters are trendsetters, because only a few of them have the ability to propagate their ideas through their social contacts via word of mouth. Unlike other influence measures, a trendsetter is not necessarily popular or famous, but is one whose ideas spread over the graph successfully. Other metrics, such as node in-degree or even standard PageRank, focus only on the static topology of the network. We propose a ranking strategy that focuses on the ability of some users to push new ideas that will be successful in the future. To that end, we combine temporal attributes of the nodes and edges of the network with a PageRank-based algorithm to find the trendsetters for a given topic. To test our algorithm we conduct novel experiments over a large Twitter dataset. We show that nodes with high in-degree tend to arrive late for new trends, while users at the top of our ranking tend to be early adopters who also influence their social contacts to adopt the new trend.
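The temporal ranking idea can be sketched as follows. The weighting scheme (exponential decay in the adoption gap) and all names here are illustrative assumptions, not the paper's exact formulation: credit flows from each adopter to the earlier adopters who plausibly influenced them, so early adopters whose followers later adopt rank highly.

```python
import math

def trendsetter_rank(edges, adopt_time, damping=0.85, iters=50):
    """Rank nodes by their ability to push ideas that later spread.

    edges: list of (follower, followee) pairs.
    adopt_time: node -> time at which it adopted the topic.
    An edge contributes influence from followee to follower only if the
    followee adopted earlier; the weight decays with the adoption gap.
    """
    nodes = set(adopt_time)
    out = {}  # follower -> [(earlier adopter it follows, weight), ...]
    for f, g in edges:
        if f in adopt_time and g in adopt_time and adopt_time[g] < adopt_time[f]:
            w = math.exp(-(adopt_time[f] - adopt_time[g]))
            out.setdefault(f, []).append((g, w))
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / len(nodes) for v in nodes}
        for f, targets in out.items():
            total = sum(w for _, w in targets)
            for g, w in targets:
                # credit flows backwards in time, to the earlier adopter
                new[g] += damping * rank[f] * w / total
        rank = new
    return rank
```

On a toy chain where `a` adopts first and is followed by later adopters `b` and `c`, `a` ends up ranked highest even if its in-degree is low.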
International Journal of Parallel Programming | 2013
Emanuel Vianna; Giovanni Comarela; Tatiana Pontes; Jussara M. Almeida; Virgílio A. F. Almeida; Kevin Wilkinson; Harumi A. Kuno; Umeshwar Dayal
MapReduce is a currently popular programming model for parallel computation on large datasets. Among the several existing MapReduce implementations, Hadoop has attracted significant attention from both industry and research. In a Hadoop job, map and reduce tasks coordinate to produce a solution to the input problem, exhibiting precedence constraints and synchronization delays characteristic of pipeline communication between maps (producers) and reduces (consumers). We address the challenge of designing analytical models to estimate the performance of MapReduce workloads, notably Hadoop workloads, focusing particularly on the intra-job pipeline parallelism between map and reduce tasks belonging to the same job. We propose a hierarchical model that combines a precedence graph model and a queuing network model to capture the intra-job synchronization constraints. We first show how to build a precedence graph that represents the dependencies among multiple tasks of the same job. We then apply it jointly with an approximate Mean Value Analysis (aMVA) solution to predict mean job response time, throughput, and resource utilization. We validate our solution against a queuing network simulator and a real setup in various scenarios, finding very close agreement in both cases. In particular, our model produces estimates of average job response time that deviate from measurements of a real setup by less than 15%.
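The queuing side of the hierarchical model rests on Mean Value Analysis. As a point of reference, here is a minimal exact MVA recursion for a closed, single-class product-form network; the paper uses an approximate variant (aMVA) coupled with the precedence graph, which this sketch omits.

```python
def mva(service_demands, n_jobs):
    """Exact Mean Value Analysis for a closed, single-class network of
    queueing stations (no think time, load-independent servers).

    service_demands: per-station service demand D_k (visits x service time).
    Returns (mean response time, throughput, per-station utilization)
    at population n_jobs (must be >= 1).
    """
    K = len(service_demands)
    q = [0.0] * K  # mean queue lengths at population n - 1
    for n in range(1, n_jobs + 1):
        # arrival theorem: an arriving job sees the (n-1)-job queue lengths
        r = [service_demands[k] * (1 + q[k]) for k in range(K)]
        R = sum(r)      # mean response time at population n
        X = n / R       # throughput (Little's law, no think time)
        q = [X * r[k] for k in range(K)]
    util = [X * d for d in service_demands]
    return R, X, util
```

For a single station with demand 1.0 and two circulating jobs, the recursion gives response time 2.0, throughput 1.0, and utilization 1.0, matching the intuition that one job always waits behind the other.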
ACM Conference on Hypertext | 2012
Giovanni Comarela; Mark Crovella; Virgílio A. F. Almeida; Fabrício Benevenuto
In information networks where users send messages to one another, the issue of information overload naturally arises: which are the most important messages? In this paper we study the problem of understanding the importance of messages in Twitter. We approach this problem in two stages. First, we perform an extensive characterization of a very large Twitter dataset that includes all users, social relations, and messages posted from the beginning of the service up to August 2009. We show evidence that information overload is present: users sometimes have to search through hundreds of messages to find those interesting enough to reply to or retweet. We then identify factors that influence user response or retweet probability: previous responses to the same tweeter, the tweeter's sending rate, and the age and some basic text elements of the tweet. In the second stage, we show that some of these factors can be used to improve the order in which tweets are presented to the user. First, by inspecting user activity over time, we construct a simple on-off model of user behavior that allows us to infer when a user is actively using Twitter. Then, we explore two machine learning methods for ranking tweets: a Naive Bayes predictor and a Support Vector Machine classifier. We show that it is possible to reorder tweets to increase the fraction of replied or retweeted messages appearing in the first p positions of the list by as much as 50-60%.
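A minimal sketch of the Naive Bayes ranking stage, assuming binary tweet features and a hand-rolled Bernoulli model with Laplace smoothing. The feature names (`prior_reply`, etc.) are illustrative stand-ins, not the paper's exact feature set.

```python
import math
from collections import defaultdict

def train_nb(samples):
    """Bernoulli Naive Bayes with Laplace smoothing.

    samples: list of (feature_dict, label) with binary features and
    label 1 = replied/retweeted, 0 = ignored.
    Returns a scoring function: higher score = more likely engaged with.
    """
    counts = {0: defaultdict(int), 1: defaultdict(int)}
    totals = {0: 0, 1: 0}
    feats = set()
    for x, y in samples:
        totals[y] += 1
        for f, v in x.items():
            feats.add(f)
            if v:
                counts[y][f] += 1

    def score(x):
        # log-odds of label 1 vs label 0 under the naive independence assumption
        s = math.log((totals[1] + 1) / (totals[0] + 1))
        for f in feats:
            v = x.get(f, 0)
            for y, sign in ((1, 1), (0, -1)):
                p = (counts[y][f] + 1) / (totals[y] + 2)  # Laplace smoothing
                s += sign * math.log(p if v else 1 - p)
        return s

    return score

def reorder(tweets, score):
    """Present the tweets most likely to be engaged with first."""
    return sorted(tweets, key=score, reverse=True)
```

Trained on a history where tweets from previously-answered tweeters were engaged with, the scorer pushes such tweets to the top of the list.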
Symposium on Computer Architecture and High Performance Computing | 2011
Emanuel Vianna; Giovanni Comarela; Tatiana Pontes; Jussara M. Almeida; Virgílio A. F. Almeida; Kevin Wilkinson; Harumi A. Kuno; Umeshwar Dayal
MapReduce is an important paradigm for supporting modern data-intensive applications. In this paper we address the challenge of modeling the performance of one MapReduce implementation, the Hadoop Online Prototype (HOP), with a specific focus on intra-job pipeline parallelism. We use a hierarchical model that combines a precedence model and a queuing network model to capture the intra-job synchronization constraints. We first show how to build a precedence graph that represents the dependencies among multiple tasks of the same job. We then apply it jointly with an approximate Mean Value Analysis (aMVA) solution to predict mean job response time and resource utilization. We validate our solution against a queuing network simulator in various scenarios, finding that our performance model shows close agreement, with a maximum relative difference under 15%.
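The precedence-graph idea can be illustrated with a deliberately simplified job model. The sketch below treats the shuffle as a full barrier between maps and reduces and computes a longest-path completion time; it ignores the pipelined producer/consumer overlap and the resource contention that the paper's combined model actually captures.

```python
def job_precedence_graph(n_maps, n_reduces):
    """Toy precedence graph for one MapReduce job: every reduce depends
    on every map (shuffle modeled as a barrier).
    Returns a dict: task name -> list of tasks it depends on."""
    maps = [f"m{i}" for i in range(n_maps)]
    deps = {m: [] for m in maps}
    for j in range(n_reduces):
        deps[f"r{j}"] = list(maps)
    return deps

def completion_time(deps, durations):
    """Job makespan as the longest path through the DAG, assuming
    unlimited parallelism (no slot contention)."""
    finish = {}

    def f(task):
        if task not in finish:
            finish[task] = durations[task] + max(
                (f(d) for d in deps[task]), default=0.0)
        return finish[task]

    return max(f(t) for t in deps)
```

With two maps of durations 3 and 5 and one reduce of duration 2, the reduce cannot start before the slowest map, so the job finishes at time 7; a pipelined model would overlap part of the reduce with the maps and finish earlier.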
Genetic and Evolutionary Computation Conference | 2011
Giovanni Comarela; Kênia Carolina Gonçalves; Gisele L. Pappa; Jussara M. Almeida; Virgílio A. F. Almeida
Sparse wireless sensor networks are characterized by large distances between sensors. In this type of network, gathering data from all sensors at a point of interest can be difficult, and in many cases a mobile robot is used to travel among the sensors and collect data from them. In this case, we must provide the robot with a route that minimizes the traveled distance while allowing data collection from all sensors. This problem can be modeled as the classic Traveling Salesman Problem (TSP). However, when each sensor has an influence area bounded, for example, by a circle, the robot need not touch each sensor, but only a point inside the covered area. In this case, the problem can be modeled as a special case of the TSP with Neighborhoods (TSPN). This work presents a new approach, based on continuous Ant Colony Optimization (ACO) and a simple combinatorial technique for the TSP, to solve that special case of TSPN. The experiments performed indicate that the proposed heuristic obtains significant improvements when compared with other methods found in the literature.
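The geometric advantage of TSPN over plain TSP can be shown with a simple local-improvement heuristic: given a tour over sensor centers, pull each visit point inside its sensor's disk toward its neighbors. This is an illustrative stand-in for the continuous optimization step, not the paper's ACO algorithm.

```python
import math

def shorten_tspn_tour(centers, radius, order, iters=50):
    """Given a closed TSP tour over sensor centers, move each visit point
    within its sensor's disk to shorten the tour (coordinate descent:
    each point is pulled toward the midpoint of its tour neighbors,
    clipped to the disk boundary)."""
    pts = [centers[i] for i in order]
    n = len(pts)
    for _ in range(iters):
        for k in range(n):
            cx, cy = centers[order[k]]
            px, py = pts[(k - 1) % n]
            nx, ny = pts[(k + 1) % n]
            tx, ty = (px + nx) / 2.0, (py + ny) / 2.0  # target: neighbor midpoint
            dx, dy = tx - cx, ty - cy
            d = math.hypot(dx, dy)
            if d <= radius:
                pts[k] = (tx, ty)                  # midpoint is inside the disk
            else:
                s = radius / d                     # clip to the disk boundary
                pts[k] = (cx + s * dx, cy + s * dy)
    return pts

def tour_length(pts):
    """Length of the closed tour through pts in order."""
    return sum(math.hypot(pts[i][0] - pts[i - 1][0],
                          pts[i][1] - pts[i - 1][1]) for i in range(len(pts)))
```

On four sensors at the corners of a 10x10 square with radius 1, the plain TSP tour through the centers has length 40, while the refined TSPN tour cuts each corner and is strictly shorter.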
Internet Measurement Conference | 2014
Giovanni Comarela; Mark Crovella
Understanding the dynamics of the interdomain routing system is challenging. One reason is that a single routing or policy change can have far-reaching and complex effects. Connecting observed behavior with its underlying causes is made even more difficult by the amount of noise in the BGP system. In this paper we address these challenges by presenting PathMiner, a system to extract large-scale routing events from background noise and identify the AS or link responsible for each event. PathMiner is distinguished from previous work in its ability to identify and analyze large-scale events that may recur many times over long timescales. The central idea behind PathMiner is that although a routing change at one AS may induce large-scale, complex responses in other ASes, the correlation among those responses (in space and time) helps to isolate the relevant set of responses from background noise and makes the cause much easier to identify. Hence, PathMiner has two components: an algorithm for mining large-scale coordinated changes from routing tables, and an algorithm for identifying the network element (AS or link) responsible for the set of coordinated changes. We describe the implementation and validation of PathMiner. We show that it is scalable, being able to extract significant events from multiple years of routing data at a daily granularity. Finally, using PathMiner we study interdomain routing over the past 9 years, characterizing the presence of large-scale routing events and identifying the responsible network elements.
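The first component, mining coordinated changes, can be caricatured as follows. The sketch flags days on which many prefixes changed AS paths simultaneously; PathMiner's actual mining exploits correlation in both space and time rather than a simple per-day threshold, so treat this only as the shape of the idea.

```python
def extract_events(path_snapshots, min_prefixes=2):
    """path_snapshots: day -> {prefix: AS-path tuple}.
    Returns {day: set of prefixes whose path changed since the previous
    snapshot}, keeping only days where at least min_prefixes changed
    together (the coordinated-change signal)."""
    days = sorted(path_snapshots)
    events = {}
    for prev, cur in zip(days, days[1:]):
        changed = {p for p in path_snapshots[cur]
                   if path_snapshots[prev].get(p) != path_snapshots[cur][p]}
        if len(changed) >= min_prefixes:
            events[cur] = changed
    return events
```

The set of prefixes that change together on an event day is then the input to the second component, which searches for the AS or link common to their old and new paths.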
Internet Measurement Conference | 2013
Giovanni Comarela; Gonca Gürsun; Mark Crovella
The dynamics of interdomain routing have traditionally been studied through the analysis of BGP update traffic. However, such studies tend to focus on the volume of BGP updates rather than their effects, and tend to be local rather than global in scope. Studying the global state of the Internet routing system over time requires the development of new methods, which we do in this paper. We define a new metric, MRSD, that allows us to measure the similarity between two prefixes with respect to the state of the global routing system. Applying this metric over time yields a measure of how the set of total paths to each prefix varies at a given timescale. We implement this analysis method in a MapReduce framework and apply it to a dataset of more than 1TB, collected daily over 3 distinct years and monthly over 8 years. We show that this analysis method can uncover interesting aspects of how Internet routing has changed over time. We show that on any given day, approximately 1% of the next-hop decisions made in the Internet change, and this property has been remarkably constant over time; the corresponding amount of change in one month is 10% and in two years is 50%. Digging deeper, we can decompose next-hop decision changes into two classes: churn, and structural (persistent) change. We show that structural change shows a strong 7-day periodicity and that it represents approximately 2/3 of the total amount of changes.
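The headline statistic, the fraction of next-hop decisions that change per day, can be computed from two routing-state snapshots as below. This illustrates that statistic only; it is not the MRSD similarity metric itself, and the snapshot representation is an assumption.

```python
def next_hop_change_fraction(day1, day2):
    """day1, day2: dicts mapping (vantage_point, prefix) -> next-hop AS,
    one per daily snapshot of the routing system.
    Returns the fraction of decisions present on both days that changed."""
    common = day1.keys() & day2.keys()
    if not common:
        return 0.0
    changed = sum(1 for k in common if day1[k] != day2[k])
    return changed / len(common)
```

Applied day over day across the dataset, this is the quantity the abstract reports as roughly 1% per day, 10% per month, and 50% over two years.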
Proceedings of the Workshop on Language in Social Media (LSM 2011) | 2011
Evandro Cunha; Gabriel Magno; Giovanni Comarela; Virgílio A. F. Almeida; Marcos André Gonçalves; Fabrício Benevenuto
Internet Measurement Conference | 2012
Gabriel Magno; Giovanni Comarela; Diego Sáez-Trumper; Meeyoung Cha; Virgílio A. F. Almeida
Internet Measurement Conference | 2016
Giovanni Comarela; Evimaria Terzi; Mark Crovella