Bruno F. Ribeiro | Researchain

Publication

Featured researches published by Bruno F. Ribeiro.

internet measurement conference | 2010

Estimating and sampling graphs with multidimensional random walks

Bruno F. Ribeiro; Donald F. Towsley

Estimating characteristics of large graphs via sampling is a vital part of the study of complex networks. Current sampling methods such as (independent) random vertex and random walks are useful but have drawbacks. Random vertex sampling may require too many resources (time, bandwidth, or money). Random walks, which normally require fewer resources per sample, can suffer from large estimation errors in the presence of disconnected or loosely connected graphs. In this work we propose a new m-dimensional random walk that uses m dependent random walkers. We show that the proposed sampling method, which we call Frontier sampling, exhibits all of the nice sampling properties of a regular random walk. At the same time, our simulations over large real world graphs show that, in the presence of disconnected or loosely connected components, Frontier sampling exhibits lower estimation errors than regular random walks. We also show that Frontier sampling is more suitable than random vertex sampling to sample the tail of the degree distribution of the graph.
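The abstract does not spell out the walker update rule, so the following is only a minimal sketch of an m-walker sampler under stated assumptions: walkers start at uniformly chosen non-isolated vertices, the walker to move is chosen with probability proportional to its current degree, and it then steps to a uniformly chosen neighbor. The function names, the adjacency-dict input, and the 1/degree reweighting used for the degree distribution are illustrative choices, not taken from this page.

```python
import random


def frontier_sampling(adj, m, steps, rng=None):
    """Sample `steps` edges with m dependent walkers (illustrative sketch).

    adj: dict mapping each vertex to a list of neighbors; assumed to
         describe an undirected graph (every edge listed in both directions).
    """
    rng = rng or random.Random()
    # Assumption: walkers start at uniformly chosen non-isolated vertices.
    candidates = [v for v, nbrs in adj.items() if nbrs]
    walkers = [rng.choice(candidates) for _ in range(m)]
    edges = []
    for _ in range(steps):
        # Choose which walker moves, with probability proportional to its degree.
        weights = [len(adj[u]) for u in walkers]
        i = rng.choices(range(m), weights=weights, k=1)[0]
        u = walkers[i]
        v = rng.choice(adj[u])  # step to a uniformly chosen neighbor
        edges.append((u, v))
        walkers[i] = v
    return edges


def estimate_degree_distribution(adj, edges):
    """Reweight visited vertices by 1/degree to offset the degree bias of edge sampling."""
    mass = {}
    total = 0.0
    for _, v in edges:
        d = len(adj[v])
        mass[d] = mass.get(d, 0.0) + 1.0 / d
        total += 1.0 / d
    return {d: w / total for d, w in mass.items()}
```

Because the walkers start in different places, a graph with several disconnected components can still be covered, which a single walker confined to its starting component cannot do.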

workshop on algorithms and models for the web graph | 2010

Improving Random Walk Estimation Accuracy with Uniform Restarts

Konstantin Avrachenkov; Bruno F. Ribeiro; Donald F. Towsley

This work proposes and studies the properties of a hybrid sampling scheme that mixes independent uniform node sampling and random walk (RW)-based crawling. We show that our sampling method combines the strengths of both uniform and RW sampling while minimizing their drawbacks. In particular, our method increases the spectral gap of the random walk, and hence, accelerates convergence to the stationary distribution. The proposed method resembles PageRank but unlike PageRank preserves time-reversibility. Applying our hybrid RW to the problem of estimating degree distributions of graphs shows promising results.
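As a rough illustration of mixing uniform node sampling with a random walk, the sketch below assumes one particular restart rule: from a node of degree d, the walk jumps to a uniformly chosen node with probability alpha / (d + alpha) and otherwise moves to a uniformly chosen neighbor. Under that assumed rule a node is visited with long-run frequency proportional to d + alpha, so samples are reweighted by 1 / (d + alpha). The parameter name `alpha` and both function names are hypothetical.

```python
import random


def rw_with_uniform_restarts(adj, alpha, steps, rng=None):
    """Random walk that occasionally restarts at a uniformly chosen node (sketch).

    adj: dict mapping each node to a list of neighbors (undirected graph assumed).
    alpha: restart weight; must be positive so isolated nodes can be left.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = rng or random.Random()
    nodes = list(adj)
    current = rng.choice(nodes)
    visited = []
    for _ in range(steps):
        visited.append(current)
        d = len(adj[current])
        if rng.random() < alpha / (d + alpha):
            current = rng.choice(nodes)          # uniform restart
        else:
            current = rng.choice(adj[current])   # ordinary random-walk step
    return visited


def estimate_degree_distribution_restarts(adj, visited, alpha):
    """Reweight each visit by 1 / (degree + alpha), matching the assumed restart rule."""
    mass = {}
    total = 0.0
    for v in visited:
        d = len(adj[v])
        w = 1.0 / (d + alpha)
        mass[d] = mass.get(d, 0.0) + w
        total += w
    return {d: m / total for d, m in mass.items()}
```

A larger `alpha` makes the sampler behave more like independent uniform sampling, while a value near zero approaches a plain random walk.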

internet measurement conference | 2006

Fisher information of sampled packets: an application to flow size estimation

Bruno F. Ribeiro; Donald F. Towsley; Tao Ye; Jean Bolot

Packet sampling is widely used in network monitoring. Sampled packet streams are often used to determine flow-level statistics of network traffic. To date there is conflicting evidence on the quality of the resulting estimates. In this paper we take a systematic approach, using the Fisher information metric and the Cramér-Rao bound, to understand the contributions that different types of information within sampled packets have on the quality of flow-level estimates. We provide concrete evidence that, without protocol information and with packet sampling rate p = 0.005, any accurate unbiased estimator needs approximately 10^16 sampled flows. The required number of sampled flows drops to roughly 10^4 with the use of TCP sequence numbers. Furthermore, additional SYN flag information significantly reduces the estimation error of short flows. We present a Maximum Likelihood Estimator (MLE) that relies on all of this information and show that it is efficient, even when applied to a small sample set. We validate our results using Tier-1 Internet backbone traces and evaluate the benefits of sampling from multiple monitors. Our results show that combining estimates from several monitors is 50% less accurate than an estimate based on all samples.
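To give a feel for why low sampling rates make flow-size estimation hard, here is a small sketch of independent packet sampling. It assumes each packet is kept independently with probability p, which is a common model but is not stated in the abstract; the function names are hypothetical.

```python
import random


def prob_flow_missed(n, p):
    """Probability that none of the n packets of a flow is sampled (independent sampling)."""
    return (1.0 - p) ** n


def sampled_packets(n, p, rng):
    """Simulate how many of the n packets of one flow survive sampling."""
    return sum(1 for _ in range(n) if rng.random() < p)


if __name__ == "__main__":
    p = 0.005  # sampling rate quoted in the abstract
    for n in (1, 10, 100, 1000):
        print(n, prob_flow_missed(n, p))
```

Under this model most short flows leave no sampled packet at all, so the monitor observes a heavily thinned view of the flow-size distribution.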

international conference on computer communications | 2012

Sampling directed graphs with random walks

Bruno F. Ribeiro; Pinghui Wang; Fabricio Murai; Donald F. Towsley

Despite recent efforts to characterize complex networks such as citation graphs or online social networks (OSNs), little attention has been given to developing tools that can be used to characterize directed graphs in the wild, where no pre-processed data is available. The presence of hidden incoming edges but observable outgoing edges poses a challenge to characterize large directed graphs through crawling, as existing sampling methods cannot cope with hidden incoming links. The driving principle behind our random walk (RW) sampling method is to construct, in real-time, an undirected graph from the directed graph such that the random walk on the directed graph is consistent with one on the undirected graph. We then use the RW on the undirected graph to estimate the outdegree distribution. Our algorithm accurately estimates outdegree distributions of a variety of real world graphs. We also study the hardness of indegree distribution estimation when indegrees are latent (i.e., incoming links are only observed as outgoing edges). We observe that, in the same scenarios, indegree distribution estimates are highly inaccurate unless the directed graph is highly symmetrical.

international world wide web conferences | 2014

Modeling and predicting the growth and death of membership-based websites

Bruno F. Ribeiro

Driven by outstanding success stories of Internet startups such as Facebook and The Huffington Post, recent studies have thoroughly described their growth. These highly visible online success stories, however, overshadow an untold number of similar ventures that fail. The study of website popularity is ultimately incomplete without general mechanisms that can describe both successes and failures. In this work we present six years of the daily number of active users (DAU) of twenty-two membership-based websites - encompassing online social networks, grassroots movements, online forums, and membership-only Internet stores - well balanced between successes and failures. We then propose a combination of reaction-diffusion-decay processes whose resulting equations seem not only to describe the observed DAU time series well but also to provide a means to roughly predict their evolution. This model allows an approximate automatic DAU-based classification of websites into self-sustainable vs. unsustainable, and into those whose startup growth is mostly driven by marketing & media campaigns vs. word-of-mouth adoptions.

Scientific Reports | 2013

Quantifying the effect of temporal resolution on time-varying networks

Bruno F. Ribeiro; Nicola Perra; Andrea Baronchelli

Time-varying networks describe a wide array of systems whose constituents and interactions evolve over time. They are defined by an ordered stream of interactions between nodes, yet they are often represented in terms of a sequence of static networks, each aggregating all edges and nodes present in a time interval of size Δt. In this work we quantify the impact of an arbitrary Δt on the description of a dynamical process taking place upon a time-varying network. We focus on the elementary random walk, and put forth a simple mathematical framework that well describes the behavior observed on real datasets. The analytical description of the bias introduced by time integrating techniques represents a step forward in the correct characterization of dynamical processes on time-varying graphs.
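The aggregation step described above, in which a stream of timestamped interactions is turned into a sequence of static networks over windows of size Δt, can be sketched as follows. The event format `(t, u, v)`, the function name, and the use of window indices `floor(t / dt)` are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_snapshots(events, dt):
    """Group timestamped interactions into static snapshots of width dt.

    events: iterable of (t, u, v) tuples, an interaction between u and v at time t.
    Returns a dict mapping window index floor(t / dt) to a set of undirected edges.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    snapshots = defaultdict(set)
    for t, u, v in events:
        # frozenset makes (u, v) and (v, u) the same undirected edge.
        snapshots[int(t // dt)].add(frozenset((u, v)))
    return dict(snapshots)
```

Increasing `dt` merges interactions that happened at different times into one snapshot, so a walker on the aggregated graph can follow paths that were never available in the original ordering of events. This is the kind of bias the paper sets out to quantify.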

ACM Transactions on Knowledge Discovery From Data | 2014

Efficiently Estimating Motif Statistics of Large Networks

Pinghui Wang; John C. S. Lui; Bruno F. Ribeiro; Donald F. Towsley; Junzhou Zhao; Xiaohong Guan

Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological networks and online social networks (OSNs). Nowadays, the massive size of some critical networks, often stored in already overloaded relational databases, effectively limits the rate at which nodes and edges can be explored, making it a challenge to accurately discover subgraph statistics. In this work, we propose sampling methods to accurately estimate subgraph statistics from as few queried nodes as possible. We present sampling algorithms that efficiently and accurately estimate subgraph properties of massive networks. Our algorithms require no precomputation or complete network topology information. At the same time, we provide theoretical guarantees of convergence. We perform experiments using widely known datasets and show that, for the same accuracy, our algorithms require an order of magnitude fewer queries (samples) than the current state-of-the-art algorithms.

passive and active network measurement | 2005

Exploiting the IPID field to infer network path and end-system characteristics

Weifeng Chen; Yong Huang; Bruno F. Ribeiro; Kyoungwon Suh; Honggang Zhang; Edmundo de Souza e Silva; James F. Kurose; Donald F. Towsley

In both active and passive Internet measurements, the IP packet has a number of important header fields that have played key roles in past measurement efforts, e.g., IP source/destination address, protocol, TTL, port, and sequence number/acknowledgment. The 16-bit identification field (IPID) has only recently been studied to determine what information it might yield for network measurement and performance characterization purposes. We explore several new uses of the IPID field, including how it can be used to infer: (a) the amount of internal (local) traffic generated by a server; (b) the number of servers in a large-scale, load-balanced server complex; and (c) the difference between one-way delays of two machines to a target computer. We illustrate and validate the use of these techniques through empirical measurement studies.

conference on computer communications workshops | 2010

On MySpace Account Spans and Double Pareto-Like Distribution of Friends

Bruno F. Ribeiro; William Gauvin; Benyuan Liu; Donald F. Towsley

In this work we study the activity span of MySpace accounts and its connection to the distribution of the number of friends. The activity span is the time elapsed between the creation of the account and the user's last login time. We observe exponentially distributed activity spans. We also observe that the distribution of the number of friends over accounts with the same activity span is well approximated by a lognormal with a fairly light tail. These two findings shed light on the puzzling (yet unexplained) inflection point (knee) in the distribution of friends in MySpace when plotted in log-log scale. We argue that the inflection point resembles the inflection point of Reed's (Double Pareto) Geometric Brownian Motion with Exponential Stopping Times model. We also present evidence against the Dunbar number hypothesis of online social networks, which argues, without proof, that the inflection point is due to the Dunbar number (a theoretical limit on the number of people that a human brain can sustain active social contact with). While we answer many questions, we leave many others open.

2011 IEEE Network Science Workshop | 2011

Online estimating the k central nodes of a network

Yeon-sup Lim; Daniel Sadoc Menasché; Bruno F. Ribeiro; Donald F. Towsley; Prithwish Basu

A well-known way to find the most central nodes in a network consists of coupling random walk sampling (or one of its variants) with a method to identify the most central nodes in the subgraph induced by the samples. Although it is commonly assumed that degree information is collected during the sampling step, in previous works this information has not been used at the identification step [10], [18]. In this paper, we show that using degree information at the identification step in a very naive way, namely setting the degree as an alias to other centrality metrics, yields promising results.
