Johan Ugander
Stanford University
Publications
Featured research published by Johan Ugander.
Proceedings of the National Academy of Sciences of the United States of America | 2012
Johan Ugander; Lars Backstrom; Cameron Marlow; Jon M. Kleinberg
The concept of contagion has steadily expanded from its original grounding in epidemic disease to describe a vast array of processes that spread across networks, notably social phenomena such as fads, political opinions, the adoption of new technologies, and financial decisions. Traditional models of social contagion have been based on physical analogies with biological contagion, in which the probability that an individual is affected by the contagion grows monotonically with the size of his or her “contact neighborhood”—the number of affected individuals with whom he or she is in contact. Whereas this contact neighborhood hypothesis has formed the underpinning of essentially all current models, it has been challenging to evaluate due to the difficulty in obtaining detailed data on individual network neighborhoods during the course of a large-scale contagion process. Here we study this question by analyzing the growth of Facebook, a rare example of a social process with genuinely global adoption. We find that the probability of contagion is tightly controlled by the number of connected components in an individual's contact neighborhood, rather than by the actual size of the neighborhood. Surprisingly, once this “structural diversity” is controlled for, the size of the contact neighborhood is in fact generally a negative predictor of contagion. More broadly, our analysis shows how data at the size and resolution of the Facebook network make possible the identification of subtle structural signals that go undetected at smaller scales yet hold pivotal predictive roles for the outcomes of social processes.
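As a hedged illustration (not code from the paper), the structural-diversity measure described above, the number of connected components among an individual's already-affected contacts, can be sketched in a few lines with networkx; the function name and example graph are illustrative assumptions.

```python
# Illustrative sketch: "structural diversity" of a contact neighborhood,
# computed as the number of connected components in the subgraph induced
# by a user's already-affected friends.
import networkx as nx

def structural_diversity(graph: nx.Graph, user, affected: set) -> int:
    """Number of connected components among the user's affected neighbors."""
    affected_neighbors = [v for v in graph.neighbors(user) if v in affected]
    neighborhood = graph.subgraph(affected_neighbors)
    return nx.number_connected_components(neighborhood)

# Example: a user whose three affected friends form two separate groups has
# structural diversity 2, even though the neighborhood size is 3.
G = nx.Graph([("u", "a"), ("u", "b"), ("u", "c"), ("a", "b")])
print(structural_diversity(G, "u", affected={"a", "b", "c"}))  # -> 2
```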
knowledge discovery and data mining | 2013
Joel Nishimura; Johan Ugander
Partitioning large graphs is difficult, especially when performed in the limited models of computation afforded to modern large-scale computing systems. In this work we introduce restreaming graph partitioning and develop algorithms that scale similarly to streaming partitioning algorithms yet empirically perform as well as fully offline algorithms. In streaming partitioning, graphs are partitioned serially in a single pass. Restreaming partitioning is motivated by scenarios where approximately the same dataset is routinely streamed, making it possible to transform streaming partitioning algorithms into an iterative procedure. This combination of simplicity and powerful performance allows restreaming algorithms to be easily adapted to efficiently tackle more challenging partitioning objectives. In particular, we consider the problem of stratified graph partitioning, where each of many node attribute strata is balanced simultaneously. As such, stratified partitioning is well suited for the study of network effects on social networks, where it is desirable to isolate disjoint dense subgraphs with representative user demographics. To demonstrate, we partition a large social network such that each partition exhibits the same degree distribution as the original graph, a novel achievement for non-regular graphs. As part of our results, we also observe a fundamental difference in the ease with which social graphs are partitioned when compared to web graphs. Namely, the modular structure of web graphs appears to motivate full offline optimization, whereas the locally dense structure of social graphs precludes significant gains from global manipulations.
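A minimal sketch of the restreaming idea, assuming the linear deterministic greedy (LDG) streaming heuristic as the per-pass rule; the function name, parameters, and stream-ordering choice are illustrative, not the paper's implementation.

```python
# Restreaming sketch: repeatedly stream the same graph through an LDG-style
# greedy partitioner, letting each pass see the previous pass's assignment.
import random
from collections import defaultdict

def restream_ldg(adj, k, capacity, passes=10):
    """adj: dict node -> iterable of neighbors; k: number of partitions;
    capacity: soft size limit per partition (penalizes, does not hard-enforce)."""
    assignment = {}                  # node -> partition id from the latest pass
    nodes = list(adj)
    for _ in range(passes):
        loads = defaultdict(int)     # partition sizes accumulated this pass
        new_assignment = {}
        for v in nodes:              # same stream order each pass
            best, best_score = 0, float("-inf")
            for p in range(k):
                # neighbors already placed in p (this pass, or the previous
                # pass for nodes not yet restreamed) pull v toward p
                placed = sum(
                    1 for u in adj[v]
                    if new_assignment.get(u, assignment.get(u)) == p
                )
                score = placed * (1.0 - loads[p] / capacity)
                if score > best_score:
                    best, best_score = p, score
            new_assignment[v] = best
            loads[best] += 1
        assignment = new_assignment
    return assignment
```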
arXiv: Methodology | 2017
Dean Eckles; Brian Karrer; Johan Ugander
Estimating the effects of interventions in networks is complicated due to interference, such that the outcomes for one experimental unit may depend on the treatment assignments of other units. Familiar statistical formalism, experimental designs, and analysis methods assume the absence of this interference, and result in biased estimates of causal effects when it exists. While some assumptions can lead to unbiased estimates, these assumptions are generally unrealistic in the context of a network and often amount to assuming away the interference. In this work, we evaluate methods for designing and analyzing randomized experiments under minimal, realistic assumptions compatible with broad interference, where the aim is to reduce bias and possibly overall error in estimates of average effects of a global treatment. In design, we consider the ability to perform random assignment to treatments that is correlated in the network, such as through graph cluster randomization. In analysis, we consider incorporating information about the treatment assignment of network neighbors. We prove sufficient conditions for bias reduction through both design and analysis in the presence of potentially global interference; these conditions also give lower bounds on treatment effects. Through simulations of the entire process of experimentation in networks, we measure the performance of these methods under varied network structure and varied social behaviors, finding substantial bias reductions and, despite a bias–variance tradeoff, error reductions. These improvements are largest for networks with more clustering and data generating processes with both stronger direct effects of the treatment and stronger interactions between units.
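The design and analysis ingredients can be sketched as follows. This is an illustrative example rather than the authors' code: it substitutes networkx label propagation for whatever graph clustering a practitioner would actually use, and the function names are assumptions.

```python
# Sketch of graph cluster randomization (design) and a neighborhood-exposure
# covariate (analysis): clusters are assigned to treatment or control as blocks,
# so most of a node's neighbors share its assignment, reducing interference bias.
import random
import networkx as nx

def cluster_randomize(graph: nx.Graph, p_treat: float = 0.5, seed: int = 0):
    """Assign treatment at the cluster level; clusters here come from label
    propagation, standing in for any graph clustering method."""
    rng = random.Random(seed)
    clusters = nx.algorithms.community.asyn_lpa_communities(graph, seed=seed)
    assignment = {}
    for cluster in clusters:
        z = 1 if rng.random() < p_treat else 0
        for node in cluster:
            assignment[node] = z
    return assignment

def fraction_neighbors_treated(graph: nx.Graph, assignment: dict) -> dict:
    """Per-node exposure covariate of the kind used in neighborhood-aware analyses."""
    return {
        v: sum(assignment[u] for u in graph.neighbors(v)) / max(graph.degree(v), 1)
        for v in graph.nodes
    }
```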
Siam Review | 2018
Bailey K. Fosdick; Daniel B. Larremore; Joel Nishimura; Johan Ugander
Random graph null models have found widespread application in diverse research communities analyzing network datasets, including social, information, and economic networks, as well as food webs, protein-protein interactions, and neuronal networks. The most popular family of random graph null models, called configuration models, are defined as uniform distributions over a space of graphs with a fixed degree sequence. Commonly, properties of an empirical network are compared to properties of an ensemble of graphs from a configuration model in order to quantify whether empirical network properties are meaningful or whether they are instead a common consequence of the particular degree sequence. In this work we study the subtle but important decisions underlying the specification of a configuration model, and investigate the role these choices play in graph sampling procedures and a suite of applications. We place particular emphasis on the importance of specifying the appropriate graph labeling (stub-labeled or vertex-labeled) under which to consider a null model, a choice that closely connects the study of random graphs to the study of random contingency tables. We show that the choice of graph labeling is inconsequential for studies of simple graphs, but can have a significant impact on analyses of multigraphs or graphs with self-loops. The importance of these choices is demonstrated through a series of three vignettes, analyzing network datasets under many different configuration models and observing substantial differences in study conclusions under different models. We argue that in each case, only one of the possible configuration models is appropriate. While our work focuses on undirected static networks, it aims to guide the study of directed networks, dynamic networks, and all other network contexts that are suitably studied through the lens of random graph null models.
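An illustrative sketch of the stub-labeled case: stub matching samples uniformly over stub-labeled multigraphs with a fixed degree sequence, whereas uniformly sampling vertex-labeled or simple graphs generally requires other machinery such as edge-swap MCMC. The code below is a generic sketch under those assumptions, not code from the paper.

```python
# Draw one graph from a stub-labeled configuration model by uniform stub matching.
# The result is a multigraph (self-loops and multi-edges are possible).
import random

def stub_matching(degree_sequence, seed=None):
    """Return an edge list with the given degree sequence."""
    rng = random.Random(seed)
    stubs = [node for node, deg in enumerate(degree_sequence) for _ in range(deg)]
    rng.shuffle(stubs)
    assert len(stubs) % 2 == 0, "degree sequence must have even sum"
    # pair consecutive stubs after the shuffle
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

edges = stub_matching([3, 2, 2, 1], seed=1)
print(edges)  # a uniform draw over matchings of these 8 labeled stubs
```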
Proceedings of the National Academy of Sciences of the United States of America | 2017
Isabel M. Kloumann; Johan Ugander; Jon M. Kleinberg
Significance: Methods based on PageRank have been fundamental to work on identifying communities in networks, but, to date, there has been little formal basis for the effectiveness of these methods. We establish a surprising connection between the personalized PageRank algorithm and the stochastic block model for random graphs, showing that personalized PageRank, in fact, provides the optimal geometric discriminant function for separating the communities in stochastic block models over a wide class of functions. Building on this result, we develop stronger classifiers that, although scalable, are competitive with computationally much more demanding methods such as belief propagation.
Methods for ranking the importance of nodes in a network have a rich history in machine learning and across domains that analyze structured data. Recent work has evaluated these methods through the “seed set expansion problem”: given a subset S of nodes from a community of interest in an underlying graph, can we reliably identify the rest of the community? We start from the observation that the most widely used techniques for this problem, personalized PageRank and heat kernel methods, operate in the space of “landing probabilities” of a random walk rooted at the seed set, ranking nodes according to weighted sums of landing probabilities of different length walks. Both schemes, however, lack an a priori relationship to the seed set objective. In this work, we develop a principled framework for evaluating ranking methods by studying seed set expansion applied to the stochastic block model. We derive the optimal gradient for separating the landing probabilities of two classes in a stochastic block model and find, surprisingly, that under reasonable assumptions the gradient is asymptotically equivalent to personalized PageRank for a specific choice of the PageRank parameter α that depends on the block model parameters. This connection provides a formal motivation for the success of personalized PageRank in seed set expansion and node ranking generally. We use this connection to propose more advanced techniques incorporating higher moments of landing probabilities; our advanced methods exhibit greatly improved performance, despite being simple linear classification rules, and are even competitive with belief propagation.
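A hedged sketch of the landing-probability view described above: personalized PageRank scores written as a geometrically weighted sum of k-step landing probabilities of a random walk rooted at the seed set. The function names, the truncation length, and the dense-matrix representation are illustrative assumptions, not the paper's code.

```python
# Seed set expansion scores via truncated personalized PageRank, expressed as
# a weighted sum of random-walk landing probabilities rooted at the seed set.
import numpy as np

def landing_probabilities(A, seeds, num_steps):
    """A: dense adjacency matrix (every node assumed to have degree >= 1).
    Returns [p_0, ..., p_num_steps], where p_k[v] is the probability that a walk
    started uniformly from the seed set is at node v after k steps."""
    n = A.shape[0]
    degrees = A.sum(axis=1)
    P = A / degrees[:, None]            # row-stochastic transition matrix
    p = np.zeros(n)
    p[list(seeds)] = 1.0 / len(seeds)
    probs = [p]
    for _ in range(num_steps):
        p = p @ P
        probs.append(p)
    return probs

def personalized_pagerank_scores(A, seeds, alpha=0.15, num_steps=50):
    """Truncated PPR: the weighted sum alpha * (1 - alpha)^k over landing probabilities."""
    probs = landing_probabilities(A, seeds, num_steps)
    return sum(alpha * (1 - alpha) ** k * p_k for k, p_k in enumerate(probs))
```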
web science | 2015
Abigail Z. Jacobs; Samuel F. Way; Johan Ugander; Aaron Clauset
Online social networks represent a popular and diverse class of social media systems. Despite this variety, each of these systems undergoes a general process of online social network assembly, which represents the complicated and heterogeneous changes that transform newly born systems into mature platforms. However, little is known about this process. For example, how much of a network's assembly is driven by simple growth? How does a network's structure change as it matures? How does network structure vary with adoption rates and user heterogeneity, and do these properties play different roles at different points in the assembly? We investigate these and other questions using a unique dataset of online connections among the roughly one million users at the first 100 colleges admitted to Facebook, captured just 20 months after its launch. We first show that different vintages and adoption rates across this population of networks reveal temporal dynamics of the assembly process, and that assembly is only loosely related to network growth. We then exploit natural experiments embedded in this dataset and complementary data obtained via Internet archaeology to show that different subnetworks matured at different rates toward similar end states. These results shed light on the processes and patterns of online social network assembly, and may facilitate more effective design for online social systems.
american control conference | 2007
Johan Ugander; Mary J. Dunlop; Richard M. Murray
The control of synthetic genetic regulatory networks is an emerging engineering challenge. In this study, we propose a new synthetic genetic network that behaves as a digital clock, producing square waveform oscillations. We analyze two models of the network: a deterministic model based on Michaelis-Menten kinetics, as well as a stochastic model based on the Gillespie algorithm. Both models predict regions of oscillatory behavior; the deterministic model provides insight into the conditions required to produce the oscillating clock-like behavior, while the stochastic model is truer to natural dynamics. Intracellular stochasticity is seen to contribute phase noise to the oscillator, and we propose improvements for the network and discuss the conceptual foundations of these improvements.
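For readers unfamiliar with the stochastic side, the Gillespie algorithm can be sketched on a toy birth-death process (constant production, first-order degradation); this stand-in omits the paper's actual reaction network and parameter values, which are assumptions here only for illustration.

```python
# Gillespie stochastic simulation of a toy birth-death process:
# production at constant rate k_prod, degradation at rate k_deg * x.
import random

def gillespie_birth_death(k_prod=10.0, k_deg=0.1, x0=0, t_max=100.0, seed=0):
    rng = random.Random(seed)
    t, x = 0.0, x0
    trajectory = [(t, x)]
    while t < t_max:
        rates = [k_prod, k_deg * x]       # propensities of the two reactions
        total = sum(rates)
        if total == 0:
            break
        t += rng.expovariate(total)       # exponentially distributed waiting time
        # choose which reaction fires, with probability proportional to its propensity
        if rng.random() < rates[0] / total:
            x += 1                        # production event
        else:
            x -= 1                        # degradation event
        trajectory.append((t, x))
    return trajectory
```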
economics and computation | 2017
Jon M. Kleinberg; Sendhil Mullainathan; Johan Ugander
A broad range of on-line behaviors are mediated by interfaces in which people make choices among sets of options. A rich and growing line of work in the behavioral sciences indicates that human choices follow not only from the utility of alternatives, but also from the choice set in which alternatives are presented. In this work we study comparison-based choice functions, a simple but surprisingly rich class of functions capable of exhibiting so-called choice-set effects. Motivated by the challenge of predicting complex choices, we study the query complexity of these functions in a variety of settings. We consider settings that allow for active queries or passive observation of a stream of queries, and give analyses both at the granularity of individuals and of populations that might exhibit heterogeneous choice behavior. Our main result is that any comparison-based choice function in one dimension can be inferred as efficiently as a basic maximum or minimum choice function across many query contexts, suggesting that choice-set effects need not entail any fundamental algorithmic barriers to inference. We also introduce a class of choice functions we call distance-comparison-based functions, and briefly discuss the analysis of such functions. The framework we outline provides intriguing connections between human choice behavior and a range of questions in the theory of sorting.
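As a hedged toy example, not the paper's formal definition, the sketch below contrasts a utility-maximizing rule with a simple median rule standing in for a compromise-style choice-set effect, where adding an option changes the choice even though the added option is never picked.

```python
# Two choice rules over options on a line: maximization ignores the composition
# of the choice set, while the median rule's pick depends on it.
def choose_max(options):
    return max(options)

def choose_median(options):
    ranked = sorted(options)
    return ranked[len(ranked) // 2]

print(choose_median([3, 9]))                       # 9
print(choose_median([1, 3, 9]))                    # 3: adding the option 1 flips the choice
print(choose_max([3, 9]), choose_max([1, 3, 9]))   # 9 9: the max rule is unaffected
```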
Proceedings of the 12th workshop on the Economics of Networks, Systems and Computation | 2017
Alexander Peysakhovich; Johan Ugander
A large body of existing work in social science as well as computer science attempts to infer preferences of individuals from the actions they take. This includes research areas such as industrial organization [4], marketing [1], political science [12], analysis of auctions [3], recommender systems [8], search engine ranking [9], and many others. The workhorse model used either implicitly or explicitly in these disparate literatures is the rational choice model.
arXiv: Social and Information Networks | 2011
Johan Ugander; Brian Karrer; Lars Backstrom; Cameron Marlow