Aaron Clauset | Researchain

Publication

Featured research published by Aaron Clauset.

SIAM Review | 2009

Power-Law Distributions in Empirical Data

Aaron Clauset; Cosma Rohilla Shalizi; M. E. J. Newman

Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.
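The fitting step described in this abstract can be illustrated with a minimal sketch. It assumes a continuous power law with a known lower cutoff `xmin` (the abstract notes that identifying this range is itself part of the problem, which is not attempted here). The function names and the synthetic-data parameters are illustrative assumptions, not taken from the paper.

```python
import math
import random

def fit_alpha(data, xmin):
    """Maximum-likelihood exponent for a continuous power law
    p(x) ~ x^(-alpha) on x >= xmin (xmin is assumed known here)."""
    tail = [x for x in data if x >= xmin]
    alpha = 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)
    return alpha, tail

def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the
    fitted power-law CDF, F(x) = 1 - (x / xmin)^(1 - alpha)."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just below and at x.
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d

def sample_power_law(n, xmin, alpha, rng):
    """Inverse-transform sampling from the same continuous power law."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

# Synthetic check with hypothetical parameters (alpha = 2.5, xmin = 1).
rng = random.Random(0)
data = sample_power_law(20000, 1.0, 2.5, rng)
alpha_hat, tail = fit_alpha(data, 1.0)
ks = ks_distance(tail, 1.0, alpha_hat)
print(f"alpha_hat = {alpha_hat:.3f}, KS distance = {ks:.4f}")
```

A small KS distance only says the fitted curve tracks the sample closely. Turning it into a goodness-of-fit verdict, or comparing against alternative distributions with likelihood ratios, needs the further steps the abstract mentions.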

Physical Review E | 2004

Finding community structure in very large networks

Aaron Clauset; M. E. J. Newman; Cristopher Moore

The discovery and analysis of community structure in networks is a topic of considerable recent interest within the physics community, but most methods proposed so far are unsuitable for very large networks because of their computational cost. Here we present a hierarchical agglomeration algorithm for detecting community structure which is faster than many competing algorithms: its running time on a network with n vertices and m edges is O(md log n), where d is the depth of the dendrogram describing the community structure. Many real-world networks are sparse and hierarchical, with m approximately n and d approximately log n, in which case our algorithm runs in essentially linear time, O(n log² n). As an example of the application of this algorithm we use it to analyze a network of items for sale on the web site of a large on-line retailer, items in the network being linked if they are frequently purchased by the same buyer. The network has more than 400 000 vertices and 2 × 10⁶ edges. We show that our algorithm can extract meaningful communities from this network, revealing large-scale patterns present in the purchasing habits of customers.
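To give a feel for hierarchical agglomeration, here is a deliberately naive sketch that repeatedly merges the pair of connected communities with the largest gain in modularity Q. Using modularity gain as the merge criterion is an assumption here, since the abstract does not name the objective. The sketch rebuilds its bookkeeping after every merge, so it does not achieve the O(md log n) running time quoted above. It also assumes a simple undirected graph with no self-loops or duplicate edges.

```python
from collections import defaultdict

def greedy_merge(edges):
    """Naive greedy agglomeration by modularity gain.
    Returns the best modularity seen and the partition that attained it."""
    m = len(edges)
    deg = defaultdict(int)      # total degree of each community
    between = defaultdict(int)  # edge counts between community pairs
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        between[frozenset((u, v))] += 1
    members = {v: {v} for v in deg}

    # With every vertex alone, Q = -sum_i (deg_i / 2m)^2.
    q = -sum((d / (2 * m)) ** 2 for d in deg.values())
    best_q, best_parts = q, [set(s) for s in members.values()]

    while between:
        def gain(pair):
            a, b = tuple(pair)
            # Delta Q for merging a and b: 2 * (e_ab - a_a * a_b).
            return between[pair] / m - deg[a] * deg[b] / (2 * m * m)

        pair = max(between, key=gain)
        q += gain(pair)
        a, b = tuple(pair)
        members[a] |= members.pop(b)
        deg[a] += deg.pop(b)

        # Relabel b as a; edges now internal to a drop out.
        merged = defaultdict(int)
        for key, w in between.items():
            relabeled = frozenset(a if c == b else c for c in key)
            if len(relabeled) == 2:
                merged[relabeled] += w
        between = merged

        if q > best_q:
            best_q, best_parts = q, [set(s) for s in members.values()]
    return best_q, best_parts

# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_q, toy_parts = greedy_merge(toy_edges)
print(toy_q, toy_parts)
```

On the toy graph the merges recover the two triangles before the final, modularity-lowering merge joins them, which is why the sketch keeps the best partition seen rather than the last one.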

Nature | 2008

Hierarchical structure and the prediction of missing links in networks

Aaron Clauset; Cristopher Moore; M. E. J. Newman

Networks have in recent years emerged as an invaluable tool for describing and quantifying complex systems in many branches of science. Recent studies suggest that networks often exhibit hierarchical organization, in which vertices divide into groups that further subdivide into groups of groups, and so forth over multiple scales. In many cases the groups are found to correspond to known functional units, such as ecological niches in food webs, modules in biochemical networks (protein interaction networks, metabolic networks or genetic regulatory networks) or communities in social networks. Here we present a general technique for inferring hierarchical structure from network data and show that the existence of hierarchy can simultaneously explain and quantitatively reproduce many commonly observed topological properties of networks, such as right-skewed degree distributions, high clustering coefficients and short path lengths. We further show that knowledge of hierarchical structure can be used to predict missing connections in partly known networks with high accuracy, and for more general network structures than competing techniques. Taken together, our results suggest that hierarchy is a central organizing principle of complex networks, capable of offering insight into many network phenomena.

Physical Review E | 2005

Finding local community structure in networks

Aaron Clauset

Although the inference of global community structure in networks has recently become a topic of great interest in the physics community, all such algorithms require that the graph be completely known. Here, we define both a measure of local community structure and an algorithm that infers the hierarchy of communities that enclose a given vertex by exploring the graph one vertex at a time. This algorithm runs in time O(k²d) for general graphs when d is the mean degree and k is the number of vertices to be explored. For graphs where exploring a new vertex is time consuming, the running time is linear, O(k). We show that on computer-generated graphs the average behavior of this technique approximates that of algorithms that require global knowledge. As an application, we use this algorithm to extract meaningful local clustering information in the large recommender network of an online retailer.

Physical Review E | 2010

Performance of modularity maximization in practical contexts

Benjamin H. Good; Yves-Alexandre de Montjoye; Aaron Clauset

Although widely used in practice, the behavior and accuracy of the popular module identification technique called modularity maximization is not well understood in practical contexts. Here, we present a broad characterization of its performance in such situations. First, we revisit and clarify the resolution limit phenomenon for modularity maximization. Second, we show that the modularity function Q exhibits extreme degeneracies: it typically admits an exponential number of distinct high-scoring solutions and typically lacks a clear global maximum. Third, we derive the limiting behavior of the maximum modularity Qmax for one model of infinitely modular networks, showing that it depends strongly both on the size of the network and on the number of modules it contains. Finally, using three real-world metabolic networks as examples, we show that the degenerate solutions can fundamentally disagree on many, but not all, partition properties such as the composition of the largest modules and the distribution of module sizes. These results imply that the output of any modularity maximization procedure should be interpreted cautiously in scientific contexts. They also explain why many heuristics are often successful at finding high-scoring partitions in practice and why different heuristics can disagree on the modular structure of the same network. We conclude by discussing avenues for mitigating some of these behaviors, such as combining information from many degenerate solutions or using generative models.

Journal of Conflict Resolution | 2007

On the Frequency of Severe Terrorist Events

Aaron Clauset; Maxwell Young; Kristian Skrede Gleditsch

In the spirit of Lewis Richardson’s original study of the statistics of deadly conflicts, we study the frequency and severity of terrorist attacks worldwide since 1968. We show that these events are uniformly characterized by the phenomenon of “scale invariance,” that is, the frequency scales as an inverse power of the severity, P(x) ∝ x^(−α). We find that this property is a robust feature of terrorism, persisting when we control for economic development of the target country, the type of weapon used, and even for short time scales. Further, we show that the center of the distribution oscillates slightly with a period of roughly τ ≈ 13 years, that there exist significant temporal correlations in the frequency of severe events, and that current models of event incidence cannot account for these variations or the scale invariance property of global terrorism. Finally, we describe a simple toy model for the generation of these statistics and briefly discuss its implications.

Symposium on the Theory of Computing | 2005

On the bias of traceroute sampling: or, power-law degree distributions in regular graphs

Dimitris Achlioptas; Aaron Clauset; David Kempe; Cristopher Moore

Understanding the structure of the Internet graph is a crucial step for building accurate network models and designing efficient algorithms for Internet applications. Yet, obtaining its graph structure is a surprisingly difficult task, as edges cannot be explicitly queried. Instead, empirical studies rely on traceroutes to build what are essentially single-source, all-destinations, shortest-path trees. These trees only sample a fraction of the network's edges, and a recent paper by Lakhina et al. found empirically that the resulting sample is intrinsically biased. For instance, the observed degree distribution under traceroute sampling exhibits a power law even when the underlying degree distribution is Poisson. In this paper, we study the bias of traceroute sampling systematically, and, for a very general class of underlying degree distributions, calculate the likely observed distributions explicitly. To do this, we use a continuous-time realization of the process of exposing the BFS tree of a random graph with a given degree distribution, calculate the expected degree distribution of the tree, and show that it is sharply concentrated. As example applications of our machinery, we show how traceroute sampling finds power-law degree distributions in both δ-regular and Poisson-distributed random graphs. Thus, our work puts the observations of Lakhina et al. on a rigorous footing, and extends them to nearly arbitrary degree distributions.
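The sampling process in this abstract can be mimicked numerically. The sketch below builds a random graph with an approximately Poisson degree distribution, grows a single-source BFS tree, and tabulates the degrees seen in the tree next to the true degrees. The graph size, mean degree, seed, and function names are all illustrative assumptions. It is a simulation aid, not the paper's continuous-time analysis.

```python
import random
from collections import Counter, deque

def random_graph(n, mean_degree, rng):
    """Erdos-Renyi G(n, p) with p = mean_degree / (n - 1), whose degree
    distribution is approximately Poisson for large n."""
    p = mean_degree / (n - 1)
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def bfs_tree_degrees(adj, source):
    """Degree of every reached vertex within the BFS tree from source,
    i.e. counting only the edges a traceroute-style sample would expose."""
    tree_deg = Counter({source: 0})
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree_deg[u] += 1  # u gains a child
                tree_deg[v] += 1  # v gains its parent
                queue.append(v)
    return tree_deg

rng = random.Random(1)
adj = random_graph(1000, 10.0, rng)
tree_deg = bfs_tree_degrees(adj, source=0)

true_hist = Counter(len(nbrs) for nbrs in adj.values())
tree_hist = Counter(tree_deg.values())
print("true degrees:", sorted(true_hist.items()))
print("tree degrees:", sorted(tree_hist.items()))
```

Comparing the two histograms shows how many edges the tree leaves unobserved. Judging whether the observed tail resembles a power law would need much larger graphs and a proper fit.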

Science | 2008

The Evolution and Distribution of Species Body Size

Aaron Clauset; Douglas H. Erwin

The distribution of species body size within taxonomic groups exhibits a heavy right tail extending over many orders of magnitude, where most species are much larger than the smallest species. We provide a simple model of cladogenetic diffusion over evolutionary time that omits explicit mechanisms for interspecific competition and other microevolutionary processes, yet fully explains the shape of this distribution. We estimate the model's parameters from fossil data and find that it robustly reproduces the distribution of 4002 mammal species from the late Quaternary. The observed fit suggests that the asymmetric distribution arises from a fundamental trade-off between the short-term selective advantages (Cope's rule) and long-term selective risks of increased species body size in the presence of a taxon-specific lower limit on body size.

Physical Review E | 2006

Scale invariance in road networks

Vamsi Kalapala; Vishal Sanwalani; Aaron Clauset; Cristopher Moore

We study the topological and geographic structure of the national road networks of the United States, England, and Denmark. By transforming these networks into their dual representation, where roads are vertices and an edge connects two vertices if the corresponding roads ever intersect, we show that they exhibit both topological and geographic scale invariance. That is, we show that for sufficiently large geographic areas, the dual degree distribution follows a power law with exponent 2.2 ≤ α ≤ 2.4, and that journeys, regardless of their length, have a largely identical structure. To explain these properties, we introduce and analyze a simple fractal model of road placement that reproduces the observed structure, and suggests a testable connection between the scaling exponent and the fractal dimensions governing the placement of roads and intersections.
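The dual representation described here is easy to state in code. The sketch below assumes the input is a mapping from each intersection to the roads that meet there. That input format, the road names, and the function name are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def dual_graph(intersections):
    """Build the dual graph: each road becomes a vertex, and two roads
    are linked if they meet at one or more intersections."""
    dual = {}
    for roads in intersections.values():
        for road in roads:
            dual.setdefault(road, set())
        for a, b in combinations(sorted(set(roads)), 2):
            dual[a].add(b)
            dual[b].add(a)
    return dual

# Hypothetical toy map: four intersections, four roads.
toy_map = {
    "x1": ["Main St", "First Ave"],
    "x2": ["Main St", "Second Ave"],
    "x3": ["Main St", "Third Ave"],
    "x4": ["First Ave", "Second Ave"],
}
dual = dual_graph(toy_map)
dual_degree = {road: len(nbrs) for road, nbrs in dual.items()}
print(dual_degree)
```

The dual degree of a road is the number of distinct roads it crosses, so a long arterial road becomes a high-degree hub even though every intersection in the original map joins only a few roads.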

Science Advances | 2015

Systematic inequality and hierarchy in faculty hiring networks

Aaron Clauset; Samuel Arbesman; Daniel B. Larremore

An analysis of networks of graduate-to-faculty hires reveals systematic hiring biases and patterns. The faculty job market plays a fundamental role in shaping research priorities, educational outcomes, and career trajectories among scientists and institutions. However, a quantitative understanding of faculty hiring as a system is lacking. Using a simple technique to extract the institutional prestige ranking that best explains an observed faculty hiring network—who hires whose graduates as faculty—we present and analyze comprehensive placement data on nearly 19,000 regular faculty in three disparate disciplines. Across disciplines, we find that faculty hiring follows a common and steeply hierarchical structure that reflects profound social inequality. Furthermore, doctoral prestige alone better predicts ultimate placement than a U.S. News & World Report rank, women generally place worse than men, and increased institutional prestige leads to increased faculty production, better faculty placement, and a more influential position within the discipline. These results advance our ability to quantify the influence of prestige in academia and shed new light on the academic system.
