Network Inference from TraceRoute Measurements: Internet Topology `Species'
Fabien Viger, Alain Barrat, Luca Dall'Asta, Cun-Hui Zhang, Eric D. Kolaczyk
Abstract
Internet mapping projects generally sample the network from a limited set of sources using traceroute probes. This methodology, akin to merging spanning trees from the different sources to a set of destinations, necessarily yields an incomplete map of the Internet. Accordingly, determining Internet topology characteristics from such sampled maps is in part a problem of statistical inference. Our contribution begins with the observation that inferring many of the most basic topological quantities, including network size and degree characteristics, from traceroute measurements is in fact a version of the so-called `species problem' in statistics. This observation has important implications, since species problems are often quite challenging. We focus here on the most fundamental example of a traceroute Internet species: the number of nodes in a network. Specifically, we characterize the difficulty of estimating this quantity through a set of analytical arguments, use statistical subsampling principles to derive two proposed estimators, and illustrate the performance of these estimators on networks with various topological characteristics.
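
To make the sampling picture concrete, the following minimal sketch imitates traceroute-like exploration on a synthetic graph. It merges one shortest path from each source to each destination and counts the distinct nodes seen. The Erdos-Renyi graph, the function names, and all parameter values are assumptions chosen for illustration only; this is not the simulation setup or either of the estimators developed in the paper. It shows only why the observed node count is a lower bound on the true network size, which is what makes the problem resemble a species problem.

```python
import random
from collections import deque


def random_graph(n, p, rng):
    """Erdos-Renyi G(n, p) as adjacency sets (a stand-in for a real topology)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def bfs_parents(adj, source):
    """Breadth-first search; returns parent pointers of one shortest-path tree."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):  # sorted for a deterministic tie-break
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def traceroute_sample(adj, sources, targets):
    """Nodes on one shortest path from each source to each reachable target."""
    seen = set(sources)
    for s in sources:
        parent = bfs_parents(adj, s)
        for t in targets:
            if t not in parent:
                continue  # target not reachable from this source
            node = t
            while node is not None:  # walk back up the tree to the source
                seen.add(node)
                node = parent[node]
    return seen


rng = random.Random(0)           # fixed seed so the run is repeatable
adj = random_graph(500, 0.01, rng)
nodes = list(adj)
sources = rng.sample(nodes, 3)   # few vantage points, as in mapping projects
targets = rng.sample(nodes, 50)
seen = traceroute_sample(adj, sources, targets)
print(len(seen), "of", len(adj), "nodes discovered")
```

Increasing the number of sources or targets can only enlarge the discovered set. The nodes that remain unseen play the role of unobserved species, and estimating how many there are is the inference task the abstract describes.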