Publication


Featured research published by Anukool Lakhina.


Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) | 2001

BRITE: an approach to universal topology generation

Alberto Medina; Anukool Lakhina; Ibrahim Matta; John W. Byers

Effective engineering of the Internet is predicated upon a detailed understanding of issues such as the large-scale structure of its underlying physical topology, the manner in which it evolves over time, and the way in which its constituent components contribute to its overall function. Unfortunately, developing a deep understanding of these issues has proven to be a challenging task, since it in turn involves solving difficult problems such as mapping the actual topology, characterizing it, and developing models that capture its emergent behavior. Consequently, even though there are a number of topology models, it is an open question as to how representative the topologies they generate are of the actual Internet. Our goal is to produce a topology generation framework which improves the state of the art and is based on the design principles of representativeness, inclusiveness, and interoperability. Representativeness leads to synthetic topologies that accurately reflect many aspects of the actual Internet topology (e.g. hierarchical structure, node degree distribution, etc.). Inclusiveness combines the strengths of as many generation models as possible in a single generation tool. Interoperability provides interfaces to widely used simulation applications such as ns and SSF and visualization tools like Otter. We call such a tool a universal topology generator.
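One family of growth models a generator like this includes produces heavy-tailed degree distributions via preferential attachment. The sketch below is a minimal illustration of that idea, not BRITE's actual code or API; the function name and parameters are assumptions.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow an undirected graph: each new node attaches m edges to
    existing nodes chosen with probability proportional to degree.
    Illustrative helper, not BRITE's actual API."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    targets = list(range(m))      # the first new node links to nodes 0..m-1
    repeated = []                 # node ids repeated once per incident edge
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        repeated.extend(targets)
        repeated.extend([new] * m)
        # pick m distinct degree-weighted targets for the next node
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adj

g = preferential_attachment(200, 2)
degrees = sorted((len(v) for v in g.values()), reverse=True)
```

Grown this way, a few early nodes accumulate high degree, giving the skewed node degree distribution that the representativeness goal above targets.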


ACM SIGCOMM | 2004

Diagnosing network-wide traffic anomalies

Anukool Lakhina; Mark Crovella; Christophe Diot

Anomalies are unusual and significant changes in a network's traffic levels, which can often span multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data. In this paper we propose a general method to diagnose anomalies. This method is based on a separation of the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions. We show that this separation can be performed effectively by Principal Component Analysis. Using only simple traffic measurements from links, we study volume anomalies and show that the method can: (1) accurately detect when a volume anomaly is occurring; (2) correctly identify the underlying origin-destination (OD) flow which is the source of the anomaly; and (3) accurately estimate the amount of traffic involved in the anomalous OD flow. We evaluate the method's ability to diagnose (i.e., detect, identify, and quantify) both existing and synthetically injected volume anomalies in real traffic from two backbone networks. Our method consistently diagnoses the largest volume anomalies, and does so with a very low false alarm rate.
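The subspace separation described here can be sketched with standard linear algebra: project the link measurements onto the top-k principal components (the "normal" subspace) and score each time step by its squared prediction error (SPE) in the residual subspace. The following is a simplified sketch on synthetic data; the function name, choice of k, and the synthetic trace are assumptions, not the paper's code, and the paper additionally uses a statistical threshold (Q-statistic) rather than a raw score.

```python
import numpy as np

def subspace_anomaly_scores(X, k):
    """Split link-traffic measurements X (time x links) into a normal
    subspace (top-k principal components) and a residual subspace.
    Returns the squared prediction error (SPE) per time step."""
    Xc = X - X.mean(axis=0)
    # principal axes from the SVD of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                        # normal-subspace basis (links x k)
    residual = Xc - Xc @ P @ P.T        # component outside the normal subspace
    return (residual ** 2).sum(axis=1)  # SPE per time step

# synthetic example: smooth periodic traffic plus one injected volume spike
rng = np.random.default_rng(0)
t = np.arange(500)
X = np.stack([np.sin(t / 50 + p) for p in range(8)], axis=1)
X += 0.05 * rng.standard_normal(X.shape)
X[300, :3] += 5.0                       # volume anomaly hitting three links
spe = subspace_anomaly_scores(X, k=2)
```

The normal traffic here lies in a 2-dimensional subspace, so the injected spike stands out sharply in the residual.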


ACM SIGMETRICS | 2004

Structural analysis of network traffic flows

Anukool Lakhina; Konstantina Papagiannaki; Mark Crovella; Christophe Diot; Eric D. Kolaczyk; Nina Taft

Network traffic arises from the superposition of Origin-Destination (OD) flows. Hence, a thorough understanding of OD flows is essential for modeling network traffic, and for addressing a wide variety of problems including traffic engineering, traffic matrix estimation, capacity planning, forecasting and anomaly detection. However, to date, OD flows have not been closely studied, and there is very little known about their properties. We present the first analysis of complete sets of OD flow time-series, taken from two different backbone networks (Abilene and Sprint-Europe). Using Principal Component Analysis (PCA), we find that the set of OD flows has small intrinsic dimension. In fact, even in a network with over a hundred OD flows, these flows can be accurately modeled in time using a small number (10 or less) of independent components or dimensions. We also show how to use PCA to systematically decompose the structure of OD flow time-series into three main constituents: common periodic trends, short-lived bursts, and noise. We provide insight into how the various constituents contribute to the overall structure of OD flows and explore the extent to which this decomposition varies over time.
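The low-intrinsic-dimension finding can be illustrated in a few lines: count how many principal components are needed to capture most of the variance of an OD-flow matrix. The helper below and its 95% threshold are assumptions for illustration, not the paper's methodology; the synthetic flows are driven by a handful of shared temporal patterns, mimicking common periodic trends.

```python
import numpy as np

def intrinsic_dimension(X, var_fraction=0.95):
    """Number of principal components needed to capture var_fraction of
    the total variance in X (rows: time steps, columns: OD flows)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    share = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(share, var_fraction) + 1)

# synthetic OD flows: 100 flows driven by 3 shared temporal patterns
rng = np.random.default_rng(0)
t = np.arange(1000)
patterns = np.stack([np.sin(2 * np.pi * t / 288),
                     np.cos(2 * np.pi * t / 288),
                     np.sin(2 * np.pi * t / 2016)], axis=1)
X = patterns @ rng.random((3, 100)) + 0.01 * rng.standard_normal((1000, 100))
```

Even with 100 flows, a handful of components suffices because the flows share the same underlying trends.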


IEEE INFOCOM | 2003

Sampling biases in IP topology measurements

Anukool Lakhina; John W. Byers; Mark Crovella; Peng Xie

Considerable attention has been focused on the properties of graphs derived from Internet measurements. Router-level topologies collected via traceroute-like methods have led some to conclude that the router graph of the Internet is well modeled as a power-law random graph. In such a graph, the degree distribution of nodes follows a distribution with a power-law tail. We argue that the evidence to date for this conclusion is at best insufficient. We show that when graphs are sampled using traceroute-like methods, the resulting degree distribution can differ sharply from that of the underlying graph. For example, given a sparse Erdos-Renyi random graph, the subgraph formed by a collection of shortest paths from a small set of random sources to a larger set of random destinations can exhibit a degree distribution remarkably like a power-law. We explore the reasons why this effect arises, and show that in such a setting, edges are sampled in a highly biased manner. This insight allows us to formulate tests for determining when sampling bias is present. When we apply these tests to a number of well-known datasets, we find strong evidence for sampling bias.
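The effect described above is easy to reproduce in a toy simulation: build a sparse Erdos-Renyi graph, take BFS shortest-path trees from a few sources toward many destinations, and keep only the edges those paths reveal. All sizes, names, and parameters below are illustrative assumptions, not the paper's experimental setup.

```python
import random
from collections import deque

def er_graph(n, p, seed=0):
    """Sparse Erdos-Renyi graph as adjacency lists."""
    rng = random.Random(seed)
    adj = {i: [] for i in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def shortest_path_edges(adj, src, dests):
    """Edges on BFS shortest paths from src to each destination,
    mimicking a traceroute-style measurement from one monitor."""
    parent = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    edges = set()
    for d in dests:
        while d in parent and parent[d] is not None:
            edges.add(frozenset((d, parent[d])))
            d = parent[d]
    return edges

adj = er_graph(400, 0.02)
dests = random.Random(1).sample(range(400), 100)
sampled = set()
for src in (0, 1, 2):                 # few sources, many destinations
    sampled |= shortest_path_edges(adj, src, dests)
true_edges = sum(len(v) for v in adj.values()) // 2
```

The union of a few shortest-path trees is far sparser and more tree-like than the true graph, so the degrees observed in the sampled subgraph are heavily skewed relative to the underlying ER degree distribution.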


Internet Measurement Conference (IMC) | 2006

Impact of packet sampling on anomaly detection metrics

Daniela Brauckhoff; Bernhard Tellenbach; Arno Wagner; Martin May; Anukool Lakhina

Packet sampling methods such as Cisco's NetFlow are widely employed by large networks to reduce the amount of traffic data measured. A key problem with packet sampling is that it is inherently a lossy process, discarding (potentially useful) information. In this paper, we empirically evaluate the impact of sampling on anomaly detection metrics. Starting with unsampled flow records collected during the Blaster worm outbreak, we reconstruct the underlying packet trace and simulate packet sampling at increasing rates. We then use our knowledge of the Blaster anomaly to build a baseline of normal traffic (without Blaster), against which we can measure the anomaly size at various sampling rates. This approach allows us to evaluate the impact of packet sampling on anomaly detection without being restricted to (or biased by) a particular anomaly detection method. We find that packet sampling does not disturb the anomaly size when measured in volume metrics such as the number of bytes and number of packets, but grossly biases the number of flows. However, we find that recently proposed entropy-based summarizations of packet and flow counts are affected less by sampling, and expose the Blaster worm outbreak even at higher sampling rates. Our findings suggest that entropy summarizations are more resilient to sampling than volume metrics. Thus, while not perfect, sampling still preserves sufficient distributional structure, which when harnessed by tools like entropy, can expose hard-to-detect scanning anomalies.
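The flow-count bias is simple to demonstrate: under per-packet sampling, packet volume scales roughly with the sampling rate, while the number of distinct flows collapses because single-packet scan flows are mostly missed. The sketch below uses independent per-packet sampling (a simplification of NetFlow's deterministic 1-in-N sampling) and an entirely synthetic trace; all names and sizes are assumptions.

```python
import math, random

def entropy(counts):
    """Shannon entropy (bits) of an empirical count distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

def sample_packets(packets, rate, seed=0):
    """Independent per-packet sampling at the given rate."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() < rate]

def flow_counts(pkts):
    counts = {}
    for f in pkts:
        counts[f] = counts.get(f, 0) + 1
    return counts

# synthetic trace: 5000 single-packet scan flows plus 50 heavy flows
scan = [("scan-src-%d" % i, "victim:135") for i in range(5000)]
heavy = [("host-%d" % (i % 50), "server:80") for i in range(20000)]
packets = scan + heavy

full = flow_counts(packets)
sampled = flow_counts(sample_packets(packets, 0.01))
```

At a 1% sampling rate the surviving packet count is near 1% of the original, but almost all of the 5000 scan flows vanish, while the entropy of the surviving flow-count distribution still reflects the skew that an entropy-based detector would exploit.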


Internet Measurement Conference (IMC) | 2006

Detection and identification of network anomalies using sketch subspaces

Xin Li; Fang Bian; Mark Crovella; Christophe Diot; Ramesh Govindan; Gianluca Iannaccone; Anukool Lakhina

Network anomaly detection using dimensionality reduction techniques has received much recent attention in the literature. For example, previous work has aggregated netflow records into origin-destination (OD) flows, yielding a much smaller set of dimensions which can then be mined to uncover anomalies. However, this approach can only identify which OD flow is anomalous, not the particular IP flow(s) responsible for the anomaly. In this paper we show how one can use random aggregations of IP flows (i.e., sketches) to enable more precise identification of the underlying causes of anomalies. We show how to combine traffic sketches with a subspace method to (1) detect anomalies with high accuracy and (2) identify the IP flow(s) that are responsible for the anomaly. Our method has detection rates comparable to previous methods and detects many more anomalies than prior work, taking us a step closer towards a robust on-line system for anomaly detection and identification.
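The identification trick with random aggregations works roughly like this: hash flow keys into buckets under several independent salts, flag the hot bucket under each salt, and intersect the flagged buckets' preimages to narrow down the responsible flow. The code below is a much-simplified sketch of that idea (a single max-bucket flag instead of the paper's subspace detector); all names and sizes are assumptions.

```python
import zlib

def sketch_counts(flows, num_buckets, salt):
    """Randomly aggregate flow keys into buckets via a salted hash.
    Several salts give independent random aggregations (sketch rows)."""
    buckets = [0] * num_buckets
    for key, count in flows.items():
        buckets[zlib.crc32((salt + key).encode()) % num_buckets] += count
    return buckets

def candidate_keys(flows, num_buckets, salt, hot_bucket):
    """Flow keys hashing into a flagged bucket under the given salt."""
    return {k for k in flows
            if zlib.crc32((salt + k).encode()) % num_buckets == hot_bucket}

# baseline flows plus one anomalously large flow
flows = {"10.0.0.%d->10.0.1.1" % i: 100 for i in range(64)}
flows["192.168.7.7->10.0.1.1"] = 10000

suspects = None
for salt in ("a", "b", "c"):               # independent aggregations
    counts = sketch_counts(flows, 8, salt)
    hot = max(range(8), key=counts.__getitem__)
    keys = candidate_keys(flows, 8, salt, hot)
    suspects = keys if suspects is None else suspects & keys
```

The anomalous flow lands in the hot bucket under every salt, so it always survives the intersection, which usually shrinks to (near) the single culprit.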


ACM SIGMETRICS | 2005

Traffic matrices: balancing measurements, inference and modeling

Augustin Soule; Anukool Lakhina; Nina Taft; Konstantina Papagiannaki; Kavé Salamatian; Antonio Nucci; Mark Crovella; Christophe Diot

Traffic matrix estimation is well-studied, but in general has been treated simply as a statistical inference problem. In practice, however, network operators seeking traffic matrix information have a range of options available to them. Operators can measure traffic flows directly; they can perform partial flow measurement, and infer missing data using models; or they can perform no flow measurement and infer traffic matrices directly from link counts. The advent of practical flow measurement makes the study of these tradeoffs more important. In particular, an important question is whether judicious modeling, combined with partial flow measurement, can provide traffic matrix estimates that are significantly better than previous methods at relatively low cost. In this paper we make a number of contributions toward answering this question. First, we provide a taxonomy of the kinds of models that may make use of partial flow measurement, based on the nature of the measurements used and the spatial, temporal, or spatio-temporal correlation exploited. We then evaluate estimation methods which use each kind of model. In the process we propose and evaluate new methods, and extensions to methods previously proposed. We show that, using such methods, small amounts of traffic flow measurements can have significant impacts on the accuracy of traffic matrix estimation, yielding results much better than previous approaches. We also show that different methods differ in their bias and variance properties, suggesting that different methods may be suited to different applications.
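A common spatial model in the traffic matrix literature, of the kind such a taxonomy covers, is the gravity model: the demand from origin i to destination j is taken proportional to the total traffic entering at i times the total traffic leaving at j. This is a generic sketch of that prior, not necessarily one of the estimators evaluated in the paper.

```python
import numpy as np

def gravity_estimate(row_totals, col_totals):
    """Gravity-model prior for a traffic matrix: T[i, j] is proportional
    to (total traffic entering at i) x (total traffic leaving at j).
    Often used as a starting point before inference from link counts."""
    row = np.asarray(row_totals, dtype=float)
    col = np.asarray(col_totals, dtype=float)
    return np.outer(row, col) / col.sum()

# per-node ingress and egress totals (illustrative numbers)
T = gravity_estimate([30, 60, 10], [40, 40, 20])
```

By construction the estimate matches the measured row and column totals exactly, which is why it is a convenient prior to refine with partial flow measurements.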


IEEE Journal on Selected Areas in Communications | 2003

On the geographic location of Internet resources

Anukool Lakhina; John W. Byers; Mark Crovella; Ibrahim Matta

One relatively unexplored question about the Internet's physical structure concerns the geographical location of its components: routers, links, and autonomous systems (ASes). We study this question using two large inventories of Internet routers and links, collected by different methods and about two years apart. We first map each router to its geographical location using two different state-of-the-art tools. We then study the relationship between router location and population density; between geographic distance and link density; and between the size and geographic extent of ASes. Our findings are consistent across the two datasets and both mapping methods. First, as expected, router density per person varies widely over different economic regions; however, in economically homogeneous regions, router density shows a strong superlinear relationship to population density. Second, the probability that two routers are directly connected is strongly dependent on distance; our data is consistent with a model in which a majority (up to 75%-95%) of link formation is based on geographical distance (as in the Waxman (1988) topology generation method). Finally, we find that ASes show high variability in geographic size, which is correlated with other measures of AS size (degree and number of interfaces). Among small to medium ASes, ASes show wide variability in their geographic dispersal; however, all ASes exceeding a certain threshold in size are maximally dispersed geographically. These findings have many implications for the next generation of topology generators, which we envisage as producing router-level graphs annotated with attributes such as link latencies, AS identifiers, and geographical locations.
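The Waxman (1988) model referenced here links two nodes u, v with probability alpha * exp(-d(u, v) / (beta * L)), where d is their distance and L the maximum pairwise distance. A minimal sketch, with illustrative parameter values:

```python
import math, random

def waxman_links(coords, alpha=0.4, beta=0.1, seed=0):
    """Waxman (1988) model: nodes u, v are linked with probability
    alpha * exp(-d(u, v) / (beta * L)), where L is the maximum
    pairwise distance. alpha and beta here are illustrative."""
    rng = random.Random(seed)
    L = max(math.dist(a, b) for a in coords for b in coords)
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if rng.random() < alpha * math.exp(-d / (beta * L)):
                edges.append((i, j))
    return edges

rng = random.Random(2)
coords = [(rng.random(), rng.random()) for _ in range(60)]
edges = waxman_links(coords)
```

With a small beta the link probability decays quickly with distance, so generated links are predominantly short, matching the distance dependence the measurements above report.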


IEEE INFOCOM | 2007

Multivariate Online Anomaly Detection Using Kernel Recursive Least Squares

Tarem Ahmed; Mark Coates; Anukool Lakhina

High-speed backbones are regularly affected by various kinds of network anomalies, ranging from malicious attacks to harmless large data transfers. Different types of anomalies affect the network in different ways, and it is difficult to know a priori how a potential anomaly will exhibit itself in traffic statistics. In this paper we describe an online, sequential anomaly detection algorithm that is suitable for use with multivariate data. The proposed algorithm is based on the kernel version of the recursive least squares algorithm. It assumes no model for network traffic or anomalies, and constructs and adapts a dictionary of features that approximately spans the subspace of normal behaviour. The algorithm raises an alarm immediately upon encountering a deviation from the norm. Through comparison with existing block-based offline methods based upon Principal Component Analysis, we demonstrate that our online algorithm is equally effective but has much faster time-to-detection and lower computational complexity. We also explore minimum volume set approaches in identifying the region of normality.
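The dictionary idea can be sketched as follows: keep a small set of feature vectors that approximately spans normal behaviour in kernel feature space, and flag any new point whose approximate-linear-dependence (ALD) error against the dictionary exceeds a threshold. This is a heavily simplified illustration of that one ingredient; it omits the recursive least-squares regression of the full KRLS algorithm, and the class name, threshold nu, and kernel width are all assumptions.

```python
import numpy as np

def gauss_kernel(a, b, sigma=1.0):
    """Gaussian kernel between two feature vectors."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-(d @ d) / (2 * sigma ** 2)))

class KernelNoveltyDetector:
    """Simplified sketch: a dictionary approximately spanning normal
    behaviour in feature space, plus an ALD-error novelty test."""

    def __init__(self, nu=0.2, sigma=1.0):
        self.nu, self.sigma = nu, sigma
        self.dictionary = []

    def ald_error(self, x):
        if not self.dictionary:
            return 1.0
        K = np.array([[gauss_kernel(a, b, self.sigma) for b in self.dictionary]
                      for a in self.dictionary])
        k = np.array([gauss_kernel(a, x, self.sigma) for a in self.dictionary])
        coef = np.linalg.solve(K + 1e-8 * np.eye(len(K)), k)
        return float(gauss_kernel(x, x, self.sigma) - k @ coef)

    def train(self, x):
        # during training, novel points extend the dictionary of normal behaviour
        if self.ald_error(x) > self.nu:
            self.dictionary.append(np.asarray(x, dtype=float))

    def is_anomalous(self, x):
        # at detection time, novelty relative to the dictionary raises an alarm
        return bool(self.ald_error(x) > self.nu)

det = KernelNoveltyDetector()
rng = np.random.default_rng(0)
for _ in range(200):                    # normal traffic feature vectors
    det.train(rng.normal(0.0, 0.3, size=3))
```

Because the dictionary stays far smaller than the training stream, each new point is scored against only a few stored vectors, which is the source of the low per-step computational cost the abstract highlights.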


ACM SIGCOMM | 2002

On the geographic location of internet resources

Anukool Lakhina; John W. Byers; Mark Crovella; Ibrahim Matta

One relatively unexplored question about the Internet’s physical structure concerns the geographical location of its components: routers, links and autonomous systems (ASes). We study this question using two large inventories of Internet routers and links, collected by different methods and about two years apart. We first map each router to its geographical location using two different state-of-the-art tools. We then study the relationship between router location and population density; between geographic distance and link density; and between the size and geographic extent of ASes. Our findings are consistent across the two datasets and both mapping methods. First, as expected, router density per person varies widely over different economic regions; however, in economically homogeneous regions, router density shows a strong superlinear relationship to population density. Second, the probability that two routers are directly connected is strongly dependent on distance; our data is consistent with a model in which a majority (up to 75-95%) of link formation is based on geographical distance (as in the Waxman topology generation method). Finally, we find that ASes show high variability in geographic size, which is correlated with other measures of AS size (degree and number of interfaces). Among small to medium ASes, ASes show wide variability in their geographic dispersal; however, all ASes exceeding a certain threshold in size are maximally dispersed geographically. These findings have many implications for the next generation of topology generators, which we envisage as producing router-level graphs annotated with attributes such as link latencies, AS identifiers and geographical locations.
