Augustin Soule
Pierre-and-Marie-Curie University
Publication
Featured research published by Augustin Soule.
acm special interest group on data communication | 2006
Laurent Bernaille; Renata Teixeira; Ismael Akodkenou; Augustin Soule; Kavé Salamatian
The early detection of applications associated with TCP flows is an essential step for network security and traffic engineering. The classic way to identify flows, i.e. looking at port numbers, is not effective anymore. On the other hand, state-of-the-art techniques cannot determine the application before the end of the TCP flow. In this editorial, we propose a technique that relies on the observation of the first five packets of a TCP connection to identify the application. This result opens a range of new possibilities for online traffic classification.
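A minimal sketch of the general idea described above: cluster flows by the sizes of their first five packets, then label new flows from the dominant application seen in each cluster. The synthetic data, the use of k-means, and all names are illustrative assumptions, not the paper's exact method.

```python
# Sketch: early traffic classification from the sizes of the first five packets.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy training set: each row holds the sizes (bytes) of a flow's first 5 packets.
http = rng.normal([300, 1400, 1400, 600, 1400], 50, size=(100, 5))
smtp = rng.normal([80, 120, 90, 200, 150], 20, size=(100, 5))
X_train = np.vstack([http, smtp])
y_train = np.array(["http"] * 100 + ["smtp"] * 100)

# Unsupervised clustering on the packet-size vectors.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_train)

# Map each cluster to the application most often observed in it.
cluster_app = {}
for c in range(km.n_clusters):
    members = list(y_train[km.labels_ == c])
    cluster_app[c] = max(set(members), key=members.count) if members else "unknown"

def classify(first_five_sizes):
    """Classify a new flow as soon as its first five packets are observed."""
    c = km.predict(np.asarray(first_five_sizes, dtype=float).reshape(1, -1))[0]
    return cluster_app[c]

print(classify([310, 1380, 1420, 590, 1400]))  # expected: "http"
```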
measurement and modeling of computer systems | 2007
Haakon Ringberg; Augustin Soule; Jennifer Rexford; Christophe Diot
Detecting anomalous traffic is a crucial part of managing IP networks. In recent years, network-wide anomaly detection based on Principal Component Analysis (PCA) has emerged as a powerful method for detecting a wide variety of anomalies. We show that tuning PCA to operate effectively in practice is difficult and requires more robust techniques than have been presented thus far. We analyze a week of network-wide traffic measurements from two IP backbones (Abilene and GEANT) across three different traffic aggregations (ingress routers, OD flows, and input links), and conduct a detailed inspection of the feature time series for each suspected anomaly. Our study identifies and evaluates four main challenges of using PCA to detect traffic anomalies: (i) the false positive rate is very sensitive to small differences in the number of principal components in the normal subspace, (ii) the effectiveness of PCA is sensitive to the level of aggregation of the traffic measurements, (iii) a large anomaly may inadvertently pollute the normal subspace, (iv) correctly identifying which flow triggered the anomaly detector is an inherently challenging problem.
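A minimal sketch of subspace-based (PCA) anomaly detection on link traffic, illustrating the sensitivity to the number of principal components kept in the normal subspace. The data, the 3-sigma threshold, and the injected anomaly are illustrative assumptions.

```python
# Sketch: PCA subspace method with a sweep over the normal-subspace dimension k.
import numpy as np

rng = np.random.default_rng(1)
T, L = 500, 20                      # time bins x links
Y = rng.normal(100, 10, size=(T, L))
Y[250] += 80                        # inject one volume anomaly

Yc = Y - Y.mean(axis=0)             # center each link's time series
_, _, Vt = np.linalg.svd(Yc, full_matrices=False)

for k in (2, 4, 8):                 # size of the normal subspace
    P = Vt[:k].T                    # top-k principal directions
    residual = Yc - Yc @ P @ P.T    # projection onto the anomalous subspace
    spe = (residual ** 2).sum(axis=1)        # squared prediction error per time bin
    threshold = spe.mean() + 3 * spe.std()   # simple 3-sigma threshold
    print(f"k={k}: flagged time bins -> {np.where(spe > threshold)[0]}")
```

Running the sweep shows how the set of flagged bins changes with k, which is the sensitivity the abstract points out.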
measurement and modeling of computer systems | 2005
Augustin Soule; Anukool Lakhina; Nina Taft; Konstantina Papagiannaki; Kavé Salamatian; Antonio Nucci; Mark Crovella; Christophe Diot
Traffic matrix estimation is well-studied, but in general has been treated simply as a statistical inference problem. In practice, however, network operators seeking traffic matrix information have a range of options available to them. Operators can measure traffic flows directly; they can perform partial flow measurement, and infer missing data using models; or they can perform no flow measurement and infer traffic matrices directly from link counts. The advent of practical flow measurement makes the study of these tradeoffs more important. In particular, an important question is whether judicious modeling, combined with partial flow measurement, can provide traffic matrix estimates that are significantly better than previous methods at relatively low cost. In this paper we make a number of contributions toward answering this question. First, we provide a taxonomy of the kinds of models that may make use of partial flow measurement, based on the nature of the measurements used and the spatial, temporal, or spatio-temporal correlation exploited. We then evaluate estimation methods which use each kind of model. In the process we propose and evaluate new methods, and extensions to methods previously proposed. We show that, using such methods, small amounts of traffic flow measurements can have significant impacts on the accuracy of traffic matrix estimation, yielding results much better than previous approaches. We also show that different methods differ in their bias and variance properties, suggesting that different methods may be suited to different applications.
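A minimal sketch of the basic tradeoff discussed above: OD flows that are measured directly are fixed, and the remaining flows are inferred from link counts by non-negative least squares on y = Ax. The toy topology, numbers, and the choice of NNLS are illustrative assumptions, not any specific estimator from the paper.

```python
# Sketch: combining partial direct flow measurement with inference from link counts.
import numpy as np
from scipy.optimize import nnls

# Routing matrix A: 4 links x 6 OD flows (1 if the OD flow crosses the link).
A = np.array([
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
], dtype=float)
x_true = np.array([50.0, 10.0, 30.0, 5.0, 80.0, 20.0])
y = A @ x_true                              # observed link counts (SNMP-like)

measured = {4: 80.0, 0: 50.0}               # OD flows measured directly (flow export)
free = [j for j in range(A.shape[1]) if j not in measured]

# Subtract the contribution of measured flows, then infer the rest.
y_residual = y - A[:, list(measured)] @ np.array(list(measured.values()))
x_free, _ = nnls(A[:, free], y_residual)

x_hat = np.zeros_like(x_true)
x_hat[list(measured)] = list(measured.values())
x_hat[free] = x_free
print("true:", x_true, "\nest :", x_hat)
```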
measurement and modeling of computer systems | 2004
Augustin Soule; Antonio Nucci; Rene L. Cruz; Emilio Leonardi; Nina Taft
In this paper we investigate a new idea for traffic matrix estimation that makes the basic problem less under-constrained, by deliberately changing the routing to obtain additional measurements. Because all these measurements are collected over disparate time intervals, we need to establish models for each Origin-Destination (OD) pair to capture the complex behaviours of Internet traffic. We model each OD pair with two components: the diurnal pattern and the fluctuation process. We provide models that incorporate the two components above, to estimate both the first and second order moments of traffic matrices. We do this for both stationary and cyclo-stationary traffic scenarios. We formalize the problem of estimating the second order moment in a way that is completely independent from the first order moment. Moreover, we can estimate the second order moment without needing any routing changes (i.e., without explicit changes to IGP link weights). We prove, for the first time, that such a result holds for any realistic topology under the assumption of minimum cost routing and strictly positive link weights. We highlight how the second order moment helps the identification of the top largest OD flows carrying the most significant fraction of network traffic. We then propose a refined methodology consisting of using our variance estimator (without routing changes) to identify the top largest flows, and estimate only these flows. The benefit of this method is that it dramatically reduces the number of routing changes needed. We validate the effectiveness of our methodology and the intuitions behind it by using real aggregated sampled NetFlow data collected from a commercial Tier-1 backbone.
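One way to see why the second moment is recoverable without routing changes: if OD flows are uncorrelated, the covariance of the link counts satisfies Cov(y) = A diag(v) A^T, which is linear in the variance vector v and can be solved by least squares. The sketch below only illustrates that relation on synthetic data; the paper's actual estimator and its identifiability proof are more involved.

```python
# Sketch: estimating OD-flow variances from link-count covariance alone.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
A = np.array([
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
], dtype=float)
L, K = A.shape

v_true = np.array([400.0, 25.0, 100.0, 9.0, 900.0, 64.0])   # OD flow variances
X = rng.normal(100.0, np.sqrt(v_true), size=(2000, K))      # OD flow samples
Y = X @ A.T                                                  # link count samples

S = np.cov(Y, rowvar=False)                                  # empirical link covariance
B = np.einsum('ik,jk->ijk', A, A).reshape(L * L, K)          # so that B v = vec(S)
v_hat, _ = nnls(B, S.reshape(-1))

print("true variances:", v_true)
print("estimated     :", np.round(v_hat, 1))
# Ranking v_hat points at the most variable (typically the largest) OD flows.
```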
measurement and modeling of computer systems | 2004
Augustin Soule; Kavé Salamatian; Nina Taft; Richard Emilion; Konstantina Papagiannaki
In order to control and manage highly aggregated Internet traffic flows efficiently, we need to be able to categorize flows into distinct classes and to be knowledgeable about the different behavior of flows belonging to these classes. In this paper we consider the problem of classifying BGP level prefix flows into a small set of homogeneous classes. We argue that using the entire distributional properties of flows can have significant benefits in terms of quality in the derived classification. We propose a method based on modeling flow histograms using Dirichlet Mixture Processes for random distributions. We present an inference procedure based on the Simulated Annealing Expectation Maximization algorithm that estimates all the model parameters as well as flow membership probabilities - the probability that a flow belongs to any given class. One of our key contributions is a new method for Internet flow classification. We show that our method is powerful in that it is capable of examining macroscopic flows while simultaneously making fine distinctions between different traffic classes. We demonstrate that our scheme can address issues with flows being close to class boundaries and the inherent dynamic behaviour of Internet flows.
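A minimal sketch of the membership step in a Dirichlet-mixture view of flow histograms: given mixture components, compute the probability that each flow's normalized histogram belongs to each class. The component parameters and class names below are fixed by hand as assumptions; the paper estimates them with a simulated-annealing EM procedure, which this sketch does not reproduce.

```python
# Sketch: soft class membership of flow histograms under a Dirichlet mixture.
import numpy as np
from scipy.stats import dirichlet

# Each flow is summarized by a histogram over 3 bins (e.g. packet-size ranges),
# normalized so it lies on the probability simplex.
flows = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.15, 0.75],
    [0.60, 0.25, 0.15],
])

# Two hypothetical classes, each a Dirichlet distribution over histograms.
components = [np.array([14.0, 4.0, 2.0]),     # "small-packet heavy" class
              np.array([2.0, 3.0, 15.0])]     # "large-packet heavy" class
weights = np.array([0.5, 0.5])

# E-step: posterior membership probabilities P(class | flow histogram).
log_lik = np.array([[np.log(w) + dirichlet.logpdf(f, a)
                     for w, a in zip(weights, components)] for f in flows])
log_lik -= log_lik.max(axis=1, keepdims=True)           # numerical stability
membership = np.exp(log_lik)
membership /= membership.sum(axis=1, keepdims=True)

print(np.round(membership, 3))   # rows sum to 1: soft class assignments per flow
```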
measurement and modeling of computer systems | 2005
Augustin Soule; Kavé Salamatian; Antonio Nucci; Nina Taft
In this work we develop a new approach to monitoring origin-destination flows in a large network. We start by building a state space model for OD flows that is rich enough to fully capture temporal and spatial correlations. We apply a Kalman filter to our linear dynamic system that can be used for both estimation and prediction of traffic matrices. We call our system a traffic matrix tracker due to its lightweight mechanism for temporal updates that enables tracking traffic matrix dynamics at small time scales. Our Kalman filter approach allows us to go beyond traffic matrix estimation in that our single system can also carry out traffic prediction and yield confidence bounds on the estimates, the predictions and the residual error processes. We show that these elements provide key functionalities needed by monitoring systems of the future for carrying out anomaly detection. Using real data collected from a Tier-1 ISP, we validate our model, illustrate that it can achieve low errors, and that our method is adaptive on both short and long timescales.
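A minimal sketch of the kind of linear dynamic system described above: the state is the vector of OD flows, the observation is the vector of link counts y_t = A x_t + noise, and a standard Kalman filter alternates predict and update steps. The dynamics, noise covariances, and toy topology below are illustrative assumptions, not the ones calibrated in the paper.

```python
# Sketch: Kalman filter tracking OD flows from link counts.
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[1, 1, 0], [0, 1, 1]], dtype=float)   # 2 links x 3 OD flows
K = A.shape[1]
C = 0.95 * np.eye(K)                                 # simple AR(1) dynamics
Q = 4.0 * np.eye(K)                                  # state (traffic) noise
R = 1.0 * np.eye(A.shape[0])                         # measurement noise

x_hat = np.full(K, 50.0)                             # prior mean
P = 100.0 * np.eye(K)                                # prior covariance

x_true = np.array([40.0, 60.0, 30.0])
for t in range(200):
    # Simulate the system (mean-reverting OD traffic plus noisy link counts).
    x_true = C @ x_true + 0.05 * np.array([40, 60, 30]) \
        + rng.multivariate_normal(np.zeros(K), Q)
    y = A @ x_true + rng.multivariate_normal(np.zeros(A.shape[0]), R)

    # Predict.
    x_pred = C @ x_hat
    P_pred = C @ P @ C.T + Q

    # Update with the new link counts.
    S = A @ P_pred @ A.T + R
    G = P_pred @ A.T @ np.linalg.inv(S)              # Kalman gain
    x_hat = x_pred + G @ (y - A @ x_pred)
    P = (np.eye(K) - G @ A) @ P_pred

print("true OD flows:", np.round(x_true, 1))
print("tracked      :", np.round(x_hat, 1))
print("std. dev.    :", np.round(np.sqrt(np.diag(P)), 2))   # confidence on estimates
```

The residual y - A x_pred and the covariance P are the quantities a monitoring system would threshold for anomaly detection and confidence bounds.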
IEEE ACM Transactions on Networking | 2007
Augustin Soule; Antonio Nucci; Rene L. Cruz; Emilio Leonardi; Nina Taft
In this paper we propose a new approach for dealing with the ill-posed nature of traffic matrix estimation. We present three solution enhancers: an algorithm for deliberately changing link weights to obtain additional information that can make the underlying linear system full rank; a cyclo-stationary model to capture both long-term and short-term traffic variability; and a method for estimating the variance of origin-destination (OD) flows. We show how these three elements can be combined into a comprehensive traffic matrix estimation procedure that dramatically reduces the errors compared to existing methods. We demonstrate that our variance estimates can be used to identify the elephant OD flows, and we thus propose a variant of our algorithm that addresses the problem of estimating only the heavy flows in a traffic matrix. One of our key findings is that by focusing only on heavy flows, we can simplify the measurement and estimation procedure so as to render it more practical. Although there is a tradeoff between practicality and accuracy, we find that increasing the rank is so helpful that we can nevertheless keep the average errors consistently below the 10% carrier target error rate. We validate the effectiveness of our methodology and the intuition behind it using commercial traffic matrix data from Sprint's Tier-1 backbone.
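A minimal sketch of the first enhancer only, the rank-increase idea: stacking link-count equations taken under two different routing configurations can make the linear system y = Ax full rank, so the OD flows become identifiable. The toy topology is an illustrative assumption; the cyclo-stationary model and variance estimator are not shown here.

```python
# Sketch: gaining rank by stacking measurements from two routing configurations.
import numpy as np

x_true = np.array([50.0, 10.0, 30.0, 5.0])           # 4 OD flows

A1 = np.array([[1, 1, 0, 0],                          # routing under weight set 1
               [0, 0, 1, 1]], dtype=float)
A2 = np.array([[1, 0, 1, 0],                          # routing after changing IGP weights
               [0, 1, 0, 1],
               [1, 0, 0, 1]], dtype=float)

y1, y2 = A1 @ x_true, A2 @ x_true                     # link counts from each snapshot

A_stacked = np.vstack([A1, A2])
y_stacked = np.concatenate([y1, y2])
print("rank with one routing :", np.linalg.matrix_rank(A1))
print("rank with two routings:", np.linalg.matrix_rank(A_stacked))

x_hat, *_ = np.linalg.lstsq(A_stacked, y_stacked, rcond=None)
print("recovered OD flows    :", np.round(x_hat, 1))
```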
passive and active network measurement | 2007
Augustin Soule; Haakon Ringberg; Fernando Silveira; Jennifer Rexford; Christophe Diot
Anomaly detection remains a poorly understood area where visual inspection and manual analysis play a significant role in the effectiveness of the detection technique. We observe traffic anomalies in two adjacent networks, namely GEANT and Abilene, in order to determine what parameters impact the detectability and the characteristics of anomalies. We correlate three weeks of traffic and routing data from both networks and apply Kalman filtering to detect anomalies that transit between the two networks. We show that differences in the monitoring infrastructure, network engineering practices, and anomaly-detection parameters have a large impact on which anomalies can be detected. Through a case study of three specific anomalies, we illustrate the influence of the traffic mix, IP address anonymization, detection methodology, and packet sampling on the detectability of traffic anomalies.
acm special interest group on data communication | 2008
Haakon Ringberg; Augustin Soule; Jennifer Rexford
Despite the flurry of anomaly-detection papers in recent years, effective ways to validate and compare proposed solutions have remained elusive. We argue that evaluating anomaly detectors on manually labeled traces is both important and unavoidable. In particular, it is important to evaluate detectors on traces from operational networks because it is in this setting that the detectors must ultimately succeed. In addition, manual labeling of such traces is unavoidable because new anomalies will be identified and characterized from manual inspection long before there are realistic models for them. It is well known, however, that manual labeling is slow and error-prone. In order to mitigate these challenges, we present WebClass, a web-based infrastructure that adds rigor to the manual labeling process. WebClass allows researchers to share, inspect, and label traffic time-series through a common graphical user interface. We are releasing WebClass to the research community in the hope that it will foster greater collaboration in creating labeled traces and that the traces will be of higher quality because the entire community has access to all the information that led to a given label.
internet measurement conference | 2007
Augustin Soule; Fernando Silveira; Haakon Ringberg; Christophe Diot
Multiple network-wide anomaly detection techniques proposed in the literature define an anomaly as a statistical outlier in aggregated network traffic. The most popular way to aggregate the traffic is as a Traffic Matrix, where the traffic is divided according to its ingress and egress points in the network. However, the reasons for choosing traffic matrices instead of any other formalism have not been studied yet. In this paper we compare three network-driven traffic aggregation formalisms: ingress routers, input links and origin-destination pairs (i.e. traffic matrices). Each formalism is computed on data collected from two research backbones. Then, a network-wide anomaly detection method is applied to each formalism. All anomalies are manually labeled as true or false positives. Our results show that the traffic aggregation level has a significant impact on the number of anomalies detected and on the false positive rate. We show that aggregating by OD pairs is indeed the most appropriate choice for the data sets and the detection method we consider. We correlate our observations with time series statistics in order to explain how aggregation impacts anomaly detection.
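A minimal sketch of the three aggregation formalisms compared above: the same flow records grouped into per-ingress-router, per-input-link, and per-OD-pair time series, each of which could then be fed to a network-wide detector. The column names and toy records are illustrative assumptions.

```python
# Sketch: building the three traffic aggregations from a table of flow records.
import pandas as pd

flows = pd.DataFrame({
    "time":    [0, 0, 0, 1, 1, 1],
    "ingress": ["r1", "r1", "r2", "r1", "r2", "r2"],
    "egress":  ["r3", "r4", "r3", "r3", "r4", "r3"],
    "in_link": ["l1", "l2", "l3", "l1", "l3", "l4"],
    "bytes":   [100, 250, 80, 120, 300, 60],
})

by_ingress = flows.groupby(["time", "ingress"])["bytes"].sum().unstack()
by_link    = flows.groupby(["time", "in_link"])["bytes"].sum().unstack()
by_od      = flows.groupby(["time", "ingress", "egress"])["bytes"].sum() \
                  .unstack(["ingress", "egress"])

# Each table is one candidate input to a network-wide anomaly detector;
# the paper's finding is that the OD-pair aggregation works best for its data.
print(by_od)
```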