
Publication


Featured research published by Nina Taft.


ACM Special Interest Group on Data Communication | 2002

Traffic matrix estimation: existing techniques and new directions

Alberto Medina; Nina Taft; Kavé Salamatian; Supratik Bhattacharyya; Christophe Diot

Very few techniques have been proposed for estimating traffic matrices in the context of Internet traffic. Our work on POP-to-POP traffic matrices (TM) makes two contributions. The primary contribution is the outcome of a detailed comparative evaluation of the three existing techniques. We evaluate these methods with respect to the estimation errors yielded, sensitivity to prior information required and sensitivity to the statistical assumptions they make. We study the impact of characteristics such as path length and the amount of link sharing on the estimation errors. Using actual data from a Tier-1 backbone, we assess the validity of the typical assumptions needed by the TM estimation techniques. The secondary contribution of our work is the proposal of a new direction for TM estimation based on using choice models to model POP fanouts. These models allow us to overcome some of the problems of existing methods because they can incorporate additional data and information about POPs and they enable us to make a fundamentally different kind of modeling assumption. We validate this approach by illustrating that our modeling assumption matches actual Internet data well. Using two initial simple models we provide a proof of concept showing that the incorporation of knowledge of POP features (such as total incoming bytes, number of customers, etc.) can reduce estimation errors. Our proposed approach can be used in conjunction with existing or future methods in that it can be used to generate good priors that serve as inputs to statistical inference techniques.
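
All of the estimation techniques compared in this paper operate on the same linear system: observed link counts equal the routing matrix applied to the unknown OD flows. The sketch below, with made-up numbers, only illustrates that setup and how a fanout-style prior can regularize the otherwise under-determined problem; it is not the paper's method.

```python
# Illustrative sketch (not the paper's method): traffic matrix estimation
# reduces to an under-determined linear system y = A x, where y holds link
# byte counts, A is the 0/1 routing matrix, and x the unknown POP-to-POP
# flows. A prior (e.g. a fanout-based estimate) picks one of the many
# consistent solutions.
import numpy as np

# Toy example: 4 directed links, 6 OD pairs (all values hypothetical).
A = np.array([
    [1, 1, 0, 0, 0, 0],   # link 1 carries OD pairs 1 and 2
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
])
y = np.array([120., 90., 80., 70.])                  # measured link loads
x_prior = np.array([60., 60., 45., 45., 20., 25.])   # e.g. a fanout-based prior

# Regularized least squares: stay close to the prior while matching link counts.
lam = 0.1
lhs = A.T @ A + lam * np.eye(A.shape[1])
rhs = A.T @ y + lam * x_prior
x_hat = np.linalg.solve(lhs, rhs)
print("estimated OD flows:", np.round(x_hat, 1))
```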


Measurement and Modeling of Computer Systems | 2004

Structural analysis of network traffic flows

Anukool Lakhina; Konstantina Papagiannaki; Mark Crovella; Christophe Diot; Eric D. Kolaczyk; Nina Taft

Network traffic arises from the superposition of Origin-Destination (OD) flows. Hence, a thorough understanding of OD flows is essential for modeling network traffic, and for addressing a wide variety of problems including traffic engineering, traffic matrix estimation, capacity planning, forecasting and anomaly detection. However, to date, OD flows have not been closely studied, and there is very little known about their properties. We present the first analysis of complete sets of OD flow time-series, taken from two different backbone networks (Abilene and Sprint-Europe). Using Principal Component Analysis (PCA), we find that the set of OD flows has small intrinsic dimension. In fact, even in a network with over a hundred OD flows, these flows can be accurately modeled in time using a small number (10 or fewer) of independent components or dimensions. We also show how to use PCA to systematically decompose the structure of OD flow time-series into three main constituents: common periodic trends, short-lived bursts, and noise. We provide insight into how the various constituents contribute to the overall structure of OD flows and explore the extent to which this decomposition varies over time.
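
As a rough illustration of the PCA step described above, the sketch below builds a synthetic set of OD flow time-series driven by a few shared trends and checks how much variance the leading principal components capture; the data and sizes are hypothetical, not the Abilene or Sprint-Europe measurements.

```python
# Illustrative PCA on synthetic OD flow time-series: stack the flows as
# columns of X, subtract the mean, and measure how much variance the
# leading principal components capture.
import numpy as np

rng = np.random.default_rng(0)
T, n_od = 1000, 120                      # time bins x OD flows (hypothetical sizes)
t = np.arange(T)
# A few shared diurnal-like trends plus per-flow noise gives a low intrinsic dimension.
trends = np.stack([np.sin(2 * np.pi * t / 288 + p) for p in (0.0, 1.0, 2.0)], axis=1)
mixing = rng.random((3, n_od))
X = trends @ mixing + 0.05 * rng.standard_normal((T, n_od))

Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
var_explained = np.cumsum(s**2) / np.sum(s**2)
print("variance captured by first 5 components:", np.round(var_explained[:5], 3))
```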


Measurement and Modeling of Computer Systems | 2005

Traffic matrices: balancing measurements, inference and modeling

Augustin Soule; Anukool Lakhina; Nina Taft; Konstantina Papagiannaki; Kavé Salamatian; Antonio Nucci; Mark Crovella; Christophe Diot

Traffic matrix estimation is well-studied, but in general has been treated simply as a statistical inference problem. In practice, however, network operators seeking traffic matrix information have a range of options available to them. Operators can measure traffic flows directly; they can perform partial flow measurement, and infer missing data using models; or they can perform no flow measurement and infer traffic matrices directly from link counts. The advent of practical flow measurement makes the study of these tradeoffs more important. In particular, an important question is whether judicious modeling, combined with partial flow measurement, can provide traffic matrix estimates that are significantly better than previous methods at relatively low cost. In this paper we make a number of contributions toward answering this question. First, we provide a taxonomy of the kinds of models that may make use of partial flow measurement, based on the nature of the measurements used and the spatial, temporal, or spatio-temporal correlation exploited. We then evaluate estimation methods which use each kind of model. In the process we propose and evaluate new methods, and extensions to methods previously proposed. We show that, using such methods, small amounts of traffic flow measurements can have significant impacts on the accuracy of traffic matrix estimation, yielding results much better than previous approaches. We also show that different methods differ in their bias and variance properties, suggesting that different methods may be suited to different applications.
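
One way to picture the measurement/inference tradeoff is that every directly measured OD flow removes an unknown from the link-count system. The sketch below, with a toy routing matrix, only illustrates that idea and is not one of the estimators evaluated in the paper.

```python
# Hedged sketch (not one of the paper's estimators): if some OD flows are
# measured directly, subtract their contribution from the link counts and
# infer only the remaining flows from the reduced system.
import numpy as np

A = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
])                                   # toy routing matrix (hypothetical)
y = np.array([100., 130., 90.])      # measured link counts
measured = {1: 70.0}                 # OD flow 1 measured via a NetFlow-style export

unknown = [j for j in range(A.shape[1]) if j not in measured]
y_res = y - A[:, list(measured)] @ np.array(list(measured.values()))
x_unknown, *_ = np.linalg.lstsq(A[:, unknown], y_res, rcond=None)
print(dict(zip(unknown, np.round(x_unknown, 1))))
```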


ACM Special Interest Group on Data Communication | 2005

The problem of synthetically generating IP traffic matrices: initial recommendations

Antonio Nucci; Ashwin Sridharan; Nina Taft

There exists a wide variety of network design problems that require a traffic matrix as input in order to carry out performance evaluation. The research community has not had at its disposal any information about how to construct realistic traffic matrices. We introduce here the two basic problems that need to be addressed to construct such matrices. The first is that of synthetically generating traffic volume levels that obey spatial and temporal patterns as observed in realistic traffic matrices. The second is that of assigning a set of numbers (representing traffic levels) to particular node pairs in a given topology. This paper provides an in-depth discussion of the many issues that arise when addressing these problems. Our approach to the first problem is to extract statistical characteristics for such traffic from real data collected inside two large IP backbones. We dispel the myth that uniform distributions can be used to randomly generate numbers for populating a traffic matrix. Instead, we show that the lognormal distribution is better for this purpose as it describes well the mean rates of origin-destination flows. We provide estimates for the mean and variance properties of the traffic matrix flows from our datasets. We explain the second problem and discuss the notion of a traffic matrix being well-matched to a topology. We provide two initial solutions to this problem, one using an ILP formulation that incorporates simple and well-formed constraints. Our second solution is a heuristic one that incorporates more challenging constraints coming from carrier practices used to design and evolve topologies.
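
A minimal sketch of the paper's first recommendation, generating OD volumes from a lognormal distribution rather than a uniform one, is shown below; the distribution parameters are placeholders rather than the values fitted from the backbone datasets.

```python
# Minimal sketch: populate a synthetic traffic matrix with lognormal OD volumes.
# The mu/sigma values below are placeholders, not the paper's fitted estimates.
import numpy as np

rng = np.random.default_rng(42)
n_pops = 10
mu, sigma = 15.0, 1.5                      # hypothetical log-scale parameters
tm = rng.lognormal(mean=mu, sigma=sigma, size=(n_pops, n_pops))
np.fill_diagonal(tm, 0.0)                  # no traffic from a POP to itself
print("heaviest OD flow (bytes):", f"{tm.max():.3e}")
```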


International Conference on Computer Communications | 2003

Long-term forecasting of Internet backbone traffic: observations and initial models

Konstantina Papagiannaki; Nina Taft; Zhi Li Zhang; Christophe Diot

We introduce a methodology to predict when and where link additions/upgrades have to take place in an IP backbone network. Using SNMP statistics, collected continuously since 1999, we compute aggregate demand between any two adjacent PoPs and look at its evolution at time scales larger than one hour. We show that IP backbone traffic exhibits visible long term trends, strong periodicities, and variability at multiple time scales. Our methodology relies on the wavelet multiresolution analysis and linear time series models. Using wavelet multiresolution analysis, we smooth the collected measurements until we identify the overall long-term trend. The fluctuations around the obtained trend are further analyzed at multiple time scales. We show that the largest amount of variability in the original signal is due to its fluctuations at the 12 hour time scale. We model inter-PoP aggregate demand as a multiple linear regression model, consisting of the two identified components. We show that this model accounts for 98% of the total energy in the original signal, while explaining 90% of its variance. Weekly approximations of those components can be accurately modeled with low-order autoregressive integrated moving average (ARIMA) models. We show that forecasting the long term trend and the fluctuations of the traffic at the 12 hour time scale yields accurate estimates for at least six months in the future.
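
The sketch below illustrates the general shape of the methodology, wavelet smoothing followed by a low-order ARIMA forecast, on synthetic weekly demand. It assumes the PyWavelets and statsmodels packages and hypothetical parameters; it is not the paper's calibrated model.

```python
# Rough two-step sketch: smooth the aggregate demand with a wavelet
# multiresolution analysis to extract a long-term trend, then fit a
# low-order ARIMA model and forecast roughly six months ahead.
import numpy as np
import pywt
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
weeks = 120
demand = (50 + 0.3 * np.arange(weeks)
          + 5 * np.sin(2 * np.pi * np.arange(weeks) / 26)
          + rng.normal(0, 2, weeks))       # synthetic weekly inter-PoP demand (Mbps)

# Multiresolution smoothing: keep only the coarse approximation coefficients.
coeffs = pywt.wavedec(demand, "db4", level=3)
coeffs_trend = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
trend = pywt.waverec(coeffs_trend, "db4")[:weeks]

# Low-order ARIMA on the trend, forecasting 26 weeks (about six months) out.
model = ARIMA(trend, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=26)
print("forecast six months out:", round(float(forecast[-1]), 1), "Mbps")
```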


International Conference on Computer Communications | 2003

An approach to alleviate link overload as observed on an IP backbone

Sundar Iyer; Supratik Bhattacharyya; Nina Taft; Christophe Diot

Shortest path routing protocols may suffer from congestion due to the use of a single shortest path between a source and a destination. The goal of our work is to first understand how links become overloaded in an IP backbone, and then to explore whether the routing protocol, either in its existing form or in some enhanced form, could be made to respond immediately to overload and reduce the likelihood of its occurrence. Our method is to use extensive measurements of Sprint's backbone network, measuring 138 links between September 2000 and June 2001. We find that since the backbone is designed to be overprovisioned, link overload is rare, and when it occurs, 80% of the time it is caused by link failures. Furthermore, we find that when a link is overloaded, few (if any) other links in the network are also overloaded. This suggests that deflecting packets to less utilized alternate paths could be an effective method for tackling overload. We analytically derive the condition that a network with multiple equal-length shortest paths between every pair of nodes (as is common in highly meshed backbone networks) can provide loop-free deflection paths if all the link weights are within a ratio of 1 + 1/(d - 1) of each other, where d is the diameter of the network. Based on our measurements, the nature of the backbone topology, and the careful use of link weights, we propose a deflection routing algorithm for tackling link overload in which each node makes local decisions. Simulations suggest that this can be a simple and efficient way to overcome link overload, without requiring any changes to the routing protocol.
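
The weight-ratio condition is easy to state in code; the snippet below simply checks whether a set of hypothetical IGP weights satisfies the 1 + 1/(d - 1) bound for a given network diameter.

```python
# Small illustration of the stated loop-freedom condition: in a network of
# diameter d with equal-length shortest paths, deflection stays loop-free if
# all link weights lie within a factor of 1 + 1/(d - 1) of each other.
def weights_satisfy_condition(weights, diameter):
    """Check whether max/min link weight is within the 1 + 1/(d - 1) bound."""
    bound = 1.0 + 1.0 / (diameter - 1)
    return max(weights) / min(weights) <= bound

link_weights = [10, 10, 11, 12]   # hypothetical IGP weights
print(weights_satisfy_condition(link_weights, diameter=6))  # bound = 1.2 -> True
```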


International Teletraffic Congress | 2003

IGP link weight assignment for transient link failures

Antonio Nucci; Bianca Schroeder; Supratik Bhattacharyya; Nina Taft; Christophe Diot

Intra-domain routing in IP backbone networks relies on link-state protocols such as IS-IS or OSPF. These protocols associate a weight (or cost) with each network link, and compute traffic routes based on these weights. However, proposed methods for selecting link weights largely ignore the issue of failures which arise as part of everyday network operations (maintenance, accidental, etc.). Changing link weights during a short-lived failure is impractical. However, such failures are frequent enough to impact network performance. We propose a Tabu-search heuristic for choosing link weights which allow a network to function almost optimally during short link failures. The heuristic takes into account possible link failure scenarios when choosing weights, thereby mitigating the effect of such failures. We find that the weights chosen by the heuristic can reduce link overload during transient link failures by as much as 40% at the cost of a small performance degradation in the absence of failures (10%).
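
For readers unfamiliar with the technique, the following is a generic tabu-search skeleton for link-weight selection, not the paper's heuristic; in particular, the cost function is a placeholder for evaluating load under normal operation and under each transient failure.

```python
# Generic tabu-search skeleton for link-weight selection, offered only as an
# illustration of the search pattern.
import random

def tabu_search(initial_weights, cost, n_iters=200, tabu_len=20):
    current = list(initial_weights)
    best, best_cost = list(current), cost(current)
    tabu = []                                   # recently modified link indices
    for _ in range(n_iters):
        candidates = []
        for i in range(len(current)):
            if i in tabu:
                continue
            for delta in (-1, +1):
                neigh = list(current)
                neigh[i] = max(1, neigh[i] + delta)   # keep weights positive
                candidates.append((cost(neigh), i, neigh))
        if not candidates:
            break
        c, i, neigh = min(candidates)           # best non-tabu neighbor
        current = neigh
        tabu = (tabu + [i])[-tabu_len:]
        if c < best_cost:
            best, best_cost = list(neigh), c
    return best, best_cost

# Placeholder cost: penalize weight spread. A real cost would simulate routing
# and link loads for the no-failure case and for each single-link failure.
weights0 = [random.randint(1, 20) for _ in range(10)]
best, c = tabu_search(weights0, cost=lambda w: max(w) - min(w))
print(best, c)
```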


IEEE Symposium on Security and Privacy | 2013

Privacy-Preserving Ridge Regression on Hundreds of Millions of Records

Valeria Nikolaenko; Udi Weinsberg; Stratis Ioannidis; Marc Joye; Dan Boneh; Nina Taft

Ridge regression is an algorithm that takes as input a large number of data points and finds the best-fit linear curve through these points. The algorithm is a building block for many machine-learning operations. We present a system for privacy-preserving ridge regression. The system outputs the best-fit curve in the clear, but exposes no other information about the input data. Our approach combines both homomorphic encryption and Yao garbled circuits, where each is used in a different part of the algorithm to obtain the best performance. We implement the complete system and experiment with it on real data-sets, and show that it significantly outperforms pure implementations based only on homomorphic encryption or Yao circuits.
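
The underlying plaintext computation is the closed-form ridge solution beta = (X^T X + lambda*I)^(-1) X^T y. The sketch below shows only that baseline on synthetic data; the paper's contribution, splitting the work between homomorphic encryption and garbled circuits so the inputs stay hidden, is not reproduced here.

```python
# Plaintext ridge regression baseline on synthetic data (the computation the
# privacy-preserving system protects, not the protocol itself).
import numpy as np

rng = np.random.default_rng(7)
n, d = 1_000, 20                          # records x features (hypothetical)
X = rng.standard_normal((n, d))
true_beta = rng.standard_normal(d)
y = X @ true_beta + 0.1 * rng.standard_normal(n)

lam = 1.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)   # closed-form solution
print("max coefficient error:", float(np.max(np.abs(beta - true_beta))))
```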


IEEE International Conference on Computer Communications | 2007

Communication-Efficient Online Detection of Network-Wide Anomalies

Ling Huang; XuanLong Nguyen; Minos N. Garofalakis; Joseph M. Hellerstein; Michael I. Jordan; Anthony D. Joseph; Nina Taft

There has been growing interest in building large-scale distributed monitoring systems for sensor, enterprise, and ISP networks. Recent work has proposed using principal component analysis (PCA) over global traffic matrix statistics to effectively isolate network-wide anomalies. To allow such a PCA-based anomaly detection scheme to scale, we propose a novel approximation scheme that dramatically reduces the burden on the production network. Our scheme avoids the expensive step of centralizing all the data by performing intelligent filtering at the distributed monitors. This filtering reduces monitoring bandwidth overheads, but can result in the anomaly detector making incorrect decisions based on a perturbed view of the global data set. We employ stochastic matrix perturbation theory to bound such errors. Our algorithm selects the filtering parameters at local monitors such that the errors made by the detector are guaranteed to lie below a user-specified upper bound. Our algorithm thus allows network operators to explicitly balance the tradeoff between detection accuracy and the amount of data communicated over the network. In addition, our approach enables real-time detection because we exploit continuous monitoring at the distributed monitors. Experiments with traffic data from the Abilene backbone network demonstrate that our methods yield significant communication benefits while simultaneously achieving high detection accuracy.
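
The sketch below shows the centralized subspace-based detector that the distributed filtering scheme approximates: project the traffic onto a low-dimensional "normal" subspace and flag time bins whose residual (squared prediction error) is unusually large. The data is synthetic, and the filtering and perturbation-bound machinery is omitted.

```python
# Illustrative centralized PCA anomaly detector on synthetic link traffic.
import numpy as np

rng = np.random.default_rng(3)
T, n_links = 500, 40
normal = rng.standard_normal((T, 5)) @ rng.random((5, n_links))  # low-rank "normal" traffic
X = normal + 0.05 * rng.standard_normal((T, n_links))
X[400] += 8.0 * rng.random(n_links)                              # injected anomaly at t=400

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:5].T                                   # normal subspace (top 5 components)
residual = Xc - Xc @ P @ P.T
spe = np.sum(residual**2, axis=1)              # squared prediction error per time bin
threshold = np.percentile(spe, 99.5)
print("flagged time bins (should include t=400):", np.where(spe > threshold)[0])
```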


Computer and Communications Security | 2013

Privacy-preserving matrix factorization

Valeria Nikolaenko; Stratis Ioannidis; Udi Weinsberg; Marc Joye; Nina Taft; Dan Boneh

Recommender systems typically require users to reveal their ratings to a recommender service, which subsequently uses them to provide relevant recommendations. Revealing ratings has been shown to make users susceptible to a broad set of inference attacks, allowing the recommender to learn private user attributes, such as gender, age, etc. In this work, we show that a recommender can profile items without ever learning the ratings users provide, or even which items they have rated. We show this by designing a system that performs matrix factorization, a popular method used in a variety of modern recommendation systems, through a cryptographic technique known as garbled circuits. Our design uses oblivious sorting networks in a novel way to leverage sparsity in the data. This yields an efficient implementation, whose running time is O(M log^2 M) in the number of ratings M. Crucially, our design is also highly parallelizable, giving a linear speedup with the number of available processors. We further fully implement our system, and demonstrate that even on commodity hardware with 16 cores, our privacy-preserving implementation can factorize a matrix with 10K ratings within a few hours.
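
As context, the sketch below is the ordinary, plaintext matrix-factorization computation that the garbled-circuit protocol evaluates without seeing the ratings; the data and hyperparameters here are synthetic placeholders.

```python
# Plaintext matrix factorization by stochastic gradient descent: fit user and
# item profiles to observed (user, item, rating) triples.
import numpy as np

rng = np.random.default_rng(5)
n_users, n_items, dim = 50, 30, 8
ratings = [(rng.integers(n_users), rng.integers(n_items), rng.uniform(1, 5))
           for _ in range(500)]                      # synthetic rating triples

U = 0.1 * rng.standard_normal((n_users, dim))        # user profiles
V = 0.1 * rng.standard_normal((n_items, dim))        # item profiles
lr, lam = 0.02, 0.1
for _ in range(50):                                  # a few gradient sweeps
    for i, j, r in ratings:
        err = r - U[i] @ V[j]
        U[i] += lr * (err * V[j] - lam * U[i])
        V[j] += lr * (err * U[i] - lam * V[j])

rmse = np.sqrt(np.mean([(r - U[i] @ V[j])**2 for i, j, r in ratings]))
print("training RMSE:", round(float(rmse), 3))
```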

Collaboration


Dive into Nina Taft's collaborations.

Top Co-Authors

Christophe Diot
French Institute for Research in Computer Science and Automation

Ling Huang
University of California