
Publication


Featured research published by Mark Crovella.


Measurement and Modeling of Computer Systems | 1998

Generating representative Web workloads for network and server performance evaluation

Paul Barford; Mark Crovella

One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server. The tool, called Surge (Scalable URL Reference Generator), generates references matching empirical measurements of 1) server file size distribution; 2) request size distribution; 3) relative file popularity; 4) embedded file references; 5) temporal locality of reference; and 6) idle periods of individual users. This paper reviews the essential elements required in the generation of a representative Web workload. It also addresses the technical challenges of satisfying this large set of simultaneous constraints on the properties of the reference stream, the solutions we adopted, and their associated accuracy. Finally, we present evidence that Surge exercises servers in a manner significantly different from other Web server benchmarks.
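
The abstract enumerates the empirical distributions a SURGE-style generator must match simultaneously. The sketch below shows the distribution-driven sampling at the core of that idea, using illustrative assumptions (Zipf popularity, a Pareto size model with alpha=1.2, exponential idle periods); the paper's fitted distributions and parameters differ.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FILES, N_REQUESTS = 1000, 5000

# Heavy-tailed file sizes; alpha=1.2 is an illustrative choice,
# not SURGE's fitted value.
sizes = (rng.pareto(1.2, N_FILES) + 1) * 1000  # bytes

# Zipf-like relative file popularity over the file set.
ranks = np.arange(1, N_FILES + 1)
popularity = (1.0 / ranks) / np.sum(1.0 / ranks)

# Reference stream: sample files by popularity, separated by
# exponentially distributed user idle periods.
requests = rng.choice(N_FILES, size=N_REQUESTS, p=popularity)
idle = rng.exponential(scale=2.0, size=N_REQUESTS)  # seconds, illustrative

for r, pause in zip(requests[:5], idle[:5]):
    print(f"GET file {r} ({sizes[r]:.0f} bytes), then idle {pause:.2f}s")
```

The hard part SURGE solves, which this sketch glosses over, is making all six constraints hold at once in a single reference stream rather than sampling each in isolation.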


ACM Special Interest Group on Data Communication | 2004

Diagnosing network-wide traffic anomalies

Anukool Lakhina; Mark Crovella; Christophe Diot

Anomalies are unusual and significant changes in a network's traffic levels, which can often span multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data. In this paper we propose a general method to diagnose anomalies. This method is based on a separation of the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions. We show that this separation can be performed effectively by Principal Component Analysis. Using only simple traffic measurements from links, we study volume anomalies and show that the method can: (1) accurately detect when a volume anomaly is occurring; (2) correctly identify the underlying origin-destination (OD) flow which is the source of the anomaly; and (3) accurately estimate the amount of traffic involved in the anomalous OD flow. We evaluate the method's ability to diagnose (i.e., detect, identify, and quantify) both existing and synthetically injected volume anomalies in real traffic from two backbone networks. Our method consistently diagnoses the largest volume anomalies, and does so with a very low false alarm rate.
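
The subspace separation admits a compact sketch: treat the span of the top principal components of the link-measurement matrix as "normal" traffic, and flag time bins whose residual energy is large. The synthetic data, component count, and threshold below are illustrative assumptions, not the paper's calibrated procedure (which uses the Q-statistic).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic link measurements: T time bins x L links, a shared diurnal
# trend plus noise, with one injected multi-link volume anomaly.
T, L = 500, 20
t = np.arange(T)
X = np.outer(np.sin(2 * np.pi * t / 100), rng.uniform(1, 5, L))
X += 0.1 * rng.standard_normal((T, L))
X[250, :5] += 3.0  # the anomaly to be detected

# Normal subspace = span of the top k principal components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 4                    # illustrative normal-subspace dimension
P = Vt[:k].T @ Vt[:k]    # projector onto the normal subspace

# Squared prediction error of the residual (anomalous-subspace) traffic.
spe = np.sum((Xc - Xc @ P) ** 2, axis=1)
threshold = spe.mean() + 5 * spe.std()   # illustrative, not the paper's test
print("flagged time bins:", np.where(spe > threshold)[0])
```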


Performance Evaluation | 1996

Measuring bottleneck link speed in packet-switched networks

Robert L. Carter; Mark Crovella

The quality of available network connections can often have a large impact on the performance of distributed applications. For example, document transfer applications such as FTP, Gopher and the World Wide Web suffer increased response times as a result of network congestion. For these applications, the document transfer time is directly related to the available bandwidth of the connection. Available bandwidth depends on two things: 1) the underlying capacity of the path from client to server, which is limited by the bottleneck link; and 2) the amount of other traffic competing for links on the path. If measurements of these quantities were available to the application, the current utilization of connections could be calculated. Network utilization could then be used as a basis for selection from a set of alternative connections or servers, thus providing reduced response time. Such a dynamic server selection scheme would be especially important in a mobile computing environment in which the set of available servers is frequently changing. In order to provide these measurements at the application level, we introduce two tools: bprobe, which provides an estimate of the uncongested bandwidth of a path; and cprobe, which gives an estimate of the current congestion along a path. These two measures may be used in combination to provide the application with an estimate of available bandwidth between server and client, thereby enabling application-level congestion avoidance. In this paper we discuss the design and implementation of our probe tools, specifically illustrating the techniques used to achieve accuracy and robustness. We present validation studies for both tools which demonstrate their reliability in the face of actual Internet conditions; and we give results of a survey of available bandwidth to a random set of WWW servers as a sample application of our probe technique. We conclude with descriptions of other applications of our measurement tools, several of which are currently under development.
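
The intuition behind this style of probing is the packet-pair effect: back-to-back packets leave the bottleneck link spaced by packet size divided by bottleneck capacity, so receive-side gaps reveal the capacity. A toy sketch of that calculation follows; the actual bprobe tool sends many probes of varying sizes and filters the results far more carefully than the simple median used here.

```python
# Packet-pair intuition: two back-to-back packets exit the bottleneck
# spaced by (packet_size / capacity), so the observed gap reveals capacity.

def bottleneck_estimate(packet_size_bytes: float, gap_seconds: float) -> float:
    """Estimate bottleneck capacity in bits/s from one packet pair."""
    return packet_size_bytes * 8 / gap_seconds

# Example: 1500-byte packets arriving 1.2 ms apart suggest a ~10 Mbit/s link.
print(bottleneck_estimate(1500, 0.0012) / 1e6, "Mbit/s")

def robust_capacity(samples_bps: list[float]) -> float:
    """Aggregate many noisy pair estimates; the median resists the
    queueing distortion that skews individual samples (bprobe's actual
    filtering is more elaborate)."""
    samples = sorted(samples_bps)
    return samples[len(samples) // 2]
```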


International Conference on Parallel and Distributed Information Systems | 1996

Characterizing reference locality in the WWW

Virgílio A. F. Almeida; Azer Bestavros; Mark Crovella; A. G. de Oliveira

The authors propose models for both temporal and spatial locality of reference in streams of requests arriving at Web servers. They show that simple models based on document popularity alone are insufficient for capturing either temporal or spatial locality. Instead, they rely on an equivalent, but numerical, representation of a reference stream: a stack distance trace. They show that temporal locality can be characterized by the marginal distribution of the stack distance trace, propose models for typical distributions, and compare their cache performance to that of the traces. They also show that spatial locality in a reference stream can be characterized using the notion of self-similarity. Self-similarity describes long-range correlations in the data set, which is a property that previous researchers have found hard to incorporate into synthetic reference strings. They show that stack distance strings appear to be strongly self-similar, and provide measurements of the degree of self-similarity in the traces. Finally, they discuss methods for generating synthetic Web traces that exhibit the properties of temporal and spatial locality measured in the data.
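
Computing a stack distance trace from a reference stream is straightforward and makes the representation concrete. The sketch below is a generic LRU-stack computation, not code from the paper:

```python
def stack_distance_trace(references):
    """Convert a reference stream into its stack distance trace.

    The distance for a reference is its depth in an LRU stack, i.e. the
    number of distinct documents referenced since its previous access;
    first accesses yield None (often modeled as infinite distance).
    """
    stack = []       # most recently used document at the front
    distances = []
    for doc in references:
        if doc in stack:
            d = stack.index(doc) + 1   # 1-based stack depth
            stack.remove(doc)
        else:
            d = None                   # cold miss
        distances.append(d)
        stack.insert(0, doc)
    return distances

# Repeated nearby references produce small distances (temporal locality):
print(stack_distance_trace(["a", "b", "a", "c", "b", "a"]))
# -> [None, None, 2, None, 3, 3]
```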


International Conference on Network Protocols | 1996

On the relationship between file sizes, transport protocols, and self-similar network traffic

Kihong Park; Gitae Kim; Mark Crovella

Measurements of LAN and WAN traffic show that network traffic exhibits variability on different scales. We examine a mechanism that gives rise to self-similar network traffic and discuss its performance implications. The mechanism we study is the transfer of files or messages whose size is drawn from a heavy-tailed distribution. In a realistic client/server network, the degree to which file sizes are heavy-tailed can directly determine the degree of traffic self-similarity at the link level. This causal relationship is robust relative to changes in network resources, network topology, the influence of cross-traffic, and the distribution of interarrival times. Properties of the transport layer play an important role in preserving and modulating this relationship. The reliable transmission and flow control mechanisms of TCP serve to maintain the long-range dependency structure induced by heavy-tailed file size distributions. In contrast, if a non-flow-controlled and unreliable (UDP-based) transport protocol is used, the resulting traffic shows little self-similarity: although still bursty at short time scales, it has little long-range dependence. Performance implications of self-similarity are discussed as represented by various performance measures. Increased self-similarity, as expected, results in degradation of performance; queueing delay, in particular, is discussed. Throughput-related measures such as packet loss and retransmission rate, however, increase only gradually with increasing traffic self-similarity as long as a reliable, flow-controlled transport protocol is used.
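
A small simulation conveys the mechanism: superpose constant-rate transfers whose sizes are drawn from a heavy-tailed Pareto distribution (1 < alpha < 2: finite mean, infinite variance) and observe how slowly the aggregated traffic's variance decays as the averaging window grows. All parameters below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(2)

N, RATE_BPS, T_BINS = 2000, 1e5, 1000
sizes = (rng.pareto(1.2, N) + 1) * 1e4   # bytes; alpha=1.2 is illustrative
starts = rng.uniform(0, T_BINS, N)       # transfer start times (bins)
durations = sizes * 8 / RATE_BPS         # constant-rate transfer lengths

# Aggregate traffic: each active flow contributes its rate to every bin
# it overlaps.
traffic = np.zeros(T_BINS)
for s, d in zip(starts, durations):
    lo, hi = int(s), min(int(s + d) + 1, T_BINS)
    traffic[lo:hi] += RATE_BPS

# Variance-time intuition: for i.i.d. traffic, variance of m-bin averages
# falls like 1/m; heavy-tailed superposition decays much more slowly.
for m in (1, 10, 100):
    agg = traffic[: T_BINS // m * m].reshape(-1, m).mean(axis=1)
    print(f"window {m:>3}: variance {agg.var():.3e}")
```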


International World Wide Web Conference | 1999

Changes in Web Client Access Patterns: Characteristics and Caching Implications

Paul Barford; Azer Bestavros; Adam D. Bradley; Mark Crovella

Understanding the nature of the workloads and system demands created by users of the World Wide Web is crucial to properly designing and provisioning Web services. Previous measurements of Web client workloads have been shown to exhibit a number of characteristic features; however, it is not clear how those features may be changing with time. In this study we compare two measurements of Web client workloads separated in time by three years, both captured from the same computing facility at Boston University. The older dataset, obtained in 1995, is well known in the research literature and has been the basis for a wide variety of studies. The newer dataset was captured in 1998 and is comparable in size to the older dataset. The new dataset has the drawback that the collection of users measured may no longer be representative of general Web users; however, using it has the advantage that many comparisons can be drawn more clearly than would be possible using a new, different source of measurement. Our results fall into two categories. First we compare the statistical and distributional properties of Web requests across the two datasets. This serves to reinforce and deepen our understanding of the characteristic statistical properties of Web client requests. We find that the kinds of distributions that best describe document sizes have not changed between 1995 and 1998, although specific values of the distributional parameters are different. Second, we explore the question of how the observed differences in the properties of Web client requests, particularly the popularity and temporal locality properties, affect the potential for Web file caching in the network. We find that for the computing facility represented by our traces between 1995 and 1998, (1) the benefits of using size‐based caching policies have diminished; and (2) the potential for caching requested files in the network has declined.
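
The caching questions studied here are easy to experiment with. The sketch below implements a byte-capacity LRU cache as a baseline for replaying a request trace; it is a generic simulator for illustration, not the paper's evaluation code.

```python
from collections import OrderedDict

def lru_hit_rate(trace, sizes, capacity):
    """Hit rate of a byte-capacity LRU cache over a reference trace.

    trace: sequence of document ids; sizes: id -> bytes.
    """
    cache, used, hits = OrderedDict(), 0, 0
    for doc in trace:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)       # refresh recency
            continue
        while used + sizes[doc] > capacity and cache:
            evicted, _ = cache.popitem(last=False)  # evict least recent
            used -= sizes[evicted]
        if sizes[doc] <= capacity:
            cache[doc] = True
            used += sizes[doc]
    return hits / len(trace)

# Heavy reuse of one small document yields a high hit rate:
print(lru_hit_rate(["a", "b", "a", "a", "c", "a"],
                   {"a": 1, "b": 8, "c": 8}, capacity=10))  # -> 0.5
```

Swapping the eviction rule for a size-aware one (e.g., evicting the largest document first) and replaying traces with 1995-like versus 1998-like popularity skew is the kind of comparison behind the paper's caching conclusions.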


Measurement and Modeling of Computer Systems | 2004

Structural analysis of network traffic flows

Anukool Lakhina; Konstantina Papagiannaki; Mark Crovella; Christophe Diot; Eric D. Kolaczyk; Nina Taft

Network traffic arises from the superposition of Origin-Destination (OD) flows. Hence, a thorough understanding of OD flows is essential for modeling network traffic, and for addressing a wide variety of problems including traffic engineering, traffic matrix estimation, capacity planning, forecasting, and anomaly detection. However, to date, OD flows have not been closely studied, and very little is known about their properties. We present the first analysis of complete sets of OD flow time series, taken from two different backbone networks (Abilene and Sprint-Europe). Using Principal Component Analysis (PCA), we find that the set of OD flows has small intrinsic dimension. In fact, even in a network with over a hundred OD flows, these flows can be accurately modeled in time using a small number (10 or fewer) of independent components or dimensions. We also show how to use PCA to systematically decompose the structure of OD flow time series into three main constituents: common periodic trends, short-lived bursts, and noise. We provide insight into how the various constituents contribute to the overall structure of OD flows and explore the extent to which this decomposition varies over time.
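
The "small intrinsic dimension" finding can be illustrated by checking how much variance a few principal components capture in an OD-flow matrix. The synthetic low-rank data below stands in for the Abilene and Sprint-Europe measurements; all shapes and periods are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic OD-flow matrix: F flows over T time bins, built from a few
# shared periodic "eigenflows" plus noise, mimicking low intrinsic dimension.
T, F, K_TRUE = 1000, 120, 5
t = np.arange(T)
basis = np.stack([np.sin(2 * np.pi * t / p) for p in (24, 48, 96, 168, 336)])
X = rng.uniform(0, 3, (F, K_TRUE)) @ basis + 0.05 * rng.standard_normal((F, T))

# Cumulative fraction of variance captured by the top k components:
# a sharp knee at small k is the signature the paper reports.
Xc = X.T - X.T.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
for k in (1, 5, 10):
    print(f"top {k:>2} components explain {energy[k - 1]:.1%} of variance")
```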


Mobile Ad Hoc Networking and Computing | 2008

Delegation forwarding

Vijay Erramilli; Mark Crovella; Augustin Chaintreau; Christophe Diot

Mobile opportunistic networks are characterized by unpredictable mobility, heterogeneity of contact rates, and lack of global information. Successfully delivering messages at low cost and delay in such networks is thus challenging. Most forwarding algorithms avoid the cost associated with flooding the network by forwarding only to nodes that are likely to be good relays, using a quality metric associated with nodes. However, it is non-trivial to decide whether an encountered node is a good relay at the moment of encounter. Thus the problem is in part one of online inference of the quality distribution of nodes from sequential samples, and has connections to optimal stopping theory. Based on these observations we develop a new strategy for forwarding, which we refer to as delegation forwarding. We analyse two variants of delegation forwarding and show that while naive forwarding to high contact rate nodes has cost linear in the population size, the cost of delegation forwarding is proportional to the square root of the population size. We then study delegation forwarding with different metrics using real mobility traces and show that delegation forwarding performs as well as previously proposed algorithms at much lower cost. In particular, we show that the delegation scheme based on destination contact rate does particularly well.
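
The delegation rule itself is simple: forward only to encountered nodes whose quality beats the best quality seen so far, then raise the threshold. A small simulation with i.i.d. uniform qualities (an assumption of this sketch) shows why a single holder's forwarding cost grows so slowly; it is counting records, roughly ln(n):

```python
import random

def expected_forwards(n_encounters: int, trials: int = 2000) -> float:
    """Simulate one message holder using the delegation rule."""
    total = 0
    for _ in range(trials):
        threshold, forwards = 0.0, 0
        for _ in range(n_encounters):
            x = random.random()      # encountered node's quality, i.i.d.
            if x > threshold:
                forwards += 1        # delegate a copy to the better relay
                threshold = x        # only strictly better relays later
        total += forwards
    return total / trials

# Forwards per holder grow ~ ln(n) (the record-counting argument); the
# paper's sqrt(N) result concerns the cost summed over all copy holders.
for n in (10, 100, 1000):
    print(n, round(expected_forwards(n), 2))
```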


International Conference on Computer Communications | 1997

Server selection using dynamic path characterization in wide-area networks

Robert L. Carter; Mark Crovella

Replication is a commonly proposed solution to problems of scale associated with distributed services. However, when a service is replicated, each client must be assigned a server. Prior work has generally assumed the assignment to be static. In contrast, we propose dynamic server selection, and show that it enables application-level congestion avoidance. Using tools to measure the available bandwidth and round trip latency (RTT), we demonstrate dynamic server selection and compare it to previous static approaches. We show that because of the variability of paths in the Internet, dynamic server selection consistently outperforms static policies, reducing response times by as much as 50%. However, we also must adopt a systems perspective and consider the impact of the measurement method on the network. Therefore, we look at alternative low-cost approximations and find that the careful measurements provided by our tools can be closely approximated by much lighter-weight measurements. We propose a protocol using this method which is limited to at most a 1% increase in network traffic but which often costs much less in practice.
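
A minimal sketch of the selection step, assuming a simple predictor that combines measured RTT with available bandwidth (the two quantities the probe tools estimate); the paper's actual policy may weight these differently.

```python
def predicted_transfer_time(rtt_s: float, avail_bw_bps: float,
                            doc_bytes: int) -> float:
    """Rough response-time predictor: latency plus serialization at the
    measured available bandwidth (an illustrative model)."""
    return rtt_s + doc_bytes * 8 / avail_bw_bps

def pick_server(measurements, doc_bytes=100_000):
    """Choose the replica with the smallest predicted transfer time.

    measurements: dict of server -> (rtt seconds, available bw bits/s).
    """
    return min(measurements,
               key=lambda s: predicted_transfer_time(*measurements[s],
                                                     doc_bytes))

# A higher-bandwidth but higher-latency replica can still win for large
# documents, which is why static nearest-server policies lose.
probes = {"east": (0.010, 2e6), "west": (0.080, 20e6)}
print(pick_server(probes))   # -> "west" for a 100 kB document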


International Conference on Computer Communications | 2003

Sampling biases in IP topology measurements

Anukool Lakhina; John W. Byers; Mark Crovella; Peng Xie

Considerable attention has been focused on the properties of graphs derived from Internet measurements. Router-level topologies collected via traceroute-like methods have led some to conclude that the router graph of the Internet is well modeled as a power-law random graph. In such a graph, the degree distribution of nodes follows a distribution with a power-law tail. We argue that the evidence to date for this conclusion is at best insufficient. We show that when graphs are sampled using traceroute-like methods, the resulting degree distribution can differ sharply from that of the underlying graph. For example, given a sparse Erdős-Rényi random graph, the subgraph formed by a collection of shortest paths from a small set of random sources to a larger set of random destinations can exhibit a degree distribution remarkably like a power law. We explore the reasons why this effect arises, and show that in such a setting, edges are sampled in a highly biased manner. This insight allows us to formulate tests for determining when sampling bias is present. When we apply these tests to a number of well-known datasets, we find strong evidence for sampling bias.
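
The effect is easy to reproduce. The sketch below samples an Erdős-Rényi graph traceroute-style using networkx (a tooling choice of this illustration, not the paper's) and compares degree statistics of the sampled subgraph against the true graph; all sizes and probabilities are illustrative.

```python
import random
import networkx as nx

random.seed(4)

# Sparse Erdős-Rényi graph standing in for the "true" topology.
G = nx.erdos_renyi_graph(n=2000, p=0.004, seed=4)

# Traceroute-like sampling: union of shortest paths from a few sources
# to many destinations.
nodes = list(G.nodes)
sources = random.sample(nodes, 5)
dests = random.sample(nodes, 500)

sampled = nx.Graph()
for s in sources:
    paths = nx.single_source_shortest_path(G, s)  # one path per reachable node
    for d in dests:
        if d in paths:
            nx.add_path(sampled, paths[d])

# The sample is sharply biased: plotting both degree distributions shows a
# heavy, power-law-looking tail in the sample even though the true graph's
# degrees are Poisson-like. Summary statistics hint at the skew:
deg_true = [d for _, d in G.degree]
deg_samp = [d for _, d in sampled.degree]
print("true  mean/max degree:", sum(deg_true) / len(deg_true), max(deg_true))
print("sample mean/max degree:", sum(deg_samp) / len(deg_samp), max(deg_samp))
```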

Collaboration


Dive into Mark Crovella's collaborations.

Top Co-Authors

Paul Barford

University of Wisconsin-Madison
