
Publication


Featured research published by Anirban Mahanti.


Internet Measurement Conference | 2007

YouTube traffic characterization: a view from the edge

Phillipa Gill; Martin F. Arlitt; Zongpeng Li; Anirban Mahanti

This paper presents a traffic characterization study of the popular video sharing service, YouTube. Over a three-month period we observed almost 25 million transactions between users on an edge network and YouTube, including more than 600,000 video downloads. We also monitored the globally popular videos over this period of time. In the paper we examine usage patterns, file properties, popularity and referencing characteristics, and transfer behaviors of YouTube, and compare them to traditional Web and media streaming workload characteristics. We conclude the paper with a discussion of the implications of the observed characteristics. For example, we find that as with the traditional Web, caching could improve the end user experience, reduce network bandwidth consumption, and reduce the load on YouTube's core server infrastructure. Unlike traditional Web caching, Web 2.0 provides additional meta-data that should be exploited to improve the effectiveness of strategies like caching.
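The caching implication above can be illustrated with a minimal sketch (not taken from the paper): an LRU cache replayed over a toy request trace, showing how repeated views of popular videos become cache hits at the edge. The video IDs and trace are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache that tracks hits and misses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def request(self, video_id):
        if video_id in self.store:
            self.store.move_to_end(video_id)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # evict least recently used
            self.store[video_id] = True

# Toy request trace: a few popular videos requested repeatedly,
# plus a tail of one-timers (IDs are hypothetical).
trace = ["v1", "v2", "v1", "v3", "v1", "v2", "v4", "v1", "v5", "v2"]
cache = LRUCache(capacity=3)
for vid in trace:
    cache.request(vid)

hit_rate = cache.hits / len(trace)
print(f"hit rate: {hit_rate:.0%}")  # prints "hit rate: 40%"
```

Even this tiny trace shows the paper's qualitative point: skewed popularity means a small cache absorbs a large fraction of requests.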


ACM Special Interest Group on Data Communication | 2006

Traffic classification using clustering algorithms

Jeffrey Erman; Martin F. Arlitt; Anirban Mahanti

Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult with many peer-to-peer (P2P) applications using dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of applications when they communicate on a network. We pursue this latter approach and demonstrate how cluster analysis can be used to effectively identify groups of traffic that are similar using only transport layer statistics. Our work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have previously not been used for network traffic classification. We evaluate these two algorithms and compare them to the previously used AutoClass algorithm, using empirical Internet traces. The experimental results show that both K-Means and DBSCAN work very well and much more quickly than AutoClass. Our results indicate that although DBSCAN has lower accuracy compared to K-Means and AutoClass, DBSCAN produces better clusters.
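As a rough illustration of the clustering approach (a sketch, not the paper's implementation, which evaluated K-Means, DBSCAN, and AutoClass on real traces), here is a plain K-Means over hypothetical transport-layer flow statistics:

```python
import math

def kmeans(points, centroids, iters=10):
    """Plain K-Means: assign each point to its nearest centroid,
    then recompute each centroid as its cluster's mean."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        centroids = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical transport-layer flow features: (mean packet size in bytes,
# flow duration in seconds). Two obvious groups: short flows with small
# packets vs. long bulk-transfer flows with large packets.
flows = [(120, 1.0), (150, 0.8), (130, 1.2),
         (1400, 60.0), (1350, 55.0), (1420, 70.0)]
centroids, clusters = kmeans(flows, centroids=[(100, 1.0), (1000, 50.0)])
print(len(clusters[0]), len(clusters[1]))  # prints "3 3"
```

The key idea mirrors the paper's: no ports or payload are inspected, only statistics observable at the transport layer.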


IEEE Network | 2000

Traffic analysis of a Web proxy caching hierarchy

Anirban Mahanti; Carey L. Williamson; Derek L. Eager

Understanding Web traffic characteristics is key to improving the performance and scalability of the Web. In this article Web proxy workloads from different levels of a caching hierarchy are used to understand how the workload characteristics change across different levels of a caching hierarchy. The main observations of this study are that HTML and image documents account for 95 percent of the documents seen in the workload; the distribution of transfer sizes of documents is heavy-tailed, with the tails becoming heavier as one moves up the caching hierarchy; the popularity profile of documents does not precisely follow the Zipf distribution; one-timers account for approximately 70 percent of the documents referenced; concentration of references is less at proxy caches than at servers, and concentration of references diminishes as one moves up the caching hierarchy; and the modification rate is higher at higher-level proxies.
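The observation that popularity does not precisely follow Zipf can be made concrete with a small sketch (not the paper's methodology): estimate the slope of log(frequency) versus log(rank) by least squares, where an exact Zipf profile yields a slope near -1 and real proxy workloads deviate from it.

```python
import math

def zipf_slope(counts):
    """Least-squares slope of log(frequency) vs log(rank); an exact
    Zipf profile f(r) = C/r gives a slope near -1."""
    freqs = sorted(counts, reverse=True)
    xs = [math.log(r + 1) for r in range(len(freqs))]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Synthetic reference counts drawn from an exact Zipf profile.
ideal = [round(1000 / r) for r in range(1, 101)]
print(f"slope: {zipf_slope(ideal):.2f}")  # close to -1
```

Applied to a real proxy trace, a slope noticeably shallower or steeper than -1, or curvature in the log-log plot, signals the non-Zipf behavior the article reports.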


IEEE/ACM Transactions on Networking | 2003

Scalable on-demand media streaming with packet loss recovery

Anirban Mahanti; Derek L. Eager; Mary K. Vernon; David Sundaram-Stukel

Previous scalable on-demand streaming protocols do not allow clients to recover from packet loss. This paper develops new protocols that: (1) have a tunably short latency for the client to begin playing the media; (2) allow heterogeneous clients to recover lost packets without jitter as long as each client's cumulative loss rate is within a tunable threshold; and (3) assume a tunable upper bound on the transmission rate to each client that can be as small as a fraction (e.g., 25%) greater than the media play rate. Models are developed to compute the minimum required server bandwidth for a given loss rate and playback latency. The results of the models are used to develop the new protocols and assess their performance. The new protocols, Reliable Periodic Broadcast and Reliable Bandwidth Skimming, are simple to implement and achieve nearly the best possible scalability and efficiency for a given set of client characteristics and desirable/feasible media quality. Furthermore, the results show that the new reliable protocols that transmit to each client at only twice the media play rate have similar performance to previous protocols that require clients to receive at many times the play rate.
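The bandwidth/latency trade-off behind these protocols can be sketched with a well-known lower bound from the scalable-streaming literature (quoted here as background, not the paper's loss-recovery model): for media of duration T and client startup delay d, required server bandwidth is at least ln(1 + T/d) times the media play rate.

```python
import math

def min_server_bandwidth(media_len_s, startup_delay_s):
    """Lower bound (in units of the media play rate) on server bandwidth
    for scalable on-demand streaming with startup delay d:
    B >= ln(1 + T/d)."""
    return math.log(1 + media_len_s / startup_delay_s)

# A 2-hour video: shrinking the startup delay raises required server
# bandwidth only logarithmically, which is why periodic broadcast scales.
for delay in (60, 10, 1):
    print(f"delay {delay}s -> {min_server_bandwidth(7200, delay):.2f}x play rate")
```

The logarithmic growth explains why protocols such as Periodic Broadcast can serve arbitrarily many clients with a fixed, modest number of server channels.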


International World Wide Web Conference | 2008

A comparative analysis of web and peer-to-peer traffic

Naimul Basher; Aniket Mahanti; Anirban Mahanti; Carey L. Williamson; Martin F. Arlitt

Peer-to-Peer (P2P) applications continue to grow in popularity, and have reportedly overtaken Web applications as the single largest contributor to Internet traffic. Using traces collected from a large edge network, we conduct an extensive analysis of P2P traffic, compare P2P traffic with Web traffic, and discuss the implications of increased P2P traffic. In addition to studying the aggregate P2P traffic, we also analyze and compare the two main constituents of P2P traffic in our data, namely BitTorrent and Gnutella. The results presented in the paper may be used for generating synthetic workloads, gaining insights into the functioning of P2P applications, and developing network management strategies. For example, our results suggest that new models are necessary for Internet traffic. As a first step, we present flow-level distributional models for Web and P2P traffic that may be used in network simulation and emulation experiments.
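To give a flavor of flow-level distributional modeling (illustrative parameters only, not the paper's fitted models), one can generate synthetic Web flow sizes from a lognormal distribution and P2P flow sizes from a heavy-tailed Pareto:

```python
import random

random.seed(42)

# Hypothetical flow-size models for workload generation. The parameter
# values below are illustrative assumptions, not the paper's fits.
def web_flow_size():
    return random.lognormvariate(mu=9.0, sigma=1.5)   # median ~8 KB

def p2p_flow_size():
    return 10_000 * random.paretovariate(alpha=1.1)   # heavy tail

web = [web_flow_size() for _ in range(1000)]
p2p = [p2p_flow_size() for _ in range(1000)]
print(f"web median ~{sorted(web)[500]:.0f} B, p2p max ~{max(p2p):.0f} B")
```

Feeding such generators into a simulator is the kind of synthetic-workload use the authors suggest; the heavy Pareto tail captures the occasional very large P2P transfer that a lognormal model underestimates.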


Measurement and Modeling of Computer Systems | 2008

Analysis of BitTorrent-like protocols for on-demand stored media streaming

Nadim Parvez; Carey L. Williamson; Anirban Mahanti; Niklas Carlsson

This paper develops analytic models that characterize the behavior of on-demand stored media content delivery using BitTorrent-like protocols. The models capture the effects of different piece selection policies, including Rarest-First and two variants of In-Order. Our models provide insight into transient and steady-state system behavior, and help explain the sluggishness of the system with strict In-Order streaming. We use the models to compare different retrieval policies across a wide range of system parameters, including peer arrival rate, upload/download bandwidth, and seed residence time. We also provide quantitative results on the startup delays and retrieval times for streaming media delivery. Our results provide insights into the optimal design of peer-to-peer networks for on-demand media streaming.
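The piece selection policies the models compare can be sketched as follows (a simplified decision rule; real BitTorrent clients combine these with further heuristics):

```python
def select_piece(needed, availability, policy):
    """Pick the next piece to download.
    needed: set of missing piece indices
    availability: dict mapping piece index -> number of peers holding it
    """
    if policy == "in-order":
        return min(needed)  # strict playback order: good for streaming,
                            # but concentrates demand on early pieces
    if policy == "rarest-first":
        # Fetch the least-replicated piece (ties broken by index),
        # which keeps piece diversity high in the swarm.
        return min(needed, key=lambda p: (availability[p], p))
    raise ValueError(policy)

needed = {0, 1, 2, 3}
availability = {0: 9, 1: 7, 2: 1, 3: 4}  # piece 2 is rarest in the swarm
print(select_piece(needed, availability, "in-order"))      # 0
print(select_piece(needed, availability, "rarest-first"))  # 2
```

The tension the paper analyzes is visible even here: In-Order serves playback but starves rare pieces, which is the root of the "sluggishness" the models explain.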


Passive and Active Network Measurement | 2008

The flattening internet topology: natural evolution, unsightly barnacles or contrived collapse?

Phillipa Gill; Martin F. Arlitt; Zongpeng Li; Anirban Mahanti

In this paper we collect and analyze traceroute measurements to show that large content providers (e.g., Google, Microsoft, Yahoo!) are deploying their own wide-area networks, bringing their networks closer to users, and bypassing Tier-1 ISPs on many paths. This trend, should it continue and be adopted by more content providers, could flatten the Internet topology, and may result in numerous other consequences to users, Internet Service Providers (ISPs), content providers, and network researchers.


Performance Evaluation | 2000

Temporal locality and its impact on Web proxy cache performance

Anirban Mahanti; Derek L. Eager; Carey L. Williamson

This paper studies temporal locality characteristics present in document referencing behavior at Web proxies, and the impact of this temporal locality on document caching. First, drift measures are developed to characterize how the popularity profile of “hot” documents changes on a day-to-day basis. Experiments show that although there is considerable “hot set” drift, there is also a significant number of documents that have long-term popularity. Second, a measure of short-term temporal locality is developed that characterizes the relationship between recent-past and near-future document references. Using this measure, it is established that temporal locality arising out of the correlations between document references in the recent-past and near-future does exist for popular documents. Another objective of this paper is to determine whether or not Web document references at proxy caches can be modeled as independent and identically distributed random events. Trace-driven simulations using empirical and synthetic traces (with varying degrees of temporal locality) show that temporal locality is an important factor in cache performance. The caching simulation results also show that temporal locality arising out of short-term correlations between references is important only for small caches. For large caches, a synthetic workload generated by applying the Independent Reference Model on a day-to-day basis gives performance very similar to that obtained for empirical traces.
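Short-term temporal locality of the kind studied here is commonly quantified with the classic LRU stack-distance technique (a standard measure, not necessarily the paper's exact one): for each re-reference, count how many distinct documents were seen since the previous reference to that document.

```python
def stack_distances(trace):
    """LRU stack distance for each re-reference: the number of distinct
    documents referenced since the previous reference to that document.
    Small distances indicate strong short-term temporal locality."""
    stack, dists = [], []
    for doc in trace:
        if doc in stack:
            d = stack.index(doc)  # distinct docs seen since last reference
            dists.append(d)
            stack.remove(doc)
        stack.insert(0, doc)      # most recently used on top
    return dists

trace = ["a", "b", "a", "c", "b", "a", "d", "a"]
print(stack_distances(trace))  # [1, 2, 2, 1]
```

Under the Independent Reference Model the stack-distance distribution is determined by popularity alone, so an excess of small distances in an empirical trace is direct evidence of the short-term correlations the paper measures.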


ACM Transactions on The Web | 2011

Characterizing Web-Based Video Sharing Workloads

Siddharth Mitra; Mayank Agrawal; Amit Yadav; Niklas Carlsson; Derek L. Eager; Anirban Mahanti

Video sharing services that allow ordinary Web users to upload video clips of their choice and watch video clips uploaded by others have recently become very popular. This article identifies invariants in video sharing workloads, through comparison of the workload characteristics of four popular video sharing services. Our traces contain metadata on approximately 1.8 million videos which together have been viewed approximately 6 billion times. Using these traces, we study the similarities and differences in use of several Web 2.0 features such as ratings, comments, favorites, and propensity of uploading content. In general, we find that active contribution, such as video uploading and rating of videos, is much less prevalent than passive use. While uploaders in general are skewed with respect to the number of videos they upload, the fraction of multi-time uploaders is found to differ by a factor of two between two of the sites. The distributions of lifetime measures of video popularity are found to have heavy-tailed forms that are similar across the four sites. Finally, we consider implications for system design of the identified invariants. To gain further insight into caching in video sharing systems, and the relevance to caching of lifetime popularity measures, we gathered an additional dataset tracking views to a set of approximately 1.3 million videos from one of the services, over a twelve-week period. We find that lifetime popularity measures have some relevance for large cache (hot set) sizes (i.e., a hot set defined according to one of these measures is indeed relatively “hot”), but that this relevance substantially decreases as cache size decreases, owing to churn in video popularity.


IEEE/ACM Transactions on Networking | 2010

An analytic throughput model for TCP NewReno

Nadim Parvez; Anirban Mahanti; Carey L. Williamson

This paper develops a simple and accurate stochastic model for the steady-state throughput of a TCP NewReno bulk data transfer as a function of round-trip time and loss behavior. Our model builds upon extensive prior work on TCP Reno throughput models but differs from these prior works in three key aspects. First, our model introduces an analytical characterization of the TCP NewReno fast recovery algorithm. Second, our model incorporates an accurate formulation of NewReno's timeout behavior. Third, our model is formulated using a flexible two-parameter loss model that can better represent the diverse packet loss scenarios encountered by TCP on the Internet. We validated our model by conducting a large number of simulations using the ns-2 simulator and by conducting emulation and Internet experiments using a NewReno implementation in the BSD TCP/IP protocol stack. The main findings from the experiments are: 1) the proposed model accurately predicts the steady-state throughput for TCP NewReno bulk data transfers under a wide range of network conditions; 2) TCP NewReno significantly outperforms TCP Reno in many of the scenarios considered; and 3) using existing TCP Reno models to estimate TCP NewReno throughput may introduce significant errors.
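For context, the prior Reno models this work refines reduce, in their simplest form, to the classic square-root approximation (shown here as the Reno baseline; the paper's NewReno model is considerably more detailed):

```python
import math

def reno_throughput(mss_bytes, rtt_s, loss_rate):
    """Classic square-root approximation for steady-state TCP Reno
    throughput in bytes/s: T ~= (MSS / RTT) * sqrt(3 / (2p)),
    where p is the packet loss rate."""
    return (mss_bytes / rtt_s) * math.sqrt(3 / (2 * loss_rate))

# Example: 1460-byte segments, 100 ms RTT, 1% loss.
bps = reno_throughput(1460, 0.1, 0.01) * 8
print(f"{bps / 1e6:.1f} Mbit/s")
```

The formula captures the inverse dependence on RTT and on the square root of the loss rate; the paper's point 3) warns that applying such Reno-derived formulas directly to NewReno flows can produce significant errors.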

Collaboration


Dive into Anirban Mahanti's collaborations.

Top Co-Authors

Derek L. Eager

University of Saskatchewan

Aaditeshwar Seth

Indian Institute of Technology Delhi
