Srikanth Kandula | Researchain

Publication

Featured research published by Srikanth Kandula.

internet measurement conference | 2010

CloudCmp: comparing public cloud providers

Ang Li; Xiaowei Yang; Srikanth Kandula; Ming Zhang

While many public cloud providers offer pay-as-you-go computing, their varying approaches to infrastructure, virtualization, and software services lead to a problem of plenty. To help customers pick a cloud that fits their needs, we develop CloudCmp, a systematic comparator of the performance and cost of cloud providers. CloudCmp measures the elastic computing, persistent storage, and networking services offered by a cloud along metrics that directly reflect their impact on the performance of customer applications. CloudCmp strives to ensure fairness, representativeness, and compliance of these measurements while limiting measurement cost. Applying CloudCmp to four cloud providers that together account for most of the cloud customers today, we find that their offered services vary widely in performance and costs, underscoring the need for thoughtful provider selection. From case studies on three representative cloud applications, we show that CloudCmp can guide customers in selecting the best-performing provider for their applications.

international conference on mobile systems, applications, and services | 2010

Diversity in smartphone usage

Hossein Falaki; Ratul Mahajan; Srikanth Kandula; Dimitrios Lymberopoulos; Ramesh Govindan; Deborah Estrin

Using detailed traces from 255 users, we conduct a comprehensive study of smartphone use. We characterize intentional user activities -- interactions with the device and the applications used -- and the impact of those activities on network and energy usage. We find immense diversity among users. Along all aspects that we study, users differ by one or more orders of magnitude. For instance, the average number of interactions per day varies from 10 to 200, and the average amount of data received per day varies from 1 to 1000 MB. This level of diversity suggests that mechanisms to improve user experience or energy consumption will be more effective if they learn and adapt to user behavior. We find that qualitative similarities exist among users that facilitate the task of learning user behavior. For instance, the relative application popularity can be modeled using an exponential distribution, with different distribution parameters for different users. We demonstrate the value of adapting to user behavior in the context of a mechanism to predict future energy drain. The 90th percentile error with adaptation is less than half compared to predictions based on average behavior across users.

internet measurement conference | 2009

The nature of data center traffic: measurements & analysis

Srikanth Kandula; Sudipta Sengupta; Albert G. Greenberg; Parveen Patel; Ronnie Chaiken

We explore the nature of traffic in data centers, designed to support the mining of massive data sets. We instrument the servers to collect socket-level logs, with negligible performance impact. In a 1500-server operational cluster, we thus amass roughly a petabyte of measurements over two months, from which we obtain and report detailed views of traffic and congestion conditions and patterns. We further consider whether traffic matrices in the cluster might be obtained instead via tomographic inference from coarser-grained counter data.

acm special interest group on data communication | 2013

Achieving high utilization with software-driven WAN

Chi-Yao Hong; Srikanth Kandula; Ratul Mahajan; Ming Zhang; Vijay Gill; Mohan Nanduri; Roger Wattenhofer

We present SWAN, a system that boosts the utilization of inter-datacenter networks by centrally controlling when and how much traffic each service sends and frequently re-configuring the network's data plane to match current traffic demand. But done simplistically, these re-configurations can also cause severe, transient congestion because different switches may apply updates at different times. We develop a novel technique that leverages a small amount of scratch capacity on links to apply updates in a provably congestion-free manner, without making any assumptions about the order and timing of updates at individual switches. Further, to scale to large networks in the face of limited forwarding table capacity, SWAN greedily selects a small set of entries that can best satisfy current demand. It updates this set without disrupting traffic by leveraging a small amount of scratch capacity in forwarding tables. Experiments using a testbed prototype and data-driven simulations of two production networks show that SWAN carries 60% more traffic than the current practice.

internet measurement conference | 2010

A first look at traffic on smartphones

Hossein Falaki; Dimitrios Lymberopoulos; Ratul Mahajan; Srikanth Kandula; Deborah Estrin

Using data from 43 users across two platforms, we present a detailed look at smartphone traffic. We find that browsing contributes over half of the traffic, while email, media, and maps each contribute roughly 10%. We also find that the overhead of lower layer protocols is high because of small transfer sizes. For half of the transfers that use transport-level security, header bytes correspond to 40% of the total. We show that while packet loss is the main factor that limits the throughput of smartphone traffic, larger send buffers at Internet servers can improve the throughput of a quarter of the transfers. Finally, by studying the interaction between smartphone traffic and the radio power management policy, we find that the power consumption of the radio can be reduced by 35% with minimal impact on the performance of packet exchanges.

acm special interest group on data communication | 2007

Towards highly reliable enterprise network services via inference of multi-level dependencies

Paramvir Bahl; Ranveer Chandra; Albert G. Greenberg; Srikanth Kandula; David A. Maltz; Ming Zhang

Localizing the sources of performance problems in large enterprise networks is extremely challenging. Dependencies are numerous, complex and inherently multi-level, spanning hardware and software components across the network and the computing infrastructure. To exploit these dependencies for fast, accurate problem localization, we introduce an Inference Graph model, which is well-adapted to user-perceptible problems rooted in conditions giving rise to both partial service degradation and hard faults. Further, we introduce the Sherlock system to discover Inference Graphs in the operational enterprise, infer critical attributes, and then leverage the result to automatically detect and localize problems. To illuminate strengths and limitations of the approach, we provide results from a prototype deployment in a large enterprise network, as well as from testbed emulations and simulations. In particular, we find that taking into account multi-level structure leads to a 30% improvement in fault localization, as compared to two-level approaches.

acm special interest group on data communication | 2007

Dynamic load balancing without packet reordering

Srikanth Kandula; Dina Katabi; Shantanu Sinha; Arthur W. Berger

Dynamic load balancing is a popular recent technique that protects ISP networks from sudden congestion caused by load spikes or link failures. Dynamic load balancing protocols, however, require schemes for splitting traffic across multiple paths at a fine granularity. Current splitting schemes present a tussle between slicing granularity and packet reordering. Splitting traffic at the granularity of packets quickly and accurately assigns the desired traffic share to each path, but can reorder packets within a TCP flow, confusing TCP congestion control. Splitting traffic at the granularity of a flow avoids packet reordering but may overshoot the desired shares by up to 60% in dynamic environments, resulting in low end-to-end network goodput. Contrary to popular belief, we show that one can systematically split a single flow across multiple paths without causing packet reordering. We propose FLARE, a new traffic splitting algorithm that operates on bursts of packets, carefully chosen to avoid reordering. Using a combination of analysis and trace-driven simulations, we show that FLARE attains accuracy and responsiveness comparable to packet switching without reordering packets. FLARE is simple and can be implemented with a few KB of router state.
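The burst-level splitting idea in the abstract above can be illustrated with a minimal sketch. This is not FLARE's actual implementation: it assumes that a new burst begins whenever a flow has been idle for longer than a threshold `gap` (chosen, hypothetically, to exceed the delay difference between paths), and that the target is an equal byte share per path. The class name `BurstSplitter` and all parameters are illustrative.

```python
class BurstSplitter:
    """Illustrative burst-level traffic splitter (assumed design, not FLARE itself)."""

    def __init__(self, num_paths, gap):
        self.gap = gap                  # idle threshold in seconds (assumed parameter)
        self.sent = [0] * num_paths     # bytes sent on each path so far
        self.last_seen = {}             # flow id -> (last packet time, path)

    def route(self, flow, now, size):
        """Return the path index for a packet of `size` bytes arriving at time `now`."""
        prev = self.last_seen.get(flow)
        if prev is not None and now - prev[0] <= self.gap:
            # Same burst: stay on the current path so packets cannot be reordered.
            path = prev[1]
        else:
            # New burst: free to move, so pick the path that has carried the least.
            path = min(range(len(self.sent)), key=self.sent.__getitem__)
        self.last_seen[flow] = (now, path)
        self.sent[path] += size
        return path


splitter = BurstSplitter(num_paths=2, gap=0.05)
first = splitter.route("f", 0.00, 100)   # first packet of the flow
second = splitter.route("f", 0.01, 100)  # arrives within the gap: same burst
third = splitter.route("f", 0.20, 100)   # arrives after a long idle period: new burst
```

In this sketch the per-flow state is one timestamp and one path index, which is in the spirit of the abstract's remark about needing only a small amount of router state.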

acm special interest group on data communication | 2011

Augmenting data center networks with multi-gigabit wireless links

Daniel Halperin; Srikanth Kandula; Jitendra Padhye; Paramvir Bahl; David Wetherall

The 60 GHz wireless technology that is now emerging has the potential to provide dense and extremely fast connectivity at low cost. In this paper, we explore its use to relieve hotspots in oversubscribed data center (DC) networks. By experimenting with prototype equipment, we show that the DC environment is well suited to a deployment of 60 GHz links, contrary to concerns about interference and link reliability. Using directional antennas, many wireless links can run concurrently at multi-Gbps rates on top-of-rack (ToR) switches. The wired DC network can be used to sidestep several common wireless problems. By analyzing production traces of DC traffic for four real applications, we show that adding a small amount of network capacity in the form of wireless flyways to the wired DC network can improve performance. However, to be of significant value, we find that one-hop indirect routing is needed. Informed by our 60 GHz experiments and DC traffic analysis, we present a design that uses DC traffic levels to select and add flyways to the wired DC network. Trace-driven evaluations show that network-limited DC applications with predictable traffic workloads running on a 1:2 oversubscribed network can be sped up by 45% in 95% of the cases, with just one wireless device per ToR switch. With two devices, in 40% of the cases, the performance is identical to that of a non-oversubscribed network.

european conference on computer systems | 2011

Scarlett: coping with skewed content popularity in mapreduce clusters

Ganesh Ananthanarayanan; Sameer Agarwal; Srikanth Kandula; Albert G. Greenberg; Ion Stoica; Duke Harlan; Ed Harris

To improve data availability and resilience, MapReduce frameworks use file systems that replicate data uniformly. However, analysis of job logs from a large production cluster shows wide disparity in data popularity. Machines and racks storing popular content become bottlenecks, thereby increasing the completion times of jobs accessing this data even when there are machines with spare cycles in the cluster. To address this problem, we present Scarlett, a system that replicates blocks based on their popularity. By accurately predicting file popularity and working within hard bounds on additional storage, Scarlett causes minimal interference to running jobs. Trace-driven simulations and experiments in two popular MapReduce frameworks (Hadoop, Dryad) show that Scarlett effectively alleviates hotspots and can speed up jobs by 20.2%.
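As a rough illustration of popularity-based replication under a storage bound, the sketch below hands out a fixed budget of extra replicas greedily, always to the file with the highest predicted accesses per replica. This greedy rule, the function name `allocate_replicas`, and the simplification that every file costs the same to replicate are assumptions made for illustration; the abstract does not specify how Scarlett allocates its budget.

```python
import heapq


def allocate_replicas(accesses, base=3, budget=4):
    """Distribute `budget` extra replicas over files, starting from `base` each.

    `accesses` maps file name -> predicted access count (hypothetical input).
    Each extra replica goes to the file whose per-replica load is currently highest.
    """
    replicas = {name: base for name in accesses}
    # Max-heap on per-replica load, stored as negated values for heapq's min-heap.
    heap = [(-count / base, name) for name, count in accesses.items()]
    heapq.heapify(heap)
    for _ in range(budget):
        if not heap:
            break
        _, name = heapq.heappop(heap)
        replicas[name] += 1
        heapq.heappush(heap, (-accesses[name] / replicas[name], name))
    return replicas


plan = allocate_replicas({"hot": 90, "warm": 30, "cold": 3}, base=3, budget=4)
```

With these example numbers all four extra replicas go to the most popular file, because even with seven replicas its per-replica load stays above that of the other files.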

acm special interest group on data communication | 2005

Shrink: a tool for failure diagnosis in IP networks

Srikanth Kandula; Dina Katabi; Jean-Philippe Vasseur

Faults in an IP network have various causes such as the failure of one or more routers at the IP layer, fiber-cuts, failure of physical elements at the optical layer, or extraneous causes like power outages. These faults are usually detected as failures of a set of dependent logical entities--the IP links affected by the failed components. We present Shrink, a tool for root cause analysis of network faults which, given a set of failed IP links, identifies the underlying cause of the faulty state. Shrink models the diagnosis problem as a Bayesian network. It has two main contributions. First, it effectively accounts for noisy measurement and inaccurate mapping between the IP and optical layers. Second, it has an efficient inference algorithm that finds the most likely failure causes in polynomial time and with bounded errors. We compare Shrink with two prior approaches and show that it substantially improves the performance.
