Nandita Dukkipati
Publications
Featured research published by Nandita Dukkipati.
acm special interest group on data communication | 2010
Nandita Dukkipati; Tiziana Refice; Yuchung Cheng; Jerry Chu; Tom Herbert; Amit Agarwal; Arvind Jain; Natalia Sutin
TCP flows start with an initial congestion window of at most four segments, or approximately 4KB of data. Because most Web transactions are short-lived, the initial congestion window is a critical TCP parameter in determining how quickly flows can finish. While average network access speeds have increased dramatically over the past decade, the standard value of TCP's initial congestion window has remained unchanged. In this paper, we propose to increase TCP's initial congestion window to at least ten segments (about 15KB). Through large-scale Internet experiments, we quantify the latency benefits and costs of using a larger window as functions of network bandwidth, round-trip time (RTT), bandwidth-delay product (BDP), and nature of applications. We show that the average latency of HTTP responses improved by approximately 10%, with the largest benefits in high-RTT and high-BDP networks. The latency of low-bandwidth networks also improved significantly in our experiments. The average retransmission rate increased by a modest 0.5%, with most of the increase coming from applications that effectively circumvent TCP's slow start algorithm by using multiple concurrent connections. Based on these results, we believe the initial congestion window should be at least ten segments, and that this change should be investigated for standardization by the IETF.
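As a rough illustration of why the initial window matters for short transfers, the sketch below (not from the paper; the response size and segment counts are assumed) counts the round trips an idealized, loss-free slow start needs to deliver a small Web response with a 4-segment versus a 10-segment initial window.

```python
# Back-of-the-envelope sketch (not from the paper): round trips needed to
# deliver a response under idealized loss-free slow start, where the
# congestion window doubles every RTT starting from the initial window.
def rtts_to_deliver(response_segments: int, initial_window: int) -> int:
    cwnd, sent, rtts = initial_window, 0, 0
    while sent < response_segments:
        sent += cwnd          # one window of segments per round trip
        cwnd *= 2             # classic slow start: double cwnd each RTT
        rtts += 1
    return rtts

if __name__ == "__main__":
    # Assumption: a ~15KB Web response is roughly 10 segments of ~1460 bytes.
    for iw in (4, 10):
        print(f"IW={iw:2d}: {rtts_to_deliver(10, iw)} RTT(s) of data transfer")
    # IW=4 needs 2 RTTs of data transfer; IW=10 fits the response in 1.
```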
acm special interest group on data communication | 2013
Tobias Flach; Nandita Dukkipati; Andreas Terzis; Barath Raghavan; Neal Cardwell; Yuchung Cheng; Ankur Jain; Shuai Hao; Ethan Katz-Bassett; Ramesh Govindan
To serve users quickly, Web service providers build infrastructure closer to clients and use multi-stage transport connections. Although these changes reduce client-perceived round-trip times, TCP's current mechanisms fundamentally limit latency improvements. We performed a measurement study of a large Web service provider and found that, while connections with no loss complete close to the ideal latency of one round-trip time, TCP's timeout-driven recovery causes transfers with loss to take five times longer on average. In this paper, we present the design of novel loss recovery mechanisms for TCP that judiciously use redundant transmissions to minimize timeout-driven recovery. Proactive, Reactive, and Corrective are three qualitatively different, easily deployable mechanisms that (1) proactively recover from losses, (2) recover from them as quickly as possible, and (3) reconstruct packets to mask loss. Crucially, the mechanisms are compatible both with middleboxes and with TCP's existing congestion control and loss recovery. Our large-scale experiments on Google's production network, which serves billions of flows, demonstrate a 23% decrease in mean latency and a 47% decrease in 99th-percentile latency over today's TCP.
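The Corrective mechanism masks loss with coded packets. As a generic illustration of that idea only (not the paper's encoding, block sizing, or TCP option format), a single XOR parity over a small block lets the receiver rebuild any one lost packet without a retransmission:

```python
# Illustrative sketch only: one XOR parity packet over a small block of
# equal-sized payloads, in the spirit of loss masking via coding. Real
# deployments must handle unequal sizes, signaling, and checksums.
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets: list[bytes]) -> bytes:
    # Assumes equal-sized payloads for brevity.
    return reduce(xor_bytes, packets)

def recover(received: dict[int, bytes], parity: bytes, block_size: int) -> dict[int, bytes]:
    missing = [i for i in range(block_size) if i not in received]
    if len(missing) == 1:                  # exactly one loss is recoverable
        received[missing[0]] = reduce(xor_bytes, received.values(), parity)
    return received

pkts = [b"seg0....", b"seg1....", b"seg2...."]
parity = make_parity(pkts)
got = {0: pkts[0], 2: pkts[2]}             # packet 1 was lost in transit
print(recover(got, parity, 3)[1])          # b'seg1....'
```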
acm special interest group on data communication | 2015
Radhika Mittal; Nandita Dukkipati; Emily R. Blem; Hassan M. G. Wassel; Monia Ghobadi; Amin Vahdat; Yaogong Wang; David Wetherall; David Zats
Datacenter transports aim to deliver low-latency messaging together with high throughput. We show that simple packet delay, measured as round-trip times at hosts, is an effective congestion signal without the need for switch feedback. First, we show that advances in NIC hardware have made RTT measurement possible with microsecond accuracy, and that these RTTs are sufficient to estimate switch queueing. Then we describe how TIMELY can adjust transmission rates using RTT gradients to keep packet latency low while delivering high bandwidth. We implement our design in host software running over NICs with OS-bypass capabilities. We show using experiments with up to hundreds of machines on a Clos network topology that it provides excellent performance: turning on TIMELY for OS-bypass messaging over a fabric with PFC lowers 99th-percentile tail latency by 9X while maintaining near line-rate throughput. Our system also outperforms DCTCP running in an optimized kernel, reducing tail latency by 13X. To the best of our knowledge, TIMELY is the first delay-based congestion control protocol for use in the datacenter, and it achieves its results despite having an order of magnitude fewer RTT signals (due to NIC offload) than earlier delay-based schemes such as Vegas.
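A minimal sketch of the gradient idea, assuming illustrative thresholds and constants rather than the paper's tuned values: when RTT is below a low threshold the sender probes for bandwidth, above a high threshold it backs off, and in between it steers on the sign and size of the normalized RTT gradient.

```python
# Simplified, illustrative sketch of a TIMELY-style rate update driven by
# the RTT gradient. All constants, units, and names here are assumptions,
# not the paper's implementation.
class GradientRateControl:
    def __init__(self, min_rtt_us=20.0, t_low_us=50.0, t_high_us=500.0,
                 alpha=0.875, beta=0.8, delta_mbps=10.0):
        self.min_rtt = min_rtt_us            # propagation RTT used to normalize
        self.t_low, self.t_high = t_low_us, t_high_us
        self.alpha, self.beta, self.delta = alpha, beta, delta_mbps
        self.prev_rtt = None
        self.rtt_diff = 0.0
        self.rate_mbps = 1000.0

    def on_rtt_sample(self, rtt_us: float) -> float:
        if self.prev_rtt is not None:
            # Smooth consecutive RTT changes with an EWMA.
            self.rtt_diff = ((1 - self.alpha) * self.rtt_diff
                             + self.alpha * (rtt_us - self.prev_rtt))
        self.prev_rtt = rtt_us
        gradient = self.rtt_diff / self.min_rtt   # normalized queue-growth signal

        if rtt_us < self.t_low:                   # queues empty: probe for bandwidth
            self.rate_mbps += self.delta
        elif rtt_us > self.t_high:                # latency too high: back off hard
            self.rate_mbps *= 1 - self.beta * (1 - self.t_high / rtt_us)
        elif gradient <= 0:                       # queues draining: additive increase
            self.rate_mbps += self.delta
        else:                                     # queues building: gradient-scaled decrease
            self.rate_mbps *= 1 - self.beta * gradient
        return self.rate_mbps
```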
internet measurement conference | 2011
Nandita Dukkipati; Matt Mathis; Yuchung Cheng; Monia Ghobadi
Packet losses increase latency for Web users. Fast recovery is a key mechanism for TCP to recover from packet losses. In this paper, we explore some of the weaknesses of the standard algorithm described in RFC 3517 and of the non-standard algorithms implemented in Linux. We find that these algorithms deviate from their intended behavior in the real world due to the combined effect of short flows, application stalls, burst losses, acknowledgment (ACK) loss and reordering, and stretch ACKs. Linux suffers from excessive congestion window reductions, while RFC 3517 transmits large bursts under high losses; both harm the rest of the flow and increase Web latency. Our primary contribution is a new design to control transmission in fast recovery called proportional rate reduction (PRR). PRR recovers from losses quickly, smoothly, and accurately by pacing out retransmissions across received ACKs. In addition to PRR, we evaluate the TCP early retransmit (ER) algorithm, which lowers the duplicate acknowledgment threshold for short transfers, and show that delaying early retransmissions for a short interval is effective in avoiding spurious retransmissions in the presence of a small degree of reordering. PRR and ER reduce the TCP latency of connections experiencing losses by 3-10%, depending on the response size. Based on our instrumentation of Google Web and YouTube servers in the U.S. and India, we also present key statistics on the nature of TCP retransmissions.
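The core of PRR is a per-ACK send quota that spreads the window reduction across the ACKs received during recovery. The sketch below paraphrases the algorithm later standardized in RFC 6937 (variable names follow that specification); it is an illustration, not the Linux implementation.

```python
# Sketch of the Proportional Rate Reduction send-quota computation,
# following the structure of RFC 6937 (PRR with the slow-start reduction
# bound). Illustrative only; byte accounting details are simplified.
from math import ceil

def prr_sndcnt(prr_delivered, prr_out, delivered_now, pipe,
               ssthresh, recover_fs, mss):
    """Bytes that may be sent on this ACK while in fast recovery."""
    prr_delivered += delivered_now
    if pipe > ssthresh:
        # Proportional part: pace the reduction across received ACKs,
        # sending ssthresh/RecoverFS bytes per byte newly delivered.
        sndcnt = ceil(prr_delivered * ssthresh / recover_fs) - prr_out
    else:
        # Reduction bound: grow back toward ssthresh no faster than slow start.
        limit = max(prr_delivered - prr_out, delivered_now) + mss
        sndcnt = min(ssthresh - pipe, limit)
    return max(sndcnt, 0), prr_delivered
```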
acm special interest group on data communication | 2017
Ahmed Saeed; Nandita Dukkipati; Vytautas Valancius; Carlo Contavalli; Amin Vahdat
Traffic shaping, including pacing and rate limiting, is fundamental to the correct and efficient operation of both datacenter and wide area networks. Sample use cases include policy-based bandwidth allocation to flow aggregates, rate-based congestion control algorithms, and packet pacing to avoid bursty transmissions that can overwhelm router buffers. Driven by the need to scale to millions of flows and to apply complex policies, traffic shaping is moving from network switches into the end hosts, typically implemented in software in the kernel networking stack. In this paper, we show that the performance overhead of end-host traffic shaping is substantial and limits overall system scalability as we move to thousands of individual traffic classes per server. Measurements from production servers show that shaping at hosts consumes considerable CPU and memory, unnecessarily drops packets, suffers from head-of-line blocking and inaccuracy, and does not provide backpressure up the stack. We present Carousel, a framework that scales to tens of thousands of policies and flows per server, built from the synthesis of three key ideas: (i) a single-queue shaper using time as the basis for releasing packets, (ii) fine-grained, just-in-time freeing of resources in higher layers coupled to actual packet departures, and (iii) one shaper per CPU core, with lock-free coordination. Our production experience in serving video traffic at a Cloud service provider shows that Carousel shapes traffic accurately while improving overall machine CPU utilization by 8% (an improvement of 20% in the CPU utilization attributed to networking) relative to state-of-the-art deployments. It also conforms 10 times more accurately to target rates and consumes two orders of magnitude less memory than existing approaches.
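A toy sketch of the first idea, a single queue that releases packets by timestamp: every packet is stamped with an earliest departure time derived from its flow's rate and held in one time-indexed queue. Carousel uses an O(1) Timing Wheel of time slots; a heap is used here only to keep the sketch short, and the rates, units, and flow table are assumptions.

```python
# Toy single-queue, timestamp-based shaper in the spirit of Carousel.
# Illustrative only: a real deployment runs one shaper per CPU core with
# lock-free hand-off and backpressure to higher layers.
import heapq

class TimestampShaper:
    def __init__(self):
        self.queue = []                  # (departure_time_us, seq, packet)
        self.next_departure = {}         # flow id -> next allowed send time
        self.seq = 0                     # tie-breaker for equal timestamps

    def enqueue(self, now_us, flow, packet, rate_bps):
        # Space consecutive packets of a flow by their serialization time
        # at the flow's configured rate (pacing / rate limiting).
        gap_us = len(packet) * 8 * 1_000_000 / rate_bps
        depart = max(now_us, self.next_departure.get(flow, 0.0))
        self.next_departure[flow] = depart + gap_us
        heapq.heappush(self.queue, (depart, self.seq, packet))
        self.seq += 1

    def release(self, now_us):
        """Return all packets whose departure time has arrived."""
        ready = []
        while self.queue and self.queue[0][0] <= now_us:
            ready.append(heapq.heappop(self.queue)[2])
        return ready
```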
Computer Communication Review | 2018
Nandita Dukkipati; Yuchung Cheng; Amin Vahdat
Many algorithms proposed in networking research are widely used in practice, in areas including congestion control, routing, traffic engineering, and load balancing. In this paper, we present algorithmic advancements that have impacted the practice of congestion control (CC) in datacenters and on the Internet. Where possible, we also describe negative examples: ideas that looked promising on paper or in simulations but performed poorly in practice. We conclude with observations on the characteristics these ideas shared in moving from research to impacting practice.
RFC | 2013
Jerry Chu; Nandita Dukkipati; Yuchung Cheng; Matt Mathis
Archive | 2013
Yuchung Cheng; Neal Cardwell; Nandita Dukkipati; Matt Mathis
usenix annual technical conference | 2013
Neal Cardwell; Yuchung Cheng; Lawrence Brakmo; Matt Mathis; Barath Raghavan; Nandita Dukkipati; Hsiao-keng Jerry Chu; Andreas Terzis; Tom Herbert
Archive | 2014
Nandita Dukkipati; Yuchung Cheng; Barath Raghavan