Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Neal Cardwell is active.

Publication


Featured research published by Neal Cardwell.


Symposium on Operating Systems Principles | 1999

On the scale and performance of cooperative Web proxy caching

Alec Wolman; Geoffrey M. Voelker; Nitin Sharma; Neal Cardwell; Anna R. Karlin; Henry M. Levy

While algorithms for cooperative proxy caching have been widely studied, little is understood about cooperative-caching performance in the large-scale World Wide Web environment. This paper uses both trace-based analysis and analytic modelling to show the potential advantages and drawbacks of inter-proxy cooperation. With our traces, we evaluate quantitatively the performance-improvement potential of cooperation between 200 small-organization proxies within a university environment, and between two large-organization proxies handling 23,000 and 60,000 clients, respectively. With our model, we extend beyond these populations to project cooperative caching behavior in regions with millions of clients. Overall, we demonstrate that cooperative caching has performance benefits only within limited population bounds. We also use our model to examine the implications of future trends in Web-access behavior and traffic.
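
As a minimal illustration of the mechanism being evaluated, the sketch below (hypothetical names; the paper measures the idea with traces and an analytic model rather than prescribing code) shows the lookup path cooperative caching implies: local cache, then peer proxies, then the origin server. A peer hit only helps when another client population has already fetched the same cacheable document, which is why the benefits are bounded.

    # Hypothetical sketch of a cooperative proxy lookup path: serve from
    # the local cache, then ask peer proxies, then fall back to the origin.
    def fetch(url, local_cache, peer_caches, origin):
        if url in local_cache:                 # local hit
            return local_cache[url]
        for peer in peer_caches:               # cooperative (peer) hit
            if url in peer:
                doc = peer[url]
                local_cache[url] = doc
                return doc
        doc = origin(url)                      # miss: go to the origin server
        local_cache[url] = doc
        return doc

    local, peer = {}, {"http://example.com/a": "doc-a"}
    print(fetch("http://example.com/a", local, [peer], origin=lambda u: "fresh"))  # peer hit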


International Conference on Computer Communications | 2000

Modeling TCP latency

Neal Cardwell; Stefan Savage; Thomas E. Anderson

Several analytic models describe the steady-state throughput of bulk transfer TCP flows as a function of round trip time and packet loss rate. These models describe flows based on the assumption that they are long enough to sustain many packet losses. However, most TCP transfers across today's Internet are short enough to see few, if any, losses, and consequently their performance is dominated by startup effects such as connection establishment and slow start. This paper extends the steady-state model proposed in Padhye et al. (1998) in order to capture these startup effects. The extended model characterizes the expected value and distribution of TCP connection establishment and data transfer latency as a function of transfer size, round trip time, and packet loss rate. Using simulations, controlled measurements of TCP transfers, and live Web measurements, we show that, unlike earlier steady-state models for TCP performance, our extended model describes connection establishment and data transfer latency under a range of packet loss conditions, including no loss.
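
The paper's full model covers lossy transfers; as a rough illustration of why startup effects dominate short flows, here is a no-loss simplification (mine, not the paper's exact formula): slow start roughly doubles the window each round trip, so a d-segment transfer needs about log2(d/w1 + 1) rounds after connection establishment.

    import math

    def no_loss_latency(d_segments, rtt, initial_window=2):
        """Rough short-flow latency: 1 RTT for connection setup (SYN/SYN-ACK)
        plus slow-start rounds in which the window doubles, so after k rounds
        about w1*(2**k - 1) segments have been sent. A simplification of the
        no-loss case of the paper's model."""
        if d_segments <= 0:
            return rtt  # connection setup only
        rounds = math.ceil(math.log2(d_segments / initial_window + 1))
        return rtt * (1 + rounds)

    # A 10 KB page (7 segments) over a 100 ms path:
    print(no_loss_latency(7, 0.1))  # 0.4 s: round trips, not bandwidth, dominate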


International Symposium on Microarchitecture | 1997

A case for intelligent RAM

David A. Patterson; Thomas E. Anderson; Neal Cardwell; Richard Fromm; Kimberly Keeton; Christoforos E. Kozyrakis; Randi Thomas; Katherine A. Yelick

Two trends call into question the current practice of fabricating microprocessors and DRAMs as different chips on different fabrication lines: the gap between processor and DRAM speed is growing at 50% per year, and the size and organization of memory on a single DRAM chip is becoming awkward to use, yet size is growing at 60% per year. Intelligent RAM, or IRAM, merges processing and memory into a single chip to lower memory latency, increase memory bandwidth, and improve energy efficiency. It also allows more flexible selection of memory size and organization, and promises savings in board area. This article reviews the state of microprocessors and DRAMs today, explores some of the opportunities and challenges for IRAMs, and finally estimates the performance and energy efficiency of three IRAM designs.


International Symposium on Microarchitecture | 1999

Detour: informed Internet routing and transport

Stefan Savage; Thomas E. Anderson; Amit Aggarwal; David Becker; Neal Cardwell; Andy Collins; Eric Hoffman; John Snell; Amin Vahdat; Geoffrey M. Voelker; John Zahorjan

Despite its obvious success, the Internet suffers from end-to-end performance and availability problems. We believe that intelligent routers at key access and interchange points could improve Internet behavior by actively managing traffic. We describe the inefficiencies in routing and transport protocols in the modern Internet. We are constructing Detour, a prototype virtual Internet in which routers tunnel packets over the commodity Internet instead of using dedicated links.
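
In the same spirit, a hypothetical sketch (not Detour's actual code) of the informed routing decision: use the direct Internet path unless measurements show that tunneling through another overlay node is faster.

    # Hypothetical sketch of informed overlay routing in the Detour spirit:
    # prefer the direct path unless measured latencies say that relaying
    # through another overlay node is better.
    def pick_path(direct_latency, relay_latencies):
        """relay_latencies: {relay_name: (latency_to_relay, relay_to_dest)}"""
        best_name, best_cost = "direct", direct_latency
        for name, (to_relay, to_dest) in relay_latencies.items():
            cost = to_relay + to_dest
            if cost < best_cost:
                best_name, best_cost = name, cost
        return best_name, best_cost

    print(pick_path(0.180, {"seattle": (0.020, 0.090)}))  # ('seattle', 0.11)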


ACM Special Interest Group on Data Communication | 1999

TCP congestion control with a misbehaving receiver

Stefan Savage; Neal Cardwell; David Wetherall; Thomas E. Anderson

In this paper, we explore the operation of TCP congestion control when the receiver can misbehave, as might occur with a greedy Web client. We first demonstrate that there are simple attacks that allow a misbehaving receiver to drive a standard TCP sender arbitrarily fast, without losing end-to-end reliability. These attacks are widely applicable because they stem from the sender behavior specified in RFC 2581 rather than implementation bugs. We then show that it is possible to modify TCP to eliminate this undesirable behavior entirely, without requiring assumptions of any kind about receiver behavior. This is a strong result: with our solution a receiver can only reduce the data transfer rate by misbehaving, thereby eliminating the incentive to do so.
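
The best-known of these attacks is ACK division: a receiver splits the acknowledgment of one segment into many small ACKs, and a sender that grows its congestion window once per ACK (as RFC 2581 specifies) inflates far faster than one that credits only the bytes actually acknowledged. A toy comparison (my sketch, not the paper's code):

    # Toy model of ACK division: splitting one segment's acknowledgment into
    # many tiny ACKs inflates a per-ACK-counting sender's congestion window
    # far faster than a byte-counting one.
    MSS = 1460

    def cwnd_after(acks, bytes_per_ack, byte_counting):
        cwnd = 1 * MSS
        for _ in range(acks):
            if byte_counting:
                cwnd += bytes_per_ack        # credit only bytes actually acked
            else:
                cwnd += MSS                  # RFC 2581 style: +1 MSS per ACK
        return cwnd

    # Receiver splits one segment's ACK into 100 tiny ACKs:
    print(cwnd_after(100, MSS // 100, byte_counting=False))  # 147460: ~147 KB window
    print(cwnd_after(100, MSS // 100, byte_counting=True))   # 2860: ~2.9 KB window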


IEEE Computer | 1997

Scalable processors in the billion-transistor era: IRAM

Christoforos E. Kozyrakis; Stylianos Perissakis; David A. Patterson; Thomas E. Anderson; Krste Asanovic; Neal Cardwell; Richard Fromm; Jason Golbus; Benjamin Gribstad; Kimberly Keeton; Randi Thomas; Noah Treuhaft; Katherine A. Yelick

Researchers at the University of California, Berkeley, argue that the memory system will be the greatest inhibitor of performance gains in future architectures. Thus, they propose the intelligent RAM, or IRAM. This approach greatly increases the on-chip memory capacity by using DRAM technology instead of much less dense SRAM memory cells. The resultant on-chip memory capacity, coupled with the high bandwidths available on chip, should allow cost-effective vector processors to reach performance levels much higher than those of traditional architectures. Although vector processors require explicit compilation, the authors claim that vector compilation technology is mature (having been used for decades in supercomputers), and furthermore, that future workloads will contain more heavily vectorizable components.


ACM Special Interest Group on Data Communication | 2013

Reducing web latency: the virtue of gentle aggression

Tobias Flach; Nandita Dukkipati; Andreas Terzis; Barath Raghavan; Neal Cardwell; Yuchung Cheng; Ankur Jain; Shuai Hao; Ethan Katz-Bassett; Ramesh Govindan

To serve users quickly, Web service providers build infrastructure closer to clients and use multi-stage transport connections. Although these changes reduce client-perceived round-trip times, TCP's current mechanisms fundamentally limit latency improvements. We performed a measurement study of a large Web service provider and found that, while connections with no loss complete close to the ideal latency of one round-trip time, TCP's timeout-driven recovery causes transfers with loss to take five times longer on average. In this paper, we present the design of novel loss recovery mechanisms for TCP that judiciously use redundant transmissions to minimize timeout-driven recovery. Proactive, Reactive, and Corrective are three qualitatively different, easily deployable mechanisms that (1) proactively recover from losses, (2) recover from them as quickly as possible, and (3) reconstruct packets to mask loss. Crucially, the mechanisms are compatible both with middleboxes and with TCP's existing congestion control and loss recovery. Our large-scale experiments on Google's production network, which serves billions of flows, demonstrate a 23% decrease in mean latency and a 47% decrease in 99th-percentile latency over today's TCP.
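
Of the three, Reactive is the simplest to sketch; the idea later shipped in Linux as TCP Tail Loss Probe. If the tail of a transfer sits unacknowledged for about two RTTs, the sender retransmits the last segment so that SACK feedback can trigger fast recovery instead of waiting out the much longer retransmission timeout. A simplified sketch under those assumptions:

    # Simplified sketch of a Reactive-style tail loss probe: if the tail of a
    # transfer is unacknowledged for ~2 RTTs, resend the last segment so a
    # SACK/dupACK can trigger fast recovery, instead of waiting for the RTO.
    def next_timer(unacked, srtt, rto):
        if unacked:
            pto = 2 * srtt                       # probe timeout heuristic
            return min(pto, rto)
        return rto

    def on_timer_fired(unacked, fired_probe):
        if fired_probe and unacked:
            return ("retransmit", unacked[-1])   # probe with the last segment
        return ("rto_recovery", None)

    print(next_timer(["seg9", "seg10"], srtt=0.05, rto=1.0))  # 0.1 s, not 1 s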


International Symposium on Computer Architecture | 1997

The energy efficiency of IRAM architectures

Richard Fromm; Stylianos Perissakis; Neal Cardwell; Christoforos E. Kozyrakis; Bruce W. McGaughy; David A. Patterson; Thomas E. Anderson; Katherine A. Yelick

Portable systems demand energy efficiency in order to maximize battery life. IRAM architectures, which combine DRAM and a processor on the same chip in a DRAM process, are more energy efficient than conventional systems. The high density of DRAM permits a much larger amount of memory on-chip than a traditional SRAM cache design in a logic process. This allows most or all IRAM memory accesses to be satisfied on-chip. Thus there is much less need to drive high-capacitance off-chip buses, which contribute significantly to the energy consumption of a system. To quantify this advantage we apply models of energy consumption in DRAM and SRAM memories to results from cache simulations of applications reflective of personal productivity tasks on low power systems. We find that IRAM memory hierarchies consume as little as 22% of the energy consumed by a conventional memory hierarchy for memory-intensive applications, while delivering comparable performance. Furthermore, the energy consumed by a system consisting of an IRAM memory hierarchy combined with an energy efficient CPU core is as little as 40% of that of the same CPU core with a traditional memory hierarchy.
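
The accounting behind such estimates is straightforward: weight on-chip and off-chip accesses by their per-access energy costs. A back-of-the-envelope sketch with illustrative placeholder numbers (not the paper's measured values):

    # Back-of-the-envelope memory-hierarchy energy model in the spirit of the
    # paper's analysis. The per-access energies are illustrative placeholders,
    # NOT the paper's measured values.
    E_ONCHIP  = 1.0    # relative energy per on-chip access
    E_OFFCHIP = 20.0   # relative energy per off-chip access (bus + DRAM)

    def total_energy(accesses, onchip_hit_rate):
        hits = accesses * onchip_hit_rate
        misses = accesses - hits
        return hits * E_ONCHIP + misses * E_OFFCHIP

    conventional = total_energy(1_000_000, onchip_hit_rate=0.90)   # small SRAM cache
    iram         = total_energy(1_000_000, onchip_hit_rate=0.999)  # large on-chip DRAM
    print(iram / conventional)  # ~0.35: on-chip capacity slashes off-chip energy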


ACM Queue | 2016

BBR: congestion-based congestion control

Neal Cardwell; Yuchung Cheng; C. Stephen Gunn; Soheil Hassas Yeganeh; Van Jacobson

When bottleneck buffers are large, loss-based congestion control keeps them full, causing bufferbloat. When bottleneck buffers are small, loss-based congestion control misinterprets loss as a signal of congestion, leading to low throughput. Fixing these problems requires an alternative to loss-based congestion control. Finding this alternative requires an understanding of where and how network congestion originates.
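
The paper's answer is to estimate two quantities directly: the bottleneck bandwidth (a windowed maximum of the measured delivery rate) and the round-trip propagation delay (a windowed minimum of the measured RTT), then pace sends at their product, the bandwidth-delay product. A stripped-down sketch of that bookkeeping (a simplification, not the production code):

    # Stripped-down sketch of BBR's two core estimators: BtlBw is a windowed
    # max of measured delivery rate, RTprop is a windowed min of measured RTT;
    # pacing rate and cwnd follow from their product (the BDP). A
    # simplification of the model described in the paper, not the real code.
    from collections import deque

    class BBRLite:
        def __init__(self):
            self.bw_samples = deque(maxlen=10)    # ~10 round trips of rates
            self.rtt_samples = deque(maxlen=100)  # several seconds of RTTs

        def on_ack(self, delivered_bytes, interval_s, rtt_s):
            self.bw_samples.append(delivered_bytes / interval_s)
            self.rtt_samples.append(rtt_s)

        def btl_bw(self):
            return max(self.bw_samples)           # bottleneck bandwidth estimate

        def rt_prop(self):
            return min(self.rtt_samples)          # propagation delay estimate

        def pacing_rate(self, gain=1.0):
            return gain * self.btl_bw()           # bytes/second

        def cwnd(self, gain=2.0):
            return gain * self.btl_bw() * self.rt_prop()  # gain * BDP in bytes

    b = BBRLite()
    b.on_ack(delivered_bytes=14600, interval_s=0.01, rtt_s=0.05)
    print(b.pacing_rate(), b.cwnd())  # 1460000.0 B/s, 146000.0 B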


Workshop on Hot Topics in Operating Systems | 1999

The case for informed transport protocols

Stefan Savage; Neal Cardwell; Thomas E. Anderson

Wide-area distributed applications are frequently limited by the performance of Internet data transfers. We argue that the principal cause of this effect is the poor interaction between host-centric congestion control algorithms and the realities of today's Internet traffic and infrastructure. In particular, when the duration of a network flow is short, using end-to-end feedback to determine network conditions is extremely inefficient. We propose an incremental approach to the problem, in which congestion information is shared among many co-located hosts and transport protocols make informed congestion control decisions. We argue that the resulting system can potentially improve the performance experienced by each network user as well as the overall efficiency of the network.
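
A hypothetical sketch of the sharing idea (names and structure are mine, not the paper's): co-located hosts pool per-destination congestion estimates so that a new short flow starts from an informed state instead of probing from scratch.

    # Hypothetical sketch of shared congestion state among co-located hosts:
    # new connections to a destination start from the pooled estimate instead
    # of blindly probing with slow start.
    shared_state = {}   # destination network -> {"cwnd": bytes, "srtt": seconds}

    def connection_start(dest, default_cwnd=2 * 1460):
        est = shared_state.get(dest)
        return dict(est) if est else {"cwnd": default_cwnd, "srtt": None}

    def connection_update(dest, cwnd, srtt):
        shared_state[dest] = {"cwnd": cwnd, "srtt": srtt}

    connection_update("10.1.2.0/24", cwnd=64_000, srtt=0.04)
    print(connection_start("10.1.2.0/24"))   # starts informed, not at 2 MSS
    print(connection_start("192.0.2.0/24"))  # no history: conservative default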

Collaboration


Dive into Neal Cardwell's collaboration.

Top Co-Authors

Katherine A. Yelick (Lawrence Berkeley National Laboratory)
Richard Fromm (University of California)
Stefan Savage (University of California)
Randi Thomas (University of California)