Sundar Iyer | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Sundar Iyer is active.

Explore More

Publication

Featured researches published by Sundar Iyer.

international conference on computer communications | 2005

Practical algorithms for performance guarantees in buffered crossbars

Shang-Tse Chuang; Sundar Iyer; Nick McKeown

This paper is about high capacity switches and routers that give guaranteed throughput, rate and delay guarantees. Many routers are built using input queueing or combined input and output queueing (CIOQ), using crossbar switching fabrics. But such routers require impractically complex scheduling algorithms to provide the desired guarantees. We explore how a buffered crossbar-a crossbar switch with a packet buffer at each crosspoint-can provide guaranteed performance (throughput, rate, and delay), with less complex, practical scheduling algorithms. We describe scheduling algorithms that operate in parallel on each input and output port, and hence are scalable. With these algorithms, buffered crossbars with a speedup of two can provide 100% throughput, rate, and delay guarantees.

international conference on computer communications | 2003

An approach to alleviate link overload as observed on an IP backbone

Sundar Iyer; Supratik Bhattacharyya; Nina Taft; Christophe Diot

Shortest path routing protocols may suffer from congestion due to the use of a single shortest path between a source and a destination. The goal of our work is to first understand how links become overloaded in an IP backbone, and then to explore if the routing protocol, -either in its existing form, or in some enhanced form could be made to respond immediately to overload and reduce the likelihood of its occurrence. Our method is to use extensive measurements of Sprints backbone network, measuring 138 links between September 2000 and June 2001. We find that since the backbone is designed to be overprovisioned, link overload is rare, and when it occurs, 80% of the time it is caused due to link failures. Furthermore, we find that when a link is overloaded, few (if any) other links in the network are also overloaded. This suggests that deflecting packets to less utilized alternate paths could be an effective method for tackling overload. We analytically derive the condition that a network, which has multiple equal length shortest paths between every pair of nodes (as is common in the highly meshed backbone networks) can provide for loop-free deflection paths if all the link weights are within a ratio 1 + 1/(d- I) of each other; where d is the diameter of the network. Based on our measurements, the nature of the backbone topology and the careful use of link weights, we propose a deflection routing algorithm to tackle link overload where each node makes local decisions. Simulations suggest that this can be a simple and efficient way to overcome link overload, without requiring any changes to the routing protocol.

IEEE ACM Transactions on Networking | 2003

Analysis of the parallel packet switch architecture

Sundar Iyer; Nick McKeown

Our work is motivated by the desire to design packet switches with large aggregate capacity and fast line rates. In this paper, we consider building a packet switch from multiple lower speed packet switches operating independently and in parallel. In particular, we consider a (perhaps obvious) parallel packet switch (PPS) architecture in which arriving traffic is demultiplexed overk identical lower speed packet switches, switched to the correct output port, then recombined (multiplexed) before departing from the system. Essentially, the packet switch performs packet-by-packet load balancing, or inverse multiplexing, over multiple independent packet switches. Each lower speed packet switch operates at a fraction of the line rate R. For example, each packet switch can operate at rateR/k. It is a goal of our work that all memory buffers in the PPS run slower than the line rate. Ideally,a PPS would share the benefits of an output-queued switch, i.e., the delay of individual packets could be precisely controlled, allowing the provision of guaranteed qualities of service.In this paper, we ask the question: Is it possible for a PPS to precisely emulate the behavior of an output-queued packet switch with the same capacity and with the same number of ports? We show that it is theoretically possible for a PPS to emulate a first-come first-served (FCFS) output-queued (OQ) packet switch if each lower speed packet switch operates at a rate of approximately 2R/k. We further show that it is theoretically possible for a PPS to emulate a wide variety of quality-of-service queueing disciplines if each lower speed packet switch operates at a rate of approximately 3R/k. It turns out that these results are impractical because of high communication complexity, but a practical high-performance PPS can be designed if we slightly relax our original goal and allow a small fixed-size coordination buffer running at the line rate in both the demultiplexer and the multiplexer. We determine the size of this buffer and show that it can eliminate the need for a centralized scheduling algorithm, allowing a full distributed implementation with low computational and communication complexity. Furthermore, we show that if the lower speed packet switch operates at a rate ofR/k (i.e., without speedup), the resulting PPS can emulate an FCFS-OQ switch within a delay bound.

international conference on computer communications | 2001

Making parallel packet switches practical

Sundar Iyer; Nick McKeown

A parallel packet switch (PPS) is a switch in which the memories run slower than the line rate. Arriving packets are spread (or load-balanced) packet-by-packet over multiple slower-speed packet switches. It is already known that with a speedup of S/spl ges/2, a PPS can theoretically mimic a FCFS output-queued (OQ) switch. However, the theory relies on a centralized packet scheduling algorithm that is essentially impractical because of high communication complexity. In this paper, we attempt to make a high performance PPS practical by introducing two results. First, we show that small co-ordination buffers can eliminate the need for a centralized packet scheduling algorithm, allowing a full distributed implementation with low computational and communication complexity. Second, we show that without speedup, the resulting PPS can mimic an FCFS OQ switch within a delay bound.

acm special interest group on data communication | 2002

Routers with a single stage of buffering

Sundar Iyer; Rui Zhang; Nick McKeown

Most high performance routers today use combined input and output queueing (CIOQ). The CIOQ router is also frequently used as an abstract model for routers: at one extreme is input queueing, at the other extreme is output queueing, and in-between there is a continuum of performance as the speedup is increased from 1 to N (where N is the number of linecards). The model includes architectures in which a switch fabric is sandwiched between two stages of buffering. There is a rich and growing theory for CIOQ routers, including algorithms, throughput results and conditions under which delays can be guaranteed. But there is a broad class of architectures that are not captured by the CIOQ model, including routers with centralized shared memory, and load-balanced routers. In this paper we propose an abstract model called Single-Buffered (SB) routers that includes these architectures. We describe a method called Constraint Sets to analyze a number of SB router architectures. The model helped identify previously unstudied architectures, in particular the Distributed Shared Memory router. Although commercially deployed, its performance is not widely known. We find conditions under which it can emulate an ideal shared memory router, and believe it to be a promising architecture. Questions remain about its complexity, but we find that the memory bandwidth, and potentially the power consumption of the router is lower than for a CIOQ router.

international symposium on microarchitecture | 2002

Maintaining statistics counters in router line cards

Devavrat Shah; Sundar Iyer; B. Prahhakar; Nick McKeown

A network device stores and updates statistics counters. Using an optimal counter management algorithm minimizes required SRAM size and ensures correct line-rate operation for many counters. We use a well known architecture for storing and updating statistics counters. This approach maintains smaller-size counters in fast (potentially on-chip) SRAM, while maintaining full-size counters in a large, slower DRAM. Our goal is to ensure that the system always correctly maintains counter values at line rate. An optimal counter management algorithm (CMA) minimizes the required SRAM size while ensuring correct line-rate operation for a large number of counters.

IEEE Network | 2001

ClassiPl: an architecture for fast and flexible packet classification

Sundar Iyer; Ramana Rao Kompella; Ajit Shelat

Packet classification is a fundamental and critical operation to be performed in networking equipment such as switches and routers. The types of classification to be performed encompass a wide range, from well-understood operations such as route table lookups to complex packet identification involving multiple fields in the packet. Furthermore, the advent of application-aware network devices demand the use of flexible packet classifiers that can handle operations such as pattern searches and regular expression matching. Traditionally, depending on the classification types to be implemented, solutions based on CAMs/TCAMs, special function ASICs, and software-based algorithms have been used. These solutions, while adequate, target specific classification applications. This article describes ClassiPl/sup TM/, a programmable hardware architecture that performs packet classification at OC48c line rates. Illustrations and examples show how this architecture can be used in conjunction with software algorithmic techniques to support the classification needs of a variety of applications from packet switching, forwarding, and filtering to layer 7 applications such as server load balancing. We believe that the ClassiPl architecture is scalable and flexible to meet the needs of a broad class of network applications.

high performance interconnects | 2001

Analysis of a statistics counter architecture

Devavrat Shah; Sundar Iyer; Balaji Prabhakar; Nick McKeown

Packet switches (e.g., IP routers, ATM switches and Ethernet switches) maintain statistics for a variety of reasons: performance monitoring, network management, security, network tracing, and traffic engineering. The statistics are usually collected by counters which might, for example, count the number of arrivals of a specific type of packet, or count particular events, such as when a packet is dropped. The arrival of a packet may lead to several different statistics counters being updated. The number of statistics counters and the rate at which they are updated is often limited by memory technology. A small number of counters may be held in on-chip registers or in (on- or off-chip) SRAM. But often, the number of counters is very large, and hence they need to be stored in off-chip DRAM. However, the large random access times of DRAMs make it difficult to support high bandwidth links. The time taken to read, update and write a single counter would be too large, and worse still multiple counters may need to be updated for each arriving packet. In this paper we consider a specific architecture for storing and updating statistics counters. Smaller sized counters are maintained in fast (potentially on-chip) SRAM, while a large, slower DRAM maintains the full-sized counters. The problem is to ensure that the counter values are always correctly maintained at line-rate. We describe and analyze an optimal counter management algorithm (LCF-CMA), which minimizes the size of the SRAM required while ensuring correct line-rate operation of a large number of counters.

IEEE Communications Letters | 2001

On the speedup required for a multicast parallel packet switch

Sundar Iyer; Nick McKeown

A parallel packet switch (PPS) is a switch in which the memories run slower than the line rate. Arriving packets are load-balanced packet-by-packet over multiple lower speed center stage packet switches. It is known that, for unicast traffic, a PPS can precisely emulate a FCFS output-queued (OQ) switch with a speedup of two and an OQ switch with delay guarantees with a speedup of three. In this paper we ask: is it possible for a PPS to emulate the behavior of an OQ multicast switch? The main result is that for multicast traffic an N-port PPS can precisely emulate a FIFO OQ switch with a speedup of S>2/spl radic/N+1, and a switch that provides delay guarantees with a speedup of S>2/spl radic/(2N)+2.

IEEE Communications Letters | 2003

Using constraint sets to achieve delay bounds in CIOQ switches

Sundar Iyer; Nick McKeown

We previously proposed constraint sets as a simple technique to analyze routers with a single stage of buffering. We extend the technique to analyze combined input and output (CIOQ) routers with two stages of buffering.

Explore More