Amin Vahdat
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Amin Vahdat.
acm special interest group on data communication | 2008
Mohammad Al-Fares; Alexander Loukissas; Amin Vahdat
Todays data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Non-uniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from todays higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.
acm special interest group on data communication | 2013
Sushant Jain; Alok Kumar; Subhasree Mandal; Joon Ong; Leon Poutievski; Arjun Singh; Subbaiah Venkata; Jim Wanderer; Junlan Zhou; Min Zhu; Jonathan Zolla; Urs Hölzle; Stephen Stuart; Amin Vahdat
We present the design, implementation, and evaluation of B4, a private WAN connecting Googles data centers across the planet. B4 has a number of unique characteristics: i) massive bandwidth requirements deployed to a modest number of sites, ii) elastic traffic demand that seeks to maximize average bandwidth, and iii) full control over the edge servers and network, which enables rate limiting and demand measurement at the edge. These characteristics led to a Software Defined Networking architecture using OpenFlow to control relatively simple switches built from merchant silicon. B4s centralized traffic engineering service drives links to near 100% utilization, while splitting application flows among multiple paths to balance capacity against application priority/demands. We describe experience with three years of B4 production deployment, lessons learned, and areas for future work.
symposium on operating systems principles | 2003
Dejan Kostic; Adolfo Rodriguez; Jeannie R. Albrecht; Amin Vahdat
In recent years, overlay networks have become an effective alternative to IP multicast for efficient point to multipoint communication across the Internet. Typically, nodes self-organize with the goal of forming an efficient overlay tree, one that meets performance targets without placing undue burden on the underlying network. In this paper, we target high-bandwidth data distribution from a single source to a large number of receivers. Applications include large-file transfers and real-time multimedia streaming. For these applications, we argue that an overlay mesh, rather than a tree, can deliver fundamentally higher bandwidth and reliability relative to typical tree structures. This paper presents Bullet, a scalable and distributed algorithm that enables nodes spread across the Internet to self-organize into a high bandwidth overlay mesh. We construct Bullet around the insight that data should be distributed in a disjoint manner to strategic points in the network. Individual Bullet receivers are then responsible for locating and retrieving the data from multiple points in parallel.Key contributions of this work include: i) an algorithm that sends data to different points in the overlay such that any data object is equally likely to appear at any node, ii) a scalable and decentralized algorithm that allows nodes to locate and recover missing data items, and iii) a complete implementation and evaluation of Bullet running across the Internet and in a large-scale emulation environment reveals up to a factor two bandwidth improvements under a variety of circumstances. In addition, we find that, relative to tree-based solutions, Bullet reduces the need to perform expensive bandwidth probing. In a tree, it is critical that a nodes parent delivers a high rate of application data to each child. In Bullet however, nodes simultaneously receive data from multiple sources in parallel, making it less important to locate any single source capable of sustaining a high transmission rate.
acm ifip usenix international conference on middleware | 2003
Patrick Reynolds; Amin Vahdat
The recent file storage applications built on top of peer-to-peer distributed hash tables lack search capabilities. We believe that search is an important part of any document publication system. To that end, we have designed and analyzed a distributed search engine based on a distributed hash table. Our simulation results predict that our search engine can answer an average query in under one second, using under one kilobyte of bandwidth.
acm special interest group on data communication | 2014
Pat Bosshart; Daniel P. Daly; Glen Gibb; Martin J. Izzard; Nick McKeown; Jennifer Rexford; Cole Schlesinger; Dan Talayco; Amin Vahdat; George Varghese; David Walker
P4 is a high-level language for programming protocol-independent packet processors. P4 works in conjunction with SDN control protocols like OpenFlow. In its current form, OpenFlow explicitly specifies protocol headers on which it operates. This set has grown from 12 to 41 fields in a few years, increasing the complexity of the specification while still not providing the flexibility to add new headers. In this paper we propose P4 as a strawman proposal for how OpenFlow should evolve in the future. We have three goals: (1) Reconfigurability in the field: Programmers should be able to change the way switches process packets once they are deployed. (2) Protocol independence: Switches should not be tied to any specific network protocols. (3) Target independence: Programmers should be able to describe packet-processing functionality independently of the specifics of the underlying hardware. As an example, we describe how to use P4 to configure a switch to add a new hierarchical label.
architectural support for programming languages and operating systems | 2002
Heng Zeng; Carla Schlatter Ellis; Alvin R. Lebeck; Amin Vahdat
Energy consumption has recently been widely recognized as a major challenge of computer systems design. This paper explores how to support energy as a first-class operating system resource. Energy, because of its global system nature, presents challenges beyond those of conventional resource management. To meet these challenges we propose the Currentcy Model that unifies energy accounting over diverse hardware components and enables fair allocation of available energy among applications. Our particular goal is to extend battery lifetime by limiting the average discharge rate and to share this limited resource among competing task according to user preferences. To demonstrate how our framework supports explicit control over the battery resource we implemented ECOSystem, a modified Linux, that incorporates our currentcy model. Experimental results show that ECOSystem accurately accounts for the energy consumed by asynchronous device operation, can achieve a target battery lifetime, and proportionally shares the limited energy resource among competing tasks.
acm ifip usenix international conference on middleware | 2006
Diwaker Gupta; Ludmila Cherkasova; Robert C. Gardner; Amin Vahdat
Virtual machines (VMs) have recently emerged as the basis for allocating resources in enterprise settings and hosting centers. One benefit of VMs in these environments is the ability to multiplex several operating systems on hardware based on dynamically changing system characteristics. However, such multiplexing must often be done while observing per-VM performance guarantees or service level agreements. Thus, one important requirement in this environment is effective performance isolation among VMs. In this paper, we address performance isolation across virtual machines in Xen [1]. For instance, while Xen can allocate fixed shares of CPU among competing VMs, it does not currently account for work done on behalf of individual VMs in device drivers. Thus, the behavior of one VM can negatively impact resources available to other VMs even if appropriate per-VM resource limits are in place. In this paper, we present the design and evaluation of a set of primitives implemented in Xen to address this issue. First, XenMon accurately measures per-VM resource consumption, including work done on behalf of a particular VM in Xens driver domains. Next, our SEDF-DC scheduler accounts for aggregate VM resource consumption in allocating CPU. Finally, ShareGuard limits the total amount of resources consumed in privileged and driver domains based on administrator-specified limits. Our performance evaluation indicates that our mechanisms effectively enforce performance isolation for a variety of workloads and configurations.
Communications of The ACM | 2010
Diwaker Gupta; Sangmin Lee; Michael Vrable; Stefan Savage; Alex C. Snoeren; George Varghese; Geoffrey M. Voelker; Amin Vahdat
Virtual machine monitors (VMMs) are a popular platform for Internet hosting centers and cloud-based compute services. By multiplexing hardware resources among virtual machines (VMs) running commodity operating systems, VMMs decrease both the capital outlay and management overhead of hosting centers. Appropriate placement and migration policies can take advantage of statistical multiplexing to effectively utilize available processors. However, main memory is not amenable to such multiplexing and is often the primary bottleneck in achieving higher degrees of consolidation. Previous efforts have shown that content-based page sharing provides modest decreases in the memory footprint of VMs running similar operating systems and applications. Our studies show that significant additional gains can be had by leveraging both subpage level sharing (through page patching) and incore memory compression. We build Difference Engine, an extension to the Xen VMM, to support each of these---in addition to standard copy-on-write full-page sharing---and demonstrate substantial savings across VMs running disparate workloads (up to 65%). In head-to-head memory-savings comparisons, Difference Engine outperforms VMware ESX server by a factor 1.6--2.5 for heterogeneous workloads. In all cases, the performance overhead of Difference Engine is less than 7%.
measurement and modeling of computer systems | 2007
Ludmila Cherkasova; Diwaker Gupta; Amin Vahdat
The primary motivation for uptake of virtualization has been resource isolation, capacity management and resource customization allowing resource providers to consolidate their resources in virtual machines. Various approaches have been taken to integrate virtualization in to scientific Grids especially in the arena of High Performance Computing (HPC) to run grid jobs in virtual machines, thus enabling better provisioning of the underlying resources and customization of the execution environment on runtime. Despite the gains, virtualization layer also incur a performance penalty and its not very well understood that how such an overhead will impact the performance of systems where jobs are scheduled with tight deadlines. Since this overhead varies the types of workload whether they are memory intensive, CPU intensive or network I/O bound, and could lead to unpredictable deadline estimation for the running jobs in the system. In our study, we have attempted to tackle this problem by developing an intelligent scheduling technique for virtual machines which monitors the workload types and deadlines, and calculate the system over head in real time to maximize number of jobs finishing within their agreed deadlines.The primary motivation for enterprises to adopt virtualization technologies is to create a more agile and dynamic IT infrastructure -- with server consolidation, high resource utilization, the ability to quickly add and adjust capacity on demand -- while lowering total cost of ownership and responding more effectively to changing business conditions. However, effective management of virtualized IT environments introduces new and unique requirements, such as dynamically resizing and migrating virtual machines (VMs) in response to changing application demands. Such capacity management methods should work in conjunction with the underlying resource management mechanisms. In general, resource multiplexing and scheduling among virtual machines is poorly understood. CPU scheduling for virtual machines, for instance, has largely been borrowed from the process scheduling research in operating systems. However, it is not clear whether a straight-forward port of process schedulers to VM schedulers would perform just as well. We use the open source Xen virtual machine monitor to perform a comparative evaluation of three different CPU schedulers for virtual machines. We analyze the impact of the choice of scheduler and its parameters on application performance, and discuss challenges in estimating the application resource requirements in virtualized environments.
acm special interest group on data communication | 2006
Priya Mahadevan; Dmitri V. Krioukov; Kevin R. Fall; Amin Vahdat
Researchers have proposed a variety of metrics to measure important graph properties, for instance, in social, biological, and computer networks. Values for a particular graph metric may capture a graphs resilience to failure or its routing efficiency. Knowledge of appropriate metric values may influence the engineering of future topologies, repair strategies in the face of failure, and understanding of fundamental properties of existing networks. Unfortunately, there are typically no algorithms to generate graphs matching one or more proposed metrics and there is little understanding of the relationships among individual metrics or their applicability to different settings. We present a new, systematic approach for analyzing network topologies. We first introduce the dK-series of probability distributions specifying all degree correlations within d-sized subgraphs of a given graph G. Increasing values of d capture progressively more properties of G at the cost of more complex representation of the probability distribution. Using this series, we can quantitatively measure the distance between two graphs and construct random graphs that accurately reproduce virtually all metrics proposed in the literature. The nature of the dK-series implies that it will also capture any future metrics that may be proposed. Using our approach, we construct graphs for d=0, 1, 2, 3 and demonstrate that these graphs reproduce, with increasing accuracy, important properties of measured and modeled Internet topologies. We find that the d=2 case is sufficient for most practical purposes, while d=3 essentially reconstructs the Internet AS-and router-level topologies exactly. We hope that a systematic method to analyze and synthesize topologies offers a significant improvement to the set of tools available to network topology and protocol researchers.