Gary L. McAlpine | Researchain

Publication

Featured researches published by Gary L. McAlpine.

international symposium on microarchitecture | 2004

ETA: experience with an Intel Xeon processor as a packet processing engine

Greg J. Regnier; Dave B. Minturn; Gary L. McAlpine; Vikram A. Saletore; Annie P. Foong

Server-based networks have well-documented performance limitations. These limitations outline a major goal of Intel's embedded transport acceleration (ETA) project, the ability to deliver high-performance server communication and I/O over standard Ethernet and transmission control protocol/Internet protocol (TCP/IP) networks. By developing this capability, Intel hopes to take advantage of the large knowledge base and ubiquity of these standard technologies. With the advent of 10 gigabit Ethernet, these standards promise to provide the bandwidth required of the most demanding server applications. We use the term packet processing engine (PPE) as a generic term for the computing and memory resources necessary for communication-centric processing. Such PPEs have certain desirable attributes; the ETA project focuses on developing PPEs with such attributes, which include scalability, extensibility, and programmability. General-purpose processors, such as the Intel Xeon in our prototype, are extensible and programmable by definition. Our results show that software partitioning can significantly increase the overall communication performance of a standard multiprocessor server. Specifically, partitioning the packet processing onto a dedicated set of compute resources allows for optimizations that are otherwise impossible when time sharing the same compute resources with the operating system and applications.

international parallel and distributed processing symposium | 2005

An architecture for congestion management in Ethernet clusters

Gary L. McAlpine; Manoj Wadekar; Tanmay Gupta; Alan Crouch; Donald Newell

Interconnects for clusters and bladed systems must deliver efficient throughput, low latency, low delay variations and minimal frame drops. The primary technical issues hindering Ethernet adoption for cluster and blade system interconnects are the current methods Ethernet switches use for dealing with congestion, which can happen frequently under cluster and blade system workloads. The common response to congestion is to drop frames, and the common method of avoiding the need to drop frames is to utilize very large switch buffers. In this paper, we propose a three-level approach to dealing with congestion that provides efficient throughput, low latency, low delay variations, and can eliminate frame drops, even with very modestly sized switch buffers. The approach employs three levels of congestion management: 1) improved link level transient congestion control; 2) oversubscription control at layer 2 subnet ingresses; and 3) end-to-end oversubscription control by the higher layer protocols. We present compelling simulation results showing the incremental benefits provided by each level.

international parallel and distributed processing symposium | 2005

An architecture for software-based iSCSI on multiprocessor servers

Annie P. Foong; Gary L. McAlpine; Dave B. Minturn; Greg J. Regnier; Vikram A. Saletore

To achieve IP-converged cluster deployments, the performance and scalability of iSCSI must approach that of FC SANs. We recognize and quantify that the major overhead of iSCSI comes from TCP/IP processing. Industry has largely responded with TCP offload engines (TOEs) and iSCSI storage adapters. As an alternative, this paper shows a software implementation of iSCSI on generic OSes and processors. The trend towards chip multiprocessing (CMP) and integrated memory controllers (MCH) largely motivated our direction. With CMP, increased processing power is delivered through multiple cores per processor; on-die MCH allows memory bandwidth to scale better with processor speeds. Our approach and analysis show the effectiveness of partitioning the workload suitable for a CMP system, allowing iSCSI to scale with the increasing processing power and memory bandwidth of servers over time.

international conference on networking | 2005

An architecture for software-based iSCSI: experiences and analyses

Annie P. Foong; Gary L. McAlpine; Dave B. Minturn; Greg J. Regnier; Vikram A. Saletore

Supporting multi-gigabit/s iSCSI over TCP can quickly saturate the processing abilities of an SMP server today. Legacy OS designs and APIs are not designed for multi-gigabit I/O speeds. Most of industry's efforts have been focused on offloading the extra processing and memory load to the network adapter (NIC). As an alternative, this paper shows a software implementation of iSCSI on generic OSes and processors. We discuss an asymmetric multiprocessing (AMP) architecture, where one of the processors is dedicated to serve as a TCP engine. The original purpose of our prototype was to leverage the flexibility and tools available in generic systems for extensive analyses of iSCSI. As work proceeded, we quickly realized the viability of generic processors to meet iSCSI requirements. Looking ahead to chip-multiprocessing, where multiple cores reside on each processor, understanding partitioning of work and scaling to cores will be important in future server platforms.

high performance interconnects | 2003

ETA: experience with an Intel® Xeon™ processor as a packet processing engine

Greg J. Regnier; David B. Minturn; Gary L. McAlpine; Vikram A. Saletore; Annie P. Foong

The ETA (embedded transport acceleration) project at Intel Research and Development has developed a software prototype that uses one of the Intel® Xeon™ processors in a multi-processor server as a packet processing engine. The prototype is used as a vehicle for empirical measurement and analysis of a highly programmable packet processing engine that is closely tied to the server's core CPU and memory complex. The usage model for the prototype is the acceleration of server TCP/IP networking. The ETA prototype runs in an asymmetric multiprocessing mode, in that the packet processing engine does not run as a general computing resource for the host operating system. We show an effective method of interfacing the packet processing engine to the host processors using efficient asynchronous queuing mechanisms. This paper describes the ETA software architecture and the ETA prototype, and details the measurement and analysis that have been performed to date. Test results include running the packet processing engine in single-threaded mode, as well as in multi-threaded mode using Intel's hyper-threading technology (HT). Performance data gathered for network throughput and host CPU utilization show a significant improvement when compared to the standard TCP/IP networking stack.

Archive | 1996