Dennis Abts | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Dennis Abts is active.

Explore More

Publication

Featured researches published by Dennis Abts.

international symposium on computer architecture | 2007

Flattened butterfly: a cost-efficient topology for high-radix networks

John Kim; William J. Dally; Dennis Abts

Increasing integrated-circuit pin bandwidth has motivateda corresponding increase in the degree or radix of interconnection networksand their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. On benign (load-balanced) traffic, the flattened butterfly approaches the cost/performance of a butterfly network and has roughly half the cost of a comparable performance Clos network.The advantage over the Clos is achieved by eliminating redundant hopswhen they are not needed for load balance. On adversarial traffic, the flattened butterfly matches the cost/performance of a folded-Clos network and provides an order of magnitude better performance than a conventional butterfly.In this case, global adaptive routing is used to switchthe flattened butterfly from minimal to non-minimal routing - usingredundant hops only when they are needed. Minimal and non-minimal, oblivious and adaptive routing algorithms are evaluated on the flattened butterfly.We show that load-balancing adversarial traffic requires non-minimalglobally-adaptive routing and show that sequential allocators are required to avoid transient load imbalance when using adaptive routing algorithms.We also compare the cost of the flattened butterfly to folded-Clos, hypercube,and butterfly networks with identical capacityand show that the flattened butterfly is more cost-efficient thanfolded-Clos and hypercube topologies.

international symposium on computer architecture | 2010

Energy proportional datacenter networks

Dennis Abts; Michael R. Marty; Philip M. Wells; Peter Michael Klausler; Hong Liu

Numerous studies have shown that datacenter computers rarely operate at full utilization, leading to a number of proposals for creating servers that are energy proportional with respect to the computation that they are performing. In this paper, we show that as servers themselves become more energy proportional, the datacenter network can become a significant fraction (up to 50%) of cluster power. In this paper we propose several ways to design a high-performance datacenter network whose power consumption is more proportional to the amount of traffic it is moving -- that is, we propose energy proportional datacenter networks. We first show that a flattened butterfly topology itself is inherently more power efficient than the other commonly proposed topology for high-performance datacenter networks. We then exploit the characteristics of modern plesiochronous links to adjust their power and performance envelopes dynamically. Using a network simulator, driven by both synthetic workloads and production datacenter traces, we characterize and understand design tradeoffs, and demonstrate an 85% reduction in power --- which approaches the ideal energy-proportionality of the network. Our results also demonstrate two challenges for the designers of future network switches: 1) We show that there is a significant power advantage to having independent control of each unidirectional channel comprising a network link, since many traffic patterns show very asymmetric use, and 2) system designers should work to optimize the high-speed channel designs to be more energy efficient by choosing optimal data rate and equalization technology. Given these assumptions, we demonstrate that energy proportional datacenter communication is indeed possible.

international symposium on computer architecture | 2009

Achieving predictable performance through better memory controller placement in many-core CMPs

Dennis Abts; Natalie D. Enright Jerger; John Kim; Dan Gibson; Mikko H. Lipasti

In the near term, Moores law will continue to provide an increasing number of transistors and therefore an increasing number of on-chip cores. Limited pin bandwidth prevents the integration of a large number of memory controllers on-chip. With many cores, and few memory controllers, where to locate the memory controllers in the on-chip interconnection fabric becomes an important and as yet unexplored question. In this paper we show how the location of the memory controllers can reduce contention (hot spots) in the on-chip fabric and lower the variance in reference latency. This in turn provides predictable performance for memory-intensive applications regardless of the processing core on which a thread is scheduled. We explore the design space of on-chip fabrics to find optimal memory controller placement relative to different topologies (i.e. mesh and torus), routing algorithms, and workloads.

conference on high performance computing (supercomputing) | 2006

Adaptive routing in high-radix clos network

John Kim; William J. Dally; J. Dally; Dennis Abts

Recent increase in the pin bandwidth of integrated-circuits has motivated an increase in the degree or radix of interconnection network routers. The folded-Clos network can take advantage of these high-radix routers and this paper investigates adaptive routing in such networks. We show that adaptive routing, if done properly, outperforms oblivious routing by providing lower latency, lower latency variance, and higher throughput with limited buffering. Adaptive routing is particularly useful in load balancing around nonuniformities caused by deterministically routed traffic or the presence of faults in the network. We evaluate alternative allocation algorithms used in adaptive routing and compare their performance. The use of randomization in the allocation algorithms can simplify the implementation while sacrificing minimal performance. The cost of adaptive routing, in terms of router latency and area, is increased in high-radix routers. We show that the use of imprecise queue information reduces the implementation complexity and precomputation of the allocations minimizes the impact of adaptive routing on router latency

Communications of The ACM | 2012

A guided tour of data-center networking

Dennis Abts; Bob Felderman

A good user experience depends on predictable performance within the data-center network.

conference on high performance computing (supercomputing) | 2007

The Cray BlackWidow: a highly scalable vector multiprocessor

Dennis Abts; Abdulla Bataineh; Steve Scott; Greg Faanes; Jim Schwarzmeier; Eric P. Lundberg; Timothy J. Johnson; Mike Bye; Gerald A. Schwoerer

This paper describes the system architecture of the Cray BlackWidow scalable vector multiprocessor. The BlackWidow system is a distributed shared memory (DSM) architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops for 64-bit operations and 41.6 Gflops for 32-bit operations at the prototype operating frequency of 1.3 GHz. Global memory is directly accessible with processor loads and stores and is globally coherent. The system supports thousands of outstanding references to hide remote memory latencies, and provides a rich suite of built-in synchronization primitives. Each BlackWidow node is implemented as a 4-way SMP with up to 128 Gbytes of DDR2 main memory capacity. The system supports common programming models such as MPI and OpenMP, as well as global address space languages such as UPC and CAF. We describe the system architecture and microarchitecture of the processor, memory controller, and router chips. We give preliminary performance results and discuss design tradeoffs.

international symposium on microarchitecture | 2009

Cost-Efficient Dragonfly Topology for Large-Scale Systems

John Kim; William J. Dally; Steve Scott; Dennis Abts

It is more efficient to use increasing pin bandwidth by creating high-radix routers with a large number of narrow ports instead of low-radix routers with fewer wide ports. building networks using high-radix routers lowers cost and improves performance, but also presents many challenges. the dragonfly topology minimizes network cost by reducing the number of global channels required.

international conference on parallel architectures and compilation techniques | 2010

Approximating age-based arbitration in on-chip networks

Michael Mihn-Jong Lee; John Kim; Dennis Abts; Michael R. Marty; Jae W. Lee

The on-chip network of emerging many-core CMPs enables the sharing of numerous on-chip components. This on-chip network needs to ensure fairness when accessing the shared resources. In this work, we propose providing equality of service (EoS) in future many-core CMPs on-chip networks by leveraging distance, or hop count, to approximate the age of packets in the network. We propose probabilistic arbitration combined with distance-based weights to achieve EoS and overcome the limitation of conventional round-robin arbiter. We describe how nonlinear weights need to be used with probabilistic arbiters and propose three different arbitration weight metrics - fixed weight, constantly increasing weight, and variably increasing weight. By only modifying the arbitration of an on-chip router, we do not require any additional buffers or virtual channels and create a complexity-effective mechanism for achieving EoS.

international symposium on microarchitecture | 2010

Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs

Michael Mihn-Jong Lee; John Kim; Dennis Abts; Michael R. Marty; Jae W. Lee

Emerging many-core chip multiprocessors will integrate dozens of small processing cores with an on-chip interconnect consisting of point-to-point links. The interconnect enables the processing cores to not only communicate, but to share common resources such as main memory resources and I/O controllers. In this work, we propose an arbitration scheme to enable equality of service (EoS) in access to a chip’s shared resources. That is, we seek to remove any bias in a core’s access to a shared resource based on its location in the CMP. We propose using probabilistic arbitration combined with distance-based weights to achieve EoS and overcome the limitation of conventional round-robin arbiter. We describe how nonlinear weights need to be used with probabilistic arbiters and propose three different arbitration weight metrics – fixed weight, constantly increasing weight, and variably increasing weight. By only modifying the arbitration of an on-chip router, we do not require any additional buffers or virtual channels and create a simple, low-cost mechanism for achieving EoS. We evaluate our arbitration scheme across a wide range of traffic patterns. In addition to providing EoS, the proposed arbitration has additional benefits which include providing quality-of-service features (such as differentiated service) and providing fairness in terms of both throughput and latency that approaches the global fairness achieved with age-base arbitration – thus, providing a more stable network by achieving high sustained throughput beyond saturation.

design automation conference | 1999