Publication
Featured research published by Wolfgang E. Denzel.
IEEE Journal on Selected Areas in Communications | 1989
Hamid Ahmadi; Wolfgang E. Denzel
A survey of high-performance switch fabric architectures which incorporate fast packet switching as their underlying switching technique to handle various traffic types is presented. A descriptive overview of the major activities in this rapidly evolving field of telecommunications is given. The switch fabrics are classified into the following categories: banyan and buffered banyan-based fabrics, sort-banyan-based fabrics, fabrics with disjoint-path topology and output queuing, crossbar-based fabrics, time division fabrics with common packet memory, and fabrics with shared medium.
high performance interconnects | 2010
L. Baba Arimilli; Ravi Kumar Arimilli; Vicente Enrique Chung; Scott Douglas Clark; Wolfgang E. Denzel; Ben C. Drerup; Torsten Hoefler; Jody B. Joyner; Jerry Don Lewis; Jian Li; Nan Ni; Ramakrishnan Rajamony
The PERCS system was designed by IBM in response to a DARPA challenge that called for a high-productivity high-performance computing system. A major innovation in the PERCS design is the network that is built using Hub chips that are integrated into the compute nodes. Each Hub chip is about 580 mm² in size, uses 45 nm IBM CMOS 12S0 SOI technology with 13 levels of metal, has over 3700 signal I/Os, and is packaged in a module that also contains LGA-attached optical electronic devices. The Hub module implements five types of high-bandwidth interconnects with multiple links that are fully connected with a high-performance internal crossbar switch. These links provide over 9 Tbits/second of raw bandwidth and are used to construct a two-level direct-connect topology spanning up to tens of thousands of POWER7 chips with high bisection bandwidth and low latency. The Blue Waters System, which is being constructed at NCSA, is an exemplar large-scale PERCS installation. Blue Waters is expected to deliver sustained petascale performance over a wide range of applications. The Hub chip supports several high-performance computing protocols (e.g., MPI, RDMA, IP) and also provides a non-coherent system-wide global address space. Collective communication operations such as barriers, reductions, and multi-cast are supported directly in hardware. Multiple routing modes including deterministic as well as hardware-directed random routing are also supported. Finally, the Hub module is capable of operating in the presence of many types of hardware faults and gracefully degrades performance in the presence of lane failures.
IEEE Communications Magazine | 2001
Werner Bux; Wolfgang E. Denzel; Ton Engbersen; Andreas Herkersdorf; Ronald P. Luijten
acm special interest group on data communication | 2003
Cyriel Minkenberg; Ronald P. Luijten; Francois Abel; Wolfgang E. Denzel; Mitchell Gusat
Computer Networks and ISDN Systems | 1995
Wolfgang E. Denzel; Antonius Engbersen; Ilias Iliadis
We provide a review of the state of the art and the future of packet processing and switching. The industry's response to the need for wire-speed packet processing devices whose function can be rapidly adapted to continuously changing standards and customer requirements is the concept of special programmable network processors. We discuss the prerequisites of processing tens to hundreds of millions of packets per second and indicate ways to achieve scalability through parallel packet processing. Tomorrow's switch fabrics, which will provide node-internal connectivity between the input and output ports of a router or switch, will have to sustain terabit-per-second throughput. After reviewing fundamental switching concepts, we discuss architectural and design issues that must be addressed to allow the evolution of packet switch fabrics to terabit-per-second throughput performance.
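To give a sense of scale for the "tens to hundreds of millions of packets per second" figure, the arithmetic can be sketched as below. The link rates, the 64-byte packet size, and the number of parallel engines are illustrative assumptions, not values taken from the abstract, and framing overhead is ignored.

```python
def packets_per_second(link_rate_bps: float, packet_bytes: int) -> float:
    """Packet arrival rate on a fully loaded link (framing overhead ignored)."""
    return link_rate_bps / (packet_bytes * 8)


def per_packet_budget_ns(link_rate_bps: float, packet_bytes: int,
                         parallel_engines: int = 1) -> float:
    """Time each engine may spend on one packet while keeping up with wire speed.

    With k engines working on different packets in parallel, each engine
    only has to finish one packet every k packet-arrival intervals.
    """
    interval_ns = 1e9 / packets_per_second(link_rate_bps, packet_bytes)
    return parallel_engines * interval_ns


if __name__ == "__main__":
    # Assumed example rates: 10 Gb/s and 40 Gb/s with 64-byte packets.
    for rate in (10e9, 40e9):
        pps = packets_per_second(rate, 64)
        single = per_packet_budget_ns(rate, 64)
        parallel = per_packet_budget_ns(rate, 64, parallel_engines=16)
        print(f"{rate / 1e9:.0f} Gb/s: {pps / 1e6:.1f} Mpps, "
              f"{single:.1f} ns per packet, {parallel:.1f} ns with 16 engines")
```

Under these assumptions a single engine has only tens of nanoseconds per packet, which is why the per-packet time budget is usually widened by spreading packets over several engines.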
simulation tools and techniques for communications networks and system | 2008
Wolfgang E. Denzel; Jian Li; Peter Walker; Yuho Jin
Addressing the ever-growing capacity demand for packet switches, current research focuses on scheduling algorithms or buffer bandwidth reductions. Although these topics remain relevant, our position is that the primary design focus for systems beyond 1 Tb/s must be shifted to aspects resulting from packaging disruptions. Based on trends such as increased link rates and improved CMOS technologies, we derive new design factors for such switch fabrics. For instance, we argue that the packet round-trip transmission time within the fabric has become a major design parameter. Furthermore, we observe that high-speed fabrics have become extremely dependent on serial I/O technology that is both high speed and high density. Finally, we conclude that in developing the architecture, packaging constraints must be considered first and not as an afterthought, which also applies to solving the tremendous power consumption challenges.
IEEE Transactions on Communications | 1993
Ilias Iliadis; Wolfgang E. Denzel
This paper presents the architecture of a very high-speed VLSI packet switch and its performance. The switch, called PRIZMA, is suited for broadband telecommunications, based on ATM, the Asynchronous Transfer Mode. However, the concept is not restricted to ATM-oriented architectural environments. There may be applications within private networks, independent of whether they are ATM-based. There may also be other potential applications such as multiprocessor interconnection. The architecture of the PRIZMA switch follows the architecture of its lower-speed earlier version (H. Ahmadi et al., Int. J. Digital Analog Cabled Syst. 2 (4) (1989) 277–287) to a large degree: it is based on a single-chip switch element that exploits the performance advantage of output queuing and from which larger, self-routing single-stage or multistage switch fabrics can be constructed in a modular way. However, compared to the precursor, higher performance is achieved by output queues that now are configured as a dynamically shared memory. This shared memory can also be expanded by linking multiple switch elements. Owing to novel parallel structures inside the switch element, VLSI implementation is possible for transmission rates on the order of a gigabit per second per port. In the last section of this paper, performance results are presented for a switch in a single-stage configuration as well as for the case of a three-stage switch fabric.
Optical Engineering | 1998
Gian-Luca Bona; Wolfgang E. Denzel; Bert Jan Offrein; Roland Germann; H. W. M. Salemink; Folkert Horst
We present an end-to-end simulation framework that is capable of simulating High-Performance Computing (HPC) systems with hundreds of thousands of interconnected processors. The tool applies discrete event simulation and is driven by real-world application traces. We refer to it as MARS (MPI Application Replay network Simulator). It maintains a reasonable level of simulation detail for the processors in general and for the interconnection network in particular. Among other things, it features several network topologies, flexible routing schemes, arbitrary application task placement, point-to-point statistics collection, and data visualization. With a few case studies, we demonstrate the usefulness of this tool for assisting high-level system design as well as for performance projection and application tuning of future HPC systems.
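The general idea of trace-driven discrete event simulation mentioned above can be illustrated with a minimal sketch. This is not the MARS implementation: the trace format, the `replay` function, and the network model (a fixed latency plus one serializing ingress link per destination) are all hypothetical simplifications chosen for brevity.

```python
import heapq
from collections import defaultdict


def replay(trace, latency_s=1e-6, bandwidth_Bps=1e9):
    """Replay a message trace and collect point-to-point delays.

    trace: iterable of (send_time_s, src, dst, size_bytes) records
           (an assumed format, not the one used by any real tool).
    Returns {(src, dst): [delay_s, ...]}.
    """
    events = []  # min-heap ordered by (time, sequence number)
    for seq, (t, src, dst, size) in enumerate(trace):
        heapq.heappush(events, (t, seq, "inject", (src, dst, size, t)))
    seq = len(events)

    link_free_at = defaultdict(float)  # when each destination link is idle
    delays = defaultdict(list)         # point-to-point statistics

    while events:
        now, _, kind, (src, dst, size, t0) = heapq.heappop(events)
        if kind == "inject":
            # The message arrives after the wire latency, then waits until
            # the destination link has finished any earlier message.
            start = max(now + latency_s, link_free_at[dst])
            done = start + size / bandwidth_Bps
            link_free_at[dst] = done
            heapq.heappush(events, (done, seq, "deliver", (src, dst, size, t0)))
            seq += 1
        else:  # "deliver"
            delays[(src, dst)].append(now - t0)
    return dict(delays)


if __name__ == "__main__":
    # Two ranks send 1000 bytes to rank 2 at the same time and contend
    # for its ingress link, so the second message is delayed.
    print(replay([(0.0, 0, 2, 1000), (0.0, 1, 2, 1000)]))
```

The event heap keeps the simulation in timestamp order, and the sequence number breaks ties deterministically so that replays are reproducible.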
high-performance computer architecture | 2011
Jian Li; Wei Huang; Charles R. Lefurgy; Lixin Zhang; Wolfgang E. Denzel; Richard R. Treumann; Kun Wang
A single-stage nonblocking N × N packet switch with both output and input queuing is considered. The limited queuing at the output ports resolves output port contention partially. Overflow at the output queues is prevented by a backpressure mechanism and additional queuing at the input ports. The impact of the backpressure effect on the switch performance for arbitrary output buffer sizes and for N → ∞ is studied. Two different switch models are considered: an asynchronous model with Poisson arrivals and a synchronous model with Bernoulli arrivals. The investigation is based on the average delay and the maximum throughput of the switch. Closed-form expressions for these performance measures are derived for operation with fixed size packets. The results demonstrate that a modest amount of output queuing, in conjunction with appropriate switch speedup, provides significant delay and throughput improvements over pure input queuing. The maximum throughput is the same for the synchronous and the asynchronous switch model, although the delay is different.
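The pure input queuing baseline against which these improvements are measured can be explored with a small simulation. The sketch below is not the paper's analytical model: it covers only the synchronous case with saturated inputs, no output queues, no speedup, uniformly random destinations, and random contention resolution, all of which are assumptions made here for illustration.

```python
import random


def saturation_throughput(n_ports=32, slots=5000, seed=1):
    """Estimate the maximum throughput of a pure input-queued switch.

    Every input is saturated, so it always has a head-of-line (HOL)
    packet. In each slot an output serves one of the HOL packets
    addressed to it; the losers stay at the head of their queues and
    block the packets behind them (HOL blocking).
    """
    rng = random.Random(seed)
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]  # HOL destinations
    delivered = 0
    for _ in range(slots):
        # Group inputs by the output their HOL packet requests.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output picks one winner at random.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next packet moves to HOL
            delivered += 1
    return delivered / (slots * n_ports)


if __name__ == "__main__":
    for n in (2, 8, 32):
        print(n, round(saturation_throughput(n_ports=n), 3))
```

For large N the estimate should approach the classical limit of 2 − √2 ≈ 0.586 for pure input queuing, which is the ceiling that output queuing and switch speedup are meant to lift.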
parallel computing | 2010
Javier Navaridas; José Miguel-Alonso; Francisco Javier Ridruejo; Wolfgang E. Denzel
The Corporate Optical Backbone Network (COBNET) project is a joint research project within the ACTS program of the European Commission. The COBNET consortium is considering the use of advanced optical networking technologies for the backbones of future corporate networks. In particular, multichannel add/drop ring networks based on wavelength division multiplexing (WDM) as well as on optical space-division multiplexing (SDM) technologies are being pursued. An overview is given of the system concept, the device technology, and the demonstration network that was developed within COBNET. The WDM ring option and specifically the related add/drop devices are focused on in more detail. These devices are fabricated in a newly developed high-refractive-index-contrast planar silica-on-silicon technology by using silicon-oxynitride (SiON) as the core waveguide material. Compact add/drop components based on the resonant coupler concept are realized. The filter characteristic can be tailored and tuned by thermo-optic heaters, which enables the selection of any given wavelength out of a series of wavelengths from the WDM ring using the same device.