Publication


Featured research published by Todd E. Takken.


IBM Journal of Research and Development | 2005

Overview of the Blue Gene/L system architecture

Alan Gara; Matthias A. Blumrich; Dong Chen; George Liang-Tai Chiu; Paul W. Coteus; Mark E. Giampapa; Ruud A. Haring; Philip Heidelberger; Dirk Hoenicke; Gerard V. Kopcsay; Thomas A. Liebsch; Martin Ohmacht; Burkhard Steinmacher-Burow; Todd E. Takken; Pavlos M. Vranas

The Blue Gene®/L computer is a massively parallel supercomputer based on IBM system-on-a-chip technology. It is designed to scale to 65,536 dual-processor nodes, with a peak performance of 360 teraflops. This paper describes the project objectives and provides an overview of the system architecture that resulted. We discuss our application-based approach and rationale for a low-power, highly integrated design. The key architectural features of Blue Gene/L are introduced in this paper: the link chip component and five Blue Gene/L networks, the PowerPC® 440 core and floating-point enhancements, the on-chip and off-chip distributed memory system, the node- and system-level design for high reliability, and the comprehensive approach to fault isolation.
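
As a quick sanity check on the 360-teraflop figure, the sketch below multiplies out the node count under commonly cited Blue Gene/L parameters (700 MHz cores, two fused multiply-adds per cycle on the enhanced double FPU); neither of those two figures appears in this abstract, so treat them as assumptions.

```python
# Back-of-the-envelope check of the quoted 360-teraflop peak (assumed figures noted below).
nodes = 65_536            # dual-processor compute nodes (from the abstract)
cores_per_node = 2        # PowerPC 440 cores per compute ASIC
clock_hz = 700e6          # assumed core frequency (quoted elsewhere in this listing)
flops_per_cycle = 4       # assumed: 2 fused multiply-adds per cycle on the double FPU

peak_tflops = nodes * cores_per_node * clock_hz * flops_per_cycle / 1e12
print(f"peak ≈ {peak_tflops:.0f} TFLOPS")   # ≈ 367 TFLOPS, i.e. the ~360 TF quoted above
```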


IBM Journal of Research and Development | 2005

Blue Gene/L torus interconnection network

Narasimha R. Adiga; Matthias A. Blumrich; Dong Chen; Paul W. Coteus; Alan Gara; Mark E. Giampapa; Philip Heidelberger; Sarabjeet Singh; Burkhard Steinmacher-Burow; Todd E. Takken; Mickey Tsao; Pavlos M. Vranas

The main interconnect of the massively parallel Blue Gene®/L is a three-dimensional torus network with dynamic virtual cut-through routing. This paper describes both the architecture and the microarchitecture of the torus and a network performance simulator. Both simulation results and hardware measurements are presented.
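
To make the torus idea concrete, here is a minimal sketch (not code from the paper) of how hop counts behave on a wrap-around three-dimensional torus; the 64×32×32 system shape is an assumption used only for illustration.

```python
# Minimal sketch: hop count between two nodes on a 3-D torus, where each dimension
# wraps around, so a packet can travel either direction along each ring.
def torus_hops(src, dst, dims):
    """Minimal number of hops between src and dst on a torus of shape dims."""
    hops = 0
    for s, d, n in zip(src, dst, dims):
        delta = abs(s - d)
        hops += min(delta, n - delta)   # take the shorter way around the ring
    return hops

# Example with an assumed 64x32x32 full-system shape (65,536 nodes), for illustration:
print(torus_hops((0, 0, 0), (40, 20, 5), (64, 32, 32)))  # -> 24 + 12 + 5 = 41 hops
```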


ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA) | 2012

A cycle-accurate, cycle-reproducible multi-FPGA system for accelerating multi-core processor simulation

Sameh W. Asaad; Ralph Bellofatto; Bernard Brezzo; Chuck Haymes; Mohit Kapur; Benjamin D. Parker; Thomas Roewer; Proshanta Saha; Todd E. Takken; Jose A. Tierno

Software-based tools for simulation are not keeping up with the demands of increased chip and system design complexity. In this paper, we describe a cycle-accurate and cycle-reproducible large-scale FPGA platform that is designed from the ground up to accelerate logic verification of the Blue Gene/Q compute node ASIC, a multi-processor SoC implemented in IBM's 45 nm SOI CMOS technology. This paper discusses the challenges of constructing such large-scale FPGA platforms, including design partitioning, clocking and synchronization, and debugging support, as well as our approach for addressing these challenges without sacrificing cycle accuracy and cycle reproducibility. The resulting full-chip simulation of the Blue Gene/Q compute node ASIC runs at a simulated processor clock speed of 4 MHz, over 100,000 times faster than logic-level software simulation of the same design. The vast increase in simulation speed provides a new capability in the design cycle that proved to be instrumental in logic verification as well as early software development and performance validation for Blue Gene/Q.
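
The quoted numbers imply roughly the following, shown as simple arithmetic below; the 1.6 GHz target clock used in the last comparison is an assumption, not a figure from this abstract.

```python
# Illustrative arithmetic only; assumed values are marked as such.
fpga_sim_hz = 4e6            # simulated processor clock on the FPGA platform
speedup = 100_000            # "over 100,000 times faster" than software logic simulation
software_sim_hz = fpga_sim_hz / speedup
print(f"implied software simulation rate: ~{software_sim_hz:.0f} simulated cycles/s")  # ~40 Hz

target_hz = 1.6e9            # assumed Blue Gene/Q core clock, for scale only
print(f"one second of target time: FPGA ~{target_hz / fpga_sim_hz / 3600:.1f} h, "
      f"software ~{target_hz / software_sim_hz / 86400 / 365:.1f} years")
```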


IBM Journal of Research and Development | 2005

Packaging the Blue Gene/L supercomputer

Paul W. Coteus; H. R. Bickford; T. M. Cipolla; Paul G. Crumley; Alan Gara; Shawn A. Hall; Gerard V. Kopcsay; Alphonso P. Lanzetta; L. S. Mok; Rick A. Rand; R. Swetz; Todd E. Takken; P. La Rocca; C. Marroquin; P. R. Germann; M. J. Jeanson

As 1999 ended, IBM announced its intention to construct a one-petaflop supercomputer. The construction of this system was based on a cellular architecture--relatively small but powerful building blocks combined in sufficient quantities to construct large systems. The first step on the road to a petaflop machine (one quadrillion floating-point operations in a second) is the Blue Gene®/L supercomputer. Blue Gene/L combines a low-power processor with a highly parallel architecture to achieve unparalleled computing performance per unit volume. Implementing the Blue Gene/L packaging involved trading off considerations of cost, power, cooling, signaling, electromagnetic radiation, mechanics, component selection, cabling, reliability, service strategy, risk, and schedule. This paper describes how 1,024 dual-processor compute application-specific integrated circuits (ASICs) are packaged in a scalable rack, and how racks are combined and augmented with host computers and remote storage. The Blue Gene/L interconnect, power, cooling, and control systems are described individually and as part of the synergistic whole.
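
The rack-level counts quoted here are consistent with the system-architecture overview above, as the trivial check below shows (figures taken from these abstracts).

```python
# Scaling check using only figures quoted in these abstracts.
nodes_per_rack = 1_024         # dual-processor compute ASICs per rack
racks = 64                     # full Blue Gene/L configuration
print(nodes_per_rack * racks)  # 65,536 nodes, matching the system-architecture overview
```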


International Journal of Parallel Programming | 2007

The Blue Gene/L supercomputer: a hardware and software story

José E. Moreira; Valentina Salapura; George S. Almasi; Charles J. Archer; Ralph Bellofatto; Peter Edward Bergner; Randy Bickford; Matthias A. Blumrich; José R. Brunheroto; Arthur A. Bright; Michael Brian Brutman; José G. Castaños; Dong Chen; Paul W. Coteus; Paul G. Crumley; Sam Ellis; Thomas Eugene Engelsiepen; Alan Gara; Mark E. Giampapa; Tom Gooding; Shawn A. Hall; Ruud A. Haring; Roger L. Haskin; Philip Heidelberger; Dirk Hoenicke; Todd A. Inglett; Gerard V. Kopcsay; Derek Lieber; David Roy Limpert; Patrick Joseph McCarthy

The Blue Gene/L system at the Department of Energy Lawrence Livermore National Laboratory in Livermore, California is the world's most powerful supercomputer. It has achieved groundbreaking performance in both standard benchmarks and real scientific applications. In that process, it has enabled new science that simply could not be done before. Blue Gene/L was developed by a relatively small team of dedicated scientists and engineers. This article is both a description of the Blue Gene/L supercomputer and an account of how that system was designed, developed, and delivered. It reports on the technical characteristics of the system that made it possible to build such a powerful supercomputer. It also reports on how teams across the world worked around the clock to accomplish this milestone of high-performance computing.


ACM Conference on Computing Frontiers | 2005

Power and performance optimization at the system level

Valentina Salapura; Randy Bickford; Matthias A. Blumrich; Arthur A. Bright; Dong Chen; Paul W. Coteus; Alan Gara; Mark E. Giampapa; Michael Karl Gschwind; Manish Gupta; Shawn A. Hall; Ruud A. Haring; Philip Heidelberger; Dirk Hoenicke; Gerard V. Kopcsay; Martin Ohmacht; Rick A. Rand; Todd E. Takken; Pavlos M. Vranas

The BlueGene/L supercomputer has been designed with a focus on power/performance efficiency to achieve high application performance under the thermal constraints of common data centers. To achieve this goal, emphasis was put on system solutions to engineer a power-efficient system. To exploit thread-level parallelism, the BlueGene/L system can scale to 64 racks with a total of 65,536 compute nodes, each consisting of a single compute ASIC that integrates all system functions with two industry-standard PowerPC microprocessor cores in a chip-multiprocessor configuration. Each PowerPC processor exploits data-level parallelism with a high-performance SIMD floating-point unit. To support good application scaling on such a massive system, special emphasis was put on efficient communication primitives, including five highly optimized communication networks. After an initial introduction of the BlueGene/L system architecture, we analyze power/performance efficiency for the BlueGene system using performance and power characteristics for the overall system (as exemplified by peak performance numbers). To understand application scaling behavior and its impact on performance and power/performance efficiency, we analyze the NAMD molecular dynamics package using the ApoA1 benchmark. We find that even for strong-scaling problems, BlueGene/L systems deliver superior performance scaling and significant power/performance efficiency. Application benchmark power/performance scaling under the voltage-invariant energy-delay² (ED²) metric demonstrates that choosing a power-efficient 700 MHz embedded PowerPC processor core and relying on application parallelism was the right decision for building a powerful and power/performance-efficient system.
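
For readers unfamiliar with the metric, the sketch below shows why energy times delay squared (ED²) is, to first order, independent of the voltage/frequency operating point and how it would rank two design points; the numbers are hypothetical, not taken from the paper.

```python
# Energy-delay^2 sketch. To first order in CMOS, energy per task scales as V^2 and
# delay as 1/f with f roughly proportional to V, so E * D^2 is voltage-invariant,
# which is why it can compare design points independently of their V/f setting.
def ed2(power_watts: float, runtime_s: float) -> float:
    """Energy * delay^2 for one benchmark run; lower is better."""
    energy_j = power_watts * runtime_s
    return energy_j * runtime_s ** 2

# Hypothetical design points, for illustration only:
print(ed2(power_watts=20_000, runtime_s=100))  # low-power, slightly slower: 2.0e10
print(ed2(power_watts=60_000, runtime_s=80))   # high-power, faster:         3.1e10 (worse)
```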


IEEE Transactions on Advanced Packaging | 2002

Characterization and performance evaluation of differential shielded cables for multi-Gb/s data-rates

Alina Deutsch; Gerard V. Kopcsay; Paul W. Coteus; Alphonso P. Lanzetta; Todd E. Takken; Paul W. Bond

This paper compares several differential cable characteristics that were evaluated for multi-Gb/s data-rates for both data and clock paths for 1-10 m lengths. Time-domain measurements are shown for the unassembled and connectorized cables and for representative card-plus-cable signal paths and the performance limiting factors are highlighted. Techniques are shown for developing coupled-line models for odd and even excitations for all the components in a full chip-to-chip path in order to make realistic data-rate predictions.
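
As background for the odd/even-mode excitations mentioned above, these are the standard relations between mode impedances and the differential and common-mode impedances of a coupled pair; the example values are hypothetical and not from the paper.

```python
# Standard coupled-line relations (general background, not results from this paper).
def differential_impedance(z_odd_ohms: float) -> float:
    return 2.0 * z_odd_ohms        # Zdiff = 2 * Zodd

def common_mode_impedance(z_even_ohms: float) -> float:
    return z_even_ohms / 2.0       # Zcomm = Zeven / 2

# Hypothetical values for a loosely coupled pair:
print(differential_impedance(50.0))   # 100-ohm differential impedance
print(common_mode_impedance(55.0))    # 27.5-ohm common-mode impedance
```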


IBM Journal of Research and Development | 2013

Packaging the IBM Blue Gene/Q supercomputer

Paul W. Coteus; Shawn A. Hall; Todd E. Takken; Rick A. Rand; Shurong Tian; Gerard V. Kopcsay; Randy Bickford; Francis P. Giordano; Christopher Marroquin; Mark J. Jeanson

The IBM Blue Gene®/Q supercomputer is designed for highly efficient computing on problems dominated by floating-point computation. Its target mean time between failures for a 96-rack, 98,304-node system is three days, allowing tasks requiring computation for many days to run at scale, with little time wasted on checkpoint-restart operations. This paper describes various elements of the compute application-specific integrated circuit and the system package, and how they contribute to low power consumption and high reliability.
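
The three-day MTBF target is what keeps checkpoint overhead small for long runs. The sketch below applies Young's first-order approximation for the optimal checkpoint interval; the formula and the 10-minute checkpoint cost are assumptions for illustration, not figures from the paper.

```python
import math

# Young's approximation for the optimal checkpoint interval: sqrt(2 * C * MTBF).
# Only the 3-day MTBF comes from the abstract; the checkpoint cost is assumed.
mtbf_s = 3 * 24 * 3600           # target system MTBF: three days
checkpoint_cost_s = 10 * 60      # assumed time to write one checkpoint (hypothetical)

interval_s = math.sqrt(2 * checkpoint_cost_s * mtbf_s)
overhead = checkpoint_cost_s / interval_s
print(f"checkpoint every ~{interval_s / 3600:.1f} h, ~{overhead:.1%} of time spent checkpointing")
# Also note: 96 racks * 1,024 nodes/rack = 98,304 nodes, consistent with the abstract.
```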


International Workshop on Innovative Architecture for Future Generation High Performance Processors and Systems (IWIA'06) | 2006

A Holistic Approach to System Reliability in Blue Gene

Matthias A. Blumrich; D. Chen; George Liang-Tai Chiu; T. Cipolla; Paul W. Coteus; P. Crumley; Alan Gara; M. E. Giampapa; Shawn A. Hall; R. A. Haring; Philip Heidelberger; D. Hoenicke; Gerard V. Kopcsay; Thomas A. Liebsch; Lawrence S. Mok; M. Ohmacht; Valentina Salapura; R. Swetz; Todd E. Takken; P. Vranas

Optimizing supercomputer performance requires a balance between objectives for processor performance, network performance, power delivery and cooling, cost, and reliability. In particular, scaling a system to a large number of processors poses challenges for reliability, availability, and serviceability. Given the power and thermal constraints of data centers, the BlueGene/L supercomputer has been designed with a focus on maximizing floating-point operations per second per Watt (FLOPS/Watt). This also yields drastic reductions in floor space and cost per FLOPS, allowing for affordable scale-up. The BlueGene/L system has been scaled to a total of 65,536 compute nodes in 64 racks. A system approach was used to minimize power at all levels, from the processor to the cooling plant. A BlueGene/L compute node consists of a single ASIC and associated memory. The ASIC integrates all system functions, including processors, the memory subsystem, and communication, thereby minimizing chip count, interfaces, and power dissipation. As the number of components increases, even a low failure rate per component leads to an unacceptable system failure rate, so additional mechanisms have to be deployed to achieve sufficient reliability at the system level. In particular, the data transfer volume in the communication networks of a massively parallel system poses significant challenges for bit error rates and recovery mechanisms in the communication links. Low power dissipation and high performance, along with reliability, availability, and serviceability, were prime considerations in the BlueGene/L hardware architecture, system design, and packaging. A high-performance software stack, consisting of operating system services, compilers, libraries, and middleware, completes the system while enhancing reliability and data integrity.
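
The component-count argument in the middle of this abstract can be made concrete with one line of arithmetic; the per-node MTBF below is a hypothetical value chosen only to show the effect, assuming independent exponential failures.

```python
# Why per-component reliability must be extreme at scale (hypothetical numbers).
components = 65_536                  # compute nodes in the full system (from the abstract)
component_mtbf_hours = 5_000_000     # assumed per-node MTBF, roughly 570 years

system_mtbf_hours = component_mtbf_hours / components   # independent exponential failures
print(f"system MTBF ≈ {system_mtbf_hours:.0f} h (~{system_mtbf_hours / 24:.1f} days)")  # ≈ 76 h
```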


IEEE Applied Power Electronics Conference and Exposition (APEC) | 2017

A 12V-to-0.9V active-clamp forward converter power block with planar transformer, standing slab inductor and direct edge solder to motherboard

Xin Zhang; Andrew Ferencz; Todd E. Takken; Bai Nguyen; Paul W. Coteus

DC-to-DC power supplies for CPUs or GPUs are critical components on the motherboard of modern computer systems. Converting an intermediate bus voltage (e.g., 12 V) to the core voltage (∼0.9 V) of CPUs or GPUs must be done efficiently, compactly, and cost-effectively. This paper proposes an active-clamp forward converter (ACFC) power block to supply core voltage on a motherboard. The ACFC power block can be individually tested prior to assembly and vertically soldered onto the motherboard to save motherboard area. A low-loss, compact planar transformer is designed into the ACFC power block PCB. A custom, standing slab inductor not only provides high inductance and high saturation current but also helps to mechanically support the power block. A one-piece copper winding connects the transformer to the inductor, thereby reducing the DC loss in the current path. Experimental results show a peak efficiency of 90.4% with a 12 V input and 0.9 V output at an output current of 25 A.
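
The quoted operating point implies the following input power and converter loss (simple arithmetic, not data from the paper).

```python
# Loss implied by the reported peak-efficiency operating point.
v_out, i_out = 0.9, 25.0       # 0.9 V output at 25 A (from the abstract)
efficiency = 0.904             # reported peak efficiency

p_out = v_out * i_out          # 22.5 W delivered to the load
p_in = p_out / efficiency      # ≈ 24.9 W drawn from the 12 V bus
print(f"input ≈ {p_in:.1f} W, dissipated in the converter ≈ {p_in - p_out:.1f} W")  # ≈ 2.4 W
```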

