Publication


Featured research published by Peter M. Kogge.


International Conference on Parallel Processing | 1994

EXECUBE-A New Architecture for Scaleable MPPs

Peter M. Kogge

The EXECUBE chip is a new single part type building block for MPP systems that scales seamlessly from a few chips (with a few hundred mips) to thousands of chips with petaop potential. Further, the chip architecture supports directly both SIMD and MIMD modes of processing, permitting not only the best of both current parallel computing modes but also new modes not possible with more conventional designs. This paper discusses the overall architecture of the EXECUBE chip, the new computational model it represents, some comparisons against the current state of the art, how it might be used for real applications, and some extrapolations into future developments.


International Journal of Circuit Theory and Applications | 2001

Problems in designing with QCAs: Layout = Timing

Michael Niemier; Peter M. Kogge

The quantum cellular automata (QCA) is currently being investigated as an alternative to CMOS VLSI. While some simple logical circuits and devices have been studied, little if any work has been done in considering the architecture for systems of QCA devices. This work discusses the progress of one of the first such efforts. Namely, the design of dataflow components for a simple microprocessor being designed exclusively in QCA is discussed. Problems associated with initial designs and enumerated solutions to these problems (usually stemming from floorplanning techniques) are explained. Finally, areas of future research direction for circuit design in QCA are presented. Copyright © 2001 John Wiley & Sons, Ltd.


IBM Journal of Research and Development | 1974

Parallel solution of recurrence problems

Peter M. Kogge

An mth-order recurrence problem is defined as the computation of the sequence x_1, ..., x_N, where x_i = f(a_i, x_{i-1}, ..., x_{i-m}), and a_i is some vector of parameters. This paper investigates general algorithms for solving such problems on highly parallel computers. We show that if the recurrence function f has associated with it two other functions that satisfy certain composition properties, then we can construct elegant and efficient parallel algorithms that can compute all N elements of the series in time proportional to ⌈log_2 N⌉. The class of problems having this property includes linear recurrences of all orders, both homogeneous and inhomogeneous, recurrences involving matrix or binary quantities, and various nonlinear problems involving operations such as computation with matrix inverses, exponentiation, and modulo division.
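As an illustration of the logarithmic-time idea in this abstract, here is a minimal sketch of recursive doubling for one simple case, the first-order linear recurrence x_i = a_i * x_{i-1} + b_i. The choice of this recurrence, and the names `compose` and `solve_recurrence`, are assumptions made for the example and are not taken from the paper. Each step i is treated as an affine map, and neighbouring maps are composed at strides 1, 2, 4, and so on. The list comprehension stands in for what would be a single parallel step on real hardware.

```python
def compose(first, second):
    """Compose two affine maps x -> a*x + b, with `first` applied before `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def solve_recurrence(a, b, x0):
    """Return ([x_1, ..., x_N], steps) for x_i = a[i-1] * x_{i-1} + b[i-1].

    Illustrative sketch only: the per-step list comprehension simulates one
    parallel step in which every position i is updated independently.
    """
    n = len(a)
    maps = list(zip(a, b))  # maps[i] initially covers only step i
    stride = 1
    steps = 0
    while stride < n:
        # After this step, maps[i] covers up to 2 * stride consecutive steps ending at i.
        maps = [
            compose(maps[i - stride], maps[i]) if i >= stride else maps[i]
            for i in range(n)
        ]
        stride *= 2
        steps += 1
    # maps[i] now maps x_0 directly to x_{i+1}.
    return [am * x0 + bm for am, bm in maps], steps


def solve_sequential(a, b, x0):
    """Reference implementation: the obvious N-step loop."""
    xs, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


if __name__ == "__main__":
    a = [2, 3, 1, -1, 2]
    b = [1, 0, 4, 2, -3]
    xs, steps = solve_recurrence(a, b, 1)
    print(xs, steps)
```

The doubling works because composition of affine maps is associative, so partial compositions can be combined in any grouping. The number of doubling steps is ⌈log_2 N⌉, although the total work is higher than in the sequential loop.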


International Symposium on Computer Architecture | 2001

Exploring and exploiting wire-level pipelining in emerging technologies

Michael Niemier; Peter M. Kogge

Pipelining is a technique that has long been considered fundamental by computer architects. However, the world of nanoelectronics is pushing the idea of pipelining to new and lower levels, particularly the device level. How this affects circuits and the relationship between their timing, architecture, and design will be studied in the context of an inherently self-latching nanotechnology termed Quantum Cellular Automata (QCA). Results indicate that this nanotechnology offers the potential for “free” multi-threading and “processing-in-wire”. All of this could be accomplished in a technology that could be almost three orders of magnitude denser than an equivalent design fabricated in a process at the end of the CMOS curve.


Design Automation Conference | 2000

A design of and design tools for a novel quantum dot based microprocessor

Michael Niemier; Michael J. Kontz; Peter M. Kogge

Despite the seemingly endless upwards spiral of modern VLSI technology, many experts are predicting a hard wall for CMOS in about a decade. Given this, researchers continue to look at alternative technologies, one of which is based on quantum dots, called quantum cellular automata (QCA). While the first such devices have been fabricated, little is known about how to design complete systems of them. This paper summarizes one of the first such studies, namely an attempt to design a complete, albeit simple, CPU in the technology. To design a theoretical QCA microprocessor, two things must be accomplished. First, a device model of the processor must be constructed (i.e., the schematic itself). Second, methods for simulating and testing QCA designs must be developed. This paper summarizes the beginnings of a simple QCA microprocessor (namely, its dataflow) and a QCA design and simulation tool.


Conference on Advanced Research in VLSI | 1995

Combined DRAM and logic chip for massively parallel systems

Peter M. Kogge; Toshio Sunaga; Hisatada Miyataka; Koji Kitamura; Eric E. Retter

A new 5 V 0.8 μm CMOS technology merges 100 K custom circuits and 4.5 Mb DRAM onto a single die that supports both high density memory and significant computing logic. One of the first chips built with this technology implements a unique Processor-In-Memory (PIM) computer architecture termed EXECUBE and has 8 separate 25 MHz CPU macros and 16 separate 32 K × 9 b DRAM macros on a single die. These macros are organized together to provide a single part type for scaleable massively parallel processing applications, particularly embedded ones where minimal glue logic is desired. Each chip delivers 50 Mips of performance at 2.7 W. This paper overviews the basic chip technology and organization, some projections on the future of EXECUBE-like PIM chips, and finally some lessons to be learned as to why this technology should radically affect the way we ought to think about computer architecture.
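The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that "K" means 1024 and "Mb" means 2^20 bits, which the abstract does not state. The derived per-CPU and per-watt numbers are simple divisions of the quoted chip totals, not values reported by the authors.

```python
# Figures quoted in the abstract.
DRAM_MACROS = 16
WORDS_PER_MACRO = 32 * 1024  # "32 K" words, assuming K = 1024
BITS_PER_WORD = 9
CPU_MACROS = 8
CHIP_MIPS = 50
CHIP_WATTS = 2.7

# Total on-chip DRAM, in bits and in Mb (assuming 1 Mb = 2**20 bits).
total_bits = DRAM_MACROS * WORDS_PER_MACRO * BITS_PER_WORD
total_mb = total_bits / 2**20

# Derived ratios (simple division, not reported in the abstract).
mips_per_cpu = CHIP_MIPS / CPU_MACROS
mips_per_watt = CHIP_MIPS / CHIP_WATTS

print(f"DRAM: {total_bits} bits = {total_mb} Mb")
print(f"{mips_per_cpu} Mips per CPU macro, {mips_per_watt:.1f} Mips per watt")
```

Under these assumptions the sixteen 32 K × 9 b macros add up to exactly the 4.5 Mb of DRAM quoted for the die.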


Languages, Compilers, and Tools for Embedded Systems | 2005

Generation of permutations for SIMD processors

Alexei Kudriavtsev; Peter M. Kogge

Short vector (SIMD) instructions are useful in signal processing, multimedia, and scientific applications. They offer higher performance, lower energy consumption, and better resource utilization. However, compilers still do not have good support for SIMD instructions, and often the code has to be written manually in assembly language or using compiler builtin functions. Also, in some applications, higher parallelism could be achieved if compilers inserted permutation instructions that reorder the data in registers. In this paper we describe how we create SIMD instructions from regular code, and determine ordering of individual operations in the SIMD instructions to minimize the number of permutation instructions. Individual memory operations are grouped into SIMD operations based on their effective addresses. The SIMD data flow graph is then constructed by following data dependences from SIMD memory operations. Then, the orderings of operations are propagated from SIMD memory operations into the graph. We also describe our approach to compute decomposition of a given permutation into the permutation instructions of the target architecture. Experiments with our prototype compiler show that this approach scales well with the number of operations in SIMD instructions (SIMD width) and can be used to compile a number of important kernels, achieving up to 35% speedup.


Symposium on Frontiers of Massively Parallel Computation | 1996

Pursuing a petaflop: point designs for 100 TF computers using PIM technologies

Peter M. Kogge; Steven C. Bass; Jay B. Brockman; Danny Z. Chen; Edwin Hsing-Mean Sha

This paper is a summary of a proposal submitted to the NSF 100 Tera Flops Point Design Study. Its main thesis is that the use of Processing-In-Memory (PIM) technology can provide an extremely dense and highly efficient base on which such computing systems can be constructed. The paper describes a strawman organization of one potential PIM chip, along with how multiple such chips might be organized into a real system, what the software supporting such a system might look like, and several applications which we will be attempting to place onto such a system.


IEEE Transactions on Very Large Scale Integration Systems | 2003

Energy-efficient issue queue design

Dmitry Ponomarev; Gurhan Kucuk; Oguz Ergin; Kanad Ghose; Peter M. Kogge

The out-of-order issue queue (IQ) used in modern superscalar processors is a considerable source of energy dissipation. We consider design alternatives that result in significant reductions in the power dissipation of the IQ (by as much as 75%) through the use of comparators that dissipate energy mainly on a tag match, 0-B encoding of operands to imply the presence of bytes with all zeros, and bitline segmentation. Our results are validated by the execution of SPEC 95 benchmarks on a true hardware level, cycle-by-cycle simulator for a superscalar processor and SPICE measurements for actual layouts of the IQ in a 0.18-μm CMOS process.


International Conference on Supercomputing | 1999

Microservers: a new memory semantics for massively parallel computing

Jay B. Brockman; Peter M. Kogge; Thomas L. Sterling; Vincent W. Freeh; Shannon K. Kuntz

The semantics of memory (a large state which can only be read or changed a small piece at a time) has remained virtually untouched since von Neumann, and its effects (latency and bandwidth) have proved to be the major stumbling block for high performance computing. This paper suggests a new model, termed “microservers,” that exploits “Processing-In-Memory” VLSI technology, and that can reduce latency and memory traffic, increase inherent opportunities for concurrency, and support a variety of highly concurrent programming paradigms. Application of this model is then discussed in the framework of several on-going supercomputing programs, particularly the HTMT petaflops project.
