Tim Niggemeier | Researchain

Publication

Featured research published by Tim Niggemeier.

signal processing systems | 2008

A distributed, simultaneously multi-threaded (SMT) processor with clustered scheduling windows for scalable DSP performance

Mladen Berekovic; Tim Niggemeier

A scalable, distributed processor architecture is presented that emphasizes high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of parallel processing on a chip. The architecture is based on a superscalar processor model with a modified Tomasulo scheme that was extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads [simultaneously multi-threaded (SMT)]. Consistent application of fine clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file and the scheduling window, and leads to a distributed architecture model in which independent thread processing units, arithmetic logic units, register files and memories are distributed across the chip and communicate with each other via a special network. A special communication protocol replaces broadcasting and associative comparison of destination tags in a centralized instruction scheduler with explicit operand transfer instructions, thus decentralizing the control of the data flow to the greatest possible extent. As a result, the processor cycle time depends neither on the issue bandwidth of a single thread nor on the execution bandwidth of the SMT processor. This makes the performance of the architecture scalable with both the number of function units and the number of thread units without any impact on the processor's cycle time. The performance and scalability of the proposed microarchitecture are demonstrated with critical signal processing kernels from the MPEG-4 video coding standard on a cycle-true simulator.
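
The idea of replacing tag broadcast with explicit operand transfers can be illustrated with a small sketch. The code below is not taken from the paper: the `Unit` class, the `run` loop and the `demo` function are hypothetical names, and the model ignores threads, network latency and clustering. It only shows that each producer sends its result point-to-point to the consumers named in its target list, so no unit has to compare a broadcast tag against its waiting operands.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Unit:
    """Hypothetical function unit holding one two-operand instruction."""
    name: str
    op: Callable[[int, int], int]
    operands: dict = field(default_factory=dict)  # operand slot (0 or 1) -> value
    targets: list = field(default_factory=list)   # (consumer unit, operand slot) pairs
    result: Optional[int] = None


def run(units):
    """Fire units whose two operands have arrived; return the number of transfers."""
    transfers = 0
    pending = list(units)
    while pending:
        ready = [u for u in pending if len(u.operands) == 2]
        if not ready:
            raise RuntimeError("deadlock: some operands never arrive")
        for u in ready:
            u.result = u.op(u.operands[0], u.operands[1])
            # Explicit operand transfer: one message per listed consumer,
            # instead of a broadcast that every waiting unit must compare against.
            for consumer, slot in u.targets:
                consumer.operands[slot] = u.result
                transfers += 1
            pending.remove(u)
    return transfers


def demo():
    """Evaluate (3 * 4) + 5 with a multiplier feeding an adder."""
    add = Unit("add", operator.add, operands={1: 5})
    mul = Unit("mul", operator.mul, operands={0: 3, 1: 4}, targets=[(add, 0)])
    transfers = run([add, mul])
    return add.result, transfers


if __name__ == "__main__":
    print(demo())
```

In this toy model the number of messages grows with the number of producer-consumer pairs rather than with the number of waiting instructions, which is the property the abstract attributes to the decentralized protocol.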

international conference on embedded computer systems architectures modeling and simulation | 2006

A scalable, multi-thread, multi-issue array processor architecture for DSP applications based on extended tomasulo scheme

Mladen Berekovic; Tim Niggemeier

A scalable, distributed micro-architecture is presented that emphasizes high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of parallel processing on a chip. The architecture is based on a superscalar processor model with out-of-order execution that supports specialized, complex DSP function units and simultaneous instruction issue from multiple independent threads (SMT). Consistent application of fine clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file, and leads to a distributed architecture model in which independent thread processing units, ALUs, register files and memories are distributed across the chip and communicate with each other via special networks, forming a network-on-a-chip (NOC) [1]. The communication protocol is a modified version of Tomasulo's scheme [2] that was extended to eliminate all central control structures for the data flow and to support multithreading. The performance of the architecture is scalable with both the number of function units and the number of thread units without any impact on the processor's cycle time.

electronics system integration technology conference | 2014

Thermal power plane enabling dual-side electrical interconnects for high-performance chip stacks: Concept

Thomas Brunschwiler; Ralph Heller; Gerd Schlottig; Timo Tick; Hubert Harrer; Harry Barowski; Tim Niggemeier; Jochen Supper; Stefano Oggioni

In this paper, a novel concept of dual-side electrical interconnects (EIC) to a chip stack is discussed. In this concept, a second laminate, called the Thermal Power Plane (TPP), is attached through solder rails to the top chip of the stack. The TPP provides efficient heat removal and current feed in the out-of-plane and the in-plane direction, respectively. Accordingly, the number of electrical interconnects to the chip stack can be doubled, enabling higher off-stack communication. An interconnect count analysis was performed for a two-die stack with cores in the top chip and cache in the bottom chip. The power to the top and the bottom chip is provided from the TPP and the bottom laminate, respectively. In this case, all power through-silicon vias (TSVs) can be eliminated, which would otherwise cover 3.3% of the bottom chip area. In addition, the design of the TSVs can be optimized for signaling only. The use of two laminates also enables individual test and burn-in of the dies prior to stack formation, potentially improving the yield by joining only known good dies. The feasibility of the concept is supported by thermal and electrical finite-element analysis. An 8-layer coreless laminate with stacked build-up vias, extending between both sides of the substrate, was considered as an implementation of the TPP. The dual-side EIC topology outperforms the classical single-side EIC approach in thermal performance by 5 K·mm²/W when bar-shaped copper planes in the TPP and elongated top interconnects, called rails, are considered. A voltage uniformity to the top chip of better than 2% of the supply voltage can be provided for all TPP designs.

international conference / workshop on embedded computer systems: architectures, modeling and simulation | 2004

A Cost-Efficient RISC Processor Platform for Real Time Audio Applications

Jens Wittenburg; Ulrich Schreiber; Ulrich Gries; Markus Schneider; Tim Niggemeier

A platform architecture for real-time audio applications based on the open-source LEON RISC processor and an audio development board based on an FPGA implementation of this platform are presented. The emphasis is on audio-specific extensions of the LEON architecture. In particular, a Floating Point Unit (FPU) for the LEON CPU and a multi-standard audio interface block were implemented. Innovative aspects are described, such as a scoreboard-based superscalar scheduler for the FPU and a new flexible approach to the interface block that unifies major parts of the required logic for all relevant interface standards. The extended LEON architecture runs on an Altera Stratix EP1S30-5 at more than 50 MHz. This already allows an mp3 decoder to run at up to 128 kbit/s in real time. Porting of additional decoders, such as mp3PRO and AAC, has been started.
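
As a rough illustration of what scoreboard-based scheduling means in general, the sketch below tracks which registers have a pending write and stalls an instruction until its sources and its destination are free. It is a simplification under stated assumptions and not the scheduler described in the paper: it issues in order, one instruction per cycle, and the function name `schedule`, the opcodes and the latencies are hypothetical values chosen for the example.

```python
def schedule(instrs, latency):
    """Return the issue cycle of each instruction under a simple scoreboard.

    instrs:  list of (opcode, dest, src1, src2) tuples, issued in program order.
    latency: dict mapping opcode -> cycles until the result is written back.
    """
    busy_until = {}   # register -> cycle at which its pending write completes
    cycle = 0         # earliest cycle at which the next instruction may issue
    issue_cycles = []
    for opcode, dest, *srcs in instrs:
        # Stall on read-after-write (sources) and write-after-write (dest) hazards.
        hazards = [busy_until.get(reg, 0) for reg in (*srcs, dest)]
        issue = max([cycle] + hazards)
        issue_cycles.append(issue)
        busy_until[dest] = issue + latency[opcode]
        cycle = issue + 1  # in-order, single-issue assumption
    return issue_cycles


# Hypothetical latencies and program, for illustration only.
LATENCY = {"fmul": 3, "fadd": 2}
PROGRAM = [
    ("fmul", "f2", "f0", "f1"),  # f2 = f0 * f1
    ("fadd", "f3", "f2", "f0"),  # waits for f2
    ("fadd", "f4", "f0", "f1"),  # independent, but issues after the stalled add
]

if __name__ == "__main__":
    print(schedule(PROGRAM, LATENCY))
```

A superscalar scheduler like the one mentioned in the abstract would additionally issue several instructions per cycle; this sketch leaves that out to keep the hazard check visible.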

Archive | 2004