Jörg-Christian Niemann

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jörg-Christian Niemann is active.

Explore More

Publication

Featured researches published by Jörg-Christian Niemann.

digital systems design | 2007

GigaNoC - A Hierarchical Network-on-Chip for Scalable Chip-Multiprocessors

Christoph Puttmann; Jörg-Christian Niemann; Mario Porrmann; Ulrich Rückert

Due to the technological progress in the semiconductor industry, more and more components can be integrated on a single die forming a complex System-on-Chip. For enabling an efficient interaction between the various building blocks of todays SoCs, efficient communication structures become more and more essential. In this paper, we present the GigaNoC, a hierarchical Network-on-Chip that is especially suitable for scalable Chip-Multiprocessor architectures. The GigaNoC approach features a packet-switched wormhole routing on-chip network that provides the backbone of our multiprocessor architecture. In order to meet bandwidth requirements of different application domains, our Network-on-Chip is easily scalable and parameterizable in various aspects. This work highlights the communication protocol and shows a performance evaluation for different congestion scenarios. Furthermore, we present an FPGA-based prototypical realization and introduce a debugging and verification environment. Finally, implementation results for a standard cell technology are discussed.

ieee computer society annual symposium on vlsi | 2005

A scalable parallel SoC architecture for network processors

Jörg-Christian Niemann; Mario Porrmann; Ulrich Rückert

Information processing and networking of technical devices find their way into our daily life. In order to process the continuously growing quantity of data, powerful communication nodes for network processing are needed. We present an architecture for network processors that is based on a uniform, massively parallel structure. Thus, our approach takes advantage of reusing predefined IP building blocks. This leads to a short time to market, a high reliability and a scalable architecture. Our architecture is scalable to different areas of application by varying the number of integrated processors. Additionally, specific hardware accelerators can be embedded, which are optimized for the target application, in order to be especially resource-efficient in respect to power consumption, computational power and required area.

parallel computing in electrical engineering | 2004

Network application driven instruction set extensions for embedded processing clusters

Matthias Grünewald; Dinh Khoi Le; Uwe Kastens; Jörg-Christian Niemann; Mario Porrmann; Ulrich Rückert; Adrian Slowik; Michael Thies

This paper addresses the design automation of instruction set extensions for application-specific processors with emphasis on network processing. Within this domain, increasing performance demands and the ongoing development of network protocols both call for flexible and performance-optimized processors. Our approach represents a holistic methodology for the extension and optimization of a processors instruction set. The starting point is a concise yet powerful processor abstraction, which is well suited to automatically generate the important parts of a compiler backend and cycle-accurate simulator so that domain-characteristic benchmarks can be analyzed for frequently occurring instruction pairs. These instruction pairs are promising candidates for the extension of the instruction set by means of super-instructions. Provided that a new super-instruction meets a given performance threshold, a fine-grained performance reevaluation of the adapted processor design can be conducted instantly. With respect to the chosen domain-characteristic benchmark, the tool-chain pinpoints important characteristics such as execution performance, energy consumption, or chip area of the extended design. Using this holistic design methodology, we are able to judge a refinement of the processor rapidly.

Journal of Systems Architecture | 2007

Resource efficiency of the GigaNetIC chip multiprocessor architecture

Jörg-Christian Niemann; Christoph Puttmann; Mario Porrmann; Ulrich Rückert

In this article, we present the prototypical implementation of the scalable GigaNetIC chip multiprocessor architecture. We use an FPGA-based rapid prototyping system to verify the functionality of our architecture in a network application scenario before fabricating the ASIC in a modern CMOS standard cell technology. The rapid prototyping environment gives us the opportunity to test our multiprocessor architecture with Ethernet-based data streams in a real network scenario. Our system concept is based on a massively parallel processor structure. Due to its regularity, our architecture can be easily scaled to accommodate a wide range of packet processing applications with various performance and throughput requirements at high reliability. Furthermore, the composition based on predefined building blocks guarantees fast design cycles and simplifies system verification. We present standard cell synthesis results as well as a performance analysis for a firewall application with various couplings of hardware accelerators. Finally, we compare implementations of our architecture with state-of-the-art desktop CPUs. We use simple, general-purpose applications as well as the introduced packet processing tasks to determine the performance capabilities and the resource efficiency of the GigaNetIC architecture. We show that, if supported by the application, parallelism offers more opportunities than increasing clock frequencies.

ieee international workshop on system on chip for real time applications | 2003

A performance evaluation method for optimizing embedded applications

Matthias Grünewald; Jörg-Christian Niemann; Ulrich Rückert

Performance evaluation is an important step for designing embedded applications that require small footprints, low energy consumption and high throughput. We present a simulation-based method to characterize several resource properties (memory accesses, energy consumption, execution time) of embedded software that runs on dedicated processing engines targeted for SoC designs. The results of the characterization process are back-annotated to the source code to aid the designer in optimizing the implementation. Our approach allows the replacement of software parts by hardware units to speed up processing. We have performed case studies with software and hardware implementations of a pseudo-random number generator and a transmission error detector. The results show that computation speed-ups and energy reductions up to a factor of 15 can be obtained with implementations that exploit hardware extensions.

Network Processor Design#R##N#Issues and Practices Volume 3 | 2005

A framework for design space exploration of resource efficient network processing on multiprocessor SoCs

Matthias Grünewald; Jörg-Christian Niemann; Mario Porrmann; Ulrich Rückert

A framework for implementing network protocols on multiprocessor SoCs has been described. This chapter also shows how to model network protocol and hardware architecture, how to schedule the execution to guarantee a user-defined throughput per port, how to assign protocol functions to processors, and how to estimate resource consumption of the final mapping with a component-based estimation model. The methods can be applied in an early design phase and can be used to explore the design space created by different system parameters. A detailed exploration example is presented to show how a medium access protocol for mobile ad hoc networks can be mapped to a multiprocessor SoC with up to 64 processors. The analysis shows that such a system can achieve reasonable data rates for examined application at low power consumption. This methodology can also be used to quickly identify the bottlenecks of the system such as protocol functions whose computational complexity prevents a higher throughput. The framework cannot yet handle external memories. A possible way to include them is to add memory interfaces to border PEs and to form groups of border PEs that share one memory channel. The external memory can be used to store large data structures such as packet queues or filter tables. By knowing the access pattern of the packet methods and the contention resolution mechanism on shared memory channel, worst-case memory access times can be determined analytically. Hence, the influence of the external memory accesses on the execution time of the packet methods can be modeled and included in the scheduling scheme.

design, automation, and test in europe | 2004

A mapping strategy for resource-efficient network processing on multiprocessor SoCs

Matthias Grünewald; Jörg-Christian Niemann; Mario Porrmann; Ulrich Rückert

Hardware architectures based on a field of hardware-extended processors can provide flexible computing power for applications where parallelism can be exploited. For multiprocessors, the assignment of functionality to execution units can have a great impact on the performance. Additionally, finding the optimal mapping can be a time-consuming task. We present a multiprocessor architecture along with a suitable design method that includes an automated solution to the mapping problem. Our hardware architecture employs a network-on-chip (NoC) to achieve a high degree of scalability for the application and for the system in respect to future integration technologies. We also show how to reduce the packet buffer requirements with a proper scheduling strategy and present first estimates for the resource consumption of an application targeted for mobile networking.

automation, robotics and control systems | 2006

GigaNetIC – a scalable embedded on-chip multiprocessor architecture for network applications

Jörg-Christian Niemann; Christoph Puttmann; Mario Porrmann; Ulrich Rückert

In this paper, we present the prototypical implementation of the scalable GigaNetIC chip multiprocessor architecture. We use an FPGA-based rapid prototyping system to verify the functionality of our architecture in a network application scenario before we are going to fabricate the ASIC in a modem CMOS standard cell technology. The rapid prototyping environment gives us the opportunity to test our multiprocessor architecture with Ethernet-based data streams in a real network scenario. Our system concept is based on a massively parallel processor structure. Due to its regularity, our architecture can be easily scaled to accommodate a wide range of packet processing applications with disparate performance and throughput requirements at high reliability. Furthermore, the composition from predefined building blocks guarantees fast design cycles and simplifies system verification. We present standard cell synthesis results as well as a performance analysis for a firewall application with various couplings of hardware accelerators.

international embedded systems symposium | 2005

ADAPTABLE SWITCH BOXES AS ON-CHIP ROUTING NODES FOR NETWORKS-ON-CHIP

Ralf Eickhoff; Jörg-Christian Niemann; Mario Porrmann; Ulrich Rückert

Due to continuous advancements in modern technology processes which have resulted in integrated circuits with smaller feature sizes and higher complexity, current system-on-chip designs consist of many different components such as memories, interfaces and microprocessors. To handle this growing number of components, an efficient communication structure must be provided and incorporated during system design. This work deals with the implementation of an efficient communication structure for an on-chip multiprocessor design. The internal structure of one node is proposed and specified by its requirements. Furthermore, different routing strategies are implemented. Moreover, the communication structure is mapped on a standard cell process to examine the achieved processing speed and to determine the area requirements.

automation, robotics and control systems | 2007

A multiprocessor cache for massively parallel soc architectures

Jörg-Christian Niemann; Christian Liß; Mario Porrmann; Ulrich Rückert

In this paper, we present an advanced multiprocessor cache architecture for chip multiprocessors (CMPs). It is designed for the scalable GigaNetIC CMP, which is based on massively parallel on-chip computing clusters. Our write-through multiprocessor cache is configurable in respect to the most relevant design options. It is supposed to be used in universal co-processors as well as in network processing units. For an early verification of the software and an early exploration of various hardware configurations, we have developed a SystemC-based simulation model for the complete chip multiprocessor. For detailed hardware-software co-verification, we use our FPGA-based rapid prototyping system RAPTOR2000 to emulate our architecture with near-ASIC performance. Finally, we demonstrate the performance gains for different application scenarios enabled by the usage of our multiprocessor cache.

Explore More