Ulrich Rueckert | Researchain

Publication

Featured research published by Ulrich Rueckert.

International Solid-State Circuits Conference | 2012

A 200mV 32b subthreshold processor with adaptive supply voltage control

Sven Luetkemeier; Thorsten Jungeblut; Mario Porrmann; Ulrich Rueckert

In recent years, subthreshold operation has become a research focus for digital systems with a limited energy budget (e.g., mobile, battery-powered devices, radio frequency identification (RFID), wireless sensor networks, or biomedical applications). Subthreshold operation achieves very low power consumption by reducing the supply voltage of the circuit below the threshold voltage of the transistors. Because dynamic power depends quadratically on the supply voltage and static power depends exponentially on it, considerable power savings are achieved. At the same time, propagation delays increase due to the reduced transistor currents. Overall, energy consumption per cycle can typically be reduced by a factor of 10 using subthreshold operation.
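
The quadratic dependence of dynamic energy on supply voltage can be checked with a short calculation. This is a minimal sketch that models only switching energy (E = activity x C_eff x Vdd^2); the 1 nF effective capacitance and the 1.2 V nominal supply are assumptions chosen for illustration, and only the 200 mV operating point comes from the paper title.

```python
def dynamic_energy_per_cycle(c_eff, vdd, activity=1.0):
    """Switching energy per clock cycle: E = activity * C_eff * Vdd**2 (joules)."""
    return activity * c_eff * vdd ** 2

# Assumed values for illustration: 1 nF effective switched capacitance,
# a 1.2 V nominal supply versus the 200 mV subthreshold supply.
nominal = dynamic_energy_per_cycle(1e-9, 1.2)
subthreshold = dynamic_energy_per_cycle(1e-9, 0.2)
print(f"dynamic energy ratio: {nominal / subthreshold:.0f}x")  # 36x
```

Leakage is deliberately left out of this model. Because each subthreshold cycle takes much longer, leakage energy per cycle grows, which is consistent with the overall reduction quoted above being closer to 10x than to the dynamic-only ratio.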

ACM Transactions on Reconfigurable Technology and Systems | 2011

Applying dynamic reconfiguration in the mobile robotics domain: A case study on computer vision algorithms

Federico Nava; Donatella Sciuto; Marco D. Santambrogio; Stefan Herbrechtsmeier; Mario Porrmann; Ulf Witkowski; Ulrich Rueckert

Mobile robots are widely used in industrial environments and are expected to become widely available in human environments in the near future, for example, as care and service robots. This article proposes an implementation of a highly customizable color recognition module based on Field Programmable Gate Array (FPGA) hardware to accomplish tasks such as real-time processing of image streams. Compared to a pure software solution on a CPU, an attached FPGA-based hardware accelerator enables real-time image processing and significantly reduces the computing load on the CPU. The CPU can instead be used for tasks that cannot be implemented efficiently on FPGAs, for example, because of a large control overhead. We concentrate on a multirobot scenario in which a group of robots follows a human team member while keeping a specific formation in order to support the human in exploration and object detection. Additionally, the robots provide a communication infrastructure that maintains a stable multihop communication network between the human and a base station, which records all actions and evaluates the captured images and transmitted data. Depending on the current operating conditions, the robot system has to be able to execute a wide variety of tasks. Since only a small number of tasks have to be executed concurrently, dynamic reconfiguration of the FPGA can be used to avoid implementing all tasks on the FPGA in parallel. Within this context, this article discusses application fields where dynamic reconfiguration of FPGA-based coprocessors significantly reduces the CPU load and presents examples of how dynamic reconfiguration can be used in exploration.
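
The idea of loading only the currently needed tasks can be illustrated with a toy model of a few reconfigurable regions on the FPGA. This is a sketch under stated assumptions, not the article's mechanism: the class name, the task names, and the least-recently-used replacement policy are all hypothetical choices made for illustration.

```python
from collections import OrderedDict

class ReconfigurableRegions:
    """Toy model (hypothetical): a fixed number of partially reconfigurable regions.

    Only the hardware tasks needed right now stay loaded; when all regions are
    occupied, the least recently used task is swapped out (an assumed policy).
    """

    def __init__(self, num_regions):
        self.num_regions = num_regions
        self.loaded = OrderedDict()   # task name -> region index, ordered by last use
        self.reconfigurations = 0

    def request(self, task):
        """Ensure `task` is loaded and return the region index it occupies."""
        if task in self.loaded:
            self.loaded.move_to_end(task)                 # hit: mark as most recently used
            return self.loaded[task]
        if len(self.loaded) < self.num_regions:
            region = len(self.loaded)                     # a free region is still available
        else:
            _, region = self.loaded.popitem(last=False)   # evict the least recently used task
        self.loaded[task] = region                        # stand-in for writing a partial bitstream
        self.reconfigurations += 1
        return region

# Usage with hypothetical task names and two regions
fpga = ReconfigurableRegions(num_regions=2)
for task in ["color_detect", "edge_filter", "color_detect", "feature_match", "color_detect"]:
    fpga.request(task)
print(fpga.reconfigurations)  # 3
```

With two regions, five task requests cost only three reconfigurations, because the frequently used color detection module stays resident instead of being implemented alongside every other task.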

Adaptive Hardware and Systems | 2012

A scalable platform for run-time reconfigurable satellite payload processing

Jens Hagemeyer; Arne Hilgenstein; Dirk Jungewelter; Dario Cozzi; Carmelo Felicetti; Ulrich Rueckert; Sebastian Korf; Markus Koester; Fabio Margaglia; Mario Porrmann; Florian Dittmann; Michael Ditze; Julian Harris; Luca Sterpone; Jorgen Ilstad

Reconfigurable hardware is attracting steadily growing interest in the domain of space applications. The ability to reconfigure the information processing infrastructure at runtime, together with the high computational power of today's FPGA architectures at relatively low power, makes these devices interesting candidates for data processing in space applications. Partial dynamic reconfiguration of FPGAs enables maximum flexibility and can be used to increase performance, improve energy efficiency, and enhance fault tolerance. To demonstrate the effectiveness of these novel approaches for satellite payload processing, a highly scalable prototyping environment has been developed that combines dynamically reconfigurable FPGAs with the required interfaces, such as SpaceWire, MIL-STD-1553B, and SpaceFibre. Up to 30 SpaceWire interfaces, 5 copper-based SpaceFibre interfaces, and 270 GPIOs can be realized and combined with one to five dynamically reconfigurable Xilinx FPGAs and up to 20 GByte of working memory. The implemented approach enables partial reconfiguration at 400 MByte/s. Both blind and readback scrubbing are supported, and the scrub rate can be adapted individually for different parts of the design.
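
The 400 MByte/s figure translates directly into how long swapping a module takes. The sketch below assumes decimal megabytes (10^6 bytes) and a hypothetical 2 MB partial bitstream; the paper does not state bitstream sizes here.

```python
def reconfiguration_time_ms(bitstream_bytes, throughput_bytes_per_s=400e6):
    """Time (ms) to write a partial bitstream at the given configuration throughput.

    Assumes 1 MByte = 10**6 bytes; with binary megabytes the result shrinks by 1.048576x.
    """
    return bitstream_bytes / throughput_bytes_per_s * 1e3

# Hypothetical partial bitstream of 2 MB
print(f"{reconfiguration_time_ms(2e6):.1f} ms")  # 5.0 ms
```

Because the time scales linearly with bitstream size, smaller reconfigurable regions can be swapped, and scrubbed, more often within the same bandwidth budget.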

Reconfigurable Computing and FPGAs | 2015

FPGA-based circular Hough transform with graph clustering for vision-based multi-robot tracking

Arif Irwansyah; Omar W. Ibraheem; Jens Hagemeyer; Mario Porrmann; Ulrich Rueckert

Shape-based object detection and recognition are frequently used methods in the field of computer vision. A well-known algorithm for circle detection is the Circular Hough Transform (CHT), which requires a large amount of memory and substantial computational resources. Field Programmable Gate Array (FPGA)-based hardware accelerators can handle such compute-intensive applications efficiently. In this paper, we present a resource-efficient FPGA-based architecture for the CHT algorithm. Additionally, we introduce a unique approach that combines the CHT algorithm with graph clustering. The combination of these algorithms, implemented on a Xilinx Virtex-4 FPGA, supports real-time vision-based multi-robot tracking. Furthermore, an efficient architecture is proposed that significantly reduces the memory required by the CHT module. For the graph clustering module, a multiplier-less distance calculation unit is implemented, significantly reducing the required FPGA resources. The proposed CHT design handles multi-robot localization with an accuracy of 97% and supports a maximum video resolution of 1024x1024 at 128 frames per second, resulting in 134 MPixel/s. Our design provides significantly higher throughput than other implementations on embedded processors, FPGAs, and general-purpose CPUs. Compared to an OpenCV implementation on a 3.2 GHz desktop CPU, our implementation achieves a speedup of more than 5.7x.
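
A software reference model helps show what the two hardware modules compute. The sketch below is an assumption-laden illustration, not the paper's architecture: it uses a fixed-radius CHT in which every edge pixel votes for all candidate centers at distance r, and it clusters candidate centers as connected components of a graph whose edges join points within a Manhattan (L1) distance threshold. L1 distance needs only additions and subtractions, which is one plausible reading of "multiplier-less"; the paper's exact distance metric is not stated here. As a sanity check on the quoted throughput, 1024 x 1024 x 128 = 134,217,728 pixels per second.

```python
import math
from collections import defaultdict

def circle_offsets(r, steps=360):
    """Integer (dx, dy) offsets approximating a circle of radius r, deduplicated."""
    pts = {(round(r * math.cos(2 * math.pi * k / steps)),
            round(r * math.sin(2 * math.pi * k / steps))) for k in range(steps)}
    return sorted(pts)

def hough_circle_votes(edge_points, r, width, height):
    """Fixed-radius CHT: each edge pixel votes for every center at distance ~r."""
    acc = [[0] * width for _ in range(height)]
    offsets = circle_offsets(r)
    for x, y in edge_points:
        for dx, dy in offsets:
            a, b = x - dx, y - dy
            if 0 <= a < width and 0 <= b < height:
                acc[b][a] += 1
    return acc

def peaks(acc, threshold):
    """Accumulator cells with at least `threshold` votes (candidate centers)."""
    return [(x, y) for y, row in enumerate(acc) for x, v in enumerate(row) if v >= threshold]

def l1(p, q):
    # Manhattan distance: additions and subtractions only, no multipliers
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def cluster(points, eps):
    """Connected components of the graph linking points with L1 distance <= eps."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if l1(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i, p in enumerate(points):
        groups[find(i)].append(p)
    # One robot per cluster, located at the integer mean of its member points
    return sorted((sum(p[0] for p in g) // len(g), sum(p[1] for p in g) // len(g))
                  for g in groups.values())

# Usage: a synthetic circle of radius 5 centered at (20, 20) in a 64x64 image
offs = circle_offsets(5)
edges = [(20 + dx, 20 + dy) for dx, dy in offs]
acc = hough_circle_votes(edges, 5, 64, 64)
votes, center = max((v, (x, y)) for y, row in enumerate(acc) for x, v in enumerate(row))
robots = cluster(peaks(acc, threshold=votes), eps=2)
print(center, robots)
```

The full 2-D accumulator used here is exactly the memory cost the paper's architecture is designed to reduce, so this model only fixes the expected results, not the hardware organization.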

Reconfigurable Computing and FPGAs | 2015

A resource-efficient multi-camera GigE vision IP core for embedded vision processing platforms

Omar W. Ibraheem; Arif Irwansyah; Jens Hagemeyer; Mario Porrmann; Ulrich Rueckert

In vision processing systems, many applications require multi-camera support. Connecting the cameras to the processing system requires multiple interfaces and a platform capable of handling sustained high data rates. To cope with these requirements, a hardware-based solution using FPGA technology is advisable, especially when targeting space- and energy-constrained embedded systems. The aim of this work is to develop and implement a scalable and resource-efficient FPGA-based multi-camera GigE Vision IP core for video and image processing. To reduce the number of interfaces needed, the IP core supports connecting multiple cameras to a single Gigabit Ethernet port via an Ethernet switch. The multi-camera GigE Vision IP core extracts the raw video data from multiple GigE Vision video streams, reconstructs the video frames of every camera, and passes these data on for further processing. Four GigE Vision cameras are used to test the system. The IP core is implemented on a Xilinx Virtex-4 FPGA and integrated into a complete video processing platform for a full system realization. In addition to the IP core, bilinear interpolation for demosaicing Bayer-pattern images and an automatic white balance algorithm are implemented to evaluate the platform. The hardware implementation has been benchmarked with a total resolution of up to 2048x2048 pixels. Achieved frame rates range from 25 fps to 345 fps, depending on the selected resolution and the number of cameras used.
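
Reconstructing frames from several interleaved camera streams amounts to demultiplexing packets by camera and frame, then reordering them. The sketch below uses a simplified, hypothetical packet model loosely inspired by the leader/payload/trailer structure of GigE Vision streaming; the field names and the `FrameAssembler` class are illustrative assumptions, not the actual wire format or the IP core's design.

```python
from collections import defaultdict
from typing import NamedTuple, Optional, Tuple

class Packet(NamedTuple):
    camera_id: int   # hypothetical: identifies the source camera (e.g. by IP/UDP port)
    block_id: int    # frame counter within one camera's stream
    packet_id: int   # position of the packet inside the frame
    kind: str        # "leader", "payload" or "trailer" (simplified model)
    payload: bytes = b""

class FrameAssembler:
    """Rebuilds per-camera frames from an interleaved packet stream (simplified model)."""

    def __init__(self) -> None:
        self._parts = defaultdict(dict)  # (camera_id, block_id) -> {packet_id: payload}

    def push(self, pkt: Packet) -> Optional[Tuple[int, int, bytes]]:
        """Feed one packet; return (camera_id, block_id, frame) when a frame completes."""
        key = (pkt.camera_id, pkt.block_id)
        if pkt.kind == "leader":
            self._parts[key] = {}                      # start a fresh frame buffer
        elif pkt.kind == "payload":
            self._parts[key][pkt.packet_id] = pkt.payload
        elif pkt.kind == "trailer":
            parts = self._parts.pop(key, {})
            # Reorder by packet_id so out-of-order arrival does not corrupt the frame
            frame = b"".join(parts[i] for i in sorted(parts))
            return pkt.camera_id, pkt.block_id, frame
        return None

# Usage: two cameras sharing one link, with camera 0's payload arriving out of order
stream = [
    Packet(0, 1, 0, "leader"), Packet(1, 7, 0, "leader"),
    Packet(0, 1, 2, "payload", b"CD"), Packet(1, 7, 1, "payload", b"xy"),
    Packet(0, 1, 1, "payload", b"AB"), Packet(1, 7, 2, "trailer"),
    Packet(0, 1, 3, "trailer"),
]
asm = FrameAssembler()
frames = [f for f in (asm.push(p) for p in stream) if f is not None]
print(frames)  # [(1, 7, b'xy'), (0, 1, b'ABCD')]
```

Keying buffers on (camera, frame) is what lets a single Ethernet port serve several cameras: packets from different sources can interleave freely, and each frame is emitted as soon as its own trailer arrives.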

Archive | 2005

Local Cluster Neural Network Chip for Control

Liang Zhang; Joaquin Sitte; Ulrich Rueckert

The local cluster neural network (LCNN) is an alternative to radial basis function (RBF) networks that performs well in digital simulation. The LCNN is suitable for analog VLSI implementation, which makes it attractive for a wide range of embedded neural network applications. In this paper, we present the input-output characterisation of the LCNN analog chip. The effect of manufacturing variations on the chip's function is investigated and analyzed.

IEEE Transactions on Neural Networks | 2007

Characterization of Analog Local Cluster Neural Network Hardware for Control

Joaquin Sitte; Liang Zhang; Ulrich Rueckert

The local cluster neural network (LCNN) was designed for analog realization and is especially suited to applications in control systems. It uses clusters of sigmoidal neurons to generate basis functions that are localized in multidimensional input space. Sigmoidal neurons are well suited to analog electronic realization. In this paper, we report the results of extensive measurements that characterize the computational capabilities of the first analog very large scale integration (VLSI) realization of the LCNN. Despite manufacturing fluctuations and the inherently low precision of analog electronics, the test results suggest that it may be suitable for use in feedback control systems.
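
How purely sigmoidal neurons can produce a basis function that is localized in several dimensions at once can be sketched numerically. This model assumes a difference-of-sigmoids construction: two shifted sigmoids form a 1-D bump per input dimension, the bumps are summed into a ridge, and a further sigmoid thresholds the ridge so that the output is high only where all dimensions are inside their windows. The exact formulation realized on the chip is not given in this abstract, and the parameter values (`width`, `slope`, `out_slope`) are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ridge(x, center, width, slope):
    """Sum over input dimensions of 1-D bumps, each the difference of two shifted sigmoids."""
    return sum(sigmoid(slope * (xi - ci + width)) - sigmoid(slope * (xi - ci - width))
               for xi, ci in zip(x, center))

def local_cluster(x, center, width=1.0, slope=10.0, out_slope=10.0):
    """Localized basis function: close to 1 only where every 1-D bump is active."""
    n = len(center)
    # Each bump is ~1 inside its window and ~0 outside, so the ridge sum is ~n at the
    # centre and at most ~n-1 if any coordinate lies outside; threshold in between.
    return sigmoid(out_slope * (ridge(x, center, width, slope) - (n - 0.5)))

def lcnn_output(x, clusters):
    """Network output: weighted sum of local clusters, given as (weight, center) pairs."""
    return sum(w * local_cluster(x, c) for w, c in clusters)

print(round(local_cluster((0.0, 0.0), (0.0, 0.0)), 3))  # inside the cluster: near 1
print(round(local_cluster((0.0, 5.0), (0.0, 0.0)), 3))  # outside along one axis: near 0
```

The final thresholding sigmoid is what separates this from a plain sum of ridges: without it, the function would remain large along each axis through the center instead of being confined to a region around it.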

CHIPS 2020 VOL. 2: New Vistas in Nanoelectronics | 2016

Brain-Inspired Architectures for Nanoelectronics

Ulrich Rueckert

Mapping brain-like structures and processes into electronic substrates has recently seen a revival with the availability of deep-submicron CMOS technology. The basic idea is to exploit the massive parallelism of such circuits to create low-power and fault-tolerant information-processing systems. Bio-inspiration offers alternative routes to (embedded) artificial intelligence that aim to overcome the major challenges of deep-submicron CMOS technology (power wall, reliability, design complexity). The challenge is to understand, design, build, and use new architectures for nanoelectronic systems that unify the best of brain-inspired information processing concepts and nanotechnology hardware, including both algorithms and architectures. The brain can serve as an inspiration at several different levels, with architectures ranging from innovative systems-on-chip to biologically inspired neural systems. This chapter introduces basic properties of biological brains and general approaches to realizing them in nanoelectronics. Modern implementations can reach the complexity scale of large functional units of biological brains, and they are able to learn through plasticity mechanisms found in neuroscience. Combined with high-performance programmable logic and elaborate software tools, such systems are currently evolving into user-configurable non-von-Neumann computing systems that can be used to implement and test novel computational paradigms. Consequently, large brain research programs have been launched worldwide. Four projects from the largest programs on brain-like electronic systems in Europe (Human Brain Project) and in the US (SyNAPSE) will be outlined in this chapter.

ACM Transactions on Reconfigurable Technology and Systems | 2010

Runtime Reconfiguration of Multiprocessors Based on Compile-Time Analysis

Madhura Purnaprajna; Mario Porrmann; Ulrich Rueckert; Michael Hussmann; Michael Thies; Uwe Kastens

In multiprocessors, performance improvement is typically achieved by exploiting parallelism at fixed granularities, such as instruction-level, task-level, or data-level parallelism. We introduce a new reconfiguration mechanism that allows these granularities to vary in order to optimize resource utilization in addition to improving performance. Our reconfigurable multiprocessor QuadroCore combines the advantages of reconfigurability and parallel processing. In this article, a unified hardware-software approach for the design of the QuadroCore is presented. This design flow is enabled by compiler-driven reconfiguration, which matches application-specific characteristics to a fixed set of architectural variations. A special reconfiguration mechanism has been developed that alters the architecture within a single clock cycle. The QuadroCore has been implemented on a Xilinx XC2V6000 for functional validation and in UMC's 90 nm standard cell technology for performance estimation. A diverse set of applications has been mapped onto the reconfigurable multiprocessor to meet orthogonal performance requirements in terms of time and power. Speedup measurements show a 2 to 11 times performance increase compared to a single processor. Additionally, the reconfiguration scheme has been applied to save power in data-parallel applications. Gate-level simulations have been performed to measure the power-performance trade-offs for two computationally complex applications. The power reports confirm that this reconfiguration scheme yields power savings in the range of 15 to 24%.
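
Compiler-driven reconfiguration can be pictured as a pass that classifies each program region and inserts a reconfiguration step only where the selected architectural variant changes. This is a toy sketch: the region attributes, the variant names `SINGLE`, `SIMD` and `MIMD`, and the selection rule are hypothetical and do not reproduce the QuadroCore's actual configuration set or analysis.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    data_parallel: bool   # same operation over independent data elements
    task_parallel: bool   # independent instruction streams

def choose_config(region):
    """Map region characteristics onto a hypothetical fixed set of variants."""
    if region.data_parallel:
        return "SIMD"
    if region.task_parallel:
        return "MIMD"
    return "SINGLE"       # sequential code runs on one core

def annotate(regions):
    """Emit a schedule with a reconfigure step only where the chosen variant changes."""
    schedule, current = [], None
    for region in regions:
        cfg = choose_config(region)
        if cfg != current:
            schedule.append(("reconfigure", cfg))
            current = cfg
        schedule.append(("run", region.name))
    return schedule

# Usage with hypothetical program regions
program = [
    Region("init", False, False),
    Region("fir_filter", True, False),
    Region("fft", True, False),
    Region("control", False, True),
]
for step in annotate(program):
    print(step)
```

Since the analysis happens at compile time, consecutive regions that want the same variant share one configuration, and the runtime cost reduces to the switches the schedule marks, which the article reports as taking a single clock cycle each.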

ACM Sigarch Computer Architecture News | 2009

Run-time reconfigurability in embedded multiprocessors

Madhura Purnaprajna; Mario Porrmann; Ulrich Rueckert

To meet application-specific performance demands, architectures are predominantly redesigned and customised. Every architectural change incurs huge overheads in design, verification, and fabrication, which together prolong time-to-market. As an alternative, configurable architectures adapt easily to different application domains without costly redesigns. To deal with application changes and custom requirements, a method of configuring and reusing the basic building blocks within processors is developed, which additionally enables co-operative multiprocessing. In this paper, a runtime reconfiguration mechanism for embedded multiprocessor architectures is proposed as a method to introduce customisations in the post-fabrication phase. A method of application description in conjunction with a flexible reconfigurable multiprocessor template is presented. Finally, the costs and benefits of this approach are analysed for computationally intensive algorithms used in digital signal processing, including the impact of application-specific characteristics on execution time, power consumption, and total energy dissipation.
