
Publications

Featured research published by Gundolf Kiefer.


Reconfigurable Computing and FPGAs | 2011

Object Recognition on a Chip: A Complete SURF-Based System on a Single FPGA

Michael Schaeferling; Gundolf Kiefer

This paper describes a system for robust optical object recognition based on sophisticated point features, implemented entirely in a medium-size FPGA. All components needed to process image data are integrated in a System-on-Chip, including a special IP core which accelerates the feature detection step of the Speeded Up Robust Features (SURF) algorithm. The task of object recognition is solved by a lightweight matching algorithm. The system was evaluated with a set of 60 scene images. All 7 test objects were recognized at a sensitivity of 93% with no false positives. The minimum total execution time for one frame was 191 ms, and the average time was 481 ms.
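
The abstract leaves the lightweight matching step unspecified; a common baseline for SURF-style descriptors is brute-force nearest-neighbor matching with a distance-ratio test, sketched below. The 64-element descriptor size and the 0.7 ratio are illustrative assumptions, not values from the paper.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// A SURF descriptor is commonly a 64-dimensional float vector
// (assumption; the paper does not state its matching scheme here).
using Descriptor = std::array<float, 64>;

float squaredDistance(const Descriptor& a, const Descriptor& b) {
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Brute-force nearest-neighbor matching with a distance-ratio test:
// accept a match only if the best candidate is clearly better than
// the second best. Returns, per scene descriptor, the index of the
// matched model descriptor, or -1 if no reliable match exists.
std::vector<int> matchDescriptors(const std::vector<Descriptor>& scene,
                                  const std::vector<Descriptor>& model,
                                  float ratio = 0.7f) {
    std::vector<int> matches(scene.size(), -1);
    for (std::size_t s = 0; s < scene.size(); ++s) {
        float best = 1e30f, second = 1e30f;
        int bestIdx = -1;
        for (std::size_t m = 0; m < model.size(); ++m) {
            float d = squaredDistance(scene[s], model[m]);
            if (d < best) { second = best; best = d; bestIdx = int(m); }
            else if (d < second) { second = d; }
        }
        // Ratio is squared because we compare squared distances.
        if (best < ratio * ratio * second) matches[s] = bestIdx;
    }
    return matches;
}
```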


Reconfigurable Computing and FPGAs | 2010

Flex-SURF: A Flexible Architecture for FPGA-Based Robust Feature Extraction for Optical Tracking Systems

Michael Schaeferling; Gundolf Kiefer

In this paper, we propose a novel architecture to accelerate the Speeded Up Robust Features (SURF) algorithm using configurable hardware. SURF is used in optical tracking systems to robustly detect distinguishable features within an image in a scale- and rotation-invariant way. In its performance-critical part, SURF computes convolution filters at multiple scale levels without the need to create down-sampled versions of the original image. However, the algorithm exhibits a very irregular memory access pattern. We designed a configurable and scalable architecture to overcome these memory access issues without using any internal block RAM resources of the FPGA. The complete detector and descriptor stage of SURF has been implemented and validated on a Virtex 5 FPGA.
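
The reason SURF can filter at multiple scales without downsampling is the integral image: a box filter of any size costs four memory reads. A minimal sketch of that data structure (not the paper's hardware design):

```cpp
#include <cstddef>
#include <vector>

// Integral image: ii(x, y) holds the sum of all pixels above and to
// the left. Any box filter then costs four reads regardless of its
// size, which is why SURF evaluates multiple scale levels directly
// on the original image.
struct IntegralImage {
    std::size_t w, h;
    std::vector<long long> ii;  // (w+1) x (h+1), zero-padded border

    IntegralImage(const std::vector<unsigned char>& img,
                  std::size_t width, std::size_t height)
        : w(width), h(height), ii((width + 1) * (height + 1), 0) {
        for (std::size_t y = 0; y < h; ++y)
            for (std::size_t x = 0; x < w; ++x)
                ii[(y + 1) * (w + 1) + (x + 1)] =
                    img[y * w + x]
                    + ii[y * (w + 1) + (x + 1)]   // row above
                    + ii[(y + 1) * (w + 1) + x]   // column to the left
                    - ii[y * (w + 1) + x];        // counted twice
    }

    // Sum over the rectangle [x0, x1) x [y0, y1): four reads, any size.
    long long boxSum(std::size_t x0, std::size_t y0,
                     std::size_t x1, std::size_t y1) const {
        return ii[y1 * (w + 1) + x1] - ii[y0 * (w + 1) + x1]
             - ii[y1 * (w + 1) + x0] + ii[y0 * (w + 1) + x0];
    }
};
```

At large scales those four reads lie far apart in memory; that is exactly the irregular access pattern the Flex-SURF architecture is designed to tolerate.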


International Conference of the IEEE Engineering in Medicine and Biology Society | 2006

Fast maximum intensity projections of large medical data sets by exploiting hierarchical memory architectures

Gundolf Kiefer; Helko Lehmann; Jürgen Weese

Maximum intensity projections (MIPs) are an important visualization technique for angiographic data sets. Efficient data inspection requires frame rates of at least five frames per second while preserving image quality. Despite the advances in computer technology, this task remains a challenge. On the one hand, the sizes of computed tomography and magnetic resonance images are increasing rapidly. On the other hand, rendering algorithms do not automatically benefit from the advances in processor technology, especially for large data sets. This is due to the gap between the faster-evolving processing power and the slower-evolving memory access speed, which is bridged by hierarchical cache memory architectures. In this paper, we investigate memory access optimization methods and use them for generating MIPs on general-purpose central processing units (CPUs) and graphics processing units (GPUs), respectively. These methods can work on any level of the memory hierarchy, and we show that properly combined methods can optimize memory access on multiple levels of the hierarchy at the same time. We present performance measurements to compare different algorithm variants and illustrate the influence of the respective techniques. On current hardware, efficient handling of the memory hierarchy improves CPU rendering performance by a factor of 3 to 4. On GPUs, we observed that the effect is even larger, especially for large data sets. The methods can easily be adjusted to different hardware specifics, although their impact can vary considerably. They can also be used for rendering techniques other than MIPs, and their use for more general image processing tasks could be investigated in the future.
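
To illustrate the idea of memory-hierarchy-aware rendering, here is a minimal CPU sketch of a brick-ordered MIP along one axis. The brick size of 32 is an illustrative assumption; the real payoff comes with arbitrary viewing directions, where naive per-ray traversal strides through the whole volume.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Brick-ordered maximum intensity projection along the z axis: the
// volume is visited brick by brick so every cache line loaded from
// the volume is reused before it is evicted. (For this axis-aligned
// case a naive loop order is already cache-friendly; bricking pays
// off for general viewing directions.)
void mipBricked(const std::vector<uint16_t>& vol,
                std::size_t nx, std::size_t ny, std::size_t nz,
                std::vector<uint16_t>& out /* nx * ny pixels */) {
    const std::size_t B = 32;  // assumed brick edge length
    out.assign(nx * ny, 0);
    for (std::size_t z0 = 0; z0 < nz; z0 += B)
      for (std::size_t y0 = 0; y0 < ny; y0 += B)
        for (std::size_t x0 = 0; x0 < nx; x0 += B)
          // Inner loops stay inside one brick of the volume.
          for (std::size_t z = z0; z < std::min(z0 + B, nz); ++z)
            for (std::size_t y = y0; y < std::min(y0 + B, ny); ++y)
              for (std::size_t x = x0; x < std::min(x0 + B, nx); ++x)
                out[y * nx + x] =
                    std::max(out[y * nx + x], vol[(z * ny + y) * nx + x]);
}
```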


Medical Imaging 2006: Visualization, Image-Guided Procedures, and Display | 2006

Visualizing the beating heart: interactive direct volume rendering of high-resolution CT time series using standard PC hardware

Helko Lehmann; Olivier Ecabert; Dieter Geller; Gundolf Kiefer; Jürgen Weese

Modern multi-slice CT (MSCT) scanners allow acquisitions of 3D data sets covering the complete heart at different phases of the cardiac cycle. This enables the physician to non-invasively study the dynamic behavior of the heart, such as wall motion. To this end, an interactive 4D visualization of the heart in motion is desirable. However, the application of well-known volume rendering algorithms enforces considerable sacrifices in image quality to ensure interactive frame rates, even when accelerated by standard graphics processors (GPUs). The performance of pure CPU implementations of direct volume rendering algorithms is limited even for moderate volume sizes by both the number of required computations and the available memory bandwidth. Despite offering higher computational performance and more memory bandwidth, GPU-accelerated implementations cannot provide interactive visualizations of large 4D data sets, since data sets that do not fit into the onboard graphics memory are often not handled efficiently. In this paper, we present a software architecture for GPU-based direct volume rendering algorithms that allows the interactive high-quality visualization of large medical time series data sets. In contrast to other work, our architecture exploits the complete memory hierarchy for high cache and bandwidth efficiency. Additionally, several data-dependent techniques are incorporated to reduce the amount of volume data to be transferred and rendered. None of these techniques sacrifices image quality to improve speed. By applying the method to several multi-phase MSCT cardiac data sets, we show that we can achieve interactive frame rates on currently available standard PC hardware.
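
One data-dependent technique of the kind the abstract describes can be sketched as a per-brick maximum: bricks whose maximum intensity falls below the visible range are neither uploaded nor rendered, and the test is exact, so image quality is untouched. The brick edge length of 32 and 16-bit voxels are assumed, illustrative choices.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compute the maximum voxel value per brick of a volume. During
// rendering, a brick is transferred to the GPU and traversed only if
// its maximum is at or above the smallest visible intensity, which
// reduces both bandwidth and rendering work without quality loss.
std::vector<uint16_t> buildBrickMaxima(const std::vector<uint16_t>& vol,
                                       std::size_t nx, std::size_t ny,
                                       std::size_t nz, std::size_t B = 32) {
    std::size_t bx = (nx + B - 1) / B, by = (ny + B - 1) / B,
                bz = (nz + B - 1) / B;
    std::vector<uint16_t> bricks(bx * by * bz, 0);
    for (std::size_t z = 0; z < nz; ++z)
        for (std::size_t y = 0; y < ny; ++y)
            for (std::size_t x = 0; x < nx; ++x) {
                uint16_t& m = bricks[((z / B) * by + y / B) * bx + x / B];
                m = std::max(m, vol[(z * ny + y) * nx + x]);
            }
    return bricks;
}
```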


Field Programmable Logic and Applications | 2014

An efficient FPGA-based hardware framework for natural feature extraction and related Computer Vision tasks

Matthias Pohl; Michael Schaeferling; Gundolf Kiefer

This paper presents an efficient and flexible framework for extensive image processing tasks. While most available frameworks concentrate on pixel-based modules and interfaces for image preprocessing tasks, our proposal also covers the seamless integration of higher-level algorithms. Window-oriented filter operations, such as noise filters, edge filters or natural feature detectors, are performed within an efficient 2D window pipeline. This structure is generated and optimized automatically based on a user-defined filter configuration. For complex, higher-level algorithms, an optimized array of independent, software-based processing units is generated. As an example application, we chose object recognition based on the well-known SURF algorithm (“Speeded Up Robust Features”), which performs natural feature detection and description. All involved image processing steps were successfully mapped to our architecture. Thus, exploiting the FPGA’s full potential regarding parallelism, we synthesized one of the most efficient SURF detectors and a complete object recognition system in a single mid-size FPGA.
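
A software model of the 2D window pipeline idea: pixels arrive as a stream, the last N-1 image lines live in line buffers, and every cycle a complete N x N window is presented to the filter, mirroring the shift-register structure such frameworks generate in hardware. The 3 x 3 window and Sobel kernel are illustrative, not details from the paper.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int N = 3;  // window size; illustrative

// Example filter: horizontal Sobel gradient on the full window.
int16_t sobelX(const std::array<std::array<uint8_t, N>, N>& w) {
    return int16_t((w[0][2] + 2 * w[1][2] + w[2][2])
                 - (w[0][0] + 2 * w[1][0] + w[2][0]));
}

// Streaming model of a hardware window pipeline: one pixel enters per
// "clock", N-1 previous lines are held in line buffers, and the N x N
// window is a shift register, as in the generated FPGA logic.
std::vector<int16_t> windowPipeline(const std::vector<uint8_t>& img,
                                    std::size_t width, std::size_t height) {
    std::vector<std::vector<uint8_t>> lineBuf(
        N - 1, std::vector<uint8_t>(width, 0));
    std::array<std::array<uint8_t, N>, N> win{};
    std::vector<int16_t> out(img.size(), 0);

    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x) {
            uint8_t px = img[y * width + x];
            for (int r = 0; r < N; ++r)                 // shift window left
                for (int c = 0; c + 1 < N; ++c) win[r][c] = win[r][c + 1];
            for (int r = 0; r + 1 < N; ++r)             // new right column
                win[r][N - 1] = lineBuf[r][x];
            win[N - 1][N - 1] = px;
            for (int r = 0; r + 2 < N; ++r)             // age line buffers
                lineBuf[r][x] = lineBuf[r + 1][x];
            lineBuf[N - 2][x] = px;
            if (y >= N - 1 && x >= N - 1)               // window valid:
                out[(y - 1) * width + (x - 1)] = sobelX(win);  // center
        }
    return out;
}
```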


Medical Imaging 2005: Visualization, Image-Guided Procedures, and Display | 2005

Visualization of large medical data sets using memory-optimized CPU and GPU algorithms

Gundolf Kiefer; Helko Lehmann; Juergen Weese

With the evolution of medical scanners towards higher spatial resolutions, the sizes of image data sets are increasing rapidly. To profit from the higher resolution in medical applications such as 3D angiography for a more efficient and precise diagnosis, high-performance visualization is essential. However, to make sure that the performance of a volume rendering algorithm scales with the performance of future computer architectures, technology trends need to be considered. The design of such scalable volume rendering algorithms remains challenging. One of the major trends in the development of computer architectures is the wider use of cache memory hierarchies to bridge the growing gap between the faster-evolving processing power and the slower-evolving memory access speed. In this paper, we propose ways to exploit the standard PC’s cache memories supporting the main processors (CPUs) and the graphics hardware (graphics processing unit, GPU), respectively, for computing Maximum Intensity Projections (MIPs). To this end, we describe a generic and flexible way to improve the cache efficiency of software ray casting algorithms and show by means of cache simulations that it enables cache miss rates close to the theoretical optimum. For GPU-based rendering, we propose a similar, brick-based technique to optimize the utilization of onboard caches and the transfer of data to the GPU’s onboard memory. All algorithms produce images of identical quality, which enables us to compare the performance of their implementations fairly, without trading quality for speed. Our comparison indicates that the proposed methods are superior, in particular for large data sets.
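
A brick-based volume layout of the general kind this line of work builds on can be sketched as an address translation: voxels are stored brick by brick instead of line by line, so neighbors along all three axes tend to share cache lines. The 8x8x8 brick size (a power of two, for cheap bit arithmetic) is an assumed, illustrative value.

```cpp
#include <cstddef>

constexpr std::size_t BRICK_BITS = 3;            // brick edge = 8 voxels
constexpr std::size_t BRICK = std::size_t(1) << BRICK_BITS;
constexpr std::size_t MASK = BRICK - 1;

// Map volume coordinates (x, y, z) to a linear index in brick-major
// storage: first locate the brick, then the voxel inside the brick.
std::size_t brickIndex(std::size_t x, std::size_t y, std::size_t z,
                       std::size_t bricksX, std::size_t bricksY) {
    std::size_t bx = x >> BRICK_BITS, by = y >> BRICK_BITS,
                bz = z >> BRICK_BITS;
    std::size_t brick = (bz * bricksY + by) * bricksX + bx;
    std::size_t inBrick = ((z & MASK) * BRICK + (y & MASK)) * BRICK
                        + (x & MASK);
    return brick * (BRICK * BRICK * BRICK) + inBrick;
}
```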


Electronic Imaging | 2003

Implementation of a nonlinear gradient adaptive filter for processing of large-size medical sequences on general-purpose hardware

Kai Eck; Holger Fillbrandt; Gundolf Kiefer; Til Aach

To achieve significant noise reduction in medical images while at the same time preserving fine structures of diagnostic value, a non-linear filter called the multi-resolution gradient adaptive filter (MRGAF) was developed. Though the algorithm is well suited for its task of noise reduction in medical images, its computational complexity has so far limited it to offline processing in medical workstations. The aim of our study is to achieve real-time processing of data from low-cost X-ray systems on a standard PC without additional hardware. One major drawback of the original MRGAF procedure is its irregular memory access behavior, caused by the intermediate multi-resolution representation of the image (Laplacian pyramid). This is addressed by completely re-arranging the computation. The image is divided into super-lines carrying all relevant information of all pyramid levels, which allows the complete MRGAF procedure to be applied in a single pass. This way, cache utilization is improved considerably, the total number of memory accesses is reduced, and the use of the super-scalar processing capabilities of current processors is facilitated. The current implementation applies advanced multi-resolution non-linear noise reduction to images of 768 × 564 pixels at a rate of more than 30 frames per second on a workstation. This shows that high-quality real-time image enhancement is feasible from a technical as well as from an economical point of view.
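
The abstract does not spell out the MRGAF kernel itself; a generic single-level sketch of gradient-adaptive smoothing (the exponential weight and sigma value are assumptions, not the paper's formula) shows the principle of averaging flat regions while preserving edges. MRGAF applies such filtering on every Laplacian pyramid level, which the super-line scheme fuses into one pass.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Gradient-adaptive smoothing: each 4-neighbor contributes with a
// weight that falls off with the local intensity difference, so flat
// (noisy) regions are averaged while strong edges are preserved.
std::vector<float> gradientAdaptiveFilter(const std::vector<float>& img,
                                          std::size_t w, std::size_t h,
                                          float sigma = 10.0f) {
    std::vector<float> out(img);
    const int dx[4] = {-1, 1, 0, 0}, dy[4] = {0, 0, -1, 1};
    for (std::size_t y = 1; y + 1 < h; ++y)
        for (std::size_t x = 1; x + 1 < w; ++x) {
            float c = img[y * w + x];
            float sum = c, wsum = 1.0f;
            for (int k = 0; k < 4; ++k) {
                float n = img[(y + dy[k]) * w + (x + dx[k])];
                float wk = std::exp(-std::abs(n - c) / sigma);
                sum += wk * n;
                wsum += wk;
            }
            out[y * w + x] = sum / wsum;
        }
    return out;
}
```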


Reconfigurable Computing and FPGAs | 2016

A configurable architecture for the Generalized Hough Transform applied to the analysis of huge aerial images and to traffic sign detection

Gundolf Kiefer; Matthias Vahl; Julian Sarcher; Michael Schaeferling

Object recognition in huge image data sets or in live camera images at interactive frame rates is a very demanding task, especially within embedded systems. The recognition task includes the localization of a reference object in a search image, together with its rotation and scaling. The Generalized Hough Transform (GHT) is known as a powerful and robust technique to support this task by transforming the search image into a 4D parameter space. However, the GHT itself is very complex and places high demands on computational power and memory. This paper presents a novel hardware architecture to perform a complete 4D GHT at interactive frame rates in an FPGA. The architecture is configurable to allow a trade-off between performance, accuracy and hardware usage. The proposed architecture has been implemented in a low-cost Zynq-7000 FPGA and successfully evaluated in two practical applications, namely groyne detection in aerial images and traffic sign detection.
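
For reference, the core of the GHT is a voting loop over edge pixels. The sketch below shows the translation-only 2D case; the paper's architecture extends the accumulator to 4D (position, rotation, scale). The R-table layout and direction binning are generic textbook choices, not details from the paper.

```cpp
#include <cstddef>
#include <vector>

struct Point { int x, y; };

constexpr float kTwoPi = 6.28318530718f;

// Translation-only GHT voting loop. The R-table maps a quantized
// gradient direction on the reference shape's boundary to the offsets
// from boundary point to reference point; each edge pixel of the
// search image votes for candidate reference-point positions.
// Edge directions are assumed to lie in [0, 2*pi).
std::vector<int> ghtVote(const std::vector<Point>& edges,
                         const std::vector<float>& edgeDir,
                         const std::vector<std::vector<Point>>& rTable,
                         std::size_t accW, std::size_t accH) {
    const std::size_t dirBins = rTable.size();
    std::vector<int> acc(accW * accH, 0);
    for (std::size_t i = 0; i < edges.size(); ++i) {
        std::size_t bin =
            std::size_t(edgeDir[i] / kTwoPi * float(dirBins)) % dirBins;
        for (const Point& r : rTable[bin]) {
            int cx = edges[i].x + r.x, cy = edges[i].y + r.y;
            if (cx >= 0 && cy >= 0 && cx < int(accW) && cy < int(accH))
                ++acc[std::size_t(cy) * accW + std::size_t(cx)];
        }
    }
    return acc;  // peaks in acc mark likely object positions
}
```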


Medical Imaging 2007: Visualization and Image-Guided Procedures | 2007

Efficient hardware accelerated rendering of multiple volumes by data dependent local render functions

Helko Lehmann; Dieter Geller; Jürgen Weese; Gundolf Kiefer

The inspection of a patient’s data for diagnostics, therapy planning or therapy guidance involves an increasing number of 3D data sets, e.g. acquired by different imaging modalities, with different scanner settings or at different times. To enable viewing of the data in one consistent anatomical context, fused interactive renderings of multiple 3D data sets are desirable. However, interactive fused rendering of typical medical data sets using standard computing hardware remains a challenge. In this paper, we present a method to render multiple 3D data sets. By introducing local rendering functions, i.e. functions that are adapted to the complexity of the visible data contained in the different regions of a scene, we can ensure that the overall performance of fused rendering of multiple data sets depends on the actual amount of visible data. This is in contrast to other approaches, where the performance depends mainly on the number of rendered data sets. We integrate the method into a streaming rendering architecture with brick-based data representations of the volume data. This enables efficient handling of data sets that do not fit into the graphics board memory and good utilization of the texture caches. Furthermore, transfer and rendering of volume data that does not contribute to the final image can be avoided. We illustrate the benefits of our method with experiments on clinical data.
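
The local-render-function idea can be sketched as a per-brick dispatch: each region of the scene records which volumes are actually visible in it, and the renderer picks the cheapest function that handles exactly that combination, so cost scales with visible data rather than with the number of loaded volumes. The names and the bitmask encoding below are illustrative assumptions.

```cpp
// Bit i set means volume i contains visible data in this brick.
using VolumeMask = unsigned;

enum class RenderFunc { Skip, SingleVolume, FusedTwo, FusedGeneric };

// Pick the cheapest render function for a brick, given which of the
// co-registered volumes are visible there.
RenderFunc selectLocalRenderFunc(VolumeMask visible) {
    if (visible == 0) return RenderFunc::Skip;      // nothing to draw
    if ((visible & (visible - 1)) == 0)             // exactly one bit set
        return RenderFunc::SingleVolume;            // no fusion needed
    unsigned n = 0;                                 // count set bits
    for (VolumeMask m = visible; m; m &= m - 1) ++n;
    return n == 2 ? RenderFunc::FusedTwo : RenderFunc::FusedGeneric;
}
```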


Archive | 2005

Adaptive sampling along edges for surface rendering

Jürgen Weese; Marc Busch; Gundolf Kiefer; Helko Lehmann
