M. Balzer

Karlsruhe Institute of Technology

Publication

Featured research published by M. Balzer.

Journal of Real-time Image Processing | 2014

A comprehensive comparison of GPU- and FPGA-based acceleration of reflection image reconstruction for 3D ultrasound computer tomography

Matthias Birk; Michael Zapf; M. Balzer; Nicole V. Ruiter; Jürgen Becker

As today’s standard screening methods frequently fail to diagnose breast cancer before metastases have developed, earlier breast cancer diagnosis is still a major challenge. Three-dimensional ultrasound computer tomography promises high-quality images of the breast, but is currently limited by a time-consuming image reconstruction. In this work, we investigate the acceleration of the image reconstruction by GPUs and FPGAs. We compare the obtained performance results with a recent multi-core CPU. We show that both architectures are able to accelerate processing, whereas the GPU reaches the highest performance. Furthermore, we draw conclusions in terms of applicability of the accelerated reconstructions in future clinical application and highlight general principles for speed-up on GPUs and FPGAs.

Journal of Instrumentation | 2014

An ultra-fast data acquisition system for coherent synchrotron radiation with terahertz detectors

Michele Caselle; M. Balzer; Suren Chilingaryan; M. Hofherr; V. Judin; Andreas Kopmann; N. Smale; Petra Thoma; Stefan Wuensch; Anke-Susanne Müller; M. Siegel; M. Weber

The recording of coherent synchrotron radiation requires data acquisition systems with a temporal resolution of tens of picoseconds. This paper describes a new real-time and high-accuracy data acquisition system suitable for recording individual ultra-short pulses generated by a fast terahertz (THz) detector (e.g. YBCO, NbN, Zero Biased Schottky Diode). The system consists of a fast sampling board combined with a high data throughput readout. The sampling board is designed for sampling the fast pulse signals with a full width at half maximum (FWHM) from a few tens to one hundred picoseconds with a minimum sampling time of 3 ps. The high data throughput board consists of a PCIe-Bus Master DMA architecture used for fast data transfer up to 3 GByte/s. The full readout chain with fast THz detectors and the acquisition system has been successfully tested at the synchrotron ANKA. An overview of the electronics system and preliminary results with multi-bunch filling pattern will be presented.

Conference on Design and Architectures for Signal and Image Processing | 2011

Acceleration of image reconstruction in 3D ultrasound computer tomography: An evaluation of CPU, GPU and FPGA computing

Matthias Birk; Alexander Guth; Michael Zapf; M. Balzer; Nicole V. Ruiter; Michael Hübner; Jürgen Becker

As today's standard screening methods frequently fail to diagnose breast cancer before metastases have developed, earlier breast cancer diagnosis is still a major challenge. Three-dimensional ultrasound computer tomography promises high-quality images of the breast, but is currently limited by a time-consuming image reconstruction based on the synthetic aperture focusing technique. In this work, we investigate the acceleration of the image reconstruction by a GPU, and by the FPGAs embedded in our custom data acquisition system. We compare the obtained performance results with a recent multi-core CPU and show that both platforms are able to accelerate processing. The GPU reaches the highest performance. Furthermore, we draw conclusions in terms of applicability of the accelerated reconstructions in future clinical application and highlight general principles for speed-up on GPUs and FPGAs.
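The synthetic aperture focusing technique named in this abstract is, in its basic form, a delay-and-sum over all emitter-receiver pairs. The sketch below is a minimal pure-Python illustration of that general idea, not the authors' implementation: the function name `saft_reconstruct`, the data layout, and the toy geometry are assumptions made for illustration, and a constant speed of sound is assumed.

```python
import math

def saft_reconstruct(voxels, ascans, sound_speed, sample_rate):
    """Delay-and-sum sketch (hypothetical helper, not from the paper).

    voxels: list of (x, y, z) positions in metres.
    ascans: list of (emitter_pos, receiver_pos, samples) tuples.
    Returns one summed amplitude per voxel.
    """
    image = []
    for voxel in voxels:
        total = 0.0
        for emitter, receiver, samples in ascans:
            # Time of flight emitter -> voxel -> receiver at constant speed.
            path = math.dist(emitter, voxel) + math.dist(voxel, receiver)
            index = round(path / sound_speed * sample_rate)
            if 0 <= index < len(samples):
                total += samples[index]
        image.append(total)
    return image

# Toy data: one point scatterer at the origin seen by two transducer pairs.
echo = [0.0] * 200
echo[133] = 1.0  # 0.2 m path at 1500 m/s, sampled at 1 MHz
ascans = [
    ((-0.1, 0.0, 0.0), (0.1, 0.0, 0.0), echo),
    ((0.0, -0.1, 0.0), (0.0, 0.1, 0.0), echo),
]
voxels = [(0.0, 0.0, 0.0), (0.03, 0.03, 0.0)]
image = saft_reconstruct(voxels, ascans, sound_speed=1500.0, sample_rate=1e6)
print(image)
```

Every voxel is computed independently of the others, which is the property that makes this kind of reconstruction a natural fit for parallel hardware such as GPUs and FPGAs.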

Computers & Electrical Engineering | 2014

Evaluation of performance and architectural efficiency of FPGAs and GPUs in the 40 and 28nm generations for algorithms in 3D ultrasound computer tomography

Matthias Birk; M. Balzer; Nicole V. Ruiter; Juergen Becker

In heterogeneous computing, application developers have to identify the best-suited target platform from a variety of alternatives. In this work, we compare performance and architectural efficiency of Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) for two algorithms taken from a novel medical imaging method named 3D ultrasound computer tomography. From the 40nm and 28nm generations, we use top-notch devices and those with similar power consumption values. For our two benchmark algorithms from the signal processing and imaging domain, the results show that if power consumption is not considered, the GPU and FPGA from the 40nm generation deliver both similar performance and similar efficiency per transistor. In the 28nm process, in contrast, the FPGA is superior to its GPU counterpart by 86% and 39%, depending on the algorithm. If power is limited, FPGAs outperform GPUs in each investigated case by at least a factor of four.

Reconfigurable Computing and FPGAs | 2012

Comparison of processing performance and architectural efficiency metrics for FPGAs and GPUs in 3D Ultrasound Computer Tomography

Matthias Birk; M. Balzer; Nicole V. Ruiter; Jürgen Becker

With the rise of heterogeneous computing architectures, application developers are confronted with a multitude of hardware platforms and the challenge of identifying the most suitable processing platform for their application. Strong competitors for the acceleration of 3D Ultrasound Computer Tomography, a medical imaging method for early breast cancer diagnosis, are GPU and FPGA devices. In this work, we evaluate processing performance and efficiency metrics for current FPGA and GPU devices. We compare top-notch devices from the 40 nm generation as well as FPGA and GPU devices that draw the same amount of power. For our two benchmark algorithms, the results show that if power consumption is not considered, the GPU and the FPGA deliver both similar processing performance and similar processing efficiency per transistor. However, if the power budget is limited to a similar value, the FPGA performs between six and eight times better than the GPU.

IEEE-NPSS Real Time Conference | 2016

An FPGA-based track finder for the L1 trigger of the CMS experiment at the high luminosity LHC

C. Amstutz; F. Ball; M. Balzer; J. J. Brooke; L. Calligaris; Davide Cieri; E. Clement; Geoffrey Hall; Tanja Harbaum; Kristian Harder; Pr Hobson; G. Iles; Thomas James; K Manolopoulos; T. Matsushita; A. Morton; David M Newbold; S. Paramesvaran; M. Pesaresi; Ivan Reid; A. Rose; Oliver Sander; T. Schuh; C. H. Shepherd-Themistocleous; Antoni Shtipliyski; Sioni Summers; Alexander Tapper; I. R. Tomalin; Kirika Uchida; P. Vichoudis

A new tracking system is under development for operation in the CMS experiment at the High Luminosity LHC. It includes an outer tracker which will construct stubs, built by correlating clusters in two closely spaced sensor layers for the rejection of hits from low transverse momentum tracks, and transmit them off-detector at 40 MHz. If tracker data is to contribute to keeping the Level-1 trigger rate at around 750 kHz under increased luminosity, a crucial component of the upgrade will be the ability to identify tracks with transverse momentum above 3 GeV/c by building tracks out of stubs. A concept for an FPGA-based track finder using a fully time-multiplexed architecture is presented, where track candidates are identified using a projective binning algorithm based on the Hough Transform. A hardware system based on the MP7 MicroTCA processing card has been assembled, demonstrating a realistic slice of the track finder in order to help gauge the performance and requirements for a full system. This paper outlines the system architecture and algorithms employed, highlights some of the first results from the hardware demonstrator, and discusses the prospects and performance of the completed track finder.
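The abstract does not give the details of the projective binning algorithm, so the following is only a generic Hough-transform sketch of the underlying idea. It assumes a linear small-angle track model in the transverse plane, phi = phi0 - m * r, where the slope m is proportional to charge over transverse momentum. The function name, bin layout, units, and layer threshold are all hypothetical choices for illustration.

```python
from collections import defaultdict

def hough_track_candidates(stubs, slopes, phi0_bins, phi0_range, min_layers):
    """Generic Hough-transform sketch (hypothetical, not the CMS firmware).

    stubs: list of (layer, r, phi) tuples.
    slopes: candidate slope values m, one per accumulator column.
    Returns sorted (slope_index, phi0_bin) cells hit by at least
    min_layers distinct layers.
    """
    phi0_min, phi0_max = phi0_range
    width = (phi0_max - phi0_min) / phi0_bins
    cells = defaultdict(set)
    for layer, r, phi in stubs:
        for i, m in enumerate(slopes):
            # Each stub votes for the phi0 consistent with every slope.
            phi0 = phi + m * r
            j = int((phi0 - phi0_min) // width)
            if 0 <= j < phi0_bins:
                cells[(i, j)].add(layer)
    return sorted(c for c, layers in cells.items() if len(layers) >= min_layers)

# Toy event: six stubs on one track with m = 0.02 and phi0 = 0.55.
stubs = [(layer, float(layer), 0.55 - 0.02 * layer) for layer in range(1, 7)]
slopes = [-0.04, -0.02, 0.0, 0.02, 0.04]
candidates = hough_track_candidates(
    stubs, slopes, phi0_bins=10, phi0_range=(0.0, 1.0), min_layers=5
)
print(candidates)
```

Counting distinct layers rather than raw votes keeps several stubs from the same layer from producing a candidate on their own. Each accumulator column depends on only one slope value, so columns can be filled independently of one another.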

Journal of Instrumentation | 2016

A high-throughput readout architecture based on PCI-Express Gen3 and DirectGMA technology

Lorenzo Rota; Matthias Vogelgesang; L.E. Ardila Perez; Michele Caselle; Suren Chilingaryan; T. Dritschler; N. Zilio; Andreas Kopmann; M. Balzer; M. Weber

Modern physics experiments produce multi-GB/s data rates. Fast data links and high performance computing stages are required for continuous data acquisition and processing. Because of their intrinsic parallelism and computational power, GPUs emerged as an ideal solution to process this data in high performance computing applications. In this paper we present a high-throughput platform based on direct FPGA-GPU communication. The architecture consists of a Direct Memory Access (DMA) engine compatible with the Xilinx PCI-Express core, a Linux driver for register access, and high-level software to manage direct memory transfers using AMD's DirectGMA technology. Measurements with a Gen3 x8 link show a throughput of 6.4 GB/s for transfers to GPU memory and 6.6 GB/s to system memory. We also assess the possibility of using the architecture in low latency systems: preliminary measurements show a round-trip latency as low as 1 μs for data transfers to system memory, while the additional latency introduced by OpenCL scheduling is the current limitation for GPU-based systems. Our implementation is suitable for real-time DAQ system applications ranging from photon science and medical imaging to High Energy Physics (HEP) systems.
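To put the reported throughput in context, the short calculation below compares it with the line-rate limit of a PCI-Express Gen3 x8 link (8 GT/s per lane with 128b/130b encoding). It assumes that GB means 10^9 bytes and ignores packet header and flow-control overhead, so the real achievable ceiling is somewhat lower than the figure computed here. The function name is illustrative.

```python
def pcie_gen3_line_rate_gb_s(lanes: int) -> float:
    """Upper bound in GB/s (10^9 bytes/s) for a PCIe Gen3 link.

    Accounts only for 128b/130b line coding, not for packet overhead.
    """
    transfer_rate_gt_s = 8.0          # Gen3 raw rate per lane
    encoding_efficiency = 128 / 130   # 128b/130b line coding
    return transfer_rate_gt_s * encoding_efficiency * lanes / 8  # bits -> bytes

limit = pcie_gen3_line_rate_gb_s(lanes=8)
for target, measured in (("GPU memory", 6.4), ("system memory", 6.6)):
    print(f"{target}: {measured} GB/s is {measured / limit:.0%} of {limit:.2f} GB/s")
```

Under these assumptions the measured rates correspond to roughly 81% and 84% of the encoded line rate.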

International Journal of Reconfigurable Computing | 2011

Evaluation of the reconfiguration of the data acquisition system for 3D USCT

Matthias Birk; Clemens Hagner; M. Balzer; Nicole V. Ruiter; Michael Hübner; Jürgen Becker

As today's standard screening methods often fail to diagnose breast cancer before metastases have developed, an earlier breast cancer diagnosis is still a major challenge. To improve this situation, we are currently developing a fully three-dimensional ultrasound computer tomography (3D USCT) system, promising high-quality volume images of the breast. For obtaining these images, a time-consuming reconstruction has to be performed. As this is currently done on a PC, parallel processing in reconfigurable hardware could accelerate both signal and image processing. In this work, we investigated the suitability of an existing data acquisition (DAQ) system for further computation tasks. The reconfiguration features of the embedded FPGAs have been exploited to enhance the system's functionality. We have adapted the DAQ system to allow for bidirectional communication and to provide an overall process control. Our results show that the studied system can be applied for data processing.

IEEE-NPSS Real Time Conference | 2016

Emulation of a prototype FPGA track finder for the CMS Phase-2 upgrade with the CIDAF emulation framework

C. Amstutz; F. Ball; M. Balzer; J Brooke; L. Calligaris; D Cieri; Ej Clement; Geoffrey Hall; Tanja Harbaum; Kristian Harder; Pr Hobson; G. Iles; T James; K Manolopoulos; T. Matsushita; A. Morton; Dave M Newbold; S. Paramesvaran; M. Pesaresi; Ivan Reid; A. Rose; Oliver Sander; T. Schuh; C. H. Shepherd-Themistocleous; A Shtipliyski; Sp Summers; A. Tapper; Ian R Tomalin; Kirika Uchida; P. Vichoudis

The CMS collaboration is preparing a major upgrade of its detector, so it can operate during the high luminosity run of the LHC from 2026. The upgraded tracker electronics will reconstruct the trajectories of charged particles within a latency of a few microseconds, so that they can be used by the level-1 trigger. An emulation framework, CIDAF, has been developed to provide a reference for a proposed FPGA-based implementation of this track finder, which employs a Time-Multiplexed (TM) technique for data processing.

IEEE Transactions on Nuclear Science | 2015

A Control System and Streaming DAQ Platform with Image-Based Trigger for X-ray Imaging

Uros Stevanovic; Michele Caselle; Angelica Cecilia; Suren Chilingaryan; Tomas Farago; Sergey Gasilov; Armin Herth; Andreas Kopmann; Matthias Vogelgesang; M. Balzer; Tilo Baumbach; Marc Weber

High-speed X-ray imaging applications play a crucial role for non-destructive investigations of the dynamics in material science and biology. On-line data analysis is necessary for quality assurance and data-driven feedback, leading to a more efficient use of beam time and increased data quality. In this article we present a smart camera platform with embedded Field Programmable Gate Array (FPGA) processing that is able to stream and process data continuously in real-time. The setup consists of a Complementary Metal-Oxide-Semiconductor (CMOS) sensor, an FPGA readout card, and a readout computer. It is seamlessly integrated in a new custom experiment control system called Concert that provides a more efficient way of operating a beamline by integrating device control, experiment process control, and data analysis. The potential of the embedded processing is demonstrated by implementing an image-based trigger. It records the temporal evolution of physical events with increased speed while maintaining the full field of view. The complete data acquisition system, with Concert and the smart camera platform, was successfully integrated and used for fast X-ray imaging experiments at KIT's synchrotron radiation facility ANKA.
