Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Lars Rockstroh is active.

Publication


Featured research published by Lars Rockstroh.


computational science and engineering | 2008

Accelerating Simulations of Light Scattering Based on Finite-Difference Time-Domain Method with General Purpose GPUs

Ana Balevic; Lars Rockstroh; Andreas Tausendfreund; Stefan Patzelt; Gert Goch; Sven Simon

Simulations of light scattering from nano-structured surface areas require a substantial amount of computing time. The emergence of general purpose graphics processing units (GPGPUs) as affordable SIMD arithmetic coprocessors for PCs brings the necessary computing power to modern desktop machines. In this paper we examine how the computation time of the finite-difference time-domain (FDTD) method, a classic numerical method for solving Maxwell's equations, can be reduced by leveraging the massively parallel architecture of GPGPU cards.
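
The core of the FDTD method is a pair of leapfrog field updates in which every grid cell depends only on its immediate neighbours, which is what makes it map so well onto GPUs. As a rough, self-contained illustration (not the authors' GPU implementation), a minimal 1D vacuum FDTD sketch in NumPy might look like the following; the grid size, step count and Gaussian source are arbitrary assumptions.

```python
import numpy as np

# Minimal 1D FDTD sketch (vacuum, normalized units, Courant number 0.5).
# Illustrative only: grid size, time steps and the source are arbitrary.
nx, nt = 400, 800
ez = np.zeros(nx)        # electric field on the Yee grid
hy = np.zeros(nx - 1)    # magnetic field, staggered by half a cell
courant = 0.5

for t in range(nt):
    # update H from the spatial derivative of E (vectorized, data-parallel)
    hy += courant * (ez[1:] - ez[:-1])
    # update interior E from the spatial derivative of H
    ez[1:-1] += courant * (hy[1:] - hy[:-1])
    # soft Gaussian source injected in the middle of the grid
    ez[nx // 2] += np.exp(-((t - 60) / 15.0) ** 2)
```

Each vectorized update corresponds to the kind of one-thread-per-cell kernel a GPGPU implementation would launch once per time step.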


field-programmable technology | 2012

A memory-efficient parallel single pass architecture for connected component labeling of streamed images

Michael Klaiber; Lars Rockstroh; Zhe Wang; Yousef Baroud; Sven Simon

In classical connected component labeling algorithms the image has to be scanned twice, and the memory required is at least as large as a full image. Single pass connected component labeling algorithms reduce the memory requirement by an order of magnitude, to a single image row. This memory reduction, which avoids the need for high-bandwidth external memory, is essential for a hardware-efficient implementation on FPGAs. Single pass algorithms mapped one-to-one to FPGA hardware resources can, however, process at most one pixel per clock cycle. To increase throughput, a scalable, parallel, memory-efficient single pass algorithm for connected component labeling is proposed. For typical image sizes, the algorithm reduces the memory required by the hardware architecture by a factor of 100 or more compared to a recently proposed parallel connected component labeling algorithm. The architecture can also process an image stream with high throughput without buffering a full image.
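
As a software-level illustration of the single pass idea (not the proposed FPGA architecture), the sketch below labels a streamed binary image row by row, keeping only the previous row of labels plus a small union-find table of label equivalences. The function name and the 4-connectivity choice are assumptions for the example.

```python
def label_stream(rows):
    """Label 4-connected foreground pixels in a binary image, streaming
    row by row; only the previous row of labels is buffered."""
    parent = [0]                           # union-find forest; label 0 = background

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    prev, out = None, []
    for row in rows:
        cur = [0] * len(row)
        for x, pixel in enumerate(row):
            if not pixel:
                continue
            left = cur[x - 1] if x > 0 else 0
            up = prev[x] if prev is not None else 0
            if left == 0 and up == 0:
                parent.append(len(parent))     # new provisional label
                cur[x] = len(parent) - 1
            else:
                cur[x] = max(left, up)
                if left and up:
                    union(left, up)            # record label equivalence
        prev = cur
        # labels are resolved with the equivalences known so far; a final
        # pass over the small label table (not the image) would make the
        # emitted labels globally consistent
        out.append([find(l) if l else 0 for l in cur])
    return out

labels = label_stream([[1, 1, 0, 1],
                       [0, 1, 0, 1]])
```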


european pvm mpi users group meeting on recent advances in parallel virtual machine and message passing interface | 2008

Using Arithmetic Coding for Reduction of Resulting Simulation Data Size on Massively Parallel GPGPUs

Ana Balevic; Lars Rockstroh; Marek Wróblewski; Sven Simon

The popularity of parallel platforms such as general purpose graphics processing units (GPGPUs) for large-scale simulations is rapidly increasing; however, the I/O bandwidth and storage capacity of these massively parallel cards remain the major bottlenecks. We propose a novel approach for post-processing simulation data directly on GPGPUs: efficient data size reduction immediately after the simulation, which can considerably reduce the influence of these bottlenecks on overall simulation performance. We also present current performance results.
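
For readers unfamiliar with arithmetic coding itself, the toy encoder below shows the interval-narrowing idea on which such entropy coders rest. It uses exact fractions and a fixed symbol model, and is purely illustrative; it is not the GPU-parallel coder evaluated in the paper.

```python
from fractions import Fraction

def arithmetic_encode(symbols, probs):
    """Return a Fraction inside the final interval that identifies `symbols`.
    `probs` maps each symbol to its probability; probabilities must sum to 1."""
    # cumulative distribution: symbol -> (low edge, high edge) within [0, 1)
    cum, edges = Fraction(0), {}
    for s, p in probs.items():
        edges[s] = (cum, cum + Fraction(p))
        cum += Fraction(p)

    low, high = Fraction(0), Fraction(1)
    for s in symbols:
        lo_edge, hi_edge = edges[s]
        width = high - low
        high = low + width * hi_edge     # shrink the interval to the
        low = low + width * lo_edge      # sub-range of the current symbol
    return (low + high) / 2              # any value in [low, high) works

code = arithmetic_encode("aab", {"a": Fraction(3, 4), "b": Fraction(1, 4)})
```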


picture coding symposium | 2010

Memory-efficient parallelization of JPEG-LS with relaxed context update

Simeon Wahl; Zhe Wang; Chensheng Qiu; Marek Wroblewski; Lars Rockstroh; Sven Simon

Many state-of-the-art lossless image compression standards feature adaptive error modelling. This, however, introduces data dependency loops into the compression scheme, so that neighboring pixels cannot be compressed in parallel. In this paper, we propose a relaxation of the context update of JPEG-LS that delays the update procedure in order to achieve a guaranteed degree of parallelism with a negligible effect on the compression ratio. The lossless mode of JPEG-LS, including the run mode, is considered. A deskewing scheme is provided that generates a bit-stream preserving the order the decoder needs to mimic the prediction in a consistent way. The system is memory-efficient in the sense that no additional memory for the large context set is needed.
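
The relaxation can be pictured with a much simplified adaptive predictor: if the per-context state is frozen for a block of k samples and updated only once per block, the k prediction errors inside the block become independent and can be computed in parallel. The sketch below illustrates only this delayed-update idea; the context model, the update rule and the parameter k are illustrative assumptions and not the JPEG-LS context model.

```python
import numpy as np

def predict_with_delayed_update(samples, contexts, num_contexts, k=4):
    """Toy adaptive predictor with a delayed (per-block) context update.

    samples  : 1D array of sample values
    contexts : 1D int array with the context index of each sample
    k        : block size; all k predictions in a block use a frozen bias,
               so they can be computed in parallel
    """
    samples = np.asarray(samples, dtype=float)
    contexts = np.asarray(contexts)
    bias = np.zeros(num_contexts)                    # per-context running bias
    residuals = np.zeros(len(samples))
    for start in range(0, len(samples), k):
        idx = slice(start, start + k)
        ctx = contexts[idx]
        # all predictions in the block use the same frozen context state
        residuals[idx] = samples[idx] - bias[ctx]
        # the context state is updated once, after the whole block
        np.add.at(bias, ctx, 0.1 * residuals[idx])   # simple gradient-style update
    return residuals
```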


computer and information technology | 2008

Acceleration of a finite-difference method with general purpose GPUs - Lesson learned

Ana Balevic; Lars Rockstroh; Wenbin Li; Jürgen Hillebrand; Sven Simon; Andreas Tausendfreund; Stefan Patzelt; Gert Goch

Modern massively parallel graphics cards (GPGPUs) promise to dramatically reduce the computation times of numerically intensive, data-parallel algorithms. As cards that are easily integrated into desktop PCs, they bring computational power previously reserved for computer clusters to the office. High performance rates make GPGPUs a very attractive target platform for scientific simulations. In this paper we present the lessons learned while parallelizing the finite-difference time-domain method, an inherently data-parallel algorithm frequently used for numerical computations, on state-of-the-art graphics hardware.


nano/micro engineered and molecular systems | 2008

Accelerating light scattering simulations of nanostructures by reconfigurable computing

Lars Rockstroh; Ana Balevic; M. Wroblewski; Jürgen Hillebrand; A. Tausendfreund; S. Patzelt; Sven Simon; G. Goch

Measuring methods based on light scattering are gaining importance for characterizing nanostructures and nanosurfaces in production processes. Validating these new measurement techniques requires the ability to simulate laser light scattering on surfaces several hundred to several thousand wavelengths in diameter, together with light scattering models at the nanometer scale. This leads to computational demands that exceed the resources of conventional desktop computers. To overcome this bottleneck, two approaches to massively parallel computing, namely graphics processing unit (GPU) computing and reconfigurable computing, are compared in this paper. Both approaches are discussed with respect to the discrete dipole approximation (DDA). Finally, a computer architecture incorporating both in a standard desktop system is presented.


Tm-technisches Messen | 2008

Virtuelle Messgeräte: Definition und Stand der Entwicklung (Virtual Measuring Instruments: Definition and Development Status)

Robert Schmitt; Friedel Koerfer; Oliver Sawodny; Jan Zimmermann; Rolf Krüger-Sehm; Min Xu; Thorsten Dziomba; Ludger Koenders; Gert Goch; Andreas Tausendfreund; Stefan Patzelt; Sven Simon; Lars Rockstroh; Carsten Bellon; Andreas Staude; Peter Woias; Frank Goldschmidtböing; Martin Rabold

Micro- and nanotechnology is one of the key technologies of the 21st century, with high growth forecasts, as detailed in the 2004 study "Nanotechnologie als wirtschaftlicher Wachstumsmarkt" commissioned by the BMBF. This trend results in a growing demand for measuring systems that can characterize nanostructures close to, or within, the production process. Virtual measuring instruments provide insights for the development of new measuring systems, the analysis and optimization of established methods, the determination of measurement uncertainty, and the model-based correction of systematic errors. The virtual measuring process comprises not only the measuring instrument but also the sample and the interactions between the two. In this article, examples of current developments of virtual measuring instruments are presented and their applications are discussed.


international conference on acoustics, speech, and signal processing | 2012

Correlation and convolution of image data using Fermat number transform based on two's complement

Lars Rockstroh; Michael Klaiber; Sven Simon

The fast Fermat number transform (FNT) enables fast correlation and fast convolution, similar to fast correlation based on the fast Fourier transform (FFT). In contrast to fixed-point FFT with dynamic scaling, the FNT is based on integer operations, which are free of rounding errors, and it maintains the full dynamic range for convolution and correlation. In this paper, a technique to compute the FNT based on two's complement (TFNT) is presented and its correctness is proven. The TFNT is data-flow driven without conditional assignments, which enables high-performance pipelined implementations on digital signal processors and field programmable gate arrays. Using 2D correlation with a radix-4 algorithm as an example, it is shown that the TFNT requires fewer operations than fixed-point FFT, as well as fewer operations than the FNT based on the previously presented diminished-1 approach.
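
The appeal of a number-theoretic transform such as the FNT is that convolution becomes exact integer arithmetic modulo a Fermat number. The sketch below demonstrates this with a naive transform modulo the Fermat prime F4 = 2^16 + 1; it is a plain modular-arithmetic illustration of the FNT idea, not the paper's two's-complement formulation or a radix-4 implementation.

```python
# Exact circular convolution via a number-theoretic transform modulo the
# Fermat prime F4 = 2^16 + 1 = 65537 (3 is a primitive root modulo F4).
P = 65537

def ntt(a, invert=False):
    n = len(a)                              # n must divide P - 1 = 65536
    root = pow(3, (P - 1) // n, P)          # primitive n-th root of unity
    if invert:
        root = pow(root, P - 2, P)          # modular inverse of the root
    # naive O(n^2) transform, enough to show exactness
    out = [sum(a[j] * pow(root, i * j, P) for j in range(n)) % P
           for i in range(n)]
    if invert:
        n_inv = pow(n, P - 2, P)
        out = [(x * n_inv) % P for x in out]
    return out

def circular_convolve(x, y):
    X, Y = ntt(x), ntt(y)
    return ntt([(a * b) % P for a, b in zip(X, Y)], invert=True)

# sanity check: integer convolution is exact, no rounding errors
assert circular_convolve([1, 2, 3, 4], [1, 0, 0, 1]) == [3, 5, 7, 5]
```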


field-programmable technology | 2009

Benchmark results for asynchronous high-speed FPGAs focusing on high performance digital signal processing

Lars Rockstroh; Wenbin Li; Juergen Hillebrand; Marek Wroblewski; Sven Simon

Since the end of 2008, the first FPGAs based on asynchronous logic cells have been commercially available. Although the internal logic of the FPGA fabric is purely asynchronous, the design style for these FPGAs does not differ from that of classical FPGAs: designs can be synthesized from the register transfer level using standard hardware description languages such as VHDL or Verilog, just as for synchronous FPGAs. The fundamental advantage of these asynchronous FPGAs is the much higher intrinsic speed of the fabric, about 1.5 GHz compared to a few hundred MHz for traditional FPGAs. In this paper, we examine whether this speed advantage at the physical level translates into advantages at the design level for high-performance DSP applications.


international conference on acoustics, speech, and signal processing | 2012

Relaxation of particle image velocimetry based on single autocorrelation of filtered motion blurring

T. Lefeure; Lars Rockstroh; Michael Klaiber; N. Fortier; Sven Simon

Collaboration


Dive into Lars Rockstroh's collaborations.

Top Co-Authors

Sven Simon, University of Stuttgart
Ana Balevic, University of Stuttgart
Gert Goch, University of Stuttgart
Zhe Wang, University of Stuttgart
Simeon Wahl, University of Stuttgart