Dietmar Fey

University of Erlangen-Nuremberg

Publication

Featured research published by Dietmar Fey.

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models | 2014

HPX: A Task Based Programming Model in a Global Address Space

Hartmut Kaiser; Thomas Heller; Bryce Adelstein-Lelbach; Adrian Serio; Dietmar Fey

The significant increase in complexity of Exascale platforms due to energy-constrained, billion-way parallelism, with major changes to processor and memory architecture, requires new energy-efficient and resilient programming techniques that are portable across multiple future generations of machines. We believe that guaranteeing adequate scalability, programmability, performance portability, resilience, and energy efficiency requires a fundamentally new approach, combined with a transition path for existing scientific applications, to fully explore the rewards of today's and tomorrow's systems. We present HPX -- a parallel runtime system which extends the C++11/14 standard to facilitate distributed operations, enable fine-grained constraint-based parallelism, and support runtime adaptive resource management. This provides a widely accepted API enabling programmability, composability and performance portability of user applications. By employing a global address space, we seamlessly augment the standard to apply to a distributed case. We present HPX's architecture, design decisions, and results selected from a diverse set of application runs showing superior performance, scalability, and efficiency over conventional practice.

international conference on conceptual structures | 2011

High Performance Stencil Code Algorithms for GPGPUs

Andreas Schäfer; Dietmar Fey

In this paper we investigate how stencil computations can be implemented on state-of-the-art general purpose graphics processing units (GPGPUs). Stencil codes can be found at the core of many numerical solvers and physical simulation codes and are therefore of particular interest to scientific computing research. GPGPUs have gained a lot of attention recently because of their superior floating point performance and memory bandwidth. Nevertheless, memory-bound stencil codes in particular have proven to be challenging for GPGPUs, yielding lower-than-expected speedups. We chose the Jacobi method as a standard benchmark to evaluate a set of algorithms on NVIDIA's latest Fermi chipset. One of our fastest algorithms is a parallel wavefront update. It exploits the enlarged on-chip shared memory to perform two time step updates per sweep. To the best of our knowledge, it represents the first successful application of temporal blocking for 3D stencils on GPGPUs and thereby exceeds previous results by a considerable margin. It is also the first paper to study stencil codes on Fermi.
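For readers unfamiliar with the Jacobi method named in this abstract, the following is a minimal sketch of a Jacobi sweep. It is not the paper's algorithm: it assumes a 2D grid, a 5-point stencil, fixed boundary values, and plain serial Python, whereas the paper studies 3D stencils with wavefront updates and temporal blocking on GPGPUs. The function names `jacobi_step` and `jacobi` are hypothetical.

```python
def jacobi_step(grid):
    """Return a new grid after one Jacobi sweep with a 5-point stencil.

    Assumption: boundary cells hold fixed values and are copied unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy, so all reads use the old time step
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Each interior cell becomes the average of its four neighbours.
            new[y][x] = 0.25 * (grid[y - 1][x] + grid[y + 1][x]
                                + grid[y][x - 1] + grid[y][x + 1])
    return new


def jacobi(grid, steps):
    """Apply `steps` Jacobi sweeps and return the resulting grid."""
    for _ in range(steps):
        grid = jacobi_step(grid)
    return grid
```

Every sweep reads the whole grid and writes the whole grid while doing only a few additions per cell, which is why such codes tend to be memory bound. Performing two time step updates per sweep, as the abstract describes, reduces that memory traffic.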

Proceedings of the IEEE | 2000

Optical interconnects for neural and reconfigurable VLSI architectures

Dietmar Fey; Werner Erhard; Matthias Gruber; Jürgen Jahns; Hartmut Bartelt; Guido Grimm; Lutz Hoppe; Stefan Sinzinger

The increasing transistor density in very large-scale integrated (VLSI) circuits and the limited pin number in the off-chip communication lead to a situation described as the interconnect crisis in micro-electronics. Optoelectronic VLSI (OE-VLSI) circuits using short-distance optical interconnects and optoelectronic devices like microlaser, modulator, and detector arrays for optical off-chip sending and receiving offer a technology to overcome this crisis. However, in order to exploit efficiently the potential of thousands of optical off-chip interconnects, an appropriate VLSI architecture is required. We show for the example of neural and reconfigurable VLSI architectures that fine-grain architectures fulfill these requirements. An OE-VLSI circuit realization based on multiple quantum-well modulators functioning as two-dimensional (2-D) optical input/output (I/O) interface for the chip is presented. Due to the parallel optical interface, an improvement of two to three orders of magnitude in the throughput performance is possible compared to all-electronic solutions. For the optical interconnects, a planar-integrated free-space optical system has been designed leading to an optical multichip module. Such a system has been fabricated and experimentally characterized. Furthermore, we designed and manufactured fiber arrays, which will be the core element for a convenient test station for the 2-D optoelectronic I/O interface of OE-VLSI circuits.

computing frontiers | 2005

Marching-pixels: a new organic computing paradigm for smart sensor processor arrays

Dietmar Fey; Daniel Schmidt

In this paper we present a new organic computing principle denoted as marching pixels for the architectures of future smart CMOS camera chips. The idea of marching pixels is based on the realization of a massively-parallel fine-grain single-chip processor array. Marching pixels are virtual organic units which propagate in a pixel processor array, similar to virtual ants in ant algorithms. The task of the marching pixels is to carry out important image pre-processing tasks autonomously, e.g. fast and robust detection of objects and their center points or tracking of moving objects. We favor organic computing principles based on virtual life-like objects which are implemented in hardware to realize fast reply times and self-healing properties. This technology is intended for future smart sensor chips which will integrate hundreds of millions of transistors. The paper presents the basic idea of marching pixels and their functional behavior for different algorithmic tasks. Furthermore, a concept for the implementation of marching pixels is shown as well as results of a simulation study which presents a first proof of concept of the effectiveness of the marching pixels idea.

international symposium on object component service oriented real time distributed computing | 2012

A Generic VHDL Template for 2D Stencil Code Applications on FPGAs

Michael Schmidt; Marc Reichenbach; Dietmar Fey

The efficient realization of self-organizing systems based on 2D stencil code applications, like our Marching Pixel algorithms, is a great challenge. They are data-intensive and also computationally intensive, because a high number of iterations is often required. FPGAs are well suited to the realization of these algorithms. They are very flexible, allow scalable parallel processing and have moderate power consumption, even in high-performance versions. Therefore, FPGAs are highly qualified to make these applications real-time capable as well. Our goal was to implement an efficient parameterizable buffering and parallel processing scheme for such operations in FPGAs, to process them as fast as possible. We developed a generic VHDL template which allows scalable parallelization and pipelining of 2D stencil code applications in relation to application and hardware constraints.

Archive | 2013

Continuous Integration and Automation for Devops

Andreas Schaefer; Marc Reichenbach; Dietmar Fey

The task of managing large installations of computer systems presents a number of unique challenges related to heterogeneity, consistency, information flow and documentation. The emerging field of DevOps borrows practices from software engineering to tackle complexity. In this paper we provide insight into how automation can improve scalability and testability while simultaneously reducing the operators’ work.

european pvm mpi users group meeting on recent advances in parallel virtual machine and message passing interface | 2008

LibGeoDecomp: A Grid-Enabled Library for Geometric Decomposition Codes

Andreas Schäfer; Dietmar Fey

In this paper we present first results obtained with LibGeoDecomp, a work-in-progress library for scientific and engineering simulations on structured grids, geared at multi-cluster and grid systems. Today's parallel computers range from multi-core PCs to highly scaled, heterogeneous grids. With the growing complexity of grid resources on the one hand, and the increasing importance of computer-based simulations on the other, the agile development of highly efficient and adaptable parallel applications is imperative. LibGeoDecomp is to our knowledge the first library to support all state-of-the-art features from dynamic load balancing and exchangeable domain decomposition techniques to ghost zones with arbitrary width and parallel IO, along with a hierarchical parallelization whose layers can be adapted to reflect the underlying hierarchy of the grid system.

Swarm and evolutionary computation | 2013

Performance investigations of genetic algorithms on graphics cards

Johannes Hofmann; Steffen Limmer; Dietmar Fey

Genetic algorithms are among the most adaptable optimization algorithms. Due to their inherent parallelism they seem well suited for execution on massively parallel hardware such as graphics processing units. In this paper we put this claim to the test by performing comprehensive experiments. We try to find out how well graphics processing units are suited for the task and what parts of genetic algorithms should be executed on them. We focus especially on the new Fermi generation of Nvidia graphics chips. While it is imperative that the fitness function be effectively parallelizable on the GPU, because it is the most computationally expensive task of the algorithm, results indicate that if this is the case, speedups of several orders of magnitude are possible compared to conventional multi-core CPUs. Our findings also suggest that, starting with the Fermi architecture, all parts of a genetic algorithm should be carried out on the graphics card instead of only part of it.

parallel computing technologies | 2007

Comparison of evolving uniform, non-uniform cellular automaton, and genetic programming for centroid detection with hardware agents

Marcus Komann; Andreas Mainka; Dietmar Fey

Current industrial applications require fast and robust image processing in systems with small size and low power dissipation. One of the main tasks in industrial vision is fast detection of centroids of objects. This paper compares three different approaches for finding geometric algorithms for centroid detection which are appropriate for a fine-grained parallel hardware architecture in an embedded vision chip. The algorithms shall comprise emergent capabilities and high problem-specific functionality without requiring large amounts of states or memory. For that problem, we consider uniform and non-uniform cellular automata (CA) as well as Genetic Programming. Due to the inherent complexity of the problem, an evolutionary approach is applied. The appropriateness of these approaches for centroid detection is discussed.
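To make the centroid detection task concrete, here is a minimal serial reference sketch. It is not one of the paper's approaches: the paper evolves cellular automata and genetic programs for fine-grained parallel hardware, while this sketch simply averages the coordinates of foreground pixels. It assumes a binary image stored as a list of rows containing a single object. The function name `centroid` is hypothetical.

```python
def centroid(image):
    """Return the (row, col) centroid of all nonzero pixels.

    Assumption: `image` is a list of equally long rows holding 0/1 values
    and contains at most one object. Returns None for an empty image.
    """
    count = 0
    sum_r = 0
    sum_c = 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value:
                count += 1
                sum_r += r
                sum_c += c
    if count == 0:
        return None
    # The centroid is the mean coordinate of the foreground pixels.
    return (sum_r / count, sum_c / count)
```

This serial version visits every pixel once and keeps global sums. A fine-grained parallel architecture, as targeted in the abstract, would instead need each cell to work from local neighbourhood information only.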

international symposium on circuits and systems | 2007

An Organic Computing architecture for visual microprocessors based on Marching Pixels

Dietmar Fey; Marcus Komann; Frank Schurz; Andreas Loos

The paper presents the architecture and synthesis results for organic computing hardware for smart CMOS camera chips. The organic behavior in the chip hardware is based on distributed and emergent functionality exploited for detection of objects and their center points given in binary images. Future real-time embedded systems used in industrial image processing have to provide reply times in the range of milliseconds. It is impossible to meet such strict requirements for megapixel resolutions with serial processing schemes, in particular if multiple given objects have to be detected. Even classical parallel techniques like SIMD or MIMD approaches are not sufficient due to their dependency on more or less central control structures. To achieve more flexibility, unlimited scalability and higher performance, parallel emergent architectures are necessary. We present such an approach, denoted as marching pixels, for future digital visual microprocessors. Marching pixels work similarly to artificial ants. They crawl as hardware agents within a pixel field, e.g. to identify and to detect center points of an arbitrary number of objects given in an image. We present an emergent marching pixel algorithm for the processing of arbitrary concave objects and its mapping onto real hardware. Based on synthesis results for FPGAs and ASICs we discuss the possibilities of digital organic computing approaches for visual microprocessors for future smart high-speed camera systems.
