
Publication


Featured research published by Frank Hannig.


field-programmable technology | 2006

A highly parameterizable parallel processor array architecture

Dmitrij Kissler; Frank Hannig; Alexey Kupriyanov; Jürgen Teich

In this paper, a new class of highly parameterizable coarse-grained reconfigurable architectures called weakly programmable processor arrays is discussed. The main advantages of the proposed architecture template are the possibility of partial and differential reconfiguration and the systematic classification of different architectural parameters, which allows trading off flexibility and hardware cost. The applicability of our approach is tested in a case study with different interconnect topologies on an FPGA platform. The results show substantial flexibility gains at only marginal additional hardware cost.
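The idea behind differential reconfiguration mentioned in the abstract can be sketched in a few lines: instead of reloading the full configuration memory, only the configuration words that differ between two contexts are rewritten. This is an illustrative model, not code from the paper; the function and data are hypothetical.

```python
# Illustrative sketch (not from the paper): differential reconfiguration
# rewrites only the configuration words that changed between two contexts.

def differential_update(current, target):
    """Return the (address, word) pairs that must be rewritten."""
    return [(addr, word) for addr, (old, word)
            in enumerate(zip(current, target)) if old != word]

full_cfg = [0x0A, 0x1F, 0x00, 0x33]   # hypothetical PE configuration memory
next_cfg = [0x0A, 0x2F, 0x00, 0x34]
updates = differential_update(full_cfg, next_cfg)
# Only the differing words are reloaded, shortening reconfiguration time.
```

The smaller the difference between configurations, the cheaper the context switch, which is what makes partial and differential reconfiguration attractive for such arrays.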


applied reconfigurable computing | 2008

PARO: Synthesis of Hardware Accelerators for Multi-dimensional Dataflow-Intensive Applications

Frank Hannig; Holger Ruckdeschel; Hritam Dutta; Jürgen Teich

In this paper, we present the PARO design tool for the automated hardware synthesis of massively parallel embedded architectures for given dataflow-dominant applications. Key features of PARO are: (1) design entry in the form of a compact and intuitive functional programming language, which allows highly parallel implementations. (2) Advanced partitioning techniques are applied in order to balance the trade-offs in cost and performance along with the requisite throughput; this is achieved by distributing computations onto an array of tightly coupled processor elements. (3) We demonstrate the performance of the FPGA-synthesized hardware with several selected algorithms from different benchmarks.


international parallel and distributed processing symposium | 2012

Generating Device-specific GPU Code for Local Operators in Medical Imaging

Richard Membarth; Frank Hannig; Jürgen Teich; Mario Körner; Wieland Eckert

To cope with the complexity of programming GPU accelerators for medical imaging computations, we developed a framework to describe image processing kernels in a domain-specific language, which is embedded into C++. The description uses decoupled access/execute metadata, which allow the programmer to specify both execution constraints and memory access patterns of kernels. A source-to-source compiler translates this high-level description into low-level CUDA and OpenCL code with automatic support for boundary handling and filter masks. Taking the annotated metadata and the characteristics of the parallel GPU execution model into account, two-layered parallel implementations, utilizing SPMD and MPMD parallelism, are generated. An abstract hardware model of graphics card architectures makes it possible to model GPUs from multiple vendors, such as AMD and NVIDIA, and to generate device-specific code for multiple targets. It is shown that the generated code is faster than manual implementations and those relying on hardware support for boundary handling. Implementations from RapidMind, a commercial framework for GPU programming, are outperformed, and similar results are achieved compared to the GPU backend of the widely used image processing library OpenCV.
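The "automatic boundary handling" the abstract mentions can be illustrated with a minimal sketch: a 3x3 blur where out-of-range pixel accesses are clamped to the image border, which is one common boundary-handling policy such compilers generate. The names and the plain-Python form are illustrative only, not the framework's API.

```python
# Sketch of clamp-to-edge boundary handling, one policy a compiler can
# generate automatically for local operators. Illustrative, not the DSL.

def clamp(v, lo, hi):
    return max(lo, min(v, hi))

def blur3x3(img):
    """3x3 box blur over a list-of-lists image with clamped borders."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # out-of-range indices are clamped to the border pixel
                    acc += img[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]
            out[y][x] = acc // 9
    return out
```

In the generated GPU code, the same clamping logic is specialized per kernel from the access metadata, so the programmer never writes border cases by hand.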


international conference on acoustics, speech, and signal processing | 2004

Regular mapping for coarse-grained reconfigurable architectures

Frank Hannig; Hritam Dutta; Jürgen Teich

Similar to programmable devices such as processors or microcontrollers, reconfigurable logic devices can also be built as software, by programming the configuration of the device. In this paper, we present an overview of constraints which have to be considered when mapping applications to coarse-grained reconfigurable architectures. The application areas of most of these architectures address computationally intensive algorithms like video and audio processing or wireless communication. Therefore, reconfigurable arrays are in direct competition with DSP processors, which are traditionally used for digital signal processing. Hence, existing mapping methodologies are closely related to approaches from the DSP world. They try to employ pipelining and temporal partitioning, but they do not exploit the full parallelism of a given algorithm and the computational potential of typically 2D arrays. We present a first case study for mapping regular algorithms onto reconfigurable arrays by using our design methodology, which is characterized by loop parallelization in the polytope model. The case study shows that our regular mapping methodology may lead to highly efficient implementations while taking the constraints of the architecture into account.
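The core move of loop parallelization in the polytope model can be sketched with a textbook example: for a 2D loop nest with uniform dependences, the linear schedule t(i, j) = i + j groups the iteration space into diagonal wavefronts whose points are mutually independent and can run in parallel on an array. This is a generic illustration of the model, not the paper's specific case study.

```python
# Sketch of wavefront scheduling in the polytope model: all iterations
# (i, j) with the same value of i + j form one parallel time step.

from collections import defaultdict

N = 4
wavefronts = defaultdict(list)
for i in range(N):
    for j in range(N):
        wavefronts[i + j].append((i, j))  # points on a diagonal are independent

# The N*N iterations execute in only 2N-1 sequential time steps, with up
# to N processing elements busy in the widest wavefront.
```

The same schedule, applied to a real dependence graph, is what lets a 2D array exploit far more parallelism than a pipelined DSP implementation of the same loop.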


software and compilers for embedded systems | 2011

Resource-aware programming and simulation of MPSoC architectures through extension of X10

Frank Hannig; Sascha Roloff; Gregor Snelting; Jürgen Teich; Andreas Zwinkau

The efficient use of future MPSoCs with 1000 or more processor cores requires new means of resource-aware programming to deal with increasing imperfections such as process variation, fault rates, aging effects, and power and thermal problems. In this paper, we apply a new approach called invasive computing that enables an application programmer to deliberately spread computations to processors at certain points of the program. Such decisions can be made depending on the degree of application parallelism and the state of the underlying resources, such as utilization, load, and temperature. The introduced constructs for resource-aware programming are embedded into the parallel computing language X10, as developed by IBM, using a library-based approach. Moreover, we show how individual heterogeneous MPSoC architectures may be modeled for subsequent functional simulation by representing compute resources such as processors themselves as lightweight threads that are executed in parallel with the application threads by the X10 run-time system. Thus, the state changes of each hardware resource may be simulated, including temperature, aging, and other useful monitoring functionality, to provide a first high-level programming test-bed for invasive computing.
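Invasive computing structures a program into three phases: the application first *invades* (claims) a set of resources, then *infects* them with a computation, and finally *retreats* (releases them). The sketch below models those phases with plain Python objects; the real constructs are X10 library calls, so every name here is illustrative.

```python
# Illustrative model of the invade / infect / retreat phases of invasive
# computing. Class and method names mirror the concepts, not the X10 API.

class Claim:
    def __init__(self, pool, n):
        # invade: claim up to n processing elements from the free pool
        self.pool = pool
        self.pes = [pool.pop() for _ in range(min(n, len(pool)))]

    def infect(self, kernel, data):
        # infect: run the kernel on every claimed PE, one data chunk each
        return [kernel(pe, chunk) for pe, chunk in zip(self.pes, data)]

    def retreat(self):
        # retreat: hand the PEs back so other applications can invade them
        self.pool.extend(self.pes)
        self.pes = []

free_pes = list(range(8))
claim = Claim(free_pes, 4)                        # ask for 4 of 8 PEs
results = claim.infect(lambda pe, x: x * x, [1, 2, 3, 4])
claim.retreat()
```

The key point is that the claim size can depend on run-time resource state (load, temperature), which is exactly the decision the paper's constructs expose to the programmer.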


application specific systems architectures and processors | 2011

Decentralized dynamic resource management support for massively parallel processor arrays

Vahid Lari; Andriy Narovlyanskyy; Frank Hannig; Jürgen Teich

This paper presents a hardware-supported resource management methodology for massively parallel processor arrays. It enables processing elements to autonomously explore resource availability in their neighborhood. To support resource exploration, we introduce specialized controllers, which can be attached to each of the processing elements. We propose different types of architectures for the exploration controller: fast FSM-based designs as well as flexible programmable controllers. These controllers make it possible to implement different distributed resource exploration strategies, enabling parallel programs to explore and reserve available resources according to different application requirements. Hardware cost evaluations show that the cost of the simplest implementation of our programmable controller is comparable to our FSM-based implementations, while offering the flexibility to implement different exploration strategies. We show that the proposed distributed approach can achieve a significant speedup in comparison with centralized resource exploration methods.
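One simple exploration strategy of the kind such controllers could implement is a breadth-first claim: starting from one PE, free neighbors are reserved ring by ring until the requested count is reached. The sketch below is a purely illustrative software model of that idea, not the paper's hardware.

```python
# Sketch of decentralized resource exploration on a 2D processor array:
# claim up to `wanted` free PEs in breadth-first order around `start`.

from collections import deque

def explore(free, start, wanted, w, h):
    claimed, queue, seen = [], deque([start]), {start}
    while queue and len(claimed) < wanted:
        x, y = queue.popleft()
        if (x, y) in free:          # busy PEs are skipped, not claimed
            claimed.append((x, y))
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return claimed

grid = {(x, y) for x in range(4) for y in range(4)} - {(1, 1)}  # (1,1) busy
claimed = explore(grid, (0, 0), 4, 4, 4)
```

Because each PE only inspects its neighbors, no central arbiter is needed, which is where the speedup over centralized exploration comes from.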


application-specific systems, architectures, and processors | 2004

Resource constrained and speculative scheduling of an algorithm class with run-time dependent conditionals

Frank Hannig; Jürgen Teich

We present a significant extension of the quantified-equation-based algorithm class of piecewise regular algorithms. The main contributions of this paper are: the class of piecewise regular algorithms is extended by allowing run-time dependent conditionals; a mixed integer linear program is given to derive optimal schedules of the novel class we call dynamic piecewise regular algorithms; and, in order to achieve the highest performance, we present a speculative scheduling approach. The results are applied to an illustrative example.
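The intuition behind speculative scheduling of a run-time dependent conditional can be sketched simply: both branches are started before the condition is known, so their latency overlaps with the condition evaluation, and the mispredicted result is discarded. This is a generic illustration of the principle, not the paper's MILP formulation.

```python
# Sketch of speculative execution of a run-time dependent conditional:
# both branches run "eagerly" and the condition selects one result.

def speculative_eval(cond_fn, then_fn, else_fn, x):
    then_result = then_fn(x)   # started speculatively, in parallel with
    else_result = else_fn(x)   # the (possibly slow) condition evaluation
    return then_result if cond_fn(x) else else_result

result = speculative_eval(lambda x: x > 0, lambda x: x * 2, lambda x: -x, 5)
```

The trade-off is extra work (the discarded branch) in exchange for hiding the conditional's latency, which is worthwhile when resources are otherwise idle.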


symposium on application specific processors | 2011

Frameworks for GPU Accelerators: A comprehensive evaluation using 2D/3D image registration

Richard Membarth; Frank Hannig; Jürgen Teich; Mario Körner; Wieland Eckert

In the last decade, there has been a dramatic growth in research and development of massively parallel many-core architectures like graphics hardware, both in academia and industry. This has also changed the way programs are written in order to leverage the processing power of a multitude of cores on the same hardware. In the beginning, programmers had to use special graphics programming interfaces to express general-purpose computations on graphics hardware. Today, several frameworks exist to relieve the programmer from such tasks. In this paper, we present five frameworks for parallelization on GPU accelerators, namely RapidMind, PGI Accelerator, HMPP Workbench, OpenCL, and CUDA. To evaluate these frameworks, a real-world application from medical imaging is investigated: 2D/3D image registration.


Archive | 2014

Programming Abstractions for Data Locality

Adrian Tate; Amir Kamil; Anshu Dubey; Armin Groblinger; Brad Chamberlain; Brice Goglin; Harold C. Edwards; Chris J. Newburn; David Padua; Didem Unat; Emmanuel Jeannot; Frank Hannig; Gysi Tobias; Hatem Ltaief; James C. Sexton; Jesús Labarta; John Shalf; Karl Fuerlinger; Kathryn O'Brien; Leonidas Linardakis; Maciej Besta; Marie-Christine Sawley; Mark James Abraham; Mauro Bianco; Miquel Pericàs; Naoya Maruyama; Paul H. J. Kelly; Peter Messmer; Robert B. Ross; Romain Ciedat

The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component; we are rapidly moving to an era in which computing is cheap and massively parallel, while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high-performance computing systems), the scientific computing community needs to refactor its applications to align with the emerging data-centric paradigm. Our applications must be evolved to express information about data locality. Unfortunately, current programming environments offer few ways to do so. They ignore the incurred cost of communication and simply rely on hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume that all processing elements are equidistant from each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data-centric and should allow developers to describe how to decompose and how to lay out data in memory. Fortunately, there are many emerging concepts, such as constructs for tiling, data layout, array views, task and thread affinity, and topology-aware communication libraries for managing data locality. There is an opportunity to identify commonalities in strategy that enable us to combine the best of these concepts into a comprehensive approach to expressing and managing data locality on exascale programming systems.
These programming model abstractions can expose crucial information about data locality to the compiler and runtime system to enable performance-portable code. The research question is to identify the right level of abstraction, which includes techniques that range from template libraries all the way to completely new languages to achieve this goal.
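Tiling, the first construct the report lists, can be sketched in a few lines: iterating a 2D index space tile by tile keeps each tile's working set local (e.g., cache- or scratchpad-resident) before moving on. The helper below is a generic illustration under assumed tile sizes, not an API from any surveyed system.

```python
# Sketch of a tiling construct: cover an n x m index space with th x tw
# tiles, so each tile's data can stay local while it is processed.

def tiles(n, m, th, tw):
    """Yield (row range, column range) pairs covering an n x m space."""
    for i0 in range(0, n, th):
        for j0 in range(0, m, tw):
            yield range(i0, min(i0 + th, n)), range(j0, min(j0 + tw, m))

visited = 0
for rows, cols in tiles(6, 6, 2, 3):
    for i in rows:
        for j in cols:
            visited += 1   # every element is touched exactly once, in tile order
```

Exposing the tile loop as a first-class abstraction, rather than hand-written index arithmetic, is what lets a compiler or runtime pick tile sizes and placement per target.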


parallel computing in electrical engineering | 2006

Hierarchical Partitioning for Piecewise Linear Algorithms

Hritam Dutta; Frank Hannig; Jürgen Teich

Processor arrays are used as accelerators for plenty of dataflow-dominant applications. The explosive growth in research and development of massively parallel processor array architectures has led to a demand for mapping tools that realize the full potential of these architectures. Such architectures are characterized by hierarchies of parallelism and memory structures; i.e., apart from different levels of cache, a processor array has a number of processing elements (PEs), where each PE can further contain sub-word parallelism. In order to handle large-scale problems, balance local memory requirements against I/O bandwidth, and use the different hierarchies of parallelism and memory, one needs a sophisticated transformation called hierarchical partitioning. In this paper, we introduce for the first time a detailed methodology encompassing hierarchical partitioning.
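The shape of a hierarchical partitioning can be sketched on a 1D iteration space: the space is first split into outer blocks (e.g., distributed over PEs or over time) and each block is split again into inner tiles (e.g., sized to local memory or sub-word width). Block sizes and the two-level nesting below are illustrative assumptions, not the paper's formal transformation.

```python
# Sketch of two-level (hierarchical) partitioning of a 1D iteration space:
# outer blocks for the array level, inner tiles for the local-memory level.

def hierarchical_partition(n, outer, inner):
    """Return nested index blocks: outer blocks, each a list of inner tiles."""
    return [[list(range(j, min(j + inner, i + outer, n)))
             for j in range(i, min(i + outer, n), inner)]
            for i in range(0, n, outer)]

parts = hierarchical_partition(12, outer=6, inner=2)
# 2 outer blocks, each holding 3 inner tiles of 2 iterations
```

Choosing the two tile sizes independently is what lets the mapping balance local memory against I/O bandwidth at each level of the hierarchy.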

Collaboration


Dive into Frank Hannig's collaborations.

Top Co-Authors

Jürgen Teich, University of Erlangen-Nuremberg
Hritam Dutta, University of Erlangen-Nuremberg
Alexandru Tanase, University of Erlangen-Nuremberg
Dmitrij Kissler, University of Erlangen-Nuremberg
Vahid Lari, University of Erlangen-Nuremberg
Alexey Kupriyanov, University of Erlangen-Nuremberg
Ericles Rodrigues Sousa, University of Erlangen-Nuremberg
Moritz Schmid, University of Erlangen-Nuremberg
Christian Schmitt, University of Erlangen-Nuremberg